Coding with Titans

so breaking things happens constantly, but never on purpose

DBF and language code-page

Recently I had a problem importing data from a 10-year-old set of DBF tables. All was fine until it came to reading text with Polish diacritics. It worked on 9 out of 10 machines, all with seemingly identical configurations (at least I couldn’t find any differences: Windows 7 x64 PL, .NET 4.5.2, the same regional options). On that single machine all special letters were converted into eye-hurting characters and looked plain wrong.

As it turned out, the OleDbConnection class I used to connect (with the “Microsoft.Jet.OLEDB.4.0” provider) magically treated strings as Windows-1250 encoded, even though they were CP852 (Latin-2). Thanks to this site for helping me find that out.
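The mismatch is easy to reproduce outside of OLE DB. Below is a minimal sketch, written in Python for brevity; the sample sentence and variable names are my own and not taken from the original tables. In C# the same decoding could be done with Encoding.GetEncoding(852).

```python
# Polish pangram used purely as sample data (an assumption, not from the tables).
original = "zażółć gęślą jaźń"

# Bytes as a DOS-era application would have stored them in the DBF file.
raw = original.encode("cp852")

# What a reader that assumes Windows-1250 makes of the very same bytes.
# errors="replace" guards against bytes that have no mapping in cp1250.
misread = raw.decode("cp1250", errors="replace")

# Decoding with the code page the data was written in restores the text.
restored = raw.decode("cp852")

print(misread)
print(restored)
```

Both code pages are single-byte, so the misread text has the same length as the original, just with the wrong letters in place of the diacritics.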

I tried to enforce the encoding by setting byte 0x1D of the DBF header to the proper code page ID. Below is the list of all possible values (I used 0x64), but it still didn’t help much.

0x00        No codepage defined
0x01        Codepage 437 (US MS-DOS)
0x02        Codepage 850 (International MS-DOS)
0x03        Codepage 1252 Windows ANSI
0x04        Codepage 10000 Standard Macintosh
0x64        Codepage 852 Eastern European MS-DOS
0x65        Codepage 866 Russian MS-DOS
0x66        Codepage 865 Nordic MS-DOS
0x67        Codepage 861 Icelandic MS-DOS
0x68        Codepage 895 Kamenicky (Czech) MS-DOS
0x69        Codepage 620 Mazovia (Polish) MS-DOS
0x6A        Codepage 737 Greek MS-DOS (437G)
0x6B        Codepage 857 Turkish MS-DOS
0x78        Codepage 950 Chinese (Hong Kong SAR, Taiwan) Windows
0x79        Codepage 949 Korean Windows
0x7A        Codepage 936 Chinese (PRC, Singapore) Windows
0x7B        Codepage 932 Japanese Windows
0x7C        Codepage 874 Thai Windows
0x7D        Codepage 1255 Hebrew Windows
0x7E        Codepage 1256 Arabic Windows
0x96        Codepage 10007 Russian Macintosh
0x97        Codepage 10029 Macintosh EE
0x98        Codepage 10006 Greek Macintosh
0xC8        Codepage 1250 Eastern European Windows
0xC9        Codepage 1251 Russian Windows
0xCA        Codepage 1254 Turkish Windows
0xCB        Codepage 1253 Greek Windows
all others  Unknown / invalid
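For reference, inspecting and patching that header byte can be sketched as follows. This is only an illustration in Python, not the code I actually ran: the function names and the small ID-to-codec table are hypothetical, and the table covers just a handful of entries from the list above.

```python
CODE_PAGE_OFFSET = 0x1D  # position of the code page ID in the DBF header

# Hypothetical helper table: a few IDs from the list mapped to Python codec names.
CODECS = {
    0x01: "cp437",
    0x02: "cp850",
    0x03: "cp1252",
    0x64: "cp852",
    0x65: "cp866",
    0xC8: "cp1250",
    0xC9: "cp1251",
}


def read_code_page_id(path):
    """Return the code page ID stored at offset 0x1D of the file."""
    with open(path, "rb") as f:
        f.seek(CODE_PAGE_OFFSET)
        data = f.read(1)
    if len(data) != 1:
        raise ValueError("file is too short to contain a DBF header")
    return data[0]


def write_code_page_id(path, value):
    """Overwrite the code page ID in place; work on a copy of the table."""
    if not 0 <= value <= 0xFF:
        raise ValueError("code page ID must fit in one byte")
    with open(path, "r+b") as f:
        f.seek(CODE_PAGE_OFFSET)
        f.write(bytes([value]))
```

Calling write_code_page_id(path, 0x64) marks a table as CP852, and CODECS.get(read_code_page_id(path)) gives the codec name to use when decoding the raw field bytes by hand.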

Ultimately, the very old Visual FoxPro driver did the trick: after switching the provider to “VFPOLEDB.1”, the encoding was respected, which saved me from manually transcoding strings in my C# application.
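The provider switch boils down to a different connection string. The sketch below (Python, with hypothetical helper names and a made-up folder path) only builds the two strings. The exact strings from my application are not shown here, and the “Extended Properties=dBASE IV” part of the Jet variant is an assumption about a typical dBASE setup.

```python
def jet_dbase_connection_string(folder):
    """Connection string for the Jet provider pointed at a folder of DBF files."""
    return (
        "Provider=Microsoft.Jet.OLEDB.4.0;"
        f"Data Source={folder};"
        "Extended Properties=dBASE IV;"  # assumed; adjust to the table format
    )


def vfp_connection_string(folder):
    """Connection string for the Visual FoxPro OLE DB provider."""
    return f"Provider=VFPOLEDB.1;Data Source={folder};"


# Hypothetical folder containing the DBF tables.
print(jet_dbase_connection_string(r"C:\data\dbf"))
print(vfp_connection_string(r"C:\data\dbf"))
```

In C# either string would be passed to the OleDbConnection constructor; the rest of the reading code can stay the same.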

Now you have seen everything!