Lost on the subcontinent

Distributed Agile, .NET, ThoughtLife




Consoles, CodePages and CruiseControl.NET

In CruiseControl.NET, we make extensive use of the command line to communicate with external tools. We have typically relied on these tools writing their data directly to the console's standard output stream, which we then read. This has, however, created a problem when dealing with non-ASCII characters. Trying to read a character such as å from the console will typically produce a munged character (usually displayed as ?). This is because the Windows console uses a non-Unicode encoding (e.g. codepage 850 or 1252) that depends on the version and regional settings of Windows.

In order to get around this problem, I tried converting the output from the console's standard output stream encoding into Unicode, but to no avail; I presume that the Process class's StandardOutput stream had already attempted this conversion for me. Fundamentally, the problem with this approach is that it depends on the external application converting its data to the console's codepage correctly. If the output contains characters that the console's codepage does not support, that data may be lost when the bytes are converted between codepages. This appears to happen with Visual SourceSafe, for example, and I've noticed similar behaviour in other command-line tools as well.
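To make the lossy step concrete, here is a minimal C# sketch (not code from CruiseControl.NET) that pushes a string through a single-byte codepage and back. Latin-1 (codepage 28591) stands in for the console codepage only because it is built into every .NET runtime; the names `CodePageLoss` and `RoundTrip` are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: demonstrates that converting text into a single-byte
// codepage silently replaces characters the codepage cannot represent.
public static class CodePageLoss
{
    public static string RoundTrip(string text, Encoding codePage)
    {
        // Characters with no mapping in the codepage are replaced by the
        // encoder fallback (typically '?') at this point.
        byte[] bytes = codePage.GetBytes(text);

        // Decoding cannot recover what the encoder already discarded.
        return codePage.GetString(bytes);
    }
}
```

With an explicit replacement fallback, e.g. `Encoding.GetEncoding(28591, EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback)`, `RoundTrip("å中", latin1)` yields `"å?"`: å exists in Latin-1 and survives, while the CJK character has no mapping and is gone before the text ever reaches the reading process.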

After spending a fair bit of time experimenting with different approaches, the only solution that I can find is to avoid reading data from the console altogether. If the tool supports writing its output directly to a file, the likelihood of correctly preserving non-ASCII data is much higher, so you should use that option instead. Incidentally, when reading the output back, you also need to make sure that you open the file using the correct encoding (e.g. new StreamReader(file, Console.Out.Encoding);). I had been trying to avoid this approach, as it is a bit of a hassle to read from the file and then clean it up afterwards compared with simply reading from stdout, but it is the only approach I have found that works. Maybe someone with more internationalisation experience can shed some light on this problem...
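The file-based approach might look roughly like the following sketch. It assumes the tool accepts an output-file argument; the `{0}` placeholder convention in `argumentsFormat` and the names `FileOutputReader`, `ReadAndDelete` and `RunTool` are hypothetical, not part of CruiseControl.NET. The encoding is passed in explicitly because the right choice depends on how the tool writes its file; `Console.Out.Encoding`, as suggested above, is one candidate.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

// Hypothetical helper: run a tool that writes to a temporary file instead of
// stdout, then read the file back with an explicit encoding and delete it.
public static class FileOutputReader
{
    // Reads the whole file with the given encoding, then deletes it.
    public static string ReadAndDelete(string path, Encoding encoding)
    {
        try
        {
            using (var reader = new StreamReader(path, encoding))
            {
                return reader.ReadToEnd();
            }
        }
        finally
        {
            // Clean up the temporary output file even if reading fails.
            File.Delete(path);
        }
    }

    // argumentsFormat is assumed to contain {0} where the tool expects
    // its output-file path (a hypothetical convention for this sketch).
    public static string RunTool(string executable, string argumentsFormat, Encoding outputEncoding)
    {
        string outputFile = Path.GetTempFileName();
        try
        {
            var startInfo = new ProcessStartInfo(executable, string.Format(argumentsFormat, "\"" + outputFile + "\""))
            {
                UseShellExecute = false,
                CreateNoWindow = true
            };
            using (Process process = Process.Start(startInfo))
            {
                process.WaitForExit();
            }
        }
        catch
        {
            // Don't leave the temp file behind if the tool could not be run.
            File.Delete(outputFile);
            throw;
        }
        return ReadAndDelete(outputFile, outputEncoding);
    }
}
```

Splitting the reading and cleanup into `ReadAndDelete` keeps the encoding decision in one place and guarantees the temporary file is removed, which is most of the "hassle" of this approach compared with reading stdout.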

posted on Saturday, September 18, 2004 5:02 PM by exortech




