Lost on the subcontinent

Distributed Agile, .NET, ThoughtLife




Consoles, CodePages and CruiseControl.NET -- Part II

One thing that I've learned from all of this experimentation with consoles and codepages is that converting between encodings can be lossy.  It's obvious when you think about it: converting from Unicode (a double-byte or variable-byte encoding) to ANSI (a single-byte encoding) means squeezing every character into one of at most 256 byte values.  Any character that the target code page doesn't contain has nowhere to go, and it is replaced by a substitute such as '?'.
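To make the loss concrete, here is a minimal sketch that round-trips a string through a single-byte code page. It is written in Python only for brevity and is not the code from these experiments. It assumes cp1252, the Western European ANSI code page, as the target and uses `errors="replace"` so that unmappable characters become '?' instead of raising an error.

```python
def round_trip(text: str, codepage: str = "cp1252") -> str:
    """Encode text to a single-byte code page, then decode it back.

    Characters the code page cannot represent are replaced with '?'
    during encoding (errors="replace"), so they cannot be recovered.
    """
    encoded = text.encode(codepage, errors="replace")
    return encoded.decode(codepage)


original = "naïve café, 日本語"
restored = round_trip(original)
print(restored)               # naïve café, ???
print(restored == original)   # False
```

The accented Latin letters survive because cp1252 happens to contain them, but the three Japanese characters are each collapsed to '?', and no decoder can bring them back.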

Most encodings share the first 128 characters with the ASCII character set.  So, you don't notice the impact of conversion until you start to use characters outside of that range.  The best way to minimise the likelihood of losing non-ASCII characters is to just avoid converting between encodings altogether. 
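A quick way to see why the problem hides until non-ASCII text turns up is to encode the same string with several ASCII-compatible encodings and compare the bytes. The sketch below does that in Python. The list of encodings is an arbitrary choice; UTF-16 is deliberately left out because it is not byte-compatible with ASCII.

```python
# A few encodings that share the ASCII range (chosen for illustration).
ENCODINGS = ["ascii", "cp1252", "cp437", "utf-8"]


def same_bytes_everywhere(text: str) -> bool:
    """Return True if every encoding in ENCODINGS yields identical bytes."""
    encoded = {text.encode(enc, errors="replace") for enc in ENCODINGS}
    return len(encoded) == 1


print(same_bytes_everywhere("Build succeeded"))  # True
print(same_bytes_everywhere("Build réussi"))     # False
```

Pure ASCII text produces the same bytes whichever encoding is used, so a mismatched conversion does no visible damage. A single 'é' is enough to make every encoding disagree.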

As the previous post showed, although it may not be readily obvious, writing to the console converts your data to the default Windows code page, which generally uses a single-byte encoding (unless you are using the Chinese or Japanese versions of Windows).  Similarly, reading from a process's console streams (using the Process class) automatically attempts to convert the output from the default Windows encoding to Unicode.  As far as I can determine, you have no way to change the encoding that the Process's streams use.  This is why using files to transmit data is far preferable: you control which encoding is used to write to and read from the file and, as a result, you can avoid any unnecessary, and potentially lossy, encoding conversions.
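The file-based approach can be sketched as follows, again in Python rather than .NET and not as the actual CruiseControl.NET integration. The child program (`CHILD_SCRIPT`) and the helper `run_child_via_file` are hypothetical. The child writes its result to a file named on its command line, and both sides name UTF-8 explicitly, so the console and its default code page are never involved.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: writes non-ASCII text to the file given as
# its first argument, using an explicit UTF-8 encoding. The text is written
# with \u escapes so the script source itself stays pure ASCII.
CHILD_SCRIPT = (
    "import sys\n"
    "text = 'na\\u00efve caf\\u00e9, \\u65e5\\u672c\\u8a9e'\n"
    "with open(sys.argv[1], 'w', encoding='utf-8') as f:\n"
    "    f.write(text)\n"
)

EXPECTED = "na\u00efve caf\u00e9, \u65e5\u672c\u8a9e"


def run_child_via_file() -> str:
    """Run the child process and read its output back from a UTF-8 file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # the child opens the file itself
    try:
        subprocess.run([sys.executable, "-c", CHILD_SCRIPT, path], check=True)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


print(run_child_via_file() == EXPECTED)  # True
```

The key design choice is that the writer and the reader agree on one named encoding, so the data is encoded once and decoded once, with no intermediate conversion through a single-byte code page.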

posted on Sunday, September 19, 2004 1:53 PM by exortech





Powered by Dot Net Junkies, by Telligent Systems