Wednesday, July 20, 2005

XML Writer and encoding schemes

This all started when my friend Richard asked me to look into some code snippets. All he was trying to do was build an XML document and flush it to the console, and although he had specified UTF-8 as the encoding scheme (he used an XmlWriterSettings object for this purpose), the XML declaration that came out on the console reported IBM437 instead.

So... it is confusing, isn't it? I'd recommend reading his blog post as well: <http://blogs.geekdojo.net/richardhsu/archive/2005/07/16/8675.aspx>
One thing to mention: Richard wrote his code for .NET Framework 2.0, so if you try his snippet on Framework 1.1 you will get a couple of compilation errors.

I did some research on the code and I think I have found the reason for the IBM437 issue. Please look at the following code snippet (I have used Framework 1.1):

// Needs: using System; using System.Xml; using System.Windows.Forms;
private void btnWriter_Click(object sender, System.EventArgs e)
{
    try
    {
        // XmlTextWriter copies Console.Out's encoding (IBM437 here) into the XML declaration.
        XmlTextWriter writer = new XmlTextWriter( Console.Out );

        // Writing to a file through a StreamWriter gives you utf-8 encoding instead.
        // (Only create the StreamWriter when you use it, otherwise the file handle leaks.)
        //string fileName = @"C:\test.txt";
        //XmlTextWriter writer = new XmlTextWriter( new System.IO.StreamWriter( fileName ) );

        //Console.Out.Encoding = System.Text.Encoding.UTF8; // can't do this: Encoding is read-only!
        writer.Formatting = Formatting.Indented;
        writer.WriteStartDocument( true );
        WriteQuote(writer, "MSFT", 74.125, 5.89, 69020000);
        writer.Flush();
        writer.Close();
        MessageBox.Show( "Done" );
    }
    catch(Exception ex)
    {
        MessageBox.Show( ex.Message );
    }
}

// Writes one <Stock> element with a Symbol attribute and three child elements.
private void WriteQuote(XmlWriter writer, string symbol,
    double price, double change, long volume)
{
    writer.WriteStartElement("Stock");
    writer.WriteAttributeString("Symbol", symbol);
    writer.WriteElementString("Price", XmlConvert.ToString(price));
    writer.WriteElementString("Change", XmlConvert.ToString(change));
    writer.WriteElementString("Volume", XmlConvert.ToString(volume));
    writer.WriteEndElement();
}

My understanding:
Console.Out gives you an object of type TextWriter (which points to the console, unless you have called Console.SetOut to redirect the output to some other TextWriter), and if you look at the encoding scheme of this TextWriter object, it is IBM437. You can get the encoding scheme of Console.Out with this simple code:

System.Text.Encoding enc = Console.Out.Encoding;
MessageBox.Show( enc.HeaderName);

and unfortunately it is a read-only property :-(
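Why that property matters: XmlTextWriter takes the encoding it writes into the XML declaration from the Encoding property of the TextWriter it wraps. A minimal sketch to see this without a console, assuming a StringWriter as the target; `Utf8StringWriter` and `DeclarationDemo` are hypothetical names, not framework types. StringWriter reports UTF-16, and a subclass can override the read-only property to report UTF-8 instead.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
// TextWriter.Encoding has no setter, but subclasses may override it.
class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

class DeclarationDemo
{
    // Writes a tiny document and returns the text that reached the TextWriter.
    internal static string WriteDoc(TextWriter target)
    {
        XmlTextWriter writer = new XmlTextWriter(target);
        writer.WriteStartDocument(true);            // declaration uses target.Encoding
        writer.WriteElementString("Stock", "MSFT");
        writer.WriteEndDocument();
        writer.Flush();
        return target.ToString();
    }

    static void Main()
    {
        // StringWriter.Encoding is UTF-16, so the declaration says utf-16.
        Console.WriteLine(WriteDoc(new StringWriter()));
        // The subclass reports UTF-8, so the declaration says utf-8.
        Console.WriteLine(WriteDoc(new Utf8StringWriter()));
    }
}
```

Console.Out behaves the same way, only its Encoding property happens to report IBM437.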

Now, the most important point is that Richard used another object, XmlWriterSettings, which is a new addition in .NET Framework 2.0 (I have only played with the 2.0 framework a little). I am not sure whether setting the encoding scheme through this object is supposed to override the TextWriter's default encoding; if it is, then it's just another bug in Microsoft's code :-).
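On the XmlWriterSettings question: as far as I can tell from the .NET 2.0 documentation for XmlWriterSettings.Encoding, the setting is only applied when the writer is created over a Stream or a file name; when XmlWriter.Create is given a TextWriter, the TextWriter's own encoding is used instead. A minimal sketch contrasting the two, assuming .NET 2.0 or later (`SettingsDemo` and its methods are hypothetical names):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class SettingsDemo
{
    // Writes the same tiny document through any XmlWriter.
    static void WriteStock(XmlWriter writer)
    {
        writer.WriteStartDocument(true);
        writer.WriteElementString("Stock", "MSFT");
        writer.WriteEndDocument();
    }

    // Created over a Stream: settings.Encoding controls the bytes and the declaration.
    internal static string WriteToStream()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = Encoding.UTF8;
        MemoryStream ms = new MemoryStream();
        using (XmlWriter writer = XmlWriter.Create(ms, settings))
        {
            WriteStock(writer);
        }
        // StreamReader skips the UTF-8 byte order mark that Encoding.UTF8 emits.
        return new StreamReader(new MemoryStream(ms.ToArray()), Encoding.UTF8).ReadToEnd();
    }

    // Created over a TextWriter: the TextWriter's encoding wins over settings.Encoding.
    internal static string WriteToTextWriter()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = Encoding.UTF8;   // ignored for TextWriter targets
        StringWriter sw = new StringWriter(); // StringWriter reports UTF-16
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            WriteStock(writer);
        }
        return sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(WriteToStream());     // declaration says utf-8
        Console.WriteLine(WriteToTextWriter()); // declaration says utf-16
    }
}
```

If that reading is right, one option for the console (an assumption, not something tested in this post) is to pass the Stream returned by Console.OpenStandardOutput() to XmlWriter.Create instead of Console.Out; non-ASCII characters may still display oddly if the console code page is not UTF-8.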
Alternatively:
1) we can use a StreamWriter object, which also inherits from TextWriter; its default encoding scheme is UTF-8, which solves the problem.
2) we can use the WriteProcessingInstruction method to write the XML declaration ourselves, which overrides the default behavior; Richard does this as well.
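Both workarounds can be sketched side by side. A minimal sketch, assuming an in-memory target instead of the console so the result can be inspected; `WorkaroundDemo` and its method names are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class WorkaroundDemo
{
    // 1) XmlTextWriter over a StreamWriter: StreamWriter defaults to UTF-8
    //    (without a byte order mark), so WriteStartDocument declares utf-8.
    internal static string ViaStreamWriter()
    {
        MemoryStream ms = new MemoryStream();
        StreamWriter sw = new StreamWriter(ms);
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.WriteStartDocument(true);
        writer.WriteElementString("Stock", "MSFT");
        writer.WriteEndDocument();
        writer.Flush(); // also flushes the StreamWriter into the MemoryStream
        return Encoding.UTF8.GetString(ms.ToArray());
    }

    // 2) Skip WriteStartDocument and write the declaration yourself.
    //    The target still reports UTF-16; only the label changes.
    internal static string ViaProcessingInstruction()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.WriteProcessingInstruction("xml", "version=\"1.0\" encoding=\"utf-8\"");
        writer.WriteElementString("Stock", "MSFT");
        writer.Flush();
        return sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ViaStreamWriter());
        Console.WriteLine(ViaProcessingInstruction());
    }
}
```

Note the difference: the StreamWriter approach changes the bytes and the declaration together, while WriteProcessingInstruction only changes the declaration. The characters are still encoded by the underlying TextWriter (IBM437 for Console.Out), so for non-ASCII content the declared and actual encodings can disagree.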




