Strip the BOM from UTF-8 encoded text

The BOM (byte order mark) characters indicate that the text that follows is encoded as UTF-8, which makes it easy for text editors to detect the encoding. For automated processing, however, the BOM is sometimes unnecessary, because the protocol already defines the encoding and all participants know it.

Strip the BOM when reading


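// Remove a leading BOM (U+FEFF) or zero-width space (U+200B) left over after decoding.
// TrimStart only touches the beginning of the text, where a BOM can appear.
string text = File.ReadAllText(@"...").TrimStart('\uFEFF', '\u200B');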
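
When the text does not come from a file but arrives as raw bytes (for example, a message body received over a protocol), the BOM can be skipped before decoding. The following is a minimal sketch under that assumption; the class name BomStripper and its two methods are hypothetical helpers made up for illustration, not part of any library.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomStripper
{
    // Decodes a UTF-8 payload, skipping the 3-byte BOM (EF BB BF) if it is present.
    public static string DecodeUtf8WithoutBom(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        int offset = 0;
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        {
            offset = 3; // skip the UTF-8 encoded U+FEFF
        }
        return Encoding.UTF8.GetString(data, offset, data.Length - offset);
    }

    // Removes a single leading U+FEFF from text that has already been decoded.
    public static string StripLeadingBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }
}
```

Calling BomStripper.DecodeUtf8WithoutBom(bytes) on a payload with or without a BOM should give the same string, so the caller does not need to know whether the sender emitted one.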

Strip the BOM when writing


Encoding encoding = new UTF8Encoding(false);
File.WriteAllText(@"...", text, encoding);
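
The same encoding object also works when writing to a stream instead of a file. The sketch below assumes an in-memory target so the resulting bytes can be inspected; the class BomFreeWriter and the method EncodeWithoutBom are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomFreeWriter
{
    // Writes text through a StreamWriter and returns the bytes that were produced.
    public static byte[] EncodeWithoutBom(string text)
    {
        // false = do not emit the UTF-8 identifier (the BOM) as a preamble
        var encoding = new UTF8Encoding(false);

        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            } // disposing the writer flushes it into the stream

            return stream.ToArray();
        }
    }
}
```

For the input "hi" this should yield just the two bytes 0x68 0x69. Passing Encoding.UTF8 to the StreamWriter instead would put EF BB BF in front of them, because that instance reports a three-byte preamble.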
