Class UnicodeBom


  • public final class UnicodeBom
    extends Object
    Utilities that allow code to use the Unicode Byte Order Mark.

    A Unicode file may contain a Byte Order Mark (BOM) that specifies which encoding is used. Sadly, neither the JDK nor Guava handles this properly.

    This class supports the BOM for UTF-8, UTF-16LE and UTF-16BE. The UTF-32 formats are not supported: they are rarely seen and cannot easily be distinguished, because the UTF-32 BOMs are similar to the UTF-16 BOMs.
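
    To illustrate the BOM signatures discussed above, here is a minimal sketch that maps the leading bytes of an input to a charset. The class name BomSignatureSketch and its detect method are hypothetical and are not part of UnicodeBom; the actual implementation may differ. The UTF-32LE BOM is FF FE 00 00, so its first two bytes match the UTF-16LE BOM, which shows why the two are hard to tell apart.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of UnicodeBom.
final class BomSignatureSketch {

  private BomSignatureSketch() {
  }

  // Returns the charset indicated by a leading BOM, or empty if none of the
  // three signatures (UTF-8, UTF-16BE, UTF-16LE) is present.
  static Optional<Charset> detect(byte[] input) {
    if (startsWith(input, 0xEF, 0xBB, 0xBF)) {
      return Optional.of(StandardCharsets.UTF_8);
    }
    if (startsWith(input, 0xFE, 0xFF)) {
      return Optional.of(StandardCharsets.UTF_16BE);
    }
    if (startsWith(input, 0xFF, 0xFE)) {
      // Ambiguity: the UTF-32LE BOM (FF FE 00 00) also matches this branch.
      return Optional.of(StandardCharsets.UTF_16LE);
    }
    return Optional.empty();
  }

  // Compares the first bytes of the input, as unsigned values, to the signature.
  private static boolean startsWith(byte[] input, int... signature) {
    if (input.length < signature.length) {
      return false;
    }
    for (int i = 0; i < signature.length; i++) {
      if ((input[i] & 0xFF) != signature[i]) {
        return false;
      }
    }
    return true;
  }
}
```

    In this sketch, an input that begins with the UTF-32LE BOM is reported as UTF-16LE, which is the ambiguity described above.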

    • Method Detail

      • toString

        public static String toString(byte[] input)
        Converts a byte[] to a String.

        Any Unicode Byte Order Mark in the input is used to select the encoding. If no BOM is found, the encoding defaults to UTF-8.

        Parameters:
        input - the input byte array
        Returns:
        the equivalent string
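
        The conversion described above can be sketched with the JDK alone. This is an illustration under stated assumptions, not the implementation of toString: the class BomDecodeSketch and its decode method are hypothetical, and the sketch assumes the BOM bytes are dropped from the decoded text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of UnicodeBom.
final class BomDecodeSketch {

  private BomDecodeSketch() {
  }

  // Decodes the bytes using the charset named by a leading BOM,
  // falling back to UTF-8 when no BOM is present.
  static String decode(byte[] input) {
    Charset charset = StandardCharsets.UTF_8;
    int offset = 0;
    if (input.length >= 3
        && (input[0] & 0xFF) == 0xEF
        && (input[1] & 0xFF) == 0xBB
        && (input[2] & 0xFF) == 0xBF) {
      offset = 3;  // UTF-8 BOM
    } else if (input.length >= 2
        && (input[0] & 0xFF) == 0xFE
        && (input[1] & 0xFF) == 0xFF) {
      charset = StandardCharsets.UTF_16BE;
      offset = 2;
    } else if (input.length >= 2
        && (input[0] & 0xFF) == 0xFF
        && (input[1] & 0xFF) == 0xFE) {
      charset = StandardCharsets.UTF_16LE;
      offset = 2;
    }
    // Assumption of this sketch: the BOM itself is skipped, not decoded.
    return new String(input, offset, input.length - offset, charset);
  }
}
```

        With the real class, the equivalent call would be UnicodeBom.toString(bytes), using the signature shown above.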
      • toCharSource

        public static CharSource toCharSource(ByteSource byteSource)
        Converts a ByteSource to a CharSource.

        Any Unicode Byte Order Mark in the source is used to select the encoding. If no BOM is found, the encoding defaults to UTF-8.

        Parameters:
        byteSource - the byte source
        Returns:
        the char source, which uses the BOM to determine the encoding
      • toReader

        public static Reader toReader(InputStream inputStream)
                               throws IOException
        Converts an InputStream to a Reader.

        Any Unicode Byte Order Mark in the stream is used to select the encoding. If no BOM is found, the encoding defaults to UTF-8.

        Parameters:
        inputStream - the input stream to wrap
        Returns:
        the reader, which uses the BOM to determine the encoding
        Throws:
        IOException - if an IO error occurs
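
        For a stream, the BOM has to be inspected without losing the bytes that follow it. One possible approach, sketched below with a JDK PushbackInputStream, reads up to three bytes, picks the charset, and pushes back anything that was not part of a BOM. The class BomReaderSketch and its open method are hypothetical; the real toReader may work differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of UnicodeBom.
final class BomReaderSketch {

  private BomReaderSketch() {
  }

  // Wraps the stream in a Reader whose charset is chosen from a leading BOM,
  // falling back to UTF-8 when no BOM is present.
  static Reader open(InputStream inputStream) throws IOException {
    PushbackInputStream pushback = new PushbackInputStream(inputStream, 3);
    byte[] head = new byte[3];
    int count = readUpTo(pushback, head);

    Charset charset = StandardCharsets.UTF_8;
    int bomLength = 0;
    if (count >= 3 && (head[0] & 0xFF) == 0xEF
        && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
      bomLength = 3;  // UTF-8 BOM
    } else if (count >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
      charset = StandardCharsets.UTF_16BE;
      bomLength = 2;
    } else if (count >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
      charset = StandardCharsets.UTF_16LE;
      bomLength = 2;
    }
    // Return the bytes that were read but are not part of the BOM.
    if (count > bomLength) {
      pushback.unread(head, bomLength, count - bomLength);
    }
    return new InputStreamReader(pushback, charset);
  }

  // Reads until the buffer is full or the stream ends; returns the byte count.
  private static int readUpTo(InputStream in, byte[] buffer) throws IOException {
    int total = 0;
    while (total < buffer.length) {
      int read = in.read(buffer, total, buffer.length - total);
      if (read < 0) {
        break;
      }
      total += read;
    }
    return total;
  }
}
```

        The caller remains responsible for closing the returned reader, which also closes the wrapped stream.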