Class UnicodeBom


  • public final class UnicodeBom
    extends java.lang.Object
    Utilities that allow code to use the Unicode Byte Order Mark.

    A Unicode file may contain a Byte Order Mark (BOM) that specifies which encoding is used. Sadly, neither the JDK nor Guava handles this properly.

    This class supports the BOM for UTF-8, UTF-16LE and UTF-16BE. The UTF-32 formats are rarely seen and cannot be easily determined, because the UTF-32 BOMs resemble the UTF-16 BOMs: the UTF-32LE BOM (FF FE 00 00) begins with the same two bytes as the UTF-16LE BOM (FF FE).
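
    The detection described above can be sketched with the standard library alone. This is a minimal illustration, not the code of this class: BomSniffer and its detect method are hypothetical names, and the byte signatures used are the standard Unicode BOM values (EF BB BF for UTF-8, FE FF for UTF-16BE, FF FE for UTF-16LE).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for illustration; not part of UnicodeBom.
final class BomSniffer {

  // Returns the charset signalled by a leading BOM, or empty if no supported BOM is present.
  static Optional<Charset> detect(byte[] bytes) {
    if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) {
      return Optional.of(StandardCharsets.UTF_8);
    }
    if (startsWith(bytes, 0xFE, 0xFF)) {
      return Optional.of(StandardCharsets.UTF_16BE);
    }
    if (startsWith(bytes, 0xFF, 0xFE)) {
      // The UTF-32LE BOM (FF FE 00 00) also lands here, so it is reported as UTF-16LE.
      return Optional.of(StandardCharsets.UTF_16LE);
    }
    return Optional.empty();
  }

  // Compares the leading bytes, read as unsigned values, against the given prefix.
  private static boolean startsWith(byte[] bytes, int... prefix) {
    if (bytes.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((bytes[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x61};
    byte[] utf32le = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00};
    System.out.println(detect(utf8));
    System.out.println(detect(utf32le));
  }
}
```

    The second call in main shows the ambiguity: a two-byte check cannot tell a UTF-32LE BOM apart from a UTF-16LE BOM followed by a NUL character.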

    • Method Summary

      Modifier and Type Method Description
      static com.google.common.io.CharSource toCharSource(com.google.common.io.ByteSource byteSource)
      Converts a ByteSource to a CharSource.
      static java.io.Reader toReader(java.io.InputStream inputStream)
      Converts an InputStream to a Reader.
      static java.lang.String toString(byte[] input)
      Converts a byte[] to a String.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • toString

        public static java.lang.String toString(byte[] input)
        Converts a byte[] to a String.

        Any Unicode byte order mark at the start of the input is used to determine the encoding. If no BOM is found, the input is decoded as UTF-8.

        Parameters:
        input - the input byte array
        Returns:
        the equivalent string
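
        The behaviour described for this method can be approximated as follows. This is a sketch under stated assumptions rather than the actual implementation: BomDecodeSketch and decode are hypothetical names, and dropping the BOM from the returned string is an assumption that the text above does not confirm.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the code of UnicodeBom.toString.
final class BomDecodeSketch {

  // Picks the encoding from a leading BOM and falls back to UTF-8 when none is found.
  // Assumption: the BOM bytes themselves are not included in the result.
  static String decode(byte[] input) {
    if (input.length >= 3
        && (input[0] & 0xFF) == 0xEF
        && (input[1] & 0xFF) == 0xBB
        && (input[2] & 0xFF) == 0xBF) {
      return new String(input, 3, input.length - 3, StandardCharsets.UTF_8);
    }
    if (input.length >= 2) {
      int b0 = input[0] & 0xFF;
      int b1 = input[1] & 0xFF;
      if (b0 == 0xFE && b1 == 0xFF) {
        return new String(input, 2, input.length - 2, StandardCharsets.UTF_16BE);
      }
      if (b0 == 0xFF && b1 == 0xFE) {
        return new String(input, 2, input.length - 2, StandardCharsets.UTF_16LE);
      }
    }
    return new String(input, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x68, 0x69};
    // The plain JDK decoder keeps the BOM as a leading U+FEFF character.
    System.out.println(new String(withBom, StandardCharsets.UTF_8).length()); // 3
    System.out.println(decode(withBom).length()); // 2
  }
}
```

        The main method contrasts the sketch with the plain JDK String constructor, which decodes a UTF-8 BOM into a leading U+FEFF character instead of discarding it.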
      • toCharSource

        public static com.google.common.io.CharSource toCharSource(com.google.common.io.ByteSource byteSource)
        Converts a ByteSource to a CharSource.

        Any Unicode byte order mark at the start of the source is used to determine the encoding. If no BOM is found, the source is decoded as UTF-8.

        Parameters:
        byteSource - the byte source
        Returns:
        the char source, which uses the BOM to determine the encoding
      • toReader

        public static java.io.Reader toReader(java.io.InputStream inputStream)
                                       throws java.io.IOException
        Converts an InputStream to a Reader.

        Any Unicode byte order mark at the start of the stream is used to determine the encoding. If no BOM is found, the stream is decoded as UTF-8.

        Parameters:
        inputStream - the input stream to wrap
        Returns:
        the reader, which uses the BOM to determine the encoding
        Throws:
        java.io.IOException - if an IO error occurs
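
        One way to get this behaviour on a stream is to peek at the first bytes and push back whatever is not part of a BOM. The sketch below uses java.io.PushbackInputStream for that; it is an illustration only, not the code of this method. BomReaderSketch, open and firstLine are hypothetical names, and consuming the BOM so that it never reaches the reader is an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the code of UnicodeBom.toReader.
final class BomReaderSketch {

  // Wraps the stream in a reader whose charset is chosen from a leading BOM (UTF-8 if none).
  static Reader open(InputStream in) throws IOException {
    PushbackInputStream pushback = new PushbackInputStream(in, 3);
    byte[] head = new byte[3];
    int n = 0;
    // Read up to three bytes; a single read call may return fewer than requested.
    while (n < head.length) {
      int r = pushback.read(head, n, head.length - n);
      if (r < 0) {
        break;
      }
      n += r;
    }
    Charset charset = StandardCharsets.UTF_8;
    int bomLength = 0;
    if (n >= 3 && u(head[0]) == 0xEF && u(head[1]) == 0xBB && u(head[2]) == 0xBF) {
      bomLength = 3;
    } else if (n >= 2 && u(head[0]) == 0xFE && u(head[1]) == 0xFF) {
      charset = StandardCharsets.UTF_16BE;
      bomLength = 2;
    } else if (n >= 2 && u(head[0]) == 0xFF && u(head[1]) == 0xFE) {
      charset = StandardCharsets.UTF_16LE;
      bomLength = 2;
    }
    // Return the bytes that followed the BOM (or all of them if there was no BOM).
    pushback.unread(head, bomLength, n - bomLength);
    return new InputStreamReader(pushback, charset);
  }

  private static int u(byte b) {
    return b & 0xFF;
  }

  // Demo convenience: decodes the bytes through open() and returns the first line.
  static String firstLine(byte[] bytes) {
    try (BufferedReader reader = new BufferedReader(open(new ByteArrayInputStream(bytes)))) {
      return reader.readLine();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x68, 0x00, 0x69, 0x00};
    System.out.println(firstLine(utf16le));
  }
}
```

        A pushback buffer of three bytes is enough here because the longest supported BOM, the UTF-8 one, is three bytes long.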