Class CsvFile


  • public final class CsvFile
    extends Object
    A CSV file.

    Represents a CSV file together with the ability to parse it from a CharSource. The separator may be specified, allowing TSV files (tab-separated) and other similar formats to be parsed.

    This class loads the entire CSV file into memory. To process the CSV file row-by-row, use CsvIterator.

    The CSV file format is a general-purpose comma-separated value format. The format is parsed line-by-line, with lines separated by CR, LF or CRLF. Each line can contain one or more fields. Each field is separated by a comma character (,) or tab. Any field may be quoted using a double quote at the start and end. A quoted field may additionally be prefixed by an equals sign. The content of a quoted field may include commas and additional double quotes. Two adjacent double quotes in a quoted field will be replaced by a single double quote. Quoted fields are not trimmed. Non-quoted fields are trimmed.
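
The quoting and trimming rules above can be sketched as a small single-line parser. This is an illustrative sketch only, not the parser used by CsvFile. The names CsvLineSketch and parseLine are hypothetical. It assumes that an opening quote (or an equals sign followed by a quote) is the very first character of the field, and it ignores any text between a closing quote and the next separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the field rules; not the CsvFile implementation.
final class CsvLineSketch {

  // Splits a single line into fields using the given separator.
  static List<String> parseLine(String line, char separator) {
    List<String> fields = new ArrayList<>();
    int len = line.length();
    int pos = 0;
    while (pos <= len) {
      // A quoted field may be prefixed by an equals sign.
      int quoteAt = (pos < len && line.charAt(pos) == '=') ? pos + 1 : pos;
      if (quoteAt < len && line.charAt(quoteAt) == '"') {
        StringBuilder buf = new StringBuilder();
        int i = quoteAt + 1;
        while (i < len) {
          char ch = line.charAt(i);
          if (ch == '"') {
            if (i + 1 < len && line.charAt(i + 1) == '"') {
              // Two adjacent double quotes become one double quote.
              buf.append('"');
              i += 2;
              continue;
            }
            i++;  // step over the closing quote
            break;
          }
          buf.append(ch);  // separators inside quotes are kept as content
          i++;
        }
        fields.add(buf.toString());  // quoted fields are not trimmed
        // Assumption: anything between the closing quote and the separator is dropped.
        while (i < len && line.charAt(i) != separator) {
          i++;
        }
        pos = i + 1;
      } else {
        int end = line.indexOf(separator, pos);
        if (end < 0) {
          end = len;
        }
        fields.add(line.substring(pos, end).trim());  // non-quoted fields are trimmed
        pos = end + 1;
      }
    }
    return fields;
  }
}
```

For example, calling parseLine with a tab as the separator handles a TSV line in the same way, since only the separator differs.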

    The first line may be treated as a header row. The header row is accessed separately from the data rows.

    Blank lines are ignored. Lines may be commented out with a hash '#' or a semicolon ';'.
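
The line filtering described above might look like the following. This is a minimal sketch using only the Java standard library (Java 11 or later). The names CsvContentLinesSketch and contentLines are hypothetical, and it assumes that a comment marker must be the first character of the line, which the text above does not state explicitly.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of blank-line and comment filtering; not the CsvFile implementation.
final class CsvContentLinesSketch {

  // Returns the lines that carry content, dropping blank and commented lines.
  static List<String> contentLines(String text) {
    return text.lines()  // String.lines() splits on CR, LF or CRLF
        .filter(line -> !line.trim().isEmpty())
        // Assumption: '#' or ';' only marks a comment at the very start of a line.
        .filter(line -> !line.startsWith("#") && !line.startsWith(";"))
        .collect(Collectors.toList());
  }
}
```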

    • Method Detail

      • of

        public static CsvFile of​(CharSource source,
                                 boolean headerRow)
        Parses the specified source as a CSV file, using a comma as the separator.

        CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.
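
To illustrate what handling a Byte Order Mark involves, the sketch below drops a leading U+FEFF character from text that has already been decoded. It uses only the Java standard library and is not the UnicodeBom class mentioned above. The names BomSkipSketch, skipBom and readAll are hypothetical. It does not choose a charset from the raw bytes, so it is only a partial stand-in.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical sketch: skips a decoded BOM character at the start of a reader.
final class BomSkipSketch {

  // Returns a reader positioned after the BOM, if one is present.
  static Reader skipBom(Reader reader) {
    try {
      PushbackReader pushback = new PushbackReader(reader, 1);
      int first = pushback.read();
      if (first != -1 && first != '\uFEFF') {
        pushback.unread(first);  // not a BOM, so put the character back
      }
      return pushback;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  // Helper for demonstration: reads the whole reader into a string (Java 10+).
  static String readAll(Reader reader) {
    try {
      StringWriter out = new StringWriter();
      reader.transferTo(out);
      return out.toString();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```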

        Parameters:
        source - the CSV file resource
        headerRow - whether the source has a header row; if true, even an otherwise empty source must still contain the header
        Returns:
        the CSV file
        Throws:
        UncheckedIOException - if an IO exception occurs
        IllegalArgumentException - if the file cannot be parsed
      • of

        public static CsvFile of​(CharSource source,
                                 boolean headerRow,
                                 char separator)
        Parses the specified source as a CSV file where the separator is specified and might not be a comma.

        This overload allows the separator to be controlled. For example, a tab-separated file differs from a CSV file only in its separator.

        CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.

        Parameters:
        source - the file resource
        headerRow - whether the source has a header row; if true, even an otherwise empty source must still contain the header
        separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
        Returns:
        the CSV file
        Throws:
        UncheckedIOException - if an IO exception occurs
        IllegalArgumentException - if the file cannot be parsed
      • of

        public static CsvFile of​(Reader reader,
                                 boolean headerRow)
        Parses the specified reader as a CSV file, using a comma as the separator.

        This factory method takes a Reader. Callers are encouraged to use CharSource instead of Reader as it allows the resource to be safely managed.

        This factory method always uses a comma as the separator. To parse a tab-separated or similar file, use the overload that accepts a separator.

        CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.

        Parameters:
        reader - the file resource
        headerRow - whether the source has a header row; if true, even an otherwise empty source must still contain the header
        Returns:
        the CSV file
        Throws:
        UncheckedIOException - if an IO exception occurs
        IllegalArgumentException - if the file cannot be parsed
      • of

        public static CsvFile of​(Reader reader,
                                 boolean headerRow,
                                 char separator)
        Parses the specified reader as a CSV file where the separator is specified and might not be a comma.

        This factory method takes a Reader. Callers are encouraged to use CharSource instead of Reader as it allows the resource to be safely managed.

        This factory method allows the separator to be controlled. For example, a tab-separated file differs from a CSV file only in its separator.

        CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.

        Parameters:
        reader - the file resource
        headerRow - whether the source has a header row; if true, even an otherwise empty source must still contain the header
        separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
        Returns:
        the CSV file
        Throws:
        UncheckedIOException - if an IO exception occurs
        IllegalArgumentException - if the file cannot be parsed
      • findSeparator

        public static char findSeparator​(CharSource source)
        Finds the separator used by the specified CSV file.

        The search includes comma, semicolon, colon, tab and pipe (in that order of priority).

        The algorithm operates in three steps:

        1. It looks for occurrences where a separator is followed by valid quoted text. If this matches, the separator is assumed to be correct.
        2. It looks for lines that consist only of a separator. If this matches, the separator is assumed to be correct.
        3. It looks to see which separator is the most common on the line. If that separator is also the most common on the next line, and the number of columns matches, the separator is assumed to be correct. Otherwise another line is processed. Thus, to match a separator, there must be two lines with the same number of columns.

        At most 100 content lines are read from the file. The default is comma if the file is empty.
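
The third step described above can be sketched roughly as follows. This is a simplified illustration and not the actual findSeparator implementation: it skips the first two steps, ignores quoting, and falls back to a comma whenever no pair of lines agrees (the text above only states the default for an empty file). The names SeparatorGuessSketch, mostCommon and guess are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the "most common separator on two adjacent lines" heuristic.
final class SeparatorGuessSketch {

  // Candidates in the priority order given above: comma, semicolon, colon, tab, pipe.
  private static final char[] CANDIDATES = {',', ';', ':', '\t', '|'};

  private static int count(String line, char ch) {
    int n = 0;
    for (int i = 0; i < line.length(); i++) {
      if (line.charAt(i) == ch) {
        n++;
      }
    }
    return n;
  }

  // Returns the most frequent candidate on the line, or 0 if none occurs.
  // Ties go to the candidate with the higher priority.
  static char mostCommon(String line) {
    char best = 0;
    int bestCount = 0;
    for (char candidate : CANDIDATES) {
      int n = count(line, candidate);
      if (n > bestCount) {
        best = candidate;
        bestCount = n;
      }
    }
    return best;
  }

  // Examines adjacent pairs of content lines, looking at no more than 100 lines.
  static char guess(List<String> contentLines) {
    int limit = Math.min(contentLines.size(), 100);
    for (int i = 0; i + 1 < limit; i++) {
      String line = contentLines.get(i);
      String next = contentLines.get(i + 1);
      char sep = mostCommon(line);
      // The same separator with the same count implies the same number of columns.
      if (sep != 0 && sep == mostCommon(next) && count(line, sep) == count(next, sep)) {
        return sep;
      }
    }
    return ',';  // assumed fallback when nothing matches
  }
}
```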

        Parameters:
        source - the source to read as CSV
        Returns:
        the separator
        Throws:
        UncheckedIOException - if an IO exception occurs
        IllegalArgumentException - if the file cannot be parsed
      • of

        public static CsvFile of​(List<String> headers,
                                 List<? extends List<String>> rows)
        Obtains an instance from a list of headers and rows.

        The headers may be an empty list. All the rows must contain a list of the same size, matching the header if present.
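
The size constraint above could be checked along these lines. This is a sketch under assumptions rather than the validation performed by this factory method. The names RowShapeSketch and checkRows are hypothetical, and when there are no headers it assumes the first row sets the expected size.

```java
import java.util.List;

// Hypothetical sketch of checking that every row has the expected number of fields.
final class RowShapeSketch {

  // Throws IllegalArgumentException if any row differs in size from the expected size.
  static void checkRows(List<String> headers, List<? extends List<String>> rows) {
    if (rows.isEmpty()) {
      return;
    }
    // Assumption: without headers, the first row defines the expected size.
    int expected = headers.isEmpty() ? rows.get(0).size() : headers.size();
    for (int i = 0; i < rows.size(); i++) {
      int actual = rows.get(i).size();
      if (actual != expected) {
        throw new IllegalArgumentException(
            "Row " + i + " has " + actual + " fields, expected " + expected);
      }
    }
  }
}
```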

        Parameters:
        headers - the headers, empty if no headers
        rows - the data rows
        Returns:
        the CSV file
        Throws:
        IllegalArgumentException - if the rows do not match the headers
      • headers

        public ImmutableList<String> headers()
        Gets the header row.

        If there is no header row, an empty list is returned.

        Returns:
        the header row
      • rows

        public ImmutableList<CsvRow> rows()
        Gets all data rows in the file.
        Returns:
        the data rows
      • rowCount

        public int rowCount()
        Gets the number of data rows.
        Returns:
        the number of data rows
      • row

        public CsvRow row​(int index)
        Gets a single row.
        Parameters:
        index - the row index, zero-based
        Returns:
        the row
      • containsHeader

        public boolean containsHeader​(String header)
        Checks if the header is present in the file.

        Matching is case insensitive.

        Parameters:
        header - the column header to match
        Returns:
        true if the header is present
      • containsHeaders

        public boolean containsHeaders​(Collection<String> headers)
        Checks if the headers are present in the file.

        Matching is case insensitive.
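
Case-insensitive header matching, as described for containsHeader and containsHeaders, might be sketched like this. It is an illustration over a plain list of header names, not the lookup used inside CsvFile. The class name HeaderMatchSketch is hypothetical, and the use of String.equalsIgnoreCase is an assumption about how case is ignored.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of case-insensitive header checks over a plain list.
final class HeaderMatchSketch {

  // True if any header equals the given name, ignoring case.
  static boolean containsHeader(List<String> headers, String header) {
    return headers.stream().anyMatch(h -> h.equalsIgnoreCase(header));
  }

  // True only if every wanted header is present, ignoring case.
  static boolean containsHeaders(List<String> headers, Collection<String> wanted) {
    return wanted.stream().allMatch(w -> containsHeader(headers, w));
  }
}
```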

        Parameters:
        headers - the column headers to match
        Returns:
        true if all the headers are present
      • containsHeader

        public boolean containsHeader​(Pattern headerPattern)
        Checks if the header pattern is present in the file.

        Matching is case insensitive.

        Parameters:
        headerPattern - the header pattern to match
        Returns:
        true if the header is present
      • withHeaders

        public CsvFile withHeaders​(List<String> headers)
        Returns an instance with the specified headers.
        Parameters:
        headers - the new headers
        Returns:
        the instance with the specified headers
      • equals

        public boolean equals​(Object obj)
        Checks if this CSV file equals another.

        The comparison checks the content.

        Overrides:
        equals in class Object
        Parameters:
        obj - the other file, null returns false
        Returns:
        true if equal
      • hashCode

        public int hashCode()
        Returns a suitable hash code for the CSV file.
        Overrides:
        hashCode in class Object
        Returns:
        the hash code
      • toString

        public String toString()
        Returns a string describing the CSV file.
        Overrides:
        toString in class Object
        Returns:
        the descriptive string