Class CsvIterator

  • All Implemented Interfaces:
    com.google.common.collect.PeekingIterator<CsvRow>, java.lang.AutoCloseable, java.util.Iterator<CsvRow>

    public final class CsvIterator
    extends java.lang.Object
    implements java.lang.AutoCloseable, com.google.common.collect.PeekingIterator<CsvRow>
    Iterator over the rows of a CSV file.

    Provides the ability to iterate over a CSV file together with the ability to parse it from a CharSource. The separator may be specified, allowing TSV files (tab-separated) and other similar formats to be parsed. See CsvFile for more details of the CSV format.

    This class processes the CSV file row-by-row. To load the entire CSV file into memory, use CsvFile.

    This class must be used in a try-with-resources block to ensure that the underlying CSV file is closed:

      try (CsvIterator csvIterator = CsvIterator.of(source, true)) {
        // use the CsvIterator
      }
     
    One way to use the iterator is with the for-each loop, using a lambda cast to Iterable to adapt the Iterator:
      try (CsvIterator csvIterator = CsvIterator.of(source, true)) {
        for (CsvRow row : (Iterable<CsvRow>) () -> csvIterator) {
          // process the row
        }
      }
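    Java only accepts a lambda where a target type is known (an assignment, a method argument or a cast). The expression of a for-each loop is none of these, so adapting an Iterator to an Iterable inline needs an explicit cast. The following is a minimal, stdlib-only sketch of that idiom. The class ForEachOverIterator is hypothetical, and plain strings stand in for CsvRow:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper illustrating the Iterator-to-Iterable idiom,
// with plain strings standing in for CsvRow.
final class ForEachOverIterator {

  private ForEachOverIterator() {
  }

  static List<String> collect(Iterator<String> iterator) {
    List<String> seen = new ArrayList<>();
    // The cast gives the lambda its target type, Iterable<String>;
    // the lambda implements iterator() by returning the existing iterator.
    for (String value : (Iterable<String>) () -> iterator) {
      seen.add(value);
    }
    return seen;
  }
}
```

    The resulting Iterable is single-use: calling iterator() on it again returns the same, already-consumed iterator.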
     
    This class also allows the headers to be obtained without reading the whole CSV file:
      try (CsvIterator csvIterator = CsvIterator.of(source, true)) {
        ImmutableList<String> headers = csvIterator.headers();
      }
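    Obtaining the headers only requires the first line of the input. A rough, stdlib-only sketch of that idea follows. HeaderPeek is a hypothetical name, and unlike a full CSV parser it assumes comma-separated headers with no quoting:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch: only the first line is parsed, so the rest of the
// input is never processed. Quoted fields are deliberately not handled.
final class HeaderPeek {

  private HeaderPeek() {
  }

  static List<String> readHeaders(Reader reader) {
    try (BufferedReader buffered = new BufferedReader(reader)) {
      String firstLine = buffered.readLine();
      if (firstLine == null) {
        // Assumption mirroring the documented rule that an empty source
        // must still contain the header row.
        throw new IllegalArgumentException("CSV file has no header row");
      }
      // The -1 limit keeps trailing empty header names.
      return List.of(firstLine.split(",", -1));
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```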
     
    • Method Summary

      Modifier and Type Method Description
      java.util.stream.Stream<CsvRow> asStream()
      Returns a stream that wraps this iterator.
      void close()
      Closes the underlying reader.
      boolean containsHeader(java.lang.String header)
      Checks if the header is known.
      boolean containsHeader(java.util.regex.Pattern headerPattern)
      Checks if the header pattern is known.
      boolean hasNext()
      Checks whether there is another row in the CSV file.
      com.google.common.collect.ImmutableList<java.lang.String> headers()
      Gets the header row.
      CsvRow next()
      Returns the next row from the CSV file.
      java.util.List<CsvRow> nextBatch(int count)
      Returns the next batch of rows from the CSV file.
      java.util.List<CsvRow> nextBatch(java.util.function.Predicate<CsvRow> selector)
      Returns the next batch of rows from the CSV file using a predicate to determine the rows.
      static CsvIterator of(com.google.common.io.CharSource source, boolean headerRow)
      Parses the specified source as a CSV file, using a comma as the separator.
      static CsvIterator of(com.google.common.io.CharSource source, boolean headerRow, char separator)
      Parses the specified source as a CSV file where the separator is specified and might not be a comma.
      static CsvIterator of(java.io.Reader reader, boolean headerRow)
      Parses the specified reader as a CSV file, using a comma as the separator.
      static CsvIterator of(java.io.Reader reader, boolean headerRow, char separator)
      Parses the specified reader as a CSV file where the separator is specified and might not be a comma.
      CsvRow peek()
      Peeks the next row from the CSV file without changing the iteration position.
      void remove()
      Throws an exception as remove is not supported.
      java.lang.String toString()
      Returns a string describing the CSV iterator.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.util.Iterator

        forEachRemaining
    • Method Detail

      • of

        public static CsvIterator of(com.google.common.io.CharSource source,
                                     boolean headerRow)
        Parses the specified source as a CSV file, using a comma as the separator.

        This method opens the CSV file for reading. The caller is responsible for closing it by calling close().

        Parameters:
        source - the CSV file resource
        headerRow - whether the source has a header row; if true, an empty source must still contain the header row
        Returns:
        the CSV iterator
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
      • of

        public static CsvIterator of(com.google.common.io.CharSource source,
                                     boolean headerRow,
                                     char separator)
        Parses the specified source as a CSV file where the separator is specified and might not be a comma.

        This overload allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.

        This method opens the CSV file for reading. The caller is responsible for closing it by calling close().

        Parameters:
        source - the CSV file resource
        headerRow - whether the source has a header row; if true, an empty source must still contain the header row
        separator - the character used to separate each field; typically a comma, although a tab is sometimes used
        Returns:
        the CSV iterator
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
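        To illustrate why the separator is the only thing that changes between CSV and TSV, here is a minimal sketch of splitting a single line on a configurable separator. SeparatedLine is hypothetical; it assumes the common convention of double-quoted fields with "" as an escaped quote, and it does not support quoted fields spanning several lines, so it is not a substitute for the real parser:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting one line on a configurable separator.
// Supports double-quoted fields and "" as an escaped quote inside them;
// quoted fields that span multiple lines are not supported.
final class SeparatedLine {

  private SeparatedLine() {
  }

  static List<String> split(String line, char separator) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
            current.append('"'); // "" inside quotes is a literal quote
            i++;
          } else {
            inQuotes = false; // closing quote
          }
        } else {
          current.append(c); // separators inside quotes are kept as data
        }
      } else if (c == '"') {
        inQuotes = true;
      } else if (c == separator) {
        fields.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    fields.add(current.toString()); // last field, possibly empty
    return fields;
  }
}
```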
      • of

        public static CsvIterator of(java.io.Reader reader,
                                     boolean headerRow)
        Parses the specified reader as a CSV file, using a comma as the separator.

        A comma is used as the separator. To parse a tab-separated file or a similar format, use the overload that also takes a separator.

        The caller is responsible for closing the reader, for example by calling close() on the returned iterator.

        Parameters:
        reader - the file reader
        headerRow - whether the reader has a header row; if true, an empty input must still contain the header row
        Returns:
        the CSV iterator
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
      • of

        public static CsvIterator of(java.io.Reader reader,
                                     boolean headerRow,
                                     char separator)
        Parses the specified reader as a CSV file where the separator is specified and might not be a comma.

        This factory method allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.

        The caller is responsible for closing the reader, for example by calling close() on the returned iterator.

        Parameters:
        reader - the file reader
        headerRow - whether the reader has a header row; if true, an empty input must still contain the header row
        separator - the character used to separate each field; typically a comma, although a tab is sometimes used
        Returns:
        the CSV iterator
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
      • headers

        public com.google.common.collect.ImmutableList<java.lang.String> headers()
        Gets the header row.

        If there is no header row, an empty list is returned.

        Returns:
        the header row
      • containsHeader

        public boolean containsHeader(java.lang.String header)
        Checks if the header is known.

        Matching is case insensitive.

        Parameters:
        header - the column header to match
        Returns:
        true if the header is known
      • containsHeader

        public boolean containsHeader(java.util.regex.Pattern headerPattern)
        Checks if the header pattern is known.

        Matching is case insensitive.

        Parameters:
        headerPattern - the header pattern to match
        Returns:
        true if a header matching the pattern is known
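        One way to implement case-insensitive header checks is sketched below. HeaderLookup is hypothetical, and both the use of Locale.ENGLISH for lower-casing and the whole-header matches() semantics for patterns are assumptions rather than documented behaviour of this class:

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of case-insensitive header checks.
// Assumptions: Locale.ENGLISH lower-casing, and a pattern must match
// the whole header (Matcher.matches) rather than a substring.
final class HeaderLookup {

  private final List<String> headers;
  private final Set<String> lowerCaseHeaders;

  HeaderLookup(List<String> headers) {
    this.headers = List.copyOf(headers);
    this.lowerCaseHeaders = headers.stream()
        .map(h -> h.toLowerCase(Locale.ENGLISH))
        .collect(Collectors.toSet());
  }

  boolean containsHeader(String header) {
    return lowerCaseHeaders.contains(header.toLowerCase(Locale.ENGLISH));
  }

  boolean containsHeader(Pattern headerPattern) {
    // Recompile with CASE_INSENSITIVE added to the caller's own flags.
    Pattern insensitive = Pattern.compile(
        headerPattern.pattern(), headerPattern.flags() | Pattern.CASE_INSENSITIVE);
    return headers.stream().anyMatch(h -> insensitive.matcher(h).matches());
  }
}
```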
      • asStream

        public java.util.stream.Stream<CsvRow> asStream()
        Returns a stream that wraps this iterator.

        The stream will process any remaining rows in the CSV file. As such, callers should use either this method or the iterator methods, but not both.

        Returns:
        the stream wrapping this iterator
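        A common way to wrap an iterator in a stream uses Spliterators.spliteratorUnknownSize and StreamSupport.stream. The sketch below (IteratorStreams is a hypothetical name) shows that the stream pulls from the same iterator, which is why mixing stream and iterator access is discouraged:

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical sketch of wrapping an iterator in a sequential stream.
// The stream consumes the same iterator, so it only sees the elements
// that have not already been returned by next().
final class IteratorStreams {

  private IteratorStreams() {
  }

  static <T> Stream<T> asStream(Iterator<T> iterator) {
    Spliterator<T> spliterator =
        Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED | Spliterator.NONNULL);
    return StreamSupport.stream(spliterator, false); // false = sequential
  }
}
```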
      • hasNext

        public boolean hasNext()
        Checks whether there is another row in the CSV file.
        Specified by:
        hasNext in interface java.util.Iterator<CsvRow>
        Returns:
        true if there is another row, false if not
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
      • peek

        public CsvRow peek()
        Peeks the next row from the CSV file without changing the iteration position.
        Specified by:
        peek in interface com.google.common.collect.PeekingIterator<CsvRow>
        Returns:
        the peeked row
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
        java.util.NoSuchElementException - if the end of file has been reached
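        Peeking can be implemented by buffering one element. SimplePeekingIterator below is a hypothetical, stdlib-only sketch of that technique; Guava's Iterators.peekingIterator provides a ready-made version:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of peek() on top of a plain iterator: the peeked
// element is buffered so that the following next() returns it.
// remove() is inherited from Iterator and throws UnsupportedOperationException.
final class SimplePeekingIterator<T> implements Iterator<T> {

  private final Iterator<T> delegate;
  private T peeked;
  private boolean hasPeeked;

  SimplePeekingIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    return hasPeeked || delegate.hasNext();
  }

  T peek() {
    if (!hasPeeked) {
      if (!delegate.hasNext()) {
        throw new NoSuchElementException("End of input reached");
      }
      peeked = delegate.next();
      hasPeeked = true;
    }
    return peeked; // repeated calls return the same element
  }

  @Override
  public T next() {
    if (!hasPeeked) {
      return delegate.next();
    }
    T result = peeked;
    peeked = null;
    hasPeeked = false;
    return result;
  }
}
```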
      • next

        public CsvRow next()
        Returns the next row from the CSV file.
        Specified by:
        next in interface java.util.Iterator<CsvRow>
        Specified by:
        next in interface com.google.common.collect.PeekingIterator<CsvRow>
        Returns:
        the next row
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
        java.util.NoSuchElementException - if the end of file has been reached
      • nextBatch

        public java.util.List<CsvRow> nextBatch(int count)
        Returns the next batch of rows from the CSV file.

        This will return up to the specified number of rows from the file at the current iteration point. An empty list is returned if there are no more rows.

        Parameters:
        count - the maximum number of rows to return; a negative value returns an empty list
        Returns:
        the next batch of rows, up to the number requested
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
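        A minimal sketch of the count-based batching described above, using a plain Iterator. Batches is a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of taking up to count elements from an iterator.
final class Batches {

  private Batches() {
  }

  static <T> List<T> nextBatch(Iterator<T> iterator, int count) {
    List<T> batch = new ArrayList<>();
    // A zero or negative count never enters the loop, so an empty list
    // is returned and nothing is consumed.
    while (batch.size() < count && iterator.hasNext()) {
      batch.add(iterator.next());
    }
    return batch;
  }
}
```

        Calling it repeatedly until an empty list comes back processes the input in fixed-size chunks.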
      • nextBatch

        public java.util.List<CsvRow> nextBatch(java.util.function.Predicate<CsvRow> selector)
        Returns the next batch of rows from the CSV file using a predicate to determine the rows.

        This is useful for CSV files where information is grouped with an identifier or key. For example, a variable notional trade file might have one row for the trade followed by multiple rows for the variable aspects, all grouped by a common trade identifier. In general, callers should peek or read the first row and use information within it to create the selector:

          while (it.hasNext()) {
            CsvRow first = it.peek();
            String id = first.getValue("ID");
            List<CsvRow> batch = it.nextBatch(row -> row.getValue("ID").equals(id));
            // process batch
          }
         
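        This returns the consecutive rows, starting at the current position, for which the selector returns true. The first row for which the selector returns false is not consumed and becomes the first row of the next batch. An empty list is returned if the selector returns false for the first row.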
        Parameters:
        selector - selects whether a row is part of the batch or part of the next batch
        Returns:
        the next batch of rows, as determined by the selector
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
        java.lang.IllegalArgumentException - if the file cannot be parsed
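        Selector-based batching relies on looking at a row without consuming it. The hypothetical SelectorBatcher below sketches this with a one-element buffer, assuming (as the selector parameter description suggests) that the first rejected row is kept for the next batch. Strings of the form "id,kind" stand in for CsvRow:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: collects consecutive elements while the selector
// accepts them. The first rejected element stays in a one-element buffer,
// so it starts the next batch instead of being lost.
final class SelectorBatcher<T> {

  private final Iterator<T> delegate;
  private T buffered;
  private boolean hasBuffered;

  SelectorBatcher(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  boolean hasNext() {
    return hasBuffered || delegate.hasNext();
  }

  T peek() {
    if (!hasBuffered) {
      buffered = delegate.next(); // callers check hasNext() first
      hasBuffered = true;
    }
    return buffered;
  }

  List<T> nextBatch(Predicate<T> selector) {
    List<T> batch = new ArrayList<>();
    while (hasNext() && selector.test(peek())) {
      batch.add(buffered); // accept the peeked element and clear the buffer
      buffered = null;
      hasBuffered = false;
    }
    return batch;
  }
}
```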
      • remove

        public void remove()
        Throws an exception as remove is not supported.
        Specified by:
        remove in interface java.util.Iterator<CsvRow>
        Specified by:
        remove in interface com.google.common.collect.PeekingIterator<CsvRow>
        Throws:
        java.lang.UnsupportedOperationException - always
      • close

        public void close()
        Closes the underlying reader.
        Specified by:
        close in interface java.lang.AutoCloseable
        Throws:
        java.io.UncheckedIOException - if an IO exception occurs
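        close() can avoid declaring a checked exception by rethrowing IOException as UncheckedIOException, which also keeps try-with-resources blocks free of catch clauses. ReaderHolder below is a hypothetical sketch of that pattern, not this class's implementation:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch of an AutoCloseable wrapper whose methods rethrow
// IOException as UncheckedIOException, so callers need no checked catch.
final class ReaderHolder implements AutoCloseable {

  private final Reader reader;

  ReaderHolder(Reader reader) {
    this.reader = reader;
  }

  int readChar() {
    try {
      return reader.read();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  // Narrows AutoCloseable.close() so it throws no checked exception.
  @Override
  public void close() {
    try {
      reader.close();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```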
      • toString

        public java.lang.String toString()
        Returns a string describing the CSV iterator.
        Overrides:
        toString in class java.lang.Object
        Returns:
        the descriptive string