Class CsvFile
- java.lang.Object
  - com.opengamma.strata.collect.io.CsvFile
public final class CsvFile extends Object
A CSV file.
Represents a CSV file together with the ability to parse it from a CharSource. The separator may be specified, allowing TSV files (tab-separated) and other similar formats to be parsed.
This class loads the entire CSV file into memory. To process the CSV file row-by-row, use CsvIterator.
The CSV file format is a general-purpose comma-separated value format. The format is parsed line-by-line, with lines separated by CR, LF or CRLF. Each line can contain one or more fields. Each field is separated by a comma character (,) or tab. Any field may be quoted using a double quote at the start and end. A quoted field may additionally be prefixed by an equals sign. The content of a quoted field may include commas and additional double quotes. Two adjacent double quotes in a quoted field will be replaced by a single double quote. Quoted fields are not trimmed. Non-quoted fields are trimmed.
The first line may be treated as a header row. The header row is accessed separately from the data rows.
Blank lines are ignored. Lines may be commented with a hash '#' or semicolon ';'.
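The field rules above can be illustrated with a small standalone sketch. It is not the CsvFile implementation: the class and method names (CsvLineSketch, parseLine, isIgnored) are hypothetical, and it assumes that a field counts as quoted only when the quote (or equals sign plus quote) is the very first character of the field, and that leading whitespace is ignored when detecting a comment line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented field rules; not the library's parser.
class CsvLineSketch {

  // Blank lines and lines starting with '#' or ';' are skipped.
  // Assumption: leading whitespace is ignored before the check.
  static boolean isIgnored(String line) {
    String trimmed = line.trim();
    return trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith(";");
  }

  // Splits one line into fields using the given separator.
  static List<String> parseLine(String line, char separator) {
    List<String> fields = new ArrayList<>();
    int pos = 0;
    while (true) {
      int q = pos;
      // A quoted field may be prefixed by an equals sign.
      if (q + 1 < line.length() && line.charAt(q) == '=' && line.charAt(q + 1) == '"') {
        q++;
      }
      if (q < line.length() && line.charAt(q) == '"') {
        // Quoted field: keep content as-is, collapse "" into ".
        StringBuilder buf = new StringBuilder();
        int i = q + 1;
        while (i < line.length()) {
          char ch = line.charAt(i);
          if (ch == '"') {
            if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
              buf.append('"');
              i += 2;
              continue;
            }
            i++; // step over the closing quote
            break;
          }
          buf.append(ch);
          i++;
        }
        fields.add(buf.toString()); // quoted fields are not trimmed
        int next = line.indexOf(separator, i);
        if (next < 0) {
          return fields;
        }
        pos = next + 1;
      } else {
        // Non-quoted field: read up to the next separator and trim.
        int next = line.indexOf(separator, pos);
        if (next < 0) {
          fields.add(line.substring(pos).trim());
          return fields;
        }
        fields.add(line.substring(pos, next).trim());
        pos = next + 1;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(parseLine("a, b ,\" c,\"\"d\"\" \",=\"007\"", ','));
  }
}
```

The same method handles tab-separated input by passing '\t' as the separator.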
-
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| boolean | containsHeader(String header) | Checks if the header is present in the file. |
| boolean | containsHeader(Pattern headerPattern) | Checks if the header pattern is present in the file. |
| boolean | containsHeaders(Collection<String> headers) | Checks if the headers are present in the file. |
| boolean | equals(Object obj) | Checks if this CSV file equals another. |
| static char | findSeparator(CharSource source) | Finds the separator used by the specified CSV file. |
| int | hashCode() | Returns a suitable hash code for the CSV file. |
| ImmutableList<String> | headers() | Gets the header row. |
| static CsvFile | of(CharSource source, boolean headerRow) | Parses the specified source as a CSV file, using a comma as the separator. |
| static CsvFile | of(CharSource source, boolean headerRow, char separator) | Parses the specified source as a CSV file where the separator is specified and might not be a comma. |
| static CsvFile | of(Reader reader, boolean headerRow) | Parses the specified reader as a CSV file, using a comma as the separator. |
| static CsvFile | of(Reader reader, boolean headerRow, char separator) | Parses the specified reader as a CSV file where the separator is specified and might not be a comma. |
| static CsvFile | of(List<String> headers, List<? extends List<String>> rows) | Obtains an instance from a list of headers and rows. |
| CsvRow | row(int index) | Gets a single row. |
| int | rowCount() | Gets the number of data rows. |
| ImmutableList<CsvRow> | rows() | Gets all data rows in the file. |
| String | toString() | Returns a string describing the CSV file. |
| CsvFile | withHeaders(List<String> headers) | Returns an instance with the specified headers. |
-
-
-
Method Detail
-
of
public static CsvFile of(CharSource source, boolean headerRow)
Parses the specified source as a CSV file, using a comma as the separator.
CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.
- Parameters:
  - source - the CSV file resource
  - headerRow - whether the source has a header row; an empty source must still contain the header
- Returns:
  - the CSV file
- Throws:
  - UncheckedIOException - if an IO exception occurs
  - IllegalArgumentException - if the file cannot be parsed
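To show what handling a Byte Order Mark can involve, here is a minimal standalone sketch. Once text has been decoded, a BOM appears as the single character U+FEFF at the start of the content. The class and method names (BomSketch, stripBom) are hypothetical; this does not use or reproduce the UnicodeBom class, whose API is not shown in this page.

```java
// Hypothetical sketch: remove a leading U+FEFF from already-decoded text.
class BomSketch {

  // Returns the text without a leading Byte Order Mark, if one is present.
  static String stripBom(String text) {
    return text.startsWith("\uFEFF") ? text.substring(1) : text;
  }

  public static void main(String[] args) {
    String raw = "\uFEFFname,value";
    System.out.println(stripBom(raw));
  }
}
```

Without this step, the BOM character would end up as part of the first header or field.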
-
of
public static CsvFile of(CharSource source, boolean headerRow, char separator)
Parses the specified source as a CSV file where the separator is specified and might not be a comma.
This overload allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.
CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.
- Parameters:
  - source - the file resource
  - headerRow - whether the source has a header row; an empty source must still contain the header
  - separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
- Returns:
  - the CSV file
- Throws:
  - UncheckedIOException - if an IO exception occurs
  - IllegalArgumentException - if the file cannot be parsed
-
of
public static CsvFile of(Reader reader, boolean headerRow)
Parses the specified reader as a CSV file, using a comma as the separator.
This factory method takes a Reader. Callers are encouraged to use CharSource instead of Reader as it allows the resource to be safely managed.
CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.
- Parameters:
  - reader - the file resource
  - headerRow - whether the source has a header row; an empty source must still contain the header
- Returns:
  - the CSV file
- Throws:
  - UncheckedIOException - if an IO exception occurs
  - IllegalArgumentException - if the file cannot be parsed
-
of
public static CsvFile of(Reader reader, boolean headerRow, char separator)
Parses the specified reader as a CSV file where the separator is specified and might not be a comma.
This factory method takes a Reader. Callers are encouraged to use CharSource instead of Reader as it allows the resource to be safely managed.
This factory method allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.
CSV files sometimes contain a Unicode Byte Order Mark. Callers are responsible for handling this, such as by using UnicodeBom.
- Parameters:
  - reader - the file resource
  - headerRow - whether the source has a header row; an empty source must still contain the header
  - separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
- Returns:
  - the CSV file
- Throws:
  - UncheckedIOException - if an IO exception occurs
  - IllegalArgumentException - if the file cannot be parsed
-
findSeparator
public static char findSeparator(CharSource source)
Finds the separator used by the specified CSV file.
The search includes comma, semicolon, colon, tab and pipe (in that order of priority).
The algorithm operates in a number of steps. Firstly, it looks for occurrences where a separator is followed by valid quoted text. If this matches, the separator is assumed to be correct. Secondly, it looks for lines that only consist of a separator. If this matches, the separator is assumed to be correct. Thirdly, it looks to see which separator is the most common on the line. If that separator is also the most common on the next line, and the number of columns matches, the separator is assumed to be correct. Otherwise another line is processed. Thus to match a separator, there must be two lines with the same number of columns. At most, 100 content lines are read from the file. The default is comma if the file is empty.
- Parameters:
  - source - the source to read as CSV
- Returns:
  - the separator
- Throws:
  - UncheckedIOException - if an IO exception occurs
  - IllegalArgumentException - if the file cannot be parsed
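The third step of the search can be sketched in isolation. This is a simplified illustration only, not the library's algorithm: it skips the first two steps (quoted text and separator-only lines), assumes the input is already a list of content lines, and resolves ties by the stated priority order. The names SeparatorSketch, mostCommon and guessSeparator are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "most common separator on two adjacent lines" step.
class SeparatorSketch {

  // Candidates in priority order: comma, semicolon, colon, tab, pipe.
  private static final char[] CANDIDATES = {',', ';', ':', '\t', '|'};

  // Counts occurrences of a character in a line.
  static int count(String line, char c) {
    int n = 0;
    for (int i = 0; i < line.length(); i++) {
      if (line.charAt(i) == c) {
        n++;
      }
    }
    return n;
  }

  // Returns the most common candidate on the line, or 0 if none occurs.
  // The strict comparison keeps the earlier candidate when counts tie.
  static char mostCommon(String line) {
    char best = 0;
    int bestCount = 0;
    for (char c : CANDIDATES) {
      int n = count(line, c);
      if (n > bestCount) {
        best = c;
        bestCount = n;
      }
    }
    return best;
  }

  // Accepts a separator once two adjacent lines agree on it and on its count.
  static char guessSeparator(List<String> contentLines) {
    int limit = Math.min(contentLines.size(), 100); // at most 100 content lines
    for (int i = 0; i + 1 < limit; i++) {
      String line = contentLines.get(i);
      String next = contentLines.get(i + 1);
      char sep = mostCommon(line);
      if (sep != 0 && sep == mostCommon(next) && count(line, sep) == count(next, sep)) {
        return sep;
      }
    }
    return ','; // default when nothing matches or the input is empty
  }

  public static void main(String[] args) {
    System.out.println(guessSeparator(Arrays.asList("a;b;c", "1;2;3")));
  }
}
```

Equal separator counts on both lines stand in for "the number of columns matches", since this sketch does not account for separators inside quoted fields.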
-
of
public static CsvFile of(List<String> headers, List<? extends List<String>> rows)
Obtains an instance from a list of headers and rows.
The headers may be an empty list. All the rows must contain a list of the same size, matching the header if present.
- Parameters:
  - headers - the headers, empty if no headers
  - rows - the data rows
- Returns:
  - the CSV file
- Throws:
  - IllegalArgumentException - if the rows do not match the headers
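The size constraint described for this factory can be pictured with a standalone check. This is only a sketch of the stated rule, not the library's code; the names RowSizeSketch and checkRows are hypothetical, and the exception message is invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: every row must have the same size, matching the headers if present.
class RowSizeSketch {

  static void checkRows(List<String> headers, List<? extends List<String>> rows) {
    if (rows.isEmpty()) {
      return; // nothing to compare
    }
    // With no headers, the first row sets the expected size.
    int expected = headers.isEmpty() ? rows.get(0).size() : headers.size();
    for (int i = 0; i < rows.size(); i++) {
      if (rows.get(i).size() != expected) {
        throw new IllegalArgumentException(
            "Row " + i + " has " + rows.get(i).size() + " fields, expected " + expected);
      }
    }
  }

  public static void main(String[] args) {
    checkRows(Arrays.asList("a", "b"), Arrays.asList(Arrays.asList("1", "2")));
    checkRows(Collections.<String>emptyList(), Arrays.asList(Arrays.asList("1", "2")));
    System.out.println("rows accepted");
  }
}
```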
-
headers
public ImmutableList<String> headers()
Gets the header row.
If there is no header row, an empty list is returned.
- Returns:
  - the header row
-
rows
public ImmutableList<CsvRow> rows()
Gets all data rows in the file.
- Returns:
  - the data rows
-
rowCount
public int rowCount()
Gets the number of data rows.
- Returns:
  - the number of data rows
-
row
public CsvRow row(int index)
Gets a single row.
- Parameters:
  - index - the row index, zero-based
- Returns:
  - the row
-
containsHeader
public boolean containsHeader(String header)
Checks if the header is present in the file.
Matching is case insensitive.
- Parameters:
  - header - the column header to match
- Returns:
  - true if the header is present
-
containsHeaders
public boolean containsHeaders(Collection<String> headers)
Checks if the headers are present in the file.
Matching is case insensitive.
- Parameters:
  - headers - the column headers to match
- Returns:
  - true if all the headers are present
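Case-insensitive header matching, as described for containsHeader(String) and containsHeaders(Collection), can be pictured with the following standalone sketch. It works on a plain list of header names and is not the library's implementation; the class HeaderMatchSketch and the fileHeaders parameter are hypothetical, and the use of String.equalsIgnoreCase is an assumption about what "case insensitive" means here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of case-insensitive header lookup.
class HeaderMatchSketch {

  // True if any file header equals the requested header, ignoring case.
  static boolean containsHeader(List<String> fileHeaders, String header) {
    for (String h : fileHeaders) {
      if (h.equalsIgnoreCase(header)) {
        return true;
      }
    }
    return false;
  }

  // True only if every requested header is present.
  static boolean containsHeaders(List<String> fileHeaders, Collection<String> headers) {
    for (String header : headers) {
      if (!containsHeader(fileHeaders, header)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> fileHeaders = Arrays.asList("Trade Id", "Notional");
    System.out.println(containsHeader(fileHeaders, "trade id"));
    System.out.println(containsHeaders(fileHeaders, Arrays.asList("NOTIONAL", "Currency")));
  }
}
```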
-
containsHeader
public boolean containsHeader(Pattern headerPattern)
Checks if the header pattern is present in the file.
Matching is case insensitive.
- Parameters:
  - headerPattern - the header pattern to match
- Returns:
  - true if the header is present
-
withHeaders
public CsvFile withHeaders(List<String> headers)
Returns an instance with the specified headers.
- Parameters:
  - headers - the new headers
- Returns:
  - the instance with the specified headers
-
equals
public boolean equals(Object obj)
Checks if this CSV file equals another.
The comparison checks the content.
-
hashCode
public int hashCode()
Returns a suitable hash code for the CSV file.
-
-