sorcha.readers.CSVReader

Classes

CSVDataReader

A class to read in object data files stored as CSV or whitespace separated values.

Module Contents

class CSVDataReader(filename, sep='csv', header=-1, **kwargs)

Bases: sorcha.readers.ObjectDataReader.ObjectDataReader

A class to read in object data files stored as CSV or whitespace separated values.

Requires that the file's first column is ObjID.

filename
sep = 'csv'
header_row
obj_id_table = None

get_reader_info()

Return a string identifying the current reader name and input information (for logging and output).

Returns:

name -- The reader information.

Return type:

string

_find_and_validate_header_line(header=-1)

Read and validate the header line. If no line number is provided, use a heuristic match to find the header line. This is used in cases where the header is not the first line and we want to skip down.

Parameters:

header (integer, optional) -- The row number of the header. If not provided, does an automatic search. Default = -1

Returns:

The line index of the header.

Return type:

integer
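The header search described above can be sketched as follows. This is a minimal illustration, not Sorcha's implementation: find_header_line, its max_search limit, and the rule "first field equals ObjID" are assumptions, the last one based only on the requirement that the first column is ObjID.

```python
def find_header_line(lines, sep=",", header=-1, id_column="ObjID", max_search=100):
    """Return the 0-based index of the header line, or raise ValueError.

    Hypothetical helper: pass sep=None for whitespace separated files.
    """

    def looks_like_header(line):
        # str.split(None) splits on any run of whitespace.
        fields = line.strip().split(sep)
        return len(fields) > 1 and fields[0].strip() == id_column

    if header >= 0:
        # The caller supplied a row number: validate it instead of searching.
        if header >= len(lines) or not looks_like_header(lines[header]):
            raise ValueError(f"line {header} is not a valid header")
        return header

    # No row number given: scan the first few lines for a plausible header.
    for index, line in enumerate(lines[:max_search]):
        if looks_like_header(line):
            return index
    raise ValueError("no header line found")


lines = ["# produced by a hypothetical tool", "", "ObjID,a,e,inc", "S0001,2.5,0.1,3.2"]
print(find_header_line(lines))  # 2
```

Passing header=2 for the same lines returns 2 after validation, while header=0 raises ValueError because the comment line does not start with ObjID.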

_check_header_line(header_line)

Check that a given header line is valid and exit if it is invalid.

Parameters:

header_line (str) -- The proposed header line.

_validate_csv(header)

Perform a validation of the CSV file, such as checking for blank lines.

This is an expensive test and should only be performed when something has gone wrong. It is needed because pandas' read_csv() function can give vague errors (such as failing with an index error if the file has blank lines at the end).

Parameters:

header (integer) -- The row number of the header.

Returns:

True indicating success.

Return type:

bool
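One of the checks mentioned above, scanning for blank lines after the header, might look like the sketch below. The function name check_no_blank_lines and the choice to raise ValueError are assumptions made for illustration; the real method may perform other checks and report failures differently.

```python
def check_no_blank_lines(lines, header):
    """Return True if no line after the header row is blank.

    Hypothetical helper: raises ValueError naming the first offending line,
    which is more specific than a generic parser error.
    """
    for line_number, line in enumerate(lines):
        if line_number > header and line.strip() == "":
            raise ValueError(f"blank line found at line {line_number}")
    return True


good_lines = ["ObjID,a", "S0001,2.5", "S0002,3.1"]
bad_lines = good_lines + [""]  # trailing blank line

print(check_no_blank_lines(good_lines, header=0))  # True
```

Calling check_no_blank_lines(bad_lines, header=0) raises ValueError for line 3. Every line is inspected, which is why a check like this is best reserved for diagnosing a failed read.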

_read_rows_internal(block_start=0, block_size=None, **kwargs)

Reads in a set number of rows from the input.

Parameters:
  • block_start (integer, optional) -- The 0-indexed row number from which to start reading the data. For example, in a CSV file block_start=2 skips the first two rows after the header and returns data starting at row 2. Default = 0

  • block_size (integer, optional) -- The number of rows to read in. Use block_size=None to read in all available data. Default = None

  • **kwargs (dictionary, optional) -- Extra arguments

Returns:

res_df -- Dataframe of the object data.

Return type:

pandas dataframe
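The block_start and block_size semantics can be illustrated with the standard library alone. This sketch assumes a comma separated file whose header is the first line and returns a list of dictionaries rather than a pandas dataframe; read_rows is a hypothetical stand-in, not the method documented here.

```python
import csv
import io
from itertools import islice


def read_rows(text, block_start=0, block_size=None):
    """Return data rows [block_start, block_start + block_size) as dicts.

    Row 0 is the first line after the header. block_size=None reads to the end.
    """
    reader = csv.DictReader(io.StringIO(text))
    stop = None if block_size is None else block_start + block_size
    return list(islice(reader, block_start, stop))


text = "ObjID,a\nS0,1.0\nS1,1.1\nS2,1.2\nS3,1.3\n"
rows = read_rows(text, block_start=2, block_size=1)
print([row["ObjID"] for row in rows])  # ['S2']
```

With block_start=2 and block_size=None the same call returns the rows for S2 and S3.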

_build_id_map()

Builds a table of just the object IDs.

_read_objects_internal(obj_ids, **kwargs)

Read in a chunk of data for given object IDs.

Parameters:
  • obj_ids (list) -- A list of object IDs to use.

  • **kwargs (dictionary, optional) -- Extra arguments

Returns:

res_df -- The dataframe for the object data.

Return type:

pandas dataframe
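A two-pass approach is one plausible way to combine an object ID table with a read by object ID: a first pass collects only the first column, and a second pass keeps just the rows whose position matches a requested ID. The sketch below assumes that design. build_id_list and read_objects are hypothetical names, and the result is a list of dictionaries rather than a pandas dataframe.

```python
import csv
import io


def build_id_list(text):
    """Return the first-column value of every data row, in file order."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    return [row[0] for row in reader]


def read_objects(text, obj_ids):
    """Return every data row whose object ID is in obj_ids."""
    wanted = set(obj_ids)
    # First pass: find the positions of the rows to keep.
    keep = {i for i, obj_id in enumerate(build_id_list(text)) if obj_id in wanted}
    # Second pass: parse the file and keep only those rows.
    reader = csv.DictReader(io.StringIO(text))
    return [row for i, row in enumerate(reader) if i in keep]


text = "ObjID,a\nS0,1.0\nS1,1.1\nS0,1.2\n"
print(build_id_list(text))  # ['S0', 'S1', 'S0']
print([row["a"] for row in read_objects(text, ["S0"])])  # ['1.0', '1.2']
```

Caching the ID list between calls would avoid repeating the first pass when many chunks are requested from the same file.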

_process_and_validate_input_table(input_table, **kwargs)

Perform any input-specific processing and validation on the input table. Modifies the input dataframe in place.

Notes

The base implementation includes filtering that is common to most input types. Subclasses should call super()._process_and_validate_input_table() to ensure that the ancestor’s validation is also applied.

Parameters:
  • input_table (pandas dataframe) -- A loaded table.

  • **kwargs (dictionary, optional) -- Extra arguments

Returns:

input_table -- Returns the input dataframe modified in-place.

Return type:

pandas dataframe
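The chaining pattern from the Notes can be sketched with two stand-in classes. BaseReader and StrippingReader are hypothetical and are not part of Sorcha, the table is a plain list of dictionaries instead of a pandas dataframe, and the specific checks are invented for illustration.

```python
class BaseReader:
    """Stand-in for the parent reader class (hypothetical)."""

    def _process_and_validate_input_table(self, input_table, **kwargs):
        # Shared check applied to every input type: each row needs an ObjID.
        for row in input_table:
            if "ObjID" not in row:
                raise ValueError("every row needs an ObjID")
        return input_table


class StrippingReader(BaseReader):
    """Stand-in subclass that adds its own input-specific processing."""

    def _process_and_validate_input_table(self, input_table, **kwargs):
        # Run the ancestor's validation first so the shared checks still apply.
        input_table = super()._process_and_validate_input_table(input_table, **kwargs)
        # Input-specific step: strip stray whitespace from the IDs, in place.
        for row in input_table:
            row["ObjID"] = row["ObjID"].strip()
        return input_table


table = [{"ObjID": " S0001 ", "a": 2.5}]
result = StrippingReader()._process_and_validate_input_table(table)
print(result is table, table[0]["ObjID"])  # True S0001
```

The returned object is the same table that was passed in, matching the modify-in-place contract described above.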