- builtins.Exception(builtins.BaseException)
    - ParseError
- builtins.object
    - JsonToCsv
class JsonToCsv(builtins.object)
JsonToCsv - Public class containing methods for converting
JSON to CSV data, merging data, etc.
Methods defined here:
- __init__(self, formatStr, nullValue='', debug=False)
- __init__ - Create a JsonToCsv object.
@param formatStr <str> - The format string for the JSON data to be converted.
@param nullValue <str> Default empty string - The value to assign to a "null" result.
@param debug <bool> Default False - If True, will output some debug data on stderr.
- convertToCsv(self, data, asList=False)
- convertToCsv - Convert given data to csv.
@param data <string/dict> - Either a string of json data, or a dict
@param asList <bool> Default False - If True, will return a list of the lines (as strings), otherwise will just return a string.
@return <list/str> - see "asList" param above.
- extractData(self, data)
- extractData - Return a list of lists. The outer list represents lines, the inner list data points.
e.g. returnData[0] is the first line, returnData[0][2] is the third data point of the first line.
@param data <string/dict> - Either a string of JSON data, or a dict.
@return list<list<str>> - List of lines, each line containing a list of datapoints.
Static methods defined here:
- dataToStr(csvData, separator=',')
- dataToStr - Convert a list of lists of csv data to a string.
@param csvData list<list> - A list of lists: the outer list is lines, each inner list holds the values of one line.
This is the data returned by JsonToCsv.extractData.
@param separator <str> Default ',' - The separator used between fields (e.g. a tab for TSV format).
@return str - csv data
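The input and output shapes described for dataToStr can be sketched as follows. This is a minimal stand-in, not the library's implementation: the function name data_to_str_sketch is hypothetical, and it assumes plain joining with a newline between lines and no quoting or escaping of fields, which the real method may handle differently.

```python
def data_to_str_sketch(csv_data, separator=','):
    # Hypothetical stand-in for JsonToCsv.dataToStr.
    # Assumption: fields are joined as-is (no quoting or escaping) and
    # lines are joined with '\n'.
    return '\n'.join(separator.join(line) for line in csv_data)


# Data in the list-of-lists form that extractData is documented to return.
rows = [['1', 'alice', 'admin'], ['2', 'bob', 'user']]

csv_text = data_to_str_sketch(rows)                  # comma-separated
tsv_text = data_to_str_sketch(rows, separator='\t')  # tab-separated
```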
- findDuplicates(csvData, fieldNum, flat=False)
- findDuplicates - Find lines with duplicate values in a specific field number.
This is useful to strip duplicates before using JsonToCsv.joinCsv
which requires unique values in the join field.
@see JsonToCsv.joinCsv for example code
@param csvData list<list<str>> - List of lines, each line containing string field values.
JsonToCsv.extractData returns data in this form.
@param fieldNum int - Index of the field number in which to search for duplicates
@param flat bool Default False - If False, return is a map of { "duplicateKey" : lines(copy) }.
If True, return is a flat list of all duplicate lines
@return :
When #flat is False:
dict { duplicateKeyValue[str] : lines[list<list<str>>] (copy) } -
This dict has the values with duplicates as the key, and a COPY of the lines as each value.
When #flat is True:
lines[list<list<str>>] (copy)
Copies of all lines with duplicate value in #fieldNum. Duplicates will be adjacent
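The two return shapes of findDuplicates can be illustrated with a small sketch. It is written from the description above only; find_duplicates_sketch is a hypothetical name, and the grouping strategy (a dict keyed on the field value) is an assumption, not the library's code.

```python
def find_duplicates_sketch(csv_data, field_num, flat=False):
    # Hypothetical stand-in for JsonToCsv.findDuplicates.
    # Group copies of the lines by the value found in field_num.
    groups = {}
    for line in csv_data:
        groups.setdefault(line[field_num], []).append(list(line))

    # Keep only the values that occur on more than one line.
    duplicates = {key: lines for key, lines in groups.items() if len(lines) > 1}
    if not flat:
        return duplicates

    # Flat form: lines sharing a value end up adjacent to each other.
    return [line for lines in duplicates.values() for line in lines]


rows = [['1', 'a'], ['2', 'b'], ['3', 'a'], ['4', 'c'], ['5', 'b']]
dup_map = find_duplicates_sketch(rows, 1)
dup_flat = find_duplicates_sketch(rows, 1, flat=True)
```

Here dup_map is keyed by the duplicated values 'a' and 'b', while dup_flat is a single list with the 'a' lines followed by the 'b' lines; 'c' occurs once and appears in neither.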
- joinCsv(csvData1, joinFieldNum1, csvData2, joinFieldNum2)
- joinCsv - Join two sets of csv data based on a common field value in the two sets.
csvData should be a list of lists (the outer list is lines, each inner list is the items of one line). Such data is returned by the JsonToCsv.extractData method.
Combined data will append the fields of csvData2 to csvData1, omitting the common field from csvData2
@param csvData1 list<list> - The "primary" data set
@param joinFieldNum1 <int> - The index of the common field in csvData1
@param csvData2 list<list> - The secondary data set
@param joinFieldNum2 <int> - The index of the common field in csvData2
@return tuple( mergedData [list<list>], onlyCsvData1 [list<list>], onlyCsvData2 [list<list>] )
Return is a tuple of 3 elements. The first is the merged csv data where a join field matched.
The second is the elements only present in csvData1
The third is the elements only present in csvData2
@raises ValueError - If csvData1 or csvData2 are not in the right format (list of lists)
@raises KeyError - If there are duplicate keys preventing a proper merge
NOTE: each csvData MUST have unique values in the "join field", or it cannot join.
If a join field contains duplicate values, consider the "multiJoinCsv" function instead.
Use multiJoinCsv to link all matches in csvData1 to all matches in csvData2 where join fields match.
JsonToCsv.findDuplicates will identify duplicate values for a given joinfield.
So you can have something like:
    myCsvData = JsonToCsv.extractData(....)
    joinFieldNum = 3 # Example, 4th field is the field we will join on
    myCsvDataDuplicateLines = JsonToCsv.findDuplicates(myCsvData, joinFieldNum, flat=True)
    if myCsvDataDuplicateLines:
        myCsvDataUniq = [line for line in myCsvData if line not in myCsvDataDuplicateLines]
    else:
        myCsvDataUniq = myCsvData
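The join semantics described for joinCsv (a three-element return, the common field omitted from the second data set, KeyError on duplicate join values) can be sketched in plain Python. The name join_csv_sketch is hypothetical and this is an illustration under those stated assumptions, not the library's implementation; the ValueError format check is left out for brevity.

```python
def join_csv_sketch(csv_data1, join_field1, csv_data2, join_field2):
    # Hypothetical stand-in for JsonToCsv.joinCsv.
    def index_by(csv_data, field_num):
        # Map join value -> line, refusing duplicate join values.
        index = {}
        for line in csv_data:
            key = line[field_num]
            if key in index:
                raise KeyError('Duplicate join value: %r' % (key,))
            index[key] = line
        return index

    index1 = index_by(csv_data1, join_field1)
    index2 = index_by(csv_data2, join_field2)

    merged, only1 = [], []
    for key, line1 in index1.items():
        if key in index2:
            line2 = index2[key]
            # Append the fields of line2, skipping its copy of the join field.
            merged.append(list(line1) + line2[:join_field2] + line2[join_field2 + 1:])
        else:
            only1.append(list(line1))
    only2 = [list(line) for key, line in index2.items() if key not in index1]
    return merged, only1, only2


users = [['1', 'alice'], ['2', 'bob'], ['3', 'carol']]
emails = [['alice@example.com', '1'], ['dave@example.com', '4']]
merged, only_users, only_emails = join_csv_sketch(users, 0, emails, 1)
```

With this sample data, merged holds the single line for id '1', only_users holds the 'bob' and 'carol' lines, and only_emails holds the line for id '4'.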
- multiJoinCsv(csvData1, joinFieldNum1, csvData2, joinFieldNum2)
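- multiJoinCsv - Join two sets of csv data based on a common field value. Unlike joinCsv, repeated join values are merged rather than rejected:
all matches in csvData1 are linked to all matches in csvData2, i.e. if a key is repeated on A then you'd have AA and AB.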
csvData should be a list of lists (the outer list is lines, each inner list is the items of one line). Such data is returned by the JsonToCsv.extractData method.
Combined data will append the fields of csvData2 to csvData1, omitting the common field from csvData2
@param csvData1 list<list> - The "primary" data set
@param joinFieldNum1 <int> - The index of the common field in csvData1
@param csvData2 list<list> - The secondary data set
@param joinFieldNum2 <int> - The index of the common field in csvData2
@return tuple( mergedData [list<list>], onlyCsvData1 [list<list>], onlyCsvData2 [list<list>] )
Return is a tuple of 3 elements. The first is the merged csv data where a join field matched.
The second is the elements only present in csvData1
The third is the elements only present in csvData2
@raises ValueError - If csvData1 or csvData2 are not in the right format (list of lists)
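How multiJoinCsv differs from joinCsv when join values repeat can be sketched as below. The name multi_join_csv_sketch is hypothetical, and the pairing of every match in one set with every match in the other is an assumption drawn from the description above, not a copy of the library's code.

```python
def multi_join_csv_sketch(csv_data1, join_field1, csv_data2, join_field2):
    # Hypothetical stand-in for JsonToCsv.multiJoinCsv.
    # Group the lines of the second data set by join value; repeats allowed.
    groups2 = {}
    for line in csv_data2:
        groups2.setdefault(line[join_field2], []).append(line)
    keys1 = set(line[join_field1] for line in csv_data1)

    merged, only1 = [], []
    for line1 in csv_data1:
        matches = groups2.get(line1[join_field1])
        if not matches:
            only1.append(list(line1))
            continue
        # Pair this line with every matching line of the second data set,
        # omitting the second set's copy of the join field.
        for line2 in matches:
            merged.append(list(line1) + line2[:join_field2] + line2[join_field2 + 1:])
    only2 = [list(line) for line in csv_data2 if line[join_field2] not in keys1]
    return merged, only1, only2


orders = [['u1', 'book'], ['u1', 'pen'], ['u2', 'lamp']]
names = [['u1', 'alice'], ['u1', 'al'], ['u3', 'carol']]
multi_merged, only_orders, only_names = multi_join_csv_sketch(orders, 0, names, 0)
```

Because 'u1' appears twice on each side, the sketch yields four merged lines for it, where joinCsv is documented to raise KeyError on the same input.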
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- OPER_CHARS = (',', '.', '[', ']', '/', '+')
class ParseError(builtins.Exception)
ParseError - Raised if there is an error in parsing the format string.
TODO: Better name.
- Method resolution order:
- ParseError
- builtins.Exception
- builtins.BaseException
- builtins.object
Data descriptors defined here:
- __weakref__
- list of weak references to the object (if defined)
Methods inherited from builtins.Exception:
- __init__(self, /, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- __new__(*args, **kwargs) from builtins.type
- Create and return a new object. See help(type) for accurate signature.
Methods inherited from builtins.BaseException:
- __delattr__(self, name, /)
- Implement delattr(self, name).
- __getattribute__(self, name, /)
- Return getattr(self, name).
- __reduce__(...)
- helper for pickle
- __repr__(self, /)
- Return repr(self).
- __setattr__(self, name, value, /)
- Implement setattr(self, name, value).
- __setstate__(...)
- __str__(self, /)
- Return str(self).
- with_traceback(...)
- Exception.with_traceback(tb) --
set self.__traceback__ to tb and return self.
Data descriptors inherited from builtins.BaseException:
- __cause__
- exception cause
- __context__
- exception context
- __dict__
- __suppress_context__
- __traceback__
- args