VerityDotNet 1.0
C# library for Verity data profiling, quality control, remediation
Object containing settings and results of quality inspection.
Public Member Functions | |
QualityAnalysis () | |
Constructor that sets defaults for all members. | |
List< string > | GetJSON (bool addLF) |
Builds JSON output as list of strings. | |
Public Attributes | |
string | title = "" |
general purpose title | |
string | status = "" |
status assigned at various workflow stages | |
long | numRecs |
number of records used | |
long | numRecsCur |
number of records used, kept as a modifiable per-thread or loop counter | |
int | maxuv |
maximum number of unique values to collect | |
bool | isCaseSens |
whether values are case sensitive | |
bool | isQuoted |
whether values are quoted in source records | |
bool | hasHeader |
whether data set has header line | |
bool | extractFields |
whether to extract field names from data set instead of using supplied Field list | |
string | debug |
debug level: empty, info, trace | |
string | delim |
name of delimiter (comma, pipe, tab, caret) | |
string | delimChar |
character representation of delimiter | |
List< Field > | fields |
indexed list of field objects | |
List< string > | fieldNamesLower |
indexed list of lower-case field names, kept as a helper collection | |
Dictionary< string, int > | hashFields |
Dictionary of field title in lower case to list index. | |
List< Dictionary< string, long > > | fieldDatatypeDist |
list of field datatype distributions correlated to fields list. Each field has counts for detected datatypes (int, real, bool, date, string, empty). | |
List< List<(string uv, long count)> > | fieldUniqVals |
list correlated to fields. Each entry is a list of unique-value tuples sorted in descending order, where each tuple (uv, count) has uv = the unique value as a string and count = its number of instances. At most 50 values are kept; any additional values are grouped into -other- | |
List< Dictionary< string, long > > | hashFieldUVs |
Working hash map per field of unique values. | |
List< string > | fieldQuality |
list correlated to fields. String of an integer 0-100 as a quality metric computed from discovered field characteristics | |
List< Field > | fieldsOut |
indexed list of field objects used in output records | |
List< string > | fieldOutNamesLower |
indexed list of lower-case output field names, kept as a helper collection | |
Dictionary< string, int > | hashFieldsOut |
Dictionary of output field title in lower case to list index. | |
List< Dictionary< string, long > > | fieldOutDatatypeDist |
list of field datatype distributions correlated to fields list. Each field has counts for detected datatypes (int, real, bool, date, string, empty). | |
List< List<(string uv, long count)> > | fieldOutUniqVals |
list correlated to fields. Each entry is a list of unique-value tuples sorted in descending order, where each tuple (uv, count) has uv = the unique value as a string and count = its number of instances. At most 50 values are kept; any additional values are grouped into -other- | |
List< Dictionary< string, long > > | hashOutFieldUVs |
Working hash map per field of unique values. | |
List< string > | fieldOutQuality |
list correlated to fields. String of an integer 0-100 as a quality metric computed from discovered field characteristics | |
Dictionary< string, long > | recSizeDist |
dictionary of record sizes (byte lengths) as a string to counts. Max 100 sizes. | |
Dictionary< string, long > | recParseErrs |
dictionary of parsing errors (number of fields after parsing relative to the number of defined fields) by type: small1 (1 field too few), small2 (2 or more fields missing), big (1 or more fields too many). | |
Dictionary< string, List< string > > | recParseErrsExamples |
dictionary of example records for parsing errors by type: small1 (1 field too few), small2 (2 or more fields missing), big (1 or more fields too many). | |
Dictionary< string, long > | recParseDist |
dictionary of number of parsed fields as string to count | |
Dictionary< string, long > | specCharDist |
dictionary of special characters and their counts. Special characters are (some use aliases as dictionary keys): tab, !, doublequote, #, <, >, [, ], backslash, ^, {, }, ~, ascii_[0-31, 127-255], unicode_[256 - 65535] | |
List< Dictionary< string, long > > | specCharDistField |
list correlated to fields. Each entry is a dictionary of special character to its count of instances for specific field. Same organization as in specCharDist | |
List< string > | specCharExamples |
list of source lines with special characters. Each entry has the form (nline)[sp char list]record, where nline is the number of the line read from source data (excluding empty and comment lines) and [sp char list] is a comma-delimited string of each special character found in the record, such as [spchar1, spchar2] lineIn. A single field can have more than 1 special character. For example, the pipe-delimited input line !dog|{House}|123^456 read as record line #5 (the actual file line number could be larger due to comment and empty lines) will be stored as the example (5)[!,{,},^]!dog|{House}|123^456 | |
List< CoValue > | coValues |
list of coValue objects | |
List< List<(string uv, long count)> > | coValueUniqVals |
correlated to coValues list. Similar to field unique values. | |
List< Dictionary< string, long > > | hashCoValueUVs |
Working hash map per coValue of unique values. | |
Dictionary< string, long > | errStats |
dictionary of error statistics: numrecsErr = number of records with any kind of error; numrecsErrDatatype = number of records with a datatype error; numrecsErrFmt = number of records with a format error | |
List< long > | fieldsErrDatatypeCount |
correlated to fields. List of the number of datatype errors per field | |
List< Dictionary< string, long > > | fieldsErrDatatypeReasons |
correlated to fields. dictionary of datatype error reasons to count per field | |
List< long > | fieldsErrFmtCount |
correlated to fields. List of the number of format errors per field | |
List< Dictionary< string, long > > | fieldsErrFmtReasons |
correlated to fields. dictionary of format error reasons to count per field | |
List< string > | errDatatypeExamples |
list of records from source associated with datatype errors. Each entry has a prefix of (nline), where nline is the number of the line read from source data (excluding empty and comment lines). Syntax is: (nline)[fieldinfo]|[fieldinfo]..... where [fieldinfo] is fieldTitle:reason:fieldValue. fieldValue will be set to -empty- if the actual value is empty. nline is 1-based and therefore 1 larger than the line's index | |
List< string > | errFmtExamples |
list of records from source associated with format errors. Each entry has a prefix of (nline), where nline is the number of the line read from source data (excluding empty and comment lines). Syntax is: (nline)[fieldinfo]|[fieldinfo]..... where [fieldinfo] is fieldTitle:reason:fieldValue. fieldValue will be set to -empty- if the actual value is empty. nline is 1-based and therefore 1 larger than the line's index | |
List< string > | logMsgs |
log messages | |
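The fieldUniqVals description (descending sorted tuples, at most 50 kept, the remainder grouped into -other-) can be illustrated with a minimal sketch. This is not the library's implementation: the class UniqValSketch, the method Rank, the tie-break on the value string, sorting by count, and appending -other- as the last entry are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqValSketch
{
    // Hypothetical helper, not part of VerityDotNet.
    // Turns a working map of value -> count into a list sorted by count (descending),
    // keeps at most maxKeep entries, and sums the remainder under "-other-".
    public static List<(string uv, long count)> Rank(
        Dictionary<string, long> counts, int maxKeep = 50)
    {
        var sorted = counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // assumed tie-break for stable output
            .Select(kv => (uv: kv.Key, count: kv.Value))
            .ToList();

        if (sorted.Count <= maxKeep)
        {
            return sorted;
        }

        var kept = sorted.Take(maxKeep).ToList();
        long rest = sorted.Skip(maxKeep).Sum(t => t.count);
        kept.Add((uv: "-other-", count: rest)); // assumed position: after the kept values
        return kept;
    }
}
```

With maxKeep left at 50 the sketch matches the documented limit; a smaller value is convenient for experimenting with the grouping behavior.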
Object containing settings and results of quality inspection.
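Entries in errDatatypeExamples and errFmtExamples follow the syntax (nline)[fieldinfo]|[fieldinfo]..... with [fieldinfo] being fieldTitle:reason:fieldValue. A consumer of these results might split an entry back into its parts roughly as sketched below. The class ErrExampleSketch and method Parse are hypothetical, and the sketch assumes that the square brackets are literal characters, that items are joined by "]|[", and that only fieldValue may itself contain a colon.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ErrExampleSketch
{
    // Hypothetical helper, not part of VerityDotNet.
    // Splits "(nline)[title:reason:value]|[title:reason:value]" into its parts.
    public static (long nline, List<(string title, string reason, string value)> fields)
        Parse(string entry)
    {
        int close = entry.IndexOf(')');
        if (entry.Length == 0 || entry[0] != '(' || close < 0)
        {
            throw new FormatException("missing (nline) prefix");
        }

        long nline = long.Parse(entry.Substring(1, close - 1), CultureInfo.InvariantCulture);
        string body = entry.Substring(close + 1);

        // Assumption: the brackets are literal, so strip the outermost pair.
        if (body.Length >= 2 && body[0] == '[' && body[body.Length - 1] == ']')
        {
            body = body.Substring(1, body.Length - 2);
        }

        var fields = new List<(string title, string reason, string value)>();
        foreach (string item in body.Split(new[] { "]|[" }, StringSplitOptions.None))
        {
            // Limit to 3 parts so colons inside the field value are preserved.
            string[] parts = item.Split(new[] { ':' }, 3);
            if (parts.Length < 3)
            {
                throw new FormatException("expected fieldTitle:reason:fieldValue");
            }
            fields.Add((parts[0], parts[1], parts[2]));
        }
        return (nline, fields);
    }
}
```

The reason strings depend on the inspection rules in use, so the sketch treats them as opaque text. A value reported as -empty- is returned unchanged and still needs to be mapped back to an empty string by the caller if that is wanted.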
List< string > VerityDotNet.QualityAnalysis.GetJSON (bool addLF)
Builds JSON output as list of strings.
Parameters
addLF | bool | whether to add a line feed at the end of each list entry |
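Because GetJSON returns a list of strings rather than a single string, a caller typically has to assemble the text itself. The sketch below shows one way to do so under the assumption that "line feed" means "\n" and that, with addLF set, each entry already ends with one. JsonLinesSketch and ToText are hypothetical names, and the lines argument stands in for the list returned by GetJSON.

```csharp
using System;
using System.Collections.Generic;

public static class JsonLinesSketch
{
    // Hypothetical helper, not part of VerityDotNet.
    // lines stands in for the result of GetJSON(addLF).
    public static string ToText(List<string> lines, bool addLF)
    {
        // With addLF the entries are assumed to end in "\n" already, so they are
        // concatenated as-is; otherwise a line feed is placed between entries.
        return addLF ? string.Concat(lines) : string.Join("\n", lines);
    }
}
```

The two paths differ only in the trailing line feed: concatenating entries that already carry one leaves a final "\n", while joining bare entries does not.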