VerityDotNet 1.0
C# library for Verity data profiling, quality control, and remediation
VerityDotNet.QualityAnalysis Class Reference

Object containing settings and results of quality inspection.

Public Member Functions

 QualityAnalysis ()
 Constructor that sets defaults for all members.
 
List< string > GetJSON (bool addLF)
 Builds JSON output as list of strings.
 

Public Attributes

string title = ""
 general-purpose title
 
string status = ""
 status assigned at various workflow stages
 
long numRecs
 number of records used
 
long numRecsCur
 number of records used; modifiable, for use as a per-thread or loop counter
 
int maxuv
 maximum number of unique values to collect
 
bool isCaseSens
 whether values are case sensitive
 
bool isQuoted
 whether values are quoted in source records
 
bool hasHeader
 whether the data set has a header line
 
bool extractFields
 whether to extract field names from the data set instead of using the supplied Field list
 
string debug
 debug level: empty, info, trace
 
string delim
 name of delimiter (comma, pipe, tab, caret)
 
string delimChar
 character representation of delimiter
 
List< Field > fields
 indexed list of field objects
 
List< string > fieldNamesLower
 indexed list of lowercase field names, kept as a helper collection
 
Dictionary< string, int > hashFields
 Dictionary mapping each lowercase field title to its list index.
 
List< Dictionary< string, long > > fieldDatatypeDist
 list of field datatype distributions correlated to the fields list. Each field has counts for the detected datatypes (int, real, bool, date, string, empty).
 
List< List<(string uv, long count)> > fieldUniqVals
 list correlated to fields. Each entry is a list of unique-value tuples sorted in descending order, each tuple (uv, count) holding uv, the unique value as a string, and count, its integer number of instances. At most 50 values are kept; additional values are grouped into -other-.
 
List< Dictionary< string, long > > hashFieldUVs
 Working hash map of unique values, one per field.
 
List< string > fieldQuality
 list correlated to fields. Each entry is an integer from 0 to 100, stored as a string, giving a quality metric computed from discovered field characteristics.
 
List< Field > fieldsOut
 indexed list of field objects used in output records
 
List< string > fieldOutNamesLower
 indexed list of lowercase output field names, kept as a helper collection
 
Dictionary< string, int > hashFieldsOut
 Dictionary mapping each lowercase output field title to its list index.
 
List< Dictionary< string, long > > fieldOutDatatypeDist
 list of field datatype distributions correlated to the output fields list (fieldsOut). Each field has counts for the detected datatypes (int, real, bool, date, string, empty).
 
List< List<(string uv, long count)> > fieldOutUniqVals
 list correlated to output fields. Each entry is a list of unique-value tuples sorted in descending order, each tuple (uv, count) holding uv, the unique value as a string, and count, its integer number of instances. At most 50 values are kept; additional values are grouped into -other-.
 
List< Dictionary< string, long > > hashOutFieldUVs
 Working hash map of unique values, one per output field.
 
List< string > fieldOutQuality
 list correlated to output fields. Each entry is an integer from 0 to 100, stored as a string, giving a quality metric computed from discovered field characteristics.
 
Dictionary< string, long > recSizeDist
 dictionary mapping record size (byte length, as a string) to count. At most 100 sizes are kept.
 
Dictionary< string, long > recParseErrs
 dictionary of parsing-error counts by type, where the type compares the number of fields after parsing with the number of defined fields: small1 (1 field too few), small2 (2 or more fields missing), big (1 or more fields too many).
 
Dictionary< string, List< string > > recParseErrsExamples
 dictionary of example records for each parsing-error type: small1 (1 field too few), small2 (2 or more fields missing), big (1 or more fields too many).
 
Dictionary< string, long > recParseDist
 dictionary mapping the number of parsed fields (as a string) to count
 
Dictionary< string, long > specCharDist
 dictionary of special characters and their counts. Special characters are (some use aliases as dictionary keys): tab, !, doublequote, #, , >, [, ], backslash, ^, {, }, ~, ascii_[0-31, 127-255], unicode_[256-65535]
 
List< Dictionary< string, long > > specCharDistField
 list correlated to fields. Each entry is a dictionary mapping each special character to its instance count for that field, organized the same way as specCharDist.
 
List< string > specCharExamples
 list of source lines containing special characters. Each entry has the form (nline)[sp char list]lineIn, where nline is the line number read from the source data (excluding empty and comment lines), [sp char list] is a comma-delimited list of each special character found in the record, such as [spchar1,spchar2], and lineIn is the source line. A single field can contain more than one special character. For example, the pipe-delimited input !dog|{House}|123^456, read as record line 5 (the actual file line number could be larger because of comments and empty lines), is stored as (5)[!,{,},^]!dog|{House}|123^456
 
List< CoValue > coValues
 list of coValue objects
 
List< List<(string uv, long count)> > coValueUniqVals
 correlated to the coValues list. Organized the same way as the field unique values (fieldUniqVals).
 
List< Dictionary< string, long > > hashCoValueUVs
 Working hash map of unique values, one per coValue.
 
Dictionary< string, long > errStats
 dictionary with the following keys:
  numrecsErr: number of records with any kind of error
  numrecsErrDatatype: number of records with a datatype error
  numrecsErrFmt: number of records with a format error
 
List< long > fieldsErrDatatypeCount
 correlated to fields. Number of datatype errors per field
 
List< Dictionary< string, long > > fieldsErrDatatypeReasons
 correlated to fields. Per-field dictionary mapping each datatype error reason to its count
 
List< long > fieldsErrFmtCount
 correlated to fields. Number of format errors per field
 
List< Dictionary< string, long > > fieldsErrFmtReasons
 correlated to fields. Per-field dictionary mapping each format error reason to its count
 
List< string > errDatatypeExamples
 list of source records associated with datatype errors. Each entry is prefixed with (nline), where nline is the line number read from the source data (excluding empty and comment lines). The syntax is (nline)[fieldinfo]|[fieldinfo]..., where [fieldinfo] is fieldTitle:reason:fieldValue. fieldValue is set to -empty- if the actual value is empty. nline is 1-based and is therefore 1 larger than the line's index.
 
List< string > errFmtExamples
 list of source records associated with format errors. Each entry is prefixed with (nline), where nline is the line number read from the source data (excluding empty and comment lines). The syntax is (nline)[fieldinfo]|[fieldinfo]..., where [fieldinfo] is fieldTitle:reason:fieldValue. fieldValue is set to -empty- if the actual value is empty. nline is 1-based and is therefore 1 larger than the line's index.
 
List< string > logMsgs
 log messages
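The fieldUniqVals and hashFieldUVs descriptions suggest a two-stage process: count values in a per-field working hash map, then reduce that map to a sorted list capped at 50 entries, with the remainder grouped into -other-. The following is a minimal C# sketch of that reduction under stated assumptions, not the library's implementation: the UniqueValueSketch class and its Count and TopValues methods are hypothetical; sorting is assumed to be by descending count with ties broken by ordinal value; -other- is assumed to be appended as an extra entry whose count is the sum of the grouped instances; and case-insensitive counting (isCaseSens false) is modeled by lowercasing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the documented unique-value bookkeeping; not VerityDotNet code.
public static class UniqueValueSketch
{
    // Adds one instance of a value to a working map (the role hashFieldUVs plays).
    // Assumption: case-insensitive mode is modeled by lowercasing the key.
    public static void Count(Dictionary<string, long> hashUVs, string value, bool isCaseSens)
    {
        string key = isCaseSens ? value : value.ToLowerInvariant();
        hashUVs[key] = hashUVs.TryGetValue(key, out long c) ? c + 1 : 1;
    }

    // Reduces a working map to a descending (uv, count) list of at most maxKept
    // values, plus one "-other-" entry summing the instances of the rest.
    public static List<(string uv, long count)> TopValues(
        Dictionary<string, long> hashUVs, int maxKept = 50)
    {
        List<(string uv, long count)> sorted = hashUVs
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => (uv: kv.Key, count: kv.Value))
            .ToList();

        if (sorted.Count <= maxKept) return sorted;

        List<(string uv, long count)> kept = sorted.Take(maxKept).ToList();
        long otherCount = sorted.Skip(maxKept).Sum(t => t.count);
        kept.Add(("-other-", otherCount));
        return kept;
    }
}
```

Counting into a dictionary keeps each update constant-time on average; the sort runs once per field, when the results are finalized.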
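The recParseErrs and recParseDist descriptions imply a simple classification of each record: compare its parsed field count with the number of defined fields, and tally the field counts keyed by string. A minimal C# sketch of that bookkeeping follows; it is an illustration, not the library's parser. The ParseErrSketch class and its Classify and Tally methods are hypothetical, and splitting with string.Split on delimChar ignores quoting (isQuoted), which a real parser would have to handle.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical helper illustrating the documented recParseErrs / recParseDist
// conventions; not part of VerityDotNet.
public static class ParseErrSketch
{
    // Compares parsed vs. defined field counts and returns the documented
    // error type ("small1", "small2", "big"), or null when the counts match.
    public static string? Classify(int parsedCount, int definedCount)
    {
        int diff = parsedCount - definedCount;
        if (diff == 0) return null;
        if (diff == -1) return "small1"; // exactly 1 field too few
        if (diff < -1) return "small2";  // 2 or more fields missing
        return "big";                    // 1 or more fields too many
    }

    // Tallies the field-count distribution and parse-error counts for raw lines.
    // Assumption: a plain Split on the delimiter character, with no quote handling.
    public static void Tally(IEnumerable<string> lines, char delimChar, int definedCount,
        Dictionary<string, long> recParseErrs, Dictionary<string, long> recParseDist)
    {
        foreach (string line in lines)
        {
            int parsed = line.Split(delimChar).Length;
            string key = parsed.ToString(); // counts are keyed by string, as documented
            recParseDist[key] = recParseDist.TryGetValue(key, out long n) ? n + 1 : 1;

            string? errType = Classify(parsed, definedCount);
            if (errType != null)
                recParseErrs[errType] = recParseErrs.TryGetValue(errType, out long e) ? e + 1 : 1;
        }
    }
}
```

With three defined fields and '|' as the delimiter, a|b is counted as small1, a as small2, and a|b|c|d as big, while a|b|c adds only to recParseDist.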
 

Detailed Description

Object containing settings and results of quality inspection.
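The example-record members (specCharExamples, errDatatypeExamples, errFmtExamples) follow documented text layouts. The C# sketch below only builds strings in those layouts; it is not VerityDotNet code. The ExampleFormatSketch class and its method names are hypothetical, and the special-character set is limited to the characters named in the specCharDist description. Listing each special character once, in order of first appearance, is inferred from the single documented example, and treating [fieldinfo] as a placeholder (no literal brackets) is likewise an assumption.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical formatter for the documented example-record layouts; not VerityDotNet code.
public static class ExampleFormatSketch
{
    // Printable special characters named in the specCharDist description,
    // mapped to the dictionary keys (aliases) used there.
    static readonly Dictionary<char, string> Named = new Dictionary<char, string>
    {
        ['\t'] = "tab", ['!'] = "!", ['"'] = "doublequote", ['#'] = "#",
        ['>'] = ">", ['['] = "[", [']'] = "]", ['\\'] = "backslash",
        ['^'] = "^", ['{'] = "{", ['}'] = "}", ['~'] = "~",
    };

    // Returns the key for a special character, or null for an ordinary one.
    static string? Alias(char c)
    {
        if (Named.TryGetValue(c, out string? name)) return name;
        int code = c;
        if (code <= 31 || (code >= 127 && code <= 255)) return "ascii_" + code;
        if (code >= 256) return "unicode_" + code;
        return null;
    }

    // Builds (nline)[sp char list]lineIn, or returns null if the line has no special characters.
    public static string? SpecCharExample(long nline, string lineIn)
    {
        var found = new List<string>();
        foreach (char c in lineIn)
        {
            string? a = Alias(c);
            if (a != null && !found.Contains(a)) found.Add(a);
        }
        return found.Count == 0 ? null : $"({nline})[{string.Join(",", found)}]{lineIn}";
    }

    // Builds (nline)fieldinfo|fieldinfo... with fieldinfo = fieldTitle:reason:fieldValue,
    // substituting -empty- for an empty value as documented.
    public static string ErrExample(long nline,
        IEnumerable<(string title, string reason, string value)> fields)
    {
        var parts = fields.Select(f =>
            $"{f.title}:{f.reason}:{(f.value.Length == 0 ? "-empty-" : f.value)}");
        return $"({nline}){string.Join("|", parts)}";
    }
}
```

For the documented input, SpecCharExample(5, "!dog|{House}|123^456") yields (5)[!,{,},^]!dog|{House}|123^456, since the pipe delimiter is not in the special-character set.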

Member Function Documentation

◆ GetJSON()

List< string > VerityDotNet.QualityAnalysis.GetJSON ( bool addLF)

Builds JSON output as list of strings.

Parameters
addLF  (bool) whether to add a line feed at the end of each list entry
Returns
list of strings ready to be saved to a file
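Because GetJSON returns a list of strings rather than a single document, how it is written to disk depends on addLF. The following C# sketch is hypothetical (JsonSaveSketch.Save is not part of VerityDotNet) and assumes that with addLF true each entry already ends in a line feed, so the entries are concatenated unchanged, while with addLF false File.WriteAllLines appends a line terminator after each entry.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for persisting the list returned by GetJSON; not VerityDotNet code.
public static class JsonSaveSketch
{
    public static void Save(List<string> jsonLines, bool addLF, string path)
    {
        if (addLF)
            // Assumption: entries already carry their own line feeds.
            File.WriteAllText(path, string.Concat(jsonLines));
        else
            // Entries have no terminator, so write one per entry.
            File.WriteAllLines(path, jsonLines);
    }
}
```

With the library referenced, a call might look like JsonSaveSketch.Save(qa.GetJSON(true), true, "quality.json"), where qa is a populated QualityAnalysis instance and the file name is illustrative.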

The documentation for this class was generated from the following file: