VerityDotNet 1.0
C# library for Verity data profiling, quality control, remediation
Performs deep inspection of data records to discover and assess a variety of structure, syntax, and semantic problems and inconsistencies.
Static Public Member Functions
static QualityAnalysis Inspect (List<Field> srcFields, List<Dictionary<string, string>> coValues, Dictionary<string, string> settings, List<string> recs)
Performs deep inspection of data records to discover and assess a variety of structure, syntax, and semantic problems and inconsistencies.
srcFields | List of field objects with these attributes:
  - title: field name.
  - datatype: int, real, bool, date, or string. For date, there should be an entry in field.fmtDate specifying the date format; otherwise it is set to ISO yyyyMMdd.
  - fmt_strcase: (upper, lower, empty).
  - fmt_strcut: (front, back, empty). Side to cut characters from when a value is longer than the specified fmt_strlen. Default is back.
  - fmt_strpad: (front, back, empty). Side to add characters to when a value is shorter than the specified fmt_strlen. Default is back.
  - fmt_strpadchar: (single character or character alias). Character to add when a too-short string must be lengthened to meet the specified fmt_strlen. Default is _
  - fmt_strlen: integer number of characters (>0) if a fixed size is required. Ignored if less than 0.
  - fmt_decimal: number of decimal digits (0 - N). Ignored if less than 0.
  - fmt_date: date format, one of the following.
      Without time part: yyyymmdd, yymmdd, yyyyddmm, yyddmm, mmddyyyy, mmddyy, ddmmyyyy, ddmmyy
      With month abbreviation (mmm, like Jan): yyyymmmdd, yyyyddmmm, ddmmmyyyy, ddmmmyy
      With full month name (month, like January): yyyymonthdd, yyyyddmonth, ddmonthyyyy, ddmonthyy
      With time part: append one of these suffixes to the date formats above (T = letter T, S = space): Thhmmss, Thhmm, Thh, Shhmmss, Shhmm, Shh. For example, mmddyyyyThhmm for 11282024T1407 or 11 / 28 / 2024T14:07 or 11 - 28 - 2024 14:07
      With time zone: if a time zone is required at the end of the time part, add the suffix Z, like mmddyyyyThhmmZ 11282024T1407
coValues | Optional. List of dictionaries, one per coValue object, with keys field1, field2, and field3. Each value is the title of a source field. field1 and field2 are required; field3 is optional. If field3 is specified, the coValue covers the joint values of all three fields; otherwise it covers the joint values of two fields. The title of the coValue object is the concatenation of the field titles with a comma delimiter.
settings | Dictionary of settings:
  - isCaseSens: bool, whether matching is case sensitive. Default false.
  - isQuoted: bool, whether field values may be enclosed in double quotes (which allows the delimiter within a value). Default false.
  - hasHeader: bool, whether recs has a header line. Default true. Must be true if extractFields is true.
  - extractFields: bool, whether to read field titles from the header line (the first non-comment, non-empty line). Default false. If true, hasHeader must also be true, and the srcFields list is used only to copy its datatype and formatting to the report field object. Thus, you can extract field titles from the data set and still define characteristics if desired. If not, ensure srcFields is empty.
  - delim: record delimiter (comma*, pipe, tab, colon).
  - maxuv: optional. String of an integer value that is the maximum number of unique values to collect per field. Default is 50; the default is also used if the supplied value is less than 1 or greater than 1000.
  - maxThreads: optional. Default 40. String of an integer value that is the maximum number of threads to use when multi-threading is allowed.
  - nRecsPerThreadMin: optional. Default 500 (min is 1). Minimum number of records to send to each thread when multi-threading.
  - nRecsPerThreadMax: optional. Default 100000 (max is 1e6). Maximum number of records to send to each thread when multi-threading.
  - useThreads: bool. Default false. Multi-threading is used only if the license is active.
  - license: optional. String of the VerityDotNet license. Must be active to use multi-threading.
  - licenseId: required when license is used. Id used to make the license string; it is used to decrypt the license.
  - debug: (info, trace, "") to collect log messages.
recs | List of strings, each a record read as a line from the source data. If the first data entry (not empty and not starting with // or #) is a header of delimited field names, be sure to include the setting hasHeader=true.
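As a minimal sketch of how the plain-collection arguments described above might be assembled, the following C# builds recs, coValues, and settings using only standard collections. The sample data, the helper names CoValueTitle and FirstDataLine, and the spelling of bool settings as "true"/"false" strings are assumptions made for illustration; they are not part of VerityDotNet. The helpers only mirror the rules stated in the parameter descriptions (comma-joined coValue titles, header as the first non-empty line not starting with // or #).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Records as raw lines; the first line is a comment, the second the header.
// (Hypothetical sample data.)
var recs = new List<string>
{
    "# exported sample",
    "id,city,state",
    "1,Austin,TX",
    "2,Boston,MA",
};

// One coValue over the joint values of two source fields.
var coValues = new List<Dictionary<string, string>>
{
    new() { ["field1"] = "city", ["field2"] = "state" },
};

// Settings are passed as strings; "true"/"false" spelling is an assumption.
var settings = new Dictionary<string, string>
{
    ["hasHeader"] = "true",      // must be true when extractFields is true
    ["extractFields"] = "true",  // read field titles from the header line
    ["delim"] = "comma",
    ["isQuoted"] = "false",
    ["maxuv"] = "100",           // values outside 1..1000 fall back to 50
};

Console.WriteLine(CoValueTitle(coValues[0]));  // city,state
Console.WriteLine(FirstDataLine(recs));        // id,city,state

// Hypothetical helper: the title of a coValue is its field titles
// joined with a comma, in field1, field2, field3 order.
static string CoValueTitle(Dictionary<string, string> coValue)
{
    var keys = new[] { "field1", "field2", "field3" };
    return string.Join(",", keys.Where(coValue.ContainsKey).Select(k => coValue[k]));
}

// Hypothetical helper: the first line that is not empty and does not
// start with // or #, which is where a header is expected to be.
static string FirstDataLine(IEnumerable<string> lines) =>
    lines.FirstOrDefault(r => !string.IsNullOrEmpty(r)
                              && !r.StartsWith("//", StringComparison.Ordinal)
                              && !r.StartsWith("#", StringComparison.Ordinal)) ?? "";
```

These three collections, together with a srcFields list, correspond to the arguments of the Inspect signature shown above. The call itself is omitted here because it requires the library's Field type.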