Do Quality Inspect. Performs deep inspection of data records to discover and assess
a variety of structure, syntax, and semantic problems and inconsistencies.
Field information can either be supplied with the 'fields' parameter or extracted from a header line in
records. If using 'fields', there can also be specified datatypes and formats per field which will be
used to detect errors when values do not meet these rules.
fields: list of field objects with attributes-
title: field name
datatype: int, real, bool, date, string. For date,
there should be an entry in field.fmt_date specifying the date format otherwise it is set to ISO yyyyMMdd
fmt_strlen: integer number of characters (>0) if a fixed size is required. Ignored if < 0
fmt_strcase: (upper, lower, empty)
fmt_strcut: (front, back, empty). Used in Refining records. Side to cut characters from if it is larger than specified fmt_strlen. Default is back.
fmt_strpad: (front, back, empty). Used in Refining records. Side to add characters to if it is smaller than specified fmt_strlen. Default is back.
fmt_strpadchar: single character or character alias (-space-, -fslash-, -bslash-, -tab-). Used in Refining records. Character to add if needed to make too small string meet specified fmt_strlen. Default is _
fmt_decimal: number of decimal digits (0-N). Ignored if < 0
fmt_date: without time part- yyyymmdd, yymmdd, yyyyddmm, yyddmm, mmddyyyy, mmddyy, ddmmyyyy, ddmmyy
(mmm= month abbreviation like Jan) yyyymmmdd, yyyyddmmm, ddmmmyyyy, ddmmmyy
(month= full month name like January) yyyymonthdd, yyyyddmonth, ddmonthyyyy, ddmonthyy
with time part: suffix to above date formats as (T=letter T, S=space)- Thhmmss, Thhmm, Thh,
Shhmmss, Shhmm, Shh like mmddyyyyThhmm for 11282024T1407 or 11/28/2024T14:07 or 11-28-2024 14:07
with time zone: if time zone is required at end of time part add suffix Z like mmddyyyyThhmmZ 11282024T1407
covalues: optional list of field titles (2-3) for joint value analysis with each list entry as field1,field2 and optionally with ,field3
recs: list of records. The delimiter should be specified in the settings object ('delim'= comma,pipe,tab,colon)
settings: dictionary object with entries for options to use in inspection. Includes:
delim: record delimiter (comma,pipe,tab,colon). Default is comma.
is_case_sens: is case sensitive (true,false). Default is false.
is_quoted: field values may be enclosed (allows delimiter within) by double quotes (true, false). Default is false.
maxuv: optional. string of integer value that is maximum number of unique values per field
to collect. Default is 50 and set to default if supplied value <1 or >1000
extract_fields: bool whether to read in field titles from header line (first non-comment, non-empty line).
Default is False. If True then has_header must also be True, and submitted 'fields' list will only be
used to copy its datatype and formatting to the report field object. Thus, you can extract field
titles from data set and still define characteristics if desired. If not, ensure 'fields' is empty.
has_header: bool whether has header line in file. Default is True. Must be True if extract_fields is True
Returns report as a QualityAnalysis class instance.
Definition at line 24 of file analyzequality.py.