Architecture
Package Structure
tsqc/
    __init__.py          # Public API: check(), QCResult, rule classes
    checker.py           # Core check() function and rule application
    result.py            # QCResult class with all downstream methods
    config/
        yaml_parser.py   # Parse YAML rule files into Rule objects
    rules/
        base.py          # Abstract Rule base class
        builtins.py      # NullRule, FlatlineRule, DeltaRule, RangeRule, CustomRule
    time_health/
        checker.py       # Timestamp validation (gaps, duplicates, drift, DST)
    viz/
        rle.py           # Run-length encoding for quality segments
        timeline.py      # Plotly Gantt-style timeline figure builder
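To illustrate what "run-length encoding for quality segments" means in practice, here is a minimal sketch. The function name `rle_segments` and the `(label, start_index, end_index)` tuple layout are hypothetical and chosen for illustration; they are not taken from `viz/rle.py`. The idea is to collapse consecutive rows that share a quality label into one segment, which is the shape a Gantt-style timeline needs.

```python
from itertools import groupby


def rle_segments(labels):
    """Collapse runs of equal labels into (label, start_index, end_index).

    Hypothetical helper for illustration only; end_index is inclusive.
    """
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of rows in this run
        segments.append((label, position, position + length - 1))
        position += length
    return segments


quality = ["good", "good", "sus", "sus", "sus", "bad", "good"]
segments = rle_segments(quality)
```

Seven rows become four segments here, so a timeline figure would draw one bar per segment rather than one per row.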
Data Flow
- Input: User provides a pandas DataFrame with timestamp, tag_name, and value columns
- Validation: The required columns are checked for presence, and timestamps are normalized to UTC
- Rule Resolution: Rules come from a YAML file, a programmatic list, or auto-configured defaults
- Per-Tag Processing: Each tag's data is sorted by time, then rules are applied as vectorized operations
- Quality Assignment: Each row gets quality (good/sus/bad) and quality_reasons columns
- QCResult: The annotated DataFrame is returned inside a QCResult object, which provides the downstream methods
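The steps above can be sketched in plain pandas. This is an illustration under stated assumptions, not the library's implementation: `check_sketch` is a hypothetical function, the single range test with `low` and `high` stands in for the full rule set, and the `"range"` reason string is invented for the example. Only the column names (`timestamp`, `tag_name`, `value`, `quality`, `quality_reasons`) come from the description above.

```python
import pandas as pd

REQUIRED_COLUMNS = ("timestamp", "tag_name", "value")


def check_sketch(df, low=0.0, high=100.0):
    """Hypothetical pipeline: validate, normalize to UTC, process each tag."""
    # Validation: fail early if a required column is absent.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    out = df.copy()
    # utc=True converts tz-aware values to UTC and treats naive ones as UTC.
    out["timestamp"] = pd.to_datetime(out["timestamp"], utc=True)

    annotated = []
    # Per-tag processing: sort each tag by time, then apply the rule
    # to the whole column at once.
    for _, group in out.groupby("tag_name", sort=False):
        group = group.sort_values("timestamp").copy()
        in_range = group["value"].between(low, high)
        group["quality"] = in_range.map({True: "good", False: "bad"})
        group["quality_reasons"] = in_range.map({True: "", False: "range"})
        annotated.append(group)
    return pd.concat(annotated, ignore_index=True)


raw = pd.DataFrame({
    "timestamp": ["2024-01-01 00:10", "2024-01-01 00:00", "2024-01-01 00:00"],
    "tag_name": ["pump_a", "pump_a", "pump_b"],
    "value": [150.0, 42.0, 7.0],
})
result = check_sketch(raw)
```

In this sketch the out-of-order `pump_a` rows are sorted before the rule runs, and the 150.0 reading is the only row marked `bad`.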
Design Decisions
Pandas-Native
The library uses pandas DataFrames throughout, making it easy to integrate into existing data pipelines that already use pandas.
Vectorized Rule Application
Rules are applied using vectorized numpy operations, not Python loops, for performance at scale.
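As a rough illustration of the difference, the sketch below evaluates a range condition and a null condition over a whole array with numpy boolean masks, then repeats the range test as a per-element Python loop for comparison. The thresholds and variable names are assumptions made for the example and do not reflect how the built-in rules are written.

```python
import numpy as np

values = np.array([12.0, 55.0, np.nan, 130.0, -4.0])
low, high = 0.0, 100.0  # illustrative thresholds

# One boolean mask per condition, computed over the whole array at once.
is_null = np.isnan(values)
out_of_range = (values < low) | (values > high)

# The same range test as a per-element loop, shown only for comparison.
# Comparisons against NaN are False in both versions.
loop_result = [bool(v < low or v > high) for v in values]
```

Both versions produce the same flags; the masked version does the comparison in compiled code rather than once per Python iteration.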
Three-Level Classification
Unlike binary pass/fail systems (Pecos, Great Expectations), timeseries-qc classifies each row as good, suspect, or bad. The middle level separates data that merits a closer look from data that has clearly failed a check.
Worst-Level Wins
When multiple rules fire for the same row, the worst classification wins. This ensures that any serious issue takes precedence over minor warnings.
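One way to express this is to map each level to an integer severity and take the element-wise maximum across rules. The sketch below assumes a severity order of good < sus < bad and uses made-up per-rule results; the array names are hypothetical and the library may combine results differently.

```python
import numpy as np

# Index doubles as severity: 0 = good, 1 = sus, 2 = bad (assumed ordering).
LEVELS = np.array(["good", "sus", "bad"])

# Hypothetical severity codes from three rules for the same four rows.
null_rule = np.array([0, 0, 2, 0])
range_rule = np.array([0, 1, 0, 2])
delta_rule = np.array([0, 0, 1, 1])

# Element-wise maximum across rules keeps the worst level per row.
worst = np.maximum.reduce([null_rule, range_rule, delta_rule])
quality = LEVELS[worst]
```

The third row shows the precedence: one rule reports bad and another reports sus, and the row ends up bad.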
Offline-First
The HTML report export embeds all JavaScript and styling, making it viewable without internet access.
Next Steps
- Rule Engine — how rules work
- API Reference — complete method documentation
- Contributing — how to contribute to the library