User Guide¶

A comprehensive walkthrough of the timeseries-qc library.

Loading Data¶

timeseries-qc works with any pandas DataFrame containing the following columns:

Column	Required	Description
`timestamp`	Yes	Datetime column (tz-aware or tz-naive)
`tag_name`	No	Sensor identifier for multi-tag data
`value`	Yes	Numeric measurement to check

Multi-Tag Data¶

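import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2026-01-01 00:00", "2026-01-01 01:00", "2026-01-01 02:00"],
    "tag_name": ["SENSOR.A", "SENSOR.A", "SENSOR.B"],
    "value": [100.0, 100.0, 200.0],
})
# Parse the strings so the column has a datetime dtype (tz-naive here)
df["timestamp"] = pd.to_datetime(df["timestamp"])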

Single-Tag Data¶

Omit the tag_name column or pass tag_col=None:

import tsqc
result = tsqc.check(df, tag_col=None, assume_tz="UTC")

Running Quality Checks¶

Auto-Configured Defaults¶

result = tsqc.check(df, assume_tz="UTC")

When no rules are provided, timeseries-qc configures them automatically from the data using 3-sigma delta thresholding. These defaults are meant as a starting point for common cases; pass explicit rules when a tag needs tighter or domain-specific limits.
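To give a feel for what 3-sigma delta thresholding means, here is a minimal standard-library sketch. It is not the library's implementation: the function name `three_sigma_delta_flags`, the use of the sample standard deviation, and the handling of short or constant series are all assumptions made for illustration.

```python
from statistics import mean, stdev


def three_sigma_delta_flags(values, n_sigma=3.0):
    """Flag samples whose change from the previous sample is an outlier.

    Hypothetical helper, not part of timeseries-qc. Returns one boolean
    per input value; the first is always False because it has no
    predecessor to compare against.
    """
    flags = [False] * len(values)
    if len(values) < 3:
        return flags  # fewer than two deltas: no spread to estimate

    # Sample-to-sample changes: deltas[i - 1] belongs to values[i].
    deltas = [b - a for a, b in zip(values, values[1:])]
    centre = mean(deltas)
    spread = stdev(deltas)
    if spread == 0:
        return flags  # perfectly regular series, nothing stands out

    for i, delta in enumerate(deltas, start=1):
        flags[i] = abs(delta - centre) > n_sigma * spread
    return flags


# Two slightly noisy plateaus joined by a single step change at index 20.
values = [10.0 + 0.1 * (i % 2) for i in range(20)]
values += [15.0 + 0.1 * (i % 2) for i in range(20)]
flagged = [i for i, f in enumerate(three_sigma_delta_flags(values)) if f]
```

Only the step between the two plateaus is flagged. Note that very short series can never exceed a 3-sigma threshold, because a single outlier inflates the standard deviation it is measured against.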

YAML-Driven Rules¶

result = tsqc.check(df, rules="tsqc_rules.yaml")

See the YAML Configuration Guide for the full syntax.

Programmatic Rules¶

import tsqc
from tsqc import FlatlineRule, RangeRule

rules = [
    FlatlineRule(window="1h", min_delta=0.5, level="sus"),
    RangeRule(min_val=0, max_val=100, level="bad"),
]
result = tsqc.check(df, rules=rules, assume_tz="UTC")

Timezone Handling¶

timeseries-qc automatically preserves the timezone of your input data through the entire pipeline:

Tz-naive input: Pass assume_tz="America/Edmonton" (or your source timezone). The library normalises to UTC internally for consistent rule evaluation, then converts all output back to your source timezone.
Tz-aware input: Your existing timezone is detected and used automatically. assume_tz is optional.
Chart display: result.plot() shows the x-axis and hover tooltips in the input timezone.
Data inspection: result.df contains timestamps in the input timezone. Use result.display_tz to see which timezone was applied.

result = tsqc.check(df, assume_tz="America/Edmonton")
print(result.display_tz)  # "America/Edmonton"
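The round trip described above (localise tz-naive input, evaluate in UTC, convert back) can be sketched with the standard library alone. This is an illustration, not the library's code: `round_trip` is a hypothetical helper, and a fixed UTC-7 offset stands in for America/Edmonton in January to keep the example dependency-free. Real code would use a full timezone database such as `zoneinfo` so that DST transitions are handled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offset standing in for America/Edmonton in winter (MST).
MST = timezone(timedelta(hours=-7), "MST")


def round_trip(naive, source_tz):
    """Attach the source timezone, normalise to UTC, then convert back."""
    localized = naive.replace(tzinfo=source_tz)      # interpret wall-clock time
    as_utc = localized.astimezone(timezone.utc)      # form used for rule evaluation
    back = as_utc.astimezone(source_tz)              # form shown to the user
    return as_utc, back


naive = datetime(2026, 1, 1, 0, 0)
as_utc, back = round_trip(naive, MST)
```

The UTC value is seven hours ahead of the wall-clock input, and the converted-back value matches the original local time with the offset attached.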

Interpreting Results¶

Quality Classification¶

Every row is classified as one of three levels:

good — data passed all rules
sus — data triggered a suspect-level rule (e.g., flatline warning)
bad — data triggered a bad-level rule (e.g., null value, out of range)

When multiple rules fire, the worst level wins: bad > sus > good.
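The worst-wins ordering can be expressed as a severity lookup. The sketch below is illustrative only; `SEVERITY` and `worst_level` are hypothetical names, and treating "no rule fired" as good is an assumption.

```python
# Higher number means more severe: bad > sus > good.
SEVERITY = {"good": 0, "sus": 1, "bad": 2}


def worst_level(levels):
    """Return the most severe level, or "good" when no rule fired."""
    return max(levels, key=SEVERITY.__getitem__, default="good")


row_level = worst_level(["good", "sus", "bad"])
```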

Summary¶

result.summary()

Returns a DataFrame with per-tag percentages of good, suspect, and bad data, sorted by pct_bad descending.
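For readers who want to see how such a table is derived, here is a rough standard-library sketch that produces plain dictionaries rather than a DataFrame. Only pct_bad is named in this guide; the `summarize` function and the pct_good and pct_sus keys are assumptions.

```python
from collections import Counter, defaultdict


def summarize(rows):
    """rows: iterable of (tag, level) pairs -> per-tag percentage dicts."""
    counts = defaultdict(Counter)
    for tag, level in rows:
        counts[tag][level] += 1

    summary = []
    for tag, c in counts.items():
        total = sum(c.values())
        summary.append({
            "tag_name": tag,
            "pct_good": 100.0 * c["good"] / total,  # key name is an assumption
            "pct_sus": 100.0 * c["sus"] / total,    # key name is an assumption
            "pct_bad": 100.0 * c["bad"] / total,
        })
    # Worst tags first, matching the ordering described above.
    return sorted(summary, key=lambda r: r["pct_bad"], reverse=True)


rows = [
    ("SENSOR.A", "good"), ("SENSOR.A", "good"),
    ("SENSOR.A", "sus"), ("SENSOR.A", "bad"),
    ("SENSOR.B", "bad"), ("SENSOR.B", "bad"),
    ("SENSOR.B", "good"), ("SENSOR.B", "good"),
]
table = summarize(rows)
```

Here SENSOR.B sorts first because half of its rows are bad, against a quarter for SENSOR.A.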

Issue Breakdown¶

result.issue_summary()

Lists contiguous segments of non-good quality with start/end timestamps, row counts, durations, and the rule names that triggered the issue.
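The idea of collapsing rows into contiguous issue segments can be sketched with `itertools.groupby`. This is not the library's implementation: `non_good_segments` and the dictionary keys are hypothetical, and the sketch assumes that adjacent sus and bad rows belong to the same segment.

```python
from datetime import datetime
from itertools import groupby


def non_good_segments(rows):
    """rows: time-ordered list of (timestamp, level, rule_name) tuples."""
    segments = []
    # groupby splits the sequence into alternating runs of good / non-good rows.
    for is_issue, run in groupby(rows, key=lambda r: r[1] != "good"):
        if not is_issue:
            continue
        run = list(run)
        start, end = run[0][0], run[-1][0]
        segments.append({
            "start": start,
            "end": end,
            "rows": len(run),
            "duration": end - start,
            "rules": sorted({r[2] for r in run}),
        })
    return segments


rows = [
    (datetime(2026, 1, 1, 0), "good", ""),
    (datetime(2026, 1, 1, 1), "sus", "flatline"),
    (datetime(2026, 1, 1, 2), "bad", "range"),
    (datetime(2026, 1, 1, 3), "good", ""),
    (datetime(2026, 1, 1, 4), "bad", "range"),
]
segments = non_good_segments(rows)
```

The five rows collapse into two segments: a two-row segment lasting one hour that involves both rules, and a single-row segment with zero duration.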

Using an External Quality Column¶

If your data already has a quality/status column from a SCADA historian (e.g. OSIsoft PI quality codes, OPC UA status), you can use it directly instead of or alongside the internal rules.

Exclusive Mode — External Quality Only¶

result = tsqc.check(
    df,
    external_quality_col="status",       # column with 0,1,2,3,4 values
    quality_mode="exclusive",
    quality_map={0: "good", 1: "sus", 2: "bad", 3: "bad", 4: "bad"},
    assume_tz="UTC",
)

When a value is not present in quality_map, it is automatically treated as bad with reason external_quality_value: <raw_value>.
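The fail-safe fallback for unmapped values can be pictured as a small lookup function. The sketch below is an assumption-laden illustration: `map_external_quality` is a hypothetical helper, and attaching the same reason format to mapped sus values is a guess rather than documented behaviour.

```python
def map_external_quality(raw_value, quality_map):
    """Translate a historian status code into a (level, reason) pair."""
    reason = f"external_quality_value: {raw_value}"
    if raw_value not in quality_map:
        # Unmapped codes fail safe: treat them as bad.
        return "bad", reason
    level = quality_map[raw_value]
    # Assumption: good rows carry no reason; sus and bad rows carry the raw code.
    return level, ("" if level == "good" else reason)


quality_map = {0: "good", 1: "sus", 2: "bad", 3: "bad", 4: "bad"}
unmapped = map_external_quality(7, quality_map)
```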

Combined Mode — External + Internal Rules¶

Merges both sources with worst-wins logic:

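import tsqc
from tsqc import NullRule, RangeRule  # assumes NullRule is exported like the other rules

result = tsqc.check(
    df,
    external_quality_col="status",
    quality_mode="combined",
    quality_map={0: "good", 1: "sus", 2: "bad"},
    rules=[NullRule(), RangeRule(min_val=0, max_val=100)],
    assume_tz="UTC",
)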

If external says bad and internal says good → final is bad with reason external_quality_value: <raw_value>
If external says good and internal says bad → final is bad and the internal reason is preserved (e.g. null values)
If both say bad → final is bad and the reasons are pipe-delimited, internal first: null values|external_quality_value: <raw_value>
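The three cases above can be reproduced with a short merge function. This is a sketch under stated assumptions, not the library's code: `combine` is a hypothetical helper, and joining reasons from every non-good source (including when one side is sus and the other bad) is an assumption the guide does not confirm.

```python
SEVERITY = {"good": 0, "sus": 1, "bad": 2}


def combine(internal, external):
    """Merge two (level, reason) pairs with worst-wins logic."""
    level = max(internal[0], external[0], key=SEVERITY.__getitem__)
    # Internal reason first, then external, matching the order listed above.
    reasons = [
        reason
        for lvl, reason in (internal, external)
        if lvl != "good" and reason
    ]
    return level, "|".join(reasons)


both_bad = combine(("bad", "null values"), ("bad", "external_quality_value: 3"))
```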

None Mode — Internal Only¶

Ignores the external column entirely. Does not require a quality_map.

result = tsqc.check(df, external_quality_col="status", quality_mode="none", assume_tz="UTC")

This is useful when you want to keep the same code path but toggle off external quality handling.

Column Conflict Handling¶

If your external quality column has the same name as the output column (e.g. both are "quality"), the output is automatically renamed to qc_quality / qc_quality_reasons and the original input column is preserved. A warning is issued.
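The renaming behaviour might look roughly like the following. Everything here is illustrative: `resolve_output_columns` is a hypothetical helper, and the default output names quality and quality_reasons are inferred from the qc_ prefixed names above rather than stated in this guide.

```python
import warnings


def resolve_output_columns(external_quality_col,
                           quality_col="quality",
                           reasons_col="quality_reasons"):
    """Pick output column names that do not overwrite the input column."""
    if external_quality_col == quality_col:
        warnings.warn(
            f"Input column {external_quality_col!r} clashes with the QC output; "
            f"writing results to 'qc_{quality_col}' and 'qc_{reasons_col}'.",
            UserWarning,
            stacklevel=2,
        )
        return f"qc_{quality_col}", f"qc_{reasons_col}"
    return quality_col, reasons_col


# No clash: the default names are kept and no warning is raised.
plain = resolve_output_columns("status")
```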

YAML `quality_map`¶

You can also define the quality map in your YAML rules file:

quality_map:
  0: good
  1: sus
  2: bad
  3: bad
  4: bad

default_rules:
  - check: "null"  # quoted so YAML reads the string "null", not a null value
    level: bad

YAML quality_map takes precedence over the quality_map= function parameter when both are provided.

Timestamp Health¶

result.check_timestamps()

Detects gaps, duplicates, non-monotonic timestamps, frequency drift, and DST ambiguities.
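Three of these checks (gaps, duplicates, and non-monotonic timestamps) can be sketched in a few lines of standard-library Python. This is an illustration, not the library's algorithm: `timestamp_health`, the median-based expected interval, and the `gap_factor` threshold are assumptions, and frequency drift and DST ambiguity are left out.

```python
from datetime import datetime, timedelta
from statistics import median


def timestamp_health(timestamps, gap_factor=1.5):
    """Return indices of duplicate, backwards, and gap-preceded samples."""
    report = {"duplicates": [], "non_monotonic": [], "gaps": []}
    # Step in seconds between each sample and its predecessor.
    steps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    forward = [s for s in steps if s > 0]
    # Assumption: the median forward step approximates the sampling interval.
    expected = median(forward) if forward else 0.0

    for i, step in enumerate(steps, start=1):
        if step == 0:
            report["duplicates"].append(i)
        elif step < 0:
            report["non_monotonic"].append(i)
        elif step > gap_factor * expected:
            report["gaps"].append(i)
    return report


t0 = datetime(2026, 1, 1)
hours = (0, 1, 2, 2, 3, 6, 5, 6, 7, 8)
stamps = [t0 + timedelta(hours=h) for h in hours]
report = timestamp_health(stamps)
```

In this hourly series, index 3 repeats the previous timestamp, index 5 follows a three-hour gap, and index 6 steps backwards in time.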

Generating Reports¶

result.export_report("quality_report.html")

Produces a self-contained HTML report with the timeline chart, summary tables, and timestamp health — no internet connection required.

Next Steps¶

API Reference — complete method documentation
Rule Engine — understanding how rules work
YAML Configuration — creating YAML rule files