Rule Engine¶

The rule engine is the core of timeseries-qc. Rules define what constitutes bad or suspect data.

How Rules Work¶

Each rule is a class that evaluates a pandas Series of values and returns a boolean Series indicating which rows are flagged.

Rules are applied per tag in order. When multiple rules fire for the same row, the worst quality level wins: bad > sus > good.

Built-in Rules¶

NullRule¶

Flags rows where the value is NaN, None, or pd.NA.

Default level: bad
Configuration: {check: null, level: bad}
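
The check can be pictured with plain pandas. This is only a sketch of the idea, assuming the rule behaves like `Series.isna()`; it does not show how NullRule is implemented inside timeseries-qc.

```python
import numpy as np
import pandas as pd

# Object dtype lets None, NaN and pd.NA coexist in one Series.
values = pd.Series([1.0, None, np.nan, pd.NA, 4.2], dtype="object")

# isna() treats all three missing-value markers the same way.
null_mask = values.isna()
print(null_mask.tolist())
```

Rows 1 to 3 come back as flagged; the two numeric rows do not.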

FlatlineRule¶

Flags rows where the value has not changed by more than min_delta within the preceding time window (set by the window parameter).

An optional min_duration filter suppresses flags for flat runs that are shorter than the given duration — useful when short-lived flat periods are normal (e.g. pump starts, cloud edges).

Default level: sus
Parameters:
window (required) — pandas offset alias, e.g. "1h", "30min"
min_delta (optional, default 0.0) — minimum required change to NOT be flagged
min_duration (optional) — pandas offset string; minimum time a continuous flat run must last before rows are flagged. None = no filter

Configuration:

- check: flatline
  window: 1h
  min_delta: 0.001
  level: sus

With min_duration:

- check: flatline
  window: 5min
  min_delta: 0.001
  min_duration: 30min
  level: sus
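
The interaction between window, min_delta and min_duration can be sketched with a time-based rolling window. `flatline_mask` is a hypothetical helper, not part of timeseries-qc, and it rests on two assumptions: "not changed" means the rolling max minus the rolling min is at most min_delta, and the duration of a run is measured from its first flagged row to its last.

```python
import pandas as pd

def flatline_mask(series, window, min_delta=0.0, min_duration=None):
    """Hypothetical helper: True where the trailing window shows no real change."""
    rolling = series.rolling(window)
    spread = rolling.max() - rolling.min()
    # Assumption: a window holding a single sample is not evidence of a flatline.
    flat = (spread <= min_delta) & (rolling.count() >= 2)
    if min_duration is not None:
        # Label each consecutive run of equal flags, then measure its time span.
        run_id = (flat != flat.shift()).cumsum()
        stamps = series.index.to_series()
        run_span = (stamps.groupby(run_id).transform("max")
                    - stamps.groupby(run_id).transform("min"))
        # Keep only flags that belong to a run lasting at least min_duration.
        flat = flat & (run_span >= pd.Timedelta(min_duration))
    return flat

index = pd.date_range("2024-01-01", periods=10, freq="1min")
values = pd.Series([1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0, 6.0], index=index)

unfiltered = flatline_mask(values, "3min")
filtered = flatline_mask(values, "3min", min_duration="5min")
print(unfiltered.tolist())
print(filtered.tolist())
```

Without the filter, the three rows whose trailing three-minute window holds only the value 3.0 are flagged. With min_duration="5min" the same run is too short, so nothing is flagged.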

DST behaviour: The window parameter is measured in elapsed UTC time (not wall-clock time). Timestamps are normalised to UTC internally before rule evaluation, so FlatlineRule(window="1h") means one elapsed UTC hour. During DST transitions:

- Spring-forward: One local wall-clock hour of flat data will span less UTC time (a shorter window), so the rule may flag fewer points than expected.
- Fall-back: Ambiguous timestamps are dropped (set to NaT and flagged as bad), so the rule never evaluates on duplicate local-time rows.
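
The fall-back case can be reproduced with pandas alone. The sketch below assumes naive local timestamps and uses `tz_localize(..., ambiguous="NaT")` as one way to get the behaviour described; the time zone and dates are illustrative, and the library's internal normalisation may differ.

```python
import pandas as pd

# Naive local timestamps around the 2023-10-29 fall-back in Europe/Berlin,
# when the wall-clock hour 02:00 to 02:59 occurs twice.
local = pd.DatetimeIndex(
    ["2023-10-29 01:30", "2023-10-29 02:30", "2023-10-29 03:30"]
)

# ambiguous="NaT" turns timestamps that cannot be resolved into NaT.
localized = local.tz_localize("Europe/Berlin", ambiguous="NaT")
utc = localized.tz_convert("UTC")

# Two wall-clock hours separate the first and last reading,
# but three hours of UTC time have elapsed between them.
elapsed = utc[2] - utc[0]
print(utc)
print(elapsed)
```

A window of "1h" is therefore compared against the elapsed UTC difference, not against the local clock readings.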

DeltaRule¶

Flags rows based on the absolute change from the previous reading. Two independent thresholds are supported:

max_delta — flags when the change is too large (sensor spike / step change)
min_delta — flags when the change is too small (stuck / frozen sensor)

At least one of min_delta or max_delta must be provided.

Default level: sus
Parameters:
min_delta (optional) — minimum required absolute change; changes below this are flagged
max_delta (optional) — maximum allowed absolute change; changes above this are flagged

Configuration (only max):

- check: delta
  max_delta: 100.0
  level: sus

Only min (stuck sensor):

- check: delta
  min_delta: 0.5
  level: sus

Both bounds:

- check: delta
  min_delta: 0.5
  max_delta: 100.0
  level: sus
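
A minimal sketch of the two thresholds, assuming the change is computed as `series.diff().abs()` and that the first row (which has no previous reading) is never flagged. `delta_mask` is a hypothetical helper, not the library's API.

```python
import pandas as pd

def delta_mask(series, min_delta=None, max_delta=None):
    """Hypothetical helper: flag rows whose step from the previous row is out of bounds."""
    if min_delta is None and max_delta is None:
        raise ValueError("at least one of min_delta or max_delta must be provided")
    change = series.diff().abs()  # NaN for the first row, so it never compares True
    mask = pd.Series(False, index=series.index)
    if max_delta is not None:
        mask = mask | (change > max_delta)  # spike or step change
    if min_delta is not None:
        mask = mask | (change < min_delta)  # stuck or frozen sensor
    return mask

readings = pd.Series([10.0, 10.2, 10.2, 150.0, 151.0])
flags = delta_mask(readings, min_delta=0.5, max_delta=100.0)
print(flags.tolist())
```

Rows 1 and 2 move by less than 0.5 and row 3 jumps by more than 100.0, so those three rows are flagged; row 4 changes by 1.0 and passes both bounds.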

RangeRule¶

Flags rows where the value is outside [min, max].

Default level: bad
Parameters: min (lower bound, optional), max (upper bound, optional)
Configuration: {check: range, min: 0, max: 100, level: bad}
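
As a sketch, the range check reduces to two comparisons. It assumes the interval is closed (values equal to a bound are not flagged) and that an omitted bound disables that side; `range_mask` and its `lower`/`upper` argument names are hypothetical.

```python
import pandas as pd

def range_mask(series, lower=None, upper=None):
    """Hypothetical helper: flag values outside the closed interval [lower, upper]."""
    mask = pd.Series(False, index=series.index)
    if lower is not None:
        mask = mask | (series < lower)
    if upper is not None:
        mask = mask | (series > upper)
    return mask

levels = pd.Series([-5.0, 0.0, 50.0, 100.0, 120.0])
out_of_range = range_mask(levels, lower=0, upper=100)
print(out_of_range.tolist())
```

Only -5.0 and 120.0 are flagged here. NaN compares False on both sides in this sketch, so missing values would be left to NullRule.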

OutlierRule¶

Flags rows that are statistical outliers using one of three configurable methods. Supports both global (full-series) and rolling (time-windowed) computation.

Which method should I use?

zscore — Classic approach. Best when your data is roughly normally distributed without extreme outliers in the baseline.
mad — Robust variant using Median Absolute Deviation. Less sensitive to extreme values in the baseline statistics. Good for sensor data with occasional spikes.
iqr — Distribution-free. Works well with skewed data. Tukey's fences (k=1.5) is a standard choice.

Default level: sus
Parameters:
method (required) — One of zscore, mad, iqr
threshold (optional, default 3.0 for zscore/mad, 1.5 for iqr) — Sensitivity
window (optional) — pandas offset alias for rolling mode, e.g. "24h", "7d". Omit or set to null for global mode.
min_periods (optional, default 10) — Minimum non-NaN observations needed

Global mode (full-series):

- check: outlier
  method: zscore
  threshold: 3.0
  level: sus

Rolling mode (time-windowed):

- check: outlier
  method: iqr
  threshold: 2.0
  window: 24h
  level: bad
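
The three methods can be compared side by side with plain pandas. This is a sketch under stated assumptions rather than the library's implementation: the MAD is scaled by 1.4826 so it is comparable to a standard deviation, the rolling variant uses a trailing window that includes the current row, and both helper functions are hypothetical.

```python
import pandas as pd

def outlier_mask(series, method, threshold=None):
    """Hypothetical helper: global (full-series) outlier flags."""
    if method == "zscore":
        k = 3.0 if threshold is None else threshold
        return (series - series.mean()).abs() / series.std() > k
    if method == "mad":
        k = 3.0 if threshold is None else threshold
        median = series.median()
        mad = (series - median).abs().median()
        # Assumption: 1.4826 makes the MAD comparable to a standard deviation.
        return (series - median).abs() / (1.4826 * mad) > k
    if method == "iqr":
        k = 1.5 if threshold is None else threshold
        q1, q3 = series.quantile(0.25), series.quantile(0.75)
        spread = q3 - q1
        return (series < q1 - k * spread) | (series > q3 + k * spread)
    raise ValueError(f"unknown method: {method}")

def rolling_zscore_mask(series, window, threshold=3.0, min_periods=10):
    """Hypothetical helper: z-score against a trailing time window."""
    rolling = series.rolling(window, min_periods=min_periods)
    # Rows with fewer than min_periods observations yield NaN and are not flagged.
    return (series - rolling.mean()).abs() / rolling.std() > threshold

index = pd.date_range("2024-01-01", periods=12, freq="1h")
data = pd.Series(
    [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 50.0],
    index=index,
)

results = {m: outlier_mask(data, m) for m in ("zscore", "mad", "iqr")}
rolling_result = rolling_zscore_mask(data, "24h")
for name, mask in results.items():
    print(name, int(mask.sum()))
```

All three methods flag only the final reading of 50.0 on this sample. The z-score does so narrowly, because the spike inflates the mean and standard deviation it is measured against, which is the weakness that mad and iqr are meant to avoid.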

Rule Ordering¶

Rules are applied in the order they are defined. For each row:

Start with quality = "good"
For each rule, if the rule fires:
  - If rule level is "bad" → quality = "bad"
  - If rule level is "sus" and quality is "good" → quality = "sus"
  - The triggered rule names are appended to quality_reasons
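
The steps above can be sketched as a small merge function. `combine` is a hypothetical helper that takes already-computed boolean masks; the output names `quality` and `quality_reasons` follow the description, but the return shape is an assumption.

```python
import pandas as pd

def combine(fired):
    """fired: list of (rule_name, level, boolean mask) in definition order."""
    index = fired[0][2].index
    quality = pd.Series("good", index=index)
    reasons = [[] for _ in range(len(index))]
    for name, level, mask in fired:
        if level == "bad":
            quality.loc[mask] = "bad"
        elif level == "sus":
            # A "sus" rule only upgrades rows that are still "good".
            quality.loc[mask & (quality == "good")] = "sus"
        # Every rule that fires is recorded, whatever the final quality is.
        for pos, hit in enumerate(mask.tolist()):
            if hit:
                reasons[pos].append(name)
    return quality, pd.Series(reasons, index=index, dtype=object)

fired = [
    ("flatline", "sus", pd.Series([False, True, True, False])),
    ("null", "bad", pd.Series([False, True, False, False])),
    ("delta", "sus", pd.Series([False, True, False, True])),
]
quality, quality_reasons = combine(fired)
print(quality.tolist())
print(quality_reasons.tolist())
```

Row 1 is first marked sus by the flatline rule, then raised to bad by the null rule, and the later sus-level delta rule does not lower it again. All three rule names still appear in its reasons.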

Severity Levels¶

bad — data should be excluded from analysis
sus — data may be unreliable and warrants investigation

Custom Rules¶

You can create custom rules using the CustomRule class:

from tsqc import CustomRule

def check_negative(series):
    return series < 0

rule = CustomRule(fn=check_negative, name="negative", level="bad")
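
Because the function passed as fn is an ordinary callable, it can be tried on a small Series before it is wrapped in a rule. The sketch assumes, in line with how rules are described above, that the callable should return a boolean Series aligned with its input; it does not exercise CustomRule itself.

```python
import pandas as pd

def check_negative(series):
    # Same callable as in the CustomRule example.
    return series < 0

sample = pd.Series([3.0, -1.5, 0.0, float("nan")])
mask = check_negative(sample)

print(mask.tolist())
print(mask.dtype, mask.index.equals(sample.index))
```

Only the -1.5 row is flagged. NaN compares False, so a custom check written this way leaves missing values to NullRule.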

Default Rules¶

When no rules are provided, timeseries-qc auto-configures rules using 3-sigma delta thresholding:

NullRule(level="bad")
FlatlineRule(window="1h", min_delta=0.0, level="sus")
DeltaRule(max_delta=3 * std, level="sus")

Next Steps¶

YAML Configuration — configuring rules via YAML
API Reference — complete rule class documentation
User Guide — walkthrough with examples