Rule Engine¶
The rule engine is the core of timeseries-qc. Rules define what constitutes bad or suspect data.
How Rules Work¶
Each rule is a class that evaluates a pandas Series of values and returns a boolean Series indicating which rows are flagged.
Rules are applied per tag in order. When multiple rules fire for the same row, the worst quality level wins: bad > sus > good.
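As a rough illustration of that contract, the sketch below shows a rule-like class that takes a Series and returns a boolean Series of the same shape. The class name NegativeValueRule and the evaluate method are assumptions made for this example; they are not taken from the library's API.

```python
import pandas as pd


class NegativeValueRule:
    """Hypothetical rule used only for illustration: flags negative readings."""

    name = "negative"
    level = "bad"

    def evaluate(self, series: pd.Series) -> pd.Series:
        # True marks a flagged row; the result shares the input's index.
        return series < 0


values = pd.Series([1.5, -0.2, 3.0])
flags = NegativeValueRule().evaluate(values)
```

Because the mask shares the input's index, masks from several rules can be compared row by row when quality levels are resolved.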
Built-in Rules¶
NullRule¶
Flags rows where the value is NaN, None, or pd.NA.
- Default level: bad
- Configuration: {check: null, level: bad}
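In plain pandas, the same three missing-value markers are all caught by isna(). This is a minimal sketch of the check, not the library's own implementation.

```python
import numpy as np
import pandas as pd

# Object dtype keeps NaN, None, and pd.NA as distinct missing markers.
values = pd.Series([1.0, np.nan, None, pd.NA, 4.0], dtype="object")

# True wherever the reading is missing, whichever marker was used.
null_flags = values.isna()
```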
FlatlineRule¶
Flags rows where the value has not changed by more than min_delta over the preceding time window, whose length is set by window.
An optional min_duration filter suppresses flags for flat runs that are shorter than the given duration — useful when short-lived flat periods are normal (e.g. pump starts, cloud edges).
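One way to picture the check is a rolling max-minus-min spread over a time window, followed by a run-length filter. The sketch below is an assumption about how such a rule could be written in pandas: the function name flatline_mask, the requirement that a row has a full window of history behind it, and the way run duration is measured (first to last timestamp of the run) are all choices made for this example.

```python
import pandas as pd


def flatline_mask(series, window, min_delta=0.0, min_duration=None):
    """Sketch of a windowed flatline check on a Series with a DatetimeIndex."""
    rolling = series.rolling(window)
    # Spread of the values seen in the trailing window (t - window, t].
    spread = rolling.max() - rolling.min()
    # Only judge rows that have a full window of history behind them.
    has_history = (series.index - series.index[0]) >= pd.Timedelta(window)
    flat = (spread <= min_delta) & has_history

    if min_duration is not None:
        # Number each run of consecutive identical flag values.
        run_id = (flat != flat.shift()).cumsum()
        stamps = series.index.to_series()
        run_start = stamps.groupby(run_id).transform("min")
        run_end = stamps.groupby(run_id).transform("max")
        # Keep flags only for runs that last at least min_duration.
        flat &= (run_end - run_start) >= pd.Timedelta(min_duration)
    return flat


idx = pd.date_range("2024-01-01", periods=12, freq="15min", tz="UTC")
values = pd.Series([1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 6, 7], index=idx, dtype=float)

flat = flatline_mask(values, window="30min")
flat_long_only = flatline_mask(values, window="30min", min_duration="2h")
```

In this example the flat run lasts one hour, so it is flagged without a duration filter and suppressed when min_duration is "2h".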
- Default level: sus
- Parameters:
  - window (required): pandas offset alias, e.g. "1h", "30min"
  - min_delta (optional, default 0.0): minimum required change to NOT be flagged
  - min_duration (optional): pandas offset string; minimum time a continuous flat run must last before rows are flagged. None = no filter
- Configuration:
  With min_duration:
DST behaviour: The window parameter is measured in elapsed UTC time (not wall-clock time). Timestamps are normalised to UTC internally before rule evaluation, so FlatlineRule(window="1h") means one elapsed UTC hour. During DST transitions:
- Spring-forward: One local wall-clock hour of flat data will span less UTC time (a shorter window), so the rule may flag fewer points than expected.
- Fall-back: Ambiguous timestamps are dropped (set to NaT and flagged as bad), so the rule never evaluates on duplicate local-time rows.
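The gap between wall-clock and elapsed time can be reproduced with plain pandas. The sketch below uses Europe/Berlin purely as an example zone, and tz_localize(..., ambiguous="NaT") is an assumption about how ambiguous timestamps might be turned into NaT; the library's own normalisation code is not shown in this document.

```python
import pandas as pd

# Fall-back on 2023-10-29: 02:00-03:00 local time occurs twice.
fall_local = pd.DatetimeIndex(
    ["2023-10-29 01:30", "2023-10-29 02:30", "2023-10-29 03:30"]
)
# Timestamps that could be either occurrence become NaT.
fall_utc = fall_local.tz_localize("Europe/Berlin", ambiguous="NaT").tz_convert("UTC")
fall_elapsed = fall_utc[2] - fall_utc[0]

# Spring-forward on 2024-03-31: 02:00-03:00 local time does not exist.
spring_local = pd.DatetimeIndex(["2024-03-31 01:30", "2024-03-31 03:30"])
spring_utc = spring_local.tz_localize("Europe/Berlin").tz_convert("UTC")
spring_elapsed = spring_utc[1] - spring_utc[0]
```

Both pairs of endpoints are two wall-clock hours apart, yet they span three elapsed hours at fall-back and one elapsed hour at spring-forward, which is why a window defined in elapsed time covers a different number of local-time rows around a transition.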
DeltaRule¶
Flags rows based on the absolute change from the previous reading. Two independent thresholds are supported:
- max_delta: flags when the change is too large (sensor spike / step change)
- min_delta: flags when the change is too small (stuck / frozen sensor)
At least one of min_delta or max_delta must be provided.
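The two thresholds can be sketched with diff() and abs() in pandas. The function name delta_mask is hypothetical, and treating the first row (which has no previous reading) as unflagged is an assumption of this example.

```python
import pandas as pd


def delta_mask(series, min_delta=None, max_delta=None):
    """Sketch: flag rows by absolute change from the previous reading."""
    if min_delta is None and max_delta is None:
        raise ValueError("provide min_delta, max_delta, or both")
    # The first row has no previous reading, so its change is NaN
    # and every comparison below evaluates to False for it.
    change = series.diff().abs()
    flagged = pd.Series(False, index=series.index)
    if max_delta is not None:
        flagged |= change > max_delta  # spike or step change
    if min_delta is not None:
        flagged |= change < min_delta  # stuck or frozen sensor
    return flagged


readings = pd.Series([10.0, 10.0, 10.5, 25.0])
spikes = delta_mask(readings, max_delta=5)
stuck = delta_mask(readings, min_delta=0.1)
both = delta_mask(readings, min_delta=0.1, max_delta=5)
```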
- Default level: sus
- Parameters:
  - min_delta (optional): minimum required absolute change; changes below this are flagged
  - max_delta (optional): maximum allowed absolute change; changes above this are flagged
- Configuration (only max):
  Only min (stuck sensor):
  Both bounds:
RangeRule¶
Flags rows where the value is outside [min, max].
- Default level: bad
- Parameters: min (lower bound, optional), max (upper bound, optional)
- Configuration: {check: range, min: 0, max: 100, level: bad}
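A minimal pandas sketch of an inclusive range check follows. The function name range_mask and the argument names lower and upper are hypothetical (chosen to avoid shadowing Python's built-in min and max); values equal to a bound count as inside the range, matching the closed interval [min, max].

```python
import pandas as pd


def range_mask(series, lower=None, upper=None):
    """Sketch: flag values outside the closed interval [lower, upper]."""
    flagged = pd.Series(False, index=series.index)
    if lower is not None:
        flagged |= series < lower
    if upper is not None:
        flagged |= series > upper
    return flagged


readings = pd.Series([-1.0, 0.0, 50.0, 100.0, 101.0])
out_of_range = range_mask(readings, lower=0, upper=100)
```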
OutlierRule¶
Flags rows that are statistical outliers using one of three configurable methods. Supports both global (full-series) and rolling (time-windowed) computation.
Which method should I use?
- zscore: Classic approach. Best when your data is roughly normally distributed without extreme outliers in the baseline.
- mad: Robust variant using Median Absolute Deviation. Less sensitive to extreme values in the baseline statistics. Good for sensor data with occasional spikes.
- iqr: Distribution-free. Works well with skewed data. Tukey's fences (k=1.5) is a standard choice.
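The three methods differ only in which baseline statistics they compare each value against. The sketch below is one possible pandas formulation, not the library's code: the function names, the 1.4826 scaling of the MAD (a common convention that makes it comparable to a standard deviation for normal data), and the rolling MAD approximation are assumptions. The default thresholds and min_periods mirror the parameter list for this rule.

```python
import pandas as pd


def _stat(series, window, min_periods, name, *args):
    # Global mode: one scalar for the whole series.
    # Rolling mode: one value per row from the trailing time window.
    source = series if window is None else series.rolling(window, min_periods=min_periods)
    return getattr(source, name)(*args)


def outlier_mask(series, method, threshold=None, window=None, min_periods=10):
    if threshold is None:
        threshold = 1.5 if method == "iqr" else 3.0
    if window is None and series.count() < min_periods:
        return pd.Series(False, index=series.index)  # too little data to judge

    def stat(data, name, *args):
        return _stat(data, window, min_periods, name, *args)

    if method == "zscore":
        score = (series - stat(series, "mean")).abs() / stat(series, "std")
        return score > threshold
    if method == "mad":
        deviation = (series - stat(series, "median")).abs()
        # In rolling mode this is an approximation of the windowed MAD.
        score = deviation / (1.4826 * stat(deviation, "median"))
        return score > threshold
    if method == "iqr":
        q1, q3 = stat(series, "quantile", 0.25), stat(series, "quantile", 0.75)
        spread = q3 - q1
        return (series < q1 - threshold * spread) | (series > q3 + threshold * spread)
    raise ValueError(f"unknown method: {method!r}")


idx = pd.date_range("2024-01-01", periods=21, freq="1h", tz="UTC")
values = pd.Series([10, 11, 9, 10, 12] * 4 + [50], index=idx, dtype=float)

global_flags = {m: outlier_mask(values, m) for m in ("zscore", "mad", "iqr")}
rolling_flags = outlier_mask(values, "iqr", window="12h")
```

In rolling mode, rows with fewer than min_periods observations in their window produce NaN statistics and are therefore left unflagged in this sketch.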
- Default level: sus
- Parameters:
  - method (required): one of zscore, mad, iqr
  - threshold (optional, default 3.0 for zscore/mad, 1.5 for iqr): sensitivity
  - window (optional): pandas offset alias for rolling mode, e.g. "24h", "7d". Omit or set to null for global mode.
  - min_periods (optional, default 10): minimum non-NaN observations needed
- Global mode (full-series):
- Rolling mode (time-windowed):
Rule Ordering¶
Rules are applied in the order they are defined. For each row:
- Start with quality = "good"
- For each rule, if the rule fires:
  - If rule level is "bad" → quality = "bad"
  - If rule level is "sus" and quality is "good" → quality = "sus"
  - The triggered rule names are appended to quality_reasons
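The steps above can be sketched as a small function that folds an ordered list of rule results into a quality column and a reasons column. The function name combine, the tuple layout, and the rule names in the example are hypothetical; only the escalation logic follows the steps listed here.

```python
import pandas as pd

SEVERITY = {"good": 0, "sus": 1, "bad": 2}


def combine(index, fired):
    """fired: ordered list of (rule_name, level, boolean mask) tuples."""
    quality = pd.Series("good", index=index)
    reasons = [[] for _ in index]
    for name, level, mask in fired:
        # Escalate only where this rule's level is worse than the current one.
        worse = mask & (quality.map(SEVERITY) < SEVERITY[level])
        quality[worse] = level
        # Record every rule that fired, whether or not it changed the level.
        for pos in mask.to_numpy().nonzero()[0]:
            reasons[pos].append(name)
    return pd.DataFrame({"quality": quality, "quality_reasons": reasons})


index = pd.RangeIndex(4)
fired = [
    ("null", "bad", pd.Series([False, True, False, False], index=index)),
    ("flatline", "sus", pd.Series([False, True, True, False], index=index)),
    ("range", "bad", pd.Series([False, False, False, True], index=index)),
]
result = combine(index, fired)
```

Row 1 is flagged by both a bad rule and a sus rule; it stays bad, and both rule names are kept in its reasons.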
Severity Levels¶
- bad — data should be excluded from analysis
- sus — data may be unreliable and warrants investigation
Custom Rules¶
You can create custom rules using the CustomRule class:
from tsqc import CustomRule

def check_negative(series):
    # Return a boolean Series: True marks rows to flag.
    return series < 0

rule = CustomRule(fn=check_negative, name="negative", level="bad")
Default Rules¶
When no rules are provided, timeseries-qc auto-configures rules using 3-sigma delta thresholding:
NullRule(level="bad")
FlatlineRule(window="1h", min_delta=0.0, level="sus")
DeltaRule(max_delta=3 * std, level="sus")
Next Steps¶
- YAML Configuration — configuring rules via YAML
- API Reference — complete rule class documentation
- User Guide — walkthrough with examples