Data Flagging

A variety of outlier and anomaly detection functions are described below. Additional functions for aggregating the results of anomaly detection tests, and aggregating data quality flags, are also provided.

Outlier detection

All outlier detection functions follow a similar template of inputs and outputs. Each function accepts the following arguments:

  1. A vector of data values;
  2. A logical “mask” used to restrict the calculation of certain parameters to a subset of the data; and
  3. A specification of thresholds that discriminate between non-outliers, “mild” outliers, and “extreme” outliers.

All functions return an ordered factor tagging each data value as a non-outlier (1), a mild outlier (2), or an extreme outlier (3). Some outlier detection functions can alternatively return the actual test statistic or score used to classify the data by specifying the argument return.score = TRUE.
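The input/output template described above can be illustrated with a minimal Python sketch of a median absolute deviation test. This is not the package's implementation: the function name `mad_flags`, its argument names, the default thresholds, and the unscaled score are all assumptions chosen for illustration. The sketch estimates the median and MAD from the masked subset only, then scores and flags every value.

```python
from statistics import median


def mad_flags(values, mask=None, thresholds=(3.0, 6.0), return_score=False):
    """Flag each value as 1 (non-outlier), 2 (mild) or 3 (extreme).

    Hypothetical sketch: the name, defaults and scoring are assumptions,
    not the documented behavior of outlier_mad().
    """
    if mask is None:
        mask = [True] * len(values)
    # The mask restricts parameter estimation to a subset of the data.
    subset = [v for v, keep in zip(values, mask) if keep]
    center = median(subset)
    mad = median(abs(v - center) for v in subset)
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined")
    # Every value is scored, including values excluded by the mask.
    scores = [abs(v - center) / mad for v in values]
    if return_score:
        return scores
    mild, extreme = thresholds
    return [3 if s > extreme else 2 if s > mild else 1 for s in scores]


values = [10.0, 10.2, 9.9, 10.1, 10.0, 10.5, 14.0]
mask = [True, True, True, True, True, False, False]
print(mad_flags(values, mask))  # [1, 1, 1, 1, 1, 2, 3]
```

Excluding the last two values from the mask keeps suspected outliers from inflating the scale estimate, which is one plausible reason to supply a mask.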

outlier_mad()

Median absolute deviation test for outliers

outlier_tscore()

t-score test for outliers

outlier_tukey()

Tukey's test for outliers
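As an illustration of how a Tukey-style test can map onto the three outcome levels, the Python sketch below uses the conventional fences of 1.5 and 3 times the interquartile range. The name `tukey_flags`, the multipliers, and the quartile method are assumptions; the defaults used by outlier_tukey() are not stated here and may differ.

```python
from statistics import quantiles


def tukey_flags(values, k_mild=1.5, k_extreme=3.0):
    """Flag values by their distance outside the interquartile range.

    Hypothetical sketch using the conventional 1.5 and 3 IQR fences.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    def flag(v):
        # Distance beyond the nearer quartile; zero inside the box.
        dist = max(q1 - v, v - q3, 0.0)
        if dist > k_extreme * iqr:
            return 3  # extreme outlier
        if dist > k_mild * iqr:
            return 2  # mild outlier
        return 1      # non-outlier

    return [flag(v) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
print(tukey_flags(data))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3]
```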

Real-time quality control

All real-time quality control functions accept a vector of data values, and may additionally require parameters such as:

  • A set of “user thresholds” defining some expected behavior based on expert judgment.
  • A set of “sensor thresholds” defining some expected behavior based on sensor design or manufacturer guidelines.
  • Test-specific parameters.

All functions return an ordered factor tagging each data value as “pass” (1), “suspect” (2), or “fail” (3). Note that not every test can produce all three outcomes; for example, the “gap test” only returns “pass” or “fail” flags, and the “rate of change test” only returns “pass” or “suspect” flags.
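To make the roles of the two threshold sets concrete, here is a minimal Python sketch of a range-style test. It assumes that values outside the sensor thresholds fail and values inside the sensor thresholds but outside the user thresholds are suspect, a common convention for this kind of test. The name `range_flags` and its arguments are hypothetical and do not describe the signature of rtqc_range().

```python
def range_flags(values, user_thresholds, sensor_thresholds):
    """Flag values as 1 (pass), 2 (suspect) or 3 (fail).

    Hypothetical sketch: thresholds are (low, high) pairs, and the
    sensor range is assumed to contain the user range.
    """
    user_lo, user_hi = user_thresholds
    sensor_lo, sensor_hi = sensor_thresholds
    flags = []
    for v in values:
        if v < sensor_lo or v > sensor_hi:
            flags.append(3)  # outside what the sensor can report
        elif v < user_lo or v > user_hi:
            flags.append(2)  # plausible for the sensor, unexpected locally
        else:
            flags.append(1)
    return flags


temps = [5.0, 12.0, 31.0, 45.0, -3.0]
print(range_flags(temps, (0.0, 30.0), (-2.0, 40.0)))  # [1, 1, 2, 3, 3]
```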

rtqc_attenuation()

Attenuation Test

rtqc_flat()

Flat Line Test

rtqc_gap()

Gap Test

rtqc_range()

Range Test

rtqc_rate()

Rate Test

rtqc_rate_alt()

Rate Test (Alternate)

rtqc_spike()

Spike Test

rtqc_spike_alt()

Spike Test (Alternate)

Smoothing

Functions for smoothing data (e.g., tidal filtering) are provided.

smooth_godin()

Godin Smoother
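For readers unfamiliar with the technique, a Godin filter is usually described as three successive running means of 24, 24, and 25 hours applied to hourly data. The Python sketch below illustrates that idea under the assumption of regularly spaced hourly values; the function names and the handling of series ends (left as None) are assumptions and may not match smooth_godin().

```python
def running_mean(values, window, left):
    """Running mean using `left` points before and the rest after each index.

    Positions without a complete window of non-missing values are None.
    """
    n = len(values)
    right = window - 1 - left
    out = [None] * n
    for i in range(left, n - right):
        chunk = values[i - left:i + right + 1]
        if any(v is None for v in chunk):
            continue
        out[i] = sum(chunk) / window
    return out


def godin_smooth(hourly):
    """Hypothetical sketch of a 24-24-25 hour Godin filter."""
    # The two 24-point windows are offset in opposite directions so that
    # the combined 71-point filter is symmetric and adds no phase shift.
    step = running_mean(hourly, 24, left=11)
    step = running_mean(step, 24, left=12)
    return running_mean(step, 25, left=12)


smoothed = godin_smooth([float(h) for h in range(100)])
print(smoothed[50])  # 50.0: a linear trend passes through unchanged
```

The filter needs 35 hours of data on each side of a point, so the first and last 35 positions of the output are None in this sketch.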

Adjustment

Functions for adjusting data, e.g., to correct for an offset or for drift over time, are provided.

adjust_known()

Adjust for Known Drift

adjust_linear()

Adjust for Linear Drift
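A linear drift correction can be sketched as follows, assuming the drift is zero at the first observation and grows linearly with time to a known total at the last observation. The name `adjust_linear_drift` and its arguments are hypothetical; adjust_linear() may parameterize the correction differently.

```python
def adjust_linear_drift(values, times, total_drift):
    """Remove a drift that grows linearly from 0 to `total_drift`.

    Hypothetical sketch: `times` are numeric (e.g., hours since deployment).
    """
    t0, t1 = times[0], times[-1]
    span = t1 - t0
    if span == 0:
        raise ValueError("times must span a nonzero interval")
    # Subtract the fraction of the total drift accumulated at each time.
    return [v - total_drift * (t - t0) / span for v, t in zip(values, times)]


readings = [10.0, 10.5, 11.0]
print(adjust_linear_drift(readings, [0, 1, 2], total_drift=1.0))
# [10.0, 10.0, 10.0]
```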

Gap Filling

Functions for filling data gaps, e.g., via linear interpolation or time-series modeling, are provided.

gapfill_kalman()

ARIMA and Kalman Filter Gap Fill
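The simpler of the two approaches mentioned above, linear interpolation, can be sketched in Python as shown below. This is not how gapfill_kalman() works, since that function relies on time-series modeling. The sketch assumes evenly spaced observations with gaps marked as None, and the name `fill_linear` is hypothetical.

```python
def fill_linear(values):
    """Fill interior runs of None by linear interpolation.

    Hypothetical sketch: leading and trailing gaps are left unfilled
    because they lack an observed value on one side.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        # The gap covers indices start..i-1.
        if start == 0 or i == n:
            continue
        lo, hi = filled[start - 1], filled[i]
        width = i - start + 1
        for k in range(start, i):
            filled[k] = lo + (hi - lo) * (k - start + 1) / width
    return filled


print(fill_linear([1.0, None, None, 4.0, None]))
# [1.0, 2.0, 3.0, 4.0, None]
```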