Theory Introduction of Outlier Flag

It is generally agreed that outlier detection and identification are of primary importance to the quality control (QC) of observation data. As a result of station moves, changes in the environment surrounding a station, frequent changes in observational criteria, and similar factors, surface meteorological data series become more complex and inhomogeneous, and a wide variety of erroneous values have to be detected. Automatic quality control with this software is faster and more convenient than the traditional procedure, which relies entirely on manual checking.

In this software, the data quality control operation includes the following three steps:

(1) Check the target file in time order to find missing data and remove duplicate data.

(2) The system automatically flags outliers.

(3) Advanced manual identification after the program check.
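Step (1) can be illustrated with a minimal Python sketch. It is not the software's own code: the function name check_time_order, the (datetime, value) record layout, and the constant observation interval are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def check_time_order(records, step):
    """Sort records by time, drop duplicate timestamps, list missing ones.

    records: list of (datetime, value) pairs (hypothetical layout)
    step:    expected timedelta between observations (assumed constant)
    """
    seen = {}
    for stamp, value in sorted(records, key=lambda r: r[0]):
        # Keep the first record for each timestamp; later repeats are dropped.
        seen.setdefault(stamp, value)
    cleaned = sorted(seen.items())

    missing = []
    if cleaned:
        expected = cleaned[0][0]
        last = cleaned[-1][0]
        # Walk the expected time grid and note every absent timestamp.
        while expected <= last:
            if expected not in seen:
                missing.append(expected)
            expected += step
    return cleaned, missing
```

Called with hourly records that contain one repeated hour and one skipped hour, the function returns the series without the repeat together with the timestamp of the skipped hour.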

Step two, which is the core of the software, identifies outliers with a three-step procedure based on object analysis. The three checks are described in detail below.

(1) Limitation Check

First, the check determines whether the target is a numeric value and assigns the code "V1" according to that judgement. Outliers are then flagged against a maximum limit and a minimum limit. When the value of the target point falls outside the range defined by these limits, it is judged to be an outlier and the corresponding code is "V2". The limit values depend on the characteristics of the target meteorological data.
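The limitation check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the software's implementation: the function name limit_check is hypothetical, "V1" is assumed to mark an entry that is not numeric, and NaN is assumed to count as non-numeric.

```python
import math


def limit_check(value, lower, upper):
    """Return "V1", "V2", or None for one raw observation.

    lower, upper: minimum and maximum limits chosen for the data type.
    """
    try:
        number = float(value)
    except (TypeError, ValueError):
        return "V1"  # assumption: V1 marks a non-numeric entry
    if math.isnan(number):
        return "V1"  # assumption: NaN is treated as non-numeric
    if number < lower or number > upper:
        return "V2"  # outside the [lower, upper] range
    return None      # passes the limitation check


# Illustrative limits only; real limits depend on the meteorological element.
codes = [limit_check(v, -80.0, 60.0) for v in ["12.3", "abc", "75.2"]]
# codes == [None, "V1", "V2"]
```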

(2) Error Check

In this step, the user sets the number of error points and the number of quantile points. These are the numbers of surrounding points used as the basis for the judgement, with the target point in the median (middle) position. For each of the two settings, the mean of half of the surrounding points is calculated. Subtracting each of the two means in turn from the target point value gives the error (absolute deviation) and the quantile error. The quantile value is then set, and the quantile of the error is checked according to the number of surrounding points. An outlier is flagged when

Ei - fEq > 0

where Ei is the error of target point i, Eq is equal to the quantile error multiplied by the quantile, and f is a user-defined multiplication factor.
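The decision rule itself can be written as a short Python sketch. It takes the error and the quantile error as inputs that have already been computed, because the exact windowing is left to the software. The function name error_flag is hypothetical, and the default values are the typical ones quoted in this document.

```python
def error_flag(e_i, quantile_error, quantile=0.9, f=2.3):
    """Apply the rule Ei - f*Eq > 0, with Eq = quantile error * quantile.

    e_i:            error (absolute deviation) of the target point
    quantile_error: quantile error of the target point
    quantile, f:    user-defined parameters (typical values as defaults)
    """
    e_q = quantile_error * quantile
    return e_i - f * e_q > 0


# Worked example: a quantile error of 2.0 gives Eq = 1.8 and f*Eq = 4.14,
# so an error of 5.0 is flagged while an error of 4.0 is not.
flag_large = error_flag(5.0, 2.0)
flag_small = error_flag(4.0, 2.0)
```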

(3) Standard Deviation Check

First, set the number of surrounding points for the standard deviation check, with the target point in the median (middle) position. Then calculate the average of half of these points, and the error (absolute deviation) obtained by subtracting that average from the target point value. On this basis, the standard deviation is calculated, and an outlier is flagged when

Ei - kEsd > 0

where Ei is the error of target point i. Because the number of surrounding points differs between steps 2 and 3, the value of Ei also differs. Esd is the standard deviation, and k is a user-defined multiplication factor.
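A minimal Python sketch of the standard deviation check follows. The text above does not fix every detail, so the sketch makes several assumptions: the window takes n_points // 2 neighbours on each side of the target, the target itself is excluded, the average and the standard deviation are both computed over those neighbours, and the population form of the standard deviation is used. The function name sd_check is hypothetical.

```python
from statistics import mean, pstdev


def sd_check(series, i, n_points, k=3.0):
    """Flag series[i] when Ei - k*Esd > 0 (sketch under stated assumptions).

    series:   sequence of numeric observations in time order
    i:        index of the target point
    n_points: number of surrounding points (half taken on each side)
    k:        user-defined multiplication factor
    """
    values = list(series)
    half = n_points // 2
    # Neighbours on both sides of the target; the target is excluded.
    neighbours = values[max(0, i - half):i] + values[i + 1:i + 1 + half]
    if len(neighbours) < 2:
        return False  # assumption: too few points to judge
    e_i = abs(values[i] - mean(neighbours))  # error (absolute deviation)
    e_sd = pstdev(neighbours)                # standard deviation
    return e_i - k * e_sd > 0
```

For a series such as [10.0, 10.2, 9.8, 25.0, 10.1, 9.9, 10.0] with n_points = 6, the value 25.0 at index 3 is flagged. The value at index 1 is not flagged, because the spike falls inside its window and inflates the standard deviation.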

A typical value of f used to identify extreme outliers in this software is 2.3, and a typical value of k is 3. Meanwhile, we always set the quantile to 0.9. All of these parameters can be changed by the user depending on the observational data type.

 


Copyright 2015 Outlier Flag development team. All rights reserved.