kepoutlier: remove or replace statistical outliers from time series data
kepoutlier(infile, outfile=None, datacol='SAP_FLUX', nsig=3.0, stepsize=1.0, npoly=3, niter=1, operation='remove', ranges='0, 0', plot=False, plotfit=False, overwrite=False, verbose=False, logfile='kepoutlier.log')
kepoutlier – Remove or replace statistical outliers from time series data
kepoutlier identifies data outliers relative to piecemeal best-fit polynomials. Outliers are either removed from the output time series or replaced by a noise-treated value defined by the polynomial fit. Identified outliers and the best fit functions are optionally plotted for inspection purposes.
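The procedure can be pictured with a short NumPy sketch. This is not the kepoutlier source: the function name find_outliers is hypothetical, and the way blocks are cut and the rms is computed are assumptions made for illustration. It splits the time series into blocks of stepsize days, fits a polynomial of order npoly to each block, and flags points lying more than nsig times the rms from the fit, refitting up to niter times with the flagged points excluded.

```python
import numpy as np


def find_outliers(time, flux, nsig=3.0, stepsize=1.0, npoly=3, niter=1):
    """Return a boolean mask that is True where a point is flagged as an outlier.

    Illustrative only: block boundaries and rms definition are assumptions.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    outlier = np.zeros(time.size, dtype=bool)

    # Assign each point to a block of `stepsize` days, counted from the first timestamp.
    block = np.floor((time - time.min()) / stepsize).astype(int)

    for b in np.unique(block):
        idx = np.flatnonzero(block == b)
        if idx.size <= npoly + 1:
            continue  # too few points to constrain the polynomial

        t = time[idx] - time[idx].mean()  # centre the times for numerical stability
        f = flux[idx]
        good = np.ones(idx.size, dtype=bool)

        for _ in range(niter):
            if good.sum() <= npoly + 1:
                break
            # Fit only the points currently considered good.
            coeffs = np.polyfit(t[good], f[good], npoly)
            resid = f - np.polyval(coeffs, t)
            rms = np.std(resid[good])
            new_good = np.abs(resid) <= nsig * rms
            if np.array_equal(new_good, good):
                break  # the clip has converged
            good = new_good

        outlier[idx] = ~good

    return outlier
```

Passing the resulting mask to `time[~mask]` and `flux[~mask]` would mimic the effect of removing the flagged rows.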
infile : str
The name of a MAST standard format FITS file containing a Kepler light curve within the first data extension.
outfile : str
The name of the output FITS file.
outfile will be a direct copy of infile with either the data outliers removed (i.e. the table will have fewer rows) or the outliers corrected according to a best-fit function and a noise model.
datacol : str
The column name containing data stored within extension 1 of infile. This data will be searched for outliers. Typically this name is SAP_FLUX (Simple Aperture Photometry fluxes) or PDCSAP_FLUX (Pre-search Data Conditioning fluxes).
nsig : float
The sigma clipping threshold. Data deviating from a best fit function by more than the threshold will be either removed or corrected according to the user selection of operation.
stepsize : float
The data within datacol is unlikely to be well represented by a single polynomial function. stepsize splits the data into a series of time blocks, each of which is fit independently by a separate function. The user can make an informed choice of stepsize after inspecting the data with the kepdraw tool. Units are days.
npoly : int
The polynomial order of each best-fit function.
niter : int
If outliers are found in a particular data section, that data will be removed temporarily and the time series fit again. This will be iterated niter times before freezing upon the best available fit.
operation : str
remove throws away outliers. The output data table will be smaller than or equal in size to the input table.
replace substitutes each outlier with a value consistent with the best-fit polynomial function plus a random component. The random component is defined by the rms of the data relative to the fit and is calculated using the inverse normal cumulative function and a random number generator.
ranges : str
The user can choose specific time ranges of data on which to work. This could, for example, avoid removing known stellar flares from a dataset. Time ranges are supplied as comma-separated pairs of Barycentric Julian Dates (BJDs). Multiple ranges are separated by a semi-colon. An example containing two time ranges is:
To operate on the entire time series, provide ranges = '0,0'.
plot : bool
Plot the data and outliers?
plotfit : bool
Overlay the polynomial fits upon the plot?
overwrite : bool
Overwrite the output file?
verbose : bool
Print informative messages and warnings to the shell and logfile?
logfile : str
Name of the logfile containing error and warning messages.
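The replace operation described under operation can be sketched as follows. This is an illustration under stated assumptions, not the task's actual code: the function name replace_outliers, the precomputed model array (the polynomial fit evaluated at every timestamp), and the outlier mask are all hypothetical inputs. Each flagged point is replaced by the model value plus the rms of the unflagged residuals multiplied by the inverse normal cumulative function of a uniform random number.

```python
from statistics import NormalDist

import numpy as np


def replace_outliers(flux, model, outlier, rng=None):
    """Return a copy of flux with flagged points replaced by model + scaled noise."""
    flux = np.asarray(flux, dtype=float)
    model = np.asarray(model, dtype=float)
    outlier = np.asarray(outlier, dtype=bool)
    rng = np.random.default_rng() if rng is None else rng

    # rms of the data relative to the fit, using unflagged points only (an assumption).
    rms = np.sqrt(np.mean((flux[~outlier] - model[~outlier]) ** 2))

    # Uniform deviates, kept strictly inside (0, 1) so the inverse CDF is defined.
    u = np.clip(rng.uniform(size=int(outlier.sum())), 1e-12, 1.0 - 1e-12)

    # Map the uniform deviates to standard normal deviates via the inverse normal CDF.
    unit_normal = NormalDist()
    noise = np.array([unit_normal.inv_cdf(p) for p in u], dtype=float)

    out = flux.copy()
    out[outlier] = model[outlier] + rms * noise
    return out
```

The output has the same length as the input, which matches the description of replace as a correction rather than a row deletion.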
$ kepoutlier kplr002437329-2010355172524_llc.fits --datacol SAP_FLUX --nsig 4 --stepsize 5 --npoly 2 --niter 10 --operation replace --verbose --plot --plotfit
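The format of the ranges parameter (comma-separated start and stop times, with multiple ranges separated by semicolons, and '0,0' meaning the whole time series) can be illustrated with a small parser. The helper name parse_ranges is hypothetical and is not part of the task, and the BJD values in the usage lines are invented purely to show the syntax.

```python
def parse_ranges(ranges):
    """Parse a ranges string into a list of (start, stop) floats.

    Returns None when the string is the '0,0' sentinel, meaning "use everything".
    """
    pairs = []
    for chunk in ranges.split(';'):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        parts = chunk.split(',')
        if len(parts) != 2:
            raise ValueError(f"expected 'start,stop' but got {chunk!r}")
        pairs.append((float(parts[0]), float(parts[1])))

    # A single 0,0 pair selects the whole time series.
    if pairs == [(0.0, 0.0)]:
        return None
    return pairs


# Illustrative values only; these dates do not come from the documentation.
whole_series = parse_ranges('0, 0')
two_windows = parse_ranges('2455000.0,2455001.5;2455010.0,2455012.0')
```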