kepoutlier: remove or replace statistical outliers from time series data

pyke.kepoutlier.kepoutlier(infile, outfile=None, datacol='SAP_FLUX', nsig=3.0, stepsize=1.0, npoly=3, niter=1, operation='remove', ranges='0, 0', plot=False, plotfit=False, overwrite=False, verbose=False, logfile='kepoutlier.log')


kepoutlier identifies outliers in time series data relative to piecewise best-fit polynomials. Outliers are either removed from the output time series or replaced by a value derived from the polynomial fit plus a random noise component. Identified outliers and the best-fit functions can optionally be plotted for inspection.
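For a single block of data, the outlier test can be sketched in Python as below. This is a minimal illustration, not PyKE's implementation. It assumes an ordinary least-squares fit via numpy.polyfit and takes the rms to be the standard deviation of the fit residuals. The helper name flag_outliers is hypothetical.

```python
import numpy as np

def flag_outliers(time, flux, npoly=3, nsig=3.0):
    """Hypothetical helper: fit one polynomial to a block of data and flag
    points deviating from it by more than nsig times the residual rms."""
    coeffs = np.polyfit(time, flux, npoly)   # least-squares polynomial fit
    model = np.polyval(coeffs, time)         # evaluate the fit at each sample
    resid = flux - model
    rms = np.std(resid)                      # assumed definition of the rms
    return np.abs(resid) > nsig * rms, model, rms

# Synthetic block: a linear trend with one injected spike.
t = np.linspace(0.0, 1.0, 50)
f = 1.0 + 0.1 * t
f[25] += 5.0
mask, model, rms = flag_outliers(t, f, npoly=1, nsig=3.0)
print(np.flatnonzero(mask))  # prints [25]
```

With one spike on an otherwise smooth trend, only the spike exceeds the 3-sigma limit.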

Parameters:

infile : str

The name of a MAST standard format FITS file containing a Kepler light curve within the first data extension.

outfile : str

The name of the output FITS file. outfile will be a direct copy of infile with either the data outliers removed (i.e., the table will have fewer rows) or the outliers corrected according to a best-fit function and a noise model.

datacol : str

The name of the column in extension 1 of infile whose data will be searched for outliers. Typically this is SAP_FLUX (Simple Aperture Photometry fluxes) or PDCSAP_FLUX (Pre-search Data Conditioning fluxes).

nsig : float

The sigma-clipping threshold. Data points deviating from the best-fit function by more than nsig standard deviations will be either removed or corrected, depending on the choice of operation.

stepsize : float

The data within datacol are unlikely to be well represented by a single polynomial function. stepsize splits the data into a series of time blocks, each of which is fit independently by a separate function. The user can make an informed choice of stepsize after inspecting the data with the kepdraw tool. Units are days.
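One possible way to group samples into consecutive blocks of width stepsize is sketched below. It assumes that blocks are anchored at the first timestamp and that empty blocks are simply skipped. split_into_blocks is a hypothetical helper, and PyKE may treat block edges or a short final block differently.

```python
import numpy as np

def split_into_blocks(time, stepsize=1.0):
    """Hypothetical helper: return one index array per consecutive block of
    width `stepsize` (same units as `time`, e.g. days)."""
    time = np.asarray(time, dtype=float)
    start = time.min()
    # Block number of each sample, counted from the first timestamp.
    block_id = np.floor((time - start) / stepsize).astype(int)
    # np.unique skips block numbers that contain no samples (data gaps).
    return [np.flatnonzero(block_id == b) for b in np.unique(block_id)]

t = np.array([0.0, 0.4, 0.9, 1.1, 1.6, 3.2])
blocks = split_into_blocks(t, stepsize=1.0)
print([b.tolist() for b in blocks])  # prints [[0, 1, 2], [3, 4], [5]]
```

Each index array would then be fit on its own, so a long gap in the data simply produces no block rather than a poorly constrained fit.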

npoly : int

The polynomial order of each best-fit function.

niter : int

If outliers are found in a particular data section, those points are temporarily removed and the data are fit again. This is repeated up to niter times, after which the best available fit is frozen.
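The fit, clip, refit loop can be sketched as follows. The sketch assumes numpy.polyfit for the fit and the standard deviation of the retained residuals as the rms. The helper iterative_fit is hypothetical, and the early exit when the set of clipped points stops changing is an assumption of this sketch, not documented PyKE behavior.

```python
import numpy as np

def iterative_fit(time, flux, npoly=3, nsig=3.0, niter=1):
    """Hypothetical helper: fit, temporarily drop points beyond nsig*rms,
    and refit, up to `niter` times. Returns (outlier_mask, coefficients)."""
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    keep = np.ones(flux.size, dtype=bool)      # points used in the current fit
    for _ in range(max(niter, 1)):
        coeffs = np.polyfit(time[keep], flux[keep], npoly)
        resid = flux - np.polyval(coeffs, time)
        rms = np.std(resid[keep])              # scatter of the retained points only
        new_keep = np.abs(resid) <= nsig * rms
        if np.array_equal(new_keep, keep):     # clipped set unchanged: stop early
            break
        keep = new_keep
    return ~keep, coeffs

t = np.linspace(0.0, 1.0, 100)
f = 1.0 + 0.1 * t + 0.01 * np.sin(2 * np.pi * 7 * t)  # trend plus a bounded wiggle
f[20] += 10.0   # large outlier
f[70] += 1.0    # smaller outlier, initially masked by the large one
once, _ = iterative_fit(t, f, npoly=1, nsig=3.0, niter=1)
many, _ = iterative_fit(t, f, npoly=1, nsig=3.0, niter=5)
print(np.flatnonzero(once), np.flatnonzero(many))
```

In this synthetic example a single pass flags only index 20, because the large spike inflates the rms. Further iterations also expose the smaller spike at index 70, which shows why niter > 1 can matter when outliers of very different sizes coexist.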

operation : str

  • remove discards outliers. The output data table will be smaller than or equal in size to the input table.
  • replace substitutes each outlier with a value consistent with the best-fit polynomial function plus a random component. The random component is scaled by the rms of the data relative to the fit and is computed by passing a random number through the inverse normal cumulative distribution function.
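The two operation modes can be sketched as below. This assumes the model values, the outlier mask, and the rms come from the per-block polynomial fit. The helper apply_operation is hypothetical. For the replace mode it uses Python's statistics.NormalDist.inv_cdf to turn a uniform random number into a Gaussian deviate, following the description above, but PyKE's exact random number generator and scaling are not reproduced here.

```python
import numpy as np
from statistics import NormalDist

def apply_operation(time, flux, model, outliers, rms, operation="remove", seed=None):
    """Hypothetical helper: drop flagged rows, or replace flagged values with
    the fit plus noise drawn via the inverse normal CDF."""
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    model = np.asarray(model, dtype=float)
    outliers = np.asarray(outliers, dtype=bool)
    if operation == "remove":
        keep = ~outliers
        return time[keep], flux[keep]              # fewer (or equal) rows
    if operation == "replace":
        rng = np.random.default_rng(seed)
        u = rng.random(int(outliers.sum()))        # uniform deviates in [0, 1)
        u = np.clip(u, 1e-12, 1.0 - 1e-12)         # inv_cdf requires 0 < p < 1
        dist = NormalDist(mu=0.0, sigma=rms)       # assumes rms > 0
        noise = np.array([dist.inv_cdf(float(p)) for p in u])
        new_flux = flux.copy()
        new_flux[outliers] = model[outliers] + noise  # fit value plus random scatter
        return time, new_flux                      # same number of rows
    raise ValueError(f"unknown operation: {operation!r}")

t = np.arange(5.0)
flux = np.array([1.0, 1.0, 9.0, 1.0, 1.0])
model = np.ones_like(flux)                         # stand-in for best-fit values
outliers = np.abs(flux - model) > 3.0 * 0.1
t_rm, f_rm = apply_operation(t, flux, model, outliers, rms=0.1, operation="remove")
t_rp, f_rp = apply_operation(t, flux, model, outliers, rms=0.1, operation="replace", seed=42)
```

The remove mode shortens the table, while replace keeps every row and changes only the flagged values.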

ranges : str

The user can choose specific time ranges of data on which to work. This could, for example, avoid removing known stellar flares from a dataset. Time ranges are supplied as comma-separated pairs of Barycentric Julian Dates (BJDs), and multiple ranges are separated by a semicolon. An example containing two time ranges is:

'2455012.48517,2455014.50072;2455022.63487,2455025.08231'

To operate on the entire time series, supply ranges = '0,0'.
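How such a ranges string might be parsed and turned into a time mask is sketched below. The helper names parse_ranges and in_ranges are hypothetical. Treating range endpoints as inclusive and returning an all-True mask for '0,0' are assumptions consistent with the description above, not confirmed details of PyKE's parser.

```python
import numpy as np

def parse_ranges(ranges):
    """Hypothetical helper: parse 'start,stop;start,stop' into (start, stop)
    tuples. Returns [] for '0,0', meaning the whole time series."""
    pairs = []
    for chunk in ranges.split(";"):
        start, stop = (float(v) for v in chunk.split(","))  # float() ignores spaces
        pairs.append((start, stop))
    if pairs == [(0.0, 0.0)]:
        return []
    return pairs

def in_ranges(time, ranges):
    """Hypothetical helper: boolean mask of samples inside any requested range
    (endpoints assumed inclusive)."""
    time = np.asarray(time, dtype=float)
    pairs = parse_ranges(ranges)
    if not pairs:
        return np.ones(time.size, dtype=bool)
    mask = np.zeros(time.size, dtype=bool)
    for start, stop in pairs:
        mask |= (time >= start) & (time <= stop)
    return mask

example = '2455012.48517,2455014.50072;2455022.63487,2455025.08231'
print(parse_ranges(example))
print(in_ranges([2455013.0, 2455020.0, 2455023.0], example))
```

Points outside the selected ranges would then be passed through untouched, which is how known flares can be protected from clipping.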

plot : bool

Plot the data and outliers?

plotfit : bool

Overlay the polynomial fits upon the plot?

overwrite : bool

Overwrite the output file?

verbose : bool

Print informative messages and warnings to the shell and logfile?

logfile : str

Name of the logfile containing error and warning messages.

Examples

$ kepoutlier kplr002437329-2010355172524_llc.fits --datacol SAP_FLUX \
    --nsig 4 --stepsize 5 --npoly 2 --niter 10 --operation replace \
    --verbose --plot --plotfit