kepoutlier: remove or replace statistical outliers from time series data

pyke.kepoutlier.kepoutlier(infile, outfile=None, datacol='SAP_FLUX', nsig=3.0, stepsize=1.0, npoly=3, niter=1, operation='remove', ranges='0, 0', plot=False, plotfit=False, overwrite=False, verbose=False, logfile='kepoutlier.log')
kepoutlier: Remove or replace statistical outliers from time series data
kepoutlier identifies data outliers relative to piecewise best-fit polynomials. Outliers are either removed from the output time series or replaced by a noise-treated value derived from the polynomial fit. Identified outliers and the best-fit functions can optionally be plotted for inspection.
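The core step, fitting one block of data and flagging points that sit far from the fit, can be sketched as follows. This is a minimal NumPy illustration, not PyKE's implementation: the helper name flag_block is hypothetical, and measuring the deviation in units of the rms of the residuals is an assumption about how the threshold is applied.

```python
import numpy as np


def flag_block(time, flux, npoly=3, nsig=3.0):
    """Fit one polynomial to a block of data and flag points far from it.

    Hypothetical helper for illustration; it is not a PyKE function.
    Returns the model, the rms of the residuals, and a boolean outlier mask.
    """
    t = np.asarray(time, dtype=float)
    y = np.asarray(flux, dtype=float)
    t = t - t[0]  # local time origin keeps the polynomial fit well conditioned
    model = np.polyval(np.polyfit(t, y, npoly), t)
    resid = y - model
    rms = np.sqrt(np.mean(resid ** 2))
    # Assumed rule: a point is an outlier if it deviates by more than nsig * rms
    return model, rms, np.abs(resid) > nsig * rms
```

Because a single large outlier inflates the rms of the first fit, a real implementation typically refits after excluding flagged points, which is what the niter parameter controls.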
Parameters:
infile : str
The name of a MAST standard format FITS file containing a Kepler light curve within the first data extension.
outfile : str
The name of the output FITS file. outfile will be a direct copy of infile, with either the data outliers removed (i.e., the table may have fewer rows) or the outliers corrected according to a best-fit function and a noise model.
datacol : str
The name of the column in extension 1 of infile that contains the data to be searched for outliers. Typically this is SAP_FLUX (Simple Aperture Photometry fluxes) or PDCSAP_FLUX (Pre-search Data Conditioning fluxes).
nsig : float
The sigma-clipping threshold, in standard deviations. Data deviating from the best-fit function by more than this threshold are either removed or corrected, depending on the value of operation.
stepsize : float
The data within datacol are unlikely to be well represented by a single polynomial function. stepsize splits the data into a series of time blocks, each of which is fit independently by a separate function. An informed choice of stepsize can be made after inspecting the data with the kepdraw tool. Units are days.
npoly : int
The polynomial order of each best-fit function.
niter : int
If outliers are found in a particular data section, they are removed temporarily and the section is fit again. This is repeated niter times before settling on the best available fit.
operation : str
remove
throws away outliers. The output data table will be smaller than or equal in size to the input table.
replace
replaces outliers with a value consistent with the best-fit polynomial function plus a random component. The random component is scaled by the rms of the data relative to the fit and is calculated using the inverse normal cumulative distribution function and a random number generator.
ranges : str
The user can restrict the task to specific time ranges of data. This could, for example, avoid removing known stellar flares from a dataset. Time ranges are supplied as comma-separated pairs of Barycentric Julian Dates (BJDs), and multiple ranges are separated by a semicolon. An example containing two time ranges is:
'2455012.48517,2455014.50072;2455022.63487,2455025.08231'
If the user wants to correct the entire time series, providing
ranges = '0,0'
will tell the task to operate on the whole time series.
plot : bool
Plot the data and outliers?
plotfit : bool
Overlay the polynomial fits upon the plot?
overwrite : bool
Overwrite the output file?
verbose : bool
Print informative messages and warnings to the shell and logfile?
logfile : str
Name of the logfile containing error and warning messages.
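The parameters above can be combined into a rough end-to-end sketch. The following is a minimal NumPy re-implementation written for illustration only, not PyKE's code: the helper names (parse_ranges, clip_block, outlier_sketch), the inclusive range bounds, block boundaries measured from the first timestamp, the convergence test, and skipping blocks with too few points are all assumptions. Outliers are flagged when they deviate from their block's fit by more than nsig times the rms of the residuals, and replacement values are drawn as the fit plus rms × z, where z is the inverse normal CDF (Python's statistics.NormalDist.inv_cdf) applied to a uniform random number.

```python
import random
from statistics import NormalDist

import numpy as np


def parse_ranges(ranges):
    """Turn 'start,stop;start,stop' BJD pairs into a list of (start, stop) floats."""
    pairs = []
    for chunk in ranges.split(";"):
        start, stop = (float(v) for v in chunk.split(","))
        pairs.append((start, stop))
    return pairs


def clip_block(t, y, npoly, nsig, niter):
    """Iteratively fit one block, temporarily dropping flagged points between fits."""
    keep = np.ones(len(t), dtype=bool)
    for _ in range(max(niter, 1)):  # always fit at least once
        coeffs = np.polyfit(t[keep], y[keep], npoly)
        resid = y - np.polyval(coeffs, t)
        rms = np.sqrt(np.mean(resid[keep] ** 2))
        new_keep = np.abs(resid) <= nsig * rms
        if np.array_equal(new_keep, keep):
            break  # converged: the set of flagged points stopped changing
        keep = new_keep
    return np.polyval(coeffs, t), rms, ~keep


def outlier_sketch(time, flux, nsig=3.0, stepsize=1.0, npoly=3, niter=1,
                   operation="remove", ranges="0, 0", seed=None):
    """Return (time, flux) after removing or replacing outliers.

    Hypothetical re-implementation for illustration; not PyKE's code.
    """
    time = np.asarray(time, dtype=float)
    flux = np.array(flux, dtype=float)  # copy, so the caller's array is untouched
    pairs = parse_ranges(ranges)
    if pairs == [(0.0, 0.0)]:
        work = np.ones(time.shape, dtype=bool)  # '0,0' means the whole series
    else:
        work = np.zeros(time.shape, dtype=bool)
        for start, stop in pairs:
            work |= (time >= start) & (time <= stop)

    outliers = np.zeros(time.shape, dtype=bool)
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf  # standard normal: mean 0, sigma 1
    # Assign every cadence to a stepsize-day block counted from the first timestamp
    block = np.floor((time - time[0]) / stepsize).astype(int)
    for b in np.unique(block[work]):
        idx = np.flatnonzero(work & (block == b))
        if len(idx) <= npoly + 1:
            continue  # too few points to constrain the polynomial
        t = time[idx] - time[idx[0]]  # local time origin keeps the fit well conditioned
        model, rms, bad = clip_block(t, flux[idx], npoly, nsig, niter)
        outliers[idx[bad]] = True
        if operation == "replace":
            for j in np.flatnonzero(bad):
                u = 0.0
                while u == 0.0:  # inv_cdf is defined only for 0 < u < 1
                    u = rng.random()
                flux[idx[j]] = model[j] + rms * inv_cdf(u)

    if operation == "remove":
        return time[~outliers], flux[~outliers]
    return time, flux
```

With operation="remove" the returned arrays are shorter whenever outliers are found, matching the "smaller than or equal in size" behaviour described for remove; with operation="replace" the array length is unchanged and only the flagged cadences receive new values.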
Examples
$ kepoutlier kplr002437329-2010355172524_llc.fits --datacol SAP_FLUX --nsig 4 --stepsize 5 --npoly 2 --niter 10 --operation replace --verbose --plot --plotfit