kepsff: remove motion-correlated noise from aperture light curve data

pyke.kepsff.kepsff(infile, outfile=None, datacol='DETSAP_FLUX', cenmethod='moments', stepsize=5.0, npoly_cxcy=1, sigma_cxcy=10.0, npoly_ardx=6, npoly_dsdt=2, sigma_dsdt=3.0, npoly_arfl=3, sigma_arfl=3.0, plot=False, overwrite=False, verbose=False, logfile='kepsff.log')

kepsff – remove motion-correlated noise from aperture light curve data

The systematic noise within aperture-constructed K2 photometry is dominated by the effects of boresight roll under the force of solar pressure. Roll is minimized by orienting the spacecraft in pointing directions that minimize spacecraft structure asymmetry normal to the Sun-vector and compensated for over campaign durations by regular thruster firings, typically every 6 hours. kepsff provides an open source python implementation of the photometric correction for K2 motion systematics provided by Vanderburg and Johnson (2014).The method will also work upon archived Kepler data and provides a relatively-simple and CPU-friendly alternative to cotrending against basis-vector derived ensemble photometry (kepcotrend), pixel-level principal component analysis (keppca), and PSF fitting (kepprfphot). As well, as computational speed, this method has the advantage of requiring no external data in order to perform corrections. All required data is extracted from individual Kepler light curves stored at the archive, or K2 light curves constructed from archived target pixel files using the kepextract task. In the example figure above, the 12th magnitude target, EPIC 202093219, has a median 6.5-hour standard deviation of 60 parts-per-million (according to the tool kepstddev. After motion correction using kepsff, this quantity is reduced to 36 parts-per-million.

The name kepsff is derived from “Self-Flat-Fielding” (SFF) and is propagated from the Vanderburg and Johnson (2014) paper. However the name is somewhat misleading because the effects corrected for in the K2 case are dominated by aperture losses and source crowding rather than flat-field variations. In essence, kepsff corrects motion-induced aperture-photometry artifacts after constructing a position-flux relation from a detrended (astrophysics-removed) version of the light curve. This tool will find most-employment upon K2 data, especially in the early mission phases when the data archives will contain calibrated target pixel files but no light curves. However, in line with the philosophy of the PyKE tools, kepsff provides the user with a degree of customization to meet the demands of a broad range of target types and science goals. The long-term goal is to provide Kepler and K2 users with a range of reduction options and this tool is part of that software kit. Also, in line with the PyKE philosophy, these algorithms, as coded, are not perfect. We recommend and welcome tinkering and further development by the community in order to make these tools better. This case is also precisely what the K2 community requires in the sense that the archive users, here Andrew Vanderburg and John Johnson, are seeking solutions to their own data reduction needs, becoming self-sufficient, and reporting their findings and methods to us all.

Preparation work is required before kepsff can provide a good correction. First, a light curve needs to be extracted a from Target Pixel File (kepextract) using a user-customized pixel aperture defined within kepmask. The resulting light curve also needs to be detrended, i.e. the astrophysics needs to be removed as much as possible (kepflatten) in order for the correction to be as precise as possible. Note that the astrophysics is not removed permanently, but care has to be taken to prepare the data this way before confidence in the correction can be built for individual light curves on a case-by-case basis.


nfile : str

The name of an input light curve FITS file. This file needs to contain a minimum set of information: Mid-exposure times, cadence numbers, simple-aperture flux, moment-derived and/or PSF-derived target centroids, quality flags and Kepler standard keywords. All of these properties are provided in archived light curves from the Kepler mission, but need to be calculated for the K2 mission. The PyKE tool kepextract calculates all these properties and provides an input file suitable for kepsff. In addition, the input file also requires a column of detrended data, where long-term astrophysics and orbit-related systematics have been removed. The PyKE tool kepflatten performs this pre-step, producing a FITS column of detrended data (at low-frequencies) called DETSAP_FLUX.

outfile : str

The name of the output light curve FITS file. This file has the same structure as the input file with three content changes. The aperture photometry, DETSAP_FLUX, and detrended photometry, DETSAP_FLUX are corrected for motion-correlated noise. The quality flag column, SAP_QUALITY, includes a new bit-flag of value 131072 that identifies exposures collected during on-board thruster firings, typically occurring on a 6-hr cadence.

datacol : str

The name of the FITS data column containing detrended flux data (i.e. low-frequency structure removed). If the input file is a product of kepflatten then this column is named DETSAP_FLUX.

cenmethod : str

kepsff requires target position data on the CCD detector in order to correlate spacecraft boresight motion with systematic signal within the flux time-series. The typical Kepler/K2 light curve files have place-holders for two different measures of source centroid:

  • moments – center-of-light within the pixel aperture (alternatively known as the zeroth-moment of the flux distribution over the pixels). These two arrays, the X and Y centroid over time, are stored in FITS columns MOM_CENTR1 and MOM_CENTR2.
  • psf – the PSF-fitting method, the X and Y locations of best-fit PSF models to the aperture pixels. These are rarely available within archived Kepler data but can be stored in PSF_CENTR1 and PSF_CENTR2. Both sets of centroids are provided in the output from kepextract, where the moments data is generally preferred over the PSF data which derives from the simplifying assumptions that sources are not confused and are well-characterized by symmetric Gaussian profiles.

stepsize : float [days]

Ultimately, kepsff will construct a time-series of flux-correction factors based upon a correlation curve between spacecraft boresight motion and aperture-flux variation. Perhaps due to differential velocity aberration, this calibration is not stable over time. The somewhat clunky solution to this in the current version of the tool is re-calibrate the correlation and apply the subsequent correction over a series of discrete time windows. stepsize identifies the length of these windows in units of days. Some trial and error is required to optimize light curve correction. Window sizes of 4-6 days often provide the optimal correction, but astrophysical structure on these timescales can lead the user to adopt longer or shorter step sizes in order to reduce the unwanted impact of astrophysics on the correction.

npoly_cxcy : int

There is a sequence of four polynomial fits required to characterize and filter the input data in order to produce a flux-motion correlation curve. The following sequence of fit parameters is therefore quite dis-orienting until some familiarity is gained with hands-on experience. The diagnostic plots, provided when plot=True provide significant help here with the four fits delivered in the four left-most panels (figure 1). The first fit is to the centroid coordinates selected by cenmethod. These data points are provided in the top-left panel of the figure and generally trace very-close to a linear path across the detector. Fine-point motion is dominated by roll around the boresight and therefore small-angle, near-linear behavior is unsurprising. The two black lines represent the eigenvectors of the covariance matrix of the centroids. These eigenvectors are used to rotate the centroid distribution so that the direction of motion (x’) is aligned to the x-axis of the rotated system. The rotation functionality requires user-optimization in one aspect. Obvious outliers need to be removed from the sample before the eigenvectors are calculated. We achieve this iterative sigma-clipping. The user defines the order of a polynomial fit (1st order is recommended) and a threshold distance from the best fit where data points outside that threshold are removed before the remaining data are re-fit. Iterations continue until no additional data points are rejected. The rejection threshold is provided by the argument sigma_cxcy. Points lying below the rejection threshold are plotted as green symbols whereas data points rejected by this process will be plotted red. In the example above, no data points are rejected. The most likely cause of outliers is data obtained while the spacecraft is not guiding on reference stars. We predict such occasions to be rare during optimal operations, but such observations were common during the engineering tests, including the first half of campaign 0.

sigma_cxcy : float [sigma]

Threshold for rejecting centroid data. The threshold is the number of 1-sigma standard deviation distances from the best-fit. Points further from the best-fit than the threshold provided will be considered unrepresentative of the flux-motion correlation that the tool is attempting to characterize. These points are not thrown away, they are corrected in the final calculation with the rest of the data, but they play no further part in the calibration of the correlation.

npoly_ardx : int

Step 2 (top-middle panel) is the determination of the relatively small deviation of centroid motion from linear. Here, in the rotated frame, we fit a high-order polynomial to the centroid data filtered and rotated in the first step. The green curve is the integrated path length along that curve (s) minus the linear term. The calculation is provided by equation 4 of Vanderburg and Johnson (2014). The red curve is a polynomial fit to the integral which subsequently allows us to cheaply calculate the absolute motion of the centroids relative to linear location along the dominant eigenvector in the unrotated frame. Producing a ‘fit of a fit’ is somewhat inelegant and open source developers could be challenged here to add some additional mathematical rigor to this step. However this section of the calculation has, so far, not proved to be critical - the path length integral is very close to linear, so cleaning this calculation up is low-priority currently.

npoly_dsdt : int

The third step (lower-left panel) is a prescription to filter out exposures collected during thruster firings. Firing occur typically on a 6-hr cadence and occur when the spacecraft has rolled far enough to trigger a pointing adjustment. These data cannot be corrected using the current method. They are flagged here so that they do not bias the motion-flux correlation calculation and so that they can be documented within the quality flags of the output file. A low-order polynomial is fit to the time derivative along the path length (ds/dt) using 3-sigma iterative clipping. npoly_dsdt provides the user with flexibility to choose the polynomial order. In the plot, the best fit is shown as a red, solid line.

sigma_dsdt : float [sigma]

This is a threshold limit in units of 1-sigma standard deviation from the best-fit. Data points falling outside of the threshold are more-likely-than-not collected during thruster firings. These points are flagged in the SAP_QUALITY column of the output file with the bit 131072 and is not employed to calibrate the flux-motion relation. The threshold curve is plot as a red, dashed line and rejected points are colored red.

npoly_arfl : int

The fourth panel (lower-middle) provides the subsequent target motion vs aperture flux distribution. The red line is polynomial fit, the order of which is chosen by the user here. Iterative \(\sigma\)-clipping is again required - the many outliers below the best-fit are astrophysical in nature (binary star eclipses) and would bias the fit if they were unrecognized or simply allowed to. The user has flexibility to change the order and clipping threshold in order to provide the most robust fit and there is a degree of objectivity in this process. A low polynomial is generally adequate, this example employs a 3rd order function.

sigma_arfl : float [sigma]

This is the threshold limit in units of 1-sigma standard deviation from the best-fit polynomial to the motion-flux relation. Data points falling outside of the threshold are in effect rejected from the calibration. Such points potentially contain astrophysical signal that would both bias the calculation and damp the astrophysical signal within the data after the correction has been made. The best-fit red line provides the correction factors within the data window. For each point along the aperture-derived flux curve (upper-right), the position along the arc can be calculated and the multiplicative correction factor to the flux can determined directly from the best-fit motion-flux relation. The subsequent corrected light curve is provided lower-right on the same plotting scale as above. The red points are those flagged as potential thruster firing and their 6-hr cadence suggests these events have generally been flagged well.

plot : bool

If true, diagnostic plots identical to figure 1 above will be rendered and also saved as PNG files. There will be a different plot for every time window defined by the stepsize parameters. If the output FITS file is named filename.fits then each PNG file will be named filename_nn.png, where nn is a sequential number beginning with 1.

overwrite : bool

Overwrite the output FITS file? if overwrite is False and an existing file has the same name as outfile then the task will stop with an error.

verbose : bool

Print informative messages and warnings to the shell and logfile?

logfile : str

Name of the logfile containing error and warning messages.