kepsff: remove motioncorrelated noise from aperture light curve data¶

pyke.kepsff.
kepsff
(infile, outfile=None, datacol='DETSAP_FLUX', cenmethod='moments', stepsize=5.0, npoly_cxcy=1, sigma_cxcy=10.0, npoly_ardx=6, npoly_dsdt=2, sigma_dsdt=3.0, npoly_arfl=3, sigma_arfl=3.0, plot=False, overwrite=False, verbose=False, logfile='kepsff.log')¶ kepsff – remove motioncorrelated noise from aperture light curve data
The systematic noise within apertureconstructed K2 photometry is dominated by the effects of boresight roll under the force of solar pressure. Roll is minimized by orienting the spacecraft in pointing directions that minimize spacecraft structure asymmetry normal to the Sunvector and compensated for over campaign durations by regular thruster firings, typically every 6 hours. kepsff provides an open source python implementation of the photometric correction for K2 motion systematics provided by Vanderburg and Johnson (2014).The method will also work upon archived Kepler data and provides a relativelysimple and CPUfriendly alternative to cotrending against basisvector derived ensemble photometry (
kepcotrend
), pixellevel principal component analysis (keppca
), and PSF fitting (kepprfphot
). As well, as computational speed, this method has the advantage of requiring no external data in order to perform corrections. All required data is extracted from individual Kepler light curves stored at the archive, or K2 light curves constructed from archived target pixel files using the kepextract task. In the example figure above, the 12th magnitude target, EPIC 202093219, has a median 6.5hour standard deviation of 60 partspermillion (according to the tool kepstddev. After motion correction usingkepsff
, this quantity is reduced to 36 partspermillion.The name
kepsff
is derived from “SelfFlatFielding” (SFF) and is propagated from the Vanderburg and Johnson (2014) paper. However the name is somewhat misleading because the effects corrected for in the K2 case are dominated by aperture losses and source crowding rather than flatfield variations. In essence,kepsff
corrects motioninduced aperturephotometry artifacts after constructing a positionflux relation from a detrended (astrophysicsremoved) version of the light curve. This tool will find mostemployment upon K2 data, especially in the early mission phases when the data archives will contain calibrated target pixel files but no light curves. However, in line with the philosophy of the PyKE tools,kepsff
provides the user with a degree of customization to meet the demands of a broad range of target types and science goals. The longterm goal is to provide Kepler and K2 users with a range of reduction options and this tool is part of that software kit. Also, in line with the PyKE philosophy, these algorithms, as coded, are not perfect. We recommend and welcome tinkering and further development by the community in order to make these tools better. This case is also precisely what the K2 community requires in the sense that the archive users, here Andrew Vanderburg and John Johnson, are seeking solutions to their own data reduction needs, becoming selfsufficient, and reporting their findings and methods to us all.Preparation work is required before kepsff can provide a good correction. First, a light curve needs to be extracted a from Target Pixel File (
kepextract
) using a usercustomized pixel aperture defined withinkepmask
. The resulting light curve also needs to be detrended, i.e. the astrophysics needs to be removed as much as possible (kepflatten) in order for the correction to be as precise as possible. Note that the astrophysics is not removed permanently, but care has to be taken to prepare the data this way before confidence in the correction can be built for individual light curves on a casebycase basis.Parameters: nfile : str
The name of an input light curve FITS file. This file needs to contain a minimum set of information: Midexposure times, cadence numbers, simpleaperture flux, momentderived and/or PSFderived target centroids, quality flags and Kepler standard keywords. All of these properties are provided in archived light curves from the Kepler mission, but need to be calculated for the K2 mission. The PyKE tool kepextract calculates all these properties and provides an input file suitable for
kepsff
. In addition, the input file also requires a column of detrended data, where longterm astrophysics and orbitrelated systematics have been removed. The PyKE tool kepflatten performs this prestep, producing a FITS column of detrended data (at lowfrequencies) called DETSAP_FLUX.outfile : str
The name of the output light curve FITS file. This file has the same structure as the input file with three content changes. The aperture photometry, DETSAP_FLUX, and detrended photometry, DETSAP_FLUX are corrected for motioncorrelated noise. The quality flag column, SAP_QUALITY, includes a new bitflag of value 131072 that identifies exposures collected during onboard thruster firings, typically occurring on a 6hr cadence.
datacol : str
The name of the FITS data column containing detrended flux data (i.e. lowfrequency structure removed). If the input file is a product of kepflatten then this column is named DETSAP_FLUX.
cenmethod : str
kepsff requires target position data on the CCD detector in order to correlate spacecraft boresight motion with systematic signal within the flux timeseries. The typical Kepler/K2 light curve files have placeholders for two different measures of source centroid:
moments
– centeroflight within the pixel aperture (alternatively known as the zerothmoment of the flux distribution over the pixels). These two arrays, the X and Y centroid over time, are stored in FITS columns MOM_CENTR1 and MOM_CENTR2.psf
– the PSFfitting method, the X and Y locations of bestfit PSF models to the aperture pixels. These are rarely available within archived Kepler data but can be stored in PSF_CENTR1 and PSF_CENTR2. Both sets of centroids are provided in the output from kepextract, where the moments data is generally preferred over the PSF data which derives from the simplifying assumptions that sources are not confused and are wellcharacterized by symmetric Gaussian profiles.
stepsize : float [days]
Ultimately, kepsff will construct a timeseries of fluxcorrection factors based upon a correlation curve between spacecraft boresight motion and apertureflux variation. Perhaps due to differential velocity aberration, this calibration is not stable over time. The somewhat clunky solution to this in the current version of the tool is recalibrate the correlation and apply the subsequent correction over a series of discrete time windows. stepsize identifies the length of these windows in units of days. Some trial and error is required to optimize light curve correction. Window sizes of 46 days often provide the optimal correction, but astrophysical structure on these timescales can lead the user to adopt longer or shorter step sizes in order to reduce the unwanted impact of astrophysics on the correction.
npoly_cxcy : int
There is a sequence of four polynomial fits required to characterize and filter the input data in order to produce a fluxmotion correlation curve. The following sequence of fit parameters is therefore quite disorienting until some familiarity is gained with handson experience. The diagnostic plots, provided when plot=True provide significant help here with the four fits delivered in the four leftmost panels (figure 1). The first fit is to the centroid coordinates selected by cenmethod. These data points are provided in the topleft panel of the figure and generally trace veryclose to a linear path across the detector. Finepoint motion is dominated by roll around the boresight and therefore smallangle, nearlinear behavior is unsurprising. The two black lines represent the eigenvectors of the covariance matrix of the centroids. These eigenvectors are used to rotate the centroid distribution so that the direction of motion (x’) is aligned to the xaxis of the rotated system. The rotation functionality requires useroptimization in one aspect. Obvious outliers need to be removed from the sample before the eigenvectors are calculated. We achieve this iterative sigmaclipping. The user defines the order of a polynomial fit (1st order is recommended) and a threshold distance from the best fit where data points outside that threshold are removed before the remaining data are refit. Iterations continue until no additional data points are rejected. The rejection threshold is provided by the argument sigma_cxcy. Points lying below the rejection threshold are plotted as green symbols whereas data points rejected by this process will be plotted red. In the example above, no data points are rejected. The most likely cause of outliers is data obtained while the spacecraft is not guiding on reference stars. We predict such occasions to be rare during optimal operations, but such observations were common during the engineering tests, including the first half of campaign 0.
sigma_cxcy : float [sigma]
Threshold for rejecting centroid data. The threshold is the number of 1sigma standard deviation distances from the bestfit. Points further from the bestfit than the threshold provided will be considered unrepresentative of the fluxmotion correlation that the tool is attempting to characterize. These points are not thrown away, they are corrected in the final calculation with the rest of the data, but they play no further part in the calibration of the correlation.
npoly_ardx : int
Step 2 (topmiddle panel) is the determination of the relatively small deviation of centroid motion from linear. Here, in the rotated frame, we fit a highorder polynomial to the centroid data filtered and rotated in the first step. The green curve is the integrated path length along that curve (s) minus the linear term. The calculation is provided by equation 4 of Vanderburg and Johnson (2014). The red curve is a polynomial fit to the integral which subsequently allows us to cheaply calculate the absolute motion of the centroids relative to linear location along the dominant eigenvector in the unrotated frame. Producing a ‘fit of a fit’ is somewhat inelegant and open source developers could be challenged here to add some additional mathematical rigor to this step. However this section of the calculation has, so far, not proved to be critical  the path length integral is very close to linear, so cleaning this calculation up is lowpriority currently.
npoly_dsdt : int
The third step (lowerleft panel) is a prescription to filter out exposures collected during thruster firings. Firing occur typically on a 6hr cadence and occur when the spacecraft has rolled far enough to trigger a pointing adjustment. These data cannot be corrected using the current method. They are flagged here so that they do not bias the motionflux correlation calculation and so that they can be documented within the quality flags of the output file. A loworder polynomial is fit to the time derivative along the path length (ds/dt) using 3sigma iterative clipping.
npoly_dsdt
provides the user with flexibility to choose the polynomial order. In the plot, the best fit is shown as a red, solid line.sigma_dsdt : float [sigma]
This is a threshold limit in units of 1sigma standard deviation from the bestfit. Data points falling outside of the threshold are morelikelythannot collected during thruster firings. These points are flagged in the SAP_QUALITY column of the output file with the bit 131072 and is not employed to calibrate the fluxmotion relation. The threshold curve is plot as a red, dashed line and rejected points are colored red.
npoly_arfl : int
The fourth panel (lowermiddle) provides the subsequent target motion vs aperture flux distribution. The red line is polynomial fit, the order of which is chosen by the user here. Iterative \(\sigma\)clipping is again required  the many outliers below the bestfit are astrophysical in nature (binary star eclipses) and would bias the fit if they were unrecognized or simply allowed to. The user has flexibility to change the order and clipping threshold in order to provide the most robust fit and there is a degree of objectivity in this process. A low polynomial is generally adequate, this example employs a 3rd order function.
sigma_arfl : float [sigma]
This is the threshold limit in units of 1sigma standard deviation from the bestfit polynomial to the motionflux relation. Data points falling outside of the threshold are in effect rejected from the calibration. Such points potentially contain astrophysical signal that would both bias the calculation and damp the astrophysical signal within the data after the correction has been made. The bestfit red line provides the correction factors within the data window. For each point along the aperturederived flux curve (upperright), the position along the arc can be calculated and the multiplicative correction factor to the flux can determined directly from the bestfit motionflux relation. The subsequent corrected light curve is provided lowerright on the same plotting scale as above. The red points are those flagged as potential thruster firing and their 6hr cadence suggests these events have generally been flagged well.
plot : bool
If true, diagnostic plots identical to figure 1 above will be rendered and also saved as PNG files. There will be a different plot for every time window defined by the stepsize parameters. If the output FITS file is named filename.fits then each PNG file will be named filename_nn.png, where nn is a sequential number beginning with 1.
overwrite : bool
Overwrite the output FITS file? if overwrite is False and an existing file has the same name as outfile then the task will stop with an error.
verbose : bool
Print informative messages and warnings to the shell and logfile?
logfile : str
Name of the logfile containing error and warning messages.