Identifying QPOs in lightcurve data
A quick-and-dirty guide to identifying quasi-periodic oscillations (QPOs) in astronomical lightcurve data, based on work from the Advanced Data Analysis course at the University of St Andrews.
Plotting the raw data
To begin your analysis, you'll need three arrays: photon count, uncertainty in photon count, and time relative to the start of the observation. If your source data is a plaintext file, use NumPy's loadtxt function with the optional skiprows argument to skip anything that isn't raw data. I used the following code:
import numpy as np

lightcurve_data = np.loadtxt('your_data.dat', skiprows=15)
parameter = lightcurve_data[:,i]
Here i is the column number for whatever parameter you're loading.
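For example, assuming the first three columns are time, photon count, and uncertainty (check your file's header lines - the column order here is a guess, not a rule):
time_array = lightcurve_data[:, 0]         # time since start of observation
count_array = lightcurve_data[:, 1]        # photon count
uncertainty_array = lightcurve_data[:, 2]  # uncertainty in photon count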
Now you're going to want to plot the lightcurve to get an understanding of the overall shape. The Matplotlib library is a standard in the sciences, but any plotting library will work fine.
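A minimal Matplotlib sketch, assuming the arrays above:
import matplotlib.pyplot as plt

# plot the raw lightcurve with error bars to see its overall shape
plt.errorbar(time_array, count_array, yerr=uncertainty_array, fmt='.', ms=2)
plt.xlabel('Time since start of observation')
plt.ylabel('Photon count')
plt.show()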

Smoothing the lightcurve
If the length of your observation significantly exceeds the period of the QPOs you're looking for, your lightcurve may have an underlying long-term shape which you'll want to remove before your analysis. To do this, we're going to create a smooth function loosely based on the lightcurve. You want this function to fit the overall shape of the lightcurve but not the short-timescale variations. Two good strategies are a spline fit or a running optimal average (ROA) with a Gaussian memory function.
For a spline fit, use SciPy's splrep and BSpline functions as follows:
from scipy.interpolate import splrep, BSpline

# per the splrep docs, w should be 1/sigma; s ~ number of points is a starting guess
tck = splrep(time_array, count_array, w=1/uncertainty_array, s=len(time_array))
smoothed_array = BSpline(*tck)(time_array)
For the splrep function, s controls the smoothness of the fit: larger values give a smoother curve that ignores more of the short-timescale structure. Experiment with different values and plot the results until your smoothing function fits the criteria above, as in the sketch below.
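A minimal sketch of that experimentation, using the arrays from above (with w = 1/sigma, the splrep docs suggest trying s near m, the number of data points):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splrep, BSpline

m = len(time_array)
plt.errorbar(time_array, count_array, yerr=uncertainty_array, fmt='.', ms=2)
for s in [m - np.sqrt(2*m), m, m + np.sqrt(2*m)]:  # trial smoothing values
    tck = splrep(time_array, count_array, w=1/uncertainty_array, s=s)
    plt.plot(time_array, BSpline(*tck)(time_array), label=f's = {s:.0f}')
plt.legend()
plt.show()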
Alternatively, use a ROA with a Gaussian memory function as follows: $$\hat X(t) = \frac{\sum_i X_i w_i(t)}{\sum_i w_i(t)}$$ $$w_i(t) = \frac{G(t-t_i)}{\sigma_i^2}$$ where \( X_i \) is the i-th element of the photon count array; \(\sigma_i\), the i-th element of the uncertainty array; and \(G(t-t_i)\) the Gaussian function: $$G(t-t_i) = \exp{ \left [-\frac12 \left (\frac{t-t_i}{\tau} \right)^2 \right]}$$ The timescale \(\tau\) sets the "memory" of the average: it should be longer than the QPO period you're hunting for, but shorter than the long-term trend. Subtracting this smooth line from your astronomical data will give you a residuals array. This is the data you'll search for QPOs in.
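Here's a minimal sketch of the ROA under the same array names; tau is a placeholder value you'll need to tune for your data:
import numpy as np

def running_optimal_average(t_grid, times, counts, sigmas, tau):
    # Gaussian memory function G(t - t_i): one row per output time t
    G = np.exp(-0.5 * ((t_grid[:, None] - times[None, :]) / tau) ** 2)
    w = G / sigmas[None, :] ** 2  # w_i(t) = G(t - t_i) / sigma_i^2
    return (w * counts[None, :]).sum(axis=1) / w.sum(axis=1)

tau = 50.0  # placeholder memory timescale, in the same units as time_array
smoothed_array = running_optimal_average(time_array, time_array, count_array,
                                         uncertainty_array, tau)
residuals_array = count_array - smoothed_array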
Fitting to a function
The next step is to try to fit your data to a function. Your best bet is to use something of the form: $$F(t_i) = B + C\cos (f\;t_i) + S \sin (f\;t_i)$$ which will account for any background noise and let you test your fit for any trial (angular) frequency \(f\). The plan is to calculate the maximum-likelihood values for the \(B\), \(C\), and \(S\) coefficients and the corresponding \(\chi^2\) value for each trial frequency. \(\chi^2\) is a "badness-of-fit" value - i.e. a minimum in this value corresponds to a pretty good frequency fit!
You're going to need a range of frequency values to test over, so you might as well read up on Nyquist sampling. If you're in a hurry, use the following code:
N = len(time_array)
delta_freq = 2 * np.pi / time_array[-1]  # frequency resolution (angular)
nyq_freq = N*delta_freq/2                # Nyquist (angular) frequency
freq_array = np.linspace(delta_freq, nyq_freq+delta_freq, N)
The resulting freq_array is your range of trial frequencies.
MLE parameters and chi squared values
To determine the best-fit coefficients you first need to create an array for each which contains its "pattern". For a constant like \(B\), that array is constructed using np.ones_like(time_array). For the sinusoidal patterns, you'll need to compute \(\cos(f \; t_i)\) or \(\sin(f \; t_i)\) for every value of \(t_i\). Create a pattern matrix \(P\) using the np.vstack function on your three pattern arrays. Next, you'll need to create an N x N matrix with the inverse data variances \(1/\sigma^2_i\) on the diagonal - call this \(S\).
Create your Hessian matrix according to: $$H = (P \cdot S)\cdot P^T$$ Now construct a correlation vector, \(C\), from your residuals array, \(y\), according to: $$C = (y \cdot S) \cdot P^T$$ Your best-fit coefficients can now be determined using NumPy's linalg.solve function:
coefficients = np.linalg.solve(H, C)
The amplitude of the sinusoid at this trial frequency can be found using the formula: $$A = \sqrt{C^2 + S^2}$$ To get a \(\chi^2\) value for this fit, compute the weighted residuals as follows.
model = P.T @ coefficients                           # best-fit model at this frequency
res = (residuals_array - model)/uncertainty_array    # weighted residuals
chi_squared = res @ res                              # chi-squared for this trial frequency
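Putting the whole fitting procedure together, here's a sketch of the loop over trial frequencies; chi_squared and amplitude become arrays with one entry per trial frequency (names chosen to match the code that follows):
import numpy as np

S_matrix = np.diag(1 / uncertainty_array**2)  # N x N inverse-variance matrix S
ones = np.ones_like(time_array)               # pattern array for the constant B

chi_squared = np.empty(len(freq_array))
amplitude = np.empty(len(freq_array))

for j, f in enumerate(freq_array):
    # pattern matrix P: one row each for the B, C, and S patterns
    P = np.vstack([ones, np.cos(f * time_array), np.sin(f * time_array)])
    H = (P @ S_matrix) @ P.T                   # Hessian matrix
    corr = (residuals_array @ S_matrix) @ P.T  # correlation vector
    coefficients = np.linalg.solve(H, corr)

    model = P.T @ coefficients
    res = (residuals_array - model) / uncertainty_array
    chi_squared[j] = res @ res
    amplitude[j] = np.hypot(coefficients[1], coefficients[2])  # A = sqrt(C^2 + S^2)
For very long lightcurves you may prefer to apply the inverse variances directly as a weight array rather than building the dense N x N matrix, but the version above follows the matrix notation used here.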
Now just plot \(\chi^2\) against the trial frequencies and look for minima! Each sharp dip is a candidate QPO frequency.
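As a quick sketch of that plot - converting angular frequency to period here, since the next section works in period space:
import numpy as np
import matplotlib.pyplot as plt

period_array = 2 * np.pi / freq_array  # angular frequency -> period
plt.plot(period_array, chi_squared)
plt.xlabel('Trial period')
plt.ylabel(r'$\chi^2$')
plt.show()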

Estimating QPO period
To estimate the period of the strongest QPO (or whichever one you want to analyse), convert the frequency array to a period array and then fit a quadratic to the three lowest points of the \(\chi^2\) minimum. Below, p1, p2, and p3 refer to the period values of the three lowest points, with p2 at the minimum, and c1, c2, and c3 refer to the \(\chi^2\) values of the three lowest points.
from scipy.optimize import curve_fit

# array index of your chi_squared minimum
min_index = np.argmin(chi_squared)

# e.g. (assuming the minimum isn't at either end of the array):
# p1, p2, p3 = period_array[min_index-1 : min_index+2]
# c1, c2, c3 = chi_squared[min_index-1 : min_index+2]

def Quadratic(x, a, b, c):
    return a*x**2 + b*x + c

def Polynomial(x1, x2, x3, y1, y2, y3):
    a, b, c = curve_fit(Quadratic, [x1, x2, x3], [y1, y2, y3])[0]
    xhat = -b/(2*a)                  # location of the parabola's minimum
    yhat = Quadratic(xhat, a, b, c)  # chi-squared at the minimum
    sigx = np.sqrt(1/np.abs(a))      # width over which chi-squared rises by 1
    return xhat, yhat, sigx

p_hat, chiphat, sig_p = Polynomial(p1, p2, p3, c1, c2, c3)
The period and corresponding uncertainty are p_hat and sig_p, respectively.
Closing remarks
Congratulations! You've successfully identified a quasi-periodic oscillation and determined both its period and amplitude. If you want some ideas for next steps, try:
- calculating the variation in period and amplitude over time using a memory function
- identifying which \(\chi^2\) stalactites are genuine QPOs and which are simply aliasing effects or harmonics