Exercises on Covariance and Correlation

1) Covariance

Prove, both mathematically and numerically, the following covariance formula in the limit \(N\rightarrow\infty\):

\[ Cov(x,y)=\frac{1}{N-1}\sum_{i=1}^{N}(x_i -\bar{x})(y_i -\bar{y}) = ... = \overline{xy} - \bar{x}\bar{y}\]
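The mathematical part can be sketched by expanding the product and using \(\sum_i x_i = N\bar{x}\), \(\sum_i y_i = N\bar{y}\) and \(\overline{xy} = \frac{1}{N}\sum_i x_i y_i\). This gives an exact relation, valid for any \(N\ge 2\), which reduces to the formula above as \(N\rightarrow\infty\):

\[ \sum_{i=1}^{N}(x_i -\bar{x})(y_i -\bar{y}) = \sum_{i=1}^{N} x_i y_i - \bar{y}\sum_{i=1}^{N} x_i - \bar{x}\sum_{i=1}^{N} y_i + N\bar{x}\bar{y} = N\left(\overline{xy} - \bar{x}\bar{y}\right)\]

\[ Cov(x,y) = \frac{N}{N-1}\left(\overline{xy} - \bar{x}\bar{y}\right) \xrightarrow{\;N\rightarrow\infty\;} \overline{xy} - \bar{x}\bar{y}\]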

What happens for \(N=1\), and for small values of \(N\) (\(N<100\))?
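A minimal numerical check, assuming randomly generated toy data that is not part of the exercise (the seed, the sample sizes and the relation y = 0.5x + noise are arbitrary choices). The helpers `cov_unbiased` and `cov_shortcut` are hypothetical names for the left- and right-hand sides of the formula; the script prints both for increasing \(N\) together with their ratio.

```python
import numpy as np

def cov_unbiased(x, y):
    """Left-hand side: sample covariance with the 1/(N-1) normalization."""
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

def cov_shortcut(x, y):
    """Right-hand side: mean(xy) - mean(x) * mean(y)."""
    return np.mean(x * y) - x.mean() * y.mean()

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
results = {}
for n in [2, 5, 10, 100, 10_000]:
    # Toy correlated data (an assumption for illustration, not the exercise data)
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)
    a, b = cov_unbiased(x, y), cov_shortcut(x, y)
    results[n] = (a, b)
    print(f"N={n:6d}  1/(N-1) form={a:+.5f}  shortcut={b:+.5f}  ratio={a / b:.5f}")
```

The ratio of the two forms is exactly \(N/(N-1)\): a factor 2 at \(N=2\) and about 1% at \(N=100\). For \(N=1\) the \(1/(N-1)\) form is \(0/0\) and therefore undefined; NumPy returns `nan` with a `RuntimeWarning` rather than raising an exception.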

2) Correlation

An advanced satellite performs simultaneous observations of the X-ray and radio spectra of a pulsar over a period of 10 hours. A radio telescope on Earth also monitors the radio spectrum, but it has older instruments and higher noise.

import numpy as np
import pandas as pd

# Observation times: 10 samples over the 10-hour run (assumed hourly)
time = np.arange(0, 10)
# Tabulated values, in units of 10^6 (X-ray) and 10^5 (radio)
XrayPulses = np.array([4.67, 6.07, 4.19, 4.63, 6.77, 3.69, 5.20, 4.87, 5.85, 4.73])
RadioPulsesSat = np.array([1.73, 2.24, 1.60, 1.64, 2.43, 1.26, 1.91, 1.58, 2.22, 1.67])
RadioPulsesLab = np.array([1.94, 1.16, 2.01, 2.19, 1.55, 2.05, 2.20, 2.35, 2.12, 2.19])

# Rescale to the actual values
XrayPulses *= 10**6
RadioPulsesSat *= 10**5
RadioPulsesLab *= 10**5

data = {
    "time": time,
    "x-ray pulses": XrayPulses,
    "radio pulses satellite": RadioPulsesSat,
    "radio pulses laboratory": RadioPulsesLab,
}

df = pd.DataFrame(data)
df

2.1) Compute the covariance and the correlation coefficient between the X-ray pulses and the radio pulses from the satellite.
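A minimal NumPy-only sketch of this step (it avoids pandas, which is not needed for the calculation). `np.cov` with its default `ddof=1` uses the same \(1/(N-1)\) normalization as in exercise 1, and `np.corrcoef` serves only as a cross-check of the correlation coefficient \(\rho = Cov(x,y)/(\sigma_x\sigma_y)\). The variable names are illustrative.

```python
import numpy as np

# Data from the exercise, with the same scaling factors
xray = np.array([4.67, 6.07, 4.19, 4.63, 6.77, 3.69, 5.20, 4.87, 5.85, 4.73]) * 1e6
radio_sat = np.array([1.73, 2.24, 1.60, 1.64, 2.43, 1.26, 1.91, 1.58, 2.22, 1.67]) * 1e5

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is Cov(x, y)
cov_xy = np.cov(xray, radio_sat)[0, 1]

# Correlation coefficient from the covariance and the standard deviations (same ddof)
sigma_x = np.std(xray, ddof=1)
sigma_y = np.std(radio_sat, ddof=1)
rho = cov_xy / (sigma_x * sigma_y)

# Cross-check with NumPy's built-in Pearson correlation
rho_np = np.corrcoef(xray, radio_sat)[0, 1]
print(f"Cov = {cov_xy:.4e}, rho = {rho:.4f} (np.corrcoef: {rho_np:.4f})")
```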

Is there any evidence of correlation between the radio pulses and the X-rays? If so, check whether there is still a correlation between the X-ray pulses and the radio pulses recorded by the second radio telescope (the one on Earth), and check whether the data of the two radio telescopes give the same result.
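One way to compare all three series at once is to pass them as rows to `np.corrcoef`, which returns the matrix of pairwise Pearson coefficients. Since the coefficient does not change under a positive rescaling of either variable, the \(10^6\) and \(10^5\) factors are omitted here. This is a sketch; the labels are illustrative.

```python
import numpy as np

xray = np.array([4.67, 6.07, 4.19, 4.63, 6.77, 3.69, 5.20, 4.87, 5.85, 4.73])
radio_sat = np.array([1.73, 2.24, 1.60, 1.64, 2.43, 1.26, 1.91, 1.58, 2.22, 1.67])
radio_lab = np.array([1.94, 1.16, 2.01, 2.19, 1.55, 2.05, 2.20, 2.35, 2.12, 2.19])

labels = ["x-ray", "radio sat", "radio lab"]
# Each row is one variable; the result is a symmetric 3x3 correlation matrix
corr = np.corrcoef(np.vstack([xray, radio_sat, radio_lab]))

# Print each distinct pair once (upper triangle, excluding the diagonal)
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"rho({labels[i]}, {labels[j]}) = {corr[i, j]:+.3f}")
```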

Do you expect any correlation between time and the data from the satellite or the telescope? Answer by computing the correlation between time and the X-ray or radio pulses.
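A short sketch computing the correlation of each series with time, reusing `np.arange(0, 10)` from the data cell. With only 10 points, a nonzero sample coefficient can appear by chance even for unrelated quantities (statistical fluctuations are roughly of order \(1/\sqrt{N}\)), so the values are best judged against that scale.

```python
import numpy as np

time = np.arange(0, 10)  # observation times, as in the data cell
series = {
    "x-ray": np.array([4.67, 6.07, 4.19, 4.63, 6.77, 3.69, 5.20, 4.87, 5.85, 4.73]),
    "radio sat": np.array([1.73, 2.24, 1.60, 1.64, 2.43, 1.26, 1.91, 1.58, 2.22, 1.67]),
    "radio lab": np.array([1.94, 1.16, 2.01, 2.19, 1.55, 2.05, 2.20, 2.35, 2.12, 2.19]),
}

# Pearson coefficient between time and each series (scaling factors do not matter)
rho_time = {name: np.corrcoef(time, values)[0, 1] for name, values in series.items()}
for name, rho in rho_time.items():
    print(f"rho(time, {name}) = {rho:+.3f}")
```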

2.2) Fill a 2D histogram with the X-ray and radio pulse data, obtain the covariance and the correlation from the histogram, and compare them with the values calculated above. The difference with respect to (2.1) is that here your data are binned.
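A sketch of the binned calculation, assuming that the "average value" of a bin means its center and that equal-width bins spanning the data range (what `np.histogram2d` builds from an integer `bins` argument) are acceptable. The helper `binned_cov_corr` is hypothetical and the bin counts tried are arbitrary.

```python
import numpy as np

xray = np.array([4.67, 6.07, 4.19, 4.63, 6.77, 3.69, 5.20, 4.87, 5.85, 4.73]) * 1e6
radio_sat = np.array([1.73, 2.24, 1.60, 1.64, 2.43, 1.26, 1.91, 1.58, 2.22, 1.67]) * 1e5

def binned_cov_corr(x, y, bins):
    """Covariance and correlation from a 2D histogram, using bin centers as values."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    xc = 0.5 * (xedges[:-1] + xedges[1:])  # x bin centers
    yc = 0.5 * (yedges[:-1] + yedges[1:])  # y bin centers
    # counts[i, j] holds the entries with x in bin i and y in bin j
    X, Y = np.meshgrid(xc, yc, indexing="ij")
    n = counts.sum()
    mx = np.sum(counts * X) / n
    my = np.sum(counts * Y) / n
    cov = np.sum(counts * (X - mx) * (Y - my)) / (n - 1)
    sx = np.sqrt(np.sum(counts * (X - mx) ** 2) / (n - 1))
    sy = np.sqrt(np.sum(counts * (Y - my) ** 2) / (n - 1))
    return cov, cov / (sx * sy)

# Unbinned reference values
cov_unbinned = np.cov(xray, radio_sat)[0, 1]
rho_unbinned = np.corrcoef(xray, radio_sat)[0, 1]

for bins in [3, 5, 10, 100]:
    cov_b, rho_b = binned_cov_corr(xray, radio_sat, bins)
    print(f"bins={bins:3d}  Cov={cov_b:.4e} (unbinned {cov_unbinned:.4e})  "
          f"rho={rho_b:.4f} (unbinned {rho_unbinned:.4f})")
```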

Comment on the information lost when going from unbinned to binned data in the calculation of the covariance/correlation, and discuss the limit in which unbinned and binned data give the same result.

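Note: obtaining the correlation and covariance from the histogram means computing these quantities using the average (central) value of each bin, weighted by the number of entries in that bin, in place of the individual data points.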
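Written out with notation introduced here for illustration (\(n_{ij}\) the number of entries in bin \((i,j)\), \(x_i^c\) and \(y_j^c\) the bin centers, \(N=\sum_{i,j} n_{ij}\)), the binned estimates keep the \(1/(N-1)\) normalization of the covariance formula:

\[ \bar{x} = \frac{1}{N}\sum_{i,j} n_{ij}\, x_i^c, \qquad \bar{y} = \frac{1}{N}\sum_{i,j} n_{ij}\, y_j^c, \qquad \sigma_x^2 = \frac{1}{N-1}\sum_{i,j} n_{ij}\,(x_i^c-\bar{x})^2 \]

\[ Cov(x,y) \approx \frac{1}{N-1}\sum_{i,j} n_{ij}\,(x_i^c - \bar{x})(y_j^c - \bar{y}), \qquad \rho \approx \frac{Cov(x,y)}{\sigma_x \sigma_y} \]

with \(\sigma_y\) defined like \(\sigma_x\). Since each entry is displaced from its bin center by at most half a bin width, these expressions approach the unbinned ones as the bin widths shrink.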