Histograms
A histogram shows graphically how often the values in a set of measurements fall into each of a series of intervals (bins).
See the matplotlib documentation of pyplot.hist for more details: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html
import numpy
import matplotlib.pyplot as plt
%matplotlib inline
data = [-1.17, -1.23, -2.94, 0.36, -0.63, 0.09, -0.42, -0.39, 1.03, -1.,
1.12, 0.25, -0.84, 0.47, 0.42, -0.15, 1.36, -0.7, -0.71,
0.13, -0.77, 0.44, 0.41, -0.52, -0.59, 0.25, -0.69, -1.34, -0.96,
0.17, -1.78, 0.08, 0.53, 0.35, -1.37, -0.05, -0.24, -0.36, 1.49,
0.16, 1.26, -0.08, 1.25, 0.19, 1.37, 0.96, -1.45, 0.11, 1.67,
1.28, -0.61, -0.43, -1.01, 0.11, 0.59, -0.62, 0.21, 0.94, -0.88,
-0.95, -0.65, -0.42, -1.24, 0.53, -0.38, -1.05, -1.68, -1.95, -1.07,
0.38, 0.09, -0.24, -1.74, 0.23, 0.11, 0.9, 0.07, -0.45, 0.25,
1.81, 0.23, 0.01, -2.77, -0.14, 0.26, -0.73, -1.82, -0.52, -0.1,
-0.5, 0.79, -0.74, 1.2, 1., -0.5, -2.19, -2.01, -0.86, -0.41, 0.8 ]
# Define range and number of bins
min_range = -5
max_range = 5
num_bins = 20
# Normalize the histogram to unit area?
norm = True
# Plot histogram
plt.figure(figsize=[6,6])
entries, bin_boundaries, patches = plt.hist(data, bins=num_bins, range=(min_range, max_range), facecolor='green', density=norm)
print("Entries =", entries)
print("Bin boundaries =", bin_boundaries)
plt.title('Histogram of Data Points')
plt.xlabel('Data Points')
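# With density=True the y-axis shows a probability density, not raw counts
plt.ylabel('Probability density' if norm else 'Frequency')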
plt.show()
Entries = [0. 0. 0. 0. 0.04 0.04 0.1 0.18 0.38 0.34 0.52 0.16 0.2 0.04
0. 0. 0. 0. 0. 0. ]
Bin boundaries = [-5. -4.5 -4. -3.5 -3. -2.5 -2. -1.5 -1. -0.5 0. 0.5 1. 1.5
2. 2.5 3. 3.5 4. 4.5 5. ]
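Because the histogram was normalized, the entries printed above are densities rather than counts: each one equals the number of points in the bin divided by (total number of points × bin width). With the 100 data points and bin width of 0.5 used here, an entry of 0.52 corresponds to 0.52 × 100 × 0.5 = 26 points. The sketch below checks this relation with numpy.histogram on a small made-up sample; the names sample and manual_density are illustrative and not part of the original notebook.

```python
import numpy as np

# Hypothetical small sample, used only for illustration
sample = np.array([-1.2, -0.4, -0.3, 0.1, 0.2, 0.6, 0.7, 1.4])

# Raw counts per bin, and the same histogram normalized to unit area
counts, edges = np.histogram(sample, bins=4, range=(-2, 2))
density, _ = np.histogram(sample, bins=4, range=(-2, 2), density=True)

# Density = counts / (total number of points * bin width)
bin_widths = np.diff(edges)
manual_density = counts / (counts.sum() * bin_widths)

print("Counts  =", counts)
print("Density =", density)
print("Area    =", np.sum(density * bin_widths))
```

The area under a density histogram is the sum of (density × bin width) over all bins, which is why the entries themselves do not sum to 1 unless the bin width happens to be 1.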
Any time you bin your data you lose information: the data are now represented by a vector of size num_bins. On the other hand, you can see this as a simple (lossy) form of information compression.
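As a rough illustration of this compression, the sketch below bins a sample of 1000 values into 20 bins and then tries to rebuild the data from the bin centers alone. The sample is randomly generated with a fixed seed (an arbitrary choice for this example), and the names centers and reconstructed are illustrative.

```python
import numpy as np

# Illustrative sample: 1000 values drawn from a standard normal distribution
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

# Compress the sample into 20 bin counts
counts, edges = np.histogram(sample, bins=20, range=(-5, 5))
centers = 0.5 * (edges[:-1] + edges[1:])

# The best available reconstruction places every point at its bin center
reconstructed = np.repeat(centers, counts)

print("Original size  :", sample.size)
print("Histogram size :", counts.size)
print("Mean (original):", sample.mean())
print("Mean (binned)  :", np.average(centers, weights=counts))
```

The individual values cannot be recovered, since each one is only known to within half a bin width, but summary quantities such as the mean are approximately preserved.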
A physical example of binning is the Galton board: https://en.wikipedia.org/wiki/Galton_board
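A Galton board can be imitated numerically with a very simplified model, assumed here for illustration: each ball bounces left or right with equal probability at every row of pegs, and the bin it lands in is the number of bounces to the right. The numbers of rows and balls and the fixed seed below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed, illustrative choice
n_rows = 10     # rows of pegs (assumed value)
n_balls = 5000  # number of balls dropped (assumed value)

# Each bounce is 0 (left) or 1 (right), with equal probability
bounces = rng.integers(0, 2, size=(n_balls, n_rows))

# The final bin index is the number of bounces to the right
final_bin = bounces.sum(axis=1)

# Count the balls in each of the n_rows + 1 bins
counts = np.bincount(final_bin, minlength=n_rows + 1)
print("Balls per bin =", counts)
```

In this model the bin counts follow a binomial distribution, so most balls end up near the central bins and the resulting histogram is roughly bell-shaped.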