If you’re just interested in the code, it’s super easy to use pingouin to do this:
import numpy as np
import pingouin as pg

# 100 observations in 3 dimensions, drawn from a standard normal
data = np.random.normal(size=(100, 3))

# Henze-Zirkler test of multivariate normality at the 5% level
output = pg.multivariate_normality(data, alpha=.05)
print(output.hz, output.pval, output.normal)
Sometimes, it’s useful to know when a sample looks a lot like a normal distribution. In a single dimension, we can use
scipy’s scipy.stats.normaltest to check whether a sample is normal. This test combines D’Agostino and Pearson’s
tests, which measure the skew and kurtosis of the sample and report when those differ from what a comparable normal
population would produce. Unfortunately, this test doesn’t generalize immediately to multiple dimensions, which makes a
new test necessary.
One example is the Henze-Zirkler test, which is based on a non-negative functional \(D\) that measures the
distance between two distribution functions and has the property that \(D(N_d(0, I_d), Q) = 0\) if and only if \(Q\) is the
multivariate normal distribution with mean zero and identity covariance matrix. In practice, the Henze-Zirkler test computes a
weighted integral of the squared difference between the empirical characteristic function (ECF) of the standardized
sample and its pointwise limit under normality, the characteristic function of \(N_d(0, I_d)\).
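To make the ECF concrete, here is a minimal sketch (the function name and test values are my own, purely illustrative). For standardized normal data, \(\psi_n(t)\) should land close to \(\exp(-\|t\|^2/2)\):

import numpy as np

def empirical_cf(Y, t):
    # psi_n(t) = (1/n) * sum_j exp(i * <t, Y_j>) for an (n, d) sample Y
    return np.mean(np.exp(1j * Y @ t))

rng = np.random.default_rng(0)
Y = rng.standard_normal(size=(1000, 3))   # already standardized
t = np.array([0.5, -0.2, 0.1])
# Should be small: the ECF converges to the N_d(0, I_d) characteristic function
print(abs(empirical_cf(Y, t) - np.exp(-0.5 * t @ t)))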
The test statistic can be calculated as:
$$ T_{\alpha} = n\left(4\,I_{\{S\ \mathrm{singular}\}} + D_{n,\alpha}\,I_{\{S\ \mathrm{nonsingular}\}}\right) $$
where $S$ is the sample covariance matrix, $I$ is an indicator function, and:
$$ D_{n,\alpha} = \int_{\mathbb{R}^d} \left| \psi_n(t) - \exp\left(-\tfrac{1}{2} \|t\|^2\right) \right|^2 \phi_{\alpha}(t) \, dt $$
with $\psi_n$ being the empirical characteristic function, $\phi_{\alpha}$ being a weighting function with parameter $\alpha$:
$$ \phi_{\alpha}(t) = (2\pi\alpha^2)^{-d/2}\exp\left(-\frac{\|t\|^2}{2\alpha^2}\right), \quad t \in \mathbb{R}^d $$
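Since \(\phi_{\alpha}\) is exactly the density of \(N_d(0, \alpha^2 I_d)\) (note that this \(\alpha\) is the smoothing parameter, not the significance level passed to pingouin), \(D_{n,\alpha}\) is an expectation over \(t \sim N_d(0, \alpha^2 I_d)\) and can be approximated by Monte Carlo. This is purely an illustrative sketch of the definition — the actual test evaluates the integral in closed form rather than numerically:

import numpy as np

def D_monte_carlo(Y, alpha, n_draws=10_000, seed=0):
    # D_{n,alpha} = E[ |psi_n(t) - exp(-||t||^2 / 2)|^2 ], t ~ N(0, alpha^2 I_d)
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    t = rng.normal(scale=alpha, size=(n_draws, d))
    ecf = np.mean(np.exp(1j * t @ Y.T), axis=1)       # psi_n at each draw of t
    target = np.exp(-0.5 * np.sum(t**2, axis=1))      # N_d(0, I_d) char. function
    return np.mean(np.abs(ecf - target) ** 2)

rng = np.random.default_rng(1)
Y = rng.standard_normal(size=(200, 2))
print(D_monte_carlo(Y, alpha=1.0))   # near zero for standardized normal data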
Notice that because \(D_{n,\alpha}\) is undefined when \(S\) is singular, we set it to 4 (the maximum possible value) in that case. Because \(T_{\alpha}\) is approximately log-normally distributed, we can use the log-normal distribution to compute a p-value under the null hypothesis.
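One way to see the log-normal approximation in action is to simulate the null distribution of the statistic using pingouin itself and fit a log-normal to the results (this sanity check is my own, not part of the library):

import numpy as np
import pingouin as pg
from scipy import stats

rng = np.random.default_rng(0)
# Null distribution: HZ statistics from repeated draws of truly normal data
hz_values = np.array([
    pg.multivariate_normality(rng.normal(size=(100, 3))).hz
    for _ in range(500)
])
# Fit a log-normal (location pinned at 0) and check a tail probability
shape, loc, scale = stats.lognorm.fit(hz_values, floc=0)
q = np.quantile(hz_values, 0.95)
print(stats.lognorm.sf(q, shape, loc=loc, scale=scale))  # should be close to 0.05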