May 12, 2023 - Testing for Multivariate Normality with a Henze-Zirkler Test (In Python)
If you’re just interested in the code, it’s super easy to use pingouin to do this:

    import numpy as np
    import pingouin as pg

    data = np.random.normal(size=(100, 3))
    output = pg.multivariate_normality(data, alpha=0.05)
    print(output.hz, output.pval, output.normal)

Sometimes, it’s useful to know when a sample looks a lot like a normal distribution. In a single dimension, we can use scipy.stats.normaltest to test whether a sample is normal. This test combines tests from D’Agostino and Pearson, which measure the skew and kurtosis of the sample and flag when those differ from the values expected of a normal population.

Unfortunately, this test isn’t immediately generalizable to multiple dimensions, which makes a new test necessary. One example is the Henze-Zirkler test, which is based on a non-negative functional \(D\) that measures the distance between two distribution functions, and has the property that \(D(N_d(0, I_d), Q) = 0\) if and only if \(Q\) is the standard multivariate normal distribution \(N_d(0, I_d)\), with zero mean and identity covariance matrix.

In practice, the Henze-Zirkler test computes a weighted integral of the squared difference between the empirical characteristic function (ECF) of the sample and its pointwise limit under normality. ...
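To make the weighted-integral idea concrete, here is a minimal sketch of the statistic in plain NumPy. It assumes the closed form usually quoted for the Henze-Zirkler statistic, with a Gaussian weight and the smoothing parameter \(\beta\) suggested by Henze and Zirkler; neither the formula nor the function name `henze_zirkler_statistic` comes from this post, so treat it as an illustration rather than a reference implementation. It stops at the statistic and does not compute a p-value.

```python
import numpy as np


def henze_zirkler_statistic(x):
    """Return the Henze-Zirkler statistic for an (n, d) sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape

    # Center the data and use the maximum-likelihood covariance (divide by n).
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n
    cov_inv = np.linalg.inv(cov)

    # Gram matrix of the standardized residuals; its diagonal holds the
    # squared Mahalanobis distance of each point from the sample mean.
    gram = centered @ cov_inv @ centered.T
    d_i = np.diag(gram)
    # Squared Mahalanobis distance between every pair of points.
    d_ij = d_i[:, None] + d_i[None, :] - 2.0 * gram

    # Assumed smoothing parameter: the choice suggested by Henze and Zirkler.
    beta = ((2 * d + 1) * n / 4) ** (1 / (d + 4)) / np.sqrt(2)
    b2 = beta ** 2

    # The three terms of the closed-form weighted integral.
    pair_term = np.exp(-b2 / 2 * d_ij).sum() / n ** 2
    cross_term = 2 * (1 + b2) ** (-d / 2) * np.exp(-b2 / (2 * (1 + b2)) * d_i).mean()
    const_term = (1 + 2 * b2) ** (-d / 2)
    return n * (pair_term - cross_term + const_term)


rng = np.random.default_rng(0)
normal_sample = rng.normal(size=(200, 3))
skewed_sample = rng.exponential(size=(200, 3))

hz_normal = henze_zirkler_statistic(normal_sample)
hz_skewed = henze_zirkler_statistic(skewed_sample)
print(f"normal: {hz_normal:.3f}, exponential: {hz_skewed:.3f}")
```

Because the sample is standardized by its own mean and covariance, the statistic only depends on Mahalanobis distances, so it should be unchanged by any invertible affine transformation of the data. Larger values indicate a sample that looks less like a multivariate normal.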