KernelTest#

class QuadratiK.kernel_test.KernelTest(h: float | None = None, method: str = 'subsampling', num_iter: int = 150, b: float = 0.9, quantile: float = 0.95, mu: ndarray | None = None, sigma: ndarray | None = None, centering_type: str = 'nonparam', alternative: str | None = None, k_threshold: int = 10, random_state: int | None = None, n_jobs: int = 8)#

Class for performing kernel-based quadratic distance goodness-of-fit tests using the Gaussian kernel with tuning parameter h. Depending on the input y passed to the test method, it performs a test of multivariate normality, a non-parametric two-sample test, or a k-sample test. More details on kernel-based quadratic distance goodness-of-fit tests can be found in the User Guide.
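
As a rough illustration of the role of h, the sketch below evaluates a Gaussian kernel on all pairs of rows. It assumes the unnormalized form exp(-||s - t||^2 / (2 h^2)); the helper name gaussian_kernel_matrix is hypothetical, and the library may use a different normalizing constant and applies centering that is not shown here.

```python
import numpy as np

def gaussian_kernel_matrix(x, y, h):
    """Pairwise Gaussian kernel values between rows of x and rows of y.

    Assumption: unnormalized kernel exp(-||s - t||^2 / (2 h^2)).
    This is an illustration, not the library's internal routine.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared Euclidean distances via broadcasting, shape (n_x, n_y)
    diff = x[:, None, :] - y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * h ** 2))

rng = np.random.default_rng(0)
sample = rng.standard_normal((5, 4))
K = gaussian_kernel_matrix(sample, sample, h=0.4)
```

A smaller h makes the kernel more local: off-diagonal entries of K shrink toward zero, while the diagonal stays at one.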

Parameters#

hfloat, optional

Bandwidth for the kernel function.

methodstr, optional

The method used for critical value estimation (“subsampling”, “bootstrap”, or “permutation”).

num_iterint, optional

The number of iterations to use for critical value estimation. Defaults to 150.

bfloat, optional

The size of the subsamples used in the subsampling algorithm, as a fraction of the total sample size. Defaults to 0.9, i.e., subsamples of size 0.9N are drawn, where N is the total sample size.

quantilefloat, optional

The quantile to use for critical value estimation. Defaults to 0.95.

munumpy.ndarray, optional

Mean vector for the reference distribution. Mandatory for the normality test and for the two-sample test with parametric centering. Defaults to None.

sigmanumpy.ndarray, optional

Covariance matrix of the reference distribution. Mandatory for the normality test and for the two-sample test with parametric centering. Defaults to None.

alternativestr, optional

String indicating the type of alternative used by the tuning parameter selection algorithm to compute h when h is not provided. Must be one of “location”, “scale”, or “skewness”. Defaults to None.

k_thresholdint, optional

Maximum number of groups allowed. Defaults to 10. Increase it when the data contain more than 10 groups.

random_stateint, None, optional.

Seed for random number generation. Defaults to None.

n_jobsint, optional.

The maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. For more information on joblib's n_jobs, see https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html. Defaults to 8.
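
To make the interplay of num_iter, b, and quantile concrete, here is a minimal sketch of a generic subsampling estimate of a critical value. The function subsampling_critical_value and the toy statistic are hypothetical and written only for illustration; the library's own procedure may differ in how it resamples and scales the statistic.

```python
import numpy as np

def subsampling_critical_value(data, statistic, num_iter=150, b=0.9,
                               quantile=0.95, random_state=None):
    """Estimate a critical value by recomputing `statistic` on subsamples.

    Illustrative only: `statistic` is any callable mapping an (m, d)
    array to a float. Not the library's internal routine.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(random_state)
    n = data.shape[0]
    m = max(1, int(round(b * n)))  # subsample size, b * N
    values = np.empty(num_iter)
    for i in range(num_iter):
        # Draw m rows without replacement (subsampling, not bootstrap)
        idx = rng.choice(n, size=m, replace=False)
        values[i] = statistic(data[idx])
    # The critical value is the requested quantile of the recomputed values
    return float(np.quantile(values, quantile))

def toy_statistic(sample):
    # Toy statistic (an assumption): squared norm of the sample mean
    return float(np.sum(sample.mean(axis=0) ** 2))

rng = np.random.default_rng(78990)
data = rng.standard_normal((500, 4))
cv = subsampling_critical_value(data, toy_statistic, random_state=42)
```

Fixing random_state makes the estimated critical value reproducible, which mirrors the purpose of the random_state parameter above.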

Attributes#

For Normality Test:
test_type_str

The type of test performed on the data.

execution_timefloat

Time taken for the test method to execute.

un_h0_rejected_boolean

Whether the null hypothesis using Un is rejected (True) or not (False).

vn_h0_rejected_boolean

Whether the null hypothesis using Vn is rejected (True) or not (False).

un_test_statistic_float

Un test statistic of the performed test type.

vn_test_statistic_float

Vn test statistic of the performed test type.

un_cv_float

Critical value for Un.

vn_cv_float

Critical value for Vn.

For Two-Sample and K-Sample Test:
test_type_str

The type of test performed on the data.

execution_timefloat

Time taken for the test method to execute.

dn_h0_rejected_boolean

Whether the null hypothesis using Dn is rejected (True) or not (False).

dn_test_statistic_float

Dn test statistic of the performed test type.

dn_cv_float

Critical value for Dn.

trace_h0_rejected_boolean

Whether the null hypothesis using the trace statistic is rejected (True) or not (False).

trace_test_statistic_float

Trace test statistic of the performed test type.

trace_cv_float

Critical value for the trace statistic.

cv_method_str

Critical value method used for performing the test.

References#

Markatou, M., & Saraceno, G. (2024). A unified framework for multivariate two-sample and k-sample kernel-based quadratic distance goodness-of-fit tests. arXiv preprint arXiv:2407.16374.

Lindsay, B. G., Markatou, M., & Ray, S. (2014). Kernels, degrees of freedom, and power properties of quadratic distance goodness-of-fit tests. Journal of the American Statistical Association, 109(505), 395-410. DOI: 10.1080/01621459.2013.836972.

Examples#

import numpy as np
np.random.seed(78990)
from QuadratiK.kernel_test import KernelTest
# data generation
data_norm = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=500)
# performing the normality test
mu = np.zeros(4)
sigma = np.eye(4)
normality_test = KernelTest(h=0.4, num_iter=150, method="subsampling", mu=mu, sigma=sigma, random_state=42).test(data_norm)
print(normality_test)
KernelTest(
  Test Type=Kernel-based quadratic distance Normality test,
  Execution Time=13.353809356689453 seconds,
  U-Statistic=-0.11459366046288307,
  U-Statistic Critical Value=1.7841253047274597,
  U-Statistic Null Hypothesis Rejected=False,
  U-Statistic Variance=1.108021332522181e-08,
  V-Statistic=0.9779550271616873,
  V-Statistic Critical Value=42.460022848761945,
  V-Statistic Null Hypothesis Rejected=False,
  Selected tuning parameter h=0.4
)
import numpy as np
np.random.seed(0)
from scipy.stats import skewnorm
from QuadratiK.kernel_test import KernelTest
# data generation
X_2 = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=200)
Y_2 = skewnorm.rvs(size=(200, 4), loc=np.zeros(4), scale=np.ones(4), a=np.repeat(0.5, 4), random_state=20)
# performing the two-sample test
two_sample_test = KernelTest(h=2, num_iter=150, random_state=42).test(X_2, Y_2)
print(two_sample_test)
KernelTest(
  Test Type=Kernel-based quadratic distance two-sample test,
  Execution Time=0.4157063961029053 seconds,
  Dn-Statistic=5.061212999055006,
  Dn-Statistic Critical Value=0.6311271530521103,
  Dn-Statistic Null Hypothesis Rejected=True,
  Dn-Statistic Variance=3.037711857184591e-10,
  Trace-Statistic=15.75171816373428,
  Trace-Statistic Critical Value=1.9647673822217238,
  Trace-Statistic Null Hypothesis Rejected=True,
  Trace-Statistic Variance=7.87978087705095e-12,
  Selected tuning parameter h=2,
  Critical Value Method=subsampling
)

Methods

KernelTest.stats()

Function to generate descriptive statistics per variable (and per group if available).

KernelTest.summary([print_fmt])

Summary function generates a table for the kernel test results and the summary statistics.

KernelTest.test(x[, y])

Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h.


KernelTest.stats() → DataFrame#

Function to generate descriptive statistics per variable (and per group if available).

Returns#

summary_stats_dfpandas.DataFrame

Dataframe of descriptive statistics.

KernelTest.summary(print_fmt: str = 'simple_grid') → str#

Summary function generates a table for the kernel test results and the summary statistics.

Parameters#

print_fmtstr, optional.

Table format used for printing the output. Defaults to “simple_grid”. Supports all formats available in tabulate; see https://pypi.org/project/tabulate/.

Returns#

summarystr

A string formatted in the desired output format with the kernel test results and summary statistics.

KernelTest.test(x: ndarray | DataFrame, y: ndarray | DataFrame | None = None) → KernelTest#

Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. Depending on the shape of y, the function performs a test of multivariate normality, a non-parametric two-sample test, or a k-sample test.

Parameters#

xnumpy.ndarray or pandas.DataFrame.

A numeric array of data values.

ynumpy.ndarray or pandas.DataFrame, optional

A numeric array of data values (for the two-sample test) or a 1D array of class labels (for the k-sample test). Defaults to None, in which case the normality test is performed.

Returns#

selfobject

Fitted estimator.
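
The way y selects the test can be pictured with the sketch below. The helper infer_test_type is hypothetical and only mirrors the description above (no y: normality test; y as a data array: two-sample test; y as a 1D array of labels: k-sample test); the library's actual dispatch logic and input validation are not documented here and may differ, for instance in how a single-column y is treated.

```python
import numpy as np

def infer_test_type(x, y=None):
    """Hypothetical helper: guess which test the inputs correspond to."""
    x = np.asarray(x)
    if y is None:
        # No second argument: test x against a reference normal distribution
        return "normality"
    y = np.asarray(y)
    # Assumption: a 1D array (or a single column) is read as class labels
    if y.ndim == 1 or (y.ndim == 2 and y.shape[1] == 1):
        if y.shape[0] != x.shape[0]:
            raise ValueError("label array must have one entry per row of x")
        return "k-sample"
    # Otherwise y is a second sample with the same variables as x
    if y.shape[1] != x.shape[1]:
        raise ValueError("x and y must have the same number of columns")
    return "two-sample"

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 4))
second_sample = rng.standard_normal((20, 4))
labels = np.repeat([0, 1, 2], 10)

kinds = (
    infer_test_type(x),
    infer_test_type(x, second_sample),
    infer_test_type(x, labels),
)
```

Under this reading, the two samples in a two-sample test may have different numbers of rows, whereas the label array in a k-sample test needs exactly one entry per row of x.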