KernelTest#

class QuadratiK.kernel_test.KernelTest(h: float | None = None, method: str = 'subsampling', num_iter: int = 150, b: float = 0.9, quantile: float = 0.95, mu: ndarray | None = None, sigma: ndarray | None = None, centering_type: str = 'nonparam', alternative: str | None = None, k_threshold: int = 10, random_state: int | None = None, n_jobs: int = 8)#

Class for performing the kernel-based quadratic distance goodness-of-fit tests using the Gaussian kernel with tuning parameter h. Depending on the input y passed to the test method, it performs a test of multivariate normality, a non-parametric two-sample test, or a k-sample test. More details on kernel-based quadratic distance goodness-of-fit tests can be found in the User Guide.
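As a rough illustration of the role of h, the following NumPy sketch evaluates a Gaussian kernel on all pairs of observations. It assumes the kernel is the density of a N(0, h^2 I) distribution evaluated at the difference of two points; the normalisation used internally by the library is not stated on this page, and the function name gaussian_kernel_matrix is hypothetical.

```python
import numpy as np


def gaussian_kernel_matrix(x, y, h):
    """Gaussian kernel values K_h(x_i, y_j) for all pairs of rows.

    Assumed form (not taken from the library): the density of
    N(0, h^2 * I_d) evaluated at x_i - y_j.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x.shape[1]
    # Squared Euclidean distance between every row of x and every row of y.
    diff = x[:, None, :] - y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    norm_const = (2.0 * np.pi * h ** 2) ** (-d / 2.0)
    return norm_const * np.exp(-sq_dist / (2.0 * h ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
k_small = gaussian_kernel_matrix(x, x, h=0.4)  # narrow kernel: mostly local
k_large = gaussian_kernel_matrix(x, x, h=2.0)  # wide kernel: smoother
```

A small h concentrates the kernel mass near each observation, while a large h spreads it out, which is why the choice of h affects the power of the tests.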

Parameters#

h (float, optional): Bandwidth for the kernel function. Defaults to None.
method (str, optional): The method used for critical value estimation (“subsampling”, “bootstrap”, or “permutation”). Defaults to “subsampling”.
num_iter (int, optional): The number of iterations to use for critical value estimation. Defaults to 150.
b (float, optional): The size of the subsamples used in the subsampling algorithm, as a fraction of the total sample size. Defaults to 0.9, i.e. 0.9N samples are used, where N is the total sample size.
quantile (float, optional): The quantile to use for critical value estimation. Defaults to 0.95.
mu (numpy.ndarray, optional): Mean vector of the reference distribution. Mandatory for the normality test and for the two-sample test with parametric centering. Defaults to None.
sigma (numpy.ndarray, optional): Covariance matrix of the reference distribution. Mandatory for the normality test and for the two-sample test with parametric centering. Defaults to None.
alternative (str, optional): The type of alternative used by the tuning parameter selection algorithm to compute h when h is not provided. Must be one of “location”, “scale”, or “skewness”. Defaults to None.
k_threshold (int, optional): Maximum number of groups allowed. Defaults to 10. Increase it when the data contain more than 10 groups.
random_state (int or None, optional): Seed for random number generation. Defaults to None.
n_jobs (int, optional): The maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. For more information on joblib's n_jobs, see https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html. Defaults to 8.
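To make the interplay of b, quantile, and num_iter concrete, here is a minimal sketch of subsampling-based critical value estimation. It is not the library's implementation: the function names and the toy statistic are hypothetical, and any rescaling the library applies to the statistic is not described on this page.

```python
import numpy as np


def subsampling_critical_value(data, statistic, b=0.9, quantile=0.95,
                               num_iter=150, random_state=None):
    """Estimate a critical value as a quantile of subsampled statistics."""
    rng = np.random.default_rng(random_state)
    n = data.shape[0]
    m = int(round(b * n))  # subsample size, e.g. 0.9 * N for b = 0.9
    values = np.empty(num_iter)
    for i in range(num_iter):
        # Subsampling draws rows without replacement.
        idx = rng.choice(n, size=m, replace=False)
        values[i] = statistic(data[idx])
    return np.quantile(values, quantile)


def toy_statistic(sample):
    # Hypothetical stand-in for a test statistic:
    # sample size times the squared norm of the sample mean.
    return sample.shape[0] * np.sum(sample.mean(axis=0) ** 2)


rng = np.random.default_rng(78990)
data = rng.standard_normal((500, 4))
cv = subsampling_critical_value(data, toy_statistic, b=0.9, quantile=0.95,
                                num_iter=150, random_state=42)
```

With these settings each of the 150 iterations uses 450 of the 500 rows, and the critical value is the 0.95 quantile of the 150 resulting statistics. Fixing random_state makes the estimate reproducible.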

Attributes#

For Normality Test:

test_type_ (str): The type of test performed on the data.
execution_time (float): Time taken for the test method to execute.
un_h0_rejected_ (boolean): Whether the null hypothesis using Un is rejected (True) or not (False).
vn_h0_rejected_ (boolean): Whether the null hypothesis using Vn is rejected (True) or not (False).
un_test_statistic_ (float): Un test statistic of the performed test type.
vn_test_statistic_ (float): Vn test statistic of the performed test type.
un_cv_ (float): Critical value for Un.
vn_cv_ (float): Critical value for Vn.

For Two-Sample and K-Sample Test:

test_type_ (str): The type of test performed on the data.
execution_time (float): Time taken for the test method to execute.
dn_h0_rejected_ (boolean): Whether the null hypothesis using Dn is rejected (True) or not (False).
dn_test_statistic_ (float): Dn test statistic of the performed test type.
dn_cv_ (float): Critical value for Dn.
trace_h0_rejected_ (boolean): Whether the null hypothesis using the trace statistic is rejected (True) or not (False).
trace_test_statistic_ (float): Trace test statistic of the performed test type.
trace_cv_ (float): Critical value for the trace statistic.
cv_method_ (str): Critical value method used for performing the test.

References#

Markatou, M., & Saraceno, G. (2024). A unified framework for multivariate two-sample and k-sample kernel-based quadratic distance goodness-of-fit tests. arXiv preprint arXiv:2407.16374.

Lindsay, B. G., Markatou, M., & Ray, S. (2014). Kernels, degrees of freedom, and power properties of quadratic distance goodness-of-fit tests. Journal of the American Statistical Association, 109(505), 395-410. DOI: 10.1080/01621459.2013.836972.

Examples#

import numpy as np
from QuadratiK.kernel_test import KernelTest

np.random.seed(78990)
# Generate 500 observations from a 4-dimensional standard normal distribution
data_norm = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=500)
# Perform the normality test against the reference distribution N(mu, sigma)
mu = np.zeros(4)
sigma = np.eye(4)
normality_test = KernelTest(h=0.4, num_iter=150, method="subsampling", mu=mu, sigma=sigma, random_state=42).test(data_norm)
print(normality_test)

KernelTest(
  Test Type=Kernel-based quadratic distance Normality test,
  Execution Time=13.353809356689453 seconds,
  U-Statistic=-0.11459366046288307,
  U-Statistic Critical Value=1.7841253047274597,
  U-Statistic Null Hypothesis Rejected=False,
  U-Statistic Variance=1.108021332522181e-08,
  V-Statistic=0.9779550271616873,
  V-Statistic Critical Value=42.460022848761945,
  V-Statistic Null Hypothesis Rejected=False,
  Selected tuning parameter h=0.4
)

import numpy as np
from scipy.stats import skewnorm
from QuadratiK.kernel_test import KernelTest

np.random.seed(0)
# Generate a standard normal sample and a skew-normal sample, each of shape (200, 4)
X_2 = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=200)
Y_2 = skewnorm.rvs(size=(200, 4), loc=np.zeros(4), scale=np.ones(4), a=np.repeat(0.5, 4), random_state=20)
# Perform the two-sample test
two_sample_test = KernelTest(h=2, num_iter=150, random_state=42).test(X_2, Y_2)
print(two_sample_test)

KernelTest(
  Test Type=Kernel-based quadratic distance two-sample test,
  Execution Time=0.4157063961029053 seconds,
  Dn-Statistic=5.061212999055006,
  Dn-Statistic Critical Value=0.6311271530521103,
  Dn-Statistic Null Hypothesis Rejected=True,
  Dn-Statistic Variance=3.037711857184591e-10,
  Trace-Statistic=15.75171816373428,
  Trace-Statistic Critical Value=1.9647673822217238,
  Trace-Statistic Null Hypothesis Rejected=True,
  Trace-Statistic Variance=7.87978087705095e-12,
  Selected tuning parameter h=2,
  Critical Value Method=subsampling
)

Methods

`KernelTest.stats`()	Function to generate descriptive statistics per variable (and per group if available).
`KernelTest.summary`([print_fmt])	Summary function generates a table for the kernel test results and the summary statistics.
`KernelTest.test`(x[, y])	Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h.

KernelTest.stats() → DataFrame#

Function to generate descriptive statistics per variable (and per group if available).

Returns#

summary_stats_df (pandas.DataFrame): DataFrame of descriptive statistics.

KernelTest.summary(print_fmt: str = 'simple_grid') → str#

Generates a table of the kernel test results and the summary statistics.

Parameters#

print_fmt (str, optional): The format in which the output table is printed. Defaults to “simple_grid”. All table formats available in tabulate are supported; see https://pypi.org/project/tabulate/.

Returns#

summary (str): A string in the requested output format containing the kernel test results and summary statistics.

KernelTest.test(x: ndarray | DataFrame, y: ndarray | DataFrame | None = None) → KernelTest#

Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. Depending on y, the function performs a test of multivariate normality, a non-parametric two-sample test, or a k-sample test.

Parameters#

x (numpy.ndarray or pandas.DataFrame): A numeric array of data values.
y (numpy.ndarray or pandas.DataFrame, optional): A numeric array of data values (for the two-sample test) or a 1D array of class labels (for the k-sample test). Defaults to None.

Returns#

self (object): Fitted estimator.
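The input layout for the k-sample case can be sketched as follows: x stacks the observations of all groups, and y is a 1D array with one class label per row of x. The data, the helper name run_k_sample_test, and the value h=1.5 are illustrative assumptions; the KernelTest call uses only the constructor arguments and the test method documented on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 100
# Three groups of 2-dimensional observations; the third group is shifted by 1.
groups = [
    rng.standard_normal((size, 2)),
    rng.standard_normal((size, 2)),
    rng.standard_normal((size, 2)) + 1.0,
]
x = np.vstack(groups)                         # shape (300, 2): all observations
y = np.repeat(np.arange(1, 4), repeats=size)  # shape (300,): labels 1, 2, 3


def run_k_sample_test(x, y):
    """Hypothetical helper: run the k-sample test (requires QuadratiK)."""
    # Imported here so the data preparation above runs without the package.
    from QuadratiK.kernel_test import KernelTest

    return KernelTest(h=1.5, method="subsampling", num_iter=150,
                      random_state=42).test(x, y)
```

With QuadratiK installed, print(run_k_sample_test(x, y)) displays the test results in the same style as the examples above. Passing a second data array instead of labels selects the two-sample test.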
