KernelTest#
- class QuadratiK.kernel_test.KernelTest(h: float | None = None, method: str = 'subsampling', num_iter: int = 150, b: float = 0.9, quantile: float = 0.95, mu: ndarray | None = None, sigma: ndarray | None = None, centering_type: str = 'nonparam', alternative: str | None = None, k_threshold: int = 10, random_state: int | None = None, n_jobs: int = 8)#
Class for performing the kernel-based quadratic distance goodness-of-fit tests using the Gaussian kernel with tuning parameter h. Depending on the input y, the test method performs the test of multivariate normality, the non-parametric two-sample test, or the k-sample test. More details on kernel-based quadratic distance goodness-of-fit tests can be found in the User Guide.
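As a rough illustration of the role of the tuning parameter h, the sketch below evaluates a Gaussian kernel between pairs of points and averages it over a small sample. The names `gaussian_kernel` and `mean_kernel` are hypothetical, and the unnormalized form exp(-||x - y||^2 / (2 h^2)) is an assumption: the library's kernel may include a normalizing constant and centering that are omitted here.

```python
import math

def gaussian_kernel(x, y, h):
    """Unnormalized Gaussian kernel exp(-||x - y||^2 / (2 h^2)) (assumed form)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * h ** 2))

def mean_kernel(points, h):
    """Average kernel value over all ordered pairs, including i == j."""
    n = len(points)
    total = sum(gaussian_kernel(p, q, h) for p in points for q in points)
    return total / (n * n)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
small_h = mean_kernel(points, h=0.4)
large_h = mean_kernel(points, h=2.0)
# A larger h makes distant points look more similar, so the average grows.
print(small_h, large_h)
```

With a small h only near-identical points contribute, while a large h smooths over wider distances, which is why the choice of h affects how sensitive a kernel-based statistic is to different departures from the null.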
Parameters#
- h : float, optional
Bandwidth for the kernel function.
- method : str, optional
The method used for critical value estimation (“subsampling”, “bootstrap”, or “permutation”). Defaults to “subsampling”.
- num_iter : int, optional
The number of iterations to use for critical value estimation. Defaults to 150.
- b : float, optional
The size of the subsamples used in the subsampling algorithm, as a fraction of the total sample size N. Defaults to 0.9, i.e. 0.9N samples are used.
- quantile : float, optional
The quantile to use for critical value estimation. Defaults to 0.95.
- mu : numpy.ndarray, optional
Mean vector for the reference distribution. Mandatory for the normality test and for the two-sample test with parametric centering. Defaults to None.
- sigma : numpy.ndarray, optional
Covariance matrix of the reference distribution. Mandatory for the normality test and for the two-sample test with parametric centering. Defaults to None.
- alternative : str, optional
String indicating the type of alternative used by the tuning parameter selection algorithm to calculate h when h is not provided. Must be one of “location”, “scale”, or “skewness”. Defaults to None.
- k_threshold : int, optional
Maximum number of groups allowed. Defaults to 10. Increase it when the data contain more than 10 groups.
- random_state : int, None, optional
Seed for random number generation. Defaults to None.
- n_jobs : int, optional
The maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. For more information on joblib n_jobs, see https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html. Defaults to 8.
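To make the interplay of num_iter, b, quantile, and random_state more concrete, here is a minimal sketch of a generic subsampling estimate of a critical value. It is not the library's implementation: the function `subsampling_critical_value` and the toy statistic `mean_abs` are hypothetical, and any rescaling or centering the library applies is left out.

```python
import random

def subsampling_critical_value(data, statistic, num_iter=150, b=0.9,
                               quantile=0.95, random_state=None):
    """Empirical quantile of a statistic over random subsamples (generic sketch)."""
    rng = random.Random(random_state)
    m = max(1, int(b * len(data)))  # subsample size, roughly b * N
    values = sorted(
        statistic(rng.sample(data, m))  # draw m points without replacement
        for _ in range(num_iter)
    )
    # Index of the empirical quantile among the sorted values.
    idx = min(num_iter - 1, int(quantile * num_iter))
    return values[idx]

def mean_abs(sample):
    """Toy statistic: mean absolute value (hypothetical stand-in)."""
    return sum(abs(v) for v in sample) / len(sample)

gen = random.Random(0)
data = [gen.gauss(0.0, 1.0) for _ in range(200)]
cv = subsampling_critical_value(data, mean_abs, random_state=42)
print(cv)
```

Fixing random_state makes the subsamples, and therefore the estimated critical value, reproducible across runs. Increasing num_iter gives a smoother estimate of the quantile at the cost of more computation.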
Attributes#
- For Normality Test:
- test_type_ : str
The type of test performed on the data.
- execution_time : float
Time taken for the test method to execute.
- un_h0_rejected_ : boolean
Whether the null hypothesis using Un is rejected (True) or not (False).
- vn_h0_rejected_ : boolean
Whether the null hypothesis using Vn is rejected (True) or not (False).
- un_test_statistic_ : float
Un test statistic of the performed test.
- vn_test_statistic_ : float
Vn test statistic of the performed test.
- un_cv_ : float
Critical value for Un.
- vn_cv_ : float
Critical value for Vn.
- For Two-Sample and K-Sample Test:
- test_type_ : str
The type of test performed on the data.
- execution_time : float
Time taken for the test method to execute.
- dn_h0_rejected_ : boolean
Whether the null hypothesis using Dn is rejected (True) or not (False).
- dn_test_statistic_ : float
Dn test statistic of the performed test.
- dn_cv_ : float
Critical value for Dn.
- trace_h0_rejected_ : boolean
Whether the null hypothesis using the trace statistic is rejected (True) or not (False).
- trace_test_statistic_ : float
Trace test statistic of the performed test.
- trace_cv_ : float
Critical value for the trace statistic.
- cv_method_ : str
Critical value method used for performing the test.
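The rejection flags can be read as a comparison between each statistic and its critical value. The helper `is_rejected` below is hypothetical and assumes the null hypothesis is rejected when the statistic exceeds the critical value. The numbers are rounded from the outputs shown in the Examples section, where this rule reproduces the reported decisions.

```python
def is_rejected(statistic, critical_value):
    """Assumed rule: reject H0 when the statistic exceeds its critical value."""
    return statistic > critical_value

# Values rounded from the example outputs in the Examples section.
un_rejected = is_rejected(-0.1146, 1.7841)     # normality test, U-statistic
vn_rejected = is_rejected(0.9780, 42.4600)     # normality test, V-statistic
dn_rejected = is_rejected(5.0612, 0.6311)      # two-sample test, Dn-statistic
trace_rejected = is_rejected(15.7517, 1.9648)  # two-sample test, trace statistic
print(un_rejected, vn_rejected, dn_rejected, trace_rejected)
# prints: False False True True
```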
References#
Markatou, M., & Saraceno, G. (2024). A unified framework for multivariate two-sample and k-sample kernel-based quadratic distance goodness-of-fit tests. arXiv preprint arXiv:2407.16374.
Lindsay BG, Markatou M. & Ray S. (2014) Kernels, Degrees of Freedom, and Power Properties of Quadratic Distance Goodness-of-Fit Tests, Journal of the American Statistical Association, 109:505, 395-410, DOI: 10.1080/01621459.2013.836972.
Examples#
>>> import numpy as np
>>> np.random.seed(78990)
>>> from QuadratiK.kernel_test import KernelTest
>>> # data generation
>>> data_norm = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=500)
>>> # performing the normality test
>>> mu = np.zeros(4)
>>> sigma = np.eye(4)
>>> normality_test = KernelTest(
...     h=0.4, num_iter=150, method="subsampling", mu=mu, sigma=sigma, random_state=42
... ).test(data_norm)
>>> print(normality_test)
KernelTest(
    Test Type=Kernel-based quadratic distance Normality test,
    Execution Time=13.353809356689453 seconds,
    U-Statistic=-0.11459366046288307,
    U-Statistic Critical Value=1.7841253047274597,
    U-Statistic Null Hypothesis Rejected=False,
    U-Statistic Variance=1.108021332522181e-08,
    V-Statistic=0.9779550271616873,
    V-Statistic Critical Value=42.460022848761945,
    V-Statistic Null Hypothesis Rejected=False,
    Selected tuning parameter h=0.4
)
>>> import numpy as np
>>> np.random.seed(0)
>>> from scipy.stats import skewnorm
>>> from QuadratiK.kernel_test import KernelTest
>>> # data generation
>>> X_2 = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=200)
>>> Y_2 = skewnorm.rvs(
...     size=(200, 4), loc=np.zeros(4), scale=np.ones(4), a=np.repeat(0.5, 4), random_state=20
... )
>>> # performing the two sample test
>>> two_sample_test = KernelTest(h=2, num_iter=150, random_state=42).test(X_2, Y_2)
>>> print(two_sample_test)
KernelTest(
    Test Type=Kernel-based quadratic distance two-sample test,
    Execution Time=0.4157063961029053 seconds,
    Dn-Statistic=5.061212999055006,
    Dn-Statistic Critical Value=0.6311271530521103,
    Dn-Statistic Null Hypothesis Rejected=True,
    Dn-Statistic Variance=3.037711857184591e-10,
    Trace-Statistic=15.75171816373428,
    Trace-Statistic Critical Value=1.9647673822217238,
    Trace-Statistic Null Hypothesis Rejected=True,
    Trace-Statistic Variance=7.87978087705095e-12,
    Selected tuning parameter h=2,
    Critical Value Method=subsampling
)
Methods
| stats() | Function to generate descriptive statistics per variable (and per group if available). |
| summary([print_fmt]) | Summary function generates a table for the kernel test results and the summary statistics. |
| test(x[, y]) | Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. |
- KernelTest.stats() -> DataFrame#
Function to generate descriptive statistics per variable (and per group if available).
Returns#
- summary_stats_df : pandas.DataFrame
DataFrame of descriptive statistics.
- KernelTest.summary(print_fmt: str = 'simple_grid') -> str#
Summary function generates a table for the kernel test results and the summary statistics.
Parameters#
- print_fmt : str, optional
Used for printing the output in the desired format. Defaults to “simple_grid”. Supports all formats available in tabulate; see https://pypi.org/project/tabulate/.
Returns#
- summary : str
A string formatted in the desired output format with the kernel test results and summary statistics.
- KernelTest.test(x: ndarray | DataFrame, y: ndarray | DataFrame | None = None) -> KernelTest#
Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. Depending on the shape of y, the function performs the test of multivariate normality, the non-parametric two-sample test, or the k-sample test.
Parameters#
- x : numpy.ndarray or pandas.DataFrame
A numeric array of data values.
- y : numpy.ndarray or pandas.DataFrame, optional
A numeric array of data values (for the two-sample test) or a 1D array of class labels (for the k-sample test). Defaults to None.
Returns#
- self : object
Fitted estimator.
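The way y selects the test can be sketched as a small dispatch function. This is only an illustration of the behaviour described above, not library code: `infer_test_type` is a hypothetical helper, nested lists stand in for arrays, and the library may perform additional validation that is not shown.

```python
def infer_test_type(x, y=None):
    """Guess which test applies from y (hypothetical helper, not library code)."""
    if y is None:
        # No second argument: x is tested against the normal reference.
        return "normality"
    if isinstance(y[0], (list, tuple)):
        # y holds rows of data values: it is a second sample.
        return "two-sample"
    # Otherwise y is a flat sequence of class labels for the rows of x.
    return "k-sample"

x = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(infer_test_type(x))                              # normality
print(infer_test_type(x, [[1.0, 1.1], [1.2, 1.3]]))    # two-sample
print(infer_test_type(x, [0, 1, 1]))                   # k-sample
```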