Select the value of the kernel tuning parameter (h)

Contents

Select the value of the kernel tuning parameter (h)#

The function performs the selection of the optimal value for the tuning parameter \(h\) of the normal kernel function, for normality test, the two-sample, and k-sample KBQD tests. It performs a small simulation study, generating samples according to the family of alternative specified, for the chosen values of h_values and delta.

We consider target alternatives \(F_\delta(\hat{\mathbf{\mu}}, \hat{\mathbf{\Sigma}}, \hat{\mathbf{\lambda}})\), where \(\hat{\mathbf{\mu}}, \hat{\mathbf{\Sigma}}\) and \(\hat{\mathbf{\lambda}}\) indicate the location, covariance, and skewness parameter estimates from the pooled sample.

  • Compute the estimates of the mean \(\hat{\mu}\), covariance matrix \(\hat{\Sigma}\), and skewness \(\hat{\lambda}\) from the pooled sample.

  • Choose the family of alternatives \(F_\delta = F_\delta(\hat{\mu}, \hat{\Sigma}, \hat{\lambda})\).

For each value of \(\delta\) and \(h\):

  • Generate \(\mathbf{X}_1, \ldots, \mathbf{X}_{k-1} \sim F_0\), for \(\delta = 0\);

  • Generate \(\mathbf{X}_k \sim F_\delta\);

  • Compute the \(k\)-sample test statistic between \(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k\) with kernel parameter \(h\);

  • Compute the power of the test. If it is greater than 0.5, select \(h\) as the optimal value.

  • If an optimal value has not been selected, choose the \(h\) which corresponds to maximum power.

The available alternative options are:

  • location alternatives, \(F_\delta = SN_d(\hat{\mu} + \delta, \hat{\Sigma}, \hat{\lambda})\), with \(\delta = 0.2, 0.3, 0.4\);

  • scale alternatives, \(F_\delta = SN_d(\hat{\mu}, \hat{\Sigma} \cdot \delta, \hat{\lambda})\), with \(\delta = 1.1, 1.3, 1.5\);

  • skewness alternatives, \(F_\delta = SN_d(\hat{\mu}, \hat{\Sigma}, \hat{\lambda} + \delta)\), with \(\delta = 0.2, 0.3, 0.6\).

The values of \(h = 0.6, 1, 1.4, 1.8, 2.2\) and \(N = 50\) are set as default values. The function select_h() allows the user to set the values of \(\delta\) and \(h\) for a more extensive grid search. We suggest a more extensive grid search when computational resources permit.

Note

Please be aware that the select_h() function may take a significant amount of time to run, especially with larger datasets or when using a larger number of parameters in h_values and delta. Consider this when applying the function to large or complex data.

References#

Markatou, M., & Saraceno, G. (2024). A unified framework for multivariate two-sample and k-sample kernel-based quadratic distance goodness-of-fit tests. arXiv preprint arXiv:2407.16374.

Saraceno, G., Markatou, M., Mukhopadhyay, R., & Golzy, M. (2024). Goodness-of-Fit and Clustering of Spherical Data: the QuadratiK package in R and Python. arXiv preprint arXiv:2402.02290.