Usage Instructions for Dashboard Application#

This notebook describes how the dashboard application in the QuadratiK package can be used.

Instantiating the Dashboard Application#

[4]:
# uncomment the below code to instantiate the dashboard on a local machine
"""
from QuadratiK.ui import UI
UI().run()
"""

Normality Test#

Once the user clicks on the Normality Test functionality in the left pane, the interface for performing the “Normality Test” is displayed (see image below). A concise description of the method is included at the beginning, followed by an expandable component that shows the actual Python and R code used in the backend to generate the results. Presenting the actual code enables users of the GUI to quickly replicate the results, should they wish to integrate this package into their pre-existing pipelines.

Normality Test Options:

The user needs to specify the delimiter for the data files and upload their datasets in order to perform the test. Data files of up to 20 MB can be uploaded. Once the data files are uploaded, the user is presented with a number of input fields for changing the default parameters used to perform the test. Depending on the parameters chosen, the test is performed and the results are shown to the user, with an option to download them. Furthermore, we present statistical measures such as the mean, standard deviation, median, interquartile range (IQR), and minimum and maximum values for each variable or feature in the data. A sample output consisting of the results of the normality test and the summary statistics for the uploaded dataset is shown below:

Normality Test Results:
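
As a rough illustration of what the dashboard runs in the backend for this test, the sketch below uses the KernelTest class from the QuadratiK Python API. The argument names (h, random_state) and the summary() call reflect the package documentation as we recall it, so please check the documentation of your installed version if they differ.

# Minimal sketch of the backend normality test (names assumed from the
# QuadratiK Python API; verify against the package documentation).
import numpy as np
from QuadratiK.kernel_test import KernelTest

rng = np.random.default_rng(42)
x = rng.normal(size=(200, 4))                       # stand-in for the uploaded dataset

normality_test = KernelTest(h=0.4, random_state=42).test(x)
print(normality_test.summary())                     # test results and summary statistics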

Additionally, we display the QQ plots for the different variables or features present in the uploaded dataset as shown below:

QQ Plots:
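
The per-feature Q-Q plots can be reproduced outside the dashboard as well. The sketch below is a generic scipy/matplotlib version, not the dashboard's exact backend code.

# Generic sketch: one normal Q-Q plot per feature using scipy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))                       # stand-in for the uploaded dataset

fig, axes = plt.subplots(1, x.shape[1], figsize=(4 * x.shape[1], 4))
for j, ax in enumerate(np.atleast_1d(axes)):
    stats.probplot(x[:, j], dist="norm", plot=ax)   # Q-Q plot for feature j
    ax.set_title(f"Feature {j}")
plt.tight_layout()
plt.show()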

Two- and K-sample Tests#

We implement a similar interface for the two- and k-sample tests. The user needs to upload two data files corresponding to the two samples to be tested. Next, we display the statistical measures for the uploaded datasets. For the two-sample test, we also display the Q-Q plot for each feature, plotting the two sets of quantiles from the uploaded datasets.
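
A sketch of the corresponding backend call for the two-sample case is given below; as before, the KernelTest names are assumed from the QuadratiK Python API and may differ slightly in your installed version.

# Minimal sketch of the two-sample test: pass the second sample as y
# (names assumed from the QuadratiK Python API).
import numpy as np
from QuadratiK.kernel_test import KernelTest

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, size=(200, 4))              # first uploaded sample
y = rng.normal(loc=0.5, size=(200, 4))              # second uploaded sample (shifted mean)

two_sample_test = KernelTest(h=0.4, random_state=42).test(x, y)
print(two_sample_test.summary())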

Examples from the K-sample test are shown below:

K Sample Test Options:

The user needs to upload the data set and provide values for the various options. Once these steps have been performed, the test is computed and the results, along with various summary statistics, are shown below:

K Sample Test Results:
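
For the k-sample case the backend call is analogous: the observations from all groups are stacked into a single matrix and the group labels are passed as y. The sketch below again assumes the KernelTest interface from the QuadratiK Python API.

# Minimal sketch of the k-sample test with three groups
# (names assumed from the QuadratiK Python API).
import numpy as np
from QuadratiK.kernel_test import KernelTest

rng = np.random.default_rng(42)
n, d = 100, 4
x = np.vstack([rng.normal(loc=shift, size=(n, d)) for shift in (0.0, 0.2, 0.4)])
y = np.repeat([1, 2, 3], n)                         # group membership labels

k_sample_test = KernelTest(h=1.5, random_state=42).test(x, y)
print(k_sample_test.summary())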

Tuning Parameter Selection#

The interface for Tuning Parameter Selection (shown below) is used to compute the kernel bandwidth of the Gaussian kernel for the proposed kernel-based quadratic distance (KBQD) tests. We employ a similar interface in which the user starts by indicating the delimiter, followed by uploading the data file. Subsequently, the user can specify the column in the data file containing the labels, along with the number of iterations for critical value estimation, the proportion of observations to use for subsampling, and the alternative, or they can choose to use the default values. When determining h for the normality test, the user needs to upload a data file with a column containing a single label for all observations.

Select h options:

The result of the tuning parameter selection algorithm, showing the plot of power versus \(h\) for different values of \(\delta\), is shown below:

Select h results:
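
A sketch of the underlying call is shown below. The select_h helper, its arguments (the data x, the labels y, and the alternative), and the returned pair of the selected bandwidth and the estimated powers are assumptions based on the QuadratiK Python API as we recall it; the exact signature may differ, so please consult the package documentation.

# Minimal sketch of tuning parameter selection (assumed QuadratiK API).
import numpy as np
from QuadratiK.tools import select_h

rng = np.random.default_rng(42)
x = rng.normal(size=(200, 4))                       # uploaded data matrix
y = np.repeat([1, 2], 100)                          # label column (a single label in the normality case)

# assumed to return the selected bandwidth and the estimated powers per h and delta
h_selected, all_powers = select_h(x=x, y=y, alternative="location")
print("Selected h:", h_selected)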

Uniformity Test#

To conduct the Uniformity Test, users are again required to indicate the delimiter utilized in the data file and upload the corresponding file. After uploading the data file, users have the option to either specify the number of iterations for critical value estimation and the concentration parameter, or they can choose to use the default values.
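
A sketch of the backend call for the uniformity test is given below. The PoissonKernelTest class, its rho (concentration) and num_iter (critical value iterations) arguments, and the summary() call are assumptions based on the QuadratiK Python API as we recall it; please verify against the package documentation.

# Minimal sketch of the uniformity test on the sphere (assumed QuadratiK API).
import numpy as np
from QuadratiK.poisson_kernel_test import PoissonKernelTest

rng = np.random.default_rng(42)
x = rng.normal(size=(300, 3))
x = x / np.linalg.norm(x, axis=1, keepdims=True)    # project the points onto the unit sphere

uniformity_test = PoissonKernelTest(rho=0.7, num_iter=300, random_state=42).test(x)
print(uniformity_test.summary())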

Data generation from PKBD Models#

We also include the functionality for users to generate data from the supported PKBD models using the rejvmf and rejacg sampling algorithms.

To generate data from the supported PKBD models, users can specify the number of samples, the concentration parameter value, and the list of location parameters, as displayed below:

Random Generation from PKBD Options:
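
A sketch of the corresponding backend call is shown below; the PKBD class and its rpkb method (number of samples n, location mu, concentration rho, and the sampling method) follow the QuadratiK Python API as we recall it and may differ slightly in your installed version.

# Minimal sketch of random generation from a PKBD model (assumed QuadratiK API).
from QuadratiK.spherical_clustering import PKBD

pkbd = PKBD()
samples = pkbd.rpkb(n=500, mu=[0.0, 0.0, 1.0], rho=0.8, method="rejvmf", random_state=42)
print(samples.shape)                                # (500, 3) points on the unit sphere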

If users generate 2-dimensional or 3-dimensional data, we also offer a visualization of the generated data on a circle or an interactive 3D sphere. However, if the user generates data in more than 3 dimensions, this visualization is not supported. An example of the 3D visualization of the generated samples is shown below:

Random Generation from PKBD Sphere Visualization:
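
A static approximation of this visualization can be produced with matplotlib; the sketch below is generic and uses uniformly generated points on the sphere as a stand-in for the PKBD samples, rather than reproducing the dashboard's interactive plot.

# Generic sketch: static 3D scatter of points on the unit sphere.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 3))
pts = pts / np.linalg.norm(pts, axis=1, keepdims=True)   # stand-in for the generated samples

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(projection="3d")
ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=5)
ax.set_box_aspect((1, 1, 1))                        # keep the sphere undistorted
plt.show()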

Clustering on Sphere#

We also present an interface for users to apply the Poisson kernel-based clustering (PKBC) algorithm. In order to describe the functionalities and visualizations incorporated for Poisson kernel-based clustering, we use the Crabs dataset from the R package MASS and perform the clustering on it. The various options which need to be selected by the user are shown below:

Options:

We upload the Crabs dataset as a single data file and explicitly indicate the delimiter used in the file. Additionally, we specify whether the true labels for the clusters are present in the data file and, if so, provide the corresponding column number. Once these parameters are provided, the results are generated. Initially, the results are displayed in a collapsed manner, requiring the user to click on them to access the complete output. Along with the clustering results, we also show the values of the Adjusted Rand Index (ARI), macro precision, and macro recall, which for this particular dataset are 0.809, 0.949, and 0.949, respectively. The results are shown below:

Clustering Results:
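
A sketch of the clustering step the dashboard performs is given below, on synthetic data rather than the Crabs dataset. The PKBC class with its num_clust argument, the fit() call, and the validation() method returning the metrics (ARI, macro precision, macro recall) and elbow plots are assumptions based on the QuadratiK Python API as we recall it; please verify against the package documentation.

# Minimal sketch of Poisson kernel-based clustering (assumed QuadratiK API).
import numpy as np
from QuadratiK.spherical_clustering import PKBC

rng = np.random.default_rng(42)
# stand-in for an uploaded dataset: two noisy groups of directions on the unit sphere
x = np.vstack([rng.normal(loc=[0, 0, 3], size=(100, 3)),
               rng.normal(loc=[3, 0, 0], size=(100, 3))])
x = x / np.linalg.norm(x, axis=1, keepdims=True)
y_true = np.repeat([1, 2], 100)                     # true labels, as in the uploaded file

pkbc = PKBC(num_clust=2, random_state=42).fit(x)

# assumed to return the validation metrics and elbow plots when true labels are given
validation_metrics, elbow_plots = pkbc.validation(y_true=y_true)
print(validation_metrics)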

We offer supplementary functionalities such as performing the K-sample test on the clusters identified by the PKBC algorithm as shown below:

K-Sample Test of Clusters:

Finally, the image below displays additional visualization tools, such as the elbow plot. The elbow plot is provided to assist the user in selecting the optimal number of clusters for the given data file. Additionally, a plot of the data on a three-dimensional interactive sphere is also available to the user.

Elbow Plots:

When visualizing the data on the interactive three-dimensional sphere, if the true labels are provided, we create two adjacent subplots. In the first subplot, the data points are colored based on the membership identified by the PKBC algorithm, and in the second subplot, the data points are colored according to their true class label. If the true labels are not provided, only the first subplot is shown. For the purposes of visualizing the data on the sphere, if the provided data file has more than three dimensions, PCA is performed and only the first three dimensions are used for creating the visualization. This is shown in the image below:

Visualization of Clusters on Sphere:
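
The dimension reduction behind this visualization can be sketched as follows. The PCA step to three components matches the description above, while the renormalization onto the unit sphere is our assumption about how the reduced points are placed on the sphere; the code is a generic sklearn version, not the dashboard's exact backend code.

# Generic sketch: reduce to three dimensions with PCA, then place the points on the unit sphere.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))                    # stand-in for a dataset with more than three features

coords = PCA(n_components=3).fit_transform(data)    # first three principal components
coords = coords / np.linalg.norm(coords, axis=1, keepdims=True)   # assumption: renormalize onto the unit sphere
print(coords.shape)                                 # (200, 3), ready for the 3D scatter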