Usage Instructions for Dashboard Application#
This notebook describes how the dashboard application in the QuadratiK package can be used.
Instantiating the Dashboard Application#
[4]:
# uncomment the below code to instantiate the dashboard on a local machine
"""
from QuadratiK.ui import UI
UI().run()
"""
[4]:
'\nfrom QuadratiK.ui import UI\nUI().run()\n'
Main Menu#
The above code will display the landing page of the dashboard application, as shown below:
The available functionalities are displayed on the left-hand pane of the landing page. Currently, the functionalities available are: Normality Test, Two Sample Test, K-Sample Test, Tuning Parameter Selection, Uniformity Test, Data generation from PKBD Models, and Clustering on Sphere.
In the next sections, we describe the usage of the various functionalities in more detail.
Normality Test#
Once the user clicks on the Normality Test functionality on the left pane, the interface for performing the “Normality Test” is displayed (see image below). A concise description of the method is included at the beginning, followed by an expandable component showing the actual Python and R code used in the backend to generate the results. Presenting the actual code enables users of the GUI to quickly replicate the results, should they wish to integrate this package into their pre-existing pipelines.
The user needs to specify the delimiter for the data files and upload their datasets in order to perform the test. Data files of up to 20 MB can be uploaded. Once the data files are uploaded, the user is presented with a number of input fields in which the default parameters used for performing the tests can be changed. Depending on the parameters chosen, the test is performed and the results are shown to the user, who can also download them. Furthermore, we present statistical measures such as the mean, standard deviation, median, interquartile range (IQR), and the minimum and maximum values for each variable or feature in the data. A sample output consisting of the results of the normality test and the summary statistics for the uploaded dataset is shown below:
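As an illustration of how such summary statistics can be computed outside the GUI, here is a minimal sketch using pandas; the column names and data are made up for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Toy dataset standing in for an uploaded file
df = pd.DataFrame({"x": rng.normal(size=100), "y": rng.normal(2.0, 3.0, size=100)})

summary = df.agg(["mean", "std", "median", "min", "max"])
summary.loc["IQR"] = df.quantile(0.75) - df.quantile(0.25)  # interquartile range
print(summary)
```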
Additionally, we display the Q-Q plots for the different variables or features present in the uploaded dataset, as shown below:
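The plotting coordinates underlying such a Q-Q plot can be reproduced with SciPy's `probplot`; a sketch on a synthetic sample (only the coordinates are computed here, not drawn):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=200)  # synthetic sample standing in for one feature

# probplot returns the theoretical normal quantiles (osm), the ordered
# sample values (osr), and a least-squares line fit through the points.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
```

For data close to normal, the points `(osm, osr)` lie near a straight line and `r` is close to 1.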
Two- and K-sample Tests#
We implement a similar interface for the Two- and K-sample tests. The user needs to upload two data files corresponding to the two samples to be tested. Next, we display the statistical measures for the uploaded datasets. For the two-sample test, we also display the Q-Q plot for each feature, plotting the two sets of quantiles from the uploaded datasets.
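The two-sample Q-Q plot pairs matched quantiles of the two samples; a minimal sketch with NumPy on synthetic samples:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=120)           # first sample (e.g. one feature of file 1)
b = rng.normal(loc=0.5, size=80)   # second sample, shifted for illustration

probs = np.linspace(0.01, 0.99, 50)
qa, qb = np.quantile(a, probs), np.quantile(b, probs)
# Plotting qa against qb gives the two-sample Q-Q plot; points lying above
# the 45-degree line reflect the location shift between the samples.
```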
Examples from the K-sample test are shown below:
The user needs to upload the data set and provide values for the various options. Once these steps have been performed, the test is computed and the results, along with various summary statistics, are shown below:
Tuning Parameter Selection#
The interface for Tuning Parameter Selection (shown below) is used to compute the kernel bandwidth of the Gaussian kernel for the proposed kernel-based quadratic distance (KBQD) tests. We employ a similar interface in which the user starts by indicating the delimiter, followed by uploading the data file. Subsequently, the user can specify the column in the data file containing the labels, along with the number of iterations for critical value estimation, the proportion of subsampling samples to be used, and the alternative, or they can choose to use the default values. When determining \(h\) for the normality test, the user needs to upload a data file with a column containing only a single label for all observations.
The result of the tuning parameter selection algorithm, showing the plot of power versus \(h\) for different values of \(\delta\), is shown below:
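The idea behind this selection, estimating the power of a kernel test over a grid of bandwidths \(h\) for a given alternative \(\delta\), can be sketched with a toy Monte Carlo study. The MMD-style statistic below is only an illustrative stand-in for the package's KBQD statistic, and the sample sizes and grids are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def kbqd_like_stat(x, y, h):
    """MMD-style Gaussian-kernel two-sample statistic (illustrative only)."""
    def gram(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.exp(-np.sum(d**2, axis=-1) / (2.0 * h**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def estimate_power(h, delta, n=50, reps=100, alpha=0.05):
    # Critical value estimated under the null (both samples from N(0, I)),
    # then the rejection rate under a mean shift of size delta.
    null = [kbqd_like_stat(rng.normal(size=(n, 2)),
                           rng.normal(size=(n, 2)), h) for _ in range(reps)]
    crit = np.quantile(null, 1.0 - alpha)
    alt = [kbqd_like_stat(rng.normal(size=(n, 2)),
                          rng.normal(size=(n, 2)) + [delta, 0.0], h)
           for _ in range(reps)]
    return float(np.mean(np.array(alt) > crit))

h_grid = [0.2, 0.5, 1.0, 2.0]
powers = {h: estimate_power(h, delta=1.0) for h in h_grid}
# Plotting powers against h_grid reproduces the shape of a power-vs-h curve.
```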
Uniformity Test#
To conduct the Uniformity Test, users are again required to indicate the delimiter utilized in the data file and upload the corresponding file. After uploading the data file, users can either specify the number of iterations for critical value estimation and the concentration parameter, or use the default values.
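As a point of comparison, a classical uniformity check on the circle is the Rayleigh test, which rejects when the resultant length of the unit vectors is large. The sketch below is not the package's Poisson kernel-based test, just an illustration of testing uniformity on the sphere:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
u = np.column_stack([np.cos(theta), np.sin(theta)])  # 200 points on the unit circle

rbar = np.linalg.norm(u.mean(axis=0))  # mean resultant length
stat = 2 * len(u) * rbar**2            # ~ chi-squared with 2 df under uniformity
pval = chi2.sf(stat, df=2)             # large p-value: no evidence against uniformity
```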
Data generation from PKBD Models#
We also include the functionality for users to generate data from the supported PKBD algorithms, rejvmf and rejacg.
To generate data from the supported PKBD models users can specify the number of samples, the concentration parameter value, and the list of location parameters, as displayed below:
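For the special case of concentration parameter \(\rho = 0\), the PKBD density reduces to the uniform distribution on the sphere, so generating such samples can be sketched directly with NumPy (the rejvmf and rejacg samplers in the package handle the general \(\rho > 0\) case):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 500, 3
x = rng.normal(size=(n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)
# x now holds n points distributed uniformly on the unit sphere in R^d,
# i.e. PKBD samples with concentration parameter rho = 0.
```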
If users generate 2-dimensional or 3-dimensional data, we also offer a visualization of the generated data on a circle or an interactive 3D sphere. However, if the user generates data in more than 3 dimensions, visualization is not supported. An example of a 3D visualization of the generated samples is shown below:
Clustering on Sphere#
We also present an interface for users to use the Poisson kernel-based clustering (PKBC) algorithm. In order to describe the functionalities and visualizations incorporated for Poisson kernel-based clustering, we use the Crabs dataset from the R package MASS and perform the clustering on it. The various options which need to be selected by the user are shown below:
We upload the Crabs dataset as a single data file and explicitly indicate the delimiter used in the file. Additionally, we specify whether the true labels for the clusters are present in the data file and, if so, provide the corresponding column number. Once these parameters are provided, the results are generated. Initially, the results are displayed in a collapsed manner, requiring the user to click on them to access the complete generated result. Along with the results for the clustering, we also show the values of the ARI, macro precision, and macro recall, which for this particular dataset are 0.809, 0.949, and 0.949, respectively. The results are shown below:
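Given cluster assignments and true labels, these agreement measures can be computed with scikit-learn; a sketch with made-up labels (macro precision and recall assume the predicted cluster ids have already been matched to the true label ids):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 2, 2])  # hypothetical true memberships
y_pred = np.array([0, 0, 1, 2, 2, 2])  # hypothetical cluster assignments

ari = adjusted_rand_score(y_true, y_pred)
macro_precision = precision_score(y_true, y_pred, average="macro")
macro_recall = recall_score(y_true, y_pred, average="macro")
```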
We offer supplementary functionalities such as performing the K-sample test on the clusters identified by the PKBC algorithm as shown below:
Finally, the image below displays additional visualization tools, such as the elbow plot. The elbow plot is provided to assist the user in selecting the optimal number of clusters for the given data file. Additionally, a plot of the data on a three-dimensional interactive sphere is also available to the user.
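The elbow heuristic itself can be sketched with any clustering algorithm that reports a within-cluster dispersion; here k-means stands in for PKBC on made-up planar data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Three well-separated blobs on which to illustrate the elbow heuristic
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0, 3, 6)])

inertia = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in range(1, 7)}
# The within-cluster sum of squares drops sharply up to k = 3 and flattens
# afterwards; that bend in the curve is the "elbow".
```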
When visualizing the data on the interactive three-dimensional sphere, if the true labels are provided, we create two adjacent subplots. In the first subplot, the data points are colored based on the membership identified by the PKBC algorithm, and in the second subplot, the data points are colored according to their true class labels. If the true labels are not present, only the first subplot is shown. For the purposes of visualizing the data on the sphere, if the provided data file has more than three dimensions, PCA is performed and only the first three principal components are used to create the visualization. This is shown in the image below:
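The dimension-reduction step can be sketched with scikit-learn's PCA; synthetic 5-dimensional data stands in for the uploaded file, and the renormalization onto the sphere after PCA is an assumption made here for the visualization:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))              # data with more than 3 features

X3 = PCA(n_components=3).fit_transform(X)  # keep the first 3 principal components
X3 /= np.linalg.norm(X3, axis=1, keepdims=True)  # place points on the unit sphere
```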