load_wisconsin_breast_cancer_data

QuadratiK.datasets.load_wisconsin_breast_cancer_data(desc: bool = False, return_X_y: bool = False, as_dataframe: bool = True, scaled: bool = False) → tuple[str, DataFrame, DataFrame] | tuple[str, DataFrame] | tuple[str, ndarray] | tuple[DataFrame, DataFrame] | tuple[ndarray, ndarray] | DataFrame | ndarray

The Wisconsin breast cancer dataset has 569 rows and 31 columns. The first 30 columns are features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass; they describe characteristics of the cell nuclei present in the image. The last column holds the class label (Benign = 0, Malignant = 1).

The function load_wisconsin_breast_cancer_data loads the Breast Cancer Wisconsin (Diagnostic) dataset.

Read more in the User Guide.

Parameters

desc : boolean, optional

If set to True, the function will return the description along with the data. If set to False, the description will not be included. Defaults to False.

return_X_y : boolean, optional

If set to True, the features (X) and the class labels (y) are returned as separate objects instead of a single dataset. Defaults to False.

as_dataframe : boolean, optional

Determines whether the function should return the data as a pandas DataFrame (True) or as a numpy array (False). Defaults to True.

scaled : boolean, optional

Determines whether the data should be scaled. If set to True, each row is divided by its Euclidean norm. Defaults to False.
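The row-wise scaling described for scaled=True can be illustrated with a minimal NumPy sketch. The helper name scale_rows is hypothetical and is not part of QuadratiK; this only shows the arithmetic of dividing each row by its Euclidean norm, not the library's own code. Whether the library applies the scaling to the feature columns only or to the whole dataset is not stated here, so the sketch makes no assumption about that.

```python
import numpy as np


def scale_rows(X):
    """Divide each row of X by its Euclidean (L2) norm.

    Hypothetical helper for illustration; not a QuadratiK function.
    Rows with zero norm would produce a division by zero, so callers
    should handle them beforehand if they can occur.
    """
    X = np.asarray(X, dtype=float)
    # keepdims=True keeps the norms as a column so broadcasting is row-wise
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / norms


# Toy input, not taken from the dataset
X_toy = np.array([[3.0, 4.0], [6.0, 8.0]])
X_scaled = scale_rows(X_toy)

# After scaling, every row has (approximately) unit Euclidean length
print(np.linalg.norm(X_scaled, axis=1))
```

After this transformation each observation lies on the unit sphere, so only the direction of the feature vector is kept and its magnitude is discarded.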

Returns

  • If desc=True, return_X_y=True, as_dataframe=True:

    Returns a tuple containing: (str, pd.DataFrame, pd.DataFrame)

    • fdescr : str

      The description of the dataset.

    • X : pd.DataFrame

      A DataFrame with the features.

    • y : pd.DataFrame

      A DataFrame with the class labels.

  • If desc=True, return_X_y=True, as_dataframe=False:

    Returns a tuple containing: (str, np.ndarray, np.ndarray)

    • fdescr : str

      The description of the dataset.

    • X : np.ndarray

      A numpy array with the features.

    • y : np.ndarray

      A numpy array with the class labels.

  • If desc=True, return_X_y=False, as_dataframe=True:

    Returns a tuple containing: (str, pd.DataFrame)

    • fdescr : str

      The description of the dataset.

    • data_df : pd.DataFrame

      A DataFrame containing the entire dataset.

  • If desc=True, return_X_y=False, as_dataframe=False:

    Returns a tuple containing: (str, np.ndarray)

    • fdescr : str

      The description of the dataset.

    • data : np.ndarray

      A numpy array containing the entire dataset.

  • If desc=False, return_X_y=True, as_dataframe=True:

    Returns a tuple containing: (pd.DataFrame, pd.DataFrame)

    • X : pd.DataFrame

      A DataFrame with the features.

    • y : pd.DataFrame

      A DataFrame with the class labels.

  • If desc=False, return_X_y=True, as_dataframe=False:

    Returns a tuple containing: (np.ndarray, np.ndarray)

    • X : np.ndarray

      A numpy array with the features.

    • y : np.ndarray

      A numpy array with the class labels.

  • If desc=False, return_X_y=False, as_dataframe=True:

    Returns: pd.DataFrame

    • data_df : pd.DataFrame

      A DataFrame containing the entire dataset.

  • If desc=False, return_X_y=False, as_dataframe=False:

    Returns: np.ndarray

    • data : np.ndarray

      A numpy array containing the entire dataset.
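Because the shape of the return value depends on desc and return_X_y (as_dataframe only changes the container type), calling code may find it convenient to map every case onto one named structure. The following is a minimal sketch under that reading of the list above; LoadedData and unpack_result are hypothetical names introduced for illustration and are not part of QuadratiK.

```python
from typing import Any, NamedTuple, Optional


class LoadedData(NamedTuple):
    """Uniform view over the documented return shapes (hypothetical type)."""
    description: Optional[str]  # set only when desc=True
    data: Any                   # full dataset, set only when return_X_y=False
    X: Any                      # features, set only when return_X_y=True
    y: Any                      # class labels, set only when return_X_y=True


def unpack_result(result: Any, desc: bool = False,
                  return_X_y: bool = False) -> LoadedData:
    """Map a loader result onto named fields, given the flags used in the call.

    Hypothetical helper; the flags must match those passed to the loader.
    """
    if desc and return_X_y:
        description, X, y = result      # (str, X, y)
        return LoadedData(description, None, X, y)
    if desc:
        description, data = result      # (str, data)
        return LoadedData(description, data, None, None)
    if return_X_y:
        X, y = result                   # (X, y)
        return LoadedData(None, None, X, y)
    return LoadedData(None, result, None, None)  # bare DataFrame or ndarray


# Intended use with the real loader (requires QuadratiK to be installed):
# raw = load_wisconsin_breast_cancer_data(desc=True, return_X_y=True)
# loaded = unpack_result(raw, desc=True, return_X_y=True)
# loaded.X, loaded.y, loaded.description
```

Passing the same flag values to both calls keeps the unpacking in step with the loader; the helper never inspects the objects themselves, so it works for DataFrame and ndarray results alike.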

References

Street, W. N., Wolberg, W. H., & Mangasarian, O. L. (1993, July). Nuclear feature extraction for breast tumor diagnosis. In Biomedical image processing and biomedical visualization (Vol. 1905, pp. 861-870). SPIE.

Source

Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B.

Examples

>>> from QuadratiK.datasets import load_wisconsin_breast_cancer_data
>>> X, y = load_wisconsin_breast_cancer_data(return_X_y=True)
>>> print(X.head())
   radius1  texture1  perimeter1   area1  smoothness1  compactness1  \
0    17.99     10.38      122.80  1001.0      0.11840       0.27760   
1    20.57     17.77      132.90  1326.0      0.08474       0.07864   
2    19.69     21.25      130.00  1203.0      0.10960       0.15990   
3    11.42     20.38       77.58   386.1      0.14250       0.28390   
4    20.29     14.34      135.10  1297.0      0.10030       0.13280   

   concavity1  concave_points1  symmetry1  fractal_dimension1  ...  radius3  \
0      0.3001          0.14710     0.2419             0.07871  ...    25.38   
1      0.0869          0.07017     0.1812             0.05667  ...    24.99   
2      0.1974          0.12790     0.2069             0.05999  ...    23.57   
3      0.2414          0.10520     0.2597             0.09744  ...    14.91   
4      0.1980          0.10430     0.1809             0.05883  ...    22.54   

   texture3  perimeter3   area3  smoothness3  compactness3  concavity3  \
0     17.33      184.60  2019.0       0.1622        0.6656      0.7119   
1     23.41      158.80  1956.0       0.1238        0.1866      0.2416   
2     25.53      152.50  1709.0       0.1444        0.4245      0.4504   
3     26.50       98.87   567.7       0.2098        0.8663      0.6869   
4     16.67      152.20  1575.0       0.1374        0.2050      0.4000   

   concave_points3  symmetry3  fractal_dimension3  
0           0.2654     0.4601             0.11890  
1           0.1860     0.2750             0.08902  
2           0.2430     0.3613             0.08758  
3           0.2575     0.6638             0.17300  
4           0.1625     0.2364             0.07678  

[5 rows x 30 columns]