The purpose of classification or discriminant analysis is to analyze a set of measurements made on an observation in order to assign the object to one of several groups or classes. Depending on the assumptions made about the class covariances, discriminant analysis takes the form of linear discriminant analysis (LDA) or quadratic discriminant analysis (QDA). Both of these discriminant analyses have shortcomings that can be mitigated by regularization. In this article, we will therefore discuss how discriminant analysis can be regularized. Here are the topics to discuss.
Contents
- Brief description of LDA and QDA
- Regularized discriminant analysis
- Implementation in Python
Let’s first briefly discuss linear and quadratic discriminant analysis.
Brief description of LDA and QDA
Linear discriminant analysis, also called discriminant function analysis, is a commonly used dimensionality reduction technique for supervised classification problems. It separates two or more classes with a linear decision boundary built from a linear combination of the explanatory variables, and it projects features from a higher-dimensional space into a lower-dimensional one. The LDA learner makes some assumptions about the data (a short sketch follows the list below).
- The data follows a Gaussian distribution.
- There are no outliers in the data.
- All classes share the same covariance matrix (homoscedasticity).
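To make this concrete, here is a minimal sketch of LDA used as a supervised projection, assuming scikit-learn is installed; the synthetic data and variable names are illustrative only.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: 200 samples, 5 features, 3 classes (illustrative assumption)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
lda = LinearDiscriminantAnalysis(n_components=2)  # project 5D -> 2D
X_2d = lda.fit_transform(X, y)                    # supervised projection
print(X_2d.shape)                                 # (200, 2)
print(lda.explained_variance_ratio_)              # variance captured per component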
Quadratic discriminant analysis is quite similar to linear discriminant analysis, as both are derived from Bayes' probability theorem, except that QDA does not assume that all classes share the same covariance matrix. The mean and the covariance must therefore be estimated separately for each class, which yields a quadratic decision boundary between the classes. There are a few assumptions to consider before implementing QDA (see the sketch after the list).
- The classes may have different covariance matrices.
- Observations in each class are normally distributed, with a class-specific mean and a class-specific covariance.
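A minimal sketch of QDA, again assuming scikit-learn; the point to notice is that QDA estimates a separate mean and covariance per class, which is what makes the boundary quadratic. The toy data is made up for the sketch.

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X, y = make_classification(n_samples=300, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
qda = QuadraticDiscriminantAnalysis(store_covariance=True)
qda.fit(X, y)
print(qda.means_.shape)      # class-specific means: (2, 4)
print(len(qda.covariance_))  # one covariance matrix per class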
Both discriminant analysis techniques rely on sample-based estimates of eigenvalues and the eigenvectors associated with them, and these estimates are biased: the largest sample eigenvalues are biased upward and the smallest downward. The net effect is to exaggerate the importance of the low-variance subspace spanned by the eigenvectors corresponding to the smallest sample eigenvalues. As a result, much of the variance in the estimated discriminant scores (which are used to classify the data) is associated with directions of low sample variance in the measurement space.
Regularized discriminant analysis
Regularization techniques have been very effective at solving poorly posed and ill-posed inverse problems, so the most reliable way to alleviate this bias is to regularize the estimates.
- A problem is poorly posed when the number of parameters to be estimated is comparable to the number of observations.
- It is ill-posed when that number exceeds the sample size.
In these cases, the parameter estimates can be very unstable, giving rise to high variance. Regularization improves the estimates by shrinking them away from their sample values toward more physically plausible ones; this is achieved by applying shrinkage to each class.
Although regularization reduces the variance associated with the sample-based estimates, it can also increase the bias. This bias-variance trade-off is usually controlled by one or more degree-of-belief parameters that determine how strongly the estimates are biased toward "plausible" values of the population parameters.
Whenever the sample size is not significantly greater than the dimension of the measurement space for any class, quadratic discriminant analysis (QDA) is ill-posed. Regularization is typically applied to discriminant analysis by replacing each individual class sample covariance matrix with a weighted average of that class covariance matrix and the pooled covariance matrix.
This applies a considerable degree of regularization by greatly reducing the number of parameters to be estimated. The regularization parameter (λ) used in LDA and QDA takes a value between 0 and 1 and controls the degree to which the individual class covariance estimates shrink toward the pooled estimate. Values between these limits represent intermediate degrees of regularization, as sketched below.
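The core shrinkage step can be written as Σ̂k(λ) = (1 − λ) Σ̂k + λ Σ̂, where Σ̂k is the class covariance and Σ̂ the pooled covariance. Here is a minimal NumPy-only sketch of that convex combination; the function name and toy matrices are illustrative assumptions, not part of any library API.

import numpy as np

def regularized_covariance(class_cov, pooled_cov, lam):
    """Convex combination: lam=0 gives pure QDA, lam=1 gives pure LDA."""
    return (1 - lam) * class_cov + lam * pooled_cov

class_cov = np.array([[2.0, 0.3], [0.3, 1.0]])   # toy class covariance
pooled_cov = np.array([[1.5, 0.1], [0.1, 1.2]])  # toy pooled covariance
print(regularized_covariance(class_cov, pooled_cov, 0.5))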
Let’s see the implementation of this concept in python on a dataset.
Implementation in Python
Let’s mitigate the shortcomings of linear discriminant analysis (LDA) in Python and build a regularized discriminant analysis (RDA) learner.
Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, RepeatedStratifiedKFold, GridSearchCV
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import ConfusionMatrixDisplay, precision_score, recall_score, confusion_matrix
from imblearn.over_sampling import SMOTE
Read and pre-process data
df = pd.read_csv("/content/drive/MyDrive/Datasets/healthcare-dataset-stroke-data.csv")
print("Records =", df.shape[0], "\nFeatures =", df.shape[1])
df.head()
The data contains a total of 12 features, including the dependent variable; a few of them are categorical and need to be encoded before fitting. The data is healthcare related: it contains records for patients and whether they suffered a stroke. The analysis found missing values in the "bmi" feature, which are mitigated by dropping the affected rows, since they cannot be reliably synthesized.
Mitigation of missing values
df.isnull().sum()/len(df)*100

df.dropna(axis=0,inplace=True)
Create dummies
df_pre=pd.get_dummies(df,drop_first=True)
The first dummy column of each categorical feature is dropped (drop_first=True) to save the learner from the dummy variable trap, as illustrated below.
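A small illustration of why drop_first=True matters, using the dataset's smoking_status column as an example: with k categories, k dummies always sum to 1 and are perfectly collinear, so one column is dropped to keep the design matrix full rank. The toy DataFrame is made up for the sketch.

import pandas as pd

toy = pd.DataFrame({"smoking_status": ["never", "smokes", "formerly"]})
print(pd.get_dummies(toy))                   # 3 dummy columns, collinear
print(pd.get_dummies(toy, drop_first=True))  # 2 columns, trap avoided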
Divide the data into training and testing for the learner’s training and testing phase. Use the standard ratio of 70:30 for splitting.
X = df_pre.drop(['stroke'], axis=1)
y = df_pre['stroke']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
Build the LDA
LDA = LinearDiscriminantAnalysis()
LDA.fit(X_train, y_train)  # fit the baseline, unregularized learner
X_test['predictions'] = LDA.predict(X_test)
ConfusionMatrixDisplay.from_predictions(y_test, X_test['predictions'])
plt.show()
tn, fp, fn, tp = confusion_matrix(list(y_test), list(X_test['predictions']), labels=[0, 1]).ravel()
print('True Positive :', tp)
print('True Negative :', tn)
print('False Positive :', fp)
print('False Negative :', fn)
print("Precision score", precision_score(y_test, X_test['predictions']))

The learner achieves only 35% precision in predicting that a patient will suffer a stroke, which is pretty bad. Let’s analyze what is wrong with the learner.
Regularization and shrinkage of the LDA
Starting with the analysis of the target variable.
df_pre['stroke'].value_counts()
0    4700
1     209
As the value counts of the dependent variable show, the data is imbalanced: class 1 makes up only about 4% of the observations. It must therefore be balanced for the learner to be a good predictor.
Balancing the dependent variable
There are two ways to balance the data: oversampling the minority class or undersampling the majority class. In this scenario oversampling is better; it synthesizes new samples for the minority class by linear interpolation, as sketched below.
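A rough NumPy-only sketch of the interpolation idea behind SMOTE, with made-up points: a synthetic minority sample is placed somewhere on the line segment between a minority point and one of its minority-class nearest neighbours.

import numpy as np

rng = np.random.default_rng(42)
x_i = np.array([1.0, 2.0])         # a minority-class sample (toy values)
x_neighbor = np.array([2.0, 3.0])  # one of its minority-class neighbours
gap = rng.random()                 # random fraction in [0, 1)
x_synthetic = x_i + gap * (x_neighbor - x_i)
print(x_synthetic)                 # lies on the segment between the two points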
oversample = SMOTE()
X_smote, y_smote = oversample.fit_resample(X, y)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(X_smote, y_smote, test_size=0.30, random_state=42)
The imbalance is smoothed out using the Synthetic Minority Oversampling Technique (SMOTE), but that alone won’t help much. We also need to regularize the learner using GridSearchCV, which finds the best parameters for the learner and adds a shrinkage penalty to the solver; this shrinkage of the covariance estimates is the regularization.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
grid = dict()
grid['solver'] = ['eigen', 'lsqr']
grid['shrinkage'] = ['auto', 0.2, 0.3, 0.5, 1]
search = GridSearchCV(LDA, grid, scoring='precision', cv=cv, n_jobs=-1)
results = search.fit(Xs_train, ys_train)
print('Precision: %.3f' % results.best_score_)
print('Configuration:', results.best_params_)
Precision: 0.873
Configuration: {'shrinkage': 'auto', 'solver': 'eigen'}
The precision score increased from 35% to 87% by regularizing the learner with shrinkage. The best solver for this linear discriminant analysis is ‘eigen’ and the best shrinkage setting is ‘auto’, which uses the Ledoit-Wolf lemma to determine the shrinkage penalty, as shown below.
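For the curious, here is a minimal look at the Ledoit-Wolf estimator that shrinkage="auto" relies on, assuming scikit-learn; it computes the shrinkage intensity analytically from the data. The toy data is illustrative only.

import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 5))  # toy data: 50 samples, 5 features
lw = LedoitWolf().fit(X_toy)
print("Estimated shrinkage intensity:", round(lw.shrinkage_, 3))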
Build the RDA
LDA_final = LinearDiscriminantAnalysis(shrinkage="auto", solver="eigen")
LDA_final.fit(Xs_train, ys_train)  # fit the regularized (RDA) learner
Xs_test['predictions'] = LDA_final.predict(Xs_test)
ConfusionMatrixDisplay.from_predictions(ys_test, Xs_test['predictions'])
plt.show()
tn, fp, fn, tp = confusion_matrix(list(ys_test), list(Xs_test['predictions']), labels=[0, 1]).ravel()
print('True Positive :', tp)
print('True Negative :', tn)
print('False Positive :', fp)
print('False Negative :', fn)
print("Precision score", np.round(precision_score(ys_test, Xs_test['predictions']), 3))


The confusion matrix for the final learner shows a much better classification. This learner can be improved further by decreasing the false negatives, since they are type 2 errors; that is left to you.
Final verdict
The regularization method applied here has the potential to (sometimes dramatically) increase the power of discriminant analysis. With the practical implementation in this article, we have seen how regularized discriminant analysis (RDA) works.