
A deep learning technique for imputing missing healthcare data.


Abstract

Missing data are common in medical and health datasets, and analyzing datasets with missing values can reduce statistical power or bias the results. We address this issue with a novel deep learning technique for imputing missing values in health data. Our method extends an autoencoder into a deep learning architecture that can learn hidden representations of the data even when the data are perturbed by missing values (noise). The model uses an overcomplete representation and is trained with denoising regularization. This allows its latent (hidden) layers to extract the relationships between variables; these relationships are then used to reconstruct the missing values. Our contributions include a new loss function designed to avoid local optima, which helps the model learn the true distribution of the variables in the dataset. We compare our method with well-established imputation strategies (mean imputation, median imputation, SVD, KNN, matrix factorization and soft impute) on 48,350 records from the Linked Birth/Infant Death Cohort Data. Our method achieved a lower imputation mean squared error (MSE = 0.00988) than the other imputation methods (MSE from 0.02 to 0.08). When the imputed data were used for prediction tasks, data imputed by our method also yielded a higher F1 score (70.37%) than data imputed by the other methods (66% to 69%).
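The abstract describes the method only at a high level: an overcomplete autoencoder, trained with denoising regularization, whose reconstructions fill in the missing cells. The NumPy sketch below illustrates that general pattern under assumptions that are ours, not the authors': one tanh hidden layer twice as wide as the input, corruption that replaces a random 20% of inputs with column means, and a plain mean squared error computed only over observed cells. It does not implement the paper's new loss function, whose form the abstract does not give. The names `impute_dae` and `masked_mse`, the toy data and all hyperparameters are hypothetical.

```python
import numpy as np


def impute_dae(X, hidden_factor=2, corruption=0.2, lr=0.05, epochs=2000, seed=0):
    """Fill NaNs in X (n_samples, n_features) using an overcomplete denoising autoencoder.

    Illustrative assumptions, not taken from the paper: features scaled to [0, 1],
    a single tanh hidden layer, mean-replacement corruption, plain masked MSE loss.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                       # True where a value was recorded
    n_obs = observed.sum()
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(observed, X, col_means)   # start from mean imputation

    n, d = X.shape
    h = hidden_factor * d                         # overcomplete: wider than the input
    W1 = rng.normal(0.0, 0.1, (d, h))
    b1 = np.zeros(h)
    W2 = rng.normal(0.0, 0.1, (h, d))
    b2 = np.zeros(d)

    losses = []
    for _ in range(epochs):
        # Denoising corruption: replace a random fraction of inputs with the
        # column mean, mimicking how missing cells look at imputation time.
        drop = rng.random((n, d)) < corruption
        X_noisy = np.where(drop, col_means, X_filled)

        # Forward pass: input -> overcomplete hidden layer -> reconstruction.
        H = np.tanh(X_noisy @ W1 + b1)
        X_hat = H @ W2 + b2

        # Masked MSE: only originally observed cells contribute to the loss.
        err = (X_hat - X_filled) * observed
        losses.append(float((err ** 2).sum() / n_obs))

        # Backward pass: gradients of the masked MSE w.r.t. all parameters.
        d_out = 2.0 * err / n_obs
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * (1.0 - H ** 2)   # tanh'(a) = 1 - tanh(a)^2
        dW1 = X_noisy.T @ d_hidden
        db1 = d_hidden.sum(axis=0)

        # Plain full-batch gradient descent step.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    # Reconstruct from the uncorrupted mean-filled input; keep observed values as-is.
    X_hat = np.tanh(X_filled @ W1 + b1) @ W2 + b2
    return np.where(observed, X, X_hat), losses


def masked_mse(X_true, X_imputed, missing):
    """Imputation MSE computed only over the cells that were hidden."""
    return float(np.mean((X_true[missing] - X_imputed[missing]) ** 2))


if __name__ == "__main__":
    # Synthetic, strongly correlated toy data (hypothetical, not the cohort data).
    rng = np.random.default_rng(1)
    z = rng.random((200, 1))
    X_true = np.hstack([z, 0.5 * z + 0.25, 1.0 - z, z ** 2])
    missing = rng.random(X_true.shape) < 0.2      # hide 20% of cells at random
    X_miss = np.where(missing, np.nan, X_true)

    X_dae, _ = impute_dae(X_miss)
    X_mean = np.where(missing, np.nanmean(X_miss, axis=0), X_miss)
    print("DAE imputation MSE: ", masked_mse(X_true, X_dae, missing))
    print("mean imputation MSE:", masked_mse(X_true, X_mean, missing))
```

`masked_mse` scores only the hidden cells, which is one common way to compute an imputation MSE like the one the abstract reports; the synthetic demo says nothing about the paper's results. Initializing missing cells with column means, and corrupting training inputs the same way, keeps what the network sees during training close to what it sees when imputing.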
