
What to do when the dataset for the linkage/dendrogram functions contains NaNs?

Hi all,
I am new to cluster analysis and am trying to use the linkage function, followed by dendrogram, on an array that contains a few NaNs. linkage returns a Y-by-3 array whose third column contains only NaNs, and dendrogram then crashes.
Any advice on how to deal with this?
Cheers,
Jan

Accepted Answer

Sandeep on 31 Aug 2023
Hi Jan Skerswetat,
When dealing with missing values (NaNs) in cluster analysis, there are a few approaches you can consider:
  1. Imputation: Fill in the missing values with estimated values. This can be done using various imputation techniques such as mean imputation, median imputation, or regression imputation. However, it's important to note that imputation may introduce some bias into the analysis.
  2. Deletion: Remove the rows or columns with missing values from the dataset. This approach is suitable when the missing values are relatively few and randomly distributed. However, it discards information, and it can bias the results if the values are not missing at random.
  3. Distance-based imputation: Instead of imputing the missing values directly, you can calculate the pairwise distances between observations and use these distances to estimate the missing values. This approach can be useful if you have a good understanding of the underlying data structure.
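As a rough illustration of the three options above, here is a minimal sketch. The data matrix X, the choice of 'average' linkage, and the helper nanEuc are made up for the example, and it assumes a recent MATLAB release with the Statistics and Machine Learning Toolbox. Note that the third part is not imputation in the strict sense: it is a swapped-in variant that computes pairwise distances over only the coordinates both observations have, rescales them, and passes the distance vector straight to linkage.

```matlab
% Hypothetical data: 6 observations, 3 variables, a few NaNs.
X = [1.0 2.0 3.0;
     1.1 NaN 3.2;
     5.0 6.0 7.0;
     5.2 6.1 NaN;
     NaN 2.2 3.1;
     1.2 2.1 2.9];

% Option 2 (deletion): drop every row that contains a NaN.
Xdel = rmmissing(X);
Zdel = linkage(Xdel, 'average');

% Option 1 (mean imputation): replace each NaN with its column mean.
colMeans = mean(X, 1, 'omitnan');
Ximp = X;
for j = 1:size(X, 2)
    isMissing = isnan(Ximp(:, j));
    Ximp(isMissing, j) = colMeans(j);
end
Zimp = linkage(Ximp, 'average');

% Option 3 (distance-based variant, an assumption of this sketch):
% Euclidean distance over the coordinates present in both rows,
% rescaled by (number of variables / number of shared coordinates).
% A pair with no shared coordinate would still produce NaN.
nanEuc = @(xi, XJ) sqrt(sum((xi - XJ).^2, 2, 'omitnan') ...
                        .* size(XJ, 2) ./ sum(~isnan(xi - XJ), 2));
D = pdist(X, nanEuc);          % distance vector without NaNs for this X
Zdist = linkage(D, 'average'); % linkage also accepts a pdist vector

dendrogram(Zimp)
```

Whichever route you take, checking any(isnan(Z(:))) on the linkage output before calling dendrogram is a cheap way to catch the problem early.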
Before applying any of these approaches, it's important to understand the nature and pattern of missing values in your dataset. Additionally, consider the potential impact of missing values on your cluster analysis results and whether imputation or deletion is more appropriate for your specific case.
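To get a feel for the missing-value pattern before choosing, something along these lines may help. It is only a sketch: X is the same made-up example matrix, and the variable names are illustrative.

```matlab
% Hypothetical data: 6 observations, 3 variables, a few NaNs.
X = [1.0 2.0 3.0;
     1.1 NaN 3.2;
     5.0 6.0 7.0;
     5.2 6.1 NaN;
     NaN 2.2 3.1;
     1.2 2.1 2.9];

nanMask = isnan(X);

nanPerColumn     = sum(nanMask, 1);        % missing count per variable
nanPerRow        = sum(nanMask, 2);        % missing count per observation
fracMissing      = mean(nanMask(:));       % overall share of missing cells
fracCompleteRows = mean(~any(nanMask, 2)); % share of rows deletion would keep

fprintf('Missing cells: %.1f%%, complete rows: %.1f%%\n', ...
        100 * fracMissing, 100 * fracCompleteRows);
```

If deletion would keep only a small share of the rows, or if the NaNs are concentrated in one variable, that is usually a hint to prefer imputation or to drop that variable instead of the observations.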
You can refer to the following page for some commonly used imputation functions:
Hope you find it helpful.
