Neighborhood Component Feature Selection for Multiple Instance Learning Paradigm

Luca Romeo
2024-01-01

Abstract

In a multiple instance learning (MIL) scenario, the outcome annotation is usually reported only at the bag level. Given its simplicity and convergence criteria, the lazy learning approach, i.e., k-nearest neighbors (kNN), plays a crucial role in predicting bag labels in the MIL domain. Notably, two variations of the kNN algorithm tailored to the MIL framework have been introduced, namely Bayesian-kNN (BkNN) and Citation-kNN (CkNN). These adaptations combine the Hausdorff metric with Bayesian or citation approaches. However, neither BkNN nor CkNN explicitly integrates feature selection, and when irrelevant or redundant features are present, the model’s generalization decreases. In the single-instance learning scenario, this limitation of kNN is often overcome by applying a feature weighting algorithm named Neighborhood Component Feature Selection (NCFS), which finds the optimal degree of influence of each feature. To address this gap in the literature, we introduce the NCFS method for the MIL framework. The proposed methodologies, i.e., NCFS-BkNN, NCFS-CkNN, and NCFS-Bayesian Citation-kNN (NCFS-BCkNN), learn the optimal feature weighting vector by minimizing the regularized leave-one-out error of the training bags. The prediction for unseen bags is then computed by combining the Bayesian and citation approaches based on the minimum optimally weighted Hausdorff distance. Through experiments with various benchmark MIL datasets in the biomedical informatics and affective computing fields, we provide statistical evidence that the proposed methods outperform state-of-the-art MIL algorithms that do not employ any a priori feature weighting strategy.
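
As an illustration of the distance the abstract refers to, the following is a minimal sketch of a feature-weighted minimal Hausdorff distance between two bags. It assumes that each bag is a list of equal-length feature vectors and that each feature difference is scaled by its weight before a Euclidean norm is taken; the abstract does not give the exact weighting form, so this choice and the function names are hypothetical rather than the paper's definition.

```python
import math


def weighted_distance(x, y, weights):
    # Assumed form: Euclidean distance after scaling each feature
    # difference by its weight (not taken from the paper).
    return math.sqrt(sum((w * (xi - yi)) ** 2
                         for xi, yi, w in zip(x, y, weights)))


def weighted_min_hausdorff(bag_a, bag_b, weights):
    # Minimal Hausdorff distance: the smallest weighted distance between
    # any instance of bag_a and any instance of bag_b.
    return min(weighted_distance(x, y, weights)
               for x in bag_a
               for y in bag_b)


# Two toy bags with two features per instance.
bag_a = [[0.0, 0.0], [3.0, 4.0]]
bag_b = [[0.0, 1.0], [10.0, 10.0]]

d_equal = weighted_min_hausdorff(bag_a, bag_b, [1.0, 1.0])
d_first_only = weighted_min_hausdorff(bag_a, bag_b, [1.0, 0.0])
```

With equal weights the closest pair of instances differs only in the second feature, so setting that feature's weight to zero shrinks the bag distance to zero. This is the kind of effect a learned weighting vector would have on the neighbors used by a kNN-style predictor.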
2024
978-3-031-70341-6
Files in this record:
File: MIL_NCFS___ECML2024___Camera_ready-2.pdf (Adobe PDF, 568.12 kB; authorized users only)
Description: Camera Ready Paper
Type: Publisher's version (published version with the publisher's layout)
License: Publisher's copyright

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11393/342930