Statistical comparison in empirical computer science with minimal computation usage
Abstract
The replicability of computational experiments remains a fundamental concern. For example, the machine learning community has recently become aware that many experimental studies comparing the performance of algorithms replicate poorly. For such a comparison to be replicable, its conclusion must be supported by appropriate statistical tests; at the same time, computational costs make it desirable to use as few executions as possible. AdaStop is a recently introduced statistical test based on multiple group sequential tests. It adapts the number of executions of each experiment, stopping as early as possible while ensuring that enough information is available to distinguish, in a statistically significant way, the algorithms that perform better than the others. AdaStop was initially demonstrated on reinforcement learning tasks. In this short paper, we consider three case studies to investigate the use of AdaStop beyond its original field of application, and show that it can be applied across a wide range of domains.
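To make the idea of adaptive stopping concrete, the sketch below shows a simplified sequential comparison of two algorithms: runs are accumulated in small batches, and after each batch a permutation test on the scores decides whether to stop. This is an illustration only, not the actual AdaStop procedure; all function names are hypothetical, and in particular AdaStop calibrates its group sequential thresholds jointly to control the error rate, whereas naively re-testing at a fixed level after every batch, as done here, inflates the type I error.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_scores(mean, n):
    # Stand-in for running an algorithm n times on a benchmark
    # (hypothetical: real scores would come from actual executions).
    return rng.normal(mean, 1.0, n)

def permutation_pvalue(a, b, n_perm=2000):
    # Two-sample permutation test on the absolute difference of means.
    obs = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        if diff >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)

def sequential_compare(mean_a, mean_b, batch=5, max_batches=10, alpha=0.05):
    # Accumulate executions batch by batch; stop as soon as the two
    # algorithms are distinguished.  NOTE: unlike AdaStop, this naive
    # loop does not adjust alpha for the repeated looks at the data.
    a, b = np.empty(0), np.empty(0)
    for k in range(1, max_batches + 1):
        a = np.concatenate([a, batch_scores(mean_a, batch)])
        b = np.concatenate([b, batch_scores(mean_b, batch)])
        if permutation_pvalue(a, b) < alpha:
            return "different", k * batch   # early stop: fewer runs used
    return "undecided", max_batches * batch
```

With a clear performance gap (e.g. `sequential_compare(0.0, 2.0)`), the loop typically stops after the first few batches, illustrating how an adaptive test saves computation compared to always running the full budget.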
Keywords
Domains
Other [stat.ML]