Dichotomy for reachability and label synchronisation in large version-controlled repositories - Algorithmique Discrète et Applications
Pré-Publication, Document De Travail Année : 2024

Dichotomy for reachability and label synchronisation in large version-controlled repositories

Résumé

DAGs are commonly used for modelling successive versions of a project in systems such as Git or Mercurial, where each version (node) is based on one or two former versions (with arcs from each node to its former versions). Some specificities make these graphs unique: they grow in "append-only" mode, i.e. nodes are created with in-degree 0, so the out-degree of existing nodes remains constant. Also, several copies of the graph grow in parallel, with synchronisation steps when an agent sends their new nodes to other agents.

For the largest graphs, which can reach up to several million revisions, sub-linear algorithms become necessary for recurring tasks, creating a need for a pre-computed index that can grow and be shared along with the graph nodes. In this paper we focus on the reachability problem (deciding if a path exists between two nodes), as well as label discovery: if each agent has a label attached to each node, determine the set of nodes with distinct labels between two agents. Algorithms are to be designed not only in a dynamic setting, i.e. allowing for node insertions, but also with the multi-agent constraint that different agents, having received different nodes and common nodes in different orders, may need to share large sets of nodes among themselves. Very efficient reachability indices exist in static DAGs, but they often do not apply well to rapidly-growing graphs in a multi-agent model. We present a framework allowing for fast algorithms for both tasks, using a compact index (with a few bytes per node in practice) that can easily be updated when the graph grows and nodes are shared. At the core of our approach lies the notion of ranges, i.e. specific sets of nodes that admit a concise representation and can be partitioned (split) into smaller ranges.

Fichier principal
Vignette du fichier
Dichotomy for reachability and label synchronisation in large version-controlled repositories.pdf (1.34 Mo) Télécharger le fichier
Origine Fichiers produits par l'(les) auteur(s)
licence

Dates et versions

hal-04778558 , version 1 (12-11-2024)

Licence

Identifiants

  • HAL Id : hal-04778558 , version 1

Citer

Laurent Bulteau, Pierre-Yves David, Florian Horn, Euxane Tran-Girard. Dichotomy for reachability and label synchronisation in large version-controlled repositories. 2024. ⟨hal-04778558⟩
17 Consultations
20 Téléchargements

Partager

More