On 2-Clubs in Graph-Based Data Clustering: Theory and Algorithm Engineering
Aleksander Figiel, Anne-Sophie Himmel, André Nichterlein, and Rolf Niedermeier
Vol. 25, no. 1, pp. 521-547, 2021. Regular paper.
Abstract Editing a graph into a disjoint union of clusters is a standard optimization task in graph-based data clustering. Here, complementing classical work where the clusters shall be cliques, we focus on clusters that shall be 2-clubs, that is, subgraphs of diameter at most two. This naturally leads to the two NP-hard problems $\rm{2-C \small{LUB}}$ $\rm{C{\small LUSTER}}$ $\rm{E{\small DITING}}$ (the editing operations are edge insertions and edge deletions) and $\rm{2-C \small{LUB}}$ $\rm{C{\small LUSTER}}$ $\rm{V{\small ERTEX}}$ $\rm{D{\small ELETION}}$ (the editing operations are vertex deletions). Answering an open question from the literature, we show that $\rm{2-C \small{LUB}}$ $\rm{C{\small LUSTER}}$ $\rm{E{\small DITING}}$ is W[2]-hard with respect to the number of edge modifications, thus contrasting the fixed-parameter tractability result for the classical $\rm{C{\small LUSTER}}$ $\rm{E{\small DITING}}$ problem (considering cliques instead of 2-clubs). Then, focusing on $\rm{2-C \small{LUB}}$ $\rm{C{\small LUSTER}}$ $\rm{V{\small ERTEX}}$ $\rm{D{\small ELETION}}$, which is easily seen to be fixed-parameter tractable, we show that under standard complexity-theoretic assumptions it does not have a polynomial-size problem kernel when parameterized by the number of vertex deletions. Nevertheless, we develop several effective data reduction and pruning rules, resulting in a competitive solver, outperforming a standard ILP formulation in most instances of an established biological test data set.

 This work is licensed under the terms of the CC-BY license.
Submitted: February 2021.
Reviewed: August 2021.
Revised: August 2021.
Accepted: October 2021.
Final: October 2021.
Published: October 2021.
Communicated by Martin Nöllenburg
article (PDF)