https://arxiv.org/api/87jQXWw1RcvDVz2IjEH/Sgh507c
Updated: 2026-06-21T19:49:36Z
1296772015

http://arxiv.org/abs/2602.17530v1
Provably Explaining Neural Additive Models
Updated: 2026-02-19T16:42:29Z
Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach for obtaining explanations with provable guarantees is by identifying a cardinally-minimal subset of input features which by itself is provably sufficient to determine the prediction. However, for standard neural networks, this task is often computationally infeasible, as it demands a worst-case exponential number of verification queries in the number of input features, each of which is NP-hard.
In this work, we show that for Neural Additive Models (NAMs), a recent and more interpretable neural network family, we can efficiently generate explanations with such guarantees. We present a new model-specific algorithm for NAMs that generates provably cardinally-minimal explanations using only a logarithmic number of verification queries
in the number of input features, after a parallelized preprocessing step with logarithmic runtime in the required precision is applied to each small univariate NAM component.
Our algorithm not only makes the task of obtaining cardinally-minimal explanations feasible, but even outperforms existing algorithms designed to find the relaxed variant of subset-minimal explanations - which may be larger and less informative but easier to compute - despite our algorithm solving a much more difficult task.
Our experiments demonstrate that, compared to previous algorithms, our approach provides provably smaller explanations than existing works and substantially reduces the computation time. Moreover, we show that our generated provable explanations offer benefits that are unattainable by standard sampling-based techniques typically used to interpret NAMs.
Published: 2026-02-19T16:42:29Z
Comment: To appear in ICLR 2026
Authors: Shahaf Bassan, Yizhak Yisrael Elboher, Tobias Ladner, Volkan Şahin, Jan Kretinsky, Matthias Althoff, Guy Katz

http://arxiv.org/abs/2510.01931v4
Minimum Selective Subset on Unit Disk Graphs and Circle Graphs
Updated: 2026-02-19T13:07:04Z
In a connected simple graph G = (V(G),E(G)), each vertex is assigned one of c colors, where V(G) can be written as a union of a total of c subsets V_{1},...,V_{c} and V_{i} denotes the set of vertices of color i. A subset S of V(G) is called a selective subset if, for every i, every vertex v in V_{i} has at least one nearest neighbor in $S \cup (V(G) \setminus V_{i})$ that also lies in V_{i}. The Minimum Selective Subset (MSS) problem asks for a selective subset of minimum size.
We show that the MSS problem is log-APX-hard on general graphs, even when c=2. As a consequence, the problem does not admit a polynomial-time approximation scheme (PTAS) unless P = NP. On the positive side, we present a PTAS for unit disk graphs, which works without requiring a geometric representation and applies for arbitrary c. We further prove that MSS remains NP-complete in unit disk graphs for arbitrary c. In addition, we show that the MSS problem is log-APX-hard on circle graphs, even when c=2.
Published: 2025-10-02T11:48:13Z
Comment: This work has been accepted in the conference CALDAM 2026
Authors: Bubai Manna

http://arxiv.org/abs/2602.17309v1
Some Remarks on Marginal Code Languages
Updated: 2026-02-19T12:19:43Z
A prefix code L satisfies the condition that no word of L is a proper prefix of another word of L. Recently, Ko, Han and Salomaa relaxed this condition by allowing a word of L to be a proper prefix of at most k words of L, for some `margin' k, introducing thus the class of k-prefix-free languages, as well as the similar classes of k-suffix-free and k-infix-free languages. Here we unify the definitions of these three classes of languages into one uniform definition in two ways: via the method of partial orders and via the method of transducers. Thus, for any known class of code-related languages definable via the transducer method, one gets a marginal version of that class. Building on the techniques of Ko, Han and Salomaa, we discuss the \emph{uniform} satisfaction and maximality problems for marginal classes of languages.
Published: 2026-02-19T12:19:43Z
Authors: Stavros Konstantinidis

http://arxiv.org/abs/2502.18942v2
Finding Minimum Matching Cuts in $H$-free Graphs
Updated: 2026-02-19T09:17:15Z
A matching cut is a matching that is also an edge cut. In the problem Minimum Matching Cut, we ask for a matching cut with the minimum number of edges in the matching. We investigate the differences in complexity between Minimum Matching Cut, its counterpart Maximum Matching Cut, and the decision problem Matching Cut.
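To make the definition concrete, here is a minimal Python sketch that checks whether a given edge set is a matching cut. It assumes a simple undirected graph (no self-loops) given as a vertex list and an edge list, and it reads "edge cut" as the exact set of edges between the two sides of some partition (A, B) with both sides non-empty. The function name `is_matching_cut` and the input format are illustrative choices, not taken from the paper.

```python
from collections import defaultdict, deque

def is_matching_cut(vertices, edges, candidate):
    """True if `candidate` is a matching equal to E(A, B) for some partition (A, B)."""
    edge_set = {frozenset(e) for e in edges}
    cand = {frozenset(e) for e in candidate}
    if not cand or not cand <= edge_set:
        return False

    # Matching: no vertex is covered by two candidate edges.
    covered = [v for e in cand for v in e]
    if len(covered) != len(set(covered)):
        return False

    # Label the connected components of the graph with the candidate removed.
    adj = defaultdict(list)
    for e in edge_set - cand:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)
    comp = {}
    for s in vertices:
        if s in comp:
            continue
        comp[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp[w] = s
                    queue.append(w)

    # Every candidate edge must join two components on opposite sides,
    # so the components must be 2-colourable along the candidate edges.
    comp_adj = defaultdict(list)
    for e in cand:
        u, v = tuple(e)
        if comp[u] == comp[v]:
            return False
        comp_adj[comp[u]].append(comp[v])
        comp_adj[comp[v]].append(comp[u])
    side = {}
    for s in comp_adj:
        if s in side:
            continue
        side[s] = 0
        queue = deque([s])
        while queue:
            c = queue.popleft()
            for d in comp_adj[c]:
                if d not in side:
                    side[d] = 1 - side[c]
                    queue.append(d)
                elif side[d] == side[c]:
                    return False
    return True
```

Minimum Matching Cut would then ask for the smallest candidate that passes this check; the sketch only verifies a given candidate and does not search for one.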
Our polynomial-time algorithms for $P_8$-free, $S_{1,1,3}$-free and $(P_6 + P_4)$-free graphs extend the cases where Minimum Matching Cut and Maximum Matching Cut are known to differ in complexity. In addition, they solve open cases for the well-studied problem Matching Cut. The NP-hardness proof for $3P_3$-free graphs implies that Minimum Matching Cut and Matching Cut, which is polynomial-time solvable even for $sP_3$-free graphs, for any $s \geq 1$, differ in complexity on certain graph classes. Further, we give complexity dichotomies for both general and bipartite graphs of bounded radius and diameter.
Published: 2025-02-26T08:45:55Z
Authors: Felicia Lucke, Joseph Marchand, Jannik Olbrich

http://arxiv.org/abs/2601.20775v2
Active Learning for Decision Trees with Provable Guarantees
Updated: 2026-02-18T19:30:22Z
This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for decision trees, a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity: (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee, producing a $(1+ε)$-approximate classifier. By combining these results, we design an active learning algorithm for decision trees that uses only a polylogarithmic number of label queries in the dataset size, under the stated assumptions.
Finally, we establish a label complexity lower bound, showing our algorithm's dependence on the error tolerance $ε$ is close to optimal.
Published: 2026-01-28T17:02:25Z
Comment: 10 pages, 43 pages with appendix, ICLR 2026, Conference URL: https://openreview.net/forum?id=NOkjJPJIit
Authors: Arshia Soltani Moakhar, Tanapoom Laoaron, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi

http://arxiv.org/abs/2602.16240v1
Submodular Maximization under Supermodular Constraint: Greedy Guarantees
Updated: 2026-02-18T07:33:51Z
Motivated by a wide range of applications in data mining and machine learning, we consider the problem of maximizing a submodular function subject to supermodular cost constraints. In contrast to the well-understood setting of cardinality and matroid constraints, where greedy algorithms admit strong guarantees, the supermodular constraint regime remains poorly understood -- guarantees for greedy methods and other efficient algorithmic paradigms are largely open. We study this family of fundamental optimization problems under an upper-bound constraint on a supermodular cost function with curvature parameter $γ$. Our notion of supermodular curvature is less restrictive than prior definitions, substantially expanding the class of admissible cost functions. We show that our greedy algorithm that iteratively includes elements maximizing the ratio of the objective and constraint functions, achieves a $\left(1 - e^{-(1-γ)}\right)$-approximation before stopping. We prove that this approximation is indeed tight for this algorithm. Further, if the objective function has a submodular curvature $c$, then we show that the bound further improves to $\left(1 - (1- (1-c)(1-γ))^{1/(1-c)}\right)$, which can be further improved by continuing to violate the constraint. Finally, we show that the Greedy-Ratio-Marginal in conjunction with binary search leads to a bicriteria approximation for the dual problem -- minimizing a supermodular function under a lower bound constraint on a submodular function.
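The ratio-greedy rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's algorithm: the ratio is taken between marginal gains (the abstract only says "the ratio of the objective and constraint functions"), selection halts as soon as the best candidate would exceed the budget, and `f`, `g`, and `budget` are hypothetical inputs supplied by the caller.

```python
def greedy_ratio(ground_set, f, g, budget):
    """Greedily add the element with the best marginal gain-to-cost ratio.

    f: set function to maximize (assumed monotone submodular).
    g: cost set function (assumed monotone supermodular); we keep g(S) <= budget.
    """
    selected = set()
    remaining = set(ground_set)
    while remaining:
        best, best_ratio = None, float("-inf")
        for e in sorted(remaining):  # sorted only to make tie-breaking deterministic
            gain = f(selected | {e}) - f(selected)
            cost = g(selected | {e}) - g(selected)
            ratio = gain / cost if cost > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if g(selected | {best}) > budget:
            break  # assumed stopping rule: halt before violating the budget
        selected.add(best)
        remaining.remove(best)
    return selected
```

A coverage objective with a cost that grows quadratically in the number of selected elements is one simple pair of functions that fits these assumptions.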
We conduct a number of experiments on a simulation of LLM agents debating over multiple rounds -- the task is to select a subset of agents to maximize correctly answered questions. Our algorithm outperforms all other greedy heuristics, and on smaller problems, it achieves the same performance as the optimal set found by exhaustive search.
Published: 2026-02-18T07:33:51Z
Comment: 16 pages, 9 figures
Authors: Ajitesh Srivastava, Shanghua Teng

http://arxiv.org/abs/2509.14461v3
Learning depth-3 circuits via quantum agnostic boosting
Updated: 2026-02-17T18:38:29Z
We initiate the study of quantum agnostic learning of phase states with respect to a function class $\mathsf{C}\subseteq \{c:\{0,1\}^n\rightarrow \{0,1\}\}$: given copies of an unknown $n$-qubit state $|ψ\rangle$ which has fidelity $\textsf{opt}$ with a phase state $|φ_c\rangle=\frac{1}{\sqrt{2^n}}\sum_{x\in \{0,1\}^n}(-1)^{c(x)}|x\rangle$ for some $c\in \mathsf{C}$, output $|φ\rangle$ which has fidelity $|\langle φ| ψ\rangle|^2 \geq \textsf{opt}-\varepsilon$. To this end, we give agnostic learning protocols for the following classes: (i) Size-$t$ decision trees which runs in time $\textsf{poly}(n,t,1/\varepsilon)$. This also implies $k$-juntas can be agnostically learned in time $\textsf{poly}(n,2^k,1/\varepsilon)$. (ii) $s$-term DNF formulas in time $\textsf{poly}(n,(s/\varepsilon)^{\log \log (s/\varepsilon) \cdot \log(1/\varepsilon)})$.
Our main technical contribution is a quantum agnostic boosting protocol which converts a weak agnostic learner, which outputs a parity state $|φ\rangle$ such that $|\langle φ|ψ\rangle|^2\geq \textsf{opt}/\textsf{poly}(n)$, into a strong learner which outputs a superposition of parity states $|φ'\rangle$ such that $|\langle φ'|ψ\rangle|^2\geq \textsf{opt} - \varepsilon$.
Using quantum agnostic boosting, we obtain a $n^{O(\log(n/\varepsilon) \cdot \log \log n)}$-time algorithm for $\varepsilon$-learning $\textsf{poly}(n)$-sized depth-$3$ circuits (consisting of $\textsf{AND}$, $\textsf{OR}$, $\textsf{NOT}$ gates) in the uniform $\textsf{PAC}$ model given quantum examples. Classically, obtaining an algorithm with a similar complexity has been an open question in the $\textsf{PAC}$ model and our work answers this given quantum examples.
Published: 2025-09-17T22:28:29Z
Comment: 53 pages; Typos fixed for depth-3 circuits result
Authors: Srinivasan Arunachalam, Arkopal Dutt, Alexandru Gheorghiu, Michael de Oliveira

http://arxiv.org/abs/2510.19620v2
On Minimal Achievable Quotas in Multiwinner Voting
Updated: 2026-02-17T16:23:02Z
Justified representation (JR) and extended justified representation (EJR) are well-established proportionality axioms in approval-based multiwinner voting. Both axioms are always satisfiable, but they rely on a fixed quota (typically Hare or Droop), with the Droop quota being the smallest one that guarantees existence across all instances. With this in mind, we take a step beyond the fixed-quota paradigm by studying instance-dependent proportionality notions. More specifically, we minimize the quota requirements for JR and EJR using the parameter $α$. We demonstrate that all commonly studied voting rules can have an additive gap to the optimum of $\frac{k^2}{(k+1)^2}$. Moreover, we examine the computational aspects of our instance-dependent quota and prove that determining the optimal value of $α$ for a given approval profile that allows some committee to satisfy $α$-JR is NP-complete.
To address this, we introduce an integer linear programming (ILP) formulation for computing committees that satisfy $α$-JR, and we provide positive computational results in the voter interval (VI) and candidate interval (CI) domains.
Published: 2025-10-22T14:18:45Z
Authors: Patrick Becker, Fabian Frank

http://arxiv.org/abs/2504.15206v2
How Global Calibration Strengthens Multiaccuracy
Updated: 2026-02-17T10:01:02Z
Multiaccuracy and multicalibration are multigroup fairness notions for prediction that have found numerous applications in learning and computational complexity. They can be achieved from a single learning primitive: weak agnostic learning. Here we investigate the power of multiaccuracy as a learning primitive, both with and without the additional assumption of calibration. We find that multiaccuracy in itself is rather weak, but that the addition of global calibration (this notion is called calibrated multiaccuracy) boosts its power substantially, enough to recover implications that were previously known only assuming the stronger notion of multicalibration.
We give evidence that multiaccuracy might not be as powerful as standard weak agnostic learning, by showing that there is no way to post-process a multiaccurate predictor to get a weak learner, even assuming the best hypothesis has correlation $1/2$. Rather, we show that it yields a restricted form of weak agnostic learning, which requires some concept in the class to have correlation greater than $1/2$ with the labels. However, by also requiring the predictor to be calibrated, we recover not just weak, but strong agnostic learning.
A similar picture emerges when we consider the derivation of hardcore measures from predictors satisfying multigroup fairness notions. On the one hand, while multiaccuracy only yields hardcore measures of density half the optimal, we show that (a weighted version of) calibrated multiaccuracy achieves optimal density.
Our results yield new insights into the complementary roles played by multiaccuracy and calibration in each setting. They shed light on why multiaccuracy and global calibration, although not particularly powerful by themselves, together yield considerably stronger notions.
Published: 2025-04-21T16:22:44Z
Comment: Presented at FOCS 2025
Authors: Sílvia Casacuberta, Parikshit Gopalan, Varun Kanade, Omer Reingold

http://arxiv.org/abs/2511.09763v2
Is nasty noise actually harder than malicious noise?
Updated: 2026-02-16T22:16:46Z
We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise models for learning Boolean functions: malicious noise, in which an adversary can arbitrarily corrupt a random subset of examples given to the learner; and nasty noise, in which an adversary can arbitrarily corrupt an adversarially chosen subset of examples given to the learner.
We consider both the distribution-independent and fixed-distribution settings. Our main results highlight a dramatic difference between these two settings: For distribution-independent learning, we prove a strong equivalence between the two noise models: If a class ${\cal C}$ of functions is efficiently learnable in the presence of $η$-rate malicious noise, then it is also efficiently learnable in the presence of $η$-rate nasty noise. In sharp contrast, for the fixed-distribution setting we show an arbitrarily large separation: Under a standard cryptographic assumption, for any arbitrarily large value $r$ there exists a concept class for which there is a ratio of $r$ between the rate $η_{malicious}$ of malicious noise that polynomial-time learning algorithms can tolerate, versus the rate $η_{nasty}$ of nasty noise that such learning algorithms can tolerate.
To offset the negative result for the fixed-distribution setting, we define a broad and natural class of algorithms, namely those that ignore contradictory examples (ICE). We show that for these algorithms, malicious noise and nasty noise are equivalent up to a factor of two in the noise rate: Any efficient ICE learner that succeeds with $η$-rate malicious noise can be converted to an efficient learner that succeeds with $η/2$-rate nasty noise. We further show that the above factor of two is necessary, again under a standard cryptographic assumption.
Published: 2025-11-12T21:56:15Z
Comment: SODA 2026
Authors: Guy Blanc, Yizhi Huang, Tal Malkin, Rocco A. Servedio

http://arxiv.org/abs/2602.15180v1
Efficient quantum circuits for high-dimensional representations of SU(n) and Ramanujan quantum expanders
Updated: 2026-02-16T20:38:26Z
We present efficient quantum circuits that implement high-dimensional unitary irreducible representations (irreps) of $SU(n)$, where $n \ge 2$ is constant. For dimension $N$ and error $ε$, the number of quantum gates in our circuits is polynomial in $\log(N)$ and $\log(1/ε)$. Our construction relies on the Jordan-Schwinger representation, which allows us to realize irreps of $SU(n)$ in the Hilbert space of $n$ quantum harmonic oscillators. Together with a recent efficient quantum Hermite transform, which allows us to map the computational basis states to the eigenstates of the quantum harmonic oscillator, this allows us to implement these irreps efficiently. Our quantum circuits can be used to construct explicit Ramanujan quantum expanders, a longstanding open problem. They can also be used to fast-forward the evolution of certain quantum systems.
Published: 2026-02-16T20:38:26Z
Comment: 39 pages, 2 figures
Authors: Vishnu Iyer, Siddhartha Jain, Stephen Jordan, Rolando Somma

http://arxiv.org/abs/2602.14915v1
The antiferromagnetic Ising model beyond line graphs
Updated: 2026-02-16T16:50:16Z
Both the antiferromagnetic Ising model and the hard-core model could be said to be tractable on line graphs of bounded degree.
For example, Glauber dynamics is rapidly mixing in both cases. In the case of the hard-core model, we know that tractability extends further, to claw-free graphs and somewhat beyond. In contrast, it is shown here that the corresponding extensions are not possible in the case of the antiferromagnetic Ising model.
Published: 2026-02-16T16:50:16Z
Authors: Mark Jerrum

http://arxiv.org/abs/2602.14722v1
Geometric Characterization of Context-Free Intersections via the Inner Segment Dichotomy
Updated: 2026-02-16T13:08:22Z
The intersection of two context-free languages is not generally context-free, but no geometric criterion has characterized when it remains so. The crossing gap (max(i'-i, j'-j) for two crossing push-pop arcs) is the natural candidate. We refute this: we exhibit CFLs whose intersection is CFL despite unbounded-gap crossings. The governing quantity is the inner segment measure: for crossing arcs inducing a decomposition w = P1 P2 P3 P4, it is max(|P2|,|P3|), the length of the longer inner segment between interleaved crossing endpoints. We prove a dichotomy for this measure: bounded inner segments imply context-freeness via a finite buffer construction; growing inner segments with pump-sensitive linkages imply non-context-freeness. The inner segment concept applies to all CFL intersections; the strictness of the resulting characterization depends on the language class. For block-counting CFLs (languages requiring equality among designated pairs of block lengths), the dichotomy is complete: the intersection is CFL if and only if the combined arcs are jointly well-nested.
For general CFLs, the CFL direction is unconditional; the non-CFL direction requires pump-sensitive linkages whose necessity is the main open problem, reducing the general CFL intersection problem to a specific property of pump-sensitive decompositions.
Published: 2026-02-16T13:08:22Z
Comment: 44 pages, 4 figures, 1 table
Authors: Jorge Miguel Silva

http://arxiv.org/abs/2506.23363v2
Parameterized Critical Node Cut Revisited
Updated: 2026-02-16T10:18:00Z
We study how to sparsify connectivity in graphs under a tight deletion budget. Given a graph $G$ and integers $k,x \ge 0$, Critical Node Cut (CNC) asks whether we can delete at most $k$ vertices so that the number of remaining unordered pairs of connected vertices is at most $x$. CNC generalizes Vertex Cover (the case $x=0$) and models tasks in network design, epidemiology, and social network analysis. We comprehensively map the structural parameterized complexity landscape for Critical Node Cut. First, we prove W[1]-hardness for the combined parameter $k + \mathrm{fes} + Δ + \mathrm{pw}$, where $\mathrm{fes}$ is the feedback edge set number, $Δ$ the maximum degree, and $\mathrm{pw}$ the pathwidth of the input graph respectively. This significantly improves over the known W[1]-hardness for $k+\mathrm{tw}$, where $\mathrm{tw}$ denotes the treewidth, and is tight in that tree-depth together with maximum degree trivially yields FPT. Second, we give new positive results. Specifically, we identify three structural parameters--max-leaf number, vertex integrity, and modular-width--that render the problem fixed-parameter tractable, and develop a polynomial-time algorithm for graphs of constant clique-width. Third, leveraging a technique introduced by Lampis [ICALP '14], we develop an FPT approximation scheme that, for any $\varepsilon > 0$, computes a $(1+\varepsilon)$-approximate solution in time $(\mathrm{tw} / \varepsilon)^{\mathcal{O}(\mathrm{tw})} n^{\mathcal{O}(1)}$, where $\mathrm{tw}$ denotes the treewidth of the input graph.
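As a concrete reading of the CNC problem definition, the Python sketch below counts the connected unordered pairs that remain after deleting a vertex set, and decides CNC by brute force over all deletion sets of size at most k. It is exponential in k and only illustrates the objective; it is not one of the paper's algorithms, and the function names and edge-list input format are assumptions made for this example.

```python
from itertools import combinations

def connected_pairs(vertices, edges, deleted):
    """Number of unordered pairs of remaining vertices joined by a path."""
    alive = set(vertices) - set(deleted)
    parent = {v: v for v in alive}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if u in alive and v in alive:
            parent[find(u)] = find(v)

    # A component with s vertices contributes s * (s - 1) / 2 connected pairs.
    sizes = {}
    for v in alive:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return sum(s * (s - 1) // 2 for s in sizes.values())

def critical_node_cut(vertices, edges, k, x):
    """Brute-force decision: can deleting at most k vertices leave at most x pairs?"""
    for size in range(k + 1):
        for deleted in combinations(vertices, size):
            if connected_pairs(vertices, edges, deleted) <= x:
                return True
    return False
```

With x = 0 the check asks for a deletion set that leaves no edge, which is the Vertex Cover special case mentioned in the abstract.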
Finally, we show that CNC admits no polynomial kernel when parameterized by vertex cover number, unless standard assumptions fail. Together, these results substantially sharpen the known complexity landscape for CNC.
Published: 2025-06-29T18:51:21Z
Authors: Dušan Knop, Nikolaos Melissinos, Manolis Vasilakis

http://arxiv.org/abs/2602.14379v1
Fine-Grained Complexity for Quantum Problems from Size-Preserving Circuit-to-Hamiltonian Constructions
Updated: 2026-02-16T01:11:55Z
The local Hamiltonian (LH) problem is the canonical $\mathsf{QMA}$-complete problem introduced by Kitaev. In this paper, we show its hardness in a very strong sense: we show that the 3-local Hamiltonian problem on $n$ qubits cannot be solved classically in time $O(2^{(1-\varepsilon)n})$ for any $\varepsilon>0$ under the Strong Exponential-Time Hypothesis (SETH), and cannot be solved quantumly in time $O(2^{(1-\varepsilon)n/2})$ for any $\varepsilon>0$ under the Quantum Strong Exponential-Time Hypothesis (QSETH). These lower bounds give evidence that the currently known classical and quantum algorithms for LH cannot be significantly improved.
Furthermore, we are able to demonstrate fine-grained complexity lower bounds for approximating the quantum partition function (QPF) with an arbitrary constant relative error. Approximating QPF with relative error is known to be equivalent to approximately counting the dimension of the solution subspace of $\mathsf{QMA}$ problems. We show the SETH and QSETH hardness to estimate QPF with constant relative error. We then provide a quantum algorithm that runs in $O(\sqrt{2^n})$ time for an arbitrary $1/\mathrm{poly}(n)$ relative error, matching our lower bounds and improving the state-of-the-art algorithm by Bravyi, Chowdhury, Gosset, and Wocjan (Nature Physics 2022) in the low-temperature regime.
To prove our fine-grained lower bounds, we introduce the first size-preserving circuit-to-Hamiltonian construction that encodes the computation of a $T$-time quantum circuit acting on $N$ qubits into a $(d+1)$-local Hamiltonian acting on $N+O(T^{1/d})$ qubits. This improves the standard construction based on the unary clock, which uses $N+O(T)$ qubits.
Published: 2026-02-16T01:11:55Z
Comment: 37 pages. Supersedes arXiv:2510.07495
Authors: Nai-Hui Chia, Atsuya Hasegawa, François Le Gall, Yu-Ching Shen