Efficient, traceable, and numerically error-free implementation of the MMS voting rule

2026-07-16T16:10:40Z

We propose an alternative algorithm to compute the MMS voting rule. Instead of using linear programming, in this new algorithm the maximin support value of a committee is computed using a sequence of maximum flow problems.
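
A minimal sketch of the max-flow idea follows. It assumes unit-weight voters with approval ballots and uses this network:

- source to each voter, capacity 1;
- voter to each approved committee member, capacity 1;
- member to sink, capacity t.

A support level t is achievable exactly when the maximum flow equals |W|·t. The loop updates t with a Newton/Dinkelbach-style step read off the minimum cut, using exact rational arithmetic. It is one plausible way to organise a sequence of max-flow problems, not necessarily the authors' algorithm, and the function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def max_flow(cap, src, snk):
    """Edmonds-Karp on a dict-of-dicts of residual capacities (mutated in place).

    Returns the flow value and the set of nodes reachable from `src` in the
    final residual graph (the source side of a minimum cut).
    """
    flow = Fraction(0)
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            return flow, set(parent)
        # Collect the augmenting path and push its bottleneck capacity.
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] = cap[v].get(u, 0) + delta
        flow += delta


def maximin_support(approvals, committee):
    """Exact maximin support of a non-empty `committee` (list of approval sets)."""
    W = list(committee)
    members = set(W)

    def supporters(S):
        # Number of voters approving at least one candidate in S.
        return sum(1 for ballot in approvals if ballot & S)

    t = Fraction(supporters(members), len(W))  # ratio for S = W: an upper bound
    while True:
        cap = {"src": {}, "snk": {}}
        for c in W:
            cap[("c", c)] = {"snk": t}  # each member should receive support t
        for i, ballot in enumerate(approvals):
            cap["src"][("v", i)] = Fraction(1)  # each voter has one unit of weight
            cap[("v", i)] = {("c", c): Fraction(1) for c in ballot & members}
        flow, reach = max_flow(cap, "src", "snk")
        if flow == len(W) * t:
            return t  # every member can be given support t simultaneously
        # Members cut off from the source form a set S with |N(S)| < |S| * t.
        S = {c for c in W if ("c", c) not in reach}
        t = Fraction(supporters(S), len(S))
```

Each non-final round finds a set S of members whose supporters N(S) satisfy |N(S)| < |S|·t. So t strictly decreases, and the loop ends after finitely many max-flow calls with no rounding error.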

Data-Driven Games with Coherent Risk Measures

2026-07-16T13:41:39Z

We introduce Coherent Utility Measure Games (CUMGs) in which players' uncertainty about the distribution of payoffs is modeled using coherent utility (risk) measures. Such measures, including mean semideviation risk and conditional value-at-risk, allow for interpretable notions of players' risk aversion while retaining formal equivalence to distributionally robust games. While CUMGs, which are a subclass of distributionally robust games, are continuous games in general, they can be viewed as finite games ``lifted'' to the mixed strategy space, which highlights their computational challenges. Prior results extend to guarantee equilibrium existence in data-driven CUMGs. We show that the computation of approximate equilibria for CUMGs parameterized by several risk measures lies in PPAD. Consequently, we obtain finite multilinear complementarity programs for computing equilibria of these games, whose size grows with $K$, the number of data samples. Unlike for standard games, these programs are not linear even in the two-player setting. Next, we establish the existence of approximate equilibria in finite data-driven CUMGs with small supports in the pure actions for the players, together with sparse data subsamples that guide the search for such equilibria. We also develop a stochastic first-order approach for smoothed CUMGs using data mini-batches, with bounds linking first-order error to approximate equilibrium. We include numerical experiments comparing the sparse-support search algorithm with complementarity-program solvers.
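
As a small illustration of the data-driven setting, the sketch below evaluates a row player's conditional value-at-risk utility for a mixed profile (x, y). It draws on K equally likely sampled payoff matrices. The lower-tail convention, the level alpha, and the helper names are assumptions for illustration; the paper also covers other risk measures such as mean semideviation. CVaR averages only the worst alpha-fraction of the K bilinear sample payoffs, so the resulting utility is not bilinear in (x, y). This is one way to see why the two-player programs are not linear.

```python
from fractions import Fraction


def lower_cvar(values, alpha):
    """Empirical lower-tail CVaR at level alpha in (0, 1] of equally likely samples.

    Averages the worst alpha-fraction of outcomes, splitting one sample if
    alpha * K is not an integer.
    """
    ordered = sorted(values)
    mass = alpha * len(ordered)  # number of (fractional) worst samples to average
    total, remaining = Fraction(0), mass
    for v in ordered:
        w = min(Fraction(1), remaining)
        if w <= 0:
            break
        total += w * v
        remaining -= w
    return total / mass


def cvar_payoff(x, y, samples, alpha):
    """Row player's CVaR utility of mixed profile (x, y) over K payoff matrices."""
    payoffs = [
        sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
        for A in samples
    ]
    return lower_cvar(payoffs, alpha)
```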

PAC Learning in Turn-Based Stochastic Games with Reachability Objectives: A Decentralized Private Approach via Expected Conditional Distance

2026-07-16T11:48:58Z

Reachability is the most fundamental logical objective, yet it is notoriously difficult to learn in reinforcement learning settings: even for Markov decision processes, PAC learning of reachability is impossible without additional assumptions. This difficulty also holds in turn-based stochastic games (TBSGs), where two adversarial players interact on a finite state space. In this work, we consider turn-based stochastic games with reachability objectives. For such settings, adversarial learning, in which players are adversarial even in the learning phase, is impossible. Therefore, the goal is to consider learning in which both players learn the unknown model together. In this spirit, previous literature on PAC learning in TBSGs considers (a)~public information shared by both players; and (b)~centralized learning, meaning that both players use the same learning algorithm. Our contribution is two-fold. First, we relax these strong assumptions and ensure learning: (i)~with private information not shared with the other player; and (ii)~decentralized learning, where the players do not share the same learning algorithm. To the best of our knowledge, this is the first positive result for decentralized, private-information learning in TBSGs with reachability objectives. Second, we introduce a game-theoretic generalization of the Expected Conditional Distance (ECD) parameter, which measures the expected number of steps to reach the target set. We establish a polynomial sample-complexity bound in the number of states, the number of actions, the ECD parameter, and the inverses of the error tolerance and failure probability.

Perpetual Fully-Online Approximate Fairness

2026-07-16T10:20:56Z

Many decision processes run for a long and unknown duration: in each round new requests arrive, an irrevocable choice must be made immediately, and the system is judged by ongoing fairness requirements. Examples include food banks allocating donations, computing systems repeatedly scheduling scarce resources across users, and institutions making repeated decisions while remaining fair over time. We propose a general approach based on \emph{deficits}, which measure how far the current outcome is from satisfying each fairness requirement. The goal is to keep all deficits small at each time step, without knowing the horizon or future agent valuations. This viewpoint also highlights a natural modeling question for long-running systems: how much of the past should be counted when fairness is evaluated? We first study the full-history model, where all past rounds count equally. For this model, we propose an efficient fully-online rule. For $n$ agents, we prove anytime guarantees: after any $t$ rounds, all requirements remain satisfied up to a slack of order $\tilde O(\sqrt{t/n})$. We instantiate the rule for online allocation of indivisible goods, yielding natural relaxations of proportionality and envy-freeness, and for online public decision-making. We show that this slack is tight even for weak proportionality. For unrestricted classical $\mathrm{EF}c$, the exact worst-case parameter at horizon $T$ is $\lceil T/n\rceil$. We then study discounted-memory fairness, where older deficits carry smaller weight. The same fully-online rule applies to these discounted deficits, and the resulting threshold is controlled by the discount function. In particular, the time dependence is never worse than the full-history $\sqrt t$ dependence. Overall, our results show that memory is a central part of perpetual fairness. The question is not only which requirement to impose, but also how the system should count past unfairness.
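
To make the notion of a deficit concrete, the sketch below tracks a proportionality deficit for each agent: (1/n)·(the agent's value for every item allocated so far) minus (the agent's value for their own bundle). Each arriving item goes greedily to the agent with the largest deficit. This naive greedy rule is a hypothetical stand-in, not the paper's rule, and it carries no guarantee. The optional factor gamma < 1 is one assumed instance of a discount function for the discounted-memory model.

```python
from fractions import Fraction


def allocate_online(valuations, n, gamma=Fraction(1)):
    """Greedy deficit-based online allocation (illustrative, not the paper's rule).

    valuations: iterable of per-round lists v, where v[i] is agent i's value for
    that round's item. deficit[i] = (1/n) * (i's value for all items so far)
    - (i's value for its own bundle); gamma < 1 geometrically discounts the past.
    Returns the list of recipients and the final deficits.
    """
    deficit = [Fraction(0)] * n
    recipients = []
    for v in valuations:
        # Discount the past, then add this round's proportional share for everyone.
        deficit = [gamma * d + Fraction(v[i], n) for i, d in enumerate(deficit)]
        winner = max(range(n), key=lambda i: deficit[i])  # ties -> lowest index
        deficit[winner] -= v[winner]
        recipients.append(winner)
    return recipients, deficit
```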

Game Theory in Social Media: A Stackelberg Model of Collaboration, Conflict, and Algorithmic Incentives

2026-07-16T06:49:38Z

This research models social media content creation, and the choices creators make, as a Stackelberg game. Platform algorithms, such as TikTok's and YouTube's, act as leaders that set rules to maximize user engagement on their platforms. Content creators act as followers and respond by selecting strategies that maximize views and personal payoffs; we focus specifically on collaboration versus conflict, referred to in this paper as 'beefing.' Viewers' preferences are embedded in the algorithmic utility function, while external sponsors impose penalties on high-risk strategies, namely repeated beefing. Using a mathematical model, the paper shows how shifts in algorithmic weights determine equilibrium creator behavior.
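
A generic way to compute a pure-strategy Stackelberg outcome over finite action sets is backward induction. First compute the follower's best response to each leader action. Then let the leader pick the action whose induced outcome it prefers. In the sketch below, the leader actions stand for hypothetical algorithmic weightings and the follower actions for collaborating versus beefing. All payoff numbers are invented for illustration, and follower ties are broken in the leader's favour (an assumption).

```python
def stackelberg(leader_actions, follower_actions, leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg outcome by backward induction over finite sets.

    For each leader action the follower best-responds; ties among follower best
    responses are broken in the leader's favour (the 'strong' convention).
    """
    best = None
    for a in leader_actions:
        top = max(follower_payoff[(a, b)] for b in follower_actions)
        responses = [b for b in follower_actions if follower_payoff[(a, b)] == top]
        b = max(responses, key=lambda r: leader_payoff[(a, r)])
        if best is None or leader_payoff[(a, b)] > leader_payoff[best]:
            best = (a, b)
    return best


# Hypothetical example: the platform (leader) picks how strongly its ranking
# rewards conflict; the creator (follower) then chooses to collaborate or beef.
LEADER = ["low_conflict_weight", "high_conflict_weight"]
FOLLOWER = ["collab", "beef"]
creator_payoff = {
    ("low_conflict_weight", "collab"): 5, ("low_conflict_weight", "beef"): 3,
    ("high_conflict_weight", "collab"): 5, ("high_conflict_weight", "beef"): 7,
}
platform_payoff = {
    ("low_conflict_weight", "collab"): 6, ("low_conflict_weight", "beef"): 4,
    ("high_conflict_weight", "collab"): 6, ("high_conflict_weight", "beef"): 8,
}
print(stackelberg(LEADER, FOLLOWER, platform_payoff, creator_payoff))
# prints ('high_conflict_weight', 'beef')
```

Suppose a larger sponsor penalty lowers the creator's payoff for beefing under the high weighting below 5. Then the same solver returns a collaboration outcome.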

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

2026-07-16T03:42:06Z

Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training bidding agents. DNQ alternates between trajectory collection, critic-based payoff estimation, equilibrium computation, and policy imitation. At each visited state, a shared critic predicts either pairwise payoff matrices or an exact N-player payoff tensor, an external solver computes equilibrium strategies, and the agents are trained by minimizing the KL divergence between their masked policies and the solver-derived equilibrium targets. We focus on a scalable pairwise formulation that greatly reduces equilibrium-solving cost and training time compared with the exact formulation, while the shared critic amortizes payoff learning across agents and states. Experiments compare the pairwise and exact variants using critic loss, policy entropy, bidding resource usage, and training cost, showing that the pairwise method scales to larger numbers of agents, whereas the exact method becomes computationally impractical as the joint game grows. These results illustrate the trade-off between strategic fidelity and scalability in repeated competitive environments.
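
The imitation step can be sketched as a KL loss between a masked softmax policy and an equilibrium target. This sketch makes three assumptions:

- illegal actions (for example, bids exceeding the remaining budget) are masked out before the softmax;
- the divergence is KL(target || policy);
- the target puts no mass on illegal actions.

The equilibrium solver itself is not shown, and the function names are hypothetical.

```python
import math


def masked_policy(logits, mask):
    """Softmax restricted to legal actions (mask[i] is True); illegal actions get 0."""
    top = max(z for z, m in zip(logits, mask) if m)  # subtract the max for stability
    exps = [math.exp(z - top) if m else 0.0 for z, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def kl_to_target(target, logits, mask):
    """KL(target || policy), skipping actions the target gives zero probability."""
    policy = masked_policy(logits, mask)
    return sum(p * math.log(p / q) for p, q in zip(target, policy) if p > 0)
```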

Compensation Design

2026-07-16T00:15:25Z

We introduce compensation design, the problem of designing payment rules that incentivize high-quality contributions in decentralized environments. Here, a budget-constrained principal with a monotone submodular value function aims to design a payment rule, while agents decide whether to opt in or out depending on their private cost. We show that a simple cost-oblivious and anonymous marginal-contribution payment rule guarantees that pure Nash equilibria always exist and attain a price of anarchy (PoA) of at most $2+o_λ(1)$ in the large-market regime ($λ\to 0$) where each individual cost is at most a $λ$ fraction of the budget. We further show that the factor $2$ is unavoidable among deterministic cost-oblivious rules. Surprisingly, we identify a counterexample showing that a payment rule based on the Shapley value may admit no pure Nash equilibria. We then extend our scope to coarse correlated equilibria. This is further motivated by our intractability result: although a pure Nash equilibrium always exists, computing one is PLS-complete. We establish that coarse correlated equilibria also attain a PoA bound of at most $2+o_λ(1)$, and this guarantee in fact extends even under the payment rule induced by the Shapley value. Moreover, we move beyond monotone submodular value functions and binary actions. First, for (monotone) XOS valuations, we show that no oracle-efficient payment rule can attain a PoA bound of $O(n^{1/2 - ε})$. Second, for submodular but non-monotone valuations, we show that a broad class of natural payment rules fails to guarantee a bounded PoA. Finally, we extend compensation design to the setting where each agent has a combinatorial action set. We provide randomized payment rules with logarithmic PoA guarantees for subadditive values, and matching lower bounds that apply even in the single-agent additive-value setting.

Stable Voting is PSPACE-Complete

2026-07-15T21:12:38Z

Stable Voting and Simple Stable Voting, introduced by Holliday and Pacuit, are Condorcet-consistent voting rules defined recursively: a candidate wins if they would win after removing some opponent they beat, taking the pair with the largest margin first. The computational complexity of winner determination under these rules has been an open question. We resolve this problem: winner determination is PSPACE-complete under both Stable Voting and Simple Stable Voting.
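
The recursion can be written down directly. The sketch below follows the Simple Stable Voting pattern as we understand it. It lists ordered pairs (a, b) from largest to smallest margin of a over b and returns the first a that wins once b is removed. It rests on these assumptions:

- ballots are complete strict rankings;
- every ordered pair is scanned, so a winner always exists;
- margin ties are broken by a fixed candidate order, whereas the actual rules can return tied winners;
- Stable Voting's extra condition on which pairs are eligible is omitted.

The memoised recursion may visit exponentially many candidate subsets.

```python
from functools import lru_cache
from itertools import permutations


def simple_stable_voting(ballots, candidates):
    """Simple Stable Voting-style winner with deterministic tie-breaking.

    ballots: complete strict rankings (lists, most preferred first).
    candidates: list of candidates; list order is the tie-breaking order.
    """
    order = {c: k for k, c in enumerate(candidates)}
    # margin[(a, b)] = #voters ranking a above b minus #voters ranking b above a
    margin = {
        (a, b): sum(1 if r.index(a) < r.index(b) else -1 for r in ballots)
        for a in candidates
        for b in candidates
        if a != b
    }

    @lru_cache(maxsize=None)
    def winner(remaining):
        if len(remaining) == 1:
            return next(iter(remaining))
        # Scan ordered pairs from largest to smallest margin of a over b.
        pairs = sorted(
            permutations(remaining, 2),
            key=lambda p: (-margin[p], order[p[0]], order[p[1]]),
        )
        for a, b in pairs:
            if winner(remaining - {b}) == a:  # a wins once b is removed
                return a

    return winner(frozenset(candidates))
```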

When Is Delegated Play Truthful? Within-Range Regret and the Trilemma of Aligned Delegation

2026-07-15T20:45:21Z

Advertisers delegate bidding to autobidders; users delegate tasks to language-model agents. A person describes what they want to an automated proxy that acts in a mechanism on their behalf. This is the revelation principle in production, and it forces a question classical theory assumes away: when is it optimal to describe yourself honestly to your own proxy? We show the answer turns on one quantity, the proxy's within-range regret. The most a principal can gain by misreporting equals the regret of the proxy's honest-report action against those the principal could have steered it to take. Honest self-description is optimal exactly when the proxy already plays the best action it can reach, that is, when it is loyal (Theorem 1). The identity unifies auction-specific autobidding results and pins down when the faithful-communication assumption behind language-model elicitation proxies (Huang et al.) holds. The identity also constrains guardrails placed on proxies, from bid caps to a model's alignment layer. No guardrail can be at once binding (it displaces the truthful action from the proxy's best reachable outcome), truthful (honest reporting stays optimal), and capability-preserving (that outcome stays reachable through some report); any two preclude the third (Theorem 2). A safety constraint that alters what a model does while leaving its best output reachable makes honest description of intent suboptimal, so a sharper report can gain. This is the incentive behind prompt-engineering and jailbreaking. Because within-range regret is #P-hard to compute exactly, we estimate it from samples and maintain it as a model is updated, at a cost set by how far the model drifts, not how often it changes. Running this estimator on production language models from five providers under an alignment-style cap, we find that honest reporting leaves surplus unclaimed on every model and that inflating the report recovers it.
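
The identity can be illustrated on a toy proxy. The within-range regret of a principal of type theta is the best payoff over actions reachable through some report, minus the payoff of the action the proxy takes on the honest report. The shading proxy (acting on 4/5 of the report, standing in for a guardrail) and the quadratic utility are hypothetical. Exhaustive enumeration of reports is only viable for toy report spaces. Taking the maximum over a sampled subset of reports can only underestimate the best reachable payoff.

```python
from fractions import Fraction


def within_range_regret(theta, reports, proxy, utility):
    """Gain from the best report versus honest reporting, for a principal of type theta.

    proxy(r) is the action the proxy takes on report r; utility(theta, a) is the
    principal's payoff. Only actions reachable through some report count.
    Returns (regret, best_report); honest reporting is optimal iff regret == 0.
    """
    honest = utility(theta, proxy(theta))
    best = max(reports, key=lambda r: utility(theta, proxy(r)))
    return utility(theta, proxy(best)) - honest, best


# Hypothetical toy: the proxy acts on 4/5 of whatever it is told, and the
# principal wants the action to equal its true type.
def shading_proxy(report):
    return Fraction(4, 5) * report


def quadratic_utility(theta, action):
    return -(action - theta) ** 2


print(within_range_regret(8, range(21), shading_proxy, quadratic_utility))
# prints (Fraction(64, 25), 10)
```

Here honest reporting leaves 64/25 of utility unclaimed, and the best report (10) inflates the true type (8). This mirrors, in miniature, the qualitative pattern the abstract describes.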

Stationary Online Contention Resolution Schemes

2026-07-15T18:46:05Z

Online contention resolution schemes (OCRSs) are a central tool in Bayesian online selection and resource allocation: they convert fractional ex-ante relaxations into feasible online policies while preserving each marginal probability up to a constant factor. Despite their importance, designing (near) optimal OCRSs is often technically challenging, and many existing constructions rely on indirect reductions to prophet inequalities and LP duality, resulting in algorithms that are difficult to interpret or implement. In this paper, we introduce "stationary online contention resolution schemes (S-OCRSs)," a permutation-invariant class of OCRSs in which the distribution of the selected feasible set is independent of arrival order. We show that S-OCRSs admit an exact distributional characterization together with a universal online implementation. We then develop a general `maximum-entropy' approach to construct and analyze S-OCRSs, reducing the design of online policies to constructing suitable distributions over feasible sets. This yields a new technical framework for designing simple and possibly improved OCRSs. We demonstrate the power of this framework across several canonical feasibility environments. In particular, we obtain an improved $(3-\sqrt{5})/2$-selectable OCRS for bipartite matchings, attaining the independence benchmark conjectured to be optimal and yielding the best known prophet inequality for this setting. We also obtain a $1-\sqrt{2/(πk)} + O(1/k)$-selectable OCRS for $k$-uniform matroids and a simple, explicit $1/2$-selectable OCRS for weakly Rayleigh matroids (including all $\mathbb{C}$-representable matroids such as graphic and laminar). While these guarantees match the best known bounds, our framework also yields concrete and systematic constructions, providing transparent algorithms in settings where previous OCRSs were implicit or technically involved.

Generalised Reachability Games

2026-07-15T17:04:32Z

We study two-player zero-sum turn-based games played on graphs with multiple reachability objectives, called generalised reachability games. In classic reachability games the goal of one player, Eve, is to visit a given target set of vertices, and that of the other player, Adam, is to prevent this. In generalised reachability games, the single target set is replaced with a family of target sets and the objective of Eve is to visit all of them in any order. We study the complexity of deciding the winner in two-player games with generalised reachability objectives. Our study reveals that the size of the target sets is an important parameter determining the complexity of this problem. We first prove that deciding the winner in such games is PSPACE-complete, and the PSPACE lower bound holds even when the size of each target set is at most three. By contrast, we show that the problem is FPT in the number of target sets of size greater than one. Moreover, we consider the memory requirements for both players and give matching upper and lower bounds on the sizes of winning strategies. We also study optimisation variants of these games, and show that most interesting cases are intractable. In particular, in contrast to the tractability of generalised reachability with singleton target sets, the optimisation problem is coNP-hard when Eve tries to maximise the number of target sets that are visited. Tractability of this case can be recovered in a different optimisation setting where Eve is required to pledge a maximum-sized subset of target sets that she can guarantee to visit.
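
For reference, classic reachability games are solved by the standard attractor construction. It repeatedly adds Eve vertices with some successor already in the set, and Adam vertices whose successors all are. A generalised objective can be reduced naively to a reachability game on a product arena that remembers which target sets have been visited. That reduction is exponential in the number of target sets; the paper's FPT algorithm is finer than this. The sketch assumes every vertex has at least one successor, and the function names are hypothetical.

```python
def attractor(vertices, owner, succ, target):
    """Vertices from which Eve can force a visit to `target` (classic reachability).

    owner[v] is "Eve" or "Adam"; succ[v] is a non-empty list of successors.
    """
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v in attr:
                continue
            eve_can = owner[v] == "Eve" and any(u in attr for u in succ[v])
            adam_must = owner[v] == "Adam" and all(u in attr for u in succ[v])
            if eve_can or adam_must:
                attr.add(v)
                changed = True
    return attr


def generalised_reachability_winning(vertices, owner, succ, target_sets):
    """Naive solver: pair each vertex with the set of target sets visited so far."""
    full = frozenset(range(len(target_sets)))

    def hit(v):
        return frozenset(i for i, F in enumerate(target_sets) if v in F)

    # Build the reachable part of the product arena (vertex, visited target sets).
    start = {v: (v, hit(v)) for v in vertices}
    states, stack = set(start.values()), list(start.values())
    p_owner, p_succ = {}, {}
    while stack:
        state = stack.pop()
        v, seen = state
        p_owner[state] = owner[v]
        p_succ[state] = [(u, seen | hit(u)) for u in succ[v]]
        for nxt in p_succ[state]:
            if nxt not in states:
                states.add(nxt)
                stack.append(nxt)
    goal = {s for s in states if s[1] == full}
    win = attractor(states, p_owner, p_succ, goal)
    return {v for v in vertices if start[v] in win}
```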

Dynamic Rental Games with Stagewise Individual Rationality

2026-07-15T17:00:52Z

We study \emph{rental games} -- a single-parameter dynamic mechanism design problem, in which a designer rents out an indivisible asset over $n$ days. Each day, an agent arrives with a private valuation per day of rental, drawn from that day's (known) distribution. The designer can either rent out the asset to the current agent for any number of remaining days, charging them a (possibly different) payment per day, or turn the agent away. Agents who arrive when the asset is not available are turned away. A defining feature of our dynamic model is that agents are \emph{stagewise-IR} (individually rational), meaning they reject any rental agreement that results in temporary negative utility, even if their final utility is positive. We ask whether and under which economic objectives it is useful for the designer to exploit the stagewise-IR nature of the agents. We show that an optimal rental mechanism can be modeled as a sequence of dynamic auctions with seller costs. However, the stagewise-IR behavior of the agents makes these auctions quite different from classical single-parameter auctions: Myerson's Lemma does not apply, and indeed we show that truthful mechanisms are not necessarily monotone, and payments do not necessarily follow Myerson's unique payment rule. We develop alternative characterizations of optimal mechanisms under several classes of economic objectives, including generalizations of welfare, revenue and consumer surplus. These characterizations allow us to use Myerson's unique payment rule in several cases, and for the other cases we develop optimal mechanisms from scratch. Our work shows that rental games raise interesting questions even in the single-parameter regime.
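
The stagewise-IR condition can be stated as a prefix check. Assume an agent's utility accrues day by day as their per-day value minus that day's payment. A contract is then stagewise-IR when the cumulative utility is non-negative after every day, which is stronger than requiring only the total to be non-negative. The helper names are hypothetical.

```python
def is_stagewise_ir(value_per_day, payments):
    """True iff cumulative utility sum_{t<=j} (value - p_t) is >= 0 for every day j."""
    cumulative = 0
    for p in payments:
        cumulative += value_per_day - p
        if cumulative < 0:
            return False  # a temporary negative utility: the agent rejects
    return True


def is_ex_post_ir(value_per_day, payments):
    """True iff the total utility over the whole rental is >= 0."""
    return sum(value_per_day - p for p in payments) >= 0
```

Consider an agent with per-day value 5 who is charged 7 and then 3. The contract is ex-post IR (total utility 0) but not stagewise-IR, since utility after day one is -2. Reversing the order of the two payments passes both checks.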

The Dynamic Verifiable Multi-Agent Human Agentic Loyalty Loop (DVM-HALL) Model and the Net Human-Agent Score (NHAS) in Autonomous Commerce

2026-07-15T16:27:48Z

The rapid proliferation of Agentic Artificial Intelligence fundamentally disrupts traditional customer loyalty paradigms. As AI evolves from passive recommendation algorithms to autonomous, goal-directed agents capable of executing purchasing decisions, the conventional understanding of consumer-brand relationships requires a structural reevaluation. By synthesizing extant literature across human-machine teaming, consumer decision-making, and algorithmic trust dynamics, we demonstrate that traditional loyalty models fail to account for algorithmic bounded rationality and constructed autonomy. To address this, we introduce the Dynamic Verifiable Multi-Agent Human Agentic Loyalty Loop (DVM-HALL) model. We formalize brand choice via a softmax probability formulation where human emotional equity, agentic machine-experience utility, calibrated trust, delegated authority, and verifiable execution jointly determine selection. The model features recursive updating mechanisms to dynamically calibrate trust and delegation after each interaction. Crucially, the framework integrates a verifiable execution layer for Decentralized Finance (DeFi) and tokenized loyalty settings, incorporating execution risks -- such as gas costs, slippage, MEV exposure, and smart-contract vulnerabilities -- as core predictors of agentic brand preference. Furthermore, we introduce the Net Human-Agent Score (NHAS), an auditable, risk-weighted metric designed to measure human-agent alignment using human feedback, execution logs, benchmark comparisons, and verifiable receipts. Finally, we propose a comprehensive three-stage empirical validation plan spanning controlled shopping experiments, multi-agent market simulations, and DeFi testbeds. This framework provides the foundational theory required for brands to navigate the impending transition toward machine customers.
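
A softmax brand-choice rule of this general shape can be sketched as follows. The feature names, weights, and temperature are hypothetical placeholders, as is the exponential-smoothing trust update, because the abstract does not give the model's exact functional form. Execution risk is treated as a feature with a negative weight.

```python
import math


def brand_choice_probabilities(brands, weights, temperature=1.0):
    """Softmax choice over brands from a weighted sum of hypothetical features.

    brands: dict brand -> dict feature -> value; weights: dict feature -> weight.
    Execution risk enters as a feature with a negative weight.
    """
    utility = {
        b: sum(weights[f] * x for f, x in feats.items()) / temperature
        for b, feats in brands.items()
    }
    top = max(utility.values())  # shift for numerical stability
    exps = {b: math.exp(u - top) for b, u in utility.items()}
    total = sum(exps.values())
    return {b: e / total for b, e in exps.items()}


def update_trust(trust, outcome, rate=0.2):
    """Hypothetical recursive update: move trust toward the latest outcome in [0, 1]."""
    return (1 - rate) * trust + rate * outcome
```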

Tighter Bounds for the Random-Offerer Mechanism in Bilateral Trade

2026-07-15T15:46:04Z

The random-offerer mechanism for bilateral trade selects the seller or the buyer uniformly and lets the selected agent make a profit-maximizing take-it-or-leave-it offer. Let $ρ_{\rm RO}$ be the infimum, over independent value distributions, of the mechanism's gains from trade divided by first-best gains from trade. We prove $\frac1π\le ρ_{\rm RO}<0.460242308085529$. For the lower bound, we improve the previous guarantee from approximately $0.317844$ to $1/π\approx 0.318310$. The proof uses a parameterized Lagrangian bound for pointwise-monotone allocations. At multiplier one, this bound has coefficient $2/π$, and the Lagrangian separates into two terms controlled by the optimal seller-offering and buyer-offering profits. For the upper bound, we construct an explicit family consisting of a truncated equal-revenue buyer and a seller distribution with a tilted power-law lower tail and a constant-virtual-cost segment. The family satisfies $\operatorname{FB}/\operatorname{RO}>2.17276852308451$, improving the previous explicit ratio $2.0749$; rigorous interval arithmetic certifies the numerical inequality.
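
To make the ratio concrete, the sketch below evaluates the random-offerer mechanism against first-best gains from trade. It does so for one pair of independent discrete value distributions. Since ρ_RO is an infimum, the ratio on any single instance is an upper bound on ρ_RO. The sketch makes these assumptions:

- offers are restricted to support points of the other side, which loses nothing for discrete distributions;
- ties between equally profitable offers are broken in favour of trade;
- probabilities are exact Fractions.

The function name is hypothetical.

```python
from fractions import Fraction


def random_offerer_ratio(buyer, seller):
    """RO gains from trade divided by first-best gains, for independent discrete values.

    buyer, seller: dicts mapping a value to its probability.
    """
    def pr_buyer_at_least(p):
        return sum(q for b, q in buyer.items() if b >= p)

    def pr_seller_at_most(p):
        return sum(q for s, q in seller.items() if s <= p)

    gft_seller_offers = 0
    for s, qs in seller.items():
        # Seller with cost s posts the price maximising (p - s) * Pr[b >= p].
        best_p, best_profit = None, 0
        for p in sorted(buyer):  # ascending: ties go to the lower price
            profit = (p - s) * pr_buyer_at_least(p)
            if profit > best_profit:
                best_p, best_profit = p, profit
        if best_p is not None:
            gft_seller_offers += qs * sum(qb * (b - s) for b, qb in buyer.items() if b >= best_p)

    gft_buyer_offers = 0
    for b, qb in buyer.items():
        # Buyer with value b posts the price maximising (b - p) * Pr[s <= p].
        best_p, best_profit = None, 0
        for p in sorted(seller, reverse=True):  # descending: ties go to the higher price
            profit = (b - p) * pr_seller_at_most(p)
            if profit > best_profit:
                best_p, best_profit = p, profit
        if best_p is not None:
            gft_buyer_offers += qb * sum(qs * (b - s) for s, qs in seller.items() if s <= best_p)

    first_best = sum(qb * qs * max(b - s, 0) for b, qb in buyer.items() for s, qs in seller.items())
    # Each side is selected as the offerer with probability 1/2.
    return (gft_seller_offers + gft_buyer_offers) / (2 * first_best)


print(random_offerer_ratio({1: Fraction(1, 2), 3: Fraction(1, 2)}, {0: Fraction(1)}))
# prints 7/8
```

In this example the seller-offering branch posts the high price 3 and loses the trade with the low-value buyer, so the ratio is 7/8.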

Efficiency, Feasibility, and Incentive-Awareness in Constrained Online Resource Allocation

2026-07-15T15:36:32Z

We study the dynamic allocation of indivisible resources to strategic agents under long-term constraints, where the planner aims to maximize social welfare, satisfy multiple constraints, and elicit near-truthful reports. We find standard primal-dual methods fragile in this setting: agents easily manipulate their reports to distort dual variables, sacrificing social efficiency for individual utility. To address this, we propose the Incentive-Aware Primal-Dual (IAPD) framework. On the primal side, we integrate three components to suppress manipulation: a VCG-based payment neutralizes immediate misreporting benefits, while epoch-based lazy updates and random exploration together ensure that potential future gains are outweighed by immediate penalties. On the dual side, to overcome a learning barrier due to lazy updates -- which we call the "price of incentives" -- we design a novel optimistic online learning algorithm, O-FTRL-FP. It uses a fixed-point oracle to resolve the circular dependency between optimistic dual variables and the resulting allocations. Ultimately, our mechanism attains $\tilde{\mathcal O}(\sqrt T)$ social welfare regret, satisfies all long-term constraints, and induces a near-truthful equilibrium. It also generalizes smoothly to multi-unit multi-demand allocation problems. Notably, this $\tilde{\mathcal O}(\sqrt T)$ regret nearly matches the non-strategic $Ω(\sqrt T)$ lower bound, demonstrating that incentive-awareness can be accommodated at almost no cost.