https://arxiv.org/api/M6Nd4Cqh9rhsP6tD7lIoBLETyNw
2026-03-28T09:16:19Z
601297515

http://arxiv.org/abs/2510.13458v3
On Zermelo's planar navigation problem for convex bodies, and implications for non-convex optimal routing
2026-03-25T04:17:15Z
We study a generalized version of Zermelo's navigation problem where the set of admissible velocities is a general compact convex set, replacing the classical Euclidean ball. After establishing existence results under the natural assumption of weak currents, we derive necessary optimality conditions via Pontryagin's maximum principle and convex analysis. Consequently, in the planar case, the domain of any optimal control is shown to be partitioned into regular and singular regimes. In the former, the optimal control is regular and satisfies a Zermelo-like navigation equation while in the latter it is largely undetermined. A necessary condition that can exclude singular regimes is stated and proved, providing a useful tool in applications. In regular regimes our results extend the classical Zermelo navigation equation to general convex control sets within a non-parametric setting. Furthermore, we discuss direct applications to the case of a non-convex control set. As an application, we develop the relevant case of an affine current. The results are illustrated with examples relevant to sailing and ship routing with asymmetric or sail-assisted propulsion, including the presence of waves.
2025-10-15T11:59:43Z
Matteo Della Rossa, Lorenzo Freddi, Mattia Pinatto

http://arxiv.org/abs/2601.16399v3
A Hessian-Free Actor-Critic Algorithm for Bi-Level Reinforcement Learning with Applications to LLM Fine-Tuning
2026-03-25T03:57:14Z
We study a structured bi-level optimization problem where the upper-level objective is a smooth function and the lower-level problem is policy optimization in a Markov decision process (MDP).
The upper-level decision variable parameterizes the reward of the lower-level MDP, and the upper-level objective depends on the optimal induced policy. Existing methods for bi-level optimization and RL often require second-order information, impose strong regularization at the lower level, or inefficiently use samples through nested-loop procedures. In this work, we propose a single-loop, first-order actor-critic algorithm that optimizes the bi-level objective via a penalty-based reformulation. We introduce into the lower-level RL objective an attenuating entropy regularization, which enables asymptotically unbiased upper-level hyper-gradient estimation without solving the unregularized RL problem exactly. We establish the finite-time and finite-sample convergence of the proposed algorithm to a stationary point of the original, unregularized bi-level optimization problem through a novel lower-level residual analysis under a special type of Polyak-Lojasiewicz condition. We validate the performance of our method through experiments on a GridWorld goal position problem and on happy tweet generation through reinforcement learning from human feedback (RLHF).
2026-01-23T02:12:24Z
Sihan Zeng, Sujay Bhatt, Sumitra Ganesh, Alec Koppel

http://arxiv.org/abs/2603.23799v1
Resolving gradient pathology in physics-informed epidemiological models
2026-03-25T00:19:21Z
Physics-informed neural networks (PINNs) are increasingly used in mathematical epidemiology to bridge the gap between noisy clinical data and compartmental models, such as the susceptible-exposed-infected-removed (SEIR) model. However, training these hybrid networks is often unstable due to competing optimization objectives. As established in recent literature on ``gradient pathology," the gradient vectors derived from the data loss and the physical residual often point in conflicting directions, leading to slow convergence or optimization deadlock.
While existing methods attempt to resolve this by balancing gradient magnitudes or projecting conflicting vectors, we propose conflict-gated gradient scaling (CGGS), a novel and computationally efficient method that addresses gradient conflicts in physics-informed neural networks for epidemiological modelling and ensures stable and efficient training. This method utilizes the cosine similarity between the data and physics gradients to dynamically modulate the penalty weight. Unlike standard annealing schemes that only normalize scales, CGGS acts as a geometric gate: it suppresses the physical constraint when directional conflict is high, allowing the optimizer to prioritize data fidelity, and restores the constraint when gradients align. We prove that this gating mechanism preserves the standard $O(1/T)$ convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict. We demonstrate that this mechanism autonomously induces a curriculum learning effect, improving parameter estimation in stiff epidemiological systems compared to magnitude-based baselines. Our empirical results show improved peak recovery and convergence over magnitude-based methods.
2026-03-25T00:19:21Z
16 pages, 4 figures. Submitted to Neural Networks
Nickson Golooba, Woldegebriel Assefa Woldegerima

http://arxiv.org/abs/2503.05594v2
Multi-asset optimal trade execution with stochastic cross-effects: An Obizhaeva-Wang-type framework
2026-03-24T23:48:02Z
We analyze a continuous-time optimal trade execution problem in multiple assets where the price impact and the resilience can be matrix-valued stochastic processes that incorporate cross-impact effects. In addition, we allow for stochastic terminal and running targets.
Initially, we formulate the optimal trade execution task as a stochastic control problem with a finite-variation control process that acts as an integrator both in the state dynamics and in the cost functional. We then extend this problem continuously to a stochastic control problem with progressively measurable controls. By identifying this extended problem as equivalent to a certain linear-quadratic stochastic control problem, we can use established results in linear-quadratic stochastic control to solve the extended problem. This work generalizes [Ackermann, Kruse, Urusov; FinancStoch'24] from the single-asset setting to the multi-asset case. In particular, we reveal cross-hedging effects, showing that it can be optimal to trade in an asset despite having no initial position. Moreover, as a subsetting we discuss a multi-asset variant of the model in [Obizhaeva, Wang; JFinancMark'13].
2025-03-07T17:22:33Z
Julia Ackermann, Thomas Kruse, Mikhail Urusov

http://arxiv.org/abs/2603.20503v2
Perturbation Duality for Robust and Distributionally Robust Optimization: Short and General Proofs
2026-03-24T22:13:01Z
Duality is a foundational tool in robust and distributionally robust optimization (RO and DRO), underpinning both analytical insights and tractable reformulations. The prevailing approaches in the literature primarily rely on saddle-point arguments, Lagrangian techniques, and conic duality. In contrast, this paper applies perturbation duality in the sense of Fenchel--Rockafellar convex analysis and demonstrates its effectiveness as a general and unifying methodology for deriving dual formulations in RO and DRO. We first apply perturbation duality to a recently proposed DRO framework that unifies phi-divergence and Wasserstein ambiguity sets through optimal transport with conditional moment constraints.
We establish the associated dual representation without imposing compactness assumptions previously conjectured to be necessary, instead introducing alternative conditions motivated by perturbation analysis and leveraging the Interchangeability Principle. We then revisit the concept of robust duality -- commonly described as ``primal-worst equals dual-best'' -- and show that perturbation-based formulations provide a unified and transparent characterization of this principle. In particular, we develop a bifunction-based representation that encompasses existing formulations in the literature and yields concise and general proofs, substantially simplifying recent results. This work positions perturbation duality as a versatile and underutilized framework for RO and DRO, offering both conceptual unification and technical generality across a broad class of models.
2026-03-20T21:13:00Z
25 pages
Louis L. Chen, Jake Roth, Johannes O. Royset

http://arxiv.org/abs/2603.17875v2
Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs
2026-03-24T21:52:47Z
A Markov decision process (MDP) is viewed as the optimization of an objective function over certain linear operators on general function spaces. A new existence result for optimal policies in general MDPs is established, which differs from the existence results derived previously in the literature. Using the well-established perturbation theory of linear operators, a policy difference lemma is established for general MDPs, and the Gâteaux derivative of the objective function as a function of the policy operator is derived. By upper bounding the policy difference via the theory of integral probability metrics, a new majorization-minimization type policy gradient algorithm for general MDPs is derived. This leads to a generalization of many well-known algorithms in reinforcement learning to cases with general state and action spaces.
Further, by taking the integral probability metric as maximum mean discrepancy, a low-complexity policy gradient algorithm is derived for finite MDPs. The new algorithm, called MM-RKHS, appears to be superior to the PPO algorithm due to its low computational complexity, low sample complexity, and faster convergence.
2026-03-18T16:01:49Z
Abhishek Gupta, Aditya Mahajan

http://arxiv.org/abs/2603.23737v1
Risk-Aware Linear-Quadratic Regulation with Temporally Coupled States
2026-03-24T21:46:57Z
We formulate and solve a discrete-time linear-quadratic regulation (LQR) problem in a finite horizon that penalizes temporal variability and stochastic variability of the state trajectory. Our approach enables the user to strike a balance between regulating the state and reducing temporal variability, with explicit sensitivity to risk. We achieve this by extending a risk measure called predictive variance to a setting with temporally coupled states. Numerical examples demonstrate the effect of temporal coupling in both risk-aware and risk-neutral control settings. In particular, we observe that explicitly penalizing temporal variability alone can also reduce stochastic variability.
2026-03-24T21:46:57Z
Preprint submitted to Automatica
Chuanning Wei, Kin Fung Li, Dionysis Kalogerias, Margaret P. Chapman

http://arxiv.org/abs/2208.04411v2
A Note on Generalizing Power Bounds for Physical Design
2026-03-24T21:32:39Z
In this note we show how to construct a number of nonconvex quadratic inequalities for a variety of physics equations appearing in physical design problems. These nonconvex quadratic inequalities can then be used to construct bounds on physical design problems where the objective is a quadratic or a ratio of quadratics.
We show that the quadratic inequalities and the original physics equations are equivalent under a technical condition that holds in many practical cases and is easy to verify computationally (and, in some cases, manually).
2022-08-08T20:51:37Z
added addendum (and small fixes)
Guillermo Angeris

http://arxiv.org/abs/2603.23708v1
Effective rates for continuous-time quasi-Fejér monotone dynamical systems
2026-03-24T20:51:31Z
We provide quantitative convergence results for continuous-time dynamical systems in metric spaces that satisfy a continuous-time analog of quasi-Fejér monotonicity. More precisely, we provide a (strong) convergence result for such dynamical systems over compact metric spaces which is quantitatively outfitted with a continuous-time rate of metastability, which moreover can be explicitly and effectively constructed in a very uniform way, only depending on a few moduli representing quantitative witnesses to key properties of the dynamical system and a measure for the compactness of the space. We further show how this convergence result can be extended to non-compact spaces under a regularity assumption of the associated problem, where moreover rates of convergence can then be explicitly constructed which are similarly uniform. In both cases, already the associated ``infinitary'' convergence result is qualitatively novel in its present generality. Beyond this abstract quantitative theory for such dynamical systems, we motivate how the presently studied continuous-time variant of quasi-Fejér monotonicity naturally occurs as a unifying property of many dynamical systems and differential equations and inclusions, and in that way can be used to provide a comprehensive quantitative theory for many such dynamical systems.
We illustrate this with three case studies for both classical first- and second-order dynamical systems in Hilbert spaces as well as (generalized) gradient flows and associated semigroups in nonlinear Hadamard spaces.
2026-03-24T20:51:31Z
54 pages
Anton Freund, Nicholas Pischke

http://arxiv.org/abs/2603.24617v1
Multi-LLM Query Optimization
2026-03-24T19:51:57Z
Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is a common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints which guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form, multiplicatively separable expression in the query counts and is guaranteed to be feasibility-preserving. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of surrogate-optimal cost to true optimal cost converges to one as error tolerances shrink, with an explicit rate of $O\left(\log\log(1/α_{\min}) / \log(1/α_{\min})\right)$.
Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan within a $(1+\varepsilon)$ factor of the surrogate optimum.
2026-03-24T19:51:57Z
Arlen Dean, Zijin Zhang, Stefanus Jasin, Yuqing Liu

http://arxiv.org/abs/2510.00559v2
Ensemble Kalman Inversion for Constrained Nonlinear MPC: An ADMM-Splitting Approach
2026-03-24T19:42:52Z
This work proposes a novel Alternating Direction Method of Multipliers (ADMM)-based Ensemble Kalman Inversion (EKI) algorithm for solving constrained nonlinear model predictive control (NMPC) problems. First, stage-wise nonlinear inequality constraints in the NMPC problem are embedded via an augmented Lagrangian with nonnegative slack variables. We then show that the resulting unconstrained augmented-Lagrangian primal subproblem admits a Bayesian interpretation: under independent Gaussian virtual observations, its minimizers coincide with MAP estimators, enabling solution via EKI. However, since the nonnegativity constraint on the slacks is a hard constraint not naturally encoded by a Gaussian model, our proposed algorithm yields a two-block ADMM scheme that alternates between (i) an inexact primal step that minimizes the augmented-Lagrangian objective (implemented via EKI rollouts), (ii) a nonnegativity projection for the slacks, and (iii) a dual ascent step. To balance exploration and convergence, an annealing schedule tempers sampling covariances while a penalty schedule increases constraint enforcement over outer iterations, encouraging global search early and precise constraint satisfaction later.
We evaluate the proposed controller on a 6-DOF UR5e manipulation benchmark in MuJoCo, comparing it against DIAL-MPC (an iterative MPPI variant) as the arm traverses a cluttered tabletop environment.
2025-10-01T06:20:16Z
Ahmed Khalil, Mohamed Safwat, Efstathios Bakolas

http://arxiv.org/abs/2603.23658v1
Boost Like a (Var)Pro: Trust-Region Gradient Boosting via Variable Projection
2026-03-24T19:01:07Z
Gradient boosting, a method of building additive ensembles from weak learners, has established itself as a practical and theoretically motivated approach to approximate functions, especially using decision tree weak learners. Comparable methods for smooth parametric learners, such as neural networks, remain less developed in both training methodology and theory. To this end, we introduce \texttt{VPBoost} ({\bf V}ariable {\bf P}rojection {\bf Boost}ing), a gradient boosting algorithm for separable smooth approximators, i.e., models with a smooth nonlinear featurizer followed by a final linear mapping. \texttt{VPBoost} fuses variable projection, a training paradigm for separable models that enforces optimality of the linear weights, with a second-order weak learning strategy. The combination of second-order boosting, separable models, and variable projection gives rise to a closed-form solution for the optimal linear weights and a natural interpretation of \texttt{VPBoost} as a functional trust-region method. We thereby leverage trust-region theory to prove \texttt{VPBoost} converges to a stationary point under mild geometric conditions and, under stronger assumptions, achieves a superlinear convergence rate.
Comprehensive numerical experiments on synthetic data, image recognition, and scientific machine learning benchmarks demonstrate that \texttt{VPBoost} learns an ensemble with improved evaluation metrics in comparison to gradient-descent-based boosting and attains competitive performance relative to an industry-standard decision tree boosting algorithm.
2026-03-24T19:01:07Z
55 pages, 14 figures
Abhijit Chowdhary, Elizabeth Newman, Deepanshu Verma

http://arxiv.org/abs/2603.23492v1
Universal and Parameter-free Gradient Sliding for Composite Optimization
2026-03-24T17:57:36Z
We propose a parameter-free universal gradient sliding (PFUGS) algorithm for computing an approximate solution to the convex composite optimization problem $\min_{x\in X} \{f(x) + g(x)\}$. When $f$ and $g$ have $(M_ν,ν)$-Hölder and $L$-Lipschitz continuous (sub)gradients respectively, our proposed PFUGS method computes an approximate solution within at most $\mathcal{O}((M_ν/\varepsilon)^{{2}/{(1+3ν)}})$ and $\mathcal{O}((L/\varepsilon)^{1/2})$ evaluations of (sub)gradients of $f$ and $g$ respectively. Moreover, the PFUGS algorithm is parameter-free and does not require any prior knowledge of the problem constants $ν$, $M_ν$, and $L$. To the best of our knowledge, for problems involving two functions with different sets of problem constants, PFUGS is the first gradient sliding algorithm that is parameter-free.
2026-03-24T17:57:36Z
Yan Wu, Yuyuan Ouyang, Zhe Zhang, Qi Luo

http://arxiv.org/abs/2508.07494v2
From Product Hilbert Spaces to the Generalized Koopman Operator and the Nonlinear Fundamental Lemma
2026-03-24T17:54:42Z
The generalization of the Koopman operator to systems with control input and the derivation of a nonlinear fundamental lemma are two open problems that play a key role in the development of data-driven control methods for nonlinear systems.
In this paper we derive a novel solution to these problems based on a basis function expansion in a product Hilbert space constructed as the tensor product between the Hilbert spaces of the state and input observable functions, respectively. We identify relaxed invariance conditions that guarantee existence of a bounded linear operator, i.e., the generalized Koopman operator, from the constructed product Hilbert space to the Hilbert space corresponding to the lifted state propagated forward in time. Compared to classical Koopman invariance conditions, measure preservation is not required. Moreover, we derive a nonlinear fundamental lemma by exploiting the constructed exact infinite-dimensional bilinear Koopman representation and Hankel operators. The effectiveness of the developed generalized Koopman embedding is illustrated on the Van der Pol oscillator and in predictive control of a soft-robotic manipulator model.
2025-08-10T21:57:16Z
Revisions compared to first version: formal analysis of the generalized Koopman composition operator, exact bilinear form with finite-dimensional input Hilbert space for input-affine systems, quantitative persistency of excitation notion for infinite-dimensional bilinear systems, nonlinear fundamental lemma in terms of Hankel operators and frames, additional soft-robotic manipulator example
Mircea Lazar

http://arxiv.org/abs/2603.23472v1
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
2026-03-24T17:39:09Z
Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious servers may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework.
Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $σ$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach.
2026-03-24T17:39:09Z
12 pages, 3 figures
Rustem Islamov, Grigory Malinovsky, Alexander Gaponov, Aurelien Lucchi, Peter Richtárik, Eduard Gorbunov
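
The last entry above combines norm clipping with robust aggregation. The sketch below illustrates only those two generic ingredients and is not the Byz-Clip21-SGD2M algorithm itself: it omits the double momentum and the differential-privacy noise, and the choice of coordinate-wise median as the aggregator is an assumption made for illustration. The names `clip`, `coordinate_median`, `robust_step`, and the threshold `tau` are hypothetical.

```python
import math
from statistics import median


def clip(vec, tau):
    """Rescale vec so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= tau or norm == 0.0:
        return list(vec)
    return [x * (tau / norm) for x in vec]


def coordinate_median(vectors):
    """Coordinate-wise median, one common robust aggregator (assumed here)."""
    return [median(coords) for coords in zip(*vectors)]


def robust_step(client_updates, tau):
    """Clip every client update, then aggregate the clipped updates robustly."""
    clipped = [clip(u, tau) for u in client_updates]
    return coordinate_median(clipped)


# Three honest clients and one client sending an arbitrarily large update.
honest = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]]
byzantine = [[1000.0, -1000.0]]
agg = robust_step(honest + byzantine, tau=2.0)
print(agg)  # stays close to the honest updates
```

Clipping bounds how far any single update can pull the aggregate, and the median discards what remains of the outlier, so in this toy run the result stays near the honest updates despite the corrupted one.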