https://arxiv.org/api/M6Nd4Cqh9rhsP6tD7lIoBLETyNw 2026-03-28T09:16:19Z 60129 75 15 http://arxiv.org/abs/2510.13458v3 On Zermelo's planar navigation problem for convex bodies, and implications for non-convex optimal routing 2026-03-25T04:17:15Z We study a generalized version of Zermelo's navigation problem where the set of admissible velocities is a general compact convex set, replacing the classical Euclidean ball. After establishing existence results under the natural assumption of weak currents, we derive necessary optimality conditions via Pontryagin's maximum principle and convex analysis. Consequently, in the planar case, the domain of any optimal control is shown to be partitioned into regular and singular regimes. In the former, the optimal control is regular and satisfies a Zermelo-like navigation equation while in the latter it is largely undetermined. A necessary condition that can exclude singular regimes is stated and proved, providing a useful tool in applications. In regular regimes our results extend the classical Zermelo navigation equation to general convex control sets within a non-parametric setting. Furthermore, we discuss direct applications to the case of a non-convex control set. As an application, we develop the relevant case of an affine current. The results are illustrated with examples relevant to sailing and ship routing with asymmetric or sail-assisted propulsion, including the presence of waves. 2025-10-15T11:59:43Z Matteo Della Rossa Lorenzo Freddi Mattia Pinatto http://arxiv.org/abs/2601.16399v3 A Hessian-Free Actor-Critic Algorithm for Bi-Level Reinforcement Learning with Applications to LLM Fine-Tuning 2026-03-25T03:57:14Z We study a structured bi-level optimization problem where the upper-level objective is a smooth function and the lower-level problem is policy optimization in a Markov decision process (MDP). 
The upper-level decision variable parameterizes the reward of the lower-level MDP, and the upper-level objective depends on the optimal induced policy. Existing methods for bi-level optimization and RL often require second-order information, impose strong regularization at the lower level, or inefficiently use samples through nested-loop procedures. In this work, we propose a single-loop, first-order actor-critic algorithm that optimizes the bi-level objective via a penalty-based reformulation. We introduce into the lower-level RL objective an attenuating entropy regularization, which enables asymptotically unbiased upper-level hyper-gradient estimation without solving the unregularized RL problem exactly. We establish the finite-time and finite-sample convergence of the proposed algorithm to a stationary point of the original, unregularized bi-level optimization problem through a novel lower-level residual analysis under a special type of Polyak-Lojasiewicz condition. We validate the performance of our method through experiments on a GridWorld goal position problem and on happy tweet generation through reinforcement learning from human feedback (RLHF). 2026-01-23T02:12:24Z Sihan Zeng Sujay Bhatt Sumitra Ganesh Alec Koppel http://arxiv.org/abs/2603.23799v1 Resolving gradient pathology in physics-informed epidemiological models 2026-03-25T00:19:21Z Physics-informed neural networks (PINNs) are increasingly used in mathematical epidemiology to bridge the gap between noisy clinical data and compartmental models, such as the susceptible-exposed-infected-removed (SEIR) model. However, training these hybrid networks is often unstable due to competing optimization objectives. As established in recent literature on ``gradient pathology,'' the gradient vectors derived from the data loss and the physical residual often point in conflicting directions, leading to slow convergence or optimization deadlock. 
While existing methods attempt to resolve this by balancing gradient magnitudes or projecting conflicting vectors, we propose a novel method, conflict-gated gradient scaling (CGGS), to address gradient conflicts in physics-informed neural networks for epidemiological modelling, ensuring stable and efficient training while providing a computationally efficient alternative. This method utilizes the cosine similarity between the data and physics gradients to dynamically modulate the penalty weight. Unlike standard annealing schemes that only normalize scales, CGGS acts as a geometric gate: it suppresses the physical constraint when directional conflict is high, allowing the optimizer to prioritize data fidelity, and restores the constraint when gradients align. We prove that this gating mechanism preserves the standard $O(1/T)$ convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict. We demonstrate that this mechanism autonomously induces a curriculum learning effect, improving parameter estimation in stiff epidemiological systems compared to magnitude-based baselines. Our empirical results show improved peak recovery and convergence over magnitude-based methods. 2026-03-25T00:19:21Z 16 pages, 4 figures. Submitted to Neural Networks Nickson Golooba Woldegebriel Assefa Woldegerima http://arxiv.org/abs/2503.05594v2 Multi-asset optimal trade execution with stochastic cross-effects: An Obizhaeva-Wang-type framework 2026-03-24T23:48:02Z We analyze a continuous-time optimal trade execution problem in multiple assets where the price impact and the resilience can be matrix-valued stochastic processes that incorporate cross-impact effects. In addition, we allow for stochastic terminal and running targets. 
Initially, we formulate the optimal trade execution task as a stochastic control problem with a finite-variation control process that acts as an integrator both in the state dynamics and in the cost functional. We then extend this problem continuously to a stochastic control problem with progressively measurable controls. By identifying this extended problem as equivalent to a certain linear-quadratic stochastic control problem, we can use established results in linear-quadratic stochastic control to solve the extended problem. This work generalizes [Ackermann, Kruse, Urusov; FinancStoch'24] from the single-asset setting to the multi-asset case. In particular, we reveal cross-hedging effects, showing that it can be optimal to trade in an asset despite having no initial position. Moreover, as a subsetting we discuss a multi-asset variant of the model in [Obizhaeva, Wang; JFinancMark'13]. 2025-03-07T17:22:33Z Julia Ackermann Thomas Kruse Mikhail Urusov http://arxiv.org/abs/2603.20503v2 Perturbation Duality for Robust and Distributionally Robust Optimization: Short and General Proofs 2026-03-24T22:13:01Z Duality is a foundational tool in robust and distributionally robust optimization (RO and DRO), underpinning both analytical insights and tractable reformulations. The prevailing approaches in the literature primarily rely on saddle-point arguments, Lagrangian techniques, and conic duality. In contrast, this paper applies perturbation duality in the sense of Fenchel--Rockafellar convex analysis and demonstrates its effectiveness as a general and unifying methodology for deriving dual formulations in RO and DRO. We first apply perturbation duality to a recently proposed DRO framework that unifies phi-divergence and Wasserstein ambiguity sets through optimal transport with conditional moment constraints. 
We establish the associated dual representation without imposing compactness assumptions previously conjectured to be necessary, instead introducing alternative conditions motivated by perturbation analysis and leveraging the Interchangeability Principle. We then revisit the concept of robust duality -- commonly described as ``primal-worst equals dual-best'' -- and show that perturbation-based formulations provide a unified and transparent characterization of this principle. In particular, we develop a bifunction-based representation that encompasses existing formulations in the literature and yields concise and general proofs, substantially simplifying recent results. This work positions perturbation duality as a versatile and underutilized framework for RO and DRO, offering both conceptual unification and technical generality across a broad class of models. 2026-03-20T21:13:00Z 25 pages Louis L. Chen Jake Roth Johannes O. Royset http://arxiv.org/abs/2603.17875v2 Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs 2026-03-24T21:52:47Z Markov decision processes (MDPs) are viewed as the optimization of an objective function over certain linear operators on general function spaces. A new existence result for optimal policies in general MDPs is established, which differs from the existence result derived previously in the literature. Using the well-established perturbation theory of linear operators, a policy difference lemma is established for general MDPs and the Gâteaux derivative of the objective function as a function of the policy operator is derived. By upper bounding the policy difference via the theory of integral probability metrics, a new majorization-minimization type policy gradient algorithm for general MDPs is derived. This leads to a generalization of many well-known algorithms in reinforcement learning to cases with general state and action spaces. 
Further, by taking the integral probability metric to be the maximum mean discrepancy, a low-complexity policy gradient algorithm is derived for finite MDPs. The new algorithm, called MM-RKHS, appears to be superior to the PPO algorithm due to low computational complexity, low sample complexity, and faster convergence. 2026-03-18T16:01:49Z Abhishek Gupta Aditya Mahajan http://arxiv.org/abs/2603.23737v1 Risk-Aware Linear-Quadratic Regulation with Temporally Coupled States 2026-03-24T21:46:57Z We formulate and solve a discrete-time linear-quadratic regulation (LQR) problem in a finite horizon that penalizes temporal variability and stochastic variability of the state trajectory. Our approach enables the user to strike a balance between regulating the state and reducing temporal variability, with explicit sensitivity to risk. We achieve this by extending a risk measure called predictive variance to a setting with temporally coupled states. Numerical examples demonstrate the effect of temporal coupling in both risk-aware and risk-neutral control settings. Particularly, we observe that explicitly penalizing temporal variability alone can also reduce stochastic variability. 2026-03-24T21:46:57Z Preprint submitted to Automatica Chuanning Wei Kin Fung Li Dionysis Kalogerias Margaret P. Chapman http://arxiv.org/abs/2208.04411v2 A Note on Generalizing Power Bounds for Physical Design 2026-03-24T21:32:39Z In this note we show how to construct a number of nonconvex quadratic inequalities for a variety of physics equations appearing in physical design problems. These nonconvex quadratic inequalities can then be used to construct bounds on physical design problems where the objective is a quadratic or a ratio of quadratics. We show that the quadratic inequalities and the original physics equations are equivalent under a technical condition that holds in many practical cases which is easy to computationally (and, in some cases, manually) verify. 
2022-08-08T20:51:37Z added addendum (and small fixes) Guillermo Angeris http://arxiv.org/abs/2603.23708v1 Effective rates for continuous-time quasi-Fejér monotone dynamical systems 2026-03-24T20:51:31Z We provide quantitative convergence results for continuous-time dynamical systems in metric spaces that satisfy a continuous-time analog of quasi-Fejér monotonicity. More precisely, we provide a (strong) convergence result for such dynamical systems over compact metric spaces which is quantitatively outfitted with a continuous-time rate of metastability, which moreover can be explicitly and effectively constructed in a very uniform way, only depending on a few moduli representing quantitative witnesses to key properties of the dynamical system and a measure for the compactness of the space. We further show how this convergence result can be extended to non-compact spaces under a regularity assumption of the associated problem, where moreover rates of convergence can then be explicitly constructed which are similarly uniform. In both cases, already the associated ``infinitary'' convergence result is qualitatively novel in its present generality. Beyond this abstract quantitative theory for such dynamical systems, we motivate how the presently studied continuous-time variant of quasi-Fejér monotonicity naturally occurs as a unifying property of many dynamical systems and differential equations and inclusions, and in that way can be used to provide a comprehensive quantitative theory for many such dynamical systems. We illustrate this with three case studies for both classical first- and second-order dynamical systems in Hilbert spaces as well as (generalized) gradient flows and associated semigroups in nonlinear Hadamard spaces. 
2026-03-24T20:51:31Z 54 pages Anton Freund Nicholas Pischke http://arxiv.org/abs/2603.24617v1 Multi-LLM Query Optimization 2026-03-24T19:51:57Z Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is a common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints which guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form, multiplicatively separable expression in the query counts and is guaranteed to be feasibility-preserving. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of surrogate-optimal cost to true optimal cost converges to one as error tolerances shrink, with an explicit rate of $O\left(\log\log(1/α_{\min}) / \log(1/α_{\min})\right)$. Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan within a $(1+\varepsilon)$ factor of the surrogate optimum. 2026-03-24T19:51:57Z Arlen Dean Zijin Zhang Stefanus Jasin Yuqing Liu http://arxiv.org/abs/2510.00559v2 Ensemble Kalman Inversion for Constrained Nonlinear MPC: An ADMM-Splitting Approach 2026-03-24T19:42:52Z This work proposes a novel Alternating Direction Method of Multipliers (ADMM)-based Ensemble Kalman Inversion (EKI) algorithm for solving constrained nonlinear model predictive control (NMPC) problems. 
First, stage-wise nonlinear inequality constraints in the NMPC problem are embedded via an augmented Lagrangian with nonnegative slack variables. We then show that the resulting unconstrained augmented-Lagrangian primal subproblem admits a Bayesian interpretation: under independent Gaussian virtual observations, its minimizers coincide with MAP estimators, enabling solution via EKI. However, since the nonnegativity constraint on the slacks is a hard constraint not naturally encoded by a Gaussian model, our proposed algorithm yields a two-block ADMM scheme that alternates between (i) an inexact primal step that minimizes the augmented-Lagrangian objective (implemented via EKI rollouts), (ii) a nonnegativity projection for the slacks, and (iii) a dual ascent step. To balance exploration and convergence, an annealing schedule tempers sampling covariances while a penalty schedule increases constraint enforcement over outer iterations, encouraging global search early and precise constraint satisfaction later. We evaluate the proposed controller on a 6-DOF UR5e manipulation benchmark in MuJoCo, comparing it against DIAL-MPC (an iterative MPPI variant) as the arm traverses a cluttered tabletop environment. 2025-10-01T06:20:16Z Ahmed Khalil Mohamed Safwat Efstathios Bakolas http://arxiv.org/abs/2603.23658v1 Boost Like a (Var)Pro: Trust-Region Gradient Boosting via Variable Projection 2026-03-24T19:01:07Z Gradient boosting, a method of building additive ensembles from weak learners, has established itself as a practical and theoretically-motivated approach to approximate functions, especially using decision tree weak learners. Comparable methods for smooth parametric learners, such as neural networks, remain less developed in both training methodology and theory. 
To this end, we introduce \texttt{VPBoost} ({\bf V}ariable {\bf P}rojection {\bf Boost}ing), a gradient boosting algorithm for separable smooth approximators, i.e., models with a smooth nonlinear featurizer followed by a final linear mapping. \texttt{VPBoost} fuses variable projection, a training paradigm for separable models that enforces optimality of the linear weights, with a second-order weak learning strategy. The combination of second-order boosting, separable models, and variable projection gives rise to a closed-form solution for the optimal linear weights and a natural interpretation of \texttt{VPBoost} as a functional trust-region method. We thereby leverage trust-region theory to prove \texttt{VPBoost} converges to a stationary point under mild geometric conditions and, under stronger assumptions, achieves a superlinear convergence rate. Comprehensive numerical experiments on synthetic data, image recognition, and scientific machine learning benchmarks demonstrate that \texttt{VPBoost} learns an ensemble with improved evaluation metrics in comparison to gradient-descent-based boosting and attains competitive performance relative to an industry-standard decision tree boosting algorithm. 2026-03-24T19:01:07Z 55 pages, 14 figures Abhijit Chowdhary Elizabeth Newman Deepanshu Verma http://arxiv.org/abs/2603.23492v1 Universal and Parameter-free Gradient Sliding for Composite Optimization 2026-03-24T17:57:36Z We propose a parameter-free universal gradient sliding (PFUGS) algorithm for computing an approximate solution to the convex composite optimization problem $\min_{x\in X} \{f(x) + g(x)\}$. When $f$ and $g$ have $(M_ν,ν)$-Hölder and $L$-Lipschitz continuous (sub)gradients respectively, our proposed PFUGS method computes an approximate solution within at most $\mathcal{O}((M_ν/\varepsilon)^{{2}/{(1+3ν)}})$ and $\mathcal{O}((L/\varepsilon)^{1/2})$ evaluations of (sub)gradients of $f$ and $g$ respectively. 
Moreover, the PFUGS algorithm is parameter-free and does not require any prior knowledge of the problem constants $ν$, $M_ν$, and $L$. To the best of our knowledge, for problems involving two functions with different sets of problem constants, PFUGS is the first gradient sliding algorithm that is parameter-free. 2026-03-24T17:57:36Z Yan Wu Yuyuan Ouyang Zhe Zhang Qi Luo http://arxiv.org/abs/2508.07494v2 From Product Hilbert Spaces to the Generalized Koopman Operator and the Nonlinear Fundamental Lemma 2026-03-24T17:54:42Z The generalization of the Koopman operator to systems with control input and the derivation of a nonlinear fundamental lemma are two open problems that play a key role in the development of data-driven control methods for nonlinear systems. In this paper we derive a novel solution to these problems based on a basis function expansion in a product Hilbert space constructed as the tensor product between the Hilbert spaces of the state and input observable functions, respectively. We identify relaxed invariance conditions that guarantee existence of a bounded linear operator, i.e., the generalized Koopman operator, from the constructed product Hilbert space to the Hilbert space corresponding to the lifted state propagated forward in time. Compared to classical Koopman invariance conditions, measure preservation is not required. Moreover, we derive a nonlinear fundamental lemma by exploiting the constructed exact infinite-dimensional bilinear Koopman representation and Hankel operators. The effectiveness of the developed generalized Koopman embedding is illustrated on the Van der Pol oscillator and in predictive control of a soft-robotic manipulator model. 
2025-08-10T21:57:16Z Revisions compared to first version: formal analysis of the generalized Koopman composition operator, exact bilinear form with finite-dimensional input Hilbert space for input-affine systems, quantitative persistency of excitation notion for infinite-dimensional bilinear systems, nonlinear fundamental lemma in terms of Hankel operators and frames, additional soft-robotic manipulator example Mircea Lazar http://arxiv.org/abs/2603.23472v1 Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions 2026-03-24T17:39:09Z Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious servers may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework. Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $σ$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach. 2026-03-24T17:39:09Z 12 pages, 3 figures Rustem Islamov Grigory Malinovsky Alexander Gaponov Aurelien Lucchi Peter Richtárik Eduard Gorbunov