http://arxiv.org/abs/2410.14054v2 2025-04-21T16:57:16Z 2024-10-17T21:52:00Z Adaptive Gradient Normalization and Independent Sampling for (Stochastic) Generalized-Smooth Optimization

Recent studies have shown that many nonconvex machine learning problems satisfy a generalized-smooth condition that extends beyond traditional smooth nonconvex optimization. However, the existing algorithms are not fully adapted to such generalized-smooth nonconvex geometry and encounter significant technical limitations in their convergence analysis. In this work, we first analyze the convergence of adaptively normalized gradient descent under function geometries characterized by generalized-smoothness and the generalized P{\L} condition, revealing the advantage of adaptive gradient normalization. Our results provide theoretical insights into adaptive normalization across various scenarios. For stochastic generalized-smooth nonconvex optimization, we propose \textbf{I}ndependent-\textbf{A}daptively \textbf{N}ormalized \textbf{S}tochastic \textbf{G}radient \textbf{D}escent, which leverages adaptive gradient normalization, independent sampling, and gradient clipping to achieve an $\mathcal{O}(\epsilon^{-4})$ sample complexity under relaxed noise assumptions. Experiments on large-scale nonconvex generalized-smooth problems demonstrate the fast convergence of our algorithm.
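As a rough illustration of adaptive gradient normalization, the sketch below runs a deterministic update of the form x ← x − γ ∇f(x) / ‖∇f(x)‖^β on the test function f(x) = x^4, whose gradient is not globally Lipschitz. The update form, the symbols γ (`step`) and β (`beta`), and the test function are assumptions chosen for illustration and are not taken from the abstract. Independent sampling and gradient clipping are not shown.

```python
def normalized_gd(x0, step=0.1, beta=2.0 / 3.0, iters=200):
    """Adaptively normalized gradient descent on f(x) = x**4 (illustrative only).

    Assumed update: x <- x - step * g / |g|**beta, with g = f'(x) = 4 * x**3.
    beta = 0 gives plain gradient descent; beta = 1 gives fully normalized steps.
    """
    x = float(x0)
    for _ in range(iters):
        g = 4.0 * x ** 3              # gradient of x**4
        g_norm = abs(g)
        if g_norm == 0.0:             # already stationary
            break
        x -= step * g / g_norm ** beta
    return x


x_final = normalized_gd(10.0)
```

With β = 2/3 on this particular function, the update reduces to multiplying x by a constant factor below one, so the iterates contract geometrically even though the raw gradient grows cubically in x.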

Yufeng Yang Erin Tripp Yifan Sun Shaofeng Zou Yi Zhou 40 pages, 1 table http://arxiv.org/abs/2412.17012v3 2025-04-21T16:33:22Z 2024-12-22T13:21:34Z Adaptive Control of Positive Systems with Application to Learning SSP

An adaptive controller is proposed and analyzed for the class of infinite-horizon optimal control problems in positive linear systems presented in (Ohlin et al., 2024b). This controller is derived from the solution of a "data-driven algebraic equation" constructed using the model-free Bellman equation from Q-learning. The equation is driven by data correlation matrices that do not scale with the number of data points, enabling efficient online implementation. Consequently, a sufficient condition guaranteeing stability and robustness to unmodeled dynamics is established. The derived results also provide a quantitative characterization of the interplay between excitation level and robustness to unmodeled dynamics. The class of optimal control problems considered here is equivalent to Stochastic Shortest Path (SSP) problems, allowing for a performance comparison between the proposed adaptive policy and model-free algorithms for learning the stochastic shortest path, as demonstrated in the numerical experiment.

Fethi Bencherki Anders Rantzer Accepted for publication in the Proceedings of the 7th Annual Learning for Dynamics and Control Conference (L4DC) http://arxiv.org/abs/2504.15196v1 2025-04-21T16:07:32Z 2025-04-21T16:07:32Z Fully Adaptive Stepsizes: Which System Benefit More -- Centralized or Decentralized?

In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, selecting an optimal stepsize often requires careful tuning, which can be time-consuming and may lead to slow convergence, especially when there is significant variation in the smoothness (L-smoothness) of local objective functions across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT generates a sequence of iterates that converges to the optimal consensus solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
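To make the idea of a stepsize driven by local smoothness concrete, here is a minimal centralized sketch in the spirit of adaptive gradient descent (AdGD). It estimates the local inverse Lipschitz constant from successive iterates and gradients. The specific rule, its constants, the quadratic test problem, and all function names are assumptions for illustration. This is not AdGT and includes no gradient tracking or agent communication.

```python
import math


def grad(x, curv):
    # Gradient of the hypothetical objective f(x) = 0.5 * sum_i curv[i] * x[i]**2.
    return [c * xi for c, xi in zip(curv, x)]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def adaptive_gd(x0, curv, lam0=1e-3, iters=2000, tol=1e-12):
    """Gradient descent with a stepsize set from a local smoothness estimate.

    Assumed rule: lam_k = min(sqrt(1 + theta) * lam_prev, |dx| / (2 * |dg|)),
    where theta is the ratio of the two previous stepsizes.
    """
    x_prev = list(x0)
    g_prev = grad(x_prev, curv)
    x = [xi - lam0 * gi for xi, gi in zip(x_prev, g_prev)]
    lam_prev, theta = lam0, float("inf")
    for _ in range(iters):
        g = grad(x, curv)
        if norm(g) < tol:
            break
        dx = norm([a - b for a, b in zip(x, x_prev)])
        dg = norm([a - b for a, b in zip(g, g_prev)])
        growth = math.sqrt(1.0 + theta) * lam_prev   # cap on stepsize growth
        lam = min(growth, dx / (2.0 * dg)) if dg > 0.0 else lam_prev
        theta = lam / lam_prev
        x_prev, g_prev, lam_prev = x, g, lam
        x = [xi - lam * gi for xi, gi in zip(x, g)]
    return x


x_final = adaptive_gd([5.0, -3.0], [1.0, 10.0])
```

No smoothness constant is supplied by the user here. The stepsize is inferred from how quickly the gradient changes along the trajectory, which is the property that makes per-agent adaptation attractive when local objectives differ widely in smoothness.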

Diyako Ghaderyan Stefan Werner http://arxiv.org/abs/2504.15177v1 2025-04-21T15:39:42Z 2025-04-21T15:39:42Z An $rp$-adaptive method for accurate resolution of shock-dominated viscous flow based on implicit shock tracking

This work introduces an optimization-based $rp$-adaptive numerical method to approximate solutions of viscous, shock-dominated flows using implicit shock tracking and a high-order discontinuous Galerkin discretization on traditionally coarse grids without nonlinear stabilization (e.g., artificial viscosity or limiting). The proposed method adapts implicit shock tracking methods, originally developed to align mesh faces with solution discontinuities, to compress elements into viscous shocks and boundary layers, functioning as a novel approach to aggressive $r$-adaptation. This form of $r$-adaptation is achieved naturally as the minimizer of the enriched residual with respect to the discrete flow variables and coordinates of the nodes of the grid. Several innovations to the shock tracking optimization solver are proposed to ensure sufficient mesh compression at viscous features to render stabilization unnecessary, including residual weighting, step constraints and modifications, and viscosity-based continuation. Finally, $p$-adaptivity is used to locally increase the polynomial degree with three clear benefits: (1) lessens the mesh compression requirements near shock waves and boundary layers, (2) reduces the error in regions where $r$-adaptivity is not sufficient with the given grid topology, and (3) reduces computational cost by performing a majority of the $r$-adaptivity iterations on the coarsest discretization. A series of numerical experiments show the proposed method effectively resolves viscous, shock-dominated flows, including accurate prediction of heat flux profiles produced by hypersonic flow over a cylinder, and compares favorably in terms of accuracy per degree of freedom to $h$-adaptation with a high-order discretization.

Huijing Dong Masayuki Yano Tianci Huang Matthew J. Zahr 43 pages, 35 figures http://arxiv.org/abs/2504.15117v1 2025-04-21T14:13:15Z 2025-04-21T14:13:15Z Symplectic Geometry in Hybrid and Impulsive Optimal Control

Hybrid dynamical systems are systems that undergo both continuous and discrete transitions. The Bolza problem from optimal control theory is applied to these systems and a hybrid version of Pontryagin's maximum principle is presented. This hybrid maximum principle is presented so as to emphasize its geometric nature, which makes its study amenable to the tools of geometric mechanics and symplectic geometry. One explicit benefit of this geometric approach is that the symplectic structure (and hence the induced volume) is preserved. This allows for a hybrid analog of caustics and conjugate points. Additionally, an introductory analysis of singular solutions (beating and Zeno) is discussed geometrically. This work concludes with a biological example where beating can occur.

William Clark Maria Oprea Comments welcome http://arxiv.org/abs/2404.08383v4 2025-04-21T14:10:07Z 2024-04-12T10:37:24Z Optimal Transport and Wasserstein Barycenter for Radially Contoured Distributions

The optimal transport and Wasserstein barycenter problems for Gaussian distributions have been solved: closed-form formulas for the Monge map, the Wasserstein distance, and the Wasserstein barycenter are available in the literature. Moreover, similar results hold when Gaussian distributions are generalized to elliptically contoured distributions. In this setting, Gaussian distributions are regarded as elliptically contoured distributions with generator function $e^{-x/2}$. However, there are few results on optimal transport for elliptically contoured distributions with different generator functions. In this paper, we restrict elliptically contoured distributions to radially contoured distributions, study their optimal transport, and prove that their Wasserstein barycenter is still radially contoured. For general elliptically contoured distributions, we give two numerical counterexamples showing that the Wasserstein barycenter of elliptically contoured distributions need not be elliptically contoured.
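For orientation, the Gaussian closed forms are easiest to see in one dimension. There, the Monge map is affine, the squared 2-Wasserstein distance is the sum of the squared differences of the means and of the standard deviations, and the barycenter is the Gaussian with averaged mean and averaged standard deviation. The sketch below covers only this one-dimensional Gaussian case. The function names and the (mean, std) tuple representation are illustrative choices, not notation from the paper.

```python
import math


def w2_distance(p, q):
    """2-Wasserstein distance between 1D Gaussians p = (mean, std), q = (mean, std)."""
    (m1, s1), (m2, s2) = p, q
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def monge_map(p, q):
    """Optimal (affine) transport map pushing p onto q; requires std of p > 0."""
    (m1, s1), (m2, s2) = p, q
    return lambda x: m2 + (s2 / s1) * (x - m1)


def barycenter(gaussians, weights):
    """Wasserstein barycenter of 1D Gaussians: average the means and the stds."""
    mean = sum(w * m for w, (m, _) in zip(weights, gaussians))
    std = sum(w * s for w, (_, s) in zip(weights, gaussians))
    return mean, std


dist = w2_distance((0.0, 1.0), (3.0, 5.0))
bary = barycenter([(0.0, 1.0), (4.0, 3.0)], [0.5, 0.5])
```

In higher dimensions the standard deviations are replaced by covariance matrices and the formulas involve matrix square roots, which this sketch does not attempt.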

Keyu Chen Yunxin Zhang http://arxiv.org/abs/2504.15113v1 2025-04-21T14:06:25Z 2025-04-21T14:06:25Z Adaptive sieving with semismooth Newton proximal augmented Lagrangian algorithm for multi-task Lasso problems

Multi-task learning enhances model generalization by jointly learning from related tasks. This paper focuses on the $\ell_{1,\infty}$-norm constrained multi-task learning problem, which promotes a shared feature representation while inducing sparsity in task-specific parameters. We propose an adaptive sieving (AS) strategy to efficiently generate a solution path for multi-task Lasso problems. Each subproblem along the path is solved via an inexact semismooth Newton proximal augmented Lagrangian ({\sc Ssnpal}) algorithm, achieving an asymptotically superlinear convergence rate. By exploiting the Karush-Kuhn-Tucker (KKT) conditions and the inherent sparsity of multi-task Lasso solutions, the {\sc Ssnpal} algorithm solves a sequence of reduced subproblems with small dimensions. This approach enables our method to scale effectively to large problems. Numerical experiments on synthetic and real-world datasets demonstrate the superior efficiency and robustness of our algorithm compared to state-of-the-art solvers.

Lanyu Lin Yong-Jin Liu Bo Wang Junfeng Yang http://arxiv.org/abs/2409.04297v2 2025-04-21T13:42:27Z 2024-09-06T14:13:57Z Minimization of the Pseudospectral Abscissa of a Quadratic Matrix Polynomial

For a quadratic matrix polynomial dependent on parameters and a given tolerance $\epsilon > 0$, the minimization of the $\epsilon$-pseudospectral abscissa over the set of permissible parameter values is discussed, with applications in damping optimization and brake squeal reduction in mind. An approach is introduced that is based on nonsmooth and global optimization (or smooth optimization techniques such as BFGS if there are many parameters) equipped with a globally convergent criss-cross algorithm to compute the $\epsilon$-pseudospectral abscissa objective when the matrix polynomial is of small size. For the setting when the matrix polynomial is large, a subspace framework is introduced, and it is argued formally that it solves the minimization problem globally. The subspace framework restricts the parameter-dependent matrix polynomial to small subspaces, and thus solves the minimization problem for such restricted small matrix polynomials. It then expands the subspaces using the minimizers for the restricted polynomials. The proposed approach makes the global minimization of the $\epsilon$-pseudospectral abscissa possible for a quadratic matrix polynomial dependent on a few parameters and for sizes up to at least a few hundred. This is illustrated on several examples originating from damping optimization.

Volker Mehrmann Emre Mengi 29 pages, 5 figures http://arxiv.org/abs/2504.15084v1 2025-04-21T13:16:28Z 2025-04-21T13:16:28Z Reconfiguration and Real-Time Operation of Networked Microgrids Under Load Uncertainty

Distribution networks are increasingly exposed to threats such as extreme weather, aging infrastructure, and cyber risks--resulting in more frequent contingencies and outages, a trend likely to persist. Microgrids, particularly dynamic networked microgrids (DNMGs), offer a promising solution to mitigate the impacts of such contingencies and enhance resiliency. However, distribution networks present unique challenges due to their unbalanced nature and the inherent uncertainty in both loads and generation. This paper builds upon our prior work on the two-stage mixed-integer robust optimization problem for configuring DNMGs, improving the solve time and scalability. Furthermore, we introduce a model-free, real-time optimal power flow algorithm to manage DNMG operations in the time between reconfigurations. A case study on a realistic network based on part of the San Francisco Bay Area demonstrates the scalability of both approaches. The case study also illustrates the ability to maintain power flow feasibility as loads vary and operating conditions change when the methods are used in tandem.

Hannah Moring Bala Kameshwar Poolla Harsha Nagarajan Johanna L. Mathieu Andrey Bernstein David M. Fobes http://arxiv.org/abs/2504.15062v1 2025-04-21T12:41:35Z 2025-04-21T12:41:35Z OPO: Making Decision-Focused Data Acquisition Decisions

We propose a model for making data acquisition decisions for variables in contextual stochastic optimisation problems. Data acquisition decisions are typically treated as separate and fixed. We explore problem settings in which the acquisition of contextual variables is costly and consequently constrained. The data acquisition problem is often solved heuristically for proxy objectives such as coverage. The more intuitive objective is the downstream decision quality as a result of data acquisition decisions. The whole pipeline can be characterised as an optimise-then-predict-then-optimise (OPO) problem. Analogously, much recent research has focused on how to integrate prediction and optimisation (PO) in the form of decision-focused learning. We propose leveraging differentiable optimisation to extend the integration to data acquisition. We solve the data acquisition problem with well-defined constraints by learning a surrogate linear objective function. We demonstrate an application of this model on a shortest path problem for which we first have to set a drone reconnaissance strategy to capture image segments serving as inputs to a model that predicts travel costs. We ablate the problem with a number of training modalities and demonstrate that the differentiable optimisation approach outperforms random search strategies.

Egon Peršak Miguel F. Anjos http://arxiv.org/abs/2504.08191v2 2025-04-21T11:52:34Z 2025-04-11T01:12:59Z Optimal protection and vaccination against epidemics with reinfection risk

We consider the problem of optimal allocation of vaccination and protection measures for the Susceptible-Infected-Recovered-Infected (SIRI) epidemiological model, which generalizes the classical Susceptible-Infected-Recovered (SIR) and Susceptible-Infected-Susceptible (SIS) epidemiological models by allowing for reinfection. We first introduce the controlled SIRI dynamical model, and discuss the existence and stability of the equilibrium points. We then formulate a finite-horizon optimal control problem where the cost of vaccination and protection is proportional to the mass of population that adopts it. Our main contribution in this work arises from a detailed investigation into the existence/non-existence of singular control inputs, and establishing optimality of bang-bang controls. The optimality of bang-bang control is established by solving an optimal control problem with a running cost that is linear with respect to the input variables. The input variables are associated with actions including vaccination and imposition of protective measures, e.g., masking or isolation. In contrast to most prior works, we rigorously establish the non-existence of singular controls, i.e., the optimality of bang-bang control for our SIRI model. Under the assumption that the reinfection rate exceeds the first-time infection rate, we characterize the structure of both optimal control inputs and establish that the vaccination control input admits a bang-bang structure. Numerical results provide valuable insights into the evolution of the disease spread under optimal control.
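The following forward-Euler simulation is a hedged sketch of what SIRI-type dynamics under a bang-bang vaccination input might look like. The abstract does not give the controlled model, so everything here is an assumption: the compartment equations, the way vaccination enters (moving susceptibles to the recovered class at rate `u`), the single switching time, and all parameter names and values. The reinfection rate `beta_hat` is set above the first-time infection rate `beta` to match the regime the abstract discusses.

```python
def simulate_siri(beta=0.3, beta_hat=0.5, gamma=0.1, u_max=0.05,
                  t_switch=20.0, horizon=100.0, dt=0.01):
    """Forward-Euler simulation of an assumed SIRI model with bang-bang vaccination.

    s, i, r are population fractions. The control u is u_max before t_switch
    and 0 afterwards (a single-switch bang-bang profile, chosen by hand here
    rather than computed from any optimality condition).
    """
    s, i, r = 0.99, 0.01, 0.0
    steps = int(round(horizon / dt))
    for k in range(steps):
        t = k * dt
        u = u_max if t < t_switch else 0.0
        ds = -beta * s * i - u * s                        # infection and vaccination
        di = beta * s * i + beta_hat * r * i - gamma * i  # first infection, reinfection, recovery
        dr = gamma * i - beta_hat * r * i + u * s         # recovery, reinfection loss, vaccination
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return s, i, r


s_end, i_end, r_end = simulate_siri()
```

The three derivatives sum to zero, so the total population fraction is conserved up to rounding. An actual optimal switching time would come from the optimal control analysis, which this sketch does not attempt.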

Urmee Maitra Indian Institute of Technology, Kharagpur Ashish R. Hota Indian Institute of Technology, Kharagpur Rohit Gupta Indian Institute of Technology, Bombay Alfred O. Hero University of Michigan, Ann Arbor 21 pages, 2 figures http://arxiv.org/abs/2410.14592v2 2025-04-21T11:13:32Z 2024-10-18T16:43:10Z Contractivity and linear convergence in bilinear saddle-point problems: An operator-theoretic approach

We study the convex-concave bilinear saddle-point problem $\min_x \max_y f(x) + y^\top Ax - g(y)$, where both, only one, or none of the functions $f$ and $g$ are strongly convex, and suitable rank conditions on the matrix $A$ hold. The solution of this problem is at the core of many machine learning tasks. By employing tools from monotone operator theory, we systematically prove the contractivity (in turn, the linear convergence) of several first-order primal-dual algorithms, including the Chambolle-Pock method. Our approach results in concise proofs, and it yields new convergence guarantees and tighter bounds compared to known results.
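As a small worked instance of the problem class, the sketch below applies the Chambolle-Pock primal-dual iteration to a scalar saddle-point problem with f(x) = (mu/2) x^2, g(y) = (mu/2) y^2 and A = a, whose unique saddle point is (0, 0). The scalar setup, the stepsizes `tau` and `sigma`, and the starting point are assumptions for illustration. The proximal maps of these quadratics reduce to simple divisions.

```python
def chambolle_pock(a=1.0, mu=1.0, tau=0.5, sigma=0.5, iters=100):
    """Chambolle-Pock iteration for min_x max_y (mu/2) x^2 + a*x*y - (mu/2) y^2.

    Uses extrapolation parameter 1 and stepsizes with tau * sigma * a**2 < 1.
    """
    x, y = 1.0, 1.0
    x_bar = x
    for _ in range(iters):
        # Dual ascent step followed by the prox of sigma * g.
        y = (y + sigma * a * x_bar) / (1.0 + sigma * mu)
        # Primal descent step followed by the prox of tau * f.
        x_new = (x - tau * a * y) / (1.0 + tau * mu)
        # Extrapolate the primal variable for the next dual step.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x, y


x_star, y_star = chambolle_pock()
```

In this strongly convex, strongly concave instance the iteration is a fixed linear map whose spectral radius is below one for the chosen parameters, so the iterates contract linearly toward the saddle point.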

Colin Dirren Mattia Bianchi Panagiotis D. Grontas John Lygeros Florian Dörfler AISTATS 2025 http://arxiv.org/abs/2504.15019v1 2025-04-21T11:05:03Z 2025-04-21T11:05:03Z Feedback Stackelberg-Nash equilibria in difference games with quasi-hierarchical interactions and inequality constraints

In this paper, we study a class of two-player deterministic finite-horizon difference games with coupled inequality constraints, where each player has two types of decision variables: one involving sequential interactions and the other simultaneous interactions. We refer to these as quasi-hierarchical dynamic games and define a solution concept called the feedback Stackelberg-Nash (FSN) equilibrium. Under a separability assumption on cost functions, we formulate FSN solutions recursively using a dynamic programming-like approach. We further show that the FSN solution for these constrained games can be derived from the parametric feedback Stackelberg solution of an associated unconstrained game with only sequential interactions, given parameter choices that satisfy implicit complementarity conditions. For the linear-quadratic case, we show that the FSN solutions are obtained by reformulating these complementarity conditions as a single large-scale linear complementarity problem. Finally, we illustrate our results with a dynamic duopoly game with production constraints.

Partha Sarathi Mohapatra Puduru Viswanadha Reddy Georges Zaccour http://arxiv.org/abs/2504.14987v1 2025-04-21T09:30:21Z 2025-04-21T09:30:21Z A general approach to distributed operator splitting

Splitting methods have emerged as powerful tools to address complex problems by decomposing them into smaller solvable components. In this work, we develop a general approach of forward-backward splitting methods for solving monotone inclusion problems involving both set-valued and single-valued operators, where the latter may lack cocoercivity. Our proposed approach, based on some coefficient matrices, not only encompasses several important existing algorithms but also extends to new ones, offering greater flexibility for different applications. Moreover, by appropriately selecting the coefficient matrices, the resulting algorithms can be implemented in a distributed and decentralized manner.
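For readers unfamiliar with the basic building block, here is the classical forward-backward iteration on a one-dimensional monotone inclusion 0 ∈ A(x) + B(x), with A the subdifferential of |x| (set-valued) and B(x) = x − 3 (single-valued). This is the textbook method with a cocoercive B, not the coefficient-matrix framework of the paper. The problem instance, the stepsize `gamma`, and the function names are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Resolvent of t * d|.|, i.e. the proximal map of t * |x|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def forward_backward(x0=0.0, gamma=0.5, iters=100):
    """Solve 0 in d|x| + (x - 3) by alternating a forward and a backward step."""
    x = x0
    for _ in range(iters):
        forward = x - gamma * (x - 3.0)       # explicit step on the single-valued operator
        x = soft_threshold(forward, gamma)    # implicit (resolvent) step on the set-valued operator
    return x


x_star = forward_backward()
```

The inclusion is the optimality condition of minimizing |x| + (x − 3)^2 / 2, whose solution is x = 2, so the result is easy to check by hand.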

Minh N. Dao Matthew K. Tam Thang D. Truong http://arxiv.org/abs/2504.14892v1 2025-04-21T06:41:29Z 2025-04-21T06:41:29Z A level set topology optimization theory based on Hamilton's principle

In this paper, we present a novel framework for deriving the evolution equation of the level set function in topology optimization, departing from conventional Hamilton-Jacobi based formulations. The key idea is the introduction of an auxiliary domain, geometrically identical to the physical design domain, occupied by fictitious matter which is dynamically excited by the conditions prevailing in the design domain. By assigning kinetic and potential energy to this matter and interpreting the level set function as the generalized coordinate to describe its deformation, the governing equation of motion is determined via Hamilton's principle, yielding a modified wave equation. Appropriate combinations of model parameters enable the recovery of classical physical behaviors, including the standard and biharmonic wave equations. The evolution problem is formulated in weak form using variational methods and implemented in the software environment FreeFEM++. The influence of the numerical parameters is analyzed on the example of minimum mean compliance. The results demonstrate that topological complexity and strut design can be effectively controlled by the respective parameters. In addition, the method allows for the nucleation of new holes and eliminates the need for re-initializing the level set function. The inclusion of a damping term further enhances numerical stability. To showcase the versatility and robustness of our method, we also apply it to compliant mechanism design and a bi-objective optimization problem involving self-weight and compliance minimization under local stress constraints.

Jan Oellerich Takayuki Yamada 66 pages, 27 figures