https://arxiv.org/api/GTdb98p6NaBwwnrnh8y+uRTRDpw 2026-04-11T09:42:40Z 2962 210 15 http://arxiv.org/abs/2502.15865v2 Standard Benchmarks Fail -- Auditing LLM Agents in Finance Must Prioritize Risk 2025-06-02T10:13:24Z Standard benchmarks fixate on how well large language model (LLM) agents perform in finance, yet say little about whether they are safe to deploy. We argue that accuracy metrics and return-based scores provide an illusion of reliability, overlooking vulnerabilities such as hallucinated facts, stale data, and adversarial prompt manipulation. We take a firm position: financial LLM agents should be evaluated first and foremost on their risk profile, not on their point-estimate performance. Drawing on risk-engineering principles, we outline a three-level agenda: model, workflow, and system, for stress-testing LLM agents under realistic failure modes. To illustrate why this shift is urgent, we audit six API-based and open-weights LLM agents on three high-impact tasks and uncover hidden weaknesses that conventional benchmarks miss. We conclude with actionable recommendations for researchers, practitioners, and regulators: audit risk-aware metrics in future studies, publish stress scenarios alongside datasets, and treat ``safety budget'' as a primary success criterion. Only by redefining what ``good'' looks like can the community responsibly advance AI-driven finance. 2025-02-21T12:56:15Z 46 pages, 2 figures, 2 tables Zichen Chen Jiaao Chen Jianda Chen Misha Sra http://arxiv.org/abs/2506.01423v1 FinRobot: Generative Business Process AI Agents for Enterprise Resource Planning in Finance 2025-06-02T08:22:28Z Enterprise Resource Planning (ERP) systems serve as the digital backbone of modern financial institutions, yet they continue to rely on static, rule-based workflows that limit adaptability, scalability, and intelligence. 
As business operations grow more complex and data-rich, conventional ERP platforms struggle to integrate structured and unstructured data in real time and to accommodate dynamic, cross-functional workflows. In this paper, we present the first AI-native, agent-based framework for ERP systems, introducing a novel architecture of Generative Business Process AI Agents (GBPAs) that bring autonomy, reasoning, and dynamic optimization to enterprise workflows. The proposed system integrates generative AI with business process modeling and multi-agent orchestration, enabling end-to-end automation of complex tasks such as budget planning, financial reporting, and wire transfer processing. Unlike traditional workflow engines, GBPAs interpret user intent, synthesize workflows in real time, and coordinate specialized sub-agents for modular task execution. We validate the framework through case studies in bank wire transfers and employee reimbursements, two representative financial workflows with distinct complexity and data modalities. Results show that GBPAs achieve up to a 40% reduction in processing time, a 94% drop in error rate, and improved regulatory compliance by enabling parallelism, risk control insertion, and semantic reasoning. These findings highlight the potential of GBPAs to bridge the gap between generative AI capabilities and enterprise-grade automation, laying the groundwork for the next generation of intelligent ERP systems. 2025-06-02T08:22:28Z Hongyang Yang Likun Lin Yang She Xinyu Liao Jiaoyang Wang Runjia Zhang Yuquan Mo Christina Dan Wang http://arxiv.org/abs/2410.12801v2 Exploring the Interplay of Skewness and Kurtosis: Dynamics in Cryptocurrency Markets Amid the COVID-19 Pandemic 2025-05-28T11:15:36Z We examine how skewness interacts with kurtosis within the cryptocurrency market. We show that during the COVID-19 pandemic observations cluster more heavily around the two flanks, highlighting volatile behavior. 
Moreover, we document the evolution of the interrelationship as the pandemic progresses, identifying the dominance of the extremes. Our findings suggest that the interrelationship between the two higher moments of cryptocurrencies gives investors and researchers an additional analytic tool. 2024-09-30T10:46:17Z Ariston Karagiorgis Antonis Ballis Konstantinos Drakos Christos Kallandranis http://arxiv.org/abs/2505.12269v3 Vague Knowledge: Evidence from Analyst Reports 2025-05-24T22:50:59Z People in the real world often possess vague knowledge of future payoffs, for which quantification is not feasible or desirable. We argue that language, with differing ability to convey vague information, plays an important but less-known role in representing subjective expectations. Empirically, we find that in their reports, analysts include useful information in linguistic expressions but not in numerical forecasts. Specifically, the textual tone of analyst reports has predictive power for forecast errors and subsequent revisions in numerical forecasts, and this relation becomes stronger when analysts' language is vaguer, when uncertainty is higher, and when analysts are busier. Overall, our theory and evidence suggest that some useful information is vaguely known and only communicated through language. 2025-05-18T07:18:58Z Kerry Xiao Amy Zang http://arxiv.org/abs/2506.05357v1 Inventory record inaccuracy in grocery retailing: Impact of promotions and product perishability, and targeted effect of audits 2025-05-22T12:25:01Z We report the results of a study to identify and quantify drivers of inventory record inaccuracy (IRI) in a grocery retailing environment, a context where products are often subject to promotion activity and a substantial share of items are perishable. The analysis covers ~24,000 stock keeping units (SKUs) sold in 11 stores. 
We find that IRI is positively associated with average inventory level, restocking frequency, and whether the item is perishable, and negatively associated with promotional activity. We also conduct a field quasi-experiment to assess the marginal effect of stock counts on sales. While performing an inventory audit is found to lead to an 11% store-wide sales lift, the audit has heterogeneous effects with all the sales lift concentrated on items exhibiting negative IRI (i.e., where system inventory is greater than actual inventory). The benefits of inventory audits are also found to be more pronounced on perishable items, which are associated with higher IRI levels. Our findings inform retailers on the appropriate allocation of effort to improve IRI and reframe stock counting as a sales-increasing strategy rather than a cost-intensive necessity. 2025-05-22T12:25:01Z Yacine Rekik Rogelio Oliva Christoph Glock Aris Syntetos http://arxiv.org/abs/2410.13878v2 Corporate Non-Disclosure Disputes: equilibrium settlement where increasing legal liability encourages voluntary disclosures 2025-05-22T10:19:18Z How should a court resolve a shareholder-management dispute after an unexpected price drop, when it is suspected that at an earlier time management chose not to update (disclose to) the market about a material event that was privately observed? An earlier fundamental result in this area (Dye, 2017) has shown that if the court chooses to make public that it will increase awards of damages to try and deter non-disclosure, then this may have the perverse effect that management may rationally choose to disclose less. Schantl and Wagenhofer (2024) call this the pure-insurance effect shareholders receive from higher damages payments. They show that the result may be relaxed if management also face a fixed exogenous reputational cost from non-disclosure. In this research we probe the increased-damages versus reduced-disclosure result via a different route. 
We introduce a dynamic continuous-time model of management's equilibrium disclosure decision and show that as awards of damages increase this has in a dynamic setting a hitherto unrecognized effect: management rationally switch their disclosure strategy. We characterize the range of damage awards, which we term the legal consistency zone, in which increased awards of damages evoke an endogenous increase in voluntary disclosure. 2024-10-02T19:37:22Z Miles B. Gietzmann Adam J. Ostaszewski http://arxiv.org/abs/2505.15526v1 Measuring inequality in society-oriented Lotka--Volterra-type kinetic equations 2025-05-21T13:51:27Z We present a possible approach to measuring inequality in a system of coupled Fokker-Planck-type equations that describe the evolution of distribution densities for two populations interacting pairwise due to social and/or economic factors. The macroscopic dynamics of their mean values follow a Lotka-Volterra system of ordinary differential equations. Unlike classical models of wealth and opinion formation, which tend to converge toward a steady-state profile, the oscillatory behavior of these densities only leads to the formation of local equilibria within the Fokker-Planck system. This makes tracking the evolution of most inequality measures challenging. However, an insightful perspective on the problem is obtained by using the coefficient of variation, a simple inequality measure closely linked to the Gini index. Numerical experiments confirm that, despite the system's oscillatory nature, inequality initially tends to decrease. 2025-05-21T13:51:27Z Marco Menale Giuseppe Toscani http://arxiv.org/abs/2505.14655v1 Cryptocurrencies in the Balance Sheet: Insights from (Micro)Strategy -- Bitcoin Interactions 2025-05-20T17:43:14Z This paper investigates the evolving link between cryptocurrency and equity markets in the context of the recent wave of corporate Bitcoin (BTC) treasury strategies. 
We assemble a dataset of 39 publicly listed firms holding BTC, from their first acquisition through April 2025. Using daily logarithmic returns, we first document significant positive co-movements via Pearson correlations and single factor model regressions, discovering an average BTC beta of 0.62, and isolating 12 companies, including Strategy (formerly MicroStrategy, MSTR), exhibiting a beta exceeding 1. We then classify firms into three groups reflecting their exposure to BTC, liquidity, and return co-movements. We use transfer entropy (TE) to capture the direction of information flow over time. Transfer entropy analysis consistently identifies BTC as the dominant information driver, with brief, announcement-driven feedback from stocks to BTC during major financial events. Our results highlight the critical need for dynamic hedging ratios that adapt to shifting information flows. These findings provide important insights for investors and managers regarding risk management and portfolio diversification in a period of growing integration of digital assets into corporate treasuries. 2025-05-20T17:43:14Z 25 pages, 6 tables, 7 figures Sabrina Aufiero Antonio Briola Tesfaye Salarin Fabio Caccioli Silvia Bartolucci Tomaso Aste http://arxiv.org/abs/2505.14565v1 Towards Verifiability of Total Value Locked (TVL) in Decentralized Finance 2025-05-20T16:24:59Z Total Value Locked (TVL) aims to measure the aggregate value of cryptoassets deposited in Decentralized Finance (DeFi) protocols. Although blockchain data is public, the way TVL is computed is not well understood. In practice, its calculation on major TVL aggregators relies on self-reports from community members and lacks standardization, making it difficult to verify published figures independently. We thus conduct a systematic study on 939 DeFi projects deployed in Ethereum. 
We study the methodologies used to compute TVL, examine factors hindering verifiability, and ultimately propose standardization attempts in the field. We find that 10.5% of the protocols rely on external servers; 68 methods alternative to standard balance queries exist, although their use decreased over time; and 240 equal balance queries are repeated on multiple protocols. These findings indicate limits to verifiability and transparency. We thus introduce ``verifiable Total Value Locked'' (vTVL), a metric measuring the TVL that can be verified relying solely on on-chain data and standard balance queries. A case study on 400 protocols shows that our estimations align with published figures for 46.5% of protocols. Informed by these findings, we discuss design guidelines that could facilitate a more verifiable, standardized, and explainable TVL computation. 2025-05-20T16:24:59Z JEL classification: E42, E58, F31, G12, G19, G23, L50, O33 Pietro Saggese Michael Fröwis Stefan Kitzler Bernhard Haslhofer Raphael Auer http://arxiv.org/abs/2506.03156v1 Gauging Growth: AGI Mathematical Metrics for Economic Progress 2025-05-20T08:44:30Z Today, the economy is greatly influenced by Artificial General Intelligence (AGI). The purpose of this paper is to determine the impact of the quantitative relations of AGI on the country's economic parameters. The authors use the analysis of historical data in the research, develop a new mathematical algorithm that refers to the level of AGI development, and conduct a regression analysis. The economic effect of AGI is deduced if it affects the growth of real GDP. As a result of the analysis, it is revealed that there is a positive Pearson correlation between the growth of AGI and real GDP; that is, to increase GDP by 1%, an average increase of 12.5% of AGI is required. 
2025-05-20T08:44:30Z Davit Gondauri http://arxiv.org/abs/2505.13019v1 Characterizing asymmetric and bimodal long-term financial return distributions through quantum walks 2025-05-19T12:04:10Z The analysis of logarithmic return distributions defined over large time scales is crucial for understanding the long-term dynamics of asset price movements. For large time scales of the order of two trading years, the anticipated Gaussian behavior of the returns often does not emerge, and their distributions often exhibit a high level of asymmetry and bimodality. These features are inadequately captured by the majority of classical models to address financial time series and return distributions. In the presented analysis, we use a model based on the discrete-time quantum walk to characterize the observed asymmetry and bimodality. The quantum walk distinguishes itself from a classical diffusion process by the occurrence of interference effects, which allows for the generation of bimodal and asymmetric probability distributions. By capturing the broader trends and patterns that emerge over extended periods, this analysis complements traditional short-term models and offers opportunities to more accurately describe the probabilistic structure underlying long-term financial decisions. 2025-05-19T12:04:10Z 24 pages, 11 figures, 2 tables Stijn De Backer Luis E. C. Rocha Jan Ryckebusch Koen Schoors http://arxiv.org/abs/2505.12413v1 The Stablecoin Discount: Evidence of Tether's U.S. Treasury Bill Market Share in Lowering Yields 2025-05-18T13:33:37Z Stablecoins represent a critical bridge between cryptocurrency and traditional finance, with Tether (USDT) dominating the sector as the largest stablecoin by market capitalization. By Q1 2025, Tether directly held approximately $98.5 billion in U.S. Treasury bills, representing 1.6% of all outstanding Treasury bills, making it one of the largest non-sovereign buyers in this crucial asset class, on par with nation-state-level investors. 
This paper investigates how Tether's market share of U.S. Treasury bills influences corresponding yields. The baseline semi-log time trend model finds that a 1% increase in Tether's market share is associated with a 1-month yield reduction of 3.8%, corresponding to 14-16 basis points. However, threshold regression analysis reveals a critical market share threshold of 0.973%, above which the yield impact intensifies significantly. In this high regime, a 1% market share increase reduces 1-month yields by 6.3%. At the end of Q1 2025, Tether's market share placed it firmly within this high-impact regime, reducing 1-month yields by around 24 basis points relative to a counterfactual. In absolute terms, Tether's demand for Treasury bills equates to roughly $15 billion in annual interest savings for the U.S. government. Aligning with theories of liquidity saturation and nonlinear price impact, these results highlight that stablecoin demand can reduce sovereign funding costs and provide a potential buffer against market shocks. 2025-05-18T13:33:37Z 15 pages, 3 tables, 1 figure Lennart Ante Aman Saggu Ingo Fiedler http://arxiv.org/abs/2505.13533v1 FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs 2025-05-18T11:47:55Z Financial tasks are pivotal to global economic stability; however, their execution faces challenges including labor-intensive processes, low error tolerance, data fragmentation, and tool limitations. Although large language models (LLMs) have succeeded in various natural language processing tasks and have shown potential in automating workflows through reasoning and contextual understanding, current benchmarks for evaluating LLMs in finance lack sufficient domain-specific data, rely on simplistic task designs, and use incomplete evaluation frameworks. 
To address these gaps, this article presents FinMaster, a comprehensive financial benchmark designed to systematically assess the capabilities of LLMs in financial literacy, accounting, auditing, and consulting. Specifically, FinMaster comprises three main modules: i) FinSim, which builds simulators that generate synthetic, privacy-compliant financial data for companies to replicate market dynamics; ii) FinSuite, which provides tasks in core financial domains, spanning 183 tasks of various types and difficulty levels; and iii) FinEval, which develops a unified interface for evaluation. Extensive experiments over state-of-the-art LLMs reveal critical capability gaps in financial reasoning, with accuracy dropping from over 90% on basic tasks to merely 40% on complex scenarios requiring multi-step reasoning. This degradation reflects the propagation of computational errors: accuracy on single-metric calculations, initially 58%, fell to 37% in multi-metric scenarios. To the best of our knowledge, FinMaster is the first benchmark that covers full-pipeline financial workflows with challenging tasks. We hope that FinMaster can bridge the gap between research and industry practitioners, driving the adoption of LLMs in real-world financial practices to enhance efficiency and accuracy. 2025-05-18T11:47:55Z Junzhe Jiang Chang Yang Aixin Cui Sihan Jin Ruiyu Wang Bo Li Xiao Huang Dongning Sun Xinrun Wang http://arxiv.org/abs/2505.12198v1 Multivariate Affine GARCH with Heavy Tails: A Unified Framework for Portfolio Optimization and Option Valuation 2025-05-18T02:27:44Z This paper develops and estimates a multivariate affine GARCH(1,1) model with Normal Inverse Gaussian innovations that captures time-varying volatility, heavy tails, and dynamic correlation across asset returns. We generalize the Heston-Nandi framework to a multivariate setting and apply it to 30 Dow Jones Industrial Average stocks. 
The model jointly supports three core financial applications: dynamic portfolio optimization, wealth path simulation, and option pricing. Closed-form solutions are derived for a Constant Relative Risk Aversion (CRRA) investor's intertemporal asset allocation, and we implement a forward-looking risk-adjusted performance comparison against Merton-style constant strategies. Using the model's conditional volatilities, we also construct implied volatility surfaces for European options, capturing skew and smile features. Empirically, we document substantial wealth-equivalent utility losses from ignoring time-varying correlation and tail risk. These findings underscore the value of a unified econometric framework for analyzing joint asset dynamics and for managing portfolio and derivative exposures under non-Gaussian risks. 2025-05-18T02:27:44Z Ayush Jha Abootaleb Shirvani Ali Jaffri Svetlozar T. Rachev Frank J. Fabozzi http://arxiv.org/abs/2501.17490v2 Pricing Carbon Allowance Options on Futures: Insights from High-Frequency Data 2025-05-16T10:11:34Z Leveraging a unique dataset of carbon futures option prices traded on the ICE market from December 2015 until December 2020, we present the results from an unprecedented calibration exercise. Within a multifactor stochastic volatility framework with jumps, we employ a three-dimensional pricing kernel compensating for equity and variance components' risk to derive an analytically tractable and numerically practical approach to pricing. To the best of our knowledge, we are the first to provide an estimate of the equity and variance risk premia for the carbon futures option market. We gain insights into daily option and futures dynamics by exploiting the information from tick-by-tick futures trade data. Decomposing the realized measure of futures volatility into continuous and jump components, we employ them as auxiliary variables for estimating futures dynamics via indirect inference. 
Our approach provides a realistic description of carbon futures price, volatility, and jump dynamics and an insightful understanding of the carbon option market. 2025-01-29T09:00:03Z Main text 38 pages, supplementary online information 11 pages, 6 figures, 12 tables. With respect to Version 1, a few typos were fixed. Simone Serafini Giacomo Bormetti