https://arxiv.org/api/diqpaVH+rQMBFbjhGewCTbIEa8Q
2026-03-26T11:14:04Z
29537515

http://arxiv.org/abs/2512.08270v1
Reasoning Models Ace the CFA Exams
2025-12-09T05:57:19Z
Previous research has reported that large language models (LLMs) demonstrate poor performance on the Chartered Financial Analyst (CFA) exams. However, recent reasoning models have achieved strong results on graduate-level academic and professional examinations across various disciplines. In this paper, we evaluate state-of-the-art reasoning models on a set of mock CFA exams consisting of 980 questions across three Level I exams, two Level II exams, and three Level III exams. Using the same pass/fail criteria from prior studies, we find that most models clear all three levels. The models that pass, ordered by overall performance, are Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1. Specifically, Gemini 3.0 Pro achieves a record score of 97.6% on Level I. Performance is also strong on Level II, led by GPT-5 at 94.3%. On Level III, Gemini 2.5 Pro attains the highest score with 86.4% on multiple-choice questions while Gemini 3.0 Pro achieves 92.0% on constructed-response questions.
2025-12-09T05:57:19Z
Jaisal Patel, Yunzhe Chen, Kaiwen He, Keyi Wang, David Li, Kairong Xiao, Xiao-Yang Liu

http://arxiv.org/abs/2512.07526v1
The Suicide Region: Option Games and the Race to Artificial General Intelligence
2025-12-08T13:00:23Z
Standard real options theory predicts delay in exercising the option to invest or deploy when extreme asset volatility or technological uncertainty are present. However, in the current race to develop artificial general intelligence (AGI), sovereign actors are exhibiting behaviors contrary to theoretical predictions: the US and China are accelerating AI investment despite acknowledging the potential for catastrophic failure from AGI misalignment. We resolve this puzzle by formalizing the AGI race as a continuous-time preemption game with endogenous existential risk.
In our model, the cost of failure is no longer bounded only by the sunk cost of investment (I), but rather a systemic ruin parameter (D) that is correlated with development velocity and shared globally. As the disutility of catastrophe is embedded in both players' payoffs, the risk term mathematically cancels out of the equilibrium indifference condition. This creates a "suicide region" in the investment space where competitive pressures force rational agents to deploy AGI systems early, despite a negative risk-adjusted net present value. Furthermore, we show that "warning shots" (sub-existential disasters) will fail to deter AGI acceleration, as the winner-takes-all nature of the race remains intact. The race can only be halted if the cost of ruin is internalized, making safety research a prerequisite for economic viability. We derive the critical private liability threshold required to restore the option value of waiting and propose mechanism design interventions that can better ensure safe AGI research and socially responsible deployment.
2025-12-08T13:00:23Z
25 pages, 1 figure
David Tan

http://arxiv.org/abs/2507.22712v2
Order-Flow Filtration and Directional Association with Short-Horizon Returns
2025-12-08T04:09:43Z
Electronic markets generate dense order flow with many transient orders, which degrade directional signals derived from the limit order book (LOB). We study whether simple structural filters on order lifetime, modification count, and modification timing sharpen the association between order book imbalance (OBI) and short-horizon returns in BankNifty index futures, where unfiltered OBI is already known to be a strong short-horizon directional indicator. The efficacy of each filter is evaluated using a three-step diagnostic ladder: contemporaneous correlations, linear association between discretised regimes, and Hawkes event-time excitation between OBI and return regimes.
Our results indicate that filtration of the aggregate order flow produces only modest changes relative to the unfiltered benchmark. By contrast, when filters are applied on the parent orders of executed trades, the resulting OBI series exhibits systematically stronger directional association. Motivated by recent regulatory initiatives to curb noisy order flow, we treat the association between OBI and short-horizon returns as a policy-relevant diagnostic of market quality. We then compare unfiltered and filtered OBI series, using tick-by-tick data from the National Stock Exchange of India, to infer how structural filters on the order flow affect OBI-return dynamics in an emerging market setting.
2025-07-30T14:22:47Z
21 pages
Aditya Nittur Anantha, Shashi Jain, Prithwish Maiti

http://arxiv.org/abs/2512.15728v1
FedSight AI: Multi-Agent System Architecture for Federal Funds Target Rate Prediction
2025-12-05T16:45:18Z
The Federal Open Market Committee (FOMC) sets the federal funds rate, shaping monetary policy and the broader economy. We introduce FedSight AI, a multi-agent framework that uses large language models (LLMs) to simulate FOMC deliberations and predict policy outcomes. Member agents analyze structured indicators and unstructured inputs such as the Beige Book, debate options, and vote, replicating committee reasoning. A Chain-of-Draft (CoD) extension further improves efficiency and accuracy by enforcing concise multistage reasoning.
Evaluated at 2023-2024 meetings, FedSight CoD achieved accuracy of 93.75% and stability of 93.33%, outperforming baselines including MiniFed and Ordinal Random Forest (RF), while offering transparent reasoning aligned with real FOMC communications.
2025-12-05T16:45:18Z
NeurIPS 2025 Generative AI in Finance Workshop
Yuhan Hou, Tianji Rao, Jeremy Tan, Adler Viton, Xiyue Zhang, David Ye, Abhishek Kodi, Sanjana Dulam, Aditya Paul, Yikai Feng

http://arxiv.org/abs/2512.03709v1
The Effect of High-Speed Rail Connectivity on Capital Market Earnings Forecast Error: Evidence from the Chinese Stock Market
2025-12-03T12:00:11Z
This study examines how China's high-speed rail (HSR) expansion affects analyst earnings forecast errors from an economic information friction perspective. Using firm-year panel data from 2008-2019, a period that covers HSR's early introduction and rapid nationwide rollout, the findings show that analysts' relative earnings forecast errors (RFE) decline significantly only after firms' cities become connected by high-speed rail. The placebo test, which artificially shifts HSR connectivity 3 years earlier than the actual opening year, yields an insignificant DID coefficient, rejecting the possibility that forecast errors were improving before the infrastructure shock. This supports the conclusion that forecast error reduction is linked to real geographic accessibility improvements rather than coincidence, pre-existing trends, or analyst anticipation. Economically, the study highlights that HSR reduces analysts' costs of gathering private, incremental information, particularly soft information obtained via plant or management visits. The rail network does not directly alter firms' internal capital allocation or earnings generation paths, but it lowers spatial barriers to information collection, enabling analysts to update EPS expectations under reduced travel friction.
This work provides intuitive evidence that geography and mobility improvements contribute to forecasting accuracy in China's emerging, decentralized capital market corridors, and it encourages future research to consider transport accessibility as an exogenous information cost shock rather than an internal firm-capital shock.
2025-12-03T12:00:11Z
Shilong Han

http://arxiv.org/abs/2512.03189v1
The First Crypto President: Presidential Power and Cryptocurrency Markets During Trump's Second Term (2025-2029)
2025-12-02T19:39:03Z
This paper analyzes the intersection of presidential authority and cryptocurrency markets during Donald J. Trump's second term (2025-2029). We examine developments from 2024 through October 2025, focusing on how executive influence, family business ventures, and digital assets became intertwined in ways that blurred boundaries between public office and private profit. Using a mixed-methods approach that combines quantitative market data with qualitative institutional assessment, we identify politically linked digital assets as a distinct class characterized by reflexive valuations, asymmetric risk distribution, and systemic vulnerabilities. The Trump family's integrated cryptocurrency ecosystem reached peak valuations exceeding eleven billion dollars before collapsing by more than one trillion in market capitalization following a tariff announcement in October 2025. Results highlight conflicts of interest, failures in market microstructure, and the emergence of political finance as a monetizable phenomenon in the digital age. The study contributes to understanding how presidential signaling reshapes capital flows, how politically branded tokens function as quasi-currencies, and how sudden policy actions can trigger cascading liquidations across global digital asset systems.
2025-12-02T19:39:03Z
32 pages, 9 tables, 8 figures. Submitted to Journal of Business Economics and Finance.
Revised version includes updated October-November 2025 market data
Habib Badawi

http://arxiv.org/abs/2512.07887v1
Does it take two to tango: Interaction between Credit Default Swaps and National Stock Indices
2025-12-01T14:03:07Z
This paper investigates both the short- and long-run interaction between the BIST-100 index and CDS prices from January 2008 to May 2015 using the ARDL technique. The paper documents several findings. First, ARDL analysis shows that a 1 TL increase in CDS shrinks the BIST-100 index by 22.5 TL in the short run and 85.5 TL in the long run. Second, a 1000 TL increase in the BIST index price causes 25 TL and 44 TL reductions in Turkey's CDS prices in the short and long run, respectively. Third, a percentage increase in the interest rate shrinks the BIST index by 359 TL and a percentage increase in the inflation rate scales CDS prices up to 13.34 TL, both in the long run. In the short run, these impacts are limited to 231 TL and 5.73 TL, respectively. Fourth, a one-kurush increase in the TL/USD exchange rate leads to 24.5 TL (short-run) and 78 TL (long-run) reductions in BIST, while it augments CDS prices by 2.5 TL (short-run) and 3 TL (long-run), respectively. Fifth, each negative political event decreases BIST by 237 TL in the short run and 538 TL in the long run, while it increases CDS prices by 33 TL in the short run and 89 TL in the long run. These findings point to the highly dollar-indebted capital structure of Turkish firms and the excessive sensitivity of financial markets to uncertainties in the political sphere.
Finally, the paper provides evidence that BIST and CDS, together with the control variables, drift too far apart and converge to a long-run equilibrium at a moderate monthly speed.
2025-12-01T14:03:07Z
Journal of Economics and Financial Analysis, 2018, 2(1), pp.129-149
Yhlas Sovbetov, Hami Saka

http://arxiv.org/abs/2512.07886v1
The Endogenous Constraint: Hysteresis, Stagflation, and the Structural Inhibition of Monetary Velocity in the Bitcoin Network (2016-2025)
2025-11-30T19:51:43Z
Bitcoin operates as a macroeconomic paradox: it combines a strictly predetermined, inelastic monetary issuance schedule with a stochastic, highly elastic demand for scarce block space. This paper empirically validates the Endogenous Constraint Hypothesis, positing that protocol-level throughput limits generate a non-linear negative feedback loop between network friction and base-layer monetary velocity. Using a verified Transaction Cost Index (TCI) derived from Blockchain.com on-chain data and Hansen's (2000) threshold regression, we identify a definitive structural break at the 90th percentile of friction (TCI ~ 1.63). The analysis reveals a bifurcation in network utility: while the network exhibits robust velocity growth of +15.44% during normal regimes, this collapses to +6.06% during shock regimes, yielding a statistically significant Net Utility Contraction of -9.39% (p = 0.012). Crucially, Instrumental Variable (IV) tests utilizing Hashrate Variation as a supply-side instrument fail to detect a significant relationship in a linear specification (p=0.196), confirming that the velocity constraint is strictly a regime-switching phenomenon rather than a continuous linear function. Furthermore, we document a "Crypto Multiplier" inversion: high friction correlates with a +8.03% increase in capital concentration per entity, suggesting that congestion forces a substitution from active velocity to speculative hoarding.
2025-11-30T19:51:43Z
42 pages, 13 figures.
JEL Classification: E41, E51, G15, C24
Hamoon Soleimani

http://arxiv.org/abs/2512.00142v1
DeFi TrustBoost: Blockchain and AI for Trustworthy Decentralized Financial Decisions
2025-11-28T18:30:39Z
This research introduces the Decentralized Finance (DeFi) TrustBoost Framework, which combines blockchain technology and Explainable AI to address challenges faced by lenders underwriting small business loan applications from low-wealth households. The framework is designed with a strong emphasis on fulfilling four crucial requirements of blockchain and AI systems: confidentiality, compliance with data protection laws, resistance to adversarial attacks, and compliance with regulatory audits. It presents a technique for tamper-proof auditing of automated AI decisions and a strategy for on-chain (inside-blockchain) and off-chain data storage to facilitate collaboration within and across financial organizations.
2025-11-28T18:30:39Z
19 pages
Swati Sachan, Dale S. Fickett

http://arxiv.org/abs/2511.15214v2
Corporate Earnings Calls and Analyst Beliefs
2025-11-25T18:42:49Z
Economic behavior is shaped not only by quantitative information but also by the narratives through which such information is communicated and interpreted (Shiller, 2017). I show that narratives extracted from earnings calls significantly improve the prediction of both realized earnings and analyst expectations. To uncover the underlying mechanisms, I introduce a novel text-morphing methodology in which large language models generate counterfactual transcripts that systematically vary topical emphasis (the prevailing narrative) while holding quantitative content fixed. This framework allows me to precisely measure how analysts under- and over-react to specific narrative dimensions. The results reveal systematic biases: analysts over-react to sentiment (optimism) and under-react to narratives of risk and uncertainty.
Overall, the analysis offers a granular perspective on the mechanisms of expectation formation through the competing narratives embedded in corporate communication.
2025-11-19T08:06:46Z
Giuseppe Matera

http://arxiv.org/abs/2206.15365v10
Most claimed statistical findings in cross-sectional return predictability are likely true
2025-11-19T14:48:22Z
The false discovery rate (FDR) measures the share of false positives in a set of statistical tests. I develop simple and intuitive bounds on the FDR in cross-sectional predictability publications. The simplest bound requires just a few lines of math and finds $\text{FDR} \le 25\%$ based on summary statistics in eight out of nine previous studies. A more refined bound finds $\text{FDR} \le 9\%$. The FDR is small because randomly selecting accounting ratios produces statistically significant predictability far more often than would occur if there were no predictability. The bounds also reconcile the disparate FDR estimates in the literature.
2022-06-30T15:36:31Z
Andrew Y. Chen

http://arxiv.org/abs/2511.15456v1
Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining
2025-11-19T14:15:23Z
As Decentralized Finance (DeFi) develops, understanding user intent behind DeFi transactions is crucial yet challenging due to complex smart contract interactions, multifaceted on-/off-chain factors, and opaque hex logs. Existing methods lack deep semantic insight. To address this, we propose the Transaction Intent Mining (TIM) framework. TIM leverages a DeFi intent taxonomy built on grounded theory and a multi-agent Large Language Model (LLM) system to robustly infer user intents. A Meta-Level Planner dynamically coordinates domain experts to decompose multiple perspective-specific intent analyses into solvable subtasks. Question Solvers handle the tasks with multi-modal on/off-chain data. Meanwhile, a Cognitive Evaluator mitigates LLM hallucinations and ensures verifiability.
Experiments show that TIM significantly outperforms machine learning models, single LLMs, and single Agent baselines. We also analyze core challenges in intent inference. This work helps provide a more reliable understanding of user motivations in DeFi, offering context-aware explanations for complex blockchain activity.
2025-11-19T14:15:23Z
Written in 2025 Q1
Qian'ang Mao, Yuxuan Zhang, Jiaman Chen, Wenjun Zhou, Jiaqi Yan

http://arxiv.org/abs/2511.15364v1
Anonymization and Information Loss
2025-11-19T11:44:48Z
We show that while anonymization effectively obscures firm identity, it significantly reduces the power of textual understanding, thereby diminishing models' ability to extract meaningful economic signals from financial texts. This information loss is particularly severe when numerical and object entities are removed from texts and is amplified in texts characterized by high linguistic uncertainty and firm specificity. Importantly, in the setting of sentiment extraction from earnings call transcripts, we find that information loss induced by anonymization is more pervasive and severe than the effects of look-ahead bias, suggesting that the costs of anonymization may outweigh its benefits in certain financial applications.
2025-11-19T11:44:48Z
Ke Wu, Baozhong Yang, Zhenkun Ying, Dexin Zhou

http://arxiv.org/abs/2511.15123v1
Causal Inference in Financial Event Studies
2025-11-19T04:57:19Z
Financial event studies, ubiquitous in finance research, typically use linear factor models with known factors to estimate abnormal returns and identify causal effects of information events. This paper demonstrates that when factor models are misspecified -- an almost certain reality -- traditional event study estimators produce inconsistent estimates of treatment effects. The bias is particularly severe during volatile periods, over long horizons, and when event timing correlates with market conditions. We derive precise conditions for identification and expressions for asymptotic bias.
As an alternative, we propose synthetic control methods that construct replicating portfolios from control securities without imposing specific factor structures. Revisiting four empirical applications, we show that some established findings may reflect model misspecification rather than true treatment effects. While traditional methods remain reliable for short-horizon studies with random event timing, our results suggest caution when interpreting long-horizon or volatile-period event studies and highlight the importance of quasi-experimental designs when available.
2025-11-19T04:57:19Z
Paul Goldsmith-Pinkham, Tianshu Lyu

http://arxiv.org/abs/2512.02029v1
HODL Strategy or Fantasy? 480 Million Crypto Market Simulations and the Macro-Sentiment Effect
2025-11-19T03:46:37Z
Crypto enthusiasts claim that buying and holding crypto assets yields high returns, often citing Bitcoin's past performance to promote other tokens and fuel fear of missing out. However, understanding the real risk-return trade-off and what factors affect future crypto returns is crucial as crypto becomes increasingly accessible to retail investors through major brokerages. We examine the HODL strategy through two independent analyses. First, we implement 480 million Monte Carlo simulations across 378 non-stablecoin crypto assets, net of trading fees and the opportunity cost of 1-month Treasury bills, and find strong evidence of survivorship bias and extreme downside concentration. At the 2-3 year horizon, the median excess return is -28.4 percent, the 1 percent conditional value at risk indicates that tail scenarios wipe out principal after all costs, and only the top quartile achieves very large gains, with a mean excess return of 1,326.7 percent. These results challenge the HODL narrative: across a broad set of assets, simple buy-and-hold loads extreme downside risk onto most investors, and the miracles mostly belong to the luckiest quarter.
Second, using a Bayesian multi-horizon local projection framework, we find that endogenous predictors based on realized risk-return metrics have economically negligible and unstable effects, while macro-finance factors, especially the 24-week exponential moving average of the Fear and Greed Index, display persistent long-horizon impacts and high cross-basket stability. Where significant, a one-standard-deviation sentiment shock reduces forward top-quartile mean excess returns by 15-22 percentage points and median returns by 6-10 percentage points over 1-3 year horizons, suggesting that macro-sentiment conditions, rather than realized return histories, are the dominant indicators for future outcomes.
2025-11-19T03:46:37Z
Weikang Zhang, Alison Watts