http://arxiv.org/abs/2206.10540v5 Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery 2024-03-05T07:36:09Z

This paper revisits datasets and evaluation criteria for Symbolic Regression (SR), with a specific focus on its potential for scientific discovery. Focusing on a set of formulas used in the existing datasets based on the Feynman Lectures on Physics, we recreate 120 datasets to discuss the performance of symbolic regression for scientific discovery (SRSD). For each of the 120 SRSD datasets, we carefully review the properties of the formula and its variables to design reasonably realistic sampling ranges of values, so that our new SRSD datasets can be used to evaluate the potential of SRSD, such as whether an SR method can (re)discover physical laws from such datasets. We also create another 120 datasets that contain dummy variables to examine whether SR methods can select only the necessary variables. In addition, we propose using the normalized edit distance (NED) between the predicted and true equation trees to address a critical issue: existing SR metrics are either binary or measure only the error between the target values and an SR model's predicted values for a given input. We conduct benchmark experiments on our new SRSD datasets using various representative SR methods. The experimental results show that we provide a more realistic performance evaluation, and our user study shows that the NED correlates with human judges significantly more than an existing SR metric. We publish repositories of our code and 240 SRSD datasets.
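
To make the NED idea above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes equation trees are nested tuples `(label, child, ...)`, that edits are unit-cost insert, delete, and relabel operations on ordered trees, and that the distance is normalized by the node count of the true tree and capped at 1. The abstract states none of these details, and all function names are hypothetical.

```python
from functools import lru_cache

# Assumed representation: a tree is (label, child_1, child_2, ...);
# a forest is a tuple of trees.

def size(forest):
    """Total number of nodes in a forest."""
    return sum(1 + size(tree[1:]) for tree in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)          # insert every node of g
    if not g:
        return size(f)          # delete every node of f
    v, w = f[-1], g[-1]         # rightmost trees of each forest
    # Delete the root of v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v[1:], g) + 1
    # Insert the root of w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w[1:]) + 1
    # Match the two roots (relabel if needed) and recurse on both parts.
    match = (forest_distance(v[1:], w[1:])
             + forest_distance(f[:-1], g[:-1])
             + (0 if v[0] == w[0] else 1))
    return min(delete, insert, match)

def normalized_edit_distance(pred, true):
    """Assumed normalization: divide by the true tree's size, cap at 1."""
    return min(1.0, forest_distance((pred,), (true,)) / size((true,)))

# Hypothetical example: the true law is m * a.
true_eq = ("*", ("m",), ("a",))
wrong_op = ("+", ("m",), ("a",))   # one relabel away from the truth
only_m = ("m",)                    # two nodes are missing
```

With these inputs, an exact match scores 0, the wrong operator scores 1/3, and the bare variable scores 2/3, which illustrates how a graded metric separates near misses from poor predictions where a binary metric cannot.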

2022-06-21T17:15:45Z Accepted at DMLR. Code and datasets are available at https://github.com/omron-sinicx/srsd-benchmark https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_easy https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_medium https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_hard and another three sets of SRSD datasets with dummy variables Journal of Data-centric Machine Learning Research (2024) Yoshitomo Matsubara Naoya Chiba Ryo Igarashi Yoshitaka Ushiku http://arxiv.org/abs/2403.02042v1 Deep Neural Network for Constraint Acquisition through Tailored Loss Function 2024-03-04T13:47:33Z

The significance of learning constraints from data is underscored by its potential applications in real-world problem-solving. While constraints are popular for modeling and solving, approaches to learning constraints from data remain relatively scarce. Furthermore, the intricate task of modeling demands expertise and is prone to errors; constraint acquisition methods offer a solution by automating this process, learning constraints from examples or behaviours of solutions and non-solutions. This work introduces a novel approach, grounded in a Deep Neural Network (DNN) based on Symbolic Regression, in which constraints can be extracted directly from datasets by setting suitable loss functions. Using this approach, direct formulation of constraints was achieved. Furthermore, given the broad range of pre-developed DNN architectures and functionalities, connections to and extensions with other frameworks can be foreseen.

2024-03-04T13:47:33Z Eduardo Vyhmeister Rocio Paez Gabriel Gonzalez http://arxiv.org/abs/2401.07768v2 On Hilbert-Poincaré series of affine semi-regular polynomial sequences and related Gröbner bases 2024-03-03T16:31:16Z

Gröbner bases are nowadays central tools for solving various problems in commutative algebra and algebraic geometry. A typical use of Gröbner bases is solving multivariate polynomial systems, which enables us to construct algebraic attacks against post-quantum cryptographic protocols. Therefore, determining the complexity of computing Gröbner bases is very important both in theory and in practice; one of the most important cases is that in which the input polynomials form an (overdetermined) affine semi-regular sequence. The first part of this paper presents a survey of Gröbner basis computation and its complexity. In the second part, we give an explicit formula for the (truncated) Hilbert-Poincaré series associated with the homogenization of an affine semi-regular sequence. Based on the formula, we also study (reduced) Gröbner bases of the ideals generated by an affine semi-regular sequence and its homogenization. Some of our results can be viewed as mathematically rigorous proofs of the correctness of methods for computing Gröbner bases of the ideal generated by an affine semi-regular sequence.

2024-01-15T15:26:52Z 25 pages, Comments are welcome! Mathematical Foundations for Post-Quantum Cryptography (T. Takagi et al. eds), Mathematics for Industry, Springer, 2024 Momonari Kudo Kazuhiro Yokoyama http://arxiv.org/abs/2111.10446v4 Multiplicity structure of the arc space of a fat point 2024-02-20T23:21:35Z

The equation $x^m = 0$ defines a fat point on a line. The algebra of regular functions on the arc space of this scheme is the quotient of $k[x, x', x^{(2)}, \ldots]$ by all differential consequences of $x^m = 0$. This infinite-dimensional algebra admits a natural filtration by finite-dimensional algebras corresponding to the truncations of arcs. We show that the generating series for their dimensions equals $\frac{m}{1 - mt}$. We also determine the lexicographic initial ideal of the defining ideal of the arc space. These results are motivated by a nonreduced version of the geometric motivic Poincaré series, multiplicities in differential algebra, and connections between arc spaces and the Rogers-Ramanujan identities. We also prove a recent conjecture put forth by Afsharijoo in the latter context.
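
As a reading aid, expanding the stated generating series as a geometric series gives the individual dimensions. The indexing by truncation order $\ell$ and the symbol $D_\ell$ are assumptions introduced here for illustration; the abstract does not fix this notation.

```latex
\sum_{\ell \ge 0} D_\ell \, t^\ell
  = \frac{m}{1 - mt}
  = m \sum_{\ell \ge 0} (mt)^\ell
  = \sum_{\ell \ge 0} m^{\ell + 1} t^\ell,
\qquad \text{so} \qquad D_\ell = m^{\ell + 1}.
```

Under this indexing, $\ell = 0$ gives $D_0 = m$, which agrees with the dimension of $k[x]/(x^m)$.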

2021-11-19T21:42:24Z Alg. Number Th. 18 (2024) 947-967 Rida Ait El Manssour Gleb Pogudin 10.2140/ant.2024.18.947 http://arxiv.org/abs/2402.13019v1 Improving Neural-based Classification with Logical Background Knowledge 2024-02-20T14:01:26Z

Neurosymbolic AI is a growing field of research aiming to combine the learning capabilities of neural networks with the reasoning abilities of symbolic systems. This hybridization can take many shapes. In this paper, we propose a new formalism for supervised multi-label classification with propositional background knowledge. We introduce a new neurosymbolic technique called semantic conditioning at inference, which only constrains the system during inference while leaving the training unaffected. We discuss its theoretical and practical advantages over two other popular neurosymbolic techniques: semantic conditioning and semantic regularization. We develop a new multi-scale methodology to evaluate how the benefits of a neurosymbolic technique evolve with the scale of the network. We then evaluate experimentally and compare the benefits of all three techniques across model scales on several datasets. Our results demonstrate that semantic conditioning at inference can be used to build more accurate neural-based systems with fewer resources while guaranteeing the semantic consistency of outputs.

2024-02-20T14:01:26Z 9 pages, 3 figures, submitted to IJCAI 2024 Arthur Ledaguenel Céline Hudelot Mostepha Khouadjia http://arxiv.org/abs/2305.07439v3 Dimension Results for Extremal-Generic Polynomial Systems over Complete Toric Varieties 2024-02-20T08:58:06Z

We study polynomial systems with prescribed monomial supports in the Cox rings of toric varieties built from complete polyhedral fans. We present combinatorial formulas for the dimensions of their associated subvarieties under genericity assumptions on the coefficients of the polynomials. Using these formulas, we identify at which degrees generic systems in polytopal algebras form regular sequences. Our motivation comes from sparse elimination theory, where knowing the expected dimension of these subvarieties leads to specialized algorithms and to large speed-ups for solving sparse polynomial systems. As a special case, we classify the degrees at which regular sequences defined by weighted homogeneous polynomials can be found, answering an open question in the Gröbner bases literature. We also show that deciding whether a sparse system is generically a regular sequence in a polytopal algebra is hard from the point of view of theoretical computational complexity.

2023-05-12T13:01:36Z Accepted for publication in Journal of Algebra Matías Bender Pierre-Jean Spaenlehauer http://arxiv.org/abs/2402.11915v1 Optimal Pseudorandom Generators for Low-Degree Polynomials Over Moderately Large Fields 2024-02-19T07:59:37Z

We construct explicit pseudorandom generators that fool $n$-variate polynomials of degree at most $d$ over a finite field $\mathbb{F}_q$. The seed length of our generators is $O(d \log n + \log q)$, over fields of size exponential in $d$ and characteristic at least $d(d-1)+1$. Previous constructions such as Bogdanov's (STOC 2005) and Derksen and Viola's (FOCS 2022) had either suboptimal seed length or required the field size to depend on $n$. Our approach follows Bogdanov's paradigm while incorporating techniques from Lecerf's factorization algorithm (J. Symb. Comput. 2007) and insights from the construction of Derksen and Viola regarding the role of indecomposability of polynomials.

2024-02-19T07:59:37Z Ashish Dwivedi Zeyu Guo Ben Lee Volk http://arxiv.org/abs/2304.06935v3 Groebner.jl: A package for Gröbner bases computations in Julia 2024-02-12T16:25:18Z

We present Groebner.jl, a Julia package for computing Groebner bases with the F4 algorithm. Groebner.jl is efficient, portable, open-source software. Groebner.jl works over integers modulo a prime and over the rationals, supports basic multi-threading, and specializes in computation in the degree reverse lexicographical monomial ordering. The implementation incorporates various symbolic computation techniques and leverages the Julia type system and tooling, which allows Groebner.jl to compete with the existing state of the art, in many instances outperform it, and exceed it in extensibility. Groebner.jl is freely available at https://github.com/sumiya11/Groebner.jl.

2023-04-14T05:47:34Z 10 pages Alexander Demin Shashi Gowda http://arxiv.org/abs/2402.07547v1 Ensuring trustworthy and ethical behaviour in intelligent logical agents 2024-02-12T10:19:17Z

Autonomous Intelligent Agents are employed in many applications upon which the life and welfare of living beings and vital social functions may depend. Therefore, agents should be trustworthy. A priori certification techniques (i.e., techniques applied prior to a system's deployment) can be useful, but are not sufficient for agents that evolve, and thus modify their epistemic and belief state, and for open Multi-Agent Systems, where heterogeneous agents can join or leave the system at any stage of its operation. In this paper, we propose, refine, and extend dynamic (runtime) logic-based self-checking techniques, devised to ensure agents' trustworthy and ethical behaviour.

2024-02-12T10:19:17Z Journal of Logic and Computation, Volume 32, Issue 2, March 2022, Pages 443-478 Stefania Costantini 10.1093/logcom/exab091 http://arxiv.org/abs/2402.07353v1 Optimized Gröbner basis algorithms for maximal determinantal ideals and critical point computations 2024-02-12T01:02:33Z

Given polynomials $g$ and $f_1,\dots,f_p$, all in $\Bbbk[x_1,\dots,x_n]$ for some field $\Bbbk$, we consider the problem of computing the critical points of the restriction of $g$ to the variety defined by $f_1=\cdots=f_p=0$. These are defined by the simultaneous vanishing of the $f_i$'s and all maximal minors of the Jacobian matrix associated to $(g,f_1, \ldots, f_p)$. We use the Eagon-Northcott complex associated to the ideal generated by these maximal minors to gain insight into the syzygy module of the system defining these critical points. We devise new $F_5$-type criteria to predict and avoid more reductions to zero when computing a Gröbner basis for the defining system of this critical locus. We give a bound for the arithmetic complexity of this enhanced $F_5$ algorithm and compare it to the best previously known bound for computing critical points using Gröbner bases.
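
The defining system described above can be illustrated on a small hand-picked instance. The choice $n = 2$, $p = 1$, $\Bbbk = \mathbb{R}$, $g = x_1 + x_2$, and $f_1 = x_1^2 + x_2^2 - 1$ is an assumption made here for illustration and does not come from the paper.

```latex
\operatorname{Jac}(g, f_1) =
\begin{pmatrix}
  \partial g / \partial x_1   & \partial g / \partial x_2 \\
  \partial f_1 / \partial x_1 & \partial f_1 / \partial x_2
\end{pmatrix}
=
\begin{pmatrix}
  1     & 1 \\
  2 x_1 & 2 x_2
\end{pmatrix},
\qquad
\det \operatorname{Jac}(g, f_1) = 2 x_2 - 2 x_1 .

% Critical points: f_1 = 0 together with the vanishing of the single maximal minor.
x_1^2 + x_2^2 - 1 = 0, \quad 2 x_2 - 2 x_1 = 0
\;\Longrightarrow\;
(x_1, x_2) = \pm \left( \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}} \right).
```

Here the Jacobian is square, so there is only one maximal minor. These two points are the maximizer and minimizer of $x_1 + x_2$ on the unit circle.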

2024-02-12T01:02:33Z 10 pages, 3 algorithms, 4 figures Sriram Gopalakrishnan Vincent Neiger Mohab Safey El Din http://arxiv.org/abs/2402.07328v1 Computing discrete residues of rational functions 2024-02-11T23:28:31Z

In 2012 Chen and Singer introduced the notion of discrete residues for rational functions as a complete obstruction to rational summability. More explicitly, for a given rational function f(x), there exists a rational function g(x) such that f(x) = g(x+1) - g(x) if and only if every discrete residue of f(x) is zero. Discrete residues have many important further applications beyond summability: to creative telescoping problems, thence to the determination of (differential-)algebraic relations among hypergeometric sequences, and subsequently to the computation of (differential) Galois groups of difference equations. However, the discrete residues of a rational function are defined in terms of its complete partial fraction decomposition, which makes their direct computation impractical due to the high complexity of completely factoring arbitrary denominator polynomials into linear factors. We develop a factorization-free algorithm to compute discrete residues of rational functions, relying only on gcd computations and linear algebra.
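
The summability criterion above can be checked by hand on a tiny example. The Python sketch below is only an illustration with exact rational arithmetic, not the factorization-free algorithm of the paper. The functions `f`, `g`, and `telescoped_sum` are hypothetical, chosen so that $f(x) = 1/(x(x+1)) = 1/x - 1/(x+1)$ and $g(x) = -1/x$. It also assumes the usual reading of a discrete residue as an orbit-wise sum of partial fraction coefficients, which the abstract does not spell out.

```python
from fractions import Fraction

def f(x):
    # f(x) = 1 / (x (x + 1)) = 1/x - 1/(x + 1)
    return Fraction(1, x * (x + 1))

def g(x):
    # Hypothetical certificate of summability: g(x) = -1/x
    return Fraction(-1, x)

def telescoped_sum(n):
    # sum_{x=1}^{n} f(x) collapses to g(n + 1) - g(1)
    return g(n + 1) - g(1)

# Check f(x) = g(x + 1) - g(x) exactly at a range of sample points.
identity_holds = all(f(x) == g(x + 1) - g(x) for x in range(1, 50))

# Compare a direct sum with the telescoped closed form.
direct = sum(f(x) for x in range(1, 11))
closed = telescoped_sum(10)
```

The poles of this `f` at $0$ and $-1$ lie in the same integer-shift orbit and carry partial fraction coefficients $1$ and $-1$, which sum to zero. Under the assumed definition, that is consistent with `f` being rationally summable.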

2024-02-11T23:28:31Z 20 pages; submitted manuscript (not yet accepted) Proceedings of ISSAC 2024 (2024), pp. 65-73 Carlos E. Arreche Hari P. Sitaula 10.1145/3666000.3669676 http://arxiv.org/abs/2402.06057v1 Subalgebra and Khovanskii bases equivalence 2024-02-08T21:00:13Z

The main results of this paper establish a partial correspondence between two previously-studied analogues of Groebner bases in the setting of algebras: namely, subalgebra (aka SAGBI) bases for quotients of polynomial rings and Khovanskii bases for valued algebras. We aim to bridge the gap between the concrete, computational aspects of the former and the more abstract theory of the latter. Our philosophy is that most interesting examples of Khovanskii bases can also be realized as subalgebra bases and vice-versa. We also discuss the computation of Newton-Okounkov bodies, illustrating how interpreting Khovanskii bases as subalgebra bases makes them more amenable to the existing computer algebra tools.

2024-02-08T21:00:13Z 14 pages, 2 Figures Colin Alstad Michael Burr Oliver Clarke Timothy Duff http://arxiv.org/abs/2402.05579v1 Quantifier Elimination for Normal Cone Computations 2024-02-08T11:27:58Z

We present effective procedures to calculate regular normal cones and other related objects using quantifier elimination. This method of normal cone calculation is complementary to computing Lagrangians, and it works best at points where the constraint qualifications fail and extra work for other methods becomes inevitable. This method also serves as a tool to calculate the regular co-derivative for semismooth* Newton methods. We list algorithms and demonstrate them on different use cases for this approach.

2024-02-08T11:27:58Z 15 pages, 2 figures Michael Mandlmayr Ali Kemal Uncu http://arxiv.org/abs/2402.05238v1 Automated Data-Driven Discovery of Material Models Based on Symbolic Regression: A Case Study on Human Brain Cortex 2024-02-07T20:31:07Z

We introduce a data-driven framework to automatically identify interpretable and physically meaningful hyperelastic constitutive models from sparse data. Leveraging symbolic regression, an algorithm based on genetic programming, our approach generates elegant hyperelastic models that achieve accurate data fitting through parsimonious mathematical formulae, while strictly adhering to hyperelasticity constraints such as polyconvexity. Our investigation spans three distinct hyperelastic models -- invariant-based, principal stretch-based, and normal strain-based -- and highlights the versatility of symbolic regression. We validate our new approach using synthetic data from five classic hyperelastic models and experimental data from the human brain to demonstrate algorithmic efficacy. Our results suggest that our symbolic regression robustly discovers accurate models with succinct mathematical expressions in invariant-based, stretch-based, and strain-based scenarios. Strikingly, the strain-based model exhibits superior accuracy, while both stretch- and strain-based models effectively capture the nonlinearity and tension-compression asymmetry inherent to human brain tissue. Polyconvexity examinations affirm the rigor of convexity within the training regime and demonstrate excellent extrapolation capabilities beyond this regime for all three models. However, the stretch-based models raise concerns regarding potential convexity loss under large deformations. Finally, robustness tests on noise-embedded data underscore the reliability of our symbolic regression algorithms. Our study confirms the applicability and accuracy of symbolic regression in the automated discovery of hyperelastic models for the human brain and gives rise to a wide variety of applications in other soft matter systems.

2024-02-07T20:31:07Z 53 pages, 17 figures, and 6 tables Jixin Hou Xianyan Chen Taotao Wu Ellen Kuhl Xianqiao Wang 10.1016/j.actbio.2024.09.005 http://arxiv.org/abs/2402.04392v1 Factorial Basis Method for q-Series Applications 2024-02-06T20:47:58Z

The Factorial Basis method, initially designed for quasi-triangular, shift-compatible factorial bases, provides solutions to linear recurrence equations in the form of definite sums. This paper extends the Factorial Basis method to its q-analog, enabling its application in q-calculus. We demonstrate the adaptation of the method to q-sequences and its utility in the realm of q-combinatorics. The extended technique is employed to automatically prove established identities and unveil novel ones, particularly some associated with the Rogers-Ramanujan identities.
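
For readers unfamiliar with the identities mentioned above, the Python sketch below numerically compares the two sides of the first Rogers-Ramanujan identity, $\sum_{n \ge 0} q^{n^2} / ((1-q)\cdots(1-q^n)) = \prod_{n \ge 0} 1 / ((1-q^{5n+1})(1-q^{5n+4}))$, as truncated power series. It does not implement the Factorial Basis method or its q-analog. The truncation order `N` and all function names are assumptions chosen for illustration.

```python
N = 30  # truncate every series at q^N (arbitrary illustrative choice)

def mul(a, b):
    """Multiply two power series given as coefficient lists, modulo q^N."""
    out = [0] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j >= N:
                    break
                out[i + j] += ai * bj
    return out

def inv_one_minus_q_pow(k):
    """Coefficients of 1 / (1 - q^k) = 1 + q^k + q^(2k) + ..."""
    return [1 if i % k == 0 else 0 for i in range(N)]

def rr_sum_side():
    """Sum over n of q^(n^2) / ((1 - q)(1 - q^2)...(1 - q^n))."""
    total = [0] * N
    denom_inv = [1] + [0] * (N - 1)   # empty product for n = 0
    n = 0
    while n * n < N:
        if n > 0:
            denom_inv = mul(denom_inv, inv_one_minus_q_pow(n))
        shift = n * n                  # multiply by q^(n^2)
        for i in range(N - shift):
            total[i + shift] += denom_inv[i]
        n += 1
    return total

def rr_product_side():
    """Product over k = 1 or 4 (mod 5) of 1 / (1 - q^k)."""
    prod = [1] + [0] * (N - 1)
    for k in range(1, N):
        if k % 5 in (1, 4):
            prod = mul(prod, inv_one_minus_q_pow(k))
    return prod

sum_side = rr_sum_side()
product_side = rr_product_side()
```

Agreement of the two coefficient lists up to order `N` is only a finite sanity check, not a proof. Symbolic methods such as the one in the abstract aim to establish identities of this kind for all orders.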

2024-02-06T20:47:58Z 9 double-column pages Antonio Jiménez-Pastor Ali Kemal Uncu