Lean 4 Theorem Proving With LLM-Guided Proof Synthesis

1

o1 [Model, 54/100]

via “multi-step mathematical proof generation and verification”

OpenAI's reasoning model with chain-of-thought problem solving.

Unique: Generates multi-step mathematical proofs through extended reasoning that explores proof strategies and backtracks when necessary, rather than pattern-matching to training examples. The reasoning happens in dedicated reasoning tokens; OpenAI does not expose the raw tokens, so insight into proof construction is limited to whatever summary is surfaced.

vs others: Outperforms standard LLMs on mathematical proof generation because the extended thinking phase allows exploration of proof strategies and checking of intermediate steps, which tends to yield more rigorous proofs. The output is still informal text and is not machine-checked.

2

Leanstral: Open-source agent for trustworthy coding and formal proof engineering [Agent, 49/100]

via “lean 4 theorem proving with llm-guided proof synthesis”

Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37

Unique: Combines LLM generation with Lean 4's kernel verification to create a trustworthy proof loop in which every generated proof is type-checked by the Lean kernel before acceptance, unlike pure LLM-based proof attempts that lack formal guarantees.

vs others: Stronger than standalone LLM proof generation (GPT, Claude) because failed proof attempts produce kernel error feedback that guides the agent's next attempt, and stronger than manual Lean because it reduces boilerplate tactic writing.
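The generate, check, retry loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Leanstral's actual implementation: `synthesize`, `CheckResult`, `toy_propose`, and `toy_check` are hypothetical names, the proposer stands in for an LLM call, and the checker stands in for a call to the Lean 4 kernel.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    ok: bool
    message: str  # checker diagnostics; empty when the proof is accepted


def synthesize(
    goal: str,
    propose: Callable[[str, List[str]], str],
    check: Callable[[str, str], CheckResult],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate the checker accepts, or None."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        candidate = propose(goal, feedback)   # assumed: an LLM call
        result = check(goal, candidate)       # assumed: a Lean kernel check
        if result.ok:
            return candidate                  # only checked proofs are returned
        feedback.append(result.message)       # errors inform the next attempt
    return None


# Toy stand-ins so the sketch runs without an LLM or a Lean installation.
def toy_propose(goal: str, feedback: List[str]) -> str:
    candidates = ["simp", "rfl", "omega"]
    return candidates[len(feedback) % len(candidates)]


def toy_check(goal: str, candidate: str) -> CheckResult:
    if candidate == "rfl":
        return CheckResult(True, "")
    return CheckResult(False, f"tactic '{candidate}' failed on: {goal}")


if __name__ == "__main__":
    print(synthesize("2 + 2 = 4", toy_propose, toy_check))
```

The design point is that the proposer is never trusted: a candidate is returned only after the checker accepts it, and checker messages are the only signal carried between attempts.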

3

DeepSeek: R1 0528 [Model, 24/100]

via “mathematical proof verification and derivation”

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...

Unique: Applies reinforcement-learning-trained reasoning to mathematical proof tasks, producing explicit step-by-step reasoning that can be audited for logical correctness. Unlike standard LLMs that generate plausible-sounding proofs, R1's reasoning approach enables identification of subtle logical gaps through visible intermediate steps.

vs others: More reliable than GPT-4 for proof verification due to explicit reasoning; slower than specialized proof assistants (Lean, Coq) but more accessible and requires less formal notation expertise.
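For a sense of the formal notation a proof assistant expects, here is a small Lean 4 sketch. The theorem names are illustrative only; `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- Closed arithmetic facts can often be checked by computation alone.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- Commutativity of addition on natural numbers, proved by citing
-- the core library lemma directly.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A reasoning model would state the same facts in prose; the trade-off is that prose is easier to write and read but is not checked by a kernel.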

4

Mathematical discoveries from program search with large language models (FunSearch) [Product, 18/100]

via “program-space search with llm-guided exploration”


Unique: Uses an LLM as a learned heuristic within a structured search loop rather than as a one-shot generator, combining neural guidance with deterministic evaluation to explore discrete program spaces. Implements iterative refinement in which earlier attempts are fed back to the LLM as in-context examples, enabling discovery of solutions outside typical training data distributions.

vs others: Outperforms pure LLM code generation by grounding proposals in executable feedback, and outperforms traditional program synthesis by leveraging learned heuristics to prune the search space intelligently rather than relying on exhaustive enumeration or hand-crafted rules.
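The search loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not FunSearch's actual code: `search`, `toy_propose`, `toy_evaluate`, and `TARGET` are hypothetical, a "program" is reduced to a tuple of integers, and a random mutation stands in for the LLM proposer.

```python
import random
from typing import Callable, List, Tuple

Program = Tuple[int, ...]


def search(
    propose: Callable[[List[Program], random.Random], Program],
    evaluate: Callable[[Program], float],
    seed_program: Program,
    iterations: int = 200,
    keep: int = 3,
    seed: int = 0,
) -> Tuple[float, Program]:
    """Keep a small pool of the best-scoring programs and refine it."""
    rng = random.Random(seed)
    pool = [(evaluate(seed_program), seed_program)]
    for _ in range(iterations):
        examples = [program for _, program in pool]  # in-context examples
        candidate = propose(examples, rng)           # assumed: an LLM call
        pool.append((evaluate(candidate), candidate))  # deterministic scoring
        pool.sort(key=lambda scored: scored[0], reverse=True)
        del pool[keep:]                              # drop the weakest programs
    return pool[0]


# Toy stand-ins: the hidden optimum is TARGET, and the score is the
# negative squared distance to it (higher is better, 0 is optimal).
TARGET: Program = (3, -1, 4)


def toy_evaluate(program: Program) -> float:
    return -float(sum((a - b) ** 2 for a, b in zip(program, TARGET)))


def toy_propose(examples: List[Program], rng: random.Random) -> Program:
    base = list(rng.choice(examples))
    index = rng.randrange(len(base))
    base[index] += rng.choice((-1, 1))  # small edit to an existing program
    return tuple(base)


if __name__ == "__main__":
    print(search(toy_propose, toy_evaluate, (0, 0, 0), iterations=500))
```

Because the pool always retains its best member, the best score never decreases across iterations; the proposer only needs to be right occasionally for the search to make progress.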
