{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"awesome-mathematical-discoveries-from-program-search-with-large-language-models-funsearch","slug":"mathematical-discoveries-from-program-search-with-large-language-models-funsearch","name":"Mathematical discoveries from program search with large language models (FunSearch)","type":"product","url":"https://www.nature.com/articles/s41586-023-06924-6?utm_source=substack&utm_medium=email","page_url":"https://unfragile.ai/mathematical-discoveries-from-program-search-with-large-language-models-funsearch","categories":["productivity"],"tags":[],"pricing":{"model":"unknown","free":false,"starting_price":null},"status":"inactive","verified":false},"capabilities":[{"id":"awesome-mathematical-discoveries-from-program-search-with-large-language-models-funsearch__cap_0","uri":"capability://planning.reasoning.program.space.search.with.llm.guided.exploration","name":"program-space search with llm-guided exploration","description":"Searches through discrete program spaces (e.g., algorithm implementations, mathematical proofs) by using an LLM as a heuristic guide to propose candidate programs, then evaluates them against test cases or mathematical constraints. The system iteratively refines the search by learning from successful and failed program attempts, effectively treating program synthesis as a guided exploration problem rather than pure generation.","intents":["Discover novel algorithmic solutions that outperform known implementations on specific problem classes","Find mathematical constructs (sequences, functions) that satisfy previously unproven conjectures","Automatically generate optimized code for computationally hard problems without manual algorithmic insight","Explore combinatorial solution spaces too large for exhaustive search but tractable with intelligent pruning"],"best_for":["Research teams exploring mathematical conjectures and algorithm discovery","Optimization specialists seeking novel solutions to NP-hard or combinatorial problems","Academic institutions validating computational mathematics hypotheses"],"limitations":["Requires well-defined evaluation metrics or test suites to judge program correctness — works poorly on subjective or open-ended problems","Search time grows exponentially with program complexity and constraint count; practical for small-to-medium programs only","LLM guidance is probabilistic and may miss solution regions if training data doesn't cover similar problem structures","No guarantees of optimality or completeness — discovered solutions are heuristically good, not proven optimal"],"requires":["LLM API access (GPT-4 or equivalent) with function-calling capability","Formal specification of program constraints or test cases","Computational budget for iterative evaluation (hours to days per discovery)","Domain-specific evaluation harness (mathematical validator, performance benchmarker, etc.)"],"input_types":["natural language problem description","formal mathematical constraints or conjectures","test case suites with expected outputs","performance benchmarks or optimization objectives"],"output_types":["executable program code (Python, pseudocode, or domain-specific language)","mathematical proof sketches or constructive proofs","performance metrics and comparison against baselines","structured explanation of discovered solution logic"],"categories":["planning-reasoning","code-generation-editing","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mathematical-discoveries-from-program-search-with-large-language-models-funsearch__cap_1","uri":"capability://planning.reasoning.iterative.program.refinement.with.failure.driven.learning","name":"iterative program refinement with failure-driven learning","description":"Maintains a feedback loop where failed program attempts are converted into in-context examples that guide the LLM toward better proposals in subsequent iterations. The system tracks which program structures, algorithmic patterns, and constraint violations led to failures, then uses this history to steer the LLM away from unpromising regions of the solution space.","intents":["Progressively improve program quality by learning from past mistakes without retraining the LLM","Reduce search time by avoiding repeated exploration of similar failed patterns","Understand why certain algorithmic approaches fail on specific problem instances","Build a corpus of working solutions that can be used as in-context examples for related problems"],"best_for":["Teams running long-horizon program search experiments where iteration count is high (100s to 1000s of attempts)","Researchers studying how LLMs learn from negative examples in structured domains","Optimization workflows where solution quality improves monotonically with iteration"],"limitations":["Context window limits the number of failure examples that can be retained — typically 10-50 examples before context overflow","LLM may overfit to recent failures and miss alternative solution strategies if failure patterns are not diverse","No mechanism to escape local optima if all nearby proposals fail — requires random restarts or search space diversification","Failure analysis is implicit in the LLM's reasoning; no explicit symbolic explanation of why patterns fail"],"requires":["LLM with large context window (8K+ tokens) to retain failure history","Deterministic evaluation function that produces consistent pass/fail results","Structured logging of program attempts with failure reasons and constraint violations","Mechanism to serialize and deserialize program candidates for storage"],"input_types":["previous program attempts (code or pseudocode)","failure logs with constraint violations or test case failures","performance metrics from prior iterations","domain-specific error messages or validation feedback"],"output_types":["refined program proposals with modified logic or structure","prioritized list of next candidates to evaluate","summary of failure patterns and avoided strategies","convergence metrics showing improvement over iterations"],"categories":["planning-reasoning","memory-knowledge"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mathematical-discoveries-from-program-search-with-large-language-models-funsearch__cap_2","uri":"capability://planning.reasoning.constraint.aware.program.generation.with.multi.objective.evaluation","name":"constraint-aware program generation with multi-objective evaluation","description":"Generates program candidates that must satisfy multiple evaluation criteria simultaneously (e.g., correctness on test cases, runtime performance, code simplicity, mathematical elegance). The system ranks candidates by a composite score that balances these objectives, allowing users to explore trade-offs between solution quality dimensions.","intents":["Find algorithms that are both correct and efficient, not just correct","Discover mathematically elegant solutions that are also computationally practical","Optimize for multiple performance metrics (speed, memory, numerical stability) in a single search","Understand Pareto frontiers of solution quality across different evaluation dimensions"],"best_for":["Algorithm researchers optimizing for both theoretical and practical performance","Mathematicians seeking proofs that are both correct and insightful","Performance engineers tuning code for multiple hardware or resource constraints"],"limitations":["Defining and weighting multiple objectives requires domain expertise; poor objective design leads to irrelevant solutions","Evaluation cost scales linearly with number of objectives — each candidate must be tested against all metrics","Trade-offs between objectives may not be obvious to the LLM; it may generate solutions that are mediocre on all dimensions","No built-in mechanism to detect and handle conflicting objectives (e.g., speed vs. code readability)"],"requires":["Quantifiable evaluation metrics for each objective (test pass rate, execution time, code length, etc.)","Weighting scheme or Pareto ranking method to combine objectives","Evaluation harness capable of measuring all objectives on each candidate","Baseline or reference solutions for comparison"],"input_types":["problem specification with multiple success criteria","weighted objective function or Pareto ranking rules","test suites for correctness evaluation","performance benchmarks or resource constraints"],"output_types":["ranked list of candidate programs with per-objective scores","Pareto frontier of non-dominated solutions","trade-off analysis showing which objectives conflict","recommended solution based on user-specified preference weights"],"categories":["planning-reasoning","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mathematical-discoveries-from-program-search-with-large-language-models-funsearch__cap_3","uri":"capability://code.generation.editing.domain.specific.program.synthesis.with.problem.aware.prompting","name":"domain-specific program synthesis with problem-aware prompting","description":"Tailors LLM prompts to specific problem domains (e.g., combinatorial optimization, mathematical sequences, algorithm design) by embedding domain knowledge, common patterns, and successful solution templates into the prompt context. The system adapts its generation strategy based on the problem class, improving proposal quality without retraining.","intents":["Generate programs that leverage domain-specific algorithmic patterns (e.g., dynamic programming for optimization problems)","Reduce search time by seeding the LLM with relevant solution templates and known techniques","Adapt the search strategy to problem characteristics (e.g., use greedy heuristics for NP-hard problems)","Improve solution quality by incorporating domain expertise into the generation process"],"best_for":["Research teams working repeatedly on problems within a specific domain (e.g., combinatorics, number theory)","Organizations building domain-specific program synthesis tools","Practitioners who can articulate domain patterns and best practices"],"limitations":["Requires manual curation of domain knowledge and solution templates — not automated","Domain-specific prompts may bias the search toward known techniques, reducing novelty","Transferability is limited — prompts optimized for one problem class may not work for related classes","Maintaining domain knowledge as new techniques emerge requires ongoing prompt engineering"],"requires":["Explicit articulation of domain patterns, heuristics, and common solution structures","Library of successful solution templates or reference implementations","Domain expert to design and validate prompts","Evaluation harness specific to the domain"],"input_types":["problem specification in domain-specific language or natural language","domain knowledge base (patterns, heuristics, templates)","reference solutions or exemplars","problem-specific constraints or performance targets"],"output_types":["program candidates tailored to domain conventions","explanation of which domain patterns were applied","comparison against domain-specific baselines","structured solution that follows domain best practices"],"categories":["code-generation-editing","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mathematical-discoveries-from-program-search-with-large-language-models-funsearch__cap_4","uri":"capability://planning.reasoning.mathematical.conjecture.validation.through.program.discovery","name":"mathematical conjecture validation through program discovery","description":"Automatically discovers programs (algorithms, constructions, proofs) that either validate or refute mathematical conjectures by searching for counterexamples or constructive proofs. The system translates mathematical statements into executable test cases or constraint specifications, then uses program search to find solutions that satisfy or violate the conjecture.","intents":["Find counterexamples to mathematical conjectures by searching for inputs that violate the conjecture","Discover constructive proofs or algorithms that demonstrate conjecture validity","Automatically generate test cases that probe conjecture boundaries","Explore the space of possible solutions to open mathematical problems"],"best_for":["Mathematicians and theoretical computer scientists exploring conjectures","Research teams validating or refuting open problems computationally","Educational institutions teaching mathematical discovery and proof techniques"],"limitations":["Only applicable to conjectures that can be formalized as executable constraints or test cases","Computational search is limited to finite domains or bounded search spaces — cannot prove universal statements","Discovering counterexamples does not prove a conjecture false in general; requires mathematical verification","Proof discovery is limited to constructive proofs; existential proofs or proofs by contradiction are harder to automate"],"requires":["Formal specification of the conjecture as executable constraints or test cases","Bounded search space or finite domain (e.g., integers up to 10^6)","Evaluation harness that can check conjecture satisfaction","Mathematical validator to verify discovered solutions"],"input_types":["mathematical conjecture in natural language or formal notation","formalized constraints or test case specifications","domain bounds and search space definition","reference materials or related theorems"],"output_types":["counterexample (if conjecture is false) with proof of violation","constructive proof or algorithm (if conjecture is true)","search statistics showing coverage of solution space","mathematical explanation of discovered solution"],"categories":["planning-reasoning","code-generation-editing"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mathematical-discoveries-from-program-search-with-large-language-models-funsearch__cap_5","uri":"capability://automation.workflow.scalable.evaluation.and.ranking.of.program.candidates","name":"scalable evaluation and ranking of program candidates","description":"Efficiently evaluates large numbers of program candidates (100s to 1000s) against test suites and performance metrics, then ranks them by quality scores. The system uses parallel evaluation, caching, and early termination to reduce computational overhead while maintaining ranking accuracy.","intents":["Quickly identify the best programs from a large candidate pool","Understand the distribution of solution quality across the search space","Allocate computational budget efficiently by prioritizing promising candidates","Track convergence and improvement over search iterations"],"best_for":["Teams running large-scale program search experiments with 1000s of candidates","Researchers studying solution quality distributions and search landscapes","Optimization workflows where evaluation cost is a bottleneck"],"limitations":["Parallel evaluation requires multi-core or distributed infrastructure — not practical on single machines for large candidate pools","Caching assumes deterministic evaluation; non-deterministic programs or stochastic metrics break caching assumptions","Early termination may miss high-quality solutions if they fail early test cases","Ranking is only as good as the evaluation metrics; poor metrics lead to poor rankings"],"requires":["Parallel execution environment (multi-core, distributed cluster, or cloud compute)","Deterministic evaluation harness with consistent pass/fail results","Caching layer for test results (in-memory or persistent)","Ranking algorithm or scoring function"],"input_types":["program candidates (code or pseudocode)","test suites with expected outputs","performance benchmarks","evaluation configuration (timeout, resource limits, etc.)"],"output_types":["ranked list of candidates with quality scores","evaluation statistics (pass rate, performance metrics, etc.)","convergence plots showing improvement over iterations","detailed evaluation logs for debugging"],"categories":["automation-workflow","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":18,"verified":false,"data_access_risk":"low","permissions":["LLM API access (GPT-4 or equivalent) with function-calling capability","Formal specification of program constraints or test cases","Computational budget for iterative evaluation (hours to days per discovery)","Domain-specific evaluation harness (mathematical validator, performance benchmarker, etc.)","LLM with large context window (8K+ tokens) to retain failure history","Deterministic evaluation function that produces consistent pass/fail results","Structured logging of program attempts with failure reasons and constraint violations","Mechanism to serialize and deserialize program candidates for storage","Quantifiable evaluation metrics for each objective (test pass rate, execution time, code length, etc.)","Weighting scheme or Pareto ranking method to combine objectives"],"failure_modes":["Requires well-defined evaluation metrics or test suites to judge program correctness — works poorly on subjective or open-ended problems","Search time grows exponentially with program complexity and constraint count; practical for small-to-medium programs only","LLM guidance is probabilistic and may miss solution regions if training data doesn't cover similar problem structures","No guarantees of optimality or completeness — discovered solutions are heuristically good, not proven optimal","Context window limits the number of failure examples that can be retained — typically 10-50 examples before context overflow","LLM may overfit to recent failures and miss alternative solution strategies if failure patterns are not diverse","No mechanism to escape local optima if all nearby proposals fail — requires random restarts or search space diversification","Failure analysis is implicit in the LLM's reasoning; no explicit symbolic explanation of why patterns fail","Defining and weighting multiple objectives requires domain expertise; poor objective design leads to irrelevant solutions","Evaluation cost scales linearly with number of objectives — each candidate must be tested against all metrics","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.05,"quality":0.12,"ecosystem":0.25,"match_graph":0.25,"freshness":0.5,"weights":{"adoption":0.25,"quality":0.25,"ecosystem":0.1,"match_graph":0.35,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"inactive","updated_at":"2026-06-17T09:51:03.578Z","last_scraped_at":"2026-05-03T14:00:27.894Z","last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=mathematical-discoveries-from-program-search-with-large-language-models-funsearch","compare_url":"https://unfragile.ai/compare?artifact=mathematical-discoveries-from-program-search-with-large-language-models-funsearch"}},"signature":"fKrPmTR1/TwijYVP+Zmr+90YFrNCBBVoy4Isswy92fcPI/jpCYjH1TNPIsqF/u+7QnNMbSHghCmXp8O/WNhoDQ==","signedAt":"2026-06-19T22:56:05.963Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/mathematical-discoveries-from-program-search-with-large-language-models-funsearch","artifact":"https://unfragile.ai/mathematical-discoveries-from-program-search-with-large-language-models-funsearch","verify":"https://unfragile.ai/api/v1/verify?slug=mathematical-discoveries-from-program-search-with-large-language-models-funsearch","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}