Agent Behavior Learning And Policy Optimization

1

Opik (Repository, 57/100)

via “agent optimization with bayesian and grid search algorithms”

LLM evaluation and tracing platform — automated metrics, prompt management, CI/CD integration.

Unique: BaseOptimizer framework with pluggable algorithms (Bayesian, grid search, random) enables custom optimization strategies. Integrates with evaluation system to use quality scores as optimization signal.

vs others: Open-source optimizer framework allows custom algorithms vs. closed-box commercial solutions; integration with evaluation system enables end-to-end optimization vs. separate tools.
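A minimal sketch of what a pluggable optimizer framework of this kind could look like. This is not Opik's actual API: the class names, the `candidates`/`optimize` methods, and `toy_score` are all hypothetical, and only grid and random search are shown. The evaluation score is passed in as a plain function, so any quality metric can act as the optimization signal.

```python
import itertools
import random
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence

Params = Dict[str, float]
SearchSpace = Dict[str, Sequence[float]]


class BaseOptimizer(ABC):
    """Hypothetical base class: subclasses only decide which candidates to try."""

    def __init__(self, space: SearchSpace) -> None:
        self.space = space

    @abstractmethod
    def candidates(self) -> List[Params]:
        """Return the parameter settings this algorithm wants evaluated."""

    def optimize(self, score: Callable[[Params], float]) -> Params:
        # The evaluation score is the only optimization signal.
        return max(self.candidates(), key=score)


class GridSearchOptimizer(BaseOptimizer):
    def candidates(self) -> List[Params]:
        # Every combination of the listed values.
        names = list(self.space)
        grids = (self.space[name] for name in names)
        return [dict(zip(names, combo)) for combo in itertools.product(*grids)]


class RandomSearchOptimizer(BaseOptimizer):
    def __init__(self, space: SearchSpace, n_trials: int = 10, seed: int = 0) -> None:
        super().__init__(space)
        self.n_trials = n_trials
        self.rng = random.Random(seed)

    def candidates(self) -> List[Params]:
        # Independent random draws from each parameter's value list.
        return [
            {name: self.rng.choice(list(values)) for name, values in self.space.items()}
            for _ in range(self.n_trials)
        ]


def toy_score(params: Params) -> float:
    # Stand-in for an evaluation metric; best at temperature=0.2, top_p=0.9.
    return -abs(params["temperature"] - 0.2) - abs(params["top_p"] - 0.9)


space = {"temperature": [0.0, 0.2, 0.7], "top_p": [0.5, 0.9]}
best = GridSearchOptimizer(space).optimize(toy_score)
```

Swapping `GridSearchOptimizer` for `RandomSearchOptimizer` changes the search strategy without touching the scoring code, which is the point of the pluggable design.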

2

opik (Agent, 56/100)

via “agent optimization with hyperparameter tuning”

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

Unique: Implements a pluggable BaseOptimizer framework supporting multiple optimization algorithms (Bayesian, genetic, etc.) integrated with the experiment system, enabling automated hyperparameter search without external optimization libraries

vs others: More specialized than generic hyperparameter optimization tools because it understands LLM-specific hyperparameters (temperature, top_p, system prompts) and integrates with the evaluation system

3

AgentScope (Repository, 56/100)

via “agentic rl and model fine-tuning for agent behavior optimization”

Multi-agent platform with distributed deployment.

Unique: Integrates agentic RL and fine-tuning as a built-in optimization framework that collects agent trajectories, uses evaluation metrics as reward signals, and fine-tunes underlying LLMs through provider APIs, enabling continuous agent improvement without external ML infrastructure.

vs others: More integrated than external fine-tuning services because optimization is coordinated with agent execution and evaluation; more flexible than single-approach solutions because it supports both RL and supervised fine-tuning.

4

hello-agents (Agent, 52/100)

via “agentic reinforcement learning training pipeline for agent optimization”

📚 "Building Agents from Scratch": a from-the-ground-up tutorial on agent principles and practice

Unique: Provides concrete patterns for implementing RL training loops for agents, including reward signal generation and trajectory collection, treating RL as an optional optimization layer rather than a requirement, enabling teams to start with prompt-based agents and add RL training as they scale

vs others: More sophisticated than pure prompt engineering but more practical than full policy learning from scratch; enables continuous improvement of agent behavior based on real-world performance

5

agentscope (Agent, 51/100)

via “model fine-tuning and optimization with rl and prompt tuning”

Build and run agents you can see, understand and trust.

Unique: Integrates RL-based fine-tuning and prompt tuning as first-class optimization capabilities, allowing agents to improve their behavior through learning rather than requiring manual prompt engineering or model retraining

vs others: More integrated than LangChain's optimization support because fine-tuning and prompt tuning are built into the framework; more practical than AutoGen's optimization because it provides concrete RL and prompt tuning implementations

6

Agent framework that generates its own topology and evolves at runtime (Framework, 50/100)

Hi HN, I’m Vincent from Aden. We spent 4 years building ERP automation for construction (PO/invoice reconciliation). We had real enterprise customers but hit a technical wall: Chatbots aren't for real work. Accountants don't want to chat; they want the ledger reconciled while they slee

Unique: Learns topology and routing policies from execution traces using ML, enabling data-driven optimization of agent networks without manual tuning

vs others: More sophisticated than heuristic-based evolution, but requires more data and expertise; less predictable than rule-based optimization

7

aiAgentsEverywhere (Agent, 49/100)

via “adaptive agent behavior learning from interaction feedback”

aiAgentsEverywhere

Unique: Implements closed-loop learning where user feedback directly influences agent behavior through automated policy updates, rather than one-way feedback collection for manual model retraining

vs others: Enables continuous improvement without manual retraining cycles, unlike static agent systems that require explicit model updates; more practical than full RLHF by using lightweight preference learning on interaction data

8

MobileAgent (Agent, 49/100)

via “semi-online reinforcement learning for action policy optimization”

Mobile-Agent: The Powerful GUI Agent Family

Unique: Semi-online RL approach collects trajectories from live app executions and generates synthetic rewards based on task completion metrics, enabling continuous policy improvement without manual annotation; integrated with VERL framework for distributed training across GPU clusters

vs others: More efficient than supervised fine-tuning because it learns from both successful and failed trajectories; more practical than pure online RL because it uses semi-online data collection that doesn't require real-time training infrastructure

9

Agent-S (Agent, 49/100)

via “behavior best-of-n (bbon) sampling with rollout-based refinement”

Agent S: an open agentic framework that uses computers like a human

Unique: Implements in-context reinforcement learning through parallel rollout sampling and LMM-based trajectory evaluation, achieving 72.60% OSWorld accuracy without model fine-tuning by leveraging the LMM's reasoning capability to select high-quality action sequences

vs others: Outperforms single-shot planning by 10-15% on complex benchmarks through best-of-N selection, while avoiding the infrastructure complexity of external RL training or reward models
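The best-of-N selection described above can be sketched in a few lines. This is an illustrative toy, not Agent S's implementation: `best_of_n`, `toy_rollout`, and `toy_judge` are hypothetical names, the rollouts run sequentially rather than in parallel, and a simple scoring function stands in for the LMM-based trajectory evaluation.

```python
import random
from typing import Callable, List

Trajectory = List[str]


def best_of_n(
    rollout: Callable[[random.Random], Trajectory],
    judge: Callable[[Trajectory], float],
    n: int,
    seed: int = 0,
) -> Trajectory:
    """Sample n candidate trajectories and keep the one the judge scores highest."""
    rng = random.Random(seed)
    candidates = [rollout(rng) for _ in range(n)]
    return max(candidates, key=judge)


def toy_rollout(rng: random.Random) -> Trajectory:
    # Stand-in for one agent run: three randomly chosen GUI actions.
    return [rng.choice(["click", "type", "scroll", "done"]) for _ in range(3)]


def toy_judge(trajectory: Trajectory) -> float:
    # Stand-in for model-based evaluation: reward finishing, penalize scrolling.
    finished = 10 if trajectory[-1] == "done" else 0
    return finished - trajectory.count("scroll")


chosen = best_of_n(toy_rollout, toy_judge, n=8)
```

No model weights change here: all of the improvement comes from sampling several rollouts and selecting among them, which is why this approach needs no training infrastructure.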

10

Agent Swarm – Multi-agent self-learning teams (Repository, 42/100)

via “self-learning agent behavior adaptation”

Show HN: Agent Swarm – Multi-agent self-learning teams (OSS)

Unique: Unknown. The source gives insufficient data on the specific learning algorithms, on whether learning is prompt-based or model-based, and on how learning state persists across agent restarts

vs others: Positions as self-improving agents vs static LLM-based agents, but implementation details and learning guarantees are not documented

11

auto-company (Agent, 42/100)

via “performance monitoring and autonomous optimization”

🤖 A fully autonomous AI company that runs 24/7. 14 AI agents (Bezos, Munger, DHH...) brainstorm ideas, write code, deploy products & make money — no human in the loop. Powered by Claude Code.

Unique: Implements closed-loop optimization where agents continuously monitor performance and autonomously adjust strategies without human intervention, using real-time metrics to drive decision-making rather than static plans

vs others: More automated than traditional performance management because it eliminates human analysis and decision-making; less reliable than human optimization because agents may lack domain expertise and real-world grounding

12

Honcho Server (MCP Server, 38/100)

via “agent-behavior-modeling-and-prediction”

Build AI agents with social cognition and theory-of-mind capabilities to create personalized LLM-powered applications. Leverage comprehensive models of user psychology over time to enhance interactions and insights. Easily integrate multi-participant sessions and asynchronous reasoning for advanced

Unique: Applies theory-of-mind reasoning to AI agents themselves, building explicit models of agent behavior and decision-making that enable prediction and coordination in multi-agent systems

vs others: Extends psychology modeling beyond users to agents, enabling multi-agent systems to reason about each other's behavior and coordinate more effectively than systems treating agents as black boxes

13

openkrew (Agent, 36/100)

via “agent performance optimization and cost tracking”

Distributed multi-machine AI agent team platform

Unique: Integrates cost tracking and optimization into the core framework with automatic token counting and cost calculation across multiple LLM providers, rather than requiring manual cost tracking

vs others: Provides built-in cost controls and optimization recommendations, whereas most frameworks leave cost management to external tools or manual implementation
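As a rough illustration of per-provider cost tracking, the sketch below accumulates cost from token counts. It is not openkrew's API: `CostTracker`, the model names, and the prices are all hypothetical, and token counts are passed in directly rather than counted automatically.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical prices in USD per 1,000 tokens: (input, output).
PRICES: Dict[str, Tuple[float, float]] = {
    "provider-a/small": (0.5, 1.5),
    "provider-b/large": (3.0, 15.0),
}


@dataclass
class CostTracker:
    prices: Dict[str, Tuple[float, float]]
    totals: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost to the running total for its model and return it."""
        in_price, out_price = self.prices[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1000
        self.totals[model] += cost
        return cost

    def total(self) -> float:
        # Combined spend across all providers.
        return sum(self.totals.values())


tracker = CostTracker(PRICES)
call_a = tracker.record("provider-a/small", input_tokens=2000, output_tokens=1000)
call_b = tracker.record("provider-b/large", input_tokens=1000, output_tokens=1000)
```

Keeping prices in a single table keyed by model makes it straightforward to compare providers or to reject a call when the running total exceeds a budget.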

14

Agent Composer – Create your own AI rocket scientist agent (Agent, 35/100)

via “agent customization and parameter tuning”

Hey HN! We launched a thing today, and built a cool demo that I'm excited to share with the community. This tool creates AI agents easily and can handle some really technically complex work. I whipped up this rocket scientist agent in our tool in 10 minutes. I asked a couple of aerospace enginee

Unique: Exposes agent tuning parameters through a visual interface, likely with guided defaults and explanations, enabling non-technical users to optimize agent behavior without understanding underlying LLM mechanics

vs others: More accessible than tuning agents built with LangChain or AutoGen, where parameter changes require code modifications and deeper LLM knowledge

15

openclaw-qa (Agent, 34/100)

via “agent evolution and capability adaptation through experience”

OpenClaw Q&A community: AI agent memory systems, multi-agent architecture, evolution systems, embodied AI | Lobster Teahouse 🦞

Unique: Implements closed-loop agent evolution where performance feedback directly drives configuration changes, creating a self-improving system that adapts without human intervention — rather than static agent definitions that require manual updates

vs others: Goes beyond prompt engineering by systematically analyzing what works and doesn't work, then automatically adjusting agent behavior based on empirical performance data, similar to reinforcement learning but applied to agent configuration rather than neural weights

16

neoagent (Agent, 34/100)

via “constraint-aware decision making with policy enforcement”

Proactive personal AI agent with no limits

Unique: Implements explicit constraint evaluation before action execution with conflict resolution, rather than relying on training-time alignment like most LLM agents

vs others: Provides stronger safety guarantees than alignment-based approaches by enforcing hard constraints, though potentially limiting agent flexibility
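A minimal sketch of checking hard constraints before an action runs. It is not neoagent's code: `Action`, `Constraint`, and `execute` are hypothetical names, and resolving conflicts by a numeric priority is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    cost_usd: float = 0.0


@dataclass(frozen=True)
class Constraint:
    name: str
    priority: int  # assumption: the highest-priority violation is the one reported
    violated_by: Callable[[Action], bool]


def blocking_constraint(action: Action, constraints: List[Constraint]) -> Optional[Constraint]:
    """Return the highest-priority violated constraint, or None if the action is allowed."""
    violated = [c for c in constraints if c.violated_by(action)]
    return max(violated, key=lambda c: c.priority, default=None)


def execute(action: Action, constraints: List[Constraint], run: Callable[[Action], str]) -> str:
    # Constraints are evaluated before the action runs, never after.
    blocker = blocking_constraint(action, constraints)
    if blocker is not None:
        return f"blocked by {blocker.name}"
    return run(action)


constraints = [
    Constraint("budget_cap", 1, lambda a: a.cost_usd > 1.0),
    Constraint("no_destructive_ops", 2, lambda a: a.name.startswith("delete")),
]


def run(action: Action) -> str:
    return f"ran {action.name}"


allowed = execute(Action("send_email", 0.01), constraints, run)
denied = execute(Action("delete_files", 5.0), constraints, run)
```

Because the check sits in `execute` rather than in the model's training, a violating action cannot run regardless of what the LLM proposes.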

17

Agents (Framework, 29/100)

via “symbolic-learning-based agent optimization”

Library/framework for building language agents

Unique: Directly parallels neural network training by treating prompts and tools as learnable parameters optimized through language-based gradients rather than numeric backpropagation, enabling agents to evolve without retraining underlying models

vs others: Differs from prompt engineering frameworks (like DSPy) by automating the full training loop with language gradients; differs from RL-based agent optimization by using symbolic reflection instead of reward signals

18

GPTSwarm (Agent, 29/100)

via “graph-based-agent-parameter-optimization”

Language Agents as Optimizable Graphs

Unique: Applies gradient-based and evolutionary optimization techniques to agent workflow parameters by leveraging the DAG structure to compute parameter sensitivities, rather than treating agent optimization as a black-box hyperparameter search problem

vs others: Enables principled multi-objective optimization of agent workflows with explicit cost-accuracy tradeoff analysis, whereas manual tuning or grid search approaches lack visibility into parameter sensitivity and Pareto frontiers

19

BrainSoup (Product, 25/100)

via “agent behavior customization and instruction management”

Build an AI team that works for you, on your PC

Unique: Provides UI-driven agent instruction management with template inheritance and versioning, enabling non-technical users to customize agent behavior without prompt engineering expertise

vs others: More accessible than code-based agent configuration in LangChain or AutoGPT, with visual instruction management reducing barrier to entry for non-developers

20

“Westworld” simulation (Repository, 23/100)

via “agent behavior definition and policy execution”

A multi-agent environment simulation library

Unique: Separates behavior logic from agent state management through a policy-as-function model, allowing behaviors to be defined as pure functions that can be tested, composed, and swapped at runtime without modifying agent internals

vs others: More flexible than rigid behavior tree implementations because policies are first-class functions that can be dynamically composed, whereas behavior trees require structural modifications to add new patterns
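The policy-as-function model can be illustrated with a short sketch. It does not use the library's actual API: `AgentState`, `when`, and the example policies are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class AgentState:
    x: int
    energy: int


# A policy is a pure function from the current state to the next state.
Policy = Callable[[AgentState], AgentState]


def move_right(state: AgentState) -> AgentState:
    return replace(state, x=state.x + 1, energy=state.energy - 1)


def rest(state: AgentState) -> AgentState:
    return replace(state, energy=state.energy + 2)


def when(predicate: Callable[[AgentState], bool], then: Policy, otherwise: Policy) -> Policy:
    """Compose two policies into one that branches on the state."""
    return lambda state: then(state) if predicate(state) else otherwise(state)


# Composition happens outside the agent: move while energy remains, else rest.
forage: Policy = when(lambda s: s.energy > 0, move_right, rest)


def step(state: AgentState, policy: Policy) -> AgentState:
    # The policy can be swapped at any step without changing AgentState.
    return policy(state)


s1 = step(AgentState(x=0, energy=1), forage)
s2 = step(s1, forage)
```

Since each policy is a pure function over an immutable state, it can be unit-tested in isolation and replaced at runtime by passing a different function to `step`.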
