Trajectory Conditioned Solution Generation With Scoring Feedback

1

Classifier-Free Diffusion Guidance (Product, 23/100)

via “conditional-unconditional score function learning”

* ⭐ 08/2022: [Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (DreamBooth)](https://arxiv.org/abs/2208.12242)

Unique: Uses conditioning dropout during training so that a single model learns both the conditional and the unconditional score function within shared parameters, rather than training separate models or relying on an external classifier for guidance.

vs others: More parameter-efficient than training separate conditional and unconditional models, and free of the external classifier that classifier guidance depends on. The trade-off is that one set of weights must serve both objectives, so training needs care and the two objectives may interfere with each other.
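The two ingredients described above, conditioning dropout at training time and a blend of conditional and unconditional predictions at sampling time, can be sketched in a few lines. This is a minimal illustration, not any particular library's API: `NULL_COND`, `drop_condition`, `guided_prediction`, `p_uncond`, and `scale` are hypothetical names, predictions are plain Python lists, and the blending rule follows one common convention (`scale = 1` recovers the conditional prediction); other write-ups parameterize the weight differently.

```python
import random

# Hypothetical placeholder for the "no condition" input used during dropout.
NULL_COND = None


def drop_condition(cond, p_uncond, rng):
    """Conditioning dropout: with probability p_uncond, hide the condition.

    Training on a mix of real and null conditions lets one set of weights
    learn both the conditional and the unconditional prediction.
    """
    return NULL_COND if rng.random() < p_uncond else cond


def guided_prediction(pred_cond, pred_uncond, scale):
    """Blend the two predictions made by the same model at sampling time.

    Assumed convention: uncond + scale * (cond - uncond).
    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates further toward the condition.
    """
    return [u + scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]


rng = random.Random(0)
training_cond = drop_condition("a photo of a dog", p_uncond=0.1, rng=rng)
blended = guided_prediction([1.0, 2.0], [0.0, 1.0], scale=2.0)
```

In a real diffusion model the lists would be noise or score tensors produced by two forward passes of the same network, one with the condition and one with the null input.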

2

Large Language Models as Optimizers (OPRO) (Product, 21/100)

via “trajectory-conditioned solution generation with scoring feedback”

* ⏫ 10/2023: [Eureka: Human-Level Reward Design via Coding Large Language Models (Eureka)](https://arxiv.org/abs/2310.12931)

Unique: Encodes the full optimization history as in-context examples rather than using a learned surrogate model or explicit reward function. The LLM implicitly learns to recognize patterns in the trajectory (e.g., 'solutions with property X scored higher') and applies those patterns to generate the next candidate, enabling adaptation without explicit model updates.

vs others: Simpler and faster to implement than Bayesian optimization or neural surrogate models, while capturing richer semantic patterns than random search or grid search by leveraging the LLM's pre-trained understanding of solution quality.
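As a rough sketch of the idea above, the optimization history can be rendered into a prompt as scored examples, and the next candidate requested from a model. Everything named here is an assumption for illustration: `build_meta_prompt`, `optimize`, `propose_fn`, and `score_fn` are hypothetical, the prompt wording is invented, and `propose_fn` stands in for whatever LLM call is actually used. Listing solutions in ascending score order so the best appear last is one plausible layout, not a confirmed detail of the listed product.

```python
def build_meta_prompt(task, trajectory, max_examples=3):
    """Render the best-scoring (solution, score) pairs as in-context examples."""
    # Keep the top max_examples pairs, ordered from lowest to highest score.
    kept = sorted(trajectory, key=lambda pair: pair[1])[-max_examples:]
    lines = [task, "", "Previous solutions and their scores (ascending):"]
    for solution, score in kept:
        lines.append(f"solution: {solution}")
        lines.append(f"score: {score}")
    lines.append("")
    lines.append("Propose a new solution that scores higher than all of the above.")
    return "\n".join(lines)


def optimize(task, score_fn, propose_fn, seed_solutions, steps):
    """Run the propose-and-score loop; no model weights are updated."""
    trajectory = [(s, score_fn(s)) for s in seed_solutions]
    for _ in range(steps):
        prompt = build_meta_prompt(task, trajectory)
        candidate = propose_fn(prompt)  # hypothetical LLM call
        trajectory.append((candidate, score_fn(candidate)))
    # Return the best (solution, score) pair seen so far.
    return max(trajectory, key=lambda pair: pair[1])


# Usage with a stub in place of a real model: the "LLM" always answers "aaaa".
best = optimize(
    task="Write the longest string you can.",
    score_fn=len,
    propose_fn=lambda prompt: "aaaa",
    seed_solutions=["a", "bb"],
    steps=2,
)
```

The only state carried between steps is the trajectory itself, which is what makes the approach adapt without explicit model updates.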

3

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (Retroformer) (Product, 18/100)

via “trajectory filtering and quality-based curriculum learning”

### Other Papers <a name="2023op"></a>

Unique: Applies curriculum learning to trajectory-based policy optimization, so agents can learn from mixed-quality data by prioritizing successful examples. This differs from uniform trajectory sampling, which treats all trajectories equally.

vs others: More sample-efficient than uniform sampling because high-quality trajectories contribute more to learning, and more robust than filtering alone because it gradually includes harder cases rather than discarding them.
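One way to picture quality-based curriculum sampling is a reward threshold that relaxes as training progresses, combined with reward-weighted sampling among the trajectories that pass it. The sketch below is an illustrative assumption, not the method of the listed paper: `quality_threshold`, `curriculum_batch`, the linear schedule, the `start`/`end` values, and the `"reward"` key are all hypothetical, and rewards are assumed to lie in [0, 1].

```python
import random


def quality_threshold(progress, start=0.8, end=0.0):
    """Minimum reward required at a given training progress in [0, 1].

    The threshold falls linearly from start to end, so early batches contain
    only high-reward trajectories and later batches admit harder cases.
    """
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def curriculum_batch(trajectories, progress, batch_size, rng):
    """Sample a batch, favoring high-reward trajectories among those eligible."""
    threshold = quality_threshold(progress)
    eligible = [t for t in trajectories if t["reward"] >= threshold]
    if not eligible:
        # Fall back to the single best trajectory rather than an empty batch.
        eligible = [max(trajectories, key=lambda t: t["reward"])]
    # A small constant keeps zero-reward trajectories sampleable late in training.
    weights = [t["reward"] + 1e-6 for t in eligible]
    return rng.choices(eligible, weights=weights, k=batch_size)


rng = random.Random(0)
data = [{"id": 0, "reward": 0.9}, {"id": 1, "reward": 0.5}, {"id": 2, "reward": 0.1}]
early = curriculum_batch(data, progress=0.0, batch_size=4, rng=rng)  # only id 0 passes
late = curriculum_batch(data, progress=1.0, batch_size=4, rng=rng)   # all are eligible
```

Compared with a fixed filter, low-reward trajectories are deferred rather than dropped, which matches the contrast drawn above between curriculum learning and filtering alone.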
