Capability
Action Discretization And Token Based Policy Representation
2 artifacts provide this capability.
via “action-as-text-token-representation”
Google's vision-language-action model for robotics.
Unique: Represents robot actions as discrete tokens in the language model's vocabulary rather than as continuous outputs or separate policy heads, so the same transformer decoder can generate both language and actions.
vs others: Simplifies the architecture compared to models with separate policy networks or continuous action heads, enabling more efficient joint training on language and robotic tasks within a single transformer framework.
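The core idea can be sketched in a few lines: quantize each dimension of a continuous action vector into a fixed number of bins and map each bin to a reserved token ID, so an action becomes a short token sequence the decoder can emit like any other text. The bin count, vocabulary offset, and normalized action range below are illustrative assumptions, not values from the model itself.

```python
import numpy as np

NUM_BINS = 256          # assumed discretization granularity per action dimension
VOCAB_OFFSET = 32_000   # hypothetical start of the action-token ID range in the vocab
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed normalized action range

def actions_to_tokens(actions: np.ndarray) -> np.ndarray:
    """Map continuous actions in [ACTION_LOW, ACTION_HIGH] to discrete token IDs."""
    clipped = np.clip(actions, ACTION_LOW, ACTION_HIGH)
    # Scale to [0, NUM_BINS - 1] and round to the nearest bin index.
    bins = np.round((clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW) * (NUM_BINS - 1))
    return bins.astype(np.int64) + VOCAB_OFFSET

def tokens_to_actions(tokens: np.ndarray) -> np.ndarray:
    """Invert the mapping: token IDs back to continuous bin-center values."""
    bins = tokens - VOCAB_OFFSET
    return bins / (NUM_BINS - 1) * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW

# Example: a 7-DoF action (6 pose deltas + gripper) becomes 7 tokens.
action = np.array([0.1, -0.5, 0.0, 0.25, -1.0, 1.0, 0.0])
tokens = actions_to_tokens(action)
recovered = tokens_to_actions(tokens)
# Round-trip error is bounded by half a bin width.
assert np.allclose(action, recovered, atol=2.0 / NUM_BINS)
```

Because actions occupy their own slice of token IDs, no new output head is needed: decoding an action is ordinary next-token prediction, and the quantization error shrinks as the bin count grows.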