Scalable Behavior Cloning with Open Data, Training, and Evaluation
Abstract.
We introduce ABC, a fully open-source stack for bimanual manipulation with behavior cloning. At its core is the ABC dataset, the largest bimanual teleoperation dataset to date, featuring 3,500 hours of data spanning over 130K episodes across nearly 200 diverse tasks. Furthermore, we open-source our accessible hardware setup, training infrastructure, and simulation pipeline. We also release 200 hours of sim-teleop data and provide a co-training recipe under which simulation evaluation results correlate with real-world evaluation results. This correlation allows researchers to evaluate design choices effectively without deploying on physical robots. We explore various training recipes and compare common architectural choices for Diffusion Transformers (DiT) and Vision-Language-Action (VLA) models, grounding our findings in real-world evaluations whose logs will also be released. The resulting policies successfully execute dexterous tasks such as box folding and extracting credit cards from wallets. By providing a reproducible toolbox, we aim to place researchers on an equal footing, establishing the necessary foundation to learn the ABCs of Behavior Cloning together as a community.