Magma is pre-trained on a large, heterogeneous collection of vision-language (VL) datasets spanning images, videos, and robotics data.