which is designed to integrate CLIP and DINOv2 with multi-level features merging to enhance the visual capabilities of MLLMs. Check out the paper. (We have added a PDF of the paper to the /images folder.) ...