Carper AI and Carnegie Mellon University Make Strides in Language Modeling and Object Detection
- Carper AI’s Free Willy 1 and 2 models outperform current language models
- The models were trained with Microsoft’s Orca approach, which achieves the performance of larger models while using a much smaller dataset
- The training data consisted of 500,000 examples generated by a less complex model and 100,000 generated by a more advanced one
- Both models held up in comparison tests across various benchmarks, showing that size does not always determine performance
- Carper AI is eager to see what other applications the community will build on its open-source models
- Carnegie Mellon University has also developed an AI model called the Bottom-Up Top-Down Detection Transformer (BUTD-DETR), which grounds spoken language to objects in images or 3D point clouds and can recognize details as fine as individual parts of an object
- The model also works in 2D environments, matching the performance of conventional detectors while running up to twice as fast.
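The Orca-style recipe described above boils down to collecting instruction-response pairs from two teacher models of different capability and mixing them into one training set. The sketch below illustrates that data-assembly step at toy scale; the field names, teacher labels, and 5:1 mixing ratio are illustrative assumptions, not Carper AI's actual schema.

```python
import random

def make_example(instruction, teacher_response, teacher_name):
    """Package one instruction-following example as an Orca-style record.

    Field names here are illustrative, not Carper AI's actual schema.
    """
    return {
        "instruction": instruction,
        "response": teacher_response,
        "teacher": teacher_name,
    }

def build_mixed_dataset(simple_pairs, advanced_pairs, seed=0):
    """Combine examples from a simpler and a more capable teacher model.

    Mirrors, at toy scale, the reported split of 500,000 examples from a
    less complex model and 100,000 from a more advanced one.
    """
    data = [make_example(q, a, "simple-teacher") for q, a in simple_pairs]
    data += [make_example(q, a, "advanced-teacher") for q, a in advanced_pairs]
    random.Random(seed).shuffle(data)  # interleave the two sources
    return data

dataset = build_mixed_dataset(
    simple_pairs=[("What is 2+2?", "2+2 equals 4.")] * 5,
    advanced_pairs=[("Explain gravity.", "Gravity is the attraction between masses.")] * 1,
)
print(len(dataset))  # 6 records at a 5:1 simple-to-advanced ratio
```

The key design point is that the cheaper teacher supplies volume while the stronger teacher supplies harder, higher-quality examples, which is how a small dataset can stand in for a much larger one.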
New 3D Language Model Exceeds Expectations on Multiple Benchmarks
- This video demonstrates the versatility of a new 3D language model that outperforms state-of-the-art methods on all 3D language grounding benchmarks
- It was named best submission at the ECCV workshop on Language for 3D Scenes
- With further training, the model can also compete with existing approaches on 2D language grounding benchmarks
- It uses a deformable attention mechanism to converge twice as fast as the state of the art, highlighting its computational efficiency.
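Deformable attention gets its speed by having each query attend to only a handful of sampled points at learned offsets around a reference location, rather than to every position in the feature map. The NumPy sketch below is a minimal single-head illustration of that mechanism (in the spirit of Deformable DETR), not the model's actual implementation; for simplicity it uses nearest-neighbor sampling where real implementations use bilinear interpolation.

```python
import numpy as np

def deformable_attention(feature_map, ref_points, offsets, weights):
    """Minimal single-head deformable attention over a 2D feature map.

    feature_map: (H, W, C) array of features
    ref_points:  (Q, 2) reference (row, col) location per query
    offsets:     (Q, K, 2) learned sampling offsets per query
    weights:     (Q, K) attention weights per sampled point (sum to 1)
    Returns:     (Q, C) attended feature per query
    """
    H, W, C = feature_map.shape
    Q, K, _ = offsets.shape
    out = np.zeros((Q, C))
    for q in range(Q):
        for k in range(K):
            # Sample at reference + offset, clamped to the map bounds.
            # Nearest-neighbor here; real implementations use bilinear.
            r = int(np.clip(round(ref_points[q, 0] + offsets[q, k, 0]), 0, H - 1))
            c = int(np.clip(round(ref_points[q, 1] + offsets[q, k, 1]), 0, W - 1))
            out[q] += weights[q, k] * feature_map[r, c]
    return out

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 8, 16))       # 8x8 feature map, 16 channels
refs = np.array([[4.0, 4.0], [2.0, 6.0]])     # two queries
offs = rng.standard_normal((2, 4, 2))         # K=4 sampled points per query
w = np.full((2, 4), 0.25)                     # uniform attention weights
print(deformable_attention(feats, refs, offs, w).shape)  # (2, 16)
```

Because each query touches K points instead of all H×W positions, the cost per query is constant in the feature-map size, which is the source of the faster convergence the bullet points describe.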