Alibaba open-sources another large model: it can read images and locate objects, is based on Tongyi Qianwen 7B, and is available for commercial use
Source: Qubit
Following Tongyi Qianwen-7B (Qwen-7B), Alibaba Cloud has launched Qwen-VL, a large-scale vision-language model, and open-sourced it immediately upon release.
For example, given a picture of Arnia, Qwen-VL-Chat can not only summarize the content of the image through question-and-answer, but also locate Arnia within the picture.
The first general-purpose model to support open-domain grounding in Chinese
Taken as a whole, the Qwen-VL series models have the following characteristics:
In terms of application scenarios, Qwen-VL can be used for tasks such as knowledge Q&A, image Q&A, document Q&A, and fine-grained visual grounding.
For example, if a foreign visitor who cannot read Chinese goes to a hospital and is baffled by the floor guide, unsure how to reach the right department, they can simply hand the map and their question to Qwen-VL and let it act as a translator based on the image information.
In terms of visual grounding ability, even in a very complex picture crowded with characters, Qwen-VL can accurately find Hulk and Spiderman as requested.
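Grounding results like these are returned inline in the model's text output. According to the Qwen-VL model card, each located reference appears as `<ref>label</ref><box>(x1,y1),(x2,y2)</box>`, with coordinates normalized to a 0–1000 grid. A minimal sketch (assuming that documented format; the `extract_boxes` helper and the sample reply are illustrative, not part of the library) for converting such output into pixel-space boxes:

```python
import re

# Qwen-VL grounding output embeds boxes as <ref>label</ref><box>(x1,y1),(x2,y2)</box>,
# with coordinates normalized to a 0-1000 grid (per the model card).
BOX_PATTERN = re.compile(
    r"<ref>(?P<label>.+?)</ref>"
    r"<box>\((?P<x1>\d+),(?P<y1>\d+)\),\((?P<x2>\d+),(?P<y2>\d+)\)</box>"
)

def extract_boxes(text, image_width, image_height):
    """Parse grounded labels and rescale their boxes to pixel coordinates."""
    boxes = []
    for m in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (int(m.group(k)) for k in ("x1", "y1", "x2", "y2"))
        boxes.append({
            "label": m.group("label"),
            # map the 0-1000 normalized grid onto the actual image size
            "box": (
                x1 * image_width // 1000,
                y1 * image_height // 1000,
                x2 * image_width // 1000,
                y2 * image_height // 1000,
            ),
        })
    return boxes

# Hypothetical model reply containing two grounded references
reply = ("<ref>Hulk</ref><box>(120,80),(480,900)</box> and "
         "<ref>Spiderman</ref><box>(520,60),(910,880)</box>")
print(extract_boxes(reply, 1000, 1000))
```

The rescaled tuples can then be drawn directly onto the original image, which is how the demo visualizes the located characters.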
The researchers tested Qwen-VL on standard English assessments in four categories of multimodal tasks (Zero-shot Caption/VQA/DocVQA/Grounding).
In addition, the researchers built TouchStone, a test set based on a GPT-4 scoring mechanism.
If you are interested in Qwen-VL, there are demos on the ModelScope community and Hugging Face that you can try directly; the links are at the end of the article~
Qwen-VL supports secondary development by researchers and developers, and commercial use is also allowed; note, however, that commercial users must first fill out a questionnaire application.
Project link:
Paper address: