Caitong Securities: Architectural Innovation Breaks Through the Large-Model Inference Latency Bottleneck; LPU Expected to Scale Rapidly in a Vast Market
Caitong Securities released a research report stating that the LPU is a new-generation chip designed for large-model inference, built around the TSP architecture. The firm believes the LPU's very low inference latency positions it for rapid market penetration, and it is optimistic both about the LPU's growth potential and about the PCB opportunities created by LPU cabinet shipments. Recommended stocks include: Zhiwei Intelligence (001339.SZ) (holds a stake in Yuanchuan Micro), Xingchen Technology (301536.SZ) (participated in multiple funding rounds of Yuanchuan Micro), Hudian Co. (002463.SZ) (NVIDIA PCB supplier), Shenghong Technology (300476.SZ) (NVIDIA PCB supplier), and Shennan Circuit (002916.SZ).
Caitong Securities’ main points are as follows:
The LPU is a new-generation chip designed for large-model inference, with the TSP architecture at its core.
The LPU is a new chip architecture tailored to sequential processing of compute-intensive tasks. Its core is the TSP architecture, which comprises five major functional modules. It decomposes the traditional five-stage processor pipeline and distributes it across the entire chip, reducing hardware complexity and guaranteeing a deterministic order and timing of instruction execution. Under the TSP architecture, the compiler can directly access and precisely control the chip's underlying hardware state, enabling software-defined hardware.
The LPU reduces large-model inference latency, improving user experience.
Inference latency is closely tied to user experience. The main bottleneck lies in the decode stage, which is limited by memory bandwidth. The LPU provides higher memory bandwidth, which shortens inference latency. In addition, large models served on LPUs are not only faster at inference but also more cost-effective, further improving user experience.
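The memory-bandwidth argument above can be made concrete with a simple roofline-style estimate: in the decode stage, generating each new token requires streaming roughly all of the model's weights from memory, so per-token latency is bounded below by model size divided by memory bandwidth. The sketch below illustrates this; all parameter and bandwidth figures are illustrative assumptions, not numbers from the report.

```python
# Roofline-style lower bound on decode latency: each generated token must
# read (approximately) all model weights, so
#   latency_per_token >= model_bytes / memory_bandwidth.
# All numeric values here are illustrative assumptions.

def decode_latency_s(params_billion: float, bytes_per_param: float,
                     bandwidth_gb_s: float) -> float:
    """Lower bound on per-token decode latency, in seconds."""
    model_gb = params_billion * bytes_per_param  # total weight bytes, in GB
    return model_gb / bandwidth_gb_s

# A hypothetical 70B-parameter model in FP16 (2 bytes per parameter):
# on HBM-class memory at ~2,000 GB/s (illustrative figure)
hbm_latency = decode_latency_s(70, 2.0, 2_000)     # 0.07 s  -> ~14 tokens/s
# on much faster on-chip SRAM at ~80,000 GB/s (illustrative figure)
sram_latency = decode_latency_s(70, 2.0, 80_000)   # 0.00175 s -> ~570 tokens/s

print(f"HBM-class:  {hbm_latency:.4f} s/token")
print(f"SRAM-class: {sram_latency:.5f} s/token")
```

Under these assumed numbers, raising memory bandwidth by 40x cuts the per-token latency floor by the same factor, which is the mechanism by which a higher-bandwidth inference chip shortens decode latency.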
The LPU has broad development potential and has entered initial mass production.
Token consumption has surged. In early 2024, daily token consumption in China was around 100 billion; by February 2026, the combined daily token consumption of mainstream large models had reached 180 trillion, driving rapid growth in the inference-chip market. Because the LPU reduces large-model inference latency, the firm believes it is well placed to gradually penetrate this high-growth market. The LPU has already entered initial mass production, with volume expansion imminent.
Risk warnings: AI technology development may fall short of expectations; large-model development may fall short of expectations; industry development may fall short of expectations.