Expanding Access to Large Language Models

On February 5, the Chinese tech scene reached a significant milestone as Baidu Smart Cloud activated its self-developed third-generation Kunlun Chip WanKa (10,000-card) cluster, the first officially lit self-developed WanKa cluster in the country. The move not only addresses Baidu's own computing power supply challenges but also opens the door to reducing the costs of large-scale AI models.

Prior to this, DeepSeek had launched its V3 and R1 models, achieving performance rivaling OpenAI's leading models while drastically cutting costs. Over the Spring Festival, this innovation garnered overwhelming attention worldwide.

These breakthroughs indicate that competition in the AI large model market has entered a new phase. The contest is no longer solely about technological supremacy; it now encompasses cost efficiency, user experience, and ecosystem robustness.

The dream of "supporting AI for the price of a daily cup of milk tea" is now closer to becoming a reality, marking a major stride towards making AI more accessible to the general public.

Following DeepSeek, domestic self-developed WanKa clusters have made their debut. In the wake of DeepSeek's model announcements, chip makers at home and abroad have moved quickly: NVIDIA, AMD, and Intel overseas, and Huawei Ascend, MetaX (Muxi), Iluvatar CoreX (Tianshu Zhixin), Moore Threads, and Hygon (Haiguang) in China have all announced support for deploying and serving inference for DeepSeek's models.

On the first workday after the Spring Festival, Baidu Smart Cloud announced that it had lit up the third-generation Kunlun Chip WanKa cluster. The completion of this cluster is expected to drive model costs down further.

Previously, companies such as Google, Amazon AWS, and Tesla developed chips in-house to lower costs and improve price-performance.

In China, by contrast, stringent compute limitations over the past year have been a key factor keeping the cost of large models high. By developing custom chips and building large clusters, Baidu aims not only to alleviate its own compute shortage but also to open a path to cheaper large models.

The Kunlun Chip, which Baidu has been developing since the first generation launched in 2018, has seen little coverage in the past two years. Before the WanKa cluster was activated, however, the industry began to pick up whispers about the third-generation Kunlun Chip, with speculation that it would enter mass production in 2024. Some companies in the industry said they were evaluating purchases of servers based on the third-generation Kunlun Chip in the second half of 2024.

Baidu's chairman, Robin Li (Li Yanhong), has consistently emphasized that the Kunlun chip is the "cornerstone" of Baidu's AI technology stack, with self-developed capability ensuring technological sovereignty in the generative AI era.

In presentations throughout 2024, Baidu has said that the Kunlun Chip is co-optimized with the PaddlePaddle deep learning framework and the Wenxin large models, creating end-to-end optimization spanning "chip-framework-model-application" and significantly enhancing overall performance.

Previous generations of Kunlun chips were primarily used for AI deployment and inference services.

The third-generation Kunlun Chip, however, marks a leap forward, optimized specifically for large models and training in the AI cloud. The newly activated WanKa cluster can significantly shorten training cycles for models with hundreds of billions of parameters while supporting larger models, more complex tasks, and multimodal data, paving the way for applications akin to Sora.

Moreover, the WanKa cluster supports concurrent multi-task operation, with dynamic resource partitioning that allows multiple lightweight models to be trained simultaneously. Through optimized communication and fault-tolerance mechanisms, it aims to cut wasted computing power and drive an exponential drop in training costs; a conceptual sketch of such partitioning follows.
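As a rough illustration of dynamic partitioning (a minimal sketch, not Baidu's actual scheduler; job names and card counts are invented), the following Python snippet greedily packs several lightweight training jobs onto a fixed pool of accelerator cards:

```python
# Hypothetical sketch of dynamic cluster partitioning for concurrent training.
# Job names and card counts are illustrative, not real workloads.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cards_needed: int

def partition(total_cards: int, jobs: list[Job]) -> dict[str, int]:
    """Greedily assign cards to jobs, smallest first, until the pool runs out."""
    assignments: dict[str, int] = {}
    free = total_cards
    for job in sorted(jobs, key=lambda j: j.cards_needed):
        if job.cards_needed <= free:
            assignments[job.name] = job.cards_needed
            free -= job.cards_needed
    return assignments

jobs = [Job("light-7B", 256), Job("light-13B", 512), Job("mid-70B", 2048)]
print(partition(total_cards=10_000, jobs=jobs))
# -> {'light-7B': 256, 'light-13B': 512, 'mid-70B': 2048}
```

A real scheduler would also handle preemption, elasticity, and failure domains; the point here is simply that one physical cluster can be sliced among many concurrent jobs.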

Notably, the inference market is poised to be a focal point this year. Sources indicate that chip firms at home and abroad are gearing up to take market share from NVIDIA.

A seasoned AI computing expert noted that inference prioritizes "energy efficiency," that is, maximizing computational performance per watt; the toy calculation below makes the metric concrete.
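To make performance per watt concrete, here is a toy tokens-per-joule calculation; every number is an invented placeholder, not a measured figure for any real chip:

```python
# Toy energy-efficiency comparison for inference accelerators.
# Throughput and power numbers are invented for illustration only.
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Performance per watt: tokens/s divided by watts is tokens per joule."""
    return tokens_per_second / watts

chip_a = tokens_per_joule(tokens_per_second=2500, watts=350)  # hypothetical GPU
chip_b = tokens_per_joule(tokens_per_second=1800, watts=150)  # hypothetical ASIC
print(f"chip A: {chip_a:.1f} tok/J, chip B: {chip_b:.1f} tok/J")
# Chip B wins on efficiency despite lower raw throughput.
```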

With Baidu's Kunlun Chip cluster expected to join this competition, industry strategies for the inference market have largely centered on adapting services to mainstream models. Accordingly, alongside Baidu's own Wenxin Yiyan, the Kunlun Chip is also compatible with a host of models, including DeepSeek.

Baidu's official announcements have emphasized that the rise of domestic large models signifies a transition from "single-task-centric computing" to "maximized cluster efficiency." This includes plans to blend training, fine-tuning, and inference tasks to enhance overall cluster utilization and lower per-unit computing power costs.

Looking ahead, major players at home and abroad face the challenge of navigating around NVIDIA's stronghold, built on CUDA.

Over the past decade, NVIDIA has leveraged CUDA not only to control the training market but also to push into inference. CUDA's true power lies in the extensive libraries developed on top of it across domains including life sciences, quantitative finance, and autonomous driving. "To develop an application for drug molecules or autonomous driving with CUDA, hundreds of thousands of lines of code may already exist, and you only need to write a few hundred lines to solve the problem," an expert remarked.

Currently, the UK, France, Canada, and various Chinese enterprises are showing resilience within the AI chip ecosystem and initiating foundational ecosystem development. In addition, numerous universities and research institutions worldwide, supported by their governments, are engaging in foundational work in this domain.

The wave of advancements set in motion by DeepSeek continues to ripple through the AI landscape, with major cloud computing providers announcing support for DeepSeek's models and thereby initiating a price war over market share.

The enthusiasm of these major firms is linked to the tremendous traffic generated by the DeepSeek models globally.

During the Spring Festival holiday, monikers such as "Mysterious Eastern Power," "AI's Pinduoduo," and "AI for the price of a daily cup of milk tea" all pointed to DeepSeek. The buzz, both domestic and international, reflects the considerable attention the homegrown DeepSeek models have attracted.

On February 4, the latest AI product leaderboard showed that within just 20 days of launch, the DeepSeek application had surpassed 20 million daily active users, exceeding ChatGPT's post-launch user base within just five days and making it the fastest-growing AI application globally.

On the social media platform Weibo, the topic "DeepSeek on how to live well" surged to the top of trending searches on February 4. On Xiaohongshu, posts related to DeepSeek quickly exceeded 490,000, with tutorials and evaluations flooding in and some users amusingly engaging in "AI fortune-telling."

The enticing combination of "free usage + better performance" has proven to be the key to piquing the interest of everyday users.

More critically, DeepSeek has dealt a significant blow to the pricing structure established by OpenAI.

Data indicates that under average usage conditions, the overall cost of using DeepSeek-R1 can be around 1/30 that of OpenAI's latest model, bringing AI applications within reach of a much wider audience at minimal cost; the rough calculation below illustrates the ratio.
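As a rough, illustrative check on that ratio, consider output-token pricing alone. The prices and exchange rate below are assumptions based on publicly reported list prices at the time, not authoritative figures:

```python
# Illustrative cost comparison per million output tokens.
# All prices and the exchange rate are assumptions, not official quotes.
DEEPSEEK_R1_OUTPUT_RMB = 16.0   # assumed list price, RMB per 1M output tokens
OPENAI_O1_OUTPUT_USD = 60.0     # assumed list price, USD per 1M output tokens
USD_TO_RMB = 7.25               # assumed exchange rate

openai_in_rmb = OPENAI_O1_OUTPUT_USD * USD_TO_RMB
ratio = DEEPSEEK_R1_OUTPUT_RMB / openai_in_rmb
print(f"DeepSeek-R1 costs roughly 1/{1 / ratio:.0f} of the OpenAI price")
# -> roughly 1/27, in the same ballpark as the reported 1/30
```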

As DeepSeek's popularity surged, the pricing competition among tech giants escalated. Overseas, Microsoft Azure, Amazon AWS, and NVIDIA NIM have integrated DeepSeek models in attempts to capture market share with more competitive pricing. Domestic operators, including Aliyun, Baidu Smart Cloud, and Volcano Engine, are not far behind, engaging in pricing battles after integrating DeepSeek models.

Some cloud computing companies have matched DeepSeek's official list price, often adding discounts or free usage allowances. Notably, on February 3, Baidu Smart Cloud announced the most competitive rates: its R1 calling price came down to 50% of DeepSeek's official pricing and its V3 calling price to 30%, along with a two-week free promotion.

The dramatic reduction in large model calling prices has substantially lowered the barrier to high-quality models, accelerating decision-making within enterprises and rapidly igniting excitement among developers.

DeepSeek has emerged as a hot topic across various global tech forums.

On the developer community CSDN, four of the top ten trending posts are linked to DeepSeek, fast-tracking related applications. One user even harnessed DeepSeek to colorize old photographs without writing a single line of code.

In the finance sector, Jiangsu Bank has integrated DeepSeek into its service platform, "Smart Xiao Su," utilizing the DeepSeek-VL2 multimodal model and the lightweight DeepSeek-R1 inference model for intelligent contract quality inspection and automated valuation reconciliation tasks.

A multinational pharmaceutical company has utilized the DeepSeek-R1 model to construct a system for predicting drug side effects, leveraging patient historical data and real-time monitoring to lower clinical trial risks.

Shanghai Jiao Tong University has commenced using DeepSeek-V3 for generating synthetic data to develop specialized large models.

In response to DeepSeek's competitive edge, OpenAI quickly released its new o3-mini model at a reduced price point.

Although this still exceeds DeepSeek’s pricing, it represents a noteworthy downward pricing trajectory.

Ultimately, the rise of DeepSeek signifies a paradigm shift in AI large model competition, transitioning from a focus solely on technology to a multifaceted contest encompassing cost, user experience, and ecosystem strength.

The slogan "AI for the price of a daily cup of milk tea" is no longer a mere dream. Recent industry moves, buoyed by highly competitive pricing, have fundamentally altered how ordinary users interact with AI technology and sparked a transformative wave across the sector, catalyzing AI development in a more inclusive direction.

The march towards democratizing large models is bound to accelerate. As more technology giants and platforms join the fray ignited by DeepSeek, the push for accessible large models will gain momentum.

On February 3, we tried the public cloud's DeepSeek API, engaging with DeepSeek-R1 through two interactive applications:

Interactive Experience One: The AI Strategist of Qin Shi Huang Experience Card

Interactive Experience Two: The Time-Dyeing Machine for Old Photos

Even users with no prior technical background could simply log into Baidu Smart Cloud's website, complete real-name verification, and click to try the online features, accessing the DeepSeek-R1 and DeepSeek-V3 models in the "Model Square." For developers, calling the models programmatically is similarly straightforward, as sketched below.
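For illustration, here is a minimal sketch of calling a DeepSeek-R1 deployment through an OpenAI-compatible chat endpoint, a pattern most cloud platforms support. The base URL, API key, and model identifier are placeholders, not Baidu's actual values:

```python
# Minimal sketch: chat completion against an OpenAI-compatible endpoint.
# The base_url, api_key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-cloud.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1",  # hypothetical model id on the platform
    messages=[{"role": "user", "content": "Summarize what a WanKa cluster is."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```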

Users can also choose from the 67 models offered on Qianfan, running up to six models side by side for comparative analysis, so that model selection can be driven by direct experience.

This exemplifies the platform's advantage: integrating various open-source models, much like a "Didi Chuxing" for AI.

Users can compare pricing and performance, freely choosing the most efficient model services, while intelligent "carpooling" and multimodal collaboration can enhance model capabilities and application depth.

On the complementary services front, leading cloud platforms have exhibited quick responsiveness in constructing comprehensive solutions, including one-stop development toolchains and lifecycle security mechanisms.

In the toolchain domain, even two years into the model explosion, access to useful tools remains critical. Within the GitHub community, for example, the most popular DeepSeek projects include DeepSeek-Tools, a toolkit that helps developers use DeepSeek, and DeepSeek-AutoML, which helps developers optimize DeepSeek model hyperparameters.

Across the various cloud platforms, many have committed to enhancing their toolchains.

For instance, while Baidu Smart Cloud has not directly offered a DeepSeek toolkit on its Qianfan model platform, it has consolidated various similar tools encompassing data processing, workflow orchestration, model fine-tuning, evaluation, and quantization.

When enterprise users want to build applications on DeepSeek models but worry about training data leakage, or about the safety of generated content during inference, cloud platforms have instituted safety mechanisms to allay these fears.

According to news releases, Baidu Smart Cloud has integrated its own content-security operators into the Qianfan inference pipeline for its DeepSeek model connections, safeguarding generated content. The Data SafeBox guarantees that models can be used only for inference prediction, while training data remains applicable exclusively to model fine-tuning tasks.

The built-in BLS log analysis and BCM alert mechanisms on the Qianfan platform ensure that highly regulated users in fields such as finance or healthcare can establish secure AI applications.

Additionally, the extensive industry coverage and solution portfolios amassed by cloud platforms will assist developers in rapidly replicating and adapting DeepSeek for vertical sector applications.

Finally, as enterprise focus gradually shifts from model training and fine-tuning to inference, supporting and optimizing inference becomes pivotal. Baidu Smart Cloud has notably optimized DeepSeek performance, tuning the computational patterns particular to the DeepSeek model architecture. By effectively overlapping different resource types and adopting an efficient Prefill/Decode separation architecture, it has raised throughput while keeping core latency metrics within required service-level agreements; the sketch below illustrates the Prefill/Decode idea.
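Conceptually, Prefill/Decode separation splits inference into a compute-bound prompt pass and a memory-bandwidth-bound token-by-token generation loop, which can then be scheduled on different resource pools. The following is a toy Python sketch of that split, with strings standing in for real tensors; it is not Baidu's implementation:

```python
# Toy sketch of Prefill/Decode separation; no real model is involved.
def prefill(prompt_tokens: list[str]) -> list[tuple[str, str]]:
    """Compute-bound phase: process the whole prompt at once,
    producing the KV cache that decode will reuse."""
    return [(f"K({t})", f"V({t})") for t in prompt_tokens]

def decode(kv_cache: list[tuple[str, str]], max_new_tokens: int) -> list[str]:
    """Memory-bandwidth-bound phase: emit one token per step,
    appending its K/V pair to the cache."""
    generated = []
    for step in range(max_new_tokens):
        token = f"tok{step}"  # stand-in for a model forward pass
        kv_cache.append((f"K({token})", f"V({token})"))
        generated.append(token)
    return generated

cache = prefill(["what", "is", "a", "wanka", "cluster", "?"])
print(decode(cache, max_new_tokens=4))
```

Serving systems exploit this split by running the two phases on separate machine pools, so bursts of long prompts do not stall latency-sensitive decode steps.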

Qianfan supports multiple mainstream inference frameworks, allowing developers to select the most suitable inference engines for their practical applications.

For instance, vLLM is renowned for its high throughput and memory efficiency, making it suitable for large-scale model deployments, while SGLang outshines other mainstream frameworks in latency and throughput. Users also have the flexibility to import and deploy custom models, which is well suited to DeepSeek development; a minimal vLLM example follows.
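As an illustration of the vLLM route, here is a minimal offline-inference sketch. The checkpoint name is one of the publicly released distilled DeepSeek-R1 models and is illustrative; substitute whatever your platform hosts:

```python
# Minimal offline-inference sketch with vLLM.
# The model id is illustrative; any hosted or local checkpoint works similarly.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
params = SamplingParams(temperature=0.6, max_tokens=256)

outputs = llm.generate(
    ["Explain in two sentences why inference favors performance per watt."],
    params,
)
print(outputs[0].outputs[0].text)
```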

With major players and platforms on board, the democratization of AI will undoubtedly be a hallmark of this year's agenda. As large models shift from exclusive "luxury toys" to a staple for the average person, lower innovation barriers will unleash immense creativity: small business owners using AI to design successful products, high school students leveraging open-source models for campus assistance, rural doctors employing multimodal tools for diagnosis. Standing on the shoulders of AI, ordinary people can step into a future once deemed unreachable.

