KubeCon China 2025 Experience

LLM, LLM, and More LLM

Series - Cloud-Native Related

After resigning at the end of January, I spent Chinese New Year at home, then traveled around Shanghai, Zhangjiajie, Chongqing, Suzhou, and Nanjing. I didn’t return to Shenzhen to start job hunting until mid-April, so at first I wasn’t sure I would have time to attend KubeCon China 2025 in June. Fortunately, the company that gave me an offer places a high value on technology. During the interview, my manager mentioned having read about my KubeCon experiences on my blog and said the company strongly encourages taking part in technical exchange events like this, and even supports employees giving talks, with full expense coverage.

So, less than a month after joining, I went on a company-funded trip to KubeCon China 2025 in Hong Kong (:

I asked my colleagues if they were interested, but for various reasons, I ended up being the only one attending (sad)

In short, this year’s KubeCon China was almost entirely focused on AI on Kubernetes - it could have been renamed to CloudNative AI Con.

This year’s KubeCon China was only two days long, with significantly fewer talks than last year - almost half as many. As a result, I also watched many KubeCon Europe 2025 talks online as a supplement.

Overall, my impressions this year were:

  • Kubernetes has become a mature foundation - anything that can run on K8s will eventually be moved to K8s
  • AI has brought new life to the CloudNative community, with many new CloudNative projects emerging around AI in the past two years. AI topics have become the absolute main theme of KubeCon.
    • The AI deployment section mainly discussed AI inference, with key technical points: distributed inference, scaling, and LLM-Aware load balancing, as well as AI model distribution
    • There were several discussions about AIOps, from simple ChatBot implementations to more complex Multi-Agent systems for tasks like cloud cost analysis and optimization
      • Kuaishou tried training a model per service from Logs/Metrics in their ultra-large-scale cluster to adjust HPA dynamically, balancing SLA against cost (if I’m misremembering, I take no responsibility hhh)
  • OpenTelemetry is maturing and getting closer to its goal of unifying Logs/Traces/Metrics signals
    • Platforms like Uptrace have emerged as unified observability platforms, fully utilizing OTel’s labels to correlate Logs/Traces
    • The current best practice is still to collect Logs and Metrics at the Infra level with traditional methods, while at the APP level OTel handles all Logs, Traces, and Metrics and correlates them through Span ID with consistent label semantics
  • WASM is still exploring its use cases, with this year’s main focus being running small models at the edge
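
To make the Span ID correlation point above a bit more concrete, here is a minimal sketch of my own (not from any talk). It uses only the Python standard library rather than the real OTel SDK, so `new_span`, `log_line`, and `correlate` are hypothetical helpers. The only things borrowed from the standards are the ID sizes (16-byte trace ID, 8-byte span ID) and the `service.name` attribute name.

```python
import json
import secrets
from collections import defaultdict


def new_span(name, trace_id=None):
    """Create a span record; a child span reuses its parent's trace_id."""
    return {
        "name": name,
        "trace_id": trace_id or secrets.token_hex(16),  # 16 bytes -> 32 hex chars
        "span_id": secrets.token_hex(8),                # 8 bytes -> 16 hex chars
    }


def log_line(span, message, **attrs):
    """Emit a structured log line that carries the active span's IDs."""
    record = {
        "message": message,
        "trace_id": span["trace_id"],
        "span_id": span["span_id"],
        **attrs,
    }
    return json.dumps(record)


def correlate(spans, raw_logs):
    """Group log messages under the span that produced them, keyed by span_id."""
    by_span = defaultdict(list)
    for line in raw_logs:
        record = json.loads(line)
        by_span[record["span_id"]].append(record["message"])
    return {s["name"]: by_span.get(s["span_id"], []) for s in spans}


root = new_span("GET /checkout")
child = new_span("db.query", trace_id=root["trace_id"])
logs = [
    log_line(root, "request received", **{"service.name": "shop"}),
    log_line(child, "slow query", **{"service.name": "shop"}),
    log_line(root, "response sent", **{"service.name": "shop"}),
]
result = correlate([root, child], logs)
print(result)
```

Because every log line carries the same IDs as the span that was active when it was written, a backend can join the two signals with a plain key lookup. That is the whole trick; the real value of OTel is that every library agrees on the field names.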

KubeCon China 2025 and KubeCon Europe 2025 video playlists:

Presentation slides can be downloaded here (NOTE: not all talks will have PDFs uploaded):

Next, I’ll go through some of the interesting content I heard, organized by topic, along with the corresponding video links and, where available, slide links.

Introducing AIBrix: Cost-Effective and Scalable Kubernetes Control Plane for vLLM - Jiaxin Shan & Liguang Xie, ByteDance

AIBrix is a complete solution for running distributed LLM inference on K8s, including:

  • Distributed inference deployment
  • LLM scaling
  • LLM request routing (load balancing)
  • Distributed KV cache
    • Mainly stores the KV cache centrally to reduce HBM usage and lower GPU memory requirements
  • Dynamic LoRA loading
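
As a rough illustration of what "LLM-aware" request routing in the list above can mean, here is a toy prefix-aware router. This is my own sketch, not AIBrix’s implementation: the `Replica` class, the character-based prefix matching, and the `max_in_flight` and `prefix_chars` parameters are all assumptions made for the example. Real routers typically work on token blocks rather than raw characters.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    in_flight: int = 0  # requests currently being served
    cached_prefixes: set = field(default_factory=set)  # prefixes assumed to be in KV cache


def shared_prefix_len(prompt, replica):
    """Length of the longest cached prefix on this replica that the prompt starts with."""
    return max(
        (len(p) for p in replica.cached_prefixes if prompt.startswith(p)),
        default=0,
    )


def pick_replica(prompt, replicas, max_in_flight=8):
    """Prefer the longest cache hit; break ties by the lowest current load."""
    # Skip overloaded replicas, unless every replica is overloaded.
    candidates = [r for r in replicas if r.in_flight < max_in_flight] or replicas
    return max(candidates, key=lambda r: (shared_prefix_len(prompt, r), -r.in_flight))


def route(prompt, replicas, prefix_chars=32):
    """Pick a replica and record that it will now hold this prompt's prefix."""
    replica = pick_replica(prompt, replicas)
    replica.in_flight += 1
    replica.cached_prefixes.add(prompt[:prefix_chars])
    return replica.name


replicas = [Replica("pod-a"), Replica("pod-b")]
system = "You are a helpful Kubernetes assistant. "
first = route(system + "Why is my pod pending?", replicas)
second = route(system + "How do I scale a Deployment?", replicas)
third = route("Translate this sentence into French.", replicas)
print(first, second, third)  # pod-a pod-a pod-b
```

The second request shares a system prompt with the first, so it goes to the same pod even though that pod is busier. The third has no cache hit anywhere and falls back to the least-loaded pod. A plain round-robin load balancer would miss that cache reuse.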

Code:

AIBrix is currently under the vllm-project, with a good number of stars, suggesting the project is healthy and worth following.

More Than Model Sharding: LWS & Distributed Inference - Peter Pan & Nicole Li, DaoCloud & Shane Wang, Intel

One of the most interesting talks, covering distributed inference architecture, optimization points, and the advantages and usage of LWS.

Code:

Simply put, LWS (LeaderWorkerSet) is a CRD designed for deploying distributed LLM inference. Its main feature is treating a group of pods (a leader plus its workers) as a single unit for deployment and scheduling.

NOTE: According to an issue, AIBrix might integrate with LWS (possibly with official support): https://github.com/vllm-project/aibrix/issues/843#issuecomment-2728305020

AI Model Distribution Challenges and Best Practices

Developers discussed how to distribute LLM models hundreds of GB in size within clusters. Current industry approaches:

  • Dragonfly
  • JuiceFS
  • OCI model spec + OCI volume (K8s 1.33+)

  • Antipatterns in Observability: Lessons Learned and How OpenTelemetry Solves Them - Steve Flanders, Splunk
    • Very interesting and informative. The observability antipatterns he listed include:
      • Telemetry Data
        • Incomplete Instrumentation - need to introduce zero-code otel sdk for automatic data collection
          • Metrics/logs/traces signals aren’t all enabled by default; it depends on the agent implementation
          • In K8s, it’s recommended to disable both stdout logging and the traditional Prometheus pull-based /metrics endpoints, letting the OTel agent handle all App-level signals. OTel in DaemonSet mode (or Vector/Fluent Bit) mainly handles Infra-level logs
        • Over-Instrumentation - need to filter and streamline metrics at the otel-collector level before sending to backend storage
        • Inconsistent Naming Conventions - adopt OpenTelemetry’s semantic conventions throughout for unified naming
      • Observability Platform
        • Vendor Lock-in - only choose platforms supporting OTel standards and use OTel naming conventions
        • Tool Sprawl - use unified observability platforms like Uptrace that support automatic Log-Trace correlation
        • Underestimating Scalability Requirements - use OTel for signal collection and choose scalable backend storage like VictoriaMetrics
      • Company Culture
        • Silos and Lack of Collaboration
        • Lack of Ownership & Accountability
  • KubeCon EU 2025 - From Logs To Insights: Real-time Conversational Troubleshooting for Kubernetes With GenAI - Tiago Reichert & Lucas Duarte, AWS
    • The opening OnCall skit was very realistic… though getting a phone alert after 1 minute of pod pending seems exaggerated…
    • After the skit, they covered the main content: encoding logs with embedding models and storing them in OpenSearch for RAG, giving the ChatBot read-only K8s permissions (with access to Secrets blocked), then using DeepSeek/Claude for Q&A to solve problems
    • Code: https://github.com/aws-samples/sample-eks-troubleshooting-rag-chatbot
  • Portrait Service: AI-Driven PB-Scale Data Mining for Cost Optimization and Stability Enhancement - Yuji Liu & Zhiheng Sun, Kuaishou
    • Discussed how Kuaishou manages stability and performance optimization in their ultra-large-scale cluster of 200,000 machines
    • Covered relatively basic content - mainly collecting vast amounts of cluster information, processing through a big data system, then training dedicated models, with each service potentially having its own resource optimization model
    • This approach might be too heavy - worth learning from, but not very useful in my current work scenario (our scale is too small)
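
The retrieval step from the AWS troubleshooting talk above (embed the logs, store them, look up the relevant ones for a question) can be sketched in a few lines. This is a toy under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `LogIndex` is an in-memory stand-in for OpenSearch, and the log lines are made up. None of it comes from the sample repository, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LogIndex:
    """In-memory stand-in for the vector store (OpenSearch in the talk)."""

    def __init__(self):
        self.entries = []

    def add(self, line):
        self.entries.append((line, embed(line)))

    def search(self, question, k=2):
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [line for line, _ in ranked[:k]]


def build_prompt(question, context_lines):
    """Assemble the retrieved logs and the question into one prompt for the LLM."""
    context = "\n".join(f"- {line}" for line in context_lines)
    return f"Relevant logs:\n{context}\n\nQuestion: {question}"


index = LogIndex()
for line in [
    "pod web-7f9 FailedScheduling: 0/3 nodes available, insufficient cpu",
    "pod api-5c2 Started container api",
    "node ip-10-0-1-7 NodeNotReady kubelet stopped posting status",
]:
    index.add(line)

question = "Why is pod web-7f9 pending, is there insufficient cpu?"
top = index.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Only the retrieved lines reach the model, which is what keeps the prompt small even when the cluster produces huge volumes of logs.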

The Next Steps for Ingress-NGINX and the Ingate Project - Jintao Zhang, Kong Inc.

Ingress-NGINX is finally being retired, with its successor being InGate, though InGate is currently almost empty (:

Code:

Keynote: Who Owns Your Pod? Observing and Blocking Unwanted Behavior at eBay With eBPF

The keynote mainly introduced Cilium’s Tetragon, an eBPF-based K8s security tool, somewhat similar to AppArmor but capable of more fine-grained permission management.

A friend argued that such tools aren’t really necessary - we should use GitOps processes and move security checks into the CI/CD pipeline.

KubeCon EU 2025 - Autonomous AI Agents for Cloud Cost Analysis - Ilya Lyamkin, Spotify

Implementation of a Multi-Agent system that automatically creates plans, writes SQL and Python for cloud cost analysis - very valuable reference.

Keynote: An Optimized Linux Stack for GenAI Workloads - Michael Yuan, WasmEdge

Discussed using WasmEdge + LlamaEdge to run small LLM models on edge devices - quite interesting.

KubeCon EU 2025 - Tutorial: Build, Operate, and Use a Multi-Tenant AI Cluster Based Entirely on Open Source

An hour-plus tutorial by IBM. They installed many components, including Kueue, Kubeflow, PyTorch, Ray, vLLM, and Autopilot.

Attending KubeCon isn’t just about listening to technical changes and progress from the past year - it’s also an important opportunity to socialize with developers from various fields, kind of like a large-scale online friend meetup (:

This year, I got @scruelt, @ox-warrior, and other friends to come to KubeCon, and at the venue, I met up with @cookie, @rizumu, @ayakaneko, and @dotnetfx35 for casual chats. I received Kubernetes and Go cookies printed with 3D printers from @rizumu and @ayakaneko, and incidentally spread the word about NixOS (:

Meetup successful! Also spread the word about NixOS

K8s/Go cookies and Istio fridge magnets received

On the morning of Day 2, there weren’t many talks I wanted to attend. I noticed there was a Peer Group Meeting I could join, though it required signing up first. I went with @scruelt to sign up, and we were a bit worried that signing up just 20 minutes beforehand might be too late, but when we got to the meeting room, we found only 3 mentors present, so we just chatted casually with them. The three mentors were Nate Waddington (Head of Mentorship & Documentation, Canada), Kohei Ota (CNCF Ambassador, Japan), and Amit DSouza (co-founder of Odyssey Cloud, Australia). A Cisco engineer also joined in halfway through.

It was mostly casual conversation. @scruelt’s English is better than mine, and since he just resigned, he had many questions to ask - he initiated most of the topics. As for me, since everything has been going smoothly lately, I didn’t have many questions to ask.

Entered the Peer Group Meeting to find only Mentors hhh

Let’s end with some photos.

Welcome to KubeCon China 2025

Got a T-shirt first hehe

Coffee break time

Want that SUSE plush toy!

A small SUSE on a big SUSE

Using tetragon to restrict file access

LWS talk, discussing PD (prefill/decode) separation

Switch store promoting Miku Boxing

Three friends bought Switch 2 here during KubeCon, they made a killing

All my 'loot' hhh

Boarding, goodbye Shenzhen

How many times have I flown now?

Had a great time, see you next year!
