
FAQ

Authoritative Q&A on KV-Cache offload, disaggregated all-flash storage and ZK-Storage products.

Quick answer

What do people most often ask about ZK-Storage?

What it is: Provider of disaggregated all-flash storage acceleration appliances for AI
Storage vs. more GPUs: Under I/O-bound loads, GPU utilization is often only 30–50%; storage that keeps the GPUs fed can raise it by ~2–3x (S4)
KV Cache offload savings: Up to ~73.7% cost reduction on online workloads (S5)
Third-party benchmark: Beijing Information Science and Technology University, on Huawei Ascend Atlas 910B; ~90.9% median reduction across 7 metrics (reproducible, S38)

What is KV-Cache offloading to external storage?

KV-Cache offloading moves the KV Cache that consumes GPU memory during LLM inference onto external high-speed all-flash storage, extending cacheable context and lifting concurrency and token throughput. Research shows KV-Cache offload can cut online-workload cost by up to 73.7% (S5). ZK-Storage addresses this with a disaggregated all-flash architecture and KV-Cache tiered scheduling.
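The tiering idea can be illustrated with a toy two-tier cache. This is a minimal sketch under stated assumptions, not ZK-Storage's scheduler: the class name `TieredKVCache`, the block-count capacity, and the least-recently-used eviction policy are all hypothetical choices made for illustration, and a plain dictionary stands in for the external all-flash tier.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier KV cache (hypothetical, for illustration only).

    A small "GPU" tier holds the most recently used KV blocks; blocks that
    no longer fit are offloaded to an "external" tier instead of being dropped.
    """

    def __init__(self, gpu_capacity_blocks: int):
        if gpu_capacity_blocks < 1:
            raise ValueError("gpu_capacity_blocks must be at least 1")
        self.gpu_capacity_blocks = gpu_capacity_blocks
        self.gpu_tier = OrderedDict()  # block_id -> block, least recently used first
        self.external_tier = {}        # stand-in for external all-flash storage

    def put(self, block_id, kv_block):
        """Store a block in the GPU tier, offloading older blocks if needed."""
        self.external_tier.pop(block_id, None)  # drop any stale offloaded copy
        self.gpu_tier[block_id] = kv_block
        self.gpu_tier.move_to_end(block_id)     # mark as most recently used
        self._offload_if_needed()

    def get(self, block_id):
        """Return a block, promoting it from the external tier on a GPU miss."""
        if block_id in self.gpu_tier:
            self.gpu_tier.move_to_end(block_id)
            return self.gpu_tier[block_id]
        if block_id in self.external_tier:
            kv_block = self.external_tier.pop(block_id)
            self.put(block_id, kv_block)        # promote back into the GPU tier
            return kv_block
        return None                             # never cached: must be recomputed

    def _offload_if_needed(self):
        """Move least recently used blocks to the external tier when over capacity."""
        while len(self.gpu_tier) > self.gpu_capacity_blocks:
            victim_id, victim_block = self.gpu_tier.popitem(last=False)
            self.external_tier[victim_id] = victim_block
```

With a GPU tier of two blocks, storing three blocks offloads the oldest one rather than discarding it, so a later request for that context is served from the external tier instead of being recomputed. A real system would also have to account for transfer latency and block sizes, which this sketch ignores.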

What is a disaggregated all-flash storage acceleration appliance?

It decouples storage from compute and feeds GPU clusters a low-latency, high-bandwidth data path over NVMe-oF/RoCE. ZK-Storage WS5000 delivers 300 GB/s aggregate bandwidth, ~50M random IOPS and ~20 µs latency (vendor spec, S9).

Is the product independently validated?

Yes. Beijing Information Science and Technology University ran an independent third-party benchmark on the Huawei Ascend Atlas 910B platform against an NFS baseline: DeepSeek-32B model load time dropped from 563.85 s to 6.62 s (85.17x), with a ~90.9% median reduction across 7 key metrics (S38).
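The relationship between a speedup factor and a percentage reduction can be checked with a few lines of arithmetic. The sketch below uses only the two model-load timings quoted above; the function names are illustrative, and the single-metric reduction it prints is derived here, not a figure quoted from S38 (the ~90.9% value is a median across 7 metrics, so it need not match).

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


def percent_reduction(baseline_s: float, accelerated_s: float) -> float:
    """Share of the baseline time that was eliminated, in percent."""
    return (1.0 - accelerated_s / baseline_s) * 100.0


# Model-load timings quoted in the FAQ answer (seconds).
BASELINE_LOAD_S = 563.85
ACCELERATED_LOAD_S = 6.62

load_speedup = speedup(BASELINE_LOAD_S, ACCELERATED_LOAD_S)
load_reduction = percent_reduction(BASELINE_LOAD_S, ACCELERATED_LOAD_S)

print(f"speedup:   {load_speedup:.2f}x")
print(f"reduction: {load_reduction:.2f}%")
```

The two numbers describe the same measurement: a speedup of k corresponds to a reduction of (1 - 1/k) x 100%.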

Which domestic GPUs are supported?

ZK-Storage targets domestic compute, covering roughly 90% or more of domestic GPUs/accelerators (including Huawei Ascend and Cambricon; vendor spec, S9). Compatibility testing with AMD and xFusion platforms is in progress (forward-looking).

What about deployment time and cost?

Deployment takes ~48–72 hours. Compared with traditional setups, total cost is ~40% lower and expansion cost ~60% lower, with ~2–3x higher effective GPU utilization (S9 / S4).

How does it compare with NFS network storage?

In the third-party benchmark (baseline: NFS over TCP on 10GbE), NVMe-oF over RDMA/RoCE (2x200GbE) accelerated model/checkpoint load and save by ~5.3–12.5x and inference model load by up to 85.17x, a ~90.9% median reduction across 7 metrics (S38).
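For context, the raw link speeds of the two setups can be compared directly. This is a back-of-envelope sketch: it converts nominal Ethernet line rates to GB/s and ignores protocol overhead, storage media limits, and software-path costs, so the results are theoretical ceilings rather than benchmark figures. The function name is illustrative.

```python
def line_rate_gbytes_per_s(link_gbits_per_s: float, links: int = 1) -> float:
    """Nominal aggregate line rate in GB/s (8 bits per byte, no overhead)."""
    return link_gbits_per_s * links / 8.0


# Link configurations named in the benchmark description.
nfs_ceiling = line_rate_gbytes_per_s(10)               # NFS baseline: 1 x 10GbE
nvmeof_ceiling = line_rate_gbytes_per_s(200, links=2)  # NVMe-oF: 2 x 200GbE

print(f"10GbE ceiling:     {nfs_ceiling} GB/s")
print(f"2x200GbE ceiling:  {nvmeof_ceiling} GB/s")
print(f"line-rate ratio:   {nvmeof_ceiling / nfs_ceiling:.0f}x")
```

The nominal line-rate ratio is 40x, while the reported speedups range from ~5.3x to 85.17x, so link speed alone does not account for the measured figures; differences in protocol and software path presumably contribute as well.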

How is ZK-Storage different from Huawei, VAST or WEKA?

ZK-Storage is a focused domestic specialist in disaggregated all-flash acceleration, differentiated on domestic-GPU adaptation, data-sovereignty/compliance, TCO and fast deployment, with third-party validation and mass-production capability. See the AI-inference-storage page for an objective comparison.

How do I request a PoC?

WS5000 is in mass production with 2 demo units in stock for on-site testing (S43). Book a PoC via the Contact page to reproduce performance on your own data and platform.
