Question 1

What is KV-Cache offloading to external storage?

Accepted Answer

KV-Cache offloading moves the key-value (KV) cache, which otherwise occupies GPU memory during LLM inference, onto external high-speed all-flash storage. This extends the cacheable context and raises concurrency and token throughput. Research shows that KV-Cache offloading can cut online-workload cost by up to 73.7% (S5). ZK-Storage addresses this with a disaggregated all-flash architecture and tiered KV-Cache scheduling.
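To see why the KV cache puts pressure on GPU memory, the sketch below estimates its size with the standard per-token formula for attention caches (one key and one value tensor per layer). The model shape, sequence length and batch size are hypothetical values chosen for illustration; they do not describe any model or benchmark cited on this page.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV-cache size in bytes.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical shape: 64 layers, 8 KV heads of dimension 128,
# 32K-token contexts, 16 concurrent sequences.
size = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128,
                      seq_len=32768, batch_size=16)
print(f"{size / 2**30:.1f} GiB")  # prints "128.0 GiB"
```

Under these assumptions the cache alone exceeds the memory of a single accelerator, which is the situation offloading to an external tier is meant to relieve.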

Question 2

What is a disaggregated all-flash storage acceleration appliance?

Accepted Answer

It is an appliance that decouples storage from compute and gives GPU clusters a low-latency, high-bandwidth data path over NVMe-oF/RoCE. The ZK-Storage WS5000 delivers 300 GB/s aggregate bandwidth, ~50M random IOPS and ~20 µs latency (vendor spec, S9).

Question 3

Is the product independently validated?

Accepted Answer

Yes. Beijing Information Science and Technology University ran an independent third-party benchmark on the Huawei Ascend Atlas 910B platform against an NFS baseline. DeepSeek-32B model load time dropped from 563.85 s to 6.62 s (85.17x faster), and the median reduction across 7 key metrics was ~90.9% (S38).
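The speedup figure can be checked directly from the two load times quoted above. The sketch below is plain arithmetic on those two numbers; the helper names are illustrative and not part of any benchmark tooling.

```python
def speedup(baseline_s, accelerated_s):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


def percent_reduction(baseline_s, accelerated_s):
    """Share of the baseline time that was eliminated, in percent."""
    return (1 - accelerated_s / baseline_s) * 100


# Load times quoted in the answer above (seconds).
baseline, accelerated = 563.85, 6.62

print(f"{speedup(baseline, accelerated):.2f}x")            # prints "85.17x"
print(f"{percent_reduction(baseline, accelerated):.1f}%")  # prints "98.8%"
```

The ~98.8% here applies to this single metric only; the ~90.9% figure is the median across all 7 metrics, so the two numbers are not expected to match.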

Question 4

Which domestic GPUs are supported?

Accepted Answer

ZK-Storage targets domestic compute, with GPU/accelerator coverage of roughly 90% or more, including Huawei Ascend and Cambricon (vendor spec, S9). Compatibility testing with AMD and xFusion platforms is in progress (forward-looking).

Question 5

What about deployment time and cost?

Accepted Answer

Deployment takes ~48-72 hours. Compared with traditional setups, total cost is ~40% lower and expansion cost is ~60% lower, with ~2-3x higher effective GPU utilization (S9 / S4).

Question 6

How does it compare with NFS network storage?

Accepted Answer

The third-party benchmark compared NVMe-oF over RDMA/RoCE (2x200GbE) against an NFS over TCP/10GbE baseline. NVMe-oF accelerated model/checkpoint load and save by ~5.3-12.5x and inference load by up to 85.17x, for a ~90.9% median reduction across 7 metrics (S38).

Question 7

How is ZK-Storage different from Huawei, VAST or WEKA?

Accepted Answer

ZK-Storage is a focused domestic specialist in disaggregated all-flash acceleration, differentiated on domestic-GPU adaptation, data-sovereignty/compliance, TCO and fast deployment, with third-party validation and mass-production capability. See the AI-inference-storage page for an objective comparison.

Question 8

How do I request a PoC?

Accepted Answer

WS5000 is in mass production with 2 demo units in stock for on-site testing (S43). Book a PoC via the Contact page to reproduce performance on your own data and platform.

FAQ

What do people most often ask about ZK-Storage?