When containers started out, they were meant to be ephemeral – stateless, disposable and data-light. But that’s all changed. As Gartner notes, use cases for containers have evolved to include analytics and artificial intelligence (AI) processing, and by 2028, it predicts 15% of on-premise production workloads will run in containers. That’s a 300% increase since 2022.

Now, while containers themselves retain all the benefits of ephemerality – rapidly reproducing, then dying back just as quickly to account for workload spikes – the storage attached to them cannot live by the same rules.

As enterprises move from proofs of concept to running a big chunk of production workloads in containers, the storage layer has become a pivot point. While the early days were focused on simple web scaling, containers have now moved into the realm of mission-critical databases, massive data science pipelines, and the power-hungry world of generative AI (GenAI).

The challenge lies in navigating key choices such as file versus block versus object storage, CSI versus container-native storage, and whether to go for a dedicated container storage platform.

Containerisation is lightweight virtualisation

Containerisation is a lightweight form of virtualisation. Unlike traditional virtual machines (VMs) that require a hypervisor and a full guest operating system (OS), containers share the host server’s OS. This makes them lighter, faster to scale and more portable. They are built on microservices principles that break monolithic applications into discrete, application programming interface (API)-linked components in a way that aligns with DevOps methodologies.

While several orchestrators exist (for example, Docker Swarm and OpenShift), Kubernetes is the market leader. It manages a cluster of nodes, on which pods run the containers. Clusters are groups of nodes managed by a control plane, which comprises the API server, a scheduler for pod placement, controllers that maintain the desired state, and etcd, a key-value store that holds cluster configuration and state.

As originally conceived, container storage was ephemeral, and data vanished when a pod was deleted. So, to support enterprise applications, Kubernetes introduced persistent volumes (PVs), which are attached to a cluster and decouple storage from compute, allowing applications to remain portable while maintaining access to data.
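In practice, a developer requests persistence with a persistent volume claim (PVC), which Kubernetes binds to a persistent volume and mounts into the pod. A minimal sketch; the storage class name `standard` is an assumption and varies by cluster:

```yaml
# A workload asks for storage via a PersistentVolumeClaim...
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # hypothetical class; depends on the cluster
  resources:
    requests:
      storage: 10Gi
---
# ...and mounts the bound volume like any other filesystem path
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: /var/lib/data
          name: data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

Deleting the pod leaves the PVC and its data intact; that decoupling is what makes the storage persistent.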

CSI vs container-native storage

Container Storage Interface (CSI) is a standard that allows storage suppliers – more than 130 drivers are available – to expose their systems to Kubernetes. CSI allows Kubernetes to trigger advanced data services such as snapshots, cloning and automated provisioning across block, file and object storage in on-premise and cloud environments.
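A CSI driver is typically wired in through a storage class, which names the driver as the provisioner. A sketch, assuming a hypothetical driver `csi.example.com`; real driver names and parameters come from the supplier's documentation:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-array
provisioner: csi.example.com        # hypothetical CSI driver name
parameters:
  tier: performance                 # driver-specific parameters vary by supplier
reclaimPolicy: Delete
allowVolumeExpansion: true          # lets Kubernetes grow volumes via the driver
volumeBindingMode: WaitForFirstConsumer
```

Any PVC that names `fast-array` as its storage class is then provisioned on the external array by the driver.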


CSI is essentially a “broker”. It is an industry-standard API that acts as a middleman, allowing Kubernetes to talk to external storage arrays. For example, when a developer requests storage via a persistent volume claim (PVC), the CSI driver tells the external storage box to carve out a piece of capacity and plug it into the container. The advantage is that you get to use the expensive, reliable enterprise storage you already own, but the storage is still “outside” the cluster, and if you move containers to a different cloud or datacentre, that external hardware might not be there.

Meanwhile, container-native storage is storage that lives inside the Kubernetes cluster. It is usually deployed as a set of containers itself. It takes specified drives attached to Kubernetes nodes and pools them together into one big virtual resource.

Container-native storage potentially has the advantage of portability – on-premise, in the cloud, and so on, by virtue of the virtualisation inherent – while CSI is more likely to tie a deployment to deployed storage arrays.

Container-native storage is location independent, so you can run the same setup on-premise or in the cloud. But it can consume central processing unit (CPU) and random access memory (RAM) from your Kubernetes nodes to manage the data, which may be a concern. 

Do we need containers to be that portable?

CSI offers a connection to big-iron, fully featured storage, while container-native storage holds the promise of flexible deployment, portability, and so on. But is portability that important? Eric Phenix, who leads the engineering practice at analyst firm GigaOm, says not.

“Containers offer a compute abstraction layer that allows the application to be infrastructure agnostic, rather than a solution that is designed to make applications more portable,” he says.

Phenix argues that while containers make the code agnostic, deployment is another matter. “Unless a company is specifically a customer-facing instanced PaaS [platform as a service] where they need to run on every cloud, I don’t see the need to run the same workload on multiple clouds. Once things are deployed, they’re always messy to migrate,” he says.

And this “messiness” is almost always a data problem, according to Phenix. While the container image can move in seconds, the multi-terabyte persistent volume attached to it cannot.

James Brown, an analyst at GigaOm, points out that container-native storage is essentially software-defined storage and brings its own lock-ins. “Heavily integrated, container-native supplier platforms risk replacing hardware lock-in with software lock-in. Tying your architecture to proprietary in-cluster storage features creates massive migration hurdles, effectively breaking the core portability promise of Kubernetes,” he says.

So, the choice here comes down to just how portable you need things to be. Enterprises often use a hybrid approach: CSI to connect to massive, high-performance arrays for their heaviest databases; container-native storage for modern, distributed apps that need to be able to move without a “messy” data migration.

In 2026, choosing the correct storage protocol for containers is all about playing in a “mixed economy”, with a Kubernetes cluster able to pull from all three formats – block, file and object – simultaneously.

Block for high performance

Block storage presents data as a raw, unformatted volume – like a physical hard drive – that is attached to a single node at a time. In Kubernetes, this is typically handled via persistent volumes using the ReadWriteOnce (RWO) access mode.

Block storage can be in on-premise arrays or in the cloud, such as in Amazon Elastic Block Store (EBS), Google Persistent Disk, or Microsoft Azure Disk.

Block storage offers the lowest latency and highest input/output operations per second (IOPS) because there is no filesystem overhead between the application and the storage. That makes it ideal for databases where small, frequent updates happen at specific locations within files.

When it comes to the cons, most block storage cannot be mounted to multiple pods across different nodes simultaneously, and scaling usually requires resizing the volume and expanding the filesystem. Block storage is generally the most expensive, too.
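For a database that wants the raw device rather than a filesystem, Kubernetes can hand the block device straight to the pod. A sketch; the `fast-block` class name is an assumption:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-volume
spec:
  accessModes:
    - ReadWriteOnce              # one node at a time, the typical block pattern
  volumeMode: Block              # raw device, no filesystem imposed
  storageClassName: fast-block   # hypothetical class backed by a block CSI driver
  resources:
    requests:
      storage: 100Gi
```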

File for directory access

File storage provides a shared hierarchical namespace (folders and files) accessible over a network. In Kubernetes, it is the primary way to achieve ReadWriteMany, allowing multiple pods on different nodes to read and write to the same data.

It is also available in on-premise storage or cloud services such as Amazon Elastic File System (EFS), Microsoft Azure Files and Google Filestore.

File access is perfectly suited for horizontal scaling of web servers where all pods need access to the same assets, and most legacy applications are built to read/write to a standard directory structure.

Compared to block access, network protocols like NFS or SMB introduce more latency, and at large scales (millions of files), traversing deep directory trees can become extremely slow. Meanwhile, handling concurrent writes across many pods can lead to file locking conflicts if not managed carefully.
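The shared-access pattern above looks like this in a claim; `shared-file` is a hypothetical class backed by, say, an NFS-based CSI driver:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-assets
spec:
  accessModes:
    - ReadWriteMany              # many pods on many nodes, reading and writing
  storageClassName: shared-file  # hypothetical class; e.g. an NFS-backed driver
  resources:
    requests:
      storage: 500Gi
```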

Object for sizeable datastores

Object storage manages data as discrete objects in a flat namespace and is accessed via APIs (for example, S3 or Swift) rather than being “mounted” like a disk. It’s the cloud-native storage protocol, though it can run on-site, too. Examples include Amazon Simple Storage Service (S3), MinIO, Google Cloud Storage and Ceph RGW. Object storage can store petabytes of data without worrying about partition limits or disk sizes, and is usually the cheapest option for large-scale unstructured data (logs, images, backups).

Object storage is ideal for modern “cloud-native” apps that talk directly to storage via HTTP/HTTPS, bypassing the OS kernel entirely.

On the negative side, object storage is generally the slowest for transactional work, offering high throughput but higher latency than block or file. Nor can you “edit” a single line within an object in place; you must re-upload the entire object to change it.

Storage protocol decision-making

In summary, block storage is expensive but the best performing, file storage is less costly but with scale restrictions, and object storage is great for huge capacity but also lags in performance terms. So, which one to choose? It’s a case of horses for courses, according to Tony Lock, director of engagement and distinguished analyst at Freeform Dynamics.  

“In an ideal world, the choice of underlying storage – block, file or object – will likely depend on what the app is, where the organisation wishes to run it, and what its characteristics are in terms of size, number of containers, latency requirements, security, location, cost, etc,” he says.

Meanwhile, Whit Walters, field chief technology officer at GigaOm, believes S3 is winning the battle, but block has its place. He says: “The real story is protocol bifurcation inside AI pipelines. Object storage dominates the ingestion and data lake tier, offering exabyte-scale horizontal scaling with rich, customisable metadata that enables semantic discovery natively at the storage layer.

“Block storage still owns the inference hot path where vector databases demand 500,000+ IOPS, however.

“The emerging trend to watch is COSI, the Container Object Storage Interface, which aims to make object storage buckets first-class Kubernetes resources with standardised, declarative lifecycle management.”
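Under COSI, a bucket is requested declaratively, much like a PVC. A sketch using the alpha API as it currently stands; the bucket class name is an assumption defined by the platform team:

```yaml
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClaim
metadata:
  name: training-data
spec:
  bucketClassName: s3-standard   # hypothetical BucketClass, analogous to a StorageClass
  protocols:
    - S3                         # the API the workload will use to reach the bucket
```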

CSI vs container-native in storage supplier platforms

All the big storage suppliers provide some form of platform or wrapper for container storage. These include Dell’s Container Storage Modules, HPE’s Ezmeral Runtime Enterprise, the Hitachi Kubernetes Service (HKS), NetApp’s Astra and Pure Storage’s Portworx.

What they all have in common is a means of managing container storage – and in some cases, data protection and more. Where they differ under the hood is that most are based around CSI, so they provide a layer from which to manage CSI drivers to their storage.


Some differ in that they provide their management functionality from within Kubernetes. Pure Storage’s Portworx, for example, lives entirely within Kubernetes but uses CSI as a “handshake” with external storage.

Meanwhile, HPE Ezmeral also runs in Kubernetes but accesses data via the CSI driver. NetApp’s Astra Datastore was container-native in a similar way to Portworx, but was discontinued in 2023.

While all the key storage suppliers offer products that can manage storage for containers, be sure to check the extent to which these are container-native or dependent on CSI. CSI connectivity may well be better suited to larger, more static environments, while container-native solutions can be best for more dynamic sets of workloads.

GigaOm’s Walters puts a finer point on it: “The Kubernetes tax is real, but it’s a trade-off. Container-native platforms run replication, dedupe and encryption on worker nodes. Ceph alone carries a 2-10% baseline CPU penalty per node just for cluster quorum, and that spikes hard during replica rebuilds.

“In GPU [graphics processing unit]-dense AI environments, where every cycle counts, offloading that work to dedicated array ASICs [application-specific integrated circuits] via an advanced CSI model keeps compute nodes clean. But in multicloud or edge scenarios without dedicated arrays, that CPU tax buys you topology-aware placement and self-healing automation that’s genuinely hard to replicate otherwise.”

There may also be performance considerations in terms of contention for resources, as well as questions about how they are administered. 

Towards autonomous, agentic storage

As we look towards 2027, the focus is shifting from manual provisioning to policy-driven storage.

The ultimate goal is a system where the storage “senses” workload requirements. For example, if an AI training container spins up, the system automatically provisions high-throughput file storage, or if a database scales up, it gets low-latency block storage.
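That policy-driven model is already foreshadowed by storage classes: platform teams publish a class per workload profile, and workloads select a profile by name rather than by hardware. A sketch with hypothetical class and driver names:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ai-training-throughput   # hypothetical: high-throughput file for training jobs
provisioner: csi.example.com     # hypothetical CSI driver
parameters:
  tier: throughput
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-low-latency           # hypothetical: low-latency block for databases
provisioner: csi.example.com
parameters:
  tier: latency
```

The “sensing” step the article describes would then amount to automation choosing between such classes on the workload’s behalf.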
