As Einstein explained, the perception of how fast time is passing depends on the perspective of the observer. For a dog, a year might feel the way seven years feels to a human. In the data storage industry, change happens at a far faster rate than in many other spheres of human activity. We asked a panel of experts to compare the data storage and management challenges that enterprises faced ten years ago with those they face now. We also asked the panel to discuss how the current storage landscape and its increasingly complex challenges are influencing technology developments. As well as identifying major trends, the panel’s comments tended to confirm the saying that history doesn’t repeat itself, but it certainly rhymes.
More than one expert on our panel said the data storage challenges that IT organisations faced in 2014 are very similar to those they face today – at least at a high level. “The challenges haven’t changed much, even though the technology has. Probably the biggest was dealing with ever-increasing demands for storage capacity. The second challenge was protecting the data. Even though the intensity of ransomware attacks was not the same as it is today, data protection was still a major issue. The third challenge was not having enough staff to handle the storage workload. That staffing problem has only gotten worse since then,” said Randy Kerns, senior strategist and analyst at the Futurum Group.
Graham Breeze, vice president of products at storage system vendor Tintri, agreed, but added an important qualification. “The challenges are fundamentally the same as they were ten years ago, while the scope and scale of these challenges have dramatically changed,” he said.
Erfane Arwani, CEO at Biomemory, a start-up focused on DNA storage and synthesis, stressed the difficulties of keeping up with data growth in 2014. “Companies struggled to manage exponential data growth with technology solutions that weren’t yet optimised for large data volumes,” he said. Arwani pointed out that ten years ago, enterprise disk drive capacities ranged from only 1TB to 4TB. In the ten years since, disk capacities have soared, and the highest-capacity disk drives now hold 30TB. Meanwhile, datacentre usage of flash storage has surged, and the largest enterprise flash drives now exceed 60TB in capacity.
In 2014, enterprises were still focused on on-premises storage and used public cloud storage services to a lesser extent than they do now. “It was a matter of choosing between NAS and SAN, and cloud solutions were comparable to ice baths – beneficial but not suitable for everyone,” said Ferhat Kaddour, vice president of sales and alliances at Atempo, a supplier of data protection and management software. Ensuring sufficient overall capacity for an organisation was a multi-faceted activity. “The scalability challenge involved predicting future storage needs, optimising storage utilisation, and implementing effective storage tiering strategies,” said Drew Wanstall, vice president of business development at Scale Logic, a vendor of media production storage and workflow infrastructure.
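To make the tiering idea concrete, the sketch below shows one simple age-based policy of the kind Wanstall alludes to: files that have not been touched for a while are demoted from a fast tier to a cheaper one. It is purely illustrative – the paths, the 90-day threshold and the demote_cold_files helper are assumptions made for the example, not a description of any panellist’s product.

```python
import shutil
import time
from pathlib import Path

# Hypothetical tier locations -- placeholders, not a real product layout.
HOT_TIER = Path("/mnt/hot")      # e.g. a flash-backed volume
COLD_TIER = Path("/mnt/cold")    # e.g. high-capacity disk or an object-storage gateway mount
AGE_THRESHOLD_DAYS = 90          # demote files untouched for roughly three months

def demote_cold_files() -> None:
    """Move files not accessed within the threshold from the hot tier to the cold tier."""
    cutoff = time.time() - AGE_THRESHOLD_DAYS * 86400
    for path in HOT_TIER.rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            target = COLD_TIER / path.relative_to(HOT_TIER)
            target.parent.mkdir(parents=True, exist_ok=True)
            # A real tiering system would typically leave a stub or symlink behind.
            shutil.move(str(path), str(target))

if __name__ == "__main__":
    demote_cold_files()
```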
Fast forward to now, and data is still expanding at a very rapid rate. “It’s interesting to see how data keeps growing at a crazy pace,” said Enrico Signoretti, vice president of products and partnerships at Cubbit, a vendor of geo-distributable cloud storage systems. Valéry Guilleaume, CEO at Nodeum, a supplier of data management software, identified some of the new sources of data that are perpetuating this growth and have already ushered in the era of so-called Big Data. “Today, it’s not just users that are generating data, but also the systems being developed within each industry, for example: data-generating cars, electronic microscopes, blade scanners, or seismic sensors. These new sources are creating data at a speed that is incommensurate with the data-generating sources of ten to fifteen years ago,” he said.
However, the difficulties of scaling up physical storage capacity to keep up with data growth have been eased, at least to some extent, by the increased use of public cloud storage and by improvements in data storage technology. Among the last ten years’ technology developments, the most notable has been the enormous reduction in the price of flash memory, which has led to the widespread use of flash in enterprise datacentres. “Capacity demand continues, but the scale and performance of flash allow for greater consolidation and fewer physical systems, less power/cooling/space demands, and simpler means for addressing performance,” said Kerns. “The technology to address problems is available and more effective than ten years ago. Having the staff to take advantage of it is the big issue.”
Although some panel members believed that storage scalability remains a major problem, Kerns’ view was echoed by other industry analysts. “More data does make management more complex, but less so than it did in the past. Storage solutions are far more scalable than they used to be. The challenge of data explosion, especially in AI, is finding the right data, getting it in the right clean format, and leveraging it as quickly as the organisation wishes. The challenge today isn’t storing data as much as it is using data,” said Scott Sinclair, practice director at analyst firm the Enterprise Strategy Group (ESG).
David Norfolk, practice leader at analyst firm Bloor Research, said: “The technical issues of ten years ago have largely gone. Storage is now cheap, reliable and easy to scale. But storage management – including threat management – is now a source of cost.”
The threats that Norfolk referred to include cyberattacks, which have grown significantly in number and intensity over the last decade, according to multiple experts on our panel. “Security is clearly today’s top data storage challenge. While there have always been security threats from malicious actors and users, today’s issues are indeed harder and more expensive to address, as a result of the well-organised and funded ransomware actors, often from state-sponsored groups,” said Paul Speciale, chief marketing officer at cloud storage software vendor Scality.
“With the ongoing ransomware boom and the emergence of malicious AI tools and as-a-service cybercrime models, data protection is at the forefront of storage challenges today. Breaches are not only more frequent, but they also pack a more powerful punch with improved tactics like double (and triple) extortion and the more recently observed dual-strain attacks,” said Sergei Serdyuk, vice president of product management at Nakivo, a supplier of backup, ransomware protection and disaster recovery solutions.
That is not the only change in the IT landscape that has driven up storage management costs. Ten years ago, data growth was being driven by the overall digitisation of business and by the increasing use of analytics. Now it is also being driven by the need to collect data to train AI and machine learning systems and, as Guilleaume described, by the growth of the Internet of Things (IoT) as a data source. Although the term IoT was coined in the 1990s, it is only over the last ten years that it has become a commonplace reality. At the same time, enterprises have been storing more unstructured data, such as video and text, which now accounts for the majority of the data they hold. Unlike structured data, unstructured data is not organised according to a pre-defined database schema, making it far harder to manage.
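As a simple illustration of that difference, the sketch below contrasts a structured record, whose fields are declared in a schema and can be queried directly, with the same business fact arriving as free text that has to be extracted and validated after the fact. The table, note and field names are invented for the example.

```python
import json
import sqlite3

# Structured: the schema is declared up front, so every record has known, queryable fields.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
db.execute("INSERT INTO orders VALUES (1, 'Acme Ltd', 99.50)")
print(db.execute("SELECT customer, total FROM orders").fetchall())

# Unstructured: the same fact as free text -- there is no schema to query, so any
# 'customer' or 'total' has to be extracted (and validated) downstream.
note = "Spoke to Acme Ltd today, they confirmed the order, roughly a hundred pounds all in."
record = {"raw_text": note, "source": "crm_note", "needs_enrichment": True}
print(json.dumps(record, indent=2))
```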
“Today, it’s akin to navigating a vast ocean of big data. From customer interactions to sensor data collected, even smaller entities handle petabytes, and the larger ones, exabytes. The difficulties lie not only in the sheer amount of data but also in the strategic tactics needed to extract, categorise, and safeguard it,” said Kaddour. Norfolk at Bloor Research named a critical data attribute that is challenging to achieve when using unstructured data: “Quality, now that data comes from a swamp instead of a proper database.”
Edge computing and the use of public clouds as part of hybrid computing strategies have also complicated data storage. “Managing data at the edge efficiently has become crucial. Ensuring data availability and resilience in distributed environments presents new challenges,” said Johan Pellicaan, vice president and managing director at Scale Computing, a vendor of edge computing, virtualisation, and hyperconverged solutions.
As well as securing data at the edge, enterprises must also be able to move data between multiple locations. “Today’s challenges are all related to the movement of data across multi- and hybrid-cloud environments. Around 50% of organisations identify that they move data between on and off premises environments ‘all the time’ or ‘regularly’. These issues are more difficult to address because of how disparate the environments are when your data spans AWS, Azure, GCP, the datacentre, edge, etc,” said Sinclair at ESG.
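As a rough illustration of the kind of plumbing Sinclair describes, the hedged sketch below uses Python and the boto3 library to stage an object from an on-premises S3-compatible store and re-upload it to an AWS S3 bucket. The endpoint, credentials and bucket names are placeholders, and a production pipeline would stream the data and verify checksums rather than copy via a temporary file.

```python
import boto3

# Illustrative endpoint, credentials and bucket names -- placeholders, not real infrastructure.
onprem = boto3.client(
    "s3",
    endpoint_url="https://s3.onprem.example.local",  # on-premises S3-compatible store
    aws_access_key_id="ONPREM_KEY",
    aws_secret_access_key="ONPREM_SECRET",
)
aws = boto3.client("s3")  # credentials picked up from the usual AWS environment/config chain

def copy_object(key: str) -> None:
    """Stage an object from the on-premises store and re-upload it to an AWS S3 bucket."""
    local_path = f"/tmp/{key.replace('/', '_')}"
    onprem.download_file("datacentre-archive", key, local_path)   # hypothetical source bucket
    aws.upload_file(local_path, "cloud-analytics-landing", key)   # hypothetical target bucket

copy_object("projects/2024/sensor-batch-001.parquet")
```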
Data movements and the need for interoperability across multiple computing venues are not the only complications created by public cloud computing. “Since public clouds are one of the main solutions for keeping the majority of organisations’ data, the dependency on these external vendors for business continuity, or even other more important sovereignty-related matters, is now a growing challenge,” said Ricardo Mendes, co-founder and CEO at Vawlt, a vendor of storage and security software. Other experts on our panel also named data sovereignty as a challenge for businesses using public clouds. Signoretti at Cubbit said: “Navigating complex data sovereignty regulations, such as GDPR and NIS2, adds a layer of complexity for businesses.”
Public cloud SaaS services have also introduced new locations in which data must be protected. “One big difference today is in the number of different places where companies house critical data. This is particularly apparent when you look at the increased use of SaaS applications. The average midsize company uses over 200 SaaS applications, but there are very few options available to deliver enterprise-class data protection that can scale to protect those applications and provide rapid, granular recovery,” said Kim King, senior director of product marketing at backup software vendor HYCU. According to King, over 50% of successful ransomware attacks begin by targeting SaaS applications.
Kerns confirmed this view of SaaS data protection. “Meeting the same enterprise requirements for protection of information assets in the public cloud as on premises has been a learning experience that requires effort, and, usually, new software solutions,” he said. Hinting that enterprises should learn from others’ mistakes, he added: “There have been cases where some believed this effort was not necessary for data in a public cloud.” Note his use of the past tense in that statement.
But while public clouds have introduced challenges, multiple members of our panel said the advantages they have delivered include the democratisation of technologies, to the benefit of smaller businesses. As one example, Norfolk at Bloor Research said: “There used to be a huge difference between big firms with proper databases and small firms with data stores that didn’t support ACID [Atomicity, Consistency, Isolation and Durability]. Cloud technologies have evened this up a lot.”
We asked our experts how today’s challenges are changing the storage technologies and services offered by vendors. Security challenges are being addressed with yet more sophisticated defences against cyberattacks, according to Serdyuk at Nakivo. “Vendors are incorporating advanced encryption mechanisms, access controls, and compliance features into their solutions. Many offer secure enclaves and hardware-based security to address the evolving threat landscape. However, many storage solutions remain lacking in terms of comprehensive backup and recovery tools,” he said.
The need to extract and categorise data from diverse sources is driving the development of software tools that automate that process. Serdyuk said: “Management tools like metadata tagging, version control, and analytics capabilities are gaining traction.” Guilleaume at Nodeum said: “Emerging solutions providing data analysis now make it possible to make data talk, and to extract metadata from it in a way that is incomparable with what was possible in the past.”
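A minimal example of the metadata extraction idea is sketched below: it walks a hypothetical file share and emits one catalogue record per file with its size, timestamp, guessed content type and a checksum. Real data management products go much further, but the sketch shows the basic shape of automated tagging; the directory path is an assumption for the example.

```python
import hashlib
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path

def extract_metadata(path: Path) -> dict:
    """Collect basic technical metadata for a file: size, timestamp, guessed type and checksum."""
    stat = path.stat()
    return {
        "path": str(path),
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "mime_type": mimetypes.guess_type(path.name)[0] or "application/octet-stream",
        # Reading the whole file is naive for very large objects; fine for a sketch.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }

# Walk a (hypothetical) project share and emit one JSON record per file for a catalogue or index.
for file_path in Path("/data/project_share").rglob("*"):
    if file_path.is_file():
        print(json.dumps(extract_metadata(file_path)))
```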
Meanwhile, enterprises also require data management software that supports hybrid and multi-cloud infrastructures. “Vendors who recognise this are developing solutions that support easy integration with various cloud providers, on-premises infrastructure, and mixed configurations. They are also offering tools for seamless data migration and synchronisation across different environments,” said Serdyuk.
“There is a push for consistency of technology across environments. Some vendors are putting their technology in the cloud,” said Sinclair. One example of such vendors is NetApp, whose on-premises storage and data management software is now also incorporated into the AWS, Microsoft Azure, and Google Cloud public clouds. “Others are integrating third-party technologies like VMware or Red Hat OpenShift that can be deployed in multiple locations,” Sinclair added.
On the complications of maintaining data sovereignty and complying with the multiple regulations that can apply when data is stored across several public clouds and countries, Signoretti at Cubbit said: “Vendors are prioritising sovereign solutions for regulated industries like healthcare and the public sector, emphasising compliance in regions such as EMEA and APAC. Though still subject to the CLOUD Act, Microsoft and AWS recently introduced sovereign cloud storage offers.” The CLOUD (Clarifying Lawful Overseas Use of Data) Act is US legislation enacted in 2018 that gives US and certain non-US authorities investigating crimes the right to access enterprise data held by service providers.
On a technical front, Craig Carlson, an advisor to the technical council of the Storage Networking Industry Association (SNIA), referred to the need to provide AI systems with fast access to data. “AI is currently being addressed by looking at what can be done to bring networks to their highest performance while also being highly scalable. This work is ongoing in groups such as Ultra Ethernet,” he said. A body called the Ultra Ethernet Consortium is developing an architecture that it says will make Ethernet as fast as current supercomputing interconnects, while being highly scalable and as ubiquitous and cost-effective as current Ethernet, and backwards compatible. Members of the heavily backed consortium include AMD, Arista, Broadcom, Cisco Systems, Huawei, HPE, and Intel.
Our final questions for our experts were about the future challenges they expect enterprises to face as data volumes continue to grow, especially in the context of AI and machine learning. A consensus view about the relationship between data size and management difficulties was typified by Breeze at Tintri, who said: “More data absolutely drives increasingly complex challenges related to storage. Data growth stretches demands in every dimension, illuminating the need for more leverage – the proverbial ‘do more with less’.”
The much-needed bigger levers are likely to come from advances in data management systems – the metadata tagging, version control, and analytics capabilities referred to by Serdyuk. Norfolk suggested that AI will drive this and other advances. “I suppose the big issue today is the AI industry and its appetite for data – and the sustainability and resource cost of vast amounts of data, even if each individual bit is cheaper to store,” he said. “Data quality will be a huge challenge. Decisions shouldn’t be based on outdated, incorrect, or biased data. AI, in particular, doesn’t cope well with training on biased data.”
AI is also set to drive advances in data mobility, according to Guilleaume, who said: “AI/ML will further accelerate the need for data mobility between the levels where data is stored and where it is analysed.” These storage management and mobility advances may not be restricted purely to AI usage. Carlson at SNIA said: “There’s always a trickle down in technology. So, technologies being developed now for the highest-end AI datacentres will become more mainstream in a few years.”
Norfolk was not the only expert to refer to sustainability. Roy Illsley, chief analyst at research firm Omdia, said: “I think the big question is: how can storage and all the data we have be as ‘green’ as possible? At some point we either have to change our lives and way we do things, or technology rides to the rescue. I think it will be a combination of these two, which means we need to work out how we can generate less data or be more precise about what data we have.”
Arwani also named the environmental impact of storage, particularly in terms of CO2 emissions and energy use, as a current storage challenge, alongside platform interoperability and security. He cited an estimate by the International Energy Agency (IEA) that datacentre electricity consumption in 2022 was around 1% to 1.3% of global demand. The IEA has also predicted that datacentre energy consumption could rise three to four-fold by 2026. Arwani said: “These problems are more costly and complex to solve, as they require not only technological advances, but also awareness and changes in data governance.”
On the hardware side, Carlson noted that the flash technology curve appears to be running out of steam, as it has become much harder for flash chip makers to reduce costs by packing yet more data bits into each flash memory cell. “What will be the next technology to bring reliable high performance to storage in the next ten to 20 years?” he asked. “Long-term usage of the current tape-disk-flash model may not be feasible. Hence the development of new (and still highly experimental) technologies such as DNA storage,” he said.
Not surprisingly, Arwani at Biomemory suggested that DNA storage will indeed be the solution. “Suppliers are developing greener solutions, such as helium hard disks that reduce energy consumption, or DNA storage technologies such as those being developed by Biomemory and Catalog DNA. These technologies promise a storage density of one exabyte per gram and a durability of several millennia. What’s more, they open up the possibility of new use cases, such as the first space datacentres.” If that last prediction comes true, remember that you read it here first.