To succeed in today’s data-rich and data-centric world, companies are building new, high-value applications on top of NoSQL, Hadoop and other modern data platforms. According to IDC, the big data market will reach $48 billion by 2019. At the same time DevOps processes are rapidly penetrating the Global 2000, impacting the very companies that are adopting these new data platforms. These teams and their processes are now responsible for managing data infrastructures that are orders of magnitude larger than anything companies have dealt with previously. As a result, big data, DevOps and data management are rapidly intersecting, and the speed at which groups are expected to support this new world order and launch new applications raises a new set of challenges, considerations and questions, including:
How do data management principles change in the world of Big Data?
How can agility and security co-exist in modern data environments?
Let’s address each of these issues in more detail.
The Implications of Scale
Big data applications run on scale-out architectures that can reach thousands of nodes and petabytes of data. For example, Apple deploys Apache Cassandra across at least 75,000 nodes and also deploys other big data platforms to power a number of its consumer-facing applications. This scale has a number of implications for data management principles, including those around backup and recovery. A single human error that accidentally deletes tables can result in the loss of hundreds of terabytes of data, not just hundreds of gigabytes. At this scale, data sets take far longer to rebuild, which increases the opportunity cost of the time spent rebuilding, not to mention the business impact of the lost data itself. To put it another way, an accidental data loss can mean that dozens of engineers need to halt other business-critical projects to rebuild the lost data set, a multi-million dollar bill in revenue loss and opportunity cost.
DevOps teams often have service level agreements (SLAs), whether internal or external. As a result, they often have to rethink assumptions about data recovery time that were built around traditional application data sets, and work out how to change the underlying recovery architecture to support these SLAs.
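To make the recovery-time point concrete, here is a rough back-of-the-envelope sketch in Python. The function names and the throughput and SLA figures are illustrative assumptions, not measurements from any real cluster; the only thing the sketch shows is how restore time grows with data size when aggregate restore throughput stays fixed.

```python
def restore_hours(data_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to restore data_tb terabytes at a sustained
    aggregate throughput (decimal units: 1 TB = 1000 GB)."""
    seconds = (data_tb * 1000) / throughput_gb_per_s
    return seconds / 3600


def required_throughput_gb_per_s(data_tb: float, sla_hours: float) -> float:
    """Aggregate throughput needed to restore data_tb within sla_hours."""
    return (data_tb * 1000) / (sla_hours * 3600)


def meets_sla(data_tb: float, throughput_gb_per_s: float, sla_hours: float) -> bool:
    """True if the estimated restore time fits inside the recovery SLA."""
    return restore_hours(data_tb, throughput_gb_per_s) <= sla_hours


if __name__ == "__main__":
    # Hypothetical numbers: 1 GB/s of restore throughput, 4-hour recovery SLA.
    for size_tb in (0.3, 300):
        hours = restore_hours(size_tb, 1.0)
        print(f"{size_tb} TB -> {hours:.2f} h, meets SLA: {meets_sla(size_tb, 1.0, 4)}")
```

Under these assumed numbers, a 300 GB data set restores in about five minutes, while a 300 TB data set takes more than three days, so an SLA that was easy to meet for a traditional application would need roughly 20 GB/s of restore throughput at the larger size. Real restores also involve rebuilds, repairs and validation, so treat this as a lower bound.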
The Importance of Self-Service
Companies that employ DevOps principles focus on rapid deployment frequency. In a big data world, this means figuring out how best to give engineering or data science teams self-service access to production data sets to facilitate rapid application iteration. Waiting for data, or for custom scripts to be written every time data is needed, is completely antithetical to the DevOps movement.
On the other hand, data often contains confidential or personally identifiable information, and data breaches remain common and costly. A 2015 report by the Ponemon Institute put the average consolidated total cost of a data breach at $3.8 million, which represented a 23 percent increase since 2013. Consumer trust is also important, and consumers are increasingly aware of how easily hackers can access their data.
Self-service access, therefore, has to be paired with appropriate protection for personally identifiable or other confidential information, whether that’s in the form of data masking or data encryption, or both. Most people don’t naturally think about data masking as part of their “always-on” data strategy, but without it you run the risk of compromising the trust of your users. Even a “small” breach of data can significantly impact the reputation and viability of a business.
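As one illustration of what masking can look like in practice, here is a minimal Python sketch that replaces sensitive fields with keyed-hash tokens before a record is handed to a self-service consumer. The field names, the `mask_record` helper and the key handling are assumptions made for the example. Keyed hashing (HMAC) is only one of several masking approaches, and it is pseudonymization rather than encryption: the original value cannot be recovered from the token.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as personally identifiable.
PII_FIELDS = frozenset({"email", "ssn"})


def mask_value(value: str, key: bytes) -> str:
    """Replace a sensitive value with a deterministic keyed-hash token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened token; the full digest is 64 hex characters


def mask_record(record: dict, key: bytes, pii_fields=PII_FIELDS) -> dict:
    """Return a copy of record with PII fields masked and other fields untouched."""
    return {
        name: mask_value(str(value), key) if name in pii_fields else value
        for name, value in record.items()
    }


if __name__ == "__main__":
    key = b"example-only-key"  # assumption: a real key would come from a secrets store
    row = {"user_id": 42, "email": "jane@example.com", "plan": "pro"}
    print(mask_record(row, key))
```

Because the same input and key always produce the same token, masked data sets can still be joined on the masked column, which keeps them useful for development and analytics while the raw values stay out of reach.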
At the heart of innovations across markets like IoT and mobile, and industries such as retail, banking and healthcare, is data. Refreshingly, data is also increasingly understood as the currency that drives the value of the companies that use it optimally. As companies continue to migrate off legacy systems in favor of platforms designed to support today’s application needs, they must also plan to ensure that issues around scale and security are fully considered and addressed. These are top-of-mind issues for DevOps teams, and a focus on the entire application lifecycle is key to modern data management. The right planning has a big upside, and the risks related to lost or compromised data are far too great to ignore.