One of the rarer opportunities in the channel is a truly “greenfield” application that pulls along a massive amount of IT infrastructure. For that reason the channel as a whole should be looking to drive more adoption of Big Data analytics applications. In fact, with that goal in mind, Dell has been steadily building out a reference architecture for Hadoop that runs on top of its PowerEdge servers.
This week Dell extended that architecture to include Cloudera's distribution of Hadoop, which has been validated alongside extract, transform and load (ETL) software from Syncsort. The end goal, said Armando Acosta, Hadoop product and planning manager for Dell, is to take ETL software that traditionally has been used to move large amounts of data in mainframe environments and apply those concepts to Hadoop.
Most IT organizations have a large amount of data they have been collecting for years residing in a data warehouse. Rather than continuing to pay for the commercial software licenses associated with storing that data, many IT organizations are looking to shift that data in bulk into an open source Hadoop environment. And so that those organizations do not have to validate an ETL solution themselves, Dell is working with Cloudera and Syncsort to create a reference architecture that will prove to be the foundation for a modern data warehouse, Acosta said.
The rate and degree to which modern data warehouses based on Hadoop will replace traditional data warehouses, however, are the subject of some debate. Many IT organizations have made massive investments in existing data warehouse applications that they are not ready to part with. In those scenarios Hadoop is being used primarily as a low-cost storage environment for data that needs to move into the data warehouse to be processed. At the other end of the spectrum, a significant number of business intelligence and analytics applications that work natively against Hadoop are becoming available. As these applications become more widely adopted, the need for legacy data warehouse applications may be sharply reduced.
Whatever the ultimate outcome, one thing is certain: A lot of data will be moving in and out of Hadoop in bulk for a long time to come. As such, solution providers would do well to start thinking not only about how to stand up a Hadoop cluster but also about all the related technologies required to make that cluster operational.