Google Open Sources Dataflow Analytics Code through Apache Incubator

Google is open-sourcing more code by proposing to contribute Cloud Dataflow to the Apache Software Foundation. The move, a first for Google, opens new cloud-based data analytics options and integration opportunities for big data companies.

Cloud Dataflow is a platform for processing large amounts of data in the cloud. It features an open source, Java-based SDK, which makes it easy to integrate with other cloud-centric analytics and big data tools.
The platform’s main value for big data operations is that it stays compatible with new technologies as they emerge while still fitting into existing workflows. That saves organizations from having to revamp their analytics infrastructure or code each time a new data processing framework appears.

Although the Dataflow SDK has been open source for more than a year, Google took the bigger step this week of proposing to turn the platform into an Apache Incubator project. That move paves the way for Dataflow’s codebase to eventually become a full-fledged Apache Software Foundation project.

Google has partnered with Cloudera, data Artisans, Talend, Cask and PayPal in issuing the proposal. Those partners are already celebrating it. If approved, as seems likely, the proposal will make it simpler to build Dataflow’s scalability and integration features into commercial big data platforms in an open source, vendor-neutral way.

Talend, for instance, had this to say: “Developers leveraging the Dataflow framework won’t be ‘locked-in’ with a specific data processing runtime and will be able to leverage new data processing framework[s] as they emerge without having to rewrite their Dataflow pipelines, making it Future-proof.”

For the channel, Google’s proposal means the cloud and big data are set to grow closer together — and that it will be easier for open source big data companies to keep the future of data analytics open.