In this special guest feature from the ISC Cloud and Big Data conference, Tom Wilkie from Scientific Computing World writes that Big Data and cloud computing are thriving in commerce and business, but scientific and engineering applications appear to be lagging behind.
Politics, in the form of inconsistent national data protection laws, and commerce, via the licensing policies of Independent Software Vendors (ISVs), are inhibiting the uptake and use of cloud services in science and engineering, the ISC Cloud and Big Data conference in Frankfurt was told in September.
The advent of the cloud may pose as monumental a challenge to ISVs’ traditional way of doing business as the science publishing industry has had to face when researchers moved away from reading subscription-based journals to demanding pay-per-view access to individual papers in the scientific literature. Indeed, the analogy can be pushed further, with open source software potentially being as threatening to ISVs as Open Access publishing is to the great science publishing houses. (It is not an easy transition for the science publishing industry, as chronicled by the up-to-date reports and comment in our sister publication, Research Information.)
In business and commercial computing, momentum towards cloud and big data has already built up to the point where it is unstoppable. In technical computing, the growth of the Internet of Things is pressing towards convergence of technologies, but obstacles remain: HPC and big data have evolved different hardware and software systems, while OpenStack, the open source cloud computing platform, does not work well with HPC.
In a frank and outspoken criticism of poor standards of data protection in the UK and Ireland, the meeting heard the head of IT operations at a major German bank warn that no German financial institution would trust a data centre or cloud service where its data might end up being transferred to the UK or Ireland. Judging by the themes emerging at the conference, the US Patriot Act, the disarray between European nations over data protection, and the financial policies being pursued by some of the software and service vendors appear to be obstacles to the deployment of public cloud and big data technologies that are just as important as any technical issue.
Dr Jan Vitt, head of IT Operations at DZ Bank, the fourth largest bank in Germany, told the meeting that Germany’s stringent data protection laws meant that if the bank wanted to go ‘off premise’ on a cloud application ‘we want to do so in Germany’. (The meeting took place before a recent European Court of Justice ruling that places even more stringent controls over data to be exported from the EU, so it was not possible to explore the judgement’s implications for scientific and business use of the cloud.)
Can data protection be computerized?
The disarray of European national privacy laws was highlighted again towards the end of the meeting when Fabio Martinelli, of the Italian National Research Council, discussed the Coco Cloud. This is a project funded by the European Commission to create ‘Confidential and compliant clouds’ (hence Coco Cloud). The researchers have had to go to extraordinary lengths to try to create a cloud whose operating software would automatically enforce data usage policies that were acceptable to the different legal jurisdictions within Europe.
The idea behind the Coco Cloud project is to allow European cloud users to share their data securely and privately in the cloud. From the point of view of the European Commission, this will increase the trust of users in cloud services and thus encourage the more widespread adoption of cloud technologies, with consequent benefits for the users and for Europe’s digital economy in general. Although various EU directives were supposed to have harmonized data protection across the European Union, Martinelli pointed out that ‘in Europe, we have to deal with 28 countries’. Integrating all the legislative constraints is difficult for software engineers trying to write them into code for data usage control, not least, he remarked, because there is a lot of ambiguity in natural language and ‘lawyers make a profit out of ambiguity’.
Although the project is receiving public funding from the European Commission, Martinelli pointed out that ‘We started with a real demand from big corporations across Europe to build a concrete framework for the creation, analysis and termination of Data Sharing Agreements.’ The pilot projects are working well, he continued, though ‘cooperation between lawyers and ICT experts has been challenging.’
As well as the actual Coco Cloud, the project has also generated advice to the Commission, which is now considering several documents on cross-border aspects of data protection. Moreover, HP wants to integrate the results of the project into the Cloud28+ initiative that it is sponsoring, Martinelli said. Cloud28+ is an attempt to provide a catalogue of services and service providers available in Europe and thus accelerate the adoption of cloud technologies in Europe. It is notable that the Cloud28+ website prominently displays a statement on the importance of having a “strong focus on compliance with the European rules on data privacy and security.”
Engineering workflow in the cloud
But considerations of data protection aside, existing cloud services tend not to reflect the workflow in engineering companies, Daniel Weber, the deputy head of the department of Interactive Engineering technologies at the Fraunhofer Institute for Computer Graphics in Darmstadt, told the meeting.
Engineering workflows are complex, he continued, using different techniques such as CFD and simulation and hence a variety of software packages that need to talk to each other. In contrast, most cloud solutions provide only single access to the cloud for high-performance computing.
“Our idea is to go beyond isolated use to seamless engineering workflows in the cloud,” he said. He is working on another EU-funded project, called CloudFlow, which aims to make it easy to pass data from one simulation or software package to another. The challenge is to ensure that the data exchange is based on accepted standards. (There is an unfortunate clash of names: the European CloudFlow project is an open platform, while there is a US commercial venture also offering cloud-related services but in a very different, non-engineering context at http://www.cloudflow.net/ .)
The cloud-related components are based on OpenStack, but there are known issues with using OpenStack for HPC and so the project had to create an “abstraction layer” to get round the problem, which may mean that each HPC centre has to tailor its own implementation. Weber cited the example of a small German engineering company, Stellba Hydro, which conducts maintenance of water turbines but which wanted access to HPC to design and simulate flow systems – not least to check the safety of water-powered electricity generating stations that had been installed decades earlier.
Weber believes that implementing a cloud-based engineering workflow will not only open up HPC to small and medium-sized companies, it will also broaden the range of customers that the ISVs can attract. However, he continued, in his view the ISVs “are really struggling” with the issue of pricing on a pay-per-use basis. “ISV licensing is complex and difficult,” he said. “I look forward to the transition to pay per use.”
There are other efforts to make it easier to access the cloud, and find a way through the competing claims of different cloud providers. One commercial initiative to provide help and support to potential users in finding the right cloud service for them is a German start-up company, Ascamso. In an interview, its co-founder, Jan Thielscher, pointed out that it was conducting thousands of tests on cloud providers, to study their capabilities and provide insight into which one was appropriate for a customer’s needs. In addition to assessing capabilities and performance, it also looks at pricing, in order to deliver an overall service provider rating and clear price comparisons. In an echo of Weber’s point about engineering workflows, Thielscher stressed that scientific and engineering use cases differ from those of commercial and business users of the cloud, and Ascamso’s assessment process takes that into account.
Convergence of Cloud, Big Data and HPC
In the view of Stephan Gillich, Intel’s Director of Technical Computing for EMEA, the boundaries between the cloud, big data, and high-performance computing are dissolving, and the technologies are converging on each other. There is an overlap between simulation and analytics, he said, such that modern life is characterized by “pervasive analytics,” especially in transport, the life sciences, and manufacturing. However, the problem was that a distinction between data and simulation was built in at the systems level: simulation uses Fortran or C++ as the programming language whereas data analytics uses Java or Hadoop; the file system software is different, as is the resource management software. Simulation is compute and memory focused, whereas data analytics is storage focused, he said.
Intel was pursuing this issue of a converged architecture for HPC and big data, and he envisaged a future in which a cluster’s resource manager was both HPC and big data aware; storage was Lustre with a Hadoop adaptor; and the hardware was both compute and big data capable. The future would lie with ‘in-memory computing’, he said, and memory technology would be a key enabler of this converged computing world. He stressed that Intel’s range of products encompassed much more than just processors and pointed to the new 3D XPoint technology, announced by Intel and Micron at the end of July. In the original launch announcement, the two companies called it the first new mainstream memory chip to come to market in 25 years. According to Gillich, the chips sit in between SSD and DRAM as a transistorless, non-volatile big memory: 100 times faster than SSDs but with 10 times the density of DRAM.
Driving Towards Convergence
The conference heard two examples of the convergence of big data, HPC, the cloud, and the Internet of Things that well illustrated the pressures in favor of the sort of convergence of technologies outlined by Gillich. Both examples, as it happened, related to transport. While they were vivid illustrations that the Internet of Things is not just marketing hype but already a reality, the ‘Things’ about which data was being reported and analyzed were actually people: how they were moving on the streets and roads, with the data transmitted in the one case by their mobile phones and in the other by their motor cars.
Michal Piorkowski, Head of Big Data Insights at Swisscom, reported how powerful data analytics applied to the data already being routinely collected about the location of individuals’ mobile phones was helping local government in Switzerland develop ‘smart’ urbanization. Switzerland faced the prospect of becoming a megalopolis with rampant urbanization and demands for a higher quality of life, all of which had to be met out of limited local government budgets.
The telecoms companies already have the mobile network providing monitoring data that is constantly updated, he pointed out. The scale of the data gathering is immense: 20 billion events each day. By adding powerful data analytics, “We turn it into mobility data for urban specialists to implement interactive city planning.” In real time, the data from mobile phones could produce information about the movement of crowds through the cities and about movements of people from one city to another. If machine learning was added to the data analytics, he said, it was possible to predict traffic jams before they occurred. For longer term urban planning purposes, the system could provide data on a daily basis on the scale of the whole country, whereas traditional survey-based techniques yielded only a single snapshot every four years or so.
The automotive industry is working towards an agreed protocol that would allow data from cars to be transmitted back to a central point, using car-to-car communication (even between cars from rival manufacturers). Again, one benefit would be advance information about traffic jams, but the major application will be in driverless cars.
Andreas Sasse, head of mobile services and data at Volkswagen Research, explained that it was not enough to surround a driverless car with sensors – laser range finders, IR scanners, and radar – to scan its neighborhood; it also needed to communicate with other cars to understand the road conditions further ahead. Moreover, it needed to have a detailed map of the roads, down to the precise position of the lanes on a multi-lane highway as well as of junctions and intersections. A modern car already has 15 map-related functions incorporated within it, he said (thus going far beyond SatNav). However, satellite navigation data tend to be updated at best once a year, whereas a driverless car would require a detailed lane model that was updated hourly, if not by the minute, and that would have to be highly reliable.
It was a huge challenge that had to be solved, he said: the map needs to be digital but it also needs to be a ‘learning map’. “The cars themselves can give us feedback. Sometimes we can process data and send it back to the cars, but often we need to get back to the map compiler.” The major German car manufacturers recently bought a SatNav company to push the mapping forward, collectively – it clearly makes sense for all the cars to have the same map and so mapping will not be a commercial differentiator in the era of driverless cars. Similarly, car-to-car communication is not going to be proprietary, he said. He estimated that about 11 GB would be generated per car per day. The cars can talk to each other over a WLAN-type system with a range of about two kilometers, but because of the common protocol, an Audi, for example, does not need to wait for another Audi to be within range before it can transmit.
Sasse said that VW at present did not want to cooperate with either Google or Apple, both of whom are developing driverless cars. “They have an established ecosystem and we would end up dependent on them,” he said. He discounted Google’s highly publicized efforts: “I’m less worried by Google than by Apple’s urban electric car.” He pointed out that Google is a computing company whose business is software rather than things, and that it was too far away from scalability and from an actual car.
It was a nice reminder, particularly in a gathering such as the Cloud and Big Data conference, that these IT technologies are not ends in themselves, but rather tools that serve other purposes, involving people and physical objects.