All over the world, we can witness a surge of interest in Big Data technologies. This interest stems from the constant growth of the data volumes that large companies have to manage and process. Many organizations consider their accumulated data an important asset; however, it is becoming more and more difficult to process it and extract the necessary information.
Big Data refers to amounts of raw information so huge that they cannot be processed with standard software and hardware tools. Storing Big Data is another important challenge. A good example is the data generated by large physics facilities such as the Large Hadron Collider.
Recognizing the importance of this trend, the University established the Big Data Analytics and Technologies Laboratory. The laboratory team set out to develop, within two years, a system for storing large amounts of information and to test it during one of the CERN experiments. The system allows scientists to receive large amounts of information almost instantly, whereas previously this took hours or days. The laboratory is supervised by Alexey Klimentov, a renowned scientist in the field of modern methods of data collection, storage, processing, and analysis for experiments and mega-science projects, and head of the physics software research group at Brookhaven National Laboratory (USA).
«Science, and production alike, are now in dire need of new systems and architectures for storing large amounts of information. Just imagine: the ATLAS experiment involves about 3,000 scientists, and it is not just a big experiment, it is a huge one! We work with 160 petabytes of data; Google, for example, works with about 180. The existing approaches to information storage can no longer offer us a suitable option. Our system allows information to be retrieved in a few seconds, where previously it could take hours».
— Alexey Klimentov, supervisor of the Big Data Analytics and Technologies Laboratory
At CERN, particularly in the large ATLAS experiment, he oversees issues related to data processing and supercomputer development. During their visit to TPU, CERN representatives interviewed University students, and the best of them were offered internships at the center. According to Valery Parubets, one of the interns at the ATLAS IT department, the main task of the University team in the experiment was the analysis and storage of Big Data.
He underlines that Big Data is now a key area of information technology development. It is a set of methods, approaches, and tools for processing large structured and unstructured datasets whose volume grows every year. The classical processing methods used five to ten years ago are no longer adequate: they cannot cope with such amounts of information, so IT specialists are looking for new ones. CERN is the best place to tackle this global problem, and about a dozen TPU specialists are currently working there on the experiments.
Most particle collisions at CERN are already well understood; what matters to the scientists is capturing the unusual collisions that can support or refute a given hypothesis. These unusual cases account for about 1% of the total amount of data produced by the collider. Even this figure is very high, since researchers keep conducting new experiments and receiving new data. By a rough estimate, processing all the collected information would take 100 years or more of continuous computation at current capacities.
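The kind of selection described above, keeping roughly 1% of events, can be illustrated with a minimal sketch. The event structure, the "energy" field, and the threshold are all invented for illustration; the real ATLAS trigger system applies far more sophisticated physics criteria.

```python
import random

def is_unusual(event, energy_threshold=990.0):
    """Toy selection rule: keep only high-energy events.
    The threshold and the 'energy' field are illustrative,
    not actual ATLAS trigger criteria."""
    return event["energy"] > energy_threshold

random.seed(42)
# Simulated stream of collision events with a random "energy" reading
# uniform on [0, 1000), so about 1% of events exceed the threshold.
events = [{"id": i, "energy": random.uniform(0.0, 1000.0)}
          for i in range(100_000)]

kept = [e for e in events if is_unusual(e)]
print(f"kept {len(kept)} of {len(events)} events "
      f"({100 * len(kept) / len(events):.2f}%)")
```

Even this tiny retained fraction grows without bound as new collisions are recorded, which is why the article's rough "100 years of computation" estimate arises.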
Scientists therefore need to work out how to optimize the necessary calculations. That is the goal of the University interns in the ATLAS experiment: improving the processes for analyzing and storing Big Data.
Meanwhile, all information collected by the Large Hadron Collider awaits processing. The data also has to be stored, and CERN solved this problem by developing the LHC Computing Grid, a global computer network distributed around the world. The Grid includes 170 computer centers in 36 countries that continuously receive, store, and process information from CERN.
«My goal is to upgrade the data distribution system and improve its algorithms: in other words, to decide what should be sent and where. Most CERN staff will probably notice nothing, but this work will spare the experiment participants constant consultations with experts about how to obtain the data they need. For an IT specialist, it is a very interesting challenge to operate with such a large amount of data».
— Valery Parubets, ATLAS internship participant
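Deciding "what should be sent and where" is, at its core, a placement problem. The sketch below shows one toy rule: among sites with enough free storage, send the dataset to the one with the lightest backlog. The site names, capacities, and the heuristic itself are all hypothetical and are not the actual algorithm of the ATLAS data distribution system.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_tb: float      # free storage capacity, in terabytes (hypothetical)
    queued_jobs: int    # current transfer/processing backlog (hypothetical)

def choose_site(sites, dataset_tb):
    """Toy placement rule: among sites that can hold the dataset,
    pick the one with the smallest backlog."""
    candidates = [s for s in sites if s.free_tb >= dataset_tb]
    if not candidates:
        raise RuntimeError("no site can hold the dataset")
    return min(candidates, key=lambda s: s.queued_jobs)

# Illustrative sites; the names only echo real tier naming conventions.
sites = [
    Site("CERN-T0", free_tb=50, queued_jobs=120),
    Site("BNL-T1", free_tb=300, queued_jobs=40),
    Site("TRIUMF-T1", free_tb=10, queued_jobs=5),
]

best = choose_site(sites, dataset_tb=25)
print(best.name)  # TRIUMF-T1 lacks space, so the lighter-loaded BNL-T1 wins
```

A real system would also weigh network bandwidth, data popularity, and replica counts, which is precisely the kind of algorithmic tuning the quote refers to.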
The University strives to become an organizer of major scientific events in this area. In early December 2016, TPU, jointly with the National Research Center Kurchatov Institute and the RASA Center, organized the first international school dedicated to Big Data and related topics. It was the first large-scale event on the subject held beyond the Urals, and it brought together renowned researchers from Russia, the UK, the USA, and Italy.
During the school, attendees could follow the scientists' talks, lectures, and seminars. The researchers described the use of neural networks in industrial cybersecurity, methods of searching for asteroids in near-Earth orbits, and machine learning and data processing at the Large Hadron Collider. In the future, Big Data technologies will allow working with far greater volumes of information; researchers believe this will improve quality of life, change transport conditions, increase the accuracy of weather forecasts, and more.
In addition, the University developed a new English-language Master's degree program in Big Data Technology; the first students were enrolled in 2017.