The Importance of Big Data
2.1 Big Data

Big data[1] describes enormous volumes of data that are complex, diverse and heterogeneous in nature, and therefore difficult to process, analyze and store with traditional systems[4]. A study of Big data[1] observes that we are now flooded with digital data: up to 2003, roughly 5 exabytes of data had been created in total, whereas today that amount can be generated in about two days. The volume of data is growing at such a rate that it could reach 8 zettabytes next year, and storing all the world's data would currently require billions of powerful computers. Social media produces a wide variety of semi-structured and unstructured data; mobile subscribers alone send 10 billion SMS messages daily. The speed with which videos, audio, tweets, posts, emails and social interactions are created is unprecedented, and because of this massive growth, the amount of information is expected to increase roughly 50-fold over the next 10 years. Big data is important in many fields, such as storing logs in the IT industry, storing and analyzing disease data in healthcare, optimizing digital data, social media interaction, and demand forecasting and risk reduction in financial institutions. However, security and privacy are considered the main problems of Big data, and they must be addressed by implementing a framework that includes an authentication or cryptographic security system.

2.2 MapReduce

At a time when we are inundated with data, parallel processing has become essential for handling enormous amounts of data within a reasonable time. MapReduce[8] is a programming model developed by Google for processing large datasets on parallel computers such as commodity clusters. The main idea of the MapReduce model [...], that is, it does not move to the next step until all tasks in the previous step have been completed.
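The map, shuffle and reduce phases of the model can be illustrated with the classic word-count example. The sketch below is a minimal, single-process imitation of the data flow only; a real MapReduce system runs many map and reduce tasks in parallel across a cluster, and the function names here are illustrative, not part of any actual framework.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word in an input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all intermediate values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key, here by summing the counts."""
    return key, sum(values)

def word_count(documents):
    # Each document plays the role of one input split handled by a map task.
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    # The barrier mentioned above corresponds to shuffle() finishing
    # before any reduce_phase() call starts.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a cluster setting, the runtime would partition the grouped keys across reduce workers; the sequential loop here stands in for that distribution.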
However, this property causes performance degradation and makes it difficult to support online processing. An incremental MapReduce framework has been developed that processes data in the manner of a streaming engine: each task runs continuously over a sliding window, and the system generates MapReduce outputs by reading the elements in that window.

e) Performance Optimization: MapReduce programs are mainly used for data analysis, and to complete them in a reasonable time it is best to provide automatic optimization for MapReduce programs. A static-analysis approach called MANIMAL has been proposed for the automatic optimization of a single MapReduce job: a parser examines the program code before execution, without using any runtime information.

2.3 HBase