Introduction to Hadoop – HDFS and Map Reduce

In the last post, we went through the History of Hadoop. In this post we will look at what Hadoop is, what it consists of, and where it is used.

The Hadoop platform consists of two key services: a reliable, distributed file system called the Hadoop Distributed File System (HDFS), and a high-performance parallel data processing engine called Hadoop MapReduce. Hadoop was created by Doug Cutting and named after his son's toy elephant. Vendors that provide Hadoop-based platforms include Cloudera, Hortonworks, MapR, Greenplum, IBM and Amazon.

Data Distribution

Hadoop spreads data across many machines and processes it in parallel, and the file system it uses for this is a distributed file system. The advantages of a distributed file system are:

[1] I/O speed
[2] Less processing time

Imagine a single machine processing 1 TB of data. Given some time, it will get through it. But what if there is more data, say 500 TB? If it takes about 45 min to process