How Do Data-Oriented Companies Like Facebook Store and Process Data?
Have you ever wondered how companies like Facebook store their data?
Facebook generates around 4 petabytes of data per day (a petabyte is a million gigabytes). All of that data is stored in Hive, which holds about 300 petabytes in total. In fact, Facebook is the third-busiest site on the internet. With tens of millions of posts daily, Facebook ends up with a massive amount of data; in short, we call such a huge amount of data Big Data. How is all that data stored and processed? The answer is Hadoop. To store and process such a large amount of data, Facebook uses the Hadoop Distributed File System (HDFS) running on Hadoop clusters.
You might know that Hadoop is an open-source project. In this blog, you will get to know how data is stored and manipulated in data-oriented companies using Hadoop clusters.
Hadoop:
Hadoop is an open-source project created for distributed storage clusters, where a number of nodes are connected to each other and share their own resources (storage, RAM, CPU) with one particular node. That coordinating node is called the Master Node, or in Hadoop terms the Name Node (NN), and the nodes that contribute their resources are called Slave Nodes or Data Nodes (DN).
It is like a topology with a one-to-many mode of connection: one master connected to many slaves.
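To make this one-to-many topology concrete, here is a minimal Python sketch. The NameNode and DataNode classes here are illustrative toy models, not the real Hadoop API: each Data Node registers with the Name Node and reports the storage it contributes.

```python
# Toy model of a one-to-many Hadoop-style topology.
# NameNode and DataNode are illustrative classes, not the real Hadoop API.

class DataNode:
    def __init__(self, node_id, capacity_gb):
        self.node_id = node_id
        self.capacity_gb = capacity_gb  # storage this slave contributes

class NameNode:
    def __init__(self):
        self.data_nodes = []  # one master, many slaves

    def register(self, data_node):
        # A Data Node joins the cluster and shares its resources.
        self.data_nodes.append(data_node)

    def total_capacity_gb(self):
        # The master can use the combined storage of all its slaves.
        return sum(dn.capacity_gb for dn in self.data_nodes)

name_node = NameNode()
for i in range(5):
    name_node.register(DataNode(f"dn{i}", capacity_gb=20))

print(name_node.total_capacity_gb())  # 100 GB pooled across 5 Data Nodes
```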

By using a Hadoop distributed storage cluster, the master, with the help and contribution of its slaves, can store any amount of data: the data is striped, i.e. divided into segments, and those segments are stored on the slaves. Whenever the master node needs to process the data, it fetches the segments back from the slaves. Such a system improves both Velocity (how fast the data can be accessed) and Volume (how much data can be stored).
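Here is a rough sketch of how that striping might look, as a simplified Python simulation (real HDFS splits files into fixed-size blocks, 128 MB by default, and also replicates each block, which this toy version skips):

```python
# Simplified simulation of striping data into blocks and spreading
# them across Data Nodes (block replication is omitted here).

BLOCK_SIZE = 4  # bytes per block in this toy; HDFS defaults to 128 MB

def split_into_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, node_ids):
    # Round-robin placement; the Name Node records which Data Node
    # holds which block (this mapping is the metadata).
    block_map = {}
    for idx, block in enumerate(blocks):
        node = node_ids[idx % len(node_ids)]
        block_map[idx] = (node, block)
    return block_map

def read_back(block_map):
    # The master fetches the segments from the slaves and
    # reassembles them in order.
    return b"".join(block for _, block in
                    (block_map[i] for i in sorted(block_map)))

data = b"hello hadoop distributed storage"
blocks = split_into_blocks(data)
block_map = place_blocks(blocks, ["dn0", "dn1", "dn2", "dn3", "dn4"])
assert read_back(block_map) == data
for idx, (node, block) in block_map.items():
    print(f"block {idx} -> {node}: {block!r}")
```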
For a better understanding, let me explain this with an example.
In a distributed storage cluster, suppose I (the master) need to store 100 GB of data, but assume a single system cannot store 100 GB on its own. In that case, I can ask 5 friends (in Hadoop terms, slaves) to share their resources, store 20 GB on each of their systems, and connect with me over the network. Now assume storing 1 GB takes 1 minute: on a single system, storing 100 GB would take around 100 minutes. But by asking my 5 friends to each store 20 GB in parallel, the master node can save the whole 100 GB within about 20 minutes, five times faster than normal storing, and processing (searching or any other manipulation) is sped up the same way.
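That speed-up is easy to verify with a quick back-of-the-envelope calculation, assuming the 1 GB per minute figure from the example above:

```python
# Back-of-the-envelope speed-up from the example above,
# assuming a write speed of 1 GB per minute per machine.

total_gb = 100
write_speed_gb_per_min = 1
num_slaves = 5

serial_minutes = total_gb / write_speed_gb_per_min                    # one machine: 100 min
parallel_minutes = (total_gb / num_slaves) / write_speed_gb_per_min   # 5 machines: 20 min

print(serial_minutes, parallel_minutes)  # 100.0 20.0
```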
Just like this, data-oriented companies have millions of their own slaves (servers). With the help of those servers, the data can be stored and processed quickly.
This is a blog I have written as part of my journey in the ARTH — The School Of Technologies program, guided by World Record Holder Mr. Vimal Daga Sir.