Hadoop
大约 2 分钟
Hadoop
Apache Hadoop 软件库是一个框架,允许使用简单的编程模型在计算机集群间分布式处理大型数据集!
Apache Hadoop
- Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
- Hadoop Distributed File System (HDFS) - a distributed file-system that provides high-throughput access to application data.
- Hadoop YARN - a framework for job scheduling and cluster resource management -the "operating system"of Hadoop!
- Hadoop MapReduce - a YARN-based system for parallel processing of large-scale data sets
Hadoop 2.x = HDFS + YARN
Roles of the cluster nodes
- Master Node(s):Typically one machine in the cluster is designated as the NameNode(NN)and another machine as the ResourceManager(RM).
- For simplicity,we put NN and RM at the same node
- Slave Nodes: The rest of the machines in the cluster act as both DataNode(DN)and NodeManager(NM).