The Oracle Linux operating system and Cloudera’s Distribution including Apache Hadoop (CDH) underlie all other software components installed on Oracle Big Data Appliance. CDH is an integrated stack of components that have been tested and packaged to work together.
CDH has a batch processing infrastructure that can store files and distribute work across a set of computers. Data is processed on the same computer where it is stored. In a single Oracle Big Data Appliance rack, CDH distributes the files and workload across 18 servers, which compose a cluster. Each server is a node in the cluster.
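The idea of distributing replicated file blocks across the nodes of a rack can be illustrated with a toy model. This is only an illustrative sketch, not HDFS's actual placement policy (which is rack-aware and more sophisticated); the node names, block size, and round-robin assignment are all assumptions made for the example.

```python
import itertools

def place_blocks(file_size_mb, block_size_mb, nodes, replicas=3):
    # Toy model: split a file into fixed-size blocks and assign each
    # block to `replicas` distinct nodes, round-robin across the cluster.
    # Real HDFS placement is rack-aware; this only shows the concept.
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    ring = itertools.cycle(nodes)
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [next(ring) for _ in range(replicas)]
    return placement

nodes = [f"node{i:02d}" for i in range(1, 19)]  # an 18-server rack
layout = place_blocks(file_size_mb=1024, block_size_mb=256, nodes=nodes)
# 4 blocks, each stored on 3 different servers
```

Because work is then scheduled on the servers that already hold the relevant blocks, computation moves to the data rather than the other way around.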
The software framework consists of these primary components:
  • File system: The Hadoop Distributed File System (HDFS) is a highly scalable file system that stores large files across multiple servers. It achieves reliability by replicating data across multiple servers without RAID technology. It runs on top of the Linux file system on Oracle Big Data Appliance.
  • MapReduce engine: The MapReduce engine provides a platform for the massively parallel execution of algorithms written in Java. Oracle Big Data Appliance 3.0 runs YARN by default.
  • Administrative framework: Cloudera Manager is a comprehensive administrative tool for CDH. In addition, you can use Oracle Enterprise Manager to monitor both the hardware and software on Oracle Big Data Appliance.
  • Apache projects: CDH includes Apache projects for MapReduce and HDFS, such as Hive, Pig, Oozie, ZooKeeper, HBase, Sqoop, and Spark.
  • Cloudera applications: Oracle Big Data Appliance installs all products included in Cloudera Enterprise Data Hub Edition, including Impala, Search, and Navigator.
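The map/shuffle/reduce pattern that the MapReduce engine parallelizes across the cluster can be sketched in miniature. Production jobs on the appliance are written in Java and run under YARN; the single-process Python word count below is only a model of the programming pattern, with the phase names and sample data chosen for illustration.

```python
from collections import defaultdict

def map_phase(splits):
    # Map: each mapper emits (word, 1) pairs from its input split.
    for split in splits:
        for word in split.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: the framework groups all values by key between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: each reducer aggregates the values for its keys.
    return {word: sum(counts) for word, counts in groups.items()}

splits = ["big data appliance", "big data"]
counts = reduce_phase(shuffle_phase(map_phase(splits)))
# counts["big"] == 2, counts["appliance"] == 1
```

In a real cluster the map and reduce functions run concurrently on many nodes, each reading the blocks stored locally, while the framework handles the shuffle, scheduling, and fault tolerance.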