Definition of Big data, History of Data Management, Big data characteristics: Volume, Variety, Velocity, Veracity, Analytics, Basic nomenclature, Analytics process model, Analytical model requirements, Types of data sources, Sampling, Types of data elements, Missing values, Standardizing data, Outlier detection and treatment, Categorization.

A brief history of Hadoop, The Hadoop ecosystem, Hadoop releases, The building blocks of Hadoop: NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker, The Hadoop Distributed File System: The design of HDFS, HDFS concepts, Hadoop file systems.

NoSQL data modeling techniques: Types of NoSQL stores, Choice of database system, JSON, Column family databases, Operations on column families, Understanding the Cassandra data model, Designing Cassandra data structures, Schema migration approach using ETL.

MapReduce workflows, How MapReduce works, Anatomy of MapReduce: MapReduce 1, MapReduce 2, Failures in classic MapReduce, YARN, Failures in YARN, Job scheduling: The fair scheduler, The capacity scheduler, Shuffle and sort in MapReduce.

In-memory computing technology, Real-time analytics, CAP theorem, Use of in-memory data grids, MapReduce and real-time processing, Real-time analysis of machine-generated data.
Data Scientist: Definition, Big data flow, Data scientist activities.