Projects in Hadoop and Big Data - Learn by Building Apps
WEBRip | MP4/AVC, ~504 kb/s | 1278 x 718 | English: AAC, 59.2 kb/s (2 ch), 44.1 KHz | 2.64 GB
Genre: Development / Software Engineering | Language: English | +Project Files
This course will help you take a quantum jump and build Hadoop solutions that solve real-world problems. However, we must warn you that this course is not for the faint-hearted: it will test your abilities and knowledge while helping you build cutting-edge know-how in the most happening technology space. The course focuses on the following topics:
Add Value to Existing Data - Learn how technologies such as MapReduce apply to clustering problems. The project focuses on removing duplicate or equivalent values from a very large data set with MapReduce.
Hadoop Analytics and NoSQL - Parse a Twitter stream with Python, extract keywords with Apache Pig and map them to HDFS, pull from HDFS and push to MongoDB with Pig, and visualise the data with Node.js. Learn all this in this cool project.
Kafka Streaming with Yarn and Zookeeper - Set up a Twitter stream with Python, set up a Kafka stream with Java code for producers and consumers, and package and deploy the Java code with Apache Samza.
Real-Time Stream Processing with Apache Kafka and Apache Storm - This project also focuses on Twitter streaming, but uses Kafka and Apache Storm, and you will learn to use each of them effectively.
Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache Solr - Set up the relational schema for a health care data dictionary used by the US Dept of Veterans Affairs, and demonstrate the underlying technology and conceptual framework. Demonstrate issues with certain join queries that fail on MySQL, map the technology to a Hadoop/Hive stack with Sqoop and HCatalog, and show how this stack can perform the query successfully.
Log collection and analytics with the Hadoop Distributed File System using Apache Flume and Apache HCatalog - Use Apache Flume and Apache HCatalog to map a real-time log stream to HDFS and tail this file as a Flume event stream. Map data from HDFS to Python with Pig, and use Python modules for analytic queries.
Data Science with Hadoop Predictive Analytics - Create structured data with MapReduce, map data from HDFS to Python with Pig, run a Python machine learning logistic regression, and use Python modules for regression matrices and supervised training.
Visual Analytics with Apache Spark on Yarn - Create structured data with MapReduce, map data from HDFS to Python with Spark, convert Spark DataFrames and RDDs to Python data structures, and perform Python visualisations.
Customer 360 degree view, Big Data Analytics for e-commerce - Demonstrate use of the e-commerce tool ‘Datameer’ to perform many of the analytic queries from parts 6, 7 and 8. Perform queries in the context of sentiment analysis and a Twitter stream.
Putting it all together Big Data with Amazon Elastic Map Reduce - Run the clustering code on an AWS MapReduce cluster. Using the AWS Java SDK, spin up a dedicated task cluster with the same attributes.
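To give a feel for the first project, here is a minimal sketch of duplicate removal expressed as map, shuffle and reduce steps. It is plain Python that runs in memory, not the course's source code and not a real Hadoop job. The function names and the equivalence rule (ignore case and extra whitespace) are assumptions made purely for illustration.

```python
from collections import defaultdict


def normalize(record):
    # Hypothetical equivalence rule (an assumption, not from the course):
    # two records are "equivalent" if they match after lowercasing and
    # collapsing runs of whitespace.
    return " ".join(record.lower().split())


def map_phase(records):
    # Map step: emit (key, value) pairs, where the key is the normalized form.
    for record in records:
        yield normalize(record), record


def shuffle(pairs):
    # Shuffle step: group all values that share a key, as the framework
    # would do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce step: keep one representative (the first seen) per key.
    return [values[0] for _, values in sorted(groups.items())]


def deduplicate(records):
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = ["Apache Hadoop", "apache  hadoop", "Kafka", " kafka "]
    print(deduplicate(sample))
```

On a real cluster the grouping by key is done by the framework across many machines, which is what makes this pattern workable for very large data sets.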
So after this course you can confidently build almost any system within the Hadoop family of technologies. This course comes with complete source code, and we are working to get you a fully functional virtual machine for all the projects to get you started quickly. The course also comes with English captions. So buckle up and join us on our journey into Big Data.
What are the requirements?
- Working knowledge of Hadoop is expected before starting this course
- Basic programming knowledge of Java and Python will be great
What am I going to get from this course?
- Over 42 lectures and 10 hours of content!
- Understand the Hadoop Ecosystem and Associated Technologies
- Learn Concepts to Solve Real World Problems
- Learn the Updated Changes in Hadoop
- Use Code Examples Present Here to Create Your own Big Data Services
What is the target audience?
- Students who want to use Hadoop and Big Data in their workplace and want to learn the implementation details of big data technologies.
Section 1: Introduction
Lecture 1 Introduction 03:32
Section 2: Add Value to Existing Data with Mapreduce
Lecture 2 Introduction to the Project 15:00
Lecture 3 Build and Run the Basic Code 14:08
Lecture 4 Understanding the Code 13:54
Lecture 5 Dependencies and packages 14:43
Section 3: Hadoop Analytics and NoSQL
Lecture 6 Introduction to Hadoop Analytics 15:46
Lecture 7 Introduction to NoSQL Database 15:28
Lecture 8 Solution Architecture 14:50
Lecture 9 Installing the Solution 09:11
Section 4: Kafka Streaming with Yarn and Zookeeper
Lecture 10 Introduction to Kafka Yarn and Zookeeper 14:29
Lecture 11 Code Structure 15:22
Lecture 12 Creating Kafka Streams 15:17
Lecture 13 Yarn Job with Samza 15:35
Section 5: Real Time Stream processing with Apache Kafka and Apache Storm
Lecture 14 Real Time Streaming 15:09
Lecture 15 Hortonbox Virtual Machine 14:58
Lecture 16 Running in Cluster Mode 15:30
Lecture 17 Submitting the Storm Jar 13:58
Section 6: Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache S
Lecture 18 Introduction to the Project 14:14
Lecture 19 Introduction to HDDAccess 14:46
Lecture 20 Sqoop, Hive and Solr 13:56
Lecture 21 Hive Usage 16:06
Section 7: Log collection and analytics with the Hadoop Distributed File System using Apach
Lecture 22 Apache Flume and HCatalog 15:18
Lecture 23 Install and Configure Apache Flume 14:51
Lecture 24 Visualisation of the Data 14:51
Lecture 25 Embedded Pig Scripts 13:36
Section 8: Data Science with Hadoop Predictive Analytics
Lecture 26 Introduction to Data Science 14:48
Lecture 27 Source Code Review 14:52
Lecture 28 Setting Up the Machine 15:09
Lecture 29 Project Review 15:10
Section 9: Visual Analytics with Apache Spark on Yarn
Lecture 30 Project Setup 15:30
Lecture 31 Setting Up Java Dependencies 15:24
Lecture 32 Spark Analytics with PySpark 15:36
Lecture 33 Bringing it all together 13:50
Section 10: Customer 360 degree view, Big Data Analytics for e-commerce
Lecture 34 Ecommerce and Big Data 14:59
Lecture 35 Installing Datameer 15:43
Lecture 36 Analytics and Visualizations 15:50
Lecture 37 Demonstration 13:29
Section 11: Putting it all together Big Data with Amazon Elastic Map Reduce
Lecture 38 Introduction to the Project 15:55
Lecture 39 Configuration 15:28
Lecture 40 Setting Up Cluster on EMR 15:01
Lecture 41 Dedicated Task Cluster on EMR 15:28
Section 12: Summary
Lecture 42 Summary 02:03