Staff Member #1
Biography of instructor/staff member #1
The Big Data Technologies course provides a comprehensive overview of the key concepts and tools for handling large-scale data. The curriculum covers the Hadoop ecosystem, including HDFS, MapReduce, and YARN. Students learn about distributed file systems, parallel processing, and NoSQL databases such as HBase and MongoDB. The course explores data ingestion with tools such as Sqoop and Flume, and data processing with Apache Pig and Hive. Advanced topics include Apache Spark for in-memory processing and machine learning with MLlib. Students gain hands-on experience with PySpark, focusing on RDDs, DataFrames, and Spark SQL. The course also covers data quality, preprocessing, and storage strategies. By the end, students are equipped to design, implement, and manage big data solutions, which prepares them for data engineering and analytics roles across a range of industries.
Prerequisites: programming skills (Python or Java), database knowledge, an understanding of distributed systems, basic statistics, and familiarity with cloud computing.
Biography of instructor/staff member #2