The amount of data that many organisations need to process is increasing. This course introduces the technologies used to build systems that scale to handle very large volumes of data.
Many organisations face growing volumes of data. Extracting value from all this data can be challenging, since traditional systems based on relational databases are not suitable for big data. This course is focused on the Hadoop and Spark ecosystems and will teach the skills needed to build modern big data applications.
The course will put emphasis on giving the participants practical hands-on experience with the tools. The technologies introduced as part of the course will be centered around the Hadoop/Spark open source ecosystem.
The course will be as vendor-neutral as possible, but the practical hands-on exercises will be run on Google Cloud Platform. All the principles taught will be transferable to other cloud providers or to on-premises solutions, and differences between providers will be highlighted where relevant.
Course participants should have prior experience with writing code for data analysis, but no prior knowledge of Hadoop or Spark is required. In this instructor-led course, participants will go through hands-on sessions with planned exercises. The exercises will be in the Python programming language, but the focus will be on the overall concepts, not the particulars of the programming language, so deep knowledge of Python is not required for participants with programming experience in other languages. Hadoop and Spark readily support Scala, Java and Python.
Participants are expected to bring their own laptop to the class; everything else needed for the course is provided.
Participants will be introduced to the fast-evolving world of big data technologies.
Participants will gain insight into the differences between traditional database systems and modern distributed systems.
Participants will gain hands-on experience with storing, processing and serving large amounts of data.
Participants will understand the main challenges of designing systems for big data.
Participants will be introduced to the programming style used in big data.
The company will have an employee with a good understanding of when big data technologies are the right choice for solving a specific problem.
The company will have an employee with the ability to get started with setting up big data infrastructure.
The company will have an employee able to carry out analysis of large datasets.
The two-day course will be instructor-led with hands-on exercises. The focus will be on giving the participants the knowledge and the confidence to get started with modern distributed data processing. The technologies that we will work with include Apache Hadoop, Apache Spark and HBase/Bigtable.
No preparation is required before the course.
The course runs from 9 am to 4 pm on both days. IDA provides breakfast, lunch and afternoon cake.
After the course, participants will receive a course certificate.
Andreas Koch has a background in data science and has for the past 6 years been architecting and developing data-intensive applications based on Hadoop and Spark. Currently, Andreas works as a data science consultant, using his extensive experience with data processing systems to advise organisations on how to best leverage their data.
The course will be taught in English, and the study material will also be in English.