Installing a Hadoop HDFS data lake on a Raspberry Pi 4 Ubuntu cluster
Introduction
A few weeks ago I decided to start building an experimental, home-sized "big data" system based on Apache Spark. The first step is to create a distributed filesystem where Apache Spark will read and write everything.
HDFS is the Hadoop distributed filesystem, which provides features such as fault detection and recovery, support for huge datasets, and hardware at the data (i.e., running computation close to where the data is stored). Although it is a piece of the Hadoop ecosystem, it works nicely as the distributed data filesystem for Apache Spark.
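As a quick illustration of that last point, here is a minimal sketch (not from the original article) of Spark writing to and reading from HDFS. The NameNode hostname "rpi-master", port 9000, and the target path are assumptions; replace them with your cluster's values.

```python
# Minimal sketch: using HDFS as Spark's storage layer.
# "rpi-master:9000" and the /user/ubuntu/demo path are assumed values,
# not taken from the article -- adjust them for your own cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdfs-smoke-test")
    .getOrCreate()
)

# Write a tiny DataFrame to HDFS as Parquet...
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])
df.write.mode("overwrite").parquet("hdfs://rpi-master:9000/user/ubuntu/demo")

# ...and read it back to confirm the round trip works.
spark.read.parquet("hdfs://rpi-master:9000/user/ubuntu/demo").show()

spark.stop()
```

If this round trip succeeds, Spark is talking to the HDFS NameNode correctly and the DataNodes are accepting blocks, which is the end goal of the installation described below.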
(...)