- Data model:
- Tables: analogous to tables in a relational database.
- Each table has a corresponding HDFS directory.
- Data is serialized and stored in files within that directory.
- External tables are supported over data stored in HDFS, NFS, or a local directory.
- Typed columns (int, float, string, date, boolean), plus list and map types (for JSON-like data).
- Partitions: analogous to dense indexes on partition columns.
- Nested subdirectories in HDFS for each combination of partition column values.
- Allows users to efficiently retrieve rows.
- A table can have one or more partitions (1-level), which determine how data is distributed among the subdirectories of the table directory.
Ex: Table T is stored under /wh/T and is partitioned on columns ds and Ctry.
For ds = 20090101
Ctry = US
the data is stored within the directory
/wh/T/ds=20090101/Ctry=US
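
The mapping from partition column values to nested subdirectories can be sketched in a few lines of Python. This is only an illustration of the naming scheme described above, not Hive's own code: the helper names `partition_dir` and `parse_partition_dir` are hypothetical, and the sketch assumes values contain no characters that would need escaping in a path.

```python
from posixpath import join


def partition_dir(table_dir, partition_values):
    """Build the directory for one combination of partition column values.

    partition_values is an ordered list of (column, value) pairs; the order
    of the pairs defines the nesting order of the subdirectories.
    Hypothetical helper: assumes values need no escaping.
    """
    segments = [f"{column}={value}" for column, value in partition_values]
    return join(table_dir, *segments)


def parse_partition_dir(table_dir, path):
    """Recover the (column, value) pairs from a partition directory path."""
    relative = path[len(table_dir):].strip("/")
    return [tuple(segment.split("=", 1)) for segment in relative.split("/") if segment]


path = partition_dir("/wh/T", [("ds", "20090101"), ("Ctry", "US")])
print(path)  # /wh/T/ds=20090101/Ctry=US
print(parse_partition_dir("/wh/T", path))
```

Because the partition values are encoded in the directory names, a query that filters on ds and Ctry only needs to read the files under the matching subdirectory, which is what makes row retrieval efficient.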