Hive Database in Hadoop

  • Data model:
  • Tables:
    • Analogous to tables in a relational database.
    • Each table has a corresponding HDFS directory.
    • Data is serialized and stored in files within that directory.
    • Supports external tables over data stored in HDFS, NFS, or a local directory.
    • Typed columns (int, float, string, date, boolean), plus list and map types (for JSON-like data).
  • Partitions:
    • Analogous to dense indexes on partition columns.
    • Nested subdirectories in HDFS for each combination of partition column values.
    • Allows users to efficiently retrieve rows.
    • A table can have one or more partitions (1-level), which determine how data is distributed within subdirectories of the table directory.
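
To illustrate why partitioning helps with efficient row retrieval, here is a minimal Python sketch (not Hive code). It assumes hypothetical rows with two partition columns and groups them into one bucket per combination of values, standing in for one subdirectory per combination. The names `build_partitions` and `read_partition` and the sample rows are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical rows: (ds, Ctry, payload). Values are illustrative only.
rows = [
    ("20090101", "US", "a"),
    ("20090101", "CA", "b"),
    ("20090102", "US", "c"),
]

def build_partitions(rows):
    """Group rows by partition column values, one bucket per combination."""
    partitions = defaultdict(list)
    for ds, ctry, payload in rows:
        partitions[(ds, ctry)].append(payload)
    return dict(partitions)

def read_partition(partitions, ds, ctry):
    """Return the rows of one partition without touching the other buckets."""
    return partitions.get((ds, ctry), [])

partitions = build_partitions(rows)
print(read_partition(partitions, "20090101", "US"))
```

A lookup that fixes both partition values reads a single bucket, which is the same reason a query restricted to one partition only needs to read one subdirectory.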

Ex: Table T is stored under /wh/T and is partitioned on columns ds and Ctry.

For ds = 20090101 and Ctry = US, the data is stored within the directory:

/wh/T/ds=20090101/Ctry=US
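
The directory naming in this example can be sketched in Python as follows. This is only an illustration of the layout described above, not Hive's implementation; the helper name `partition_dir` is an assumption, and it simply appends one `column=value` subdirectory per partition column to the table directory.

```python
import posixpath

def partition_dir(table_dir, partition_values):
    """Build the nested directory for one combination of partition values.

    partition_values is an ordered list of (column, value) pairs; each pair
    becomes one 'column=value' path component under the table directory.
    """
    parts = [f"{col}={val}" for col, val in partition_values]
    return posixpath.join(table_dir, *parts)

print(partition_dir("/wh/T", [("ds", "20090101"), ("Ctry", "US")]))
# prints /wh/T/ds=20090101/Ctry=US
```

The order of the pairs matters: the first partition column forms the outer subdirectory and the second forms the inner one.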
