
Hadoop Hive migration to Databricks

ShankarM
New Contributor III

Hi,

What are the challenges of migrating Hive objects to Databricks, and how can they be mitigated? I could not find any information on this.

Could you please provide some guidance?

Accepted Solutions

Kaniz_Fatma
Community Manager
Hi @ShankarM, here are the main considerations and mitigation strategies:

 

  1. Hive Workload Adaptation:

    • Challenge: Adapting existing Hadoop workloads to Databricks' analytics platform.
    • Mitigation: Rework and tune your Hadoop workloads to take advantage of Databricks' in-memory processing.
  2. Architecture Differences:

    • Challenge: Hadoop and Databricks have distinct architectures.
    • Mitigation: Understand the differences. Hadoop couples storage (HDFS) and compute (YARN) on the same cluster nodes, while Databricks is a unified analytics platform built on Apache Spark that keeps data in cloud object storage, separate from compute.
  3. Hive Metastore and UDFs:

    • Challenge: Hive SerDe and UDFs need adjustments.
    • Mitigation:
      • Update Hive SerDes to use Databricks-native file codecs (change the DDL from STORED AS to USING).
      • Install Hive UDFs as libraries on the cluster, or refactor them to native Spark.
      • Adjust the directory structure for tables, because Databricks uses partitions differently than Hive.
  4. SQL Workloads:

    • Challenge: Migrating SQL workloads from other systems.
    • Mitigation:
      • Refactor SQL pipelines as needed (Databricks uses Delta Lake by default).
      • Configure access to external data sources.
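
As an illustration of the STORED AS to USING change mentioned in point 3, here is a minimal sketch in Python. The function name `convert_hive_ddl` and the list of supported formats are my own assumptions for this example, not part of any Databricks tooling. It only handles simple CREATE TABLE statements and deliberately refuses anything with PARTITIONED BY, ROW FORMAT, or CLUSTERED BY, since those clauses usually need a manual look.

```python
import re

# Assumption: only these Hive storage formats are rewritten automatically.
SUPPORTED_FORMATS = {"PARQUET", "ORC"}

# Clauses this sketch does not try to translate; they are flagged for review.
REVIEW_PATTERN = re.compile(
    r"\b(PARTITIONED\s+BY|ROW\s+FORMAT|CLUSTERED\s+BY)\b", re.IGNORECASE
)
STORED_AS_PATTERN = re.compile(r"\bSTORED\s+AS\s+(\w+)", re.IGNORECASE)


def convert_hive_ddl(ddl: str) -> str:
    """Rewrite `STORED AS <format>` to `USING <format>` in a simple CREATE TABLE.

    Hypothetical helper: raises ValueError for statements it does not handle.
    """
    if REVIEW_PATTERN.search(ddl):
        raise ValueError("statement needs manual review")
    match = STORED_AS_PATTERN.search(ddl)
    if match is None:
        raise ValueError("no STORED AS clause found")
    fmt = match.group(1).upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported storage format: {fmt}")
    # Splice the new clause in place of the old one; the rest is untouched.
    return ddl[: match.start()] + f"USING {fmt}" + ddl[match.end():]


hive_ddl = "CREATE TABLE sales (id INT, amount DOUBLE) STORED AS PARQUET"
print(convert_hive_ddl(hive_ddl))
# prints: CREATE TABLE sales (id INT, amount DOUBLE) USING PARQUET
```

Treat the output as a starting point and review each converted statement before running it, especially for tables that use custom SerDes or partitioning.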

If you encounter specific issues during the migration, feel free to ask for further assistance! 😊

 

