Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Just a beginner in Data Engineering

DataSax
New Contributor III

Hi Everyone,

I am happy to be part of this great community.

I have just decided to pursue Data Engineering as a profession, and I will need a lot of advice on how I can pick it up quickly and become a professional.

I have Python Programming knowledge and Web development skills.

Any advice will be appreciated.

DataSax
2 ACCEPTED SOLUTIONS

Rishabh-Pandey
Esteemed Contributor

Welcome to the community, @DataSax! 🎉

It's fantastic to hear that youโ€™re aspiring to become a Data Engineer. This is a dynamic and rewarding field, and with your background in Python and web development, you already have a strong foundation to build upon.

Here are a few steps and pieces of advice to help you on your journey:

  1. Deepen Your Understanding of Data Engineering Concepts:

    • Data Warehousing: Learn about data warehousing concepts, including data modeling, ETL processes (Extract, Transform, Load), and the various architectures used in modern data warehousing.
    • Big Data Technologies: Get familiar with big data frameworks like Apache Spark, Hadoop, and Kafka. These tools are essential for handling and processing large datasets efficiently.
    • Cloud Platforms: Explore cloud services from providers like AWS, Azure, and Google Cloud. Databricks, in particular, offers a powerful platform for managing and processing data at scale.
  2. Master SQL and Data Manipulation:

    • SQL Proficiency: SQL is crucial for querying and managing data in relational databases. Ensure youโ€™re comfortable with writing complex queries and understanding database structures.
    • Data Transformation: Learn how to clean, transform, and manipulate data using tools like Pandas in Python, as these skills are critical for preparing data for analysis.
  3. Learn About Data Pipelines:

    • ETL Processes: Understand how to design and implement ETL pipelines to move data from various sources into data warehouses or lakes.
    • Workflow Orchestration: Familiarize yourself with tools like Apache Airflow or Azure Data Factory for scheduling and managing data workflows.
  4. Explore Databricks and Delta Lake:

    • Databricks Platform: As a Databricks Certified Professional Data Engineer, I highly recommend diving into the Databricks platform. Itโ€™s an excellent environment for learning about big data processing and analytics.
    • Delta Lake: Learn how Delta Lake improves data reliability and performance, and how it integrates with the broader Databricks ecosystem.
  5. Practical Experience:

    • Hands-On Projects: Apply what youโ€™ve learned by working on projects that involve data ingestion, transformation, and analysis. Real-world experience is invaluable.
    • Certification Paths: Consider pursuing certifications like the Databricks Certified Associate Developer for Apache Spark or the Databricks Certified Professional Data Engineer. These can validate your skills and open up new career opportunities.
  6. Stay Updated and Connected:

    • Community and Networking: Engage with communities like this one, attend meetups, and participate in forums to stay updated with the latest trends and best practices.
    • Continuous Learning: The field of data engineering is constantly evolving. Keep learning through courses, tutorials, and by following industry leaders on platforms like LinkedIn.
  7. Ask for Help and Share Your Journey:

    • Donโ€™t hesitate to ask questions and seek guidance. The community is here to support you.
    • Share your progress, challenges, and successes. This can be inspiring for others and can help you stay motivated.
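The ETL ideas in steps 1–3 can be sketched as a tiny pandas pipeline. This is a minimal illustration only; the column names and values are invented for the example, and a real pipeline would read from files or databases and write to a warehouse table:

```python
import pandas as pd

# Extract: load raw records (built in memory here; in practice this
# might be pd.read_csv("orders.csv") or a database query).
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 3],
    "amount": ["10.5", "20.0", None, "7.25"],
    "country": ["us", "US", "gb", "GB"],
})

# Transform: deduplicate, normalize types and casing, drop bad rows.
clean = (
    raw.drop_duplicates(subset="order_id", keep="first")
       .assign(
           amount=lambda df: pd.to_numeric(df["amount"], errors="coerce"),
           country=lambda df: df["country"].str.upper(),
       )
       .dropna(subset=["amount"])
)

# Load: here we just compute a per-country summary instead of
# writing to a warehouse.
summary = clean.groupby("country")["amount"].sum().to_dict()
print(summary)  # {'US': 30.5}
```

The same extract → transform → load shape carries over directly to Spark DataFrames once the data outgrows a single machine.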
Rishabh Pandey


szymon_dybczak
Contributor III

The most important thing, at least at the beginning of your data journey, is to build a good understanding of SQL. It's the cornerstone of the data world.
You should definitely familiarize yourself with data modeling, especially dimensional modeling, which you'll encounter most often, so that you get used to terms like fact table, dimension table, slowly changing dimensions, etc.
Other than that, it's useful to have some cloud knowledge under your belt, because nowadays we do data projects on cloud platforms like Azure, AWS, and GCP.
And since you know Python, you should focus your attention on the PySpark API when you're learning Spark/Databricks.
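To make the dimensional-modeling terms concrete, here is a toy star schema queried through Python's built-in sqlite3 module. The table and column names are invented for the example; the point is the shape: a fact table of measures joined to a dimension table of descriptive attributes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per customer, holding descriptive attributes.
cur.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Ada", "UK"), (2, "Grace", "US")])

# Fact table: one row per sale, with a measure and a foreign key
# pointing at the dimension.
cur.execute("CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(100, 1, 50.0), (101, 1, 25.0), (102, 2, 40.0)])

# A typical star-schema query: join the fact to the dimension,
# then aggregate a measure by a dimension attribute.
rows = cur.execute("""
    SELECT d.country, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.country
    ORDER BY d.country
""").fetchall()
print(rows)  # [('UK', 75.0), ('US', 40.0)]
conn.close()
```

A slowly changing dimension would add versioning columns (e.g. valid-from/valid-to dates) to dim_customer so history is preserved when an attribute changes.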

Good luck,
Slash


REPLIES


Dear Rishabh264,

Thank you for the thorough breakdown; it is much appreciated, and I will follow your recommendations.

Best regards 

DataSax


Dear Slash,

Thank you for the summary; from your analysis I now clearly understand what to work on.

Best regards 

DataSax
