Get Started Discussions

Beginner in Data Engineering + AI: Looking for Learning Path Guidance

Naziam
Visitor

Hello everyone,

I’m a beginner who is starting my journey in Data Engineering and AI Engineering. I’m currently learning basic concepts and trying to understand how everything connects in real-world projects.

My goal is to become a Data Engineer / AI Engineer (Databricks-focused).

I would really appreciate guidance on:

  • What should I learn first in Databricks (Lakehouse, Spark, pipelines, etc.)
  • Best beginner-friendly learning path or resources
  • Small projects I can build to practice
  • Skills needed to become job-ready in this field

I’m very motivated to learn consistently and would love to follow a proper roadmap from experienced professionals here.

Thank you in advance!


amirabedhiafi
Contributor

Hi Naziam,

I'll share my learning path with you, along with some tips:

- Learn SQL well, then Python basics such as lists, dictionaries, functions, files, and simple data processing. These are essential before going deep into Spark.

- Understand what ETL and ELT mean, how data moves from source systems through bronze, silver, and gold layers, and how batch pipelines differ from streaming pipelines.

- Learn the Databricks workspace, notebooks, clusters/compute, catalogs, schemas, tables, and Delta Lake. The Lakehouse concept is important because Databricks combines data lake, data warehouse, analytics, and AI workloads in one platform. Databricks has official Learning Paths for data engineering and machine learning topics. https://community.databricks.com/t5/learning-paths/ct-p/databricks-learning-paths

- Next, focus on Spark: DataFrames, Spark SQL, joins, aggregations, window functions, partitioning, and performance basics. Microsoft also has an Azure Databricks learning path covering Spark DataFrames, Spark SQL, PySpark, Delta tables, workspace navigation, and clusters. https://learn.microsoft.com/en-us/training/paths/data-engineer-azure-databricks

- Learn how to load files, clean data and build repeatable pipelines. Databricks Auto Loader is useful because it incrementally processes new files as they arrive in cloud storage. https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/

- Practice building bronze, silver, and gold tables. Learn Delta features such as schema enforcement, updates and merges, time travel, and data quality checks.

- Learn Databricks Workflows or Lakeflow pipelines to schedule and manage jobs. Databricks documentation has examples for building ETL pipelines with CDC and Lakeflow Spark Declarative Pipelines. https://docs.databricks.com/aws/en/ldp/tutorial-pipelines
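To make the SQL topics in the list above (joins, aggregations, window functions) concrete, here is a small practice sketch. It uses Python's built-in `sqlite3` module instead of Spark so it runs on any laptop; the tables and data are made up for illustration. Window functions need SQLite 3.25 or newer, which recent Python builds bundle. The same query pattern carries over to Spark SQL on Databricks.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana', 'PT'), (2, 'Ben', 'US'), (3, 'Chen', 'US');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 30.0), (12, 2, 20.0), (13, 3, 70.0), (14, 3, 10.0);
""")

query = """
WITH spend AS (
    -- Join + aggregation: total spend per customer.
    SELECT c.customer_id, c.name, c.country, SUM(o.amount) AS total_spend
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, c.country
)
-- Window function: rank customers by spend within each country.
SELECT name, country, total_spend,
       RANK() OVER (PARTITION BY country ORDER BY total_spend DESC) AS rank_in_country
FROM spend
ORDER BY country, total_spend DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Ana', 'PT', 80.0, 1)
# ('Chen', 'US', 80.0, 1)
# ('Ben', 'US', 20.0, 2)
```

The CTE keeps the aggregation and the window function in separate steps, which is usually easier to read and debug than nesting them in one SELECT.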
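Scheduled jobs are usually a set of tasks with dependencies: for example, gold tables should only be rebuilt after the silver step succeeds. The sketch below shows that ordering idea with Python's standard `graphlib` module (Python 3.9+). The task names are hypothetical, and this is not the Databricks Jobs API; on Databricks you configure the dependencies on each task of the job instead.

```python
from graphlib import TopologicalSorter

# Hypothetical job definition: each task maps to the set of tasks it depends on.
# This only illustrates dependency ordering; it is not the Databricks Jobs API.
job = {
    "ingest_bronze": set(),
    "clean_silver": {"ingest_bronze"},
    "build_gold_sales": {"clean_silver"},
    "build_gold_customers": {"clean_silver"},
    "refresh_dashboard": {"build_gold_sales", "build_gold_customers"},
}


def run_task(name: str) -> None:
    """Placeholder for real work such as running a notebook or a SQL query."""
    print(f"running {name}")


# static_order() yields every task only after all of its dependencies.
order = list(TopologicalSorter(job).static_order())
for task in order:
    run_task(task)
```

The two gold tasks do not depend on each other, so a real scheduler could run them in parallel; `static_order()` simply picks one valid sequence.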

Once you are comfortable with data engineering, start learning ML/AI concepts: feature tables, model training basics, vector search, RAG, MLflow, and model deployment. Do not jump directly to GenAI before understanding how clean, governed data pipelines work.

For practice projects, start small:

  • Build a CSV-to-Delta pipeline using bronze, silver and gold tables.

  • Create a sales analytics lakehouse with customers, products, and orders.

  • Build an incremental ingestion pipeline using Auto Loader.

  • Create a simple streaming project using JSON files or events.

  • Build a small RAG chatbot using cleaned documents stored in Databricks.
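A first version of the CSV-to-Delta project can be prototyped without a cluster. This minimal sketch uses Python's standard `csv` and `sqlite3` modules as stand-ins for Spark and Delta Lake; the data, table names, and quality rules are all hypothetical. Bronze keeps the raw rows as text, silver applies data quality rules and an upsert (SQLite's `INSERT ... ON CONFLICT`, requiring SQLite 3.24+, playing the role of Delta's `MERGE INTO`), and gold holds a business-level aggregate.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: includes a duplicate row, a bad amount and a missing id.
RAW_CSV = """order_id,customer,amount
1,ana,50.00
2,ben,20.00
2,ben,20.00
3,chen,not_a_number
,dora,15.00
4,chen,70.00
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bronze_orders (order_id TEXT, customer TEXT, amount TEXT);
CREATE TABLE silver_orders (
    order_id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    amount REAL NOT NULL
);
CREATE TABLE gold_customer_sales (customer TEXT PRIMARY KEY, total_amount REAL, order_count INTEGER);
""")

# Bronze: land the raw rows exactly as received (everything stays text).
raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
conn.executemany("INSERT INTO bronze_orders VALUES (:order_id, :customer, :amount)", raw_rows)


def is_valid(order_id: str, amount: str) -> bool:
    """Hypothetical data quality rule: numeric id and a parseable amount."""
    if not order_id.isdigit():
        return False
    try:
        float(amount)
    except ValueError:
        return False
    return True


# Silver: keep rows that pass the rules, normalize names, and upsert by order_id
# so duplicates (and reruns) update an existing row instead of adding a new one.
bronze = conn.execute("SELECT order_id, customer, amount FROM bronze_orders").fetchall()
clean = [(int(o), c.strip().title(), float(a)) for o, c, a in bronze if is_valid(o, a)]
rejected = len(bronze) - len(clean)
conn.executemany(
    """
    INSERT INTO silver_orders (order_id, customer, amount) VALUES (?, ?, ?)
    ON CONFLICT(order_id) DO UPDATE SET customer = excluded.customer, amount = excluded.amount
    """,
    clean,
)

# Gold: rebuild a business-level aggregate from silver.
conn.execute("DELETE FROM gold_customer_sales")
conn.execute(
    """
    INSERT INTO gold_customer_sales (customer, total_amount, order_count)
    SELECT customer, SUM(amount), COUNT(*) FROM silver_orders GROUP BY customer
    """
)
gold = conn.execute("SELECT * FROM gold_customer_sales ORDER BY customer").fetchall()
print(f"rejected rows: {rejected}")  # rejected rows: 2
print(gold)  # [('Ana', 50.0, 1), ('Ben', 20.0, 1), ('Chen', 70.0, 1)]
```

Rerunning the silver step with the same input still leaves three rows, which is the point of using an upsert instead of a plain insert.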
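For the incremental ingestion project, the core idea behind Auto Loader is that each run processes only the files it has not seen before. This plain-Python sketch mimics that bookkeeping with a hypothetical JSON checkpoint file and newline-delimited JSON inputs. It is not how Auto Loader works internally; on Databricks you would use Auto Loader itself (the `cloudFiles` source) rather than hand-rolling this.

```python
import json
import tempfile
from pathlib import Path


def ingest_new_files(landing_dir: Path, checkpoint: Path) -> list[str]:
    """Process only files not recorded in the checkpoint; return their names."""
    # The checkpoint (a hypothetical JSON list of file names) remembers past runs.
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    new_files = sorted(p for p in landing_dir.glob("*.json") if p.name not in seen)
    for path in new_files:
        # Each input file is newline-delimited JSON: one record per line.
        records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
        print(f"ingested {len(records)} records from {path.name}")
    # Record the new files only after they were processed.
    seen.update(p.name for p in new_files)
    checkpoint.write_text(json.dumps(sorted(seen)))
    return [p.name for p in new_files]


with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp) / "landing"
    landing.mkdir()
    checkpoint_file = Path(tmp) / "checkpoint.json"

    (landing / "events_001.json").write_text('{"id": 1}\n{"id": 2}\n')
    first_run = ingest_new_files(landing, checkpoint_file)   # ['events_001.json']

    (landing / "events_002.json").write_text('{"id": 3}\n')
    second_run = ingest_new_files(landing, checkpoint_file)  # ['events_002.json']

    third_run = ingest_new_files(landing, checkpoint_file)   # []
```

Keeping the checkpoint outside the landing folder matters: otherwise the checkpoint file itself would match `*.json` and be treated as new input.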

To become job-ready, focus on:

  • SQL

  • Python

  • PySpark

  • Delta Lake

  • Medallion architecture

  • Databricks Workflows / pipelines

  • Git basics

  • Cloud fundamentals

  • Data modeling

  • Data quality and testing

  • Basic CI/CD concepts

  • Communication and documentation skills

For certification, you can look at the Databricks Certified Data Engineer Associate exam. It is designed around using the Databricks Lakehouse Platform for data engineering tasks.

https://www.databricks.com/learn/certification/data-engineer-associate

 

My advice: do not try to learn everything at once. Build one small project every few weeks, document it on GitHub, and explain the business problem, architecture, tables, pipeline, and output. That will help you learn much faster and also build a portfolio for job applications.

Good luck with your learning journey!

Keep in mind that learning is a continuous path 😄

If this answer resolves your question, could you please mark it as “Accept as Solution”? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP