<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Machine Learning Professional Preparation in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129424#M594</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/171339"&gt;@TheOC&lt;/a&gt;&amp;nbsp;, thanks for the kind words. I did run into a bit of difficulty choosing the vector store. I went with Postgres/pgvector as a middle ground for response time and volume, with a path to scale later to Aurora. For the exam, you are expected to be familiar with using Delta tables. There are also other open-source vector databases to consider, and the choice should depend on each project’s context. Hope that helps!&lt;/P&gt;</description>
    <pubDate>Sat, 23 Aug 2025 11:24:23 GMT</pubDate>
    <dc:creator>WiliamRosa</dc:creator>
    <dc:date>2025-08-23T11:24:23Z</dc:date>
    <item>
      <title>Databricks Machine Learning Professional Preparation</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129420#M592</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="WiliamRosa_0-1755947321744.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19297i4AE31FAAFA9D5DD3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="WiliamRosa_0-1755947321744.png" alt="WiliamRosa_0-1755947321744.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Recently I earned the Databricks Machine Learning Professional certification and wanted to share my study journey. Before the exam, I worked on a project as a data engineer alongside data scientists (ML models, LLMs, MLflow). That led me to build a personal RAG project on Databricks, which ended up preparing me for many exam topics. Below is a compact flow of that project plus the official prep guide and a Udemy practice test.&amp;nbsp;&lt;/P&gt;&lt;P&gt;My Databricks RAG Lab - flow summary&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) Goal &amp;amp; stack&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Goal: practice end-to-end (ingest -&amp;gt; embed -&amp;gt; retrieve -&amp;gt; answer) on Databricks&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Stack: Databricks Runtime; PyMuPDF (PDF parsing); Hugging Face E5 via Databricks Model Serving; PostgreSQL + pgvector (vector store); LangChain (RAG); secrets with dbutils.secrets; Jobs/Workflows (orchestration)&amp;nbsp;&lt;/P&gt;&lt;P&gt;2) Ingestion&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Upload PDFs to DBFS (/tmp/.../docs/) with UUID filenames&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Keep upload separated from processing for scale and observability&amp;nbsp;&lt;/P&gt;&lt;P&gt;3) Orchestration&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Databricks Workflow scans the folder and triggers the processor notebook with the file path as a parameter&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Simple, re-runnable pipeline&amp;nbsp;&lt;/P&gt;&lt;P&gt;4) Parsing &amp;amp; chunking&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Extract text with PyMuPDF&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Chunk to fit embedding/token limits 
and carry metadata (file, user, timestamps)&amp;nbsp;&lt;/P&gt;&lt;P&gt;5) Embeddings (Model Serving)&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Call a Model Serving endpoint with E5 to generate embeddings&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Decouple model choice so you can swap models without rewriting the pipeline&amp;nbsp;&lt;/P&gt;&lt;P&gt;6) Vector storage&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Store chunks + vectors + metadata in PostgreSQL/pgvector&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Use SQL for Top-K similarity; easy to debug and cost-predictable&amp;nbsp;&lt;/P&gt;&lt;P&gt;7) Retrieval (Top-K)&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Embed the question -&amp;gt; run Top-K vector search in pgvector -&amp;gt; fetch relevant chunks&amp;nbsp;&lt;/P&gt;&lt;P&gt;8) Generation (RAG)&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Build a prompt with question + retrieved chunks and call an LLM endpoint for grounded answers&amp;nbsp;&lt;/P&gt;&lt;P&gt;9) Ops, security, observability&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Secrets via secret scopes (DB creds, endpoints)&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Layout ready for multi-tenant isolation if needed&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Simple metrics (latency, Top-K size, document counts); optional MLflow versioning&amp;nbsp;&lt;/P&gt;&lt;P&gt;10) Why this helped for the exam&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Exercises Jobs/Workflows and Model Serving (orchestration and deployment)&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Hands-on with feature engineering/embeddings and modular pipelines&amp;nbsp;&lt;/P&gt;&lt;P&gt;- MLOps basics: reproducibility, secrets, cost/performance trade-offs&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Practice discussing governance and best practices aligned to exam objectives&amp;nbsp;&lt;/P&gt;&lt;P&gt;Official prep guide:&amp;nbsp;&lt;BR /&gt;&lt;A href="https://www.databricks.com/learn/certification/machine-learning-professional" 
target="_blank"&gt;https://www.databricks.com/learn/certification/machine-learning-professional&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;Udemy practice test:&amp;nbsp;&lt;BR /&gt;&lt;A href="https://www.udemy.com/course/databricks-machine-learning-professional-practice-test/" target="_blank"&gt;https://www.udemy.com/course/databricks-machine-learning-professional-practice-test/&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wish you all the best!&amp;nbsp; &amp;nbsp;&lt;/P&gt;</description>
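      <content:encoded><![CDATA[
The flow summary in the post (steps 4 to 8: chunk, embed, store, retrieve Top-K, build the prompt) can be sketched roughly as below. This is a minimal, self-contained illustration and not the author's project code. Several pieces are assumptions made only so the example runs locally: `embed` is a toy hashed bag-of-words stand-in for the E5 Model Serving endpoint, the in-memory list stands in for the PostgreSQL/pgvector table, and the table and column names in `TOP_K_SQL` (`doc_chunks`, `chunk_text`, `source`, `embedding`) are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass

DIM = 64  # toy dimension chosen for this sketch; a real E5 model emits larger vectors


@dataclass
class Chunk:
    text: str
    source: str  # metadata carried with the chunk (here just the file name)
    vector: list[float]


def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of at most max_words words, overlapping by `overlap`."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    # Stop early enough that the last window is not just a repeat of the overlap.
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, stop, step)]


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding endpoint: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        token = word.strip(".,;:!?()")
        if token:
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k(query: str, store: list[Chunk], k: int = 3) -> list[tuple[float, Chunk]]:
    """Rank stored chunks by cosine similarity (vectors are already unit length)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, c.vector)), c) for c in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# With pgvector the same ranking could be pushed into SQL. `<=>` is pgvector's
# cosine-distance operator; the table and column names here are hypothetical.
TOP_K_SQL = """
SELECT chunk_text, source
FROM doc_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def build_prompt(question: str, hits: list[tuple[float, Chunk]]) -> str:
    """Assemble a grounded prompt from the question and the retrieved chunks."""
    context = "\n\n".join(f"[{c.source}] {c.text}" for _, c in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    docs = {
        "a.pdf": "Secrets are stored in secret scopes and read at runtime.",
        "b.pdf": "A workflow triggers the processor notebook with a file path.",
    }
    store = [
        Chunk(piece, name, embed(piece))
        for name, text in docs.items()
        for piece in chunk_text(text)
    ]
    question = "Where are secrets stored?"
    print(build_prompt(question, top_k(question, store, k=1)))
```

Keeping `embed` behind a single function mirrors the point in step 5 about decoupling the model choice: swapping the toy function for a call to a serving endpoint would leave chunking, retrieval and prompt assembly unchanged.
]]></content:encoded>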
      <pubDate>Sat, 23 Aug 2025 11:09:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129420#M592</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-23T11:09:10Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Machine Learning Professional Preparation</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129422#M593</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hey&amp;nbsp;&lt;/SPAN&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;&lt;SPAN&gt;&amp;nbsp;, this is a super cool write up, thanks for sharing!&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Just curious: did you face any unexpected challenges on this project?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 23 Aug 2025 11:14:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129422#M593</guid>
      <dc:creator>TheOC</dc:creator>
      <dc:date>2025-08-23T11:14:54Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Machine Learning Professional Preparation</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129424#M594</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/171339"&gt;@TheOC&lt;/a&gt;&amp;nbsp;, thanks for the kind words. I did run into a bit of difficulty choosing the vector store. I went with Postgres/pgvector as a middle ground for response time and volume, with a path to scale later to Aurora. For the exam, you are expected to be familiar with using Delta tables. There are also other open-source vector databases to consider, and the choice should depend on each project’s context. Hope that helps!&lt;/P&gt;</description>
      <pubDate>Sat, 23 Aug 2025 11:24:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129424#M594</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-23T11:24:23Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Machine Learning Professional Preparation</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129425#M595</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt; it certainly does help!&amp;nbsp;&lt;BR /&gt;Thanks for the insight.&lt;/P&gt;</description>
      <pubDate>Sat, 23 Aug 2025 11:30:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129425#M595</guid>
      <dc:creator>TheOC</dc:creator>
      <dc:date>2025-08-23T11:30:33Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Machine Learning Professional Preparation</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129437#M596</link>
      <description>&lt;P&gt;Thanks a bunch for sharing this&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;.&amp;nbsp;I've bookmarked this and&amp;nbsp;I'll be using this as my reference guide when I get deeper into ML later in the year &lt;span class="lia-unicode-emoji" title=":crossed_fingers:"&gt;🤞&lt;/span&gt;&lt;span class="lia-unicode-emoji" title=":smirking_face:"&gt;😏&lt;/span&gt;. That project looks so freaking cool by the way!! Bravo, sir&lt;span class="lia-unicode-emoji" title=":clapping_hands:"&gt;👏&lt;/span&gt;.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;All the best.&lt;BR /&gt;BS&lt;/P&gt;</description>
      <pubDate>Sat, 23 Aug 2025 13:21:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129437#M596</guid>
      <dc:creator>BS_THE_ANALYST</dc:creator>
      <dc:date>2025-08-23T13:21:59Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Machine Learning Professional Preparation</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129446#M597</link>
      <description>&lt;P&gt;Thanks a lot, my friend &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;&amp;nbsp;! Really glad you found it useful &lt;span class="lia-unicode-emoji" title=":raising_hands:"&gt;🙌&lt;/span&gt;. I’m sure when you dive into ML later this year, you’ll do awesome things with it. Appreciate the kind words about the project — means a lot! &lt;span class="lia-unicode-emoji" title=":rocket:"&gt;🚀&lt;/span&gt;&lt;/P&gt;&lt;P&gt;All the best to you too, and let’s keep learning and sharing along the way &lt;span class="lia-unicode-emoji" title=":flexed_biceps:"&gt;💪&lt;/span&gt;.&lt;/P&gt;</description>
      <pubDate>Sat, 23 Aug 2025 18:08:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-machine-learning-professional-preparation/m-p/129446#M597</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-23T18:08:37Z</dc:date>
    </item>
  </channel>
</rss>

