<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: 2021-07-Webinar--Hassle-Free-Data-Ingestion-Social-1200x628 in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/2021-07-webinar-hassle-free-data-ingestion-social-1200x628/m-p/14110#M746</link>
    <description>&lt;P&gt;Check out &lt;A href="https://databricks.com/p/webinar/hassle-free-data-ingestion-part-2?utm_source=databricks&amp;amp;utm_medium=web&amp;amp;utm_campaign=7013f000000Lj3VAAS&amp;amp;utm_content=community&amp;amp;_ga=2.119396361.1818799050.1636465053-461827103.1630341298&amp;amp;_gac=1.224608744.1634341140.CjwKCAjwzaSLBhBJEiwAJSRokld_VYTm09AIW8YPsuNmBXIq3fiJwkToOJcT9o4uj_WImoFhpZDJqRoCD8wQAvD_BwE" alt="https://databricks.com/p/webinar/hassle-free-data-ingestion-part-2?utm_source=databricks&amp;amp;utm_medium=web&amp;amp;utm_campaign=7013f000000Lj3VAAS&amp;amp;utm_content=community&amp;amp;_ga=2.119396361.1818799050.1636465053-461827103.1630341298&amp;amp;_gac=1.224608744.1634341140.CjwKCAjwzaSLBhBJEiwAJSRokld_VYTm09AIW8YPsuNmBXIq3fiJwkToOJcT9o4uj_WImoFhpZDJqRoCD8wQAvD_BwE" target="_blank"&gt;Part 2 of this Data Ingestion webinar&lt;/A&gt; to find out how to easily ingest semi-structured data at scale into your Delta Lake, including how to use Databricks Auto Loader to ingest JSON data into Delta Lake.&lt;/P&gt;</description>
    <pubDate>Tue, 09 Nov 2021 14:32:13 GMT</pubDate>
    <dc:creator>Emily_S</dc:creator>
    <dc:date>2021-11-09T14:32:13Z</dc:date>
    <item>
      <title>2021-07-Webinar--Hassle-Free-Data-Ingestion-Social-1200x628</title>
      <link>https://community.databricks.com/t5/machine-learning/2021-07-webinar-hassle-free-data-ingestion-social-1200x628/m-p/14109#M745</link>
      <description>&lt;P&gt;Thanks to everyone who joined the &lt;B&gt;Hassle-Free Data Ingestion webinar&lt;/B&gt;. You can access the on-demand recording &lt;A href="https://databricks.com/p/webinar/hassle-free-data-ingestion?utm_source=databricks&amp;amp;utm_medium=web&amp;amp;utm_campaign=7013f000000cXwpAAE&amp;amp;utm_content=community" alt="https://databricks.com/p/webinar/hassle-free-data-ingestion?utm_source=databricks&amp;amp;utm_medium=web&amp;amp;utm_campaign=7013f000000cXwpAAE&amp;amp;utm_content=community" target="_blank"&gt;&lt;U&gt;here&lt;/U&gt;&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're sharing a subset of the phenomenal questions asked and answered throughout the session. You'll find &lt;U&gt;Ingestion Q&amp;amp;A&lt;/U&gt; listed first, followed by some &lt;U&gt;Delta Q&amp;amp;A&lt;/U&gt;. Please feel free to ask follow-up questions or add comments as threads.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;U&gt;TOPIC: Ingestion, including Auto Loader and COPY INTO&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Are there any out-of-the-box tools with plug-and-play transformations that are available from Databricks to build data ingestion pipelines?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;That is what Auto Loader and COPY INTO provide; with a few lines of code, you can build an advanced ingestion pipeline!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: What are COPY INTO and Auto Loader? Why have both?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;COPY INTO is a SQL command and is batch only. Auto Loader is used from Python or Scala and supports streaming or batch; it is also available in SQL in Delta Live Tables (DLT). COPY INTO is the simpler API. You can use either to write to a Delta table, but for complex ingestion workloads, we recommend Auto Loader.
Read more about Auto Loader and COPY INTO in the blog post &lt;A href="https://databricks.com/blog/2021/07/23/getting-started-with-ingestion-into-delta-lake.html" alt="https://databricks.com/blog/2021/07/23/getting-started-with-ingestion-into-delta-lake.html" target="_blank"&gt;Getting Started With Ingestion into Delta Lake&lt;/A&gt;. Since Auto Loader runs in a Databricks notebook (or a Delta Live Tables pipeline), you'll need to write your code in Python, Scala, or SQL.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Once a file is ingested, is the source file no longer needed for any rollback to an earlier point in time?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;That is correct, but source files are worth keeping around in case you need to reprocess them.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Are there plans to work with a schema of XML files?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Yes, it is possible to read XML as a string and parse it with any XML library.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: The default data type on Auto Loader is always a string. Can we give hints?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Yes! Learn more about &lt;A href="https://docs.databricks.com/spark/latest/structured-streaming/auto-loader-schema.html" alt="https://docs.databricks.com/spark/latest/structured-streaming/auto-loader-schema.html" target="_blank"&gt;Auto Loader Schema Inference and Evolution capabilities&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Can the change data be directly ingested from databases?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;You would need to use a CDC tool like AWS DMS; read this &lt;A href="https://databricks.com/blog/2019/07/15/migrating-transactional-data-to-a-delta-lake-using-aws-dms.html" alt="https://databricks.com/blog/2019/07/15/migrating-transactional-data-to-a-delta-lake-using-aws-dms.html" target="_blank"&gt;blog with more details&lt;/A&gt;.
&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Is there a concise list of data source connectors?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;See the data sources documentation: &lt;A href="https://docs.databricks.com/data/data-sources/index.html" alt="https://docs.databricks.com/data/data-sources/index.html" target="_blank"&gt;https://docs.databricks.com/data/data-sources/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Is there an interface for NiFi?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;This talk might be of interest: &lt;A href="https://databricks.com/session/story-deduplication-and-mutation" alt="https://databricks.com/session/story-deduplication-and-mutation" target="_blank"&gt;Story Deduplication and Mutation&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: How is Azure Event Hubs supported for ingestion?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;As a streaming source; see &lt;A href="https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-databricks#event-hubs" alt="https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-databricks#event-hubs" target="_blank"&gt;this doc for more info&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Are there any cookie-cutter templates available from Databricks for some common use cases?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;We have &lt;A href="https://databricks.com/solutions/accelerators" alt="https://databricks.com/solutions/accelerators" target="_blank"&gt;solution accelerators&lt;/A&gt; you can follow!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: What support does Databricks offer for Hive until we migrate to Delta Lake?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Databricks supports external Hive; see the &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/compatibility/hive.html" alt="https://docs.databricks.com/spark/latest/spark-sql/compatibility/hive.html" target="_blank"&gt;docs&lt;/A&gt; for details.
Please reach out to your account team for help with migration.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;U&gt;TOPIC: Delta&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Where does Delta get involved during ingestion?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;The ingested data arrives in a raw format such as JSON or CSV and is written to a Delta table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Is it ever easier to just delete and remake your Delta table with every update? If your Delta table is created from a Pandas DataFrame, for example?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;You can easily convert your Pandas DataFrame to a Spark DataFrame, save it as Delta, and benefit from Delta's ACID transactions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Can I roll back or roll forward a Delta table using a Databricks notebook? Would that change be persistent for other Databricks users?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;You can use RESTORE to roll back, and other users will see the change. Read more in the &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-restore.html" alt="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-restore.html" target="_blank"&gt;docs&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Can we delete partition-wise from Delta tables?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Yes, but you can also delete on a row-by-row basis in Delta.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Q: Is it possible to separate compute for Delta and compute for Spark?&lt;/B&gt;&lt;/P&gt;&lt;P&gt;There is a standalone open-source reader/writer for Delta that would allow you to separate the two.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Add your follow-up questions to threads!&lt;/P&gt;</description>
      <pubDate>Fri, 01 Oct 2021 21:10:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/2021-07-webinar-hassle-free-data-ingestion-social-1200x628/m-p/14109#M745</guid>
      <dc:creator>MadelynM</dc:creator>
      <dc:date>2021-10-01T21:10:35Z</dc:date>
    </item>
    <item>
      <title>Re: 2021-07-Webinar--Hassle-Free-Data-Ingestion-Social-1200x628</title>
      <link>https://community.databricks.com/t5/machine-learning/2021-07-webinar-hassle-free-data-ingestion-social-1200x628/m-p/14110#M746</link>
      <description>&lt;P&gt;Check out &lt;A href="https://databricks.com/p/webinar/hassle-free-data-ingestion-part-2?utm_source=databricks&amp;amp;utm_medium=web&amp;amp;utm_campaign=7013f000000Lj3VAAS&amp;amp;utm_content=community&amp;amp;_ga=2.119396361.1818799050.1636465053-461827103.1630341298&amp;amp;_gac=1.224608744.1634341140.CjwKCAjwzaSLBhBJEiwAJSRokld_VYTm09AIW8YPsuNmBXIq3fiJwkToOJcT9o4uj_WImoFhpZDJqRoCD8wQAvD_BwE" alt="https://databricks.com/p/webinar/hassle-free-data-ingestion-part-2?utm_source=databricks&amp;amp;utm_medium=web&amp;amp;utm_campaign=7013f000000Lj3VAAS&amp;amp;utm_content=community&amp;amp;_ga=2.119396361.1818799050.1636465053-461827103.1630341298&amp;amp;_gac=1.224608744.1634341140.CjwKCAjwzaSLBhBJEiwAJSRokld_VYTm09AIW8YPsuNmBXIq3fiJwkToOJcT9o4uj_WImoFhpZDJqRoCD8wQAvD_BwE" target="_blank"&gt;Part 2 of this Data Ingestion webinar&lt;/A&gt; to find out how to easily ingest semi-structured data at scale into your Delta Lake, including how to use Databricks Auto Loader to ingest JSON data into Delta Lake.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 14:32:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/2021-07-webinar-hassle-free-data-ingestion-social-1200x628/m-p/14110#M746</guid>
      <dc:creator>Emily_S</dc:creator>
      <dc:date>2021-11-09T14:32:13Z</dc:date>
    </item>
  </channel>
</rss>

