<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ingest csv file on-prem to delta table on databricks in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/ingest-csv-file-on-prem-to-delta-table-on-databricks/m-p/70317#M7290</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90838"&gt;@pshuk&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Based on your description, you have an external pipeline that writes CSV files to a specific storage location, and you want to set up a DLT pipeline based on its output.&lt;/P&gt;
&lt;P&gt;DLT has access to a feature called Auto Loader, which can incrementally detect and ingest these files automatically. I recommend starting with a simple scenario based on the &lt;A href="https://docs.databricks.com/en/delta-live-tables/load.html#load-data-with-delta-live-tables" target="_self"&gt;Load data with Delta Live Tables&lt;/A&gt;&amp;nbsp;guide.&lt;/P&gt;
&lt;P&gt;For example:&lt;/P&gt;
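&lt;LI-CODE lang="python"&gt;import dlt

# Auto Loader ("cloudFiles") incrementally picks up new CSV files from the location.
# `spark` is provided by the pipeline runtime.
@dlt.table
def raw_data():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .load("external_pipeline_output_location/")
  )&lt;/LI-CODE&gt;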
&lt;P&gt;Next, you can explore the&amp;nbsp;&lt;A href="https://docs.databricks.com/en/ingestion/auto-loader/options.html#auto-loader-options" target="_blank" rel="noopener"&gt;Auto Loader options&lt;/A&gt;&amp;nbsp;to further customize your ingestion logic.&lt;/P&gt;
&lt;P&gt;It is also worth reading about&amp;nbsp;&lt;A href="https://docs.databricks.com/en/delta-live-tables/updates.html#continuous-vs-triggered-pipeline-execution" target="_self"&gt;Continuous vs Triggered Pipeline Execution&lt;/A&gt; to determine the best trigger option for your pipeline. You can run the DLT pipeline continuously, or trigger it when new files arrive (other trigger options are available as well).&lt;/P&gt;
&lt;P&gt;If you would like to set up the DLT pipeline through the CLI, I suggest consulting this documentation page as a reference: &lt;A href="https://docs.databricks.com/en/delta-live-tables/tutorial-bundles.html#develop-delta-live-tables-pipelines-with-databricks-asset-bundles" target="_self"&gt;Develop Delta Live Tables pipelines with Databricks Asset Bundles&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 22 May 2024 20:40:49 GMT</pubDate>
    <dc:creator>raphaelblg</dc:creator>
    <dc:date>2024-05-22T20:40:49Z</dc:date>
    <item>
      <title>ingest csv file on-prem to delta table on databricks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/ingest-csv-file-on-prem-to-delta-table-on-databricks/m-p/70280#M7289</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I want to create a Delta Live Table from a CSV file that I create locally (on-prem). A little background: I have a working ELT pipeline that finds newly generated files (since the last upload) and uploads them to a Databricks volume, and at the same time creates a CSV file locally with all the metadata about these files. Is there any way I can create a Delta Live Table in Databricks from this CSV file after my upload finishes? I am using the Databricks CLI to upload files but haven't found a way to create the table using the CLI.&lt;/P&gt;&lt;P&gt;Any help would be greatly appreciated.&lt;/P&gt;&lt;P&gt;TIA.&lt;/P&gt;</description>
      <pubDate>Wed, 22 May 2024 15:20:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/ingest-csv-file-on-prem-to-delta-table-on-databricks/m-p/70280#M7289</guid>
      <dc:creator>pshuk</dc:creator>
      <dc:date>2024-05-22T15:20:10Z</dc:date>
    </item>
    <item>
      <title>Re: ingest csv file on-prem to delta table on databricks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/ingest-csv-file-on-prem-to-delta-table-on-databricks/m-p/70317#M7290</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90838"&gt;@pshuk&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Based on your description, you have an external pipeline that writes CSV files to a specific storage location, and you want to set up a DLT pipeline based on its output.&lt;/P&gt;
&lt;P&gt;DLT has access to a feature called Auto Loader, which can incrementally detect and ingest these files automatically. I recommend starting with a simple scenario based on the &lt;A href="https://docs.databricks.com/en/delta-live-tables/load.html#load-data-with-delta-live-tables" target="_self"&gt;Load data with Delta Live Tables&lt;/A&gt;&amp;nbsp;guide.&lt;/P&gt;
&lt;P&gt;For example:&lt;/P&gt;
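&lt;LI-CODE lang="python"&gt;import dlt

# Auto Loader ("cloudFiles") incrementally picks up new CSV files from the location.
# `spark` is provided by the pipeline runtime.
@dlt.table
def raw_data():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .load("external_pipeline_output_location/")
  )&lt;/LI-CODE&gt;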
&lt;P&gt;Next, you can explore the&amp;nbsp;&lt;A href="https://docs.databricks.com/en/ingestion/auto-loader/options.html#auto-loader-options" target="_blank" rel="noopener"&gt;Auto Loader options&lt;/A&gt;&amp;nbsp;to further customize your ingestion logic.&lt;/P&gt;
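&lt;P&gt;As a rough sketch of what that customization might look like, you could collect the options in one place and pass them to the reader. The helper names csv_autoloader_options and read_csv_stream are hypothetical and only for illustration, and I am assuming your CSV files have a header row and that you want column types inferred rather than read as strings. Please check each option against the options page before relying on it.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Hypothetical helpers for illustration; only the option keys come from Auto Loader / the CSV reader.
def csv_autoloader_options(has_header=True, infer_types=True):
    """Build the option map for a CSV Auto Loader stream."""
    return {
        "cloudFiles.format": "csv",
        # Infer column types instead of reading every column as a string.
        "cloudFiles.inferColumnTypes": str(infer_types).lower(),
        # Treat the first line of each file as column names (assumption about your files).
        "header": str(has_header).lower(),
    }


def read_csv_stream(spark, path, options=None):
    """Return a streaming DataFrame over the CSV files under path."""
    opts = options if options is not None else csv_autoloader_options()
    return spark.readStream.format("cloudFiles").options(**opts).load(path)


# Inside a DLT pipeline notebook, where dlt and spark are available:
#
#   import dlt
#
#   @dlt.table
#   def raw_data():
#       return read_csv_stream(spark, "external_pipeline_output_location/")&lt;/LI-CODE&gt;
&lt;P&gt;Keeping the options in a plain dictionary makes it easy to adjust them per source without touching the table definition.&lt;/P&gt;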
&lt;P&gt;It is also worth reading about&amp;nbsp;&lt;A href="https://docs.databricks.com/en/delta-live-tables/updates.html#continuous-vs-triggered-pipeline-execution" target="_self"&gt;Continuous vs Triggered Pipeline Execution&lt;/A&gt; to determine the best trigger option for your pipeline. You can run the DLT pipeline continuously, or trigger it when new files arrive (other trigger options are available as well).&lt;/P&gt;
&lt;P&gt;If you would like to set up the DLT pipeline through the CLI, I suggest consulting this documentation page as a reference: &lt;A href="https://docs.databricks.com/en/delta-live-tables/tutorial-bundles.html#develop-delta-live-tables-pipelines-with-databricks-asset-bundles" target="_self"&gt;Develop Delta Live Tables pipelines with Databricks Asset Bundles&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 May 2024 20:40:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/ingest-csv-file-on-prem-to-delta-table-on-databricks/m-p/70317#M7290</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-05-22T20:40:49Z</dc:date>
    </item>
  </channel>
</rss>

