<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta live tables multiple .csv diff schemas in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92426#M38435</link>
    <description>&lt;P&gt;The code for each table follows a pattern similar to the one below.&lt;/P&gt;&lt;PRE&gt;import dlt
import re
import pyspark.sql.functions as F

landing_zone = '/Volumes/bronze_dev/landing_zone/'
source = 'addresses'

@dlt.table(
    name="addresses",
    comment="addresses snapshot",
    # table properties belong in the decorator, not in the function signature
    table_properties={"quality": "bronze"}
)
def addresses():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", True)
        .option("header", True)
        .option("quote", "\"")
        .load(f"{landing_zone}{source}")
        .select(
            "*",
            F.current_timestamp().alias("processing_time"),
            F.col("_metadata.file_name").alias("source_file")
        )
    )&lt;/PRE&gt;</description>
    <pubDate>Tue, 01 Oct 2024 11:55:36 GMT</pubDate>
    <dc:creator>Frustrated_DE</dc:creator>
    <dc:date>2024-10-01T11:55:36Z</dc:date>
    <item>
      <title>Delta live tables multiple .csv diff schemas</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92406#M38428</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I have a fairly straightforward task: I want to ingest six .csv files, each with a different name, schema and blob location, into individual tables in a single bronze schema. The files sit in my landing zone under separate folders. The code is consolidated in one notebook, with each cell running the Auto Loader ('cloudFiles') ingest code against a different file location and loading into an appropriately named table.&lt;/P&gt;&lt;P&gt;The issue I am facing is that when I run the pipeline, the tables are created with the correct names, but their schemas are all the same. Does anyone have any suggestions as to where it could be going wrong? It seems like the most straightforward task, or am I missing some limitation? Any thoughts appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2024 08:59:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92406#M38428</guid>
      <dc:creator>Frustrated_DE</dc:creator>
      <dc:date>2024-10-01T08:59:48Z</dc:date>
    </item>
    <item>
      <title>Re: Delta live tables multiple .csv diff schemas</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92416#M38429</link>
      <description>&lt;P&gt;Have you enabled schema inference while reading the CSV files?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2024 10:03:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92416#M38429</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-10-01T10:03:03Z</dc:date>
    </item>
    <item>
      <title>Re: Delta live tables multiple .csv diff schemas</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92418#M38430</link>
      <description>&lt;P&gt;Yes, werners, I have enabled schema inference.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2024 10:19:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92418#M38430</guid>
      <dc:creator>Frustrated_DE</dc:creator>
      <dc:date>2024-10-01T10:19:27Z</dc:date>
    </item>
    <item>
      <title>Re: Delta live tables multiple .csv diff schemas</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92420#M38432</link>
      <description>&lt;P&gt;It could be that all the data is being read instead of only a single subfolder.&lt;/P&gt;&lt;P&gt;Can you share some code, perhaps?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2024 10:24:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92420#M38432</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-10-01T10:24:16Z</dc:date>
    </item>
    <item>
      <title>Re: Delta live tables multiple .csv diff schemas</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92426#M38435</link>
      <description>&lt;P&gt;The code for each table follows a pattern similar to the one below.&lt;/P&gt;&lt;PRE&gt;import dlt
import re
import pyspark.sql.functions as F

landing_zone = '/Volumes/bronze_dev/landing_zone/'
source = 'addresses'

@dlt.table(
    name="addresses",
    comment="addresses snapshot",
    # table properties belong in the decorator, not in the function signature
    table_properties={"quality": "bronze"}
)
def addresses():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", True)
        .option("header", True)
        .option("quote", "\"")
        .load(f"{landing_zone}{source}")
        .select(
            "*",
            F.current_timestamp().alias("processing_time"),
            F.col("_metadata.file_name").alias("source_file")
        )
    )&lt;/PRE&gt;</description>
      <pubDate>Tue, 01 Oct 2024 11:55:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-multiple-csv-diff-schemas/m-p/92426#M38435</guid>
      <dc:creator>Frustrated_DE</dc:creator>
      <dc:date>2024-10-01T11:55:36Z</dc:date>
    </item>
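    <!--
One possible cause, assuming each notebook cell reassigns the module-level `source` variable as in the snippet in the last reply, and assuming DLT calls the decorated query functions only after the whole notebook has been evaluated: a Python function looks up a global name such as `source` when it runs, not when it is defined. Every table would then load from the last folder assigned, which would produce identical schemas. The sketch below simulates this in plain Python. `table`, `registry`, `run_pipeline`, `define_bronze_table` and the `customers` folder are hypothetical stand-ins for illustration, not the DLT API.

```python
# Hypothetical stand-in for a lazily evaluated pipeline framework: the
# decorator only registers each query function, and the "pipeline" calls
# them later, after all definitions have run. This mimics how DLT is
# assumed to build its dataflow graph.
registry = {}


def table(name):
    """Hypothetical stand-in for @dlt.table: register, do not run."""
    def decorator(func):
        registry[name] = func
        return func
    return decorator


def run_pipeline():
    """Call every registered query once all definitions are loaded."""
    return {name: func() for name, func in registry.items()}


landing_zone = "/Volumes/bronze_dev/landing_zone/"

# Pitfall: each "cell" rebinds the module-level variable `source`, but the
# function bodies only read `source` when they are finally called.
source = "addresses"


@table(name="addresses")
def addresses():
    return f"{landing_zone}{source}"


source = "customers"  # hypothetical second folder


@table(name="customers")
def customers():
    return f"{landing_zone}{source}"


late_bound = run_pipeline()
# Both entries now point at the last folder assigned to `source`.

# Fix: bind the folder at definition time through a function parameter,
# so each registered query captures its own path.
registry.clear()


def define_bronze_table(folder):
    @table(name=folder)
    def _query():
        return f"{landing_zone}{folder}"
    return _query


for folder in ["addresses", "customers"]:
    define_bronze_table(folder)

bound = run_pipeline()
```

In a real pipeline, the same factory shape would wrap `@dlt.table(name=folder, ...)` around the `spark.readStream` query from the post, with `.load(f"{landing_zone}{folder}")`. Treat this as a sketch of one hypothesis rather than a confirmed diagnosis.
    -->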
  </channel>
</rss>

