<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using spark.read.json with a {} literal in my path in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/using-spark-read-json-with-a-literal-in-my-path/m-p/106726#M42564</link>
    <description>&lt;P&gt;Thanks so much for responding.&amp;nbsp; It is still bombing out:&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;Path does not exist: s3://snowflake-genesys/v2.outbound.campaigns.{id}/2025-01-22/00/002054-134158ad-1647-f75a-7cd9-b36910365e09.json. SQLSTATE: 42K03&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 22 Jan 2025 20:06:52 GMT</pubDate>
    <dc:creator>johngabbradley</dc:creator>
    <dc:date>2025-01-22T20:06:52Z</dc:date>
    <item>
      <title>Using spark.read.json with a {} literal in my path</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-json-with-a-literal-in-my-path/m-p/106700#M42557</link>
      <description>&lt;P&gt;I am pulling data from an S3 bucket using spark.read.json like this:&lt;/P&gt;&lt;P&gt;s3_uri = "s3://snowflake-genesys/v2.outbound.campaigns.{id}/2025-01-22/00/"&lt;BR /&gt;df = spark.read.json(s3_uri)&lt;/P&gt;&lt;P&gt;My S3 URL has a literal {id} in the file path. I have tried r"s3://snowflake-genesys/v2.outbound.campaigns.{id}/2025-01-22/00/" and f"s3://snowflake-genesys/v2.outbound.campaigns.{{id}}/2025-01-22/00/".&lt;/P&gt;&lt;P&gt;I can get dbutils.fs.ls(f"s3://snowflake-genesys/v2.outbound.campaigns.{{id}}/2025-01-22/00/") to return the files.&lt;/P&gt;&lt;P&gt;I can get it to work with a wildcard, but that's not optimal because the wildcard also matches other large folders, such as v2.outbound.campaigns.{id}.progress/2025-01-22/, so this returns way too much data:&lt;/P&gt;&lt;P&gt;"s3://snowflake-genesys/v2.outbound.campaigns.*/2025-01-22/00/"&lt;/P&gt;&lt;P&gt;How can I get Spark to read the files correctly? What am I doing wrong?&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jan 2025 18:42:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-json-with-a-literal-in-my-path/m-p/106700#M42557</guid>
      <dc:creator>johngabbradley</dc:creator>
      <dc:date>2025-01-22T18:42:05Z</dc:date>
    </item>
    <item>
      <title>Re: Using spark.read.json with a {} literal in my path</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-json-with-a-literal-in-my-path/m-p/106724#M42563</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145401"&gt;@johngabbradley&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Would the approach below work for you?&lt;/P&gt;
&lt;P&gt;s3_uri = "s3://snowflake-genesys/v2.outbound.campaigns.{id}/2025-01-22/00/"&lt;BR /&gt;files = dbutils.fs.ls(s3_uri)&lt;BR /&gt;file_paths = [file.path for file in files]&lt;BR /&gt;df = spark.read.json(file_paths)&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jan 2025 20:00:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-json-with-a-literal-in-my-path/m-p/106724#M42563</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-01-22T20:00:45Z</dc:date>
    </item>
    <item>
      <title>Re: Using spark.read.json with a {} literal in my path</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-json-with-a-literal-in-my-path/m-p/106726#M42564</link>
      <description>&lt;P&gt;Thanks so much for responding.&amp;nbsp; It is still bombing out:&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;DIV class=""&gt;Path does not exist: s3://snowflake-genesys/v2.outbound.campaigns.{id}/2025-01-22/00/002054-134158ad-1647-f75a-7cd9-b36910365e09.json. SQLSTATE: 42K03&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 22 Jan 2025 20:06:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-json-with-a-literal-in-my-path/m-p/106726#M42564</guid>
      <dc:creator>johngabbradley</dc:creator>
      <dc:date>2025-01-22T20:06:52Z</dc:date>
    </item>
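    <!--
A possible explanation, assuming spark.read.json resolves its path arguments with Hadoop's glob syntax: curly braces denote alternation ({a,b} matches a or b), so {id} is treated as a one-option group that matches a folder named campaigns.id rather than the literal folder campaigns.{id}. The raw-string and f-string variants in the question only change how Python builds the string; both produce the same literal {id}, which Spark then globs. dbutils.fs.ls does not glob, which would explain why it finds the files, and passing the listed paths back to spark.read.json globs them again, which would explain the "Path does not exist" error. Escaping the braces with backslashes should make the glob match them literally. This is a sketch under those assumptions: the helper escape_glob is hypothetical (it is not a Spark or Databricks API), and the set of escaped characters is assumed from Hadoop's glob syntax.

```python
import re

# Characters that Hadoop-style globbing treats as special (assumed set):
# { } for alternation, [ ] for character classes, * and ? for wildcards,
# and the backslash itself, which is the escape character.
_GLOB_SPECIAL = re.compile(r"([{}\[\]*?\\])")


def escape_glob(path: str) -> str:
    """Hypothetical helper: backslash-escape glob metacharacters in `path`
    so they are matched literally instead of being expanded as a pattern."""
    return _GLOB_SPECIAL.sub(r"\\\1", path)


s3_uri = "s3://snowflake-genesys/v2.outbound.campaigns.{id}/2025-01-22/00/"
escaped_uri = escape_glob(s3_uri)
print(escaped_uri)
# prints s3://snowflake-genesys/v2.outbound.campaigns.\{id\}/2025-01-22/00/

# In a Databricks notebook (where `spark` and `dbutils` already exist),
# the escaped path would be passed to Spark instead of the raw one:
#   df = spark.read.json(escaped_uri)
# or, keeping the listing approach from the reply in this thread:
#   paths = [escape_glob(f.path) for f in dbutils.fs.ls(s3_uri)]
#   df = spark.read.json(paths)
```

Escaping the braces, rather than switching to campaigns.*, keeps the read scoped to the single folder, so the campaigns.{id}.progress folders are not picked up.
    -->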
  </channel>
</rss>

