<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Export Databricks results to Blob in a csv file in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32790#M23921</link>
    <description>&lt;P&gt;Thank you for your reply, Hubert.&lt;/P&gt;&lt;P&gt;When I run dbutils.fs.ls("/dbfs/mnt/pdf-recognized") I get an error message saying that the directory doesn't exist. I double-checked the spelling, and the container really is in that storage account, so I don't know why I get that error.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Dec 2021 13:43:14 GMT</pubDate>
    <dc:creator>frank26364</dc:creator>
    <dc:date>2021-12-21T13:43:14Z</dc:date>
    <item>
      <title>Export Databricks results to Blob in a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32788#M23919</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I want to export my data from Databricks to Blob Storage. My Databricks notebook selects some PDFs from my blob container, runs Form Recognizer on them, and exports the results back to the blob. Here is the code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%pip install azure-storage-blob
%pip install azure-ai-formrecognizer

from azure.storage.blob import ContainerClient

container_url = "https://mystorageaccount.blob.core.windows.net/pdf-raw"
container = ContainerClient.from_container_url(container_url)

# List the URL of every PDF in the source container.
for blob in container.list_blobs():
    blob_url = container_url + "/" + blob.name
    print(blob_url)

from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://myendpoint.cognitiveservices.azure.com/"
key = "mykeynumber"

form_recognizer_client = FormRecognizerClient(endpoint, credential=AzureKeyCredential(key))

import pandas as pd

field_list = ["InvoiceDate", "InvoiceID", "Items", "VendorName"]
rows = []

for blob in container.list_blobs():
    blob_url = container_url + "/" + blob.name
    print("Scanning " + blob.name + "...")
    poller = form_recognizer_client.begin_recognize_invoices_from_url(invoice_url=blob_url)
    invoices = poller.result()

    for invoice in invoices:
        # Build one row per recognized invoice; fields that were not found stay None.
        row = {}
        for field in field_list:
            entry = invoice.fields.get(field)
            row[field] = entry.value if entry else None
        row["FileName"] = blob.name
        rows.append(row)

# Build the DataFrame once at the end (DataFrame.append was removed in pandas 2.0).
df = pd.DataFrame(rows, columns=field_list + ["FileName"])
df

account_name = "mystorageaccount"
account_key = "fs.azure.account.key." + account_name + ".blob.core.windows.net"

try:
    dbutils.fs.mount(
        source="wasbs://pdf-recognized@mystorageaccount.blob.core.windows.net",
        mount_point="/mnt/pdf-recognized",
        extra_configs={account_key: dbutils.secrets.get(scope="formrec", key="formreckey")},
    )
except Exception as e:
    # Print the actual exception so a failed mount is not mistaken for "already mounted".
    print("Directory already mounted or error:", e)

df.to_csv("/dbfs/mnt/pdf-recognized/output.csv", index=False)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The code works well until the very last line, where I get the following error message: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mnt/pdf-recognized/output.csv'.&lt;/P&gt;&lt;P&gt;I tried using /dbfs:/ instead of /dbfs/, but I don't know what I am doing wrong.&lt;/P&gt;&lt;P&gt;How can I export my Databricks results to the blob?&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Mon, 20 Dec 2021 13:38:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32788#M23919</guid>
      <dc:creator>frank26364</dc:creator>
      <dc:date>2021-12-20T13:38:54Z</dc:date>
    </item>
    <item>
      <title>Re: Export Databricks results to Blob in a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32789#M23920</link>
      <description>&lt;P&gt;Please verify that the directory exists:&lt;/P&gt;&lt;P&gt;dbutils.fs.ls("/dbfs/mnt/pdf-recognized")&lt;/P&gt;</description>
      <pubDate>Mon, 20 Dec 2021 14:24:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32789#M23920</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-12-20T14:24:01Z</dc:date>
    </item>
    <item>
      <title>Re: Export Databricks results to Blob in a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32790#M23921</link>
      <description>&lt;P&gt;Thank you for your reply, Hubert.&lt;/P&gt;&lt;P&gt;When I run dbutils.fs.ls("/dbfs/mnt/pdf-recognized") I get an error message saying that the directory doesn't exist. I double-checked the spelling, and the container really is in that storage account, so I don't know why I get that error.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Dec 2021 13:43:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32790#M23921</guid>
      <dc:creator>frank26364</dc:creator>
      <dc:date>2021-12-21T13:43:14Z</dc:date>
    </item>
    <item>
      <title>Re: Export Databricks results to Blob in a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32792#M23923</link>
      <description>&lt;P&gt;Hi Kaniz, thank you for your response.&lt;/P&gt;&lt;P&gt;I tried %s/pdf-recognized/output.csv but I received the following error message:&lt;/P&gt;&lt;P&gt;UsageError: Line magic function `%s/pdf-recognized/output.csv` not found&lt;/P&gt;&lt;P&gt;Could you confirm whether this is the right way to add that line?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;account_name = "mystorageaccount"
account_key = "fs.azure.account.key." + account_name + ".blob.core.windows.net"

try:
    dbutils.fs.mount(
        source = "wasbs://pdf-recognized@mystorageaccount.blob.core.windows.net",
        mount_point = "/mnt/pdf-recognized",
        extra_configs = {account_key: dbutils.secrets.get(scope ="formrec", key="formreckey")} )

except:
    print('Directory already mounted or error')
%s/pdf-recognized/output.csv&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Sun, 09 Jan 2022 16:24:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32792#M23923</guid>
      <dc:creator>frank26364</dc:creator>
      <dc:date>2022-01-09T16:24:18Z</dc:date>
    </item>
    <item>
      <title>Re: Export Databricks results to Blob in a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32793#M23924</link>
      <description>&lt;P&gt;Hi, I am new to Databricks and this code was taken from a tutorial I found. The error happened because I had no secret scope set up in Databricks, so the mount never succeeded. Once I set up the secret scope, the code worked correctly.&lt;/P&gt;&lt;P&gt;Thank you everyone for your help!&lt;/P&gt;</description>
      <pubDate>Fri, 21 Jan 2022 15:17:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32793#M23924</guid>
      <dc:creator>frank26364</dc:creator>
      <dc:date>2022-01-21T15:17:26Z</dc:date>
    </item>
    <item>
      <title>Re: Export Databricks results to Blob in a csv file</title>
      <link>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32794#M23925</link>
      <description>&lt;P&gt;@Francis Bouliane - Thank you for sharing the solution.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Jan 2022 20:01:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-databricks-results-to-blob-in-a-csv-file/m-p/32794#M23925</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-01-21T20:01:01Z</dc:date>
    </item>
  </channel>
</rss>

