
Read each cell containing SQL from one notebook, execute it in another notebook, and export the results

ToReSa
New Contributor II

Hi, 

I'm new to Databricks, so excuse me if this is a silly question. I have a requirement to read cell by cell from one notebook (say notebookA) and execute the contents of each cell in another notebook (say notebookB) using a Python script. All the cells in notebookA contain SQL, so I need to execute the SQL statements one by one and export each result to a .csv file.

 

I am fine with executing the SQL and exporting the results to a .csv file, but I'm looking for some guidance on how to read each cell of notebookA from notebookB using a Python script. I'd appreciate any help in this regard.

Thank you!

5 REPLIES

menotron
Valued Contributor

I'm not sure what you are trying to achieve with this approach, but you could read a SQL Databricks notebook's source and extract the query in each cell using this script.

 

from databricks_api import DatabricksAPI
import base64
import re

# Build an API client from the current notebook's context (workspace URL and token)
notebook_context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
databricks_api_instance = DatabricksAPI(
    host=notebook_context.apiUrl().getOrElse(None),
    token=notebook_context.apiToken().getOrElse(None),
)

# Export notebookA in SOURCE format; the content comes back base64-encoded
response = databricks_api_instance.workspace.export_workspace(
    "<PATH OF NOTEBOOK A IN WORKSPACE>",
    format="SOURCE",
    direct_download=None,
    headers=None,
)

notebook_content = base64.b64decode(response["content"]).decode("utf-8")

# Drop the notebook header and MAGIC lines, then split the source on the
# cell separator to get one query per cell
exclude = [".*Databricks notebook source", ".*MAGIC.*"]
regex = re.compile("|".join(exclude))
queries = list(
    filter(None, [s.strip() for s in re.sub(regex, "", notebook_content).split("-- COMMAND ----------")])
)

# Uncomment to run each query and display its result
# [spark.sql(query).display() for query in queries]
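
Since you also need the results as CSV files, a minimal follow-on sketch could look like this (it assumes the queries list built above; the output directory is just an example path on the driver's local filesystem):

import os

# Example output directory on the driver's local filesystem (adjust as needed)
output_dir = "/tmp/query_results"
os.makedirs(output_dir, exist_ok=True)

# Run each extracted query and write its result to a separate CSV file
for i, query in enumerate(queries):
    result_df = spark.sql(query).toPandas()
    result_df.to_csv(os.path.join(output_dir, f"result_{i}.csv"), index=False)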

 

 

ToReSa
New Contributor II

Thank you, @menotron. I'll give it a try.

Kaniz_Fatma
Community Manager

Hi @ToReSa, Welcome to the Databricks Community! Thank you for sharing your question.

I see that you're working on executing SQL queries from one notebook to another using a Python script. @menotron  has shared a detailed approach that might help you achieve your goal.

Please try out the suggested solution and let us know if it works for you. If you encounter any challenges or need further assistance, don't hesitate to reach out. We're here to help!

Also, I'm reaching out to a few experts in Databricks to see if they have additional insights or alternative approaches. We'll make sure you get the guidance you need.

Thank you for being an active part of our community!

Kaniz_Fatma
Community Manager

Hi @menotron, Thank you for your prompt and detailed response! Your expertise is greatly valued, and I appreciate you taking the time to help out.

Your approach seems solid, and I'm sure it will be beneficial to @ToReSa. If you have any further thoughts or additional tips, please feel free to share them. Your contributions are what make our community thrive.

Thanks again for your continuous support!

 

Kaniz_Fatma
Community Manager

Hi @ToReSa, if you just want to execute the notebook, calling it from another notebook would be easier, and you can even exchange some data between the notebooks, as sketched below.
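
For instance, a minimal sketch of that simpler option (the notebook path and timeout are placeholders; notebookA would pass data back via dbutils.notebook.exit):

# In notebookB: run notebookA directly with a 600-second timeout.
# Whatever notebookA passes to dbutils.notebook.exit(...) comes back as a string.
returned_value = dbutils.notebook.run("/path/to/notebookA", 600)
print(returned_value)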

But if you specifically want to pick each SQL statement from one notebook and execute it in another, you can go with this approach:

  • Export notebook A as JSON: Export the notebook to JSON for programmatic access.

  • Extract SQL Commands: Read the JSON file in notebook B and extract SQL commands from each cell.

  • Execute SQL Commands: Run each SQL command in notebook B and save results to CSV files.

Here is a sample script (the workspace URL, token, and notebook path are placeholders):

import base64
import json
import requests
import pandas as pd

# Export notebookA in Jupyter (JSON) format via the Workspace API
response = requests.get(
    'https://your-databricks-instance/api/2.0/workspace/export',
    headers={'Authorization': 'Bearer your_databricks_token'},
    params={'path': '/path/to/notebookA', 'format': 'JUPYTER'}
)

# The exported notebook comes back base64-encoded under the 'content' key
notebookA_content = json.loads(
    base64.b64decode(response.json()['content']).decode('utf-8')
)

# Extract SQL commands (assumes notebookA is a SQL notebook, so each code
# cell's source is a SQL statement)
sql_commands = [
    ''.join(cell['source'])
    for cell in notebookA_content['cells']
    if cell['cell_type'] == 'code'
]

# Execute each SQL command and export its result to a CSV file
for i, sql in enumerate(sql_commands):
    result_df = spark.sql(sql).toPandas()
    result_df.to_csv(f'result_{i}.csv', index=False)
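
Note that in this sketch the result_{i}.csv files are written to the driver's local working directory; if you need them to persist after the cluster terminates, write them to a DBFS path instead (for example under /dbfs/ where the DBFS FUSE mount is available) or copy them out with dbutils.fs.cp.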

Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.

If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries.

We appreciate your participation and are here if you need further assistance!
