
Read each cell containing SQL from one notebook, execute it in another notebook, and export the results

ToReSa
New Contributor II

Hi, 

I'm new to Databricks, so excuse me if this is a silly question. I have a requirement to read cell by cell from one notebook (say notebookA) and execute the contents of each cell in another notebook (say notebookB) using a Python script. All the cells in notebookA contain SQL, so I need to execute the SQL statements one by one and export each result to a .csv file.

 

I am fine with executing the SQL and exporting the results to a .csv file, but I'm looking for some guidance on how to read each cell of notebookA from notebookB using a Python script. I'd appreciate any help in this regard.

Thank you!

5 REPLIES

menotron
Valued Contributor

I'm not sure what you are trying to achieve with this approach, but you could read a SQL Databricks notebook's source and extract the query in each cell using this script.

 

from databricks_api import DatabricksAPI
import base64
import re

# Build an API client from the current notebook's context (workspace URL and token)
notebook_context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
databricks_api_instance = DatabricksAPI(
    host=notebook_context.apiUrl().getOrElse(None),
    token=notebook_context.apiToken().getOrElse(None),
)

# Export notebookA in SOURCE format; the content comes back base64-encoded
response = databricks_api_instance.workspace.export_workspace(
    "<PATH OF NOTEBOOK A IN WORKSPACE>",
    format="SOURCE",
    direct_download=None,
    headers=None,
)

notebook_content = base64.b64decode(response["content"]).decode("utf-8")

# Drop the notebook header and MAGIC lines, then split the source on the
# cell separator to get one query per cell
exclude = [".*Databricks notebook source", ".*MAGIC.*"]
regex = re.compile("|".join(exclude))
queries = list(
    filter(None, [s.strip() for s in re.sub(regex, "", notebook_content).split("-- COMMAND ----------")])
)

# Uncomment to run each query and display its result
# [spark.sql(query).display() for query in queries]
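
Since you also need the results as CSV files, a minimal follow-on sketch could look like this (it assumes the queries list built above; the output directory is just an example path on the driver's local filesystem):

import os

# Example output directory on the driver's local filesystem (adjust as needed)
output_dir = "/tmp/query_results"
os.makedirs(output_dir, exist_ok=True)

# Run each extracted query and write its result to a separate CSV file
for i, query in enumerate(queries):
    result_df = spark.sql(query).toPandas()
    result_df.to_csv(os.path.join(output_dir, f"result_{i}.csv"), index=False)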

 

 

ToReSa
New Contributor II

Thank you, @menotron. I'll give it a try.

Kaniz_Fatma
Community Manager

Hi @ToReSa, Welcome to the Databricks Community! Thank you for sharing your question.

I see that you're working on executing SQL queries from one notebook to another using a Python script. @menotron  has shared a detailed approach that might help you achieve your goal.

Please try out the suggested solution and let us know if it works for you. If you encounter any challenges or need further assistance, don't hesitate to reach out. We're here to help!

Also, I'm reaching out to a few experts in Databricks to see if they have additional insights or alternative approaches. We'll make sure you get the guidance you need.

Thank you for being an active part of our community!

Kaniz_Fatma
Community Manager

Hi @menotron, Thank you for your prompt and detailed response! Your expertise is greatly valued, and I appreciate you taking the time to help out.

Your approach seems solid, and I'm sure it will be beneficial to @ToReSa. If you have any further thoughts or additional tips, please feel free to share them. Your contributions are what make our community thrive.

Thanks again for your continuous support!

 

Kaniz_Fatma
Community Manager

Hi @ToReSa, if you just want to execute the notebook, calling it from another notebook would be easier, and you can even exchange some data between the notebooks, as sketched below.
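
For instance, a minimal sketch of that simpler option (the notebook path and timeout are placeholders; notebookA would pass data back via dbutils.notebook.exit):

# In notebookB: run notebookA directly with a 600-second timeout.
# Whatever notebookA passes to dbutils.notebook.exit(...) comes back as a string.
returned_value = dbutils.notebook.run("/path/to/notebookA", 600)
print(returned_value)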

But if you specifically want to pick each SQL statement from one notebook and execute it in another, you can go with this approach:

  • Export notebook A as JSON: Export the notebook to JSON for programmatic access.

  • Extract SQL Commands: Read the JSON file in notebook B and extract SQL commands from each cell.

  • Execute SQL Commands: Run each SQL command in notebook B and save results to CSV files.

Here is a sample script (the workspace URL, token, and notebook path are placeholders):

import base64
import json
import requests
import pandas as pd

# Export notebookA in Jupyter (JSON) format via the Workspace API
response = requests.get(
    'https://your-databricks-instance/api/2.0/workspace/export',
    headers={'Authorization': 'Bearer your_databricks_token'},
    params={'path': '/path/to/notebookA', 'format': 'JUPYTER'}
)

# The exported notebook comes back base64-encoded under the 'content' key
notebookA_content = json.loads(
    base64.b64decode(response.json()['content']).decode('utf-8')
)

# Extract SQL commands (assumes notebookA is a SQL notebook, so each code
# cell's source is a SQL statement)
sql_commands = [
    ''.join(cell['source'])
    for cell in notebookA_content['cells']
    if cell['cell_type'] == 'code'
]

# Execute each SQL command and export its result to a CSV file
for i, sql in enumerate(sql_commands):
    result_df = spark.sql(sql).toPandas()
    result_df.to_csv(f'result_{i}.csv', index=False)
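
Note that in this sketch the result_{i}.csv files are written to the driver's local working directory; if you need them to persist after the cluster terminates, write them to a DBFS path instead (for example under /dbfs/ where the DBFS FUSE mount is available) or copy them out with dbutils.fs.cp.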

Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.

If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries.

We appreciate your participation and are here if you need further assistance!
