07-25-2023 06:28 AM
I have code files (.hql) in S3 which were running on Hive. But now I need to run them on a Databricks cluster.
I can rename the files to .sql and add the comment "-- Databricks notebook source" at the top so they are treated as single-cell notebooks, but I have not found a way to run such a file directly on Databricks.
Also, I don't see any way to bring multiple files from S3 into the Databricks workspace, since running a notebook is supported only from two sources - Workspace or Git.
07-25-2023 11:08 AM
@Kratik
You could use Python to read those files from the directory and load the contents of each file into a variable.
You can then use this variable in a spark.sql statement like the one below:
%python
# Read the SQL file contents into a variable (the path below is illustrative)
with open("/dbfs/mnt/scripts/query1.sql") as f:
    sql_contents = f.read()
spark.sql(sql_contents)
You can wrap this in a function and call it in a for loop to iterate through all the files in your directory, as in the sketch below.
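A minimal sketch of that loop, assuming the bucket is mounted under /mnt/scripts (the mount path and file layout are illustrative):
%python
# Run every .sql file found in the mounted S3 directory
def run_sql_file(path):
    # Convert the dbfs:/ URI into a local /dbfs/ path so plain Python can read it
    with open(path.replace("dbfs:", "/dbfs")) as f:
        spark.sql(f.read())

for file_info in dbutils.fs.ls("dbfs:/mnt/scripts/"):
    if file_info.path.endswith(".sql"):
        run_sql_file(file_info.path)
Note that spark.sql() executes a single statement, so a file containing multiple statements would need to be split (for example on ";") before being passed to it.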
07-26-2023 07:55 AM - edited 07-26-2023 07:56 AM
This is one option, but I feel it is not the best way. If the files contain junk or special characters, there is a chance that things break or a file is not read fully.
Also, each of my SQL files has multiple variables that need to be passed as arguments at runtime, so I would prefer to run them as files.
07-25-2023 12:19 PM - edited 07-25-2023 12:21 PM
Hi, you can connect to S3, get the files, and then run them by following: https://docs.databricks.com/storage/amazon-s3.html
Let us know if this helps or if something else is expected.
Please tag @Debayan in your next comment, which will notify me. Thanks!
07-26-2023 08:00 AM
The way you suggested is for accessing a file on S3 from within a notebook. What I am trying to achieve is executing the file on S3 directly as a notebook on the Databricks cluster. Obviously, to have the SQL files recognised as Databricks notebooks, I will add the comment -- Databricks notebook source at the top.
Each of my SQL files has multiple variables that need to be passed at runtime.
08-04-2023 12:31 AM
Since I didn't find a way to run SQL code directly from S3, I moved ahead by importing the S3 files into Databricks using the API, following the steps below:
1. Added -- Databricks notebook source at the top of each file so that it is treated as a Databricks notebook.
2. Created one notebook with the logic to import the SQL files from S3 into the Databricks workspace.
The logic includes:
Building an import_params dictionary and passing it as the payload of a POST request to the workspace import API, as in the sketch below.
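A rough sketch of that request using the public /api/2.0/workspace/import endpoint; the host, token, paths, and the exact contents of import_params here are illustrative, not the original payload:
%python
import base64
import requests

# Read the SQL file (mounted S3 path is illustrative) and base64-encode it
with open("/dbfs/mnt/scripts/query1.sql", "rb") as f:
    content_b64 = base64.b64encode(f.read()).decode("utf-8")

# Payload for the workspace import API; field names follow the public API docs
import_params = {
    "path": "/Shared/imported/query1",
    "format": "SOURCE",
    "language": "SQL",
    "content": content_b64,
    "overwrite": True,
}

resp = requests.post(
    "https://<databricks-instance>/api/2.0/workspace/import",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=import_params,
)
resp.raise_for_status()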
08-14-2023 07:55 AM
@Debayan, you can see this response.
08-07-2023 12:12 AM
Hi, sorry, I missed seeing the post. Did the issue get resolved?
08-08-2023 04:48 AM
Hi @Debayan ,
I have posted the approach I followed in a previous comment. It seems that running a file directly from S3 is not possible.
08-14-2023 06:22 AM
Hi Kratik, the recommended practice would be to import the file and run it from the workspace, as otherwise execution can be slow depending on the network configuration or network speed.
08-14-2023 07:54 AM
@Debayan That's what I described in detail in my previous response: import the file into the workspace and execute it.
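For completeness, a minimal sketch of executing the imported notebook while passing the runtime variables, assuming the SQL notebooks read them as widget values (the path and argument names are illustrative):
%python
# Run the imported SQL notebook; the arguments map is exposed to it as widgets
result = dbutils.notebook.run(
    "/Shared/imported/query1",                 # workspace path used during import
    3600,                                      # timeout in seconds
    {"run_date": "2023-08-14", "env": "dev"},  # illustrative runtime variables
)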