- 936 Views
- 2 replies
- 0 kudos
What is the best practice for accelerating queries that look like the following?

win = Window.partitionBy('key1', 'key2').orderBy('timestamp')
df.select('timestamp', (F.col('col1') - F.lag('col1').over(win)).alias('col1_diff'))

I have tried to use OP...
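For reference, a runnable sketch of the pattern in the excerpt, with hypothetical sample data standing in for the real dataframe:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical rows matching the schema implied by the question.
df = spark.createDataFrame(
    [("a", "x", 1, 10), ("a", "x", 2, 15), ("a", "x", 3, 12)],
    ["key1", "key2", "timestamp", "col1"],
)

# Lag-based difference within each (key1, key2) group, ordered by timestamp.
win = Window.partitionBy("key1", "key2").orderBy("timestamp")
result = df.select(
    "timestamp",
    (F.col("col1") - F.lag("col1").over(win)).alias("col1_diff"),
)
result.show()
```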
Latest Reply
Hi @Hanan Shteingart Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
by kll • New Contributor III
- 1545 Views
- 1 replies
- 0 kudos
I am running a Jupyter notebook on a cluster with configuration: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12); worker type: i3.xlarge, 30.5 GB memory, 4 cores; min 2 and max 8 workers.

cursor = conn.cursor()
cursor.execute(
    """
    ...
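For context, a minimal sketch of the cursor pattern the excerpt appears to use, assuming the databricks-sql-connector package; the hostname, HTTP path, and token are placeholders:

```python
from databricks import sql

# Placeholder connection details; real values come from the workspace.
conn = sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi-example-token",
)

cursor = conn.cursor()
cursor.execute("SELECT 1 AS probe")  # stand-in for the real query
print(cursor.fetchall())

cursor.close()
conn.close()
```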
Latest Reply
Hi, could you please confirm the usage of your cluster while running this job? You can monitor the performance with different metrics here: https://docs.databricks.com/clusters/clusters-manage.html#monitor-performance. Also, please tag @Debayan with...
- 5136 Views
- 5 replies
- 1 kudos
Hi All, I am facing a performance issue with one of my PySpark UDF functions that posts data to a REST API (which uses a Cosmos DB backend to store the data). Please find the details below: # The Spark dataframe (df) contains about 30-40k rows. # I am using pyt...
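One common way to speed up this kind of workload is to batch the HTTP calls per partition instead of calling the API once per row through a UDF; a hedged sketch, assuming the requests library and a hypothetical endpoint:

```python
import requests

# Hypothetical endpoint; the real URL belongs to the API in question.
API_URL = "https://example.com/api/ingest"

def post_partition(rows):
    # One session per partition amortizes TCP/TLS setup across many calls.
    with requests.Session() as session:
        batch = [row.asDict() for row in rows]
        # Post in chunks rather than one request per row.
        for i in range(0, len(batch), 100):
            session.post(API_URL, json=batch[i:i + 100], timeout=30)

# df is the dataframe from the question; foreachPartition runs the
# function once per partition on the executors.
df.foreachPartition(post_partition)
```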
Latest Reply
Hi @Sanjoy Sen Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback w...
- 2397 Views
- 6 replies
- 1 kudos
Hi All, As part of our solution approach, we need to connect to one of our AWS RDS Oracle databases from an Azure Databricks notebook. We need your help to understand which IP range of Azure Databricks to consider to whitelist on the AWS RDS security gro...
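Once the security group allows the traffic, the connection itself is typically a plain JDBC read; a minimal sketch with placeholder host, table, and credentials (the Oracle JDBC driver must be installed on the cluster):

```python
# Placeholder connection details for an RDS Oracle instance.
jdbc_url = "jdbc:oracle:thin:@//my-rds-host.rds.amazonaws.com:1521/ORCL"

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "MY_SCHEMA.MY_TABLE")  # hypothetical table
      .option("user", "my_user")                # placeholder user
      .option("password", dbutils.secrets.get(scope="my-scope", key="oracle-pw"))  # placeholder scope/key
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .load())
```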
Latest Reply
Hi @Mahesh D Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!
- 4547 Views
- 13 replies
- 9 kudos
I'm on Unity Catalog. I'm trying to do a dbt run on a project that works locally, but the Databricks dbt workflow task seems to be ignoring the project.yml settings for schemas and catalogs, as well as those defined in the config block of individual model...
Latest Reply
Hi @Jakub K I'm sorry you could not find a solution to your problem in the answers provided. Our community strives to provide helpful and accurate information, but sometimes an immediate solution may only be available for some issues. I suggest provid...
by SS2 • Valued Contributor
- 915 Views
- 2 replies
- 0 kudos
Hi @Ananth Arunachalam/Team, can we read a file from ADLS Gen2 using a shell script (%%bash or %%sh) without mounting? Please let me know. Thank you.
Latest Reply
@S S You can access data in ADLS Gen2 in multiple ways; please check the article below. The easy way is the storage account access key method: https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage
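A hedged sketch of the access key approach the reply points to; the storage account, container, and secret scope names are placeholders:

```python
# Set the account key on the Spark session so abfss:// paths resolve
# without mounting. The key is read from a hypothetical secret scope.
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"),
)

df = spark.read.csv(
    "abfss://mycontainer@mystorageacct.dfs.core.windows.net/path/to/file.csv",
    header=True,
)
display(df)
```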
- 1032 Views
- 3 replies
- 0 kudos
Trying to follow along with the DLT videos in the academy. I get an error when running the setup script; error trace below. It stems from running Classroom-Setup-04.1:

DA = DBAcademyHelper(course_config=course_config, lesson_config=...
Latest Reply
I tried with Py4J versions 0.10.9.5, .3, and .1. None of those versions worked. I also tried upgrading the runtime to 13.0 and 12.1 and saw the same issue. The 13.0 runtime upgraded Py4J to 0.10.9.7 and that didn't resolve the issue. The error stayed...
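For anyone comparing versions, a quick standard-library check of which Py4J build a cluster actually has installed:

```python
# Query the installed distribution metadata for py4j (Python 3.8+).
import importlib.metadata

print(importlib.metadata.version("py4j"))
```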
by adrin • New Contributor III
- 26814 Views
- 9 replies
- 6 kudos
I see that the way to move from Python to SQL is to create a temp view and then access that dataframe from SQL in a SQL cell.
Now the question is: how can I have a %sql cell with a select statement in it, and assign the result of that statement to ...
Latest Reply
Results from an SQL cell are available as a Python DataFrame. The Python DataFrame name is _sqldf. To save the DataFrame, run this code in a Python cell: df = _sqldf. Keep in mind that the value in _sqldf is held in memory and will be replaced with the m...
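A minimal illustration of the hand-off described in the reply; the table names are hypothetical:

```python
# Cell 1 is a SQL cell, e.g.:
#   %sql
#   SELECT id, amount FROM my_table LIMIT 10   -- hypothetical table

# Cell 2 (Python): the most recent SQL cell's result is exposed as _sqldf.
df = _sqldf  # capture it before the next SQL cell overwrites it
df.write.mode("overwrite").saveAsTable("my_saved_result")  # hypothetical target
```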
by shamly • New Contributor III
- 2292 Views
- 4 replies
- 4 kudos
I am trying to read a CSV and perform an activity from an Azure storage account using a Databricks shell script. I wanted to add this shell script to my big Python code for other sources as well. I have created widgets for the file path in Python. I have created...
Latest Reply
You can mount the storage account, then set an environment-level variable and perform the operation that you want.
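A hedged sketch of that hand-off, assuming a mount point and widget name of our own invention:

```python
import os

# Read the file path from a notebook widget (hypothetical widget name).
file_path = dbutils.widgets.get("file_path")

# Environment variables set in a Python cell are inherited by the
# subprocesses that later %sh cells run in.
os.environ["FILE_PATH"] = f"/dbfs/mnt/mydata/{file_path}"

# A following shell cell could then read the file, e.g.:
#   %sh
#   head "$FILE_PATH"
```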
by KVNARK • Honored Contributor II
- 1251 Views
- 9 replies
- 5 kudos
It would be great if Databricks started increasing the number of rewards, as the number of users in the community is increasing. When we want to redeem something, the limited goodies available in the community rewards portal are out of stock. So it's better to incr...
Latest Reply
@Kaniz Fatma @Vidula Khanna Hi. I just see the below rewards available to redeem. Is this different based on the location?
- 3416 Views
- 2 replies
- 1 kudos
The Databricks widget utility (dbutils) provides the get function for accessing the job parameters of a job: dbutils.widgets.get('my_param'). Unlike a Python dict, where get returns None or an optional default argument if the dict doesn't contain the key, the widg...
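A hedged sketch of a dict-style safe lookup around the widget call; the helper name is ours, not part of dbutils:

```python
def get_widget_or_default(name, default=None):
    # dbutils.widgets.get raises if the widget/parameter does not exist,
    # so emulate dict.get by catching the failure and returning a default.
    try:
        return dbutils.widgets.get(name)
    except Exception:
        return default

my_param = get_widget_or_default("my_param", default="fallback")
```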
Latest Reply
Hi @Mattias P Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...
- 1156 Views
- 2 replies
- 0 kudos
I am using databricks-connect to access a remote cluster. Everything works as expected and I can set breakpoints and interrogate the results, same for when it tries to execute the following code:

val testDF = spark.createDataFrame(spark.sparkContext .e...
Latest Reply
Hi @James Metcalf Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...
- 1509 Views
- 2 replies
- 0 kudos
Working with Delta files in Spark Structured Streaming, what is the maximum default chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark
Latest Reply
Hello @KARTHICK N, The default value for spark.sql.files.maxPartitionBytes is 128 MB. These defaults are in the Apache Spark documentation at https://spark.apache.org/docs/latest/sql-performance-tuning.html (unless there might be some overrides). To che...
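A short sketch of how to inspect that setting, along with the Delta-specific per-batch options; the path is a placeholder:

```python
# Inspect the current value of the partition-size setting.
print(spark.conf.get("spark.sql.files.maxPartitionBytes"))

# For Delta streaming sources, micro-batch size is governed by
# maxFilesPerTrigger (default 1000 files) and maxBytesPerTrigger.
stream = (spark.readStream
          .format("delta")
          .option("maxFilesPerTrigger", 500)
          .option("maxBytesPerTrigger", "1g")
          .load("/mnt/delta/events"))  # placeholder path
```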