Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rimaissa
by New Contributor III
  • 1636 Views
  • 3 replies
  • 0 kudos

DLT apply_changes not accepting upsert

I have a DLT pipeline that goes bronze -> silver -> gold -> platinum. I need to include a table, joined to the gold layer for platinum, that allows upserts in the DLT pipeline. This table is managed externally via the Databricks API. Anytime a chang...

Latest Reply
Mike_Szklarczyk
Contributor
  • 0 kudos

You obtain this error: "Detected a data update in the source table at version 1. This is currently not supported..." because DLT is based on Structured Streaming, and for Structured Streaming any changes (deletes, updates) in the source table are n...
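For readers hitting the same error, a minimal sketch of one common workaround is to read the externally managed source with skipChangeCommits so that update/delete commits are ignored by the streaming read; the table and function names below are placeholders, not the original poster's pipeline.

import dlt

# Hypothetical illustration: stream from an externally updated Delta table while
# skipping commits that rewrite existing data (updates/deletes), which otherwise
# raise "Detected a data update in the source table".
@dlt.table(name="platinum_join_input")
def platinum_join_input():
    return (
        spark.readStream
        .option("skipChangeCommits", "true")  # ignore update/delete commits in the source
        .table("catalog.schema.externally_managed_table")  # placeholder table name
    )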

2 More Replies
alejandrofm
by Valued Contributor
  • 8509 Views
  • 8 replies
  • 10 kudos

Resolved! Pandas.spark.checkpoint() doesn't break lineage

Hi, I'm doing something simple in a Databricks notebook: spark.sparkContext.setCheckpointDir("/tmp/")   import pyspark.pandas as ps   sql = ("""select field1, field2 From table Where date >= '2021-01-01'""")   df = ps.sql(sql)   df.spark.checkpoint() That...

Latest Reply
annafina
New Contributor II
  • 10 kudos

checkpoint() returns a checkpointed DataFrame, so you need to assign it to a new variable: checkpointedDF = df.spark.checkpoint()
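A short sketch of the fix described above, using the pandas-on-Spark accessor from the original post; the table and column names are placeholders.

import pyspark.pandas as ps

spark.sparkContext.setCheckpointDir("/tmp/checkpoints")

psdf = ps.sql("SELECT field1, field2 FROM some_table WHERE date >= '2021-01-01'")

# checkpoint() does not cut the lineage of psdf in place; it returns a new,
# checkpointed DataFrame that must be assigned and used from then on.
checkpointed = psdf.spark.checkpoint()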

7 More Replies
OldManCoder
by New Contributor II
  • 929 Views
  • 1 reply
  • 1 kudos

Resolved! Oracle DB connection works in single-user compute but not shared compute

I can connect to an on-prem Oracle DB using my single-user compute, but when I switch over to a shared compute, I get an invalid username/password error. I can connect to my on-prem SingleStore DB using either compute, so I'm not sure why Oracle would be diffe...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Based on internal research, I found that shared access mode does not currently support the Oracle JDBC connector; it is only supported in Single/Assigned access mode. There is a feature request to include the Oracle connector as part of Lakehouse Federation. Once it is in...
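For context, a minimal sketch of the kind of JDBC read that works on single-user (assigned) compute; the host, service name, schema, secret scope, and driver class are all placeholders, and whether it runs on shared compute is subject to the limitation described above.

# Hypothetical Oracle JDBC read; all connection details are placeholders.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCLPDB1")
    .option("dbtable", "MY_SCHEMA.MY_TABLE")
    .option("user", dbutils.secrets.get("my-scope", "oracle-user"))
    .option("password", dbutils.secrets.get("my-scope", "oracle-password"))
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .load()
)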

nakaxa
by New Contributor
  • 38664 Views
  • 4 replies
  • 0 kudos

Fastest way to write a Spark DataFrame to a Delta table

I read a huge array with several columns into memory, then I convert it into a Spark DataFrame. When I want to write it to a Delta table using the following command, it takes forever (I have a driver with large memory and 32 workers): df_exp.write.m...

Latest Reply
Reiska
New Contributor II
  • 0 kudos

The answers here are not correct. TL;DR: _after_ the Spark DF is materialized, saveAsTable takes ages: 35 seconds for 1 million rows. saveAsTable() is SLOW - terribly so. Why? Would be nice to get an answer. The workaround is to avoid Spark for Delta - no...
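The reply is cut off, but one way to write a Delta table without going through Spark, which appears to be what the workaround refers to, is the deltalake (delta-rs) Python package. This is only a guess at the intended approach; the path and data are placeholders.

# Hypothetical sketch: write a Delta table directly from pandas with delta-rs
# (pip install deltalake), bypassing Spark's saveAsTable entirely.
import pandas as pd
from deltalake import write_deltalake

pdf = pd.DataFrame({"id": range(1_000_000), "value": ["x"] * 1_000_000})
write_deltalake("/tmp/exp_delta_table", pdf, mode="overwrite")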

3 More Replies
JissMathew
by Valued Contributor
  • 3967 Views
  • 12 replies
  • 8 kudos

Resolved! Reading a csv file

While trying to read a CSV file into a DataFrame using the CSV file format, it fails with formatting and column errors while loading the data in Databricks. The code I used: df = spark.read.format("csv") \    .option("header", "true") ...

Latest Reply
Mike_Szklarczyk
Contributor
  • 8 kudos

You can try adding the multiline option: df = ( spark.read.format("csv") .option("header", "true") .option("quote", '"') .option("delimiter", ",") .option("nullValue", "") .option("emptyValue", "NULL") .option("multiline", True) .schema(schem...
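The snippet above is truncated by the preview; an untruncated version of the same pattern might look like the following, where the schema fields and file path are placeholders.

from pyspark.sql.types import StructType, StructField, StringType

# Placeholder schema; replace with the real column definitions.
schema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", StringType(), True),
])

df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", ",")
    .option("nullValue", "")
    .option("emptyValue", "NULL")
    .option("multiline", True)  # lets quoted fields span line breaks
    .schema(schema)
    .load("/path/to/file.csv")  # placeholder path
)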

11 More Replies
184754
by New Contributor II
  • 1142 Views
  • 2 replies
  • 2 kudos

Table Trigger - Too many logfiles

Hi, we have implemented a job that runs on a trigger of a table update. The job worked perfectly, until the source table had accumulated too many log files and the job stopped running, showing only the error message below: Storage location /abcd/_d...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hi @184754 Interesting topic. As the docs say: "Log files are deleted automatically and asynchronously after checkpoint operations and are not governed by VACUUM. While the default retention period of log files is 30 days, running VACUUM on a table r...
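Following on from the docs quote above, one hedged option is to shorten the Delta log retention on the triggering table so old JSON commit files are cleaned up at the next checkpoint; the table name and interval below are placeholders.

# Hypothetical example: shorten Delta log retention on the source table.
spark.sql("""
    ALTER TABLE catalog.schema.source_table
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
""")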

1 More Replies
ameet9257
by Contributor
  • 2014 Views
  • 2 replies
  • 1 kudos

Cloning workflows from one env to a different env using the Jobs API

Hi Team, one of my team members recently shared a requirement: he wants to migrate 10 workflows from the sandbox to the dev environment to run his model in dev. I wanted to move all these workflows in an automated way, and one of the solutions...
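As a rough illustration of the automated route discussed in the thread, the sketch below reads a job's settings from one workspace with the Jobs 2.1 API and re-creates it in another. The hosts, tokens, and job_id are placeholders, and error handling is omitted.

import requests

SRC = {"host": "https://sandbox-workspace.azuredatabricks.net", "token": "<sandbox-pat>"}
DST = {"host": "https://dev-workspace.azuredatabricks.net", "token": "<dev-pat>"}

def clone_job(job_id: int) -> dict:
    # Fetch the job definition from the source workspace.
    src_job = requests.get(
        f"{SRC['host']}/api/2.1/jobs/get",
        headers={"Authorization": f"Bearer {SRC['token']}"},
        params={"job_id": job_id},
    ).json()

    # Re-create the job in the target workspace from the same settings.
    return requests.post(
        f"{DST['host']}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {DST['token']}"},
        json=src_job["settings"],
    ).json()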

Latest Reply
ameet9257
Contributor
  • 1 kudos

@Stefan-Koch Thanks. This looks interesting and I will try this. 

1 More Replies
NehaR
by New Contributor III
  • 824 Views
  • 2 replies
  • 2 kudos

Is there any option in Databricks to estimate the cost of a query before execution?

Hi Team, I want to check if there is any option in Databricks that can help estimate the cost of a query before execution? I mean, calculate DBUs before actual query execution, based on the logical plan? Regards
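The thread does not surface a built-in DBU estimator, but as a partial, hedged illustration of inspecting a plan before running it, Spark can print the optimizer's cost statistics for a query; the table name below is a placeholder.

# EXPLAIN COST prints the logical plan with size/row-count statistics
# (when available) without executing the query.
plan = spark.sql(
    "EXPLAIN COST SELECT field1, COUNT(*) FROM catalog.schema.big_table GROUP BY field1"
)
plan.show(truncate=False)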

Latest Reply
NehaR
New Contributor III
  • 2 kudos

Is there any way to track the progress or ETA? Do we have access to the Ideas Portal? Where can we search for this reference number, DB-I-5730?

1 More Replies
jeremy98
by Honored Contributor
  • 3723 Views
  • 2 replies
  • 2 kudos

Ways to quickly write millions of rows into a new Delta table

Hello everyone, I am facing an issue with writing 100–500 million rows (partitioned by a column) into a newly created Delta table. I have set up a cluster with 256 GB of memory and 64 cores. However, the following code takes a considerable amount of t...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hi @jeremy98 This is what I would suggest to test: 1) remove the repartition step or reduce the number of partitions (start with the number of cores and then try to increase it x2, x3): repartition(num_partitions*4, partition_col). I know repartitioning helps to di...
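A minimal sketch of the first suggestion above, assuming a 64-core cluster and a DataFrame df with a partition_col column (both placeholders, as is the target table name).

num_cores = 64  # start with the number of cores, then try x2, x3 as suggested

(
    df.repartition(num_cores, "partition_col")
    .write
    .format("delta")
    .mode("overwrite")
    .partitionBy("partition_col")
    .saveAsTable("catalog.schema.new_table")  # placeholder target table
)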

1 More Replies
joeyslaptop
by New Contributor II
  • 899 Views
  • 1 reply
  • 0 kudos

Resolved! How do I use a Databricks SQL query to convert a field value '%' back into a wildcard?

Hi. If I've posted to the wrong area, please let me know. I am using SQL to join two tables. One table has the wildcard '%' stored as text/string/varchar. I need to join the value of TableA.column1 to TableB.column1 based on the wildcard in the str...

Latest Reply
JAHNAVI
Databricks Employee
  • 0 kudos

Hi, could you please try the query below and let me know if it meets your requirements? SELECT * FROM TableA A LEFT JOIN TableB B ON A.Column1 LIKE REPLACE(B.Column1, '%', '%%') REPLACE helps us in treating the '%' stored in TableB.Column1 as a wildcar...
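A small, self-contained toy version of the same idea, with made-up data, so the LIKE-based join can be tried directly; here the stored value is used as the pattern as-is rather than run through REPLACE.

# TableA holds concrete values; TableB stores '%' wildcards as text.
spark.createDataFrame([("widget-123",), ("gadget-456",)], ["column1"]) \
    .createOrReplaceTempView("TableA")
spark.createDataFrame([("widget-%",), ("gizmo-%",)], ["column1"]) \
    .createOrReplaceTempView("TableB")

matched = spark.sql("""
    SELECT A.column1 AS a_value, B.column1 AS b_pattern
    FROM TableA A
    LEFT JOIN TableB B
      ON A.column1 LIKE B.column1
""")
matched.show()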

swetha
by New Contributor III
  • 4444 Views
  • 4 replies
  • 1 kudos

Error: "no streaming listener attached to the spark app" is the error we are observing after accessing the streaming statistics API. Please help us with this issue ASAP. Thanks.

Issue: Spark Structured Streaming application. After adding the listener jar file in the cluster init script, the listener is working (from what I see in the stdout/log4j logs). But when I try to hit, with 'Content-Type: application/json', http://host:port/...
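For reference, this is roughly the call the post describes; the host, port, and app id are placeholders. Note that the /streaming endpoints of the Spark REST API report on DStream (spark.streaming) listeners, which is a common reason Structured Streaming applications get the "no streaming listener attached" response.

import requests

# Placeholder host/port/app id; on Databricks the driver UI is usually proxied,
# so the exact URL depends on how the REST API is exposed in your workspace.
url = "http://driver-host:4040/api/v1/applications/app-20240101123456-0001/streaming/statistics"
resp = requests.get(url, headers={"Content-Type": "application/json"})
print(resp.status_code)
print(resp.text)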

Latest Reply
INJUSTIC
New Contributor II
  • 1 kudos

Have you found the solution? Thanks

3 More Replies
swetha
by New Contributor III
  • 4485 Views
  • 3 replies
  • 1 kudos

I am unable to attach a streaming listener to a Spark Streaming job. Error: "no streaming listener attached to the spark application" is the error we are observing after accessing the streaming statistics API. Please help us with this issue ASAP. Thanks.

Issue: After adding the listener jar file in the cluster init script, the listener is working (from what I see in the stdout/log4j logs). But when I try to hit, with 'Content-Type: application/json', http://host:port/api/v1/applications/app-id/streaming/st...

Latest Reply
INJUSTIC
New Contributor II
  • 1 kudos

Have you found the solution? Thanks

2 More Replies
dbuschi
by New Contributor II
  • 1271 Views
  • 2 replies
  • 0 kudos

Resolved! Delta Live Tables: How does it identify new files?

Hi, I'm importing large numbers of parquet files (ca. 5,200 files per day; they each land in a separate folder) into Azure ADLS storage. I have a DLT streaming table reading from the root folder. I noticed a massive spike in storage account costs due to f...

Latest Reply
dbuschi
New Contributor II
  • 0 kudos

To resolve the issue of excessive directory scanning, I have changed the folder structure to separate historical files from current files and reduce the number of folders and files that the Databricks process monitors.
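Alongside restructuring the folders, a hedged sketch of another lever is Auto Loader's file-notification mode, which discovers new files via storage events instead of re-listing every landing folder on each update; the path and table name are placeholders, and notification mode has its own cloud setup prerequisites.

import dlt

@dlt.table(name="bronze_parquet")
def bronze_parquet():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.useNotifications", "true")  # avoid full directory listings
        .load("abfss://landing@mystorageaccount.dfs.core.windows.net/root/")  # placeholder path
    )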

1 More Replies
KuruDev
by New Contributor II
  • 2312 Views
  • 3 replies
  • 0 kudos

Databricks Asset Bundle - Not fully deploying in Azure Pipeline

 Hello Community, I'm encountering a challenging issue with my Azure Pipeline and I'm hoping someone here might have some insights. I'm attempting to deploy a Databricks bundle that includes both notebooks and workflow YAML files. When deploying the ...

Latest Reply
adfo
New Contributor II
  • 0 kudos

Hello, same issue here: files and wheel are deployed and present in the Databricks workspace, but the jobs are not created.

2 More Replies
TheoDeSo
by New Contributor III
  • 17341 Views
  • 8 replies
  • 5 kudos

Resolved! Error when writing output from Azure Databricks to a blob storage account

Hello, after implementing the use of a Secret Scope to store secrets in an Azure Key Vault, I faced a problem. When writing an output to the blob I get the following error: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access con...
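Without knowing the exact root cause in this thread, a minimal sketch of the usual wiring, reading the storage key from the secret scope before writing to wasbs://, looks like the following; the account, container, scope, and key names are placeholders.

storage_account = "mystorageaccount"   # placeholder
container = "output"                   # placeholder

# Pull the storage account key from the Key Vault-backed secret scope.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    dbutils.secrets.get(scope="my-keyvault-scope", key="storage-account-key"),
)

df.write.mode("overwrite").parquet(
    f"wasbs://{container}@{storage_account}.blob.core.windows.net/path/to/output"
)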

Latest Reply
nguyenthuymo
New Contributor III
  • 5 kudos

Hi all, is it correct that Azure Databricks only supports writing data to Azure Data Lake Gen2 and does not support Azure Storage Blob (StorageV2 - general purpose)? In my case, I can read the data from Azure Storage Blob (StorageV2 - general purp...

7 More Replies
