Data Engineering

Forum Posts

by Orianh • Valued Contributor II

06-06-2022 8:28:26 AM

4003 Views
10 replies
2 kudos

Resolved! Databricks job CLI

Hey guys, I'm trying to create a job via the Databricks CLI. This job is going to use a wheel file that I already uploaded to DBFS, and I exported the entry point needed for the job from this package. In the UI I can see that the job has been created, Bu...
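
For readers landing on this thread: a job that runs an entry point from an uploaded wheel is typically described by a JSON spec. The sketch below builds one in Python, assuming the Jobs API 2.1 layout (a `python_wheel_task` plus a `whl` library). The job name, package name, entry point, DBFS path, and cluster ID are hypothetical placeholders, not values from the original post.

```python
import json


def build_wheel_job_spec(job_name, package_name, entry_point, wheel_path, cluster_id):
    """Return a job spec dict for a single wheel task (assumed Jobs API 2.1 layout)."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "main",
                # Hypothetical: reuse an existing cluster instead of defining a new one.
                "existing_cluster_id": cluster_id,
                "python_wheel_task": {
                    "package_name": package_name,
                    "entry_point": entry_point,
                },
                # The wheel previously uploaded to DBFS.
                "libraries": [{"whl": wheel_path}],
            }
        ],
    }


# All argument values below are placeholders for illustration only.
spec = build_wheel_job_spec(
    job_name="example-wheel-job",
    package_name="my_package",
    entry_point="main",
    wheel_path="dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl",
    cluster_id="0000-000000-example",
)
job_json = json.dumps(spec, indent=2)
print(job_json)
```

The printed JSON can be saved to a file and handed to the CLI's job-creation command; whether the CLI targets API 2.0 or 2.1 depends on its configuration, so the layout above may need adjusting.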

Latest Reply

Kaniz_Fatma
Community Manager

06-13-2022 3:10:11 AM

2 kudos

Hi @orian hindi, we haven’t heard from you since my last response, and I was checking back to see if you have a resolution yet. If you have a solution, please share it with the community as it can be helpful to others. Otherwise, we will re...

9 More Replies

by vivek_sinha • Contributor

06-10-2022 5:07:26 PM

6239 Views
4 replies
4 kudos

Resolved! Getting Authentication Error while accessing Azure Blob table (wasb) URL using PySpark

I am trying to access the Azure Blob table using PySpark but am getting an authentication error. I am passing a SAS token (HTTP and HTTPS enabled), but it works only with the WASBS (HTTPS) URL, not with the WASB (HTTP) URL. I even tried with the account key as...

Latest Reply

vivek_sinha
Contributor

06-12-2022 3:42:29 AM

4 kudos

Hi @Arvind Ravish, the issue got fixed after passing the HTTP- and HTTPS-enabled token to the Spark executors. Thanks again for your help.

3 More Replies

by vivek_sinha • Contributor

06-10-2022 5:39:04 PM

17146 Views
4 replies
4 kudos

Resolved! PySpark on Jupyterhub K8s || Unable to query data || Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

PySpark version: 2.4.5, Hive version: 1.2, Hadoop version: 2.7, AWS-SDK JAR: 1.7.4, Hadoop-AWS: 2.7.3. When I try to show data I get "Class org.apache.hadoop.fs.s3a.S3AFileSystem not found", even though I am passing all the information which all are re...

Latest Reply

vivek_sinha
Contributor

06-12-2022 12:49:46 AM

4 kudos

Hi @Arvind Ravish, thanks for the response; I have now fixed the issue. The image I was using to launch the Spark executors didn't have the AWS JARs. After making the necessary changes it started working. Many thanks again for your response.

3 More Replies

by Prabakar • Esteemed Contributor III

06-10-2022 3:27:34 PM

2128 Views
1 reply
1 kudos

Non-admin users unable to create jobs from Job UI

Non-admin users may be experiencing difficulties interacting with the jobs UI. This is due to a recently discovered UI regression in the 3.73 shard release, deployed to the jobs service starting June 6...

Latest Reply

Prabakar
Esteemed Contributor III

06-10-2022 3:29:27 PM

1 kudos

This has been conveyed to all customers. If the email landed in your spam box then this should help you.

by mali_bigdata • New Contributor

06-10-2022 1:14:27 PM

918 Views
0 replies
0 kudos

Databricks is adding NULL value in the URL while moving the Fairlearn dashboard and causing CORS error and fairlearn dashboard keeps spinning.

We are trying to run FairnessDashboard, and once we pass the data to the dashboard it keeps spinning. Please see the attached file. We also noticed that Databricks is adding NULL in the URL, and eventually we get the CORS error and it is redir...

by Kapur • New Contributor II

06-10-2022 11:09:01 AM

529 Views
0 replies
2 kudos

Do Delta Lake framework merge operations require a schema for Spark Structured Streaming processing?

by eager_to_learn • New Contributor III

06-02-2022 7:30:22 AM

3659 Views
9 replies
5 kudos

Resolved! Databricks pool - 2 instances are in running state without any job running in the system

We are using Azure Databricks pools, configured with 16 max instances. Out of 16, 2 instances are in a running state without any job running. How and where can I check the usage of the instances? P.S. The SQL pool is also not running, so no chances o...

Latest Reply

eager_to_learn
New Contributor III

06-10-2022 6:12:10 AM

5 kudos

@Kaniz Fatma / @Prabakar Ammeappin Any idea how we can queue the jobs in the resource pools? Is there some setting we need to switch on so the jobs are queued until instances are available, or can you point to some documentation for the same?

8 More Replies

by ABAGRI • New Contributor II

04-25-2022 6:26:55 AM

1455 Views
2 replies
2 kudos

Resolved! Having Issues with extracting records from complex JSON

Hi Team, we are using Delta Live Tables to ingest data from Kafka. The JSON file we receive has a complex structure, and we are trying to explode the file into its necessary columns and transactions. Thank you. Please see the attached sample file: { "Table...

Latest Reply

User16753725469
Contributor II

06-10-2022 5:41:42 AM

2 kudos

Hi @Lantis Pillay Could you please try to parse JSON records in the below way

1 More Replies

by Antoine_De_A • New Contributor III

06-09-2022 6:47:26 AM

2115 Views
2 replies
4 kudos

Resolved! Streaming data to CosmosDB

Hello everyone, here is the problem I am facing. I'm currently working on streaming data to Databricks. My goal is to create a data stream in a first notebook, and then in a second notebook to read this data stream, add all the new rows to a DataFrame...

Latest Reply

Antoine_De_A
New Contributor III

06-09-2022 9:14:30 AM

4 kudos

Problem solved! Instead of trying to do everything directly with the .writeStream options, I used the .foreachBatch() function, which allows me to call a function outside the .writeStream. In this function I get a DataFrame as a parameter, which is my str...
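
A minimal sketch of the pattern described in this reply, assuming the Azure Cosmos DB Spark 3 connector (format `cosmos.oltp`) is installed on the cluster. The function names, the configuration values, and the checkpoint path are hypothetical and are not taken from the original post, which is truncated.

```python
# Connector settings are placeholders; a real job needs the endpoint, key,
# database, and container of its own Cosmos DB account.
COSMOS_CONFIG = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<key>",
    "spark.cosmos.database": "<database>",
    "spark.cosmos.container": "<container>",
}


def write_batch_to_cosmos(batch_df, batch_id):
    """Called once per micro-batch; batch_df is an ordinary (non-streaming) DataFrame."""
    (
        batch_df.write.format("cosmos.oltp")
        .options(**COSMOS_CONFIG)
        .mode("append")
        .save()
    )


def start_stream(stream_df, checkpoint_path):
    """Attach the per-batch writer to a streaming DataFrame and start the query."""
    return (
        stream_df.writeStream.foreachBatch(write_batch_to_cosmos)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

Because the per-batch function receives a regular DataFrame, any batch writer or extra transformation can be used inside it, which is what makes this approach more flexible than configuring the sink through `.writeStream` options alone.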

1 More Replies

by curious-case-of • New Contributor II

04-11-2022 2:03:18 AM

8174 Views
3 replies
6 kudos

Resolved! Databricks notebook taking too long to run as a job compared to when triggered from within the notebook

I don't know if this question has been covered earlier, but here it goes: I have a notebook that I can run manually using the 'Run' button in the notebook or as a job. The runtime when I run from within the notebook directly is roughly 2 hours. But w...

Latest Reply

wvl
New Contributor II

06-09-2022 6:34:08 AM

6 kudos

We're seeing the same behavior. Good performance using an interactive cluster. Using an identically sized job cluster, performance is bad. Any ideas?

2 More Replies

by data_engineer_0 • New Contributor II

12-17-2021 1:28:20 AM

12880 Views
3 replies
2 kudos

How to run a .py file on a Databricks cluster

Hi team, I want to run the command below in Databricks and also need to capture the error and success messages. Please help me out here. Thanks in advance. Ex: python3 /mnt/users/code/x.py --arguments
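
One possible way to capture both outcomes, sketched with Python's standard `subprocess` module, for example from a notebook cell. The helper name `run_command` is hypothetical, and the command in the demo is a stand-in for the script path in the question.

```python
import subprocess
import sys


def run_command(argv, timeout=None):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout, result.stderr


# Stand-in for: python3 /mnt/users/code/x.py --arguments
code, out, err = run_command([sys.executable, "-c", "print('done')"])
if code == 0:
    print("success:", out.strip())
else:
    print("failed with exit code", code, "stderr:", err.strip())
```

A non-zero exit code signals failure, and whatever the script wrote to stderr (including a traceback) is available in the third return value.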

Latest Reply

User16764241763
Honored Contributor

06-09-2022 6:02:15 AM

2 kudos

Hello @Piper Wilson, would this task not help? https://docs.databricks.com/dev-tools/api/latest/examples.html#jobs-api-examples

2 More Replies

by User15787040559 • New Contributor III

06-04-2021 1:44:46 PM

2126 Views
2 replies
0 kudos

MicrosoftTeams-image

ERROR Max retries exceeded with url: /api/2.0/jobs/runs/get?run_id= Failed to establish a new connection. This error can happen when exceeding the rate limits for all REST API calls as documented here. In the image shown for example we're using the Jobs...
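
A common client-side mitigation for rate-limit errors is to retry with exponential backoff. The sketch below is a generic helper, not something from the original post: the name `call_with_backoff`, its parameters, and the choice to retry on `OSError` (the base class of Python's built-in connection errors) are all assumptions for illustration.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay,
            # plus random jitter so parallel clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

In practice `fn` would wrap the REST call that polls the run status; spacing out the polling itself also reduces the chance of hitting the limit in the first place.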

Latest Reply

User16764241763
Honored Contributor

06-09-2022 4:52:51 AM

0 kudos

Hi @Carlos Morillo, are you facing this issue consistently or when you run a lot of jobs? We are internally tracking a similar issue. Could you please file a support request with Microsoft Support? Databricks and MSFT will collaborate and provide upd...

1 More Replies

by chandan_a_v • Valued Contributor

06-02-2022 1:49:34 AM

16323 Views
9 replies
3 kudos

Resolved! TypeError: 'JavaPackage' object is not callable

Latest Reply

Kaniz_Fatma
Community Manager

06-09-2022 12:39:05 AM

3 kudos

Hi @Chandan Angadi, we haven’t heard from you since the last response from @Prabakar Ammeappin, and I was checking back to see if you have a resolution yet. If you have a solution, please share it with the community as it can be helpful to ot...

8 More Replies

by Gopal_Sir • New Contributor III

06-08-2022 11:47:42 PM

24134 Views
5 replies
7 kudos

Resolved! How to convert a string column to an array of structs?

I have a nested struct where one of the fields is a string. It looks something like this: string = "[{\"to_loc\":\"6183\",\"to_loc_type\":\"S\",\"qty_allocated\":\"18\"},{\"to_loc\":\"6137\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\"},{\"to_lo...
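
One way to approach this in PySpark is `from_json` with an array-of-struct schema. The sketch below assumes the column holds plain JSON text and that every field is a string, as in the visible part of the sample; the helper name and the column names passed to it are hypothetical. The sample uses only the first two elements, since the original string is truncated.

```python
import json

# First two elements of the (truncated) sample string from the post.
SAMPLE = (
    '[{"to_loc":"6183","to_loc_type":"S","qty_allocated":"18"},'
    '{"to_loc":"6137","to_loc_type":"S","qty_allocated":"9"}]'
)

# DDL-style schema for from_json: an array of structs with string fields.
ARRAY_OF_STRUCT_DDL = (
    "array<struct<to_loc:string,to_loc_type:string,qty_allocated:string>>"
)


def parse_json_array_column(df, source_col, target_col):
    """Add target_col holding source_col parsed as array<struct<...>>."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.sql import functions as F

    return df.withColumn(
        target_col, F.from_json(F.col(source_col), ARRAY_OF_STRUCT_DDL)
    )


# Plain-Python check of the shape the schema is meant to describe.
records = json.loads(SAMPLE)
print(len(records), sorted(records[0]))
```

If numeric types are needed, `qty_allocated` can be cast after parsing, since the sample stores the quantity as a quoted string.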

Latest Reply

-werners-
Esteemed Contributor III

06-09-2022 1:04:53 AM

7 kudos

Can you mark the question as answered so others can find the solution?

4 More Replies

by Data_Cowboy • New Contributor III

06-06-2022 7:44:26 AM

6436 Views
4 replies
1 kudos

Resolved! Plotting in pyspark.pandas Uncaught ReferenceError Plotly is not defined

Hi, I am trying to plot using pyspark.pandas running this sample code: speed = [0.1, 17.5, 40, 48, 52, 69, 88] lifespan = [2, 8, 70, 1.5, 25, 12, 28] index = ['snail', 'pig', 'elephant', 'rabbit', 'giraffe', 'coyote', 'horse'] psdf = ps.Data...

Latest Reply

Data_Cowboy
New Contributor III

06-07-2022 5:27:34 AM

1 kudos

Thank you @Werner Stinckens. I was able to find the Plotly documentation listed below, and setting the output_type and calling displayHTML() helped remedy the error.

3 More Replies
