Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ajay-Pandey
by Esteemed Contributor III
  • 1513 Views
  • 2 replies
  • 9 kudos

Kafka integration with Databricks

Hi all, I want to integrate Kafka with Databricks. If anyone can share any docs or code, it would help me a lot. Thanks in advance.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

This is code that I am using to read from Kafka:
inputDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", host)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("ka...

1 More Replies
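
For reference, a fuller version of the snippet above might look like the following minimal sketch. The bootstrap server, topic name, and API key/secret are placeholders rather than values from the thread, and the JAAS class name assumes the Kafka client shaded into the Databricks runtime.

# Minimal sketch of a structured-streaming read from Kafka in a Databricks notebook.
# host, topic, api_key and api_secret are placeholders.
host = "broker-1.example.com:9093"
topic = "my_topic"
api_key = "API_KEY"
api_secret = "API_SECRET"

inputDF = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", host)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.security.protocol", "SASL_SSL")
    .option(
        "kafka.sasl.jaas.config",
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="{api_key}" password="{api_secret}";',
    )
    .option("subscribe", topic)
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast to string before parsing.
messages = inputDF.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
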
rpshgupta
by New Contributor III
  • 7055 Views
  • 8 replies
  • 6 kudos

Databricks notebook failed with "Caused by: java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://adls.dfs.core.windows.net/raw/file.csv?upn=false&action=getStatus&timeout=90".

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 458.0 failed 4 times, most recent failure: Lost task 0.3 in stage 458.0 (TID 2247) (172.18.102.75 executor 1): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hi @Rupesh gupta, hope you are well. Just wanted to see if you were able to find an answer to your question and, if so, whether you would like to mark an answer as best? It would be really helpful for the other members too. Cheers!

7 More Replies
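
One way to narrow down a 404 like this is to confirm the path actually exists before reading it. A small hedged sketch, with a placeholder ABFSS path:

# Placeholder path; adjust container, storage account and file name to your setup.
path = "abfss://raw@adls.dfs.core.windows.net/file.csv"

try:
    dbutils.fs.ls(path)  # raises an exception if the path does not exist (404)
    df = spark.read.option("header", "true").csv(path)
except Exception as e:
    print(f"Path is missing or not reachable: {e}")
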
Ryan_Chynoweth
by Esteemed Contributor
  • 3318 Views
  • 2 replies
  • 4 kudos

Connecting to Azure SQL from Azure Databricks with firewalls

We are trying to connect to an Azure SQL Server from Azure Databricks using JDBC, but have faced issues because our firewall blocks everything. We decided to whitelist IPs from the SQL Server side and add a public subnet to make the connection work. ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 4 kudos

Using subnets for Databricks connectivity is the correct thing to do. This way you ensure the resources (clusters) can connect to the SQL Database. We also recommend using NPIP (No Public IPs) so that there won't be any public ip associated with the...

1 More Replies
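
Once the networking (subnets/NPIP and SQL-side firewall rules) is in place, the JDBC read itself is straightforward. A minimal sketch, with placeholder server, database, table, and a hypothetical secret scope:

# Placeholder Azure SQL connection details.
jdbc_url = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"
    "database=mydb;encrypt=true;trustServerCertificate=false;loginTimeout=30;"
)

df = (
    spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.my_table")
    .option("user", dbutils.secrets.get("my-scope", "sql-user"))          # hypothetical secret scope/keys
    .option("password", dbutils.secrets.get("my-scope", "sql-password"))
    .load()
)
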
nameziane
by New Contributor III
  • 7667 Views
  • 4 replies
  • 2 kudos

Set version (VERSION AS OF) dynamically from return of a subquery

Hello, we have a business request to compare the evolution of a certain Delta table. We would like to compare the latest version of the table with the previous one using Delta time travel. The main issue we are facing is to retrieve programmatically us...

Latest Reply
apingle
Contributor
  • 2 kudos

In the docs it says that "Neither timestamp_expression nor version can be subqueries." So it does sound challenging. I also tried playing with widgets to see if the version could be populated using SQL, but didn't succeed. With Python it's really easy to do.

3 More Replies
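
A minimal Python sketch of the approach the reply hints at: read the version numbers from DESCRIBE HISTORY and pass them to versionAsOf. The table name is a placeholder.

from pyspark.sql import functions as F

table_name = "my_delta_table"  # placeholder

# Grab the two most recent version numbers from the Delta history.
history = spark.sql(f"DESCRIBE HISTORY {table_name}")
latest, previous = [
    row.version for row in history.orderBy(F.col("version").desc()).limit(2).collect()
]

current_df = spark.read.option("versionAsOf", latest).table(table_name)
previous_df = spark.read.option("versionAsOf", previous).table(table_name)

# Example comparison: rows present in the latest version but not in the previous one.
new_rows = current_df.exceptAll(previous_df)
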
cmilligan
by Contributor II
  • 1798 Views
  • 3 replies
  • 4 kudos

Resolved! Pass through if a job was run as scheduled or if manual

I have a notebook that sets up parameters for the run based on some job parameters set by the user as well as the current date of the run. I want to supersede some of this logic and just use the manual values if kicked off manually. Is there a way to...

Latest Reply
SS2
Valued Contributor
  • 4 kudos

You can create widgets by using this: dbutils.widgets.text("widgetName", ""). To get the value for that widget: dbutils.widgets.get("widgetName"). So by using this you can manually create widgets (variables) and run the process by giving the desired valu...

2 More Replies
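
A small sketch of that idea: give each widget a default so a scheduled run works unattended, and let a manual run override it. The widget names below are hypothetical.

from datetime import date

# Widgets with defaults: a scheduled job can ignore them, a manual run can override them.
dbutils.widgets.text("run_mode", "scheduled")
dbutils.widgets.text("run_date", "")

run_mode = dbutils.widgets.get("run_mode")
run_date = dbutils.widgets.get("run_date")

if run_mode == "manual" and run_date:
    effective_date = run_date                  # value typed in by the user
else:
    effective_date = date.today().isoformat()  # normal scheduled-run logic
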
rrussell25
by New Contributor
  • 1041 Views
  • 1 reply
  • 0 kudos

Read arguments in a Scala notebook invoked by a job.

In a Scala notebook, how do I read input arguments (e.g. those provided by a job that runs a Scala notebook)? In Python, dbutils.notebook.entry_point.getCurrentBindings() works. How about for Scala?

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @Robert Russell You can use dbutils.notebook.getContext.currentRunId in Scala notebooks. Other methods are also available, like dbutils.notebook.getContext.jobGroup, dbutils.notebook.getContext.rootRunId, dbutils.notebook.getContext.tags, etc. You ...

RajibRajib_Mand
by New Contributor III
  • 2718 Views
  • 2 replies
  • 0 kudos

Reading a password-protected Excel (.xlsx) file in Databricks

I want to read a password-protected Excel file and load the data into a Delta table. Can you please let me know how this can be achieved in Databricks?

Latest Reply
igorsalo22
New Contributor II
  • 0 kudos

df = spark.read.format("com.crealytics.spark.excel") \
    .option("dataAddress", "'Base'!A1") \
    .option("header", "true") \
    .option("workbookPassword", "test") \
    .load("test.xlsx")
display(df)

1 More Replies
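
To cover the second half of the question (loading into a Delta table), the DataFrame read in the reply above can then be written out; the target table name below is a placeholder.

# Write the Excel data read above into a (placeholder) Delta table.
df.write.format("delta").mode("overwrite").saveAsTable("my_excel_data")
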
DK03
by Contributor
  • 1625 Views
  • 2 replies
  • 2 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

As @Werner Stinckens said, it would be OK. But generally, joins on decimal columns are not recommended, as other factors come into play, like precision, length, etc. Also, when you are joining on decimal columns, be sure to check out the abs value of...

1 More Replies
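
To make that concrete, one common precaution is to cast both sides to the same precision and scale before joining. A hedged sketch with made-up data:

from decimal import Decimal
from pyspark.sql import functions as F

# Two tiny DataFrames whose decimal columns have different precision/scale.
df_a = spark.createDataFrame([(Decimal("10.50"),)], "amount decimal(10,2)")
df_b = spark.createDataFrame([(Decimal("10.500"),)], "amount decimal(12,3)")

# Cast both sides to a common decimal type so equality on the join key behaves predictably.
df_a2 = df_a.withColumn("amount", F.col("amount").cast("decimal(18,3)"))
df_b2 = df_b.withColumn("amount", F.col("amount").cast("decimal(18,3)"))

joined = df_a2.join(df_b2, on="amount", how="inner")
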
fury88
by New Contributor II
  • 1561 Views
  • 1 reply
  • 1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views that get replaced when the code is run based on dynamic python. What I'd like to know is will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

Hi @Matt Fury Yes, I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time for 1 million records to be cached. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...

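
A quick sketch of the check the reply mentions, with a placeholder table name:

# Cache (or re-cache) the table; per the reply above, re-running this simply refreshes the cache.
spark.sql("CACHE TABLE my_table")            # placeholder table name

print(spark.catalog.isCached("my_table"))    # True once the table is cached
print(spark.table("my_table").storageLevel)  # storage level of the underlying DataFrame
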
Durbinar
by New Contributor III
  • 4090 Views
  • 4 replies
  • 4 kudos

Resolved! Azure Databricks Default DNS

My Azure Databricks workspace default DNS is 168.63.129.16. This DNS doesn't seem to resolve Azure storage accounts which were created a year ago. After tweaking the cluster to use 8.8.8.8, we are able to resolve the desired storage accounts. Is there a d...

Latest Reply
Durbinar
New Contributor III
  • 4 kudos

IP address 168.63.129.16 is a virtual public IP address that is used to facilitate a communication channel to Azure platform resources. Customers can define any address space for their private virtual network in Azure. Therefore, the Azure platform...

3 More Replies
200723
by New Contributor II
  • 2169 Views
  • 4 replies
  • 4 kudos

"No SRV records" intermittent error when running Databricks Pyspark to connect Mongo Atlas

My Mongo Atlas connection URL is like mongodb+srv://<srv_hostname>. I don't want to use a direct URL like mongodb://<hostname1, hostname2, hostname3....> because our Mongo Atlas global clusters have many hosts; it would be hard to maintain. Our Java programs...

Latest Reply
Noopur_Nigam
Valued Contributor
  • 4 kudos

Hi @Raymond Lai The issue looks to be in the MongoDB connector. The connection is created and maintained by the mongo-spark connector. You can try using the direct mongodb hosts in the connection string instead of SRV to avoid doing DNS lookups or...

3 More Replies
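
A minimal sketch of the suggested workaround, listing the hosts directly instead of the SRV record. It assumes the MongoDB Spark connector 10.x is installed on the cluster; hosts, database, and collection are placeholders.

# Direct host list instead of mongodb+srv, to avoid SRV/DNS lookups.
direct_uri = (
    "mongodb://host1.example.net:27017,"
    "host2.example.net:27017,"
    "host3.example.net:27017"
)

df = (
    spark.read
    .format("mongodb")                     # connector v10.x format name
    .option("connection.uri", direct_uri)
    .option("database", "mydb")
    .option("collection", "mycollection")
    .load()
)

The trade-off is exactly the one raised in the question: the host list then has to be maintained by hand.
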
Dicer
by Valued Contributor
  • 6231 Views
  • 5 replies
  • 7 kudos

Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?

I only have 1,000 columns. Each column has 252 rows, so there are only 252,000 data points. How come it can route tasks for the best-cached locality for 7 hours?

Latest Reply
Noopur_Nigam
Valued Contributor
  • 7 kudos

Hi @Cheuk Hin Christophe Poon Have you optimized your table at any time since its creation? If not, then OPTIMIZE may take some time depending on the number of underlying files. Please try to run OPTIMIZE manually as described in the document below: https://docs....

4 More Replies
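
The manual OPTIMIZE the reply refers to can be run from a notebook; the table and column names below are placeholders.

# Compact the Delta table's underlying files.
spark.sql("OPTIMIZE my_delta_table")

# Optionally co-locate data on a frequently filtered column (Databricks ZORDER).
spark.sql("OPTIMIZE my_delta_table ZORDER BY (event_date)")
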
shrutis23
by New Contributor III
  • 3536 Views
  • 5 replies
  • 4 kudos

How to use delta live table with google cloud storage

Hi team, I have been working on a POC exploring Delta Live Tables with a GCS location. I have some doubts: how to access the GCS bucket? We have a connection established using a Databricks service account. In normal cluster creation, we go to the cluster page...

Latest Reply
Senthil1
Contributor
  • 4 kudos

Kindly mount the GCS bucket to a DBFS location; see below: Mounting cloud object storage on Databricks | Databricks on Google Cloud

4 More Replies
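
A minimal sketch of the mount the reply points to, assuming the cluster already runs with a Google service account that can read the bucket; the bucket and mount names are placeholders.

bucket_name = "my-gcs-bucket"
mount_point = "/mnt/my-gcs-bucket"

# Mount only if it is not already mounted.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(f"gs://{bucket_name}", mount_point)

display(dbutils.fs.ls(mount_point))
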
