Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 16733 Views
  • 26 replies
  • 4 kudos

Use Case Sharing Sweepstakes ! Data + AI Summit is in full swing and we know you are just as excited as we are to learn about the new and exciting things happening at Databricks. From notebooks to the Lakehouse, we know some of these new features wil...

Latest Reply
AmanSehgal
Honored Contributor III
  • 4 kudos

Cloning libraries when cloning clusters: Currently, when we clone clusters, the externally added libraries aren't copied as part of the cloning process. It's expected behavior, but a missing feature. At times new developers end up spending a lot of time in debug...

25 More Replies
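One workaround for the missing-libraries issue above is to read a cluster's installed libraries from the Libraries API (`/api/2.0/libraries/cluster-status`) and re-install them on the clone. A minimal sketch of the extraction step, assuming the documented response shape; the sample payload and cluster id are hypothetical:

```python
def library_specs(cluster_status):
    """Pull re-installable library specs out of a Libraries API
    cluster-status response (the 'library' object of each entry)."""
    return [entry["library"] for entry in cluster_status.get("library_statuses", [])]

# Hypothetical response fragment for illustration:
status = {
    "cluster_id": "0601-abc123",
    "library_statuses": [
        {"library": {"pypi": {"package": "requests==2.28.0"}}, "status": "INSTALLED"},
    ],
}
specs = library_specs(status)  # pass these to /api/2.0/libraries/install on the clone
```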
AmanSehgal
by Honored Contributor III
  • 13261 Views
  • 2 replies
  • 12 kudos

How do concurrent runs in a job map to cluster configuration?

In Databricks jobs, there's a field for concurrent runs which can be set up to 1000. If I have a cluster with 4 worker nodes and 8 cores each, then at most how many concurrent jobs will I be able to execute? What will happen if I launch 100 instances of sam...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

@Aman Sehgal On an E2 workspace the limit is 1000 concurrent runs. If you trigger 100 runs at the same time, 100 clusters will be created and the runs will be executed. If you use the same cluster for 100 runs, then you might face a lot of failed jobs...

1 More Replies
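The concurrency discussed above is configured per job. A sketch of the relevant Jobs API settings fragment, expressed as a Python dict; the job name is a placeholder:

```python
# Jobs API job-settings fragment controlling how many runs of the same
# job may execute at once (up to 1000 on E2 workspaces, per the reply above).
job_settings = {
    "name": "example-ingest-job",   # placeholder name
    "max_concurrent_runs": 100,     # triggers beyond this limit are skipped
}
```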
Nickje56
by New Contributor
  • 6939 Views
  • 1 reply
  • 1 kudos

Resolved! _sqldf not defined

In the release notes of May 2022 it says that we are now able to investigate our SQL results in Python in a Python notebook (see also the documentation here: Use notebooks - Azure Databricks | Microsoft Docs). So I created a simple query (select * from ...

Latest Reply
User16753725469
Databricks Employee
  • 1 kudos

This feature was delayed and will be rolled out over Databricks platform releases 3.74 through 3.76. You can check the release notes for more info: https://docs.databricks.com/release-notes/product/2022/may.html

Confused
by New Contributor III
  • 12502 Views
  • 7 replies
  • 2 kudos

Schema evolution issue

Hi All, I am loading some data using Auto Loader but am having trouble with schema evolution. A new column has been added to the data I am loading and I am getting the following error: StreamingQueryException: Encountered unknown field(s) during parsing:...

Latest Reply
rgrosskopf
New Contributor II
  • 2 kudos

I agree that hints are the way to go if you have the schema available, but the whole point of schema evolution is that you might not always know the schema in advance. I received a similar error with a similar streaming query configuration. The issue w...

6 More Replies
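One way to avoid the parsing failure described above is to let Auto Loader evolve the schema when new columns appear. A sketch of the relevant options, held in a Python dict; the paths are hypothetical:

```python
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/mnt/checkpoints/schema",  # where the inferred schema is tracked
    "cloudFiles.schemaEvolutionMode": "addNewColumns",       # add new columns instead of failing
}

# In a notebook this would be used roughly as:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/mnt/raw/events"))
```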
vk217
by Contributor
  • 3375 Views
  • 2 replies
  • 3 kudos

Resolved! Generic user account and personal access token to Azure Databricks

Is there a way to create a generic user account and personal access token to connect to Databricks? I have an Azure build pipeline and a VSCode test that use my personal access token for running builds and tests.

Latest Reply
Gabriel0007
New Contributor III
  • 3 kudos

You can create a service account (service principal) for jobs, applications, etc. Here's a link to the docs: https://docs.databricks.com/administration-guide/users-groups/service-principals.html

1 More Replies
Tahseen0354
by Valued Contributor
  • 3656 Views
  • 4 replies
  • 2 kudos

Why does setting up audit log delivery in Databricks on GCP fail?

I am trying to set up audit log delivery in Google Cloud. I have followed this page https://docs.gcp.databricks.com/administration-guide/account-settings-gcp/log-delivery.html and have added log-delivery@databricks-prod-master.iam.gserviceaccount.co...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

I would suggest contacting your Databricks account representative for this. They would be able to check whether something went wrong with your workspace subscription.

3 More Replies
Gabriel0007
by New Contributor III
  • 2482 Views
  • 2 replies
  • 2 kudos

How do I process each new record when using Auto Loader?

For instance, I'm ingesting webhook data into a delta table with autoloader and need to run a process for each new record as it arrives.

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

With Auto Loader, you can do something like a changelog and record data about operations performed on each micro-batch - like affected id, I/U/D, timestamp, etc. Then you can make use of this changelog table and run subsequent processes for each row aff...

1 More Replies
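The changelog idea in the reply above can be sketched as a plain-Python micro-batch handler. In a real stream this would be wired up via `foreachBatch` on the streaming DataFrame; here rows are plain dicts for illustration, and all field names are assumptions:

```python
from datetime import datetime, timezone

def record_changelog(batch_rows, batch_id):
    """Turn one micro-batch into changelog entries (affected id,
    I/U/D operation, timestamp) to drive downstream per-row processing."""
    entries = []
    for row in batch_rows:
        entries.append({
            "id": row["id"],
            "operation": row.get("operation", "I"),  # default to insert
            "batch_id": batch_id,
            "ts": datetime.now(timezone.utc).isoformat(),
        })
    return entries
```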
ishantjain194
by New Contributor II
  • 2651 Views
  • 2 replies
  • 3 kudos

AWS OR AZURE OR GCLOUD??

I want to know which cloud is better to learn and which cloud's services have more career opportunities.

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

In addition to @Kaniz Fatma's great comparison article, cloud skills are generally transferable across providers. It is the same concept, just with different names (e.g. EC2 / Azure VM / Google Compute Engine). Learning cloud in general is a good ...

1 More Replies
Cassio
by New Contributor II
  • 4939 Views
  • 4 replies
  • 3 kudos

Resolved! "SparkSecurityException: Cannot read sensitive key" error when reading key from Spark config

In Databricks 10.1 it is possible to define in the "Spark Config" of the cluster something like: spark.fernet {{secrets/myscope/encryption-key}}. In my case my scopes are tied to Azure Key Vault. With that I can make a query as follows: %sql SELECT d...

Latest Reply
Soma
Valued Contributor
  • 3 kudos

This solution exposes the entire secret if I use commands like the one below: sql("""explain select upper("${spark.fernet.email}") as data """).display() Please don't use this.

3 More Replies
754424
by New Contributor
  • 2217 Views
  • 3 replies
  • 2 kudos

Firefox only - copying from notebook table output copies cell contents instead

Copying from notebook table output copies the cell contents instead, in Firefox (and Firefox-based browsers).

Latest Reply
User16741082858
Databricks Employee
  • 2 kudos

Hi @Jim Kutter, I have gone ahead and put in a ticket for you regarding this. Your Databricks representative will be in touch with you regarding the status. Thank you for your patience!

2 More Replies
aschiff
by Contributor II
  • 52682 Views
  • 24 replies
  • 4 kudos

Resolved! Extracting data from a multi-layered JSON object

I have a table in Databricks called owner_final_delta with a column called contacts that holds data with this structure: array<struct<address:struct<apartment:string,city:string,house:string,poBox:string,sources:array<string>,state:string,street:strin...

Latest Reply
Dooley
Databricks Employee
  • 4 kudos

Have you tried using the explode function for that column with the array? df.select(explode(df.emailId).alias("email")).show() Also, if you are a SQL lover, you can instead use the Databricks syntax for querying JSON, seen here.

23 More Replies
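What explode plus dot-notation does to the nested contacts column can be mirrored in plain Python. The field names are taken from the schema snippet in the question; everything else is illustrative:

```python
def explode_contacts(rows):
    """Emit one output row per element of each row's array<struct>
    'contacts', projecting the nested address.city and address.state."""
    out = []
    for row in rows:
        for contact in row.get("contacts", []):
            addr = contact.get("address", {})
            out.append({"city": addr.get("city"), "state": addr.get("state")})
    return out
```

In Spark itself the equivalent would be roughly `df.select(explode("contacts").alias("c")).select("c.address.city", "c.address.state")`.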
StackP
by New Contributor
  • 4101 Views
  • 1 reply
  • 0 kudos

How to add unique consecutive id to delta lake table

In Databricks I have an existing Delta table to which I want to add one more column, Id, so that each row has a unique and consecutive id (the way a primary key is present in SQL). So far I have tried converting the Delta table to a PySpark dataframe and...

Latest Reply
Sandeep
Databricks Employee
  • 0 kudos

How about defining an identity column as below? GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ] https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html#parameters

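A sketch of the identity-column DDL from the reply, held as a string the way it might be passed to spark.sql in a notebook; the table and column names are hypothetical:

```python
# Hypothetical table with an identity column as described in the reply.
create_stmt = """
CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  payload STRING
) USING DELTA
"""
# In a notebook: spark.sql(create_stmt)
```

Note that identity values are unique and increasing but may contain gaps, so they are not guaranteed to be strictly consecutive.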
BradSheridan
by Databricks Partner
  • 2543 Views
  • 2 replies
  • 1 kudos

Resolved! Add an Instance Profile to a DLT job cluster

@Tomasz Bacewicz I've got another, related question for you about the job cluster that is spun up for DLT jobs. Adding the JSON strings for our required E2 tags worked like a charm, but now I need to attach an existing instance profile since I'm tr...

Latest Reply
tomasz
Databricks Employee
  • 1 kudos

@Brad Sheridan To do that you have to add the aws_attributes tag within a cluster configuration, and there you have the ability to add an instance_profile_arn, like so: "clusters": [ { "label": "default", "aws_attributes": { ...

1 More Replies
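The cluster configuration from the reply, expanded into a Python dict mirroring the DLT pipeline settings JSON; the ARN is a placeholder:

```python
dlt_settings = {
    "clusters": [
        {
            "label": "default",
            "aws_attributes": {
                # placeholder ARN; use an instance profile registered in your workspace
                "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/example-profile",
            },
        }
    ]
}
```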
junaid
by New Contributor II
  • 8793 Views
  • 0 replies
  • 1 kudos

We are seeing a "BOOTSTRAP_TIMEOUT" issue in a new workspace.

When attempting to deploy/start a Databricks cluster on AWS through the UI, the following error consistently occurs: Bootstrap Timeout: [id: InstanceId(i-093caac78cdbfa7e1), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-335698072713...
