Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 16733 Views
  • 26 replies
  • 4 kudos

Use Case Sharing Sweepstakes ! Data + AI Summit is in full swing and we know you are just as excited as we are to learn about the new and exciting things happening at Databricks. From notebooks to the Lakehouse, we know some of these new features wil...

Latest Reply
AmanSehgal
Honored Contributor III
  • 4 kudos

Cloning libraries when cloning clusters: Currently, when we clone clusters, the externally added libraries aren't copied as part of the cloning process. It's expected behavior, but a missing feature. At times new developers end up spending a lot of time in debug...

25 More Replies
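One workaround for the missing-libraries issue above is to read a cluster's installed libraries from the Libraries API (`/api/2.0/libraries/cluster-status`) and re-install them on the clone. A minimal sketch of the extraction step, assuming the documented response shape; the sample payload and cluster id are hypothetical:

```python
def library_specs(cluster_status):
    """Pull re-installable library specs out of a Libraries API
    cluster-status response (the 'library' object of each entry)."""
    return [entry["library"] for entry in cluster_status.get("library_statuses", [])]

# Hypothetical response fragment for illustration:
status = {
    "cluster_id": "0601-abc123",
    "library_statuses": [
        {"library": {"pypi": {"package": "requests==2.28.0"}}, "status": "INSTALLED"},
    ],
}
specs = library_specs(status)  # pass these to /api/2.0/libraries/install on the clone
```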
AmanSehgal
by Honored Contributor III
  • 13261 Views
  • 2 replies
  • 12 kudos

How do concurrent runs in a job map to cluster configuration?

In Databricks jobs, there's a field for concurrent runs which can be set up to 1000. If I have a cluster with 4 worker nodes and 8 cores each, then at most how many concurrent jobs will I be able to execute? What will happen if I launch 100 instances of sam...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

@Aman Sehgal On an E2 workspace the limit is 1000 concurrent runs. If you trigger 100 runs at the same time, 100 clusters will be created and the runs will be executed. If you use the same cluster for 100 runs, then you might face a lot of failed jobs...

1 More Replies
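The concurrency discussed above is configured per job. A sketch of the relevant Jobs API settings fragment, expressed as a Python dict; the job name is a placeholder:

```python
# Jobs API job-settings fragment controlling how many runs of the same
# job may execute at once (up to 1000 on E2 workspaces, per the reply above).
job_settings = {
    "name": "example-ingest-job",   # placeholder name
    "max_concurrent_runs": 100,     # triggers beyond this limit are skipped
}
```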
Nickje56
by New Contributor
  • 6939 Views
  • 1 reply
  • 1 kudos

Resolved! _sqldf not defined

In the release notes of May 2022 it says that we are now able to investigate our SQL results in Python in a Python notebook (see also the documentation here: Use notebooks - Azure Databricks | Microsoft Docs). So I created a simple query (select * from ...

Latest Reply
User16753725469
Databricks Employee
  • 1 kudos

This feature was delayed and will be rolled out over Databricks platform releases 3.74 through 3.76. You can check the release notes for more info: https://docs.databricks.com/release-notes/product/2022/may.html

Confused
by New Contributor III
  • 12502 Views
  • 7 replies
  • 2 kudos

Schema evolution issue

Hi All, I am loading some data using Auto Loader but am having trouble with schema evolution. A new column has been added to the data I am loading and I am getting the following error: StreamingQueryException: Encountered unknown field(s) during parsing:...

Latest Reply
rgrosskopf
New Contributor II
  • 2 kudos

I agree that hints are the way to go if you have the schema available, but the whole point of schema evolution is that you might not always know the schema in advance. I received a similar error with a similar streaming query configuration. The issue w...

6 More Replies
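One way to avoid the parsing failure described above is to let Auto Loader evolve the schema when new columns appear. A sketch of the relevant options, held in a Python dict; the paths are hypothetical:

```python
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/mnt/checkpoints/schema",  # where the inferred schema is tracked
    "cloudFiles.schemaEvolutionMode": "addNewColumns",       # add new columns instead of failing
}

# In a notebook this would be used roughly as:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/mnt/raw/events"))
```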
vk217
by Contributor
  • 3375 Views
  • 2 replies
  • 3 kudos

Resolved! Generic user account and personal access token to Azure Databricks

Is there a way to create a generic user account and personal access token to connect to Databricks? I have an Azure build pipeline and a VSCode test that use my personal access token for running builds and tests.

Latest Reply
Gabriel0007
New Contributor III
  • 3 kudos

You can create a service account (service principal) for jobs, applications, etc. Here's a link to the docs: https://docs.databricks.com/administration-guide/users-groups/service-principals.html

1 More Replies
Tahseen0354
by Valued Contributor
  • 3656 Views
  • 4 replies
  • 2 kudos

Why does setting up audit log delivery in Databricks on GCP fail?

I am trying to set up audit log delivery in Google Cloud. I have followed this page https://docs.gcp.databricks.com/administration-guide/account-settings-gcp/log-delivery.html and have added log-delivery@databricks-prod-master.iam.gserviceaccount.co...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

I would suggest contacting your Databricks account representative for this. They would be able to check whether something went wrong with your workspace subscription.

3 More Replies
Gabriel0007
by New Contributor III
  • 2482 Views
  • 2 replies
  • 2 kudos

How do I process each new record when using Auto Loader?

For instance, I'm ingesting webhook data into a delta table with autoloader and need to run a process for each new record as it arrives.

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

With Auto Loader, you can do something like a changelog and record data about operations performed on each micro-batch - like affected id, I/U/D, timestamp, etc. Then you can make use of this changelog table and run subsequent processes for each row aff...

1 More Replies
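The changelog idea in the reply above can be sketched as a plain-Python micro-batch handler. In a real stream this would be wired up via `foreachBatch` on the streaming DataFrame; here rows are plain dicts for illustration, and all field names are assumptions:

```python
from datetime import datetime, timezone

def record_changelog(batch_rows, batch_id):
    """Turn one micro-batch into changelog entries (affected id,
    I/U/D operation, timestamp) to drive downstream per-row processing."""
    entries = []
    for row in batch_rows:
        entries.append({
            "id": row["id"],
            "operation": row.get("operation", "I"),  # default to insert
            "batch_id": batch_id,
            "ts": datetime.now(timezone.utc).isoformat(),
        })
    return entries
```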
ishantjain194
by New Contributor II
  • 2651 Views
  • 2 replies
  • 3 kudos

AWS OR AZURE OR GCLOUD??

I want to know which cloud is better to learn and which cloud's services have more career opportunities.

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

In addition to @Kaniz Fatma's great comparison article, cloud skills are generally transferable across providers. It is the same concept, just with different names (e.g. EC2 / Azure VM / Google Compute Engine). Learning cloud in general is a good ...

1 More Replies
Cassio
by New Contributor II
  • 4939 Views
  • 4 replies
  • 3 kudos

Resolved! "SparkSecurityException: Cannot read sensitive key" error when reading key from Spark config

In Databricks 10.1 it is possible to define in the "Spark Config" of the cluster something like: spark.fernet {{secrets/myscope/encryption-key}}. In my case my scopes are tied to Azure Key Vault. With that I can make a query as follows: %sql SELECT d...

Latest Reply
Soma
Valued Contributor
  • 3 kudos

This solution exposes the entire secret if I use commands like the one below: sql("""explain select upper("${spark.fernet.email}") as data """).display() Please don't use this.

3 More Replies
754424
by New Contributor
  • 2217 Views
  • 3 replies
  • 2 kudos

Firefox only - copying from notebook table output copies cell contents instead

Copying from notebook table output copies the cell contents instead, in Firefox (and Firefox-based browsers).

Latest Reply
User16741082858
Databricks Employee
  • 2 kudos

Hi @Jim Kutter, I have gone ahead and put in a ticket for you regarding this. Your Databricks representative will be in touch with you regarding the status. Thank you for your patience!

2 More Replies
aschiff
by Contributor II
  • 52682 Views
  • 24 replies
  • 4 kudos

Resolved! Extracting data from a multi-layered JSON object

I have a table in Databricks called owner_final_delta with a column called contacts that holds data with this structure: array<struct<address:struct<apartment:string,city:string,house:string,poBox:string,sources:array<string>,state:string,street:strin...

Latest Reply
Dooley
Databricks Employee
  • 4 kudos

Have you tried using the explode function for that column with the array? df.select(explode(df.emailId).alias("email")).show() Also, if you are a SQL lover, you can instead use the Databricks syntax for querying JSON, seen here.

23 More Replies
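What explode plus dot-notation does to the nested contacts column can be mirrored in plain Python. The field names are taken from the schema snippet in the question; everything else is illustrative:

```python
def explode_contacts(rows):
    """Emit one output row per element of each row's array<struct>
    'contacts', projecting the nested address.city and address.state."""
    out = []
    for row in rows:
        for contact in row.get("contacts", []):
            addr = contact.get("address", {})
            out.append({"city": addr.get("city"), "state": addr.get("state")})
    return out
```

In Spark itself the equivalent would be roughly `df.select(explode("contacts").alias("c")).select("c.address.city", "c.address.state")`.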
StackP
by New Contributor
  • 4101 Views
  • 1 reply
  • 0 kudos

How to add unique consecutive id to delta lake table

In Databricks I have an existing Delta table to which I want to add one more column, Id, so that each row has a unique and consecutive id (the way a primary key is present in SQL). So far I have tried converting the Delta table to a PySpark dataframe and...

Latest Reply
Sandeep
Databricks Employee
  • 0 kudos

How about defining an identity column as below? GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ] https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html#parameters

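A sketch of the identity-column DDL from the reply, held as a string the way it might be passed to spark.sql in a notebook; the table and column names are hypothetical:

```python
# Hypothetical table with an identity column as described in the reply.
create_stmt = """
CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  payload STRING
) USING DELTA
"""
# In a notebook: spark.sql(create_stmt)
```

Note that identity values are unique and increasing but may contain gaps, so they are not guaranteed to be strictly consecutive.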
BradSheridan
by Databricks Partner
  • 2543 Views
  • 2 replies
  • 1 kudos

Resolved! Add an Instance Profile to a DLT job cluster

@Tomasz Bacewicz I've got another, related question for you about the job cluster that is spun up for DLT jobs. Adding the JSON strings for our required E2 tags worked like a charm, but now I need to attach an existing instance profile since I'm tr...

Latest Reply
tomasz
Databricks Employee
  • 1 kudos

@Brad Sheridan To do that you have to add the aws_attributes tag within a cluster configuration, and there you have the ability to add an instance_profile_arn, like so: "clusters": [ { "label": "default", "aws_attributes": { ...

1 More Replies
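The cluster configuration from the reply, expanded into a Python dict mirroring the DLT pipeline settings JSON; the ARN is a placeholder:

```python
dlt_settings = {
    "clusters": [
        {
            "label": "default",
            "aws_attributes": {
                # placeholder ARN; use an instance profile registered in your workspace
                "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/example-profile",
            },
        }
    ]
}
```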
junaid
by New Contributor II
  • 8793 Views
  • 0 replies
  • 1 kudos

We are seeing a "BOOTSTRAP_TIMEOUT" issue in a new workspace.

When attempting to deploy/start a Databricks cluster on AWS through the UI, the following error consistently occurs: Bootstrap Timeout: [id: InstanceId(i-093caac78cdbfa7e1), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-335698072713...
