Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

RS1
by New Contributor III
  • 2845 Views
  • 11 replies
  • 9 kudos

Data & AI Summit 2022 - Training videos of paid instructor-led sessions not yet uploaded @Kaniz Fatma

@Kaniz Fatma​ I attended the Advanced Machine Learning with Databricks training virtually last week, and I am still unable to get the day 2 session videos of any of the instructor-led paid trainings. They are supposed to be available for replay within 24...

Latest Reply
RS1
New Contributor III
  • 9 kudos

Hi @Kaniz Fatma​, they uploaded the full video for the Advanced Machine Learning with Databricks course day 2; thank you for the follow-up. But we still have the same issue with Apache Spark Programming with Databricks - Bundle: Day 2 Training. Can you...

10 More Replies
Tejas1987
by New Contributor II
  • 1620 Views
  • 2 replies
  • 1 kudos

Resolved! Finding multiple substrings from a DataFrame column dynamically?

Hello friends, I have a DataFrame with specific values, and I am trying to find specific substrings in it. Input: | ID | text | | 1 | select distinct Col1 as OrderID from Table1 WHERE ( (Col3 Like '%ABC%') OR (Col3 Like '%DEF%') OR (Col3 Like '...

Latest Reply
AmanSehgal
Honored Contributor III
  • 1 kudos

What is the logic for the substring function? Can't you use str1[idxi+14:3] for the substring? (A dynamic approach is sketched after this thread.)

  • 1 kudos
1 More Replies
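For illustration, a minimal PySpark sketch of one dynamic approach (the column name text and the term list are assumptions for this example): build one condition per search term and OR them together, rather than hard-coding each LIKE.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "select distinct Col1 as OrderID from Table1 WHERE Col3 Like '%ABC%'")],
    ["ID", "text"],
)

terms = ["ABC", "DEF"]  # hypothetical search terms

# OR together one contains() test per term, built dynamically.
cond = F.lit(False)
for t in terms:
    cond = cond | F.col("text").contains(t)

df.filter(cond).show(truncate=False)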
BradSheridan
by Valued Contributor
  • 2298 Views
  • 4 replies
  • 0 kudos

CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes'

Hey there, Community! I'm using dlt.apply_changes in my DLT job as follows: dlt.apply_changes( target = "employee_silver", source = "employee_bronze_clean_v", keys = ["EMPLOYEE_ID"], sequence_by = col("last_updated"), apply_as_deletes = expr("Op ...

Latest Reply
axb0
New Contributor III
  • 0 kudos

First, try expr("Operation = 'DELETE'") for your apply_as_deletes setting (a full sketch follows this thread).

3 More Replies
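For illustration, a minimal sketch of the pattern with the suggested delete expression in place; it assumes the source view carries an Operation column whose delete marker is the literal 'DELETE', and it runs inside a Delta Live Tables pipeline.

import dlt
from pyspark.sql.functions import col, expr

# Declare the target streaming table first (the exact helper name varies by
# DLT release; create_streaming_table is the current one).
dlt.create_streaming_table("employee_silver")

dlt.apply_changes(
    target="employee_silver",
    source="employee_bronze_clean_v",
    keys=["EMPLOYEE_ID"],
    sequence_by=col("last_updated"),
    apply_as_deletes=expr("Operation = 'DELETE'"),  # assumed delete marker
)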
leon
by New Contributor II
  • 2534 Views
  • 2 replies
  • 1 kudos

SQL connector from databricks-sql-connector takes too much time to convert to pandas

Hello, I am querying my Delta Lake with the SQL connector and later want to explore the result in pandas. with connection.cursor() as cursor: cur = cursor.execute(""" SELECT DISTINCT sample_timestamp, value, name FROM de...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Leon Bam​, Please check this article and let us know if that helps.

1 More Replies
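For illustration, a minimal sketch of one way to speed up the conversion, assuming databricks-sql-connector v2+ (connection parameters and the table name are placeholders): fetch the result as Arrow and convert to pandas in bulk instead of row by row.

from databricks import sql

with sql.connect(server_hostname="<host>", http_path="<http-path>", access_token="<token>") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT DISTINCT sample_timestamp, value, name FROM my_table")
        # fetchall_arrow() returns a pyarrow.Table; to_pandas() converts in one shot.
        pdf = cursor.fetchall_arrow().to_pandas()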
Anonymous
by Not applicable
  • 9944 Views
  • 26 replies
  • 4 kudos

Use Case Sharing Sweepstakes!

Use Case Sharing Sweepstakes ! Data + AI Summit is in full swing and we know you are just as excited as we are to learn about the new and exciting things happening at Databricks. From notebooks to the Lakehouse, we know some of these new features wil...

Latest Reply
AmanSehgal
Honored Contributor III
  • 4 kudos

Cloning libraries when cloning clusters: currently, when we clone clusters, the externally added libraries aren't copied as part of the cloning process. It's expected behavior, but a missing feature. At times new developers end up spending a lot of time in debug...

25 More Replies
junaid
by New Contributor
  • 6379 Views
  • 1 reply
  • 0 kudos

We are seeing a "BOOTSTRAP_TIMEOUT" issue in a new workspace.

When attempting to deploy/start a Databricks cluster on AWS through the UI, the following error consistently occurs: Bootstrap Timeout: [id: InstanceId(i-093caac78cdbfa7e1), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-335698072713...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Junaid Ahmed​, nice to meet you, and thank you for asking this question. We have had a similar issue in the past, and it received a best answer. Please see this community thread with the same question, and let us know if that helps you.

AmanSehgal
by Honored Contributor III
  • 9722 Views
  • 2 replies
  • 12 kudos

How do concurrent runs in a job map to cluster configuration?

In Databricks jobs, there's a field for concurrent runs which can be set to 1000. If I have a cluster with 4 worker nodes and 8 cores each, then at most how many concurrent jobs will I be able to execute? What will happen if I launch 100 instances of sam...

Latest Reply
Prabakar
Esteemed Contributor III
  • 12 kudos

@Aman Sehgal​ On an E2 workspace the limit is 1000 concurrent runs. If you trigger 100 runs​ at the same time, 100 clusters will be created and the runs will be executed. If you use the same cluster for 100 runs, then you might face a lot of failed jobs...

1 More Replies
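For illustration, a minimal sketch of where that limit is configured, assuming Jobs API 2.1 (the URL, token, and cluster settings are placeholders): max_concurrent_runs caps simultaneous runs of one job, while each run on a new job cluster gets its own cluster, so cluster size governs per-run parallelism rather than the number of runs.

import requests

job_settings = {
    "name": "concurrent-demo",
    "max_concurrent_runs": 100,  # the workspace-wide cap is 1000, per the reply
    "tasks": [{
        "task_key": "main",
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",  # placeholder
            "node_type_id": "i3.xlarge",          # placeholder
            "num_workers": 4,
        },
        "notebook_task": {"notebook_path": "/Shared/demo"},  # placeholder
    }],
}

resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=job_settings,
)
print(resp.json())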
Nickje56
by New Contributor
  • 3721 Views
  • 1 reply
  • 1 kudos

Resolved! _sqldf not defined

The May 2022 release notes say that we are now able to investigate our SQL results in Python in a Python notebook. (See also the documentation here: Use notebooks - Azure Databricks | Microsoft Docs.) So I created a simple query (select * from ...

Latest Reply
User16753725469
Contributor II
  • 1 kudos

This feature was delayed and will be rolled out over Databricks platform releases 3.74 through 3.76. You can check the release notes for more info: https://docs.databricks.com/release-notes/product/2022/may.html (a usage sketch follows this thread).

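For illustration, a minimal sketch of the behavior once the feature reaches a workspace (the table name is a placeholder): in a Python notebook, the result of the previous %sql cell is exposed to later Python cells as the implicit DataFrame _sqldf.

# Cell 1:
# %sql
# SELECT * FROM my_table LIMIT 10

# Cell 2: _sqldf holds the previous SQL cell's result as a pyspark DataFrame.
print(_sqldf.count())
pdf = _sqldf.toPandas()  # e.g. hand it to pandas for exploration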
Confused
by New Contributor III
  • 7349 Views
  • 7 replies
  • 2 kudos

Schema evolution issue

Hi all, I am loading some data using Auto Loader but am having trouble with schema evolution. A new column has been added to the data I am loading, and I am getting the following error: StreamingQueryException: Encountered unknown field(s) during parsing:...

Latest Reply
rgrosskopf
New Contributor II
  • 2 kudos

I agree that hints are the way to go if you have the schema available, but the whole point of schema evolution is that you might not always know the schema in advance. I received a similar error with a similar streaming query configuration. The issue w...

6 More Replies
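For illustration, a minimal Auto Loader sketch combining both ideas from this thread (paths and column names are placeholders): schemaHints pins types for the columns you do know, and the default addNewColumns evolution mode lets the stream pick up the new column after the restart that the exception forces.

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/tmp/schema")           # placeholder
      .option("cloudFiles.schemaEvolutionMode", "addNewColumns")    # default mode
      .option("cloudFiles.schemaHints", "id BIGINT, ts TIMESTAMP")  # assumed columns
      .load("/tmp/input"))                                          # placeholder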
vk217
by Contributor
  • 1565 Views
  • 2 replies
  • 3 kudos

Resolved! Generic user account and personal access token for Azure Databricks

Is there a way to create a generic user account and personal access token to connect to Databricks? I have an Azure build pipeline and a VSCode test that use my personal access token for running builds and tests.

Latest Reply
Gabriel0007
New Contributor III
  • 3 kudos

You can create a service account (service principal) for jobs, applications, etc. Here's a link to the docs: https://docs.databricks.com/administration-guide/users-groups/service-principals.html (a token sketch follows this thread).

1 More Replies
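For illustration, a minimal sketch of minting a token on behalf of a service principal via the Token Management API, assuming that API is available in the workspace (the URL, tokens, and IDs are placeholders), so CI pipelines stop depending on anyone's personal token.

import requests

resp = requests.post(
    "https://<workspace-url>/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": "Bearer <admin-token>"},
    json={
        "application_id": "<service-principal-application-id>",
        "lifetime_seconds": 3600,   # short-lived token for CI
        "comment": "Azure build pipeline",
    },
)
print(resp.json()["token_value"])  # field name per the Token Management API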
Tahseen0354
by Contributor III
  • 1694 Views
  • 5 replies
  • 2 kudos

Why does setting up audit log delivery in Databricks on GCP fail?

I am trying to set up audit log delivery in Google Cloud. I have followed this page https://docs.gcp.databricks.com/administration-guide/account-settings-gcp/log-delivery.html and have added log-delivery@databricks-prod-master.iam.gserviceaccount.co...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Md Tahseen Anam​, we haven't heard from you since the last response from @Prabakar, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to others. A...

4 More Replies
Gabriel0007
by New Contributor III
  • 1088 Views
  • 2 replies
  • 2 kudos

How do I process each new record when using Auto Loader?

For instance, I'm ingesting webhook data into a Delta table with Auto Loader and need to run a process for each new record as it arrives.

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

With Auto Loader, you can maintain something like a changelog and record data about operations performed on each micro-batch - like the affected ID, I/U/D, timestamp, etc. Then you can make use of this changelog table and run subsequent processes for each row aff...

1 More Replies
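For illustration, a minimal sketch of the per-record hook (paths are placeholders and process_record is a hypothetical downstream call): foreachBatch sees each Auto Loader micro-batch, which is where per-record work, or writing a changelog row per record, can happen.

def handle_batch(batch_df, batch_id):
    # Iterate the new records in this micro-batch on the driver.
    for row in batch_df.toLocalIterator():
        process_record(row)  # hypothetical downstream call

(spark.readStream
 .format("cloudFiles")
 .option("cloudFiles.format", "json")
 .option("cloudFiles.schemaLocation", "/tmp/schema")   # placeholder
 .load("/tmp/webhooks")                                # placeholder
 .writeStream
 .foreachBatch(handle_batch)
 .option("checkpointLocation", "/tmp/checkpoint")      # placeholder
 .start())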
ishantjain194
by New Contributor II
  • 1296 Views
  • 4 replies
  • 4 kudos

AWS or Azure or Google Cloud?

I want to know which cloud is better to learn and which cloud's services offer more career opportunities.

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @ishant jain​, we haven't heard from you since the last responses from me and @Cedric Law Hing Ping​, and I was checking back to see if our solutions helped you. Otherwise, if you have a solution, please share it with the community, as it can be hel...

3 More Replies
Cassio
by New Contributor II
  • 2602 Views
  • 4 replies
  • 3 kudos

Resolved! "SparkSecurityException: Cannot read sensitive key" error when reading key from Spark config

In Databricks 10.1 it is possible to define in the cluster's "Spark Config" something like: spark.fernet {{secrets/myscope/encryption-key}}. In my case, my scopes are tied to Azure Key Vault. With that I can make a query as follows: %sql SELECT d...

Latest Reply
Soma
Valued Contributor
  • 3 kudos

This solution exposes the entire secret if I use commands like the one below: sql("""explain select upper("${spark.fernet.email}") as data """).display() Please don't use this. (A safer pattern is sketched after this thread.)

3 More Replies
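For illustration, a minimal sketch of the safer pattern (scope and key names follow the post; this runs in a Databricks notebook where dbutils is available): fetch the key at runtime with dbutils.secrets.get, whose value is redacted in notebook output, instead of interpolating a Spark conf into SQL text that EXPLAIN can echo back.

# Fetched at runtime; the value is redacted if displayed in the notebook.
key = dbutils.secrets.get(scope="myscope", key="encryption-key")

# Pass it as a bound value (e.g. a function argument) rather than pasting it
# into a SQL string, whose plan output or EXPLAIN can reveal it.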
Jin_Kim
by New Contributor II
  • 914 Views
  • 1 reply
  • 0 kudos

Question on a single job with multiple tasks

Say I have a job with 10 parallel tasks. I had to cancel one of the tasks to fix something, and I am unable to restart just that task. Is this by design? Should I restart the job in this case? Q2) If one of the tasks fails, will it auto-recover just tha...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Jin Kim​, please enable "Task Orchestration in Jobs" in your Admin Console; then you can add as many tasks to your job as you need. You can also specify dependencies between tasks (a sketch follows this thread).

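For illustration, a minimal sketch of a multi-task job definition, assuming Jobs API 2.1 (names, paths, and the cluster ID are placeholders): each task declares its dependencies with depends_on, and with task orchestration enabled a repair run re-executes only the failed tasks and their dependents rather than the whole job.

job_settings = {
    "name": "multi-task-demo",
    "tasks": [
        {
            "task_key": "extract",
            "existing_cluster_id": "<cluster-id>",                  # placeholder
            "notebook_task": {"notebook_path": "/Shared/extract"},  # placeholder
        },
        {
            "task_key": "load",
            "depends_on": [{"task_key": "extract"}],  # runs after extract
            "existing_cluster_id": "<cluster-id>",                  # placeholder
            "notebook_task": {"notebook_path": "/Shared/load"},     # placeholder
        },
    ],
}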