Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Braxx
by Contributor II
  • 11467 Views
  • 3 replies
  • 1 kudos

Resolved! How to kill the execution of a notebook at a specific cell?

Let's say I want to check whether a condition is false and, if so, stop the execution of the rest of the script. I tried two approaches: 1) raising an exception: if not data_input_cols.issubset(data.columns): raise Exception("Missing column or column's name mis...

Latest Reply
Invasioned
New Contributor II
  • 1 kudos

In Jupyter notebooks or similar environments, you can stop the execution of a notebook at a specific cell by raising an exception. However, you need to handle the exception properly to ensure the execution stops. The issue you're encountering could b...

  • 1 kudos
2 More Replies
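
A minimal sketch of the "raise an exception" approach from this thread, assuming plain Python collections in place of a real DataFrame. The helper name `require_columns` and the column names are hypothetical and do not come from the original post.

```python
def require_columns(available, required):
    """Raise ValueError if any required column is missing; return None otherwise."""
    missing = sorted(set(required) - set(available))
    if missing:
        # Leaving this exception uncaught is what interrupts the run;
        # wrapping the call in try/except would let execution continue.
        raise ValueError(f"Missing column(s): {', '.join(missing)}")


# Hypothetical column names, for illustration only.
data_input_cols = {"id", "amount"}
data_columns = ["id", "amount", "created_at"]

require_columns(data_columns, data_input_cols)  # passes silently
```

In a Databricks notebook, `dbutils.notebook.exit("some message")` is another option to consider when the run should end early without being reported as a failure; whether that fits depends on how the notebook is orchestrated.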
ashdam
by New Contributor III
  • 7658 Views
  • 9 replies
  • 1 kudos

Resolved! How to version your workflows/jobs

We would like to version control workflows/jobs in Git: not the underlying notebooks but the job logic itself. Is it possible?

Latest Reply
ashdam
New Contributor III
  • 1 kudos

Thank you very much for all your answers

  • 1 kudos
8 More Replies
madhav_dhruve
by New Contributor III
  • 4326 Views
  • 1 replies
  • 0 kudos

Move Files from S3 to Local File System with Unity Catalog Enabled

Dear Databricks Community Experts, I am working with Databricks on AWS with Unity Catalog. One use case for me is to uncompress files with many extensions on an S3 bucket. Below is my strategy: - Move files from S3 to the local file system (where spark driver...

Screenshot 2023-07-18 at 10.57.19 AM.png
Latest Reply
rvadali2
New Contributor II
  • 0 kudos

Did you find a solution to this?

  • 0 kudos
chari
by Contributor
  • 6070 Views
  • 1 replies
  • 1 kudos

Can't connect Power BI Desktop to Azure Databricks

Hello, I am trying to connect Power BI Desktop to Azure Databricks (source: Delta table) by downloading a connection file from Databricks. I see an error message like the one below when I open the connection file with Power BI. Repeated attempts have given th...

Latest Reply
chari
Contributor
  • 1 kudos

Actually, the Azure Databricks cluster is managed by the IT team, so there are some boundaries, as I am granted only usage privileges.

  • 1 kudos
pratik21
by New Contributor II
  • 6227 Views
  • 3 replies
  • 1 kudos

Unexpected error while calling Notebook string matching regex `\$[\w_]+' expected but `M' found

Run result unavailable: job failed with error message INVALID_PARAMETER_VALUE: Failed to parse %run command: string matching regex `\$[\w_]+' expected but `M' found) Stacktrace: /Notebookpath: scala. To call the notebook we are using dbutils.notebook.run("N...

Latest Reply
wise_owl
New Contributor III
  • 1 kudos

Not sure about @pratik21, but cloning the notebook to a different location worked for me, and it stopped giving the error altogether.

  • 1 kudos
2 More Replies
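
The error text in this thread says the `%run` parser expected a token matching `\$[\w_]+` but found `M`. As a rough illustration of what that pattern accepts, here is a small sketch using Python's `re` module. It only exercises the regex quoted in the message and is not the Databricks parser; `looks_like_param` and the sample tokens are made up for the example.

```python
import re

# Pattern copied from the error message; not the actual Databricks parser.
PARAM_TOKEN = re.compile(r"\$[\w_]+")


def looks_like_param(token: str) -> bool:
    """Return True if the whole token has the `$name` shape the message asks for."""
    return PARAM_TOKEN.fullmatch(token) is not None


print(looks_like_param("$my_param"))  # True
print(looks_like_param("M"))          # False
```

Read this way, the message suggests the parser hit a bare word where it expected a `$`-prefixed argument, which may help when inspecting the notebook path or arguments that were passed.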
User16826990884
by New Contributor III
  • 3415 Views
  • 3 replies
  • 0 kudos

Version control jobs

How do engineering teams out there version control their jobs? If there is a production issue, can I revert to an older version of the job?

Latest Reply
Rom
New Contributor III
  • 0 kudos

You can use version-controlled source code for your Databricks job; each time you need to roll back to an older version of the job, you just move to the older version of the code. For version-controlled source code you have multiple choices: - Use a noteb...

  • 0 kudos
2 More Replies
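
One way to make a job itself revertible, sketched under assumptions: keep the job definition as a JSON file in the same Git repository as the code. The field names below follow the general shape of a Jobs API payload, but the job name, task key, notebook path, and the `render_job_definition` helper are all hypothetical.

```python
import json

# Hypothetical job definition; values are placeholders, not a real job.
job_definition = {
    "name": "example-nightly-job",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/example/ingest"},
        }
    ],
}


def render_job_definition(definition: dict) -> str:
    """Serialize with sorted keys and fixed indentation so Git diffs stay stable."""
    return json.dumps(definition, indent=2, sort_keys=True) + "\n"


rendered = render_job_definition(job_definition)
print(rendered)
```

Committing the rendered text gives each change to the job a reviewable diff, and checking out an older commit recovers the earlier definition. Applying that definition back to the workspace is a separate step that is not shown here.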
Diogo_W
by New Contributor III
  • 4432 Views
  • 2 replies
  • 1 kudos

Resolved! Spark is not executing any tasks

I have an issue where Spark is not submitting any tasks, on any workspace or cluster, even a SQL Warehouse. Even very simple code hangs forever. Has anyone ever faced something similar? Our infra is on AWS.

Diogo_W_0-1698352974280.png Diogo_W_1-1698353051402.png
Latest Reply
Diogo_W
New Contributor III
  • 1 kudos

Found the solution: Turned out to be an issue with the Security Groups. The internal security group communication was not open to all ports for TCP and UDP. After fixing that the jobs ran fine. Seems like we did require more workers too.

  • 1 kudos
1 More Replies
azera
by New Contributor II
  • 1826 Views
  • 2 replies
  • 2 kudos

Stream-stream window join after time window aggregation not working in 13.1

Hey, I'm trying to perform time window aggregation in two different streams followed by a stream-stream window join described here. I'm running Databricks Runtime 13.1, exactly as advised. However, when I'm reproducing the following code: clicksWindow = c...

Latest Reply
Happyfield7
New Contributor II
  • 2 kudos

Hey, I'm currently facing the same problem, so I would like to know if you've made any progress in resolving this issue.

  • 2 kudos
1 More Replies
erigaud
by Honored Contributor
  • 2695 Views
  • 1 replies
  • 0 kudos

Combining DLT and workflow - MATERIALIZED_VIEW_OPERATION_NOT_ALLOWED

Hello everyone! I currently have a DLT pipeline that loads into several Delta Live Tables (both streaming tables and materialized views). The end table of my DLT pipeline is a materialized view called "silver.my_view". In a later step I need to join/union/merg...

Rani
by New Contributor
  • 8704 Views
  • 2 replies
  • 0 kudos

Divide a dataframe into multiple smaller dataframes based on values in multiple columns in Scala

I have to divide a dataframe into multiple smaller dataframes based on values in columns such as gender and state; the end goal is to pick random samples from each dataframe. I am trying to implement a sample as explained below. I am quite new to th...

Latest Reply
subham0611
New Contributor II
  • 0 kudos

@raela I also have a similar use case. I am writing data to different Databricks tables based on a column value, but I am getting an insufficient disk space error and the driver is getting killed. I suspect the df.select(colName).distinct().collect() step is taki...

  • 0 kudos
1 More Replies
Leszek
by Contributor
  • 7339 Views
  • 1 replies
  • 2 kudos

IDENTITY columns generating every other number when merging

Hi, I'm doing a merge into my Delta table, which has an IDENTITY column: Id BIGINT GENERATED ALWAYS AS IDENTITY. The inserted data has every other number in the id column, like this (see image). Is this expected behavior? Is there any workaround to make the number increase by 1?

image
Latest Reply
Dataspeaksss
New Contributor II
  • 2 kudos

Were you able to resolve it? I'm facing the same issue.

  • 2 kudos
Mohammad_Younus
by New Contributor
  • 4479 Views
  • 0 replies
  • 0 kudos

Merge delta tables with data more than 200 million

Hi everyone, I'm trying to merge two Delta tables, each with more than 200 million rows. The tables are properly optimized, but the job takes a long time to execute and the memory spills are huge (1TB-3TB) rec...

Mohammad_Younus_0-1698373999153.png
Joe1912
by New Contributor III
  • 1102 Views
  • 0 replies
  • 0 kudos

Issue with MERGE INTO for first batch

I have source data with multiple rows and columns; one of the columns is city. I want to get the unique cities into another table by streaming data from the source table, so I am trying to use MERGE INTO and foreachBatch with my merge function. My merge condition is: On so...

JD2
by Contributor
  • 1261 Views
  • 0 replies
  • 0 kudos

cursor type\loop question

Hello: In my Hive Metastore, I have 35 tables in a database that I want to export to Excel. I need help with a query that can loop over the tables and export one table at a time to Excel. Any help is appreciated. Thanks in advance for your kind help.

Sahha_Krishna
by New Contributor
  • 7953 Views
  • 1 replies
  • 0 kudos

Unable to start Cluster in Databricks because of `BOOTSTRAP_TIMEOUT`

Unable to start the cluster in AWS-hosted Databricks for the reason below: { "reason": { "code": "BOOTSTRAP_TIMEOUT", "parameters": { "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZIN...

Data Engineering
AWS
EC2
VPC
Latest Reply
User16539034020
Databricks Employee
  • 0 kudos

Hi, Sahha: Thanks for contacting Databricks Support. This is a common type of error, which indicates that the bootstrap failed due to a misconfigured data plane network. Databricks requested EC2 instances for a new cluster, but encountered a long ...

  • 0 kudos
