cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Siddhesh2525
by New Contributor III
  • 11802 Views
  • 4 replies
  • 4 kudos

how to set retry attempt and how to set email alert with error message of databricks notebook

how to set retry attempt in the data bricks notebook in term of like if any cmd /cell get fails that times that particular cmd/cell should be rerun for purpose of connection issue etc.

  • 11802 Views
  • 4 replies
  • 4 kudos
Latest Reply
Siddhesh2525
New Contributor III
  • 4 kudos

"you can just implement try/except in cell, handling it by using dbutils.notebook.exit(jobId) and using other dbutils can help,@HubertDudek As i am fresher in the databricks ,Could you please suggest /explain me in detail

  • 4 kudos
3 More Replies
Brose
by New Contributor III
  • 17378 Views
  • 9 replies
  • 2 kudos

Creating a delta table Mismatch Input Error

I am trying to create a delta table for streaming data, but I am getting the following error; Error in SQL statement: ParseException: mismatched input 'CREATE' expecting {<EOF>, ';'}(line 2, pos 0).My statement is as follows;%sqlDROP TABLE IF EXISTS ...

  • 17378 Views
  • 9 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Ambrose Walker​ - If Jose's answer resolved your issue, would you be happy to mark that post as best? That will help others find the solution more quickly.

  • 2 kudos
8 More Replies
missyT
by New Contributor III
  • 2086 Views
  • 1 replies
  • 0 kudos

ISP Network Question

Some ISP's like Charter have their systems configured in such a way that from a customers router the ARP table for all of the IP's in the subnet show the same MAC address. The IP it hands off through their modem to the CPE router is a /22. When you t...

  • 2086 Views
  • 1 replies
  • 0 kudos
Latest Reply
Prabakar
Databricks Employee
  • 0 kudos

Hi, @Missy Trussell​ I don't see this to be a Databricks related question. I would suggest that you raise this query in StackOverflow or some networking-related forums.

  • 0 kudos
JD2
by Contributor
  • 4784 Views
  • 4 replies
  • 7 kudos

Resolved! Lakehouse with Delta Lake Deep Dive Training

Hello:As per link shown below, I need help to see from where I can get the DBC file for hands-on training.https://www.youtube.com/watch?v=znv4rM9wevc&ab_channel=DatabricksAny help is greatly appreciated.Thanks

  • 4784 Views
  • 4 replies
  • 7 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 7 kudos

Thank you for url just watching it

  • 7 kudos
3 More Replies
Hubert-Dudek
by Databricks MVP
  • 1457 Views
  • 0 replies
  • 19 kudos

docs.databricks.com

Databricks Runtime 10.2 Beta is available from yesterday.More details here: https://docs.databricks.com/release-notes/runtime/10.2.htmlNew features and improvementsUse Files in Repos with Spark StreamingDatabricks Utilities adds an update mount comma...

image.png
  • 1457 Views
  • 0 replies
  • 19 kudos
Hubert-Dudek
by Databricks MVP
  • 2615 Views
  • 2 replies
  • 18 kudos

I thought that Azure Data Factory is built on spark but now when I crushed it I see that is build directly on databricks :-)

I thought that Azure Data Factory is built on spark but now when I crushed it I see that is build directly on databricks

image.png
  • 2615 Views
  • 2 replies
  • 18 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 18 kudos

correct. Because Data Flows were available before their own (MS) spark pools were available.But let's be honest: that is only a good thing

  • 18 kudos
1 More Replies
guruv
by New Contributor III
  • 9441 Views
  • 4 replies
  • 2 kudos

Resolved! delta table autooptimize vs optimize command

HI,i have several delta tables on Azure adls gen 2 storage account running databricks runtime 7.3. there are only write/read operation on delta tables and no update/delete.As part of release pipeline, below commands are executed in a new notebook in...

  • 9441 Views
  • 4 replies
  • 2 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

the auto optimize is sufficient, unless you run into performance issues.Then I would trigger an optimize. This will generate files of 1GB (so larger than the standard size of auto optimize). And of course the Z-Order if necessary.The suggestion to ...

  • 2 kudos
3 More Replies
MadelynM
by Databricks Employee
  • 1681 Views
  • 0 replies
  • 1 kudos

vimeo.com

Repos let you use Git functionality such as cloning a remote repo, managing branches, pushing and pulling changes and visually comparing differences upon commit. Here's a quick video (3:56) on setting up a repo for Databricks on AWS. Pre-reqs: Git in...

  • 1681 Views
  • 0 replies
  • 1 kudos
MadelynM
by Databricks Employee
  • 1199 Views
  • 0 replies
  • 0 kudos

vimeo.com

A job is a way of running a notebook either immediately or on a scheduled basis. Here's a quick video (4:04) on how to schedule a job and automate a workflow for Databricks on AWS. To follow along with the video, import this notebook into your worksp...

  • 1199 Views
  • 0 replies
  • 0 kudos
MadelynM
by Databricks Employee
  • 1395 Views
  • 0 replies
  • 1 kudos

vimeo.com

Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...

  • 1395 Views
  • 0 replies
  • 1 kudos
marchello
by New Contributor III
  • 4146 Views
  • 5 replies
  • 6 kudos

Resolved! register model - need python 3, but get only python 2

Hi all, I'm trying to register a model with python 3 support, but continue getting only python 2. I can see that runtime 6.0 and above get python 3 by default, but I don't see a way to set neither runtime version, nor python version during model regi...

  • 4146 Views
  • 5 replies
  • 6 kudos
Latest Reply
marchello
New Contributor III
  • 6 kudos

Hi team, thanks for getting back to me. Let's put this on hold for now. I will update once it's needed again. It was solely for education purpose and right now I have quite urgent stuff to do.Have a great day. 

  • 6 kudos
4 More Replies
Murugan
by New Contributor II
  • 6106 Views
  • 4 replies
  • 1 kudos

Databricks interoperability between cloud environments

While Databricks is currently available and integrated into all three major cloud platforms (Azure, AWS, GCP) , following are pertinent questions that comes across in the real-world scenarios,1) Whether Databricks can be cloud agnostic (i.e.,) In ca...

  • 6106 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

You'll be interested in the Unity Catalog.The notebooks should be the same across all the clouds and there are no syntax differences. The key things are going to be just changing paths from S3 to ADL2 and having different usernames/logins across the...

  • 1 kudos
3 More Replies
as999
by New Contributor III
  • 2723 Views
  • 3 replies
  • 1 kudos

python dataframe or hiveSql update based on predecessor value?

I have a million in rows that I need to update which looks for the highest count of the predecessor from the same source data and replaces the same value on a different row.  For example.Original DF.sno Object Name  shape  rating1  Fruit apple round ...

  • 2723 Views
  • 3 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

basically you have to create a dataframe (or use a window function, that will also work) which gives you the group combination with the most occurances. So a window/groupby on object, name, shape with a count().Then you have to determine which shape...

  • 1 kudos
2 More Replies
Sam
by New Contributor III
  • 2294 Views
  • 1 replies
  • 4 kudos

collect_set/ collect_list Pushdown

Hello,I've noticed that Collect_Set and Collect_List are not pushed down to the database?Runtime DB 9.1LTSSpark 3.1.2Database: SnowflakeIs there any way to get a distinct set from a group by in a way that will push down the query to the database?

  • 2294 Views
  • 1 replies
  • 4 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Hm so collect_set does not get translated to listagg.Can you try the following?use a more recent version of dbrxuse delta lake as spark sourceuse the latest version of the snowflake connectorcheck if pushdown to snowflake is enabled

  • 4 kudos
Labels