Data Engineering

Forum Posts

JD2
by Contributor
  • 1771 Views
  • 4 replies
  • 7 kudos

Resolved! Lakehouse with Delta Lake Deep Dive Training

Hello: As per the link below, I need help finding where I can get the DBC file for this hands-on training: https://www.youtube.com/watch?v=znv4rM9wevc&ab_channel=Databricks
Any help is greatly appreciated. Thanks!

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

Thank you for the URL, just watching it now.

3 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 480 Views
  • 0 replies
  • 19 kudos

docs.databricks.com

Databricks Runtime 10.2 Beta is available as of yesterday. More details here: https://docs.databricks.com/release-notes/runtime/10.2.html
New features and improvements:
  • Use Files in Repos with Spark Streaming
  • Databricks Utilities adds an update mount comma...

Hubert-Dudek
by Esteemed Contributor III
  • 840 Views
  • 2 replies
  • 18 kudos

I thought that Azure Data Factory is built on Spark, but now when I crashed it I see that it is built directly on Databricks :-)

I thought that Azure Data Factory is built on Spark, but now when I crashed it I see that it is built directly on Databricks

Latest Reply
-werners-
Esteemed Contributor III
  • 18 kudos

Correct. That's because Data Flows were available before their own (MS) Spark pools were. But let's be honest: that is only a good thing.

1 More Replies
guruv
by New Contributor III
  • 2431 Views
  • 5 replies
  • 2 kudos

Resolved! delta table autooptimize vs optimize command

Hi, I have several Delta tables on an Azure ADLS Gen2 storage account, running Databricks Runtime 7.3. There are only write/read operations on the Delta tables, no updates/deletes. As part of the release pipeline, the commands below are executed in a new notebook in...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Auto optimize is sufficient, unless you run into performance issues. Then I would trigger an OPTIMIZE. This will generate files of 1 GB (so larger than the standard size of auto optimize). And of course the Z-Order if necessary. The suggestion to ...

4 More Replies
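The trade-off in the accepted answer above comes down to target file size. As a rough illustration (plain Python, not Databricks code; the ~128 MB auto-optimize and ~1 GB OPTIMIZE targets are commonly cited defaults and are assumptions here), a greedy compaction pass over small files looks like this:

```python
# Illustrative sketch only -- not Databricks code. It mimics what a compaction
# pass like OPTIMIZE conceptually does: coalesce many small files into fewer
# files near a target size. The ~128 MB (auto optimize) and ~1 GB (OPTIMIZE)
# targets are commonly cited defaults, assumed here for illustration.

def compact(file_sizes_mb, target_mb):
    """Greedily pack file sizes into bins of at most target_mb each."""
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

small_files = [64] * 20               # twenty 64 MB files from frequent writes
auto = compact(small_files, 128)      # auto-optimize-style target
manual = compact(small_files, 1024)   # OPTIMIZE-style target
print(len(auto), len(manual))         # the 1 GB pass yields far fewer files
```

The point of the sketch: the same data ends up in far fewer, larger files under the 1 GB target, which is why a manual OPTIMIZE helps when read performance degrades.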
MadelynM
by New Contributor III
  • 571 Views
  • 0 replies
  • 1 kudos

vimeo.com

Repos let you use Git functionality such as cloning a remote repo, managing branches, pushing and pulling changes and visually comparing differences upon commit. Here's a quick video (3:56) on setting up a repo for Databricks on AWS. Pre-reqs: Git in...

MadelynM
by New Contributor III
  • 380 Views
  • 0 replies
  • 0 kudos

vimeo.com

A job is a way of running a notebook either immediately or on a scheduled basis. Here's a quick video (4:04) on how to schedule a job and automate a workflow for Databricks on AWS. To follow along with the video, import this notebook into your worksp...

MadelynM
by New Contributor III
  • 320 Views
  • 0 replies
  • 1 kudos

vimeo.com

Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...

marchello
by New Contributor III
  • 1078 Views
  • 5 replies
  • 6 kudos

Resolved! register model - need python 3, but get only python 2

Hi all, I'm trying to register a model with Python 3 support, but I keep getting only Python 2. I can see that runtime 6.0 and above get Python 3 by default, but I don't see a way to set either the runtime version or the Python version during model regi...

Latest Reply
marchello
New Contributor III
  • 6 kudos

Hi team, thanks for getting back to me. Let's put this on hold for now; I will update once it's needed again. It was solely for education purposes, and right now I have quite urgent stuff to do. Have a great day.

4 More Replies
Murugan
by New Contributor II
  • 1854 Views
  • 4 replies
  • 1 kudos

Databricks interoperability between cloud environments

While Databricks is currently available and integrated into all three major cloud platforms (Azure, AWS, GCP), the following pertinent questions come up in real-world scenarios: 1) Whether Databricks can be cloud agnostic (i.e.) in ca...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

You'll be interested in the Unity Catalog. The notebooks should be the same across all the clouds and there are no syntax differences. The key things are going to be just changing paths from S3 to ADL2 and having different usernames/logins across the...

3 More Replies
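The reply's point about path changes can be sketched as a tiny helper: the notebook logic stays identical across clouds, and only a cloud-specific storage prefix differs. All bucket, account, and container names below are hypothetical:

```python
# Hypothetical helper illustrating the reply: notebook logic stays the same
# across clouds and only the storage prefix changes. The bucket/account/
# container names here are made up for the example.

PREFIXES = {
    "aws": "s3://my-bucket/",                              # S3
    "azure": "abfss://data@myacct.dfs.core.windows.net/",  # ADLS Gen2
    "gcp": "gs://my-bucket/",                              # GCS
}

def table_path(cloud, relative_path):
    """Join a cloud-specific prefix with a cloud-agnostic relative path."""
    return PREFIXES[cloud] + relative_path

print(table_path("aws", "bronze/events"))
print(table_path("azure", "bronze/events"))
```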
as999
by New Contributor III
  • 767 Views
  • 3 replies
  • 1 kudos

python dataframe or hiveSql update based on predecessor value?

I have a million rows that I need to update. It looks for the highest count of the predecessor from the same source data and replaces the same value on a different row. For example, the original DF:
sno Object Name shape rating
1 Fruit apple round ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Basically you have to create a dataframe (or use a window function, that will also work) which gives you the group combination with the most occurrences. So a window/groupby on object, name, shape with a count(). Then you have to determine which shape...

2 More Replies
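The groupby-with-count approach in the reply above can be sketched in plain Python (a real solution would use a Spark window or groupBy().count(); the toy rows below are invented for illustration):

```python
# Plain-Python sketch of the reply's approach (a real solution would use a
# Spark window or groupBy().count()). The rows are a toy stand-in for the
# question's dataframe.
from collections import Counter, defaultdict

rows = [
    {"object": "Fruit", "name": "apple", "shape": "round"},
    {"object": "Fruit", "name": "apple", "shape": "round"},
    {"object": "Fruit", "name": "apple", "shape": "oval"},
]

# count shape occurrences per (object, name) group
per_group = defaultdict(Counter)
for r in rows:
    per_group[(r["object"], r["name"])][r["shape"]] += 1

# keep the most frequent shape per group, then rewrite disagreeing rows
best = {group: c.most_common(1)[0][0] for group, c in per_group.items()}
fixed = [dict(r, shape=best[(r["object"], r["name"])]) for r in rows]
print(fixed)  # every apple row now carries the majority shape "round"
```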
Sam
by New Contributor III
  • 680 Views
  • 1 replies
  • 4 kudos

collect_set/ collect_list Pushdown

Hello, I've noticed that collect_set and collect_list are not pushed down to the database. Runtime: DB 9.1 LTS, Spark 3.1.2. Database: Snowflake. Is there any way to get a distinct set from a group by in a way that will push down the query to the database?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Hm, so collect_set does not get translated to listagg. Can you try the following?
  • use a more recent version of dbrx
  • use Delta Lake as the Spark source
  • use the latest version of the Snowflake connector
  • check if pushdown to Snowflake is enabled

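For context on why the reply mentions listagg: collect_set is semantically a distinct aggregation per key, which on the warehouse side would have to be emitted as something like Snowflake's LISTAGG(DISTINCT ...) for pushdown to work. A plain-Python illustration of that equivalence (not connector code; the rows are invented):

```python
# Sketch of the semantic mapping the reply hints at: collect_set on the Spark
# side corresponds to a distinct aggregation (LISTAGG(DISTINCT ...) style) on
# the warehouse side. This is plain Python illustrating the equivalence, not
# connector code; the rows are invented.
from collections import defaultdict

rows = [("a", 1), ("a", 2), ("a", 1), ("b", 3)]

groups = defaultdict(set)
for key, value in rows:
    groups[key].add(value)  # collect_set semantics: distinct values per key

# LISTAGG(DISTINCT value, ',')-style rendering of the same result
listagg = {k: ",".join(str(v) for v in sorted(vs)) for k, vs in groups.items()}
print(listagg)  # {'a': '1,2', 'b': '3'}
```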
WayneDeleersnyd
by New Contributor III
  • 3625 Views
  • 11 replies
  • 0 kudos

Resolved! Unable to view exported notebooks in HTML format

My team and I noticed an issue lately where notebooks, when exported to HTML format, are no longer viewable in a stand-alone state. Older notebooks which were exported have no issues, but newer exports are not viewable. The only way we can view t...

Latest Reply
cconnell
Contributor II
  • 0 kudos

I can confirm that the Community Edition now does correct readable HTML export.

10 More Replies
User16826992666
by Valued Contributor
  • 1234 Views
  • 3 replies
  • 0 kudos

If our company has an Enterprise Git server deployed on a private network, can we use Repos?

Our team would like to use the Repos functionality but our security prevents outside traffic through public networks. Is there any way we can still use Repos?

Latest Reply
User16781336501
New Contributor III
  • 0 kudos

Please contact your account team for some options that are in preview right now.

2 More Replies