We are currently setting up CI/CD for our Databricks workspace using Databricks Repos, following the approach described in the official docs: https://docs.databricks.com/repos.html#best-practices-for-integrating-databricks-repos-with-cicd-workflows Obvi...
Hi, running the code line below in spark-shell through db-connect throws an exception:
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance o...
Software 2.0 is one of the 10 most important trends that will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy. He wrote that neural networks are not just another classifier; they represent the beginning of a fu...
Hi! I have some jobs that stay idle for some time when getting data from an S3 mount on DBFS. These are all SQL queries on Delta. How can I find out where the bottleneck is (duration, queue?) to diagnose the slow Spark performance that I think is in the proc...
We found out we were regenerating the symlink manifest for all the partitions in this case. And for some reason it was executed twice, at the start and the end of the job:
delta_table.generate('symlink_format_manifest')
We configured the table with:
ALTER TABLE ...
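If the truncated ALTER TABLE above is enabling automatic manifest generation, the usual Delta Lake table property for that is the one below (a sketch with a made-up table name). With this property set, Delta regenerates the manifest on every write, so an explicit generate() call in the job becomes redundant:

```sql
-- Assumed table name; with this property set, Delta Lake rewrites the
-- symlink manifest automatically after each write to the table.
ALTER TABLE my_db.my_table
SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true);
```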
Hi folks, I have an issue. It's not critical, but it's annoying. We have implemented a Spark Structured Streaming application. This application is triggered via Azure Data Factory (every 8 minutes). Ok, this setup sounds a little bit weird and it's no...
@Markus Freischlad​ Looks like the Spark driver was stuck. It would be good to capture a thread dump of the Spark driver to understand which operation is stuck.
Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:
# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...
It is hard to analyze without the Spark UI and more detailed information, but anyway, a few tips: look for data skew; some partitions can be very big and some small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit...
Hi everybody, Trigger.AvailableNow was released with the Databricks 10.1 runtime, and we would like to use this new feature with Auto Loader. We write all our data pipelines in Scala and our projects import Spark as a provided dependency. If we try to sw...
Hi, I have seen it written in the documentation that a standard cluster is recommended for a single user. But why? What is meant by that? A colleague and I were testing it on the same notebook, and both of us can use the same standard all purpo...
A high-concurrency cluster just splits resources between users more evenly. So when 4 people run notebooks at the same time on a cluster with 4 CPUs, you can imagine that each will get 1 CPU. In a standard cluster, 1 person could utilize all worker CPUs, as you...
What I am doing:
spark_df = spark.createDataFrame(dfnew)
spark_df.write.saveAsTable("default.test_table", index=False, header=True)
This automatically detects the datatypes and is working right now. BUT, what if the datatype cannot be detected or detect...
Just create the table earlier and set the column types (CREATE TABLE ... LOCATION (path)). In the dataframe you need to have corresponding data types, which you can produce using cast syntax. Your syntax is just incorrect; here is an example of the correct syntax: from p...
I'm getting the attached error when accessing Delta Lake tables in the Databricks workspace. Summary of the error: Could not connect to md1n4trqmokgnhr.csnrqwqko4ho.ap-southeast-1.rds.amazonaws.com:3306 : Connection reset. Detailed error attached.
Hi, I've been encountering the following error when I try to start a cluster, but the status page says everything is fine. Is something happening, or are there other steps I can try?
Time: 2022-03-13 14:40:51 EDT
Message: Cluster terminated. Reason: Unexpected...
"Arbitrary files in Databricks Repos", allowing not just notebooks to be added to repos, is in Public Preview. I've tried to activate it following the instructions in the above link but the option doesn't appear in Admin Console. Minimum requirements...
Hi @Tom Turner​, An admin can enable this feature as follows:
1. Go to the Admin Console.
2. Click the Workspace Settings tab.
3. In the Repos section, click the Files in Repos toggle.
After the feature has been enabled, you must restart your cluster and refresh...
Can an MLflow registered model automatically infer the online feature store table, if that model is trained and logged via a Databricks Feature Store table and the table is pushed to an online feature store (like AWS RDS)?
Hi @Saurabh Verma​ , Feature Store <> SageMaker integration is not fully rolled out yet. We are looking to roll that out in Private Preview mode soon. It will need DynamoDB online store type which will be available soon.