Data Engineering

Forum Posts

gbrueckl
by Contributor II
  • 6734 Views
  • 16 replies
  • 3 kudos

Resolved! Setup Git Integration via REST API

We are currently setting up CI/CD for our Databricks workspace using Databricks Repos, following the approach described in the official docs: https://docs.databricks.com/repos.html#best-practices-for-integrating-databricks-repos-with-cicd-workflows Obvi...

Latest Reply
New1
New Contributor II
  • 3 kudos

Hi, how can I trigger a job externally using GitHub Actions?

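The follow-up question about triggering a job from GitHub Actions is typically solved with the Jobs REST API `run-now` endpoint. A minimal sketch, assuming a personal access token stored as a secret; the workspace URL, token, and job ID below are placeholders, not values from this thread:

```python
import json
import urllib.request

def build_run_now_request(host, token, job_id, notebook_params=None):
    """Build an HTTP request for the Jobs API 2.1 run-now endpoint."""
    payload = {"job_id": job_id}
    if notebook_params:
        payload["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# In a GitHub Actions step, host/token/job_id would come from repository secrets.
req = build_run_now_request("https://example.cloud.databricks.com", "dapiXXXX", 123)
# urllib.request.urlopen(req) would then submit the run (not executed here).
```

In a workflow file this would run inside a `python` step, with the secrets exposed as environment variables.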
15 More Replies
ayzm
by New Contributor
  • 899 Views
  • 0 replies
  • 0 kudos

[Databricks Connect] Cannot cross line reference when using lambda expression through db-connect

Hi, running the below code line at spark-shell through db-connect throws an exception: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance o...

Hubert-Dudek
by Esteemed Contributor III
  • 952 Views
  • 3 replies
  • 22 kudos

Software 2.0 is one of the 10 most important trends that will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy...

Software 2.0 is one of the 10 most important trends that will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy. He wrote that neural networks are not just another classifier; they represent the beginning of a fu...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 22 kudos

https://www.youtube.com/watch?v=P5CBHuaC2x8

2 More Replies
alejandrofm
by Valued Contributor
  • 5063 Views
  • 11 replies
  • 1 kudos

Resolved! How can I view the query history, duration, etc for all users

Hi! I have some jobs that stay idle for some time when getting data from an S3 mount on DBFS; these are all SQL queries on Delta. How can I see the bottleneck, duration, and queue, to diagnose the slow Spark performance that I think is on the proc...

Latest Reply
alejandrofm
Valued Contributor
  • 1 kudos

We found out we were regenerating the symlink manifest for all the partitions in this case, and for some reason it was executed twice, at the start and end of the job: delta_table.generate('symlink_format_manifest'). We configured the table with: ALTER TABLE ...
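The fix described above can also be done declaratively: Delta can regenerate the symlink manifest automatically on every write via a table property, removing the need for explicit generate() calls. A minimal sketch; the table name is a placeholder, and on Databricks the statement would be run through spark.sql:

```python
def symlink_manifest_ddl(table_name: str, enabled: bool = True) -> str:
    """Build the ALTER TABLE statement that makes Delta regenerate the
    symlink manifest automatically on every table write, so no explicit
    delta_table.generate('symlink_format_manifest') call is needed."""
    value = "true" if enabled else "false"
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        f"delta.compatibility.symlinkFormatManifest.enabled = {value})"
    )

# On a cluster this would be executed with spark.sql(ddl):
ddl = symlink_manifest_ddl("default.events")
```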

10 More Replies
fsm
by New Contributor II
  • 4566 Views
  • 5 replies
  • 2 kudos

Resolved! Implementation of a stable Spark Structured Streaming Application

Hi folks, I have an issue. It's not critical, but it's annoying. We have implemented a Spark Structured Streaming application. This application is triggered via Azure Data Factory (every 8 minutes). OK, this setup sounds a little bit weird, and it's no...

Latest Reply
User16869510359
Esteemed Contributor
  • 2 kudos

@Markus Freischlad​ Looks like the Spark driver was stuck. It would be good to capture a thread dump of the Spark driver to understand which operation is stuck.
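Capturing a driver thread dump, as suggested above, can be done with the JDK's jstack tool against the driver JVM. A hedged sketch; the pid is a placeholder, and on a real driver node you would look it up first, e.g. with jps:

```python
def jstack_command(pid: int, output_path: str):
    """Build the command line that captures a JVM thread dump with jstack.

    The Spark driver is a JVM process; its pid would normally be found
    with `jps` or `ps` on the driver node (4242 below is a placeholder).
    """
    return ["jstack", "-l", str(pid)], output_path

cmd, out = jstack_command(4242, "/tmp/driver_threaddump.txt")
# subprocess.run(cmd, stdout=open(out, "w")) would write the dump (not run here).
```

The same dump is also visible in the Spark UI under the Executors tab ("Thread Dump" for the driver), which avoids shell access entirely.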

4 More Replies
admo
by New Contributor III
  • 1646 Views
  • 5 replies
  • 7 kudos

Scaling issue for inference with a spark.mllib model

Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job: # 1 - Load the base data (~1 billion lines of ~6 columns) interaction = build_initial_df()...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

It is hard to analyze without the Spark UI and more detailed information, but anyway, a few tips: look for data skew; some partitions can be very big and some small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit...

4 More Replies
emanuele_maffeo
by New Contributor III
  • 1699 Views
  • 6 replies
  • 8 kudos

Resolved! Trigger.AvailableNow on scala - compile issue

Hi everybody, Trigger.AvailableNow was released with the Databricks 10.1 runtime, and we would like to use this new feature with Auto Loader. We write all our data pipelines in Scala, and our projects import Spark as a provided dependency. If we try to sw...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

You can switch to Python. Depending on what you're doing, and whether you're using UDFs, there shouldn't be any difference at all in terms of performance.

5 More Replies
Soma
by Valued Contributor
  • 1917 Views
  • 4 replies
  • 6 kudos

Resolved! Enable custom IPython extension

How can I enable a custom IPython extension on Databricks notebook start?
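For reference, a custom IPython extension is normally loaded at shell start through the c.InteractiveShellApp.extensions config trait. A sketch of generating that config line; the extension name and profile path are placeholders, and Databricks' managed notebook environment may override this:

```python
def ipython_extension_config(extension_name: str) -> str:
    """Contents of an ipython_config.py line that loads a custom extension
    when the shell starts (the extension module must be on sys.path)."""
    return f"c.InteractiveShellApp.extensions = ['{extension_name}']\n"

cfg = ipython_extension_config("my_custom_ext")
# Appending cfg to ~/.ipython/profile_default/ipython_config.py would
# install it for the default profile (not executed here).
```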

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @somanath Sankaran​, did you try changing the kernel to IPython as suggested by @Hubert Dudek​?

3 More Replies
Tahseen0354
by Contributor III
  • 1263 Views
  • 4 replies
  • 2 kudos

Resolved! A Standard cluster is recommended for a single user - what is meant by that ?

Hi, I have seen it written in the documentation that a Standard cluster is recommended for a single user. But why? What is meant by that? One of my colleagues and I were testing it on the same notebook. Both of us can use the same standard all-purpo...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

A High Concurrency cluster just splits resources between users more evenly. So when 4 people run notebooks at the same time on a cluster with 4 CPUs, you can imagine that each will get 1 CPU. On a Standard cluster, 1 person could utilize all worker CPUs, as you...

3 More Replies
Raie
by New Contributor III
  • 3576 Views
  • 3 replies
  • 4 kudos

Resolved! How do I specify column's data type with spark dataframes?

What I am doing: spark_df = spark.createDataFrame(dfnew) then spark_df.write.saveAsTable("default.test_table", index=False, header=True). This automatically detects the datatypes and is working right now. BUT, what if the datatype cannot be detected or detect...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

Just create the table earlier and set the column types (CREATE TABLE ... LOCATION 'path'). In the dataframe you need to have corresponding data types, which you can produce using cast syntax; your syntax is just incorrect. Here is an example of correct syntax: from p...
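The advice above, creating the table with explicit column types before writing, can be sketched as a small DDL builder; the table name, columns, and location are illustrative placeholders, and on Databricks the result would be passed to spark.sql:

```python
def create_table_ddl(table, columns, location=None):
    """Build a CREATE TABLE statement with explicit column types,
    so saveAsTable / INSERT does not have to infer them."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({cols})"
    if location:
        ddl += f" USING DELTA LOCATION '{location}'"
    return ddl

ddl = create_table_ddl(
    "default.test_table",
    {"id": "INT", "name": "STRING", "amount": "DECIMAL(10,2)"},
)
# spark.sql(ddl) would create the table; dataframe columns can then be
# aligned with e.g. df.withColumn("id", col("id").cast("int")).
```

Alternatively, the schema can be given directly to createDataFrame via a StructType instead of letting Spark infer it.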

2 More Replies
Shehan92
by New Contributor II
  • 2091 Views
  • 3 replies
  • 4 kudos

Resolved! Error in accessing Delta Tables

I'm getting the attached error when accessing Delta Lake tables in the Databricks workspace. Summary of the error: Could not connect to md1n4trqmokgnhr.csnrqwqko4ho.ap-southeast-1.rds.amazonaws.com:3306 : Connection reset. Detailed error attached.

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @Shehan Madusanka​ , Are you still seeing the error or were you able to resolve it?

2 More Replies
rachelk05
by New Contributor II
  • 1173 Views
  • 2 replies
  • 4 kudos

Resolved! Databricks Community: Cluster Terminated Reason: Unexpected Launch Failure

Hi, I've been encountering the following error when I try to start a cluster, but the status page says everything is fine. Is something happening, or are there other steps I can try? Time: 2022-03-13 14:40:51 EDT. Message: Cluster terminated. Reason: Unexpected...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @Rachel Kelley​, were you able to resolve the error?

1 More Replies
tomsyouruncle
by New Contributor III
  • 9332 Views
  • 20 replies
  • 4 kudos

Resolved! How do I enable support for arbitrary files in Databricks Repos? Public Preview feature doesn't appear in admin console.

"Arbitrary files in Databricks Repos", allowing not just notebooks to be added to repos, is in Public Preview. I've tried to activate it following the instructions in the above link but the option doesn't appear in Admin Console. Minimum requirements...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @Tom Turner​, an admin can enable this feature as follows: go to the Admin Console, click the Workspace Settings tab, and in the Repos section click the Files in Repos toggle. After the feature has been enabled, you must restart your cluster and refresh...

19 More Replies
Maverick1
by Valued Contributor II
  • 2002 Views
  • 10 replies
  • 7 kudos

Resolved! How to infer the online feature store table via an mlflow registered model, which is deployed to a sagemaker endpoint?

Can an MLflow registered model automatically infer the online feature store table, if that model is trained and logged via a Databricks Feature Store table and the table is pushed to an online feature store (like AWS RDS)?

Latest Reply
Kaniz
Community Manager
  • 7 kudos

Hi @Saurabh Verma​, the Feature Store <> SageMaker integration is not fully rolled out yet. We are looking to roll it out in Private Preview mode soon. It will need the DynamoDB online store type, which will be available soon.

9 More Replies