Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Bency (New Contributor III)
  • 6726 Views
  • 6 replies
  • 4 kudos

Resolved! Databricks Delta Lake Sink Connector

I am trying to use the Databricks Delta Lake Sink Connector (Confluent Cloud) to write to S3. The connector starts up with the following error; any help with this would be appreciated: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLExcepti...

Latest Reply
Bency
New Contributor III
  • 4 kudos

Hi @Kaniz Fatma, yes we did. It looks like it was indeed a whitelisting issue. Thanks @Hubert Dudek @Kaniz Fatma

5 More Replies
by Mike_Gardner (New Contributor II)
  • 2061 Views
  • 1 reply
  • 3 kudos

Resolved! Data Cache in Serverless SQL Endpoints vs Non-Serverless SQL Endpoints

Do Serverless SQL Endpoints benefit from the Delta and Spark caches? If so, does it differ from non-serverless endpoints? How long does the cache last?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

All SQL endpoints have the Delta cache enabled out of the box (in fact, the 2X-Small etc. sizes run on E8/E16-class instances, which are Delta cache enabled). The Delta cache is managed dynamically, so cached data stays as long as there is free RAM for it.

by tz1 (New Contributor III)
  • 19561 Views
  • 7 replies
  • 6 kudos

Resolved! Problem with Databricks JDBC connection: Error occured while deserializing arrow data

I have a Java program like this to test out the Databricks JDBC connection with the Databricks JDBC driver. Connection connection = null; try { Class.forName(driver); connection = DriverManager.getConnection(url...

Latest Reply
Alice__Caterpil
New Contributor III
  • 6 kudos

Hi @Jose Gonzalez, this similar issue with Snowflake's JDBC driver is a good reference. I was able to get this working on Java OpenJDK 17 by specifying this JVM option: --add-opens=java.base/java.nio=ALL-UNNAMED. Although I came across another issue with...
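For readers hitting the same Arrow deserialization error on newer JDKs, the fix above amounts to one flag at JVM launch. A minimal sketch; the jar path and main class below are placeholders, not from the thread:

```shell
# Java 16+ blocks reflective access to java.nio internals, which the
# Arrow-based result deserialization in the Databricks JDBC driver uses.
# Opening that package to unnamed modules at launch works around it:
java --add-opens=java.base/java.nio=ALL-UNNAMED \
     -cp DatabricksJDBC42.jar:myapp.jar \
     MyJdbcTest   # placeholder main class
```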

6 More Replies
by Constantine (Contributor III)
  • 2542 Views
  • 1 reply
  • 4 kudos

Resolved! How to process a large Delta table with a UDF?

I have a Delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this: def my_udf(data): return pass udf_func = udf(my_udf, StringType()) data...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

That plain Python UDF is evaluated row by row in the Python interpreter, so better not to use it for such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
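The difference is easiest to see in code. A minimal sketch of a vectorized function; the column names and the string-cleaning logic here are made up for illustration, and the registration shown in comments follows the linked docs:

```python
import pandas as pd

# A vectorized transformation: it receives a whole pandas Series per batch,
# instead of one Python call per row as with a plain udf().
def clean_strings(s: pd.Series) -> pd.Series:
    return s.str.strip().str.lower()

# In a Databricks notebook you would register it per the linked docs:
#   from pyspark.sql.functions import pandas_udf
#   clean_udf = pandas_udf(clean_strings, returnType="string")
#   df = df.withColumn("clean_col", clean_udf("raw_col"))

print(clean_strings(pd.Series(["  Foo", "BAR "])).tolist())  # → ['foo', 'bar']
```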

by Jeff1 (Contributor II)
  • 2076 Views
  • 3 replies
  • 1 kudos

Resolved! Strange object returned using sparklyr

Community, I'm running a sparklyr "group_by" function and it returns the following info: # group by event_type acled_grp_tbl <- acled_tbl %>% group_by("event_type") %>% summary(count = n()) Length Cl...

Latest Reply
Jeff1
Contributor II
  • 1 kudos

I should have deleted the post. While you are correct that "event_type" should be without quotes, the problem was the summary function. I was using the wrong function; it should have been summarize.

2 More Replies
by Databach (New Contributor)
  • 4001 Views
  • 0 replies
  • 0 kudos

How to resolve "java.lang.ClassNotFoundException: com.databricks.spark.util.RegexBasedAWSSecretKeyRedactor" when running a Scala Spark project using databricks-connect?

Currently I am learning how to use databricks-connect to develop Scala code locally in an IDE (VS Code). The set-up of databricks-connect as described here https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was succues...

Attachment: build.sbt
by CBull (New Contributor III)
  • 2538 Views
  • 3 replies
  • 2 kudos

Is there a way in Azure to compare data in one field?

Is there a way to compare a timestamp within one field/column for an individual ID? For example, if I have two records for an ID and the timestamps are within 5 minutes of each other, I just want to keep the latest. But, for example, if they were an h...

Latest Reply
merca
Valued Contributor II
  • 2 kudos

Since you are trying to do this in SQL, I hope someone else can write you the correct answer. The above example is for PySpark. You can check the SQL syntax in the Databricks documents.
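For anyone landing here, the dedup logic itself is small. A pandas sketch of the idea; the column names id/event_ts are assumptions, and in Spark SQL the same gap can be computed with LEAD() over a window partitioned by the ID:

```python
import pandas as pd

# Hypothetical columns "id" and "event_ts" (not from the thread).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "event_ts": pd.to_datetime([
        "2022-01-01 10:00", "2022-01-01 10:03",  # 3 min apart: keep only the latest
        "2022-01-01 09:00", "2022-01-01 10:30",  # 90 min apart: keep both
    ]),
})

df = df.sort_values(["id", "event_ts"])
# Gap to the next record of the same ID; a record is dropped when the next
# one for that ID arrives within 5 minutes (the later record supersedes it).
gap_to_next = df.groupby("id")["event_ts"].shift(-1) - df["event_ts"]
result = df[gap_to_next.isna() | (gap_to_next > pd.Timedelta(minutes=5))]
print(result)  # 3 rows: (1, 10:03), (2, 09:00), (2, 10:30)
```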

2 More Replies
by CBull (New Contributor III)
  • 5252 Views
  • 6 replies
  • 3 kudos

Resolved! Spark Notebook to import data into Excel

Is there a way to create a notebook that will take the SQL that I want to put into it, populate Excel daily, and send the result to a particular person?

Latest Reply
merca
Valued Contributor II
  • 3 kudos

Do I understand you correctly: you want to run a notebook or SQL query that will generate some data in the form of a table, and you need to "send" this data to someone (or somebody needs this data at some point)? If this is a correct assumption, you hav...
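One lightweight pattern for the "generate a table and send it" half: convert the query result to pandas and write a file Excel can open. A sketch; the query result is faked here, and on Databricks you would start from spark.sql(...).toPandas(), with delivery handled by an email step or a scheduled job:

```python
import io
import pandas as pd

# Stand-in for the query result; in a notebook this would be
# spark.sql("SELECT ...").toPandas().
df = pd.DataFrame({"region": ["east", "west"], "sales": [100, 250]})

# CSV opens directly in Excel; swap in df.to_excel("report.xlsx") if the
# openpyxl package is available on the cluster.
buf = io.StringIO()
df.to_csv(buf, index=False)
report = buf.getvalue()
print(report)
```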

5 More Replies
by ayzm (New Contributor)
  • 1646 Views
  • 0 replies
  • 0 kudos

[Databricks Connect] Cannot cross line reference when using lambda expression through db-connect

Hi, running the below code line at spark-shell through db-connect throws this exception: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance o...

Attachment: image.png
by Hubert-Dudek (Esteemed Contributor III)
  • 1947 Views
  • 2 replies
  • 22 kudos

Software 2.0 is one of the 10 most important trends that will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy...

Software 2.0 is one of the 10 most important trends that will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy. He wrote that neural networks are not just another classifier; they represent the beginning of a fu...

Software 2.0
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 22 kudos

https://www.youtube.com/watch?v=P5CBHuaC2x8

1 More Replies
by alejandrofm (Valued Contributor)
  • 18171 Views
  • 11 replies
  • 1 kudos

Resolved! How can I view the query history, duration, etc. for all users?

Hi! I have some jobs that stay idle for some time when getting data from an S3 mount on DBFS. These are all SQL queries on Delta. How can I see the bottleneck, duration, and queue, to diagnose the slow Spark performance that I think is in the proc...

Latest Reply
alejandrofm
Valued Contributor
  • 1 kudos

We found out we were regenerating the symlink manifest for all the partitions in this case, and for some reason it was executed twice, at the start and end of the job: delta_table.generate('symlink_format_manifest'). We configured the table with: ALTER TABLE ...
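For context, the Delta documentation describes a table property that makes manifest generation automatic on every write, which removes the need for the explicit generate() call. A sketch with a placeholder table name:

```sql
-- Regenerate the symlink manifest automatically whenever the table is
-- updated, instead of calling generate('symlink_format_manifest') manually:
ALTER TABLE my_db.my_table  -- placeholder table name
SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true);
```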

10 More Replies
by Dusko (New Contributor III)
  • 3948 Views
  • 6 replies
  • 1 kudos

How to access root mountPoint without "Access Denied"?

Hi, I’m trying to read a file from the S3 root bucket. I can ls all the files but I can’t read them because of access denied. When I mount the same S3 root bucket under some other mountPoint, I can touch and read all the files. I also see that this new mount...

Latest Reply
Dusko
New Contributor III
  • 1 kudos

Hi @Atanu Sarkar, @Piper Wilson, thanks for the replies. Well, I don't understand the point about ownership. I believe that the root bucket is still under my ownership (I created it and I could upload/delete any files through the browser without any problem...

5 More Replies
by fsm (New Contributor II)
  • 7522 Views
  • 4 replies
  • 2 kudos

Resolved! Implementation of a stable Spark Structured Streaming Application

Hi folks, I have an issue. It's not critical, but it's annoying. We have implemented a Spark Structured Streaming application. This application is triggered via Azure Data Factory (every 8 minutes). OK, this setup sounds a little bit weird and it's no...

Latest Reply
brickster_2018
Databricks Employee
  • 2 kudos

@Markus Freischlad Looks like the Spark driver was stuck. It would be good to capture a thread dump of the Spark driver to understand which operation is stuck.
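A driver thread dump can be taken from the Spark UI (the Executors tab has a Thread Dump link for the driver) or, with shell access to the driver node, with the JDK's jstack. A sketch under the assumption that the driver JVM is visible to jps; the grep filter is a guess and should be adjusted to the actual process name:

```shell
# List JVMs, pick the Spark driver, and dump all thread stacks to a file.
DRIVER_PID=$(jps -l | grep -i driver | awk '{print $1}')
jstack "$DRIVER_PID" > driver_thread_dump.txt
# Take several dumps a few seconds apart; threads stuck in the same stack
# across dumps mark the blocked operation.
```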

3 More Replies
