Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tz1
by New Contributor III
  • 21747 Views
  • 7 replies
  • 6 kudos

Resolved! Problem with Databricks JDBC connection: Error occured while deserializing arrow data

I have a Java program like this to test out the Databricks JDBC connection with the Databricks JDBC driver. Connection connection = null; try { Class.forName(driver); connection = DriverManager.getConnection(url...

Latest Reply
Alice__Caterpil
New Contributor III
  • 6 kudos

Hi @Jose Gonzalez, this similar issue with the Snowflake JDBC driver is a good reference. I was able to get this to work on Java OpenJDK 17 by specifying this JVM option: --add-opens=java.base/java.nio=ALL-UNNAMED. Although I came across another issue with...

Constantine
by Contributor III
  • 2882 Views
  • 1 reply
  • 4 kudos

Resolved! How to process a large delta table with a UDF?

I have a Delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this:

    def my_udf(data): return pass
    udf_func = udf(my_udf, StringType())
    data...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

That UDF code runs row by row through Python serialization, so better not to use it for such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
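
A minimal sketch of the vectorized pandas UDF the reply points to, following the linked docs; the function body and the column names here are hypothetical stand-ins for the poster's logic:

    import pandas as pd
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import StringType

    @pandas_udf(StringType())
    def my_vectorized_udf(data: pd.Series) -> pd.Series:
        # Processes a whole Arrow batch per call instead of one row at a time.
        return data.astype(str)  # hypothetical transformation

    data = data.withColumn("new_column", my_vectorized_udf("source_column"))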

Jeff1
by Contributor II
  • 2513 Views
  • 3 replies
  • 1 kudos

Resolved! Strange object returned using sparklyr

Community, I'm running a sparklyr "group_by" function and the function returns the following info:

    # group by event_type
    acled_grp_tbl <- acled_tbl %>% group_by("event_type") %>% summary(count = n())
                       Length Cl...

Latest Reply
Jeff1
Contributor II
  • 1 kudos

I should have deleted the post. While you are correct that "event_type" should be without quotes, the problem was the summary function: I was using the wrong function; it should have been "summarize".
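
For readers on the Python side, a hedged PySpark equivalent of the corrected pipeline (group by event_type and count rows); the DataFrame name is hypothetical:

    from pyspark.sql import functions as F

    # Same aggregation as group_by(event_type) %>% summarize(count = n())
    acled_grp_df = acled_df.groupBy("event_type").agg(F.count("*").alias("count"))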

trendtoreview
by New Contributor
  • 1091 Views
  • 1 reply
  • 0 kudos

We all have been in the situation at some time where we wonder how to stop liking someone. There could be any reason behind this situation and might b...

We all have been in the situation at some time where we wonder how to stop liking someone. There could be any reason behind this situation and might be any person: your crush, love, friend, relatives, colleague, or any celebrity. Liking is the strong...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

@Kaniz Fatma @Vartika SPAM

Databach
by New Contributor
  • 4579 Views
  • 0 replies
  • 0 kudos

How to resolve "java.lang.ClassNotFoundException: com.databricks.spark.util.RegexBasedAWSSecretKeyRedactor" when running a Scala Spark project using databricks-connect?

Currently I am learning how to use databricks-connect to develop Scala code using an IDE (VS Code) locally. The set-up of databricks-connect as described here https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was succes...

CBull
by New Contributor III
  • 2969 Views
  • 3 replies
  • 2 kudos

Is there a way in Azure to compare data in one field?

Is there a way to compare a time stamp within one field/column for an individual ID? For example, if I have two records for an ID and the time stamps are within 5 min of each other... I just want to keep the latest. But, for example, if they were an h...

Latest Reply
merca
Valued Contributor II
  • 2 kudos

Since you are trying to do this in SQL, I hope someone else can write you the correct answer. The above example is for PySpark. You can check the SQL syntax in the Databricks documentation.
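
A hedged PySpark sketch of the logic discussed in the thread: for each ID, keep a record only if it is the latest or if the next-later record is more than 5 minutes away. DataFrame and column names are hypothetical:

    from pyspark.sql import Window, functions as F

    w = Window.partitionBy("id").orderBy(F.col("ts").desc())
    deduped = (df
               .withColumn("next_ts", F.lag("ts").over(w))  # nearest later timestamp per ID
               .where(F.col("next_ts").isNull()
                      | ((F.col("next_ts").cast("long") - F.col("ts").cast("long")) > 300))
               .drop("next_ts"))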

CBull
by New Contributor III
  • 6458 Views
  • 6 replies
  • 3 kudos

Resolved! Spark Notebook to import data into Excel

Is there a way to create a notebook that takes the SQL I want to put into the notebook, populates Excel daily, and sends it to a particular person?

Latest Reply
merca
Valued Contributor II
  • 3 kudos

Do I understand you correctly: you want to run a notebook or SQL query that will generate some data in the form of a table, and you need to somehow "send" this data to someone (or somebody needs this data at some point)? If this is the correct assumption, you hav...
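
A hedged sketch of the first half of that flow, materializing a SQL result as an Excel file from a notebook; the query and output path are hypothetical, and distributing the file (email, shared storage) would be a separate step:

    # Requires the openpyxl package for .xlsx output.
    result_pdf = spark.sql("SELECT * FROM my_db.daily_report").toPandas()
    result_pdf.to_excel("/dbfs/FileStore/reports/daily_report.xlsx", index=False)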

hrushi2000
by New Contributor
  • 1093 Views
  • 1 reply
  • 0 kudos

Machine learning is sanctionative computers to tackle tasks that have, until now, completely been administered by folks.From driving cars to translati...

Machine learning is sanctionative computers to tackle tasks that have, until now, completely been administered by folks.From driving cars to translating speech, machine learning is driving accolade explosion among the capabilities of computing – serv...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

@Kaniz Fatma @Vartika SPAM

ayzm
by New Contributor
  • 2070 Views
  • 0 replies
  • 0 kudos

[Databricks Connect] Cannot cross line reference when using lambda expression through db-connect

Hi, run the below code line at spark-shell through db-connect. It throws an exception: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance o...

Hubert-Dudek
by Esteemed Contributor III
  • 2902 Views
  • 2 replies
  • 22 kudos

Software 2.0 is one of the 10 most important trends that will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy...

Software 2.0 is one of the 10 most important trends that will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy. He wrote that neural networks are not just another classifier; they represent the beginning of a fu...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 22 kudos

https://www.youtube.com/watch?v=P5CBHuaC2x8

alejandrofm
by Valued Contributor
  • 21269 Views
  • 11 replies
  • 1 kudos

Resolved! How can I view the query history, duration, etc. for all users

Hi! I have some jobs that stay idle for some time when getting data from an S3 mount on DBFS. These are all SQL queries on Delta. How can I know where the bottleneck is (duration, queue) to diagnose the slow Spark performance that I think is on the proc...

Latest Reply
alejandrofm
Valued Contributor
  • 1 kudos

We found out we were regenerating the symlink manifest for all the partitions in this case, and for some reason it was executed twice, at the start and end of the job: delta_table.generate('symlink_format_manifest'). We configured the table with: ALTER TABLE ...
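
The reply is truncated, but the documented Delta pattern it describes looks like this hedged sketch (table name is hypothetical): generate the manifest once, then enable automatic manifest updates so explicit generate() calls become unnecessary:

    from delta.tables import DeltaTable

    delta_table = DeltaTable.forName(spark, "my_db.my_table")  # hypothetical table
    delta_table.generate("symlink_format_manifest")  # the one-off call quoted above

    # Documented table property that keeps the manifest updated automatically:
    spark.sql("""
        ALTER TABLE my_db.my_table
        SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true)
    """)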

Dusko
by New Contributor III
  • 4699 Views
  • 6 replies
  • 1 kudos

How to access root mountPoint without "Access Denied"?

Hi, I'm trying to read a file from the S3 root bucket. I can ls all the files, but I can't read them because of access denied. When I mount the same S3 root bucket under some other mountPoint, I can touch and read all the files. I also see that this new mount...

Latest Reply
Dusko
New Contributor III
  • 1 kudos

Hi @Atanu Sarkar, @Piper Wilson, thanks for the replies. Well, I don't understand the point about ownership. I believe that the root bucket is still under my ownership (I created it and I could upload/delete any files through the browser without any problem...

fsm
by New Contributor II
  • 9406 Views
  • 4 replies
  • 2 kudos

Resolved! Implementation of a stable Spark Structured Streaming Application

Hi folks, I have an issue. It's not critical, but it's annoying. We have implemented a Spark Structured Streaming application. This application is triggered via Azure Data Factory (every 8 minutes). OK, this setup sounds a little bit weird and it's no...

Latest Reply
brickster_2018
Databricks Employee
  • 2 kudos

@Markus Freischlad Looks like the Spark driver was stuck. It would be good to capture a thread dump of the Spark driver to understand what operation is stuck.

admo
by New Contributor III
  • 10091 Views
  • 4 replies
  • 7 kudos

Scaling issue for inference with a spark.mllib model

Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:

    # 1 - Load the base data (~1 billion lines of ~6 columns)
    interaction = build_initial_df()...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

It is hard to analyze without the Spark UI and more detailed information, but anyway, a few tips: look for data skew; some partitions can be very big and some small because of incorrect partitioning. You can use the Spark UI to spot that, but also debug your code a bit...
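
A quick, hedged way to check for the partition skew mentioned above; the DataFrame name comes from the post's outline:

    from pyspark.sql.functions import spark_partition_id

    (interaction
     .groupBy(spark_partition_id().alias("partition_id"))
     .count()
     .orderBy("count", ascending=False)
     .show(10))  # wildly uneven counts indicate skewed partitioning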

Mendes
by New Contributor
  • 3652 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Danilo Mendes, table schema is stored in the default Azure Databricks internal metastore, and you can also configure and use external metastores. Ingest data into Azure Databricks. Access data in Apache Spark formats and from external data sources...

