Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bency
by New Contributor III
  • 3109 Views
  • 3 replies
  • 2 kudos

Invalid field schema option provided-DatabricksDeltaLakeSinkConnector

I have configured a Delta Lake Sink connector which reads from an AVRO topic and writes to Delta Lake. I have followed the docs and my config looks like the one below. { "name": "dev_test_delta_connector", "config": { "topics": "dl_test_avro", "inp...

Latest Reply
Bency
New Contributor III
  • 2 kudos

@Hubert Dudek, should I be configuring anything with respect to schema in the connector config? I did successfully stage some data from another topic of a different format (JSON_SR) into a Delta Lake table, but it's with the AVRO topic that I ge...

  • 2 kudos
2 More Replies
User16826992666
by Databricks Employee
  • 4025 Views
  • 2 replies
  • 1 kudos

Resolved! As an admin of a Databricks SQL environment, can I cancel long running queries?

I don't want one long or poorly written query to block my entire SQL endpoint for everyone else. Do I have the ability to kill specific queries?

Latest Reply
DevB
New Contributor II
  • 1 kudos

Is there a way to stop the session programmatically, like "kill session_id" or something similar in the API?

  • 1 kudos
1 More Replies
Bency
by New Contributor III
  • 9537 Views
  • 6 replies
  • 4 kudos

Resolved! Databricks Delta Lake Sink Connector

I am trying to use the Databricks Delta Lake Sink Connector (Confluent Cloud) and write to S3. The connector starts up with the following error. Any help on this would be appreciated. org.apache.kafka.connect.errors.ConnectException: java.sql.SQLExcepti...

Latest Reply
Bency
New Contributor III
  • 4 kudos

Hi @Kaniz Fatma, yes we did. It looks like it was indeed a whitelisting issue. Thanks @Hubert Dudek, @Kaniz Fatma

  • 4 kudos
5 More Replies
Mike_Gardner
by Databricks Partner
  • 2952 Views
  • 1 reply
  • 3 kudos

Resolved! Data Cache in Serverless SQL Endpoints vs Non-Serverless SQL Endpoints

Do Serverless SQL Endpoints benefit from Delta and Spark Cache? If so, does it differ from non-serverless endpoints? How long does the cache last?

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

All SQL endpoints have delta cache enabled out of the box (in fact, 2X-Small etc. run on E8/16 etc. instances, which are delta cache enabled). Delta cache is managed dynamically, so data stays cached as long as there is free RAM for it.

  • 3 kudos
tz1
by New Contributor III
  • 24806 Views
  • 7 replies
  • 6 kudos

Resolved! Problem with Databricks JDBC connection: Error occured while deserializing arrow data

I have a Java program like this to test out the Databricks JDBC connection with the Databricks JDBC driver. Connection connection = null; try { Class.forName(driver); connection = DriverManager.getConnection(url...

Latest Reply
Alice__Caterpil
New Contributor III
  • 6 kudos

Hi @Jose Gonzalez, this similar issue in the Snowflake JDBC driver is a good reference. I was able to get this to work in Java OpenJDK 17 by specifying this JVM option:
--add-opens=java.base/java.nio=ALL-UNNAMED
Although I came across another issue with...

  • 6 kudos
6 More Replies
Constantine
by Contributor III
  • 3445 Views
  • 1 reply
  • 4 kudos

Resolved! How to process a large delta table with UDF ?

I have a delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this: def my_udf(data): return pass udf_func = udf(my_udf, StringType()) data...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

A plain Python UDF like that is evaluated row by row, with data serialized between the JVM and the Python workers, so it is better not to use it for such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
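As a rough illustration of the vectorized approach suggested in this reply, the sketch below defines a batch function over a pandas Series and wraps it as a pandas UDF. The original `my_udf` body is not shown in the post, so `normalize_batch` and its trim-and-upper-case logic are purely hypothetical stand-ins, and the helper name `as_pandas_udf` is also an assumption made for this example.

```python
import pandas as pd


def normalize_batch(values: pd.Series) -> pd.Series:
    """Hypothetical transformation: trim whitespace and upper-case a whole batch."""
    # Vectorized string methods operate on the entire Series at once
    # instead of invoking Python once per row.
    return values.str.strip().str.upper()


def as_pandas_udf():
    """Wrap the batch function as a Spark pandas UDF (needs pyspark and pyarrow)."""
    # Imported lazily so the batch logic can be checked without a Spark installation.
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import StringType

    return pandas_udf(normalize_batch, returnType=StringType())


# Local check of the batch logic on a small sample.
sample = pd.Series(["  alpha ", "beta"])
normalized = normalize_batch(sample).tolist()
```

On a cluster, this could be applied with something like `df.withColumn("new_col", as_pandas_udf()("text"))`, where `new_col` and `text` are placeholder column names. Spark then hands the function whole Arrow batches rather than single rows.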

  • 4 kudos
Jeff1
by Contributor II
  • 3175 Views
  • 3 replies
  • 1 kudos

Resolved! Strange object returned using sparklyr

Community, I'm running a sparklyr "group_by" function and the function returns the following info:
# group by event_type
acled_grp_tbl <- acled_tbl %>% group_by("event_type") %>% summary(count = n())
                  Length Cl...

Latest Reply
Jeff1
Contributor II
  • 1 kudos

I should have deleted the post. While you are correct that "event_type" should be without quotes, the problem was the summary function. I was using the wrong function; it should have been "summarize".

  • 1 kudos
2 More Replies
trendtoreview
by New Contributor
  • 1431 Views
  • 1 reply
  • 0 kudos

We all have been in the situation at some time where we wonder how to stop liking someone. There could be any reason behind this situation and might b...

We all have been in the situation at some time where we wonder how to stop liking someone. There could be any reason behind this situation and might be any person: your crush, love, friend, relatives, colleague, or any celebrity. Liking is the strong...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

@[Kaniz Fatma]​ @[Vartika]​ SPAM

  • 0 kudos
Databach
by New Contributor
  • 5482 Views
  • 0 replies
  • 0 kudos

How to resolve "java.lang.ClassNotFoundException: com.databricks.spark.util.RegexBasedAWSSecretKeyRedactor" when running Scala Spark project using databricks-connect ?

Currently I am learning how to use databricks-connect to develop Scala code locally in an IDE (VS Code). The set-up of databricks-connect as described here https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was succues...

CBull
by New Contributor III
  • 3628 Views
  • 3 replies
  • 2 kudos

Is there a way in Azure to compare data in one field?

Is there a way to compare timestamps within one field/column for an individual ID? For example, if I have two records for an ID and the timestamps are within 5 min of each other, I just want to keep the latest. But, for example, if they were an h...

Latest Reply
merca
Valued Contributor II
  • 2 kudos

Since you are trying to do this in SQL, I hope someone else can write you the correct answer. The above example is for PySpark. You can check the SQL syntax in the Databricks documentation.
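To make the rule in the question concrete, here is a minimal plain-Python sketch of "keep only the latest record when two timestamps for the same ID are within 5 minutes". The question is cut off, so keeping both records when they are further apart is an assumption; the function name `keep_latest_within_window` and the sample rows are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def keep_latest_within_window(records, window=timedelta(minutes=5)):
    """Drop a record when a later one for the same ID follows within `window`.

    `records` is an iterable of (record_id, timestamp) tuples.
    """
    by_id = defaultdict(list)
    for record_id, ts in records:
        by_id[record_id].append(ts)

    kept = []
    for record_id, stamps in by_id.items():
        stamps.sort()
        # Pair every timestamp with the next one for the same ID (None for the last).
        for current, following in zip(stamps, stamps[1:] + [None]):
            if following is not None and following - current <= window:
                continue  # a newer record is close enough, so skip this one
            kept.append((record_id, current))
    return kept


# Hypothetical sample data: two "A" records 3 minutes apart, one much later.
rows = [
    ("A", datetime(2022, 1, 1, 10, 0)),
    ("A", datetime(2022, 1, 1, 10, 3)),
    ("A", datetime(2022, 1, 1, 11, 30)),
    ("B", datetime(2022, 1, 1, 9, 0)),
]
result = keep_latest_within_window(rows)
```

On a Spark table the same comparison is usually expressed with a window function, for example `LEAD(ts) OVER (PARTITION BY id ORDER BY ts)` in SQL, then filtering out rows whose next timestamp is within 5 minutes. The column names `id` and `ts` are placeholders.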

  • 2 kudos
2 More Replies
CBull
by New Contributor III
  • 8482 Views
  • 6 replies
  • 3 kudos

Resolved! Spark Notebook to import data into Excel

Is there a way to create a notebook that will take the SQL that I want to put into the Notebook and populate Excel daily and send it to a particular person?

Latest Reply
merca
Valued Contributor II
  • 3 kudos

Do I understand you correctly: you want to run a notebook or SQL query that will generate some data in the form of a table, and you need to somehow "send" this data to someone (or somebody needs this data at some point)? If this is a correct assumption, you hav...

  • 3 kudos
5 More Replies
ayzm
by New Contributor
  • 2895 Views
  • 0 replies
  • 0 kudos

[Databricks Connect] Cannot cross line reference when using lambda expression through db-connect

Hi, I run the code line below in spark-shell through db-connect. It throws an exception: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance o...

Hubert-Dudek
by Databricks MVP
  • 4106 Views
  • 2 replies
  • 22 kudos

Software 2.0 is one of the 10 most important trends which will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy...

Software 2.0 is one of the 10 most important trends which will shape the next decade. The idea of Software 2.0 was first presented in 2017 by Andrej Karpathy. He wrote that neural networks are not just another classifier, they represent the beginning of a fu...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 22 kudos

https://www.youtube.com/watch?v=P5CBHuaC2x8

  • 22 kudos
1 More Replies
alejandrofm
by Valued Contributor
  • 27072 Views
  • 11 replies
  • 1 kudos

Resolved! How can I view the query history, duration, etc for all users

Hi! I have some jobs that stay idle for some time when getting data from an S3 mount on DBFS. These are all SQL queries on Delta. How can I find out where the bottleneck is (duration, queue) to diagnose the slow Spark performance that I think is on the proc...

Latest Reply
alejandrofm
Valued Contributor
  • 1 kudos

We found out we were regenerating the symlink manifest for all the partitions in this case. And for some reason it was executed twice, at the start and end of the job:
delta_table.generate('symlink_format_manifest')
We configured the table with:
ALTER TABLE ...

  • 1 kudos
10 More Replies
Dusko
by Databricks Partner
  • 6078 Views
  • 6 replies
  • 1 kudos

How to access root mountPoint without "Access Denied"?

Hi, I’m trying to read a file from the S3 root bucket. I can ls all the files but I can’t read them because of access denied. When I mount the same S3 root bucket under some other mountPoint, I can touch and read all the files. I also see that this new mount...

Latest Reply
Dusko
Databricks Partner
  • 1 kudos

Hi @Atanu Sarkar, @Piper Wilson, thanks for the replies. Well, I don't understand the point about ownership. I believe that the root bucket is still under my ownership (I created it and I could upload/delete any files through the browser without any problem...

  • 1 kudos
5 More Replies