Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi all, I've recently checked my bucket size on AWS and saw that its size doesn't make much sense. I decided to vacuum each Delta table with 2 weeks of retention. That shrunk the data from 30TB to around 5TB, though I was wondering, shouldn't default...
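For reference, the 2-week retention described above maps directly onto Delta's `VACUUM ... RETAIN n HOURS` clause. A minimal Python sketch that just builds the SQL string (the table name `events` is hypothetical; in a notebook you would pass the string to `spark.sql(...)`):

```python
# Sketch: express a 2-week retention as the RETAIN ... HOURS clause of Delta's VACUUM.
# "events" is a hypothetical table name.
retention_hours = 14 * 24  # 2 weeks = 336 hours
vacuum_sql = f"VACUUM events RETAIN {retention_hours} HOURS"
print(vacuum_sql)  # VACUUM events RETAIN 336 HOURS
```

Note that Delta's default retention threshold is 7 days (168 hours), and VACUUM does not run by itself on classic Delta tables, which is why unreferenced files can accumulate until someone runs it explicitly.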
Hello! I want to ask a question, please! Referring to Spot VMs with the "Cost Optimized" setting: in the case of an X-Small endpoint, which has 2 workers, if I send 10 simultaneous queries and a worker is evicted, can I get an error in any of these querie...
Hello Team, I am quite new to Databricks and I am learning PySpark and Databricks. I am trying to mount an ADLS Gen2 in Databricks. As part of that, I created an app registration, added the Data Lake into the app registration permissions, created a secret, and also adde...
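For context, mounting ADLS Gen2 with a service principal hinges on a handful of OAuth settings passed to `dbutils.fs.mount`. A hedged sketch of that config (all `<...>` values are placeholders, not real values; the service principal also needs a data-access role such as Storage Blob Data Contributor on the storage account, not just API permissions on the app registration):

```python
# Hedged sketch of the OAuth extra_configs that dbutils.fs.mount expects for ADLS Gen2.
# Every <...> value is a placeholder; the secret would normally come from
# dbutils.secrets.get(scope, key) rather than being written inline.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<app-id>",
    "fs.azure.account.oauth2.client.secret": "<secret-from-key-vault>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# In a Databricks notebook this dict would then be used as:
# dbutils.fs.mount(
#     source="abfss://<container>@<account>.dfs.core.windows.net/",
#     mount_point="/mnt/datalake",
#     extra_configs=configs)
print(len(configs))  # 5
```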
Delta time travel - recovering an unconditional delete. Recovery is a great feature of Delta. Let's check with a real example how the recovery option works. Please watch my new YouTube video about that topic: https://www.youtube.com/watch?v=TrUT6pvFKic
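For those who prefer text to video, the core of recovering a delete with Delta time travel is two SQL statements. A hedged sketch (the table name `events` and version `5` are hypothetical; run the strings via `spark.sql(...)` in a notebook, after finding the pre-delete version with `DESCRIBE HISTORY`):

```python
# Sketch: the two statements at the heart of "undeleting" a Delta table via time travel.
# "events" and version 5 are hypothetical placeholders.
inspect_sql = "SELECT * FROM events VERSION AS OF 5"     # read the pre-delete snapshot
restore_sql = "RESTORE TABLE events TO VERSION AS OF 5"  # roll the table back to it
print(inspect_sql)
print(restore_sql)
```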
My job started failing with the below error when inserting rows (timestamp) into a Delta table; it was working well before. java.lang.ArithmeticException: Casting XXXXXXXXXXX to int cau...
This is because the Integer type represents 4-byte signed integers, with a range from -2147483648 to 2147483647. Kindly use double as the data type to insert the "2147483648" value into the Delta table. In the below example, the second ...
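A quick plain-Python sketch of the bounds check behind this answer; values past these bounds need a wider type (such as `double` as suggested above, or `long`/`bigint`):

```python
# IntegerType in Spark is a 4-byte signed integer; these are its exact bounds.
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # -2147483648 .. 2147483647

def fits_int32(value: int) -> bool:
    """Return True if value can be stored in a 4-byte signed integer."""
    return INT_MIN <= value <= INT_MAX

print(fits_int32(2147483647))  # True: the maximum representable value
print(fits_int32(2147483648))  # False: one past the maximum, the overflow from the error above
```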
I have configured a Delta Lake Sink connector which reads from an AVRO topic and writes to the Delta lake. I have followed the docs and my config looks like below. { "name": "dev_test_delta_connector", "config": { "topics": "dl_test_avro", "inp...
@Hubert Dudek, should I be configuring anything with respect to schema in the connector config? Because I did successfully stage some data from another topic of a different format (JSON_SR) into a Delta lake table, but it's with the AVRO topic that I ge...
I am trying to use the Databricks Delta Lake Sink Connector (Confluent Cloud) and write to S3. The connector starts up with the following error. Any help on this would be appreciated: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLExcepti...
All SQL endpoints have the Delta cache enabled out of the box (in fact, 2X-Small etc. are E8/E16 etc. instances, which are Delta cache enabled). The Delta cache is managed dynamically, so data stays cached as long as there is free RAM for it.
I have a Java program like this to test out the Databricks JDBC connection with the Databricks JDBC driver.
Connection connection = null;
try {
    Class.forName(driver);
    connection = DriverManager.getConnection(url...
Hi @Jose Gonzalez, this similar JDBC issue in Snowflake is a good reference. I was able to get this to work on Java OpenJDK 17 by specifying this JVM option: --add-opens=java.base/java.nio=ALL-UNNAMED. Although I came across another issue with...
I have a Delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this:
def my_udf(data):
    pass  # placeholder for the real transformation
udf_func = udf(my_udf, StringType())
data...
A plain Python UDF like that is evaluated row by row, serializing every value between the JVM and the Python worker, so better not to use it for such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
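As a sketch of the vectorized style those docs describe: the function receives and returns a whole pandas Series per batch instead of one value per call. The upper-casing transform here is a made-up stand-in for the real logic:

```python
import pandas as pd

# A pandas UDF body operates on a whole pandas Series per batch, not row by row.
# The transform itself (upper-casing) is a hypothetical stand-in for the real logic.
def my_batch_udf(s: pd.Series) -> pd.Series:
    return s.str.upper()

# On Databricks this function would be registered roughly as:
#   from pyspark.sql.functions import pandas_udf
#   udf_func = pandas_udf(my_batch_udf, "string")
print(my_batch_udf(pd.Series(["delta", "lake"])).tolist())  # ['DELTA', 'LAKE']
```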
Community, I'm running a sparklyr "group_by" function and the function returns the following info:
# group by event_type
acled_grp_tbl <- acled_tbl %>% group_by("event_type") %>% summary(count = n())
Length Cl...
I should have deleted the post. While you are correct that "event_type" should be without quotes, the problem was the summary function: I was using the wrong function; it should have been "summarize."
Currently I am learning how to use databricks-connect to develop Scala code locally using an IDE (VS Code). The set-up of databricks-connect as described here https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was succes...
Is there a way to compare a timestamp within one field/column for an individual ID? For example, if I have two records for an ID and the timestamps are within 5 min of each other... I just want to keep the latest. But, for example, if they were an h...
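The rule being asked for can be sketched in plain Python before translating it to SQL (the 5-minute threshold comes from the question; the `(id, timestamp)` records are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical (id, timestamp) records illustrating the question's rule:
# per id, if two records fall within 5 minutes of each other, keep only the latest.
records = [
    ("A", datetime(2022, 1, 1, 10, 0)),
    ("A", datetime(2022, 1, 1, 10, 3)),  # 3 min after the previous A -> supersedes it
    ("B", datetime(2022, 1, 1, 10, 0)),
    ("B", datetime(2022, 1, 1, 11, 0)),  # an hour apart -> both B records kept
]

def keep_latest_within(records, window=timedelta(minutes=5)):
    """Collapse per-id records within `window` of each other, keeping the latest."""
    kept = []
    for rid, ts in sorted(records):  # sort by id, then timestamp
        if kept and kept[-1][0] == rid and ts - kept[-1][1] <= window:
            kept[-1] = (rid, ts)     # later record replaces the earlier near-duplicate
        else:
            kept.append((rid, ts))
    return kept

print(len(keep_latest_within(records)))  # 3
```

In Spark SQL the same comparison is usually done with a window function, e.g. `LAG(ts) OVER (PARTITION BY id ORDER BY ts)` and a filter on the time difference.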
Since you are trying to do this in SQL, I hope someone else can write you the correct answer; the above example is for PySpark. You can check the SQL syntax in the Databricks documentation.