Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

adhi_databricks (Contributor)
  • 833 Views • 1 reply • 0 kudos

DATABRICKS CLEANROOMS

Hi Team, I have a few questions regarding Databricks Cleanrooms: For onboarding first-party data, does the collaborator need a Databricks account with a UC-enabled workspace? How is it useful for activating data for retargeting or prospecting use cases...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

For onboarding first-party data, the collaborator does need a Databricks account with an enabled Unity Catalog (UC) workspace. This is necessary to map system tables into its metastore and to observe non-UC governed assets. Activating data for retarg...

sanket-kelkar (New Contributor II)
  • 1205 Views • 1 reply • 0 kudos

Auto OPTIMIZE causing a data discrepancy

I have a Delta table in Azure Databricks that gets MERGEd every 10 minutes. In the attached screenshot, in the version history of this table, I see a MERGE operation every 10 minutes, which is expected. Along with that, I see the OPTIMIZE operation aft...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

Can you please provide more context, specifically the DBR release and whether this scenario is reproducible? Are there any metric or plan differences between the two SELECT statements, one run while the OPTIMIZE was in progress and one after? Th...

AcrobaticMonkey (New Contributor II)
  • 1124 Views • 1 reply • 0 kudos

Cannot Get Query Results in SQL Alerts

Example query: SELECT name, date FROM errors; Now I want to trigger an alert if the count is greater than 1, and a notification should be sent to Slack with the output rows (name and date values). Even if I use {{QUERY_RESULT_ROWS}}, it only gives value after ...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

Note: I have not tried this myself, but can you try the following and let me know if it helps: Create the query, i.e. SELECT name, date FROM errors; Set up the alert, with the condition to trigger when the count of rows is greater than 1. Create ...
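One way to sketch the suggestion above (hedged: unverified; the window-function trick is an assumption, while the {{QUERY_RESULT_ROWS}} placeholder comes from the question itself):

```sql
-- Hypothetical alert query: returns the error rows plus a count column
-- that the alert condition can be evaluated against.
SELECT name, date, COUNT(*) OVER () AS error_count
FROM errors;
```

Set the alert condition on error_count > 1, then reference {{QUERY_RESULT_ROWS}} in a custom notification template so the Slack message includes the name and date values.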

jonathanjone (New Contributor)
  • 796 Views • 1 reply • 0 kudos

Facing Some Issues with Tablet PC and Databricks Product – Any Advice?

Hello everyone, I'm having some trouble using Databricks SQL Analytics v2.1 on my tablet PC, and I was wondering if anyone here has had similar experiences or could offer some advice. The main issues I'm facing are: Performance slowdowns: When I run com...

Latest Reply: NandiniN (Databricks Employee) • 0 kudos

Hi @jonathanjone, 1 - Performance slowdowns could be due to the warehouse size and the query count: a warehouse has a limit of 10 queries running in parallel, beyond which you will see queries being queued. You could also check if the q...

guangyi (Contributor III)
  • 1114 Views • 1 reply • 1 kudo

Resolved! Has the numUpdateRetryAttempts property been deprecated?

I noticed there is a numUpdateRetryAttempts property mentioned in the document https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/properties used for configuring the retry count of any DLT pipeline, but I cannot find it in the DL...

Latest Reply: VZLA (Databricks Employee) • 1 kudo

According to the Delta Live Tables properties reference, pipelines.numUpdateRetryAttempts is a recognized configuration parameter. It specifies the maximum number of attempts to retry an update before failing the update when a retryable failure occur...
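As a minimal sketch, the property from the reply above goes under the pipeline's configuration map (the pipeline name and retry value here are illustrative):

```json
{
  "name": "my-dlt-pipeline",
  "configuration": {
    "pipelines.numUpdateRetryAttempts": "5"
  }
}
```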

Viswanth (New Contributor II)
  • 2351 Views • 3 replies • 0 kudos

Implementing Conditional Logic for Dependent Tasks Using SQL Output and Task Values

Hi team, I'm working on setting up a workflow with task dependencies where a subsequent task should execute conditionally, based on the result of a preceding SQL task. Specifically, I need to evaluate an if/else condition on the output of the SQL quer...

Latest Reply: Ramana (Valued Contributor) • 0 kudos

This feature is in Private Preview.

2 More Replies
emiliec (New Contributor)
  • 1010 Views • 1 reply • 0 kudos

QGIS python command in Databricks notebook

Hello, I would like to run a QGIS Python script in a Databricks notebook. Currently, Databricks doesn't recognize the qgis package. For example, I'd like to run this small example: from qgis.core import * # Supply path to qgis install location QgsAppl...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

QGIS is not directly usable in Scala or Spark environments, as it is a standalone Geographic Information System (GIS) application; installing and using it directly within Databricks may not be straightforward due to the specific environment and depend...

mdsultan (New Contributor II)
  • 1472 Views • 4 replies • 0 kudos

MetaStore Issues

Hi, I am using a Student Account in Azure and created a Databricks workspace. I am trying to locate Manage Account to create a metastore, but I am not successful. Would need your help; no Manage Account option is available. If you see, I am an Admin. Thanks for a...

[Attached screenshots: mdsultan_0-1730365288842.png, mdsultan_1-1730365458932.png]
Latest Reply: NandiniN (Databricks Employee) • 0 kudos

A Databricks student account may have certain limitations compared to a full Azure Databricks account. For student accounts, you might not have the necessary permissions to create a Unity Catalog metastore. Typically, creating and managing a metastore...

3 More Replies
maikelos272 (New Contributor II)
  • 7921 Views • 5 replies • 1 kudo

Cannot create storage credential without Contributor role

Hello, I am trying to create a storage credential. I have created the access connector and gave the managed identity "Storage Blob Data Owner" permissions. However, when I want to create a storage credential, I get the following error: Creating a storage...

Latest Reply: subhash_1692 (New Contributor II) • 1 kudo

Did someone find a solution? { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Refresh token not found for userId: Some(2302042022180399)", "details": [ { "@type": "type.googleapis.com/google.rpc.RequestInfo", "request_id": "d731471b-b...

4 More Replies
tbao (New Contributor)
  • 1140 Views • 1 reply • 0 kudos

Scala notebooks don't automatically print variables

It seems like with a Scala notebook, if I declare some variables or have import statements, then when the cell runs it will automatically print out the variables and import statements. Is there a way to disable this, so that only explicit println calls produce output?

Latest Reply: VZLA (Databricks Employee) • 0 kudos

There isn't a configuration that can be set to True/False to control this behavior for such statements. This output is part of Databricks' interactive notebook design, where all evaluated statements, such as imports, variable declarations, and expres...

noimeta (Contributor III)
  • 7317 Views • 7 replies • 4 kudos

Resolved! Databricks SQL: catalog of each query

Currently, we are migrating from the Hive metastore to UC. We have several dashboards and a huge number of queries whose catalogs have been set to hive_metastore, using the <db>.<table> access pattern. I'm just wondering if there's a way to switch catalogs...

Latest Reply: h_h_ak (Contributor) • 4 kudos

Maybe you can also have a look here if you need a hot fix: https://github.com/databrickslabs/ucx

6 More Replies
rkand (New Contributor)
  • 1457 Views • 2 replies • 0 kudos

Glob pattern for copy into

I am trying to load some files from my Azure storage container using the COPY INTO method. The files have a naming convention of "2023-<month>-<date> <timestamp>.csv.gz". All the files are in one folder. I want to load only the files for month 2, so I've used...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

TL;DR Try removing the trailing slash in the FROM value. The trailing slash in FROM confuses the URI parser, making it think that PATTERN might be an absolute path rather than a relative one. The error message points to a problem not with respect to ...
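The glob itself can be sanity-checked locally before running COPY INTO; the file names below are illustrative, following the naming convention from the question (COPY INTO's PATTERN accepts similar glob syntax):

```python
from fnmatch import fnmatch

# Illustrative names following "2023-<month>-<date> <timestamp>.csv.gz"
files = [
    "2023-02-01 120000.csv.gz",
    "2023-02-15 093000.csv.gz",
    "2023-03-01 120000.csv.gz",
]

# Matches only month 2; '*' also covers the space and timestamp
pattern = "2023-02-*.csv.gz"
matched = [f for f in files if fnmatch(f, pattern)]
```

Here `matched` keeps only the two February files, confirming the month-2 glob before it is handed to COPY INTO's PATTERN clause.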

1 More Reply
Ameshj (New Contributor III)
  • 24230 Views • 12 replies • 2 kudos

Resolved! Dbfs init script migration

I need help with migrating from DBFS on Databricks to workspace files. I am new to Databricks and am struggling with what is on the links provided. My workspace.yml also has dbfs hard-coded. Included is a full deployment with Great Expectations. This was don...

Labels: Data Engineering, Azure Databricks, dbfs, Great expectations, python
Latest Reply: NandiniN (Databricks Employee) • 2 kudos

Glad it worked and helped you.

11 More Replies
Data_Engineer3 (Contributor III)
  • 5139 Views • 5 replies • 0 kudos

Default maximum spark streaming chunk size in delta files in each batch?

When working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

Latest Reply: NandiniN (Databricks Employee) • 0 kudos

See the doc: https://docs.databricks.com/en/structured-streaming/delta-lake.html
Also, what is the challenge while using foreachBatch?
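For context, the per-batch size of a Delta streaming source is governed by the rate-limit options on the stream reader; a minimal sketch, assuming a Databricks notebook where `spark` is already defined and with an illustrative table path:

```python
stream = (
    spark.readStream
    .format("delta")
    # Delta sources default to at most 1000 new files per micro-batch
    .option("maxFilesPerTrigger", 1000)
    # Optional soft cap on bytes processed per micro-batch
    .option("maxBytesPerTrigger", "1g")
    .load("/path/to/delta/table")
)
```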

4 More Replies
ShaliniC (New Contributor II)
  • 1396 Views • 4 replies • 1 kudo

Workflow fails when run using a job cluster but not a shared cluster

Hi, We have a workflow which calls 3 notebooks. When we run this workflow using a shared cluster it runs fine, but when run with a job cluster, one of the notebooks fails. This notebook uses the SQL function lpad, and it looks like it errors because of it. Has ...

Latest Reply: saurabh18cs (Honored Contributor II) • 1 kudo

Are the notebooks executing sequentially or in parallel in this workflow?

3 More Replies
