Data Engineering

Forum Posts

by goal1860 • New Contributor III

02-02-2023 2:14:59 PM

1692 Views
5 replies
1 kudos

Resolved! Failed to signup community version

I've been trying to create a Community Edition account, but I keep getting the message "An error has occurred. Please try again later". I searched the other posts and found other people running into the same issue, but I don't see any solution posted...

Latest Reply

Anonymous
Not applicable

04-09-2023 11:49:37 PM

1 kudos

Hi @Liang He, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

by tibfab • New Contributor II

02-09-2023 7:32:23 AM

2387 Views
5 replies
0 kudos

How can I build a custom docker image for the ML runtime (e.g. 12.1 ML)?

I successfully built a custom docker image for the Standard runtime following the steps described on the page Customize containers with Databricks Container Services and based on the image databricksruntime/standard:11.3-LTS. However, I cannot find ...

Latest Reply

Anonymous
Not applicable

04-10-2023 12:23:45 AM

0 kudos

Hi @Tibor Fabian, help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation!

by nolanlavender00 • New Contributor

02-10-2023 11:39:10 AM

1897 Views
2 replies
0 kudos

How to control garbage collection while using Autoloader File Notification?

I am using Autoloader to load files from a directory. I have set up File Notification with the Event Subscription. I have the backfill interval set to 1 day and have not run the stream for a week. There should only be about 100 new files to pick up an...

Latest Reply

Anonymous
Not applicable

04-10-2023 12:23:03 AM

0 kudos

Hi @nolanlavender008, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

by joshi • New Contributor II

02-03-2023 6:38:36 PM

1291 Views
5 replies
0 kudos

'Full screen video' button not working in Spark certification videos

Hi all, many users have already posted about this, but no action has been taken so far. I tried different browsers and systems and am still not able to maximize the Spark training videos. Many months have passed, and Databricks has still not fixed this. @...

Latest Reply

Anonymous
Not applicable

04-10-2023 12:22:24 AM

0 kudos

Hi @Abhishek Joshi, help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation!

by chanansh • Contributor

02-02-2023 4:15:48 PM

941 Views
3 replies
0 kudos

Delta table grouping by a key that the table is not partitioned by is very slow

I have a big Delta table with timestamp, key and metric(s) columns (e.g. m1, m2, ...). I often group by the key (e.g. select max(m1) group by timestamp, key). I cannot partition by `key` because there are too many values (~200K). I have tried ...

Latest Reply

Anonymous
Not applicable

04-09-2023 11:48:29 PM

0 kudos

Hi @Hanan Shteingart, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. T...

by Soma • Valued Contributor

04-05-2023 6:17:47 PM

1103 Views
5 replies
0 kudos

Cosmos DB Spark patch API

Hi all, we are trying to use the Cosmos patch API on an array field, but the problem is that we need to collect the data to get the index. Can you please let us know if there is an alternative, as this causes a bottleneck on the driver?

Latest Reply

Anonymous
Not applicable

04-07-2023 11:42:43 PM

0 kudos

Hi @somanath Sankaran, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your fee...

by andrew0117 • Contributor

01-05-2023 12:03:56 PM

2817 Views
6 replies
2 kudos

index a dataframe from a csv file based on the file's original order (not based on any specific column, based on the entire row) using spark

How can I guarantee that the index always follows the file's original order, no matter what? Currently, I'm using val df = spark.read.options(Map("header"-> "true", "inferSchema" -> "true")).csv("filePath").withColumn("index", monotonically_increasing...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-05-2023 1:39:33 PM

2 kudos

monotonically_increasing_id will not do that, as it is only meant to guarantee that every partition has separate IDs. What is the whole code? Do you load a directory with a lot of CSVs? What does "original order" mean? Are the CSVs ordered by file creation date, by file name? o...
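The point about per-partition IDs can be illustrated without a cluster. The sketch below is plain Python, not Spark: it mimics the layout that the Spark documentation describes for monotonically_increasing_id (partition ID in the upper 31 bits, record number within the partition in the lower 33 bits). The function name simulated_ids and the sample partition sizes are hypothetical, chosen only for illustration.

```python
from typing import List


def simulated_ids(partition_sizes: List[int]) -> List[int]:
    """Mimic the documented ID layout: (partition index << 33) + row offset.

    This is a simulation for illustration, not a call into Spark.
    """
    ids = []
    for partition_index, size in enumerate(partition_sizes):
        for offset in range(size):
            ids.append((partition_index << 33) + offset)
    return ids


# Hypothetical example: five rows split into two partitions of 3 and 2 rows.
ids = simulated_ids([3, 2])
print(ids)  # [0, 1, 2, 8589934592, 8589934593]
```

The simulated IDs are unique and increasing but not consecutive, and the jump between them depends on how the rows were split into partitions, so they may not match a simple 0, 1, 2, ... row index.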

by Mado • Valued Contributor II

03-25-2023 9:46:46 PM

4451 Views
3 replies
0 kudos

How to update value of a column with MAP data-type in a delta table using a python dictionary and SQL UPDATE command?

I have a Delta table created by: %sql CREATE TABLE IF NOT EXISTS dev.bronze.test_map ( id INT, table_updates MAP<STRING, TIMESTAMP>, CONSTRAINT test_map_pk PRIMARY KEY(id) ) USING DELTA LOCATION "abfss://bronze@Table Path" With initi...
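One possible approach to the question in the title, sketched under assumptions since the post is cut off: render the Python dictionary as a Spark SQL map(...) expression and embed it in an UPDATE statement. The helper names map_literal and update_statement, the key "orders", and the sample timestamp are hypothetical; only the table and column names come from the post. Note that this sketch overwrites the whole map value rather than merging into it, and it rejects keys containing quotes or backslashes instead of trying to escape them.

```python
from datetime import datetime
from typing import Dict


def map_literal(updates: Dict[str, datetime]) -> str:
    """Render a dict as a Spark SQL map(key1, value1, key2, value2, ...) expression."""
    parts = []
    for key, ts in updates.items():
        # Assumption: keys are simple identifiers; refuse anything needing escaping.
        if "'" in key or "\\" in key:
            raise ValueError(f"unsupported character in key: {key!r}")
        parts.append("'" + key + "'")
        parts.append("CAST('" + ts.strftime("%Y-%m-%d %H:%M:%S") + "' AS TIMESTAMP)")
    return "map(" + ", ".join(parts) + ")"


def update_statement(table: str, row_id: int, updates: Dict[str, datetime]) -> str:
    """Build an UPDATE that replaces the table_updates column for one id."""
    return (
        f"UPDATE {table} SET table_updates = {map_literal(updates)} "
        f"WHERE id = {int(row_id)}"
    )


# Hypothetical values for illustration only.
stmt = update_statement(
    "dev.bronze.test_map", 1, {"orders": datetime(2023, 3, 25, 21, 46, 46)}
)
print(stmt)
```

In a notebook, the resulting string could then be passed to spark.sql(stmt); that step is left out here so the sketch runs without a Spark session.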

Latest Reply

Anonymous
Not applicable

04-03-2023 11:36:21 PM

0 kudos

Hi @Mohammad Saber, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

by kk007 • New Contributor III

04-07-2023 10:19:36 AM

1527 Views
4 replies
4 kudos

Photon engine throws error "JSON document exceeded maximum allowed size 400.0 MiB"

I am reading an 83 MB JSON file using "spark.read.json(storage_path)". When I display the data, it seems to display fine, but when I try a count, it complains about the file size being more than 400 MB, which is not true. Photon JSON reader erro...

Latest Reply

Anonymous
Not applicable

04-09-2023 8:47:33 AM

4 kudos

@Kamal Kumar: The error message suggests that the JSON document size exceeds the maximum allowed size of 400 MB. This could be caused by one or more documents in your JSON file being larger than this limit. It is not a bug, but a limitation set ...

by zeta_load • New Contributor II

04-06-2023 5:39:28 AM

696 Views
1 reply
1 kudos

Resolved! Unique ID of table values is not unique anymore after merge every x-times

I have two tables with unique IDs:

ID  val    ID  val
1   10     1   10
2   11     2   10
3   13     ...

Latest Reply

Anonymous
Not applicable

04-09-2023 8:33:15 AM

1 kudos

@Lukas Goldschmied: There are a few reasons why you might be experiencing this issue: Data Skew: Data skew is a common problem in distributed computing when one or more nodes in the cluster have more data to process than others. This can lead to long...

by Alexander1 • New Contributor III

02-03-2023 1:33:24 PM

5168 Views
5 replies
1 kudos

Resolved! Databricks JDBC/ODBC write batch size

I have spent way too much time trying to find a solution to the problem of efficiently writing data to Databricks via JDBC/ODBC. I have looked into countless docs, blogs and repos, and I cannot find one example where someone is setting some kind of batch/bul...

Latest Reply

Alexander1
New Contributor III

04-09-2023 1:44:07 AM

1 kudos

@Vidula Khanna yes, have done so. thanks.

by nupur_dogra • New Contributor II

04-03-2023 9:07:57 AM

1059 Views
4 replies
0 kudos

Unable to get or download the Fundamentals of the Databricks Lakehouse Platform badge

Hi Team, I have completed the Fundamentals of the Databricks Lakehouse Platform and received the certificate, but I am unable to download or get the badge. When I log in to the website, I am unable to see the badge as well. Please help me with this. I haven't received any...

Latest Reply

Anonymous
Not applicable

04-09-2023 12:37:38 AM

0 kudos

Hi @Nupur Dogra, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

by 744291 • New Contributor III

02-07-2023 9:00:20 PM

881 Views
6 replies
0 kudos

I attended a certification preparation event for the Databricks Data Engineer Associate on 17th Jan 2023. I filled in the survey form and it was ...

I attended a certification preparation event for the Databricks Data Engineer Associate on 17th Jan 2023. I filled in the survey form, and it was mentioned that I would receive the voucher in early Feb. I still have not received it. Please update me as...

Latest Reply

Anonymous
Not applicable

04-09-2023 12:08:24 AM

0 kudos

Hi @Rituparna Das, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

by Meghala • Valued Contributor II

02-04-2023 7:01:09 AM

2860 Views
11 replies
2 kudos

Exam issues

How can we approach the Databricks team if we face a problem? Sometimes a question does not appear properly. If anyone knows the solution, kindly tell me.

Latest Reply

Anonymous
Not applicable

04-09-2023 12:07:42 AM

2 kudos

Hi @S Meghala, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

by Krishna264 • New Contributor

02-11-2023 1:28:46 AM

737 Views
2 replies
0 kudos

Delta write stream to different folders dynamically based on input file

I have a root folder, and files are being ingested into subfolders. I want to build a workflow that writes the stream based on the file being ingested.

Latest Reply

Anonymous
Not applicable

04-09-2023 12:05:30 AM

0 kudos

Hi @Krishnamoorthy Natarajan, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Y...

