Data Engineering

Forum Posts

by vinaykumar • New Contributor III

02-13-2023 11:14:19 PM

1118 Views
3 replies
0 kudos

File optimization for Delta table (versioning and snapshot) in S3 storage

A Delta table generates a new file for every insert or update and also keeps the old version files for versioning and time travel history. I have 1 TB of data as a Delta table, and every 30 minutes 90 percent of the data gets updated, so file size will b...

Latest Reply

Anonymous
Not applicable

02-21-2023 2:21:15 AM

0 kudos

Hi @vinay kumar, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies

by dispersion • New Contributor

02-13-2023 5:39:50 AM

702 Views
2 replies
1 kudos

Running large volume of SQL queries in Python notebooks. How to minimise overheads/maintenance.

I have around 200 SQL queries I'd like to run in Databricks Python notebooks. I'd like to avoid creating an ETL process for each of the 200 SQL processes. Any suggestions on how to run the queries in a way that loops through them so I have minimum am...
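
One possible way to approach the looping described in this question is sketched below. It is a minimal illustration, not a recommended design: the `run_queries` helper and the `queries` dictionary are hypothetical names introduced here, and the sketch assumes that in a Databricks notebook the `execute` argument would be `spark.sql`.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_queries(
    queries: Dict[str, str],
    execute: Callable[[str], Any],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Run every named query through `execute`.

    Returns the results keyed by query name, plus a list of
    (name, error message) pairs for queries that raised, so one
    failing query does not stop the remaining ones.
    """
    results: Dict[str, Any] = {}
    failures: List[Tuple[str, str]] = []
    for name, sql in queries.items():
        try:
            results[name] = execute(sql)
        except Exception as exc:  # collect the failure and keep looping
            failures.append((name, str(exc)))
    return results, failures


# Hypothetical usage in a notebook (assumes a `spark` session exists):
# queries = {"daily_sales": "SELECT ...", "active_users": "SELECT ..."}
# results, failures = run_queries(queries, spark.sql)
```

Keeping the queries in one mapping (or loading them from files into that mapping) means a new query is a new entry rather than a new process; whether that fits the 200 queries in question depends on details the truncated post does not show.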

Latest Reply

Anonymous
Not applicable

02-21-2023 2:17:23 AM

1 kudos

Hi @Chris French, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you. Thank...

1 More Replies

by jairomonassa • New Contributor

02-14-2023 3:36:12 PM

1074 Views
4 replies
2 kudos

Resolved! Where can I find my certification ID in order to apply for rewards from the community?

Latest Reply

Anonymous
Not applicable

02-21-2023 12:33:33 AM

2 kudos

Hi @jairo neder monassa moreira, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear...

3 More Replies

by JordGray_57117 • New Contributor II

02-13-2023 2:04:08 PM

856 Views
3 replies
0 kudos

Resolved! Is it possible to reset user passwords outside of the Admin Console UI?

There is a business requirement for some of our accounts to have their passwords rotated. This currently requires an admin to go in and manually reset the password for the account via UI. I wanted to know if there's a more automated way to handle thi...

Latest Reply

Anonymous
Not applicable

02-21-2023 12:32:57 AM

0 kudos

Hi @Jordan Gray, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies

by zUnkn0wn990 • New Contributor II

02-20-2023 8:10:01 PM

444 Views
1 reply
2 kudos

Resolved! Lakehouse Fundamentals Accreditation badge not received

I passed the test for the Lakehouse Fundamentals Accreditation today but have not yet received the badge. Please let me know how and when I can receive the badge for this passed test. Thank you.

Latest Reply

Anonymous
Not applicable

02-20-2023 11:02:46 PM

2 kudos

Hi @Chang Su Lee, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

by Mado • Valued Contributor II

02-15-2023 2:26:17 PM

2383 Views
3 replies
2 kudos

Databricks Audit Logs: Where are the log files stored? How to read them?

Hi, I want to access the Databricks Audit Logs to check user activity, for example the number of times that a table was viewed by a user. I have a few questions in this regard. 1) Where are the log files stored? Are they stored on DBFS? 2) Can I read l...

Latest Reply

Anonymous
Not applicable

02-19-2023 10:14:12 PM

2 kudos

Hi @Mohammad Saber, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you. Tha...

2 More Replies

by zeta_load • New Contributor II

02-17-2023 6:41:15 AM

765 Views
2 replies
0 kudos

Resolved! When does delta lake actually compute a table?

Maybe I'm completely wrong, but from my understanding delta lake only calculates a table at certain points, for instance when you display your data. Before that point, operations are only written to the log file and are not executed (meaning no chang...

Latest Reply

Anonymous
Not applicable

02-19-2023 10:03:47 PM

0 kudos

Hi @Lukas Goldschmied, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you....

1 More Replies

by abhishek_dutta3 • New Contributor II

02-11-2023 8:28:44 PM

6814 Views
5 replies
0 kudos

Merge upsert in Delta is throwing error "concurrent merge in delta lake tables in Azure Databricks .ConcurrentAppendException"

This is the error that occurs while processing concurrent merges into Delta Lake tables in Azure Databricks: ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again. What are the o...
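
The error message itself suggests trying the operation again, so one option is a retry wrapper with backoff. The sketch below is only an illustration under assumptions: `retry_on_conflict` and its parameters are hypothetical names, and the `is_conflict` check is left to the caller (on Databricks it might test for `ConcurrentAppendException` from the `delta.exceptions` module, which is an assumption about the runtime in use). Retrying does not remove the underlying write conflict; it only reruns the merge after a pause.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_conflict(
    operation: Callable[[], T],
    is_conflict: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff and jitter
    whenever `is_conflict` says the raised exception is a write conflict.

    Any other exception, or a conflict on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_conflict(exc) or attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x ... base_delay, stretched by random jitter so
            # concurrent writers are less likely to collide again.
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random())
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


# Hypothetical usage (run_merge is a placeholder for the actual merge call):
# retry_on_conflict(run_merge, lambda e: "ConcurrentAppendException" in type(e).__name__)
```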

Latest Reply

Anonymous
Not applicable

02-16-2023 9:42:01 PM

0 kudos

Hi @Abhishek Dutta, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you. Tha...

4 More Replies

by Smu_Tan • New Contributor

04-13-2022 1:24:43 PM

1324 Views
3 replies
1 kudos

Resolved! Does Databricks support PyTorch distributed training for multiple devices?

Hi, I'm trying to use the Databricks platform to do PyTorch distributed training, but I didn't find any info about this. What I expected is to use multiple clusters to run a common job using PyTorch distributed data parallel (DDP) with the code belo...

Latest Reply

axb0
New Contributor III

02-19-2023 8:15:27 AM

1 kudos

With Databricks MLR, HorovodRunner is provided which supports distributed training and inference with PyTorch. Here's an example notebook for your reference: PyTorchDistributedDeepLearningTraining - Databricks.

2 More Replies

by vinaykumar • New Contributor III

02-14-2023 8:03:21 AM

816 Views
3 replies
0 kudos

Resolved! Time travel and version control: can we create custom version control for each day's data load when multiple updates happen in a day?

Time travel and version control: can we create custom version control for each day's data load when multiple updates happen in a day? For example, let’s assume we perform multiple operations on a table every minute throughout the day and want to keep time travel...

Latest Reply

Anonymous
Not applicable

02-16-2023 9:48:28 PM

0 kudos

2 More Replies

by srDataEngineer • New Contributor II

02-16-2023 8:17:47 AM

1930 Views
4 replies
2 kudos

Resolved! How does Databricks time travel work?

Hi, since it is not very well explained, I want to know whether the table history is a snapshot of the whole table at that point in time, containing all the data, or whether it tracks only some metadata of the table changes. To be more precise: if I have a table in...

Latest Reply

Anonymous
Not applicable

02-17-2023 10:56:13 PM

2 kudos

Hi @data engineer, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

3 More Replies

by aline_alvarez • New Contributor III

02-08-2023 4:07:05 AM

2048 Views
6 replies
7 kudos

Resolved! How can I delete a file in DBFS with Illegal character?

How can I delete a file in DBFS with an illegal character? Someone put the file named "planejamento_[4098.]___SHORT_SAIA_JEANS__.xlsx" inside the folder /FileStore and I can't delete it because of this error: java.net.URISyntaxException: Illegal character...

Latest Reply

-werners-
Esteemed Contributor III

02-17-2023 4:23:51 AM

7 kudos

Try this:

%sh ls -li /dbfs

If the file is located in a subdirectory, you can change the path mentioned above. The %sh magic command gives you access to Linux shell commands.
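
Building on the idea of going through the local `/dbfs` path instead of a URI, the same kind of cleanup could be sketched in Python. This is a hedged illustration, not the accepted solution from the thread: `delete_matching` is a hypothetical helper, and it assumes the `/dbfs` mount is available as a local path on the driver, as the `ls -li /dbfs` suggestion implies.

```python
import os
from typing import List


def delete_matching(directory: str, fragment: str) -> List[str]:
    """Delete regular files in `directory` whose name contains `fragment`.

    The match is a plain substring test, so characters such as '[' and ']'
    are treated literally. Returns the sorted names of the removed files.
    """
    # Collect the targets first so the directory is not modified while
    # it is being scanned.
    with os.scandir(directory) as entries:
        targets = [e.path for e in entries if e.is_file() and fragment in e.name]
    for path in targets:
        os.remove(path)
    return sorted(os.path.basename(path) for path in targets)


# Hypothetical usage on the driver, for the file named in the question:
# delete_matching("/dbfs/FileStore", "[4098.]")
```

Because the helper deletes every file whose name contains the fragment, it is worth listing the directory first and choosing a fragment that matches only the intended file.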

5 More Replies

by EDDatabricks • Contributor

02-13-2023 10:21:35 AM

1524 Views
2 replies
0 kudos

Resolved! Pool Max Capacity vs Cluster Max Workers

Hi all, we have a databricks instance on Azure with a Compute Cluster version 7.3 LTS. Currently the cluster has 4 max workers (min workers: 1) of type: Standard_D13_v2 and 1 driver of the same type. There are several jobs that are running on this cl...

Latest Reply

Anonymous
Not applicable

02-16-2023 9:40:27 PM

0 kudos

Hi @EDDatabricks EDDatabricks, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear f...

1 More Replies

by tinendra • New Contributor III

02-14-2023 4:19:07 AM

1983 Views
5 replies
5 kudos

How to reduce time while loading data into the azure synapse table?

Hi all, I just wanted to know if there is any option to reduce the time taken to load a PySpark DataFrame into an Azure Synapse table using Databricks. For example, I have a PySpark DataFrame that has around 40k records and I am trying to load data into the Azure ...

Latest Reply

Anonymous
Not applicable

02-16-2023 9:39:53 PM

5 kudos

Hi @Tinendra Kumar, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you. Tha...

4 More Replies

by Murthy1 • Contributor II

02-13-2023 7:26:16 AM

3659 Views
3 replies
3 kudos

Resolved! Impacts of running multiple jobs in parallel that refer to the same notebook

Can I run multiple jobs (for example: 100+) in parallel that refer to the same notebook? I supply each job with a different parameter. If we can do this, what would be the impact (for example: reliability, performance, troubleshooting, etc.)? Example: N...

Latest Reply

Anonymous
Not applicable

02-16-2023 9:26:18 PM

3 kudos

Hi @Murthy Ramalingam, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help. We'd love to hear from you....

2 More Replies
