Data Engineering

Forum Posts

by farefin • New Contributor II

10-19-2022 4:33:18 AM

4534 Views
2 replies
5 kudos

Need help with PySpark code in Databricks to calculate a new measure column

The details of the requirement are below. I have a table with the structure below, so I have to write PySpark code to calculate a new column. The logic for the new column is the sum of Magnitude for different Categories divided by the total Magnitude. And it should b...
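
A minimal sketch of the arithmetic described in the post, written in plain Python so it runs without a cluster. The column names Category and Magnitude come from the post; the sample rows and the function name `category_share` are made up for illustration, and the truncated part of the requirement is not covered.

```python
from collections import defaultdict


def category_share(rows):
    """Return {category: sum of Magnitude in that category / total Magnitude}."""
    per_category = defaultdict(float)
    total = 0.0
    for row in rows:
        per_category[row["Category"]] += row["Magnitude"]
        total += row["Magnitude"]
    if total == 0:
        raise ValueError("total Magnitude is zero, so the share is undefined")
    return {category: value / total for category, value in per_category.items()}


# Hypothetical sample data, not taken from the original table.
rows = [
    {"Category": "A", "Magnitude": 10.0},
    {"Category": "A", "Magnitude": 30.0},
    {"Category": "B", "Magnitude": 60.0},
]
shares = category_share(rows)
print(shares)  # {'A': 0.4, 'B': 0.6}
```

In PySpark the same ratio can likely be expressed as a `sum` over a window partitioned by the category column, divided by a `sum` over the whole table.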

Latest Reply

Anonymous
Not applicable

11-27-2022 5:36:25 AM

5 kudos

Hi @Faizan Arefin, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...

1 More Replies

by tum • New Contributor II

10-19-2022 2:29:23 AM

6085 Views
3 replies
4 kudos

Create new job api error "MALFORMED_REQUEST"

Hi, I'm trying to test the create new job API (v2.1) with Python, but I got this error: {'error_code': 'MALFORMED_REQUEST', 'message': 'Invalid JSON given in the body of the request - expected a map'}. How do I validate the JSON body before posting? This is my js...
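
A minimal sketch of a pre-flight check, assuming the error means the top-level JSON value was not an object. The helper name `validate_job_body` and the sample field are illustrative only; this checks the shape of the body, not the Jobs API schema.

```python
import json


def validate_job_body(body):
    """Return the body as a dict, or raise ValueError describing the problem."""
    if isinstance(body, (str, bytes)):
        try:
            body = json.loads(body)
        except ValueError as exc:  # JSONDecodeError is a ValueError subclass
            raise ValueError(f"body is not valid JSON: {exc}") from exc
    if not isinstance(body, dict):
        raise ValueError(
            f"top-level JSON value must be an object (map), got {type(body).__name__}"
        )
    try:
        json.dumps(body)  # confirm every value can be serialized
    except TypeError as exc:
        raise ValueError(f"body contains a non-serializable value: {exc}") from exc
    return body


# A JSON object passes.
ok = validate_job_body('{"name": "example-job"}')

# A double-encoded body (a JSON string that itself contains JSON) is rejected.
double_encoded = json.dumps(json.dumps({"name": "example-job"}))
try:
    validate_job_body(double_encoded)
    rejected = False
except ValueError:
    rejected = True
```

One way to end up with a double-encoded body is to serialize the dict with `json.dumps` and then hand the resulting string to an HTTP client option that serializes it again.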

Latest Reply

Anonymous
Not applicable

11-27-2022 5:33:57 AM

4 kudos

Hi @tum m, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies

by numersoz • New Contributor III

11-23-2022 7:05:12 PM

13475 Views
5 replies
10 kudos

Is ZORDER required after table overwrite?

Hi, after appending new values to a Delta table, I need to delete duplicate rows. After deleting duplicate rows using PySpark, I overwrite the table (keeping the schema). My question is, do I have to do ZORDER again? Another question, is there another wa...

Latest Reply

DeepakMakwana74
New Contributor III

11-27-2022 5:30:50 AM

10 kudos

Hi @Nurettin Ersoz, try using an incremental load of the data so that duplicates are avoided; you can run a full load once if your data has updates.

4 More Replies

by Milind • New Contributor III

10-18-2022 3:21:42 AM

7330 Views
7 replies
23 kudos

Resolved! Is there a syllabus change in the self-paced Data Engineering with Databricks course video?

Is there a syllabus change in the self-paced Data Engineering with Databricks course video? Last week I started that video lecture, but today I found that everything has changed. https://partner-academy.databricks.com/learn/course/62/data-engineering-with-datab...

Latest Reply

DeepakMakwana74
New Contributor III

11-27-2022 5:25:57 AM

23 kudos

Hi @Milind Singh, yes, the syllabus keeps being updated, so the self-paced course has to be updated as well.

6 More Replies

by Sagar1 • New Contributor III

10-16-2022 8:08:10 AM

6040 Views
3 replies
4 kudos

How to identify or determine how many jobs will be performed if I submit code

I'm not able to find a source that explains how to determine how many jobs a given piece of PySpark code will trigger. Can you please help me here? About stages, I know that the number of shuffles equals the number of stages.

Latest Reply

Anonymous
Not applicable

11-27-2022 5:06:51 AM

4 kudos

Hi @sagar Varma, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies

by Kash • Contributor III

10-14-2022 6:42:13 AM

2320 Views
2 replies
6 kudos

Will Vacuum delete previous folders of data if we z-ordered by as_of_date each day?

Hi there, I've had horrible experiences vacuuming tables in the past and losing tons of data, so I wanted to confirm a few things about Vacuum and Z-Order. Background: Each day we run an ETL job that appends data in a table and stores the data in S3 b...

Latest Reply

Anonymous
Not applicable

11-27-2022 5:03:39 AM

6 kudos

Hi @Avkash Kana, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

1 More Replies

by Sweta • New Contributor II

10-12-2022 8:50:35 PM

5832 Views
6 replies
7 kudos

Can Delta Lake completely host a data warehouse and replace Redshift?

Our use case is simple: to store our PB-scale data, transform it, and use it for BI, reporting, and analytics. As my title says, I am trying to eliminate expenditure on Redshift, as we are starting as a greenfield project. I know I have designed/used just Delta lak...

Latest Reply

Anonymous
Not applicable

11-27-2022 4:51:29 AM

7 kudos

Hi @Swetha Marakani, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

5 More Replies

by Carlton • Contributor II

10-14-2022 6:45:12 AM

8717 Views
5 replies
14 kudos

I would like to know why CROSS JOIN fails to recognize columns

Whenever I apply a CROSS JOIN to my Databricks SQL query, I get a message letting me know that a column does not exist, but I'm not sure if the issue is with the CROSS JOIN. For example, the code should identify characters such as http, https, ://, / ...

Latest Reply

Shalabh007
Honored Contributor

11-27-2022 1:35:51 AM

14 kudos

@CARLTON PATTERSON Since you have given the alias "tt" to your table "basecrmcbreport.organizations", you have to access its columns in the format tt.<column_name>. In line #4 of your code, try accessing the column 'homepage_u...
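
A small runnable illustration of the alias rule in this reply. It uses Python's built-in sqlite3 as a stand-in, not Databricks SQL, so details may differ. The alias tt comes from the reply; the table layout, the `url` column, and the `prefixes` table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, used only to demonstrate alias-qualified columns.
conn.execute("CREATE TABLE organizations (name TEXT, url TEXT)")
conn.execute("CREATE TABLE prefixes (prefix TEXT)")
conn.executemany(
    "INSERT INTO organizations VALUES (?, ?)",
    [("acme", "https://acme.example/")],
)
conn.executemany(
    "INSERT INTO prefixes VALUES (?)",
    [("http://",), ("https://",)],
)

# Every column is qualified with the alias (tt or p), not the table name.
query = """
    SELECT tt.name, p.prefix
    FROM organizations AS tt
    CROSS JOIN prefixes AS p
    WHERE tt.url LIKE p.prefix || '%'
"""
matches = conn.execute(query).fetchall()
print(matches)  # [('acme', 'https://')]
```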

4 More Replies

by RyanD-AgCountry • Contributor

09-07-2022 7:46:48 AM

5738 Views
5 replies
7 kudos

Resolved! Azure Create Metastore button not available

With Unity Catalog gone GA on Azure, we are working through initial tests for setup within Databricks and Azure. However, we are not seeing the "Create Metastore" button available as indicated in documentation. We're also not seeing any additional pr...

Latest Reply

Addi1
Databricks Partner

11-26-2022 9:42:10 AM

7 kudos

I'm facing the same issues listed above. "Create Metastore" button is unavailable for me as well.

4 More Replies

by yogu • Databricks Partner

11-26-2022 3:21:57 AM

1967 Views
0 replies
20 kudos

[NO SUBJECT]

by andreiten • New Contributor II

11-22-2022 1:06:36 AM

7611 Views
1 replies
3 kudos

Is there any example or guideline on how to pass JSON parameters to the pipeline in a Databricks workflow?

I used this source https://docs.databricks.com/workflows/jobs/jobs.html#:~:text=You%20can%20use%20Run%20Now,different%20values%20for%20existing%20parameters.&text=next%20to%20Run%20Now%20and,on%20the%20type%20of%20task. But there is no example of how...

Latest Reply

UmaMahesh1
Honored Contributor III

11-26-2022 1:55:33 AM

3 kudos

Hi @Andre Ten, that's exactly how you specify the JSON parameters in a Databricks workflow. I have been doing it in the same format and it works for me. I removed the parameters as they are a bit sensitive, but I hope you get the point. Cheers.
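
Since the parameters were removed from this reply, here is a hedged sketch of one common shape. It assumes a notebook task triggered through the Jobs API 2.1 run-now request, whose `notebook_params` field maps string keys to string values. The job ID, the `config` key, and the nested values are hypothetical, and this may not be the format the reply used.

```python
import json


def build_run_now_body(job_id, config):
    """Build a run-now request body whose single parameter carries JSON text."""
    return {
        "job_id": job_id,
        # Parameter values are strings, so the nested config is serialized first.
        "notebook_params": {"config": json.dumps(config)},
    }


# Hypothetical job ID and configuration.
body = build_run_now_body(123, {"source": "s3://example-bucket/in", "limit": 10})

# The receiving task parses the string back into a dict.
received = json.loads(body["notebook_params"]["config"])
print(received["limit"])  # 10
```

Inside a notebook, the string would typically be read with `dbutils.widgets.get("config")` before being passed to `json.loads`.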

by PaulP • New Contributor II

10-14-2022 12:36:54 PM

4772 Views
3 replies
6 kudos

What is the best expected starting time for a cluster when using a pool?

Hi! I'm doing some tests to get an idea of how much time could be saved starting a cluster by using a pool, and was wondering if the results I get are what should be expected. We're using AWS Databricks and used i3.xlarge as instance type (if that matt...

Latest Reply

Anonymous
Not applicable

11-25-2022 11:11:49 PM

6 kudos

Hi @Paul Pelletier, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

2 More Replies

by marcus1 • New Contributor III

10-14-2022 10:44:39 AM

3163 Views
2 replies
1 kudos

Running jobs as a non-job owner

We have enabled Cluster, Pool and Job access control, and non-job owners cannot run a job even though they are administrators. This prevents users from creating cluster resources. When a non-owner of a job attempts to run it, they get a permission denied error. My un...

Latest Reply

Anonymous
Not applicable

11-25-2022 11:10:37 PM

1 kudos

Hi @Marcus Simonsen, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

1 More Replies

by ConSpooky • New Contributor II

10-14-2022 9:27:56 AM

3173 Views
3 replies
4 kudos

Best practice for creating queries for data transformation?

My apologies in advance for sounding like a newbie. This is really just a curiosity question I have as an outsider observing my team clash with our client. Please ask any questions you have, and I will try my best to answer them. Currently, we are stori...

Latest Reply

Anonymous
Not applicable

11-25-2022 11:09:44 PM

4 kudos

Hi @Nick Connors, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies

by Gerhard • New Contributor III

11-25-2022 5:59:42 AM

2344 Views
0 replies
1 kudos

Read proprietary files and transform contents to a table - error resilient process needed

We have data stored in HDF5 files in a "proprietary" way. This data needs to be read, converted and transformed before it can be inserted into a Delta table. All of this transformation is done in a custom Python function that takes the HDF5 file an...
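
One generic way to make such a process error resilient is to isolate failures per file, so that one unreadable file does not abort the whole batch. The sketch below assumes nothing about the real reader: `fake_convert` is a hypothetical stand-in for the custom function mentioned in the post, and the file names are invented.

```python
def process_files(paths, convert):
    """Apply convert to each path, collecting rows and failures separately."""
    rows, failures = [], []
    for path in paths:
        try:
            rows.extend(convert(path))
        except Exception as exc:  # record the failure and keep going
            failures.append({"path": path, "error": repr(exc)})
    return rows, failures


def fake_convert(path):
    """Hypothetical stand-in for the custom HDF5 conversion function."""
    if "bad" in path:
        raise ValueError("unreadable file")
    return [{"path": path, "value": 1}]


rows, failures = process_files(["a.h5", "bad.h5", "c.h5"], fake_convert)
print(len(rows), len(failures))  # 2 1
```

The collected failures could then be written to a separate table or log for later reprocessing, while the successful rows go on to the Delta table.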
