Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

h_aloha
by New Contributor III
  • 1926 Views
  • 2 replies
  • 0 kudos

Difference between the V3 and V2 exams for the Databricks Certified Data Engineer Associate

Hi, does anyone know the difference between the V3 and V2 exams for the Databricks Certified Data Engineer Associate? It looks like there is no practice exam for V3. Which version covers more material? Thanks, h_aloha

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Helen Morgen, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

1 More Replies
User16790091296
by Contributor II
  • 10119 Views
  • 3 replies
  • 0 kudos
Latest Reply
NubeEra
New Contributor II
  • 0 kudos

Databricks provides four main deployment models. Public Cloud Deployment Model: Databricks can be deployed on public cloud platforms such as AWS, Azure, and Google Cloud Platform. This is the most common deployment model for Databricks and provi...

2 More Replies
chanansh
by Contributor
  • 1631 Views
  • 2 replies
  • 0 kudos

How to compute the difference over time in Spark Structured Streaming?

I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key (k): v_diff(t) = v(t) - v(t-1) for each k independently. Normally I would write: lag_window = Window.partitionBy(C...
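For reference, a minimal batch-style sketch of the lag approach the post describes (the column names k, t, v are assumptions taken from the question). Note that non-time-based window functions such as lag() are not supported on streaming DataFrames, which is why the stateful-processing approach linked in the reply below is relevant:

from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Toy batch data: key k, time t, value v
df = spark.createDataFrame(
    [("a", 1, 10.0), ("a", 2, 13.0), ("b", 1, 5.0), ("b", 2, 4.0)],
    ["k", "t", "v"],
)

# Works in batch; raises an AnalysisException on a streaming DataFrame
lag_window = Window.partitionBy("k").orderBy("t")
df.withColumn("v_diff", F.col("v") - F.lag("v").over(lag_window)).show()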

Latest Reply
chanansh
Contributor
  • 0 kudos

I found this, but could not make it work: https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html

1 More Replies
SIRIGIRI
by Contributor
  • 893 Views
  • 1 reply
  • 1 kudos

sharikrishna26.medium.com

Difference between “ and ‘ in the Spark DataFrame API. You must tell your compiler that you want to represent a string inside a string by using a different symbol for the inner string. Here is an example: “ Name = “HARI” “ The above is wrong. Why? Because the in...
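As a small illustration of the point above (the filter expression is an assumption based on the example in the post), alternating the quote characters avoids the problem:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("HARI",), ("RAVI",)], ["Name"])

# Wrong: "Name = "HARI"" -- the outer string ends at the second double quote.
# Right: use single quotes for the inner string literal:
df.filter("Name = 'HARI'").show()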

Latest Reply
sher
Valued Contributor II
  • 1 kudos

thanks for sharing

Aj2
by New Contributor III
  • 12512 Views
  • 1 reply
  • 5 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...
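For context, a minimal sketch of a live table in Delta Live Tables (the table and source dataset names are assumptions, and this only runs inside a DLT pipeline):

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Re-materialized whenever its defining query or inputs change")
def daily_totals():
    return (
        dlt.read("raw_orders")  # hypothetical upstream dataset
          .groupBy("order_date")
          .agg(F.sum("amount").alias("total"))
    )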

TariqueAnwer
by New Contributor II
  • 3498 Views
  • 4 replies
  • 3 kudos

PySpark CSV Incorrect Count

B1123451020-502,"","{""m"": {""difference"": 60}}","","","",2022-02-12T15:40:00.783Z
B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z
B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z
B1456741977-123,"","{""...
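A hedged sketch of one common cause of miscounted rows with data like this: quoted fields containing commas, doubled quotes, or newlines get split into extra records unless the reader is told how quotes are escaped (the file path is an assumption):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read
    .option("quote", '"')
    .option("escape", '"')        # treat doubled quotes ("") as escaped quotes
    .option("multiLine", "true")  # in case quoted fields contain newlines
    .csv("/path/to/data.csv")     # hypothetical path
)
print(df.count())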

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Tarique Anwer, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

3 More Replies
irfanaziz
by Contributor II
  • 3056 Views
  • 3 replies
  • 1 kudos

Resolved! What is the difference between passing the schema in the options or using the .schema() function in pyspark for a csv file?

I have observed a very strange behavior with some of our integration pipelines. This week one of the CSV files was getting broken when read with the read function given below: def ReadCSV(files, schema_struct, header, delimiter, timestampformat, encode="utf8...
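For reference, a minimal sketch of the two ways to supply a schema to the CSV reader (the file path and columns are assumptions); as the reply below notes, both end up calling the same API:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema_struct = StructType([
    StructField("id", StringType(), True),
    StructField("ts", TimestampType(), True),
])

# Via the .schema() builder method
df1 = spark.read.schema(schema_struct).option("header", "true").csv("/path/file.csv")

# Via the csv() keyword arguments
df2 = spark.read.csv("/path/file.csv", schema=schema_struct, header=True)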

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @nafri A, what is the error you are getting? Can you share it, please? Like @Hubert Dudek mentioned, both will call the same APIs.

2 More Replies
brickster_2018
by Databricks Employee
  • 2146 Views
  • 1 reply
  • 0 kudos

Resolved! What is the difference between spark.sessionState.catalog.listTables vs spark.catalog.listTables

I see a significant performance difference when calling spark.sessionState.catalog.listTables compared to spark.catalog.listTables. Is that expected?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, hence it's faster. spark.catalog.listTables pulls the column details as well. If the database has many Delta tabl...

User15787040559
by Databricks Employee
  • 3044 Views
  • 1 reply
  • 0 kudos

What's the difference between Normalization and Standardization?

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance).
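A small numeric illustration of both definitions, sketched in plain NumPy:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Normalization (min-max scaling): values land in [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())  # [0. 0.333 0.667 1.]

# Standardization (z-score): mean 0, standard deviation 1
x_std = (x - x.mean()) / x.std()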

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance). A link which explains this better: https://towardsdatascience.com...

aladda
by Databricks Employee
  • 1722 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Coalesce essentially groups multiple partitions into larger partitions. So use coalesce when you want to reduce the number of partitions (and also tasks) without impacting sort order, e.g. when you want to write out a single CSV file output instea...
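A minimal sketch of that single-file write (the output path is an assumption):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)

# coalesce(1) merges existing partitions without a full shuffle,
# so the write produces one part file instead of one per partition.
df.coalesce(1).write.mode("overwrite").csv("/tmp/single_csv_output")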

aladda
by Databricks Employee
  • 3334 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Spark's execution engine is designed to be lazy. In effect, you first build up your analytics/data processing request through a series of transformations, which are then executed by an action. Transformations are the kind of operations which will tran...
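A small illustration of that laziness: transformations only build a plan, and nothing executes until an action runs:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(100)

# Transformations: nothing is executed yet
evens = df.filter(F.col("id") % 2 == 0).withColumn("square", F.col("id") ** 2)

# Action: triggers execution of the whole plan
print(evens.count())  # 50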

aladda
by Databricks Employee
  • 18465 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

%run copies code from another notebook and executes it within the one it's called from. All variables defined in the notebook being called are therefore visible to the caller notebook. dbutils.notebook.run() is more around executing different note...
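For reference, a hedged sketch of the two mechanisms in a Databricks notebook (the notebook paths and the argument name are assumptions):

# %run ./shared_setup
#   -> inlines the other notebook; its variables become visible here.

# dbutils.notebook.run starts a separate run with its own scope; values
# come back only via dbutils.notebook.exit(...) in the child notebook.
result = dbutils.notebook.run("./child_notebook", 60, {"param": "value"})
print(result)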

1 More Replies