Data Engineering

Forum Posts

h_aloha
by New Contributor III
  • 1195 Views
  • 2 replies
  • 0 kudos

Difference of V3 exam for Databricks Certified Data Engineer Associate, comparing with V2

Hi, does anyone know what the difference is between the V3 and V2 exams for the Databricks Certified Data Engineer Associate? It looks like there is no practice exam for V3. Which version covers more content? Thanks, h_aloha

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Helen Morgen, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

1 More Replies
User16790091296
by Contributor II
  • 6173 Views
  • 3 replies
  • 0 kudos
Latest Reply
NubeEra
New Contributor II
  • 0 kudos

Databricks provides 4 main deployment models; they are: Public Cloud Deployment Model: Databricks can be deployed on public cloud platforms such as AWS, Azure, and Google Cloud Platform. This is the most common deployment model for Databricks and provi...

2 More Replies
chanansh
by Contributor
  • 788 Views
  • 2 replies
  • 0 kudos

How to compute a difference over time in Spark Structured Streaming?

I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key (k): v_diff(t) = v(t) - v(t-1) for each k independently. Normally I would write: lag_window = Window.partitionBy(C...

Latest Reply
chanansh
Contributor
  • 0 kudos

I found this but could not make it work https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html

1 More Replies
SIRIGIRI
by Contributor
  • 401 Views
  • 1 reply
  • 1 kudos

sharikrishna26.medium.com

Difference between " and ' in the Spark DataFrame API: you must tell your compiler that you want to represent a string inside a string by using a different symbol for the inner string. Here is an example: " Name = "HARI" ". The above is wrong. Why? Because the in...
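The quoting rule the post describes, sketched in plain Python (the same rule applies inside Spark SQL expression strings, e.g. df.filter("name = 'HARI'")):

```python
# bad = "Name = "HARI""          # breaks: the parser ends the string at the second "
inner_double = 'Name = "HARI"'   # single quotes outside, double quotes inside
inner_single = "Name = 'HARI'"   # double quotes outside, single quotes inside
escaped = "Name = \"HARI\""      # or escape the inner quotes

print(inner_double)  # Name = "HARI"
```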

Latest Reply
sher
Valued Contributor II
  • 1 kudos

thanks for sharing

Aj2
by New Contributor III
  • 8952 Views
  • 1 reply
  • 4 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 4 kudos

A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...

TariqueAnwer
by New Contributor II
  • 1817 Views
  • 5 replies
  • 3 kudos

Pyspark CSV Incorrect Count

B1123451020-502,"","{""m"": {""difference"": 60}}","","","",2022-02-12T15:40:00.783Z B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z B1456741977-123,"","{""...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Tarique Anwer, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

4 More Replies
irfanaziz
by Contributor II
  • 1695 Views
  • 3 replies
  • 1 kudos

Resolved! What is the difference between passing the schema in the options and using the .schema() function in PySpark for a CSV file?

I have observed a very strange behavior with some of our integration pipelines. This week one of the CSV files was getting broken when read with the read function given below. def ReadCSV(files, schema_struct, header, delimiter, timestampformat, encode="utf8...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @nafri A, what is the error you are getting? Can you share it please? Like @Hubert Dudek mentioned, both will call the same APIs.

2 More Replies
Kaniz
by Community Manager
  • 716 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz
Community Manager
  • 0 kudos

The differences are as follows:
- Pig operates on the client side of a cluster, whereas Hive operates on the server side of a cluster.
- Pig uses the Pig Latin language, whereas Hive uses the HiveQL language.
- Pig is a Procedural Data Flow Language, whereas Hive is a ...

Kaniz
by Community Manager
  • 455 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Typically a method is associated with an object and/or class while a function is not. For example, the following class has a single method called "my_method":

class MyClass():
    def __init__(self, a):
        self.a = a

    def my_method(self):
        ...
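The truncated class above, completed as a runnable sketch contrasting a standalone function with a method bound to an instance (the body of my_method is made up for illustration):

```python
# A standalone function: not attached to any class.
def double(x):
    return 2 * x

# A method: defined on a class and receives the instance via self.
class MyClass:
    def __init__(self, a):
        self.a = a

    def my_method(self):
        return self.a + 1

print(double(3))               # 6
print(MyClass(4).my_method())  # 5
```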

User16869510359
by Esteemed Contributor
  • 1345 Views
  • 1 reply
  • 0 kudos

Resolved! What is the difference between spark.sessionState.catalog.listTables vs spark.catalog.listTables

I see a significant performance difference when calling spark.sessionState.catalog.list compared to spark.catalog.list. Is that expected?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, hence it's faster. Whereas spark.catalog.listTables will pull the column details as well. If the database has many Delta tabl...

User15787040559
by New Contributor III
  • 1719 Views
  • 1 reply
  • 0 kudos

What's the difference between Normalization and Standardization?

Normalization typically means rescaling the values into a range of [0,1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance).
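The two rescalings, sketched in plain Python for a single made-up numeric column:

```python
def normalize(xs):
    """Min-max scaling into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Rescale to zero mean and unit (population) variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
print(normalize(data))    # [0.0, 0.33..., 0.66..., 1.0]
print(standardize(data))  # mean 0, variance 1
```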

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Normalization typically means rescaling the values into a range of [0,1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance). A link which explains this better: https://towardsdatascience.com...

aladda
by Honored Contributor II
  • 744 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Coalesce essentially groups multiple partitions into larger partitions. So use coalesce when you want to reduce the number of partitions (and also tasks) without impacting sort order, e.g. when you want to write out a single CSV file output instea...

aladda
by Honored Contributor II
  • 2254 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Spark's execution engine is designed to be lazy. In effect, you first build up your analytics/data processing request through a series of transformations, which are then executed by an action. Transformations are the kind of operations which will tran...

Labels