Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

h_aloha
by New Contributor III
  • 1774 Views
  • 2 replies
  • 0 kudos

Difference between the V3 and V2 exams for the Databricks Certified Data Engineer Associate

Hi, does anyone know the difference between the V3 and V2 exams for the Databricks Certified Data Engineer Associate? It looks like there is no practice exam for V3. Which version covers more material? Thanks, h_aloha

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Helen Morgen, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

1 More Replies
User16790091296
by Contributor II
  • 8923 Views
  • 3 replies
  • 0 kudos
Latest Reply
NubeEra
New Contributor II
  • 0 kudos

Databricks provides four main deployment models. They are: Public Cloud Deployment Model: Databricks can be deployed on public cloud platforms such as AWS, Azure, and Google Cloud Platform. This is the most common deployment model for Databricks and provi...

2 More Replies
chanansh
by Contributor
  • 1447 Views
  • 2 replies
  • 0 kudos

How to compute a difference over time in Spark Structured Streaming?

I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key (k): v_diff(t) = v(t) - v(t-1) for each k independently. Normally I would write: lag_window = Window.partitionBy(C...

Latest Reply
chanansh
Contributor
  • 0 kudos

I found this, but could not make it work: https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html

1 More Replies
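The per-key difference described in the question can be sketched in plain Python (a conceptual illustration of what `Window.partitionBy(k).orderBy(t)` with `lag(v, 1)` computes in batch mode; the function and variable names are illustrative, and this is not Spark code):

```python
def diff_over_time(rows):
    """Compute v_diff(t) = v(t) - v(t-1) per key k.

    rows: iterable of (k, t, v) tuples. Returns (k, t, v_diff)
    with None for each key's first observation, mirroring what a
    lag-over-window produces in batch Spark.
    """
    last = {}  # per-key previous value: the "state" a streaming
               # job would have to carry between micro-batches
    out = []
    for k, t, v in sorted(rows, key=lambda r: (r[0], r[1])):
        prev = last.get(k)
        out.append((k, t, None if prev is None else v - prev))
        last[k] = v
    return out

print(diff_over_time([("a", 1, 10), ("a", 2, 13), ("b", 1, 5)]))
# -> [('a', 1, None), ('a', 2, 3), ('b', 1, None)]
```

Non-time-based window functions such as `lag` are not supported on streaming DataFrames, which is why the linked blog post reaches for arbitrary stateful processing: the `last` dictionary above is essentially the per-key state such a job has to maintain across micro-batches.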
SIRIGIRI
by Contributor
  • 786 Views
  • 1 reply
  • 1 kudos

sharikrishna26.medium.com

Difference between “ and ‘ in the Spark DataFrame API. You must tell your compiler that you want to represent a string inside a string by using a different symbol for the inner string. Here is an example: “ Name = “HARI” “. The above is wrong. Why? Because the in...

Latest Reply
sher
Valued Contributor II
  • 1 kudos

thanks for sharing

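The quoting rule from the linked post can be demonstrated in plain Python, which has the same inner/outer-symbol constraint as the SQL expression strings passed to the DataFrame API (an illustrative sketch):

```python
# Wrong: the parser ends the string at the second double quote.
#   expr = "Name = "HARI""      # SyntaxError
# Right: use a different symbol for the inner string ...
expr_single_inner = "Name = 'HARI'"
# ... or escape the inner quotes:
expr_escaped = "Name = \"HARI\""

# In PySpark, this is the usual pattern for SQL snippets, e.g.:
#   df.filter("Name = 'HARI'")   # inner string quoted with '
print(expr_single_inner)   # Name = 'HARI'
print(expr_escaped)        # Name = "HARI"
```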
Aj2
by New Contributor III
  • 11579 Views
  • 1 reply
  • 5 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...

TariqueAnwer
by New Contributor II
  • 3121 Views
  • 5 replies
  • 3 kudos

PySpark CSV Incorrect Count

B1123451020-502,"","{""m"": {""difference"": 60}}","","","",2022-02-12T15:40:00.783Z B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z B1456741977-123,"","{""...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Tarique Anwer, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...

4 More Replies
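A likely cause of incorrect counts with quoted data like the sample above is newlines or delimiters embedded inside quoted fields: a naive line count sees more rows than a CSV parser does. A stdlib sketch of the mismatch (the data here is hypothetical, modelled on the snippet):

```python
import csv
import io

raw = 'id,payload\n1,"{""m"": {""difference"": 60}}"\n2,"line one\nline two"\n'

naive_rows = raw.strip().split("\n")           # counts the embedded newline
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(len(naive_rows))     # 4 physical lines, but ...
print(len(parsed_rows))    # ... only 3 CSV records (header + 2 rows)
```

In PySpark, reading with `spark.read.option("multiLine", True).csv(path)` lets records span physical lines; this is a guess at the original issue, but the quoted-field mismatch is the classic reason `df.count()` disagrees with a raw line count.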
irfanaziz
by Contributor II
  • 2761 Views
  • 3 replies
  • 1 kudos

Resolved! What is the difference between passing the schema in the options or using the .schema() function in pyspark for a csv file?

I have observed a very strange behavior with some of our integration pipelines. This week one of the CSV files was getting broken when read with the read function given below. def ReadCSV(files, schema_struct, header, delimiter, timestampformat, encode="utf8...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @nafri A, what is the error you are getting? Can you share it, please? As @Hubert Dudek mentioned, both will call the same APIs.

2 More Replies
Kaniz_Fatma
by Community Manager
  • 1790 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

The differences are as follows: Pig operates on the client side of a cluster, whereas Hive operates on the server side. Pig uses the Pig Latin language, whereas Hive uses the HiveQL language. Pig is a procedural data-flow language, whereas Hive is a ...

Kaniz_Fatma
by Community Manager
  • 796 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Typically a method is associated with an object and/or class, while a function is not. For example, the following class has a single method called "my_method": class MyClass(): def __init__(self, a): self.a = a def my_method(self): ...

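The truncated snippet above can be completed into a runnable sketch (the class and method names follow the reply; `my_function` is added for contrast):

```python
class MyClass:
    def __init__(self, a):
        self.a = a

    def my_method(self):       # a method: bound to the instance
        return self.a * 2

def my_function(x):            # a function: stands alone
    return x * 2

obj = MyClass(21)
print(obj.my_method())    # called through the object -> 42
print(my_function(21))    # called directly -> 42
```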
brickster_2018
by Esteemed Contributor
  • 1986 Views
  • 1 replies
  • 0 kudos

Resolved! What is the difference between spark.sessionState.catalog.listTables vs spark.catalog.listTables

I see a significant performance difference when calling spark.sessionState.catalog.listTables compared to spark.catalog.listTables. Is that expected?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, hence it is faster. catalog.listTables, by contrast, pulls the column details as well. If the database has many Delta tabl...

User15787040559
by New Contributor III
  • 2601 Views
  • 1 reply
  • 0 kudos

What's the difference between Normalization and Standardization?

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance).

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance). A link which explains this better: https://towardsdatascience.com...

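Both rescalings can be written in a few lines of stdlib Python (the function names are illustrative; `pstdev` is the population standard deviation):

```python
from statistics import mean, pstdev

def normalize(xs):
    """Min-max rescale into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

data = [2, 4, 6, 8]
print(normalize(data))     # values squeezed into [0, 1]
print(standardize(data))   # values centred on 0, unit variance
```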
aladda
by Honored Contributor II
  • 1495 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Coalesce essentially groups multiple partitions into larger partitions. So use coalesce when you want to reduce the number of partitions (and also tasks) without impacting sort order. Ex: when you want to write out a single CSV file output instea...

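Why coalesce preserves sort order can be modelled in plain Python: it only concatenates neighbouring partitions, with no shuffle (a conceptual model, not Spark's actual implementation):

```python
def coalesce(partitions, n):
    """Merge a list of partitions down to n partitions by
    concatenating contiguous neighbours, preserving both the
    order within and between partitions -- a rough model of
    Spark's shuffle-free coalesce."""
    if n >= len(partitions):
        return partitions
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # contiguous runs of old partitions map to each new one
        out[i * n // len(partitions)].extend(part)
    return out

parts = [[1, 2], [3, 4], [5], [6, 7]]
print(coalesce(parts, 2))   # [[1, 2, 3, 4], [5, 6, 7]]
```

Note that the flattened element order is unchanged; a repartition, which shuffles, offers no such guarantee.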
aladda
by Honored Contributor II
  • 3091 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Spark's execution engine is designed to be lazy. In effect, you first build up your analytics/data-processing request through a series of transformations, which are then executed by an action. Transformations are the kind of operations that will tran...

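The lazy transformations-then-action model can be mimicked with Python generators (an analogy only, not Spark code):

```python
nums = range(1_000_000)

# "Transformations": nothing is computed yet, just a plan.
doubled = (x * 2 for x in nums)
evens = (x for x in doubled if x % 4 == 0)

# "Action": pulling values finally triggers the computation.
first_three = [next(evens) for _ in range(3)]
print(first_three)   # [0, 4, 8]
```

As with Spark, only the elements the action actually needs are ever computed; the million-element range is never fully materialized.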