Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

h_aloha
by New Contributor III
  • 1926 Views
  • 2 replies
  • 0 kudos

Difference between the V3 and V2 exams for the Databricks Certified Data Engineer Associate

Hi, does anyone know the difference between the V3 and V2 exams for the Databricks Certified Data Engineer Associate? It looks like there is no practice exam for V3. Which version covers more material? Thanks, h_aloha

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Helen Morgen, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

1 More Replies
User16790091296
by Contributor II
  • 10119 Views
  • 3 replies
  • 0 kudos
Latest Reply
NubeEra
New Contributor II
  • 0 kudos

Databricks provides four main deployment models. Public Cloud Deployment Model: Databricks can be deployed on public cloud platforms such as AWS, Azure, and Google Cloud Platform. This is the most common deployment model for Databricks and provi...

2 More Replies
chanansh
by Contributor
  • 1631 Views
  • 2 replies
  • 0 kudos

How to compute the difference over time in Spark Structured Streaming?

I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key (k): v_diff(t) = v(t) - v(t-1) for each k independently. Normally I would write: lag_window = Window.partitionBy(C...
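For reference, a minimal batch-style sketch of the lag approach the post describes (the column names k, t, v are assumptions taken from the question). Note that non-time-based window functions such as lag() are not supported on streaming DataFrames, which is why the stateful-processing approach linked in the reply below is relevant:

from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Toy batch data: key k, time t, value v
df = spark.createDataFrame(
    [("a", 1, 10.0), ("a", 2, 13.0), ("b", 1, 5.0), ("b", 2, 4.0)],
    ["k", "t", "v"],
)

# Works in batch; raises an AnalysisException on a streaming DataFrame
lag_window = Window.partitionBy("k").orderBy("t")
df.withColumn("v_diff", F.col("v") - F.lag("v").over(lag_window)).show()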

Latest Reply
chanansh
Contributor
  • 0 kudos

I found this, but could not make it work: https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html

1 More Replies
SIRIGIRI
by Contributor
  • 893 Views
  • 1 reply
  • 1 kudos

sharikrishna26.medium.com

Difference between “ and ‘ in the Spark DataFrame API. You must tell your compiler that you want to represent a string inside a string by using a different symbol for the inner string. Here is an example: “ Name = “HARI” “ The above is wrong. Why? Because the in...
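As a small illustration of the point above (the filter expression is an assumption based on the example in the post), alternating the quote characters avoids the problem:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("HARI",), ("RAVI",)], ["Name"])

# Wrong: "Name = "HARI"" -- the outer string ends at the second double quote.
# Right: use single quotes for the inner string literal:
df.filter("Name = 'HARI'").show()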

Latest Reply
sher
Valued Contributor II
  • 1 kudos

thanks for sharing

Aj2
by New Contributor III
  • 12512 Views
  • 1 reply
  • 5 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...
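For context, a minimal sketch of a live table in Delta Live Tables (the table and source dataset names are assumptions, and this only runs inside a DLT pipeline):

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Re-materialized whenever its defining query or inputs change")
def daily_totals():
    return (
        dlt.read("raw_orders")  # hypothetical upstream dataset
          .groupBy("order_date")
          .agg(F.sum("amount").alias("total"))
    )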

TariqueAnwer
by New Contributor II
  • 3498 Views
  • 4 replies
  • 3 kudos

PySpark CSV Incorrect Count

B1123451020-502,"","{""m"": {""difference"": 60}}","","","",2022-02-12T15:40:00.783Z
B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z
B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z
B1456741977-123,"","{""...
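A hedged sketch of one common cause of miscounted rows with data like this: quoted fields containing commas, doubled quotes, or newlines get split into extra records unless the reader is told how quotes are escaped (the file path is an assumption):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read
    .option("quote", '"')
    .option("escape", '"')        # treat doubled quotes ("") as escaped quotes
    .option("multiLine", "true")  # in case quoted fields contain newlines
    .csv("/path/to/data.csv")     # hypothetical path
)
print(df.count())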

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Tarique Anwer, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

3 More Replies
irfanaziz
by Contributor II
  • 3056 Views
  • 3 replies
  • 1 kudos

Resolved! What is the difference between passing the schema in the options or using the .schema() function in pyspark for a csv file?

I have observed a very strange behavior with some of our integration pipelines. This week one of the CSV files was getting broken when read with the read function given below: def ReadCSV(files, schema_struct, header, delimiter, timestampformat, encode="utf8...
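For reference, a minimal sketch of the two ways to supply a schema to the CSV reader (the file path and columns are assumptions); as the reply below notes, both end up calling the same API:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema_struct = StructType([
    StructField("id", StringType(), True),
    StructField("ts", TimestampType(), True),
])

# Via the .schema() builder method
df1 = spark.read.schema(schema_struct).option("header", "true").csv("/path/file.csv")

# Via the csv() keyword arguments
df2 = spark.read.csv("/path/file.csv", schema=schema_struct, header=True)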

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @nafri A, what is the error you are getting? Can you share it, please? Like @Hubert Dudek mentioned, both will call the same APIs.

2 More Replies
brickster_2018
by Databricks Employee
  • 2146 Views
  • 1 reply
  • 0 kudos

Resolved! What is the difference between spark.sessionState.catalog.listTables vs spark.catalog.listTables

I see a significant performance difference when calling spark.sessionState.catalog.listTables compared to spark.catalog.listTables. Is that expected?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, hence it's faster. spark.catalog.listTables pulls the column details as well. If the database has many Delta tabl...

User15787040559
by Databricks Employee
  • 3044 Views
  • 1 reply
  • 0 kudos

What's the difference between Normalization and Standardization?

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance).
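A small numeric illustration of both definitions, sketched in plain NumPy:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Normalization (min-max scaling): values land in [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())  # [0. 0.333 0.667 1.]

# Standardization (z-score): mean 0, standard deviation 1
x_std = (x - x.mean()) / x.std()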

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance). A link which explains this better: https://towardsdatascience.com...

aladda
by Databricks Employee
  • 1722 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Coalesce essentially groups multiple partitions into larger partitions. So use coalesce when you want to reduce the number of partitions (and also tasks) without impacting sort order, e.g. when you want to write out a single CSV file output instea...
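A minimal sketch of that single-file write (the output path is an assumption):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)

# coalesce(1) merges existing partitions without a full shuffle,
# so the write produces one part file instead of one per partition.
df.coalesce(1).write.mode("overwrite").csv("/tmp/single_csv_output")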

aladda
by Databricks Employee
  • 3334 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Spark's execution engine is designed to be lazy. In effect, you first build up your analytics/data processing request through a series of transformations, which are then executed by an action. Transformations are the kind of operations which will tran...
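A small illustration of that laziness: transformations only build a plan, and nothing executes until an action runs:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(100)

# Transformations: nothing is executed yet
evens = df.filter(F.col("id") % 2 == 0).withColumn("square", F.col("id") ** 2)

# Action: triggers execution of the whole plan
print(evens.count())  # 50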

aladda
by Databricks Employee
  • 18465 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

%run copies code from another notebook and executes it within the one it's called from. All variables defined in the notebook being called are therefore visible to the caller notebook. dbutils.notebook.run() is more around executing different note...
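For reference, a hedged sketch of the two mechanisms in a Databricks notebook (the notebook paths and the argument name are assumptions):

# %run ./shared_setup
#   -> inlines the other notebook; its variables become visible here.

# dbutils.notebook.run starts a separate run with its own scope; values
# come back only via dbutils.notebook.exit(...) in the child notebook.
result = dbutils.notebook.run("./child_notebook", 60, {"param": "value"})
print(result)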

1 More Replies