Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Srikanth_Gupta_
by Valued Contributor
  • 917 Views
  • 1 reply
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

coalesce avoids a full shuffle and can be used to decrease the number of partitions.
repartition results in a full shuffle and can be used to increase or decrease the number of partitions.
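
To make the difference concrete, here is a minimal sketch that compares partition counts after each call. It assumes Spark 3.x is on the classpath and uses a throwaway local session. The object name, the `partitionCounts` helper, and the partition numbers are made up for illustration; on Databricks you would use the existing `spark` session instead of building one.

```scala
import org.apache.spark.sql.SparkSession

object PartitionCountSketch {
  // Returns partition counts: (initial, after coalesce(2), after coalesce(16), after repartition(16)).
  def partitionCounts(spark: SparkSession): (Int, Int, Int, Int) = {
    // Start from exactly 8 partitions so both directions can be shown.
    val df = spark.range(0, 1000).repartition(8)

    // coalesce merges existing partitions without a full shuffle, so it can only lower the count.
    val fewer = df.coalesce(2)

    // Asking coalesce for more partitions than exist leaves the count where it was.
    val notMore = df.coalesce(16)

    // repartition performs a full shuffle and can raise (or lower) the count.
    val more = df.repartition(16)

    (df.rdd.getNumPartitions, fewer.rdd.getNumPartitions,
      notMore.rdd.getNumPartitions, more.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("partition-count-sketch")
      .getOrCreate()

    val (initial, coalesced, coalescedUp, repartitioned) = partitionCounts(spark)
    println(s"initial=$initial coalesce(2)=$coalesced coalesce(16)=$coalescedUp repartition(16)=$repartitioned")

    spark.stop()
  }
}
```

A common rule of thumb that follows from this: use coalesce to shrink the number of output files before a write, and repartition when you need more parallelism or a more even distribution of rows.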

User16776430979
by New Contributor III
  • 2162 Views
  • 1 reply
  • 0 kudos

Repos branch control – how can we configure a job to run a specific branch?

For example, how can we ensure our jobs always run off the main/master branch?

Latest Reply
User16781336501
New Contributor III
  • 0 kudos

We recommend having a top-level folder to run jobs against. Best practices are detailed here: https://docs.databricks.com/repos.html#best-practices-for-integrating-repos-with-cicd-workflows

User16830818469
by New Contributor
  • 1553 Views
  • 2 replies
  • 0 kudos

Repos integration

Does Repos work with on-prem/enterprise Bitbucket?

Latest Reply
User16781336501
New Contributor III
  • 0 kudos

If you have a private Git server (e.g. behind a VPN or an IP whitelist), you will need to be enrolled in the Git proxy private preview to use Repos. Please contact your account team.

User16826994223
by Honored Contributor III
  • 787 Views
  • 1 reply
  • 0 kudos

Timestamp changes in Spark SQL

Hi Team, is there a way to change the current timestamp from the current time zone to a different time zone?

Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

import sqlContext.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DataTypes
inputDF.select(
  unix_timestamp($"unix_timestamp").alias("unix_timestamp"),
  from_utc_timestamp($"unix_timestamp".cast(DataTypes.TimestampType), "UTC").alias("UTC"),
  from_utc_tim...
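
Since the snippet above is cut off, here is a separate minimal sketch of the same idea, not a reconstruction of the original answer. It takes the current timestamp and shifts it into other time zones with `from_utc_timestamp`. It assumes Spark 3.x on the classpath. The object name, the `withZone` helper, the column names, and the two target zones are illustrative choices.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, current_timestamp, from_utc_timestamp}

object TimeZoneSketch {
  // Adds `outCol`: the UTC timestamp in `utcCol` rendered as wall-clock time in `zone`.
  def withZone(df: DataFrame, utcCol: String, zone: String, outCol: String): DataFrame =
    df.withColumn(outCol, from_utc_timestamp(col(utcCol), zone))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("timezone-sketch")
      .getOrCreate()

    // Pin the session time zone to UTC so the displayed input really is UTC wall-clock time.
    spark.conf.set("spark.sql.session.timeZone", "UTC")

    // One-row DataFrame holding the current timestamp.
    val now = spark.range(1).select(current_timestamp().alias("now_utc"))

    // Shift the same instant into two example zones (arbitrary picks).
    val kolkata = withZone(now, "now_utc", "Asia/Kolkata", "now_kolkata")
    val both = withZone(kolkata, "now_utc", "America/Los_Angeles", "now_los_angeles")

    both.show(truncate = false)
    spark.stop()
  }
}
```

If the source column holds wall-clock time in some other zone rather than UTC, `to_utc_timestamp` performs the opposite conversion.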

User16826994223
by Honored Contributor III
  • 902 Views
  • 1 reply
  • 0 kudos

I understand Spark Streaming uses micro-batching. Does this increase latency?


Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

While Spark does use a micro-batch execution model, this does not have much impact on applications, because the batches can be as short as 0.5 seconds. In most applications of streaming big data, the analytics is done over a larger window (say 10 min...

User16826994223
by Honored Contributor III
  • 451 Views
  • 0 replies
  • 0 kudos

Why Unity Catalog?

Fine-grained permissions: Unity Catalog can enforce permissions for data at the row, column, or view level instead of the file level, so that you can always share just part of your data with a new user without copying it. An open, ...

User16826994223
by Honored Contributor III
  • 1275 Views
  • 1 reply
  • 0 kudos

Resolved! What are the join hints available in Spark 3.0, and how do they help compared to previous Spark versions?

Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

There are 4 types of join hints in Spark 3.0:
  • BROADCAST
  • MERGE
  • SHUFFLE_HASH
  • SHUFFLE_REPLICATE_NL
It may be a good idea to enable Adaptive Query Execution, which speeds up Spark SQL joins at run time.
In Spark 3.0, Adaptive Query Execution comes with the following features:
Dynamica...
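
As a rough illustration of how these hints are applied, the sketch below attaches each hint to one side of a join through the DataFrame API and once through SQL comment syntax, and switches on Adaptive Query Execution. It assumes Spark 3.x on the classpath. The object name, the `hintedJoin` helper, and the `orders`/`customers` tables are hypothetical. A hint is a request to the planner, and it may be ignored when the requested strategy does not apply.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinHintSketch {
  // Joins on customer_id, asking the planner to use `hint` for the customers side.
  def hintedJoin(orders: DataFrame, customers: DataFrame, hint: String): DataFrame =
    orders.join(customers.hint(hint), Seq("customer_id"))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("join-hint-sketch")
      .getOrCreate()
    import spark.implicits._

    // Adaptive Query Execution is off by default in Spark 3.0 (on by default from 3.2).
    spark.conf.set("spark.sql.adaptive.enabled", "true")

    // Hypothetical toy tables.
    val orders = Seq((1, "widget"), (2, "gadget"), (1, "gizmo")).toDF("customer_id", "item")
    val customers = Seq((1, "Ada"), (2, "Grace")).toDF("customer_id", "name")

    // Print the physical plan chosen for each of the four hints.
    for (hint <- Seq("broadcast", "merge", "shuffle_hash", "shuffle_replicate_nl")) {
      println(s"--- hint: $hint ---")
      hintedJoin(orders, customers, hint).explain()
    }

    // The same request expressed as a SQL hint comment.
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")
    spark.sql(
      """SELECT /*+ BROADCAST(c) */ o.item, c.name
        |FROM orders o JOIN customers c ON o.customer_id = c.customer_id""".stripMargin
    ).explain()

    spark.stop()
  }
}
```

Comparing the printed plans is the quickest way to confirm whether a given hint was honored.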

User16826994223
by Honored Contributor III
  • 1240 Views
  • 1 reply
  • 0 kudos

How is the Photon engine different from the Catalyst optimizer?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

I got this question from some customers and I want to clarify here too. I think we are conflating two things: the Catalyst optimizer is about coming up with the "steps to take to execute the query". For example, the optimizer will decide how and when to do the join...

User16826994223
by Honored Contributor III
  • 436 Views
  • 0 replies
  • 0 kudos

databricks.com

Can I create a Delta Lake table on Databricks and query it with open-source Spark? Yes. To do this, you would install open-source Spark and Delta Lake; both are open source. Delta Engine, which is only available on Databricks, will make delta...

User16826994223
by Honored Contributor III
  • 427 Views
  • 0 replies
  • 0 kudos

Will the data scientist job profile be relevant in the future?

Seeing current features in Databricks like AutoML, I am assuming that the data scientist job will be mostly automated, and soon the data scientists in the company will start decl...

User16826994223
by Honored Contributor III
  • 456 Views
  • 0 replies
  • 0 kudos

Spark 3.0 Pandas UDF: old vs. new Pandas UDF interface

This slide shows the difference between the old and the new interface. The same here. The new interface can also be used for the existing Grouped Aggregate Pandas UDFs. In addition, the old Pandas U...

User16826994223
by Honored Contributor III
  • 587 Views
  • 0 replies
  • 0 kudos

Cluster sizes on DB SQL

Cluster size   Driver size    Worker count
2X-Small       i3.2xlarge     1
X-Small        i3.2xlarge     2
Small          i3.4xlarge     4
Medium         i3.8xlarge     8
Large          i3.8xlarge     16
X-Large        i3.16xlarge    32
2X-Large       i3.16xlarge    ...

User16826994223
by Honored Contributor III
  • 691 Views
  • 0 replies
  • 0 kudos

Multi-cluster load balancing

Multi-cluster Load Balancing: the minimum and maximum number of clusters over which queries sent to the endpoint are distributed. The default is Off with a maximum of 1 cluster. When set to On, the default is minimum 1 clus...

