Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

brickster_2018
by Databricks Employee
  • 4077 Views
  • 2 replies
  • 0 kudos

Resolved! Is the Spark driver a synonym for the Spark Master daemon?

If I understand correctly, the Spark driver is a master process. Is it the same as the Spark Master? I get confused between the Spark Master and the Spark driver.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

This is a common misconception. The Spark Master and the Spark driver are two independent, isolated JVMs running on the same instance. The Spark Master's responsibility is to ensure the Spark workers' daemons are up and running and to monitor their health. Als...

1 More Replies
koantek_user
by New Contributor
  • 1208 Views
  • 1 replies
  • 0 kudos

Lateral view explode in Databricks - need help

We are working on a Snowflake to Databricks migration and encountered Snowflake's LATERAL FLATTEN function, which we tried to convert to LATERAL VIEW EXPLODE in Databricks - but its output is a subset of LATERAL FLATTEN's. http...

Latest Reply
Brahmareddy
Valued Contributor III
  • 0 kudos

Hi AzureSnowflake, I see you're migrating from Snowflake to Databricks and running into some issues with the LATERAL FLATTEN function in Snowflake. Specifically, you're finding that LATERAL VIEW EXPLODE in Databricks isn't providing the full outpu...

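The gap the question describes can be sketched without Spark at all: Snowflake's LATERAL FLATTEN iterates both arrays and object keys and exposes metadata (index, key), while a plain EXPLODE emits only array elements. The plain-Python functions below are illustrative stand-ins, not a real API.

```python
def flatten(value):
    """Mimic Snowflake FLATTEN: rows of (index, key, value)."""
    if isinstance(value, list):
        return [(i, None, v) for i, v in enumerate(value)]
    if isinstance(value, dict):
        return [(None, k, v) for k, v in value.items()]
    return []

def explode(value):
    """Mimic Spark's explode() on an array: elements only, no index/key metadata."""
    return list(value) if isinstance(value, list) else []

data = {"tags": ["a", "b"], "attrs": {"color": "red"}}

# Arrays: explode loses the position (posexplode would keep it).
assert explode(data["tags"]) == ["a", "b"]
assert flatten(data["tags"]) == [(0, None, "a"), (1, None, "b")]

# Objects: explode over an array yields nothing here; FLATTEN still iterates keys.
assert explode(data["attrs"]) == []
assert flatten(data["attrs"]) == [(None, "color", "red")]
```

In Databricks SQL the usual way to close the gap is posexplode for the element index, and explode over a map (or map_entries) for key/value pairs - assuming the Snowflake query relied on FLATTEN's index/key output columns.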
Maatari
by New Contributor III
  • 487 Views
  • 1 replies
  • 0 kudos

Resolved! Pre-partitioning a Delta table to reduce shuffling of wide operations

Assuming I need to perform a groupBy, i.e. an aggregation, on a dataset stored in a Delta table: if the Delta table is partitioned by the field by which to group, can that have an impact on the shuffling that the groupBy would normally cause? As a connecte...

Latest Reply
Brahmareddy
Valued Contributor III
  • 0 kudos

Hi Maatari! How are you doing today? When you group data by a column in a Delta table, Spark typically has to shuffle the data to get all the same values together. But if your Delta table is already partitioned by that same column, the shuffling is muc...

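The intuition in the reply can be shown with a toy model: a groupBy needs every record to end up in the partition that "owns" its key, so the shuffle cost is roughly the number of records sitting in the wrong partition. The partitioner, layouts, and counts below are all illustrative, not how Spark measures this internally.

```python
NUM_PARTITIONS = 2

def owner(key):
    # Deterministic stand-in for a hash partitioner.
    return key % NUM_PARTITIONS

def records_moved(partitions):
    # A record must move if its current partition is not its owner's.
    return sum(
        1
        for pid, records in enumerate(partitions)
        for key, _ in records
        if owner(key) != pid
    )

# Layout A: data already laid out by the grouping key -> nothing moves.
pre_partitioned = [
    [(0, "x"), (2, "y")],   # partition 0 holds only even keys
    [(1, "z"), (3, "w")],   # partition 1 holds only odd keys
]

# Layout B: keys scattered across partitions -> half the records move.
scattered = [
    [(0, "x"), (1, "z")],
    [(2, "y"), (3, "w")],
]

assert records_moved(pre_partitioned) == 0
assert records_moved(scattered) == 2
```

One caveat worth keeping in mind: even with a favorable layout, Spark may still plan an exchange unless the optimizer can prove the distribution matches, so in practice table partitioning mainly helps via pruning and co-located reads rather than guaranteeing a shuffle-free groupBy.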
rameshybr
by New Contributor II
  • 647 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks workflow - executing the workflow concurrently with different input parameters

How can I trigger a workflow concurrently (multiple times) with different input parameters? Please share your thoughts or any related articles.

Latest Reply
Doug-Leal
New Contributor III
  • 1 kudos

See the tutorials/articles below with all the steps to create a workflow and pass parameters to the job/workflow: https://docs.databricks.com/en/jobs/create-run-jobs.html https://docs.databricks.com/en/jobs/parameter-value-references.html https://docs.dat...

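Beyond the linked docs, the usual pattern is to call the Jobs API 2.1 run-now endpoint once per parameter set; each call returns its own run_id, and the runs execute in parallel as long as the job's max_concurrent_runs allows it. The sketch below only builds the request payloads - the workspace host, job id, and parameter names are hypothetical placeholders.

```python
import json

# Jobs API 2.1 endpoint for triggering a run (placeholder host).
RUN_NOW_URL = "https://<workspace-host>/api/2.1/jobs/run-now"

def run_now_payload(job_id, job_parameters):
    """Build the JSON body for one run-now call."""
    return json.dumps({"job_id": job_id, "job_parameters": job_parameters})

# One payload per parameter set; POSTing each (e.g. with `requests` plus a
# bearer token) yields a distinct run_id, so the runs proceed concurrently.
payloads = [
    run_now_payload(1234, {"region": region})
    for region in ("us-east", "eu-west", "ap-south")
]

assert json.loads(payloads[0]) == {"job_id": 1234, "job_parameters": {"region": "us-east"}}
assert len(payloads) == 3
```

Remember to raise max_concurrent_runs in the job settings first; with the default of 1, the extra triggers get queued or skipped instead of running in parallel.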
Rishabh-Pandey
by Esteemed Contributor
  • 1359 Views
  • 1 replies
  • 3 kudos

Key Advantages of Serverless Compute in Databricks

Serverless compute in Databricks offers several advantages, enhancing efficiency, scalability, and ease of use. Here are some key benefits: 1. Simplified Infrastructure Management - No Server Management: Users don't need to manage or configure servers or...

Latest Reply
Ashu24
Contributor
  • 3 kudos

Thanks for the clear explanation.

Prashanth24
by New Contributor III
  • 1242 Views
  • 4 replies
  • 4 kudos

Databricks worker node - would like to know the amount of memory per core

Under Databricks Compute and Worker nodes, we find different node types, as below:
Standard_D4ds_v5 => 16 GB Memory, 4 Cores
Standard_D8ds_v5 => 32 GB Memory, 8 Cores
In Databricks, each node will have one executor. I have the questions below: (1) How much ...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 4 kudos

3. If there is any background process, what are all those activities? Background processes in Databricks include several key activities: Cluster Management: Databricks manages the cluster's lifecycle, including starting, stopping, and scaling up or dow...

3 More Replies
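The raw memory-per-core ratio for the two node types in the question is simple arithmetic, sketched below. Note the executor never sees the full node memory: the OS, Databricks daemons, and off-heap overhead take a share first, and that split varies by runtime and instance type, so these figures are only the upper bound.

```python
# Node specs copied from the question.
node_types = {
    "Standard_D4ds_v5": {"memory_gb": 16, "cores": 4},
    "Standard_D8ds_v5": {"memory_gb": 32, "cores": 8},
}

# Raw GB per core (before any OS/daemon/overhead reservation).
per_core = {
    name: spec["memory_gb"] / spec["cores"]
    for name, spec in node_types.items()
}

assert per_core["Standard_D4ds_v5"] == 4.0   # 16 GB / 4 cores
assert per_core["Standard_D8ds_v5"] == 4.0   # 32 GB / 8 cores
```

Both the Dds_v5 sizes keep the same 4 GB-per-core ratio, which is why moving between them changes parallelism and total memory but not the memory available to each task slot.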
Pritam
by New Contributor II
  • 3662 Views
  • 4 replies
  • 1 kudos

Not able to create a job via the Jobs API in Databricks

I am not able to create jobs via the Jobs API in Databricks. Error=INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it. Loaded the same JSON file and tried to create the job via the API, but got the above erro...

Latest Reply
rAlex
New Contributor III
  • 1 kudos

@Pritam Arya​ I had the same problem today. To use the JSON that you can get from the GUI for an existing job in a request to the Jobs API, use just the JSON that is the value of the settings key.

3 More Replies
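The fix in the reply comes down to one unwrap step: the JSON exported from the GUI (or returned by the jobs get endpoint) nests the job spec under a "settings" key, while jobs/create expects that spec at the top level. Posting the wrapper as-is triggers exactly the "Job settings must be specified" error. The job JSON below is an abbreviated, made-up example.

```python
import json

# Shape of the JSON copied from the GUI / returned by jobs get (abbreviated).
exported = {
    "job_id": 9876,
    "created_time": 1700000000000,
    "settings": {
        "name": "nightly-etl",
        "max_concurrent_runs": 1,
        "tasks": [],  # task definitions elided
    },
}

# Unwrap before POSTing to /api/2.1/jobs/create.
create_payload = json.dumps(exported["settings"])

assert json.loads(create_payload)["name"] == "nightly-etl"
assert "job_id" not in json.loads(create_payload)   # wrapper fields stay out
```

Dropping job_id and the other read-only wrapper fields also matters: create assigns a new job_id itself and rejects payloads that try to supply one.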
c-thiel
by New Contributor
  • 208 Views
  • 0 replies
  • 0 kudos

APPLY INTO: high date instead of NULL for __END_AT

I really like the APPLY INTO function for keeping track of changes and historizing them in SCD2. However, I am a bit confused that current records get an __END_AT of NULL. Typically, __END_AT should be a high date (i.e. 9999-12-31) or similar, so that a poin...

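A common workaround for the situation described above is to treat a NULL __END_AT as "still current" by coalescing it to a high date at query time, so BETWEEN-style point-in-time lookups work; in SQL this is just COALESCE(__END_AT, DATE'9999-12-31') in a view over the SCD2 table. A minimal sketch of the same logic, with column names mirroring the question's convention:

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)

def effective_end(end_at):
    """NULL (None) on the current row means open-ended -> high date."""
    return end_at if end_at is not None else HIGH_DATE

def version_active_on(row, as_of):
    """Point-in-time check against one SCD2 row."""
    return row["__START_AT"] <= as_of <= effective_end(row["__END_AT"])

history = [
    {"id": 1, "__START_AT": date(2023, 1, 1), "__END_AT": date(2023, 6, 30)},
    {"id": 1, "__START_AT": date(2023, 7, 1), "__END_AT": None},  # current row
]

assert version_active_on(history[0], date(2023, 5, 1))
assert not version_active_on(history[1], date(2023, 5, 1))
assert version_active_on(history[1], date(2024, 1, 1))   # current row matches any later date
```

Keeping the physical column NULL and coalescing in a view also avoids rewriting the current row when a new version eventually closes it.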
m997al
by Contributor III
  • 530 Views
  • 2 replies
  • 0 kudos

Cannot use Databricks REST API for secrets "get" inside bash script (byte format only)

Hi, I am trying to use Databricks-backed secret scopes inside Azure DevOps pipelines. I am almost successful. I can use the REST API to "get" a secret value back inside my bash script, but the value is in byte format, so it is unusable as a local var...

Latest Reply
m997al
Contributor III
  • 0 kudos

I wanted to add an addendum to this. So in Azure DevOps, when working with YAML files, you can use the Azure DevOps pipelines "Library" to load environment variables. When you look at those environment variables in the Azure DevOps pipeline librar...

1 More Replies
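The "byte format" in the original question is, as far as I know, base64: the secrets get endpoint (/api/2.0/secrets/get) returns the secret value base64-encoded in its value field, so it has to be decoded before use (in bash, roughly `jq -r .value | base64 --decode`). The sketch below mocks the API response rather than calling it; the key name and secret are made up.

```python
import base64
import json

# Mocked shape of a /api/2.0/secrets/get response (no real API call here).
mock_response = json.dumps({
    "key": "my-secret",
    "value": base64.b64encode(b"s3cr3t!").decode(),  # API returns base64 text
})

# Decode the value field to recover the usable plain string.
decoded = base64.b64decode(json.loads(mock_response)["value"]).decode()

assert decoded == "s3cr3t!"
```

If the decoded value still needs to live in a pipeline variable, remember to mark it secret in Azure DevOps so it is masked in logs.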
Prashanth24
by New Contributor III
  • 905 Views
  • 3 replies
  • 4 kudos

Min and max node counts for processing 5 TB of data

I need to ingest a full load of 5 TB of data, applying business transformations, and want to process it in 2-3 hours. What criteria need to be considered when selecting the min and max worker node counts for this full-load processing?

Latest Reply
joeharris76
New Contributor II
  • 4 kudos

Need more details about the workload to fully advise, but generally speaking:
  • use the latest generation of cloud instances
  • enable Unity Catalog
  • enable Photon
If the source data is raw CSV then the load should scale linearly. For example, if 64 nodes comp...

2 More Replies
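In the spirit of the linear-scaling point above, a rough sizing estimate is nodes = total data / (per-node throughput × time budget). The throughput figure below is a placeholder assumption, not a Databricks benchmark - measure it on a small sample of your own data and transformations first.

```python
import math

def nodes_needed(data_tb, hours_budget, tb_per_node_hour):
    """Linear-scaling estimate: round up to whole nodes."""
    return math.ceil(data_tb / (tb_per_node_hour * hours_budget))

# Hypothetical: one node processes ~0.1 TB/hour through the transformations.
assert nodes_needed(5.0, 2.0, 0.1) == 25   # tight 2-hour window
assert nodes_needed(5.0, 3.0, 0.1) == 17   # relaxed 3-hour window
```

For autoscaling, a reasonable starting point is to set the max near the tight-budget estimate and the min well below it, letting the cluster shrink once the heavy ingest stages finish.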
Maatari
by New Contributor III
  • 226 Views
  • 0 replies
  • 0 kudos

Reading a partitioned table in Spark Structured Streaming

Does the pre-partitioning of a Delta table have an influence on the number of "default" partitions of a DataFrame when reading the data? Put differently, using Spark Structured Streaming, when reading from a Delta table, is the number of DataFrame par...

Maatari
by New Contributor III
  • 188 Views
  • 0 replies
  • 0 kudos

Chaining stateful operators

I would like to do a groupBy followed by a join in Structured Streaming. I would read from two Delta tables in snapshot mode, i.e. latest snapshot. My question is specifically about chaining the stateful operators. groupBy is update mode; chaining grou...

Paxi
by New Contributor
  • 302 Views
  • 0 replies
  • 0 kudos

Maven libs often failed during installation

Dear Community, I have a Databricks compute where I added 2 Maven libs using a custom repository from Nexus (because of a company policy, Databricks cannot communicate with the public internet, so I must use a private Nexus repo using a firewall). Sin...

udi_azulay
by New Contributor II
  • 812 Views
  • 2 replies
  • 1 kudos

Variant type table within DLT

Hi, I have a table with the Variant type (preview), which works well on 15.3. When I try to run code that references this Variant type in a DLT pipeline, I get: com.databricks.sql.transaction.tahoe.DeltaUnsupportedTableFeatureException: [DELTA_UNSUPPORTED_F...

Latest Reply
thomas-totter
New Contributor II
  • 1 kudos

The Preview channel version is currently at 15.2, so we should be only one minor version increment away from Variant being available in DLT (at least I hope so...).

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group