Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 4841 Views
  • 2 replies
  • 0 kudos

Resolved! Is Spark Driver a synonym for Spark Master daemon?

If I understand correctly, the Spark driver is a master process. Is it the same as the Spark Master? I get confused between the Spark master and the Spark driver.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

This is a common misconception. The Spark Master and the Spark driver are two independent, isolated JVMs running on the same instance. The Spark Master's responsibility is to ensure the Spark workers' daemons are up and running and to monitor their health. Als...

1 More Replies
Rishabh-Pandey
by Esteemed Contributor
  • 4430 Views
  • 1 replies
  • 3 kudos

Key Advantages of Serverless Compute in Databricks

Serverless compute in Databricks offers several advantages, enhancing efficiency, scalability, and ease of use. Here are some key benefits: 1. Simplified Infrastructure Management - No Server Management: users don't need to manage or configure servers or...

Latest Reply
Ashu24
Contributor
  • 3 kudos

Thanks for the clear understanding 

Pritam
by New Contributor II
  • 4758 Views
  • 4 replies
  • 1 kudos

Not able to create a job via the Jobs API in Databricks

I am not able to create jobs via the Jobs API in Databricks. Error=INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it. Loaded the same JSON file and tried to create the job via the API, but got the above erro...

Latest Reply
rAlex
New Contributor III
  • 1 kudos

@Pritam Arya I had the same problem today. In order to use the JSON that you can get from the GUI for an existing job in a request to the Jobs API, you want to use just the JSON that is the value of the settings key.

3 More Replies
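The fix described above can be sketched in Python: the response from the Jobs "get" endpoint wraps the job spec in a `settings` key, and the "create" endpoint expects that spec at the top level. The helper below is an illustrative sketch, not an official client; the endpoint paths follow the Jobs API 2.1 convention.

```python
import json
import urllib.request

def extract_create_payload(job_json: dict) -> dict:
    """The GET /api/2.1/jobs/get response nests the job spec under
    'settings'; POST /api/2.1/jobs/create expects that spec directly."""
    if "settings" not in job_json:
        raise ValueError("Job settings must be specified")
    return job_json["settings"]

def create_job(host: str, token: str, job_json: dict) -> bytes:
    """Hypothetical wrapper: POST just the settings to the create endpoint."""
    payload = json.dumps(extract_create_payload(job_json)).encode()
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/create",
        data=payload,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req).read()

# The GET response nests the spec under "settings"; send only that part.
get_response = {"job_id": 123, "settings": {"name": "my_job", "tasks": []}}
assert extract_create_payload(get_response) == {"name": "my_job", "tasks": []}
```

Posting the whole GET response (including `job_id`, `created_time`, etc.) is what triggers the "Job settings must be specified" error.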
c-thiel
by New Contributor
  • 843 Views
  • 0 replies
  • 0 kudos

APPLY INTO high date instead of NULL for __END_AT

I really like the APPLY INTO function to keep track of changes and historize them in SCD2. However, I am a bit confused that current records get an __END_AT of NULL. Typically, __END_AT should be a high date (i.e. 9999-12-31) or similar, so that a poin...

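Since current rows come back with a NULL `__END_AT`, a point-in-time lookup has to coalesce that NULL to a high date itself. A minimal pure-Python sketch of the semantics (the sentinel date and the row layout are assumptions for illustration; in SQL this would be a `COALESCE(__END_AT, '9999-12-31')`):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed sentinel; pick your own convention

def effective_end(end_at):
    """Treat a NULL (None) __END_AT as the high date for range comparisons."""
    return end_at if end_at is not None else HIGH_DATE

def as_of(records, point_in_time):
    """Point-in-time lookup over SCD2 rows of (key, start_at, end_at, value)."""
    return [r for r in records if r[1] <= point_in_time < effective_end(r[2])]

history = [
    ("A", date(2020, 1, 1), date(2021, 1, 1), "v1"),
    ("A", date(2021, 1, 1), None, "v2"),  # current row: __END_AT is NULL
]
assert as_of(history, date(2022, 6, 1)) == [("A", date(2021, 1, 1), None, "v2")]
```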
m997al
by Contributor III
  • 889 Views
  • 2 replies
  • 0 kudos

Cannot use Databricks REST API for secrets "get" inside bash script (byte format only)

Hi, I am trying to use Databricks-backed secret scopes inside Azure DevOps pipelines. I am almost successful. I can use the REST API to "get" a secret value back inside my bash script, but the value is in byte format, so it is unusable as a local var...

Latest Reply
m997al
Contributor III
  • 0 kudos

I wanted to add an addendum to this. So in Azure DevOps, when working with YAML files, you can use the Azure DevOps Pipelines "Library" to load environment variables. When you look at those environment variables in the Azure DevOps pipeline librar...

1 More Replies
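The "byte format" the poster hits is base64: the Secrets API's "get" endpoint returns the secret value base64-encoded. A sketch of the decode step in Python (the thread itself works in bash, where piping through `base64 --decode` achieves the same thing):

```python
import base64
import json

def decode_secret(response_text: str) -> str:
    """GET /api/2.0/secrets/get returns {"key": ..., "value": <base64>};
    decode the value before exporting it as a pipeline variable."""
    payload = json.loads(response_text)
    return base64.b64decode(payload["value"]).decode("utf-8")

# Simulated response body from the secrets "get" endpoint.
resp = json.dumps({"key": "my-key",
                   "value": base64.b64encode(b"s3cret").decode()})
assert decode_secret(resp) == "s3cret"
```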
Prashanth24
by New Contributor III
  • 1856 Views
  • 4 replies
  • 4 kudos

Min and max node counts for processing 5 TB of data

I need to ingest a full load of 5 TB of data, applying business transformations, and want to process it in 2-3 hours. What criteria need to be considered when selecting min and max worker nodes for this full-load processing?

Latest Reply
joeharris76
New Contributor II
  • 4 kudos

Need more details about the workload to fully advise, but generally speaking: use the latest generation of cloud instances; enable Unity Catalog; enable Photon. If the source data is raw CSV then the load should scale linearly. For example, if 64 nodes comp...

3 More Replies
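The linear-scaling point in the reply can be turned into a quick sizing estimate: if node-hours stay roughly constant, doubling the nodes halves the wall-clock time. The reference numbers below are made up for illustration; real throughput depends on instance type, file layout, and transformation cost.

```python
import math

def nodes_for_target(ref_nodes: int, ref_hours: float, target_hours: float) -> int:
    """Assuming linear scaling, node-hours are constant:
    ref_nodes * ref_hours == target_nodes * target_hours."""
    return math.ceil(ref_nodes * ref_hours / target_hours)

# If a benchmark run of 64 nodes finished the 5 TB load in 6 hours,
# hitting a 2-3 hour window would need roughly:
assert nodes_for_target(64, 6, 3) == 128
assert nodes_for_target(64, 6, 2) == 192
```

In practice you would set autoscaling min/max around such an estimate and verify with a benchmark run on a data sample.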
Maatari
by New Contributor III
  • 935 Views
  • 0 replies
  • 0 kudos

Reading a partitioned table in Spark Structured Streaming

Does the pre-partitioning of a Delta table have an influence on the number of "default" partitions of a DataFrame when reading the data? Put differently, using Spark Structured Streaming, when reading from a Delta table, is the number of DataFrame par...

Maatari
by New Contributor III
  • 796 Views
  • 0 replies
  • 0 kudos

Chaining stateful operators

I would like to do a groupBy followed by a join in Structured Streaming. I would read from two Delta tables in snapshot mode, i.e. latest snapshot. My question is specifically about chaining the stateful operators. groupBy is update mode; chaining grou...

raghunathr
by New Contributor III
  • 1541 Views
  • 3 replies
  • 1 kudos

Service account access granted but still getting "User does not have USE SCHEMA on Schema"

Hi all, we have run into a scenario where Azure Data Factory connects to Azure Databricks through linked services, trying to connect with a System Assigned Managed Identity (SAMI). The specific SAMI was added to the compute and to Unity Catalog for usage.s...

Data Engineering
azure_data_factory
azure_databricks
grants
permission_issue
unity_catlog
Latest Reply
raghunathr
New Contributor III
  • 1 kudos

We still have trouble with the external storage location now. The specific managed identity that was added to the Databricks resource now has everything needed for Unity Catalog DEV/Tables. But even though that SPN was added to the external location, we are still getting the error as...

2 More Replies
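Errors like "User does not have USE SCHEMA" usually mean the Unity Catalog privilege chain is incomplete: the principal needs USE CATALOG on the catalog, USE SCHEMA on the schema, and then SELECT/MODIFY on the object. A sketch that generates the grant statements (the principal, catalog, and schema names are placeholders):

```python
def uc_grants(principal: str, catalog: str, schema: str, table: str = None):
    """Build the full Unity Catalog privilege chain for a principal.
    For a managed identity, the principal is its application (client) ID."""
    stmts = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`;",
    ]
    if table:
        stmts.append(
            f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`;"
        )
    return stmts

for stmt in uc_grants("sami-client-id", "dev", "sales", "orders"):
    print(stmt)
```

For the external-location issue in the latest reply, the same principal additionally needs READ FILES (and WRITE FILES for writes) granted on the external location object itself.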
mv-rs
by New Contributor
  • 1385 Views
  • 0 replies
  • 0 kudos

Structured streaming not working with Serverless compute

Hi, I have a structured streaming process that works with a normal compute, but when attempting to run it using Serverless, the pipeline fails and I'm met with the error seen in the image below. CONTEXT: I have a Git repo with two folders,...

mahfooziiitian
by New Contributor II
  • 1399 Views
  • 3 replies
  • 0 kudos

Get saved query by name using REST API or Databricks SDK

Hi all, I want to get a saved query by name using the REST API or the Databricks SDK, but I cannot find any direct endpoint or method that returns a saved query by name. I have one solution, as given below: get the list of all queries, filter my queries...

Data Engineering
python
REST API
Saved Queries
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @mahfooziiitian, the answer is no; currently you can get a saved query only by ID. If you are afraid of exceeding concurrent calls, then design a process that, as a first step, uses the list-queries endpoint to extract query IDs and names and save...

2 More Replies
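The approach in the reply, list once, cache, then filter locally, can be sketched as below. The listing endpoint path and the `display_name` field follow the current Queries API; older API versions use different paths and field names, so treat them as assumptions.

```python
def find_query_by_name(queries, name):
    """Filter a cached list of saved-query objects by display name.
    Names are not unique, so this returns the first match only."""
    matches = [q for q in queries if q.get("display_name") == name]
    return matches[0] if matches else None

# Fetch the full list once (e.g. from GET /api/2.0/sql/queries, paging
# through results), cache it, and reuse it for lookups to avoid
# exceeding the rate limit with repeated API calls.
cached = [
    {"id": "abc", "display_name": "daily_sales"},
    {"id": "def", "display_name": "weekly_churn"},
]
assert find_query_by_name(cached, "daily_sales")["id"] == "abc"
assert find_query_by_name(cached, "missing") is None
```

Once you have the ID, the get-by-ID endpoint (or the SDK equivalent) returns the full query object.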
gb_dbx
by New Contributor II
  • 867 Views
  • 4 replies
  • 3 kudos

Does Databricks plan to create a Python API for the COPY INTO Spark SQL statement in the future?

Hi, I am wondering if Databricks has planned to create a Python API for Spark SQL's COPY INTO statement? In my company we created a kind of Python wrapper around the SQL COPY INTO statement, but it has lots of design issues and is hard to maintain. I ...

Latest Reply
gb_dbx
New Contributor II
  • 3 kudos

Okay, maybe I should take a look at Auto Loader then; I didn't know Auto Loader could basically do the same as COPY INTO. I originally thought it was only used for streaming and not batch ingestion. And Auto Loader has a dedicated Python API then? And ...

3 More Replies
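To the question in the last reply: Auto Loader is exposed through the regular PySpark reader API (the `cloudFiles` source) rather than a dedicated COPY INTO wrapper, and with an `availableNow` trigger it behaves like incremental batch ingestion. A sketch, with the cluster-only calls shown as comments (paths and table names are placeholders):

```python
def autoloader_options(fmt: str, schema_location: str) -> dict:
    """Minimal option set for Auto Loader's 'cloudFiles' source."""
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_location,
    }

opts = autoloader_options("csv", "/tmp/_schemas/orders")
assert opts["cloudFiles.format"] == "csv"

# On a cluster, the actual read/write would look like (not runnable here):
# df = (spark.readStream.format("cloudFiles")
#         .options(**opts)
#         .load("/raw/orders"))
# (df.writeStream
#    .trigger(availableNow=True)   # batch-like, COPY INTO-style semantics
#    .option("checkpointLocation", "/tmp/_chk/orders")
#    .toTable("bronze.orders"))
```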
biafch
by Contributor
  • 1010 Views
  • 2 replies
  • 2 kudos

How to load a JSON file in PySpark with a colon character in the folder name

Hi, I have a folder that contains subfolders that have JSON files. My subfolders look like this: 2024-08-12T09:34:37:452Z 2024-08-12T09:25:45:185Z I attach these subfolder names to a variable called FolderName and then try to read my JSON file like this: d...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @biafch, I've tried to replicate your example and it worked for me. But it seems that it is a common problem and some object storage may not support that: [HADOOP-14217] Object Storage: support colon in object path - ASF JIRA (apache.org). Which object...

1 More Replies
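As the reply notes, the Hadoop path parser and some object stores reject colons in object paths (HADOOP-14217). If you control the process that writes these folders, one pragmatic workaround is to normalize the timestamp-style names before writing; a sketch (the replacement character is a choice, not a requirement):

```python
def safe_folder_name(ts_folder: str) -> str:
    """Replace the colons in timestamp-style folder names
    (e.g. 2024-08-12T09:34:37:452Z) with a character that
    Hadoop paths and all object stores accept."""
    return ts_folder.replace(":", "-")

assert safe_folder_name("2024-08-12T09:34:37:452Z") == "2024-08-12T09-34-37-452Z"
```

If you cannot rename the source folders, reading through an API that bypasses the Hadoop path parser (or copying the files to colon-free paths first) is the usual fallback.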
sarguido
by New Contributor II
  • 4225 Views
  • 5 replies
  • 2 kudos

Delta Live Tables: bulk import of historical data?

Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sarah Guido, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

4 More Replies
