Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Prashanth24
by New Contributor III
  • 3977 Views
  • 5 replies
  • 4 kudos

Databricks Worker node - Would like to know number of memory in each core

Under Databricks Compute and Worker nodes, we find different node types, as below:
Standard_D4ds_v5 => 16 GB Memory, 4 Cores
Standard_D8ds_v5 => 32 GB Memory, 8 Cores
In Databricks, each node will have one executor. I have the questions below: (1) How much ...
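The per-core arithmetic behind the question can be sketched in a few lines. The caveat about overhead is an assumption, not a measured figure:

```python
# Back-of-envelope figures for the two instance types mentioned above.
# Note: the memory actually available to the executor is lower than the VM
# total, because the OS and Databricks services reserve a share; the exact
# overhead varies by runtime, so treat these as upper bounds.
instance_types = {
    "Standard_D4ds_v5": {"memory_gb": 16, "cores": 4},
    "Standard_D8ds_v5": {"memory_gb": 32, "cores": 8},
}

def memory_per_core(spec: dict) -> float:
    """Raw GB of memory per core, before subtracting any overhead."""
    return spec["memory_gb"] / spec["cores"]

for name, spec in instance_types.items():
    print(f"{name}: {memory_per_core(spec):.1f} GB per core")  # 4.0 for both
```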

Latest Reply
Retired_mod
Esteemed Contributor III
  • 4 kudos

Hi @Prashanth24, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.   If the response resolves your issue, kindly mark it as the accepted solutio...

koantek_user
by New Contributor
  • 1913 Views
  • 2 replies
  • 0 kudos

geometric functions in databricks

Hi All, We are working on a migration project from Snowflake to Databricks, and some scripts utilize geometric functions like st_makepoint and st_geohash from the Snowflake scripts, which we need to convert to Databricks. Has someone encountered this ...
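One way to bridge the gap until equivalent built-ins are available in your runtime is to reimplement the Snowflake function as a UDF. Below is an illustrative pure-Python geohash encoder (the standard algorithm, not a Databricks API) that could stand in for ST_GEOHASH when wrapped with `pyspark.sql.functions.udf`; a point from st_makepoint can similarly be modeled as a (lon, lat) struct:

```python
# Illustrative sketch: standard geohash encoding, suitable for wrapping in a
# PySpark UDF as a stand-in for Snowflake's ST_GEOHASH.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True                      # geohash interleaves longitude first
    while len(bits) < precision * 5:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):           # 5 bits per base32 character
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(_BASE32[n])
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744, 11))  # u4pruydqqvj (well-known example)
```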

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @koantek_user, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This w...

rameshybr
by New Contributor II
  • 2127 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks workflow - Executing the workflow concurrently with different input parameters

How can I trigger a workflow concurrently (multiple times) with different input parameters? Please share your thoughts or any related articles.
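One approach is to call the Jobs 2.1 "run now" endpoint once per parameter set from a thread pool. The host, token, and job ID below are placeholders, and the job's `max_concurrent_runs` setting must be raised above 1 for the runs to actually overlap:

```python
# Sketch: trigger one Databricks job several times in parallel, each run with
# its own notebook parameters, via POST /api/2.1/jobs/run-now.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

HOST = "https://<your-workspace>.azuredatabricks.net"   # placeholder
TOKEN = "<personal-access-token>"                        # placeholder
JOB_ID = 123                                             # placeholder

def build_run_now_payload(job_id: int, params: dict) -> dict:
    """Request body for /api/2.1/jobs/run-now with notebook parameters."""
    return {"job_id": job_id, "notebook_params": params}

def trigger_run(params: dict) -> dict:
    req = urllib.request.Request(
        f"{HOST}/api/2.1/jobs/run-now",
        data=json.dumps(build_run_now_payload(JOB_ID, params)).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:   # response contains the run_id
        return json.load(resp)

param_sets = [{"region": "us"}, {"region": "eu"}, {"region": "apac"}]
# with ThreadPoolExecutor(max_workers=3) as pool:   # uncomment with real creds
#     runs = list(pool.map(trigger_run, param_sets))
```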

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @rameshybr, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will...

koantek_user
by New Contributor
  • 5456 Views
  • 2 replies
  • 0 kudos

lateral view explode in databricks - need help

We are working on a Snowflake to Databricks migration and we encountered Snowflake's lateral flatten function, which we tried to convert to lateral view explode in Databricks - but its output is a subset of lateral flatten. http...
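The "subset" observation is accurate: Snowflake's FLATTEN emits metadata columns (index, key, path, etc.) alongside each value, while Spark's `explode()` yields only the element (`posexplode()` adds the index back, and exploding a map yields key/value pairs). A rough pure-Python model of the difference, for illustration only:

```python
# Why EXPLODE looks like a subset of Snowflake's FLATTEN: FLATTEN emits
# index/key/path metadata per row, explode() yields only the elements.
def flatten_like(value, path=""):
    """Rough model of one level of Snowflake FLATTEN output columns."""
    rows = []
    if isinstance(value, list):
        for i, v in enumerate(value):
            rows.append({"index": i, "key": None, "path": f"{path}[{i}]", "value": v})
    elif isinstance(value, dict):
        for k, v in value.items():
            rows.append({"index": None, "key": k, "path": f"{path}.{k}".lstrip("."), "value": v})
    return rows

def explode_like(value):
    """What LATERAL VIEW EXPLODE gives you for the same input."""
    return list(value) if isinstance(value, list) else list(value.items())
```

In Spark, the index can be recovered with `posexplode`, and keys by exploding a map column; the `path` metadata has to be reconstructed manually if the downstream logic depends on it.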

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @koantek_user, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This w...

brickster_2018
by Databricks Employee
  • 6103 Views
  • 2 replies
  • 0 kudos

Resolved! Is the Spark Driver a synonym for the Spark Master daemon?

If I understand correctly, the Spark driver is a master process. Is it the same as the Spark Master? I get confused between the Spark master and the Spark driver.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

This is a common misconception. Spark Master and Spark driver are two independent and isolated JVMs running on the same instance. Spark Master's responsibilities are to ensure the Spark worker daemons are up and running and to monitor their health. Als...

Rishabh-Pandey
by Databricks MVP
  • 12461 Views
  • 1 replies
  • 3 kudos

Key Advantages of Serverless Compute in Databricks

Serverless compute in Databricks offers several advantages, enhancing efficiency, scalability, and ease of use. Here are some key benefits: 1. Simplified Infrastructure Management - No Server Management: Users don't need to manage or configure servers or...

Latest Reply
Ashu24
Contributor
  • 3 kudos

Thanks for the clear understanding 

Pritam
by New Contributor II
  • 6603 Views
  • 4 replies
  • 1 kudos

Not able to create Job via Jobs API in Databricks

I am not able to create jobs via the Jobs API in Databricks. Error = INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it, loaded the same JSON file, and tried to create the job via the API, but got the above erro...

Latest Reply
rAlex
New Contributor III
  • 1 kudos

@Pritam Arya I had the same problem today. In order to use, in a request to the Jobs API, the JSON that you can get from the GUI for an existing job, you want to use just the JSON that is the value of the settings key.
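That fix can be sketched in a few lines. The response shape below is a truncated, hypothetical example; the point is that the Get Job response wraps the job spec in a `settings` key, while `POST /api/2.1/jobs/create` expects the spec at the top level:

```python
# Extract the job spec from a Jobs API "get" response so it can be sent
# directly as the body of /api/2.1/jobs/create.
import json

def create_payload_from_get_response(get_job_response: dict) -> dict:
    """Return only the 'settings' value, dropping job_id and metadata."""
    return get_job_response["settings"]

# Hypothetical, truncated shape of a Get Job response:
get_response = {
    "job_id": 42,
    "created_time": 1700000000000,
    "settings": {"name": "my-job", "max_concurrent_runs": 1, "tasks": []},
}
payload = create_payload_from_get_response(get_response)
body = json.dumps(payload)   # what actually goes in the create request body
```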

m997al
by Contributor III
  • 1581 Views
  • 2 replies
  • 0 kudos

Cannot use Databricks REST API for secrets "get" inside bash script (byte format only)

Hi, I am trying to use Databricks-backed secret scopes inside Azure DevOps pipelines. I am almost successful: I can use the REST API to "get" a secret value back inside my bash script, but the value is in byte format, so it is unusable as a local var...
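The byte-looking output is usually base64: the secrets "get" endpoint returns the secret base64-encoded in the `value` field of its JSON response, so a decode step recovers the plain text. A sketch, with the curl call commented out and a hypothetical sample response standing in for it (scope/key names are placeholders):

```shell
# Fetch a secret and decode it for use as a shell variable.
# resp=$(curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" \
#   "$DATABRICKS_HOST/api/2.0/secrets/get?scope=my-scope&key=my-key")
resp='{"key":"my-key","value":"aGVsbG8="}'   # hypothetical sample response
# Pull out the base64 "value" field, then decode it.
encoded=$(printf '%s' "$resp" | sed -n 's/.*"value":"\([^"]*\)".*/\1/p')
secret=$(printf '%s' "$encoded" | base64 -d)
echo "$secret"   # hello
```

If `jq` is available in the pipeline image, `jq -r '.value'` is a more robust way to extract the field than `sed`.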

Latest Reply
m997al
Contributor III
  • 0 kudos

I wanted to add an addendum to this. In Azure DevOps, when working with YAML files, you can use the Azure DevOps pipelines "Library" to load environment variables. When you look at those environment variables in the Azure DevOps pipeline librar...

Prashanth24
by New Contributor III
  • 3017 Views
  • 4 replies
  • 4 kudos

Min and max node counts for processing 5 TB of data

I need to ingest a full load of 5 TB of data, applying business transformations, and want to process it in 2-3 hours. What criteria need to be considered when selecting the min and max worker node counts for this full-load processing?

Latest Reply
joeharris76
Databricks Employee
  • 4 kudos

Need more details about the workload to fully advise, but generally speaking: use the latest generation of cloud instances, enable Unity Catalog, and enable Photon. If the source data is raw CSV then the load should scale linearly. For example, if 64 nodes comp...
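Under the linear-scaling assumption, the sizing is simple division. The per-node throughput figure below is an illustrative assumption, not a benchmark; calibrate it by timing a small run of the actual workload:

```python
# Back-of-envelope cluster sizing under linear scaling.
import math

def nodes_needed(data_tb: float, hours: float, tb_per_node_per_hour: float = 0.05) -> int:
    """Workers needed = total volume / (assumed per-node rate * time budget)."""
    return math.ceil(data_tb / (tb_per_node_per_hour * hours))

# 5 TB in 2.5 hours at an assumed 0.05 TB/node/hour -> 40 workers
print(nodes_needed(5, 2.5))
```

A reasonable autoscaling setup would then put the max around this estimate and the min well below it, letting the cluster shrink once the heavy stages finish.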

raghunathr
by New Contributor III
  • 2579 Views
  • 3 replies
  • 1 kudos

Service account access granted, but still getting "User does not have USE SCHEMA on Schema"

Hi All, We have run into a scenario where Azure Data Factory connects to Azure Databricks through linked services, trying to connect with a System Assigned Managed Identity (SAMI). The specific SAMI was added to the compute and to Unity Catalog for usage.s...

Data Engineering
azure_data_factory
azure_databricks
grants
permission_issue
unity_catlog
Latest Reply
raghunathr
New Contributor III
  • 1 kudos

We still have trouble with the external storage location now. The specific managed identity added to the Databricks resource now has everything needed for Unity Catalog DEV/Tables. But even though that SPN was added to the external location, we are still getting an error as...
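For reference, the full chain of Unity Catalog grants such an identity typically needs looks like the sketch below (catalog, schema, and location names are placeholders; adapt them to your setup). USE privileges on the catalog and schema are required in addition to SELECT, and external locations carry their own READ FILES / WRITE FILES privileges:

```sql
-- Sketch of the typical grant chain for a managed identity / SPN.
GRANT USE CATALOG ON CATALOG dev TO `my-managed-identity`;
GRANT USE SCHEMA  ON SCHEMA  dev.my_schema TO `my-managed-identity`;
GRANT SELECT      ON SCHEMA  dev.my_schema TO `my-managed-identity`;
GRANT READ FILES  ON EXTERNAL LOCATION my_ext_location TO `my-managed-identity`;
GRANT WRITE FILES ON EXTERNAL LOCATION my_ext_location TO `my-managed-identity`;
```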

mahfooziiitian
by New Contributor II
  • 2341 Views
  • 3 replies
  • 0 kudos

Get saved query by name using REST API or Databricks SDK

Hi All, I want to get a saved query by name using the REST API or the Databricks SDK, but I do not find any direct endpoint or method which can give us the saved query by name. I have one solution, as given below: get the list of all queries, filter my queries...

Data Engineering
python
REST API
Saved Queries
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @mahfooziiitian, The answer is no; currently you can get a saved query only by ID. If you are afraid of exceeding concurrent calls, then design a process that as a first step will use the list queries endpoint to extract query IDs and names and save...
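The suggested workaround can be sketched as a single pass over the paginated list endpoint that builds a name-to-ID map, reused for all later lookups. `fetch_page` is injected here so the paging logic is self-contained; in real use it would wrap a GET on the list-queries endpoint, passing along the page token:

```python
# Build a name -> id index from a paginated "list queries" endpoint once,
# instead of calling the API on every lookup.
def build_name_index(fetch_page):
    """fetch_page(token) -> {"results": [...], "next_page_token": ...}."""
    index, token = {}, None
    while True:
        page = fetch_page(token)
        for q in page.get("results", []):
            index[q["display_name"]] = q["id"]
        token = page.get("next_page_token")
        if not token:
            return index

# Hypothetical two-page listing used for illustration:
pages = {
    None: {"results": [{"id": "q1", "display_name": "daily_sales"}],
           "next_page_token": "t2"},
    "t2": {"results": [{"id": "q2", "display_name": "weekly_churn"}]},
}
index = build_name_index(lambda tok: pages[tok])
print(index["daily_sales"])   # q1
```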

gb_dbx
by New Contributor II
  • 1701 Views
  • 4 replies
  • 3 kudos

Does Databricks plan to create a Python API for the COPY INTO Spark SQL statement in the future?

Hi, I am wondering if Databricks has planned to create a Python API for Spark SQL's COPY INTO statement. In my company we created some kind of Python wrapper around the SQL COPY INTO statement, but it has lots of design issues and is hard to maintain. I ...

Latest Reply
gb_dbx
New Contributor II
  • 3 kudos

Okay, maybe I should take a look at Auto Loader then. I didn't know Auto Loader could basically do the same as COPY INTO; I originally thought it was only used for streaming and not batch ingestion. And Auto Loader has a dedicated Python API then? And ...
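For the batch-ingestion point: Auto Loader is configured through the Python streaming reader, and `trigger(availableNow=True)` makes it process all pending files and then stop, which behaves like a batch run. A sketch, with placeholder paths (the executable part below only assembles the configuration; the commented portion requires a Databricks SparkSession):

```python
# Auto Loader configuration for batch-style ingestion. The option names are
# Auto Loader's cloudFiles options; the paths are placeholders.
autoloader_options = {
    "cloudFiles.format": "csv",
    "cloudFiles.schemaLocation": "/tmp/schemas/orders",   # placeholder path
}

def build_reader(spark, options):
    """Configure a cloudFiles stream reader (requires a SparkSession)."""
    reader = spark.readStream.format("cloudFiles")
    for k, v in options.items():
        reader = reader.option(k, v)
    return reader

# In a Databricks notebook, a batch-like run would look roughly like:
# (build_reader(spark, autoloader_options).load("/raw/orders")      # placeholder
#     .writeStream.option("checkpointLocation", "/tmp/chk/orders")  # placeholder
#     .trigger(availableNow=True)   # process pending files, then stop
#     .table("bronze.orders"))
```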

biafch
by Contributor
  • 2020 Views
  • 2 replies
  • 2 kudos

How to load a JSON file in PySpark with a colon character in the folder name

Hi, I have a folder that contains subfolders that have JSON files. My subfolders look like this: 2024-08-12T09:34:37:452Z, 2024-08-12T09:25:45:185Z. I attach these subfolder names to a variable called FolderName and then try to read my JSON file like this: d...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @biafch, I've tried to replicate your example and it worked for me. But it seems that it is a common problem, and some object storage may not support that: [HADOOP-14217] Object Storage: support colon in object path - ASF JIRA (apache.org). Which object...
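When the storage layer is the one rejecting colons, a common workaround is to avoid them in the folder names at write time (this assumes you control the writer; existing folders would need a one-off rename). A sketch of sanitizing the timestamp-style names from the question:

```python
# Replace the colons inside the time portion of a timestamp-based folder name
# with hyphens, so the path is safe for object stores that reject ':'.
def sanitize_folder_name(name: str) -> str:
    date_part, _, time_part = name.partition("T")
    return f"{date_part}T{time_part.replace(':', '-')}"

print(sanitize_folder_name("2024-08-12T09:34:37:452Z"))   # 2024-08-12T09-34-37-452Z
```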

sarguido
by New Contributor II
  • 6076 Views
  • 5 replies
  • 2 kudos

Delta Live Tables: bulk import of historical data?

Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sarah Guido Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

bulbur
by New Contributor II
  • 2709 Views
  • 1 replies
  • 0 kudos

Use pandas in DLT pipeline

Hi, I am trying to work with pandas in a Delta Live Table. I have created some example code: import pandas as pd; import pyspark.sql.functions as F; pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "...

Latest Reply
bulbur
New Contributor II
  • 0 kudos

I have taken the advice given by the documentation ("However, you can include these functions outside of table or view function definitions because this code is run once during the graph initialization phase.") and moved the toPandas call to a function...
