Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

busuu
by New Contributor
  • 135 Views
  • 3 replies
  • 1 kudos

Failed to checkout Git repository: RESOURCE_DOES_NOT_EXIST: Attempted to move non-existing node

I'm having issues checking out a Git repo in Workflows. Databricks can access files from commit `a` but fails to check out the branch when attempting to access commit `b`. The error occurs specifically when trying to check out commit `b`, and Databr...

Latest Reply
Augustus
New Contributor II
  • 1 kudos

I didn't do anything to fix it. Databricks support did something to my workspace to fix the issue. 

2 More Replies
melikaabedi
by New Contributor
  • 26 Views
  • 1 reply
  • 0 kudos

databricks apps

Imagine I develop an app in Databricks with #databricks-apps. Is it possible for someone outside the organization to use it just by accessing a URL, without having a Databricks account? Thank you in advance for your help.

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @melikaabedi, No, only users in the account can access a Databricks app, the same way you would with AI/BI dashboards.

Austin1
by New Contributor
  • 35 Views
  • 0 replies
  • 0 kudos

VSCode Integration for Data Science Analysts

Probably not posting this in the right forum, but I can't find a good fit. This is a bit convoluted because we make things hard at work. I have access to a single LLM via VSCode (Amazon Q). Since I can't use that within Databricks but I want my team to...

ggsmith
by Contributor
  • 329 Views
  • 8 replies
  • 3 kudos

Resolved! Workflow SQL Task Query Showing Empty

I am trying to create a SQL task in Workflows. I have my query, which executes successfully in the SQL editor, and it is saved in a repo. However, when I try to execute the task, the below error shows. Query text can not be empty: BAD_REQUEST: Query tex...

Latest Reply
ggsmith
Contributor
  • 3 kudos

It ended up being that the query wasn't actually saved. Once I manually clicked save, the query preview showed and the task ran successfully. I'm really surprised that was the reason. I had moved the query around to different folders and closed and r...

7 More Replies
jonhieb
by New Contributor
  • 102 Views
  • 2 replies
  • 0 kudos

Resolved! [Databricks Asset Bundles] Triggering Delta Live Tables

I would like to know how to schedule a DLT pipeline using DABs. I'm trying to trigger a Delta Live Tables pipeline using Databricks Asset Bundles. Below is my YAML code:

resources:
  pipelines:
    data_quality_pipelines:
      name: data_quality_pipeline...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

As of now, Databricks Asset Bundles do not support direct scheduling of DLT pipelines using cron expressions within the bundle configuration. Instead, you can achieve scheduling by creating a Databricks job that triggers the DLT pipeline and then sch...
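
A minimal sketch of the pattern the reply describes, assuming the bundle already defines the pipeline resource from the question (resource keys and the cron expression are illustrative):

resources:
  jobs:
    data_quality_job:
      name: data_quality_job
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"  # daily at 06:00
        timezone_id: "UTC"
      tasks:
        - task_key: run_pipeline
          pipeline_task:
            # DAB substitutes the deployed pipeline's ID at deploy time
            pipeline_id: ${resources.pipelines.data_quality_pipelines.id}

Deploying the bundle then creates one scheduled job whose only task runs the DLT pipeline.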

1 More Replies
noorbasha534
by Contributor
  • 60 Views
  • 1 reply
  • 0 kudos

Spot instances usage in Azure Databricks

Hi all, as per the article at https://community.databricks.com/t5/technical-blog/optimize-costs-for-your-data-and-ai-workloads-with-azure-and-aws/ba-p/662411, it is possible to choose the number of spot instances using the 'availability' parameter. Bu...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @noorbasha534, thanks for your question!
1. 'availability' parameter: The 'availability' parameter in Azure Databricks controls whether the compute uses on-demand or spot instances. The values for this parameter are:
   - ON_DEMAND_AZURE: This value...
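
The value list is truncated above; for context, a hedged sketch of where these settings sit in a cluster spec (the node type, worker count, and bid price are assumptions):

new_cluster:
  spark_version: "15.4.x-scala2.12"
  node_type_id: "Standard_DS3_v2"
  num_workers: 4
  azure_attributes:
    availability: SPOT_WITH_FALLBACK_AZURE  # use spot VMs, fall back to on-demand if spot capacity is unavailable
    first_on_demand: 1                      # keep the first node (the driver) on-demand
    spot_bid_max_price: -1                  # -1 means pay up to the current on-demand price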

noorbasha534
by Contributor
  • 69 Views
  • 1 reply
  • 0 kudos

Data processing metrics

Dear all, what are some proven ways of capturing data processing metrics (number of rows processed/updated/inserted, number of micro-batches, etc.) in a PySpark/SQL code-based notebook, irrespective of whether it uses Auto Loader, structured streami...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @noorbasha534, You can use the StreamingQueryListener interface to capture metrics like the number of input rows, processing time, and batch duration. This can be integrated into your PySpark code to log these metrics in real-time. Example: from ...
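
The reply's example is truncated above; a minimal sketch of such a listener, assuming PySpark 3.4+ (where the Python StreamingQueryListener is available) and a Databricks notebook where spark is predefined:

from pyspark.sql.streaming import StreamingQueryListener

class MetricsListener(StreamingQueryListener):
    # Called once when any streaming query on this session starts.
    def onQueryStarted(self, event):
        print(f"query started: id={event.id} name={event.name}")

    # Called after every micro-batch; progress carries the row and timing metrics.
    def onQueryProgress(self, event):
        p = event.progress
        print(f"batch={p.batchId} inputRows={p.numInputRows} "
              f"triggerMs={p.durationMs.get('triggerExecution')}")

    def onQueryTerminated(self, event):
        print(f"query terminated: id={event.id}")

spark.streams.addListener(MetricsListener())

Instead of print, the same hooks can insert the metrics into a Delta table for later analysis.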

noorbasha534
by Contributor
  • 75 Views
  • 1 reply
  • 0 kudos

Error handling - SQL states

Dear all, a few questions please:
1. Has anyone successfully used the below way of dealing with error handling in PySpark (for example, code that contains data frames) as well as in SQL-code-based notebooks?

from pyspark.errors import PySparkException

try:
    spa...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @noorbasha534, the approach you mentioned for error handling in PySpark using PySparkException is a valid method. It allows you to catch specific exceptions related to PySpark operations and handle them accordingly. Logging errors into tables ...
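
A short sketch of that pattern, assuming a recent PySpark where the error-class accessors exist (the table name is deliberately bogus so the except branch fires):

from pyspark.errors import PySparkException

try:
    spark.sql("SELECT * FROM nonexistent_table").show()
except PySparkException as e:
    # Error class and SQLSTATE are stable identifiers, suitable for logging to a table
    print(f"errorClass={e.getErrorClass()} sqlState={e.getSqlState()}")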

JrV
by New Contributor
  • 252 Views
  • 2 replies
  • 0 kudos

Sparql and RDF data

Hello Databricks Community, does anyone have experience with running SPARQL (https://en.wikipedia.org/wiki/SPARQL) queries in Databricks? Make a connection to the Community SolidServer (https://github.com/CommunitySolidServer/CommunitySolidServer) and que...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

You can use the rdflib library to connect to the Community SolidServer and execute SPARQL queries.

from rdflib import Graph
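
The example cuts off after the import; a minimal sketch of the idea, with a hypothetical Solid resource URL (rdflib must be installed first, e.g. %pip install rdflib):

from rdflib import Graph

g = Graph()
# Hypothetical resource served by a local Community SolidServer instance
g.parse("http://localhost:3000/profile/card", format="turtle")

results = g.query("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
for s, p, o in results:
    print(s, p, o)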

1 More Replies
kazinahian
by New Contributor III
  • 3464 Views
  • 2 replies
  • 1 kudos

How can I create a new calculated field in Databricks using PySpark?

Hello, great people. I am new to Databricks and PySpark. How can I create a new column called "sub_total", where I group by "category", "subcategory", and "monthly" sales value? I'd appreciate your empathetic solution.

Labels: Data Engineering, calculation
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

I want to group by "category", "subcategory", and "monthly" sales value.

from pyspark.sql.functions import sum as sum_  # avoid shadowing Python's built-in sum

sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(sum_("sales_value").alias("sub_total"))

You could always type in your query in the Databricks notebook, by clic...

1 More Replies
hardeeksharma
by New Contributor II
  • 56 Views
  • 1 reply
  • 1 kudos

Data ingestion issue with THAI data

I have a use case where my file has data in Thai characters. The source location is Azure Blob Storage, where files are stored in text format. I am using the following code to read the file, but when I download the data from the catalog it encloses ...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Do the quotes exist in the original data?
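
If the quotes turn out not to be in the source, a hedged guess at the reader options usually involved (the original code is truncated above, so the path, format, and option values here are assumptions):

df = (
    spark.read
    .option("header", "true")
    .option("encoding", "UTF-8")  # Thai text needs a Unicode-aware charset
    .option("quote", '"')         # the character the writer used for quoting
    .option("escape", '"')
    .csv("abfss://container@account.dfs.core.windows.net/path/thai_file.txt")  # placeholder path
)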

dc-rnc
by New Contributor II
  • 106 Views
  • 1 reply
  • 1 kudos

Resolved! How to deploy an asset bundle job that triggers another one

Hello everyone. Using DAB, is there a dynamic value reference or something equivalent to get a job_id to be used inside the YAML definition of another Databricks job? I'd like to trigger that job from another one, but if I'm using a CI/CD pipeline to ...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

resources:
  jobs:
    my-first-job:
      name: my-first-job
      tasks:
        - task_key: my-first-job-task
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: "i3.xlarge"
            num_workers: 2
          ...
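
The reply stops at the first job's definition. For the original question (obtaining a job_id inside another job's YAML), a hedged sketch of a second job that triggers the first via a bundle-internal reference:

resources:
  jobs:
    my-second-job:
      name: my-second-job
      tasks:
        - task_key: trigger-first-job
          run_job_task:
            # DAB resolves this to the deployed job's numeric ID at deploy time
            job_id: ${resources.jobs.my-first-job.id}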

Nes_Hdr
by New Contributor III
  • 1153 Views
  • 12 replies
  • 1 kudos

Limitations for Unity Catalog on single user access mode clusters

Hello! According to the Databricks documentation on Azure: "On Databricks Runtime 15.3 and below, fine-grained access control on single user compute is not supported. Specifically:
- You cannot access a table that has a row filter or column mask.
- You cannot ...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@Nes_Hdr Single user compute uses fine-grained access control to access tables with RLS/CLM enabled. There are no specific details about OPTIMIZE being supported in single user mode. Under this doc, the limitations of FGAC mention that "No support for...

11 More Replies
ashraf1395
by Valued Contributor
  • 108 Views
  • 3 replies
  • 0 kudos

Databricks Workflow design

I have 7-8 different DLT pipelines which have to be run at the same time according to their batch type, i.e. hourly or daily. Right now they are triggered effectively according to their batch type. I want to move to the next stage, where I want to clu...

Latest Reply
ashraf1395
Valued Contributor
  • 0 kudos

Hi @VZLA, I got the idea. There will be a small change in the way we will use it. Since we don't schedule the workflow in Databricks, we trigger it using the API. So I will pass a job parameter along with the trigger, according to the timestamp, wheth...
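
A minimal sketch of that trigger call against the Jobs 2.1 REST API (host, token, and job ID are placeholders):

import requests

host = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace host
token = "<personal-access-token>"                    # placeholder credential

resp = requests.post(
    f"https://{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123,                               # placeholder job ID
        "job_parameters": {"batch_type": "hourly"},  # read inside the job as {{job.parameters.batch_type}}
    },
)
resp.raise_for_status()
print(resp.json()["run_id"])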

2 More Replies
Mattias
by New Contributor II
  • 188 Views
  • 3 replies
  • 0 kudos

How to increase timeout in Databricks Workflows DBT task

Hi, I have a Databricks Workflows DBT task that targets a PRO SQL warehouse. However, the task fails with a "too many retries" error (see below) if the PRO SQL warehouse is not up and running when the task starts. How can I increase the timeout or allo...

Latest Reply
Mattias
New Contributor II
  • 0 kudos

One option seems to be to reference a custom "profiles.yml" in the job configuration and specify a custom dbt-databricks connector timeout there (https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#additional-parameters). However,...
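
A hedged sketch of what that profiles.yml override might look like (all connection values are placeholders; connect_timeout and connect_retries are the kind of adapter settings the linked page describes, so verify the names against it):

my_dbt_project:
  target: prod
  outputs:
    prod:
      type: databricks
      catalog: main                                     # placeholder
      schema: analytics                                 # placeholder
      host: adb-1234567890123456.7.azuredatabricks.net  # placeholder
      http_path: /sql/1.0/warehouses/abc123             # placeholder
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      connect_timeout: 300   # seconds to wait between connection attempts
      connect_retries: 20    # keep retrying while the warehouse starts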

2 More Replies