Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jonhieb
by New Contributor III
  • 4235 Views
  • 1 reply
  • 0 kudos

Resolved! Job deploy with git source using Asset Bundles

Hi, I'm trying to deploy a job with a notebook task based on a git source, but I'm facing an error when I try to deploy. This is the YAML file: resources: jobs: data_quality_pipelines_job: name: schedule_data_quality_job schedule: quartz_cron_expression: "0...

Latest Reply
jonhieb
New Contributor III
  • 0 kudos

I found the solution for this one: when we're running a job that needs to consume files from a Git repo, in addition to declaring the git_source clause, we also need to declare the source clause within the task configuration. The image below demonstrat...

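The fix described in the accepted answer can be sketched as a bundle resource. The repo URL, branch, and notebook path below are illustrative placeholders, not taken from the thread:

```yaml
resources:
  jobs:
    data_quality_pipelines_job:
      name: schedule_data_quality_job
      git_source:                 # job-level: where the code is checked out from
        git_url: https://github.com/example-org/example-repo   # placeholder
        git_provider: gitHub
        git_branch: main
      tasks:
        - task_key: run_data_quality
          notebook_task:
            notebook_path: notebooks/data_quality   # relative to the repo root
            source: GIT           # the clause the answer says was missing
```

Without `source: GIT`, the task resolves `notebook_path` against the workspace instead of the checked-out repo.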
Niki_M
by New Contributor II
  • 1895 Views
  • 4 replies
  • 0 kudos

Databricks Access Management

Hi, can someone suggest the minimum level of access we can give an external business user so they can run a notebook and download the results? I tried granting read- and run-level permissions at the notebook level and also with a user gro...

Latest Reply
saisaran_g
Contributor
  • 0 kudos

Hello @Niki_M, just adding one more check: did you add the user group to the cluster with minimal access? Saran

3 More Replies
nbrisson
by New Contributor II
  • 1601 Views
  • 2 replies
  • 0 kudos

Hitting 504 Error with Databricks Delta Share open sharing protocol

When trying to query a particularly large table that my team has been given access to in a share (we are querying the table using a profile file with a bearer token), we continually hit the following error: io.delta.sharing.client.util.Unexpecte...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @nbrisson, this could be due to large metadata in the Delta table being queried via Delta Sharing. There are some known limitations; you can refer to these docs for more details: Troubleshoot common sharing issues in Delta Sharing, RESOURCE_LIMIT_EXCE...

1 More Replies
joao_augusto
by New Contributor III
  • 2069 Views
  • 2 replies
  • 0 kudos

Data loss with spark streaming and kafka

Hi guys! I'm facing a problem, and I have no idea where it came from. My process is not appending all the topic data into my bronze table. I checked the topic, and the data is there. For example, I have some rows that are still in my Kafka topic but do...

Latest Reply
SriramMohanty
Databricks Employee
  • 0 kudos

1) To troubleshoot: add metrics collection to your job to track the number of records processed vs. dropped. You can check the Kafka offsets and compare them with the offsets committed in the checkpoint. 2) Consider modifying your code to use continuous proce...

1 More Replies
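The offset comparison suggested in the reply can be sketched as a small helper. Where the numbers come from (the Kafka admin API for the latest offsets, the checkpoint's `offsets/` files for the committed ones) is up to your setup; the partitions and values below are illustrative:

```python
# Hedged sketch: compare the latest Kafka offsets with the offsets the
# streaming checkpoint has committed, to spot partitions whose records
# were never picked up by the stream.

def lag_by_partition(latest, committed):
    """Per-partition lag: latest offset minus committed offset.

    Both arguments map partition id -> offset. A partition missing from
    `committed` is treated as never read (offset 0).
    """
    return {p: latest[p] - committed.get(p, 0) for p in latest}

# Example: partition 2 has 150 records the stream never committed.
latest = {0: 1000, 1: 2000, 2: 1500}
committed = {0: 1000, 1: 2000, 2: 1350}
print(lag_by_partition(latest, committed))  # {0: 0, 1: 0, 2: 150}
```

A persistent non-zero lag on some partitions points at skipped data (e.g. expired retention plus `failOnDataLoss=false`) rather than data loss inside the bronze write.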
mh177
by New Contributor II
  • 2381 Views
  • 2 replies
  • 1 kudos

Change Data Feed And Column Masks

Hi there, wondering if anyone can help me. I have a job set up to stream from one change-data-feed-enabled Delta table to another Delta table, and it has been executing successfully. I then added column masks to the source table from which I am stream...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi mh177, how are you doing today? As per my understanding, it sounds like everything was working fine until you added column masks to your source table. The error you're seeing basically means that once a table has row- or column-level security polic...

1 More Replies
sandy311
by New Contributor III
  • 966 Views
  • 2 replies
  • 1 kudos

Private package installation using DAB on job cluster

I'm using a .whl job to upload and install a package on a job cluster using DAB. However, I'm facing an issue with a private package from Azure Artifacts. It works fine when running in CI or Azure DevOps pipelines because the PAT token is available a...

Latest Reply
sandy311
New Contributor III
  • 1 kudos

Hi @Brahmareddy, thanks for the details. Could you please provide examples if possible?

1 More Replies
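One common pattern for the private-feed problem above is to hand the PAT to the job cluster through a secret-backed environment variable so pip can reach Azure Artifacts. This is a hedged sketch: the secret scope/key names, feed URL, and package names are placeholders, not values from the thread:

```yaml
resources:
  jobs:
    my_wheel_job:
      tasks:
        - task_key: main
          python_wheel_task:
            package_name: my_package          # placeholder
            entry_point: main
          libraries:
            - whl: ./dist/my_package-*.whl    # wheel built by the bundle
          new_cluster:
            spark_version: 15.4.x-scala2.12
            num_workers: 1
            spark_env_vars:
              # Secret reference resolved at cluster start; pip picks up
              # PIP_EXTRA_INDEX_URL automatically when installing deps.
              PIP_EXTRA_INDEX_URL: "https://build:{{secrets/my-scope/ado-pat}}@pkgs.dev.azure.com/my-org/_packaging/my-feed/pypi/simple/"
```

The key point is that the token lives in a secret scope rather than in the bundle source, so the same config works outside CI where no ambient PAT exists.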
satyam-verma
by New Contributor
  • 2825 Views
  • 2 replies
  • 0 kudos

Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?

I'm transitioning from all-purpose clusters to job compute to optimize costs. Previously, we reused an existing_cluster_id in the job configuration to reduce total job runtime. My use case: a parent job triggers multiple child jobs sequentially. I want ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi satyam-verma, how are you doing today? As per my understanding, switching from all-purpose clusters to job compute can definitely help with cost optimization. In your case, where a parent job triggers multiple child jobs, it makes sense to want to...

1 More Replies
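One hedged reading of this thread: a job cluster cannot be reused across separate jobs, but tasks within a single job can share one via `job_cluster_key`, so folding the child jobs into tasks of the parent is a common workaround. Names and sizes below are illustrative:

```yaml
resources:
  jobs:
    parent_job:
      job_clusters:
        - job_cluster_key: shared_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            num_workers: 2
      tasks:
        - task_key: child_a
          job_cluster_key: shared_cluster     # both tasks run on the same cluster
          notebook_task:
            notebook_path: /Workspace/child_a  # placeholder
        - task_key: child_b
          depends_on:
            - task_key: child_a                # preserves the sequential order
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: /Workspace/child_b  # placeholder
```

The cluster spins up once for the run and is reused by each task, which recovers most of the runtime benefit of the old `existing_cluster_id` approach at job-compute pricing.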
ankit001mittal
by New Contributor III
  • 2103 Views
  • 5 replies
  • 3 kudos

is there a way to find DLT job and task records in system.lakeflow.job_task_run_timeline?

Hi guys, we're working on a monetization product and are trying to understand how much of our cost comes from our jobs, DLT, and all-purpose interactive sessions. We are currently exploring the system.lakeflow.job_task_run_timeline table to find t...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 3 kudos

Hi Ankit, Thanks for your follow-up. The pipeline_events table you're referring to is part of the system tables for Delta Live Tables (DLT), and if it’s not showing up in your workspace, it’s likely because system tables haven't been enabled yet in y...

4 More Replies
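For the cost question in this thread, the billing system table already tags usage with job and DLT pipeline ids, so a single grouped query often answers it. This is a hedged sketch; the column names follow the documented `system.billing.usage` schema, but verify them in your workspace before relying on the numbers:

```python
# Hedged sketch: attribute DBU usage to jobs vs. DLT pipelines using the
# system.billing.usage table. Rows with a job_id are job runs, rows with
# a dlt_pipeline_id are DLT; rows with neither are typically interactive.

def cost_attribution_query(days=30):
    """Build a SQL string grouping recent usage by job / DLT pipeline."""
    return f"""
        SELECT
          usage_metadata.job_id,
          usage_metadata.dlt_pipeline_id,
          SUM(usage_quantity) AS dbus
        FROM system.billing.usage
        WHERE usage_date >= current_date() - INTERVAL {days} DAYS
        GROUP BY usage_metadata.job_id, usage_metadata.dlt_pipeline_id
        ORDER BY dbus DESC
    """

# In a Databricks notebook:
# display(spark.sql(cost_attribution_query(days=7)))
```

Joining the result back to `system.lakeflow.job_task_run_timeline` on `job_id` then gives per-task runtime context for the same spend.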
noorbasha534
by Valued Contributor II
  • 902 Views
  • 1 reply
  • 2 kudos

Events subscription in Databricks delta tables

Dear all, we are maintaining a global/enterprise data platform for a customer. We would like to capture events based on data streaming happening on Databricks-based Delta tables (data streams run for at least 15 hrs a day, so events should be generated b...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi noorbasha534, how are you doing today? As per my understanding, what you're looking to build is a really powerful and smart setup, kind of like a data-driven event notification system for streaming Delta tables. While there's no out-of-the-box feat...

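Since there is no built-in event subscription for Delta tables, one common approach is a `foreachBatch` hook on the stream that calls out (webhook, queue, etc.) when a micro-batch meets a condition. This is a hedged sketch; the threshold and the `notify()` target are illustrative assumptions, not anything from the thread:

```python
# Hedged sketch: emit an event per qualifying micro-batch from a
# Structured Streaming query over a Delta table.

def should_notify(batch_row_count, threshold=1):
    """Fire an event only for non-trivial batches."""
    return batch_row_count >= threshold

def notify(payload):
    # Replace with your real sink: HTTP webhook, Kafka topic, SNS, etc.
    print(f"event: {payload}")

def on_batch(batch_df, batch_id):
    n = batch_df.count()
    if should_notify(n):
        notify({"batch_id": batch_id, "rows": n})

# Wiring it up on Databricks (not run here; names are placeholders):
# (spark.readStream.table("catalog.schema.bronze")
#      .writeStream.foreachBatch(on_batch)
#      .option("checkpointLocation", "/Volumes/main/ops/chk/bronze_events")
#      .start())
```

Because the hook runs inside the long-lived stream, events flow continuously for the full 15-hour window without a separate scheduler.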
subhas
by New Contributor II
  • 1094 Views
  • 1 reply
  • 1 kudos

Auto Loader bringing NULL Records

Hi, I am using Auto Loader to fetch some records stored in two files. Please see my code below. It fetches the records from the two files correctly and then starts fetching NULL records. I attach option("cleanSource",    ) to readStream, but it is ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi subhas, how are you doing today? As per my understanding, it looks like the issue is happening because you're using /FileStore, which isn't fully supported by Auto Loader's cleanSource option. Even though the code looks mostly fine, Auto Loader expe...

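The fix the reply points at can be sketched as follows: move the source off /FileStore to a UC Volume (or cloud storage) path that cleanSource can manage. The option names reflect recent DBR Auto Loader documentation and the paths are illustrative; verify both against your runtime:

```python
# Hedged sketch: cleanSource needs a real storage location, not DBFS
# /FileStore, so a quick guard plus a Volume-based source path.

def is_supported_clean_source_path(path):
    """Reject /FileStore paths, which cleanSource cannot archive/delete."""
    return not path.startswith(("/FileStore", "dbfs:/FileStore"))

assert not is_supported_clean_source_path("/FileStore/uploads")
assert is_supported_clean_source_path("/Volumes/main/raw/landing")

# In a notebook (illustrative paths and options):
# (spark.readStream.format("cloudFiles")
#      .option("cloudFiles.format", "csv")
#      .option("cloudFiles.cleanSource", "MOVE")
#      .option("cloudFiles.cleanSource.moveDestination", "/Volumes/main/raw/archive")
#      .load("/Volumes/main/raw/landing"))
```

With the files on a supported path, processed inputs get moved to the archive location instead of being re-listed and surfacing as NULL rows.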
cmathieu
by New Contributor III
  • 1026 Views
  • 1 reply
  • 2 kudos

Resolved! OPTIMIZE command on heavily nested table OOM error

I'm trying to run the OPTIMIZE command on a table with fewer than 2000 rows, but it is causing an out-of-memory issue. The problem seems to come from the fact that it is a heavily nested table in staging, between a JSON file and a flattened table. The ta...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi cmathieu, how are you doing today? As per my understanding, yeah, this sounds like one of those cases where the data volume is small but the complexity of the schema is what's causing the trouble. If your table has deeply nested JSON structures, ...

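One mitigation worth trying for this OOM, sketched below and hedged: lower the table's target file size so each OPTIMIZE rewrite task handles less of the deeply nested data at once. `delta.targetFileSize` is a documented Delta table property; the 32mb value is an illustrative starting point, not a recommendation:

```python
# Hedged sketch: shrink rewrite units before running OPTIMIZE on a
# heavily nested table.

def optimize_with_smaller_files_sql(table, target="32mb"):
    """Return the two statements to run, in order."""
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES (delta.targetFileSize = '{target}')",
        f"OPTIMIZE {table}",
    ]

stmts = optimize_with_smaller_files_sql("staging.nested_json")  # placeholder name
# In a notebook:
# for s in stmts:
#     spark.sql(s)
```

If that is not enough, flattening the nested columns into the downstream table first and optimizing that instead usually sidesteps the problem entirely.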
abelian-grape
by New Contributor III
  • 1779 Views
  • 3 replies
  • 1 kudos

Build a streaming table on top of a snowflake table

Is it possible to create a streaming table on top of a Snowflake table accessible via Lakehouse Federation?

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi abelian-grape, how are you doing today? As per my understanding, right now it's not possible to create a streaming table directly on top of a Snowflake table that's accessed through Lakehouse Federation in Databricks. Lakehouse Federation allows ...

2 More Replies
RajNath
by New Contributor II
  • 5433 Views
  • 2 replies
  • 1 kudos

Cost of using delta sharing with unity catalog

I am new to Databricks Delta Sharing. In the case of Delta Sharing, I don't see any cluster running. I tried looking for documentation, but the only hint I got is that it uses a Delta Sharing server. What is the cost of it, and how do I configure and optimize it for la...

Latest Reply
noorbasha534
Valued Contributor II
  • 1 kudos

@RajNath I am also looking for information around this. As far as I understand, it uses provider-side compute. Did you get the same info?...

1 More Replies
Arvind007
by New Contributor II
  • 2814 Views
  • 3 replies
  • 1 kudos

Resolved! Issue while reading external iceberg table from GCS path using spark SQL

df = spark.sql("select * from bqms_table;"); df.show() ENV: DBRT 16.3 (includes Apache Spark 3.5.2, Scala 2.12), org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.1. Py4JJavaError: An error occurred while calling o471.showString. : org.apache.spar...

Latest Reply
Arvind007
New Contributor II
  • 1 kudos

I tried the given solutions, but it seems the issue still persists. I'd appreciate it if this could be resolved by Databricks soon, for better integration between GCP and Databricks.

2 More Replies
abelian-grape
by New Contributor III
  • 1468 Views
  • 2 replies
  • 0 kudos

Triggering Downstream Workflow in Databricks from New Inserts in Snowflake

Hi Databricks experts, I have a table in Snowflake that tracks newly added items, and a downstream data processing workflow that needs to be triggered whenever new items are added. I'm currently using Lakehouse Federation to query the Snowflake tables...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi abelian-grape, great question! Since you're using Lakehouse Federation to access the Snowflake table, and Databricks can't directly stream from or listen to inserts in Snowflake, the best approach is to use an interval-based polling mechanism in Da...

1 More Replies
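The interval-polling idea from the reply can be sketched as a scheduled job that compares a monotonically increasing key against the last processed watermark and triggers downstream work only when new rows appear. Table, column, and catalog names below are illustrative placeholders:

```python
# Hedged sketch: watermark-based polling of a federated Snowflake table.

def new_items(rows, last_watermark):
    """Return rows whose id is beyond the saved watermark."""
    return [r for r in rows if r["id"] > last_watermark]

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
print(new_items(rows, last_watermark=1))  # [{'id': 2}, {'id': 3}]

# In a scheduled Databricks job (placeholder names):
# w = load_watermark()  # e.g. from a small Delta state table
# df = spark.sql(f"SELECT * FROM snowflake_cat.db.items WHERE id > {w}")
# if df.count() > 0:
#     ...trigger the downstream workflow (e.g. Jobs API run-now)...
#     save_watermark(df.agg({"id": "max"}).first()[0])
```

Persisting the watermark between runs (rather than re-scanning the whole table) keeps each poll cheap, which matters since every poll is a live federated query against Snowflake.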