Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Punit_Prajapati
by New Contributor
  • 57 Views
  • 3 replies
  • 6 kudos

Resolved! SERVERLESS SQL WAREHOUSE

Hello All, I have two questions regarding the serverless SQL warehouse, which are the following: 1.) If I create a small Serverless SQL Warehouse in Databricks that shows 12 DBUs/hour, will I be charged 12 DBUs even if I don’t run any queries in that hour? ...

Latest Reply
BigRoux
Databricks Employee

Shua42 hits the nail on the head. If I can be so bold as to summarize: you are only charged while the Warehouse is running, regardless of how much or how little you use it. We do have an auto-stop feature you can configure. Essentially, you set a time...
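For reference, a minimal sketch of setting that auto-stop timeout programmatically through the SQL Warehouses REST API (the workspace URL, token, and warehouse ID below are placeholders, and this assumes the edit endpoint accepts a partial update; the same setting is also available in the warehouse UI):

```python
# Hedged sketch: set auto_stop_mins on an existing SQL warehouse via the REST API.
# Placeholders: workspace URL, personal access token, warehouse ID.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
WAREHOUSE_ID = "<warehouse-id>"

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses/{WAREHOUSE_ID}/edit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"auto_stop_mins": 10},  # stop the warehouse after 10 idle minutes
)
resp.raise_for_status()
```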

2 More Replies
tgburrin-afs
by New Contributor
  • 1503 Views
  • 3 replies
  • 0 kudos

Limiting concurrent tasks in a job

I have a job with > 10 tasks in it that interacts with an external system outside of Databricks. At the moment that external system cannot handle more than 3 of the tasks executing concurrently. How can I limit the number of tasks that concurrently...

Latest Reply
_J
New Contributor II

Same thing here; job concurrency is good, but there is nothing for tasks. Some of our jobs have countless parallel tasks, so without controlling that, the downstream servers grind to a halt and tasks just terminate. It needs what we call a spinlock on tasks to...
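One workaround that gets mentioned for this (a sketch under the assumption that the external calls can be collapsed into a single task) is to bound parallelism inside that task with a small thread pool, since per-task concurrency limits are not offered by Jobs today:

```python
# Workaround sketch: run the external-system calls inside one task and cap
# concurrency at 3 with a thread pool. call_external_system and the work items
# are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_external_system(item):
    ...  # real interaction with the external system goes here

work_items = ["unit-1", "unit-2", "unit-3", "unit-4", "unit-5"]

with ThreadPoolExecutor(max_workers=3) as pool:  # at most 3 calls in flight
    futures = [pool.submit(call_external_system, item) for item in work_items]
    for future in as_completed(futures):
        future.result()  # surface any failure
```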

2 More Replies
YuriS
by New Contributor
  • 517 Views
  • 1 reply
  • 0 kudos

VACUUM with Azure Storage Inventory Report is not working

Could someone please advise regarding VACUUM with an Azure Storage Inventory Report, as I have failed to make it work. DBR 15.4 LTS; the VACUUM command is being run with the USING INVENTORY clause, as follows: VACUUM schema.table USING INVENTORY ( select 'https://...

Latest Reply
Brahmareddy
Honored Contributor III

Hi YuriS, how are you doing today? As per my understanding, you're absolutely right to look into the USING INVENTORY clause for VACUUM, especially when dealing with large storage footprints. The tricky part is that while this feature is part of open-...
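For context, the general shape of the command being discussed looks roughly like this (a sketch only: the storage_inventory table name is hypothetical, and it assumes the Azure inventory report has already been loaded with the columns Delta expects, namely path, length, isDir, and modificationTime):

```python
# Hedged sketch of VACUUM ... USING INVENTORY run from a notebook.
spark.sql("""
    VACUUM schema.table USING INVENTORY (
        SELECT path, length, isDir, modificationTime
        FROM storage_inventory   -- hypothetical table holding the inventory report
    )
""")
```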

ShreevaniRao
by New Contributor
  • 898 Views
  • 11 replies
  • 4 kudos

Newbie learning DLT pipelines

Hello, I am learning to create DLT pipelines using different graphs, using a 14-day trial version of premium Databricks. I currently have one graph: Mat view -> Streaming Table -> Mat view. When I ran this pipeline (serverless compute) the 1st time, ran...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III

Use this: https://www.youtube.com/watch?v=iqf_QHC7tgQ&list=PL2IsFZBGM_IGpBGqxhkiNyEt4AuJXA0Gg. It will help you a lot.

10 More Replies
640913
by New Contributor III
  • 8869 Views
  • 3 replies
  • 1 kudos

%pip install requirements.txt - path not found

Hi everyone, I was just testing things out to come up with a reasonable way of working with version management in DB, and was inspired by the commands specified here. For my team and me, it makes no sense to put the requirements file in the dbfs locatio...

Latest Reply
Rajat-TVSM
Visitor

Hello, were you able to find a solution to this?

2 More Replies
smpa01
by New Contributor
  • 50 Views
  • 1 reply
  • 0 kudos

Debugging jobs/run-now endpoint

I am not able to run the jobs/run-now endpoint. I am getting an error: Error fetching files: 403 - {"error_code":"PERMISSION_DENIED","message":"User xxxx-dxxxx-xxx-xxxx does not have Manage Run or Owner or Admin permissions on job 437174060919465",...

Latest Reply
Renu_
New Contributor III

Hi @smpa01, make sure the API token or authentication you're using is linked to an account that has the necessary permissions on the job. You can also try running the job directly from the Databricks UI; if it works there but not via the API, it ind...
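As a quick way to test that point, here is a minimal sketch of calling run-now with a token that belongs to a principal holding the needed permission (Manage Run, Owner, or Admin) on the job; the URL and token are placeholders, and the job ID is taken from the error message above:

```python
# Hedged sketch: trigger a job run via the Jobs API once permissions are in place.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<token-of-a-principal-with-Manage-Run-on-the-job>"

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": 437174060919465},
)
resp.raise_for_status()
print(resp.json())  # includes the run_id of the triggered run
```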

Yunky007
by New Contributor
  • 42 Views
  • 2 replies
  • 0 kudos

ETL pipeline

I have an ETL pipeline in Workflows which I am using to create a materialized view. I want to schedule the pipeline for 10 hours only, starting from 10 am. How can I schedule that? I can only see an hourly schedule or cron syntax. I want the compute ...

Latest Reply
Isi
Contributor

Hey @Yunky007, you should use the cron expression 0 10 * * * to start the process at 10 AM. Then, inside your script, implement a loop or mechanism that keeps the logic running for 10 hours; that's the trick. import time from datetime import datetime, ...
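A minimal sketch of the loop that reply is describing (assumptions: the task is triggered by the 0 10 * * * schedule, and the refresh interval and materialized view name are placeholders):

```python
# Hedged sketch: keep refreshing for roughly 10 hours after the 10 AM trigger.
import time
from datetime import datetime, timedelta

end_time = datetime.now() + timedelta(hours=10)
interval_seconds = 15 * 60  # hypothetical: refresh every 15 minutes

while datetime.now() < end_time:
    spark.sql("REFRESH MATERIALIZED VIEW my_catalog.my_schema.my_mv")  # placeholder name
    time.sleep(interval_seconds)
```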

1 More Reply
petitregny
by New Contributor
  • 84 Views
  • 2 replies
  • 0 kudos

Reading from an S3 bucket using boto3 on serverless cluster

Hello All, I am trying to read a CSV file from my S3 bucket in a notebook running on serverless. I am using the two standard functions below, but I get a credentials error (Error reading CSV from S3: Unable to locate credentials). I don't have this issu...

Latest Reply
Isi
Contributor

Hi @petitregny, the issue you're encountering is likely due to the access mode of your cluster. Serverless compute uses standard/shared access mode, which does not allow you to directly access AWS credentials (such as the instance profile) in the sam...
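One commonly used workaround in that situation (a sketch, not the only option, and the secret scope/key names are hypothetical) is to hand boto3 explicit credentials from a secret scope instead of relying on an instance profile:

```python
# Hedged sketch: boto3 with explicit credentials pulled from a Databricks secret scope.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=dbutils.secrets.get("aws", "access_key_id"),          # hypothetical scope/key
    aws_secret_access_key=dbutils.secrets.get("aws", "secret_access_key"),  # hypothetical scope/key
    region_name="us-east-1",  # adjust to the bucket's region
)

obj = s3.get_object(Bucket="my-bucket", Key="path/to/file.csv")  # placeholders
csv_bytes = obj["Body"].read()
```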

1 More Reply
notwarte
by New Contributor II
  • 149 Views
  • 4 replies
  • 0 kudos

Unity Catalog storage amounts

Hi, I am using Azure and I have predictive optimization enabled on the catalog. I have written a script to calculate the data volumes of all of the tables, looping over all of the tables and running "describe detail". All of the tables amount to ~ 1....
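The kind of loop described here looks roughly like this (a sketch only; the catalog and schema names are placeholders, and it assumes DESCRIBE DETAIL's sizeInBytes column is what is being totalled):

```python
# Hedged sketch of the size-summing loop described in the post.
tables = [r.tableName for r in spark.sql("SHOW TABLES IN my_catalog.my_schema").collect()]

total_bytes = 0
for t in tables:
    detail = spark.sql(f"DESCRIBE DETAIL my_catalog.my_schema.{t}").collect()[0]
    total_bytes += detail["sizeInBytes"] or 0

print(f"Total size across {len(tables)} tables: {total_bytes / 1024**3:.2f} GiB")
```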

Latest Reply
Isi
Contributor

Hey @notwarte, using the __databricks_internal catalog to trace the underlying storage location is a solid approach for investigating their footprint. Regarding your question about storage duplication: yes, materialized views in Databricks do store a p...

3 More Replies
Malthe
by New Contributor II
  • 23 Views
  • 1 reply
  • 0 kudos

Parametrize DLT pipeline

If I'm using Databricks Asset Bundles, how would I parametrize a DLT pipeline based on a static configuration file? In pseudo-code, I would have a .py file: import dlt # Something that pulls a pipeline resource (or artifact) and parses from JSON table...

Latest Reply
Emmitt18Lefebvr
New Contributor

Hello! To parametrize a Databricks DLT pipeline with a static configuration file using Asset Bundles, include your JSON/YAML config file in the bundle. In your DLT pipeline code, read this file using Python's file I/O (referencing its deployed path). ...
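A minimal sketch of that pattern (all paths, file names, and config keys below are hypothetical, and it assumes the bundle syncs the config file alongside the pipeline source):

```python
# Hedged sketch: drive DLT table definitions from a JSON config file deployed by the bundle.
import json

import dlt
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

# Hypothetical deployed path; adjust to wherever your bundle places the file.
CONFIG_PATH = "/Workspace/Shared/my_bundle/files/pipeline_config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

def make_table(table_cfg):
    @dlt.table(name=table_cfg["name"])
    def _table():
        # Read the configured source table; the filter key is optional.
        return spark.read.table(table_cfg["source"]).where(table_cfg.get("filter", "true"))

for table_cfg in config["tables"]:
    make_table(table_cfg)
```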

dc-rnc
by New Contributor III
  • 831 Views
  • 3 replies
  • 0 kudos

DAB | Set tag based on job parameter

Hi Community. Since I wasn't able to find a way to set a job tag dynamically at runtime based on a parameter that is passed to the job, I was wondering if it is possible or if there is an equivalent way to do it. Thank you. Regards.

Latest Reply
BigRoux
Databricks Employee

Based on the provided context, it appears that there isn't a direct way within Databricks to dynamically set job tags at runtime based on a parameter passed to the job. However, there are alternative approaches you can consider to work around this li...
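One workaround along these lines (an assumption on my part, not necessarily what the reply goes on to describe) is to have a task in the job call the Jobs API and patch the job's tags from the parameter it received; note this updates the job definition going forward rather than tagging a single run, and it requires manage permission on the job. The URL, token, job ID, and tag key are placeholders:

```python
# Hedged sketch: update a job's tags at runtime via the Jobs API.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
JOB_ID = 123456789                        # placeholder
tag_value = "<value-from-job-parameter>"  # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": JOB_ID, "new_settings": {"tags": {"run_context": tag_value}}},
)
resp.raise_for_status()
```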

2 More Replies
cs_de
by New Contributor
  • 61 Views
  • 4 replies
  • 3 kudos

How do I deploy or run one job if I have multiple jobs in a Databricks Asset Bundle?

How do I deploy or run a single job if I have 2 or more jobs defined in my asset bundle? $ databricks bundle deploy job1 # does not work. I do not see a flag to identify which job to run.

Latest Reply
mark_ott
Databricks Employee

I haven't done it with multiple jobs, but I think under resources you name multiple jobs, then when you deploy you just call that job key.  

3 More Replies
GregTyndall
by New Contributor II
  • 645 Views
  • 3 replies
  • 0 kudos

Resolved! Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh? "incrementalization_issues": [{"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MO...

Latest Reply
PotnuruSiva
Databricks Employee

@GregTyndall Yes, the current limit is 2 by default, but it can be increased up to 5 by adding the following flag to the pipeline settings: pipelines.enzyme.numberOfJoinsThreshold: 5

2 More Replies
Christian_C
by New Contributor
  • 213 Views
  • 0 replies
  • 0 kudos

Google Pub Sub and Delta live table

I am using Delta Live Tables and Pub/Sub to ingest messages from 30 different topics in parallel. I noticed that the initialization time can be very long, around 15 minutes. Does someone know how to reduce the initialization time in DLT? Thank you.

