Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

acj1459
by New Contributor
  • 673 Views
  • 0 replies
  • 0 kudos

Azure Databricks Data Load

Hi All, I have 10 tables on an on-prem MS SQL DB and want to load their data incrementally into Bronze Delta tables as append-only. From Bronze to Silver, I want to use a merge query to load the latest record into the Silver Delta table. Whatever latest...
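A minimal sketch of the Bronze-to-Silver pattern described above: keep Bronze append-only, then MERGE only the latest record per key into Silver. The table names, key column (id), and ordering column (load_ts) are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F
from pyspark.sql.window import Window

bronze = spark.table("bronze.my_table")

# Pick the latest record per business key from the append-only Bronze table.
w = Window.partitionBy("id").orderBy(F.col("load_ts").desc())
latest = (
    bronze.withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")
    .drop("rn")
)

# Upsert into Silver so it always holds the most recent version of each key.
silver = DeltaTable.forName(spark, "silver.my_table")
(
    silver.alias("t")
    .merge(latest.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```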

MRTN
by New Contributor III
  • 7025 Views
  • 3 replies
  • 2 kudos

Resolved! Configure multiple source paths for auto loader

I am currently using two streams to monitor data in two different containers on an Azure storage account. Is there any way to configure an autoloader to read from two different locations? The schemas of the files are identical.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Morten Stakkeland: Yes, it's possible to configure an autoloader to read from multiple locations. You can define multiple cloudFiles sources for the autoloader, each pointing to a different container in the same storage account. In your case, since ...
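As a rough illustration of that approach (not from the original reply), here is a minimal sketch that defines one cloudFiles source per container and unions them into a single stream; the paths, file format, and schema/checkpoint locations are hypothetical.

```python
def read_container(path: str, schema_location: str):
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_location)
        .load(path)
    )

stream_a = read_container(
    "abfss://container-a@mystorage.dfs.core.windows.net/events/", "/mnt/schemas/events_a"
)
stream_b = read_container(
    "abfss://container-b@mystorage.dfs.core.windows.net/events/", "/mnt/schemas/events_b"
)

# Identical schemas, so the two sources can be unioned into one stream.
(
    stream_a.unionByName(stream_b)
    .writeStream.option("checkpointLocation", "/mnt/checkpoints/events")
    .toTable("bronze.events")
)
```

Running two separate streams that write to the same target table is the other common option.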

2 More Replies
N_M
by Contributor
  • 24127 Views
  • 7 replies
  • 4 kudos

Resolved! use job parameters in scripts

Hi Community, I did some research but wasn't lucky, and I'm a bit surprised I can't find anything about it. I would simply like to access the job parameters when using Python scripts (not notebooks). My flow doesn't use notebooks, but I still need to dri...

Latest Reply
N_M
Contributor
  • 4 kudos

The only working workaround I found was provided in another thread: Re: Retrieve job-level parameters in Python - Databricks Community - 44720. I will repost it here (thanks @julio_resende). You need to push your parameters down to the task level. E.g.: C...
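A hedged sketch of that workaround (the parameter names and dynamic value reference are assumptions, not from the truncated reply): set the task's parameters to something like ["--my_param", "{{job.parameters.my_param}}"] so the job-level parameter is pushed down to the task, then read it from the script's command line.

```python
# my_script.py (a Python script task, not a notebook).
# Assumed task parameters: ["--my_param", "{{job.parameters.my_param}}"]
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--my_param", required=True, help="Job-level parameter pushed down to the task")
args = parser.parse_args()

print(f"my_param = {args.my_param}")
```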

6 More Replies
Shiva3
by New Contributor III
  • 1552 Views
  • 2 replies
  • 0 kudos

How to know the actual size of Delta and non-Delta tables, and the number of files that actually exist on S3

I have a set of Delta and non-Delta tables whose data is on AWS S3. I want to know the actual total size of my Delta and non-Delta tables, excluding files belonging to operations such as DELETE, VACUUM, etc. I also need to know how many files each Delta versi...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @Shiva3, To manage the size of Delta and non-Delta tables on AWS S3, excluding irrelevant files, start by using `DESCRIBE HISTORY` to monitor Delta table metrics and `VACUUM` to clean up old files, setting a retention period as needed. For non-Del...
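One way to put numbers on that, sketched below under assumptions (hypothetical table name; dbutils available in a notebook): compare the current version reported by DESCRIBE DETAIL with a recursive listing of everything under the table location, which also counts files left behind by older versions until VACUUM removes them.

```python
detail = spark.sql("DESCRIBE DETAIL my_catalog.my_schema.my_table").collect()[0]
print("Current version:", detail["numFiles"], "files /", detail["sizeInBytes"], "bytes")

def du(path):
    """Recursively sum file sizes and counts under a path with dbutils.fs.ls."""
    total_bytes, total_files = 0, 0
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            b, f = du(entry.path)
            total_bytes, total_files = total_bytes + b, total_files + f
        else:
            total_bytes, total_files = total_bytes + entry.size, total_files + 1
    return total_bytes, total_files

all_bytes, all_files = du(detail["location"])
print("Everything on S3 (incl. superseded versions):", all_files, "files /", all_bytes, "bytes")
```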

1 More Replies
a-sky
by New Contributor II
  • 2393 Views
  • 1 replies
  • 1 kudos

Databricks job stalls without error, unable to pin-point error, all compute metrics seem ok

I have a job that gets stuck on "Determining DBIO File fragment" and I have not been able to figure out why this job keeps getting stuck. I monitor the job cluster metrics throughout the job and it doesn't seem like it's hitting any bottlenecks with m...

(Four screenshots of cluster metrics attached.)
Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @a-sky, This message indicates that Databricks is figuring out which file fragments are cached, which can be slow, especially with frequent cluster scaling. To address this, you can try disabling delta caching with `spark.conf.set("spark.databrick...
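For reference, a one-line sketch of the setting the reply points at; the config key is completed here as an assumption based on the standard Databricks disk (DBIO) cache setting, so verify it against the docs for your runtime.

```python
# Disable the Databricks disk (DBIO) cache for the current session.
spark.conf.set("spark.databricks.io.cache.enabled", "false")
```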

DMehmuda
by New Contributor
  • 2475 Views
  • 1 replies
  • 0 kudos

Issue with round off value while loading to delta table

I have a float datatype column in a Delta table, and the data to be loaded should be rounded off to 2 decimal places. I'm casting the column to DECIMAL(18,10) and then using the round function from pyspark.sql.functions to round values to 2 decimal p...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @DMehmuda, The issue arises because floating-point numbers in Delta tables can retain more decimal places than expected. To ensure values are stored with the correct precision, explicitly cast the column to `DECIMAL(18,2)` before writing to the De...
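A minimal sketch of that suggestion (column and table names are hypothetical): round first, then cast to DECIMAL(18,2) before writing.

```python
from pyspark.sql import functions as F

# Hypothetical data; the point is the round-then-cast before the Delta write.
df = spark.createDataFrame([(1, 123.456789)], ["id", "amount"])
df_fixed = df.withColumn("amount", F.round(F.col("amount"), 2).cast("decimal(18,2)"))
df_fixed.write.format("delta").mode("append").saveAsTable("my_schema.my_table")
```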

prem14f
by New Contributor II
  • 1628 Views
  • 1 replies
  • 0 kudos

Handling Concurrent Writes to a Delta Table by delta-rs and Databricks Spark Job

Hi @dennyglee, @Retired_mod. If I am writing data into a Delta table using delta-rs and a Databricks job, but I lose some transactions, how can I handle this? Given that Databricks runs a commit service and delta-rs uses DynamoDB for transaction logs, ...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @prem14f, To manage lost transactions, implement retry logic with automatic retries and ensure idempotent writes to avoid duplication. For concurrent writers, use optimistic concurrency control, which allows for conflict detection and resolution d...
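A hedged sketch of the retry-plus-idempotent-write idea (not from the original reply; the table, key, and conflict detection are assumptions): the MERGE is an idempotent upsert per key, and it is retried with backoff when another writer causes a conflict.

```python
import time
from delta.tables import DeltaTable

def merge_with_retry(updates_df, max_retries: int = 5) -> None:
    """Idempotent upsert, retried with exponential backoff on concurrent-write conflicts."""
    target = DeltaTable.forName(spark, "my_schema.my_table")  # hypothetical table
    for attempt in range(max_retries):
        try:
            (
                target.alias("t")
                .merge(updates_df.alias("s"), "t.id = s.id")
                .whenMatchedUpdateAll()
                .whenNotMatchedInsertAll()
                .execute()
            )
            return
        except Exception as exc:
            # Crude conflict detection, e.g. a ConcurrentAppendException raised by another writer.
            if "Concurrent" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
```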

pjv
by New Contributor III
  • 4963 Views
  • 1 replies
  • 1 kudos

Resolved! Connection error when accessing dbutils secrets

We have daily running pipelines that need to access dbutils secrets for API keys. However, when calling the dbutils.secrets.get function from our Python code we get the following error: org.apache.http.conn.HttpHostConnectException: Connect to us-central1.gcp.da...

erigaud
by Honored Contributor
  • 8291 Views
  • 2 replies
  • 3 kudos

Get total number of files of a Delta table

I'm looking to know programmatically how many files a Delta table is made of. I know I can do %sql DESCRIBE DETAIL my_table, but that would only give me the number of files of the current version. I am looking for the total number of files (basically ...

Latest Reply
ADavid
New Contributor II
  • 3 kudos

What was the solution?

1 More Replies
Brian-Nowak
by New Contributor II
  • 2439 Views
  • 3 replies
  • 5 kudos

DBR 15.4 LTS Beta Unable to Write Files to Azure Storage Account

Hi there! I believe I might have identified a bug with DBR 15.4 LTS Beta. The basic task of saving data to a Delta table, as well as the even more basic operation of saving a file to cloud storage, is failing on 15.4 but working perfectly fine on 15.3...

Latest Reply
Ricklen
New Contributor III
  • 5 kudos

We have had the same issue since yesterday (6/8/2024), running on DBR 15.3 or 15.4 LTS Beta. It does indeed seem to have something to do with large tables. Tried with multiple partition sizes.

2 More Replies
Ricklen
by New Contributor III
  • 1400 Views
  • 1 replies
  • 1 kudos

VSCode Databricks Extension Performance

Hello Everyone! I've been using the Databricks extension in VSCode for a while now, and I'm syncing my repository to my Databricks workspace. In the beginning, syncing files to my workspace was basically instant, but now it is starting to take a lot of...

alm
by New Contributor III
  • 995 Views
  • 1 replies
  • 0 kudos

Define SQL table name using Python

I want to control which schema a notebook writes to, and I want it to depend on the user that runs the notebook. For now, the scope is to support the Python and SQL languages. I have written a Python function, `get_path`, that returns the full path of the destina...
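One possible sketch (not necessarily the poster's `get_path`; the schema naming rule is hypothetical): derive the schema from current_user() in Python and pass it into SQL via spark.sql, so the same value drives both languages. For pure %sql cells, the value could instead be exposed through a notebook widget or parameter.

```python
# Hypothetical naming rule: one dev schema per user.
def schema_for_user(user: str) -> str:
    return "dev_" + user.split("@")[0].replace(".", "_")

user = spark.sql("SELECT current_user() AS u").collect()[0]["u"]
table = f"{schema_for_user(user)}.my_table"

spark.sql(f"CREATE TABLE IF NOT EXISTS {table} (id INT, value STRING)")
spark.sql(f"INSERT INTO {table} VALUES (1, 'example')")
```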

rajeevk
by New Contributor
  • 1280 Views
  • 1 replies
  • 0 kudos

Is there a %%capture or equivalent possible in databricks notebook

I want to suppress all output of a cell, including text and chart plots. Is it possible to do this in Databricks? I am able to do the same in other notebook environments, but exactly the same approach does not work in Databricks. Any insight or even understandab...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @rajeevk, one way is to use cell hiding: Databricks notebook interface and controls | Databricks on AWS
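For the text portion of the output, a standard-library sketch can act as a partial stand-in for %%capture inside a single cell (it captures stdout/stderr only, not display() or chart output):

```python
import contextlib
import io

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
    print("this text is captured instead of shown")
    # run the noisy code here

# The captured text remains available if needed:
# print(buffer.getvalue())
```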

Pawanukey12
by New Contributor
  • 1075 Views
  • 1 replies
  • 0 kudos

How to get the details of a notebook, i.e. who is the owner of a notebook?

I am using Azure Databricks. We have the Git version control system along with it. How do I find out who created or owns a particular notebook?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Pawanukey12, there is no direct API to get the owner of a notebook using the notebook path in Databricks. However, you can manually check the owner of the notebook by the notebook name. You can manually go to the folder where the notebook is loca...

