Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

prem14f
by New Contributor II
  • 2156 Views
  • 1 reply
  • 0 kudos

Handling Concurrent Writes to a Delta Table by delta-rs and Databricks Spark Job

Hi @dennyglee, @Retired_mod. If I am writing data into a Delta table using both delta-rs and a Databricks job, but I lose some transactions, how can I handle this? Given that Databricks runs a commit service and delta-rs uses DynamoDB for transaction logs, ...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @prem14f, To manage lost transactions, implement retry logic with automatic retries and ensure idempotent writes to avoid duplication. For concurrent writers, use optimistic concurrency control, which allows for conflict detection and resolution d...
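The retry-plus-idempotence pattern described in this reply can be sketched in plain Python. This is a minimal sketch: `ConcurrentModificationError`, `write_fn`, and the `committed_ids` set are hypothetical stand-ins for whatever conflict error and commit bookkeeping your writer actually exposes, not Delta or delta-rs APIs.

```python
import time


class ConcurrentModificationError(Exception):
    """Stand-in for the conflict error a Delta writer can raise."""


def write_with_retry(write_fn, batch_id, committed_ids, max_retries=3, backoff=0.01):
    """Retry a write on conflict; skip batches already committed (idempotence)."""
    if batch_id in committed_ids:
        return "skipped"  # already written; retrying would duplicate data
    for attempt in range(max_retries):
        try:
            write_fn()
            committed_ids.add(batch_id)  # record the commit so reruns are no-ops
            return "committed"
        except ConcurrentModificationError:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError(f"batch {batch_id} failed after {max_retries} retries")
```

The idempotence check up front is what makes "automatic retries" safe: a batch that was committed but whose acknowledgement was lost is skipped rather than written twice.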

pjv
by New Contributor III
  • 5331 Views
  • 1 reply
  • 1 kudos

Resolved! Connection error when accessing dbutils secrets

We have daily running pipelines that need to access dbutils secrets for API keys. However, when dbutils.secrets.get is called from our Python code we get the following error: org.apache.http.conn.HttpHostConnectException: Connect to us-central1.gcp.da...

erigaud
by Honored Contributor
  • 8981 Views
  • 2 replies
  • 3 kudos

Get total number of files of a Delta table

I'm looking to find out programmatically how many files a Delta table is made of. I know I can do %sql DESCRIBE DETAIL my_table, but that would only give me the number of files of the current version. I am looking for the total number of files (basically ...

Latest Reply
ADavid
New Contributor II
  • 3 kudos

What was the solution?

1 More Replies
Brian-Nowak
by Databricks Partner
  • 2859 Views
  • 3 replies
  • 5 kudos

DBR 15.4 LTS Beta Unable to Write Files to Azure Storage Account

Hi there! I believe I might have identified a bug with DBR 15.4 LTS Beta. The basic task of saving data to a Delta table, as well as the even more basic operation of saving a file to cloud storage, is failing on 15.4 but working perfectly fine on 15.3...

Latest Reply
Ricklen
New Contributor III
  • 5 kudos

We have had the same issue since yesterday (6/8/2024), running on DBR 15.3 or 15.4 LTS Beta. It does seem to have something to do with large tables. Tried with multiple partition sizes.

2 More Replies
Ricklen
by New Contributor III
  • 1760 Views
  • 1 reply
  • 1 kudos

VSCode Databricks Extension Performance

Hello everyone! I've been using the Databricks extension in VSCode for a while now, syncing my repository to my Databricks workspace. In the beginning, syncing files to my workspace was basically instant, but now it is starting to take a lot of...

alm
by Databricks Partner
  • 1188 Views
  • 1 reply
  • 0 kudos

Define SQL table name using Python

I want to control which schema a notebook writes to, depending on the user who runs the notebook. For now, the scope is to support the languages Python and SQL. I have written a Python function, `get_path`, that returns the full path of the destina...
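One way to approach this (a sketch; the `dev` catalog name and the email-prefix-to-schema convention are assumptions, not anything the post specifies) is to derive the schema from the running user and build the fully qualified table name in Python, then interpolate it into SQL strings:

```python
def qualified_table(user_email: str, table: str, catalog: str = "dev") -> str:
    """Map a user's email prefix to a per-user schema and return catalog.schema.table."""
    schema = user_email.split("@")[0].replace(".", "_").replace("-", "_")
    return f"{catalog}.{schema}.{table}"


# In a Databricks notebook you could fetch the user with
# spark.sql("SELECT current_user()") and interpolate the result:
#   target = qualified_table(user, "events")
#   spark.sql(f"CREATE TABLE IF NOT EXISTS {target} (id BIGINT)")
```

For pure SQL cells, one option is to publish the computed name through a widget or Spark conf so the SQL can reference it as a parameter rather than a hard-coded schema.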

rajeevk
by New Contributor
  • 1569 Views
  • 1 reply
  • 0 kudos

Is there a %%capture equivalent possible in Databricks notebooks?

I want to suppress all output of a cell, including text and chart plots. Is it possible to do this in Databricks? I am able to do the same in other notebook environments, but exactly the same approach does not work in Databricks. Any insight or even understandab...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @rajeevk, One way is to use cell hiding: Databricks notebook interface and controls | Databricks on AWS
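For suppressing printed output specifically, a rough Python-side equivalent of `%%capture` is to redirect stdout/stderr with the standard library's contextlib (a sketch; this captures text written inside the block but will not hide rendered charts or `display()` results):

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
    print("noisy output")      # captured into buf instead of being displayed
captured = buf.getvalue()      # inspect the suppressed text later if needed
```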

Pawanukey12
by New Contributor
  • 1415 Views
  • 1 reply
  • 0 kudos

How to get the details of a notebook, i.e. who is the owner of a notebook?

I am using Azure Databricks, and we have Git version control alongside it. How do I find out who created or owns a particular notebook?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Pawanukey12, There is no direct API to get the owner of a notebook from its path in Databricks. However, you can manually check the owner by the notebook name. You can manually go to the folder where the notebook is loca...

ruoyuqian
by New Contributor II
  • 1869 Views
  • 1 reply
  • 0 kudos

Resolved! Delta Live Table run outside of pipeline

I have created a notebook for my Delta Live Table pipeline and it runs without errors. However, if I run the notebook alone on my cluster, it says not allowed and shows this error. Does it mean I can only run Delta Live Tables in the pipeline and canno...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 0 kudos

Hi @ruoyuqian, Delta Live Tables (DLT) have specific execution contexts and dependencies that are managed within their pipeline environment. This is why the code runs successfully only when executed within the pipeline, as DLT creates its own job clus...

ShankarM
by Databricks Partner
  • 2256 Views
  • 2 replies
  • 0 kudos

Intelligent source to target mapping

I want to implement source to target mapping in such a way that source and target columns are auto mapped using intelligent AI mapping resulting in reduction of mapping efforts especially when there are 100+ columns in a table. Metadata information o...

Latest Reply
ShankarM
Databricks Partner
  • 0 kudos

Can you please reply to my latest follow up question?

1 More Replies
thiagoawstest
by Contributor
  • 1890 Views
  • 1 reply
  • 0 kudos

add or change roles

Hello, I have a Databricks environment provisioned on AWS. I would like to know if it is possible to add new roles or change existing ones. In my environment, Admin and User appear. I have the following need: how can I have a group, but the users th...

copper-carrot
by New Contributor II
  • 2175 Views
  • 1 reply
  • 1 kudos

spark.sql() is suddenly giving an error "Unable to instantiate org.apache.hadoop.hive.metastore.Hive

spark.sql() is suddenly giving the error "Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient" on Databricks jobs and Python scripts that worked last month. No local changes on my end. What could be the cause of this and what sh...

neointab
by New Contributor
  • 1021 Views
  • 1 reply
  • 0 kudos

How to restrict a group/user from creating unrestricted clusters

We have set up the entitlement, but it doesn't work. I checked the blogs; it also needs to be set up in a cluster policy, but I can't find how to do that. Could you give some suggestions?

Latest Reply
antonuzzo96
New Contributor III
  • 0 kudos

Hi, have you checked whether the users are admins inside the workspace? That can greatly change the policies and restrictions on the clusters.

hpant1
by New Contributor III
  • 1225 Views
  • 1 replies
  • 0 kudos

Does it make sense to create a volume at an external location in a dev environment?

I have created a dev resource group for Databricks which includes a storage account, an access connector, and a Databricks workspace. In the storage account I have created a container which is linked to the metastore. This container also contains raw dat...

Latest Reply
antonuzzo96
New Contributor III
  • 0 kudos

Hi, for some use cases we have created external volumes in Databricks because they needed to be accessed outside of Databricks, directly on the storage account, as the files had to interact with other tools.

hpant1
by New Contributor III
  • 1028 Views
  • 1 reply
  • 2 kudos

Which is the more optimized way of writing a Delta table in a workflow: "append" or "overwrite"?

Which is the more optimized way of writing a Delta table in a workflow that runs every hour: "append" or "overwrite"?

Latest Reply
Witold
Databricks Partner
  • 2 kudos

There's no "optimized way", as these are two different concepts, and the choice depends on your use case: overwrite removes existing data, i.e. replaces it with new data, while append adds new data to your existing table.
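For reference, the mode is selected on the DataFrame writer, e.g. `df.write.format("delta").mode("append").saveAsTable("t")` versus `mode("overwrite")`. A toy pure-Python model of the two semantics (an illustration of the difference, not Delta itself):

```python
def save(existing_rows, new_rows, mode):
    """Toy model: overwrite replaces the table's rows, append adds to them."""
    if mode == "overwrite":
        return list(new_rows)                      # old rows are gone
    if mode == "append":
        return list(existing_rows) + list(new_rows)  # old rows are kept
    raise ValueError(f"unknown mode: {mode}")
```

So for an hourly job the question is really about the data model: append if each run contributes new records, overwrite if each run recomputes the full table.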
