Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vr
by Contributor III
  • 203 Views
  • 10 replies
  • 2 kudos

remote_query() is not working

I am trying to experiment with the remote_query() function according to the documentation. The feature is in public preview, so I assume it should be available to everyone now. select * from remote_query( 'my_connection', database => 'mydb', dbtable...

Latest Reply
vr
Contributor III
  • 2 kudos

I have the same error if I query SELECT 1 from remote_query(). From the documentation: > To use the remote_query function, you first need to create a Unity Catalog connection. So, not sure why it rebels against its creators.

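The two-step flow the documentation describes can be sketched as follows; the connection, database, and table names are hypothetical placeholders, not taken from the thread:

```python
# Sketch of the documented flow, assuming a PostgreSQL source: first create a
# Unity Catalog connection, then query through remote_query(). All names here
# (my_connection, mydb, mytable) are placeholders.
def remote_query_sql(connection: str, database: str, dbtable: str) -> str:
    """Build the remote_query() statement the docs describe."""
    return (
        f"SELECT * FROM remote_query('{connection}', "
        f"database => '{database}', dbtable => '{dbtable}')"
    )

# On a cluster, roughly:
#   spark.sql("CREATE CONNECTION my_connection TYPE postgresql OPTIONS (...)")
#   spark.sql(remote_query_sql("my_connection", "mydb", "mytable")).show()
```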
9 More Replies
mdungey
by New Contributor II
  • 29 Views
  • 3 replies
  • 0 kudos

Impact of deleting Lakeflow pipelines on the objects within

I've seen hidden in some forums that Databricks is working on a fix so that when you delete an LDP pipeline it doesn't delete the underlying objects (streaming tables, materialized views, etc.). Can anyone from an official source confirm this and maybe give s...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

Yes, I would take that with a pinch of salt.

2 More Replies
SRJDB
by Visitor
  • 26 Views
  • 1 reply
  • 0 kudos

How to restrict the values permitted in a job or task parameter?

Hi, apologies if this is a daft question - I'm relatively new to Databricks and still finding my feet! I have a notebook with a parameter set within it via a widget, like this: dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"]) my_variable = d...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

The Databricks job/task parameter interface does not provide a built-in UI feature to restrict the values entered by the user. You can add runtime validation code inside the notebook to allow or fail the run based on the values passed in.

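A minimal sketch of that runtime validation: the widget name and allowed values mirror the question, while the fail-fast helper is an assumption, not a Databricks API:

```python
# Fail fast when a job/task parameter falls outside the allowed set.
# validate_param is a hypothetical helper; the widget name and values
# come from the original question.
ALLOWED = ["A", "B", "C"]

def validate_param(value: str, allowed=ALLOWED) -> str:
    """Return the value unchanged, or raise so the task fails visibly."""
    if value not in allowed:
        raise ValueError(f"Parameter must be one of {allowed}, got {value!r}")
    return value

# In the notebook:
#   dbutils.widgets.dropdown("My widget", "A", ALLOWED)
#   my_variable = validate_param(dbutils.widgets.get("My widget"))
```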
mafzal669
by Visitor
  • 23 Views
  • 1 reply
  • 0 kudos

Admin user creation

Hi, I have created an Azure account using my personal email ID. I want to add this email ID as a group ID in the Databricks admin console, but when I add a new user it says a user with this email ID already exists. Could someone please help? I use...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

Because user IDs and group IDs share the same namespace in Databricks, you cannot create a group with an email address that is already registered as a user in your Databricks account. You are better off renaming the group.

suchitpathak08
by New Contributor
  • 37 Views
  • 3 replies
  • 0 kudos

Urgent Assistance Needed – Unity Catalog Storage Access Failure & VM SKU Availability (Databricks on

Hi everyone, I'm running into two blocking issues while trying to run a Delta Live Tables (DLT) pipeline on Databricks (Azure). I'm hoping someone can help me understand what's going wrong. 1. Unity Catalog cannot access underlying ADLS storage. Every DL...

Latest Reply
bianca_unifeye
New Contributor III
  • 0 kudos

DLT pipelines always spin up job compute, and Azure is strict about SKU availability per region and per subscription. The most common cause: the quota for that VM family is set to 2 vCPUs, so Databricks shows "Estimated available: 2" or "QuotaExceeded". The SKU exists...

2 More Replies
Johan_Van_Noten
by New Contributor III
  • 85 Views
  • 2 replies
  • 2 kudos

Long-running Python http POST hangs

As one of the steps in my data engineering pipeline, I need to perform a POST request to an http (not -s) server. This all works fine, except for the situation described below: it then hangs indefinitely. Environment: Azure Databricks Runtime 13.3 LTS, Pyt...

Latest Reply
Johan_Van_Noten
New Contributor III
  • 2 kudos

Thanks for your quick and extensive reply. Given that I don't have any administration rights on the Azure/Databricks environment and don't have the REST server under control, some of the sensible suggestions are difficult. I will work with IT to check ...

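One common mitigation for a POST that hangs indefinitely is an explicit timeout plus bounded retries; a sketch assuming the `requests` library, with the URL, payload, and timeout values as placeholders:

```python
# Guard a plain-HTTP POST with explicit timeouts and bounded retries so a
# silently dropped connection raises instead of hanging forever. The URL,
# payload, and timeout values below are assumptions, not from the thread.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries: int = 3, backoff: float = 1.0) -> requests.Session:
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=[502, 503, 504],
        allowed_methods=["POST"],  # POST is not retried by default
    )
    session = requests.Session()
    session.mount("http://", HTTPAdapter(max_retries=retry))
    return session

# connect timeout 10 s, read timeout 300 s: a hung server now raises
# requests.exceptions.ReadTimeout instead of blocking the pipeline.
#   resp = make_session().post("http://example.internal/job",
#                              json={"run": 1}, timeout=(10, 300))
```

Note the two-element timeout: the first bounds connection setup, the second bounds each wait for response bytes; without it, `requests` can block forever on a half-open connection.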
1 More Replies
Mathew-Vesely
by New Contributor
  • 43 Views
  • 2 replies
  • 0 kudos

Archive of legacy system into Databricks with structured and semi-structured data

We are currently exploring using Databricks to store and archive data from a legacy system. The governance features of Unity Catalog will give us the required capabilities to ensure we meet our legal, statutory and policy requirements for data rete...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

A classic Customer 360 view case, and Databricks is certainly the right platform for it. Structured data: stored in Delta tables. Emails and PDFs: stored in Volumes, with the metadata (the path into the volume) stored in a Delta table against the customer ID. In...

1 More Replies
Suheb
by New Contributor III
  • 30 Views
  • 1 reply
  • 0 kudos

How can I efficiently archive old data in Delta tables without slowing queries?

How can I remove or move older rows from my main Delta table so that queries on recent data are faster, while still keeping access to the historical data if needed?

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Hi Suheb, when using Delta tables with Databricks, if you use proper liquid clustering or partitioning you should get good performance compared to relational engines when dealing with big data volumes. However, you can also separate tab...

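The separate-archive-table approach can be sketched as an INSERT-then-DELETE pair run on a schedule; the table names, timestamp column, and 90-day cutoff below are all assumptions:

```python
# Build the statement pair that moves rows older than a cutoff from the hot
# table into an archive table. All identifiers here are placeholders.
from datetime import date, timedelta

def archive_statements(table: str, archive: str, ts_col: str, keep_days: int = 90):
    cutoff = (date.today() - timedelta(days=keep_days)).isoformat()
    predicate = f"{ts_col} < '{cutoff}'"
    return [
        f"INSERT INTO {archive} SELECT * FROM {table} WHERE {predicate}",
        f"DELETE FROM {table} WHERE {predicate}",
    ]

# On Databricks, roughly:
#   for stmt in archive_statements("main.sales.events",
#                                  "main.sales.events_archive", "event_ts"):
#       spark.sql(stmt)
```

Queries on recent data then scan only the small hot table, while the archive table remains queryable (or unioned in via a view) when history is needed.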
Dom1
by New Contributor III
  • 93 Views
  • 2 replies
  • 1 kudos

Pull JAR from private Maven repository (Azure Artifactory)

Hi, I currently struggle with the following task: we want to push our code to a private repository (Azure Artifactory) and then pull it from Databricks when the job runs. It currently works only with wheels inside a PyPI repo in the Artifactory. I found ...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

Databricks can install Maven libraries by coordinate and lets you point at a custom repository URL. However, passing credentials for authenticated private Maven repositories directly through the Libraries UI/Jobs is not natively supported today and r...

1 More Replies
Suheb
by New Contributor III
  • 111 Views
  • 4 replies
  • 4 kudos

What are common pitfalls when migrating large on-premise ETL workflows to Databricks and how did you

When moving your big data pipelines from local servers to Databricks, what problems usually happen, and how did you fix them?

Latest Reply
tarunnagar
New Contributor III
  • 4 kudos

Migrating large on-premise ETL workflows to Databricks often goes wrong when teams try to “lift and shift” legacy logic directly into Spark. Poor data layout, tiny files, and inefficient partitioning can quickly hurt performance, so restructuring dat...

3 More Replies
pooja_bhumandla
by New Contributor III
  • 58 Views
  • 2 replies
  • 1 kudos

Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–80 MB

Hi Community, I encountered the following error: Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLevel(memory, deserialized, 1 replicas) in a Structured S...

Latest Reply
Yogesh_Verma_
Contributor
  • 1 kudos

What Spark does during a broadcast join: Spark identifies the smaller table (say 80 MB). The driver collects this small table to a single JVM. The driver serializes the table into a broadcast variable. The broadcast variable is shipped to all executors. Ex...

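When the "small" side no longer fits in executor memory, the usual lever is the auto-broadcast threshold; a sketch, where the 50 MB cap is an assumption, not a recommendation from the thread:

```python
# Spark config knobs for the broadcast failure described above. "-1" disables
# automatic broadcast joins entirely; a byte value caps the size Spark will
# broadcast on its own. The 50 MB figure is an assumption.
DISABLE = "-1"
FIFTY_MB = str(50 * 1024 * 1024)

broadcast_conf = {"spark.sql.autoBroadcastJoinThreshold": FIFTY_MB}

# On a cluster, roughly:
#   spark.conf.set("spark.sql.autoBroadcastJoinThreshold", DISABLE)
#   large_df.join(small_df.hint("broadcast"), "key")  # or opt in per join
```

Disabling auto-broadcast and opting in per join via the `broadcast` hint keeps the failure mode explicit instead of depending on Spark's size estimate.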
1 More Replies
mkkao924
by New Contributor II
  • 76 Views
  • 3 replies
  • 1 kudos

Best practice to handle SQL table archives?

Many of our source tables are set up so that the main table only keeps a small amount of data, and historical data is moved to another archive table with a very similar schema. My goal is to have one table in Databricks, maybe with a flag to indicate if th...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

I would need to dive deeper into your scenario, but it sounds to me like a strategy could be: 1) Create a view in your SQL Server database with "current data" UNION "historical data". You can add a boolean field set to True in the first query and False ...

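Step 1 of that strategy, the source-side view with a current/historical flag, can be sketched as follows; the table and column names are hypothetical:

```python
# SQL for the source-side view the reply describes: current and archive rows
# unified, with a boolean flag distinguishing them. All names are placeholders.
UNIFIED_VIEW_SQL = """
CREATE VIEW dbo.orders_unified AS
SELECT o.*, CAST(1 AS BIT) AS is_current FROM dbo.orders AS o
UNION ALL
SELECT a.*, CAST(0 AS BIT) AS is_current FROM dbo.orders_archive AS a
"""
```

Ingesting this single view into one Databricks table gives exactly the "one table with a flag" layout the question asks for.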
2 More Replies
excavator-matt
by Contributor
  • 44 Views
  • 1 reply
  • 0 kudos

ABAC tag support for streaming tables (Spark Lakeflow Declarative Pipelines)?

Hi! We're using Spark Lakeflow Declarative Pipelines for ingesting data from various data sources. In order to achieve compliance with GDPR, we are planning to start using ABAC tagging. However, I don't understand how we are supposed to use th...

Data Engineering
abac
LakeFlow
Streaming tables
tags
Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@excavator-matt "Can we tag streaming tables with ABAC and expect it to be safe across versions?" Yes, streaming tables are fully subject to UC ABAC, but if the table is physically recreated, table-level tags can be lost. "Is there first-class support i...

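Since table-level tags can be lost when a streaming table is physically recreated, one mitigation is re-applying them after each pipeline update; a sketch with placeholder names and tag values:

```python
# Re-apply governed tags to a streaming table after a pipeline update, since
# a physical recreate can drop table-level tags. The table name and tag
# values here are placeholders.
def set_tags_sql(table: str, tags: dict) -> str:
    """Build an ALTER TABLE ... SET TAGS statement for the given tag map."""
    pairs = ", ".join(f"'{k}' = '{v}'" for k, v in tags.items())
    return f"ALTER TABLE {table} SET TAGS ({pairs})"

# On Databricks, roughly:
#   spark.sql(set_tags_sql("main.ingest.customers_st", {"pii": "true"}))
```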
Direo
by Contributor II
  • 48 Views
  • 4 replies
  • 0 kudos

[DAB] registered_model aliases not being applied to Unity Catalog despite successful deploy

Hi, I'm experiencing an issue with Databricks Asset Bundles where model aliases defined in the bundle configuration are not being applied to Unity Catalog, even though the deployment succeeds and the Terraform state shows the aliases are set. Environmen...

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Can you try by explicitly adding: databricks model-versions get-by-alias <catalog>.<schema>.<model> staging

3 More Replies
