Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ashok_Vengala
by New Contributor
  • 731 Views
  • 1 reply
  • 0 kudos

Unity Catalog Iceberg API Deprecation

Hello Databricks Team, We are currently working with the Unity Catalog Iceberg API and have observed different behavior between older and newly created workspaces. Observed error (new workspace): { "error": { "message": "Legacy Iceberg endpoints under...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Ashok_Vengala, This comes up fairly often, so let me break it down. The different behavior you are seeing between older and newer workspaces is expected, and I can help clarify what is happening here. UNDERSTANDING THE DEPRECATION The legacy Iceb...

ajay_wavicle
by Databricks Partner
  • 2888 Views
  • 1 reply
  • 1 kudos

How do I connect Azure storage accounts with a User Managed Identity given access to Databricks?

I want to connect Azure storage accounts with a User Managed Identity given access to Databricks. I want to use the Azure CLI and connect to the storage accounts independently.

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @ajay_wavicle, Good timing on this question. Connecting Azure storage accounts to Databricks using a User-Assigned Managed Identity is a great approach -- it avoids the need to manage secrets and supports storage firewall configurations. Here is a...

ajay_wavicle
by Databricks Partner
  • 643 Views
  • 4 replies
  • 1 kudos

Python modules run via %sh lose access to Spark

Python modules run via %sh lose access to Spark. How do I regain the Spark session and access the Databricks tables?

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @ajay_wavicle, Thanks for the detailed writeup. The reason you lose access to Spark when using %sh is that it launches a completely separate Linux process on the driver node. That process runs outside the notebook runtime, so it has no connection ...
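The reply's point that %sh runs a separate process can be checked with a minimal standard-library sketch. Here a plain `object()` named `spark` stands in for the notebook's SparkSession (it is not a real Spark object), and `child_sees_name` is a hypothetical helper that starts a fresh Python interpreter, much as a `%sh python ...` command does, and asks whether a given name exists there.

```python
import subprocess
import sys

# Stand-in for the notebook's SparkSession; NOT a real Spark object.
spark = object()


def child_sees_name(name: str) -> bool:
    """Start a fresh Python interpreter and report whether `name` exists there."""
    code = f"import sys; sys.stdout.write(str({name!r} in globals()))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # The parent process holds `spark`; the child interpreter only has what it created itself.
    print("parent has spark:", "spark" in globals())
    print("child has spark:", child_sees_name("spark"))
```

The same isolation applies to any state held in the notebook's Python process, not only `spark`: the child starts empty and has to build everything it needs on its own.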

JD18
by New Contributor
  • 1575 Views
  • 2 replies
  • 1 kudos

SCD-2 backfilling with streaming tables

Hi there, I'm new to Databricks and trying to build an SCD2 table using the AUTO CDC approach. While it is quite simple to create an SCD2 table, I'm unable to do a backfill. Full context: I have raw data (order, customer info) from 2019 and creating a dimension...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @JD18, Welcome to Databricks, and thanks for raising this. SCD Type 2 backfilling with streaming tables is a common need, and the good news is that the AUTO CDC framework (formerly APPLY CHANGES INTO) has built-in capabilities to handle this -- you...
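The key idea behind SCD Type 2 backfilling is that history is rebuilt by ordering changes on a sequencing column rather than on arrival time, so late-arriving historical rows still land in the right place. The pure-Python sketch below is a conceptual model only, not the AUTO CDC implementation: `build_scd2`, `Version`, and the `(key, sequence, attrs)` record shape are hypothetical, and `start_at`/`end_at` play the role of the validity columns that AUTO CDC maintains for SCD Type 2 tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    key: str
    attrs: dict
    start_at: int            # sequence value at which this version became current
    end_at: Optional[int]    # None while the version is still current


def build_scd2(changes):
    """Build SCD Type 2 history from (key, sequence, attrs) change records.

    Records are sorted by key and sequence before processing, so rows that
    arrive late (e.g. a 2019 backfill loaded after 2021 data) are still
    placed correctly in the history.
    """
    history: dict = {}
    for key, seq, attrs in sorted(changes, key=lambda c: (c[0], c[1])):
        versions = history.setdefault(key, [])
        if versions and versions[-1].attrs == attrs:
            continue  # attributes unchanged: no new version
        if versions:
            versions[-1].end_at = seq  # close the previously current version
        versions.append(Version(key, attrs, seq, None))
    return history
```

Because the sketch sorts on the sequence value, feeding the 2019 records in after the 2021 ones produces the same history as loading them in order, which is the property a backfill relies on.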

turagittech
by Contributor
  • 335 Views
  • 1 reply
  • 0 kudos

Lakeflow CDC SQL Server Asset Bundle update notification email

Hi all, I deployed a CDC gateway with an Asset Bundle as per https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/sql-server-source-setup. I need to update the email for the cdc_gateway, but with a bundle I have to update the job that creates th...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @turagittech, Great question, and I understand the concern. The good news is that updating the notification email in your Asset Bundle YAML and redeploying should NOT recreate your gateway or ingestion pipeline resources. Let me explain why and wal...

kmcas10
by New Contributor
  • 485 Views
  • 1 reply
  • 0 kudos

Unicode converter buffer overflow error.

We are currently using Informatica PowerCenter and pulling data from Databricks PVC over an ODBC connection, and it's been working great. Our company is moving to Databricks SaaS and I am trying to get Informatica PowerCenter to connect to SaaS ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @kmcas10, Let me share some guidance on this. This is a scenario that comes up when transitioning from Databricks PVC (private cloud) to SaaS with legacy ODBC tooling, and there are several things you can try to bridge the gap until your full migr...

seefoods
by Valued Contributor
  • 979 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks clusters failed

Hello guys, when I run a process to parse PDFs with docling on a serverless cluster using a Python wheel, I get this error. Does anyone know what happened? Cordially. Error: INTERNAL: [ENVIRONMENT_SETUP_ERROR.PYTHON_NOTEBOOK_ENVIRONMENT] An internal error occurred while...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @seefoods, Interesting scenario. docling is a powerful PDF parsing library and it is great that you are exploring it on Databricks. The ENVIRONMENT_SETUP_ERROR.PYTHON_NOTEBOOK_ENVIRONMENT error you are seeing is related to how serverless compute h...

NishantTiwari
by New Contributor II
  • 876 Views
  • 5 replies
  • 1 kudos

Cluster Issue

Driver: c5.4xlarge · Workers: c5.4xlarge · 8 workers · On-demand and Spot · fall back to On-demand · DBR: 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12) · us-east-1c. In my Databricks job there is a step, NDS download, which we use to download files ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @NishantTiwari, I see you have already upgraded to DBR 14.3+ but are still hitting the same SSL errors. That makes sense, and here is why: the two errors you are seeing point to the third-party server using weak or outdated SSL certificates, not an ...
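If the failures do come from a server presenting an outdated certificate (for example a short RSA key), one commonly used client-side diagnostic, not necessarily what the full reply recommends, is to lower OpenSSL's security level for that one connection while keeping certificate verification on. The sketch uses only Python's standard `ssl` module; `make_legacy_tls_context` is a hypothetical helper, and `@SECLEVEL` is an OpenSSL cipher-string keyword, so it assumes Python is linked against OpenSSL 1.1.0 or later. Treat it as a temporary test: the durable fix is for the server owner to replace the certificate.

```python
import ssl


def make_legacy_tls_context(seclevel: int = 1) -> ssl.SSLContext:
    """Client context that still verifies certificates and hostnames,
    but relaxes OpenSSL's security level for a legacy server."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED + hostname checking
    # "@SECLEVEL=n" is an OpenSSL cipher-string keyword. Level 1 accepts
    # RSA keys of 1024 bits and up; level 2 (which many Linux distributions
    # configure by default) requires 2048 bits.
    ctx.set_ciphers(f"DEFAULT:@SECLEVEL={seclevel}")
    return ctx
```

The context can be passed as `context=` to `urllib.request.urlopen` for the download step. If the download succeeds only with this context, that supports the diagnosis that the server's certificate, not the runtime version, is the problem.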

QueryingQuail
by New Contributor III
  • 4389 Views
  • 6 replies
  • 1 kudos

Best practice for adding fixed metadata columns at point of ingestion

Hello all, We are currently working with ingestion of data from source systems using a mix of custom code and managed connectors (e.g. the Dynamics 365 (Synapse Link) connector) in conjunction with Auto CDC / Auto CDC from snapshot. I’m trying to unde...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @QueryingQuail, Good question -- I can see from the follow-up discussion that you are looking for practical guidance that goes beyond what a generic AI prompt would give you -- specifically how to handle this across both managed connectors (like t...

DoredlaCharan
by New Contributor III
  • 781 Views
  • 5 replies
  • 1 kudos

MongoDB to Databricks: driver killed and compute re-attached

I started reading data from MongoDB using spark.read, which uses the mongo-spark-connector. By default the sample size is 1000, meaning only 1000 documents in the collection are used to infer the DataFrame's columns, so I increa...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @DoredlaCharan, The root cause here is straightforward: setting sampleSize to 100,000 forces the MongoDB Spark Connector to pull 100K documents onto your driver node just for schema inference. With 100+ keys per document and mergeSchema enabled, t...
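Why a larger sample improves field coverage but costs driver memory can be seen in a toy model. The sketch below is not the MongoDB Spark Connector's algorithm; `infer_columns` and the list-of-dicts "collection" are hypothetical stand-ins. It only shows that a field present in a few rare documents is discovered only if the sample happens to include them, while every sampled document has to be materialized in one process at once.

```python
import random


def infer_columns(collection, sample_size, seed=0):
    """Toy schema inference: union of top-level field names across a sample.

    The whole sample is held in memory at the same time, which is why a very
    large sample can exhaust the memory of the process doing the inference.
    """
    rng = random.Random(seed)
    sample = rng.sample(collection, min(sample_size, len(collection)))
    columns = set()
    for doc in sample:
        columns.update(doc.keys())
    return sorted(columns)
```

Given that trade-off, one option worth considering is to keep the sample modest and supply an explicit schema (for example via `spark.read.schema(...)`) for fields that are known but rare, rather than raising the sample size to 100,000.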

DylanStout
by Contributor
  • 1631 Views
  • 1 reply
  • 0 kudos

ODBC driver installation - help needed

Hello, I’m trying to use pyodbc inside Databricks to connect to a SQL Server database, but I’m working in a restricted, offline Databricks workspace (no outbound internet). What I’ve learned so far: Databricks clusters do not include Microsoft’s ODBC D...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @DylanStout, This is worth walking through carefully. It sounds like you have already done solid research on the constraints. Let me walk you through the most likely reason your init script is hanging and provide a complete working approach for of...
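After an offline init script finishes, one quick way to confirm that unixODBC actually registered the driver is to read `odbcinst.ini`. The standard-library sketch below assumes the common unixODBC location `/etc/odbcinst.ini` (your image may use another path) and a driver registered under a name such as `ODBC Driver 18 for SQL Server`; `registered_odbc_drivers` is a hypothetical helper. From inside pyodbc, `pyodbc.drivers()` lists the registered driver names as well.

```python
import configparser


def registered_odbc_drivers(odbcinst_path: str = "/etc/odbcinst.ini") -> dict:
    """Return {driver name: library path} from a unixODBC odbcinst.ini file.

    Sections without a Driver= entry (such as "[ODBC Drivers]") are skipped.
    Returns an empty dict if the file is missing or unreadable.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case exactly as written
    if not parser.read(odbcinst_path):
        return {}
    drivers = {}
    for section in parser.sections():
        library = parser.get(section, "Driver", fallback=None)
        if library:
            drivers[section] = library
    return drivers
```

If the expected driver name is missing, or its library path does not exist on the node, the init script most likely did not complete, which narrows the problem before pyodbc is involved.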

ravipal-global
by New Contributor II
  • 1860 Views
  • 4 replies
  • 0 kudos

Delete and reload append-only Delta Live Tables with Auto Loader

We have a set of streaming DLT pipelines following a medallion pattern: S3 bucket -> Auto Loader -> bronze Delta tables -> silver Delta tables -> gold Delta tables. All Delta tables are in Unity Catalog under separate schemas. We need a solutio...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @ravipal-global, I have seen this pattern before. The behavior you are seeing is expected. Let me explain why it happens and then walk through several approaches that can help you achieve delete-and-reload without requiring a full refresh of your ...

Pratikmsbsvm
by Contributor
  • 4387 Views
  • 2 replies
  • 1 kudos

Data Migration from SAP S/4HANA to Databricks

Could someone please help me design the migration of SAP S/4HANA to Databricks? How should this be designed, and what do we need to consider for the LLD? 1. How should data be extracted, and with which tool? Near-real-time replication is required. 2. Each layer for Da...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Pratikmsbsvm, Happy to help with this one. SAP S/4HANA to Databricks is one of the most common enterprise data migration scenarios, and there are several well-proven approaches depending on your requirements for data freshness, volume, and budget...

bunny_9090
by New Contributor
  • 901 Views
  • 1 reply
  • 0 kudos

Precision Variance Observed in FLOAT to DOUBLE Data Migration to Delta Tables

Hi Team, We would like to bring to your attention a precision-related variance observed during data migration from our legacy platform into Databricks Delta tables. In the legacy system, several numeric columns are defined using the FLOAT data type. During ing...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @bunny_9090, Let me walk you through this. Your analysis of the root cause is spot on. Let me expand on what is happening and walk through the recommended approaches to address it. WHY THIS HAPPENS -- IEEE 754 FLOATING-POINT REPRESENTATION Both FL...
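Spark's FLOAT is a 4-byte IEEE 754 single-precision type and DOUBLE an 8-byte double-precision type, so a FLOAT-to-DOUBLE cast reveals digits that single precision never stored exactly. The sketch below reproduces the effect with Python's `struct` module, which packs a value as a 4-byte float and unpacks it into Python's 8-byte float. `float32_to_float64` and `round_to_significant` are hypothetical helpers, and rounding to about 7 significant digits is one possible mitigation (an assumption about what your data needs), not necessarily the approach the reply recommends.

```python
import struct


def float32_to_float64(x: float) -> float:
    """Round x to IEEE 754 single precision (FLOAT), then widen to double.

    The widening itself is exact: the double holds the same binary value the
    FLOAT held, so the extra digits come from the FLOAT's rounding, not the cast.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_significant(x: float, digits: int = 7) -> float:
    """Round to a fixed number of significant decimal digits.
    FLOAT carries roughly 7 significant decimal digits."""
    return float(f"{x:.{digits}g}")


print(float32_to_float64(0.1))  # 0.10000000149011612
print(round_to_significant(float32_to_float64(0.1)))  # 0.1
```

Values that are exactly representable in binary, such as 0.5, survive the round trip unchanged; decimal fractions such as 0.1 do not, which matches the pattern of variances typically seen after this kind of migration.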

Ham
by New Contributor II
  • 2453 Views
  • 1 reply
  • 1 kudos

Resolved! Best-practice guidance for routing Databricks SDK (Python) ingestion logs into Azure Monitor/Analytics

Hi everyone! I’m running a config-driven ingestion stack that uses the Databricks SDK (Python notebooks + GitHub Actions). All logging currently uses the standard Python logging module inside notebooks/jobs (example: ingest.py, logger.py). I’d like to ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Ham, This is a common scenario, and there are good solutions. There are several layers to getting "Databricks SDK (Python) ingestion logs" into Azure Monitor, depending on exactly which logs you need. I will walk through each approach from simple...
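Whichever transport ends up carrying logs into Azure Monitor, emitting structured records from Python's standard `logging` module makes them far easier to query once they land in a table. The sketch below uses only the standard library and writes JSON lines to stderr; it does not ship anything to Azure. `JsonLineFormatter`, `get_ingest_logger`, and all field names (including the `pipeline` example) are hypothetical, and a module like the post's `logger.py` would be a natural place for them. Forwarding the lines onward (for example via the Azure Monitor Logs Ingestion API) is a separate step not shown here.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as one JSON object per line.

    Field names are illustrative; map them to whatever table schema you use.
    """

    def __init__(self, static_fields=None):
        super().__init__()
        self.static_fields = dict(static_fields or {})

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "TimeGenerated": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "Level": record.levelname,
            "Logger": record.name,
            "Message": record.getMessage(),
            **self.static_fields,  # e.g. pipeline name, job run id
        }
        if record.exc_info:
            payload["Exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_ingest_logger(name, static_fields=None):
    """Return a logger that writes JSON lines, attaching the handler only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers when a notebook cell reruns
        handler = logging.StreamHandler()
        handler.setFormatter(JsonLineFormatter(static_fields))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage would look like `get_ingest_logger("ingest", {"pipeline": "orders"}).info("loaded %d rows", 42)`. The handler guard matters in notebooks, where rerunning a cell would otherwise print every line twice.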
