Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

akuma643
by New Contributor II
  • 3667 Views
  • 3 replies
  • 1 kudos

The authentication value "ActiveDirectoryManagedIdentity" is not valid.

Hi Team, I am trying to connect to a SQL Server hosted in an Azure VM using Entra ID authentication from Databricks ("authentication", "ActiveDirectoryManagedIdentity"). Below is the notebook script I am using: driver = "com.microsoft.sqlserver.jdbc.SQLServe...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

You are encountering an error because the default SQL Server JDBC driver bundled with Databricks may not fully support the authentication value "ActiveDirectoryManagedIdentity"—this option requires at least version 10.2.0 of the Microsoft SQL Server ...
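The reply above can be sketched as a small notebook helper (hypothetical host and database names; assumes the Microsoft SQL Server JDBC driver 10.2.0 or later, with its azure-identity dependency, is installed on the cluster):

```python
def jdbc_options(host: str, database: str) -> dict:
    """Build JDBC reader options for Entra ID managed-identity auth.

    Assumes mssql-jdbc >= 10.2.0 is installed on the cluster; older
    bundled drivers reject "ActiveDirectoryManagedIdentity".
    """
    return {
        "url": f"jdbc:sqlserver://{host}:1433;databaseName={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "authentication": "ActiveDirectoryManagedIdentity",
    }

# Usage on a cluster (hypothetical names):
# df = (spark.read.format("jdbc")
#       .options(**jdbc_options("myvm.example.com", "mydb"))
#       .option("dbtable", "dbo.mytable")
#       .load())
```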

2 More Replies
cdn_yyz_yul
by New Contributor III
  • 66 Views
  • 4 replies
  • 1 kudos

Delta as streaming source: can the reader read only newly appended rows?

Hello everyone, In our implementation of the Medallion Architecture, we want to stream changes with Spark Structured Streaming. I would like some advice on how to use a Delta table as a source correctly, and whether there is a performance (memory usage) concern in t...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

In your scenario using Medallion Architecture with Delta tables as both streaming source and sink, it is important to understand Spark Structured Streaming behavior and performance characteristics, especially with joins and memory usage. Here is a di...
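As a rough sketch of the append-only reading pattern discussed here (assuming the Databricks `skipChangeCommits` reader option; the table name is hypothetical):

```python
# "skipChangeCommits" tells the Delta streaming reader to ignore commits
# that rewrite existing rows, so only newly appended rows flow into each
# micro-batch after the initial snapshot.
APPEND_ONLY_OPTIONS = {"skipChangeCommits": "true"}

def read_appends_only(spark, table_name: str):
    """Return a streaming DataFrame over a Delta table that sees only
    newly appended rows."""
    reader = spark.readStream.format("delta")
    for key, value in APPEND_ONLY_OPTIONS.items():
        reader = reader.option(key, value)
    return reader.table(table_name)
```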

3 More Replies
Shubhankar_123
by New Contributor
  • 44 Views
  • 1 replies
  • 0 kudos

Internal error 500 on databricks vector search endpoint

We are facing an internal 500 error accessing the vector search endpoint through a Streamlit application. If I refresh the application the error sometimes goes away, but it has now become a regular occurrence. If I try to query the endpoint using...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The intermittent Internal 500 errors you’re experiencing when accessing the vector search endpoint through a Streamlit app on Databricks—while direct console queries work—suggest an issue with the interaction between your Streamlit app’s environment ...
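For intermittent 500s like this, a common client-side mitigation is to retry with exponential backoff; a minimal sketch (the `query_fn` callable stands in for whatever vector-search client call the app makes):

```python
import time

def query_with_retry(query_fn, retries=4, base_delay=1.0):
    """Call a query function, retrying transient failures (such as
    intermittent HTTP 500s) with exponential backoff."""
    for attempt in range(retries):
        try:
            return query_fn()
        except Exception:
            if attempt == retries - 1:
                raise  # retries exhausted; surface the real error
            time.sleep(base_delay * (2 ** attempt))
```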

SumitB14
by New Contributor
  • 37 Views
  • 1 replies
  • 0 kudos

Databricks Nested Json Flattening

Hi Databricks Community, I am facing an issue while exploding nested JSON data. In the content column, I have dynamic nested JSON, and I am using the below approach to parse and explode it: from pyspark.sql import SparkSession; from pyspark.sql.functions ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are encountering an AttributeError related to strip, which likely means that some entries for activity.value are not strings (maybe None or dicts) and your code expects all to be strings before calling .strip(). This kind of problem can arise if ...
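The type-guard the reply describes can be as small as this (a sketch; `safe_strip` is a hypothetical helper name):

```python
def safe_strip(value):
    """Strip whitespace only when the value is a string; other types
    (None, dicts, numbers) pass through unchanged, avoiding the
    AttributeError raised by calling .strip() on a non-string."""
    return value.strip() if isinstance(value, str) else value
```

Applying it to each `activity.value` entry before further parsing keeps mixed-type payloads from crashing the job.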

Allen123Maria_1
by New Contributor
  • 16 Views
  • 1 replies
  • 0 kudos

Optimizing Azure Functions for Performance and Cost with Variable Workloads

Hey, everyone! I use Azure Functions in a project where the workloads change a lot. Sometimes it's quiet, and other times we get a lot of traffic. Azure Functions is very scalable, but I've had some trouble with cold starts and keeping costs down. I'm ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Improving Azure Functions performance and cost efficiency, especially with unpredictable workloads, requires a blend of technical tuning, architecture design, and proactive monitoring. Here’s how to address cold starts, costs, and scaling on the Prem...

ShivangiB1
by New Contributor III
  • 52 Views
  • 2 replies
  • 0 kudos

DATABRICKS LAKEFLOW SQL SERVER INGESTION PIPELINE ERROR

Hey Team, I am getting the below error while creating a pipeline: com.databricks.pipelines.execution.extensions.managedingestion.errors.ManagedIngestionNonRetryableException: [INGESTION_GATEWAY_DDL_OBJECTS_MISSING] DDL objects missing on table 'coedb.dbo.so...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are seeing means Databricks cannot capture DDL (table definition) changes, even though CDC (Change Data Capture) and CT (Change Tracking) are enabled. You must run the specific DDL support objects script for Databricks ingestion and the...

1 More Replies
shubham007
by New Contributor III
  • 58 Views
  • 2 replies
  • 0 kudos

Urgency: How to do a data migration task using the Databricks Lakebridge tool?

Dear community experts, I have completed the two Lakebridge phases, Analyzer & Converter, but am stuck at migrating data from source to target using Lakebridge. I have watched the BrickBites series on Lakebridge but did not find how to migrate data...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To migrate tables and views from Snowflake (source) to Databricks (target) using Lakebridge, you must export your data from Snowflake into a supported cloud storage (usually as Parquet files), then import these files into Databricks Delta tables. Lak...
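The import half of that flow might look like this on the Databricks side (a sketch; paths and table names are hypothetical, and it assumes the exported Parquet files landed in storage the workspace can reach):

```python
def import_parquet_as_delta(spark, parquet_path: str, target_table: str):
    """Read Parquet files exported from the source system and persist
    them as a managed Delta table, replacing any previous contents."""
    (spark.read.parquet(parquet_path)
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable(target_table))

# Usage (hypothetical names):
# import_parquet_as_delta(spark, "s3://bucket/exports/orders/", "main.sales.orders")
```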

1 More Replies
liquibricks
by New Contributor
  • 24 Views
  • 1 replies
  • 0 kudos

Moving tables between pipelines in production

We are testing an ingestion from Kafka to Databricks using a streaming table. The streaming table was created by a DAB deployed to "production", which runs as a service principal. This means the service principal is the "owner" of the table. We now wan...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 0 kudos

You’ve hit two limitations:
  • Streaming tables don’t allow SET OWNER – ownership cannot be changed.
  • Lakeflow pipeline ID changes require pipeline-level permissions – if you’re not the pipeline owner, you can’t run ALTER STREAMING TABLE ... SET PIPELINE_I...

Ajay-Pandey
by Esteemed Contributor III
  • 8150 Views
  • 8 replies
  • 0 kudos

How can we send Databricks logs to Azure Application Insights?

Hi All, I want to send Databricks logs to Azure Application Insights. Is there any way we can do it? Any blog or doc will help me.

Latest Reply
loic
Contributor
  • 0 kudos

Hello, I finally used the AppInsights agent from OpenTelemetry, which is documented in the official Microsoft documentation here: https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=java. Below is an adaptation of this "Get ...

7 More Replies
pooja_bhumandla
by New Contributor III
  • 39 Views
  • 2 replies
  • 0 kudos

Seeking Insights on Liquid Clustering (LC) Based on Table Sizes

Hi all, I'm exploring Liquid Clustering (LC) and its effectiveness based on the size of the tables. Specifically, I'm interested in understanding how LC behaves with small, medium, and large tables and the best practices for each, along with size range...

Latest Reply
pooja_bhumandla
New Contributor III
  • 0 kudos

Hi @bianca_unifeye, thank you for your response. My tables range in size from 1 KB to 5 TB. Given this, I'd love to hear your thoughts and experiences on whether Liquid Clustering (LC) would be a good fit in this scenario. Thanks in advance for shari...

1 More Replies
pooja_bhumandla
by New Contributor III
  • 265 Views
  • 3 replies
  • 1 kudos

When to Use and when Not to Use Liquid Clustering?

Hi everyone, I’m looking for some practical guidance and experiences around when to choose Liquid Clustering versus sticking with traditional partitioning + Z-ordering. From what I’ve gathered so far: for small tables (<10TB), Liquid Clustering gives s...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Deciding between Liquid Clustering and traditional partitioning with Z-ordering depends on table size, query patterns, number of clustering columns, and file optimization needs. For tables under 10TB with queries consistently filtered on 1–2 columns,...
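For tables that fit the Liquid Clustering profile described above, enabling it is a single DDL statement; a small helper to build it (table and column names are hypothetical):

```python
def cluster_by_sql(table: str, columns: list) -> str:
    """Build the ALTER TABLE ... CLUSTER BY statement that enables
    liquid clustering on the given columns (typically 1-2 columns
    that queries filter on most often)."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(columns)})"

# Usage on a cluster (hypothetical names):
# spark.sql(cluster_by_sql("main.sales.orders", ["order_date", "region"]))
```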

2 More Replies
DatabricksEngi1
by Contributor
  • 131 Views
  • 4 replies
  • 0 kudos

Resolved! MERGE operation not performing data skipping with liquid clustering on key columns

Hi, I need some help understanding a performance issue. I have a table that reads approximately 800K records every 30 minutes in an incremental manner. Let's say its primary key is: timestamp, x, y. This table is overwritten every 30 minutes and serves ...

Latest Reply
bianca_unifeye
New Contributor II
  • 0 kudos

MERGE is not a pure read-plus-filter operation. Even though Liquid Clustering organizes your data by key ranges and writes min/max stats, the MERGE engine has to identify both matches and non-matches. That means the query planner must: scan all candidate...
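One common mitigation (a sketch, not the poster's exact schema) is to add an explicit range predicate on the clustering key to the MERGE condition, so the planner can use clustering min/max stats to prune target files:

```python
def merge_condition_with_range(min_ts: str, max_ts: str) -> str:
    """MERGE ON clause with an explicit timestamp range; the extra
    bounds let the planner skip target files whose min/max stats fall
    outside the incoming batch (column names are illustrative)."""
    return (
        "t.timestamp = s.timestamp AND t.x = s.x AND t.y = s.y"
        f" AND t.timestamp >= '{min_ts}' AND t.timestamp <= '{max_ts}'"
    )

# Usage with the Delta Lake Python API (hypothetical table names):
# DeltaTable.forName(spark, "target").alias("t").merge(
#     source_df.alias("s"),
#     merge_condition_with_range("2025-01-01 00:00", "2025-01-01 00:30"),
# ).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
```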

3 More Replies
leenack
by Visitor
  • 53 Views
  • 5 replies
  • 1 kudos

No rows returned when calling Databricks procedure via .NET API and Simba ODBC driver

I created a simple Databricks procedure that should return a single value: "SELECT 1 AS result;". When I call this procedure from my .NET API using ExecuteReader, ExecuteAdapter, or ExecuteScalar, the call completes without any errors, but no rows are r...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

I tested this in my environments and indeed it is not possible to get results, despite the procedure executing correctly in SQL Warehouses. I tried multiple options but had no luck. However, as a workaround, I would recommend replacing or partially replacing part of ...

4 More Replies
intelliconnectq
by New Contributor II
  • 23 Views
  • 1 replies
  • 0 kudos

Loading CSV from private S3 bucket

Trying to load a CSV file from a private S3 bucket. Please clarify the requirements to do this: Can I do it in Community Edition (if yes, then how)? How do I do it in the premium version? I have an IAM role, and I also have an access key & secret.

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Assuming you have these prerequisites:
  • A private S3 bucket (e.g., s3://my-private-bucket/data/file.csv)
  • An IAM user or role with access (list/get) to that bucket
  • The AWS Access Key ID and Secret Access Key (client and secret)
The most straightforward w...
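A minimal sketch of the key-based approach (assuming the s3a Hadoop connector available on Databricks clusters; bucket path and credential values are placeholders):

```python
def s3_credential_conf(access_key: str, secret_key: str) -> dict:
    """Hadoop configuration entries that let Spark read a private S3
    bucket with an access key / secret key pair."""
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
    }

# Usage on a cluster (placeholder credentials):
# for key, value in s3_credential_conf(ACCESS_KEY, SECRET_KEY).items():
#     spark.conf.set(key, value)
# df = spark.read.option("header", "true").csv("s3a://my-private-bucket/data/file.csv")
```

Storing the key pair in a secret scope rather than notebook text is the safer variant of the same setup.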

