Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

akuma643
by New Contributor II
  • 4942 Views
  • 3 replies
  • 1 kudos

The authentication value "ActiveDirectoryManagedIdentity" is not valid.

Hi Team, I am trying to connect to a SQL Server hosted on an Azure VM using Entra ID authentication from Databricks ("authentication", "ActiveDirectoryManagedIdentity"). Below is the notebook script I am using. driver = "com.microsoft.sqlserver.jdbc.SQLServe...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

You are encountering an error because the default SQL Server JDBC driver bundled with Databricks may not fully support the authentication value "ActiveDirectoryManagedIdentity"—this option requires at least version 10.2.0 of the Microsoft SQL Server ...

  • 1 kudos
2 More Replies
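
To make the driver-version point in the reply above concrete, here is a minimal sketch that checks a driver version string against the 10.2.0 minimum quoted there and builds a JDBC URL carrying the authentication property. The helper names, host, and database are hypothetical, and how you find the driver version actually installed on your cluster is not covered here.

```python
MIN_DRIVER_VERSION = (10, 2, 0)  # minimum version quoted in the reply above


def parse_version(version: str) -> tuple:
    """Turn '12.4.2.jre11' into (12, 4, 2); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad short versions such as '10.2' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def supports_managed_identity(driver_version: str) -> bool:
    """True if the driver is new enough for ActiveDirectoryManagedIdentity."""
    return parse_version(driver_version) >= MIN_DRIVER_VERSION


def build_jdbc_url(host: str, database: str, port: int = 1433) -> str:
    """Assemble a SQL Server JDBC URL (host and database are placeholders)."""
    return (
        f"jdbc:sqlserver://{host}:{port};databaseName={database};"
        "authentication=ActiveDirectoryManagedIdentity;encrypt=true"
    )
```

If the check fails, the usual remedy is to attach a newer driver to the cluster rather than change the URL.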
cdn_yyz_yul
by Contributor II
  • 1926 Views
  • 4 replies
  • 1 kudos

Resolved! Delta as streaming source, can the reader read only newly appended rows?

Hello everyone, In our implementation of the Medallion Architecture, we want to stream changes with Spark Structured Streaming. I would like some advice on how to use a Delta table as a source correctly, and whether there is a performance (memory usage) concern in t...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

In your scenario using Medallion Architecture with Delta tables as both streaming source and sink, it is important to understand Spark Structured Streaming behavior and performance characteristics, especially with joins and memory usage. Here is a di...

  • 1 kudos
3 More Replies
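
On the question in the thread title: a Delta table read with `readStream` is consumed incrementally, so after the initial snapshot each micro-batch only picks up newly committed data. The sketch below shows one way to set that up with a per-batch file limit. The function name and default values are hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
def appended_rows_stream(spark, table_name, max_files_per_trigger=100,
                         starting_version=None):
    """Return a streaming DataFrame over a Delta table.

    maxFilesPerTrigger caps how many new files each micro-batch reads, which
    helps bound memory use. startingVersion (optional) skips the initial
    snapshot and reads only changes from that table version onward.
    """
    reader = spark.readStream.option("maxFilesPerTrigger", max_files_per_trigger)
    if starting_version is not None:
        reader = reader.option("startingVersion", starting_version)
    return reader.table(table_name)
```

Note that a plain Delta streaming read expects an append-only source; updates or deletes in the source table need extra handling that this sketch leaves out.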
Shubhankar_123
by New Contributor
  • 833 Views
  • 1 reply
  • 0 kudos

Internal error 500 on databricks vector search endpoint

We are facing an internal 500 error when accessing the vector search endpoint through a Streamlit application. If I refresh the application the error sometimes goes away, but it has now started to become a usual occurrence. If I try to query the endpoint using...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The intermittent Internal 500 errors you’re experiencing when accessing the vector search endpoint through a Streamlit app on Databricks—while direct console queries work—suggest an issue with the interaction between your Streamlit app’s environment ...

  • 0 kudos
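
Since a refresh sometimes clears the error, one common mitigation for intermittent 500s is to retry the call with exponential backoff. This is not taken from the reply above, only a generic sketch. `call_with_retries` is a hypothetical helper, and `fn` stands for whatever function wraps your vector search query.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry.

    The last failure is re-raised so the caller still sees the real error.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # in real code, narrow this to the client's 5xx error
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Retries only hide the symptom. If the failures keep growing, the app's credentials and connection handling are still worth checking.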
SumitB14
by New Contributor
  • 429 Views
  • 1 reply
  • 0 kudos

Databricks Nested Json Flattening

Hi Databricks Community, I am facing an issue while exploding nested JSON data. In the content column, I have dynamic nested JSON, and I am using the below approach to parse and explode it. from pyspark.sql import SparkSession from pyspark.sql.functions ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are encountering an AttributeError related to strip, which likely means that some entries for activity.value are not strings (maybe None or dicts) and your code expects all to be strings before calling .strip(). This kind of problem can arise if ...

  • 0 kudos
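
The `.strip()` failure described in the reply above can be avoided by only stripping values that are actually strings. A minimal sketch in plain Python follows. The function names are hypothetical, and the original notebook code is not shown in the preview, so this is not a drop-in fix.

```python
def safe_strip(value):
    """Strip strings; return None, dicts, numbers, etc. unchanged."""
    if isinstance(value, str):
        return value.strip()
    return value


def clean_nested(obj):
    """Walk nested dicts and lists and strip every string leaf."""
    if isinstance(obj, dict):
        return {key: clean_nested(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [clean_nested(item) for item in obj]
    return safe_strip(obj)
```

The same guard could be wrapped in a UDF if the cleanup has to run per row in Spark.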
ShivangiB1
by New Contributor III
  • 1009 Views
  • 2 replies
  • 0 kudos

Resolved! DATABRICKS LAKEFLOW SQL SERVER INGESTION PIPELINE ERROR

Hey Team, I am getting the below error while creating a pipeline: com.databricks.pipelines.execution.extensions.managedingestion.errors.ManagedIngestionNonRetryableException: [INGESTION_GATEWAY_DDL_OBJECTS_MISSING] DDL objects missing on table 'coedb.dbo.so...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are seeing means Databricks cannot capture DDL (table definition) changes, even though CDC (Change Data Capture) and CT (Change Tracking) are enabled. You must run the specific DDL support objects script for Databricks ingestion and the...

  • 0 kudos
1 More Replies
shubham007
by Databricks Partner
  • 884 Views
  • 2 replies
  • 0 kudos

Resolved! Urgency: How to do a data migration task using the Databricks Lakebridge tool?

Dear community expert, I have completed two phases, Analyzer and Converter, of Databricks Lakebridge but am stuck at migrating data from source to target using Lakebridge. I have watched the BrickBites series on Lakebridge but did not find how to migrate data...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To migrate tables and views from Snowflake (source) to Databricks (target) using Lakebridge, you must export your data from Snowflake into a supported cloud storage (usually as Parquet files), then import these files into Databricks Delta tables. Lak...

  • 0 kudos
1 More Replies
Ajay-Pandey
by Databricks MVP
  • 9449 Views
  • 8 replies
  • 0 kudos

How can we send Databricks logs to Azure Application Insights?

Hi All, I want to send Databricks logs to Azure Application Insights. Is there any way we can do it? Any blog or doc will help me.

Latest Reply
loic
Contributor
  • 0 kudos

Hello, I finally used the AppInsights agent from OpenTelemetry, which is documented in the official Microsoft documentation here: https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=java Below is an adaptation of this "Get ...

  • 0 kudos
7 More Replies
pooja_bhumandla
by Databricks Partner
  • 4839 Views
  • 3 replies
  • 3 kudos

When to Use and when Not to Use Liquid Clustering?

Hi everyone, I’m looking for some practical guidance and experiences around when to choose Liquid Clustering versus sticking with traditional partitioning + Z-ordering. From what I’ve gathered so far: For small tables (<10TB), Liquid Clustering gives s...

Latest Reply
mark_ott
Databricks Employee
  • 3 kudos

Deciding between Liquid Clustering and traditional partitioning with Z-ordering depends on table size, query patterns, number of clustering columns, and file optimization needs. For tables under 10TB with queries consistently filtered on 1–2 columns,...

  • 3 kudos
2 More Replies
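
For readers who have not used it, Liquid Clustering is enabled with a `CLUSTER BY` clause instead of `PARTITIONED BY` plus `ZORDER BY`. The sketch below builds such a statement. The function name, table, and column names are hypothetical, and the four-column cap reflects the limit in the Databricks documentation as I recall it, so verify it for your runtime.

```python
MAX_CLUSTERING_COLUMNS = 4  # assumption: documented upper limit, check your runtime


def cluster_by_statement(table: str, columns: list) -> str:
    """Build an ALTER TABLE ... CLUSTER BY statement for an existing Delta table."""
    if not columns:
        raise ValueError("at least one clustering column is required")
    if len(columns) > MAX_CLUSTERING_COLUMNS:
        raise ValueError(f"at most {MAX_CLUSTERING_COLUMNS} clustering columns")
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(columns)})"
```

For example, `cluster_by_statement("main.sales.orders", ["customer_id", "order_date"])` yields a statement that could be passed to `spark.sql`. Picking the columns that queries filter on most often matches the 1–2 filter column case the reply mentions.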
DatabricksEngi1
by Contributor
  • 986 Views
  • 4 replies
  • 0 kudos

Resolved! MERGE operation not performing data skipping with liquid clustering on key columns

Hi, I need some help understanding a performance issue. I have a table that reads approximately 800K records every 30 minutes in an incremental manner. Let’s say its primary key is: timestamp, x, y. This table is overwritten every 30 minutes and serves ...

Latest Reply
bianca_unifeye
Databricks MVP
  • 0 kudos

MERGE is not a pure read-plus-filter operation. Even though Liquid Clustering organizes your data by key ranges and writes min/max stats, the MERGE engine has to identify both matches and non-matches. That means the query planner must: Scan all candidate...

  • 0 kudos
3 More Replies
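
One widely used way to help MERGE skip files, which may or may not be what the truncated reply goes on to recommend, is to add literal bounds on a clustering key to the `ON` clause so that target files outside the incoming batch's range can be pruned. A minimal sketch follows. The function and table names are hypothetical, and the bounds are assumed to be the minimum and maximum timestamp of the source batch, computed beforehand.

```python
def build_pruned_merge(target, source, keys, ts_column, min_ts, max_ts):
    """Build a MERGE whose ON clause also bounds the target's timestamp column.

    min_ts and max_ts must cover every row in the source batch; otherwise rows
    outside the range would be treated as non-matches and inserted.
    """
    key_match = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    bounds = f"t.{ts_column} >= '{min_ts}' AND t.{ts_column} <= '{max_ts}'"
    return (
        f"MERGE INTO {target} AS t USING {source} AS s\n"
        f"ON {key_match} AND {bounds}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

The statement is built with plain string formatting, so only use it with trusted identifiers and values.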
turagittech
by Contributor
  • 3244 Views
  • 3 replies
  • 1 kudos

Resolved! Schema updating with CI/CD development in SQL

Hi all, I am working to resolve how to build tables in a development workspace catalog and then easily migrate the code to a production catalog without manually altering the schema name. For those unaware, you can't have the same catalog names in deve...

Latest Reply
evanc
New Contributor II
  • 1 kudos

I would like to see an example from Databricks of how Alembic and Databricks work together, especially with schema evolution. I think once the schema is changed by schema evolution, the Alembic version is no longer valid. I am wondering how to handle this properly. ...

  • 1 kudos
2 More Replies
petitregny
by New Contributor II
  • 4828 Views
  • 5 replies
  • 2 kudos

Reading from an S3 bucket using boto3 on serverless cluster

Hello All, I am trying to read a CSV file from my S3 bucket in a notebook running on serverless. I am using the two standard functions below, but I get a credentials error (Error reading CSV from S3: Unable to locate credentials). I don't have this issu...

Latest Reply
Ramana
Valued Contributor
  • 2 kudos

Boto3 with Access/Secret Key worked. I will try the Service Credentials. If the Databricks documentation is right, Instance Profiles with Serverless should work to establish a Boto3 connection, but, unfortunately, setting up instance profiles on Serverles...

  • 2 kudos
4 More Replies
shubham007
by Databricks Partner
  • 1695 Views
  • 2 replies
  • 0 kudos

Urgency: Getting Lakebridge installation failed in our organization environment (laptop)

Dear community expert, I’m reaching out for assistance with installing Databricks Lakebridge on my organization laptop. I have confirmed the stated prerequisites are installed: Java 22+, Python 3.11+, and the latest Databricks CLI, but the installer f...

Latest Reply
WiliamRosa
Databricks Partner
  • 0 kudos

Additionally, you can check the following documentation and links that I found, which may help you: 1) To view the available Python versions and guidance on using a virtual environment (virtualenv or conda): https://www.piwheels.org/project/databricks-...

  • 0 kudos
1 More Replies
lecarusin
by New Contributor II
  • 467 Views
  • 4 replies
  • 1 kudos

Help regarding a python notebook and s3 file structure

Hello all, I am new to this forum, so please forgive me if I am posting in the wrong location (I'd appreciate it if the post is moved by mods or I am told where to post). I am looking for help with optimizing some Python code I have. This python notebook...

Latest Reply
arunpalanoor
New Contributor II
  • 1 kudos

I am not sure if I fully understand how your data pipeline is set up, but have you considered incremental data loading, say using something similar to the "COPY INTO" method, which would only read your incremental load, and then apply a 90-day filter on top...

  • 1 kudos
3 More Replies
vikram_p
by Databricks Partner
  • 1082 Views
  • 1 reply
  • 0 kudos

Resolved! Generate embeddings for 50 million rows in dataframe

Hello All, I have a dataframe with 5 million rows, and before we can set up a vector search endpoint against an index, we want to generate an embeddings column for each of those rows. Please suggest what's an optimal way to do this? We are in development phase so w...

Latest Reply
bianca_unifeye
Databricks MVP
  • 0 kudos

The easiest and most reliable way to generate embeddings for millions of rows is to let Databricks Vector Search compute them automatically during synchronization from a Delta table. Vector Search can generate embeddings for you, keep them updated whe...

  • 0 kudos
Divya_Bhadauria
by New Contributor III
  • 271 Views
  • 1 reply
  • 0 kudos

Does Databricks Runtime 7.3+ include built-in Hadoop S3 connector configurations?

I came across the KB article S3 connection reset error, which mentions not using the following Spark settings for the Hadoop S3 connector for DBR 7.3 and above: spark.hadoop.fs.s3.impl com.databricks.s3a.S3AFileSystem spark.hadoop.fs.s3n.impl com.data...

Latest Reply
hasnat_unifeye
Databricks Partner
  • 0 kudos

No, you don’t need to set those on DBR 7.3 and above. From 7.3+ Databricks already uses the newer Hadoop S3A connector by default, so those com.databricks.s3a.S3AFileSystem settings are not part of the default config and shouldn’t be added. If they are...

  • 0 kudos
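
To check whether a cluster still carries the legacy settings discussed above, you can paste its Spark config text (one `key value` pair per line, as in the question) into a small script like the sketch below. The function names are hypothetical, and only the `com.databricks.s3a.S3AFileSystem` value quoted in the thread is flagged.

```python
LEGACY_S3_IMPL = "com.databricks.s3a.S3AFileSystem"  # value quoted in the thread


def parse_spark_config(text: str) -> dict:
    """Parse 'key value' lines; blank lines and '#' comments are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf


def find_legacy_s3_settings(conf: dict) -> list:
    """Return the spark.hadoop.fs.s3* keys still set to the legacy connector."""
    return sorted(
        key for key, value in conf.items()
        if key.startswith("spark.hadoop.fs.s3") and value == LEGACY_S3_IMPL
    )
```

Any key it returns is a candidate for removal from the cluster config on DBR 7.3 and above, in line with the reply.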