Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sta_gas
by Visitor
  • 15 Views
  • 2 replies
  • 0 kudos

Data profiling monitoring with foreign catalog

Hi team, I'm currently working with Azure Databricks and have created a foreign catalog for my source database in Azure SQL. I can successfully run SELECT statements from Databricks to the Azure SQL database. However, I would like to set up data profil...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @sta_gas, since data quality monitoring is in beta, I'm quite sure foreign tables aren't supported as of now (but they forgot to mention it in the docs). The more important question is whether they ever will be supported. For me, data quality monitoring appl...
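Since the monitoring beta reportedly does not cover foreign tables, one hedged workaround (not from the thread; all names below are placeholders) is to snapshot the foreign-catalog table into a managed Delta table and point profiling/monitoring at that copy instead:

```python
# Hypothetical sketch: snapshot a foreign-catalog (Azure SQL) table into a
# managed Delta table that profiling/monitoring can target.
# Catalog, schema, and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source = spark.table("foreign_sql_catalog.dbo.customers")    # foreign catalog table
(
    source.write
    .mode("overwrite")                                        # full snapshot each run
    .saveAsTable("main.monitoring.customers_snapshot")        # managed Delta table
)
```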

1 More Reply
adrianhernandez
by New Contributor III
  • 96 Views
  • 2 replies
  • 1 kudos

Wheel permissions issue

I get an org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission MODIFY,SELECT on any file. SQLSTATE: 42501 at com.databricks.sql.acl.Unauthorized.throwInsufficientPermissionsError(P...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hi @adrianhernandez, the permissions error indicates you need the privileges for "any file". To resolve this, can you try adding the corresponding permissions and see if it works: %sql GRANT SELECT ON ANY FILE TO `username` %sql GRANT MO...
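The truncated grants above presumably cover both privileges named in the error (SELECT and MODIFY). A minimal sketch of running them from a notebook cell, with the principal as a placeholder:

```python
# Minimal sketch of the grants suggested in the reply; `spark` is the session
# predefined in Databricks notebooks/jobs, and the principal is a placeholder.
# These "ANY FILE" grants require admin-level privileges to issue.
spark.sql("GRANT SELECT ON ANY FILE TO `some_user@example.com`")
spark.sql("GRANT MODIFY ON ANY FILE TO `some_user@example.com`")
```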

1 More Reply
Hritik_Moon
by New Contributor II
  • 26 Views
  • 5 replies
  • 4 kudos

Stop Cache in free edition

Hello, I am using Databricks Free Edition. Is there a way to turn off IO caching? I am trying to learn optimization and can't see any difference in query run time with caching enabled.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @Hritik_Moon, I guess you cannot. To disable the disk cache you need to be able to run the following command: spark.conf.set("spark.databricks.io.cache.enabled", "[true | false]"). But serverless compute does not support setting most Spark properties fo...
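For contrast, on classic (non-serverless) compute the setting from the reply can be applied per session; a short sketch, assuming the notebook-provided `spark` session:

```python
# On classic (non-serverless) compute the disk (IO) cache can be toggled per
# session; per the reply above, this is not settable on serverless compute
# such as the Free Edition. `spark` is the notebook-provided session.
spark.conf.set("spark.databricks.io.cache.enabled", "false")

# Spark's separate in-memory DataFrame cache can still be cleared explicitly:
spark.catalog.clearCache()
```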

4 More Replies
jorperort
by Contributor
  • 1981 Views
  • 4 replies
  • 2 kudos

Resolved! Executing Bash Scripts or Binaries Directly in Databricks Jobs on Single Node Cluster

Hi, is it possible to directly execute a Bash script or a binary executable from the operating system of a Databricks job compute node using a single node cluster? I'm using Databricks Asset Bundles for job initialization and execution. When the job s...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Hello @jorperort, I did some research internally and have some tips/suggestions for you to consider. Based on the research and available documentation, it is not possible to directly execute a Bash script or binary executable from the operating sy...
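A commonly used workaround (a sketch under assumptions, not the reply's full recommendation) is to wrap the script in a Python task that shells out to it; the path below is a placeholder for wherever the asset bundle syncs the file:

```python
# Hypothetical workaround sketch: a Python job task that shells out to a Bash
# script deployed with the asset bundle. The script path is a placeholder.
import subprocess

result = subprocess.run(
    ["bash", "/Workspace/Users/someone@example.com/.bundle/my_bundle/files/run_etl.sh"],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError(f"Script failed: {result.stderr}")
```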

3 More Replies
Vsleg
by Contributor
  • 3090 Views
  • 5 replies
  • 0 kudos

Enabling enableChangeDataFeed on Streaming Table created in DLT

Hello, can I enable Change Data Feed on streaming tables? How should I do this? I couldn't find this in the existing documentation: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed .

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @Vsleg, I think you cannot enable CDF like this for streaming tables. It is not natively supported for DLT streaming tables; please have a look here: Propagating Deletes: Managing Data Removal using D... - Databricks Community - 90978
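For a regular (non-DLT-streaming) Delta table, the documentation linked in the question enables CDF through a table property; a sketch with placeholder table names, shown only as a contrast to the streaming-table limitation described above:

```python
# Sketch for an ordinary Delta table (not a DLT streaming table); names are
# placeholders. `spark` is the notebook-provided session.
spark.sql("""
    ALTER TABLE main.demo.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the captured changes starting from a chosen table version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("main.demo.orders")
)
changes.show()
```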

4 More Replies
Chris_N
by New Contributor
  • 125 Views
  • 3 replies
  • 1 kudos

Unable to configure clustering on DLT tables

Hi team, I have a DLT pipeline with the `cluster_by` property configured for all my tables. The code looks something like this: @dlt.table(name="flows", cluster_by=["from"]) def flows(): <LOGIC>. It was all working fine, and in a couple of days the queries w...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hi @Chris_N, you mentioned: "I couldn't find any cluster properties configured." If they existed and were changed, you can use the Delta history command to check whether someone changed the clustering information. It is possible there were ch...
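A sketch of the history check suggested above, with a placeholder table name:

```python
# Inspect the Delta history for operations that could have altered clustering
# (e.g. ALTER TABLE ... CLUSTER BY or CREATE OR REPLACE run outside the
# pipeline). The table name is a placeholder.
history = spark.sql("DESCRIBE HISTORY main.pipeline_schema.flows")

(
    history.select("version", "timestamp", "operation", "operationParameters")
    .orderBy("version", ascending=False)
    .show(truncate=False)
)
```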

2 More Replies
SuMiT1
by New Contributor III
  • 107 Views
  • 2 replies
  • 0 kudos

Unable to Create Secret Scope in Databricks – “Fetch request failed due to expired user session”

I'm trying to create an Azure Key Vault-backed secret scope in Databricks, but when I click Create, I get this error: "Fetch request failed due to expired user session". I've already verified my login and permissions. I also tried refreshing and re-signing i...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @SuMiT1, are you using an IaC tool like Terraform, or do you want to try it out manually using your own identity?

1 More Reply
georgemichael40
by New Contributor II
  • 43 Views
  • 1 reply
  • 1 kudos

Best approach for writing/updating delta tables from python?

Hi, we are migrating a local Dash app to the Databricks infrastructure (using Databricks Apps and our Delta Lake). The local app does the following (among others): takes Excel files from the end user; reads them in memory and transforms them to a pandas DataFrame; ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi @georgemichael40, my suggestion would be to try using MERGE INTO for Delta tables, which works with the connector, rather than using delete/insert statements. This will also keep your code in SQL as you wanted. Your tables are not large, so this should be sufficient...
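A hedged sketch of that approach from a Databricks app using the SQL connector (databricks-sql-connector); the connection details, table names, and staging table are placeholders, not from the post:

```python
# Hypothetical sketch: upsert staged rows with MERGE INTO through the
# Databricks SQL connector. Hostname, HTTP path, token, and table names are
# placeholders; in a real app the token would come from app config/secrets.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapiXXXX",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("""
            MERGE INTO main.app.targets AS t
            USING main.app.staging_uploads AS s
            ON t.id = s.id
            WHEN MATCHED THEN UPDATE SET *
            WHEN NOT MATCHED THEN INSERT *
        """)
```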

rachelh
by Visitor
  • 39 Views
  • 3 replies
  • 0 kudos

[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission MODIFY on any file

Just wondering if anyone could help me understand why we are hitting this error: `[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission MODIFY on any file`. A job is trying to create a table with an external location (alread...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @rachelh, as I understand it, you need to look at the Azure access connector setup for your Unity Catalog, because serverless clusters run under an Azure Databricks-managed identity, not the service principal. Access Connector (Azure managed identity): Use...

2 More Replies
Hritik_Moon
by New Contributor II
  • 17 Views
  • 1 reply
  • 2 kudos

Reading snappy.parquet

I stored a DataFrame as Delta in the catalog. It created multiple folders with snappy.parquet files. Is there a way to read these snappy.parquet files? They read with pandas, but with Spark I get an "incompatible format" error.

Latest Reply
Khaja_Zaffer
Contributor III
  • 2 kudos

Hello, good day @Hritik_Moon. That incompatible format error is expected when you try to read the files as Parquet, because of the presence of the _delta_log created by the Delta format (which follows ACID principles); it surfaces as an AnalysisException. The recommendation would be to read in Delta...
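A short sketch of that recommendation: point Spark at the Delta table (or the table's root directory, which holds the _delta_log), not at individual part files. Table name and path below are placeholders:

```python
# Read the data as Delta rather than as raw snappy.parquet part files.
# `spark` is the notebook-provided session; names/paths are placeholders.
df = spark.table("main.default.my_table")                      # by catalog name

# Or by path, pointing at the table root (the folder containing _delta_log):
df = spark.read.format("delta").load("dbfs:/path/to/my_table")
df.show()
```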

AlanDanque
by New Contributor
  • 1240 Views
  • 2 replies
  • 0 kudos

Salesforce Bulk API 2.0 not getting all rows from large table

Has anyone run into an incomplete data extraction issue with the Salesforce Bulk API 2.0, where very large source object tables with more than 260k rows (should be approx. 13M) result in only extracting approx. 250k per attempt?

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@AlanDanque, I am working on a similar use case and will share screenshots shortly. But to reach the root cause, can you share the details below? Checks at Salesforce (description): Header used? Was Sforce-Enable-PKChunking: chunkSize=250000 explicitly included...

1 More Reply
Nidhig
by Contributor
  • 64 Views
  • 1 reply
  • 0 kudos

Conversational Agent App integration with genie in Databricks

Hi, I have recently explored the conversational agent app from the Marketplace and its integration with Genie Space. The connection setup went well, but I found a sync issue between the app and the Genie Space. Even after multiple deployments I couldn't see...

Latest Reply
HariSankar
Contributor III
  • 0 kudos

Hi @Nidhig, this isn't expected behavior; it usually happens when the app's service principal lacks permissions to access the SQL warehouse, Genie Space, or underlying Unity Catalog tables. Try these fixes: SQL Warehouse: go to Compute -> SQL Warehou...
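A hedged sketch of the Unity Catalog side of that fix; the service principal's application ID and object names are placeholders (warehouse "Can use" and Genie Space access are granted through the UI):

```python
# Grant the app's service principal access to the underlying UC objects.
# The application ID and catalog/schema/table names are placeholders.
grants = [
    "GRANT USE CATALOG ON CATALOG main TO `a1b2c3d4-app-sp-id`",
    "GRANT USE SCHEMA ON SCHEMA main.sales TO `a1b2c3d4-app-sp-id`",
    "GRANT SELECT ON TABLE main.sales.orders TO `a1b2c3d4-app-sp-id`",
]
for statement in grants:
    spark.sql(statement)
```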

saicharandeepb
by New Contributor III
  • 151 Views
  • 1 reply
  • 0 kudos

Capturing Streaming Metrics in Near Real-Time Using Cluster Logs

Over the past few weeks, I've been exploring ways to capture streaming metrics from our data load jobs. The goal is to monitor job performance and behavior in real time, without disrupting our existing data load pipelines. Initial Exploration: Streami...

Latest Reply
Krishna_S
Databricks Employee
  • 0 kudos

Hi @saicharandeepb, good job on doing such detailed research on monitoring Structured Streaming. If you need lower latency than the rolling log permits, have you tried this: Cluster-wide listener injection: use spark.extraListeners to register a cust...
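A related, Python-only sketch (not the spark.extraListeners JVM approach itself): registering a StreamingQueryListener on the driver, available in PySpark 3.4+; where the metrics get shipped is a placeholder:

```python
# Sketch: a driver-side StreamingQueryListener (PySpark 3.4+). Printing is a
# placeholder for shipping metrics to a log table, queue, or monitoring sink.
# `spark` is the notebook-provided session.
from pyspark.sql.streaming import StreamingQueryListener

class MetricsListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"query started: {event.id}")

    def onQueryProgress(self, event):
        p = event.progress
        print(f"batch={p.batchId} rows={p.numInputRows} durationMs={p.durationMs}")

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        print(f"query terminated: {event.id}")

spark.streams.addListener(MetricsListener())
```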

Abrarali8708
by New Contributor II
  • 332 Views
  • 4 replies
  • 4 kudos

Resolved! Node type not available in Central India (Student Subscription)

Hi Community, I have deployed an Azure Databricks workspace in the Central India region using a student subscription. While trying to create a compute resource, I encountered an error stating that the selected node type is not available in Central Ind...

Latest Reply
ManojkMohan
Honored Contributor
  • 4 kudos

@Abrarali8708, as discussed, can you try managing the Azure Policy definition: locate the policy definition ID /providers/Microsoft.Authorization/policyDefinitions/b86dabb9-b578-4d7b-b842-3b45e95769a1 and modify the parameter listOfAllowedLocations to inclu...

3 More Replies
Michał
by New Contributor III
  • 665 Views
  • 5 replies
  • 3 kudos

how to process a streaming lakeflow declarative pipeline in batches

Hi, I've got a problem and I have run out of ideas as to what else I can try. Maybe you can help? I've got a Delta table with hundreds of millions of records on which I have to perform relatively expensive operations. I'd like to be able to process some...

Latest Reply
mmayorga
Databricks Employee
  • 3 kudos

Hi @Michał, one detail/feature to consider when working with Declarative Pipelines is that they manage and auto-tune configuration aspects, including rate limiting (maxBytesPerTrigger or maxFilesPerTrigger). Perhaps that's why you could not see this...
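For comparison, a sketch of explicit rate limiting on a plain Structured Streaming read of a Delta source, outside the pipeline's auto-tuning (per the reply, declarative pipelines may manage or override these settings themselves); names and paths are placeholders:

```python
# Explicit rate limiting on a plain Structured Streaming read of a Delta
# source; names/paths are placeholders, and this runs outside Lakeflow/DLT,
# which may auto-tune or override such settings.
stream = (
    spark.readStream
    .option("maxFilesPerTrigger", 100)       # cap files picked up per micro-batch
    .option("maxBytesPerTrigger", "1g")      # soft cap on bytes per micro-batch
    .table("main.big.events")
)

(
    stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events_batches")
    .trigger(availableNow=True)              # drain the backlog in bounded batches, then stop
    .toTable("main.big.events_processed")
)
```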

4 More Replies
