Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sta_gas
by Visitor
  • 15 Views
  • 2 replies
  • 0 kudos

Data profiling monitoring with foreign catalog

Hi team, I'm currently working with Azure Databricks and have created a foreign catalog for my source database in Azure SQL. I can successfully run SELECT statements from Databricks to the Azure SQL database. However, I would like to set up data profil...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @sta_gas, since data quality monitoring is in beta, I'm quite sure foreign tables aren't supported as of now (but they forgot to mention it in the docs). The more important question is whether they ever will be supported. For me, data quality monitoring appl...
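Since the monitoring beta reportedly does not cover foreign tables, one hedged workaround (not from the thread; all names below are placeholders) is to snapshot the foreign-catalog table into a managed Delta table and point profiling/monitoring at that copy instead:

```python
# Hypothetical sketch: snapshot a foreign-catalog (Azure SQL) table into a
# managed Delta table that profiling/monitoring can target.
# Catalog, schema, and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source = spark.table("foreign_sql_catalog.dbo.customers")    # foreign catalog table
(
    source.write
    .mode("overwrite")                                        # full snapshot each run
    .saveAsTable("main.monitoring.customers_snapshot")        # managed Delta table
)
```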

1 More Reply
adrianhernandez
by New Contributor III
  • 96 Views
  • 2 replies
  • 1 kudos

Wheel permissions issue

I get an org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission MODIFY,SELECT on any file. SQLSTATE: 42501 at com.databricks.sql.acl.Unauthorized.throwInsufficientPermissionsError(P...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hi @adrianhernandez, the permissions error indicates you need the privileges for "any file". To resolve this, can you try adding the corresponding permissions and see if it works: %sql GRANT SELECT ON ANY FILE TO `username` %sql GRANT MO...
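The truncated grants above presumably cover both privileges named in the error (SELECT and MODIFY). A minimal sketch of running them from a notebook cell, with the principal as a placeholder:

```python
# Minimal sketch of the grants suggested in the reply; `spark` is the session
# predefined in Databricks notebooks/jobs, and the principal is a placeholder.
# These "ANY FILE" grants require admin-level privileges to issue.
spark.sql("GRANT SELECT ON ANY FILE TO `some_user@example.com`")
spark.sql("GRANT MODIFY ON ANY FILE TO `some_user@example.com`")
```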

1 More Reply
Hritik_Moon
by New Contributor II
  • 26 Views
  • 5 replies
  • 4 kudos

Stop Cache in free edition

Hello, I am using Databricks Free Edition. Is there a way to turn off IO caching? I am trying to learn optimization and can't see any difference in query run time with caching enabled.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @Hritik_Moon, I guess you cannot. To disable the disk cache you need to be able to run the following command: spark.conf.set("spark.databricks.io.cache.enabled", "[true | false]"). But serverless compute does not support setting most Spark properties fo...
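For contrast, on classic (non-serverless) compute the setting from the reply can be applied per session; a short sketch, assuming the notebook-provided `spark` session:

```python
# On classic (non-serverless) compute the disk (IO) cache can be toggled per
# session; per the reply above, this is not settable on serverless compute
# such as the Free Edition. `spark` is the notebook-provided session.
spark.conf.set("spark.databricks.io.cache.enabled", "false")

# Spark's separate in-memory DataFrame cache can still be cleared explicitly:
spark.catalog.clearCache()
```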

4 More Replies
jorperort
by Contributor
  • 1981 Views
  • 4 replies
  • 2 kudos

Resolved! Executing Bash Scripts or Binaries Directly in Databricks Jobs on Single Node Cluster

Hi, is it possible to directly execute a Bash script or a binary executable from the operating system of a Databricks job compute node using a single node cluster? I'm using Databricks Asset Bundles for job initialization and execution. When the job s...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Hello @jorperort, I did some research internally and have some tips/suggestions for you to consider. Based on the research and available documentation, it is not possible to directly execute a Bash script or binary executable from the operating sy...
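A commonly used workaround (a sketch under assumptions, not the reply's full recommendation) is to wrap the script in a Python task that shells out to it; the path below is a placeholder for wherever the asset bundle syncs the file:

```python
# Hypothetical workaround sketch: a Python job task that shells out to a Bash
# script deployed with the asset bundle. The script path is a placeholder.
import subprocess

result = subprocess.run(
    ["bash", "/Workspace/Users/someone@example.com/.bundle/my_bundle/files/run_etl.sh"],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError(f"Script failed: {result.stderr}")
```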

3 More Replies
Vsleg
by Contributor
  • 3090 Views
  • 5 replies
  • 0 kudos

Enabling enableChangeDataFeed on Streaming Table created in DLT

Hello, can I enable Change Data Feed on streaming tables? How should I do this? I couldn't find this in the existing documentation: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed .

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @Vsleg, I think you cannot enable CDF like this for streaming tables. It is not natively supported for DLT streaming tables; please have a look here: Propagating Deletes: Managing Data Removal using D... - Databricks Community - 90978
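For a regular (non-DLT-streaming) Delta table, the documentation linked in the question enables CDF through a table property; a sketch with placeholder table names, shown only as a contrast to the streaming-table limitation described above:

```python
# Sketch for an ordinary Delta table (not a DLT streaming table); names are
# placeholders. `spark` is the notebook-provided session.
spark.sql("""
    ALTER TABLE main.demo.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the captured changes starting from a chosen table version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("main.demo.orders")
)
changes.show()
```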

4 More Replies
Chris_N
by New Contributor
  • 125 Views
  • 3 replies
  • 1 kudos

Unable to configure clustering on DLT tables

Hi team, I have a DLT pipeline with the `cluster_by` property configured for all my tables. The code looks something like this: @dlt.table(name="flows", cluster_by=["from"]) def flows(): <LOGIC>. It was all working fine, and in a couple of days the queries w...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hi @Chris_N, you mentioned: "I couldn't find any cluster properties configured." If they existed and were changed, you can use the Delta history command to check whether someone changed the clustering information. It is possible there were ch...
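A sketch of the history check suggested above, with a placeholder table name:

```python
# Inspect the Delta history for operations that could have altered clustering
# (e.g. ALTER TABLE ... CLUSTER BY or CREATE OR REPLACE run outside the
# pipeline). The table name is a placeholder.
history = spark.sql("DESCRIBE HISTORY main.pipeline_schema.flows")

(
    history.select("version", "timestamp", "operation", "operationParameters")
    .orderBy("version", ascending=False)
    .show(truncate=False)
)
```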

2 More Replies
SuMiT1
by New Contributor III
  • 107 Views
  • 2 replies
  • 0 kudos

Unable to Create Secret Scope in Databricks – “Fetch request failed due to expired user session”

I'm trying to create an Azure Key Vault-backed secret scope in Databricks, but when I click Create, I get this error: "Fetch request failed due to expired user session". I've already verified my login and permissions. I also tried refreshing and re-signing i...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @SuMiT1, are you using an IaC tool like Terraform, or do you want to try it out manually using your own identity?

1 More Reply
georgemichael40
by New Contributor II
  • 43 Views
  • 1 reply
  • 1 kudos

Best approach for writing/updating delta tables from python?

Hi, we are migrating a local Dash app to the Databricks infrastructure (using Databricks Apps and our Delta Lake). The local app does the following (among others): takes Excel files from the end user; reads them in memory and transforms them to a pandas DataFrame; ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi @georgemichael40, my suggestion would be to try using MERGE INTO for Delta tables, which works with the connector, rather than using delete/insert statements. This will also keep your code in SQL as you wanted. Your tables are not large, so this should be sufficient...
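A hedged sketch of that approach from a Databricks app using the SQL connector (databricks-sql-connector); the connection details, table names, and staging table are placeholders, not from the post:

```python
# Hypothetical sketch: upsert staged rows with MERGE INTO through the
# Databricks SQL connector. Hostname, HTTP path, token, and table names are
# placeholders; in a real app the token would come from app config/secrets.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapiXXXX",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("""
            MERGE INTO main.app.targets AS t
            USING main.app.staging_uploads AS s
            ON t.id = s.id
            WHEN MATCHED THEN UPDATE SET *
            WHEN NOT MATCHED THEN INSERT *
        """)
```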

rachelh
by Visitor
  • 39 Views
  • 3 replies
  • 0 kudos

[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission MODIFY on any file

Just wondering if anyone could help me understand why we are hitting this error: `[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission MODIFY on any file`. A job is trying to create a table with an external location (alread...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @rachelh, as I understand it, you need to look at the Azure access connector setup for your Unity Catalog, because serverless clusters run under an Azure Databricks-managed identity, not the service principal. Access Connector (Azure managed identity): Use...

2 More Replies
Hritik_Moon
by New Contributor II
  • 17 Views
  • 1 reply
  • 2 kudos

Reading snappy.parquet

I stored a DataFrame as Delta in the catalog. It created multiple folders with snappy.parquet files. Is there a way to read these snappy.parquet files? They read with pandas, but with Spark I get an "incompatible format" error.

Latest Reply
Khaja_Zaffer
Contributor III
  • 2 kudos

Hello, good day @Hritik_Moon. That incompatible format error is expected when you try to read the files as Parquet, because of the presence of the _delta_log created by the Delta format (which follows ACID principles); it surfaces as an AnalysisException. The recommendation would be to read in Delta...
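A short sketch of that recommendation: point Spark at the Delta table (or the table's root directory, which holds the _delta_log), not at individual part files. Table name and path below are placeholders:

```python
# Read the data as Delta rather than as raw snappy.parquet part files.
# `spark` is the notebook-provided session; names/paths are placeholders.
df = spark.table("main.default.my_table")                      # by catalog name

# Or by path, pointing at the table root (the folder containing _delta_log):
df = spark.read.format("delta").load("dbfs:/path/to/my_table")
df.show()
```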

AlanDanque
by New Contributor
  • 1240 Views
  • 2 replies
  • 0 kudos

Salesforce Bulk API 2.0 not getting all rows from large table

Has anyone run into an incomplete data extraction issue with the Salesforce Bulk API 2.0, where very large source object tables with more than 260k rows (should be approx. 13M) result in only extracting approx. 250k per attempt?

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@AlanDanque, I am working on a similar use case and will share screenshots shortly. But to reach the root cause, can you share the details below? Checks at Salesforce (description): Header used? Was Sforce-Enable-PKChunking: chunkSize=250000 explicitly included...

1 More Reply
Nidhig
by Contributor
  • 64 Views
  • 1 reply
  • 0 kudos

Conversational Agent App integration with genie in Databricks

Hi, I have recently explored the conversational agent app from the Marketplace and its integration with Genie Space. The connection setup went well, but I found a sync issue between the app and the Genie Space. Even after multiple deployments I couldn't see...

Latest Reply
HariSankar
Contributor III
  • 0 kudos

Hi @Nidhig, this isn't expected behavior; it usually happens when the app's service principal lacks permissions to access the SQL warehouse, Genie Space, or underlying Unity Catalog tables. Try these fixes: SQL Warehouse: go to Compute -> SQL Warehou...
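A hedged sketch of the Unity Catalog side of that fix; the service principal's application ID and object names are placeholders (warehouse "Can use" and Genie Space access are granted through the UI):

```python
# Grant the app's service principal access to the underlying UC objects.
# The application ID and catalog/schema/table names are placeholders.
grants = [
    "GRANT USE CATALOG ON CATALOG main TO `a1b2c3d4-app-sp-id`",
    "GRANT USE SCHEMA ON SCHEMA main.sales TO `a1b2c3d4-app-sp-id`",
    "GRANT SELECT ON TABLE main.sales.orders TO `a1b2c3d4-app-sp-id`",
]
for statement in grants:
    spark.sql(statement)
```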

saicharandeepb
by New Contributor III
  • 151 Views
  • 1 reply
  • 0 kudos

Capturing Streaming Metrics in Near Real-Time Using Cluster Logs

Over the past few weeks, I've been exploring ways to capture streaming metrics from our data load jobs. The goal is to monitor job performance and behavior in real time, without disrupting our existing data load pipelines. Initial Exploration: Streami...

Latest Reply
Krishna_S
Databricks Employee
  • 0 kudos

Hi @saicharandeepb, good job on doing such detailed research on monitoring Structured Streaming. If you need lower latency than the rolling log permits, have you tried this: Cluster-wide listener injection: use spark.extraListeners to register a cust...
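A related, Python-only sketch (not the spark.extraListeners JVM approach itself): registering a StreamingQueryListener on the driver, available in PySpark 3.4+; where the metrics get shipped is a placeholder:

```python
# Sketch: a driver-side StreamingQueryListener (PySpark 3.4+). Printing is a
# placeholder for shipping metrics to a log table, queue, or monitoring sink.
# `spark` is the notebook-provided session.
from pyspark.sql.streaming import StreamingQueryListener

class MetricsListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"query started: {event.id}")

    def onQueryProgress(self, event):
        p = event.progress
        print(f"batch={p.batchId} rows={p.numInputRows} durationMs={p.durationMs}")

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        print(f"query terminated: {event.id}")

spark.streams.addListener(MetricsListener())
```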

Abrarali8708
by New Contributor II
  • 332 Views
  • 4 replies
  • 4 kudos

Resolved! Node type not available in Central India (Student Subscription)

Hi Community, I have deployed an Azure Databricks workspace in the Central India region using a student subscription. While trying to create a compute resource, I encountered an error stating that the selected node type is not available in Central Ind...

Latest Reply
ManojkMohan
Honored Contributor
  • 4 kudos

@Abrarali8708, as discussed, can you try managing the Azure Policy definition: locate the policy definition ID /providers/Microsoft.Authorization/policyDefinitions/b86dabb9-b578-4d7b-b842-3b45e95769a1 and modify the parameter listOfAllowedLocations to inclu...

3 More Replies
Michał
by New Contributor III
  • 665 Views
  • 5 replies
  • 3 kudos

how to process a streaming lakeflow declarative pipeline in batches

Hi, I've got a problem and I have run out of ideas as to what else I can try. Maybe you can help? I've got a Delta table with hundreds of millions of records on which I have to perform relatively expensive operations. I'd like to be able to process some...

Latest Reply
mmayorga
Databricks Employee
  • 3 kudos

Hi @Michał, one detail/feature to consider when working with Declarative Pipelines is that they manage and auto-tune configuration aspects, including rate limiting (maxBytesPerTrigger or maxFilesPerTrigger). Perhaps that's why you could not see this...
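For comparison, a sketch of explicit rate limiting on a plain Structured Streaming read of a Delta source, outside the pipeline's auto-tuning (per the reply, declarative pipelines may manage or override these settings themselves); names and paths are placeholders:

```python
# Explicit rate limiting on a plain Structured Streaming read of a Delta
# source; names/paths are placeholders, and this runs outside Lakeflow/DLT,
# which may auto-tune or override such settings.
stream = (
    spark.readStream
    .option("maxFilesPerTrigger", 100)       # cap files picked up per micro-batch
    .option("maxBytesPerTrigger", "1g")      # soft cap on bytes per micro-batch
    .table("main.big.events")
)

(
    stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events_batches")
    .trigger(availableNow=True)              # drain the backlog in bounded batches, then stop
    .toTable("main.big.events_processed")
)
```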

4 More Replies
