Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

alesventus
by Contributor
  • 7372 Views
  • 4 replies
  • 2 kudos

Unity Catalog metastore is down error

When I want to run a notebook in Databricks, all queries, saves, and reads take really long, and I found an error message in the cluster's event log that says: Metastore is down. So I think the cluster is not able to connect to the metastore right now. Could be t...

Data Engineering
metastore
Unity Catalog
Latest Reply
alesventus
Contributor
  • 2 kudos

This issue is solely related to the VNET. An Azure engineer must set up the connection within the VNET correctly.

3 More Replies
jwilliam
by Contributor
  • 5440 Views
  • 3 replies
  • 2 kudos

Resolved! How to mount Azure Blob Storage with OAuth2?

We already know that we can mount Azure Data Lake Gen2 with OAuth2 using this: configs = {"fs.azure.account.auth.type": "OAuth", "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider", ...

Latest Reply
dssatpute
New Contributor II
  • 2 kudos

Try replacing wasbs with abfss and dfs with blob in the URI; it should work!

2 More Replies
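For reference, a minimal sketch of the abfss-based OAuth2 mount suggested above, assuming a service principal with access to the storage account; every angle-bracket value (application ID, secret scope and key, directory ID, container, account, mount name) is a placeholder, not something taken from the thread:

# OAuth2 mount via the abfss driver; the ClientCredsTokenProvider only works with abfss.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<secret-scope>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}

# Mount the container under /mnt using the abfss scheme mentioned in the reply above.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)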
ayush25091995
by New Contributor III
  • 3670 Views
  • 6 replies
  • 0 kudos

Resolved! How to get schema and catalog name in SQL warehouse query history API

Hi, we are using the SQL query history API. When selecting the catalog and schema name directly in the SQL editor instead of passing them through the query, we are not getting the schema name and catalog name in the query text for that particular ID. So, how can we get the s...

Latest Reply
mtajmouati
Contributor
  • 0 kudos

True! Try this: import requests import json # Define your Databricks workspace URL and API token databricks_instance = "https://<your-databricks-instance>" api_token = "dapi<your-api-token>" # Fetch SQL query history def get_query_history(): ...

5 More Replies
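For reference, a minimal sketch of calling the SQL warehouse query history endpoint that the truncated reply above starts from; the workspace URL and token are placeholders, and the response field names should be verified against the current API docs:

import requests

databricks_instance = "https://<your-databricks-instance>"  # placeholder
api_token = "dapi<your-api-token>"                           # placeholder

def get_query_history(max_results=100):
    """Fetch recent statements from the SQL warehouse query history API (2.0)."""
    resp = requests.get(
        f"{databricks_instance}/api/2.0/sql/history/queries",
        headers={"Authorization": f"Bearer {api_token}"},
        params={"max_results": max_results},
    )
    resp.raise_for_status()
    return resp.json()

for q in get_query_history().get("res", []):
    # query_text contains the statement as submitted; a catalog/schema picked in the
    # SQL editor dropdowns is not echoed here, which is the limitation discussed above.
    print(q.get("query_id"), q.get("query_text"))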
Anonymous
by Not applicable
  • 33436 Views
  • 7 replies
  • 0 kudos

Resolved! Tuning shuffle partitions

Is the best practice for tuning shuffle partitions to have the config "autoOptimizeShuffle.enabled" on? I see it is not switched on by default. Why is that?

Latest Reply
mtajmouati
Contributor
  • 0 kudos

AQE applies to all queries that are: non-streaming, and contain at least one exchange (usually when there's a join, aggregate, or window), one sub-query, or both. Not all AQE-applied queries are necessarily re-optimized. The re-optimization might or might no...

6 More Replies
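For reference, a minimal sketch of the settings involved, set from a notebook; the key names follow standard Spark/Databricks configuration and the defaults vary by runtime, so treat this as a starting point rather than a recommendation:

# Adaptive Query Execution (AQE); enabled by default on recent runtimes.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Auto-optimized shuffle, the option the question asks about; it is not on by default.
spark.conf.set("spark.databricks.adaptive.autoOptimizeShuffle.enabled", "true")

# Without auto-optimized shuffle, the shuffle partition count can still be tuned by hand.
spark.conf.set("spark.sql.shuffle.partitions", "200")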
ayush25091995
by New Contributor III
  • 1010 Views
  • 1 reply
  • 0 kudos

How to pass page_token while calling API to get query history in SQL warehouse

Hi, each query ID is getting duplicated on the next page when calling the query history API for the SQL warehouse, even though the page token is different for different pages. How should we pass the page token? Since in the Databricks doc, it is mentioned w...

Latest Reply
ayush25091995
New Contributor III
  • 0 kudos

Any help on this, please?

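Since the thread was left open, here is a minimal sketch of paging through the query history API with page_token; the workspace URL and token are placeholders, and one thing worth trying for the duplicate-ID symptom is sending only the token (not the original filters) on follow-up requests:

import requests

databricks_instance = "https://<your-databricks-instance>"  # placeholder
headers = {"Authorization": "Bearer dapi<your-api-token>"}   # placeholder token
url = f"{databricks_instance}/api/2.0/sql/history/queries"

params = {"max_results": 100}
query_ids = set()
while True:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    page = resp.json()
    for q in page.get("res", []):
        query_ids.add(q["query_id"])
    if not page.get("has_next_page"):
        break
    # Follow-up calls carry only the page token returned by the previous response.
    params = {"page_token": page["next_page_token"], "max_results": 100}

print(f"collected {len(query_ids)} distinct query ids")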
oakhill
by New Contributor III
  • 1781 Views
  • 4 replies
  • 0 kudos

Optimal process for loading data where the full dataset is provided every day?

We receive several datasets where the full dump is delivered daily or weekly. What is the best way to ingest this into Databricks using DLT or basic PySpark while adhering to the medallion architecture? 1. If we use Auto Loader into Bronze, we'd end up with increme...

Latest Reply
dbrx_user
New Contributor III
  • 0 kudos

Agree with @Witold to apply CDC as early as possible. Depending on where the initial files get deposited, I'd recommend having an initial raw layer to your medallion which is just your cloud storage account - so each day or week the files get deposit...

3 More Replies
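One common shape for the raw-layer-plus-CDC approach described above is to land each full dump in cloud storage and MERGE the latest snapshot into the Bronze Delta table, so that only changes flow downstream. A minimal sketch, with placeholder paths, table, and key names:

from delta.tables import DeltaTable

# Latest full snapshot as deposited in the raw storage layer (placeholder path and format).
snapshot = (
    spark.read.format("parquet")
    .load("abfss://raw@<storage-account>.dfs.core.windows.net/dataset/<load_date>/")
)

target = DeltaTable.forName(spark, "bronze.dataset")

(
    target.alias("t")
    .merge(snapshot.alias("s"), "t.business_key = s.business_key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    # Rows missing from today's dump are treated as deletes; requires a recent Delta/DBR version.
    .whenNotMatchedBySourceDelete()
    .execute()
)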
Spencer_Kent
by New Contributor III
  • 21056 Views
  • 10 replies
  • 6 kudos

Shared cluster configuration that permits `dbutils.fs` commands

My workspace has a couple different types of clusters, and I'm having issues using the `dbutils` filesystem utilities when connected to a shared cluster. I'm hoping you can help me fix the configuration of the shared cluster so that I can actually us...

insufficient_permissions_on_shared_cluster shared_cluster_config individual_use_cluster
Latest Reply
jacovangelder
Honored Contributor
  • 6 kudos

Can you not use a No Isolation Shared cluster with Table access controls enabled at the workspace level?

9 More Replies
vinaykumar
by New Contributor III
  • 4684 Views
  • 4 replies
  • 1 kudos

Can we define a custom session variable for login user authentication in Databricks for row/column-level security?

Can we create a custom session variable for login user authentication in Databricks, like HANA session variables? We have scenarios like today's Spotfire setup where we use a single generic user to connect to HANA (we don't have single sign-on enabled) in th...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @vinay kumar, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

3 More Replies
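On Unity Catalog workspaces, row filters combined with current_user() can stand in for the per-user session variable being asked about. A minimal sketch, assuming UC is enabled and using placeholder catalog, schema, table, and column names:

# A SQL UDF used as a row filter: admins see everything, other users only see
# rows whose owner_email matches their own login.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.owner_filter(row_owner STRING)
    RETURN is_account_group_member('admins') OR row_owner = current_user()
""")

# Attach the filter to a table column; it is evaluated per session, per user.
spark.sql("""
    ALTER TABLE main.sales.orders
    SET ROW FILTER main.security.owner_filter ON (owner_email)
""")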
Wolverine
by New Contributor III
  • 1809 Views
  • 3 replies
  • 1 kudos

Databricks Magic Command

I am trying a few commands. What is the equivalent magic command of dbutils.fs.rm("dbfs:/sampledir", True)? Actually, I am looking at how to use magic commands in the same way as dbutils. For instance, dbutils.fs.head('dbfs:/FileStore/<<name>>.csv', 10) gives 10...

Latest Reply
Witold
Honored Contributor
  • 1 kudos

You could use shell commands, like %sh rm -r sampledir. You need to check for the correct path first; I currently don't know where DBFS folders are exactly mounted.

2 More Replies
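For completeness, a minimal sketch of both forms side by side, assuming a cluster where DBFS is exposed through the local /dbfs FUSE mount (the path detail left open in the reply above):

# dbutils form:
dbutils.fs.rm("dbfs:/sampledir", True)                # recursive delete
dbutils.fs.head("dbfs:/FileStore/<<name>>.csv", 10)   # first 10 bytes of the file

# %sh magic-cell form (each line in its own cell), using the /dbfs FUSE path:
#   %sh rm -r /dbfs/sampledir
#   %sh head -c 10 /dbfs/FileStore/<<name>>.csv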
ajithgaade
by New Contributor III
  • 3268 Views
  • 8 replies
  • 2 kudos

Autoloader includeExistingFiles with retry didn't update the schema

Hi, written in PySpark. A Databricks Auto Loader job with retry didn't merge/update the schema: spark.readStream.format("cloudFiles").option("cloudFiles.format", "parquet").option("cloudFiles.schemaLocation", checkpoint_path).option("cloudFiles.includeExis...

Latest Reply
mtajmouati
Contributor
  • 2 kudos

Hello, try this: from pyspark.sql import SparkSession # Initialize Spark session spark = SparkSession.builder \ .appName("Auto Loader Schema Evolution") \ .getOrCreate() # Source and checkpoint paths source_path = "s3://path" checkpoint_pa...

7 More Replies
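Building on the truncated reply above, a minimal sketch of an Auto Loader stream with schema evolution switched on; the paths and target table name are placeholders, and the stream still has to be restarted (or retried) once after a new column appears for the tracked schema to pick it up:

source_path = "s3://<bucket>/<path>"            # placeholder
checkpoint_path = "s3://<bucket>/<checkpoint>"  # placeholder

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("cloudFiles.includeExistingFiles", "true")
    # Add new columns to the tracked schema instead of dropping them.
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load(source_path)
)

(
    stream.writeStream
    .option("checkpointLocation", checkpoint_path)
    # Let the target Delta table evolve its schema along with the stream.
    .option("mergeSchema", "true")
    .trigger(availableNow=True)
    .toTable("bronze.events")
)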
ksenija
by Contributor
  • 1944 Views
  • 4 replies
  • 0 kudos

DLT pipeline - DebeziumJDBCMicroBatchProvider not found

Hi! I created a DLT pipeline and I'm getting this error: [STREAM_FAILED] Query [id = ***, runId = ***] terminated with exception: object com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found. I'm using Serverless. How to verify that the require...

Data Engineering
DebeziumJDBCMicroBatchProvider
dlt
Latest Reply
ksenija
Contributor
  • 0 kudos

@Dnirmania, @jlachniet I didn't manage to resolve this issue, but I created a regular notebook and I'm using a MERGE statement. If you can't merge all the data at once, you can use a loop with hourly intervals.

3 More Replies
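A minimal sketch of the MERGE-based workaround described above, with placeholder table, column, and time-range values; the loop processes one hour at a time for cases where the whole source is too large to merge in a single pass:

from datetime import datetime, timedelta

start = datetime(2024, 1, 1)   # placeholder window
end = datetime(2024, 1, 2)

current = start
while current < end:
    upper = current + timedelta(hours=1)
    spark.sql(f"""
        MERGE INTO target.events AS t
        USING (
            SELECT * FROM source.events
            WHERE event_time >= '{current:%Y-%m-%d %H:%M:%S}'
              AND event_time <  '{upper:%Y-%m-%d %H:%M:%S}'
        ) AS s
        ON t.event_id = s.event_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
    current = upper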
EDDatabricks
by Contributor
  • 3173 Views
  • 2 replies
  • 0 kudos

Concurrency issue with append-only writes

Dear all, we have a PySpark streaming job (DBR 14.3) that continuously writes new data to a Delta table (TableA). On this table, there is a PySpark batch job (DBR 14.3) that runs every 15 minutes and in some cases may delete some records from ...

Data Engineering
Concurrency
DBR 14.3
delta
MERGE
Latest Reply
Dilisha
New Contributor II
  • 0 kudos

Hi @EDDatabricks - were you able to find the fix for this? I am also facing a similar issue. Added more details here - Getting concurrent Append exception after upgradin... - Databricks Community - 76521

1 More Reply
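The usual mitigation for a ConcurrentAppendException between a continuous append and a periodic MERGE/DELETE is to make the batch job's match condition explicit about the partitions it touches, so its transaction does not read files the stream is appending to elsewhere. A minimal sketch with placeholder table, column, and partition values:

from delta.tables import DeltaTable

table_a = DeltaTable.forName(spark, "silver.table_a")
updates = spark.table("staging.table_a_deletes")  # placeholder source of delete markers

(
    table_a.alias("t")
    .merge(
        updates.alias("s"),
        # Pinning the match to one date partition narrows what the transaction reads,
        # which is what avoids the conflict with concurrent appends to other partitions.
        "t.event_date = s.event_date AND t.event_date = '2024-01-01' AND t.id = s.id",
    )
    .whenMatchedDelete(condition="s.is_deleted = true")
    .execute()
)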
