Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AhmedAlnaqa
by Contributor
  • 1828 Views
  • 1 replies
  • 1 kudos

Resolved! Enhancements: interact with DBFS breadcrumbs

Hi there, this is my first thread and a baby step in the Databricks community, especially in the Data Engineering section. I am working in Community Edition and found an enhancement that needs to be implemented: the need is to make the breadcrumbs...

Latest Reply
amr
Databricks Employee
  • 1 kudos

Good feedback, thank you. We are actually looking to completely revamp Databricks Community Edition, and the experience will be much simpler. Stay tuned.

Sid_SBA
by New Contributor
  • 1167 Views
  • 1 replies
  • 0 kudos

Resolved! How to integrate the CI/CD process with Databricks using Azure DevOps at the catalog level

How can the CI/CD process be integrated with Databricks using Azure DevOps at the catalog level instead of the workspace level? I would like to understand the process, if this is possible, given that if the catalog is used in different workspaces in the same subscript...

Latest Reply
amr
Databricks Employee
  • 0 kudos

CI/CD is not related to catalogs; it is related to environments (workspaces). There are lots of tutorials on YouTube on how to set up Azure DevOps CI/CD to move assets from one workspace to another and start a job. You will need to use Databricks Plugin ...
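Starting a job at the end of such a pipeline comes down to one REST call. A minimal sketch in Python, assuming the Jobs API 2.1 `run-now` endpoint and a personal access token kept as a pipeline secret; the workspace URL, token, and job ID are placeholders, and `build_run_now_request` is a hypothetical helper, not part of the Databricks plugin mentioned above. It only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_run_now_request(workspace_url: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST to the Jobs API run-now endpoint.

    workspace_url, token and job_id are placeholders you would supply from
    Azure DevOps pipeline variables (hypothetical names, not a fixed convention).
    """
    url = workspace_url.rstrip("/") + "/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_run_now_request("https://<your-workspace>", "<token>", 123)
    print(req.full_url)
    # To actually trigger the job from a pipeline step: urllib.request.urlopen(req)
```

A DevOps script step would then send it with `urllib.request.urlopen(req)`; the Databricks CLI is an alternative if you prefer not to handle raw HTTP.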

anoopdk
by New Contributor II
  • 2267 Views
  • 1 replies
  • 1 kudos

Add option to skip or deactivate a task

It would be beneficial to have an option, like a toggle, to activate or deactivate a task in the job graph interface. This mainly helps to skip execution of a task and reactivate it as required. Currently there is no option to say I want this task to b...

Latest Reply
amr
Databricks Employee
  • 1 kudos

Maybe just load the task with an empty notebook, and once decided, load the right notebook. Not ideal, but it should do the job, I guess.

Priya_Data_Eng
by New Contributor
  • 1258 Views
  • 1 replies
  • 0 kudos

Special character data preservation

This data frame has two columns, name and info. Name has the value John and info has the value 1® VOC. After writing this data, I can read the correct values in Databricks, but when I download the CSV file and load it in Notepad (UTF-8), it is showing no va...

Latest Reply
amr
Databricks Employee
  • 0 kudos

Try to read the file back into Databricks using spark.read. Do you see the characters? If yes, then it is an editor problem; use another editor such as Notepad++. If not, then the data was not written with the right encoder; try a different encoder o...
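The same round-trip check can be done outside Spark. A minimal sketch in plain Python, using the sample row from the question; `write_and_check` is a hypothetical helper. It writes the CSV with an explicit encoding and reads it back with the same one. Choosing `utf-8-sig` (UTF-8 with a byte-order mark) is an assumption here: some Windows tools only detect UTF-8 reliably when the BOM is present, while plain `utf-8` writes none.

```python
import csv
import os
import tempfile


def write_and_check(rows, path, encoding="utf-8-sig"):
    """Write rows to CSV with an explicit encoding, then read them back.

    "utf-8-sig" prepends a byte-order mark on write and strips it on read;
    plain "utf-8" writes no BOM.
    """
    with open(path, "w", newline="", encoding=encoding) as f:
        writer = csv.writer(f)
        writer.writerow(["name", "info"])
        writer.writerows(rows)
    # Read back with the same encoding: if the characters survive here,
    # any garbling seen elsewhere comes from the viewer, not the file.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "sample.csv")
    print(write_and_check([["John", "1® VOC"]], out))
```

On the Spark side, the CSV writer's `encoding` option sets the charset of the saved files.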

alesventus
by Contributor
  • 2041 Views
  • 2 replies
  • 0 kudos

Tasks in job are in pending state

I have a Databricks job with around 70 notebooks. When the job starts, only one notebook gets executed, and the rest of the notebooks that are at the beginning of the line are in state PENDING (not blocked). It looks like notebooks cannot run in parallel for t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Maybe it is something related to the autoscaling options? When Databricks detects increased workload, it scales up the number of workers, and then the rest of the notebooks get executed. Do you use DLT?

alesventus
by Contributor
  • 7378 Views
  • 4 replies
  • 2 kudos

Unity Catalog metastore is down error

When I run a notebook in Databricks, all queries, saves, and reads take a really long time, and I found an error message in the cluster's event log that says: Metastore is down. So I think the cluster is not able to connect to the metastore right now. Could be t...

Data Engineering
metastore
Unity Catalog
Latest Reply
alesventus
Contributor
  • 2 kudos

This issue is solely related to the VNET. An Azure engineer must set up the connection within the VNET correctly.

jwilliam
by Contributor
  • 5444 Views
  • 3 replies
  • 2 kudos

Resolved! How to mount Azure Blob Storage with OAuth2?

We already know that we can mount Azure Data Lake Storage Gen2 with OAuth2 using this: configs = {"fs.azure.account.auth.type": "OAuth", "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider", ...

Latest Reply
dssatpute
New Contributor II
  • 2 kudos

Try replacing wasbs with abfss and dfs with blob in the URI; it should work!
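For reference, the ABFS driver (`abfss://`) addresses a storage account through the `dfs.core.windows.net` endpoint, whereas `wasbs://` URIs use `blob.core.windows.net`. A minimal sketch, assuming a service principal with OAuth2 client credentials; the helper names and all IDs are hypothetical, while the config keys are the standard Hadoop ABFS OAuth settings that the question's `configs` dict starts with.

```python
def abfss_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Hadoop ABFS settings for OAuth2 client-credentials authentication.

    In a notebook the secret would normally come from dbutils.secrets.get(...)
    rather than a literal string.
    """
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """abfss URI for a container; note the 'dfs' host, unlike wasbs's 'blob' host."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# In a Databricks notebook (not runnable outside one), the mount would look like:
# dbutils.fs.mount(source=abfss_uri("data", "myaccount"),
#                  mount_point="/mnt/data",
#                  extra_configs=abfss_oauth_configs("<client-id>", "<secret>", "<tenant-id>"))
```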

ayush25091995
by New Contributor III
  • 3675 Views
  • 6 replies
  • 0 kudos

Resolved! How to get the schema and catalog name in the SQL warehouse query history API

Hi, we are using the SQL query history API. When we select the catalog and schema name directly in the SQL editor instead of passing them through the query, we do not get the schema name and catalog name in the query text for that particular ID. So, how can we get the s...

Latest Reply
mtajmouati
Contributor
  • 0 kudos

True! Try this:
import requests
import json

# Define your Databricks workspace URL and API token
databricks_instance = "https://<your-databricks-instance>"
api_token = "dapi<your-api-token>"

# Fetch SQL query history
def get_query_history(): ...

Anonymous
by Not applicable
  • 33479 Views
  • 7 replies
  • 0 kudos

Resolved! Tuning shuffle partitions

Is the best practice for tuning shuffle partitions to have the config "autoOptimizeShuffle.enabled" on? I see it is not switched on by default. Why is that?

Latest Reply
mtajmouati
Contributor
  • 0 kudos

AQE applies to all queries that are:
- Non-streaming
- Contain at least one exchange (usually when there’s a join, aggregate, or window), one sub-query, or both.

Not all AQE-applied queries are necessarily re-optimized. The re-optimization might or might no...
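When AQE does not apply (for example, on streaming queries), `spark.sql.shuffle.partitions` still has to be chosen by hand. A rough sizing heuristic, sketched in Python under the assumption of about 128 MiB per shuffle partition, rounded up to a multiple of the cluster's total cores; both are common rules of thumb, not Databricks defaults, and `suggest_shuffle_partitions` is a hypothetical helper.

```python
import math


def suggest_shuffle_partitions(shuffle_bytes: int, total_cores: int,
                               target_partition_bytes: int = 128 * 1024 ** 2) -> int:
    """Rule-of-thumb value for spark.sql.shuffle.partitions.

    Aims for roughly target_partition_bytes per partition, then rounds up to
    a multiple of total_cores so every core gets work in each wave.
    """
    if shuffle_bytes <= 0 or total_cores <= 0:
        raise ValueError("shuffle_bytes and total_cores must be positive")
    by_size = math.ceil(shuffle_bytes / target_partition_bytes)
    waves = math.ceil(by_size / total_cores)
    return max(1, waves) * total_cores


# Example: ~10 GiB of shuffle data on a cluster with 16 cores in total.
print(suggest_shuffle_partitions(10 * 1024 ** 3, 16))  # 80
```

The result would then be applied with `spark.conf.set("spark.sql.shuffle.partitions", str(n))` before the shuffle-heavy query runs.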

ayush25091995
by New Contributor III
  • 1011 Views
  • 1 replies
  • 0 kudos

How to pass page_token while calling the API to get query history in SQL warehouse

Hi, each query ID gets duplicated on the next page when I call the query history API for a SQL warehouse, even though the page token is different for each page. How should we pass the page token? Since in the Databricks docs, it is mentioned w...

Latest Reply
ayush25091995
New Contributor III
  • 0 kudos

Any help on this, please?
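A paging loop that follows `next_page_token` and drops any query ID it has already seen is sketched below. It assumes the `GET /api/2.0/sql/history/queries` endpoint with `max_results` and `page_token` parameters and `res`, `next_page_token`, and `has_next_page` response fields; these are worth checking against the API reference for your workspace. `fetch_page` is a hypothetical callable that performs the authenticated HTTP request. The token goes through `urlencode` because a page token can contain characters such as `+` that must be escaped in a URL, and an unescaped token is one thing worth ruling out when pages repeat.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

HISTORY_PATH = "/api/2.0/sql/history/queries"


def history_url(workspace_url: str, page_token: Optional[str] = None,
                max_results: int = 100) -> str:
    """Build the list-queries URL; urlencode escapes '+', '/', '=' in the token."""
    params = {"max_results": max_results}
    if page_token:
        params["page_token"] = page_token
    return workspace_url.rstrip("/") + HISTORY_PATH + "?" + urlencode(params)


def iter_query_history(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield each query once, following next_page_token until the last page.

    fetch_page(token) is a hypothetical callable that GETs history_url(..., token)
    with your auth header and returns the decoded JSON body.
    """
    seen = set()
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        for query in page.get("res", []):
            qid = query.get("query_id")
            if qid in seen:
                continue  # already yielded from an earlier page
            seen.add(qid)
            yield query
        token = page.get("next_page_token")
        if not page.get("has_next_page") or not token:
            break  # stop when the API reports no further pages
```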

oakhill
by New Contributor III
  • 1783 Views
  • 4 replies
  • 0 kudos

Optimal process for loading data where the full dataset is provided every day?

We receive several datasets where the full dump is delivered daily or weekly. What is the best way to ingest this into Databricks using DLT or basic PySpark while adhering to the medallion architecture? 1. If we use Auto Loader into Bronze, we'd end up with increme...

  • 1783 Views
  • 4 replies
  • 0 kudos
Latest Reply
dbrx_user
New Contributor III
  • 0 kudos

Agree with @Witold to apply CDC as early as possible. Depending on where the initial files get deposited, I'd recommend having an initial raw layer to your medallion which is just your cloud storage account - so each day or week the files get deposit...
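Applying CDC to full dumps means deriving the change set by comparing consecutive snapshots. A minimal in-memory sketch of that comparison in plain Python, assuming every row carries a stable business key (the `id` column used in the check is hypothetical); `diff_snapshots` is a hypothetical helper, and at scale the same classification would typically be a full outer join on the key in PySpark rather than Python dicts.

```python
from typing import Dict, List

Row = Dict[str, object]


def diff_snapshots(previous: List[Row], current: List[Row], key: str) -> Dict[str, List[Row]]:
    """Compare two full dumps on a business key and classify the changes.

    inserts: key only in current; updates: key in both but values differ;
    deletes: key only in previous.
    """
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}
    inserts = [r for k, r in curr_by_key.items() if k not in prev_by_key]
    updates = [r for k, r in curr_by_key.items()
               if k in prev_by_key and prev_by_key[k] != r]
    deletes = [r for k, r in prev_by_key.items() if k not in curr_by_key]
    return {"inserts": inserts, "updates": updates, "deletes": deletes}
```

Only the resulting change set would then be merged downstream, instead of reprocessing every full dump.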

Spencer_Kent
by New Contributor III
  • 21075 Views
  • 10 replies
  • 6 kudos

Shared cluster configuration that permits `dbutils.fs` commands

My workspace has a couple of different types of clusters, and I'm having issues using the `dbutils` filesystem utilities when connected to a shared cluster. I'm hoping you can help me fix the configuration of the shared cluster so that I can actually us...

Latest Reply
jacovangelder
Honored Contributor
  • 6 kudos

Can you not use a No Isolation Shared cluster with table access control enabled at the workspace level?
