Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

adhi_databricks (Contributor)
  • 833 Views • 1 reply • 0 kudos

DATABRICKS CLEANROOMS

Hi Team, I have a few questions regarding Databricks Cleanrooms: For onboarding first-party data, does the collaborator need a Databricks account with a UC-enabled workspace? How is it useful for activating data for retargeting or prospecting use cases...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

For onboarding first-party data, the collaborator does need a Databricks account with an enabled Unity Catalog (UC) workspace. This is necessary to map system tables into its metastore and to observe non-UC governed assets. Activating data for retarg...

sanket-kelkar (New Contributor II)
  • 1205 Views • 1 reply • 0 kudos

Auto OPTIMIZE causing a data discrepancy

I have a Delta table in Azure Databricks that gets MERGEd every 10 minutes. In the attached screenshot, in the version history of this table, I see a MERGE operation every 10 minutes, which is expected. Along with that, I see the OPTIMIZE operation aft...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

Can you please provide more context, specifically the DBR release and whether this scenario is reproducible? Are there any metric or plan differences between the two SELECT statements, one run while the OPTIMIZE was in progress and one after? Th...

AcrobaticMonkey (New Contributor II)
  • 1124 Views • 1 reply • 0 kudos

Cannot Get Query Results in SQL Alerts

Example query: SELECT name, date FROM errors; Now I want to trigger an alert if the count is greater than 1, and a notification should be sent to Slack with the output rows (name and date values). Even if I use {{QUERY_RESULT_ROWS}}, it only gives value after ...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

Note: I have not tried this myself, but can you try the following and let me know if it helps: Create the query, i.e. SELECT name, date FROM errors; Set up the alert, with the condition to trigger when the count of rows is greater than 1. Create ...
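One way to sketch the suggestion above (hedged: unverified; the window-function trick is an assumption, while the {{QUERY_RESULT_ROWS}} placeholder comes from the question itself):

```sql
-- Hypothetical alert query: returns the error rows plus a count column
-- that the alert condition can be evaluated against.
SELECT name, date, COUNT(*) OVER () AS error_count
FROM errors;
```

Set the alert condition on error_count > 1, then reference {{QUERY_RESULT_ROWS}} in a custom notification template so the Slack message includes the name and date values.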

jonathanjone (New Contributor)
  • 796 Views • 1 reply • 0 kudos

Facing Some Issues with Tablet PC and Databricks Product – Any Advice?

Hello everyone, I'm having some trouble using Databricks SQL Analytics v2.1 on my tablet PC, and I was wondering if anyone here has had similar experiences or could offer some advice. The main issues I'm facing are: Performance slowdowns: When I run com...

Latest Reply: NandiniN (Databricks Employee) • 0 kudos

Hi @jonathanjone, 1 - Performance slowdowns could be due to the warehouse size and the query count: a warehouse has a limit of 10 queries running in parallel, beyond which you will see queries being queued. You could also check if the q...

guangyi (Contributor III)
  • 1114 Views • 1 reply • 1 kudo

Resolved! Has the numUpdateRetryAttempts property been deprecated?

I noticed there is a numUpdateRetryAttempts property mentioned in the document https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/properties used for configuring the retry count of any DLT pipeline, but I cannot find it in the DL...

Latest Reply: VZLA (Databricks Employee) • 1 kudo

According to the Delta Live Tables properties reference, pipelines.numUpdateRetryAttempts is a recognized configuration parameter. It specifies the maximum number of attempts to retry an update before failing the update when a retryable failure occur...
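As a minimal sketch, the property from the reply above goes under the pipeline's configuration map (the pipeline name and retry value here are illustrative):

```json
{
  "name": "my-dlt-pipeline",
  "configuration": {
    "pipelines.numUpdateRetryAttempts": "5"
  }
}
```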

Viswanth (New Contributor II)
  • 2351 Views • 3 replies • 0 kudos

Implementing Conditional Logic for Dependent Tasks Using SQL Output and Task Values

Hi team, I'm working on setting up a workflow with task dependencies where a subsequent task should execute conditionally, based on the result of a preceding SQL task. Specifically, I need to evaluate an if/else condition on the output of the SQL quer...

Latest Reply: Ramana (Valued Contributor) • 0 kudos

This feature is in Private Preview.

2 More Replies
emiliec (New Contributor)
  • 1010 Views • 1 reply • 0 kudos

QGIS python command in Databricks notebook

Hello, I would like to run a QGIS Python script in a Databricks notebook. Currently, Databricks doesn't recognize the qgis package. For example, I'd like to run this small example: from qgis.core import * # Supply path to qgis install location QgsAppl...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

QGIS is not directly usable in Scala or Spark environments, as it is a standalone Geographic Information System (GIS) application; installing and using it directly within Databricks may not be straightforward due to the specific environment and depend...

mdsultan (New Contributor II)
  • 1472 Views • 4 replies • 0 kudos

MetaStore Issues

Hi, I am using a Student Account in Azure and created a Databricks workspace. I am trying to locate Manage Account to create a metastore, but I am not successful. Would need your help; no Manage Account option is available. If you see, I am an Admin. Thanks for a...

[Attached screenshots: mdsultan_0-1730365288842.png, mdsultan_1-1730365458932.png]
Latest Reply: NandiniN (Databricks Employee) • 0 kudos

A Databricks student account may have certain limitations compared to a full Azure Databricks account. For student accounts, you might not have the necessary permissions to create a Unity Catalog metastore. Typically, creating and managing a metastore...

3 More Replies
maikelos272 (New Contributor II)
  • 7921 Views • 5 replies • 1 kudo

Cannot create storage credential without Contributor role

Hello, I am trying to create a storage credential. I have created the access connector and gave the managed identity "Storage Blob Data Owner" permissions. However, when I want to create a storage credential, I get the following error: Creating a storage...

Latest Reply: subhash_1692 (New Contributor II) • 1 kudo

Did someone find a solution? { "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Refresh token not found for userId: Some(2302042022180399)", "details": [ { "@type": "type.googleapis.com/google.rpc.RequestInfo", "request_id": "d731471b-b...

4 More Replies
tbao (New Contributor)
  • 1140 Views • 1 reply • 0 kudos

Scala notebooks don't automatically print variables

It seems like with a Scala notebook, if I declare some variables or have import statements, then when the cell runs it will automatically print out the variables and import statements. Is there a way to disable this, so that only explicit println calls produce output?

Latest Reply: VZLA (Databricks Employee) • 0 kudos

There isn't a configuration that can be set to True/False to control this behavior for such statements. This output is part of Databricks' interactive notebook design, where all evaluated statements, such as imports, variable declarations, and expres...

noimeta (Contributor III)
  • 7317 Views • 7 replies • 4 kudos

Resolved! Databricks SQL: catalog of each query

Currently, we are migrating from the Hive metastore to UC. We have several dashboards and a huge number of queries whose catalogs have been set to hive_metastore, using the <db>.<table> access pattern. I'm just wondering if there's a way to switch catalogs...

Latest Reply: h_h_ak (Contributor) • 4 kudos

Maybe you can also have a look here if you need a hot fix: https://github.com/databrickslabs/ucx

6 More Replies
rkand (New Contributor)
  • 1457 Views • 2 replies • 0 kudos

Glob pattern for copy into

I am trying to load some files from my Azure storage container using the COPY INTO method. The files have a naming convention of "2023-<month>-<date> <timestamp>.csv.gz". All the files are in one folder. I want to load only the files for month 2, so I've used...

Latest Reply: VZLA (Databricks Employee) • 0 kudos

TL;DR Try removing the trailing slash in the FROM value. The trailing slash in FROM confuses the URI parser, making it think that PATTERN might be an absolute path rather than a relative one. The error message points to a problem not with respect to ...
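The glob itself can be sanity-checked locally before running COPY INTO; the file names below are illustrative, following the naming convention from the question (COPY INTO's PATTERN accepts similar glob syntax):

```python
from fnmatch import fnmatch

# Illustrative names following "2023-<month>-<date> <timestamp>.csv.gz"
files = [
    "2023-02-01 120000.csv.gz",
    "2023-02-15 093000.csv.gz",
    "2023-03-01 120000.csv.gz",
]

# Matches only month 2; '*' also covers the space and timestamp
pattern = "2023-02-*.csv.gz"
matched = [f for f in files if fnmatch(f, pattern)]
```

Here `matched` keeps only the two February files, confirming the month-2 glob before it is handed to COPY INTO's PATTERN clause.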

1 More Reply
Ameshj (New Contributor III)
  • 24230 Views • 12 replies • 2 kudos

Resolved! Dbfs init script migration

I need help with migrating from DBFS on Databricks to workspace files. I am new to Databricks and am struggling with what is on the links provided. My workspace.yml also has dbfs hard-coded. Included is a full deployment with Great Expectations. This was don...

Labels: Data Engineering, Azure Databricks, dbfs, Great expectations, python
Latest Reply: NandiniN (Databricks Employee) • 2 kudos

Glad it worked and helped you.

11 More Replies
Data_Engineer3 (Contributor III)
  • 5139 Views • 5 replies • 0 kudos

Default maximum spark streaming chunk size in delta files in each batch?

When working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

Latest Reply: NandiniN (Databricks Employee) • 0 kudos

See the doc: https://docs.databricks.com/en/structured-streaming/delta-lake.html
Also, what is the challenge while using foreachBatch?
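For context, the per-batch size of a Delta streaming source is governed by the rate-limit options on the stream reader; a minimal sketch, assuming a Databricks notebook where `spark` is already defined and with an illustrative table path:

```python
stream = (
    spark.readStream
    .format("delta")
    # Delta sources default to at most 1000 new files per micro-batch
    .option("maxFilesPerTrigger", 1000)
    # Optional soft cap on bytes processed per micro-batch
    .option("maxBytesPerTrigger", "1g")
    .load("/path/to/delta/table")
)
```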

4 More Replies
ShaliniC (New Contributor II)
  • 1396 Views • 4 replies • 1 kudo

Workflow fails when run using a job cluster but not a shared cluster

Hi, We have a workflow which calls 3 notebooks. When we run this workflow using a shared cluster it runs fine, but when run with a job cluster, one of the notebooks fails. This notebook uses the SQL function lpad, and it looks like it errors because of it. Has ...

Latest Reply: saurabh18cs (Honored Contributor II) • 1 kudo

Are the notebooks executing sequentially or in parallel in this workflow?

3 More Replies
