Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

maikelos272
by New Contributor II
  • 7636 Views
  • 5 replies
  • 1 kudos

Cannot create storage credential without Contributor role

Hello, I am trying to create a storage credential. I have created the access connector and given the managed identity the "Storage Blob Data Owner" role. However, when I try to create a storage credential I get the following error: Creating a storage...

Latest Reply
subhash_1692
New Contributor II
  • 1 kudos

Did someone find a solution?
{ "error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Refresh token not found for userId: Some(2302042022180399)", "details": [ { "@type": "type.googleapis.com/google.rpc.RequestInfo", "request_id": "d731471b-b...

4 More Replies
tbao
by New Contributor
  • 1028 Views
  • 1 reply
  • 0 kudos

Scala notebooks don't automatically print variables

It seems like with a Scala notebook, if I declare some variables or have import statements, then when the cell runs it automatically prints out the variables and import statements. Is there a way to disable this, so that only explicit println output is shown?

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

There isn't a configuration that can be set to True/False to control this behavior for some statements. This output is part of Databricks' interactive notebook design, where all evaluated statements—such as imports, variable declarations, and expres...

noimeta
by Contributor III
  • 7095 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks SQL: catalog of each query

Currently, we are migrating from the hive metastore to UC. We have several dashboards and a huge number of queries whose catalogs have been set to hive_metastore, using the <db>.<table> access pattern. I'm just wondering if there's a way to switch catalogs...

Latest Reply
h_h_ak
Contributor
  • 4 kudos

Maybe you can also have a look here if you need a hot fix: https://github.com/databrickslabs/ucx

6 More Replies
rkand
by New Contributor
  • 1363 Views
  • 2 replies
  • 0 kudos

Glob pattern for copy into

I am trying to load some files in my Azure storage container using the COPY INTO method. The files have a naming convention of "2023-<month>-<date> <timestamp>.csv.gz". All the files are in one folder. I want to load only files for month 2. So I've used...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

TL;DR Try removing the trailing slash in the FROM value. The trailing slash in FROM confuses the URI parser, making it think that PATTERN might be an absolute path rather than a relative one. The error message points to a problem not with respect to ...
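For illustration, the month-2 selection can be sanity-checked with Python's `fnmatch`, which follows glob semantics similar to COPY INTO's PATTERN clause. The file names below are hypothetical examples of the naming convention from the question, assuming a zero-padded month:

```python
from fnmatch import fnmatch

# Hypothetical file names following the "2023-<month>-<date> <timestamp>.csv.gz"
# convention described in the question.
files = [
    "2023-02-01 120000.csv.gz",
    "2023-02-28 235959.csv.gz",
    "2023-03-15 083000.csv.gz",
]

# Glob intended to select only month-2 files.
month_2 = [f for f in files if fnmatch(f, "2023-02-*.csv.gz")]
print(month_2)  # → ['2023-02-01 120000.csv.gz', '2023-02-28 235959.csv.gz']
```

The same glob would go in the PATTERN clause, with FROM pointing at the folder without a trailing slash.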

1 More Replies
Ameshj
by New Contributor III
  • 23098 Views
  • 12 replies
  • 2 kudos

Resolved! Dbfs init script migration

I need help with migrating from DBFS on Databricks to workspace files. I am new to Databricks and am struggling with what is on the links provided. My workspace.yml also has dbfs hard-coded. Included is a full deployment with Great Expectations. This was don...

Data Engineering
Azure Databricks
dbfs
Great expectations
python
Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Glad it worked and helped you.

11 More Replies
Data_Engineer3
by Contributor III
  • 4964 Views
  • 5 replies
  • 0 kudos

Default maximum spark streaming chunk size in delta files in each batch?

Working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Docs: https://docs.databricks.com/en/structured-streaming/delta-lake.html
Also, what is the challenge while using foreachBatch?
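For context, the batch size of a Delta streaming source is governed by source options rather than a fixed chunk size. A minimal sketch (option names per the Delta streaming docs; the values and path in the comment are illustrative):

```python
# Rate-limiting options for a Delta table used as a streaming source.
# When maxFilesPerTrigger is unset, the documented default is 1000 files
# per micro-batch; maxBytesPerTrigger adds a soft byte cap per batch.
delta_source_options = {
    "maxFilesPerTrigger": "1000",
    "maxBytesPerTrigger": "1g",
}

# Applied in a notebook roughly as (requires a SparkSession, shown for context):
# df = (spark.readStream.format("delta")
#         .options(**delta_source_options)
#         .load("/delta/events"))
```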

4 More Replies
ShaliniC
by New Contributor II
  • 1259 Views
  • 4 replies
  • 1 kudos

Workflow fails when run using a job cluster but not a shared cluster

Hi, We have a workflow which calls 3 notebooks. When we run this workflow using a shared cluster it runs fine, but when run with a job cluster, one of the notebooks fails. This notebook uses the SQL function lpad, and it looks like it errors because of it. Has ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Are the notebooks executing sequentially or in parallel in this workflow?

3 More Replies
Kguy
by New Contributor II
  • 2328 Views
  • 5 replies
  • 0 kudos

Delta live type 2 scd Liquid clustering on Start and end dates

I've created a DLT pipeline that creates type 2 SCDs, and often the __START_AT and __END_AT columns are beyond the first 32 columns used for statistics collection. I'd like to add these columns to liquid clustering without increasing the number of columns in the ...

Latest Reply
Kguy
New Contributor II
  • 0 kudos

Are these responses generated by chatgpt? They don't answer the question and very much have the tone of generative AI

4 More Replies
MadhuraC
by New Contributor II
  • 1083 Views
  • 2 replies
  • 0 kudos

Error connecting to MySQL from Databricks: (2003, "Can't connect to MySQL server")

Hello Community, I'm facing an issue connecting to a MySQL database hosted on AWS RDS from within a Databricks notebook. My Python script to connect to MySQL works fine locally, but when I run it in Databricks, I receive this error: Error connecting ...

Latest Reply
MadhuraC
New Contributor II
  • 0 kudos

It is Databricks in AWS.
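A quick way to confirm this is a network problem (a 2003 error usually means the cluster cannot reach the RDS endpoint at all, e.g. security groups or VPC routing, rather than bad credentials) is a plain TCP check from a notebook cell. A sketch, with a hypothetical host name:

```python
import socket

def can_reach(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from a Databricks cell against the (hypothetical) RDS endpoint:
# can_reach("mydb.xxxxxxxx.us-east-1.rds.amazonaws.com")
# If this returns False on the cluster while the same call succeeds locally,
# look at the RDS security group and VPC peering, not the MySQL driver.
```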

1 More Replies
RangaSarangan
by New Contributor II
  • 3521 Views
  • 2 replies
  • 3 kudos

Resolved! Asset Bundles pause_status Across Different Environments

Hi, this is probably a question about best practices, but I'm curious if someone else has dealt with a similar situation. I have 2 Databricks workspaces - one for Dev and one for Prod. It had to be two workspaces because the Azure Landing Zones had to be air gapped from e...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 3 kudos

Hi @RangaSarangan, We have faced the same issue and solved it using the Databricks Workflows API and a JSON file of job metadata that contains each job and its respective status for each environment. You can create an Azure DevOps pipeline that runs after your CI/CD pipeline and changes the...
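As a sketch of that approach: the payload shape below follows the Jobs 2.1 `jobs/update` endpoint, but the metadata layout and names are hypothetical, and the API may require the full schedule object (cron expression and timezone) when updating it, so treat this as illustrative:

```python
def build_pause_update(job_id: int, paused: bool) -> dict:
    """Build a Jobs API 2.1 jobs/update payload toggling schedule pause_status."""
    return {
        "job_id": job_id,
        "new_settings": {
            "schedule": {"pause_status": "PAUSED" if paused else "UNPAUSED"},
        },
    }

# Hypothetical per-environment job metadata, as described in the reply:
job_metadata = {
    "dev":  {"job_id": 101, "paused": True},
    "prod": {"job_id": 202, "paused": False},
}

payloads = [build_pause_update(m["job_id"], m["paused"])
            for m in job_metadata.values()]
# A post-CI/CD step would then POST each payload to
# https://<workspace-host>/api/2.1/jobs/update with a bearer token.
```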

1 More Replies
eriodega
by Contributor
  • 808 Views
  • 1 reply
  • 0 kudos

CREATED WIDGET - SQL syntax - how do I specify a label?

What is the syntax in SQL for creating a widget in a notebook with a label?This documentation says "The last argument is label, an optional value for the label shown over the widget text box or dropdown."The one example provided on that page doesn't ...

Latest Reply
nefflev
New Contributor II
  • 0 kudos

Hi @eriodega, I do not know how it works with SQL, but a possibility is to use a Python cell in your SQL notebook and create it like this:
%python
dbutils.widgets.text("some_name", "a great default", "some label/description")
All the best!

ehpogue
by New Contributor III
  • 12002 Views
  • 3 replies
  • 1 kudos

Schedule a Notebook Dashboard

Hey all, I have a workflow that updates a Delta table and then runs a notebook that generates a dashboard. I was hoping that by adding this second step the dashboard would get updated to show the most current data, instead of the user needing to...

Latest Reply
trevormccormick
New Contributor III
  • 1 kudos

@ehpogue at the end of the day I just used chatgpt to rewrite a bunch of python code into SQL and mash together all of the temporary views into one giant query. hacky but it did work

2 More Replies
Mani2105
by New Contributor II
  • 1174 Views
  • 1 reply
  • 0 kudos

Managed Table

Hi Experts, I have a workspace created with a metastore associated with it; the metastore points to a storage location, USDATA. I then created two catalogs in the workspace: one uses the default metastore storage as the external storage location, and the other...

Latest Reply
agallard
Contributor
  • 0 kudos

Hi @Mani2105, "if I create a table in the sales catalog without specifying any external location, will the tables created be managed, and will they go to the Sales storage account?" Yes, if you create a table in the sales catalog without specifying any exter...

SenthilJ
by New Contributor III
  • 3769 Views
  • 2 replies
  • 1 kudos

Resolved! Unity Catalog Metastore Details

Hi, I would like responses to the following questions regarding the Unity Catalog metastore's path. While configuring a metastore, designating a metastore storage account (in the case of Azure, ADLS Gen2) seems to be optional. In case I confi...

Data Engineering
Unity Catalog
Latest Reply
PL_db
Databricks Employee
  • 1 kudos

The storage container you configure for the metastore will contain the files of managed tables and volumes. The metadata is stored in a database of the Databricks control plane.

1 More Replies
PassionateDBD
by New Contributor II
  • 6137 Views
  • 1 reply
  • 0 kudos

DLT full refresh

Running a task with full refresh in Delta Live Tables removes existing data and reloads it from scratch. We are ingesting data from an Event Hub topic and from files. The Event Hub topic stores messages for seven days after arrival. If we were to run a...

Latest Reply
JesseS
New Contributor II
  • 0 kudos

I know it's a bit after the fact, but in case you didn't solve it, I came across this article in the Databricks documentation. You can set pipelines.reset.allowed to false on a table to prevent a full refresh of that table. Ref: https://docs.databrick...
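The semantics can be illustrated with a small helper (illustrative only, not a Databricks API; DLT reads the table property as a string and defaults to allowing resets):

```python
def reset_allowed(table_properties: dict) -> bool:
    """True if a DLT full refresh would reset a table with these properties."""
    value = str(table_properties.get("pipelines.reset.allowed", "true"))
    return value.strip().lower() != "false"

# A streaming table ingesting a 7-day Event Hubs topic would set:
protected = {"pipelines.reset.allowed": "false"}
print(reset_allowed(protected))  # → False
```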

