Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dikla
by New Contributor II
  • 350 Views
  • 4 replies
  • 1 kudos

Resolved! Issues Creating Genie Space via API: Join Specs Are Not Persisted

Hi, I’m experimenting with the new API to create a Genie Space. I’m able to successfully create the space, but the join definitions are not created, even though I’m passing a join_specs object in the same format returned by GET /spaces/{id} for an exis...

Latest Reply
mtaran
Databricks Employee
  • 1 kudos

The serialized space JSON is incorrect. It has `join_specs` and `sql_snippets` nested under `data_sources`, but they should be nested under `instructions` instead. There they apply as expected.
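A minimal sketch of the corrected nesting, assuming only the two keys named above are authoritative; the surrounding fields are illustrative placeholders rather than the exact API schema:

# Hypothetical shape of the serialized space JSON, expressed as a Python dict.
# Assumption: only the join_specs / sql_snippets placement comes from the reply;
# everything else is a placeholder.
serialized_space = {
    "data_sources": {
        # table references only - no join_specs or sql_snippets here
    },
    "instructions": {
        "join_specs": [
            # same join objects returned by GET /spaces/{id}
        ],
        "sql_snippets": [
            # same snippet objects returned by GET /spaces/{id}
        ],
    },
}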

3 More Replies
kenny_hero
by New Contributor II
  • 167 Views
  • 5 replies
  • 1 kudos

How do I import a Python module when deploying with DAB?

Below is what the folder structure of my project looks like:
resources/
|- etl_event/
|- etl_event.job.yml
src/
|- pipeline/
|- etl_event/
|- transformers/
|- transformer_1.py
|- utils/
|- logger.py
databricks.ym...

Latest Reply
kenny_hero
New Contributor II
  • 1 kudos

@Hubert-Dudek, thank you for your response. I really appreciate it. However, I still cannot get the import to work even after following your instructions. Here is the folder structure: The transformer code is below: from pyspark import pipelines as dp f...

4 More Replies
Maxrb
by New Contributor II
  • 165 Views
  • 1 reply
  • 1 kudos

Import functions in Databricks Asset Bundles using source: WORKSPACE

Hi, We are using Databricks Asset Bundles, and we create functions which we import in notebooks, for instance: from utils import helpers, where utils is just a folder in our root. When running this with source: WORKSPACE, it will fail to resolve the impo...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

In Git folders, the repo root is auto-added to the Python path, so imports like from utils import helpers work, while in workspace folders, only the notebook’s directory is on the path, which is why it breaks. The quick fix is a tiny bootstrap that a...
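A minimal sketch of such a bootstrap, assuming the notebook's working directory is its workspace folder and that utils sits one level above the notebook (adjust the relative path for your layout):

import os
import sys

# Hypothetical bootstrap cell: add the project root to sys.path so that
# `from utils import helpers` resolves in workspace folders as well as Git folders.
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

from utils import helpers  # the import from the question, now resolved against the project root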

ramsai
by New Contributor
  • 127 Views
  • 3 replies
  • 3 kudos

Resolved! Serverless Compute Access Restriction Not Supported at User Level

The requirement is to disable serverless compute access for specific users while allowing them to use only their assigned clusters, without restricting serverless compute at the workspace level. After reviewing the available configuration options, th...

Latest Reply
Masood_Joukar
New Contributor II
  • 3 kudos

Hi @ramsai, how about a workaround? Set budget policies at the account level: Attribute usage with serverless budget policies | Databricks on AWS

2 More Replies
RyanHager
by Contributor
  • 331 Views
  • 2 replies
  • 2 kudos

Resolved! Liquid Clustering and S3 Performance

Are there any performance concerns when using liquid clustering with AWS S3? I believe all the Parquet files go in the same folder (prefix, in AWS S3 terms) versus folders per partition when using "partition by". And there is this note on S3 performa...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Even though liquid clustering removes Hive-style partition folders, it typically doesn’t cause S3 prefix performance issues on Databricks. Delta tables don’t rely on directory listing for reads; they use the transaction log to locate exact files. In ...

1 More Replies
EdemSeitkh
by New Contributor III
  • 8830 Views
  • 6 replies
  • 0 kudos

Resolved! Pass catalog/schema/table name as a parameter to sql task

Hi, I am trying to pass the catalog name as a parameter into a query for a SQL task, and it pastes it with single quotes, which results in an error. Is there a way to pass the raw value, or are there other possible workarounds? Query: INSERT INTO {{ catalog }}.pas.product_snap...

Latest Reply
detom
New Contributor
  • 0 kudos

This works:
USE CATALOG IDENTIFIER({{ catalog_name }});
USE SCHEMA IDENTIFIER({{ schema_name }});

5 More Replies
Gilad-Shai
by New Contributor
  • 239 Views
  • 12 replies
  • 11 kudos

Creating Serverless Cluster

Hi everyone, I am trying to create a cluster in Databricks Free Edition, but I keep getting the following error: "Cannot create serverless cluster, please try again later." I have attempted this on different days and at different times, but the issue pe...

Latest Reply
Gilad-Shai
New Contributor
  • 11 kudos

Thank you all (@Sanjeeb2024, @JAHNAVI, @Manoj12421), it works! It was not a Databricks Free Edition, as @Masood_Joukar said.

11 More Replies
Sainath368
by Contributor
  • 331 Views
  • 4 replies
  • 2 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone, We are currently migrating from a directory listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in structured streaming. We have a function that handles structured streaming where we ar...

Latest Reply
Raman_Unifeye
Contributor III
  • 2 kudos

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running - 1 stream per table, 1 unique directory per stream and 1 uni...
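A minimal sketch of one such stream, using the option name quoted above; paths, source format, and table names are placeholders:

# Hypothetical per-table stream: one source directory, one checkpoint location,
# one target table, with managed file events enabled.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                # placeholder source format
    .option("cloudFiles.useManagedFileEvents", "true")  # option referenced in the reply
    .load("s3://my-bucket/landing/table_a/")            # unique directory per stream
)

(
    df.writeStream
    .option("checkpointLocation", "s3://my-bucket/checkpoints/table_a/")
    .trigger(availableNow=True)
    .toTable("main.bronze.table_a")                     # one stream per target table
)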

3 More Replies
halsgbs
by New Contributor
  • 99 Views
  • 3 replies
  • 2 kudos

Alerts V2 Parameters

Hi, I'm working on using the Databricks Python SDK to create an alert from a notebook, but it seems that with V1 there is no way to add subscribers and with V2 there is no option for adding parameters. Is my understanding correct, or am I missing something? A...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Alerts V2 (Public Preview) do not support query parameters yet. This is a documented limitation. Legacy alerts (V1) do support parameters and will use the default values defined in the SQL editor. For notifications, both legacy alerts and Alerts V2 a...

2 More Replies
lziolkow2
by New Contributor
  • 124 Views
  • 4 replies
  • 2 kudos

Resolved! Strange DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE error

I use the Databricks 17.3 runtime. I am trying to run the following code:
CREATE OR REPLACE TABLE default.target_table (key1 INT, key2 INT, key3 INT, val STRING) USING DELTA;
INSERT INTO target_table(key1, key2, key3, val) VALUES(1, 1, 1, 'a');
CREATE OR REPLACE TABLE de...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, you need to put all of the keys in the ON part of the clause rather than in the WHERE condition. This code works: MERGE INTO target_table AS target USING source_table AS source ON target.key1 = source.key1 AND target.key2 = source.key2 AND target...
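A minimal sketch of the full statement, assuming the second table from the question is default.source_table with the same schema and that val is the only non-key column to sync:

# Hypothetical completion of the corrected MERGE: all three keys in the ON clause,
# no key columns filtered in the WHEN conditions.
spark.sql("""
    MERGE INTO default.target_table AS target
    USING default.source_table AS source
      ON  target.key1 = source.key1
      AND target.key2 = source.key2
      AND target.key3 = source.key3
    WHEN MATCHED THEN UPDATE SET target.val = source.val
    WHEN NOT MATCHED THEN INSERT (key1, key2, key3, val)
      VALUES (source.key1, source.key2, source.key3, source.val)
""")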

3 More Replies
erigaud
by Honored Contributor
  • 3939 Views
  • 11 replies
  • 9 kudos

Databricks asset bundles and Dashboards - pass parameters depending on bundle target

Hello everyone! Since Databricks Asset Bundles can now be used to deploy dashboards, I'm wondering how to pass parameters so that the queries for the dev dashboard query the dev catalog, the dashboard in stg queries the stg catalog, etc. Is there any...

Latest Reply
protmaks
New Contributor III
  • 9 kudos

In the new version, v0.281.0, catalog and schema parameterization for Databricks Dashboards finally works. I tested it and wrote up examples: https://medium.com/@protmaks/dynamic-catalog-schema-in-databricks-dashboards-b7eea62270c6

10 More Replies
ganesh_raskar
by New Contributor II
  • 159 Views
  • 5 replies
  • 0 kudos

Installing Custom Packages on Serverless Compute via Databricks Connect

I have a custom Python package that provides a PySpark DataSource implementation. I'm using Databricks Connect (16.4.10) and need to understand package installation options for serverless compute.
Works: Traditional Compute Cluster
Custom package pre-i...

Data Engineering
data-engineering
databricks-connect
Latest Reply
Sanjeeb2024
Contributor III
  • 0 kudos

Hi @ganesh_raskar - if you can share which custom package, the exact code, and the error, I can try to replicate it on my end and explore a suitable option.

4 More Replies
Anonymous
by Not applicable
  • 21284 Views
  • 9 replies
  • 17 kudos

Resolved! MetadataChangedException

A Delta Lake table is created with an identity column, and I'm not able to load the data in parallel from four processes. I'm getting the metadata exception error. I don't want to load the data into a temp table. I need to load directly and in parallel into the Delta...

Latest Reply
lprevost
Contributor III
  • 17 kudos

I'm also having the same problem. I'm using Auto Loader to load many files into a Delta table with an identity column. What used to work now dies with this error after running for a long time!

8 More Replies
siva_pusarla
by New Contributor II
  • 287 Views
  • 6 replies
  • 0 kudos

Workspace notebook path not recognized by dbutils.notebook.run() when running from a workflow/job

result = dbutils.notebook.run("/Workspace/YourFolder/NotebookA", timeout_seconds=600, arguments={"param1": "value1"})
print(result)
I was able to execute the above code manually from a notebook. But when I run the same notebook as a job, it fails stat...

Latest Reply
siva-anantha
Contributor
  • 0 kudos

@siva_pusarla: We use the following pattern and it works:
1) Calling notebook - constant location used by the job.
   + src/framework
      + notebook_executor.py
2) Callee notebooks - dynamic
   + src/app/notebooks
      ...
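A minimal sketch of the relative-path form this pattern relies on, assuming the layout above (the callee path is a placeholder):

# Hypothetical call from src/framework/notebook_executor.py: use a path relative to
# the calling notebook rather than the /Workspace filesystem prefix, so the same
# code resolves both interactively and when run as a job.
result = dbutils.notebook.run(
    "../app/notebooks/NotebookA",   # placeholder callee path, relative to the caller
    timeout_seconds=600,
    arguments={"param1": "value1"},
)
print(result)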

5 More Replies
