Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Shreyash_Gupta
by New Contributor
  • 11 Views
  • 1 reply
  • 0 kudos

How do Databricks notebooks differ from traditional Jupyter notebooks?

Can someone please explain the key differences between a Databricks notebook and a Jupyter notebook?

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The key differences between a Databricks notebook and a Jupyter notebook are as follows: Integration and Collaboration: Databricks Notebooks: These are integrated within the Databricks platform, providing a unified experience for data science and ma...

Harsha777
by New Contributor III
  • 77 Views
  • 5 replies
  • 1 kudos

Resolved! Does column masking work with job clusters?

Hi, we are trying to experiment with the column masking feature. Here is our use case: we have added a masking function to one of the columns of a table; the table is part of a notebook with some transformation logic; the notebook is executed as part of a w...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Hello, a shared cluster on a job behaves the same as an all-purpose cluster: the cluster is available to any user with permissions on it. In a job there will not be many actions to perform, but when an action you are r...

4 More Replies
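For readers new to the feature: a Unity Catalog column mask is a SQL UDF attached to a column, applied at query time depending on the caller's privileges. As a plain-Python sketch of the kind of logic such a mask typically encodes (the function name and masking rule here are illustrative, not the actual Databricks API):

```python
def mask_ssn(value: str, is_privileged: bool) -> str:
    """Return the raw value for privileged callers; otherwise mask
    everything except the last four characters."""
    if is_privileged:
        return value
    return "*" * (len(value) - 4) + value[-4:]

print(mask_ssn("123-45-6789", is_privileged=False))  # *******6789
print(mask_ssn("123-45-6789", is_privileged=True))   # 123-45-6789
```

In Databricks the privilege check would be expressed inside the SQL UDF (e.g. with is_account_group_member), not passed as an argument; the sketch only shows the masking behavior itself.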
MarkD
by New Contributor II
  • 3234 Views
  • 9 replies
  • 0 kudos

SET configuration in SQL DLT pipeline does not work

Hi, I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work:

SET startDate='2020-01-01';
CREATE OR REFRESH LIVE TABLE filtered AS
SELECT * FROM my_table WHERE created_at > ${startDate};

It is g...

Data Engineering
Delta Live Tables
dlt
sql
Latest Reply
smit_tw
New Contributor
  • 0 kudos

@anardinelli Can you please help with a solution? I am having an issue with setting a variable in a Delta Live Tables pipeline and using it with the APPLY CHANGES INTO syntax.

8 More Replies
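A commonly suggested alternative to SET in SQL DLT pipelines is to define the value as a pipeline configuration parameter in the pipeline settings and read it at runtime (in Python DLT code, via spark.conf.get). The sketch below shows only the substitution idea in plain Python, with a hypothetical dict standing in for the pipeline configuration:

```python
# Hypothetical stand-in for the pipeline's configuration settings.
pipeline_conf = {"startDate": "2020-01-01"}

def render_query(conf: dict, template: str) -> str:
    """Substitute ${key} placeholders in a SQL template with config values."""
    for key, value in conf.items():
        template = template.replace("${" + key + "}", value)
    return template

query = render_query(
    pipeline_conf,
    "SELECT * FROM my_table WHERE created_at > '${startDate}'",
)
print(query)  # SELECT * FROM my_table WHERE created_at > '2020-01-01'
```

In an actual pipeline the value would come from the pipeline's configuration block rather than a local dict; the helper name render_query is illustrative.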
jeremy98
by New Contributor II
  • 40 Views
  • 1 reply
  • 0 kudos

Resolved! Cloning data between two catalogs

Hello community, I was writing this piece of code to do the data migration between two catalogs:

# Read data and partitioning
print(f"Loading {table_name} from production catalog...")
prod_df_table_name = f"prod_catalog.`00_bro...

Latest Reply
jeremy98
New Contributor II
  • 0 kudos

FYI, I did it by increasing the size of the cluster to get more cores, and writing directly:

prod_df_table.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable(stg_df_table_name)

anantkharat
by Visitor
  • 34 Views
  • 0 replies
  • 0 kudos

Getting

payload = {"clusters": [{"num_workers": 4}], "pipeline_id": pipeline_id}
update_url = f"{workspace_url}/api/2.0/pipelines/{pipeline_id}"
response = requests.put(update_url, headers=headers, json=payload)

For this, I'm getting the below output with status cod...

jeremy98
by New Contributor II
  • 143 Views
  • 6 replies
  • 1 kudos

Resolved! Use the include property for a particular workspace using DABs

Hello community, is there a field in the YAML file used with DABs to specify files based on the workspace in use? For example, if I want to deploy notebooks and workflows for staging, they need to be a set of resources that differ from those in produ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Yes, you can specify different sets of resources for different environments (such as staging and production) in the YAML file used with Databricks Asset Bundles (DABs). This is achieved using the targets mapping in the databricks.yml file.https://doc...

5 More Replies
ashap551
by New Contributor II
  • 39 Views
  • 1 reply
  • 0 kudos

JDBC Connection to NetSuite SuiteAnalytics Using Token-Based-Authentication (TBA)

I'm trying to connect to NetSuite2.com using PySpark from a Databricks notebook, utilizing a JDBC driver. I was successful in setting up my DBVisualizer connection by installing the JDBC driver (JAR) and generating the password with the one-time hashin...

Latest Reply
alicerichard65
  • 0 kudos

It seems like the issue might be related to the password generation or the JDBC URL configuration. Here are a few things you can check: 1. Password Generation: Ensure that the generate_tba_password function is correctly implemented ...

AlexSantiago
by New Contributor II
  • 2194 Views
  • 9 replies
  • 2 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token; my code is below. Is it a Databricks bug?

pip install spotipy
from spotipy.oauth2 import SpotifyO...

Latest Reply
mcveyroosevelt
New Contributor
  • 2 kudos

I have another opinion too. The error occurs because raw input or interactive prompts are not supported in certain environments like Databricks. To resolve this, replace interactive authentication with a programmatic approach. For example, use Spotif...

8 More Replies
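As the reply notes, notebook environments that cannot service interactive prompts need a non-interactive OAuth flow (for example spotipy's SpotifyClientCredentials instead of the browser-based authorization flow). As a stdlib-only sketch of what the client-credentials token request contains (the credentials here are placeholders, and the request is only assembled, not sent):

```python
import base64
import urllib.parse

TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_token_request(client_id: str, client_secret: str):
    """Build headers and form body for Spotify's client-credentials grant."""
    creds = f"{client_id}:{client_secret}".encode()
    headers = {
        "Authorization": "Basic " + base64.b64encode(creds).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"})
    return headers, body

headers, body = build_token_request("my_client_id", "my_client_secret")
print(body)  # grant_type=client_credentials
```

POSTing that body to TOKEN_URL with those headers returns a bearer token without any interactive prompt; note the client-credentials grant only covers endpoints that do not require user consent.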
NhanNguyen
by Contributor II
  • 177 Views
  • 5 replies
  • 1 kudos

ConcurrentAppendException after enabling Liquid Clustering and row-level concurrency on a Delta table

Every time I run a parallel job it fails with this error: ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again. I did a lot of research and also created a liquid clustering table an...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

What do the other merge commands look like?

4 More Replies
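Beyond liquid clustering and row-level concurrency, a common mitigation for write conflicts is simply retrying the failed write with backoff, since the error message itself says to try the operation again. A minimal plain-Python retry helper (the exception class and delays are illustrative stand-ins; in a notebook you would catch Delta's ConcurrentAppendException around your merge):

```python
import time

class ConcurrentAppendError(Exception):
    """Stand-in for Delta's ConcurrentAppendException."""

def with_retries(operation, max_attempts=5, base_delay=1.0):
    """Run operation, retrying with exponential backoff on write conflicts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConcurrentAppendError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)

# Example: an operation that conflicts twice, then succeeds.
calls = {"n": 0}
def flaky_merge():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConcurrentAppendError()
    return "merged"

result = with_retries(flaky_merge, base_delay=0.01)
print(result)  # merged
```

Retries paper over the symptom; narrowing each writer's merge predicate to disjoint partitions or key ranges remains the proper fix when possible.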
shane_t
by New Contributor II
  • 5186 Views
  • 2 replies
  • 3 kudos

Unity Catalog + Medallion Architecture

I am looking for a reference architecture or an example of how to organize Unity Catalog while adhering to the medallion architecture. What are some common naming conventions and methods? How do you isolate environments (dev/prod)? I was thinking of som...

Latest Reply
ssharma
New Contributor
  • 3 kudos

Hi Shane, could you share what you ended up doing in your scenario? I have similar requirements and would like to understand how you implemented yours. Saurabh

1 More Replies
ismaelhenzel
by New Contributor III
  • 79 Views
  • 1 reply
  • 0 kudos

Delta Live Tables - foreign keys

I'm creating ingestions using Delta Live Tables; DLT supports the use of schemas with constraints like foreign keys. The problem is: how can I create foreign keys within the same pipeline, between tables that have no read/write relation but have a foreign-key rela...

Latest Reply
merry867
New Contributor
  • 0 kudos

Hello, thanks for this post. Best regards, merry867

dixonantony
by New Contributor II
  • 22 Views
  • 0 replies
  • 0 kudos

Not able to create a table from external Spark

py4j.protocol.Py4JJavaError: An error occurred while calling o123.sql.: io.unitycatalog.client.ApiException: generateTemporaryPathCredentials call failed with: 401 - {"error_code":"UNAUTHENTICATED","message":"Request to generate access credential for...

Eeg
by New Contributor III
  • 122 Views
  • 2 replies
  • 0 kudos

Resolved! querying snowflake database using databricks query federation: no active warehouse

Hello Databricks community, I'm confused because I was able to query a Snowflake table using query federation 2 days ago, but now it's giving me an error about no active warehouse: Status of query associated with resultSet is FAILED_WITH_ERROR. No...

Latest Reply
Eeg
New Contributor III
  • 0 kudos

Hello @Alberto_Umana, thank you very much for your response. I was able to solve it on my side; the issue was on the Snowflake side. I realized I had to grant not only the USAGE permission but also the OPERATE permission to my Snowflake account. Also added sfR...

1 More Replies
Frustrated_DE
by New Contributor III
  • 171 Views
  • 4 replies
  • 2 kudos

Data comparison

Hi, are there any tools within Databricks for large-volume data comparisons? I appreciate there are methods for dataframe comparisons for unit testing (assertDataFrameEqual), but it is my understanding these are for testing transformations on smallish...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

Borrowed from LinkedIn, here is a SQL query you can use to compare two tables (or dataframes):

with hash_src as (
  select hash(*) as hash_val from my.source.table
),
hash_tgt as (
  select hash(*) as hash_val from my.target.table
)
select sum(hash_val) ...

3 More Replies
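The same idea as the SQL above, sketched in plain Python: hash every row and compare the results. Comparing sorted hash lists is stricter than comparing sums of hashes (sums can collide when different rows happen to cancel out). The row data here is made up for illustration:

```python
def row_hashes(rows):
    """Hash every row (a tuple) of a table-like list of rows."""
    return sorted(hash(row) for row in rows)

source = [("a", 1), ("b", 2), ("c", 3)]
target = [("c", 3), ("a", 1), ("b", 2)]   # same rows, different order
drifted = [("a", 1), ("b", 2), ("c", 99)]  # one row differs

print(row_hashes(source) == row_hashes(target))   # True
print(row_hashes(source) == row_hashes(drifted))  # False
```

At Spark scale the equivalent is hashing rows with hash(*) (as in the SQL above) and aggregating, which avoids pulling either table onto a single machine.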
