Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dm7
by New Contributor II
  • 3443 Views
  • 1 reply
  • 0 kudos

Unit Testing DLT Pipelines

Now that we are moving our DLT pipelines into production, we would like to start looking at unit testing the transformation logic inside DLT notebooks. We want to know how we can unit test the PySpark logic/transformations independently without having to s...
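One common pattern, sketched below with hypothetical function and table names: keep the transformation logic in plain PySpark functions that the DLT notebook merely wires into @dlt.table definitions, then unit test those functions locally with pytest and a local SparkSession, no pipeline run needed.

```python
# transformations.py -- plain PySpark, no dlt imports, so it is testable anywhere
from pyspark.sql import DataFrame
import pyspark.sql.functions as F

def add_amount_with_tax(df: DataFrame, rate: float = 0.2) -> DataFrame:
    """Hypothetical transformation; the DLT notebook just calls this."""
    return df.withColumn("amount_with_tax", F.col("amount") * (1 + rate))


# test_transformations.py -- runs locally or on a plain cluster
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("dlt-unit-tests").getOrCreate()

def test_add_amount_with_tax(spark):
    df = spark.createDataFrame([(100.0,)], ["amount"])
    result = add_amount_with_tax(df, rate=0.1).collect()[0]
    assert result["amount_with_tax"] == pytest.approx(110.0)
```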

WWoman
by Contributor
  • 2719 Views
  • 2 replies
  • 0 kudos

Resolved! Persisting query history data

Hello, I am looking for a way to persist query history data. I do not have direct access to the system tables, but I do have access to a query_history view created by selecting from the system.query.history and system.access.audit system tables. I want ...
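A minimal sketch of one way to do this, assuming the view is named query_history and has a start_time column (both assumptions): snapshot the view into a Delta table once, then run a scheduled job that appends only rows newer than the current watermark.

```python
# Hypothetical names: adjust catalog/schema/view/column to your environment.
ARCHIVE = "main.monitoring.query_history_archive"

# One-time: create an empty archive table with the view's schema.
spark.sql(f"CREATE TABLE IF NOT EXISTS {ARCHIVE} AS SELECT * FROM query_history WHERE 1 = 0")

# Scheduled job: append only rows newer than what is already persisted.
spark.sql(f"""
    INSERT INTO {ARCHIVE}
    SELECT * FROM query_history
    WHERE start_time > (SELECT coalesce(max(start_time), timestamp('1970-01-01'))
                        FROM {ARCHIVE})
""")
```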

Latest Reply
syed_sr7
New Contributor II
  • 0 kudos

Is there a system table for query history?

1 More Reply
CarstenWeber
by New Contributor III
  • 12507 Views
  • 9 replies
  • 3 kudos

Resolved! Invalid configuration fs.azure.account.key trying to load ML Model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with: model = PipelineModel.load(path). I set the Spark config: spark.conf.set("fs.azure.account.auth.type", "OAuth") spark.conf.set("fs.azure.account.oauth.provi...
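For others hitting the same error, a sketch of one thing worth checking: the ABFS OAuth settings can be scoped to the specific storage account, and since some load paths read the Hadoop configuration at cluster level rather than from the notebook session, putting these keys in the cluster's Spark config may be necessary. Storage account, service principal, and secret scope below are placeholders.

```python
# Placeholders: supply your own storage account, service principal, and tenant.
account = "mystorageaccount"
tenant_id = "<tenant-id>"
client_id = "<sp-client-id>"
client_secret = dbutils.secrets.get("my-scope", "sp-secret")  # hypothetical secret scope

suffix = f"{account}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
```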

Latest Reply
chhavibansal
New Contributor III
  • 3 kudos

@daniel_sahal do you know of any possible reason why it works in OSS Spark while it does not work in a Databricks notebook? Why is there a disparity?

8 More Replies
camilo_s
by Contributor
  • 729 Views
  • 0 replies
  • 1 kudos

Parametrizing query for DEEP CLONE

Update: Hey moderator, I've removed the link to the Bobby Tables XKCD to reassure you that this post is not spam. Hi, I'm somehow unable to write a parametrized query to create a DEEP CLONE. I'm trying really hard to avoid using string interpolation (to p...
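One way to avoid string interpolation entirely, assuming a runtime recent enough to support named parameter markers together with the IDENTIFIER clause (a sketch; table names are hypothetical):

```python
source = "main.sales.transactions"
target = "main.sales.transactions_clone"

# Parameter markers keep the table names out of the SQL string itself,
# which avoids the injection risk that string interpolation carries.
spark.sql(
    "CREATE OR REPLACE TABLE IDENTIFIER(:tgt) DEEP CLONE IDENTIFIER(:src)",
    args={"tgt": target, "src": source},
)
```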

greyfine
by New Contributor II
  • 13344 Views
  • 5 replies
  • 5 kudos

Hi everyone, I was wondering if it is possible to have alerts set up at the query level for PySpark notebooks that run on a schedule in Databricks, so that if we get a particular expected result we can receive a mail alert?

In the screenshot you can see we have 3 workspaces. We have the alert option available in the SQL workspace but not in our Data Science and Engineering workspace; is there any way we can incorporate this into our DS and Engineering workspace?
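In the absence of SQL-style alerts in the Data Science & Engineering workspace, one workaround (a sketch; table name and threshold are hypothetical) is to make the scheduled notebook itself fail when the result is not what you expect, and let the job's on-failure notifications deliver the alert. Since job notifications can also target webhook destinations, this also speaks to the Teams/Slack question in the reply below.

```python
# Hypothetical expectation: today's load should produce at least 1,000 rows.
expected_min = 1000
actual = spark.table("main.reporting.daily_counts").count()  # hypothetical table

if actual < expected_min:
    # Raising makes the job run fail, which fires the email/webhook
    # notifications configured on the job in Workflows.
    raise ValueError(f"Row count {actual} is below the expected minimum {expected_min}")
```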

Latest Reply
JKR
Contributor
  • 5 kudos

How can I receive a call on Teams/phone/Slack if any job fails?

4 More Replies
Aidzillafont
by New Contributor II
  • 1455 Views
  • 1 reply
  • 0 kudos

How to pick the right cluster for your workflow

Hi All, I am attempting to execute a workflow on various job clusters, including general-purpose and memory-optimized clusters. My main bottleneck is that data is being written to disk because I'm running out of RAM. This is due to the large dataset t...
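Alongside the doc linked in the reply below, a couple of illustrative knobs for when disk spill rather than CPU is the bottleneck: prefer memory-optimized node types, and let adaptive query execution size the shuffles. Values and names here are examples, not recommendations.

```python
# Illustrative only; tune for your data volume and cluster size.
spark.conf.set("spark.sql.shuffle.partitions", "auto")  # let AQE pick shuffle partition counts

# Smaller partitions mean each task holds less in memory, which reduces spill;
# repartitioning on a skewed key before a wide join can also help.
df = spark.read.table("main.big.events")  # hypothetical large table
df = df.repartition(512, "customer_id")   # hypothetical key and partition count
```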

Latest Reply
Ravivarma
Databricks Employee
  • 0 kudos

Hello @Aidzillafont, greetings! Please find below the document explaining compute configuration best practices: https://docs.databricks.com/en/compute/cluster-config-best-practices.html I hope this helps! Regards, Ravi

Sadam97
by New Contributor III
  • 681 Views
  • 0 replies
  • 0 kudos

Databricks (GCP) Cluster not resolving Hostname into IP address

We have #mongodb hosts that must be resolved to the private internal load-balancer IPs (of another cluster), and we are unable to add host aliases in the Databricks GKE cluster so that Spark can connect to MongoDB and resolve t...

feliximmanuel
by New Contributor II
  • 1210 Views
  • 0 replies
  • 1 kudos

Error: oidc: fetch .well-known: Get "https://%E2%80%93host/oidc/.well-known/oauth-authorization-serv

I'm trying to authenticate Databricks using WSL but suddenly getting this error. /databricks-asset-bundle$ databricks auth login –host https://<XXXXXXXXX>.12.azuredatabricks.net Databricks Profile Name: <XXXXXXXXX> Error: oidc: fetch .well-known: Get "ht...

Sudheer_DB
by New Contributor II
  • 1023 Views
  • 3 replies
  • 0 kudos

DLT SQL schema definition

Hi All, while defining a schema when creating a table using Auto Loader and DLT in SQL, I am getting a schema mismatch error between the defined schema and the inferred schema. CREATE OR REFRESH STREAMING TABLE csv_test(a0 STRING,a1 STRING,a2 STRING,a3 STRI...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Sudheer_DB You can specify your own _rescued_data column name by setting the rescuedDataColumn option: https://docs.databricks.com/en/ingestion/auto-loader/schema.html#what-is-the-rescued-data-column
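For reference, a rough Python equivalent of that option inside a DLT pipeline (the path and column name are placeholders; spark is provided by the DLT runtime):

```python
import dlt

@dlt.table
def csv_test():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("rescuedDataColumn", "_my_rescued_data")  # custom name instead of _rescued_data
        .schema("a0 STRING, a1 STRING, a2 STRING, a3 STRING")
        .load("/Volumes/main/default/landing/csv")  # hypothetical path
    )
```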

2 More Replies
hr959
by New Contributor II
  • 2217 Views
  • 1 reply
  • 0 kudos

Access Control/Management Question

I have two workspaces created with the same account, using the same metastore and region, and I want the second workspace to be able to access only certain rows of tables from data held in the first workspace, based on a user group condition. Is this possible...
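Since both workspaces share the same Unity Catalog metastore, one approach (a sketch; the function, table, column, and group names are all hypothetical) is a row filter keyed off group membership, which then applies regardless of which workspace the table is read from:

```python
# Hypothetical names throughout; requires Unity Catalog.
# Members of 'admins' see every row; everyone else sees only EMEA rows.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.region_filter(region STRING)
    RETURN is_account_group_member('admins') OR region = 'EMEA'
""")

spark.sql("""
    ALTER TABLE main.sales.orders
    SET ROW FILTER main.security.region_filter ON (region)
""")
```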

Latest Reply
hr959
New Contributor II
  • 0 kudos

Sorry, forgot to mention! When I tried Delta Sharing, all my workspaces had the same sharing identifier, so the data never actually showed up in "Shared with me", and then I wasn't able to access the data I shared. It was in "Shared by me" in bot...

pm71
by New Contributor II
  • 2118 Views
  • 4 replies
  • 3 kudos

Issue with os and sys Operations in Repo Path on Databricks

Hi, starting from today I have encountered an issue when performing operations using the os and sys modules within the Repo path in my Databricks environment. Specifically, any operation that involves these modules results in a timeout error. However...

Latest Reply
mgradowski
New Contributor III
  • 3 kudos

https://status.azuredatabricks.net/pages/incident/5d49ec10226b9e13cb6a422e/667c08fa17fef71767abda04 "Degraded performance" is a pretty mild way of saying almost nothing productive can be done ATM...

3 More Replies
hfyhn
by New Contributor
  • 879 Views
  • 0 replies
  • 0 kudos

DLT, combine LIVE table with data masking and row filter

I need to apply data masking and row filters to my table. At the same time I would like to use DLT live tables. However, as far as I can see, data masking and row filters are not compatible with DLT live tables. What are my options? Move the tables out of the mat...

Hertz
by New Contributor II
  • 1386 Views
  • 1 reply
  • 0 kudos

System Tables / Audit Logs action_name createWarehouse/createEndpoint

I am creating a cost dashboard across multiple accounts. I am working to get SQL warehouse names and warehouse IDs so I can combine them with system.access.billing on warehouse_id. But the only action_names that include both the warehouse_id and warehouse_n...

Labels: Data Engineering, Audit Logs, cost monitor, createEndpoint, createWarehouse
Latest Reply
Hertz
New Contributor II
  • 0 kudos

I just wanted to circle back to this. It appears that the ID is returned in the response column of the create action_name.
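Building on that, a sketch of pairing the name from request_params with the ID from the response payload, assuming (as the reply above suggests) that response.result carries the JSON returned by the create call:

```python
# Sketch: the warehouse name arrives in the request, the new ID in the response.
created = spark.sql("""
    SELECT
      request_params['name']                   AS warehouse_name,
      get_json_object(response.result, '$.id') AS warehouse_id,
      event_time
    FROM system.access.audit
    WHERE action_name IN ('createWarehouse', 'createEndpoint')
""")
created.display()
```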

HASSAN_UPPAL123
by New Contributor II
  • 1578 Views
  • 1 reply
  • 0 kudos

SPARK_GEN_SUBQ_0 WHERE 1=0, Error message from Server: Configuration schema is not available

Hi Community, I'm trying to read data from the sample schema, table nation, in the Databricks catalog via Spark, but I'm getting this error: com.databricks.client.support.exceptions.GeneralException: [Databricks][JDBCDriver](500051) ERROR processing q...

Labels: Data Engineering, pyspark, python
Latest Reply
HASSAN_UPPAL123
New Contributor II
  • 0 kudos

Hi Community, I'm still facing the issue. Can someone please suggest how to fix the above error?
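Not a fix for the JDBC driver error itself, but one way to confirm the table and credentials work outside the JDBC path is the Databricks SQL Connector for Python (hostname, HTTP path, and token below are placeholders):

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholders: copy these from your SQL warehouse's connection details.
with sql.connect(
    server_hostname="adb-xxxx.12.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM samples.tpch.nation LIMIT 5")
        for row in cursor.fetchall():
            print(row)
```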

