Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nag_kanchan
by New Contributor III
  • 1260 Views
  • 0 replies
  • 0 kudos

Applying SCD in DLT using 3 different tables at source

My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be a slowly changing dimension (SCD Type 1). This a...
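A minimal sketch of how SCD Type 1 could be handled in DLT with apply_changes, assuming it runs inside a DLT pipeline and using hypothetical source tables (lake.customers_a/b/c), a hypothetical business key customer_id, and a hypothetical ordering column updated_at:

import dlt
from pyspark.sql import functions as F

# Union the three hypothetical source tables into one change feed for the dimension.
@dlt.view
def dim_customer_updates():
    a = spark.readStream.table("lake.customers_a")
    b = spark.readStream.table("lake.customers_b")
    c = spark.readStream.table("lake.customers_c")
    return a.unionByName(b, allowMissingColumns=True).unionByName(c, allowMissingColumns=True)

# Target streaming table that apply_changes maintains as SCD Type 1.
dlt.create_streaming_table("dim_customer")

dlt.apply_changes(
    target="dim_customer",
    source="dim_customer_updates",
    keys=["customer_id"],             # hypothetical business key
    sequence_by=F.col("updated_at"),  # hypothetical ordering column
    stored_as_scd_type=1,
)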

Magnus
by Contributor
  • 4764 Views
  • 1 replies
  • 1 kudos

FIELD_NOT_FOUND when selecting field not part of original schema

Hi, I'm implementing a DLT pipeline using Auto Loader to ingest JSON files. The JSON files contain an array called Items that contains records, and two of the fields in the records weren't part of the original schema but have been added later. Auto Loa...

Data Engineering
Auto Loader
Delta Live Tables
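One commonly suggested workaround for this kind of FIELD_NOT_FOUND error is to declare the later-added nested fields via Auto Loader schema hints so the inferred schema includes them. A minimal sketch, assuming a DLT pipeline, hypothetical field names inside the Items array, and a hypothetical landing path:

import dlt

@dlt.table
def bronze_items():
    return (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            # Hypothetical hint declaring the full type of the Items array,
            # including the two fields added after the schema was first inferred.
            .option("cloudFiles.schemaHints",
                    "Items ARRAY<STRUCT<Id: STRING, AddedField1: STRING, AddedField2: STRING>>")
            .load("/mnt/landing/items/")  # hypothetical input path
    )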
choi_2
by New Contributor III
  • 46325 Views
  • 1 replies
  • 0 kudos

maintaining cluster and databases in Databricks Community Edition

I am using the Databricks Community Edition, but cluster usage is limited to 2 hours and the cluster automatically terminates. So I have to attach a cluster every time to run the notebook again. From reading other discussions, I learned it is not something...

Data Engineering
communityedition
Feather
by New Contributor III
  • 11996 Views
  • 12 replies
  • 9 kudos

Resolved! DLT pipeline MLFlow UDF error

I am running this notebook via the DLT pipeline in preview mode. Everything works up until the predictions table that should be created with a registered model inferencing the gold table. This is the error: com.databricks.spark.safespark.UDFException...

(attached screenshots: Feather_0-1699311273694.png, Feather_1-1699311414386.png)
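For context, a minimal sketch of scoring a gold table with a registered model inside a DLT pipeline via mlflow.pyfunc.spark_udf; the model URI, upstream table name, and feature columns below are hypothetical:

import dlt
import mlflow

# Load the registered model as a Spark UDF; env_manager recreates the model's
# logged dependencies, which is where pinning library versions matters.
predict = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/my_model/3",  # hypothetical registry entry
    env_manager="virtualenv",
)

@dlt.table
def predictions():
    gold = dlt.read("gold_table")        # hypothetical upstream DLT table
    feature_cols = ["f1", "f2", "f3"]    # hypothetical feature columns
    return gold.withColumn("prediction", predict(*[gold[c] for c in feature_cols]))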
Latest Reply
BarryC
New Contributor III
  • 9 kudos

Hi @Feather, have you also tried specifying the version of the library?

11 More Replies
oosterhuisf
by New Contributor II
  • 2548 Views
  • 1 replies
  • 0 kudos

break production using a shallow clone

Hi, if you create a shallow clone using the latest LTS and drop the clone using a SQL warehouse (either current or preview), the source table is broken beyond repair. Data reads and writes still work, but vacuum will remain forever broken. I've attac...
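A minimal sketch of the scenario being reported, with hypothetical table names; the report is that VACUUM on the source starts failing after the clone is dropped from a SQL warehouse:

# Create a shallow clone of an existing Delta table (hypothetical names).
spark.sql("CREATE TABLE sandbox.orders_clone SHALLOW CLONE prod.orders")

# Drop the clone (in the report this was done from a SQL warehouse).
spark.sql("DROP TABLE sandbox.orders_clone")

# According to the report, this now fails permanently on the source table.
spark.sql("VACUUM prod.orders")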

Latest Reply
oosterhuisf
New Contributor II
  • 0 kudos

To add to that: the manual does not state that this might happen.

icyflame92
by New Contributor II
  • 14681 Views
  • 2 replies
  • 1 kudos

Resolved! Access storage account with private endpoint

Hi, I need guidance on connecting Databricks (not VNET injected) to a storage account with a private endpoint. We have a client who created Databricks with a public IP and not VNET injected. It's using a managed VNET in the Databricks managed resource g...

Data Engineering
ADLS
azure
Latest Reply
rudyevers
New Contributor III
  • 1 kudos

No, this is not possible, because the workspace is not part of the virtual network and therefore cannot access the storage over its private endpoint. It is all covered in the documentation: https://www.databricks.com/blog/2020/02/28/securely-access...

1 More Replies
jx1226
by New Contributor III
  • 3239 Views
  • 0 replies
  • 0 kudos

Connect to storage with private endpoint from workspace EnableNoPublicIP=No and VnetInjection=No

We know that Databricks with VNET injection (our own VNET) allows us to connect to Blob Storage / ADLS Gen2 over private endpoints and peering. This is what we typically do. We have a client who created Databricks with EnableNoPublicIP=No (secure clust...

grazie
by Contributor
  • 4380 Views
  • 2 replies
  • 0 kudos

Azure Databricks, migrating delta table data with CDF on.

We are on Azure Databricks over ADLS Gen2 and have a set of tables and workflows that process data from and between those tables, using change data feeds. (We are not yet using Unity Catalog, nor the Hive metastore; we're just accessing Delta tables f...
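For reference, a minimal sketch of reading a change data feed from a path-based Delta table (no metastore involved), assuming CDF is already enabled on the table and using a hypothetical ADLS path and starting version:

# Read the change feed from a Delta table stored directly on ADLS Gen2.
changes = (
    spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", 1)  # hypothetical starting point
        .load("abfss://data@myaccount.dfs.core.windows.net/tables/orders")  # hypothetical path
)
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()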

Latest Reply
grazie
Contributor
  • 0 kudos

As it turns out, due to a misunderstanding, the responses from Azure support were answering a slightly different question (about Azure Table Storage instead of Delta Tables on Blob/ADLS Gen2), so we'll try there again. However, still interested in id...

1 More Replies
hafeez
by Contributor
  • 3891 Views
  • 1 replies
  • 1 kudos

Resolved! Hive metastore table access control End of Support

Hello, we are using Databricks with the Hive metastore and not Unity Catalog. We would like to know if there is any end-of-support date for table access control with Hive, as this link states that it is legacy: https://docs.databricks.com/en/data-governance/tab...

Michael_Galli
by Contributor III
  • 1468 Views
  • 0 replies
  • 0 kudos

Many dbutils.notebook.run iterations in a workflow -> Failed to checkout GitHub repository error

Hi all, I have a workflow that runs a single notebook with dbutils.notebook.run() and different parameters in one long loop. At some point, I get random Git errors in the notebook run: com.databricks.WorkflowException: com.databricks.NotebookExecut...
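Since the Git checkout failures are described as random, one pragmatic workaround (a sketch, not an official fix) is to wrap dbutils.notebook.run in a small retry helper; the notebook path, timeout, loop parameters, and backoff values below are hypothetical:

import time

def run_notebook_with_retry(path, timeout_seconds, arguments, max_attempts=3):
    """Retry dbutils.notebook.run a few times to ride out transient repo checkout errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return dbutils.notebook.run(path, timeout_seconds, arguments)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(30 * attempt)  # simple linear backoff before retrying

for param in ["a", "b", "c"]:  # hypothetical loop over parameters
    run_notebook_with_retry("./child_notebook", 600, {"param": param})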

IonFreeman_Pace
by New Contributor III
  • 6054 Views
  • 4 replies
  • 1 kudos

Resolved! First notebook in ML course fails with wrong runtime

Help! I'm trying to run the first notebook in the Scalable MachIne LEarning (SMILE) course: https://github.com/databricks-academy/scalable-machine-learning-with-apache-spark-english/blob/published/ML%2000a%20-%20Spark%20Review.py It fails on the first...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

It means your cluster type has to be an ML runtime. When you create a cluster in Databricks, you can choose between different runtimes. These have different versions (Spark versions), but also different types. For your case you need to select the ML menu o...

3 More Replies
Hoping
by New Contributor
  • 3271 Views
  • 0 replies
  • 0 kudos

Size of each partitioned file (partitioned by default)

When I run DESCRIBE DETAIL I get the number of files the Delta table is split into. How can I check the size of each of the files that make up my entire table? Will I be able to query each partitioned file to understand how they have b...
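A minimal sketch of inspecting per-file sizes for a Delta table, assuming a hypothetical table name; DESCRIBE DETAIL gives the totals, and DataFrame.inputFiles() plus dbutils.fs.ls can report each active data file's size:

# Table-level totals: numFiles, sizeInBytes, location, etc.
spark.sql("DESCRIBE DETAIL my_db.my_table").show(truncate=False)  # hypothetical table

# Per-file sizes for the files currently referenced by the table.
for path in spark.table("my_db.my_table").inputFiles():
    info = dbutils.fs.ls(path)[0]  # ls on a single file returns a one-element list
    print(info.path, info.size)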

eric-cordeiro
by New Contributor II
  • 2072 Views
  • 0 replies
  • 0 kudos

Insufficient Permission when writing to AWS Redshift

I'm trying to write a table in AWS Redshift using the following code:

try:
    (df_source.write
        .format("redshift")
        .option("dbtable", f"{redshift_schema}.{table_name}")
        .option("tempdir", tempdir)
        .option("url", url)
       ...
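For comparison, a complete but hypothetical version of that write using the Databricks Redshift connector; "Insufficient Permission" errors are often about the credentials used for the S3 tempdir or the Redshift user, so this sketch passes an IAM role that both Redshift and S3 would need to trust. All names, the JDBC URL, and the role ARN are assumptions:

(df_source.write
    .format("redshift")
    .option("url", "jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev")  # hypothetical
    .option("dbtable", "my_schema.my_table")                      # hypothetical target table
    .option("tempdir", "s3a://my-bucket/redshift-temp/")          # hypothetical S3 staging area
    .option("aws_iam_role", "arn:aws:iam::123456789012:role/redshift-copy-role")  # hypothetical role
    .mode("append")
    .save())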

pgruetter
by Contributor
  • 2446 Views
  • 1 replies
  • 0 kudos

Streaming problems after Vacuum

Hi all, to read from a large Delta table, I'm using readStream but with trigger(availableNow=True) as I only want to run it daily. This worked well for an initial load and then incremental loads after that. At some point though, I received an error fro...
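For context, a minimal sketch of the daily availableNow pattern described above, with hypothetical table and checkpoint names; if VACUUM has removed files the stream had not yet processed, the usual symptom is a file-not-found style error and the stream typically has to be restarted from a fresh checkpoint:

# Incremental daily run: process everything available now, then stop.
source = spark.readStream.table("my_db.big_source_table")           # hypothetical source

(source.writeStream
    .option("checkpointLocation", "/checkpoints/big_source_daily")  # hypothetical checkpoint path
    .trigger(availableNow=True)
    .toTable("my_db.big_target_table"))                             # hypothetical target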

