Data Engineering

Forum Posts

nag_kanchan
by New Contributor III
  • 400 Views
  • 0 replies
  • 0 kudos

Applying SCD in DLT using 3 different tables at source

My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be a slowly changing dimension (SCD Type 1). This a...
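For reference, the pattern DLT documents for SCD Type 1 is apply_changes() fed by a view that combines the source tables. A minimal sketch in Python, with hypothetical table, key, and timestamp names (adjust to your own schema; on older DLT runtimes the target is declared with create_streaming_live_table instead):

    import dlt
    from pyspark.sql import functions as F

    # Hypothetical sources: stream the table that receives updates and join the
    # other two lake tables as static lookups.
    @dlt.view
    def customer_changes():
        core = spark.readStream.table("lake.customers_core")
        contact = spark.read.table("lake.customers_contact")
        address = spark.read.table("lake.customers_address")
        return (
            core.join(contact, "customer_id", "left")
                .join(address, "customer_id", "left")
        )

    dlt.create_streaming_table("dim_customer")

    # SCD Type 1: the latest version of each key overwrites the previous row.
    dlt.apply_changes(
        target="dim_customer",
        source="customer_changes",
        keys=["customer_id"],
        sequence_by=F.col("updated_at"),
        stored_as_scd_type=1,
    )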

MichaelO
by New Contributor III
  • 483 Views
  • 1 reply
  • 0 kudos

gateway.create_route for open source models

Am I able to use gateway.create_route in mlflow for open source LLM models? I'm aware of the syntax for proprietary models like OpenAI: from mlflow import gateway gateway.create_route( name=OpenAI_embeddings_route_name...

Labels: Data Engineering, llm, mlflow
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @MichaelO, Certainly! The MLflow AI Gateway provides a way to manage and deploy models, including both proprietary and open source ones.  Let’s explore how you can create a route for an open source model using the MLflow AI Gateway. What is the ML...
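As a sketch of what such a route can look like: the call below assumes the open source model is already served behind an MLflow model server, and the provider name ("mlflow-model-serving") and config key ("model_server_url") are assumptions that should be checked against the gateway configuration docs for your MLflow version; the model name and URL are placeholders.

    from mlflow import gateway

    # Sketch only -- provider and config keys are assumptions, names are placeholders.
    gateway.create_route(
        name="oss-embeddings",
        route_type="llm/v1/embeddings",
        model={
            "name": "all-MiniLM-L6-v2",            # hypothetical open source model
            "provider": "mlflow-model-serving",     # assumed provider identifier
            "config": {
                "model_server_url": "http://my-model-server:5000",  # placeholder endpoint
            },
        },
    )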

Magnus
by Contributor
  • 1398 Views
  • 2 replies
  • 2 kudos

Resolved! FIELD_NOT_FOUND when selecting field not part of original schema

Hi, I'm implementing a DLT pipeline using Auto Loader to ingest JSON files. The JSON files contain an array called Items that contains records, and two of the fields in the records weren't part of the original schema but have been added later. Auto Loa...

Labels: Data Engineering, Auto Loader, Delta Live Tables
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Magnus, It seems you're encountering an issue with schema evolution in your DLT pipeline using Auto Loader. Let's explore how you can improve your notebook implementation. Schema Inference and Evolution: Auto Loader can automatically detect...
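As a concrete illustration of the schema-hint approach, here is a minimal sketch; the path, table, and field names are hypothetical, and the hint is assumed to replace the inferred type of the hinted column, so list every field of the Items struct:

    import dlt

    @dlt.table
    def bronze_items():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            # Declare the late-added fields up front so selecting them never fails;
            # files written before the schema change return NULL for them.
            .option(
                "cloudFiles.schemaHints",
                "Items ARRAY<STRUCT<Id: STRING, ExistingField: STRING, NewField1: STRING, NewField2: STRING>>",
            )
            # Let further additions flow through on later runs.
            .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
            .load("/mnt/landing/items/")
        )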

1 More Reply
coltonflowers
by New Contributor III
  • 1373 Views
  • 1 reply
  • 1 kudos

Resolved! MLFlow Spark UDF Error

After trying to run spark_udf = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model,env_manager="virtualenv")We get the following error:org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 4 times, most re...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @coltonflowers, The error you're encountering seems to be related to a connection issue. Let's explore some potential solutions: Check Network Connectivity: Ensure that the machine running your Spark job has proper network connectivity. Veri...
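One way to narrow the failure down, sketched below: run the UDF with env_manager="local" first (so no virtualenv has to be rebuilt on the executors) and switch back to "virtualenv" once the model itself scores cleanly. The model URI and input table are placeholders.

    import mlflow.pyfunc

    logged_model = "runs:/<run_id>/model"   # placeholder model URI

    predict_udf = mlflow.pyfunc.spark_udf(
        spark,
        model_uri=logged_model,
        env_manager="local",  # swap back to "virtualenv" once this path works;
                              # the cluster must already have the model's dependencies
    )

    df = spark.table("my_features")          # placeholder input table
    scored = df.withColumn("prediction", predict_udf(*df.columns))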

Phani1
by Valued Contributor
  • 414 Views
  • 1 reply
  • 0 kudos

Unity catalog accounts

Hi Team, we have a requirement to keep the metadata (Unity Catalog) in one AWS account and the data storage (Delta tables under data) in another account. Is it possible to do that, and would we face any technical/security issues?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1, Let's address your requirement regarding Unity Catalog metadata and Delta tables storage in separate AWS accounts. Unity Catalog Accounts: Unity Catalog (UC) is a fine-grained governance solution for data and AI on the Databricks Lakeho...
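For orientation, the usual Unity Catalog pattern for data held in a second AWS account is a storage credential wrapping an IAM role in the data account, plus an external location (and optionally a catalog) on top of it. A minimal sketch, assuming the credential my_cred already exists and using placeholder bucket and catalog names:

    # Sketch only: "my_cred" is assumed to already wrap an IAM role in the data account.
    spark.sql("""
        CREATE EXTERNAL LOCATION IF NOT EXISTS other_account_data
        URL 's3://data-account-bucket/delta'
        WITH (STORAGE CREDENTIAL my_cred)
    """)

    # Optional: a catalog whose managed tables land in the other account's bucket.
    spark.sql("""
        CREATE CATALOG IF NOT EXISTS cross_account_catalog
        MANAGED LOCATION 's3://data-account-bucket/delta/managed'
    """)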

Venu_DE1
by New Contributor
  • 650 Views
  • 1 reply
  • 0 kudos

Issue with merge command between streaming dataframe and delta table

Hi, we are trying to build upsert logic for a Delta table, and for that we are writing a merge command between a streaming DataFrame and the Delta table DataFrame. Please find the code below: merge_sql = f""" Merge command goes here """ spark.sql(merg...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Venu_DE1, The error message you're encountering indicates that you're trying to execute a query with streaming sources, but you're missing the necessary .start() method for your streaming DataFrame. Let's address this issue step by step: Stre...
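In practice this usually means wrapping the MERGE in foreachBatch so it runs against a static micro-batch DataFrame, and finishing the write with .start(). A minimal sketch, with a hypothetical key column, target table, and checkpoint path:

    from delta.tables import DeltaTable

    def upsert_to_delta(batch_df, batch_id):
        target = DeltaTable.forName(spark, "main.silver.target_table")  # placeholder target
        (
            target.alias("t")
            .merge(batch_df.alias("s"), "t.id = s.id")   # placeholder join key
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute()
        )

    (
        streaming_df.writeStream                 # the streaming DataFrame from the post
        .foreachBatch(upsert_to_delta)            # MERGE runs once per micro-batch
        .option("checkpointLocation", "/tmp/checkpoints/target_table")  # placeholder path
        .start()                                   # the missing .start() the error points at
    )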

William_Scardua
by Valued Contributor
  • 775 Views
  • 1 reply
  • 0 kudos

Which Data Quality Framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality Framework (or technique) that you recommend?

Labels: Data Engineering, dataquality
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @William_Scardua, Certainly! Data quality is a critical aspect in any organization, ensuring that data is accurate, consistent, and reliable. Here are some key components of a robust data quality framework: Data Governance: Establish policies,...
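As one concrete option on Databricks itself, Delta Live Tables expectations encode data quality rules directly in the pipeline. A small sketch with hypothetical table, column, and rule names:

    import dlt

    @dlt.table
    @dlt.expect_or_drop("valid_id", "customer_id IS NOT NULL")     # drop failing rows
    @dlt.expect("recent_record", "updated_at >= '2020-01-01'")     # warn only, keep rows
    def clean_customers():
        return spark.readStream.table("bronze.customers")          # placeholder source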

William_Scardua
by Valued Contributor
  • 2576 Views
  • 1 reply
  • 0 kudos

Pyspark or Scala ?

Hi guys, many people use PySpark to develop their pipelines. In your opinion, in which cases is it better to use one or the other? Or is it better to choose a single language? Thanks

Latest Reply
Kaniz
Community Manager
  • 0 kudos

PySpark and Scala are both powerful tools for data processing and pipeline development in the big data ecosystem. Let's explore their strengths and use cases: PySpark: Python API for Spark: PySpark allows you to harness the simplicity of Python while...

choi_2
by New Contributor II
  • 10574 Views
  • 2 replies
  • 0 kudos

Resolved! maintaining cluster and databases in Databricks Community Edition

I am using the Databricks Community Edition, but the cluster usage is limited to 2 hours and it automatically terminates. So I have to attach the cluster every time to run the notebook again. As I read other discussions, I learned it is not something...

Labels: Data Engineering, communityedition
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @choi_2, I understand the challenges you're facing with Databricks Community Edition (CE) and the limitations it imposes on cluster usage. While CE provides a micro-cluster and a notebook environment, it does have some restrictions. Let's add...

1 More Reply
Feather
by New Contributor III
  • 3930 Views
  • 14 replies
  • 9 kudos

Resolved! DLT pipeline MLFlow UDF error

I am running this notebook via the DLT pipeline in preview mode. Everything works up until the predictions table that should be created with a registered model inferencing the gold table. This is the error: com.databricks.spark.safespark.UDFException...

[Attachments: Feather_0-1699311273694.png, Feather_1-1699311414386.png]
Latest Reply
BarryC
New Contributor III
  • 9 kudos

Hi @Feather, have you also tried specifying the version of the library as well?
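For reference, pinning the version in the notebook attached to the DLT pipeline can look like the line below; the library and version are placeholders, not a recommendation for this specific error.

    # First cell of the pipeline notebook -- pin the library instead of taking
    # whatever the runtime resolves (version shown is a placeholder).
    %pip install mlflow==2.9.2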

13 More Replies
icyflame92
by New Contributor II
  • 1941 Views
  • 4 replies
  • 3 kudos

Resolved! Access storage account with private endpoint

Hi, I need guidance on connecting Databricks (not VNET injected) to a storage account with a Private Endpoint. We have a client who created Databricks with a public IP and not VNET injected. It's using a managed VNET in the Databricks managed resource g...

Labels: Data Engineering, ADLS, azure
Latest Reply
Kaniz
Community Manager
  • 3 kudos

I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

3 More Replies
jx1226
by New Contributor
  • 1404 Views
  • 1 reply
  • 0 kudos

Connect to storage with private endpoint from workspace EnableNoPublicIP=No and VnetInjection=No

We know that Databricks with VNET injection (our own VNET) allows us to connect to blob storage / ADLS Gen2 over private endpoints and peering. This is what we typically do. We have a client who created Databricks with EnableNoPublicIP=No (secure clust...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @jx1226, Certainly! Let's break down your requirements and explore the options for connecting your Databricks workspace to blob storage and ADLS Gen2 using private endpoints. Workspace Configuration: Your client's Databricks workspace is set ...

deng_dev
by New Contributor III
  • 1806 Views
  • 1 reply
  • 0 kudos

py4j.protocol.Py4JJavaError: An error occurred while calling o359.sql. : java.util.NoSuchElementExce

Hi! We are creating a table in a streaming job every micro-batch using the spark.sql('create or replace table ... using delta as ...') command. This query combines data from multiple tables. Sometimes our job fails with the error: py4j.Py4JException: An e...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @deng_dev, The error message you're encountering, java.util.NoSuchElementException: key not found: Filter (isnotnull(uuid#42326735) AND isnotnull(actor_uuid#42326740)), indicates that there's an issue with the query execution. Let's address thi...

oosterhuisf
by New Contributor II
  • 678 Views
  • 2 replies
  • 0 kudos

break production using a shallow clone

Hi, if you create a shallow clone using the latest LTS and drop the clone using a SQL warehouse (either current or preview), the source table is broken beyond repair. Data reads and writes still work, but vacuum will remain forever broken. I've attac...

Latest Reply
oosterhuisf
New Contributor II
  • 0 kudos

To add to that: the manual does not state that this might happen

1 More Reply
Michael_Galli
by Contributor II
  • 514 Views
  • 1 reply
  • 1 kudos

Resolved! Many dbutils.notebook.run iterations in a workflow -> Failed to checkout Github repository error

Hi all, I have a workflow that runs a single notebook with dbutils.notebook.run() and different parameters in one long loop. At some point, I get random git errors in the notebook run: com.databricks.WorkflowException: com.databricks.NotebookExecut...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Michael_Galli, It appears that you're encountering GitHub-related issues during your notebook runs in Databricks. Let's address this step by step: GitHub API Limit: Databricks enforces rate limits for all REST API calls, including those rela...
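Until the rate-limit issue itself is addressed, a retry wrapper around the looped call can smooth over transient checkout failures. A sketch with placeholder notebook path, timeout, and parameter list (it does not raise any underlying Git/API limits):

    import time

    def run_with_retry(path, timeout_s, args, max_retries=3, backoff_s=30):
        for attempt in range(1, max_retries + 1):
            try:
                return dbutils.notebook.run(path, timeout_s, args)
            except Exception as e:
                if attempt == max_retries:
                    raise
                print(f"Run failed (attempt {attempt}): {e}; retrying in {backoff_s}s")
                time.sleep(backoff_s)
                backoff_s *= 2   # back off a little more each time

    for params in parameter_list:                        # the existing loop's parameters
        run_with_retry("/Repos/project/my_notebook", 3600, params)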
