Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

ChristianRRL
by Valued Contributor II
  • 1557 Views
  • 7 replies
  • 3 kudos

DLT Potential Bug: File Reprocessing Issue with "cloudFiles.allowOverwrites": "true"

Hi there, I ran into a peculiar case and I'm wondering if anyone else has run into this and can offer an explanation. We have a DLT process to pull CSV files from a landing location and insert (append) them into target tables. We have the setting "cl...

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

Apologies, that could be an internet or networking issue. In DLT you will be able to change the DBR but will have to use a custom image, which may be tricky if you have not done it before. By default, Photon will be used in serverless. It may be a ...

6 More Replies
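
For readers skimming this thread, here is a minimal Auto Loader sketch showing where the "cloudFiles.allowOverwrites" option from the question is set. The paths and target table are placeholders, not taken from the original pipeline.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    # With allowOverwrites enabled, a file rewritten in place can be
    # picked up again, so an append-only sink may see the same rows twice.
    .option("cloudFiles.allowOverwrites", "true")
    .option("cloudFiles.schemaLocation", "/tmp/landing/_schemas")  # placeholder
    .load("/tmp/landing/csv")  # placeholder landing path
)

(
    df.writeStream
    .option("checkpointLocation", "/tmp/landing/_checkpoints")  # placeholder
    .trigger(availableNow=True)
    .toTable("bronze.landing_csv")  # placeholder target table
)
```
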
FabianGutierrez
by Contributor
  • 2528 Views
  • 3 replies
  • 1 kudos

Issue with DAB (Databricks Asset Bundle) requesting Terraform files

Hi community, since recently (2 days ago) we have been receiving the following error when validating and deploying our DAB (Databricks Asset Bundle): "Error: error downloading Terraform: Get "https://releases.hashicorp.com/terraform/1.5.5/index.json": ...

Latest Reply
FabianGutierrez
Contributor
  • 1 kudos

An update: we cannot get the firewall cleared in time, so we need to go for the offline option, that is, download everything from Terraform and the DB templates, but it is not as clear or intuitive as described. Using their container is unfortunately not an option ...

2 More Replies
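
To make the offline route above concrete, a hedged sketch follows. It assumes the DATABRICKS_TF_VERSION, DATABRICKS_TF_EXEC_PATH, and DATABRICKS_TF_CLI_CONFIG_FILE environment variables that the Databricks CLI documents for air-gapped bundle deployments; verify the names against your CLI version before relying on them.

```python
# Hedged sketch: point `databricks bundle deploy` at a locally downloaded
# Terraform binary so it does not reach out to releases.hashicorp.com.
# All paths are placeholders; the env var names are assumptions based on
# the Databricks CLI docs for air-gapped deployments.
import os
import subprocess

env = os.environ.copy()
env["DATABRICKS_TF_VERSION"] = "1.5.5"  # version of the binary you downloaded
env["DATABRICKS_TF_EXEC_PATH"] = "/opt/terraform/terraform"  # local binary
env["DATABRICKS_TF_CLI_CONFIG_FILE"] = "/opt/terraform/mirror.tfrc"  # filesystem provider mirror

subprocess.run(["databricks", "bundle", "deploy"], env=env, check=True)
```
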
pjv
by New Contributor III
  • 1339 Views
  • 1 reply
  • 0 kudos

How to ensure pyspark udf execution is distributed across worker nodes

Hi, I have the following Databricks notebook code defined: pyspark_dataframe = create_pyspark_dataframe(some input data); MyUDF = udf(myfunc, StringType()); pyspark_dataframe = pyspark_dataframe.withColumn('UDFOutput', DownloadUDF(input data columns)); outp...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@pjv Can you please try the following, you'll basically want to have more than a single partition: from pyspark.sql import SparkSession from pyspark.sql.functions import udf from pyspark.sql.types import StringType # Initialize Spark session (if not...

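
Expanding the suggestion above into a runnable sketch: a UDF only executes in parallel when the DataFrame has more than one partition, so repartitioning before applying it spreads the work across workers. `myfunc` and the paths below are stand-ins, not the poster's actual code.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def myfunc(value):  # placeholder for the poster's download logic
    return f"processed-{value}"

my_udf = udf(myfunc, StringType())

df = spark.range(1000).withColumnRenamed("id", "input")
# Spread rows across partitions so executors share the UDF work;
# a single-partition DataFrame runs the UDF on one task only.
df = df.repartition(64)

result = df.withColumn("UDFOutput", my_udf(df["input"]))
result.write.mode("overwrite").parquet("/tmp/udf_output")  # placeholder path
```
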
Vasu_Kumar_T
by New Contributor II
  • 394 Views
  • 1 reply
  • 0 kudos

"Larger than Max" error

Hi, we are trying to pass the keys to decrypt a file and are receiving the above error, as shown in the attached screenshot. Please help in case we need to change any configuration or set any options to avoid this error. Thanks, Vasu

(screenshot attached)
Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@Vasu_Kumar_T can you provide some more details or context? Feel free to redact sensitive data. Where are you getting this? How are you passing the keys to decrypt a file? Is there a more comprehensive stacktrace apart from this message in the image...

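
One speculative angle while the thread waits for details: if this is one of Spark's "larger than max" size limits, it often comes from pushing large values through job parameters or task payloads. Reading the key material from a secret scope at run time keeps it out of the payload. The scope, key, and helper below are invented for illustration.

```python
# Speculative sketch (Databricks notebook context): fetch the decryption
# key from a secret scope instead of passing it as a job parameter.
# Scope and key names are placeholders.
key = dbutils.secrets.get(scope="encryption", key="file-decryption-key")

decrypt_file("/mnt/raw/encrypted.bin", key)  # hypothetical decryption helper
```
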
sangram11
by New Contributor
  • 1056 Views
  • 4 replies
  • 0 kudos

Myths about the VACUUM command

I identified some myths while working with the VACUUM command in Spark 3.5.x. 1. The VACUUM command does not work with days; instead, its RETAIN clause explicitly asks for values in hours. I tried many times, and it throws a parse syntax error (wh...

(screenshots attached)
Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for reporting this, Sangram. Are these YouTube and educational contents on the Databricks channel? > set delta.databricks.delta.retentionDurationCheck.enabled = false. It works if I want to delete obsolete files whose lifespan is less than the defa...

3 More Replies
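
For anyone verifying point 1 above: RETAIN does take hours, so a 7-day window is written as 168 HOURS. A short sketch (the table name is a placeholder):

```python
# RETAIN takes HOURS, not days: 7 days = 168 hours.
spark.sql("VACUUM my_table RETAIN 168 HOURS")

# Retaining less than the default 7 days requires disabling the safety
# check first; use with care, since concurrent readers/writers can break.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM my_table RETAIN 24 HOURS")
```
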
kidexp
by New Contributor II
  • 25979 Views
  • 7 replies
  • 2 kudos

Resolved! How to install python package on spark cluster

Hi, how can I install Python packages on a Spark cluster? Locally, I can use pip install. I want to use some external packages which are not installed on the Spark cluster. Thanks for any suggestions.

Latest Reply
Mikejerere
New Contributor II
  • 2 kudos

If --py-files doesn't work, try this shorter method: Create a Conda environment and install your packages: conda create -n myenv python=3.x, conda activate myenv, pip install your-package. Package and submit: use conda-pack and spark-submit with --archives. cond...

6 More Replies
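
On Databricks specifically, the simplest route today is a notebook-scoped install; a minimal sketch (the package name is a placeholder):

```python
# In a Databricks notebook cell, %pip installs the package on the driver
# and the workers for this session (package name is a placeholder):
%pip install some-package

# Alternatively, attach the package as a cluster-scoped library so every
# notebook on the cluster gets it without per-session installs.
```
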
Akshay_127877
by New Contributor II
  • 47966 Views
  • 8 replies
  • 1 kudos

How to open Streamlit URL that is hosted by Databricks in local web browser?

I have run this webapp code in a Databricks notebook. It works properly without any errors. With Databricks acting as the server, I am unable to open this link in my browser for this webapp. But when I run the code in my local IDE, I am able to just open the U...

Latest Reply
navallyemul
New Contributor III
  • 1 kudos

@Akshay_127877 : Were you able to resolve this issue?

7 More Replies
IoannaV
by New Contributor
  • 1109 Views
  • 1 reply
  • 0 kudos

Issue with Uploading Oracle Driver in Azure Databricks Cluster

Hi, could you please help me with the following? I am facing the below issue when I try to upload a JAR file in the Azure Databricks Libraries: Only Wheel and requirements file from /Workspace are allowed on Assigned UC cluster. Denied library is Jar...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hey, this is by design. I understand the jobs are failing when run on a UC single-user cluster since it is unable to install a JAR package located in the /Workspace path. This is, however, a known behaviour and is already documented below: https://docs...

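
A hedged sketch of the usual workaround: put the JAR in a Unity Catalog volume and install it from there, since /Workspace JARs are blocked on assigned UC clusters. The volume path and cluster ID are placeholders, and the call assumes the databricks-sdk library-install API; double-check against the SDK version you use.

```python
# Assumption-laden sketch using the databricks-sdk: install a JAR from a
# UC volume onto an existing cluster. IDs and paths are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library

w = WorkspaceClient()
w.libraries.install(
    cluster_id="0123-456789-abcdefg",  # placeholder cluster ID
    libraries=[Library(jar="/Volumes/main/default/libs/ojdbc11.jar")],  # placeholder volume path
)
```
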
lprevost
by Contributor II
  • 769 Views
  • 1 reply
  • 0 kudos

Using Autoloader in DLT: [ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP]

I've been using Autoloader in a DLT pipeline loading data from an S3 location into my hive_metastore shared with AWS Glue. I'm now trying to migrate this over to Unity Catalog to take advantage of liquid clustering and data quality. However, I'm getting...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

https://kb.databricks.com/unity-catalog/invalid_parameter_valuelocation_overlap-overlaps-with-managed-storage-error 

nagendrapruthvi
by New Contributor
  • 772 Views
  • 2 replies
  • 0 kudos

Cannot log in to Databricks using SSO

Hi, I created accounts with Databricks for both production and staging environments at my company, but I made a mistake with the case of the email addresses. For production, I used Xyz@company.com, and for staging, I used xyz@company.com. Now that my...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Okay, so I checked some documents: email addresses are case-insensitive, the same behavior as in AWS, Azure, and GCP. This means that email addresses are stored in lowercase in Databricks. So, the issue is not with case sensitivity b...

1 More Replies
ElaPG1
by New Contributor
  • 526 Views
  • 1 reply
  • 0 kudos

all-purpose compute for Oracle queries

Hi, I am looking for any guidelines or best practices regarding compute configuration for extracting data from an Oracle DB and saving it as Parquet files. Right now I have a DBR workflow with a for-each task, concurrency = 31 (as I need to copy the data fro...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @ElaPG1 , While the cluster sounds like a pretty good one with Autoscaling, it depends on the workload too. The Standard_D8s_v5 instances you are using have 32GB memory and 8 cores. While these are generally good, you might want to experiment with...

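
Alongside instance sizing, the JDBC read pattern usually dominates Oracle extract performance. A hedged sketch of a partitioned read (connection details, bounds, and paths are placeholders to tune per workload; the Oracle JDBC driver must be installed on the cluster):

```python
# Partitioned JDBC read: numPartitions parallel connections, each
# scanning a slice of the partition column's [lowerBound, upperBound].
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")  # placeholder
    .option("dbtable", "SCHEMA.SOURCE_TABLE")                  # placeholder
    .option("user", "etl_user")
    .option("password", "***")                                 # use secrets in practice
    .option("partitionColumn", "ID")   # numeric column to split on
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "8")      # parallel connections to Oracle
    .option("fetchsize", "10000")      # rows per round trip
    .load()
)

df.write.mode("overwrite").parquet("/mnt/extracts/source_table")  # placeholder
```
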
Garrus990
by New Contributor II
  • 369 Views
  • 1 reply
  • 0 kudos

Passing UNIX-based parameter to a task

Hey, I would like to pass a parameter to a task that is based on a UNIX function. Concretely, I would like to specify dates dynamically calculated with respect to the date my job runs. I wanted to do it like this: ["--period-start", "$(date -d '-7...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @Garrus990 , to pass a task parameter that is based on a UNIX function, you can use the Databricks Jobs API to dynamically calculate dates with respect to the date your job runs. Use a notebook to calculate dates: create a notebook tha...

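
A sketch of that notebook approach: job parameters are not shell-expanded, so `$(date ...)` will not evaluate; computing the dates in Python and publishing them as task values lets downstream tasks read them. The key names are placeholders.

```python
# Compute the window in Python instead of relying on shell expansion,
# then publish the values for downstream tasks in the same job.
from datetime import date, timedelta

period_start = (date.today() - timedelta(days=7)).isoformat()
period_end = date.today().isoformat()

# Downstream tasks can reference these as
# {{tasks.<task_name>.values.period_start}} in their parameters.
dbutils.jobs.taskValues.set(key="period_start", value=period_start)
dbutils.jobs.taskValues.set(key="period_end", value=period_end)
```
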
mlopsuser
by New Contributor
  • 886 Views
  • 1 reply
  • 0 kudos

Databricks Asset Bundles and MLOps Structure for different model training -1 model per DAB or 1 DAB

I have two different datasets that will be used to train two separate regression models. Each dataset has its own preprocessing steps, and the models will have independent training pipelines. What is the best-practice approach for organizing Databricks Asset ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @mlopsuser , For organizing Databricks Asset Bundles (DABs) in your scenario with two separate regression models and datasets, it is generally recommended to create one DAB per model and dataset. This approach aligns with best practices for modula...

olivier-soucy
by Contributor
  • 2817 Views
  • 4 replies
  • 1 kudos

Resolved! Spark Streaming foreachBatch with Databricks connect

I'm trying to use the foreachBatch method of a Spark Streaming DataFrame with databricks-connect. Given that Spark Connect support was added to `foreachBatch` in 3.5.0, I was expecting this to work. Configuration: DBR 15.4 (Spark 3.5.0), databrick...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@olivier-soucy Are you sure that you're using DBR 15.4 and databricks-connect 15.4.2? I've seen this issue when using databricks-connect 15.4.x with DBR 14.3 LTS. Anyway, I've just tested that with the same versions you've provided and it works on my en...

3 More Replies
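
For anyone reproducing this, a minimal databricks-connect sketch of `foreachBatch` under Spark Connect. The rate source and batch logic are placeholders, and it assumes databricks-connect 15.4.x against a matching DBR, per the versions discussed above.

```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

def process_batch(batch_df, batch_id):
    # Runs once per micro-batch; replace with the real sink logic.
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (
    spark.readStream.format("rate").load()  # placeholder source
    .writeStream.foreachBatch(process_batch)
    .start()
)
query.awaitTermination(30)  # let a few batches run for the repro
query.stop()
```
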
SharathE
by New Contributor III
  • 1599 Views
  • 2 replies
  • 0 kudos

Incremental refresh of materialized view in serverless DLT

Hello, every time I run a Delta Live Tables materialized view in serverless, I get a log of "COMPLETE RECOMPUTE". How can I achieve incremental refresh in serverless DLT pipelines?

Latest Reply
drewipson
New Contributor III
  • 0 kudos

Make sure you are using the aggregates and SQL restrictions outlined in this article: https://docs.databricks.com/en/optimizations/incremental-refresh.html If a SQL function is non-deterministic (current_timestamp() is a common one), you will have a CO...

1 More Reply
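
To illustrate the determinism point, a hedged DLT sketch: the aggregate below avoids non-deterministic calls such as current_timestamp(), which is one of the prerequisites for incremental refresh. The source table name is a placeholder.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Deterministic aggregate, eligible for incremental refresh.")
def daily_event_counts():
    return (
        spark.read.table("raw_events")  # placeholder source table
        .groupBy("event_date")
        .agg(F.count("*").alias("events"))
        # Avoid F.current_timestamp() and similar non-deterministic
        # functions here, or the refresh falls back to COMPLETE RECOMPUTE.
    )
```
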
