I have a DLT pipeline running in continuous mode. I have a stream-to-stream join that runs for the first five hours but then fails with a NullPointerException. I need help understanding how to handle this. My code is structured as below: @dl...
We are using compute for an Interactive Cluster in Production, which incurs X amount of cost. We want to know what options are available with roughly the same processing power as our current compute but at a cost of Y, which is less...
Hello @Ikanip,
You can utilize the Databricks Pricing Calculator to estimate costs.
For detailed information on compute capacity, please refer to your cloud provider's documentation regarding Virtual Machine instance types.
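One thing worth noting when comparing options: with the same VM type (so the same processing power), the DBU rate differs by compute type, which is often where the savings come from. A toy calculation with hypothetical rates (use the Databricks Pricing Calculator and your cloud provider's VM pricing for real numbers):

```python
# Hourly cost = DBU charge + VM charge. All rates below are illustrative only.
def hourly_cost(dbu_per_hour: float, dbu_rate: float, vm_rate: float) -> float:
    return dbu_per_hour * dbu_rate + vm_rate

# Same VM (same processing power), different compute type:
interactive = hourly_cost(dbu_per_hour=2.0, dbu_rate=0.55, vm_rate=0.40)  # all-purpose
jobs        = hourly_cost(dbu_per_hour=2.0, dbu_rate=0.15, vm_rate=0.40)  # jobs compute

print(f"{interactive:.2f} {jobs:.2f}")  # 1.50 0.70
```

Moving scheduled workloads from all-purpose (interactive) clusters to jobs compute is the usual first lever, since the hardware can stay identical.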
Hello Databricks Community, I am currently working on a Terraform script to provision clusters in Databricks. However, I've noticed that by default, clusters created using Terraform have the policy set to "Unrestricted." I would like to co...
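For reference, the Databricks Terraform provider lets you attach a policy explicitly via the cluster's `policy_id`; a sketch (policy name, node type, and runtime version are illustrative):

```hcl
data "databricks_cluster_policy" "shared" {
  name = "Shared Compute" # hypothetical policy name in your workspace
}

resource "databricks_cluster" "this" {
  cluster_name            = "tf-managed-cluster"
  spark_version           = "15.4.x-scala2.12"
  node_type_id            = "Standard_DS3_v2"
  num_workers             = 2
  autotermination_minutes = 30
  policy_id               = data.databricks_cluster_policy.shared.id
}
```

If `policy_id` is omitted, the cluster is created without a policy, which the UI surfaces as "Unrestricted".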
What happens to a currently running job when a workspace is redeployed using Terraform? Are the jobs paused and resumed, or are they left unaffected with no downtime? Searching for this specific scenario doesn't seem to turn up anything, and...
Hello everyone, There isn't an official document outlining the step-by-step procedure for enabling Unity Catalog in Azure Databricks. If anyone has created documentation or knows the process, please share it here. Thank you in advance.
Hi, Are there any plans to build a native Slack integration? I'm envisioning a one-time connector to Slack that would automatically populate all channels and users to select from, for example when configuring an alert notification. It does not seem ...
Cluster libraries are supported from version 15.0 - Databricks Runtime 15.0 | Databricks on AWS. How can I specify a requirements.txt file path in the libraries of a job cluster in my workflow? Can I use a relative path? Is it relative to the root of th...
To specify the requirements.txt file path for libraries in a job cluster workflow in Databricks, you have a few options.
Let’s break it down:
Upload the requirements.txt File:
First, upload your requirements.txt file to your Databricks workspace....
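To my knowledge, on Databricks Runtime 15.0+ a task's `libraries` array can reference a requirements.txt directly via a `requirements` entry; a sketch of the relevant fragment of a job definition (the path is illustrative, and as far as I know it must be an absolute workspace or volume path rather than a relative one):

```json
{
  "tasks": [
    {
      "task_key": "main",
      "libraries": [
        { "requirements": "/Workspace/Users/someone@example.com/project/requirements.txt" }
      ]
    }
  ]
}
```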
In my Spark application, I am using a set of Python libraries. I am submitting the Spark application as a JAR task, but I am not able to find any option to provide archive files. So, in order to handle Python dependencies, I am using this approach: create an archive file...
Hi @Abhay_1002,
Using the --py-files argument: When submitting a Spark application, you can use the --py-files argument to add Python files (including .zip or .egg archives) to be distributed with your application. However, this approach is typical...
Issues with UTF-8 in DLT
I am having issues with UTF-8 in DLT. I have tried to set the Spark config on the cluster running the DLT pipeline. I have fixed this with normal compute under advanced settings like this: spark.conf.set("spark.driver.extraJava...
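Note that calling `spark.conf.set(...)` at runtime cannot change driver JVM options, since the JVM has already started; for a DLT pipeline, Spark confs like these generally belong in the pipeline settings' `configuration` map instead. A sketch of the relevant fragment of the pipeline settings JSON (pipeline name is illustrative):

```json
{
  "name": "my-dlt-pipeline",
  "configuration": {
    "spark.driver.extraJavaOptions": "-Dfile.encoding=UTF-8",
    "spark.executor.extraJavaOptions": "-Dfile.encoding=UTF-8"
  }
}
```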
Hi @Kaniz! Sorry for the long wait... The problem is not the columns or the data itself; the UTF-8 option for CSV is working fine. The issue seems to be that the table names are not compatible. If I run the query through Auto Loader outside DLT and use ba...
I'm seeking advice regarding Databricks bundles. In my scenario, I have multiple production environments where I aim to execute the same DLT. To simplify, let's assume the DLT reads data from 'eventhub-region-name,' with this being the only differing...
Hi @mderela,
When dealing with Databricks bundles in a multi-environment setup, there are some best practices you can follow to ensure smooth execution and maintainable code.
Let’s explore a couple of recommendations:
Parameterization and Configu...
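For the parameterization approach, Databricks Asset Bundles support per-target variable overrides, which fits the "same DLT, different Event Hub per region" scenario directly. A sketch of a databricks.yml (bundle, pipeline, and Event Hub names are illustrative):

```yaml
bundle:
  name: my-dlt-bundle

variables:
  eventhub_name:
    description: Event Hub the pipeline reads from
    default: eventhub-dev

resources:
  pipelines:
    my_dlt:
      name: my-dlt-pipeline
      configuration:
        eventhub.name: ${var.eventhub_name}

targets:
  prod_us:
    variables:
      eventhub_name: eventhub-us-east
  prod_eu:
    variables:
      eventhub_name: eventhub-west-europe
```

Deploying with `databricks bundle deploy -t prod_us` (or `-t prod_eu`) then swaps in the right value without duplicating the pipeline definition.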
Hello :) As part of deploying an app that previously ran directly on EMR to Databricks, we are running experiments using LTS 9.1 and getting the following error: PythonException: An exception was thrown from a UDF: 'pyspark.serializers.SerializationEr...
Hey @NandiniN, the error has stopped happening, but we are not feeling "safe" yet. Could you tell us when the fix was published? Just so we can try to pinpoint whether the fix is what solved it.
I have been running into an issue with a pymc-marketing model in a Databricks notebook. The cell that fits the model gets hung up and the progress bar stops moving; however, the code completes and dumps all needed output into a folder. After the...
Hello, I am running a job that depends on the information provided in the storage_sub_directory column in system.information_schema.tables ... and it worked until 1-2 weeks ago. Now I discovered in the docs that this column is deprecated and always null, ...
In a PySpark application, I am using a set of Python libraries. In order to handle Python dependencies while running the PySpark application, I am using the approach provided by Spark: create an archive file of a Python virtual environment using the required set o...
Hi,
I have not tried it, but based on the docs you have to go with this approach. ./environment/bin/python must be replaced with the correct path.
import os
from pyspark.sql import SparkSession
os.environ['PYSPARK_PYTHON'] = "./environment/bin/python"
sp...