Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Forum Posts

scottbisaillon
by New Contributor
  • 600 Views
  • 1 reply
  • 0 kudos

Databricks Running Jobs and Terraform

What happens to a currently running job when a workspace is deployed again using Terraform? Are the jobs paused and resumed, or are they left unaffected without any downtime? Searching for this specific scenario doesn't seem to come up with anything and...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @scottbisaillon, when deploying a workspace again using Terraform, the behaviour regarding currently running jobs depends on the specific Terraform version and the platform you are using. Let’s explore the details: Terraform Cloud (form...

TinasheChinyati
by New Contributor
  • 1156 Views
  • 1 reply
  • 0 kudos

Stream to stream join NullPointerException

I have a DLT pipeline running in continuous mode. It contains a stream-to-stream join that runs for the first 5 hours but then fails with a NullPointerException. I need help working out how to handle this. My code is structured as below: @dl...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @TinasheChinyati, It looks like you’re encountering a Null Pointer Exception in your DLT pipeline when performing a stream-to-stream join. Let’s break down the issue and explore potential solutions: The error message indicates that the query te...

liormayn
by New Contributor III
  • 1549 Views
  • 5 replies
  • 3 kudos

OSError: [Errno 78] Remote address changed

Hello :) As part of moving an app that previously ran directly on EMR to Databricks, we are running experiments using LTS 9.1 and getting the following error: PythonException: An exception was thrown from a UDF: 'pyspark.serializers.SerializationEr...

Latest Reply
NandiniN
Honored Contributor
  • 3 kudos

Hi @liormayn, I understand. I see the fix went out on 20 March 2024; you will need to restart the clusters. Thanks!

4 More Replies
Ikanip
by New Contributor II
  • 1477 Views
  • 4 replies
  • 2 kudos

Resolved! How to choose a compute, and how to find alternatives for the current compute being used?

We are using a compute for an Interactive Cluster in Production, which incurs X amount of cost. We want to know which options offer roughly the same processing power as the current compute but incur a cost of Y, which is less...

Latest Reply
raphaelblg
Contributor III
  • 2 kudos

Hello @Ikanip , You can utilize the Databricks Pricing Calculator to estimate costs. For detailed information on compute capacity, please refer to your cloud provider's documentation regarding Virtual Machine instance types.

3 More Replies
Hubcap7700
by New Contributor
  • 288 Views
  • 1 reply
  • 0 kudos

Native Slack Integration

Hi, are there any plans to build a native Slack integration? I'm envisioning a one-time connector to Slack that would automatically populate all channels and users to select from, for example when configuring an alert notification. It does not seem ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Hubcap7700, If you have any further details or specific requirements, feel free to share, and I’ll be happy to assist! 

sujan1
by New Contributor
  • 386 Views
  • 1 reply
  • 0 kudos

requirements.txt with cluster libraries

Cluster libraries are supported from version 15.0 (Databricks Runtime 15.0 | Databricks on AWS). How can I specify the requirements.txt file path in the libraries of a job cluster in my workflow? Can I use a relative path? Is it relative from the root of th...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

To specify the requirements.txt file path for libraries in a job cluster workflow in Databricks, you have a few options. Let’s break it down: Upload the requirements.txt File: First, upload your requirements.txt file to your Databricks workspace....

Abhay_1002
by New Contributor
  • 191 Views
  • 1 reply
  • 0 kudos

Archive file support in Jar Type application

In my Spark application, I am using a set of Python libraries. I am submitting the Spark application as a Jar Task, but I am not able to find any option to provide archive files. So, in order to handle Python dependencies, I am using this approach: Create archive file...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Abhay_1002, Using --py-files Argument: When submitting a Spark application, you can use the --py-files argument to add Python files (including .zip or .egg archives) to be distributed with your application. However, this approach is typical...

EirikMa
by New Contributor II
  • 718 Views
  • 2 replies
  • 0 kudos

UTF-8 troubles in DLT

I am having issues with UTF-8 in DLT: I have tried to set the Spark config on the cluster running the DLT pipeline: I have fixed this with normal compute under advanced settings like this: spark.conf.set("spark.driver.extraJava...

[Screenshots: EirikMa_0-1711360526822.png, EirikMa_1-1711361452104.png]
Latest Reply
EirikMa
New Contributor II
  • 0 kudos

Hi @Kaniz_Fatma! Sorry for the long wait... The problem is not the columns or the data itself; the UTF-8 option for CSV is working fine. The issue seems to be with table_names not being compatible. If I run the query through Auto Loader outside DLT and ...

1 More Replies
mderela
by New Contributor II
  • 226 Views
  • 1 reply
  • 0 kudos

Databricks bundles - good practice for multiprocessing envs

I'm seeking advice regarding Databricks bundles. In my scenario, I have multiple production environments where I aim to execute the same DLT. To simplify, let's assume the DLT reads data from 'eventhub-region-name,' with this being the only differing...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @mderela, When dealing with Databricks bundles in a multi-environment setup, there are some best practices you can follow to ensure smooth execution and maintainable code. Let’s explore a couple of recommendations: Parameterization and Configu...

Abhay_1002
by New Contributor
  • 201 Views
  • 1 reply
  • 0 kudos

Issue with Python Package Management in Spark application

In a PySpark application, I am using a set of Python libraries. To handle Python dependencies while running the PySpark application, I am using the approach provided by Spark: Create archive file of Python virtual environment using required set o...

Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

Hi, I have not tried it, but based on the doc you have to follow this approach. ./environment/bin/python must be replaced with the correct path. import os from pyspark.sql import SparkSession os.environ['PYSPARK_PYTHON'] = "./environment/bin/python" sp...

Nagarathna
by New Contributor II
  • 690 Views
  • 3 replies
  • 1 kudos

File not found error when trying to read json file from aws s3 using with open.

I am trying to read a JSON file from AWS S3 using with open in a Databricks notebook on a shared cluster. Error message: No such file or directory: '/dbfs/mnt/datalake/input_json_schema.json'. On a single instance cluster the above error does not occur.
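
One way to narrow this down is to check whether the path is visible on the driver's local filesystem before opening it. The sketch below is a hypothetical helper (the name `load_json` is not from the thread). It assumes the `/dbfs/...` path is expected to appear as a local mount, which may not hold on every cluster configuration.

```python
import json
import os


def load_json(path):
    """Load a JSON file from a local (or locally mounted) path.

    Raises FileNotFoundError with extra context when the path is not
    visible from the local filesystem of the process running this code.
    """
    if not os.path.exists(path):
        parent = os.path.dirname(path) or "."
        # Report whether the parent directory is visible, to help tell a
        # missing mount apart from a missing file.
        parent_state = "exists" if os.path.isdir(parent) else "is missing"
        raise FileNotFoundError(
            f"{path!r} is not visible locally; "
            f"parent directory {parent!r} {parent_state}"
        )
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

If the message reports that the parent directory is missing, the mount itself is probably not exposed to the local filesystem on that cluster, rather than the file being absent.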

Latest Reply
NandiniN
Honored Contributor
  • 1 kudos

Hi @Nagarathna, I just tried it on a shared cluster and did not face any issue. What is the exact error that you are facing? The complete stack trace might help. Just to confirm, are you accessing the "/dbfs/mnt/datalake/input.json" from the same workspac...

2 More Replies
databricksdev
by New Contributor II
  • 369 Views
  • 2 replies
  • 0 kudos

Can we customize job run name when running azure data bricks notebook jobs from azure data factory

Hi all, we are executing a Databricks notebook activity inside the child pipeline through ADF. We are getting the child pipeline name in the job name while executing the Databricks job. Is it possible to get the master pipeline name as the job name, or customize the job name thr...

Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

I think we should raise a Request/Product Feedback. I am not sure whether Databricks or Microsoft would own it, but you may submit feedback for Databricks here: https://docs.databricks.com/en/resources/ideas.html

1 More Replies
dbx_687_3__1b3Q
by New Contributor III
  • 498 Views
  • 2 replies
  • 0 kudos

Impersonating a user

How do I impersonate a user? I can't find any documentation that explains how to do this or even hints that it's possible. Use case: I perform administrative tasks like assigning grants and roles to catalogs, schemas, and tables for the benefit of busines...

Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

Hi dbx_687_3__1b3Q, Actually, I have seen impersonation. Is this what you are looking for? https://docs.gcp.databricks.com/en/dev-tools/google-id-auth.html#step-5-impersonate-the-google-cloud-service-account

1 More Replies
AlexG
by New Contributor II
  • 828 Views
  • 3 replies
  • 1 kudos

Query results in csv file include 'null' string for blank cell

After running a SQL script, when downloading the results to a CSV file, the file includes a null string for blank cells (see screenshot). Is there a setting I can change to simply get empty cells instead?
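
If no export setting is available, the downloaded file can be post-processed instead. This is a minimal sketch, assuming the export writes the literal token `null` for missing values. The helper name `blank_out_nulls` is hypothetical, and a genuine string value "null" in the data would be blanked as well.

```python
import csv
import io


def blank_out_nulls(csv_text, null_token="null"):
    """Return csv_text with every cell equal to null_token made empty."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Only exact matches are blanked; cells that merely contain the
        # token (for example "nullable") are left untouched.
        writer.writerow(["" if cell == null_token else cell for cell in row])
    return out.getvalue()
```

For a file on disk, read its contents, pass them through the helper, and write the result back out.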

[Screenshot: AlexG_1-1702927614092.png]
Latest Reply
NandiniN
Honored Contributor
  • 1 kudos

Hi AlexG, I tested with table content containing null and with empty data, and it works as expected in the download option too. Here is an example: CREATE TABLE my_table_null_test1 ( id INT, name STRING ); INSERT INTO my_table_null_test1 (id, name)...

2 More Replies
DataBricks_Use1
by New Contributor
  • 346 Views
  • 2 replies
  • 0 kudos

FileReadException Error

Hi, I am getting a FileReadException error while reading a JSON file using the REST API Connector. It occurs when the JSON file is very large; it is not able to handle more than 1 lakh (100,000) records. Error details: org.apache.spark.SparkException: Job aborted due to sta...

Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

Hello @DataBricks_Use1, It would be great if you could add the entire stack trace, as Jose mentioned. There should be a "Caused by:" section below, which would give you an idea of the reason for this failure so that you can work on it. fo...

1 More Replies