Data Engineering

Forum Posts

by RakeshRakesh_De • New Contributor III

Thursday

363 Views
7 replies
0 kudos

Spark CSV file read option to read blank/empty value from file as empty value only instead of Null

Hi, I am trying to read a file that has some blank values in a column. We know Spark converts blank values to null during reading; how can I read blank/empty values as empty values instead? I tried DBR 13.2 and 14.3. I have tried all possible ways but it's not w...

Data Engineering

csv

EmptyValue

FileRead

Latest Reply

-werners-
Esteemed Contributor III

26m ago

0 kudos

OK, after some tests: the trick is to surround the text in your CSV with quotes. That way Spark can actually distinguish between a missing value and an empty value. Missing values are null and can only be converted to something else implicitel...
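
As a rough sketch of the quoting idea in the reply above, the snippet below writes CSV lines in which empty strings are emitted as `""` while missing values (`None`) are left bare. The helper names `format_field` and `format_row` are hypothetical, and how Spark maps the two cases back on read depends on its CSV reader options (for example `nullValue` and `emptyValue`), which this snippet does not exercise.

```python
def format_field(value):
    """Render one CSV field: None -> bare empty, everything else quoted."""
    if value is None:
        return ""  # missing value: nothing between the delimiters
    text = str(value).replace('"', '""')  # escape embedded quotes by doubling
    return f'"{text}"'


def format_row(values, delimiter=","):
    """Format each value and join the results into one CSV line."""
    return delimiter.join(format_field(v) for v in values)


# Hypothetical sample rows, used only for illustration.
rows = [
    ("id", "name", "comment"),
    (1, "alice", ""),    # empty string is written as ""
    (2, "bob", None),    # missing value is written as nothing
]
csv_text = "\n".join(format_row(r) for r in rows)
print(csv_text)
```

The second data row ends with a bare delimiter, so a reader that honors quoting has a way to tell it apart from the quoted empty string in the first data row.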

by ajbush • New Contributor III

01-26-2023 5:33:23 PM

9113 Views
6 replies
2 kudos

Connecting to Snowflake using an SSO user from Azure Databricks

Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflake T...

Data Engineering

Latest Reply

aagarwal
Visitor

48m ago

2 kudos

@ludgervisser We are trying to connect to Snowflake as an Azure AD user through the externalbrowser method, but the browser window doesn't open. Could you please share example code showing how you managed to achieve this, or point us to some documentation? @BobGeo...

by Brad • Contributor

yesterday

61 Views
2 replies
0 kudos

Pushdown in Postgres

Hi team, in Databricks I need to query a Postgres source like
select * from postgres_tbl where id in (select id from df)
where df comes from a Hive table. If I use the JDBC driver and do
query = '(select * from postgres_tbl) as t'
src_df = spark.read.format(...

Data Engineering

Latest Reply

Brad
Contributor

yesterday

0 kudos

Thanks for the response. I cannot do that, as we load incrementally from the source very frequently. We cannot read the full data each time.
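
One way to avoid reading the full table, sketched here under assumptions, is to put the filter into the subquery string itself so that Postgres evaluates it. The helper `build_pushdown_query` is hypothetical; it assumes the ids are integers and that the list is small enough to collect on the driver. The resulting string has the same shape as the `query` in the original post and could be passed to Spark's JDBC reader through its `dbtable` option.

```python
def build_pushdown_query(table, id_column, ids, alias="t"):
    """Return a parenthesized subquery that filters on a literal id list."""
    # Coerce to int so that only numeric literals are interpolated into SQL.
    literals = sorted({int(i) for i in ids})
    if not literals:
        # An empty IN () list is invalid SQL, so return a query matching no rows.
        return f"(select * from {table} where 1 = 0) as {alias}"
    id_list = ", ".join(str(i) for i in literals)
    return f"(select * from {table} where {id_column} in ({id_list})) as {alias}"


# Hypothetical ids; in the thread they would come from the Hive-backed df.
query = build_pushdown_query("postgres_tbl", "id", [3, 1, 2, 3])
print(query)
```

For large id sets an IN list becomes impractical; in that case writing the ids to a temporary table on the Postgres side and joining there may be the better fit.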

by alpine • New Contributor

02-20-2024 9:02:20 AM

454 Views
2 replies
0 kudos

Deploy lock force acquired error when deploying asset bundle using databricks cli

I'm running this command on a DevOps pipeline: databricks bundle deploy -t dev. I receive this error and have tried using --force-lock, but it still doesn't work. Error: deploy lock force acquired by name@company.com at 2024-02-20 16:38:34.99794209 +0000 ...

Data Engineering

Latest Reply

Li_Li
Visitor

yesterday

0 kudos

Hi, I had the same error. Could I ask whether --force-lock has anything to do with the Terraform lock, or is it a separate lock only for the bundle? Where can I find documentation about this flag? Thank you in advance.

by VovaVili • New Contributor

3 weeks ago

374 Views
2 replies
0 kudos

Databricks Runtime 13.3 - can I use Databricks Connect without Unity Catalog?

Hello all, the official documentation for Databricks Connect states that, for Databricks Runtime versions 13.0 and above, my cluster needs to have Unity Catalog enabled for me to use Databricks Connect, and use a Databricks cluster through an IDE like...

Data Engineering

Latest Reply

mohaimen_syed
New Contributor III

yesterday

0 kudos

Hi, I'm currently using Databricks Connect without Unity Catalog in VS Code. Although I have connected Unity Catalog separately on multiple occasions, I don't think it's required. Here is the doc: https://docs.databricks.com/en/dev-tools/databrick...

by AnaMocanu • New Contributor

Monday

188 Views
2 replies
0 kudos

Best way to parse Google Analytics data in Databricks notebook

I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...

Data Engineering

Latest Reply

daniel_sahal
Esteemed Contributor

Monday

0 kudos

@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you.
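
Independently of the linked gist (this is not that function), the nested `"f"`/`"v"` wrappers from the question can be peeled off with a small recursive helper. The sketch below is plain Python and makes two assumptions: the sample string is the visible prefix from the post closed off by hand, so the real payload may differ, and the first non-null slot in each value list is the one that matters. The names `unwrap` and `first_non_null` are hypothetical.

```python
import json


def unwrap(node):
    """Recursively strip the "f" (fields) and "v" (value) wrappers."""
    if isinstance(node, dict):
        if "f" in node:
            return [unwrap(field) for field in node["f"]]
        if "v" in node:
            return unwrap(node["v"])
        return {key: unwrap(value) for key, value in node.items()}
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node


def first_non_null(values):
    """Return the first entry that is not None, or None if all are."""
    return next((v for v in values if v is not None), None)


# Hypothetical sample: the truncated JSON from the post, completed by hand.
sample = (
    '{"v":[{"v":{"f":[{"v":"ga_session_number"},'
    '{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null}]}}]}}]}'
)

pairs = unwrap(json.loads(sample))  # list of [key, [value slots]] pairs
params = {key: first_non_null(values) for key, values in pairs}
print(params)
```

In a notebook, the same function could be wrapped in a UDF or applied after collecting a small sample, depending on data volume.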

by Hubert-Dudek • Esteemed Contributor III

yesterday

194 Views
1 replies
1 kudos

The star inside WHERE

The star (*) can be used inside the WHERE clause in #Databricks as of runtime version 15.

Data Engineering

Latest Reply

Lakshay
Esteemed Contributor

yesterday

1 kudos

Thank you for sharing

by Clampazzo • Visitor

yesterday

61 Views
1 replies
0 kudos

Can I see queries sent to All Purpose Compute from Power BI?

I am brand new to Databricks and am working on connecting a Power BI semantic model to our Databricks instance. I have successfully connected it to an All Purpose Compute but was wondering if there was a way I could see the queries that Power BI is ...

Data Engineering

Power BI

sql

Latest Reply

Gaut23
New Contributor II

yesterday

0 kudos

For all-purpose compute, the best bet would be to use the system tables, specifically the system.access.audit table. https://docs.databricks.com/en/administration-guide/system-tables/index.html

by Olaoye_Somide • New Contributor

Sunday

99 Views
1 replies
0 kudos

How to Implement Custom Logging in Databricks without Using _jvm Attribute with Spark Connect?

Hello Databricks Community, I am currently working in a Databricks environment and trying to set up custom logging using Log4j in a Python notebook. However, I've run into a problem due to the use of Spark Connect, which does not support the _jvm attr...

Data Engineering

Apache Spark

data engineering

Latest Reply

arpit
Contributor III

yesterday

0 kudos

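import logging

# Raise the root logger threshold so only WARNING and above are emitted.
logging.getLogger().setLevel(logging.WARN)

# Use a named logger for notebook messages.
log = logging.getLogger("DATABRICKS-LOGGER")
log.warning("Hello")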
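
Building on the reply above, a slightly fuller sketch of pure-Python logging (no `_jvm` access) might attach its own handler and formatter to the named logger. The helper `get_notebook_logger` and the format string are assumptions for illustration, and the in-memory stream is used only so the output can be shown; by default the handler writes to `sys.stderr`.

```python
import io
import logging


def get_notebook_logger(name="DATABRICKS-LOGGER", level=logging.INFO, stream=None):
    """Return a named logger with a single stream handler attached."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # re-running a notebook cell must not stack handlers
        handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # in-memory stream, used here only to show the output
log = get_notebook_logger(stream=buffer)
log.info("Hello")
print(buffer.getvalue(), end="")
```

Because the handler check makes the helper idempotent, calling it again from another cell returns the same logger without adding a second handler.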

by anish2102 • New Contributor II

a week ago

199 Views
4 replies
1 kudos

Resolved! PySpark operations slowness in Cluster 14.3 LTS as compared to 13.3 LTS

In my notebook, I am performing a few join operations that take more than 30s on a 14.3 LTS cluster, whereas the same operations take less than 4s on a 13.3 LTS cluster. Can someone help me optimize PySpark operations like joins and withColum...

Data Engineering

clustr-14.3

spark-3.5

Latest Reply

Lakshay
Esteemed Contributor

yesterday

1 kudos

Thank you for sharing the analysis

by SG • New Contributor II

07-20-2023 1:01:53 PM

554 Views
3 replies
1 kudos

Customize job run name when running jobs from adf

Hi guys, I am running my Databricks jobs on a cluster job from Azure Data Factory using a Databricks Python activity. When I monitor my jobs in Workflows -> Job runs, I see that the run name is a concatenation of ADF pipeline name, Databricks python ac...

Data Engineering

Latest Reply

AmanSehgal
Honored Contributor III

yesterday

1 kudos

I don't think that level of customisation is provided. However, I can suggest some workarounds: REST API: create a job on the fly with the desired name within ADF and trigger it using the REST API in a Web activity. This way you can track job completion status ...
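
One possible reading of the REST API workaround is a one-time run submitted with an explicit run name. The sketch below only builds the request body that an ADF Web activity could send; the field names follow my understanding of the Jobs API 2.1 `runs/submit` request and should be checked against the API reference. The helper name, task key, file path, and cluster id are all hypothetical placeholders.

```python
import json


def build_submit_payload(run_name, python_file, cluster_id, parameters=()):
    """Build a request body for a one-time run with a custom run name."""
    return {
        "run_name": run_name,  # the name shown in the job runs list
        "tasks": [
            {
                "task_key": "main",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                "spark_python_task": {
                    "python_file": python_file,
                    "parameters": list(parameters),
                },
            }
        ],
    }


# Placeholder values, for illustration only.
payload = build_submit_payload(
    run_name="daily_sales_load",
    python_file="dbfs:/scripts/load_sales.py",
    cluster_id="0000-000000-example",
    parameters=["--date", "2024-01-01"],
)
print(json.dumps(payload, indent=2))
```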

by Mohit_m • Valued Contributor II

06-27-2022 5:24:40 AM

1190 Views
2 replies
3 kudos

Resolved! Could not initialize class error

A user is running a job triggered from ADF in Databricks. In this job they need to use custom libraries that are in jars. Most of the time the job runs fine; however, it sometimes fails with: java.lang.NoClassDefFoundError: Could not initialize Any s...

Data Engineering

Latest Reply

Mohit_m
Valued Contributor II

06-27-2022 5:25:15 AM

3 kudos

Can you please check whether more than one jar contains this class? If multiple jars of the same type are available on the cluster, then there is no guarantee of the JVM picking the proper classes for processing, which results in the intermittent...
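
The check suggested above can be sketched with the standard library, since a jar is a zip archive. The helper `find_duplicate_classes` is hypothetical, and the two tiny jars are created in a temporary directory purely for demonstration; in practice the paths would point at the jars installed on the cluster.

```python
import os
import tempfile
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_paths):
    """Map each .class entry found in more than one jar to the jars holding it."""
    owners = defaultdict(list)
    for jar_path in jar_paths:
        with zipfile.ZipFile(jar_path) as jar:  # a jar is a zip archive
            for entry in set(jar.namelist()):  # set() ignores repeats inside one jar
                if entry.endswith(".class"):
                    owners[entry].append(jar_path)
    return {entry: jars for entry, jars in owners.items() if len(jars) > 1}


# Demo with two throwaway jars that both contain com/example/Foo.class.
demo_jars = [
    ("a.jar", ["com/example/Foo.class", "com/example/Bar.class"]),
    ("b.jar", ["com/example/Foo.class"]),
]
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, entries in demo_jars:
        path = os.path.join(tmp, name)
        with zipfile.ZipFile(path, "w") as jar:
            for entry in entries:
                jar.writestr(entry, b"")
        paths.append(path)
    duplicates = find_duplicate_classes(paths)

print(sorted(duplicates))
```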

by Jorge3 • New Contributor III

yesterday

172 Views
3 replies
2 kudos

Resolved! [Databricks Assets Bundles] Workflow trigger on file arrival

Hi everyone! I'm setting up a workflow using Databricks Asset Bundles (DABs), and I want to configure my workflow to be triggered on file arrival. However, all the examples I've found in the documentation use schedule triggers. Does anyone know if it is...

Data Engineering

Latest Reply

Ajay-Pandey
Esteemed Contributor III

yesterday

2 kudos

Hi @Jorge3, yes, you can also use continuous mode. Please find the syntax below:

resources:
  jobs:
    dbx_job:
      name: continuous_job_name
      continuous:
        pause_status: UNPAUSED
      queue:
        enabled: true
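
For the file-arrival case asked about in the question, a job definition would presumably carry a `trigger` block instead of `continuous`. The sketch below builds that mapping in Python and prints it as JSON; the `trigger.file_arrival.url` and `trigger.pause_status` field names are assumptions based on the Jobs API job settings and should be verified against the bundle documentation. The helper name, job name, and storage path are hypothetical.

```python
import json


def job_with_file_arrival_trigger(job_key, job_name, url):
    """Build a bundle-style mapping for a job triggered on file arrival."""
    return {
        "resources": {
            "jobs": {
                job_key: {
                    "name": job_name,
                    "trigger": {
                        "pause_status": "UNPAUSED",
                        # Assumed field: the storage location to watch.
                        "file_arrival": {"url": url},
                    },
                }
            }
        }
    }


# Placeholder path, for illustration only.
config = job_with_file_arrival_trigger(
    "dbx_job", "file_arrival_job_name", "/Volumes/main/landing/incoming/"
)
print(json.dumps(config, indent=2))
```

The same nesting would be written as YAML keys under `resources.jobs` in the bundle file.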

by Rene • New Contributor

yesterday

67 Views
1 replies
0 kudos

Can we build IOT data trading platform by using Databricks?

I have an idea for sharing and trading IoT data streamlined from many data sources on an incentive platform. I would appreciate it if you could discuss the idea with me. Thank you.

Data Engineering

Latest Reply

betty4920taylor
Visitor

yesterday

0 kudos

Hello @Rene, building an IoT data trading platform using Databricks is indeed a feasible and innovative idea. Databricks provides a unified analytics platform that can handle massive amounts of data processing and advanced analytics, which is essentia...

by ismaelhenzel • New Contributor II

2 weeks ago

563 Views
2 replies
2 kudos

Resolved! Addressing Pipeline Error Handling in Databricks bundle run with CI/CD when SUCCESS WITH FAILURES

I'm using Databricks asset bundles and I have pipelines that contain "if all done rules". When running on CI/CD, if a task fails, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and it passes, potentially deploying a broken p...

Data Engineering

bunlde

CICD

Databricks

Latest Reply

ismaelhenzel
New Contributor II

yesterday

2 kudos

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to imp...
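
The thread excerpt does not show which approaches were proposed, so the following is not necessarily the one chosen above. It is a minimal sketch of a CI gate that treats anything other than a clean `SUCCESS` as a failure. It assumes the run has already been fetched as a dict shaped like a Jobs API run (`state.result_state` plus per-task states); the sample run and the helper names are made up for illustration.

```python
# States that count as a pass; extend this set (for example with states of
# intentionally skipped tasks) if your pipeline needs it.
PASSING_STATES = {"SUCCESS"}


def failed_tasks(run):
    """Return the task keys whose result state is not in PASSING_STATES."""
    return [
        task.get("task_key", "<unknown>")
        for task in run.get("tasks", [])
        if task.get("state", {}).get("result_state") not in PASSING_STATES
    ]


def exit_code_for(run):
    """Return 0 only when the run and every task finished cleanly."""
    overall = run.get("state", {}).get("result_state")
    if overall not in PASSING_STATES or failed_tasks(run):
        return 1
    return 0


# Hypothetical run, shaped like the assumed API response.
run = {
    "state": {"result_state": "SUCCESS_WITH_FAILURES"},
    "tasks": [
        {"task_key": "ingest", "state": {"result_state": "SUCCESS"}},
        {"task_key": "validate", "state": {"result_state": "FAILED"}},
    ],
}
print(failed_tasks(run), exit_code_for(run))
```

A pipeline step could pass the result to `sys.exit(exit_code_for(run))` so that a `SUCCESS_WITH_FAILURES` run blocks the deployment.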
