Data Engineering

Forum Posts

by deng_dev • New Contributor III

18 hours ago

61 Views
1 reply
0 kudos

Cached Views in MERGE INTO operation

Hi everyone! I want to use in-memory cached views in a MERGE INTO operation, but I am not sure whether the exact view saved in memory is used in this operation. So, suppose I have a table named table_1 and a cached view named cached_view_1...

Latest Reply

shan_chandra
Honored Contributor III

11 hours ago

0 kudos

@deng_dev - Are you using an external metastore, by any chance? From the physical plan, we can see that `catalog`.`db`.`table_1` is not cached. If it is the Glue catalog, then caching can be enabled using the configs in the article below: https://do...

by Anonymous • Not applicable

06-07-2021 10:50:07 AM

4825 Views
15 replies
8 kudos

Resolved! What are some best practices for CICD?

A number of people have questions on using Databricks in a productionalized environment. What are the best practices to enable CICD automation?

Latest Reply

BaivabMohanty
New Contributor II

15 hours ago

8 kudos

Any leads/posts for Databricks CI/CD integration with a Bitbucket pipeline? I am facing the below error while creating my CI/CD pipeline: pipelines: branches: master: - step: name: Deploy Databricks Changes image: docker:19.03.12 services: - docker script: # U...

14 More Replies

by Sambit_S • New Contributor II

16 hours ago

35 Views
0 replies
0 kudos

Error during deserializing protobuf data

I am receiving protobuf data in a JSON attribute, and along with it I receive a descriptor file. I am using from_protobuf to deserialize the data as below. It works most of the time but gives an error when there are some recursive fields within the protob...

by drag7ter • New Contributor II

21 hours ago

405 Views
2 replies
0 kudos

Resolved! Not able to set run_as service_principal_name

I'm trying to run: databricks bundle deploy -t prod --profile PROD_Service_Principal and my bundle looks like this: bundle: name: myproject include: - resources/jobs/bundles/*.yml targets: # The 'dev' target, for development purposes. This target is the de...

Latest Reply

drag7ter
New Contributor II

17 hours ago

0 kudos

In my case I replaced the alias PROD_Service_Principal with the ID c250831b-5a2a-4461-a855-83b9102f797e and it works. Not intuitive; this is probably a bug in the CLI or bundles. service_principal_name: c250831b-5a2a-4461-a855-83b9102f797e

1 More Replies

by madrhr • Visitor

19 hours ago

54 Views
0 replies
0 kudos

SparkContext lost when running %sh script.py

I need to execute a .py file in Databricks from a notebook (with arguments, which for simplicity I exclude here). For this I am using: %sh script.py, where script.py is: from pyspark import SparkContext def main(): sc = SparkContext.getOrCreate() print(sc...

%sh

.py

bash shell

SparkContext

SparkShell

by YannLevavasseur • New Contributor

20 hours ago

127 Views
0 replies
0 kudos

SQL function refactoring into Databricks environment

Hello all, I'm currently working on importing some SQL functions from an Informix database into Databricks, using an Asset Bundle that deploys a Delta Live Table to Unity Catalog. I'm struggling to import a recursive one; here is the code: CREATE FUNCTION "info...

by EhsanSaba • New Contributor

21 hours ago

127 Views
0 replies
0 kudos

RocksDB results in empty stream/stream joins dataframe

Since we enabled RocksDB in our spark.conf, stream-to-stream joins/unions result in an empty dataframe. Does anyone else have the same experience? It is on AWS. spark.conf.set("spark.sql.streaming.stateStore.providerClass","com.databricks.sql.streaming...

by RakeshRakesh_De • New Contributor III

Thursday

385 Views
7 replies
0 kudos

Spark CSV file read option to read blank/empty value from file as empty value only instead Null

Hi, I am trying to read a file that has some blank values in a column. We know Spark converts blank values to null while reading; how can I read blank/empty values as empty values? Tried DBR 13.2 and 14.3. I have tried all possible ways but it's not w...

csv

EmptyValue

FileRead

Latest Reply

-werners-
Esteemed Contributor III

23 hours ago

0 kudos

OK, after some tests: The trick is in surrounding the text in your CSV with quotes. That way Spark can actually tell the difference between a missing value and an empty value. Missing values are null and can only be converted to something else implicitel...
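The quoting idea in this reply can be sketched in plain Python. This is only an illustration of how the file could be produced: the helper name `to_csv_line` is hypothetical, and the sample rows are made up. It writes empty strings as `""` and leaves missing values (`None`) as bare, unquoted fields. How Spark then maps each case to null or to an empty string depends on the reader options and runtime version, so that part should be checked on your own cluster.

```python
from typing import Optional, Sequence


def to_csv_line(values: Sequence[Optional[str]]) -> str:
    """Render one CSV row: quote every string, leave None as a bare empty field."""
    fields = []
    for value in values:
        if value is None:
            # Missing value: nothing between the delimiters.
            fields.append("")
        else:
            # Present value (possibly empty): always quoted, embedded quotes doubled.
            fields.append('"' + value.replace('"', '""') + '"')
    return ",".join(fields)


# Made-up sample rows: row 1 has an empty name, row 2 has a missing name.
rows = [["id", "name"], ["1", ""], ["2", None]]
lines = [to_csv_line(row) for row in rows]
print("\n".join(lines))
```

In the output, the second row ends with `""` (an empty value) while the third row ends with a bare comma (a missing value), which is the distinction the reply relies on.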

6 More Replies

by ajbush • New Contributor III

01-26-2023 5:33:23 PM

9187 Views
6 replies
2 kudos

Connecting to Snowflake using an SSO user from Azure Databricks

Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflake T...

Latest Reply

aagarwal
Visitor

23 hours ago

2 kudos

@ludgervisser We are trying to connect to Snowflake as an Azure AD user through the externalbrowser method, but the browser window doesn't open. Could you please share example code showing how you managed to achieve this, or point to some documentation? @BobGeo...

5 More Replies

by Brad • Contributor

yesterday

90 Views
2 replies
0 kudos

Pushdown in Postgres

Hi team, In Databricks I need to query a Postgres source like: select * from postgres_tbl where id in (select id from df). The df comes from a Hive table. If I use the JDBC driver and do query = '(select * from postgres_tbl) as t' src_df = spark.read.format(...

Latest Reply

Brad
Contributor

yesterday

0 kudos

Thanks for the response. I cannot do that, as we are loading incrementally from the source very frequently. We cannot read the full data each time.
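One possible way to avoid reading the full table, sketched here under assumptions, is to inline the ids into the JDBC subquery so that Postgres does the filtering. The helper name `build_pushdown_query` is hypothetical, and the sketch assumes the ids are integers and few enough to collect to the driver (for example from the Hive-backed df). It is not something the thread confirms as the accepted solution.

```python
from typing import Iterable


def build_pushdown_query(table: str, column: str, ids: Iterable[int]) -> str:
    """Build a JDBC subquery that filters on the database side with an IN list."""
    # int() raises on non-numeric input, so only integer literals are
    # embedded in the SQL text. Duplicates are dropped and ids are sorted.
    literals = sorted({int(i) for i in ids})
    if not literals:
        # An empty IN () is invalid SQL; return a query that matches nothing.
        return f"(select * from {table} where 1 = 0) as t"
    in_list = ", ".join(str(i) for i in literals)
    return f"(select * from {table} where {column} in ({in_list})) as t"


# Made-up ids standing in for the values collected from df.
query = build_pushdown_query("postgres_tbl", "id", [3, 1, 2, 3])
print(query)
# In a notebook, this string would take the place of the original
# '(select * from postgres_tbl) as t' in the JDBC "dbtable" option
# (connection options omitted here).
```

For very large id sets, an IN list becomes unwieldy, and loading the ids into a temporary table on the Postgres side may be a better fit.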

1 More Replies

by alpine • New Contributor

02-20-2024 9:02:20 AM

464 Views
2 replies
0 kudos

Deploy lock force acquired error when deploying asset bundle using databricks cli

I'm running this command on a DevOps pipeline: databricks bundle deploy -t dev. I receive this error and have tried using --force-lock, but it still doesn't work. Error: deploy lock force acquired by name@company.com at 2024-02-20 16:38:34.99794209 +0000 ...

Latest Reply

Li_Li
New Contributor

yesterday

0 kudos

Hi, I had the same error. Could I ask whether --force-lock has anything to do with the Terraform lock, or is it a separate lock only for bundles? Where can I find documentation about this flag? Thank you in advance.

1 More Replies

by VovaVili • New Contributor

3 weeks ago

444 Views
2 replies
0 kudos

Databricks Runtime 13.3 - can I use Databricks Connect without Unity Catalog?

Hello all, the official documentation for Databricks Connect states that, for Databricks Runtime versions 13.0 and above, my cluster needs to have Unity Catalog enabled for me to use Databricks Connect, and use a Databricks cluster through an IDE like...

Latest Reply

mohaimen_syed
New Contributor III

yesterday

0 kudos

Hi, I'm currently using Databricks Connect without Unity Catalog in VS Code. Although I have connected Unity Catalog separately on multiple occasions, I don't think it's required. Here is the doc: https://docs.databricks.com/en/dev-tools/databrick...

1 More Replies

by AnaMocanu • New Contributor

Monday

302 Views
2 replies
0 kudos

Best way to parse Google Analytics data in Databricks notebook

I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...

Latest Reply

daniel_sahal
Esteemed Contributor

Monday

0 kudos

@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you.
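For the nested payload shown in the question, a small recursive unwrapper may also be enough. The sketch below is not the function from the linked gist. It assumes that every `{"v": ...}` object wraps a single value and every `{"f": [...]}` object wraps a list of fields, which is how the sample looks. The `raw` string is a made-up, closed version of the truncated snippet, and the names `unwrap` and `params` are hypothetical.

```python
import json
from typing import Any


def unwrap(node: Any) -> Any:
    """Recursively strip the {"v": ...} and {"f": [...]} wrappers."""
    if isinstance(node, dict):
        if set(node) == {"v"}:
            # Value wrapper: keep only what is inside.
            return unwrap(node["v"])
        if set(node) == {"f"}:
            # Field-list wrapper: becomes a plain list.
            return [unwrap(field) for field in node["f"]]
        return {key: unwrap(value) for key, value in node.items()}
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node


# Made-up, closed version of the truncated payload from the post.
raw = (
    '{"v":[{"v":{"f":[{"v":"ga_session_number"},'
    '{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null}]}}]}}]}'
)
pairs = unwrap(json.loads(raw))

# Assumption: each entry is a [key, candidate_values] pair; keep the first non-null value.
params = {
    key: next((v for v in values if v is not None), None)
    for key, values in pairs
}
print(params)
```

In a notebook, the same function could be wrapped in a UDF and applied to the JSON column, though for large tables a schema-based approach is likely to perform better.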

1 More Replies

by Hubert-Dudek • Esteemed Contributor III

yesterday

211 Views
1 reply
1 kudos

The star inside WHERE

The star (*) can be used inside the WHERE clause in #Databricks as of runtime version 15.

Latest Reply

Lakshay
Esteemed Contributor

yesterday

1 kudos

Thank you for sharing

by Clampazzo • New Contributor

yesterday

74 Views
1 reply
0 kudos

Can I see queries sent to All Purpose Compute from Power BI?

I am brand new to Databricks and am working on connecting a Power BI semantic model to our Databricks instance. I have successfully connected it to an All Purpose Compute but was wondering if there was a way I could see the queries that Power BI is ...

Power BI

sql

Latest Reply

Gaut23
New Contributor II

yesterday

0 kudos

For all-purpose compute, the best bet would be to use the system tables, specifically the system.access.audit table. https://docs.databricks.com/en/administration-guide/system-tables/index.html
