Data Engineering

Forum Posts

by Oliver_Angelil • Valued Contributor II

2m ago

0 Views
0 replies
0 kudos

Append-only table from non-streaming source in Databricks Delta Live Tables (DLT)

I have a DLT pipeline where all tables are non-streaming (materialized views) except for the last one, which needs to be append-only and is therefore defined as a streaming table. The pipeline runs successfully on the first run. However, on the seco...

Data Engineering

by Anske • New Contributor II

18m ago

8 Views
0 replies
0 kudos

DLT apply_changes applies only deletes and inserts not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest), using the following code: dlt.apply_changes( target = "cdctest", source = "cdctest_cdc_enriched", keys = ["ID"], sequence_by...

Data Engineering

Delta Live Tables

by zahra_Khedri • Visitor

2 hours ago

51 Views
1 reply
0 kudos

An error occurred when loading Jobs and Workflows App.

Hi, I was trying to open Workflows, but I got the error "An error occurred when loading Jobs and Workflows App." We need help understanding why it happened and how we can resolve it, please.

Data Engineering

Latest Reply

GeoPer
Visitor

2 hours ago

0 kudos

Same... and the weirdest part is that all of the services look healthy at https://status.databricks.com/
Region: eu-central-1
Provider: AWS
Could anyone provide some info here?

by stepysamud • Visitor

2 hours ago

43 Views
0 replies
0 kudos

Workflow UI broken after creating job via the api

Hi all, I'm in the process of migrating from Databricks on Azure to Databricks on AWS. One part of this is migrating all our workflows, which I wanted to do via the /api/2.1/jobs/create API with the workflow passed in the JSON body. I have successfully created...

Data Engineering

by madrhr • New Contributor

yesterday

68 Views
2 replies
1 kudos

SparkContext lost when running %sh script.py

I need to execute a .py file in Databricks from a notebook (with arguments, which for simplicity I exclude here). For this I am using:
%sh script.py
script.py:
from pyspark import SparkContext
def main():
    sc = SparkContext.getOrCreate()
    print(sc...

Data Engineering

%sh

.py

bash shell

SparkContext

SparkShell

Latest Reply

Yeshwanth
Contributor III

yesterday

1 kudos

@madrhr I think this occurs because one session is initiated within the Python script (.py file), while in the Databricks notebook, we have a pre-configured Spark session. It is important to note that we cannot use more than one Spark session per not...

by niruban • New Contributor

yesterday

25 Views
0 replies
0 kudos

Migrate a notebook that reside in workspace using Databricks Asset Bundle

Hello Community Folks - Has anyone implemented migration of notebooks that are in a workspace to a production Databricks workspace using Databricks Asset Bundle? If so, can you please point me to any documentation I can refer to? Thanks!! Regards, Niruban ...

Data Engineering

by deng_dev • New Contributor III

yesterday

130 Views
1 reply
0 kudos

Cached Views in MERGE INTO operation

Hi everyone! I want to use in-memory cached views in a MERGE INTO operation, but I am not entirely sure whether the exact cached in-memory view is used in this operation or not. So, suppose I have a table named table_1 and a cached view named cached_view_1...

Data Engineering

Latest Reply

shan_chandra
Honored Contributor III

yesterday

0 kudos

@deng_dev - Are you using an external metastore by any chance? From the physical plan, we can see that `catalog`.`db`.`table_1` is not cached. If it is a Glue catalog, then caching can be enabled using the configs in the article below: https://do...

by Anonymous • Not applicable

06-07-2021 10:50:07 AM

4838 Views
15 replies
8 kudos

Resolved! What are some best practices for CICD?

A number of people have questions on using Databricks in a productionalized environment. What are the best practices to enable CICD automation?

Data Engineering

Latest Reply

BaivabMohanty
New Contributor II

yesterday

8 kudos

Any leads/posts on Databricks CI/CD integration with Bitbucket Pipelines? I am facing the error below while creating my CI/CD pipeline:
pipelines:
  branches:
    master:
      - step:
          name: Deploy Databricks Changes
          image: docker:19.03.12
          services:
            - docker
          script:
            # U...

by Sambit_S • New Contributor II

yesterday

40 Views
0 replies
0 kudos

Error during deserializing protobuf data

I am receiving protobuf data in a JSON attribute, and along with it I receive a descriptor file. I am using from_protobuf to deserialize the data as below. It works most of the time, but gives an error when there are some recursive fields within the protob...

Data Engineering

by drag7ter • New Contributor II

yesterday

514 Views
2 replies
0 kudos

Resolved! Not able to set run_as service_principal_name

I'm trying to run:
databricks bundle deploy -t prod --profile PROD_Service_Principal
My bundle looks like this:
bundle:
  name: myproject
include:
  - resources/jobs/bundles/*.yml
targets:
  # The 'dev' target, for development purposes. This target is the de...

Data Engineering

Latest Reply

drag7ter
New Contributor II

yesterday

0 kudos

In my case, I replaced the alias PROD_Service_Principal with the ID c250831b-5a2a-4461-a855-83b9102f797e and it works. Not intuitive; this is probably a bug in the CLI or bundles.
service_principal_name: c250831b-5a2a-4461-a855-83b9102f797e

by YannLevavasseur • New Contributor

yesterday

141 Views
0 replies
0 kudos

SQL function refactoring into Databricks environment

Hello all, I'm currently working on importing some SQL functions from an Informix database into Databricks, using an Asset Bundle that deploys Delta Live Tables to Unity Catalog. I'm struggling to import a recursive one. Here is the code: CREATE FUNCTION "info...

Data Engineering

by EhsanSaba • New Contributor

yesterday

214 Views
0 replies
0 kudos

RocksDB results in empty stream/stream joins dataframe

Since we enabled RocksDB in our spark.conf, stream-to-stream joins/unions result in an empty DataFrame. Does anyone else have the same experience? It is on AWS.
spark.conf.set("spark.sql.streaming.stateStore.providerClass","com.databricks.sql.streaming...

Data Engineering

by RakeshRakesh_De • New Contributor III

a week ago

392 Views
7 replies
0 kudos

Spark CSV file read option to read blank/empty value from file as empty value only instead Null

Hi, I am trying to read a file that has some blank values in a column. We know Spark converts blank values to null when reading; how can I read blank/empty values as empty values instead? Tried DBR 13.2 and 14.3. I have tried every possible way, but it's not w...

Data Engineering

csv

EmptyValue

FileRead

Latest Reply

-werners-
Esteemed Contributor III

yesterday

0 kudos

OK, after some tests: the trick is to surround text in your CSV with quotes. That way Spark can actually tell the difference between a missing value and an empty value. Missing values are null and can only be converted to something else implicitel...

by ajbush • New Contributor III

01-26-2023 5:33:23 PM

9204 Views
6 replies
2 kudos

Connecting to Snowflake using an SSO user from Azure Databricks

Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflake T...

Data Engineering

Latest Reply

aagarwal
New Contributor

yesterday

2 kudos

@ludgervisser We are trying to connect to Snowflake as an Azure AD user through the externalbrowser method, but the browser window doesn't open. Could you please share example code showing how you managed to achieve this, or point us to some documentation? @BobGeo...

by Brad • Contributor

Tuesday

100 Views
2 replies
0 kudos

Pushdown in Postgres

Hi team, in Databricks I need to query a Postgres source like
select * from postgres_tbl where id in (select id from df)
The df comes from a Hive table. If I use the JDBC driver and do
query = '(select * from postgres_tbl) as t'
src_df = spark.read.format(...

Data Engineering

Latest Reply

Brad
Contributor

Tuesday

0 kudos

Thanks for the response. I cannot do that, as we load incrementally from the source very frequently. We cannot read the full data each time.
