Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ChristianRRL
by Valued Contributor
  • 643 Views
  • 3 replies
  • 2 kudos

Databricks UMF Best Practice

Hi there, I would like to get some feedback on the ideal/suggested ways to get UMF data from our Azure cloud into Databricks. For context, UMF can mean either: User Managed File or User Maintained File. Basically, a UMF could be something like a si...

Data Engineering
Data ingestion
UMF
User Maintained File
User Managed File
Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

I am not an expert on this topic or Azure services, but I did some research and have some suggested courses of action for you to test out. To address your request for suggested ways to get User Managed Files (UMF) from Azure into Databricks, here are...
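(The reply above is cut off in this listing. As a rough illustration of one common pattern for landing UMF-style file drops from Azure storage into Databricks, not necessarily the approach the reply goes on to describe, here is a minimal Auto Loader sketch; the container path, schema/checkpoint locations, and target table are placeholders.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical ADLS landing path for UMF drops; replace with your container.
    umf_path = "abfss://umf-landing@<storage-account>.dfs.core.windows.net/finance/"

    (
        spark.readStream.format("cloudFiles")          # Auto Loader
        .option("cloudFiles.format", "csv")            # assuming the UMFs are CSVs
        .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/umf")
        .option("header", "true")
        .load(umf_path)
        .writeStream
        .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/umf")
        .trigger(availableNow=True)                    # incremental, batch-style run
        .toTable("main.bronze.umf_raw")                # hypothetical bronze table
    )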

2 More Replies
ChristianRRL
by Valued Contributor
  • 392 Views
  • 3 replies
  • 4 kudos

Resolved! toml file syntax highlighting

Hi there, I'm curious if there's a way for Databricks to support syntax highlighting for a language that is currently not supported in our DBX configuration. For example, I'm using .toml files, but Databricks doesn't understand them and displays them as ...

Latest Reply
Advika
Databricks Employee
  • 4 kudos

Hello @ChristianRRL! Sorry for the delayed response. Databricks currently does not support syntax highlighting for .toml files. As a workaround, you can edit toml files in external editors like VS Code (with plugins) and sync them to Databricks using...

2 More Replies
georgef
by New Contributor III
  • 4604 Views
  • 3 replies
  • 2 kudos

Resolved! Cannot import relative python paths

Hello, some variations of this question have been asked before, but there doesn't seem to be an answer for the following simple use case. I have the following file structure in a Databricks Asset Bundles project: src --dir1 ----file1.py --dir2 ----file2...

Latest Reply
klaas
New Contributor II
  • 2 kudos

This works as long as the script calling the module is indeed __main__; I've changed it a bit to make it more generic:

    import os
    import sys

    def find_module(path):
        while path:
            if os.path.basename(path) == "src":
                return path
            ...
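(The snippet above is truncated in this listing. A plausible completion of that helper, walking up the directory tree until a "src" folder is found and putting it on sys.path, might look like the sketch below; everything past the visible fragment is an assumption, not the replier's exact code.)

    import os
    import sys

    def find_module(path):
        # Walk up from the given path until a directory named "src" is found.
        while path:
            if os.path.basename(path) == "src":
                return path
            parent = os.path.dirname(path)
            if parent == path:      # reached the filesystem root without finding "src"
                return None
            path = parent

    # Assumed usage: make packages under src importable from any script in the bundle.
    src_root = find_module(os.path.abspath(__file__))
    if src_root and src_root not in sys.path:
        sys.path.insert(0, src_root)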

2 More Replies
lsrinivas2k13
by New Contributor II
  • 427 Views
  • 3 replies
  • 0 kudos

Not able to run Python script even after everything is in place in Azure Databricks

Getting the below error while running a Python script which connects to an Azure SQL DB: Database connection error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)") Can some on...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

The error occurs because the Microsoft ODBC Driver 17 for SQL Server is missing on your Azure Databricks cluster. Here's how to fix it. Steps to resolve: Step 1: Create an init script to install the ODBC driver. 1. Create a file named `odbc-install.sh` with...
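(The step-by-step reply is truncated above. Once an init script has installed the driver on the cluster, a quick way to confirm it is visible, and then connect, is sketched below; server, database, and credentials are placeholders.)

    import pyodbc

    # The init script should make this list include 'ODBC Driver 17 for SQL Server'.
    print(pyodbc.drivers())

    # Hypothetical connection once the driver is present.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=<your-server>.database.windows.net,1433;"
        "DATABASE=<your-db>;UID=<user>;PWD=<password>;"
        "Encrypt=yes;TrustServerCertificate=no;"
    )
    print(conn.cursor().execute("SELECT 1").fetchone())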

2 More Replies
NikosLoutas
by New Contributor II
  • 366 Views
  • 3 replies
  • 2 kudos

Resolved! Materialized Views Compute

When creating a Materialized View (MV) without a schedule, there seems to be a cost associated with the MV once it is created, even if it is not queried. The question is: once the MV is created, is there already a "hot" compute ready for use in case a...

Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

Please select "Accept as Solution" so that others can benefit from this exchange.  Regards, Louis.

2 More Replies
ak5har
by New Contributor II
  • 690 Views
  • 7 replies
  • 2 kudos

Databricks connection to on-prem Cloudera

Hello, we are trying to evaluate a Databricks solution to extract the data from an existing Cloudera schema hosted on a physical server. We are using the Databricks serverless compute provided by the Databricks Express setup, and we assume we will not need t...

Latest Reply
lorenzo1889
New Contributor II
  • 2 kudos

We are in the same situation. We have a CDH cluster with an IaaS architecture. The data are on HDFS on EC2 disks in AWS, and we want to migrate the data from CDH to Databricks in Azure. If we federate CDH's Hive metastore with Databricks, we can migrate t...

6 More Replies
noname123
by New Contributor III
  • 5364 Views
  • 2 replies
  • 0 kudos

Resolved! Delta table version protocol

I do: df.write.format("delta").mode("append").partitionBy("timestamp").option("mergeSchema", "true").save(destination) If the table doesn't exist, it creates a new table with "minReaderVersion":3,"minWriterVersion":7. Yesterday it was creating the table with "min...

Latest Reply
AddBox45
New Contributor II
  • 0 kudos

Hello, how did you fix this explicitly? How did you enable/disable the auto-enable deletion vectors setting to write again with minReaderVersion 1 and minWriterVersion 2?
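(For what it's worth, a minimal sketch of one thing to try, assuming the protocol bump comes from deletion vectors being auto-enabled on new tables; verify the resulting protocol on your DBR version before relying on it.)

    # Turn off the session default that auto-enables deletion vectors so that newly
    # created tables stay on the older protocol (minReaderVersion 1 / minWriterVersion 2).
    spark.conf.set(
        "spark.databricks.delta.properties.defaults.enableDeletionVectors", "false"
    )

    # df and destination as in the original post.
    (
        df.write.format("delta")
        .mode("append")
        .partitionBy("timestamp")
        .option("mergeSchema", "true")
        .save(destination)
    )

    # For a table that already exists, the property is set with SQL instead, e.g.:
    # ALTER TABLE delta.`<path>` SET TBLPROPERTIES (delta.enableDeletionVectors = false)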

1 More Replies
kmodelew
by New Contributor III
  • 287 Views
  • 2 replies
  • 1 kudos

Do I need many wheels for each job in a project?

I have a project with my commons, like a SparkSession object (to run code in PyCharm using the Databricks Connect library and the same code directly on Databricks). I have under src a few packages from which DAB creates separate jobs. I'm using PyCharm. S...

Latest Reply
kmodelew
New Contributor III
  • 1 kudos

Hi, I hope this will be useful. Here are my files: project structure -> DAB_project_structure.png; each yml file for job definitions -> task_group_1_job.png and task_group_2_job.png. Each .py file has a main() method. setup.py: description="wheel file based ...
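(The setup.py contents are cut off above. A minimal sketch of what a single shared wheel for a src/ layout could look like is below; the project name and dependencies are placeholders, not the poster's actual file.)

    from setuptools import setup, find_packages

    setup(
        name="my_dab_project",                     # placeholder project name
        version="0.1.0",
        description="wheel file based project shared by all jobs in the bundle",
        package_dir={"": "src"},                   # packages live under src/
        packages=find_packages(where="src"),
        install_requires=[],                       # add whatever your commons need
    )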

1 More Replies
jeremy98
by Contributor III
  • 336 Views
  • 2 replies
  • 0 kudos

how to install the package using --index-url

Hi community, I created a job using a Databricks asset bundle, but I'm worried about how to install this dependency the right way, because when I was testing the related job it seems it doesn't install the torch library properly.

Latest Reply
jeremy98
Contributor III
  • 0 kudos

I tried to do it manually and it works, but through the Databricks asset bundle it doesn't. In the end I used: dependencies: - torch==2.5.1 - --index-url https://download.pytorch.org/whl/cpu It says: Error: file doesn't exi...

1 More Replies
pratik21
by New Contributor II
  • 7095 Views
  • 4 replies
  • 1 kudos

Unexpected error while calling Notebook string matching regex `\$[\w_]+' expected but `M' found

Run result unavailable: job failed with error message INVALID_PARAMETER_VALUE: Failed to parse %run command: string matching regex `\$[\w_]+' expected but `M' found) Stacktrace: /Notebookpath: scala. To call the notebook we are using dbutils.notebook.run("N...

Latest Reply
thedeadturtle
New Contributor II
  • 1 kudos

Since you're using dbutils.notebook.run() properly now, the issue is not in your current notebook, but actually in the target notebook you're calling. Specifically, Databricks is trying to parse a %run command in that notebook, and it's hitting a synt...
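(The reply is truncated above. For reference, a hedged example of passing parameters through dbutils.notebook.run so nothing has to be interpolated into a %run line in the target notebook; the path and parameter names are placeholders.)

    # In the calling notebook (dbutils is available on Databricks):
    result = dbutils.notebook.run(
        "/Workspace/Shared/NotebookName",            # placeholder target path
        600,                                         # timeout in seconds
        {"run_date": "2024-01-01", "env": "dev"},    # parameters passed as strings
    )
    print(result)

    # In the target notebook, read the parameters with widgets, e.g.:
    # run_date = dbutils.widgets.get("run_date")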

3 More Replies
Vasu_Kumar_T
by New Contributor II
  • 105 Views
  • 1 replies
  • 0 kudos

BladeBridge Analyzer out of memory issue

We are running the BladeBridge Analyzer and we are running out of memory. We tried to increase the RAM and it still gives the same error. We cannot run the analyzer against a subset of metadata, as it would not generate a comprehensive report with how th...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi Vasu_Kumar_T, how are you doing today? As per my understanding, running out of memory with BladeBridge Analyzer can be tough, especially when you're working with large and complex metadata where you need the full picture. Even if you've increased ...

patacoing
by New Contributor II
  • 302 Views
  • 1 replies
  • 1 kudos

Medallion architecture

Hello, I have an S3 data lake with, in it, a structure of files of different formats: JSON, CSV, text, binary, ... Would you consider this my bronze layer, or a "pre-bronze" layer, since it can't be processed directly by Spark (because of d...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi patacoing, how are you doing today? As per my understanding, the structure you described in your S3 data lake sounds more like a "pre-bronze" layer, since the files are in mixed formats (JSON, CSV, text, binary), which makes it tricky to process t...

jeremy98
by Contributor III
  • 1003 Views
  • 9 replies
  • 0 kudos

Resolved! Error Databricks Bundle Deploy with changes in the wheel file

Hello Community, suddenly I have an error: when I deploy the new bundle to Databricks after changing the Python script, the cluster continues to point to an old version of the .py script uploaded from the Databricks asset bundle. Why is this?

Latest Reply
denis-dbx
Databricks Employee
  • 0 kudos

We've added a solution for this problem in v0.245.0. There is an opt-in "dynamic_version: true" flag on the artifact to enable automated wheel patching that breaks the cache (Example). Once set, "bundle deploy" will transparently patch the version suffix in the ...

8 More Replies
Tommabip
by New Contributor II
  • 221 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Cluster Policies

Hi, I'm trying to create a Terraform script that does the following: create a policy where I specify env variables and libraries, and create a cluster that inherits from that policy and uses the env variables specified in the policy. I saw in the docume...

Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy...

2 More Replies
Alex_Persin
by New Contributor III
  • 6420 Views
  • 6 replies
  • 8 kudos

How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64 MB, which is too small to ...

Latest Reply
stevewb
New Contributor II
  • 8 kudos

Bump again... does anyone have a solution for this?

5 More Replies
