Hi, for some reason Azure Databricks doesn't show History if the data is saved with SparkR (2 in the figure below) or Sparklyr (3), but it does show it with Data Ingestion (0) or with PySpark (1). Is this a known bug or am I doing something wrong? Is ...
Hi @Sagas, Let’s address your questions regarding Azure Databricks, SparkR, and Sparklyr.
History in Azure Databricks:
Each operation that modifies a Delta Lake table creates a new table version. You can use history information to audit operation...
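For reference, a quick way to inspect that history from a notebook, regardless of which API wrote the data (the table name below is a placeholder):

from delta.tables import DeltaTable

# Placeholder table name; the history is the same whether the table was written from PySpark, SparkR, or sparklyr
dt = DeltaTable.forName(spark, "my_catalog.my_schema.my_table")
dt.history().select("version", "timestamp", "operation").show(truncate=False)

# Equivalent SQL: DESCRIBE HISTORY my_catalog.my_schema.my_table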
I am getting a connect timed out error when attempting to access a SQL Server. I can successfully ping the server from Databricks. I have used the JDBC connection and the included sqlserver driver, and both result in the same error. I have also attemp...
Can you run the following command in a notebook using the same cluster you are using to connect:
%sh
nc -vz <hostname> <port>
This test will confirm whether we are able to communicate with the SQL Server using the port you are defining to connect. If...
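If the port check succeeds, a minimal JDBC read like the sketch below can help separate driver issues from network issues (hostname, port, database, table, and credentials are placeholders):

# Placeholders: replace hostname, port, database, table, and credentials
jdbc_url = "jdbc:sqlserver://<hostname>:1433;databaseName=<database>"

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.<table>")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)
df.show(5)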
There are multiple tables in the config/metadata table. These tables need to be validated for DQ rules.
1. Natural Key / Business Key / Primary Key cannot be null or blank.
2. Natural Key / Primary Key cannot be duplicate.
3. Join columns missing values
4. Busine...
Hi @subha2, To dynamically validate the data quality (DQ) rules for tables configured in a metadata-driven system using PySpark, you can follow these steps:
Define Metadata for Tables:
First, create a metadata configuration that describes the rules ...
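A rough sketch of how such metadata-driven checks could run in PySpark (the metadata structure, table, and column names are illustrative assumptions):

from pyspark.sql import functions as F

# Illustrative metadata: each entry maps a table to its key columns (structure is an assumption)
dq_metadata = [
    {"table": "my_schema.customers", "key_columns": ["customer_id"]},
]

for cfg in dq_metadata:
    df = spark.table(cfg["table"])
    keys = cfg["key_columns"]

    # Rule 1: keys cannot be null or blank
    null_or_blank = df.filter(
        " OR ".join(f"({k} IS NULL OR trim({k}) = '')" for k in keys)
    ).count()

    # Rule 2: keys cannot be duplicated
    duplicates = df.groupBy(*keys).count().filter(F.col("count") > 1).count()

    print(cfg["table"], "null/blank keys:", null_or_blank, "duplicate key combinations:", duplicates)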
Hi Team, We intend to activate the job cluster around the clock. We consider the following parameters regarding cost: - Data volumes - Client SLA for job completion - Starting with a small cluster configuration. Please advise on any other options we s...
Hi @Phani1, When configuring a job cluster for 24/7 operation, it’s essential to consider cost, performance, and scalability.
Here are some recommendations based on your specified parameters:
Data Volumes:
Analyze your data volumes carefully. If...
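As an illustration, a small autoscaling job-cluster spec of the kind you might start with (node type, runtime version, and worker counts are assumptions to tune against your data volumes and SLA):

# Illustrative job-cluster spec for the Jobs API; all values are assumptions to tune
new_cluster = {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 2, "max_workers": 8},  # start small, let autoscaling absorb peaks
    "azure_attributes": {"availability": "SPOT_WITH_FALLBACK_AZURE"},  # spot instances to reduce cost
}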
Hello All, In my Databricks workflows, I have three tasks configured, with the final task set to run only if the condition "ALL_DONE" is met. During the first deployment, I observed that the dependency "ALL_DONE" was correctly assigned to the last tas...
I need to execute a .py file in Databricks from a notebook (with arguments, which for simplicity I exclude here). For this I am using:
%sh script.py
script.py:
from pyspark import SparkContext

def main():
    sc = SparkContext.getOrCreate()
    print(sc...
I got it eventually working with a combination of:
from databricks.sdk.runtime import *
spark.sparkContext.addPyFile("/path/to/your/file")
sys.path.append("path/to/your")
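Putting it together, a minimal sketch of how those pieces might look in a notebook cell (the module name and paths below are hypothetical):

import sys
from databricks.sdk.runtime import *  # exposes spark/dbutils inside a plain .py file

# Hypothetical paths and module name
spark.sparkContext.addPyFile("/Workspace/path/to/my_module.py")
sys.path.append("/Workspace/path/to")

import my_module
my_module.main()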
Dear all, I have a workflow with 2 tasks: one that does OPTIMIZE, followed by one that does VACUUM. I used a cluster with an F32s driver and F64s - 8 workers (auto-scaling enabled). All 8 workers are launched by Databricks as soon as OPTIMIZE starts. As ...
Hi! I want to migrate all my Databricks-related code from one GitHub repo to another. I knew this wouldn't be straightforward. When I copy my code for one DLT, I get the error org.apache.spark.sql.catalyst.ExtendedAnalysisException: Table 'vessel_batt...
Hi, We tried Delta Sharing to PBI, which worked fine, but we are facing issues while trying to apply row- and column-level filtering or data masking. It fails with an error that it's not supported. Can anyone please confirm if Delta Sharing with masking rules works w...
Hi @Anshul_DBX good day!
The issue you are encountering is due to a limitation in Delta Sharing. As per the provided information, Delta Sharing does not support row-level security or column masks. This means that you cannot apply row and column level...
Hi, I am facing issues when deploying workflows to a different environment. The same works for Notebooks and Scripts; when deploying the workflows, it fails with "Authorization Failed. Your token may be expired or lack the valid scope". Anything shoul...
Hi Sree, Good day!
Looking at the error message it seems like the token is expired. Could you please check if your PAT Token is valid? Have you created the PAT Token for the workspace that you are integrating with?
Regards,
Yesh
I was exploring the Unity Catalog option on a Databricks premium workspace. I understood that I need to create storage account credentials and an external connection in the workspace. Later, I can access the cloud data using 'abfss://storage_account_details'. I ...
Hey @Mailendiran, In Databricks, mounting storage to DBFS (Databricks File System) using the `abfss` protocol is a common practice for accessing data stored externally in Azure Blob Storage. While you typically use the full `abfss` path to access data...
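As a rough illustration of the full-path access pattern (the storage account, container, and path below are placeholders):

# Read a table directly with the full abfss:// URI (account, container, and path are placeholders)
df = spark.read.format("delta").load(
    "abfss://my-container@mystorageaccount.dfs.core.windows.net/path/to/table"
)
display(df)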
Is there a way that I can set up and configure a Databricks workflow job and tasks from Databricks CLI or API tools by using Python? Any help would be appreciated. #databricksworkflow #databricks
Hello and yes, you can set up and configure a Databricks workflow job and tasks using Databricks CLI or API tools with Python. Here are some resources and steps to guide you:
Create and run Databricks Jobs: This document: ( https://docs.databrick...
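As a concrete sketch, here is one way to create a simple job through the Jobs 2.1 REST API from Python (the workspace URL, token, notebook path, and cluster id are placeholders):

import requests

# Placeholders: replace with your workspace URL, a valid PAT, and real resources
host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

job_spec = {
    "name": "example-job",
    "tasks": [
        {
            "task_key": "main_task",
            "notebook_task": {"notebook_path": "/Workspace/path/to/notebook"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
}

# Create the job; the response contains the new job_id
resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(resp.json())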
Hi all! In our project, we're thinking about "Validation, Correction and Enrichment of Postal Addresses" with Databricks. For sure we'd need some kind of batch processing, because we have millions of addresses in our system. I'm aware of Address Valida...
Happy to help. Feel free to reach out https://www.linkedin.com/in/saleh-sultan-143ab036?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=android_app
Hi Team, Is there a particular reason why we should avoid using UDFs and instead convert to DataFrame code? Are there any restrictions or limitations (in terms of performance or governance) when using UDFs in Databricks? Regards, Janga
Hello, some of the things you need to take into consideration are: UDFs might introduce significant processing bottlenecks into code execution. Databricks uses a number of different optimizers automatically for code written with the included Apache Spark...
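To illustrate, a small sketch comparing a Python UDF with the equivalent built-in DataFrame functions (the data and column names are made up):

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

df = spark.createDataFrame([(" Alice ",), (" bob ",)], ["name"])

# Python UDF: each row is serialized to a Python worker, bypassing Catalyst optimizations
clean_udf = F.udf(lambda s: s.strip().upper(), StringType())
df.withColumn("name_clean", clean_udf("name")).show()

# Equivalent built-in functions: stay inside the optimized engine, no Python round-trip
df.withColumn("name_clean", F.upper(F.trim("name"))).show()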