Data Engineering

Forum Posts

by vinayaka_pallak • Visitor

yesterday

10 Views
0 replies
0 kudos

Pytest on Notebook

I am currently exploring testing methodologies for Databricks notebooks and would like to inquire whether it's possible to write pytest tests for notebooks that contain code not encapsulated within functions or classes.
a = 4
b ...
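One possible approach, sketched here under assumptions rather than taken from the thread: export the notebook as a plain `.py` source file and let pytest execute its top-level code with the standard library's `runpy.run_path`, then assert on the globals it leaves behind. The script contents, the file name `notebook_export.py`, and the helper `run_top_level` are hypothetical; only `a = 4` comes from the post.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path


def run_top_level(path):
    """Execute a script's top-level code and return the globals it defined."""
    return runpy.run_path(str(path))


# Hypothetical stand-in for a notebook exported as a .py source file.
SCRIPT = textwrap.dedent(
    """
    a = 4
    b = 6
    total = a + b
    """
)


def test_total_is_sum_of_inputs():
    # Write the stand-in script to a temporary file, run it, inspect the result.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "notebook_export.py"
        script.write_text(SCRIPT)
        result = run_top_level(script)
    assert result["total"] == 10
```

A real notebook usually relies on names such as `spark` or `dbutils` that only exist on a cluster. `runpy.run_path` accepts an `init_globals` dictionary, so a test could pass in stand-ins for those names, although whether that is practical depends on how much the notebook does with them.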

by Phani1 • Valued Contributor

Tuesday

141 Views
4 replies
0 kudos

Parallel execution of SQL cell in Databricks Notebooks

Hi Team, please provide guidance on enabling parallel execution of SQL cells in a notebook containing multiple SQL cells. Currently, when we run the notebook, all the SQL cells run sequentially. I would appreciate assistance on how to execute th...

delta

Latest Reply

Ajay-Pandey
Esteemed Contributor III

yesterday

0 kudos

Hi @Phani1, yes, you can achieve this with Databricks Workflows jobs, where you can create tasks and define dependencies between them.
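As an alternative to separate Workflow tasks, independent statements can also be submitted from a single Python cell on a thread pool. This is a different technique from the one suggested in the reply, and the sketch below is only illustrative: the helper `run_statements_in_parallel` and the sample statements are hypothetical, and it assumes the statements do not depend on each other.

```python
from concurrent.futures import ThreadPoolExecutor


def run_statements_in_parallel(statements, execute, max_workers=4):
    """Run each SQL statement through `execute` on a thread pool.

    Results come back in the same order as `statements`. Any exception
    raised by `execute` propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(execute, statements))


# Hypothetical statements; they must be independent of one another.
statements = [
    "select 1",
    "select 2",
    "select 3",
]

if __name__ == "__main__":
    # Stand-in executor so the sketch runs without a Spark session.
    results = run_statements_in_parallel(statements, lambda sql: sql.upper())
    print(results)
```

In a notebook, `execute` could be something like `lambda q: spark.sql(q).collect()`, so that each thread actually triggers its query instead of only building a lazy DataFrame.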

by Ameshj • Visitor

yesterday

98 Views
5 replies
0 kudos

Dbfs init script migration

I need help with migrating from DBFS on Databricks to the workspace. I am new to Databricks and am struggling with what is on the links provided. My workspace.yml also has dbfs hard-coded. Included is a full deployment with Great Expectations. This was don...

Azure Databricks

dbfs

Great expectations

python

Latest Reply

NandiniN
Valued Contributor II

yesterday

0 kudos

There's also this KB specific to init script migration - https://kb.databricks.com/clusters/migration-guidance-for-init-scripts-on-dbfs

by subha2 • New Contributor

Saturday

108 Views
2 replies
0 kudos

metadata driven DQ validation for multiple tables dynamically

There are multiple tables in the config/metadata table. These tables need to be validated for DQ rules.
1. Natural Key / Business Key / Primary Key cannot be null or blank.
2. Natural Key / Primary Key cannot be duplicate.
3. Join columns missing values
4. Busine...

Latest Reply

Kaniz
Community Manager

Monday

0 kudos

Hi @subha2, To dynamically validate the data quality (DQ) rules for tables configured in a metadata-driven system using PySpark, you can follow these steps: Define Metadata for Tables: First, create a metadata configuration that describes the rules ...
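A rough sketch of how the first two rules from the question (keys not null or blank, keys not duplicated) could be driven from metadata. It only generates SQL text, so it runs without a cluster. The function names, the metadata layout, and the table names are all hypothetical; real entries would come from the config table.

```python
def null_or_blank_check_sql(table, key_columns):
    """Return SQL counting rows where any key column is NULL or blank."""
    conditions = " OR ".join(
        f"{col} IS NULL OR TRIM(CAST({col} AS STRING)) = ''" for col in key_columns
    )
    return f"SELECT COUNT(*) AS failed_rows FROM {table} WHERE {conditions}"


def duplicate_key_check_sql(table, key_columns):
    """Return SQL counting key combinations that occur more than once."""
    keys = ", ".join(key_columns)
    return (
        f"SELECT COUNT(*) AS failed_rows FROM ("
        f"SELECT {keys} FROM {table} GROUP BY {keys} HAVING COUNT(*) > 1"
        f") AS dup"
    )


def build_checks(metadata):
    """Expand metadata entries into (table, rule, sql) tuples."""
    checks = []
    for entry in metadata:
        table, keys = entry["table"], entry["key_columns"]
        checks.append((table, "key_not_null_or_blank", null_or_blank_check_sql(table, keys)))
        checks.append((table, "key_unique", duplicate_key_check_sql(table, keys)))
    return checks


# Hypothetical metadata standing in for rows of the config table.
metadata = [
    {"table": "sales.orders", "key_columns": ["order_id"]},
    {"table": "sales.order_lines", "key_columns": ["order_id", "line_no"]},
]

for table, rule, sql in build_checks(metadata):
    print(table, rule, sql, sep=" | ")
```

On a cluster, each generated statement could be passed to `spark.sql(...)` and the `failed_rows` value collected into a results table. Table and column names are interpolated directly into the SQL, so this only suits metadata from a trusted source.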

by rt-slowth • Contributor

01-15-2024 12:07:53 AM

602 Views
6 replies
0 kudos

why the userIdentity is anonymous?

Do you know why the userIdentity is anonymous in AWS CloudTrail logs even though I have specified an instance profile?

Latest Reply

Kaniz
Community Manager

01-18-2024 1:27:52 AM

0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

by rt-slowth • Contributor

01-10-2024 6:33:50 PM

785 Views
4 replies
2 kudos

AutoLoader File notification mode Configuration with AWS

from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql import DataFrame, Column
from pyspark.sql.types import Row
import dlt

S3_PATH = 's3://datalake-lab/XXXXX/'
S3_SCHEMA = 's3://datalake-lab/XXXXX/schemas/'
...
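The rest of the post is cut off, so the following is only a guess at the kind of configuration the title refers to: a small helper that collects the Auto Loader options commonly used for file notification mode. The helper name, the `json` format, and the region value are assumptions, and the paths reuse the redacted values from the post.

```python
def autoloader_notification_options(file_format, schema_location, region):
    """Build Auto Loader options for file notification mode as a plain dict."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Switch from directory listing to file notification mode.
        "cloudFiles.useNotifications": "true",
        # AWS region of the source bucket and its notification resources.
        "cloudFiles.region": region,
    }


# Hypothetical values; the schema path is the redacted S3_SCHEMA from the post.
options = autoloader_notification_options(
    file_format="json",
    schema_location="s3://datalake-lab/XXXXX/schemas/",
    region="us-east-1",
)
print(options)

# On Databricks the dict could be applied like this (not runnable locally):
# spark.readStream.format("cloudFiles").options(**options).load("s3://datalake-lab/XXXXX/")
```

File notification mode also needs AWS permissions to set up and read the notification resources, which is where instance profile problems like those discussed in related threads tend to surface.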

Latest Reply

djhs
New Contributor III

Tuesday

2 kudos

Was this resolved? I am running into the same issue.

by jaredrohe • New Contributor II

10-26-2023 7:35:09 PM

1392 Views
4 replies
1 kudos

Instance Profiles Do Not Work with Delta Live Tables Default Cluster Policy Access Mode "Shared"

Hello, I am attempting to configure Auto Loader in file notification mode with Delta Live Tables. I configured an instance profile, but it is not working: I immediately get AWS access denied errors. This is the same issue that is referenced here...

Access Mode

Delta Live Tables

Instance Profiles

No Isolation Shared

Latest Reply

djhs
New Contributor III

Tuesday

1 kudos

Hi, I'm running into the same issue. Was this solved?

by Manzilla • Visitor

yesterday

36 Views
0 replies
0 kudos

Delta Live table - Adding streaming to existing table

Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream function. A daily batch job does some transformation on the bronze data and stores the results in the silver table. New process: Bronze is still the same. A stream has bee...

by qwerty1 • Contributor

03-23-2023 5:46:15 AM

2396 Views
5 replies
14 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Are there any plans to make this available? It would be super useful.

Latest Reply

source2sea
Contributor

02-10-2024 11:10:14 AM

14 kudos

I see Databricks Runtime 14 is out, but it is still on Scala 2.12. When does Databricks plan to support 2.13 or 3? Thank you.

by bamhn • New Contributor II

01-16-2023 6:52:07 PM

2335 Views
3 replies
2 kudos

My cluster can't access any tables in data catalogs

My goal is to have table access control in the data science and engineering workspace. So I enabled access control to my cluster using this config "spark.databricks.acl.dfAclsEnabled": "true" and my cluster is shown as Table ACLs enabled now (shield ...

Latest Reply

Karthik_Venu
New Contributor II

yesterday

2 kudos

Here is my use case: https://community.databricks.com/t5/data-engineering/structured-streaming-using-delta-as-source-and-delta-as-sink-and/td-p/67825
And I get this error: "py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.Datase...

by Karthik_Venu • New Contributor II

yesterday

56 Views
1 replies
0 kudos

Structured Streaming using Delta as Source and Delta as Sink and Delta tables are under unity catalo

Hello Everyone,
Here is my use case.
1. My source table (bronze Delta table) is under Unity Catalog and is a transaction (insert/update) table.
2. My target table (silver Delta table) is also under Unity Catalog.
3. On a daily basis I need to ingest the in...

Latest Reply

Karthik_Venu
New Contributor II

yesterday

0 kudos

I came across this article: "readStream() is not whitelisted error when running a query - Databricks". It states the solution as "You should use a cluster that does not have table access control enabled for streaming queries." However, the source and ta...

by daz • New Contributor III

07-26-2022 4:30:02 PM

3666 Views
9 replies
3 kudos

DLT managed by non-existent pipeline

I am building out a new DLT pipeline and have since had to rebuild it from scratch. Having deleted the old pipeline and constructed a new one, I now get this error: Table 'X' is already managed by pipeline 'Y'. As I only have the one pipeline how would...

Latest Reply

Shinaider777
Visitor

yesterday

3 kudos

Rename the function decorated with @dlt.table, for example:
@dlt.table(
    comment="exemple",
    table_properties={"exemple": "exemple"},
    partition_cols=["a", "b", "c"]
)
def modify_this_name():

by Kayla • Contributor

2 weeks ago

232 Views
5 replies
0 kudos

Errors When Using R on Unity Catalog Clusters

We are running into errors when running workflows with multiple jobs using the same notebook with different parameters. They are reading from tables we still have in hive_metastore; there are no Unity Catalog tables or functionality referenced anywhere. We'...

Latest Reply

mariusatkinson
Visitor

yesterday

0 kudos

Ah, I suspected that it might have something to do with fine-grained access control and an incompatibility between R and UC when it's configured that way. Obviously, if you don't, it's not that.

by MrD • Visitor

yesterday

31 Views
0 replies
0 kudos

Issue with autoscaling the cluster

Hi All, my job is breaking because the cluster is not able to autoscale. Below is the log. Could it be because the AWS VMs are not spinning up, or because of an issue with the Databricks configuration? Has anyone faced this before? TERMINATING Compute terminated. Reason:...

by Shazam • Visitor

yesterday

35 Views
0 replies
0 kudos

Ingestion time clustering - Initial load

As per the information available, ingestion time clustering makes use of the time a file is written or ingested in Databricks. In a use case where there is a new Delta table and an ETL job that runs on a schedule (say daily) inserting records, I am able to ...
