Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

6502
by New Contributor III
  • 857 Views
  • 1 reply
  • 0 kudos

Delete on streaming table and starting startingVersion

I deleted some records from a streaming table by mistake, and of course, the streaming job stopped working. So I restored the table to the version before the delete was done, and attempted to restart the job using startingVersion set to the new vers...

Latest Reply
raphaelblg
Honored Contributor II
  • 0 kudos

Hello @6502, It appears you've used the `startingVersion` parameter in your streaming query, which causes the stream to begin processing data from the version prior to the DELETE operation version. However, the DELETE operation will still be processe...

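(For reference, a minimal sketch of restarting the stream after a RESTORE. The table name, restored version number, and checkpoint path are placeholders; skipChangeCommits is available on recent runtimes, with ignoreDeletes/ignoreChanges as older equivalents, and a fresh checkpoint is typically needed so pre-restore offsets are not replayed.)

```python
# A sketch, not the thread's exact code: table names, version and checkpoint are placeholders.
# `spark` is the ambient SparkSession in a Databricks notebook.

restored_version = 5  # assumed: the table version produced by RESTORE

stream = (
    spark.readStream.format("delta")
    .option("startingVersion", restored_version)
    # Skip non-append commits (the DELETE/RESTORE) instead of failing on them;
    # on older runtimes use ignoreDeletes / ignoreChanges instead.
    .option("skipChangeCommits", "true")
    .table("my_catalog.my_schema.events")
)

(
    stream.writeStream
    # New checkpoint so offsets recorded before the restore are not replayed.
    .option("checkpointLocation", "/tmp/checkpoints/events_restarted")
    .toTable("my_catalog.my_schema.events_out")
)
```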
nilton
by New Contributor II
  • 741 Views
  • 2 replies
  • 0 kudos

Query table based on table_name from information_schema

Hi, I have one table whose name changes every 60 days. The name simply increments the version number, for example: first 60 days: table_name_v1; after 60 days: table_name_v2, and so on. What I want is to query the table whose name is returned in the que...

Latest Reply
radothede
New Contributor III
  • 0 kudos

The simplest way would probably be using spark.sql: %py tbl_name = 'table_v1' df = spark.sql(f'select * from {tbl_name}') display(df) From there, you can simply create a temporary view: %py df.createOrReplaceTempView('table_act') and query it using SQL st...

1 More Replies
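(A hedged sketch of resolving the current table name from information_schema before querying it; the catalog, schema, and table_name_v prefix are placeholders, and the numeric suffix is parsed so that v10 sorts after v9.)

```python
# Resolve the newest versioned table name from information_schema, then query it.
# `spark` is the ambient SparkSession; names below are placeholders.
latest = spark.sql("""
    SELECT table_name
    FROM my_catalog.information_schema.tables
    WHERE table_schema = 'my_schema'
      AND table_name RLIKE '^table_name_v[0-9]+$'
    ORDER BY CAST(regexp_extract(table_name, '_v([0-9]+)$', 1) AS INT) DESC
    LIMIT 1
""").first()["table_name"]

df = spark.sql(f"SELECT * FROM my_catalog.my_schema.{latest}")
df.createOrReplaceTempView("table_act")  # downstream SQL cells can now query table_act
```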
rt-slowth
by Contributor
  • 1942 Views
  • 5 replies
  • 2 kudos

AutoLoader File notification mode Configuration with AWS

   from pyspark.sql import functions as F from pyspark.sql import types as T from pyspark.sql import DataFrame, Column from pyspark.sql.types import Row import dlt S3_PATH = 's3://datalake-lab/XXXXX/' S3_SCHEMA = 's3://datalake-lab/XXXXX/schemas/' ...

Latest Reply
djhs
New Contributor III
  • 2 kudos

Was this resolved? I'm running into the same issue.

4 More Replies
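(Since the original code is truncated above, here is a minimal sketch of an Auto Loader source in file notification mode inside a DLT pipeline; the bucket path is the placeholder from the post, the file format and region are assumptions, and the IAM/instance-profile setup the thread is really about is not shown.)

```python
import dlt
from pyspark.sql import functions as F

S3_PATH = "s3://datalake-lab/XXXXX/"  # placeholder path, as in the original post

@dlt.table(comment="Raw files ingested with Auto Loader in file notification mode")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")               # assumption: JSON source files
        # File notification mode: Databricks provisions SNS/SQS, which needs IAM permissions.
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.region", "us-east-1")           # assumption: bucket region
        .load(S3_PATH)
        .withColumn("_ingested_at", F.current_timestamp())
    )
```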
185369
by New Contributor II
  • 1519 Views
  • 4 replies
  • 1 kudos

Resolved! DLT with UC Access Denied sqs

I am going to use the newly released DLT with UC, but it keeps getting access denied. As I keep tracking down the cause, it seems that an account ID other than my account ID or the Databricks account ID is being requested. I cannot use '*' in the principal attri...

Latest Reply
Priyag1
Honored Contributor II
  • 1 kudos

Every AWS service in your stack, including the SQS queue and every other service using that queue, starts out with minimal permissions, which leads to access issues like this. So, make sure you get your IAM policies set up correctly before deploying to producti...

3 More Replies
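(To illustrate the reply's point about least-privilege IAM, a rough sketch of a narrow SQS statement the pipeline's role might carry, written as a Python dict; the queue ARN is a placeholder and the exact action list Databricks needs for file notifications should be taken from its documentation.)

```python
import json

queue_arn = "arn:aws:sqs:us-east-1:123456789012:databricks-auto-ingest-example"  # placeholder

sqs_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAutoLoaderQueueAccess",
            "Effect": "Allow",
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes",
            ],
            "Resource": queue_arn,
        }
    ],
}

print(json.dumps(sqs_policy, indent=2))  # adapt into the role's IAM policy
```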
brianbraunstein
by New Contributor II
  • 515 Views
  • 1 reply
  • 0 kudos

spark.sql not supporting kwargs as documented

This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work: display...

Latest Reply
brianbraunstein
New Contributor II
  • 0 kudos

Ok, it looks like Databricks might have broken this functionality shortly after it came out: https://community.databricks.com/t5/data-engineering/parameterized-spark-sql-not-working/m-p/57969/highlight/true#M30972

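(For context, the documented pattern from the linked pyspark.sql.SparkSession.sql page looks roughly like the sketch below; whether it actually works depends on the runtime version, which is what the thread is about. Table and parameter names are illustrative.)

```python
# `spark` is the ambient SparkSession in a Databricks notebook.
people = spark.createDataFrame([("alice", 34), ("bob", 41)], ["name", "age"])

# A DataFrame passed as a kwarg is referenced as {df} inside the query text,
# and named parameter markers (:min_age) are bound through the args dict.
result = spark.sql(
    "SELECT name FROM {df} WHERE age > :min_age",
    args={"min_age": 35},
    df=people,
)
result.show()
```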
QuantumFries
by New Contributor II
  • 1251 Views
  • 4 replies
  • 3 kudos

Change {{job.start_time.[iso_date]}} Timezone

I am trying to schedule some jobs using workflows and leveraging dynamic variables. One caveat is that when I try to use {{job.start_time.[iso_date]}} it seems to default to UTC; is there a way to change it?

Latest Reply
artsheiko
Valued Contributor III
  • 3 kudos

Hi, all the dynamic values are in UTC (documentation). Maybe you can use code like the one presented below and pass the variables between tasks (see Share information between tasks in a Databricks job)? %python from datetime import datetime, timed...

3 More Replies
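(A sketch of the workaround hinted at in the reply: receive the UTC start time as a job parameter, convert it in Python, and hand the result to downstream tasks via task values. The parameter name and target timezone are assumptions.)

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed: a job/task parameter named "start_time" is wired to a UTC dynamic value
# such as the job start time in ISO form; adjust the name to your job definition.
start_utc = dbutils.widgets.get("start_time")

parsed = datetime.fromisoformat(start_utc.replace("Z", "+00:00"))
if parsed.tzinfo is None:
    # The dynamic values are UTC, so treat a naive timestamp as UTC.
    parsed = parsed.replace(tzinfo=timezone.utc)

# Convert to the desired timezone (placeholder below).
local_start = parsed.astimezone(ZoneInfo("America/New_York"))

# Share the converted value with downstream tasks in the same job run.
dbutils.jobs.taskValues.set(key="start_time_local", value=local_start.isoformat())
```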
Abbe
by New Contributor II
  • 1554 Views
  • 2 replies
  • 0 kudos

Update data type of a column within a table that has a GENERATED ALWAYS AS IDENTITY-column

I want to cast the data type of a column "X" in a table "A" where column "ID" is defined as GENERATED ALWAYS AS IDENTITY. Databricks refers to overwrite to achieve this: https://docs.databricks.com/delta/update-schema.html. The following operation: (spar...

Latest Reply
RajuBolla
New Contributor II
  • 0 kudos

UPDATE is not working, but DELETE is, when I changed it to the DEFAULT property. AnalysisException: UPDATE on IDENTITY column "XXXX_ID" is not supported.

1 More Replies
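(The docs-referenced overwrite approach looks roughly like this sketch; the table, column, and target type are placeholders. One assumption worth verifying first: whether a GENERATED ALWAYS AS IDENTITY column survives the overwrite with its definition intact, since, as the thread shows, identity columns reject direct UPDATEs. Test on a copy before touching real data.)

```python
from pyspark.sql import functions as F

# Read the table, cast column X to the new type, and overwrite with the new schema.
df = spark.read.table("A").withColumn("X", F.col("X").cast("decimal(18,2)"))

(
    df.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("A")
)
```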
Phani1
by Valued Contributor
  • 1241 Views
  • 4 replies
  • 0 kudos

Parallel execution of SQL cell in Databricks Notebooks

Hi team, please provide guidance on enabling parallel execution of SQL cells in a notebook containing multiple SQL cells. Currently, when we execute the notebook, all the SQL cells run sequentially. I would appreciate assistance on how to execute th...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Phani1, yes, you can achieve this with Databricks Workflow jobs, where you can create tasks and define dependencies between them.

3 More Replies
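(The reply's suggestion is Workflow tasks; as an alternative sketch, if the goal is parallelism inside a single notebook, the SQL can be issued from Python threads instead. Statement texts and table names are placeholders.)

```python
from concurrent.futures import ThreadPoolExecutor

statements = [
    "SELECT COUNT(*) FROM my_schema.orders",
    "SELECT COUNT(*) FROM my_schema.customers",
    "SELECT COUNT(*) FROM my_schema.payments",
]

def run(sql: str):
    # collect() forces execution, so the queries really run concurrently on the cluster.
    return sql, spark.sql(sql).collect()

with ThreadPoolExecutor(max_workers=len(statements)) as pool:
    for sql, rows in pool.map(run, statements):
        print(sql, "->", rows)
```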
subha2
by New Contributor II
  • 621 Views
  • 2 replies
  • 0 kudos

metadata driven DQ validation for multiple tables dynamically

There are multiple tables in the config/metadata table. These tables need to be validated against DQ rules: 1. Natural Key / Business Key / Primary Key cannot be null or blank. 2. Natural Key / Primary Key cannot be duplicated. 3. Join columns missing values. 4. Busine...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @subha2, To dynamically validate the data quality (DQ) rules for tables configured in a metadata-driven system using PySpark, you can follow these steps: Define Metadata for Tables: First, create a metadata configuration that describes the rules ...

1 More Replies
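(A minimal sketch of the metadata-driven approach for the first two rules, null/blank keys and duplicate keys; the config entries, table names, and key columns are placeholders.)

```python
from pyspark.sql import functions as F

# Hypothetical metadata: one entry per table with the key columns to validate.
dq_config = [
    {"table": "my_schema.customers", "keys": ["customer_id"]},
    {"table": "my_schema.orders", "keys": ["order_id", "customer_id"]},
]

results = []
for cfg in dq_config:
    df = spark.read.table(cfg["table"])

    # Rule 1: key columns must not be null or blank.
    is_bad = None
    for k in cfg["keys"]:
        cond = F.col(k).isNull() | (F.trim(F.col(k).cast("string")) == "")
        is_bad = cond if is_bad is None else (is_bad | cond)
    null_or_blank = df.filter(is_bad).count()

    # Rule 2: the key combination must be unique.
    duplicate_keys = (
        df.groupBy(*cfg["keys"]).count().filter(F.col("count") > 1).count()
    )

    results.append((cfg["table"], null_or_blank, duplicate_keys))

display(spark.createDataFrame(results, ["table", "null_or_blank_keys", "duplicate_keys"]))
```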
rt-slowth
by Contributor
  • 1279 Views
  • 6 replies
  • 0 kudos

why the userIdentity is anonymous?

Do you know why the userIdentity is anonymous in the AWS CloudTrail logs even though I have specified an instance profile?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?This...

5 More Replies
qwerty1
by Contributor
  • 3322 Views
  • 5 replies
  • 14 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Any plans on making this available? It would be super useful.

Latest Reply
source2sea
Contributor
  • 14 kudos

I see DB runtime 14 is out, but it's still on 2.12. When does Databricks plan to support 2.13 or 3? Thank you.

4 More Replies
bamhn
by New Contributor II
  • 3556 Views
  • 3 replies
  • 2 kudos

My cluster can't access any tables in data catalogs

My goal is to have table access control in the data science and engineering workspace. So I enabled access control to my cluster using this config "spark.databricks.acl.dfAclsEnabled": "true" and my cluster is shown as Table ACLs enabled now (shield ...

Latest Reply
Karthik_Venu
New Contributor II
  • 2 kudos

Here is my use case: https://community.databricks.com/t5/data-engineering/structured-streaming-using-delta-as-source-and-delta-as-sink-and/td-p/67825 and I get this error: "py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.Datase...

2 More Replies
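(For reference, the legacy hive_metastore table-ACL setup usually pairs the flag quoted in the post with a language restriction in the cluster Spark config; a sketch of those entries as a Python dict, under the assumption that legacy table ACLs rather than Unity Catalog are in play. Individual tables are then exposed with GRANT statements.)

```python
# Cluster Spark conf for legacy table access control, e.g. in a Clusters API payload.
spark_conf = {
    "spark.databricks.acl.dfAclsEnabled": "true",
    # Table ACL clusters restrict the REPL to Python and SQL.
    "spark.databricks.repl.allowedLanguages": "python,sql",
}

# Access is then granted per object in SQL, for example:
# GRANT SELECT ON TABLE my_schema.my_table TO `user@example.com`;
```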
Karthik_Venu
by New Contributor II
  • 354 Views
  • 1 reply
  • 0 kudos

Structured Streaming using Delta as Source and Delta as Sink and Delta tables are under Unity Catalog

Hello everyone, here is my use case. 1. My source table (bronze Delta table) is under Unity Catalog and is a transaction (insert/update) table. 2. My target table (silver Delta table) is also under Unity Catalog. 3. On a daily basis I need to ingest the in...

Latest Reply
Karthik_Venu
New Contributor II
  • 0 kudos

I came across this article: "readStream() is not whitelisted error when running a query - Databricks". It states the solution as "You should use a cluster that does not have table access control enabled for streaming queries." However, the source and ta...

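(A minimal sketch of the bronze-to-silver streaming read/write against UC tables; the catalog, schema, table names, and checkpoint path are placeholders. Because the bronze table also receives updates, the stream would additionally need skipChangeCommits or a change-data-feed read, which is omitted here, and the table-ACL cluster limitation from the linked article still applies.)

```python
# `spark` is the ambient SparkSession; names below are placeholders.
bronze = spark.readStream.table("main.sales.bronze_transactions")

query = (
    bronze.writeStream
    .option("checkpointLocation", "/Volumes/main/sales/checkpoints/silver_transactions")
    # availableNow processes whatever is new and then stops, convenient for a daily job.
    .trigger(availableNow=True)
    .toTable("main.sales.silver_transactions")
)
query.awaitTermination()
```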
daz
by New Contributor III
  • 4937 Views
  • 9 replies
  • 3 kudos

DLT managed by non-existent pipeline

I am building out a new DLT pipeline and have since had to rebuild it from scratch. Having deleted the old pipeline and constructed a new one, I now get this error: Table 'X' is already managed by pipeline 'Y'. As I only have the one pipeline, how would...

Latest Reply
Shinaider777
New Contributor II
  • 3 kudos

Rename the function decorated with @dlt.table, for example: @dlt.table(comment="example", table_properties={"example": "example"}, partition_cols=["a", "b", "c"]) def modify_this_name(): ...

8 More Replies
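(Expanded into a runnable shape, the reply's suggestion looks roughly like this; the explicit name, table properties, and source table are placeholders. Renaming the function, or setting name= explicitly, changes the dataset identity that was clashing with the old pipeline.)

```python
import dlt

@dlt.table(
    name="my_table_v2",                       # placeholder: a name not claimed elsewhere
    comment="example",
    table_properties={"quality": "silver"},   # placeholder property
    partition_cols=["a", "b", "c"],
)
def modify_this_name():
    return spark.read.table("my_schema.source_table")  # placeholder source
```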
Kayla
by Contributor III
  • 1430 Views
  • 5 replies
  • 0 kudos

Errors When Using R on Unity Catalog Clusters

We are running into errors when running workflows with multiple jobs using the same notebook with different parameters. They are reading from tables we still have in hive_metastore; there are no Unity Catalog tables or functionality referenced anywhere. We'...

Latest Reply
mariusatkinson
New Contributor II
  • 0 kudos

Ah, I suspected it might have something to do with fine-grained access control and an incompatibility between R and UC when it's configured that way. Obviously, if you don't configure it like that, it's not that.

4 More Replies