Data Engineering

Forum Posts

by amitkmaurya • Visitor

14 hours ago

41 Views
1 reply
0 kudos

How to increase executor memory in Databricks jobs

Maybe it is because I am new to Databricks that I am confused. Suppose I have 64 GB of worker memory in a Databricks job with a maximum of 12 nodes, and my job is failing with "Executor Lost" due to exit code 137 (OOM, from what I found on the internet). So, to fix this I need to increase execut...

Data Engineering

Latest Reply

raphaelblg
New Contributor III

3m ago

0 kudos

Hello @amitkmaurya, increasing compute resources may not always be the best strategy. To gain more insight into each executor's memory usage, check the metrics tab for your cluster. If one executor has a much higher memory usage than the ot...
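
A rough sketch of the comparison suggested in the reply, assuming the per-executor peak memory readings have been copied out of the metrics tab by hand. The executor names, the numbers, and the 2x-median threshold are all made up for illustration and are not a Databricks API.

```python
from statistics import median


def find_memory_outliers(usage_gb: dict[str, float], ratio: float = 2.0) -> list[str]:
    """Return executors whose memory use exceeds `ratio` times the median."""
    if not usage_gb:
        return []
    typical = median(usage_gb.values())
    return sorted(name for name, gb in usage_gb.items() if gb > ratio * typical)


# Hypothetical readings in GB; replace with values from your own cluster.
sample = {"exec-1": 18.0, "exec-2": 20.5, "exec-3": 58.0, "exec-4": 19.2}
outliers = find_memory_outliers(sample)
print(outliers)
```

If a single executor stands out like this, the cause may be data skew rather than too little memory overall, in which case adding nodes is unlikely to help.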

by Red1 • New Contributor III

12-30-2023 5:50:09 PM

847 Views
6 replies
1 kudos

Autoingest not working with Unity Catalog in DLT pipeline

Hey Everyone, I've built a very simple pipeline with a single DLT using auto ingest, and it works, provided I don't specify the output location. When I build the same pipeline but set UC as the output location, it fails when setting up S3 notification...

Data Engineering

Latest Reply

Red1
New Contributor III

26m ago

1 kudos

Hey @Babu_Krishnan I was! I had to reach out to my Databricks support engineer directly, and the resolution was to add "cloudfiles.awsAccessKey" and "cloudfiles.awsSecretKey" to the params as in the screenshot below (apologies, I don't know why the sc...

5 More Replies

by Devsql • Visitor

11 hours ago

48 Views
1 reply
0 kudos

How to find whether a given Parquet file got imported into the Bronze Layer?

Hi Team, we recently created a new Databricks project/solution (based on the Medallion architecture) with Bronze, Silver, and Gold layer tables. We have created a Delta Live Tables based pipeline for the Bronze layer implementation. Source files are Parqu...

Data Engineering

Azure Databricks

Bronze Job

Delta Live Table

Delta Live Table Pipeline

Latest Reply

raphaelblg
New Contributor III

29m ago

0 kudos

Hello @Devsql, it appears that you are creating DLT bronze tables using a standard spark.read operation. This may explain why the DLT table doesn't include "new files" during a REFRESH operation. For incremental ingestion of bronze layer data into y...

by 6502 • New Contributor III

9 hours ago

42 Views
1 reply
0 kudos

Delete on streaming table and starting startingVersion

I deleted some records from a streaming table by mistake, and of course the streaming job stopped working. So I restored the table to the version before the delete was done, and attempted to restart the job, setting startingVersion to the new vers...

Data Engineering

Latest Reply

raphaelblg
New Contributor III

59m ago

0 kudos

Hello @6502, It appears you've used the `startingVersion` parameter in your streaming query, which causes the stream to begin processing data from the version prior to the DELETE operation version. However, the DELETE operation will still be processe...

by Erik_L • Contributor II

2 hours ago

23 Views
0 replies
0 kudos

BUG: Unity Catalog kills UDF

We have UDFs in a few locations, and today we noticed their performance collapsed. This seems to be caused by Unity Catalog.

Test environment 1:
Databricks Runtime Environment: 14.3 / 15.1
Compute: 1 master, 4 nodes
Policy: Unrestricted
Access Mode: Shared
Tes...

Data Engineering

by nilton • Visitor

4 hours ago

39 Views
2 replies
0 kudos

Query table based on table_name from information_schema

Hi, I have one table whose name changes every 60 days. The name simply increments the version number, for example: first 60 days: table_name_v1; after 60 days: table_name_v2, and so on. What I want is to query the table whose name is returned in the que...

Data Engineering

Latest Reply

radothede
Visitor

3 hours ago

0 kudos

The simplest way would probably be to use spark.sql:

%py
tbl_name = 'table_v1'
df = spark.sql(f'select * from {tbl_name}')
display(df)

From there, you can simply create a temporary view:

%py
df.createOrReplaceTempView('table_act')

and query it using SQL st...
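
For the part of the question about picking the current name automatically, here is a minimal sketch that assumes the candidate names have already been fetched (for instance from information_schema.tables) into a plain Python list. The helper `latest_versioned_table` and the sample names are hypothetical and not part of any Databricks API.

```python
import re


def latest_versioned_table(names: list[str], prefix: str = "table_name_v") -> str:
    """Pick the name with the highest numeric suffix after `prefix`."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    versions = {}
    for name in names:
        match = pattern.match(name)
        if match:
            # Compare suffixes as integers so that v10 sorts after v2.
            versions[int(match.group(1))] = name
    if not versions:
        raise ValueError(f"no table matches prefix {prefix!r}")
    return versions[max(versions)]


# Hypothetical names, as they might come back from information_schema.tables.
candidates = ["table_name_v1", "table_name_v2", "table_name_v10", "other_table"]
tbl_name = latest_versioned_table(candidates)
query = f"select * from {tbl_name}"
print(query)
# On Databricks, the query string would then go to spark.sql(query).
```

Comparing the suffix numerically matters here: a plain string sort would place table_name_v10 before table_name_v2.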

1 More Replies

by radothede • Visitor

3 hours ago

24 Views
0 replies
0 kudos

Can on-demand clusters be shared across multiple jobs using cluster pool with max capacity?

I have a cluster pool with max capacity. I run multiple jobs against that cluster pool. Can on-demand clusters created within this cluster pool be shared across multiple different jobs at the same time? The reason I'm asking is I can see a downgrade...

Data Engineering

by rt-slowth • Contributor

01-10-2024 6:33:50 PM

930 Views
5 replies
2 kudos

AutoLoader File notification mode Configuration with AWS

from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql import DataFrame, Column
from pyspark.sql.types import Row
import dlt

S3_PATH = 's3://datalake-lab/XXXXX/'
S3_SCHEMA = 's3://datalake-lab/XXXXX/schemas/'
...

Data Engineering

Latest Reply

djhs
New Contributor III

Tuesday

2 kudos

Was this resolved? I run into the same issue.

4 More Replies

by 185369 • New Contributor II

05-04-2023 1:58:11 AM

841 Views
4 replies
1 kudos

Resolved! DLT with UC Access Denied sqs

I am going to use the newly released DLT with UC, but it keeps getting access denied. As I keep tracking down the reasons, it seems that an account ID other than my account ID or the Databricks account ID is being requested. I cannot use '*' in principal attri...

Data Engineering

Latest Reply

Priyag1
Honored Contributor II

05-04-2023 2:27:58 AM

1 kudos

Every service on AWS, an SQS queue, and all the other services in your stack using that queue will be configured with minimal permissions, leading to access issues. So, make sure you get your IAM policies set up correctly before deploying to producti...

3 More Replies

by israelst • New Contributor II

01-15-2024 12:53:49 AM

350 Views
3 replies
1 kudos

DLT can't authenticate with kinesis using instance profile

When running my notebook on personal compute with an instance profile, I am indeed able to readStream from Kinesis. But adding it as a DLT with UC, while specifying the same instance profile in the DLT pipeline settings, causes a "MissingAuthenticatio...

Data Engineering

Delta Live Tables

Unity Catalog

Latest Reply

Mathias_Peters
New Contributor II

a week ago

1 kudos

Hi, were you able to solve this problem? If so, what was the solution?

2 More Replies

by brianbraunstein • Visitor

9 hours ago

54 Views
1 reply
0 kudos

spark.sql not supporting kwargs as documented

This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work: display...

Data Engineering

Latest Reply

brianbraunstein
Visitor

7 hours ago

0 kudos

Ok, it looks like Databricks might have broken this functionality shortly after it came out: https://community.databricks.com/t5/data-engineering/parameterized-spark-sql-not-working/m-p/57969/highlight/true#M30972

by QuantumFries • New Contributor

Monday

113 Views
4 replies
3 kudos

Change {{job.start_time.[iso_date]}} Timezone

I am trying to schedule some jobs using workflows and leveraging dynamic variables. One caveat is that when I try to use {{job.start_time.[iso_date]}}, it seems to default to UTC. Is there a way to change it?

Data Engineering

Latest Reply

artsheiko
Valued Contributor III

Tuesday

3 kudos

Hi, all the dynamic values are in UTC (documentation). Maybe you can use code like the one presented below and pass the variables between tasks (see Share information between tasks in a Databricks job)?

%python
from datetime import datetime, timed...
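
Since the reply's snippet is cut off, here is an independent sketch of the same idea, not the original code. It assumes the task receives a full UTC timestamp as an ISO-8601 string (how that string is passed in is left open), and the helper name and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def utc_to_local_date(utc_timestamp: str, target: tzinfo) -> str:
    """Convert a naive ISO-8601 UTC timestamp to the calendar date in `target`."""
    moment = datetime.fromisoformat(utc_timestamp).replace(tzinfo=timezone.utc)
    return moment.astimezone(target).date().isoformat()


# A fixed UTC-4 offset keeps the sketch dependency-free but ignores daylight
# saving time; zoneinfo.ZoneInfo("America/New_York") would handle DST on
# systems where the tz database is installed.
fixed_offset = timezone(timedelta(hours=-4))
local_date = utc_to_local_date("2024-05-14T02:30:00", fixed_offset)
print(local_date)
```

Note that the local date can differ from the UTC date near midnight, which is why converting the full timestamp is safer than adjusting the date string alone.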

3 More Replies

by stevenayers-bge • New Contributor II

14 hours ago

65 Views
1 reply
0 kudos

Bug: Shallow Clone `create or replace` causing [TABLE_OR_VIEW_NOT_FOUND]

I am having an issue where, when I do a shallow clone using:

create or replace table `catalog_a_test`.`schema_a`.`table_a` shallow clone `catalog_a`.`schema_a`.`table_a`

I get:

[TABLE_OR_VIEW_NOT_FOUND] The table or view catalog_a_test.schema_a.table_a...

Data Engineering

Latest Reply

Omar_hamdan
Community Manager

9 hours ago

0 kudos

Hi Steven, this is really a strange issue. First, let's exclude some possible causes. We need to check the following:
- The permissions on table A and Catalog B. Take a look at the following link to check what permission is needed: https://docs.d...

by Abbe • New Contributor II

12-20-2022 7:04:50 AM

1076 Views
2 replies
0 kudos

Update data type of a column within a table that has a GENERATED ALWAYS AS IDENTITY-column

I want to cast the data type of a column "X" in a table "A" where column "ID" is defined as GENERATED ALWAYS AS IDENTITY. Databricks refers to overwrite to achieve this: https://docs.databricks.com/delta/update-schema.html The following operation: (spar...

Data Engineering

Latest Reply

RajuBolla
Visitor

10 hours ago

0 kudos

Update is not working, but delete is, when I changed to the DEFAULT property. AnalysisException: UPDATE on IDENTITY column "XXXX_ID" is not supported.

1 More Replies

by mamiya • Visitor

11 hours ago

37 Views
0 replies
0 kudos

ODBC PowerBI 2 commands in one query

Hello everyone, I'm trying to use the ODBC DirectQuery option in PowerBI, but I keep getting an error about another command. The SQL query works while using the SQL Editor. Do I need to change the setup of my ODBC connector?

DECLARE dateFrom DATE = DA...

Data Engineering

Databricks

Forum Posts

Best way to parse Google Analytics data in Databri...

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...