Data Engineering

Forum Posts

by Venugopal • New Contributor II

yesterday

32 Views
1 reply
0 kudos

databricks asset bundles: Unable to fetch variables from variable-overrides.json

Hi, I am using Databricks CLI 0.227.1 to create a bundle project that deploys a job. As per https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/variables, I wanted to keep my variables in variable-overrides.json. I created a js...

Latest Reply

ashraf1395
Valued Contributor II

yesterday

0 kudos

Hi there @Venu,
You have to declare those variables by name in databricks.yml as well:
variables:
  task_key:
  metadata_schema:
Then you can reference them later in your job definition as ${var.task_key} or ${var.job_cluster_key}, the way you did. and ...
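
As a rough illustration of the point in this reply, the sketch below checks whether every key in an overrides file has a matching declaration. It assumes the declared names are already available as a plain dict; the helper name `undeclared_overrides` and all sample values are hypothetical, and this is not how the Databricks CLI itself performs validation.

```python
import json


def undeclared_overrides(declared_variables, overrides_json):
    """Return override keys with no matching entry under `variables:`.

    declared_variables: dict of names declared in databricks.yml (assumed input).
    overrides_json: text content of a variable-overrides.json file.
    """
    overrides = json.loads(overrides_json)
    return sorted(set(overrides) - set(declared_variables))


# Hypothetical values, for illustration only.
declared = {"task_key": None, "metadata_schema": None}
overrides_text = (
    '{"task_key": "ingest", "metadata_schema": "meta", "job_cluster_key": "main"}'
)

missing = undeclared_overrides(declared, overrides_text)
print(missing)
```

Any name this reports would need its own entry under `variables:` before `${var.<name>}` can be expected to resolve.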

by 797646 • New Contributor II

yesterday

58 Views
2 replies
0 kudos

Calculated measures not working in Dashboards for queries with big result

Queries with big results are executed on a cluster. If we specify a calculated measure such as cal1 as count(*) / count(distinct field1), it gets wrapped in backticks as `count(*) / count(distinct field1) ` as `cal1`. Functions are not identified in...

Latest Reply

797646
New Contributor II

yesterday

0 kudos

Hi @Brahmareddy, this didn't work. count(*) * 1.0 / count(distinct field1) AS cal1) gave me the same error. But as per this feature release, https://docs.databricks.com/aws/en/dashboards/datasets/calculated-measures, this should work out of the box, otherwise it's...

by BobCat62 • New Contributor

Wednesday

54 Views
2 replies
0 kudos

Missing Delta-live-Table in hive-metastore catalog

Hi experts, I defined my Delta table in an external location as follows:
%sql
CREATE OR REFRESH STREAMING TABLE pumpdata (
  Body string,
  EnqueuedTimeUtc string,
  SystemProperties string,
  _rescued_data string,
  Properties string
)
USING DELTA
LOCATION 'abfss://md...

Delta Live Tables

Latest Reply

ashraf1395
Valued Contributor II

yesterday

0 kudos

Hey @BobCat62, this might help. DLT will be in direct publishing mode by default. If you select hive_metastore, you must specify the default schema in the DLT pipeline settings. If it is not set there, then at the time of defining the DLT table pass the schema_name...

by dkxxx-rc • New Contributor III

yesterday

26 Views
2 replies
0 kudos

CREATE TEMP TABLE

The Databricks assistant tells me (sometimes) that `CREATE TEMP TABLE` is a valid SQL operation. And other sources (e.g., https://www.freecodecamp.org/news/sql-temp-table-how-to-create-a-temporary-sql-table/) say the same. But in actual practice, thi...

Latest Reply

ashraf1395
Valued Contributor II

yesterday

0 kudos

You can create temp tables in DLT pipelines as well. Simply:
@dlt.table(name="temp_table", temporary=True)
def temp_table():
    return <any_query>

by Anish_2 • New Contributor II

yesterday

60 Views
2 replies
0 kudos

removal of Delta live tables

Hello Team, I have removed the definition of a table from a Delta Live Tables pipeline, but the table is still present in Unity Catalog. The event log gives the message below: Materialized View '`catalog1`.`schema1`.`table1`' is no longer defined in the pipeline a...

Delta Live Table

Latest Reply

Brahmareddy
Honored Contributor

yesterday

0 kudos

Hi @Anish_2, how are you doing today? I agree with @KaranamS's answer. Databricks marks the table as inactive instead of removing it to prevent accidental data loss, allowing you to restore it if needed. Once inactive, the table remains in Unity Catalo...

by MrFi • New Contributor

Thursday

118 Views
1 reply
0 kudos

500 Error on /ajax-api/2.0/fs/list When Accessing Unity Catalog Volume in Databricks

We are encountering an issue with volumes created inside Unity Catalog. We are using AWS and Terraform to host Databricks, and our Unity Catalog structure is as follows:
• Catalog: catalog_name
• Schemas: raw, bronze, silver, gold (all with external l...

Latest Reply

Brahmareddy
Honored Contributor

yesterday

0 kudos

Hi @MrFi, how are you doing today? As per my understanding, it looks like the Unity Catalog UI might have trouble handling external volumes, even though dbutils works fine. Try running SHOW VOLUMES IN catalog_name.raw; to check if the volume is properl...

by ceceliac • New Contributor III

12-09-2024 8:09:21 AM

538 Views
8 replies
0 kudos

inconsistent behavior with serverless sql: user is not an owner of table error with views

We get the following error with some basic views and not others when using serverless compute (from a notebook or from SQL Editor or from the Catalog Explorer). Views are simple select * from table x and underlying schemas/tables are using managed m...

Latest Reply

VZLA
Databricks Employee

12-23-2024 8:09:53 AM

0 kudos

@ceceliac just a quick check: if you rerun the same query after it has initially failed, will it go through or still fail? If it runs fine, wait another 10-15 mins, rerun it, and share the outcome. So: 1. Run it once, it will fail. 2. Rerun it inm...

by Kassandra_ • Visitor

yesterday

104 Views
1 reply
0 kudos

RESTORE deletes part of the delta table's history

I have a Delta table with a history of 15 versions (see screenshot). After running the command RESTORE TABLE hive_metastore.my_schema.my_table TO VERSION AS OF 6; and then running DESCRIBE HISTORY (see screenshot), it seems that a new version (RESTOR...

Latest Reply

MariuszK
Contributor III

yesterday

0 kudos

It doesn't. I haven't observed this behavior. According to the Delta Lake documentation, "Using the restore command resets the table’s content to an earlier version, but doesn’t remove any data. It simply updates the transaction log to indicate that cer...
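
The behavior quoted from the documentation can be pictured with a toy append-only log. This is a simplified model written for illustration, not Delta Lake's implementation; the class `ToyTable` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    # Each history entry is (operation, snapshot of the rows at that version).
    history: list = field(default_factory=list)

    def commit(self, operation, rows):
        """Append a new version and return its version number."""
        self.history.append((operation, list(rows)))
        return len(self.history) - 1

    def restore(self, version):
        """Re-publish an old snapshot as a NEW version; nothing is deleted."""
        _, rows = self.history[version]
        return self.commit(f"RESTORE to version {version}", rows)


table = ToyTable()
table.commit("WRITE", ["a"])        # version 0
table.commit("WRITE", ["a", "b"])   # version 1
table.commit("DELETE", ["b"])       # version 2
new_version = table.restore(1)      # recorded as version 3

for version, (operation, rows) in enumerate(table.history):
    print(version, operation, rows)
```

In this model the restore shows up as one more entry at the end of the history, and versions 0 through 2 stay readable.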

by creditorwatch • New Contributor II

02-27-2024 4:44:17 PM

2011 Views
2 replies
1 kudos

Load data from Aurora to Databricks directly

Hi, does anyone know how to link Aurora to Databricks directly and load data into Databricks automatically on a schedule without any third-party tools in the middle?

Latest Reply

MariuszK
Contributor III

yesterday

1 kudos

AWS Aurora supports PostgreSQL or MySQL. Did you try to connect using JDBC?
url = f"jdbc:postgresql://{database_host}:{database_port}/{database_name}"
remote_table = (spark.read.format("jdbc").option("driver", driver).option("url", url).option("dbtable...
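
A minimal sketch of the JDBC approach suggested here, assuming an Aurora PostgreSQL endpoint. The host, database, table, and credential values are placeholders, and the helper names `aurora_jdbc_options` and `read_remote_table` are hypothetical; only the Spark JDBC option keys (`driver`, `url`, `dbtable`, `user`, `password`) are standard.

```python
def aurora_jdbc_options(host, port, database, table, user, password,
                        driver="org.postgresql.Driver"):
    """Build the option map for Spark's JDBC data source."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    return {
        "driver": driver,
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_remote_table(spark, options):
    """Load the remote table as a DataFrame; needs a live SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Placeholder connection details, for illustration only.
opts = aurora_jdbc_options(
    "aurora.example.internal", 5432, "sales", "public.orders", "reader", "<secret>"
)
print(opts["url"])
```

In a notebook, `read_remote_table(spark, opts)` would return the DataFrame, and scheduling that notebook as a job is one way to get the recurring load the question asks about. Credentials would normally come from a secret scope rather than literals.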

by vaibhavaher2025 • Visitor

yesterday

53 Views
2 replies
1 kudos

Serverless compute vs Job cluster

Hi Guys, for running a job with a varying workload, what should I use: serverless compute or job compute? What are the positives and negatives? (I'll be running my notebook from Azure Data Factory.)

Latest Reply

KaranamS
New Contributor III

yesterday

1 kudos

It depends on the cost, performance and startup time needed for your use case. Serverless compute is usually the preferred choice because of its fast startup time and dynamic scaling. However, if your workload is long-running and predictable, job compute with...

by verargulla • New Contributor III

10-30-2022 3:25:54 PM

12003 Views
4 replies
4 kudos

Azure Databricks: Error Creating Cluster

We have provisioned a new workspace in Azure using our own VNet. Upon creating the first cluster, I encounter this error: Control Plane Request Failure: Failed to get instance bootstrap steps from the Databricks Control Plane. Please check that instan...

Latest Reply

abhikakade
Visitor

yesterday

4 kudos

I'm also seeing this issue. Was there a solution?

by Phani1 • Valued Contributor II

yesterday

44 Views
1 reply
0 kudos

Databricks Vs Fabric use case

Hi Team, we've noticed that for some use cases, customers are proposing an architecture with A) Fabric in the Gold layer and reporting in Azure Power BI, while using Databricks for the Bronze and Silver layers. However, we can also have the B) Gold lay...

Latest Reply

MariuszK
Contributor III

yesterday

0 kudos

Gold layer in Databricks and connect to Power BI - this is a good option. However, if you need to use some of Fabric's capabilities because your team prefers T-SQL, Direct Lake, Python notebooks, or low-code tools like Data Factory, MS Fabr...

by dzsuzs • New Contributor II

06-03-2024 8:56:45 AM

1380 Views
3 replies
2 kudos

OOM Issue in Streaming with foreachBatch()

I have a stateless streaming application that uses foreachBatch. This function executes between 10-400 times each hour based on custom logic. The logic within foreachBatch includes: collect() on very small DataFrames (a few megabytes) --> driver mem...

Latest Reply

gardnmi1983
Visitor

yesterday

2 kudos

Did you ever figure out what is causing the memory leak? We are experiencing a nearly identical issue where the memory gradually increases over time and OOM after a few days. I did track down this open bug ticket that states there is a memory leak ...

by robertomatus • New Contributor II

Thursday

75 Views
3 replies
1 kudos

Autoloader inferring struct as a string when reading json data

Hi Everyone, trying to read JSON files with Auto Loader fails to infer the schema correctly: every nested or struct column is inferred as a string.
spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "json")
  .option("cloud...

Latest Reply

Brahmareddy
Honored Contributor

yesterday

1 kudos

Hi @robertomatus, you're right, it would be much better if we didn’t have to rely on workarounds. The reason Auto Loader infers schema differently from spark.read.json() is that it's optimized for streaming large-scale data efficiently. Unlike spark.re...
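
One documented setting that bears on this difference is `cloudFiles.inferColumnTypes`: for JSON, Auto Loader infers columns (including nested fields) as strings unless it is enabled. The sketch below only assembles the options; whether this setting resolves the issue in this thread is an assumption, and the helper names and paths are hypothetical.

```python
def autoloader_json_options(schema_location, infer_column_types=True):
    """Build Auto Loader options for JSON input with type inference toggled."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
        # Spark options are passed as strings, hence "true" / "false".
        "cloudFiles.inferColumnTypes": str(infer_column_types).lower(),
    }


def read_json_stream(spark, source_path, options):
    """Start an Auto Loader stream; needs a Databricks SparkSession."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


# Placeholder path, for illustration only.
opts = autoloader_json_options("/tmp/example/_schemas")
print(opts)
```

When a specific nested column still needs a fixed type, `cloudFiles.schemaHints` is another documented option that could be added to the same map.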

by N38 • New Contributor II

3 weeks ago

481 Views
10 replies
4 kudos

DLT Pipeline event_log error - invalid pipeline name / The Spark SQL phase analysis failed

I am trying the queries below using both a SQL warehouse and a shared cluster on Databricks Runtime (15.4/16.1) with Unity Catalog:
SELECT * FROM event_log(table(my_catalog.myschema.bronze_employees))
SELECT * FROM event_log("6b317553-5c5a-40d5-9541-1a5...

Latest Reply

Mbunko
New Contributor II

yesterday

4 kudos

Hi, is there an ETA on this fix?
