Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shekharshukla
by New Contributor II
  • 918 Views
  • 1 replies
  • 0 kudos

Not able to access Table_tags in Databricks Apps:

When I try to fetch system.information_schema.schema_tags, it shows up, but when I try to fetch system.information_schema.table_tags it doesn't show up and returns an empty df. Is there anything I am missing? assert os.getenv('DATABRICKS_WAREHOUS...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @shekharshukla, how are you doing today? As per my understanding, it looks like the system.information_schema.table_tags query is returning an empty DataFrame, which could be due to a couple of reasons. First, make sure that there are actually tags as...

  • 0 kudos
Faizan_khan8171
by New Contributor
  • 1228 Views
  • 1 replies
  • 0 kudos

UCX Assessment Dashboard Error: "The warehouse was not found"

Hello everyone, We recently installed UCX and were able to access the UCX Assessment Dashboard successfully. However, we’re now seeing an error stating: "The warehouse was not found." I suspect that someone may have accidentally deleted the warehouse ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @Faizan_khan8171, how are you doing today? As per my understanding, it looks like the warehouse linked to your UCX Assessment Dashboard was deleted, which is likely causing the error. You can try checking under SQL Warehouses to see if it's still t...

  • 0 kudos
BobCat62
by Databricks Partner
  • 2352 Views
  • 2 replies
  • 0 kudos

Resolved! Missing Delta-live-Table in hive-metastore catalog

Hi experts, I defined my Delta table in an external location as follows: %sql CREATE OR REFRESH STREAMING TABLE pumpdata (Body string, EnqueuedTimeUtc string, SystemProperties string, _rescued_data string, Properties string) USING DELTA LOCATION 'abfss://md...

Data Engineering
Delta Live Tables
Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hey @BobCat62, this might help: DLT will be in direct publishing mode by default. If you select hive_metastore, you must specify the default schema in the DLT pipeline settings. If that is not done there, then at the time of defining the DLT table pass the schema_name...

  • 0 kudos
1 More Replies
MrFi
by New Contributor
  • 1686 Views
  • 1 replies
  • 0 kudos

500 Error on /ajax-api/2.0/fs/list When Accessing Unity Catalog Volume in Databricks

We are encountering an issue with volumes created inside Unity Catalog. We are using AWS and Terraform to host Databricks, and our Unity Catalog structure is as follows: • Catalog: catalog_name • Schemas: raw, bronze, silver, gold (all with external l...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @MrFi, how are you doing today? As per my understanding, it looks like the Unity Catalog UI might have trouble handling external volumes, even though dbutils works fine. Try running SHOW VOLUMES IN catalog_name.raw; to check if the volume is properl...

  • 0 kudos
ceceliac
by New Contributor III
  • 3165 Views
  • 8 replies
  • 0 kudos

inconsistent behavior with serverless sql: user is not an owner of table error with views

We get the following error with some basic views and not others when using serverless compute (from a notebook or from SQL Editor or from the Catalog Explorer).  Views are simple select * from table x and underlying schemas/tables are using managed m...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@ceceliac Just a quick check: if you rerun the same query after it has initially failed, will it go through or still fail? If it runs fine, wait another 10-15 minutes, rerun it, and share the outcome. So: 1.- Run it once, it will fail. 2.- Rerun it inm...

  • 0 kudos
7 More Replies
Kassandra_
by New Contributor
  • 1587 Views
  • 1 replies
  • 0 kudos

RESTORE deletes part of the delta table's history

I have a Delta table with a history of 15 versions (see screenshot). After running the command: RESTORE TABLE hive_metastore.my_schema.my_table TO VERSION AS OF 6; and then running DESCRIBE HISTORY (see screenshot), it seems that a new version (RESTOR...

Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

it's not. I haven't observed this behavior. According to the delta lake documentation "Using the restore command resets the table’s content to an earlier version, but doesn’t remove any data. It simply updates the transaction log to indicate that cer...
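
To check this behavior on your own table, a minimal sketch like the following may help. It assumes a notebook where a `spark` session is available; the helper names `restore_and_history_sql` and `restore_and_show_history` are hypothetical and only wrap the two SQL statements quoted in the question.

```python
from typing import List


def restore_and_history_sql(table: str, version: int) -> List[str]:
    """Return the RESTORE and DESCRIBE HISTORY statements for a Delta table.

    Hypothetical helper: it only formats the SQL quoted in the question.
    """
    if version < 0:
        raise ValueError("version must be a non-negative integer")
    return [
        f"RESTORE TABLE {table} TO VERSION AS OF {version}",
        f"DESCRIBE HISTORY {table}",
    ]


def restore_and_show_history(spark, table: str, version: int):
    """Run the restore, then return the table history as a DataFrame.

    Assumes `spark` is an active SparkSession with Delta Lake support.
    """
    restore_sql, history_sql = restore_and_history_sql(table, version)
    spark.sql(restore_sql)
    # The history should list the earlier versions plus a new RESTORE entry.
    return spark.sql(history_sql)
```

Calling `restore_and_show_history(spark, "hive_metastore.my_schema.my_table", 6).show(truncate=False)` before and after the restore lets you compare the version lists directly.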

  • 0 kudos
creditorwatch
by New Contributor II
  • 4432 Views
  • 2 replies
  • 1 kudos

Load data from Aurora to Databricks directly

Hi, does anyone know how to link Aurora to Databricks directly and load data into Databricks automatically on a schedule without any third-party tools in the middle?

Latest Reply
MariuszK
Valued Contributor III
  • 1 kudos

AWS Aurora supports PostgreSQL or MySQL, did you try to connect using JDBC? url = f"jdbc:postgresql://{database_host}:{database_port}/{database_name}" remote_table = (spark.read.format("jdbc").option("driver", driver).option("url", url).option("dbtable...
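
A self-contained sketch of the JDBC approach described above, assuming an Aurora PostgreSQL endpoint and a notebook where `spark` is available. The helper names, the default driver class `org.postgresql.Driver`, and the `user`/`password` options are assumptions for illustration, since the original snippet is cut off; in practice the credentials would come from a secret store, not literals.

```python
from typing import Dict


def aurora_jdbc_options(host: str, port: int, database: str, table: str,
                        user: str, password: str,
                        driver: str = "org.postgresql.Driver") -> Dict[str, str]:
    """Build Spark JDBC reader options for an Aurora PostgreSQL endpoint.

    Hypothetical helper; the option keys are the standard Spark JDBC ones.
    """
    return {
        "driver": driver,
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_aurora_table(spark, options: Dict[str, str]):
    """Read the remote table into a DataFrame via Spark's JDBC data source.

    Assumes `spark` is an active SparkSession and the JDBC driver is on the cluster.
    """
    return spark.read.format("jdbc").options(**options).load()
```

Putting `read_aurora_table(...)` followed by a table write in a notebook, and running that notebook as a scheduled job, would cover the "on a schedule" part of the question without a third-party tool.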

  • 1 kudos
1 More Replies
philHarasz
by New Contributor III
  • 6828 Views
  • 4 replies
  • 0 kudos

Resolved! Writing a small pyspark dataframe to a table is taking a very long time

My experience with Databricks pyspark up to this point has always been to execute a SQL query against existing Databricks tables, then write the resulting pyspark dataframe into a new table. For the first time, I am now getting data via an API which ...

Latest Reply
philHarasz
New Contributor III
  • 0 kudos

After reading the suggested documentation, I tried using the "Parse nested XML (from_xml and schema_of_xml)". I used this code from the doc: df = spark.createDataFrame([(8, xml_data)], ["number", "payload"]) schema = schema_of_xml(df.select("payload"...

  • 0 kudos
3 More Replies
vaibhavaher2025
by New Contributor
  • 5826 Views
  • 2 replies
  • 2 kudos

Serverless compute vs Job cluster

Hi Guys, For running the job with a varying workload, what should I use: serverless compute or job compute? What are the positives and negatives? (I'll be running my notebook from Azure Data Factory.)

Latest Reply
KaranamS
Contributor III
  • 2 kudos

It depends on the cost, performance and startup time needed for your use case. Serverless compute is usually the preferred choice because of its fast startup time and dynamic scaling. However, if your workload is long-running and predictable, job compute with...

  • 2 kudos
1 More Replies
Phani1
by Databricks MVP
  • 1972 Views
  • 1 replies
  • 1 kudos

Databricks Vs Fabric use case

Hi Team, We've noticed that for some use cases, customers are proposing an architecture with A) Fabric in the Gold layer and reporting in Azure Power BI, while using Databricks for the Bronze and Silver layers. However, we can also have the B) Gold lay...

Latest Reply
MariuszK
Valued Contributor III
  • 1 kudos

Gold layer in Databricks and connect to Power BI - this is a good option. However, if you need to use some of Fabric's capabilities because your team prefers to use T-SQL, Direct Lake, Python notebooks, or low-code tools like Data Factory. MS Fabr...

  • 1 kudos
dzsuzs
by New Contributor II
  • 3383 Views
  • 3 replies
  • 2 kudos

OOM Issue in Streaming with foreachBatch()

I have a stateless streaming application that uses foreachBatch. This function executes between 10 and 400 times each hour based on custom logic. The logic within foreachBatch includes: collect() on very small DataFrames (a few megabytes) --> driver mem...

Latest Reply
gardnmi1983
New Contributor II
  • 2 kudos

Did you ever figure out what is causing the memory leak?  We are experiencing a nearly identical issue where the memory gradually increases over time and OOM after a few days.  I did track down this open bug ticket that states there is a memory leak ...

  • 2 kudos
2 More Replies
robertomatus
by New Contributor II
  • 2263 Views
  • 3 replies
  • 1 kudos

Autoloader inferring struct as a string when reading JSON data

Hi Everyone, Trying to read JSON files with Auto Loader is failing to infer the schema correctly; every nested or struct column is being inferred as a string. spark.readStream.format("cloudFiles") .option("cloudFiles.format", "json") .option("cloud...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi @robertomatus, You're right, it would be much better if we didn’t have to rely on workarounds. The reason Auto Loader infers schema differently from spark.read.json() is that it's optimized for streaming large-scale data efficiently. Unlike spark.re...
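
As a rough illustration of one documented way to get typed columns from Auto Loader (not necessarily the workaround discussed in this thread), the sketch below sets `cloudFiles.inferColumnTypes` and, optionally, `cloudFiles.schemaHints`. The helper names, paths, and the example hint string are hypothetical, and `spark` is assumed to be a Databricks session where the `cloudFiles` source is available.

```python
from typing import Dict, Optional


def autoloader_json_options(schema_location: str,
                            schema_hints: Optional[str] = None) -> Dict[str, str]:
    """Build Auto Loader options that ask for inferred column types on JSON input.

    Hypothetical helper; the option keys are Auto Loader's cloudFiles options.
    """
    options = {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
        # Without this, JSON columns (including nested fields) are inferred as strings.
        "cloudFiles.inferColumnTypes": "true",
    }
    if schema_hints:
        # Pin specific columns to a known type, e.g. "payload STRUCT<id: INT>".
        options["cloudFiles.schemaHints"] = schema_hints
    return options


def read_json_stream(spark, source_path: str, options: Dict[str, str]):
    """Create the streaming DataFrame; assumes a Databricks runtime with Auto Loader."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

Schema hints are useful when only a few nested columns matter and the rest can stay as inferred.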

  • 1 kudos
2 More Replies
Phani1
by Databricks MVP
  • 4119 Views
  • 1 replies
  • 0 kudos

Databricks vs Snowflake use case comparison

Hi Databricks Team, We see Databricks and Snowflake as very close in terms of features. When trying to convince customers about Databricks' products, we would like to know the key comparisons between Databricks and Snowflake by use case. Regards, Phani

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Phani1, You can read these resources: https://www.databricks.com/databricks-vs-snowflake https://www.databricks.com/blog/2018/08/27/by-customer-demand-databricks-and-snowflake-integration.html

  • 0 kudos
Katalin555
by New Contributor II
  • 1618 Views
  • 2 replies
  • 0 kudos

df.isEmpty() and df.fillna(0).isEmpty() throws error

In our code we usually use Single user cluster with 13.3 LTS with Spark 3.4.1 when loading data from delta table to Azure SQL Hyperscale, and we did not experience any issues, but starting last week our pipeline has been failing with the following er...

Latest Reply
Katalin555
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, Yes, I checked and did not see any other information. We are using Driver: Standard_DS5_v2 · Workers: Standard_E16a_v4 · 1-6 workers. At the stage when the pipeline fails, the shuffle information was: Shuffle Read Size / Records: 257...

  • 0 kudos
1 More Replies
AnudeepKolluri
by New Contributor II
  • 1237 Views
  • 4 replies
  • 0 kudos
Latest Reply
Jim_Anderson
Databricks Employee
  • 0 kudos

It looks like your completion email was distributed on Feb 04, but I will DM you the certification discount code again for your reference    

  • 0 kudos
3 More Replies