Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

L1000
by New Contributor III
  • 1877 Views
  • 4 replies
  • 2 kudos

DLT Serverless incremental refresh of materialized view

I have a materialized view that always does a "COMPLETE_RECOMPUTE", but I can't figure out why. I found how I can get the logs: SELECT * FROM event_log(pipeline_id) WHERE event_type = 'planning_information' ORDER BY timestamp desc; And for my table...

Latest Reply
L1000
New Contributor III
  • 2 kudos

I split up the materialized view into 3 separate ones: step1: @dlt.table(name="step1", table_properties={"delta.enableRowTracking": "true"}) def step1(): isolate_names = dlt.read("source").select("Name").groupBy("Name").count() return isolate_names st...
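Cleaned up, the pattern from that reply looks roughly like the sketch below: break the materialized view into smaller steps and enable row tracking so the planner has a chance to refresh them incrementally ("source" is a placeholder for the upstream table).

import dlt

@dlt.table(
    name="step1",
    table_properties={"delta.enableRowTracking": "true"},  # needed for incremental refresh
)
def step1():
    # A small aggregation step; the planner can often refresh this incrementally
    # instead of falling back to COMPLETE_RECOMPUTE
    return (
        dlt.read("source")
           .select("Name")
           .groupBy("Name")
           .count()
    )

Re-checking event_log(...) with event_type = 'planning_information' after the change shows which refresh technique the planner picked.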

3 More Replies
RobDineen
by Contributor
  • 2246 Views
  • 2 replies
  • 2 kudos

Resolved! %SQL delete from temp table driving me mad

Hello there, I have a temp table where I want to remove null / empty values (see below). If there are no rows to delete, shouldn't it just say zero rows affected?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 2 kudos

@RobDineen This should answer your question: https://community.databricks.com/t5/get-started-discussions/how-to-create-temporary-table-in-databricks/m-p/67774/highlight/true#M2956 Long story short, don't use it.
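One alternative in that spirit is to skip the temp table entirely and expose a temporary view that filters out the unwanted rows, rather than running DELETE; a minimal sketch with hypothetical table and column names:

# Temporary view that simply excludes the null / empty rows
spark.sql("""
    CREATE OR REPLACE TEMPORARY VIEW clean_rows AS
    SELECT *
    FROM staging_rows
    WHERE player_name IS NOT NULL AND player_name <> ''
""")

display(spark.table("clean_rows"))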

1 More Replies
sandy311
by New Contributor III
  • 12930 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks asset bundle does not create new job if I change configuration of existing Databricks yam

When deploying multiple jobs using the `Databricks.yml` file via the asset bundle, the process either overwrites the same job or renames it, instead of creating separate, distinct jobs.

Latest Reply
Ncolin1999
New Contributor II
  • 4 kudos

@filipniziol my requirement is just to deploy notebooks to the Databricks workspace. I don't want to create any job. Can I still use Databricks Asset Bundles?

6 More Replies
Stephanos
by New Contributor
  • 1936 Views
  • 1 replies
  • 0 kudos

Sequencing Job Deployments with Databricks Asset Bundles

Hello Databricks Community! I'm working on a project where I need to deploy jobs in a specific sequence using Databricks Asset Bundles. Some of my jobs (let's call them coordination jobs) depend on other jobs (base jobs) and need to look up their job ...

Latest Reply
MohcineRouessi
New Contributor II
  • 0 kudos

Hey Steph, have you found anything here, please? I'm currently stuck, trying to achieve the same thing.
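For anyone else stuck here: one way to resolve a base job's ID by name at run time is through the Jobs API via the Databricks Python SDK. A minimal sketch (the job names are hypothetical and authentication is assumed to be configured in the environment):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up auth from env vars or a config profile

def job_id_by_name(name: str) -> int:
    # The Jobs list API supports filtering by name; take the first match
    matches = list(w.jobs.list(name=name))
    if not matches:
        raise ValueError(f"No job named {name!r} found")
    return matches[0].job_id

# A coordination job can resolve its base job this way and then trigger it
base_job_id = job_id_by_name("base_ingest_job")
w.jobs.run_now(job_id=base_job_id)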

amelia1
by New Contributor II
  • 2398 Views
  • 2 replies
  • 0 kudos

pyspark read data using jdbc url returns column names only

Hello, I have a remote azure sql warehouse serverless instance that I can access using databricks-sql-connector. I can read/write/update tables no problem. But I'm also trying to read/write/update tables using local pyspark + jdbc drivers. But when I ...

Latest Reply
infodeliberatel
New Contributor II
  • 0 kudos

I added `UseNativeQuery=0` to the URL. It works for me.
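For context, a local PySpark read through the Databricks JDBC driver with that flag appended to the URL looks roughly like the sketch below (workspace host, HTTP path, token, jar path, and table name are all placeholders):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("jdbc-read")
    # the Databricks JDBC driver jar must be on the driver classpath
    .config("spark.jars", "/path/to/DatabricksJDBC42.jar")
    .getOrCreate()
)

jdbc_url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;UID=token;PWD=<personal-access-token>;"
    "httpPath=/sql/1.0/warehouses/<warehouse-id>;"
    "UseNativeQuery=0"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "samples.nyctaxi.trips")
    .option("driver", "com.databricks.client.jdbc.Driver")
    .load()
)
df.show(5)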

1 More Replies
RobertWalsh
by New Contributor II
  • 11253 Views
  • 7 replies
  • 2 kudos

Resolved! Hive Table Creation - Parquet does not support Timestamp Datatype?

Good afternoon, Attempting to run this statement: %sql CREATE EXTERNAL TABLE IF NOT EXISTS dev_user_login ( event_name STRING, datetime TIMESTAMP, ip_address STRING, acting_user_id STRING ) PARTITIONED BY (date DATE) STORED AS PARQUET ...

Latest Reply
source2sea
Contributor
  • 2 kudos

1. Changing to the Spark-native catalog approach (not the Hive metastore) works. The syntax is essentially: CREATE TABLE IF NOT EXISTS dbName.tableName (column names and types) USING parquet PARTITIONED BY ( runAt STRING ) LOCA...
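Written out, that Spark-native variant of the original statement would look something like the sketch below (the database name and storage path are hypothetical):

# CREATE TABLE ... USING PARQUET goes through the Spark catalog instead of the
# Hive "STORED AS PARQUET" path, so the TIMESTAMP column is accepted
spark.sql("""
    CREATE TABLE IF NOT EXISTS dev.dev_user_login (
        event_name     STRING,
        datetime       TIMESTAMP,
        ip_address     STRING,
        acting_user_id STRING,
        date           DATE
    )
    USING PARQUET
    PARTITIONED BY (date)
    LOCATION 'abfss://data@myaccount.dfs.core.windows.net/dev_user_login/'
""")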

6 More Replies
ChiragM
by New Contributor II
  • 1101 Views
  • 1 replies
  • 2 kudos

Uninstalled libraries get installed back on cluster after restart

I am trying to update libraries on a Databricks cluster by uninstalling existing libraries and installing new ones. When I make an API call to uninstall libraries and restart the cluster, the libraries first show as being uninstalled, but after the cluster c...

Latest Reply
saurabh18cs
Honored Contributor II
  • 2 kudos

Maybe you can add some time between the uninstall of the libraries and the restart:
uninstall_libraries()
time.sleep(60)  # Wait for the libraries to be uninstalled
restart_cluster()
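Spelled out against the REST API, that sequence might look like the sketch below (workspace URL, token, cluster ID, and library spec are placeholders; note that the uninstall only takes effect once the cluster actually restarts):

import time
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
CLUSTER_ID = "0101-123456-abcdefgh"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# 1. Request the uninstall (marked pending until the next restart)
requests.post(
    f"{HOST}/api/2.0/libraries/uninstall",
    headers=HEADERS,
    json={
        "cluster_id": CLUSTER_ID,
        "libraries": [{"pypi": {"package": "old-package==1.0.0"}}],
    },
).raise_for_status()

# 2. Give the request a moment to register
time.sleep(60)

# 3. Restart so the uninstall is applied
requests.post(
    f"{HOST}/api/2.0/clusters/restart",
    headers=HEADERS,
    json={"cluster_id": CLUSTER_ID},
).raise_for_status()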

AngadSingh
by New Contributor III
  • 1470 Views
  • 2 replies
  • 0 kudos

Delete delta live table without deleting DLT pipeline

Hi, I am wondering how I can delete a managed DLT table without deleting the DLT pipeline? I tried commenting out the code for the table definition, but the DLT pipeline complains that the source code has no table to create or update. Thanks in advance. #datae...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hello @AngadSingh, the error message you've shared identifies the issue you're encountering. The error "Source code has no table to create or update" occurs because your Delta Live Tables (DLT) pipeline does not have any streaming tables defi...

1 More Replies
Shankar
by New Contributor III
  • 11854 Views
  • 2 replies
  • 1 kudos

How are deletedFileRetentionDuration and logRetentionDuration associated with Vacuum?

I am trying to learn more about the VACUUM operation and came across two properties: delta.deletedFileRetentionDuration and delta.logRetentionDuration. So, let's say I have a Delta table where a few records/files have been deleted. The delta.deletedFileRetent...

Data Engineering
delta
deltatables
vacuum
Latest Reply
SubashDev
New Contributor II
  • 1 kudos

Will the deleted file be completely cleaned-up from storage only after 207 days (retention being 7 and vacuum interval 200 days)? As the default retention period is 7 days, there will not be any files older than 7 days, unless the retention period is...
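To make the interaction concrete, here is a sketch on a hypothetical table: both settings are table properties, VACUUM removes data files whose removal is older than the deleted-file retention threshold, and the transaction log is cleaned up separately according to logRetentionDuration.

# Hypothetical table "demo.events"
spark.sql("""
    ALTER TABLE demo.events SET TBLPROPERTIES (
        'delta.deletedFileRetentionDuration' = 'interval 7 days',
        'delta.logRetentionDuration'         = 'interval 30 days'
    )
""")

# Physically deletes data files removed from the table more than 7 days ago;
# log entries expire separately based on delta.logRetentionDuration
spark.sql("VACUUM demo.events RETAIN 168 HOURS")  # 168 hours = 7 days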

1 More Replies
TWib
by New Contributor III
  • 7849 Views
  • 7 replies
  • 4 kudos

Error in Spark Streaming with foreachBatch and Databricks Connect

The following code throws an error locally in my IDE with Databricks-connect.  from databricks.connect import DatabricksSession spark = DatabricksSession.builder.getOrCreate() spark.sql("CREATE DATABASE IF NOT EXISTS sample") spark.sql("DROP TABLE I...

Latest Reply
olivier-soucy
Contributor
  • 4 kudos

I'm also trying to use the foreachBatch method of a Spark Streaming DataFrame with databricks-connect. Given that Spark Connect support was added to `foreachBatch` in 3.5.0, I was expecting this to work. Configuration: - DBR 15.4 (Spark 3.5.0) - data...
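For reference, the minimal shape of the pattern under discussion is sketched below (table names and checkpoint path are placeholders); whether foreachBatch actually runs over databricks-connect depends on the databricks-connect and DBR versions, which is exactly what this thread is about.

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

def write_batch(batch_df, batch_id):
    # Each micro-batch arrives as a regular DataFrame
    batch_df.write.mode("append").saveAsTable("sample.events_sink")

(
    spark.readStream
    .table("sample.events_source")
    .writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/Volumes/sample/default/checkpoints/events")
    .trigger(availableNow=True)
    .start()
    .awaitTermination()
)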

6 More Replies
Przemk00
by New Contributor II
  • 2021 Views
  • 5 replies
  • 0 kudos

DAB deploy missing packages

Hello everyone, not sure where to start, so I will do it from the beginning. I encountered an issue where my .whl files in the dist/ and packages/ directories were not being deployed to Databricks using dab-deploy. After investigating, I discovered ...

Data Engineering
DAB
deployment
packages
Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

A production-native approach is to maintain your artifacts in a central artifact repository (e.g. Artifactory) and use that location in your .yml for libraries. This way you ensure your artifacts are properly versioned, governed, and follow some sort of standards. I hope wi...

4 More Replies
Fatimah-Tariq
by New Contributor III
  • 948 Views
  • 2 replies
  • 0 kudos

Need help with DLT

I have a DLT pipeline on Databricks that has been running for months, and just now I found out that there has been an issue with the logic in the silver layer; as a result, the tables in my silver schema have faulty records now. Silver layer tables are ...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Fatimah-Tariq, what about defining your DLT pipeline as below? This way you will create a streaming table that reads from your silver layer, applies all the needed changes, and then writes back to your silver layer. %sql -- Read from the streaming table in the silver...

1 More Replies
RobDineen
by Contributor
  • 5646 Views
  • 6 replies
  • 1 kudos

Resolved! Delta tables in catalogue are showing but DO NOT exist

I am just working with Databricks, and have come across an issue where delta tables have been created in the catalogue but do not actually exist. See the screenshot for the script I've been running and the error messages. Is this a bug, or am I missing something o...

Latest Reply
Panda
Valued Contributor
  • 1 kudos

@RobDineen Can you try refreshing your table (REFRESH TABLE your_catalog.your_schema.your_table), followed by spark.catalog.clearCache()? Then try the drop operation: table_path = "dbfs:/user/hive/warehouse/season" dbutils.fs.rm(table_path, recurse=...
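Put together as one sequence, that suggestion looks like the sketch below (the table name and path are the placeholders used in the thread; note that dbutils.fs.rm permanently deletes the folder):

spark.sql("REFRESH TABLE your_catalog.your_schema.your_table")
spark.catalog.clearCache()

spark.sql("DROP TABLE IF EXISTS your_catalog.your_schema.your_table")

# If the folder is still left behind, remove it directly
table_path = "dbfs:/user/hive/warehouse/season"
dbutils.fs.rm(table_path, recurse=True)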

5 More Replies
johnb1
by Contributor
  • 2038 Views
  • 1 replies
  • 2 kudos

Programatically remove external table completely (Azure)

Hi! I have created external tables with data stored in an Azure storage account. Is there a way to not only drop the tables but also remove the underlying folder in the storage account which contains the table's data? I want to do this from Databricks ...

Latest Reply
Panda
Valued Contributor
  • 2 kudos

@johnb1 You can achieve this with the code below. Please review.
table_name = "table_name"
location = "abfss://container@storage-account.dfs.core.windows.net/path/to/table/data/"
spark.sql(f"DROP TABLE IF EXISTS {table_name}")
dbutils.fs.rm(location...
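A variant of the same idea that looks the location up from the table itself before dropping it (this works for Delta tables via DESCRIBE DETAIL; the table name is hypothetical and the rm call permanently deletes the data):

table_name = "my_catalog.my_schema.my_external_table"

# Fetch the table's storage location before dropping it
location = spark.sql(f"DESCRIBE DETAIL {table_name}").collect()[0]["location"]

spark.sql(f"DROP TABLE IF EXISTS {table_name}")
dbutils.fs.rm(location, recurse=True)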

DB3
by New Contributor II
  • 2226 Views
  • 2 replies
  • 0 kudos

read_files to create streaming table using Databricks SQL

I followed the same syntax from the documentation for creating a streaming table; it worked last week and is not working now. Example query: CREATE OR REFRESH STREAMING TABLE `ax`.`db`.`healthex` AS SELECT * FROM STREAM read_files("/Volumes/ax/db/dlt-test/", -- The file pat...

Latest Reply
DB3
New Contributor II
  • 0 kudos

I followed the syntax in this documentation link: https://docs.databricks.com/en/tables/streaming.html I get this error if the STREAM keyword is excluded: "Please add the STREAM keyword to your FROM clause to turn this relation into a streaming query." SQ...

1 More Replies
