Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dwalsh
by New Contributor III
  • 1905 Views
  • 2 replies
  • 0 kudos

Resolved! Cannot run ./Includes/Classroom-Setup-01.1 in Advanced Data Engineering with Databricks with 12

I started the Advanced Data Engineering with Databricks course and have tried to run the Includes code at the start. We recently had issues with 12.2 and moved to a newer version, as there appeared to be some issues around setuptools. If I run "%r...

Latest Reply
jainendrabrown
New Contributor II
  • 0 kudos

I am also having an issue running the first command itself. I am not sure how to download the Classroom-Setup data.

1 More Replies
MadelynM
by Databricks Employee
  • 14310 Views
  • 2 replies
  • 0 kudos

Delta Live Tables + S3 | 5 tips for cloud storage with DLT

You’ve gotten familiar with Delta Live Tables (DLT) via the quickstart and getting started guide. Now it’s time to tackle creating a DLT data pipeline for your cloud storage, with one line of code. Here’s how it’ll look when you're starting: CREATE OR ...

Latest Reply
waynelxb
New Contributor II
  • 0 kudos

Hi MadelynM, how should we handle Source File Archival and Data Retention with DLT? Source File Archival: once the data from the source file is loaded with the DLT Auto Loader, we want to move the source file from the source folder to an archival folder. How can we ...

1 More Replies
rkshanmugaraja
by New Contributor
  • 1818 Views
  • 1 replies
  • 0 kudos

Copy files and folder into Users area , but Files are not showing in UI

Hi all, I'm trying to copy the whole training directory (which contains multiple subfolders and files) from my catalog volume area to each user's area. From: "dbfs:/Volumes/CatalogName/schema/Training" To: "dbfs:/Workspace/Users/username@domain.com/Tr...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi, do you want to use the dbfs location on purpose, or do you want to upload the training notebooks to the Workspace/Users location? The reason I'm asking is that those are two different locations, although both are related to file management in Databricks. (see:...

slakshmanan
by New Contributor III
  • 898 Views
  • 1 replies
  • 0 kudos

How to view rows produced from the REST API in Databricks for long-running queries in Running state

print(f"Query ID: {query['query_id']}, Duration: {query['duration']} ms, User: {query['user_display_name']}, Query_execute: {query['query_text']}, Query_status: {query['status']}, rw: {query['rows_produced']}") I am able to get rows_produced only for...

Latest Reply
slakshmanan
New Contributor III
  • 0 kudos

https://{databricks_instance}.cloud.databricks.com/api/2.0/sql/history/queries?include_metrics=true
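For completeness, a minimal sketch of calling the Query History endpoint above from plain Python. The host value and the response fragment are illustrative assumptions (not the full API schema); the rows_produced field name is reused as-is from the post above:

```python
import json
from urllib.parse import urlencode

def build_query_history_url(host: str, include_metrics: bool = True) -> str:
    # Query History API: GET /api/2.0/sql/history/queries
    params = urlencode({"include_metrics": str(include_metrics).lower()})
    return f"https://{host}/api/2.0/sql/history/queries?{params}"

def rows_produced(query: dict):
    # As noted in the thread, rows_produced is typically populated only once
    # a query has finished, so guard with .get() for queries still RUNNING.
    return query.get("rows_produced")

# Illustrative response fragment (shape assumed, trimmed for the example)
sample = json.loads(
    '{"res": [{"query_id": "q1", "status": "FINISHED", "rows_produced": 42},'
    ' {"query_id": "q2", "status": "RUNNING"}]}'
)
print([rows_produced(q) for q in sample["res"]])  # [42, None]
```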

surajitDE
by New Contributor III
  • 768 Views
  • 1 replies
  • 0 kudos

How to stop subsequent iterations in Databricks loop feature?

How can I stop subsequent iterations in the Databricks loop feature? sys.exit(0) or dbutils.notebook.exit() only marks the current task (and the downstream tasks in sequence) as failed, but subsequent iterations continue.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @surajitDE, currently there is no out-of-the-box feature to achieve that. What you can do is implement notebook logic that, in case of error, cancels the for-each task run using the REST API or the Python SDK: use the /api/2.1/jobs/runs/cancel endpoi...
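A minimal sketch of the cancel call described in this reply, using only the standard library. The workspace host, token, and run_id are placeholders; in a real job the parent run_id would come from the job context or a task parameter:

```python
import json
from urllib.request import Request, urlopen

def cancel_run_request(host: str, token: str, run_id: int) -> Request:
    # Jobs API: POST /api/2.1/jobs/runs/cancel with body {"run_id": ...}
    return Request(
        f"https://{host}/api/2.1/jobs/runs/cancel",
        data=json.dumps({"run_id": run_id}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Build (but do not send) the request; urlopen(req) would execute it.
req = cancel_run_request("myworkspace.cloud.databricks.com", "dapi-xxx", 123456)
print(req.full_url)  # https://myworkspace.cloud.databricks.com/api/2.1/jobs/runs/cancel
```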

SamAdams
by Contributor
  • 1356 Views
  • 1 replies
  • 0 kudos

Resolved! Redacted check constraint condition in Delta Table

Hello! I have a delta table with a check constraint - it's one of many that a config-driven ETL pipeline of mine generates. When someone edits the config file and deploys the change, I'd like for the check constraint to be updated as well if it's dif...

Latest Reply
SamAdams
Contributor
  • 0 kudos

Figured this out with the help of @SamDataWalk 's post https://community.databricks.com/t5/data-engineering/databricks-bug-with-show-tblproperties-redacted-azure-databricks/m-p/93546. It happens because Databricks thinks certain keywords in the constra...

SamDataWalk
by New Contributor III
  • 4803 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks bug with show tblproperties - redacted - Azure databricks

I am struggling to report what is a fairly fundamental bug. Can anyone help? Ideally someone from Databricks themselves, or others who can confirm they can replicate it. There is a bug where Databricks seems to be hiding “any” properties which have th...

Latest Reply
SamAdams
Contributor
  • 2 kudos

Like your example, the redaction behavior seemed to pick up on the column name: a condition that included a column named "URL" was redacted, but one that included "modifiedDateTime" was not.

4 More Replies
DBX123
by New Contributor
  • 1176 Views
  • 1 replies
  • 1 kudos

Is it possible to have an alert when a row is added to a table?

I currently have a table that periodically has rows added (sometimes daily, sometimes over a month). I was hoping to have an alert for when a row is added to it. The table has date fields for when rows are loaded in. I have an alert working, but it j...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

You could set your alert with a query such as:
-- Custom alert to monitor for new rows added to a table
SELECT COUNT(*) AS new_rows
FROM your_table
WHERE event_time > current_timestamp() - interval '1' hour
In this example, the query checks for rows added t...

EDDatabricks
by Contributor
  • 3040 Views
  • 2 replies
  • 2 kudos

How to enforce a cleanup policy on job cluster logs

We have a number of jobs on our Databricks workspaces. All job clusters are configured with a dbfs location to save the respective logs (configured from Job cluster -> "Advanced options" -> "Logging"). However, the logs are retained in the dbfs indefi...

Latest Reply
Atanu
Databricks Employee
  • 2 kudos

@EDDatabricks Thanks for reaching out to us. I think you should check whether the Purge option is enabled. https://docs.databricks.com/en/administration-guide/workspace/settings/storage.html

1 More Replies
dyusuf
by New Contributor II
  • 2145 Views
  • 2 replies
  • 2 kudos

Unity Catalog

Can we set up Unity Catalog on Databricks Community Edition? If yes, please share the process. Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Another option you can try is to set up the open source version of Unity Catalog with Apache Spark, if you don't have the possibility to create an Azure trial account.

1 More Replies
kunex
by New Contributor III
  • 3773 Views
  • 3 replies
  • 2 kudos

Resolved! Asset Bundles Job If Condition

Hi, I am trying to use Asset Bundles to deploy a job with email notifications active only in the production environment. I tried to use such an if statement, but it does not seem to do anything.
resources:
  jobs:
    Master_Load:
      name: Master Load
      e...

Latest Reply
kunex
New Contributor III
  • 2 kudos

Okay, I made it work with a variable; it was pretty easy. The variable accepted multiline text easily. Thanks!
databricks.yml:
targets:
  dev:
    mode: development
    ...
variables:
  notifications:
    default:
      - user1@smth.com
...

2 More Replies
antr
by New Contributor II
  • 3795 Views
  • 3 replies
  • 0 kudos

What does DLT INITIALIZING phase do?

In Delta Live Tables, the INITIALIZING phase sometimes takes a minute, sometimes five minutes. I'd like to learn what it is doing in the background, and whether it can be optimized in any way.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @antr, In Delta Live Tables (DLT), a feature of Databricks, the "Initializing" phase refers to the first step in the lifecycle of a DLT pipeline run. During this phase, the platform sets up the necessary resources, configurations, and dependencies...

2 More Replies
noorbasha534
by Valued Contributor II
  • 2517 Views
  • 1 replies
  • 0 kudos

Databricks as a "pure" data streaming software like Confluent

Dears, I was wondering if anyone has leveraged Databricks as a "pure" data streaming software in place of Confluent, Flink, Kafka, etc. I see the reference architectures placing Databricks on the data processing side mostly, once data is made available by...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @noorbasha534 ,It depends on what you're asking for. Kafka is primarily a messaging system, optimized for handling high-throughput, distributed message logs. Databricks can read from Kafka as a data source but doesn't replace Kafka's role in messa...

mayur_05
by New Contributor II
  • 1119 Views
  • 1 replies
  • 0 kudos

how to get node and executor id and log

Hi Team, we have a df with a 70M row count and we are calling an API for 6000 rows per set using df.repartition(rep_count).foreachPartition(func_name). In func_name we call the API POST request for that partition, but when we are trying to print/log somet...

Latest Reply
mayur_05
New Contributor II
  • 0 kudos

any update on this???

tliuzillow
by New Contributor
  • 706 Views
  • 1 replies
  • 1 kudos

Streaming Live Table - What is actually computed?

Can anyone please share, in a DLT or Structured Streaming task, what group of rows is computed? Specific scenarios: 1. When a streaming table A joins a Delta table B, is each of the minibatches in A joined with the whole Delta table? Does Spark compute t...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @tliuzillow,
1. Stream-static join: each minibatch from the streaming table (A) is joined with the entire Delta table (B).
2. Stream-stream join: each minibatch from the streaming table (A) is joined with a minibatch from the streaming table (B). Howe...
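As a toy illustration of the stream-static case (plain Python standing in for Spark, with made-up data): each minibatch is joined against the static side as it exists at join time, so later minibatches see updates to the static table.

```python
# Toy model of a stream-static join (plain Python, not Spark):
# every minibatch from the stream is joined against the *entire*
# static table, not a snapshot from the start of the stream.
static_table = {"u1": "Berlin", "u2": "Oslo"}  # user_id -> city

def join_minibatch(minibatch, static):
    # Inner join: keep stream rows whose key exists on the static side.
    return [(uid, amt, static[uid]) for uid, amt in minibatch if uid in static]

batch1 = [("u1", 10), ("u3", 30)]            # u3 has no static match yet
print(join_minibatch(batch1, static_table))  # [('u1', 10, 'Berlin')]

static_table["u3"] = "Lima"                  # static side updated between batches
batch2 = [("u3", 31)]
print(join_minibatch(batch2, static_table))  # [('u3', 31, 'Lima')]
```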

