Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

KendraVant (New Contributor II)
  • 17259 Views
  • 7 replies
  • 2 kudos

Resolved! How do I clear all output results in a notebook?

I'm building notebooks for tutorial sessions and I want to clear all the output results from the notebook before distributing it to the participants. This functionality exists in Jupyter but I can't find it in Databricks. Any pointers?

Latest Reply
holly (Databricks Employee)
  • 2 kudos

Yes! Run > Clear > Clear all cell outputs. Fun fact: this feature was added ~10 years ago when we realised all our customer demos looked very messy and had lots of spoilers in them!
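If you also distribute exported .ipynb copies, the same cleanup can be scripted; a minimal sketch using the nbformat package (the file name is illustrative):

import nbformat

# Load the exported notebook, drop all code-cell outputs, and save in place.
nb = nbformat.read("tutorial.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "tutorial.ipynb")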

6 More Replies
kmaley (New Contributor)
  • 747 Views
  • 1 reply
  • 1 kudos

Concurrent append exception - two streaming sources writing to the same record on the Delta table

Hi All, I have a scenario where there are 2 streaming sources, Stream1 (id, col1, col2) and Stream2 (id, col3, col4), and my delta table has columns (id, col1, col2, col3, col4). My requirement is to insert the record into the delta table if the corre...

Latest Reply
Witold (Honored Contributor)
  • 1 kudos

I would keep both write operations separate, i.e. they should write to their own tables/partitions. In later stages (e.g. silver), you can easily merge them.
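A minimal sketch of that pattern; the table names, checkpoint paths, and merge key are illustrative, not from the thread:

from pyspark.sql import functions as F
from delta.tables import DeltaTable

# Stand-ins for the two source streams from the question; a real job
# would read from Kafka or another source instead of "rate".
stream1_df = (spark.readStream.format("rate").load()
              .select(F.col("value").alias("id"),
                      F.lit("a").alias("col1"), F.lit("b").alias("col2")))
stream2_df = (spark.readStream.format("rate").load()
              .select(F.col("value").alias("id"),
                      F.lit("c").alias("col3"), F.lit("d").alias("col4")))

# Each stream writes to its own bronze table, so the two writers never
# touch the same Delta files and cannot conflict.
q1 = (stream1_df.writeStream.format("delta")
      .option("checkpointLocation", "/chk/bronze_stream1")  # illustrative path
      .toTable("bronze_stream1"))
q2 = (stream2_df.writeStream.format("delta")
      .option("checkpointLocation", "/chk/bronze_stream2")  # illustrative path
      .toTable("bronze_stream2"))

# A downstream (silver) job can then combine the two halves per id,
# e.g. from a foreachBatch sink:
def merge_into_silver(batch_df, batch_id):
    (DeltaTable.forName(batch_df.sparkSession, "silver_table").alias("t")
     .merge(batch_df.alias("s"), "t.id = s.id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())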

kiranpeesa (New Contributor)
  • 795 Views
  • 1 reply
  • 0 kudos

Error in notebook execution

Error in callback <bound method UserNamespaceCommandHook.post_run_cell of <dbruntime.DatasetInfo.UserNamespaceCommandHook object at 0x7f5790c07070>> (for post_run_cell)

Latest Reply
Witold (Honored Contributor)
  • 0 kudos

Can you show us your code?

jesonora (New Contributor II)
  • 1205 Views
  • 1 reply
  • 0 kudos

Enable serverless in Delta Live Tables in Azure Databricks?

I'm trying to create a serverless DLT pipeline; as far as I understood, it is in Public Preview as listed here: Azure Databricks regions - Azure Databricks | Microsoft Learn. I've created a workspace in North Europe, but I cannot see the feature in preview. Could you ...

Latest Reply
jesonora (New Contributor II)
  • 0 kudos

Hi @Retired_mod, thanks for the quick response. I have checked that my region (North Europe) is in Public Preview for DLT, but I am not able to see the "Serverless" checkbox. Am I missing some detail? Thanks!

  • 0 kudos
RajaPalukuri (New Contributor II)
  • 2573 Views
  • 3 replies
  • 0 kudos

Databricks Terraform (condition_task)

Hi Team, I am planning to create an IF/ELSE condition task in Databricks using Terraform code. My requirement is Task A (extract records from DB and count recs) --> Task B (validate the counts using condition_task) --> Task C (load data if Task B va...

Latest Reply
hendrykarlar (New Contributor II)
  • 0 kudos

Implementing conditional logic in Databricks using Terraform involves setting up tasks and condition checks between them. Here's how you can structure your Terraform code to achieve the desired workflow: Step 1: Define Databricks notebooks as tasks. As...
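The reply above is cut off. For illustration only, here is the same IF/ELSE job shape expressed with the Databricks Python SDK rather than HCL; the job name, notebook paths, and the row_count task value are hypothetical (the upstream notebook would publish it with dbutils.jobs.taskValues.set). The Terraform databricks_job resource takes the same task / condition_task / depends_on structure.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
w.jobs.create(
    name="extract-validate-load",  # hypothetical job name
    tasks=[
        # Task A: extract records and publish a row count as a task value.
        jobs.Task(
            task_key="task_a",
            notebook_task=jobs.NotebookTask(notebook_path="/Jobs/extract"),
        ),
        # Task B: an IF/ELSE condition task comparing that count to zero.
        jobs.Task(
            task_key="task_b",
            depends_on=[jobs.TaskDependency(task_key="task_a")],
            condition_task=jobs.ConditionTask(
                op=jobs.ConditionTaskOp.GREATER_THAN,
                left="{{tasks.task_a.values.row_count}}",
                right="0",
            ),
        ),
        # Task C: runs only on the "true" outcome of Task B.
        jobs.Task(
            task_key="task_c",
            depends_on=[jobs.TaskDependency(task_key="task_b", outcome="true")],
            notebook_task=jobs.NotebookTask(notebook_path="/Jobs/load"),
        ),
    ],
)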

2 More Replies
NCat (New Contributor III)
  • 14767 Views
  • 7 replies
  • 3 kudos

How can I start a SparkSession outside of a notebook?

Hi community, how can I start a SparkSession outside of a notebook? I want to split my notebook into small Python modules, and I want to let some of them call Spark functionality.

Latest Reply
jacovangelder (Honored Contributor)
  • 3 kudos

Just take over the Databricks SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()
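A sketch of how a shared module can pick up that session while still working in local tests (the helper name is illustrative):

from pyspark.sql import SparkSession

def get_spark() -> SparkSession:
    # In a Databricks notebook or job an active session already exists;
    # fall back to building one for local runs and unit tests.
    return SparkSession.getActiveSession() or SparkSession.builder.getOrCreate()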

6 More Replies
amartinez (New Contributor III)
  • 5041 Views
  • 6 replies
  • 5 kudos

Workaround for GraphFrames not working on Delta Live Table?

According to this page, the GraphFrames package has been included in the Databricks Runtime since at least 11.0. However, trying to run a connected components algorithm inside a Delta Live Tables notebook yields the error java.lang.ClassNotFoundException: or...

Latest Reply
lprevost (Contributor II)
  • 5 kudos

I'm also trying to use GraphFrames inside a DLT pipeline. I get an error that graphframes is not installed on the cluster. I'm using it successfully in test notebooks on the ML version of the cluster. Is there a way to use this inside a DLT job?
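For reference, a minimal sketch of the connected-components call being discussed, as it runs on clusters where GraphFrames is available (the data is illustrative; id/src/dst are the column names GraphFrames expects):

from graphframes import GraphFrame

vertices = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
edges = spark.createDataFrame([("a", "b")], ["src", "dst"])

# connectedComponents() requires a checkpoint directory to be set.
spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")
components = GraphFrame(vertices, edges).connectedComponents()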

5 More Replies
anni (New Contributor II)
  • 1464 Views
  • 2 replies
  • 0 kudos

Classroom setup Error

Encountering an error when running the classroom setup command. Help me resolve this issue. Thank you.

Screenshot_20240628-033819_Chrome.jpg
Latest Reply
jacovangelder (Honored Contributor)
  • 0 kudos

The error happens in the classroom-setup notebook you're running. It is not possible to debug with the information given. 

1 More Replies
ndatabricksuser (New Contributor)
  • 2373 Views
  • 2 replies
  • 1 kudos

Vacuum and Streaming Issue

Hi User Community, requesting some advice on the below issue please: I have 4 Databricks notebooks, one that ingests data from a Kafka topic (metric data from many servers) and dumps the data in parquet format into a specified location. My 2nd data brick...

Data Engineering
Delta Lake
optimize
spark
structured streaming
vacuum
Latest Reply
mroy (Contributor)
  • 1 kudos

Vacuuming is also a lot faster with inventory tables!
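A sketch of what an inventory-based VACUUM looks like, assuming a recent Databricks Runtime that supports the USING INVENTORY clause (table and inventory names are illustrative):

# The inventory query must return the file listing in the schema
# path STRING, length BIGINT, isDir BOOLEAN, modificationTime BIGINT.
spark.sql("""
    VACUUM main.metrics.events
    USING INVENTORY (
        SELECT path, length, isDir, modificationTime
        FROM main.metrics.events_file_inventory
    )
    RETAIN 168 HOURS
""")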

1 More Replies
nolanlavender00 (New Contributor)
  • 5595 Views
  • 2 replies
  • 1 kudos

Resolved! How to stop a Streaming Job based on time of the week

I have an always-on job cluster triggering Spark Streaming jobs. I would like to stop this streaming job once a week to run table maintenance. I was looking to leverage the foreachBatch function to check a condition and stop the job accordingly.

Latest Reply
mroy (Contributor)
  • 1 kudos

You could also use the "Available-now micro-batch" trigger. It only processes one batch at a time, and you can do whatever you want in between batches (sleep, shut down, vacuum, etc.)
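A sketch of that pattern, with illustrative table names and checkpoint path: drain everything that is available, stop, run maintenance when it is due, and start again:

import time
from datetime import datetime

while True:
    query = (spark.readStream.table("bronze_metrics")
             .writeStream.format("delta")
             .option("checkpointLocation", "/chk/silver_metrics")
             .trigger(availableNow=True)  # process the backlog, then stop
             .toTable("silver_metrics"))
    query.awaitTermination()  # returns once the stream has caught up

    # The stream is stopped here, so table maintenance is safe.
    if datetime.now().weekday() == 6:  # e.g. every Sunday
        spark.sql("OPTIMIZE silver_metrics")
        spark.sql("VACUUM silver_metrics")

    time.sleep(300)  # pause before draining the next batch of data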

1 More Replies
aranjan99 (New Contributor III)
  • 2758 Views
  • 3 replies
  • 2 kudos

system.billing.usage table missing data for jobs running in my Databricks account

I have some jobs running on Databricks. I can obtain their jobId from the Jobs UI or the List Job Runs API. However, when trying to get DBU usage for the corresponding jobs from system.billing.usage, I do not see the same job_id in that table. It's been mor...
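For reference, job-level usage is normally keyed through the usage_metadata struct in that table; a sketch (the job id value is illustrative):

spark.sql("""
    SELECT usage_date, sku_name, usage_quantity,
           usage_metadata.job_id, usage_metadata.job_run_id
    FROM system.billing.usage
    WHERE usage_metadata.job_id = '123456789'
""").show()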

dm7 (New Contributor II)
  • 3256 Views
  • 1 reply
  • 0 kudos

Unit Testing DLT Pipelines

Now that we are moving our DLT pipelines into production, we would like to start looking at unit testing the transformation logic inside DLT notebooks. We want to know how we can unit test the PySpark logic/transformations independently without having to s...
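One common approach (a sketch, not from the thread): keep each transformation in a plain function that the DLT notebook merely wraps, then exercise that function with a local SparkSession; all names here are illustrative.

from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F

def add_revenue(df: DataFrame) -> DataFrame:
    # Pure transformation: no dlt import, so it runs anywhere.
    return df.withColumn("revenue", F.col("price") * F.col("qty"))

# In the DLT notebook the @dlt.table function just delegates:
#   @dlt.table
#   def silver_orders():
#       return add_revenue(dlt.read("bronze_orders"))

# In a pytest module, run it against a local session:
def test_add_revenue():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    out = add_revenue(spark.createDataFrame([(2.0, 3)], ["price", "qty"]))
    assert out.first()["revenue"] == 6.0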

