Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Srini41
by New Contributor
  • 545 Views
  • 1 reply
  • 0 kudos

org.rocksdb.RocksDBException: No space left on device

A Structured Streaming job is failing intermittently with the following message: org.rocksdb.RocksDBException: No space left on device. Is there a setting on Databricks to assign limited disk space to the checkpoint tracking? Appreciate any help with resolving ...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Can you share details like the DBR version, and are you using any forEachBatch?

my_community2
by New Contributor III
  • 16733 Views
  • 9 replies
  • 6 kudos

Resolved! dropping a managed table does not remove the underlying files

The documentation states that DROP TABLE "Deletes the table and removes the directory associated with the table from the file system if the table is not an EXTERNAL table. An exception is thrown if the table does not exist." In case of an external table...

Latest Reply
MajdSAAD_7953
New Contributor II
  • 6 kudos

Hi, is there a way to force-delete files after dropping a table, rather than waiting 30 days to see the size in S3 decrease? The tables I dropped relate to dev and staging; I don't want to keep their files for 30 days.

8 More Replies
PearceR
by New Contributor III
  • 13862 Views
  • 4 replies
  • 1 kudos

Resolved! custom upsert for delta live tables apply_changes()

Hello community :). I am currently implementing some pipelines using DLT. They are working great for my medallion architecture: landed JSON in bronze -> silver (using apply_changes), then materialized gold views on top. However, I am attempting to crea...

Latest Reply
Harsh141220
New Contributor II
  • 1 kudos

Is it possible to have custom upserts for streaming tables in Delta Live Tables? I'm getting the error: pyspark.errors.exceptions.captured.AnalysisException: `blusmart_poc.information_schema.sessions` is not a Delta table.

3 More Replies
Ambika
by New Contributor
  • 5299 Views
  • 2 replies
  • 1 kudos

Error while Resetting my Community Edition Password

I recently tried to create my account with Databricks Community Edition. I have signed up for it and received the verification email. After that I had to reset my password, but while doing so I always get the following error. Can someone help me ...

Latest Reply
swredb
New Contributor II
  • 1 kudos

Receiving the same error when creating a new account - "An error has occurred. Please try again later."

1 More Replies
pinaki1
by New Contributor III
  • 1251 Views
  • 1 reply
  • 0 kudos

PySparkRuntimeError: [CONTEXT_ONLY_VALID_ON_DRIVER] It appears that you are attempting to reference

Getting the above error for this line: result_df.rdd.foreachPartition(self.process_partition)

Latest Reply
Pradeep54
Databricks Employee
  • 0 kudos

The error message "CONTEXT_ONLY_VALID_ON_DRIVER" indicates that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that runs on workers. This is ...

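The mechanism the reply describes can be seen in miniature without Spark at all: a bound method like self.process_partition drags its whole instance, including any driver-only handle stored on it, through the serializer. A plain-Python sketch, using pickle as a stand-in for Spark's serializer (all class and variable names here are illustrative, not from the thread):

```python
import pickle

class DriverOnly:
    """Stand-in for SparkContext: like the real one, it must never leave the driver."""
    def __reduce__(self):
        raise RuntimeError("CONTEXT_ONLY_VALID_ON_DRIVER")

class Job:
    def __init__(self):
        self.sc = DriverOnly()              # driver-side handle kept on the instance

    def process_partition(self, rows):      # bound method: serializing it drags self (and sc) along
        return [r * 2 for r in rows]

def process_partition_free(rows):           # module-level function: captures no driver state
    return [r * 2 for r in rows]

job = Job()
try:
    pickle.dumps(job.process_partition)     # roughly what Spark does before shipping work to executors
    shipped = True
except Exception:
    shipped = False
# shipped is False: the bound method cannot be serialized, because it pulls in the driver-only handle
```

The usual fix follows from this: pass a module-level function (or a static method that touches no driver-side state) to foreachPartition instead of a bound method.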
Saf4Databricks
by New Contributor III
  • 2949 Views
  • 4 replies
  • 0 kudos

Reading JSON from Databricks Workspace

I am using the second example from Databricks' official document here: Work with workspace files. But I'm getting the following error. Question: What could be the cause of the error, and how can we fix it? ERROR: Since Spark 2.3, the queries from raw JSON/CSV fi...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Saf4Databricks, as you said, you probably need to add the multiline option to make it work. You can use this option when creating a temporary view or via the PySpark API. Below is an example of creating a temporary view: CREATE TEMPORARY VIEW multilineJson U...

3 More Replies
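The multiline option mentioned in the reply exists because Spark's JSON reader assumes one complete JSON object per line by default. The failure mode can be reproduced in plain Python, with the json module standing in for the reader (the file contents here are illustrative); in Spark the equivalent switch would be .option("multiLine", "true") on the reader, which is worth verifying against the current docs:

```python
import json

# A pretty-printed JSON document spanning several lines, as workspace files often are.
doc = """{
  "name": "example",
  "value": 42
}"""

# Parsing line by line (the one-object-per-line assumption) fails on the very first line:
line_parse_ok = True
try:
    for line in doc.splitlines():
        json.loads(line)
except json.JSONDecodeError:
    line_parse_ok = False

whole = json.loads(doc)  # parsing the whole document at once succeeds
```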
jaredrohe
by New Contributor III
  • 4970 Views
  • 5 replies
  • 2 kudos

Instance Profiles Do Not Work with Delta Live Tables Default Cluster Policy Access Mode "Shared"

Hello, I am attempting to configure Auto Loader in file notification mode with Delta Live Tables. I configured an instance profile, but it is not working because I immediately get AWS access-denied errors. This is the same issue that is referenced here...

Data Engineering
Access Mode
Delta Live Tables
Instance Profiles
No Isolation Shared
Latest Reply
AcrobaticMonkey
New Contributor II
  • 2 kudos

Same issue here: the instance profile works fine for both no-isolation and single-user access modes, but not for shared.

4 More Replies
IsmaelHenzel1
by New Contributor II
  • 1187 Views
  • 1 reply
  • 1 kudos

Resolved! Delta Live Tables - ForeachBatch

I am wondering how to create complex streaming queries using Delta Live Tables (DLT). I can't find a way to use foreachBatch with it, and this is causing me some difficulty. I need to create a window using a lag without a time range, which is not pos...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi @IsmaelHenzel1, how are you doing today? As per my understanding, consider using Delta Live Tables (DLT) materialized views to handle complex streaming logic, as DLT doesn't currently support foreachBatch. For windowing with lag, DLT materialized views ...

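The lag-without-a-time-range the question asks about is, in batch terms, just a previous-value-per-key lookup. A plain-Python sketch of that logic (names and data are illustrative; this is the computation one would otherwise express with LAG over a row-based window, e.g. in a materialized view):

```python
def lag_by_key(rows):
    """For each (key, value) row, emit (key, value, previous value seen for the same key)."""
    last = {}                                    # running state: most recent value per key
    out = []
    for key, value in rows:
        out.append((key, value, last.get(key)))  # None when there is no prior row for the key
        last[key] = value
    return out

rows = [("a", 1), ("a", 3), ("b", 5), ("a", 7)]
result = lag_by_key(rows)
# result: [("a", 1, None), ("a", 3, 1), ("b", 5, None), ("a", 7, 3)]
```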
indianaDE
by New Contributor
  • 634 Views
  • 1 reply
  • 0 kudos

%run and Repos path error

We have one notebook (N1) which uses the %run command to call a second notebook (N2), which in turn calls a third notebook (N3) using %run. When running the %run cell within N2, N3 is successfully called and run. When running the %run cell within N1, we get...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi @indianaDE, how are you doing today? As per my understanding, consider checking the relative paths you're using in the %run commands, as the recent update might have changed how Databricks resolves paths for notebooks under the new Workspace/Repos ...

oakhill
by New Contributor III
  • 1909 Views
  • 3 replies
  • 0 kudos

Cannot develop Delta Live Tables using Runtime 14 or 15.

When trying to develop a Delta Live Tables pipeline with my very generic clusters (runtime 14.3 or 15.4 LTS), I get the following error: The Delta Live Tables (DLT) module is not supported on this cluster. You should either create a new pipeline or us...

Latest Reply
zoe-durand
Databricks Employee
  • 0 kudos

Hi @oakhill , as stated above, in order for DLT notebooks to work well you need to create a pipeline (which it sounds like you did!). You are correct - running a notebook cell will trigger a "Validate" action on the entire pipeline code. Alternativel...

2 More Replies
Sampath_Kumar
by New Contributor II
  • 12533 Views
  • 2 replies
  • 0 kudos

Volume Limitations

I have a use case to create a table using JSON files. There are 36 million files in the upstream (S3 bucket). I just created a volume on top of it, so the volume has 36M files. I'm trying to form a data frame by reading this volume using the below sp...

yagmur
by New Contributor II
  • 826 Views
  • 1 reply
  • 0 kudos

Authentication error on Git status fetch

When I try to change the branch, I cannot; it says I need to create a repo. Then I try to create a repo, but it says my Git credentials need to be corrected. I tried both an access token and Azure Active Directory, but it is still not working. Do I need anoth...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Hi Yagmur, You should not need admin access in the workspace to create Git folders, but you need access to the remote repository you are trying to clone. Can you check your token by cloning the remote repo locally? If you continue to run into issues,...

JeremyFord
by New Contributor III
  • 1336 Views
  • 2 replies
  • 0 kudos

Resolved! Asset Bundles - Workspace or GIT?

We are just starting down the path of migrating from DBX to DAB. I have been able to successfully use DAB as per all the available documentation.  We are very keen to use DAB for development deployments by the data engineering team and the benefits i...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Hi Jeremy,  When using a DAB, the job reads from the workspace source, not the Git source. We will update the doc page to include DAB as an option and specifically call out this point to avoid future confusion. Check out this example in our talk wher...

1 More Replies
drag7ter
by Contributor
  • 1480 Views
  • 1 reply
  • 0 kudos

Configure Service Principal access to GitLab

I'm facing an issue while trying to run my job in Databricks with my notebooks located in GitLab. When I run the job under my personal user ID it works fine, because I added a GitLab token to my user profile and the job is able to pull the branch from the repository. But whe...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Hello from the Databricks Git PM: we have a section in the documentation for setting up Git credentials for an SP. The important step is to use the OBO token for the SP when you call the Git credentials API. https://docs.databricks.com/en/repos/ci-cd-t...

