Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

oakhill
by New Contributor III
  • 2491 Views
  • 3 replies
  • 0 kudos

Cannot develop Delta Live Tables using Runtime 14 or 15.

When trying to develop a Delta Live Tables pipeline with my very generic clusters (runtime 14.3 or 15.4 LTS), I get the following error: The Delta Live Tables (DLT) module is not supported on this cluster. You should either create a new pipeline or us...

Latest Reply
zoe-durand
Databricks Employee
  • 0 kudos

Hi @oakhill , as stated above, in order for DLT notebooks to work well you need to create a pipeline (which it sounds like you did!). You are correct - running a notebook cell will trigger a "Validate" action on the entire pipeline code. Alternativel...

2 More Replies
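For anyone hitting the same error: DLT source code only runs as part of a pipeline, not on an interactive all-purpose cluster. A minimal sketch of a DLT notebook, with hypothetical table names and volume path, assuming the notebook is attached to a pipeline:

```python
# Minimal Delta Live Tables notebook sketch (table names and path are hypothetical).
# This code only executes when run by a DLT pipeline; running the cell directly on a
# 14.3/15.4 cluster produces the "DLT module is not supported on this cluster" error.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested with Auto Loader")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/my_catalog/my_schema/landing/events/")  # hypothetical path
    )

@dlt.table(comment="Cleaned events")
def clean_events():
    return dlt.read_stream("raw_events").where(F.col("event_type").isNotNull())
```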
Sampath_Kumar
by New Contributor II
  • 13303 Views
  • 2 replies
  • 0 kudos

Volume Limitations

I have a use case to create a table using JSON files. There are 36 million files in the upstream (S3 bucket). I just created a volume on top of it, so the volume has 36M files. I'm trying to form a DataFrame by reading this volume using the below sp...

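One hedged approach for a volume with this many small files is to ingest it incrementally with Auto Loader instead of a single spark.read over the whole directory, so file discovery and schema inference are checkpointed. A sketch, with hypothetical catalog, schema, and path names:

```python
# Incremental ingestion of a Unity Catalog volume containing ~36M JSON files.
# Auto Loader checkpoints file discovery, which scales better than one-shot
# spark.read.json() over the whole volume. All names and paths are hypothetical.
source = "/Volumes/my_catalog/my_schema/raw_json/"
checkpoint = "/Volumes/my_catalog/my_schema/_checkpoints/raw_json"

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint)
    .load(source)
)

(df.writeStream
   .option("checkpointLocation", checkpoint)
   .trigger(availableNow=True)   # process the existing backlog, then stop
   .toTable("my_catalog.my_schema.events_bronze"))
```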
yagmur
by New Contributor II
  • 1027 Views
  • 1 reply
  • 0 kudos

Authentication error on Git status fetch

When I try to change the branch I cannot; it says I need to create a repo. Then I try to create a repo, but it says my Git credentials need to be corrected. I tried both an access token and also Azure Active Directory, but it is still not working. Do I need anoth...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Hi Yagmur, You should not need admin access in the workspace to create Git folders, but you need access to the remote repository you are trying to clone. Can you check your token by cloning the remote repo locally? If you continue to run into issues,...

JeremyFord
by New Contributor III
  • 1733 Views
  • 2 replies
  • 0 kudos

Resolved! Asset Bundles - Workspace or Git?

We are just starting down the path of migrating from DBX to DAB. I have been able to successfully use DAB as per all the available documentation.  We are very keen to use DAB for development deployments by the data engineering team and the benefits i...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Hi Jeremy,  When using a DAB, the job reads from the workspace source, not the Git source. We will update the doc page to include DAB as an option and specifically call out this point to avoid future confusion. Check out this example in our talk wher...

1 More Replies
drag7ter
by Contributor
  • 1767 Views
  • 1 reply
  • 0 kudos

Configure Service Principal access to GitLab

I'm facing an issue while trying to run my job in Databricks with my notebooks located in GitLab. When I run the job under my personal user ID it works fine, because I added a GitLab token to my user profile and the job is able to pull the branch from the repository. But whe...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Hello from the Databricks Git PM: We have a section in the documentation for setting up Git credentials for a SP. The important step is to use the OBO token for the SP when you call the git credential API. https://docs.databricks.com/en/repos/ci-cd-t...

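A rough sketch of the step described in the reply, assuming you already have an OAuth/on-behalf-of token for the service principal. It calls the Git Credentials API (/api/2.0/git-credentials); the workspace host, tokens, and username below are placeholders:

```python
# Register GitLab credentials for a service principal via the Git Credentials API.
# The request must be authenticated AS the service principal (e.g. with an OBO/OAuth
# token), so the Git credential is attached to the SP rather than to a human user.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
SP_TOKEN = "<token-obtained-for-the-service-principal>"   # placeholder
GITLAB_PAT = "<gitlab-personal-access-token>"             # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/git-credentials",
    headers={"Authorization": f"Bearer {SP_TOKEN}"},
    json={
        "git_provider": "gitLab",
        "git_username": "my-service-account",   # placeholder
        "personal_access_token": GITLAB_PAT,
    },
)
resp.raise_for_status()
print(resp.json())
```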
Prasad_Koneru
by New Contributor III
  • 1654 Views
  • 1 reply
  • 0 kudos

How to export metadata of catalog objects

Hi all, I want to export the metadata of catalog objects (schemas, tables, volumes, functions, models) and import the metadata into another catalog. Do we have any ready-made process/notebook/method/API available to do this? Please help with this. Thanks in ...

Latest Reply
timsale
New Contributor II
  • 0 kudos

What about exporting data from Unity Catalog so we can sync with internal systems?

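There is no single built-in export, but one hedged approach is to read the source catalog's information_schema views and regenerate DDL for replay against the target catalog. A minimal sketch covering tables only (catalog name is hypothetical; schemas, volumes, functions, and models would need their own listing and DDL steps):

```python
# Sketch: list table metadata from a source catalog via information_schema, then
# regenerate DDL with SHOW CREATE TABLE. The catalog name is hypothetical.
src_catalog = "source_catalog"

tables = spark.sql(f"""
    SELECT table_schema, table_name
    FROM {src_catalog}.information_schema.tables
    WHERE table_schema NOT IN ('information_schema')
""").collect()

for t in tables:
    ddl = spark.sql(
        f"SHOW CREATE TABLE {src_catalog}.{t.table_schema}.{t.table_name}"
    ).first()[0]
    print(ddl)   # rewrite the catalog name, then run against the target catalog
```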
ChrisLawford_n1
by New Contributor III
  • 1259 Views
  • 1 reply
  • 0 kudos

Delta Live Tables: How to determine what batch you have processed up to?

Hello, I am trying to use Delta Live Tables in a production setting, but I am having an issue in ensuring that I will be able to confirm the status of the data that the various tables have processed in the pipeline. In the most basic case, let me imagine...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @ChrisLawford_n1, maybe instead of passing the table name to the cloud_files_state function, pass the checkpoint location of the DLT pipeline. In your function you can add a condition to check whether that location exists. To find the checkpoint location for a DLT pipeline...

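A small sketch of that suggestion: cloud_files_state also accepts a checkpoint path, so you can point it at the pipeline's Auto Loader checkpoint to see which files have been committed. The storage path below is hypothetical and has to be looked up for your pipeline:

```python
# Query Auto Loader's file-level state for a DLT pipeline by checkpoint location.
# The checkpoint lives under the pipeline's storage location; the exact path here
# is hypothetical.
checkpoint = (
    "abfss://dlt@mystorage.dfs.core.windows.net/"
    "pipelines/<pipeline-id>/checkpoints/raw_events/0"
)

processed = spark.sql(f"SELECT * FROM cloud_files_state('{checkpoint}')")
processed.select("path", "size", "commit_time").show(truncate=False)
```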
tonyd
by New Contributor II
  • 889 Views
  • 1 reply
  • 1 kudos
Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @tonyd, if you want to use serverless, first of all you need to enable it. When enabled, simply do not specify a cluster when sending the request to the REST API: import requests import json # Your Databricks domain DATABRICKS_DOMAIN = '<>' # Pe...

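To complete the truncated snippet, here is a hedged sketch of a one-time run submitted via the Jobs API with no cluster specification, which is how serverless compute is picked up when it is enabled for the workspace. The domain, token, and notebook path are placeholders:

```python
# Submit a one-time notebook run with no cluster spec; with serverless jobs enabled,
# the run uses serverless compute. Domain, token and notebook path are placeholders.
import requests

DATABRICKS_DOMAIN = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

payload = {
    "run_name": "serverless-example",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Workspace/Users/me@example.com/demo"},
            # note: no new_cluster / existing_cluster_id / job_cluster_key here
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_DOMAIN}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())   # contains the run_id
```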
Paddy_chu
by New Contributor III
  • 31793 Views
  • 3 replies
  • 3 kudos

How to restart the kernel on my notebook in Databricks?

While installing a Python package on my Databricks notebook, I kept getting a message saying: "Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages." I've tried restarting my cluster, and also detach ...

(attached screenshot: error message)
Latest Reply
johnb1
Contributor
  • 3 kudos

@Evan_MCK Follow-up question: When other notebooks run Python code on the same cluster, will those runs be aborted when dbutils.library.restartPython() is called?

2 More Replies
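For anyone landing on this thread, the restart is a notebook-scoped call rather than a cluster restart. A minimal sketch of the usual pattern (package name and version are just an example):

```python
# Cell 1: install a notebook-scoped package
%pip install requests==2.32.3

# Cell 2: restart this notebook's Python process so the updated package is importable.
# Only the current notebook's Python state is reset; the cluster itself keeps running.
dbutils.library.restartPython()

# Cell 3: verify the new version is picked up
import requests
print(requests.__version__)
```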
Raja_Databricks
by New Contributor III
  • 11711 Views
  • 6 replies
  • 6 kudos

Resolved! Liquid Clustering With Merge

Hi there, I'm working with a large Delta table (2 TB) and I'm looking for the best way to efficiently update it with new data (10 GB). I'm particularly interested in using Liquid Clustering for faster queries, but I'm unsure if it supports updates effic...

Latest Reply
RV-Gokul
New Contributor II
  • 6 kudos

@youssefmrini @erigaud I have a similar issue, and I've pretty much tried the solution mentioned above. However, I'm not noticing any changes when I use a temporary table or persist the table. My main table contains 3.1 terabytes of data with 42 billi...

5 More Replies
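For reference, a hedged sketch of the pattern discussed in this thread: cluster the large target table by the merge key, run the MERGE with the new data, and let OPTIMIZE incrementally cluster what was written. Table and column names are hypothetical:

```python
# Liquid clustering + MERGE sketch. Table and column names are hypothetical.
# Clustering by the join key helps the MERGE prune files on the 2 TB target.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT,
        customer_id BIGINT,
        order_ts TIMESTAMP,
        amount DOUBLE
    )
    CLUSTER BY (order_id)
""")

spark.sql("""
    MERGE INTO main.sales.orders AS t
    USING main.sales.orders_updates AS s     -- the ~10 GB of new data
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Incrementally clusters newly written data; safe to run after the merge.
spark.sql("OPTIMIZE main.sales.orders")
```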
mishravishakha
by New Contributor
  • 2324 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to log in to Databricks Partner Academy account.

I registered on Databricks Partner Academy but could not confirm the registration through the email; now that the link has expired, I am unable to log in to my Databricks Partner Academy account. Please help me with this issue.

Latest Reply
Vijay6
New Contributor II
  • 2 kudos

For some reason I'm not able to access the Partner Academy.

2 More Replies
erigaud
by Honored Contributor
  • 1371 Views
  • 2 replies
  • 2 kudos

Resolved! DLT - Unity catalog and volume - Dynamically access volume path

Hello, we're using a DLT pipeline with an Auto Loader source that reads from a volume inside Unity Catalog. The path of the volume is /Volumes/<my-catalog>/... How can I dynamically access the catalog value of the DLT pipeline to use it in the code? I don't w...

Latest Reply
erigaud
Honored Contributor
  • 2 kudos

Works perfectly, thank you! It's a shame the documentation does not detail that use case.

1 More Replies
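The accepted answer isn't quoted in full above; one hedged way to achieve this is to pass the catalog name as a pipeline configuration value and read it with spark.conf.get inside the DLT code. The configuration key, catalog, and path below are hypothetical:

```python
# In the DLT pipeline settings, add a configuration entry, e.g.:
#   "my.catalog": "dev_catalog"        (hypothetical key/value)
# Then read it in the pipeline code instead of hard-coding the volume path.
import dlt

catalog = spark.conf.get("my.catalog")
source_path = f"/Volumes/{catalog}/landing/events/"   # hypothetical layout

@dlt.table
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(source_path)
    )
```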
rockybhai
by New Contributor II
  • 873 Views
  • 1 reply
  • 3 kudos

Need urgent help

I am bringing 13,000 GB of data from Redshift to Databricks by reading it through Spark and then writing it as a Delta table. What is the best cluster configuration and worker node setup you can suggest... if I need this to be done in 1 hour?

Labels: Data Engineering, clusteconfiguration, Databricks, dataengineering, redhsift, spark
Latest Reply
filipniziol
Esteemed Contributor
  • 3 kudos

Hi @rockybhai, transferring 13 TB of data from Amazon Redshift to Databricks and writing it as a Delta table within 1 hour is a significant task. Key considerations: network bandwidth and data transfer rate. To move 13 TB in 1 hour, you need a sustained d...

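To make the transfer concrete, a hedged sketch of the read/write path using the built-in Redshift connector, which unloads through S3 under the hood; in practice the bottleneck is usually Redshift UNLOAD and network throughput rather than the Databricks cluster. Hosts, buckets, and table names below are placeholders:

```python
# Read a large Redshift table via the Databricks Redshift connector and write it
# out as a Delta table. All hosts, buckets, and table names are placeholders.
df = (
    spark.read.format("redshift")
    .option("url", "jdbc:redshift://redshift-host:5439/db?user=<user>&password=<pwd>")
    .option("dbtable", "public.big_table")
    .option("tempdir", "s3a://my-temp-bucket/redshift-unload/")
    .option("forward_spark_s3_credentials", "true")
    .load()
)

(df.write.format("delta")
   .mode("overwrite")
   .saveAsTable("main.bronze.big_table"))
```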
Adam_Runarsson
by New Contributor II
  • 1265 Views
  • 3 replies
  • 0 kudos

Autoloader: Backfill on millions of files

Hi all! So I've been using Auto Loader with file notification mode against Azure to great success. Once past all the setup, it's rather seamless to use. I did have some issues in the beginning, which are related to my question. The storage account I'm work...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

The docs are pretty sparse on the backfill process, but I think backfill won't just do a scan of the directory but will instead read the checkpoint file.  That seems logical to me anyways.

2 More Replies
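For context on the setting being discussed: in file notification mode, Auto Loader can periodically run a listing-based backfill to pick up files that the notification service missed. A hedged sketch of enabling it (paths and interval are placeholders):

```python
# Auto Loader in file notification mode with a periodic backfill pass that lists
# the directory to catch files missed by notifications. Paths are placeholders.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.backfillInterval", "1 day")
    .option("cloudFiles.schemaLocation",
            "abfss://checkpoints@acct.dfs.core.windows.net/events")
    .load("abfss://landing@acct.dfs.core.windows.net/events/")
)
```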
