Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cmilligan
by Contributor II
  • 5227 Views
  • 3 replies
  • 2 kudos

Resolved! Orchestrate run of a folder

I'm needing to run the contents of a folder, which can change over time. Is there a way to set up a notebook that can orchestrate running all notebooks in a folder? My thought was if I could retrieve a list of the notebooks I could create a loop to ru...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

List all notebooks by making an API call and then run them using dbutils.notebook.run:

import requests
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = ctx.apiToke...

2 More Replies
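A fuller sketch of the approach in the accepted answer, assuming the Workspace API 2.0 list endpoint (objects carrying object_type and path fields) and that the code runs inside a Databricks notebook, where dbutils is defined. The helper and URL shape are illustrative, not taken verbatim from the thread:

```python
import json
import urllib.parse
import urllib.request

def notebook_paths(objects):
    """Keep only notebook paths from a Workspace API list response."""
    return [o["path"] for o in objects if o.get("object_type") == "NOTEBOOK"]

def run_folder(host, token, folder, timeout=3600):
    """List a workspace folder and run each notebook in it sequentially.
    Must be called inside a Databricks notebook, where `dbutils` exists."""
    url = (f"https://{host}/api/2.0/workspace/list?path="
           + urllib.parse.quote(folder))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        objects = json.load(resp).get("objects", [])
    for path in notebook_paths(objects):
        dbutils.notebook.run(path, timeout)  # noqa: F821 (Databricks-provided)
```

Because the folder is listed on every run, notebooks added or removed later are picked up automatically, which is the point of the question.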
al_joe
by Contributor
  • 8049 Views
  • 5 replies
  • 5 kudos

Resolved! How do I clone a repo in Community Edition?

The e-learning videos on DBacademy say we should click on "Repos" and "Add Repo". I cannot find this in my Community Edition UI. I am a little frustrated that there are so many different versions of the UI and many videos show UI options that we cannot ...

Latest Reply
Psybelo
New Contributor II
  • 5 kudos

Hello, just import the .dbc file directly into your user workspace, as explained by Databricks here: https://www.databricks.training/step-by-step/importing-courseware-from-github/. That is the simplest way.

4 More Replies
Gim
by Contributor
  • 75725 Views
  • 3 replies
  • 9 kudos

Best practice for logging in Databricks notebooks?

What is the best practice for logging in Databricks notebooks? I have a bunch of notebooks that run in parallel through a workflow. I would like to keep track of everything that happens such as errors coming from a stream. I would like these logs to ...

Latest Reply
karthik_p
Databricks Partner
  • 9 kudos

@Gimwell Young​ As @Debayan Mukherjee​ mentioned, if you configure verbose logging at the workspace level, logs will be delivered to the storage bucket you provided during configuration. From there you can pull logs into any of your licensed log mo...

2 More Replies
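Beyond workspace-level verbose logging, a common pattern for notebook-level logs is one named logger per notebook appending to a shared file. A minimal sketch; writing the file to DBFS or mounted storage so parallel workflow runs can be traced afterwards is an assumption, not something prescribed in the thread:

```python
import logging

def get_notebook_logger(name, log_path):
    """Return a named logger that appends to log_path.
    One logger per notebook lets parallel runs be told apart later."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on notebook re-runs
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Each notebook in the workflow would call this with its own name, so errors from a stream land in the file tagged with the notebook that raised them.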
Gopi0403
by Databricks Partner
  • 5981 Views
  • 7 replies
  • 0 kudos

Issue creating a new workspace: I cannot create a new workspace in Databricks using Quickstart. When I am creating the workspace I ge...

Issue creating a new workspace: I cannot create a new workspace in Databricks using Quickstart. When I am creating the workspace I get the Rollback failed error from AWS even though I have given all the required information. Kindly he...

Latest Reply
Prabakar
Databricks Employee
  • 0 kudos

Hi @Gopichandran N​, could you please add more information on the issue that you are facing? Could you please add a screenshot of the error?

6 More Replies
-werners-
by Esteemed Contributor III
  • 3604 Views
  • 2 replies
  • 17 kudos

Autoloader: how to avoid overlap in files

I'm thinking of using Autoloader to process files being put on our data lake. Let's say, e.g., every 15 minutes a parquet file is written. These files, however, contain overlapping data. Now, every 2 hours I want to process the new data (autoloader) and...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 17 kudos

What about foreachBatch and then MERGE? Alternatively, run another process that will clean up the overlapping updates using a window function, as you said.

1 More Replies
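The window-function cleanup mentioned in the reply boils down to keeping one row per business key, ordered by a timestamp. In Spark that would be row_number() over a window partitioned by the key (or a MERGE inside foreachBatch); a pure-Python sketch of the same logic, with illustrative field names:

```python
def latest_per_key(rows, key, ts):
    """Deduplicate overlapping batches: keep the newest row per key.
    `rows` are dicts; `key` and `ts` name the id and timestamp fields."""
    best = {}
    for row in rows:
        k = row[key]
        if k not in best or row[ts] > best[k][ts]:
            best[k] = row
    return list(best.values())
```

Run over the union of overlapping files, this leaves exactly one record per key, which is what the 2-hourly batch needs before writing downstream.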
Data_Engineer3
by Contributor III
  • 4855 Views
  • 1 replies
  • 7 kudos

Move folder from dbfs location to user workspace directory in azure databricks

I need to move a group of files (Python or Scala files) or a folder from a DBFS location to the user workspace directory in Azure Databricks to do testing on the files. It's very difficult to upload each file one by one into the user workspace directory, so is it...

Latest Reply
-werners-
Esteemed Contributor III
  • 7 kudos

dbutils.fs.mv or dbutils.fs.cp can help you.

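To move a whole folder rather than one file at a time, dbutils.fs.cp with recurse=True can copy everything in one call; a small helper can map each DBFS path to its workspace counterpart. A sketch under stated assumptions: the helper is illustrative, and the example paths (including the file:/Workspace prefix for workspace files) are hypothetical, not from the thread:

```python
def workspace_destination(src_path, dbfs_root, workspace_root):
    """Map a DBFS path under dbfs_root to the matching path under
    workspace_root, preserving the relative layout."""
    if not src_path.startswith(dbfs_root):
        raise ValueError(f"{src_path} is not under {dbfs_root}")
    rel = src_path[len(dbfs_root):].lstrip("/")
    return workspace_root.rstrip("/") + "/" + rel

# Inside a Databricks notebook the whole folder can then be copied in one go,
# e.g. (hypothetical paths):
# dbutils.fs.cp("dbfs:/tmp/scripts", "file:/Workspace/Users/me/scripts", recurse=True)
```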
weldermartins
by Honored Contributor
  • 4504 Views
  • 3 replies
  • 13 kudos

Resolved! SCD type 2

Hey guys, I don't know if I'm just tired, but I ask for your help: I don't understand where the difference in the number of fields comes from. Thanks! I'm replicating SCD type 2 based on this documentation: https://docs.delta.io/latest/delta-update.html#slowly-chan...

SCD 2
Latest Reply
weldermartins
Honored Contributor
  • 13 kudos

@Werner Stinckens​ ?

2 More Replies
Chris_Konsur
by New Contributor III
  • 3665 Views
  • 2 replies
  • 3 kudos

Resolved! Configuring Autoloader in file notification mode to access premium Blob Storage

First, I tried to configure Autoloader in file notification mode to access the premium blob storage 'databrickspoc1' (PREMIUM, ADLS Gen2). I get this error: com.microsoft.azure.storage.StorageException. I checked my storage account -> N...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

When you created a premium account, have you chosen "Premium account type" as "File shares"? It should be "Block blobs".

1 More Replies
Priya_Mani
by New Contributor II
  • 2865 Views
  • 3 replies
  • 4 kudos

Databricks Notebook dataframe loading duplicate data in SQL table

Hi, I am trying to load data from the data lake into a SQL table using a "SourceDataFrame.write" operation in a notebook using Apache Spark. This seems to be loading duplicates at random times. The logs don't give much information and I am not sure what else t...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Can you elaborate a bit more on this notebook? And also, what Databricks runtime version?

2 More Replies
User16844588229
by Databricks Employee
  • 14739 Views
  • 9 replies
  • 4 kudos

docs.databricks.com

Navigate and discover content more efficiently with Search in Databricks. Hi all, Justin Kim here, I'm the Databricks product manager responsible for content organization and navigation in our product, which includes Search. Great to see you on the Com...

Search bar Search modal
Latest Reply
karthik_p
Databricks Partner
  • 4 kudos

@Justin Kim​ Thank you for the quick reply. Usually "Last Modified" means recent changes, right (that can be the last 24 hrs or a cap limit that we add), whereas "Anytime" should show all notebooks or tables from the start. That is where I got confused.

8 More Replies
Sandy21
by New Contributor III
  • 1884 Views
  • 1 replies
  • 3 kudos

Questions on running the REST API command in Databricks to create a job

What happens when the jobs/create REST API command is run multiple times (say 3 times) with the same JSON configuration? Will 3 jobs be created with the same name, or only 1?

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi @Santhosh Raj​ , logically only one job should be created.

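If accidental duplicates from repeated jobs/create calls are a concern, the caller can check jobs/list for an existing name before creating. A minimal sketch: the jobs array shape with a nested settings.name follows the Jobs API list response, and the guard itself is an illustrative pattern, not something from the thread:

```python
def find_job_by_name(jobs, name):
    """Return the first job whose settings.name matches, else None.
    `jobs` is the `jobs` array from a jobs/list response."""
    for job in jobs:
        if job.get("settings", {}).get("name") == name:
            return job
    return None
```

A deployment script would call jobs/list, pass the result through this helper, and only POST to jobs/create (or call jobs/reset to update) when no match is found.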
Dicer
by Valued Contributor
  • 8471 Views
  • 2 replies
  • 1 kudos

Resolved! PARSE_SYNTAX_ERROR: Syntax error at or near 'VACUUM'

I tried to VACUUM a Delta table, but there is a syntax error. Here is the code:

%sql
set spark.databricks.delta.retentionDurationCheck.enabled = False
VACUUM test_deltatable

Latest Reply
Ravi
Databricks Employee
  • 1 kudos

@Cheuk Hin Christophe Poon​ Missing semi-colon at the end of line 2?

%sql
set spark.databricks.delta.retentionDurationCheck.enabled = False;
VACUUM test_deltatable

1 More Replies
a2_ish
by New Contributor II
  • 3395 Views
  • 2 replies
  • 2 kudos

How to write the delta files for a managed table? How can I define the sink?

I have tried the below code to write data to a Delta table and save the delta files to a sink. I tried using Azure Storage as the sink but I get a "not enough access" error. I can confirm that I have enough access to Azure Storage; however, I can run the below...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ankit Kumar​, does @Hubert Dudek​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
pret
by New Contributor II
  • 5157 Views
  • 4 replies
  • 0 kudos

How can I run a scala command line in databricks?

I wish to run a Scala command, which I believe would normally be run from a Scala command line rather than from within a notebook. It happens to be:

scala [-cp scalatest-<version>.jar:...] org.scalatest.tools.Runner [arguments]

(scalatest_2.12__3.0.8.j...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @David Vardy​, hope all is well! Just wanted to check in if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

3 More Replies