Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992185
by Databricks Employee
  • 11615 Views
  • 2 replies
  • 3 kudos

Databricks Auto-Loader vs. Delta Live Tables

What is the difference between Databricks Auto-Loader and Delta Live Tables? Both seem to manage ETL for you, but I'm confused about where to use one vs. the other.

Latest Reply
Steve_Lyle_BPCS
New Contributor II
  • 3 kudos

You say "...__would__ be a piece..." and "...DLT __would__ pick up...". Is DLT built upon AL?
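For context, the two are complementary rather than one being built on the other: a DLT pipeline commonly uses Auto Loader (cloudFiles) as its ingestion source. A minimal sketch, with a hypothetical path and table name (spark is predefined inside a DLT pipeline):

import dlt

@dlt.table(name="raw_events")
def raw_events():
    # Auto Loader does the incremental file discovery; DLT manages the pipeline.
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/landing")  # hypothetical source path
    )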

1 More Reply
Maxi1693
by New Contributor II
  • 3160 Views
  • 1 reply
  • 0 kudos

Resolved! Error java.lang.NullPointerException using Autoloader

Hi! I am pulling data from Blob storage into Databricks using Autoloader. This process is working well for almost 10 resources, but for a specific one I am getting this error: java.lang.NullPointerException. Looks like this issue is when I connect to th...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@Maxi1693 - The value for schemaEvolutionMode should be a string. Could you please try changing .option("cloudFiles.schemaEvolutionMode", None) to .option("cloudFiles.schemaEvolutionMode", "none") and let us know. Refe...
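A minimal sketch of the corrected reader (the source path and schema location are hypothetical):

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/_schemas/resource_x")  # hypothetical
    .option("cloudFiles.schemaEvolutionMode", "none")  # a string, not Python's None
    .load("wasbs://container@account.blob.core.windows.net/path")  # hypothetical Blob path
)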

FurqanAmin
by New Contributor II
  • 2894 Views
  • 5 replies
  • 1 kudos

Logs not coming up in the UI - while being written to DBFS

I have a few spark-submit jobs that are being run via Databricks Workflows. I have configured logging in DBFS and specified a location in my GCS bucket. The logs are present in that GCS bucket for the latest run, but whenever I try to view them from th...

Data Engineering
logging
LOGS
ui
Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Yes, I meant to set it to None. Is the issue specific to any particular cluster? Or do you see the issue with all the clusters in your workspace?

4 More Replies
Noman_Q
by New Contributor II
  • 4579 Views
  • 2 replies
  • 1 kudos

Error Running Delta Live Pipeline.

Hi guys, I am new to the Delta pipeline. I have created a pipeline, and now when I try to run it I get the error message "PERMISSION_DENIED: You are not authorized to create clusters. Please contact your administrator", even though I can crea...

Latest Reply
Noman_Q
New Contributor II
  • 1 kudos

Thank you for responding, @Palash01, and thanks for giving me the direction. To get around it, I had to get the "unrestricted cluster creation" permission.

1 More Reply
joeyslaptop
by New Contributor II
  • 7697 Views
  • 5 replies
  • 3 kudos

How to add a column to a new table containing the original source filenames in Databricks.

If this isn't the right spot to post this, please move it or refer me to the right area. I recently learned about "_metadata.file_name". It's not quite what I need. I'm creating a new table in Databricks and want to add a USR_File_Name column cont...

Data Engineering
Databricks
filename
import
SharePoint
Upload
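One common way to capture the source filename asked about above, sketched with a hypothetical file format, source path, and target table (the _metadata column is available on file-based sources):

from pyspark.sql.functions import col

df = (
    spark.read.format("csv")                       # hypothetical file format
    .option("header", "true")
    .load("/Volumes/main/default/uploads")         # hypothetical source path
    .withColumn("USR_File_Name", col("_metadata.file_name"))
)
df.write.saveAsTable("main.default.my_new_table")  # hypothetical target table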
Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi, Could you please elaborate more on the expectation here? 

4 More Replies
William_Scardua
by Valued Contributor
  • 1063 Views
  • 1 reply
  • 0 kudos

Cluster types pricing

Hey guys, how can I get the pricing of cluster types (standard_D*, standard_E*, standard_F*, etc.)? I'm doing a study to decrease the price of my current cluster. Any ideas? Thank you!

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

Hey, you can use the pricing calculator here: https://www.databricks.com/product/pricing/product-pricing/instance-types

AChang
by New Contributor III
  • 8244 Views
  • 1 reply
  • 0 kudos

Resolved! Move a folder from Workspace to DBFS

So, I didn't quite set up my model training output directory correctly, and it saved all my model files to the workspace in the git repo I was working in. I am trying to move these files to DBFS, but when I try using dbutils.fs.mv, I get this error: ...

Latest Reply
AChang
New Contributor III
  • 0 kudos

Figured it out, I just had to use the !cp command. Here is what I did, and it worked perfectly: !cp -r /Workspace/Repos/$RESTOFPATH /dbfs/folder. It put the entire folder I was trying to move into that DBFS folder.
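An equivalent sketch using dbutils instead of the shell, assuming the same (placeholder) paths; Workspace files are reachable through the local file API with a file:/ prefix:

# True enables recursive copy, mirroring cp -r
dbutils.fs.cp("file:/Workspace/Repos/<rest-of-path>", "dbfs:/folder", True)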

JJ_LVS1
by New Contributor III
  • 4198 Views
  • 4 replies
  • 1 kudos

FiscalYear Start Period Is not Correct

Hi, I'm trying to create a calendar dimension including a fiscal year with a fiscal start of April 1. I'm using the fiscalyear library and am setting the start to month 4, but it insists on setting April to month 7. Runtime 12.1. My code snippet is: start_...

Latest Reply
DataEnginner
New Contributor II
  • 1 kudos

import fiscalyear
import datetime

def get_fiscal_date(year, month, day):
    fiscalyear.setup_fiscal_calendar(start_month=4)
    v_fiscal_month = fiscalyear.FiscalDateTime(year, month, day).fiscal_month  # to get the fiscal month
    v_fiscal_quarter = fiscalyea...
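A completed, runnable reconstruction of the helper above (the reply is truncated here, so the return value is an assumption; assumes a reasonably recent fiscalyear release):

import fiscalyear

def get_fiscal_date(year, month, day):
    fiscalyear.setup_fiscal_calendar(start_month=4)
    fdt = fiscalyear.FiscalDateTime(year, month, day)
    # fiscal_month is relative to the fiscal year start, so April -> 1
    return fdt.fiscal_year, fdt.fiscal_quarter, fdt.fiscal_month

print(get_fiscal_date(2023, 4, 1))  # expect fiscal month 1 with an April start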

3 More Replies
442027
by New Contributor II
  • 2022 Views
  • 1 reply
  • 1 kudos

Default delta log retention interval is different than in documentation?

It notes in the documentation here that the default delta log retention interval is 30 days; however, when I create checkpoints in the delta log to trigger the cleanup, historical records from 30 days ago aren't removed; i.e. current day checkpoint is a...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You need to set the table property, e.g. ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.checkpointRetentionDuration' = '30 days')
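The same property set from a notebook; the table name is hypothetical, and note that the length of log history itself is governed by delta.logRetentionDuration (default 30 days):

spark.sql("""
    ALTER TABLE main.default.my_table
    SET TBLPROPERTIES (
        'delta.checkpointRetentionDuration' = '30 days',
        'delta.logRetentionDuration' = '30 days'
    )
""")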

Mrk
by New Contributor II
  • 11353 Views
  • 4 replies
  • 4 kudos

Resolved! Insert or merge into a table with GENERATED IDENTITY

Hi, when I create an identity column using the GENERATED ALWAYS AS IDENTITY statement and I try to INSERT or MERGE data into that table, I keep getting the following error message: Cannot write to 'table', not enough data columns; target table has x col...

Latest Reply
Aboladebaba
New Contributor III
  • 4 kudos

You can run the INSERT by passing the subset of columns you want to provide values for... for example, your insert statement would be something like: INSERT INTO target_table_with_identity_col (<list-of-col-names-without-the-identity-column>) SELECT (<lis...
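A worked sketch of that pattern (table and column names are hypothetical):

spark.sql("""
    CREATE TABLE IF NOT EXISTS target_table_with_identity_col (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        name STRING,
        amount DOUBLE
    )
""")
# The identity column is omitted from the column list; Delta fills it in.
spark.sql("""
    INSERT INTO target_table_with_identity_col (name, amount)
    SELECT name, amount FROM source_table
""")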

3 More Replies
ilarsen
by Contributor
  • 3032 Views
  • 2 replies
  • 0 kudos

Structured Streaming Auto Loader UnknownFieldsException and Workflow Retries

Hi. I am using Structured Streaming and Auto Loader to read JSON files, and it is automated by a Workflow. I am having difficulties with the job failing as schema changes are detected, but not retrying. Hopefully someone can point me in the right dir...
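For context, with Auto Loader's default schema evolution mode (addNewColumns) the stream is expected to stop on a schema change and pick up the merged schema on restart, so the Workflow task needs retries configured. A minimal sketch of the reader (paths are hypothetical):

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/checkpoints/_schemas/events")  # hypothetical
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # stop-and-restart on new columns
    .load("/landing/events")  # hypothetical
)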

Latest Reply
ilarsen
Contributor
  • 0 kudos

Another point I have realised is that the task and the parent notebook (which then calls the child notebook that runs the auto loader part) do not fail if the schema-changed failure occurs during the auto loader process. It's the child notebook a...

1 More Reply
Aidonis
by New Contributor III
  • 37652 Views
  • 3 replies
  • 3 kudos

Copilot Databricks integration

Given Copilot has now been released as a paid-for product, do we have a timeline for when it will be integrated into Databricks? Our team uses VS Code a lot for Copilot, and we think it would be super awesome to have it in our Databricks environment. Ou...

Latest Reply
prasad_vaze
New Contributor III
  • 3 kudos

@Vartika, no, josephk didn't answer Aidan's question. It's about comparing Copilot with Databricks Assistant, and whether Copilot can be used in the Databricks workspace.

2 More Replies
xneg
by Contributor
  • 17147 Views
  • 12 replies
  • 9 kudos

PyPI library sometimes doesn't install during workflow execution

I have a workflow that runs on a job cluster and contains a task that requires the prophet library from PyPI: { "task_key": "my_task", "depends_on": [ { "task_key": "<...>...
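For reference, a PyPI dependency is declared per task under libraries in the job spec; a minimal sketch (the task and cluster keys are hypothetical):

{
  "task_key": "my_task",
  "job_cluster_key": "my_cluster",
  "libraries": [
    { "pypi": { "package": "prophet" } }
  ]
}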

Latest Reply
Vartika
Databricks Employee
  • 9 kudos

Hey @Eugene Bikkinin, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking "Select As Best" if it does. Your feed...

11 More Replies
Michael_Galli
by Contributor III
  • 2342 Views
  • 1 reply
  • 0 kudos

How to add a Workflow File Arrival trigger on a file in a Unity Catalog Volume in Azure Databricks

I have a UC volume with XLSX files, and I would like to run a workflow when a new file arrives in the Volume. I was thinking of a workflow file arrival trigger. But that does not work when I add the physical ADLS location of the root folder: External locat...

Latest Reply
Michael_Galli
Contributor III
  • 0 kudos

Worked it out with Microsoft -> it only works with external volumes, not managed ones. https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers

ShlomoSQM
by New Contributor
  • 2294 Views
  • 2 replies
  • 0 kudos

Autoloader, toTable

"In autoloader there is the option ".toTable(catalog.volume.table_name)", I have an autoloder script that reads all the files from a source volume in unity catalog, inside the source I have two different files with two different schemas.I want to sen...

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @ShlomoSQM, looks like @shan_chandra suggested a feasible solution. Just to add a little more context, this is how you can achieve the same if you have a column that can help you identify what is type 1 and type 2 (a completed sketch follows): file_type1_stream = readStream.opti...
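A hedged completion of that truncated snippet; the paths, discriminator column, and table names are hypothetical:

file_type1_stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/src/_schemas/type1")  # hypothetical
    .load("/Volumes/main/src/landing")  # hypothetical
    .filter("file_type = 'type1'")  # hypothetical discriminator column
)
(
    file_type1_stream.writeStream
    .option("checkpointLocation", "/Volumes/main/src/_chk/type1")  # hypothetical
    .toTable("main.src.table_type1")  # hypothetical target table
)
# Repeat with file_type = 'type2' to route the second schema to its own table.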

1 More Reply
