Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JJ_LVS1
by New Contributor III
  • 3205 Views
  • 4 replies
  • 1 kudos

FiscalYear Start Period Is not Correct

Hi, I'm trying to create a calendar dimension including a fiscal year with a fiscal start of April 1. I'm using the fiscalyear library and am setting the start to month 4, but it insists on setting April to month 7. Runtime: 12.1. My code snippet is: start_...

Latest Reply
DataEnginner
New Contributor II
  • 1 kudos

import fiscalyear
import datetime
def get_fiscal_date(year, month, day):
    fiscalyear.setup_fiscal_calendar(start_month=4)
    v_fiscal_month = fiscalyear.FiscalDateTime(year, month, day).fiscal_month  # To get the fiscal month
    v_fiscal_quarter=fiscalyea...

  • 1 kudos
3 More Replies
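
For comparison, the fiscal month, quarter, and year for an April 1 start can be worked out with the standard library alone. This is a minimal sketch, not the fiscalyear library's implementation: the function name fiscal_parts is hypothetical, and it assumes the fiscal year is labelled by the calendar year in which it ends.

```python
import datetime


def fiscal_parts(d, start_month=4):
    """Return (fiscal_year, fiscal_quarter, fiscal_month) for date d.

    Assumption: the fiscal year is labelled by the calendar year in which it ends.
    """
    # Months elapsed since the fiscal year began, shifted into the range 1..12.
    fiscal_month = (d.month - start_month) % 12 + 1
    fiscal_quarter = (fiscal_month - 1) // 3 + 1
    # On or after the start month, the date belongs to the fiscal year
    # that ends in the next calendar year (unless the year starts in January).
    if start_month > 1 and d.month >= start_month:
        fiscal_year = d.year + 1
    else:
        fiscal_year = d.year
    return fiscal_year, fiscal_quarter, fiscal_month


april_first = fiscal_parts(datetime.date(2023, 4, 1))
march_last = fiscal_parts(datetime.date(2023, 3, 31))
print(april_first, march_last)
```

With a start month of 4, April maps to fiscal month 1, which gives a reference value to compare against whatever the library returns.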
442027
by New Contributor II
  • 1461 Views
  • 1 replies
  • 1 kudos

Default delta log retention interval is different than in documentation?

The documentation here notes that the default Delta log retention interval is 30 days. However, when I create checkpoints in the Delta log to trigger the cleanup, historical records from 30 days ago aren't removed; i.e. current day checkpoint is a...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You need to set the table property: SET TBLPROPERTIES ('delta.checkpointRetentionDuration' = '30 days')

  • 1 kudos
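
A hedged sketch of how that statement might be assembled from Python. The helper name retention_statement and the table name are hypothetical. Including delta.logRetentionDuration alongside the property named in the reply is an assumption about what the question is after, as is the 'interval N days' value format.

```python
def retention_statement(table, log_days=30, checkpoint_days=30):
    """Build an ALTER TABLE statement that sets two Delta retention properties."""
    props = {
        # Assumption: this property governs how long log history is kept.
        "delta.logRetentionDuration": f"interval {log_days} days",
        # The property named in the reply above.
        "delta.checkpointRetentionDuration": f"interval {checkpoint_days} days",
    }
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


stmt = retention_statement("my_schema.my_table")  # hypothetical table name
print(stmt)
# In a Databricks notebook the statement could then be run with spark.sql(stmt).
```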
Mrk
by New Contributor II
  • 8887 Views
  • 4 replies
  • 4 kudos

Resolved! Insert or merge into a table with GENERATED IDENTITY

Hi, when I create an identity column using the GENERATED ALWAYS AS IDENTITY statement and try to INSERT or MERGE data into that table, I keep getting the following error message: Cannot write to 'table', not enough data columns; target table has x col...

Latest Reply
Aboladebaba
New Contributor III
  • 4 kudos

You can run the INSERT by passing only the subset of columns you want to provide values for. For example, your insert statement would be something like: INSERT INTO target_table_with_identity_col (<list-of-cols-names-without-the-identity-column>) SELECT (<lis...

  • 4 kudos
3 More Replies
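
The pattern in the accepted reply (name only the non-generated columns in the INSERT) can be tried locally. The sketch below uses SQLite's AUTOINCREMENT as a stand-in for a Delta identity column, so it illustrates the column-list idea only, not Databricks behaviour. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (name TEXT, amount REAL);
    INSERT INTO source VALUES ('a', 1.5), ('b', 2.5);
    -- id is generated by the database, standing in for an identity column
    CREATE TABLE target (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, amount REAL);
    """
)

# List only the columns we supply values for; the database fills in id.
conn.execute(
    "INSERT INTO target (name, amount) SELECT name, amount FROM source ORDER BY name"
)
rows = conn.execute("SELECT id, name, amount FROM target ORDER BY id").fetchall()
print(rows)
```

Omitting the explicit column list makes the database expect a value for every column, including the generated one. That is the same kind of mismatch the error message in the question describes.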
ilarsen
by Contributor
  • 2391 Views
  • 2 replies
  • 0 kudos

Structured Streaming Auto Loader UnknownFieldsException and Workflow Retries

Hi. I am using Structured Streaming and Auto Loader to read JSON files, automated by a Workflow. I am having difficulties with the job failing when schema changes are detected, but not retrying. Hopefully someone can point me in the right dir...

Latest Reply
ilarsen
Contributor
  • 0 kudos

Another point I have realised is that the task and the parent notebook (which calls the child notebook that runs the Auto Loader part) do not fail if the schema-changed failure occurs during the Auto Loader process. It's the child notebook a...

  • 0 kudos
1 More Replies
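
One way to handle a failure like this inside the notebook itself is a small retry wrapper around the code that starts the stream. This is a generic sketch, assuming the failure clears when the stream is simply restarted. SchemaChangedError is a hypothetical stand-in for the real exception, and fake_stream stands in for the function that starts the Auto Loader stream.

```python
class SchemaChangedError(Exception):
    """Hypothetical stand-in for the schema-change failure raised by the stream."""


def run_with_retries(start_stream, max_attempts=3, retry_on=(SchemaChangedError,)):
    """Call start_stream until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return start_stream()
        except retry_on:
            if attempt == max_attempts:
                # Re-raise so the calling notebook and task fail visibly.
                raise


calls = []


def fake_stream():
    # Fails on the first call, succeeds on the second.
    calls.append(1)
    if len(calls) < 2:
        raise SchemaChangedError("new field detected")
    return "ok"


result = run_with_retries(fake_stream)
print(result, len(calls))
```

Re-raising on the final attempt matters here: if the child notebook swallows the exception, the parent task has nothing to fail or retry on.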
Aidonis
by New Contributor III
  • 27154 Views
  • 3 replies
  • 3 kudos

Copilot Databricks integration

Given that Copilot has now been released as a paid product, do we have a timeline for when it will be integrated into Databricks? Our team uses VS Code a lot for Copilot, and we think it would be super awesome to have it in our Databricks environment. Ou...

Latest Reply
prasad_vaze
New Contributor III
  • 3 kudos

@Vartika No, josephk didn't answer Aidan's question. It's about how Copilot compares with Databricks Assistant, and whether Copilot can be used in a Databricks workspace.

  • 3 kudos
2 More Replies
xneg
by Contributor
  • 14160 Views
  • 12 replies
  • 9 kudos

PyPI library sometimes doesn't install during workflow execution

I have a workflow that runs on a job cluster and contains a task that requires the prophet library from PyPI: { "task_key": "my_task", "depends_on": [ { "task_key": "<...>...

Latest Reply
Vartika
Databricks Employee
  • 9 kudos

Hey @Eugene Bikkinin, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...

  • 9 kudos
11 More Replies
Michael_Galli
by Contributor III
  • 1924 Views
  • 1 replies
  • 0 kudos

How to add a Workflow File Arrival trigger on a file in a Unity Catalog Volume in Azure Databricks

I have a UC volume with XLSX files, and would like to run a workflow when a new file arrives in the volume. I was thinking of a workflow file arrival trigger, but that does not work when I add the physical ADLS location of the root folder: External locat...

Latest Reply
Michael_Galli
Contributor III
  • 0 kudos

Worked it out with Microsoft: it only works with external volumes, not managed ones. https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers

  • 0 kudos
Bram
by New Contributor II
  • 5514 Views
  • 7 replies
  • 0 kudos

Configuration spark.sql.sources.partitionOverwriteMode is not available.

Dear all, in the current setup, we are using dbt as a modeling tool for our data lakehouse. For a specific use case, we want to use the insert_overwrite strategy, where dbt will replace all data for a specific partition: Databricks configurations | dbt Dev...

Latest Reply
nad__
New Contributor II
  • 0 kudos

Hi! I have the same issue with insert_overwrite on Databricks with a SQL Warehouse. Do you have any solution or updates? Or is it still not supported by Databricks?

  • 0 kudos
6 More Replies
ShlomoSQM
by New Contributor
  • 1690 Views
  • 2 replies
  • 0 kudos

Autoloader, toTable

In Auto Loader there is the option ".toTable(catalog.volume.table_name)". I have an Auto Loader script that reads all the files from a source volume in Unity Catalog, and inside the source I have two different files with two different schemas. I want to sen...

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @ShlomoSQM, it looks like @shan_chandra suggested a feasible solution. To add a little more context, this is how you can achieve the same if you have a column that identifies what is type 1 and what is type 2: file_type1_stream = readStream.opti...

  • 0 kudos
1 More Replies
Data_Engineeri7
by New Contributor
  • 2627 Views
  • 3 replies
  • 0 kudos

Global or environment parameters.

Hi all, I need help creating a utility file that can be used in PySpark notebooks. The utility file contains variables such as database and schema names, and I need to pass these variables to the other notebooks wherever I use the database and schema. Thanks.

Latest Reply
KSI
New Contributor II
  • 0 kudos

You can use: ${param_catalog}.schema.tablename. Pass the actual value into the notebook through a job parameter named "param_catalog" or through a text widget named "param_catalog".

  • 0 kudos
2 More Replies
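
Another option is a small shared Python module that holds the names in one place. The sketch below assumes one catalog and schema per environment. EnvConfig, CONFIGS, get_config, and all catalog, schema, and table names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Catalog and schema names for one environment."""

    catalog: str
    schema: str

    def table(self, name):
        """Return the fully qualified three-part table name."""
        return f"{self.catalog}.{self.schema}.{name}"


# Hypothetical environments; the values are placeholders.
CONFIGS = {
    "dev": EnvConfig(catalog="dev_catalog", schema="sales"),
    "prod": EnvConfig(catalog="prod_catalog", schema="sales"),
}


def get_config(env):
    """Look up the configuration for an environment name such as 'dev'."""
    return CONFIGS[env]


# In a notebook, env could come from a job parameter or a widget instead.
cfg = get_config("dev")
print(cfg.table("orders"))
```

Each notebook would then import the module and call cfg.table(...) instead of hard-coding names, so switching environments means changing a single parameter.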
MarthinusBosma1
by New Contributor II
  • 1537 Views
  • 3 replies
  • 0 kudos

Unable to DROP TABLE: "Lock wait timeout exceeded"

We have a table where the underlying data has been dropped, and seemingly something else must have gone wrong as well. We just want to get rid of the whole table and schema, but running "DROP TABLE schema.table" throws the following error: or...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

The table needs to be dropped from the backend. If you raise a ticket, the support team can do it for you.

  • 0 kudos
2 More Replies
Data_Engineer3
by Contributor III
  • 5001 Views
  • 5 replies
  • 0 kudos

Resolved! Need to define struct and array-of-struct columns in a Delta Live Table (DLT) in Databricks.

I want to create columns with the struct and array-of-struct data types in DLT live tables. Is this possible? If so, could you share a sample? Thanks.

Latest Reply
Data_Engineer3
Contributor III
  • 0 kudos

I have created a DLT live tables pipeline. In the Job UI, I can see only the steps, and if a failure happens it shows only the error at that stage. But if I log anything using print, the output doesn't show up in the console or anywhere else. How can I see the lo...

  • 0 kudos
4 More Replies
kiko_roy
by Contributor
  • 2889 Views
  • 3 replies
  • 1 kudos

Resolved! IsBlindAppend config changes

Hello all, can someone please suggest how I can change the config IsBlindAppend from false to true? I need to do this not for a data table but for a custom log table. Also, is there any concern, in terms of standard practices, if I toggle the value? Please suggest.

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi, IsBlindAppend is not a config but an operation metric that is used in the Delta Lake history. Its value changes based on the type of operation performed on the Delta table. https://docs.databricks.com/en/delta/history.html

  • 1 kudos
2 More Replies
