Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dp15
by Contributor
  • 1969 Views
  • 1 reply
  • 1 kudos

Using UDF in an insert command

Hi, I am trying to use a UDF to get the last day of the month and use the boolean result of the function in an insert command. Please find the function and my query below.
function:
import calendar
from datetime import datetime, date, timedelta
def...
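A minimal sketch of the kind of check the question describes, assuming the goal is a boolean "is this date the last day of its month". The name `is_last_day_of_month` is hypothetical and is not the poster's (truncated) function; it relies only on the standard-library `calendar.monthrange`.

```python
import calendar
from datetime import date


def is_last_day_of_month(d: date) -> bool:
    """Return True when d falls on the final day of its month (hypothetical helper)."""
    # monthrange returns (weekday of the 1st, number of days in the month),
    # so leap years are handled by the standard library.
    _, days_in_month = calendar.monthrange(d.year, d.month)
    return d.day == days_in_month
```

In a notebook, a function like this could be registered for SQL use with `spark.udf.register`. Spark SQL also ships a built-in `last_day` function, so comparing a date column with `last_day(<column>)` may avoid the UDF altogether.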

Latest Reply
Dp15
Contributor
  • 1 kudos

Thank you @Retired_mod for your detailed explanation.

Kroy
by Contributor
  • 17294 Views
  • 7 replies
  • 1 kudos

Resolved! What is the difference between streaming and a streaming live table?

Can anyone explain in layman's terms the difference between streaming and a streaming live table?

Latest Reply
CharlesReily
New Contributor III
  • 1 kudos

Streaming, in a broad sense, refers to the continuous flow of data over a network. It allows you to watch or listen to content in real time without having to download the entire file first. A "Streaming Live Table" might refer to a specific type of ...

6 More Replies
kiko_roy
by Contributor
  • 10872 Views
  • 3 replies
  • 1 kudos

Permission error loading DataFrame from Azure Unity Catalog to GCS bucket

I am creating a DataFrame by reading data from a table that resides in an Azure-backed Unity Catalog. I need to write the DataFrame or a file to a GCS bucket. I have configured the Spark cluster using the GCP service account JSON values. On running: df1.write.for...

Data Engineering
GCS bucket
permission error
Latest Reply
ruloweb
New Contributor II
  • 1 kudos

Hi, is there a Terraform resource to apply this GRANT, or does it always have to be done manually?

2 More Replies
leireroman
by New Contributor III
  • 1364 Views
  • 1 reply
  • 0 kudos

Bootstrap Timeout during job cluster start

My job was not able to start because I got this problem in the job cluster. This job is running on an Azure Databricks workspace that has been deployed for almost a year, and I have not had this error before. It is deployed in North Europe. After getting...

Latest Reply
lukasjh
New Contributor II
  • 0 kudos

We have had the same problem occurring randomly since yesterday in two workspaces. The cluster started fine this morning at 08:00, but failed again from around 09:00 on.

Anske
by New Contributor III
  • 5448 Views
  • 1 reply
  • 1 kudos

One-time backfill for DLT streaming table before apply_changes

Hi, absolute Databricks noob here, but I'm trying to set up a DLT pipeline that processes CDC records from an external SQL Server instance to create a mirrored table in my Databricks Delta lakehouse. For this, I need to do some initial one-time backfi...

Data Engineering
Delta Live Tables
Latest Reply
Anske
New Contributor III
  • 1 kudos

Since nobody responded, I decided to try my own suggestion and hack the snapshot data into the table that gathers the change data capture. After some straying I ended up with the attached notebook. The notebook first creates 2 DLT tables (lookup...

cubanDataDude
by Databricks Partner
  • 1445 Views
  • 1 reply
  • 1 kudos

Job Claiming NotebookNotFound Incorrectly (seemingly)

I have the code captured in the screenshot below. When I run it individually it works just fine, but when a job runs it, it fails with 'ResourceNotFound'. I am not sure what the issue is... - Checked 'main' branch, which is where this job is pulling f...

Latest Reply
cubanDataDude
Databricks Partner
  • 1 kudos

Figured it out:
ecw_staging_nb_List = ['nb_UPSERT_stg_ecw_insurance', 'nb_UPSERT_stg_ecw_facilitygroups']
Works just fine.

jp_allard
by New Contributor
  • 2226 Views
  • 0 replies
  • 0 kudos

Selective Overwrite to a Unity Catalog Table

I have been able to perform a selective overwrite using replaceWhere on a hive_metastore table, but when I use the same code for the same table in Unity Catalog, no data is written. Has anyone else had this issue, or are there common mistakes that ar...

dannythermadom
by New Contributor III
  • 6961 Views
  • 6 replies
  • 7 kudos

dbutils.notebook.run command not working with /Repos/

I have two GitHub repos configured in the Databricks Repos folder. repo_1 is run using a job, and repo_2 is called from repo_1 using the dbutils.notebook.run command:
dbutils.notebook.run("/Repos/repo_2/notebooks/notebook", 0, args)
I am getting the follo...

Latest Reply
cubanDataDude
Databricks Partner
  • 7 kudos

I am having a similar issue...
ecw_staging_nb_List = ['/Workspace/Repos/PRIMARY/UVVC_DATABRICKS_EDW/silver/nb_UPSERT_stg_ecw_insurance',
                       '/Repos/PRIMARY/UVVC_DATABRICKS_EDW/silver/nb_UPSERT_stg_ecw_facilitygroups']
Adding workspace d...

5 More Replies
Jennifer
by New Contributor III
  • 1239 Views
  • 0 replies
  • 0 kudos

Optimization failed for timestampNtz

We have a table that uses the timestampNtz type for its timestamp column, which is also a clustering key for the table under liquid clustering. I ran OPTIMIZE <table-name>, and it failed with the error: Unsupported datatype 'TimestampNTZType'. But the failed optimization also broke ...

pragarwal
by Databricks Partner
  • 4055 Views
  • 2 replies
  • 0 kudos

Export Users and Groups from Unity Catalog

Hi, I am trying to export the list of users and groups from Unity Catalog through the Databricks workspace, but I am seeing only the users/groups created inside the workspace, not the groups and users that come through SCIM in Unity Catalog. How can I ge...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello, when you refer to the users and groups in Unity Catalog, do you mean the ones created at the account level? If so, you need to run the API call at the account level and not the workspace level. You can see the API doc for account le...

1 More Replies
Jorge3
by New Contributor III
  • 3066 Views
  • 1 reply
  • 0 kudos

Trigger a job on file update

I'm using Auto Loader to process any new file or update that arrives in my landing area, and I schedule the job using DB workflows to trigger on file arrival. The issue is that the trigger only executes when new files arrive, not when an existing ...

Latest Reply
Ivan_Donev
New Contributor III
  • 0 kudos

I don't think you can effectively achieve your goal. While it's theoretically somewhat possible, the Databricks documentation says there is no guarantee of correctness; see "Auto Loader FAQ | Databricks on AWS".

Anonymous
by Not applicable
  • 9784 Views
  • 2 replies
  • 1 kudos

When reading a CSV file with spark.read, the data is not loading in the appropriate column while pas

I am trying to read a CSV file from a storage location using the spark.read function, and I am explicitly passing the schema to the function. However, the data is not loading into the proper columns of the DataFrame. The code details follow:
from pyspark....

Latest Reply
sai_sathya
New Contributor III
  • 1 kudos

Hi, I would suggest following the approach suggested by Thomaz Rossito, but you could also try swapping the struct field order, like this:
schema = StructType([StructField('DA_RATE', DateType(), True), StructField('CURNCY_F', StringTy...

1 More Replies
dvmentalmadess
by Valued Contributor
  • 8258 Views
  • 3 replies
  • 0 kudos

Resolved! OPTIMIZE: Exception thrown in awaitResult: / by zero

We run `OPTIMIZE` on our tables every 24 hours as follows:
spark.sql(f'OPTIMIZE {catalog_name}.{schema_name}.`{table_name}`;')
This morning one of our hourly jobs started failing on the call to `OPTIMIZE` with the error:
org.apache.spark.SparkException...
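As a side note on how the statement above is built, here is a minimal sketch that quotes all three name parts rather than only the table name. The helpers `quote_identifier` and `build_optimize_statement` are hypothetical names introduced for illustration, not part of the poster's job, and this does not address the "/ by zero" error itself. It assumes the usual Spark SQL convention of escaping a backtick inside a quoted identifier by doubling it.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any backtick inside it (hypothetical helper)."""
    return "`" + name.replace("`", "``") + "`"


def build_optimize_statement(catalog_name: str, schema_name: str, table_name: str) -> str:
    """Build an OPTIMIZE statement with every part of the three-level name quoted."""
    full_name = ".".join(
        quote_identifier(part) for part in (catalog_name, schema_name, table_name)
    )
    return f"OPTIMIZE {full_name}"
```

In a notebook the returned string would be passed to `spark.sql(...)` in place of the inline f-string.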

Latest Reply
sh
New Contributor II
  • 0 kudos

I am getting the same error. Any resolution?

2 More Replies