Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kiko_roy
by Contributor
  • 10200 Views
  • 3 replies
  • 1 kudos

Permission error loading dataframe from azure unity catalog to GCS bucket

I am creating a data frame by reading a table's data residing in an Azure-backed Unity Catalog. I need to write the df or file to a GCS bucket. I have configured the Spark cluster config using the GCP service account JSON values. On running: df1.write.for...

Data Engineering
GCS bucket
permission error
Latest Reply
ruloweb
New Contributor II

Hi, is there any Terraform resource to apply this GRANT, or does this always have to be done manually?

2 More Replies
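For the cluster-config step described in the post, the Spark conf entries for GCS service-account auth can be sketched as below. The key names follow the common Hadoop GCS-connector settings and the field names come from a service-account JSON file; verify both against the Databricks GCS docs, and note the service account still needs storage permissions on the bucket itself.

```python
# Sketch: Spark conf keys commonly used so a cluster can write to GCS with a
# GCP service account. Key names are assumptions from the Hadoop GCS connector;
# check them against your Databricks runtime docs.
def gcs_spark_conf(sa: dict) -> dict:
    """Build cluster Spark conf entries from a service-account JSON dict."""
    return {
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.fs.gs.auth.service.account.email": sa["client_email"],
        "spark.hadoop.fs.gs.project.id": sa["project_id"],
        "spark.hadoop.fs.gs.auth.service.account.private.key": sa["private_key"],
        "spark.hadoop.fs.gs.auth.service.account.private.key.id": sa["private_key_id"],
    }

# Illustrative values only (not a real service account):
conf = gcs_spark_conf({
    "client_email": "writer@my-project.iam.gserviceaccount.com",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----...",
    "private_key_id": "abc123",
})
# With these set on the cluster, a write like
#   df1.write.format("delta").save("gs://my-bucket/path")
# still fails with a permission error unless the service account has
# object create/list rights (e.g. roles/storage.objectAdmin) on the bucket.
```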
leireroman
by New Contributor III
  • 995 Views
  • 1 reply
  • 0 kudos

Bootstrap Timeout during job cluster start

My job was not able to start because I got this problem in the job cluster. This job is running on an Azure Databricks workspace that has been deployed for almost a year, and I have not had this error before. It is deployed in North Europe. After getting...

Latest Reply
lukasjh
New Contributor II

We have the same problem randomly occurring since yesterday in two workspaces. The cluster started fine today in the morning at 08:00, but failed again from around 09:00 on.

Anske
by New Contributor III
  • 4287 Views
  • 1 reply
  • 1 kudos

One-time backfill for DLT streaming table before apply_changes

Hi, absolute Databricks noob here, but I'm trying to set up a DLT pipeline that processes CDC records from an external SQL Server instance to create a mirrored table in my Databricks delta lakehouse. For this, I need to do some initial one-time backfi...

Data Engineering
Delta Live Tables
Latest Reply
Anske
New Contributor III

So since nobody responded, I decided to try my own suggestion and hack the snapshot data into the table that gathers the change data capture. After some straying I ended up with the notebook as attached. The notebook first creates 2 dlt tables (lookup...

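The workaround described in the reply (feeding snapshot data into the CDC stream ahead of apply_changes) can be sketched independently of DLT: turn each snapshot row into a synthetic insert event whose sequence value sorts before every real CDC event, so the backfill is replayed first. All names and the sequencing scheme below are illustrative, not the attached notebook's actual code.

```python
def backfill_as_cdc(snapshot_rows, cdc_rows, seq_key="seq"):
    """Convert snapshot rows into synthetic 'insert' CDC events sequenced
    before every real CDC event, then return the combined feed."""
    first_real = min((r[seq_key] for r in cdc_rows), default=1)
    synthetic = [
        {**row, seq_key: first_real - 1, "op": "insert"} for row in snapshot_rows
    ]
    return synthetic + cdc_rows

feed = backfill_as_cdc(
    snapshot_rows=[{"id": 1, "name": "a"}],
    cdc_rows=[{"id": 1, "name": "b", "seq": 10, "op": "update"}],
)
# feed[0] is the backfilled insert (seq 9); feed[1] is the real update (seq 10),
# so an apply_changes-style consumer ordered by seq applies them in that order.
```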
cubanDataDude
by New Contributor II
  • 1041 Views
  • 1 reply
  • 1 kudos

Job Claiming NotebookNotFound Incorrectly (seemingly)

I have the code captured below in the screenshot. When I run this individually it works just fine; when a JOB runs this it fails with 'ResourceNotFound'. Not sure what the issue is... Checked 'main' branch, which is where this job is pulling f...

Latest Reply
cubanDataDude
New Contributor II

Figured it out: ecw_staging_nb_List = ['nb_UPSERT_stg_ecw_insurance', 'nb_UPSERT_stg_ecw_facilitygroups'] works just fine.

jp_allard
by New Contributor
  • 1916 Views
  • 0 replies
  • 0 kudos

Selective Overwrite to a Unity Catalog Table

I have been able to perform a selective overwrite using replaceWhere on a hive_metastore table, but when I use the same code for the same table in a Unity Catalog, no data is written. Has anyone else had this issue, or are there common mistakes that ar...

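Since the thread has no replies, here is a minimal sketch of the Delta replaceWhere pattern the post describes, with illustrative table and column names. Two things worth checking when a Unity Catalog table silently gets no data: that the predicate actually matches rows in the dataframe, and that the writer mode is "overwrite" (replaceWhere only applies to overwrite writes).

```python
# Sketch of a selective overwrite via Delta's replaceWhere writer option.
# Table and column names below are made up for illustration.
def replace_where_options(predicate: str) -> dict:
    """Writer options that overwrite only the rows matching the predicate."""
    return {"replaceWhere": predicate}

opts = replace_where_options("event_date >= '2024-04-01'")
# In a notebook this would drive something like:
#   (df.write.format("delta")
#      .mode("overwrite")          # replaceWhere requires overwrite mode
#      .options(**opts)
#      .saveAsTable("main.sales.events"))
# If every row in df falls outside the predicate, the write replaces
# the matching (empty) slice and the table appears unchanged.
```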
dannythermadom
by New Contributor III
  • 5705 Views
  • 6 replies
  • 7 kudos

Dbutils.notebook.run command not working with /Repos/

I have two GitHub repos configured in the Databricks Repos folder. repo_1 is run using a job, and repo_2 is run/called from repo_1 using the dbutils.notebook.run command: dbutils.notebook.run("/Repos/repo_2/notebooks/notebook", 0, args). I am getting the follo...

Latest Reply
cubanDataDude
New Contributor II

I am having a similar issue... ecw_staging_nb_List = ['/Workspace/Repos/PRIMARY/UVVC_DATABRICKS_EDW/silver/nb_UPSERT_stg_ecw_insurance', '/Repos/PRIMARY/UVVC_DATABRICKS_EDW/silver/nb_UPSERT_stg_ecw_facilitygroups']. Adding workspace d...

5 More Replies
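The replies in this thread converge on path resolution: repo notebooks under /Repos are also exposed under /Workspace/Repos, and normalizing to the latter before calling dbutils.notebook.run is one way to make lookups consistent. A small helper sketch (the dbutils call in the comment is how it would be used on a cluster):

```python
def workspace_path(path: str) -> str:
    """Normalize a repo notebook path to its /Workspace/... form, which is
    the same notebook exposed under the workspace file tree."""
    if path.startswith("/Repos/"):
        return "/Workspace" + path
    return path

p = workspace_path("/Repos/repo_2/notebooks/notebook")
# On a cluster this would drive:
#   dbutils.notebook.run(workspace_path("/Repos/repo_2/notebooks/notebook"), 0, args)
```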
Jennifer
by New Contributor III
  • 779 Views
  • 0 replies
  • 0 kudos

Optimization failed for timestampNtz

We have a table using the timestampNtz type for a timestamp column, which is also a cluster key for this table using liquid clustering. I ran OPTIMIZE <table-name>, and it failed with the error: Unsupported datatype 'TimestampNTZType'. But the failed optimization also broke ...

pragarwal
by New Contributor II
  • 3114 Views
  • 2 replies
  • 0 kudos

Export Users and Groups from Unity Catalog

Hi, I am trying to export the list of users and groups from Unity Catalog through the Databricks workspace, but I am seeing only the users/groups created inside the workspace instead of the groups and users coming through SCIM in Unity Catalog. How can I ge...

Latest Reply
Walter_C
Databricks Employee

Hello, when you refer to the users and groups in Unity Catalog, do you refer to the ones created at the account level? If this is the case, you need to run the API call at the account level and not the workspace level; you can see the API doc for account le...

1 More Replies
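The account-level call the reply points to can be sketched with the account SCIM Groups endpoint. The account_id and token are placeholders, and the host differs per cloud (accounts.azuredatabricks.net for Azure, accounts.cloud.databricks.com for AWS); check the Account SCIM API reference for your deployment.

```python
# Sketch: list account-level groups (SCIM) instead of workspace-level ones.
import json
import urllib.request

def scim_groups_url(host: str, account_id: str) -> str:
    """Account-level SCIM Groups endpoint (not the workspace /api/2.0/preview/scim)."""
    return f"https://{host}/api/2.0/accounts/{account_id}/scim/v2/Groups"

def list_account_groups(host: str, account_id: str, token: str):
    """Fetch account-level groups; requires an account-admin token."""
    req = urllib.request.Request(
        scim_groups_url(host, account_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["Resources"]

# URL construction only; the actual call needs real credentials.
url = scim_groups_url("accounts.azuredatabricks.net", "1234-abcd")
```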
vpacik
by New Contributor
  • 2027 Views
  • 0 replies
  • 0 kudos

Databricks-connect OpenSSL Handshake failed on WSL2

When trying to set up databricks-connect on WSL2 using a 13.3 cluster, I receive the following error regarding OpenSSL CERTIFICATE_VERIFY_FAILED. The authentication is done via the SPARK_REMOTE env variable. E0415 11:24:26.646129568 142172 ssl_transport_sec...

Jorge3
by New Contributor III
  • 2210 Views
  • 1 reply
  • 0 kudos

Trigger a job on file update

I'm using Auto Loader to process any new file or update that arrives in my landing area, and then I schedule the job using DB workflows to trigger on file arrival. The issue is that the trigger only executes when new files arrive, not when an existing ...

Latest Reply
Ivan_Donev
New Contributor III

I don't think you can effectively achieve your goal. While it's theoretically somewhat possible, Databricks documentation says there is no guarantee for correctness - Auto Loader FAQ | Databricks on AWS

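For completeness: Auto Loader does have a cloudFiles.allowOverwrites option to pick up modified files, but as the reply notes the docs do not guarantee correctness for updated files, so treat it as best-effort. A sketch of the reader options (the stream call in the comment assumes a cluster):

```python
# Sketch: Auto Loader options for reprocessing modified files, best-effort only.
def autoloader_options(fmt: str = "json", allow_overwrites: bool = True) -> dict:
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.allowOverwrites": str(allow_overwrites).lower(),
    }

opts = autoloader_options()
# On a cluster this would drive:
#   (spark.readStream.format("cloudFiles")
#        .options(**opts)
#        .load("/mnt/landing"))   # landing path is illustrative
```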
Anonymous
by Not applicable
  • 7879 Views
  • 2 replies
  • 1 kudos

When reading a CSV file with spark.read, the data is not loading in the appropriate column while pas...

I am trying to read a CSV file from a storage location using the spark.read function. I am also explicitly passing the schema to the function. However, the data is not loading in the proper columns of the dataframe. Following are the code details: from pyspark....

Latest Reply
sai_sathya
New Contributor III

Hi, I would suggest the approach suggested by Thomaz Rossito, but maybe you can give it a try swapping the struct field order, like the following: schema = StructType([StructField('DA_RATE', DateType(), True), StructField('CURNCY_F', StringTy...

1 More Replies
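The reordering advice works because an explicit schema is applied to CSV columns by position, not by name. That mapping can be shown without Spark using the stdlib csv module (field names below are illustrative, echoing the reply's schema):

```python
# Positional schema mapping, the way an explicit schema pairs with a
# header-less CSV: field N of the schema gets column N of the file.
import csv
import io

def load_with_schema(text: str, schema: list) -> list:
    """Pair each CSV column with a schema name *by position*."""
    return [dict(zip(schema, row)) for row in csv.reader(io.StringIO(text))]

data = "2024-04-15,USD,1.08\n"
wrong = load_with_schema(data, ["CURNCY_F", "DA_RATE", "RATE"])
right = load_with_schema(data, ["DA_RATE", "CURNCY_F", "RATE"])
# wrong[0]["CURNCY_F"] holds the date string; right[0] lines the fields up.
```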
dvmentalmadess
by Valued Contributor
  • 6578 Views
  • 3 replies
  • 0 kudos

Resolved! OPTIMIZE: Exception thrown in awaitResult: / by zero

We run `OPTIMIZE` on our tables every 24 hours as follows: spark.sql(f'OPTIMIZE {catalog_name}.{schema_name}.`{table_name}`;'). This morning one of our hourly jobs started failing on the call to `OPTIMIZE` with the error: org.apache.spark.SparkException...

Latest Reply
sh
New Contributor II

I am getting the same error. Any resolution?

2 More Replies
ksenija
by Contributor
  • 6034 Views
  • 1 reply
  • 1 kudos

Resolved! Cluster pools

Could you help me understand pools? How do I work out the difference in pricing between running clusters and running clusters with a pool, since we're saving the cluster start/stop time when we have a pool? And should we keep Min Idle above 0 or equal t...

Latest Reply
Walter_C
Databricks Employee

Databricks pools are a set of idle, ready-to-use instances. When a cluster is attached to a pool, cluster nodes are created using the pool’s idle instances. If the pool has no idle instances, the pool expands by allocating a new instance from the ins...

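On the pricing part of the question, the trade-off can be sketched numerically: idle pool instances (Min Idle > 0) keep accruing cloud VM cost but Databricks does not charge DBUs for idle pool capacity. The rates below are made up purely for illustration; substitute your cloud and DBU rates.

```python
# Rough hourly cost model for the pool question. All rates are illustrative.
def hourly_cost(running, idle, vm_rate=0.50, dbu_rate=0.30, dbus_per_node=0.75):
    """Cloud + DBU cost per hour: running nodes pay both the VM and DBU rate;
    idle pool nodes pay only the VM rate (no DBUs while idle)."""
    return running * (vm_rate + dbu_rate * dbus_per_node) + idle * vm_rate

with_pool = hourly_cost(running=4, idle=2)  # faster starts, pays for 2 idle VMs
no_pool = hourly_cost(running=4, idle=0)
# Min Idle > 0 is the classic latency-vs-cost trade: you pay the idle VM
# cost to avoid the instance-provisioning wait on cluster start.
```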
drag7ter
by Contributor
  • 2643 Views
  • 2 replies
  • 0 kudos

Resolved! How to enable CDF when saveAsTable from pyspark code?

I'm running this code in a Databricks notebook and I want the table created from the dataframe in the catalog with CDF enabled. When I run the code, the table doesn't exist yet. This code doesn't create a table with CDF enabled; it doesn't add: delta.enableChang...

Latest Reply
raphaelblg
Databricks Employee

Hello @drag7ter, I don't see anything wrong with your approach; check my repro:

1 More Replies
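Two documented ways to end up with delta.enableChangeDataFeed on a table created by saveAsTable are sketched below: setting the session-level default for new Delta tables before the write, or altering the table once it exists. Table names are illustrative.

```python
# Sketch: getting CDF onto a saveAsTable-created table.
# Route 1: session default applied to all *new* Delta tables in this session.
CDF_SESSION_CONF = {
    "spark.databricks.delta.properties.defaults.enableChangeDataFeed": "true",
}
# In a notebook:
#   spark.conf.set(*next(iter(CDF_SESSION_CONF.items())))
#   df.write.format("delta").saveAsTable("main.bronze.events")

# Route 2: flip the property after the table exists.
def enable_cdf_sql(table: str) -> str:
    """DDL to enable the change data feed on an existing Delta table."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )

stmt = enable_cdf_sql("main.bronze.events")
# In a notebook: spark.sql(stmt)
```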
