Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Siddartha01
by New Contributor II
  • 338 Views
  • 1 reply
  • 0 kudos

I got suspended from the Databricks Certified Associate Developer for Apache Spark exam.

I need immediate assistance to reschedule my exam. I mistakenly used a notebook to do rough work, and I think that is why I got suspended from the exam. Please help me out with this issue. Mail id: malothsiddunaik133@gmail.com. Thank you...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Siddartha01! It looks like this post duplicates the one you recently posted. A response has already been provided to the Original thread. I recommend continuing the discussion in that thread to keep the conversation focused and organised.

drag7ter
by Contributor
  • 577 Views
  • 1 reply
  • 1 kudos

Resolved! Delta sharing recipient auth status

I'm creating recipients and sending them activation links via email. All recipients are external (they don't have a Databricks account). Let's say I've created 300 recipients and I want to know who downloaded the credentials file successfully and got authenticated....

Latest Reply
Isi
Honored Contributor II
  • 1 kudos

Hey @drag7ter You're absolutely right: currently there is no official API field that explicitly returns the recipient activation status (I have just tested it), even though the Databricks API documentation references a field called activated boolea...
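As a rough illustration of checking this yourself, here is a sketch using the databricks-sdk Python client (an assumption, not something from the thread); as noted above, the activated field may not actually be populated, so the token metadata is inspected as a fallback.

```python
# Hedged sketch, not an official solution: list Delta Sharing recipients and
# inspect whatever activation/token info the API returns for each one.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # assumes standard auth (env vars or ~/.databrickscfg)

for r in w.recipients.list():
    # "activated" is the field referenced in the docs; it may be missing or
    # unpopulated, so fall back to inspecting the recipient's token metadata.
    print(r.name, getattr(r, "activated", None), r.tokens)
```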

Klusener
by Contributor
  • 4206 Views
  • 6 replies
  • 1 kudos

Relevance of off heap memory and usage

I was referring to the doc: https://kb.databricks.com/clusters/spark-executor-memory. In general, total off-heap memory = spark.executor.memoryOverhead + spark.offHeap.size. The off-heap mode is controlled by the properties spark.memory.offHeap....

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hello, thanks for the follow-up! The configurations spark.executor.memory and spark.executor.memoryOverhead serve distinct purposes within Spark's memory management. spark.executor.memory: This controls the allocated memory for each executor's JV...
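For reference, a hedged sketch of how these settings might sit together in a cluster's Spark configuration; the values below are placeholders for illustration, not tuning recommendations.

```python
# Illustrative placeholder values only; actual sizing depends on the workload.
spark_conf = {
    "spark.executor.memory": "8g",            # JVM heap per executor
    "spark.executor.memoryOverhead": "2g",    # non-heap memory: native buffers, Python workers, etc.
    "spark.memory.offHeap.enabled": "true",   # turns on Tungsten off-heap allocation
    "spark.memory.offHeap.size": "2g",        # off-heap pool, counted outside the JVM heap
}
```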

5 More Replies
ggsmith
by Contributor
  • 4018 Views
  • 7 replies
  • 6 kudos

dlt Streaming Checkpoint Not Found

I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...

Latest Reply
a_user12
New Contributor III
  • 6 kudos

same here

6 More Replies
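For context, here is a minimal sketch of the kind of DLT streaming table described in the post above (the table name and source path are hypothetical). With Unity Catalog set as the storage destination, the checkpoint location is managed by the pipeline rather than configured by hand.

```python
import dlt

# `spark` is the ambient SparkSession provided by the DLT runtime.
@dlt.table(name="orders_bronze")  # hypothetical table name
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders/")  # hypothetical source path
    )
```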
jar
by Contributor
  • 920 Views
  • 4 replies
  • 1 kudos

Resolved! Define time interval for when a cluster can be active

Hullo good Databricks people. I have a small dedicated cluster being used for Direct Query (PBI) which has a long termination period. I'd like for it to only be active during business hours though, and to set a restriction so that it's not possible to...

Latest Reply
jar
Contributor
  • 1 kudos

So the hybrid solution turned out to be the smartest, at least for my case, with a simple time check that defines the connection information according to whether the query is sent outside business hours or not.
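A rough sketch of that time-check idea follows; the timezone, business hours, and connection values are assumptions for illustration, not details from the thread.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical connection targets: the dedicated cluster vs. a fallback SQL warehouse.
DEDICATED_CLUSTER_HTTP_PATH = "sql/protocolv1/o/<workspace-id>/<cluster-id>"
FALLBACK_WAREHOUSE_HTTP_PATH = "/sql/1.0/warehouses/<warehouse-id>"

def pick_http_path(now=None):
    """Route to the dedicated cluster only on weekdays between 08:00 and 18:00."""
    now = now or datetime.now(ZoneInfo("Europe/Copenhagen"))  # assumed business timezone
    in_business_hours = now.weekday() < 5 and 8 <= now.hour < 18
    return DEDICATED_CLUSTER_HTTP_PATH if in_business_hours else FALLBACK_WAREHOUSE_HTTP_PATH
```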

3 More Replies
soumiknow
by Contributor II
  • 1808 Views
  • 0 replies
  • 0 kudos

data not inserting in 'overwrite' mode - Value has type STRUCT which cannot be inserted into column

We have the following code, which we used to load data into a BigQuery table after reading the parquet files from Azure Data Lake Storage: df.write.format("bigquery").option( "parentProject", gcp_project_id ).option("table", f"{bq_table_name}").option( "te...
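Since the thread has no replies yet, one hedged guess: if the target BigQuery column is not a STRUCT, flattening (or dropping) the struct column before the write may avoid the error. A sketch reusing the post's own variables (df, gcp_project_id, bq_table_name); the struct column name and the temporary bucket option are assumptions.

```python
from pyspark.sql import functions as F

# Hypothetical struct column "address": pull out the fields actually needed, then drop the struct.
flat_df = df.withColumn("address_city", F.col("address.city")).drop("address")

(
    flat_df.write.format("bigquery")
    .option("parentProject", gcp_project_id)
    .option("table", bq_table_name)
    .option("temporaryGcsBucket", gcs_bucket)  # assumed setting for the indirect write path
    .mode("overwrite")
    .save()
)
```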

ChingizK
by New Contributor III
  • 2871 Views
  • 2 replies
  • 1 kudos

Hyperopt Error: There are no evaluation tasks, cannot return argmin of task losses.

The trials succeed when the cell in the notebook is executed manually. However, the same process fails when executed as a Workflow: the error simply says that there's an issue with the objective function. But how can that be the case if I'm able t...

Data Engineering
hyperopt
Workflows
Latest Reply
LibertyEnergy
New Contributor II
  • 1 kudos

I have this exact same issue! Can anyone offer guidance?

1 More Replies
ramravi
by Contributor II
  • 24480 Views
  • 3 replies
  • 0 kudos

Is Spark case sensitive?

Is Spark case sensitive? Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and you try to select either the "Name" or "name" column, you will get a column ambiguity error. There is a way to handle this issue b...

Latest Reply
zerospeed
New Contributor II
  • 0 kudos

Hi, I had similar issues with parquet files when trying to query Athena. The fix was that I had to inspect the parquet file, since it contained columns such as "Name" and "name", which the AWS crawler / Athena would interpret as a duplicate column since it would se...

2 More Replies
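The truncated post above presumably points at the usual workaround. As a hedged sketch (the config is standard Spark; the file path is hypothetical):

```python
# Treat "Name" and "name" as distinct columns by enabling case-sensitive resolution.
# `spark` is the ambient SparkSession on Databricks.
spark.conf.set("spark.sql.caseSensitive", "true")

df = spark.read.parquet("/path/to/mixed_case_columns.parquet")  # hypothetical path
df.select("Name", "name").show()
```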
Nagarathna
by New Contributor II
  • 731 Views
  • 3 replies
  • 0 kudos

How to write trillions of rows to a Unity Catalog table.

Hi team, I have a dataframe with 1269408570800 rows. I need to write this data to a Unity Catalog table. How can I upload such a huge quantity of data? I'm using Databricks runtime 15.4 LTS with 4 workers, each worker type is i3.4xlarge, and a driver of type...

Data Engineering
data upload
Unity Catalog
Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @Nagarathna @Lucas_TBrabo, I'd like to share my opinion and some tips that might help: 1. You should try to avoid filtering by spark_partition_id because you can create skewed partitions; you should use repartition() instead, and Spark can optimize t...
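A hedged sketch of that advice (the table name and partition count are placeholders; df is the poster's dataframe):

```python
# Let Spark balance the work instead of filtering on spark_partition_id:
# repartition to a sensible task count, then write directly to the UC table.
(
    df.repartition(4000)  # placeholder; aim for task outputs in the ~128MB-1GB range
    .write.mode("append")
    .saveAsTable("main.analytics.trillion_row_table")  # hypothetical UC table
)
```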

2 More Replies
chsoni12
by New Contributor II
  • 782 Views
  • 2 replies
  • 1 kudos

Impact of VACUUM Operations on Shallow Clones in Databricks

I performed a POC where I had to check whether we can create a new Delta table which contains only a particular version of the data of a normal Delta table, without copying the data, and if we make changes or perform any operation (insert/delete/truncate/records)...

Latest Reply
chsoni12
New Contributor II
  • 1 kudos

Thanks, it really helps me a lot. But there is also an issue with shallow clone: we can only clone the full table data or a particular Delta version's data (using timestamp/version) from the normal table with shallow clone, but we cannot clone the table data b...
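For reference, a version-pinned shallow clone like the one described might look as follows (the catalog, schema, table names, and version number are hypothetical):

```python
# Create a zero-copy clone that points at version 42 of the source table.
spark.sql("""
    CREATE OR REPLACE TABLE main.sandbox.orders_v42
    SHALLOW CLONE main.prod.orders VERSION AS OF 42
""")
```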

1 More Replies
ep208
by New Contributor
  • 683 Views
  • 1 reply
  • 0 kudos

How to resolve Location Overlap

Hi, I am trying to ingest abfss://datalake@datalakename.dfs.core.windows.net/Delta/Project1/sales_table, but when writing the table schema in the yamls, I incorrectly wrote this table into another Unity Catalog table: ---kind: SinkDeltaTablemetadata:  name:...

Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @ep208, from the error message you're seeing (LOCATION_OVERLAP), it seems that Unity Catalog is still tracking a table or volume that points to the same path you're now trying to reuse: abfss://datalake@datalakename.dfs.core.windows.net/Delta/Proj...
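A hedged way to confirm which table is still holding the path and to clear it (the stale table name below is a placeholder; only drop it if it is genuinely obsolete):

```python
# Check where the suspected table actually points (works for Delta tables).
detail = spark.sql("DESCRIBE DETAIL some_catalog.some_schema.old_sales_table").collect()[0]
print(detail["location"])  # compare with the abfss:// path from the error message

# If it matches and the table is obsolete, dropping it releases the location.
spark.sql("DROP TABLE IF EXISTS some_catalog.some_schema.old_sales_table")
```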

KG_777
by New Contributor
  • 966 Views
  • 1 reply
  • 1 kudos

Resolved! Capturing deletes for SCD2 using apply changes or apply as delete decorator

We're looking to implement scd2 for tables in our lakehouse and we need to keep track of records that are being deleted in the source. Does anyone have a similar use case and can they outline some of the challenges they faced and workarounds they imp...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @KG_777, tracking deleted records in an SCD Type 2 implementation for a lakehouse architecture is indeed a challenging but common requirement. Here's an overview of approaches, challenges, and workarounds based on industry experience: Common Approach...
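For the DLT route specifically, a hedged sketch of apply_changes with delete handling and SCD Type 2 history; the source view, business key, sequence column, and the op flag are assumptions about the CDC feed, not details from the thread.

```python
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="customers_cdc",                  # hypothetical CDC source view
    keys=["customer_id"],                    # hypothetical business key
    sequence_by="event_ts",                  # ordering column in the feed
    apply_as_deletes=expr("op = 'DELETE'"),  # rows flagged as deletes end-date the record
    stored_as_scd_type=2,                    # keep full history
)
```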

thiagoawstest
by Contributor
  • 10683 Views
  • 3 replies
  • 1 kudos

Save file to /tmp

Hello, I have Python code that collects data in JSON and sends it to an S3 bucket; everything works fine. But when there is a lot of data, it causes memory overflow. So I want to save locally, for example in /tmp or dbfs:/tmp, and after sending it to ...

Latest Reply
JimBiard
New Contributor II
  • 1 kudos

I am experiencing the same problem. I create a file in /tmp and can verify that it exists. But when an attempt is made to open the file using pyspark, the file is not found. I noticed that the path I used to create the file is /tmp/foobar.parquet and...
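A small sketch of the local-vs-DBFS path distinction being described (the file name is hypothetical):

```python
import json

payload = {"example": 1}

# Plain Python open() writes to the driver's local filesystem.
with open("/tmp/payload.json", "w") as f:
    json.dump(payload, f)

# Spark resolves a bare "/tmp/..." against DBFS, so the local file appears "not found";
# the explicit file:/ scheme points Spark at the driver's local disk instead.
df_local = spark.read.json("file:/tmp/payload.json")
df_local.show()
```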

2 More Replies
shubham_007
by Contributor III
  • 3041 Views
  • 7 replies
  • 0 kudos

Assistance needed on the DQX framework; we are referring to the GitHub resource but it does not have enough detail

Hi Community Experts, I hope this message finds you well. Our team is currently working on enhancing data quality within our Databricks environment, and we are utilizing the Databricks DQX framework for this purpose. We are seeking detailed guidance an...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi shubham, how are you doing today? It's great to see your team focusing on data quality using the DQX framework; it's a solid tool for keeping your data clean and reliable. To get started, I'd suggest beginning with simple checks like NOT NULL, IN R...
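This is not DQX itself, but as a plain-PySpark stand-in for the kind of NOT NULL / in-range checks suggested above (column names and bounds are hypothetical; df is the dataframe under test):

```python
from pyspark.sql import functions as F

# Flag rows that fail simple quality rules; quarantine them instead of dropping silently.
checked = df.withColumn(
    "dq_failed",
    F.col("customer_id").isNull() | ~F.col("amount").between(0, 1_000_000),
)

good_rows = checked.filter(~F.col("dq_failed"))
quarantined = checked.filter(F.col("dq_failed"))
```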

6 More Replies
