Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

carlos_tasayco
by Contributor
  • 1249 Views
  • 4 replies
  • 0 kudos

path-based access to a table with row filters or column masks is not supported

I have a Delta table to which I am applying masking on some columns; however, every time I want to refresh the table (overwrite) I cannot, and I receive this error: If I do what the Assistant recommends (remove the .option("path", DeltaZones)), it worked b...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Are you using Unity Catalog?

  • 0 kudos
3 More Replies
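Note: since the error says path-based access is not supported for tables with row filters or column masks, a common workaround is to overwrite the table by its Unity Catalog name rather than by storage path. A minimal sketch, assuming `df` holds the refreshed data; the three-level table name below is hypothetical:

```python
# Minimal sketch: overwrite a Unity Catalog table by name instead of by path.
# `df` is assumed to hold the refreshed data; the table name is a placeholder.
(
    df.write
      .format("delta")
      .mode("overwrite")
      # No .option("path", ...): path-based writes are rejected for tables
      # with row filters or column masks, so target the registered table name.
      .saveAsTable("main.sales.delta_zones")
)
```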
Siddartha01
by New Contributor II
  • 472 Views
  • 1 replies
  • 0 kudos

I got suspended from the Databricks Certified Associate Developer for Apache Spark exam.

I need immediate assistance to reschedule my exam. I mistakenly used a notebook to do rough work, and I think I was suspended from the exam because of that. Please help me out with this issue. Mail id: malothsiddunaik133@gmail.com Thank you...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Siddartha01! It looks like this post duplicates the one you recently posted. A response has already been provided in the original thread. I recommend continuing the discussion in that thread to keep the conversation focused and organised.

  • 0 kudos
drag7ter
by Contributor
  • 800 Views
  • 1 replies
  • 1 kudos

Resolved! Delta sharing recipient auth status

I'm creating recipients and sending them activation links via email. All recipients are external (they don't have a Databricks account). Let's say I've created 300 recipients and I want to know who downloaded the credentials file successfully and got authenticated...

Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hey @drag7ter You’re absolutely right: currently, there is no official API field that explicitly returns the recipient activation status (I just tested it), even though the Databricks API documentation references a field called activated boolea...

  • 1 kudos
Klusener
by Contributor
  • 6163 Views
  • 6 replies
  • 1 kudos

Relevance of off heap memory and usage

I was referring to the doc - https://kb.databricks.com/clusters/spark-executor-memory. In general, total off-heap memory = spark.executor.memoryOverhead + spark.offHeap.size. The off-heap mode is controlled by the properties spark.memory.offHeap...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hello, Thanks for the follow up! The configuration for spark.executor.memory and spark.executor.memoryOverhead serves distinct purposes within Spark's memory management: spark.executor.memory: This controls the allocated memory for each executor's JV...

  • 1 kudos
5 More Replies
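For reference, the properties discussed in this thread are regular Spark configs. A minimal sketch of how they might be set (values are illustrative, not recommendations; on Databricks you would normally put these in the cluster's Spark config rather than build a session in a notebook):

```python
# Illustrative values only; tune for your workload and instance types.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # JVM heap per executor (on-heap, managed by Spark's unified memory manager)
    .config("spark.executor.memory", "8g")
    # Extra non-heap memory per executor (Python workers, JVM overhead, etc.)
    .config("spark.executor.memoryOverhead", "2g")
    # Opt-in off-heap storage for Spark's own memory manager
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "2g")
    .getOrCreate()
)
```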
ggsmith
by Contributor
  • 4886 Views
  • 7 replies
  • 6 kudos

dlt Streaming Checkpoint Not Found

I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...

Latest Reply
a_user12
New Contributor III
  • 6 kudos

same here

  • 6 kudos
6 More Replies
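For context, in Delta Live Tables the streaming checkpoint is managed by the pipeline itself (under the pipeline's storage or Unity Catalog managed location), so there is no checkpointLocation option to set in the table definition. A minimal sketch of a DLT streaming table; the source path is a placeholder:

```python
# Minimal DLT sketch; the source path below is a placeholder.
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="orders_bronze",
    comment="Streaming ingest; the checkpoint is managed by the DLT pipeline.",
)
def orders_bronze():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders/")   # placeholder source path
        .withColumn("ingested_at", F.current_timestamp())
    )
```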
jar
by Contributor
  • 1380 Views
  • 4 replies
  • 1 kudos

Resolved! Define time interval for when a cluster can be active

Hullo, good Databricks people. I have a small dedicated cluster being used for Direct Query (PBI) which has a long termination period. I'd like for it to only be active during business hours, though, and to set a restriction so that it's not possible to...

Latest Reply
jar
Contributor
  • 1 kudos

So the hybrid solution turned out to be the smartest, at least for my case, with a simple time check that defines the connection information according to whether the query is sent outside business hours or not.

  • 1 kudos
3 More Replies
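The accepted approach in the reply is a simple time check that routes queries to different connection details outside business hours. A rough sketch of what that check might look like on the client side; the HTTP paths, hours, and timezone are all hypothetical:

```python
# Rough sketch: pick connection details based on business hours (all IDs hypothetical).
from datetime import datetime
from zoneinfo import ZoneInfo

def pick_http_path(now=None):
    now = now or datetime.now(ZoneInfo("Europe/Copenhagen"))
    in_business_hours = now.weekday() < 5 and 8 <= now.hour < 18
    if in_business_hours:
        # Dedicated cluster used for Power BI Direct Query during the day
        return "sql/protocolv1/o/1234567890/0101-abcdef-cluster"
    # Cheaper / serverless SQL warehouse outside business hours
    return "/sql/1.0/warehouses/abcdef1234567890"
```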
ChingizK
by New Contributor III
  • 3161 Views
  • 2 replies
  • 1 kudos

Hyperopt Error: There are no evaluation tasks, cannot return argmin of task losses.

The trials succeed when the cell in the notebook is executed manually. However, the same process fails when executed as a Workflow: the error simply says that there's an issue with the objective function. But how can that be the case if I'm able t...

Data Engineering
hyperopt
Workflows
Latest Reply
LibertyEnergy
New Contributor II
  • 1 kudos

I have this exact same issue! Can anyone offer guidance?

  • 1 kudos
1 More Replies
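One frequent cause of "There are no evaluation tasks" is the objective never returning a valid result, so every trial is discarded. A minimal, self-contained fmin sketch that returns STATUS_OK explicitly, which may help isolate whether the Workflow environment is the problem (the quadratic objective is purely illustrative):

```python
# Minimal hyperopt sketch; the objective is illustrative only.
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK

def objective(x):
    # Always return a dict with 'loss' and 'status'; a trial without a valid
    # status is not counted as an evaluation task.
    return {"loss": (x - 3) ** 2, "status": STATUS_OK}

trials = Trials()
best = fmin(
    fn=objective,
    space=hp.uniform("x", -10, 10),
    algo=tpe.suggest,
    max_evals=20,
    trials=trials,
)
print(best)
```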
ramravi
by Contributor II
  • 26980 Views
  • 3 replies
  • 0 kudos

Is Spark case sensitive? Spark is not case sensitive by default. If you have the same column name in different case (Name, name) and you try to select eit...

Is Spark case sensitive? Spark is not case sensitive by default. If you have the same column name in different case (Name, name) and you try to select either the "Name" or "name" column, you will get a column ambiguity error. There is a way to handle this issue b...

Latest Reply
zerospeed
New Contributor II
  • 0 kudos

Hi, I had similar issues with Parquet files when trying to query Athena. The fix was that I had to inspect the Parquet file, since it contained columns such as "Name" and "name", which the AWS crawler / Athena would interpret as a duplicate column since it would se...

  • 0 kudos
2 More Replies
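The setting alluded to at the end of the post is spark.sql.caseSensitive. A short sketch, assuming `spark` is the notebook's existing session:

```python
# By default Spark resolves "Name" and "name" as the same column name, which
# leads to ambiguity errors when both exist. Case sensitivity can be enabled:
spark.conf.set("spark.sql.caseSensitive", "true")

df = spark.createDataFrame([(1, 2)], ["Name", "name"])
df.select("Name").show()   # resolves unambiguously only with caseSensitive=true
```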
Nagarathna
by New Contributor II
  • 1087 Views
  • 3 replies
  • 0 kudos

How to write trillions of rows to a Unity Catalog table.

Hi team, I have a dataframe with 1269408570800 rows. I need to write this data to a Unity Catalog table. How can I upload such a huge quantity of data? I'm using Databricks Runtime 15.4 LTS with 4 workers, each worker of type i3.4xlarge, and a driver of type...

Data Engineering
data upload
Unity Catalog
Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @Nagarathna @Lucas_TBrabo  I’d like to share my opinion and some tips that might help: 1. You should try to avoid filtering by spark_partition_id because you can create skewed partitions; you should use repartition() instead, and Spark can optimize t...

  • 0 kudos
2 More Replies
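Building on the reply, a minimal sketch of writing a very large DataFrame to a Unity Catalog table with an explicit repartition() instead of filtering by spark_partition_id; the table name and partition count are placeholders to tune:

```python
# Sketch only: table name and partition count are placeholders.
(
    df.repartition(20000)                 # spread rows evenly across many tasks
      .write
      .format("delta")
      .mode("append")                     # append in manageable chunks if needed
      .saveAsTable("main.analytics.big_table")
)
```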
chsoni12
by New Contributor II
  • 1142 Views
  • 2 replies
  • 1 kudos

Impact of VACUUM Operations on Shallow Clones in Databricks

I performed a POC where I had to check whether we can create a new Delta table that contains only a particular version of the data of a normal Delta table, without copying the data, and whether, if we make changes or perform any operation (insert/delete/truncate/records)...

Latest Reply
chsoni12
New Contributor II
  • 1 kudos

Thanks, it really helps me a lot. But there is also an issue with shallow clone. Using shallow clone we can only clone the full table data, or a particular Delta version's data using a timestamp/version from the normal table, but we cannot clone the table data b...

  • 1 kudos
1 More Replies
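For reference, the version-pinned shallow clone discussed in this thread can be expressed as plain SQL; the names below are placeholders:

```python
# Placeholder names; run from a notebook or the SQL editor.
spark.sql("""
  CREATE OR REPLACE TABLE main.poc.orders_v5_clone
  SHALLOW CLONE main.poc.orders VERSION AS OF 5
""")
# The question raised in the thread is whether a later VACUUM on main.poc.orders
# can remove data files the shallow clone still references; behavior differs
# between Hive metastore and Unity Catalog managed tables, so verify in your setup.
```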
ep208
by New Contributor
  • 1228 Views
  • 1 replies
  • 0 kudos

How to resolve Location Overlap

Hi, I am trying to ingest abfss://datalake@datalakename.dfs.core.windows.net/Delta/Project1/sales_table, but when writing the table schema in the YAMLs, I incorrectly wrote this table into another Unity Catalog table: ---kind: SinkDeltaTable metadata:  name:...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @ep208 ,From the error message you’re seeing (LOCATION_OVERLAP), it seems that Unity Catalog is still tracking a table or volume that points to the same path you’re now trying to reuse:abfss://datalake@datalakename.dfs.core.windows.net/Delta/Proj...

  • 0 kudos
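As the reply notes, LOCATION_OVERLAP usually means Unity Catalog still has a table (or external location/volume) registered against that path. A small sketch for inspecting and, if appropriate, dropping the stale registration; the three-level name is hypothetical, so confirm it before dropping anything:

```python
# Hypothetical name; confirm it is the stale registration before dropping anything.
spark.sql(
    "DESCRIBE TABLE EXTENDED wrong_catalog.wrong_schema.sales_table"
).show(truncate=False)

# If the Location row points at .../Delta/Project1/sales_table and the entry is
# just a bad registration, dropping it frees the path for the correct table:
spark.sql("DROP TABLE wrong_catalog.wrong_schema.sales_table")
```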
KG_777
by New Contributor
  • 1523 Views
  • 1 replies
  • 1 kudos

Resolved! Capturing deletes for SCD2 using apply changes or apply as delete decorator

We're looking to implement SCD2 for tables in our lakehouse, and we need to keep track of records that are being deleted in the source. Does anyone have a similar use case, and can they outline some of the challenges they faced and workarounds they imp...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @KG_777 Tracking deleted records in an SCD Type 2 implementation for a lakehouse architecture is indeed a challenging but common requirement.Here's an overview of approaches, challenges, and workarounds based on industry experience:Common Approach...

  • 1 kudos
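As a concrete starting point, DLT's apply_changes supports apply_as_deletes together with stored_as_scd_type=2, which closes out the active record when a delete arrives from the source. A minimal sketch; the source view, keys, sequencing column, and delete predicate are placeholders:

```python
# Minimal sketch; source view, keys, sequencing column and predicate are placeholders.
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="customers_cdc_feed",          # streaming view/table with CDC rows
    keys=["customer_id"],
    sequence_by="event_ts",
    apply_as_deletes=expr("operation = 'DELETE'"),  # end-date the active row
    stored_as_scd_type=2,
)
```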
thiagoawstest
by Contributor
  • 11752 Views
  • 3 replies
  • 2 kudos

Save file to /tmp

Hello, I have Python code that collects data in JSON and sends it to an S3 bucket; everything works fine. But when there is a lot of data, it causes a memory overflow. So I want to save locally, for example in /tmp or dbfs:/tmp, and afterwards send it to ...

Latest Reply
JimBiard
New Contributor III
  • 2 kudos

I am experiencing the same problem. I create a file in /tmp and can verify that it exists. But when an attempt is made to open the file using pyspark, the file is not found. I noticed that the path I used to create the file is /tmp/foobar.parquet and...

  • 2 kudos
2 More Replies
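Regarding the reply about the file "not found": code running on the driver writes to the driver's local disk, so Spark and dbutils need the file: scheme to see it. A small sketch of writing locally and then copying out; the bucket and paths are placeholders:

```python
# Sketch: write JSON locally on the driver, then copy it out with dbutils.
import json

local_path = "/tmp/export.json"                 # driver-local filesystem
with open(local_path, "w") as f:
    json.dump({"rows": []}, f)                  # placeholder payload

# Spark and dbutils default to DBFS paths, so prefix the local file with "file:".
dbutils.fs.cp(f"file:{local_path}", "s3://my-bucket/exports/export.json")  # placeholder bucket
```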
shubham_007
by Contributor III
  • 3944 Views
  • 7 replies
  • 0 kudos

Assistance needed on DQX framework as we are referring GitHub resource but not enough details

Hi Community Experts,I hope this message finds you well. Our team is currently working on enhancing data quality within our Databricks environment and we are utilizing the Databricks DQX framework for this purpose. We are seeking detailed guidance an...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi shubham, How are you doing today? It’s great to see your team focusing on data quality using the DQX framework—it’s a solid tool for keeping your data clean and reliable. To get started, I’d suggest beginning with simple checks like NOT NULL, IN R...

  • 0 kudos
6 More Replies
CJOkpala
by New Contributor II
  • 787 Views
  • 2 replies
  • 0 kudos

Error message while running queries

While running queries, both in SQL or notebooks, we get this error message below:INTERNAL_ERROR: Unexpected error when trying to access the statement result. Missing credentials to access the DBFS root storage container in Azure.The access connector ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @CJOkpala This error suggests an issue with the credentials needed to access your Azure storage container from Databricks. Let's troubleshoot this methodically since there seems to be a disconnect between your configured access connector and the a...

  • 0 kudos
1 More Replies
