Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ChandraR
by New Contributor
  • 520 Views
  • 1 reply
  • 0 kudos

Data Engineering Associate - 13+ Years of SAP SD/OTC Experience

Hi Databricks, this is Chandra. I am adapting to the world of data with the help of Databricks. I need your help and advice to successfully adapt to the Databricks Engineer profile. I have enrolled myself in the Learning platform; I need yo...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @ChandraR! Happy to help you get started on your Databricks journey! To begin, it's important to get familiar with the Databricks ecosystem, including key components like the Lakehouse architecture, Delta Lake, Apache Spark, and Unity Catalog. ...

AmanSehgal
by Honored Contributor III
  • 1093 Views
  • 1 reply
  • 0 kudos

Column Name Case sensitivity in DLT pipeline

I have a DLT pipeline that processes messages from Event Grid. The message schema has two columns that differ only in case: "employee_id" and "employee_ID". I tried setting spark.sql.caseSensitive to true in my DLT notebook as well as in the DLT configurati...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

Hi @AmanSehgal, DLT treats column names as case-insensitive, even if spark.sql.caseSensitive is set to true. That’s why employee_id and employee_ID are seen as duplicates and cause the error. To fix this, you’ll need to rename one of the columns so yo...
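A minimal sketch of that rename approach (the source table and column order are assumptions): renaming positionally with toDF() avoids referencing either of the ambiguous names.

```python
import dlt

@dlt.table(name="events_clean")
def events_clean():
    # Hypothetical raw stream whose first two columns collide on case.
    df = spark.readStream.table("raw_event_grid_messages")
    # Rename positionally so neither ambiguous name has to be referenced.
    new_names = ["employee_id", "employee_id_upper"] + df.columns[2:]
    return df.toDF(*new_names)
```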

sunday-okey
by New Contributor
  • 562 Views
  • 1 reply
  • 0 kudos

Resolved! Introduction to Spark Lab

Hello, I got an error while accessing the Introduction to Spark Lab. Please see the error message below and resolve.", line 155, in do response = retryable(self._perform)(method, File "/voc/scripts/python/venv/lib/python3.10/site-packages/databricks/...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @sunday-okey! Apologies for the inconvenience. The issue has been resolved. Please try restarting the lab; it should be working as expected now.

carlos_tasayco
by Contributor
  • 1285 Views
  • 4 replies
  • 0 kudos

path-based access to a table with row filters or column masks is not supported

I have a Delta table to which I am applying masking on some columns; however, every time I want to refresh the table (overwrite) I can't: I receive this error. If I do what the Assistant recommends (removing .option("path", DeltaZones)), it worked b...
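For reference, a hedged sketch of the workaround the excerpt describes (names are illustrative): write to the Unity Catalog table by name instead of by path, since path-based access is blocked once row filters or column masks are applied.

```python
(
    df.write.format("delta")
    .mode("overwrite")
    # .option("path", delta_zones_path)  # path-based write: rejected once masks are applied
    .saveAsTable("main.sales.delta_zones")  # write by UC table name instead
)
```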

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Are you using Unity Catalog?

3 More Replies
Siddartha01
by New Contributor II
  • 483 Views
  • 1 reply
  • 0 kudos

I got suspended from the Databricks Certified Associate Developer for Apache Spark exam.

I need immediate assistance to reschedule my exam. By mistake, I used a notebook to do rough work, and I think that is why I got suspended from the exam. Please help me out with this issue. Mail id: malothsiddunaik133@gmail.com. Thank you...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Siddartha01! It looks like this post duplicates the one you recently posted. A response has already been provided in the original thread. I recommend continuing the discussion there to keep the conversation focused and organised.

drag7ter
by Contributor
  • 815 Views
  • 1 reply
  • 1 kudos

Resolved! Delta sharing recipient auth status

I'm creating recipients and sending them activation links via email. All recipients are external (they don't have a Databricks account). Let's say I've created 300 recipients and I want to know who downloaded the credentials file successfully and got authenticated...

Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hey @drag7ter, you’re absolutely right: currently there is no official API field that explicitly returns the recipient activation status (I’ve just tested it), even though the Databricks API documentation references a field called activated, a boolea...
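A hedged sketch of how one might poll the documented field anyway (the endpoint is the public Unity Catalog list-recipients API; host/token handling is an assumption, and per the caveat above the field may not be populated):

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{host}/api/2.1/unity-catalog/recipients",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

for r in resp.json().get("recipients", []):
    # "activated" is documented but, per the reply above, may be unreliable.
    print(r["name"], r.get("activated"))
```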

Klusener
by Contributor
  • 6481 Views
  • 6 replies
  • 1 kudos

Relevance of off-heap memory and usage

I was referring to the doc https://kb.databricks.com/clusters/spark-executor-memory. In general, total off-heap memory = spark.executor.memoryOverhead + spark.memory.offHeap.size. The off-heap mode is controlled by the properties spark.memory.offHeap...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hello, thanks for the follow-up! The configurations spark.executor.memory and spark.executor.memoryOverhead serve distinct purposes within Spark's memory management. spark.executor.memory controls the allocated memory for each executor's JV...
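A minimal local sketch of where these settings live (values are illustrative; on Databricks they are set in the cluster's Spark config rather than in notebook code):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("executor-memory-demo")
    .config("spark.executor.memory", "8g")           # executor JVM heap
    .config("spark.executor.memoryOverhead", "2g")   # off-heap overhead (JVM internals, native buffers)
    .config("spark.memory.offHeap.enabled", "true")  # opt in to explicit off-heap storage
    .config("spark.memory.offHeap.size", "1g")       # size of that off-heap region
    .getOrCreate()
)
```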

5 More Replies
ggsmith
by Contributor
  • 4965 Views
  • 7 replies
  • 6 kudos

dlt Streaming Checkpoint Not Found

I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...

Latest Reply
a_user12
New Contributor III
  • 6 kudos

same here

6 More Replies
jar
by Contributor
  • 1443 Views
  • 4 replies
  • 1 kudos

Resolved! Define time interval for when a cluster can be active

Hullo good Databricks people. I have a small dedicated cluster being used for Direct Query (PBI) which has a long termination period. I'd like for it to only be active during business hours, though, and to set a restriction so that it's not possible to...

Latest Reply
jar
Contributor
  • 1 kudos

So the hybrid solution turned out to be the smartest, at least for my case, with a simple time check that defines the connection information according to whether the query is sent outside business hours or not.
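A hedged sketch of such a time check (the connection identifiers and business hours are assumptions):

```python
from datetime import datetime

def http_path_for_now(now: datetime | None = None) -> str:
    """Route queries to the dedicated cluster only during business hours."""
    now = now or datetime.now()
    in_business_hours = now.weekday() < 5 and 8 <= now.hour < 18
    return "dedicated-cluster-http-path" if in_business_hours else "shared-warehouse-http-path"

print(http_path_for_now())
```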

3 More Replies
ChingizK
by New Contributor III
  • 3186 Views
  • 2 replies
  • 1 kudos

Hyperopt Error: There are no evaluation tasks, cannot return argmin of task losses.

The trials succeed when the cell in the notebook is executed manually. However, the same process fails when executed as a Workflow. The error simply says that there's an issue with the objective function. But how can that be the case if I'm able t...

Data Engineering
hyperopt
Workflows
Latest Reply
LibertyEnergy
New Contributor II
  • 1 kudos

I have this exact same issue! Can anyone offer guidance?

1 More Reply
ramravi
by Contributor II
  • 27204 Views
  • 3 replies
  • 0 kudos

Is Spark case sensitive?

Is Spark case sensitive? Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and try to select either the "Name" or "name" column, you will get a column ambiguity error. There is a way to handle this issue b...
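A small demo of the behavior described above (assumes a plain SparkSession; the truncated fix is presumably the spark.sql.caseSensitive setting shown here):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("case-sensitivity-demo").getOrCreate()
df = spark.createDataFrame([(1, 2)], ["Name", "name"])

spark.conf.set("spark.sql.caseSensitive", "false")   # the default
# df.select("Name")  # raises AnalysisException: ambiguous reference

spark.conf.set("spark.sql.caseSensitive", "true")
df.select("Name").show()                             # now resolves unambiguously
```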

Latest Reply
zerospeed
New Contributor II
  • 0 kudos

Hi, I had similar issues with parquet files when trying to query Athena. The fix was to inspect the parquet file, since it contained columns such as "Name" and "name", which the AWS crawler / Athena would interpret as a duplicate column since it would se...
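A hedged sketch of that inspection step (assumes pyarrow is installed and a local copy of the file; the filename is illustrative):

```python
import pyarrow.parquet as pq

schema = pq.read_schema("part-00000.parquet")  # hypothetical file pulled from storage
for field in schema:
    print(field.name, field.type)              # exposes case-colliding names like "Name"/"name"
```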

2 More Replies
Nagarathna
by New Contributor II
  • 1133 Views
  • 3 replies
  • 0 kudos

How to write trillions of rows to a Unity Catalog table

Hi team, I have a dataframe with 1269408570800 rows. I need to write this data to a Unity Catalog table. How can I upload such a huge quantity of data? I'm using Databricks Runtime 15.4 LTS with 4 workers, each worker of type i3.4xlarge, and a driver of type...

Data Engineering
data upload
Unity Catalog
Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @Nagarathna @Lucas_TBrabo, I'd like to share my opinion and some tips that might help: 1. You should try to avoid filtering by spark_partition_id because you can create skewed partitions; use repartition() instead, and Spark can optimize t...
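A minimal sketch of the repartition-then-write idea (table names and the partition count are illustrative and should be tuned to the cluster):

```python
df = spark.read.table("main.staging.huge_source")  # hypothetical source

(
    df.repartition(4000)                  # spread work evenly instead of
                                          # filtering by spark_partition_id
    .write.mode("append")
    .saveAsTable("main.analytics.huge_target")  # three-level Unity Catalog name
)
```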

2 More Replies
chsoni12
by New Contributor II
  • 1182 Views
  • 2 replies
  • 1 kudos

Impact of VACUUM Operations on Shallow Clones in Databricks

I performed a POC where I had to check whether we can create a new Delta table that contains only a particular version of a normal Delta table's data without copying the data, and what happens if we make changes or perform any operation (insert/delete/truncate records)...

Latest Reply
chsoni12
New Contributor II
  • 1 kudos

Thanks, it really helps me a lot. But there is also an issue with shallow clone: we can clone the full table data or a particular Delta version's data (using timestamp/version) from the normal table using shallow clone, but we cannot clone the table data b...
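For reference, a sketch of the version-pinned shallow clone syntax the thread is discussing (names and the version/timestamp values are illustrative):

```python
# Clone the state of the table as of a specific version, without copying data files.
spark.sql("""
    CREATE TABLE main.poc.orders_v5
    SHALLOW CLONE main.prod.orders VERSION AS OF 5
""")

# Or pin to a timestamp instead of a version number.
spark.sql("""
    CREATE TABLE main.poc.orders_snapshot
    SHALLOW CLONE main.prod.orders TIMESTAMP AS OF '2025-04-20'
""")
```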

1 More Reply
ep208
by New Contributor
  • 1297 Views
  • 1 reply
  • 0 kudos

How to resolve Location Overlap

Hi, I am trying to ingest abfss://datalake@datalakename.dfs.core.windows.net/Delta/Project1/sales_table, but when writing the table schema in the YAMLs, I incorrectly wrote this table to another Unity Catalog table: --- kind: SinkDeltaTable metadata: name:...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @ep208, from the error message you're seeing (LOCATION_OVERLAP), it seems that Unity Catalog is still tracking a table or volume that points to the same path you're now trying to reuse: abfss://datalake@datalakename.dfs.core.windows.net/Delta/Proj...
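A hedged sketch of the cleanup (the table name is a placeholder; confirm the location before dropping anything):

```python
# Check whether a suspected table still claims the abfss:// location.
spark.sql(
    "DESCRIBE TABLE EXTENDED old_catalog.old_schema.sales_table"
).filter("col_name = 'Location'").show(truncate=False)

# If its Location matches the path above, drop the stale registration
# so Unity Catalog frees the location for reuse:
# spark.sql("DROP TABLE old_catalog.old_schema.sales_table")
```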

KG_777
by New Contributor II
  • 1585 Views
  • 1 reply
  • 2 kudos

Resolved! Capturing deletes for SCD2 using apply changes or apply as delete decorator

We're looking to implement SCD2 for tables in our lakehouse, and we need to keep track of records that are being deleted in the source. Does anyone have a similar use case, and can they outline some of the challenges they faced and workarounds they imp...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 2 kudos

Hi @KG_777, tracking deleted records in an SCD Type 2 implementation for a lakehouse architecture is indeed a challenging but common requirement. Here's an overview of approaches, challenges, and workarounds based on industry experience: Common Approach...
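A hedged sketch of the DLT route the post title mentions (the source name and the operation/sequence columns are assumptions about the CDC feed):

```python
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="customers_cdc_feed",                    # hypothetical CDC source view/table
    keys=["customer_id"],
    sequence_by="sequence_num",
    apply_as_deletes=expr("operation = 'DELETE'"),  # close out rows deleted upstream
    stored_as_scd_type=2,                           # keep full history (__START_AT/__END_AT)
)
```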

