User Memory problem on driver with DLT

Lulka
New Contributor II

Hi!

I have a problem with user memory on the driver: only a few MB of storage memory and 0 execution memory are used, but more than 7 GB of JVM on-heap memory is in use.

How can this be? I don't have any broadcast variables, joins or aggregations.

All the pipeline:

1. I create a list of 20 string values

2. I iterate through this list and do the following:

a. Create dlt.view from streaming source

b. Create streaming_live_table

c. Use apply_changes for scd-1
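
The steps above could look roughly like the sketch below. It is an illustration under assumptions, not the poster's actual code: the helper names `register_scd1_table` and `register_all`, the `raw.<name>` source tables, and the `id` / `updated_at` columns are hypothetical. `dlt` and `spark` are passed in as arguments only so the sketch can be exercised outside a pipeline; in a pipeline notebook you would normally `import dlt` and use the ambient `spark`. Wrapping the per-table definitions in a function means each view captures its own `name` instead of the loop variable's final value.

```python
from typing import Any, Iterable


def register_scd1_table(dlt: Any, spark: Any, name: str,
                        key: str, sequence_col: str) -> None:
    """Register view -> streaming table -> SCD type 1 flow for one name."""
    view_name = f"{name}_changes"  # hypothetical naming convention

    @dlt.view(name=view_name)
    def _changes():
        # `name` is bound once per call of register_scd1_table, so every
        # view reads its own source. "raw.<name>" is an assumed source table.
        return spark.readStream.table(f"raw.{name}")

    # Newer DLT releases name this create_streaming_table; the post's
    # "streaming_live_table" appears to refer to the older naming.
    dlt.create_streaming_table(name=name)

    dlt.apply_changes(
        target=name,
        source=view_name,
        keys=[key],
        sequence_by=sequence_col,
        stored_as_scd_type=1,
    )


def register_all(dlt: Any, spark: Any, names: Iterable[str],
                 key: str = "id", sequence_col: str = "updated_at") -> None:
    """Loop over the list of names, as in step 2 of the post."""
    for name in names:
        register_scd1_table(dlt, spark, name, key, sequence_col)
```

In a pipeline this would be called once at module level, for example `register_all(dlt, spark, ["customers", "orders"])` with whatever 20 names the list actually holds.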

What can this 7 GB of heap memory on the driver be? As I understand it, this is user memory, but why is it so large?

How can I reduce it? Is it DLT metadata or something like that?

Is it a good approach to iterate through a Python list to create DLT tables?

I hope someone can answer some of these questions. Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

@Yuliya Valava: Here are a few possibilities to think about and try.

  1. It's possible that the 7 GB of heap memory on the driver is being used to store metadata related to the data being processed.
  2. Iterating through the Python list to create DLT tables could be causing this memory issue if the table definitions are being held in memory. Can you try using Spark to process your data? This would let you distribute the processing across multiple nodes, which can help reduce memory usage on individual nodes.
  3. To reduce memory usage, you could also try creating a single DLT table for all the data rather than creating a new one in each iteration of the loop.
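
For context on the "user memory" in the question, here is a minimal sketch of how Spark's unified memory model divides a JVM heap, using Spark's documented defaults (300 MB reserved, `spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5). The function name and the 16 GiB heap in the usage note are assumptions for illustration; the thread does not state the driver's heap size. Heap "in use" on a driver also includes ordinary JVM objects, so this only bounds the regions, it does not explain the 7 GB by itself.

```python
RESERVED_MB = 300  # fixed amount Spark sets aside before splitting the heap


def unified_memory_regions(heap_mb: float,
                           memory_fraction: float = 0.6,
                           storage_fraction: float = 0.5) -> dict:
    """Split a JVM heap (in MB) into Spark's unified-memory regions."""
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction    # execution + storage pool
    storage = unified * storage_fraction  # storage share that is protected from eviction
    return {
        "reserved_mb": RESERVED_MB,
        "storage_mb": storage,
        "execution_mb": unified - storage,
        "user_mb": usable - unified,      # everything outside the unified pool
    }
```

Calling `unified_memory_regions(16 * 1024)` for a hypothetical 16 GiB heap shows that, with default settings, roughly 40% of the usable heap falls outside the storage and execution pools.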


2 REPLIES

Anonymous
Not applicable

Hi @Yuliya Valava

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
