Databricks Community

Lulka · ‎02-27-2023

Hi!

I have a problem with memory usage on the driver: only a few MB of storage memory and 0 execution memory are in use, but more than 7 GB of on-heap JVM memory.

How can that be? I don't have any broadcast variables, joins, or aggregations.

The whole pipeline:

1. I create a list of 20 string values

2. I iterate through this list and do the following for each value:

a. Create a dlt.view from a streaming source

b. Create a streaming_live_table

c. Use apply_changes for SCD type 1
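
The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual pipeline code: the `raw.<name>` source tables, the `_updates` and `_scd1` name suffixes, the `id` key and the `updated_at` sequencing column are all placeholder assumptions. `dlt` and `spark` are passed in as arguments only so the sketch can be read and exercised outside a pipeline.

```python
def define_scd1_flow(dlt, spark, name, key="id", sequence_col="updated_at"):
    """Register one view, one streaming table and one SCD type 1 flow.

    All table and column names here are hypothetical placeholders.
    """
    view_name = f"{name}_updates"
    target_name = f"{name}_scd1"

    # a. View over the streaming source. `name` is bound per function call,
    #    so each view reads its own table even though it is created in a loop.
    @dlt.view(name=view_name)
    def source_view():
        return spark.readStream.table(f"raw.{name}")

    # b. Target streaming table (newer DLT releases name this
    #    create_streaming_table).
    dlt.create_streaming_live_table(name=target_name)

    # c. SCD type 1: keep only the latest row per key.
    dlt.apply_changes(
        target=target_name,
        source=view_name,
        keys=[key],
        sequence_by=sequence_col,
        stored_as_scd_type=1,
    )
    return view_name, target_name


def define_all(dlt, spark, names):
    """Steps 1 and 2: loop over the list and register one flow per value."""
    return [define_scd1_flow(dlt, spark, n) for n in names]
```

In a pipeline notebook this would be called as `define_all(dlt, spark, my_list)` after `import dlt`. Wrapping the body in a function, rather than defining the view directly inside the `for` loop, keeps every view tied to its own list value.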

What could this 7 GB of heap memory on the driver be? As I understand it, this is user memory, but why is it so large?

How can I reduce it? Is it DLT metadata or something like that?

Is it a good approach to iterate through a Python list to create DLT tables?

I hope someone can answer some of these questions. Thank you in advance.

Anonymous · ‎03-08-2023

@Yuliya Valava: Here are a few possible directions to think about and try.

It's possible that the 7 GB of heap memory on the driver is being used to store metadata related to the data being processed.

Iterating through the Python list to create DLT tables could be causing this memory issue if the DLT definitions are being held in memory. Can you try using Spark to process your data? That would let you distribute the processing across multiple nodes, which can help reduce memory usage on individual nodes.

To reduce memory usage, you could also try creating a single DLT table for all the data rather than creating a new one on each iteration of the loop.

Anonymous · ‎03-11-2023

Hi @Yuliya Valava

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!