02-27-2023 09:05 AM
Hi!
I have a problem with user memory on the driver: I see only a few MB of storage memory, 0 execution memory, and more than 7 GB of JVM heap memory in use.
How can that be? I don't have any broadcast variables, joins, or aggregations.
The whole pipeline:
1. I create a list of 20 string values
2. I iterate through this list and, for each value:
a. Create a dlt.view from a streaming source
b. Create a streaming live table
c. Use apply_changes for SCD Type 1
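For context, the loop above presumably follows the standard DLT metaprogramming pattern, something like the sketch below. All names here (the source list, the `/mnt/raw/...` path, the `id` key, and the `sequence` ordering column) are hypothetical stand-ins for the real pipeline's values. This only runs inside a Delta Live Tables pipeline, where the `dlt` module and `spark` session are provided by the runtime:

```python
import dlt
from pyspark.sql.functions import col

# Hypothetical stand-in for the list of 20 string values
source_names = ["orders", "customers"]  # ... up to 20 entries

def build_tables(name):
    # Defining the view inside a helper function binds `name` per
    # iteration, avoiding Python's late-binding closure pitfall
    @dlt.view(name=f"{name}_raw")
    def raw_view():
        # Hypothetical Auto Loader source path
        return (spark.readStream
                .format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load(f"/mnt/raw/{name}"))

    dlt.create_streaming_table(f"{name}_scd1")

    dlt.apply_changes(
        target=f"{name}_scd1",
        source=f"{name}_raw",
        keys=["id"],                  # assumed primary key
        sequence_by=col("sequence"),  # assumed ordering column
        stored_as_scd_type=1,
    )

for name in source_names:
    build_tables(name)
```

This loop itself only registers table definitions with the DLT framework; the per-table metadata the framework keeps on the driver is one plausible contributor to the heap usage described above.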
What can this 7 GB of heap memory on the driver be? As I understand it, it's user memory, but why such a huge amount?
How can I reduce it? Is it metadata for DLT or something like that?
Is it a good approach to iterate through a Python list to create DLT tables?
I hope someone can answer some of these questions. Thank you in advance!
Accepted Solutions
03-08-2023 05:40 PM
@Yuliya Valava : Here are several possible threads to think about and investigate.
- It's possible that the 7 GB of heap memory on the driver is being used to store metadata related to the data being processed.
- Iterating through the Python list to create DLT tables could be causing this memory issue if the table definitions are being held in driver memory. Can you make sure the heavy lifting is done through Spark transformations rather than driver-side Python? That distributes the processing across the worker nodes, which helps reduce memory usage on any individual node.
- To reduce memory usage, you could also try creating a single DLT table for all the data rather than creating a new one on each iteration of the loop.
03-11-2023 07:41 PM
Hi @Yuliya Valava,
Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!

