10-13-2023 01:24 AM - edited 10-13-2023 01:31 AM
    
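    // Repro: stream 100_000 rows of 12 x 1 KB string columns with fetchSize = 1.
    // Assumes the enclosing test class provides a javax.sql.DataSource `dataSource`,
    // a logger `LOG`, and imports for java.sql.* and java.util.Collections.
    public void bigDataTest() throws Exception {
        int rowsCount = 100_000;
        int colSize = 1024;   // characters per string column
        int colCount = 12;    // string columns per row

        String colValue = "'" + "x".repeat(colSize) + "'";
        String query = "select explode(sequence(1, " + rowsCount + ")), " +
                String.join(", ", Collections.nCopies(colCount, colValue));
        // Statement and ResultSet are closed together with the connection.
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(query,
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(1); // ask the driver for one row per fetch
            try (ResultSet rs = ps.executeQuery()) {
                int count = 0;
                // Rows are only counted; nothing is kept on the client side.
                while (rs.next()) {
                    if (count++ % 100 == 0) {
                        LOG.info("Count = {}", count);
                    }
                }
            }
        }
    }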
With -Xmx200m I can read about 50_000 rows; after that I get:
Exception in thread "pool-12-thread-50" Exception in thread "pool-12-thread-1" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
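For scale, a back-of-envelope estimate of the string payload the test query produces. It assumes one byte per `x` character, treats the quotes as SQL syntax rather than data, and ignores the `sequence` column and any driver or transport overhead. By this estimate the first 50_000 rows amount to roughly 586 MiB of string data against a 200 MB heap; the figure describes only the logical size of the data and says nothing about how the driver buffers it. `PayloadEstimate` is a hypothetical name.

```java
// Back-of-envelope size of the string payload produced by the test query.
// Assumptions: 1 byte per 'x' character; the quotes are SQL syntax, not data;
// the sequence column and all driver/transport overhead are ignored.
final class PayloadEstimate {
    static long bytesPerRow(int colSize, int colCount) {
        return (long) colSize * colCount;
    }

    static long totalBytes(long rows, int colSize, int colCount) {
        return rows * bytesPerRow(colSize, colCount);
    }

    public static void main(String[] args) {
        final double mib = 1024.0 * 1024.0;
        System.out.printf("per row:      %d bytes%n", bytesPerRow(1024, 12));                // 12288 bytes
        System.out.printf("50_000 rows:  %.0f MiB%n", totalBytes(50_000, 1024, 12) / mib);  // ~586 MiB
        System.out.printf("100_000 rows: %.0f MiB%n", totalBytes(100_000, 1024, 12) / mib); // ~1172 MiB
    }
}
```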
The memory graph shows the classic OOM picture: [screenshot not included]
What I can see in the heap dump: [screenshot not included]
10-13-2023 01:27 AM - edited 10-13-2023 01:30 AM
DATABRICKS_JDBC_URL = "jdbc:databricks://xxx.cloud.databricks.com:443/default;" +
"transportMode=http;" +
"ssl=1;" +
"httpPath=sql/protocolv1/o/xxxxx;AuthMech=3;MaxConsecutiveResultFileDownloadRetries=50;fetchsize=1"
Without the custom MaxConsecutiveResultFileDownloadRetries setting, I received JDBC error 500638 and could read only about 20_000 rows.
databricksDriver = "com.databricks:databricks-jdbc:2.6.33"
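The URL above is a `jdbc:databricks://host:port/schema` base followed by semicolon-separated `key=value` properties. When comparing runs with and without `MaxConsecutiveResultFileDownloadRetries`, it can help to keep the properties in a map and assemble the string from it. A minimal sketch: `JdbcUrlBuilder` is a hypothetical helper, not part of the driver, and the host and `httpPath` values are the post's placeholders. With these inputs it prints exactly the URL from the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds "jdbc:databricks://<host>:<port>/<schema>;k1=v1;k2=v2".
// Property names and values are copied from the post.
final class JdbcUrlBuilder {
    static String build(String host, int port, String schema, Map<String, String> props) {
        String base = "jdbc:databricks://" + host + ":" + port + "/" + schema;
        String params = props.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(";"));
        return params.isEmpty() ? base : base + ";" + params;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("transportMode", "http");
        props.put("ssl", "1");
        props.put("httpPath", "sql/protocolv1/o/xxxxx");
        props.put("AuthMech", "3");
        props.put("MaxConsecutiveResultFileDownloadRetries", "50"); // drop to test the default
        props.put("fetchsize", "1");
        System.out.println(build("xxx.cloud.databricks.com", 443, "default", props));
    }
}
```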
10-13-2023 02:28 AM
I'd first ingest the raw data into a data lake (using a dedicated ingestion tool; IMO Databricks is not the best fit for this), then process the data with Databricks.
10-13-2023 03:13 AM
Perhaps for some use cases that will be the solution.
But it doesn't change the fact that there is a memory leak in the driver.
10-13-2023 05:10 AM
Not necessarily a memory leak. Possibly the raw data is fetched and the query is processed in memory. I don't know if that is the case, though.
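One client-side way to tell these hypotheses apart is to watch whether used heap keeps climbing with the number of rows consumed. A minimal sketch that assumes nothing about the driver internals; `HeapProbe`, `RowCursor` and `drain` are hypothetical names. `Runtime` figures include garbage that has not been collected yet, so look at the trend over many samples rather than single values; a GC log or a second heap dump remains stronger evidence.

```java
import java.util.ArrayList;
import java.util.List;

// Diagnostic sketch, not a driver API: drains a cursor and samples used heap
// every `every` rows, to see whether memory grows with the number of rows read.
final class HeapProbe {
    // Same shape as ResultSet::next, which throws SQLException.
    @FunctionalInterface
    interface RowCursor {
        boolean next() throws Exception;
    }

    // One sample: rows consumed so far and heap in use at that moment.
    static final class Sample {
        final long rows;
        final long usedBytes;

        Sample(long rows, long usedBytes) {
            this.rows = rows;
            this.usedBytes = usedBytes;
        }
    }

    // Heap currently in use, including garbage not yet collected.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static List<Sample> drain(RowCursor cursor, long every) throws Exception {
        List<Sample> samples = new ArrayList<>();
        long rows = 0;
        while (cursor.next()) {
            rows++;
            if (rows % every == 0) {
                samples.add(new Sample(rows, usedHeapBytes()));
            }
        }
        return samples;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in cursor yielding 5_000 rows; with JDBC you would pass rs::next.
        long[] remaining = {5_000};
        for (Sample s : drain(() -> remaining[0]-- > 0, 1_000)) {
            System.out.printf("rows=%d used=%d MiB%n", s.rows, s.usedBytes / (1024 * 1024));
        }
    }
}
```

In the repro test, `HeapProbe.drain(rs::next, 1_000)` could stand in for the counting loop, since `ResultSet.next()` fits the `RowCursor` shape.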
10-16-2023 12:28 AM
OK, let's call it a temporary minor memory-starvation issue that crashes the virtual machine.
10-16-2023 12:31 AM
And here's another extremely minor issue, which leads to uncontrolled growth in the number of threads: https://community.databricks.com/t5/data-engineering/thread-leakage-when-connection-cannot-be-establ...
For some reason, nobody has responded to that one either...
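The OOM trace earlier names threads like `pool-12-thread-50`, which is the naming scheme of the default thread factory in `java.util.concurrent.Executors`. Assuming the driver's worker threads keep that default naming (not confirmed here), a minimal sketch for watching thread growth from the client side is to group live threads by pool and print the counts periodically, e.g. every N rows or after each connection attempt. `PoolThreadCensus` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Diagnostic sketch: groups live threads that follow the default
// Executors naming scheme "pool-N-thread-M" by their pool prefix "pool-N".
final class PoolThreadCensus {
    private static final Pattern DEFAULT_POOL_NAME =
            Pattern.compile("(pool-\\d+)-thread-\\d+");

    static Map<String, Integer> livePoolThreads() {
        Map<String, Integer> counts = new TreeMap<>();
        // getAllStackTraces() covers every live thread in the JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            Matcher m = DEFAULT_POOL_NAME.matcher(t.getName());
            if (m.matches()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A pool whose count keeps rising between calls is a leak candidate.
        livePoolThreads().forEach((pool, n) -> System.out.println(pool + ": " + n));
    }
}
```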
01-14-2025 12:08 PM
Hi, I'm from Databricks engineering. We had the driver developer look into this, and they could not reproduce it. A couple of things to note:
1. 2.6.33 is a fairly old driver that does not have Cloud Fetch support.
2. Later versions have Cloud Fetch enabled by default, and the shape of the client/server interaction has changed significantly. Can you try a newer version?
3. If you don't mind, could you share your client-side fix? I can pass it on to the driver developers so they can check whether it is still relevant to the improvements in later versions.
Thanks for your patience and support!
eng-partner-eco-help@databricks.com
10-16-2023 02:19 AM
That is, I think, because the JDBC driver is not part of the Databricks platform itself (and it is closed source, AFAIK).
Chances are slim that anyone in the community knows the ins and outs of the driver code.
If you are convinced there is an actual bug in the Databricks driver, I suggest opening a ticket with Databricks so someone can look into it.
You may well have stumbled upon something here.
10-17-2023 06:05 AM
I solved this issue, but it required changing several classes. The final result: [screenshot not included]
10-17-2023 10:36 PM
Nice!
You might want to share your improvements with the driver devs.
10-17-2023 11:35 PM
Yes, I really want to, but I have absolutely no idea how to send these edits to them.
They do not have a public repository or public ticket system.
10-18-2023 12:12 AM
@Retired_mod any idea?