Data Engineering

Forum Posts

by Rahul2025 • New Contributor III

02-02-2023 10:37:28 PM

7723 Views
11 replies
1 kudos

Limitation on size of init script

Hi, we're using Databricks Runtime version 11.3 LTS and executing a Spark Java job on a job cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

Latest Reply

Anonymous
Not applicable

04-10-2023 1:39:37 AM

1 kudos

Hi @Rahul K, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

by SebastianM • New Contributor

12-16-2022 6:48:52 AM

2658 Views
1 replies
0 kudos

JDBC to delta lake: Is setting fetch size expected to be effective?

I am using the Databricks JDBC driver to access a Delta Lake. The database URL specifies transportMode=http. I have experimented with setting different values of fetchSize on the java.sql.PreparedStatement object and have monitored memory use within m...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-20-2022 6:08:35 AM

0 kudos

I think there is a Spark configuration for this, but I can't recall it right now. Please try this doc; you may find something there: https://spark.apache.org/docs/latest/configuration.html

by ramankr48 • Contributor II

10-19-2022 4:01:39 AM

47368 Views
6 replies
11 kudos

Resolved! how to find the size of a table in python or sql?

Suppose there is a database db containing many tables, and I want to get the size of each table. How can I do this in SQL, Python, or PySpark? Even if I have to get the sizes one by one, that's fine.

Latest Reply

shan_chandra
Databricks Employee

10-19-2022 10:54:01 AM

11 kudos

@Raman Gupta - could you please try the below?

%python
spark.sql("describe detail delta-table-name").select("sizeInBytes").collect()
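A slightly fuller sketch of the suggestion above, assuming a Databricks notebook where `spark` is already defined and the table is a Delta table. The helper names `table_size_bytes` and `human_readable` and the table name `db.my_table` are hypothetical; only `DESCRIBE DETAIL` and its `sizeInBytes` column come from the reply.

```python
def table_size_bytes(spark, table_name):
    """Return the size in bytes that DESCRIBE DETAIL reports for a Delta table.

    `spark` is an existing SparkSession; `table_name` is a placeholder such as
    "db.my_table" (hypothetical name, not from the original thread).
    """
    row = spark.sql(f"DESCRIBE DETAIL {table_name}").select("sizeInBytes").first()
    return row["sizeInBytes"]


def human_readable(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

In a notebook this could be called as `human_readable(table_size_bytes(spark, "db.my_table"))`. To cover every table in a database, one option is to loop over the rows of `spark.sql("SHOW TABLES IN db").collect()` and call the helper with each row's `tableName`; tables that are not Delta tables may need to be skipped. As far as I know, `sizeInBytes` describes the current snapshot of the table, so files kept only for older versions are not counted.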

by JananiMohan • New Contributor

01-03-2022 8:41:50 AM

8483 Views
4 replies
0 kudos

Resolved! ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

After the new release of numpy 1.22.0 on Dec 31st, Databricks failed with this error for my existing Databricks Notebook Version 10.1 and numpy 1.20.0. Question: why did the earlier releases after 1.20.0, up until 1.22.0, not raise the same exception?

Latest Reply

Anonymous
Not applicable

05-19-2022 8:21:36 AM

0 kudos

Hi @Janani Mohan, hope you are doing well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

by DamienSicard • New Contributor III

12-16-2021 12:59:14 AM

10748 Views
2 replies
1 kudos

Resolved! Notebooks font size

Hi, is there a way to increase the cells' font size and set it as a default setting? Thanks. Best, Damien

Latest Reply

DamienSicard
New Contributor III

12-16-2021 1:42:16 AM

1 kudos

@Werner Stinckens Alright, thanks for your answer. Best, Damien

by Anonymous • Not applicable

06-08-2021 7:26:50 PM

16608 Views
1 replies
0 kudos

Resolved! Ideal number and size of partitions

Spark by default uses 200 partitions when shuffling data for transformations. 200 partitions might be too many if a user is working with small data, in which case they can slow down the query. Conversely, 200 partitions might be too few if the data is big. So ho...

Latest Reply

sajith_appukutt
Databricks Employee

06-09-2021 3:35:00 AM

0 kudos

You could tweak the default value of 200 by changing the spark.sql.shuffle.partitions configuration to match your data volume. Here is some sample Python code for calculating the value. However, if you have multiple workloads with different data volumes, instead ...
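The sample code is cut off in this excerpt. One common way to size shuffle partitions is to divide the shuffle data volume by a target partition size; the following is a minimal sketch of that arithmetic only. The function name `estimate_shuffle_partitions`, the 128 MiB target, and the rounding up to a multiple of the core count are all assumptions for illustration, not values taken from the original answer.

```python
import math


def estimate_shuffle_partitions(shuffle_bytes,
                                target_partition_bytes=128 * 1024**2,
                                total_cores=1):
    """Estimate a value for spark.sql.shuffle.partitions.

    shuffle_bytes: approximate size of the data being shuffled, in bytes.
    target_partition_bytes: desired size per partition (assumed 128 MiB).
    total_cores: total executor cores; the result is rounded up to a
                 multiple of this so every core gets work in each wave.
    """
    if target_partition_bytes <= 0 or total_cores < 1:
        raise ValueError("target_partition_bytes must be > 0 and total_cores >= 1")
    if shuffle_bytes <= 0:
        # Nothing to shuffle: one partition per core is the floor.
        return total_cores
    # Number of partitions needed to stay at or below the target size.
    raw = math.ceil(shuffle_bytes / target_partition_bytes)
    # Round up to the next multiple of the core count.
    return math.ceil(raw / total_cores) * total_cores
```

The result could then be applied with `spark.conf.set("spark.sql.shuffle.partitions", str(n))`, where `spark` is an existing SparkSession. The right target size depends on the workload, so treat the default here as a starting point to tune rather than a recommendation.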

Databricks Community
