cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Rahul2025
by New Contributor III
  • 2769 Views
  • 11 replies
  • 1 kudos

Limitation on size of init script

Hi,We're using Databricks Runtime version 11.3LTS and executing a Spark Java Job using a Job Cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

  • 2769 Views
  • 11 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Rahul K​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

  • 1 kudos
10 More Replies
SebastianM
by New Contributor
  • 1037 Views
  • 1 replies
  • 0 kudos

JDBC to delta lake: Is setting fetch size expected to be effective?

I am using the databricks jdbc driver to access a delta lake. The database URL specifies transportMode=http. I have experimented with setting different values of fetchSize on the java.sqlPreparedStatement object and have monitored memory use within m...

  • 1037 Views
  • 1 replies
  • 0 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

I think there is one spark configuration but I forgot right now Pelase try to utilized this doc maybe you get something- https://spark.apache.org/docs/latest/configuration.html

  • 0 kudos
ramankr48
by Contributor II
  • 22668 Views
  • 6 replies
  • 8 kudos

Resolved! how to find the size of a table in python or sql?

let's suppose there is a database db, inside that so many tables are there and , i want to get the size of tables . how to get in either sql, python, pyspark.even if i have to get one by one it's fine.

  • 22668 Views
  • 6 replies
  • 8 kudos
Latest Reply
shan_chandra
Esteemed Contributor
  • 8 kudos

@Raman Gupta​ - could you please try the below %python spark.sql("describe detail delta-table-name").select("sizeInBytes").collect()

  • 8 kudos
5 More Replies
JananiMohan
by New Contributor
  • 4923 Views
  • 4 replies
  • 0 kudos

Resolved! ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

After the new release of numpy 1.22.0 on Dec 31st, Databricks failed with this error for my existing Databricks Notebook Version 10.1 and numpy 1.20.0Qn: Why did the earlier releases after 1.20.0 uptil 1.22.0 did not raise the same exception. ?

  • 4923 Views
  • 4 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Janani Mohan​ Hope you are doing well.Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you.Thanks!

  • 0 kudos
3 More Replies
Kaniz
by Community Manager
  • 16689 Views
  • 4 replies
  • 3 kudos
  • 16689 Views
  • 4 replies
  • 3 kudos
Latest Reply
Feras
New Contributor II
  • 3 kudos

To obtain the shape of a data frame in PySpark, you can obtain the number of rows through "DF.count()" and the number of columns through "len(DF.columns)". The code below provides the shape of PySpark data frame "DF".print((DF.count(), len(DF.columns...

  • 3 kudos
3 More Replies
DamienSicard
by New Contributor III
  • 6036 Views
  • 4 replies
  • 2 kudos

Resolved! Notebooks font size

Hi,Is there a way to increase the cells' font size and set it as a default setting ?Thanks.Best Damien

  • 6036 Views
  • 4 replies
  • 2 kudos
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Damien Sicard​ , As @werners has stated, you can zoom your browser.

  • 2 kudos
3 More Replies
Anonymous
by Not applicable
  • 6565 Views
  • 1 replies
  • 0 kudos

Resolved! Ideal number and size of partitions

Spark by default uses 200 partitions when doing transformations. The 200 partitions might be too large if a user is working with small data, hence it can slow down the query. Conversely, the 200 partitions might be too small if the data is big. So ho...

  • 6565 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could tweak the default value 200 by changing spark.sql.shuffle.partitions configuration to match your data volume. Here is a sample python code for calculating the valueHowever if you have multiple workloads with different data volumes, instead ...

  • 0 kudos
Labels