Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

jose_gonzalez
by Moderator
  • 5602 Views
  • 1 replies
  • 0 kudos

Resolved! How to get the size of my Delta table

I would like to know how to get the total size of my Delta table.

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

The following KB article gives a step-by-step example of how to get the size of a Delta table: https://kb.databricks.com/sql/find-size-of-table.html
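For anyone who wants to stay in a notebook, here is a minimal sketch of one common way to read a Delta table's size. It assumes a `spark` session is available and relies on the `sizeInBytes` column returned by `DESCRIBE DETAIL`; the KB article linked above may describe a different method. The table name `my_db.my_table` and both helper names are hypothetical.

```python
def delta_table_size_bytes(spark, table_name):
    """Return the size in bytes that DESCRIBE DETAIL reports for a Delta table.

    Assumption: `spark` is an active SparkSession and `table_name` is a Delta table.
    """
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}").first()
    return detail["sizeInBytes"]


def human_readable(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.2f} {unit}"
        size /= 1024


# In a notebook (hypothetical table name):
# size = delta_table_size_bytes(spark, "my_db.my_table")
# print(human_readable(size))
```

Note that `sizeInBytes` describes the latest snapshot of the table, so files that are no longer referenced but have not yet been vacuumed are not counted.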

  • 0 kudos
jose_gonzalez
by Moderator
  • 15890 Views
  • 1 replies
  • 0 kudos

Resolved! error message rpc response (of 20978566 bytes) exceeds limit of 20971520 bytes

I'm getting the following error message when trying to use display(): Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster. com.databricks.rpc.RPCResponseTooLarge: rpc response (of 20978566 bytes) exceeds limi...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

The error appears to come from the 20 MB output limit. For more information, see https://docs.databricks.com/jobs.html#output-size-limits
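To make the numbers in the error message concrete, here is a small sketch of the arithmetic. The limit of 20971520 bytes and the response size of 20978566 bytes come from the post title; the helper name `bytes_over_limit` is hypothetical.

```python
# 20 MiB expressed in bytes; this equals the 20971520 quoted in the error.
LIMIT_BYTES = 20 * 1024 * 1024


def bytes_over_limit(response_bytes, limit_bytes=LIMIT_BYTES):
    """Return how many bytes a response exceeds the limit by (0 if it fits)."""
    return max(0, response_bytes - limit_bytes)


overflow = bytes_over_limit(20978566)
print(f"Limit: {LIMIT_BYTES} bytes, over by {overflow} bytes")
```

The response in the post was only a few kilobytes over the limit, so reducing what is sent back to the notebook, for example by calling `display()` on `df.limit(n)` or on a narrower selection of columns, is one plausible way to get under it.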

  • 0 kudos
Anonymous
by Not applicable
  • 1631 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

There are two types of autoscaling in Databricks: standard and optimized. In both cases, when tasks are submitted the cluster immediately begins scaling up to execute as many of them in parallel as possible. Scaling down is different. In optimized autoscalin...

  • 0 kudos
Anonymous
by Not applicable
  • 4039 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16019159252
New Contributor III
  • 0 kudos

Yes, you can set up email alerts, which are sent in case of job failure, success, or timeout. You can set alerts up for job start, job success, and job failure (including skipped jobs), providing multiple comma-separated email addresses for each alert type. Y...
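If you configure jobs through the Jobs API rather than the UI, the same alert types can be sketched as an `email_notifications` block in the job settings. This is a minimal illustration, not a complete job definition: the helper names and the example addresses are hypothetical, and the field names (`on_start`, `on_success`, `on_failure`, `no_alert_for_skipped_runs`) should be checked against the Jobs API version you use.

```python
def parse_addresses(raw):
    """Split a comma-separated string of email addresses into a clean list."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_email_notifications(on_start="", on_success="", on_failure="",
                              no_alert_for_skipped_runs=False):
    """Build an email_notifications dict for a job settings payload.

    Each argument is a comma-separated string, mirroring how addresses
    are entered per alert type in the UI.
    """
    return {
        "on_start": parse_addresses(on_start),
        "on_success": parse_addresses(on_success),
        "on_failure": parse_addresses(on_failure),
        "no_alert_for_skipped_runs": no_alert_for_skipped_runs,
    }


# Hypothetical addresses, for illustration only.
notifications = build_email_notifications(
    on_failure="oncall@example.com, data-team@example.com",
    on_success="data-team@example.com",
)
print(notifications)
```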

  • 0 kudos
Anonymous
by Not applicable
  • 688 Views
  • 0 replies
  • 0 kudos

Using multiple clouds

Are there recommendations and/or examples of leveraging AWS and Azure with Databricks? If so, are there any best practices to follow? We want to ensure we avoid expensive data transfer across clouds.

Anonymous
by Not applicable
  • 1233 Views
  • 0 replies
  • 0 kudos

Automatically create folder structure

I imported one workspace into another and noticed there were several instances of RESOURCE_DOES_NOT_EXIST errors because of the folder structure of the workspace (despite importing the workspace as well), see example below: Get: https://dbc-9d482d3a-f...

User16752241457
by New Contributor II
  • 1769 Views
  • 1 replies
  • 0 kudos

Saving display() plots

Is there an easy way to save the plots generated by the display() command?

Latest Reply
User16788317454
New Contributor III
  • 0 kudos

Plots generated via the display() command are automatically saved under /FileStore/plots. See the documentation for more info: https://docs.databricks.com/data/filestore.html#filestore. However, perhaps an easier approach to save/revisit plots is to u...
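As a rough sketch of how a saved plot could be retrieved, files under /FileStore are generally reachable in a browser under the workspace's `/files/` path. The helper below only builds that URL; the workspace host and the plot filename are hypothetical placeholders, and access may require being logged in to the workspace.

```python
def filestore_url(workspace_host, dbfs_path):
    """Map a path under /FileStore to its browser URL under /files/.

    Assumption: `workspace_host` is the workspace hostname without a scheme,
    and `dbfs_path` points somewhere under /FileStore.
    """
    prefix = "/FileStore/"
    path = dbfs_path
    # Accept both "dbfs:/FileStore/..." and "/FileStore/..." forms.
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith(prefix):
        raise ValueError("path must be under /FileStore/")
    return f"https://{workspace_host}/files/{path[len(prefix):]}"


# Hypothetical host and filename, for illustration only.
print(filestore_url("example.cloud.databricks.com",
                    "dbfs:/FileStore/plots/my_plot.png"))
```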

  • 0 kudos
User16788317454
by New Contributor III
  • 1110 Views
  • 1 replies
  • 0 kudos
Latest Reply
j_weaver
New Contributor III
  • 0 kudos

If you are talking about distributed training of a single XGBoost model, there is no built-in capability in SparkML. SparkML supports gradient-boosted trees, but not XGBoost specifically. However, there are third-party packages, such as XGBoost4J, that ...

  • 0 kudos
j_weaver
by New Contributor III
  • 1310 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16788317454
New Contributor III
  • 0 kudos

With Spark, there are a few ways you can scale your model: training, hyperparameter tuning, and inference. If you're looking to train one model across multiple workers, you can leverage Horovod. It's an open source project designed to simplify distributed neur...

  • 0 kudos
jose_gonzalez
by Moderator
  • 1083 Views
  • 2 replies
  • 0 kudos

Cluster goes unresponsive after installing a library

Right after I install a library on my cluster, the cluster becomes unresponsive and nothing runs. How can I solve this issue?

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

It is a standard cluster, and it is happening for all libraries. Is there a way to debug this or show the error messages, if any?

  • 0 kudos
1 More Replies
j_weaver
by New Contributor III
  • 1116 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16752246141
New Contributor III
  • 0 kudos

Pandas works for single-machine computations, so any pandas code you write on Databricks will run on the driver of the cluster. PySpark and Koalas are both distributed frameworks for when you have large datasets. You can use PySpark and Koalas inte...

  • 0 kudos
