- 12850 Views
- 1 replies
- 0 kudos
How can I change the log level of the Spark driver and executor processes?
Latest Reply
Change the log level of the driver:
%scala
spark.sparkContext.setLogLevel("DEBUG")
spark.sparkContext.setLogLevel("INFO")
Change the log level of a particular package in the driver logs:
%scala
org.apache.log4j.Logger.getLogger("shaded.databricks.v201809...
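As a general pattern, here is a hedged sketch (the package name below is a placeholder, not the one truncated above) using the log4j 1.x API shipped with Databricks Runtime:
%scala
import org.apache.log4j.{Level, Logger}

// Hypothetical package name for illustration; substitute the logger you care about.
Logger.getLogger("com.example.connector").setLevel(Level.DEBUG)

// Note: this runs on the driver. Executor log levels are typically
// configured separately (for example, via a cluster init script).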
- 1013 Views
- 1 replies
- 0 kudos
The cluster is idle and there are no Spark jobs running in the Spark UI, yet the cluster stays active and does not get terminated.
Latest Reply
A Databricks cluster is treated as active if there are any Spark or non-Spark operations running on it. Even though there are no Spark jobs running on the cluster, it's possible to have some driver-specific application code running, marking th...
- 1369 Views
- 1 replies
- 0 kudos
I have a JAR job that was migrated from EMR to Databricks. The job runs as expected and completes all of the operations in the application, yet the run is marked as failed in the Databricks Jobs UI.
Latest Reply
Calling spark.stop(), sc.stop(), or System.exit() in your application can cause this behavior. Databricks manages the context shutdown on its own; forcefully closing it leads to this abrupt failure status.
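A minimal sketch of the fix, with a hypothetical JAR entry point: do the work and simply return, leaving context shutdown to the platform.
%scala
import org.apache.spark.sql.SparkSession

object MyJob { // hypothetical entry point for illustration
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    spark.range(100).write.format("delta").mode("overwrite").save("/tmp/demo")
    // Anti-pattern on Databricks, since the platform owns context shutdown:
    // spark.stop()
    // System.exit(0)
  } // Returning normally lets the run be marked as succeeded.
}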
- 553 Views
- 1 replies
- 2 kudos
A few things you should not do in Databricks!
Latest Reply
Compared to OSS Spark, these are a few things users don't have to worry about when running the same job on Databricks. Memory management: Databricks uses an internal formula to allocate the driver and executor heap based on the size of the instance...
- 688 Views
- 2 replies
- 0 kudos
What is the best way to convert a very large Parquet table to Delta, possibly without downtime?
Latest Reply
I vouch for Sajith's answer. The main advantage of CONVERT TO DELTA is that the operation is metadata-centric, which means we are not reading the full data for the conversion. For any other file format conversion, it's necessary to read the data com...
1 More Replies
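To make that concrete, a minimal sketch (path and partition column are hypothetical) of the metadata-centric conversion:
%scala
// CONVERT TO DELTA builds the transaction log from the existing Parquet
// file footers; it does not rewrite the data files themselves.
spark.sql("CONVERT TO DELTA parquet.`/mnt/data/events`")

// For a partitioned table, the partition schema must be declared:
spark.sql("CONVERT TO DELTA parquet.`/mnt/data/events` PARTITIONED BY (event_date DATE)")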
- 675 Views
- 2 replies
- 0 kudos
I have a streaming workload using the S3-SQS connector. The streaming job is running fine within the SLA. Should I migrate my job to use Auto Loader? If yes, what are the benefits? Who should migrate and who should not?
Latest Reply
That makes sense, @Anand Ladda! One major improvement with a direct impact on performance is the architectural difference: S3-SQS uses an internal implementation of a Delta table to store the checkpoint details about the source files...
1 More Replies
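For anyone comparing the two, a minimal Auto Loader sketch (paths and schema are hypothetical, a JSON source is assumed):
%scala
val stream = spark.readStream
  .format("cloudFiles")                          // Auto Loader source
  .option("cloudFiles.format", "json")           // format of the incoming files
  .schema("id LONG, ts TIMESTAMP, body STRING")  // hypothetical schema
  .load("s3://my-bucket/incoming/")              // hypothetical input path

stream.writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/ingest") // hypothetical path
  .start("/mnt/delta/ingest")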
- 1126 Views
- 3 replies
- 0 kudos
What is the best practice for a Delta pipeline with very high throughput to avoid the small-files problem and reduce the need for frequent external OPTIMIZE runs?
Latest Reply
The general practice in use is to enable only optimized writes and disable auto-compaction. This is because optimized writes introduce an extra shuffle step, which increases the latency of the write operation. In addition to that, the auto-...
2 More Replies
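A hedged sketch of applying that recommendation, assuming a Delta table named events (hypothetical); the two table properties are standard on Databricks:
%scala
// Optimized writes add a pre-write shuffle that produces fewer, larger files;
// auto-compaction stays off to keep write latency down.
spark.sql("""
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'false'
  )
""")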
by aladda • Honored Contributor II
- 647 Views
- 0 replies
- 0 kudos
It is best to avoid collecting stats on long strings. You typically want to collect stats on columns that are used in filters, WHERE clauses, and joins, and on which you tend to perform aggregations: typically numerical values. You can avoid collecting s...
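One way to act on this, assuming a hypothetical table events: Delta collects stats on the first 32 columns by default, and the delta.dataSkippingNumIndexedCols table property controls that cutoff, so long strings can be kept out of the indexed range.
%scala
// Collect stats only on the first 5 columns; keep filter/join columns there.
spark.sql("""
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.dataSkippingNumIndexedCols' = '5'
  )
""")

// Move a long string column past the indexed range (column names hypothetical).
spark.sql("ALTER TABLE events CHANGE COLUMN payload payload STRING AFTER ts")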
- 730 Views
- 2 replies
- 0 kudos
What is the best way to deal with concurrent-write exceptions in Delta when you have multiple writers on the same Delta table?
Latest Reply
While you can try-catch-retry, retrying is expensive because the underlying table snapshot will have changed. So the best approach is to avoid conflicts by using partitioning and disjoint command conditions as much as possible.
1 More Replies
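A hedged sketch of both halves of that advice, with hypothetical table and column names; Delta surfaces write conflicts as subclasses of ConcurrentModificationException:
%scala
import io.delta.exceptions.ConcurrentModificationException

// Preferred: scope each writer to its own partition so the commands are
// disjoint and snapshots from other writers cannot conflict.
spark.sql("""
  MERGE INTO target t
  USING updates u
  ON t.event_date = '2023-01-01' AND t.id = u.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")

// Last resort: retry, accepting the cost of re-reading the changed snapshot.
def withRetry(attempts: Int)(body: => Unit): Unit =
  try body catch {
    case _: ConcurrentModificationException if attempts > 1 =>
      withRetry(attempts - 1)(body)
  }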