Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826990884
by Databricks Employee
  • 19798 Views
  • 1 reply
  • 1 kudos

Resolved! Views vs Materialized Delta Tables

Is there general guidance around using views vs creating Delta tables? For example, I need to do some filtering and make small tweaks to a few columns for use in another application. Is there a downside of using a view here?

Latest Reply
User16826990884
Databricks Employee
  • 1 kudos

Views won't duplicate the data, so if you are just filtering columns or rows or making small tweaks, then views might be a good option. Unless, of course, the filtering is really expensive or you are doing a lot of calculations; in that case, materialize the view.
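To make the tradeoff concrete, here is a minimal sketch using Python's built-in sqlite3 (not Databricks-specific; the table and column names are made up): a view re-runs its defining query on every read, while a materialized table stores the filtered copy once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, country TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(1, "US", 10.0), (2, "DE", 20.0), (3, "US", 30.0)])

# Option 1: a view -- no data is duplicated; the filter and the small
# column tweak run again on every read.
conn.execute("CREATE VIEW us_events AS "
             "SELECT id, amount * 2 AS amount FROM events WHERE country = 'US'")

# Option 2: a materialized table -- the filtered result is stored once,
# so reads are cheap but the copy must be refreshed when events changes.
conn.execute("CREATE TABLE us_events_mat AS "
             "SELECT id, amount * 2 AS amount FROM events WHERE country = 'US'")

print(conn.execute("SELECT COUNT(*) FROM us_events").fetchone()[0])      # 2
print(conn.execute("SELECT COUNT(*) FROM us_events_mat").fetchone()[0])  # 2
```

The same reasoning carries over to Delta: if the defining query is expensive, pay its cost once by materializing; otherwise a view avoids the duplicate storage and the refresh problem.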

Srikanth_Gupta_
by Databricks Employee
  • 2561 Views
  • 1 reply
  • 1 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 1 kudos

All three options are secure ways to store secrets. Databricks secrets have the additional functionality of redaction, so they are convenient sometimes. Also, in Azure, you have the ability to use Azure Key Vault as the backend for secrets.
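As an illustration of what redaction buys you, here is a hypothetical pure-Python sketch (not the actual Databricks implementation): any value known to the secret store is masked before output reaches the user.

```python
# Hypothetical sketch of secret redaction: any registered secret value
# is replaced with "[REDACTED]" before output is shown to the user.
class SecretStore:
    def __init__(self):
        self._secrets = {}

    def put(self, key, value):
        self._secrets[key] = value

    def get(self, key):
        return self._secrets[key]

    def redact(self, text):
        # Mask every known secret value appearing in the output text.
        for value in self._secrets.values():
            text = text.replace(value, "[REDACTED]")
        return text

store = SecretStore()
store.put("db-password", "s3cr3t!")
print(store.redact("connecting with password s3cr3t!"))
# connecting with password [REDACTED]
```

This is why accidentally printing a secret in a notebook cell does not leak it to the rendered output.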

brickster_2018
by Databricks Employee
  • 4579 Views
  • 1 reply
  • 1 kudos

Resolved! Classpath issues when running spark-submit

How do I identify the jar used to load a particular class? I am sure I packed the classes correctly in my application jar. However, it looks like the class is loaded from a different jar. I want to understand the details so that I can ensure to use the r...

Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

Adding the below configurations at the cluster level can help print more logs to identify the jars from which the class is loaded:
spark.executor.extraJavaOptions=-verbose:class
spark.driver.extraJavaOptions=-verbose:class
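With -verbose:class enabled, the JVM prints a line for each class it loads, including the jar it came from. A small sketch for grepping those lines (the log lines below are made-up examples; the exact format varies by JVM version):

```python
import re

# Example -verbose:class output lines (format varies by JVM version).
log_lines = [
    "[Loaded com.example.MyJob from file:/databricks/jars/app.jar]",
    "[Loaded com.example.MyJob from file:/databricks/jars/other-lib.jar]",
]

def jars_loading_class(lines, class_name):
    """Return the jars from which the given class was loaded."""
    pattern = re.compile(r"\[Loaded (\S+) from (\S+)\]")
    return [m.group(2) for line in lines
            if (m := pattern.search(line)) and m.group(1) == class_name]

print(jars_loading_class(log_lines, "com.example.MyJob"))
```

If the same class shows up from two different jars, the one listed first on the classpath wins, which is usually the source of "wrong jar" surprises.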

brickster_2018
by Databricks Employee
  • 2125 Views
  • 1 reply
  • 0 kudos

Resolved! Cannot upload libraries on UI

When trying to upload libraries on the UI, it fails.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

One corner-case scenario where we can hit this issue is if there is a /<shard name>/0/Filestore/jars file in the root bucket of the workspace. Once you remove the file, the upload should work fine.

User16783853906
by Databricks Employee
  • 4591 Views
  • 3 replies
  • 0 kudos

Resolved! Frequent spot loss of driver nodes resulting in failed jobs when using spot fleet pools

When using spot fleet pools to schedule jobs, driver and worker nodes are provisioned from the spot pools and we are noticing jobs failing with the below exception when there is a driver spot loss. Share best practices around using fleet pools with 1...

Latest Reply
User16783853906
Databricks Employee
  • 0 kudos

In this scenario, the driver node is reclaimed by AWS. Databricks started a preview of the hybrid pools feature, which allows you to provision the driver node from a different pool. We recommend using an on-demand pool for the driver node to improve reliability i...
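A sketch of what that could look like in a Clusters API payload (all values are placeholders, and the separate driver pool field assumes the hybrid pools preview mentioned above): workers keep drawing from the spot fleet pool, while the driver comes from an on-demand pool.

```json
{
  "cluster_name": "jobs-cluster",
  "spark_version": "...",
  "instance_pool_id": "pool-spot-workers",
  "driver_instance_pool_id": "pool-ondemand-driver",
  "num_workers": 8
}
```

Losing a worker to a spot reclaim is recoverable; losing the driver fails the job, which is why the driver is the node worth paying on-demand prices for.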

brickster_2018
by Databricks Employee
  • 2808 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks vs. YARN - Resource Utilization

I have a spark-submit application that worked fine with 8 GB of executor memory in YARN. I am testing the same job on a Databricks cluster with the same executor memory. However, the jobs run slower in Databricks.

Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

This is not an apples-to-apples comparison. When you set 8 GB as the executor memory in YARN, the container launched to run the executor JVM gets 8 GB of memory, and the Xmx value of the heap is calculated accordingly. In Databricks, when...
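The difference can be sketched numerically (illustrative numbers only; the actual fractions YARN and Databricks use depend on configuration and instance type):

```python
def yarn_executor_heap(executor_memory_gb, overhead_fraction=0.10):
    """In YARN, spark.executor.memory becomes the JVM heap (-Xmx);
    memory overhead is requested on top of it for the container."""
    container_gb = executor_memory_gb * (1 + overhead_fraction)
    return executor_memory_gb, container_gb

def databricks_executor_heap(node_memory_gb, usable_fraction=0.75):
    """On Databricks, the executor heap is derived from the node's total
    memory, with a share reserved for the OS and platform services.
    (The 0.75 fraction here is illustrative, not the real value.)"""
    return node_memory_gb * usable_fraction

heap, container = yarn_executor_heap(8)
print(heap, container)              # heap stays 8 GB; the container asks for a bit more
print(databricks_executor_heap(8))  # only part of an 8 GB node becomes heap
```

So "8 GB executor memory" buys a smaller heap on Databricks than on YARN; to match the YARN run, pick a node size whose derived heap is comparable.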

Srikanth_Gupta_
by Databricks Employee
  • 3076 Views
  • 2 replies
  • 0 kudos
Latest Reply
User16783853906
Databricks Employee
  • 0 kudos

When a cluster is attached to a pool, cluster nodes are created using the pool's idle instances, which helps to reduce cluster start and auto-scaling times. If you are using pools and looking to reduce start time for all scenarios, then you should ...

Anonymous
by Not applicable
  • 2128 Views
  • 2 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

If you are looking for incrementally loading data from Azure SQL, check out one of our technology partners that support change data capture, or set up Debezium for SQL Server. These solutions could land data in a streaming fashion to Kafka/Kinesis/Even...

brickster_2018
by Databricks Employee
  • 2711 Views
  • 1 reply
  • 0 kudos

Resolved! Can I use the OSS Spark History Server to view the event logs?

Is it possible to run the OSS Spark history server and view the Spark event logs?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Yes, it's possible. The OSS Spark history server can read the Spark event logs generated on a Databricks cluster. Using cluster log delivery, the Spark logs can be written to any arbitrary location. Event logs can be copied from there to the storage ...
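To sketch the moving parts (the paths below are placeholders, not verbatim Databricks paths): configure cluster log delivery so event logs land in a known location, copy them somewhere the OSS history server can read, and point spark.history.fs.logDirectory at that directory.

```
# 1. Cluster log delivery (set in the cluster configuration):
#      destination: dbfs:/cluster-logs
#    Event logs are then delivered under the cluster's log directory,
#    e.g. dbfs:/cluster-logs/<cluster-id>/eventlog/...

# 2. Copy the event logs to a directory readable by the OSS history
#    server, then set in its spark-defaults.conf:
spark.history.fs.logDirectory  file:/tmp/spark-events

# 3. Start the OSS history server:
#      ./sbin/start-history-server.sh
```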

User16826990884
by Databricks Employee
  • 1179 Views
  • 0 replies
  • 0 kudos

Encrypt root S3 bucket

This is a 2-part question: How do I go about encrypting an existing root S3 bucket? Will this impact my Databricks environment? (Resources not being accessible, performance issues, etc.)

brickster_2018
by Databricks Employee
  • 3873 Views
  • 1 reply
  • 0 kudos

Resolved! Jobs running forever in Spark UI

On the Spark UI, jobs are running forever, but my notebook has already completed the operations. Why are the resources being wasted?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

This happens if the Spark driver is missing events. The jobs/tasks are not actually running; the Spark UI is reporting incorrect stats. This can be treated as a harmless UI issue. If you continue to see the issue consistently, then it might be good to review w...

brickster_2018
by Databricks Employee
  • 1860 Views
  • 1 reply
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

While using the MERGE INTO statement, if the source data that will be merged into the target Delta table is small enough to fit into the memory of the worker nodes, then it makes sense to broadcast the source data. By doing so, the execution can avoid the...
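The intuition can be sketched in plain Python (a toy model of a broadcast hash join, not Delta's actual MERGE implementation): when the source fits in memory, every worker holds a full copy keyed by the merge key, so each target partition merges locally and the large target is never shuffled.

```python
# Toy model: target is large and partitioned across workers; source is small.
target_partitions = [
    [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}],   # partition on worker 1
    [{"id": 3, "val": "c"}],                          # partition on worker 2
]
source = [{"id": 2, "val": "B"}, {"id": 4, "val": "D"}]

# "Broadcast": build a hash map of the small source once and give every
# worker a full copy, so each partition can merge with no shuffle.
broadcast_map = {row["id"]: row for row in source}

def merge_partition(partition):
    matched_ids = set()
    out = []
    for row in partition:
        if row["id"] in broadcast_map:        # WHEN MATCHED THEN UPDATE
            out.append(broadcast_map[row["id"]])
            matched_ids.add(row["id"])
        else:
            out.append(row)
    return out, matched_ids

merged, matched = zip(*(merge_partition(p) for p in target_partitions))
all_matched = set().union(*matched)
# WHEN NOT MATCHED THEN INSERT: source rows that hit no target row
inserts = [r for r in source if r["id"] not in all_matched]
result = [row for part in merged for row in part] + inserts
print(result)
```

If the source is too big to broadcast, the engine must instead shuffle both sides by the merge key, which is the expensive path the reply is suggesting you avoid.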

brickster_2018
by Databricks Employee
  • 4693 Views
  • 1 reply
  • 0 kudos

Resolved! Can Spark JDBC create duplicate records?

Is it transaction-safe? Does it ensure atomicity?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Atomicity is ensured at the task level, not at the stage level. If the stage is retried for any reason, the tasks that already completed the write operation will re-run and cause duplicate records. This is expected by design. When Apache Spa...
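A toy simulation of the failure mode (plain Python with a hypothetical sink, not Spark's JDBC writer): tasks append to a non-transactional sink, and a stage retry after a partial failure re-runs tasks that had already committed their writes.

```python
# Non-transactional sink, like a plain JDBC table with no dedup keys.
sink = []

def run_task(task_id):
    # Each task commits its rows independently -- atomic per task only.
    sink.append(f"rows-from-task-{task_id}")

def run_stage(tasks, fail_after=None):
    """Run tasks in order; raise to simulate a stage failure mid-way."""
    for i, t in enumerate(tasks):
        run_task(t)
        if fail_after is not None and i == fail_after:
            raise RuntimeError("executor lost")

tasks = [0, 1, 2]
try:
    run_stage(tasks, fail_after=1)   # tasks 0 and 1 wrote, then the stage failed
except RuntimeError:
    run_stage(tasks)                 # the stage retry re-runs ALL tasks

print(sink)  # rows from tasks 0 and 1 appear twice -> duplicate records
```

Guarding against this on the database side usually means an idempotent target: a primary key or a staging-table-plus-upsert pattern rather than blind appends.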

