Data Engineering

Forum Posts

User16783853906
by Contributor III
  • 3647 Views
  • 2 replies
  • 0 kudos

Trigger.once mode recommendation

When is it recommended to use Trigger.once mode compared to fixed processing intervals with micro batches?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

Also note that configurations such as maxFilesPerTrigger and maxBytesPerTrigger are ignored with Trigger.Once. Streaming queries with low throughput can switch to Trigger.Once to avoid the continuous execution of the job checking the availab...
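As a rough illustration of the two modes discussed in this thread, the PySpark-style sketch below builds the keyword arguments for DataStreamWriter.trigger and passes them to a streaming write. The helper names (trigger_kwargs, start_stream), the Delta sink, and the path parameters are assumptions made for illustration and do not come from the original posts.

```python
from typing import Optional


def trigger_kwargs(run_once: bool, interval: Optional[str] = None) -> dict:
    """Build keyword arguments for DataStreamWriter.trigger().

    Hypothetical helper: run_once=True selects Trigger.Once-style execution;
    otherwise a fixed processing interval such as "1 minute" is required.
    """
    if run_once:
        return {"once": True}
    if not interval:
        raise ValueError("interval is required for fixed-interval micro-batches")
    return {"processingTime": interval}


def start_stream(stream_df, target_path: str, checkpoint_path: str,
                 run_once: bool, interval: Optional[str] = None):
    """Start a streaming write; stream_df is assumed to be a streaming DataFrame."""
    return (
        stream_df.writeStream
        .format("delta")  # assumed sink format, not stated in the thread
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_kwargs(run_once, interval))
        .start(target_path)
    )
```

With run_once=True the query processes the data available when it starts and then stops, so it is usually launched from a scheduled job rather than left running.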

1 More Replies
User16783853906
by Contributor III
  • 1533 Views
  • 2 replies
  • 0 kudos

VACUUM during read/write

Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time?  Will it impact the job result/performance?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended to or updated in the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...
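For readers who want to try this, here is a minimal sketch that assembles a Delta Lake VACUUM statement, optionally as a dry run that only lists the files it would delete. The helper name vacuum_sql and the table name used below are hypothetical; the statement would be passed to spark.sql on a cluster with Delta Lake available.

```python
from typing import Optional


def vacuum_sql(table: str, retain_hours: Optional[int] = None,
               dry_run: bool = False) -> str:
    """Build a Delta Lake VACUUM statement (hypothetical helper).

    retain_hours=None leaves the table's configured retention in place.
    dry_run=True asks Delta to report files instead of deleting them.
    """
    stmt = f"VACUUM {table}"
    if retain_hours is not None:
        if retain_hours < 0:
            raise ValueError("retain_hours must be non-negative")
        stmt += f" RETAIN {retain_hours} HOURS"
    if dry_run:
        stmt += " DRY RUN"
    return stmt


# Example usage (assumes an active SparkSession named `spark`):
# spark.sql(vacuum_sql("db.events", retain_hours=168, dry_run=True)).show()
```

Running the dry-run form first is a cautious way to see what would be removed before issuing the real command while writers are active.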

1 More Replies
User16783853906
by Contributor III
  • 1886 Views
  • 2 replies
  • 0 kudos

How does running VACUUM on Delta Lake tables affect read/write performance?

If I don't run VACUUM on a Delta Lake table, will that make my read performance slower?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

VACUUM has no effect on read/write performance to that table. Never running VACUUM on a table will not make read/write performance to a Delta Lake table any slower. If you run VACUUM very infrequently, your VACUUM runtimes themselves may be pretty hig...

1 More Replies
User16783855534
by New Contributor III
  • 684 Views
  • 1 replies
  • 1 kudos

Can I have a Databricks Cluster that is only 1 node?

Yes, you can create a "Single Node" cluster (see https://docs.databricks.com/clusters/single-node.html). It is currently not recommended to use a "Single Node" cluster for streaming workloads.

Latest Reply
User16869510359
Esteemed Contributor
  • 1 kudos

Single Node clusters should not be used for production workloads involving streaming queries or complex computations. The intention here is to bring up the Spark cluster for all kinds of workloads.

User16826987838
by Contributor
  • 1492 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

You can't change the owner. You can, however, clone the cluster, or grant "Can Manage" to another user, although the cluster creator stays fixed.

User16826987838
by Contributor
  • 649 Views
  • 1 replies
  • 1 kudos
Latest Reply
User16783855534
New Contributor III
  • 1 kudos

https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html#create-user

User16869510359
by Esteemed Contributor
  • 1346 Views
  • 1 replies
  • 0 kudos

Resolved! What is the difference between spark.sessionState.catalog.listTables and spark.catalog.listTables?

I see a significant performance difference when calling spark.sessionState.catalog.list compared to spark.catalog.list. Is that expected?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, so it is faster. In contrast, catalog.listTables pulls the column details as well. If the database has many Delta tabl...

User16869510359
by Esteemed Contributor
  • 3004 Views
  • 1 replies
  • 0 kudos

Resolved! How to list all Delta tables in a Database?

I want to get a list of all the Delta tables in a database. What is the easiest way of getting it?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

The code snippet below can be used to list the tables in a database:

val db = "database_name"
spark.sessionState.catalog.listTables(db).map(table=>spark.sessionState.catalog.externalCatalog.getTable(table.database.get,table.table)).filter(x=>x....

User16826992666
by Valued Contributor
  • 10770 Views
  • 3 replies
  • 0 kudos
Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

This is by design and working as expected. Spark writes data in a distributed fashion. Using coalesce(1) can help generate a single file; however, this solution does not scale to large data sets, as it involves bringing all the data into one single task.
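A minimal sketch of the coalesce(1) approach mentioned in this reply, assuming a PySpark DataFrame and CSV output. The function name write_single_csv, the CSV format, and the header option are illustrative assumptions, not details from the thread.

```python
def write_single_csv(df, output_dir: str) -> None:
    """Write `df` so that the output directory holds a single part file.

    coalesce(1) funnels all rows through one task, so this is only
    reasonable for data small enough for a single executor to handle.
    """
    (
        df.coalesce(1)             # collapse to a single partition
        .write.mode("overwrite")   # replace any existing output
        .option("header", "true")  # assumed: include a header row
        .csv(output_dir)
    )
```

Note that output_dir is still a directory: Spark places one part-... file inside it alongside its marker files, rather than writing a single file at that exact path.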

2 More Replies
Srikanth_Gupta_
by Valued Contributor
  • 474 Views
  • 1 replies
  • 1 kudos
Latest Reply
aladda
Honored Contributor II
  • 1 kudos

Photon is supported for batch workloads today; it is the standard on Databricks SQL clusters and available as an option for Automated and Interactive clusters. Photon is also in public preview today, so it is available as an option for everyone. See this lin...

User16869510359
by Esteemed Contributor
  • 538 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Delta has significant value beyond the DML/ACID capabilities. Delta's data organization strategies that @Ryan Chynoweth mentions also offer an advantage even for read-only use cases for querying and joining the data. Delta also supports in-place con...

1 More Replies
Srikanth_Gupta_
by Valued Contributor
  • 1554 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

This spark-salesforce connector looks like an option to query this data via SOQL/SAQL and bring it into Databricks/Spark.

christys
by Community Manager
  • 406 Views
  • 1 replies
  • 0 kudos
Latest Reply
Taha
New Contributor III
  • 0 kudos

There are actually several options here!

AWS

If you'd like a very quick setup but full-featured environment for your org, use the AWS quickstart: https://aws.amazon.com/quickstart/architecture/databricks/

If you're solo exploring, you can use Databricks c...
