Data Engineering

Forum Posts

User16826992666
by Valued Contributor
  • 1514 Views
  • 1 replies
  • 0 kudos

What happens if a spot instance worker is lost in the middle of a query?

Does the query have to be re-run from the start, or can it continue? I'm trying to evaluate the risk of using spot instances for production jobs.

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

If a spot instance is reclaimed in the middle of a job, Spark treats it as a lost worker and automatically retries that worker's tasks on the other available workers. So the query does not have to start over if indivi...
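To make the retry behaviour concrete, here is a toy Python model of rescheduling after a worker is lost. It is a sketch under simplifying assumptions, not Spark's actual scheduler: `reschedule_after_loss` and its dictionary layout are hypothetical. It shows the key point of the reply: only the lost worker's unfinished tasks are retried elsewhere, and the rest of the job carries on. One caveat the toy ignores: in real Spark, shuffle output written by already-finished tasks on the lost node may also need to be recomputed.

```python
def reschedule_after_loss(assignments, finished, lost_worker):
    """Toy model (hypothetical, not Spark's scheduler) of retrying a lost worker's tasks.

    assignments: dict mapping worker name -> list of task ids given to it
    finished:    set of task ids that have already completed
    lost_worker: name of the worker whose spot instance was reclaimed
    Returns a dict mapping each surviving worker -> task ids it still has to run.
    """
    survivors = [w for w in assignments if w != lost_worker]
    if not survivors:
        # With no workers left there is nowhere to retry; the job cannot progress.
        raise RuntimeError("no surviving workers to retry tasks on")

    # Unfinished work already queued on the surviving workers is kept as-is.
    remaining = {w: [t for t in assignments[w] if t not in finished] for w in survivors}

    # Only the lost worker's unfinished tasks are retried, spread round-robin
    # over the survivors; completed tasks are not re-run from the start.
    orphaned = [t for t in assignments.get(lost_worker, []) if t not in finished]
    for i, task in enumerate(orphaned):
        remaining[survivors[i % len(survivors)]].append(task)
    return remaining


# Three workers, tasks 0-2 already done, then "w2" is reclaimed.
plan = reschedule_after_loss({"w1": [0, 3], "w2": [1, 4], "w3": [2, 5]}, {0, 1, 2}, "w2")
print(plan)  # {'w1': [3, 4], 'w3': [5]}
```

Task 4, which was still pending on the reclaimed worker, moves to `w1`; tasks 0 to 2 are untouched.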

User16826992666
by Valued Contributor
  • 442 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

No, the exported HTML is a point-in-time snapshot of the notebook, taken when you perform the export. Visuals and data results in the HTML do not update when the notebook in the workspace is later changed.

User16826992666
by Valued Contributor
  • 388 Views
  • 1 replies
  • 0 kudos

Which MLlib library am I supposed to use - pyspark.mllib or pyspark.ml?

Both of these libraries seem to be available, and they both belong to MLlib. How do I know which one to use?

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

The pyspark.mllib library is built for RDDs, and the pyspark.ml library is built for DataFrames. The RDD-based mllib library is currently in maintenance mode, while the DataFrame-based library will continue to receive updates and active development. For t...

User16826992666
by Valued Contributor
  • 1347 Views
  • 1 replies
  • 0 kudos

Can I prevent users from downloading data from a notebook?

By default, any user can download a copy of the data they query in a notebook. Is it possible to prevent this?

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

You can limit the ways users can save copies of the data they have access to in a notebook, but you cannot prevent it entirely. The download button that exists for cells in Databricks notebooks can be disabled in the "Workspace Settings" section of th...

User16826994223
by Honored Contributor III
  • 1167 Views
  • 1 replies
  • 0 kudos

How do I get the files with a given prefix from an S3 bucket in PySpark?

I have various files in my S3 bucket, and I want to get only the ones whose names start with cop_.

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

You are calling .startswith() on a FileInfo object, not on a string. The file name is a property of the FileInfo object, so filename.name.startswith('cop_') should work.
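A minimal Python sketch of the fix, runnable outside Databricks. The `FileInfo` dataclass here is a hypothetical stand-in modelling only the `path`, `name`, and `size` fields of the entries `dbutils.fs.ls()` returns; `files_with_prefix` and the bucket paths are illustrative names, not part of any library. The last check also shows why the prefix must be `'cop_'` and not `'cop_ '`: a trailing space matches nothing.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical stand-in for the FileInfo entries dbutils.fs.ls() returns.
    Only the fields used below are modelled."""
    path: str
    name: str
    size: int


def files_with_prefix(entries, prefix):
    """Return entries whose file *name* starts with prefix.

    Calling .startswith() on the FileInfo object itself fails because it is
    not a string; the string to test is its .name attribute.
    """
    return [f for f in entries if f.name.startswith(prefix)]


# In a Databricks notebook you would pass dbutils.fs.ls(...) output instead,
# e.g. files_with_prefix(dbutils.fs.ls("s3://<your-bucket>/<dir>/"), "cop_").
listing = [
    FileInfo("s3://bucket/dir/cop_2021.csv", "cop_2021.csv", 10),
    FileInfo("s3://bucket/dir/cop_2022.csv", "cop_2022.csv", 12),
    FileInfo("s3://bucket/dir/sales.csv", "sales.csv", 7),
]
print([f.path for f in files_with_prefix(listing, "cop_")])
```

The returned paths can then be handed to a Spark reader, since `spark.read.csv` accepts a list of paths.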

User16826994223
by Honored Contributor III
  • 569 Views
  • 1 replies
  • 2 kudos

Where do SQL endpoints run?

Where do Databricks SQL endpoints run?

Latest Reply
User16826994223
Honored Contributor III
  • 2 kudos

Like Databricks clusters, SQL endpoints are created and managed in your cloud account (for example, on AWS, Azure, or GCP). SQL endpoints automatically manage SQL-optimized clusters in your account and scale them to match end-user demand.

User16826994223
by Honored Contributor III
  • 1143 Views
  • 1 replies
  • 0 kudos

What does it mean that Delta Lake supports multi-cluster writes?

What does it mean that Delta Lake supports multi-cluster writes? Please explain. Can we write to the same Delta table from multiple clusters?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

It means that Delta Lake does locking to make sure that queries writing to a table from multiple clusters at the same time won’t corrupt the table. However, it does not mean that if there is a write conflict (for example, update and delete the same t...
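The "one write fails atomically" behaviour can be illustrated with a toy commit log in Python. This is a sketch under simplifying assumptions, not Delta Lake's implementation or API: `ToyTransactionLog` and `ConcurrentCommitError` are hypothetical names, and the toy rejects any commit made on a stale version, whereas real Delta Lake first checks whether the concurrent changes actually conflict (two blind appends, for instance, can usually both succeed).

```python
class ConcurrentCommitError(Exception):
    """Raised when another writer already committed the version we targeted."""


class ToyTransactionLog:
    """Toy append-only commit log (hypothetical, not the Delta Lake API).

    Version v is the v-th entry in self.commits; each version can be
    written exactly once, which is what keeps concurrent writers from
    corrupting the table.
    """

    def __init__(self):
        self.commits = [["create table"]]  # version 0

    def latest_version(self):
        return len(self.commits) - 1

    def commit(self, read_version, actions):
        # A writer may only commit directly on top of the version it read.
        # If another writer got there first, fail atomically instead of
        # overwriting; nothing is written and the caller can re-read and retry.
        if read_version != self.latest_version():
            raise ConcurrentCommitError(
                f"read version {read_version}, table is now at {self.latest_version()}"
            )
        self.commits.append(list(actions))
        return self.latest_version()


log = ToyTransactionLog()
a_read = b_read = log.latest_version()       # both writers read version 0
log.commit(a_read, ["update rows"])          # writer A wins: version 1
try:
    log.commit(b_read, ["delete rows"])      # writer B is stale: rejected
except ConcurrentCommitError:
    log.commit(log.latest_version(), ["delete rows"])  # B re-reads and retries: version 2
```

Neither writer corrupts the log: A's commit stands, B's first attempt leaves no trace, and B's retry becomes the next version.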

User16826994223
by Honored Contributor III
  • 613 Views
  • 1 replies
  • 0 kudos

What DDL and DML features does Delta Lake not support?

What DDL and DML features does Delta Lake not support?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Unsupported DDL features:
  • ANALYZE TABLE PARTITION
  • ALTER TABLE [ADD|DROP] PARTITION
  • ALTER TABLE RECOVER PARTITIONS
  • ALTER TABLE SET SERDEPROPERTIES
  • CREATE TABLE LIKE
  • INSERT OVERWRITE DIRECTORY
  • LOAD DATA

Unsupported DML features:
  • INSERT INTO [OVERWRITE] table wi...

User16826994223
by Honored Contributor III
  • 782 Views
  • 1 replies
  • 0 kudos

Resolved! Business continuity plan for Databricks

Does Databricks have a business continuity plan?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

AWS offers a business continuity program (AWS business continuity and disaster recovery), and Databricks is designed to run out of multiple regions and multiple availability zones (data centers). The same applies to the other supported cloud providers.

User16826994223
by Honored Contributor III
  • 2661 Views
  • 1 replies
  • 0 kudos

How to export the full result in Azure Databricks

What is the best way to see all the data? I see that display shows only up to 100000 rows. Is there any way to see all the data, or do I need to download or export it to a separate file?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Yes, display in Databricks shows only a limited part of a DataFrame, although it lets you download the data as a CSV file. You can also save the DataFrame as a table in a Databricks database with this: predictions.select("salry", "dept").write.saveAsTable("depsalry") Then you ca...

User16826994223
by Honored Contributor III
  • 472 Views
  • 1 replies
  • 0 kudos

What is Delta Live Tables? How is it different from a normal Delta table?

What is Delta Live Tables? How is it different from a normal Delta table?

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Delta Live Tables builds on normal Delta tables by letting you easily define end-to-end data pipelines: you specify the data source, the transformation logic, and the destination state of the data, instead of manually stitching together siloed data ...

Srikanth_Gupta_
by Valued Contributor
  • 585 Views
  • 1 replies
  • 0 kudos

How does Spark SQL Catalyst optimizer work?

How does the Catalyst optimizer improve performance, and what is its role?

Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

The Catalyst optimizer converts an unresolved logical plan into an executable physical plan; a deep dive is available here
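Part of what happens between the logical and physical plan is rule-based rewriting of a tree of expressions and operators. Constant folding (Spark has an optimizer rule named ConstantFolding) is a simple example. The Python below is a toy sketch of that idea only: `Lit`, `Col`, `Add`, `transform_up`, and `constant_folding` are hypothetical names, with `transform_up` loosely mirroring Catalyst's `transformUp`. Real Catalyst is written in Scala and also covers analysis, physical planning, and code generation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    """A literal value in the expression tree."""
    value: int


@dataclass(frozen=True)
class Col:
    """A column reference; its value is unknown until the query runs."""
    name: str


@dataclass(frozen=True)
class Add:
    """Addition of two sub-expressions."""
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]


def transform_up(expr, rule):
    """Rewrite children first (bottom-up), then apply the rule to the rebuilt node."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def constant_folding(expr):
    """Rule: replace an Add of two literals with a single precomputed literal."""
    if isinstance(expr, Add) and isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
        return Lit(expr.left.value + expr.right.value)
    return expr


# x + (1 + 2) becomes x + 3, so 1 + 2 is not recomputed for every row.
print(transform_up(Add(Col("x"), Add(Lit(1), Lit(2))), constant_folding))
```

Because the rewrite runs bottom-up, nested constants collapse fully: (1 + 2) + 4 folds to a single literal 7, while anything involving a column is left for execution time.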
