Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Databricks Employee
  • 13007 Views
  • 1 replies
  • 1 kudos

Resolved! Can you import a Jupyter notebook to a Databricks workspace?

Also curious if you can export a notebook created in Databricks as a Jupyter notebook

Latest Reply
User16826992666
Databricks Employee
  • 1 kudos

Yes, the .ipynb format is a supported file type which can be imported to a Databricks workspace. Note that some special configurations may need to be adjusted to work in the Databricks environment. Additional accepted file formats which can be import...
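For context, an .ipynb file is just a JSON document following the Jupyter nbformat schema, which is why it can be moved between environments. A minimal sketch (the cell content is made-up example data):

```python
import json

# A minimal Jupyter notebook is a JSON document following the nbformat v4 schema.
# The single code cell below is illustrative example content.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello from an imported notebook')\n"],
        }
    ],
}

# Serialize to the .ipynb payload you would save and then import
# through the workspace UI or CLI.
payload = json.dumps(notebook, indent=2)
```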

User16826992666
by Databricks Employee
  • 2656 Views
  • 1 replies
  • 0 kudos

Resolved! What should I be looking for when evaluating the performance of a Spark job?

Where do I start when starting performance tuning of my queries? Are there particular things I should be looking out for?

Latest Reply
Srikanth_Gupta_
Databricks Employee
  • 0 kudos

A few things off the top of my mind:
1) Check the Spark UI and see which stage is taking more time.
2) Check for data skew.
3) Data skew can severely degrade query performance; Spark SQL accepts skew hints in queries. Also make sure to use proper join h...
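On Spark 3.x, adaptive query execution can also detect and split skewed join partitions automatically. A sketch of the relevant settings (the factor value shown is only illustrative):

```
spark.sql.adaptive.enabled true
spark.sql.adaptive.skewJoin.enabled true
spark.sql.adaptive.skewJoin.skewedPartitionFactor 5
```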

User16826992666
by Databricks Employee
  • 1055 Views
  • 1 replies
  • 0 kudos

Does Databricks SQL support any kind of custom visuals?

Wondering if I can make any kind of custom visuals or are the ones that come built in the only options?

Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

At this time the only available visuals are the ones that are included in the Databricks SQL environment. There is no way to import or create custom visuals.

User16826992666
by Databricks Employee
  • 1084 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

No, you do not. Although Delta is the default format when writing data with Databricks, any file format supported by Spark can be used when writing data.

User16826992666
by Databricks Employee
  • 3182 Views
  • 1 replies
  • 0 kudos

What happens if a spot instance worker is lost in the middle of a query?

Does the query have to be re-run from the start, or can it continue? Trying to evaluate what risk there is by using spot instances for production jobs

Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

If a spot instance is reclaimed in the middle of a job, Spark treats it as a lost worker. The Spark engine automatically retries the tasks from the lost worker on other available workers. So the query does not have to start over if indivi...

User16826992666
by Databricks Employee
  • 1354 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

No, the HTML is a point-in-time snapshot of the notebook from when you perform the export. Visuals and data results do not update in the HTML when changes are made to the notebook still in the workspace.

User16826992666
by Databricks Employee
  • 1178 Views
  • 1 replies
  • 0 kudos

Which MLlib library am I supposed to use - pyspark.mllib or pyspark.ml?

Both of these libraries seem to be available and they are both for MLlib, how do I know which one to use?

Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

The pyspark.mllib library is built for RDDs, and the pyspark.ml library is built for DataFrames. The RDD-based pyspark.mllib library is currently in maintenance mode, while the DataFrame-based library will continue to receive updates and active development. For t...

User16826992666
by Databricks Employee
  • 3218 Views
  • 1 replies
  • 0 kudos

Can I prevent users from downloading data from a notebook?

By default any user can download a copy of the data they query in a notebook. Is it possible to prevent this?

Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

You can limit the ways that users can save copies of the data they have access to in a notebook, but not prevent it entirely. The download button which exists for cells in Databricks notebooks can be disabled in the "Workspace Settings" section of th...

User16826994223
by Databricks Employee
  • 4024 Views
  • 1 replies
  • 0 kudos

How to get the files with a prefix in Pyspark from s3 bucket?

I have different files in my S3 bucket. Now I want to get the files whose names start with cop_

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

You are referencing a FileInfo object when calling .startswith(), not a string. The filename is a property of the FileInfo object, so filename.name.startswith('cop_') should work.
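Outside Databricks the pattern looks like this, using a namedtuple as a stand-in for the FileInfo objects that dbutils.fs.ls returns (the listing contents are made-up):

```python
from collections import namedtuple

# Stand-in for the FileInfo objects returned by dbutils.fs.ls (illustrative).
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])

listing = [
    FileInfo("s3://bucket/cop_2021.csv", "cop_2021.csv", 1024),
    FileInfo("s3://bucket/sales.csv", "sales.csv", 2048),
    FileInfo("s3://bucket/cop_2022.csv", "cop_2022.csv", 512),
]

# Filter on the .name attribute, not on the FileInfo object itself.
cop_files = [f.path for f in listing if f.name.startswith("cop_")]
```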

User16826994223
by Databricks Employee
  • 1603 Views
  • 1 replies
  • 2 kudos

Where do SQL endpoints run?

Where do Databricks SQL endpoints run?

Latest Reply
User16826994223
Databricks Employee
  • 2 kudos

Like Databricks clusters, SQL endpoints are created and managed in your cloud account (AWS, Azure, or GCP). SQL endpoints manage SQL-optimized clusters automatically in your account and scale to match end-user demand.

User16826994223
by Databricks Employee
  • 6641 Views
  • 1 replies
  • 0 kudos

What does it mean that Delta Lake supports multi-cluster writes?

What does it mean that Delta Lake supports multi-cluster writes? Please explain. Can we write to the same Delta table from multiple clusters?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

It means that Delta Lake does locking to make sure that queries writing to a table from multiple clusters at the same time won’t corrupt the table. However, it does not mean that if there is a write conflict (for example, update and delete the same t...
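Because Delta surfaces write conflicts as exceptions rather than corrupting the table, callers are expected to retry the transaction. A minimal sketch of that retry pattern, using a stand-in exception (in Databricks the real ones are Delta's concurrency exceptions, e.g. ConcurrentAppendException):

```python
import time

# Stand-in for Delta's concurrent-write error (illustrative).
class ConcurrentWriteConflict(Exception):
    pass

def write_batch(state):
    # Simulated write: fails with a conflict on the first attempt only.
    if not state["done"]:
        state["done"] = True
        raise ConcurrentWriteConflict("conflicting commit")
    return "committed"

def write_with_retry(fn, state, retries=3, backoff=0.01):
    # Delta's optimistic concurrency prevents corruption but raises on
    # conflicting commits; the caller retries the whole transaction.
    for attempt in range(retries):
        try:
            return fn(state)
        except ConcurrentWriteConflict:
            time.sleep(backoff * (attempt + 1))
    raise RuntimeError("gave up after retries")

result = write_with_retry(write_batch, {"done": False})
```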

User16826994223
by Databricks Employee
  • 1740 Views
  • 1 replies
  • 0 kudos

What DDL and DML features does Delta Lake not support?

What DDL and DML features does Delta Lake not support?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Unsupported DDL features:
- ANALYZE TABLE PARTITION
- ALTER TABLE [ADD|DROP] PARTITION
- ALTER TABLE RECOVER PARTITIONS
- ALTER TABLE SET SERDEPROPERTIES
- CREATE TABLE LIKE
- INSERT OVERWRITE DIRECTORY
- LOAD DATA

Unsupported DML features:
- INSERT INTO [OVERWRITE] table wi...

User16826994223
by Databricks Employee
  • 2071 Views
  • 1 replies
  • 0 kudos

Resolved! Business continuity plan for Databricks

Does Databricks have a business continuity plan for Databricks?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

AWS offers a business continuity program (AWS business continuity and disaster recovery), and Databricks is designed to run across multiple regions and multiple availability zones, or data centers. The same applies to the other supported cloud providers.
