Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Databricks Employee
  • 1077 Views
  • 1 replies
  • 0 kudos

Which MLlib library am I supposed to use - pyspark.mllib or pyspark.ml?

Both of these libraries seem to be available, and both are part of MLlib. How do I know which one to use?

Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

The pyspark.mllib library is built for RDDs, and the pyspark.ml library is built for DataFrames. The RDD-based mllib library is currently in maintenance mode, while the DataFrame-based library will continue to receive updates and active development. For t...

User16826992666
by Databricks Employee
  • 3014 Views
  • 1 replies
  • 0 kudos

Can I prevent users from downloading data from a notebook?

By default any user can download a copy of the data they query in a notebook. Is it possible to prevent this?

Latest Reply
User16826992666
Databricks Employee
  • 0 kudos

You can limit the ways that users can save copies of the data they have access to in a notebook, but not prevent it entirely. The download button which exists for cells in Databricks notebooks can be disabled in the "Workspace Settings" section of th...

User16826994223
by Databricks Employee
  • 3901 Views
  • 1 replies
  • 0 kudos

How to get files with a prefix from an S3 bucket in PySpark?

I have several files in my S3 bucket. Now I want to get only the files whose names start with cop_.

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

You are calling .startswith() on a FileInfo object, not on a string. The file name is a property of the FileInfo object, so filename.name.startswith('cop_') should work.
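A minimal sketch of the fix, assuming the listing comes from dbutils.fs.ls(), which returns FileInfo objects with path and name fields. To keep it runnable outside Databricks, a hypothetical FileInfo namedtuple stands in for the real object, and the bucket and file names are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the FileInfo objects returned by dbutils.fs.ls();
# on Databricks you would list the real directory, e.g. dbutils.fs.ls("s3://<bucket>/<dir>/").
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])


def files_with_prefix(file_infos, prefix):
    """Return the FileInfo entries whose file name starts with prefix.

    startswith() is called on the .name string, not on the FileInfo
    object itself (which has no startswith method).
    """
    return [f for f in file_infos if f.name.startswith(prefix)]


# Made-up listing for illustration only.
listing = [
    FileInfo("s3://example-bucket/data/cop_2021.csv", "cop_2021.csv", 120),
    FileInfo("s3://example-bucket/data/cop_2022.csv", "cop_2022.csv", 98),
    FileInfo("s3://example-bucket/data/sales.csv", "sales.csv", 300),
]

matches = files_with_prefix(listing, "cop_")
print([f.name for f in matches])  # ['cop_2021.csv', 'cop_2022.csv']

# The matching paths could then be handed to a reader, e.g.
# spark.read.csv([f.path for f in matches], header=True)
```

Note that the prefix must match exactly: a prefix with a trailing space, such as 'cop_ ', would match none of these files.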

User16826994223
by Databricks Employee
  • 1483 Views
  • 1 replies
  • 2 kudos

Where do SQL endpoints run?

Where do Databricks SQL endpoints run?

Latest Reply
User16826994223
Databricks Employee
  • 2 kudos

Like Databricks clusters, SQL endpoints are created and managed in your cloud account (for example, AWS, Azure, or GCP). SQL endpoints manage SQL-optimized clusters automatically in your account and scale to match end-user demand.

User16826994223
by Databricks Employee
  • 6512 Views
  • 1 replies
  • 0 kudos

What does it mean that Delta Lake supports multi-cluster writes?

What does it mean that Delta Lake supports multi-cluster writes? Please explain. Can we write to the same Delta table from multiple clusters?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

It means that Delta Lake does locking to make sure that queries writing to a table from multiple clusters at the same time won’t corrupt the table. However, it does not mean that if there is a write conflict (for example, update and delete the same t...
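To illustrate why concurrent writers cannot corrupt the table but can still conflict, here is a toy model of optimistic concurrency control, the approach Delta Lake documents for its commits: each writer remembers the table version it read, and its commit succeeds only if no later commit touched the same files. This is a deliberately simplified sketch with a hypothetical in-memory log (ToyDeltaLog) and file-level conflict checks; it is not Delta Lake's actual implementation or API.

```python
import threading


class ConcurrentModificationError(Exception):
    """Raised when another writer committed conflicting changes first."""


class ToyDeltaLog:
    """Hypothetical in-memory stand-in for a table's transaction log.

    Each commit is recorded as the set of file names it touched. Appending
    a commit is atomic (guarded by a lock), so the log itself is never
    corrupted even when several writers commit at the same time.
    """

    def __init__(self):
        self._commits = []  # version N is stored at self._commits[N]
        self._lock = threading.Lock()

    def current_version(self):
        with self._lock:
            return len(self._commits) - 1  # -1 means the table is empty

    def try_commit(self, read_version, touched_files):
        """Commit only if nothing after read_version touched the same files."""
        with self._lock:
            for later in self._commits[read_version + 1:]:
                overlap = later & set(touched_files)
                if overlap:
                    raise ConcurrentModificationError(f"conflict on {sorted(overlap)}")
            self._commits.append(frozenset(touched_files))
            return len(self._commits) - 1


log = ToyDeltaLog()
log.try_commit(-1, {"part-0"})  # version 0 creates the table

# Two clusters read version 0 and write different files: both succeed.
v_a, v_b = log.current_version(), log.current_version()
log.try_commit(v_a, {"part-1"})  # version 1
log.try_commit(v_b, {"part-2"})  # version 2, no overlap with version 1

# Two clusters rewrite the same file: the second commit is rejected.
v_c, v_d = log.current_version(), log.current_version()
log.try_commit(v_c, {"part-1"})  # version 3
try:
    log.try_commit(v_d, {"part-1"})
except ConcurrentModificationError as e:
    print("second writer must retry:", e)
```

The rejected writer does not damage the table; it simply has to re-read the latest version and try again, which matches the point that multi-cluster writes are safe but conflicting operations can still fail.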

User16826994223
by Databricks Employee
  • 1626 Views
  • 1 replies
  • 0 kudos

What DDL and DML features does Delta Lake not support?

What DDL and DML features does Delta Lake not support?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Unsupported DDL features:
  • ANALYZE TABLE PARTITION
  • ALTER TABLE [ADD|DROP] PARTITION
  • ALTER TABLE RECOVER PARTITIONS
  • ALTER TABLE SET SERDEPROPERTIES
  • CREATE TABLE LIKE
  • INSERT OVERWRITE DIRECTORY
  • LOAD DATA
Unsupported DML features:
  • INSERT INTO [OVERWRITE] table wi...

User16826994223
by Databricks Employee
  • 1924 Views
  • 1 replies
  • 0 kudos

Resolved! Business continuity plan for Databricks

Does Databricks have a business continuity plan?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

AWS offers a business continuity program (AWS business continuity and disaster recovery), and Databricks is designed to run out of multiple regions and multiple availability zones, or data centers. This applies to all business providers.

User16826994223
by Databricks Employee
  • 5312 Views
  • 1 replies
  • 0 kudos

How to export full results in Azure Databricks

What is the best way to see all the data? I see that display() shows only up to 100,000 rows. Is there any way I can see all the data, or do I need to download or export it to a different file?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Yes, Databricks displays only a limited portion of a DataFrame, but it allows you to download the data as a CSV. You can also save the DataFrame as a table in the Databricks database with this:

predictions.select("salry", "dept").write.saveAsTable("depsalry")

Then you ca...

User16826994223
by Databricks Employee
  • 1361 Views
  • 1 replies
  • 0 kudos

What is a Delta Live Table? How is it different from a normal Delta table?

What is a Delta Live Table? How is it different from a normal Delta table?

Latest Reply
aladda
Databricks Employee
  • 0 kudos

Delta Live Tables builds on normal Delta tables by allowing you to easily define end-to-end data pipelines: you specify the data source, the transformation logic, and the destination state of the data, instead of manually stitching together siloed data ...

Srikanth_Gupta_
by Databricks Employee
  • 1794 Views
  • 1 replies
  • 0 kudos

How does Spark SQL Catalyst optimizer work?

How does the Catalyst optimizer improve performance, and what is its role?

Latest Reply
Srikanth_Gupta_
Databricks Employee
  • 0 kudos

The Catalyst optimizer converts an unresolved logical plan into an executable physical plan; a deep dive is available here
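Catalyst represents a plan as a tree and optimizes it by applying rewrite rules repeatedly until the tree stops changing. Below is a toy sketch of one such rule, constant folding, over a hypothetical expression tree. The Lit, Col, Add, and Mul classes are invented for illustration and are not Catalyst's API; the real optimizer is written in Scala and has many more rules (predicate pushdown, column pruning, and so on).

```python
import operator
from dataclasses import dataclass
from typing import Union


# Hypothetical expression nodes, loosely mirroring a logical plan's expressions.
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add, Mul]


def constant_folding(e):
    """One bottom-up pass: replace an operator whose inputs are both literals."""
    if isinstance(e, (Add, Mul)):
        left, right = constant_folding(e.left), constant_folding(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            op = operator.add if isinstance(e, Add) else operator.mul
            return Lit(op(left.value, right.value))
        return type(e)(left, right)
    return e  # Lit and Col are already as simple as they get


def optimize(e, rules, max_iterations=100):
    """Apply every rule repeatedly until the tree reaches a fixed point."""
    for _ in range(max_iterations):
        new = e
        for rule in rules:
            new = rule(new)
        if new == e:
            return e
        e = new
    return e


# salary * (1 + 2) becomes salary * 3, so 1 + 2 is computed once at planning
# time instead of once per row at execution time.
plan = Mul(Col("salary"), Add(Lit(1), Lit(2)))
print(optimize(plan, [constant_folding]))  # Mul(left=Col(name='salary'), right=Lit(value=3))
```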

User16826994223
by Databricks Employee
  • 1628 Views
  • 1 replies
  • 0 kudos

Unable to start cluster. Error: Defunct Resource Detected

Hi all, I am getting this error for some jobs. Can you please let me know what could be the reason?
Run result unavailable: job failed with an error message - Run result unavailable: job failed with error message Unexpected failure while waiting for the cl...

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

This is a cloud-level issue, so try adding retries to the job. It does not happen on every cluster start: a start may fail once but will succeed after a retry. Also, raise a Databricks support ticket; they will provide a permanent solution.
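Databricks Jobs let you configure retries for a task in the job settings, which is the simplest way to follow this advice. If you instead trigger the work from your own code, a generic retry-with-exponential-backoff wrapper is one option. This is a sketch under assumptions: start_cluster below is a hypothetical callable that raises on failure, not a Databricks API.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=3, base_delay=1.0, max_delay=60.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (with jitter) and retry.

    Re-raises the last exception once max_attempts calls have all failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries


# Hypothetical flaky operation: fails on the first call, then succeeds,
# like a cluster start that hits a transient cloud-side error once.
calls = {"n": 0}


def start_cluster():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("Defunct Resource Detected")
    return "RUNNING"


# sleep is replaced with a no-op here so the example runs instantly.
print(retry_with_backoff(start_cluster, sleep=lambda s: None))  # RUNNING
```

Keeping max_attempts small matters: retries cover the transient case, while a persistent failure should still surface quickly so it can be reported in a support ticket.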

jose_gonzalez
by Databricks Employee
  • 1441 Views
  • 1 replies
  • 0 kudos

How to solve Hive connectivity issues?

I can see connectivity issues in my driver logs. How can I solve this issue?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Can you share more of the error, please? The logs should contain more detail showing whether it is a connection issue caused by the JDBC URL, the host name, the password, or something similar.

