Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16826992666
by Valued Contributor
  • 627 Views
  • 1 replies
  • 0 kudos

Which MLlib library am I supposed to use - pyspark.mllib or pyspark.ml?

Both of these libraries seem to be available, and both are part of MLlib. How do I know which one to use?

  • 627 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

The pyspark.mllib library is built for RDDs, and the pyspark.ml library is built for DataFrames. The RDD-based mllib library is currently in maintenance mode, while the DataFrame-based library will continue to receive updates and active development. For t...

  • 0 kudos
User16826992666
by Valued Contributor
  • 1893 Views
  • 1 replies
  • 0 kudos

Can I prevent users from downloading data from a notebook?

By default any user can download a copy of the data they query in a notebook. Is it possible to prevent this?

  • 1893 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

You can limit the ways that users can save copies of the data they have access to in a notebook, but not prevent it entirely. The download button which exists for cells in Databricks notebooks can be disabled in the "Workspace Settings" section of th...

  • 0 kudos
User16826994223
by Honored Contributor III
  • 1662 Views
  • 1 replies
  • 0 kudos

How to get the files with a prefix in PySpark from an S3 bucket?

I have different files in my S3 bucket. Now I want to get the files whose names start with cop_

  • 1662 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

You are calling .startswith() on a FileInfo object, not on a string. The filename is a property of the FileInfo object, so filename.name.startswith('cop_') should work.
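The fix in this reply can be sketched as follows. This is a minimal sketch, not the poster's code: `FileInfo` here is a stand-in namedtuple that mimics only some fields of the objects returned by `dbutils.fs.ls`, so the snippet runs outside a notebook, and the bucket path, file names, and the `files_with_prefix` helper are made up for illustration.

```python
from collections import namedtuple

# Stand-in for the FileInfo objects returned by dbutils.fs.ls
# (assumption: only the fields used here are modeled).
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])


def files_with_prefix(listing, prefix):
    """Return the entries whose file name starts with the given prefix.

    The check runs on the .name string, not on the FileInfo object itself.
    """
    return [f for f in listing if f.name.startswith(prefix)]


# Hypothetical listing; in a notebook this would come from
# dbutils.fs.ls on the S3 path being inspected.
listing = [
    FileInfo("s3://example-bucket/data/cop_2021.csv", "cop_2021.csv", 120),
    FileInfo("s3://example-bucket/data/cop_2022.csv", "cop_2022.csv", 98),
    FileInfo("s3://example-bucket/data/other.csv", "other.csv", 45),
]

matches = files_with_prefix(listing, "cop_")
print([f.name for f in matches])
```

Note that the prefix is matched exactly, so a stray trailing space (for example `'cop_ '`) would match nothing in this listing.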

  • 0 kudos
User16826994223
by Honored Contributor III
  • 960 Views
  • 1 replies
  • 2 kudos

Where do SQL endpoints run?

Where do Databricks SQL endpoints run?

  • 960 Views
  • 1 replies
  • 2 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 2 kudos

Like Databricks clusters, SQL endpoints are created and managed in your cloud account (for example, GCP or Azure). SQL endpoints manage SQL-optimized clusters automatically in your account and scale to match end-user demand.

  • 2 kudos
User16826994223
by Honored Contributor III
  • 5587 Views
  • 1 replies
  • 0 kudos

What does it mean that Delta Lake supports multi-cluster writes?

What does it mean that Delta Lake supports multi-cluster writes? Please explain. Can we write to the same Delta table from multiple clusters?

  • 5587 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

It means that Delta Lake does locking to make sure that queries writing to a table from multiple clusters at the same time won’t corrupt the table. However, it does not mean that if there is a write conflict (for example, update and delete the same t...

  • 0 kudos
User16826994223
by Honored Contributor III
  • 931 Views
  • 1 replies
  • 0 kudos

What DDL and DML features does Delta Lake not support?

What DDL and DML features does Delta Lake not support?

  • 931 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Unsupported DDL features:
  • ANALYZE TABLE PARTITION
  • ALTER TABLE [ADD|DROP] PARTITION
  • ALTER TABLE RECOVER PARTITIONS
  • ALTER TABLE SET SERDEPROPERTIES
  • CREATE TABLE LIKE
  • INSERT OVERWRITE DIRECTORY
  • LOAD DATA

Unsupported DML features:
  • INSERT INTO [OVERWRITE] table wi...

  • 0 kudos
User16826994223
by Honored Contributor III
  • 1176 Views
  • 1 replies
  • 0 kudos

Resolved! Business continuity plan for Databricks

Does Databricks have a business continuity plan?

  • 1176 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

AWS offers a business continuity program (AWS business continuity and disaster recovery), and Databricks is designed to run out of multiple regions and multiple availability zones, or data centers. This applies to all providers.

  • 0 kudos
User16826994223
by Honored Contributor III
  • 3586 Views
  • 1 replies
  • 0 kudos

How to export the full result in Azure Databricks

What is the best way to see all the data? I see that display shows only up to 100000 rows. Is there any way I can see all the data, or do I need to download or export it to a different file?

  • 3586 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Yes, Databricks displays only a limited part of a DataFrame. It lets you download the data as a CSV. You can save the DataFrame as a table in the Databricks database with this:
predictions.select("salry", "dept").write.saveAsTable("depsalry")
Then you ca...
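As a rough sketch, the save step from this reply can be wrapped in a small helper. The helper name `save_columns_as_table` is made up for illustration; it assumes `df` is a Spark DataFrame (such as `predictions` in the reply) and relies only on `select` and `write.saveAsTable`, the calls the reply itself uses.

```python
def save_columns_as_table(df, columns, table_name):
    """Persist the selected columns of a Spark DataFrame as a table.

    Assumes `df` is a pyspark.sql.DataFrame. With the default save mode,
    Spark raises an error if the table already exists.
    """
    df.select(*columns).write.saveAsTable(table_name)
    return table_name
```

In a notebook, the call matching the reply would be `save_columns_as_table(predictions, ["salry", "dept"], "depsalry")`.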

  • 0 kudos
User16826994223
by Honored Contributor III
  • 755 Views
  • 1 replies
  • 0 kudos

What is a Delta Live Table? How is it different from a normal Delta table?

What is a Delta Live Table? How is it different from a normal Delta table?

  • 755 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Delta Live Tables builds on normal Delta tables by letting you define end-to-end data pipelines: you specify the data source, the transformation logic, and the destination state of the data, instead of manually stitching together siloed data ...

  • 0 kudos
Srikanth_Gupta_
by Valued Contributor
  • 979 Views
  • 1 replies
  • 0 kudos

How does Spark SQL Catalyst optimizer work?

How does the Catalyst optimizer improve performance, and what is its role?

  • 979 Views
  • 1 replies
  • 0 kudos
Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

The Catalyst optimizer converts an unresolved logical plan into an executable physical plan. A deep dive is available here
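One way to observe these stages is `DataFrame.explain`. The sketch below assumes `df` is any Spark DataFrame in an active session; the helper name `print_catalyst_plans` is made up for illustration.

```python
def print_catalyst_plans(df):
    """Print the plans Spark builds for a DataFrame query.

    With extended=True, explain() prints the parsed, analyzed, and
    optimized logical plans followed by the physical plan.
    Assumes `df` is a pyspark.sql.DataFrame.
    """
    df.explain(extended=True)
```

Comparing the parsed plan with the optimized plan for the same query shows what the optimizer rewrote, for example a filter pushed closer to the data source.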

  • 0 kudos
User16826994223
by Honored Contributor III
  • 1084 Views
  • 1 replies
  • 0 kudos

Unable to start cluster. Error: Defunct Resource Detected

Hi all,
I am getting this error for some jobs. Can you please let me know what the reason could be?
Run result unavailable: job failed with an error message - Run result unavailable: job failed with error message Unexpected failure while waiting for the cl...

  • 1084 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

This is an issue at the cloud level, so try adding retries to the job. It does not happen on every cluster start: a start may fail once but succeed after a retry. Also, raise a Databricks support ticket; they can provide a permanent solution.
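A minimal sketch of adding retries to a job task definition. The field names `max_retries`, `min_retry_interval_millis`, and `retry_on_timeout` follow the Databricks Jobs API as I understand it, so check them against the current API reference. The task key, the notebook path, and the `with_retries` helper are hypothetical.

```python
import json


def with_retries(task, max_retries=3, min_retry_interval_ms=60_000):
    """Return a copy of a job task definition with retry settings added.

    The input dict is left unmodified. Retry values are illustrative
    defaults, not recommendations.
    """
    updated = dict(task)
    updated["max_retries"] = max_retries
    updated["min_retry_interval_millis"] = min_retry_interval_ms
    updated["retry_on_timeout"] = False
    return updated


# Hypothetical task definition used only for illustration.
task = {
    "task_key": "nightly_load",
    "notebook_task": {"notebook_path": "/Jobs/nightly_load"},
}

print(json.dumps(with_retries(task), indent=2))
```

The same settings can also be set per task in the job configuration UI, with no code at all.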

  • 0 kudos
jose_gonzalez
by Moderator
  • 914 Views
  • 1 replies
  • 0 kudos

How to solve Hive connectivity issues?

I can see connectivity issues in my driver logs. How to solve this issue?

  • 914 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Can you share more of the error, please? The logs should contain more detail showing whether this is a connection issue caused by the JDBC URL, the host name, the password, or something similar.
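While gathering the full error, a quick check is whether the metastore host in the JDBC URL is reachable at all. The sketch below uses only the Python standard library. The URL, the default port, and the helper names `jdbc_host_port` and `can_connect` are illustrative assumptions, and a successful TCP connection says nothing about whether the credentials are valid.

```python
import re
import socket


def jdbc_host_port(jdbc_url, default_port=3306):
    """Extract (host, port) from a URL such as jdbc:mysql://host:3306/db."""
    match = re.match(r"jdbc:\w+://([^:/?;]+)(?::(\d+))?", jdbc_url)
    if not match:
        raise ValueError(f"Unrecognized JDBC URL: {jdbc_url}")
    host, port = match.group(1), match.group(2)
    return host, int(port) if port else default_port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical metastore URL; substitute the one from your cluster configuration.
host, port = jdbc_host_port("jdbc:mysql://metastore.example.com:3306/hive")
print(host, port)
```

Running `can_connect(host, port)` from a notebook on the affected cluster helps separate network problems (wrong host, blocked port) from authentication problems.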

  • 0 kudos
