Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826987838
by Databricks Employee
  • 1688 Views
  • 1 reply
  • 1 kudos
Latest Reply
User16783855534
Databricks Employee

https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html#create-user

brickster_2018
by Databricks Employee
  • 2857 Views
  • 1 reply
  • 0 kudos

Resolved! What is the difference between spark.sessionState.catalog.listTables and spark.catalog.listTables?

I see a significant performance difference when calling spark.sessionState.catalog.listTables compared to spark.catalog.listTables. Is that expected?

Latest Reply
brickster_2018
Databricks Employee

spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, hence it's faster, whereas spark.catalog.listTables pulls the column details as well. If the database has many Delta tabl...
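
A minimal sketch illustrating the difference (the database name is an assumption; spark.sessionState is an internal API, so this is intended for a notebook or spark-shell session):

    // Internal Catalyst API: returns lightweight TableIdentifier objects
    // without fetching per-table metadata such as columns.
    val db = "my_database" // assumed database name
    val internal = spark.sessionState.catalog.listTables(db)
    println(s"internal API: ${internal.size} tables")

    // Public API: returns org.apache.spark.sql.catalog.Table rows, which
    // resolves more metadata per table and is therefore slower.
    val public = spark.catalog.listTables(db).collect()
    println(s"public API: ${public.length} tables")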

brickster_2018
by Databricks Employee
  • 6559 Views
  • 1 reply
  • 0 kudos

Resolved! How to list all Delta tables in a database?

I wanted to get a list of all the Delta tables in a database. What is the easiest way of getting it?

Latest Reply
brickster_2018
Databricks Employee

The below snippet can be used to list the tables in a database:

    val db = "database_name"
    spark.sessionState.catalog.listTables(db)
      .map(table => spark.sessionState.catalog.externalCatalog.getTable(table.database.get, table.table))
      .filter(x => x....
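
The reply is cut off above. A plausible completion, assuming the final filter is on CatalogTable's provider field (an assumption, since the original text is truncated):

    // Sketch only: the final filter is an assumption. CatalogTable.provider
    // is an Option[String] holding the data source name ("delta" for Delta).
    val db = "database_name"
    val deltaTables = spark.sessionState.catalog.listTables(db)
      .map(t => spark.sessionState.catalog.externalCatalog.getTable(t.database.get, t.table))
      .filter(_.provider.exists(_.equalsIgnoreCase("delta")))
      .map(_.identifier.table)
    deltaTables.foreach(println)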

User16826992666
by Databricks Employee
  • 36473 Views
  • 3 replies
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

This is by design and working as expected: Spark writes data in a distributed fashion. Using coalesce(1) can help generate a single file; however, this solution is not scalable for large datasets, as it brings all the data into a single task.
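
A minimal sketch of the coalesce(1) approach (the example DataFrame and output path are assumptions):

    // Assumed example data and path. coalesce(1) funnels all the data
    // through a single task, so use it only for small outputs.
    val df = spark.range(1000).toDF("id")
    df.coalesce(1)
      .write
      .mode("overwrite")
      .parquet("/tmp/single_file_output")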

2 More Replies
Srikanth_Gupta_
by Databricks Employee
  • 1243 Views
  • 1 reply
  • 1 kudos
Latest Reply
aladda
Databricks Employee

Photon is supported for batch workloads today; it is the standard on Databricks SQL clusters and is available as an option for Automated and Interactive clusters. Photon is in public preview today, so it is available as an option for everyone. See this lin...

brickster_2018
by Databricks Employee
  • 1354 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee

Delta has significant value beyond its DML/ACID capabilities. The data organization strategies that @Ryan Chynoweth mentions also offer an advantage even for read-only use cases when querying and joining the data. Delta also supports in-place con...
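
The reply is truncated, but it appears to refer to in-place conversion of Parquet tables to Delta. A minimal sketch of that command on a Delta-enabled cluster (the path is an assumption):

    // Assumed path. CONVERT TO DELTA builds a Delta transaction log in
    // place, without rewriting the underlying Parquet data files.
    spark.sql("CONVERT TO DELTA parquet.`/mnt/data/events`")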

1 More Reply
Srikanth_Gupta_
by Databricks Employee
  • 3413 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee

The spark-salesforce connector looks like an option for querying this data via SOQL/SAQL and bringing it into Databricks/Spark.
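
A sketch of what a read via that connector could look like, assuming the springml spark-salesforce package and its username/password/soql options (check the connector's documentation for the exact option names and supported versions):

    // All credentials and the query below are placeholders.
    val sfDf = spark.read
      .format("com.springml.spark.salesforce")
      .option("username", "user@example.com")
      .option("password", "password+securityToken") // password concatenated with security token
      .option("soql", "SELECT Id, Name FROM Account")
      .load()
    sfDf.show()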

christys
by Databricks Employee
  • 1116 Views
  • 1 reply
  • 0 kudos
Latest Reply
Taha
Databricks Employee

There are actually several options here!

AWS
If you'd like a very quick setup but a full-featured environment for your org, use the AWS Quick Start: https://aws.amazon.com/quickstart/architecture/databricks/
If you're solo exploring, you can use Databricks c...

Anonymous
by Not applicable
  • 1370 Views
  • 1 reply
  • 0 kudos

Setting cluster settings through SCIM

Is there a way to set the following cluster settings through the SCIM API? I am not seeing anything in the API docs that would suggest it is possible, but I want to double-check here.
  • Enable credential passthrough
  • Single User Access
  • Permission settings

Latest Reply
Taha
Databricks Employee

Credential passthrough
This actually needs some setup in AWS IAM to get started. Once you've created the right instance profiles, you'll need to add them to your Databricks workspace. There are pretty exhaustive guides here that cover each of the ste...

aladda
by Databricks Employee
  • 1375 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Databricks Employee

The Databricks Jobs API can invoke code from cloud storage, but please note that it will not be stored as a Databricks notebook; rather, it would be a source file or JAR.

User16789201666
by Databricks Employee
  • 2769 Views
  • 1 reply
  • 0 kudos

How do you capture change logs from an RDBMS source and ingest the changes into Databricks on AWS?

A common option is to use AWS DMS: https://databricks.com/blog/2019/07/15/migrating-transactional-data-to-a-delta-lake-using-aws-dms.html

Latest Reply
User16826994223
Databricks Employee

https://databricks.com/blog/2019/07/15/migrating-transactional-data-to-a-delta-lake-using-aws-dms.html

alexott
by Databricks Employee
  • 6136 Views
  • 2 replies
  • 0 kudos

How can I test my Python code that I wrote using notebooks?

I've written the code in notebooks using Python, and I want to add tests to it to make sure that it won't break when I make more changes. What tools can I use for that task?

Latest Reply
Ryan_Chynoweth
Databricks Employee

@Alex Ott has an awesome answer! Here is a great blog from our engineering team that may help as well: https://databricks.com/blog/2020/01/16/automate-deployment-and-testing-with-databricks-notebook-mlflow.html

1 More Reply
Mooune_DBU
by Databricks Employee
  • 1649 Views
  • 0 replies
  • 0 kudos

Spark vs. Ray?

Ray has been getting a lot of traction lately for shining at distributed compute.
  • What are the primary differences between Spark and Ray?
  • In which areas/applications would each be best? (e.g., Reinforcement Learning)
  • In which cases would it make sense f...

User16826994223
by Databricks Employee
  • 2785 Views
  • 2 replies
  • 0 kudos

Requirement to Run Koalas

Hi, I am planning to run Koalas in a Databricks environment. What are the requirements for running Koalas there?

Latest Reply
tj-cycyota
Databricks Employee

Koalas is great! It really helps ease the transition from pandas to Spark, because you can use the same pandas functions/classes through the Koalas API while everything runs on Spark in the background.

1 More Reply
