https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html#create-user
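Per the linked docs, creating a user is a single SCIM POST; a minimal sketch, assuming a personal access token (host, token, and user name below are placeholders):

```python
# Hedged sketch of the SCIM create-user call from the linked docs;
# host, token, and userName are placeholder values.
import requests

host = "https://<workspace>.cloud.databricks.com"
token = "<personal-access-token>"

resp = requests.post(
    f"{host}/api/2.0/preview/scim/v2/Users",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/scim+json",
    },
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": "new.user@example.com",
    },
)
resp.raise_for_status()
```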
I see a significant performance difference when calling spark.sessionState.catalog.listTables compared to spark.catalog.listTables. Is that expected?
spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, hence it's faster, whereas spark.catalog.listTables will pull the column details as well. If the database has many Delta tabl...
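A rough way to observe the gap from a Python notebook (spark.sessionState is not exposed in PySpark, so SHOW TABLES stands in here for the name-only listing; my_db is a placeholder):

```python
# Hedged sketch: compare a name-only listing with the metadata-fetching
# catalog API. "my_db" is a placeholder database name.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

t0 = time.time()
spark.sql("SHOW TABLES IN my_db").collect()   # names only, no per-table metadata
t1 = time.time()
spark.catalog.listTables("my_db")             # fetches metadata for each table
t2 = time.time()

print(f"SHOW TABLES: {t1 - t0:.2f}s, catalog.listTables: {t2 - t1:.2f}s")
```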
I wanted to get a list of all the Delta tables in a database. What is the easiest way of getting it?
The snippet below can be used to list the tables in a database:

```scala
val db = "database_name"
spark.sessionState.catalog
  .listTables(db)
  .map(t => spark.sessionState.catalog.externalCatalog.getTable(t.database.get, t.table))
  .filter(t => t.provider.contains("delta")) // original snippet is truncated here; filtering on the "delta" provider is the likely intent
```
This is by design and working as expected: Spark writes the data in a distributed fashion. Using coalesce(1) can help generate one file; however, this solution is not scalable for large datasets, as it involves bringing all the data into a single task.
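As a minimal sketch of that trade-off (the output path and sample data are placeholders):

```python
# Minimal sketch: produce a single output file with coalesce(1).
# The output path and sample data are placeholder values.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000).withColumnRenamed("id", "value")

# coalesce(1) funnels every partition through one task, so the write
# emits one file -- fine for small data, not scalable for large data.
df.coalesce(1).write.mode("overwrite").csv("/tmp/single_file_output", header=True)
```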
Photon is supported for batch workloads today; it is the standard on Databricks SQL clusters and available as an option for Automated and Interactive clusters. Photon is in public preview today, so it is available as an option for everyone. See this lin...
Delta has significant value beyond the DML/ACID capabilities. Delta's data organization strategies that @Ryan Chynoweth​ mentions also offer an advantage even for read-only use cases when querying and joining the data. Delta also supports in-place con...
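The in-place conversion referred to is presumably CONVERT TO DELTA; a minimal sketch, assuming a Parquet table at a hypothetical path:

```python
# Hedged sketch: convert an existing Parquet table to Delta in place.
# The path and partition column are placeholder values.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql(
    "CONVERT TO DELTA parquet.`/mnt/data/events` PARTITIONED BY (event_date DATE)"
)
```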
This spark-salesforce connector looks like an option to query this data via SOQL/SAQL and bring it into Databricks/Spark.
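A hedged sketch of reading through that connector (option names follow the spark-salesforce README; verify them against the version you install, and note that the credentials and SOQL query are placeholders):

```python
# Hedged sketch using the springml spark-salesforce connector; the package
# must be attached to the cluster, and all option values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("com.springml.spark.salesforce")
    .option("username", "user@example.com")
    .option("password", "password+security_token")  # password concatenated with token
    .option("soql", "SELECT Id, Name FROM Account")
    .load()
)
df.show()
```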
There are actually several options here!
AWS: If you'd like a very quick setup but full-featured environment for your org, use the AWS quickstart: https://aws.amazon.com/quickstart/architecture/databricks/
If you're solo exploring, you can use Databricks c...
Is there a way to set the following cluster settings through the SCIM API? I am not seeing anything in the API docs that would suggest it is possible, but I want to double-check here.
- Enable credential passthrough
- Single User Access
- Permission settings
Credential passthrough: This actually needs some setting up in AWS IAM to get started. Once you've created the right instance profiles, you'll need to add them to your Databricks workspace. There are pretty exhaustive guides here that cover each of the ste...
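Registering an instance profile with the workspace can also be scripted; a minimal sketch against the REST API (host, token, and ARN are placeholders):

```python
# Hedged sketch: register an instance profile with the workspace via the
# Instance Profiles API. Host, token, and ARN are placeholder values.
import requests

host = "https://<workspace>.cloud.databricks.com"
token = "<personal-access-token>"

resp = requests.post(
    f"{host}/api/2.0/instance-profiles/add",
    headers={"Authorization": f"Bearer {token}"},
    json={"instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/example"},
)
resp.raise_for_status()
```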
The Databricks Jobs API can invoke code from cloud storage. But please note that it will not be stored as a Databricks notebook; rather, it would be a source file or JAR.
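For example, a job whose task runs a Python file straight from S3 can be created through the Jobs API; a minimal sketch (all values are placeholders):

```python
# Hedged sketch: create a job that runs a Python source file from cloud
# storage via the Jobs API 2.1. All values below are placeholders.
import requests

host = "https://<workspace>.cloud.databricks.com"
token = "<personal-access-token>"

job_spec = {
    "name": "run-python-from-s3",
    "tasks": [
        {
            "task_key": "main",
            "spark_python_task": {"python_file": "s3://my-bucket/jobs/etl.py"},
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json()["job_id"])
```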
A common option is to use AWS DMS, https://databricks.com/blog/2019/07/15/migrating-transactional-data-to-a-delta-lake-using-aws-dms.html
I've written code in notebooks using Python, and I want to add tests to it to make sure that it won't break when I make more changes. What tools can I use for that task?
@Alex Ott​ has an awesome answer! Here is a great blog from our engineering team that may help as well: https://databricks.com/blog/2020/01/16/automate-deployment-and-testing-with-databricks-notebook-mlflow.html
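One common pattern in this family of approaches is to factor notebook logic into plain functions and test them with pytest against a local SparkSession; a hedged sketch (the function under test, add_double, is hypothetical):

```python
# test_transforms.py -- hedged sketch of testing notebook logic with pytest.
# add_double is a hypothetical transformation factored out of a notebook.
import pytest
from pyspark.sql import SparkSession
import pyspark.sql.functions as F


def add_double(df):
    """Transformation under test: adds a doubled copy of "value"."""
    return df.withColumn("doubled", F.col("value") * 2)


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").getOrCreate()


def test_add_double(spark):
    df = spark.createDataFrame([(1,), (2,)], ["value"])
    result = {r["value"]: r["doubled"] for r in add_double(df).collect()}
    assert result == {1: 2, 2: 4}
```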
Ray has been getting a lot of traction lately for shining at distributed compute. What are the primary differences between Spark and Ray? In which areas/applications would each be best (e.g., reinforcement learning)? In which cases would it make sense f...
You can use the change data feed feature of Delta tables, as described here: https://docs.databricks.com/delta/delta-change-data-feed.html
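A minimal sketch of the workflow from those docs (the table name is a placeholder, and startingVersion must fall within the range where the feed was enabled):

```python
# Hedged sketch of Delta change data feed; "my_db.events" is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable the feed on an existing table (captures changes from now on).
spark.sql(
    "ALTER TABLE my_db.events SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# Read row-level changes; startingVersion must be within the enabled range.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("my_db.events")
)
changes.show()
```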