Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Databricks Employee
  • 1030 Views
  • 1 reply
  • 0 kudos

Koalas or PySpark

Should I use PySpark's DataFrame API or Koalas? Which one is recommended? Is there any performance impact if I use Koalas, or is it a little slower than the PySpark API?

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

If you are already familiar with pandas and want to leverage Spark for big data, we recommend using Koalas. If you are learning Spark from the ground up, we recommend you start with PySpark's API.
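To illustrate why the pandas background matters, here is the same aggregation written in both APIs. This is a sketch only: it assumes a Databricks runtime where the databricks.koalas package is available, and the file path and column names are made up.

```python
import databricks.koalas as ks
from pyspark.sql import SparkSession, functions as F

# Koalas: pandas-style syntax, executed on Spark under the hood.
kdf = ks.read_csv("/data/sales.csv")  # illustrative path
koalas_summary = kdf[kdf["amount"] > 0].groupby("region")["amount"].sum()

# The equivalent in PySpark's DataFrame API.
spark = SparkSession.builder.getOrCreate()
sdf = spark.read.csv("/data/sales.csv", header=True, inferSchema=True)
pyspark_summary = (sdf.filter(F.col("amount") > 0)
                      .groupBy("region")
                      .agg(F.sum("amount").alias("amount")))
```

Both run on Spark, so the performance of common operations is comparable; the choice is mostly about which syntax your team already knows.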

Anonymous
by Not applicable
  • 1246 Views
  • 1 reply
  • 0 kudos

Photon usage

How do I know how much of a query/job used Photon?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

If you are using Photon on Databricks SQL:
  • Click the Query History icon on the sidebar.
  • Click the line containing the query you’d like to analyze.
  • On the Query Details pop-up, click Execution Details.
  • Look at the Task Time in Photon metric at the bottom.

Anonymous
by Not applicable
  • 3328 Views
  • 1 reply
  • 0 kudos

Malformed Request Error Message

I received the following error when launching a workspace: MALFORMED_REQUEST: Failed storage configuration validation checks: PUT, LIST, DELETE. How do I fix this?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

It looks like the storage configuration validation is failing. Check the S3 bucket policy and confirm the bucket is in the expected region.
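As an illustration, the root-bucket policy needs to grant Databricks the operations that failed validation (PUT, LIST, DELETE). The account ID and bucket name below are placeholders — use the values from your own storage configuration:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GrantDatabricksAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<DATABRICKS-ACCOUNT-ID>:root" },
      "Action": [
        "s3:GetObject",
        "s3:GetBucketLocation",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<my-root-bucket>",
        "arn:aws:s3:::<my-root-bucket>/*"
      ]
    }
  ]
}
```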

Anonymous
by Not applicable
  • 1620 Views
  • 1 reply
  • 0 kudos

Failed E2 Workspace Error Message

My workspace ended up in a FAILED workspace state with one of the following messages:INVALID_STATE: The maximum number of VPCs has been reached.INVALID_STATE: The maximum number of VPC endpoints has been reached.INVALID_STATE: The maximum number of a...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

"The maximum number of xxx has been reached" indicates that you have hit the soft limits for some of the AWS resources in that region. These are mostly soft limits, and you can file a request with the AWS support team to increase them.

Anonymous
by Not applicable
  • 1756 Views
  • 1 reply
  • 0 kudos

E2 Workspace DNS Unreachable

My E2 workspace is in a RUNNING state, but the DNS is unreachable.

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Try deleting the RUNNING workspace, waiting 5-10 minutes, and recreating the same workspace. If that doesn't solve the problem, file a support ticket.

Anonymous
by Not applicable
  • 1644 Views
  • 1 reply
  • 0 kudos

E2 workspace - Error Message Malformed Request : Invalid xxx in the HTTP request body

I received one of the following errors: MALFORMED_REQUEST: Invalid xxx in the HTTP request body or MALFORMED_REQUEST: Invalid xxx in body, where xxx is credentials, storage configurations, networks, etc.

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

It indicates that the input payload is not what the API endpoint for the E2 Accounts API expects. Possible causes include a typo in variable values or JSON formatting issues (missing quotes, etc.).
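A quick way to catch the second class of errors before calling the API is to parse the request body locally. A minimal sketch — the payload fields are illustrative, not the full credentials schema:

```python
import json

# An illustrative credentials payload for an E2 Accounts API call.
valid_body = (
    '{"credentials_name": "my-creds", '
    '"aws_credentials": {"sts_role": '
    '{"role_arn": "arn:aws:iam::123456789012:role/my-role"}}}'
)

# The same payload with a common mistake: the value is missing quotes,
# which makes the body invalid JSON and triggers MALFORMED_REQUEST.
invalid_body = '{"credentials_name": my-creds}'

def validate_json(body):
    """Return the parsed payload, or None if the body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None

print(validate_json(valid_body) is not None)  # parses fine
print(validate_json(invalid_body) is None)    # rejected: unquoted value
```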

Anonymous
by Not applicable
  • 7269 Views
  • 1 reply
  • 1 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 1 kudos

To access the driver log files from the UI, go to the Driver Logs tab on the cluster details page. You can also configure a log delivery location for the cluster; both driver and worker logs are delivered to the location you specify.
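For the second option, log delivery can be set when creating or editing a cluster through the API via the cluster_log_conf field; the DBFS destination below is an example path:

```json
{
  "cluster_log_conf": {
    "dbfs": {
      "destination": "dbfs:/cluster-logs"
    }
  }
}
```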

Anonymous
by Not applicable
  • 1949 Views
  • 1 reply
  • 0 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

When you remove a user from Databricks, a special backup folder is created in the workspace. More details at https://kb.databricks.com/notebooks/get-notebooks-deleted-user.html

Anonymous
by Not applicable
  • 1405 Views
  • 1 reply
  • 0 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

If you have features like "secure cluster connectivity" or "multi-workspace accounts", chances are that the account is E2. I would recommend checking with your Databricks accounts team.

Anonymous
by Not applicable
  • 1736 Views
  • 1 reply
  • 0 kudos

Saving charts in S3

Can you save the charts that you create in Databricks notebooks as an image file to S3? For example, I'm viewing the DataFrame using display(df) and then using the plot button to actually create the graph.

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Would rendering the visualizations via Plotly and saving them to S3 work?
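A sketch of that approach: build the figure with Plotly, export it to PNG bytes (static image export assumes the kaleido package is installed), and upload with boto3. The DataFrame, bucket, and key are all illustrative:

```python
import boto3
import pandas as pd
import plotly.express as px

# Illustrative data standing in for the notebook DataFrame (e.g. df.toPandas()).
df = pd.DataFrame({"region": ["east", "west"], "amount": [10, 20]})
fig = px.bar(df, x="region", y="amount")

# Render the figure to PNG bytes; requires the kaleido package.
png_bytes = fig.to_image(format="png")

# Upload the rendered chart to S3 (bucket and key are placeholders).
s3 = boto3.client("s3")
s3.put_object(Bucket="my-bucket", Key="charts/sales.png", Body=png_bytes)
```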

Anonymous
by Not applicable
  • 2841 Views
  • 1 reply
  • 1 kudos

SQL Formatting

Is there a way for individual users to adjust the SQL formatting defaults?

Latest Reply
User16826994223
Databricks Employee
  • 1 kudos

Do you want a different format than the one Databricks provides when formatting a SQL cell?

User16765131552
by Databricks Employee
  • 2077 Views
  • 2 replies
  • 0 kudos

Resolved! Do Azure Databricks and a Delta Layer make it a Lakehouse?

Even after going through many resources, I have failed to understand what constitutes a lakehouse, hence my question below. If we have Azure Gen 2 Storage, ADF, and Azure Databricks with the possibility of converting the incoming CSV files into Delta ...

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

A lakehouse is a concept defined by the following parameters:
  • Data is stored in an open standard format.
  • Data is stored in a way that supports data science, ML, and BI loads.
Delta is just a way, or engine, on cloud storage that provides control on data and...

1 More Replies
Anonymous
by Not applicable
  • 1950 Views
  • 2 replies
  • 1 kudos

What Databricks Runtime will I have to use if I want to leverage Python 2?

I have some code which is dependent on Python 2. I am not able to use Python 2 with Databricks Runtime 6.0.

Latest Reply
User16826994223
Databricks Employee
  • 1 kudos

When you create a Databricks Runtime 5.5 LTS cluster by using the workspace UI, the default is Python 3. You have the option to specify Python 2. If you use the Databricks REST API to create a cluster using Databricks Runtime 5.5 LTS, the default is ...
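As an illustration, with the REST API the Python version for a Databricks Runtime 5.5 LTS cluster can be pinned through the PYSPARK_PYTHON environment variable (/databricks/python/bin/python selects Python 2 on these runtimes; /databricks/python3/bin/python3 selects Python 3). The cluster name, node type, and worker count below are placeholders:

```json
{
  "cluster_name": "python2-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python/bin/python"
  }
}
```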

1 More Replies
User16826994223
by Databricks Employee
  • 1651 Views
  • 1 reply
  • 0 kudos

How is the ETL process different from a trigger-once stream?

I am a little confused about whether to use a Structured Streaming (trigger once) job or an ETL batch job. Can I get help here on what basis I should make my decision?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

In Structured Streaming, triggers are used to specify how often a streaming query should produce results. A RunOnce trigger will fire only once and then stop the query, effectively running it like a batch job. Now, if your source data is a strea...
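The trigger-once pattern can be sketched as follows (assumes a Spark environment with Delta support; the source and destination paths are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incrementally read whatever is new at the source since the last run...
stream = (spark.readStream
               .format("delta")
               .load("/data/raw/events"))  # illustrative source path

# ...process it once, write the results, and stop - like a batch job, but
# with streaming's checkpoint-based bookkeeping of what was processed.
(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/data/checkpoints/events")
       .trigger(once=True)
       .start("/data/tables/events"))
```

The checkpoint is what distinguishes this from a plain batch job: reruns pick up only unprocessed data instead of reprocessing the whole source.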

User15787040559
by Databricks Employee
  • 4672 Views
  • 1 reply
  • 0 kudos

What's the difference between Normalization and Standardization?

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance).

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Normalization typically means rescaling the values into a range of [0, 1]. Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance). A link which explains this better: https://towardsdatascience.com...
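The two rescalings can be sketched in a few lines of plain Python (the toy data is illustrative):

```python
def normalize(values):
    """Min-max normalization: rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization (z-score): rescale to mean 0 and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

data = [2.0, 4.0, 6.0, 8.0]
print(normalize(data))    # first element 0.0, last element 1.0
print(standardize(data))  # mean 0, standard deviation 1
```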

