Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826987838
by Databricks Employee
  • 1751 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

No, that's not possible. You can set the name that shows up in the Azure portal, though (but only when you're first creating the workspace, I believe).

  • 0 kudos
User16826987838
by Databricks Employee
  • 1531 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Running a job on an existing all-purpose cluster is considered a Jobs-client workload. If you set workload_type.clients.jobs = false on an all-purpose cluster, it cannot be used to run jobs.
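A hedged sketch of where that flag lives: in the cluster spec accepted by the Databricks Clusters API, `workload_type.clients` controls which client types may attach. The cluster name, node type, and sizes below are made-up illustration values:

```json
{
  "cluster_name": "interactive-only",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "workload_type": {
    "clients": {
      "notebooks": true,
      "jobs": false
    }
  }
}
```

With `"jobs": false`, a job pointed at this cluster is rejected, while notebook workloads continue to run.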

  • 0 kudos
aladda
by Databricks Employee
  • 11336 Views
  • 1 reply
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

In Databricks, each cluster creates an initial Spark session, and each notebook creates a Spark subsession within it. A temporary view created in one notebook isn't accessible to others. If you need to share a view across notebooks, use a Global Temporary View instead.

  • 1 kudos
aladda
by Databricks Employee
  • 4581 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Coalesce essentially merges multiple partitions into larger partitions. So use coalesce when you want to reduce the number of partitions (and also the number of tasks) without impacting sort order — for example, when you want to write out a single CSV file instead of multiple part files.

  • 0 kudos
aladda
by Databricks Employee
  • 2715 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

You can disable magic commands for a workspace but not on a per-user basis. Additional details about Workspace ACLs can be found here - https://docs.databricks.com/security/access-control/workspace-acl.html

  • 0 kudos
aladda
by Databricks Employee
  • 8837 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Narrow transformation: all the elements required to compute the records in a single partition live in a single partition of the parent RDD. Examples: select, filter, union. Wide transformation: the elements required to compute the records in a single partition may live in many partitions of the parent RDD, so a shuffle is required. Examples: groupByKey, reduceByKey, join.

  • 0 kudos
aladda
by Databricks Employee
  • 4802 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Spark's execution engine is designed to be lazy. In effect, you first build up your analytics/data-processing request through a series of transformations, which are then executed by an action. Transformations are operations that transform one DataFrame/RDD into another (e.g. select, filter), while actions (e.g. count, collect, write) trigger the actual computation.

  • 0 kudos
aladda
by Databricks Employee
  • 22122 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

%run copies code from another notebook and executes it within the notebook it's called from, so all variables defined in the called notebook are visible to the caller. dbutils.notebook.run() instead executes the other notebook as a separate job in its own scope: variables are not shared, but you can pass parameters in and return a value via dbutils.notebook.exit().

  • 0 kudos
1 More Replies
aladda
by Databricks Employee
  • 77396 Views
  • 2 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

Z-Ordering is a technique to colocate related information in the same set of files. This co-locality is automatically used by Delta Lake on Databricks' data-skipping algorithms to dramatically reduce the amount of data that needs to be read. Syntax for Z-Ordering is OPTIMIZE <table> ZORDER BY (<columns>).
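For example, against a hypothetical Delta table `events` with columns `eventType` and `eventTime` (table and column names here are made up; this runs only where Delta Lake is available):

```sql
-- Rewrite the data layout so rows with similar eventType/eventTime values
-- land in the same files, improving data skipping on those columns
OPTIMIZE events
ZORDER BY (eventType, eventTime)
```

Z-Order by the columns you most often filter on; adding many columns dilutes the benefit for each.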

  • 1 kudos
1 More Replies
aladda
by Databricks Employee
  • 5126 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

1. Download and install the Databricks ODBC driver. 2. Get the hostname, port, and HTTP path as described in the docs – the steps differ slightly between a cluster (DDE) and a SQL endpoint (DSQL). 3. Get a PAT token. 4. Use the curl command to validate the network settings using the...

  • 0 kudos
1 More Replies
User15787040559
by Databricks Employee
  • 4539 Views
  • 1 reply
  • 0 kudos

How can I create from scratch a brand new Dataframe with Null values using spark.createDataFrame()?

from pyspark.sql.types import *

schema = StructType([
    StructField("c1", IntegerType(), True),
    StructField("c2", StringType(), True),
    StructField("c3", StringType(), True)])

df = spark.createDataFrame([(1, "2", None), (3, "4", None)], schema)

Latest Reply
Mooune_DBU
Databricks Employee
  • 0 kudos

Can you try this?

df = spark.createDataFrame(sc.emptyRDD(), schema)

  • 0 kudos
User16826994223
by Databricks Employee
  • 4622 Views
  • 1 reply
  • 1 kudos

Resolved! cluster start Issues

Some of the jobs are failing in prod with the error message below. Can you please check and let us know the reason? These are running on a pool cluster. Run result unavailable: job failed with error message: Unexpected failure while waiting for the...

Latest Reply
Mooune_DBU
Databricks Employee
  • 1 kudos

@Kunal Gaurav, this status code only occurs in one of two conditions: (1) we're able to request the instances for the cluster but can't bootstrap them in time, or (2) we set up the containers on each instance but can't start them in time. This is an edge...

  • 1 kudos
RonanStokes_DB
by Databricks Employee
  • 2190 Views
  • 1 reply
  • 0 kudos
Latest Reply
Mooune_DBU
Databricks Employee
  • 0 kudos

Can you elaborate more on what you mean by "Encoder" (is it a serialization mechanism)? What are the custom data objects? PySpark does support complex and binary formats, as long as you can write your own serializer/deserializer.

  • 0 kudos
User16826987838
by Databricks Employee
  • 2265 Views
  • 2 replies
  • 1 kudos

Prevent file downloads from /files/ URL

I would like to prevent file download via  /files/ URL. For example: https://customer.databricks.com/files/some-file-in-the-filestore.txtIs there a way to do this?

Latest Reply
Mooune_DBU
Databricks Employee
  • 1 kudos

Unfortunately this is not possible from the platform. You can, however, use an external web application firewall (e.g. Akamai) to filter all web traffic to your workspaces; this can block web access used to download root-bucket data.

  • 1 kudos
1 More Replies
jose_gonzalez
by Databricks Employee
  • 4011 Views
  • 1 reply
  • 1 kudos

Resolved! Are there any limitations on my broadcast joins?

I would like to know if there are any broadcast joins limitations.

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Yes, there are a couple of limitations. Please find the details below: > It will not perform a broadcast join if the table has 512 million or more rows. > It will not perform a broadcast join if the table is larger than 8 GB.

  • 1 kudos