Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826987838
by Databricks Employee
  • 1751 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

No, that's not possible. You can set the name that shows up in the Azure portal, though (but only when you're first creating the workspace, I believe).

  • 0 kudos
User16826987838
by Databricks Employee
  • 1531 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Running a job on an existing all-purpose cluster is considered a Jobs-client workload. If you set workload_type.clients.jobs = false on an all-purpose cluster, it cannot be used to run jobs.
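A hedged sketch of where that flag lives: in the cluster spec accepted by the Databricks Clusters API, `workload_type.clients` controls which client types may attach. The cluster name, node type, and sizes below are made-up illustration values:

```json
{
  "cluster_name": "interactive-only",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "workload_type": {
    "clients": {
      "notebooks": true,
      "jobs": false
    }
  }
}
```

With `"jobs": false`, a job pointed at this cluster is rejected, while notebook workloads continue to run.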

  • 0 kudos
aladda
by Databricks Employee
  • 11336 Views
  • 1 reply
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

In Databricks, each cluster creates an initial Spark session, and each notebook creates a Spark subsession within it. A temporary view created in one notebook isn't accessible to others. If you need to share a view across notebooks, use a Global Temporary View instead.

  • 1 kudos
aladda
by Databricks Employee
  • 4581 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Coalesce essentially merges multiple partitions into larger partitions. So use coalesce when you want to reduce the number of partitions (and also the number of tasks) without impacting sort order — for example, when you want to write out a single CSV file instead of multiple part files.

  • 0 kudos
aladda
by Databricks Employee
  • 2715 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

You can disable magic commands for a workspace but not on a per-user basis. Additional details about Workspace ACLs can be found here - https://docs.databricks.com/security/access-control/workspace-acl.html

  • 0 kudos
aladda
by Databricks Employee
  • 8837 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Narrow transformation: all the elements required to compute the records in a single partition live in a single partition of the parent RDD. Examples: select, filter, union. Wide transformation: the elements required to compute the records in a single partition may live in many partitions of the parent RDD, so a shuffle is required. Examples: groupByKey, reduceByKey, join.

  • 0 kudos
aladda
by Databricks Employee
  • 4802 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Spark's execution engine is designed to be lazy. In effect, you first build up your analytics/data-processing request through a series of transformations, which are then executed by an action. Transformations are operations that transform one DataFrame/RDD into another (e.g. select, filter), while actions (e.g. count, collect, write) trigger the actual computation.

  • 0 kudos
aladda
by Databricks Employee
  • 22122 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

%run copies code from another notebook and executes it within the notebook it's called from, so all variables defined in the called notebook are visible to the caller. dbutils.notebook.run() instead executes the other notebook as a separate job in its own scope: variables are not shared, but you can pass parameters in and return a value via dbutils.notebook.exit().

  • 0 kudos
1 More Replies
aladda
by Databricks Employee
  • 77396 Views
  • 2 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

Z-Ordering is a technique to colocate related information in the same set of files. This co-locality is automatically used by Delta Lake on Databricks' data-skipping algorithms to dramatically reduce the amount of data that needs to be read. Syntax for Z-Ordering is OPTIMIZE <table> ZORDER BY (<columns>).
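For example, against a hypothetical Delta table `events` with columns `eventType` and `eventTime` (table and column names here are made up; this runs only where Delta Lake is available):

```sql
-- Rewrite the data layout so rows with similar eventType/eventTime values
-- land in the same files, improving data skipping on those columns
OPTIMIZE events
ZORDER BY (eventType, eventTime)
```

Z-Order by the columns you most often filter on; adding many columns dilutes the benefit for each.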

  • 1 kudos
1 More Replies
aladda
by Databricks Employee
  • 5126 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

1. Download and install the Databricks ODBC driver. 2. Get the hostname, port, and HTTP path as described in the docs – the steps differ slightly between a cluster (DDE) and a SQL endpoint (DSQL). 3. Get a PAT token. 4. Use the curl command to validate the network settings using the...

  • 0 kudos
1 More Replies
User15787040559
by Databricks Employee
  • 4539 Views
  • 1 reply
  • 0 kudos

How can I create from scratch a brand new Dataframe with Null values using spark.createDataFrame()?

from pyspark.sql.types import *

schema = StructType([
    StructField("c1", IntegerType(), True),
    StructField("c2", StringType(), True),
    StructField("c3", StringType(), True)])

df = spark.createDataFrame([(1, "2", None), (3, "4", None)], schema)

Latest Reply
Mooune_DBU
Databricks Employee
  • 0 kudos

Can you try this?

df = spark.createDataFrame(sc.emptyRDD(), schema)

  • 0 kudos
User16826994223
by Databricks Employee
  • 4622 Views
  • 1 reply
  • 1 kudos

Resolved! cluster start Issues

Some of the jobs are failing in prod with the error message below. Can you please check and let us know the reason? These are running on a pool cluster. Run result unavailable: job failed with error message: Unexpected failure while waiting for the...

Latest Reply
Mooune_DBU
Databricks Employee
  • 1 kudos

@Kunal Gaurav, this status code only occurs in one of two conditions: (1) we're able to request the instances for the cluster but can't bootstrap them in time, or (2) we set up the containers on each instance but can't start them in time. This is an edge...

  • 1 kudos
RonanStokes_DB
by Databricks Employee
  • 2190 Views
  • 1 reply
  • 0 kudos
Latest Reply
Mooune_DBU
Databricks Employee
  • 0 kudos

Can you elaborate more on what you mean by "Encoder" (is it a serialization mechanism)? What are the custom data objects? PySpark does support complex and binary formats, as long as you can write your own serializer/deserializer.

  • 0 kudos
User16826987838
by Databricks Employee
  • 2265 Views
  • 2 replies
  • 1 kudos

Prevent file downloads from /files/ URL

I would like to prevent file download via  /files/ URL. For example: https://customer.databricks.com/files/some-file-in-the-filestore.txtIs there a way to do this?

Latest Reply
Mooune_DBU
Databricks Employee
  • 1 kudos

Unfortunately this is not possible from the platform. You can, however, use an external web application firewall (e.g. Akamai) to filter all web traffic to your workspaces; this can block web access used to download root-bucket data.

  • 1 kudos
1 More Replies
jose_gonzalez
by Databricks Employee
  • 4011 Views
  • 1 reply
  • 1 kudos

Resolved! Are there any limitations on my broadcast joins?

I would like to know if there are any broadcast joins limitations.

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Yes, there are a couple of limitations. Please find the details below: > It will not perform a broadcast join if the table has 512 million or more rows. > It will not perform a broadcast join if the table is larger than 8 GB.

  • 1 kudos