Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 1130 Views
  • 1 replies
  • 0 kudos
Latest Reply
amr
Databricks Employee
  • 0 kudos

If your table is very large, combine OPTIMIZE with a WHERE clause so that you optimize only a subset of the data rather than all of it. See the documentation here.
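
As a rough illustration of the advice above, this Python sketch builds an OPTIMIZE statement restricted by a WHERE predicate. The helper name `build_optimize_statement`, the table `events`, and the columns `date` and `eventType` are hypothetical. The sketch assumes the predicate references partition columns and that the resulting string would be passed to `spark.sql` on a cluster.

```python
def build_optimize_statement(table, predicate=None, zorder_by=None):
    """Build a Delta OPTIMIZE statement as a string.

    table     -- table name (hypothetical in the example below)
    predicate -- optional filter, assumed to use partition columns only
    zorder_by -- optional list of column names for ZORDER BY
    """
    parts = [f"OPTIMIZE {table}"]
    if predicate:
        # Restrict compaction to the matching subset of the table.
        parts.append(f"WHERE {predicate}")
    if zorder_by:
        parts.append("ZORDER BY (" + ", ".join(zorder_by) + ")")
    return " ".join(parts)


# Hypothetical usage: optimize only recent partitions.
stmt = build_optimize_statement("events", "date >= '2021-07-01'", ["eventType"])
print(stmt)
# On a cluster this would be run as: spark.sql(stmt)
```

Restricting the statement to recent partitions keeps each run short, because data that was already compacted in earlier runs is not rewritten.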

brickster_2018
by Databricks Employee
  • 4292 Views
  • 1 replies
  • 1 kudos

Z-order or Hilbert curve: which is better?

OPTIMIZE on a Delta table supports two space-filling curve algorithms. Which is better, and which one should I choose for my workload?

Latest Reply
amr
Databricks Employee
  • 1 kudos

The OPTIMIZE ZORDER operation now uses Hilbert space-filling curves by default. This approach provides better clustering characteristics than Z-order in higher dimensions. For Delta tables using OPTIMIZE ZORDER with many columns, Hilbert curves can s...
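
To make the idea of a space-filling curve concrete, here is a minimal Python sketch of a two-dimensional Z-order (Morton) index, computed by interleaving the bits of two coordinates. It only illustrates the general technique; it is not Databricks' implementation, and the function name `morton_index` and the 16-bit default are assumptions made for this example.

```python
def morton_index(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order index.

    Bits of x land on even positions, bits of y on odd positions.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bit i -> position 2i
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bit i -> position 2i + 1
    return z


# Points sorted by their Z-order index trace a "Z" shape through the grid.
cells = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda p: morton_index(*p))
print(cells[:5])
```

In this ordering the cell after (1, 1) is (2, 0), which is not an adjacent cell. A Hilbert curve avoids such jumps because consecutive positions on the curve are always neighbouring cells, which is the usual argument for its better clustering behaviour.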

MoJaMa
by Databricks Employee
  • 3927 Views
  • 1 replies
  • 1 kudos
Latest Reply
amr
Databricks Employee
  • 1 kudos

Yes, Databricks supports instance pools that draw from your reserved instances from Microsoft (provided you have an agreement). Make sure your instances are on-demand to benefit from that. The other way to get cheaper VMs is to use Spot instances, t...

brickster_2018
by Databricks Employee
  • 1931 Views
  • 1 replies
  • 0 kudos
Latest Reply
amr
Databricks Employee
  • 0 kudos

Koalas lets you run your pandas code, which typically runs on a single node, on a cluster of multiple nodes. All you need to do is change the Python import from pandas to Koalas, and you will have code that runs on multiple nodes ...

User16826992666
by Valued Contributor
  • 2548 Views
  • 3 replies
  • 0 kudos

Is it possible to enable encryption in between worker nodes?

I have a security requirement to encrypt all data when it is in transit. I am wondering if there is a setting I can use to enable encryption of the data during shuffles between the worker nodes.

Latest Reply
amr
Databricks Employee
  • 0 kudos

Inter-node encryption is a requirement for HIPAA compliance. Reach out to your account management team and ask them for HIPAA-compliant shards.

2 More Replies
User16826994223
by Honored Contributor III
  • 4178 Views
  • 1 replies
  • 2 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 2 kudos

Generated columns are a special type of column whose values are automatically generated based on a user-specified function over other columns in the Delta table. You can use most built-in SQL functions to generate the values of these generated columns. For examp...

User16826994223
by Honored Contributor III
  • 1601 Views
  • 0 replies
  • 0 kudos

Some of the limitations I see in the Photon docs as of July 2021: Works on Delta and Parquet tables only, for both read and write. Does not suppor...

Some of the limitations I see in the Photon docs as of July 2021: Works on Delta and Parquet tables only, for both read and write. Does not support the following data types: Map, Array. Does not support window and sort operators. Does not support Spark S...

User16826994223
by Honored Contributor III
  • 2561 Views
  • 1 replies
  • 1 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

# providing a starting version
spark.readStream.format("delta") \
  .option("readChangeFeed", "true") \
  .option("startingVersion", 0) \
  .table("myDeltaTable")

# providing a starting timestamp
spark.readStream.format("delta") \
  .option("readCh...

User16826994223
by Honored Contributor III
  • 2240 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

You must explicitly enable the change data feed option using one of the following methods: New table: Set the table property delta.enableChangeDataFeed = true in the CREATE TABLE command. CREATE TABLE student (id INT, name STRING, age INT) TBLPROPERTIE...
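
For a table that already exists, the same property can be set with ALTER TABLE. The Python sketch below only builds that statement as a string. The helper name `enable_cdf_statement` is hypothetical, the table name is borrowed from the streaming example elsewhere in this listing, and the existing-table form is an assumption based on standard Delta Lake SQL rather than something shown in the truncated excerpt above.

```python
def enable_cdf_statement(table):
    """Return an ALTER TABLE statement that turns on the change data feed.

    Assumes standard Delta Lake SQL; `table` is a caller-supplied name.
    """
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


stmt = enable_cdf_statement("myDeltaTable")
print(stmt)
# On a cluster this would be run as: spark.sql(stmt)
```

Only changes made after the property is set are recorded, so earlier table versions cannot be read as a change feed.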

User16826994223
by Honored Contributor III
  • 1729 Views
  • 1 replies
  • 1 kudos

Resolved! prerequisite for SCIM provisioning

Hi team, I want to know the prerequisites for SCIM provisioning in Azure.

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Your Azure Databricks account must have the Azure Databricks Premium Plan. Your Azure Active Directory account must be a Premium edition account. You must be a global administrator for the Azure Active Directory account.

User16826994223
by Honored Contributor III
  • 1406 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Design to honor API and other limits of the platform:
  • Max API calls/hr = 1500
  • Jobs per hour per workspace = 1000
  • Maximum concurrent notebooks per cluster = 145
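
One way a client could stay under an hourly call limit such as the 1500 calls/hr figure above is a sliding-window limiter. The following is a minimal sketch under stated assumptions, not a Databricks API: the class name `HourlyRateLimiter` and its parameters are hypothetical, and the clock is injectable only so the behaviour can be checked without waiting an hour.

```python
import time
from collections import deque


class HourlyRateLimiter:
    """Allow at most `max_calls` calls within any `window_seconds` window."""

    def __init__(self, max_calls=1500, window_seconds=3600.0, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls = deque()  # timestamps of calls still inside the window

    def try_acquire(self):
        """Record a call and return True if it fits the limit, else False."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._window:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            self._calls.append(now)
            return True
        return False


# Small demonstration with a fake clock and a tiny limit.
fake_now = [0.0]
limiter = HourlyRateLimiter(max_calls=2, window_seconds=10.0, clock=lambda: fake_now[0])
results = [limiter.try_acquire() for _ in range(3)]  # third call is rejected
fake_now[0] = 10.0                                   # window has fully elapsed
results.append(limiter.try_acquire())
print(results)
```

A caller would check `try_acquire()` before each REST request and back off or queue the request when it returns False.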

