Data Engineering

Forum Posts

by User16826994223 • Databricks Employee

06-17-2021 8:38:12 AM

1490 Views
1 replies
0 kudos

I understand Spark Streaming uses micro-batching. Does this increase latency?

Latest Reply

User16826994223
Databricks Employee

06-17-2021 8:38:31 AM

0 kudos

While Spark does use a micro-batch execution model, this does not have much impact on applications, because the batches can be as short as 0.5 seconds. In most applications of streaming big data, the analytics is done over a larger window (say 10 min...
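To put the numbers above in proportion, here is a small back-of-the-envelope sketch. The 0.5 second batch interval comes from the reply; the 10-minute window is an assumption based on the truncated example, and the function name is hypothetical.

```python
def batch_share_of_window(batch_interval_s: float, window_s: float) -> float:
    """Return the micro-batch interval as a fraction of the analytics window."""
    if batch_interval_s <= 0 or window_s <= 0:
        raise ValueError("intervals must be positive")
    return batch_interval_s / window_s


# 0.5 s batches (from the reply) against an assumed 10-minute window.
share = batch_share_of_window(0.5, 10 * 60)
print(f"batch interval is {share:.4%} of the window")
```

Under these assumptions the batch interval is a very small fraction of the window, which is the reply's point about windowed analytics.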

by User16826994223 • Databricks Employee

06-17-2021 7:51:43 AM

792 Views
0 replies
0 kudos

Why Unity Catalog?
Fine-grained permissions: Unity Catalog can enforce permissions for data at the row, column or view level instead of the file level, so that you can always share just part of your data with a new user without copying it.
An open, ...

by User16826994223 • Databricks Employee

06-17-2021 12:28:38 AM

2482 Views
1 replies
0 kudos

Resolved! What join hints are available in Spark 3.0, and how do they help compared to previous Spark versions?

Latest Reply

Srikanth_Gupta_
Databricks Employee

06-17-2021 7:38:57 AM

0 kudos

There are 4 types of join hints in Spark 3.0: BROADCAST, MERGE, SHUFFLE_HASH, SHUFFLE_REPLICATE_NL.
It may be a good idea to enable Adaptive Query Execution, which speeds up Spark SQL joins at run time.
In Spark 3.0, Adaptive Query Execution comes with the features below:
Dynamica...
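As a rough illustration of how the four hints are written in Spark SQL, the sketch below builds a hinted join statement as a string. The helper name, table names, and join key are hypothetical; only the hint names and the `/*+ HINT(table) */` comment syntax come from Spark itself.

```python
# The four join strategy hints named in the reply.
JOIN_HINTS = ("BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL")

# Spark setting that turns on Adaptive Query Execution.
AQE_CONF = {"spark.sql.adaptive.enabled": "true"}


def hinted_join_sql(hint: str, left: str, right: str, key: str) -> str:
    """Build an equi-join that asks the planner to apply `hint` to `right`."""
    hint = hint.upper()
    if hint not in JOIN_HINTS:
        raise ValueError(f"unknown join hint: {hint}")
    return (
        f"SELECT /*+ {hint}({right}) */ * "
        f"FROM {left} JOIN {right} ON {left}.{key} = {right}.{key}"
    )


# Hypothetical tables: ask Spark to broadcast the smaller `customers` table.
print(hinted_join_sql("broadcast", "orders", "customers", "customer_id"))
```

The resulting string could be passed to `spark.sql(...)`. A hint is a request, not a guarantee, so the planner may still choose a different strategy when the hinted one is not applicable.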

by User16826994223 • Databricks Employee

06-17-2021 6:17:45 AM

2187 Views
1 replies
0 kudos

How is the Photon engine different from the Catalyst optimizer?

Latest Reply

User16826994223
Databricks Employee

06-17-2021 6:19:23 AM

0 kudos

I got this question from some customers and I want to clarify here too. I think we are conflating two things: the Catalyst optimizer is about coming up with the "steps to take to execute the query". For example, the optimizer will decide how and when to do the join...

by User16826994223 • Databricks Employee

06-17-2021 1:46:26 AM

721 Views
0 replies
0 kudos

databricks.com

Can I create a Delta Lake table on Databricks and query it with open-source Spark?
Yes. To do this, you would install open-source Spark and Delta Lake; both are open source. Delta Engine, which is only available on Databricks, will make delta...
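For the open-source side, a minimal sketch of the session settings usually needed to read Delta tables is shown below. The two configuration keys are the ones Delta Lake documents for open-source Spark; the helper name is hypothetical, and the sketch assumes the Delta Lake package matching your Spark version is already on the classpath.

```python
# Settings that register Delta Lake's SQL extension and catalog in open-source Spark.
DELTA_OSS_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def configure_for_delta(builder):
    """Apply the Delta settings to a SparkSession builder and return it."""
    for key, value in DELTA_OSS_CONF.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this would be used as `configure_for_delta(SparkSession.builder).getOrCreate()`, after which `spark.read.format("delta").load(path)` reads the table.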

by User16826994223 • Databricks Employee

06-17-2021 12:48:37 AM

719 Views
0 replies
0 kudos

Will the data scientist job profile be relevant in the future?
Seeing current features in Databricks like AutoML, I am assuming that the data scientist job will be mostly automated and sooner the data scientist in the company will start decl...

by User16826994223 • Databricks Employee

06-17-2021 12:26:20 AM

853 Views
0 replies
0 kudos

Spark 3.0 Pandas UDF
Old vs New Pandas UDF interface
This slide shows the difference between the old and the new interface. The same here. The new interface can also be used for the existing Grouped Aggregate Pandas UDFs. In addition, the old Pandas U...

by User16826994223 • Databricks Employee

06-17-2021 12:23:41 AM

1032 Views
0 replies
0 kudos

Cluster Sizes on DB SQL

Cluster size    Driver size     Worker count
2X-Small        i3.2xlarge      1
X-Small         i3.2xlarge      2
Small           i3.4xlarge      4
Medium          i3.8xlarge      8
Large           i3.8xlarge      16
X-Large         i3.16xlarge     32
2X-Large        i3.16xlarge...

by User16826994223 • Databricks Employee

06-17-2021 12:17:51 AM

1341 Views
0 replies
0 kudos

Multi-Cluster Load Balancing
Multi-cluster Load Balancing: the minimum and maximum number of clusters over which queries sent to the endpoint are distributed. The default is Off with a maximum of 1 cluster. When set to On, the default is minimum 1 clus...

by User16826992666 • Databricks Employee

06-16-2021 8:32:28 PM

939 Views
0 replies
0 kudos

When would I want to change the Isolation Level of my Delta table?

I read this article in the docs about isolation levels, but I am not sure if I should be specifying this for my Delta tables. What situation would I want to change the isolation level from the default?

by User16826992666 • Databricks Employee

06-16-2021 1:24:00 PM

2757 Views
1 replies
0 kudos

Can you restrict the type of clusters users are allowed to create?

I would like to make it so users can only create job clusters and not interactive clusters. Is it possible to do this in a workspace?

Latest Reply

User16826992666
Databricks Employee

06-16-2021 1:26:05 PM

0 kudos

This can be accomplished with cluster policies. You can use a policy similar to this example to restrict certain users or groups to only have permission to create job clusters.
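The example the reply links to is not reproduced here, so the following is only a sketch of what such a policy might look like. It assumes the `cluster_type` attribute of the cluster policy definition format with a fixed value of `job`; the constant and helper names are hypothetical.

```python
import json

# Policy definition that pins the cluster type to job clusters.
# Assumption: `cluster_type` accepts "job" as a fixed value.
JOB_ONLY_POLICY = {
    "cluster_type": {"type": "fixed", "value": "job"},
}


def policy_definition_json(policy: dict) -> str:
    """Serialize a policy definition to the JSON text a policy expects."""
    return json.dumps(policy, indent=2)


print(policy_definition_json(JOB_ONLY_POLICY))
```

For the restriction to take effect, the affected users or groups would presumably need permission to use this policy only, without the unrestricted cluster creation entitlement.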

by jose_gonzalez • Databricks Employee

06-16-2021 11:35:56 AM

32845 Views
1 replies
0 kudos

Resolved! What's the difference between mode("append") and mode("overwrite") on my Delta table

I would like to know the difference between .mode("append") and .mode("overwrite") when writing my Delta table

Latest Reply

jose_gonzalez
Databricks Employee

06-16-2021 11:37:23 AM

0 kudos

Mode "append" atomically adds new data to an existing Delta table and "overwrite" atomically replaces all of the data in a table.
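A minimal sketch of the two modes in a DataFrame write follows. The helper name and its `replace_existing` flag are hypothetical; `df` is assumed to be a Spark DataFrame and `path` a location where the Delta table lives.

```python
def write_delta(df, path: str, replace_existing: bool = False) -> None:
    """Write the Spark DataFrame `df` to the Delta table at `path`.

    replace_existing=False uses mode "append": new rows are added atomically.
    replace_existing=True uses mode "overwrite": all table data is replaced atomically.
    """
    mode = "overwrite" if replace_existing else "append"
    df.write.format("delta").mode(mode).save(path)
```

Calling `write_delta(df, path)` twice with the same DataFrame would leave two copies of its rows in the table, while `write_delta(df, path, replace_existing=True)` would leave exactly one.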

by jose_gonzalez • Databricks Employee

06-16-2021 11:32:17 AM

3487 Views
1 replies
0 kudos

Resolved! Where does the schema for a Delta table set reside?

I would like to know where I can find the current schema information for my Delta table.

Latest Reply

jose_gonzalez
Databricks Employee

06-16-2021 11:33:15 AM

0 kudos

The table name, path, and database info are stored in the Hive metastore. The actual schema is stored in the "_delta_log" directory, which should be in the root path where your Delta table is stored.
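To make the "_delta_log" part concrete, here is a small sketch that pulls the schema out of the JSON commit files. It assumes the table root is reachable as a local filesystem path and that the schema is recorded in a `metaData` action's `schemaString` field; Parquet checkpoint files are ignored, so it can return nothing if older commits have been cleaned up. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def latest_schema_string(table_root: str) -> Optional[str]:
    """Return the schemaString of the newest metaData action in the JSON commits."""
    log_dir = Path(table_root) / "_delta_log"
    schema = None
    # Commit file names are zero-padded versions, so sorted order is commit order.
    for commit in sorted(log_dir.glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain commit file
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)  # one action per line
                if "metaData" in action:
                    schema = action["metaData"].get("schemaString", schema)
    return schema
```

In day-to-day use, reading the table with Spark and calling `printSchema()` on the DataFrame is the simpler route; the sketch only shows where that information physically lives.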

by jose_gonzalez • Databricks Employee

06-16-2021 11:27:27 AM

7935 Views
1 replies
0 kudos

Resolved! How can I read a specific Delta table part file?

Is there a way to read a specific part of a Delta table? When I try to read the Parquet file as Parquet, I get an error in the notebook saying that I’m using the incorrect format because it’s part of a Delta table. I just want to read a single Parquet file, not ...

Latest Reply

jose_gonzalez
Databricks Employee

06-16-2021 11:29:17 AM

0 kudos

To disable the Delta format check and read the file as Parquet, set the following Spark setting to false:
>> SET spark.databricks.delta.formatCheck.enabled=false
OR
>> spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")
It's not recommended to re...
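One way to keep the change contained is to switch the check off only for the single read and restore it afterwards. The sketch below assumes `spark` is an active SparkSession on Databricks and that the check is applied when the DataFrame is created; the helper name is hypothetical.

```python
FORMAT_CHECK = "spark.databricks.delta.formatCheck.enabled"


def read_part_file_as_parquet(spark, part_file_path: str):
    """Read one Parquet file that lives inside a Delta table directory."""
    previous = spark.conf.get(FORMAT_CHECK, "true")
    spark.conf.set(FORMAT_CHECK, "false")
    try:
        return spark.read.parquet(part_file_path)
    finally:
        # Restore the safety check so later reads are validated again.
        spark.conf.set(FORMAT_CHECK, previous)
```

Keep in mind that a single part file may contain rows that are no longer part of the current table version, so this is better suited to debugging than to regular queries.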

by jose_gonzalez • Databricks Employee

06-16-2021 11:17:46 AM

3752 Views
1 replies
0 kudos

Resolved! Should I run ANALYZE TABLE on Delta tables?

I would like to know if it is recommended to run ANALYZE TABLE on Delta tables or not. If not, why?

Latest Reply

jose_gonzalez
Databricks Employee

06-16-2021 11:19:44 AM

0 kudos

You can run ANALYZE TABLE on Delta tables only on Databricks Runtime 8.3 and above. For more details please refer to the docs: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-aux-analyze-table.html
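For reference, a small sketch of the statement itself follows. The helper and the table name are hypothetical; the `COMPUTE STATISTICS` and `FOR ALL COLUMNS` clauses are part of the ANALYZE TABLE syntax described in the linked docs.

```python
def analyze_table_sql(table: str, all_columns: bool = False) -> str:
    """Build an ANALYZE TABLE statement for `table`."""
    stmt = f"ANALYZE TABLE {table} COMPUTE STATISTICS"
    if all_columns:
        # Also collect column-level statistics, not just table-level ones.
        stmt += " FOR ALL COLUMNS"
    return stmt


# Hypothetical table name.
print(analyze_table_sql("sales.events", all_columns=True))
```

The string can be run with `spark.sql(...)` in a notebook, on a runtime that supports ANALYZE TABLE for Delta tables as noted above.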
