- 7163 Views
- 5 replies
- 3 kudos
Sync production data into the test environment
Hello, I have a database called sales which contains several Delta tables and views in both the production and test workspaces. But the data is not synced, because some people develop code in the test workspace. As time passed, both the data and the tables i...
Hi @zyang, To sync data and tables/views between production and test workspaces in Azure, the recommended approach is to use the Databricks Sync (DBSync) project, which is an object synchronization tool that backs up, restores, and syncs Databricks ...
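Besides DBSync, a common building block for refreshing test tables from production is Delta's DEEP CLONE, which copies data and metadata and syncs incrementally when re-run. A minimal sketch, assuming both environments can reach the same metastore/storage; the catalog and schema names are placeholders, and views would need their DDL re-created separately:

```python
# Sketch: refresh test tables from production with Delta DEEP CLONE.
# prod.sales / test.sales are placeholder catalog.schema names.
tables = [r.tableName for r in spark.sql("SHOW TABLES IN prod.sales").collect()]

for t in tables:
    # DEEP CLONE copies data + metadata; re-running it syncs incrementally.
    # Note: SHOW TABLES also lists views, which cannot be cloned this way.
    spark.sql(f"""
        CREATE OR REPLACE TABLE test.sales.{t}
        DEEP CLONE prod.sales.{t}
    """)
```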
- 732 Views
- 0 replies
- 0 kudos
Feature Request: GUI: Additional Collapse options
When you're using a very large notebook, sometimes it gets frustrating scrolling through all the code blocks. It would be nice to have a few additional options to make this easier: 1) Add a collapse-all-code-cells button to the top. 2) Add a collapse a...
- 2390 Views
- 2 replies
- 2 kudos
Resolved! Confirmation that Ingestion Time Clustering is applied
The article on Ingestion Time Clustering mentions that "Ingestion Time Clustering is enabled by default on Databricks Runtime 11.2"; however, how can I confirm it is active for my table? For example, is there a True/False "Ingestion Time Clustered" fl...
Thanks @NandiniN, that was very helpful. I have 3 follow-up questions: If I already have a table (350 GB) that has been partitioned by 3 columns (Year, Month, Day) and stored hive-style with subdirectories Year=X/Month=Y/Day=Z, can I read it in...
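There is no explicit True/False flag, but since Ingestion Time Clustering applies to unpartitioned Delta tables on DBR 11.2+, one indirect check is whether the table has partition columns. A sketch using DESCRIBE DETAIL; the table name is a placeholder:

```python
# Ingestion Time Clustering applies to unpartitioned Delta tables on
# DBR 11.2+, so an empty partitionColumns list is a necessary condition.
detail = spark.sql("DESCRIBE DETAIL my_catalog.my_schema.my_table").collect()[0]

if detail["partitionColumns"]:
    print("Partitioned by", detail["partitionColumns"],
          "- Ingestion Time Clustering does not apply")
else:
    print("Unpartitioned Delta table - eligible for Ingestion Time Clustering")
```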
- 1651 Views
- 1 replies
- 1 kudos
Resolved! Photon and UDF efficiency
When using a JVM engine, Scala UDFs have an advantage over Python UDFs because data doesn't have to be shifted out to the Python environment for processing. If I understand the implications of using the Photon C++ engine, any processing that needs to...
Photon does not support UDFs (https://learn.microsoft.com/en-us/azure/databricks/runtime/photon#limitations), so when you create a UDF, Photon will not be used.
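One practical consequence: rewriting a UDF in terms of built-in functions keeps the plan eligible for Photon. A small self-contained sketch; the column and data are made up:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("  Alice ",), ("BOB",)], ["name"])

# A Python UDF such as udf(lambda s: s.strip().lower()) would force rows
# out of Photon's vectorized engine; the built-in equivalents do not:
df = df.withColumn("name_norm", F.lower(F.trim(F.col("name"))))
df.show()
```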
- 459 Views
- 0 replies
- 0 kudos
Structured Streaming and Workspace Max Jobs
From the documentation: A workspace is limited to 1000 concurrent task runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately. The number of jobs a workspace can create in an hour is limited to 10000 (i...
- 1005 Views
- 2 replies
- 0 kudos
Plot number of abandoned cart items by product
abandoned_carts_df = (email_carts_df.filter(col('converted') == False).filter(col('cart').isNotNull()))
display(abandoned_carts_df)
abandoned_items_df = (abandoned_carts_df.select(col("cart").alias("items")).groupBy("items").count())
display(abandoned_...
Hi @SSV_dataeng, try:
abandoned_items_df = (abandoned_carts_df.withColumn("items", explode("cart")).groupBy("items").count().sort("items"))
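For reference, a self-contained version of that suggestion with toy data; the schema is made up to mirror the thread:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for email_carts_df: a converted flag plus an array-typed cart.
email_carts_df = spark.createDataFrame(
    [(False, ["apple", "pen"]), (False, ["pen"]), (True, ["apple"]), (False, None)],
    ["converted", "cart"],
)

abandoned_carts_df = (email_carts_df
                      .filter(col("converted") == False)
                      .filter(col("cart").isNotNull()))

# explode() turns each cart array into one row per item before counting.
abandoned_items_df = (abandoned_carts_df
                      .withColumn("items", explode("cart"))
                      .groupBy("items").count()
                      .sort("items"))

abandoned_items_df.show()  # pass to display() in a notebook to plot
```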
- 1386 Views
- 4 replies
- 0 kudos
write to Delta
spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")
products_output_path = DA.paths.working_dir + "/delta/products"
products_df.write.format("delta").save(products_output_path)
verify_files = dbutils.fs.ls(products_ou...
Hi @SSV_dataeng, please check with this (you would have to indent it correctly for Python):
productsOutputPath = DA.workingDir + "/delta/products"
(productsDF.write.format("delta").mode("overwrite").save(productsOutputPath))
verify_files = dbutils.fs.ls(...
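A runnable sketch of the suggested fix, assuming a Databricks notebook (or a local session with Delta Lake configured); a /tmp path stands in for the course's DA.paths.working_dir:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

products_df = spark.createDataFrame([(1, "widget"), (2, "gadget")], ["id", "name"])

# Placeholder for DA.paths.working_dir from the course setup.
products_output_path = "/tmp/delta/products"

(products_df.write
    .format("delta")
    .mode("overwrite")   # avoids "path already exists" errors on re-runs
    .save(products_output_path))

# Verify the write by reading the table back (or dbutils.fs.ls in a notebook).
print(spark.read.format("delta").load(products_output_path).count())
```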
- 3848 Views
- 4 replies
- 1 kudos
Can I change Service Principal's OAuth token's expiration date?
Hi, since I have to read from a Databricks table from an external API, I created a Service Principal that would start a cluster and perform the operation. To authenticate the request on behalf of the Service Principal, I generate the OAuth token followi...
Hello @marchino, please check whether this is of interest: https://kb.databricks.com/en_US/security/set-an-unlimited-lifetime-for-service-principal-access-token
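For reference, the Token Management API can mint a token on behalf of a service principal with an explicit lifetime; the linked KB article covers the unlimited-lifetime case. A sketch, assuming an admin personal access token; the host, tokens, and application ID are placeholders:

```python
import requests

HOST = "https://<workspace>.azuredatabricks.net"   # placeholder
ADMIN_TOKEN = "<admin-pat>"                        # placeholder
SP_APPLICATION_ID = "<service-principal-app-id>"   # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
    json={
        "application_id": SP_APPLICATION_ID,
        "comment": "token for external API reads",
        "lifetime_seconds": 7776000,  # ~90 days; see the KB for unlimited
    },
)
resp.raise_for_status()
print(resp.json()["token_value"])
```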
- 2173 Views
- 3 replies
- 1 kudos
Data lineage on views
I do not know if this is intended behavior of data lineage, but to me it is weird. When I create a view based on two tables, the data lineage upstream looks correct. But when I replace the view to only use one of the tables, then data lineage upstream ...
After some thought, I have come to this conclusion: data lineage on views is working as one should expect. I strongly recommend that this feature be redesigned so it shows the result of the latest view.
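For anyone wanting to reproduce the behavior, a minimal repro sketch; the table and view names are placeholders:

```python
# Create a view over two tables, then replace it to use only one.
spark.sql("""
    CREATE OR REPLACE VIEW demo.v AS
    SELECT * FROM demo.t1 JOIN demo.t2 USING (id)
""")
# Upstream lineage for demo.v should now show t1 and t2.

spark.sql("CREATE OR REPLACE VIEW demo.v AS SELECT * FROM demo.t1")
# After the replace, check whether upstream lineage still lists t2.
```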
- 4376 Views
- 3 replies
- 0 kudos
Iterative reads and writes cause java.lang.OutOfMemoryError: GC overhead limit exceeded
I have an iterative algorithm which reads and writes a dataframe, iterating through a list of new partitions, like this:
for p in partitions_list:
    df = spark.read.parquet("adls_storage/p")
    df.write.format("delta").mode("overwrite").option("partitionOver...
@daniel_sahal I've attached the wrong snip. Actually it is Full GC (Ergonomics) which was bothering me. Now I am attaching the correct snip. But as you said, I scaled up a bit. The thing I forgot to mention is that the table is wide - more than 300 column...
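For context, the loop pattern in question looks roughly like the sketch below. The truncated option in the thread is presumably partitionOverwriteMode (an assumption), and the paths and partition list are placeholders; dropping references between iterations can ease the GC pressure described:

```python
# Sketch of the iterative overwrite pattern from the thread.
# Assumption: the truncated .option(...) is partitionOverwriteMode.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

partitions_list = ["p1", "p2"]  # placeholder

for p in partitions_list:
    df = spark.read.parquet(f"adls_storage/{p}")  # placeholder path
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("partitionOverwriteMode", "dynamic")
       .save(f"delta_out/{p}"))  # placeholder destination
    # Dropping references between iterations lets the JVM reclaim memory
    # that otherwise accumulates and triggers "GC overhead limit exceeded".
    del df
    spark.catalog.clearCache()
```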
- 1963 Views
- 1 replies
- 3 kudos
Resolved! Using DeltaTable.merge() and generating surrogate keys on insert?
I'm using merge to upsert data into a table:
DeltaTable.forName(DESTINATION_TABLE).as("target")
    .merge(merge_df.as("source"), "source.topic = target.topic and source.key = target.key")
    .whenMatched().updateAll()
    .whenNotMatched().insertAll()
    .execute()
Id ...
@Dekova 1) uuid() is non-deterministic, meaning that it will give you a different result each time you run the function. 2) Per the documentation: "For Databricks Runtime 9.1 and above, MERGE operations support generated columns when you set spark.databri...
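Given those two points, one commonly used alternative for stable surrogate keys is a Delta identity column, which is populated automatically on insert. A sketch, assuming DBR 10.4+ for identity-column support; the table name, columns, and toy source data are placeholders:

```python
from delta.tables import DeltaTable

# Identity column generates the surrogate key on insert (DBR 10.4+).
spark.sql("""
    CREATE TABLE IF NOT EXISTS target_tbl (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        topic STRING,
        key STRING,
        payload STRING
    ) USING DELTA
""")

merge_df = spark.createDataFrame([("t1", "k1", "v1")], ["topic", "key", "payload"])

(DeltaTable.forName(spark, "target_tbl").alias("target")
    .merge(merge_df.alias("source"),
           "source.topic = target.topic AND source.key = target.key")
    .whenMatchedUpdate(set={"payload": "source.payload"})
    # Insert columns explicitly, omitting id so the identity value
    # is generated by Delta rather than supplied by the source.
    .whenNotMatchedInsert(values={
        "topic": "source.topic",
        "key": "source.key",
        "payload": "source.payload",
    })
    .execute())
```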
- 3591 Views
- 4 replies
- 1 kudos
Databricks Job Failure + ServiceNow Integration
Hi Team, could you please suggest how to raise a ServiceNow ticket in case of a Databricks job failure? Regards, Phanindra
Hi @Phani1, You can use the webhook method to integrate Databricks job failure notifications with ServiceNow. This allows Databricks to send an HTTP POST request (webhook) to a designated endpoint in ServiceNow whenever a job fails. By doing so, you ...
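Concretely, Jobs API 2.1 job settings accept a webhook_notifications block that fires on failure; the referenced destination ID comes from a notification destination an admin has pointed at a ServiceNow inbound endpoint. A sketch with placeholder host, token, job ID, and destination ID:

```python
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<pat>"                                   # placeholder

# The destination ID references a webhook notification destination that an
# admin has configured to POST to a ServiceNow inbound REST endpoint.
payload = {
    "job_id": 12345,  # placeholder
    "new_settings": {
        "webhook_notifications": {
            "on_failure": [{"id": "<notification-destination-id>"}]
        }
    },
}

resp = requests.post(f"{HOST}/api/2.1/jobs/update",
                     headers={"Authorization": f"Bearer {TOKEN}"},
                     json=payload)
resp.raise_for_status()
```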
- 2691 Views
- 4 replies
- 0 kudos
Import dbfs file into workspace using Python SDK
Hello, I am looking to replicate the functionality provided by the databricks_cli Python package using the Python SDK. Previously, using the databricks_cli WorkspaceApi object, I could use the import_workspace or import_workspace_dir methods to move a...
I am also looking for a way to bring files present in S3 into the Workspace programmatically.
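With the newer databricks-sdk package, the WorkspaceClient exposes upload/download methods that cover the old import_workspace use case. A sketch; the DBFS and workspace paths are placeholders, and auth is assumed to come from the environment:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat

w = WorkspaceClient()  # picks up auth from env vars or .databrickscfg

# Read a file out of DBFS (placeholder path)...
with w.dbfs.download("/tmp/source/notebook.py") as f:
    content = f.read()

# ...and import it into the workspace, like the old import_workspace.
w.workspace.upload(
    "/Users/someone@example.com/notebook.py",  # placeholder
    content,
    format=ImportFormat.AUTO,
    overwrite=True,
)
```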
- 547 Views
- 0 replies
- 0 kudos
Big time differences in reading tables
When I read a managed table in #databricks# I can see big differences in time spent. A small test table with just 2 records is loaded once in 3 seconds and another time in 30 seconds. Reading table_change for this tiny table took 15 minutes. Don't know ...
- 1758 Views
- 2 replies
- 4 kudos
Resolved! Is there a plan to support workflow jobs to be stored in a subfolder?
I have many workflow jobs created and they are all in a flat list. Is there a way to create (kind of) subfolders so that I can categorize my Databricks workflow jobs (a kind of organizer)...
@Anonymous thanks for the suggestion. And thanks @Vinay_M_R a lot for answering the question. The solution mentioned is doable but a less optimized way to do it. Everyone in the team has to follow the same rules, especially for shared jobs, and sometimes n...