Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Forum Posts

Oliver_Angelil
by Valued Contributor II
  • 1840 Views
  • 2 replies
  • 2 kudos

Resolved! Confirmation that Ingestion Time Clustering is applied

The article on Ingestion Time Clustering mentions that "Ingestion Time Clustering is enabled by default on Databricks Runtime 11.2"; however, how can I confirm it is active for my table? For example, is there a True/False "Ingestion Time Clustered" fl...

Latest Reply
Oliver_Angelil
Valued Contributor II
  • 2 kudos

Thanks @NandiniN, that was very helpful. I have 3 follow-up questions: If I already have a table (350GB) that has been partitioned by 3 columns: Year, Month, Day, and stored Hive-style with subdirectories Year=X/Month=Y/Day=Z, can I read it in...
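
One hedged way to approximate the check asked about in the original question (table name is a placeholder, and this assumes a Delta table written on DBR 11.2+): Ingestion Time Clustering only applies to unpartitioned tables with no explicit clustering, so empty partitionColumns in DESCRIBE DETAIL is the main signal to look for.

# Hedged sketch: inspect table metadata; Ingestion Time Clustering applies only
# when the table is unpartitioned and not explicitly clustered/Z-ordered.
detail = spark.sql("DESCRIBE DETAIL my_schema.my_table").collect()[0]
print(detail["partitionColumns"])  # expected: [] for an ITC-eligible table
print(detail["numFiles"], detail["sizeInBytes"])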

1 More Replies
Dekova
by New Contributor II
  • 1219 Views
  • 1 reply
  • 1 kudos

Resolved! Photon and UDF efficiency

When using a JVM engine, Scala UDFs have an advantage over Python UDFs because data doesn't have to be shifted out to the Python environment for processing. If I understand the implications of using the Photon C++ engine, any processing that needs to...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Photon does not support UDFs (see https://learn.microsoft.com/en-us/azure/databricks/runtime/photon#limitations), so when creating a UDF, Photon will not be used.
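
A small illustrative sketch of the practical consequence (the DataFrame and column names below are made up): the UDF version forces execution out of Photon, while the built-in equivalent stays eligible.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Python UDF: Photon cannot run this, so the stage falls back to the standard engine.
normalize = F.udf(lambda s: s.strip().lower() if s else None, StringType())
df_udf = df.withColumn("name_norm", normalize("name"))

# Built-in functions: remain eligible for Photon execution.
df_builtin = df.withColumn("name_norm", F.lower(F.trim("name")))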

Dekova
by New Contributor II
  • 333 Views
  • 0 replies
  • 0 kudos

Structured Streaming and Workspace Max Jobs

From the documentation: A workspace is limited to 1000 concurrent task runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately. The number of jobs a workspace can create in an hour is limited to 10000 (i...

SSV_dataeng
by New Contributor II
  • 717 Views
  • 2 replies
  • 0 kudos

Plot number of abandoned cart items by product

abandoned_carts_df = (email_carts_df
    .filter(col('converted') == False)
    .filter(col('cart').isNotNull()))
display(abandoned_carts_df)

abandoned_items_df = (abandoned_carts_df
    .select(col("cart").alias("items"))
    .groupBy("items")
    .count())
display(abandoned_...

(Attachment: SSV_dataeng_0-1690194232666.png)
Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

Hi @SSV_dataeng, try:

abandoned_items_df = (abandoned_carts_df
    .withColumn("items", explode("cart"))
    .groupBy("items")
    .count()
    .sort("items"))
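
Put together with the filters from the original post, a hedged end-to-end version (assuming cart is an array column on the carts DataFrame) could read:

from pyspark.sql.functions import col, explode

abandoned_carts_df = (email_carts_df
    .filter(col("converted") == False)
    .filter(col("cart").isNotNull()))

abandoned_items_df = (abandoned_carts_df
    .withColumn("items", explode("cart"))  # one row per item in the cart
    .groupBy("items")
    .count()
    .sort("items"))
display(abandoned_items_df)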

1 More Replies
SSV_dataeng
by New Contributor II
  • 964 Views
  • 4 replies
  • 0 kudos

write to Delta

spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")

products_output_path = DA.paths.working_dir + "/delta/products"
products_df.write.format("delta").save(products_output_path)

verify_files = dbutils.fs.ls(products_ou...

Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

Hi @SSV_dataeng, please check with this (you would have to indent it correctly for Python):

productsOutputPath = DA.workingDir + "/delta/products"
(productsDF.write
    .format("delta")
    .mode("overwrite")
    .save(productsOutputPath))
verify_files = dbutils.fs.ls(...
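
Spelled out with the variable names from the original post, a hedged version of that suggestion (the final listing is just a visual check) looks like:

products_output_path = DA.paths.working_dir + "/delta/products"

(products_df.write
    .format("delta")
    .mode("overwrite")          # overwrite so the cell can be re-run
    .save(products_output_path))

verify_files = dbutils.fs.ls(products_output_path)
print(f"{len(verify_files)} objects written to {products_output_path}")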

3 More Replies
marchino
by New Contributor II
  • 2751 Views
  • 4 replies
  • 1 kudos

Can I change Service Principal's OAuth token's expiration date?

Hi, since I have to read from a Databricks table from an external API, I created a Service Principal that would start a cluster and perform the operation. To authenticate the request on behalf of the Service Principal, I generate the OAuth token followi...

Latest Reply
NandiniN
Honored Contributor
  • 1 kudos

Hello @marchino, please check whether this is of interest: https://kb.databricks.com/en_US/security/set-an-unlimited-lifetime-for-service-principal-access-token
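
For anyone who cannot open the KB article, a hedged sketch of the kind of call it relies on (the Token Management API's on-behalf-of endpoint; field names and response shape should be verified against the current API docs, and all values below are placeholders):

import requests

host = "https://<workspace-url>"
admin_token = "<workspace admin token>"

resp = requests.post(
    f"{host}/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": f"Bearer {admin_token}"},
    json={
        "application_id": "<service-principal-application-id>",
        "comment": "token for external API reads",
        "lifetime_seconds": 7776000,  # ~90 days; the allowed lifetime depends on workspace policy
    },
)
resp.raise_for_status()
print(resp.json())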

3 More Replies
Henrik
by New Contributor III
  • 1439 Views
  • 3 replies
  • 1 kudos

Data lineage on views

I do not know if this is intended behavior of data lineage, but for me it is weird. When I create a view based on two tables, the data lineage upstream looks correct. But when I replace the view to use only one of the tables, then the data lineage upstream ...
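
A minimal illustration of the scenario being described (table and view names are made up):

# Create a view over two tables: upstream lineage shows both.
spark.sql("""
  CREATE OR REPLACE VIEW sales_v AS
  SELECT * FROM orders JOIN customers USING (customer_id)
""")

# Replace the view to use only one table: per the post, upstream lineage
# may still show both tables instead of only the latest definition.
spark.sql("CREATE OR REPLACE VIEW sales_v AS SELECT * FROM orders")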

Latest Reply
Henrik
New Contributor III
  • 1 kudos

After some thought, I have come to this conclusion: data lineage on views is working as one should expect. I strongly recommend that this feature be redesigned so it shows the result of the latest view.

2 More Replies
Chalki
by New Contributor III
  • 3419 Views
  • 3 replies
  • 0 kudos

Iterative read and writes cause java.lang.OutOfMemoryError: GC overhead limit exceeded

I have an iterative algorithm which reads and writes a dataframe, iterating through a list of new partitions, like this:

for p in partitions_list:
    df = spark.read.parquet("adls_storage/p")
    df.write.format("delta").mode("overwrite").option("partitionOver...
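
A hedged reconstruction of that loop for readability (the target path and the dynamic partition-overwrite option are assumptions based on the truncated snippet; note the f-string so each partition value is actually substituted into the path):

for p in partitions_list:
    df = spark.read.parquet(f"adls_storage/{p}")
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("partitionOverwriteMode", "dynamic")  # assumed from the truncated excerpt
       .save("<target_delta_path>"))                 # placeholder target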

Latest Reply
Chalki
New Contributor III
  • 0 kudos

@daniel_sahal I've attached the wrong snip. Actually it is FULL GC (Ergonomics) that was bothering me. Now I am attaching the correct snip. But, as you said, I scaled a bit. The thing I forgot to mention is that the table is wide - more than 300 column...

2 More Replies
Dekova
by New Contributor II
  • 1406 Views
  • 1 reply
  • 3 kudos

Resolved! Using DeltaTable.merge() and generating surrogate keys on insert?

I'm using merge to upsert data into a table:

DeltaTable.forName(DESTINATION_TABLE).as("target")
  .merge(merge_df.as("source"), "source.topic = target.topic and source.key = target.key")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()

I'd ...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 3 kudos

@Dekova 1) uuid() is non-deterministic, meaning it will give you a different result each time you run the function. 2) Per the documentation: "For Databricks Runtime 9.1 and above, MERGE operations support generated columns when you set spark.databri...
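
A hedged Python sketch of one way to combine the two points, generating the surrogate key only in the insert branch (the table, DataFrame, and column names are placeholders taken from the question):

from delta.tables import DeltaTable
from pyspark.sql import functions as F

(DeltaTable.forName(spark, DESTINATION_TABLE).alias("target")
    .merge(merge_df.alias("source"),
           "source.topic = target.topic AND source.key = target.key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsert(values={
        "surrogate_key": F.expr("uuid()"),  # evaluated only for inserted rows
        "topic": "source.topic",
        "key": "source.key",
        # map the remaining columns from source here
    })
    .execute())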

Phani1
by Valued Contributor
  • 2684 Views
  • 4 replies
  • 1 kudos

Databricks Job Failure + ServiceNow Integration

Hi Team, could you please suggest how to raise a ServiceNow ticket in case of a Databricks job failure? Regards, Phanindra

Latest Reply
Swastik_Mishra
New Contributor II
  • 1 kudos

Hi @Phani1, You can use the webhook method to integrate Databricks job failure notifications with ServiceNow. This allows Databricks to send an HTTP POST request (webhook) to a designated endpoint in ServiceNow whenever a job fails. By doing so, you ...
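
A hedged sketch of the relevant fragment of a job definition (Jobs API 2.1 webhook notifications; this assumes a notification destination pointing at the ServiceNow inbound endpoint has already been created by a workspace admin):

job_settings = {
    "name": "nightly-etl",
    "webhook_notifications": {
        # ID of the notification destination configured for ServiceNow
        "on_failure": [{"id": "<notification-destination-id>"}]
    },
    # ... tasks, job clusters, schedule ...
}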

3 More Replies
kurtrm
by New Contributor III
  • 2012 Views
  • 4 replies
  • 0 kudos

Import dbfs file into workspace using Python SDK

Hello, I am looking to replicate the functionality provided by the databricks_cli Python package using the Python SDK. Previously, using the databricks_cli WorkspaceApi object, I could use the import_workspace or import_workspace_dir methods to move a...

Latest Reply
Kratik
New Contributor III
  • 0 kudos

I am also looking for a way to bring files present in S3 into the Workspace programmatically.
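
A minimal sketch of what the SDK route might look like (method names are taken from recent databricks-sdk releases and should be verified against the installed version; paths are placeholders):

import io
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat

w = WorkspaceClient()

# Pull a file out of DBFS...
with w.dbfs.download("/tmp/my_notebook.py") as f:
    content = f.read()

# ...and import it into the workspace tree.
w.workspace.upload(
    "/Users/someone@example.com/imported/my_notebook",
    io.BytesIO(content),
    format=ImportFormat.SOURCE,
    overwrite=True,
)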

3 More Replies
alesventus
by New Contributor III
  • 371 Views
  • 0 replies
  • 0 kudos

Big time differences in reading tables

When I read a managed table in #databricks# I can see big differences in the time spent. A small test table with just 2 records is loaded once in 3 seconds and another time in 30 seconds. Reading table_change for this tiny table took 15 minutes. Don't know ...

Labels: Community Discussions, performance issue
yzhang
by New Contributor III
  • 1251 Views
  • 2 replies
  • 4 kudos

Resolved! Is there a plan to support workflow jobs to be stored in a subfolder?

I have many workflow jobs created and they are all in a flat list. Is there a way to create (kind of) subfolders that I can categorize my Databricks workflow jobs into (a kind of organizer)...

Latest Reply
yzhang
New Contributor III
  • 4 kudos

@Anonymous thanks for the suggestion. And thanks a lot @Vinay_M_R for answering the question. The solution mentioned is doable but a less optimal way to do it. Everyone in the team has to follow the same rules, especially for shared jobs, and sometimes n...

1 More Replies
GrahamBricks
by New Contributor
  • 1427 Views
  • 0 replies
  • 0 kudos

terraform jobs depends_on

I am attempting to automate Jobs creation using the Databricks Terraform provider. I have a number of tasks that will "depends_on" each other and am trying to use dynamic content to do this. Each task name is stored in a string array, so looping over th...

CraiMacl_23588
by New Contributor
  • 336 Views
  • 0 replies
  • 0 kudos

Init scripts in legacy workspace (pre-E2)

Hello, I've got a legacy workspace (not E2) and I am trying to move my cluster-scoped init script to the workspace area (from DBFS). It doesn't seem to be possible to store a shell script in the workspace area (Accepted formats: .dbc, .scala, .py, .sq...
