by
Callum
• New Contributor II
- 13010 Views
- 3 replies
- 2 kudos
So, I have this code for merging dataframes with pyspark pandas, and I want the index of the left dataframe to persist throughout the joins. So, following suggestions from others wanting to keep the index after merging, I set the index to a column bef...
Latest Reply
Hi! I tried debugging your code, and I think the error you get is simply because the column exists in two instances of your dataframe within your loop. I tried adding some extra debug lines in your merge_dataframes function, and after executing that...
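A minimal sketch of one way to keep the left frame's index through a merge, shown here with plain pandas (pyspark.pandas mirrors this API); the frames and column names are hypothetical, not taken from the original code:

```python
import pandas as pd  # pyspark.pandas (import pyspark.pandas as ps) exposes the same API

left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]}, index=[10, 20, 30])
right = pd.DataFrame({"key": [1, 2], "b": [100, 200]})

# Promote the index to a column, merge, then restore it, so the join cannot drop it.
merged = (
    left.reset_index()
        .merge(right, on="key", how="left")
        .set_index("index")
)
```

Note that if the same column name exists in both frames, merge suffixes the duplicates with `_x`/`_y`, which matches the duplicate-column situation described in the reply above.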
2 More Replies
- 2040 Views
- 2 replies
- 0 kudos
Hi, I am using Databricks with AWS. I need to capture events such as Start, Stop, and Terminate of a cluster and perform some other action based on the events that happened on the cluster. Is there a way I can achieve this in Databricks?
Latest Reply
Hi Daniel, thanks for the response. I would like to know if we can capture the event logs as shown in the image below when an event occurs on the cluster.
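For capturing cluster lifecycle events programmatically, the Databricks Clusters API exposes a POST /api/2.0/clusters/events endpoint. A hedged sketch of building such a request — the cluster ID, host, and token below are placeholders, not real values:

```python
def build_events_request(cluster_id, event_types=("STARTING", "TERMINATING")):
    """Payload for POST /api/2.0/clusters/events, filtered to lifecycle events."""
    return {"cluster_id": cluster_id, "event_types": list(event_types), "limit": 50}

payload = build_events_request("0123-456789-abcdef12")  # hypothetical cluster ID

# To actually send it (host and token are placeholders):
# import requests
# resp = requests.post(f"https://{DATABRICKS_HOST}/api/2.0/clusters/events",
#                      headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
#                      json=payload)
```

Polling this endpoint (or wiring it into a scheduled job) is one way to trigger follow-up actions when a cluster starts or terminates.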
1 More Replies
by
KVNARK
• Honored Contributor II
- 15546 Views
- 2 replies
- 5 kudos
What can we implement to attain the best optimization, and what are the best practices for using PySpark end to end?
Latest Reply
@KVNARK, this video is cool: https://www.youtube.com/watch?v=daXEp4HmS-E
1 More Replies
- 4199 Views
- 3 replies
- 2 kudos
I have installed the "com.databricks:spark-xml_2.12:0.16.0" Maven library on a cluster. The installation was successful, but when I restart the cluster, even this successful installation fails. This happens with all Maven libraries. Here is th...
Latest Reply
It is an intermittent issue; we also faced it earlier. Try upgrading the DBR version.
2 More Replies
- 2062 Views
- 2 replies
- 0 kudos
We would like to know how to check which job clusters have had their disks expanded.
Latest Reply
You can check the cluster's event logs. Type "disk" in the search box and you will see all the relevant events there.
1 More Replies
by
SS2
• Valued Contributor
- 2060 Views
- 2 replies
- 1 kudos
Sometimes in Databricks you can see an out-of-memory error; in that case, you can change the cluster size as required to resolve the issue.
Latest Reply
Hi @S S, could you provide more details on your issue? For example, error stack traces, a code snippet, etc. We will be able to help you if you share more details.
1 More Replies
by
rocky5
• New Contributor III
- 2542 Views
- 1 replies
- 2 kudos
I created a simple Delta Live Table definition, something like: CREATE OR REFRESH STREAMING LIVE TABLE customers_silver AS SELECT * FROM STREAM(LIVE.customers_bronze). But I am getting an error when running the pipeline: com.databricks.sql.transaction.tahoe.De...
Latest Reply
You might need to execute the following on your tables to avoid this error message: ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5', 'delta.columnMapping.mode' = 'name'). Docs: https...
by
BL
• New Contributor III
- 5157 Views
- 4 replies
- 3 kudos
I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I am facing the below error: spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet") org.apache.spark.SparkException: Job aborted due to stag...
Latest Reply
Can you access the executor logs? When your cluster is up and running, you can access the executors' logs. For example, the error shows: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...
3 More Replies
by
jagac
• New Contributor
- 1338 Views
- 2 replies
- 0 kudos
Hi there, I recently made an account on the Community Edition and cannot seem to log in. The error says the following: "Invalid email address or password. Note: Emails/usernames are case-sensitive." So I tried to reset my password and still could not log in. I ...
Latest Reply
Hi @jagac petrovic, thank you for reaching out, and we’re sorry to hear about this login issue! We have a Community Edition login troubleshooting post on Community. Please take a look and follow the troubleshooting steps. If the steps do not res...
1 More Replies
- 3593 Views
- 3 replies
- 1 kudos
Delta Lake provides optimizations that can help you accelerate your data lake operations. Here’s how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...
Latest Reply
Some tips from me: look for data skew; some partitions can be huge and some small because of incorrect partitioning. You can use the Spark UI to spot this, but also debug your code a bit (e.g., getNumPartitions()); SQL especially can divide data unequally across parti...
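One concrete way to act on the skew tip above: collect per-partition row counts (in PySpark, df.rdd.glom().map(len).collect() returns them) and flag any partition far above the mean. The counts below are made up for illustration:

```python
# Hypothetical per-partition row counts, e.g. from df.rdd.glom().map(len).collect()
sizes = [100, 98, 102, 5000, 97]

mean = sum(sizes) / len(sizes)
# Flag partitions holding more than 3x the average number of rows.
skewed = [i for i, n in enumerate(sizes) if n > 3 * mean]
```

A partition flagged this way is a candidate for repartitioning on a better-distributed key.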
2 More Replies
- 4060 Views
- 4 replies
- 1 kudos
Hi team, could you please confirm the clarifications below: 1. How can we get the list of tables created by a user in a particular workspace? 2. How can we get the list of tables created by a user across multiple workspaces? (The same user has access to 10 workspace...
Latest Reply
Hi Arun, we hope your query is answered. Please select the best answer, or let us know if you have any further questions.
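Assuming Unity Catalog, one per-workspace approach is to query the information schema, whose tables view includes a created_by column. A hedged sketch that only builds the SQL string (you would run it with spark.sql in each workspace; the user email is a placeholder):

```python
def tables_created_by(user: str) -> str:
    """SQL listing tables created by a user (Unity Catalog information schema)."""
    return (
        "SELECT table_catalog, table_schema, table_name "
        "FROM system.information_schema.tables "
        f"WHERE created_by = '{user}'"
    )

query = tables_created_by("arun@example.com")  # hypothetical user
```

For multiple workspaces, you would run the same query in each workspace (or against each metastore) and union the results.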
3 More Replies
- 1699 Views
- 1 replies
- 0 kudos
So I've been having some issues reading a JSON file that was provided to the business with another nesting layer. Instead of the JSON being an 'array of objects' -> [ {} ,{} ,{} ], it's an 'array of arrays of objects' -> [ [ {}, {} ,{} ], [ {} ,{}...
Latest Reply
You can use the explode function to flatten the array into rows. Can you post a simple example of your data?
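For an 'array of arrays of objects', flattening the outer level first turns it back into the familiar shape; in PySpark you would apply F.explode twice, once per nesting level. A plain-Python illustration of the reshaping, with hypothetical sample data:

```python
# Hypothetical 'array of arrays of objects', as described in the question.
data = [[{"id": 1}, {"id": 2}, {"id": 3}], [{"id": 4}, {"id": 5}]]

# Flatten the outer level: array of arrays -> array of objects.
flat = [obj for inner in data for obj in inner]
```

After this step, each object can be mapped to a row as in the ordinary single-nesting case.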
- 1383 Views
- 2 replies
- 0 kudos
I have a CDC-enabled table where no data changes were made after July 28. Then updates started occurring from November 22 onwards. The first checkpoint occurred on Nov 28. Based on the corresponding timestamps of the checkpoint and log files, it looks lik...
Latest Reply
Hi @Laval Liahkim, could you please try running VACUUM with a 30-day retention? Please confirm when you last ran the command with the 30-day retention period. Also, when did you create this table, and do you see old data files being deleted? Also, when disk...
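VACUUM's RETAIN clause is specified in hours, so a 30-day retention is 720 hours. A small sketch computing that (the table name is a placeholder):

```python
retention_days = 30
retention_hours = retention_days * 24  # VACUUM's RETAIN clause takes hours

# Hypothetical table name; run the statement via spark.sql or a SQL cell.
sql = f"VACUUM my_table RETAIN {retention_hours} HOURS"
```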
1 More Replies
- 2682 Views
- 3 replies
- 0 kudos
So I'm the designated data engineer for a proof of concept we're running, I'm working with one infrastructure guy who's setting up everything in Terraform (company policy). He's got the setup down for Databricks so we can configure clusters and run n...
Latest Reply
@Espen Solvang - Just checking in with you: could you please let us know whether you require further assistance on this?
2 More Replies
- 3605 Views
- 2 replies
- 0 kudos
In Databricks on AWS, I am trying to run a streaming query (trigger=Once) with delta.enableChangeDataFeed=true in the table definition as instructed, but this always fails with: ERROR: Some streams terminated before this command could finish! com.d...
Latest Reply
Hi @daniel e, can you try running the select command on table_changes from the 0th version and see if you get output? SELECT * FROM table_changes('tableName', 0). Also, please share the streaming query that you are running.
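The same change feed can also be read through the DataFrame API with the readChangeFeed option. A hedged sketch that only assembles the options — the table name is a placeholder, and the read itself is commented out since it needs a live Spark session:

```python
# Options for reading a Delta table's change data feed from version 0.
opts = {"readChangeFeed": "true", "startingVersion": "0"}

# changes = spark.read.format("delta").options(**opts).table("tableName")
```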
1 More Replies