Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Therdpong
by New Contributor III
  • 2035 Views
  • 2 replies
  • 0 kudos

How to check whether a job cluster has expanded its disk

We would like to know how to check whether a job cluster has expanded its disk.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can check the cluster's event log. Type "disk" in the search box and you will see all the disk-related events there.
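
If you prefer to check programmatically, here is a minimal sketch that queries the Clusters API for disk-expansion events. The workspace URL, token, cluster ID, and the EXPANDED_DISK event-type filter are assumptions to verify against your own workspace's event log first:

    # Sketch: list disk-expansion events for a cluster via the Clusters API.
    # HOST, TOKEN, CLUSTER_ID are placeholders; EXPANDED_DISK is assumed to
    # be the relevant event type -- confirm in the cluster event log UI.
    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"
    CLUSTER_ID = "<cluster-id>"

    resp = requests.post(
        f"{HOST}/api/2.0/clusters/events",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": CLUSTER_ID, "event_types": ["EXPANDED_DISK"]},
    )
    resp.raise_for_status()
    for event in resp.json().get("events", []):
        print(event["timestamp"], event["type"])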

1 More Replies
SS2
by Valued Contributor
  • 2047 Views
  • 2 replies
  • 1 kudos

Spark out of memory error.

Sometimes in Databricks you may see an out-of-memory error; in that case, you can change the cluster size as required to resolve the issue.

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @S S, could you provide more details on your issue? For example, error stack traces, code snippets, etc. We will be able to help you if you share more details.

1 More Replies
rocky5
by New Contributor III
  • 2518 Views
  • 1 reply
  • 2 kudos

Cannot create delta live table

I created a simple delta live table definition, something like: CREATE OR REFRESH STREAMING LIVE TABLE customers_silver AS SELECT * FROM STREAM(LIVE.customers_bronze) But I am getting an error when running the pipeline: com.databricks.sql.transaction.tahoe.De...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

You might need to execute the following on your tables to avoid this error message: ALTER TABLE <table_name> SET TBLPROPERTIES ( 'delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5', 'delta.columnMapping.mode' = 'name' ). Docs: https...
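
For reference, a minimal PySpark sketch of applying those properties from a notebook; the table name is a placeholder, and note that raising minReaderVersion/minWriterVersion is irreversible:

    # Sketch: upgrade the table's protocol and enable column mapping.
    # "customers_bronze" stands in for your actual table name.
    spark.sql("""
        ALTER TABLE customers_bronze SET TBLPROPERTIES (
            'delta.minReaderVersion' = '2',
            'delta.minWriterVersion' = '5',
            'delta.columnMapping.mode' = 'name'
        )
    """)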

BL
by New Contributor III
  • 5106 Views
  • 4 replies
  • 3 kudos

Error reading in Parquet file

I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I am facing the below error: spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet") org.apache.spark.SparkException: Job aborted due to stag...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Can you access the executor logs? When your cluster is up and running, you can access the executor's logs. For example, the error shows: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...

3 More Replies
jagac
by New Contributor
  • 1316 Views
  • 2 replies
  • 0 kudos

Cannot log into Community Edition.

Hi there, I recently made an account on the Community Edition and cannot seem to log in. The error says the following: Invalid email address or password. Note: Emails/usernames are case-sensitive. So I tried to reset my password and still could not log in. I ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @jagac petrovic, thank you for reaching out, and we're sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look and follow the troubleshooting steps. If the steps do not res...

1 More Replies
User16835756816
by Valued Contributor
  • 3515 Views
  • 3 replies
  • 1 kudos

How can I optimize my data pipeline?

Delta Lake provides optimizations that can help you accelerate your data lake operations. Here's how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...
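
For reference, a minimal sketch of the standard Delta layout optimization this post refers to; the table and column names are hypothetical placeholders:

    # Sketch: compact small files and co-locate rows by a common filter column.
    spark.sql("OPTIMIZE events ZORDER BY (event_date)")  # names are placeholders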

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Some tips from me: look for data skew; some partitions can be huge and some small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit (call getNumPartitions()), especially as SQL can divide it unequally into parti...
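
For reference, a minimal sketch of the skew check described above; df stands in for any DataFrame in your pipeline, and the repartition key is hypothetical:

    # Sketch: inspect the partition count and per-partition row counts.
    from pyspark.sql import functions as F

    print("partitions:", df.rdd.getNumPartitions())

    # Rough per-partition sizes to spot skew (note: triggers a full scan).
    (df.withColumn("pid", F.spark_partition_id())
       .groupBy("pid").count()
       .orderBy(F.desc("count"))
       .show(10))

    # If counts are badly uneven, repartition on a well-distributed key.
    df = df.repartition(200, "customer_id")  # hypothetical key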

2 More Replies
Arun_Kumar
by New Contributor II
  • 4008 Views
  • 4 replies
  • 1 kudos

List of Databricks tables created by a user

Hi team, could you please confirm the below clarifications: 1. How can we get the list of tables created by a user in a particular workspace? 2. How can we get the list of tables created by a user from multiple workspaces? (The same user has access to 10 workspace...
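
(For reference, one possible approach to question 1, assuming Unity Catalog is enabled: the information_schema views record an owner per table. The user email is a placeholder.)

    # Sketch: list tables owned by a given user across the current metastore.
    spark.sql("""
        SELECT table_catalog, table_schema, table_name
        FROM system.information_schema.tables
        WHERE table_owner = 'user@example.com'  -- placeholder user
    """).show(truncate=False)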

Latest Reply
ashish1
New Contributor III
  • 1 kudos

Hi Arun, hope your query is answered. Please select the best answer, or let us know if you have any further questions.

3 More Replies
AndriusVitkausk
by New Contributor III
  • 1672 Views
  • 1 reply
  • 0 kudos

Reading multi-dimensional json files

So I've been having some issues reading a JSON file that's been provided to the business with another nesting layer, so instead of the JSON being an 'array of objects' -> [ {}, {}, {} ], it's an 'array of arrays of objects' -> [ [ {}, {}, {} ], [ {}, {}...

Latest Reply
ashish1
New Contributor III
  • 0 kudos

You can use the explode function to flatten the array into rows. Can you post a simple example of your data?
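
For reference, a rough sketch of the double explode for an 'array of arrays of objects', assuming the outer array surfaces as a column after reading; the path and column names are placeholders:

    # Sketch: flatten [[{}, {}], [{}, ...]] into one row per object.
    from pyspark.sql import functions as F

    # multiLine handles a top-level JSON array spanning multiple lines.
    raw = spark.read.option("multiLine", "true").json("/path/to/file.json")

    flat = (raw
            .select(F.explode("outer").alias("inner"))   # outer array -> rows
            .select(F.explode("inner").alias("record"))  # inner array -> rows
            .select("record.*"))                         # struct fields -> columns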

LavaLiah_85929
by New Contributor II
  • 1361 Views
  • 2 replies
  • 0 kudos

"desc history" shows versions older than the default logRetentionDuration of 30 days

I have a cdc enabled table where no data changes were made since July 28. Then updates started occurring from November 22 onwards. The first checkpoint occurred on Nov 28. Based on the corresponding timestamp of checkpoint and log files, it looks lik...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Laval Liahkim, could you please try running VACUUM with 30-day retention? Please confirm when you last ran the command with the 30-day retention period. Also, when did you create this table, and do you see that old data files were deleted? Also, when disk...
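
For reference, a minimal sketch of that VACUUM with a 30-day window (30 days = 720 hours); the table name is a placeholder:

    # Sketch: preview, then remove, files outside the 30-day retention window.
    spark.sql("VACUUM my_cdc_table RETAIN 720 HOURS DRY RUN").show(truncate=False)
    spark.sql("VACUUM my_cdc_table RETAIN 720 HOURS")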

1 More Replies
espenol
by New Contributor III
  • 2651 Views
  • 3 replies
  • 0 kudos

How to debug Workflow Jobs timing out and DLT pipelines running forever?

So I'm the designated data engineer for a proof of concept we're running, I'm working with one infrastructure guy who's setting up everything in Terraform (company policy). He's got the setup down for Databricks so we can configure clusters and run n...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@Espen Solvang - just thought of checking with you: could you please let us know if you require further assistance on this?

2 More Replies
nimble
by New Contributor
  • 3573 Views
  • 2 replies
  • 0 kudos

How can I run a streaming query on a new table with tbl property: change data feed enabled?

In Databricks on AWS, I am trying to run a streaming query (trigger=Once) with delta.enableChangeDataFeed=true in the table definition as instructed, but this always fails with: ERROR: Some streams terminated before this command could finish! com.d...

Latest Reply
swethaNandan
Databricks Employee
  • 0 kudos

Hi @daniel e, can you try running the select command on table changes from the 0th version and see if you get output? SELECT * FROM table_changes('tableName', 0) Also, please share the streaming query that you are running.
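
For reference, a short sketch of both reads, assuming the table was created with delta.enableChangeDataFeed = true; 'tableName' is the placeholder from the reply above:

    # Sketch: batch check of the change feed from version 0 ...
    spark.sql("SELECT * FROM table_changes('tableName', 0)").show()

    # ... and the equivalent streaming read of the change feed.
    stream = (spark.readStream
              .format("delta")
              .option("readChangeFeed", "true")
              .option("startingVersion", 0)
              .table("tableName"))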

1 More Replies
Raghu_Bindingan
by New Contributor III
  • 3963 Views
  • 4 replies
  • 2 kudos

Truncate delta live table and try to repopulate it in the pipeline

Has anyone attempted to truncate a delta live gold-level table that gets populated via a pipeline and then tried to repopulate it by starting the pipeline? I have this situation wherein I need to reprocess all data in my gold table, so I stopped the ...

Latest Reply
Rajeev45
Databricks Employee
  • 2 kudos

Please can you confirm whether the job is still failing with the same error even after the "FULL REFRESH ALL" option? If so, please share the full stack trace. Is it failing in any of the below steps?
  • Creating update
  • Waiting for resources
  • Initializing
  • Resetting...

3 More Replies
DevOps88
by New Contributor II
  • 2194 Views
  • 2 replies
  • 3 kudos

Is it possible to run jobs with integration tests from the Databricks interface?

Currently, Nutter can be run inside a common CI/CD pipeline from GitLab, but we need the ability to run jobs with integration tests from the Databricks interface. How can Nutter be used directly from Databricks? Are there any integration test examples a...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Dmitrii Kalashnikov, you can find examples and more details here: https://github.com/alexott/databricks-nutter-repos-demo
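
For reference, a minimal Nutter fixture sketch that can run directly in a notebook (assumes the nutter library is installed, e.g. via %pip install nutter; the table name is a placeholder):

    # Sketch: Nutter discovers run_/assertion_ method pairs by name.
    from runtime.nutterfixture import NutterFixture

    class PipelineTestFixture(NutterFixture):
        def run_row_count(self):
            # Act: run the code under test.
            self.df = spark.read.table("my_schema.my_table")  # placeholder

        def assertion_row_count(self):
            # Assert: the table should not be empty.
            assert self.df.count() > 0

    result = PipelineTestFixture().execute_tests()
    print(result.to_string())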

1 More Replies
Trodenn
by New Contributor III
  • 4924 Views
  • 5 replies
  • 1 kudos

Resolved! approxQuantile does not seem to be working with delta live tables (DLT)

Hi, I am trying to use the approxQuantile() function to populate a list that I made, yet somehow, whenever I try to run the code, it's as if the list is empty and there are no values in it. The code is written as below: @dlt.table(name = "customer_order_silv...

(Attachment: Screenshot_20230130_053953)
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Maybe try first, in a separate notebook, a standard df = spark.read.table("customer_order_silver") to calculate approxQuantile. Of course, you need to ensure that customer_order_silver has a target location in the catalog, so read us...
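
For reference, a minimal sketch of that separate-notebook test; the column name is a hypothetical placeholder:

    # Sketch: approxQuantile(column, probabilities, relativeError).
    df = spark.read.table("customer_order_silver")
    quantiles = df.approxQuantile("order_amount", [0.25, 0.5, 0.75], 0.01)
    print(quantiles)  # [q1, median, q3]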

4 More Replies
guru1
by New Contributor II
  • 4368 Views
  • 2 replies
  • 0 kudos

Resolved! Facing the issue mentioned in the body when connecting Event Hub with Databricks; followed an earlier discussion on this but found no solution

ERROR: Query termination received for [id=37bada03-131b-4fbb-8992-a427263fef2c, runId=cf3d7c18-780e-43ae-aed0-9daf2939b823], with exception: java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit at java.util.Base64$Decoder...

Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 0 kudos

The issue could be due to a mismatch between the Event Hubs jar and the dependencies added; also, not all the required dependencies may have been added. Suggestions: use the azure_eventhubs_spark_2_12_.jar Event Hubs Spark jar along with the following dependencies...
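
Separately from the jar mismatch above, this particular Base64 IllegalArgumentException is often reported when the connection string is passed unencrypted, since the connector Base64-decodes it. A hedged sketch, assuming a connector version that provides EventHubsUtils.encrypt:

    # Sketch: encrypt the Event Hubs connection string before passing it.
    conn_str = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."  # placeholder

    ehConf = {
        "eventhubs.connectionString":
            sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str),
    }

    df = (spark.readStream
          .format("eventhubs")
          .options(**ehConf)
          .load())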

1 More Replies
