Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gazzyjuruj
by Contributor II
  • 2716 Views
  • 1 replies
  • 4 kudos

Resolved! databricks_error_message: time out placing nodes

Hi, today I'm receiving this error: databricks_error_message: Timed out while placing nodes. What should be done to fix it?

Latest Reply
User16764241763
Databricks Employee
  • 4 kudos

Hello @Ghazanfar Uruj, this can happen for a number of reasons. If the issue still persists, could you please file a support case with details?

AmanSehgal
by Honored Contributor III
  • 5256 Views
  • 2 replies
  • 10 kudos

Migrating data from delta lake to RDS MySQL and ElasticSearch

There are mechanisms (like DMS) to get data from RDS to Delta Lake and store the data in Parquet format, but is it possible to do the reverse of this in AWS? I want to send data from the data lake to MySQL RDS tables in batch mode. And the next step is to send th...

Latest Reply
AmanSehgal
Honored Contributor III
  • 10 kudos

@Kaniz Fatma and @Hubert Dudek - writing to MySQL RDS is relatively simple. I'm looking for ways to export data into Elasticsearch.

1 More Replies
kjoth
by Contributor II
  • 1838 Views
  • 0 replies
  • 0 kudos

Unmanaged Table - Newly added data directories are not reflected in the table We have created an unmanaged table with partitions on the dbfs location, using SQL. After creating the tables, via SQL we are running

We have created an unmanaged table with partitions on the DBFS location, using SQL. Example: %sql CREATE TABLE EnterpriseDailyTrafficSummarytest (EnterpriseID STRING, ServiceLocationID STRING, ReportDate STRING) USING parquet PARTITIONED BY (ReportDate)...

Daba
by New Contributor III
  • 7429 Views
  • 3 replies
  • 5 kudos

Resolved! DLT+AutoLoader: where are the schema and checkpoint hidden?

Hi, I'm exploring DLT with the AutoLoader feature and wondering where the schema and checkpoint are hidden. I want to wipe these two to reset/reinitialize the flow, but unlike with "regular" AutoLoader, the checkpoint and schema folders are not there. Thank...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 5 kudos

@Alexander Plepler, there is a storage option in the pipeline settings: a path to a DBFS directory for storing checkpoints and tables created by the pipeline. Additionally, the Delta table is registered in the metastore, so the table schema is there.

2 More Replies
Karthik1
by New Contributor II
  • 3901 Views
  • 2 replies
  • 0 kudos

Datab

Hi Databricks Team, I took the Databricks certified Spark developer - Python exam on 15th April ’22 and passed with a score of 81.66%, but I have not yet received my certificate or badge. I need to submit my badge to my employer. Kindly release my badge. T...

sannycse
by New Contributor II
  • 2918 Views
  • 2 replies
  • 3 kudos

Resolved! display password as shown in example using spark scala

The table has the following columns: First_Name, Last_Name, Department_Id, Contact_No, Hire_Date. Display the employee First_Name, the count of characters in the first name, and the password. The password should be the first 4 letters of the first name in lower case and the date and ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

@SANJEEV BANDRU, SELECT CONCAT(substring(First_Name, 0, 2), substring(Hire_Date, 0, 2), substring(Hire_Date, 3, 2)) AS password FROM table; If Hire_Date is a timestamp, you may need to add date_format().
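A minimal Python sketch of the password rule from the question, useful for checking expected values outside Spark. The question text is cut off after "the date and ...", so the date part used here (two-digit day followed by two-digit month, read from a dd-mm-yyyy string) is an assumption suggested by the substring offsets in the reply. `build_password` is a hypothetical helper name, and the sample rows are made up.

```python
def build_password(first_name: str, hire_date: str) -> str:
    """Return the first 4 letters of first_name in lower case plus day and month.

    Assumption: hire_date is a string formatted as dd-mm-yyyy, so
    characters 0-1 hold the day and characters 3-4 hold the month.
    """
    name_part = first_name[:4].lower()
    day_part = hire_date[0:2]
    month_part = hire_date[3:5]
    return name_part + day_part + month_part


# Made-up sample rows: (First_Name, Hire_Date)
employees = [("Maria", "15-04-2022"), ("Al", "01-12-2020")]
for first_name, hire_date in employees:
    # Mirror the requested output columns: name, character count, password.
    print(first_name, len(first_name), build_password(first_name, hire_date))
```

Spark SQL's substring takes a start position and a length, so taking four letters of the name in SQL would use a length of 4.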

1 More Replies
Syed1
by New Contributor III
  • 29469 Views
  • 7 replies
  • 13 kudos

Resolved! Python Graph not showing

Hi, I have run this code:
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('bmh')
%matplotlib inline
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
p = plt.scatter(x, y)
display command r...

Latest Reply
User16725394280
Databricks Employee
  • 13 kudos

@Syed Ubaid, I tried with 7.3 LTS and it works fine.

6 More Replies
Anonymous
by Not applicable
  • 13144 Views
  • 12 replies
  • 13 kudos

Resolved! Not able to run notebook even when cluster is running and databases/tables are not visible in "data" tab.

We are using Databricks on AWS. I am not able to run a notebook even when the cluster is running. When I run a cell, it returns "cancel". When I check the event log for the cluster, it shows "Metastore is down". Couldn't see any databases or tables that I...

Latest Reply
User16753725182
Databricks Employee
  • 13 kudos

This means the network is fine, but something in the Spark config is amiss. What are the DBR version and the Hive version? Please check if you are using a compatible version. If you don't specify any version, it will take 1.3 and you wouldn't have to us...

11 More Replies
p42af
by New Contributor
  • 8307 Views
  • 4 replies
  • 1 kudos

Resolved! rdd.foreachPartition() does nothing?

I expected the code below to print "hello" for each partition, and "world" for each record. But when I ran it, the code completed with no printouts of any kind and no errors either. What is happening here? %scala val rdd = spark.sparkContext.parallelize(S...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

It is lazily evaluated, so I guess you need to trigger an action.

3 More Replies
KC_1205
by Databricks Partner
  • 5308 Views
  • 2 replies
  • 3 kudos

Resolved! NumPy update 1.18-1.21

Hi all, I am planning to upgrade DBR from 7.3 LTS to 9.1 LTS. The corresponding NumPy version will be 1.19, and later I would like to update to 1.21 in the notebooks. On the cluster I have the Spark version for 9.1 LTS, which will support 1.19, and notebook ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Kiran Chalasani, according to the docs, DBR 7.3 LTS comes with NumPy 1.18.1 (https://docs.databricks.com/release-notes/runtime/7.3.html) and DBR 9.1 LTS comes with NumPy 1.19.2 (https://docs.databricks.com/release-notes/runtime/9.1.html). If you need t...
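To confirm which NumPy version a notebook actually picks up after a runtime change, a small check like the following may help. It is a sketch that uses only the Python standard library. `installed_version`, `parse_version`, and `meets_minimum` are hypothetical helper names, and the parsing only looks at the leading numeric parts of a version such as 1.19.2.

```python
import itertools
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.19.2' into (1, 19, 2), keeping only leading digits per part."""
    numbers = []
    for part in version.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, part))
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare two version strings by their numeric components."""
    return parse_version(version) >= parse_version(minimum)


numpy_version = installed_version("numpy")
if numpy_version is None:
    print("NumPy is not installed in this environment")
else:
    print("NumPy", numpy_version, "meets 1.21:", meets_minimum(numpy_version, "1.21"))
```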

1 More Replies
RKNutalapati
by Valued Contributor
  • 6905 Views
  • 4 replies
  • 3 kudos

Resolved! Copy CDF enabled delta table from one location to another by retaining history

I am currently doing some use case testing. I have to clone a Delta table with CDF enabled to a different S3 bucket. Deep clone doesn't meet the requirement, so I tried copying the files using dbutils.fs.cp. It copies all the versions, but the tim...

ernijed
by New Contributor II
  • 9740 Views
  • 3 replies
  • 3 kudos

Resolved! Error in SQL statement: SparkFatalException. How to fix it?

When I try to execute a SQL query (2 joins), I get the message below: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$a...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

@Erni Jed, I tested, and your query is OK, so it has to be some other issue. Maybe you could try it on a smaller data set. Please also analyze/debug using the Spark UI.

2 More Replies
Surendra
by New Contributor III
  • 12987 Views
  • 3 replies
  • 6 kudos

Resolved! Databricks notebook is taking 2 hours to write to /dbfs/mnt (blob storage). Same job is taking 8 minutes to write to /dbfs/FileStore. I would like to understand why write performance is different in both cases.

Problem statement:
Source file format: .tar.gz
Avg size: 10 MB
Number of tar.gz files: 1000
Each tar.gz file contains around 20000 CSV files.
Requirement: Untar the tar.gz file and write CSV files to blob storage / intermediate storage layer for further...
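The untar step in the requirement above can be sketched with Python's standard `tarfile` module. This is a minimal illustration, not the poster's code: `extract_csv_members` is a hypothetical helper name, and any paths passed to it are placeholders. It extracts only the .csv members of one archive into a target directory.

```python
import os
import tarfile
from typing import List


def extract_csv_members(archive_path: str, target_dir: str) -> List[str]:
    """Extract only the regular .csv files from a .tar.gz archive.

    Returns the list of extracted file paths under target_dir.
    Files are written by base name, so members with the same base
    name in different archive folders would overwrite each other.
    """
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            # Flatten to the base name so archive paths cannot escape target_dir.
            out_path = os.path.join(target_dir, os.path.basename(member.name))
            source = archive.extractfile(member)
            with source, open(out_path, "wb") as out_file:
                out_file.write(source.read())
            extracted.append(out_path)
    return extracted
```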

Latest Reply
Surendra
New Contributor III
  • 6 kudos

@Hubert Dudek Thanks for your suggestions. After creating the storage account in the same region as Databricks, I can see that performance is as expected. It is now clear that the issue was the /mnt/ location being in a different region than Databricks. I would ...

2 More Replies