Data Engineering
Forum Posts

arkadiuszr
by New Contributor III
  • 1303 Views
  • 4 replies
  • 1 kudos

Resolved! Failure during cluster launch

Hi all, I am migrating to Databricks E2 from an older deployment. I moved the cluster definitions from the old Databricks instance and also created new ones. Databricks tries to start a cluster for an hour and then fails. This happens for modes: Single Node ...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you, @Hubert Dudek, for your fantastic response.

3 More Replies
Constantine
by Contributor III
  • 12653 Views
  • 5 replies
  • 5 kudos

Resolved! How to provide UPSERT condition in PySpark

I have a table `demo_table_one` into which I want to upsert the following values: data = [(11111, 'CA', '2020-01-26'), (11111, 'CA', '2020-02-26'), (88888, 'CA', '2020-06-10'), (88888, 'CA', '2020-05-10'), (88888, 'WA', '2020-07-10'), ...
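The intended merge can be sketched in plain Python using the sample rows from the post. The match keys (id, state) and the "keep the newest date" rule are assumptions for illustration; on Databricks this logic is typically expressed with Delta Lake's MERGE INTO (WHEN MATCHED ... UPDATE / WHEN NOT MATCHED ... INSERT).

```python
# Minimal sketch of UPSERT (merge) semantics, no Spark needed.
# The existing target row and the merge condition are hypothetical.
target = {
    (11111, "CA"): "2020-01-01",  # hypothetical pre-existing row
}

updates = [
    (11111, "CA", "2020-01-26"),
    (11111, "CA", "2020-02-26"),
    (88888, "CA", "2020-06-10"),
    (88888, "CA", "2020-05-10"),
    (88888, "WA", "2020-07-10"),
]

for id_, state, date in updates:
    key = (id_, state)
    if key in target:
        # WHEN MATCHED: update only if the incoming date is newer
        # (ISO dates compare correctly as strings)
        target[key] = max(target[key], date)
    else:
        # WHEN NOT MATCHED: insert
        target[key] = date
```

After the loop, `target` holds one row per (id, state) with the newest date seen.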

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @John Constantine, just a friendly follow-up. Do you still need help, or did @Hubert Dudek's and @werners' responses help you find the solution? Please let us know.

4 More Replies
Hemanth998
by New Contributor
  • 1286 Views
  • 3 replies
  • 3 kudos
Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Hemanth, just a friendly follow-up. Do you still need help, or did @Aashita Ramteke's response help you find the solution? Please let us know.

2 More Replies
ernijed
by New Contributor II
  • 5381 Views
  • 4 replies
  • 3 kudos

Resolved! Error in SQL statement: SparkFatalException. How to fix it?

When I try to execute a SQL query (2 joins), I get the message below: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

@Erni Jed, I tested it, and your query is OK, so it has to be some other issue. Maybe you could try it on a smaller data set. Please also analyze/debug using the Spark UI.

3 More Replies
Surendra
by New Contributor III
  • 5029 Views
  • 5 replies
  • 8 kudos

Resolved! Databricks notebook is taking 2 hours to write to /dbfs/mnt (blob storage). Same job is taking 8 minutes to write to /dbfs/FileStore. I would like to understand why write performance is different in both cases.

Problem statement: source file format: .tar.gz; average size: 10 MB; number of tar.gz files: 1000; each tar.gz file contains around 20,000 CSV files. Requirement: untar the tar.gz files and write the CSV files to blob storage / an intermediate storage layer for further...

[Attachment: databricks_write_to_dbfsMount]
Latest Reply
Kaniz
Community Manager
  • 8 kudos

Hi @Hubert Dudek, I just wanted to thank you. We're so lucky to have customers like you! The way you are helping our community is incredible.

4 More Replies
sannycse
by New Contributor II
  • 859 Views
  • 2 replies
  • 3 kudos

Resolved! display password as shown in example using spark scala

The table has the following columns: First_Name, Last_Name, Department_Id, Contact_No, Hire_Date. Display the employee First_Name, the count of characters in the first name, and the password. The password should be the first 4 letters of the first name in lower case and the date and ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

@SANJEEV BANDRU, SELECT CONCAT(substring(First_Name, 0, 2), substring(Hire_Date, 0, 2), substring(Hire_Date, 3, 2)) AS password FROM table; If Hire_Date is a timestamp, you may need to add date_format().
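The same rule can be sketched in plain Python for cross-checking. The post's spec is truncated, so the exact date portion is an assumption here: first 4 letters of the first name in lower case, followed by the day and month of the hire date.

```python
def make_password(first_name: str, hire_date: str) -> str:
    """Build a password from a name and a hire date.

    Assumes hire_date is an ISO 'YYYY-MM-DD' string; the day+month suffix
    is an illustrative guess at the truncated requirement in the post.
    """
    year, month, day = hire_date.split("-")
    return first_name[:4].lower() + day + month

# e.g. make_password("Steven", "2020-01-26") -> "stev2601"
```

If Hire_Date arrives as a timestamp, it would need to be formatted to a date string first (the SQL reply's date_format() point).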

1 More Replies
Syed1
by New Contributor III
  • 8002 Views
  • 9 replies
  • 13 kudos

Resolved! Python Graph not showing

Hi, I have run this code: import matplotlib.pyplot as plt; import numpy as np; plt.style.use('bmh'); %matplotlib inline; x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]); y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]); p = plt.scatter(x, y). The display command r...
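A runnable version of the post's scatter plot, using a non-interactive backend so it also works outside a notebook. In a Databricks notebook the usual fixes are running `%matplotlib inline` before plotting and calling `plt.show()` (or `display(fig)`) after building the figure.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; no display required
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

plt.style.use("bmh")
fig, ax = plt.subplots()
ax.scatter(x, y)

# Render to an in-memory PNG; in a notebook you would call
# plt.show() or display(fig) here instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```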

Latest Reply
User16725394280
Contributor II
  • 13 kudos

@Syed Ubaid, I tried with 7.3 LTS and it works fine.

8 More Replies
Anonymous
by Not applicable
  • 3464 Views
  • 12 replies
  • 13 kudos

Resolved! Not able to run notebook even when cluster is running and databases/tables are not visible in "data" tab.

We are using Databricks on AWS. I am not able to run a notebook even when the cluster is running. When I run a cell, it returns "cancel". When I check the event log for the cluster, it shows "Metastore is down". I couldn't see any databases or tables that i...

Latest Reply
User16753725182
Contributor III
  • 13 kudos

This means the network is fine, but something in the Spark config is amiss. What are the DBR version and the Hive version? Please check if you are using a compatible version. If you don't specify any version, it will take 1.3 and you wouldn't have to us...

11 More Replies
p42af
by New Contributor
  • 2951 Views
  • 4 replies
  • 1 kudos

Resolved! rdd.foreachPartition() does nothing?

I expected the code below to print "hello" for each partition and "world" for each record, but when I ran it there were no printouts of any kind. No errors either. What is happening here? %scala val rdd = spark.sparkContext.parallelize(S...
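One likely explanation, sketched without Spark: functions passed to rdd.foreachPartition() run on the executors, so their print output goes to the executor logs, not the notebook. A common way to see the output on the driver is to use mapPartitions() and collect the results instead. The partition data below is a hypothetical stand-in for the RDD.

```python
partitions = [["a", "b"], ["c"]]  # stand-in for an RDD with two partitions

def tag_partition(records):
    yield "hello"                  # once per partition
    for record in records:
        yield f"world {record}"    # once per record

# Analogue of rdd.mapPartitions(tag_partition).collect():
# gather every partition's output back on the "driver".
collected = [line for part in partitions for line in tag_partition(iter(part))]
```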

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

It is lazily evaluated, so you need to trigger an action, I guess.

3 More Replies
RKNutalapati
by Valued Contributor
  • 2371 Views
  • 7 replies
  • 3 kudos

Resolved! Copy CDF enabled delta table from one location to another by retaining history

I am currently doing some use-case testing. I have to CLONE a Delta table with CDF enabled to a different S3 bucket. Deep clone doesn't meet the requirement, so I tried to copy the files using dbutils.fs.cp; it copies all the versions, but the tim...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Rama Krishna N, because you're on Azure, you can use Azure Data Factory's Copy Data tool, as described in the documentation - Delta tables are just files in the container, and this tool can copy data; potentially it would be cheaper tha...

6 More Replies
rohit2
by New Contributor
  • 436 Views
  • 0 replies
  • 0 kudos

Getting this issue; how do I resolve it?

Run result unavailable: job failed with error message "Unexpected failure while waiting for the cluster (0425-153803-z370dv77) to be ready." Cause: Unexpected state for cluster (job-1136322-run-1778866): Init scripts failed. instance_id: i-00d2e3661a2420...

Maverick1
by Valued Contributor II
  • 1832 Views
  • 5 replies
  • 7 kudos

How to deploy mlflow models to sagemaker endpoints where sagemaker refers the private docker registry?

Is it possible to deploy the mlflow model to a sagemaker endpoint where the image URL is not referring to an image in ECR but the image is actually present in a private docker registry?

Latest Reply
Atanu
Esteemed Contributor
  • 7 kudos

@Saurabh Verma, use this to create the endpoint. Also, check this out - https://github.com/mlflow/mlflow/blob/0fa849ad75e5733bf76cc14a4455657c5c32f107/mlflow/sagemaker/__init__.py#L361

4 More Replies
ANOOP_V
by New Contributor II
  • 1198 Views
  • 4 replies
  • 3 kudos

Resolved! DataBricks Job Orchestration in PROD

Can I suggest Databricks Job orchestration (public preview) to a customer? Can we use this feature in production as well?

Latest Reply
Atanu
Esteemed Contributor
  • 3 kudos

@ANOOP V, at present we don't have these features. I assume we plan to include them by Q4 FY2022. What is next for multitask jobs? After GA we will be working on some highly requested features from the private preview: job cluster reuse: make it...

3 More Replies
Jeff1
by Contributor II
  • 2370 Views
  • 4 replies
  • 4 kudos

Resolved! How to convert lat/long to geohash in databricks using geohashTools R library

I continue to receive a parsing error when attempting to convert lat/long data to a geohash in Databricks. I've tried two coding methods in R and get the same error: library(geohashTools); Method #1: my_tbl$geo_hash <- gh_encode(my_tbl$Latitude, my_tbl...

Latest Reply
Jeff1
Contributor II
  • 4 kudos

The problem was that I was trying to run the gh_encode function on a Spark DataFrame. I needed to collect the data into an R data frame and then run the function.
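The same pattern applies in PySpark (collect with toPandas() before applying a local function). For cross-checking gh_encode results outside R, here is a minimal pure-Python encoder implementing the standard geohash algorithm (base-32 alphabet, interleaved longitude/latitude bits); it is an illustration, not the geohashTools implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=9):
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # a geohash starts with a longitude bit
    while len(bits) < precision * 5:
        rng, val = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):  # 5 bits per base-32 character
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

# e.g. geohash_encode(42.605, -5.603, 5) -> "ezs42"
```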

3 More Replies