Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 8324 Views
  • 12 replies
  • 13 kudos

Resolved! Not able to run notebook even when cluster is running and databases/tables are not visible in "data" tab.

We are using Databricks on AWS. I am not able to run a notebook even when the cluster is running. When I run a cell, it returns "cancel". When I check the event log for the cluster, it shows "Metastore is down". Couldn't see any databases or tables that I...

Latest Reply
User16753725182
Databricks Employee
  • 13 kudos

This means the network is fine, but something in the Spark config is amiss. What are the DBR version and the Hive version? Please check if you are using a compatible version. If you don't specify any version, it will take 1.3 and you wouldn't have to us...

  • 13 kudos
11 More Replies
p42af
by New Contributor
  • 5201 Views
  • 4 replies
  • 1 kudos

Resolved! rdd.foreachPartition() does nothing?

I expected the code below to print "hello" for each partition, and "world" for each record. But when I ran it, the code ran but produced no printouts of any kind. No errors either. What is happening here?
%scala   val rdd = spark.sparkContext.parallelize(S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

I guess it is lazily evaluated, so you need to trigger an action.

  • 1 kudos
3 More Replies
KC_1205
by New Contributor III
  • 2949 Views
  • 2 replies
  • 3 kudos

Resolved! NumPy update 1.18-1.21

Hi all, I am planning to update the Databricks runtime from 7.3 LTS to 9.1 LTS. The corresponding NumPy version will be 1.19, and later I would like to update to 1.21 in the notebooks. On the cluster I have the Spark version for 9.1 LTS, which supports 1.19, and notebook ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Kiran Chalasani, according to the docs, DBR 7.3 LTS comes with NumPy 1.18.1 (https://docs.databricks.com/release-notes/runtime/7.3.html) and DBR 9.1 LTS comes with NumPy 1.19.2 (https://docs.databricks.com/release-notes/runtime/9.1.html). If you need t...

  • 3 kudos
1 More Replies
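
A minimal sketch of the version check this thread is about, using only the two runtime/NumPy pairs quoted in the reply above. The helper names (`parse_version`, `needs_upgrade`) are made up for illustration and the mapping covers nothing beyond those two runtimes.

```python
# NumPy versions bundled with each runtime, as quoted in the reply above.
DBR_NUMPY = {"7.3 LTS": "1.18.1", "9.1 LTS": "1.19.2"}


def parse_version(text):
    """Turn '1.19.2' into (1, 19, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(runtime, wanted):
    """Return True if the runtime's bundled NumPy is older than `wanted`."""
    return parse_version(DBR_NUMPY[runtime]) < parse_version(wanted)


print(needs_upgrade("9.1 LTS", "1.21.0"))
print(needs_upgrade("9.1 LTS", "1.19.2"))
```

In a notebook, the bundled version could be read from `numpy.__version__` instead of the hard-coded table.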
RKNutalapati
by Valued Contributor
  • 4944 Views
  • 4 replies
  • 3 kudos

Resolved! Copy CDF enabled delta table from one location to another by retaining history

I am currently doing some use case testing. I have to clone a Delta table with CDF enabled to a different S3 bucket. Deep clone doesn't meet the requirement, so I tried to copy the files using dbutils.fs.cp. It copies all the versions, but the tim...

ernijed
by New Contributor II
  • 7705 Views
  • 3 replies
  • 3 kudos

Resolved! Error in SQL statement: SparkFatalException. How to fix it?

When I try to execute a SQL query (2 joins), I get the message below: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

@Erni Jed, I tested, and your query is OK, so it has to be some other issue. Maybe you could try it on a smaller data set. Please also analyze and debug it using the Spark UI.

  • 3 kudos
2 More Replies
Surendra
by New Contributor III
  • 8970 Views
  • 3 replies
  • 6 kudos

Resolved! Databricks notebook is taking 2 hours to write to /dbfs/mnt (blob storage). Same job is taking 8 minutes to write to /dbfs/FileStore. I would like to understand why write performance is different in both cases.

Problem statement:
Source file format: .tar.gz
Avg size: 10 MB
Number of tar.gz files: 1000
Each tar.gz file contains around 20000 CSV files.
Requirement: Untar the tar.gz file and write CSV files to blob storage / intermediate storage layer for further...

Latest Reply
Surendra
New Contributor III
  • 6 kudos

@Hubert Dudek Thanks for your suggestions. After creating a storage account in the same region as Databricks, I can see that performance is as expected. Now it is clear that the issue is the /mnt/ location being in a different region than Databricks. I would ...

  • 6 kudos
2 More Replies
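
The requirement in the question (untar each archive and write out its CSV files) can be sketched with the Python standard library. This is only an illustration: `extract_csvs` and the sample file names are made up, the target directory would be whatever local or mounted path applies, and the sketch does nothing about the cross-region latency that the thread identified as the actual cause of the slow writes.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_csvs(archive_path, target_dir):
    """Extract only the .csv members of a .tar.gz archive into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                # Keep only the base name, so members cannot escape target_dir.
                # Note: members with the same base name would overwrite each other.
                out_path = target / Path(member.name).name
                out_path.write_bytes(archive.extractfile(member).read())
                written.append(out_path.name)
    return sorted(written)


# Demo on a tiny throwaway archive (file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "sample.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        for name, data in [("a.csv", b"x,y\n1,2\n"),
                           ("notes.txt", b"skip me"),
                           ("sub/c.csv", b"x\n3\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    extracted = extract_csvs(archive_path, Path(tmp) / "out")

print(extracted)
```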
arkadiuszr
by New Contributor III
  • 2818 Views
  • 3 replies
  • 1 kudos

Resolved! Failure during cluster launch

Hi all, I am migrating to Databricks E2 from an older one. I moved the cluster definitions from the old Databricks instance and also created new ones. Databricks tries to start a cluster for an hour and then fails. This happens for modes: Single Node ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please check:
CPU quotas: request an increase anyway (https://go.aws/3EvY1fX) and use pools for better control, as old instances can linger for a moment after termination.
The network configuration. Maybe it is downloading somethin...

  • 1 kudos
2 More Replies
rohit2
by New Contributor
  • 759 Views
  • 0 replies
  • 0 kudos

Getting this issue, how to resolve?

Run result unavailable: job failed with error message Unexpected failure while waiting for the cluster (0425-153803-z370dv77) to be ready. Cause Unexpected state for cluster (job-1136322-run-1778866): Init scripts failed. instance_id: i-00d2e3661a2420...

Maverick1
by Valued Contributor II
  • 3234 Views
  • 3 replies
  • 6 kudos

How to deploy MLflow models to SageMaker endpoints where SageMaker refers to a private Docker registry?

Is it possible to deploy an MLflow model to a SageMaker endpoint where the image URL does not refer to an image in ECR, but the image is instead in a private Docker registry?

Latest Reply
Atanu
Databricks Employee
  • 6 kudos

@Saurabh Verma, this to create the endpoint. Also, check this out: https://github.com/mlflow/mlflow/blob/0fa849ad75e5733bf76cc14a4455657c5c32f107/mlflow/sagemaker/__init__.py#L361

  • 6 kudos
2 More Replies
IgnacioCastinei
by New Contributor III
  • 52984 Views
  • 9 replies
  • 5 kudos

Resolved! Download a dbfs:/FileStore File to my Local Machine?

Hi all, I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result. I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all fil...

Latest Reply
CraigJ
New Contributor II
  • 5 kudos

Works well if the file is stored in FileStore. However, if it is stored in the mnt folder, you will need something like this: https://community.cloud.databricks.com/dbfs/mnt/blob/<file_name>.csv?o=<your_number_here>
Note that this will prompt you for yo...

  • 5 kudos
8 More Replies
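
A small sketch of how the URL pattern quoted in the reply above could be assembled for a file under the mnt folder. The function name `mnt_download_url`, the file name, and the `o=` number are placeholders, and the sketch only builds the string; it does not handle the login the reply mentions.

```python
from urllib.parse import quote, urlencode


def mnt_download_url(host, dbfs_path, org_id):
    """Build a download URL following the pattern https://<host>/dbfs/<path>?o=<id>."""
    path = quote(dbfs_path.strip("/"))  # slashes are kept, other characters escaped
    return f"https://{host}/dbfs/{path}?{urlencode({'o': org_id})}"


# Hypothetical file name and workspace number.
url = mnt_download_url("community.cloud.databricks.com",
                       "/mnt/blob/result.csv", "1234567890")
print(url)
```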
dewan
by New Contributor
  • 692 Views
  • 0 replies
  • 0 kudos

SIMEXBangladesh

SIMEX Bangladesh is one of the trusted construction companies in Bangladesh, always striving to build a safe ecosystem in the construction industry. For more details: https://simex.com.bd/highway-construction-company-in-bangladesh/

gideonvos
by New Contributor
  • 725 Views
  • 0 replies
  • 0 kudos

Databricks workspace API metadata

Hi, the API works great. However, when listing workspaces via API it would be great to also be able to get back extra metadata, for example, last modification date. Is this possible?

User16826992666
by Valued Contributor
  • 1962 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

  • 2 kudos
2 More Replies
sgannavaram
by New Contributor III
  • 9798 Views
  • 6 replies
  • 4 kudos

Resolved! How to get the last (previous) Databricks job run time?

How can I get the last Databricks job run time? I have a requirement where I need to pass the last job run time as an argument to a SQL query, and this query gets the records from a Snowflake database based on this timestamp.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Srinivas Gannavaram, hope you are well. Just wanted to see if you were able to find an answer to your question, and would you like to mark an answer as best? It would be really helpful for the other members. Cheers!

  • 4 kudos
5 More Replies
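
One possible way to approach the question, sketched under assumptions: the Jobs API runs list returns each run with a `start_time` in epoch milliseconds and a `state` object, so the previous successful run's timestamp can be picked out of that response. The payload below is made-up sample data, `last_successful_start` is a hypothetical helper, and the replies in the thread may have settled on a different approach.

```python
from datetime import datetime, timezone

# Hypothetical payload in the shape of a Jobs API runs list response.
sample_runs = {"runs": [
    {"run_id": 1, "start_time": 1650000000000,
     "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
    {"run_id": 2, "start_time": 1650086400000,
     "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"}},
    {"run_id": 3, "start_time": 1650172800000,
     "state": {"life_cycle_state": "RUNNING"}},
]}


def last_successful_start(payload):
    """Return the start time (UTC) of the most recent successful run, or None."""
    starts = [run["start_time"] for run in payload.get("runs", [])
              if run.get("state", {}).get("result_state") == "SUCCESS"]
    if not starts:
        return None
    return datetime.fromtimestamp(max(starts) / 1000, tz=timezone.utc)


last_run = last_successful_start(sample_runs)
print(last_run.isoformat())
```

The resulting timestamp could then be passed to the SQL query as a parameter rather than concatenated into the query text.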
