Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Tahseen0354
by Valued Contributor
  • 10503 Views
  • 13 replies
  • 35 kudos

How do I compare cost between Databricks on GCP and Azure Databricks?

I have a Databricks job running in Azure Databricks. A similar job is also running in Databricks on GCP. I would like to compare the cost. If I assign a custom tag to the job cluster running in Azure Databricks, I can see the cost incurred by that job i...

Latest Reply
Own
Contributor
  • 35 kudos

In Azure, you can use Cost Management to track the expenses incurred by your Databricks instance.
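One way to make the two bills comparable, sketched below with placeholder values, is to attach the same custom tag to both job clusters; in Azure the tag shows up as a filter in Cost Management, and on GCP custom tags should surface as labels in the billing export:

```json
{
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "custom_tags": {
      "CostCenter": "job-cost-comparison"
    }
  }
}
```

The node_type_id would differ per cloud (e.g. an n2-series machine type on GCP); the tag key and value here are made up and just need to match on both sides.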

12 More Replies
ossinova
by Contributor II
  • 1370 Views
  • 1 reply
  • 0 kudos

Schedule reload of system.information_schema for external tables in platform

Probably not feasible, but is there a way to update (via stored procedure, function, or SQL query) the information schema of all external tables within Databricks? The last update I can see was when I converted the tables to Unity. From my understa...

Latest Reply
Own
Contributor
  • 0 kudos

You can try OPTIMIZE and caching on the internal tables, such as the schema tables, to fetch updated information.
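One thing worth noting is that in Unity Catalog the information_schema views are system-maintained, so they cannot be rewritten directly; what you can refresh is Spark's own metadata and statistics for a table. A hedged sketch (the table name is hypothetical):

```sql
-- Drop Spark's cached metadata for the external table
REFRESH TABLE main.default.my_external_table;

-- Recompute table statistics so catalog metadata reflects current files
ANALYZE TABLE main.default.my_external_table COMPUTE STATISTICS;
```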

rammy
by Contributor III
  • 3014 Views
  • 3 replies
  • 11 kudos

How would I retrieve JSON data with namespaces using Spark SQL?

File.json from the code below contains huge JSON data, with each key carrying a namespace prefix (this JSON file was converted from an XML file). I am able to retrieve records if the JSON does not contain namespaces, but what could be the approach to retrieve record...

Latest Reply
SS2
Valued Contributor
  • 11 kudos

In case of a struct, you can use dot notation (.) for extracting the value.
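For namespace-prefixed keys specifically, the field names contain a colon, so in Spark SQL they have to be quoted with backticks before dot notation works. A sketch with hypothetical field names:

```sql
-- Backticks let you address struct fields whose names contain ':'
SELECT `soap:Envelope`.`soap:Body`.`ns:record` AS record
FROM json.`/path/to/File.json`;
```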

2 More Replies
allan-silva
by New Contributor III
  • 3868 Views
  • 3 replies
  • 4 kudos

Resolved! Can't create database - UnsupportedFileSystemException No FileSystem for scheme "dbfs"

I'm following a class "DE 3.1 - Databases and Tables on Databricks", but it is not possible create databases due to "AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.Unsupp...

Latest Reply
allan-silva
New Contributor III
  • 4 kudos

A colleague from my work figured out the problem: the cluster being used wasn't configured to use DBFS when running notebooks.

2 More Replies
Shiva_Dsouz
by New Contributor II
  • 1854 Views
  • 1 reply
  • 1 kudos

How to get Spark Streaming metrics like input rows, processed rows, and batch duration into Prometheus for monitoring

I have been reading this article https://www.databricks.com/session_na20/native-support-of-prometheus-monitoring-in-apache-spark-3-0 and it has been mentioned that we can get the spark streaming metrics like input rows, processing rate and batch dura...

Latest Reply
SS2
Valued Contributor
  • 1 kudos

I think you can use the Spark UI to see deeper-level details.
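For Prometheus specifically, the talk linked in the question describes the native PrometheusServlet sink added in Spark 3.0. A configuration sketch (exact endpoints and availability may differ on Databricks):

```properties
# Spark configuration
spark.ui.prometheus.enabled true
spark.sql.streaming.metricsEnabled true

# metrics.properties (or the equivalent spark.metrics.conf.* settings)
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
```

Streaming progress numbers such as input rows and batch duration are also exposed per query through the StreamingQueryListener API if the servlet route is not an option.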

andalo
by New Contributor II
  • 2587 Views
  • 3 replies
  • 2 kudos

Databricks cluster failure

Can you help me with the following error? Message: Cluster terminated. Reason: Azure VM Extension Failure. Help: Instance bootstrap failed. Failure message: Cloud Provider Failure. Azure VM Extension stuck on transitioning state. Please try again later. VM extensio...

Latest Reply
SS2
Valued Contributor
  • 2 kudos

You can restart the cluster and check again.

2 More Replies
mickniz
by Contributor
  • 3998 Views
  • 6 replies
  • 10 kudos

What is the best way to handle dropping and renaming a column in schema evolution?

I would need some suggestions from Databricks folks. As per the documentation on schema evolution, for drop and rename the data is overwritten. Does that mean we lose data (because I read the data is not deleted but kind of staged)? Is it possible to query old da...

Latest Reply
SS2
Valued Contributor
  • 10 kudos

The overwrite option will overwrite your data. If you want to change a column name, you can first alter the Delta table as needed and then append the new data. That way you can resolve both problems.
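For the rename case specifically, a hedged Spark SQL sketch (table and column names are hypothetical): with Delta column mapping enabled, RENAME COLUMN becomes a metadata-only change, and older data remains queryable through time travel rather than being lost:

```sql
-- Enable column mapping (requires upgrading the table protocol)
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

-- Metadata-only rename; no data files are rewritten
ALTER TABLE my_table RENAME COLUMN old_name TO new_name;

-- Earlier versions stay available via time travel
SELECT * FROM my_table VERSION AS OF 0;
```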

5 More Replies
Shirley
by New Contributor III
  • 8480 Views
  • 12 replies
  • 8 kudos

Cluster terminated after 120 mins and cannot restart

Last night the cluster was working properly, but this morning the cluster was terminated automatically and cannot be restarted. Got an error message under sparkUI: Could not find data to load UI for driver 5526297689623955253 in cluster 1125-062259-i...

Latest Reply
SS2
Valued Contributor
  • 8 kudos

Then you can use it.

11 More Replies
kodvakare
by New Contributor III
  • 5324 Views
  • 5 replies
  • 9 kudos

Resolved! How to write same code in different locations in the DB notebook?

The old version of the notebook had this feature, where you could Ctrl+click on different positions in a notebook cell to bring the cursor there, and type to update the code in both the positions like in JupyterLab. The newer version is awesome but s...

[Image: old Databricks version, updating multiple positions like the Jupyter IDE]
Latest Reply
SS2
Valued Contributor
  • 9 kudos

Alt+click works fine.

4 More Replies
SindhujaRaghupa
by New Contributor II
  • 9181 Views
  • 2 replies
  • 1 kudos

Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.NullPointerException

I have uploaded a CSV file which has well-formatted data, and I was trying to use display(questions) where questions = spark.read.option("header", "true").csv("/FileStore/tables/Questions.csv"). This is throwing an error as follows: SparkException: Job abo...

Latest Reply
SS2
Valued Contributor
  • 1 kudos

You can use the inferSchema option.
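A sketch of that suggestion in Spark SQL, using the path from the question (the PERMISSIVE mode option is an extra assumption that tolerates malformed rows instead of failing the job):

```sql
CREATE OR REPLACE TEMPORARY VIEW questions
USING CSV
OPTIONS (
  path '/FileStore/tables/Questions.csv',
  header 'true',
  inferSchema 'true',
  mode 'PERMISSIVE'
);

SELECT * FROM questions LIMIT 10;
```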

1 More Replies
pkgltn
by New Contributor III
  • 999 Views
  • 0 replies
  • 0 kudos

Mounting an Azure Storage Account path on Databricks

Hi, I have a Databricks instance and I mounted the Azure Storage Account. When I run the following command, the output is: ExecutionError: An error occurred while calling o1168.ls.: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util...

Muthumk255
by New Contributor
  • 2008 Views
  • 2 replies
  • 0 kudos

Cannot sign in at the Databricks partner-academy portal

Hi there, I used my company email to register an account for learning.databricks.com a while back. Now what I need to do is create an account with partner-academy.databricks.com using my company email too. However, when I register at part...

Latest Reply
Harshjot
Contributor III
  • 0 kudos

Hi @Muthukrishnan Balasubramanian, I got the same issue a while back. What worked for me was registering with a personal account on Partner Academy and later changing my email to my work email. Not sure if it's the best way to sort the issue.

1 More Replies
db-avengers2rul
by Contributor II
  • 2200 Views
  • 1 reply
  • 0 kudos

Resolved! zip file not able to import in workspace

Dear Team, using the Community Edition, when I try to import a zip file it always throws an error.

Latest Reply
db-avengers2rul
Contributor II
  • 0 kudos

Please refer to the error in the attachment. My question: is this restriction only for Community Edition, or also for a premium account?

yang
by New Contributor II
  • 1519 Views
  • 1 reply
  • 2 kudos

Resolved! Error in DE 4.1 - DLT UI Walkthrough (from Data Engineering with Databricks v3 course)

I am working on the Data Engineering with Databricks v3 course. In notebook DE 4.1 - DLT UI Walkthrough, I encountered an error in cmd 11: DA.validate_pipeline_config(pipeline_language). The error message is: AssertionError: Expected the parameter "suite" to...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

The DA validate function is just to check that you named the pipeline correctly, set up the correct number of workers, 0, and other configurations. The name and directory aren't crucial to the learning process. The goal is to get familiar with the ...

eimis_pacheco
by Contributor
  • 1259 Views
  • 1 reply
  • 1 kudos

How to remove more than 4 byte characters using PySpark in Databricks?

Hi community, we need to remove more than 4 byte characters using PySpark in Databricks, since these are not supported by Amazon Redshift. Does someone know how I can accomplish this? Thank you very much in advance. Regards

Latest Reply
Shalabh007
Honored Contributor
  • 1 kudos

Assuming you have a string-type column in a PySpark DataFrame, one possible way could be: identify the total number of characters for each value in the column (say n); identify the number of bytes taken by each character (say b); use the substring() function to select the first...
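The steps above can also be collapsed into a single regular expression, assuming the goal is to drop supplementary-plane characters, i.e. the ones whose UTF-8 encoding takes 4 bytes (the function name is made up; shown as plain Python, with the equivalent PySpark call noted in a comment):

```python
import re

# Code points at U+10000 and above are the ones encoded as 4 bytes in UTF-8.
FOUR_BYTE_RE = re.compile(r"[\U00010000-\U0010FFFF]")

def strip_4byte_chars(text: str) -> str:
    """Remove characters whose UTF-8 encoding is 4 bytes long."""
    return FOUR_BYTE_RE.sub("", text)

# In PySpark, the same idea should work column-wise with Java regex syntax:
#   F.regexp_replace("col", r"[\x{10000}-\x{10FFFF}]", "")
```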


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
