Data Engineering

Forum Posts

BorislavBlagoev
by Valued Contributor III
  • 1554 Views
  • 2 replies
  • 2 kudos

Resolved! Converting dataframe to delta.

Is it possible to convert a DataFrame to a Delta table without saving the DataFrame to storage?

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

No, it only becomes a Delta table when you write it out.
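
A minimal sketch of that write step (table name and path are placeholders):

```python
# Persist the DataFrame as a managed Delta table; only at this point
# does it become a Delta table.
df.write.format("delta").mode("overwrite").saveAsTable("events")

# Alternatively, write Delta files to a storage path directly.
df.write.format("delta").mode("overwrite").save("/mnt/datalake/events")
```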

1 More Replies
bluetail
by Contributor
  • 11443 Views
  • 6 replies
  • 5 kudos

Resolved! ModuleNotFoundError: No module named 'mlflow' when running a notebook

I am running a notebook on the Coursera platform. My configuration file, Classroom-Setup, looks like this:

%python
spark.conf.set("com.databricks.training.module-name", "deep-learning")
spark.conf.set("com.databricks.training.expected-dbr", "6.4")
...

Latest Reply
User16753724663
Valued Contributor
  • 5 kudos

Hi @Maria Bruevich​, from the error description it looks like the mlflow library is not present. You can use an ML cluster, as that type of cluster already has the mlflow library installed. Please check the document below: https://docs.databricks.com/release-notes/r...
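
If switching to an ML runtime is not an option, one common alternative (an assumption on my part, not something from this thread) is a notebook-scoped install:

```python
# Run this in its own notebook cell; it installs mlflow for the session.
%pip install mlflow
```

After that, `import mlflow` in the next cell should succeed.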

5 More Replies
DanVartanian
by New Contributor II
  • 3160 Views
  • 4 replies
  • 1 kudos

Resolved! Help trying to calculate a percentage

The image below shows what my source data is (HAVE) and what I'm trying to get to (WANT). I want to be able to calculate the percentage of bad messages (where formattedMessage = false) by source and date. I'm not sure how to achieve this in DatabricksS...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

You could use a window function over source and date with a sum of messageCount; this gives you the total per source/date repeated on every row. Then filter on formattedMessage == false and divide messageCount by that sum.
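
A quick PySpark sketch of that suggestion (the DataFrame and column names are taken from the question and may need adjusting):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Total message count per source/date, repeated on every row.
w = Window.partitionBy("source", "date")

bad_pct = (
    df.withColumn("total", F.sum("messageCount").over(w))
      # Keep only the bad messages; adjust if formattedMessage is a boolean.
      .filter(F.col("formattedMessage") == "false")
      .withColumn("badPercentage", 100 * F.col("messageCount") / F.col("total"))
)
```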

3 More Replies
SettlerOfCatan
by New Contributor
  • 1510 Views
  • 0 replies
  • 0 kudos

Access data within the blob storage without downloading

Our customer is using Azure’s blob storage service to save big files so that we can work with them using an Azure online service, like Databricks. We want to read and work with these files with a computing resource obtained from Azure directly, without d...
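
The thread has no replies, but the usual pattern here (an illustrative sketch with placeholder account, container, and secret names) is to point Spark at the blob path and read the data in place, with nothing downloaded first:

```python
# Authenticate to the storage account with an account key kept in a secret scope.
spark.conf.set(
    "fs.azure.account.key.myaccount.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"),
)

# Read the file directly from storage; the data stays in Azure.
df = spark.read.json("abfss://mycontainer@myaccount.dfs.core.windows.net/big/file.json")
```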

blob-storage Azure-ML filetypes blob
Azure_Data_Eng1
by New Contributor
  • 286 Views
  • 0 replies
  • 0 kudos

data=[['x', 20220118, 'FALSE', 3],['x', 20220118, 'TRUE', 97],['x', 20220119, 'FALSE', 1],['x'...

data = [['x', 20220118, 'FALSE', 3], ['x', 20220118, 'TRUE', 97],
        ['x', 20220119, 'FALSE', 1], ['x', 20220119, 'TRUE', 49],
        ['Y', 20220118, 'FALSE', 100], ['Y', 20220118, 'TRUE', 900],
        ['Y', 20220119, 'FALSE', 200], ['Y', 20220119, 'TRUE', 800]]
df = spark.creat...

prasadvaze
by Valued Contributor
  • 3307 Views
  • 8 replies
  • 2 kudos

Resolved! SQL endpoint is unable to connect to external hive metastore ( Azure databricks)

Using Azure Databricks, I have set up a SQL endpoint with connection details that match the global init script. I am able to browse tables from a regular cluster in the Data Engineering module, but I get the error below when trying a query using the SQL endpoint...

Latest Reply
prasadvaze
Valued Contributor
  • 2 kudos

@Prabakar Ammeappin​ @Kaniz Fatma​ Also, I found out that after the Delta table is created in the external metastore (with the table data residing in ADLS), I do not need to provide ADLS connection details in the SQL endpoint settings. I only provided...
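
For reference, a sketch of the kind of data access configuration a SQL endpoint needs for an external Hive metastore (the property names follow the standard external-metastore settings; every value is a placeholder):

```
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<server>.database.windows.net:1433;database=<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/<scope>/<key>}}
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.sql.hive.metastore.version <hive-version>
spark.sql.hive.metastore.jars <jars-path-or-builtin>
```

Note, per the reply above, that no ADLS keys are needed here when the table's data location is already registered in the metastore.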

7 More Replies
Soma
by Valued Contributor
  • 2110 Views
  • 3 replies
  • 1 kudos

Resolved! AutoLoader with Custom Queue

Hi everyone, can someone help with creating a custom queue for Auto Loader, as given here? The default FlushWithClose event is not getting created when my data is uploaded to blob storage. As given in the Azure Databricks docs: cloudFiles.queueName — the name of the Azure queue. If...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

You need to set up the notification service for Blob/ADLS as described here: https://docs.databricks.com/spark/latest/structured-streaming/auto-loader-gen2.html#cloud-resource-management. setUpNotificationServices will return a queue name, which can later be used in au...
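
A sketch of wiring a pre-created queue into an Auto Loader stream (the format, queue name, and path are placeholders):

```python
# Use file-notification mode with an existing queue instead of letting
# Auto Loader create its own; the queue must already receive the
# storage account's Blob-created events.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.queueName", "my-event-queue")
    .load("abfss://container@account.dfs.core.windows.net/input")
)
```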

2 More Replies
prasadvaze
by Valued Contributor
  • 4710 Views
  • 2 replies
  • 4 kudos

Resolved! How to select from a very large column ( string ) of delta table ?

In one of my Delta tables, the string column "abc" has a value 1753484 characters long. I get an error while selecting or transforming this column value (in the downstream application). How do I solve this? SELECT ID, abc, length(abc) as ...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @prasad vaze​, try using the CHAR_LENGTH function.
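
A sketch of the suggested query (the table name is a placeholder; ID and abc come from the question). Selecting only the length avoids pulling the 1.7-million-character value itself into the client:

```python
spark.sql("SELECT ID, CHAR_LENGTH(abc) AS abc_len FROM my_table").show()
```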

1 More Replies
sonali1996
by New Contributor
  • 1914 Views
  • 2 replies
  • 0 kudos

Resolved! Multithreading in SCALA DATABRICKS

Hi team, I was trying to call/run multiple notebooks from one notebook concurrently, but the called notebooks are executing one by one, whereas I need to run them all concurrently. I have also tried using threading in Scala Databri...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Sonali Bhatt​, this documentation might help you: https://databricks.com/blog/2016/08/30/notebook-workflows-the-easiest-way-to-implement-apache-spark-pipelines.html
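
The linked post covers notebook workflows; for the concurrency itself, one common pattern (sketched here in Python; the Scala version is analogous) is to run each blocking dbutils.notebook.run call on its own thread:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder notebook paths. Each dbutils.notebook.run call blocks until
# the child notebook finishes, so separate threads give true concurrency.
paths = ["/Shared/notebook_a", "/Shared/notebook_b", "/Shared/notebook_c"]

with ThreadPoolExecutor(max_workers=len(paths)) as pool:
    results = list(pool.map(lambda p: dbutils.notebook.run(p, 3600), paths))
```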

1 More Replies
Soma
by Valued Contributor
  • 1148 Views
  • 4 replies
  • 2 kudos

Resolved! Query RestAPI end point in Databricks Standard Workspace

Do we have an option to query a Delta table using a Standard workspace as an endpoint, instead of JDBC?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@somanath Sankaran​ - Would you be happy to mark @Hubert Dudek​'s answer as best if it resolved the problem? That helps other members who are searching for answers find the solution more quickly.

3 More Replies
MattM
by New Contributor III
  • 1746 Views
  • 5 replies
  • 5 kudos

Resolved! Schema Parsing issue when datatype of source field is mapped incorrect

I have a complex JSON file which has a massive struct column. We regularly have issues when we try to parse this JSON file by forming our case class to extract the fields from the schema. With this approach, the issue we are facing is that if one data type of...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hey there, @Matt M​ - If @Hubert Dudek​'s response solved the issue, would you be happy to mark his answer as best? It helps other members find the solution more quickly.

4 More Replies
BorislavBlagoev
by Valued Contributor III
  • 2408 Views
  • 9 replies
  • 3 kudos

Resolved! Tring to create incremental pipeline but fails when I try to use outputMode "update"

def upsertToDelta(microBatchOutputDF, batchId):
    microBatchOutputDF.createOrReplaceTempView("updates")

    microBatchOutputDF._jdf.sparkSession().sql("""
        MERGE INTO old o
        USING updates u
        ON u.id = o.id
        WHEN MATCHED THEN UPDATE SE...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

The Delta table/file version is too old. Please try to upgrade it as described here: https://docs.microsoft.com/en-us/azure/databricks/delta/versioning
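
A sketch of the upgrade step the linked page describes (the table name comes from the question's MERGE statement; the exact reader/writer versions depend on which Delta features you need, so treat these numbers as placeholders):

```python
# Raise the table's protocol versions so newer Delta features work.
spark.sql("""
    ALTER TABLE old SET TBLPROPERTIES (
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")
```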

8 More Replies
Alex_Persin
by New Contributor II
  • 3015 Views
  • 2 replies
  • 2 kudos

How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...

Latest Reply
mstuder
New Contributor II
  • 2 kudos

Also interested in increasing shared memory for use with Ray.

1 More Replies
hetadesai
by New Contributor II
  • 4385 Views
  • 3 replies
  • 4 kudos

Resolved! How to download zip file from SFTP location and put that file into Azure Data Lake and unzip there ?

I have a zip file on an SFTP location. I want to copy that file from the SFTP location, put it into Azure Data Lake, and unzip it there using a Spark notebook. Please help me solve this.
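
The accepted answer isn't shown above, but one workable approach (an illustrative sketch, not necessarily what the thread settled on; host, credentials, and paths are placeholders) is to fetch the file with paramiko, unzip it on the driver, and copy the result to the lake:

```python
import zipfile
import paramiko

# Download the zip from SFTP to the driver's local disk.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password=dbutils.secrets.get("my-scope", "sftp-pw"))
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/remote/data.zip", "/tmp/data.zip")
sftp.close()
transport.close()

# Unzip locally, then copy the extracted files to the data lake mount.
with zipfile.ZipFile("/tmp/data.zip") as zf:
    zf.extractall("/tmp/data")
dbutils.fs.cp("file:/tmp/data", "dbfs:/mnt/datalake/data", True)
```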

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @heta desai​, did our suggestions help you?

2 More Replies
Disney
by New Contributor II
  • 698 Views
  • 1 reply
  • 5 kudos

Resolved! We have hundreds of ETL processes (Informatica) with a lot of logic pulling various data from applications into a relational DB (target DB). Can we use Delta Lake as the target DB?

Hi DB Support, can we use Databricks' Delta Lake as our target DB? Here's our situation... We have hundreds of ETL jobs pulling from these sources (SAP, Siebel/Oracle, Cognos, Postgres). Our ETL process has all of the logic, and our target DB is an MPP syst...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Hi, yes you can. The best approach is to create a SQL endpoint in a Premium workspace and just write to Delta Lake as you would to SQL. This is a community forum, not support; you can contact Databricks via https://databricks.com/company/contact or via AWS/Azure if you have su...
