Data Engineering

Forum Posts

ImAbhishekTomar
by New Contributor III
  • 1480 Views
  • 2 replies
  • 1 kudos

Resolved! Trying to Flatten My Json using CosmosDB Spark connector - Azure Databricks

Hi, using the Cosmos DB query below I can get the expected output, but how can I do the same with Spark SQL in Databricks? COSMOSDB QUERY: select c.ReportId,c.ReportName,i.price,p as provider from c join i in in_network join p in i.pr...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Hi @Abhishek Tomar, if you want to get it from Cosmos DB, use the connector with a custom query: https://github.com/Azure/azure-cosmosdb-spark If you want to have the JSON imported directly by Databricks/Spark, please go with the solution below: SELECT ...

1 More Replies
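
To make the shape of the question concrete, here is a minimal plain-Python sketch of what the two Cosmos DB joins do: every element of `in_network` is paired with every element of its nested array. The nested field name is truncated in the post, so `inner_key` is a parameter and the `provider_list` name in the sample is purely hypothetical. In Spark the same result would normally come from exploding each array in turn (for example with `explode`), which is not shown here.

```python
def flatten_report(doc, inner_key):
    """Emit one row per (in_network item, nested provider) pair."""
    rows = []
    for item in doc.get("in_network", []):
        # inner_key is whatever the nested provider array is really called;
        # the post truncates that name, so it is passed in by the caller.
        for provider in item.get(inner_key, []):
            rows.append({
                "ReportId": doc.get("ReportId"),
                "ReportName": doc.get("ReportName"),
                "price": item.get("price"),
                "provider": provider,
            })
    return rows


# Made-up sample document; "provider_list" is a hypothetical field name.
sample_doc = {
    "ReportId": "r1",
    "ReportName": "demo",
    "in_network": [
        {"price": 10, "provider_list": ["a", "b"]},
        {"price": 20, "provider_list": ["c"]},
    ],
}
flat_rows = flatten_report(sample_doc, "provider_list")
```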
IgnacioCastinei
by New Contributor III
  • 38762 Views
  • 11 replies
  • 8 kudos

Resolved! Download a dbfs:/FileStore File to my Local Machine?

Hi all, I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result. I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all fil...

Latest Reply
CraigJ
New Contributor II
  • 8 kudos

Works well if the file is stored in FileStore. However, if it is stored in the mnt folder, you will need something like this: https://community.cloud.databricks.com/dbfs/mnt/blob/<file_name>.csv?o=<your_number_here> Note that this will prompt you for yo...

10 More Replies
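
As a small illustration of the URL pattern quoted in the reply above, the sketch below assembles a download link for a file under the mnt folder. The function name, the sample file name and the `o` value are made up for the example; only the overall URL shape comes from the reply.

```python
from urllib.parse import quote


def mnt_download_url(host, mnt_path, workspace_id):
    """Build a browser URL for a file under /mnt, following the pattern in the reply.

    host         -- workspace host, e.g. "community.cloud.databricks.com"
    mnt_path     -- path below /mnt, e.g. "blob/results.csv" (hypothetical file)
    workspace_id -- the number that goes into the "o" query parameter
    """
    # Percent-encode unusual characters but keep the "/" separators.
    path = quote(mnt_path.strip("/"))
    return f"https://{host}/dbfs/mnt/{path}?o={workspace_id}"


example_url = mnt_download_url(
    "community.cloud.databricks.com", "blob/results.csv", "123"
)
```

Opening such a link still goes through the normal login prompt, so this only helps with building the addresses, not with authentication.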
gideonvos
by New Contributor
  • 360 Views
  • 0 replies
  • 0 kudos

Databricks workspace API metadata

Hi, the API works great. However, when listing workspaces via API it would be great to also be able to get back extra metadata, for example, last modification date. Is this possible?

User16826992666
by Valued Contributor
  • 1031 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
sgannavaram
by New Contributor III
  • 6023 Views
  • 6 replies
  • 4 kudos

Resolved! How to get the last (previous) Databricks job run time?

How can I get the last Databricks job run time? I have a requirement where I need to pass the last job run time as an argument to a SQL query, and this query fetches the records from a Snowflake database based on this timestamp.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Srinivas Gannavaram, hope you are well. Just wanted to see if you were able to find an answer to your question, and would you like to mark an answer as best? It would be really helpful for the other members. Cheers!

5 More Replies
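
One possible approach to the question above (not necessarily the answer accepted in the thread) is to list the job's runs through the Jobs API and take the start time of the most recent successful one. The sketch below only covers the parsing step. It assumes a response shaped like the Jobs API 2.1 `runs/list` output, with `start_time` in epoch milliseconds and a `state.result_state` field; the sample payload is invented.

```python
from datetime import datetime, timezone


def last_successful_run_start(runs_response):
    """Return the start time (UTC) of the latest successful run, or None."""
    runs = runs_response.get("runs", [])
    successful = [
        r for r in runs
        if r.get("state", {}).get("result_state") == "SUCCESS"
    ]
    if not successful:
        return None
    latest = max(successful, key=lambda r: r["start_time"])
    # start_time is assumed to be epoch milliseconds.
    return datetime.fromtimestamp(latest["start_time"] / 1000, tz=timezone.utc)


# Invented payload, trimmed to the fields the function reads.
sample_response = {
    "runs": [
        {"run_id": 1, "start_time": 1649900000000, "state": {"result_state": "SUCCESS"}},
        {"run_id": 2, "start_time": 1650000000000, "state": {"result_state": "SUCCESS"}},
        {"run_id": 3, "start_time": 1650100000000, "state": {"result_state": "FAILED"}},
    ]
}
last_start = last_successful_run_start(sample_response)
```

The resulting timestamp could then be formatted and passed as the parameter of the Snowflake query.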
athjain
by New Contributor III
  • 1548 Views
  • 5 replies
  • 9 kudos

Resolved! Control visibility of delta tables at sql endpoint

Hi Community, let's take a scenario where the data from s3 is read to create delta table and then stored on dbfs, and then to query these delta table we used mysql endpoint from where all the delta tables are visible, but we need to control which all ...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hey @Athlestan Jain, just checking in. Do you think you were able to find a solution to your problem from the above answers? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thank you!

4 More Replies
Michael_Galli
by Contributor II
  • 2208 Views
  • 1 reply
  • 1 kudos

Resolved! Pipelines with a lot of Spark Caching - best practices for cleanup?

We have the situation where many concurrent Azure Data Factory notebooks are running in one single Databricks Interactive Cluster (Azure E8 Series Driver, 1-10 E4 Series Drivers autoscaling). Each notebook reads data, does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

This cache is dynamically saved to disk if there is no room in memory, so I don't see it as an issue. However, the best practice is to use the "unpersist()" method in your code after caching. As in the example below, my answer, the cache/persist method ...

mroy
by New Contributor III
  • 1375 Views
  • 7 replies
  • 0 kudos

Resolved! Bug Report: "Unsubscribed from" emails for deleted jobs have bad templating

I guess someone inverted the tokens in the template, because the emails look like this:
Subject: "[user@company.com] Unsubscribed from 'Job'"
Body: "This job has been deleted by dbc-12345678-1234."
But it should look like this instead:
Subject: "[dbc-123...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

The reported bug has been fixed and merged. It will be deployed in the next release, which is planned for tomorrow in the PST time zone. Thanks to @Marco Roy!

6 More Replies
MartinB
by Contributor III
  • 11672 Views
  • 26 replies
  • 6 kudos

Resolved! Does partition pruning / partition elimination not work for folder partitioned JSON files? (Spark 3.1.2)

Imagine the following setup: I have log files stored as JSON files partitioned by year, month, day and hour in physical folders:
"""
/logs
|-- year=2020
|-- year=2021
`-- year=2022
    |-- month=01
    `-- month=02
        |-- day=01
        |-- day=.....

Latest Reply
MartinB
Contributor III
  • 6 kudos

@Kaniz Fatma could you maybe involve a Databricks expert?

25 More Replies
PJ
by New Contributor III
  • 1656 Views
  • 10 replies
  • 0 kudos

Please bring back "Right Click > Clone" functionality within Databricks Repos! After this was removed, the best way to replicate this fun...

Please bring back "Right Click > Clone" functionality within Databricks Repos! After this was removed, the best way to replicate this functionality was to: Export the file in .dbc format. Import the .dbc file back in. New file has a suffix of " (1)". As o...

Latest Reply
PJ
New Contributor III
  • 0 kudos

Hello! Just to update the group on this question, the clone right-click functionality is working again in Repos for me. I believe this fix came with a new Databricks upgrade on 2022-04-20 / 2022-04-21.

9 More Replies
hiral_jasani
by New Contributor
  • 216 Views
  • 0 replies
  • 0 kudos

Hands-On Workshop: Simplify Data Integration for the Modern Data Stack  Do you have a lot of data that is stuck in your source systems? Data engineers...

Hands-On Workshop: Simplify Data Integration for the Modern Data Stack Do you have a lot of data that is stuck in your source systems? Data engineers too bottlenecked to build another ingest pipeline? Join us for a live, hands-on workshop on building...

PJ
by New Contributor III
  • 991 Views
  • 3 replies
  • 3 kudos

Resolved! How should you optimize <1GB delta tables?

I have seen the following documentation that details how you can work with the OPTIMIZE function to improve storage and querying efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...

Latest Reply
PJ
New Contributor III
  • 3 kudos

Thank you @Hubert Dudek!! So I gather from your response that it's totally fine to have a delta table that lives under 1 file that's roughly 211 MB. And I can use OPTIMIZE in conjunction with ZORDER to filter on a frequently filtered, high cardina...

2 More Replies
Michael_Galli
by Contributor II
  • 7058 Views
  • 6 replies
  • 3 kudos

Resolved! com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using SSL encr. Error: "Unexpected rethrowing"

Hi all, there is a random error when pushing data from Databricks to an Azure SQL Database. Has anyone else had this problem? Any ideas are appreciated. See stacktrace attached. Target: Azure SQL Database, Standard S6: 400 DTUs. Databricks cluster config: "...

Latest Reply
Michael_Galli
Contributor II
  • 3 kudos

@Pearl Ubaru TLS 1.1 is already deprecated. Are there any concerns from your side about setting TLS 1.2 in the connection string?

5 More Replies
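
For readers wondering what "setting TLS 1.2 in the connection string" might look like, here is a rough sketch that assembles a SQL Server JDBC URL with the driver's `encrypt` and `sslProtocol` connection properties. The helper name, server name and database name are placeholders invented for the example, and whether pinning the protocol resolves the error in this thread is not something the excerpt confirms.

```python
def sqlserver_jdbc_url(server, database, port=1433, ssl_protocol="TLSv1.2"):
    """Assemble a SQL Server JDBC URL that pins the TLS protocol version."""
    props = {
        "databaseName": database,
        "encrypt": "true",
        "sslProtocol": ssl_protocol,  # assumption: the driver in use supports this property
    }
    # JDBC properties for SQL Server are separated by semicolons.
    prop_str = ";".join(f"{key}={value}" for key, value in props.items())
    return f"jdbc:sqlserver://{server}:{port};{prop_str}"


# Placeholder server and database names.
jdbc_url = sqlserver_jdbc_url("myserver.database.windows.net", "mydb")
```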