Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16826992666
by Valued Contributor
  • 1707 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
sgannavaram
by New Contributor III
  • 8603 Views
  • 6 replies
  • 4 kudos

Resolved! How to get the last (previous) Databricks job run time?

How to get the last Databricks job run time? I have a requirement where I need to pass the last job run time as an argument in SQL, and this SQL gets the records from a Snowflake database based on this timestamp.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Srinivas Gannavaram, hope you are well. Just wanted to see if you were able to find an answer to your question, and would you like to mark an answer as best? It would be really helpful for the other members. Cheers!

5 More Replies
athjain
by New Contributor III
  • 2745 Views
  • 5 replies
  • 9 kudos

Resolved! Control visibility of Delta tables at SQL endpoint

Hi Community, let's take a scenario where the data from S3 is read to create Delta tables, which are then stored on DBFS; to query these Delta tables we used my SQL endpoint, from where all the Delta tables are visible, but we need to control which all ...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hey @Athlestan Jain, just checking in. Do you think you were able to find a solution to your problem from the above answers? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thank you!

4 More Replies
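One common approach for this kind of requirement is table access control on the endpoint, restricting visibility with standard GRANT/REVOKE statements. A sketch, assuming table access control is enabled; the schema, table, and group names below are hypothetical, not taken from the thread:

```sql
-- Hypothetical names; assumes table access control is enabled on the SQL endpoint.
GRANT USAGE ON SCHEMA analytics TO `reporting_users`;
GRANT SELECT ON TABLE analytics.public_sales TO `reporting_users`;
-- Without a SELECT grant, a table should not be queryable by that group:
REVOKE SELECT ON TABLE analytics.internal_audit FROM `reporting_users`;
```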
Michael_Galli
by Contributor III
  • 3258 Views
  • 1 reply
  • 1 kudos

Resolved! Pipelines with a lot of Spark caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks are running on one single Databricks interactive cluster (Azure E8 Series driver, 1-10 E4 Series workers, autoscaling). Each notebook reads data and does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

This cache is dynamically spilled to disk if there is no room in memory, so I don't see it as an issue. However, the best practice is to call the unpersist() method in your code after caching. As in the example below in my answer, the cache/persist method ...

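The same cache-then-release discipline can be expressed in Spark SQL; CACHE TABLE / UNCACHE TABLE are the SQL analogues of DataFrame.cache() / unpersist(). A sketch with a hypothetical table name:

```sql
CACHE TABLE staged_events;   -- materialize once for repeated reads
SELECT count(*) FROM staged_events WHERE event_type = 'click';
UNCACHE TABLE staged_events; -- release memory/disk blocks when the notebook is done
```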
hiral_jasani
by New Contributor
  • 425 Views
  • 0 replies
  • 0 kudos

Hands-On Workshop: Simplify Data Integration for the Modern Data Stack

Hands-On Workshop: Simplify Data Integration for the Modern Data Stack. Do you have a lot of data that is stuck in your source systems? Are your data engineers too bottlenecked to build another ingest pipeline? Join us for a live, hands-on workshop on building...

PJ
by New Contributor III
  • 1928 Views
  • 3 replies
  • 3 kudos

Resolved! How should you optimize <1GB delta tables?

I have seen the following documentation that details how you can work with the OPTIMIZE function to improve storage and querying efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...

Latest Reply
PJ
New Contributor III
  • 3 kudos

Thank you @Hubert Dudek! So I gather from your response that it's totally fine to have a Delta table that lives in a single file that's roughly 211 MB. And I can use OPTIMIZE in conjunction with ZORDER to filter on a frequently filtered, high cardina...

2 More Replies
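A sketch of the combination discussed above; the table and column names are hypothetical, not from the thread:

```sql
-- Even for a small (~200 MB) Delta table, OPTIMIZE with ZORDER can help skip
-- data when filtering on a high-cardinality column; names are hypothetical.
OPTIMIZE small_events
ZORDER BY (customer_id);
```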
PJ
by New Contributor III
  • 3181 Views
  • 7 replies
  • 0 kudos

Please bring back "Right Click > Clone" functionality within Databricks Repos!

Please bring back "Right Click > Clone" functionality within Databricks Repos! After this was removed, the best way to replicate this functionality was to: export the file in .dbc format, then import the .dbc file back in. The new file has a suffix of " (1)". As o...

Latest Reply
PJ
New Contributor III
  • 0 kudos

Hello! Just to update the group on this question: the clone right-click functionality is working again in Repos for me. I believe this fix came with a Databricks upgrade on 2022-04-20 / 2022-04-21.

6 More Replies
ImAbhishekTomar
by New Contributor III
  • 2665 Views
  • 1 reply
  • 1 kudos

Resolved! Trying to flatten my JSON using the Cosmos DB Spark connector - Azure Databricks

Hi, using the below Cosmos DB query it is possible to achieve the expected output, but how can I do the same with Spark SQL in Databricks? COSMOS DB QUERY: select c.ReportId, c.ReportName, i.price, p as provider from c join i in in_network join p in i.pr...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Hi @Abhishek Tomar, if you want to get it from Cosmos DB, use the connector with a custom query: https://github.com/Azure/azure-cosmosdb-spark. If you want the JSON imported directly by Databricks/Spark, please go with the below solution: SELECT ...

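A hedged sketch of the Spark SQL equivalent using LATERAL VIEW explode. The table and column names are taken from the question's preview; the second array's field name is truncated there, so `providers` below is only a placeholder guess:

```sql
SELECT c.ReportId,
       c.ReportName,
       i.price,
       p AS provider
FROM   c
LATERAL VIEW explode(c.in_network) n AS i
LATERAL VIEW explode(i.providers) m AS p;  -- `providers` is a placeholder; the real field name is truncated in the post
```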
MartinB
by Contributor III
  • 20147 Views
  • 16 replies
  • 3 kudos

Does partition pruning / partition elimination not work for folder partitioned JSON files? (Spark 3.1.2)

Imagine the following setup: I have log files stored as JSON files partitioned by year, month, day and hour in physical folders:
/logs
|-- year=2020
|-- year=2021
`-- year=2022
    |-- month=01
    `-- month=02
        |-- day=01
        |-- day=.....

Latest Reply
MartinB
Contributor III
  • 3 kudos

@Kaniz Fatma could you maybe involve a Databricks expert?

15 More Replies
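One way pruning is usually made to work for a layout like this is to register the folders as a partitioned table so the WHERE clause can skip directories. A sketch under assumed names; the schema and data column are hypothetical:

```sql
-- Register the folder-partitioned JSON logs as a partitioned table
-- (schema and column names hypothetical).
CREATE TABLE logs (message STRING, year INT, month INT, day INT, hour INT)
USING JSON
PARTITIONED BY (year, month, day, hour)
LOCATION '/logs';

MSCK REPAIR TABLE logs;  -- discover the year=/month=/day=/hour= directories

-- With registered partitions, only matching directories should be scanned:
SELECT * FROM logs WHERE year = 2022 AND month = 2;
```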
Michael_Galli
by Contributor III
  • 9538 Views
  • 6 replies
  • 3 kudos

Resolved! com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using SSL encr. Error: "Unexpected rethrowing"

Hi all, there is a random error when pushing data from Databricks to an Azure SQL Database. Has anyone else also had this problem? Any ideas are appreciated. See stacktrace attached. Target: Azure SQL Database, Standard S6: 400 DTUs. Databricks cluster config: "...

Latest Reply
Michael_Galli
Contributor III
  • 3 kudos

@Pearl Ubaru TLS 1.1 is already deprecated. Are there any concerns from your side about setting TLS 1.2 in the connection string?

5 More Replies
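For reference, a sketch of a connection string pinned to TLS 1.2 with the Microsoft JDBC driver; the server and database names are hypothetical (`sslProtocol` is a documented mssql-jdbc connection property):

```text
jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb;encrypt=true;trustServerCertificate=false;sslProtocol=TLSv1.2
```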
JakeP
by New Contributor III
  • 1892 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to create a path under /Repos via API?

Trying to use the Repos API to automate creation and updates to repos under paths not specific to a user, i.e. /Repos/Admin/<repo-name>. It seems that creating a repo via POST to /api/2.0/repos will fail if you don't include a path, and will also fail i...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Try it through the Workspace API: https://docs.databricks.com/dev-tools/api/latest/workspace.html#mkdirs
curl --netrc --request POST \
  https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/workspace/mkdirs \
  --header 'Accept: application/json' \
  --dat...

2 More Replies
mroy
by Contributor
  • 2629 Views
  • 3 replies
  • 0 kudos

Resolved! Bug Report: "Unsubscribed from" emails for deleted jobs have bad templating

I guess someone inverted the tokens in the template, because the emails look like this:
Subject: "[user@company.com] Unsubscribed from 'Job'"
Body: "This job has been deleted by dbc-12345678-1234."
But it should look like this instead:
Subject: "[dbc-123...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

The reported bug has been fixed and merged. It will be deployed in the next release, which is planned for tomorrow in the PST time zone. Thanks to @Marco Roy!

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group