Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

rgb
by New Contributor
  • 785 Views
  • 0 replies
  • 0 kudos

Migration_pipeline.py failing to get default credentials

My `cat ~/.databrickscfg` output looks like this (with the correct token/host values in place of xxxxxx): [DEFAULT], host = xxxxxx, token = xxxxxx, jobs-api-version = 2.0. The command I run to start the pipeline with the default configured credentials is: sudo python3 migrati...

databrickserror
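For reference, the profile file above uses INI syntax; a minimal sketch with placeholder values might look like the following. Note also that running the pipeline with `sudo` resolves `~` to root's home directory, so `~/.databrickscfg` may not be the file the process actually reads.

```ini
; ~/.databrickscfg -- placeholder values, not real credentials
[DEFAULT]
host = https://example.cloud.databricks.com
token = dapi0000000000000000
jobs-api-version = 2.0
```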
693872
by New Contributor II
  • 2707 Views
  • 5 replies
  • 2 kudos

I am getting this error when I execute a left join on two data frames: PythonException: 'pyspark.serializers.SerializationError: Caused by Traceback (most recent call last)'. Going to post the full traceback:

I simply do a left join on two data frames, and I was able to print the content of both. Here is what the code looks like: df_silver = spark.sql("select ds.PropertyID, ds.* from dfsilver as ds LEFT JOIN dfaddmaster as dm ...

Latest Reply
Dooley
Valued Contributor II
  • 2 kudos

Did that answer your question? Did it work?

4 More Replies
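The query in the post selects `ds.PropertyID` and then `ds.*`, which yields a duplicate column after the join; listing columns explicitly is one way to rule that out. A minimal stand-in using sqlite3 (table and column names are hypothetical, mirroring dfsilver/dfaddmaster) shows the left-join shape:

```python
import sqlite3

# Hypothetical stand-ins for the dfsilver / dfaddmaster tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dfsilver (PropertyID INTEGER, City TEXT);
    CREATE TABLE dfaddmaster (PropertyID INTEGER, Address TEXT);
    INSERT INTO dfsilver VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO dfaddmaster VALUES (1, '1 Main St');
""")

# List the columns explicitly instead of mixing ds.PropertyID with ds.*;
# the duplicated column name is a common source of downstream errors.
rows = con.execute("""
    SELECT ds.PropertyID, ds.City, dm.Address
    FROM dfsilver AS ds
    LEFT JOIN dfaddmaster AS dm ON ds.PropertyID = dm.PropertyID
    ORDER BY ds.PropertyID
""").fetchall()
# Unmatched left-side rows keep NULL (None) on the right side.
```

The same explicit-column rewrite applies verbatim inside the `spark.sql` call.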
marcus1
by New Contributor III
  • 413 Views
  • 0 replies
  • 0 kudos

Why does the Databricks SCIM Get Users API (https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html#get-users) take so long?

I've observed, as we added more workspaces and users to those workspaces, that fetching users per workspace now takes 11 minutes or more. Our automation to provision group access is now unacceptably long. I've noted that the UI doesn't suffer...

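SCIM 2.0 supports `startIndex`/`count` paging and an `attributes` filter, which can make large user fetches more tractable than one giant request. A sketch (workspace URL and page sizes are hypothetical):

```python
from urllib.parse import urlencode

def scim_user_pages(base_url, page_size=100, max_users=1000):
    """Yield paged SCIM /Users URLs; startIndex is 1-based per SCIM 2.0."""
    for start in range(1, max_users + 1, page_size):
        params = urlencode({
            "startIndex": start,
            "count": page_size,
            # Request only the fields needed; full user payloads are heavier.
            "attributes": "id,userName",
        })
        yield f"{base_url}/api/2.0/preview/scim/v2/Users?{params}"

urls = list(scim_user_pages("https://example.cloud.databricks.com", 100, 300))
```

Each URL can then be fetched with your usual HTTP client and bearer token.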
J_M_W
by Contributor
  • 2916 Views
  • 2 replies
  • 5 kudos

Resolved! Databricks automatically creates an _apply_changes_storage table in the database when using apply_changes for Delta Live Tables

Hi there, I am using apply_changes (aka Delta Live Tables Change Data Capture) and it works fine. However, it seems to automatically create a secondary table in the database metastore called _apply_storage_changes_{tableName}. So for every table I use ...

Latest Reply
J_M_W
Contributor
  • 5 kudos

Hi - thanks @Hubert Dudek, I will look into disabling access for the users!

1 More Replies
berserkersap
by Contributor
  • 9658 Views
  • 1 replies
  • 0 kudos

How to deal with Decimal data type arithmetic operations ?

I am dealing with values ranging from 10^9 to 10^-9, the sum of which can go up to 10^20, and I need accuracy. So I wanted to use the Decimal data type [using SQL in the Data Science & Engineering workspace]. However, I got to know the peculiar behavior of D...

Latest Reply
berserkersap
Contributor
  • 0 kudos

Hello everyone, I understand that there is no single best answer for this question, so I could only do the same thing I found when I searched the net. The method I found works when you know the range of values you deal with (not just the input data but also ...

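The reply above amounts to sizing DECIMAL(p, s) to the known value range. Python's decimal module can illustrate why a fixed-point type keeps the 10^-9 term that float silently drops (the precision here is chosen for sums up to roughly 10^20 and is an assumption, not a universal setting):

```python
from decimal import Decimal, getcontext

# Emulate something like DECIMAL(38, 9): 38 significant digits is enough
# to hold a sum near 10^20 while retaining 9 fractional digits.
getcontext().prec = 38

small = Decimal("1e-9")
large = Decimal("1e20")
total = large + small  # exact under prec=38

# The float equivalent silently loses the small term.
float_total = float(large) + 1e-9
assert float_total == float(large)
```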
190809
by Contributor
  • 1120 Views
  • 2 replies
  • 0 kudos

Invalid port error when trying to read from a PlanetScale MySQL database

Using the code below, I am attempting to connect to a PlanetScale MySQL database. I get the following error: java.sql.SQLException: error parsing url: Incorrect port value. However, the port is the default 3306, and I have used the correct url based o...

Latest Reply
Pat
Honored Contributor III
  • 0 kudos

Hi @Rachel Cunningham, maybe you can share your `driver` and `url` values (masked)?

1 More Replies
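This parse error usually means something other than digits ended up in the port slot of `jdbc:mysql://host:port/db` - extra options such as SSL settings belong after `?`. A hedged sketch (host, database, and option names are hypothetical) that builds the URL and sanity-checks the port:

```python
from urllib.parse import urlsplit

def build_jdbc_url(host, port, database, **options):
    """Build a jdbc:mysql URL, keeping options after '?' so the port stays numeric."""
    base = f"jdbc:mysql://{host}:{port}/{database}"
    query = "&".join(f"{k}={v}" for k, v in options.items())
    return f"{base}?{query}" if query else base

url = build_jdbc_url("example.psdb.cloud", 3306, "mydb", sslMode="VERIFY_IDENTITY")

# Sanity check: drop the 'jdbc:' prefix and let urlsplit validate the port.
assert urlsplit(url[len("jdbc:"):]).port == 3306
```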
eques_99
by New Contributor II
  • 1389 Views
  • 2 replies
  • 0 kudos

Remove a category (slice) from a Pie Chart

I added a grand total row to a "Count" in SQL, which I needed for some counter visualisations; I used the ROLLUP command to get the grand total. However, I have a pie chart which references the same count, and so the grand total row has been added...

Latest Reply
eques_99
New Contributor II
  • 0 kudos

Hi, as per the picture above, the slice disappears but the name ("null" in this case) remains in the legend.

1 More Replies
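One workaround is to give the pie chart its own query without the rollup row, keeping the grand total only for the counter visualisations. A sqlite3 stand-in (table and column names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (category TEXT);
    INSERT INTO events VALUES ('a'), ('a'), ('b');
""")

# Counter query: per-category counts plus a grand-total row with a NULL
# category, mimicking what ROLLUP produces.
counter = con.execute("""
    SELECT category, COUNT(*) AS n FROM events GROUP BY category
    UNION ALL
    SELECT NULL, COUNT(*) FROM events
""").fetchall()

# Pie-chart query: same aggregation without the rollup row, so no
# "null" slice or legend entry appears.
pie = con.execute("""
    SELECT category, COUNT(*) AS n FROM events
    GROUP BY category ORDER BY category
""").fetchall()
```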
Jayanth746
by New Contributor III
  • 4533 Views
  • 2 replies
  • 2 kudos

Databricks <-> Kafka - SSL handshake failed

I am receiving an SSL handshake error even though the trust store I have created is based on the server certificate, and the fingerprint in the certificate matches the trust-store fingerprint. kafkashaded.org.apache.kafka.common.errors.SslAuthenticationExcept...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi @Jayanth Goulla, this is worth a try: https://stackoverflow.com/questions/54903381/kafka-failed-authentication-due-to-ssl-handshake-failed. Did you follow https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/kafka?

1 More Replies
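When the truststore itself checks out but the handshake still fails, a certificate/hostname mismatch is a frequent culprit. A hedged sketch of the reader options (broker address, topic, and truststore path are hypothetical); setting the endpoint identification algorithm to an empty string disables hostname verification, which is useful purely for diagnosis:

```python
# Hypothetical broker and truststore path; adjust to your environment.
kafka_options = {
    "kafka.bootstrap.servers": "broker.example.com:9093",
    "kafka.security.protocol": "SSL",
    "kafka.ssl.truststore.location": "/dbfs/FileStore/certs/kafka.truststore.jks",
    "kafka.ssl.truststore.password": "changeit",
    # Empty string disables hostname verification; use only to diagnose
    # a mismatch between the broker certificate and the address dialled.
    "kafka.ssl.endpoint.identification.algorithm": "",
    "subscribe": "my-topic",
}
# Usage sketch: spark.readStream.format("kafka").options(**kafka_options).load()
```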
elgeo
by Valued Contributor II
  • 1639 Views
  • 1 replies
  • 2 kudos

Disable auto-complete (tab button)

Hello. How could we disable the autocomplete that appears with the tab button? Thank you.

Latest Reply
elgeo
Valued Contributor II
  • 2 kudos

Thank you @Kaniz Fatma

vs_29
by New Contributor II
  • 2474 Views
  • 1 replies
  • 3 kudos

Custom Log4j logs are not being written to the DBFS storage.

I used a custom Log4j appender to write custom logs through the init script, and I can see the custom log file in the driver logs, but Databricks is not writing those custom logs to DBFS. I have configured a logging destination in the Advanced sec...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi @VIjeet Sharma, do you receive any error? This can be an issue with using the DBFS mount point /dbfs in an init script: the DBFS mount point is installed asynchronously, so at the very beginning of init script execution, that mount point might not be ava...

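Since the reply notes that /dbfs is mounted asynchronously, an init script that ships logs there can poll for the mount before writing. A sketch in bash (paths and timeout are placeholders):

```shell
#!/bin/bash
# Wait for a directory (e.g. /dbfs) to appear, up to a timeout in seconds.
wait_for_mount() {
  local dir="$1" timeout="${2:-60}" waited=0
  while [ ! -d "$dir" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# In a real init script, something like:
#   wait_for_mount /dbfs 120 && cp /tmp/custom.log /dbfs/cluster-logs/
```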
sharonbjehome
by New Contributor
  • 1376 Views
  • 1 replies
  • 1 kudos

Structured Streaming from MongoDB Atlas not parsing JSON correctly

Hi all, I have a table in MongoDB Atlas that I am trying to read continuously into memory, and I will eventually write that file out. However, when I look at the in-memory table it doesn't have the correct schema. Code here: from pyspark.sql.types impo...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @sharonbjehome, this has to be checked thoroughly via a support ticket. Did you follow https://docs.databricks.com/external-data/mongodb.html? Also, could you please check with MongoDB support. Was this working before?

dara
by New Contributor
  • 932 Views
  • 1 replies
  • 1 kudos

How to count DelayCategories?

I would like to know the count of each category in each year. When I run count, it doesn't work.

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @Dara Tourt, when you say it does not work, what is the error? You can run the count aggregate function: https://docs.databricks.com/sql/language-manual/functions/count.html. Please let us know if this helps.

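Counting each category within each year is a two-column GROUP BY; if a plain count "doesn't work", the grouping keys are the usual suspect. A sqlite3 stand-in (table and column names are hypothetical, mirroring a DelayCategory column):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE flights (year INTEGER, DelayCategory TEXT);
    INSERT INTO flights VALUES
        (2021, 'short'), (2021, 'short'), (2021, 'long'), (2022, 'short');
""")

# Group by both year and category to get a count per (year, category) pair.
counts = con.execute("""
    SELECT year, DelayCategory, COUNT(*) AS n
    FROM flights
    GROUP BY year, DelayCategory
    ORDER BY year, DelayCategory
""").fetchall()
```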
547284
by New Contributor II
  • 810 Views
  • 1 replies
  • 1 kudos

How to read CSVs from an S3 directory with different columns

I can read all CSVs under an S3 URI by doing: files = dbutils.fs.ls('s3://example-path') and df = spark.read.options(header='true', encoding='iso-8859-1', dateFormat='yyyyMMdd', ignoreLeadingWhiteSpace='true', i...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @Anthony Wang, as of now, I think that's the only way. Please refer to https://docs.databricks.com/external-data/csv.html#pitfalls-of-reading-a-subset-of-columns. Please let us know if this helps.

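The behaviour being asked about - a "union by name" across files whose headers differ - can be sketched outside Spark with the stdlib csv module: collect the union of headers, then pad missing fields with None. (The inline file contents below are stand-ins for the S3 objects.)

```python
import csv
import io

def read_csvs_union(files):
    """Union rows from CSV streams that may have different columns."""
    rows, headers = [], []
    for f in files:
        for row in csv.DictReader(f):
            for k in row:
                if k not in headers:
                    headers.append(k)   # accumulate the union of headers
            rows.append(row)
    # Pad columns missing from a given file with None.
    return headers, [[r.get(h) for h in headers] for r in rows]

a = io.StringIO("id,name\n1,ann\n")   # first file: id, name
b = io.StringIO("id,age\n2,30\n")     # second file: id, age
headers, data = read_csvs_union([a, b])
```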
sage5616
by Valued Contributor
  • 6544 Views
  • 3 replies
  • 6 kudos

Saving PySpark standard out and standard error logs to cloud object storage

I am running my PySpark data pipeline code on a standard Databricks cluster. I need to save all Python/PySpark standard output and standard error messages into a file in an Azure Blob account. When I run my Python code locally I can see all messages i...

Latest Reply
sage5616
Valued Contributor
  • 6 kudos

This is the approach I am currently taking. It is documented here: https://stackoverflow.com/questions/62774448/how-to-capture-cells-output-in-databricks-notebook. from IPython.utils.capture import CapturedIO; capture = CapturedIO(sys.stdout, sys.st...

2 More Replies
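The IPython CapturedIO approach in the reply above has a stdlib analogue: redirect stdout into a buffer with contextlib, then write the buffer wherever you like (the Azure Blob upload step is omitted here and would use the Blob SDK of your choice):

```python
import contextlib
import io
import sys

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # Anything printed inside the block lands in the buffer, not the console.
    print("pipeline step 1 done")
    print("rows written: 42")

captured = buffer.getvalue()
# `captured` can now be uploaded, e.g. via the Azure Blob SDK.
sys.stdout.write(captured)  # re-emit so the output is still visible
```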

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group