Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

athjain
by New Contributor III
  • 3858 Views
  • 5 replies
  • 9 kudos

Resolved! Control visibility of delta tables at sql endpoint

Hi Community, let's take a scenario where data from S3 is read to create Delta tables that are then stored on DBFS. To query these Delta tables we use the SQL endpoint, where all the Delta tables are visible, but we need to control which all ...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hey @Athlestan Jain, just checking in. Were you able to find a solution to your problem in the answers above? If so, would you be happy to mark it as best so that other members can find the solution more quickly? Thank you!

Michael_Galli
by Contributor III
  • 4494 Views
  • 1 replies
  • 1 kudos

Resolved! Pipelines with a lot of Spark Caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks are running on a single Databricks interactive cluster (Azure E8 series driver, 1-10 E4 series workers, autoscaling). Each notebook reads data, does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

This cache is spilled to disk dynamically when there is no room left in memory, so I don't see it as an issue. However, the best practice is to call the "unpersist()" method in your code after caching. As in the example below (my answer), the cache/persist method ...
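The advice above (cache, then call unpersist() when you are done) is easy to forget in long notebooks. One way to make the cleanup automatic is a small context manager. This is a minimal sketch, assuming a PySpark DataFrame or anything else with cache() and unpersist() methods; the helper name cached is hypothetical and not part of PySpark or Databricks.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache a DataFrame for the duration of a `with` block, then release it.

    Works with any object exposing cache() and unpersist(), e.g. a PySpark
    DataFrame. `cached` is a hypothetical helper, not a PySpark API.
    """
    cached_df = df.cache()  # PySpark's DataFrame.cache() returns the same DataFrame
    try:
        yield cached_df
    finally:
        # Runs even if code inside the block raises, so cached blocks do not
        # linger on a shared interactive cluster after the notebook fails.
        cached_df.unpersist()
```

In a notebook this would be used as `with cached(spark.read.parquet(path)) as df:` followed by the work that reuses df; once the block exits, the cached data is released whether or not that work succeeded.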

hiral_jasani
by New Contributor
  • 750 Views
  • 0 replies
  • 0 kudos

Hands-On Workshop: Simplify Data Integration for the Modern Data Stack

Hands-On Workshop: Simplify Data Integration for the Modern Data Stack Do you have a lot of data that is stuck in your source systems? Data engineers too bottlenecked to build another ingest pipeline? Join us for a live, hands-on workshop on building...

PJ
by New Contributor III
  • 2999 Views
  • 3 replies
  • 3 kudos

Resolved! How should you optimize <1GB delta tables?

I have seen the following documentation that details how you can use the OPTIMIZE command to improve storage and query efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...

Latest Reply
PJ
New Contributor III
  • 3 kudos

Thank you @Hubert Dudek!! So I gather from your response that it's totally fine to have a Delta table that lives in a single file of roughly 211 MB, and that I can use OPTIMIZE with ZORDER on a frequently filtered, high cardina...
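Why a ~211 MB table can reasonably end up as a single file: OPTIMIZE compacts many small files into fewer files up to a target size. The sketch below is a simplified greedy (next-fit) bin-packing illustration of that idea, not Delta Lake's actual implementation. The 1 GiB target is an assumption based on the default of Delta's optimize.maxFileSize setting and may differ in your runtime.

```python
from typing import List

# Assumed compaction target (about 1 GiB); illustrative, not a guarantee.
TARGET_BYTES = 1024 ** 3


def plan_compaction(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Greedily pack file sizes (bytes) into bins no larger than `target`.

    Each bin stands for one output file after compaction. Files already at
    or above the target are left alone in their own bin (not rewritten).
    """
    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sorted(file_sizes):
        if size >= target:
            bins.append([size])  # already large enough, keep as-is
            continue
        if current and current_size + size > target:
            bins.append(current)  # close the bin before it overflows
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins
```

For example, 211 small files of 1 MiB each fit well under the target and are planned into a single bin, i.e. one output file, which matches the single ~211 MB file described in the post.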

PJ
by New Contributor III
  • 4048 Views
  • 7 replies
  • 0 kudos

Please bring back "Right Click > Clone" functionality within Databricks Repos!

Please bring back "Right Click > Clone" functionality within Databricks Repos! After this was removed, the best way to replicate this functionality was to export the file in .dbc format and then import the .dbc file back in. The new file has a suffix of " (1)". As o...

Latest Reply
PJ
New Contributor III
  • 0 kudos

Hello! Just to update the group on this question: the clone right-click functionality is working again in Repos for me. I believe this fix came with a Databricks upgrade on 2022-04-20 / 2022-04-21.

ImAbhishekTomar
by New Contributor III
  • 3638 Views
  • 1 replies
  • 1 kudos

Resolved! Trying to Flatten My Json using CosmosDB Spark connector - Azure Databricks

Hi, using the Cosmos DB query below it is possible to achieve the expected output, but how can I do the same with Spark SQL in Databricks? Cosmos DB query: select c.ReportId,c.ReportName,i.price,p as provider from c join i in in_network join p in i.pr...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Hi @Abhishek Tomar, if you want to get it from Cosmos DB, use the connector with a custom query: https://github.com/Azure/azure-cosmosdb-spark. If you want to have the JSON imported directly by Databricks/Spark, please go with the solution below: SELECT ...
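To make clear what the Cosmos DB query's nested "join ... in ..." clauses produce (the row shape a Spark SQL version has to reproduce), here is a plain-Python sketch of the same flattening on a list of JSON documents. The array field names are parameters because the second array in the post is truncated ("i.pr..."); the default inner_key value "providers" is a hypothetical placeholder, not the real field name.

```python
from typing import Any, Dict, Iterable, Iterator


def flatten_reports(docs: Iterable[Dict[str, Any]],
                    outer_key: str = "in_network",
                    inner_key: str = "providers") -> Iterator[Dict[str, Any]]:
    """Emit one row per (document, outer array item, inner array element).

    Mirrors the nested `join ... in ...` clauses of the Cosmos DB query:
    documents or items whose arrays are missing or empty produce no rows.
    `inner_key` defaults to a hypothetical name; the real one is truncated.
    """
    for c in docs:
        for i in c.get(outer_key) or []:
            for p in i.get(inner_key) or []:
                yield {
                    "ReportId": c.get("ReportId"),
                    "ReportName": c.get("ReportName"),
                    "price": i.get("price"),
                    "provider": p,
                }
```

In Spark SQL the same shape is usually written with one LATERAL VIEW explode(...) per array level. Like the inner joins here, explode drops rows whose array is null or empty; explode_outer keeps them with a null value instead.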

MartinB
by Contributor III
  • 29292 Views
  • 16 replies
  • 3 kudos

Does partition pruning / partition elimination not work for folder partitioned JSON files? (Spark 3.1.2)

Imagine the following setup: I have log files stored as JSON files partitioned by year, month, day and hour in physical folders:
"""
/logs
|-- year=2020
|-- year=2021
`-- year=2022
    |-- month=01
    `-- month=02
        |-- day=01
        |-- day=.....
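For context on the mechanism being asked about: with Hive-style key=value folder names, Spark can discover partition columns from the paths and skip whole folders whose values fail a filter on those columns, so the files inside are never opened. The sketch below illustrates that pruning idea in plain Python on path strings; it is not Spark's implementation, and the sample paths in the test are hypothetical.

```python
from typing import Callable, Dict, List


def partition_values(path: str) -> Dict[str, str]:
    """Extract Hive-style key=value segments from a path, e.g. year=2022."""
    values = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # only segments containing '=' are partition directories
            values[key] = value
    return values


def prune(paths: List[str], predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Keep only paths whose partition values satisfy `predicate`.

    Pruned paths are never read, which is what makes partition pruning
    cheaper than scanning every file and filtering rows afterwards.
    """
    return [p for p in paths if predicate(partition_values(p))]
```

Two caveats when comparing this with Spark: by default Spark infers partition column types (spark.sql.sources.partitionColumnTypeInference.enabled), so month=01 is read as the integer 1, while this sketch keeps raw strings. To check whether Spark actually pruned, calling explain() on a query such as spark.read.json("/logs").where("year = 2022") should show the predicate under PartitionFilters in the file scan node.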

Latest Reply
MartinB
Contributor III
  • 3 kudos

@Kaniz Fatma, could you maybe involve a Databricks expert?

Michael_Galli
by Contributor III
  • 12388 Views
  • 6 replies
  • 3 kudos

Resolved! com.microsoft.sqlserver.jdbc.SQLServerException:The driver could not establish a secure connection to SQL Server by using SSL encr. Error: "Unexpected rethrowing"

Hi all, there is a random error when pushing data from Databricks to an Azure SQL Database. Has anyone else had this problem? Any ideas are appreciated. See the stack trace attached. Target: Azure SQL Database, Standard S6: 400 DTUs. Databricks cluster config: "...

Latest Reply
Michael_Galli
Contributor III
  • 3 kudos

@Pearl Ubaru, TLS 1.1 is already deprecated. Are there any concerns on your side about setting TLS 1.2 in the connection string?
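Setting TLS 1.2 in the connection string, as suggested above, could look like the following. This is a sketch assuming the Microsoft JDBC Driver for SQL Server: encrypt, trustServerCertificate, hostNameInCertificate and loginTimeout are the properties Azure's own JDBC connection strings use, while sslProtocol is only available in newer driver versions, so check the driver version on your cluster before relying on it. The server and database names are placeholders.

```python
from typing import Dict, Optional


def azure_sql_jdbc_url(server: str, database: str,
                       extra: Optional[Dict[str, str]] = None) -> str:
    """Build a JDBC URL for Azure SQL Database that requests TLS 1.2.

    `server` is the logical server name without the .database.windows.net
    suffix. `extra` overrides or adds driver properties.
    """
    props = {
        "database": database,
        "encrypt": "true",
        "trustServerCertificate": "false",
        "hostNameInCertificate": "*.database.windows.net",
        "loginTimeout": "30",
        "sslProtocol": "TLSv1.2",  # pin TLS 1.2 explicitly (needs a recent driver)
    }
    if extra:
        props.update(extra)
    settings = ";".join(f"{k}={v}" for k, v in props.items())
    return f"jdbc:sqlserver://{server}.database.windows.net:1433;{settings}"
```

The resulting URL can then be passed to df.write.jdbc(url, table, mode="append", properties={...}) with the user and password supplied in the properties dictionary rather than embedded in the URL.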

JakeP
by New Contributor III
  • 2703 Views
  • 3 replies
  • 1 kudos

Resolved! Is there a way to create a path under /Repos via API?

I'm trying to use the Repos API to automate the creation and updating of repos under paths not specific to a user, e.g. /Repos/Admin/<repo-name>. It seems that creating a repo via POST to /api/2.0/repos will fail if you don't include a path, and will also fail i...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Try it through the Workspace API (https://docs.databricks.com/dev-tools/api/latest/workspace.html#mkdirs):
curl --netrc --request POST \
  https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/workspace/mkdirs \
  --header 'Accept: application/json' \
  --dat...
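The reply's approach (create the parent folder with the Workspace API's mkdirs, then create the repo inside it) could be scripted in Python with only the standard library. This is a sketch: the workspace host, token, Git URL and the "gitHub" provider value are placeholders you would replace; the two endpoints, /api/2.0/workspace/mkdirs and /api/2.0/repos, are the ones named in this thread. The functions only build the requests, so nothing is sent until you pass them to urllib.request.urlopen.

```python
import json
import urllib.request


def _post(host: str, token: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to the REST API."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def mkdirs_request(host: str, token: str, folder: str) -> urllib.request.Request:
    # Workspace API: create the folder, plus any missing parent folders.
    return _post(host, token, "/api/2.0/workspace/mkdirs", {"path": folder})


def create_repo_request(host: str, token: str, git_url: str,
                        provider: str, path: str) -> urllib.request.Request:
    # Repos API: clone git_url into an explicit /Repos/... path.
    return _post(host, token, "/api/2.0/repos",
                 {"url": git_url, "provider": provider, "path": path})
```

The intended order is to send mkdirs_request(host, token, "/Repos/Admin") first and then create_repo_request(host, token, git_url, "gitHub", "/Repos/Admin/<repo-name>"), so the repo is created under a folder that already exists.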

mroy
by Contributor
  • 3408 Views
  • 3 replies
  • 0 kudos

Resolved! Bug Report: "Unsubscribed from" emails for deleted jobs have bad templating

I guess someone inverted the tokens in the template, because the emails look like this:
Subject: "[user@company.com] Unsubscribed from 'Job'"
Body: "This job has been deleted by dbc-12345678-1234."
But it should look like this instead:
Subject: "[dbc-123...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

The reported bug has been fixed and merged. It will be deployed in the next release, which is planned for tomorrow (PST time zone). Thanks to @Marco Roy!

Dunken
by New Contributor III
  • 4555 Views
  • 3 replies
  • 0 kudos

Resolved! SSO with Auth0?

Do you support SSO with any IdP that supports SAML 2.0 (e.g. Auth0), or is it limited to the providers listed at https://docs.databricks.com/administration-guide/users-groups/single-sign-on/index.html#supported-identity-providers?

Latest Reply
525374
New Contributor II
  • 0 kudos

I currently have a few applications (say App1 and App2) that, along with Databricks, are all integrated with Auth0. What I want to achieve is that when we log in to, say, Databricks and then open another app's URL in another tab, it should not ask for login in...

_r_vind1199
by New Contributor II
  • 4643 Views
  • 3 replies
  • 3 kudos

Resolved! Pyspark installation issue

When I try to start a PySpark session in PyCharm, it throws this error: RuntimeError("Java gateway process exited before sending its port number"). Could anyone help me solve this?
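This error commonly means PySpark could not launch the JVM, for example because Java is not installed, is not on the PATH, or JAVA_HOME points to a folder without a Java executable. (The Spark 3.2 documentation lists Java 8 and 11 as supported.) A quick pre-flight check could look like the sketch below; the function name java_preflight is hypothetical, and it only reports likely causes rather than fixing them.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def java_preflight(env: Mapping[str, str] = os.environ,
                   which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return likely causes of 'Java gateway process exited' (empty = none found).

    `env` and `which` are injectable so the check can be tested without
    touching the real environment.
    """
    problems = []
    java_home = env.get("JAVA_HOME")
    if java_home:
        java_bin = os.path.join(java_home, "bin", "java")
        # Accept both the Unix binary and the Windows java.exe.
        if not (os.path.isfile(java_bin) or os.path.isfile(java_bin + ".exe")):
            problems.append(
                f"JAVA_HOME is set to {java_home!r} but no java executable was found there"
            )
    elif which("java") is None:
        problems.append("java is not on PATH and JAVA_HOME is not set")
    return problems
```

In PyCharm it is worth running this from the same run configuration that creates the SparkSession, because run configurations can define their own environment variables that differ from your terminal's.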

Latest Reply
_r_vind1199
New Contributor II
  • 3 kudos

@Aashita Ramteke, PySpark version 3.2.1

