- 790 Views
- 1 replies
- 0 kudos
If you have a lot of transactions in a table it seems like the Delta log keeping track of all those transactions would get pretty large. Does the size of the metadata become a problem over time?
Latest Reply
Yes, the size of the metadata can become a problem over time, though because of storage costs rather than performance. Delta performance will not degrade due to the size of the metadata, but your cloud storage bill can increase. By default Delta h...
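As a sketch of one knob you can turn (the table name and interval below are illustrative, not from the thread), the `delta.logRetentionDuration` table property controls how long Delta keeps transaction log files before checkpoint cleanup can remove them:

```python
def set_log_retention_sql(table: str, days: int) -> str:
    """Build an ALTER TABLE statement shortening Delta log retention
    (the Delta default is 30 days)."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.logRetentionDuration' = 'interval {days} days')"
    )

# On a cluster you would execute it with spark.sql:
# spark.sql(set_log_retention_sql("events", 7))
```

Shortening the interval trades time-travel depth for lower storage cost.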
- 571 Views
- 1 replies
- 0 kudos
If we don't have any datasets to share with external companies, does that mean Delta Sharing isn't relevant for our org? Is there any use case for using it internally?
Latest Reply
Delta Sharing can be used both externally and internally. One internal use case would be two separate business units that want to share data with each other without exposing their Lakehouse to the other unit.
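As a sketch of the consuming side (the share, schema, and table names below are hypothetical), the other business unit could read an internal share with the open-source `delta-sharing` Python client, which addresses tables as `<profile>#<share>.<schema>.<table>`:

```python
def share_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Compose the '<profile>#<share>.<schema>.<table>' table URL the
    delta-sharing Python client expects."""
    return f"{profile_path}#{share}.{schema}.{table}"

# With `pip install delta-sharing` and a profile file issued by the
# providing business unit:
# import delta_sharing
# df = delta_sharing.load_as_pandas(
#     share_table_url("config.share", "finance_share", "gold", "revenue"))
```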
- 620 Views
- 1 replies
- 0 kudos
Can I read a Delta table directly using Koalas or do I need to read using Spark and then convert the Spark dataframe to a Koalas dataframe?
Latest Reply
Yes, you can read a Delta table directly with the Koalas `read_delta` function; see the Koalas documentation for details.
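A minimal sketch (the path and version are illustrative, and the wrapper is only for packaging): `read_delta` returns a Koalas dataframe directly, so no Spark-to-Koalas conversion is needed.

```python
def read_delta_koalas(path, version=None):
    """Load a Delta table straight into a Koalas dataframe.

    Runs on a Databricks cluster (Koalas ships with the ML runtimes);
    `version` pins an older snapshot via Delta time travel.
    """
    import databricks.koalas as ks  # deferred so this file imports anywhere
    if version is None:
        return ks.read_delta(path)
    return ks.read_delta(path, version=version)

# On a cluster:
# kdf = read_delta_koalas("/mnt/data/events")
# kdf_v0 = read_delta_koalas("/mnt/data/events", version=0)
```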
- 780 Views
- 1 replies
- 2 kudos
I'm working on setting up tooling to allow team members to easily register and load models from a central MLflow model registry via dbconnect. However, after following the instructions in the public docs, I'm hitting this error: raise _NoDbutilsError
mlfl...
Latest Reply
You could monkey-patch MLflow's _get_dbutils() with something similar to this to get it working while connecting from dbconnect:

```python
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()

# monkey-patch MLflow's _get_dbutils()
def _get_dbutils():
    return DBUtils(spark)  # DBUtils takes the active SparkSession
```
by aladda • Honored Contributor II
- 602 Views
- 1 replies
- 0 kudos
I see the revision_timestamp parameter on NotebookTask (https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobsnotebooktask). An example of how to invoke it would be helpful.
Latest Reply
You can use the Databricks built-in version control feature, coupled with the NotebookTask Jobs API, to specify a specific version of the notebook based on the timestamp of the save, defined in unix timestamp format:

```
curl -n -X POST -H 'Content-Type: app...
```
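A hedged sketch of the request body (the job name, notebook path, and timestamp are placeholders): `revision_timestamp` sits inside the `notebook_task` object of the Jobs API payload:

```python
import json

def notebook_job_payload(name, notebook_path, revision_timestamp):
    """Build a Jobs API payload that pins the notebook to the revision
    saved at the given unix timestamp."""
    return {
        "name": name,
        "notebook_task": {
            "notebook_path": notebook_path,
            "revision_timestamp": revision_timestamp,
        },
    }

# Serialized, this is the JSON body the curl call above would POST:
body = json.dumps(notebook_job_payload("nightly-etl", "/Users/me/etl", 1625060460))
```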
- 758 Views
- 1 replies
- 0 kudos
I have read and heard that having too many small files can cause performance problems when reading large data sets. But how do I know if that is an issue I am facing?
Latest Reply
The Databricks SQL endpoint has a Query History section which provides additional information to debug and tune queries. One such metric, under execution details, is the number of files read. For ETL/data science workloads, you could use the Spark UI of the ...
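Outside those UIs, a rough way to check for the problem yourself is to list a table's data files and see how many fall below a small-file threshold (the 16 MB cutoff here is an illustrative heuristic, not an official number):

```python
def small_file_report(file_sizes_bytes, threshold_mb=16):
    """Given the sizes of a table's data files (e.g. gathered via
    dbutils.fs.ls on a cluster), report how many count as 'small'."""
    threshold = threshold_mb * 1024 * 1024
    small = [s for s in file_sizes_bytes if s < threshold]
    total = len(file_sizes_bytes)
    return {
        "files": total,
        "small_files": len(small),
        "small_fraction": len(small) / total if total else 0.0,
    }

# On a cluster: sizes = [f.size for f in dbutils.fs.ls("/mnt/data/events")]
report = small_file_report([4_000_000, 2_000_000, 256_000_000])
```

A high small-file fraction on a large table is a signal that compaction (e.g. Delta's OPTIMIZE) may help.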
- 1471 Views
- 1 replies
- 1 kudos
In databricks is there a way to display the spark job process in a dashboard? I have a simple dashboard that displays a table, but the main spark job behind it takes 15 minutes to run. Is there a way to show the spark job progress bar in a dashboard?
Latest Reply
The best way to do so would be to collect data about the job run using the REST API (the runs/get endpoint). This endpoint provides as much metadata as possible. You may need to use other endpoints to get the job or run IDs in order to get the correct in...
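As a sketch of that polling call (host, run ID, and token are placeholders), the `runs/get` request looks like this; the response's `state.life_cycle_state` and `state.result_state` fields are what a dashboard progress indicator would watch:

```python
def runs_get_request(host, run_id, token):
    """Compose the Jobs API 2.0 runs/get request for polling a run's state."""
    return {
        "url": f"{host}/api/2.0/jobs/runs/get",
        "params": {"run_id": run_id},
        "headers": {"Authorization": f"Bearer {token}"},
    }

req = runs_get_request("https://example.cloud.databricks.com", 42, "<token>")
# With the `requests` library:
# resp = requests.get(req["url"], params=req["params"], headers=req["headers"])
# resp.json()["state"]["life_cycle_state"]  # e.g. RUNNING, TERMINATED
```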
- 666 Views
- 0 replies
- 0 kudos
I have a dataframe that looks like the following:

```
+-------+--------+
|Charges| Status|
+-------+--------+
| 495.6| Denied|
|1806.28| Denied|
| 261.3|Accepted|
| 8076.5|Accepted|
|1041.24| Denied|
| 507.88| Denied|
| 208.0|Accepted|
| 152.49| ...
```
- 1135 Views
- 1 replies
- 0 kudos
I know I have the option to delete rows from a Delta table when running a merge. But I'm confused about how that would actually affect the files that contain the deleted records. Are those files deleted, or are they rewritten, or what?
Latest Reply
Delta implements MERGE by physically rewriting existing files. It is implemented in two steps:
1. Perform an inner join between the target table and source table to select all files that have matches.
2. Perform an outer join between the selected files in t...
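As a hedged sketch of the user-facing side (the table path, join key, and `deleted` flag are illustrative), a merge that deletes matched rows looks like this; under the hood Delta rewrites every file containing a matched record and marks the old files removed in the transaction log, so the "deleted" rows simply never make it into the new files:

```python
def merge_with_delete(spark, target_path, source_df):
    """Sketch of a Delta MERGE that deletes rows flagged for removal.
    Runs on a cluster with the Delta Lake library available."""
    from delta.tables import DeltaTable  # deferred: cluster-only dependency

    target = DeltaTable.forPath(spark, target_path)
    (target.alias("t")
        .merge(source_df.alias("s"), "t.id = s.id")
        .whenMatchedDelete(condition="s.deleted = true")  # drop flagged rows
        .whenMatchedUpdateAll()                           # update the rest
        .whenNotMatchedInsertAll()                        # insert new rows
        .execute())
```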
- 869 Views
- 1 replies
- 0 kudos
I know that when deletes are made from a Delta table the underlying files are not actually removed. For compliance reasons I need to be able to truly delete the records. How can I know which files need to be removed, and is there a way to remove them ot...
Latest Reply
Here is a document explaining best practices for GDPR and CCPA compliance using Delta Lake. Specifically, on cleaning up stale data: you can use the VACUUM command to remove files that are no longer referenced by a Delta table and are older than a s...
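A minimal sketch of that cleanup step (the table name is illustrative; 168 hours is the default 7-day retention window):

```python
def vacuum_sql(table: str, retention_hours: int = 168) -> str:
    """Build a VACUUM statement that physically removes data files no longer
    referenced by the table and older than the retention window."""
    return f"VACUUM {table} RETAIN {retention_hours} HOURS"

# On a cluster: spark.sql(vacuum_sql("customers"))
# Note: a retention below 168 hours additionally requires disabling
# spark.databricks.delta.retentionDurationCheck.enabled.
```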
- 2001 Views
- 0 replies
- 0 kudos
Dataframe write to a SQL Server table containing an always-autogenerated column fails. I am using the Apache Spark Connector for SQL Server and Azure SQL. When the autogenerated field is not included in the dataframe, I encounter a "No key found" error. If auto-gene...
- 1174 Views
- 1 replies
- 0 kudos
I would like to know if I can connect using DBconnect to any DBR version, or if only the supported versions will work.
Latest Reply
Only the following Databricks Runtime versions are supported:
- Databricks Runtime 8.1 ML, Databricks Runtime 8.1
- Databricks Runtime 7.3 LTS ML, Databricks Runtime 7.3 LTS
- Databricks Runtime 6.4 ML, Databricks Runtime 6.4
- Databricks Runtime 5.5 LTS ML, Dat...
- 718 Views
- 1 replies
- 0 kudos
I would like to know if there is a way to connect to a Databricks cluster using my IDE.
Latest Reply
Databricks Connect allows you to connect your favorite IDE to Databricks clusters. You can find details on how to set it up and install the required libraries at https://docs.databricks.com/dev-tools/databricks-connect.html
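As a quick smoke test once `databricks-connect configure` has been run (the wrapper and row count here are illustrative), a local script can verify the connection by pushing a trivial job to the remote cluster:

```python
def connect_smoke_test():
    """Confirm databricks-connect can reach the configured cluster.

    The count() below executes on the remote Databricks cluster,
    not on your local machine.
    """
    from pyspark.sql import SparkSession  # provided by databricks-connect

    spark = SparkSession.builder.getOrCreate()
    return spark.range(100).count()

# Expected to return 100 when the connection is healthy:
# connect_smoke_test()
```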