Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, we have a Spark job that writes data into a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 minutes to complete; of that, the logic takes 12 minutes and optimizewri...
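A minimal sketch of how the two settings mentioned above are typically enabled; the table name `target_table` and the session-level scope are assumptions, not details from the post:

```python
# Session-level auto-compaction for Delta writes.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# delta.autoOptimize.optimizeWrite can also be pinned as a table property
# ("target_table" is a placeholder).
spark.sql("""
    ALTER TABLE target_table
    SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true')
""")
```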
Hello, we have encountered a weird issue in our (old) setup that looks like a bug in Unity Catalog. The storage account to which we are trying to persist is configured via External Volumes. We have a pipeline that gets XML data and stores it in an RD...
I will post here what resolved this error for us, in case someone else encounters it in the future. It turns out that the error appears when the command below is run while the directory 'staging2' already exists. To avo...
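Since the exact command is truncated above, this is only a hypothetical illustration of the pattern: clearing or overwriting a target directory that already exists (the path is a placeholder and `df` is an assumed existing DataFrame):

```python
staging_path = "/mnt/datalake/staging2"  # placeholder path

# Option 1: remove the leftover directory before re-running the command.
dbutils.fs.rm(staging_path, recurse=True)

# Option 2: let the write replace the existing contents instead of failing.
df.write.format("delta").mode("overwrite").save(staging_path)
```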
Hi everyone, I'm trying to merge two Delta tables that each hold more than 200 million records. These tables are properly optimized, but upon running the job, it takes a long time to execute and the memory spills are huge (1 TB-3 TB) rec...
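One common mitigation for large MERGEs is to pre-filter the source and add a pruning predicate to the join condition so Delta can skip untouched partitions. A hedged sketch; the table names, keys, and date column are assumptions about the poster's schema:

```python
spark.sql("""
    MERGE INTO target t
    USING (
        -- Restrict the source to rows that can actually match, which
        -- shrinks the shuffle and reduces spill.
        SELECT * FROM source
        WHERE event_date >= current_date() - INTERVAL 90 DAYS
    ) s
    ON t.id = s.id
       AND t.event_date = s.event_date  -- enables partition pruning on target
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```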
I want to upload a simple CSV file to a volume that was created in our Unity Catalog. We are using secure cluster connectivity and our storage account (metastore) is not publicly accessible. We injected the storage account into our VNet. I am getting the fol...
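For reference, a minimal sketch of the upload itself once connectivity is in place; the catalog, schema, and volume names are placeholders:

```python
# Copy a local CSV into a Unity Catalog volume path.
dbutils.fs.cp(
    "file:/tmp/sample.csv",  # assumed local source file
    "/Volumes/my_catalog/my_schema/my_volume/sample.csv",
)
```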
@AdrianaIspas We are running into the same issue. It took a while to figure out that the error message is related to this limitation. Any updates on when we can expect the limitation to be lifted? We want to secure access to our storage accounts ...
I had a table creation script as follows, for example:

CREATE TABLE default.test2 (
    id BIGINT GENERATED BY DEFAULT AS IDENTITY(),
    name String
)
USING DELTA
LOCATION "/mnt/datalake/xxxx"

What are the possible ways to apply not n...
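A hedged sketch of two possible ways, following the column names above; note that ALTER COLUMN ... SET NOT NULL requires all existing rows to already satisfy the constraint:

```python
# Add the constraint to an existing Delta table.
spark.sql("ALTER TABLE default.test2 ALTER COLUMN name SET NOT NULL")

# Or declare it inline at creation time (hypothetical sibling table).
spark.sql("""
    CREATE TABLE default.test2_nn (
        id   BIGINT GENERATED BY DEFAULT AS IDENTITY(),
        name STRING NOT NULL
    )
    USING DELTA
""")
```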
Hello everyone, I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". As I understand it, this should update the min/max values for a column when you run it for all columns or a single one. One way to verify it, from what I ...
Hello @vlado101
The ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks is used to compute statistics for all columns of a table. This information is persisted in the metastore and helps the query optimizer make decisions such as ...
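A minimal sketch of computing the statistics and then inspecting them for one column; `my_table` and `my_col` are placeholders:

```python
spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS FOR ALL COLUMNS")

# DESCRIBE EXTENDED <table> <column> surfaces min/max/null-count statistics
# once they have been collected.
spark.sql("DESCRIBE EXTENDED my_table my_col").show(truncate=False)
```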
Using Structured Streaming to read the change data feed from your Delta table lets you run incremental streaming aggregations, such as counts and sums.
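A minimal sketch, assuming a table with delta.enableChangeDataFeed = true; the table name and checkpoint path are placeholders:

```python
# Stream the change data feed of a Delta table.
cdf = (
    spark.readStream
         .format("delta")
         .option("readChangeFeed", "true")
         .table("my_table")
)

# Incremental aggregation over the change rows, e.g. counts per change type.
counts = cdf.groupBy("_change_type").count()

(counts.writeStream
       .outputMode("complete")
       .option("checkpointLocation", "/tmp/checkpoints/cdf_counts")
       .toTable("cdf_counts"))
```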
Do Databricks Asset Bundles support run_job_task tasks? I've made various attempts to add a run_job_task with a specified job_id; see the code snippet below. I tried substituting the job_id using ${...} syntax, as well as three other ways, which I've...
Ah, I see it is a known bug in the Databricks CLI: Asset bundle run_job_task fails · Issue #812 · databricks/cli (github.com). Anyone facing this issue should comment on and keep an eye on that ticket for resolution.
I was taking the online exam for the Databricks Certified Data Analyst Associate on 06-Oct-2023 at 1:45 PM. Partway through, they paused it and wanted to survey my whole room, which they did, told me to clear the table of my water bottle and laptop charger, and then asked...
With the introduction of Unity Catalog in Databricks, many of us have become familiar with creating catalogs. However, did you know that Unity Catalog also allows you to create foreign catalogs? You can register databases from the following s...
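A hedged sketch of registering one; it assumes a Unity Catalog connection named `my_pg_conn` already exists, and `sales_db` is a placeholder database name:

```python
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS pg_sales
    USING CONNECTION my_pg_conn
    OPTIONS (database 'sales_db')
""")
```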
Hi all, I'm trying to join two views in the SQL editor for some analysis. I get the following error: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Fail to parse '22/12/...
Hi Kaniz, I found the equivalent SQL code for this, but it didn't seem to store the setting past the execution, i.e., I would run the code to configure the settings, then run the troublesome code afterwards and still get the same result. The problem has b...
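The usual workaround for this error is to fall back to the legacy datetime parser. A hedged sketch; note that a plain SET in the SQL editor only lasts for the current session, which may be why it seemed not to stick, whereas setting it in the cluster or warehouse Spark config persists:

```python
# Fall back to the Spark 2.x datetime parser for patterns like 'dd/MM/yyyy'.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

# Equivalent session-scoped SQL form.
spark.sql("SET spark.sql.legacy.timeParserPolicy = LEGACY")
```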
As the title says, I have a Delta table with data for 22/9, and today I want to remove the old data and add new data for 23/9. I used TRUNCATE and COPY INTO queries, but after TRUNCATE, nothing is added to the table. What happened to my table? The files of the old data st...
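A likely explanation: COPY INTO tracks which files it has already loaded and skips them by default, and TRUNCATE does not reset that history. A hedged sketch of forcing a reload; the path and format options are placeholders:

```python
spark.sql("""
    COPY INTO my_table
    FROM '/mnt/landing/2023-09-23/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true')
    COPY_OPTIONS ('force' = 'true')  -- reload files even if already seen
""")
```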
Hello, great people. I am new to Databricks and learning PySpark. How can I create a new column called "sub_total" that holds the sales value grouped by "category", "subcategory", and "monthly"? I appreciate your help.
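A minimal sketch under assumed column names ("category", "subcategory", "monthly", "sales"); a window sum attaches "sub_total" to every row without collapsing the DataFrame:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.partitionBy("category", "subcategory", "monthly")
df_with_total = df.withColumn("sub_total", F.sum("sales").over(w))

# If one row per group is enough, a plain aggregation works too.
totals = (df.groupBy("category", "subcategory", "monthly")
            .agg(F.sum("sales").alias("sub_total")))
```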
Hello, we are migrating to Unity Catalog (UC), and for a very few of our tables, we get the below error when trying to write or even display them. We are using UC-enabled clusters, usually with runtime version 12.2 LTS. The below error, when it happens...
Hello,
Thanks for contacting Databricks Support.
The error message indicates a problem with the configuration key fs.azure.account.key. This configuration key is used to provide the access key for the Azure Data Lake Storage account. Not sure if th...
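For context, a hedged illustration of how that key is normally supplied on a cluster; the storage account name and secret scope/key are placeholders, and Unity Catalog-managed paths should instead use storage credentials and external locations rather than account keys:

```python
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)
```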