Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

svrdragon
by New Contributor
  • 1913 Views
  • 0 replies
  • 0 kudos

optimizeWrite takes too long

Hi, we have a Spark job that writes data to a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 mins to complete. Of that, the logic takes 12 mins and optimizewri...

pavlos_skev
by New Contributor III
  • 5640 Views
  • 2 replies
  • 0 kudos

Resolved! Invalid configuration value detected for fs.azure.account.key only when trying to save RDD

Hello, we have encountered a weird issue in our (old) set-up that looks like a bug in the Unity Catalog. The storage account to which we are trying to persist is configured via External Volumes. We have a pipeline that gets XML data and stores it in an RD...

Latest Reply
pavlos_skev
New Contributor III
  • 0 kudos

I will post here what resolved this error for us, in case someone else encounters it in the future. It turns out that, in our case, the error appears when we use the command below while the directory 'staging2' already exists. To avo...

1 More Replies
Mohammad_Younus
by New Contributor
  • 4681 Views
  • 0 replies
  • 0 kudos

Merge delta tables with data more than 200 million

Hi everyone, I'm trying to merge two Delta tables that each have more than 200 million records. These tables are properly optimized. But when I run the job, it takes a long time to execute and the memory spills are huge (1TB-3TB) rec...

Hubert-Dudek
by Esteemed Contributor III
  • 8105 Views
  • 1 replies
  • 1 kudos

The perfect table

Unlock the Power of #Databricks: The Perfect Table in 8 Simple Steps! 

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Hubert-Dudek, Thank you for sharing this great post

boriste
by New Contributor II
  • 10007 Views
  • 11 replies
  • 10 kudos

Resolved! Upload to Volume inside unity catalog not possible?

I want to upload a simple CSV file to a volume which was created in our Unity Catalog. We are using secure cluster connectivity and our storage account (metastore) is not publicly accessible. We injected the storage in our VNet. I am getting the fol...

Latest Reply
jeroenvs
New Contributor III
  • 10 kudos

@AdrianaIspas We are running into the same issue. It took a while to figure out that the error message is related to this limitation. Any updates on when we can expect the limitation to be taken away? We want to secure access to our storage accounts ...

10 More Replies
harish446
by New Contributor
  • 1453 Views
  • 1 replies
  • 0 kudos

Can a NOT NULL constraint be applied on an identity column

I had a table creation script as follows, for example: CREATE TABLE default.test2 (id BIGINT GENERATED BY DEFAULT AS IDENTITY(), name String) using delta location "/mnt/datalake/xxxx". What are the possible ways to apply not n...

Data Engineering
data engineering
Databricks
Delta Lake
Delta tables
spark
Latest Reply
Krishnamatta
New Contributor III
  • 0 kudos

Hi Harish, here is the documentation for this issue: https://docs.databricks.com/en/tables/constraints.html

vlado101
by New Contributor II
  • 4009 Views
  • 1 replies
  • 1 kudos

Resolved! ANALYZE TABLE is not updating columns stats

Hello everyone, I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". The way I understand it, this should update the min/max value for a column when you run it for all columns or for one column. One way to verify it from what I ...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 1 kudos

Hello @vlado101  The ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks is used to compute statistics for all columns of a table. This information is persisted in the metastore and helps the query optimizer make decisions such as ...

Hubert-Dudek
by Esteemed Contributor III
  • 1578 Views
  • 1 replies
  • 1 kudos

Structured Streaming Aggregation

Utilizing structured streaming to read the change data feed from your Delta table empowers you to execute incremental streaming aggregations, such as counting and summing.
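The idea in the post above can be illustrated without Spark. The following is a minimal sketch in plain Python, not the author's code and not the Structured Streaming API: it only shows why a change feed is enough to keep a count and a sum up to date incrementally. It assumes each change row carries Delta's `_change_type` column with the values `insert`, `update_preimage`, `update_postimage`, and `delete`; the column names `key` and `amount` and the helper `apply_changes` are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Sign applied per change type: new row images add to the aggregate,
# old row images subtract from it.
SIGN = {
    "insert": 1,
    "update_postimage": 1,
    "update_preimage": -1,
    "delete": -1,
}


def apply_changes(state, changes):
    """Fold one micro-batch of change rows into the running count and sum per key."""
    for row in changes:
        sign = SIGN[row["_change_type"]]
        agg = state[row["key"]]
        agg["count"] += sign
        agg["sum"] += sign * row["amount"]
    return state


# Running aggregates, keyed by the (hypothetical) grouping column "key".
state = defaultdict(lambda: {"count": 0, "sum": 0})

# First micro-batch: three inserts.
batch1 = [
    {"key": "a", "amount": 10, "_change_type": "insert"},
    {"key": "a", "amount": 5, "_change_type": "insert"},
    {"key": "b", "amount": 7, "_change_type": "insert"},
]

# Second micro-batch: one row of "a" is updated from 10 to 12, "b" is deleted.
batch2 = [
    {"key": "a", "amount": 10, "_change_type": "update_preimage"},
    {"key": "a", "amount": 12, "_change_type": "update_postimage"},
    {"key": "b", "amount": 7, "_change_type": "delete"},
]

apply_changes(state, batch1)
apply_changes(state, batch2)
```

Because every update arrives as a pre-image and a post-image, the aggregate never has to rescan the table: each batch only adjusts the totals by the rows that changed.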

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing @Hubert-Dudek !!!

TimReddick
by Contributor
  • 8627 Views
  • 6 replies
  • 2 kudos

Using run_job_task in Databricks Asset Bundles

Do Databricks Asset Bundles support run_job_task tasks? I've made various attempts to add a run_job_task with a specified job_id. See the code snippet below. I tried substituting the job_id using ${...} syntax, as well as three other ways which I've...

Data Engineering
Databrick Asset Bundles
run_job_task
Latest Reply
kyle_r
New Contributor II
  • 2 kudos

Ah, I see it is a known bug in the Databricks CLI: Asset bundle run_job_task fails · Issue #812 · databricks/cli (github.com). Anyone facing this issue should comment on and keep an eye on that ticket for resolution. 

5 More Replies
Sanjay96m
by New Contributor
  • 1331 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks Certification exam Suspended. Need Assistance

I was taking the online exam for Databricks Certified Data Analyst Associate on 06-Oct-2023 at 1:45 PM. Partway through, they paused it and wanted to survey my whole room, which they did. They told me to clear the table of a water bottle and laptop charger and then asked...

Latest Reply
Cert-Team
Databricks Employee
  • 0 kudos

@Sanjay96m Thank you for your patience. The support team is working through support tickets and will reach out to you shortly.

Hubert-Dudek
by Esteemed Contributor III
  • 1297 Views
  • 1 replies
  • 1 kudos

Foreign catalogs

With the introduction of the Unity Catalog in Databricks, many of us have become familiar with creating catalogs. However, did you know that the Unity Catalog also allows you to create foreign catalogs? You can register databases from the following s...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing @Hubert-Dudek !!!

117074
by New Contributor III
  • 11822 Views
  • 1 replies
  • 1 kudos

[INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER]

Hi all, I'm trying to join 2 views in the SQL editor for some analysis. I get the following error: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Fail to parse '22/12/...

Latest Reply
117074
New Contributor III
  • 1 kudos

Hi Kaniz, I found the equivalent SQL code for this, but it didn't seem to keep the setting in effect past the execution. I.e., I would run the code to configure the settings, then run the troublesome code afterwards and still get the same result. The problem has b...

pavlos_skev
by New Contributor III
  • 1150 Views
  • 1 replies
  • 0 kudos

Potential Unity Catalog Bug: Invalid configuration value detected for fs.azure.account.key

Hello, we are migrating to Unity Catalog (UC), and for very few of our tables, we get the below error when trying to write or even display them. We are using UC-enabled clusters, usually with runtime version 12.2 LTS. The below error, when it happens...

Latest Reply
User16539034020
Databricks Employee
  • 0 kudos

Hello, thanks for contacting Databricks Support. The error message indicates a problem with the configuration key fs.azure.account.key. This configuration key is used to provide the access key for the Azure Data Lake Storage account. Not sure if th...
