Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mgrave
by New Contributor II
  • 3243 Views
  • 2 replies
  • 2 kudos

Temporary table names are highlighted as syntax errors in SQL notebooks

See attached screenshot. In my SQL notebook, I declare a temporary view: CREATE OR REPLACE TEMP VIEW tmp_table AS SELECT ...; SELECT count(*) FROM tmp_table; The code editor considers tmp_table an invalid name in the latter SELECT. The reason is: Coul...

Latest Reply
Craig_
New Contributor III
  • 2 kudos

My temp views always show red as well. Maybe it is something with our specific environment? I've also noticed, when browsing the catalog from within the notebook, that the temp tables are listed but an error is thrown when you try to view the columns of t...

1 More Replies
aerofish
by New Contributor III
  • 1464 Views
  • 0 replies
  • 0 kudos

Structured streaming deduplication issue

Recently we have been using Structured Streaming to ingest data. We want to use a watermark to drop duplicate events, but we encountered some weird behavior and an unexpected exception. Can anyone help explain what the expected behavior is and how should ...

Data Engineering
deduplication
streaming
watermark
StephanieAlba
by Databricks Employee
  • 5095 Views
  • 2 replies
  • 0 kudos

When would you not want to use autoloader?

I am genuinely curious: why would you ever not use Autoloader? I can see it for one-off downloads, of course. When you pull data from another platform, say Salesforce, is it better to append to a table without Autoloader? There must be cases I am missing. T...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Autoloader is pretty handy, but not open source. That is one reason, for example. Another is if you cannot guarantee lexicographically generated files, or you do not want to use streaming, or you do not land your raw data into a data lake (read fr...

1 More Replies
Ruby8376
by Valued Contributor
  • 3898 Views
  • 3 replies
  • 1 kudos

Resolved! DATABRICKS TO AZ SQL??

Hi all, quick question: Is this the correct data flow pattern: Databricks -> Az SQL -> Tableau? Or does it have to go through ADLS: Databricks -> ADLS -> Az SQL -> Tableau? Also, is it better to leverage Databricks lakehouse SQL warehouse capability as ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I would not call it 'better' per se. A lakehouse is a more modern approach to a classic data warehouse, using flexible distributed cloud compute, cheap storage and open file formats. If you have an existing environment, which works well, that is heavi...

2 More Replies
eimis_pacheco
by Contributor
  • 7736 Views
  • 1 reply
  • 0 kudos

What are the Delta Live Tables limitations with relation to Unity Catalog?

Hi community! I was in a Databricks webinar and one of the participants said "Delta Live Tables seems to have some limitations when using with Unity Catalog. Is the idea to get parity with Hive?" and someone answered "DLT + Unity Catalog combination h...

Michael_Appiah
by Contributor II
  • 27771 Views
  • 1 reply
  • 1 kudos

Hashing Functions in PySpark

Hashes are commonly used in SCD2 merges to determine whether data has changed by comparing the hashes of the new rows in the source with the hashes of the existing rows in the target table. PySpark offers multiple different hashing functions like: MD5...
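
The change-detection idea in this post (compare a hash of each incoming row with the hash stored for the same key in the target) can be sketched in plain Python. This is only an illustration under stated assumptions: `hashlib.md5` stands in for a PySpark hashing function, and the helper names `row_hash` and `changed_keys`, the separator, and the NULL marker are hypothetical, not taken from the post.

```python
import hashlib

# Hypothetical separator and NULL marker, chosen so that ("ab", "c") and
# ("a", "bc") or (None, "x") and ("", "x") do not hash to the same value.
SEPARATOR = "\x1f"
NULL_MARKER = "\x00NULL"


def row_hash(row, columns):
    """Return an MD5 hex digest over the tracked columns of one row (a dict)."""
    parts = [NULL_MARKER if row[c] is None else str(row[c]) for c in columns]
    return hashlib.md5(SEPARATOR.join(parts).encode("utf-8")).hexdigest()


def changed_keys(source_rows, target_rows, key, columns):
    """Return keys present in both inputs whose tracked columns differ."""
    target_hashes = {r[key]: row_hash(r, columns) for r in target_rows}
    return [
        r[key]
        for r in source_rows
        if r[key] in target_hashes and row_hash(r, columns) != target_hashes[r[key]]
    ]


target = [
    {"id": 1, "name": "a", "city": "x"},
    {"id": 2, "name": "b", "city": "y"},
]
source = [
    {"id": 1, "name": "a", "city": "x"},  # unchanged
    {"id": 2, "name": "b", "city": "z"},  # city changed
    {"id": 3, "name": "c", "city": None},  # new key, not a change
]
changed = changed_keys(source, target, "id", ["name", "city"])
```

In an SCD2 merge, the keys in `changed` would be the rows to close out and re-insert as new versions; keys absent from the target (here `3`) would be plain inserts.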

Latest Reply
Michael_Appiah
Contributor II
  • 1 kudos

Hi @Retired_mod, thank you for your comprehensive answer. What is your opinion on the trade-off between using a hash like xxHASH64 which returns a LongType column and thus would offer good performance when there is a need to join on the hash column v...

kaleighspitz
by New Contributor
  • 1723 Views
  • 0 replies
  • 0 kudos

Delta Live Tables saving as corrupt files

Hello, I am using Delta Live Tables to store data and then trying to save them to ADLS. I've specified the storage location of the Delta Live Tables in my Delta Live Tables pipeline. However, when I check the files that are saved in ADLS, they are cor...

Data Engineering
Delta Live Tables
jfarmer
by New Contributor II
  • 7842 Views
  • 3 replies
  • 1 kudos

PermissionError / Operation not Permitted with Files-in-Repos

I've been running a notebook using files-in-repo. Previously this has worked fine. I'm unsure what's changed (I was testing integration with DCS on older runtimes, but don't think I made any persistent changes)--but now it's throwing an error (always...

Latest Reply
_carleto_
New Contributor II
  • 1 kudos

Hi @jfarmer, did you solve this issue? I'm having exactly the same challenge. Thanks!

2 More Replies
Paval
by New Contributor
  • 1676 Views
  • 0 replies
  • 0 kudos

Failed to run the job on Databricks version LTS 9.x and 10.x (AWS)

Hi Team, when we tried to change the Databricks version from 7.3 to 9.x or 10.x, we got the error below. Caused by: java.lang.RuntimeException: MetaException(message:Unable to verify existence of default database: com.amazonaws.services.glue.model....

rp16
by New Contributor II
  • 2805 Views
  • 2 replies
  • 2 kudos

How can we create streaming tables as external Delta tables?

We would like to introduce DLT streaming tables to our medallion architecture, but we are unable to create the streaming tables with the schemas concerned. STREAMING tables don't have an option to be stored with custom schemas. The requirement we have ...

Latest Reply
Faisal
Contributor
  • 2 kudos

If Unity Catalog is used, tables under it are managed by default.

1 More Replies
nikhilkumawat
by New Contributor III
  • 5004 Views
  • 2 replies
  • 1 kudos

[INTERNAL_ERROR] Cannot generate code for expression: claimsconifer.default.decrypt_colA(

A column contains encrypted data at rest. I am trying to create a SQL function which will decrypt the data if the user is part of a particular group. Below is the function: %sql CREATE OR REPLACE FUNCTION test.default.decrypt_if_valid_user(col_a ST...

Latest Reply
nikhilkumawat
New Contributor III
  • 1 kudos

Hi @Retired_mod, after removing the "TABLE" keyword from the CREATE OR REPLACE statement, this function got registered as a builtin function. Just to verify, I displayed all the functions and I can see that function --> decrypt_if_valid_user: Now I am trying t...

1 More Replies
Oliver_Angelil
by Valued Contributor II
  • 3779 Views
  • 3 replies
  • 1 kudos

Resolved! Are data health check expectations available only on Delta Live tables?

I love the idea of "expectations" being available for Delta Live Tables: https://docs.databricks.com/delta-live-tables/expectations.html I'd like to know if they are also available for regular Delta tables? Thank you in advance!

Latest Reply
erigaud
Honored Contributor
  • 1 kudos

Hello @Oliver_Angelil, have you found a way to implement something resembling expectations for Delta tables outside of a DLT pipeline?

2 More Replies
invalidargument
by New Contributor III
  • 5204 Views
  • 1 reply
  • 1 kudos

How to display shap waterfall plot

Hi, I have managed to display a force plot for a single observation using the advice from this thread: Solved: How to display SHAP plots? - Databricks - 28315 But is there any way to display the newer "waterfall" plot: shap.plots.waterfall — SHAP latest doc...

Latest Reply
invalidargument
New Contributor III
  • 1 kudos

Thank you for the swift response. I made a minimal example and it does work as you said. However, when I try with my own model it does not work; the only output is <Figure size 576x468 with 3 Axes> I tried to save the figure as a file and then I do get ...

  • 1 kudos
rt-slowth
by Contributor
  • 1205 Views
  • 0 replies
  • 0 kudos

How to write test code in databricks

    from databricks.connect import DatabricksSession
    from data.dbx_conn_info import DbxConnInfo

    class SparkSessionManager:
        _instance = None
        _spark = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = s...
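
One possible way to unit-test a singleton like the one in this post without a live cluster is to pass the session constructor in as a callable and reset the singleton between tests. The sketch below is an assumption-laden illustration, not the poster's code: `SessionManager`, `get_session`, `reset`, `FakeSession`, and the `factory` argument are all hypothetical names, and no Databricks Connect call is made.

```python
class SessionManager:
    """Process-wide holder for a single session object (singleton)."""

    _instance = None
    _session = None

    def __new__(cls):
        # Create the manager only once; later calls return the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def get_session(self, factory):
        # Build the session lazily, only on the first call.
        if SessionManager._session is None:
            SessionManager._session = factory()
        return SessionManager._session

    @classmethod
    def reset(cls):
        # Test helper: drop cached state so each test starts clean.
        cls._instance = None
        cls._session = None


class FakeSession:
    """Stand-in for a real Spark session in tests (hypothetical)."""


def test_singleton_reuses_session():
    SessionManager.reset()
    calls = []

    def factory():
        calls.append(1)
        return FakeSession()

    first = SessionManager().get_session(factory)
    second = SessionManager().get_session(factory)
    assert first is second   # same session object is reused
    assert len(calls) == 1   # factory ran exactly once
```

In production code the `factory` argument would be whatever builds the real session; in tests, a fake keeps the check fast and independent of workspace credentials.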

User16789201666
by Databricks Employee
  • 11541 Views
  • 3 replies
  • 4 kudos
Latest Reply
arun_pamulapati
Databricks Employee
  • 4 kudos

Use Lakehouse Monitoring:  https://docs.databricks.com/en/lakehouse-monitoring/index.html Specifically:  https://docs.databricks.com/en/lakehouse-monitoring/monitor-output.html#drift-metrics-table

2 More Replies
