Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

MC8D
by New Contributor II
  • 1556 Views
  • 1 replies
  • 1 kudos

Foreign Catalog with Case Sensitive PostgreSQL

I am trying to query my PostgreSQL read replica as a foreign catalog. I can successfully test the connection. I can see the database names. The table names are auto-populated correctly. However, when I try to view or query a table, I get the following error...

Latest Reply
MC8D
New Contributor II
  • 1 kudos

Hi @Retired_mod, I am able to query the pg_catalog database, which has all lower-case table names, so the connection is working. I am unable to query the tables in my "public" schema, as they have capitalization in the table names. If I query with no bac...

  • 1 kudos
viniaperes
by New Contributor II
  • 2031 Views
  • 0 replies
  • 0 kudos

Pass Databricks's Spark session to a user defined module

Hello everyone, I have a .py file (not a notebook) where I have the following class with the following constructor:

class DataQualityChecker:
    def __init__(self, spark_session: SparkSession, df: DataFrame, quality_config_filepath: str) -> None: ...
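
One common way to approach this is plain dependency injection: the notebook or job that already holds the session hands it to the class. The following is a minimal sketch, assuming the constructor signature quoted in the post; the attribute names and the `row_count` method are hypothetical, and the pyspark imports are used for type hints only.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type hints only, so importing this module does not
    # require an active Spark session.
    from pyspark.sql import DataFrame, SparkSession


class DataQualityChecker:
    def __init__(
        self,
        spark_session: SparkSession,
        df: DataFrame,
        quality_config_filepath: str,
    ) -> None:
        # Keep the injected session instead of creating a new one in the module.
        self.spark = spark_session
        self.df = df
        self.quality_config_filepath = quality_config_filepath

    def row_count(self) -> int:
        # Hypothetical check, only here to show the injected DataFrame in use.
        return self.df.count()
```

In a Databricks notebook the predefined `spark` variable can be passed in directly, for example `DataQualityChecker(spark, df, "/path/to/config.json")`, where the path is a placeholder.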

jgen17
by New Contributor II
  • 8882 Views
  • 2 replies
  • 0 kudos

Cluster library installation fails

Hello everyone, I get a weird error when installing additional libraries in my cluster. I have a predefined Databricks cluster (Standard_L8s_v2) as a compute instance. I run pipelines on that cluster in Azure ADF. The pipeline consists of several tasks. T...

1 More Replies
successhawk
by New Contributor II
  • 1098 Views
  • 1 replies
  • 1 kudos

How can I provide read only access to the Admin console?

As a DevSecOps engineer, I want to provide Ops support personnel READ ONLY access to the admin console in my production workspaces, so that they can easily view non-secret configurations, such as user/group memberships/entitlements and workspace sett...

Latest Reply
418971
New Contributor II
  • 1 kudos

Have you found a solution for this?

  • 1 kudos
mgrave
by New Contributor II
  • 2059 Views
  • 2 replies
  • 2 kudos

Temporary table names are highlighted as syntax errors in SQL notebooks

See attached screenshot. In my SQL notebook, declare a temporary view:

CREATE OR REPLACE TEMP VIEW tmp_table AS
SELECT ...;
SELECT count(*) FROM tmp_table;

The code editor considers tmp_table not a valid name in that latter SELECT. The reason is: Coul...

Latest Reply
Craig_
New Contributor III
  • 2 kudos

My temp views always show red as well. Maybe it is something with our specific environment? I've also noticed, when browsing the catalog from within the notebook, the temp tables are listed but an error is thrown when you try to view the columns of t...

  • 2 kudos
1 More Replies
aerofish
by New Contributor III
  • 1004 Views
  • 0 replies
  • 0 kudos

Structured streaming deduplication issue

Recently we started using Structured Streaming to ingest data. We want to use a watermark to drop duplicate events, but we encountered some weird behavior and an unexpected exception. Can anyone help explain what the expected behavior is and how should ...
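
For reference, the usual pattern pairs `withWatermark` with `dropDuplicates` on a key that includes the event-time column. The following is a minimal sketch, not taken from the post: it assumes a PySpark streaming DataFrame with hypothetical columns `event_id` and `event_time` and an illustrative 10-minute lateness threshold.

```python
def dedupe_events(events, id_col="event_id", time_col="event_time", delay="10 minutes"):
    """Drop duplicate events from a streaming DataFrame.

    `events` is assumed to be a PySpark streaming DataFrame; the column
    names and the delay threshold are illustrative defaults.
    """
    return (
        events
        # Bound how late data may arrive, measured on the event-time column.
        .withWatermark(time_col, delay)
        # Including the event-time column in the key lets Spark expire
        # old deduplication state once the watermark has passed it.
        .dropDuplicates([id_col, time_col])
    )
```

With this key, two records count as duplicates only when both the identifier and the event time match; a resend carrying a different timestamp is kept.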

Data Engineering
deduplication
streaming
watermark
krocodl
by Contributor
  • 8247 Views
  • 11 replies
  • 3 kudos

OOM while loading a lot of data through JDBC

public void bigDataTest() throws Exception {
    int rowsCount = 100_000;
    int colSize = 1024;
    int colCount = 12;
    String colValue = "'" + "x".repeat(colSize) + "'";
    String query = "select explode(s...

Screenshot 2023-10-13 at 08.10.08.png Screenshot 2023-10-13 at 08.12.52.png
Data Engineering
JDBC
Out-of-memory
resource leaking
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Retired_mod any idea?

  • 3 kudos
10 More Replies
StephanieAlba
by Databricks Employee
  • 3720 Views
  • 2 replies
  • 0 kudos

When would you not want to use autoloader?

I am genuinely curious: why would you ever not use Autoloader? I can see it for one-off downloads, of course. When you pull data from another platform, say Salesforce, is it better to append to a table without Autoloader? There must be cases I am missing. T...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Autoloader is pretty handy, but not open source. That is one reason, for example. Another reason is if you cannot guarantee lexicographically generated files, or you do not want to use streaming, or you do not land your raw data into a data lake (read fr...

  • 0 kudos
1 More Replies
Ruby8376
by Valued Contributor
  • 3333 Views
  • 3 replies
  • 1 kudos

Resolved! DATABRICKS TO AZ SQL??

Hi all, quick question: Is this a correct data flow pattern: Databricks -> Az SQL -> Tableau? Or does it have to go through ADLS: Databricks -> ADLS -> Az SQL -> Tableau? Also, is it better to leverage the Databricks lakehouse SQL warehouse capability as ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I would not call it 'better' per se. A lakehouse is a more modern approach to a classic data warehouse, using flexible distributed cloud compute, cheap storage and open file formats. If you have an existing environment, which works well, that is heavi...

  • 1 kudos
2 More Replies
andr3s
by New Contributor II
  • 32738 Views
  • 3 replies
  • 2 kudos

SSL_connect: certificate verify failed with Power BI

Hi, I'm getting this error with Power BI (see the attached screenshot). Any ideas? Thanks in advance, Andres

Screenshot 2023-05-19 154328
Latest Reply
andr3s
New Contributor II
  • 2 kudos

Hi Harrison, thanks for the reply. We sorted out the issue a few days later but forgot to reply to your answer. There had been some changes to the security set-up that affected our connectivity to Databricks from all clients.

  • 2 kudos
2 More Replies
eimis_pacheco
by Contributor
  • 6391 Views
  • 1 replies
  • 0 kudos

What are the Delta Live Tables limitations with relation to Unity Catalog?

Hi community! I was in a Databricks webinar and one of the participants said "Delta Live Tables seems to have some limitations when using with Unity Catalog. Is the idea to get parity with Hive?" and someone answered "DLT + Unity Catalog combination h...

Michael_Appiah
by Contributor
  • 15470 Views
  • 1 replies
  • 0 kudos

Hashing Functions in PySpark

Hashes are commonly used in SCD2 merges to determine whether data has changed by comparing the hashes of the new rows in the source with the hashes of the existing rows in the target table. PySpark offers multiple different hashing functions like: MD5...
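
The comparison idea can be illustrated without Spark. The following is a minimal sketch in plain Python using `hashlib` as a stand-in, not the PySpark hashing functions the post lists; the column names, the separator, and the null marker are assumptions made for illustration.

```python
import hashlib


def row_hash(row, columns, sep="||"):
    """Return a SHA-256 hex digest over the tracked columns of one row."""
    # Represent None explicitly so that a null and an empty string hash differently.
    parts = ["<NULL>" if row[c] is None else str(row[c]) for c in columns]
    return hashlib.sha256(sep.join(parts).encode("utf-8")).hexdigest()


def has_changed(source_row, target_row, columns):
    """True when the tracked columns differ between source and target."""
    return row_hash(source_row, columns) != row_hash(target_row, columns)


tracked = ["name", "city"]
existing = {"id": 1, "name": "Ada", "city": "London"}
incoming = {"id": 1, "name": "Ada", "city": "Paris"}
print(has_changed(incoming, existing, tracked))  # True: city differs
```

Hashing only the tracked columns keeps the comparison independent of the key and of any audit columns. The separator guards against adjacent values running together, although a value that itself contains the separator could still collide.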

Latest Reply
Michael_Appiah
Contributor
  • 0 kudos

Hi @Retired_mod, thank you for your comprehensive answer. What is your opinion on the trade-off between using a hash like xxHASH64, which returns a LongType column and thus would offer good performance when there is a need to join on the hash column, v...

  • 0 kudos
kaleighspitz
by New Contributor
  • 1303 Views
  • 0 replies
  • 0 kudos

Delta Live Tables saving as corrupt files

Hello, I am using Delta Live Tables to store data and then trying to save it to ADLS. I've specified the storage location of the Delta Live Tables in my Delta Live Tables pipeline. However, when I check the files that are saved in ADLS, they are cor...

Data Engineering
Delta Live Tables
jfarmer
by New Contributor II
  • 5594 Views
  • 3 replies
  • 1 kudos

PermissionError / Operation not Permitted with Files-in-Repos

I've been running a notebook using files-in-repo. Previously this has worked fine. I'm unsure what's changed (I was testing integration with DCS on older runtimes, but don't think I made any persistent changes)--but now it's throwing an error (always...

Latest Reply
_carleto_
New Contributor II
  • 1 kudos

Hi @jfarmer, did you solve this issue? I'm having exactly the same challenge. Thanks!

  • 1 kudos
2 More Replies
