Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by User16776430979 (New Contributor III)
  • 53974 Views
  • 4 replies
  • 5 kudos

Best practices around bronze/silver/gold (medallion model) data lake classification?

What's the best way to organize our data lake and Delta setup? We’re trying to use the bronze, silver and gold classification strategy. The main question is how do we know what classification the data is inside Databricks if there’s no actual physica...
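One way to make the layer visible when bronze, silver and gold data share the same storage is to encode it in the table namespace, for example one schema per layer. The sketch below assumes a hypothetical convention in which the schema part of a `catalog.schema.table` name is literally `bronze`, `silver` or `gold`; `medallion_layer` is an illustrative helper, not a Databricks API.

```python
# Hypothetical convention: each medallion layer lives in its own schema,
# so a table's layer can be read from its three-level name.
LAYERS = ("bronze", "silver", "gold")


def medallion_layer(full_name: str) -> str:
    """Return the layer for a 'catalog.schema.table' name.

    Raises ValueError if the name is not three-level or the schema does
    not follow the assumed layer naming convention.
    """
    parts = full_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    schema = parts[1].lower()
    if schema not in LAYERS:
        raise ValueError(f"schema {schema!r} is not one of {LAYERS}")
    return schema


if __name__ == "__main__":
    print(medallion_layer("sales.silver.orders"))
```

The same idea works with other markers (table prefixes, table properties); the point is that the classification lives somewhere queryable rather than only in people's heads.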

Latest Reply
G_E
New Contributor II
  • 5 kudos

Has the reply from @Retired_mod been removed?

by SQL (New Contributor II)
  • 2881 Views
  • 6 replies
  • 1 kudos

Presto Hive table to Delta table conversion

Hi everyone, I am using the SQL query below to generate the days in order in Hive, and it works fine. The table was migrated to Delta and now my query is failing. I would appreciate it if someone could help me figure out the issue. SQL query: with  ex...

Latest Reply
thelogicplus
Contributor
  • 1 kudos

Hi @SQL @jose_gonzalez, have you tried the code conversion tool from Travinto Technologies? They offer Hive-to-Delta table conversion.

by subhankar (New Contributor II)
  • 1035 Views
  • 2 replies
  • 0 kudos

Need guidance on connecting to Azure Databricks using JDBC Protocol

Step 1: Download and reference the JDBC driver.
Download the Databricks JDBC Driver: visit the Databricks JDBC Driver download page, download the appropriate version for your operating system, and extract the DatabricksJDBC42.jar file from the downloaded ...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @subhankar, good day! The error you are getting shows that the driver is trying to find a JVM file, probably via the JAVA_HOME variable. It looks as if JAVA_HOME is not set correctly in your environment variables. ...
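A quick way to test this diagnosis is to check, before loading the JDBC driver, that `JAVA_HOME` is set and actually contains a Java launcher. A minimal sketch; `check_java_home` is a hypothetical helper, and it only inspects the variable, not whether the Java version matches what the driver requires.

```python
import os
from pathlib import Path


def check_java_home(env=None) -> Path:
    """Validate that JAVA_HOME points at a directory with a java launcher.

    Returns the path to the java executable, or raises RuntimeError
    describing what is wrong. `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        raise RuntimeError("JAVA_HOME is not set")
    home = Path(java_home)
    if not home.is_dir():
        raise RuntimeError(f"JAVA_HOME={java_home} is not a directory")
    # The launcher is bin/java on Linux/macOS and bin\java.exe on Windows.
    for name in ("java", "java.exe"):
        candidate = home / "bin" / name
        if candidate.is_file():
            return candidate
    raise RuntimeError(f"no bin/java under JAVA_HOME={java_home}")
```

Calling `check_java_home()` from the same shell or IDE that launches the JDBC client checks the environment that process actually sees, which can differ from what the system settings dialog shows.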

by aliacovella (Contributor)
  • 810 Views
  • 3 replies
  • 1 kudos

How can I get logging or print output from a Delta Live Tables workflow?

I'm trying to debug a task that is a DLT workflow. I've tried adding log and print statements, but I can't see the output in the event log after the run, nor can I see the print statements anywhere. Can someone point me to whe...
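As a general pattern (an assumption worth verifying in your workspace), output from Python `print` or `logging` calls in pipeline code usually ends up in the driver logs of the pipeline's compute rather than in the DLT event log. A named logger that writes timestamped lines to stdout at least makes those lines easy to search for there. A minimal sketch; the logger name `my_dlt_pipeline` is a placeholder.

```python
import logging
import sys


def get_pipeline_logger(name: str = "my_dlt_pipeline") -> logging.Logger:
    """Return a named logger that writes timestamped lines to stdout.

    Handlers are attached only once, so re-running a notebook cell does
    not duplicate every log line.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    # Keep records off the root logger so they are not printed twice
    # if the runtime has configured root handlers of its own.
    logger.propagate = False
    return logger


# Assumption: stdout from pipeline code is collected in the driver logs,
# not in the DLT event log.
log = get_pipeline_logger()
log.info("starting bronze ingestion")
```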

Latest Reply
Edthehead
Contributor III
  • 1 kudos

Refer to this answer https://community.databricks.com/t5/data-engineering/how-to-print-out-logs-during-dlt-pipeline-run/td-p/82303 

by Sangeetha112 (New Contributor)
  • 1847 Views
  • 1 replies
  • 0 kudos

Email Extraction

Hi, hope you are doing well. I was trying to extract a specific email attachment from Outlook and write it to a DBFS location, but something went wrong. Could you please help? Here is the code I used:
import imaplib
import em...
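The attachment-saving half of this task needs only the standard `email` package. A minimal sketch, assuming you already have the raw message bytes (with `imaplib`, `fetch(num, "(RFC822)")` returns them as `data[0][1]`) and that `out_dir` is a path the driver can write to, such as a `/dbfs/...` or `/Volumes/...` path depending on your workspace setup; `save_attachments` and its `suffix` filter are hypothetical.

```python
import email
from email import policy
from pathlib import Path


def save_attachments(raw_message: bytes, out_dir: str, suffix: str = "") -> list:
    """Parse a raw RFC 822 message and save attachments whose filename
    ends with `suffix` (case-insensitive). Returns the written paths."""
    # policy.default makes the parser return an EmailMessage, which
    # provides iter_attachments().
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename or not filename.lower().endswith(suffix.lower()):
            continue
        # Keep only the base name in case the sender included directories.
        path = target / Path(filename).name
        # decode=True undoes base64/quoted-printable and always gives bytes.
        path.write_bytes(part.get_payload(decode=True))
        written.append(path)
    return written
```

Filtering by filename suffix is one simple way to pick "a specific attachment"; matching on sender, subject or content type would follow the same loop.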

Latest Reply
Stefan-Koch
Valued Contributor II
  • 0 kudos

If you face issues with IMAP, consider using the Microsoft Graph API for email access. It provides robust support for Outlook without handling IMAP details and improves security with OAuth2 tokens. Here is a sample script, though I haven't tested it: pip ...

by DeepankarB (New Contributor III)
  • 2493 Views
  • 2 replies
  • 1 kudos

Resolved! Error calling the API with a service principal secret

Hi, I am setting up a Databricks workspace on AWS and trying to use a service principal to execute API calls for CI/CD deployment through Bitbucket. I created a secret for the service principal and tried to test the token. The test failed with below...

Latest Reply
DeepankarB
New Contributor III
  • 1 kudos

I was able to resolve this issue. Apparently you need to generate an access token using the service principal's client ID and client secret. saurabh18cs's solution is more relevant to Azure Databricks. I got the link below from Databricks, which provides generic...
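For reference, a sketch of that client-credentials exchange, assuming the workspace-level `/oidc/v1/token` endpoint and `scope=all-apis` described in Databricks' OAuth machine-to-machine documentation (check this against the link Databricks provided). It builds the request without sending it; `build_token_request` is a hypothetical helper.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_url: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request.

    Assumes the /oidc/v1/token endpoint and scope=all-apis used for
    Databricks OAuth machine-to-machine auth.
    """
    token_url = workspace_url.rstrip("/") + "/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # The client ID and secret are sent as HTTP Basic credentials.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` should return JSON whose `access_token` field is then passed as `Authorization: Bearer <token>` on subsequent REST API calls; the client secret itself is never used as a bearer token.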

by gourishrivastav (New Contributor)
  • 1035 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks Fundamentals Certificate

Dear Team, I have successfully completed the Databricks Fundamentals training and aced the certificate quiz with a perfect score of 200 out of 200. However, I have not yet received the certificate. Can you please let me know the expected timeline for ...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

You should receive it immediately. Can you share the user ID you used to take the quiz?

by krocodl (Contributor)
  • 9456 Views
  • 12 replies
  • 3 kudos

OOM while loading a lot of data through JDBC

   public void bigDataTest() throws Exception { int rowsCount = 100_000; int colSize = 1024; int colCount = 12; String colValue = "'"+"x".repeat(colSize)+"'"; String query = "select explode(s...
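For scale, the parameters at the top of this test imply roughly the following amount of raw column data. This assumes the truncated query returns `rowsCount` rows of `colCount` such values and counts one byte per ASCII character, ignoring protocol and JVM object overhead.

```python
rows_count = 100_000
col_size = 1024    # characters in each 'x...x' literal
col_count = 12

chars_per_row = col_size * col_count        # 12,288 characters
total_chars = rows_count * chars_per_row    # 1,228,800,000 characters

# One byte per ASCII character: raw payload only, no protocol or JVM overhead.
print(f"{total_chars:,} bytes ~ {total_chars / 2**30:.2f} GiB")
# prints: 1,228,800,000 bytes ~ 1.14 GiB
```

That is over a gibibyte of string data alone, so whether the client buffers the whole result set or streams it row by row matters a great deal for this test.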

Labels: Data Engineering, JDBC, Out-of-memory, resource leaking
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Retired_mod any idea?

by KacperG (New Contributor III)
  • 1711 Views
  • 5 replies
  • 2 kudos

Resolved! Merge operation stuck on scanning files for matches

Hi, I'm executing a simple merge, but it always gets stuck at "MERGE operation - scanning files for matches". Neither Delta table is big: the source has about 100 MiB in 1 file and the target has 1.5 GiB in 7 files, so it should be quite a fast operation, however ...

Latest Reply
KacperG
New Contributor III
  • 2 kudos

Well, in the end, it was caused by skewed data. Document_ID was -1 for returns in sales, so a big part of the table was filled with -1 values. Adding an extra column to the merge solved the problem. This article helped me a lot: https://www.databrick...
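To spot this kind of skew before running a MERGE, it helps to check how much of the table a single key value accounts for. A minimal pure-Python sketch on a sample of keys (on a real table the equivalent check would be a `groupBy` on the key column followed by `count()`); `key_skew` is a hypothetical helper and the sample IDs below are made up.

```python
from collections import Counter


def key_skew(keys, top_n=3):
    """Return the top_n most frequent keys with their share of all rows.

    A single key holding a large share (such as a placeholder -1) means
    the matching work for that key lands on very few tasks.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    return [(key, count / total) for key, count in counts.most_common(top_n)]


# Made-up sample: six placeholder IDs among ten rows.
doc_ids = [-1] * 6 + [101, 102, 103, 104]
print(key_skew(doc_ids, top_n=2))  # [(-1, 0.6), (101, 0.1)]
```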

by NagarajuBondala (New Contributor II)
  • 1081 Views
  • 1 replies
  • 1 kudos

Resolved! AI-Suggested Comments Not Appearing for Delta Live Tables Populated Tables

I'm working with Delta Live Tables (DLT) in Databricks and have noticed that AI-suggested comments for columns are not showing up for tables populated using DLT. Interestingly, this feature works fine for tables that are not populated using DLT. Is t...

Labels: Data Engineering, AI, Delta Live Tables, dlt
Latest Reply
Satyadeepak
Databricks Employee
  • 1 kudos

This is because DLT materialized views (MVs) and streaming tables (STs) don't support ALTER, which is needed to persist the AI-generated comments.

by ls (New Contributor III)
  • 970 Views
  • 3 replies
  • 1 kudos

Resolved! Change Spark configs in serverless compute clusters

Howdy! I wanted to know how I can change some Spark configs on serverless compute. I have a base.yml file and tried placing:
spark_conf:
    - spark.driver.maxResultSize: "16g"
but I still get this error: [CONFIG_NOT_AVAILABLE] Configuration spark.driv...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

To address the memory issue in your serverless compute environment, you can consider the following strategies:
Optimize the query:
Filter early: Ensure that you are filtering the data as early as possible in your query to reduce the amount of data b...

by Uj337 (New Contributor III)
  • 1957 Views
  • 8 replies
  • 0 kudos

Library installation failed for library due to user error for wheel file

Hi all, we recently made our Databricks workspace accessible only via a private network. After this change, we found a lot of connectivity errors, for example from Power BI to Databricks and from Azure Data Factory to Databricks. I was ...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi @Uj337, how are you doing today? This issue seems to be tied to the private network setup affecting access to the .whl file on DBFS. I recommend starting by ensuring that the driver node has proper access to the dbfs:/Volumes/any.whl path and that al...

by jordan_boles (New Contributor II)
  • 923 Views
  • 1 replies
  • 2 kudos

Future of iceberg-kafka-connect

Databricks acquired the iceberg-kafka-connect repo this past summer. There are open issues and PRs that devs would like to address and collaborate on to improve the connector. But Databricks has not yet engaged with this community in the ~6 months si...

Latest Reply
Brahmareddy
Honored Contributor III
  • 2 kudos

Thanks for sharing this, @jordan_boles. Happy Data Engineering!

by infinitylearnin (New Contributor III)
  • 241 Views
  • 1 replies
  • 2 kudos

Resolved! Role of Data Practitioner in AI Era

As the AI revolution takes off in 2025, there is a renewed emphasis on adopting a Data-First approach. Organizations are increasingly recognizing the need to establish a robust data foundation while preparing a skilled fleet of Data Engineers to tack...

Latest Reply
Brahmareddy
Honored Contributor III
  • 2 kudos

Good work, @infinitylearnin. Keep it up.

by AxelBrsn (New Contributor III)
  • 2942 Views
  • 1 replies
  • 0 kudos

Why are materialized views created in __databricks_internal?

Hello, I have a question: why are materialized views created in the "__databricks_internal" catalog? We specified the catalog and schemas in the DLT pipeline.

Labels: Data Engineering, catalog, Delta Live Table, materialized views
Latest Reply
MathieuDB
Databricks Employee
  • 0 kudos

Hello @AxelBrsn, materialized views created by Delta Live Tables (DLT) pipelines are stored in the __databricks_internal catalog for several reasons: Isolation: The __databricks_internal catalog is used to store system-generated tables, such as mater...
