Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi team, I had created a Databricks Community Edition account. I am trying to log into it and it's showing an error. I tried changing my password but it still doesn't work. Please let me know where the problem is. Thanks & Regards
Hi everyone, I've been stuck for the past two days on this issue with my Databricks JDBC driver and I'm hoping someone can give me more insight into how to troubleshoot. I am using the Databricks JDBC driver in RStudio and the connection was working ...
@Debbie Ng From your message I see there was a Windows update and this failure started. Based on the conversation, you tried the latest version of the driver and still face the problem. I believe this is something related to the Java version compatib...
I don't know how to build data warehouses and data marts with Python. My current development environment stores data in AWS Redshift, and I can run queries from Databricks against the stacked tables in Redshift. Can you show me some simple code?
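A minimal sketch of what reading a Redshift table from Databricks over JDBC can look like. The host, database, table, and credentials below are placeholders, and the Redshift JDBC driver must already be on the cluster; the dedicated `redshift` connector is another option:

```python
def redshift_jdbc_url(host, port, database):
    # Build a Redshift JDBC URL (host/database are hypothetical).
    return f"jdbc:redshift://{host}:{port}/{database}"

def read_redshift_table(spark, url, table, user, password):
    # Generic JDBC read; requires the Redshift JDBC driver on the
    # cluster classpath. Returns a DataFrame you can transform further.
    return (spark.read.format("jdbc")
            .option("url", url)
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            .load())
```

Usage would be along the lines of `df = read_redshift_table(spark, redshift_jdbc_url("my-cluster.example", 5439, "dev"), "public.sales", user, password)`; from there, building marts is ordinary DataFrame work.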
Hi, we are in the process of moving our data warehouse from SQL Server to Databricks. We are testing our Product dimension table, which has an identity column referenced as a surrogate key in the fact table. In Databricks APPLY CHANGES SCD type 2 ...
Hey. Yep, xxhash64 (or even just hash) generates numerical values for you. Combine it with the abs function to ensure the value is positive. In our team we used abs(hash()) ourselves... for maybe a day. Very quickly I observed a collision, and the data s...
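For intuition on why a 32-bit hash collides so quickly, here is a small pure-Python illustration. CRC32 stands in as a generic 32-bit hash (Spark's `hash` is 32-bit Murmur3, while `xxhash64` gives 64 bits); note also that `abs()` folds the sign bit, roughly doubling the collision rate:

```python
import math
import zlib

def collision_probability(n_keys, bits=32):
    # Birthday bound: P(at least one collision) among n_keys values
    # drawn uniformly from a b-bit space is about 1 - exp(-n^2 / 2^(b+1)).
    return 1.0 - math.exp(-(n_keys ** 2) / 2 ** (bits + 1))

# With a 32-bit hash, 100k keys already collide more often than not;
# with 64 bits the same workload is effectively collision-free.
p32 = collision_probability(100_000, bits=32)   # roughly 0.69
p64 = collision_probability(100_000, bits=64)   # vanishingly small

# A classic concrete 32-bit collision: two distinct strings with the
# same CRC32 checksum.
crc_collision = zlib.crc32(b"plumless") == zlib.crc32(b"buckeroo")
```

This is why a 64-bit hash (or a true sequence) is the safer choice for surrogate keys on dimension tables of any real size.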
Hi @data engineer Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...
Hello, I would like to ask for help setting up log4j. I want to use log4j (log4j2) to generate custom log messages from my notebook while it runs. The message would be generated like this: logger.info("some info message"), but using log4j rather than Python lo...
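For reference, a common pattern for getting a Log4j logger from a PySpark notebook is to go through the SparkContext's py4j gateway. This is a sketch that relies on the private `_jvm` attribute (not a stable public API), and the logger name is arbitrary:

```python
def get_log4j_logger(spark, name="NotebookLogger"):
    # Reach into the JVM through the SparkContext's py4j gateway and
    # ask Log4j's LogManager for a named logger. Messages logged this
    # way land in the driver's log4j output, not Python's logging.
    log4j = spark.sparkContext._jvm.org.apache.log4j
    return log4j.LogManager.getLogger(name)
```

Usage: `get_log4j_logger(spark).info("some info message")`. Routing or formatting those messages is then controlled by the cluster's log4j configuration.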
We have an application that takes in raw metrics data as key-value pairs. We then split them into four different tables like below: `key1, min, max, average`. Those four tables are later used for a dashboard. What are the design recommendations for this? S...
Hey, I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing d...
Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I'm doing something wrong here. I'm streaming from Kafka -> Delta on EMR/S3FS, and am seeing increasingly slow batches. When looking...
Found the answer through the Slack user group, courtesy of Adam Binford. I had set `delta.logRetentionDuration = '24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, so the checkpoint file still had all the accumulated tombstones sin...
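For anyone hitting the same slowdown, the fix amounts to setting both retention properties together so tombstones age out with the log. A hedged sketch (the table name and retention windows below are placeholders):

```python
def set_delta_retention(spark, table, log_hours=24, tombstone_hours=24):
    # Align log retention and deleted-file (tombstone) retention so the
    # checkpoint doesn't keep accumulating tombstones past the window.
    spark.sql(
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{log_hours} hours', "
        f"'delta.deletedFileRetentionDuration' = '{tombstone_hours} hours')"
    )
```

Keep in mind that shortening `delta.deletedFileRetentionDuration` limits time travel and can break long-running readers, so the window should match how the table is actually consumed.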
I'm a bit new to Spark Structured Streaming, so do ask any relevant questions if I missed something. I have a notebook which consumes events from a Kafka topic and writes those records into ADLS. The topic is JSON-serialized, so I'm just writing...
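A minimal sketch of that kind of notebook, assuming placeholder bootstrap servers, topic, and ADLS paths; the JSON payload is kept as a raw string column here rather than parsed:

```python
def stream_kafka_to_adls(spark, bootstrap_servers, topic,
                         output_path, checkpoint_path):
    # Read the Kafka topic as a stream; key/value arrive as binary,
    # so cast them to strings (the value holds the JSON payload).
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", bootstrap_servers)
           .option("subscribe", topic)
           .load())
    records = raw.selectExpr("CAST(key AS STRING) AS key",
                             "CAST(value AS STRING) AS value")
    # Append to ADLS as Delta; the checkpoint location is what gives
    # the stream exactly-once progress tracking across restarts.
    return (records.writeStream.format("delta")
            .option("checkpointLocation", checkpoint_path)
            .start(output_path))
```

The checkpoint path must be unique per stream and kept stable across restarts, otherwise the query will reprocess or refuse to start.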
Hi, at my company we are using Databricks with AWS IAM Identity Center as single sign-on. I was looking into Unity Catalog, which seems to offer centralized access, but I wanted to know if there will be any downside, like loss of existing user profile ...
Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark); in it I read a Delta table that I copy to a DataFrame, then do several transformations and create sever...
I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues and I can see it when I list the modules using pip. When I try to call the module I get an error t...
Hello guys, I am working on a project where we need to use the spark-excel library (Maven) in order to ingest data from Excel files. As those third-party libraries are not allowed on shared clusters, do you have any workaround other than using pandas, for exa...
Yes, it is possible to connect Databricks to a kerberized HBase cluster. The attached article explains the steps. It consists of setting up a Kerberos client using a keytab on the cluster nodes, installing the hbase-spark integration library, and set...
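As a rough illustration of the first step only, a cluster init script might look something like this. Every path, the realm, and the principal below are hypothetical placeholders, and the attached article's exact steps should take precedence:

```shell
#!/bin/bash
# Hypothetical init script: configure a Kerberos client on each node
# so Spark executors can authenticate to the kerberized HBase cluster.
set -euo pipefail

# Kerberos client configuration (realm/KDC details live in krb5.conf).
cp /dbfs/FileStore/security/krb5.conf /etc/krb5.conf

# Service keytab staged in DBFS; restrict permissions after copying.
mkdir -p /etc/security/keytabs
cp /dbfs/FileStore/security/hbase.keytab /etc/security/keytabs/hbase.keytab
chmod 600 /etc/security/keytabs/hbase.keytab

# Obtain a ticket for the (placeholder) service principal.
kinit -kt /etc/security/keytabs/hbase.keytab svc_hbase@EXAMPLE.REALM
```

The hbase-spark integration library and the HBase client configuration would then be installed separately, as the article describes.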
To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table in the materializationDataset. df = spark.read.format("bigquery").option("query", query).option("materializationProject", materializationProject)...
Hello, We are using a reference dataset for our Production applications. We would like to create a Delta table for this dataset to be used from our applications. Currently, manual updates occur on this dataset through a script on a weekly basis. ...
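One way to apply such weekly updates idempotently is a MERGE from the freshly loaded snapshot into the Delta table, so readers always see a consistent version. A sketch with hypothetical table and view names:

```python
def refresh_reference_table(spark, target, source_view, key):
    # Upsert the latest reference snapshot (registered as a view or
    # staging table) into the Delta target, matching on the key column.
    spark.sql(f"""
        MERGE INTO {target} t
        USING {source_view} s
        ON t.{key} = s.{key}
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
```

Because Delta commits are atomic, applications reading the table mid-refresh see either the old or the new snapshot, never a partial one; the weekly script only needs to stage the new data and call the merge.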