Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ywaihong6123
by New Contributor
  • 6085 Views
  • 1 reply
  • 0 kudos

Libraries Not Working on Shared Cluster 13.3 LTS

I am facing this error while installing the spark-excel library on the cluster. Does anyone know how to add a library to the artifact allowlist? Jars and Maven libraries on shared clusters must be on the allowlist. Failed Libraries: com.crealytics:spark...

Latest Reply
User16752239289
Databricks Employee
  • 0 kudos

You can add the jar by following the steps below. How to add items to the allowlist: you can add items to the allowlist with Data Explorer or the REST API. To open the dialog for adding items to the allowlist in Data Explorer, do the following: in your Databricks...

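The reply above is truncated, so for reference, a minimal sketch of the REST API route it points to, assuming a metastore admin's personal access token. The endpoint and payload shape follow the artifact-allowlist API as I recall it, so verify against the current REST API reference; note that PUT replaces the whole list, so fetch and merge existing entries first.

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<metastore-admin-personal-access-token>"        # placeholder

resp = requests.put(
    f"{HOST}/api/2.1/unity-catalog/artifact-allowlists/LIBRARY_MAVEN",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "artifact_matchers": [
            # PREFIX_MATCH allowlists every version of the coordinate.
            {"artifact": "com.crealytics:spark-excel", "match_type": "PREFIX_MATCH"}
        ]
    },
)
resp.raise_for_status()
print(resp.json())
```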
Hubert-Dudek
by Esteemed Contributor III
  • 11001 Views
  • 1 reply
  • 2 kudos

Handling GDPR requests in Databricks

When dealing with GDPR requests in Databricks, there are some essential things to keep in mind: use a low retention period to ensure you don't keep Delta version history for tables with personal information, and use APPLY CHANGES to handle Slowly...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Thank you for sharing this information @Hubert-Dudek!!!!

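For context on the retention advice in the (truncated) post, a sketch of the Delta table properties involved, using a hypothetical `users` table; the one-day intervals are illustrative, not a recommendation.

```python
# Keep Delta history (and therefore old copies of personal data) short.
spark.sql("""
    ALTER TABLE users SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 1 days',
        'delta.deletedFileRetentionDuration' = 'interval 1 days'
    )
""")

# Physically delete files that only old table versions still reference.
# If VACUUM complains about the retention safety check, that safeguard
# must be explicitly relaxed first.
spark.sql("VACUUM users RETAIN 24 HOURS")
```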
Hubert-Dudek
by Esteemed Contributor III
  • 22825 Views
  • 1 reply
  • 2 kudos

Checking whether a Spark DataFrame is empty

#databricks #spark Spark 3.3 has introduced a simple yet powerful isEmpty() function for DataFrames. Gone are the days of using count() to check for empty DataFrames; now it's as easy as calling df.isEmpty().

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Thank you for sharing this @Hubert-Dudek !!!

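A quick illustration of the call the post describes, runnable on Spark 3.3+ in a notebook:

```python
# Empty DataFrame with an explicit schema, just for demonstration.
df = spark.createDataFrame([], "id INT")

print(df.isEmpty())      # True, without a full count
print(df.count() == 0)   # the older idiom it replaces
```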
Aditya2002
by New Contributor
  • 1695 Views
  • 0 replies
  • 0 kudos

Regarding Databricks Community Edition login issue

Hi team, I had created a Databricks Community Edition account. I am trying to log in to it and it's showing an error. I tried changing the password but it still doesn't work. Please let me know where the problem is. Thanks & Regards

dng
by New Contributor III
  • 7766 Views
  • 6 replies
  • 10 kudos

Databricks JDBC Driver v2.6.29 Cloud Fetch failing for Windows Operating System

Hi everyone, I've been stuck for the past two days on this issue with my Databricks JDBC driver, and I'm hoping someone can give me more insight into how to troubleshoot. I am using the Databricks JDBC driver in RStudio and the connection was working ...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

@Debbie Ng From your message I see there was a Windows update and this failure started. Based on the conversation, you tried the latest version of the driver and still face the problem. I believe this is something related to the Java version compatib...

5 More Replies
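The accepted direction in the thread is Java compatibility, but for anyone hitting Cloud Fetch specifically, one commonly cited workaround is disabling it in the connection string. The property name below is from memory of the Databricks (Simba) JDBC driver documentation, so treat it as an assumption and check it against your driver version:

```python
# Hypothetical JDBC URL; host and HTTP path are placeholders.
jdbc_url = (
    "jdbc:databricks://<workspace-host>:443/default;"
    "transportMode=http;ssl=1;httpPath=<http-path>;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>;"
    "EnableQueryResultDownload=0"  # assumed flag that disables Cloud Fetch
)
print(jdbc_url)
```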
rt-slowth
by Contributor
  • 2442 Views
  • 0 replies
  • 0 kudos

How to build data warehouses and data marts with Python

I don't know how to build data warehouses and data marts with Python. My current development environment stores data in AWS Redshift, and I can run queries from Databricks against the stacked tables in Redshift. Can you show me some simple code?

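Since the post has no replies, a minimal sketch of one common pattern: read a Redshift table over JDBC from Databricks, derive an aggregated mart table, and persist it as Delta. All hosts, credentials, and table names are placeholders, and the Redshift JDBC driver jar must be attached to the cluster.

```python
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:redshift://<cluster-host>:5439/<database>")
    .option("dbtable", "public.orders")  # hypothetical source table
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .load()
)

# A tiny "mart": daily sales totals persisted as a Delta table.
daily_sales = orders.groupBy("order_date").sum("amount")
daily_sales.write.format("delta").mode("overwrite").saveAsTable("marts.daily_sales")
```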
NathanSundarara
by Contributor
  • 7573 Views
  • 7 replies
  • 2 kudos

Delta Live Tables: generate a unique integer value (a kind of surrogate key) for a combination of columns

Hi, we are in the process of moving our data warehouse from SQL Server to Databricks. We are testing our Dimension Product table, which has an identity column referenced in the fact table as a surrogate key. In Databricks APPLY CHANGES SCD type 2 ...

Latest Reply
ilarsen
Contributor
  • 2 kudos

Hey. Yep, xxhash64 (or even just hash) generates numerical values for you. Combine it with the abs function to ensure the value is positive. In our team we used abs(hash()) ourselves... for maybe a day. Very quickly I observed a collision, and the data s...

6 More Replies
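A sketch of the hash-based key the reply describes, with its caveat: xxhash64 is 64-bit, so collisions are rare but, as the reply found with the 32-bit hash(), never impossible. Column and table names here are hypothetical.

```python
from pyspark.sql import functions as F

dim_product = spark.table("dim_product_staging")  # placeholder source

dim_product = dim_product.withColumn(
    "product_sk",
    # Deterministic 64-bit key over the business-key columns.
    F.xxhash64("product_code", "source_system"),
)
```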
LukaszJ
by Contributor III
  • 20727 Views
  • 6 replies
  • 2 kudos

Resolved! Install ODBC driver by init script

Hello, I want to install an ODBC driver (for pyodbc). I have tried to do it using Terraform; however, I think it is impossible. So I want to do it with an init script in my cluster. I have the code from the internet and it works when it is at the beginning of ...

Latest Reply
MayaBakh_80151
New Contributor II
  • 2 kudos

Actually found this article and am using it to migrate my shell script to the workspace: Cluster-named and cluster-scoped init script migration notebook - Databricks

5 More Replies
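For reference, a hedged sketch of a cluster-scoped init script that installs the Microsoft ODBC Driver 17 for SQL Server, a common pyodbc target. The package names follow Microsoft's Ubuntu instructions and may need adjusting for your Databricks Runtime's base image.

```bash
#!/bin/bash
set -e
# Register Microsoft's package repository, then install the ODBC driver.
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list \
  > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev
```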
srDataEngineer
by New Contributor II
  • 5410 Views
  • 4 replies
  • 0 kudos

Resolved! UDF fails for a non-admin user

java.lang.SecurityException: User does not have permission SELECT on anonymous function.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @data engineer, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...

3 More Replies
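The error in the post comes from the legacy table-ACL privilege model, which gates anonymous (lambda) functions. A sketch of the grant an admin can issue on a table-ACL cluster; the principal is a placeholder.

```python
# Allow the user to evaluate anonymous/lambda functions under table ACLs.
spark.sql("GRANT SELECT ON ANONYMOUS FUNCTION TO `user@example.com`")
```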
Retko
by Contributor
  • 4654 Views
  • 0 replies
  • 0 kudos

Custom logging using Log4J to a file

Hello, I would like to ask for help setting up Log4j. I want to use Log4j (Log4j 2) to generate custom log messages in my notebook when running. The message would be generated like this: logger.info("some info message"), but using Log4j, not Python lo...

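Since the post has no replies: one way notebooks commonly reach the JVM's Log4j is through the py4j gateway, sketched below. Messages land in the driver's log4j output rather than Python's logging, and the logger name is arbitrary; Databricks Runtime routes this through its own Log4j configuration, so behavior may vary by runtime version.

```python
# Grab the JVM-side Log4j API exposed through the Spark session's gateway.
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("my.notebook.logger")
logger.info("some info message")
```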
Frank
by New Contributor III
  • 1135 Views
  • 1 reply
  • 1 kudos

Design Question

We have an application that takes in raw metrics data as key-value pairs. We then split them into four different tables like below: `key1, min, max, average`. Those four tables are later used for a dashboard. What are the design recommendations for this? S...

Latest Reply
stefnhuy
New Contributor III
  • 1 kudos

Hey, I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing d...

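A sketch of the aggregation shape under discussion: one grouped summary table carrying min/max/average per key, rather than four separate tables. Table and column names are hypothetical.

```python
from pyspark.sql import functions as F

metrics = spark.table("raw_metrics")  # placeholder: key/value pairs

summary = metrics.groupBy("key").agg(
    F.min("value").alias("min"),
    F.max("value").alias("max"),
    F.avg("value").alias("average"),
)
summary.write.format("delta").mode("overwrite").saveAsTable("metrics_summary")
```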
Matt_L
by New Contributor III
  • 6973 Views
  • 3 replies
  • 3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta (hopefully this is the right forum for this question): Hey all, I could use some help as I feel like I'm doing something wrong here. I'm streaming from Kafka -> Delta on EMR/S3FS, and am seeing ever-increasingly slow batches. When looking...

Latest Reply
Matt_L
New Contributor III
  • 3 kudos

Found the answer through the Slack user group, courtesy of Adam Binford. I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...

2 More Replies
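The resolution in code form: set both retention properties together so tombstones age out of the checkpoint along with the log. The table name is a placeholder; the intervals mirror the ones quoted in the reply.

```python
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 24 hours',
        'delta.deletedFileRetentionDuration' = 'interval 24 hours'
    )
""")
```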
r0nald
by New Contributor II
  • 8624 Views
  • 3 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround? %sql create or replace function status_map(status int) returns string return map(10, "STATU...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The transform function in SQL is not the same as the Scala/PySpark counterpart; it is in fact a map(). Here is some interesting info. I agree that functions are essential for code modularity, hence I prefer not to use SQL but Scala/PySpark instead.

2 More Replies
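A sketch of the workaround direction the reply points at: do the per-element mapping in PySpark, since SQL UDF calls inside transform()'s lambda are not supported. The status codes are illustrative.

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([([10, 20],)], "statuses ARRAY<INT>")

status_map = F.create_map(
    F.lit(10), F.lit("STATUS_A"),
    F.lit(20), F.lit("STATUS_B"),
)

# Map each array element through the literal map instead of a SQL UDF.
df = df.withColumn(
    "labels", F.transform("statuses", lambda s: F.element_at(status_map, s))
)
df.show(truncate=False)
```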
UmaMahesh1
by Honored Contributor III
  • 8077 Views
  • 7 replies
  • 17 kudos

Spark Structured Streaming: Data write is too slow into ADLS.

I'm a bit new to Spark Structured Streaming, so do ask all the relevant questions if I missed any. I have a notebook which consumes events from a Kafka topic and writes those records into ADLS. The topic is JSON serialized, so I'm just writing...

Latest Reply
Miletto
New Contributor II
  • 17 kudos

 

6 More Replies
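For anyone landing here, a sketch of the kind of Kafka-to-ADLS stream the post describes, with two knobs often tuned when sinks feel slow: capping batch size with maxOffsetsPerTrigger and setting an explicit trigger. Brokers, topic, and paths are placeholders.

```python
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<broker>:9092")
    .option("subscribe", "<topic>")
    .option("maxOffsetsPerTrigger", 100000)  # cap records per micro-batch
    .load()
)

query = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .writeStream.format("delta")
    .option("checkpointLocation",
            "abfss://<container>@<account>.dfs.core.windows.net/chk")
    .trigger(processingTime="1 minute")
    .start("abfss://<container>@<account>.dfs.core.windows.net/events")
)
```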
564824
by New Contributor II
  • 1207 Views
  • 1 reply
  • 1 kudos

Will enabling Unity Catalog affect existing user access and jobs in production?

Hi, at my company we are using Databricks with AWS IAM Identity Center as single sign-on. I was looking into Unity Catalog, which seems to offer centralized access, but I wanted to know if there will be any downsides, like loss of existing user profile ...

Latest Reply
Atanu
Databricks Employee
  • 1 kudos

You can look into this doc, https://docs.databricks.com/en/data-governance/unity-catalog/migrate.html, which has some details about your question.

