Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Aditya2002
by New Contributor
  • 796 Views
  • 0 replies
  • 0 kudos

Regarding Databricks Community Edition login issue

Hi team, I created a Databricks Community Edition account. I am trying to log in to it, and it is showing an error. I tried changing the password, but it still doesn't work. Please let me know where the problem is. Thanks & Regards

Eduard
by New Contributor II
  • 90014 Views
  • 1 reply
  • 1 kudos

Cluster xxxxxxx was terminated during the run.

Hello, I have a problem with the autoscaling of a cluster. Every time autoscaling is activated, I get this error. Does anyone have any idea why this could be? "Cluster xxxxxxx was terminated during the run (cluster state message: Lost communication ...

dng
by New Contributor III
  • 7234 Views
  • 6 replies
  • 10 kudos

Databricks JDBC Driver v2.6.29 Cloud Fetch failing for Windows Operating System

Hi everyone, I've been stuck for the past two days on this issue with my Databricks JDBC driver and I'm hoping someone can give me more insight into how to troubleshoot. I am using the Databricks JDBC driver in RStudio and the connection was working ...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

@Debbie Ng From your message, I see this failure started after a Windows update. Based on the conversation, you tried the latest version of the driver and still face the problem. I believe this is something related to the Java version compatib...

5 More Replies
rt-slowth
by Contributor
  • 2328 Views
  • 0 replies
  • 0 kudos

how to build data warehouses and data marts with Python

I don't know how to build data warehouses and data marts with Python. My current development environment stores data in AWS Redshift, and I can run queries from Databricks against the stacked tables in Redshift. Can you show me some simple code?

NathanSundarara
by Contributor
  • 6989 Views
  • 7 replies
  • 2 kudos

Delta live table generate unique integer value (kind of surrogate key) for combination of columns

Hi, we are in the process of moving our data warehouse from SQL Server to Databricks. We are testing our Dimension Product table, which has an identity column that the fact table references as a surrogate key. In Databricks Apply changes SCD type 2 ...

Latest Reply
ilarsen
Contributor
  • 2 kudos

Hey. Yep, xxhash64 (or even just hash) generates numerical values for you. Combine it with the abs function to ensure the value is positive. In our team we used abs(hash()) ourselves... for maybe a day. Very quickly I observed a collision, and the data s...

6 More Replies
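
A note on the collision mentioned in the reply above: a rough way to see why abs(hash()) collides so quickly is the birthday approximation. The sketch below is plain Python, not Spark code, and it rests on assumptions: that Spark's hash returns a 32-bit integer and xxhash64 a 64-bit one, that the outputs are uniformly distributed, and that abs() costs about one bit of range. The function name collision_probability is made up for illustration.

```python
import math


def collision_probability(n_keys: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_keys distinct inputs share a hash.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes a uniformly distributed hash output.
    """
    space = 2 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n_keys * (n_keys - 1) / (2 * space))


# Assumption: abs() folds negative values onto positive ones, losing about one bit.
for label, bits in [("abs(hash()), ~31 bits", 31), ("abs(xxhash64()), ~63 bits", 63)]:
    for n in (10_000, 100_000, 1_000_000):
        p = collision_probability(n, bits)
        print(f"{label:28s} n={n:>9,d}  p~={p:.6g}")
```

Under these assumptions, a 31-bit key space becomes more likely than not to contain a collision well before a million distinct keys, while a 63-bit space makes one far less likely at that scale without ever ruling it out.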
LukaszJ
by Contributor III
  • 20084 Views
  • 6 replies
  • 2 kudos

Resolved! Install ODBC driver by init script

Hello, I want to install an ODBC driver (for pyodbc). I tried to do it using Terraform, but I think that is impossible, so I want to do it with an init script in my cluster. I have the code from the internet and it works when it is at the beginning of ...

Latest Reply
MayaBakh_80151
New Contributor II
  • 2 kudos

Actually, I found this article and am using it to migrate my shell script to the workspace: Cluster-named and cluster-scoped init script migration notebook - Databricks

5 More Replies
srDataEngineer
by New Contributor II
  • 5225 Views
  • 4 replies
  • 0 kudos

Resolved! udf not admin user

java.lang.SecurityException: User does not have permission SELECT on anonymous function.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @data engineer Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...

3 More Replies
Retko
by Contributor
  • 4216 Views
  • 0 replies
  • 0 kudos

Custom logging using Log4J to a file

Hello, I would like to ask for help setting up log4j. I want to use log4j (log4j2) to generate custom log messages in my notebook when it runs. The message would be generated like this: logger.info("some info message") but using log4j, not Python lo...

Frank
by New Contributor III
  • 1057 Views
  • 1 reply
  • 1 kudos

Design Question

We have an application that takes in raw metrics data as key-value pairs. We then split them into four different tables, like below: `key1, min, max, average`. Those four tables are later used for a dashboard. What are the design recommendations for this? S...

Latest Reply
stefnhuy
New Contributor III
  • 1 kudos

Hey, I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing d...

Matt_L
by New Contributor III
  • 6576 Views
  • 3 replies
  • 3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I’m doing something wrong here. I’m streaming from Kafka -> Delta on EMR/S3FS, and am seeing increasingly slow batches. When looking...

Latest Reply
Matt_L
New Contributor III
  • 3 kudos

Found the answer through the Slack user group, courtesy of an Adam Binford. I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...

2 More Replies
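
To make the accepted answer above concrete, here is a minimal sketch of setting both retention properties in one statement, so that one is not changed without the other. The two property names and the '24 HOURS' value for the log retention come from the reply. Everything else is an assumption: the helper name retention_properties_sql, the table name events, and the value chosen for `delta.deletedFileRetentionDuration`, which the truncated reply does not state.

```python
def retention_properties_sql(table: str, log_retention: str, deleted_file_retention: str) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties together."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{log_retention}', "
        f"'delta.deletedFileRetentionDuration' = '{deleted_file_retention}')"
    )


# Hypothetical table name and retention values, for illustration only.
stmt = retention_properties_sql("events", "24 HOURS", "24 HOURS")
print(stmt)
# With an active Spark session, the statement could then be run as: spark.sql(stmt)
```

Short retention windows limit time travel and can affect long-running readers, so the right values depend on the workload.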
r0nald
by New Contributor II
  • 7721 Views
  • 3 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround? %sql create or replace function status_map(status int) returns string return map(10, "STATU...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The transform function in SQL is not the same as its Scala/PySpark counterpart. It is in fact a map(). Here is some interesting info. I agree that functions are essential for code modularity. Hence I prefer not to use SQL but Scala/PySpark instead.

2 More Replies
UmaMahesh1
by Honored Contributor III
  • 7452 Views
  • 7 replies
  • 17 kudos

Spark Structured Streaming : Data write is too slow into adls.

I'm a bit new to Spark Structured Streaming, so do ask all the relevant questions if I missed any. I have a notebook which consumes events from a Kafka topic and writes those records into ADLS. The topic is JSON serialized so I'm just writing...

Latest Reply
Miletto
New Contributor II
  • 17 kudos

 

6 More Replies
564824
by New Contributor II
  • 1143 Views
  • 1 reply
  • 1 kudos

Will enabling Unity Catalog affect existing user access and jobs in production?

Hi, at my company we are using Databricks with AWS IAM Identity Center for single sign-on. I was looking into Unity Catalog, which seems to offer centralized access, but I wanted to know if there will be any downside like loss of existing user profile ...

Latest Reply
Atanu
Databricks Employee
  • 1 kudos

You can look into this doc, https://docs.databricks.com/en/data-governance/unity-catalog/migrate.html, which has some details about your question.

SaraCorralLou
by New Contributor III
  • 6266 Views
  • 7 replies
  • 2 kudos

Bad performance UDFs functions

Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark). In it I read a Delta table that I copy to a dataframe and do several transformations and create sever...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Looping over records is a performance killer, to be avoided at all costs. Beware the for-loop (databricks.com)

6 More Replies
Chris_Shehu
by Valued Contributor III
  • 2957 Views
  • 2 replies
  • 1 kudos

Resolved! Custom Libraries (Unity Catalog Enabled Clusters)

I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues and I can see it when I list the modules using pip. When I try to call the module I get an error t...

Latest Reply
Szpila
New Contributor III
  • 1 kudos

Hello guys, I am working on a project where we need to use the spark-excel library (Maven) to ingest data from Excel files. As third-party libraries like this are not allowed on shared clusters, do you have any workaround other than using pandas, for exa...

1 More Replies
