Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Adig
by New Contributor III
  • 5026 Views
  • 5 replies
  • 15 kudos

Generate a group ID for similar/duplicate values of a DataFrame column.

Input DataFrame:
KeyName          | KeyCompare       | Source
PapasMrtemis     | PapasMrtemis     | S1
PapasMrtemis     | Pappas, Mrtemis  | S1
Pappas, Mrtemis  | PapasMrtemis     | S2
Pappas, Mrtemis  | Pappas, Mrtemis  | S2
Mich...

Latest Reply
VaibB
Contributor
  • 15 kudos

Create a UDF where you pass, as input, all the fields you need to take into consideration for a unique row. Create a list by splitting on ' ' or ','. Sort the list and concatenate all its elements to derive a "new field". Calculate dens...
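A hedged PySpark sketch of the approach described above, using built-in functions rather than a Python UDF. It assumes the columns from the original post (KeyName on a DataFrame here called df) and that the truncated last step is a dense_rank over the normalized key:

```python
from pyspark.sql import functions as F, Window

# Normalize the key: collapse commas/whitespace, split into tokens, sort, and re-join.
normalized_key = F.array_join(
    F.array_sort(F.split(F.trim(F.regexp_replace("KeyName", r"[,\s]+", " ")), " ")),
    ""
)

# Rows whose normalized key matches receive the same group id via dense_rank.
df_grouped = (
    df.withColumn("new_field", normalized_key)
      .withColumn("group_id", F.dense_rank().over(Window.orderBy("new_field")))
)
df_grouped.show(truncate=False)
```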

4 More Replies
stinodego
by New Contributor III
  • 4762 Views
  • 8 replies
  • 19 kudos

Python job run error messages are unreadable

This has been going on for some time now; all errors look like this (note the weird `[0;34m` marks everywhere). How can we fix this? We're not doing anything crazy; this is just the latest runtime with pretty much the simplest possible hello world pro...

Latest Reply
VaibB
Contributor
  • 19 kudos

Have you tried detaching and reattaching the notebook? Or restarting the cluster? Also check that you are not importing any specific library: someone else with the right access might have installed a library with "Install on all clusters" checked.
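The `[0;34m` marks are ANSI color escape codes leaking into the error output. As a workaround (my own assumption, not a confirmed fix from this thread), the escape sequences can be stripped from captured output before it is logged or displayed:

```python
import re

# Matches ANSI escape sequences such as "\x1b[0;34m" (terminal color codes).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    """Remove ANSI color codes so error messages read cleanly in plain-text logs."""
    return ANSI_ESCAPE.sub("", text)

print(strip_ansi("\x1b[0;34mTraceback (most recent call last):\x1b[0m"))
```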

7 More Replies
cmilligan
by Contributor II
  • 8739 Views
  • 2 replies
  • 6 kudos

Resolved! How to go up two folders using relative path in %run?

I'm wanting to store a notebook with functions two folders up from the current notebook. I know that I can start the path with ../ to go up one folder but when I've tried .../ it won't go up two folders. Is there a way to do this?

Latest Reply
VaibB
Contributor
  • 6 kudos

To access a notebook in the current folder, use ./notebook_2. To go two folders up and access a notebook (say one named "secret"), use ../../secret.
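A minimal sketch of the layout this implies; the folder and notebook names below are hypothetical, not from the thread:

```python
# Hypothetical workspace layout:
#   /Workspace/project/utils/secret          <- notebook holding the shared functions
#   /Workspace/project/jobs/daily/current    <- notebook that wants to call them
# In a cell of the "current" notebook, going up two folders looks like:
# %run ../../utils/secret
```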

1 More Replies
Smitha1
by Valued Contributor II
  • 1131 Views
  • 1 replies
  • 6 kudos

Just a shout out to the Databricks Support team and customers! @Joseph Kambourakis @Nadia Elsayed @Vidula Khanna @Jose Gonzalez @Harshjot Singh you al...

Just a shout out to the Databricks Support team and customers! @Joseph Kambourakis @Nadia Elsayed @Vidula Khanna @Jose Gonzalez @Harshjot Singh you all are a fabulous bunch of teams and very helpful. Thanks very much for your responses when asked. Happy...

Latest Reply
Harshjot
Contributor III
  • 6 kudos

@Smitha Nelapati so happy to see that the issue is resolved

Erik
by Valued Contributor III
  • 15617 Views
  • 12 replies
  • 8 kudos

Grafana + databricks = True?

We have some time series in Databricks, and we are reading them into Power BI through SQL compute endpoints. For time series, Power BI is ... not optimal. Earlier I have used Grafana with various backends, and quite like it, but I can't find any way to con...

Latest Reply
cold_river_22
New Contributor II
  • 8 kudos

There is now an open-source Grafana Databricks backend plugin available: https://github.com/mullerpeter/databricks-grafana

11 More Replies
vr
by Contributor
  • 6225 Views
  • 5 replies
  • 6 kudos

Resolved! How to avoid trimming in EXPLAIN?

I am looking at the EXPLAIN EXTENDED plan for a statement. In the == Physical Plan == section, I go down to the FileScan node and see a lot of ellipsis, like +- FileScan parquet schema.table[Time#8459,TagName#8460,Value#8461,Quality#8462,day#8...
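A possible workaround (my own assumption; the accepted answer is not shown in this preview) is to raise Spark's plan-truncation limits before running EXPLAIN:

```python
# Hedged sketch, assuming a notebook with an active `spark` session; the values are arbitrary examples.
spark.conf.set("spark.sql.maxMetadataStringLength", "1000")  # length of FileScan metadata strings (Spark 3.1+)
spark.conf.set("spark.sql.debug.maxToStringFields", "1000")   # fields shown before "... N more fields"
spark.sql("EXPLAIN EXTENDED SELECT * FROM schema.table").show(truncate=False)
```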

Latest Reply
SS2
Valued Contributor
  • 6 kudos

I also faced the same issue.

4 More Replies
Retko
by Contributor
  • 20895 Views
  • 5 replies
  • 8 kudos

Databricks notebook sometimes takes too long to run a query (even on an empty table)

Hi, sometimes I notice that running a query takes too long - even simple queries - and the next time I run the same query it runs much faster. I have a cluster running (DBR 10.4 LTS • 5 workers) and it constantly has several workers. An example of a query is s...

Latest Reply
j_afanador
Contributor II
  • 8 kudos

Probably the cluster is always in use and the query ends up waiting in the processing queue, or the cluster auto-stops every time after you use it.

4 More Replies
augustin
by New Contributor II
  • 5527 Views
  • 5 replies
  • 5 kudos

Mount an unencrypted AWS EFS in AWS Databricks

Hi, I want to mount an unencrypted AWS EFS in AWS Databricks. When I do:
mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-abcdef.efs.region.amazonaws.com:/ /mnt/efs-uncrypted
I get this error:
mount.nfs4: moun...

Latest Reply
Andrei_Radulesc
Contributor III
  • 5 kudos

"To support NFS under LXC, some of the apparmor protections need to be lifted." (see https://theorangeone.net/posts/mount-nfs-inside-lxc/)

4 More Replies
sqlshep
by New Contributor III
  • 4192 Views
  • 3 replies
  • 1 kudos
Latest Reply
sqlshep
New Contributor III
  • 1 kudos

It's broken again; I am seeing this several times a week, and it is offline for hours at a time.

2 More Replies
hitesh1
by New Contributor III
  • 8657 Views
  • 1 replies
  • 5 kudos

java.util.NoSuchElementException: key not found

Hello, we are using an Azure Databricks Standard DS14_v2 cluster with Runtime 9.1 LTS, Spark 3.1.2 and Scala 2.12, and are facing the below issue frequently when running our ETL pipeline. As part of the operation that is failing there are several joins...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 5 kudos

Hey man, please use these configurations in your cluster and it will work:
spark.sql.storeAssignmentPolicy LEGACY
spark.sql.parquet.binaryAsString true
spark.speculation false
spark.sql.legacy.timeParserPolicy LEGACY
If it won't work, let me know what problem...
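For reference, a minimal sketch of applying the SQL-level settings from a notebook (assuming an active `spark` session); spark.speculation is a core Spark setting and normally belongs in the cluster's Spark config rather than being set at runtime:

```python
# Runtime-settable SQL configurations from the reply above.
spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")
spark.conf.set("spark.sql.parquet.binaryAsString", "true")
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
```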

Jack
by New Contributor II
  • 8618 Views
  • 1 replies
  • 1 kudos

Python: Generate new DataFrames from a list of DataFrames using a for loop

I have a list of dataframes (for this example, 2) and want to apply a for loop to the list of frames to generate 2 new dataframes. To start, here is my starting dataframe, called df_final. First, I create 2 dataframes, df2_b2c_fast and df2_b2b_fast: for x i...
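The accepted answer is not shown in this preview, but a common pattern for this kind of loop is sketched below; the names df_final and df2_*_fast come from the post, while the segment column and filter logic are my own assumptions:

```python
from pyspark.sql import functions as F

segments = ["b2c", "b2b"]
fast_dfs = {}  # collect results in a dict rather than creating variables dynamically
for seg in segments:
    fast_dfs[f"df2_{seg}_fast"] = df_final.filter(F.col("segment") == seg)  # hypothetical filter

df2_b2c_fast = fast_dfs["df2_b2c_fast"]
df2_b2b_fast = fast_dfs["df2_b2b_fast"]
```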

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

thanks

isaac_gritz
by Databricks Employee
  • 1799 Views
  • 1 replies
  • 6 kudos

Databricks Security Review

Conducting a security review or vendor assessment of Databricks and looking to learn more about our security features, compliance information, and privacy policies? You can find the latest on Databricks security features, architecture, compliance and ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 6 kudos

thanks man

SRK
by Contributor III
  • 3435 Views
  • 3 replies
  • 5 kudos

Resolved! I met with an issue when I was trying to use Auto Loader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked the files are good and not corrupted.

I met with an issue when I was trying to use Auto Loader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked the files are good and not corrupted. Following is the issue:
java.lang.IllegalArgumentException:...

Latest Reply
SRK
Contributor III
  • 5 kudos

I got the issue resolved. The issue was that, by mistake, we had duplicate columns in the schema files. Because of that it was showing that error. However, the error is totally misleading, which is why I wasn't able to rectify it at first.
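Since the root cause was duplicate column names in the schema, a small sanity check before wiring up Auto Loader can catch it early. This is a hedged sketch assuming the schema is available as a StructType named schema:

```python
from collections import Counter

# Flag any column name that appears more than once in the schema definition.
duplicates = [name for name, count in Counter(f.name for f in schema.fields).items() if count > 1]
if duplicates:
    raise ValueError(f"Duplicate columns in schema: {duplicates}")
```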

2 More Replies
KVNARK
by Honored Contributor II
  • 1775 Views
  • 2 replies
  • 12 kudos

Resolved! How to get list of users who created the tables in different workspaces and the operations they have done.

Hi, I have 10 workspaces linked to different departments. We have overall 4 users doing some activity on these 10 workspaces. I want to get the list of users who are operating on which tables and what operations they have performed, and all in all ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 12 kudos

Hi Ranjit, for tables I believe it's hard, but if you want to combine all 10 workspaces you can use the Databricks API for cluster lists (https://docs.databricks.com/dev-tools/api/latest/index.html) and then you can check their IAM roles to understand w...
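A hedged sketch of the Clusters API call the reply points to (GET /api/2.0/clusters/list); the workspace URLs and token handling below are placeholders, not from the thread:

```python
import requests

workspaces = [
    "https://adb-1111111111111111.1.azuredatabricks.net",  # hypothetical workspace URLs
    "https://adb-2222222222222222.2.azuredatabricks.net",
]
token = "<personal-access-token>"  # assumption: one valid PAT per workspace

for host in workspaces:
    resp = requests.get(
        f"{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    # Print cluster id and creator for each cluster in the workspace.
    for cluster in resp.json().get("clusters", []):
        print(host, cluster["cluster_id"], cluster.get("creator_user_name"))
```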

1 More Replies
