Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

db-avengers2rul
by Contributor II
  • 5983 Views
  • 2 replies
  • 0 kudos

Resolved! delete files from the directory

Is there a way to recursively delete files using a command in notebooks? In the below directory I have many combinations of files like .txt, .png, .jpg, but I only want to delete files with .csv, for example dbfs:/FileStore/.csv*

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @Rakesh Reddy Gopidi, you can use the os module to iterate over a directory. By looping over the directory, you can check what each file ends with using .endswith(".csv"). After fetching all the matching files, you can remove them. Hope this helps. Cheers.

1 More Replies
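For reference, a minimal sketch of the approach described in the reply, assuming the files live under the FUSE path /dbfs/FileStore (the dbfs:/FileStore directory from the question):

```python
import os

# Walk the directory tree and delete only the files whose names end in ".csv".
# "/dbfs/FileStore" is the assumed local FUSE path for dbfs:/FileStore.
target_dir = "/dbfs/FileStore"

for root, _dirs, files in os.walk(target_dir):
    for name in files:
        if name.endswith(".csv"):
            os.remove(os.path.join(root, name))
```

The same filter could also be applied with dbutils.fs.ls plus dbutils.fs.rm if you prefer to stay on dbfs:/ paths.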
UmaMahesh1
by Honored Contributor III
  • 3814 Views
  • 2 replies
  • 15 kudos

Resolved! Pyspark dataframe column comparison

I have a string column which is a concatenation of elements with a hyphen as a separator. Let 3 values from that column look like below: Row 1 - A-B-C-D-E-F, Row 2 - A-B-G-C-D-E-F, Row 3 - A-B-G-D-E-F. I want to compare 2 consecutive rows and create a column ...

Latest Reply
NhatHoang
Valued Contributor II
  • 15 kudos

Hi, I think you can follow these steps: 1. Use a window function to create a new column by shifting (lag); then your df will look like this:
id  value          lag
1   A-B-C-D-E-F    null
2   A-B-G-C-D-E-F  A-B-C-D-E-F
3   A-B-G-D-E-F    ...

1 More Replies
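For reference, a minimal sketch of the lag-based comparison described in the reply; the id/value column names and sample rows are illustrative, and since the thread's exact requirement is truncated, the final diff column is just one possible way to compare consecutive values:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "A-B-C-D-E-F"), (2, "A-B-G-C-D-E-F"), (3, "A-B-G-D-E-F")],
    ["id", "value"],
)

# Step 1 from the reply: shift the previous row's value next to the current row.
w = Window.orderBy("id")
df = df.withColumn("lag", F.lag("value").over(w))

# One possible comparison: elements present in the current row but not the previous one.
df = df.withColumn(
    "diff", F.array_except(F.split("value", "-"), F.split(F.col("lag"), "-"))
)
df.show(truncate=False)
```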
cozos
by New Contributor III
  • 3439 Views
  • 6 replies
  • 5 kudos

What does "ScalaDriverLocal: User Code Compile error" mean?

22/11/30 01:45:31 WARN ScalaDriverLocal: loadLibraries: Libraries failed to be installed: Set()
22/11/30 01:50:14 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE)
22/11/30 01:50:15 WARN ScalaDriverLocal: User Code Compile err...

Latest Reply
cozos
New Contributor III
  • 5 kudos

Hi @Werner Stinckens, thanks for the help. Unfortunately I don't think it's so simple - I do have a JAR that I submitted as a Databricks JAR task, and the JAR does have the org.apache.beam class. I guess what I'm trying to understand is what does Scal...

5 More Replies
vr
by Contributor
  • 6004 Views
  • 12 replies
  • 9 kudos

Why is execution too fast?

I have a table, a full scan of which takes ~20 minutes on my cluster. The table has a "Time" TIMESTAMP column and a "day" DATE column. The latter is computed (manually) as "Time" truncated to day and used for partitioning. I query the table using a predicate ...

[Attached screenshots: stage stats, DAG]
Latest Reply
Kaniz_Fatma
Community Manager
  • 9 kudos

Hi @Vladimir Ryabtsev, we haven't heard from you since the last response from @Uma Maheswara Rao Desula, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it c...

11 More Replies
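As a side note, one way to check whether partition pruning explains a speed-up like this is to filter on the partition column and inspect the physical plan; the table name below is hypothetical, while the "day" partition column comes from the question:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Filter on the partition column and look for a PartitionFilters entry in the
# FileScan node of the plan: when it is present, only the matching partitions
# are read, which is why the query can be far faster than a full table scan.
df = spark.table("my_schema.my_table").where(F.col("day") == "2022-11-30")
df.explain(mode="formatted")
```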
jd1
by New Contributor II
  • 567 Views
  • 1 reply
  • 3 kudos

Hello, When working in a python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path ...

Hello, when working in a Python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path will add the full path to the cell in the notebook. This is annoying behaviour, since you end up with...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Someone heard you. In the experimental Monaco editor, I found that this particular issue does not appear.

Adig
by New Contributor III
  • 3071 Views
  • 6 replies
  • 17 kudos

Generate a Group Id for similar/duplicate values of a dataframe column.

Input DataFrame:
KeyName          KeyCompare       Source
PapasMrtemis     PapasMrtemis     S1
PapasMrtemis     Pappas, Mrtemis  S1
Pappas, Mrtemis  PapasMrtemis     S2
Pappas, Mrtemis  Pappas, Mrtemis  S2
Mich...

Latest Reply
VaibB
Contributor
  • 17 kudos

Create a UDF to which you pass all the fields that need to be taken into consideration for a unique row. Create a list by splitting on ' ' or ',', sort the list, and concatenate all its elements to derive the "new field". Calculate dens...

5 More Replies
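For reference, a minimal sketch of the recipe in the reply (split on spaces/commas, sort, re-concatenate, then dense_rank over the normalised value); the column names and sample rows are illustrative rather than the thread's actual data:

```python
import re

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("Papas Mrtemis", "S1"), ("Mrtemis, Papas", "S1"), ("Mich Smith", "S2")],
    ["KeyCompare", "Source"],
)

@F.udf("string")
def normalise(value):
    # Split on spaces/commas, drop empties, sort, and re-concatenate.
    parts = [p for p in re.split(r"[ ,]+", value) if p]
    return "".join(sorted(parts))

# Rows whose normalised value matches receive the same dense_rank, i.e. group id.
df = df.withColumn("new_field", normalise("KeyCompare"))
df = df.withColumn("group_id", F.dense_rank().over(Window.orderBy("new_field")))
df.show(truncate=False)
```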
stinodego
by New Contributor III
  • 2631 Views
  • 8 replies
  • 19 kudos

Python job run error messages are unreadable

This has been going on for some time now; all errors look like this (note the weird `[0;34m` marks everywhere). How can we fix this? We're not doing anything crazy; this is just the latest runtime with pretty much the simplest possible hello world pro...

Latest Reply
VaibB
Contributor
  • 19 kudos

Have you tried detaching and reattaching the notebook, or restarting the cluster? Also check that you are not importing any specific library: someone else with the right access might have installed a library with "install to all clusters" checked.

7 More Replies
cmilligan
by Contributor II
  • 4984 Views
  • 2 replies
  • 6 kudos

Resolved! How to go up two folders using relative path in %run?

I want to store a notebook with functions two folders up from the current notebook. I know that I can start the path with ../ to go up one folder, but when I tried .../ it won't go up two folders. Is there a way to do this?

Latest Reply
VaibB
Contributor
  • 6 kudos

In order to access a notebook in the parent folder, use ../notebook_2. To go 2 folders up and access a notebook (say "secret"), use ../../secret.

1 More Replies
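For reference, a small illustration of the reply above: %run resolves its path relative to the current notebook, each ../ climbs one folder, and the magic must sit in a cell of its own (the notebook name "secret" is the one used in the reply):

```
%run ../../secret
```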
Smitha1
by Valued Contributor II
  • 654 Views
  • 1 reply
  • 6 kudos

Just a shout out to the Databricks Support team and customers! @Joseph Kambourakis @Nadia Elsayed @Vidula Khanna @Jose Gonzalez @Harshjot Singh you al...

Just a shout out to the Databricks Support team and customers! @Joseph Kambourakis @Nadia Elsayed @Vidula Khanna @Jose Gonzalez @Harshjot Singh, you all are a fabulous bunch of teams and very helpful. Thanks very much for your responses when asked. Happy...

Latest Reply
Harshjot
Contributor III
  • 6 kudos

@Smitha Nelapati, so happy to see that the issue is resolved.

Erik
by Valued Contributor II
  • 10012 Views
  • 15 replies
  • 9 kudos

Grafana + databricks = True?

We have some time series in Databricks, and we are reading them into Power BI through SQL compute endpoints. For time series, Power BI is ... not optimal. Earlier I have used Grafana with various backends, and quite like it, but I can't find any way to con...

Latest Reply
cold_river_22
New Contributor II
  • 9 kudos

There is now an open-source Grafana Databricks backend plugin available: https://github.com/mullerpeter/databricks-grafana

14 More Replies
vr
by Contributor
  • 3608 Views
  • 5 replies
  • 6 kudos

Resolved! How to avoid trimming in EXPLAIN?

I am looking at the EXPLAIN EXTENDED plan for a statement. In the == Physical Plan == section, I go down to the FileScan node and see a lot of ellipses, like +- FileScan parquet schema.table[Time#8459,TagName#8460,Value#8461,Quality#8462,day#8...

Latest Reply
SS2
Valued Contributor
  • 6 kudos

I also faced the same issue.

4 More Replies
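As a side note, the truncation in EXPLAIN output is usually governed by a couple of SQL configs; the exact property names and whether they apply on your Spark/DBR version should be verified, so treat the sketch below as an assumption rather than a confirmed fix:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed config names (verify on your runtime): these raise the limits that
# cause long metadata strings and long field lists to be cut off with "...".
spark.conf.set("spark.sql.maxMetadataStringLength", "10000")
spark.conf.set("spark.sql.debug.maxToStringFields", "1000")

spark.sql("EXPLAIN EXTENDED SELECT * FROM schema.table").show(truncate=False)
```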
Retko
by Contributor
  • 12146 Views
  • 5 replies
  • 8 kudos

Databricks notebook sometimes takes too long to run a query (even on an empty table)

Hi, sometimes I notice that running a query takes too long - even simple queries - and the next time I run the same query it runs much faster. I have a cluster running (DBR 10.4 LTS • 5 workers) and it constantly has several workers. An example of a query is s...

Latest Reply
j_afanador
Contributor II
  • 8 kudos

Probably the cluster is always in use and the query ends up waiting in the processing queue, or the cluster auto-stops after each use, so it has to start up again before the next query.

4 More Replies
augustin
by New Contributor II
  • 3684 Views
  • 5 replies
  • 5 kudos

Mount an unencrypted AWS EFS in AWS Databricks

Hi, I want to mount an unencrypted AWS EFS in AWS Databricks. When I do: mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-abcdef.efs.region.amazonaws.com:/ /mnt/efs-uncrypted I get this error: mount.nfs4: moun...

Latest Reply
Andrei_Radulesc
Contributor III
  • 5 kudos

"To support NFS under LXC, some of the apparmor protections need to be lifted." (see https://theorangeone.net/posts/mount-nfs-inside-lxc/)

4 More Replies
sqlshep
by New Contributor III
  • 2675 Views
  • 5 replies
  • 2 kudos
Latest Reply
sqlshep
New Contributor III
  • 2 kudos

It's broken again; I am seeing this several times a week, and it is offline for hours at a time.

4 More Replies