Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lmcglone
by New Contributor II
  • 6432 Views
  • 2 replies
  • 3 kudos

Comparing two dataframes and creating columns from values within a dataframe

Hi, I have a dataframe that has name and company:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["company","name"]
data = [("company1", "Jon"), ("company2", "Steve"), ("company1", "...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You need to join and pivot:
(df.join(df2, on=[df.company == df2.job_company])
   .groupBy("company", "name")
   .pivot("job_company")
   .count())
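To make the result of the join-then-pivot in the reply concrete, here is a plain-Python sketch of the same semantics: an inner join on company == job_company, a group by company and name, and one output column per distinct job_company value holding a row count. It is not Spark code. The single job_company column of df2 and the sample rows are hypothetical, since the original data is truncated. Combinations with no matching rows are left as None, analogous to the nulls in a pivoted Spark result.

```python
from collections import Counter


def join_and_pivot(people, job_companies):
    """Mimic df.join(df2, df.company == df2.job_company)
    .groupBy("company", "name").pivot("job_company").count()
    for small in-memory lists.

    people: list of (company, name) tuples (like df)
    job_companies: list of job_company values (hypothetical df2)
    """
    # Inner join: keep only pairs where company == job_company, then count
    # rows per (company, name, job_company) group.
    counts = Counter(
        (company, name, job_company)
        for company, name in people
        for job_company in job_companies
        if company == job_company
    )
    # Pivot: one output column per distinct job_company value, sorted, as
    # Spark does when no explicit value list is passed to pivot().
    pivot_columns = sorted({jc for _, _, jc in counts})
    group_keys = sorted({(company, name) for company, name, _ in counts})
    # Missing combinations become None (Spark would show null).
    return {
        key: {col: counts.get((*key, col)) for col in pivot_columns}
        for key in group_keys
    }


rows = [("company1", "Jon"), ("company2", "Steve")]  # hypothetical sample rows
jobs = ["company1", "company2"]                      # hypothetical df2 values
result = join_and_pivot(rows, jobs)
```

If zeros are preferred over nulls in the Spark version, the pivoted DataFrame can be passed through `.na.fill(0)`.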

shamly
by New Contributor III
  • 3748 Views
  • 4 replies
  • 2 kudos

How to replace LF with ' ' in a UTF-16 encoded CSV?

I have tried several approaches and nothing worked. An extra space or LF is carried over to the next row in my output. Rows are supposed to end in CRLF, but some rows end in LF, and when reading the CSV the output is not correct. My CSV has a double dagger as d...

Latest Reply
sher
Valued Contributor II
  • 2 kudos

Try this:
val df = spark.read.format("csv")
  .option("header", true)
  .option("sep", "||")
  .load("file load")
display(df)
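If CRLF is the real record terminator and the stray LF characters sit inside records, one option is to clean the file before Spark reads it. The sketch below assumes the file fits in memory and that CRLF only ever marks the end of a record. The function names are hypothetical, and the double-dagger (U+2021) delimiter in the test data is an assumption, since the original post is cut off.

```python
def replace_stray_lf(text: str, replacement: str = " ") -> str:
    """Replace every LF that is not part of a CRLF pair."""
    # Split on the true record terminator so CRLF pairs are preserved...
    records = text.split("\r\n")
    # ...then any LF still present inside a record is a stray one.
    return "\r\n".join(record.replace("\n", replacement) for record in records)


def clean_utf16_csv(src_path: str, dst_path: str, encoding: str = "utf-16") -> None:
    """Rewrite src_path to dst_path with stray LFs replaced by spaces."""
    # newline="" disables universal-newline translation, so "\r\n" and "\n"
    # reach replace_stray_lf exactly as they appear in the file.
    with open(src_path, encoding=encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(replace_stray_lf(text))
```

The cleaned file can then be read with the CSV reader, setting the `encoding` option to match the file and `sep` to the actual delimiter.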

KVNARK
by Honored Contributor II
  • 2756 Views
  • 3 replies
  • 9 kudos

Date format issue in PySpark

If anyone has encountered this date format, 6/15/25 12:00 AM, could you tell me the right format pattern to use in PySpark? Thanks in advance!

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

It will also work without the legacy time parser policy:
SELECT to_timestamp('6/15/23 12:00 AM', 'M/dd/yy h:mm a')
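For a quick sanity check outside Spark, the same kind of string can be parsed with Python's standard library. The mapping in the comments (M to %m, dd to %d, yy to %y, h to %I, mm to %M, a to %p) is a rough correspondence between Spark's datetime pattern letters and strptime directives, not an official one. The helper name is hypothetical.

```python
from datetime import datetime

# Spark pattern 'M/dd/yy h:mm a' expressed as strptime directives:
# %m month, %d day, %y two-digit year, %I 12-hour clock, %M minutes, %p AM/PM.
PATTERN = "%m/%d/%y %I:%M %p"


def parse_us_timestamp(value: str) -> datetime:
    """Parse strings like '6/15/25 12:00 AM' (12 AM is midnight)."""
    return datetime.strptime(value, PATTERN)


print(parse_us_timestamp("6/15/25 12:00 AM"))  # prints 2025-06-15 00:00:00
```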

andrew0117
by Contributor
  • 8733 Views
  • 1 replies
  • 0 kudos

Resolved! How to read a local file (stored on your own computer) using Databricks

without uploading the file to DBFS? Thanks!

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

In my opinion, it doesn't make sense, but you can mount an SMB Azure file share on a Windows machine (https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-windows) and then mount the same folder on Databricks using pip install ...

Mahesh777k
by New Contributor
  • 2559 Views
  • 2 replies
  • 2 kudos

How to delete duplicate tables?

Hi everyone, I accidentally imported duplicate tables. Please guide me on how to delete them using Databricks Community Edition.

Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

Hi @Mahesh Babu Uppala, you can use the following method to delete only the duplicate tables:
%scala
spark.sql("""SHOW TABLES""").createOrReplaceTempView("tables")
val temp_tables = spark.sql("""select tableName from tables where tableName...
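The reply is cut off before its filter condition, so how duplicates are recognised is not shown. As a hedged sketch, assume the accidental imports produced names like sales_1 next to an existing sales table (a hypothetical naming convention). The helpers below, whose names are also hypothetical, only build the list of candidates and DROP statements so they can be reviewed before anything is dropped. Note that this heuristic would also flag a legitimately named table such as sales_2023 if a sales table exists.

```python
import re

# Hypothetical convention: a duplicate is "<base>_<digits>" where "<base>" exists too.
DUPLICATE_NAME = re.compile(r"^(?P<base>.+)_(?P<suffix>\d+)$")


def find_duplicate_tables(table_names):
    """Return names that look like numbered copies of another existing table."""
    existing = set(table_names)
    duplicates = []
    for name in sorted(existing):
        match = DUPLICATE_NAME.match(name)
        if match and match.group("base") in existing:
            duplicates.append(name)
    return duplicates


def drop_statements(table_names):
    """Build DROP TABLE statements for review; nothing is executed here."""
    return [f"DROP TABLE IF EXISTS `{name}`" for name in find_duplicate_tables(table_names)]
```

On Databricks, the names could come from `[t.name for t in spark.catalog.listTables()]`, and each reviewed statement could then be run with `spark.sql(...)`.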

labtech
by Valued Contributor II
  • 4504 Views
  • 4 replies
  • 18 kudos

Resolved! Resource limit when creating a cluster in Databricks on AWS

Hi team, could you please help check my case? It always fails at this step. Thanks

Latest Reply
labtech
Valued Contributor II
  • 18 kudos

Thanks for all your answers. The problem came from the AWS side. I don't know why, in the first ticket, they said the issue didn't come from AWS.

jamesw
by New Contributor II
  • 2599 Views
  • 1 replies
  • 1 kudos

Ganglia not working with custom container services

Setup:
custom Docker container starting from the "databricksruntime/gpu-conda:cuda11" base image layer
10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
multi-node, p3.8xlarge GPU compute
When I try to view Ganglia metrics I am met with "502 Bad Gatewa...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 1 kudos

Hi @James W, Ganglia is not available for custom Docker containers by default. This is a known limitation. However, you can try this experimental support for Ganglia in custom DCS: https://github.com/databricks/containers/tree/master/experimental/ub...

Dinu2
by New Contributor III
  • 2102 Views
  • 1 replies
  • 1 kudos

base64 encoding does not match Oracle's base64 encoding

Hi, base64 encoding in Databricks does not match Oracle's base64 encoding. Please see the results below. Could anyone help me with this?
In Azure Databricks: encoded = base64.b64encode(b'952B8D04E5CFB9BE')
Output: b'OTUyQjhEMDRFNUNGQjlCRQ=='
In Oracle: select utl_enco...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

Oracle handles base64 encoding a little differently. Please check this link to understand the difference: https://dba.stackexchange.com/a/129134
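The Oracle query is cut off, so the following is an assumption about the cause, not a confirmed diagnosis. Python's b64encode call in the question encodes the 16 ASCII characters of the string. If the Oracle side receives the same literal as a RAW value, it is typically interpreted as hex, i.e. the 8 bytes 0x95 0x2B through 0xBE, and the base64 text naturally differs. The sketch computes both variants so they can be compared with the actual Oracle output.

```python
import base64

value = "952B8D04E5CFB9BE"

# What the Databricks snippet does: encode the 16 ASCII characters.
text_b64 = base64.b64encode(value.encode("ascii"))

# Treat the string as hex instead (as a RAW / HEXTORAW conversion would):
# encode the 8 bytes 0x95 0x2B 0x8D 0x04 0xE5 0xCF 0xB9 0xBE.
raw_b64 = base64.b64encode(bytes.fromhex(value))

print(text_b64)  # prints b'OTUyQjhEMDRFNUNGQjlCRQ==' (the value reported in the question)
print(raw_b64)
```

If the Oracle result matches `raw_b64`, the difference is in what bytes are being encoded rather than in the base64 algorithm itself.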

preetham333
by New Contributor II
  • 1618 Views
  • 3 replies
  • 4 kudos

Did not receive badge

I have completed Databricks Lakehouse Fundamentals but did not receive the badge. Please help with this issue.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @kalle preetham, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

prashant7sep
by New Contributor II
  • 3632 Views
  • 7 replies
  • 5 kudos

Lakehouse Fundamentals Accreditation badge not received

I just passed the Lakehouse Fundamentals Accreditation at https://partner-academy.databricks.com/, but I haven't received my badge yet and can't find the credentials. Please advise.

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Prashant Singh, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

Mado
by Valued Contributor II
  • 3715 Views
  • 0 replies
  • 1 kudos

How to get a snapshot of a streaming delta table as a static table?

Hi, assume that I have a streaming Delta table. Is there any way to get a snapshot of the streaming table as a static table? The reason is that I need to join this streaming table with a static table by:
output = output.join(country_information, ["Country"], ...

Hubert-Dudek
by Esteemed Contributor III
  • 2222 Views
  • 1 replies
  • 22 kudos

How to process files from the internet in Databricks? "spark.sparkContext.addFile" downloads the file to an HDFS directory. "SparkFiles.get"...

How to process files from the internet in Databricks? "spark.sparkContext.addFile" downloads the file to an HDFS directory. "SparkFiles.get" returns the path and the name. However, as Databricks uses the DBFS file system, we need to add the "file:///" prefix to...
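The "file:///" prefix step can be wrapped in a small helper, sketched here: SparkFiles.get returns a local path, and since an unprefixed path is resolved against DBFS on Databricks, that local path has to become a file: URI. The helper name is hypothetical; it uses urllib.parse.quote so that spaces and similar characters are percent-encoded rather than concatenated verbatim.

```python
from urllib.parse import quote


def to_file_uri(local_path: str) -> str:
    """Turn an absolute local path (e.g. from SparkFiles.get) into a file:/// URI."""
    if not local_path.startswith("/"):
        raise ValueError(f"expected an absolute POSIX path, got {local_path!r}")
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    return "file://" + quote(local_path)
```

In a notebook this might be used as `spark.read.csv(to_file_uri(SparkFiles.get("data.csv")), header=True)` after `spark.sparkContext.addFile(url)`, where the file name and URL are placeholders.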

Latest Reply
Matt101122
Contributor
  • 22 kudos

@Hubert Dudek Do you know whether addFile should work with an abfss:// path? I'm trying to add a file from Azure Data Lake with an external location in Unity Catalog.

jeremy1
by New Contributor II
  • 12118 Views
  • 9 replies
  • 7 kudos

DLT and Modularity (best practices?)

I have [very] recently started using DLT for the first time. One of the challenges I have run into is how to include other "modules" within my pipelines. I missed the documentation where magic commands (with the exception of %pip) are ignored and was...

Latest Reply
Greg_Galloway
New Contributor III
  • 7 kudos

I like the approach @Arvind Ravish shared, since you can't currently use %run in DLT pipelines. However, it took a little testing to figure out exactly how to make it work. First, ensure in the Admin Console that the Repos feature is configured as f...
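The reply is cut off, so its exact steps are not shown. One common way to share code without %run, sketched here as an assumption rather than as the approach from this thread, is to keep helpers in plain .py files inside a repo and make that folder importable by adding it to sys.path. The function name, module name and folder are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_folder(folder, module_name):
    """Make `folder` importable and import `module_name` from it."""
    folder = str(Path(folder).resolve())
    if folder not in sys.path:
        # append (not insert at 0) so the folder cannot shadow installed packages
        sys.path.append(folder)
    # Clear finder caches so files created after interpreter start are found.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Usage sketch: a throwaway folder stands in for a repo's source directory.
with tempfile.TemporaryDirectory() as repo_src:
    Path(repo_src, "demo_helpers.py").write_text(
        "def clean(value):\n    return value.strip().lower()\n"
    )
    demo_helpers = import_from_folder(repo_src, "demo_helpers")
    print(demo_helpers.clean("  Bronze "))  # prints bronze
```

Because the folder is appended to the end of sys.path, a module with the same name as an installed package would lose to the installed one, so a distinctive module name is advisable.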

databicky
by Contributor II
  • 1372 Views
  • 1 replies
  • 1 kudos

How to add a title to an Excel sheet with Python

I want to write a title above some combination of rows from a pandas DataFrame and write it into an Excel sheet. I tried some methods, but I get the error that the Styler object is not subscriptable.

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 1 kudos

Hi @Mohammed sadamusean, can you please share a sample input and the expected output, so that we can try it on our end and let you know? Happy Learning!!

ACP
by New Contributor III
  • 7280 Views
  • 5 replies
  • 0 kudos

Screenshot 2023-01-09 094039

Hey guys, the Databricks Academy login is not working. I have been trying for the past hour and it still doesn't work. It seems to be caused by the Databricks HTTPS certificate having expired, but I'm not sure. I'm attaching an image with the error. Any help with thi...

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Andre Paiva, can you please try now? I am able to load both the customer and partner Academy websites; I think the Academy team has fixed the issue. Happy Learning!!

