cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

weldermartins
by Honored Contributor
  • 5637 Views
  • 6 replies
  • 10 kudos

Resolved! Spark - API Jira

Hello guys. I use pyspark in my daily life. A demand has arisen to collect information in Jira. I was able to do this via Talend ESB, but I wouldn't want to use different tools to get the job done. Do you have any example of how to extract data from ...

  • 5637 Views
  • 6 replies
  • 10 kudos
Latest Reply
Marty73
New Contributor II
  • 10 kudos

Hi,There is also a new Databricks for Jira add-on on the Atlassian Marketplace. It is easy to setup and exports are directly created within Jira. They can be one-time, scheduled, or real-time. It can also export additional Jira data such as Assets, C...

  • 10 kudos
5 More Replies
greyfine
by New Contributor II
  • 9107 Views
  • 8 replies
  • 7 kudos

Resolved! Hi Everyone , I was wondering if it is possible to have alerts set up on query level for pyspark notebooks that are run on schedule in databricks so if we have some expected result from it we can receive a mail alert ?

In Above you can see we have 3 workspaces - we have the alert option available in the sql workspace but not in our data science and engineering space , anyway we can incorporate this in our DS and Engineering space ?

image.png
  • 9107 Views
  • 8 replies
  • 7 kudos
Latest Reply
JKR
Contributor
  • 7 kudos

How can I receive call on teams/number/slack if any jobs fails?

  • 7 kudos
7 More Replies
Mado
by Valued Contributor II
  • 11508 Views
  • 4 replies
  • 3 kudos

Resolved! Using "Select Expr" and "Stack" to Unpivot PySpark DataFrame doesn't produce expected results

I am trying to unpivot a PySpark DataFrame, but I don't get the correct results.Sample dataset:# Prepare Data data = [("Spain", 101, 201, 301), \ ("Taiwan", 102, 202, 302), \ ("Italy", 103, 203, 303), \ ("China", 104, 204, 304...

image image
  • 11508 Views
  • 4 replies
  • 3 kudos
Latest Reply
lukeoz
New Contributor II
  • 3 kudos

You can also use backticks around the column names that would otherwise be recognised as numbers.from pyspark.sql import functions as F   unpivotExpr = "stack(3, '2018', `2018`, '2019', `2019`, '2020', `2020`) as (Year, CPI)" unPivotDF = df.select("C...

  • 3 kudos
3 More Replies
Braxx
by Contributor II
  • 7489 Views
  • 6 replies
  • 2 kudos

Resolved! issue with group by

I am trying to group by a data frame by "PRODUCT", "MARKET" and aggregate the rest ones specified in col_list. There are much more column in the list but for simplification lets take the example below.Unfortunatelly I am getting the error:"TypeError:...

  • 7489 Views
  • 6 replies
  • 2 kudos
Latest Reply
Ralphma
New Contributor II
  • 2 kudos

The error you're encountering, "TypeError: unhashable type: 'Column'," is likely due to the way you're defining exprs. In Python, sets use curly braces {}, but they require their items to be hashable. Since the result of sum(x).alias(x) is not hashab...

  • 2 kudos
5 More Replies
avnish26
by New Contributor III
  • 9405 Views
  • 4 replies
  • 8 kudos

Spark 3.3.0 connect kafka problem

I am trying to connect to my Kafka from spark but getting an error:Kafka Version: 2.4.1Spark Version: 3.3.0I am using jupyter notebook to execute the pyspark code below:```from pyspark.sql.functions import *from pyspark.sql.types import *#import libr...

  • 9405 Views
  • 4 replies
  • 8 kudos
Latest Reply
jose_gonzalez
Moderator
  • 8 kudos

Hi @avnish26, did you added the Jar files to the cluster? do you still have issues? please let us know

  • 8 kudos
3 More Replies
Nastasia
by New Contributor II
  • 3586 Views
  • 3 replies
  • 1 kudos

Why is Spark creating multiple jobs for one action?

I noticed that when launching this bunch of code with only one action, I have three jobs that are launched.from pyspark.sql import DataFrame from pyspark.sql.types import StructType, StructField, StringType from pyspark.sql.functions import avgdata:...

https___i.stack.imgur.com_xfYDe.png LTHBM DdfHN
  • 3586 Views
  • 3 replies
  • 1 kudos
Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

The above code will create two jobs.JOB-1. dataframe: DataFrame = spark.createDataFrame(data=data,schema=schema)The createDataFrame function is responsible for inferring the schema from the provided data or using the specified schema.Depending on the...

  • 1 kudos
2 More Replies
yutaro_ono1_558
by New Contributor II
  • 9394 Views
  • 5 replies
  • 1 kudos

Resolved! How to read data from S3 Access Point by pyspark?

I want to read data from s3 access point.I successfully accessed using boto3 client to data through s3 access point.s3 = boto3.resource('s3')ap = s3.Bucket('arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 Access Point name]')for obj in ap.object...

  • 9394 Views
  • 5 replies
  • 1 kudos
Latest Reply
shrestha-rj
New Contributor II
  • 1 kudos

I'm reaching out to seek assistance as I navigate an issue. Currently, I'm trying to read JSON files from an S3 Multi-Region Access Point using a Databricks notebook. While reading directly from the S3 bucket presents no challenges, I encounter an "j...

  • 1 kudos
4 More Replies
Braxx
by Contributor II
  • 10315 Views
  • 3 replies
  • 1 kudos

Resolved! How to kill the execution of a notebook on specyfic cell?

Let's say I want to check if a condition is false then stop the execution of the rest of the script. I tried with two approaches:1) raising exceptionif not data_input_cols.issubset(data.columns): raise Exception("Missing column or column's name mis...

  • 10315 Views
  • 3 replies
  • 1 kudos
Latest Reply
Invasioned
New Contributor II
  • 1 kudos

In Jupyter notebooks or similar environments, you can stop the execution of a notebook at a specific cell by raising an exception. However, you need to handle the exception properly to ensure the execution stops. The issue you're encountering could b...

  • 1 kudos
2 More Replies
DJey
by New Contributor III
  • 8618 Views
  • 5 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my Exisiting Delta Table looks like below:Now I have an incremental data with an additional column i.e. owner:Dataframe Name --> scdDFBelow is the code snippet to merge Incremental Dataframe to targetTable, but the new...

image image image image
  • 8618 Views
  • 5 replies
  • 2 kudos
Latest Reply
DJey
New Contributor III
  • 2 kudos

@Vidula Khanna​  Enabling the below property resolved my issue:spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled",True) Thanks v much!

  • 2 kudos
4 More Replies
RantoB
by Valued Contributor
  • 19811 Views
  • 8 replies
  • 4 kudos

Resolved! read csv directly from url with pyspark

I would like to load a csv file directly to a spark dataframe in Databricks. I tried the following code :url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

  • 19811 Views
  • 8 replies
  • 4 kudos
Latest Reply
MartinIsti
New Contributor III
  • 4 kudos

I know it's a 2 years old thread but I needed to find a solution to this very thing today. I had one notebook using SparkContextfrom pyspark import SparkFilesfrom pyspark.sql.functions import *sc.addFile(url) But according to the runtime 14 release n...

  • 4 kudos
7 More Replies
pvm26042000
by New Contributor III
  • 3665 Views
  • 4 replies
  • 2 kudos

benefit of using vectorized pandas UDFs instead of the standard Pyspark UDFs?

benefit of using vectorized pandas UDFs instead of the standard Pyspark UDFs?

  • 3665 Views
  • 4 replies
  • 2 kudos
Latest Reply
Sai1098
New Contributor II
  • 2 kudos

Vectorized Pandas UDFs offer improved performance compared to standard PySpark UDFs by leveraging the power of Pandas and operating on entire columns of data at once, rather than row by row.They provide a more intuitive and familiar programming inter...

  • 2 kudos
3 More Replies
lnights
by New Contributor II
  • 4017 Views
  • 5 replies
  • 2 kudos

High cost of storage when using structured streaming

Hi there, I read data from Azure Event Hub and after manipulating with data I write the dataframe back to Event Hub (I use this connector for that): #read data df = (spark.readStream .format("eventhubs") .options(**ehConf) ...

transactions in azure storage
  • 4017 Views
  • 5 replies
  • 2 kudos
Latest Reply
PetePP
New Contributor II
  • 2 kudos

I had the same problem when starting with databricks. As outlined above, it is the shuffle partitions setting that results in number of files equal to number of partitions. Thus, you are writing low data volume but get taxed on the amount of write (a...

  • 2 kudos
4 More Replies
AryaMa
by New Contributor III
  • 23664 Views
  • 13 replies
  • 8 kudos

Resolved! reading data from url using spark

reading data form url using spark ,community edition ,got a path related error ,any suggestions please ? url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv" from pyspark import SparkFiles spark.sparkContext.addFil...

  • 23664 Views
  • 13 replies
  • 8 kudos
Latest Reply
padang
New Contributor II
  • 8 kudos

Sorry, bringing this back up...​from pyspark import SparkFiles url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv" spark.sparkContext.addFile(url) df = spark.read.csv("file://"+SparkFiles.get("authors.csv"), header=True, inferSc...

  • 8 kudos
12 More Replies
giriraj01234567
by New Contributor II
  • 8584 Views
  • 1 replies
  • 2 kudos

getting error while runction show function

I was using String indexer, while fitting, transforming I didn't get any erro. but While runnign show function I am getting error, I mention the error beloworg.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 45.0 failed...

  • 8584 Views
  • 1 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Bojja Giri​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
Ajay-Pandey
by Esteemed Contributor III
  • 9740 Views
  • 7 replies
  • 11 kudos

Resolved! Unzip Files

Hi all, I am trying to unzip a file in databricks but facing an issue,Please help me if you have any doc or codes to share.

  • 9740 Views
  • 7 replies
  • 11 kudos
Latest Reply
vivek_rawat
New Contributor III
  • 11 kudos

Hey ajay,You can follow this module to unzip your zip file.To give your brief idea about this, it will unzip your file directly into your driver node storage.So If your compressed data is inside DBFS then you first have to move that to drive node and...

  • 11 kudos
6 More Replies
Labels