
pyspark - regexp_extract

weldermartins
Honored Contributor

Hello everyone, I'm writing a regular expression to extract only the numeric value from a string, but some values are negative. I can't work out a rule that keeps the minus sign. Can you help me?

from pyspark.sql.functions import regexp_extract
from pyspark.sql.types import StructType, StructField, StringType

data = [("01", "[$R$-pt-BR] 150.00"),
        ("02", "-[$R$-pt-BR] 379.52"),
        ("03", "[$R$-pt-BR] 185.16"),
        ("04", "[$R$-pt-BR] 185.16")]

schema = StructType([
    StructField("id", StringType(), True),
    StructField("description", StringType(), True),
])

# `spark` is the SparkSession that a Databricks notebook provides.
df = spark.createDataFrame(data=data, schema=schema)
df.display()

# Extracts the digits only, so the leading minus on row "02" is lost.
df1 = df\
.withColumn("value", regexp_extract('description', r"[\d]{1,4}\.[\d]{1,4}", 0))
df1.display()

Accepted Solutions

NhatHoang
Valued Contributor II

Hi there,

  1. Create a column to capture the minus sign "-". The pattern is: "^[\-]?"
  2. Create a column to capture the digits, which you have already done.
  3. Concatenate the two columns above.

Hope this fits your requirement. 🙂
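
The three steps can be checked on the sample strings with plain Python before running them in Spark. This is only a sketch: signed_value is a hypothetical helper name, the dot in the digits pattern is escaped here (an assumption about the intent), and Python's re module stands in for Spark's Java regex engine, which treats these two patterns the same way.

```python
import re

SIGN = r"^[\-]?"                   # step 1: optional minus at the start of the string
DIGITS = r"[\d]{1,4}\.[\d]{1,4}"   # step 2: unsigned amount, e.g. "379.52"


def signed_value(description: str) -> str:
    """Hypothetical helper mirroring the three steps on one string."""
    # SIGN always matches: it yields "-" or an empty string.
    sign = re.search(SIGN, description).group(0)
    digits = re.search(DIGITS, description)
    # step 3: concatenate the sign and the digits
    return sign + (digits.group(0) if digits else "")


for text in ["[$R$-pt-BR] 150.00", "-[$R$-pt-BR] 379.52"]:
    print(text, "->", signed_value(text))
```

In PySpark, regexp_extract returns an empty string when the pattern does not match, so concatenating the sign column with the digits column does not produce nulls for the positive rows.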

7 REPLIES

weldermartins
Honored Contributor

@Werner Stinckens

Can you help me?

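from pyspark.sql.functions import concat, regexp_extract

# "operador" holds the leading minus sign (or an empty string); "value2" is the signed amount.
df1 = df\
.withColumn("value", regexp_extract('description', r"[\d]{1,4}\.[\d]{1,4}", 0))\
.withColumn("operador", regexp_extract('description', r"^[\-]?", 0))\
.withColumn("value2", concat("operador","value"))
df1.display()

@Nhat Hoang, thanks.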

weldermartins
Honored Contributor

I found another solution, but I didn't want to give up on regex. If you find a way, be sure to post. Thanks.

-werners-
Esteemed Contributor III

\s*\[[^)]*\] matches the square brackets and everything inside them, plus any whitespace in front of them; substituting the match with nothing removes it.

https://regex101.com/r/tv9pbJ/1

I haven't checked whether Spark can do regex substitution.
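
For reference, PySpark does expose regex substitution as regexp_replace in pyspark.sql.functions. Below is a minimal sketch of the substitution idea, with some assumptions stated up front: it uses a variant pattern, \[[^\]]*\]\s*, which puts the whitespace after the closing bracket, because on the sample rows in this thread the pattern as posted leaves the space behind ("- 379.52"). strip_tag and with_value are hypothetical helper names, and the pattern is checked with Python's re module, which handles this pattern the same way as Spark's Java regex engine.

```python
import re

# Variant pattern (an assumption, not the one posted above):
# a bracketed tag such as "[$R$-pt-BR]" plus any whitespace that follows it.
BRACKET_TAG = r"\[[^\]]*\]\s*"


def strip_tag(text: str) -> str:
    """Plain-Python check of the pattern: remove the tag, keep sign and digits."""
    return re.sub(BRACKET_TAG, "", text)


def with_value(df):
    """Hypothetical helper: apply the same substitution to a Spark DataFrame."""
    # Imported here so the pattern check above also runs without Spark installed.
    from pyspark.sql.functions import regexp_replace

    return df.withColumn("value", regexp_replace("description", BRACKET_TAG, ""))


for text in ["[$R$-pt-BR] 150.00", "-[$R$-pt-BR] 379.52"]:
    print(repr(strip_tag(text)))
```

With the DataFrame from the question, with_value(df).display() should show the signed amount as a string in the new column; casting it to a numeric type would be a separate step.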

mcwir
Contributor

It looks like you need to match this pattern: "^[\-]?"
