Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

pyspark - regexp_extract

weldermartins
Honored Contributor

Hello everyone, I'm writing a regular expression to extract only the numeric value from a string, but some of the values are negative. I can't work out how to make the pattern capture the minus sign as well. Can you help me?

from pyspark.sql.functions  import regexp_extract
from pyspark.sql.types import StructType,StructField, StringType
 
data = [("01","[$R$-pt-BR] 150.00"),
        ("02", "-[$R$-pt-BR] 379.52" ),
        ("03", "[$R$-pt-BR] 185.16" ),
        ("04", "[$R$-pt-BR] 185.16" ),]
 
schema = StructType([ \
    StructField("id",StringType(),True), \
    StructField("description",StringType(),True), 
  ])
 
df = spark.createDataFrame(data=data,schema=schema)
df.display()
 
df1 = df\
.withColumn("value", regexp_extract('description', r"[\d]{1,4}\.[\d]{1,4}", 0))
df1.display()


1 ACCEPTED SOLUTION

Accepted Solutions

NhatHoang
Valued Contributor II

Hi there,

  1. Create a column that captures the optional leading minus sign "-". The pattern is: "^[\-]?"
  2. Create a column that captures the digits, as you have already done.
  3. Concatenate the two columns.

Hope it fits your requirement. 🙂
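
The three steps above can be tried outside Spark with Python's re module. This is only a sketch: the helper name extract_signed_value is hypothetical, the dot in the digits pattern is escaped here so it matches a literal ".", and Spark uses Java regular expressions, although these simple patterns should behave the same way in both engines.

```python
import re

# Patterns from the thread; the dot is escaped so it only matches a literal ".".
SIGN_PATTERN = r"^[\-]?"
DIGITS_PATTERN = r"[\d]{1,4}\.[\d]{1,4}"

def extract_signed_value(description):
    """Hypothetical helper: apply the sign + digits + concat steps to one string."""
    # Step 1: the sign pattern always matches; it yields "-" or an empty string.
    sign = re.search(SIGN_PATTERN, description).group(0)
    # Step 2: the digits; fall back to "" when nothing matches,
    # which is also what Spark's regexp_extract returns on no match.
    digits_match = re.search(DIGITS_PATTERN, description)
    digits = digits_match.group(0) if digits_match else ""
    # Step 3: concatenate the two pieces.
    return sign + digits

samples = ["[$R$-pt-BR] 150.00", "-[$R$-pt-BR] 379.52", "[$R$-pt-BR] 185.16"]
results = [extract_signed_value(s) for s in samples]
print(results)
```

In PySpark the same steps map to two regexp_extract columns joined with concat.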


7 REPLIES 7

weldermartins
Honored Contributor

@Werner Stinckens, can you help me?

NhatHoang
Valued Contributor II

Hi there,

  1. Create a column that captures the optional leading minus sign "-". The pattern is: "^[\-]?"
  2. Create a column that captures the digits, as you have already done.
  3. Concatenate the two columns.

Hope it fits your requirement. 🙂

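from pyspark.sql.functions import concat, regexp_extract

# "value" holds the digits, "operador" the optional leading minus sign,
# and "value2" joins the two into the signed value.
df1 = df\
.withColumn("value", regexp_extract('description', r"[\d]{1,4}\.[\d]{1,4}", 0))\
.withColumn("operador", regexp_extract('description', r"^[\-]?", 0))\
.withColumn("value2", concat("operador","value"))
df1.display()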

@Nhat Hoang, thanks.

weldermartins
Honored Contributor

I found another solution, but I didn't want to give up on regex. If you find a way, be sure to post. Thanks.

[image: screenshot of the alternative solution, not preserved]

-werners-
Esteemed Contributor III

\s*\[[^)]*\] matches the square brackets and everything inside them, along with any whitespace in front of the opening bracket. Substituting the match with an empty string removes it.

https://regex101.com/r/tv9pbJ/1

I haven't checked whether Spark can do regex substitution.
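
For what it's worth, Spark does offer regex substitution through pyspark.sql.functions.regexp_replace. The sketch below checks the idea with Python's re module on the sample strings from the question. The pattern is an assumed variant of the one above: it uses [^\]]* so the match stops at the closing bracket, and a trailing \s* so the space after the bracket is removed too. The helper name strip_currency_tag is hypothetical.

```python
import re

# Assumed variant of the pattern from the post: "[^\]]*" stops at the closing
# bracket, and the trailing "\s*" also consumes the space that follows it.
TAG_PATTERN = r"\[[^\]]*\]\s*"

def strip_currency_tag(description):
    """Hypothetical helper: delete the bracketed tag, keeping sign and number."""
    return re.sub(TAG_PATTERN, "", description)

samples = ["[$R$-pt-BR] 150.00", "-[$R$-pt-BR] 379.52"]
cleaned = [strip_currency_tag(s) for s in samples]
print(cleaned)

# PySpark equivalent (not executed here):
# from pyspark.sql.functions import regexp_replace
# df.withColumn("value", regexp_replace("description", r"\[[^\]]*\]\s*", ""))
```

Because the minus sign sits in front of the bracket, it survives the substitution, so no separate sign column is needed with this approach.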

mcwir
Contributor

It looks like you need to match this pattern: "^[\-]?"

