cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

183530
by New Contributor III
  • 1626 Views
  • 2 replies
  • 2 kudos

How to search an array of words in a text field

Example:TABLE 1FIELD_TEXTI like salty food and Italian foodI have Italian foodbread, rice and beansmexican foodscoke, spritearray['italia', 'mex','coke']match TABLE1 X ARRAYResults:I like salty food and Italian foodI have Italian foodmexican foodsis ...

  • 1626 Views
  • 2 replies
  • 2 kudos
Latest Reply
Meredithharper
New Contributor II
  • 2 kudos

Yes, you can do it in SQL with LIKE or IN and in PySpark using array contains, ideal for filtering Words like halal catering Barcelona, catering, and many more

  • 2 kudos
1 More Replies
guostong
by New Contributor III
  • 5616 Views
  • 1 replies
  • 1 kudos

How to update the items in array of struct column with sql

create table test.json_test_01 ( id int, description string, struct_address STRUCT<street_number: STRING, street_name: STRING, city: STRING, province: STRING>, arrary_phone ARRAY<STRUCT<phone_number: STRING, phone_type: STRING>> );   insert into ...

  • 5616 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Richard Guo​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 1 kudos
pt-jake
by New Contributor II
  • 4327 Views
  • 1 replies
  • 1 kudos

Arrays of complex type always evaluate to ARRAY<STRING>?

Arrays of complex types seemingly always evaluate to ARRAY<STRING>. Therefore, casting or attempting to load JSON data with empty array values fails. For example, attempting to cast a JSON value of {"likes": []...} on load to the following table sche...

  • 4327 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Jake Neyer​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

  • 1 kudos
Rsa
by New Contributor II
  • 5531 Views
  • 2 replies
  • 2 kudos

Resolved! Error while using Array_contains function in left join condition

'Item_id' is column in array format like ["ba1b-5fbe1547ddd5", "88f9-ac3b93334f69", "8bba-4075a47eb814"] in table1 and table2 has column Id with single value like ba1b-5fbe1547ddd5.While join two table select table1.*,table2.*from table1left join tab...

  • 5531 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Rishabh Shanker​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...

  • 2 kudos
1 More Replies
RamyaN
by New Contributor II
  • 3508 Views
  • 2 replies
  • 3 kudos

How to read enum[] (enum of array) datatype from postgres using spark

We are trying to read a column which is enum of array datatype from postgres as string datatype to target. We could able to achieve this by expilcitly using concat function while extracting like belowval jdbcDF3 = spark.read .format("jdbc") .option(...

  • 3508 Views
  • 2 replies
  • 3 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You can try custom schema for JDBC read.option("customSchema", "colname STRING")

  • 3 kudos
1 More Replies
Panna
by New Contributor II
  • 1945 Views
  • 1 replies
  • 3 kudos

Is there only one element type option for an array?

I'm creating an array which contains both string and double, just wondering if I can have multiple element type options for one array column? Thanks

  • 1945 Views
  • 1 replies
  • 3 kudos
Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Elements of any type that share a least common type can be used, https://docs.databricks.com/sql/language-manual/functions/array.html#arguments.Please correct me if I misunderstood to understand the requirement.

  • 3 kudos
KNP
by New Contributor
  • 3330 Views
  • 2 replies
  • 0 kudos

passing array as a parameter to PandasUDF

Hi Team,My python dataframe is as below.The raw data is quite a long series of approx 5000 numbers. My requirement is to go through each row in RawData column and calculate 2 metrics. I have created a function in Python and it works absolutely fine. ...

image
  • 3330 Views
  • 2 replies
  • 0 kudos
Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hello @Kausthub NP​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Tha...

  • 0 kudos
1 More Replies
dtabass
by New Contributor III
  • 3279 Views
  • 3 replies
  • 0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me:%sql SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10; I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

  • 3279 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Michael Carey​ Hope everything is going great!We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly?Cheers!

  • 0 kudos
2 More Replies
aschiff
by Contributor II
  • 36659 Views
  • 24 replies
  • 4 kudos

Resolved! Extracting data from a multi-layered JSON object

I have a table in databricks called owner_final_delta with a column called contacts that holds data with this structure:array<struct<address:struct<apartment:string,city:string,house:string,poBox:string,sources:array<string>,state:string,street:strin...

  • 36659 Views
  • 24 replies
  • 4 kudos
Latest Reply
Dooley
Valued Contributor II
  • 4 kudos

Have you tried to use the explode function for that column with the array?df.select(explode(df.emailId).alias("email")).show()----------Also, if you are a SQL lover, you can instead use the Databricks syntax for querying a JSON seen here.

  • 4 kudos
23 More Replies
Raymond_Garcia
by Contributor II
  • 6095 Views
  • 1 replies
  • 1 kudos

Resolved! query array in SQL

Hello I have a databricks question I was not able to answer myselfI have this queryselect count(*) from tablewhere object[0].value is not null and object[0].value.value1 = "s"and created_year = 2022 and created_month = 7 and created_day = 4you can se...

  • 6095 Views
  • 1 replies
  • 1 kudos
Latest Reply
Raymond_Garcia
Contributor II
  • 1 kudos

SELECT count(*)FROM ( SELECT explode(mmycolumn) FROM table WHERE created_year = 2022 and created_month = 7 and created_day = 5)WHERE col.field is not null and col.field.field! = "signal"

  • 1 kudos
SailajaB
by Valued Contributor III
  • 11247 Views
  • 5 replies
  • 12 kudos

Resolved! how to convert each row of df to array of rows(list of rows)

Hi,How to convert each row of dataframe to array of rows?Here is our scenario , we need to pass each row of dataframe to one function as dict to apply the key level transformations. But as our data is very huge we can't use collect df.toJson().colle...

  • 11247 Views
  • 5 replies
  • 12 kudos
Latest Reply
SailajaB
Valued Contributor III
  • 12 kudos

@Hubert Dudek​ , Thank you for the reply. We are new to ADB. And using the below code, looking for an optimized way to do itdfJSONString = df.toJSON().collect()stringList = []  for row in dfJSONString:    # ==== Unflatten the JSON string ==== #    js...

  • 12 kudos
4 More Replies
TS
by New Contributor III
  • 720 Views
  • 0 replies
  • 1 kudos

Is there a better way for this matching?

I have an array:var arg = condColumnsKeyswith the elementsarg: Array[String] = Array(LOT_PREFIX, PS_NAME_BOOK_TEMPLATE_NAME, PS_NAME_PAGE_NAME, PS_NAME_FIELD_NAME)Desired outcome is to get the string "LOT_PREFIX" and store it in var ccLotPrefixMy fir...

  • 720 Views
  • 0 replies
  • 1 kudos
Labels