Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Example:
TABLE 1
FIELD_TEXT
I like salty food and Italian food
I have Italian food
bread, rice and beans
mexican foods
coke, sprite
array ['italia', 'mex', 'coke']
match TABLE1 X ARRAY
Results:
I like salty food and Italian food
I have Italian food
mexican foods
is ...
Yes, you can do it in SQL with LIKE or IN, and in PySpark using array_contains.
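The matching asked about here can be sketched in plain Python, outside Spark: keep every row whose text contains at least one keyword as a case-insensitive substring. The function name `rows_matching` is hypothetical, and the sample rows and keywords are taken from the question; this only illustrates the predicate and is not a Databricks API.

```python
def rows_matching(rows, keywords):
    """Return the rows containing at least one keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [row for row in rows if any(k in row.lower() for k in lowered)]


# Sample data copied from the question.
rows = [
    "I like salty food and Italian food",
    "I have Italian food",
    "bread, rice and beans",
    "mexican foods",
    "coke, sprite",
]
keywords = ["italia", "mex", "coke"]

matches = rows_matching(rows, keywords)
for row in matches:
    print(row)
```

In Spark the same idea would probably be expressed as a pattern match on the text column (for example with `LIKE` or `rlike`) rather than as a Python loop, since the text column is a string and not an array.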
Arrays of complex types seemingly always evaluate to ARRAY<STRING>. Therefore, casting or attempting to load JSON data with empty array values fails. For example, attempting to cast a JSON value of {"likes": []...} on load to the following table sche...
'Item_id' is an array column in table1, with values like ["ba1b-5fbe1547ddd5", "88f9-ac3b93334f69", "8bba-4075a47eb814"], and table2 has a column Id holding a single value like ba1b-5fbe1547ddd5. While joining the two tables:
select table1.*, table2.* from table1 left join tab...
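One way to read this requirement is as a left join whose condition is array membership: a table1 row matches a table2 row when table2's `Id` appears in table1's `Item_id` array. In Spark SQL that condition could be written with `array_contains`; the plain-Python sketch below only models the semantics. The function name `left_join_on_membership` and the `0000-unmatched` value are hypothetical; the other IDs come from the question.

```python
def left_join_on_membership(table1, table2):
    """Left join: keep every table1 row, paired with each table2 row whose Id is in Item_id."""
    joined = []
    for left in table1:
        matches = [right for right in table2 if right["Id"] in left["Item_id"]]
        if matches:
            joined.extend({**left, **right} for right in matches)
        else:
            # No match: keep the left row and fill the right side with None (SQL NULL).
            joined.append({**left, "Id": None})
    return joined


table1 = [
    {"Item_id": ["ba1b-5fbe1547ddd5", "88f9-ac3b93334f69", "8bba-4075a47eb814"]},
    {"Item_id": ["0000-unmatched"]},  # hypothetical row with no partner in table2
]
table2 = [{"Id": "ba1b-5fbe1547ddd5"}]

result = left_join_on_membership(table1, table2)
for row in result:
    print(row)
```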
We are trying to read a column of enum array datatype from Postgres as a string datatype in the target. We were able to achieve this by explicitly using the concat function while extracting, like below:
val jdbcDF3 = spark.read .format("jdbc") .option(...
Elements of any type that share a least common type can be used; see https://docs.databricks.com/sql/language-manual/functions/array.html#arguments. Please correct me if I have misunderstood the requirement.
Hi Team, my Python dataframe is as below. The raw data is quite a long series of approx. 5000 numbers. My requirement is to go through each row in the RawData column and calculate 2 metrics. I have created a function in Python and it works absolutely fine. ...
The following doesn't work for me:
%sql
SELECT user_id, array_size(education) AS edu_cnt
FROM users
ORDER BY edu_cnt DESC
LIMIT 10;
I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...
I have a table in Databricks called owner_final_delta with a column called contacts that holds data with this structure:
array<struct<address:struct<apartment:string,city:string,house:string,poBox:string,sources:array<string>,state:string,street:strin...
Have you tried using the explode function on the array column?
from pyspark.sql.functions import explode
df.select(explode(df.emailId).alias("email")).show()
Also, if you are a SQL lover, you can instead use the Databricks syntax for querying JSON.
Hello, I have a Databricks question I was not able to answer myself. I have this query:
select count(*) from table
where object[0].value is not null and object[0].value.value1 = "s"
and created_year = 2022 and created_month = 7 and created_day = 4
you can se...
SELECT count(*)
FROM (
  SELECT explode(mmycolumn)
  FROM table
  WHERE created_year = 2022 and created_month = 7 and created_day = 5
)
WHERE col.field is not null and col.field.field != "signal"
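Indexing and exploding do not count the same thing, which a plain-Python sketch can make concrete: `[0]` inspects only the first array element and counts each row at most once, while exploding yields one output row per array element and counts every element that passes the filter. The row data and function names below are hypothetical, and the predicate (`value1 == "s"`) follows the question rather than the reply.

```python
def is_match(obj, wanted):
    """True when the element's nested value is present and its value1 equals wanted."""
    return obj["value"] is not None and obj["value"]["value1"] == wanted


def count_first_element(rows, wanted):
    """Mimics object[0]: look only at the first element of each row's array."""
    return sum(1 for objects in rows if objects and is_match(objects[0], wanted))


def count_exploded(rows, wanted):
    """Mimics explode: flatten to one record per array element, then filter."""
    exploded = [obj for objects in rows for obj in objects]
    return sum(1 for obj in exploded if is_match(obj, wanted))


# Hypothetical rows; each row is the array held in one table row.
rows = [
    [{"value": {"value1": "s"}}, {"value": {"value1": "s"}}],  # both elements match
    [{"value": {"value1": "x"}}, {"value": {"value1": "s"}}],  # only the second matches
    [{"value": None}],                                         # nothing matches
]

first_count = count_first_element(rows, "s")
exploded_count = count_exploded(rows, "s")
print(first_count, exploded_count)
```

If the goal is "rows where any element matches" rather than "number of matching elements", the exploded result would still need to be reduced back to distinct rows.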
Hi, how can we convert each row of a dataframe to an array of rows? Here is our scenario: we need to pass each row of the dataframe to a function as a dict to apply key-level transformations. But as our data is very large, we can't use collect: df.toJSON().colle...
@Hubert Dudek, thank you for the reply. We are new to ADB and are using the code below; we are looking for an optimized way to do it:
dfJSONString = df.toJSON().collect()
stringList = []
for row in dfJSONString:
    # ==== Unflatten the JSON string ==== # js...
I have an array:
var arg = condColumnsKeys
with the elements
arg: Array[String] = Array(LOT_PREFIX, PS_NAME_BOOK_TEMPLATE_NAME, PS_NAME_PAGE_NAME, PS_NAME_FIELD_NAME)
The desired outcome is to get the string "LOT_PREFIX" and store it in var ccLotPrefix. My fir...