Trying to Flatten My Json using CosmosDB Spark connector - Azure Databricks

ImAbhishekTomar
New Contributor III

Hi,

The Cosmos DB query below produces the expected output. How can I achieve the same result with Spark SQL in Databricks?

COSMOSDB QUERY : SELECT c.ReportId, c.ReportName, i.price, p AS provider FROM c JOIN i IN c.in_network JOIN p IN i.provider

Source JSON

 {
     "ReportId":"F0001",
     "ReportName":"ALYX_HLT",
     "in_network":[
      {"provider":[1,2,3,4],"price":10},
      {"provider":[1004],"price":100.2},
      {"provider":[39,52],"price":3}
     ]
 }

Expected Output

 [
  { "ReportId":"F0001","ReportName":"ALYX_HLT","provider":100,"price":10},
  { "ReportId":"F0001","ReportName":"ALYX_HLT","provider":200,"price":10},
  { "ReportId":"F0001","ReportName":"ALYX_HLT","provider":300,"price":1.3},
  { "ReportId":"F0001","ReportName":"ALYX_HLT","provider":400,"price":23.1},
  { "ReportId":"F0001","ReportName":"ALYX_HLT","provider":500,"price":23.1}
 ]

https://docs.microsoft.com/en-us/answers/questions/821351/trying-to-flattren-my-json-using-cosmosdb-...

1 ACCEPTED SOLUTION

Hubert-Dudek
Esteemed Contributor III

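Hi @Abhishek Tomar, if you want to read the data from Cosmos DB, use the Spark connector with a custom query: https://github.com/Azure/azure-cosmosdb-spark

If you want to import the JSON directly with Databricks/Spark, use the query below. The inner query explodes in_network into one row per array element, and the outer query explodes each element's provider array: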

SELECT
  ReportId,
  ReportName,
  in_network.price AS price,
  explode(in_network.provider) AS provider
FROM (
  SELECT
    ReportId,
    ReportName,
    explode(in_network) AS in_network
  FROM
    my_json
);
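
To make the two explode steps concrete, here is a minimal plain-Python sketch of the same flattening logic, applied to the Source JSON from the question. It does not use Spark; the names `source` and `flatten` are hypothetical and only serve the illustration. The outer loop plays the role of the inner query's explode, and the inner loop plays the role of the outer query's explode.

```python
import json

# Sample document copied from the question's "Source JSON".
source = json.loads("""
{
    "ReportId": "F0001",
    "ReportName": "ALYX_HLT",
    "in_network": [
        {"provider": [1, 2, 3, 4], "price": 10},
        {"provider": [1004], "price": 100.2},
        {"provider": [39, 52], "price": 3}
    ]
}
""")


def flatten(doc):
    """Return one row per (in_network element, provider id) pair."""
    rows = []
    # First explode: one iteration per element of the in_network array.
    for entry in doc["in_network"]:
        # Second explode: one row per provider id inside that element.
        for provider in entry["provider"]:
            rows.append({
                "ReportId": doc["ReportId"],
                "ReportName": doc["ReportName"],
                "provider": provider,
                "price": entry["price"],
            })
    return rows


rows = flatten(source)
for row in rows:
    print(row)
```

For this sample document the sketch yields seven rows, one for each provider id across the three in_network elements, each carrying the price of the element it came from.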


Hi @Abhishek Tomar, please let us know whether @Hubert Dudek's answer helps. If it does not, we'll find another solution for you.
