<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Ingest Cosmos Mongo DB data using Databricks by applying filters in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/ingest-cosmos-mongo-db-data-using-databricks-by-applying-filters/m-p/22802#M15685</link>
    <description>&lt;P&gt;Done&lt;/P&gt;</description>
    <pubDate>Thu, 01 Dec 2022 16:58:06 GMT</pubDate>
    <dc:creator>Swapnil1998</dc:creator>
    <dc:date>2022-12-01T16:58:06Z</dc:date>
    <item>
      <title>Ingest Cosmos Mongo DB data using Databricks by applying filters</title>
      <link>https://community.databricks.com/t5/data-engineering/ingest-cosmos-mongo-db-data-using-databricks-by-applying-filters/m-p/22798#M15681</link>
      <description>&lt;P&gt;I need to add a filter condition while ingesting data from a Cosmos Mongo DB using Databricks.&lt;/P&gt;&lt;P&gt;I am using the query below to ingest the data of a Cosmos collection:&lt;/P&gt;&lt;P&gt;&lt;B&gt;df = spark.read \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;.format('com.mongodb.spark.sql.DefaultSource') \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;.option('uri', sourceCosmosConnectionString) \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;.option('database', sourceCosmosDocument) \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;.option('collection', sourceCosmosCollection) \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;.load()&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How can I add a filter here so that only selected documents are read? For example, I only want to ingest documents where &lt;B&gt;{"type" : "student"}&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would really appreciate it if anyone could help with this.&lt;/P&gt;&lt;P&gt;I tried the query below, but it fails with the error shown after it:&lt;/P&gt;&lt;P&gt;&lt;B&gt;import json&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;query = {"type" : "student"}&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;df = spark.read \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;    .format('com.mongodb.spark.sql.DefaultSource') \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;    .option('uri', sourceCosmosConnectionString) \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;    .option('database', sourceCosmosDocument) \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;    .option('collection', sourceCosmosCollection) \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;    .option('pipeline', json.dumps(query)) \&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;    .load()&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Error:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0 failed 4 times, most recent failure: Lost task 0.3 in stage 16.0 (TID 34) (10.139.64.5 executor 0): com.mongodb.MongoCommandException: Command failed with error 40324 (40324): 'Unrecognized pipeline stage name: type' on server xxxxxxx-xxxxx.mongo.cosmos.azure.com:10255. The full response is {"ok": 0.0, "errmsg": "Unrecognized pipeline stage name: type", "code": 40324, "codeName": "40324"}&lt;/P&gt;</description>
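      <!--
The reply that resolved this question is not included in the feed, so what follows is only a minimal sketch of one likely fix, not the answer that was actually posted. The error in the question says the server read the key "type" as the name of a pipeline stage, which is what happens when a bare filter document is passed where an aggregation pipeline is expected. Wrapping the filter in a $match stage inside a list gives the server a valid pipeline. The helper names build_match_pipeline and read_filtered are hypothetical, and the 'pipeline' read option is assumed to behave as it does for the connector version used in the question.

```python
import json


def build_match_pipeline(filter_doc):
    """Wrap a plain filter document in a $match stage and serialise it.

    An aggregation pipeline is a list of stages, and each stage is a
    document keyed by an operator such as $match.
    """
    return json.dumps([{"$match": filter_doc}])


def read_filtered(spark, uri, database, collection, filter_doc):
    """Hypothetical helper: read only the documents matching filter_doc.

    Mirrors the reader options used in the question. It needs a live
    SparkSession with the MongoDB Spark connector installed.
    """
    return (
        spark.read
        .format("com.mongodb.spark.sql.DefaultSource")
        .option("uri", uri)
        .option("database", database)
        .option("collection", collection)
        .option("pipeline", build_match_pipeline(filter_doc))
        .load()
    )


# The pipeline string can be inspected without a Spark cluster.
pipeline = build_match_pipeline({"type": "student"})
print(pipeline)
```

In the notebook from the question, the call would look like df = read_filtered(spark, sourceCosmosConnectionString, sourceCosmosDocument, sourceCosmosCollection, {"type": "student"}). Because the filter is applied on the server, only matching documents are sent to Spark.
      -->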
      <pubDate>Fri, 11 Nov 2022 10:00:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingest-cosmos-mongo-db-data-using-databricks-by-applying-filters/m-p/22798#M15681</guid>
      <dc:creator>Swapnil1998</dc:creator>
      <dc:date>2022-11-11T10:00:47Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest Cosmos Mongo DB data using Databricks by applying filters</title>
      <link>https://community.databricks.com/t5/data-engineering/ingest-cosmos-mongo-db-data-using-databricks-by-applying-filters/m-p/22800#M15683</link>
      <description>&lt;P&gt;Hi &lt;B&gt;Kaniz Fatma,&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The above-mentioned query is working as expected.&lt;/P&gt;&lt;P&gt;Thanks a lot for the suggestion.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Swapnil&lt;/P&gt;</description>
      <pubDate>Mon, 14 Nov 2022 07:28:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingest-cosmos-mongo-db-data-using-databricks-by-applying-filters/m-p/22800#M15683</guid>
      <dc:creator>Swapnil1998</dc:creator>
      <dc:date>2022-11-14T07:28:08Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest Cosmos Mongo DB data using Databricks by applying filters</title>
      <link>https://community.databricks.com/t5/data-engineering/ingest-cosmos-mongo-db-data-using-databricks-by-applying-filters/m-p/22802#M15685</link>
      <description>&lt;P&gt;Done&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2022 16:58:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingest-cosmos-mongo-db-data-using-databricks-by-applying-filters/m-p/22802#M15685</guid>
      <dc:creator>Swapnil1998</dc:creator>
      <dc:date>2022-12-01T16:58:06Z</dc:date>
    </item>
  </channel>
</rss>

