mongo pipeline

Problem statement

Yesterday, I needed to query a map data structure that we store in Mongo without loading the documents and filtering them in memory. A sample document looks like this:

{
    "_id" : ObjectId("5f0c07a36aea9125527d80f1"),
    "firstName" : "Bruce",
    "lastName" : "Wayne",
    "fileInfoMap" : {
        "5a09fa70-1aa1-4c71-b938-64dde74eba79" : {
            "id" : "5a09fa70-1aa1-4c71-b938-64dde74eba79",
            "fileName" : "test",
            "fileType" : "image/jpeg"
        }
    }
}

What I wanted was the ability to go through the fileInfoMap and get all the records that have the fileType set to a type I pass in.


Stack Overflow and beyond

As many of us developers do, I resorted to Googling for a solution to this problem. Alas, I did not have much success, but the search did lead me to Mongo’s aggregation pipeline. Sometimes the answer lies in taking the time to read through the documentation, and I must say the Mongo documentation is pretty darn good.

After an hour or so of going through how the aggregation pipeline works and its idiosyncrasies, I finally got to my eureka moment! I loved it so much that I went ahead and created a sample project on GitHub to show how it works. Go check it out here.


The Solution

Here is the solution as it would be run on the mongo CLI. We will then break it down and go through what each stage does. Finally, we will see how to do this with Spring Data Mongo. Sounds good? Ok, let’s get into it!

db.getCollection('customer').aggregate([
    { '$project': { 'map': { '$objectToArray': '$fileInfoMap' }, 'firstName': 1, 'lastName': 1 } },
    { '$match': { 'map': { '$elemMatch': { 'v.fileType': 'image/jpeg' } } } },
    { '$project': { 'fileInfoMap': { '$arrayToObject': '$map' }, 'firstName': 1, 'lastName': 1 } }
])

Let us dissect the first stage of our pipeline:

{'$project': { "map": { "$objectToArray": "$fileInfoMap" }, 'firstName': 1, 'lastName': 1}}

This projection declares which fields we want to carry forward. For the fileInfoMap field from our problem statement, we use a built-in operator called $objectToArray, which transforms our map data structure into an array. This is what the output looks like after the first pipeline stage:

{
    "_id" : ObjectId("5f0c07a36aea9125527d80f1"),
    "firstName" : "Bruce",
    "lastName" : "Wayne",
    "map" : [ 
        {
            "k" : "5a09fa70-1aa1-4c71-b938-64dde74eba79",
            "v" : {
                "id" : "5a09fa70-1aa1-4c71-b938-64dde74eba79",
                "fileName" : "test",
                "fileType" : "image/jpeg"
            }
        }
    ]
}

As you can see, each key-value pair becomes one array element: the map key goes into the “k” attribute, and the object stored as the map value is nested inside the “v” attribute.
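For intuition, the shape change that $objectToArray performs is close to what Object.entries does in plain JavaScript. The sketch below is only an in-memory analogy, not how Mongo implements the operator, and the helper name objectToArray is made up for illustration.

```javascript
// Hypothetical helper: mimics the shape produced by $objectToArray, in memory.
function objectToArray(obj) {
  // Each key-value pair becomes one { k, v } element.
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// The fileInfoMap from the sample document above.
const fileInfoMap = {
  "5a09fa70-1aa1-4c71-b938-64dde74eba79": {
    id: "5a09fa70-1aa1-4c71-b938-64dde74eba79",
    fileName: "test",
    fileType: "image/jpeg",
  },
};

const map = objectToArray(fileInfoMap);
console.log(JSON.stringify(map, null, 2));
```

Printing map gives the same “k” and “v” layout as the stage output shown above.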

Next up, we have the match stage of our pipeline:

{'$match':  { 'map': { '$elemMatch': { 'v.fileType': 'image/jpeg' } }  }}

The $elemMatch operator checks the array elements and keeps a document when at least one element has the fileType I am looking for. Note that it selects whole documents; it does not remove the non-matching entries from the map.
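One detail worth spelling out is that the match stage keeps or drops whole documents. The plain JavaScript sketch below mimics that behaviour in memory with Array.prototype.some. The helper matchByFileType, the keys file-1 and file-2, and the PDF entry are all made up for illustration.

```javascript
// Hypothetical helper: mimics { map: { $elemMatch: { 'v.fileType': fileType } } } in memory.
function matchByFileType(docs, fileType) {
  // A document passes when at least one array element has the given fileType.
  return docs.filter((doc) => doc.map.some((entry) => entry.v.fileType === fileType));
}

// Made-up customer with two files, shaped like the output of the first stage.
const docs = [
  {
    firstName: "Bruce",
    lastName: "Wayne",
    map: [
      { k: "file-1", v: { id: "file-1", fileName: "test", fileType: "image/jpeg" } },
      { k: "file-2", v: { id: "file-2", fileName: "report", fileType: "application/pdf" } },
    ],
  },
];

const jpegOwners = matchByFileType(docs, "image/jpeg");
const pngOwners = matchByFileType(docs, "image/png");

console.log(jpegOwners.length);        // 1: the customer has a JPEG, so the document is kept
console.log(jpegOwners[0].map.length); // 2: the PDF entry is still there
console.log(pngOwners.length);         // 0: no entry matches, so the document is dropped
```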

Finally, we bring the data back into the form I want to present, which is done through another projection:

{'$project': { "fileInfoMap": { "$arrayToObject": "$map" }, 'firstName': 1, 'lastName': 1 }}

This results in the following output:

{
    "_id" : ObjectId("5f0d09ec905aa626276606a4"),
    "firstName" : "Bruce",
    "lastName" : "Wayne",
    "fileInfoMap" : {
        "e1c0f514-fb4a-47f4-8fcc-17ef6d96ab45" : {
            "id" : "e1c0f514-fb4a-47f4-8fcc-17ef6d96ab45",
            "fileName" : "test",
            "fileType" : "image/jpeg"
        }
    }
}
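If the goal is to return only the matching entries inside fileInfoMap, rather than every entry of a matching customer, one possible variant is to filter the array with $filter before converting it back. Treat this as an untested sketch: the variable names fileType and pipeline are made up, and it assumes every document has a fileInfoMap field. $filter, $objectToArray and $arrayToObject are standard aggregation operators.

```javascript
// Assumed input: the file type to look for (hypothetical variable name).
const fileType = "image/jpeg";

const pipeline = [
  // Convert the map to an array and keep only the entries with the requested fileType.
  {
    $project: {
      firstName: 1,
      lastName: 1,
      map: {
        $filter: {
          input: { $objectToArray: "$fileInfoMap" },
          as: "entry",
          cond: { $eq: ["$$entry.v.fileType", fileType] },
        },
      },
    },
  },
  // Drop customers that have no matching entries left.
  { $match: { map: { $ne: [] } } },
  // Convert the remaining entries back into an object.
  { $project: { firstName: 1, lastName: 1, fileInfoMap: { $arrayToObject: "$map" } } },
];

// In the mongo shell this would be run as:
// db.getCollection('customer').aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

The trade-off is that the filtering moves into the first projection, so the match stage only has to check that the filtered array is not empty.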

Now that we have covered how to do it via the mongo CLI, let’s see how we can achieve the same using Spring Data Mongo.


Mongo Aggregate with Spring Data Mongo

The MongoTemplate has a method that accepts a TypedAggregation. However, I could not find a way to express the $objectToArray and $arrayToObject operations using that API. So I resorted to the underlying driver’s aggregate method, which accepts a list of Bson stages, and ended up with the following. The full code can be found here.

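// Runs the same three stages through the MongoDB Java driver's collection API.
AggregateIterable<Document> customerAggregateIterable = mongoTemplate.getDb().getCollection("customer")
        .aggregate(
                Arrays.asList(
                        // Stage 1: turn the fileInfoMap object into an array of { k, v } pairs.
                        Aggregates.project(BsonDocument.parse("{ 'map': { '$objectToArray': '$fileInfoMap' }, 'firstName': 1, 'lastName': 1 }")),
                        // Stage 2: keep documents with at least one entry of the requested fileType.
                        // Filters (com.mongodb.client.model.Filters) takes the value directly, so fileType is not spliced into a JSON string.
                        Aggregates.match(Filters.elemMatch("map", Filters.eq("v.fileType", fileType))),
                        // Stage 3: turn the array back into an object under the original field name.
                        Aggregates.project(BsonDocument.parse("{ 'fileInfoMap': { '$arrayToObject': '$map' }, 'firstName': 1, 'lastName': 1 }"))
                ));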

That brings us to the end of this post.

Thank you to everyone who made it to the end. If you have any improvements or suggestions, please leave a comment or a GitHub issue on the repository.
