Lowercase all values on a key for all documents in MongoDB

Background

Imagine the following document:

{
  "first_name": "John",
  "last_name": "Doe",
  "username": "doetheman"
}

Our team a document like this and a problem; our system is case-sensitive for the value of username. We did lowercase the value on creation and update, but it was in a specific edge case possible to create a user with uppercase letters in the username. We did fix that edge case by adding validation to the dto. It was probably something we should have done from the start but... oh well.

The problem was now how to lowercase the username for all the usernames on the live database who had at least one letter in uppercase.

Solution

There are two ways to do this, one performant way and one slow way. The performant (fast) way is only available for MongoDB v4.2 and above while the slow way is available for all versions of MongoDB.

The slow way (MongoDB v4.0 and below)

Find all documents, loop over them with .forEach and save them one at a time to the database.

db.users.find().forEach(
  function(user) {
    user.UserName = user.username.toLowerCase();
    db.users.save(user);
  }
)

You can speed up the update below by passing a match to .find() so it only finds documents that need to be updated. E.g.: db.users.find({username: /.*[A-Z]+.*/}). This expression will find all users for which the value of the key username contains at least one uppercase character. It will give you an impressive speed increase (just see this benchmark)

The fast way (MongoDB v4.2 and above)

Let the database do the "TheUserName" -> "theusername" transformation.

MongoDB has some built-in transformations you can use. One of them is $toLower, but there is also $toUpper and a lot of other operations you can do inside of an update.

My first try was to do the following:

// Does not work!
db.users.updateMany(
  {},
  { $set: { username: { $toLower: "$username" } } }
);

The reason above does not work is quite simple (if one reads the documentation that is...); $toLower is a Aggregation Pipeline Operator. It is thankfully quite easy to create a pipeline update, just pass the update inside an array.

db.users.updateMany(
  {},
  [ { $set: { username: { $toLower: "$username" } } } ]
);

Tada! You now have a fast and quick way to lowercase all values for a specified key.

I do however recommend you to limit the update with a regex so that only documents that need to be updated are updated. Below is an example.

db.users.updateMany(
  { username: /.*[A-Z]+.*/ },
  [ { $set: { username: { $toLower: "$username" } } } ]
);

Some bench-marking

I did a simple benchmark comparison between using and not using a regex filter.

Test condition

  • 1'000'000 users/documents
  • 5% of the users had a username with at least one uppercase letter
  • Same data for all tests
  • Tested each method 10 times

Result

MatchUpdate TypeAverage timeMedian time
{}Pipeline6.244 s6.249 s
{ username: /.*[A-Z]+.*/}Pipeline2.145 s2.134 s
{}.forEach109.989 s109.989 s
{ username: /.*[A-Z]+.*/}.forEach12.725. s12.687

I did not bother with running the match {} with forEach more than once which is why the average time and the median time are the same.


A warning regarding email as username

Email validation is tricky. For example "hello there"@domain.tld is a valid email, even though a lot of email providers won't let you register such an email. The RFC 5321 specification also states:

The local-part of a mailbox MUST BE treated as case sensitive.

Most (if not almost all?) big email providers do however not differentiate between Name.Name@domain.tld and name.name@domain.tld. So the local-part is in practice case-insensitive, but it is not guaranteed.

Not differentiating between uppercase and lowercase when it comes to email addresses might therefore not be something you want to do, depending on your situation.

You might instead want to create a case-insensitive index (available from MongoDB v3.4 and up). Just remember to also specify the same collation when you search for the user. E.g.:

// Create case-insensitive index
db.users.createIndex(
  { email: 1 },
  { collation: { locale: 'en', strength: 2 }, unique: true }
)

// Case-insensitive search which uses the index
db.users.find( { email: "localpart@domain.tld" } )
  .collation( { locale: 'en', strength: 2 } )

// Case-sensitive search which does not use the index
// => scans all documents
db.users.find( { email: "localpart@domain.tld" } )