Silas Zehnder

An informal recollection of migrating a monorepo from node_modules to Yarn Plug 'n' Play

2023-11-07

Preface

Prior to this project, I didn’t know anything about Yarn or frontend build systems in general. Frontend “stuff” has never been my jam, but improving build times and developer productivity definitely is. With that in mind, please pardon my ignorance if I use some incorrect terminology throughout this article, and please let me know if any edits should be made to improve clarity!

Prior to my involvement, a few people had made multiple partial strides to help get this migration closer to the finish line, but it was clearly going to take some focused effort to connect the dots and get the monorepo to a point where the migration would be truly feasible. Despite the large hurdles in the way, the performance gains and other quality-of-life improvements for developers promised at the end of the migration were too good to leave on the table.

tl;dr

Switching over to Yarn Plug’n’Play from node_modules is an arduous process in a monorepo. All the changes required to reach compatibility with Plug’n’Play also need to keep supporting node_modules, as you can’t incrementally switch packages over to Plug’n’Play; the entire repo has to adopt it in an all-or-nothing config change. So, you constantly switch your local config over to the pnp nodeLinker, make sure things build and test successfully, then make sure they still work as intended with node_modules. Then you merge your changes, and rinse and repeat as you keep up with package changes coming in from developers. Once everything is finally working and building, you switch the repo’s nodeLinker over to pnp and hope for the best!

Context and status quo

What is Yarn Plug’n’Play?

Yarn Plug’n’Play, hereafter referred to as PnP, works by “[telling] Yarn to generate a single Node.js loader file in place of the typical node_modules folder. This loader file, named .pnp.cjs, contains all information about your project’s dependency tree, informing your tools as to the location of the packages on the disk and letting them know how to resolve require and import calls.”1

So, instead of resolving imports directly from the node_modules folder where packages have been fully decompressed and installed, .pnp.cjs contains what is essentially a map from each package reference to its location inside that package’s zip file in Yarn’s cache. Here’s a snippet from berry/.pnp.cjs which might help visualize what’s happening:

    ["callsites", [\
      ["npm:3.0.0", {\
        "packageLocation": "./.yarn/cache/callsites-npm-3.0.0-4966cb90f4-40e3cb2027.zip/node_modules/callsites/",\
        "packageDependencies": [\
          ["callsites", "npm:3.0.0"]\
        ],\
        "linkType": "HARD"\
      }]\
    ]],\

When callsites is referenced, Yarn knows how to resolve the import from right within the zip file! No need to decompress and install the cache to the node_modules directory. Cool, but why would this benefit us?

Why would we want to use PnP anyway?

Here’s a gist of what we were dealing with prior to the switch to PnP:

  • A multi-language monorepo with many frontend services, each with its own package.json, plus a root package.json which points to all the workspaces within the repo.
  • Most services use the same versions of common dependencies, but this isn’t guaranteed. Some services were on outdated versions of webpack which didn’t support PnP out of the box, and a bunch of other packages required major version upgrades to reach versions which supported PnP.
  • The local build environment is a Docker container running through Docker Desktop on macOS.
  • Build time for one of the larger services was approximately 13 minutes when using a bind mount from host to container for the container’s working tree. Build time could be noticeably decreased by using a volume instead of a bind mount for the working tree, but this introduced an intermediate “sync” step between host and container for the majority of developer operations (not just builds) in order to get the up-to-date working tree from the host into the container. One of the major downsides of this is slow container creation, resulting in many developers treating their containers as pets, not cattle2, understandably so.
    • Building the “world” when using a bind mount would produce a node_modules directory of multiple GBs whose contents had to hop across the host-to-container boundary during the build, creating a massive file I/O bottleneck. The bind mount actually performed worse than the volume, even with the volume’s sync step! (A minimal sketch of the two mount styles follows this list.)
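To make the two setups concrete, here’s a minimal docker-compose-style sketch of the bind-mount and volume approaches described above; the image, service, and volume names are hypothetical rather than our actual config:

    services:
      # Bind mount: the host working tree is shared directly with the container.
      # No sync step, but every file operation crosses the host/container boundary.
      dev-bind-mount:
        image: our-dev-image:latest
        volumes:
          - ./:/workspace
      # Named volume: fast file I/O inside the container, but the host working
      # tree has to be synced into the volume before most operations.
      dev-volume:
        image: our-dev-image:latest
        volumes:
          - worktree:/workspace
    volumes:
      worktree: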

By default, we were using a volume for the working tree. We could keep this as-is, but the step of syncing the host’s working tree into the container volume was adding minutes to container creation time and multiple seconds to a lot of common operations. In general, it was a significant developer productivity hindrance.

However, removing that volume (and therefore the sync step) by using a bind mount instead was not serving developers any better. While container creation and common operations were now “syncless” and therefore much faster, those savings were completely eaten up by significantly slower build times.

File I/O across the bind mount simply couldn’t keep up with the build process, as files were constantly being created, referenced, and so on within the host’s node_modules directory. If only there were a way to drastically reduce the amount of file I/O required during the build… what if the entire node_modules directory were replaced with a single file called something like, say, .pnp.cjs? Then we might be able to use a bind mount, benefit from the syncless performance gains, and have speedy builds no longer bottlenecked by file I/O!

It was clear that PnP showed a lot of promise for our setup. The potential developer experience and performance gains were too good to pass up; we had to try to make this work.

Getting started

Before diving in, I leaned on some existing guides on the subject of moving from node_modules to PnP. Some of these were internal company runbooks that engineers had put together, outlining which package upgrades and config changes would be necessary to get our packages into a PnP-supportable state. This pre-work was invaluable, as I was able to hit the ground running and extend their efforts to cover the entire codebase.

Along with this internal help, Yarn itself provided some “recipes” to help with the transition. I tossed a message in the Yarn Discord looking for some more help before I went off and did my own thing:

szehnder — 08/12/2022 9:38 AM 👋 hello! I’m currently working on migrating a monorepo over to using PnP (using yarn v3 fyi). The Hybrid PnP + node_modules mono-repo recipe is helpful, but I’m wondering if there is a recommended path for a hybrid approach where PnP is opt-in, rather than the other way around like the above recipe suggests. Thanks in advance!

The gameplan

I didn’t hear back, so I had to do some thinking about how to go about doing this without causing downtime for our devs. If PnP can’t be opt-in, then the switch to PnP will need to happen in one fell swoop, which is much easier said than done! This is what I came up with:

szehnder — 08/29/2022 11:56 AM After some searching, it seems like there isn’t strong support for linking PnP-based packages to node-modules-based packages, which makes sense. I have instead opted to do the following, in case it helps anyone else:

  • switch .yarnrc.yml over to nodeLinker:pnp
  • attempt to yarn install and subsequently build atomic components within the monorepo with PnP
  • after making necessary tweaks to successfully build, switch back to nodeLinker: node-modules and yarn install and build with the changes I made for PnP, just to verify it works in node-modules world as well
  • put up a diff for the atomic component, not adding .pnp.cjs and keeping the nodeLinker as node-modules

once all of these piecemeal changes have been made and committed, I’ll commit the nodeLinker:pnp change and the .pnp.cjs file, and we should ideally be able to make a somewhat painless switch over to PnP

I have written a helper script that toggles my environment between one that uses .pnp.cjs and node_modules/, so i am not spending a ton of time waiting for either to repopulate when switching my nodeLinker between the two

The process outlined above really did end up being the main workflow for handling the migration:

  • run my pnp-toggle script to switch to nodeLinker: pnp and move node_modules/ out of the way (the config flip it performs is sketched after this list)
  • try to build a package
  • resolve the failures that popped up through a myriad of different fixes
  • once the package built with PnP, run pnp-toggle again and try to build it with nodeLinker: node-modules to see if it still worked
  • merge the changes without switching the repo over to PnP as the nodeLinker
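For reference, the config change that pnp-toggle flips back and forth is essentially a single setting in the root .yarnrc.yml (the script also moves node_modules/ and the PnP files out of the way so that neither has to repopulate on each switch). While packages were being prepped, the checked-in config stayed on the node-modules linker:

    nodeLinker: node-modules

While testing a package locally, the script switched my copy over to PnP:

    nodeLinker: pnp

Only the relevant setting is shown here.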

Hindsight?

Thinking more about this now, I wonder if the Yarn-provided recipe could be leveraged in a way where (a rough .yarnrc.yml sketch follows the list):

  • the global nodeLinker is set to pnp
  • every package explicitly sets its linker to node-modules
  • every package gets added to pnpIgnorePatterns
  • one-by-one, each package is opted in to PnP by
    • removing its local .yarnrc.yml
    • removing it from pnpIgnorePatterns in the central .yarnrc.yml
    • getting it building with PnP
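Here’s a rough sketch of what the central .yarnrc.yml might look like under this approach; the workspace paths are hypothetical, and entries would be removed one at a time as packages are migrated:

    # PnP is the project-wide default...
    nodeLinker: pnp
    # ...but anything matching these patterns keeps using the classic Node.js
    # resolution instead of the PnP resolver until it's individually migrated.
    pnpIgnorePatterns:
      - ./services/legacy-app/**
      - ./services/another-app/**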

One thing that I’m not sure about is how opted-in and opted-out packages would interact with one another. Past self makes it sound like I investigated this and determined that linking PnP-based packages to node-modules-based packages wasn’t supported, but now I don’t remember if/why this is the case. That would be something to investigate further.

The experience

While tedious, the gameplan went off without a hitch. Most of the changes required to get packages building in PnP were simple: the packages just needed to define their dependencies more explicitly. In cases like these, pretty much every modification was inherently compatible with the node-modules linker, since the properties of the packages weren’t changing; we were just being more explicit about what they needed.

The spiciest modifications were those which weren’t inherently compatible with both nodeLinkers, typically where we interfaced with supporting tools outside of the Yarn ecosystem which can’t utilize the PnP resolver. For example, various gulp.js workflows referenced filepaths within cached packages; a helper function had to be added to redirect those references depending on the linker currently in use. Once the repo was fully migrated to PnP, the helper function was safely removed.

A grab-bag of other required modifications included:

  • modifying third-party package dependencies through packageExtensions to add dependencies that they don’t explicitly declare themselves. When first-party packages weren’t defining all their dependencies, we could simply update their package.json with the missing dependencies; packageExtensions is how you handle the same problem for third-party packages (a hypothetical example follows this list).
  • updating third-party packages to versions that support the pnp nodeLinker, or patching them so that they do. This ranged from minor upgrades which didn’t require any code changes, to things like upgrading major versions of webpack and therefore refactoring a package’s build.
  • many other small tweaks that I don’t remember!
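To illustrate the packageExtensions fix mentioned in the list above, here’s a sketch of what such an entry in the root .yarnrc.yml looks like; the package name and version here are made up:

    packageExtensions:
      # Hypothetical example: declare that some-third-party-pkg depends on lodash
      # even though its own package.json never lists it, so PnP's stricter
      # resolution will let the require succeed.
      "some-third-party-pkg@*":
        dependencies:
          lodash: "^4.17.21"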

By following this process, devs were completely unaffected and didn’t notice any difference while packages were being prepped for PnP. The only things affected were my MacBook’s CPU3 and, periodically, my sanity. There were definitely times when I felt like I was playing a serious game of whack-a-mole with packages under heavier development. In certain cases I had to redo the pnp-toggle process multiple times, when a high-velocity package picked up a handful of dependency updates after I had already gone and made sure it built with PnP.

The results

Developers’ workflows weren’t broken by any of the changes made leading up to the migration, and the migration itself came and went without much fanfare at all. No news is good news!

That larger service I mentioned earlier went from a 13-minute build time when utilizing a bind mount all the way down to 2.5 minutes! We can’t take all of the credit, though; part of this performance gain came from a very timely Docker Desktop upgrade which introduced beta virtioFS functionality, a faster file I/O implementation. It has since become the default implementation in newer versions.

Also, now that we could efficiently use a bind mount, we could remove the sync step to get the host’s working tree into the container’s volume, and the volume itself. Now, our containers could be configured to be syncless! So not only did we get huge build performance gains out of this effort, but we also shaved multiple minutes off of container creation time, and dozens of seconds from many frequently run container-related commands as well. This was essentially the best-case scenario we had hoped for. The local development loop saw massive time-savings all around.

Post-mortem

The conservative gameplan that we executed really does seem like it was worth it in the end, even though there were times when it felt like we couldn’t keep up with constant changes to high-velocity services.

A few thoughts on how to improve the process:

  • It would have been ideal to get more teammates involved in making some of these changes. This was entirely my fault; I think I tend to shield teammates from work that feels “boring” or “tedious”, as this sometimes did. That being said, it may have been beneficial to dedicate essentially one engineer’s full time to the effort and keep other engineers focused on other goals, as I got very efficient with the process and was able to turn around packages really quickly by the end of the effort.
  • As mentioned in the hindsight section above, there may have been a better way to accomplish an equally successful and uneventful migration while greatly reducing the whack-a-mole-ing I experienced. It probably isn’t feasible, but it’s definitely worth looking into if you find yourself needing to do a similar migration.
  • We should have made it clearer to developers that a good chunk of the performance improvement was due to using virtioFS. Since it was a beta feature within Docker Desktop at the time, it required manual enablement by the user. We announced this fact in many messaging and announcement streams, but there is always the chance that users miss these. We should have made our tooling emit a warning or error if it detected that virtioFS wasn’t in use; this would have helped enable self-service. Instead, we had to tend to a small but steady stream of developers wondering why their builds weren’t as fast as their teammates’.

What about you?

Have others gone through a similar migration? Did you do things differently that resulted in a less painful process? I’m curious to hear about other experiences!


Edits

  • 2023-11-08: A misspelling and some punctuation fixes

Footnotes

  1. https://yarnpkg.com/features/pnp#how-does-it-work 

  2. The history of pets vs cattle 

  3. By the end of this effort, my MacBook CPU was legitimately cooked. Things started breaking in very weird ways, and diagnostics at the Genius Bar reported back that it had seen sustained high temperature events very frequently. 8 hours of compilation a day can do that I guess.