Preface
Prior to this project, I didn’t know anything about Yarn or frontend build systems in general. Frontend “stuff” has never been my jam, but improving build times and developer productivity definitely is. With that in mind, please pardon my ignorance if I use some incorrect terminology throughout this article, and please let me know if any edits should be made to improve clarity!
Prior to my involvement, a few people had made multiple partial strides to help get this migration closer to the finish line, but it was clearly going to take some focused effort to connect the dots and get the monorepo to a point where the migration would be truly feasible. Despite the large hurdles in the way, the performance gains and other quality-of-life improvements for developers promised at the end of the migration were too good to leave on the table.
tl;dr
Switching over to Yarn Plug ‘n’ Play from node_modules is an arduous process in a monorepo. All the changes required to reach compatibility with Plug ‘n’ Play need to also support node_modules, as you can’t incrementally switch packages over to Plug ‘n’ Play; the entire repo has to use Plug ‘n’ Play in an all-or-nothing config change. So, you have to constantly switch your local config over to use the pnp nodeLinker, make sure things build and test successfully, then make sure they still work as intended with node_modules. Then, you merge your changes and rinse and repeat as you constantly keep up with package changes coming in from developers. Finally, once everything is working and building, you switch the repo’s nodeLinker over to pnp and hope for the best!
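For reference, that constant switching boils down to flipping a single setting in the repo’s .yarnrc.yml. A minimal sketch (the real file carries plenty of other settings) looks like this:

```yaml
# .yarnrc.yml — the one-line change at the heart of the migration.
# While packages were being prepped, the repo stayed on node_modules:
nodeLinker: node-modules

# ...and local testing (and, at the very end, the repo itself) flipped it to:
# nodeLinker: pnp
```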
Context and status quo
What is Yarn Plug’n’Play?
Yarn Plug’n’Play, hereby referenced as PnP, works by “[telling] Yarn to generate a single Node.js loader file in place of the typical node_modules folder. This loader file, named .pnp.cjs, contains all information about your project’s dependency tree, informing your tools as to the location of the packages on the disk and letting them know how to resolve require and import calls.”1
So, instead of resolving imports directly from the node_modules folder where packages have been fully decompressed and installed, .pnp.cjs contains what is essentially a map from a package reference to its location inside the zip file of that package within Yarn’s package cache. Here’s a snippet from berry/.pnp.cjs which might better visualize what’s happening:
["callsites", [\
["npm:3.0.0", {\
"packageLocation": "./.yarn/cache/callsites-npm-3.0.0-4966cb90f4-40e3cb2027.zip/node_modules/callsites/",\
"packageDependencies": [\
["callsites", "npm:3.0.0"]\
],\
"linkType": "HARD"\
}]\
]],\
When callsites is referenced, Yarn knows how to resolve the import from right within the zip file! No need to decompress the cached package and install it into a node_modules directory. Cool, but why would this benefit us?
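To make that a bit more concrete, here’s a minimal sketch of asking the PnP runtime to resolve a package the same way require would. It assumes PnP is active and that the script is run via yarn node so .pnp.cjs is preloaded; the pnpapi module only exists under PnP.

```js
// resolve-demo.js — run with `yarn node resolve-demo.js` under the pnp nodeLinker.
const pnpApi = require('pnpapi');

// Resolve "callsites" as if this file had required it. The resolved path points
// inside the zip archive in Yarn's cache rather than into a node_modules folder.
const resolved = pnpApi.resolveRequest('callsites', __filename);
console.log(resolved);
// e.g. (roughly) .yarn/cache/callsites-npm-3.0.0-4966cb90f4-40e3cb2027.zip/node_modules/callsites/index.js
```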
Why would we want to use PnP anyway?
Here’s a gist of what we were dealing with prior to the switch to PnP:
- A multi-language monorepo with many frontend services, each with its own package.json, plus a root package.json which points to all the workspaces within the repo.
- Most services used the same versions of common dependencies, but this wasn’t always guaranteed. In some services, outdated versions of webpack were being used which didn’t support PnP out of the box. There were also a bunch of other packages which required major version upgrades to get to versions which supported PnP.
- The local build environment is a Docker container running through Docker Desktop on macOS.
- Build time for one of the larger services would take approximately 13 minutes when using a bind mount from host to container for the container’s working tree. Build time could be noticeably decreased by using a volume instead of a bind mount for the working tree, but this introduced an intermediate “sync” step between host and container for the majority of developer operations (not just builds) in order to get the up-to-date working tree from the host into the container. One of the major downsides of this was slow container creation, resulting in many developers treating their containers as pets, not cattle2, understandably so.
- Building the “world” when using a bind mount would result in a node_modules directory of multiple GBs which would have to hop across the host-to-container boundary during the build, creating a massive file I/O bottleneck. The bind mount actually performed worse than the volume here, even with the volume sync step!
By default, we were using a volume for the working tree. We could keep this as-is, but the step of syncing the host’s working tree into the container volume was adding minutes to container creation time and multiple seconds to a lot of common operations. In general, it was a significant developer productivity hindrance.
However, removing that volume and therefore the sync step by using a bind mount instead was not serving developers any better. While container creation and common operations were now “syncless” and therefore gained a bunch of time savings, that time savings was completely eaten up by significantly slower build times.
File I/O across the bind mount simply couldn’t keep up with the build process, as multiple files were constantly being created/referenced/etc. within the host’s node_modules directory. If only there were a way to drastically reduce the amount of file I/O required during the build… what if the entire node_modules directory was replaced with a single file called something like, say, .pnp.cjs? Then, we might be able to use a bind mount, benefit from the performance gains of going syncless, and have speedy builds that aren’t bottlenecked by file I/O!
It was clear that PnP showed a lot of promise in our current setup. The potential developer experience and performance gains were too hard to pass up; we had to try to make this work.
Getting started
Before diving in, there were some guides available on the subject of moving from node_modules to PnP. Some of these guides were internal company runbooks that engineers had put together which outlined what package upgrades and config changes would be necessary in order to get our packages in a PnP-supportable state. This pre-work was invaluable, as I was able to hit the ground running and extend their efforts to cover the entire codebase.
Along with this internal help, Yarn itself provided some “recipes” to help with the transition. I tossed a message in the Yarn Discord looking for some more help before I went off and did my own thing:
szehnder — 08/12/2022 9:38 AM 👋 hello! I’m currently working on migrating a monorepo over to using PnP (using yarn v3 fyi). The Hybrid PnP + node_modules mono-repo recipe is helpful, but I’m wondering if there is a recommended path for a hybrid approach where PnP is opt-in, rather than the other way around like the above recipe suggests. Thanks in advance!
The gameplan
I didn’t hear back, so I had to do some thinking about how to go about doing this without causing downtime for our devs. If PnP can’t be opt-in, then the switch to PnP will need to happen in one fell swoop, which is much easier said than done! This is what I came up with:
szehnder — 08/29/2022 11:56 AM After some searching, it seems like there isn’t strong support for linking PnP-based packages to node-modules-based packages, which makes sense. I have instead opted to do the following, in case it helps anyone else:
- switch .yarnrc.yml over to nodeLinker:pnp
- attempt to yarn install and subsequently build atomic components within the monorepo with PnP
- after making necessary tweaks to successfully build, switch back to nodeLinker: node-modules and yarn install and build with the changes I made for PnP, just to verfy it works in node-modules world as well
- put up a diff for the atomic component, not adding .pnp.cjs and keeping the nodeLinker as node-modules
once all of these piecemeal changes have been made and committed, I’ll commit the nodeLinker:pnp change and the .pnp.cjs file, and we should ideally be able to make a somewhat painless switch over to PnP
I have written a helper script that toggles my environment between one that uses .pnp.cjs and node_modules/, so i am not spending a ton of time waiting for either to repopulate when switching my nodeLinker between the two
The process outlined above really did end up being the main workflow for handling the migration:
- run my pnp-toggle script (a rough sketch of which appears after this list) to switch to nodeLinker: pnp and move node_modules/ out of the way
- try and build a package
- resolve failures that pop up through a myriad of different fixes
- once it built with PnP, run pnp-toggle again and try and build it with nodeLinker: node-modules to see if it still worked
- merge the changes without switching the repo over to PnP as the nodeLinker
Hindsight?
Thinking more about this now, I wonder if the Yarn-provided recipe could be leveraged in a way where:

- the global nodeLinker is set to pnp
- every package explicitly sets its linker to node-modules in a local .yarnrc.yml
- every package gets added to pnpIgnorePatterns
- one-by-one, each package is opted in to PnP by:
  - removing its local .yarnrc.yml
  - removing it from pnpIgnorePatterns in the central .yarnrc.yml (a sketch of which follows this list)
  - getting it building with pnp
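Assuming that approach works the way I’ve described it (which, again, I didn’t fully verify), the central .yarnrc.yml might look something like this sketch, with made-up workspace paths:

```yaml
# Central .yarnrc.yml (sketch): PnP globally, with not-yet-migrated workspaces
# still resolved through node_modules via pnpIgnorePatterns.
nodeLinker: pnp

pnpIgnorePatterns:
  - ./services/legacy-dashboard/**   # illustrative path, not yet migrated
  - ./services/marketing-site/**     # remove entries as each workspace is opted in
```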
One thing that I’m not sure about is how opted-in and opted-out packages would interact with one another. Past self makes it sound like I investigated this and determined that linking PnP-based packages to node-modules-based packages wasn’t supported, but now I don’t remember if/why this is the case. That would be something to investigate further.
The experience
While tedious, the gameplan went off without a hitch. Most of the changes required to get packages building in PnP were simple situations where they needed to define their dependencies more explicitly. In cases like these, pretty much every modification was inherently compatible with the node-modules linker, since the properties of the packages weren’t changing; we were just being more explicit about what they needed.
The spiciest modifications required were those which weren’t inherently compatible with both nodeLinkers. An example of this would be when interfacing with supporting tools outside of the Yarn ecosystem which can’t utilize the PnP resolver. For example, various gulp.js workflows referenced filepaths within cached packages. A helper function had to be added to redirect references depending on the linker currently in use. Once the repo was fully migrated to PnP, the helper function was safely removed.
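The real helper was repo-specific, so treat the following as an illustrative sketch of its general shape rather than the actual code:

```js
// Locate a dependency's files on disk in a way that works under both linkers.
const path = require('path');

function packageDir(packageName) {
  if (process.versions.pnp) {
    // PnP is active: ask the PnP runtime where the package lives. The result
    // points inside Yarn's zip cache, surfaced through the zip filesystem layer.
    const pnpApi = require('pnpapi');
    return pnpApi.resolveToUnqualified(packageName, process.cwd() + '/');
  }
  // node-modules linker: fall back to the conventional on-disk layout.
  return path.join(process.cwd(), 'node_modules', packageName);
}

// e.g. pointing a gulp task at assets that ship inside a dependency:
// gulp.src(path.join(packageDir('some-ui-kit'), 'dist/**/*.css'))
```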
A grab-bag of other modifications required included:

- modifying third-party package dependencies through packageExtensions to add dependencies that they don’t explicitly call out themselves. When first-party packages weren’t defining all their dependencies, we could simply update their package.json with the missing dependencies; packageExtensions is how you handle this for third-party packages (an illustrative example follows this list).
- updating third-party packages to versions that support the pnp nodeLinker, or patching them so that they do. This ranged from minor upgrades which didn’t require any code changes, to things like upgrading major versions of webpack and therefore refactoring a package’s build.
- many other small tweaks that I don’t remember!
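For a sense of what the packageExtensions piece looks like in .yarnrc.yml, here’s a sketch with made-up package names (not ones we actually patched):

```yaml
# .yarnrc.yml (sketch): declare dependencies that a third-party package relies on
# but never lists in its own package.json, so the PnP resolver can find them.
packageExtensions:
  "some-legacy-plugin@*":
    dependencies:
      lodash: "^4.17.21"
    peerDependencies:
      webpack: "*"
```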
By following this process, devs were completely unaffected and didn’t notice any difference while packages were being prepped for PnP. The only thing affected was my MacBook’s CPU3, and periodically my sanity. There were definitely times where I felt like I was playing a serious game of whack-a-mole with packages that saw higher development velocity. In certain cases I had to redo the pnp-toggle process multiple times when a high-velocity package picked up a handful of dependency updates after I had already gone and made sure it built with pnp.
The results
Developers’ workflows weren’t broken by any of the changes made leading up to the migration, and the migration itself came and went without much fanfare at all. No news is good news!
That large package I mentioned earlier went from a 13 minute build time when utilizing a bind mount all the way down to 2.5 minutes! We can’t take all of the credit though; part of this performance gain came from a very timely Docker Desktop upgrade which introduced beta virtioFS functionality, which is a faster file I/O implementation. It has since become the default implementation in newer versions.
Also, now that we could efficiently use a bind mount, we could remove the sync step to get the host’s working tree into the container’s volume, and the volume itself. Now, our containers could be configured to be syncless! So not only did we get huge build performance gains out of this effort, but we also shaved multiple minutes off of container creation time, and dozens of seconds from many frequently run container-related commands as well. This was essentially the best-case scenario we had hoped for. The local development loop saw massive time-savings all around.
Post-mortem
The conservative gameplan that we executed really does seem like it was worth it in the end, even though there were times where it felt like we couldn’t keep up with constant changes to high velocity services.
A few thoughts on how to improve the process:
- It would have been ideal to get more teammates involved in making some of these changes. This was entirely my fault; I think I tend to shield teammates from “boring” or “tedious” work, which is how this work felt at times. That being said, it may have been beneficial to dedicate essentially one engineer’s full time to the effort and keep other engineers focused on other goals, as I got very efficient with the process and was able to turn packages around really quickly by the end of the effort.
- As mentioned in the hindsight section above, there may have been a better way to also accomplish a successful and uneventful migration that would have greatly reduced the whack-a-mole-ing I experienced during the migration. It probably isn’t feasible, but definitely worth looking into if you are finding yourself in a position of needing to do a similar migration.
- We should have made it clearer to developers that a good chunk of the performance improvement was due to using virtioFS. Since it was a beta feature within Docker Desktop at the time, it required manual enablement by the user. We announced this fact in many messaging and announcement streams, but there is always the chance that users happen to miss these. We should have made our tooling emit a warning or error if it detected that virtioFS wasn’t in use; this would have helped enable self-service. Instead, we had to tend to a small but steady stream of developers wondering why their builds weren’t as fast as their teammates’.
What about you?
Have others gone through a similar migration? Did you do things differently that resulted in a less painful process? I’m curious to hear about other experiences!
∴
Edits
- 2023-11-08: A misspelling and some punctuation fixes
Footnotes
- By the end of this effort, my MacBook CPU was legitimately cooked. Things started breaking in very weird ways, and diagnostics at the Genius Bar reported back that it had seen sustained high temperature events very frequently. 8 hours of compilation a day can do that I guess. ↩