npm v7 has been released, and it arrives with several highly requested features, including workspaces, automatic installation of peer dependencies, and package lockfile improvements.
In this article, we’re going to focus on the changes made to the package lockfiles.
Here we begin.
Motivation
Reproducible builds are an approach that ensures the same source code, build environment, and build instructions produce exact copies of all specified artifacts, verified with a bit-by-bit comparison. This means that a given source code must produce the same result deterministically, the build tools should be predefined, and the build process should validate that the output matches the original.
In terms of npm (and Yarn), reproducible builds guarantee that all teammates get the precise versions of all dependencies even when working on different machines, and so does the production environment. This is possible because these CLI tools manage “lock” files (which are designed to be committed to source control) that instruct them how to produce the precise node_modules tree.
The truth is that reproducible builds (and package-lock.json specifically) aren’t new and have been implemented since npm v5. So, the question remains: what actually changed?
Well, this is what we’re going to explain.
New Lockfile Format
npm v7 arrives with a newer version of the package-lock.json format, which reduces the need to read package.json files and carries enough information to reliably describe the full and precise package tree all by itself. More than that, the package tree described by the new lockfile is flattened, which is crucial for boosting performance.
Practically, this means that starting with v7 the file is generated with a new set of semantics:
On the left is a lockfile generated with npm v7 after installing React, whereas on the right is the one generated with v6.
First of all, the lockfileVersion field is an integer indicating which version of the lockfile format was used to generate the file. In the case of npm v7, the format version is 2, which belongs to the new lockfile format. It is important to note that v2 lockfiles are backward compatible with CLI versions that support v1 lockfiles (for example, npm v5 and v6).
Secondly, a field called packages was added, which maps each installed package, by its location, to an object containing all the needed information about that specific package. Of course, fields such as resolved, integrity, and link are still needed and included. The main change, though, is that in v2 the information is keyed by the package’s relative location and not just by the package name (as in v1). Notice that the root project is listed under the key "", and all dependencies are then listed by their paths relative to that root directory.
Thirdly, as mentioned, the new lockfile is backward compatible, so the legacy dependencies field is still included. This field, by the way, was used to map package information to the package name, and it remains in place for older CLI versions that don’t recognize the new packages field.
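To make the three fields concrete, here is a minimal sketch in Python that parses a hand-written, heavily trimmed lockfile in the v2 shape and reads it both ways. The sample content is illustrative only: the project name, the version numbers, and the resolved URL are example values, the integrity strings are placeholders, and the helper package_name is hypothetical.

```python
import json

# A hand-written, trimmed lockfile in the v2 shape described above.
# All values are illustrative; the integrity strings are placeholders.
SAMPLE_LOCKFILE = """
{
  "name": "example-project",
  "version": "1.0.0",
  "lockfileVersion": 2,
  "packages": {
    "": {
      "name": "example-project",
      "version": "1.0.0",
      "dependencies": { "react": "^17.0.1" }
    },
    "node_modules/react": {
      "version": "17.0.1",
      "resolved": "https://registry.npmjs.org/react/-/react-17.0.1.tgz",
      "integrity": "sha512-PLACEHOLDER"
    }
  },
  "dependencies": {
    "react": {
      "version": "17.0.1",
      "resolved": "https://registry.npmjs.org/react/-/react-17.0.1.tgz",
      "integrity": "sha512-PLACEHOLDER"
    }
  }
}
"""


def package_name(location: str) -> str:
    """Derive a package name from its location key (hypothetical helper).

    The name is whatever follows the last "node_modules/" segment, so
    nested and scoped packages are handled as well.
    """
    marker = "node_modules/"
    index = location.rfind(marker)
    return location[index + len(marker):] if index != -1 else location


lockfile = json.loads(SAMPLE_LOCKFILE)

# v2: packages are keyed by location; the root project uses the key "".
by_location = {
    location: info["version"]
    for location, info in lockfile["packages"].items()
}

# v1 (legacy): packages are keyed by name in the "dependencies" field.
by_name = {
    name: info["version"]
    for name, info in lockfile["dependencies"].items()
}

print(by_location)
print(by_name)
```

Keying by location is what lets the same package name appear more than once in the tree, for instance at node_modules/a/node_modules/c and at node_modules/c, without any ambiguity.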
And now we can say how reproducible builds are actually achieved: the lockfile is created in advance and committed to source control. For each package, this file deterministically records the resolved source as a URL to a tarball, along with its integrity value and the relative location where it is unpacked. Put simply, the v2 lockfile is sophisticated enough to allow deterministic and reproducible builds on its own, without gathering additional information from package.json.
Yarn’s Lockfile Support
Until now, yarn.lock files were completely ignored by npm’s CLI. The good news is that as of v7, if these files are available, they are used as a source of package metadata and resolution guidance.
In practice, the resolved values contained in Yarn’s lockfile tell the CLI where to fetch packages from, whereas integrity continues to be used to verify that the fetched artifact matches. On top of that, the yarn.lock file is also kept up to date when installing or removing packages with npm’s CLI.
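The integrity check mentioned above can be sketched in Python. This is an illustration rather than npm’s implementation: it assumes the integrity value is a single entry of the form algorithm-base64digest (for example, one starting with sha512-), the function names are hypothetical, and the sample bytes stand in for a real tarball that would be read from disk or the network.

```python
import base64
import hashlib


def compute_integrity(data: bytes, algorithm: str = "sha512") -> str:
    """Return '<algorithm>-<base64 digest>' for the given bytes."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Check the bytes against a single recorded integrity entry."""
    # The algorithm name is everything before the first "-".
    algorithm, _, _ = integrity.partition("-")
    return compute_integrity(data, algorithm) == integrity


# Stand-in for the bytes of a downloaded package tarball.
tarball = b"pretend these are tarball bytes"
recorded = compute_integrity(tarball)

print(matches_integrity(tarball, recorded))
print(matches_integrity(b"tampered bytes", recorded))
```

Because the digest is computed over the exact bytes of the artifact, any change to the tarball, however small, makes the comparison fail.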
Note that when a package-lock.json file exists, it is used as the authoritative definition of the resulting package tree. The yarn.lock file is supported mainly to provide better interoperability between npm and Yarn and to fill in missing information where necessary.
Another question arises: why doesn’t npm just rely on yarn.lock without managing a lockfile of its own? This is explained in detail on npm’s official blog, but let’s list the reasons in a nutshell:
- Yarn guarantees deterministic resolutions only for a single combination of a yarn.lock file and a specific CLI version, which means that different Yarn versions can produce different node_modules trees. In contrast, npm differentiates between deterministic resolution of dependencies and deterministic shape of the package tree.
- Yarn’s lockfile produces, in some cases, a tree with excessive duplication, which doesn’t allow npm to optimize the resulting tree.
- Locking down the resulting package tree shape inside the lockfile allows npm to support features such as --prefer-dedupe without breaking the ability to produce deterministic, reproducible builds.
- Yarn (and npm v5/v6) relies on package.json to build the package tree, whereas npm v7 needs only a package-lock.json generated in the v2 format.
So, to answer the question, the current implementation of Yarn’s lockfile doesn’t contain enough information to support npm’s complete functionality.
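The difference between deterministic resolutions and a deterministic tree shape can be illustrated with a small Python sketch. Both maps below mimic the packages field of a lockfile; the package names a, b, and c, their versions, and the helper names are all hypothetical.

```python
def package_name(location: str) -> str:
    """Name that follows the last "node_modules/" segment (hypothetical helper)."""
    marker = "node_modules/"
    index = location.rfind(marker)
    return location[index + len(marker):] if index != -1 else location


def resolutions(packages: dict) -> set:
    """Set of (name, version) pairs, ignoring where each copy is placed."""
    return {
        (package_name(location), info["version"])
        for location, info in packages.items()
        if location  # skip the root project, whose key is ""
    }


# Tree shape 1: both a and b carry their own nested copy of c.
nested = {
    "": {"version": "1.0.0"},
    "node_modules/a": {"version": "1.0.0"},
    "node_modules/a/node_modules/c": {"version": "2.0.0"},
    "node_modules/b": {"version": "1.0.0"},
    "node_modules/b/node_modules/c": {"version": "2.0.0"},
}

# Tree shape 2: a single copy of c is shared at the top level.
deduped = {
    "": {"version": "1.0.0"},
    "node_modules/a": {"version": "1.0.0"},
    "node_modules/b": {"version": "1.0.0"},
    "node_modules/c": {"version": "2.0.0"},
}

same_resolutions = resolutions(nested) == resolutions(deduped)
same_shape = set(nested) == set(deduped)
print(same_resolutions, same_shape)  # True False
```

Both trees resolve to exactly the same package versions, yet they differ on disk. A lockfile that records only resolutions cannot tell the two apart, whereas one keyed by location pins the shape too.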
Hidden Lockfile
As of v7, npm places a hidden lockfile inside node_modules, containing all the information about the package tree.
The purpose of this file is to avoid reading the entire tree repeatedly. It is only relevant, however, if it was written at the time of the most recent update of the package tree. In other words, if a different CLI modifies the tree afterwards, the references the file contains might be stale, and in that case the hidden lockfile is ignored.
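The staleness rule can be sketched in Python. This is a simplified illustration, not npm’s actual implementation: it assumes the hidden lockfile lives at node_modules/.package-lock.json and treats it as usable only when every package folder it lists exists and none of them was modified after the lockfile was written. The function name is hypothetical.

```python
import json
import os


def hidden_lockfile_is_usable(project_root: str) -> bool:
    """Return True if the hidden lockfile looks at least as fresh as the tree."""
    lock_path = os.path.join(project_root, "node_modules", ".package-lock.json")
    if not os.path.isfile(lock_path):
        return False

    with open(lock_path, encoding="utf-8") as handle:
        lockfile = json.load(handle)

    lock_mtime = os.path.getmtime(lock_path)
    for location in lockfile.get("packages", {}):
        if not location:
            continue  # an empty key would denote the root project itself
        folder = os.path.join(project_root, *location.split("/"))
        # A listed package that is missing on disk means the file is stale.
        if not os.path.isdir(folder):
            return False
        # A folder touched after the lockfile was written means some other
        # tool changed the tree, so the recorded data cannot be trusted.
        if os.path.getmtime(folder) > lock_mtime:
            return False
    return True
```

When the check fails, the safe fallback is to read the actual node_modules tree again, which is slower but always reflects what is on disk.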
Note that the hidden lockfile is ignored by npm v5/v6, since its lockfileVersion field is 3, which indicates that it is not backward compatible (i.e., it is a breaking change) with older CLI versions. This format version is expected to be used for all lockfiles in the future, once support for npm v6 ends.
Performance
We already mentioned that with the new lockfile the CLI produces a flattened package tree. We also said that the lockfile contains all the necessary information, which makes reads from package.json redundant. On top of that, it helps accelerate the fund command, which retrieves funding information. And, of course, there is the hidden lockfile, which can avoid repeated reads of the tree.
Well, together these lead to significant performance improvements:
This benchmark chart is taken directly from the CLI’s recent benchmark tooling. We can clearly see that npm v7 scores lower (which means faster) than v6 in most of the tests.
Summary
Today we introduced the lockfile changes made in npm v7, which continue to ensure deterministic, reproducible builds along with performance improvements.
Let’s recap:
- Reproducible builds ensure that the same source code, build environment, and build instructions produce the exact same artifacts, verified with a bit-by-bit comparison
- npm’s CLI v5/v6 (and Yarn) strive to guarantee deterministic, reproducible builds using a lockfile that resolves the precise package tree
- npm’s CLI v7 generates a newer version of the lockfile format (v2)
- The v2 lockfile has enough information to describe the precise package tree all by itself
- The v2 lockfile maps packages to their information by their location relative to the root (instead of by their name)
- The v2 lockfile is backward compatible with CLI versions v5/v6
- npm’s CLI v7 uses yarn.lock files, if available, as a source of package metadata and resolution guidance when information is missing, while package-lock.json remains the authoritative definition
- yarn.lock cannot completely replace npm’s lockfile, since the current implementation doesn’t have enough information for the complete npm functionality
- npm’s CLI v7 arrives with better performance because of:
  - The new lockfile format helping the CLI avoid reading from package.json
  - The new lockfile format helping to produce a flattened package tree
  - The new lockfile format helping to accelerate the fund command
  - The hidden lockfile placed inside node_modules helping to avoid repeated reads of the package tree
Here’s the example project: