Looking for brilliant yarn member who has first-hand knowledge of prior issues with symlinking modules #1761
Comments
@kittens Would you be a possible candidate to speak to this issue? Or know a more appropriate person? |
I've talked about this briefly in other places but haven't really given comprehensive reasoning as to the difficulties we faced with it. The truth is that the existing ecosystem does not cooperate very well when you start using symlinks.
Operating system differences
Symlinks are supported differently on various operating systems. In Windows they aren't allowed unless you're an administrator, for example. You can however use NTFS junctions that operate in a similar way but with the following restrictions:
If we want seamless Windows support then we'd need to impose restrictions on the development environment of Yarn users, and when the existing alternatives don't have these same restrictions it's hard to justify. Alternatively we could support both symlinks/junctions and the current flat version, but one of the big motivators behind Yarn is determinism, and having different ways of representing the files on disk that are distinguishable from one another goes against this. It'd also lead to an explosion in support since we'd be forking the workflow and internals to support symlink resolution. (In fact somewhere in the git history you'd find we once supported both of these installation methods.)
Tooling not supporting file system cycles
Tooling such as Jest would run into weird recursion errors when crawling the file system since symlinks allow cycles to appear, i.e. a nested directory referencing another included in its hierarchy. Jest is a lot better now and this is probably fixed, but it is a common problem that existing tools don't take into consideration.
Poor support for file watching
File watching across operating systems is already a massive issue, with there being a lot of inconsistencies and problems with normal files and folders. This issue is even more exacerbated when you take symlinks into consideration. Tools such as watchman don't support symlinks for specific. Tooling relying on
|
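To make the cycle problem above concrete, here is a minimal sketch of a directory crawler that follows symlinks but tracks the real path of every directory it enters, so a link pointing back at an ancestor cannot cause unbounded recursion. It is an illustration only, not how Jest or any other tool actually does it; `walkFollowingLinks` and the fixture layout are hypothetical names made up for this example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Recursively collect regular files under `dir`, following symlinks but
// refusing to enter any directory whose real path was already visited.
function walkFollowingLinks(dir, visited = new Set(), files = []) {
  const realDir = fs.realpathSync(dir);
  if (visited.has(realDir)) return files; // cycle or duplicate link: stop here
  visited.add(realDir);

  for (const name of fs.readdirSync(dir)) {
    const fullPath = path.join(dir, name);
    const stats = fs.statSync(fullPath); // statSync follows symlinks
    if (stats.isDirectory()) {
      walkFollowingLinks(fullPath, visited, files);
    } else if (stats.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}

// Hypothetical fixture: <root>/pkg/index.js plus <root>/pkg/loop -> <root>,
// which is a cycle. 'junction' only matters on Windows and is ignored elsewhere.
const cycleRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'cycle-demo-'));
fs.mkdirSync(path.join(cycleRoot, 'pkg'));
fs.writeFileSync(path.join(cycleRoot, 'pkg', 'index.js'), 'module.exports = 1;\n');
fs.symlinkSync(cycleRoot, path.join(cycleRoot, 'pkg', 'loop'), 'junction');

const foundFiles = walkFollowingLinks(cycleRoot);
console.log(foundFiles.map((file) => path.relative(cycleRoot, file)));
```

A crawler without the `visited` check would keep descending through `pkg/loop/pkg/loop/...` until it hit a path-length or stack limit, which is the kind of failure described above.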
Thanks for the wonderful comment, @kittens! ❤️
I'm just passing through and don't know anywhere near as much as @kittens does with regards to the topic of symlinks and their usage here, but thought I'd add some quick points. In the end, I think a filesystem that supports copy on write will make everything better. With a file system that uses copy on write, copies are "lazy". This means that creating a copy of a file doesn't actually copy the data, it simply makes the new file a pointer to the old one, similar to what you'd get if you hardlinked the file. A true copy (allocating space on disk for the file, and actually copying the bytes across) is only performed when you modify the file. The end result is that you get all the benefits of symlinking (faster installations, less disk usage) with none of the disadvantages (modifying a copy doesn't modify the original, and tooling just sees everything as regular files, since they are regular files). On Linux, you can use btrfs or zfs for this. Even with Yarn in its current state today, using btrfs should give you a nice performance boost due to the CoW semantics. However, neither Mac OS nor Windows have a good copy-on-write filesystem today. In #499, @dfreeman said:
Hardlinks solve some of the drawbacks that symlinks have, but they also have their own issues. Positive (advantages of hardlinks over symlinks):
Negative (or negative-ish):
We have a backlog item to investigate hardlinking (#499), nobody's actively working on that right now though.
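For readers less familiar with hardlinks, a small sketch of the behaviour that matters for a package manager: two names share one inode, so no data is copied, an in-place edit through one name is visible through the other, and deleting one name leaves the data reachable through the remaining one. The file names are hypothetical stand-ins for a cache entry and an installed file; this is not Yarn's code.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const linkDir = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const cachedFile = path.join(linkDir, 'cache-index.js');        // stands in for the cache copy
const installedFile = path.join(linkDir, 'installed-index.js'); // stands in for node_modules

fs.writeFileSync(cachedFile, 'original\n');
fs.linkSync(cachedFile, installedFile); // second name for the same data, nothing is copied

const cachedStat = fs.statSync(cachedFile);
const installedStat = fs.statSync(installedFile);
const sameInode = cachedStat.ino === installedStat.ino;
const linkCount = cachedStat.nlink; // number of names pointing at the data

// Writing in place through one name changes what the other name sees.
fs.writeFileSync(installedFile, 'patched\n');
const cachedAfterEdit = fs.readFileSync(cachedFile, 'utf8');

// Removing one name does not remove the data while another name remains.
fs.unlinkSync(cachedFile);
const installedAfterUnlink = fs.readFileSync(installedFile, 'utf8');

console.log({ sameInode, linkCount, cachedAfterEdit, installedAfterUnlink });
```

Note that an editor which saves by writing a new file and renaming it over the old one would break the link rather than modify the shared data, so the "edit leaks into the cache" effect depends on how the file is written.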
This is a great point! The fewer possible combinations there are, the easier debugging becomes. |
Sebastian, Thank you kindly for taking the time to explain your experience with symlinking. It's greatly appreciated.
OS File System Linking Differences
Regarding file system linking, I'm presuming If true, and given that FAT32 filesystems are also prevented from storing links when mounted to 'nixs, the only difference in linking behavior between the OS's that node supports, that would have a material impact in how Assuming the singular difference in linking behavior between OS's relevant to Given the one difference that would have had some impact, when you say:
..is my understanding correct then that the only imposed restriction would have been, on non Windows systems, yarn will not create local links to network shares to modules, ensuring that is behaved identically across all OS's? If that were something never required; i.e. a developer having local links to network shares for modules within a project being developed, would it be fair to say that, practically speaking, there really wouldn't be any issues that needed justification of restrictions to account for the difference in OS linking behavior in order to seamlessly support Windows? I ask only because the way in which I'm proposing to use symlinks, while not at all like how This was under Operating System Differences, but I couldn't understand how it was related to OSs?
When you say having different ways of representing the files on disk that are distinguishable in the context of determinism, are you implying that when I ask because I can't see a practical difference with respect to how The way in which I'm intending to use symlinks would not bind the folder structure in any way to any given folder's physical representation. I.e. regardless if any particular module folder was symlinked or a copy, and even if that characteristic of the folder varied from time to time, node would always deterministically walk the structure in exactly the same way, and see exactly the same content on every walk (to the degree node itself was looking for modules deterministically). If that were the case, would that qualify as deterministic enough to meet
Tooling not supporting file system cycles
If it could be guaranteed that it was physically impossible for cycles to exist in the organization of the dependency tree, as represented by the physical folder structure, even though every single module folder was a symlink, would the issue of tooling not supporting file system cycles be irrelevant and could be taken out of scope?
Poor support for file watching
While watching for changes of a project's source files is a commonplace thing, like how Tooling relying on
|
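One way to check the "same content on every walk" claim above is to fingerprint a `node_modules` tree while following links, and compare a copied tree against a symlinked one. The sketch below is only an illustration under that assumption; `treeFingerprint`, `demo-pkg`, and the folder layout are hypothetical, and it assumes the tree contains no cycles.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hash relative paths and file contents, following symlinks, so the result
// does not depend on whether a folder is a real directory or a link.
function treeFingerprint(root) {
  const hash = crypto.createHash('sha256');
  const visit = (dir, relative) => {
    for (const name of fs.readdirSync(dir).sort()) {
      const full = path.join(dir, name);
      const rel = relative ? `${relative}/${name}` : name;
      const stats = fs.statSync(full); // follows symlinks
      if (stats.isDirectory()) {
        hash.update(`dir:${rel}\n`);
        visit(full, rel);
      } else {
        hash.update(`file:${rel}:${stats.size}\n`);
        hash.update(fs.readFileSync(full));
      }
    }
  };
  visit(root, '');
  return hash.digest('hex');
}

// Hypothetical fixture: one stored package, installed once by copying and
// once by symlinking the whole folder.
const fixtureRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'fingerprint-demo-'));
const storePkg = path.join(fixtureRoot, 'store', 'demo-pkg');
fs.mkdirSync(storePkg, { recursive: true });
fs.writeFileSync(path.join(storePkg, 'index.js'), 'module.exports = 42;\n');

const copiedModules = path.join(fixtureRoot, 'project-copied', 'node_modules');
fs.mkdirSync(path.join(copiedModules, 'demo-pkg'), { recursive: true });
fs.copyFileSync(path.join(storePkg, 'index.js'), path.join(copiedModules, 'demo-pkg', 'index.js'));

const linkedModules = path.join(fixtureRoot, 'project-linked', 'node_modules');
fs.mkdirSync(linkedModules, { recursive: true });
fs.symlinkSync(storePkg, path.join(linkedModules, 'demo-pkg'), 'junction');

const copiedFingerprint = treeFingerprint(copiedModules);
const linkedFingerprint = treeFingerprint(linkedModules);
console.log(copiedFingerprint === linkedFingerprint);
```

This only shows that the visible content is identical; tools that call `lstat` or `realpath` can still tell the two layouts apart, which is the distinguishability concern raised earlier in the thread.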
@phestermcs - What do you think of the idea of using a copy-on-write filesystem to achieve the disk space reduction? That's something I've been thinking about a bit as well (as per my comment above) |
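As a reference point for the copy-on-write idea, Node's `fs.copyFileSync` accepts the `fs.constants.COPYFILE_FICLONE` flag, which asks the platform for a copy-on-write clone and falls back to an ordinary copy when the filesystem cannot provide one. The sketch below only demonstrates the semantics (editing the copy leaves the original untouched); whether a clone or a full copy happened depends on the filesystem, and the file names are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const cowDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cow-demo-'));
const sourceFile = path.join(cowDir, 'source.js');
const cloneFile = path.join(cowDir, 'clone.js');

fs.writeFileSync(sourceFile, 'original\n');

// Request a copy-on-write clone; Node falls back to a regular copy if the
// underlying filesystem does not support it.
fs.copyFileSync(sourceFile, cloneFile, fs.constants.COPYFILE_FICLONE);

// Unlike a hardlink, modifying the copy never affects the source.
fs.writeFileSync(cloneFile, 'patched\n');
const sourceAfterEdit = fs.readFileSync(sourceFile, 'utf8');
const cloneAfterEdit = fs.readFileSync(cloneFile, 'utf8');

console.log({ sourceAfterEdit, cloneAfterEdit });
```

`fs.constants.COPYFILE_FICLONE_FORCE` is the stricter variant: it fails instead of falling back, which can be useful for checking whether a given volume supports cloning at all.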
@Daniel15 What do I think? I think any solution that fixes this !@$# problem needs to be implemented IMMEDIATELY!!! hahaha. One of the great things of I have machines with all three major OS's (and others), and I've done lots of both OS agnostic and OS specific development on all of them. But I prefer Windows. I could of course use a VM to run linux within, but I like speed... like, a lot! (hence this issue). |
@Daniel15 I've spent a little time pondering your comments. I think using hard links could be a decent option. I was so locked on to linking the whole folder (but never its My approach requires a tiny change to Windows 8+ no longer requires a user have administrative rights to make hard links. So while that story is a little better, it's still not the best on Windows NT < 8. However, it's the more common case that developers are administrators of their machines, even in many enterprise IT departments, so still a pretty good story. And while creating hard links does require administrative rights on Win <8, it does not require running with elevated permissions on any Win version, contrary to creating symbolic links. So if a user has admin rights, The way I was thinking of using symlinks, would have come with the constraint that module folders were always read only (I see them logically like Which leads to one of the changes that could create issues with linking to a global location, and the application of the read-only constraint. That is that the If that issue does show itself to be a problem, only if even marginally, it would obviously need to be addressed. The first way is that you just don't use linking on a particular project; that fallback will always be available (although some might use the issue as justification to just not even provide a linking feature to begin with, as they'd fear it would confuse noobs (which frankly are already usually confused by things, so what's a little more confusion? giggle)). Another approach is that packages could flag themselves as Regarding COW, I'm just starting to dig into I'm going to run some tests to measure performance of hardlinking thousands of files, to see just how much faster it is than copying. If it's at least 4 or 5 times, I'm going to wrench I will say it's quite a bit nicer working in |
Finally took some initial measurements using react repo ~ 21k files. Ran the tests 3 times each, so here's their ballparks. SSD "Install" (482MBs seq, 30k iops rand)
SSD Delete
7200RPM HD "Install"
So hardlinking can be twice as fast on SSDs, and up to 4x as fast on HD. It's interesting that it takes about the same time regardless of whether it's SSD or HD. But clearly, --adjacent-node-modules with symlinking is way faster |
@phestermcs Thank you for pushing for this. In my opinion, slow installation is the current biggest problem with yarn (though it's still faster than npm). It makes sense to me that hardlinks would be the same performance on both SSD and HDD, because they both do the same amount of work: they simply increment the ref count, but don't do any actual copying. Symlinks are faster than hardlinks because you only need to create a single symlink, rather than creating multiple directories and multiple hardlinks. Although I am in favor of hard/soft links, there is one downside: if somebody modifies a file in Personally, I think people should not be doing that, instead they should use local packages if they want to make file modifications. So I see it as more of an education/documentation problem. |
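The reasoning above about why one symlink beats many hardlinks can be sketched by counting filesystem operations for each strategy. This deliberately counts calls rather than measuring time, and it is not how Yarn installs packages; `mirrorTree`, the three `installBy...` helpers, and the `demo-pkg` fixture are hypothetical names for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Recreate the directory structure of `src` at `dest`, placing each file
// with `placeFile`. Returns the number of mkdir + file operations performed.
function mirrorTree(src, dest, placeFile) {
  fs.mkdirSync(dest, { recursive: true });
  let operations = 1; // the mkdir above
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) {
      operations += mirrorTree(from, to, placeFile);
    } else {
      placeFile(from, to);
      operations += 1;
    }
  }
  return operations;
}

// Copy: one operation per directory and per file, and every byte is rewritten.
const installByCopy = (src, dest) => mirrorTree(src, dest, (from, to) => fs.copyFileSync(from, to));
// Hardlink: same number of operations, but no file data is written.
const installByHardlink = (src, dest) => mirrorTree(src, dest, (from, to) => fs.linkSync(from, to));
// Symlink: a single link for the whole package folder.
const installBySymlink = (src, dest) => {
  fs.mkdirSync(path.dirname(dest), { recursive: true });
  fs.symlinkSync(src, dest, 'junction');
  return 1;
};

// Hypothetical stored package with four files in two directories.
const installRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'install-demo-'));
const storedPkg = path.join(installRoot, 'store', 'demo-pkg');
fs.mkdirSync(path.join(storedPkg, 'lib'), { recursive: true });
fs.writeFileSync(path.join(storedPkg, 'package.json'), '{"name":"demo-pkg"}\n');
fs.writeFileSync(path.join(storedPkg, 'index.js'), "module.exports = 'demo';\n");
fs.writeFileSync(path.join(storedPkg, 'lib', 'a.js'), '// a\n');
fs.writeFileSync(path.join(storedPkg, 'lib', 'b.js'), '// b\n');

const target = (name) => path.join(installRoot, name, 'node_modules', 'demo-pkg');
const copyOps = installByCopy(storedPkg, target('by-copy'));
const hardlinkOps = installByHardlink(storedPkg, target('by-hardlink'));
const symlinkOps = installBySymlink(storedPkg, target('by-symlink'));

console.log({ copyOps, hardlinkOps, symlinkOps });
```

The copy and hardlink strategies grow with the number of files and directories in the package, while the symlink strategy stays at one operation per package regardless of its size.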
@Pauan yet, the ability to do that is a long-standing important part of node, and since node's |
I wouldn't necessarily characterize slow install times as a problem with any package manager in particular, as it's primarily a consequence of a constraint in I'm still surprised creating hardlinks took the same time between the SSD and HD, because in both cases the OS still had to physically write 21k entries (although understanding each entry was probably a handful of bytes or so) into the directory structures, which is not quite just simply a ref count increment. That just tells me the OS is spending more time doing things other than writing to the disk; it was just surprising that those things (cpu bound) were taking more time than actually writing to the disk (io bound); it's usually the other way around. I should clarify, symlinking module folders in the way I'm intending (adj-nm & symlink) is not currently possible with any version of As I mentioned in an earlier comment, any kind of linking from multiple projects to a machine wide store requires read-only access on the store, precisely to prevent a change in the store affecting multiple trees. But the way I'm intending to implement would still allow one to have most modules symlinked, and then selectively have copies of modules should they want to muck with in that particular project, while still allowing At the moment, I'm still on the fence about creating a branch of @ljharb For those that find it important to be able to change content under |
@ljharb I'm curious if changing content under In my own experience, I have on rare occasions tweaked an installed module while trying to understand some behavior. I've then gone on to some other thing, and when I come back to the project, I've forgotten I made a change, and then spend a fair amount of time (maybe 30 minutes) scratching my head when my stuff relying on that module doesn't seem to behave like I expect, until I slap my forehead and go "Oh that's right, I changed that module!", and then I reinstall to bring it back. So I typically really try to not do that, or limit doing it, or if I do to immediately put back to its original state. I can say, again just my own experience, that I've spent way, way, way more time waiting for a package manager to copy modules, or have to delete Just some advice, it's possible that a change to a locally installed module could be forgotten, and someone checks their stuff in that's relying on the changed module, thinking everything's working, and then another gets the project and installs the original modules, only to have things not working as expected. A problem similar to this is one of the reasons |
I do this too, and I wonder if this could be helped by a convenience command which copies the module to the project and links it? maybe too wacky... but, in my experience, editing node_modules directly is a foot gun 😆 not wanting to get offtopic here, but if making node-modules read only yields a 30x speed boost.. I'm in! |
For those interested, I'm creating an experimental branch of yarn, that will implement a new switch: This version will not use hardlinks, but rather will require a version of I chose not to use hardlinks, even though that approach doesn't require a change to That means for the meantime those interested in exploring 2 sec install times will have to use my forked branches of both I'm highly motivated to address this problem, and confident I have the technical wherewithal (fwiw, 35+ years building gobs of all sorts of software; I'll leave it at that) to change But the reality is this will take at least months, probably longer, not for any particular technical reason, but in just changing people's minds, both that the problem is real, the solution works and provides huge benefit, and the ecosystem can continue to run just fine and thrive. I'm doing this on the side, so it will be a few weeks before I have something for the initial brave few to step into and take for a spin. If you're interested in participating (and the more the better, so tell your friends), just comment on this issue. At some point in the future, I will mention you all on an issue within my |
Not a yarn member, and no plan to advertise my experimental 0 star repo, but I use symlink only approach on my package installer for myself and the project I work for my company. node_modules folder is really clean with only symlinks though.
|
@S-YOU Thanks for your input. I must clarify, the approach that I'm taking uses symlinks in a way that is completely impossible to do today. In order to make it possible, two changes are required in Currently, when a module's dependencies must be precisely installed, because one of those dependencies was also used somewhere else in the local tree, but @ a different version, the only physical way The and have it only take 2 seconds instead of 2 minutes. But I would be a fool to say there won't be any issues, and before I even created this issue I had already encountered and researched ways to address with a hacked version of We here have been using So, for those of you who feel anything like me, I'm going to make every attempt to fix this problem!! Your help and support would be appreciated, as clearly it will be an uphill battle for perception more than anything :). @S-YOU If you're interested in participating, I'll include in the list of those I notify of the first working version of |
I'm going to close this issue, as it's served its purpose, and at the moment it doesn't seem to be of much interest to the actual However, if you're interested in being notified when something's ready, please just comment on here. |
Symlinks work too with current node; here is the output of one of my module's node_modules folder
I would like to get notified, thanks. I'm not very sure whether I have the ability to participate in the project itself, though. |
@S-YOU Thanks for the input. Your example assumes that if you had multiple This is also one of the fundamental reasons This is also exacerbated by package managers bubbling common versions as far up to non conflicting ancestors as possible, which today is done to cut down on copies and shorten the depth (path name length) of the The other issues you ran into with bundlers, plugins, etc. were most likely a consequence of either you not running node with |
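The resolution behaviour this exchange turns on can be reproduced with a small experiment. By default Node resolves a symlinked module to its real path, so that module looks for its own dependencies starting from the real location (here, a store folder) rather than from the project that linked it; Node's `--preserve-symlinks` flag makes it start from the link's location instead. The sketch below is an assumption-laden illustration of that documented behaviour, not the approach proposed in this thread; `pkg-a`, `pkg-b`, and the folder layout are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { spawnSync } = require('node:child_process');

// Hypothetical layout:
//   <root>/store/pkg-a/index.js             requires 'pkg-b'
//   <root>/project/node_modules/pkg-a   ->  <root>/store/pkg-a (symlink)
//   <root>/project/node_modules/pkg-b/index.js
//   <root>/project/main.js                  requires 'pkg-a'
const resolveRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'symlink-resolve-'));
const storeA = path.join(resolveRoot, 'store', 'pkg-a');
const projectDir = path.join(resolveRoot, 'project');
const projectModules = path.join(projectDir, 'node_modules');

fs.mkdirSync(storeA, { recursive: true });
fs.mkdirSync(path.join(projectModules, 'pkg-b'), { recursive: true });
fs.writeFileSync(path.join(storeA, 'index.js'), "module.exports = require('pkg-b');\n");
fs.writeFileSync(path.join(projectModules, 'pkg-b', 'index.js'), "module.exports = 'pkg-b from project';\n");
fs.symlinkSync(storeA, path.join(projectModules, 'pkg-a'), 'junction');
const mainFile = path.join(projectDir, 'main.js');
fs.writeFileSync(mainFile, "console.log(require('pkg-a'));\n");

// Run main.js in a child Node process with the given command-line flags.
function runMain(nodeFlags) {
  const result = spawnSync(process.execPath, [...nodeFlags, mainFile], { encoding: 'utf8' });
  return { ok: result.status === 0, stdout: result.stdout.trim() };
}

// Default: pkg-a is loaded from its real path under store/, so 'pkg-b' is
// searched for above store/pkg-a and is not found.
const defaultRun = runMain([]);
// With --preserve-symlinks: pkg-a keeps its path under project/node_modules,
// so 'pkg-b' is found next to it.
const preservedRun = runMain(['--preserve-symlinks']);

console.log({ defaultRun, preservedRun });
```

The same stored `pkg-a` could therefore pick up different `pkg-b` versions in different projects only when resolution starts from the link's location, which is why the discussion keeps coming back to how node itself resolves symlinked folders.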
@S-YOU With the approach I'm taking
However, because a |
You are absolutely right on that case that my approach won't work, |
Because the version specifiers of dependencies in package.jsons are almost always some range of versions, and not specific versions, and so what version is actually installed (and without something like I just went to github.com/express/express/package.json:
the |
I see what you mean now: with my approach projectC will still use accepts 1.3.3 until a new version of express is released with accepts ~1.3.4. It won't break the module loading, but that was my design choice. |
That's right. If we appreciate version dependencies across and down an entire tree can change quite a bit, it's possible your application can have subtly changing behavior. Hence |
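To spell out the range arithmetic behind this exchange: a tilde range with a full version, such as `~1.3.3`, accepts any version with the same major and minor numbers and an equal or higher patch number. The sketch below is a simplified stand-in for a real semver library (it ignores pre-release tags and partial versions); `satisfiesTilde` is a hypothetical helper, and the version numbers are the ones mentioned in the comments above.

```javascript
// Parse "MAJOR.MINOR.PATCH" into three numbers. Pre-release and build
// metadata are intentionally not supported in this simplified sketch.
function parseVersion(version) {
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`unsupported version: ${version}`);
  }
  return version.split('.').map(Number);
}

// "~X.Y.Z" means >= X.Y.Z and < X.(Y+1).0.
function satisfiesTilde(version, range) {
  if (!range.startsWith('~')) {
    throw new Error(`not a tilde range: ${range}`);
  }
  const [major, minor, patch] = parseVersion(version);
  const [rangeMajor, rangeMinor, rangePatch] = parseVersion(range.slice(1));
  return major === rangeMajor && minor === rangeMinor && patch >= rangePatch;
}

// An installed 1.3.3 keeps satisfying ~1.3.3, and so would a later 1.3.4,
// which is why two installs of the same package.json can differ.
console.log(satisfiesTilde('1.3.3', '~1.3.3'));
console.log(satisfiesTilde('1.3.4', '~1.3.3'));
// Once the range itself moves to ~1.3.4, a shared 1.3.3 no longer qualifies.
console.log(satisfiesTilde('1.3.3', '~1.3.4'));
```

This is the gap between "what the range allows" and "what is actually on disk" that a lockfile pins down, and that a single shared copy per package name cannot represent for every project at once.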
Actually with your symlinking, projectA would end up using |
@S-YOU Not really knowing the environment you're doing this in, I would not recommend it if you're dealing with either several projects and/or several developers, and/or also symlinking on production servers.... I mean I just flat out wouldn't recommend. You're creating the potential for a nightmare situation.. but that's just my opinion... I'm just thinking about the world of hurt you might be setting yourself up for, that I would not want to be in. Things could be working on your dev machine, and you deploy to production, and several other things break, and then trying to put stuff back in way you know would work... I mean....gulp... be careful |
Thanks for the input, and yes, you are right and I am aware. The projects where I am using my approach are fully under my dedicated control, and I always use and test the latest version of all libraries, so it's possible at least for me. |
I'd like to get notified! Thanks for the mad science - love your work 💯 |
Some here may be interested in this |
@phestermcs I would like to be notified as well. I know there hasn't been a lot of vocal support in this issue (or in others), but I'm sure there are a lot of developers who have felt the pain of slow npm installations. They would appreciate faster install times, but they are unaware that these GitHub issues exist. |
@ptim @Pauan @Daniel15 @S-YOU I'm still trying to move the mountain. I have a new issue on nodejs/node that's simpler/shorter to consume & understand (I hope). I've since run their FWIW, I have the definite impression But tweaking a package manager will be just about the last thing I can do to show symlinks can work great without breaking things. It will take people like you (and your friends and coworkers) being vocal about the value to yarn, node, and wherever, to actually get the needed changes into a shipping version of node. Your continued support is greatly appreciated. |
@ghost I'm behind you 100% on this. Keep it up 👍 and let me know if something happens. |
Is there anywhere that the discussion formerly at https://github.com/yarnpkg/rfcs/issues/18 could still be viewed? It's linked from one of the Node PRs but apparently that repo has turned off issues (and GitHub retroactively hides existing issues when that happens, I guess). |
Attached a screenshot of it |
I think at this point we don't want to symlink/hardlink from cache. However Yarn should be more open to such experiments and allow third party code to override the linking phase with plugins, e.g. replace copy operations with linking, or JS copy commands with Native copy commands or some smarter hoisting algorithms. If anyone wants to lead this effort, speak up and send an RFC. |
Thanks for the screenshot! |
Yeah, this is a very annoying behaviour of GitHub. The RFC repo was never supposed to have issues enabled (RFCs are only submitted via pull requests), but issues were accidentally enabled in the beginning. We gave people time to create new PRs based on issues before disabling the issue tracker. It would have been nice for GitHub to keep read-only access to the existing issues. Oh well. |
In the integrationTest folder, use [`install-local`](https://www.npmjs.com/package/install-local) to install all stryker dependencies locally, without relying on npm link (which has [some issues](yarnpkg/yarn#1761 (comment))).
I was recently informed by @ thejameskyle that:
If you are a yarn member, who has first hand knowledge of what actually broke, and can technically explain why, I implore you to respond to this issue.
I believe I have found a solution, and my initial experiments indicate it is entirely viable. However, like thejameskyle, the only responses I seem to get as to why it wouldn't work at all, let alone on a broad scale, are entirely anecdotal.
Please, I'm desperate to find someone who knows what they are talking about, that has the technical acumen and understanding of node module resolution, as well as first hand knowledge of the issues yarn encountered, who can repudiate my solution by technical means of cause and effect, by first explaining what issues yarn encountered in its earlier attempts to exploit symlinking.
With my solution, one can have independent physical dependency tree renderings, where the specific module@versions used within a given tree remain specific to that particular physical tree, and can be locked, but where all modules across all physical trees on a given machine can all be symlinked to a single machine wide copy centrally stored, even when within each physical tree a common module@version is used in several trees, yet still resolves its specific dependency versions based on the physical tree it is used in, and where those dependency versions are slightly different between trees for whatever reason (but still within the semver range spec in package.json).
You're probably the right person if a) you understand exactly what I just wrote, and b) believe it's impossible.
Having this ability would mean modules no longer need to be copied all over the place, saving gigabytes of storage, and once centrally stored, all 'installs' could symlink all modules (but wouldn't have to), reducing install times from minutes to seconds.