Downloading the entire NPM package list

This question is NPM-specific.
A few years ago I wrote a tool named qnp that downloads the entire list of npm packages and then executes local queries very fast, around 0.2 seconds per query. This makes it possible to do a very interesting study of the modern programming world: filtering by author names, descriptions, tags, and so on, running hundreds of queries, inspecting results, analyzing, getting ideas, running more queries. The official client is good, but it does not let you run very fast queries at the speed of thought. Here is my question:
About a year ago the location of NPM's registry metadata DB was abandoned; it now returns an empty file. How can I download/fetch the entire list of metadata now? I need at least these fields: title/author/description/keywords/date. Optionally also the download count, dependency list, and version.
Here is the code that was working previously:
var http = require('http')
var fs = require('fs')

var S = fs.createWriteStream('all.json')

http.get({
  host: 'registry.npmjs.org',
  path: '/-/all/static/all.json',
  headers: {
    'Accept-Encoding': 'gzip, deflate'
  }
}, function (res) {
  var done = 0
  var all = parseInt(res.headers['content-length'], 10)
  console.log('download started')
  res.on('data', function (chunk) {
    done += chunk.length
    process.stdout.write('\r' + (done / (all / 100)).toFixed(2) + '% ')
  })
  res.pipe(S)
  S.on('finish', function () {
    console.log('download complete')
    S.close()
  })
})

Since this post came up near the top when I searched for the answer, let me point out two packages that might be helpful for people landing on this old question:
https://github.com/nice-registry/all-the-package-names
https://github.com/bconnorwhite/all-package-names
Using the first one, I downloaded a list with 2,247,694 entries by running
pnpx all-the-package-names > ~/temp/all-the-package-names.txt
where pnpx is pnpm's equivalent of npx, the package runner that ships with npm (which itself is installed with Node.js).
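If you also need the metadata fields and not just the names, one option is to fetch the packument for each name from the public registry, which still serves per-package metadata at https://registry.npmjs.org/<name>. Below is a rough Node/TypeScript sketch; the output file name, the field selection, and the sequential fetching are my own choices, and fetching millions of packuments this way is slow, so treat it as a starting point rather than a finished tool:

import * as fs from 'fs';
import * as https from 'https';
import * as readline from 'readline';

// Fetch the packument for one package and keep only the fields we need.
function fetchMeta(name: string): Promise<object> {
  return new Promise((resolve, reject) => {
    https.get('https://registry.npmjs.org/' + encodeURIComponent(name), (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          const doc = JSON.parse(body);
          resolve({
            name: doc.name,
            description: doc.description,
            keywords: doc.keywords,
            author: doc.author,
            modified: doc.time && doc.time.modified
          });
        } catch (e) {
          reject(e);
        }
      });
    }).on('error', reject);
  });
}

// Stream the name list produced by all-the-package-names and write one
// JSON line per package, giving a local DB you can query instantly.
async function main() {
  const out = fs.createWriteStream('npm-metadata.jsonl');
  const rl = readline.createInterface({ input: fs.createReadStream('all-the-package-names.txt') });
  for await (const line of rl) {
    const name = line.trim();
    if (!name) continue;
    try {
      out.write(JSON.stringify(await fetchMeta(name)) + '\n');
    } catch (e) {
      // skip packages that fail to fetch or parse
    }
  }
  out.end();
}

main();

Once the JSONL file exists, grep or a few lines of code can filter it by author, description or keywords at the speed of thought.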


Detect the first time a VS Code extension version is loaded

I'd like to take an action the first time a user loads a new version of my VS Code extension. This is different from merely detecting first run, as described in How to run vscode extension command just right after installation?, because I don't want to detect "just right after installation"; I want to detect the first run of each new version, which is a totally different problem.
Mike Lischke's answer to that question doesn't actually answer that question; it answers this one. That doesn't make this a duplicate question, it means the response to the other question doesn't answer the question that was asked. And since, unlike many people, I actually read the question, I didn't bother reading the answers, because answers to that question are not what I seek. Frankly, I'm tempted to delete the question myself just to spite Stack Overflow, because I'm fed up with this crap. Do whatever you like.
Searching the net turned up sample code:
export function activate(context: vscode.ExtensionContext) {
  if (context.firstTimeUse) {
    // do the one-time-per-version-update thing
  }
}
but ExtensionContext doesn't seem to have this property, at least not any more.
So how do you do it now?
I could record the version in a file and compare to the file before updating it, but if there's baked in support I'd rather do it the supported way.
There is no supported mechanism.
Since each update gets a new folder, you don't need to log a timestamp; just probe for the file. If it exists, it's not the first run. If it doesn't exist, it is the first run, so create the file and do the other first-run things.
This is so simple and straightforward that there probably won't ever be a supported mechanism. Thanks to Lex Li in the comments for confirming that this is the standard solution.
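A minimal sketch of that approach, assuming the per-version install folder (context.extensionPath) is writable; the marker file name is my own choice:

import * as fs from 'fs';
import * as path from 'path';
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  // Each version of the extension installs into its own folder, so a marker
  // file in extensionPath exists only after the first run of this version.
  const marker = path.join(context.extensionPath, '.first-run-done');
  if (!fs.existsSync(marker)) {
    fs.writeFileSync(marker, '');
    // do the once-per-version work here
  }
}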
If you need to differentiate major, minor and maintenance releases, the simplest solution is to store the version string in context.globalState. Begin by trying to fetch it from context.globalState: absence means first run ever; if it's present, an exact match with the current version means no change, and for a non-match you can parse out and compare the major and minor version numbers.
const currentVersion = context.extension.packageJSON.version as string;
const lastVersion = context.globalState.get<string>("version") ?? "0.0.0";
if (lastVersion !== currentVersion) {
  logger.warn(`Updated to ${currentVersion}`);
  const lastVersionPart = lastVersion.split(".");
  const currVersionPart = currentVersion.split(".");
  if (lastVersionPart[0] !== currVersionPart[0]) {
    // major version change
    advertiseWalkthrough();
  } else if (lastVersionPart[1] !== currVersionPart[1]) {
    // minor version change
    launchWhatsNew();
  } else {
    // it's a maintenance version change, so don't pester the user
  }
  context.globalState.update("version", currentVersion);
}

Running manifests (classes) from a task or plan in Puppet Enterprise

TL;DR
In Puppet Enterprise, how do I run a manifest (testpp.pp) from a task or plan (not Bolt)?
plan base_windows::testplan (
  TargetSpec $targets,
  Optional[String] $contents = undef,
  String $filename,
){
  apply_prep($targets)
  $apply_results = apply($targets, '_catch_errors' => true) {
    class { 'base_windows::testpp': }
  }
  $apply_results.each |$result| {
    notice($result.report)
  }
}
apply_prep seems to succeed, but apply is failing with the following error:
{
  "msg" : "Evaluation Error: Unknown function: 'report'. (file: /opt/puppetlabs/server/data/orchestration-services/code/environments/development/modules/base_windows/plans/testplan.pp, line: 16, column: 19)",
  "kind" : "bolt/plan-failure",
  "details" : {
    "class" : "Bolt::PAL::PALError"
  }
}
If I change the code to:
plan base_windows::testplan (
  TargetSpec $targets,
  Optional[String] $contents = undef,
  String $filename,
){
  apply_prep($targets)
  $apply_results = apply($targets, '_catch_errors' => true) {
    # Is this how to call a class? I cannot find an example.
    class { 'base_windows::testpp': }
  }
  $apply_results.each |$result| {
    $target = $result.target.name
    if $result.ok {
      out::message("${target} returned a value: ${result.value}")
    } else {
      out::message("${target} errored with a message: ${result.error.message}")
    }
  }
}
The plan tells me it has failed, but there are no errors in the node's report. In fact, there is no entry for the time the plan was executed.
I cannot find any examples on how to call a class from a plan, so the above apply() is a guess, based on this documentation.
I have installed the puppetlabs_reboot module and successfully ran a plan using it; therefore, I conclude my system is set up correctly and it's just my code that is wrong.
Background
I may be going about this all wrong, so here is some background to the problem. Currently, I have a series of manifests that install various packages from the public Chocolatey repository depending on a node's classification. Package definitions are stored in Hiera data, and each package's version is set to latest. At the end of the package {} resource, some manifests include a reboot.
These manifests are used to provision new nodes and keep existing nodes up-to-date with the latest package version.
The Puppet agent is set to run once per hour, and if the source package is updated in the Chocolatey repo, the manifest will update the package on the next Puppet run, rebooting the node if required.
Goal
New nodes are provisioned with the latest package version.
Prevent package updates at undetermined times on existing nodes.
Continue to allow Puppet agent runs every hour.
Make use of existing manifests.
Ideas
Split out the package {} code from the profile manifests and place it in tasks or plans, allowing packages to be updated out-of-hours.
Specify the actual package version in Hiera. Although this is more declarative and idempotent, it means keeping an eye on over 100 package versions. I guess it would be fairly simple to interrogate the Chocolatey repos with code to pull the latest version number, but even so I am no better off.
Create a task with a script that runs choco upgrade all; however, the next Puppet run would revert package versions to whatever is defined in Hiera, meaning Hiera still needs to be kept up-to-date.
Problems
As per the main crux of this question, how do I run manifests (classes) from plans? If I understand correctly, tasks are for ad-hoc scripts, whereas plans can run tasks and manifests. As a lot of time has been invested in writing manifests, I would prefer not to rewrite them all as scripts.
I am confused by the Puppet documentation, as it seems to switch between PE and Bolt syntax. I am using Puppet Enterprise, where Puppet says they don't recommend using Bolt, yet their examples seem to cite Bolt commands.
There are no errors in the node's report. apply_prep() reports that it executed successfully (albeit taking far longer than the puppetlabs_reboot module), but apply() results in a failure and nothing is logged in the node's report.
Using the puppetlabs_reboot module as a reference, it appears their plan uses a bunch of tasks and that they don't use apply() to run their reboot {} class. Is this not duplicating the work?
If anyone has any suggestions or ideas, I'd be grateful if you could share.
I've got it to work. The class I was trying to run required parameters that I hadn't provided!
plan base_windows::testplan (
  TargetSpec $targets,
  Optional[String] $contents = undef,
  String $filename,
){
  apply_prep($targets)
  $apply_results = apply($targets, '_catch_errors' => true) {
    class { 'base_windows::testpp':
      filename => $filename,
      contents => $contents,
    }
  }
  # Output the whole result_set in the PE console
  return $apply_results
}
I found this out using the logs.
Turn on debug-level logging in /etc/puppetlabs/puppetserver/logback.xml (set the root logger to level="debug").
Tail the following logs:
tail -f /var/log/puppetlabs/bolt-server/bolt-server.log
tail -f /var/log/puppetlabs/puppetserver/puppetserver.log | grep -B 5 -A 5 'testplan'
tail -f /var/log/puppetlabs/orchestration-services/orchestration-services.log

Why does "npm search" provide incomplete results?

I've noticed that using the npm search command does not guarantee complete results. Here is an example:
$ npm search jasmine
does not list the jasmine-diff or jasmine-diff-reporter packages, while
$ npm search jasmine diff
does.
I've read the docs, and there is no mention of any incompleteness; indeed, they state
npm search performs a … search through package metadata for all files in the registry
I think this implies that search should be consistent and complete. And as one can see, the jasmine-diff-reporter package does have the term jasmine in its keywords.
Nor does it matter that the word jasmine is absent from its description, since other packages like jasmine-diff do have that word in the description and are still missing from the $ npm search jasmine output.
So could anyone explain this behavior and/or suggest a workaround (other than using Google or something like that)?
The problem is the new "fast endpoint search" for npm search that was implemented in commit https://github.com/npm/npm/commit/e3229324d507fda10ea9e94fd4de8a4ae5025c75. I have now registered a bug: https://github.com/npm/cli/issues/1211.
I investigated the npm scripts and found that the old search used the URL https://myNpmServer.com/repository/myNpmRegistry/-/all to get the package information, while the new search uses https://myNpmServer.com/repository/myNpmRegistry/-/v1/search?text=%2F.*%2F&size=20. The value "20" is hardcoded, but you can change it with the --searchlimit=N parameter of npm search, which is the simplest workaround.
The only problem is that you never know how big the search results will be. There is no value that means "infinity" (I tried passing -1 and it did not work). If you really need the full search, you can either give up on npm search and parse the JSON output of https://myNpmServer.com/repository/myNpmRegistry/-/all directly, or hack the file <NodeInstallationDir>/lib/node_modules/npm/lib/search.js and add your own parameter --oldsearch:
if (npm.config.get('oldsearch')) {
  allPackageSearch(searchOpts).on('data', function (pkg) {
    entriesStream.write(pkg)
  }).on('error', function (e) {
    entriesStream.emit('error', e)
  }).on('end', function () {
    entriesStream.end()
  })
} else {
  esearch(searchOpts).on('data', function (pkg) {
    entriesStream.write(pkg)
    !esearchWritten && (esearchWritten = true)
  }).on('error', function (e) {
    if (esearchWritten) {
      // If esearch errored after already starting output, we can't fall back.
      return entriesStream.emit('error', e)
    }
    log.warn('search', 'fast search endpoint errored. Using old search.')
    allPackageSearch(searchOpts).on('data', function (pkg) {
      entriesStream.write(pkg)
    }).on('error', function (e) {
      entriesStream.emit('error', e)
    }).on('end', function () {
      entriesStream.end()
    })
  }).on('end', function () {
    entriesStream.end()
  })
}
After that you can run npm search --oldsearch --registry ... '/regexp/' and it will really display all packages.
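Alternatively, instead of patching search.js, you can page through the v1 endpoint yourself using its from and size parameters. A rough Node/TypeScript sketch follows; the 250-per-page value and the loop bound are assumptions, and some registries cap how deep you can page, so verify against your own server:

import * as https from 'https';

// GET a URL and parse the body as JSON.
function getJson(url: string): Promise<any> {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try { resolve(JSON.parse(body)); } catch (e) { reject(e); }
      });
    }).on('error', reject);
  });
}

// Page through /-/v1/search until we have seen `total` results.
async function searchAll(registry: string, text: string): Promise<string[]> {
  const names: string[] = [];
  const size = 250; // assumed page-size cap; the default is 20
  for (let from = 0; ; from += size) {
    const page = await getJson(registry + '/-/v1/search?text=' +
      encodeURIComponent(text) + '&size=' + size + '&from=' + from);
    if (page.objects.length === 0) break; // stop if the server stops returning results
    for (const obj of page.objects) names.push(obj.package.name);
    if (from + size >= page.total) break;
  }
  return names;
}

searchAll('https://registry.npmjs.org', 'jasmine')
  .then((names) => console.log(names.length + ' matches'));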
ADDITIONAL NOTE (nice to know):
Be aware that while manipulating the .js scripts inside the Node installation (adding your own printouts, etc.) you can trigger the error message
npm ERR! invalid value written to input stream
After that, something gets broken and npm search stops working at all or displays very little output. To repair it, keep making changes until it fails again with the above message. Then on the next run (only once) you will see these messages:
npm WARN all-package-metadata cached-entry-stream Empty or invalid stream
npm WARN Failed to read search cache. Rebuilding
npm WARN Building the local index for the first time, please be patient
and it returns to a proper state again. I did not investigate further why this happens, and I did not find a way to force invalidation of this search cache.
I hope my investigations are helpful for somebody.

npm - managing vendors for distribution

I have been playing with Gulp.js and npm recently, and it's great. However, I don't really get the idea of npm as a package manager for packages that will get pushed out for distribution.
Let's go with an example.
I want to download the latest jquery, bootstrap and font-awesome so I can include them in my project. I could simply download the files from their websites and include them. Another option seems to be a package manager, i.e. NPM.
However, my node_modules directory is huge due to other packages such as gulp, and it isn't nested at all. What would be the easiest way to move selected packages to another dir, for example src/vendors/?
I was trying to achieve that with a gulp task that simply copies specified files from node_modules to a specified dir; in the long run, though, it's almost the same as copying the files manually, since I have to specify not only the input directory but also the output directory for every single package.
My current solution:
var vendors = {
  src: {
    jquery: 'node_modules/jquery/dist/**/*',
    bootstrap: 'node_modules/bootstrap/dist/**/*'
  },
  dist: {
    jquery: 'src/resources/vendors/jquery',
    bootstrap: 'src/resources/vendors/bootstrap'
  }
};

gulp.task('vendors', function() {
  var jquery = gulp.src(vendors.src.jquery)
    .pipe(gulp.dest(vendors.dist.jquery));
  var bootstrap = gulp.src(vendors.src.bootstrap)
    .pipe(gulp.dest(vendors.dist.bootstrap));
  return merge(jquery, bootstrap);
});
Is there an option to do it faster and/or better?
There's no need to explicitly specify the source and destination directory for each vendor library.
Remember, gulp is just JavaScript. That means you can use loops, arrays and whatever else JavaScript has to offer.
In your case you can simply maintain a list of vendor folder names, iterate over that list and construct a stream for each folder. Then use merge-stream to merge the streams:
var gulp = require('gulp');
var merge = require('merge-stream');

var vendors = ['jquery/dist', 'bootstrap/dist'];

gulp.task('vendors', function() {
  return merge(vendors.map(function(vendor) {
    return gulp.src('node_modules/' + vendor + '/**/*')
      .pipe(gulp.dest('src/resources/vendors/' + vendor.replace(/\/.*/, '')));
  }));
});
The only tricky part in the above is correctly figuring out the destination directory. We want everything in node_modules/jquery/dist to end up in src/resources/vendors/jquery and not in src/resources/vendors/jquery/dist, so we have to strip away everything after the first / using a regex.
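For example, the replace call maps each entry in the vendors array to its top-level folder name:

'jquery/dist'.replace(/\/.*/, '')    // 'jquery'
'bootstrap/dist'.replace(/\/.*/, '') // 'bootstrap'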
Now when you install a new library, you can just add it to the vendors array and run the task again.

ember data serialize embed records

Ember: 1.5.1
Ember Data: 1.0.0-beta.7.f87cba88
I need asymmetrical (de)serialization for one relationship type: sideloaded records when deserializing, embedded records when serializing.
I have asked for this in the standard way:
RailsEmberTest.PlanItemSerializer = DS.ActiveModelSerializer.extend(DS.EmbeddedRecordsMixin, {
  attrs: {
    completions: { serialize: 'records', deserialize: 'ids' } // was: embedded: 'always'
  }
});
However, it doesn't seem to work. Following the execution through, I find that at line 498 of Ember Data, the serializer decides whether or not to embed a relationship:
embed = attrs && attrs[key] && attrs[key].embedded === 'always';
At this stage, the attrs hash is well-formed, with completions containing the attributes as above. However, this line results in embed being false, and consequently the record is not embedded.
Overriding the value of embed to true makes it all hunky-dory.
Any ideas why Ember Data is ignoring the settings? I suspect that in my version the only option is embedded, and I need to upgrade to a later version to take advantage of the asymmetrical settings for serialize and deserialize.
However, given the manifold possible changes, I am fearful of upgrading!
I'd be very grateful for your advice.
Courtesy of the London Ember meetup, I now know that it was simply down to the version of Ember Data! Now upgraded to the latest beta with no trouble.