How to limit the number of calls in Express.js? - express

I'm using Express to show the results of some web scraping done with Puppeteer, but I'm having a performance issue.
I call the scraper file several times because I want to get multiple results at once.
For instance:
const express = require('express')
const app = express()
const scraper = require('./scrapers/scraper.js');
app.get('/getResults', function(req, res, next) {
  // the same url is reused for every call in this example
  const url = 'http://www.example.com';
  const val1 = new Promise((resolve, reject) => {
    scraper
      .getPrice(results, url, nights)
      .then(data => {
        resolve(data)
      })
      .catch(err => reject('Medium scrape failed'))
  })
  const val2 = new Promise((resolve, reject) => {
    scraper
      .getPrice(results, url, nights)
      .then(data => {
        resolve(data)
      })
      .catch(err => reject('Medium scrape failed'))
  })
  const val3 = new Promise((resolve, reject) => {
    scraper
      .getPrice(results, url, nights)
      .then(data => {
        resolve(data)
      })
      .catch(err => reject('Medium scrape failed'))
  })
  const val4 = new Promise((resolve, reject) => {
    scraper
      .getPrice(results, url, nights)
      .then(data => {
        resolve(data)
      })
      .catch(err => reject('Medium scrape failed'))
  })
  // all four scrapes are already running at this point
  Promise.all([val1, val2, val3, val4])
    .then(data => {
      console.log(data)
    })
    .catch(err => res.status(500).send(err))
})
The code above will call the scraper.js file 4 times at once, but what should I do in order to call each one once the previous one is done? I mean, when val1 is completed, it should run val2 and so on.
In fact, my code calls the scraper file 18 times, which is bad for the computer's performance: Puppeteer is based on Chromium, and it literally opens 18 new Chromium instances at once.
I even get this error when I run it:
(node:26600) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 exit listeners added. Use emitter.setMaxListeners() to increase limit

async await
You can write your code with async await. The fun thing is, you can handle all errors and the value is returned automagically with promises.
app.get('/getResults', async function(req, res, next) { //<-- notice the async here
try{
const val1 = await scraper.getPrice(results, url, nights)
const val2 = await scraper.getPrice(results, url, nights)
const val3 = await scraper.getPrice(results, url, nights)
const val4 = await scraper.getPrice(results, url, nights)
return res.send([val1, val2, val3, val4])
} catch(err) {
res.status(500).send(err)
}
})
p-limit
You can use a package called p-limit, which runs multiple promise-returning and async functions with limited concurrency. (p-limit 4 and later are ESM-only, so the require() call below needs version 3 or earlier.)
const pLimit = require('p-limit');
const limit = pLimit(1);
const input = [
limit(() => scraper.getPrice(results, url, nights)),
limit(() => scraper.getPrice(results, url, nights)),
limit(() => scraper.getPrice(results, url, nights))
];
(async () => {
// Only one promise is run at once
const result = await Promise.all(input);
console.log(result);
})();
for..of loop
You can tidy this up and reduce code duplication. With async..await and for..of, the code gets even shorter (the loop has to run inside an async function):
// assuming you have these urls
const urls = [
'http://example.com', 'http://example.com', 'http://example.com'
];
const results = []
for(let url of urls){
const data = await scraper.getPrice(results, url, nights);
results.push(data)
}
console.log(results)

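Promises can also be chained so that they run one after another. Each step has to be a function that starts the work when it is called; a promise that already exists (such as val1 in the question) is already running. Here getVal1 and getVal2 stand for functions that wrap each scraper call:
getVal1().then(v1 => getVal2()).then(v2 => {...})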
You should open a new Chromium tab, not an instance. (Did you just confuse concepts?)
Most importantly, you need to manage the download processes better. A queue is the best fit here. It can be simple (making sure that no more than n processes run at once) or more advanced (monitoring the server's resources).
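A simple queue of the first kind could look roughly like the sketch below. It is only an illustration: createQueue and fakeScrape are made-up names, and fakeScrape stands in for the real scraper.getPrice() call so the snippet runs on its own.

```javascript
// Minimal promise queue: runs at most `limit` tasks at the same time.
// `createQueue` and `fakeScrape` are hypothetical names used for illustration.
function createQueue(limit) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    // Promise.resolve().then(task) also turns a synchronous throw into a rejection
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // a slot is free, start the next waiting task (if any)
      });
  };

  // `task` must be a function that starts the work when it is called
  return function enqueue(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}

// Stand-in for scraper.getPrice(); resolves after a short delay
const fakeScrape = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`price for ${url}`), 10));

const enqueue = createQueue(2); // no more than 2 scrapes at once
const urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'];
const allPrices = Promise.all(urls.map((url) => enqueue(() => fakeScrape(url))));
allPrices.then((prices) => console.log(prices));
```

With a limit of 1 this behaves like the sequential versions above; a higher limit trades memory for speed.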
You may be able to find a package for this. If nothing fits, remember to handle the case where something goes wrong and Node does not notice that the process has ended.
I use these two methods interchangeably:
Flag the URL as being downloaded; if it is not retrieved within a given time, it returns to the queue. (More specifically: store when to re-download the URL. While downloading that is +1 minute, after a successful download it is e.g. +1 month.)
Save the PID of the download process and check periodically that it is still running.
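The first method (a re-download timestamp per URL) can be sketched as follows. Everything here is an assumption made for illustration: the names createUrlQueue, LEASE_MS and REFRESH_MS are invented, the state lives in memory, and the clock is injectable so the example is deterministic.

```javascript
// Sketch of a URL queue where every URL carries a "next fetch at" timestamp.
// All names are hypothetical; a real setup would persist this state.
const LEASE_MS = 60 * 1000;                  // +1 minute while a download is in progress
const REFRESH_MS = 30 * 24 * 60 * 60 * 1000; // roughly 1 month after a successful download

function createUrlQueue(urlList, now = Date.now) {
  const nextFetchAt = new Map(urlList.map((url) => [url, 0]));
  return {
    // Return a URL that is due and push its timestamp forward by the lease,
    // so nobody else picks it up while it is being downloaded.
    take() {
      const t = now();
      for (const [url, due] of nextFetchAt) {
        if (due <= t) {
          nextFetchAt.set(url, t + LEASE_MS);
          return url;
        }
      }
      return null; // nothing is due right now
    },
    // Call this after a successful download.
    done(url) {
      nextFetchAt.set(url, now() + REFRESH_MS);
    },
  };
}

// Usage with a fake clock
let clock = 0;
const urlQueue = createUrlQueue(['http://example.com/a', 'http://example.com/b'], () => clock);
const first = urlQueue.take();   // leased for one minute
const second = urlQueue.take();
urlQueue.done(second);           // finished normally
clock = LEASE_MS;                // the worker for `first` never reported back
const retried = urlQueue.take(); // its lease has expired, so it is handed out again
console.log({ first, second, retried });
```

If a download hangs, nothing has to detect it explicitly: the URL simply becomes due again once the lease runs out.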
There are also rate limits that control the number of HTTP calls, for example per endpoint or per number of simultaneous requests from one IP.

Related

refetch or poll an external api with fetch in svelte

New to Svelte and am running into some issues.
Currently doing the following in +page.server.js
I would like to poll this API every couple hundred milliseconds, but I am unsure how to do that. I have tried using setInterval here, to no avail.
export async function load({params}) {
const response = await fetch(
`http://localhost:9595/api/v1/chrysalis/example?uid=${params.uid}`
)
const site = await response.json()
const siteData = site[0]
console.log(siteData)
return {
uid: params.uid,
partitions: siteData.partitions,
zones: siteData.zones,
zTypes: siteData.zTypes,
zStates: siteData.zStates,
zNames: siteData.zNames
}
}
For example, I've built this in Next.js using SWR with refreshInterval: 1.
const {data, error, isLoading} = useSWR(
'http://localhost:9595/api/v1/chrysalis/example',
(url) => {
const searchParams = new URLSearchParams();
searchParams.append("uid", body.uid)
const newUrl = `${url}?${searchParams.toString()}`
const options = {
method: 'GET',
headers: {'Content-Type': 'application/json'},
}
return fetch(newUrl, options).then(res => res.json())
},
{
refreshInterval: 1
}
);
I have also tried the following in onMount of +page.svelte, but when I hit the API from the client I get a CORS error (I ran into this before when the file was +page.js rather than +page.server.js).
let x;
onMount(async () => {
setInterval(async () => {
const response = await fetch(
`http://localhost:9595/api/v1/chrysalis/example?uid=${data.uid}`
)
const site = await response.json()
x = site[0]
console.log(x)
}, 3000)
})
The CORS error results because +page.svelte/+page.js are run in the browser. So you need to proxy the call through a service that allows being called from the browser. (Or relax the CORS restrictions on http://localhost:9595)
You can use SvelteKit itself to proxy the call by creating an internal endpoint. So:
The internal endpoint simply fetches http://localhost:9595/... and returns the results. (You can just forward the response object from fetch())
+page.svelte calls that internal endpoint from setInterval().
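A rough sketch of such an internal endpoint is shown below. The route path src/routes/api/site/+server.js is an assumption, and the handler is written as a plain function so it can be run outside SvelteKit; in the real file you would put export in front of it. It relies on the global Response class (Node 18+).

```javascript
// Sketch of src/routes/api/site/+server.js (hypothetical route path).
// In the real file this function must be exported: `export async function GET(...)`.
const UPSTREAM = 'http://localhost:9595/api/v1/chrysalis/example';

async function GET({ url, fetch }) {
  // Forward the uid query parameter to the upstream API
  const uid = url.searchParams.get('uid') ?? '';
  const upstream = await fetch(`${UPSTREAM}?uid=${encodeURIComponent(uid)}`);
  // Pass the upstream body and status through to the browser
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { 'Content-Type': 'application/json' },
  });
}

// In +page.svelte (client side), poll the internal endpoint instead of port 9595:
//   onMount(() => {
//     const id = setInterval(async () => {
//       const res = await fetch(`/api/site?uid=${data.uid}`);
//       x = (await res.json())[0];
//     }, 3000);
//     return () => clearInterval(id); // stop polling when the page is left
//   });
```

Because the browser now only talks to the SvelteKit origin, the CORS restriction on http://localhost:9595 no longer applies.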

PWA fetch request in service worker sends "the site can't be reached" error on login with google the 2nd time

This error is really driving me crazy for the last 2 days. Please help.
When I log in with Google on my website for the first time, there is no problem. But when I try it a second time, with any account, the console shows this error:
The FetchEvent for "http://localhost:3000/auth/google/callback?code=4%2F0AX4somethingsomethingsomethingsomething&scope=profile+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.profile" resulted in a network error response: an object that was not a Response was passed to respondWith().
and the webpage shows this error:
This site can’t be reached The web page at http://localhost:3000/auth/google/callback?code=4%2F0AX4somethingsomethingsomethingsomething&scope=profile+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.profile might be temporarily down or it may have moved permanently to a new web address.
I am quite new to PWAs and don't understand some of the code in the service worker file (I copy-pasted the 'fetch' part of the code from this website: blog.bitsrc.io), so that might be why I can't identify the error in the code. But you might be able to. This is my service worker code:
const staticCacheName = "site-static-v2";
const dynamicCacheName = "site-dynamic-v2";
const assets = ["/", "/stories", "/groups", "offline.html"];
// cache size limit function
const limitCacheSize = (name, size) => {
caches.open(name).then((cache) => {
cache.keys().then((keys) => {
if (keys.length > size) {
cache.delete(keys[0]).then(limitCacheSize(name, size));
}
});
});
};
// install event
self.addEventListener("install", (evt) => {
//console.log('service worker installed');
evt.waitUntil(
caches.open(staticCacheName).then((cache) => {
console.log("caching shell assets");
cache.addAll(assets);
})
);
});
// activate event
self.addEventListener("activate", (evt) => {
//console.log('service worker activated');
evt.waitUntil(
caches.keys().then((keys) => {
//console.log(keys);
return Promise.all(
keys
.filter((key) => key !== staticCacheName && key !== dynamicCacheName)
.map((key) => caches.delete(key))
);
})
);
});
// fetch events
self.addEventListener("fetch", function (event) {
event.respondWith(
fetch(event.request)
.catch(function () {
return caches.match(event.request);
})
.catch("offline.html")
);
});
This is my script in main.hbs (just like index.html).
if('serviceWorker' in navigator) {
window.addEventListener('load', () => {
navigator.serviceWorker.register('/serviceworker.js', { scope: '/' })
.then((reg) => console.log('Success: ', reg.scope))
.catch((err) => console.log('Failure: ', err));
})
}
I am making my website using express by the way.
I have tried pretty much every solution on stackoverflow but none seem to work.
Just for Information, I have also tried this for the 'fetch' part:
self.addEventListener('fetch', evt => {
evt.respondWith(
caches.match(evt.request).then(cacheRes => {
return cacheRes || fetch(evt.request).then(fetchRes => {
return caches.open(dynamicCacheName).then(cache => {
cache.put(evt.request.url, fetchRes.clone());
// check cached items size
limitCacheSize(dynamicCacheName, 15);
return fetchRes;
})
});
}).catch(() => {
return caches.match('offline.html');
})
);
}
);
(The above code also lets me login only once but doesn't let me logout unlike the previous code)
I have copy pasted almost every 'fetch' code on the internet but all of them have a problem with google auth (I am using passport for google auth).
This is my auth.js code:
const express = require("express");
const router = express.Router();
const passport = require("passport");
//Authenticate with google
//GET /auth/google
router.get("/google", passport.authenticate("google", { scope: ["profile"] }));
//Google auth callback
//GET /auth/google/callback
router.get(
"/google/callback",
passport.authenticate("google", { failureRedirect: "/" }),
function (req, res) {
// Successful authentication, redirect home.
res.redirect("/stories");
}
);
router.get("/logout", (req, res) => {
req.logout();
res.redirect("/");
});
module.exports = router;
You can also suggest a workaround with Workbox.
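One possible explanation, not verified against this app: .catch("offline.html") passes a string rather than a function, so it is ignored, and caches.match() resolves to undefined on a cache miss. When the network request fails and nothing is cached, respondWith() then receives undefined, which matches the error message. The sketch below falls back explicitly and leaves the /auth/ routes to the browser. respond is a made-up helper, and its dependencies are passed in so it can run outside a service worker.

```javascript
// Sketch only: network first, then cache, then the offline page.
// `respond` is a hypothetical helper; `deps` supplies fetch and caches.
async function respond(request, deps) {
  try {
    return await deps.fetch(request);
  } catch (err) {
    const cached = await deps.caches.match(request);
    // caches.match resolves to undefined on a miss, so fall back explicitly.
    // This still needs offline.html to be in the cache.
    return cached || deps.caches.match('offline.html');
  }
}

// Inside serviceworker.js the wiring would look like this
// (the guard only keeps the snippet runnable outside a service worker):
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    const url = new URL(event.request.url);
    // Not calling respondWith lets the browser handle the OAuth routes itself
    if (url.pathname.startsWith('/auth/')) return;
    event.respondWith(respond(event.request, { fetch, caches }));
  });
}
```

Skipping the service worker for /auth/ is a design choice: login redirects are never useful offline, so there is little reason to intercept them.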

How to get total member count of any Discord server?

I'm trying to build a scraping script to get the total member count of a number of Discord servers. I did that with Puppeteer as shown below, but I think my IP address has been banned, because I'm getting an "Invite Invalid" error from Discord even though the invite links work.
Does Discord have an API for getting any server's total member count? Or is there a third-party library for that purpose? Or any other method?
const puppeteer = require('puppeteer')
const discordMembers = async ({ server, browser }) => {
if (!server) return
let totalMembers
const page = await browser.newPage()
try {
await page.goto(`https://discord.com/invite/${server}`, {
timeout: 3000
})
const selector = '.pill-qMtBTq'
await page.waitForSelector(selector, {
timeout: 3000
})
const totalMembersContent = await page.evaluate(selector => {
return document.querySelectorAll(selector)[1].textContent
}, selector)
if (totalMembersContent) {
totalMembers = totalMembersContent
.replace(/ Members/, '')
.replace(/,/g, '')
totalMembers = parseInt(totalMembers)
}
} catch (err) {
console.log(err.message)
}
await page.close()
if (totalMembers) return totalMembers
}
const asyncForEach = async (array, callback) => {
for (let i = 0; i < array.length; i++) {
await callback(array[i], i, array)
}
}
const run = async () => {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox']
})
const servers = ['tQp4pSE', '3P5K3dzgdB']
await asyncForEach(servers, async server => {
const members = await discordMembers({ server, browser })
console.log({ server, members })
// result
// { server: 'tQp4pSE', members: 57600 }
// { server: '3P5K3dzgdB', members: 159106 }
})
await browser.close()
}
run()
Update: Mar 22, 2022
Thanks to @Vaviloff's answer we can actually access Discord's private API, but the problem is that it only seems to be accessible from the browser. I'm getting a "Request failed with status code 400" error from Axios. Is it a CORS issue? How do we get the results in a Node.js app?
const axios = require('axios')
const discordMembers = async ({ server }) => {
try {
const apiResult = await axios({
data: {},
method: 'get',
url: `https://discord.com/api/v9/invites/${server}?with_counts=true&with_expiration=true`
})
console.log(apiResult)
} catch (err) {
console.log(err)
}
}
discordMembers({ server: 'tQp4pSE' })
A lot of modern web applications have their own internal APIs. You can often spot the frontend making requests to them by using the Network tab in DevTools (filter by Fetch/XHR type).
Such API endpoints can change at any time, of course, but they usually last a long time and are a rather convenient way of scraping.
Currently Discord uses this URL for basic instance description:
https://discord.com/api/v9/invites/tQp4pSE?with_counts=true&with_expiration=true
By accessing it you get the desired data as JSON.
Update
To make your code work don't send any data in the request:
const apiResult = await axios({
method: 'get',
url: `https://discord.com/api/v9/invites/${server}?with_counts=true&with_expiration=true`
})
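For a Node.js app without Axios, the same request can be sketched with the built-in fetch (Node 18+). getMemberCount is a made-up helper, and approximate_member_count is the field I understand the invite response to carry when with_counts=true; check it against the actual response. The fetch function is injectable so the helper can be tried without hitting Discord.

```javascript
// Sketch: read the member count from the invite endpoint shown above.
// `getMemberCount` is a hypothetical helper; `fetchImpl` can be swapped for a fake.
async function getMemberCount(server, fetchImpl = fetch) {
  const url = `https://discord.com/api/v9/invites/${server}?with_counts=true&with_expiration=true`;
  const res = await fetchImpl(url); // plain GET, no request body
  if (!res.ok) throw new Error(`Request failed with status code ${res.status}`);
  const invite = await res.json();
  // Assumed field name, returned when with_counts=true
  return invite.approximate_member_count;
}

// Usage (makes a real network request):
//   getMemberCount('tQp4pSE').then((count) => console.log(count));
```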

Expo React Native log doesn't appear on console

I am having a problem when trying to log a fetch JSON response.
The log doesn't appear. Last night it was all fine; this morning the data suddenly can't be logged.
const fetchResults = async (text, offset = data.length, options = {}) => {
const { timeout = 8000 } = options;
const controller = new AbortController();
const requestUrl = `https://itunes.apple.com/search?limit=30&offset=${offset}&term=${text}`;
console.log(requestUrl);
const res = fetch(requestUrl)
.then((res) => {
console.log(res)
return res.json()
})
.then((res) => {
// The log below doesnt appear on console
console.log(res);
})
.catch((err) => {
console.error('Request failed', err);
});
};
Can someone help me?
https://snack.expo.dev/#aguav/react-songlist-test
I found it: the limit was too high (30). When I changed it to 10, the log started to show again. Maybe the log can't print large data.
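If large payloads really are the problem, one workaround is to log a short summary rather than the whole response. The sketch below is only an illustration: summarize is a made-up helper, and the resultCount, results and trackName fields are assumed from the shape of the iTunes Search API response.

```javascript
// Sketch: log a short summary instead of the full JSON payload.
// `summarize` is a hypothetical helper; field names are assumed from the iTunes Search API.
function summarize(json, maxItems = 3) {
  const results = Array.isArray(json.results) ? json.results : [];
  return {
    resultCount: json.resultCount,
    firstTracks: results.slice(0, maxItems).map((r) => r.trackName),
  };
}

// Usage inside the fetch chain from the question:
//   .then((json) => {
//     console.log(summarize(json));
//     return json;
//   })
```

This keeps limit=30 in the request while the console only has to print a few short strings.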

How do I split my Jest + Puppeteer tests in multiple files?

I am writing automated tests using Jest & Puppeteer for a Front-end application written in Vue.js
So far I managed to write a set of tests, but they all reside in the same file:
import puppeteer from 'puppeteer';
import faker from 'faker';
let page;
let browser;
const width = 860;
const height = 1080;
const homepage = 'http://localhost:8001/brt/';
const timeout = 1000 * 16;
beforeAll(async () => {
browser = await puppeteer.launch({
headless: false, // set to false if you want to see tests running live
slowMo: 30, // ms amount Puppeteer operations are slowed down by
args: [`--window-size=${width},${height}`],
});
page = await browser.newPage();
await page.setViewport({ width, height });
});
afterAll(() => {
browser.close();
});
describe('Homepage buttons', () => {
test('Gallery Button', async () => {
// navigate to the login view
await page.goto(homepage);
await page.waitFor(1000 * 0.5); // without this, the test gets stuck :(
await page.waitForSelector('[data-testid="navBarLoginBtn"]');
await page.click('[data-testid="navBarLoginBtn"]');
await page.waitForSelector('[data-testid="navBarGalleryBtn"]');
await page.click('[data-testid="navBarGalleryBtn"]');
// test: check if we got to the gallery view (by checking nr of tutorials)
await page.waitForSelector('.card-header');
const srcResultNumber = await page.$$eval('.card-header', (headers) => headers.length);
expect(srcResultNumber).toBeGreaterThan(1);
}, timeout);
});
describe('Register', () => {
const btnLoginToRegister = '#btn-login-to-register';
const btnRegister = '#btn-register';
const btnToLogin = '#btn-goto-login';
test('Register failed attempt: empty fields', async () => {
// navigate to the register form page via the login button
await page.goto(homepage);
await page.waitForSelector(navLoginBtn);
await page.click(navLoginBtn);
await page.waitForSelector(btnLoginToRegister);
await page.click(btnLoginToRegister);
// test; checking for error messages
await page.waitForSelector(btnRegister);
await page.click(btnRegister);
const errNumber = await page.$$eval('#errMessage', (err) => err.length);
expect(errNumber).toEqual(3);
}, timeout);
test('Register failed: invalid char count, email format', async () => {
// fill inputs
await page.waitForSelector('#userInput');
await page.type('#userInput', 'a');
await page.waitForSelector('#emailInput');
await page.type('#emailInput', 'a');
await page.waitForSelector('#emailInput');
await page.type('#passInput', 'a');
await page.waitForSelector(btnRegister);
await page.click(btnRegister);
// test: check if we 3 errors (one for each row), from the front end validations
const err = await page.$$eval('#errMessage', (errors) => errors.length);
expect(err).toEqual(3);
}, timeout);
test('Register: success', async () => {
await page.click('#userInput', { clickCount: 3 });
await page.type('#userInput', name1);
await page.click('#emailInput', { clickCount: 3 });
await page.type('#emailInput', email1);
await page.click('#passInput', { clickCount: 3 });
await page.type('#passInput', password1);
await page.waitForSelector(btnRegister);
await page.click(btnRegister);
// test: check if go to login link appeared
await page.waitForSelector(btnToLogin);
await page.click(btnToLogin);
// await Promise.all([
// page.click(btnToLogin),
// page.waitForNavigation(),
// ]);
}, timeout);
test('Register failed: email already taken', async () => {
// navigate back to the register form
await page.waitForSelector(btnLoginToRegister);
await page.click(btnLoginToRegister);
await page.click('#userInput');
await page.type('#userInput', name2);
await page.click('#emailInput');
await page.type('#emailInput', email1); // <- existing email
await page.click('#passInput');
await page.type('#passInput', password2);
await page.click(btnRegister);
const err = await page.$eval('#errMessage', (e) => e.innerHTML);
expect(err).toEqual('Email already taken');
}, timeout);
});
I would like to have a single test file that does the beforeAll and afterAll work, with each test suite (HomepageButtons, Register, etc.) residing in its own test file. How can I achieve this?
I've tried splitting the tests into:
testsUtils.js, which would contain the beforeAll and afterAll hooks and code, but that doesn't guarantee it runs when needed: the beforeAll code should fire before all other test files, and the afterAll code after all test files have finished.
Sorry, I'd rather comment on your question, but I don't have the reputation for that. Anyway, I think you are looking for something like "global beforeAll" and "global afterAll" hooks, right? Jest already has them. They are called "globalSetup" and "globalTeardown".
Take a look at globalSetup. Excerpt:
This option allows the use of a custom global setup module which
exports an async function that is triggered once before all test
suites.
globalTeardown works the same way.
I think you'll have a headache trying to get a reference to the page or browser in globalSetup/globalTeardown, and I confess that I have never tried this. The answer to that problem (if you run into it) may be on this page, under the "Custom example without jest-puppeteer preset" section.
There is also a repo that tries to make Jest + Puppeteer integration easier. You may find it useful: repo.
Good luck. :)
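As a rough idea of the wiring, a jest.config.js along these lines points Jest at the global hooks. The file names are assumptions, not something from the question: setup.js would launch the browser once and store its WebSocket endpoint, teardown.js would close it, and puppeteer_environment.js would be a custom test environment that connects each test file to that browser.

```javascript
// jest.config.js (sketch; all file names below are hypothetical)
module.exports = {
  globalSetup: './setup.js',       // runs once before all test suites
  globalTeardown: './teardown.js', // runs once after all test suites
  testEnvironment: './puppeteer_environment.js',
  // each suite (homepage buttons, register, ...) lives in its own file
  testMatch: ['**/tests/**/*.test.js'],
};
```

Each test file then only needs to open its own page in a per-file beforeAll, since the browser itself is shared.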