I'm trying to combine puppeteer-extra with Express. For each request I want to be able to load a different plugin, for example:
const express = require("express");
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const router = express.Router();
router.get("/", async (req, res) => {
  const { useStealth } = req.query;
  if (useStealth) puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch(/* launch options */);
  const page = await browser.newPage();
});
The problem is that when I send the first request with the useStealth query parameter, StealthPlugin gets registered on the puppeteer-extra module that Node has cached, so every subsequent request uses it too. I tried to solve this by clearing Node's require cache; that works, but it breaks with concurrent requests. Here is my attempt (which still has the concurrency problem):
// puppeteer must be declared with `let` (not `const`) for this reassignment to work
delete require.cache[require.resolve('puppeteer-extra')];
puppeteer = require('puppeteer-extra');
Is there any way to reset what puppeteer.use() has registered, so that each request gets its own instance of puppeteer-extra?
Thanks!
Currently I disallow all file uploads by setting up the server like this:
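const express = require('express');
const multer = require('multer');
const path = require('path');
const cookieParser = require('cookie-parser');
const logger = require('morgan'); // assuming logger is morgan, as in express-generator
const upload = multer();
const server = express();
module.exports = () => {
  // ...
  server.use(logger('dev'));
  server.use(express.json());
  server.use(express.urlencoded({ extended: false }));
  server.use(express.raw());
  server.use(cookieParser());
  server.use(express.static(path.join(projectRoot, 'public')));
  server.set('trust proxy', 1);
  server.use(upload.none());
  server.use('/', router);
  // ...
};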
This correctly blocks all files. Now I want to allow file uploads only for POST requests to /test:
import * as express from "express";
import multer from "multer";
const upload = multer({ storage: multer.memoryStorage() });
const router = express.Router();
router.post('/test', upload.single('pdf'), function (req, res, next) {
  const r = 'respond with a test - POST';
  res.send(r);
});
However, when I try this in Postman I get a MulterError with code "LIMIT_UNEXPECTED_FILE" for the field 'pdf'. If I remove the line server.use(upload.none()) it works, but then files can be uploaded to any route, which is not what I want.
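The error occurs because the global upload.none() runs before your route and rejects any file field it sees, so upload.single('pdf') never gets a chance. Nothing is parsed as an upload unless you attach a multer middleware to the entire server, to a router, or to a particular path, so you can safely remove server.use(upload.none()).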
Without it, no middleware will try to consume the payload of the incoming request. I don't know how much load receiving (without consuming) the payload puts on the server, but in theory you could destroy the connection whenever a client tries to submit a payload:
req.on("data", function () {
  req.destroy();
});
However, the new connections that clients open afterwards might cause more load on the server overall.
I've been trying to solve this issue for the past two days without success, and I haven't found a solution anywhere. Here's the code:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const PROXY_SERVER_IP = 'IP.IP.IP.IP';
const PROXY_SERVER_PORT = '1234';
const PROXY_USERNAME = 'username';
const PROXY_PASSWORD = 'password';
(async () => {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=http://${PROXY_SERVER_IP}:${PROXY_SERVER_PORT}`],
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: PROXY_USERNAME,
    password: PROXY_PASSWORD,
  });
  await page.goto('https://www.google.ca/', {
    timeout: 0,
  });
  await page.screenshot({ path: 'test4.png', fullPage: true });
  await browser.close();
})();
The page.goto() call just hangs and eventually fails with a navigation timeout error, and I can't figure out why. With a proxy that doesn't require authentication, it works. I'm thinking of switching to another headless solution because of this one issue, so I would really appreciate some help.
So I figured it out: the proxy itself was just very slow. Axios and cURL responded quickly because they only fetch the initial HTML and, unlike a headless browser, don't do anything with it. A headless browser also requests all of the page's assets (CSS, images, etc.) and makes any other network requests the page triggers, and all of that goes through the proxy, so it is much slower. When I tried a different proxy (one that still requires authentication), it was much faster.
I am trying to implement some middleware in Express that should be called for all routes. This middleware should alter the request object.
I've tried several things already but keep running into the same issue: as soon as the middleware finishes, the request object seems to revert to its original state.
My code currently looks like this (simplified to a minimal example):
route.js:
const express = require('express');
const router = express.Router();
router.get('/getMe', (req, res) => {
  // return the desired data.
  // I expect req.params.myString to exist here but it does not.
});
module.exports = router;
index.js:
const express = require('express');
const router = express.Router();
router.use('/', require('./route'));
module.exports = router;
app.js:
const express = require('express');
const app = express();
const routes = require('./index');
app.use((req, res, next) => {
  // Adding req.params.myString to the request object.
  if (req.params.myString === undefined) req.params.myString = 'hello world';
  next();
});
app.use('/api', routes);
As you can see, I left out some of the code to keep it readable, such as the code that sends the response and sets up the server.
Again, I expect req.params.myString to be available in the endpoint. Does anyone see what I am doing wrong?
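The Express docs (http://expressjs.com/en/api.html#req.params) say:
"If you need to make changes to a key in req.params, use the app.param handler. Changes are applicable only to parameters already defined in the route path."
The underlying reason is that Express replaces req.params with the parameters parsed from each matched route's path, so anything an earlier middleware adds to it is discarded. So you need to look at the app.param handler:
http://expressjs.com/en/4x/api.html#app.param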
You could call app.set("myString", "hello world") in your app.js and then read the value in your route.js or index.js scripts with req.app.get("myString"). Setting it directly, as app.myString = "hello world", and reading it as req.app.myString should also work.
Here are my routes:
app.get('/signUp', routes.signUp);
app.post('/signUp' , routes.signUp);
Here is my separate file for routes.
exports.signUp = function (req, res) {
  res.render('signUp');
};
The second block of code is the behaviour I want in response to a GET request.
How do I respond to a POST request? I have already tied the signUp function to the behaviour that responds to GET. Should I bundle the POST behaviour into the same function and render the sign-up page again? If I simply want to render the view, I don't want the POST behaviour to execute, so bundling the two together would be strange.
I believe Express's Router should solve this for you: router.route() lets you attach a separate handler for each HTTP method on the same path.
Route file:
var express = require('express');
var router = express.Router();
router.route("/")
  .get(function (req, res) {
    res.render('signUp');
  })
  .post(function (req, res) {
    // do something else
  });
module.exports = router;
Main file (index.js, app.js, server.js, or whatever you call it):
//..
const signUp = require("./routes/signup.js"); // or wherever this is
//...
app.use("/signUp", signUp);
//..
Quick question about React Router. I'm having trouble getting my server to handle pushState URLs (if that's the correct term). Originally I used a module called connect-history-api-fallback, a middleware that let me serve only the static files from my dist directory. Visiting the client at www.example.com worked and I could navigate throughout the site; refreshing on any route, such as www.example.com/about, also worked.
However, I recently added one simple API endpoint to my Express server for the React client to call. Now, while the initial page load works (and so does the /api/news call, which fetches data from a remote service), I can no longer refresh on any other route. For example, going to www.example.com/about now results in a failed GET request for /about. How can I fix this? I really appreciate the help! PS: not sure if it matters, but I'm considering implementing server-side rendering later on.
import express from 'express';
import historyApiFallback from 'connect-history-api-fallback';
import config from '../config';
import chalk from 'chalk';
import fetch from 'node-fetch';
import path from 'path';
const app = express();
// FIXME: Unsure whether or not this can be used.
// app.use(historyApiFallback({
// verbose : true
// }));
//// DEVELOPMENT MODE ONLY - USING EXPRESS + HMR ////
/* Enable webpack middleware for hot module reloading */
if (config.get('globals').__DEV__) {
  const webpack = require('webpack');
  const webpackConfig = require('../build/webpack/development_hot');
  const compiler = webpack(webpackConfig);
  app.use(require('./middleware/webpack-dev')({
    compiler,
    publicPath: webpackConfig.output.publicPath
  }));
  app.use(require('./middleware/webpack-hmr')({ compiler }));
}
//// PRODUCTION MODE ONLY - EXPRESS SERVER /////
if (config.get('globals').__PROD__) {
  app.use(express.static(__dirname + '/dist'));
}
//// API ENDPOINTS FOR ALL ENV ////
app.get('/api/news', function (req, res) {
  fetch('http://app-service:5000/news')
    .then(response => response.json())
    .then(data => res.send(data))
    .catch(() => res.sendStatus(404));
});
// Wildcard route set up to capture other requests (currently getting an "unexpected token '<'" error in the console)
app.get('*', function (req, res) {
  res.sendFile(path.resolve(__dirname, '../dist', 'index.html'));
});
export default app;
Express works by running a series of middleware that you "plug in", in order, via .use(). The nice thing is that your routes are also just middleware, so you can separate them out and register them before your history fallback; then only requests that make it past your routes (i.e., that didn't match any route) will hit the fallback.
Try something like the following:
const app = express();
// ...
const routes = express.Router();
routes.get('/api/news', function (req, res) {
  fetch('http://app-service:5000/news')
    .then(response => response.json())
    .then(data => res.send(data))
    .catch(() => res.sendStatus(404));
});
app.use(routes);
app.use(historyApiFallback({
  verbose: true
}));
// express.static must come after the fallback so the rewritten /index.html is served