import.io: some crawlers don't have the button to crawl locally

I was creating some crawlers using import.io, but for some of them the option to run locally is not showing. Does anyone know why they don't have the run-locally button, or how I can add it to those crawlers?

If you don't see the option to run the crawler remotely or locally, it means that your crawler is already set to run locally only.

When you save a crawler, import.io runs a few checks to see whether it can be run remotely on their servers; in some cases this increases the chances of the crawler working, as the servers do additional processing.
If those checks fail, the crawler can only run locally, and therefore your crawler will run locally by default.

Related

Automatically Update PM2 Processes

I'm looking to automate how my bots are updated. They're hosted on a VPS from GalaxyGate and kept alive using PM2. I use VSCode to develop them, then I push to GitHub and pull the code from there to the VPS to update them directly in production. However, this currently has to be done manually and I always forget the git commands that I'm supposed to run.
I'm looking to have this process automated; ideally the server would periodically run some sort of script to update all bots that are hosted on it. It would fetch from GitHub, apply the new changes (if any), and properly restart the bots, so I have a few questions:
Is this possible, and if so, how would I go about doing that?
Are there any risks/downsides with such a system?
Any help would be appreciated

Load Testing with Asset Loading

I'm currently trying to load test a homepage I develop. Until now Loader.io was good enough for my purposes, but I realized it does not download/use the embedded assets.
Is there a load-testing service that gets as close as possible to real users?
I haven't found anything so far. Hopefully one of you knows a suitable service.
Thanks in advance!
Apache JMeter does it for sure:
See Web Testing with JMeter: How To Properly Handle Embedded Resources in HTML Responses
Moreover, it simulates the browser cache via the HTTP Cache Manager.
If you are rather looking for a "service", there are several options for running a JMeter test in the cloud, starting from shell scripts like the JMeter ec2 script and ending with end-to-end solutions like Flood.io or BlazeMeter.
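For reference, the JMeter setting that article is about is the "Retrieve All Embedded Resources" checkbox on the HTTP Request sampler; in a saved .jmx test plan it appears roughly as the fragment below (not a complete plan, and the parallel-download pool size is just an example):

```xml
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Home Page">
  <!-- "Retrieve All Embedded Resources" -->
  <boolProp name="HTTPSampler.image_parser">true</boolProp>
  <!-- Download images/CSS/JS in parallel, as a browser would -->
  <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
  <stringProp name="HTTPSampler.concurrentPool">6</stringProp>
</HTTPSamplerProxy>
```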

Selenium test not loading some specific URLs

I'm using Selenium through Python on an AWS Linux server. When the test starts, it doesn't load the page. The strange thing is that if I run a test using a URL from Google or Facebook, the test works. I used the curl and links commands to check that I have access from the server, and they work, so I'm not sure what the issue could be.
Any help is appreciated.

What are the advantages of using Scrapyd?

The scrapy doc says that:
Scrapy comes with a built-in service, called “Scrapyd”, which allows you to deploy (aka. upload) your projects and control their spiders using a JSON web service.
Are there any advantages to using Scrapyd?
Scrapyd allows you to run Scrapy on a different machine than the one you are using, via a handy web API, which means you can use curl or even a web browser to upload new project versions and run them. Otherwise, if you wanted to run Scrapy in the cloud somewhere, you would have to scp the new spider code over, then log in with ssh and spawn your scrapy crawl myspider yourself.
Scrapyd will also manage processes for you if you want to run many spiders in parallel; but if you have Scrapy on your local machine, have access to the command line, and just want to run one spider at a time, then you're better off running the spider manually.
If you are developing spiders, then you definitely don't want to use Scrapyd for quick edit/test iterations, as it just adds a layer of complexity.
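To make the "just use curl" point concrete, here is a hedged sketch of driving Scrapyd's JSON API (its schedule.json endpoint) from Python instead. The default port 6800 and the project/spider names are assumptions for illustration:

```python
"""Hedged sketch: scheduling a spider via Scrapyd's JSON web API.

Scrapyd listens on port 6800 by default; the project and spider
names below are placeholders.
"""
from urllib import parse, request

SCRAPYD = "http://localhost:6800"

def schedule_request(project, spider, **spider_args):
    """Build the POST request that schedules one spider run.

    Roughly equivalent to:
        curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider
    """
    data = {"project": project, "spider": spider, **spider_args}
    return request.Request(
        f"{SCRAPYD}/schedule.json",
        data=parse.urlencode(data).encode(),
        method="POST",
    )

def schedule(project, spider, **spider_args):
    """Send the request; Scrapyd answers with a JSON body containing the job id."""
    with request.urlopen(schedule_request(project, spider, **spider_args)) as resp:
        return resp.read()

if __name__ == "__main__":
    # Only build the request here; sending it needs a running Scrapyd.
    print(schedule_request("myproject", "myspider").full_url)
```

The same API also has addversion.json for uploading new project versions and listjobs.json for checking what is running, which is what makes the remote-control workflow practical.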

Are there any error-checking web app crawlers out there?

Wondering if there is some sort of crawler we could use to test and re-test everything when changes are made to the web app, so we know a new change didn't break any existing pages. Or maybe a web browser with a million frames so I could scroll down and look through the tiles to find any error pages... you get the idea.
Selenium will let you test forms and write and automate scripts. It is a Firefox add-on and is quite powerful. You can write the scripts manually and also "record" them.
JMeter will let you create scripts and then run them as multiple users to test and load test web sites as a whole. It is a standalone application that can mimic multiple users, randomise access, etc., and apply load to stress test the application.
You could presumably use both for error testing by monitoring their output logs to catch errors.
Both will allow you to authenticate in order to log on to sites.
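As a hedged sketch of the "crawler that looks for error pages" idea, here is a minimal checker that fetches a list of URLs and flags pages that return an error status or contain error text. The error markers are assumptions; a real run would need to match whatever your app's error pages actually say:

```python
"""Hedged sketch: fetch a list of pages and flag the ones that look broken.

The ERROR_MARKERS below are assumptions -- adjust them to the text
your web app's error pages actually contain.
"""
from urllib import error, request

ERROR_MARKERS = ("Traceback", "Server Error", "An unhandled exception")

def classify(status, body):
    """Return a problem description for one fetched page, or None if it looks fine."""
    if status >= 400:
        return f"HTTP {status}"
    for marker in ERROR_MARKERS:
        if marker in body:
            return f"error text: {marker!r}"
    return None

def check_pages(urls):
    """Fetch each URL and return {url: problem} for the pages that look broken."""
    problems = {}
    for url in urls:
        try:
            with request.urlopen(url, timeout=10) as resp:
                problem = classify(resp.status, resp.read().decode("utf-8", "replace"))
        except error.HTTPError as exc:
            problem = f"HTTP {exc.code}"
        except OSError as exc:
            problem = f"unreachable: {exc}"
        if problem:
            problems[url] = problem
    return problems
```

Running something like this after each deploy (from cron or a CI step) and diffing the result is a cheap regression check before reaching for full Selenium or JMeter scripts.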