scrapyd running as daemon cannot find spider or project - scrapy

The name of the spider is quotes14 and it works well from the command line,
i.e. if I run scrapy crawl quotes14 from the directory /var/www/html/sprojects/tutorial/ it works fine.
I have scrapyd running as a daemon.
My scrapy spider files are present here: /var/www/html/sprojects/tutorial/tutorial/spiders
I have many spiders and other files under the above directory, and the project directory is /var/www/html/sprojects/tutorial/tutorial/.
I have tried
curl http://localhost:6800/schedule.json -d project=tutorial -d spider=spiders/quotes14
curl http://localhost:6800/schedule.json -d project=/var/www/html/sprojects/tutorial/tutorial/tutorial -d spider=quotes14
curl http://localhost:6800/schedule.json -d project=/var/www/html/sprojects/tutorial/tutorial/ -d spider=quotes14
curl http://localhost:6800/schedule.json -d project=/var/www/html/sprojects/tutorial/tutorial/tutorial -d spider=spiders/quotes14
Each attempt says either "project not found" or "spider not found".
Please help

In order to use the schedule endpoint you first have to deploy the spider to the daemon. The docs tell you how to do this.
Deploying your project involves eggifying it and uploading the egg to Scrapyd via the addversion.json endpoint. You can do this manually, but the easiest way is to use the scrapyd-deploy tool provided by scrapyd-client, which will do it all for you.
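As a rough sketch of that workflow (assuming your scrapy.cfg has a [deploy] section whose url points at http://localhost:6800/):
# install the client that provides scrapyd-deploy, if it isn't installed yet
pip install scrapyd-client
# run from the directory that contains scrapy.cfg
cd /var/www/html/sprojects/tutorial/
scrapyd-deploy -p tutorial
# once deployed, schedule by project name and the plain spider name (no spiders/ prefix)
curl http://localhost:6800/schedule.json -d project=tutorial -d spider=quotes14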

Related

Locally test AWS Lambda container with .NET 5 web api and Lambda RIE

I'm following the instructions to locally test a Lambda container (https://docs.aws.amazon.com/lambda/latest/dg/images-test.html), but I am unable to do so.
I've created a sample project to reproduce it https://gitlab.com/sunnyatticsoftware/sandbox/lambda-dotnet5-webapi (see the README for step by step on its generation)
Basically I am using an Amazon dotnet template that generates an AWS Lambda function as a .NET 5 web api using containers.
The project itself is fine. The Dockerfile is:
FROM public.ecr.aws/lambda/dotnet:5.0
WORKDIR /var/task
COPY "bin/Release/net5.0/publish" .
Now I want to test it locally using the Amazon Lambda Runtime Interface Emulator (RIE) and these are the steps I follow:
Build project with dotnet build -c Release
Publish artifacts with dotnet publish -c Release
Build docker image with docker build -t lambda-dotnet .
Download the RIE with
mkdir -p ~/.aws-lambda-rie && curl -Lo ~/.aws-lambda-rie/aws-lambda-rie https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie && chmod +x ~/.aws-lambda-rie/aws-lambda-rie
I can see the emulator downloaded properly
ls -la ~/.aws-lambda-rie/aws-lambda-rie
-rw-r--r-- 1 diego.martin 1049089 8155136 Feb 22 14:32 /c/Users/diego.martin/.aws-lambda-rie/aws-lambda-rie
Run the emulator, passing it the lambda image:
docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 --entrypoint /aws-lambda/aws-lambda-rie lambda-dotnet:latest
This is where I get the error:
12997dddc6e50aca3020527be30a1479eee9ceef412ab5009b99e9eb8cf1fa67
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "C:/Users/diego.martin/AppData/Local/Programs/Git/aws-lambda/aws-lambda-rie": stat C:/Users/diego.martin/AppData/Local/Programs/Git/aws-lambda/aws-lambda-rie: no such file or directory: unknown.
What am I missing? I am not specifying any entrypoint in the Dockerfile because I don't have one.
PS: The last step would be to send some lambda event to my container's function with
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
The Lambda docker images for dotnet already include the RIE, so the following is enough (see the repo for further details):
To build the image:
docker build -t lambda-dotnet:latest .
To run it:
docker run -p 9000:8080 lambda-dotnet "LambdaDotNet5::LambdaDotNet5.LambdaEntryPoint::FunctionHandlerAsync"
And then to test it, I can use curl from a different terminal:
curl -vX POST http://localhost:9000/2015-03-31/functions/function/invocations -d @test_request.json --header "Content-Type: application/json"
and the test_request.json file contains the JSON for the event I want to send to the lambda.
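For illustration only, test_request.json could be an API Gateway proxy-style event, which is what the default ASP.NET Core LambdaEntryPoint expects; the exact fields below are assumptions rather than something taken from the repo:
{
  "httpMethod": "GET",
  "path": "/",
  "headers": { "Host": "localhost" },
  "queryStringParameters": null,
  "body": null,
  "isBase64Encoded": false,
  "requestContext": { "httpMethod": "GET", "path": "/" }
}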

docker-selenium on customized /etc/hosts file ?

I have a docker image that contains a Maven Selenium project, and it is meant to run tests against the host "dev-mock.abc.com". The following is my docker command to trigger the selenium tests:
docker run --rm --privileged \
--add-host="dev-mock.abc.com:123.45.67.89" \
${selenium-image}
What I have found is that, at runtime, the /etc/hosts of that container has been updated with the entry "123.45.67.89 dev-mock.abc.com", but during the selenium execution it still cannot resolve the "dev-mock.abc.com" name.
Does anyone know whether selenium picks up the customized entries in the /etc/hosts file when it is executed? Thanks.
Maybe /etc/nsswitch.conf with the correct content is missing in your container, so selenium "skips" /etc/hosts and tries to use DNS directly. Try:
echo "hosts: files dns" > /tmp/nsswitch.conf
docker run --rm --privileged \
--volume /tmp/nsswitch.conf:/etc/nsswitch.conf \
--add-host="dev-mock.abc.com:123.45.67.89" \
${selenium-image}
Another option would be to edit /etc/hosts on your host OS and then use host networking for the selenium container:
docker run --rm --privileged \
--net=host \
${selenium-image}
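To check whether the name actually resolves inside the container (assuming the image has getent available, as Debian-based selenium images usually do), you can run something like:
docker run --rm \
--volume /tmp/nsswitch.conf:/etc/nsswitch.conf \
--add-host="dev-mock.abc.com:123.45.67.89" \
--entrypoint getent \
${selenium-image} hosts dev-mock.abc.com
# expected output: 123.45.67.89    dev-mock.abc.com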

"Startup File" on Azure Docker Web App

Is the "Startup File" option on the docker web app options for docker-compose files? or shell commands? I cannot find any documentation for it...
Basically I'd like my Web App to run a docker-compose.yml instead of executing docker run [options] when I push an image to it.
This is documented now, see below or click here.
What are the expected values for the Startup File section when I configure the runtime stack?
For Node.js, you specify the PM2 configuration file or your script file. For .NET Core, specify your compiled DLL name as dotnet <myapp>.dll. For Ruby, you can specify the Ruby script that you want to initialize your app with.
Not sure if this is still a problem, but I just noticed that it appends whatever you put in there to the default startup command.
2019-09-02 05:03:04.493 INFO - docker run -d -p 55721:80 --name xxxxxx -e WEBSITES_ENABLE_APP_SERVICE_STORAGE=false -e WEBSITE_SITE_NAME=xxxxx -e WEBSITE_AUTH_ENABLED=False -e PORT=80 -e WEBSITE_ROLE_INSTANCE_ID=0 -e WEBSITE_HOSTNAME=xxxxxx.azurewebsites.net -e WEBSITE_INSTANCE_ID=xxxxxxxxx -e HTTP_LOGGING_ENABLED=1 xxxxxx.azurecr.io/xxxxxxx:latest -p 80:4000 -p 443:8000
I put the -p 80:4000 -p 443:8000 into the textbox in the portal config
Azure Web Apps for Containers does not support multi-container apps (with docker-compose) at the time of writing.

SauceLabs Pass/Fail using behat

I am trying to add a Pass/Fail status in SauceLabs whenever I run an automated test, but I can't figure out how to do it. I use Behat with the Selenium driver. I read the documentation but it didn't help me.
I tried to use the SauceLabs REST API guide and ran the following in my console:
curl -X PUT \
-s -d '{"passed":true}' \
-u https://USERNAME:APIKEY@saucelabs.com/rest/v1/users/USERNAME
But it doesn't work.
I think you need the session Id. ownCloud uses:
curl -X PUT -s -d "{\"passed\": $PASSED}" -u $SAUCE_USERNAME:$SAUCE_ACCESS_KEY https://saucelabs.com/rest/v1/$SAUCE_USERNAME/jobs/$SAUCELABS_SESSIONID
see: https://github.com/owncloud/core/blob/master/tests/travis/start_ui_tests.sh#L235
and this Id is pulled from the URL: https://github.com/owncloud/core/blob/master/tests/ui/features/bootstrap/FeatureContext.php#L171
but there might be better ways of getting it
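If you just want to poke at it from the shell, one option (assuming the legacy SauceLabs REST API) is to list your most recent job and read its id from the response:
curl -s -u "$SAUCE_USERNAME:$SAUCE_ACCESS_KEY" "https://saucelabs.com/rest/v1/$SAUCE_USERNAME/jobs?limit=1"
# the "id" field of the returned job is the session/job id to use in the PUT above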

Keep scrapyd running

I have scrapy and scrapyd installed on a debian machine. I log in to this server using an ssh tunnel. I then start scrapyd by running:
scrapyd
Scrapyd starts up fine and I then open up another ssh-tunnel to the server and schedule my spider with:
curl localhost:6800/schedule.json -d project=myproject -d spider=myspider
The spider runs nicely and everything is fine.
The problem is that scrapyd stops running when I quit the session where I started scrapyd. This prevents me from using cron to schedule spiders with scrapyd, since scrapyd isn't running when the cronjob is launched.
My simple question is: How do I keep scrapyd running so that it doesn't shut down when I quit the ssh session.
Run it in a screen session:
$ screen
$ scrapyd
# hit ctrl-a, then d to detach from that screen
$ screen -r # to re-attach to your scrapyd process
You might consider launching scrapyd with supervisor. There is a good .conf script available as a gist here:
https://github.com/JallyHe/scrapyd/blob/master/supervisord.conf
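If you go the supervisor route, a minimal program section might look like this (the paths below are assumptions; adjust them to where scrapyd is installed and where you want logs to go):
[program:scrapyd]
command=/usr/local/bin/scrapyd
directory=/var/lib/scrapyd
autostart=true
autorestart=true
stdout_logfile=/var/log/scrapyd.out.log
stderr_logfile=/var/log/scrapyd.err.log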
How about:
$ sudo service scrapyd start