How to deploy more than one spider that shares root directory with scrapyd? - scrapy

I'm fairly new to Scrapy and I'm building a project that will have multiple spiders. I read in the docs that multiple projects can exist under the same root directory (the folder that contains the scrapy.cfg file), with the additional settings added to that file.
I thought that it was a good idea, since each spider has a different logic and uses different pipelines (or same pipeline logic, but the process is different).
I built two spiders using this configuration:
main_folder:
    scrapy.cfg
    project1
        items.py
        pipelines.py
        settings.py
        ...
        spiders
            spider1.py
    project2
        items.py
        pipelines.py
        settings.py
        ...
        spiders
            spider2.py
Using this configuration I ran scrapyd and could deploy the first spider from project1, but when trying to deploy the spider at project2 it gets deployed with the spider from project1 :(.
My scrapy.cfg looks like this:
[settings]
default = project1.settings
project2 = project2.settings
[deploy:project1]
url = http://localhost:6800/
project = project1
[deploy:project2]
url = http://localhost:6800/
project = project2
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
username =
password =
debug = off
runner = scrapyd.runner
jobstorage = scrapyd.jobstorage.MemoryJobStorage
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
eggstorage = scrapyd.eggstorage.FilesystemEggStorage
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
I have two questions:
If I'm expecting to crawl many sites (let's say 20), is this a "good" way to handle different spiders that crawl different sites? Or is my whole configuration wrong, and should I actually have two separate folders?
How can I deploy my second spider? I have tried different configurations in the scrapy.cfg file and ran scrapyd-deploy with different targets and projects, but I can't manage to get spider2 deployed in scrapyd.

How to use terraform to ignore previous execution(state) [duplicate]

I'm a bit of a newbie with Terraform and still working my way through the documentation. I have not yet found a way to accommodate the setup I need for a specific solution, and I'm hoping some kind soul can give me a push in the right direction.
I'm trying to manage a single set of parameterised templates that deploy everything needed to support a new application we are working on in GCP. What I want is to deploy those templates to three different environments, each environment being in its own distinct GCP project.
The plan is, as per recommendations, run terraform and pass in
a) The specific .tfvars file depending on the environment/project being deployed to (dev/test/prod).
b) Use the -chdir parameter to tell Terraform to pick up all the templates from 'infra-common' folder.
The tricky part is that we want each environment (GCP project) to host its own state file in gcs/storage.
I had been looking at workspaces but it appears that workspaces will just create state subfolders on a single backend.
Question: Can this be done or is there a better way to do it?
Thanks!
You can use --backend-config for this. Here's how you can achieve the desired behavior:
Create a .config file for each environment (dev.config, test.config, prod.config) which contain the name of the gcs bucket (which must already exist) for the respective environment
Specify the common backend in a single remote_state.tf file
Here's how it would look:
config/dev.config:
bucket = "tf-state-dev"
config/test.config:
bucket = "tf-state-test"
config/prod.config:
bucket = "tf-state-prod"
remote_state.tf:
terraform {
  backend "gcs" {
    prefix = "terraform/state"
  }
}
Then you can run init. For example, for dev this would look like:
$ terraform init --backend-config=config/dev.config
Then you can create a workspace for the environment:
$ terraform workspace new dev
With this approach, you can use a single set of templates (you can in fact configure dynamic variables based on the current workspace).
What you could do (we have a project with a similar setup on a different cloud provider) is:
use infra-common as a module
instead of working with .tfvars files per environment, use a separate root module per environment which invokes infra-common as a sub-module.
Your folder structure could look like:
project
|-- dev
|   `-- main.tf
|-- modules
|   `-- infra-common
|       |-- main.tf
|       `-- variables.tf
|-- test
|   `-- main.tf
`-- prod
    `-- main.tf
dev/main.tf:
terraform {
  backend "gcs" {
    bucket = "tf-state-dev"
    prefix = "terraform/state"
  }
}
module "stage" {
  source   = "../modules/infra-common"
  env      = "dev"
  some_var = "value"
}
prod/main.tf:
terraform {
  backend "gcs" {
    bucket = "tf-state-prod"
    prefix = "terraform/state"
  }
}
module "stage" {
  source   = "../modules/infra-common"
  env      = "prod"
  some_var = "value"
}

Detecting if we are running in dev mode or in production mode in CloudFlare Worker

I need to know in my worker script whether it is running locally using wrangler dev or running at Cloudflare after wrangler publish.
Is there an environment variable that tells me that, or a request header?
A code snippet would be highly appreciated.
Thank you.
There isn't a builtin variable, but you can populate such info yourself by defining environments in your wrangler.toml
For example, if we say the topmost [vars] are meant to be used in production, we can declare another variable set meant to be used in local environment. (the environment name is irrelevant)
type = "webpack"
webpack_config = "webpack.config.js"
# these will be used in production
vars = { WORKER_ENV = "production", SENTRY_ENABLED = true }
[env.local]
# these will be used only when --env=local
vars = { WORKER_ENV = "local", SENTRY_ENABLED = false }
From then on, if you run your worker locally using
wrangler dev --env=local
the value of binding WORKER_ENV will be populated as defined under [env.local.vars].
By the way, the syntax of wrangler.toml above is equivalent to
type = "webpack"
webpack_config = "webpack.config.js"
[vars]
WORKER_ENV = "production"
SENTRY_ENABLED = true
[env]
[env.local]
[env.local.vars]
WORKER_ENV = "local"
SENTRY_ENABLED = false
which I believe is easier to understand.
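Since the question asks for a code snippet, here is a minimal sketch of reading that binding inside the worker script. It assumes the WORKER_ENV variable from the wrangler.toml above; the helper names getWorkerEnv and isLocal are hypothetical, and falling back to "production" when nothing is defined is my own choice, not something Wrangler does. In the service-worker format the variable is exposed as a global, while in the module format it arrives on the env argument of the fetch handler, so the helper accepts either.

```javascript
// Hypothetical helper: returns the value of the WORKER_ENV binding.
// `env` is the bindings object a module-format fetch handler receives;
// when it is omitted, fall back to the global variable that the
// service-worker format exposes. Defaults to "production" (an assumption).
function getWorkerEnv(env) {
  if (env && typeof env.WORKER_ENV === "string") {
    return env.WORKER_ENV;
  }
  // `typeof` is safe even when the global was never declared.
  if (typeof WORKER_ENV === "string") {
    return WORKER_ENV;
  }
  return "production";
}

// Hypothetical helper: true only when started with `wrangler dev --env=local`.
function isLocal(env) {
  return getWorkerEnv(env) === "local";
}

// Service-worker format usage (sketch):
//   addEventListener("fetch", (event) => {
//     if (isLocal()) { /* skip Sentry, log verbosely, ... */ }
//   });
// Module format usage (sketch):
//   export default { fetch(request, env) { if (isLocal(env)) { /* ... */ } } };
```

Keeping the check in one function means the rest of the script does not need to know which worker format is in use.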
I believe there is an official update for this now (as of November 2022):
When developing locally, you can create a .dev.vars file in the
project root which allows you to define variables that will be used
when running wrangler dev or wrangler pages dev, as opposed to using
another environment and [vars] in wrangler.toml.
Cloudflare Workers documentation

VueJS place multiple .env in folder

Hello, I'm using VueJS 2 and I have multiple .env files in my project.
My app has a .env file for each company to select the company configuration (skin color, files, ...).
Currently I have all my .env files in the root folder:
.env.company1-dev
.env.company1-staging
.env.company1-prod
.env.company2-dev
.env.company2-staging
.env.company2-prod
.env.company3-dev
.env.company3-staging
.env.company3-prod
So when I get to 20 companies, my root folder will become confusing. Is it possible to create a folder where I can place all my .env files?
The idea :
/environments/company1/
    .env.dev
    .env.staging
    .env.prod
/environments/company2/
    .env.dev
    .env.staging
    .env.prod
/environments/company3/
    .env.dev
    .env.staging
    .env.prod
In your vue.config.js file you can add:
const dotenv = require("dotenv");
const path = require("path");
let envfile = ".env";
if (process.env.NODE_ENV) {
  envfile += "." + process.env.NODE_ENV;
}
const result = dotenv.config({
  path: path.resolve(`environments/${process.env.VUE_APP_COMPANY}`, envfile)
});
// optional: check for errors
if (result.error) {
  throw result.error;
}
Then, before running, set VUE_APP_COMPANY to a company name and run your app.
Note: It's important to put this code on vue.config.js and not in main.js because dotenv will use fs to read files.
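The path logic above can also be pulled into a small function so it is easy to check in isolation. This is only a sketch: resolveEnvFile is a hypothetical name, and it assumes the environments/<company>/.env.<mode> layout proposed in the question, where the mode is whatever suffix your files use.

```javascript
const path = require("path");

// Hypothetical helper: builds the path of the .env file for a company and
// mode, e.g. environments/company1/.env.staging. Without a mode it falls
// back to the plain ".env" file in the company folder.
function resolveEnvFile(company, mode, baseDir = "environments") {
  if (!company) {
    // Fail early instead of silently reading environments/undefined/.env
    throw new Error("company is required (e.g. set VUE_APP_COMPANY)");
  }
  const file = mode ? `.env.${mode}` : ".env";
  return path.join(baseDir, company, file);
}
```

In vue.config.js it could then be used as dotenv.config({ path: resolveEnvFile(process.env.VUE_APP_COMPANY, process.env.NODE_ENV) }), which also makes a missing company name fail loudly.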
References
https://github.com/motdotla/dotenv#path
https://github.com/vuejs/vue-cli/issues/787
https://cli.vuejs.org/guide/mode-and-env.html#environment-variables
We have also used the accepted answer in the past, but I found a better solution for handling different environments. The npm package dotenv-flow not only supports different environments but also has further benefits, such as:
local overwriting of variables by using .env.local or .env.staging.local and so on
definition of defaults using .env.defaults
In combination we have set up our projects with this configuration:
.env
.env.defaults
.env.development
.env.production
.env.staging
.env.test
And the only thing you have to do in your vue.config.js, nuxt.config.js or other entry points is
require('dotenv-flow').config()
Reference: https://www.npmjs.com/package/dotenv-flow
The powershell solution
I was handling exactly the same problem. The accepted solution is kind of OK, but it did not cover all the differences between companies. Also, if you are using npm, your scripts can look nasty. So if you have PowerShell, here is what I suggest: get rid of the .env files :)
You can keep the structure you wanted in the question. Just convert the env files to ps1.
/build/company1/
    build-dev.ps1
    build-stage.ps1
    build-prod.ps1
/build/company2/
    build-dev.ps1
    build-stage.ps1
    build-prod.ps1
Inside each of those, you can fully customize all env variables, run build process and apply some advanced post-build logic (like careful auto-deploy, publishing, merging with api project, ..).
So for example company1\build-stage.ps1 can look like this:
# You can pass some arguments to the script
param (
    [string]$appName = "company1"
)
# Set environment variables for vue pipeline
$env:VUE_APP_ENVIRONMENT = "company1-stage";
$env:NODE_ENV="development";
$env:VUE_APP_NAME=$appName;
$env:VUE_APP_API_BASE_URL="https://company1.stage.mycompany.com"
# Run the vue pipeline build
vue-cli-service build;
# Any additional logic e.g.
# Copy-Item -Path "./dist" -Destination "my-server/my-app" -Recurse
The last part is easy: just call it (manually or from an integration service like TeamCity). Or you can put it inside package.json.
...
"scripts": {
  "build-company1-stage": "@powershell -Command ./build/company1/build-stage.ps1 -appName Company-One"
}
...
Then you can run the whole build process just by calling
npm run build-company1-stage
Similarly, you can create localhost, dev, prod, test and any other environment. Let the JavaScript handle the part of building the app itself. For other advanced work, use PowerShell. I think this solution gives you much more flexibility for configuration and the build process.
P.S.
I know that this way I'm merging configuration and build process, but you can always extract the configuration outside the file if it gets bigger.

Phalcon Dev Tools - Builder doesn't knows where is the models directory

I'm getting "Builder doesn't knows where is the models directory" error when I run the phalcon all-models command in both Command Line and Phalcon Webtools.
Please let me know what I am missing.
My webtools.config.php content
define('PTOOLS_IP', '216.174.134.2');
define('PTOOLSPATH', '/var/www/html/vendor/phalcon/devtools');
My webtools.php content
use Phalcon\Web\Tools;
require 'webtools.config.php';
require PTOOLSPATH . '/scripts/Phalcon/Web/Tools.php';
Tools::main(PTOOLSPATH, PTOOLS_IP);
My config.ini content
[database]
adapter = Mysql
host = localhost
username = test
password = test
dbname = test
[application]
controllersDir = ../app/controllers/
modelsDir = ../app/models/
viewsDir = ../app/views/
pluginsDir = ../app/plugins/
libraryDir = ../app/library/
cacheDir = ../app/cache/
baseUri = /
[models]
metadata.adapter = "Memory"
I have changed the modelsDir from ../app/models/ to /../app/models/ but it is still not working.
ANSWER FOUND:
Go to the project root directory and run the command (instructions)
# phalcon all-models --directory /var/www/html/projec_name
I specified --directory, which is the base path on which the project will be created.
Thank you colburton for helping me debug this problem. Much appreciated.
In the options array you pass to the builder you need to add 'modelsDir' with the correct path.
On this page you can find a video with the webtools. There is a tab for "Configuration", where you can set them.
It is also located in the config.ini under app/config
I am using Phalcon 3. After generating a project with phalcon console tool I encountered this error.
There is an easy way out to resolve this issue. Change the following settings in app/config/config.ini if you have one.
[application]
controllersDir = app/controllers/
modelsDir = app/models/
viewsDir = app/views/
pluginsDir = app/plugins/
libraryDir = app/library/
cacheDir = ../cache/

Using xDebug with Yii with custom folder structure

The problem is that breakpoints don't work in files that were moved out of the webroot.
When all files were under the webroot all was OK:
z:\home\mysite.dev\www\framework\
z:\home\mysite.dev\www\protected\
z:\home\mysite.dev\www\index.php
Because of the specifics of the project, I have moved /framework and /protected out of webroot:
z:\common\yii\framework\
z:\home\mysite.dev\protected\
z:\home\mysite.dev\www\index.php
And so now breakpoints in index.php and in some /framework files work, but others don't. It seems I need some tricky server path mapping for xdebug; can anybody give me a hint?
xdebug settings from php.ini:
zend_extension="\usr\local\php5\ext\php_xdebug-2.2.0-5.3-vc9.dll"
xdebug.auto_trace = 0
xdebug.default_enable = 1
xdebug.idekey = "PHPSTORM"
xdebug.manual_url = "http://www.php.net"
xdebug.remote_enable = 1
xdebug.remote_handler = "dbgp"
xdebug.remote_host = "localhost"
xdebug.remote_mode = "req"
xdebug.remote_port = 9000
Win7x64
Denwer3
PHPStorm 6.0.3