Scrapy : How to write a HttpProxyMiddleware? - scrapy

The docs of Scrapy says about HttpProxyMiddleware is like this:
This middleware sets the HTTP proxy to use for requests, by setting the proxy meta value for Request objects.
Like the Python standard library modules urllib and urllib2, it obeys the following environment variables:
http_proxy
https_proxy
no_proxy
You can also set the meta key proxy per-request, to a value like http://some_proxy_server:port or http://username:password#some_proxy_server:port. Keep in mind this value will take precedence over http_proxy/https_proxy environment variables, and it will also ignore no_proxy environment variable.
docs:https://doc.scrapy.org/en/latest/topics/downloader-middleware.html?highlight=Proxy#module-scrapy.downloadermiddlewares.httpproxy
But there are no examples in the docs.
I have no ideas how to write a HttpProxyMiddleware.
Are there any suggestions?

In settings.py just do this.
DOWNLOADER_MIDDLEWARES = {
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 100
}
And then while yielding each request do this
yield Request(meta={'proxy': "http://%s"%(random.choice(["IP:PORT", "IP:PORT"]))})
That's it!

You don't need to write one. HttpProxyMiddleware already exists in Scrapy.
As docs state, there are two ways of letting Scrapy know you need your requests to go through a proxy:
Setting environment variables
(e.g. from the command line)
export http_proxy="http://username:password#host:port"
You can also set the meta key "proxy" per-request, to a value like http://some_proxy_server:port or http://username:password#some_proxy_server:port.
Keep in mind this value will take precedence over http_proxy/https_proxy environment variables, and it will also ignore no_proxy environment variable
e.g.
yield Request("http://google.com", meta={'proxy':'http://username:password#some_proxy_server:port'}, callback=self.some_method)

Related

Flowable - concat dynamic variable and a string in http task (url)

I need to convert the base url according to the production and other environments.
I am using script task before a http task to perform this logic.
baseUrl = http://localhost:8080
baseUrl, is the output of the script task. Now I need to add this base url as a prefix in http task url
Url = ${baseUrl}/application/find (something like this).
I am getting the following issue
Unknown Property used in the expression ${baseUrl}/application/find
Script
var env = execution.getVariable("env")
if(env == "prod") {
var baseUrl = "http://localhost:8080";
execution.setVariable("baseUrl", baseUrl);
}
Please assist.
This typically means that it is unable to find a property in the expression (as the message says). The only expression you are using is baseUrl which means that the issue is around the baseUrl. The concatenation as you have done it is correct and doesn't need to have an adaption.
You should check if the variable really exists, this you can do by introducing a wait state before your HTTP task and check afterwards if the variable is created. Rather than using outputs, you can also use the Java API in your script task to create the variable:
execution.setVariable("baseUrl", "http://localhost:8080");
Assuming you are using Spring Boot, for your specific use-case it would be also an option to use the application.properties to specify your base-url and then refer to the baseUrl with the following expression:
${environment.getProperty("baseUrl")}/application/find
This will allow you to change the baseUrl independent of your process definition.

Postman: How to use global variable defined in Pre-request Script in Tests Script

I have defined one global variable in a Pre-request Script.
I want to compare this global variable with variable present in the response.
As the warning message says, you're running a very old version of Postman and it's probably the chrome extension.
This is now several major versions behind and the pm.* functionality is not included in that old version of the chrome extension.
Download the native application and start using the newest version of Postman. By not doing this, you're missing out on so many new features.
As #Danny mentioned, it is recommended to update to the latest version.
Now to your question, if you want to compare the global variable with workkard_number present in response, you need to first parse the response and get the workkard_number in it, which you can then compare with your global variable. You could try something like this in your test script:
var jsonData = JSON.parse(responseBody);
var responseWorkkardNumber = jsonData.wokkard_number;
You can retreive the workkard_number in the response like this(assuming that your response is a json with "workkard_number" as a key in it. Then you can compare it as follows:
tests["workkard_numbers are equal"] = responseWorkkardNumber === globals.workkard_number;
or
tests["workkard_numbers are equal"] = responseWorkkardNumber === pm.globals.get("workkard_number");
Also note - "Warning - Environment and global variables will always be stored as strings. If you're storing objects/arrays, be sure to JSON.stringify() them before storing, and JSON.parse() them while retrieving." - https://www.getpostman.com/docs/v6/postman/environments_and_globals/manage_environments

How to set Neo4J config keys in gremlin-scala?

When running a Neo4J database server standalone (on Ubuntu 14.04), configuration options are set for the global installation in etc/neo4j/neo4j.conf or possibly $NEO4J_HOME/conf/neo4j.conf.
However, when instantiating a Neo4j database from Java or Scala using Apache's Neo4jGraph class (org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph), there is no global installation, and the constructor does not (as far as I can tell) look for any configuration files.
In particular, when running the test suite for my application, I end up with many simultaneous instances of Neo4jGraph, which ends up throwing a java.net.BindException: Address already in use because all of these instances are trying to communicate over a small range of ports for online backup, which I don't actually need. These channels are set with config options dbms.backup.address (default value: 127.0.0.1:6362-6372) and dbms.backup.enabled (default value: true).
My problem would be solved by setting dbms.backup.enabled to false, or expanding the port range.
Things that have not worked:
Creating /etc/neo4j/neo4j.conf containing the line dbms.backup.enabled=false.
Creating the same file in my project's src/main/resources directory.
Creating the same file in src/main/resources/neo4j.
Manually setting the configuration property inside the Scala code:
val db = new Neo4jGraph(dataDirectory)
db.configuration.addProperty("dbms.backup.enabled",false)
or
db.configuration.addProperty("neo4j.conf.dbms.backup.enabled",false)
or
db.configuration.addProperty("gremlin.neo4j.conf.dbms.backup.enabled",false)
How should I go about setting this property?
Neo4jGraph configuration through TinkerPop is accomplished by a pass-through of configuration keys. In TinkerPop 3.x, that would mean that all Neo4j keys prefixed with gremlin.neo4j.conf that are provided via Configuration object to Neo4jGraph.open() or GraphFactory.open() will be passed down directly to the Neo4j instance. You can see examples of this here in the TinkerPop documentation on high availability configuration.
In TinkerPop 2.x, the same approach was taken however the key prefix was instead blueprints.neo4j.conf.* as discussed here.
Manipulating db.configuration after the database connection had already been opened was definitely futile.
stephen mallette's answer was on the right track, but this particular configuration doesn't appear to pass through in the way his linked example does. There is a naming mismatch between the configuration keys expected in neo4j.conf and those expected in org.neo4j.backup.OnlineBackupKernelExtension. Instead of dbms.backup.address and dbms.backup.enabled, that class looks for config keys online_backup_server and online_backup_enabled.
I was not able to get these keys passed down to the underlying Neo4jGraphAPI instance correctly. What I had to do, instead, was the following:
import org.neo4j.tinkerpop.api.impl.Neo4jFactoryImpl
import scala.collection.JavaConverters._
val factory = new Neo4jFactoryImpl()
val config = Map(
"online_backup_enabled" -> "true",
"online_backup_server" -> "0.0.0.0:6350-6359"
).asJava
val db = Neo4jGraph.open(factory.newGraphDatabase(dataDirectory,config))
With this initialization, the instance correctly listened for backups on port 6350; changing "true" to "false" disabled backup listening.
Using Neo4j 3.0.0 the following disables port listening for me (Java code)
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph;
BaseConfiguration conf = new BaseConfiguration();
conf.setProperty(Neo4jGraph.CONFIG_DIRECTORY, "/path/to/db");
conf.setProperty(Neo4jGraph.CONFIG_CONF + "." + "dbms.backup.enabled", "false");
graph = Neo4jGraph.open(config);

getting a "need project id error" in Keen

I get the following error:
Keen.delete(:iron_worker_analytics, filters: [{:property_name => 'start_time', :operator => 'eq', :property_value => '0001-01-01T00:00:00Z'}])
Keen::ConfigurationError: Keen IO Exception: Project ID must be set
However, when I set the value, I get the following:
warning: already initialized constant KEEN_PROJECT_ID
iron.io/env.rb:36: warning: previous definition of KEEN_PROJECT_ID was here
Keen works fine when I run the app and load the values from a env.rb file but from the console I cannot get past this.
I am using the ruby gem.
I figured it out. The documentation is confusing. Per the documentation:
https://github.com/keenlabs/keen-gem
The recommended way to set keys is via the environment. The keys you
can set are KEEN_PROJECT_ID, KEEN_WRITE_KEY, KEEN_READ_KEY and
KEEN_MASTER_KEY. You only need to specify the keys that correspond to
the API calls you'll be performing. If you're using foreman, add this
to your .env file:
KEEN_PROJECT_ID=aaaaaaaaaaaaaaa
KEEN_MASTER_KEY=xxxxxxxxxxxxxxx
KEEN_WRITE_KEY=yyyyyyyyyyyyyyy KEEN_READ_KEY=zzzzzzzzzzzzzzz If not,
make a script to export the variables into your shell or put it before
the command you use to start your server.
But I had to set it explicitly as Keen.project_id after doing a Keen.methods.
It's sort of confusing since from the docs, I assumed I just need to set the variables. Maybe I am misunderstanding the docs but it was confusing at least to me.

Rails 3 - how to use an environment-specific configuration in haml partial

Following the answer on How to define custom configuration variables in rails, I am trying to set up a configuration settings for different environments in config/environments/{env}.rb
e.g. in development.rb I set
config.elvis = 'alive'
and then in my haml template I can use this variable, e.g.
Elvis is #{Rails.configuration.elvis}.
However, when I want to wrap this in a condition:
- if Rails.configuration.elvis
<p>Elvis is #{Rails.configuration.elvis}</p>
it also works, but if the configuration isn't set, it throws a undefined method error.
If I try instead:
- if defined? Rails.configuration.elvis
<p>Elvis is #{Rails.configuration.elvis}</p>
it seems to always evaluate as false, even with the configuration defined.
Still very new to rails/ruby, so apologies if it's a very dumb question
You could use respond_to?:
- if Rails.configuration.respond_to?(:elvis)
<p>Elvis is #{Rails.configuration.elvis}</p>