Perl 6 IO::Socket::INET: how to set a timeout for a socket? (Raku)

my $c = IO::Socket::INET.new(:host<localhost>, :port(80));
$c.print: 'Test';
say $c.recv;
How can I set a timeout for IO::Socket::INET's recv?

See the timeout example in the Promise.anyof doc.
(See also Concurrent::Progress for the more general case of tracking progress.)

Related

JAX-RS: ability to set a global timeout (connect + read)

In JAX-RS (WebClient, for instance) we can set a connect timeout and a read timeout:
ClientConfiguration c = WebClient.getConfig(client);
HTTPConduit http = c.getHttpConduit();
HTTPClientPolicy httpClientPolicy = new HTTPClientPolicy();
httpClientPolicy.setConnectionTimeout(timeout);
httpClientPolicy.setReceiveTimeout(timeout);
httpClientPolicy.setAllowChunking(false);
http.setClient(httpClientPolicy);
I would like to set a timeout that covers both; I don't really care how much time is spent connecting or receiving, my requirement is to get a response within X seconds or discard the request.
There is no way with CXF to set a maximum timeout for a request that covers both the connection and receive durations. The maximum timeout for a request will be:
maximum_timeout = connection_timeout + receive_timeout
See this similar question for the Apache HTTP client. A workaround could be to set a timer in a separate thread that aborts the connection when the desired maximum timeout expires.

How does Redis pipelining work in pyredis?

I am trying to understand how pipelining in Redis works. According to one blog I read, for this code:
Pipeline pipeline = jedis.pipelined();
long start = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    pipeline.set("" + i, "" + i);
}
List<Object> results = pipeline.execute();
Every call to pipeline.set() effectively sends the SET command to Redis (you can easily see this by setting a breakpoint inside the loop and querying Redis with redis-cli). The call to pipeline.execute() is when the reading of all the pending responses happens.
So basically, when we use pipelining and execute a command like set above, the command gets executed on the server, but we don't collect the response until we call pipeline.execute().
However, according to the documentation of pyredis,
Pipelines are a subclass of the base Redis class that provide support for buffering multiple commands to the server in a single request.
I think this implies that when we use pipelining, all the commands are buffered and sent to the server only when we call pipe.execute(), so this behaviour is different from the behaviour described above.
Could someone please tell me what the right behaviour is when using pyredis?
This is not just a redis-py thing. In Redis, pipelining always means buffering a set of commands and then sending them to the server all at once. The main point of pipelining is to avoid extraneous network round trips, which are frequently the bottleneck when running commands against Redis. If each command were sent to Redis before the pipeline was run, this would not be the case.
You can test this in practice. Open up Python and:
import redis
r = redis.Redis()
p = r.pipeline()
p.set('blah', 'foo') # this buffers the command. it is not yet run.
r.get('blah') # pipeline hasn't been run, so this returns nothing.
p.execute()
r.get('blah') # now that we've run the pipeline, this returns "foo".
I did run the test that you described from the blog, and I could not reproduce the behaviour.
Setting breakpoints in the for loop, and running
redis-cli info | grep keys
does not show the size increasing after every set command.
Speaking of which, the code you pasted seems to be Java using Jedis (which I also used).
In the test I ran, and according to the documentation, there is no execute() method in Jedis, but there are exec() and sync() methods.
I did see the values being set in redis after the sync() command.
Besides, this question seems to go with the pyredis documentation, while the pasted code is Jedis.
Finally, the Redis documentation itself focuses on network optimization (quoting the example):
This time we are not paying the cost of RTT for every call, but just one time for the three commands.
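To see that round-trip cost from redis-py itself, here is a small, unscientific timing sketch; it assumes a local Redis on the default port, and the absolute numbers will of course vary:
import time
import redis

r = redis.Redis()
N = 10_000

start = time.perf_counter()
for i in range(N):
    r.set(f'plain:{i}', i)           # one network round trip per command
print('unpipelined:', time.perf_counter() - start)

start = time.perf_counter()
p = r.pipeline()
for i in range(N):
    p.set(f'piped:{i}', i)           # buffered client-side, nothing sent yet
p.execute()                          # sent in one shot, replies read together
print('pipelined:  ', time.perf_counter() - start)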
P.S. Could you get the link to the blog you read?

Is it possible to set a dynamic download delay in Scrapy?

I know that a constant delay can be set in
settings.py
DOWNLOAD_DELAY = 2
However, a 2-second delay is not efficient enough, and if I set DOWNLOAD_DELAY = 0,
the crawler is able to crawl about 10 pages; after that, the target site returns something like "you are requesting too frequently".
What I want to do is keep download_delay at 0, and once the "requesting too frequently" message is found in the HTML, change the delay to 2 seconds. After a while, switch it back to zero.
Is there any module that can do this, or any better idea for handling such a case?
Update:
I found that there is an extension called AutoThrottle,
but can it be customised with logic like this?
if (requesting too frequently) is found
increase the DOWNLOAD_DELAY
If, right after you get the anti-spider page, you can get a data page again within 2 seconds, then what you are asking for probably requires writing a downloader middleware
that checks for the anti-spider page, moves all scheduled requests to a renew queue, and starts a looping call when the spider is idle to pull requests from that queue (the looping interval is your hack for a new download delay). You then try to decide when the download delay is no longer necessary (this requires some testing), stop the looping call, and reschedule all the requests in the renew queue back to the Scrapy scheduler. You will need to use a Redis-backed queue in the case of a distributed crawl. A rough sketch of the detection part is shown below.
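The following is only a sketch of the detection half of such a middleware, assuming the ban message appears verbatim in the response body; the class name and ban text are made up for illustration, and the renew-queue / looping-call machinery described above is left out:
# Hypothetical downloader middleware that spots the anti-spider page and
# hands the request back to the scheduler instead of passing the ban page on.
class AntiBanMiddleware:
    BAN_TEXT = b"you are requesting too frequently"  # assumed ban marker

    def process_response(self, request, response, spider):
        if self.BAN_TEXT in response.body:
            spider.logger.warning("Anti-spider page hit: %s", response.url)
            # Returning a Request from process_response makes Scrapy
            # re-schedule it; dont_filter avoids the duplicates filter.
            return request.replace(dont_filter=True)
        return response
Enable it in settings.py via DOWNLOADER_MIDDLEWARES, e.g. {'myproject.middlewares.AntiBanMiddleware': 543}, where the path and priority are placeholders.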
With the download delay set to 0, in my experience throughput can easily go above 1000 items/min. If the anti-spider page pops up after 10 responses, then it is not worth the effort.
Instead, maybe you can try to find out how fast your target server allows you to crawl, maybe 1.5s, 1s, 0.7s, 0.5s, etc. Then maybe redesign your product to take into consideration the throughput your crawler can achieve.
You can use the AutoThrottle extension now. It is turned off by default. You can add these parameters to your project's settings.py file to enable it:
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 300
# The average number of requests Scrapy should be sending in parallel to
# each remote server
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = True
Yes, you can use the time module to set a dynamic delay:
import time

for i in range(10):
    # ... operation 1 ...
    time.sleep(i)   # pause for i seconds before continuing
    # ... operation 2 ...
Now there is a delay between operation 1 and operation 2.
Note:
the variable i is the delay in seconds.

Redis SET command can't fail, or can it?

I'm trying to debug some Redis issues I am experiencing and came across some inconclusive documentation about the SET command.
In my Redis config, I have the following lines (snippet):
# Note: with all the kind of policies, Redis will return an error on write
# operations, when there are not suitable keys for eviction.
#
# At the date of writing these commands are: set setnx setex append
On the documentation page for the SET command I found:
Status code reply: always OK since SET can't fail.
Any insights on the definitive behaviour?
tl;dr: SET will return an error response if the Redis instance runs out of memory.
As far as I can tell from the source code in redis.c, essentially when a command is to be processed the flow goes like this (pseudocode):
IF memory is needed
    IF we can free keys
        Free keys
        Process the command
        SET -> process and return OK response
    ELSE
        Return error response
ELSE
    Process the command
    SET -> process and return OK response
It's not exactly written this way, but the basic idea comes down to that: memory is checked before the command is processed, so even though the command itself cannot fail, an error response will be returned when there is no memory, regardless of the actual response of the command.
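A quick way to see this behaviour from a client is the following redis-py sketch; it assumes a throwaway local instance (it changes maxmemory, so do not run it against a Redis you care about), and the exact error text may differ between versions:
import redis

r = redis.Redis()
r.config_set('maxmemory', '1mb')                 # tiny memory limit
r.config_set('maxmemory-policy', 'noeviction')   # no keys may be evicted

try:
    for i in range(100_000):
        r.set('key:%d' % i, 'x' * 100)
except redis.exceptions.ResponseError as e:
    # Typically: "OOM command not allowed when used memory > 'maxmemory'."
    print('SET failed:', e)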

MySQL command timeout error

I am converting my database from SQL Server 2005 to MySQL using ASP.NET MVC.
I have bulk data in SQL Server (400k records), but I am facing a command timeout (waiting for CommandTimeout) error. From what I found on Google, CommandTimeout can be given 65535 as its highest value, or 0 if it should wait for an unlimited time.
Neither of these is working for me. I have also set ConnectTimeout to 180; should I change that too? Anybody who has faced this problem or has any confirmed knowledge, please share.
For me increasing the CommandTimeout fixed the issue.
Code sample:
//time in seconds
int timeOut = 300;
//create command
MySqlCommand myCommand = new MySqlCommand(stringSQL);
//set timeout
myCommand.CommandTimeout = timeOut;
Try sending commands in batches of 100 to 500 records; then there will be no need to raise the command timeout. Hope it works for you. A rough sketch of the batching idea is shown below.
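Purely to illustrate the chunking idea (the asker's code is C#, but this sketch uses Python with mysql-connector-python for brevity; the connection details, table and column names are placeholders):
import mysql.connector

BATCH_SIZE = 500

def insert_in_batches(rows):
    # rows is a list of (col1, col2) tuples; names are hypothetical.
    conn = mysql.connector.connect(host='localhost', user='user',
                                   password='secret', database='mydb')
    cur = conn.cursor()
    sql = "INSERT INTO records (col1, col2) VALUES (%s, %s)"
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        cur.executemany(sql, batch)   # one statement per batch
        conn.commit()                 # keep each transaction small
    cur.close()
    conn.close()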