Socket waiting indefinitely during GLORP + Postgres insert to DB - smalltalk

While inserting multiple records to DB ,i find that at times the thread waits indefinitely at Socket -> waitForDataIfClosed: where the readSemaphore is asked to wait. I am not too much into sockets, i would appreciate if any Pharo gurus can take a look at it.
I have objects derived from PersistentObject , and its session will always return a single instance of GlorpSession. A collection of such objects are itratively sent messages bePersistent and commitUnitOfWork.
myPersistantObjectCollection do:[:each | each bePersistent;
commitUnitOfWork.]
Would appreciate any sort of comments.
Thanks.

Related

Redis stream XReadGroup not reading new messages even if `BLOCK` parameter is 0

I am using redis stream and XReadGroup for reading messages from stream. I have set block parameter as 0.
currently my code look like this
data, err := w.rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
Group: w.opts.group,
Consumer: w.opts.consumer,
Streams: []string{w.opts.streamName, ">"},
Count: 1,
Block: 0,
}).Result()
I am currently facing a problem that if I keep the application (involving this code) idle for 10-12 hours, XReadGroup is not able to read new messages, if I restart the application then all the new messages consumed at once. Is there any solution for this problem?
You can have a block time of let's say 10s, it does not change anything (I guess the code you provided is in a while(true)).
From my experience you can keep the app idle for days and it still works.
I don't really know why but I guess it has to do with the "constant" back and forth "reseting" the connection.

ServerXmlHttpRequest hanging sometimes when doing a POST

I have a job that periodically does some work involving ServerXmlHttpRquest to perform an HTTP POST. The job runs every 60 seconds.
And normally it runs without issue. But there's about a 1 in 50,000 chance (every two or three months) that it will hang:
IXMLHttpRequest http = new ServerXmlHttpRequest();
http.open("POST", deleteUrl, false, "", "");
http.send(stuffToDelete); <---hang
When it hangs, not even the Task Scheduler (with the option enabled to kill the job if it takes longer than 3 minutes to run) can end the task. I have to connect to the remote customer's network, get on the server, and use Task Manager to kill the process.
And then its good for another month or three.
Eventually i started using Task Manager to create a process dump,
so i could analyze where the hang is. After five crash dumps (over the last 11 months or so) i get a consistent picture:
ntdll.dll!_NtWaitForMultipleObjects#20()
KERNELBASE.dll!_WaitForMultipleObjectsEx#20()
user32.dll!MsgWaitForMultipleObjectsEx()
user32.dll!_MsgWaitForMultipleObjects#20()
urlmon.dll!CTransaction::CompleteOperation(int fNested) Line 2496
urlmon.dll!CTransaction::StartEx(IUri * pIUri, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4453 C++
urlmon.dll!CTransaction::Start(const wchar_t * pwzURL, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4515 C++
msxml3.dll!URLMONRequest::send()
msxml3.dll!XMLHttp::send()
Contoso.exe!FrobImporter.TFrobImporter.DeleteFrobs Line 971
Contoso.exe!FrobImporter.TFrobImporter.ImportCore Line 1583
Contoso.exe!FrobImporter.TFrobImporter.RunImport Line 1070
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.HandleFrobImport Line 433
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.CoreExecute Line 71
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.Execute Line 84
Contoso.exe!Contoso.Contoso Line 167
kernel32.dll!#BaseThreadInitThunk#12()
ntdll.dll!__RtlUserThreadStart()
ntdll.dll!__RtlUserThreadStart#8()
So i do a ServerXmlHttpRequest.send, and it never returns. It will sit there for days (causing the system to miss financial transactions, until come Sunday night i get a call that it's broken).
It is of no help unless someone knows how to debug code, but the registers in the stalled thread at the time of the dump are:
EAX 00000030
EBX 00000000
ECX 00000000
EDX 00000000
ESI 002CAC08
EDI 00000001
EIP 732A08A7
ESP 0018F684
EBP 0018F6C8
EFL 00000000
Windows Server 2012 R2
Microsoft IIS/8.5
Default timeouts of ServerXmlHttpRequest
You can use serverXmlHttpRequest.setTimeouts(...) to configure the four classes of timeouts:
resolveTimeout: The value is applied to mapping host names (such as "www.microsoft.com") to IP addresses; the default value is infinite, meaning no timeout.
connectTimeout: A long integer. The value is applied to establishing a communication socket with the target server, with a default timeout value of 60 seconds.
sendTimeout: The value applies to sending an individual packet of request data (if any) on the communication socket to the target server. A large request sent to a server will normally be broken up into multiple packets; the send timeout applies to sending each packet individually. The default value is 30 seconds.
receiveTimeout: The value applies to receiving a packet of response data from the target server. Large responses will be broken up into multiple packets; the receive timeout applies to fetching each packet of data off the socket. The default value is 30 seconds.
The KB305053 (a server that decides to keep the connection open will cause serverXmlHttpRequest to wait for the connection to close) seems like it plausibly could be the issue. But the 30 second default timeout would have taken care of that.
Possible workaround - Add myself to a Job
The Windows Task Scheduler is unable to terminate the task; even though the option is enabled to do do.
I will look into using the Windows Job API to add my self process to a job, and use SetInformationJobObject to set a time limit on my process:
CreateJobObject
AssignProcessToJobObject
SetInformationJobObject
to limit my process to three minutes of execution time:
PerProcessUserTimeLimit
If LimitFlags specifies
JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process
user-mode execution time limit, in 100-nanosecond ticks. Otherwise,
this member is ignored.
The system periodically checks to determine
whether each process associated with the job has accumulated more
user-mode time than the set limit. If it has, the process is
terminated.
If the job is nested, the effective limit is the most
restrictive limit in the job chain.
Although since Task Scheduler uses Job objects to also limit a task's time, i'm not hopeful that the Job Object can limit a job either.
Edit: Job objects cannot limit a process by process time - only user time. And with a process idle waiting for an object, it will not accumulate any user time - certainly not three minutes worth.
Bonus Reading
How can a ServerXMLHTTP GET request hang? (GET, not POST)
KB305053: ServerXMLHTTP Stops Responding When You Send a POST Request (which says the timeout should expire; where mine does not)
MS Forums: oHttp.Send - Hangs (HEAD, not POST)
MS Forums: ASP to test SOAP WebService using MSXML2.ServerXMLHTTP Send hangs
CC to MS Support Forums
Consider switching to a newer, supported API.
msxml6.dll using MSXML2.ServerXMLHTTP.6.0
winhttpcom.dll using WinHttp.WinHttpRequest.5.1.
The msxml3.dll library is no longer supported and is only kept around for compatibility reasons. Plus, there were a number of security and stability improvements included with msxml4.dll (and newer) that you are missing out on.

Jedis getResource() is taking lot of time

I am trying to use sentinal redis to get/set keys from redis. I was trying to stress test my setup with about 2000 concurrent requests.
i used sentinel to put a single key on redis and then I executed 1000 concurrent get requests from redis.
But the underlying jedis used my sentinel is blocking call on getResource() (pool size is 500) and the overall average response time that I am achieving is around 500 ms, but my target was about 10 ms.
I am attaching sample of jvisualvm snapshot here
redis.clients.jedis.JedisSentinelPool.getResource() 98.02227 4.0845232601E7 ms 4779
redis.clients.jedis.BinaryJedis.get() 1.6894469 703981.381 ms 141
org.apache.catalina.core.ApplicationFilterChain.doFilter() 0.12820946 53424.035 ms 6875
org.springframework.core.serializer.support.DeserializingConverter.convert() 0.046286926 19287.457 ms 4
redis.clients.jedis.JedisSentinelPool.returnResource() 0.04444578 18520.263 ms 4
org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept() 0.035538 14808.45 ms 11430
May anyone help to debug further into the issue?
From JedisSentinelPool implementation of getResource() from Jedis sources (2.6.2):
#Override
public Jedis getResource() {
while (true) {
Jedis jedis = super.getResource();
jedis.setDataSource(this);
// get a reference because it can change concurrently
final HostAndPort master = currentHostMaster;
final HostAndPort connection = new HostAndPort(jedis.getClient().getHost(), jedis.getClient()
.getPort());
if (master.equals(connection)) {
// connected to the correct master
return jedis;
} else {
returnBrokenResource(jedis);
}
}
}
Note the while(true) and the returnBrokenResource(jedis), it means that it tries to get a jedis resource randomly from the pool that is indeed connected to the correct master and retries if it is not the good one. It is a dirty check and also a blocking call.
The super.getResource() call refers to JedisPool traditionnal implementation that is actually based on Apache Commons Pool (2.0). It does a lot to get an object from the pool, and I think it even repairs fail connections for instance. With a lot of contention on your pool, as probably in your stress test, it can probably take a lot of time to get a resource from the pool, just to see it is not connected to the correct master, so you end up calling it again, adding contention, slowing getting the resource etc...
You should check all the jedis instances in your pool to see if there's a lot of 'bad' connections.
Maybe you should give up using a common pool for your stress test (only create Jedis instances manually connected to the correct node, and close them nicely), or setting multiple ones to mitigate the cost of looking to "dirty" unchecked jedis resources.
Also with a pool of 500 jedis instances, you can't emulate 1000 concurrent queries, you need at least 1000.

Scalable delayed task execution with Redis

I need to design a Redis-driven scalable task scheduling system.
Requirements:
Multiple worker processes.
Many tasks, but long periods of idleness are possible.
Reasonable timing precision.
Minimal resource waste when idle.
Should use synchronous Redis API.
Should work for Redis 2.4 (i.e. no features from upcoming 2.6).
Should not use other means of RPC than Redis.
Pseudo-API: schedule_task(timestamp, task_data). Timestamp is in integer seconds.
Basic idea:
Listen for upcoming tasks on list.
Put tasks to buckets per timestamp.
Sleep until the closest timestamp.
If a new task appears with timestamp less than closest one, wake up.
Process all upcoming tasks with timestamp ≤ now, in batches (assuming
that task execution is fast).
Make sure that concurrent worker wouldn't process same tasks. At the same time, make sure that no tasks are lost if we crash while processing them.
So far I can't figure out how to fit this in Redis primitives...
Any clues?
Note that there is a similar old question: Delayed execution / scheduling with Redis? In this new question I introduce more details (most importantly, many workers). So far I was not able to figure out how to apply old answers here — thus, a new question.
Here's another solution that builds on a couple of others [1]. It uses the redis WATCH command to remove the race condition without using lua in redis 2.6.
The basic scheme is:
Use a redis zset for scheduled tasks and redis queues for ready to run tasks.
Have a dispatcher poll the zset and move tasks that are ready to run into the redis queues. You may want more than 1 dispatcher for redundancy but you probably don't need or want many.
Have as many workers as you want which do blocking pops on the redis queues.
I haven't tested it :-)
The foo job creator would do:
def schedule_task(queue, data, delay_secs):
# This calculation for run_at isn't great- it won't deal well with daylight
# savings changes, leap seconds, and other time anomalies. Improvements
# welcome :-)
run_at = time.time() + delay_secs
# If you're using redis-py's Redis class and not StrictRedis, swap run_at &
# the dict.
redis.zadd(SCHEDULED_ZSET_KEY, run_at, {'queue': queue, 'data': data})
schedule_task('foo_queue', foo_data, 60)
The dispatcher(s) would look like:
while working:
redis.watch(SCHEDULED_ZSET_KEY)
min_score = 0
max_score = time.time()
results = redis.zrangebyscore(
SCHEDULED_ZSET_KEY, min_score, max_score, start=0, num=1, withscores=False)
if results is None or len(results) == 0:
redis.unwatch()
sleep(1)
else: # len(results) == 1
redis.multi()
redis.rpush(results[0]['queue'], results[0]['data'])
redis.zrem(SCHEDULED_ZSET_KEY, results[0])
redis.exec()
The foo worker would look like:
while working:
task_data = redis.blpop('foo_queue', POP_TIMEOUT)
if task_data:
foo(task_data)
[1] This solution is based on not_a_golfer's, one at http://www.saltycrane.com/blog/2011/11/unique-python-redis-based-queue-delay/, and the redis docs for transactions.
You didn't specify the language you're using. You have at least 3 alternatives of doing this without writing a single line of code in Python at least.
Celery has an optional redis broker.
http://celeryproject.org/
resque is an extremely popular redis task queue using redis.
https://github.com/defunkt/resque
RQ is a simple and small redis based queue that aims to "take the good stuff from celery and resque" and be much simpler to work with.
http://python-rq.org/
You can at least look at their design if you can't use them.
But to answer your question - what you want can be done with redis. I've actually written more or less that in the past.
EDIT:
As for modeling what you want on redis, this is what I would do:
queuing a task with a timestamp will be done directly by the client - you put the task in a sorted set with the timestamp as the score and the task as the value (see ZADD).
A central dispatcher wakes every N seconds, checks out the first timestamps on this set, and if there are tasks ready for execution, it pushes the task to a "to be executed NOW" list. This can be done with ZREVRANGEBYSCORE on the "waiting" sorted set, getting all items with timestamp<=now, so you get all the ready items at once. pushing is done by RPUSH.
workers use BLPOP on the "to be executed NOW" list, wake when there is something to work on, and do their thing. This is safe since redis is single threaded, and no 2 workers will ever take the same task.
once finished, the workers put the result back in a response queue, which is checked by the dispatcher or another thread. You can add a "pending" bucket to avoid failures or something like that.
so the code will look something like this (this is just pseudo code):
client:
ZADD "new_tasks" <TIMESTAMP> <TASK_INFO>
dispatcher:
while working:
tasks = ZREVRANGEBYSCORE "new_tasks" <NOW> 0 #this will only take tasks with timestamp lower/equal than now
for task in tasks:
#do the delete and queue as a transaction
MULTI
RPUSH "to_be_executed" task
ZREM "new_tasks" task
EXEC
sleep(1)
I didn't add the response queue handling, but it's more or less like the worker:
worker:
while working:
task = BLPOP "to_be_executed" <TIMEOUT>
if task:
response = work_on_task(task)
RPUSH "results" response
EDit: stateless atomic dispatcher :
while working:
MULTI
ZREVRANGE "new_tasks" 0 1
ZREMRANGEBYRANK "new_tasks" 0 1
task = EXEC
#this is the only risky place - you can solve it by using Lua internall in 2.6
SADD "tmp" task
if task.timestamp <= now:
MULTI
RPUSH "to_be_executed" task
SREM "tmp" task
EXEC
else:
MULTI
ZADD "new_tasks" task.timestamp task
SREM "tmp" task
EXEC
sleep(RESOLUTION)
If you're looking for ready solution on Java. Redisson is right for you. It allows to schedule and execute tasks (with cron-expression support) in distributed way on Redisson nodes using familiar ScheduledExecutorService api and based on Redis queue.
Here is an example. First define a task using java.lang.Runnable interface. Each task can access to Redis instance via injected RedissonClient object.
public class RunnableTask implements Runnable {
#RInject
private RedissonClient redissonClient;
#Override
public void run() throws Exception {
RMap<String, Integer> map = redissonClient.getMap("myMap");
Long result = 0;
for (Integer value : map.values()) {
result += value;
}
redissonClient.getTopic("myMapTopic").publish(result);
}
}
Now it's ready to sumbit it into ScheduledExecutorService:
RScheduledExecutorService executorService = redisson.getExecutorService("myExecutor");
ScheduledFuture<?> future = executorService.schedule(new CallableTask(), 10, 20, TimeUnit.MINUTES);
future.get();
// or cancel it
future.cancel(true);
Examples with cron expressions:
executorService.schedule(new RunnableTask(), CronSchedule.of("10 0/5 * * * ?"));
executorService.schedule(new RunnableTask(), CronSchedule.dailyAtHourAndMinute(10, 5));
executorService.schedule(new RunnableTask(), CronSchedule.weeklyOnDayAndHourAndMinute(12, 4, Calendar.MONDAY, Calendar.FRIDAY));
All tasks are executed on Redisson node.
A combined approach seems plausible:
No new task timestamp may be less than current time (clamp if less). Assuming reliable NTP synch.
All tasks go to bucket-lists at keys, suffixed with task timestamp.
Additionally, all task timestamps go to a dedicated zset (key and score — timestamp itself).
New tasks are accepted from clients via separate Redis list.
Loop: Fetch oldest N expired timestamps via zrangebyscore ... limit.
BLPOP with timeout on new tasks list and lists for fetched timestamps.
If got an old task, process it. If new — add to bucket and zset.
Check if processed buckets are empty. If so — delete list and entrt from zset. Probably do not check very recently expired buckets, to safeguard against time synchronization issues. End loop.
Critique? Comments? Alternatives?
Lua
I made something similar to what's been suggested here, but optimized the sleep duration to be more precise. This solution is good if you have few inserts into the delayed task queue. Here's how I did it with a Lua script:
local laterChannel = KEYS[1]
local nowChannel = KEYS[2]
local currentTime = tonumber(KEYS[3])
local first = redis.call("zrange", laterChannel, 0, 0, "WITHSCORES")
if (#first ~= 2)
then
return "2147483647"
end
local execTime = tonumber(first[2])
local event = first[1]
if (currentTime >= execTime)
then
redis.call("zrem", laterChannel, event)
redis.call("rpush", nowChannel, event)
return "0"
else
return tostring(execTime - currentTime)
end
It uses two "channels". laterChannel is a ZSET and nowChannel is a LIST. Whenever it's time to execute a task, the event is moved from the the ZSET to the LIST. The Lua script with respond with how many MS the dispatcher should sleep until the next poll. If the ZSET is empty, sleep forever. If it's time to execute something, do not sleep(i e poll again immediately). Otherwise, sleep until it's time to execute the next task.
So what if something is added while the dispatcher is sleeping?
This solution works in conjunction with key space events. You basically need to subscribe to the key of laterChannel and whenever there is an add event, you wake up all the dispatcher so they can poll again.
Then you have another dispatcher that uses the blocking left pop on nowChannel. This means:
You can have the dispatcher across multiple instances(i e it's scaling)
The polling is atomic so you won't have any race conditions or double events
The task is executed by any of the instances that are free
There are ways to optimize this even more. For example, instead of returning "0", you fetch the next item from the zset and return the correct amount of time to sleep directly.
Expiration
If you can not use Lua scripts, you can use key space events on expired documents.
Subscribe to the channel and receive the event when Redis evicts it. Then, grab a lock. The first instance to do so will move it to a list(the "execute now" channel). Then you don't have to worry about sleeps and polling. Redis will tell you when it's time to execute something.
execute_later(timestamp, eventId, event) {
SET eventId event EXP timestamp
SET "lock:" + eventId, ""
}
subscribeToEvictions(eventId) {
var deletedCount = DEL eventId
if (deletedCount == 1) {
// move to list
}
}
This however has it own downsides. For example, if you have many nodes, all of them will receive the event and try to get the lock. But I still think it's overall less requests any anything suggested here.

How to write a transactional, multi-threaded WCF service consuming MSMQ

I have a WCF service that posts messages to a private, non-transactional MSMQ queue. I have another WCF service (multi-threaded) that processes the MSMQ messages and inserts them in the database.
My issue is with sequencing. I want the messages to be in certain order. For example MSG-A need to go to the database before MSG-B is inserted. So my current solution for that is very crude and expensive from database perspective.
I am reading the message, if its MSG-B and there is no MSG-A in the database, I throw it back on the message queue and I keep doing that till MSG-A is inserted in the database. But this is a very expensive operation as it involves table scan (SELECT stmt).
The messages are always posted to the queue in sequence.
Short of making my WCF Queue Processing service Single threaded (By setting the service behavior attribute InstanceContextMode to Single), can someone suggest a better solution?
Thanks
Dan
Instead of immediately pushing messages to the DB after taking them out of the queue, keep a list of pending messages in memory. When you get an A or B, check to see if the matching one is in the list. If so, submit them both (in the right order) to the database, and remove the matching one from the list. Otherwise, just add the new message to that list.
If checking for a match is too expensive a task to serialize - I assume you are multithreading for a reason - the you could have another thread process the list. The existing multiple threads read, immediately submit most messages to the DB, but put the As and Bs aside in the (threadsafe) list. The background thread scavenges through that list finding matching As and Bs and when it finds them it submits them in the right order (and removes them from the list).
The bottom line is - since your removing items from the queue with multiple threads, you're going to have to serialize somewhere, in order to ensure ordering. The trick is to minimize the number of times and length of time you spend locked up in serial code.
There might also be something you could do at the database level, with triggers or something, to reorder the entries when it detects this situation. I'm afraid I don't know enough about DB programming to help there.
UPDATE: Assuming the messages contain some id that lets you associate a message 'A' with the correct associated message 'B', the following code will make sure A goes in the database before B. Note that it does not make sure they are adjacent records in the database - there could be other messages between A and B. Also, if for some reason you get an A or B without ever receiving the matching message of the other type, this code will leak memory since it hangs onto the unmatched message forever.
(You could extract those two 'lock'ed blocks into a single subroutine, but I'm leaving it like this for clarity with respect to A and B.)
static private object dictionaryLock = new object();
static private Dictionary<int, MyMessage> receivedA =
new Dictionary<int, MyMessage>();
static private Dictionary<int, MyMessage> receivedB =
new Dictionary<int, MyMessage>();
public void MessageHandler(MyMessage message)
{
MyMessage matchingMessage = null;
if (IsA(message))
{
InsertIntoDB(message);
lock (dictionaryLock)
{
if (receivedB.TryGetValue(message.id, out matchingMessage))
{
receivedB.Remove(message.id);
}
else
{
receivedA.Add(message.id, message);
}
}
if (matchingMessage != null)
{
InsertIntoDB(matchingMessage);
}
}
else if (IsB(message))
{
lock (dictionaryLock)
{
if (receivedA.TryGetValue(message.id, out matchingMessage))
{
receivedA.Remove(message.id);
}
else
{
receivedB.Add(message.id, message);
}
}
if (matchingMessage != null)
{
InsertIntoDB(message);
}
}
else
{
// not A or B, do whatever
}
}
If you're the only client of those queues, you could very easy add a timestamp as a message header (see IDesign sample) and save the Sent On field (kinda like an outlook message) in the database as well. You could process them in the order they were sent (basically you move the sorting logic at the time of consumption).
Hope this helps,
Adrian