Using twisted to proxy memcache calls on attribute access '__getattribute__'

I was attempting to trigger twisted memcache calls from __getattribute__ and return the values to my objects. Is this possible? My thinking was that gatherResults waits for the call to succeed or fail and then returns the results. It does, but the interpreter returns a Deferred to whatever is accessing the attribute.
def __getattribute__(self, key):
    exempt = ['__providedBy__', '__class__', '__provides__', '__dict__', 'uid']
    if key in exempt:
        return object.__getattribute__(self, key)
    else:
        print key
        addr = IPv4Address('TCP', '127.0.0.1', 11211)
        mc_pool = MemCachePool(addr, maxClients=10)
        uid = object.__getattribute__(self, key)
        def return_res(res):
            return res
        deferred_calls = [mc_pool.get(key)]
        d = defer.gatherResults(deferred_calls, consumeErrors = True)
        d.addCallback(return_res)

Just a heads-up to anyone who comes across this: this approach doesn't, can't, won't, should never, and will never work. Twisted will not return a value to your blocking code. So if you run into such a problem, you need to rethink your approach. Twisted rocks in so many ways, just not this one.
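To illustrate the point, here is a minimal sketch of the shape that does work: the object exposes an explicit asynchronous method and the caller waits on the result itself. It uses the standard library's asyncio as a stand-in for Twisted so that it runs without extra dependencies; with Twisted the equivalent would be returning the Deferred and attaching a callback. FakeCache, Record and fetch are hypothetical names, not part of the original code.

```python
import asyncio


class FakeCache:
    """Hypothetical stand-in for an asynchronous memcache client."""

    def __init__(self, data):
        self._data = dict(data)

    async def get(self, key):
        await asyncio.sleep(0)  # yield to the event loop, like a network round trip
        return self._data.get(key)


class Record:
    """Exposes cached values through an explicit async method, not __getattribute__."""

    def __init__(self, cache):
        self._cache = cache

    async def fetch(self, key):
        # The caller receives an awaitable and has to wait for it explicitly.
        return await self._cache.get(key)


async def main():
    record = Record(FakeCache({"uid": "42"}))
    return await record.fetch("uid")


result = asyncio.run(main())
```

The value only becomes available inside code that is itself asynchronous (main here), which is why a plain attribute access cannot hand it back.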

NextFlow: How to fail if channel is empty ( .ifEmpty() )

I'd like for my NextFlow pipeline to fail if a specific channel is empty because, as is, the pipeline will continue as though nothing is wrong, but the process depending on the channel never starts. The answer to a related post states that we generally shouldn't check if a channel is empty, but I'm not sure how else to handle this.
The issue I'm having in the below example is that it always fails, but the process is called if I comment out the .ifEmpty() statement.
Here's a basic example:
/*
 * There are .cram files in this folder
 */
params.input_sample_folder = 'path/to/folder/*'
samples = Channel.fromPath(params.input_sample_folder, checkIfExists: true)
    .filter( ~/.*(\.sam|\.bam|\.cram)/ )
    .ifEmpty( exit 1,
        "ERROR: Did not find any samples in ${params.input_sample_folder}")
workflow{
    PROCESS_SAMPLES( samples )
}
Ultimate questions:
My guess is that the channel does not fill immediately. Is that true? If so, when does it fill?
How should I handle this situation? I want to fail if the channel doesn't get populated. e.g., I was surprised to learn that the channel remains empty if I only provide a folder path without a glob/wildcard character (/path/to/folder/; no * or *.cram, etc.). I don't think I can handle it in the process itself, because the process never gets called if the channel is legitimately empty.
Really appreciate your help.
Setting checkIfExists: true will actually throw an exception for you if the specified files do not exist on your file system. The trick is to specify the files you need when you create the channel, rather than filtering for them downstream. For example, all you need is:
params.input_sample_folder = 'path/to/folder'
samples = Channel.fromPath(
    "${params.input_sample_folder}/*.{sam,bam,cram}",
    checkIfExists: true,
)
Or, arguably better, since this gives the user full control over the input files:
params.input_sample_files = 'path/to/folder/*.{sam,bam,cram}'
samples = Channel.fromPath( params.input_sample_files, checkIfExists: true )
Either way, both will have your pipeline fail with exit status 1 and the following message in red when no matching files exist:
No files match pattern `*.{sam,bam,cram}` at path: path/to/folder/
As per the docs, the ifEmpty operator is really just intended to emit a default value when a channel becomes empty. To avoid having to check if a channel is empty, the general solution is to avoid creating an empty channel in the first place. There are lots of ways to do this, but one way might look like:
import org.apache.log4j.Logger
nextflow.enable.dsl=2
def find_sample_files( input_dir ) {
    def pattern = ~/.*(\.sam|\.bam|\.cram)/
    def results = []
    input_dir.eachFileMatch(pattern) { item ->
        results.add( item )
    }
    return results
}
params.input_sample_folder = 'path/to/folder'
workflow {
    input_sample_folder = file( params.input_sample_folder )
    input_sample_files = find_sample_files( input_sample_folder )
    if ( !input_sample_files ) {
        log.error("ERROR: Did not find any samples in ${params.input_sample_folder}")
        System.exit(1)
    }
    sample_files = Channel.of( input_sample_files )
    sample_files.view()
}

gRPC + Thread local issue

I'm building a gRPC server with Python and trying to have some thread-local storage handled with werkzeug Local and LocalProxy, similar to what Flask does.
The problem I'm facing is that, when I store some data in the local from a server interceptor, and then try to retrieve it from the servicer, the local is empty. The real problem is that for some reason, the interceptor runs in a different greenlet than the servicer, so it's impossible to share data across a request since the werkzeug.local.storage ends up with different keys for the data that is supposed to belong to the same request.
The same happens using the Python threading library: it looks like the interceptors are run from the main thread, or at least from a different thread than the servicers. Is there a workaround for this? I would have expected interceptors to run in the same thread, thus allowing for this sort of thing.
# Define a global somewhere
from werkzeug.local import Local
local = Local()
# from an interceptor save something
local.message = "test msg"
# from the service access it
local.service_var = "test"
print local.message # this throws an AttributeError
# print the content of local
print local.__storage__ # we have 2 entries in the storage, 2 different greenlets, but we are in the same request.
The interceptor is indeed run on the serving thread, which is different from the handling thread. The serving thread is in charge of serving servicers and intercepting servicer handlers. After the servicer method handler is returned by the interceptors, the serving thread submits it to the thread_pool at _server.py#L525:
# Take unary unary call as an example.
# The method_handler is the returned object from interceptor.
def _handle_unary_unary(rpc_event, state, method_handler, thread_pool):
    unary_request = _unary_request(rpc_event, state,
                                   method_handler.request_deserializer)
    return thread_pool.submit(_unary_response_in_pool, rpc_event, state,
                              method_handler.unary_unary, unary_request,
                              method_handler.request_deserializer,
                              method_handler.response_serializer)
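The effect of that hand-off can be reproduced with only the standard library. The following is a rough sketch, not gRPC code: the module-level assignment plays the role of the interceptor on the serving thread, and handler (a hypothetical name) plays the role of the servicer method running in the pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()


def handler():
    # Runs on a pool thread, so it sees that thread's (empty) view of `local`.
    return getattr(local, "message", None)


# "Interceptor" step: runs on the submitting thread, like gRPC's serving thread.
local.message = "test msg"

with ThreadPoolExecutor(max_workers=1) as pool:
    seen_by_handler = pool.submit(handler).result()

seen_by_interceptor = local.message
```

Here seen_by_handler is None while seen_by_interceptor still holds the message, which matches the two separate storage entries observed in the question.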
As for a workaround, I can only imagine passing a storage instance to both the interceptor and the servicer during initialization. After that, the storage can be used as a member variable.
class StorageServerInterceptor(grpc.ServerInterceptor):
    def __init__(self, storage):
        self._storage = storage
    def intercept_service(self, continuation, handler_call_details):
        key = ...
        value = ...
        self._storage.set(key, value)
        ...
        return continuation(handler_call_details)
class Storage(...StorageServicer):
    def __init__(self, storage):
        self._storage = storage
    ...Servicer Handlers...
You can also wrap all the functions that will be called and set the threading local there, and return a new handler with the wrapped functions.
import threading
import grpc

# Create the thread-local once at module level (the name _local is illustrative).
# Calling threading.local() inside the wrapper would create a new object
# each time, and the value stored on it would be lost immediately.
_local = threading.local()

class MyInterceptor(grpc.ServerInterceptor):
    def wrap_handler(self, original_handler: grpc.RpcMethodHandler):
        if original_handler.unary_unary is not None:
            unary_unary = original_handler.unary_unary
            def wrapped_unary_unary(*args, **kwargs):
                # Runs on the handling thread, so the servicer can read the value.
                _local.my_var = "hello"
                return unary_unary(*args, **kwargs)
            new_unary_unary = wrapped_unary_unary
        else:
            new_unary_unary = None
        ...
        # do this for all the combinations to make new_unary_stream, new_stream_unary, new_stream_stream
        new_handler = grpc.RpcMethodHandler()
        new_handler.request_streaming=original_handler.request_streaming
        new_handler.response_streaming=original_handler.response_streaming
        new_handler.request_deserializer=original_handler.request_deserializer
        new_handler.response_serializer=original_handler.response_serializer
        new_handler.unary_unary=new_unary_unary
        new_handler.unary_stream=new_unary_stream
        new_handler.stream_unary=new_stream_unary
        new_handler.stream_stream=new_stream_stream
        return new_handler
    def intercept_service(self, continuation, handler_call_details):
        return self.wrap_handler(continuation(handler_call_details))
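The wrapping idea can be tried out without gRPC. This is a minimal sketch using a plain thread pool; wrap, handler and request_local are hypothetical names. One assumption worth stating: the thread-local object has to be created once and shared between the wrapper and the handler, because a threading.local() created inside the wrapper would be a fresh object that nothing else can reach.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

request_local = threading.local()  # created once, shared by wrapper and handler


def wrap(handler, value):
    """Return a function that sets the thread-local, then calls the handler."""
    def wrapped(*args, **kwargs):
        request_local.my_var = value  # executed on the worker thread
        return handler(*args, **kwargs)
    return wrapped


def handler(name):
    # Reads the value that the wrapper stored on this same thread.
    return "%s sees %s" % (name, request_local.my_var)


with ThreadPoolExecutor(max_workers=2) as pool:
    outcome = pool.submit(wrap(handler, "hello"), "handler").result()
```

Because the assignment happens inside the wrapped function, it runs on whichever pool thread executes the handler, so both see the same thread-local slot.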

Idiomatic approach to conditional update of key

I'd like to use Redis to cache the most recent piece of data that a user has sent to me. However, I can't just use SET, because the user may send data out of order. I need to make the SET conditional on the value of another key, e.g.:
latest_timestamp = GET "latest_timestamp:<new_data.user_id>"
if latest_timestamp < new_data.timestamp {
    SET "latest_timestamp:<new_data.user_id>" new_data.timestamp
    SET "latest_data:<new_data.user_id>" new_data.to_string()
}
What is the idiomatic way to handle this situation?
A server-side Lua script (see EVAL) is the idiomatic-est approach IMO.
Make sure that your code passes the full names (i.e. does all substitutions) of both keys, as well as the new timestamp and the new data as arguments. The script should look something like this:
local lts = tonumber(redis.call('GET', KEYS[1]))
local nts = tonumber(ARGV[1])
-- GET yields false for a missing key, so lts is nil on the first write
if lts == nil or lts < nts then
    redis.call('SET', KEYS[1], nts)
    redis.call('SET', KEYS[2], ARGV[2])
end
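To make the intended behaviour concrete, here is the same compare-then-set logic modelled in plain Python, with a dict standing in for Redis. It is only a sketch of the semantics: update_if_newer is a hypothetical helper, and the dict version does not give the atomicity that running the script server-side provides.

```python
def update_if_newer(store, user_id, timestamp, data):
    """Store data only if timestamp is newer than the one already recorded."""
    ts_key = "latest_timestamp:%s" % user_id
    data_key = "latest_data:%s" % user_id
    latest = store.get(ts_key)
    # A missing key counts as "older than anything", so the first write succeeds.
    if latest is None or latest < timestamp:
        store[ts_key] = timestamp
        store[data_key] = data
        return True
    return False


store = {}  # plain dict standing in for Redis
first = update_if_newer(store, "u1", 200, "new")
late = update_if_newer(store, "u1", 100, "old")  # arrives out of order
```

After both calls the store still holds the data with timestamp 200, because the late arrival is rejected.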

Persistent connection in twisted

I'm new to Twisted and have one question. How can I organize a persistent connection in Twisted? I have a queue and check it every second. If it has new items, I send them to the client. I can't find anything better than calling dataReceived every second.
Here is the code of Protocol implementation:
class SyncProtocol(protocol.Protocol):
    # ... some code here
    def dataReceived(self, data):
        if(self.orders_queue.has_new_orders()):
            for order in self.orders_queue:
                self.transport.write(str(order))
        reactor.callLater(1, self.dataReceived, data) # 1 second delay
It works the way I need, but I'm sure it is a very bad solution. How can I do this differently (flexibly and correctly)? Thanks.
P.S. - the main idea and algorithm:
1. The client connects to the server and waits
2. The server checks for updates and pushes data to the client if anything changes
3. The client does some operations and then waits for more data
Without knowing how the snippet you provided links into your internet.XXXServer or reactor.listenXXX (or XXXXEndpoint calls), it's hard to make heads or tails of it, but...
First off, in normal use, a twisted protocol.Protocol's dataReceived would only be called by the framework itself. It would be linked to a client or server connection directly or via a factory, and it would be called automatically as data comes into the given connection. (The vast majority of twisted protocols and interfaces, if not all, are interrupt based, not polling/callLater; that's part of what makes Twisted so CPU efficient.)
So if your shown code is actually linked into Twisted via a Server or listen or Endpoint to your clients then I think you will find very bad things will happen if your clients ever send data (... because twisted will call dataReceived for that, which (among other problems) would add extra reactor.callLater callbacks and all sorts of chaos would ensue...)
If instead the code isn't linked into the twisted connection framework, then you're attempting to reuse twisted classes in a space they aren't designed for (... I guess this seems unlikely because I don't know how non-connection code would learn of a transport, unless you're setting it manually...)
The way I've been building models like this is to make a completely separate class for the polling-based I/O. After I instantiate it, I push my client-list (server) factory into the polling instance (something like mypollingthing.servfact = myserverfactory), thereby giving my polling logic a way to call into the clients' .write (or, more likely, a def I built to abstract to the correct level for my polling logic).
I tend to take the examples in Krondo's Twisted Introduction as one of the canonical examples of how to do twisted (other than twisted matrix), and the example in part 6, under "Client 3.0", PoetryClientFactory has an __init__ that sets a callback in the factory.
If I try to blend that with the twistedmatrix chat example and a few other things, I get:
(You'll want to change sendToAll to whatever your self.orders_queue.has_new_orders() is about)
#!/usr/bin/python
from twisted.internet import task
from twisted.internet import reactor
from twisted.internet.protocol import Protocol, ServerFactory

class PollingIOThingy(object):
    def __init__(self):
        self.sendingcallback = None # Note I'm pushing sendToAll into here in main
        self.iotries = 0
    def pollingtry(self):
        self.iotries += 1
        print "Polling runs: " + str(self.iotries)
        if self.sendingcallback:
            self.sendingcallback("Polling runs: " + str(self.iotries) + "\n")

class MyClientConnections(Protocol):
    def connectionMade(self):
        print "Got new client!"
        self.factory.clients.append(self)
    def connectionLost(self, reason):
        print "Lost a client!"
        self.factory.clients.remove(self)

class MyServerFactory(ServerFactory):
    protocol = MyClientConnections
    def __init__(self):
        self.clients = []
    def sendToAll(self, message):
        for c in self.clients:
            c.transport.write(message)

def main():
    client_connection_factory = MyServerFactory()
    polling_stuff = PollingIOThingy()
    # the following line is what this example is all about:
    polling_stuff.sendingcallback = client_connection_factory.sendToAll
    # push the client connections send def into my polling class
    # if you want to run something every second (instead of 1 second after
    # the end of your last code run, which could vary) do:
    l = task.LoopingCall(polling_stuff.pollingtry)
    l.start(1.0)
    # from: https://twistedmatrix.com/documents/12.3.0/core/howto/time.html
    reactor.listenTCP(5000, client_connection_factory)
    reactor.run()

if __name__ == '__main__':
    main()
To be fair, it might be better to inform PollingIOThingy of the callback by passing it as an arg to its __init__ (that is what is shown in Krondo's docs). For some reason, I tend to miss connections like this when I read code and find class-cheating easier to see, but that may just be my personal brain-damage.
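For comparison, the constructor-argument variant mentioned above might look roughly like this. It is a sketch that leaves Twisted out so it can run on its own: FakeFactory is a hypothetical stand-in for MyServerFactory that records messages instead of writing them to client transports.

```python
class PollingIOThingy(object):
    def __init__(self, sendingcallback):
        # The callback is required up front, so the dependency is visible
        # wherever the object is constructed.
        self.sendingcallback = sendingcallback
        self.iotries = 0

    def pollingtry(self):
        self.iotries += 1
        self.sendingcallback("Polling runs: %d\n" % self.iotries)


class FakeFactory(object):
    """Hypothetical stand-in for MyServerFactory; records instead of writing."""

    def __init__(self):
        self.sent = []

    def sendToAll(self, message):
        self.sent.append(message)


factory = FakeFactory()
poller = PollingIOThingy(factory.sendToAll)
poller.pollingtry()
poller.pollingtry()
```

In the real server, poller.pollingtry would be handed to task.LoopingCall exactly as in main() above, in place of the two manual calls.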

Do I use Option as result when fetching an object from the database with an Id?

I have made a definition which fetches a user from the database.
def user(userId: Int) : User = database withSession {
  (for{
    u <- Users if u.id === userId}
    yield u).first
}
Potentially the database could return an empty list if used with a non-existing userId.
However I can't see when a non existing userId would be provided. For example my userId is fetched from the logged in user. And if a non existing userId is provided then I think it's ok to fail the request hard.
Any thoughts?
No, it's not OK to fail the request hard:
def user(userId: Int) : Option[User] // is OK
def user(userId: Int) : Either[String,User] // is OK
def user(userId: Int) : User // is not OK
Or else you could create a type (a concept) which encapsulates an Integer and makes sure it's a valid UserId (at birth).
sealed case class UserId(u:Int) //extends AnyVal // If it's scala 2.10.0
object UserId {
  def get(i:Int) : Option[UserId] = //some validation
} /// ....
def user(userId:UserId) : User //is OK // well it depends on the semantic of user destruction.
When you make a def, you must make sure there is a proper relation between the domain (this and args) of your function and the codomain (result).
Anyways, do not hesitate to type (create concepts), it will help you to reason about your code.
Why is def user(userId: Int) : User not OK?
Because a relation from the elements of Integer to the elements of User doesn't exist. What if UserIds are all positive integers, but you ask for user(-10)? (It won't happen, right?) Should this call raise an exception? Or return null?
If you think it should return null, then return an Option; it encapsulates the potentially missing correspondence.
If you think it should raise an exception, then return:
a Validation[SomethingRepresentingAnError, User] (scalaz),
an Either[SomethingRepresentingAnError, User] (scala 2.7, 2.8, 2.9)
or a Try[User] (scala 2.10)
Having rich return types will help you to use your API correctly.
Btw, Scala doesn't use checked exceptions, so you cannot use an exception as an alternative result. Exceptions should be kept for truly exceptional behaviour (as Runtime Exceptions).
See also :
http://www.scala-lang.org/api/current/index.html#scala.util.control.Exception$
I think it's always a good idea to return Option[] when fetching data by id. You cannot be sure that a user with such an id exists, e.g. another request has deleted this user, or somebody was trying to tamper with your input data. The database is an external system to your application, and if you know how to recover from such failures then you should do it, especially in Scala, where Option is a good tool for such a task.
Option is the most minimalistic way to represent the return value from some computation that may fail. Throwing exceptions or returning null are acceptable only when dealing with Java code and your hands are somehow tied by an existing API (and when your code is being called from Java code).
The next step up from Option would be Either[FailureIndication, SuccessValue].
A further improvement is ScalaZ's Validation.
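The same idea carries over to other languages. As a rough analogue in Python (not Scala, and not the original Slick code), an Optional return type plays the role of Option. User, USERS, find_user and greeting are hypothetical names, and the dict stands in for the database table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class User:
    id: int
    name: str


USERS: Dict[int, User] = {1: User(1, "alice")}  # stand-in for the database table


def find_user(user_id: int) -> Optional[User]:
    """Return the user, or None when no row matches the id."""
    return USERS.get(user_id)


def greeting(user_id: int) -> str:
    user = find_user(user_id)
    if user is None:  # the caller has to decide what a miss means
        return "unknown user"
    return "hello, " + user.name
```

The return type tells every caller that a lookup can miss, so the decision about how to handle it is made at the call site rather than by an exception surfacing somewhere else.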