I am trying to understand the client part of the SSH protocol, referring to the Paramiko library, which is a pure-Python implementation of the SSH protocol. The corresponding code can be found here.
def _send_kex_init(self):
    """
    announce to the other side that we'd like to negotiate keys, and what
    kind of key negotiation we support.
    """
    self.clear_to_send_lock.acquire()
    try:
        self.clear_to_send.clear()
    finally:
        self.clear_to_send_lock.release()
    self.in_kex = True
    if self.server_mode:
        mp_required_prefix = 'diffie-hellman-group-exchange-sha'
        kex_mp = [k for k in self._preferred_kex if k.startswith(mp_required_prefix)]
        if (self._modulus_pack is None) and (len(kex_mp) > 0):
            # can't do group-exchange if we don't have a pack of potential primes
            pkex = [k for k in self.get_security_options().kex
                    if not k.startswith(mp_required_prefix)]
            self.get_security_options().kex = pkex
        available_server_keys = list(filter(list(self.server_key_dict.keys()).__contains__,
                                            self._preferred_keys))
    else:
        available_server_keys = self._preferred_keys

    m = Message()
    m.add_byte(cMSG_KEXINIT)
    m.add_bytes(os.urandom(16))
    m.add_list(self._preferred_kex)
    m.add_list(available_server_keys)
    m.add_list(self._preferred_ciphers)
    m.add_list(self._preferred_ciphers)
    m.add_list(self._preferred_macs)
    m.add_list(self._preferred_macs)
    m.add_list(self._preferred_compression)
    m.add_list(self._preferred_compression)
    m.add_string(bytes())
    m.add_string(bytes())
    m.add_boolean(False)
    m.add_int(0)
    # save a copy for later (needed to compute a hash)
    self.local_kex_init = m.asbytes()
    self._send_message(m)
The SSH protocol allows both parties to implement different sets of algorithms for the incoming and outgoing directions. Practically speaking, it allows a party to implement a specific algorithm for only one of the directions.
Paramiko implements all of its algorithms for both directions, so it populates the lists of incoming and outgoing algorithms with the same set.
See the code in the _parse_kex_init method, which parses the packet populated by the code you refer to:
client_encrypt_algo_list = m.get_list()
server_encrypt_algo_list = m.get_list()
...
agreed_local_ciphers = list(
    filter(
        client_encrypt_algo_list.__contains__,
        self._preferred_ciphers,
    )
)
agreed_remote_ciphers = list(
    filter(
        server_encrypt_algo_list.__contains__,
        self._preferred_ciphers,
    )
)
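To make the selection rule concrete, here is a minimal sketch (not Paramiko's actual code) of how RFC 4253, section 7.1 resolves the negotiation: for each direction independently, the winner is the first algorithm in the client's preference list that the server also supports.

def negotiate(client_list, server_list):
    # RFC 4253, section 7.1: pick the first client-preferred algorithm
    # that also appears in the server's list.
    for algo in client_list:
        if algo in server_list:
            return algo
    raise ValueError("no matching algorithm")

# Each direction is negotiated independently, so the agreed ciphers
# may differ between client-to-server and server-to-client:
c2s = negotiate(['aes128-ctr', '3des-cbc'], ['3des-cbc', 'aes128-ctr'])
s2c = negotiate(['3des-cbc', 'aes128-ctr'], ['aes128-ctr'])
print(c2s)  # aes128-ctr (client's first preference; the server supports it)
print(s2c)  # aes128-ctr (the server doesn't offer 3des-cbc in this direction)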
I'd like for my Nextflow pipeline to fail if a specific channel is empty because, as is, the pipeline will continue as though nothing is wrong, but the process depending on the channel never starts. The answer to a related post states that we generally shouldn't check if a channel is empty, but I'm not sure how else to handle this.
The issue I'm having in the below example is that it always fails, but the process is called if I comment out the .ifEmpty() statement.
Here's a basic example:
/*
 * There are .cram files in this folder
 */
params.input_sample_folder = 'path/to/folder/*'
samples = Channel.fromPath(params.input_sample_folder, checkIfExists: true)
    .filter( ~/.*(\.sam|\.bam|\.cram)/ )
    .ifEmpty( exit 1, "ERROR: Did not find any samples in ${params.input_sample_folder}" )

workflow {
    PROCESS_SAMPLES( samples )
}
Ultimate questions:
My guess is that the channel does not fill immediately. Is that true? If so, when does it fill?
How should I handle this situation? I want to fail if the channel doesn't get populated. e.g., I was surprised to learn that the channel remains empty if I only provide a folder path without a glob/wildcard character (/path/to/folder/; no * or *.cram, etc.). I don't think I can handle it in the process itself, because the process never gets called if the channel is legitimately empty.
Really appreciate your help.
Setting checkIfExists: true will actually throw an exception for you if the specified files do not exist on your file system. The trick is to specify the files you need when you create the channel, rather than filtering for them downstream. For example, all you need is:
params.input_sample_folder = 'path/to/folder'
samples = Channel.fromPath(
    "${params.input_sample_folder}/*.{sam,bam,cram}",
    checkIfExists: true,
)
Or, arguably better, since this gives the user full control over the input files:
params.input_sample_files = 'path/to/folder/*.{sam,bam,cram}'
samples = Channel.fromPath( params.input_sample_files, checkIfExists: true )
Either way, both will have your pipeline fail with exit status 1 and the following message in red when no matching files exist:
No files match pattern `*.{sam,bam,cram}` at path: path/to/folder/
As per the docs, the ifEmpty operator is really just intended to emit a default value when a channel becomes empty. To avoid having to check whether a channel is empty, the general solution is to just avoid creating an empty channel in the first place. There are lots of ways to do this, but one way might look like:
import org.apache.log4j.Logger

nextflow.enable.dsl=2

def find_sample_files( input_dir ) {
    def pattern = ~/.*(\.sam|\.bam|\.cram)/
    def results = []
    input_dir.eachFileMatch(pattern) { item ->
        results.add( item )
    }
    return results
}

params.input_sample_folder = 'path/to/folder'

workflow {
    input_sample_folder = file( params.input_sample_folder )
    input_sample_files = find_sample_files( input_sample_folder )
    if ( !input_sample_files ) {
        log.error("ERROR: Did not find any samples in ${params.input_sample_folder}")
        System.exit(1)
    }
    sample_files = Channel.of( input_sample_files )
    sample_files.view()
}
I'm new to Cytoscape. I want to know how I can run an app (for example, the MCL clustering algorithm) multiple times with different parameters in Cytoscape. Is there any way to write a script to do that, instead of running it manually multiple times for different parameters?
Thanks!
Thanks, Scooter. I saw his answer. I still had a problem with MCODE.
I figured it out by reading the paper "Cytoscape Automation: empowering workflow-based network analysis".
I want to put the script here in case somebody else has the same question.
From Python you need to import:
import requests, json
import numpy
REST_ENDPOINT = 'http://localhost:1234'
and then, let's say, we want to use the affinity propagation clustering algorithm. You can go to Help -> Automation -> CyREST Command API; there you can find the app and all of its parameters. You load the input network in Cytoscape at the beginning.
counter = 0
ap_clusters = dict()
for i in numpy.arange(-1.0, 1.1, 0.1):
    message_body = {
        "preference": str(round(i, 1))
    }
    response = requests.post(REST_ENDPOINT + '/v1/commands/cluster/ap',
                             data=json.dumps(message_body),
                             headers={'Content-Type': 'application/json'})
    response_data = response.json()['data']
    ap_clusters[counter] = response_data['clusters']
    counter += 1
The above code calls AP clustering multiple times from Python.
For AP and MCL the code works for multiple parameters. However, when I tried to call MCODE with different sets of parameters, it dropped the connection and closed the Cytoscape app. It can only run for one set of parameters.
This is the error:
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Here is the code for the MCODE algorithm:
counter = 0
mcode_clusters = dict()
for i in numpy.arange(3, 6, 1):
    for j in numpy.arange(0.1, 0.56, 0.05):  # ---- vertex weight percentage
        for h in ["on", "off"]:
            for f in ["on", "off"]:
                if f == "on":
                    for p in [0, 0.1, 0.2]:  # --- fluffing percentage
                        message_body = {
                            "fluff": f,
                            "fluffNodeDensityCutoff": str(round(p, 1)),
                            "haircut": h,
                            "maxDepthFromStart": str(i),
                            "nodeScoreCutoff": str(round(j, 1))
                        }
                        response = requests.post(REST_ENDPOINT + '/v1/commands/cluster/mcode',
                                                 data=json.dumps(message_body),
                                                 headers={'Content-Type': 'application/json'})
                        response_data = response.json()['data']
                        mcode_clusters[counter] = response_data['clusters']
                        counter += 1
If you have any solutions, I'd really appreciate it if you could share them with me.
Thanks.
SaRa
I think this was answered by Ruth pretty clearly in the cytoscape-helpdesk:
You can do all of the above. Whatever is easiest for you.
There is a library, py2cytoscape, that you can use to issue commands to Cytoscape from Python. Info can be found here: https://py2cytoscape.readthedocs.io/en/latest/
For more info on automation in Cytoscape, check out: http://manual.cytoscape.org/en/stable/Programmatic_Access_to_Cytoscape_Features_Scripting.html
But you can also run it through automation. You can create a text file with each of your commands (for example, a list of commands like: cluster mcl attribute="correlation" network=1234) and then go to Tools --> Execute Batch File to execute the whole file. I am not sure if it supports loops. If you want to loop through anything, I would recommend using Python.
Thanks,
Ruth
I'll just add that currently, looping isn't supported in batch files.
-- scooter
In regard to my problem, I have to say that:
There are two MCODEs in Cytoscape. One is in clusterMaker and the other belongs to the Cytoscape MCODE app. When I used the command '/v1/commands/cluster/mcode', I was calling the one in clusterMaker, while the parameter names I used were based on the Cytoscape one. I changed the command to '/v1/commands/mcode/cluster' and the problem is solved now.
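For anyone hitting the same error, here is a minimal sketch of the corrected call for a single parameter set (same parameter names as in my loop above; check Help -> Automation -> CyREST Command API for the full list on your install):

import requests, json

REST_ENDPOINT = 'http://localhost:1234'

# corrected endpoint: the Cytoscape MCODE app, not clusterMaker's mcode
message_body = {
    "fluff": "on",
    "fluffNodeDensityCutoff": "0.1",
    "haircut": "on",
    "maxDepthFromStart": "3",
    "nodeScoreCutoff": "0.2",
}
response = requests.post(REST_ENDPOINT + '/v1/commands/mcode/cluster',
                         data=json.dumps(message_body),
                         headers={'Content-Type': 'application/json'})
print(response.json()['data'])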
Many thanks.
SaRa
I want to encrypt the values in one column of my Pandas (or PySpark) dataframe, e.g. to take the column mobno in the following dataframe, encrypt it, and put the result in the encrypted_value column.
I want to use an AWS KMS encryption key. My question is: what is the most elegant way to achieve this?
I am thinking about using a UDF, which will call boto3's KMS client. Something like:
@udf
def encrypt(plaintext):
    response = kms_client.encrypt(
        KeyId=aws_kms_key_id,
        Plaintext=plaintext
    )
    ciphertext = response['CiphertextBlob']
    return ciphertext
and then applying this UDF to the whole column.
But I am not quite confident this is the right way. This stems from the fact that I am an encryption rookie: first, I don't even know whether this kms_client.encrypt function is meant for encrypting values (from the columns) or for manipulating the keys. Maybe the better way is to obtain the key and then use some Python encryption library (such as hashlib).
I would like to have some clarification on the encryption process and also a recommendation on the best approach to column encryption.
Since Spark 3.3 you can do AES encryption (and decryption) without a UDF.
aes_encrypt(expr, key[, mode[, padding]]) - Returns an encrypted value of expr using AES in given mode with the specified padding. Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS') and ('GCM', 'NONE'). The default mode is GCM.
Arguments:
expr - The binary value to encrypt.
key - The passphrase to use to encrypt the data.
mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB and NONE for GCM.
from pyspark.sql import functions as F
df = spark.createDataFrame([('8223344556',)], ['mobno'])
df = df.withColumn('encrypted_value', F.expr("aes_encrypt(mobno, 'your_secret_keyy')"))
df.show()
# +----------+--------------------+
# | mobno| encrypted_value|
# +----------+--------------------+
# |8223344556|[9B 33 DB 9B 5D C...|
# +----------+--------------------+
df.printSchema()
# root
# |-- mobno: string (nullable = true)
# |-- encrypted_value: binary (nullable = true)
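For completeness, aes_decrypt (also available since Spark 3.3) reverses this with the same key; a quick sketch continuing the example above:

df = df.withColumn(
    'decrypted_value',
    F.expr("cast(aes_decrypt(encrypted_value, 'your_secret_keyy') as string)")
)
df.select('mobno', 'decrypted_value').show()
# +----------+---------------+
# |     mobno|decrypted_value|
# +----------+---------------+
# |8223344556|     8223344556|
# +----------+---------------+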
To avoid many calls to the KMS service in a UDF, use AWS Secrets Manager instead to retrieve your encryption key, and pycrypto to encrypt the column. The following works:
import json

import boto3
from Crypto.Cipher import AES
from pyspark.sql.functions import udf, col

region_name = "eu-west-1"

session = boto3.session.Session()
client = session.client(service_name='secretsmanager', region_name=region_name)
get_secret_value_response = client.get_secret_value(SecretId=secret_name)  # secret_name defined elsewhere
secret_key = json.loads(get_secret_value_response['SecretString'])

clear_text_column = 'mobno'

def encrypt(key, text):
    # AES in CFB mode with a fixed 16-byte IV (a random IV would be better in production)
    obj = AES.new(key, AES.MODE_CFB, 'This is an IV456')
    return obj.encrypt(text)

def udf_encrypt(key):
    return udf(lambda text: encrypt(key, text))

df.withColumn("encrypted", udf_encrypt(secret_key)(col(clear_text_column))).show()
Or alternatively, using the more efficient Pandas UDF, as suggested by @Vektor88 (PySpark 3 syntax):
from functools import partial

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import BinaryType

encrypt_with_key = partial(encrypt, secret_key)

@pandas_udf(BinaryType())
def pandas_udf_encrypt(clear_strings: pd.Series) -> pd.Series:
    return clear_strings.apply(encrypt_with_key)

df.withColumn('encrypted', pandas_udf_encrypt(clear_text_column)).show()
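(The pandas UDF tends to be faster because Spark ships rows to the Python worker in Arrow batches and invokes the function once per batch, rather than once per row as with a plain udf, which greatly reduces serialization overhead.)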
I'm new to Twisted and have one question. How can I organize a persistent connection in Twisted? I have a queue and check it every second; if it has something, I send it to the client. I can't find anything better than calling dataReceived every second.
Here is the code of my Protocol implementation:
class SyncProtocol(protocol.Protocol):
    # ... some code here

    def dataReceived(self, data):
        if self.orders_queue.has_new_orders():
            for order in self.orders_queue:
                self.transport.write(str(order))
        reactor.callLater(1, self.dataReceived, data)  # 1 second delay
It works the way I need, but I'm sure it is a very bad solution. How can I do this differently (flexibly and correctly)? Thanks.
P.S. - the main idea and algorithm:
1. The client connects to the server and waits.
2. The server checks for updates and pushes data to the client if anything changes.
3. The client does some operations and then waits for other data.
Without knowing how the snippet you provided links into your internet.XXXServer or reactor.listenXXX (or XXXXEndpoint) calls, it's hard to make heads or tails of it, but...
First off, in normal use, a Twisted protocol.Protocol's dataReceived would only be called by the framework itself. It would be linked to a client or server connection, directly or via a factory, and it would be called automatically as data comes in on the given connection. (The vast majority of Twisted protocols and interfaces, if not all, are interrupt-based, not polling/callLater based; that's part of what makes Twisted so CPU-efficient.)
So if your code as shown is actually linked into Twisted via a Server or listen or Endpoint to your clients, then I think you will find that very bad things happen if your clients ever send data, because Twisted will call dataReceived for that, which (among other problems) would add extra reactor.callLater callbacks, and all sorts of chaos would ensue.
If instead the code isn't linked into Twisted's connection framework, then you're attempting to reuse Twisted classes in a space they aren't designed for (I guess this seems unlikely, because I don't know how non-connection code would learn of a transport, unless you're setting it manually).
The way I've been building models like this is to make a completely separate class for the polling-based I/O; after I instantiate it, I push my client-list (server) factory into the polling instance (something like mypollingthing.servfact = myserverfactory), thereby giving my polling logic a way to call into the clients' .write (or, more likely, a def I built to abstract things to the correct level for my polling logic).
I tend to take the examples in Krondo's Twisted Introduction as one of the canonical examples of how to do Twisted (other than Twisted Matrix itself); the example in part 6, under "Client 3.0", PoetryClientFactory, has an __init__ that sets a callback in the factory.
If I blend that with the twistedmatrix chat example and a few other things, I get the code below.
(You'll want to change sendToAll to whatever your self.orders_queue.has_new_orders() is about.)
#!/usr/bin/python

from twisted.internet import task
from twisted.internet import reactor
from twisted.internet.protocol import Protocol, ServerFactory

class PollingIOThingy(object):
    def __init__(self):
        self.sendingcallback = None  # Note I'm pushing sendToAll into here in main
        self.iotries = 0

    def pollingtry(self):
        self.iotries += 1
        print "Polling runs: " + str(self.iotries)
        if self.sendingcallback:
            self.sendingcallback("Polling runs: " + str(self.iotries) + "\n")

class MyClientConnections(Protocol):
    def connectionMade(self):
        print "Got new client!"
        self.factory.clients.append(self)

    def connectionLost(self, reason):
        print "Lost a client!"
        self.factory.clients.remove(self)

class MyServerFactory(ServerFactory):
    protocol = MyClientConnections

    def __init__(self):
        self.clients = []

    def sendToAll(self, message):
        for c in self.clients:
            c.transport.write(message)

def main():
    client_connection_factory = MyServerFactory()

    polling_stuff = PollingIOThingy()

    # the following line is what this example is all about:
    polling_stuff.sendingcallback = client_connection_factory.sendToAll
    # push the client connections' send def into my polling class

    # if you want to run something every second (instead of 1 second after
    # the end of your last code run, which could vary) do:
    l = task.LoopingCall(polling_stuff.pollingtry)
    l.start(1.0)
    # from: https://twistedmatrix.com/documents/12.3.0/core/howto/time.html

    reactor.listenTCP(5000, client_connection_factory)
    reactor.run()

if __name__ == '__main__':
    main()
To be fair, it might be better to inform PollingIOThingy of the callback by passing it as an arg to its __init__ (that is what is shown in Krondo's docs). For some reason I tend to miss connections like this when I read code and find class-cheating easier to see, but that may just be my personal brain-damage.
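For what it's worth, here's a minimal sketch of that __init__-injection variant (same names as in the example above):

class PollingIOThingy(object):
    def __init__(self, sendingcallback=None):
        # the callback is handed over at construction time instead of
        # being assigned to the attribute after instantiation
        self.sendingcallback = sendingcallback
        self.iotries = 0

    def pollingtry(self):
        self.iotries += 1
        if self.sendingcallback:
            self.sendingcallback("Polling runs: " + str(self.iotries) + "\n")

# and in main():
# polling_stuff = PollingIOThingy(client_connection_factory.sendToAll)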
I'm trying to connect MoinMoin to my LDAP server; however, it doesn't work. Am I doing the settings properly?
I'm using MoinMoin from Ubuntu's repository.
Here is my farmconfig.py:
from farmconfig import FarmConfig

# now we subclass that config (inherit from it) and change what's different:
class Config(FarmConfig):

    # basic options (you normally need to change these)
    sitename = u'MyWiki' # [Unicode]
    interwikiname = u'MyWiki' # [Unicode]

    # name of entry page / front page [Unicode], choose one of those:
    # a) if most wiki content is in a single language
    #page_front_page = u"MyStartingPage"
    # b) if wiki content is maintained in many languages
    page_front_page = u"FrontPage"

    data_dir = '/usr/share/moin/data'
    data_underlay_dir = '/usr/share/moin/underlay'

from MoinMoin.auth.ldap_login import LDAPAuth
ldap_authenticator1 = LDAPAuth(
    server_uri='ldap://192.168.1.196',
    bind_dn='cn=admin,ou=People,dc=company,dc=com',
    bind_pw='secret',
    scope=2,
    referrals=0,
    search_filter='(uid=%(username)s)',
    givenname_attribute='givenName',
    surname_attribute='sn',
    aliasname_attribute='displayName',
    email_attribute='mailRoutingAddress',
    email_callback=None,
    coding='utf-8',
    timeout=10,
    start_tls=0,
    tls_cacertdir=None,
    tls_cacertfile=None,
    tls_certfile=None,
    tls_keyfile=None,
    tls_require_cert=0,
    bind_once=True,
    autocreate=True,
)
auth = [ldap_authenticator1, ]
cookie_lifetime = 1
This is an indentation issue: auth and cookie_lifetime must be within class Config (so just indent all of that by 4 spaces).
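In other words, the tail end of the file should look something like this (a trimmed sketch; only a subset of the LDAPAuth arguments from the question is repeated here):

from farmconfig import FarmConfig
from MoinMoin.auth.ldap_login import LDAPAuth

class Config(FarmConfig):
    sitename = u'MyWiki'
    # ... other options as above ...

    ldap_authenticator1 = LDAPAuth(
        server_uri='ldap://192.168.1.196',
        bind_dn='cn=admin,ou=People,dc=company,dc=com',
        bind_pw='secret',
        search_filter='(uid=%(username)s)',
        autocreate=True,
        # ... remaining arguments unchanged ...
    )
    auth = [ldap_authenticator1, ]
    cookie_lifetime = 1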