twisted server, nc client - twisted

Ill demonstrate the problem I am facing with a small example.
class TestProtocol(basic.LineReceiver):
def lineReceived(self, line):
print line
Everything works fine as long as I use the telnet client to connect to the server. However, the line is not received connect and send the data using netcat. I have a feeling that this has something to do with the default delimiter being "\r\n" in twisted.
How could I make a server such that both the clients(telnet and nc) would behave in a similar manner when connecting to the client?

LineReceiver only supports one delimiter. You can specify it, but there can only be one at a time. In general, if you want to support multiple delimiters, you'll need to implement a new protocol that supports that. You could take a look at the implementation of LineReceiver for some ideas about how a line-based protocol is implemented.
netcat sends whatever you type, so the delimiter is often \n (but it may vary from platform to platform and terminal emulator to terminal emulator). For the special case of \n, which is a substring of the default LineReceiver delimiter \r\n, there's another trick you can use. Set the TestProtocol.delimiter to "\n" and then strip the "\r" off the end of the line passed to lineReceived if there is one.
class TestProtocol(basic.LineReceiver):
delimiter = "\n"
def lineReceived(self, line):
print line.rstrip("\r")

Another workaround is to use nc with the -C switch.
From the manual:
-C Send CRLF as line-ending
or as #CraigMcQueen suggested:
socket with -c switch (Ubuntu package).

Twisted's LineReceiver and LineOnlyReceiver only support one line ending delimiter.
Here is code for UniversalLineReceiver and UniversalLineOnlyReceiver, which override the dataReceived() method with support for universal line endings (any combination of CR+LF, CR or LF). The line breaks are detected with the regular expression object delimiter_re.
Note, they override functions with more code in them than I'd like, so there's a chance that they may break if the underlying Twisted implementation changes. I've tested they work with Twisted 13.2.0. The essential change is the use of delimiter_re.split() from the re module.
# Standard Python packages
import re
# Twisted framework
from twisted.protocols.basic import LineReceiver, LineOnlyReceiver
class UniversalLineReceiver(LineReceiver):
delimiter_re = re.compile(br"\r\n|\r|\n")
def dataReceived(self, data):
"""
Protocol.dataReceived.
Translates bytes into lines, and calls lineReceived (or
rawDataReceived, depending on mode.)
"""
if self._busyReceiving:
self._buffer += data
return
try:
self._busyReceiving = True
self._buffer += data
while self._buffer and not self.paused:
if self.line_mode:
try:
line, remainder = self.delimiter_re.split(self._buffer, 1)
except ValueError:
if len(self._buffer) > self.MAX_LENGTH:
line, self._buffer = self._buffer, b''
return self.lineLengthExceeded(line)
return
else:
lineLength = len(line)
if lineLength > self.MAX_LENGTH:
exceeded = self._buffer
self._buffer = b''
return self.lineLengthExceeded(exceeded)
self._buffer = remainder
why = self.lineReceived(line)
if (why or self.transport and
self.transport.disconnecting):
return why
else:
data = self._buffer
self._buffer = b''
why = self.rawDataReceived(data)
if why:
return why
finally:
self._busyReceiving = False
class UniversalLineOnlyReceiver(LineOnlyReceiver):
delimiter_re = re.compile(br"\r\n|\r|\n")
def dataReceived(self, data):
"""
Translates bytes into lines, and calls lineReceived.
"""
lines = self.delimiter_re.split(self._buffer+data)
self._buffer = lines.pop(-1)
for line in lines:
if self.transport.disconnecting:
# this is necessary because the transport may be told to lose
# the connection by a line within a larger packet, and it is
# important to disregard all the lines in that packet following
# the one that told it to close.
return
if len(line) > self.MAX_LENGTH:
return self.lineLengthExceeded(line)
else:
self.lineReceived(line)
if len(self._buffer) > self.MAX_LENGTH:
return self.lineLengthExceeded(self._buffer)

Related

Perl 6 $*ARGFILES.handles in binary mode?

I'm trying out $*ARGFILES.handles and it seems that it opens the files in binary mode.
I'm writing a zip-merge program, that prints one line from each file until there are no more lines to read.
#! /usr/bin/env perl6
my #handles = $*ARGFILES.handles;
# say $_.encoding for #handles;
while #handles
{
my $handle = #handles.shift;
say $handle.get;
#handles.push($handle) unless $handle.eof;
}
I invoke it like this: zip-merge person-say3 repeat repeat2
It fails with: Cannot do 'get' on a handle in binary mode in block at ./zip-merge line 7
The specified files are text files (encoded in utf8), and I get the error message for non-executable files as well as executable ones (with perl6 code).
The commented out line says utf8 for every file I give it, so they should not be binary,
perl6 -v: This is Rakudo version 2018.10 built on MoarVM version 2018.10
Have I done something wrong, or have I uncovered an error?
The IO::Handle objects that .handles returns are closed.
my #*ARGS = 'test.p6';
my #handles = $*ARGFILES.handles;
for #handles { say $_ }
# IO::Handle<"test.p6".IO>(closed)
If you just want get your code to work, add the following line after assigning to #handles.
.open for #handles;
The reason for this is the iterator for .handles is written in terms of IO::CatHandle.next-handle which opens the current handle, and closes the previous handle.
The problem is that all of them get a chance to be both the current handle, and the previous handle before you get a chance to do any work on them.
(Perhaps .next-handle and/or .handles needs a :!close parameter.)
Assuming you want it to work like roundrobin I would actually write it more like this:
# /usr/bin/env perl6
use v6.d;
my #handles = $*ARGFILES.handles;
# a sequence of line sequences
my $line-seqs = #handles.map(*.open.lines);
# Seq.new(
# Seq.new( '# /usr/bin/env perl6', 'use v6.d' ), # first file
# Seq.new( 'foo', 'bar', 'baz' ), # second file
# )
for flat roundrobin $line-seqs {
.say
}
# `roundrobin` without `flat` would give the following result
# ('# /usr/bin/env perl6', 'foo'),
# ('use v6.d', 'bar'),
# ('baz')
If you used an array for $line-seqs, you will need to de-itemize (.<>) the values before passing them to roundrobin.
for flat roundrobin #line-seqs.map(*.<>) {
.say
}
Actually I personally would be more likely to write something similar to this (long) one-liner.
$*ARGFILES.handles.eagerĀ».openĀ».lines.&roundrobin.flat.map: *.put
:bin is always set in this type of objects. Since you are working on the handles, you should either read line by line as instructed on the example, or reset the handle so that it's not in binary mode.

pdfminer3k - pdf2txt.py error

I want to convert my pdf files to txt files and used pdfminer3k module & pdf2txt.py, however, I got an error.
pdf2txt.py -o file.txt -t tag file.pdf
This is my code at cmd screen.
Traceback (most recent call last):
File "C:\Python36\lib\site.py", line 67, in
import os
File "C:\Python36\lib\os.py", line 409
yield from walk(new_path, topdown, onerror, followlinks)
^
SyntaxError: invalid syntax
This is an error message that I got.
Could you help me to fix this problem??
Added for reference: Great resourse:
http://www.degeneratestate.org/posts/2016/Jun/15/extracting-tabular-data-from-pdfs/
The -t flag is the type of output. The options are text, tag, xml, and html.
Tag refers to generating a tag for xml. Replace tag with text in your command and try it.
The order of optional input also matters.
You also must invoke python, your command line does'nt know what import means, yet some of your environment seems to be setup. My example is for windows cmd from Anaconda3\Scripts directory. If your in juptyer notebook or a console, you should be able to run import pdf2txt with the .py
To setup your environment you need to append the os.path.append(yourpdfdirectory) otherwise file.pdf will not be found.
Try python pdf2txt.py -t text -o file.txt file.pdf
Or if you are brave...this is how to do programmatically. The trouble with xml is if you want to get the text, each character from xml tree is returned in an arbitrary order. You can get it to work but you need to build the string character by character which is not that hard, its just logically time consuming.
fp = open(filesin,'rb')
parser = PDFParser(fp)
doc = PDFDocument()
parser.set_document(doc)
doc.set_parser(parser)
doc.initialize('')
rsrcmgr = PDFResourceManager(caching=False)
laparams = LAParams(all_texts=True)
laparams.boxes_flow = -0.2
laparams.paragraph_indent = 0.2
laparams.detect_vertical = False
#laparams.heuristic_word_margin = 0.03
laparams.word_margin = 0.2
laparams.line_margin = 0.3
outfp = open(filesin+".out.tag" ,'wb')
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)
#process_pdf(rsrcmgr, device, pdfparse, pagenos,caching=c, check_extractable=True)
for p,page in enumerate(doc.get_pages()):
if p == 0: #temporary for page 1
interpreter.process_page(page)
layout = device.get_result()
alltextinbox = ''
#This is a rich environment so categorization of this object hierarchy is needed
for c,lt_obj in enumerate(layout):
#print(type(lt_obj),"This is type ",c,"th object on the ",p,"th page")
if isinstance(lt_obj,LTTextBoxHorizontal) or isinstance(lt_obj,LTTextBox) or isinstance(lt_obj,LTTextLine):
print("Type ,",type(lt_obj)," and text ..",lt_obj.get_text())
obj_textbox_line.update({lt_obj:lt_obj.get_text()})
elif p != 0:
pass
fp.close()
#print(obj_textbox_line)
#call the column finder here
#check_matching("example", "example1")
#text_doc_df = pd.DataFrame(obj_textbox_line,columns=['text'])
#print (text_doc_df)
pass
I'm working on a generic row/column matcher. If you don't want to bother, you can buy this software already for like 150 bucks for a pro converter.

Jython not finding file when using variable to pass file name

So heres the issue guys,
I have a very simple little program that reads in some setup details from a file (to make it reuseable for other sets of data) and stores them into variables.
It then uses one of those variables to open another file that I need to write some results to, as well as various search parameters.
When passing the variable to the .open() function, it fails saying it cant find the file, but when passing the exact same information, but as a written string instead of a variable, it works.
Is this a known problem, or am I just doing something wrong?
The code(problem bit bolded)
def urlTrawl(filename):
import urllib
read = open(getMediaPath(filename), "rt")
baseurl = read.readline()
orgurl = read.readline()
lasturlfile = read.readline()
linksfile = read.readline()
read.close()
webpage = ""
links = ""
counter = 0
lasturl = ""
nexturl = ""
url = ""
connection = ""
try:
read = open(lasturlfile, "rt")
lasturl = read.readline()
except IOError:
print "IOError"
webpage = connection.read()
connection.close()
**file = open(linksfile, "wt")**
file.close()
file = open(lasturlfile, "wt")
file.write(nexturl)
return 1
The information being passed in
http://www.questionablecontent.net/
http://www.questionablecontent.net/view.php?comic=2480
C:\\Users\\James\\Desktop\\comics\\qclast.txt
C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt
strip\"
src=\"
\"
Pevious
Next
f=\"
\"
EDIT: removed working code, to narrow down the problem area and updated code to use a direct reference rather then a relative one.
I found the problem in the end.
The problem was that it was reading in the \n at the end of each line in my details file, and of course the \n isn't anywhere in the website data I'm reading. Removing the last character of each read did the trick:
baseurl = baseurl[:-1]
orgurl = orgurl[:-1]
lasturlfile = lasturlfile[:-1]
linksfile = linksfile[:-1]
search1 = search1[:-1]
search2 = search2[:-1]
search3 = search3[:-1]
search4 = search4[:-1]
search5 = search5[:-1]
search6 = search6[:-1]
I might not be right, but I think this is what's happening.
You're saying this works fine:
file = open('C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt', "wt")
But this doesn't:
# After reading three lines
linksfile = read.readline()
file = open(linksfile, "wt")
There is a difference between these two. In the first piece of code, the double slashes are escapes. They resolve to single slashes when Python is done parsing. Like so:
>>> print 'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt'
C:\Users\James\Desktop\comics\comiclinksqc.txt
But when you read that same text from the file, there's no parsing of the text. That means that the string stored in your variable still has double slashes.
Try this command out. I bet it fails the same way as when you read the file path in:
file = open(r'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt', "wt")
The r stands for "raw"; it prevents Python from interpreting escape characters. If it does fail the same way, then the double slashes are your problem. To fix it, in your file, you need to remove the double slashes:
C:\Users\James\Desktop\comics\comiclinksqc.txt
This isn't a problem in CPython 2.7; I'm betting it's not in 3.x, either. CPython interprets double slashes in some manner that they are effectively a single slash (in most cases, at least). So this may be an issue specific to Jython.
If unclean paths cause errors, you might want to consider doing something to clean them up. os.path.abspath might be helpful, although I can't say if Jython's implementation works as well as CPython's:
>>> print os.path.abspath(r'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt')
C:\Users\James\Desktop\comics\comiclinksqc.txt
>>> print os.path.abspath(r'C:/Users/James/Desktop/comics/comiclinksqc.txt')
C:\Users\James\Desktop\comics\comiclinksqc.txt
I am trying to create a script which will list the datasource name and will show the connection pool utilization(pooled connection, Free Pool Size ext.)
But facing the issue when list the connection pool, if the data source name having space in between the name like "Default Datasource"
then it is listing list "Default Datasource and it is not parsing the datasource name correctly to the next function.
datasource = AdminConfig.list('DataSource', AdminConfig.getid( '/Cell:'
+ cell + '/')).splitlines()
for datasourceID in datasource:
datasourceName = datasourceID.split('(')[0]
print datasourceName
Request you to help if possible drop me mail at bubuldey#gmail.com
Regards,
Bubul

Is it possible to ack nagios alerts from the terminal?

I have nagios alerts set up to come through jabber with an http link to ack.
Is is possible there is a script I can run from a terminal on a remote workstation that takes the hostname as a parameter and acks the alert?
./ack hostname
The benefit, while seemingly mundane, is threefold. First, take http load off nagios. Secondly, nagios http pages can take up to 10-20 seconds to load, so I want to save time there. Thirdly, avoiding slower use of mouse + web interface + firefox/other annoyingly slow browser.
Ideally, I would like a script bound to a keyboard shortcut that simply acks the most recent alert. Finally, I want to take the inputs from a joystick, buttons and whatnot, and connect one to a big red button bound to the script so I can just ack the most recent nagios alert by hitting the button lol. (It would be rad too if the button had a screen on the enclosure that showed the text of the alert getting acked lol)
Make fun of me all you want, but this is actually something that would be useful to me. If I can save five seconds per alert, and I get 200 alerts per day I need to ack, that's saving me 15 minutes a day. And isn't the whole point of the sysadmin to automate what can be automated?
Thanks!
Yes, it's possible to ack nagios by parsing /var/lib/nagios3/retention.dat file.
See :
#!/usr/bin/env python
# -*- coding: utf8 -*-
# vim:ts=4:sw=4
import sys
file = "/var/lib/nagios3/retention.dat"
try:
sys.argv[1]
except:
print("Usage:\n"+sys.argv[0]+" <HOST>\n")
sys.exit(1)
f = open(file, "r")
line = f.readline()
c=0
name = {}
state = {}
host = {}
while line:
if "service_description=" in line:
name[c] = line.split("=", 2)[1]
elif "current_state=" in line:
state[c] = line.split("=", 2)[1]
elif "host_name=" in line:
host[c] = line.split("=", 2)[1]
elif "}" in line:
c+=1
line = f.readline()
for i in name:
num = int(state[i])
if num > 0 and sys.argv[1] == host[i].strip():
print(name[i].strip("\n"))
You simply have to put the host as parameter, and the script will displays the broken services.

Execute SQL from file in SQLAlchemy

How can I execute whole sql file into database using SQLAlchemy? There can be many different sql queries in the file including begin and commit/rollback.
sqlalchemy.text or sqlalchemy.sql.text
The text construct provides a straightforward method to directly execute .sql files.
from sqlalchemy import create_engine
from sqlalchemy import text
# or from sqlalchemy.sql import text
engine = create_engine('mysql://{USR}:{PWD}#localhost:3306/db', echo=True)
with engine.connect() as con:
with open("src/models/query.sql") as file:
query = text(file.read())
con.execute(query)
SQLAlchemy: Using Textual SQL
text()
I was able to run .sql schema files using pure SQLAlchemy and some string manipulations. It surely isn't an elegant approach, but it works.
# Open the .sql file
sql_file = open('file.sql','r')
# Create an empty command string
sql_command = ''
# Iterate over all lines in the sql file
for line in sql_file:
# Ignore commented lines
if not line.startswith('--') and line.strip('\n'):
# Append line to the command string
sql_command += line.strip('\n')
# If the command string ends with ';', it is a full statement
if sql_command.endswith(';'):
# Try to execute statement and commit it
try:
session.execute(text(sql_command))
session.commit()
# Assert in case of error
except:
print('Ops')
# Finally, clear command string
finally:
sql_command = ''
It iterates over all lines in a .sql file ignoring commented lines.
Then it concatenates lines that form a full statement and tries to execute the statement. You just need a file handler and a session object.
You can do it with SQLalchemy and psycopg2.
file = open(path)
engine = sqlalchemy.create_engine(db_url)
escaped_sql = sqlalchemy.text(file.read())
engine.execute(escaped_sql)
Unfortunately I'm not aware of a good general answer for this. Some dbapi's (psycopg2 for instance) support executing many statements at a time. If the files aren't huge you can just load them into a string and execute them on a connection. For others, I would try to use a command-line client for that db and pipe the data into that using the subprocess module.
If those approaches aren't acceptable, then you'll have to go ahead and implement a small SQL parser that can split the file apart into separate statements. This is really tricky to get 100% correct, as you'll have to factor in database dialect specific literal escaping rules, the charset used, any database configuration options that affect literal parsing (e.g. PostgreSQL standard_conforming_strings).
If you only need to get this 99.9% correct, then some regexp magic should get you most of the way there.
If you are using sqlite3 it has a useful extension to dbapi called conn.executescript(str), I've hooked this up via something like this and it seemed to work: (Not all context is shown but it should be enough to get the drift)
def init_from_script(script):
Base.metadata.drop_all(db_engine)
Base.metadata.create_all(db_engine)
# HACK ALERT: we can do this using sqlite3 low level api, then reopen session.
f = open(script)
script_str = f.read().strip()
global db_session
db_session.close()
import sqlite3
conn = sqlite3.connect(db_file_name)
conn.executescript(script_str)
conn.commit()
db_session = Session()
Is this pure evil I wonder? I looked in vain for a 'pure' sqlalchemy equivalent, perhaps that could be added to the library, something like db_session.execute_script(file_name) ? I'm hoping that db_session will work just fine after all that (ie no need to restart engine) but not sure yet... further research needed (ie do we need to get a new engine or just a session after going behind sqlalchemy's back?)
FYI sqlite3 includes a related routine: sqlite3.complete_statement(sql) if you roll your own parser...
You can access the raw DBAPI connection through this
raw_connection = mySqlAlchemyEngine.raw_connection()
raw_cursor = raw_connection() #get a hold of the proxied DBAPI connection instance
but then it will depend on which dialect/driver you are using which can be referred to through this list.
For pyscog2, you can just do
raw_cursor.execute(open("my_script.sql").read())
but pysqlite you would need to do
raw_cursor.executescript(open("my_script").read())
and in line with that you would need to check the documentation of whichever DBAPI driver you are using to see if multiple statements are allowed in one execute or if you would need to use a helper like executescript which is unique to pysqlite.
Here's how to run the script splitting the statements, and running each statement directly with a "connectionless" execution with the SQLAlchemy Engine. This assumes that each statement ends with a ; and that there's no more than one statement per line.
engine = create_engine(url)
with open('script.sql') as file:
statements = re.split(r';\s*$', file.read(), flags=re.MULTILINE)
for statement in statements:
if statement:
engine.execute(text(statement))
In the current answers, I did not found a solution which works when a combination of these features in the .SQL file is present:
Comments with "--"
Multi-line statements with additional comments after "--"
Function definitions which have multiple SQL-queries ending with ";" butmust be executed as a whole statement
A found a rather simple solution:
# check for /* */
with open(file, 'r') as f:
assert '/*' not in f.read(), 'comments with /* */ not supported in SQL file python interface'
# we check out the SQL file line-by-line into a list of strings (without \n, ...)
with open(file, 'r') as f:
queries = [line.strip() for line in f.readlines()]
# from each line, remove all text which is behind a '--'
def cut_comment(query: str) -> str:
idx = query.find('--')
if idx >= 0:
query = query[:idx]
return query
# join all in a single line code with blank spaces
queries = [cut_comment(q) for q in queries]
sql_command = ' '.join(queries)
# execute in connection (e.g. sqlalchemy)
conn.execute(sql_command)
Code bellow works for me in alembic migrations
from alembic import op
import sqlalchemy as sa
from ekrec.common import get_project_root
def upgrade():
path = f'{get_project_root()}/migrations/versions/fdb8492f75b2_.sql'
op.execute(open(path).read())
I had success with David's answer here, with two slight modifications:
Use get_bind() as I was working with a Session rather than an Engine
Call cursor() on the raw connection
raw_connection = myDbSession.get_bind().raw_connection()
raw_cursor = raw_connection.cursor()
raw_cursor.execute(open("my_script.sql").read())