New to Jython: need help extracting data from a file

I am new to scripting and programming in general. I am trying to run a script with the WebSphere command-line tool, wsadmin, and it keeps failing. I have two questions about the following code:
import sys
import os
import re
execfile('wsadminlib.py')
appName = sys.argv[1]
configLocation = "/location/of/config/"
config_prop = open(configLocation + appName + "-Config.csv", "r")
for line in config_prop:
    line = line.split(",")
    print line
I launch the script through wsadmin from the command line as follows:
./wsadmin.sh -lang jython -f deploy.sh param1
Questions:
The script fails on the "for line in config_prop" line with AttributeError: __getitem__. When I run it through python on the same machine, the code works. Why does it fail only when I run it through the wsadmin tool?
Are there other ways to extract data from a comma-delimited txt or csv file that is only one line long, setting a variable for each word?

The issue has been resolved. The Jython library that wsadmin uses is version 2.1, and the syntax I was using (iterating directly over the file object) was only added in 2.2.
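For anyone hitting the same 2.1 versus 2.2 gap, here is a minimal sketch of a reader that avoids iterating over the file object. The function name read_csv_fields is made up for illustration, and the code sticks to constructs I believe predate 2.2 (readlines(), try/finally, string methods). It has not been tested under wsadmin itself.

```python
def read_csv_fields(path):
    """Return a list of field lists, one per non-empty line of the file."""
    rows = []
    handle = open(path, "r")
    try:
        # readlines() returns a plain list, so the file object itself
        # never has to support iteration
        for line in handle.readlines():
            line = line.strip()
            if line:
                rows.append(line.split(","))
    finally:
        handle.close()
    return rows
```

For a file that is only one line long, the words can then be unpacked into variables from the first row, for example first, second, third = read_csv_fields(path)[0], assuming the line really has three fields.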

Related

What is the structure of the executable transformation script for transform_script of GCSFileTransformOperator?

I am currently working on an Airflow task that requires pre-processing a large csv file using GCSFileTransformOperator. I have read the documentation on the class and its implementation, but I don't quite understand how the executable transformation script for transform_script should be structured.
For example, is the following script structure correct? If so, does that mean that with GCSFileTransformOperator, Airflow calls the executable transformation script and passes the arguments on the command line?
# Import the required modules
# (plus whatever preprocessing modules the script needs)
import sys
# Define the function that takes the source_file and destination_file params
def preprocess_file(source_file, destination_file):
    # (1) code that processes the source_file
    # (2) code that then writes to destination_file
    pass
# Extract source_file and destination_file from the list of command-line arguments
source_file = sys.argv[1]
destination_file = sys.argv[2]
preprocess_file(source_file, destination_file)
GCSFileTransformOperator passes the script to subprocess.Popen, so your script will work, but you will need to add a shebang such as #!/usr/bin/python (or wherever Python is on your path in Airflow).
Your arguments are correct and the format of your script can be anything you want. Airflow passes in the path of the downloaded file, and a temporary new file:
cmd = (
    [self.transform_script]
    if isinstance(self.transform_script, str)
    else self.transform_script
)
cmd += [source_file.name, destination_file.name]
with subprocess.Popen(
    args=cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True
) as process:
    # ...
    process.wait()
(you can see the source here)
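To make the calling convention concrete, here is a minimal sketch of a complete transform script. The filtering rule (drop rows that contain an empty field) is invented purely for illustration, and the script assumes Python 3 is what the shebang resolves to on the Airflow worker.

```python
#!/usr/bin/env python3
"""Hypothetical transform script: copy a CSV, dropping rows with empty fields."""
import csv
import sys


def preprocess_file(source_file, destination_file):
    """Read source_file, keep complete rows, write them to destination_file."""
    kept = 0
    with open(source_file, newline="") as src, \
            open(destination_file, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Assumed rule for illustration: skip rows containing an empty field
            if row and all(field.strip() for field in row):
                writer.writerow(row)
                kept += 1
    return kept


if __name__ == "__main__" and len(sys.argv) == 3:
    # The two paths arrive as the last two command-line arguments
    preprocess_file(sys.argv[1], sys.argv[2])
```

Because the command is handed to subprocess.Popen as a list without a shell, the script file most likely also needs to be executable (chmod +x). The quoted code accepts a list for transform_script as well, so passing something like ["python3", "transform.py"] may be a way around the shebang requirement, though I have not verified that.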

Snakemake --forceall --dag results in mysterious Error: <stdin>: syntax error in line 1 near 'File' from Graphviz

My attempts to construct a DAG or rulegraph from an RNA-seq pipeline using snakemake result in an error message from Graphviz: Error: <stdin>: syntax error in line 1 near 'File'.
The error goes away when I comment out two print commands that have no visible syntax errors. I have tried converting the scripts from UTF-8 to ASCII in Notepad++. Graphviz seems to take issue with these two print statements specifically, since other print statements in the pipeline scripts cause no trouble. Even though the error is easily worked around, it's still annoying, because I would like colleagues to be able to construct these diagrams for their publications without hassle, and the print statements inform them of what is happening in the workflow. My pipeline consists of a Snakefile and multiple rule files, as well as a config file. If the offending line is commented out in the Snakefile, then Graphviz takes issue with another line in a rule script.
#######Snakefile
#!/usr/bin/env python
import os
import glob
import re
from os.path import join
import argparse
from collections import defaultdict
import fastq2json
from itertools import chain, combinations
import shutil
from shutil import copyfile
# Testing for sequence file extension
directory = "."
MainDir = os.path.abspath(directory) + "/"
## build the dictionary with full path for each for sequence files
fastq = glob.glob(MainDir+'*/*'+'R[12]'+'**fastq.gz')
if len(fastq) > 0:
    print('Sequence file extensions have fastq')
    os.system('scripts/Move.sh')
    fastq2json.fastq_json(MainDir)
else:
    print('File extensions are good')
######Rule File
if not config["GroupdFile"]:
    os.system('Rscript scripts/Table.R')
    print('No GroupdFile provided')
Running snakemake --forceall --rulegraph | dot -Tpdf > dag.pdf should produce a PDF showing the snakemake workflow, but unless the two print lines are commented out, it results in Error: <stdin>: syntax error in line 1 near 'File'.
To understand what is going on, take a close look at the command that generates your dag.pdf.
Try out the first part of your command:
snakemake --forceall --rulegraph
What does that do? It prints out the dag in text form.
By using a | symbol you 'pipe' (pass along) this print to the next part of your command:
dot -Tpdf > dag.pdf
This part makes the actual pdf from the text that is 'piped' in and stores it in dag.pdf. The problem is that when your Snakefile runs print statements, their output also gets 'piped' to the second half of your command, which interferes with the making of your dag.pdf.
The way I solved this, so that I can still print and also generate the dag, is to use the logging functionality of snakemake. It is not documented and a bit hackish, but it works really well for me:
from snakemake.logging import logger
logger.info("your print statement here!")
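If relying on an undocumented snakemake module feels too fragile, another option (not part of the original answer) is to send the messages to stderr with plain Python. The pipe only forwards stdout to dot, so anything written to stderr still shows up in the terminal without ending up in the graph description. A minimal sketch, where the helper name status is made up:

```python
import sys


def status(message):
    """Write a progress message to stderr so stdout carries only the graph."""
    print(message, file=sys.stderr)


status("File extensions are good")
```

Replacing the two print calls with status(...) should leave the dot input untouched, assuming nothing else in the workflow writes to stdout.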

How to open and read a .gz file in Nim (preferably line by line)

I just sat down to write my first Nim script to parse a .vcf (Variant Call Format) file. This file format stores genetic mutations from sequencing data.
For scripting languages, I 'grew up' on Perl and later migrated to Python, but I would love to use a language with the speed that Nim offers. I realize Nim is still young, but I couldn't even find a clear example for how to open and read a .gz (gzip) file (preferably line by line).
Can anyone provide a simple example to open and read a gzip file using Nim, line by line?
In Python, I'm accustomed to the following (uber-simple) code:
import gzip
my_file = gzip.open('my_file.vcf.gz', 'r')
for line in my_file:
    pass  # do something
my_file.close()
I have seen related questions, but they're not clear. The posts are also relatively old and I hope/suspect something better has come about. Here's what I've found:
Read gzip-compressed file line by line
File, FileStream, and GZFileStream
Reading files from tar.gz archive in Nim
Really appreciate it.
P.S. I also think it would be useful if someone created a Nim tag in StackOverflow. I do not have the reputation to create tags.
Just in case you need to handle VCF rather than .gz, there's a nice wrapper for htslib written by Brent Pedersen:
https://github.com/brentp/hts-nim
You need to install htslib on your system, and then either require the library in your .nimble file with requires "hts" or install it with nimble install hts. If you are going to do NGS analysis in Nim, you'll need it.
The code you need:
import hts
var v: VCF
doAssert open(v, "myfile.vcf.gz")
# Here you have the VCF file loaded in v, and can access the headers through
# the v.header property
for record in v:
  # Here you get one record object per line, e.g. extract the Ref and Alts:
  echo record.REF, " ", record.ALT
v.close()
Be sure to follow the docs, because some things differ from Python, especially when getting the INFO and FORMAT fields.
Check out Brent's whole repo. It has plenty of wrappers, code samples and utilities to handle NGS problems (e.g. an ultrafast coverage tool called Mosdepth).
Per suggestion from Maurice Meyer, I looked at the tests for the Nim zip package. It turned out to be quite simple. This is my first Nim script, so my apologies if I didn't follow convention, etc.
import zip/gzipfiles  # Import zip package
block:
  let vcf = newGzFileStream("my_file.vcf.gz")  # Open gzip file
  defer: vcf.close()  # Close file (like a 'finally' clause in a 'try' block)
  var line: string  # Declare line variable
  # Loop over each line in the file
  while not vcf.atEnd():
    line = vcf.readLine()
    # Cure disease with my VCF file
Because the zip package is already in the Nim package library, installing it only took:
> nimble refresh
> nimble install zip
I tried to use Nim some time ago to parse a fastq or fastq.gz file.
The code should be available here:
https://gitlab.pasteur.fr/bli/qaf_demux/blob/master/Nim/src/qaf_demux.nim
I don't remember exactly how this works, but apparently, I did an import zip/gzipfiles and used newGZFileStream on the input file name to obtain a Stream from which lines can be read using .readLine() in this piece of code:
proc fastqParser(stream: Stream): iterator(): Fastq =
  result = iterator(): Fastq =
    var
      nameLine: string
      nucLine: string
      quaLine: string
    while not stream.atEnd():
      nameLine = stream.readLine()
      nucLine = stream.readLine()
      discard stream.readLine()
      quaLine = stream.readLine()
      yield [nameLine, nucLine, quaLine]
It is used in something that amounts to this piece of code:
let inputFqs = fastqParser(newGZFileStream($inFastqFilename))
Hopefully you can adapt this to your case.
My .nimble file has a requires "zip#head". I suppose this triggers the installation of zip/gzipfiles.

Can I execute .sql file from SQLite command line when I don't have a .db file?

I've been writing SQL in environments where the databases and tables are all easy to pull in using simple 'FROM db.table'. Now I'm trying to do my own project on .csv files. I want to be able to write all of my queries in .sql files and execute them using command line.
I'm uncertain about the following:
What the best program to use is.
How to execute a .sql file from command line.
How to import a .csv file.
Yipes! I'm new to using command line and I'm new to pulling in my own tables.
I'm currently trying out SQLite3, but from the documentation* it doesn't look like I can simply execute a .sql file using SQLite3 on the command line.
I've tried running "sqlite3 HelloWorld.sql" in command line for a file that just has "SELECT 'Hello World';" in it and this is what I get:
SQLite version 3.9.2 2015-11-02 18:31:45
Enter ".help" for usage hints.
sqlite>
Any advice would be greatly appreciated!
* https://www.sqlite.org/cli.html
On Windows you can execute SQL (files) via the command line:
>sqlite3 "" "SELECT 'Hello World!';"
Hello World!
>sqlite3 "" ".read HelloWorld.sql"
Hello World!
This won't create a database file, because the first parameter, which would normally name the database file, is empty ("").
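The question also asks how to bring in a .csv file. If scripting it is acceptable, Python's built-in sqlite3 module can do both steps without ever creating a .db file. The following is only a sketch: the function name run_sql_on_csv is made up, it assumes the CSV has a header row and that every row has the same number of fields, it runs a single statement, and the table name must come from a trusted source because it is pasted into the SQL text.

```python
import csv
import sqlite3


def run_sql_on_csv(csv_path, table, sql):
    """Load csv_path into an in-memory table, run one query, return the rows."""
    conn = sqlite3.connect(":memory:")  # no .db file is created
    try:
        with open(csv_path, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)  # first row supplies the column names
            columns = ", ".join(
                '"%s"' % name.replace('"', '""') for name in header
            )
            placeholders = ", ".join("?" for _ in header)
            conn.execute('CREATE TABLE "%s" (%s)' % (table, columns))
            conn.executemany(
                'INSERT INTO "%s" VALUES (%s)' % (table, placeholders), reader
            )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

The query text can be read from a .sql file first, for example sql = open("HelloWorld.sql").read(). All values are loaded as text, so numeric comparisons need an explicit CAST in the query.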

How to read live output from subprocess python 2.7 and Apache

I have an Apache web server and I wrote a Python script to run a command. The command launches a ROS launch file, which runs indefinitely. I would like to read the output of the subprocess live and display it on the page. With my code so far, the output is only printed after I terminate the process. I've tried all kinds of solutions from the web, but none of them seem to work.
command = "roslaunch package test.launch"
proc = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env=env,
    shell=True,
    bufsize=1,
)
print "Content-type:text/html\r\n\r\n"
for line in iter(proc.stdout.readline, ''):
    strLine = str(line).rstrip()
    print(">>> " + strLine)
    print("<br/>")
The problem is that the output of roslaunch is being buffered. subprocess is not the best tool for real-time output processing in such a situation, but there is a tool in Python made for just that task: pexpect. The following snippet should do the trick:
import pexpect
command = "roslaunch package test.launch"
p = pexpect.spawn(command)
print "Content-type:text/html\r\n\r\n"
while not p.eof():
    strLine = p.readline()
    print(">>> " + strLine)
    print("<br/>")
Andrzej Pronobis' answer above suffices for UNIX-based systems, but pexpect does not work as well on Windows in certain scenarios. In particular, spawn() doesn't work on Windows as expected. It can still be used with some alterations, which are described in the official docs.
The better way on Windows might be to use wexpect (see its official docs). It caters to Windows alone.
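For completeness, here is a sketch of line-by-line streaming with only the standard library, written for Python 3 rather than the 2.7 of the question. It works only when the child process itself does not hold its output back, which is exactly the roslaunch problem described above, so the stand-in child here is a Python process started with -u. The helper name stream_lines is made up. The explicit flush after each line matters too, since the CGI script's own stdout may otherwise be buffered.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Yield each output line of cmd as soon as the child writes it."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        bufsize=1,                # line-buffered on our side of the pipe
        universal_newlines=True,  # text mode, needed for line buffering
    )
    try:
        for line in iter(proc.stdout.readline, ""):
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()


# Stand-in for roslaunch: a child Python run with -u so it does not buffer
child = [sys.executable, "-u", "-c", "print('one'); print('two')"]
for text in stream_lines(child):
    print(">>> " + text + "<br/>")
    sys.stdout.flush()  # push each line out immediately
```

If the child cannot be told to write unbuffered, a pseudo-terminal approach such as pexpect remains the more reliable route.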