How to open and read a .gz file in Nim (preferably line by line) - gzip

I just sat down to write my first Nim script to parse a .vcf (Variant Call Format) file. This file format stores genetic mutations from sequencing data.
For scripting languages, I 'grew up' on Perl and later migrated to Python, but I would love to use a language with the speed that Nim offers. I realize Nim is still young, but I couldn't even find a clear example for how to open and read a .gz (gzip) file (preferably line by line).
Can anyone provide a simple example to open and read a gzip file using Nim, line by line?
In Python, I'm accustomed to the following (uber-simple) code:
import gzip
my_file = gzip.open('my_file.vcf.gz', 'rt')  # read as text; 'w' would open for writing
for line in my_file:
    pass  # do something with line
my_file.close()
I have seen related questions, but they're not clear. The posts are also relatively old and I hope/suspect something better has come about. Here's what I've found:
Read gzip-compressed file line by line
File, FileStream, and GZFileStream
Reading files from tar.gz archive in Nim
Really appreciate it.
P.S. I also think it would be useful if someone created a Nim tag in StackOverflow. I do not have the reputation to create tags.

Just in case you need to handle VCF rather than .gz, there's a nice wrapper for htslib written by Brent Pedersen:
https://github.com/brentp/hts-nim
You need to install htslib on your system, and then require the library in your .nimble file with requires "hts", or install it with nimble install hts. If you are going to do NGS analysis in Nim, you'll need it.
The code you need:
import hts
var v: VCF
doAssert open(v, "myfile.vcf.gz")
# Here you have the VCF file loaded in v; you can access the headers
# through the v.header property.
for record in v:
  # Here you get a Record object per line, e.g. extract the Ref and Alts:
  echo record.REF, " ", record.ALT
v.close()
Be sure to follow the docs, because some things differ from Python, especially when getting the INFO and FORMAT fields.
Check out Brent's whole repo. It has plenty of wrappers, code samples, and utilities for handling NGS problems (e.g. an ultrafast coverage tool called Mosdepth).

Per suggestion from Maurice Meyer, I looked at the tests for the Nim zip package. It turned out to be quite simple. This is my first Nim script, so my apologies if I didn't follow convention, etc.
import zip/gzipfiles # Import the zip package

block:
  let vcf = newGzFileStream("my_file.vcf.gz") # Open the gzip file
  defer: vcf.close() # Close the file on block exit (like a 'finally' in a 'try' block)
  var line: string # Declare the line variable
  # Loop over each line in the file
  while not vcf.atEnd():
    line = vcf.readLine()
    # Cure disease with my VCF file
To install the zip package, because it is already in the Nim package library, I simply ran:
> nimble refresh
> nimble install zip

I tried to use Nim some time ago to parse a fastq or fastq.gz file.
The code should be available here:
https://gitlab.pasteur.fr/bli/qaf_demux/blob/master/Nim/src/qaf_demux.nim
I don't remember exactly how this works, but apparently, I did an import zip/gzipfiles and used newGZFileStream on the input file name to obtain a Stream from which lines can be read using .readLine() in this piece of code:
proc fastqParser(stream: Stream): iterator(): Fastq =
  result = iterator(): Fastq =
    var
      nameLine: string
      nucLine: string
      quaLine: string
    while not stream.atEnd():
      nameLine = stream.readLine()
      nucLine = stream.readLine()
      discard stream.readLine() # skip the '+' separator line
      quaLine = stream.readLine()
      yield [nameLine, nucLine, quaLine]
It is used in something that amounts to this piece of code:
let inputFqs = fastqParser(newGZFileStream($inFastqFilename))
Hopefully you can adapt this to your case.
My .nimble file has a requires "zip#head". I suppose this triggers the installation of zip/gzipfiles.
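For comparison, since the question started from Python: the same read-three-lines-and-skip-one pattern can be written as a Python generator (a minimal sketch; the standard 4-line FASTQ record layout is the only assumption):
import gzip

# Minimal sketch of the same pattern in Python: yield
# (name, sequence, quality) triples from a gzipped FASTQ file.
def fastq_parser(path):
    with gzip.open(path, "rt") as handle:
        while True:
            name = handle.readline().rstrip()
            if not name:               # end of file
                break
            seq = handle.readline().rstrip()
            handle.readline()          # skip the '+' separator line
            qual = handle.readline().rstrip()
            yield name, seq, qual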

Related

GIMP Script.Fu script to batch convert JPEG to PNG

Can someone give me the script I would need to run to batch convert many *.jpeg files to *.png in Script.Fu in GIMP?
Currently I am manually exporting every image, which wastes far too much time.
I can't install anything right now so can't use alternative applications.
Alright, after a lot of trial and error I finally figured out how to convert one file format to another using only GIMP.
This is the Script-Fu script for conversion to PNG:
(let* ((filename "{{filename}}")
       (output "{{output}}")
       (image (car (gimp-file-load 1 filename filename)))
       (drawable (car (gimp-image-get-active-layer image))))
  (file-png-save-defaults 1 image drawable output output))
Here {{filename}} is the input file that needs to be converted (a JPEG file, for example), and {{output}} is the output file you need (it can simply be the same file name but with a .png extension).
How to run it (this can probably be improved):
gimp -i -n -f -d --batch "{{one-line script-fu}}"
More about command line options you can find in GIMP online documentation.
The place that needs to be changed is {{one-line script-fu}}, and it has to be... one line! You can probably do all of this in one file using cmd (if you use Windows), but for me it was easier to use Python, so here's the script for it:
import subprocess, os

def convert_to_png(file_dds):
    # Loads the command that runs the GIMP CLI (the second code block)
    # Note: remove "{{one-line script-fu}}" and leave one space after --batch
    with open("gimp-convert.bat", "r") as f:
        main_script = f.read()
    # Prepares the Script-Fu script to be run, replacing the necessary
    # file names and making it one line (the first code block)
    with open("gimp-convert-png.fu", "r") as f:
        script = f.read().replace("\n", " ").replace("{{filename}}", file_dds) \
            .replace("{{output}}", file_dds[:-3] + "PNG") \
            .replace("\\", "\\\\").replace("\"", "\\\"")
    subprocess.run(main_script + " \"" + script + "\" --batch \"(gimp-quit 1)\"",
                   cwd=os.getcwd(),
                   shell=True)
And you should get your file converted to PNG!
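For batches you can wrap it in a small driver loop (a sketch; note the helper above replaces a three-character extension, so .jpg rather than .jpeg file names are assumed):
import glob

# Hypothetical driver: convert every .jpg in the working directory.
# convert_to_png() swaps the last three characters for "PNG", so
# four-character extensions like .jpeg would need a small tweak.
for jpg in glob.glob("*.jpg"):
    convert_to_png(jpg)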
I needed this for my texture upscale project; you can find all of this code here.
Tested with GIMP 2.10
The real solution is to use ImageMagick's convert, as simple as magick convert some.jpeg some.png. There must be a "portable" version somewhere that you can run off a USB key.
Otherwise, with GIMP, there is a much less manual way that doesn't call for a new script, since it uses an existing one:
get/install ofn-export-layers
File>Open the first JPEG
File>Open as Layers to add more JPEGs. You can select several/all JPEGs in one call (the actual number is mostly limited by available RAM). Once this is done you have many JPEGs stacked in the same image
File>Export all layers, making sure the name pattern you use ends in .png (the doc that comes with the script explains how that works).

Lua syntax highlighting latex for arXiv

I have a latex file which needs to include snippets of Lua code (for display, not execution), so I used the minted package. It requires latex to be run with the -shell-escape flag.
I am trying to upload a PDF submission to arXiv. The site requires these to be submitted as .tex, .sty and .bbl, which they will automatically compile to PDF from latex. When I tried to submit to arXiv, I learned that there was no way for them to activate the -shell-escape flag.
So I was wondering if any of you knew a way to highlight Lua code in latex without the -shell-escape flag. I tried the listings package, but I can't get it to work for Lua on my Ubuntu computer.
You can set whichever style you want inline using listings. Its predefined Lua language has all the keywords and associated styles identified, so you can just change it to suit your needs:
\documentclass{article}
\usepackage{listings,xcolor}

\lstdefinestyle{lua}{
  language=[5.1]Lua,
  basicstyle=\ttfamily,
  keywordstyle=\color{magenta},
  stringstyle=\color{blue},
  commentstyle=\color{black!50}
}

\begin{document}

\begin{lstlisting}[style=lua]
-- defines a factorial function
function fact (n)
  if n == 0 then
    return 1
  else
    return n * fact(n-1)
  end
end

print("enter a number:")
a = io.read("*number") -- read a number
print(fact(a))
\end{lstlisting}

\end{document}
Okay, so lhf found a good solution by suggesting the GNU source-highlight package. I basically took each snippet of Lua code out of the latex file, put it into an appropriately named [snippet].lua file, and ran the following on it to generate a [snippet]-lua.tex:
source-highlight -s lua -f latex -i [snippet].lua -o [snippet]-lua.tex
And then I included each such file into the main latex file using:
\input{[snippet]-lua}
The result really isn't as nice as that of the minted package, but I am tired of trying to convince the arXiv admin to support minted...
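If there are many snippets, the per-snippet calls are easy to script (a sketch; it assumes every snippet sits as a .lua file in the working directory):
import glob
import subprocess

# Run source-highlight over every .lua snippet, producing the
# [snippet]-lua.tex files that the main document \input{}s.
for snippet in glob.glob("*.lua"):
    out = snippet[:-len(".lua")] + "-lua.tex"
    subprocess.run(["source-highlight", "-s", "lua", "-f", "latex",
                    "-i", snippet, "-o", out], check=True)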

Ansys multiphysics: blank output file

I have a model of a heating process on Ansys Multiphysics, V11.
After running the simulation, I have a script to plot a temperature profile:
!---------------- POST PROCESSING -----------------------
/post1 ! database results postprocessor
!---define profile temperature
path,s_temp1,2,,100 ! define a path
ppath,1,,dop/2,0,0 ! create a path point
ppath,2,,dop/2,1.5,0 ! create a path point
PDEF,surf_t1,TEMP, ,noav ! map TEMP onto the path
plpath,surf_t1 ! plot a path
What I need now is to save the resulting path to a text file. I have already looked online for a solution and found the following code to do it, which I appended after the lines above:
/OUTPUT,filename,extension
PRPATH,surf_t1
/OUTPUT
Ansys generates the file filename.extension but it is empty. I tried to place the OUTPUT command in a few locations in the script, but without any success.
I suspect I need to define something else, but I have no idea where to look, as the Ansys documentation online is terribly chaotic, and all the internet pages I opened before writing this question are no better.
A final note: Ansys V11 is an old version of the software, but I don't want to upgrade it and fit the old model to the new software.
For the output of the simulation (which includes all calculation steps, sub-step descriptions, and node-by-node results), the output must be declared at the beginning of the code, not in the postprocessing phase.
Declaring
/OUTPUT,filename,extension
in the preamble of the main script ensures that the output is stored in the right location, with the desired extension. At the end of the script, you must then declare
/OUTPUT
to reset the output file location for ANSYS.
The output of the PATH call made in the postprocessing script is, however, not printed to that file.
It is convenient to use
*CFOPEN,file,ext ! open file.ext for writing
*VWRITE,Vector(1,1),Vector(1,2)
(2F12.6) ! FORTRAN format: two 12-character floats per line
*CFCLOSE
where Vector is a two-column array created by *DIM that stores the data you want to output to the file.
As *VWRITE is a special command, run it from a file, i.e. a macro such as macro_output.mac.
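Once the macro has run, the two-column file can be read back outside ANSYS, e.g. in Python (a minimal sketch; the file name file.ext and the (2F12.6) fixed-width layout mirror the macro above and are assumptions to adapt):
# Read back the two-column, fixed-width file written by *VWRITE:
# each line holds two 12-character floats, per the (2F12.6) format.
data = []
with open("file.ext") as fh:
    for line in fh:
        data.append((float(line[0:12]), float(line[12:24])))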

How to document Visual Basic with Doxygen

I am trying to use some Doxygen filter for Visual Basic in Windows.
I started with Vsevolod Kukol's filter, based on gawk.
There are not many instructions.
So I started using his own commented VB code VB6Module.bas and, by means of his vbfilter.awk, I issued:
gawk -f vbfilter.awk VB6Module.bas
This outputs C-like code on stdout. Therefore I redirected it to a file with:
gawk -f vbfilter.awk VB6Module.bas>awkout.txt
I created this Doxygen test.cfg file:
PROJECT_NAME = "Test"
OUTPUT_DIRECTORY = test
GENERATE_LATEX = NO
GENERATE_MAN = NO
GENERATE_RTF = NO
CASE_SENSE_NAMES = NO
INPUT = awkout.txt
QUIET = NO
JAVADOC_AUTOBRIEF = NO
SEARCHENGINE = NO
To produce the documentation I issued:
doxygen test.cfg
Doxygen complains that the "name 'VB6Module.bas' supplied as the second argument in the \file statement is not an input file." I removed the comment #file VB6Module.bas from awkout.txt. The warning stopped, but in both cases the documentation produced was just a single page with the project name.
I also tried the alternative filter by Basti Grembowietz, vbfilter.py, written in Python. Again it produced no documentation, again with errors and without any useful output.
After trial and error I solved the problem.
I was unable to convert a .bas file into a format that I could pass to Doxygen as input.
Anyway, following suggestions from #doxygen users, I was able to create a Doxygen config file that interprets the .bas file comments properly.
Given the file VB6Module.bas (by the Doxygen-VB-Filter author, Vsevolod Kukol), commented with Doxygen style adapted for Visual Basic, I wrote the Doxygen config file, test.cfg, as follows:
PROJECT_NAME = "Test"
OUTPUT_DIRECTORY = test
GENERATE_LATEX = NO
GENERATE_MAN = NO
GENERATE_RTF = NO
CASE_SENSE_NAMES = NO
INPUT = readme.md VB6Module.bas
QUIET = YES
JAVADOC_AUTOBRIEF = NO
SEARCHENGINE = NO
FILTER_PATTERNS = "*.bas=vbfilter.bat"
where:
readme.md is any Markdown file that can be used as the main documentation page.
vbfilter.bat contains:
@echo off
gawk.exe -f vbfilter.awk "%1"
vbfilter.awk, by the filter author, is assumed to be in the same folder as the input files to be documented, and obviously gawk should be on the PATH.
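Doxygen invokes whatever FILTER_PATTERNS names with the source file as its single argument and reads the filtered code from stdout, so on non-Windows systems the .bat can be replaced by any small wrapper, for instance this Python sketch (it assumes gawk and vbfilter.awk are reachable from the working directory):
#!/usr/bin/env python3
# Hypothetical cross-platform stand-in for vbfilter.bat: Doxygen
# passes the file to be filtered as argv[1] and captures our stdout.
import subprocess
import sys

subprocess.run(["gawk", "-f", "vbfilter.awk", sys.argv[1]], check=True)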
Running:
doxygen test.cfg
everything is smooth, apart from two apparently innocuous warnings:
gawk: vbfilter.awk:528: warning: escape sequence `\[' treated as plain `['
gawk: vbfilter.awk:528: warning: escape sequence `\]' treated as plain `]'
Now test\html\index.html contains the proper documentation, as extracted from the .bas and Markdown files.
Alright I did some work:
You can download this .zip file. It contains:
MakeDoxy.bas The macro that makes it all happen
makedoxy.cmd A shell script that will be executed by MakeDoxy
configuration Folder that contains the doxygen and gawk binaries needed to create the doxygen documentation, as well as some additional filter files already used by the OP.
source Folder that contains example source code for doxygen
How To Use:
Note: I tested it with Excel 2010
Extract VBADoxy.zip somewhere (referenced as <root> from now on)
Import MakeDoxy.bas into your VBA project. You can also import the files from source or use your own doxygen-documented VBA code files, but you'll need at least one documented file in the same VBA project.
Add "Microsoft Visual Basic for Applications Extensibility 5.3" or higher to your VBA Project References (I did not test it with lower versions). It's needed for the export part (VBProject, VBComponent).
Run macro MakeDoxy
What is going to happen:
You will be asked for the <root> folder.
You will be asked if you want to delete <root>\source afterwards. It is okay to delete those files; they will not be removed from your VBA project.
MakeDoxy will export all .bas, .cls, and .frm files to <root>\source\<modulename>\<modulename>(.bas|.cls|.frm) (see the sketch after this list).
cmd.exe will be commanded to run makedoxy.cmd and to delete <root>\source if you chose that option, which altogether results in your desired documentation.
A log file MakeDoxy.bas.log will be re-created each time MakeDoxy is executed.
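For illustration, the export step can also be sketched from outside VBA (a pywin32 sketch; the workbook and folder paths are assumptions, and Excel's "Trust access to the VBA project object model" option must be enabled):
import os
import win32com.client

# Hypothetical sketch of MakeDoxy's export step, driven from Python:
# dump every module/class/form to <root>\source\<modulename>\...
EXTS = {1: ".bas", 2: ".cls", 3: ".frm"}  # VBComponent.Type codes

xl = win32com.client.Dispatch("Excel.Application")
wb = xl.Workbooks.Open(r"C:\path\to\workbook.xlsm")  # path is an assumption
for comp in wb.VBProject.VBComponents:
    ext = EXTS.get(comp.Type)
    if ext:
        folder = os.path.join(r"C:\root\source", comp.Name)
        os.makedirs(folder, exist_ok=True)
        comp.Export(os.path.join(folder, comp.Name + ext))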
You can play with configuration\vbdoxy.cfg a little if you want to change Doxygen's behavior.
There is still some room for improvement, but I guess this is something one can work with.

Ignore includes with pycparser and define multiple subgraphs in pydot

I am new to Stack Overflow, but I have gotten a lot of help so far; thanks to the community for that.
I'm trying to create software that shows me caller dependencies for legacy code.
I'm parsing a directory of C code with pycparser, and for each file I want to create a subgraph with pydot.
Two questions:
When parsing a C file, the parser follows the #includes, so I also get functions from the included files in my AST. How can I tell whether a function is pulled in by an include or originally from the actual file, or ignore the #includes?
For each file I want to create a subgraph, and then add all the functions in that file to this subgraph. I don't know in advance how many subgraphs I have to create...
I have a set of files, where each file is a frozenset of the functions in that file.
Is something like this possible?
for files in SetOfFiles:
    # how to create a subgraph named after files?
    for function in files:
        self.graph.add_node(pydot.Node(function))  # --> add the node to subgraph "files"
I hope you get my challenge... any ideas?
Thanks!
EDIT:
I solved the question about pydot; it was quite easy... So I'm left with my pycparser problem :(
for files in ListOfFuncs:
    cluster_x = pydot.Cluster(files, label=files)
    for functions in files:
        cluster_x.add_node(pydot.Node(functions))
    graph.add_subgraph(cluster_x)
I can address the pycparser part. The preprocessor leaves #line directives that specify which file and line the code came from, and pycparser consumes those. You can get that information from the AST it creates (see the tests for an example).
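In code that looks roughly like this (a sketch: parse_file with use_cpp=True runs the preprocessor, and every node's coord records the originating file, so filtering on it drops the #include'd functions; the file names here are assumptions):
from pycparser import c_ast, parse_file

# Sketch: print only the functions defined in the file itself, using
# the file/line coordinates pycparser attaches to every AST node.
class FuncDefVisitor(c_ast.NodeVisitor):
    def __init__(self, wanted_file):
        self.wanted_file = wanted_file

    def visit_FuncDef(self, node):
        if node.decl.coord.file == self.wanted_file:  # skip included defs
            print(node.decl.name)

ast = parse_file("legacy.c", use_cpp=True)  # "legacy.c" is an assumption
FuncDefVisitor("legacy.c").visit(ast)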