SE MTF Nyquist plugin for images - batch-processing

I'd like to write a macro using the SE MTF Nyquist plugin in Fiji for a stack or for many images in a directory. But I have to set some parameters for every image in a settings window. Any ideas?
macro "TD2"{
inputFolder = getDirectory('');
outputFolder = gerDirectory('');
setBatchMode(true);
images = getFileList(inputFolder);
for ( i=0; i <images.length;i++){
inputPath = inputFolder + images[i];
open(inputPath);
makeRectangle(1632, 568, 684, 296);
run("SE MTF Nyquist");
outputPath = outputFolder + images[i];
save(outputPath);
close();
}
}
setBatchMode(false);
exit();

It depends on whether you want to use the same parameters or different ones for every image.
But before you tackle that question, you need to know whether the macro can pass the parameters to the plugin at all. Some plugins are macro-recordable and some are not.
Try recording the command in the Macro Recorder and see if the parameters show up in the recorder window. If so, you can replace them in your macro with the desired numbers or variables as needed.
If the plugin is not macro-friendly (that is, you just get the "run" command with no arguments, as shown in your code), you could try to modify it following the guidelines in section 11, "Designing macro-aware plugins", of the macro programming guide.
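If the plugin turns out to be recordable, the recorder window will show a run() call with an options string that you can paste into your loop in place of the bare run("SE MTF Nyquist"). A hypothetical sketch; the option names here are invented for illustration, so copy the real string from the recorder:
// hypothetical options string: "sampling" and "direction" are placeholder names;
// the Macro Recorder will show the plugin's actual parameters
run("SE MTF Nyquist", "sampling=4 direction=vertical");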

Related

Any way to remove the red/green color of report.html output from robot framework api using python script

I am using the code below to run the suite I have created. It is working fine.
suite.run()
# suite.run(critical='Medium')
# Reading from output XML and creating Report and Log files
writer = ResultWriter('output.xml')
writer.write_results(report='report.html', log='log.html')
I need to remove the green/red color in the report.html output file.
Is there any argument in the function call to do that?
You could pass the reportbackground parameter to the write_results function:
writer = ResultWriter('output.xml')
writer.write_results(report='report.html', log='log.html', reportbackground='#ffffff:#ffffff:#ffffff')

Ansys multiphysics: blank output file

I have a model of a heating process in Ansys Multiphysics, V11.
After running the simulation, I have a script to plot a temperature profile:
!---------------- POST PROCESSING -----------------------
/post1 ! enter the database results postprocessor
!---define profile temperature
path,s_temp1,2,,100 ! define a path
ppath,1,,dop/2,0,0 ! create a path point
ppath,2,,dop/2,1.5,0 ! create a path point
PDEF,surf_t1,TEMP, ,noav ! map TEMP onto the path
plpath,surf_t1 ! plot a path
What I need now is to save the resulting path data in a text file. I have already looked online for a solution and found the following code, which I appended after the lines above:
/OUTPUT,filename,extension
PRPATH,surf_t1
/OUTPUT
Ansys generates the file filename.extension, but it is empty. I tried placing the /OUTPUT command in a few locations in the script, but without any success.
I suspect I need to define something else, but I have no idea where to look; the Ansys documentation online is terribly chaotic, and all the internet pages I opened before writing this question are no better.
A final note: Ansys V11 is an old version of the software, but I don't want to upgrade it and refit the old model to the new software.
For the output of the simulation (which includes the description of all calculation steps and sub-steps, and the node-by-node results), the output file must be declared at the beginning of the code, not in the postprocessing phase.
Declaring
/OUTPUT,filename,extension
in the preamble of the main script ensures that the output is stored in the right location, with the desired extension. At the end of the script, you must then declare
/OUTPUT
to reset the output destination for ANSYS.
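A minimal sketch of that placement (the file name and extension are illustrative):
/OUTPUT,results,txt ! from here on, output is written to results.txt
/solu ! enter the solution processor
solve ! run the simulation
/OUTPUT ! reset output back to the terminal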
The output of the PATH commands issued in the postprocessing script is, however, not printed to that file.
It is convenient to use
*CFOPEN,file,ext
*VWRITE,Vector(1,1),Vector(1,2)
(2F12.6)
*CFCLOSE
where Vector is a two-column array created by *DIM that stores the data you want to write to the file.
As *VWRITE is a special command, it must be run from a file, e.g. a macro named macro_output.mac.
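Putting the pieces together for the path data: PAGET can copy the current path into an array, which *VWRITE then prints. A sketch to be run from a macro file; the array name is illustrative, and the column indices depend on which path items are defined:
! macro_output.mac - write the path table to surf_t1.txt
PAGET,PATHTAB,TABLE ! copy the current path data into array PATHTAB
*CFOPEN,surf_t1,txt ! open surf_t1.txt for writing
*VWRITE,PATHTAB(1,4),PATHTAB(1,5) ! e.g. path distance S and the mapped TEMP column
(2F12.6)
*CFCLOSE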

How to document Visual Basic with Doxygen

I am trying to use a Doxygen filter for Visual Basic on Windows.
I started with Vsevolod Kukol's filter, based on gawk.
There aren't many instructions.
So I started from his own commented VB code VB6Module.bas and, by means of his vbfilter.awk, I issued:
gawk -f vbfilter.awk VB6Module.bas
This outputs C-like code on stdout, so I redirected it to a file with:
gawk -f vbfilter.awk VB6Module.bas>awkout.txt
I created this Doxygen test.cfg file:
PROJECT_NAME = "Test"
OUTPUT_DIRECTORY = test
GENERATE_LATEX = NO
GENERATE_MAN = NO
GENERATE_RTF = NO
CASE_SENSE_NAMES = NO
INPUT = awkout.txt
QUIET = NO
JAVADOC_AUTOBRIEF = NO
SEARCHENGINE = NO
To produce the documentation I issued:
doxygen test.cfg
Doxygen complains that the "name 'VB6Module.bas' supplied as the second argument in the \file statement is not an input file." I removed the \file VB6Module.bas comment from awkout.txt. The warning stopped, but in both cases the documentation produced was just a single page with the project name.
I also tried the alternative Python filter by Basti Grembowietz, vbfilter.py: again no documentation, just errors and no useful output.
After some trial and error I solved the problem.
I was unable to convert a .bas file into a format that I could pass to Doxygen as input.
Anyway, following suggestions from Doxygen users, I was able to create a Doxygen config file that lets Doxygen interpret the .bas file comments properly.
Given the file VB6Module.bas (by the Doxygen-VB-Filter author, Vsevolod Kukol), commented in a Doxygen style adapted for Visual Basic, I wrote the Doxygen config file test.cfg as follows:
PROJECT_NAME = "Test"
OUTPUT_DIRECTORY = test
GENERATE_LATEX = NO
GENERATE_MAN = NO
GENERATE_RTF = NO
CASE_SENSE_NAMES = NO
INPUT = readme.md VB6Module.bas
QUIET = YES
JAVADOC_AUTOBRIEF = NO
SEARCHENGINE = NO
FILTER_PATTERNS = "*.bas=vbfilter.bat"
where:
readme.md is any Markdown file that can be used as the main documentation page.
vbfilter.bat contains:
@echo off
gawk.exe -f vbfilter.awk "%1%"
vbfilter.awk, by the filter author, is assumed to be in the same folder as the input files to be documented, and gawk must obviously be in the PATH.
Running:
doxygen test.cfg
everything runs smoothly, apart from two apparently innocuous warnings:
gawk: vbfilter.awk:528: warning: escape sequence `\[' treated as plain `['
gawk: vbfilter.awk:528: warning: escape sequence `\]' treated as plain `]'
Now test\html\index.html contains the proper documentation, as extracted from the .bas and Markdown files.
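For reference, a Doxygen-style VB comment block of the kind such filters parse looks roughly like this (the exact comment marker, '! here, is an assumption and depends on the filter version):
'! \brief Adds two integers.
'! \param a First addend.
'! \param b Second addend.
'! \return The sum of a and b.
Public Function AddNumbers(a As Integer, b As Integer) As Integer
    AddNumbers = a + b
End Function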
Alright I did some work:
You can download this .zip file. It contains:
MakeDoxy.bas: the macro that makes it all happen
makedoxy.cmd: a shell script that will be executed by MakeDoxy
configuration: folder containing the doxygen and gawk binaries needed to create the doxygen documentation, as well as the additional filter files already used by the OP
source: folder containing example source code for doxygen
How To Use:
Note: I tested it with Excel 2010
Extract VBADoxy.zip somewhere (referenced as <root> from now on)
Import MakeDoxy.bas into your VBA project. You can also import the files from source or use your own doxygen-documented VBA code files, but you'll need at least one documented file in the same VBA project.
Add "Microsoft Visual Basic for Applications Extensibility 5.3" or higher to your VBA project references (I did not test it with lower versions). It's needed for the export part (VBProject, VBComponent).
Run macro MakeDoxy
What is going to happen:
You will be asked for the <root> folder.
You will be asked if you want to delete <root>\source afterwards. It is okay to delete those files; they will not be removed from your VBA project.
MakeDoxy will export all .bas, .cls and .frm files to: <root>\source\<modulename>\<modulename>(.bas|.cls|.frm)
cmd.exe will be invoked to run makedoxy.cmd and to delete <root>\source if you chose to, which altogether results in your desired documentation.
A logfile MakeDoxy.bas.log will be re-created each time MakeDoxy is executed.
You can play with configuration\vbdoxy.cfg a little if you want to change Doxygen's behavior.
There is still some room for improvements but I guess this is something one can work with.
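For reference, the export step can be done with the extensibility library roughly as follows (a sketch assuming an Excel-hosted project and pre-existing target folders; MakeDoxy's actual code may differ):
' Sketch: export every module to <root>\source\<modulename>\<modulename>.<ext>
' Assumes the "Microsoft Visual Basic for Applications Extensibility 5.3" reference.
Dim comp As VBIDE.VBComponent
Dim root As String
root = "C:\VBADoxy" ' hypothetical <root>
For Each comp In ThisWorkbook.VBProject.VBComponents
    Select Case comp.Type
        Case vbext_ct_StdModule
            comp.Export root & "\source\" & comp.Name & "\" & comp.Name & ".bas"
        Case vbext_ct_ClassModule
            comp.Export root & "\source\" & comp.Name & "\" & comp.Name & ".cls"
        Case vbext_ct_MSForm
            comp.Export root & "\source\" & comp.Name & "\" & comp.Name & ".frm"
    End Select
Next comp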

Batch OCR Program for PDFs [closed]

This has been asked before, but I don't really know if the answers help me. Here is my problem: I have a bunch of (10,000 or so) PDF files. Some were text files saved using Adobe's print feature (so their text is perfect and I don't want to risk screwing them up), and some were scanned images (so they have no text and I will have to settle for OCR). The files are in the same directory and I can't tell which is which. Ultimately I want to turn them into .txt files and then do string processing on them, so I want the most accurate OCR possible.
It seems like people have recommended:
Adobe Acrobat (I don't have a licensed copy of this, so... plus, if ABBYY FineReader or something is better, why pay for it if I won't use it)
OCRopus (I can't figure out how to use this thing)
Tesseract (which seems like it was great in 1995, but I'm not sure if there's something more accurate now; plus it doesn't do PDFs natively and I'd have to convert to TIFF. That raises its own problem, as I don't have a licensed copy of Acrobat, so I don't know how I'd convert 10,000 files to TIFF. Plus I don't want 10,000 30-page documents turned into 300,000 individual TIFF images.)
wowocr
pdftextstream (that was from 2009)
ABBYY FineReader (apparently it's $$$, but I will spend $600 to get this done if it is significantly better, i.e. has more accurate OCR)
Also, I am a n00b to programming, so if it's going to take weeks to learn how to do something, I would rather pay the $$$. Thanks for input/experiences.
BTW, I'm running Linux Mint 11 64-bit and/or Windows 7 64-bit.
Here are the other threads:
Batch OCRing PDFs that haven't already been OCR'd
Open source OCR
PDF Text Extraction Approach Using OCR
https://superuser.com/questions/107678/batch-ocr-for-many-pdf-files-not-already-ocred
Just to put some of your misconceptions straight...
" I don't have a licensed copy of acrobat so I don't know how I'd convert 10,000 files to tiff."
You can convert PDFs to TIFF with the help of Free (as in liberty) and free (as in beer) Ghostscript. Your choice if you want to do it on Linux Mint or on Windows 7. The commandline for Linux is:
gs \
-o input.tif \
-sDEVICE=tiffg4 \
input.pdf
"i don't want 10,000 30 page documents turned into 30,000 individual tiff images"
You can have "multipage" TIFFs easily. Above command does create such TIFFs of the G4 (fax tiff) flavor. Should you even want single-page TIFFs instead, you can modify the command:
gs \
-o input_page_%03d.tif \
-sDEVICE=tiffg4 \
input.pdf
The %03d part of the output filename will automatically translate into a zero-padded series: 001, 002, 003, etc.
Caveats:
The default resolution of the tiffg4 output device is 204x196 dpi. You probably want a better value; to get 720 dpi, add -r720x720 to the command line.
Also, if your Ghostscript installation uses letter as its default media size, you may want to change it. You can use -gXxY to set width x height in device pixels at the current resolution. So to get ISO A4 output page dimensions in landscape, you can add a -g8420x5950 parameter.
So the full command which controls these two parameters, to produce 720 dpi output on A4 in portrait orientation, would read:
gs \
-o input.tif \
-sDEVICE=tiffg4 \
-r720x720 \
-g5950x8420 \
input.pdf
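To run this over a whole directory of PDFs, a simple shell loop is enough; a sketch, assuming the files sit in the current directory:
#!/bin/bash
# Convert every PDF in the current directory to a multipage G4 TIFF
# at 720 dpi on A4 portrait, as discussed above.
for f in *.pdf; do
    gs -o "${f%.pdf}.tif" -sDEVICE=tiffg4 -r720x720 -g5950x8420 "$f"
done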
Figured I would try to contribute by answering my own question (I have written some nice code for myself and could not have done it without help from this board). If you cat the PDF files in Unix (well, OS X for me), then the PDF files that have text will contain the word "Font" (as a string, mixed in with other text), because that's how the file tells the viewer which fonts to display.
The cat command in bash seems to have the same output as reading the file in binary mode in Python (using 'rb' mode when opening the file instead of 'w', 'r' or 'a'). So I'm assuming that all PDF files containing text will have the word "Font" in their binary output and that no image-only file ever will. If that's always true, then this code will make a list of all PDF files in a single directory that have text, and a separate list of those that have only images. It saves each list to a separate .txt file; then you can use a command in bash to move the PDF files to the appropriate folder.
Once you have them in their own folders, you can run your batch OCR solution on just the PDF files in the image-only folder. I haven't gotten that far yet (obviously).
import os, re

# path is the directory with the PDFs; the other two are the files the lists are stored in
path = 'C:/folder_with_pdfs'
files_with_text = open('files_with_text.txt', 'a')
image_only_files = open('image_only_files.txt', 'a')

# have os make a list of all files in that directory for the loop
filelist = os.listdir(path)

# compile a regular expression that matches "Font" anywhere in the data
mysearch = re.compile(r'.*Font.*', re.DOTALL)

# loop over all files in the directory, open each in binary mode ('rb'),
# and search the raw contents for "Font": if present, the file has text; if not, it is image-only
# (the PDF uses the word "Font" whenever it has to describe a font for displaying text)
for pdf in filelist:
    openable_file = os.path.join(path, pdf)
    cat_file = open(openable_file, 'rb')
    usable_cat_file = cat_file.read()
    cat_file.close()
    #print usable_cat_file
    if mysearch.match(usable_cat_file):
        files_with_text.write(pdf + '\n')
    else:
        image_only_files.write(pdf + '\n')

files_with_text.close()
image_only_files.close()
To move the files, I entered this command in the bash shell:
cat files_with_text.txt | while read i; do mv "$i" /Volumes/hard_drive_name/new_destination_directory_name; done
Also, I didn't re-run the Python code above, I just hand-edited it here, so it might be buggy, I don't know.
This is an interesting problem. If you are willing to work on Windows in .NET, you can do this with dotImage (disclaimer: I work for Atalasoft and wrote most of the OCR engine code). Let's break the problem down into pieces. The first is iterating over all your PDFs:
string[] candidatePDFs = Directory.GetFiles(sourceDirectory, "*.pdf");
PdfDecoder decoder = new PdfDecoder();

foreach (string path in candidatePDFs) {
    using (FileStream stm = new FileStream(path, FileMode.Open)) {
        if (decoder.IsValidFormat(stm)) {
            ProcessPdf(path, stm);
        }
    }
}
This gets a list of all files that end in .pdf and if the file is a valid pdf, calls a routine to process it:
public void ProcessPdf(string path, Stream stm)
{
    using (Document doc = new Document(stm)) {
        int i = 0;
        foreach (Page p in doc.Pages) {
            if (p.SingleImageOnly) {
                ProcessWithOcr(path, stm, i);
            }
            else {
                ProcessWithTextExtract(path, stm, i);
            }
            i++;
        }
    }
}
This opens the file as a Document object and asks if each page is image only. If so it will OCR the page, else it will text extract:
public void ProcessWithOcr(string path, Stream pdfStm, int page)
{
    using (Stream textStream = GetTextStream(path, page)) {
        PdfDecoder decoder = new PdfDecoder();
        using (AtalaImage image = decoder.Read(pdfStm, page)) {
            ImageCollection coll = new ImageCollection();
            coll.Add(image);
            ImageCollectionImageSource source = new ImageCollectionImageSource(coll);
            OcrEngine engine = GetOcrEngine();
            engine.Initialize();
            engine.Translate(source, "text/plain", textStream);
            engine.Shutdown();
        }
    }
}
What this does is rasterize the PDF page into an image and put it into a form that is palatable for engine.Translate. This doesn't strictly need to be done this way: one could get an OcrPage object from the engine from an AtalaImage by calling Recognize, but then it would be up to the client code to loop over the structure and write out the text.
You'll note that I've left out GetOcrEngine(): we make four OCR engines available for client use: Tesseract, GlyphReader, RecoStar, and Iris. You would select the one best suited to your needs.
Finally, you would need the code to extract text from the pages that already have perfectly good text on them:
public void ProcessWithTextExtract(string path, Stream pdfStream, int page)
{
    using (Stream textStream = GetTextStream(path, page)) {
        StreamWriter writer = new StreamWriter(textStream);
        using (PdfTextDocument doc = new PdfTextDocument(pdfStream)) {
            PdfTextPage textPage = doc.GetPage(page);
            writer.Write(textPage.GetText(0, textPage.CharCount));
        }
        writer.Flush();
    }
}
This extracts the text from the given page and writes it to the output stream.
Finally, you need GetTextStream():
public Stream GetTextStream(string sourcePath, int pageNo)
{
    string dir = Path.GetDirectoryName(sourcePath);
    string fname = Path.GetFileNameWithoutExtension(sourcePath);
    string finalPath = Path.Combine(dir, String.Format("{0}p{1}.txt", fname, pageNo));
    return new FileStream(finalPath, FileMode.Create);
}
Will this be a 100% solution? No, certainly not. You could imagine PDF pages that contain a single image with a box drawn around it: such a page would fail the image-only test yet yield no useful text. A better approach is probably to just use the extracted text and, if that doesn't return anything, then try an OCR engine. Changing from one approach to the other is a matter of writing a different predicate.
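A sketch of that alternate predicate, reusing the helpers above (assuming doc.Pages exposes a Count, and treating a page whose extracted text is empty as one that needs OCR):
public void ProcessPdfTextFirst(string path, Stream stm)
{
    using (Document doc = new Document(stm)) {
        for (int i = 0; i < doc.Pages.Count; i++) {
            if (PageHasText(stm, i)) {
                ProcessWithTextExtract(path, stm, i);
            }
            else {
                ProcessWithOcr(path, stm, i);
            }
        }
    }
}

// Assumption: the extracted page's CharCount is a usable emptiness test.
private bool PageHasText(Stream pdfStream, int page)
{
    using (PdfTextDocument doc = new PdfTextDocument(pdfStream)) {
        return doc.GetPage(page).CharCount > 0;
    }
}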
The simplest approach would be to use a single tool such as ABBYY FineReader, OmniPage, etc. to process the images in one batch, without having to sort them into scanned vs. not-scanned images. I believe FineReader converts the PDFs to images before performing OCR anyway.
Using an OCR engine will give you features such as automatic deskew, page orientation detection, image thresholding, despeckling, etc. These are features you would otherwise have to buy an image processing library for and program yourself, and it could prove difficult to find an optimal set of parameters for your 10,000 PDFs.
Using the fully automatic OCR approach will have other side effects depending on the input images, and you will find you get better results if you sort the images and set optimal parameters for each type. For accuracy, it would be much better to use a proper PDF text extraction routine on the PDFs that already have perfect text.
In the end it will come down to time and money versus the quality of the results you need. A commercial OCR program will be the quickest and easiest solution. If you have clean text-only documents, then a cheap OCR program will work as well as an expensive solution. The more complex your documents, the more money you will need to spend to process them.
I would try to find demo/trial versions of commercial OCR engines and see how they perform on your different document types before spending too much time and money.
I have written a small wrapper for the ABBYY OCR4LINUX CLI engine (which, IMHO, doesn't cost that much) and Tesseract 3.
The wrapper can batch-convert files like this:
$ pmocr.sh --batch --target=pdf --skip-txt-pdf /some/directory
The script uses pdffonts to determine whether a PDF file has already been OCRed and skips the ones that have. The script can also work as a system service that monitors a directory and launches an OCR action as soon as a file enters it.
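For example, a minimal version of that check could look like this (assuming pdffonts from poppler-utils; its output starts with two header lines, so anything beyond them means the file already has fonts, i.e. text):
#!/bin/bash
# list the PDFs in the current directory that have no fonts and thus need OCR
for f in *.pdf; do
    if [ "$(pdffonts "$f" 2>/dev/null | wc -l)" -le 2 ]; then
        echo "needs OCR: $f"
    fi
done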
The script can be found here:
https://github.com/deajan/pmOCR
Hopefully, this helps someone.

AIR - Batch File As CMD.exe Argument

AIR doesn't permit launching .bat files as a native process directly, so apparently I'm supposed to set cmd.exe as my startupInfo executable and pass my .bat file and its arguments.
I can't get it to work, so I'm hoping it's a syntax problem. Here is my code:
var testStartupInfo:NativeProcessStartupInfo = new NativeProcessStartupInfo();
testStartupInfo.executable = new File("C:\\WINDOWS\\system32\\cmd.exe");
var processArguments:Vector.<String> = new Vector.<String>();
processArguments[0] = "/c";
processArguments[1] = "\"C:\\Documents and Settings\\Administrator\\Desktop\\Test\\Test.bat\"";
processArguments[2] = "-testBatPeram1";
processArguments[3] = "-testBatPeram2";
processArguments[4] = "Peram3";
processArguments[5] = "C:\\Documents and Settings\\Administrator\\Desktop\\SaveText.txt";
testStartupInfo.arguments = processArguments;
var test:NativeProcess = new NativeProcess();
test.start(testStartupInfo);
The batch file and its parameters work fine if I type them manually at the command prompt, so I don't know why nothing happens when it is launched from AIR.
OK, I think that by now (3 months later) you have realized that this doesn't work because your .bat file path contains spaces.
Have you found any workaround or solution?
I have a good approximation that could be enough for you:
instead of passing parameters to your .bat, try writing to it through its standard input.
I mean, instead of passing parameters when calling your .bat, treat that info as variables read at execution time.
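A sketch of that idea: start the batch file without the troublesome arguments, then write each value to its standard input (this assumes the .bat reads them with set /p prompts; the paths here are hypothetical):
var startupInfo:NativeProcessStartupInfo = new NativeProcessStartupInfo();
startupInfo.executable = new File("C:\\WINDOWS\\system32\\cmd.exe");

var processArguments:Vector.<String> = new Vector.<String>();
processArguments[0] = "/c";
processArguments[1] = "C:\\Test\\Test.bat"; // a path without spaces sidesteps the quoting problem
startupInfo.arguments = processArguments;

var test:NativeProcess = new NativeProcess();
test.start(startupInfo);

// the batch file would read these with lines such as: set /p param1=
test.standardInput.writeUTFBytes("testBatPeram1\n");
test.standardInput.writeUTFBytes("C:\\Documents and Settings\\Administrator\\Desktop\\SaveText.txt\n");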