How to remove certain Aspell dictionaries? - spell-checking

If I run aspell dump dicts, I get the following dictionaries listed. If I just want to keep en_GB and en_US, how can I get rid of the others? I've been through the documentation here but don't see how to do this. (If it's relevant, this is on both Linux and Windows.)
en
en_CA
en_CA-w-accents
en_GB
en_GB-w-accents
en_US
en_US-w-accents
es
fr-40
fr-60
fr
fr-80
fr_CH-40
fr_CH
fr_CH-60
fr_CH-80
fr_FR-40
fr_FR-60
fr_FR
fr_FR-80

Ignore the answer if I understood the question wrong! Or tell me more about what you mean.
On Windows:
If you want only a list of the installed dicts, you can get it with:
aspell dicts | findstr /I en_
This displays only the names of the dicts that contain "en_". On Linux, use grep instead of findstr.
If you want to hide the dicts, search for the fr_*.multi files and move them all out of the Aspell dict folder.
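For example, on Linux something like this would stash everything except the two you want (a minimal sketch; the dict directory and the idea of moving rather than deleting are assumptions, so verify the path with aspell config dict-dir first):

# Find where the .multi files live (commonly /usr/lib/aspell-0.60).
DICT_DIR=$(aspell config dict-dir)
# Move everything except en_GB* and en_US* aside instead of deleting it.
mkdir -p ~/aspell-disabled
for f in "$DICT_DIR"/*.multi; do
    case "$(basename "$f")" in
        en_GB*|en_US*) ;;                  # keep these
        *) mv "$f" ~/aspell-disabled/ ;;   # hide the rest
    esac
done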

Related

How can I remove the last three pages of a set of PDF files using pdf-stapler?

I have a large set of PDF files, each of which contains three unnecessary pages at the end. I was originally trying to create a solution based on an answer I found on askUbuntu and another StackOverflow question, but pdf-stapler apparently doesn't support the "r" notation that pdftk did. For example, if I try pdf-stapler del myFile.pdf r4-r1 outputFiles/myFile.pdf or pdf-stapler cat myFile.pdf 1-r4 outputFiles/myFile.pdf, I get an error saying "Invalid range: r4-r1" for the first option and an output file containing only page 1 for the second option.
Is there a way to do this with pdf-stapler?
The qpdf CLI command should be available in the Fedora repositories. It supports the "r" notation, so removing the last three pages from a file should work like this:
qpdf --pages myFile.pdf 1-r4 -- myFile.pdf outputFiles/myFile.pdf
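If you need to process the whole set, a simple shell loop should do it (a sketch; the *.pdf glob and the outputFiles directory are assumptions taken from the question):

mkdir -p outputFiles
for f in *.pdf; do
    qpdf --pages "$f" 1-r4 -- "$f" "outputFiles/$f"
done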

Finite state machine visualizer

I need an application that prints/visualizes input/output pairs during FST runs. That is, for each state of the FST, it should print a tuple containing the input for that state and the output of that state. Right now I can generate FST files that are compatible with the foma, HFST, and xfst tools, so I guess the visualization tool only needs to be compatible with one of them. Does anyone know of such a tool?
foma can produce dot format files that can be visualized by graphviz. On Debian/Ubuntu, install graphviz with
$ sudo apt-get install graphviz
foma can read att format files (produced with hfst-fst2txt for anything HFST can read, or lt-print for anything from lttoolbox); assuming you've got such a file named myfst.att, you can do
$ foma
foma[0]: read att myfst.att
foma[1]: view
to display the full FST. That will show each input/output pair on each edge between states of the FST.
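If you'd rather not type into the interactive prompt, the same steps can be scripted from the shell; a minimal sketch, assuming your foma build supports the -e (execute a command) and -s (exit when done) flags:

foma -e "read att myfst.att" -e "print dot > myfst.dot" -s
dot -Tpng myfst.dot -o myfst.png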
But you say "during runs" – are you talking about also showing the queue of "live states"? If so, I don't know of a tool that does this, that would be nice! One thing you could do is to modify the HFST source to output the list of live states and string vectors as it's processing, and then combine that with the dot file to e.g. colour in the live states. (If so, you may want to take this to the #hfst channel on irc.freenode.net.)
There is also a script att2dot.py on https://ftyers.github.io/2017-%D0%9A%D0%9B_%D0%9C%D0%9A%D0%9B/hfst.html that can be used on the command line like
hfst-fst2txt chv.lexc.hfst | python3 att2dot.py | dot -Tpng -ochv.lexc.png
if you prefer something more scriptable. If you use that from the Python library of HFST, you might be able to get the "live states" for every part of an analysis more easily.

Rename ttf/woff/woff2 file to PostScript Font Name with Script

I am a typographer working with many fonts that have incorrect or incomplete filenames. I am on a Mac and have been using Hazel, AppleScript, and Automator workflows, attempting to automate renaming these files*. I require a script to replace the existing filename of ttf, woff, or woff2 files in Finder with the font's postscriptName. I know of tools (fc-scan/fontconfig, TTX, etc.) which can retrieve the PostScript name-values I require, but lack the programming know-how to code a script for my purposes. I've only managed to set up a watched directory that can run a script when any files matching certain parameters are added.
*To clarify, I am talking about changing the filename only, not the actual names stored within the font. Also I am open to a script of any compatible language or workflow of scripts if possible, e.g. this post references embedding AppleScript within Shell scripts via osascript.
StackExchange Posts I've Consulted:
How to get Fontname from OTF or TTF File?
How to get PostScript name of TTF font in OS X?
How to Change Name of Font?
Automate Renaming Files in macOS
Others:
https://github.com/dtinth/JXA-Cookbook/wiki/Using-JavaScript-for-Automation
https://github.com/fonttools/fonttools
https://github.com/devongovett/fontkit
https://www.npmjs.com/package/rename-js
https://opentype.js.org/font-inspector.html
http://www.fontgeek.net/blog/?p=343
https://www.lantean.co/osx-renaming-fonts-for-free
Edit: Added the following by request.
1) Screenshot of a somewhat typical webfont, illustrating how the form fields for font family and style names are often incomplete, blank, or contain illegal characters.
2) The woff file depicted (also, as base64).
Thank you all in advance!
Since you mentioned Automator in your question, I thought I'd try and solve this while using that to rename the file, along with standard Mac bash to get the font name. Hopefully, it beats learning a whole programming language.
I don't know what your workflow is so I'll leave any deviations to you but here is a method to select a font file and from Services, rename the file to the font's postscript name… based on Apple's metadata, specifically "com_apple_ats_name_postscript". This is one of the pieces of data retrieved using 'mdls' from the Terminal on the font file. To focus on the postscript name, grep the output for name_postscript. For simplicity here, I'll exclude the path to the selected file.
Font Name Acquisition
So… running this command…
mdls GenBkBasBI.ttf | grep -A1 name_postscript
… generates this output, which contains FontBook's Postscript name. The 'A1' in grep returns the found line and the first line after, which is the one containing the actual font name.
com_apple_ats_name_postscript = (
"GentiumBookBasic-BoldItalic"
Clean this up with some more bash (tr, tail)…
tr -d \ | tail -n 1 | tr -d \"
In order, these strip spaces, all lines excepting the last, and quotation marks. So for the first 'tr' instance, there is an extra space after the backslash.
In a single line, it looks like this…
mdls GenBkBasBI.ttf | grep -A1 name_postscript | tr -d \ | tail -n 1 | tr -d \"
…and produces this…
GentiumBookBasic-BoldItalic
Now, here is the workflow that includes the above bash command. I got the idea for variable usage from the answer to this question…
Apple Automator “New PDF from Images” maintaining same filename
Automator Workflow
Automator Workflow screenshot
At the top: the Service receives selected 'files or folders' in 'Finder'.
Get Selected Finder Items
This (or Get Specified…) is there to allow testing. It is obviated by using this as a Service.
Set Value of Variable (File)
This is to remember which file you want to rename.
Run Shell Script
This is where we use the bash stuff. The $f is the selected/specified file. I'm running 'zsh' for whatever reason. You can set it to whatever you're running, presumably 'bash'.
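For reference, the whole action body can be as small as this (a sketch, assuming the action is set to pass input as arguments; tr -d ' ' is just the quoted equivalent of the escaped-space form above):

for f in "$@"; do
    mdls "$f" | grep -A1 name_postscript | tr -d ' ' | tail -n 1 | tr -d '"'
done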
Set Value of Variable (Text)
Assign the bash output to a variable. This will be used by the last action for the new filename.
Get Value of Variable (File)
Recall the specified/selected file to rename.
Rename Finder Items: Name Single Item
I have it set to 'Basename only' so it will leave the extension alone. Enter the 'Text' variable from action 4 in here.
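If you'd rather skip Automator entirely, the same pieces also work as a plain shell loop (a sketch; mv -n and Spotlight's coverage of woff/woff2 metadata are assumptions, so test on copies first):

cd /path/to/fonts    # hypothetical location of the files to rename
for f in *.ttf *.woff *.woff2; do
    ps_name=$(mdls "$f" | grep -A1 name_postscript | tr -d ' ' | tail -n 1 | tr -d '"')
    ext="${f##*.}"
    # -n refuses to overwrite if two files map to the same PostScript name
    [ -n "$ps_name" ] && mv -n "$f" "$ps_name.$ext"
done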

Is the ELF .notes section really needed?

On Linux, I'm trying to strip a statically linked ELF file to the bare essentials. When I run:
strip --strip-unneeded foo
or
strip --strip-all foo
The resulting file still has a fat .notes section that appears to be full of funky strings.
Is the .notes section really needed or can I safely force it out with --remove-section?
Thanks for any help.
From experience and from looking at the man page for strip, it looks like strip isn't supposed to get rid of any and all sections and strings that aren't needed; just symbols. Quoth the man page:
GNU strip discards all symbols from object files objfile.
That being said, from experience, strip, even without --strip-all, removes sections unneeded for loading, such as .symtab and .strtab, and you can, as you note, remove the sections you want with --remove-section.
As an example of a .notes section, I took /bin/ls from my Ubuntu 11.10 64-bit box:
$ readelf -Wn /bin/ls
Notes at offset 0x00000254 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.15
Notes at offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 3e6f3159144281f709c3c5ffd41e376f53b47952
That encompasses the .note.ABI-tag section and the .note.gnu.build-id section. It looks like they contain data that isn't necessary to load the program, but also isn't standard, and isn't known by strip to be unnecessary for the proper running of the program, since an ELF can have any number of additional "unknown" sections that aren't safe to remove. So rather than using a whitelist (which would fail miserably), it uses a blacklist of sections that it knows it can get rid of, and removes those.
Short version: these sections don't seem to be standard and could be used for various things, so strip can't know it's safe to remove them. But based on the info inside the one I took above, if it's your own program, it's almost certainly safe to remove it.
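For example, something like this should force them out (a sketch; confirm the exact section names in your binary with readelf -S first):

readelf -S foo | grep .note        # confirm what the note sections are called
strip --strip-all --remove-section=.note.ABI-tag --remove-section=.note.gnu.build-id foo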

Does grep work properly on pdf files?

Is it possible to search multiple PDF files using the grep command? It doesn't seem to work. How do people search content across multiple PDF files?
Well, PDF is a binary format, and grep can search binary files as if they were text:
grep -a
or you can just use pdftotext (which comes with xpdf) like this:
pdftotext whee.pdf | grep pattern
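To run that across many files and still see which file each match came from, wrap it in a loop (a sketch; --label is a GNU grep option, so check that your grep supports it):

for f in *.pdf; do
    pdftotext "$f" - | grep -H --label="$f" pattern
done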
You don't mention which OS you're using, but under Mac OS X you can use mdfind from the command line:
mdfind -onlyin search/directory/path "kind:pdf search text"
Use something like Solr or CLucene; I think they can do what you want.
PDF is a binary format, which is why searching it with grep is not that helpful. You can search the strings in a PDF with grep like this:
ls dir_with_pdfs/*.pdf | xargs strings | grep "keyword"
Or you can run the pdftotext command on the PDFs and then search the result with grep.
This tool pdfgrep will do the work. It has a syntax similar to grep. To search several files, just use a simple shell script. For example:
$> ls Documents/*.pdf | xargs pdfgrep -n -H "system"
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: designed episodic memory system
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: how ISAC's episodic memory system is
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: cognitive system employs a combination
....
PDF is a binary dump of objects used to display the pages. There may be some metadata you can grep, but the actual page text is in a PostScript-like stream and may be encoded in a variety of ways. It's also not guaranteed to be in any order. You need to think of PDF as more like a vector image file than a text file.
There is a short article explaining text in PDFs in more detail at http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams
If you have pdftotext installed via the poppler package, then try this Perl script:
#!/usr/bin/perl
# Usage: perl thisscript.pl pattern file1.pdf [file2.pdf ...]
my $p = shift;                      # first argument is the search pattern
foreach my $fn (@ARGV) {            # remaining arguments are PDF files
    open(F, "pdftotext $fn - |");   # pipe each file's extracted text in
    while (<F>) { print "$fn:$_" if /$p/; }
    close(F);
}
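Saved as, say, pdfsearch.pl (the name is just for illustration), it can then be pointed at a whole directory:

perl pdfsearch.pl "keyword" dir_with_pdfs/*.pdf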