Regex speed in Perl 6 - raku

Previously I've worked only with bash regular expressions, grep, sed, awk, etc. After trying Perl 6 regexes, I got the impression that they work more slowly than I would expect, but probably the reason is that I'm handling them incorrectly.
I've made a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
my @search = <abcde cdeff fabcd>;
my token search {
    @search
}
my @new_array = @array.grep({/ <search> /});
say @new_array;
Then I printed @array into a file named array (with 7776 lines), made a file named search with 3 lines (abcde, cdeff, fabcd), and ran a simple grep search.
$ grep -f search array
After both programs produced the same result, as expected, I measured the time they were working.
$ time perl6 search.p6
real 0m6,683s
user 0m6,724s
sys 0m0,044s
$ time grep -f search array
real 0m0,009s
user 0m0,008s
sys 0m0,000s
So, what am I doing wrong in my Perl 6 code?
UPD: If I pass the search tokens to grep, looping through the @search array, the program works much faster:
my @array = "aaaaa" .. "fffff";
say +@array;
my @search = <abcde cdeff fabcd>;
for @search -> $token {
    say ~@array.grep({/$token/});
}
$ time perl6 search.p6
real 0m1,378s
user 0m1,400s
sys 0m0,052s
And if I define each search pattern manually, it works even faster:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
say ~@array.grep({/abcde/});
say ~@array.grep({/cdeff/});
say ~@array.grep({/fabcd/});
$ time perl6 search.p6
real 0m0,587s
user 0m0,632s
sys 0m0,036s

The grep command is much simpler than Perl 6's regular expressions, and it has had many more years to get optimized. Regexes are also one of the areas of Rakudo that hasn't seen as much optimization, partly because it is seen as a difficult thing to work on.
For a more performant example, you could pre-compile the regex:
my $search = "/@search.join('|')/".EVAL;
# $search = /abcde|cdeff|fabcd/;
say ~@array.grep($search);
That change causes it to run in about half a second.
If there is any chance of malicious data in @search and you have to do this, it may be safer to use:
"/@search».Str».perl.join('|')/".EVAL
The compiler can't quite generate that optimized code for /@search/, as @search could change after the regex gets compiled. What could happen is that the first time the regex is used, it gets re-compiled into the better form, which is then cached for as long as @search isn't modified.
(I think Perl 5 does something similar)
One important fact you have to keep in mind is that a regex in Perl 6 is just a method that is written in a domain-specific sub-language.
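You can see that for yourself; a minimal sketch (the token name vowel is just an illustration):
my token vowel { <[aeiou]> }
say &vowel ~~ Method;    # True: a named regex is a Method under the hood
say "perl" ~~ /<vowel>/; # matches, capturing ｢e｣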

Nextflow: add unique ID, hash, or row number to tuple

ch_files = Channel.fromPath("myfiles/*.csv")
ch_parameters = Channel.from(['A', 'B', 'C', 'D'])
ch_samplesize = Channel.from([4, 16, 128])
process makeGrid {
    input:
    path input_file from ch_files
    each parameter from ch_parameters
    each samplesize from ch_samplesize
    output:
    tuple path(input_file), parameter, samplesize, path("config_file.ini") into settings_grid
    """
    echo "parameter=$parameter;sampleSize=$samplesize" > config_file.ini
    """
}
gives me a number_of_files * 4 * 3 grid of settings files, so I can run some script for each combination of parameters and input files.
How do I add some ID to each line of this grid? A row ID would be OK, but I would even prefer some unique 6-digit alphanumeric code without a "meaning", because the order in the table doesn't matter. I could extract the last part of the working folder, which is seemingly unique per process, but I don't think it is ideal to rely on sed and $PWD for this, and I didn't see it provided as a runtime metadata variable (plus it's a bit long, but OK). In a former setup I had a job ID from the LSF cluster system for this purpose, but I want this to be portable.
Every combination is not guaranteed to be unique (e.g. having parameter 'A' twice in the input channel should be valid).
To be clear, I would like this output
file1.csv A 4 pathto/config.ini 1ac5r
file1.csv A 16 pathto/config.ini 7zfge
file1.csv A 128 pathto/config.ini ztgg4
file2.csv A 4 pathto/config.ini 123js
etc.
Given the input declaration, which uses the each qualifier as an input repeater, it will be difficult to append some unique id to the grid without some refactoring to use either the combine or cross operators. If the inputs are just files or simple values (like in your example code), refactoring doesn't make much sense.
To get a unique code, the simple options are:
As you mentioned, there's unfortunately no way to access the unique task hash without some hack to parse $PWD. It might, however, be possible to use BASH parameter substitution to avoid sed/awk/cut (assuming BASH is your shell, of course); you could try using: "${PWD##*/}"
You might instead prefer ${task.index}, which is an index unique within the same process. Although the task index is not guaranteed to be unique across executions, it should be sufficient in most cases. It can also be formatted, for example:
process example {
    ...
    script:
    def idx = String.format("%06d", task.index)
    """
    echo "${idx}"
    """
}
Alternatively, create your own UUID. You could take just the first N characters, but this will of course decrease the likelihood of the IDs being unique (not that there was any guarantee of that anyway). For a small, finite set of inputs this might not really matter:
process example {
    ...
    script:
    def uuid = UUID.randomUUID().toString()
    """
    echo "${uuid}"
    echo "${uuid.take(6)}"
    echo "${uuid.takeBefore('-')}"
    """
}

Why are inline if statements an average of at least one-third slower than other types of if?

Consider the following Perl 6 script skeleton:
my regex perlish { .*[ea]?[ui]? rl $ }
my Str @words = '/usr/share/dict/words'.IO.lines;
for @words -> $word {
    ...
}
(The base idea for the code in this question comes from the perl6 website's examples.)
My /usr/share/dict/words is an indirect symbolic link to /usr/share/dict/american-english. It's 99,171 lines long, with one word/line.
For comparison's sake, Python 3 does 100 loops of the below in a total of 32 seconds: that's just 0.32942s / loop.¹
Here are the things I've tried putting in place of the stub code, with their benchmark times as noted:
"Inline" if — 100 loops, average 9.74219s / loop, totalling 16 min 14.219s
say "$word probably rhymes with Perl" if $word ~~ /<perlish>/;
say "$word is a palindrome" if $word eq $word.flip && $word.chars > 1;
Short Circuit (not ternary) — 10 loops, average 6.1925s / loop, normalised to totalling +/- 10.3 min
$word eq $word.flip && $word.chars > 1 && say "$word is a palindrome";
$word ~~ /<perlish>/ && say "$word probably rhymes with Perl";
given/when (switch/case) — 100 loops, average 6.18568s / loop, totalling 10 min 18.568s
given $word {
    when /<perlish>/
        { say "$word probably rhymes with Perl"; proceed; }
    when $word eq $word.flip && $word.chars > 1
        { say "$word is a palindrome"; proceed; }
}
"normal" if block — 100 loops, average 6.0588s / loop totalling 10 min 5.880s
if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome"; }
if $word ~~ /<perlish>/ { say "$word probably rhymes with Perl"; }
Somewhat unsurprisingly, the normal if block is fastest. But, why is the inline if (what the website uses for an example) so much slower?
¹ I'm not saying Perl 6 is slow... but I thought Python was slow and... wow. Perl 6 is slow... ignoring multithreading, parallelism and concurrency, all of which are built into Perl 6 and in which Python leaves much to be desired.
Specs: Rakudo version 2015.12-219-gd67cb03 on MoarVM version 2015.12-29-g8079ca5 implementing Perl 6.c on a
2.2GHz QuadCore Intel Mobile i7 with 6GB of RAM.
I ran the tests like: time for i in `seq 0 100`; do perl6 --optimize=3 words.pl6; done
(This page became the p6doc Performance page.)
Dealing with Perl 6 speed issues
I don't know why the statement modifier form of if is slower. But I can share things that can help folk deal with Perl 6 speed issues in general so I'll write about those, listed easiest first. (I mean easiest things for users and potential users to do, not easiest for compiler devs.)
Why does the speed of your code matter?
I recommend you share your answer to these higher level questions:
How much faster would your code need to run to make a worthwhile difference? Could the full speed up wait another month? Another year?
Are you exploring Perl 6 for fun, assessing its potential long term professional relevance to you, and/or using it in your $dayjob?
Wait for Rakudo to speed up
5 years ago Rakudo was 1,000 times slower or more for some operations. It's been significantly speeding up every year for years even though speeding it up was explicitly not the #1 dev priority. (The mantra has been "make it work, make it work right, make it fast". 2016 is the first year in which the "make it work fast" aspect is truly in the spotlight.)
So, imo, one sensible option if the Rakudo Perl 6 compiler is really too slow for what you want to do is to wait for others to make it faster for you. It could make sense to wait for the next official release (there are at least several each year) or to wait a year or three, depending on what you're looking for.
Visit the freenode IRC channel #perl6
Compiler devs, the folk who best know how to speed up Perl 6 code, aren't answering SO questions. But they are generally responsive on #perl6.
If you don't get all the details or results you want from here then your best bet is to join the freenode IRC channel #perl6 and post your code and timings. (See next two headings for how best to do that.)
Profile code snippets
Rakudo on MoarVM has a built in profiler:
$ perl6 --profile -e 'say 1'
1
Writing profiler output to profile-1453879610.91951.html
The --profile option is currently only for micro-analysis -- the output from anything beyond a tiny bit of code will bring your browser to its knees. But it could be used to compare profiles of simple snippets using if conventionally vs as a statement modifier. (Your regex-using examples are almost certainly too complex for the current profiler.)
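For instance, here is a minimal pair you could profile to contrast the two forms (hypothetical snippets of my own, not from the question):
$ perl6 --profile -e 'my $n = 0; for ^100_000 { $n++ if $_ %% 3 }; say $n'
$ perl6 --profile -e 'my $n = 0; for ^100_000 { if $_ %% 3 { $n++ } }; say $n'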
Profiling results may well mean little to you without help and/or may point to confusing internal stuff. If so, please visit #perl6.
Write faster Perl 6 code, line by line
Your immediate focus seems to be the question of why one way of writing a line of code is slower than another way. But the flipside of this "academic" question is the practical one of writing faster lines of code.
But if someone's a Perl 6 newbie, how are they going to know how? Asking here is one way but the recommended approach is visiting #perl6 and letting folk know what you want.
#perl6 has on-channel evalbots that help you and others investigate your issue together. To try code snippets out publicly enter m: your code goes here. To do so privately write /msg camelia m: your code goes here.
For simple timing use variations on the idiom now - INIT now. You can also generate and share --profile results easily using a #perl6 evalbot. Just join the channel and enter prof-m: your code goes here.
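As a concrete example of that idiom, this sketch prints how long the program has been running (INIT now is evaluated once at startup, so the difference is the elapsed wallclock time):
my @words = 'a' .. 'zzz';
say +@words.grep(/z$/);
say now - INIT now;   # elapsed seconds since program start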
Write faster Perl 6 code by refactoring
Use better algorithms, especially parallel/concurrent ones.
Use native arrays (eg Array[int8] for an array of 8-bit integers) for compact, faster number crunching; see the sketch after this list.
For more info about doing this, visit #perl6.
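As a minimal sketch of the native-array point above (the numbers are illustrative only):
my int8 @bytes = 1, 2, 3;    # packed 8-bit storage
my int  @counts;             # machine-native integers, no boxing
@counts.push($_ % 128) for ^1000;
say @counts.elems;           # 1000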
Use (faster) foreign code
Use NativeCall wrappers for C libs such as Gumbo or for C++ libs (experimental). NativeCall itself is currently poorly optimized but that's set to change in 2016 and for many applications the NativeCall overhead is a small part of performance anyway.
Inline::Perl5 builds on NativeCall to enable use of Perl 5 in Perl 6 (and vice-versa) including arbitrary Perl 5 code and high-performance Perl 5 XS modules. This interop allows passing integers, strings, arrays, hashes, code references, file handles and objects between Perl 5 and Perl 6; calling methods on Perl 5 objects from Perl 6 and calling methods on Perl 6 objects from Perl 5; and subclassing Perl 5 classes in Perl 6.
(There are similar but less mature or even alpha variants for other langs like Inline::Python, Inline::Lua and Inline::Ruby.)
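For a flavor of what Inline::Perl5 looks like in practice, a minimal sketch (assuming the module is installed):
use Inline::Perl5;
my $p5 = Inline::Perl5.new;
$p5.run('print "hello from Perl 5\n"');
# or load a Perl 5 module as if it were a Perl 6 one:
# use DBI:from<Perl5>;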
Review benchmarks
The best relevant benchmarking tool I know of is perl6-bench which compares various versions of Perl with each other including various versions of both Perl 5 and Perl 6.
There may already be benchmarks contrasting a regular if statement with the statement-modifier form, but I doubt it. (And if not, you would be making a nice contribution to Perl 6 if you wrote an extremely simple pair of snippets and got them added to perl6-bench.)
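Such a pair could be as simple as this (a sketch; the loop bodies are arbitrary):
# statement-modifier form
my $a = 0; for ^100_000 { $a++ if $_ %% 2 }; say $a;
# regular block form
my $b = 0; for ^100_000 { if $_ %% 2 { $b++ } }; say $b;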
Help speed Rakudo up
The Rakudo Perl 6 compiler is largely written in Perl 6. So if you can write Perl 6, then you can hack on the compiler, including optimizing any of the large body of existing high-level code that impacts the speed of your code.
Most of the rest of the compiler is written in a small language called NQP that's almost just a subset of Perl 6. So if you can write Perl 6 you can fairly easily learn to use and improve the middle-level NQP code too.
Finally, if low-level C hacking is your idea of fun, checkout MoarVM.
I had a different answer before, which was based on a piece of code I accidentally left in between benchmark runs.
Given this benchmark code:
my regex perlish { [ea?|u|i] rl $ }
my Str @words = '/usr/share/dict/words'.IO.lines;
multi sub MAIN('postfixif') {
    for @words -> $word {
        say "$word probably rhymes with Perl" if $word ~~ / [ea?|u|i] rl $ /;
        say "$word is a palindrome" if $word eq $word.flip && $word.chars > 1;
    }
}
multi sub MAIN('prefixif') {
    for @words -> $word {
        if $word ~~ / [ea?|u|i] rl $ / { say "$word probably rhymes with Perl" };
        if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome" };
    }
}
multi sub MAIN('postfixif_indirect') {
    for @words -> $word {
        say "$word probably rhymes with Perl" if $word ~~ / <perlish> /;
        say "$word is a palindrome" if $word eq $word.flip && $word.chars > 1;
    }
}
multi sub MAIN('prefixif_indirect') {
    for @words -> $word {
        if $word ~~ / <perlish> / { say "$word probably rhymes with Perl" };
        if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome" };
    }
}
multi sub MAIN('shortcut') {
    for @words -> $word {
        if $word.ends-with('rl') && $word ~~ / [ea?|u|i] rl $ / { say "$word probably rhymes with Perl" };
        if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome" };
    }
}
I get the following results:
3x postfixif_indirect: real 1m20.470s
3x prefixif_indirect: real 1m21.970s
3x postfixif: real 0m50.242s
3x prefixif: real 0m49.946s
3x shortcut: real 0m8.077s
The postfixif_indirect code corresponds to your "Inline" if, the prefixif_indirect code corresponds to your "normal" if block. The ones without "_indirect" just have the regex itself in the if statement rather than indirectly called as <perlish>.
As you can see, the speed difference between regular if blocks and postfix if is barely measurable on my machine. But I was also measuring against a different file from yours; mine has 479,828 lines, so you can't directly compare the timings anyway.
However, a quick glance over the profile output from perl6 --profile pointed out that 83% of total time was spent in ACCEPTS (which is the method that implements the smart match operator ~~) or in things called by it.
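(For context, a minimal sketch of that relationship; smartmatching calls ACCEPTS on the right-hand side:)
my $re = / rl $ /;
say "pearl" ~~ $re;         # ｢rl｣
say $re.ACCEPTS("pearl");   # the same call, spelled explicitly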
What tipped me off to the fact that the indirect call to perlish may be expensive was that the time spent inside perlish was only 60%. So about 23% of time was spent doing some sort of setup work before perlish could even start matching against the string. Pretty bad, I admit. Surely, this'll be a good target for optimization.
But the biggest gain was adding a short-circuiting check just to see if the string ends in "rl". This gets our code down to 10% of what it used to take.
Our regex engine surely deserves a whole lot more optimization. Potentially, if a regex can be statically known to only ever match if the target string starts with or ends in a specific substring, it could have a check emitted up-front so that none of the setup work has to be done in the "failure to match" case.
We'll definitely see what 2016 will bring. I'm already excited for sure!
EDIT: Even though I used "for i in `seq 0 100`", that only executed things three times on my machine. I have no clue what's up with that, but I corrected the timing lines to say 3x instead of 100x.

while [[ condition ]] stalls on loop exit

I have a problem with ksh in that a while loop is failing to obey the "while" condition. I should add now that this is ksh88 on my client's Solaris box. (That's a separate problem that can't be addressed in this forum. ;) I have seen Lance's question and some similar but none that I have found seem to address this. (Disclaimer: NO I haven't looked at every ksh question in this forum)
Here's a very cut down piece of code that replicates the problem:
#!/usr/bin/ksh
#
go=1
set -x
tail -0f loop-test.txt | while [[ $go -eq 1 ]]
do
    read lbuff
    set $lbuff
    nwords=$#
    printf "Line has %d words <%s>\n" $nwords "${lbuff}"
    if [[ "${lbuff}" = "0" ]]
    then
        printf "Line consists of %s; time to absquatulate\n" $lbuff
        go=0    # Violate the WHILE condition to get out of loop
    fi
done
printf "\nLooks like I've fallen out of the loop\n"
exit 0
The way I test this is:
Run loop-test.sh in background mode
In a different window I run commands like "echo some nonsense >>loop-test.txt" (w/o the quotes, of course)
When I wish to exit, I type "echo 0 >>loop-test.txt"
What happens? It indeed sets go=0 and displays the line:
Line consists of 0; time to absquatulate
but does not exit the loop. To break out I append one more line to the txt file. The loop does NOT process that line and just falls out of the loop, issuing that "fallen out" message before exiting.
What's going on with this? I don't want to use "break" because in the actual script, the loop is monitoring the log of a database engine and the flag is set when it sees messages that the engine is shutting down. The actual script must still process those final lines before exiting.
Open to ideas, anyone?
Thanks much!
-- J.
OK, that flopped pretty quick. After reading a few other posts, I found an answer given by dogbane that sidesteps my entire pipe-to-while scheme. His is the second answer to a question (from 2013) where I see neeraj is using the same scheme I'm using.
What was wrong? The pipe-to-while has always worked for input that will end, like a file or a command with a distinct end to its output. However, from a tail command, there is no distinct EOF. Hence, the while-in-a-subshell doesn't know when to terminate.
Dogbane's solution: Don't use a pipe. Applying his logic to my situation, the basic loop is:
while read line
do
    # put loop body here
done < <(tail -0f ${logfile})
No subshell, no problem.
Caveat about that syntax: there must be a space between the two < operators; otherwise it looks like a here-document with bad syntax.
Er, one more catch: The syntax did not work in ksh, not even in the mksh (under cygwin) which emulates ksh93. But it did work in bash. So my boss is gonna have a good laugh at me, 'cause he knows I dislike bash.
So thanks MUCH, dogbane.
-- J
After articulating the problem and sleeping on it, the reason for the described behavior came to me: After setting go=0, the control flow of the loop still depends on another line of data coming in from STDIN via that pipe.
And now that I have realized the cause of the weirdness, I can speculate on an alternative way of reading from the stream. For the moment I am thinking of the following solution:
Open the input file as STDIN (need to research the exec syntax for that)
When the condition occurs, close STDIN (again, need to research the syntax for that)
It should then be safe to use the more intuitive while read lbuff at the top of the loop.
I'll test this out today and post the result. I hope someone else can benefit from the method (if it works).

Processing apache logs quickly

I'm currently running an awk script to process a large (8.1GB) access-log file, and it's taking forever to finish. In 20 minutes, it wrote 14MB of the (1000 ± 500)MB I expect it to write, and I wonder if I can process it much faster somehow.
Here is the awk script:
#!/bin/bash
awk '{t=$4" "$5; gsub("[\[\]\/]"," ",t); sub(":"," ",t);printf("%s,",$1);system("date -d \""t"\" +%s");}' $1
EDIT:
For non-awkers, the script reads each line, gets the date information, modifies it to a format the utility date recognizes and calls it to represent the date as the number of seconds since 1970, finally returning it as a line of a .csv file, along with the IP.
Example input: 189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
Returned output: 189.5.56.113,124237889
@OP, your script is slow mainly due to the excessive calls to the system date command for every line in the file, and it's a big file as well (in the GB range). If you have gawk, use its internal mktime() function to do the date-to-epoch-seconds conversion:
awk 'BEGIN{
    m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
    for(o=1;o<=m;o++){
        date[d[o]]=sprintf("%02d",o)
    }
}
{
    gsub(/\[/,"",$4); gsub(":","/",$4); gsub(/\]/,"",$5)
    n=split($4, DATE,"/")
    day=DATE[1]
    mth=DATE[2]
    year=DATE[3]
    hr=DATE[4]
    min=DATE[5]
    sec=DATE[6]
    MKTIME = mktime(year" "date[mth]" "day" "hr" "min" "sec)
    print $1, MKTIME
}' file
output
$ more file
189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
$ ./shell.sh
189.5.56.113 1264110895
If you really really need it to be faster, you can do what I did. I rewrote an Apache log file analyzer using Ragel. Ragel allows you to mix regular expressions with C code. The regular expressions get transformed into very efficient C code and then compiled. Unfortunately, this requires that you are very comfortable writing code in C. I no longer have this analyzer. It processed 1 GB of Apache access logs in 1 or 2 seconds.
You may have limited success removing unnecessary printfs from your awk statement and replacing them with something simpler.
If you are using gawk, you can massage your date and time into a format that mktime (a gawk function) understands. It will give you the same timestamp you're using now and save you the overhead of repeated system() calls.
This little Python script handles ~400MB worth of copies of your example line in about 3 minutes on my machine, producing ~200MB of output (keep in mind your sample line was quite short, so that's a handicap):
import time

src = open('x.log', 'r')
dest = open('x.csv', 'w')
for line in src:
    ip = line[:line.index(' ')]
    date = line[line.index('[') + 1:line.index(']') - 6]
    t = time.mktime(time.strptime(date, '%d/%b/%Y:%X'))
    dest.write(ip)
    dest.write(',')
    dest.write(str(int(t)))
    dest.write('\n')
src.close()
dest.close()
A minor problem is that it doesn't handle timezones (a strptime() limitation), but you could either hardcode that or add a little extra to take care of it.
But to be honest, something as simple as that should be just as easy to rewrite in C.
gawk '{
    dt=substr($4,2,11);
    gsub(/\//," ",dt);
    "date -d \""dt"\" +%s"|getline ts;
    print $1, ts
}' yourfile

Nano hacks: most useful tiny programs you've coded or come across

It's the first great virtue of programmers. All of us have, at one time or another, automated a task with a bit of throw-away code. Sometimes it takes a couple of seconds tapping out a one-liner; sometimes we spend an exorbitant amount of time automating away a two-second task and then never use it again.
What tiny hack have you found useful enough to reuse? To go so far as to make an alias for?
Note: before answering, please check to make sure it's not already on favourite command-line tricks using BASH or perl/ruby one-liner questions.
I found this on dotfiles.org just today. It's very simple, but clever. I felt stupid for not having thought of it myself.
###
### Handy Extract Program
###
extract () {
    if [ -f $1 ] ; then
        case $1 in
            *.tar.bz2)  tar xvjf $1   ;;
            *.tar.gz)   tar xvzf $1   ;;
            *.bz2)      bunzip2 $1    ;;
            *.rar)      unrar x $1    ;;
            *.gz)       gunzip $1     ;;
            *.tar)      tar xvf $1    ;;
            *.tbz2)     tar xvjf $1   ;;
            *.tgz)      tar xvzf $1   ;;
            *.zip)      unzip $1      ;;
            *.Z)        uncompress $1 ;;
            *.7z)       7z x $1       ;;
            *)          echo "'$1' cannot be extracted via >extract<" ;;
        esac
    else
        echo "'$1' is not a valid file"
    fi
}
Here's a filter that puts commas in the middle of any large numbers in standard input.
$ cat ~/bin/comma
#!/usr/bin/perl -p
s/(\d{4,})/commify($1)/ge;
sub commify {
    local $_ = shift;
    1 while s/^([ -+]?\d+)(\d{3})/$1,$2/;
    return $_;
}
I usually wind up using it for long output lists of big numbers, and I tire of counting decimal places. Now instead of seeing
-rw-r--r-- 1 alester alester 2244487404 Oct 6 15:38 listdetail.sql
I can run that as ls -l | comma and see
-rw-r--r-- 1 alester alester 2,244,487,404 Oct 6 15:38 listdetail.sql
This script saved my career!
Quite a few years ago, I was working remotely on a client database. I updated a shipment to change its status. But I forgot the WHERE clause.
I'll never forget the feeling in the pit of my stomach when I saw (6834 rows affected). I basically spent the entire night going through event logs and figuring out the proper status on all those shipments. Crap!
So I wrote a script (originally in awk) that would start a transaction for any updates, and check the rows affected before committing. This prevented any surprises.
So now I never do updates from command line without going through a script like this. Here it is (now in Python):
import sys
import subprocess as sp

pgm = "isql"
if len(sys.argv) == 1:
    print "Usage: \nsql sql-string [rows-affected]"
    sys.exit()
sql_str = sys.argv[1].upper()
max_rows_affected = 3
if len(sys.argv) > 2:
    max_rows_affected = int(sys.argv[2])
if sql_str.startswith("UPDATE"):
    sql_str = "BEGIN TRANSACTION\\n" + sql_str
    p1 = sp.Popen([pgm, sql_str], stdout=sp.PIPE, shell=True)
    (stdout, stderr) = p1.communicate()
    print stdout
    # example -> (33 rows affected)
    affected = stdout.splitlines()[-1]
    affected = affected.split()[0].lstrip('(')
    num_affected = int(affected)
    if num_affected > max_rows_affected:
        print "WARNING! ", num_affected, "rows were affected, rolling back..."
        sql_str = "ROLLBACK TRANSACTION"
        ret_code = sp.call([pgm, sql_str], shell=True)
    else:
        sql_str = "COMMIT TRANSACTION"
        ret_code = sp.call([pgm, sql_str], shell=True)
else:
    ret_code = sp.call([pgm, sql_str], shell=True)
I use this script under assorted linuxes to check whether a directory copy between machines (or to CD/DVD) worked, or whether copying (e.g. ext3 utf8 filenames -> fuseblk) has mangled special characters in the filenames.
#!/bin/bash
## dsum Do checksums recursively over a directory.
## Typical usage: dsum <directory> > outfile
export LC_ALL=C # Optional - use sort order across different locales
if [ $# != 1 ]; then echo "Usage: ${0/*\//} <directory>" 1>&2; exit; fi
cd $1 1>&2 || exit
#findargs=-follow # Uncomment to follow symbolic links
find . $findargs -type f | sort | xargs -d'\n' cksum
Sorry, I don't have the exact code handy, but I coded a regular expression for searching source code in VS.Net that allowed me to search anything not in comments. It came in very useful on a particular project I was working on, where people insisted that commenting out code was good practice, in case you wanted to go back and see what the code used to do.
I have two ruby scripts that I modify regularly to download all of various webcomics. Extremely handy! Note: They require wget, so probably linux. Note2: read these before you try them, they need a little bit of modification for each site.
Date based downloader:
#!/usr/bin/ruby -w
Day = 60 * 60 * 24
Format = "hjlsdahjsd/comics/st%Y%m%d.gif"
t = Time.local(2005, 2, 5)
MWF = [1,3,5]
until t == Time.local(2007, 7, 9)
  if MWF.include? t.wday
    `wget #{t.strftime(Format)}`
    sleep 3
  end
  t += Day
end
Or you can use the number based one:
#!/usr/bin/ruby -w
Format = "http://fdsafdsa/comics/%08d.gif"
1.upto(986) do |i|
  `wget #{sprintf(Format, i)}`
  sleep 1
end
Instead of having to repeatedly open files in SQL Query Analyser and run them, I found the syntax needed to make a batch file, and could then run 100 at once. Oh the sweet sweet joy! I've used this ever since.
isqlw -S servername -d dbname -E -i F:\blah\whatever.sql -o F:\results.txt
This goes back to my COBOL days but I had two generic COBOL programs, one batch and one online (mainframe folks will know what these are). They were shells of a program that could take any set of parameters and/or files and be run, batch or executed in an IMS test region. I had them set up so that depending on the parameters I could access files, databases(DB2 or IMS DB) and or just manipulate working storage or whatever.
It was great because I could test that date function without guessing, or test why there was truncation, or why there was a database ABEND. The programs grew in size as time went on to include all sorts of tests and became a staple of the development group. Everyone knew where the code resided and included them in their unit testing as well. Those programs got so large (most of the code was commented-out tests), and it was all contributed by people through the years. They saved so much time and settled so many disagreements!
I coded a Perl script to map dependencies, without going into an endless loop, for a legacy C program I inherited .... one that also had a diamond dependency problem.
I wrote a small program that e-mailed me when I received e-mails from friends on a rarely used e-mail account.
I wrote another small program that sent me text messages if my home IP changes.
To name a few.
Years ago I built a suite of applications on a custom web application platform in Perl.
One cool feature was to convert SQL query strings into human readable sentences that described what the results were.
The code was relatively short but the end effect was nice.
I've got a little app that you run and it dumps a GUID into the clipboard. You can run it /noui or not. With the UI, it's a single button that drops a new GUID every time you click it. Without the UI, it drops a new one and then exits.
I mostly use it from within VS. I have it as an external app and mapped to a shortcut. I'm writing an app that relies heavily on xaml and guids, so I always find I need to paste a new guid into xaml...
Any time I write a clever list comprehension or use of map/reduce in python. There was one like this:
if reduce(lambda x, c: locks[x] and c, locknames, True):
    print "Sub-threads terminated!"
The reason I remember that is that I came up with it myself, then saw the exact same code on somebody else's website. Nowadays it'd probably be done like:
if all(map(lambda z: locks[z], locknames)):
    print "ya trik"
I've got 20 or 30 of these things lying around, because once I coded up the framework for my standard console app in Windows, I could pretty much drop in any logic I wanted; so I have a lot of little things that solve specific problems.
I guess the one I'm using a lot right now is a console app that takes stdin and colorizes the output based on xml profiles that match regular expressions to colors. I use it for watching my log files from builds. The other one is a command-line launcher so I don't pollute my PATH env var (it would exceed the limit on some systems anyway, namely win2k).
I'm constantly connecting to various linux servers from my own desktop throughout my workday, so I created a few aliases that will launch an xterm on those machines and set the title, background color, and other tweaks:
alias x="xterm" # local
alias xd="ssh -Xf me@development_host xterm -bg aliceblue -ls -sb -bc -geometry 100x30 -title Development"
alias xp="ssh -Xf me@production_host xterm -bg thistle1 ..."
I have a bunch of servers I frequently connect to, as well, but they're all on my local network. This Ruby script prints out the command to create aliases for any machine with ssh open:
#!/usr/bin/env ruby
require 'rubygems'
require 'dnssd'
handle = DNSSD.browse('_ssh._tcp') do |reply|
  print "alias #{reply.name}='ssh #{reply.name}.#{reply.domain}';"
end
sleep 1
handle.stop
Use it like this in your .bash_profile:
eval `ruby ~/.alias_shares`