Find minimal perfect hash function with gperf - optimization

I found gperf to be suitable for my project and am now looking for a way to optimize the size of the generated table. As the switches -i and -j influence the length of the table deterministically, I wrote a small script that iterates over those values and finds the minimal table length. When it is terminated, the script stores the -i and -j values of the current minimum table as well as the values currently being tried, so it can continue its search later.
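For illustration, the brute-force search could look roughly like this (a sketch only; the keyword file name is made up, and the MAX_HASH_VALUE define in the generated code is used as a proxy for the table length):
best=999999
for i in 1 2 3 4 5; do
  for j in 1 3 5 7 9; do    # -j expects an odd jump value
    len=$(gperf -i "$i" -j "$j" keywords.gperf | sed -n 's/^#define MAX_HASH_VALUE //p')
    if [ -n "$len" ] && [ "$len" -lt "$best" ]; then
      best=$len
      echo "best so far: -i $i -j $j (MAX_HASH_VALUE $len)"
    fi
  done
done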
Now I have seen that there is a switch -m, which is documented to do exactly what my little script does. I guess using this switch is a lot faster than calling gperf once per combination. But to replace my gperf calls I need to know two things, which I couldn't find in the gperf help:
Which values of -i and -j are tried if I use the -m switch?
How do I know which values for -i and -j are actually used, i.e. which values lead to the minimal table length found by the current gperf call?

Which values of -i and -j are tried if I use the -m switch?
You can find this information in the source code, lines 1507..1515.
How do I know which values for -i and -j are actually used, i.e. which values lead to the minimal table length found by the current gperf call?
You don't need to know. These values merely describe the starting point of gperf's internal path through the search space.
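In practice you only tell gperf how many search iterations to spend and let it pick the parameters itself; a minimal sketch (the keyword file name is made up):
gperf -m 100 keywords.gperf > hash.c
A larger iteration count means a longer run but a better chance of finding a smaller table.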


What does www mean in ps auxwww?

I found that ps aux lists the processes that are currently running, and I have seen other people mention ps auxwww. I am wondering what that means, or what it does. What's the difference between ps aux and ps auxwww?
To quote the man page (on Mac OS, other systems will vary slightly, but the idea is the same):
-w      Use 132 columns to display information, instead of the default which is your window size. If the -w option is specified more than once, ps will use as many columns as necessary without regard for your window size. When output is not to a terminal, an unlimited number of columns are always used.
So ps auxwww is useful when the rows carry a lot of data (long command lines, for example): because w is given more than once, the columns are not truncated to your window width.
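For example (the grep pattern is only an illustration):
# long command lines may be cut off at the window width
ps aux | grep '[j]ava'
# with w repeated, columns are never truncated
ps auxwww | grep '[j]ava'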

Where clause searches everything if it has character `s` at the end

I'm trying to run a simple select command in sqlite3 and getting a strange result. I want to search a column and display all rows that contain the string dockerhosts. But the result also shows rows without the dockerhosts string in them.
For example, searching for dockerhosts:
sqlite> SELECT command FROM history WHERE command like '%dockerhosts%' ORDER BY id DESC limit 50;
git status
git add --all v1 v2
git status
If I remove the s from the end, I get what I need:
sqlite> SELECT command FROM history WHERE command like '%dockerhost%' ORDER BY id DESC limit 50;
git checkout -b hotfix/collapse-else-if-in-dockerhost
vi opt/dockerhosts/Docker
aws s3 cp dockerhosts.json s3://xxxxx/dockerhosts.json --profile dev
aws s3 cp dockerhosts.json s3://xxxxx/dockerhosts.json --profile dev
history | grep dockerhost | grep prod
history | grep dockerhosts.json
What am I missing?
I see a note here that there are configurable limits for a LIKE pattern (sqlite.org/limits.html). A limit of 10 seems pretty short, but maybe that's what you are running into.
The pattern matching algorithm used in the default LIKE and GLOB implementation of SQLite can exhibit O(N²) performance (where N is the number of characters in the pattern) for certain pathological cases. To avoid denial-of-service attacks from miscreants who are able to specify their own LIKE or GLOB patterns, the length of the LIKE or GLOB pattern is limited to SQLITE_MAX_LIKE_PATTERN_LENGTH bytes. The default value of this limit is 50000. A modern workstation can evaluate even a pathological LIKE or GLOB pattern of 50000 bytes relatively quickly. The denial of service problem only comes into play when the pattern length gets into millions of bytes. Nevertheless, since most useful LIKE or GLOB patterns are at most a few dozen bytes in length, paranoid application developers may want to reduce this parameter to something in the range of a few hundred if they know that external users are able to generate arbitrary patterns.
The maximum length of a LIKE or GLOB pattern can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_LIKE_PATTERN_LENGTH,size) interface.
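If you want to rule out the data itself, a minimal self-contained reproduction can help; a sketch (the table layout is assumed to mirror the history table):
sqlite3 :memory: <<'SQL'
CREATE TABLE history(id INTEGER PRIMARY KEY, command TEXT);
INSERT INTO history(command) VALUES ('git status'), ('vi opt/dockerhosts/Docker');
SELECT command FROM history WHERE command LIKE '%dockerhosts%' ORDER BY id DESC LIMIT 50;
SQL
A pattern within the limit either matches or it doesn't; if the limit were exceeded, SQLite would report an error ("LIKE or GLOB pattern too complex") rather than silently matching every row.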

Sort sets by number of elements in Redis

I have a Redis database with a number of sets, all identified by a common key pattern, let's say "myset:".
Is there a way, from the command-line client, to sort all my sets by the number of elements they contain and return that information? The SORT command only takes single keys, as far as I understand.
I know I can do it quite easily with a programming language, but I prefer to be able to do it without having to install any driver, programming environment and so on on the server.
Thanks for your help.
No, there is no easy trick to do this.
Redis is a store, not really a database management system. It supports no query language. If you need some data to be retrieved, then you have to anticipate the access paths and design the data structure accordingly.
For instance in your example, you could maintain a zset while adding/removing items from the sets you are interested in. In this zset, the value will be the key of the set, and the score the cardinality of the set.
Retrieving the content of the zset by rank will give you the sets sorted by cardinality.
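A sketch with redis-cli (key names are made up): every time a set is modified, mirror its new cardinality into the index zset.
redis-cli SADD myset:users alice
redis-cli ZADD myset:index 1 myset:users
redis-cli SADD myset:users bob
redis-cli ZADD myset:index 2 myset:users
# sets ordered by cardinality, largest first
redis-cli ZREVRANGE myset:index 0 -1 WITHSCORES
In real code you would use SCARD after each modification (or a MULTI/EXEC block) to obtain the score, rather than hard-coding it as done here.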
If you did not plan for this access path and still need the data, you will have no choice but to use a programming language. If you cannot install any Redis driver, then you could work from a Redis dump file (generated with the BGSAVE command), download this file to another box, and use the following package from Sripathi Krishnan to parse it and calculate the statistics you require.
https://github.com/sripathikrishnan/redis-rdb-tools
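Roughly (the dump path is an example, and the rdb flags should be checked against the project README):
redis-cli BGSAVE
scp redis-host:/var/lib/redis/dump.rdb .
rdb -c memory dump.rdb > memory.csv    # the memory report includes an element count per key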
Caveat: The approach in this answer is not intended as a general solution -- remember that use of the KEYS command is discouraged in a production setting.
That said, here's a solution which will output each set name followed by its length (cardinality), sorted by cardinality.
# Capture the names of the keys (sets)
KEYS=$(redis-cli keys 'myset:*')
# Paste each line from the key names with the output of `redis-cli scard key`
# and sort on the second key - the size - in reverse
paste <(echo "$KEYS") <(echo "$KEYS" | sed 's/^/scard /' | redis-cli) | sort -k2 -r -n
Note the use of the paste command above. I count on redis-cli to send back the results in order, which I'm pretty sure it will do. So paste takes one name from $KEYS and one value from the redis-cli output and prints them on a single line.

Get a range of keys with redis?

Something that comes up all the time with our data set at work is needing to query for a bunch of values given a range of keys. Date ranges are an obvious example.
I know you can use unix timestamps and a sorted set to query by date ranges, but it seems annoying, because I'd have to either
put the whole document as the value in the sorted set, or
just put ids in it, then ask redis for each key.
Maybe option 2 is standard? Is there a way to ask redis for multiple keys at once? Like mongodb's $in query? Or perhaps asking for a bunch of keys in a pipeline is just as fast?
Option 2: put the IDs into a sorted set, then use MGET to pull the values out. If your values are stored as hashes, you need to issue multiple HGET/HMGET calls instead, but the advantage is that you can pull out only the specific parts of the object you actually need rather than everything. It is very fast in practice.
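A sketch of that pattern with redis-cli (key names and timestamps are made up):
# index: unix timestamp -> id
redis-cli ZADD events:by-date 1325376000 1001 1328054400 1002
redis-cli SET event:1001 '{"title":"foo"}'
redis-cli SET event:1002 '{"title":"bar"}'
# fetch the ids in a date range, then pull all documents in one MGET
IDS=$(redis-cli ZRANGEBYSCORE events:by-date 1325376000 1330000000)
redis-cli MGET $(echo "$IDS" | sed 's/^/event:/')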
Maybe some bash magic helps?
echo 'keys YOURKEY*' | redis-cli | sed 's/^/get /' | redis-cli
This will output the data from all the keys which begin with YOURKEY

Using GNU Make for Script Building

I have a script (the language is VBScript, but for the sake of the question, it's unimportant) which is used as the basis for a number of other scripts -- call it a "base script" or "wrapper script" for others. I would like to modularize my repository so that this base script can be combined with the functions unique to a specific script instance and then rebuilt later, should either one of the two change.
Example:
baseScript.vbs -- Logging, reporting, and other generic functions.
queryServerFunctions.vbs -- A script with specific, unique tasks (functions) that depend on functions in baseScript.vbs.
I would like to use make to combine the two (or any arbitrary number of files with script fragments) into a single script -- say, queryServer.vbs -- that is entirely self-contained. This self-contained script could then be rebuilt by make anytime either of its source scripts changes.
The question, then, is: Can I use make to manage script builds and, if so, what is the best or preferred way of doing so?
If it matters, my scripting environment is Cygwin running on Windows 7 x64.
The combination of VBScript and GNU make is unusual, so I doubt you'll find a "preferred way" of doing this. It is certainly possible. Taking your example, and adding another script called fooServer.vbs to show how the solution works for multiple scripts, here's a simple makefile:
# list all possible outputs here
OUTPUT := queryServer.vbs fooServer.vbs
# tell make that the default action should be to produce all outputs
.PHONY: all
all: $(OUTPUT)
# generic recipe for combining scripts together
$(OUTPUT):
	cat $+ > $@
# note: the first character on the line above should be a tab, not a space
# now we describe the inputs for each output file
queryServer.vbs: baseScript.vbs queryServerFunctions.vbs
fooServer.vbs: baseScript.vbs fooFunctions.vbs
That will create the two scripts for you from their inputs, and if you touch, for example, queryServerFunctions.vbs, then only queryServer.vbs will be remade.
Why go to all that trouble, though?
The purpose of make is to "rebuild" things efficiently, by comparing file timestamps to judge when a build step can be safely skipped because the inputs have not changed since the last time the output file was updated. The assumption is that build steps are expensive, so it's worth skipping them where possible, and the performance gain is worth the risk of skipping too much due to bugs in the makefile or misleading file timestamps.
For copying a few small files around, I would say a simple batch file like
copy /y baseScript.vbs+queryServerFunctions.vbs queryServer.vbs
copy /y baseScript.vbs+fooFunctions.vbs fooServer.vbs
would be a better fit.