What does www mean in ps auxwww?

I found that ps aux lists the processes that are currently running, and I have seen other people mention ps auxwww. I am wondering what this means and what it does. What's the difference between ps aux and ps auxwww?

To quote the man page (on Mac OS, other systems will vary slightly, but the idea is the same):
-w
Use 132 columns to display information, instead of the default which is your window size. If the -w option is specified more than once, ps will use as many columns as necessary without regard for your window size. When output is not to a terminal, an unlimited number of columns are always used.

So ps auxwww is useful when processes have long command lines that would otherwise be truncated: repeating the w tells ps to use as many columns as it needs.
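A quick way to see the effect, assuming some process with a long command line (a Java service, for example) is running:

# Output to the terminal is truncated at the window width:
ps aux

# With the option repeated, ps uses as many columns as it needs,
# so long command lines are shown in full:
ps auxwww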


Can I measure how much storage the doc_values of a given field take?

First, a little bit of background:
I have a sizeable Elasticsearch cluster with more than 20TB of primary data (and replication is a must). We are using SSDs, so storage is not that cheap. After reindexing my data into Elasticsearch 5.4.1 I noticed it takes about 30% more storage space than it did on Elasticsearch 1.7.5. My guess is that this is caused by doc_values being on by default for the keyword mapping type.
I would like to measure how much storage the doc_values of a particular mapped field take. I know that I could index the documents with two different mappings and measure the difference, but is there an easier and quicker way?
I checked the _cat and indices APIs and couldn't find any detailed storage breakdown, only memory. Maybe I could just find the doc_values files in the file system and measure those?
Because of a particular plugin requirement I'm stuck on Elasticsearch 5.4.1.
edit1:
OK, according to the Lucene docs, doc_values use the following file extensions:
.dvd: DocValues data
.dvm: DocValues metadata
I could measure the size of those files in the folder of a shard of a given index. Here is some example output:
399M ./nodes/0/indices/O0qTaAQHSDOfEMSD-zZXTw/2/index/_qed_Lucene54_0.dvd
646M ./nodes/0/indices/O0qTaAQHSDOfEMSD-zZXTw/2/index/_wux_Lucene54_0.dvd
185M ./nodes/0/indices/O0qTaAQHSDOfEMSD-zZXTw/2/index/_yve_Lucene54_0.dvd
From what I see, some indices have more of those files and some fewer. I'm also not sure whether I can find out which field they belong to; maybe all doc_values are stored together? Both the data and metadata files appear to be binary, so there is nothing I can extract from them directly.
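As a rough workaround I could at least break the numbers down per shard by summing those files under each shard's index directory. A sketch (run from the data directory, as above; the %k format needs GNU find):

# Total KiB of doc_values files (.dvd/.dvm) per shard index directory, largest first.
for shard in ./nodes/0/indices/*/*/index; do
    kb=$(find "$shard" -name '*.dv[dm]' -printf '%k\n' | awk '{s+=$1} END {print s+0}')
    printf '%s\t%s\n' "$kb" "$shard"
done | sort -rn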
edit2:
Taking the naive approach of just measuring files, I checked one of my nodes:
$ du -h ~/elasticsearch/data -d 0
642G /home/chimeo/elasticsearch/data
$ find ~/elasticsearch/data/ -name "*.dv[dm]" -print0 | du -h --files0-from=- --total -s | tail -1
75G total
So that's almost 12% of all storage on that node. Here is a bash one-liner to calculate the ratio:
bc <<< "scale=2; "`find ~/elasticsearch/data/ -name "*.dv[dm]" -print0 | du -h --files0-from=- --total -s | tail -1 | cut -f 1 | tr -d G`/`du -h ~/elasticsearch/data -d 0 | cut -f 1 | tr -d G`
I ran this one-liner on my whole cluster and the results were 0.11-0.12. To my surprise, the field data (.fdt) files take almost 40% of all storage.
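A slightly more general variant of the same idea (again assuming GNU find) groups the totals by Lucene file extension, so the doc_values (.dvd/.dvm) share can be compared with the stored fields (.fdt) and everything else in one pass:

# Sum file sizes (KiB) per extension under the data directory, largest first.
find ~/elasticsearch/data/ -type f -printf '%k %f\n' \
  | awk '{ext=$2; sub(/^.*\./, "", ext); sum[ext]+=$1} END {for (e in sum) print sum[e], e}' \
  | sort -rn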

WHERE clause matches everything if the pattern has the character `s` at the end

I'm trying to run a simple SELECT statement in sqlite3 and getting a strange result. I want to search a column and display all rows that contain the string dockerhosts. But the result shows rows that don't contain dockerhosts at all.
For example, searching for dockerhosts:
sqlite> SELECT command FROM history WHERE command like '%dockerhosts%' ORDER BY id DESC limit 50;
git status
git add --all v1 v2
git status
If I remove the s from the end, I get what I need:
sqlite> SELECT command FROM history WHERE command like '%dockerhost%' ORDER BY id DESC limit 50;
git checkout -b hotfix/collapse-else-if-in-dockerhost
vi opt/dockerhosts/Docker
aws s3 cp dockerhosts.json s3://xxxxx/dockerhosts.json --profile dev
aws s3 cp dockerhosts.json s3://xxxxx/dockerhosts.json --profile dev
history | grep dockerhost | grep prod
history | grep dockerhosts.json
What am I missing?
I see a note here that there are configurable limits for a LIKE pattern - sqlite.org/limits.html ... 10 seems pretty short but maybe that's what you are running into.
The pattern matching algorithm used in the default LIKE and GLOB
implementation of SQLite can exhibit O(N²) performance (where N is the
number of characters in the pattern) for certain pathological cases.
To avoid denial-of-service attacks from miscreants who are able to
specify their own LIKE or GLOB patterns, the length of the LIKE or
GLOB pattern is limited to SQLITE_MAX_LIKE_PATTERN_LENGTH bytes. The
default value of this limit is 50000. A modern workstation can
evaluate even a pathological LIKE or GLOB pattern of 50000 bytes
relatively quickly. The denial of service problem only comes into play
when the pattern length gets into millions of bytes. Nevertheless,
since most useful LIKE or GLOB patterns are at most a few dozen bytes
in length, paranoid application developers may want to reduce this
parameter to something in the range of a few hundred if they know that
external users are able to generate arbitrary patterns.
The maximum length of a LIKE or GLOB pattern can be lowered at
run-time using the
sqlite3_limit(db,SQLITE_LIMIT_LIKE_PATTERN_LENGTH,size) interface.
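If that limit is indeed what you are hitting, you can inspect and change it from the sqlite3 shell too, assuming a reasonably recent CLI that has the .limit dot-command (sqlite3_limit is the equivalent C-level interface mentioned in the quote above):

sqlite> .limit like_pattern_length
sqlite> .limit like_pattern_length 50000

The first form prints the current value for this connection; the second raises it back to the documented default of 50000 bytes.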

Set Cache Limit or Expiration for S3FS

I have the option "-o use_cache=/tmp" set when I mount my S3 bucket. Is there a limit on how much room it will try to use in /tmp? Is there a way to limit that, or otherwise expire items after a certain amount of time?
You could use the (unsupported) sample_delcache.sh script from the s3fs-fuse project and set up a cron job to run it every so often. There'd still be the risk of running out of space (or inodes, as I just did) before the next time the cleanup script runs, but you should be able to dial it in.
These days there is also an ensure_diskfree option to preserve some free space.
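For example, something along these lines (bucket name and mount point are placeholders; the value is in MB) should stop s3fs from filling the cache filesystem completely:

# Keep roughly 10 GB free on the filesystem holding the cache.
s3fs mybucket /mnt/mybucket -o use_cache=/tmp -o ensure_diskfree=10240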
The local cache's growth is apparently unbounded, but it is truly a "cache" (as opposed to what might be called a "working directory") in the sense that it can be safely purged at any time, for example with a cron job that removes files after a certain age by combining find, xargs, and rm.
(xargs isn't strictly necessary, but it avoids issues that can occur when too many files are found to remove in one invocation.)
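A minimal sketch of such a job, assuming the cache lives in a per-bucket subfolder of the use_cache directory (/tmp/mybucket here is just an example) and that anything not read for a week can go:

# Remove cached objects that have not been accessed for 7 days;
# -r (skip empty runs) needs GNU xargs. Run this from cron, e.g. nightly.
find /tmp/mybucket -type f -atime +7 -print0 | xargs -0 -r rm -f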

Find minimal perfect hash function with gperf

I found gperf to be suitable for my project and am now looking for a way to optimize the size of the generated table. Since the switches -i and -j influence the length of the table deterministically, I wrote a small script that iterates over those values and finds the minimal table length. The script stores the -i and -j values of the current minimum, as well as the values currently being tried, so that when it is terminated it can continue its search later.
Now I have seen that there is a switch -m, which is documented to do exactly what I do with my little script. I guess using this switch is a lot faster than calling gperf once per iteration. But to replace my gperf calls I need to know two things, which I couldn't find in the gperf help:
Which values of -i and -j are tried if I use the -m switch?
How do I know which values of -i and -j are actually used, i.e. which values lead to the minimal table length found by the current gperf call?
Which values of -i and -j are tried if I use the -m switch?
You can find this info in the source code, lines 1507..1515.
How do I know which values of -i and -j are actually used, i.e. which values lead to the minimal table length found by the current gperf call?
You don't need to know: these values merely describe the starting point of gperf's internal path through the search space.
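In practice you just tell gperf how hard to search; for example (file names here are placeholders, and larger iteration counts take longer):

# Let gperf search for a smaller table itself; -m sets the number of iterations.
gperf -m 100 keywords.gperf > hash.c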

Sort sets by number of elements in Redis

I have a Redis database with a number of sets, all identified by a common key pattern, let's say "myset:".
Is there a way, from the command line client, to sort all my sets by the number of elements they contain and return that information? The SORT command only takes single keys, as far as I understand.
I know I can do it quite easily with a programming language, but I'd prefer to do it without having to install any driver or programming environment on the server.
Thanks for your help.
No, there is no easy trick to do this.
Redis is a store, not really a database management system. It supports no query language. If you need some data to be retrieved, then you have to anticipate the access paths and design the data structure accordingly.
For instance, in your example, you could maintain a zset while adding/removing items to/from the sets you are interested in. In this zset, the member would be the key of a set, and the score that set's cardinality.
Retrieving the content of the zset by rank will give you the sets sorted by cardinality.
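For instance, with redis-cli, every time you change one of the sets you would also refresh its score in an index zset (myset:index is just a hypothetical key name):

# Add an item to a set, then record the set's new cardinality in the index zset.
redis-cli SADD myset:foo item1
redis-cli ZADD myset:index 3 myset:foo   # 3 = current result of SCARD myset:foo

# Sets sorted by cardinality, largest first:
redis-cli ZREVRANGE myset:index 0 -1 WITHSCORES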
If you did not plan for this access path and still need the data, you will have no choice other than to use a programming language. If you cannot install any Redis driver, then you could work from a Redis dump file (generated by the BGSAVE command), download this file to another box, and use the following package from Sripathi Krishnan to parse it and compute the statistics you require.
https://github.com/sripathikrishnan/redis-rdb-tools
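For reference, a typical invocation looks roughly like this (the dump path is an example; the rdb tool comes with the package above) and produces a CSV with one row per key, including its type and size, which you can then sort offline:

# Parse an RDB dump offline and write per-key statistics to a CSV file.
rdb -c memory /var/redis/dump.rdb -f memory.csv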
Caveat: The approach in this answer is not intended as a general solution -- remember that use of the keys command is discouraged in a production setting.
That said, here's a solution which will output each set name followed by its length (cardinality), sorted by cardinality.
# Capture the names of the keys (sets)
KEYS=$(redis-cli keys 'myset:*')
# Paste each line from the key names with the output of `redis-cli scard key`
# and sort on the second key - the size - in reverse
paste <(echo "$KEYS") <(echo "$KEYS" | sed 's/^/scard /' | redis-cli) | sort -k2 -r -n
Note the use of the paste command above. I count on redis-cli to send me the results in order, which I'm pretty sure it will do. So paste will take one name from the $KEYS and one value from the redis output and output them on a single line.
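If the keys command is still a concern, the same pipeline works with SCAN through redis-cli's --scan option (available in reasonably recent versions of redis-cli):

# Same idea, but enumerating keys with SCAN instead of KEYS,
# which is friendlier to a busy production instance.
KEYS=$(redis-cli --scan --pattern 'myset:*')
paste <(echo "$KEYS") <(echo "$KEYS" | sed 's/^/scard /' | redis-cli) | sort -k2 -r -n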