What is a "gigamem"? - taocp

I am reading Donald Knuth's TAOCP, Volume 4, Fascicle 6, p. 18.
He mentions the word gigamem.
What does he mean? What is a gigamem?

Check the index and glossary if you don't understand a specific term. The glossary entry reads: "Gigamems = billions of memory accesses." Knuth measures the running time of his algorithms in mems, where one mem is one access to memory, so a gigamem is 10^9 mems.
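As a toy illustration (mine, not Knuth's) of counting cost in mems rather than seconds, where every array read is charged one mem:

fn main() {
    let data = [3u64, 1, 4, 1, 5, 9];
    let mut mems: u64 = 0; // memory accesses performed so far
    let mut sum = 0u64;
    for i in 0..data.len() {
        sum += data[i]; // one array read = one mem
        mems += 1;
    }
    // A computation that performed 10^9 such accesses would cost one gigamem.
    println!("sum = {sum}, cost = {mems} mems");
}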


Is there an API to get a full citation (such as a BibTeX or JSON citation) from an arbitrary URL?

Say I have a URL like https://www.science.org/doi/10.1126/science.abb4363, how can I get the full citation as:
@article{doi:10.1126/science.abb4363,
author = {Sergio Almécija and Ashley S. Hammond and Nathan E. Thompson and Kelsey D. Pugh and Salvador Moyà-Solà and David M. Alba },
title = {Fossil apes and human evolution},
journal = {Science},
volume = {372},
number = {6542},
pages = {eabb4363},
year = {2021},
doi = {10.1126/science.abb4363},
URL = {https://www.science.org/doi/abs/10.1126/science.abb4363},
eprint = {https://www.science.org/doi/pdf/10.1126/science.abb4363},
abstract = {There has been much focus on the evolution of primates and especially where and how humans diverged in this process. It has often been suggested that the last common ancestor between humans and other apes, especially our closest relative, the chimpanzee, was ape- or chimp-like. Almécija et al. review this area and conclude that the morphology of fossil apes was varied and that it is likely that the last shared ape ancestor had its own set of traits, different from those of modern humans and modern apes, both of which have been undergoing separate suites of selection pressures. Science, this issue p. eabb4363 A Review describes the unique and varied morphologies in fossil and modern apes, including humans. Humans diverged from apes (chimpanzees, specifically) toward the end of the Miocene ~9.3 million to 6.5 million years ago. Understanding the origins of the human lineage (hominins) requires reconstructing the morphology, behavior, and environment of the chimpanzee-human last common ancestor. Modern hominoids (that is, humans and apes) share multiple features (for example, an orthograde body plan facilitating upright positional behaviors). However, the fossil record indicates that living hominoids constitute narrow representatives of an ancient radiation of more widely distributed, diverse species, none of which exhibit the entire suite of locomotor adaptations present in the extant relatives. Hence, some modern ape similarities might have evolved in parallel in response to similar selection pressures. Current evidence suggests that hominins originated in Africa from Miocene ape ancestors unlike any living species.}}
I was able to download the citation by visiting the link manually, but are there any programmatic APIs to convert a URL (even a Wikipedia URL) into a formal citation? If not, I'm not sure what the recommended approach is for getting these efficiently.
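One route that does exist for DOI-backed URLs is DOI content negotiation: request https://doi.org/<doi> with an Accept: application/x-bibtex header and the resolver returns a BibTeX record for the article. A minimal sketch in Rust, assuming the reqwest crate with its "blocking" feature (the DOI is the one from the question):

use reqwest::blocking::Client;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The DOI resolver forwards the request to the registration agency
    // (Crossref for this journal), which renders the BibTeX entry.
    let bibtex = Client::new()
        .get("https://doi.org/10.1126/science.abb4363")
        .header("Accept", "application/x-bibtex")
        .send()?
        .error_for_status()?
        .text()?;
    println!("{bibtex}");
    Ok(())
}

This only helps where a DOI exists; a plain Wikipedia URL has no DOI to negotiate, so that case generally means scraping the page's citation metadata instead.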

Static data-heavy Rust library seems bloated

I've been developing a Rust library recently to try to provide fast access to a large database (the Unicode character database, which as a flat XML file is 160MB). I also want it to have a small footprint so I've used various approaches to reduce the size. The end result is that I have a series of static slices that look like:
#[derive(Clone, Copy, Eq, PartialEq, Debug)]
pub enum UnicodeCategory {
    UppercaseLetter,
    LowercaseLetter,
    TitlecaseLetter,
    ModifierLetter,
    OtherLetter,
    NonspacingMark,
    SpacingMark,
    EnclosingMark,
    DecimalNumber,
    // ...
}
pub static UCD_CAT: &'static [((u8, u8, u8), (u8, u8, u8), UnicodeCategory)] =
    &[((0, 0, 0), (0, 0, 31), UnicodeCategory::Control),
      ((0, 0, 32), (0, 0, 32), UnicodeCategory::SpaceSeparator),
      ((0, 0, 33), (0, 0, 35), UnicodeCategory::OtherPunctuation),
      /* ... */];
// ...
pub static UCD_DECOMP_MAP: &'static [((u8, u8, u8), &'static [(u8, u8, u8)])] =
    &[((0, 0, 160), &[(0, 0, 32)]),
      ((0, 0, 168), &[(0, 0, 32), (0, 3, 8)]),
      ((0, 0, 170), &[(0, 0, 97)]),
      ((0, 0, 175), &[(0, 0, 32), (0, 3, 4)]),
      ((0, 0, 178), &[(0, 0, 50)]),
      /* ... */];
In total, all the data should take up around 600kB at most (allowing extra space for alignment etc.), but the library produced is 3.3MB in release mode. The source code itself (almost all data) is 2.6MB, so I don't understand why the result would be larger. I don't think the extra size is intrinsic, as the size was <50kB at the beginning of the project (when I only had ~2kB of data). If it makes a difference, I'm also using the #![no_std] attribute.
Is there any reason for the extra binary bloat, and is there a way to reduce the size? In theory I don't see why I shouldn't be able to reduce the library to a megabyte or less.
As per Matthieu's suggestion, I tried analysing the binary with nm.
Because all my tables were represented as borrowed slices, this wasn't very useful for calculating table sizes, since they all appeared as anonymous _refs. What I could determine was the maximum address, 0x1208f8, which would be consistent with a filesize of ~1MB rather than 3.3MB. I also looked through the hex dump to see if there were any null blocks that might explain the difference, but there weren't.
To see if the borrowed slices were the problem, I turned them into plain arrays ([T; N] form). The filesize didn't change much, but now I could interpret the nm data quite easily. Weirdly, the tables took up exactly as much space as I expected (even more weirdly, they matched my lower bounds without accounting for alignment, and there was no space between the tables).
I also looked at the tables with nested borrowed slices, e.g. UCD_DECOMP_MAP above. When I removed all of these (about 2/3 of the data), the filesize was ~1MB when it should have only been ~250kB (by my calculations and the highest nm address, 0x3d1d0), so it doesn't look like these tables were the problem either.
I tried extracting the individual files from the .rlib file (which is a simple ar-format archive). It turns out that 40% of the library is just metadata files, and that the actual object file is 1.9MB. Further, when I do this to the library without the borrowed references the object file is 261kB! I then went back to the original library and looked at the sizes of the individual _refs and found that for a table like UCD_DECOMP_MAP: &'static [((u8,u8,u8),&'static [(u8,u8,u8)])], each value of type ((u8,u8,u8),&'static [(u8,u8,u8)]) takes up 24 bytes (3 bytes for the u8 triplet, 5 bytes of padding and 16 bytes for the pointer), and that as a result these tables take up a lot more room than I would have thought. I think I can now fully account for all the filesize.
Of course, 3MB is still quite small, I just wanted to keep the file as small as possible!
Thanks to Matthieu M. and Chris Emerson for pointing me towards the solution. This is a summary of the updates in the question; sorry for the duplication!
It seems that there are two reasons for the supposed bloat:
The .rlib file that is output is not a pure object file but an ar archive. Usually such an archive would consist entirely of one or more object files, but Rust also includes metadata, partly, it seems, to obviate the need for separate header files. This accounted for around 40% of the final filesize.
My calculations turned out not to be accurate for some of the tables, which also happened to be the largest ones. Using nm I was able to find that for normal tables such as UCD_CAT: &'static [((u8,u8,u8), (u8,u8,u8), UnicodeCategory)], the size was 7 bytes per item (actually less than I originally anticipated, which had assumed 8 bytes for alignment). The total of all these tables was about 230kB, and the object file including just these came in at 260kB (after extraction), so this was all consistent.
However, examining the nm output more closely for the other tables (such as UCD_DECOMP_MAP: &'static [((u8,u8,u8),&'static [(u8,u8,u8)])]) was more difficult because they appear as anonymous borrowed objects. Nevertheless, it turned out that each ((u8,u8,u8),&'static [(u8,u8,u8)]) actually takes up 24 bytes: 3 bytes for the first tuple, 5 bytes of padding, and an unexpected 16 bytes for the reference. The reference is this large because &[T] is a fat pointer that stores the slice's length alongside the data address, making it two words (16 bytes on a 64-bit target). This added around a megabyte of bloat to the library, but does seem to account for the entire filesize.
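A quick way to double-check that 24-byte figure (my sketch, not part of the original investigation) is to ask the compiler directly on a 64-bit target:

fn main() {
    // The u8 triple is 3 bytes with alignment 1.
    assert_eq!(std::mem::size_of::<(u8, u8, u8)>(), 3);
    // A slice reference is a fat pointer: data address + length = 2 * usize.
    assert_eq!(std::mem::size_of::<&'static [(u8, u8, u8)]>(), 16);
    // In the pair, the triple is padded up to the pointer's 8-byte alignment:
    // 3 + 5 (padding) + 16 = 24 bytes per table entry.
    assert_eq!(std::mem::size_of::<((u8, u8, u8), &'static [(u8, u8, u8)])>(), 24);
    println!("all sizes confirmed");
}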

PLC Best Naming Conventions for RSLogix 5000

What good naming conventions do you use for PLC?
I've seen hundreds of projects from different programmers and dozens of company standards; RA and Beckhoff have published their naming conventions in various documents... dozens of different ideas.
For years, naming tags was one of the most difficult tasks for me. You can't imagine the discussion when I ask a student to create a bit. It's as if it were the hardest thing on Earth :) (usually, after creating a_bit and another_bit, inspiration is gone).
I asked about RSLogix 5000 because I find it the most flexible: it has tags, aliases, scoped tags, and descriptions (stored in the CPU in the latest versions).
Do you have any tips to share that you have found suitable for your use?
Tag names should reference the real world. A recent example of mine:
PTK3KOS1
Pressure Transmitter Kettle 3 Kettle Overhead Solvent #1
This is the tag used in the CMMS (maintenance) system and on the P&ID.
I use UDTs in RSLogix 5000, so that becomes the following:
PTK3KOS1.VAL (Current value)
PTK3KOS1.MIN (I use this especially when I use flex I/O for scaling)
PTK3KOS1.MAX (And I also use it to pass min/max values to some HMI's like WW)
PTK3KOS1.LFF (Signal fault)
PTK3KOS1.LLA (Low alarm bit)
PTK3KOS1.LLL (Low Low bit)
PTK3KOS1.LHA (Hi Alarm bit)
PTK3KOS1.LHH (Hi Hi Bit)
PTK3KOS1.SLA (Setpoint low alarm)
PTK3KOS1.SLL (Setpoint low low)
PTK3KOS1.SHA (Setpoint high alarm)
PTK3KOS1.SHH (Setpoint high high)
The most common system is the ISA system, see
http://www.engineeringtoolbox.com/isa-intrumentation-codes-d_415.html for an example.
There is also the KKS system, which I personally believe was designed by masochists; I will only use it when forced to do so.
http://www.vgb.org/en/db_kks_eng.html
I like to use something like this:
aabccdd_eeee_human-readable-name_wirenumber

aa - signal type:
    DO = Digital Output
    DI = Digital Input
    AO = Analog Output
    AI = Analog Input
    gl = global variable
    co = constant
    pt = produced tag
    ct = consumed tag
b - rack number
cc - slot
dd - address (0-64)
eeee - panel/drawing tag
DO10606_MA949_WshLoaderAdvance_9491
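A convention this mechanical can be taken apart programmatically. As a sketch (a hypothetical helper of mine, not anything RSLogix provides), splitting such a tag into its fields in Rust:

#[derive(Debug)]
struct Tag<'a> {
    kind: &'a str,  // aa: DO / DI / AO / AI / gl / co / pt / ct
    rack: &'a str,  // b: rack number
    slot: &'a str,  // cc: slot
    addr: &'a str,  // dd: address 0-64
    panel: &'a str, // eeee: panel/drawing tag
    name: &'a str,  // human-readable name (assumed to contain no underscores)
    wire: &'a str,  // wire number
}

fn parse_tag(tag: &str) -> Option<Tag<'_>> {
    let mut parts = tag.splitn(4, '_');
    let head = parts.next()?;
    if head.len() != 7 || !head.is_ascii() {
        return None; // the aabccdd block is seven ASCII characters
    }
    Some(Tag {
        kind: &head[0..2],
        rack: &head[2..3],
        slot: &head[3..5],
        addr: &head[5..7],
        panel: parts.next()?,
        name: parts.next()?,
        wire: parts.next()?,
    })
}

fn main() {
    // The example from above: digital output, rack 1, slot 06, address 06,
    // panel MA949, wire number 9491.
    println!("{:?}", parse_tag("DO10606_MA949_WshLoaderAdvance_9491"));
}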

Lucene IndexSearcher always returns 20 ScoreDocs

Hello, I have the following piece of code:
IndexSearcher iSearcher = new IndexSearcher(dir);
TopDocs docs = iSearcher.search(parsedQuery, filter, 9);
I always get 20 ScoreDocs back. Could anyone help?
1) Do you always get back exactly 20 ScoreDocs? As your search limits the result to 9 ScoreDocs, I am curious about the '20 ScoreDocs'.
2) Have you verified that your index contains Lucene terms that would result in more than 20 ScoreDocs? I have found it useful after changing my indexing strategy to test the index with Luke before performing any other testing.
Just because docs.scoreDocs.length == 20 doesn't mean that you have received 20 results. You must check that each individual result's doc id is not equal to Integer.MAX_VALUE, which is used as a 'no result here' sentinel value. The point of all this is typical of Lucene: minimizing memory allocation, in this case by reusing already allocated result arrays.
Thank you for your answers. The problem was actually in the build path: it contained a non-existent library, so the project could not be built. That is why, every time I changed the code and tried to debug it, the previously built version was used.
Sorry for misleading you.

Text-based RPG command interpreter

I was just playing a text-based RPG and I got to wondering: how exactly were the command interpreters implemented, and is there a better way to implement something similar now? It would be easy enough to make a ton of if statements, but that seems cumbersome, especially considering that for the most part "pick up the gold" is the same as "pick up gold", which has the same effect as "take gold". I'm sure this is a really in-depth question; I'd just like to know the general idea of how interpreters like that were implemented. Or if there's an open source game with a decent and representative interpreter, that would be perfect.
Answers can be language-independent, but try to keep it to something reasonable, not Prolog or GolfScript or something. I'm not sure exactly what to tag this as.
The usual name for this sort of game is text adventure or interactive fiction, if it is single player, or MUD if it is multiplayer.
There are several special purpose programming languages for writing interactive fiction, such as Inform 6, Inform 7 (an entirely new language that compiles down to Inform 6), TADS, Hugo, and more.
Here's an example of a game in Inform 7 that has a room and an object in the room, and lets you pick up, drop, and otherwise manipulate the object:
"Example Game" by Brian Campbell
The Alley is a room. "You are in a small, dark alley." A bronze key is in the
Alley. "A bronze key lies on the ground."
When played, it produces:
Example Game
An Interactive Fiction by Brian Campbell
Release 1 / Serial number 100823 / Inform 7 build 6E59 (I6/v6.31 lib 6/12N) SD
Alley
You are in a small, dark alley.
A bronze key lies on the ground.
>take key
Taken.
>drop key
Dropped.
>take the key
Taken.
>drop key
Dropped.
>pick up the bronze key
Taken.
>put down the bronze key
Dropped.
>
For the multiplayer games, which tend to have simpler parsers than interactive fiction engines, you can check out a list of MUD servers.
If you would like to write your own parser, you can start by simply checking your input against regular expressions. For instance, in Ruby (as you didn't specify a language):
case input
when /(?:take|pick +up)(?: +(?:the|a))? +(.*)/
  take_command(lookup_name($1)) # $1 is the only capture group: the object name
when /(?:drop|put +down)(?: +(?:the|a))? +(.*)/
  drop_command(lookup_name($1))
end
You may discover that this becomes cumbersome after a while. You could simplify it somewhat using some shorthands to avoid repetition:
OPT_ART = "(?: +(?:the|a))?" # shorthand for an optional article
case input
when /(?:take|pick +up)#{OPT_ART} +(.*)/
  take_command(lookup_name($1))
when /(?:drop|put +down)#{OPT_ART} +(.*)/
  drop_command(lookup_name($1))
end
This may start to get slow if you have a lot of commands, and it checks the input against each command in sequence. You also may find that it still becomes hard to read, and involves some repetition that is difficult to simply extract into shorthands.
At that point, you might want to look into lexers and parsers, a topic much too big for me to do justice to in a reply here. There are many lexer and parser generators that, given a description of a language, will produce a lexer or parser capable of parsing that language; check out the linked articles for some starting points.
As an example of how a parser generator would work, I'll give an example in Treetop, a Ruby based parser generator:
grammar Adventure
  rule command
    take / drop
  end
  rule take
    ('take' / 'pick' space 'up') article? space object {
      def command
        :take
      end
    }
  end
  rule drop
    ('drop' / 'put' space 'down') article? space object {
      def command
        :drop
      end
    }
  end
  rule space
    ' '+
  end
  rule article
    space ('a' / 'the')
  end
  rule object
    [a-zA-Z0-9 ]+
  end
end
Which can be used as follows:
require 'treetop'
Treetop.load 'adventure.tt'
parser = AdventureParser.new
tree = parser.parse('take the key')
tree.command # => :take
tree.object.text_value # => "key"
If by 'text based RPG' you are referring to Interactive Fiction, there are specific programming languages for this. My favorite (the only one I know ;P) is Inform: http://en.wikipedia.org/wiki/Inform
The rec.arts.int-fiction FAQ has further information: http://www.plover.net/~textfire/raiffaq/FAQ.htm