How to use Locality Sensitive Hashing with LSHKIT

I need to use LSHKIT in my program to measure the similarity of some high-dimensional vectors. LSHKIT is an LSH library that can be found here: http://lshkit.sourceforge.net/
I am confused about how to use it. First of all, I could not build it, so I followed section 3.2, "Directly add LSHKIT source to your project".
I put all the source files in one project and fixed the errors, but now I do not know how to compile and run it on the sample data provided on the LSHKIT website.
Could you please help me find out how to call the functions and see the results?
Thanks

Shameless plug: this implementation of Multi-probe LSH is much easier to use than the C++ library. It also implements LSH Forest.
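For anyone who mainly wants to see what an LSH library computes before wiring in LSHKIT, here is a minimal sketch in C of one common scheme, sign random projection (random hyperplane hashing). It is not LSHKIT's API and not Multi-probe LSH. The names make_planes, signature and hamming, the sizes DIM and BITS, and the toy LCG random generator are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIM 8    /* vector dimensionality (illustrative assumption) */
#define BITS 16  /* signature length in bits (illustrative assumption) */

/* Toy linear congruential generator so the hyperplanes are reproducible.
   A real implementation would draw from a proper Gaussian source. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill planes with pseudo-random components in [-1, 1). */
static void make_planes(double planes[BITS][DIM], uint32_t seed) {
    for (int b = 0; b < BITS; b++)
        for (int d = 0; d < DIM; d++)
            planes[b][d] = (double)(lcg_next(&seed) >> 8) / 8388608.0 - 1.0;
}

/* One bit per hyperplane: 1 if v lies on its non-negative side. */
static uint16_t signature(double planes[BITS][DIM], const double *v) {
    assert(v != NULL);
    uint16_t sig = 0;
    for (int b = 0; b < BITS; b++) {
        double dot = 0.0;
        for (int d = 0; d < DIM; d++)
            dot += planes[b][d] * v[d];
        if (dot >= 0.0)
            sig |= (uint16_t)(1u << b);
    }
    return sig;
}

/* Number of differing bits between two signatures. */
static int hamming(uint16_t a, uint16_t b) {
    uint16_t x = a ^ b;
    int n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}
```

To use it, call make_planes once with a fixed seed, compute signature for every vector, and compare signatures with hamming. Vectors that point in similar directions tend to differ in fewer bits, so signatures can serve as bucket keys or as a cheap pre-filter before an exact distance computation.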

What is meant by 'listings for your program'?

I am writing a program in Java for a university project, part of the write up report states:
'You must provide listings for your program'
Can anyone provide me with some clarification on what is meant by this?
I have looked high and wide online, but nothing I've come across has helped clear this up for me. I found a definition, 'With computer programming, a program listing is the complete listing of a computer program, source code, and all files that make up the software program', but this hasn't helped with my understanding of what is being asked in the report.
Should I be providing screen-grabs of my code? Or a screen-grab of the folder with all related files?
Any help would be appreciated, thanks.
A listing of your program used to mean the code of your program rendered into printed form; i.e. on paper. These days, it could also mean that the source code is formatted and included as a PDF file, or a Word document or something else.
"Should I be providing screen-grabs of my code?"
It is unclear if that is what your lecturer wants. I don't expect so, because screenshots are harder to read than formatted text.
"Or a screen-grab of the folder with all related files?"
That is highly unlikely, IMO. If that is what your lecturer wanted they would have said "directory listing" not "listings for your program". (And that would be useless for assessment purposes.)
But my advice is to ask your lecturer if you are at all unclear what is required of you.
And if your lecturer is unwilling to explain, just do what you think is correct.
What you found is correct: you need to provide the source code and any other resources needed to build and run the software.
One option could be to:
- package your project with a build manager (Maven, Gradle, etc.)
- push it to a repository (such as GitHub) with a README.md explaining how to build and run it
- give the GitHub project reference.
If you prefer not to make the code public, just package it and send an archive of the Maven project.
They are looking for a nicely printed output of your source code. In the old days (pre-2000), compilers would produce a well-formatted output, often with a list of symbols and line numbers to aid in understanding the code.

systemverilog module namespaces

I am combining two designs into a single chip design. The RTL code is written in SystemVerilog for synthesis. Unfortunately, the two designs contain a number of modules with identical names but slightly different logic.
Is there a namespace or library capability in SystemVerilog that would allow me to specify different modules with the same name? In other words is there a lib1::module1, lib2::module1 syntax I could use to specify which module I want? How is this sort of module namespace pollution best handled?
Thanks
Look into config and library. See IEEE Std 1800-2017 § 33, "Configuring the contents of a design".
- library maps files to target libraries based on file paths (IEEE Std 1800-2017 § 33.3, "Libraries")
- config selects which library to use for a particular module (globally, per instance, or per subscope) (IEEE Std 1800-2017 § 33.4, "Configurations")
Examples are provided in section 33.8.
Note: some simulators want -libmap <configfile> on the command line. Refer to your simulator's manual.
Unfortunately, neither Verilog nor SystemVerilog provides a comprehensive solution to the namespace problem for design elements (which include modules). V2K libraries and config statements (yes, they were introduced in Verilog V2K) can partially help you solve this issue, for modules only, and only if you plan for it in advance and use the correct methodology to implement it. Not many people try to use V2K libs to solve it.
There are other parts of this problem as well, which you might discover. They include other design elements, macro names, file names, package names, ... SystemVerilog makes it even worse by introducing global scopes.
So, depending on the complexity of your design, you might be able to fix it with V2K libs. But in general, the solution always lies in the methodology and in having those names uniquified upfront. Some companies even use on-the-fly uniquification, automatically rewriting the Verilog code to make those names unique.
You might also be able to solve some issues like that using compilation units, as defined in the SV standard and implemented at least by the major tool vendors.

Does Cmake handle recursive builds correctly

I am currently looking at replacements for the old make system for some projects. One of the alternatives I am currently looking at is cmake. However from what I know so far, CMake prefers to have one configuration file for each directory, similar to what Autotools and others prefer as well.
I know that recursive make should be considered harmful, because the whole dependency graph is not known at all times (see the paper for details). For that reason other tools that rely on recursive make either do not work correctly in all cases or need some wrappers to work around these problems.
I am currently trying to figure out how CMake handles this case, and whether the issues mentioned in the paper were taken into account in its design. I will try out the examples mentioned in the paper, but even if they work, that will not give me certainty that recursive builds work in all cases.
So the main question is: Was this issue taken into consideration for CMake or not? If so, where can I read about their solution?
EDIT:
I found the CMake FAQ entry on this issue, but I am not really satisfied by the answer. I guess the real answer is in there somewhere, but I cannot find it, because I have no knowledge so far of CMake internals and I am not planning on learning them, if I might decide beforehand, that it is not the right tool for my purposes.
CMake does not generate one makefile per CMakeLists.txt. It collects all the information from all of them and then creates something that is unbelievably complex, but presumably written that way exactly so that it's reliable and ensures files are rebuilt if the relevant CMakeLists.txt changes.

What open source tools could help me understand a large legacy application written in C?

I need a tool that I can use to get a better understanding of a large C
project. I'd like to be able to see the relationship between the various C
modules and what calls what, most used functions, what headers are used, etc.
I've searched here and Google but all the source code analysis tools seem to give
you the number of lines of code and other metrics that I'm not interested in. I just
want to get a high level view of how things are structured and interconnected before jumping into the code.
Does anything like this exist?
I've looked at these but they do not seem to do what I want: Source Code Tools
Since posting this I've tried Doxygen and it seems to give me some of what I need. Any others?
Try GNU cflow, that will analyze the call tree of the functions - you will nicely see the call hierarchy of the functions. Or browse the code with Eclipse.
Source Navigator may be helpful for some things (I used it to see call trees). See screenshots.
cxref builds annotated source code cross reference that's easy to view and navigate (I used to create HTML reference of some of my code). See cxref's output on its own source code here. Can be used to document the code.
It is not OSS, but the tool CppDepend can certainly help when it comes to understanding a large legacy application written in C or C++.

Process for reducing the size of an executable

I'm producing a hex file to run on an ARM processor which I want to keep below 32K. It's currently a lot larger than that and I wondered if someone might have some advice on what's the best approach to slim it down?
Here's what I've done so far:
- I ran 'size' on it to determine how big the hex file is.
- I ran 'size' again to see how big each of the object files that link to create the hex file is. It seems the majority of the size comes from external libraries.
- I used 'readelf' to see which functions take up the most memory.
- I searched through the code to see if I could eliminate calls to those functions.
Here's where I get stuck: there are some functions which I don't call directly (e.g. _vfprintf), and I can't find what calls them so that I can remove the calls (as I think I don't need them).
So what are the next steps?
Response to answers:
As far as I can see, there are functions being called which take up a lot of memory. However, I cannot find what is calling them.
I want to omit those functions (if possible) but I can't find what's calling them! Could be called from any number of library functions I guess.
The linker is working as desired, I think, it only includes the relevant library files. How do you know if only the relevant functions are being included? Can you set a flag or something for that?
I'm using GCC
General list:
- Make sure that you have the compiler and linker debug options disabled
- Compile and link with all size options turned on (-Os in gcc)
- Run strip on the executable
- Generate a map file and check your function sizes. You can either get your linker to generate your map file (-M when using ld), or you can use objdump on the final executable (note that this will only work on an unstripped executable!). This won't actually fix the problem, but it will let you know of the worst offenders.
- Use nm to investigate the symbols that are called from each of your object files. This should help in finding who's calling functions that you don't want called.
The original question included a sub-question about including only relevant functions. By default, gcc will include all functions within every object file that is used. To put that another way, if you have an object file that contains 10 functions, all 10 functions are included in your executable even if only 1 is actually called.
The standard libraries (eg. libc) will split functions into many separate object files, which are then archived. The executable is then linked against the archive.
By splitting into many object files, the linker is able to include only the functions that are actually called. (This assumes that you're statically linking.)
There is no reason why you can't do the same trick. Of course, you could argue that if the functions aren't called then you can probably remove them yourself.
If you're statically linking against other libraries you can run the tools listed above over them too to make sure that they're following similar rules.
Another optimization that might save you work is -ffunction-sections, -Wl,--gc-sections, assuming you're using GCC. A good toolchain will not need to be told that, though.
Explanation: GNU ld links sections, and GCC emits one section per translation unit unless you tell it otherwise. But in C++, the nodes in the dependency graph are objects and functions.
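To make the section point concrete, here is a small sketch of a single translation unit with one function that is called and one that is not. The function names and the file names in the comments (demo.c, main.o, demo.map) are hypothetical; the flags are the GCC and GNU ld options named above, plus -Wl,-Map to get a map file for checking the result.

```c
#include <assert.h>
#include <stddef.h>

/* Possible build, assuming GCC with GNU ld on an ELF target:
 *   gcc -Os -ffunction-sections -c demo.c
 *   gcc -Wl,--gc-sections -Wl,-Map=demo.map main.o demo.o -o demo
 * With -ffunction-sections each function below gets its own section
 * (.text.checksum and .text.unused_helper) instead of sharing .text. */

/* Referenced by the program, so its section is kept. */
int checksum(const unsigned char *data, int len) {
    assert(data != NULL || len == 0);  /* compiled out with -DNDEBUG */
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum = (sum + data[i]) & 0xFF;
    return sum;
}

/* Never referenced. Without the flags it is linked in together with
 * checksum; with them, ld is free to discard its section. */
int unused_helper(int x) {
    return x * 3 + 1;
}
```

After linking, searching the map file for unused_helper should show whether its section was discarded. Without the two flags, both functions share one section and are kept or dropped together.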
On deeply embedded projects I always try to avoid using any standard library functions. Even simple functions like "strtol()" blow up the binary size. If possible just simply avoid those calls.
In most deeply embedded projects you don't need a versatile "printf()" or dynamic memory allocation (many controllers have 32 KB or less RAM).
Instead of the full "printf()", I use a very simple custom "printf()" that can only print numbers in hexadecimal or decimal format, nothing more. Most data structures are preallocated at compile time.
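A rough sketch of what such a cut-down formatter might look like is below. It is not the code described above, just an illustration under assumptions: the name mini_snprintf is made up, it writes into a caller-supplied buffer rather than to a UART, and it understands only %d, %x and %%.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Append one character if there is room, always leaving space for '\0'. */
static void put_char(char *buf, size_t cap, size_t *len, char c) {
    if (*len + 1 < cap)
        buf[(*len)++] = c;
}

/* Append an unsigned value in base 10 or 16. */
static void put_unsigned(char *buf, size_t cap, size_t *len,
                         unsigned int value, unsigned int base) {
    char digits[sizeof(unsigned int) * 3 + 1];  /* >= decimal digit count */
    int n = 0;
    do {
        digits[n++] = "0123456789abcdef"[value % base];
        value /= base;
    } while (value != 0);
    while (n > 0)
        put_char(buf, cap, len, digits[--n]);
}

/* Supports only %d, %x and %%. Output is truncated to fit the buffer.
   Returns the number of characters stored, excluding the '\0'. */
size_t mini_snprintf(char *buf, size_t cap, const char *fmt, ...) {
    assert(buf != NULL && cap > 0 && fmt != NULL);
    size_t len = 0;
    va_list args;
    va_start(args, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put_char(buf, cap, &len, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == 'd') {
            int v = va_arg(args, int);
            unsigned int mag = (unsigned int)v;
            if (v < 0) {
                put_char(buf, cap, &len, '-');
                mag = 0u - mag;  /* unsigned negate, safe for INT_MIN */
            }
            put_unsigned(buf, cap, &len, mag, 10u);
        } else if (*fmt == 'x') {
            put_unsigned(buf, cap, &len, va_arg(args, unsigned int), 16u);
        } else if (*fmt == '%') {
            put_char(buf, cap, &len, '%');
        } else if (*fmt == '\0') {
            break;  /* lone '%' at the end of the format string */
        }
        /* Any other specifier is skipped without consuming an argument. */
    }
    va_end(args);
    buf[len] = '\0';
    return len;
}
```

A call such as mini_snprintf(line, sizeof line, "adc=%d status=%x", reading, flags) fills line, which can then be handed to whatever output routine the board provides. Because there is no float, width or string handling, it does not reference the library's vfprintf family at all.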
Andrew EdgeCombe has a great list, but if you really want to scrape every last byte, sstrip is a good tool that is missing from the list and can shave off a few more kB.
For example, when run on strip itself, it can shave off ~2kB.
From an old README (see the comments at the top of this indirect source file):
sstrip is a small utility that removes the contents at the end of an
ELF file that are not part of the program's memory image.
Most ELF executables are built with both a program header table and a
section header table. However, only the former is required in order
for the OS to load, link and execute a program. sstrip attempts to
extract the ELF header, the program header table, and its contents,
leaving everything else in the bit bucket. It can only remove parts of
the file that occur at the end, after the parts to be saved. However,
this almost always includes the section header table, and occasionally
a few random sections that are not used when running a program.
Note that due to some of the information that it removes, a sstrip'd executable is rumoured to have issues with some tools. This is discussed more in the comments of the source.
Also... for an entertaining/crazy read on how to make the smallest possible executable, this article is worth a read.
Just to double-check and document for future reference: do you use Thumb instructions? They're 16-bit versions of the normal instructions. Sometimes you might need two 16-bit instructions in place of one 32-bit instruction, so it won't save a full 50% in code space.
A decent linker should take just the functions needed. However, you might need compiler and linker settings to package functions for individual linking.
Ok, so in the end I just reduced the project to its simplest form, then slowly added files one by one until the function that I wanted to remove appeared in the 'readelf' output. Once I had the file, I commented everything out and slowly added things back in until the function popped up again. So in the end I found out what called it and removed all those calls... Now it works as desired... sweet!
Must be a better way to do it though.
To answer this specific need:
"I want to omit those functions (if possible) but I can't find what's calling them!! Could be called from any number of library functions I guess."
If you want to analyze your code base to see what calls what, which callers a given function has, and things like that, there is a great tool out there called "Understand C" provided by SciTools.
https://scitools.com/
I have used it very often in the past to perform static code analysis. It can really help to determine the library dependency tree. It lets you easily browse up and down the call tree, among other things.
They provide a limited time evaluation, then you must purchase a license.
You could look at something like executable compression.