Using only the last 12 Characters of a GUID / UUID? - filenames

Here is the problem I want to solve:
Let's say I have a collection of 30 000 files (videos, images, sounds, etc...). And let's say I need to move/rename them a lot (including sub and parent folder). Not to mention those files will move from different OS (win, osx, linux(NAS) ) in a near future.
My problem stands when it comes to keep a link references to those files in my personal notes (evernote and blog mostly). Since URLs will broke all the time, I was thinking of adding a GUID/UUID in the file name (only for the file I need to refer to). this way, I could always do a search and find my file no matter where it is.
But GUID are quite a big string (36 characters if I've counted right). I don't need a world global uniqueness for my file. Just enough uniqueness to make a difference between maybe less than 10 000 reference (in my hole life ^^).
So I was thinking of using only the last 12 characters of a GUID string and add it to my file. Keeping the same string in my notes. And in case of collision, that would not be a problem because I know my files well and figuring out witch one is the right wouldn't be a problem.
Could this works?
Could I use an even smaller string ? (like the first 8 characters)
thx for your help.
Best.
William

files_array=( $(ls directory/of/files) )
for i in "${files_array[#]}"
do
var=$(echo $(uuidgen -r | cut -d'-' -f 5))
rn $i $i"$var"
done

Related

How to determine Thousands Separator using Format in VBA

I would like to determine the Thousand Separator used while running a VBA Code on a target machine without resolving to calling system built-in functions such as (Separator = Application.ThousandsSeparator).
I am using the following simple code using 'Format':
ThousandSeparator = Mid(Format(1000, "#,#"), 2, 1)
The above seems to work fine, and would like to confirm if this is a safe method of doing it without resorting to system calls.
I would expect the result to be a single char string in the form of , or . or ' or a Space as applicable to the locale on the machine.
Please note that I want to only use a language statement such as Format or similar (no sys calls). Also this relates to Thousands Separator not Decimal Separator. This article Using VBA to detect which decimal sign the computer is using does not help or answer my question. Thanks
Thanks in advance.
The strict answer to whether it is safe to use Format to get the thousands separator is No.
E.g. on Windows, it is possible to enter up to three characters into the Thousands Separator field in the regional settings in the control panel.
Suppose you enter asd and click OK.
If you now call Format(1000, "#,#") it will give you 1a000. That is only the first letter of your thousands separator. You have failed to retrieve it correctly.
Reading the registry:
? CreateObject("WScript.Shell").RegRead("HKCU\Control Panel\International\sThousand")
you get back asd in full.
To be fair, the Excel international properties do not seem to be of much help either. Application.International(xlThousandsSeparator) in this situation will return the separator originally defined in your computer's locale, not the value you've overridden it to.
Having that said, the practical answer is Yes, because it would appear (and if you happen to know for sure, please post an answer here) that there is no culture with multi-char thousand separator (even in China where scary things like 1ε„„2345δΈ‡6789 or 1ε„„2345萬6789 exist, they happen to be represented with just one UTF-16 character), and you probably are happy to ignore the people who decided to play with their locale settings in that fashion.

CHM/HHP: maximum length of variable names in [ALIAS] section

What is the maximum length of variable names in the [ALIAS] section of HHP files?
I_AM_WONDERING_ABOUT_THE_MAXIMUM_LENGTH_OF_THIS_STRING_RIGHT_HERE=this-is-some-really-helpful-html-file.html
I have found a CHM/HHP specification right here:
https://www-user.tu-chemnitz.de/~heha/viewchm.php/hs/chmspec.chm/hhp.html
That page only talks about the length of the overall line, though (and not about the length of the variable name). Very specific question, I know. Still, someone may be able to point me somewhere.
As far as I know never asked before and I never heard about limitations. But I think this is because nobody used long variable names in this place so far.
The purpose of the two files e.g. alias.h and map.h is to ease the coordination between developer and help author. The mapping file links an ID to the map number - typically this can be easily created by the developer and passed to the help author. Then the help author creates an alias file linking the IDs to the topic names. That was the idea behind years (decades) ago by Ralph Walden (ex Microsoft).
Please note HTMLHelp is about 20 years old and these context ID strings inside a alias.h file were derived from WinHelp as a predecessor of HTMLHelp.
You'll find some further Information at Creating Context-Sensitive Help for Applications.
In general I'd recommend to use ID's with a fixed format because of the better legibility like shown below:
;-------------------------------------------------------------
; alias.h file example for HTMLHelp (CHM)
; www.help-info.de
;
; All IDH's > 10000 for better format
; last edited: 2006-07-09
;---------------------------------------------------
IDH_90001=index.htm
IDH_10000=Context-sensitive_example\contextID-10000.htm
IDH_10010=Context-sensitive_example\contextID-10010.htm
IDH_20000=Context-sensitive_example\contextID-20000.htm
IDH_20010=Context-sensitive_example\contextID-20010.htm
I'd recommend to use less than 1024 bytes per line.

Why does this search for [help/dont-ask] return irrelevant results in DSE?

Why does this ridiculously simple query on data.stackexchange.com return results that don't have [help/dont-ask] in the comment text? I feel like I'm missing something mind-numbingly obvious here.
select top 10 Id, PostId, Text
from comments
where text like '%[help/dont-ask]%'
Results I currently get:
Id PostId Text
-- ------- -----------------------------------
1 35314 not sure why this is getting downvoted -- it is correct! Double check it in your compiler if you don't believe him!
2 35314 Yeah, I didn't believe it until I created a console app - but good lord! Why would they give you the rope to hang yourself! I hated that about VB.NET - the OrElse and AndAlso keywords!
4 35195 I don't see an accepted answer now, I wonder how that got unaccepted. Incidentally, I would have marked an accepted answer based on the answers available at the time. Also, accepted doesn't mean Best :)
9 47239 Jonathan: Wow! Thank you for all of that, you did an amazing amount of work!
10 45651 It will help if you give some details of which database you are using as techniques vary.
12 47428 One of the things that make a url user-friendly is 'discover-ability', meaning you can take a guess at url's simply from the address bar. http://i.love.pets.com/search/cats+dogs could easily lead to http://i.love.pets.com/search/pug+puppies etc
14 47481 I agree, both CodeRush and RefactorPro are visually impressive (most of which can be turned off BTW), but for navigating and refactoring Resharper is much better in my opinion of using both products.
15 47373 Just wanted to mention that this is an excellent solution if you consider the problem to be linear (i.e. treating `A1B2` as a single number). I still think the problem is multi-dimensional, but I guess we'll just have to wait for the author to clarify :)
16 47497 Indeed, the only way to do this is get the server to generate your CSS file which can be done in many ways depending on which language you are using. HttpHandlers are common in C#. You could use jQuery or the likes to add styling to every element with the class 'ourColur' and parametrise your JS
18 47513 This advice goes against the spirit of CSS, which is separation of content and presentation. You way requires changing HTML for presentation sake, and stating in content which elements have same color.
...none of which contains the magic link (or even the text dont-ask).
Because [] delimits a set of characters to find.
You need to escape them.
Or just use CHARINDEX as the search is unsargable anyway.
WHERE CHARINDEX('[help/dont-ask]', text) > 0

Add spaces between words in spaceless string

I'm on OS X, and in objective-c I'm trying to convert
for example,
"Bobateagreenapple"
into
"Bob ate a green apple"
Is there any way to do this efficiently? Would something involving a spell checker work?
EDIT: Just some extra information:
I'm attempting to build something that takes some misformatted text (for example, text copy pasted from old pdfs that end up without spaces, especially from internet archives like JSTOR). Since the misformatted text is probably going to be long... well, I'm just trying to figure out whether this is feasibly possible before I actually attempt to actually write system only to find out it takes 2 hours to fix a paragraph of text.
One possibility, which I will describe this in a non-OS specific manner, is to perform a search through all the possible words that make up the collection of letters.
Basically you chop off the first letter of your letter collection and add it to the current word you are forming. If it makes a word (eg dictionary lookup) then add it to the current sentence. If you manage to use up all the letters in your collection and form words out of all of them, then you have a full sentence. But, you don't have to stop here. Instead, you keep running, and eventually you will produce all possible sentences.
Pseudo-code would look something like this:
FindWords(vector<Sentence> sentences, Sentence s, Word w, Letters l)
{
if (l.empty() and w.empty())
add s to sentences;
return;
if (l.empty())
return;
add first letter from l to w;
if w in dictionary
{
add w to s;
FindWords(sentences, s, empty word, l)
remove w from s
}
FindWords(sentences, s, w, l)
put last letter from w back onto l
}
There are, of course, a number of optimizations you could perform to make it go fast. For instance checking if the word is the stem of any word in the dictionary. But, this is the basic approach that will give you all possible sentences.
Solving this problem is much harder than anything you'll find in a framework. Notice that even in your example, there are other "solutions": "Bob a tea green apple," for one.
A very naive (and not very functional) approach might be to use a spell-checker to try to isolate one "real word" at a time in the string; of course, in this example, that would only work because "Bob" happens to be an English word.
This is not to say that there is no way to accomplish what you want, but the way you phrase this question indicates to me that it might be a lot more complicated than what you're expecting. Maybe someone can give you an acceptable solution, but I bet they'll need to know a lot more about what exactly you're trying to do.
Edit: in response to your edit, it would probably take less effort to run some kind of OCR tool on a PDF and correct its output than it would just to correct what this system might give you, let alone program it
I implemented a solution, the code is avaible on code project:
http://www.codeproject.com/Tips/704003/How-to-add-spaces-between-spaceless-strings
My idea was to prioritize results that use up most of the characters (preferable all of them) then favor the ones with the longest words, because 2,3 or 4 character long words can often come up by chance from leftout characters. Most of the times this provides the correct solution.
To find all possible permutations I used recursion. The code is quite fast even with big dictionaries (tested with 50 000 words).

Controlling Doxygen's LaTeX output for making PDF documentation

I'm using Doxygen to generate documentation for my code. I need to make a PDF version of this and using Doxygen's LaTeX output appears to be the way to do it.
However I've run into a number of annoying problems, and not knowing anything about LaTeX previously haven't really got much of an idea on how to approach them, and the countless references for LaTeX related things are not much help...
I worked out how to create a custom style thing in a sty file and how to get Doxygen to use it. After a lot of searching I found out how to set the page margins etc. through this, and I'm guessing the perhaps this is the file I want for doing the other things I want, but I cant seem to find any commands for doign what I want :(
The table of contents at the start of the document contains a lot of items Id rather it didn't as it makes the contents very long. Is there some way to limit this contents to just say the first two levels, rather than having entries for every single individual function, variable, etc.? Id quite like to keep all the bookmarks however. I did try the "COMPACT_LATEX" option but as well as removing items on the contents pages, it removed the bookmarks and the member lists at the start of each section, which I do really want to keep.
Is there a way to change the order of things, like putting the full class description at the start of the section, rather than after all the members and attributes?
Wow, that's kind of evil of Doxygen.
Okay, to get around the tocdepth counter problem, add the following line to your .sty file:
\AtBeginDocument{\setcounter{tocdepth}{2}}% or whatever level you want
You can set the PDF bookmarks depth to a separate value:
% requires you \usepackage{hyperref} first
\hypersetup{
bookmarksdepth = section, % of whatever level you want
}
Also note that if you have a list of figures/tables, the tocdepth must be at least 2 for them to show up.
I don't see any way of rearranging those items within the LaTeX files---Doxygen just barfs them out there, so we can't do much. You'll have to poke around the Doxygen documentation to see if there's any way to specify the order I guess. (Here's hoping!)
You're so close.
Googling on "latex contents level" brought me to LaTeX - customizing the depth of the table of contents for different parts of the thesis which suggests
\setcounter{tocdepth}{n}
where n starts at zero for only the highest level division. This is presumable defined in all the default styles, but is worth a try in doxygen.
You could write a Perl/Awk script to simply delete the unwanted lines from the table of contents. For the file burble.tex, Latex will generate the file burble.toc, which will contain lines such as:
\contentsline {subsection}{Class F rewrites}{38}
\contentsline {subsection}{Class M rewrites}{39}
\contentsline {section}{\numberline {7}Definition and properties of the translation}{44}
\contentsline {paragraph}{Well-formedness}{54}
Simple regexes will identify which levels each line belongs to, and you can filter the file based on that. Once you have the table of contents the way you want it, insert \nofiles in the appropriate place (the style sheet?), which means that Latex will read the auxiliary files but not overwrite them.