Shorten text codes - cryptography

My question is more a cryptographic matter than a programming one.
I have codes of 5 chars each that can be concatenated in pairs. The result is a code of 10 chars.
The problem is that the database field where I must store these values is only 6 chars wide, and I'd prefer not to resize it.
Is there a known method or algorithm that could shorten the pairs of values from 10 chars to 6 chars max? The result can be made of any printable chars (preferably ASCII), and two distinct pairs of codes must never produce the same value.
Another solution might be to shorten the 5-char codes to 3 chars, but the same problem remains: no duplicates are allowed when they are concatenated in pairs.
Thank you for any ideas. I tried several solutions (including Base64 encoding!), but my results are always too long, or they include duplicate values.

This question has nothing to do with cryptography. It should probably be tagged information-theory.
There are 95 printable ASCII characters (space through tilde), so the maximum amount of information that you can store in 6 chars is about 39.4 bits (= 6 × log2(95)). If you spread the same amount of information across 10 characters, each character can carry only about 3.94 bits. That means your codes can use an alphabet of at most 15 characters (e.g., uppercase letters from A to O), since 15^10 ≤ 95^6 but 16^10 > 95^6.
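To make the capacity argument concrete, here is a minimal Python sketch. It assumes the 5-char codes only use the 15 letters A to O (the question does not say what alphabet the real codes use), and it writes the result with the 94 printable ASCII characters other than space, so that no value starts or ends with a blank. The names pack_pair and unpack_pair are made up for illustration.

```python
# Assumed 15-symbol alphabet for the 5-char codes (hypothetical; the real
# codes may use different characters).
CODE_ALPHABET = "ABCDEFGHIJKLMNO"
# 94 printable ASCII characters, '!' (0x21) through '~' (0x7E), space excluded.
OUT_ALPHABET = "".join(chr(c) for c in range(33, 127))

# The packing only works because every 10-char input fits in 6 output chars.
assert len(CODE_ALPHABET) ** 10 <= len(OUT_ALPHABET) ** 6


def pack_pair(first, second):
    """Map a pair of 5-char codes to a unique 6-char printable string."""
    n = 0
    for ch in first + second:          # read the pair as one base-15 number
        n = n * len(CODE_ALPHABET) + CODE_ALPHABET.index(ch)
    out = []
    for _ in range(6):                 # write it back as 6 base-94 digits
        n, r = divmod(n, len(OUT_ALPHABET))
        out.append(OUT_ALPHABET[r])
    return "".join(reversed(out))


def unpack_pair(packed):
    """Inverse of pack_pair: recover the two original 5-char codes."""
    n = 0
    for ch in packed:
        n = n * len(OUT_ALPHABET) + OUT_ALPHABET.index(ch)
    chars = []
    for _ in range(10):
        n, r = divmod(n, len(CODE_ALPHABET))
        chars.append(CODE_ALPHABET[r])
    joined = "".join(reversed(chars))
    return joined[:5], joined[5:]
```

Because the mapping is just a change of number base, it is reversible, so two distinct pairs can never produce the same 6-char value. If the real codes use more than 15 distinct characters, no scheme of this kind can fit every pair into 6 printable ASCII chars.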

Related

How to encode string to unique long?

The server sends alphanumeric IDs for a list of items. At the same time, the RecyclerView adapter's getItemId (required for stable IDs) must return a Long. How can I encode a string as a unique Long?
Short answer: you probably can't.  Not unless the IDs are guaranteed to be short.
A Long uses 8 bytes, so it can hold 2⁶⁴ (about 1.8×10¹⁹) different values.  So it could only represent that number of strings.  (A result of the pigeonhole principle.)
However, if the IDs contain only basic ASCII letters (let's assume upper case) and digits — 36 possibilities — and are 13 characters long, then there are 36¹³ (about 1.7×10²⁰) different strings.  That's an order of magnitude more than 2⁶⁴, so some of them will have to map to the same Long value.
(In fact, each Long would map to about 10 IDs on average — and even more if you include strings with fewer characters, and/or a greater range of characters.)
So unless the range of IDs is limited, you'll have to find another approach.
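If the IDs are guaranteed to be short, a reversible mapping does exist. The sketch below is in Python for brevity and rests on assumptions that are not in the question: IDs use only digits and upper-case letters, and are at most 12 characters long. Each character becomes a digit from 1 to 36 in base 37, so that IDs such as "A" and "0A" get different numbers. The names id_to_long and long_to_id are hypothetical.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed ID alphabet
MAX_LEN = 12  # longest ID whose encoding (at most 37**12 - 1) fits below 2**63


def id_to_long(item_id):
    """Encode a short alphanumeric ID as a unique non-negative 64-bit value."""
    if len(item_id) > MAX_LEN:
        raise ValueError("ID too long to map uniquely into 64 bits")
    n = 0
    for ch in item_id:
        # Digits run from 1 to 36 (never 0), so leading '0' characters count.
        n = n * 37 + ALPHABET.index(ch) + 1
    return n


def long_to_id(n):
    """Decode a value produced by id_to_long back into the original ID."""
    chars = []
    while n > 0:
        n, d = divmod(n, 37)
        chars.append(ALPHABET[d - 1])
    return "".join(reversed(chars))
```

With 13 or more characters the pigeonhole argument above applies again, which is why the sketch rejects longer IDs instead of silently producing collisions.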

Link numbers with an equation/algorithm

I am making an anagram solver in Visual Basic that gives you every possible combination when you enter a string. I need to work out how many combinations there are depending on the amount of characters in the string and how many different characters there are.
E.G.
Sample string:
abc
Total characters: 3, Different Characters: 3
Possible combinations: 6
abc, acb, bac, bca, cab, cba
I need an equation (using the number of characters and different characters) to link this to a string that contains a different amount of characters.
I've been using trial and error to try to figure it out, but I can't quite get my head around it. So far I have:
((letters - 1) ^ (different letters - 1)) + (letters - 1)
which works for a few different letter counts but not for all.
Help please???
I'll lead you to the answer, but I'll try to explain along the way. Let's say you had 10 different letters. You'd have 10 choices for the first, 9 for the second, 8 for the third, etc. Ultimately, there would be 10*9*8*7*6...*2*1 = 10! possibilities. However, sometimes you'll have multiple instances of the same letter. For example, using 6! for the string "aaabcd" would overcount the possibilities, because it treats each of the a's as a distinct letter, even though they're not. To correct for that, you have to divide by the factorial of the number of times each repeated letter occurs. A good way to calculate the total number of possibilities is (total number of letters factorial) / (product of the factorials of the number of repeated instances of each letter).
For example:
There are 6!/(3!) ways to arrange the letters in "aaabcd"
There are 6! ways to arrange the letters in "abcdef"
There are 6!/(3!*2!) ways to arrange the letters in "aaabbc"
There are 10!/(5!*3!*2!) ways to arrange the letters in "aaaaabbbcc"
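The rule above can be sketched in a few lines of Python (the question uses Visual Basic; Python is used here only for brevity, and the function name count_arrangements is made up for illustration):

```python
from collections import Counter
from math import factorial


def count_arrangements(text):
    """Number of distinct orderings of the letters in text."""
    total = factorial(len(text))            # n! if every letter were distinct
    for repeats in Counter(text).values():
        total //= factorial(repeats)        # undo the overcount for each letter
    return total
```

For the four strings listed above this gives 120, 720, 60 and 2520 respectively, and 6 for "abc".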
I hope this helps.
The count you are looking for is exactly the multinomial coefficient.
A simple explanation: with no repeating characters,
it is simply the number of permutations, n!.
(This is easy to see if you draw a tree diagram: the first character has n choices, the second has n-1 choices, etc.)
However, when characters repeat, you will count many arrangements more than once.
Let's look at a simple example: for aaa, how many arrangements are there IF WE COUNT EVERY ORDERING EVEN WHEN THE OUTCOME IS THE SAME?
The answer is 3! (aaa, aaa, aaa, aaa, aaa, aaa).
This shows that when a character appears m times, we count m! arrangements instead of 1.
So the count is just n! (all possible arrangements, including identical outcomes) / m! (for a character that appears m times).
The same holds when more characters repeat: n!/(a!b!c!...) (the first character appears a times, another appears b times, ...).
Once you understand the concept, you will see that the "non-repeating" characters simply contribute a division by 1!. E.g., for the character (multi)set {a,a,a,b,b,c}, #a = 3, #b = 2, #c = 1, so the answer (counting each distinct outcome once) is (3+2+1)!/(3!2!1!). A fraction of this form is called a multinomial coefficient, as stated above.
From a programming point of view, you can just precompute all the factorials with a simple for loop (only for fairly small n, though: 21! already overflows a 64-bit integer):
declare fact = array(n + 1);   // fact[i] holds i!, so indices 0..n are needed
fact[0] = 1;
FOR i = 1; i <= n; i++
    fact[i] = i * fact[i-1];
For larger n, you can instead do the divisions in floating point (double/float) on the fly inside the loop to avoid overflow, though you may run into precision problems.
If you also need to output the distinct strings, you can use DFS with backtracking to enumerate all the possible outcomes. Or, if you can use another language such as C++, you can use a built-in function like next_permutation() after sorting the characters.
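A minimal sketch of the DFS/backtracking idea, written in Python rather than Visual Basic or C++ (the function name distinct_permutations is hypothetical). Keeping a count per letter, instead of a list of positions, is one way to make sure each distinct string is produced exactly once:

```python
from collections import Counter


def distinct_permutations(text):
    """Yield every distinct arrangement of the letters in text, in sorted order."""
    counts = Counter(text)      # how many copies of each letter are still unused
    letters = sorted(counts)
    current = []

    def backtrack():
        if len(current) == len(text):
            yield "".join(current)
            return
        for letter in letters:
            if counts[letter] > 0:
                counts[letter] -= 1          # use one copy of this letter
                current.append(letter)
                yield from backtrack()
                current.pop()                # undo the choice and try the next
                counts[letter] += 1

    yield from backtrack()
```

For "abc" this yields abc, acb, bac, bca, cab, cba, and for "aab" it yields only aab, aba, baa.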

Random string generation using arc4random

I'm trying to create a method that creates a random string consisting of 32 characters. This method will generate a random number using arc4random_uniform(62) to choose a number between 0 and 61 and then choose a character from a string that holds the digits 0 to 9 followed by the lower-case and upper-case letters of the alphabet, respectively. For instance, if arc4random_uniform(62) returns 10, the chosen character will be a; if it returns 61, the chosen character will be Z. The method will do this 32 times to create the final generated string.
I was wondering when this approach will fail to generate a unique string and produce a repeated one. I searched about this topic and didn't find a satisfying answer. I hope that you will help me with this, since I am trying to use this method to generate unique IDs for use in my app.
> This method will generate a random number using arc4random_uniform(62) to choose a number between 0 and 61 and then choose a character from a string that holds the digits 0 to 9 followed by the lower-case and upper-case letters of the alphabet, respectively.
You could create an array with a string for each character you want to include, and randomly pick values. Alternatively, you could take advantage of the fact that ASCII has mostly sequential character positions, so you can fairly easily convert an ASCII number to an NSString.
In ASCII, the integers 48 to 57 are the digits 0-9, 65 to 90 are A-Z, and 97 to 122 are a-z: https://en.wikipedia.org/wiki/Ascii_table#ASCII_printable_code_chart
> I was wondering when this approach will fail to generate a unique string and produce a repeated one. I searched about this topic and didn't find a satisfying answer.
It's often referred to as the "birthday problem". As long as your value is reasonably long (say, 20 characters), it is effectively impossible to have a collision. The world is more likely to be destroyed in the next 2 seconds than your app ever creating a collision.
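To put rough numbers on the birthday problem, here is a small Python sketch (Python rather than Objective-C, purely for the arithmetic). It uses the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n IDs drawn uniformly and independently from N possibilities, and it assumes arc4random_uniform behaves as an ideal uniform source. The function name collision_probability is made up for illustration.

```python
import math


def collision_probability(num_ids, alphabet_size=62, length=32):
    """Approximate chance that at least two of num_ids random strings coincide."""
    space = alphabet_size ** length          # N: number of possible strings
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))
```

For 32 characters from a 62-character alphabet, even a trillion generated IDs leave the estimate far below 10^-30, which is what "effectively impossible" means here.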
> I hope that you will help me with this, since I am trying to use this method to generate unique IDs for use in my app.
Apple provides an API for generating unique IDs. You should use that instead of inventing your own system:
NSString *id = [NSUUID UUID].UUIDString;
That will give you a value like D19B40AA-322C-4ADF-BEF6-2EC4D4CE7BA8. It conforms to "Version 4" of the UUID standard. According to Wikipedia, if you generate 1 billion UUIDs every second for the next 100 years, there is a 50% chance of getting two IDs that are the same.
If the UUID is longer than you want, you could grab a smaller part of the string. Beware that the 4 at the start of the third block marks this as a "version 4" UUID and is not a random value. Also, the first character of the fourth block has only four possible values, so avoid or strip off those two characters if you want to grab a smaller part of the string for use as your random ID. See the Wikipedia page on UUIDs for more detail.

Make unique readable string out of a long integer

I have long integers like this: 5291658276538691055
How could I programmatically convert such a number to a unique combination of only 4-6 capital letters that can also be reversed to get back the number?
For example using OBJ-C.
There are 26 capital letters;
6 of them could represent 26 ^ 6 numbers (308915776);
So, no. You are trying to map a much larger range of numbers into a much smaller range, so the mapping cannot be reversible.
Also, log 5291658276538691055 / log 26 is about 13.2, which is less than 14, so if 14 letters is acceptable, just convert the number to base 26 and map the digits to letters.
And one more thing: if the range of numbers is small enough, you could do some manipulation on the numbers first (e.g., just subtract the minimum) and encode the result, which will cost you fewer digits.
You will need to convert the numbers to Base 26 (Hexavigesimal - snappy name!)
The Wikipedia article on Hexavigesimal gives example code in Java - you should be able to adapt this pretty easily.
NB: You cannot get the long number you mentioned down to 4-6 capital letters only using a conversion algorithm (your example in Base 26 is BCKSATKEBRYBXJ). If you need conversion that short, you only have two options:
Lookup tables (store mappings, e.g. 5291658276538691055 = ABCDEF). Obviously only useful if you have a discrete set of numbers.
Including additional characters (e.g. lower case + numbers).
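The base 26 conversion itself can be sketched as follows, in Python rather than Objective-C or Java. The digit mapping A = 0 through Z = 25 is an assumption made for this sketch, so the letters it produces need not match the example string quoted above; the function names are hypothetical.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed mapping: A = 0, ..., Z = 25


def to_letters(n):
    """Write a non-negative integer in base 26 using capital letters."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return LETTERS[0]
    out = []
    while n > 0:
        n, r = divmod(n, 26)     # peel off the least significant digit
        out.append(LETTERS[r])
    return "".join(reversed(out))


def from_letters(s):
    """Inverse of to_letters: read a capital-letter string as a base 26 number."""
    n = 0
    for ch in s:
        n = n * 26 + LETTERS.index(ch)
    return n
```

For 5291658276538691055 this produces 14 letters, and from_letters recovers the original number from them.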

SQL Server : Taking Numerical Characters and Hashing them under with a max length of 20 characters

Hello, I am trying to find a good way to hash a set of numeric values so that the output is under 20 characters, positive, and unique. Does anyone have any suggestions?
For hashing in general, I'd use the HASHBYTES function. You can then convert the binary data to a string and just keep the first 20 characters; that should still be unique enough for most purposes, although truncation means collisions are not strictly impossible.
To get around the limitations of HASHBYTES (an 8000-byte input limit, for instance), you can hash incrementally: for each value, concatenate the previous hash with the value to be added and hash that again. The result then also depends on the order of the values, and unless a single value is close to 8000 bytes on its own, nothing is truncated before hashing.
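The chaining idea can be illustrated outside SQL Server. The sketch below is a Python analogue using hashlib, not T-SQL and not HASHBYTES itself; the choice of SHA-256, the upper-case hex output and the function name chained_hash are all assumptions made for the illustration.

```python
import hashlib


def chained_hash(values, length=20):
    """Hash a sequence of values incrementally and keep the first `length` hex chars."""
    digest = b""
    for value in values:
        # Previous hash + next value, hashed again: each step's input stays small.
        digest = hashlib.sha256(digest + str(value).encode("utf-8")).digest()
    return digest.hex().upper()[:length]
```

Because each step feeds on the previous digest, the same values in a different order give a different result. Keeping 20 hex characters leaves 80 bits of the hash, so accidental collisions are very unlikely but, as with any truncated hash, not ruled out.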