Store an NSString as a fixed length integer? - objective-c

having a bit of trouble finding a solution to this.
I want to take a large ordered text file of words and create - in the same order - a text file of fixed length numeric values.
For example:
Input File Output File
AAA -> 00000001
AAH -> 00002718
AAZ -> 71827651
Initially it seemed a hash function would do the trick. However they are one way. Also perhaps they are a bit "heavyweight" for this. After all, I don't need any cryptography. Plus, it's a reference file. It will never change.
Any compression is a bonus not essential. That said, I don't want the file to get any bigger than it already is. Which is why I don't just want to write out the words as text but with fixed lengths.
So, bottom line; input is a NSString of variable length, output is an integer of fixed length. And, I must be able to take the integer and figure out the string.
Any help much appreciated!
Thanks!
xj

Well, this would be a bit of a brute force method, but here's my guess.
Start by making a custom function to convert one letter of text to an integer less than 100. (I'm not sure if such a function already exists, if so then great!) You might need to just go to stuff like "if ([input isEqual: #"a"]){ return 1;}
Then, run that function on each letter of text, and get the final integer by combining the previous results.
For example:
int myVal1 = [intConverter firstLetter];
int myVal2 = [intConverter secondLetter];
int myVal3 = [intConverter thirdLetter];
int finalValue =100^3 + 100^2*myVal1 + 100*myVal2 + myVal3;
Then, finalValue would be of the form 1(myVal1)(myVal2)(myVal3), which is what I think you're looking for.
To get back the original string, simply use the mod (%) and division functions to get the individual values back, then run the intConverter function backwards. (This would probably mean writing a new function that basically runs those if statements in reverse, but oh well.)
I hope this helps.

Related

Generating Random String of Numbers and Letters Using Go's "testing/quick" Package

I've been breaking my head over this for a few days now and can't seem to be able to figure it out. Perhaps it's glaringly obvious, but I don't seem to be able to spot it. I've read up on all the basics of unicode, UTF-8, UTF-16, normalisation, etc, but to no avail. Hopefully somebody's able to help me out here...
I'm using Go's Value function from the testing/quick package to generate random values for the fields in my data structs, in order to implement the Generator interface for the structs in question. Specifically, given a Metadata struct, I've defined the implementation as follows:
func (m *Metadata) Generate(r *rand.Rand, size int) (value reflect.Value) {
value = reflect.ValueOf(m).Elem()
for i := 0; i < value.NumField(); i++ {
if t, ok := quick.Value(value.Field(i).Type(), r); ok {
value.Field(i).Set(t)
}
}
return
}
Now, in doing so, I'll end up with both the receiver and the return value being set with random generated values of the appropriate type (strings, ints, etc. in the receiver and reflect.Value in the returned reflect.Value).
Now, the implementation for the Value function states that it will return something of type []rune converted to type string. As far as I know, this should allow me to then use the functions in the runes, unicode and norm packages to define a filter which filters out everything which is not part of 'Latin', 'Letter' or 'Number'. I defined the following filter which uses a transform to filter out letters which are not in those character rangetables (as defined in the unicode package):
func runefilter(in reflect.Value) (out reflect.Value) {
out = in // Make sure you return something
if in.Kind() == reflect.String {
instr := in.String()
t := transform.Chain(norm.NFD, runes.Remove(runes.NotIn(rangetable.Merge(unicode.Letter, unicode.Latin, unicode.Number))), norm.NFC)
outstr, _, _ := transform.String(t, instr)
out = reflect.ValueOf(outstr)
}
return
}
Now, I think I've tried just about anything, but I keep ending up with a series of strings which are far from the Latin range, e.g.:
𥗉똿穊
𢷽嚶
秓䝏小𪖹䮋
𪿝ท솲
𡉪䂾
ʋ𥅮ᦸ
堮𡹯憨𥗼𧵕ꥆ
𢝌𐑮𧍛併怃𥊇
鯮
𣏲𝐒
⓿ꐠ槹𬠂黟
𢼭踁퓺𪇖
俇𣄃𔘧
𢝶
𝖸쩈𤫐𢬿詢𬄙
𫱘𨆟𑊙
欓
So, can anybody explain what I'm overlooking here and how I could instead define a transformer which removes/replaces non-letter/number/latin characters so that I can use the Value function as intended (but with a smaller subset of 'random' characters)?
Thanks!
Confusingly the Generate interface needs a function using the type not a the pointer to the type. You want your type signature to look like
func (m Metadata) Generate(r *rand.Rand, size int) (value reflect.Value)
You can play with this here. Note: the most important thing to do in that playground is to switch the type of the generate function from m Metadata to m *Metadata and see that Hi Mom! never prints.
In addition, I think you would be better served using your own type and writing a generate method for that type using a list of all of the characters you want to use. For example:
type LatinString string
const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01233456789"
and then use the generator
func (l LatinString) Generate(rand *rand.Rand, size int) reflect.Value {
var buffer bytes.Buffer
for i := 0; i < size; i++ {
buffer.WriteString(string(latin[rand.Intn(len(latin))]))
}
s := LatinString(buffer.String())
return reflect.ValueOf(s)
}
playground
Edit: also this library is pretty cool, thanks for showing it to me
The answer to my own question is, it seems, a combination of the answers provided in the comments by #nj_ and #jimb and the answer provided by #benjaminkadish.
In short, the answer boils down to:
"Not such a great idea as you thought it was", or "Bit of an ill-posed question"
"You were using the union of 'Letter', 'Latin' and 'Number' (Letter || Number || Latin), instead of the intersection of 'Latin' with the union of 'Letter' and 'Number' ((Letter || Number) && Latin))
Now for the longer version...
The idea behind me using the testing/quick package is that I wanted random data for (fuzzy) testing of my code. In the past, I've always written the code for doing things like that myself, again and again. This meant a lot of the same code across different projects. Now, I could of course written my own package for it, but it turns out that, even better than that, there's actually a standard package which does just about exactly what I want.
Now, it turns out the package does exactly what I want very well. The codepoints in the strings which it generates are actually random and not just restricted to what we're accustomed to using in everyday life. Now, this is of course exactly the thing which you want in doing fuzzy testing in order to test the code with values outside the usual assumptions.
In practice, that means I'm running into two problems:
There's some limits on what I would consider reasonable input for a string. Meaning that, in testing the processing of a Name field or a URL field, I can reasonably assume there's not going to be a value like 'James Mc⌢' (let alone 'James Mc🙁') or 'www.🕸site.com', but just 'James McFrown' and 'www.website.com'. Hence, I can't expect a reasonable system to be able to support it. Of course, things shouldn't completely break down, but it also can't be expected to handle the former examples without any problems.
When I filter the generated string on values which one might consider reasonable, the chance of ending up with a valid string is very small. The set of possible characters in the set used by the testing/quick is just so large (0x10FFFF) and the set of reasonable characters so small, you end up with empty strings most of the time.
So, what do we need to take away from this?
So, whilst I hoped to use the standard testing/quick package to replace my often repeated code to generate random data for fuzzy testing, it does this so well that it provides data outside the range of what I would consider reasonable for the code to be able to handle. It seems that the choice, in the end, is to:
Either be able to actually handle all fuzzy options, meaning that if somebody's name is 'Arnold 💰💰' ('Arnold Moneybags'), it shouldn't go arse over end. Or...
Use custom/derived types with their own Generator. This means you're going to have to use the derived type instead of the basic type throughout the code. (Comparable to defining a string as wchar_t instead of char in C++ and working with those by default.). Or...
Don't use testing/quick for fuzzy testing, because as soon as you run into a generated string value, you can (and should) get a very random string.
As always, further comments are of course welcome, as it's quite possible I overlooked something.

wxWidgets - wxGrid - reading/writing non string cell values

I have a wxGrid to edit an array of numerical data.
I was wondering what's the best way to get non-string data in and out of the cells without going through the string to numeric conversion all the time.
I've used SetCellEditor() to control the data entry.
currently I use this:
// numeric value into cell
str.clear();
str << val1;
m_grid4->SetCellValue(row, col, str);
..
// read value from back into variable
val = atoi(m_grid4->GetCellValue(row, col));
Apart from the fact that atoi() is a bit ugly and a template function with a stringstream would be better, is there a way do get non-string values a bit better in and out of cells?
I was looking at the editors and renderers but can't figure it out.
If you worry about efficiency, you almost certainly should use a custom table class deriving from wxGridTableBase instead of using the default trivial wxGridStringTable implementation which stores everything as strings. Then, and much less importantly, if it makes sense in your case, you can use wxGridCellNumberRenderer which will call your table GetValueAsLong() method instead of GetValue() (which returns a string).
Both of those are demonstrated in wxGrid sample, notably look at BugsGridTable there.
Good luck!

How to change the variable length in Progress?

I'm pretty new to progress and I want to ask a question.
How do I change variable (string) length in runtime?
ex.
define variable cname as char.
define variable clen as int.
cname= "".
DO cnts = 1 TO 5.
IF prc[cnts] <> "" THEN DO:
clen = clen + LENGTH(prc[cnts]).
cname = cname + prc[cnts].
END.
END.
Put cname format '???' at 1. /here change variable length/
Thanks for the reply
If the PUT statement is what you want to change, then
PUT UNFORMATTED cname.
will write the entire string out without having to worry about the length of the FORMAT phrase.
If you need something formatted, then
PUT UNFORMATTED STRING(cname, fill("X", clen)).
will do what you want. Look up the "STRING()" function in the ABL Ref docs.
In Progress 4GL all data is variable length.
This is one of the big differences between Progress and lots of other development environments. (And in my opinion a big advantage.)
Each datatype has a default format, which you can override, but that is only for display purposes.
Display format has no bearing on storage.
You can declare a field with a display format of 3 characters:
define variable x as character no-undo format "x(3)".
And then stuff 60 characters into the field. Progress will not complain.
x = "123456789012345678901234567890123456789012345678901234567890".
It is extremely common for 4gl application code to over-stuff variables and fields.
(If you then use SQL-92 to access the data you will hear much whining and gnashing of teeth from your SQL client. This is easily fixable with the "dbtool" utility.)
You change the display format when you define something:
define variable x as character no-undo format "x(30)".
or when you use it:
put x format "x(15)".
or
display x format "x(43)".
(And in many other ways -- these are just a couple of easy examples.)
Regardless of the display format the length() function will report the actual length of the data.

Convert an alphanumeric string to integer format

I need to store an alphanumeric string in an integer column on one of my models.
I have tried:
#result.each do |i|
hex_id = []
i["id"].split(//).each{|c| hex_id.push(c.hex)}
hex_id = hex_id.join
...
Model.create(:origin_id => hex_id)
...
end
When I run this in the console using puts hex_id in place of the create line, it returns the correct values, however the above code results in the origin_id being set to "2147483647" for every instance. An example string input is "t6gnk3pp86gg4sboh5oin5vr40" so that doesn't make any sense to me.
Can anyone tell me what is going wrong here or suggest a better way to store a string like the aforementioned example as a unique integer?
Thanks.
Answering by request form OP
It seems that the hex_id.join operation does not concatenate strings in this case but instead sums or performs binary complement of the hex values. The issue could also be that hex_id is an array of hex-es rather than a string, or char array. Nevertheless, what seems to happen is reaching the maximum positive value for the integer type 2147483647. Still, I was unable to find any documented effects on array.join applied on a hex array, it appears it is not concatenation of the elements.
On the other hand, the desired result 060003008600401100500050040 is too large to be recorded as an integer either. A better approach would be to keep it as a string, or use different algorithm for producing a number form the original string. Perhaps aggregating the hex values by an arithmetic operation will do better than join ?

What's the easiest approach for a calculator keypad algorithm?

Want to code a key pad for an calculator. What I want to make is:
Keypad with keys from 0 to 9
Special keys: + - * / . =
My conceptual so far:
When a numeric key is pressed, convert it's int value into an string and append that string to the bufferString. That way the input value gets built up. When the user presses . (to make a float value), check if . is already in the bufferString. If it is, ignore that.
But: is that really a good way to go? Or should I do all this number input stuff pure mathematically?
The idea is to convert a infix expression to a postfix expression (Reverse Polish notation) using the the Shunting yard algorithm. Then the postfix expression is easy to resolve.
Why converting from int to string when you could just pass directly a string? Everything else looks ok for me.