Is it possible to read in a list of numbers in SML? - input

I'm trying to make a program in SML that will read in a series/list/sequence of numbers from the user, process the numbers, and output the result. I don't know how many numbers the user will input. The program can either read in all the numbers and output the results all together or read and output one at a time. I don't care whether the input is in a separate file or manually input at a console.
What do I need to do to be able to read input?
fun fact x = if x<2 then 1 else x*fact(x-1);
let val keepgoing:bool ref = ref true in
while !keepgoing do
let val num = valOf(TextIO.inputLine TextIO.stdIn) in
print( Int.toString( fact( valOf( Int.fromString( num ) ) ) ) );
keepgoing := (null(explode(num)))
end
end;
Sorry about the convoluted conversions. If you also know an easier way to read in integers, I'd appreciate that, too.

Your logic is just flawed here. You want keepgoing := not (null (explode num)). Right? It works fine for me with that change. You need to implement removal of the final newline (so null explode does what you want) and parsing a line with more than one number, but you basically have the right idea.

Related

Finding best delimiter by size of resulting array after split Kotlin

I am trying to determine the best delimiter for my CSV file. I've seen answers that pick the delimiter producing the largest number of header columns. Instead of the standard method, which would look something like this:
val supportedDelimiters: Array<Char> = arrayOf(',', ';', '|', '\t')
fun determineDelimiter(headerRow: String): Char {
    var headerLength = 0
    var chosenDelimiter = ' '
    supportedDelimiters.forEach {
        if (headerRow.split(it).size > headerLength) {
            headerLength = headerRow.split(it).size
            chosenDelimiter = it
        }
    }
    return chosenDelimiter
}
I've been trying to do it with some in-built Kotlin collections methods like filter or maxOf, but to no avail (the code below does not work).
fun determineDelimiter(headerRow: String): Char {
return supportedDelimiters.filter({a,b -> headerRow.split(a).size < headerRow.split(b)})
}
Is there any way I could do it without forEach?
Edit: The header row could look something like this:
val headerRow = "I;am;delimited;with;'semi,colon'"
I put quotes ('') around an entry that could contain another potential delimiter.
You're mostly there, but this seems simpler than you think!
Here's one answer:
fun determineDelimiter(headerRow: String)
= supportedDelimiters.maxByOrNull{ headerRow.split(it).size } ?: ' '
maxByOrNull() does all the hard work: you just tell it the number of headers that a delimiter would give, and it searches through each delimiter to find which one gives the largest number.
It returns null if the list is empty, so the method above returns a space character, like your standard method. (In this case we know that the list isn't empty, so you could replace the ?: ' ' with !! if you wanted that impossible case to give an error, or you could drop it entirely if you wanted it to give a null which would be handled elsewhere.)
As mentioned in a comment, there's no foolproof way to guess the CSV delimiter in general, and so you should be prepared for it to pick the wrong delimiter occasionally. For example, if the intended delimiter was a semicolon but several headers included commas, it could wrongly pick the comma. Without knowing any more about the data, there's no way around that.
With the code as it stands, there could be multiple delimiters which give the same number of headers; it would simply pick the first. You might want to give an error in that case, and require that there's a unique best delimiter. That would give you a little more confidence that you've picked the right one — though there's still no guarantee. (That's not so easy to code, though…)
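The "require a unique best delimiter" idea from the paragraph above can be sketched roughly as follows. This is only an illustration, written in Go rather than Kotlin; the names uniqueBestDelimiter and errAmbiguous are made up for the example, and treating "no delimiter found at all" as an error is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errAmbiguous is a hypothetical error value for this sketch: it is returned
// when two or more delimiters tie, or when no delimiter splits the row at all.
var errAmbiguous = errors.New("no unique best delimiter")

// uniqueBestDelimiter returns the candidate that splits headerRow into the
// most fields, or errAmbiguous if that maximum is shared by several candidates.
func uniqueBestDelimiter(headerRow string, candidates []rune) (rune, error) {
	best, bestCount, tied := rune(0), 0, false
	for _, d := range candidates {
		n := len(strings.Split(headerRow, string(d)))
		switch {
		case n > bestCount:
			best, bestCount, tied = d, n, false
		case n == bestCount:
			tied = true
		}
	}
	// A count below 2 means the best candidate never occurred in the row.
	if tied || bestCount < 2 {
		return 0, errAmbiguous
	}
	return best, nil
}

func main() {
	candidates := []rune{',', ';', '|', '\t'}
	d, err := uniqueBestDelimiter("I;am;delimited;with;'semi,colon'", candidates)
	fmt.Printf("%q %v\n", d, err)
	_, err = uniqueBestDelimiter("a,b;c", candidates)
	fmt.Println(err)
}
```

A tie is reported as an error instead of silently picking the first candidate, so the caller decides what to do with an ambiguous header.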
Just like gidds said in the comment above, I would advise against choosing the delimiter based on how many times each delimiter appears. You would get the wrong answer for a header row like this:
Type of shoe, regardless of colour, even if black;Size of shoe, regardless of shape
In the above header row, the delimiter is obviously ; but your method would erroneously pick ,.
Another problem is that a header column may itself contain a delimiter, if it is enclosed in quotes. Your method doesn't take any notice of possible quoted columns. For this reason, I would recommend that you give up trying to parse CSV files yourself, and instead use one of the many available Open Source CSV parsers.
Nevertheless, if you still want to know how to pick the delimiter based on its frequency, there are a few optimizations to readability that you can make.
First, note that Kotlin strings are iterable; therefore you don't have to use a List of Char. Use a String instead.
Secondly, all you're doing is counting the number of times a character appears in the string, so there's no need to break the string up into pieces just to do that. Instead, count the number of characters directly.
Third, instead of finding the maximum value by hand, take advantage of what the standard library already offers you.
const val supportedDelimiters = ",;|\t"
fun determineDelimiter(headerRow: String): Char =
supportedDelimiters.maxBy { delimiter -> headerRow.count { it == delimiter } }
fun main() {
val headerRow = "one,two,three;four,five|six|seven"
val chosenDelimiter = determineDelimiter(headerRow)
println(chosenDelimiter) // prints ',' as expected
}

Creating 4 digit number with no repeating elements in Kotlin

Thanks to @RedBassett for this resource (Kotlin problem solving): https://kotlinlang.org/docs/tutorials/koans.html
I'm aware this question exists here:
Creating a 4 digit Random Number using java with no repetition in digits
but I'm new to Kotlin and would like to explore the direct Kotlin features.
So as the title suggests, I'm trying to find a Kotlin-specific way to nicely generate a 4 digit number (after that it's easy to make it adaptable for length x) without repeating digits.
This is my current working solution, and I would like to make it more idiomatic Kotlin. I would be very grateful for some input.
fun createFourDigitNumber(): Int {
var fourDigitNumber = ""
val rangeList = {(0..9).random()}
while(fourDigitNumber.length < 4)
{
val num = rangeList().toString()
if (!fourDigitNumber.contains(num)) fourDigitNumber +=num
}
return fourDigitNumber.toInt()
}
So the range you define (0..9) is actually already a sequence of numbers. Instead of iterating and repeatedly generating a new random, you can just use a subset of that sequence. In fact, this is the accepted answer's solution to the question you linked. Here are some pointers if you want to implement it yourself to get the practice:
The first for loop in that solution is unnecessary in Kotlin because of the range. 0..9 does the same thing, you're on the right track there.
In Kotlin you can call .shuffled() directly on the range without needing to call Collections.shuffle() with an argument like they do.
You can avoid another loop if you create a string from the whole range and then return a substring.
If you want to look at my solution (with input from others in the comments), it is in a spoiler here:
fun getUniqueNumber(length: Int) = (0..9).shuffled().take(length).joinToString("")
(Note that this doesn't gracefully handle a length above 10, but that's up to you to figure out how to implement. It is up to you to use subList() and then toString(), or toString() and then substring(), the output should be the same.)
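For comparison, the same shuffle-and-take idea with an explicit guard for lengths above 10 might look like the sketch below. It is written in Go, not Kotlin, and the function name uniqueDigits and the choice to return an error are assumptions of the example. It returns a string because the first digit may be 0, which an integer would silently drop.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"strings"
)

// uniqueDigits returns `length` distinct decimal digits in random order.
// It rejects lengths outside 0..10 because only ten distinct digits exist.
func uniqueDigits(length int) (string, error) {
	if length < 0 || length > 10 {
		return "", errors.New("length must be between 0 and 10")
	}
	var sb strings.Builder
	// rand.Perm(10) is a random permutation of 0..9; keep the first `length`.
	for _, d := range rand.Perm(10)[:length] {
		sb.WriteByte(byte('0' + d))
	}
	return sb.String(), nil
}

func main() {
	s, err := uniqueDigits(4)
	fmt.Println(s, err)
	_, err = uniqueDigits(11)
	fmt.Println(err)
}
```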

Generating Random String of Numbers and Letters Using Go's "testing/quick" Package

I've been breaking my head over this for a few days now and can't seem to be able to figure it out. Perhaps it's glaringly obvious, but I don't seem to be able to spot it. I've read up on all the basics of unicode, UTF-8, UTF-16, normalisation, etc, but to no avail. Hopefully somebody's able to help me out here...
I'm using Go's Value function from the testing/quick package to generate random values for the fields in my data structs, in order to implement the Generator interface for the structs in question. Specifically, given a Metadata struct, I've defined the implementation as follows:
func (m *Metadata) Generate(r *rand.Rand, size int) (value reflect.Value) {
value = reflect.ValueOf(m).Elem()
for i := 0; i < value.NumField(); i++ {
if t, ok := quick.Value(value.Field(i).Type(), r); ok {
value.Field(i).Set(t)
}
}
return
}
Now, in doing so, I'll end up with both the receiver and the return value being set with random generated values of the appropriate type (strings, ints, etc. in the receiver and reflect.Value in the returned reflect.Value).
Now, the implementation for the Value function states that it will return something of type []rune converted to type string. As far as I know, this should allow me to then use the functions in the runes, unicode and norm packages to define a filter which filters out everything which is not part of 'Latin', 'Letter' or 'Number'. I defined the following filter which uses a transform to filter out letters which are not in those character rangetables (as defined in the unicode package):
func runefilter(in reflect.Value) (out reflect.Value) {
out = in // Make sure you return something
if in.Kind() == reflect.String {
instr := in.String()
t := transform.Chain(norm.NFD, runes.Remove(runes.NotIn(rangetable.Merge(unicode.Letter, unicode.Latin, unicode.Number))), norm.NFC)
outstr, _, _ := transform.String(t, instr)
out = reflect.ValueOf(outstr)
}
return
}
Now, I think I've tried just about anything, but I keep ending up with a series of strings which are far from the Latin range, e.g.:
𥗉똿穊
𢷽嚶
秓䝏小𪖹䮋
𪿝ท솲
𡉪䂾
ʋ𥅮ᦸ
堮𡹯憨𥗼𧵕ꥆ
𢝌𐑮𧍛併怃𥊇
鯮
𣏲𝐒
⓿ꐠ槹𬠂黟
𢼭踁퓺𪇖
俇𣄃𔘧
𢝶
𝖸쩈𤫐𢬿詢𬄙
𫱘𨆟𑊙
欓
So, can anybody explain what I'm overlooking here and how I could instead define a transformer which removes/replaces non-letter/number/latin characters so that I can use the Value function as intended (but with a smaller subset of 'random' characters)?
Thanks!
Confusingly, the Generator interface needs the method to be defined on the type itself, not on a pointer to the type. You want your method signature to look like
func (m Metadata) Generate(r *rand.Rand, size int) (value reflect.Value)
You can play with this here. Note: the most important thing to do in that playground is to switch the type of the generate function from m Metadata to m *Metadata and see that Hi Mom! never prints.
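If the playground is not at hand, the receiver difference can also be checked directly with reflection. This is a minimal sketch; ByValue, ByPointer and implementsGenerator are hypothetical names for the example, and only the method sets matter, not what Generate returns.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"testing/quick"
)

// ByValue declares Generate on the value type.
type ByValue struct{ Name string }

func (ByValue) Generate(r *rand.Rand, size int) reflect.Value {
	return reflect.ValueOf(ByValue{Name: "generated"})
}

// ByPointer declares Generate on the pointer type only.
type ByPointer struct{ Name string }

func (*ByPointer) Generate(r *rand.Rand, size int) reflect.Value {
	return reflect.ValueOf(&ByPointer{Name: "generated"})
}

// generatorType is the reflect.Type of the quick.Generator interface.
var generatorType = reflect.TypeOf((*quick.Generator)(nil)).Elem()

// implementsGenerator reports whether v's dynamic type satisfies quick.Generator.
func implementsGenerator(v interface{}) bool {
	return reflect.TypeOf(v).Implements(generatorType)
}

func main() {
	fmt.Println(implementsGenerator(ByValue{}))    // true
	fmt.Println(implementsGenerator(ByPointer{}))  // false
	fmt.Println(implementsGenerator(&ByPointer{})) // true
}
```

A struct field of type ByPointer (not *ByPointer) therefore does not count as a Generator, which matches the behaviour described above.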
In addition, I think you would be better served using your own type and writing a generate method for that type using a list of all of the characters you want to use. For example:
type LatinString string
const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
and then use the generator
func (l LatinString) Generate(rand *rand.Rand, size int) reflect.Value {
var buffer bytes.Buffer
for i := 0; i < size; i++ {
buffer.WriteString(string(latin[rand.Intn(len(latin))]))
}
s := LatinString(buffer.String())
return reflect.ValueOf(s)
}
playground
Edit: also this library is pretty cool, thanks for showing it to me
The answer to my own question is, it seems, a combination of the answers provided in the comments by @nj_ and @jimb and the answer provided by @benjaminkadish.
In short, the answer boils down to:
"Not such a great idea as you thought it was", or "Bit of an ill-posed question".
"You were using the union of 'Letter', 'Latin' and 'Number' (Letter || Number || Latin), instead of the intersection of 'Latin' with the union of 'Letter' and 'Number' ((Letter || Number) && Latin)."
Now for the longer version...
The idea behind me using the testing/quick package is that I wanted random data for (fuzzy) testing of my code. In the past, I've always written the code for doing things like that myself, again and again. This meant a lot of the same code across different projects. Now, I could of course have written my own package for it, but it turns out that, even better than that, there's actually a standard package which does just about exactly what I want.
Now, it turns out the package does exactly what I want very well. The codepoints in the strings which it generates are actually random and not just restricted to what we're accustomed to using in everyday life. Now, this is of course exactly the thing which you want in doing fuzzy testing in order to test the code with values outside the usual assumptions.
In practice, that means I'm running into two problems:
There's some limits on what I would consider reasonable input for a string. Meaning that, in testing the processing of a Name field or a URL field, I can reasonably assume there's not going to be a value like 'James Mc⌢' (let alone 'James Mc🙁') or 'www.🕸site.com', but just 'James McFrown' and 'www.website.com'. Hence, I can't expect a reasonable system to be able to support it. Of course, things shouldn't completely break down, but it also can't be expected to handle the former examples without any problems.
When I filter the generated string on values which one might consider reasonable, the chance of ending up with a valid string is very small. The set of possible characters in the set used by the testing/quick is just so large (0x10FFFF) and the set of reasonable characters so small, you end up with empty strings most of the time.
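To get a feel for how small the "reasonable" set is, one can simply count it. The sketch below counts Latin-script letters among all code points below 0x10FFFF, on the assumption (taken from the description above) that the generator draws code points roughly uniformly from that range; latinLetterCount is a name made up for the example.

```go
package main

import (
	"fmt"
	"unicode"
)

// latinLetterCount counts the code points in [0, limit) that are letters
// belonging to the Latin script.
func latinLetterCount(limit rune) int {
	n := 0
	for r := rune(0); r < limit; r++ {
		if unicode.Is(unicode.Latin, r) && unicode.IsLetter(r) {
			n++
		}
	}
	return n
}

func main() {
	const limit = 0x10FFFF
	n := latinLetterCount(limit)
	fmt.Printf("%d of %d code points (%.3f%%)\n",
		n, limit, 100*float64(n)/float64(limit))
}
```

Whatever the exact count in a given Unicode version, it is a tiny fraction of the range, so most randomly drawn characters are filtered out and short strings usually end up empty.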
So, what do we need to take away from this?
So, whilst I hoped to use the standard testing/quick package to replace my often repeated code to generate random data for fuzzy testing, it does this so well that it provides data outside the range of what I would consider reasonable for the code to be able to handle. It seems that the choice, in the end, is to:
Either be able to actually handle all fuzzy options, meaning that if somebody's name is 'Arnold 💰💰' ('Arnold Moneybags'), it shouldn't go arse over end. Or...
Use custom/derived types with their own Generator. This means you're going to have to use the derived type instead of the basic type throughout the code. (Comparable to defining a string as wchar_t instead of char in C++ and working with those by default.) Or...
Don't use testing/quick for fuzzy testing, because as soon as you run into a generated string value, you can (and should) get a very random string.
As always, further comments are of course welcome, as it's quite possible I overlooked something.
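For completeness, the intersection from the short answer can be written with the standard library alone, as in the sketch below. Two assumptions to flag: it skips the NFD/NFC normalisation step from the question, and it adds an explicit allowance for ASCII digits, because those belong to the Common script and would otherwise be dropped by the Latin test. The names keepLatinAlnum and filterLatin are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepLatinAlnum reports whether r is a Latin-script letter or number,
// i.e. (Letter || Number) && Latin, plus ASCII digits as an explicit extra.
func keepLatinAlnum(r rune) bool {
	if r >= '0' && r <= '9' {
		return true // ASCII digits are in the Common script, not Latin
	}
	return unicode.Is(unicode.Latin, r) && (unicode.IsLetter(r) || unicode.IsNumber(r))
}

// filterLatin drops every rune for which keepLatinAlnum is false.
func filterLatin(s string) string {
	return strings.Map(func(r rune) rune {
		if keepLatinAlnum(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Println(filterLatin("James Mc🙁 42 ท솲")) // JamesMc42
}
```

Note that spaces and punctuation are removed too, since they are neither letters nor numbers.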

Constructing a recursive compare with SQL

This is an ugly one. I wish I wasn't having to ask this question, but the project is already built such that we are handling heavy loads of validations in the database. Essentially, I'm trying to build a function that will take two stacks of data, weave them together with an unknown batch of operations or comparators, and produce a long string.
Yes, that was phrased very poorly, so I'm going to give an example. I have a form that can have multiple iterations of itself. For some reason, the system wants to know if the entered start date on any of these forms is equal to the entered end date on any of these forms. Unfortunately, due to the way the system is designed, everything is stored as a string, so I have to format it as a date first, before I can compare. Below is pseudo code, so please don't correct me on my syntax
Input data:
'logFormValidation("to_date(#) == to_date(^)"
, formname.control1name, formname.control2name)'
Now, as I mentioned, there are multiple iterations of this form, and I need to build, in a loop, a fully recursive comparison (note: it may not always be typical boolean comparisons, it could be internally called functions as well, so .In or anything like that won't work.) In the end, I need to get it into a format like below so the validation parser can read it.
OR(to_date(formname.control1name.1) == to_date(formname.control2name.1)
,to_date(formname.control1name.2) == to_date(formname.control2name.1)
,to_date(formname.control1name.3) == to_date(formname.control2name.1)
,to_date(formname.control1name.1) == to_date(formname.control2name.2)
:
:
,to_date(formname.control1name.n) == to_date(formname.control2name.n))
Yeah, it's ugly...but given the way our validation parser works, I don't have much of a choice. Any input on how this might be accomplished? I'm hoping for something more efficient than a double recursive loop, but don't have any ideas beyond that
Okay, seeing as my question is apparently terribly unclear, I'm going to add some more info. I don't know what comparison I will be performing on the items, I'm just trying to reformat the data into something useable for ANY given function. If I were to do this outside the database, it'd look something like this. Note: Pseudocode. '#' is the place marker in a function for vals1, '^' is a place marker for vals2.
function dynamicRecursiveValidation(string functionStr, strArray vals1, strArray vals2){
string finalFunction = "OR("
foreach(i in vals1){
foreach(j in vals2){
finalFunction += functionStr.replace('#', i).replace('^', j) + ",";
}
}
finalFunction = finalFunction.substring(0, finalFunction.length - 1); //to remove last comma
finalFunction += ")";
return finalFunction;
}
That is all I'm trying to accomplish. Take any given comparator and two arrays, and create a string that contains every possible combination. Given the substitution characters I listed above, below is a list of possible added operations
# > ^
to_date(#) == to_date(^)
someFunction(#, ^)
# * 2 - 3 <= ^ / 4
All I'm trying to do is produce the string that I will later execute, and I'm trying to do it without having to kill the server in a recursive loop
I don't have a solution code for this but you can algorithmically do the following
Create a temp table (start_date, end_date, form_id) and populate it with every date from any existing form
Get the start_date from the form and simply:
SELECT end_date, form_id FROM temp_table WHERE end_date = <start date to check>
For the reverse
SELECT start_date, form_id FROM temp_table WHERE start_date = <end date to check>
If the database is available why not let it do all the heavy lifting.
I ended up performing a cross product of the data, and looping through the results. It wasn't the sort of solution I really wanted, but it worked.
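Outside the database, the cross product described in the question can be built without any recursion. The following is a rough sketch in Go of the pseudocode above, not the code that was actually used; buildValidation is a hypothetical name, and the loop order is chosen to match the sample output in the question (first control varies fastest).

```go
package main

import (
	"fmt"
	"strings"
)

// buildValidation substitutes every (v1, v2) pair into template, replacing
// '#' with v1 and '^' with v2, and wraps the resulting terms in OR(...).
func buildValidation(template string, vals1, vals2 []string) string {
	terms := make([]string, 0, len(vals1)*len(vals2))
	for _, v2 := range vals2 {
		for _, v1 := range vals1 {
			// A Replacer substitutes both markers in one pass, so a value that
			// itself contains '#' or '^' is not rewritten a second time.
			r := strings.NewReplacer("#", v1, "^", v2)
			terms = append(terms, r.Replace(template))
		}
	}
	// Joining avoids the trailing comma that the pseudocode has to strip.
	return "OR(" + strings.Join(terms, ",") + ")"
}

func main() {
	fmt.Println(buildValidation(
		"to_date(#) == to_date(^)",
		[]string{"formname.control1name.1", "formname.control1name.2"},
		[]string{"formname.control2name.1"},
	))
}
```

The output still has one term per pair, so its size grows with the product of the two list lengths.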

Detecting a sub-string

I have a lot of data that I need to sort through and one of the fields contains both the make/model of the vehicle as well as the reg, sometimes separated by a dash (-) sometimes however it is not. Here is an example of such a string:
VehicleModel - TU69YUP
VehicleModel - TU69 YUP
VehicleModel TU69YUP
VehicleModel TU69 YUP
There are also some other variations but they are the main ones I have encountered. Is there a way that I can reliably go through all of the data and separate the vehicle reg from the model?
The data is currently contained within a Paradox database which I have no problem going through. I do not have a list of all of the vehicle models and names that are contained within the database, likewise, I also do not have a list of the licence plates.
The project is written in Delphi/SQL so I would prefer to stick with either one of these if at all possible.
Trouble ahead
If that field was originally entered by a user in the form that you're now seeing, then we can assume there was no validation, the original program would simply store whatever the user entered. If that's the case, you can't get 100% accuracy: human beings will always make mistakes, intentionally or unintentionally. Expect these kinds of human errors:
Missing fields (ie: registration only, no vehicle information - or the other way around)
Meaningless duplication of words (example: "Ford Ford K - TU 69 YUP")
Missing letters, duplicated letters, extra garbage letters. Example: "For K - T69YUP"
Wrong order of fields
Other small errors you can't even dream of.
Plain garbage that not even a human would make sense of.
You might have guessed I'm a bit pessimistic when dealing with human-entered data straight into text fields. I had the distinct misfortune to deal with a database where all data was text and there was no validation: Can you guess the kind of nonsense people typed in unvalidated date fields that allowed free user input?
The plan
Things aren't as dark as they seem, you can probably "fix" lots of things. The trick here is making sure you only fix data that's unambiguous and let a human sift through the rest of the stuff. The easiest way to do that is to do something like this:
Look at the data that hasn't been automatically fixed yet. Figure out a rule that unambiguously applies to lots of records.
Apply the unambiguous rule.
Repeat until only a few records are left. Those should be fixed by hand, because they resisted all automatic methods that were applied.
The implementation
I strongly recommend using regular expressions for all the tests, because you'll surely end up implementing lots of different tests, and regular expressions can easily "express" the slight variations in search text. For example the following reg-ex can parse all 4 of your examples and give the correct result:
(.*?)(\ {1,3}-\ {1,3})?(\b[A-Z]{2}\ {0,2}[0-9]{2}\ {0,3}[A-Z]{3}\b)
If you've never worked with regular expressions before, that single expression looks unintelligible, but it's in fact very simple. This is not a reg-ex question so I'm not going into any details. I'd rather explain how I've come up with the idea.
First of all, if the text includes vehicle registration numbers, those numbers will be in a very strict format: they'd be easy to match. Given your example I assume all registration numbers are of the form:
LLNNLLL
where "L" is a letter and "N" is a number. My regex is rigid in its interpretation of it: it wants exactly two uppercase letters, followed by a small number of spaces (or no space), followed by exactly two digits, followed by a small number of spaces (or no space), finally followed by exactly 3 uppercase letters. The part of the regex that deals with that is:
[A-Z]{2}\ {0,2}[0-9]{2}\ {0,3}[A-Z]{3}
The rest of the regex makes sure the registration number isn't found embedded into other words, deals with grouping text into capture groups and creates a "lazy capture group" for the VehicleModel.
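As a quick way to try the expression against the four sample strings, here is a small sketch in Go (not Delphi, purely for illustration). The pattern is the one above with the spaces written literally instead of as "\ "; the function name splitModelAndReg is made up, and trimming the model and stripping spaces from the registration are choices of the sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// regPattern is the expression from the text: a lazy model group, an optional
// dash separator, and a strict LLNNLLL registration with optional spaces.
var regPattern = regexp.MustCompile(
	`(.*?)( {1,3}- {1,3})?(\b[A-Z]{2} {0,2}[0-9]{2} {0,3}[A-Z]{3}\b)`)

// splitModelAndReg returns the trimmed model text and the registration with
// its inner spaces removed; ok is false when no registration is found.
func splitModelAndReg(text string) (model, reg string, ok bool) {
	m := regPattern.FindStringSubmatch(text)
	if m == nil {
		return "", "", false
	}
	return strings.TrimSpace(m[1]), strings.ReplaceAll(m[3], " ", ""), true
}

func main() {
	samples := []string{
		"VehicleModel - TU69YUP",
		"VehicleModel - TU69 YUP",
		"VehicleModel TU69YUP",
		"VehicleModel TU69 YUP",
	}
	for _, s := range samples {
		model, reg, ok := splitModelAndReg(s)
		fmt.Printf("%q -> %q %q %v\n", s, model, reg, ok)
	}
}
```

Records for which ok is false are the ones to hand over to the next, looser rule or to a human.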
If I were to implement this myself, I'd probably write a "master" function and a number of simpler "case" functions, each function dealing with one kind of variation in user input. Example:
// This function does a validation of the extracted data. For example it validates the
// Registration number, using other, more precise criteria. The parameters are VAR so the
// function may normalize the results.
function ResultsAreValid(var Make, Registration:string): Boolean;
begin
Result := True; // Only you know what your data looks like and how it can be validated.
end;
// This is a case function that deals with a very rigid interpretation of user data
function VeryStrictInterpretation(const Text:string; out Make, Registration: string): Boolean;
var TestMake, TestReg: string;
// regex engine ...
begin
Result := False;
if (your condition) then
if ResultsAreValid(TestMake, TestReg) then
begin
Make := TestMake;
Registration := TestReg;
Result := True;
end;
end;
// Master function calling many different implementations that each deal with all sorts
// of variations of input. The most strict function should be first:
function MasterTest(const Text:string; out Make, Registration: string): Boolean;
begin
Result := VeryStrictInterpretation(Text, Make, Registration);
if not Result then Result := SomeOtherImplementation(Text, Make, Registration);
if not Result then Result := ThirdInterpretation(Text, Make, Registration);
end;
The idea here is to try to make multiple SIMPLE procedures, that each understands one kind of input in an unambiguous way; And make sure each step doesn't return false positives! And finally don't forget, a human should deal with the last few cases, so don't aim for a fix-it-all solution.
Well, assuming that they are all of the same format: Word[space]Word.
Then you can iterate through them all, and if you encounter a whitespace without a dash, insert a dash. Then split as normal.
Here is a code example.
It will check for the - and also remove possible spaces in the license number.
Note : (as commented by Ken White), if the vehicle contains a space, this will have to be handled as well.
type
EMySplitError = class(Exception);
procedure SplitVehicleAndLicense( s : String; var vehicle,license : String);
var
p : Integer;
begin
vehicle := '';
license := '';
p := Pos('-',s);
if (p = 0) then
begin // No split delimiter
p := Pos(' ',s);
if (p > 0) then
begin
vehicle := Trim(Copy(s,1,p-1));
license := Trim(Copy(s,p+1,Length(s)));
end
else
Raise EMySplitError.CreateFmt('Not a valid vehicle/license name:%s',[s]);
end
else
begin
vehicle := Trim(Copy( s,1,p-1));
license := Trim(Copy( s,p+1,Length(s)));
end;
// Trim spaces in license
repeat
p := Pos(' ',license);
if (p <> 0) then Delete(license,p,1);
until (p = 0);
end;