Why did this hashmap stop working out of nowhere? - kotlin

I have used this HashMap for a few days now with no problems at all. Now I get an error about FloatingDecimal.parseDouble, ReadWrite, FileReadWrite, and a looping error.
The last thing I did to the program was add "$%.2f".format to my product's .second element. It ran a few times, I left to eat, and came back to this!
I was able to narrow it down to the point where it pulls the data from the file and converts it into the hashmap.
Example data in the file: 111,shoes,59.00
val fileName = "src/products.txt"
val products = HashMap<Int, Pair<String, Double>>()
File(fileName).forEachLine {
    val pieces = it.split(",")
    // println(pieces)
    products[pieces[0].toInt()] = Pair(pieces[1].trim(), pieces[2].toDouble())
}

The data in the file was altered when it was written back out, causing the whole program to crash on the next read. I wanted my double to be, for example, 9.99, but I had added a $ sign meant for the front in view only, and it ended up in the file. When the program was looking for a double (9.99), it only had the option of ($9.99), causing the error.
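In case it helps anyone, a minimal sketch of the kind of fix (removePrefix is just one way to tolerate already-prefixed data; keep the stored value numeric and format only for display):

val pieces = "111,shoes,$59.00".split(",")
// strip the display-only "$" so toDouble() sees a plain number either way
val price = pieces[2].trim().removePrefix("$").toDouble()
println("$%.2f".format(price)) // apply the dollar sign in the view only, never in the file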

get every other index in for loop

I have an interesting issue. I have a string that's HTML, and I need to parse a table so that I can get the data I need out of it and present it in a way that looks good on a mobile device. So I use regex and it works just fine, but now I'm porting my code to Kotlin and my solution is not porting over well. Here is what the solution currently looks like:
var pointsParsing = Regex.Matches(htmlBody, "<td.*?>(.*?)</td>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
var pointsSb = new StringBuilder();
for (var i = 0; i < pointsParsing.Count; i += 2)
{
    var pointsTitle = pointsParsing[i].Groups[1].Value.Replace("&amp;", "&");
    var pointsValue = pointsParsing[i + 1].Groups[1].Value;
    pointsSb.Append($"{pointsTitle} {pointsValue} {pointsVerbiage}\n");
}
return pointsSb.ToString();
As you can see, on each pass through the loop I consume two results from the regex search, so I tell the for loop to increment by two to avoid collisions.
However, I don't seem to have this ability in Kotlin. I know how to get the index in a for loop, but I have no idea how to tell it to step by 2 so I don't accidentally reprocess something I already parsed on the previous iteration.
How would I tell the for loop to work the way I need it to in Kotlin?
You might be looking for chunked, which lets you split an iterable into chunks of e.g. 2 elements:
ptsListResults.chunked(2).forEach { data -> // data is a list of (up to) two elements
    val pointsTitle = data[0].groups[1]!!.value
    val pointsValue = data[1].groups[1]!!.value
    // etc
}
So that's more explicit about breaking your list up into meaningful chunks, and operating within the structure of those chunks, rather than manipulating indices.
There's also windowed which is a bit more complex and gives you more options, one of which is disallowing partial windows (i.e. chunks at the end that don't have the required number of elements). Probably doesn't apply here but just so's you know!
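For example, a quick sketch of the difference:

val items = listOf(1, 2, 3, 4, 5)
println(items.chunked(2))                            // [[1, 2], [3, 4], [5]]
println(items.windowed(2, 2))                        // [[1, 2], [3, 4]] - partial window dropped by default
println(items.windowed(2, 2, partialWindows = true)) // [[1, 2], [3, 4], [5]] - same as chunked(2)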
I found a solution that looks to work and thought I'd share.
Thanks to this SO answer, I see how you can skip over the indices.
val pointsListSearch = "<td.*?>(.*?)</td>".toRegex()
val pointsListSearchResults = pointsListSearch.findAll(htmlBody)
val pointsSb = StringBuilder()
val ptsListResults = pointsListSearchResults.toList()
for (i in ptsListResults.indices step 2) {
    // note: ptsListResults[i + 1] will throw IndexOutOfBoundsException if the list
    // has an odd number of matches; chunked(2) above sidesteps that edge case
    val pointsTitle = ptsListResults[i].groups[1]!!.value
    val pointsValue = ptsListResults[i + 1].groups[1]!!.value
    pointsSb.append("${pointsTitle}: ${pointsValue}")
}

Reading a CSV file with 50M lines, how to improve performance

I have a data file in CSV (Comma-Separated-Value) format that has about 50 million lines in it.
Each line is read into a string, parsed, and then used to fill in the fields of an object of type FOO. The object then gets added to a List(Of FOO) that ultimately has 50 million items.
That all works, and fits in memory (at least on an x64 machine), but it's SLOW. It takes about 5 minutes every time I load and parse the file into the list. I would like to make it faster. How can I make it faster?
The important parts of the code are shown below.
Public Sub LoadCsvFile(ByVal FilePath As String)
    Dim s As IO.StreamReader = My.Computer.FileSystem.OpenTextFileReader(FilePath)
    'Find header line
    Dim L As String
    While Not s.EndOfStream
        L = s.ReadLine()
        If L = "" Then Continue While 'discard blank line
        Exit While
    End While
    'Parse data lines
    While Not s.EndOfStream
        L = s.ReadLine()
        If L = "" Then Continue While 'discard blank line
        Dim T As FOO = FOO.FromCSV(L)
        Add(T)
    End While
    s.Close()
End Sub

Public Class FOO
    Public time As Date
    Public ID As UInt64
    Public A As Double
    Public B As Double
    Public C As Double

    Public Shared Function FromCSV(ByVal X As String) As FOO
        Dim T As New FOO
        Dim tokens As String() = X.Split(",")
        If Not DateTime.TryParse(tokens(0), T.time) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid ISO 8601 timestamp")
        End If
        If Not UInt64.TryParse(tokens(1), T.ID) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid ID")
        End If
        If Not Double.TryParse(tokens(2), T.A) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid Format for A")
        End If
        If Not Double.TryParse(tokens(3), T.B) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid Format for B")
        End If
        If Not Double.TryParse(tokens(4), T.C) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid Format for C")
        End If
        Return T
    End Function
End Class
I did some benchmarking and here are the results.
The complete algorithm above took 314 seconds to load the whole file and put the objects into the list.
With the body of FromCSV() reduced to just returning a new object of type FOO with default field values, the whole process took 84 seconds. Therefore it appears that processing the line of text into the object fields is taking 230 seconds (73% of the total time).
Doing everything but parsing the ISO 8601 date string takes 175 seconds. Therefore it appears that processing the date string takes 139 seconds, which is 60% of the text processing time, just for that one field.
Just reading the lines in the file without any processing or object creation takes 41 seconds.
Using StreamReader.ReadBlock to read the whole file in chunks of about 1KB takes 24s, but it's a minor improvement in the grand scheme of things and probably not worth the added complexity: in order to use TryParse I would then need to manually create the temporary strings rather than using String.Split().
At this point the only path I see is to just display status to the user every few seconds so they don't wonder if the program is frozen or something.
UPDATE
I created two new functions. One can save the dataset from memory into a binary file using System.IO.BinaryWriter. The other can load that binary file back into memory using System.IO.BinaryReader. The binary versions were considerably faster than the CSV versions, and the binary files take up much less space.
Here are the benchmark results (same dataset for all tests):
LOAD CSV: 340s
SAVE CSV: 312s
SAVE BIN: 29s
LOAD BIN: 41s
CSV FILE SIZE: 3.86GB
BIN FILE SIZE: 1.63GB
I have a lot of experience with CSV, and the bad news is that you aren't going to be able to make this a whole lot faster. CSV libraries aren't going to be of much assistance here. The difficult problem with CSV, that libraries attempt to handle, is dealing with fields that have embedded commas, or newlines, which require quoting and escaping. Your dataset doesn't have this issue, since none of the columns are strings.
As you have discovered, the bulk of the time is spent in the parse methods. Andrew Morton had a good suggestion: using TryParseExact for DateTime values can be quite a bit faster than TryParse. My own CSV library, Sylvan.Data.Csv (which is the fastest available for .NET), uses an optimization where it parses primitive values directly out of the stream read buffer without converting them to strings first (only when running on .NET core), which can also speed things up a bit. However, I wouldn't expect it to be possible to cut the processing time in half while sticking with CSV.
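For illustration, a minimal sketch of the TryParseExact suggestion in C# (the exact format string is an assumption based on the ISO 8601 timestamps the question describes):

using System;
using System.Globalization;

static class TimestampParser
{
    // hypothetical format string: adjust to match the actual timestamps in the file
    const string Iso8601 = "yyyy-MM-ddTHH:mm:ss";

    public static DateTime Parse(string token)
    {
        // TryParseExact skips format detection, which is where TryParse loses time
        if (!DateTime.TryParseExact(token, Iso8601, CultureInfo.InvariantCulture,
                                    DateTimeStyles.None, out var time))
        {
            throw new FormatException("Invalid ISO 8601 timestamp: " + token);
        }
        return time;
    }
}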
Here is an example of using my library, Sylvan.Data.Csv to process the CSV in C#.
static List<Foo> Read(string file)
{
    // estimate of the average row length based on Andrew Morton's 4GB/50m
    const int AverageRowLength = 80;
    var textReader = File.OpenText(file);
    // specifying the DateFormat will cause TryParseExact to be used.
    var csvOpts = new CsvDataReaderOptions { DateFormat = "yyyy-MM-ddTHH:mm:ss" };
    var csvReader = CsvDataReader.Create(textReader, csvOpts);
    // estimate number of rows to avoid growing the list.
    var estimatedRows = (int)(textReader.BaseStream.Length / AverageRowLength);
    var data = new List<Foo>(estimatedRows);
    while (csvReader.Read())
    {
        if (csvReader.RowFieldCount < 5) continue;
        var item = new Foo()
        {
            time = csvReader.GetDateTime(0),
            ID = csvReader.GetInt64(1),
            A = csvReader.GetDouble(2),
            B = csvReader.GetDouble(3),
            C = csvReader.GetDouble(4)
        };
        data.Add(item);
    }
    return data;
}
I'd expect this to be somewhat faster than your current implementation, so long as you are running on .NET core. Running on .NET framework, the difference, if any, wouldn't be significant. However, I don't expect this to be acceptably fast for your users; it will still likely take tens of seconds, or minutes, to read the whole file.
Given that, my advice would be to abandon CSV altogether, which means you can abandon parsing, which is what is slowing things down. Instead, read and write the data in binary form. Your data records have a nice property, in that they are fixed width: each record contains 5 fields that are 8 bytes (64 bits) wide, so each record requires exactly 40 bytes in binary form. 50m x 40 = 2GB. So, assuming Andrew Morton's estimate of 4GB for the CSV is correct, moving to binary will halve the storage needs. Immediately, that means there is half as much disk IO needed to read the same data. But beyond that, you won't need to parse anything; the binary representation of each value will essentially be copied directly to memory.
Here are some examples of how to do this in C# (don't know VB very well, sorry).
static List<Foo> Read(string file)
{
    using var stream = File.OpenRead(file);
    // the exact number of records can be determined by looking at the length of the file.
    var recordCount = (int)(stream.Length / 40);
    var data = new List<Foo>(recordCount);
    using var br = new BinaryReader(stream);
    for (int i = 0; i < recordCount; i++)
    {
        var ticks = br.ReadInt64();
        var id = br.ReadUInt64(); // ID is UInt64, so read it back as unsigned
        var a = br.ReadDouble();
        var b = br.ReadDouble();
        var c = br.ReadDouble();
        var f = new Foo()
        {
            time = new DateTime(ticks),
            ID = id,
            A = a,
            B = b,
            C = c,
        };
        data.Add(f);
    }
    return data;
}

static void Write(List<Foo> data, string file)
{
    using var stream = File.Create(file);
    using var bw = new BinaryWriter(stream); // disposing flushes buffered bytes to disk
    foreach (var item in data)
    {
        bw.Write(item.time.Ticks);
        bw.Write(item.ID);
        bw.Write(item.A);
        bw.Write(item.B);
        bw.Write(item.C);
    }
}
This should almost certainly be an order of magnitude faster than a CSV-based solution. The question then becomes: is there some reason that you must use CSV? If the source of the data is out of your control and you must use CSV, I would then ask: will the data file change every time, or will it only be appended to with new data? If it is only appended to, I would investigate a solution where, each time the app starts, you convert only the newly appended section of CSV data and add it to a binary file, and then load everything from the binary file. Then you only pay the cost of processing the new CSV data each time, and load everything quickly from the binary form.
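A rough sketch of that incremental idea (the file names and offset bookkeeping are assumptions, and Read is the binary loader above):

// Convert only the CSV tail appended since the last run, then load from binary.
static List<Foo> SyncAndLoad(string csvPath, string binPath, string offsetPath)
{
    long done = File.Exists(offsetPath) ? long.Parse(File.ReadAllText(offsetPath)) : 0;
    using (var csv = File.OpenRead(csvPath))
    using (var bw = new BinaryWriter(File.Open(binPath, FileMode.Append)))
    {
        csv.Seek(done, SeekOrigin.Begin);
        var reader = new StreamReader(csv);
        bool headerSkipped = done > 0; // the header only exists at the start of the file
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // discard blank lines, as in LoadCsvFile
            if (!headerSkipped) { headerSkipped = true; continue; }
            var t = line.Split(',');
            bw.Write(DateTime.Parse(t[0]).Ticks);
            bw.Write(ulong.Parse(t[1]));
            bw.Write(double.Parse(t[2]));
            bw.Write(double.Parse(t[3]));
            bw.Write(double.Parse(t[4]));
        }
    }
    File.WriteAllText(offsetPath, new FileInfo(csvPath).Length.ToString());
    return Read(binPath); // binary loader from the example above
}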
This could be made even faster by creating a fixed-layout struct (Foo), allocating an array of them, and using span-based trickery to read the array data directly from the FileStream. This can be done because all of your data elements are "blittable". This would be the absolute fastest way to load this data into your program. Start with the BinaryReader/Writer approach, and if you find that still isn't fast enough, then investigate this.
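As a sketch of that span-based approach (FooRecord is a hypothetical blittable mirror of Foo, with the DateTime stored as raw ticks to keep the layout fixed):

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct FooRecord
{
    public long Ticks;
    public ulong ID;
    public double A, B, C; // 8 + 8 + 24 = 40 bytes per record
}

static FooRecord[] ReadAll(string file)
{
    using var stream = File.OpenRead(file);
    var records = new FooRecord[stream.Length / 40];
    // reinterpret the array as raw bytes and fill it straight from the stream
    var bytes = MemoryMarshal.AsBytes(records.AsSpan());
    while (!bytes.IsEmpty)
    {
        int n = stream.Read(bytes); // Read may return fewer bytes than requested
        if (n == 0) throw new EndOfStreamException();
        bytes = bytes.Slice(n);
    }
    return records;
}

This needs System.Runtime.InteropServices for StructLayout/MemoryMarshal, and the Span overload of Stream.Read is .NET core only, matching the earlier caveat.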
If you find this solution to work, I'd love to hear the results.

Segmentation fault in file I/O, cannot figure out

Basically, I have rewritten code that kept giving me a segmentation fault (core dump) error when running, and I decided to check each step to rule out issues.
My code works until I try accessing/using the last line of the input file's data. I do the same things to this line as to the previous line, but it's suggesting something's wrong.
Here is my code for the file I/O and data handling:
The input file itself is simply:
20 20
10 10 u
5 5 d
In line 27, you dereference the uninitialised playerDirInput, which is undefined behaviour:
playerDir = (char)playerDirInput[0];
That's probably the cause of your crash. If that code block is meant to mirror the following one, it looks like you just haven't read the third item on that line, which is where playerDirInput is probably meant to come from. That would be something like:
fgets(line, 8, f);
playerRowChar = strtok(line, " ");
playerRow = atoi(playerRowChar);
playerColChar = strtok(NULL, " "); // <- fixed this, see below.
playerCol = atoi(playerColChar);
playerDirInput = strtok(NULL, " "); // <- add this.
playerDir = (char)playerDirInput[0];
However, I would suggest you instead opt for the simpler sscanf version, which would go something like (including a check to ensure you get the three items):
fgets(line, 8, f);
if (sscanf(line, "%d %d %c", &playerRow, &playerCol, &playerDir) != 3) {
    handleErrorIntelligently();
}
I tend to prefer fgets followed by sscanf, rather than fscanf. The latter can fail in such a way that you're not sure where the input stream pointer ends up. With fgets, you always know you've read a line (or can easily detect that you read a partial line and adjust for it).
Other potential problems you should look at:
On line 25, this strtok should be passed NULL, not line. The latter will simply re-read the first item on that line, whereas you want the next item.
You really should check the return values of functions that can return problematic values (such as NULL from strtok); otherwise, using them can cause issues. Whether that matters depends on the data you're reading, of course, so it may not necessarily be a problem if you control that (see the sketch below).
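For instance, a sketch of the fgets/strtok path with those checks in place (handleErrorIntelligently is the same hypothetical handler as above, assumed not to return):

/* read one line of "row col dir", checking every strtok result before use */
char line[8];
if (fgets(line, sizeof line, f) == NULL) {
    handleErrorIntelligently();
}
char *playerRowChar = strtok(line, " ");
char *playerColChar = strtok(NULL, " ");
char *playerDirInput = strtok(NULL, " ");
if (playerRowChar == NULL || playerColChar == NULL || playerDirInput == NULL) {
    handleErrorIntelligently(); /* malformed line: fewer than three tokens */
}
int playerRow = atoi(playerRowChar);
int playerCol = atoi(playerColChar);
char playerDir = playerDirInput[0];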

storage automatically gets cleared in codename one

Hi, I am writing an ArrayList to storage using:
private void addItemToRecentListStorage1(Hashtable h) {
    Storage s1 = Storage.getInstance();
    ArrayList<Hashtable> a = (ArrayList<Hashtable>) s1.readObject("RecentItems");
    ...
    ...
    a.add(0, h); // adding on top
    s1.writeObject("RecentItems", a);
}
If I inspect s1 immediately after adding the 1st element, it shows up appropriately in the storage hierarchy.
But at the time of adding the 2nd element (Hashtable), it clears the 1st stored hashtable's values, though it still shows up as a blank element.
That means I get 1 element (Hashtable) in the ArrayList from readObject(), but all 4 of that hashtable's entries are wiped out. This was working earlier, but now it's wiping out the Hashtable data from the ArrayList.
So each time I add an element, the element count increments by 1, but all the previous hashtables' values are cleared.
The same thing happens in the emulator as well as on the device.
Check whether you have an exception in the console that might have been triggered by serialization failing for some of the objects within the array.
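For example, a defensive version of the method (a sketch built on the same Storage API the question uses; Log.e is just one way to surface the failure):

private void addItemToRecentListStorage1(Hashtable h) {
    Storage s1 = Storage.getInstance();
    ArrayList<Hashtable> a = null;
    try {
        a = (ArrayList<Hashtable>) s1.readObject("RecentItems");
    } catch (Exception e) {
        Log.e(e); // a serialization failure here would otherwise silently lose the data
    }
    if (a == null) {
        a = new ArrayList<Hashtable>();
    }
    a.add(0, h); // adding on top
    s1.writeObject("RecentItems", a);
}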

ngui dynamic text advice (from a noob)

Sorry in advance, this is an extremely noobie question (but I'm just getting into NGUI with UnityScript and can't find many answers/tutorials/docs). Also, my UnityScript skills are sub-par.
I have a TCG/playing card game object with some basic RPG stats (strength, dexterity) that currently display on the card in a GUILabel, and I'm trying to convert this to NGUI. I'm adding a UILabel as a child of the card (which contains the stats script).
Looking for some advice on going about this. The only way I've even remotely gotten something to display correctly is, unfortunately, to attach the stats script to the label too:
var strLbl : UILabel;

function Start() {
    var strLbl = GetComponent(UILabel);
}

function OnGUI() {
    strLbl.text = strength.ToString();
}
This is throwing numerous 'NullReferenceException: object reference not set to an instance of an object' errors (for the stats script).
Do I need to make a separate label for each stat, or is there a way to aggregate them into one label? (It seems when I try to add strength, then dexterity, it overrides it.)
Is OnGUI the correct course for NGUI, or is there a more efficient function?
Is this script attached to the object that the UILabel is on? You should do a check for
if (strLbl != null)
    strLbl.text = strength.ToString();
You could aggregate them into one label (though if individual stats update separately, I would advise against it). Assuming you want each stat on a new line, your next line would be: strLbl.text += "\n" + dexterity.ToString()
No need to use OnGUI with NGUI. Especially not for setting things. You probably want to do this whole stage in Start() and have another method called for updating the label.
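Putting that together, a rough sketch (assuming strength and dexterity are reachable from this script; note that the question's Start() declares a local var strLbl, which shadows the field and leaves it null):

var strLbl : UILabel;

function Start() {
    strLbl = GetComponent(UILabel); // assign the field - no 'var', or the field stays null
    UpdateStats();
}

// call this whenever a stat changes, instead of polling in OnGUI
function UpdateStats() {
    if (strLbl != null)
        strLbl.text = strength.ToString() + "\n" + dexterity.ToString();
}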