Flutter: how to turn Lists encoded as Strings for a SQFL database back into Lists concisely?

I fear I'm trying to reinvent the wheel here. I'm putting Objects into my SQFL database:
https://pub.dev/packages/sqflite
Some of the object fields are Lists of ints; others are Lists of Strings. I'm encoding these as plain Strings to store in a TEXT column in my SQFL database.
At some point I'm going to have to turn them back. I couldn't find anything on Google, which is surprising because this must be a very common occurrence with SQFL.
I've started coding the decoding, but it's rookie Dart. Is there anything performant that I ought to use instead?
Code included to prove I'm not totally lazy; no need to look, since edge cases make it fail.
void main() {
List<int> listOfInts = <int>[];
String testStringOfInts = "[1,2,4]";
List<String> intermediateStep2 = testStringOfInts.split(',');
int numListElements = intermediateStep2.length;
print("intermediateStep2: $intermediateStep2, numListElements: $numListElements");
for (int i = 0; i < numListElements; i++) {
if (i == 0) {
listOfInts.add(int.parse(intermediateStep2[i].substring(1)));
continue;
}
else if ((i) == (numListElements - 1)) {
print('final element: ${intermediateStep2[i]}');
listOfInts.add(int.parse(intermediateStep2[i].substring(0, intermediateStep2[i].length - 1)));
continue;
}
else listOfInts.add(int.parse(intermediateStep2[i]));
}
print('Output: $listOfInts');
/* DECODING LISTS OF STRINGS */
String testString = "['element1','element2','element23']";
List<String> intermediateStep = testString.split("'");
List<String> output = <String>[];
for (int i = 0; i < intermediateStep.length; i++) {
if (i % 2 == 0) {
continue;
} else {
print('adding a value to output: ${intermediateStep[i]}');
//print('value is a: ${(intermediateStep[i]).runtimeType}');
output.add(intermediateStep[i]);
}
}
print('Output: $output');
}

For the integers, you could do the parsing like this:
void main() {
print(parseStringAsIntList("[1,2,4]")); // [1, 2, 4]
}
List<int> parseStringAsIntList(String stringOfInts) => stringOfInts
.substring(1, stringOfInts.length - 1)
.split(',')
.map(int.parse)
.toList();
I need more information about how the Strings are saved in some corner cases, such as when they contain , and/or ', since this changes how the parsing should be done. If both characters are valid in a String (especially ,), I recommend changing the storage format to JSON instead, which makes encoding and decoding a lot easier and avoids the risk of characters that cause parsing issues.
But a rather naive solution can be made like this if we know that no String contains , :
void main() {
print(parseStringAsStringList("['element1','element2','element23']"));
// [element1, element2, element23]
}
List<String> parseStringAsStringList(String stringOfStrings) => stringOfStrings
.substring(1, stringOfStrings.length - 1)
.split(',')
.map((string) => string.substring(1, string.length - 1))
.toList();
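The JSON suggestion above can be sketched as follows. This is a minimal illustration in Python rather than Dart, and the helper names encode_list and decode_list are made up for the example. In Dart, the jsonEncode and jsonDecode functions from dart:convert play the same role.

```python
import json


def encode_list(values):
    """Serialize a list of ints or strings to JSON text for a TEXT column."""
    return json.dumps(values)


def decode_list(text):
    """Parse JSON text read back from the TEXT column into a list."""
    return json.loads(text)


if __name__ == "__main__":
    ints = [1, 2, 4]
    # Strings containing commas and quotes survive the round trip.
    strings = ["element1", "has,comma", "has'quote"]
    print(decode_list(encode_list(ints)))
    print(decode_list(encode_list(strings)))
```

Because the JSON encoder escapes delimiters itself, no special handling of , or ' is needed in the stored Strings.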


Slice() nested for loop values i and j Kotlin

I want to slice a range, which I can do in JavaScript, but I'm struggling to do it in Kotlin.
my current code is:
internal class blah {
fun longestPalindrome(s: String): String {
var longestP = ""
for (i in 0..s.length) {
for (j in 1..s.length) {
var subS = s.slice(i, j)
if (subS === subS.split("").reversed().joinToString("") && subS.length > longestP.length) {
longestP = subS
}
}
}
return longestP
}
}
and the error I get is:
Type mismatch.
Required:
IntRange
Found:
Int
Is there a way around this keeping most of the code I have?
As the error message says, slice wants an IntRange, not two Ints. So, pass it a range:
var subS = s.slice(i..j)
By the way, there are some bugs in your code:
You need to iterate up to the length minus 1 since the range starts at 0. But the easier way is to grab the indices range directly: for (i in s.indices)
I assume j should be i or bigger, not 1 or bigger, or you'll be checking some inverted Strings redundantly. It should look like for (j in i until s.length).
You need to use == instead of ===. The second operator is for referential equality, which will always be false for two computed Strings, even if they are identical.
I know this is probably just practice, but even with the above fixes, this code will fail if the String contains any multi-code-unit code points or any grapheme clusters. The proper way to do this would be by turning the String into a list of grapheme clusters and then performing the algorithm, but this is fairly complicated and should probably rely on some String processing code library.
class Solution {
fun longestPalindrome(s: String): String {
var longestPal = ""
for (i in 0 until s.length) {
for (j in i + 1..s.length) {
val substring = s.substring(i, j)
if (substring == substring.reversed() && substring.length > longestPal.length) {
longestPal = substring
}
}
}
return longestPal
}
}
This code is now functioning but unfortunately is not optimized enough to get through all test cases.
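One common way to speed this up is to expand around each possible center instead of testing every substring, which brings the cost down from roughly cubic to quadratic time. The sketch below is in Python rather than Kotlin and is not the poster's code. Like the versions above, it compares code units and ignores grapheme clusters.

```python
def longest_palindrome(s: str) -> str:
    """Return the first longest palindromic substring of s."""
    if not s:
        return ""
    best_start, best_len = 0, 1

    def expand(lo: int, hi: int):
        # Grow outwards while the characters at both ends match.
        while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
            lo -= 1
            hi += 1
        # The loop overshoots by one on each side.
        return lo + 1, hi - lo - 1

    for center in range(len(s)):
        # Odd-length palindromes have one center character, even-length two.
        for lo, hi in ((center, center), (center, center + 1)):
            start, length = expand(lo, hi)
            if length > best_len:
                best_start, best_len = start, length
    return s[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_palindrome("babad"))
```

The same structure carries over to Kotlin with two nested index variables and s.substring(start, start + length) at the end.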

postings nextPosition returns null, freq returns 0, payload() returns null

I made the simplest possible index, with one document, using LuceneTestCase. My goal is to write numbers to the payload for each position of each term; they will be used in a custom scoring formula implemented in a custom Query/Scorer.
I used SimpleTextCodec and checked that the freq, positions and payloads were really written to the index.
But when I read freq from the PostingsEnum it returns 0, getPayload() returns null, and nextPosition() throws an exception:
java.lang.AssertionError: got line=field model
at __randomizedtesting.SeedInfo.seed([D334C9D1B5C155E3:2AAE4BE5481F4C8F]:0)
at org.apache.lucene.codecs.simpletext.SimpleTextFieldsReader$SimpleTextPostingsEnum.nextPosition(SimpleTextFieldsReader.java:455)
Here is how I'm reading the postings in the custom Query:
for (String field: fieldScores.keySet()) {
final Terms fieldTerms = reader.terms(field);
if (fieldTerms == null) {
continue;
}
if (!fieldTerms.hasPositions())
throw new IllegalStateException("Index does not contain positions");
if (!fieldTerms.hasPayloads())
throw new IllegalStateException("Index does not contain payloads");
final TermsEnum te = fieldTerms.iterator();
for (int j = 0; j < terms.length; j++) {
final Term t = terms[j];
if (t.field().equals(field) && te.seekExact(t.bytes())) {
PostingsEnum postingsEnum = te.postings(null, PostingsEnum.ALL);
int pos = postingsEnum.nextPosition();
BytesRef payload = postingsEnum.getPayload();
// assert payload.bytesEquals(new BytesRef(new byte[]{1}));
// TODO: use payload in scoring formula
fldScorers.add(new ConstTermScorer(this, t,
fieldScores.get(field) * termScores.get(t.text()),
postingsEnum));
}
}
}
I've found the reason. nextPosition(), freq() and getPayload() return 0 (or null) because the postingsEnum iterator has only just been created and is not yet positioned on a document: postingsEnum.nextDoc() was never called, so postingsEnum.docID() is still -1. Calling nextDoc() before reading the freq, positions and payloads fixes it. A silly mistake, but it might be better if nextPosition(), freq() and getPayload() checked postingsEnum.docID() themselves.

Sage: Iterate over increasing sequences

I have a problem that I am unwilling to believe hasn't been solved before in Sage.
Given a pair of integers (d,n) as input, I'd like to receive a list (or set, or whatever) of all nondecreasing sequences of length d all of whose entries are no greater than n.
Similarly, I'd like another function which returns all strictly increasing sequences of length d whose entries are no greater than n.
For example, for d = 2 n=3, I'd receive the output:
[[1,2], [1,3], [2,3]]
or
[[1,1], [1,2], [1,3], [2,2], [2,3], [3,3]]
depending on whether I'm using increasing or nondecreasing.
Does anyone know of such a function?
Edit: Of course, if there is such a method for nonincreasing or decreasing sequences, I can modify that to fit my purposes. I just need something to iterate over sequences.
I needed this algorithm too and I finally managed to write one today. I will share the code here, but I only started to learn coding last week, so it is not pretty.
Idea: the input is (r,d).
Step 1) Create a class "ListAndPosition" that holds a list L of Integer[r+1] arrays and an integer q between 0 and r.
Step 2) Create a method that receives a ListAndPosition (L,q) and scans the arrays in L sequentially, checking whether the integer at position q is less than the one at position q+1. If so, it adds a new array at the bottom of the list with that entry incremented. When done, the method calls itself again with the new list and q-1 as input.
The code for Step 1)
import java.util.ArrayList;
public class ListAndPosition {
public static Integer r=5;
public final ArrayList<Integer[]> L;
public int q;
public ListAndPosition(ArrayList<Integer[]> L, int q) {
this.L = L;
this.q = q;
}
public ArrayList<Integer[]> getList(){
return L;
}
public int getPosition() {
return q;
}
public void decreasePosition() {
q--;
}
public void showList() {
for(int i=0;i<L.size();i++){
for(int j=0; j<r+1 ; j++){
System.out.print(""+L.get(i)[j]);
}
System.out.println("");
}
}
}
The code for Step 2)
import java.util.ArrayList;
public class NonDecreasingSeqs {
public static Integer r=5;
public static Integer d=3;
public static void main(String[] args) {
//Creating the first array
Integer[] firstArray;
firstArray = new Integer[r+1];
for(int i=0;i<r;i++){
firstArray[i] = 0;
}
firstArray[r] = d;
//Creating the starting listAndDim
ArrayList<Integer[]> L = new ArrayList<Integer[]>();
L.add(firstArray);
ListAndPosition Lq = new ListAndPosition(L,r-1);
System.out.println(""+nonDecSeqs(Lq).size());
}
public static ArrayList<Integer[]> nonDecSeqs(ListAndPosition Lq){
int iterations = r-1-Lq.getPosition();
System.out.println("How many arrays in the list after "+iterations+" iterations? "+Lq.getList().size());
System.out.print("Should we stop the iteration?");
if(0<Lq.getPosition()){
System.out.println(" No, position = "+Lq.getPosition());
for(int i=0;i<Lq.getList().size();i++){
//Showing particular array
System.out.println("Array of L #"+i+":");
for(int j=0;j<r+1;j++){
System.out.print(""+Lq.getList().get(i)[j]);
}
System.out.print("\nCan it be modified at position "+Lq.getPosition()+"?");
if(Lq.getList().get(i)[Lq.getPosition()]<Lq.getList().get(i)[Lq.getPosition()+1]){
System.out.println(" Yes, "+Lq.getList().get(i)[Lq.getPosition()]+"<"+Lq.getList().get(i)[Lq.getPosition()+1]);
{
Integer[] tempArray = new Integer[r+1];
for(int j=0;j<r+1;j++){
if(j==Lq.getPosition()){
tempArray[j] = new Integer(Lq.getList().get(i)[j])+1;
}
else{
tempArray[j] = new Integer(Lq.getList().get(i)[j]);
}
}
Lq.getList().add(tempArray);
}
System.out.println("New list");Lq.showList();
}
else{
System.out.println(" No, "+Lq.getList().get(i)[Lq.getPosition()]+"="+Lq.getList().get(i)[Lq.getPosition()+1]);
}
}
System.out.print("Old position = "+Lq.getPosition());
Lq.decreasePosition();
System.out.println(", new position = "+Lq.getPosition());
nonDecSeqs(Lq);
}
else{
System.out.println(" Yes, position = "+Lq.getPosition());
}
return Lq.getList();
}
}
Remark: I needed my sequences to start at 0 and end at d.
This is probably not a very good answer to your question. But you could, in principle, use Partitions and the max_slope=-1 argument. Messing around with filtering lists of IntegerVectors sounds equally inefficient and depressing for other reasons.
If this has a canonical name, it might be in the list of sage-combinat functionality, and there is even a base class you could perhaps use for integer lists, which is basically what you are asking about. Maybe you could actually get what you want using IntegerListsLex? Hope this proves helpful.
This question can be solved by using the class "UnorderedTuples" described here:
http://doc.sagemath.org/html/en/reference/combinat/sage/combinat/tuple.html
To return all nondecreasing sequences of length d with entries between 0 and n-1, you may type:
UnorderedTuples(range(n),d)
This returns the nondecreasing sequence as a list. I needed an immutable object (because the sequences would become keys of a dictionary). So I used the "tuple" method to turn the lists into tuples:
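# The function name is only illustrative; the original snippet was a function body.
def nondecreasingTuples(n, d):
    immutables = []
    for s in UnorderedTuples(range(n),d):
        immutables.append(tuple(s))
    return immutables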
And I also wrote a method which picks out only the increasing sequences:
def isIncreasing(seq):
    for i in range(len(seq) - 1):
        if seq[i] >= seq[i+1]:
            return False
    return True
The method that returns only strictly increasing sequences would look like
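# The function name is only illustrative; the original snippet was a function body.
def increasingTuples(n, d):
    immutables = []
    for s in UnorderedTuples(range(n),d):
        if isIncreasing(s):
            immutables.append(tuple(s))
    return immutables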
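Outside of Sage's combinatorics classes, plain Python's itertools can produce the same sequences directly, without filtering. The following is a minimal sketch, assuming entries run from 1 to n as in the question's example; the function names are made up for illustration.

```python
from itertools import combinations, combinations_with_replacement


def strictly_increasing(d, n):
    """All strictly increasing sequences of length d with entries in 1..n."""
    return [list(c) for c in combinations(range(1, n + 1), d)]


def nondecreasing(d, n):
    """All nondecreasing sequences of length d with entries in 1..n."""
    return [list(c) for c in combinations_with_replacement(range(1, n + 1), d)]


if __name__ == "__main__":
    print(strictly_increasing(2, 3))
    print(nondecreasing(2, 3))
```

Both itertools functions emit tuples in sorted order when the input is sorted, which is why no isIncreasing check is needed. Keeping the tuples instead of converting to lists gives immutable values usable as dictionary keys.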

iTextSharp can't read numbers in this PDF

I'm reading PDFs with iTextSharp 5.5.7.0. PdfTextExtractor.GetTextFromPage() works well for most files, but not for this one: sample PDF
I can't read any digits from it. For example, it returns only 'ANEU' for 'A0NE8U', even though the text copies out of Adobe Reader fine. The code is here:
public static string ExtractTextFromPdf(string path)
{
using (PdfReader reader = new PdfReader(path))
{
StringBuilder text = new StringBuilder();
for (int i = 1; i <= reader.NumberOfPages; i++)
{
text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
}
return text.ToString();
}
}
The font in question has a ToUnicode map which is used for text extraction. Unfortunately, though, iText(Sharp) reads it only partially, and the digits are located after the part of the map that it reads.
In detail:
The cause for the issue is the implementation of AbstractCMap.addRange (I'm showing the iText Java code as iText also has this issue and I'm more into the Java version):
void addRange(PdfString from, PdfString to, PdfObject code) {
byte[] a1 = decodeStringToByte(from);
byte[] a2 = decodeStringToByte(to);
if (a1.length != a2.length || a1.length == 0)
throw new IllegalArgumentException("Invalid map.");
byte[] sout = null;
if (code instanceof PdfString)
sout = decodeStringToByte((PdfString)code);
int start = a1[a1.length - 1] & 0xff;
int end = a2[a2.length - 1] & 0xff;
for (int k = start; k <= end; ++k) {
a1[a1.length - 1] = (byte)k;
PdfString s = new PdfString(a1);
s.setHexWriting(true);
if (code instanceof PdfArray) {
addChar(s, ((PdfArray)code).getPdfObject(k - start));
}
else if (code instanceof PdfNumber) {
int nn = ((PdfNumber)code).intValue() + k - start;
addChar(s, new PdfNumber(nn));
}
else if (code instanceof PdfString) {
PdfString s1 = new PdfString(sout);
s1.setHexWriting(true);
++sout[sout.length - 1];
addChar(s, s1);
}
}
}
The loop only considers the range in the least significant byte of from and to. Thus, for the range in question:
1 beginbfrange
<0000><01E1>[
<FFFD><FFFD><FFFD><0020><0041><0042><0043><0044>
<0045><0046><0047><0048><0049><004A><004B><004C>
...
<2248><003C><003E><2264><2265><00AC><0394><03A9>
<00B5><03C0><00B0><221E><2202><222B><221A><2211>
<220F><25CA>]
endbfrange
it only iterates from 0x00 to 0xE1, i.e. only the first 226 entries of the 482 mappings.
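To make the difference concrete, here is a small Python sketch (not iText code, and not necessarily how iText later fixed it) that expands an array-style bfrange by treating the whole source code as a big-endian integer. The function names are hypothetical.

```python
def expand_bfrange(src_from: bytes, src_to: bytes, dests: list) -> dict:
    """Map every source code in [src_from, src_to] to its destination entry."""
    if len(src_from) != len(src_to) or not src_from:
        raise ValueError("Invalid map.")
    width = len(src_from)
    # Use all bytes of the code, not just the least significant one.
    start = int.from_bytes(src_from, "big")
    end = int.from_bytes(src_to, "big")
    return {
        k.to_bytes(width, "big"): dests[k - start]
        for k in range(start, end + 1)
    }


def low_byte_span(src_from: bytes, src_to: bytes) -> int:
    """Number of entries visited when only the last byte is iterated."""
    return (src_to[-1] & 0xFF) - (src_from[-1] & 0xFF) + 1


if __name__ == "__main__":
    lo, hi = bytes.fromhex("0000"), bytes.fromhex("01E1")
    placeholder_dests = list(range(482))  # stand-ins for the real Unicode values
    print(len(expand_bfrange(lo, hi, placeholder_dests)))  # full range
    print(low_byte_span(lo, hi))  # what the buggy loop covers
```

For the range <0000><01E1>, the full expansion yields 482 entries, while the last-byte-only loop covers 226.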
There actually are some peculiar restrictions in CMaps, e.g. there may only be up to 100 separate bfrange entries in the same section, and in the alternative bfrange entry syntax
n beginbfrange
srcCode1 srcCode2 dstString
endbfrange
which is handled by the same method addRange, there is the restriction
When defining ranges of this type, the value of the last byte in the string shall be less than or equal to 255 − (srcCode2 − srcCode1).
Probably a misunderstanding of this restriction made the developer believe that srcCode2 and srcCode1 would also differ only in the least significant byte.
But maybe there are even more restrictions which I merely did not find...
Meanwhile (as of iText 5.5.9, tested against a development SNAPSHOT) this issue seems to have been fixed.

What is the fastest way to compare two byte arrays?

I am trying to compare two long bytearrays in VB.NET and have run into a snag. Comparing two 50 megabyte files takes almost two minutes, so I'm clearly doing something wrong. I'm on an x64 machine with tons of memory so there are no issues there. Here is the code that I'm using at the moment and would like to change.
_Bytes and item.Bytes are the two different arrays to compare and are already the same length.
For Each B In item.Bytes
If B <> _Bytes(I) Then
Mismatch = True
Exit For
End If
I += 1
Next
I need to be able to compare, as fast as possible, files that are potentially hundreds of megabytes and possibly even a gigabyte or two. Any suggestions or algorithms that would do this faster?
Item.bytes is an object taken from the database/filesystem that is returned to compare, because its byte length matches the item that the user wants to add. By comparing the two arrays I can then determine if the user has added something new to the DB and if not then I can just map them to the other file and not waste hard disk drive space.
[Update]
I converted the arrays to local variables of Byte() and then did the same comparison, same code and it ran in like one second (I have to benchmark it still and compare it to others), but if you do the same thing with local variables and use a generic array it becomes massively slower. I'm not sure why, but it raises a lot more questions for me about the use of arrays.
What is the _Bytes(I) call doing? It's not loading the file each time, is it? Even with buffering, that would be bad news!
There will be plenty of ways to micro-optimise this in terms of looking at longs at a time, potentially using unsafe code etc - but I'd just concentrate on getting reasonable performance first. Clearly there's something very odd going on.
I suggest you extract the comparison code into a separate function which takes two byte arrays. That way you know you won't be doing anything odd. I'd also use a simple For loop rather than For Each in this case - it'll be simpler. Oh, and check whether the lengths are correct first :)
EDIT: Here's the code (untested, but simple enough) that I'd use. It's in C# for the minute - I'll convert it in a sec:
public static bool Equals(byte[] first, byte[] second)
{
if (first == second)
{
return true;
}
if (first == null || second == null)
{
return false;
}
if (first.Length != second.Length)
{
return false;
}
for (int i=0; i < first.Length; i++)
{
if (first[i] != second[i])
{
return false;
}
}
return true;
}
EDIT: And here's the VB:
Public Shared Function ArraysEqual(ByVal first As Byte(), _
ByVal second As Byte()) As Boolean
If (first Is second) Then
Return True
End If
If (first Is Nothing OrElse second Is Nothing) Then
Return False
End If
If (first.Length <> second.Length) Then
Return False
End If
For i as Integer = 0 To first.Length - 1
If (first(i) <> second(i)) Then
Return False
End If
Next i
Return True
End Function
The fastest way to compare two byte arrays of equal size is to use interop. Run the following code on a console application:
using System;
using System.Runtime.InteropServices;
using System.Security;
namespace CompareByteArray
{
class Program
{
static void Main(string[] args)
{
const int SIZE = 100000;
const int TEST_COUNT = 100;
byte[] arrayA = new byte[SIZE];
byte[] arrayB = new byte[SIZE];
for (int i = 0; i < SIZE; i++)
{
arrayA[i] = 0x22;
arrayB[i] = 0x22;
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Safe(arrayA, arrayB, (UIntPtr)SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Safe: {0}", after - before);
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Unsafe(arrayA, arrayB, (UIntPtr)SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Unsafe: {0}", after - before);
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Pure(arrayA, arrayB, SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Pure: {0}", after - before);
}
return;
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint="memcmp", ExactSpelling=true)]
[SuppressUnmanagedCodeSecurity]
static extern int memcmp_1(byte[] b1, byte[] b2, UIntPtr count);
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "memcmp", ExactSpelling = true)]
[SuppressUnmanagedCodeSecurity]
static extern unsafe int memcmp_2(byte* b1, byte* b2, UIntPtr count);
public static int MemCmp_Safe(byte[] a, byte[] b, UIntPtr count)
{
return memcmp_1(a, b, count);
}
public unsafe static int MemCmp_Unsafe(byte[] a, byte[] b, UIntPtr count)
{
fixed(byte* p_a = a)
{
fixed (byte* p_b = b)
{
return memcmp_2(p_a, p_b, count);
}
}
}
public static int MemCmp_Pure(byte[] a, byte[] b, int count)
{
int result = 0;
for (int i = 0; i < count && result == 0; i += 1)
{
result = a[i] - b[i];
}
return result;
}
}
}
If you don't need to know which byte differs, compare 64-bit ints, which gives you 8 bytes at once. Actually, you can still find the differing byte once you've isolated it to a set of 8.
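The word-at-a-time idea can be sketched like this. It is written in Python purely to show the logic; the function name is made up, and in Python the built-in a == b would be faster in practice. In .NET the same structure would read 64-bit values from both arrays.

```python
def first_mismatch(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if equal."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    n = len(a)
    va, vb = memoryview(a), memoryview(b)
    whole = n - n % 8  # number of bytes covered by full 8-byte words
    for off in range(0, whole, 8):
        wa = int.from_bytes(va[off:off + 8], "little")
        wb = int.from_bytes(vb[off:off + 8], "little")
        if wa != wb:
            # Narrow the mismatch down within this group of 8 bytes.
            for i in range(off, off + 8):
                if a[i] != b[i]:
                    return i
    # Compare the tail that does not fill a whole word.
    for i in range(whole, n):
        if a[i] != b[i]:
            return i
    return -1


if __name__ == "__main__":
    print(first_mismatch(b"0123456789abcdef", b"0123456789abcdeX"))
```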
Use BinaryReader:
saveTime = binReader.ReadInt32()
Or for arrays of ints:
Dim count As Integer = binReader.Read(testArray, 0, 3)
Better approach: if you are just trying to see whether the two are different, generate a hash of each byte array and compare the hashes. Computing a hash still reads every byte, so the time saving comes from storing each file's hash once and comparing only the stored hashes afterwards. MD5 should work fine and is pretty efficient.
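A minimal Python sketch of the stored-hash idea follows. It assumes a digest is kept alongside each file already in the database. It uses SHA-256 from hashlib rather than the MD5 mentioned above, and the function names are made up. Equal digests only make equality very likely, so a byte-by-byte check can follow when certainty is needed.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex digest that would be stored next to the file."""
    return hashlib.sha256(data).hexdigest()


def probably_same(new_data: bytes, stored_digest: str) -> bool:
    """Cheap pre-check: hash only the new data and compare digests."""
    return digest(new_data) == stored_digest


if __name__ == "__main__":
    existing = b"x" * 1000
    stored = digest(existing)  # computed once, when the file was added
    print(probably_same(b"x" * 1000, stored))
    print(probably_same(b"x" * 999 + b"y", stored))
```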
I see two things that might help:
First, rather than always accessing the second array as item.Bytes, use a local variable to point directly at the array. That is, before starting the loop, do something like this:
array2 = item.Bytes
That will save the overhead of dereferencing from the object each time you want a byte. That could be expensive in Visual Basic, especially if there's a Getter method on that property.
Also, use a "definite loop" instead of "for each". You already know the length of the arrays, so just code the loop using that value. This will avoid the overhead of treating the array as a collection. The loop would look something like this:
For i = 0 To array1.Length - 1
If array1(i) <> array2(i) Then
Exit For
End If
Next
Not strictly related to the comparison algorithm:
Are you sure your bottleneck is not related to the memory available and the time used to load the byte arrays? Loading two 2 GB byte arrays just to compare them could bring most machines to their knees. If the program design allows, try using streams to read smaller chunks instead.
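The streaming approach might look like the following Python sketch, which compares two files chunk by chunk so that neither has to be loaded into memory in full. The function name and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import os


def files_equal(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> bool:
    """Compare two files by reading fixed-size chunks from each."""
    # Different sizes mean different contents; no need to read anything.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first differing chunk
            if not chunk_a:
                return True  # both files ended without a mismatch
```

Memory use stays at about two chunks regardless of file size, and the comparison stops as soon as a difference is found.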