iTextSharp can't read numbers in this PDF

I'm reading PDFs with iTextSharp 5.5.7.0. PdfTextExtractor.GetTextFromPage() works well on most files, until this one: sample PDF
I can't read any digits from it; for example, it returns only 'ANEU' from 'A0NE8U', even though the text copies out fine in Adobe Reader. Here is the code:
public static string ExtractTextFromPdf(string path)
{
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
        }
        return text.ToString();
    }
}

The font in question has a ToUnicode map which is used for text extraction. Unfortunately, though, iText(Sharp) reads it only partially, and the digits happen to be located after the mappings it does read.
In detail:
The cause of the issue is the implementation of AbstractCMap.addRange (I'm showing the iText Java code, as iText also has this issue and I'm more into the Java version):
void addRange(PdfString from, PdfString to, PdfObject code) {
    byte[] a1 = decodeStringToByte(from);
    byte[] a2 = decodeStringToByte(to);
    if (a1.length != a2.length || a1.length == 0)
        throw new IllegalArgumentException("Invalid map.");
    byte[] sout = null;
    if (code instanceof PdfString)
        sout = decodeStringToByte((PdfString)code);
    int start = a1[a1.length - 1] & 0xff;
    int end = a2[a2.length - 1] & 0xff;
    for (int k = start; k <= end; ++k) {
        a1[a1.length - 1] = (byte)k;
        PdfString s = new PdfString(a1);
        s.setHexWriting(true);
        if (code instanceof PdfArray) {
            addChar(s, ((PdfArray)code).getPdfObject(k - start));
        }
        else if (code instanceof PdfNumber) {
            int nn = ((PdfNumber)code).intValue() + k - start;
            addChar(s, new PdfNumber(nn));
        }
        else if (code instanceof PdfString) {
            PdfString s1 = new PdfString(sout);
            s1.setHexWriting(true);
            ++sout[sout.length - 1];
            addChar(s, s1);
        }
    }
}
The loop only considers the range in the least significant byte of from and to. Thus, for the range in question:
1 beginbfrange
<0000><01E1>[
<FFFD><FFFD><FFFD><0020><0041><0042><0043><0044>
<0045><0046><0047><0048><0049><004A><004B><004C>
...
<2248><003C><003E><2264><2265><00AC><0394><03A9>
<00B5><03C0><00B0><221E><2202><222B><221A><2211>
<220F><25CA>]
endbfrange
it only iterates from 0x00 to 0xE1, i.e. only the first 226 entries of the 482 mappings.
There actually are some peculiar restrictions in CMaps, e.g. there may only be up to 100 separate bfrange entries in the same section, and in the alternative bfrange entry syntax
n beginbfrange
srcCode1 srcCode2 dstString
endbfrange
which is handled by the same method addRange, there is the restriction
When defining ranges of this type, the value of the last byte in the string shall be less than or equal to 255 − (srcCode2 − srcCode1).
Probably a misunderstanding of this restriction made the developer believe that srcCode2 and srcCode1 would also merely differ in the least significant byte.
But maybe there are even more restrictions which I merely did not find...
Meanwhile (as of iText 5.5.9, tested against a development SNAPSHOT) this issue seems to have been fixed.
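For illustration of what a fix has to do (this is only a sketch of the idea, not the actual iText 5.5.9 change; the class and method names are mine), the iteration has to treat the from/to codes as full big-endian values instead of varying only the last byte:
using System;
using System.Collections.Generic;
using System.Linq;

static class CMapRangeSketch
{
    // Sketch only: enumerate every code between two equal-length, big-endian
    // byte strings, e.g. <0000> .. <01E1>, instead of looping over the last byte alone.
    public static IEnumerable<byte[]> CodesInRange(byte[] from, byte[] to)
    {
        byte[] current = (byte[])from.Clone();
        while (true)
        {
            yield return (byte[])current.Clone();
            if (current.SequenceEqual(to))
                yield break;
            // increment the code as a big-endian integer, with carry
            for (int i = current.Length - 1; i >= 0; i--)
            {
                if (++current[i] != 0)
                    break;
            }
        }
    }
}
Each enumerated code would then be mapped to its destination value (array element, incremented number, or incremented string) just as the per-byte loop above does, so a range such as <0000>-<01E1> yields all 482 mappings.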

Related

Flutter: how to concisely turn Lists encoded as Strings for an SQFL database back into Lists?

I fear I'm trying to reinvent the wheel here. I'm putting Objects into my SQFL database:
https://pub.dev/packages/sqflite
Some of the object fields are Lists of ints, others are Lists of Strings. I'm encoding these as plain Strings to place in a TEXT field in my SQFL database.
At some point I'm going to have to turn them back. I couldn't find anything on Google, which is surprising because this must be a very common requirement with SQFL.
I've started coding the 'decoding', but it's rookie Dart. Is there anything more performant around that I ought to use?
Code is included to prove I'm not totally lazy; no need to look closely, edge cases make it fail.
List<int> listOfInts = new List<int>();
String testStringOfInts = "[1,2,4]";
List<String> intermediateStep2 = testStringOfInts.split(',');
int numListElements = intermediateStep2.length;
print("intermediateStep2: $intermediateStep2, numListElements: $numListElements");
for (int i = 0; i < numListElements; i++) {
if (i == 0) {
listOfInts.add(int.parse(intermediateStep2[i].substring(1)));
continue;
}
else if ((i) == (numListElements - 1)) {
print('final element: ${intermediateStep2[i]}');
listOfInts.add(int.parse(intermediateStep2[i].substring(0, intermediateStep2[i].length - 1)));
continue;
}
else listOfInts.add(int.parse(intermediateStep2[i]));
}
print('Output: $listOfInts');
/* DECODING LISTS OF STRINGS */
String testString = "['element1','element2','element23']";
List<String> intermediateStep = testString.split("'");
List<String> output = new List<String>();
for (int i = 0; i < intermediateStep.length; i++) {
if (i % 2 == 0) {
continue;
} else {
print('adding a value to output: ${intermediateStep[i]}');
//print('value is a: ${(intermediateStep[i]).runtimeType}');
output.add(intermediateStep[i]);
}
}
print('Output: $output');
}
For the integers you could do the parsing like this:
void main() {
print(parseStringAsIntList("[1,2,4]")); // [1, 2, 4]
}
List<int> parseStringAsIntList(String stringOfInts) => stringOfInts
.substring(1, stringOfInts.length - 1)
.split(',')
.map(int.parse)
.toList();
I need more information about how the Strings are saved in some corner cases, e.g. if they can contain , and/or ', since that changes how the parsing should be done. If both characters are valid in the strings (especially ,), I recommend changing the storage format to JSON instead, which makes encoding/decoding a lot easier and removes the risk of running into characters that cause issues.
But a rather naive solution can be made like this if we know each String does not contain ,:
void main() {
print(parseStringAsStringList("['element1','element2','element23']"));
// [element1, element2, element23]
}
List<String> parseStringAsStringList(String stringOfStrings) => stringOfStrings
.substring(1, stringOfStrings.length - 1)
.split(',')
.map((string) => string.substring(1, string.length - 1))
.toList();

Saving randomly generated passwords to a text file in order to display them later

I'm currently in a traineeship and have two pieces of software I'm working on. The most important one was requested yesterday, and I'm stuck on the failure of its main feature: saving passwords.
The application is developed in C++/CLR using Visual Studio 2013 (I couldn't install the MFC libraries somehow; the installation kept failing and crashing even after multiple reboots) and aims to generate a password from a seed provided by the user. The generated password is saved to a .txt file. If the seed has already been used, the previously generated password shows up instead.
Unfortunately I can't save the password and seed to the file, though I can write the seed if I don't get to the end of the document. I went for the "if the line is empty then write this to the document" approach, but it doesn't work and I can't figure out why. However, I can read the passwords without any problem.
Here's the interesting part of the source:
int seed;
char genRandom() {
static const char letters[] =
"0123456789"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(letters) - 1;
return letters[rand() % stringLength];
}
System::Void OK_Click(System::Object^ sender, System::EventArgs^ e) {
fstream passwords;
if (!(passwords.is_open())) {
passwords.open("passwords.txt", ios::in | ios::out);
}
string gen = msclr::interop::marshal_as<std::string>(GENERATOR->Text), line, genf = gen;
bool empty_line_found = false;
while (empty_line_found == false) {
getline(passwords, line);
if (gen == line) {
getline(passwords, line);
PASSWORD->Text = msclr::interop::marshal_as<System::String^>(line);
break;
}
if (line.empty()) {
for (unsigned int i = 0; i < gen.length(); i++) {
seed += gen[i];
}
srand(seed);
string pass;
for (int i = 0; i < 10; ++i) {
pass += genRandom();
}
passwords << pass << endl << gen << "";
PASSWORD->Text = msclr::interop::marshal_as<System::String^>(pass);
empty_line_found = true;
}
}
}
I've also tried replacing ios::in with ios::app, and it doesn't work. And yes, I have included fstream, iostream, etc.
Thanks in advance!
[EDIT]
Just solved this problem. Thanks Rook for putting me on the right track. It feels like a silly way to do it, but I've closed the file and re-opened it using ios::app to write at the end of it. I also fixed a silly mistake that resulted in writing the password before the seed and in not inserting a final newline, which the main loop needs to keep working. Here's the code in case someone ends up with the same problem:
int seed;
char genRandom() {
static const char letters[] =
"0123456789"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(letters) - 1;
return letters[rand() % stringLength];
}
System::Void OK_Click(System::Object^ sender, System::EventArgs^ e) {
fstream passwords;
if (!(passwords.is_open())) {
passwords.open("passwords.txt", ios::in | ios::out);
}
string gen = msclr::interop::marshal_as<std::string>(GENERATOR->Text), line, genf = gen;
bool empty_line_found = false;
while (empty_line_found == false) {
getline(passwords, line);
if (gen == line) {
getline(passwords, line);
PASSWORD->Text = msclr::interop::marshal_as<System::String^>(line);
break;
}
if (line.empty()) {
passwords.close();
passwords.open("passwords.txt", ios::app);
for (unsigned int i = 0; i < gen.length(); i++) {
seed += gen[i];
}
srand(seed);
string pass;
for (int i = 0; i < 10; ++i) {
pass += genRandom();
}
passwords << gen << endl << pass << endl << "";
PASSWORD->Text = msclr::interop::marshal_as<System::String^>(pass);
empty_line_found = true;
}
}
passwords.close();
}
So, here's an interesting thing:
passwords << pass << endl << gen << "";
You're not ending that with a newline. This means the very end of your file could be missing a newline too. This has an interesting effect when you do this on the final line:
getline(passwords, line);
getline will read until it sees a line ending, or an EOF. If there's no newline, it'll hit that EOF and then set the EOF bit on the stream. That means the next time you try to do this:
passwords << pass << endl << gen << "";
the stream will refuse to write anything, because it is in an eof state. There are various things you can do here, but the simplest would be to do passwords.clear() to remove any error flags like eof. I'd be very cautious about accidentally clearing genuine error flags though; read the docs for fstream carefully.
I also reiterate my comment about C++/CLR being a glue language, and not a great language for general purpose development, which would be best done using C++ or a .NET language, such as C#. If you're absolutely wedded to C++/CLR for some reason, you may as well make use of the extensive .NET library so you don't have to pointlessly marshal managed types back and forth. See System::IO::FileStream for example.
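For example (a sketch in C# for brevity; the class and method names are mine, the same System.IO types are reachable from C++/CLR, and the file layout of a seed line followed by a password line mirrors the fixed code above):
using System;
using System.IO;

static class PasswordStore
{
    // Sketch: look the seed up in "passwords.txt" (seed and password on
    // alternating lines); if it is missing, append a freshly generated pair.
    public static string GetOrCreatePassword(string seedText, Func<string> generate)
    {
        const string path = "passwords.txt";
        string[] lines = File.Exists(path) ? File.ReadAllLines(path) : new string[0];
        for (int i = 0; i + 1 < lines.Length; i += 2)
        {
            if (lines[i] == seedText)
                return lines[i + 1];   // seed already known, reuse its password
        }
        string password = generate();
        File.AppendAllLines(path, new[] { seedText, password });
        return password;
    }
}
No manual EOF or stream-state handling is needed here, because the whole file is read once and the append is a separate call.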

How to set the line spacing between two chunks in iTextSharp

I am creating a PDF using iTextSharp. This is a reporting tool. Everything is working fine, except that the space between two chunks is slightly greater than what I want. I tried to find some help on Stack Overflow and learned about SetLeading(fixed, multiplied), but it is not available on a Chunk.
The reason I need it at the Chunk level is that I have multiple chunks which I add to a Paragraph, and then add everything to the Document in a single shot.
public static void createPDF(Paragraph para)
{
string imagepath = "12.pdf";
Document doc = new Document();
try
{
Paragraph p = para;
Rectangle[] COLUMNS = {
new Rectangle(36, 36, 290, 806),
new Rectangle(305, 36, 559, 806)
};
//This is what i have tried
// p.SetLeading(0.4f,0.8f);
p.SpacingBefore = 0.0f;
p.SpacingAfter = 0.1f;
PdfReader inputPdf = new PdfReader(@"");
PdfWriter writer2 = PdfWriter.GetInstance(doc, new FileStream(imagepath, FileMode.Create));
doc.Open();
PdfContentByte canvas = writer2.DirectContent;
for (int ij = 1; ij <= 3; ij++)
{
doc.SetPageSize(inputPdf.GetPageSizeWithRotation(ij));
doc.NewPage();
PdfImportedPage page = writer2.GetImportedPage(inputPdf, ij);
int rotation = inputPdf.GetPageRotation(ij);
if (rotation == 90 || rotation == 270)
{
canvas.AddTemplate(page, 0, -1f, 1f, 0, 0, inputPdf.GetPageSizeWithRotation(ij).Height);
}
else
{
canvas.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
}
doc.NewPage();
ColumnText ct = new ColumnText(canvas);
int side_of_the_page = 0;
ct.SetSimpleColumn(COLUMNS[side_of_the_page]);
int paragraphs = 0;
int i = 0;
while (paragraphs < p.Count-1)
{
string TEXT = p[i].ToString();
ct.AddElement(p[i]);
while (ColumnText.HasMoreText(ct.Go()))
{
if (side_of_the_page == 0)
{
side_of_the_page = 1;
canvas.MoveTo(297.5f, 36);
canvas.LineTo(297.5f, 806);
canvas.Stroke();
}
else
{
side_of_the_page = 0;
doc.NewPage();
}
ct.SetSimpleColumn(COLUMNS[side_of_the_page]);
}
i++;
paragraphs++;
}
doc.Close();
}
catch {
}
}
Please read chapter 2 of my book. The Chunk object is called the atomic building block among iText's high-level objects. By design, you cannot define a leading on the level of a Chunk.
I quote from page 23:
A Chunk isn't aware of the space that is needed between two lines.
The leading is defined at the level of a Phrase (and, of course, its subclasses, such as Paragraph). If you want to change the spacing between Chunk objects, you need to wrap the Chunks in Phrases or Paragraphs (as you already indicate) and define the leading for those phrases or paragraphs.
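In iTextSharp that could look roughly like this (doc is assumed to be your open Document; the font and the leading values are only examples):
// Control the spacing on the Paragraph that wraps the chunks,
// not on the individual Chunk objects.
Font font = FontFactory.GetFont(FontFactory.HELVETICA, 10);
Paragraph wrapped = new Paragraph();
wrapped.SetLeading(0f, 1.1f);   // fixed leading 0, multiplied leading 1.1 x font size
wrapped.Add(new Chunk("First chunk. ", font));
wrapped.Add(new Chunk("Second chunk.", font));
doc.Add(wrapped);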
Note that the documentation also states:
In normal circumstances you'll use Chunk objects to compose other text objects, such as Phrases and Paragraphs. Typically, you won't add Chunk objects directly to a Document.
Which special circumstance do you have that requires making an exception to this rule?
Extra remarks
You are importing an existing PDF in a way that throws away all existing interactivity. This is suboptimal.
You first compose a paragraph p and set the leading for p; then you decompose p, throwing away the leading you've defined, and then you complain that there's no leading.
This is what you are doing wrong:
while (paragraphs < p.Count-1)
{
ct.AddElement(p[i]);
...
}
The object p knows its leading; the separate components of this object (p[0], p[1],...), don't know anything about the leading.
Hence you should do something like this:
ColumnText ct = new ColumnText(canvas);
int side_of_the_page = 0;
ct.SetSimpleColumn(COLUMNS[side_of_the_page]);
ct.AddElement(p);
while (ColumnText.HasMoreText(ct.Go()))
{
if (side_of_the_page == 0)
{
side_of_the_page = 1;
canvas.MoveTo(297.5f, 36);
canvas.LineTo(297.5f, 806);
canvas.Stroke();
}
else
{
side_of_the_page = 0;
doc.NewPage();
}
ct.SetSimpleColumn(COLUMNS[side_of_the_page]);
}
As you have defined the leading at the level of the p object, you must add the p object as an element to the ColumnText.
Regarding the wrong way you're copying the original document: the AddLongTable example shows how to do it correctly. You get a PdfReader object for the existing document. You create a PdfStamper to create a new document. You get the number of pages in the existing document, and then you use insertPage() as many times as needed to add extra content.
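A rough sketch of that pattern in iTextSharp (file names and the column rectangle are placeholders; p is the paragraph whose leading you defined):
PdfReader reader = new PdfReader(@"existing.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileStream("result.pdf", FileMode.Create));
ColumnText ct = new ColumnText(null);
ct.AddElement(p);                        // p keeps its leading
int pagecount = reader.NumberOfPages;    // the original pages stay untouched
int status;
do
{
    stamper.InsertPage(++pagecount, PageSize.A4);
    ct.Canvas = stamper.GetOverContent(pagecount);
    ct.SetSimpleColumn(new Rectangle(36, 36, 559, 806));
    status = ct.Go();
} while (ColumnText.HasMoreText(status));
stamper.Close();
reader.Close();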

Progress 10.1C 4GL Encode Function

Does anyone know which algorithm Progress 10.1C uses in the Encode Function?
I found this: http://knowledgebase.progress.com/articles/Article/21685
The Progress 4GL ENCODE function uses a CRC-16 algorithm to generate its encoded output.
Progress 4GL:
ENCODE("Test").
gives as output "LkwidblanjsipkJC"
But, for example, on http://www.nitrxgen.net/hashgen/ with the password "Test", I never get the same result as from Progress.
Any ideas? :)
I've made the algorithm available on https://github.com/pvginkel/ProgressEncode.
I needed this function in Java. So I ported Pieter's C# code (https://github.com/pvginkel/ProgressEncode) to Java. At least all test cases passed. Enjoy! :)
public class ProgressEncode {
static int[] table = { 0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0,
0x0280, 0xC241, 0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1,
0xC481, 0x0440, 0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1,
0xCE81, 0x0E40, 0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0,
0x0880, 0xC841, 0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1,
0xDA81, 0x1A40, 0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0,
0x1C80, 0xDC41, 0x1400, 0xD4C1, 0xD581, 0x1540, 0xD701, 0x17C0,
0x1680, 0xD641, 0xD201, 0x12C0, 0x1380, 0xD341, 0x1100, 0xD1C1,
0xD081, 0x1040, 0xF001, 0x30C0, 0x3180, 0xF141, 0x3300, 0xF3C1,
0xF281, 0x3240, 0x3600, 0xF6C1, 0xF781, 0x3740, 0xF501, 0x35C0,
0x3480, 0xF441, 0x3C00, 0xFCC1, 0xFD81, 0x3D40, 0xFF01, 0x3FC0,
0x3E80, 0xFE41, 0xFA01, 0x3AC0, 0x3B80, 0xFB41, 0x3900, 0xF9C1,
0xF881, 0x3840, 0x2800, 0xE8C1, 0xE981, 0x2940, 0xEB01, 0x2BC0,
0x2A80, 0xEA41, 0xEE01, 0x2EC0, 0x2F80, 0xEF41, 0x2D00, 0xEDC1,
0xEC81, 0x2C40, 0xE401, 0x24C0, 0x2580, 0xE541, 0x2700, 0xE7C1,
0xE681, 0x2640, 0x2200, 0xE2C1, 0xE381, 0x2340, 0xE101, 0x21C0,
0x2080, 0xE041, 0xA001, 0x60C0, 0x6180, 0xA141, 0x6300, 0xA3C1,
0xA281, 0x6240, 0x6600, 0xA6C1, 0xA781, 0x6740, 0xA501, 0x65C0,
0x6480, 0xA441, 0x6C00, 0xACC1, 0xAD81, 0x6D40, 0xAF01, 0x6FC0,
0x6E80, 0xAE41, 0xAA01, 0x6AC0, 0x6B80, 0xAB41, 0x6900, 0xA9C1,
0xA881, 0x6840, 0x7800, 0xB8C1, 0xB981, 0x7940, 0xBB01, 0x7BC0,
0x7A80, 0xBA41, 0xBE01, 0x7EC0, 0x7F80, 0xBF41, 0x7D00, 0xBDC1,
0xBC81, 0x7C40, 0xB401, 0x74C0, 0x7580, 0xB541, 0x7700, 0xB7C1,
0xB681, 0x7640, 0x7200, 0xB2C1, 0xB381, 0x7340, 0xB101, 0x71C0,
0x7080, 0xB041, 0x5000, 0x90C1, 0x9181, 0x5140, 0x9301, 0x53C0,
0x5280, 0x9241, 0x9601, 0x56C0, 0x5780, 0x9741, 0x5500, 0x95C1,
0x9481, 0x5440, 0x9C01, 0x5CC0, 0x5D80, 0x9D41, 0x5F00, 0x9FC1,
0x9E81, 0x5E40, 0x5A00, 0x9AC1, 0x9B81, 0x5B40, 0x9901, 0x59C0,
0x5880, 0x9841, 0x8801, 0x48C0, 0x4980, 0x8941, 0x4B00, 0x8BC1,
0x8A81, 0x4A40, 0x4E00, 0x8EC1, 0x8F81, 0x4F40, 0x8D01, 0x4DC0,
0x4C80, 0x8C41, 0x4400, 0x84C1, 0x8581, 0x4540, 0x8701, 0x47C0,
0x4680, 0x8641, 0x8201, 0x42C0, 0x4380, 0x8341, 0x4100, 0x81C1,
0x8081, 0x4040 };
public static byte[] Encode(byte[] input) {
if (input == null)
return null;
byte[] scratch = new byte[16];
int hash = 17;
for (int i = 0; i < 5; i++) {
for (int j = 0; j < input.length; j++)
scratch[15 - (j % 16)] ^= input[j];
for (int j = 0; j < 16; j += 2) {
hash = Hash(scratch, hash);
scratch[j] = (byte) (hash & 0xFF);
scratch[j + 1] = (byte) ((hash >>> 8) & 0xFF);
}
}
byte[] target = new byte[16];
for (int i = 0; i < 16; i++) {
byte lower = (byte) (scratch[i] & 0x7F);
if ((lower >= 'A' && lower <= 'Z') || (lower >= 'a' && lower <= 'z'))
target[i] = lower;
else
target[i] = (byte) (((scratch[i] >>> 4 & 0xF) + 0x61) & 0xFF);
}
return target;
}
private static int Hash(byte[] scratch, int hash) {
for (int i = 15; i >= 0; i--)
hash = ((hash >>> 8) & 0xFF ^ table[hash & 0xFF] ^ table[scratch[i] & 0xFF]) & 0xFFFF;
return hash;
}
}
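If you use Pieter's C# version instead, calling it should look roughly like this, assuming his class exposes the same static Encode(byte[]) as the Java port above (check the repository for the exact signature):
using System;
using System.Text;

class EncodeDemo
{
    static void Main()
    {
        // The question states that ENCODE("Test") yields "LkwidblanjsipkJC".
        byte[] encoded = ProgressEncode.Encode(Encoding.ASCII.GetBytes("Test"));
        Console.WriteLine(Encoding.ASCII.GetString(encoded));
    }
}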
There are several implementations of CRC-16. Progress Software (deliberately) does not document which variant is used.
For what purpose are you looking for this?
Rather than trying to use "encode" I'd recommend studying OE's cryptography functionality. I'm not sure what 10.1C supports; the 11.0 docs I have say OE supports:
• DES — Data Encryption Standard
• DES3 — Triple DES
• AES — Advanced Encryption Standard
• RC4 — Also known as ARC4
The OE PDF docs are available here:
http://communities.progress.com/pcom/docs/DOC-16074
The ENCODE function only works one way. Progress has never disclosed the algorithm behind it, and they have never built in a function to decode it.
As of OE 10.0B, Progress has implemented cryptography within the ABL. Have a look at the ENCRYPT and DECRYPT functions.

What is the fastest way to compare two byte arrays?

I am trying to compare two long byte arrays in VB.NET and have run into a snag. Comparing two 50 megabyte files takes almost two minutes, so I'm clearly doing something wrong. I'm on an x64 machine with tons of memory, so there are no issues there. Here is the code that I'm using at the moment and would like to change.
_Bytes and item.Bytes are the two different arrays to compare and are already the same length.
For Each B In item.Bytes
If B <> _Bytes(I) Then
Mismatch = True
Exit For
End If
I += 1
Next
I need to be able to compare, as fast as possible, files that are potentially hundreds of megabytes or even a gigabyte or two. Any suggestions or algorithms that would do this faster?
item.Bytes is an object taken from the database/filesystem that is returned for comparison because its byte length matches the item the user wants to add. By comparing the two arrays I can then determine whether the user has added something new to the DB; if not, I can just map them to the existing file and not waste hard disk space.
[Update]
I converted the arrays to local variables of type Byte() and then did the same comparison with the same code, and it ran in about one second (I still have to benchmark it and compare it to the other suggestions). But if you do the same thing with local variables and use a generic array, it becomes massively slower. I'm not sure why, but it raises a lot more questions for me about the use of arrays.
What is the _Bytes(I) call doing? It's not loading the file each time, is it? Even with buffering, that would be bad news!
There will be plenty of ways to micro-optimise this in terms of looking at longs at a time, potentially using unsafe code etc - but I'd just concentrate on getting reasonable performance first. Clearly there's something very odd going on.
I suggest you extract the comparison code into a separate function which takes two byte arrays. That way you know you won't be doing anything odd. I'd also use a simple For loop rather than For Each in this case - it'll be simpler. Oh, and check whether the lengths are correct first :)
EDIT: Here's the code (untested, but simple enough) that I'd use. It's in C# for the minute - I'll convert it in a sec:
public static bool Equals(byte[] first, byte[] second)
{
if (first == second)
{
return true;
}
if (first == null || second == null)
{
return false;
}
if (first.Length != second.Length)
{
return false;
}
for (int i=0; i < first.Length; i++)
{
if (first[i] != second[i])
{
return false;
}
}
return true;
}
EDIT: And here's the VB:
Public Shared Function ArraysEqual(ByVal first As Byte(), _
ByVal second As Byte()) As Boolean
If (first Is second) Then
Return True
End If
If (first Is Nothing OrElse second Is Nothing) Then
Return False
End If
If (first.Length <> second.Length) Then
Return False
End If
For i as Integer = 0 To first.Length - 1
If (first(i) <> second(i)) Then
Return False
End If
Next i
Return True
End Function
The fastest way to compare two byte arrays of equal size is to use interop. Run the following code on a console application:
using System;
using System.Runtime.InteropServices;
using System.Security;
namespace CompareByteArray
{
class Program
{
static void Main(string[] args)
{
const int SIZE = 100000;
const int TEST_COUNT = 100;
byte[] arrayA = new byte[SIZE];
byte[] arrayB = new byte[SIZE];
for (int i = 0; i < SIZE; i++)
{
arrayA[i] = 0x22;
arrayB[i] = 0x22;
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Safe(arrayA, arrayB, (UIntPtr)SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Safe: {0}", after - before);
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Unsafe(arrayA, arrayB, (UIntPtr)SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Unsafe: {0}", after - before);
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Pure(arrayA, arrayB, SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Pure: {0}", after - before);
}
return;
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint="memcmp", ExactSpelling=true)]
[SuppressUnmanagedCodeSecurity]
static extern int memcmp_1(byte[] b1, byte[] b2, UIntPtr count);
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "memcmp", ExactSpelling = true)]
[SuppressUnmanagedCodeSecurity]
static extern unsafe int memcmp_2(byte* b1, byte* b2, UIntPtr count);
public static int MemCmp_Safe(byte[] a, byte[] b, UIntPtr count)
{
return memcmp_1(a, b, count);
}
public unsafe static int MemCmp_Unsafe(byte[] a, byte[] b, UIntPtr count)
{
fixed(byte* p_a = a)
{
fixed (byte* p_b = b)
{
return memcmp_2(p_a, p_b, count);
}
}
}
public static int MemCmp_Pure(byte[] a, byte[] b, int count)
{
int result = 0;
for (int i = 0; i < count && result == 0; i += 1)
{
result = a[i] - b[i];
}
return result;
}
}
}
If you don't need to know which byte differs, compare 64-bit ints; that gives you 8 bytes at once. And you can still figure out the offending byte once you've isolated it to a group of 8.
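A sketch of that idea (comparing 8 bytes at a time via BitConverter, then finishing the 0-7 leftover bytes individually):
using System;

static class FastCompare
{
    // Compare two byte arrays 8 bytes at a time.
    public static bool AreEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
            return false;
        int i = 0;
        int last = a.Length - (a.Length % 8);
        for (; i < last; i += 8)
        {
            if (BitConverter.ToInt64(a, i) != BitConverter.ToInt64(b, i))
                return false;   // the differing byte lies somewhere in this group of 8
        }
        for (; i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}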
Use BinaryReader:
saveTime = binReader.ReadInt32()
Or for arrays of ints:
Dim count As Integer = binReader.Read(testArray, 0, 3)
A better approach... If you are just trying to see whether the two are different, then save yourself some time going through the entire byte arrays: generate a hash of each byte array as a string and compare the strings. MD5 should work fine and is pretty efficient.
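For instance (a sketch; MD5 here is used purely as a fast fingerprint, not for security, and the real win in your scenario is that the hash could be stored in the DB and compared without re-reading the file):
using System.Linq;
using System.Security.Cryptography;

static class HashCompare
{
    // Compare two byte arrays by comparing their MD5 fingerprints.
    public static bool LooksEqual(byte[] a, byte[] b)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hashA = md5.ComputeHash(a);
            byte[] hashB = md5.ComputeHash(b);
            return hashA.SequenceEqual(hashB);
        }
    }
}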
I see two things that might help:
First, rather than always accessing the second array as item.Bytes, use a local variable to point directly at the array. That is, before starting the loop, do something like this:
array2 = item.Bytes
That will save the overhead of dereferencing from the object each time you want a byte. That could be expensive in Visual Basic, especially if there's a Getter method on that property.
Also, use a "definite loop" instead of "for each". You already know the length of the arrays, so just code the loop using that value. This will avoid the overhead of treating the array as a collection. The loop would look something like this:
For i As Integer = 0 To max - 1
    If array1(i) <> array2(i) Then
        Exit For
    End If
Next
Not strictly related to the comparison algorithm:
Are you sure your bottleneck is not related to the memory available and the time used to load the byte arrays? Loading two 2 GB byte arrays just to compare them could bring most machines to their knees. If the program design allows, try using streams to read smaller chunks instead.
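A sketch of such a chunked comparison (the paths and buffer size are placeholders):
using System.IO;
using System.Linq;

static class StreamCompare
{
    // Compare two files without loading either one fully into memory.
    public static bool FilesEqual(string pathA, string pathB)
    {
        const int bufferSize = 64 * 1024;
        using (FileStream fsA = File.OpenRead(pathA))
        using (FileStream fsB = File.OpenRead(pathB))
        {
            if (fsA.Length != fsB.Length)
                return false;
            byte[] bufA = new byte[bufferSize];
            byte[] bufB = new byte[bufferSize];
            int readA;
            while ((readA = fsA.Read(bufA, 0, bufferSize)) > 0)
            {
                int readB = 0;
                while (readB < readA)          // fill bufB with exactly readA bytes
                {
                    int r = fsB.Read(bufB, readB, readA - readB);
                    if (r == 0)
                        return false;          // unexpected early end of the second file
                    readB += r;
                }
                if (!bufA.Take(readA).SequenceEqual(bufB.Take(readA)))
                    return false;
            }
            return true;
        }
    }
}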