How To Stop JVM Skipping Loop

I have my own test class that is supposed to do timing without JVM deleting anything. Some example test times of 100,000,000 reps comparing the native that Java calls from StrictMath.sin() to my own:
30 degrees
sineNative(): 18,342,858 ns (#1), 1,574,331 ns (#10)
sinCosTanNew6(): 13,751,140 ns (#1), 1,569,848 ns (#10)
60 degrees
sineNative(): 2,520,327,020 ns (#1), 2,520,108,337 ns (#10)
sinCosTanNew6(): 12,935,959 ns (#1), 1,565,365 ns (#10)
From 30 to 60 degrees the native time skyrockets by a factor of about 137 while mine stays roughly constant. Also, some of the times are impossibly low even when repsDone == reps. I would expect them to be more than 1 ns per rep (> 1 * reps).
CPU: G3258 @ 4GHz
OS: Windows 7 HB SP1
Build Path: jre1.8.0_211
Reprex:
public final class MathTest {
private static int sysReps = 1_000_000;
private static double value = 0;
private static final double DRAD_ANGLE_30 = 0.52359877559829887307710723054658d;
private static final double DRAD_ANGLE_60 = 1.0471975511965977461542144610932d;
private static double sineNative(double angle ) {
int reps = sysReps * 100;
//int repsDone = 0;
value = 0;
long startTime, endTime, timeDif;
startTime = System.nanoTime();
for (int index = reps - 1; index >= 0; index--) {
value = Math.sin(angle);
//repsDone++;
}
endTime = System.nanoTime();
timeDif = endTime - startTime;
System.out.println("sineNative(): " + timeDif + "ns for " + reps + " sine " + value + " of angle " + angle);
//System.out.println("reps done: "+repsDone);
return value;
}
private static void testSines() {
sineNative(DRAD_ANGLE_30);
//sinCosTanNew6(IBIT_ANGLE_30);
}
/* Warm Up */
private static void repeatAll(int reps) {
for (int index = reps - 1; index >= 0; index--) {
testSines();
}
}
public static void main(String[] args) {
repeatAll(10);
}
}
I tried adding angle++ in the loop, and that multiplies the times to a more reasonable level, but it messes with the math. I need a way to trick the JVM into running all of the code all x times. Single-pass times are extremely volatile and calling System.nanoTime() itself takes time, so I need the average over a large number of reps.

The problem is that you never use the result returned by sineNative. The JIT compiler is clever enough to work out that the return value is unused, so it eventually eliminates the computation. A very simple way to fix this is to add a dummy check on the return value (e.g. if (Math.sin(angle) > 1) { System.out.println("Impossible!"); }).
If you are writing a benchmark like this, it would be useful to use something like JMH (https://github.com/openjdk/jmh), which provides a Blackhole that consumes your return values so that the JIT compiler cannot optimise the computation away (see the example at https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java).
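If pulling in JMH is not an option, a rough hand-rolled sketch of the same idea is to make every result feed a value that is eventually printed. The class name SinTiming and the volatile angle field are illustrative assumptions, not part of the question's code; the volatile read is there on the assumption that it stops the JIT from hoisting the loop-invariant Math.sin(angle) out of the loop. Timings from a loop like this are still far less trustworthy than JMH results.

```java
final class SinTiming {
    // Hypothetical field: volatile forces a fresh read of the angle on every iteration.
    private static volatile double angle = Math.PI / 6; // 30 degrees

    // Sums Math.sin(angle) over `reps` iterations so that every call feeds the return value.
    static double sumOfSines(int reps) {
        double sum = 0;
        for (int i = 0; i < reps; i++) {
            sum += Math.sin(angle);
        }
        return sum;
    }

    public static void main(String[] args) {
        int reps = 100_000_000;
        for (int run = 1; run <= 10; run++) {
            long start = System.nanoTime();
            double sum = sumOfSines(reps);
            long elapsed = System.nanoTime() - start;
            // Printing the sum makes the result observable, so the loop cannot simply be dropped.
            System.out.println("run " + run + ": " + elapsed + " ns, sum = " + sum);
        }
    }
}
```

The early runs include interpreter and compilation time, which is why the question's #1 and #10 timings differ so much; only the later runs are worth comparing.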

Related

What's the term for saving values of calculations instead of recalculating multiple times?

When you have code like this (written in java, but applicable to any similar language):
public static void main(String[] args) {
int total = 0;
for (int i = 0; i < 50; i++)
total += i * doStuff(i % 2); // multiplies i times doStuff(remainder of i / 2)
}
public static int doStuff(int i) {
// Lots of complicated calculations
}
You can see that there's room for improvement. doStuff(i % 2) only returns two different values - one for doStuff(0) on even numbers and one for doStuff(1) on odd numbers. Therefore you're wasting a lot of computation time/power on recalculating those values each time by saying doStuff(i % 2). You can improve like this:
public static void main(String[] args) {
int total = 0;
boolean[] alreadyCalculated = new boolean[2];
int[] results = new int[2];
for (int i = 0; i < 50; i++) {
if (!alreadyCalculated[i % 2]) {
results[i % 2] = doStuff(i % 2);
alreadyCalculated[i % 2] = true;
}
total += i * results[i % 2];
}
}
Now it accesses a stored value instead of recalculating each time. It might seem silly to keep arrays like that, but for cases like looping from, say, i = 0, i < 500 and you're checking i % 32 each time, or something, an array is an elegant approach.
Is there a term for this kind of code optimization? I'd like to read up more on the different forms and the conventions of it but I'm lacking a concise description.
Is there a term for this kind of code optimization?
Yes, there is:
In computing, memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
https://en.wikipedia.org/wiki/Memoization
Common-subexpression-elimination (CSE) is related to this. This case is a combination of that and hoisting a loop-invariant calculation out of a loop.
I'd agree with CBroe that you could call this specific form of caching memoization, especially the way you're implementing it with the clunky alreadyCalculated array. You can optimize that away since you know which calls will be new values and which will be repeats. Normally you'd implement memoization with a static array inside the called function, for the benefit of all callers. Ideally there's a sentinel value you can use to mark entries which don't have a result computed yet, instead of maintaining a separate array for that. Or for a sparse set of input values, just use a hash table (instead of e.g. an array with 2^32 entries).
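As a minimal sketch of the hash-based variant, the cache can live next to the function so that all callers benefit. The names MemoDemo, compute and calls are made up for illustration, and compute is only a cheap stand-in for the "lots of complicated calculations" in the question.

```java
import java.util.HashMap;
import java.util.Map;

final class MemoDemo {
    // Cache shared by all callers; a missing key plays the role of the sentinel value.
    private static final Map<Integer, Integer> CACHE = new HashMap<>();
    static int calls = 0; // counts real computations, for illustration only

    // Hypothetical stand-in for the expensive calculation.
    private static int compute(int i) {
        calls++;
        return (i + 5) << 1;
    }

    // Memoized entry point: computes each distinct input once, then reuses the stored result.
    static int doStuff(int i) {
        return CACHE.computeIfAbsent(i, MemoDemo::compute);
    }

    static int total(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i * doStuff(i % 2);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total = " + total(50) + ", real computations = " + calls);
    }
}
```

This version is not thread-safe; with several threads a ConcurrentHashMap would be the usual substitute.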
You can also avoid the if in the main loop.
public class Optim
{
public static int doStuff(int i) { return (i+5) << 1; }
public static void main(String[] args)
{
int total = 0;
int results[] = new int[2];
// more interesting if we pretend the loop count isn't known to be > 1, so avoiding calling doStuff(1) for n=1 is useful.
// otherwise you'd just do int[] results = { doStuff(0), doStuff(1) };
int n = 50;
for (int i = 0 ; i < Math.min(n, 2) ; i++) {
results[i] = doStuff(i);
total += i * results[i];
}
for (int i = 2; i < n; i++) { // runs zero times if n < 2
total += i * results[i % 2];
}
System.out.print(total);
}
}
Of course, in this case we can optimize a lot further. sum(0..n) = n * (n+1) / 2, so we can use that to get a closed-form (non-looping) solution in terms of doStuff(0) (sum of the even terms) and doStuff(1) (sum of the odd terms). So we only need the two doStuff() results once each, avoiding any need to memoize.
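A sketch of that closed form, checked against the plain loop, might look like this. The class ClosedForm and its method names are assumptions for illustration; doStuff is the same toy function as in the Optim example. It relies on 0 + 2 + ... + 2(k-1) = k(k-1) and 1 + 3 + ... + (2m-1) = m^2.

```java
final class ClosedForm {
    static int doStuff(int i) { return (i + 5) << 1; }

    // Reference implementation: the straightforward loop from the question.
    static long loopTotal(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * doStuff(i % 2);
        }
        return total;
    }

    // Closed form: split the sum over i in [0, n) into even and odd terms.
    static long closedFormTotal(int n) {
        if (n <= 0) return 0;
        long evens = (n + 1) / 2;           // how many even i there are in [0, n)
        long odds = n / 2;                  // how many odd i there are in [0, n)
        long sumEven = evens * (evens - 1); // 0 + 2 + ... + 2(evens - 1)
        long sumOdd = odds * odds;          // 1 + 3 + ... + (2 * odds - 1)
        long total = 0;
        if (sumEven > 0) total += sumEven * doStuff(0); // skip the call when it cannot matter
        if (sumOdd > 0) total += sumOdd * doStuff(1);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(loopTotal(50) + " vs " + closedFormTotal(50));
    }
}
```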

IntelliSense: "#using" requires C++/CLI to be enabled

#using <mscorlib.dll>
#using <System.dll>
using namespace System;
using namespace System::Text;
using namespace System::IO;
using namespace System::Net;
using namespace System::Net::Sockets;
using namespace System::Collections;
Errors: IntelliSense: "#using" requires C++/CLI to be enabled....
How do I fix this problem?
Your project settings are wrong. Specifically Configuration Properties, General, Common Language Runtime support.
Fall in the pit of success by starting your project by picking one of the project templates in the CLR node.
Choose Project -> Properties from the menu bar. In the Project properties window, under Configuration Properties -> General, make sure that Common Language Runtime Support is set to Common Language Runtime Support (/clr)
In VS2019 the steps would be:
1/ Right click on the project
2/ Project
3/ Properties
4/ Configuration Properties
5/ Advanced
6/ Common Language Runtime Support change it to Common Language Runtime Support(/clr)
Enable it in your project settings (right-click on the project -> Settings); the first tab should provide the option.
The MSDN has a nice example for testing the difference in performance, Parse vs tryParse:
Stopwatch Example
#include <stdio.h>
#using <System.dll>
using namespace System;
using namespace System::Diagnostics;
void DisplayTimerProperties()
{
// Display the timer frequency and resolution.
if (Stopwatch::IsHighResolution)
{
Console::WriteLine("Operations timed using the system's high-resolution performance counter.");
}
else
{
Console::WriteLine("Operations timed using the DateTime class.");
}
Int64 frequency = Stopwatch::Frequency;
Console::WriteLine(" Timer frequency in ticks per second = {0}", frequency);
Int64 nanosecPerTick = (1000L * 1000L * 1000L) / frequency;
Console::WriteLine(" Timer is accurate within {0} nanoseconds", nanosecPerTick);
}
void TimeOperations()
{
Int64 nanosecPerTick = (1000L * 1000L * 1000L) / Stopwatch::Frequency;
const long numIterations = 10000;
// Define the operation title names.
array<String^>^operationNames = { "Operation: Int32.Parse(\"0\")","Operation: Int32.TryParse(\"0\")","Operation: Int32.Parse(\"a\")","Operation: Int32.TryParse(\"a\")" };
// Time four different implementations for parsing
// an integer from a string.
for (int operation = 0; operation <= 3; operation++)
{
// Define variables for operation statistics.
Int64 numTicks = 0;
Int64 numRollovers = 0;
Int64 maxTicks = 0;
Int64 minTicks = Int64::MaxValue;
int indexFastest = -1;
int indexSlowest = -1;
Int64 milliSec = 0;
Stopwatch ^ time10kOperations = Stopwatch::StartNew();
// Run the current operation 10001 times.
// The first execution time will be tossed
// out, since it can skew the average time.
for (int i = 0; i <= numIterations; i++)
{
Int64 ticksThisTime = 0;
int inputNum;
Stopwatch ^ timePerParse;
switch (operation)
{
case 0:
// Parse a valid integer using
// a try-catch statement.
// Start a new stopwatch timer.
timePerParse = Stopwatch::StartNew();
try
{
inputNum = Int32::Parse("0");
}
catch (FormatException^)
{
inputNum = 0;
}
// Stop the timer, and save the
// elapsed ticks for the operation.
timePerParse->Stop();
ticksThisTime = timePerParse->ElapsedTicks;
break;
case 1:
// Parse a valid integer using
// the TryParse statement.
// Start a new stopwatch timer.
timePerParse = Stopwatch::StartNew();
if (!Int32::TryParse("0", inputNum))
{
inputNum = 0;
}
// Stop the timer, and save the
// elapsed ticks for the operation.
timePerParse->Stop();
ticksThisTime = timePerParse->ElapsedTicks;
break;
case 2:
// Parse an invalid value using
// a try-catch statement.
// Start a new stopwatch timer.
timePerParse = Stopwatch::StartNew();
try
{
inputNum = Int32::Parse("a");
}
catch (FormatException^)
{
inputNum = 0;
}
// Stop the timer, and save the
// elapsed ticks for the operation.
timePerParse->Stop();
ticksThisTime = timePerParse->ElapsedTicks;
break;
case 3:
// Parse an invalid value using
// the TryParse statement.
// Start a new stopwatch timer.
timePerParse = Stopwatch::StartNew();
if (!Int32::TryParse("a", inputNum))
{
inputNum = 0;
}
// Stop the timer, and save the
// elapsed ticks for the operation.
timePerParse->Stop();
ticksThisTime = timePerParse->ElapsedTicks;
break;
default:
break;
}
// Skip over the time for the first operation,
// just in case it caused a one-time
// performance hit.
if (i == 0)
{
time10kOperations->Reset();
time10kOperations->Start();
}
else
{
// Update operation statistics
// for iterations 1-10000.
if (maxTicks < ticksThisTime)
{
indexSlowest = i;
maxTicks = ticksThisTime;
}
if (minTicks > ticksThisTime)
{
indexFastest = i;
minTicks = ticksThisTime;
}
numTicks += ticksThisTime;
if (numTicks < ticksThisTime)
{
// Keep track of rollovers.
numRollovers++;
}
}
}
// Display the statistics for 10000 iterations.
time10kOperations->Stop();
milliSec = time10kOperations->ElapsedMilliseconds;
Console::WriteLine();
Console::WriteLine("{0} Summary:", operationNames[operation]);
Console::WriteLine(" Slowest time: #{0}/{1} = {2} ticks", indexSlowest, numIterations, maxTicks);
Console::WriteLine(" Fastest time: #{0}/{1} = {2} ticks", indexFastest, numIterations, minTicks);
Console::WriteLine(" Average time: {0} ticks = {1} nanoseconds", numTicks / numIterations, (numTicks * nanosecPerTick) / numIterations);
Console::WriteLine(" Total time looping through {0} operations: {1} milliseconds", numIterations, milliSec);
}
}
int main()
{
DisplayTimerProperties();
Console::WriteLine();
Console::WriteLine("Press the Enter key to begin:");
Console::ReadLine();
Console::WriteLine();
TimeOperations();
getchar();
}
//Operations timed using the system's high-resolution performance counter.
//Timer frequency in ticks per second = 3319338
//Timer is accurate within 301 nanoseconds
//
//Press the Enter key to begin :
//
//
//
//Operation : Int32.Parse("0") Summary :
// Slowest time : #4483 / 10000 = 95 ticks
// Fastest time : #3 / 10000 = 0 ticks
// Average time : 0 ticks = 99 nanoseconds
// Total time looping through 10000 operations : 1 milliseconds
//
// Operation : Int32.TryParse("0") Summary :
// Slowest time : #7720 / 10000 = 187 ticks
// Fastest time : #1 / 10000 = 0 ticks
// Average time : 0 ticks = 109 nanoseconds
// Total time looping through 10000 operations : 1 milliseconds
//
// Operation : Int32.Parse("a") Summary :
// Slowest time : #3701 / 10000 = 2388 ticks
// Fastest time : #2698 / 10000 = 102 ticks
// Average time : 116 ticks = 35109 nanoseconds
// Total time looping through 10000 operations : 352 milliseconds
//
// Operation : Int32.TryParse("a") Summary :
// Slowest time : #8593 / 10000 = 23 ticks
// Fastest time : #1 / 10000 = 0 ticks
// Average time : 0 ticks = 88 nanoseconds
// Total time looping through 10000 operations : 1 milliseconds
If you are using Visual Studio, you might have to install some components beforehand. To install them, open the Visual Studio Installer from the Windows Start menu. Make sure that the Desktop development with C++ tile is checked, and in the Optional components section, also check C++/CLI Support.

CPU Usage (%) MBean on Sun JVM

The overview tab of a process on jconsole shows me the CPU Usage percentage. Is there a MBean that gives me this value? What is its ObjectName?
Update: In Java 7 you can do it like so:
public static double getProcessCpuLoad() throws MalformedObjectNameException, ReflectionException, InstanceNotFoundException {
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
ObjectName name = ObjectName.getInstance("java.lang:type=OperatingSystem");
AttributeList list = mbs.getAttributes(name, new String[]{ "ProcessCpuLoad" });
if (list.isEmpty()) return Double.NaN;
Attribute att = (Attribute)list.get(0);
Double value = (Double)att.getValue();
if (value == -1.0) return Double.NaN;
return ((int)(value * 1000) / 10.0); // returns a percentage value with 1 decimal point precision
}
----- original answer below -----
In Java 7 you can use the hidden methods of com.sun.management.OperatingSystemMXBean:
getProcessCpuLoad() // returns the CPU usage of the JVM
getSystemCpuLoad() // returns the CPU usage of the whole system
Both values are returned as a double between 0.0 and 1.0 so simply multiply by 100 to get a percentage.
com.sun.management.OperatingSystemMXBean osBean = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
System.out.println(osBean.getProcessCpuLoad() * 100);
System.out.println(osBean.getSystemCpuLoad() * 100);
Since these are hidden, undocumented methods that exist in the com.sun.management.OperatingSystemMXBean interface and not in java.lang.management.OperatingSystemMXBean, there is a risk that they will not be available in some JVMs or in future updates, so you should decide whether you're willing to take that risk.
See https://www.java.net/community-item/hidden-java-7-features-%E2%80%93-system-and-process-cpu-load-monitoring for more.
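One way to limit that risk is to check for the com.sun.management interface at run time and fall back gracefully. The sketch below assumes a HotSpot/OpenJDK-style JVM; the class name CpuLoad and the NaN fallback are illustrative choices, not part of any API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class CpuLoad {
    // Returns the JVM process CPU load as a percentage, or NaN when it is unavailable.
    static double processCpuPercent() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            double load = ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
            // A negative value means the load is not available (yet).
            return load < 0 ? Double.NaN : load * 100.0;
        }
        return Double.NaN; // this JVM does not expose the extended bean
    }

    public static void main(String[] args) {
        System.out.println("Process CPU: " + processCpuPercent() + " %");
    }
}
```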
There does not seem to be a direct MBean within ManagementFactory. The closest is http://java.sun.com/javase/6/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage() which can be used to calculate the CPU used by the whole system.
However this URL has suggested a method based on the source code of jconsole
I modified some code from the internet as shown below; when I tested it, the result almost matched the output of the Linux ps command.
/** below is the code */
public float getCpuUsed() {
/** get a MXBean */
com.sun.management.OperatingSystemMXBean osMXBean =
(com.sun.management.OperatingSystemMXBean)
ManagementFactory.getOperatingSystemMXBean();
/** set old timestamp values */
long previousJvmProcessCpuTime = osMXBean.getProcessCpuTime();
int sleepTime = 350;
/** sleep for a while to use to calculate */
try {
TimeUnit.MILLISECONDS.sleep(sleepTime);
} catch (InterruptedException e) {
logger.error("InterruptedException occurred while MemoryCollector sleeping...");
}
/** elapsed process time is in nanoseconds */
long elapsedProcessCpuTime = osMXBean.getProcessCpuTime() - previousJvmProcessCpuTime;
/** elapsed uptime is in milliseconds */
long elapsedJvmUptime = sleepTime ;
/** total jvm uptime on all the available processors */
//long totalElapsedJvmUptime = elapsedJvmUptime * osMXBean.getAvailableProcessors() ;
long totalElapsedJvmUptime = elapsedJvmUptime;
//System.out.println("echo cpu processors num " + osMXBean.getAvailableProcessors());
/** calculate cpu usage as a percentage value
to convert nanoseconds to milliseconds divide it by 1000000 and to get a percentage multiply it by 100 */
float cpuUsage = elapsedProcessCpuTime / (totalElapsedJvmUptime * 10000F);
return Math.round(cpuUsage * 10) / 10F; // divide by 10F: dividing by the int 10 would truncate the decimal place
}
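The arithmetic in that method can be isolated into a small pure function, which makes the unit handling easier to check. In this sketch the class CpuPercent, the method percent and the processors parameter are assumptions for illustration; it also measures the elapsed wall time with System.nanoTime() instead of trusting the requested sleep time.

```java
import java.lang.management.ManagementFactory;

final class CpuPercent {
    // CPU usage in percent with one decimal place; both inputs are in nanoseconds.
    // Pass processors = 1 to get a "percent of one core" figure.
    static double percent(long cpuNanos, long wallNanos, int processors) {
        if (wallNanos <= 0 || processors <= 0) return Double.NaN;
        double pct = 100.0 * cpuNanos / ((double) wallNanos * processors);
        return Math.round(pct * 10) / 10.0;
    }

    public static void main(String[] args) throws InterruptedException {
        java.lang.management.OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof com.sun.management.OperatingSystemMXBean)) {
            System.out.println("Process CPU time is not exposed on this JVM");
            return;
        }
        com.sun.management.OperatingSystemMXBean sun = (com.sun.management.OperatingSystemMXBean) os;
        long cpuBefore = sun.getProcessCpuTime();
        long wallBefore = System.nanoTime();
        Thread.sleep(350);
        long cpuDelta = sun.getProcessCpuTime() - cpuBefore;
        long wallDelta = System.nanoTime() - wallBefore; // actual elapsed time, not the requested sleep
        System.out.println(percent(cpuDelta, wallDelta, 1) + " %");
    }
}
```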
If you are using a UNIX-based OS, it is much easier:
final OperatingSystemMXBean mxBean = ManagementFactory.getOperatingSystemMXBean();
if (mxBean instanceof UnixOperatingSystemMXBean) {
return ((UnixOperatingSystemMXBean) mxBean).getSystemCpuLoad() * 100.0;
}

generating 9 digit ids without database sequence

I'd like to create 9-digit numeric ids that are unique across machines. I'm currently using a database sequence for this, but am wondering if it could be done without one. The sequences will be used for X12 EDI transactions, so they don't have to be unique forever. Maybe even only unique for 24 hours.
My only idea:
Each server has a 2 digit server identifier.
Each server maintains a file that essentially keeps track of a local sequence.
id = <2 digit server identifier> + <7 digit sequence which wraps>
My biggest problem with this is what to do if the hard-drive fails. I wouldn't know where it left off.
All of my other ideas essentially end up re-creating a centralized database sequence.
Any thoughts?
Use the following format:
{XX}{dd}{HHmm}{N}
where {XX} is the machine number, {dd} is the day of the month, {HHmm} is the current time (24-hour), and {N} is a sequential number.
Recovering from a hard-drive crash will take more than a minute, so starting at 0 again is not a problem.
You can also replace {dd} with {ss} for seconds, depending on requirements. Uniqueness period vs. requests per minute.
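A minimal sketch of that layout, assuming a two-digit machine number and a single-digit {N} (which caps each machine at ten ids per minute), could be written as follows. The class NineDigitId and its format method are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class NineDigitId {
    // Day of month plus 24-hour time, always six digits.
    private static final DateTimeFormatter DAY_TIME =
            DateTimeFormatter.ofPattern("ddHHmm", Locale.ROOT);

    // Builds {XX}{dd}{HHmm}{N}: 2 + 2 + 4 + 1 = 9 digits.
    static String format(int machine, LocalDateTime now, int n) {
        if (machine < 0 || machine > 99) throw new IllegalArgumentException("machine must be 0..99");
        if (n < 0 || n > 9) throw new IllegalArgumentException("n must be 0..9");
        return String.format(Locale.ROOT, "%02d%s%d", machine, now.format(DAY_TIME), n);
    }

    public static void main(String[] args) {
        System.out.println(format(7, LocalDateTime.now(), 0));
    }
}
```

The caller still has to keep a per-minute counter for {N} and reset it when the minute changes; that bookkeeping is left out here.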
If the HD fails, you can just assign a new, unused 2-digit server identifier and be sure that the number is unique (for 24 hours at least).
How about generating GUIDs (ensures uniqueness) and then using some sort of hash function to turn the GUID into a 9-digit number?
Just off the top of my head...
Use a variation on:
md5(uniqid(rand(), true));
Just a thought.
In a recent project I also came across this requirement: generating an N-digit sequence number without any database.
This is actually a good interview question, because there are considerations around performance and software crash recovery. Further reading if interested.
The following code has these features:
Prefix each sequence with a prefix.
Sequence cache like Oracle Sequence.
Most importantly, there is recovery logic to resume sequence from software crash.
Complete implementation attached:
import java.util.concurrent.atomic.AtomicLong;
import org.apache.commons.lang.StringUtils;
/**
* This is a customized Sequence Generator which simulates Oracle DB Sequence Generator. However the master sequence
* is stored locally in the file as there is no access to Oracle database. The output format is "prefix" + number.
* <p>
* <u><b>Sample output:</b></u><br>
* 1. FixLengthIDSequence(null,null,15,0,99,0) will generate 15, 16, ... 99, 00<br>
* 2. FixLengthIDSequence(null,"K",1,1,99,0) will generate K01, K02, ... K99, K01<br>
* 3. FixLengthIDSequence(null,"SG",100,2,9999,100) will generate SG0100, SG0101, ... SG8057, (in case server crashes, the new init value will start from last cache value+1) SG8101, ... SG9999, SG0002<br>
*/
public final class FixLengthIDSequence {
private static String FNAME;
private static String PREFIX;
private static AtomicLong SEQ_ID;
private static long MINVALUE;
private static long MAXVALUE;
private static long CACHEVALUE;
// some internal working values.
private int iMaxLength; // max numeric length excluding prefix, for left padding zeros.
private long lNextSnapshot; // to keep track of when to update sequence value to file.
private static boolean bInit = false; // to enable ShutdownHook routine after program has properly initialized
static {
// Inspiration from http://stackoverflow.com/questions/22416826/sequence-generator-in-java-for-unique-id#35697336.
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
if (bInit) { // Without this, saveToLocal may hit NullPointerException.
saveToLocal(SEQ_ID.longValue());
}
}));
}
/**
* This POJO style constructor should be initialized via Spring Singleton. Otherwise, rewrite this constructor into Singleton design pattern.
*
* @param sFilename This is the absolute file path to store the sequence number. To reset the sequence, this file needs to be removed manually.
* @param prefix The hard-coded identifier.
* @param initvalue
* @param minvalue
* @param maxvalue
* @param cache
* @throws Exception
*/
public FixLengthIDSequence(String sFilename, String prefix, long initvalue, long minvalue, long maxvalue, int cache) throws Exception {
bInit = false;
FNAME = (sFilename==null)?"C:\\Temp\\sequence.txt":sFilename;
PREFIX = (prefix==null)?"":prefix;
SEQ_ID = new AtomicLong(initvalue);
MINVALUE = minvalue;
MAXVALUE = maxvalue; iMaxLength = Long.toString(MAXVALUE).length();
CACHEVALUE = (cache <= 0)?1:cache; lNextSnapshot = roundUpNumberByMultipleValue(initvalue, cache); // Internal cache is always 1, equals no cache.
// If sequence file exists and valid, restore the saved sequence.
java.io.File f = new java.io.File(FNAME);
if (f.exists()) {
String[] saSavedSequence = loadToString().split(",");
if (saSavedSequence.length != 6) {
throw new Exception("Local Sequence file is not valid");
}
PREFIX = saSavedSequence[0];
//SEQ_ID = new AtomicLong(Long.parseLong(saSavedSequence[1])); // savedInitValue
MINVALUE = Long.parseLong(saSavedSequence[2]);
MAXVALUE = Long.parseLong(saSavedSequence[3]); iMaxLength = Long.toString(MAXVALUE).length();
CACHEVALUE = Long.parseLong(saSavedSequence[4]);
lNextSnapshot = Long.parseLong(saSavedSequence[5]);
// For sequence number recovery
// The rule to determine to continue using SEQ_ID or lNextSnapshot as subsequent sequence number:
// If savedInitValue = savedSnapshot, it was saved by ShutdownHook -> use SEQ_ID.
// Else if saveInitValue < savedSnapshot, it was saved by periodic Snapshot -> use lNextSnapshot+1.
if (saSavedSequence[1].equals(saSavedSequence[5])) {
long previousSEQ = Long.parseLong(saSavedSequence[1]);
SEQ_ID = new AtomicLong(previousSEQ);
lNextSnapshot = roundUpNumberByMultipleValue(previousSEQ,CACHEVALUE);
} else {
SEQ_ID = new AtomicLong(lNextSnapshot+1); // SEQ_ID starts fresh from lNextSnapshot+1.
lNextSnapshot = roundUpNumberByMultipleValue(SEQ_ID.longValue(),CACHEVALUE);
}
}
// Catch invalid values.
if (minvalue < 0) {
throw new Exception("MINVALUE cannot be less than 0");
}
if (maxvalue < 0) {
throw new Exception("MAXVALUE cannot be less than 0");
}
if (minvalue >= maxvalue) {
throw new Exception("MINVALUE cannot be greater than MAXVALUE");
}
if (cache >= maxvalue) {
throw new Exception("CACHE value cannot be greater than MAXVALUE");
}
// Save the next Snapshot.
saveToLocal(lNextSnapshot);
bInit = true;
}
/**
* Equivalent to Oracle Sequence nextval.
* @return String because Next Value is usually left padded with zeros, e.g. "00001".
*/
public String nextVal() {
if (SEQ_ID.longValue() > MAXVALUE) {
SEQ_ID.set(MINVALUE);
lNextSnapshot = roundUpNumberByMultipleValue(MINVALUE,CACHEVALUE);
}
if (SEQ_ID.longValue() > lNextSnapshot) {
lNextSnapshot = roundUpNumberByMultipleValue(lNextSnapshot,CACHEVALUE);
saveToLocal(lNextSnapshot);
}
return PREFIX.concat(StringUtils.leftPad(Long.toString(SEQ_ID.getAndIncrement()),iMaxLength,"0"));
}
/**
* Store sequence value into the local file. This routine is called either by Snapshot or ShutdownHook routines.<br>
* If called by Snapshot, currentCount == Snapshot.<br>
* If called by ShutdownHook, currentCount == current SEQ_ID.
* @param currentCount - This value is inserted by either Snapshot or ShutdownHook routines.
*/
private static void saveToLocal (long currentCount) {
try (java.io.Writer w = new java.io.BufferedWriter(new java.io.OutputStreamWriter(new java.io.FileOutputStream(FNAME), "utf-8"))) {
w.write(PREFIX + "," + SEQ_ID.longValue() + "," + MINVALUE + "," + MAXVALUE + "," + CACHEVALUE + "," + currentCount);
w.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
/**
* Load the sequence file content into String.
* @return
*/
private String loadToString() {
try {
return new String(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(FNAME)));
} catch (Exception e) {
e.printStackTrace();
}
return "";
}
/**
* Utility method to round up num to next multiple value. This method is used to calculate the next cache value.
* <p>
* (Reference: http://stackoverflow.com/questions/18407634/rounding-up-to-the-nearest-hundred)
* <p>
* <u><b>Sample output:</b></u>
* <pre>
* System.out.println(roundUpNumberByMultipleValue(9,10)); = 10
* System.out.println(roundUpNumberByMultipleValue(10,10)); = 20
* System.out.println(roundUpNumberByMultipleValue(19,10)); = 20
* System.out.println(roundUpNumberByMultipleValue(100,10)); = 110
* System.out.println(roundUpNumberByMultipleValue(109,10)); = 110
* System.out.println(roundUpNumberByMultipleValue(110,10)); = 120
* System.out.println(roundUpNumberByMultipleValue(119,10)); = 120
* </pre>
*
* @param num Value must be greater than or equal to positive integer 1.
* @param multiple Value must be greater than or equal to positive integer 1.
* @return
*/
private long roundUpNumberByMultipleValue(long num, long multiple) {
if (num<=0) num=1;
if (multiple<=0) multiple=1;
if (num % multiple != 0) {
long division = (long) ((num / multiple) + 1);
return division * multiple;
} else {
return num + multiple;
}
}
/**
* Main method for testing purpose.
* @param args
*/
public static void main(String[] args) throws Exception {
//FixLengthIDSequence(Filename, prefix, initvalue, minvalue, maxvalue, cache)
FixLengthIDSequence seq = new FixLengthIDSequence(null,"H",50,1,999,10);
for (int i=0; i<12; i++) {
System.out.println(seq.nextVal());
Thread.sleep(1000);
//if (i==8) { System.exit(0); }
}
}
}
To test the code, let the sequence run normally. You can press Ctrl+C to simulate the server crash. The next sequence number will continue from NextSnapshot+1.
Could you use the first 9 digits of some other source of unique data, like:
a random number
System Time
Uptime
Having thought about it for two seconds, none of those are unique on their own, but you could use them as seed values for hash functions, as was suggested in another answer.

What is the fastest way to compare two byte arrays?

I am trying to compare two long byte arrays in VB.NET and have run into a snag. Comparing two 50 megabyte files takes almost two minutes, so I'm clearly doing something wrong. I'm on an x64 machine with tons of memory so there are no issues there. Here is the code that I'm using at the moment and would like to change.
_Bytes and item.Bytes are the two different arrays to compare and are already the same length.
For Each B In item.Bytes
If B <> _Bytes(I) Then
Mismatch = True
Exit For
End If
I += 1
Next
I need to be able to compare, as fast as possible, files that are potentially hundreds of megabytes and possibly even a gigabyte or two. Any suggestions or algorithms that would be able to do this faster?
Item.bytes is an object taken from the database/filesystem that is returned to compare, because its byte length matches the item that the user wants to add. By comparing the two arrays I can then determine if the user has added something new to the DB and if not then I can just map them to the other file and not waste hard disk drive space.
[Update]
I converted the arrays to local variables of Byte() and then did the same comparison, same code and it ran in like one second (I have to benchmark it still and compare it to others), but if you do the same thing with local variables and use a generic array it becomes massively slower. I’m not sure why, but it raises a lot more questions for me about the use of arrays.
What is the _Bytes(I) call doing? It's not loading the file each time, is it? Even with buffering, that would be bad news!
There will be plenty of ways to micro-optimise this in terms of looking at longs at a time, potentially using unsafe code etc - but I'd just concentrate on getting reasonable performance first. Clearly there's something very odd going on.
I suggest you extract the comparison code into a separate function which takes two byte arrays. That way you know you won't be doing anything odd. I'd also use a simple For loop rather than For Each in this case - it'll be simpler. Oh, and check whether the lengths are correct first :)
EDIT: Here's the code (untested, but simple enough) that I'd use. It's in C# for the minute - I'll convert it in a sec:
public static bool Equals(byte[] first, byte[] second)
{
if (first == second)
{
return true;
}
if (first == null || second == null)
{
return false;
}
if (first.Length != second.Length)
{
return false;
}
for (int i=0; i < first.Length; i++)
{
if (first[i] != second[i])
{
return false;
}
}
return true;
}
EDIT: And here's the VB:
Public Shared Function ArraysEqual(ByVal first As Byte(), _
ByVal second As Byte()) As Boolean
If (first Is second) Then
Return True
End If
If (first Is Nothing OrElse second Is Nothing) Then
Return False
End If
If (first.Length <> second.Length) Then
Return False
End If
For i as Integer = 0 To first.Length - 1
If (first(i) <> second(i)) Then
Return False
End If
Next i
Return True
End Function
The fastest way to compare two byte arrays of equal size is to use interop. Run the following code on a console application:
using System;
using System.Runtime.InteropServices;
using System.Security;
namespace CompareByteArray
{
class Program
{
static void Main(string[] args)
{
const int SIZE = 100000;
const int TEST_COUNT = 100;
byte[] arrayA = new byte[SIZE];
byte[] arrayB = new byte[SIZE];
for (int i = 0; i < SIZE; i++)
{
arrayA[i] = 0x22;
arrayB[i] = 0x22;
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Safe(arrayA, arrayB, (UIntPtr)SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Safe: {0}", after - before);
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Unsafe(arrayA, arrayB, (UIntPtr)SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Unsafe: {0}", after - before);
}
{
DateTime before = DateTime.Now;
for (int i = 0; i < TEST_COUNT; i++)
{
int result = MemCmp_Pure(arrayA, arrayB, SIZE);
if (result != 0) throw new Exception();
}
DateTime after = DateTime.Now;
Console.WriteLine("MemCmp_Pure: {0}", after - before);
}
return;
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint="memcmp", ExactSpelling=true)]
[SuppressUnmanagedCodeSecurity]
static extern int memcmp_1(byte[] b1, byte[] b2, UIntPtr count);
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "memcmp", ExactSpelling = true)]
[SuppressUnmanagedCodeSecurity]
static extern unsafe int memcmp_2(byte* b1, byte* b2, UIntPtr count);
public static int MemCmp_Safe(byte[] a, byte[] b, UIntPtr count)
{
return memcmp_1(a, b, count);
}
public unsafe static int MemCmp_Unsafe(byte[] a, byte[] b, UIntPtr count)
{
fixed(byte* p_a = a)
{
fixed (byte* p_b = b)
{
return memcmp_2(p_a, p_b, count);
}
}
}
public static int MemCmp_Pure(byte[] a, byte[] b, int count)
{
int result = 0;
for (int i = 0; i < count && result == 0; i += 1)
{
result = a[i] - b[i]; // compare the current element, not always element 0
}
return result;
}
}
}
If you don't need to know which byte differs, compare 64-bit integers, which gives you 8 bytes at once. Actually, you can still figure out the differing byte once you've isolated it to a set of 8.
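The same idea can be sketched in Java (rather than VB.NET) by reading both arrays as 64-bit values through ByteBuffer and dropping to a byte-wise scan only for the block that differs and for the tail. WideCompare and firstMismatch are hypothetical names, and whether this actually beats a plain loop or the built-in java.util.Arrays.equals on a given runtime is something to measure rather than assume.

```java
import java.nio.ByteBuffer;

final class WideCompare {
    // Returns the index of the first differing byte, or -1 if the arrays are identical.
    // Assumes both arrays have the same length, as in the question.
    static int firstMismatch(byte[] a, byte[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("lengths differ");
        ByteBuffer wa = ByteBuffer.wrap(a);
        ByteBuffer wb = ByteBuffer.wrap(b);
        int i = 0;
        int wideEnd = a.length - (a.length % 8); // last index that starts a full 8-byte block
        for (; i < wideEnd; i += 8) {
            if (wa.getLong(i) != wb.getLong(i)) break; // difference is somewhere in this block
        }
        // Narrow down within the differing block, or check the leftover tail bytes.
        for (; i < a.length; i++) {
            if (a[i] != b[i]) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] x = new byte[1 << 20];
        byte[] y = new byte[1 << 20];
        y[123_456] = 1;
        System.out.println("first mismatch at " + firstMismatch(x, y));
    }
}
```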
Use BinaryReader:
saveTime = binReader.ReadInt32()
Or for arrays of ints:
Dim count As Integer = binReader.Read(testArray, 0, 3)
Better approach: if you are just trying to see whether the two are different, generate a hash of each byte array as a string and compare the strings instead of comparing the arrays byte by byte. MD5 should work fine and is pretty efficient. Note that computing a hash still reads every byte, so this pays off mainly when the hash of the stored item has already been computed and saved.
I see two things that might help:
First, rather than always accessing the second array as item.Bytes, use a local variable to point directly at the array. That is, before starting the loop, do something like this:
array2 = item.Bytes
That will save the overhead of dereferencing from the object each time you want a byte. That could be expensive in Visual Basic, especially if there's a Getter method on that property.
Also, use a "definite loop" instead of "for each". You already know the length of the arrays, so just code the loop using that value. This will avoid the overhead of treating the array as a collection. The loop would look something like this:
' max is the (already known) length of the arrays; VB.NET arrays are zero-based
For i = 0 To max - 1
If array1(i) <> array2(i) Then
Exit For
End If
Next
Not strictly related to the comparison algorithm:
Are you sure your bottleneck is not related to the memory available and the time used to load the byte arrays? Loading two 2 GB byte arrays just to compare them could bring most machines to their knees. If the program design allows, try using streams to read smaller chunks instead.
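A rough sketch of that chunked approach, written in Java for illustration, reads both sources a block at a time and stops at the first difference, so neither file has to be loaded whole. StreamCompare, sameContent and the chunk size are assumptions, not anything from the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamCompare {
    // Reads until the buffer is full or the stream ends; returns the number of bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // True when both streams deliver exactly the same bytes.
    static boolean sameContent(InputStream x, InputStream y, int chunkSize) throws IOException {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        byte[] bx = new byte[chunkSize];
        byte[] by = new byte[chunkSize];
        while (true) {
            int nx = fill(x, bx);
            int ny = fill(y, by);
            if (nx != ny) return false; // one stream is shorter than the other
            for (int i = 0; i < nx; i++) {
                if (bx[i] != by[i]) return false;
            }
            if (nx < chunkSize) return true; // both streams ended at the same point
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        System.out.println(sameContent(new ByteArrayInputStream(data),
                new ByteArrayInputStream(data.clone()), 4));
    }
}
```

With real files the two streams would typically be buffered file streams, and the chunk size would be far larger than in this toy main method.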