Kotlin - Reading the least significant bit (Steganography)

I am creating a program that can read the least significant bit of an image file using Kotlin. I have a function that reads the bytes in a file, but I am unsure how to actually print the bytes in the function consumeArray.
My goal is to print the least significant bits of an image.
override fun run() {
    val buff = ByteArray(1230)
    File("src\\main\\kotlin\\day01_least_significant_bit_steganography\\eksempel_bakgrunnsbilde.png").inputStream().buffered().use { input ->
        while (true) {
            val sz = input.read(buff)
            if (sz <= 0) break
            // at this point we have sz bytes in buff to process
            consumeArray(buff, 0, sz)
        }
    }
} // run

private fun consumeArray(buff: ByteArray, i: Int, sz: Int) {
    println("??")
} // consumeArray

In Kotlin 1.4+ you can get the lowest set bit of any byte with the .takeLowestOneBit() method.
It may happen that it's equal to zero, so you need to iterate the byte array until a non-zero lowest bit is met (I believe this is what was meant by "least significant bit of byteArray"):
var lowestBit: Byte = 0
for (index in sz - 1 downTo 0) {
    val currentLowestBit = buff[index].takeLowestOneBit()
    if (currentLowestBit != 0.toByte()) {
        lowestBit = currentLowestBit
        break
    }
}
Note that this gives you the least significant bit of your buffer, not of the whole image (if the image is bigger than the buffer).
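If the goal is LSB steganography, you usually want bit 0 of every byte rather than the lowest set bit. A minimal sketch of consumeArray along those lines (assuming, as in the call above, that the chunk starts at index 0 and sz is the count of valid bytes):

private fun consumeArray(buff: ByteArray, from: Int, sz: Int) {
    for (i in from until sz) {
        print(buff[i].toInt() and 1)  // `and 1` masks off everything but the lowest bit
    }
    println()
}

Note that for a real PNG you would also have to decode the image first; reading the raw (compressed) file bytes won't give you pixel LSBs.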

Related

Kotlin - How to create a dynamic matrix?

I want to create a matrix of size n x n, where n is the length of the input message: String.
So far this is the only solution that came to my mind, and it has four for loops.
fun main() {
    println("Enter the message:")
    var message: String = readLine().toString()
    var cipher = Array(message.length) { Array<Int>(message.length) { 0 } }
    for (i in 0..(message.length - 1)) {
        for (j in 0..(message.length - 1)) {
            cipher[i][j] = readLine()!!.toInt()
        }
    }
    // print the matrix
    for (i in 0..(message.length - 1)) {
        for (j in 0..(message.length - 1)) {
            print(cipher[i][j])
        }
        println()
    }
}
Is there any less complex code for the same? How can I improve this code?
Assuming your input data is row-major, this can be simplified by moving the array filling logic into the array creation itself:
var cipher = Array(message.length) {
    IntArray(message.length) { readLine()!!.toInt() }
}
Array's constructor takes an initializer function that is invoked size times to populate the array. By reading user input here, you can populate the array while the matrix is being created and avoid having to write an extra loop.
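As a toy illustration (array contents invented here), the initializer lambda receives each index in turn:

val squares = Array(4) { i -> i * i }                        // [0, 1, 4, 9]
val grid = Array(2) { r -> IntArray(3) { c -> r * 3 + c } }  // rows [0,1,2] and [3,4,5]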
Miscellaneous notes:
readLine().toString() is redundant and possibly harmful. readLine returns a String?, and you invoke Any?.toString on it, which either returns the result of Any.toString if its receiver is not null, or the literal string "null" (which is probably not desired.)
Consider using the until infix function when looping over arrays (0 until length) rather than 0..(length - 1) (or, even better, the Array.indices extension property); there's a short example after the links below.
Consider using the corresponding primitive array type (i.e. IntArray, FloatArray, etc.) rather than Array<*>.
See also:
readLine
Any?.toString
until
indices
IntArray vs Array<Int> in Kotlin
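A quick illustration of the until and indices styles from the notes above (over a hypothetical cipher matrix):

var total = 0
for (i in 0 until cipher.size) total += cipher[i].sum()  // until: end-exclusive upper bound
for (i in cipher.indices) total += cipher[i].sum()       // indices: same range, no index arithmetic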
You can also avoid your output loop entirely by simplifying your code:
println(cipher.joinToString("\n") { row -> row.joinToString("") })
Here's a simpler piece of code:
fun main() {
    println("Enter the message:")
    var message = readLine()!!
    var cipher = Array(message.length) {
        IntArray(message.length) { readLine()!!.toInt() }
    }
    println(cipher.joinToString("\n") { it.joinToString("") })
}

Fastest way to read in and print back out a 2d array in Kotlin?

The following code works for reading in a 2D array and printing it back out in Kotlin. However, I imagine that with larger datasets, converting each line from a String to a list of Ints would be rather slow, so is there a quicker way to do it?
fun main() {
    var rowAndColumn = readLine()!!.split(" ")
    var rows = rowAndColumn[0].toInt()
    var columns = rowAndColumn[1].toInt()
    val board = Array(rows) { IntArray(columns).toMutableList() }
    for (i in 0 until rows) {
        var stringColumn = readLine()!!.split("").toMutableList()
        stringColumn.removeAll(listOf(""))
        var column = stringColumn.map { it.toInt() }.toMutableList()
        board[i] = column
    }
    for (i in 0 until rows) {
        println(board[i].toString())
    }
}
I measure this to be about 4-5 times faster than your method. It iterates over Chars rather than splitting each line into a separate String for each character.
var rowAndColumn = readLine()!!.split(" ")
var rows = rowAndColumn[0].toInt()
var columns = rowAndColumn[1].toInt()
val board = arrayOfNulls<IntArray?>(rows)
for (row in 0 until rows) {
    board[row] = readLine()!!
        .asIterable()
        .mapNotNull {
            val i = it.toInt() - '0'.toInt()
            if (i in 0..1) i else null
        }
        .toIntArray()
}
board.requireNoNulls()
If you don't mind ending up with an Array<List<Int>> instead of an Array<IntArray>, you can change the arrayOfNulls<IntArray?>(rows) declaration to match and remove the .toIntArray() call for a slight (~5%) improvement.
Caveat: I was reading from a text file, not console input, so file access may have affected the comparison. Intuition tells me this approach would compare even more favourably if file reading time were removed.

Allocate uninitialized slice

Is there some way to allocate an uninitialized slice in Go? A frequent pattern is to create a slice of a given size as a buffer, and then only use part of it to receive data. For example:
b := make([]byte, 0x20000) // b is zero initialized
n, err := conn.Read(b)
// do stuff with b[:n]. all of b is zeroed for no reason
This initialization can add up when lots of buffers are being allocated, as the spec states it will default initialize the array on allocation.
You can get non zeroed byte buffers from bufs.Cache.Get (or see CCache for the concurrent safe version). From the docs:
NOTE: The buffer returned by Get is not guaranteed to be zeroed. That's okay for e.g. passing a buffer to io.Reader. If you need a zeroed buffer use Cget.
Technically you could, by allocating the memory outside the Go runtime and using unsafe.Pointer, but this is definitely the wrong thing to do.
A better solution is to reduce the number of allocations. Move buffers outside loops, or, if you need per goroutine buffers, allocate several of them in a pool and only allocate more when they're needed.
type BufferPool struct {
    Capacity   int
    buffersize int
    buffers    [][]byte
    lock       sync.Mutex
}

func NewBufferPool(buffersize int, capacity int) *BufferPool {
    ret := new(BufferPool)
    ret.Capacity = capacity
    ret.buffersize = buffersize
    return ret
}

func (b *BufferPool) Alloc() []byte {
    b.lock.Lock()
    defer b.lock.Unlock()
    if len(b.buffers) == 0 {
        return make([]byte, b.buffersize)
    } else {
        ret := b.buffers[len(b.buffers)-1]
        b.buffers = b.buffers[0 : len(b.buffers)-1]
        return ret
    }
}

func (b *BufferPool) Free(buf []byte) {
    if len(buf) != b.buffersize {
        panic("illegal free")
    }
    b.lock.Lock()
    defer b.lock.Unlock()
    if len(b.buffers) < b.Capacity {
        b.buffers = append(b.buffers, buf)
    }
}

Functions to compress and uncompress array of integers

I was recently asked to complete a task for a C++ role; however, as my application was not progressed any further, I thought I would post here for some feedback / advice / improvements / a reminder of concepts I've forgotten.
The task was:
The following data is a time series of integer values
int timeseries[32] = {67497, 67376, 67173, 67235, 67057, 67031, 66951,
66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044,
67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620,
66579, 66596, 66713, 66852, 66715};
The series might be, for example, the closing price of a stock each day
over a 32 day period.
As stored above, the data will occupy 32 x sizeof(int) bytes = 128 bytes
assuming 4 byte ints.
Using delta encoding, write a function to compress, and a function to
uncompress data like the above.
Ok, so before this point I had never looked at compression, so my solution is far from perfect. I approached the problem by compressing the array of integers into an array of bytes. When representing an integer as bytes, I calculate its most significant byte (msb) and keep everything up to that point, throwing the rest away. The kept bytes are then added to the byte array. For negative values I increment the msb by 1 so that we can differentiate between positive and negative values when decoding, by keeping the leading 1 bits.
When decoding I parse this jagged byte array and simply reverse the actions performed when compressing. As mentioned, I had never looked at compression prior to this task, so I came up with my own method to compress the data. I had been looking at C++/CLI recently and had not really used it before, so I just decided to write the solution in that language, for no particular reason. Below is the class, with a unit test at the very bottom. Any advice / improvements / enhancements will be much appreciated.
Thanks.
array<array<Byte>^>^ CDeltaEncoding::CompressArray(array<int>^ data)
{
    int temp = 0;
    int original;
    int size = 0;
    array<int>^ tempData = gcnew array<int>(data->Length);
    data->CopyTo(tempData, 0);
    array<array<Byte>^>^ byteArray = gcnew array<array<Byte>^>(tempData->Length);
    for (int i = 0; i < tempData->Length; ++i)
    {
        original = tempData[i];
        tempData[i] -= temp;
        temp = original;
        int msb = GetMostSignificantByte(tempData[i]);
        byteArray[i] = gcnew array<Byte>(msb);
        System::Buffer::BlockCopy(BitConverter::GetBytes(tempData[i]), 0, byteArray[i], 0, msb);
        size += byteArray[i]->Length;
    }
    return byteArray;
}

array<int>^ CDeltaEncoding::DecompressArray(array<array<Byte>^>^ buffer)
{
    System::Collections::Generic::List<int>^ decodedArray = gcnew System::Collections::Generic::List<int>();
    int temp = 0;
    for (int i = 0; i < buffer->Length; ++i)
    {
        int retrievedVal = GetValueAsInteger(buffer[i]);
        decodedArray->Add(retrievedVal);
        decodedArray[i] += temp;
        temp = decodedArray[i];
    }
    return decodedArray->ToArray();
}

int CDeltaEncoding::GetMostSignificantByte(int value)
{
    array<Byte>^ tempBuf = BitConverter::GetBytes(Math::Abs(value));
    int msb = tempBuf->Length;
    for (int i = tempBuf->Length - 1; i >= 0; --i)
    {
        if (tempBuf[i] != 0)
        {
            msb = i + 1;
            break;
        }
    }
    if (!IsPositiveInteger(value))
    {
        // We need an extra byte to differentiate the negative integers
        msb++;
    }
    return msb;
}

bool CDeltaEncoding::IsPositiveInteger(int value)
{
    return value / Math::Abs(value) == 1;
}

int CDeltaEncoding::GetValueAsInteger(array<Byte>^ buffer)
{
    array<Byte>^ tempBuf;
    if (buffer->Length % 2 == 0)
    {
        // With even-length buffers there is no need to allocate a new byte array
        tempBuf = buffer;
    }
    else
    {
        tempBuf = gcnew array<Byte>(4);
        System::Buffer::BlockCopy(buffer, 0, tempBuf, 0, buffer->Length);
        unsigned int val = buffer[buffer->Length - 1] &= 0xFF;
        if (val == 0xFF)
        {
            // We have a negative integer compressed into 3 bytes.
            // Copy over this last byte as well so we keep the negative pattern.
            System::Buffer::BlockCopy(buffer, buffer->Length - 1, tempBuf, buffer->Length, 1);
        }
    }
    switch (tempBuf->Length)
    {
    case sizeof(short):
        return BitConverter::ToInt16(tempBuf, 0);
    case sizeof(int):
    default:
        return BitConverter::ToInt32(tempBuf, 0);
    }
}
And then in a test class I had:
void CTestDeltaEncoding::TestCompression()
{
    array<array<Byte>^>^ byteArray = CDeltaEncoding::CompressArray(m_testdata);
    array<int>^ decompressedArray = CDeltaEncoding::DecompressArray(byteArray);
    int totalBytes = 0;
    for (int i = 0; i < byteArray->Length; i++)
    {
        totalBytes += byteArray[i]->Length;
    }
    Assert::IsTrue(m_testdata->Length * sizeof(int) > totalBytes, "Expected the total bytes to be less than the original array!!");
    // Expected totalBytes = 53
}
This smells a lot like homework to me. The crucial phrase is: "Using delta encoding."
Delta encoding means you encode the delta (difference) between each number and the next:
67497, 67376, 67173, 67235, 67057, 67031, 66951, 66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044, 67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620, 66579, 66596, 66713, 66852, 66715
would turn into:
[Base: 67497]: -121, -203, +62
and so on. Assuming 8-bit bytes, the original numbers require 3 bytes apiece (and given the number of compilers with 3-byte integer types, you're normally going to end up with 4 bytes apiece). From the looks of things, the differences will fit quite easily in 2 bytes apiece, and if you can ignore one (or possibly two) of the least significant bits, you can fit them in one byte apiece.
Delta encoding is most often used for things like sound encoding where you can "fudge" the accuracy at times without major problems. For example, if you have a change from one sample to the next that's larger than you've left space to encode, you can encode a maximum change in the current difference, and add the difference to the next delta (and if you don't mind some back-tracking, you can distribute some to the previous delta as well). This will act as a low-pass filter, limiting the gradient between samples.
For example, in the series you gave, a simple delta encoding requires ten bits to represent all the differences. By dropping the LSB, however, nearly all the samples (all but one, in fact) can be encoded in 8 bits. That one has a difference (right shifted one bit) of -173, so if we represent it as -128, we have 45 left. We can distribute that error evenly between the preceding and following sample. In that case, the output won't be an exact match for the input, but if we're talking about something like sound, the difference probably won't be particularly obvious.
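For reference, here is a minimal sketch of the plain lossless variant in Kotlin rather than C++/CLI (the function names are mine):

fun deltaEncode(values: IntArray): IntArray {
    val out = IntArray(values.size)
    if (values.isEmpty()) return out
    out[0] = values[0]                      // keep the first value as the base
    for (i in 1 until values.size) {
        out[i] = values[i] - values[i - 1]  // store only the difference
    }
    return out
}

fun deltaDecode(deltas: IntArray): IntArray {
    val out = IntArray(deltas.size)
    if (deltas.isEmpty()) return out
    out[0] = deltas[0]
    for (i in 1 until deltas.size) {
        out[i] = out[i - 1] + deltas[i]     // running sum restores the original
    }
    return out
}

The compression win then comes from storing each delta in fewer bytes, e.g. as a Short or with a variable-length encoding.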
I did mention that it was an exercise I had to complete, and the solution I submitted was deemed not good enough, so I wanted some constructive feedback, seeing as companies never actually tell you what you did wrong.
When the array is compressed I store the differences, not the original values (except the first), as this was my understanding of the task. If you look at my code, I have provided a full solution; my question was how bad it was.

What is the fastest way to compare two byte arrays?

I am trying to compare two long byte arrays in VB.NET and have run into a snag. Comparing two 50 megabyte files takes almost two minutes, so I'm clearly doing something wrong. I'm on an x64 machine with tons of memory, so there are no issues there. Here is the code that I'm using at the moment and would like to change.
_Bytes and item.Bytes are the two different arrays to compare and are already the same length.
For Each B In item.Bytes
    If B <> _Bytes(I) Then
        Mismatch = True
        Exit For
    End If
    I += 1
Next
I need to be able to compare as fast as possible files that are potentially hundreds of megabytes and even possibly a gigabyte or two. Any suggests or algorithms that would be able to do this faster?
item.Bytes is an object taken from the database/filesystem that is returned for comparison because its byte length matches the item that the user wants to add. By comparing the two arrays I can determine whether the user has added something new to the DB; if not, I can just map to the existing file and not waste hard disk space.
[Update]
I converted the arrays to local variables of type Byte() and then ran the same comparison; the same code finished in about one second (I still have to benchmark it and compare it to the other suggestions). But if you do the same thing with a generic array, it becomes massively slower. I'm not sure why, but it raises a lot more questions for me about the use of arrays.
What is the _Bytes(I) call doing? It's not loading the file each time, is it? Even with buffering, that would be bad news!
There will be plenty of ways to micro-optimise this in terms of looking at longs at a time, potentially using unsafe code etc - but I'd just concentrate on getting reasonable performance first. Clearly there's something very odd going on.
I suggest you extract the comparison code into a separate function which takes two byte arrays. That way you know you won't be doing anything odd. I'd also use a simple For loop rather than For Each in this case - it'll be simpler. Oh, and check whether the lengths are correct first :)
EDIT: Here's the code (untested, but simple enough) that I'd use. It's in C# for the minute - I'll convert it in a sec:
public static bool Equals(byte[] first, byte[] second)
{
    if (first == second)
    {
        return true;
    }
    if (first == null || second == null)
    {
        return false;
    }
    if (first.Length != second.Length)
    {
        return false;
    }
    for (int i = 0; i < first.Length; i++)
    {
        if (first[i] != second[i])
        {
            return false;
        }
    }
    return true;
}
EDIT: And here's the VB:
Public Shared Function ArraysEqual(ByVal first As Byte(), _
                                   ByVal second As Byte()) As Boolean
    If (first Is second) Then
        Return True
    End If
    If (first Is Nothing OrElse second Is Nothing) Then
        Return False
    End If
    If (first.Length <> second.Length) Then
        Return False
    End If
    For i As Integer = 0 To first.Length - 1
        If (first(i) <> second(i)) Then
            Return False
        End If
    Next i
    Return True
End Function
The fastest way to compare two byte arrays of equal size is to use interop. Run the following code in a console application:
using System;
using System.Runtime.InteropServices;
using System.Security;

namespace CompareByteArray
{
    class Program
    {
        static void Main(string[] args)
        {
            const int SIZE = 100000;
            const int TEST_COUNT = 100;

            byte[] arrayA = new byte[SIZE];
            byte[] arrayB = new byte[SIZE];
            for (int i = 0; i < SIZE; i++)
            {
                arrayA[i] = 0x22;
                arrayB[i] = 0x22;
            }

            {
                DateTime before = DateTime.Now;
                for (int i = 0; i < TEST_COUNT; i++)
                {
                    int result = MemCmp_Safe(arrayA, arrayB, (UIntPtr)SIZE);
                    if (result != 0) throw new Exception();
                }
                DateTime after = DateTime.Now;
                Console.WriteLine("MemCmp_Safe: {0}", after - before);
            }
            {
                DateTime before = DateTime.Now;
                for (int i = 0; i < TEST_COUNT; i++)
                {
                    int result = MemCmp_Unsafe(arrayA, arrayB, (UIntPtr)SIZE);
                    if (result != 0) throw new Exception();
                }
                DateTime after = DateTime.Now;
                Console.WriteLine("MemCmp_Unsafe: {0}", after - before);
            }
            {
                DateTime before = DateTime.Now;
                for (int i = 0; i < TEST_COUNT; i++)
                {
                    int result = MemCmp_Pure(arrayA, arrayB, SIZE);
                    if (result != 0) throw new Exception();
                }
                DateTime after = DateTime.Now;
                Console.WriteLine("MemCmp_Pure: {0}", after - before);
            }
        }

        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "memcmp", ExactSpelling = true)]
        [SuppressUnmanagedCodeSecurity]
        static extern int memcmp_1(byte[] b1, byte[] b2, UIntPtr count);

        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "memcmp", ExactSpelling = true)]
        [SuppressUnmanagedCodeSecurity]
        static extern unsafe int memcmp_2(byte* b1, byte* b2, UIntPtr count);

        public static int MemCmp_Safe(byte[] a, byte[] b, UIntPtr count)
        {
            return memcmp_1(a, b, count);
        }

        public unsafe static int MemCmp_Unsafe(byte[] a, byte[] b, UIntPtr count)
        {
            fixed (byte* p_a = a)
            {
                fixed (byte* p_b = b)
                {
                    return memcmp_2(p_a, p_b, count);
                }
            }
        }

        public static int MemCmp_Pure(byte[] a, byte[] b, int count)
        {
            int result = 0;
            for (int i = 0; i < count && result == 0; i += 1)
            {
                result = a[i] - b[i];
            }
            return result;
        }
    }
}
If you don't need to know which byte differs, use 64-bit ints; that gives you 8 bytes at once. Actually, you can still figure out the offending byte once you've isolated it to a set of 8.
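A sketch of that idea in Kotlin on the JVM (on current runtimes java.util.Arrays.equals is already heavily optimised, so treat this as an illustration rather than a recommendation):

import java.nio.ByteBuffer

fun equalsLongwise(a: ByteArray, b: ByteArray): Boolean {
    if (a.size != b.size) return false
    val la = ByteBuffer.wrap(a).asLongBuffer()   // views the array 8 bytes at a time
    val lb = ByteBuffer.wrap(b).asLongBuffer()
    while (la.hasRemaining()) {
        if (la.get() != lb.get()) return false   // one mismatching long pins down an 8-byte window
    }
    for (i in (a.size / 8) * 8 until a.size) {   // compare the leftover tail bytes
        if (a[i] != b[i]) return false
    }
    return true
}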
Use BinaryReader:
saveTime = binReader.ReadInt32()
Or for arrays of ints:
Dim count As Integer = binReader.Read(testArray, 0, 3)
Better approach: if you are just trying to see whether the two arrays are different, generate a hash of each byte array and compare the hashes instead of walking both arrays. MD5 should work fine and is pretty efficient.
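In Kotlin on the JVM, that idea looks roughly like this. One caveat: computing a hash still reads every byte, so this mainly pays off when you can store the hash (e.g. in the database) and compare hashes on later lookups:

import java.security.MessageDigest

fun md5(bytes: ByteArray): ByteArray =
    MessageDigest.getInstance("MD5").digest(bytes)

// Equal content implies equal digests; a match means "almost certainly the same",
// since MD5 collisions are possible in principle.
fun probablySame(a: ByteArray, b: ByteArray): Boolean =
    md5(a).contentEquals(md5(b))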
I see two things that might help:
First, rather than always accessing the second array as item.Bytes, use a local variable to point directly at the array. That is, before starting the loop, do something like this:
array2 = item.Bytes
That will save the overhead of dereferencing from the object each time you want a byte. That could be expensive in Visual Basic, especially if there's a Getter method on that property.
Also, use a "definite loop" instead of "for each". You already know the length of the arrays, so just code the loop using that value. This will avoid the overhead of treating the array as a collection. The loop would look something like this:
For i As Integer = 0 To max - 1
    If array1(i) <> array2(i) Then
        Exit For
    End If
Next
Not strictly related to the comparison algorithm:
Are you sure your bottleneck is not related to the memory available and the time used to load the byte arrays? Loading two 2 GB byte arrays just to compare them could bring most machines to their knees. If the program design allows, try using streams to read smaller chunks instead.
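A sketch of that chunked comparison in Kotlin (the buffered streams do the chunking; the function name and buffer handling are my own):

import java.io.File

fun filesEqual(f1: File, f2: File): Boolean {
    if (f1.length() != f2.length()) return false  // cheap length check first
    return f1.inputStream().buffered().use { a ->
        f2.inputStream().buffered().use { b ->
            var same = true
            while (true) {
                val x = a.read()
                val y = b.read()
                if (x != y) { same = false; break }  // mismatch
                if (x == -1) break                   // both streams ended together: equal
            }
            same
        }
    }
}

This never holds more than two small buffers in memory, no matter how large the files are.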