Cryptographic hashing in Ceylon

What is the recommended way to import the standard cryptographic hashing (message digest) libraries (MD5, SHA1, SHA2, SHA256, SHA3, etc.) in Ceylon?

There doesn't seem to be a cryptography module in the SDK.
There is Ceylon Crypto on GitHub (and as three separate modules in Herd), but its README says:
Note that IANAC (I am not a cryptologist), so this will surely be flawed in some security relevant way.
Do not use in production and don't rely on it in any expensive way whatsoever!
If you just want to use this on the JVM, I would suggest using Java's crypto APIs in java.security (which should be enough for the hash functions) or javax.crypto (for other things like ciphers).
Here is an example that calculates the SHA-256 of "Hello World":
module.ceylon:
native("jvm")
module example "1.0.0" {
import java.base "8";
}
run.ceylon:
import java.security {
    MessageDigest
}
import java.lang {
    JString=String,
    ByteArray
}

"Formats a single byte as a (two character) hexadecimal string."
String formatByte(Byte byte)
        => Integer.format(byte.unsigned, 16).padLeading(2, '0');

"Formats a Java byte array as a hexadecimal string."
String formatByteArray(ByteArray result)
        => String.sum(result.byteArray.map(formatByte));
"Calculates SHA-256('Hello World') and print it."
shared void run() {
value message = "Hello World";
value bytes = JString(message).getBytes("UTF-8");
value dig = MessageDigest.getInstance("SHA-256");
value result = dig.digest(bytes);
value formatted = formatByteArray(result);
print("Result: ``result.array```");
print("Length: ``result.size``");
print("Result in hex: ``formatted``");
}
This program outputs:
Result: { -91, -111, -90, -44, 11, -12, 32, 64, 74, 1, 23, 51, -49, -73, -79, -112, -42, 44, 101, -65, 11, -51, -93, 43, 87, -78, 119, -39, -83, -97, 20, 110 }
Length: 32
Result in hex: A591A6D40BF420404A011733CFB7B190D62C65BF0BCDA32B57B277D9AD9F146E
I didn't find a Ceylon wrapper that would make this a bit nicer, though.
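For comparison, java.security is usable from any JVM language; here is a minimal sketch of the same computation in Kotlin (sha256Hex is just a local helper name, not a library function):

import java.security.MessageDigest

// Hashes a string with SHA-256 and returns the digest as lowercase hex.
fun sha256Hex(message: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(message.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

fun main() {
    println(sha256Hex("Hello World"))
    // a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
}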

How to put a range of ints instead of writing them one by one in Kotlin listOf

For example, we have such a list:
val listArr = listOf(1,2,3,4,5,6,7)
and finally we receive:
1,2,3,4,5,6,7
Maybe it is possible to write something like this:
val listArr = listOf(1..7)
and receive a similar result. Or is it impossible right now?
You can use the IntRange.toList() function:
val list = (1..7).toList()
Ranges are automatically converted to lists when concatenating:
val combined = (1..6) + 12 + (34..37)
// [1, 2, 3, 4, 5, 6, 12, 34, 35, 36, 37]
RobCo's answer is correct and answers the question asked.
About the follow-up question you asked in a comment on his answer:
how we can use such solution in another list for example 1,2,3,4,5,6,12,34,35,36,37
You could write a new function that accepts ranges:
fun customListOf(vararg ranges: IntRange) = ranges.flatMap { it.toList() }
Then use it like this:
fun main() {
    val list = customListOf(1..6, 12..12, 34..37)
    println(list)
}
Output:
[1, 2, 3, 4, 5, 6, 12, 34, 35, 36, 37]
However, you need to pass a range even for a single value like 12..12 above.
If you wanted to be hacky, you could write a function that accepts vararg rangeOrInt: Any and checks each element's type at runtime. That would allow you to mix ranges and ints in the same call:
fun hackyCustomListOf(vararg rangeOrInt: Any) = rangeOrInt.flatMap {
    when (it) {
        is IntRange -> it.toList()
        is Int -> listOf(it)
        else -> throw IllegalArgumentException("Expected an IntRange or an Int, got ${it::class}")
    }
}
Usage:
fun main() {
    val list1 = hackyCustomListOf(1, 5, 12..15, 25, 99..102)
    println(list1)
    val list2 = hackyCustomListOf(1..3, "boom", 5.0)
    println(list2)
}
Output:
[1, 5, 12, 13, 14, 15, 25, 99, 100, 101, 102]
Exception in thread "main" java.lang.IllegalArgumentException: Expected an IntRange or an Int, got class kotlin.String
    at TestKt.hackyCustomListOf(test.kt:7)
    at TestKt.main(test.kt:14)
    at TestKt.main(test.kt)
This removes compile-time checks on the argument, so I don't think it's a good idea. Fun exercise, though.
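If every argument can be written as a range, the standard library's flatten() gives the same result as customListOf without any helper (a small sketch):
val list = listOf(1..6, 12..12, 34..37).flatten()
println(list) // [1, 2, 3, 4, 5, 6, 12, 34, 35, 36, 37]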

How to save/serialize QVariant that is QVector<int>

I'm lost as to how to fix this:
qRegisterMetaType<QVector<int>>("QVector<int>");
QMap<int, QVariant> wah;
wah[0] = QVariant::fromValue(QVector<int>{12, 231, 45, 125, 123, 12, 312, 4, 12});
qDebug() << wah;
QByteArray ar;
QDataStream s(&ar, QIODevice::WriteOnly);
s << wah;
Any ideas/help would be great. Most of the Google results are about serializing custom classes and not ints :/
TIA!
Needed to add:
qRegisterMetaTypeStreamOperators<QVector<int>>("QVector<int>");
A bit of a bummer that this is not explained in the docs, though.

Beam Java SDK with TFRecord and Compression GZIP

We're using Beam Java SDK (and Google Cloud Dataflow to run batch jobs) a lot, and we noticed something weird (possibly a bug?) when we tried to use TFRecordIO with Compression.GZIP. We were able to come up with some sample code that can reproduce the errors we face.
To be clear, we are using Beam Java SDK 2.4.
Suppose we have a PCollection<byte[]>, which could for instance be a PCollection of proto messages in byte[] format.
We usually write this to GCS (Google Cloud Storage) using Base64 encoding (newline delimited Strings) or using TFRecordIO (without compression). We have had no issue reading the data from GCS in this manner for a very long time (2.5+ years for the former and ~1.5 years for the latter).
Recently, we tried TFRecordIO with Compression.GZIP option, and sometimes we get an exception as the data is seen as invalid (while being read). The data itself (the gzip files) is not corrupted, and we've tested various things, and reached the following conclusion.
When a byte[] that is being compressed under TFRecordIO is above a certain threshold (at or above 8192, as far as we can tell), then TFRecordIO.read().withCompression(Compression.GZIP) does not work.
Specifically, it will throw the following exception:
Exception in thread "main" java.lang.IllegalStateException: Invalid data
    at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
    at org.apache.beam.sdk.io.TFRecordIO$TFRecordCodec.read(TFRecordIO.java:642)
    at org.apache.beam.sdk.io.TFRecordIO$TFRecordSource$TFRecordReader.readNextRecord(TFRecordIO.java:526)
    at org.apache.beam.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:426)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:468)
    at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:261)
    at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
    at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:161)
    at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:125)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
This can be reproduced easily, so you can refer to the code at the end. You will also see comments about the byte array length (as I tested with various sizes, I concluded that 8192 is the magic number).
So I'm wondering if this is a bug or a known issue -- I couldn't find anything close to this on Apache Beam's issue tracker, but if there is another forum/site I need to check, please let me know!
If this is indeed a bug, what would be the right channel to report this?
The following code can reproduce the error we have.
A successful run (with parameters 1, 39, 100) would show the following message at the end:
------------ counter metrics from CountDoFn
[counter] plain_base64_proto_array_len: 8126
[counter] plain_base64_proto_in: 1
[counter] plain_base64_proto_val_cnt: 39
[counter] tfrecord_gz_proto_array_len: 8126
[counter] tfrecord_gz_proto_in: 1
[counter] tfrecord_gz_proto_val_cnt: 39
[counter] tfrecord_uncomp_proto_array_len: 8126
[counter] tfrecord_uncomp_proto_in: 1
[counter] tfrecord_uncomp_proto_val_cnt: 39
With parameters (1, 40, 100) which would push the byte array length over 8192, it will throw the said exception.
You can tweak the parameters (inside CreateRandomProtoData DoFn) to see why the length of byte[] being gzipped matters.
It may also help to use the following protoc-generated Java class (for the TestProto used in the main code below). Here it is: gist link
Main Code:
package exp.moloco.dataflow2.compression; // NOTE: Change appropriately.
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;
import java.util.TreeMap;
import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TFRecordIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.commons.codec.binary.Base64;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.protobuf.InvalidProtocolBufferException;
import com.moloco.dataflow.test.StackOverflow.TestProto;
import com.moloco.dataflow2.Main;
// @formatter:off
// This code uses TestProto (java class) that is generated by protoc.
// The message definition is as follows (in proto3, but it shouldn't matter):
// message TestProto {
//   int64 count = 1;
//   string name = 2;
//   repeated string values = 3;
// }
// Note that this code does not depend on whether this proto is used,
// or any other byte[] is used (see the CreateRandomProtoData DoFn later, which generates the data being used in the code).
// We tested both, but are presenting this as a concrete example of how (our) code in production can be affected.
// @formatter:on
public class CompressionTester {
private static final Logger LOG = LoggerFactory.getLogger(CompressionTester.class);
static final List<String> lines = Arrays.asList("some dummy string that will not be used in this job.");
// Some GCS buckets where data will be written to.
// %s will be replaced by some timestamped String for easy debugging.
static final String PATH_TO_GCS_PLAIN_BASE64 = Main.SOME_BUCKET + "/comp-test/%s/output-plain-base64";
static final String PATH_TO_GCS_TFRECORD_UNCOMP = Main.SOME_BUCKET + "/comp-test/%s/output-tfrecord-uncompressed";
static final String PATH_TO_GCS_TFRECORD_GZ = Main.SOME_BUCKET + "/comp-test/%s/output-tfrecord-gzip";
// This DoFn reads byte[] which represents a proto message (TestProto).
// It simply counts the number of proto objects it processes
// as well as the number of Strings each proto object contains.
// When the pipeline terminates, the values of the Counters will be printed out.
static class CountDoFn extends DoFn<byte[], TestProto> {
private final Counter protoIn;
private final Counter protoValuesCnt;
private final Counter protoByteArrayLength;
public CountDoFn(String name) {
protoIn = Metrics.counter(this.getClass(), name + "_proto_in");
protoValuesCnt = Metrics.counter(this.getClass(), name + "_proto_val_cnt");
protoByteArrayLength = Metrics.counter(this.getClass(), name + "_proto_array_len");
}
@ProcessElement
public void processElement(ProcessContext c) throws InvalidProtocolBufferException {
protoIn.inc();
TestProto tp = TestProto.parseFrom(c.element());
protoValuesCnt.inc(tp.getValuesCount());
protoByteArrayLength.inc(c.element().length);
}
}
// This DoFn emits a number of TestProto objects as byte[].
// Input to this DoFn is ignored (not used).
// Each TestProto object contains three fields: count (int64), name (string), and values (repeated string).
// The three parameters in this DoFn determine
// (1) the number of proto objects to be generated,
// (2) the number of (repeated) strings to be added to each proto object, and
// (3) the length of (each) string.
// TFRecord with Compression (when reading) fails when the parameters are 1, 40, 100, for instance.
// TFRecord with Compression (when reading) succeeds when the parameters are 1, 39, 100, for instance.
static class CreateRandomProtoData extends DoFn<String, byte[]> {
static final int NUM_PROTOS = 1; // Total number of TestProto objects to be emitted by this DoFn.
static final int NUM_STRINGS = 40; // Total number of strings in each TestProto object ('repeated string').
static final int STRING_LEN = 100; // Length of each string object.
// Returns a random string of length len.
// For debugging purposes, the string only contains upper-case English alphabets.
static String getRandomString(Random rd, int len) {
StringBuffer sb = new StringBuffer();
for (int i = 0; i < len; i++) {
sb.append('A' + (rd.nextInt(26)));
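// Note: 'A' + rd.nextInt(26) is an int (65..90), so append() writes its
// decimal digits rather than a single letter. Each "character" therefore
// contributes two digits, which is why the byte lengths reported below are
// roughly twice a naive STRING_LEN-based estimate. This does not affect
// the repro itself.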
}
return sb.toString();
}
// Returns a randomly generated TestProto object.
// Each string is generated randomly using getRandomString().
static TestProto getRandomProto(Random rd) {
TestProto.Builder tpBuilder = TestProto.newBuilder();
tpBuilder.setCount(rd.nextInt());
tpBuilder.setName(getRandomString(rd, STRING_LEN));
for (int i = 0; i < NUM_STRINGS; i++) {
tpBuilder.addValues(getRandomString(rd, STRING_LEN));
}
return tpBuilder.build();
}
// Emits TestProto objects as byte[].
@ProcessElement
public void processElement(ProcessContext c) {
// For debugging purposes, we set the seed here.
Random rd = new Random();
rd.setSeed(132475);
for (int n = 0; n < NUM_PROTOS; n++) {
byte[] data = getRandomProto(rd).toByteArray();
c.output(data);
// With parameters (1, 39, 100), the array length is 8126. It works fine.
// With parameters (1, 40, 100), the array length is 8329. It breaks TFRecord with GZIP.
System.out.println("\n--------------------------\n");
System.out.println("byte array length = " + data.length);
System.out.println("\n--------------------------\n");
}
}
}
public static void execute() {
PipelineOptions options = PipelineOptionsFactory.create();
options.setJobName("compression-tester");
options.setRunner(DirectRunner.class);
// For debugging purposes, write files under 'gcsSubDir' so we can easily distinguish.
final String gcsSubDir =
String.format("%s-%d", DateTime.now(DateTimeZone.UTC), DateTime.now(DateTimeZone.UTC).getMillis());
// Write PCollection<TestProto> in 3 different ways to GCS.
{
Pipeline pipeline = Pipeline.create(options);
// Create dummy data which is a PCollection of byte arrays (each array representing a proto message).
PCollection<byte[]> data = pipeline.apply(Create.of(lines)).apply(ParDo.of(new CreateRandomProtoData()));
// 1. Write as plain-text with base64 encoding.
data.apply(ParDo.of(new DoFn<byte[], String>() {
@ProcessElement
public void processElement(ProcessContext c) {
c.output(new String(Base64.encodeBase64(c.element())));
}
})).apply(TextIO.write().to(String.format(PATH_TO_GCS_PLAIN_BASE64, gcsSubDir)).withNumShards(1));
// 2. Write as TFRecord.
data.apply(TFRecordIO.write().to(String.format(PATH_TO_GCS_TFRECORD_UNCOMP, gcsSubDir)).withNumShards(1));
// 3. Write as TFRecord-gzip.
data.apply(TFRecordIO.write().withCompression(Compression.GZIP)
.to(String.format(PATH_TO_GCS_TFRECORD_GZ, gcsSubDir)).withNumShards(1));
pipeline.run().waitUntilFinish();
}
LOG.info("-------------------------------------------");
LOG.info(" READ TEST BEGINS ");
LOG.info("-------------------------------------------");
// Read PCollection<TestProto> in 3 different ways from GCS.
{
Pipeline pipeline = Pipeline.create(options);
// 1. Read as plain-text.
pipeline.apply(TextIO.read().from(String.format(PATH_TO_GCS_PLAIN_BASE64, gcsSubDir) + "*"))
.apply(ParDo.of(new DoFn<String, byte[]>() {
@ProcessElement
public void processElement(ProcessContext c) {
c.output(Base64.decodeBase64(c.element()));
}
})).apply("plain-base64", ParDo.of(new CountDoFn("plain_base64")));
// 2. Read as TFRecord -> byte array.
pipeline.apply(TFRecordIO.read().from(String.format(PATH_TO_GCS_TFRECORD_UNCOMP, gcsSubDir) + "*"))
.apply("tfrecord-uncomp", ParDo.of(new CountDoFn("tfrecord_uncomp")));
// 3. Read as TFRecord-gz -> byte array.
// This seems to fail when 'data size' becomes large.
pipeline
.apply(TFRecordIO.read().withCompression(Compression.GZIP)
.from(String.format(PATH_TO_GCS_TFRECORD_GZ, gcsSubDir) + "*"))
.apply("tfrecord_gz", ParDo.of(new CountDoFn("tfrecord_gz")));
// 4. Run pipeline.
PipelineResult res = pipeline.run();
res.waitUntilFinish();
// Check CountDoFn's metrics.
// The numbers should match.
Map<String, Long> counterValues = new TreeMap<String, Long>();
for (MetricResult<Long> counter : res.metrics().queryMetrics(MetricsFilter.builder().build()).counters()) {
counterValues.put(counter.name().name(), counter.committed());
}
StringBuffer sb = new StringBuffer();
sb.append("\n------------ counter metrics from CountDoFn\n");
for (Entry<String, Long> entry : counterValues.entrySet()) {
sb.append(String.format("[counter] %40s: %5d\n", entry.getKey(), entry.getValue()));
}
LOG.info(sb.toString());
}
}
}
This clearly looks like a bug in TFRecordIO: Channel.read() can read fewer bytes than the capacity of the input buffer, and 8192 seems to be the buffer size in GzipCompressorInputStream. I filed https://issues.apache.org/jira/browse/BEAM-5412.
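For context, here is a minimal Kotlin sketch of the general NIO pattern that avoids this class of bug (not the actual TFRecordIO fix): read() must be called in a loop, because a single call may return fewer bytes than the buffer has room for.

import java.nio.ByteBuffer
import java.nio.channels.ReadableByteChannel

// Keeps reading until the buffer is full or the channel reaches EOF;
// a single read() may legally fill only part of the buffer.
fun readFully(channel: ReadableByteChannel, buf: ByteBuffer) {
    while (buf.hasRemaining()) {
        if (channel.read(buf) == -1) break // EOF before the buffer filled
    }
}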
It is a bug; please see https://issues.apache.org/jira/browse/BEAM-7695. I have fixed it.

Creating ByteArray in Kotlin

Is there a better/shorter way in creating byte array from constant hex than the version below?
byteArrayOf(0xA1.toByte(), 0x2E.toByte(), 0x38.toByte(), 0xD4.toByte(), 0x89.toByte(), 0xC3.toByte())
I tried to put 0xA1 without .toByte() but I get a syntax error complaining that an integer literal does not conform to the expected type Byte. Putting plain integers is fine, but I prefer the hex form since my source is a hex string. Any hints would be greatly appreciated. Thanks!
As an option, you can create a simple function:
fun byteArrayOfInts(vararg ints: Int) = ByteArray(ints.size) { pos -> ints[pos].toByte() }
and use it:
val arr = byteArrayOfInts(0xA1, 0x2E, 0x38, 0xD4, 0x89, 0xC3)
If all your bytes were less than or equal to 0x7F, you could put them directly:
byteArrayOf(0x2E, 0x38)
If you need to use bytes greater than 0x7F, you can use unsigned literals to make a UByteArray and then convert it back into a ByteArray:
ubyteArrayOf(0xA1U, 0x2EU, 0x38U, 0xD4U, 0x89U, 0xC3U).toByteArray()
I think it's a lot better than appending .toByte() to every element, and there's no need to define a custom function either.
However, Kotlin's unsigned types are an experimental feature, so you may have some trouble with warnings.
The issue is that bytes in Kotlin are signed, which means they can only represent values in the [-128, 127] range. You can test this by creating a ByteArray like this:
val limits = byteArrayOf(-0x81, -0x80, -0x79, 0x00, 0x79, 0x80)
Only the first and last values will produce an error, because they are out of the valid range by 1.
This is the same behaviour as in Java, and the solution will probably be to use a larger number type if your values don't fit in a Byte (or offset them by 128, etc).
Side note: if you print the contents of the array you've created with the toByte() calls, you'll see that the values larger than 127 have wrapped around to negative numbers:
val bytes = byteArrayOf(0xA1.toByte(), 0x2E.toByte(), 0x38.toByte(), 0xD4.toByte(), 0x89.toByte(), 0xC3.toByte())
println(bytes.joinToString()) // -95, 46, 56, -44, -119, -61
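If you later need the unsigned values back, masking with 0xFF recovers them (a small sketch using the bytes array above):
println(bytes.joinToString { (it.toInt() and 0xFF).toString() }) // 161, 46, 56, 212, 137, 195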
I just do:
val bytes = listOf(0xa1, 0x2e, 0x38, 0xd4, 0x89, 0xc3)
    .map { it.toByte() }
    .toByteArray()
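Since the question mentions that the source is a hex string, one more option is to parse the string directly. A minimal sketch, assuming a plain hex string with no separators (parseHex is a hypothetical helper, not a stdlib function):
fun parseHex(hex: String): ByteArray =
    hex.chunked(2).map { it.toInt(16).toByte() }.toByteArray()

val bytes = parseHex("A12E38D489C3")
println(bytes.joinToString()) // -95, 46, 56, -44, -119, -61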

How to debug Kotlin sequences / collections

Take the following one-liner, which can be expressed as a series of operations on a collection or a sequence:
val nums = (10 downTo 1)
    // .asSequence() if we want this to be a sequence
    .filter { it % 2 == 0 }
    .map { it * it }
    .sorted()
    // .toList() if declaring it a sequence
println(nums) // [4, 16, 36, 64, 100]
Let's say I want to see the elements at each step, they would be (from deduction):
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[10, 8, 6, 4, 2]
[100, 64, 36, 16, 4]
[4, 16, 36, 64, 100]
Unfortunately, there's no good way to either debug this with a debugger or log these values for later inspection. With good functional programming constructs, entire methods can be rewritten as single statements like this, but there seems to be no good way to inspect the intermediate states, or even the counts (10, 5, 5, 5 here).
What's the best way to debug these?
You can log the intermediate values (lists) with
fun <T> T.log(): T { println(this); return this }
//USAGE:
val nums = (10 downTo 1)
    .filter { it % 2 == 0 }.log()
    .map { it * it }.log()
    .sorted().log()
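With the question's data, this prints the intermediate lists:
[10, 8, 6, 4, 2]
[100, 64, 36, 16, 4]
[4, 16, 36, 64, 100]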
This will work as desired since in your example you work with collections, not sequences. For a lazy Sequence you need:
// coming in 1.1
public fun <T> Sequence<T>.onEach(action: (T) -> Unit): Sequence<T> {
    return map {
        action(it)
        it
    }
}
fun <T> Sequence<T>.log() = onEach { println(it) }
//USAGE:
val nums = (10 downTo 1).asSequence()
    .filter { it % 2 == 0 }
    .map { it * it }.log()
    .sorted()
    .toList()
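For what it's worth, onEach did ship in the Kotlin 1.1 standard library for both Iterable and Sequence, so on current Kotlin no custom helper is needed. A minimal sketch:
val nums = (10 downTo 1).asSequence()
    .filter { it % 2 == 0 }
    .onEach { println(it) } // inspect the elements that survive the filter
    .map { it * it }
    .sorted()
    .toList()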
In recent versions of IntelliJ IDEA, when adding a breakpoint, you have the option to make it apply not to the whole expression but only to a lambda body.
Then, while debugging, you can see what is happening inside your lambda.
But this is not the only way. You can also use Run to cursor (Alt + F9).
I think the current correct answer is that you want the Kotlin Sequence Debugger plugin, which lets you use IntelliJ's lovely Java stream debugger with Kotlin sequences.
Note that (unless I'm doing something wrong) it doesn't appear to work with collections, so you will have to convert the collection to a sequence in order to debug it. That's easy enough using Iterable.asSequence, and a small price to pay; you can always revert that change once you are done debugging.
You may use the also inline function to log or print at any stage of the chain, as explained by Andrey Breslav at Google I/O '18:
(1..10)
    .filter { it % 2 == 0 }
    .also { e -> println(e) /* do your debug or print here */ }
    .map { it * 2 }
    .toList()