Beam Java SDK with TFRecord and Compression GZIP

We're using Beam Java SDK (and Google Cloud Dataflow to run batch jobs) a lot, and we noticed something weird (possibly a bug?) when we tried to use TFRecordIO with Compression.GZIP. We were able to come up with some sample code that can reproduce the errors we face.
To be clear, we are using Beam Java SDK 2.4.
Suppose we have PCollection<byte[]> which can be a PC of proto messages, for instance, in byte[] format.
We usually write this to GCS (Google Cloud Storage) using Base64 encoding (newline delimited Strings) or using TFRecordIO (without compression). We have had no issue reading the data from GCS in this manner for a very long time (2.5+ years for the former and ~1.5 years for the latter).
Recently, we tried the TFRecordIO with Compression.GZIP option, and we sometimes get an exception while reading, with the data reported as invalid. The gzip files themselves are not corrupted; after testing various things, we reached the following conclusion.
When a byte[] being compressed via TFRecordIO is above a certain threshold (at or above 8192 bytes, as far as we can tell), TFRecordIO.read().withCompression(Compression.GZIP) does not work.
Specifically, it will throw the following exception:
Exception in thread "main" java.lang.IllegalStateException: Invalid data
at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
at org.apache.beam.sdk.io.TFRecordIO$TFRecordCodec.read(TFRecordIO.java:642)
at org.apache.beam.sdk.io.TFRecordIO$TFRecordSource$TFRecordReader.readNextRecord(TFRecordIO.java:526)
at org.apache.beam.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:426)
at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:468)
at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:261)
at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:161)
at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:125)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This can be reproduced easily with the code at the end. You will also see comments about the byte array length (having tested with various sizes, I concluded that 8192 is the magic number).
So I'm wondering whether this is a bug or a known issue -- I couldn't find anything close to this on Apache Beam's Issue Tracker here, but if there is another forum/site I should check, please let me know!
If this is indeed a bug, what would be the right channel to report it?
The following code can reproduce the error we have.
A successful run (with parameters 1, 39, 100) would show the following message at the end:
------------ counter metrics from CountDoFn
[counter] plain_base64_proto_array_len: 8126
[counter] plain_base64_proto_in: 1
[counter] plain_base64_proto_val_cnt: 39
[counter] tfrecord_gz_proto_array_len: 8126
[counter] tfrecord_gz_proto_in: 1
[counter] tfrecord_gz_proto_val_cnt: 39
[counter] tfrecord_uncomp_proto_array_len: 8126
[counter] tfrecord_uncomp_proto_in: 1
[counter] tfrecord_uncomp_proto_val_cnt: 39
With parameters (1, 40, 100), which push the byte array length over 8192, it will throw the aforementioned exception.
You can tweak the parameters (inside the CreateRandomProtoData DoFn) to see why the length of the byte[] being gzipped matters.
It may also help to use the following protoc-generated Java class (for TestProto, used in the main code above). Here it is: gist link
References:
Main Code:
package exp.moloco.dataflow2.compression; // NOTE: Change appropriately.
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;
import java.util.TreeMap;
import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TFRecordIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.commons.codec.binary.Base64;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.protobuf.InvalidProtocolBufferException;
import com.moloco.dataflow.test.StackOverflow.TestProto;
import com.moloco.dataflow2.Main;
// @formatter:off
// This code uses TestProto (java class) that is generated by protoc.
// The message definition is as follows (in proto3, but it shouldn't matter):
// message TestProto {
// int64 count = 1;
// string name = 2;
// repeated string values = 3;
// }
// Note that this code does not depend on whether this proto is used,
// or any other byte[] is used (see the CreateRandomProtoData DoFn later, which generates the data used in the code).
// We tested both, but are presenting this as a concrete example of how (our) code in production can be affected.
// @formatter:on
public class CompressionTester {
private static final Logger LOG = LoggerFactory.getLogger(CompressionTester.class);
static final List<String> lines = Arrays.asList("some dummy string that will not be used in this job.");
// Some GCS buckets where data will be written to.
// %s will be replaced by some timestamped String for easy debugging.
static final String PATH_TO_GCS_PLAIN_BASE64 = Main.SOME_BUCKET + "/comp-test/%s/output-plain-base64";
static final String PATH_TO_GCS_TFRECORD_UNCOMP = Main.SOME_BUCKET + "/comp-test/%s/output-tfrecord-uncompressed";
static final String PATH_TO_GCS_TFRECORD_GZ = Main.SOME_BUCKET + "/comp-test/%s/output-tfrecord-gzip";
// This DoFn reads byte[] which represents a proto message (TestProto).
// It simply counts the number of proto objects it processes
// as well as the number of Strings each proto object contains.
// When the pipeline terminates, the values of the Counters will be printed out.
static class CountDoFn extends DoFn<byte[], TestProto> {
private final Counter protoIn;
private final Counter protoValuesCnt;
private final Counter protoByteArrayLength;
public CountDoFn(String name) {
protoIn = Metrics.counter(this.getClass(), name + "_proto_in");
protoValuesCnt = Metrics.counter(this.getClass(), name + "_proto_val_cnt");
protoByteArrayLength = Metrics.counter(this.getClass(), name + "_proto_array_len");
}
@ProcessElement
public void processElement(ProcessContext c) throws InvalidProtocolBufferException {
protoIn.inc();
TestProto tp = TestProto.parseFrom(c.element());
protoValuesCnt.inc(tp.getValuesCount());
protoByteArrayLength.inc(c.element().length);
}
}
// This DoFn emits a number of TestProto objects as byte[].
// Input to this DoFn is ignored (not used).
// Each TestProto object contains three fields: count (int64), name (string), and values (repeated string).
// The three parameters in this DoFn determine
// (1) the number of proto objects to be generated,
// (2) the number of (repeated) strings to be added to each proto object, and
// (3) the length of (each) string.
// TFRecord with Compression (when reading) fails when the parameters are 1, 40, 100, for instance.
// TFRecord with Compression (when reading) succeeds when the parameters are 1, 39, 100, for instance.
static class CreateRandomProtoData extends DoFn<String, byte[]> {
static final int NUM_PROTOS = 1; // Total number of TestProto objects to be emitted by this DoFn.
static final int NUM_STRINGS = 40; // Total number of strings in each TestProto object ('repeated string').
static final int STRING_LEN = 100; // Length of each string object.
// Returns a random string built from len values in 'A'..'Z'.
// Note: 'A' + rd.nextInt(26) is an int, so each append writes the two-digit
// character code rather than a letter; the resulting string has 2*len digit
// characters, which is what produces the byte-array lengths (8126, 8329) quoted in this post.
static String getRandomString(Random rd, int len) {
StringBuffer sb = new StringBuffer();
for (int i = 0; i < len; i++) {
sb.append('A' + (rd.nextInt(26)));
}
}
return sb.toString();
}
// Returns a randomly generated TestProto object.
// Each string is generated randomly using getRandomString().
static TestProto getRandomProto(Random rd) {
TestProto.Builder tpBuilder = TestProto.newBuilder();
tpBuilder.setCount(rd.nextInt());
tpBuilder.setName(getRandomString(rd, STRING_LEN));
for (int i = 0; i < NUM_STRINGS; i++) {
tpBuilder.addValues(getRandomString(rd, STRING_LEN));
}
return tpBuilder.build();
}
// Emits TestProto objects as byte[].
@ProcessElement
public void processElement(ProcessContext c) {
// For debugging purposes, we set the seed here.
Random rd = new Random();
rd.setSeed(132475);
for (int n = 0; n < NUM_PROTOS; n++) {
byte[] data = getRandomProto(rd).toByteArray();
c.output(data);
// With parameters (1, 39, 100), the array length is 8126. It works fine.
// With parameters (1, 40, 100), the array length is 8329. It breaks TFRecord with GZIP.
System.out.println("\n--------------------------\n");
System.out.println("byte array length = " + data.length);
System.out.println("\n--------------------------\n");
}
}
}
public static void execute() {
PipelineOptions options = PipelineOptionsFactory.create();
options.setJobName("compression-tester");
options.setRunner(DirectRunner.class);
// For debugging purposes, write files under 'gcsSubDir' so we can easily distinguish.
final String gcsSubDir =
String.format("%s-%d", DateTime.now(DateTimeZone.UTC), DateTime.now(DateTimeZone.UTC).getMillis());
// Write PCollection<TestProto> in 3 different ways to GCS.
{
Pipeline pipeline = Pipeline.create(options);
// Create dummy data which is a PCollection of byte arrays (each array representing a proto message).
PCollection<byte[]> data = pipeline.apply(Create.of(lines)).apply(ParDo.of(new CreateRandomProtoData()));
// 1. Write as plain-text with base64 encoding.
data.apply(ParDo.of(new DoFn<byte[], String>() {
@ProcessElement
public void processElement(ProcessContext c) {
c.output(new String(Base64.encodeBase64(c.element())));
}
})).apply(TextIO.write().to(String.format(PATH_TO_GCS_PLAIN_BASE64, gcsSubDir)).withNumShards(1));
// 2. Write as TFRecord.
data.apply(TFRecordIO.write().to(String.format(PATH_TO_GCS_TFRECORD_UNCOMP, gcsSubDir)).withNumShards(1));
// 3. Write as TFRecord-gzip.
data.apply(TFRecordIO.write().withCompression(Compression.GZIP)
.to(String.format(PATH_TO_GCS_TFRECORD_GZ, gcsSubDir)).withNumShards(1));
pipeline.run().waitUntilFinish();
}
LOG.info("-------------------------------------------");
LOG.info(" READ TEST BEGINS ");
LOG.info("-------------------------------------------");
// Read PCollection<TestProto> in 3 different ways from GCS.
{
Pipeline pipeline = Pipeline.create(options);
// 1. Read as plain-text.
pipeline.apply(TextIO.read().from(String.format(PATH_TO_GCS_PLAIN_BASE64, gcsSubDir) + "*"))
.apply(ParDo.of(new DoFn<String, byte[]>() {
@ProcessElement
public void processElement(ProcessContext c) {
c.output(Base64.decodeBase64(c.element()));
}
})).apply("plain-base64", ParDo.of(new CountDoFn("plain_base64")));
// 2. Read as TFRecord -> byte array.
pipeline.apply(TFRecordIO.read().from(String.format(PATH_TO_GCS_TFRECORD_UNCOMP, gcsSubDir) + "*"))
.apply("tfrecord-uncomp", ParDo.of(new CountDoFn("tfrecord_uncomp")));
// 3. Read as TFRecord-gz -> byte array.
// This seems to fail when 'data size' becomes large.
pipeline
.apply(TFRecordIO.read().withCompression(Compression.GZIP)
.from(String.format(PATH_TO_GCS_TFRECORD_GZ, gcsSubDir) + "*"))
.apply("tfrecord_gz", ParDo.of(new CountDoFn("tfrecord_gz")));
// 4. Run pipeline.
PipelineResult res = pipeline.run();
res.waitUntilFinish();
// Check CountDoFn's metrics.
// The numbers should match.
Map<String, Long> counterValues = new TreeMap<String, Long>();
for (MetricResult<Long> counter : res.metrics().queryMetrics(MetricsFilter.builder().build()).counters()) {
counterValues.put(counter.name().name(), counter.committed());
}
StringBuffer sb = new StringBuffer();
sb.append("\n------------ counter metrics from CountDoFn\n");
for (Entry<String, Long> entry : counterValues.entrySet()) {
sb.append(String.format("[counter] %40s: %5d\n", entry.getKey(), entry.getValue()));
}
LOG.info(sb.toString());
}
}
}

This looks clearly like a bug in TFRecordIO. Channel.read() can read fewer bytes than the capacity of the input buffer. 8192 seems to be the buffer size in GzipCompressorInputStream. I filed https://issues.apache.org/jira/browse/BEAM-5412.
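To illustrate the failure mode in isolation (a minimal sketch of the pattern, not the actual TFRecordIO source): a reader that issues a single Channel.read() call assumes the buffer comes back full, but a decompressing channel may stop after one internal buffer's worth (8192 bytes here), so fixed-length record reads must loop until the buffer is filled.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

class ChannelReadSketch {
    // Buggy pattern: assumes one read() fills the buffer. A GZIP-decompressing
    // channel may return after at most one internal buffer's worth of bytes,
    // leaving the record truncated so later validity checks fail.
    static void readRecordBuggy(ReadableByteChannel in, ByteBuffer buf) throws IOException {
        in.read(buf); // may read fewer bytes than buf.remaining()
    }

    // Correct pattern: keep reading until the buffer is full (or EOF).
    static void readRecordFully(ReadableByteChannel in, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (in.read(buf) < 0) {
                throw new IOException("Unexpected EOF in the middle of a record");
            }
        }
    }
}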

It is a bug; please see https://issues.apache.org/jira/browse/BEAM-7695. I have fixed it.

Related

Resume a ml-agents training after changing hyperparameters and adding new observation vectors

I am training agents with ml-agents in Unity. When I changed the number of stacked vectors, the observation vectors, and the hyperparameters, I could not resume training from the last run because TensorFlow reports that the lhs and rhs tensor shapes are not the same.
I would like to be able to change the agent scripts and config scripts and resume training with the new parameters, so as not to lose the progress the agent has already made. For the moment I must either restart training from scratch or leave the number of observation vectors unchanged.
How can I do this?
Thank you very much.
EDIT: Here is an example of what I want to test and the errors I got, using the RollerBall ml-agents tutorial. See https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md
GOAL: I want to see the impact of the choice of observation vectors on the agent's training.
I ran a training with the basic agent script given in the tutorial. Here it is:
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
public class RollerAgent : Agent
{
Rigidbody rBody;
void Start()
{
rBody = GetComponent<Rigidbody>();
}
public Transform Target;
public override void OnEpisodeBegin()
{
if (this.transform.localPosition.y < 0)
{
// If the Agent fell, zero its momentum
this.rBody.angularVelocity = Vector3.zero;
this.rBody.velocity = Vector3.zero;
this.transform.localPosition = new Vector3(0, 0.5f, 0);
}
// Move the target to a new spot
Target.localPosition = new Vector3(Random.value * 8 - 4,
0.5f,
Random.value * 8 - 4);
}
public override void CollectObservations(VectorSensor sensor)
{
// Target and Agent positions
sensor.AddObservation(Target.localPosition);
sensor.AddObservation(this.transform.localPosition);
// Agent velocity
sensor.AddObservation(rBody.velocity.x);
sensor.AddObservation(rBody.velocity.z);
}
public float speed = 10;
public override void OnActionReceived(float[] vectorAction)
{
// Actions, size = 2
Vector3 controlSignal = Vector3.zero;
controlSignal.x = vectorAction[0];
controlSignal.z = vectorAction[1];
rBody.AddForce(controlSignal * speed);
// Rewards
float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);
// Reached target
if (distanceToTarget < 1.42f)
{
SetReward(1.0f);
EndEpisode();
}
// Fell off platform
if (this.transform.localPosition.y < 0)
{
EndEpisode();
}
}
public override void Heuristic(float[] actionsOut)
{
actionsOut[0] = Input.GetAxis("Horizontal");
actionsOut[1] = Input.GetAxis("Vertical");
}
}
I stopped the training before the agent hit the benchmark.
I removed the observation vectors concerning the agent's velocity and adjusted the vector observation size in Unity from 8 to 6. Here is the new code:
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
public class RollerAgent : Agent
{
Rigidbody rBody;
void Start()
{
rBody = GetComponent<Rigidbody>();
}
public Transform Target;
public override void OnEpisodeBegin()
{
if (this.transform.localPosition.y < 0)
{
// If the Agent fell, zero its momentum
this.rBody.angularVelocity = Vector3.zero;
this.rBody.velocity = Vector3.zero;
this.transform.localPosition = new Vector3(0, 0.5f, 0);
}
// Move the target to a new spot
Target.localPosition = new Vector3(Random.value * 8 - 4,
0.5f,
Random.value * 8 - 4);
}
public override void CollectObservations(VectorSensor sensor)
{
// Target and Agent positions
sensor.AddObservation(Target.localPosition);
sensor.AddObservation(this.transform.localPosition);
// Agent velocity
//sensor.AddObservation(rBody.velocity.x);
//sensor.AddObservation(rBody.velocity.z);
}
public float speed = 10;
public override void OnActionReceived(float[] vectorAction)
{
// Actions, size = 2
Vector3 controlSignal = Vector3.zero;
controlSignal.x = vectorAction[0];
controlSignal.z = vectorAction[1];
rBody.AddForce(controlSignal * speed);
// Rewards
float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);
// Reached target
if (distanceToTarget < 1.42f)
{
SetReward(1.0f);
EndEpisode();
}
// Fell off platform
if (this.transform.localPosition.y < 0)
{
EndEpisode();
}
}
public override void Heuristic(float[] actionsOut)
{
actionsOut[0] = Input.GetAxis("Horizontal");
actionsOut[1] = Input.GetAxis("Vertical");
}
}
I ran again with the same ID and RESUMED the training so as to keep the progress made during the previous run. But when I pressed the Play button in the Unity editor I got this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Restoring from checkpoint failed. This is most likely due to a
mismatch between the current graph and the graph from the checkpoint.
Please ensure that you have not altered the graph expected based on
the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [6,128]
rhs shape= [8,128]
[[node save_1/Assign_26 (defined at c:\users\jeann\anaconda3\envs\ml-agents-1.0.2\lib\site-packages\mlagents\trainers\policy\tf_policy.py:115)
]]
Errors may have originated from an input operation.
I know it may seem nonsensical to reuse the progress of the previous training when I am using a new brain configuration for the agent, but in the project I am currently working on I need to keep the improvements the agent has made even if we change the observation vectors. Is there a way to do this, or is it impossible?
Thank you :)

Sage: Iterate over increasing sequences

I have a problem that I am unwilling to believe hasn't been solved before in Sage.
Given a pair of integers (d,n) as input, I'd like to receive a list (or set, or whatever) of all nondecreasing sequences of length d all of whose entries are no greater than n.
Similarly, I'd like another function which returns all strictly increasing sequences of length d whose entries are no greater than n.
For example, for d = 2, n = 3, I'd receive the output:
[[1,2], [1,3], [2,3]]
or
[[1,1], [1,2], [1,3], [2,2], [2,3], [3,3]]
depending on whether I'm using increasing or nondecreasing.
Does anyone know of such a function?
Edit: Of course, if there is such a method for nonincreasing or decreasing sequences, I can modify it to fit my purposes. I just need something to iterate over such sequences.
I needed this algorithm too and I finally managed to write one today. I will share the code here, but I only started to learn coding last week, so it is not pretty.
Idea: Input = (r, d). Step 1) Create a class "ListAndPosition" that holds a list L of Integer[r+1] arrays and an integer q between 0 and r. Step 2) Create a method that receives a ListAndPosition (L, q) and sequentially scans the arrays in L, checking whether the integer at position q is less than the one at position q+1; if so, it adds a new array at the bottom of the list with that entry incremented. When done, the method calls itself again with the new list and q-1 as input.
The code for Step 1)
import java.util.ArrayList;
public class ListAndPosition {
public static Integer r=5;
public final ArrayList<Integer[]> L;
public int q;
public ListAndPosition(ArrayList<Integer[]> L, int q) {
this.L = L;
this.q = q;
}
public ArrayList<Integer[]> getList(){
return L;
}
public int getPosition() {
return q;
}
public void decreasePosition() {
q--;
}
public void showList() {
for(int i=0;i<L.size();i++){
for(int j=0; j<r+1 ; j++){
System.out.print(""+L.get(i)[j]);
}
System.out.println("");
}
}
}
The code for Step 2)
import java.util.ArrayList;
public class NonDecreasingSeqs {
public static Integer r=5;
public static Integer d=3;
public static void main(String[] args) {
//Creating the first array
Integer[] firstArray;
firstArray = new Integer[r+1];
for(int i=0;i<r;i++){
firstArray[i] = 0;
}
firstArray[r] = d;
//Creating the starting listAndDim
ArrayList<Integer[]> L = new ArrayList<Integer[]>();
L.add(firstArray);
ListAndPosition Lq = new ListAndPosition(L,r-1);
System.out.println(""+nonDecSeqs(Lq).size());
}
public static ArrayList<Integer[]> nonDecSeqs(ListAndPosition Lq){
int iterations = r-1-Lq.getPosition();
System.out.println("How many arrays in the list after "+iterations+" iterations? "+Lq.getList().size());
System.out.print("Should we stop the iteration?");
if(0<Lq.getPosition()){
System.out.println(" No, position = "+Lq.getPosition());
for(int i=0;i<Lq.getList().size();i++){
//Showing particular array
System.out.println("Array of L #"+i+":");
for(int j=0;j<r+1;j++){
System.out.print(""+Lq.getList().get(i)[j]);
}
System.out.print("\nCan it be modified at position "+Lq.getPosition()+"?");
if(Lq.getList().get(i)[Lq.getPosition()]<Lq.getList().get(i)[Lq.getPosition()+1]){
System.out.println(" Yes, "+Lq.getList().get(i)[Lq.getPosition()]+"<"+Lq.getList().get(i)[Lq.getPosition()+1]);
{
Integer[] tempArray = new Integer[r+1];
for(int j=0;j<r+1;j++){
if(j==Lq.getPosition()){
tempArray[j] = new Integer(Lq.getList().get(i)[j])+1;
}
else{
tempArray[j] = new Integer(Lq.getList().get(i)[j]);
}
}
Lq.getList().add(tempArray);
}
System.out.println("New list");Lq.showList();
}
else{
System.out.println(" No, "+Lq.getList().get(i)[Lq.getPosition()]+"="+Lq.getList().get(i)[Lq.getPosition()+1]);
}
}
System.out.print("Old position = "+Lq.getPosition());
Lq.decreasePosition();
System.out.println(", new position = "+Lq.getPosition());
nonDecSeqs(Lq);
}
else{
System.out.println(" Yes, position = "+Lq.getPosition());
}
return Lq.getList();
}
}
Remark: I needed my sequences to start at 0 and end at d.
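For comparison, the same enumeration fits in a short recursive Java sketch (my own naming, with entries in 1..n as in the question; passing v + 1 instead of v in the recursive call yields the strictly increasing variant):
import java.util.ArrayList;
import java.util.List;

class Sequences {
    // Collects every nondecreasing sequence of length len with entries in min..n.
    static void generate(int min, int n, int len, List<Integer> prefix, List<List<Integer>> out) {
        if (prefix.size() == len) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int v = min; v <= n; v++) {
            prefix.add(v);
            generate(v, n, len, prefix, out); // pass v + 1 here for strictly increasing
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> out = new ArrayList<>();
        generate(1, 3, 2, new ArrayList<>(), out);
        System.out.println(out); // [[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 3]]
    }
}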
This is probably not a very good answer to your question. But you could, in principle, use Partitions and the max_slope=-1 argument. Messing around with filtering lists of IntegerVectors sounds equally inefficient and depressing for other reasons.
If this has a canonical name, it might be in the list of sage-combinat functionality, and there is even a base class you could perhaps use for integer lists, which is basically what you are asking about. Maybe you could actually get what you want using IntegerListsLex? Hope this proves helpful.
This question can be solved by using the class "UnorderedTuples" described here:
http://doc.sagemath.org/html/en/reference/combinat/sage/combinat/tuple.html
To return all nondecreasing sequences of length d with entries between 0 and n-1, you may type:
UnorderedTuples(range(n),d)
This returns the nondecreasing sequences as lists. I needed an immutable object (because the sequences would become keys of a dictionary), so I used the "tuple" method to turn the lists into tuples:
immutables = []
for s in UnorderedTuples(range(n), d):
    immutables.append(tuple(s))
return immutables
And I also wrote a method which picks out only the increasing sequences:
def isIncreasing(list):
    for i in range(len(list) - 1):
        if list[i] >= list[i+1]:
            return False
    return True
The method that returns only strictly increasing sequences would look like
immutables = []
for s in UnorderedTuples(range(n), d):
    if isIncreasing(s):
        immutables.append(tuple(s))
return immutables

runge kutta 4th order to solve system of differential equation

dT/dt=(1.344-1.025T)/h (1)
dh/dt=0.025-(3.5*10^-4)*sqrt(h) (2)
h(0)=1
T(0)=1
I have to solve this system of equations in Fortran. I solved the problem in MATLAB, but I don't know Fortran programming, so if somebody can help me or has Fortran code for this, please help.
Thanks a lot.
Try it with Euler integration. Do something simple first. You have one advantage: you've solved this once, so you know what the answer looks like when you get it.
Since the moderators are insisting this is a low quality answer because of the short length, I'll provide a working one in Java that should spark some thoughts for you. I used the Apache Commons math library; it has several different ODE integration schemes, including Euler and Runge Kutta.
I ran this on a Windows 7 machine using JDK 8. You can switch between Euler and Runge-Kutta using the command line:
package math.ode;
import org.apache.commons.math3.exception.DimensionMismatchException;
import org.apache.commons.math3.exception.MaxCountExceededException;
import org.apache.commons.math3.ode.FirstOrderDifferentialEquations;
import org.apache.commons.math3.ode.FirstOrderIntegrator;
import org.apache.commons.math3.ode.nonstiff.ClassicalRungeKuttaIntegrator;
import org.apache.commons.math3.ode.nonstiff.EulerIntegrator;
/**
* IntegrationExample solves coupled ODEs using Euler and Runge Kutta
* Created by Michael
* Creation date 12/20/2015.
* @link https://stackoverflow.com/questions/20065521/dependencies-for-jama-in-maven
*/
public class IntegrationExample {
public static final double DEFAULT_STEP_SIZE = 0.001;
private static final double DEFAULT_MAX_TIME = 2.0;
public static void main(String[] args) {
// Problem set up
double step = (args.length > 0) ? Double.valueOf(args[0]) : DEFAULT_STEP_SIZE;
double maxTime = (args.length > 1) ? Double.valueOf(args[1]) : DEFAULT_MAX_TIME;
String integratorName = (args.length > 2) ? args[2] : "euler";
// Choose different integration schemes here.
FirstOrderIntegrator firstOrderIntegrator = getFirstOrderIntegrator(step, integratorName);
// Equations to solve here; see class below
FirstOrderDifferentialEquations odes = new CoupledOdes();
double [] y = ((CoupledOdes) odes).getInitialConditions();
double t = 0.0;
int i = 0;
while (t <= maxTime) {
System.out.println(String.format("%5d %10.6f %10.6f %10.6f", i, t, y[0], y[1]));
firstOrderIntegrator.integrate(odes, t, y, t+step, y);
t += step;
++i;
}
}
private static FirstOrderIntegrator getFirstOrderIntegrator(double step, String integratorName) {
FirstOrderIntegrator firstOrderIntegrator;
if ("runge-kutta".equalsIgnoreCase(integratorName)) {
firstOrderIntegrator = new ClassicalRungeKuttaIntegrator(step);
} else {
firstOrderIntegrator = new EulerIntegrator(step);
}
return firstOrderIntegrator;
}
}
class CoupledOdes implements FirstOrderDifferentialEquations {
public double [] getInitialConditions() {
return new double [] { 1.0, 1.0 };
}
@Override
public int getDimension() {
return 2;
}
@Override
public void computeDerivatives(double t, double[] y, double[] yDot) throws MaxCountExceededException, DimensionMismatchException {
yDot[0] = (1.344-1.025*y[0])/y[1];
yDot[1] = 0.025-3.5e-4*Math.sqrt(y[1]);
}
}
You didn't say how far out you needed to integrate in time, so I assumed 2.0 as the max time. You can change this on the command line, too.
Here's the plot of results versus time from Excel. As you can see, the responses are smooth and well behaved. Euler has no problem with systems of equations like this.
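For comparison, the same Euler scheme hand-rolled without the library is only a few lines (a sketch; the 0.001 step and 2.0 end time are the same assumed defaults as above):
public class EulerSketch {
    public static void main(String[] args) {
        double t = 0.0, T = 1.0, h = 1.0; // initial conditions T(0) = 1, h(0) = 1
        double dt = 0.001, tMax = 2.0;    // assumed step size and end time
        while (t <= tMax) {
            double dTdt = (1.344 - 1.025 * T) / h;       // equation (1)
            double dhdt = 0.025 - 3.5e-4 * Math.sqrt(h); // equation (2)
            T += dt * dTdt;
            h += dt * dhdt;
            t += dt;
        }
        System.out.printf("t=%.3f T=%.6f h=%.6f%n", t, T, h);
    }
}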

Dictionary in protocol buffers

Is there any way to serialize a dictionary using protocol buffers, or will I have to use Thrift if I need that?
For future answer seekers, ProtoBuf now supports Maps natively:
message MapMessage
{
map<string, string> MyMap = 1;
}
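On the Java side, the code generated for a map field exposes put/get accessors, so usage is roughly as follows (a sketch, assuming the message above is compiled with the Java protoc plugin):
MapMessage msg = MapMessage.newBuilder()
    .putMyMap("key", "value") // builder accessor generated for map<string, string> MyMap
    .build();
String v = msg.getMyMapOrDefault("key", "");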
Protobuf specification now supports dictionaries (maps) natively.
Original answer:
People typically write down the dictionary as a list of key-value pairs, and then rebuild the dictionary on the other end.
message Pair {
string key = 1;
string value = 2;
}
message Dictionary {
repeated Pair pairs = 1;
}
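Rebuilding the map on the receiving end is then straightforward; for example in Java (a sketch, assuming the two messages above are compiled with the Java protoc plugin):
import java.util.HashMap;
import java.util.Map;

class DictionaryCodec {
    // Rebuild a java.util.Map from the repeated key-value pairs.
    static Map<String, String> toMap(Dictionary dict) {
        Map<String, String> map = new HashMap<>();
        for (Pair p : dict.getPairsList()) { // generated accessor for 'repeated Pair pairs = 1'
            map.put(p.getKey(), p.getValue());
        }
        return map;
    }
}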
You can check the ProtoText package.
Assume you want to serialize a dict person_dict to a pre-defined PersonBuf protobuf object defined in personbuf_pb2 module.
In this case, to use ProtoText,
import ProtoText
from personbuf_pb2 import PersonBuf
obj = PersonBuf()
obj.update(person_dict)
First of all, I'd second @Flassari's answer, as it is really convenient.
However, in my case I needed map<Type, repeated AnyModel>, where:
enum Type {
Undefined = 0;
Square = 1;
Circle = 2;
}
message AnyModel {
string Name = 1;
}
Here I just want to return a dictionary that, for each type, contains a list of AnyModel objects.
However, I didn't find a better workaround than the one proposed by @JesperE, so I did the following (as you can't use an enum as a map key):
message MyRPCBodyCall {
map<string, AnyModelArray> Models = 1;
}
enum Type {
Undefined = 0;
Square = 1;
Circle = 2;
}
message AnyModel {
string Name = 1;
}
message AnyModelArray {
repeated AnyModel AnyModels = 1;
}
Here I convert my enum to/from string using my language of choice on both the server and client sides.
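In Java, for example, that round trip is short because protoc generates a real Java enum (a sketch; models is a hypothetical AnyModelArray value):
// enum -> string key when writing
String key = Type.Square.name();
MyRPCBodyCall call = MyRPCBodyCall.newBuilder()
    .putModels(key, models) // builder accessor generated for map<string, AnyModelArray> Models
    .build();
// string key -> enum when reading
Type type = Type.valueOf(key);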
So both approaches are actually valid answers IMO, depending on your requirements.

generating 9 digit ids without database sequence

I'd like to create 9-digit numeric ids that are unique across machines. I'm currently using a database sequence for this, but am wondering if it could be done without one. The sequences will be used for X12 EDI transactions, so they don't have to be unique forever. Maybe even only unique for 24 hours.
My only idea:
Each server has a 2 digit server identifier.
Each server maintains a file that essentially keeps track of a local sequence.
id = <2 digit server identifier> + <7 digit sequence which wraps>
My biggest problem with this is what to do if the hard-drive fails. I wouldn't know where it left off.
All of my other ideas essentially end up re-creating a centralized database sequence.
Any thoughts?
The following:
{XX}{dd}{HHmm}{N}
Where {XX} is the machine number, {dd} is the day of the month, {HHmm} is the current time (24hr), and {N} is a sequential number.
An HD crash will take more than a minute, so starting at 0 again is not a problem.
You can also replace {dd} with {ss} for seconds, depending on requirements. Uniqueness period vs. requests per minute.
If the HD fails, you can just set a new, unused 2-digit server identifier and be sure that the numbers are unique (for 24 hours at least).
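A sketch of this scheme in Java (assumptions: the machine number is configured per server, and the trailing {N} digit simply wraps at 10):
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

class EdiIdGenerator {
    private static final AtomicInteger seq = new AtomicInteger();

    // {XX}{dd}{HHmm}{N}: 2-digit machine number + day of month + time (24hr) + 1 sequential digit = 9 digits.
    static String nextId(int machineNo) {
        String ddHHmm = ZonedDateTime.now(ZoneOffset.UTC)
                .format(DateTimeFormatter.ofPattern("ddHHmm"));
        return String.format("%02d%s%d", machineNo, ddHHmm, seq.getAndIncrement() % 10);
    }
}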
How about generating GUIDs (ensures uniqueness) and then using some sort of hash function to turn the GUID into a 9-digit number?
Just off the top of my head...
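One way to sketch that in Java (note that folding 128 bits down to 9 digits only reduces, rather than eliminates, the collision risk):
import java.util.UUID;

class GuidTo9Digits {
    static String nineDigitId() {
        UUID guid = UUID.randomUUID();
        // Fold the 128-bit GUID into a non-negative long, then keep the last 9 digits.
        long folded = (guid.getMostSignificantBits() ^ guid.getLeastSignificantBits()) & Long.MAX_VALUE;
        return String.format("%09d", folded % 1_000_000_000L);
    }
}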
Use a variation on:
md5(uniqid(rand(), true));
Just a thought.
In my recent project I also came across this requirement: generating an N-digit-long sequence number without any database.
This is actually a good interview question, because there are considerations around performance and software crash recovery. Further reading if interested.
The following code has these features:
Prefix each sequence with a prefix.
Sequence cache like Oracle Sequence.
Most importantly, there is recovery logic to resume the sequence after a software crash.
Complete implementation attached:
import java.util.concurrent.atomic.AtomicLong;
import org.apache.commons.lang.StringUtils;
/**
* This is a customized Sequence Generator which simulates Oracle DB Sequence Generator. However the master sequence
* is stored locally in the file as there is no access to Oracle database. The output format is "prefix" + number.
* <p>
* <u><b>Sample output:</u></b><br>
* 1. FixLengthIDSequence(null,null,15,0,99,0) will generate 15, 16, ... 99, 00<br>
* 2. FixLengthIDSequence(null,"K",1,1,99,0) will generate K01, K02, ... K99, K01<br>
* 3. FixLengthIDSequence(null,"SG",100,2,9999,100) will generate SG0100, SG0101, ... SG8057, (in case server crashes, the new init value will start from last cache value+1) SG8101, ... SG9999, SG0002<br>
*/
public final class FixLengthIDSequence {
private static String FNAME;
private static String PREFIX;
private static AtomicLong SEQ_ID;
private static long MINVALUE;
private static long MAXVALUE;
private static long CACHEVALUE;
// some internal working values.
private int iMaxLength; // max numeric length excluding prefix, for left padding zeros.
private long lNextSnapshot; // to keep track of when to update sequence value to file.
private static boolean bInit = false; // to enable ShutdownHook routine after program has properly initialized
static {
// Inspiration from http://stackoverflow.com/questions/22416826/sequence-generator-in-java-for-unique-id#35697336.
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
if (bInit) { // Without this, saveToLocal may hit NullPointerException.
saveToLocal(SEQ_ID.longValue());
}
}));
}
/**
* This POJO style constructor should be initialized via Spring Singleton. Otherwise, rewrite this constructor into Singleton design pattern.
*
* @param sFilename This is the absolute file path to store the sequence number. To reset the sequence, this file needs to be removed manually.
* @param prefix The hard-coded identifier.
* @param initvalue
* @param minvalue
* @param maxvalue
* @param cache
* @throws Exception
*/
public FixLengthIDSequence(String sFilename, String prefix, long initvalue, long minvalue, long maxvalue, int cache) throws Exception {
bInit = false;
FNAME = (sFilename==null)?"C:\\Temp\\sequence.txt":sFilename;
PREFIX = (prefix==null)?"":prefix;
SEQ_ID = new AtomicLong(initvalue);
MINVALUE = minvalue;
MAXVALUE = maxvalue; iMaxLength = Long.toString(MAXVALUE).length();
CACHEVALUE = (cache <= 0)?1:cache; lNextSnapshot = roundUpNumberByMultipleValue(initvalue, cache); // Internal cache is always 1, equals no cache.
// If sequence file exists and valid, restore the saved sequence.
java.io.File f = new java.io.File(FNAME);
if (f.exists()) {
String[] saSavedSequence = loadToString().split(",");
if (saSavedSequence.length != 6) {
throw new Exception("Local Sequence file is not valid");
}
PREFIX = saSavedSequence[0];
//SEQ_ID = new AtomicLong(Long.parseLong(saSavedSequence[1])); // savedInitValue
MINVALUE = Long.parseLong(saSavedSequence[2]);
MAXVALUE = Long.parseLong(saSavedSequence[3]); iMaxLength = Long.toString(MAXVALUE).length();
CACHEVALUE = Long.parseLong(saSavedSequence[4]);
lNextSnapshot = Long.parseLong(saSavedSequence[5]);
// For sequence number recovery
// The rule to determine to continue using SEQ_ID or lNextSnapshot as subsequent sequence number:
// If savedInitValue = savedSnapshot, it was saved by ShutdownHook -> use SEQ_ID.
// Else if saveInitValue < savedSnapshot, it was saved by periodic Snapshot -> use lNextSnapshot+1.
if (saSavedSequence[1].equals(saSavedSequence[5])) {
long previousSEQ = Long.parseLong(saSavedSequence[1]);
SEQ_ID = new AtomicLong(previousSEQ);
lNextSnapshot = roundUpNumberByMultipleValue(previousSEQ,CACHEVALUE);
} else {
SEQ_ID = new AtomicLong(lNextSnapshot+1); // SEQ_ID starts fresh from lNextSnapshot+1.
lNextSnapshot = roundUpNumberByMultipleValue(SEQ_ID.longValue(),CACHEVALUE);
}
}
// Catch invalid values.
if (minvalue < 0) {
throw new Exception("MINVALUE cannot be less than 0");
}
if (maxvalue < 0) {
throw new Exception("MAXVALUE cannot be less than 0");
}
if (minvalue >= maxvalue) {
throw new Exception("MINVALUE cannot be greater than MAXVALUE");
}
if (cache >= maxvalue) {
throw new Exception("CACHE value cannot be greater than MAXVALUE");
}
// Save the next Snapshot.
saveToLocal(lNextSnapshot);
bInit = true;
}
/**
* Equivalent to Oracle Sequence nextval.
* @return String because Next Value is usually left padded with zeros, e.g. "00001".
*/
public String nextVal() {
if (SEQ_ID.longValue() > MAXVALUE) {
SEQ_ID.set(MINVALUE);
lNextSnapshot = roundUpNumberByMultipleValue(MINVALUE,CACHEVALUE);
}
if (SEQ_ID.longValue() > lNextSnapshot) {
lNextSnapshot = roundUpNumberByMultipleValue(lNextSnapshot,CACHEVALUE);
saveToLocal(lNextSnapshot);
}
return PREFIX.concat(StringUtils.leftPad(Long.toString(SEQ_ID.getAndIncrement()),iMaxLength,"0"));
}
/**
* Store sequence value into the local file. This routine is called either by Snapshot or ShutdownHook routines.<br>
* If called by Snapshot, currentCount == Snapshot.<br>
* If called by ShutdownHook, currentCount == current SEQ_ID.
* @param currentCount - This value is inserted by either Snapshot or ShutdownHook routines.
*/
private static void saveToLocal (long currentCount) {
try (java.io.Writer w = new java.io.BufferedWriter(new java.io.OutputStreamWriter(new java.io.FileOutputStream(FNAME), "utf-8"))) {
w.write(PREFIX + "," + SEQ_ID.longValue() + "," + MINVALUE + "," + MAXVALUE + "," + CACHEVALUE + "," + currentCount);
w.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
/**
* Load the sequence file content into String.
* @return
*/
private String loadToString() {
try {
return new String(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(FNAME)));
} catch (Exception e) {
e.printStackTrace();
}
return "";
}
/**
* Utility method to round up num to next multiple value. This method is used to calculate the next cache value.
* <p>
* (Reference: http://stackoverflow.com/questions/18407634/rounding-up-to-the-nearest-hundred)
* <p>
* <u><b>Sample output:</b></u>
* <pre>
* System.out.println(roundUpNumberByMultipleValue(9,10)); = 10
* System.out.println(roundUpNumberByMultipleValue(10,10)); = 20
* System.out.println(roundUpNumberByMultipleValue(19,10)); = 20
* System.out.println(roundUpNumberByMultipleValue(100,10)); = 110
* System.out.println(roundUpNumberByMultipleValue(109,10)); = 110
* System.out.println(roundUpNumberByMultipleValue(110,10)); = 120
* System.out.println(roundUpNumberByMultipleValue(119,10)); = 120
* </pre>
*
* @param num Value must be greater than or equal to 1.
* @param multiple Value must be greater than or equal to 1.
* @return
*/
private long roundUpNumberByMultipleValue(long num, long multiple) {
if (num<=0) num=1;
if (multiple<=0) multiple=1;
if (num % multiple != 0) {
long division = (long) ((num / multiple) + 1);
return division * multiple;
} else {
return num + multiple;
}
}
/**
* Main method for testing purpose.
* @param args
*/
public static void main(String[] args) throws Exception {
//FixLengthIDSequence(Filename, prefix, initvalue, minvalue, maxvalue, cache)
FixLengthIDSequence seq = new FixLengthIDSequence(null,"H",50,1,999,10);
for (int i=0; i<12; i++) {
System.out.println(seq.nextVal());
Thread.sleep(1000);
//if (i==8) { System.exit(0); }
}
}
}
To test the code, let the sequence run normally. You can press Ctrl+C to simulate the server crash. The next sequence number will continue from NextSnapshot+1.
Could you use the first 9 digits of some other source of unique data, like:
a random number
System Time
Uptime
Having thought about it for two seconds, none of those are unique on their own, but you could use them as seed values for hash functions, as was suggested in another answer.