How to fix dead kernel while executing tensorflow .fit - tensorflow

The kernel is dead while executing model.fit(train_generator,epochs=20), but the same code works on another pc.
This is a juputer log:
Warn 17:03:05: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1204175)
at c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1223227
at Map.forEach (<anonymous>)
at v._clearKernelState (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1223212)
at v.dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1216694)
at c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:533674
at t.swallowExceptions (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:913059)
at dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:533652)
at t.RawSession.dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:537330)
at processTicksAndRejections (node:internal/process/task_queues:96:5)]
Warn 17:03:05: Cell completed with errors [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1204175)
at c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1223227
at Map.forEach (<anonymous>)
at v._clearKernelState (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1223212)
at v.dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:1216694)
at c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:533674
at t.swallowExceptions (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:913059)
at dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:533652)
at t.RawSession.dispose (c:\Users\ivanf\.vscode\extensions\ms-toolsai.jupyter-2022.3.1000901801\out\extension.js:2:537330)
at processTicksAndRejections (node:internal/process/task_queues:96:5)]

There are many reasons for kernel to die. But most common reason is memory.
you may reduce your batch size which may help for your computer.
or try to use .py instead of jupyter notebook
refer: https://github.com/tensorflow/tensorflow/issues/9829

Related

Debugging "Transaction simulation failed" when sending program instruction (Solana Solidity)

When attempting to make a call to a program compiled with #solana/solidity, I'm getting the following error:
Transaction simulation failed: Error processing Instruction 0: Program failed to complete
Program jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N invoke [1]
Program log: pxKTQePwHC9MiR52J5AYaRtSLAtkVfcoGS3GaLD24YX
Program log: sender account missing from transaction
Program jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N consumed 200000 of 200000 compute units
Program failed to complete: BPF program Panicked in solana.c at 285:0
Program jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N failed: Program failed to complete
jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N is the program's public key and pxKTQePwHC9MiR52J5AYaRtSLAtkVfcoGS3GaLD24YX is the sender's public key.
I'm using a fork of the #solana/solidity library that exposes the Transaction object so that it can be signed and sent by Phantom Wallet on the front end. The code that results in the error is as follows:
// Generate the transaction
const transaction = contract.transactions.send(...args);
// Add recent blockhash and fee payer
const recentBlockhash = (await connection.getRecentBlockhash()).blockhash;
transaction.recentBlockhash = recentBlockhash;
transaction.feePayer = provider.publicKey;
// Sign and send the transaction (throws an error)
const res = await provider.signAndSendTransaction(transaction);
I would attempt to debug this further myself, but I'm not sure where to start. Looking up the error message hasn't yielded any results and the error message isn't very descriptive. I'm not sure if this error is occurring within the program execution itself or if it's an issue with the composition of the transaction object. If it is an issue within the program execution, is there a way for me to add logs to my solidity code? If it's an issue with the transaction object, what could be missing? How can I better debug issues like this?
Thank you for any help.
Edit: I'm getting a different error now, although I haven't changed any of the provided code. The error message is now the following:
Phantom - RPC Error: Transaction creation failed. {code: -32003, message: 'Transaction creation failed.'}
Unfortunately this error message is even less helpful than the last one. I'm not sure if Phantom Wallet was updated or if a project dependency was updated at some point, but given the vague nature of both of these error messages and the fact that none of my code has changed, I believe they're being caused by the same issue. Again, any help or debugging tips are appreciated.
I was able to solve this issue, and although I've run into another issue it's not related to the contents of this question so I'll post it separately.
Regarding my edit, I found that the difference between the error messages came down to how I was sending the transaction. At first, I tried sending it with Phantom's .signAndSendTransaction() method, which yielded the second error message (listed under my edit). Then I tried signing & sending the transaction manually, like so:
const signed = await provider.request({
method: 'signTransaction',
params: {
message: bs58.encode(transaction.serializeMessage()),
},
});
const signature = bs58.decode(signed.signature);
transaction.addSignature(provider.publicKey, signature);
await connection.sendRawTransaction(transaction.serialize())
Which yielded the more verbose error included in my original post. That error message did turn out to be helpful once I realized what to look for -- the sending account's public key was missing from the keys field on the TransactionInstruction on the Transaction. I added it in my fork of the #solana/solidity library and the error went away.
In short, the way I was able to debug this was by
Using provider.request({ method: 'signTransaction' }) and connection.sendRawTransaction(transaction) rather than Phantom's provider.signAndSendTransaction() method for a more verbose error message
Logging the transaction object and inspecting the instructions closely
I hope this helps someone else in the future.

Issues while running with BigQueryIO.Write.Method.STORAGE_WRITE_API

We are testing with STORAGE_WRITE_API to insert data into BigQuery. We've seen several errors/warnings in our Dataflow pipeline(written in Java). It might work well in the beginning, but eventually the system lag would be increasing, it would stop processing any data from PubSub and the unacked messages piled up.
One common warning is:
Operation ongoing in step insertTableRowsToBigQuery/StorageApiLoads/StorageApiWriteSharded/Write Records for at least 03h35m00s without outputting or completing in state process
at java.base#11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
at java.base#11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
at java.base#11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager$Callback.await(RetryManager.java:153)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager$Operation.await(RetryManager.java:136)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager.await(RetryManager.java:256)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager.run(RetryManager.java:248)
at app//org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn.process(StorageApiWritesShardedRecords.java:453)
at app//org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn$DoFnInvoker.invokeProcessElement(Unknown Source)
Other exceptions we've seen:
Got error io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Stream is closed
Got error io.grpc.StatusRuntimeException: ALREADY_EXIST
PodSandboxStatus of sandbox "..." for pod "df-...-pipeline-...-harness-qw4j_default(...)" error: rpc error: code = Unknown desc = Error: No such container
Code sample:
toBq.apply("insertTableRowsToBigQuery",
BigQueryIO
.writeTableRows()
.to(String.format("%s:%s.%s", PROJECT_ID, DATASET, table))
.withTriggeringFrequency(Duration.standardSeconds(options.getTriggeringFrequency()))
.withNumStorageWriteApiStreams(options.getNumStorageWriteApiStreams())
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
There was a production issue related to connection being stuck after streaming 10MB which has been fixed. If you try again, it should work.

Flutter SIGINT error on run all tests on VSCODE

When trying to "Run All Tests" from flutter/dart on VSCODE, I'm getting a SIGINT error and the test finish with "loading"on file, only the first one goes ok. The problem is that the same thing does not happen if I run the tests one by one.
loading /Users/marciomontenegro/Documents/Projects/mel/test/domain/usecases/sign_in_test.dart:
ERROR: Failed to load "/Users/marciomontenegro/Documents/Projects/mel/test/domain/usecases/sign_in_test.dart":
Shell subprocess terminated by ^C (SIGINT, -2) before connecting to test harness.
Test: /Users/marciomontenegro/Documents/Projects/mel/test/domain/usecases/sign_in_test.dart
Shell: /Users/marciomontenegro/flutter/bin/cache/artifacts/engine/darwin-x64/flutter_tester
dart:async/stream_controller.dart 595:43 _StreamController.addError
dart:async/stream_controller.dart 862:13 _StreamSinkWrapper.addError
package:stream_channel/src/guarantee_channel.dart 144:14 _GuaranteeSink._addError
package:stream_channel/src/guarantee_channel.dart 135:5 _GuaranteeSink.addError
package:flutter_tools/src/test/flutter_platform.dart 566:27 FlutterPlatform._startTest
===== asynchronous gap ===========================
dart:async/zone.dart 1053:19 _CustomZone.registerUnaryCallback
dart:async-patch/async_patch.dart 71:23 _asyncThenWrapperHelper
package:flutter_tools/src/test/flutter_platform.dart FlutterPlatform._startTest
package:flutter_tools/src/test/flutter_platform.dart 368:36 FlutterPlatform.loadChannel
package:flutter_tools/src/test/flutter_platform.dart 321:46 FlutterPlatform.load
===== asynchronous gap ===========================
dart:async/zone.dart 1053:19 _CustomZone.registerUnaryCallback
dart:async-patch/async_patch.dart 71:23 _asyncThenWrapperHelper
package:test_core/src/runner/loader.dart Loader.loadFile.<fn>
package:test_core/src/runner/load_suite.dart 98:31 new LoadSuite.<fn>.<fn>
===== asynchronous gap ===========================
dart:async/zone.dart 1045:19 _CustomZone.registerCallback
dart:async/zone.dart 962:22 _CustomZone.bindCallbackGuarded
dart:async/timer.dart 52:45 new Timer
dart:async/timer.dart 89:9 Timer.run
dart:async/future.dart 172:11 new Future
package:test_api/src/backend/invoker.dart 399:21 Invoker._onRun.<fn>.<fn>.<fn>
Just found some info about this issue.
From https://github.com/Dart-Code/Dart-Code/issues/2082 :
Ok, I can only repro this using launch.json and not with the Run All Tests command. The reason for this is that Run All Tests runs without the debugger, but setting up launch.json as as described at #1673 (comment) will run in debug mode, but in a single debug session (and that fails due to multiple VM services).
I don't think there's going to be an easy way to handle this.
As a workaround, if you add "noDebug": true to the launch.json, it should fix it (albeit at the expense of debugging, and a warning from VS Code that it's not a valid option).

Express server large post (>3gb) endpoint error

My express server can't take POST request larger than 2GB.
I have tried expanding the buffer size of body parser, it has not worked. I have tried expanding the heap size, it has not worked.
This is the error message:
buffer.js:262
throw err;
^
RangeError [ERR_INVALID_OPT_VALUE]: The value "3228869136" is invalid for option "size"
at Function.allocUnsafe (buffer.js:284:3)
at Function.concat (buffer.js:463:25)
at getBuffer (/home/minjae/Dev/mednick_minjae/mednick_api_minjae/node_modules/express-fileupload/lib/memHandler.js:16:34)
at FileStream.file.on (/home/minjae/Dev/mednick_minjae/mednick_api_minjae/node_modules/express-fileupload/lib/processMultipart.js:58:19)
at FileStream.emit (events.js:198:15)
at FileStream.EventEmitter.emit (domain.js:481:20)
at endReadableNT (_stream_readable.js:1139:12)
at processTicksAndRejections (internal/process/task_queues.js:81:17)
It emits error even before it gets to the first code after my endpoint, so I guess it came from a middleware such as body-parser, but I can't fix it.
Any help would be appreciated, Thanks!

RabbitMQ Crash Explanation

Our RabbitMQ service crashed twice with the following report in the $RABBITMQ_NODENAME-sasl.log:
=CRASH REPORT==== 7-Jun-2016::14:37:25 ===
crasher:
initial call: gen:init_it/6
pid: <0.223.0>
registered_name: []
exception exit: {{badmatch,
{[{msg_location,
<<162,171,39,113,226,229,228,92,227,253,48,186,
45,48,29,98>>,
1,357,0,583},
******************
16000 similar msg_location lines snipped
******************
1795219}},
[{rabbit_msg_store,combine_files,3,[]},
{rabbit_msg_store_gc,attempt_action,3,[]},
{rabbit_msg_store_gc,handle_cast,2,[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,
[{file,"proc_lib.erl"},{line,250}]}]}
in function gen_server2:terminate/3
ancestors: [msg_store_persistent,rabbit_sup,<0.159.0>]
messages: [{'$gen_cast',{combine,394,380}}]
links: [#Port<0.86370>,<0.218.0>,#Port<0.86369>]
dictionary: [{{"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/357.rdq",
fhc_file},
{file,1,false}},
{{"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/340.rdq",
fhc_file},
{file,1,true}},
{fhc_age_tree,{2,
{{1465,346244,764691},
#Ref<0.0.3145729.257998>,nil,
{{1465,346244,891543},
#Ref<0.0.3145729.258001>,nil,nil}}}},
{{#Ref<0.0.3145729.257998>,fhc_handle},
{handle,{file_descriptor,prim_file,{#Port<0.86369>,59}},
0,false,0,1048576,[],false,
"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/357.rdq",
[raw,binary,read_ahead,read],
[{write_buffer,1048576}],
false,true,
{1465,346244,764691}}},
{{#Ref<0.0.3145729.258001>,fhc_handle},
{handle,{file_descriptor,prim_file,{#Port<0.86370>,64}},
14212552,false,0,1048576,[],false,
"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/340.rdq",
[raw,binary,read_ahead,read,write],
[{write_buffer,1048576}],
true,true,
{1465,346244,891543}}}]
trap_exit: false
status: running
heap_size: 121536
stack_size: 27
reductions: 835024
neighbours:
We'd like to understand what this crash report means. Does it signify a bad message, RMQ can't find a message, or something completely different? We're using RabbitMQ 3.1.5 with Erlang 18, and while we know we're using an old version, we want to first know what's causing the crash before dedicating resources to an upgrade.
This message means that RabbitMQ message storage process has failed to combine files during garbage collection on message store. This can in theory cause message loss.
Note that 3.1.5 is not supported and has not been tested with OTP 10. This issue can be already fixed in newer versions though.