Celery Error Handling - error-handling

I've built a fairly simple application linking Flask, Celery, and RabbitMQ using docker-compose by linking together a few solutions I saw online. I'm having some issues trying to update task states to reflect if a failure occurred. To keep error visibility at it's highest, I've had my custom class only raise expected errors, else the errors are handled at the celery app level as follows (in celery_app.py):
#celery_app.task(name='celery_worker.summary')
def async_summary(data):
"""Background summary processing"""
try:
logger.info('Summarizing text')
return BdsSummary(data, nlp=en_nlp).create_summary()
except Exception as e:
current_task.update_state(state='FAILURE', meta={'error_message': str(traceback.format_exc())})
logger.exception('Text Summary worker raised: %r'%e)
I've been doing some negative testing against my application, and when I pass it data that I know will throw an error (non-text data, for example), when i run r = requests.get('http://my.app.addr:8888/task/my-task-id') I get {'status': 'SUCCESS', 'result': None}. I'm vexed as to why this is happening. Based on my admittedly limited understanding of Celery's behavior, it should update the status to show a traceback and ExceptionClass, why would it not do this?
I am relatively new to Celery, so my understanding of the Canvas that they reference in the documentation is extremely basic. I'm just trying to provide some basic task failure information to the response/task. For context, when I give it proper input, I get back {'status': 'SUCCESS', 'result': {'summary': 'My Summary text here', 'num_sentences': 3, ...}}.
Any insight here would be much appreciated

Related

Debugging "Transaction simulation failed" when sending program instruction (Solana Solidity)

When attempting to make a call to a program compiled with #solana/solidity, I'm getting the following error:
Transaction simulation failed: Error processing Instruction 0: Program failed to complete
Program jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N invoke [1]
Program log: pxKTQePwHC9MiR52J5AYaRtSLAtkVfcoGS3GaLD24YX
Program log: sender account missing from transaction
Program jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N consumed 200000 of 200000 compute units
Program failed to complete: BPF program Panicked in solana.c at 285:0
Program jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N failed: Program failed to complete
jdN1wZjg5P4xi718DG2HraGuxVx1mM7ebjXpxbJ5R3N is the program's public key and pxKTQePwHC9MiR52J5AYaRtSLAtkVfcoGS3GaLD24YX is the sender's public key.
I'm using a fork of the #solana/solidity library that exposes the Transaction object so that it can be signed and sent by Phantom Wallet on the front end. The code that results in the error is as follows:
// Generate the transaction
const transaction = contract.transactions.send(...args);
// Add recent blockhash and fee payer
const recentBlockhash = (await connection.getRecentBlockhash()).blockhash;
transaction.recentBlockhash = recentBlockhash;
transaction.feePayer = provider.publicKey;
// Sign and send the transaction (throws an error)
const res = await provider.signAndSendTransaction(transaction);
I would attempt to debug this further myself, but I'm not sure where to start. Looking up the error message hasn't yielded any results and the error message isn't very descriptive. I'm not sure if this error is occurring within the program execution itself or if it's an issue with the composition of the transaction object. If it is an issue within the program execution, is there a way for me to add logs to my solidity code? If it's an issue with the transaction object, what could be missing? How can I better debug issues like this?
Thank you for any help.
Edit: I'm getting a different error now, although I haven't changed any of the provided code. The error message is now the following:
Phantom - RPC Error: Transaction creation failed. {code: -32003, message: 'Transaction creation failed.'}
Unfortunately this error message is even less helpful than the last one. I'm not sure if Phantom Wallet was updated or if a project dependency was updated at some point, but given the vague nature of both of these error messages and the fact that none of my code has changed, I believe they're being caused by the same issue. Again, any help or debugging tips are appreciated.
I was able to solve this issue, and although I've run into another issue it's not related to the contents of this question so I'll post it separately.
Regarding my edit, I found that the difference between the error messages came down to how I was sending the transaction. At first, I tried sending it with Phantom's .signAndSendTransaction() method, which yielded the second error message (listed under my edit). Then I tried signing & sending the transaction manually, like so:
const signed = await provider.request({
method: 'signTransaction',
params: {
message: bs58.encode(transaction.serializeMessage()),
},
});
const signature = bs58.decode(signed.signature);
transaction.addSignature(provider.publicKey, signature);
await connection.sendRawTransaction(transaction.serialize())
Which yielded the more verbose error included in my original post. That error message did turn out to be helpful once I realized what to look for -- the sending account's public key was missing from the keys field on the TransactionInstruction on the Transaction. I added it in my fork of the #solana/solidity library and the error went away.
In short, the way I was able to debug this was by
Using provider.request({ method: 'signTransaction' }) and connection.sendRawTransaction(transaction) rather than Phantom's provider.signAndSendTransaction() method for a more verbose error message
Logging the transaction object and inspecting the instructions closely
I hope this helps someone else in the future.

Log messages classification/grouping and finding human readable pattern for each group

As new to data science and machine learning I would like to ask the following questions about the problem explained below:
Is machine learning good for such problem or is it overkill?
Could this problem be related with another classical problem that has already published papers so I can choose the right solution?
The problem:
I've been doing a research on pretty interesting problem that I believe many Analytics system solved by automated process.
We are collecting many JavaScript error messages that happen in all kind of browsers and custom build web applications. Our goal is to group the similar messages and label each group by the common pattern the grouped messages have.
Example:
+---------------------------------------------------------------+
|Label: "Forbidden: User session {{placeholder1}} has expired." |
+---------------------------------------------------------------+
|Message: "Forbidden: User session aad3-1v299-4400 has expired."|
|Message: "Forbidden: User session jj41-1d333-bbaa has expired."|
|Message: "Forbidden: User session aab3-bn12n-1111 has expired."|
+---------------------------------------------------------------+
So far we have semi-automated process that solves the problem but from time to time we get new user generated JavaScript error messages that slip through our filters.
I've been thinking about naive 2 step approach that uses existing libraries/tools/algorithms.
For a batch of error lines run an algorithm (e.g. Levenshtein) that finds similar strings. Group the similar errors.
Within a group of similar strings run a diff and find the dynamic parts. Check the diff:
For reference here we have error messages that were collected in the period of one minute:
Message: 3312445,Error: Unknown page "retina_list"
Message: 9931234,Error: Unknown page "widget_summary"
Message: ReferenceError: 'alg,TypeError: g' is undefined
Message: 522574,Error: Unknown page "page_options"
Message: ReferenceError: '297756| Zly / Error in handler for event:,[object Object],ApiListenerError: TypeError: a' is undefined
Message: [Euv warn]: style="width: {{item.evaluation}}em": interpolation in 'style' attribute will cause the attribute to be discarded in Internet Explorer. Use krt-bind:style instead. (found in component: <default-componentfalse2320383>)
Message: [Euv warn]: src="//www.example.com/image/{{item._id}}-1.jpg?w=220&h=165&mode=crop": interpolation in 'src' attribute will cause a 404 request. Use krt-bind:src instead. (found in component: <default-componentfalse8372912>)
Message: [Euv warn]: src="//www.example.com/image/{{item._id}}?car=recommend_sp312": interpolation in 'src' attribute will cause a 404 request. Use krt-bind:src instead. (found in component: <default-componentfalse3330736>)
Message: [Euv warn]: src="//www.example.com/image/{{item._id}}-1.jpg?w=220&h=165&mode=crop": interpolation in 'src' attribute will cause a 404 request. Use krt-bind:src instead. (found in component: <default-componentfalse4893336>)
Message: ReferenceError: 'alg,TypeError: g' is undefined
Message: 73276| Zly / Error in handler for event:,[object Object],ApiListenerError: TypeError: Cannot read property 'campaignName' of undefined
Message: ReferenceError: 'alg,TypeError: g' is undefined
Message: ReferenceError: 'bend,TypeError: f' is undefined
I've been playing lately with Tensorflow JS where I am complete beginner but I may try to train something that could help me classify strings and label them.
I also think that the more serious problem is to generate the group label than grouping the strings because sometimes a pair of similar strings have very different length and the placeholders are long sentences with special characters like \,".^%#&*!?<>|][{}.
As you have pointed out, it sounds like we can separate this problem into two distinct steps.
Group together similar messages, and
Label each group.
Step 1:
While I am not too familiar with Tensorflow JS, I do not believe it is overkill to use Machine Learning (ML) to tackle this problem, especially for step 1.
In fact, this type of problem is a great candidate for a specific form of ML known as Unsupervised Learning, and more specifically, Clustering. In Unsupervised Learning, we look to find “previously unknown patterns in our data without pre-existing labels”.
See: https://en.wikipedia.org/wiki/Unsupervised_learning
In this context, that means that we do not know if “Error Message 1” and “Error Message 2” will belong to the same group before we apply our Clustering algorithm. Using your example, we can reasonably suspect that the messages:
“Forbidden: User session aad3-1v299-4400 has expired"
“Forbidden: User session jj41-1d333-bbaa has expired"
will belong to the same group, but the Clustering algorithm does not know this when it starts.
We can contrast this with a form of Supervised Learning known as Classification, where we know beforehand that we expect a group to have the form
“Forbidden: User session {{placeholder1}} has expired".
Then the pre-existing labels in the data are that messages such as
“Forbidden: User session aad3-1v299-4400 has expired"
“Forbidden: User session jj41-1d333-bbaa has expired"
belong to the expected group just above. We essentially give the ML model a bunch of examples of what this group looks like, and then incoming messages that appear to be similar will be classified to this group.
It sounds like from your description that for Step 1, you want to perform a string match (such as Levenshtein) to compare all of the example messages, and then apply a Clustering algorithm to those results. Then after you have groups (clusters) of messages, Step 2 involves finding an appropriate label for each group.
Step 2:
Agreed that finding an appropriate label for each group is likely the harder problem here. One approach that could be useful is to count how many times a word or phrase appears within a group or cluster, and if it does not meet some pre-defined threshold, to use a placeholder as you have in your example label. For example, the words “Forbidden”, “User”, “session”, and “expired” will be common to the group, whereas the alpha numeric ID’s listed are unique to the individual messages. If the threshold is that a word or phrase must show up in at least two messages, only the ID’s will be replaced by the placeholder.
In this approach, you are essentially looking to find words or phrases that are uncommon to the group, and do not provide useful information in forming an appropriate label. In a way, this is the opposite of a concept used in many search engines that aims to find how common or important a word or phrase is to a document (see https://en.wikipedia.org/wiki/Tf%E2%80%93idf).

WebSphere wsadmin testConnection error message

I'm trying to write a script to test all DataSources of a WebSphere Cell/Node/Cluster. While this is possible from the Admin Console a script is better for certain audiences.
So I found the following article from IBM https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/txml_testconnection.html which looks promising as it describles exactly what I need.
After having a basic script like:
ds_ids = AdminConfig.list("DataSource").splitlines()
for ds_id in ds_ids:
AdminControl.testConnection(ds_id)
I experienced some undocumented behavior. Contrary to the article above the testConnection function does not always return a String, but may also throw a exception.
So I simply use a try-catch block:
try:
AdminControl.testConnection(ds_id)
except: # it actually is a com.ibm.ws.scripting.ScriptingException
exc_type, exc_value, exc_traceback = sys.exc_info()
now when I print the exc_value this is what one gets:
com.ibm.ws.scripting.ScriptingException: com.ibm.websphere.management.exception.AdminException: javax.management.MBeanException: Exception thrown in RequiredModelMBean while trying to invoke operation testConnection
Now this error message is always the same no matter what's wrong. I tested authentication errors, missing WebSphere Variables and missing driver classes.
While the Admin Console prints reasonable messages, the script keeps printing the same meaningless message.
The very weird thing is, as long as I don't catch the exception and the script just exits by error, a descriptive error message is shown.
Accessing the Java-Exceptions cause exc_value.getCause() gives None.
I've also had a look at the DataSource MBeans, but as they only exist if the servers are started, I quickly gave up on them.
I hope someone knows how to access the error messages I see when not catching the Exception.
thanks in advance
After all the research and testing AdminControl seems to be nothing more than a convinience facade to some of the commonly used MBeans.
So I tried issuing the Test Connection Service (like in the java example here https://www.ibm.com/support/knowledgecenter/en/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/cdat_testcon.html
) directly:
ds_id = AdminConfig.list("DataSource").splitlines()[0]
# other queries may be 'process=server1' or 'process=dmgr'
ds_cfg_helpers = __wat.AdminControl.queryNames("WebSphere:process=nodeagent,type=DataSourceCfgHelper,*").splitlines()
try:
# invoke MBean method directly
warning_cnt = __wat.AdminControl.invoke(ds_cfg_helpers[0], "testConnection", ds_id)
if warning_cnt == "0":
print = "success"
else:
print "%s warning(s)" % warning_cnt
except ScriptingException as exc:
# get to the root of all evil ignoring exception wrappers
exc_cause = exc
while exc_cause.getCause():
exc_cause = exc_cause.getCause()
print exc_cause
This works the way I hoped for. The downside is that the code gets much more complicated if one needs to test DataSources that are defined on all kinds of scopes (Cell/Node/Cluster/Server/Application).
I don't need this so I left it out, but I still hope the example is useful to others too.

Determine actual errors from a load job

Using the Java SDK I am creating a load job for just a single record with a fairly complicated schema. When monitoring the status of the load job, it takes a surprisingly long time (but perhaps this is due to working out the schema), but then says:
11:21:06.975 [main] INFO xxx.GoogleBigQuery - Job status (21694ms) create_scans_1384744805079_172221126: DONE
11:24:50.618 [main] ERROR xxx.GoogleBigQuery - Job create_scans_1384744805079_172221126 caused error (invalid) with message
Too many errors encountered. Limit is: 0.
11:24:50.810 [main] ERROR xxx.GoogleBigQuery - {
"message" : "Too many errors encountered. Limit is: 0.",
"reason" : "invalid"
?}
BTW - how do I tell the job that it can have more than zero errors using Java?
This load job does not appear in the list of recent jobs in the console, and as far as I can see, none of the Java objects contains any more details about the actual errors encountered. So how can I pro-grammatically find out what is going wrong? All I can find is:
if (err != null) {
log.error("Job {} caused error ({}) with message\n{}", jobID, err.getReason(), err.getMessage());
try {
log.error(err.toPrettyString());
}
...
In general I am having a difficult time finding good documentation for some of these things and am working it out by trial and error and short snippets of code found on here and older groups. If there is a better source of information than the getting started guides, then I would appreciate any pointers to that information. The Javadoc does not really help and I cannot find any complete examples of loading, querying, testing for errors, cataloging errors and so on.
This job is submitted via a NEWLINE_DELIMITIED_JSON record, supplied to the job via:
InputStream dummy = getClass().getResourceAsStream("/googlebigquery/xxx.record");
final InputStreamContent jsonIn = new InputStreamContent("application/octet-stream", dummy);
createTableJob = bigQuery.jobs().insert(projectId, loadJob, jsonIn).execute();
My authentication and so on seems to work correctly as separate Java code to list the projects, and the datasets in the project all works correctly. So I just need help in working what the actual error is - does it not like the schema (I have records nested within records for instance), or does it think that there is an error in the data I am submitting.
Thanks in advance for any help. The job number cited above is an actual failed load job if that helps any Google staffers who might read this.
It sounds like you have a couple of questions, so I'll try to address them all.
First, the way to get the status of the job that failed is to call jobs().get(jobId), which returns a job object that has an errorResult object that has the error that caused the job to fail (e.g. "too many errors"). The errorStream list is a lost of all of the errors on the job, which should tell you which lines hit errors.
Note if you have the job id, it may be easier to use bq to lookup the job -- you can run bq show <job_id> to get the job error information. If you add the --format=prettyjson it will print out all of the information in the job.
A hint you also might want to consider is to supply your own job id when you create the job -- then even if there is an error starting the job (i.e. the insert() call fails, perhaps due to a network error) you can look up the job to see what actually happened.
To tell BigQuery that some errors are allowed during import, you can use the maxBadResults setting in the load job. See https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#getMaxBadRecords().

Return key in process

The code was run as:
u = subprocess.Popen(['process','abc','def','','ghi','jkl'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
It doesn't work below due to an error occurred:
ValueError: I/O operation on closed file
I suggest you to try pexpect, it is far more well-suited for this tasks (actually, it is a tool built for these kind of tasks).
You can also browse througn examples and see what its usage looks like.