Error: ‘Read only object or object without ownership can't be applied to mutable function append!’ - mutable

I'm using DolphinDB subscribeTable but the error occurs:
I’d like to know what is wrong with my code. I wanted to subscribe to stream tables in DolphinDB and the code is:
csEngine1=createCrossSectionalEngine(name=sTb_Cs + "_eng",
dummyTable=objByName(sTb_join12),
keyColumn=`symbol,
triggeringPattern="perRow",
useSystemTime=false, timeColumn=`TimeStamp)
subscribeTable(tableName=sTb_join12, actionName="do"+sTb_Cs, offset=-1, handler=append!{csEngine1}, msgAsTable=true, hash=5, reconnect=true)
The UDF handler of another subscription to sTb_join12 also queries csEngine1.
def append_plan (csEngine1, candidates2, strategy, msg){
subscribeTable(tableName=sTb_join12, actionName="do" +strategy, offset=-1, handler=append_plan {csEngine1, candidates2, strategy}, msgAsTable=true, hash=6, reconnect=true)
Please tell why the error occurs and how I can fix it.

The problem is that the parameter in your function (append_plan in this case) was not mutable. With an immutable parameter, the object (csEngine1) will be set to readOnly when calling the function. If another thread writes this object (as shown in your first subscription), it will report the error.
Solution #1
Pass the same hash value in the two subscribeTable functions that subscribe to sTb_join12. With the same hash value, the two subscription tasks will be processed in the same thread in turn, which avoids concurrency and error reporting.
Solution #2
Simplify the two subscribeTable functions into one statement. ‘append!’ the cross-section engine with udf, and then continue with the above process to ensure the append operation and select query to the table CSEngines1 are sequentially processed. You can refer to the following example:
tradesCrossEngine008=createCrossSectionalEngine(name="tradesCrossEngine008", dummyTable=objByName(sTc_join12_testSTR), keyColumn=`Symbol, triggeringPattern=`perRow, useSystemTime=false, timeColumn=`TimeStamp)
def append_plan(msg){
getStreamEngine("tradesCrossEngine008").append!(msg)
select Symbol, rank(left_v1+left_v2) as rmk from getStreamEngine("tradesCrossEngine008")
}
subscribeTable(tableName=sTc_join12_testSTR, actionName="tradesCrossEngine008", offset=0, handler=append_plan, msgAsTable=true, hash=9, reconnect=true)

Related

Airflow/SQLAlchemy Error - Loading context has changed within a load/refresh handler

I am attempting to use clairvoyant's db-cleanup dag to clear metadata in our xcom table, but when I run it, I receive the following warning, printed thousands of times before I manually stop the job in order to not take down our mysql instance:
SAWarning: Loading context for <BaseXCom at 0x7f26f789b370> has changed within a load/refresh handler, suggesting a row refresh operation took place. If this event handler is expected to be emitting row refresh operations within an existing load or refresh operation, set restore_load_context=True when establishing the listener to ensure the context remains unchanged when the event handler completes.
The other cleanup tasks work fine, but it is the xcom table in particular I am having trouble with. We have hundreds/thousands of active dags and so the xcom table is constantly being written to nearly every second or two. I think that is what is causing this error, the fact that the data is continually changing while it is being queried.
I have been unable to find the cause of this or any examples of how this can be resolved. I tried adding a "restore_load_context":True line as per SQLAlchemy docs but it did not work.
Here are the snippets I attempted to add to the database object and the cleanup task:
{
"airflow_db_model": XCom,
"age_check_column": XCom.execution_date,
"keep_last": False,
"keep_last_filters": None,
"keep_last_group_by": None,
"restore_load_context":True
},
....
def cleanup_function(**context):
logging.info("Retrieving max_execution_date from XCom")
max_date = context["ti"].xcom_pull(
task_ids=print_configuration.task_id, key="max_date"
)
max_date = dateutil.parser.parse(max_date) # stored as iso8601 str in xcom
airflow_db_model = context["params"].get("airflow_db_model")
state = context["params"].get("state")
age_check_column = context["params"].get("age_check_column")
keep_last = context["params"].get("keep_last")
keep_last_filters = context["params"].get("keep_last_filters")
keep_last_group_by = context["params"].get("keep_last_group_by")
restore_load_context = context["params"].get("restore_load_context")
In order to not paste too much code here, I am using the same code in the db-cleanup dag. Has anyone encountered this and found a way to resolve?
I am very inexperienced with sqlalchemy and am entirely unsure where else to place this code or how to go about it.

Persist detailed information about failed Item processing

I´ve got a Job that runs a TaskletStep, then a chunk-based step and then another TaskletStep.
In each of these steps, errors (in the form of Exceptions) can occur.
The chunk-based step looks like this:
stepBuilderFactory
.get("step2")
.chunk<SomeItem, SomeItem>(1)
.reader(flatFileItemReader)
.processor(itemProcessor)
.writer {}
.faultTolerant()
.skipPolicy { _ , _ -> true } // skip all Exceptions and continue
.taskExecutor(taskExecutor)
.throttleLimit(taskExecutor.corePoolSize)
.build()
The whole job definition:
jobBuilderFactory.get("job1")
.validator(validator())
.preventRestart()
.start(taskletStep1)
.next(step2)
.next(taskletStep2)
.build()
I expected that Spring Batch somehow picks up the Exceptions that occur along the way, so I can then create a Report including them after the Job has finished processing. Looking at the different contexts, there´s also fields that should contain failureExceptions. However, it seems there´s no such information (especially for the chunked step).
What would be a good approach if I need information about:
what Exceptions did occur in which Job execution
which Item was the one that triggered it
The JobExecution provides a method to get all failure exceptions that happened during the job. You can use that in a JobExecutionListener#afterJob(JobExecution jobExecution) to generate your report.
In regards to which items caused the issue, this will depend on where the exception happens (during the read, process or write operation). For this requirement, you can use one of the ItemReadListener, ItemProcessListener or ItemWriteListener to keep record of the those items (For example, by adding them to the job execution context to be able to get access to them in the JobExecutionListener#afterJob method for your report).

How should I avoid sending duplicate emails using mailgun, taskqueue and ndb?

I am using the taskqueue API to send multiple emails is small groups with mailgun. My code looks more or less like this:
class CpMsg(ndb.Model):
group = ndb.KeyProperty()
sent = ndb.BooleanProperty()
#Other properties
def send_mail(messages):
"""Sends a request to mailgun's API"""
# Some code
pass
class MailTask(TaskHandler):
def post(self):
p_key = utils.key_from_string(self.request.get('p'))
msgs = CpMsg.query(
CpMsg.group==p_key,
CpMsg.sent==False).fetch(BATCH_SIZE)
if msgs:
send_mail(msgs)
for msg in msgs:
msg.sent = True
ndb.put_multi(msgs)
#Call the task again in COOLDOWN seconds
The code above has been working fine, but according to the docs, the taskqueue API guarantees that a task is delivered at least once, so tasks should be idempotent. Now, most of the time this would be the case with the above code, since it only gets messages that have the 'sent' property equal to False. The problem is that non ancestor ndb queries are only eventually consistent, which means that if the task is executed twice in quick succession the query may return stale results and include the messages that were just sent.
I thought of including an ancestor for the messages, but since the sent emails will be in the thousands I'm worried that may mean having large entity groups, which have a limited write throughput.
Should I use an ancestor to make the queries? Or maybe there is a way to configure mailgun to avoid sending the same email twice? Should I just accept the risk that in some rare cases a few emails may be sent more than once?
One possible approach to avoid the eventual consistency hurdle is to make the query a keys_only one, then iterate through the message keys to get the actual messages by key lookup (strong consistency), check if msg.sent is True and skip sending those messages in such case. Something along these lines:
msg_keys = CpMsg.query(
CpMsg.group==p_key,
CpMsg.sent==False).fetch(BATCH_SIZE, keys_only=True)
if not msg_keys:
return
msgs = ndb.get_multi(msg_keys)
msgs_to_send = []
for msg in msgs:
if not msg.sent:
msgs_to_send.append(msg)
if msgs_to_send:
send_mail(msgs_to_send)
for msg in msgs_to_send:
msg.sent = True
ndb.put_multi(msgs_to_send)
You'd also have to make your post call transactional (with the #ndb.transactional() decorator).
This should address the duplicates caused by the query eventual consistency. However there still is room for duplicates caused by transaction retries due to datastore contention (or any other reason) - as the send_mail() call isn't idempotent. Sending one message at a time (maybe using the task queue) could reduce the chance of that happening. See also GAE/P: Transaction safety with API calls

Apache Flink - exception handling in "keyBy"

It may happen that data that enters Flink job triggers exception either due to bug in code or lack of validation.
My goal is to provide consistent way of exception handling that our team could use within Flink jobs that won't cause any downtime in production.
Restart strategies do not seem to be applicable here as:
simple restart won't fix issue and we fall into restart loop
we cannot simply skip event
they can be good for OOME or some transient issues
we cannot add custom one
try/catch block in "keyBy" function does not fully help as:
there's no way to skip event in "keyBy" after exception is handled
Sample code:
env.addSource(kafkaConsumer)
.keyBy(keySelector) // must return one result for one entry
.flatMap(mapFunction) // we can skip some entries here in case of errors
.addSink(new PrintSinkFunction<>());
env.execute("Flink Application");
I'd like to have ability to skip processing of event that caused issue in "keyBy" and similar methods that are supposed to return exactly one result.
Beside the suggestion of #phanhuy152 (which seems totally legit to me) why not filter before keyBy?
env.addSource(kafkaConsumer)
.filter(invalidKeys)
.keyBy(keySelector) // must return one result for one entry
.flatMap(mapFunction) // we can skip some entries here in case of errors
.addSink(new PrintSinkFunction<>());
env.execute("Flink Application");
Can you reserve a special value like "NULL" for the keyBy to return in such case? Then your flatMap function can skip when encounter such value?

UI5 Odata batch update - Connect return messages to single operation

I perform a batch update on an OData v2 model, that contains several operations.
The update is performed in a single changeset, so that a single failed operation fails the whole update.
If one operation fails (due to business logic) and a message returns. Is there a way to know which operation triggered the message? The response I get contains the message text and nothing else that seems useful.
The error function is triggered for every failed operation, and contains the same message every time.
Maybe there is a specific way the message should be issued on the SAP backend?
The ABAP method /iwbep/if_message_container->ADD_MESSAGE has a parameter IV_KEY_TAB, but it does not seem to affect anything.
Edit:
Clarification following conversation.
My service does not return a list of messages, it performs updates. If one of the update operations fails with a message, I want to connect the message to the specific update that failed, preferably without modifying the message text.
An example of the error response I'm getting:
{
"error":{
"code":"SY/530",
"message":{
"lang":"en",
"value":"<My message text>"
},
"innererror":{
"application":{
"component_id":"",
"service_namespace":"/SAP/",
"service_id":"<My service>",
"service_version":"0001"
},
"transactionid":"",
"timestamp":"20181231084555.1576790",
"Error_Resolution":{
// Sap standard message here
},
"errordetails":[
{
"code":"<My message class>",
"message":"<My message text>",
"propertyref":"",
"severity":"error",
"target":""
},
{
"code":"/IWBEP/CX_MGW_BUSI_EXCEPTION",
"message":"An exception was raised.",
"propertyref":"",
"severity":"error",
"target":""
}
]
}
}
}
If you want to keep the same exact message for all operations the simplest way to be able to determine the message origin would be to add a specific 'tag' to it in the backend.
For example, you can fill the PARAMETER field of the message structure with a specific value for each operation. This way you can easily determine the origin in gateway or frontend.
If I understand your question correctly, you could try the following.
override the following DPC methods:
changeset_begin: set cv_defer_mode to abap_true
changeset_end: just redefine it, with nothing inside
changeset_process:
here you get a list of your requests in a table, which has the operation number (thats what you seek), and the key value structure (iwbep.blablabla) for the call.
loop over the table, and call the method for each of the entries.
put the result of each of the operations in the CT_CHANGESET_RESPONSE.
in case of one operation failing, you can raise the busi_exception in there and there you can access the actual operation number.
for further information about batch processing you can check out this link:
https://blogs.sap.com/2018/05/06/batch-request-in-sap-gateway/
is that what you meant?