Why would I get a lot of None.get exceptions while draining a streaming pipeline? - spotify-scio

I am running into an issue where I have a streaming Scio pipeline running on Dataflow that deduplicates messages and performs some counting by key. When I try to drain the pipeline I get a large number of None.get exceptions, apparently thrown in my deduplicate step (I am basing this assumption on the step label I see in the Stackdriver log).
We are currently running scio version 0.7.0-beta1 and Beam version 2.8.0. I have tried to protect my code from any potential Nones as much as I can, but this appears to be occurring further down, inside the deduplicate step itself.
The error I am getting is the following:
"java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at com.spotify.scio.util.Functions$$anon$2.mergeAccumulators(Functions.scala:227)
at com.spotify.scio.util.Functions$$anon$2.mergeAccumulators(Functions.scala:220)
at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillCombiningState.getAccum(WindmillStateInternals.java:958)
at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillCombiningState.read(WindmillStateInternals.java:920)
at org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
at org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
at org.apache.beam.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:768)
at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:95)
at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42)
at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115)
at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
at org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:135)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:45)
at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:50)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:202)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:160)
at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1226)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:141)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:965)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
As you can see, this never actually enters my code, and I am unsure how to go about finding the issue. Perhaps it has something to do with the "LateDataDroppingDoFnRunner"? Our allowed lateness is relatively large (3 days, with one-hour windows). Here is the relevant part of the pipeline:
val input = PubsubIO.readStrings()
  .fromSubscription(subscription)
  .withTimestampAttribute("ts")
  .withName("Window messages")
  .withFixedWindows(
    duration = windowSize,
    options = WindowOptions(
      trigger = AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
          .plusDelayOf(earlyFiring))
        .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
          .plusDelayOf(lateFiring)),
      accumulationMode = ACCUMULATING_FIRED_PANES,
      allowedLateness = allowedLateness
    )
  )
  .withName(s"Deduplicate messages")
  .distinctBy[String](f = getId)
...
// I am being overly cautious here because I have been having
// so much trouble debugging this
def getId(message: Map[String, Any]): String = {
  message match {
    case null => {
      logger.warn("message is null when getting id")
      ""
    }
    case message => {
      message.get("id") match {
        case None => {
          logger.warn("id is null in message")
          ""
        }
        case id => id.get.toString
      }
    }
  }
}
I am confused as to how I could possibly be getting a None.get here, and why it would only occur while draining.
Could I have some advice on how to go about debugging this error, or where I should be looking?

Related

Gatling feeder/parameter issue - Exception in thread "main" java.lang.UnsupportedOperationException

I have just started a new project to API-test our service using Gatling. At this point, I want to run a search query; below is the code:
def chnSendToRender(testData: FeederBuilderBase[String]): ChainBuilder = {
  feed(testData)
  exec(api.AdvanceSearch.searchAsset(s"{\"all\":[{\"all:aggregate:text\":{\"contains\":\"#{edlAssetId}_Rendered\"}}]}", "#{authToken}")
    .check(status.is(200).saveAs("searchStatus"))
    .check(jsonPath("$..asset:id").findAll.optional.saveAs("renderedAssetList"))
  )
    .doIf(session => session("searchStatus").as[Int] == 200) {
      exec { session =>
        printConsoleLog("Rendered Asset ID List: " + session("renderedAssetList").as[String], "INFO")
        session
      }
    }
}
I have already declared the feeder in the simulation Scala file:
class GVRERenderEditor_new extends Simulation {
  private val edlToRender = csv("data/render/edl_asset_ids.csv").queue
  private val chnPostRender = components.notifications.notice.JobsPolling_new.chnSendToRender(edlToRender)
  private val scnSendEDLForRender = scenario("Search Post Render")
    .exitBlockOnFail(exec(preSimAuth))
    .exec(chnPostRender)

  setUp(
    scnSendEDLForRender.inject(atOnceUsers(1)).protocols(httpProtocol)
  )
    .maxDuration(sessionDuration.seconds)
    .assertions(global.successfulRequests.percent.is(100))
}
But the Gatling test failed to run, showing this error: Exception in thread "main" java.lang.UnsupportedOperationException: There were no requests sent during the simulation, reports won't be generated
If I hardcode the #{edlAssetId} (put the real edlAssetId in that query), I get a result. I think I am passing the parameter incorrectly here. I've tried to print the output in the console log, but no luck. What's wrong with this code? I would appreciate your help. Thanks!
feed(testData)
exec(api.AdvanceSearch.searchAsset(s"{\"all\":[{\"all:aggregate:text\":{\"contains\":\"#{edlAssetId}_Rendered\"}}]}", "#{authToken}")
  .check(status.is(200).saveAs("searchStatus"))
  .check(jsonPath("$..asset:id").findAll.optional.saveAs("renderedAssetList"))
)
You're missing a . (dot) before the exec to attach it to the feed.
As a result, your method returns only the last instruction, i.e. the exec.
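In other words, the method should return the whole chain, with the exec attached to the feed:
def chnSendToRender(testData: FeederBuilderBase[String]): ChainBuilder = {
  feed(testData)
    .exec(api.AdvanceSearch.searchAsset(s"{\"all\":[{\"all:aggregate:text\":{\"contains\":\"#{edlAssetId}_Rendered\"}}]}", "#{authToken}")
      .check(status.is(200).saveAs("searchStatus"))
      .check(jsonPath("$..asset:id").findAll.optional.saveAs("renderedAssetList"))
    )
    .doIf(session => session("searchStatus").as[Int] == 200) {
      exec { session =>
        printConsoleLog("Rendered Asset ID List: " + session("renderedAssetList").as[String], "INFO")
        session
      }
    }
}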

PLC4X: Exception during scraping of Job

I'm currently developing a project that reads data from 19 Siemens S1500 PLCs and 1 Modicon. I have used the scraper tool, following this tutorial:
PLC4x scraper tutorial
but when the scraper has been running for a short amount of time I get the following exception:
I have varied the scheduled time between 1 and 100, and I always get the same exception once the scraper reaches the same number of received messages.
I have tested whether using PlcDriverManager instead of PooledPlcDriverManager could be a solution, but the same problem persists.
In my pom.xml I use the following dependency:
<dependency>
    <groupId>org.apache.plc4x</groupId>
    <artifactId>plc4j-scraper</artifactId>
    <version>0.7.0</version>
</dependency>
I have tried changing the version to an older one, like 0.6.0 or 0.5.0, but the problem still persists.
If I use the Modicon (Modbus TCP) I also get this exception after a short amount of time.
Does anyone know why this error is happening? Thanks in advance.
Edit: With scraper version 0.8.0-SNAPSHOT I continue to have this problem.
Edit 2: This is my code. I think the problem may be that my scraper is opening a lot of connections and fails when it reaches 65526 messages. But since all the processing happens inside the lambda function and I'm using a PooledPlcDriverManager, I think the scraper is using only one connection, so I don't know where the mistake is.
try {
    // Create a new PooledPlcDriverManager
    PlcDriverManager S7_plcDriverManager = new PooledPlcDriverManager();
    // Trigger Collector
    TriggerCollector S7_triggerCollector = new TriggerCollectorImpl(S7_plcDriverManager);
    // Messages counter
    AtomicInteger messagesCounter = new AtomicInteger();
    // Configure the scraper, by binding a Scraper Configuration, a ResultHandler and a TriggerCollector together
    TriggeredScraperImpl S7_scraper = new TriggeredScraperImpl(S7_scraperConfig, (jobName, sourceName, results) -> {
        LinkedList<Object> S7_results = new LinkedList<>();
        messagesCounter.getAndIncrement();
        S7_results.add(jobName);
        S7_results.add(sourceName);
        S7_results.add(results);
        logger.info("Array: " + String.valueOf(S7_results));
        logger.info("MESSAGE number: " + messagesCounter);
        // Producer topics routing
        String topic = "s7" + S7_results.get(1).toString().substring(S7_results.get(1).toString().indexOf("S7_SourcePLC") + 9, S7_results.get(1).toString().length());
        String key = parseKey_S7("s7");
        String value = parseValue_S7(S7_results.getLast().toString(), S7_results.get(1).toString());
        logger.info("------- PARSED VALUE -------------------------------- " + value);
        // Create my own Kafka Producer
        ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
        // Send Data to Kafka - asynchronous
        producer.send(record, new Callback() {
            public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                // executes every time a record is successfully sent or an exception is thrown
                if (e == null) {
                    // the record was successfully sent
                    logger.info("Received new metadata. \n" +
                            "Topic:" + recordMetadata.topic() + "\n" +
                            "Partition: " + recordMetadata.partition() + "\n" +
                            "Offset: " + recordMetadata.offset() + "\n" +
                            "Timestamp: " + recordMetadata.timestamp());
                } else {
                    logger.error("Error while producing", e);
                }
            }
        });
    }, S7_triggerCollector);

    S7_scraper.start();
    S7_triggerCollector.start();
} catch (ScraperException e) {
    logger.error("Error starting the scraper (S7_scrapper)", e);
}
So in the end it was indeed the PLC that was simply hanging up the connection randomly. However, the NiFi integration should have handled this situation more gracefully. I implemented a fix for this particular error ... could you please give version 0.8.0-SNAPSHOT a try (or use 0.8.0 if we happen to have released it already)?

How to get the result of the KieSession build (i.e. the rules compiler errors)?

I am testing Drools 7.0 with a simple test rule set, using the following code:
KieContainer kc = KieServices.Factory.get().getKieClasspathContainer();
KieSession ksession = kc.newKieSession("DroolsTestKS");
...
The KieSession instance is returned even if there are errors in the rule .drl file, and no exception is thrown. I would like to check the result of the rules compilation.
The Drools reference (see 4.2.2.4) says that the build result can be obtained with:
KieServices kieServices = KieServices.Factory.get();
KieBuilder kieBuilder = kieServices.newKieBuilder( kfs ).buildAll();
assertEquals( 0, kieBuilder.getResults().getMessages( Message.Level.ERROR ).size() );
where kfs is a KieFileSystem instance, but the examples of how to build such a KieFileSystem in the previous pages of the manual are much more complex and a little bit confusing, IMHO.
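From what I can piece together, a minimal version of that approach would be roughly the following (an untested Scala sketch; the in-memory path and rule text are just placeholders):
import org.kie.api.KieServices
import org.kie.api.builder.Message

val kieServices = KieServices.Factory.get()

// write a rule file into an in-memory KieFileSystem (path and DRL content are placeholders)
val kfs = kieServices.newKieFileSystem()
kfs.write("src/main/resources/rules/sample.drl",
  "package rules\nrule \"noop\" when then end\n")

// build it and inspect the compilation messages
val kieBuilder = kieServices.newKieBuilder(kfs).buildAll()
val errors = kieBuilder.getResults.getMessages(Message.Level.ERROR)
if (!errors.isEmpty) {
  println("Rule compilation errors: " + errors)
}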
Is there a way to get the session build result (i.e. to access a KieBuilder) when creating the KieSession with the simple two lines of code I show at the beginning of this post?
I am answering my own question, because I've just found a solution:
KieContainer kc = KieServices.Factory.get().getKieClasspathContainer();
Results rs = kc.verify("KBase");
if (rs.hasMessages(Level.ERROR)) {
    System.out.println("DROOLS ERRORS: " + rs.getMessages());
    ... // handle this
}
I'm wondering whether, with this validation, the actual rules compilation is executed twice or not ... but anyway, this method seems to work.

php-smpp Library not working and fails after two to three SMS

It is the very first time I'm messing with sockets, and I have read many times that this is not for newbies.
The problem is that I'm using the php-smpp library for sending SMS, which works fine, but after delivering two or three SMS, delivery fails with the following warning:
Warning: stream_socket_sendto() [function.stream-socket-sendto]: An existing connection was forcibly closed by the remote host
and to make it work again I need to restart Apache.
The following is the write function which is throwing the exception:
public function write($buf) {
    $null = null;
    $write = array($this->handle_);
    // keep writing until all the data has been written
    while (strlen($buf) > 0) {
        // wait for stream to become available for writing
        $writable = @stream_select($null, $write, $null, $this->sendTimeoutSec_, $this->sendTimeoutUsec_);
        if ($writable > 0) {
            // write buffer to stream
            $written = stream_socket_sendto($this->handle_, $buf);
            if ($written === -1 || $written === false) {
                throw new TTransportException('TSocket: Could not write '.$written.' bytes '.
                    $this->host_.':'.$this->port_);
            }
            // determine how much of the buffer is left to write
            $buf = substr($buf, $written);
        } else if ($writable === 0) {
            throw new TTransportException('TSocket: timed out writing '.strlen($buf).' bytes from '.
                $this->host_.':'.$this->port_);
        } else {
            throw new TTransportException('TSocket: Could not write '.strlen($buf).' bytes '.
                $this->host_.':'.$this->port_);
        }
    }
}
Can anyone please shed some light on this?
It was a bug which I wasn't able to identify or rectify. I then used another library, from http://www.codeforge.com/read/171085/smpp.php__html, and it really saved me.

How to handle authentication and authorization with thrift?

I'm developing a system which uses Thrift. I'd like clients' identities to be checked and operations to be ACLed. Does Thrift provide any support for this?
Not directly. The only way to do this is to have an authentication method which creates a (temporary) key on the server, and then change all your methods so that the first argument is this key and they all additionally raise a not-authenticated error. For instance:
exception NotAuthorisedException {
  1: string errorMessage,
}

exception AuthTimeoutException {
  1: string errorMessage,
}

service MyAuthService {
  string authenticate( 1:string user, 2:string pass )
    throws ( 1:NotAuthorisedException e ),
  string mymethod( 1:string authstring, 2:string otherargs, ... )
    throws ( 1:AuthTimeoutException e, ... ),
}
We use this method and save our keys to a secured memcached instance with a 30-minute timeout for keys, to keep everything "snappy". Clients who receive an AuthTimeoutException are expected to reauthorise and retry, and we have some firewall rules to stop brute-force attacks.
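A rough sketch of the server side of that pattern (the method and exception names come from the IDL above; the spymemcached client, the credential check and the business logic are just stand-ins):
import java.net.InetSocketAddress
import java.util.UUID
import net.spy.memcached.MemcachedClient

// Handler behind MyAuthService; the two exception types are the ones generated from the IDL above.
class MyAuthHandler(cache: MemcachedClient) {
  private val keyTtlSeconds = 30 * 60 // 30 minute key timeout, as described above

  def authenticate(user: String, pass: String): String = {
    if (!credentialsOk(user, pass)) // stand-in credential check
      throw new NotAuthorisedException("bad user/pass")
    val key = UUID.randomUUID().toString
    cache.set(key, keyTtlSeconds, user) // key expires after 30 minutes
    key
  }

  def mymethod(authstring: String, otherargs: String): String = {
    val user = cache.get(authstring)
    if (user == null) // key expired or never issued
      throw new AuthTimeoutException("please authenticate again")
    doWork(user.toString, otherargs) // stand-in business logic
  }

  private def credentialsOk(user: String, pass: String): Boolean = true // stand-in
  private def doWork(user: String, args: String): String = args // stand-in
}

// e.g. new MyAuthHandler(new MemcachedClient(new InetSocketAddress("127.0.0.1", 11211)))
Clients that get an AuthTimeoutException simply call authenticate again and retry the original call.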
Tasks like authorisation and permissions are not considered part of Thrift, mostly because these things are (usually) more related to the application logic than to a general RPC/serialization concept. The only thing that Thrift supports out of the box right now is the TSASLTransport. I can't say much about that one myself, simply because I never felt the need to use it.
The other option could be to make use of THeaderTransport, which unfortunately, at the time of writing, is only implemented in C++. Hence, if you plan to use it with some other language you may have to invest some additional work. Needless to say, we accept contributions ...
A bit late (I guess very late), but I modified the Thrift source code for this a couple of years ago.
I just submitted a ticket with the patch to https://issues.apache.org/jira/browse/THRIFT-4221 for exactly this.
Have a look at that. Basically the proposal is to add a "BeforeAction" hook that does exactly that.
Example of the generated Golang diff:
+ // Called before any other action is called
+ BeforeAction(serviceName string, actionName string, args map[string]interface{}) (err error)
+ // Called if an action returned an error
+ ProcessError(err error) error
}
type MyServiceClient struct {
@@ -391,7 +395,12 @@ func (p *myServiceProcessorMyMethod) Process(seqId int32, iprot, oprot thrift.TP
result := MyServiceMyMethodResult{}
var retval string
var err2 error
- if retval, err2 = p.handler.MyMethod(args.AuthString, args.OtherArgs_); err2 != nil {
+ err2 = p.handler.BeforeAction("MyService", "MyMethod", map[string]interface{}{"AuthString": args.AuthString, "OtherArgs_": args.OtherArgs_})
+ if err2 == nil {
+ retval, err2 = p.handler.MyMethod(args.AuthString, args.OtherArgs_)
+ }
+ if err2 != nil {
+ err2 = p.handler.ProcessError(err2)