Collect statistics on current traffic with Bro

I want to collect statistics on traffic every 10 seconds, and the only tool I have found is the connection_state_remove event:
event connection_state_remove(c: connection)
{
    SumStats::observe("traffic", [$str="all"], [$num=c$orig$num_bytes_ip]);
}
How do I deal with connections that have not been removed by the end of this period? How can I get statistics from them?

The events you're processing are independent of the time interval at which the SumStats framework reports statistics. First, you need to define what exactly are the statistics you care about — for example, you may want to count the number of connections for which Bro completes processing in a given time interval. Second, you need to define the time interval (in your case, 10 seconds) and how to process the statistical observations in the SumStats framework. This latter part is missing in your snippet: you're only making an observation but not telling the framework what to do with it.
The examples in the SumStats documentation are very close to what you're looking for.

Related

Check number of slots used by a query in BigQuery

Is there a way to check how many slots were used by a query over the period of its execution in BigQuery? I checked the execution plan but I could just see the Slot Time in ms but could not see any parameter or any graph to show the number of slots used over the period of execution. I even tried looking at Stackdriver Monitoring but I could not find anything like this. Please let me know if it can be calculated in some way or if I can see it somewhere I might've missed seeing.
A BigQuery job will report the total number of slot-milliseconds from the extended query stats in the job metadata, which is analogous to computational cost. Each stage of the query plan also indicates input stats for the stage, which can be used to indicate the number of units of work each stage dispatched.
More details about the representation can be found in the REST reference for jobs. See statistics.query.totalSlotMs and statistics.query.queryPlan[].parallelInputs for more information.
BigQuery now provides a key in the Jobs API JSON called "timeline". This structure provides "statistics.query.timeline[].completedUnits" which you can obtain either during job execution or after. If you choose to pull this information after a job has executed, "completedUnits" will be the cumulative sum of all the units of work (slots) utilised during the query execution.
The question might have two parts though: (1) Total number of slots utilised (units of work completed) or (2) Maximum parallel number of units used at a point in time by the query.
For (1), the answer is as above, given by "completedUnits".
For (2), you might need to consider the maximum value of queryPlan.parallelInputs across all query stages, which would indicate the maximum "number of parallelizable units of work for the stage" (https://cloud.google.com/bigquery/query-plan-explanation)
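A rough sketch of how figures (1) and (2) could be read out of the job metadata. The dictionary below only imitates the shape of the statistics.query part of a Jobs API response, with made-up values, and slot_summary is a hypothetical helper name. The average-slots figure (totalSlotMs divided by elapsed milliseconds) is an extra assumption, not something stated in the answers above.

```python
def slot_summary(query_stats):
    """Summarise slot usage from the `statistics.query` part of a job resource."""
    timeline = query_stats.get("timeline", [])
    plan = query_stats.get("queryPlan", [])

    # The REST API encodes int64 fields as strings, so convert before arithmetic.
    # Assumption: the last timeline sample holds the final cumulative values.
    last = timeline[-1] if timeline else {}
    completed_units = int(last.get("completedUnits", 0))
    elapsed_ms = int(last.get("elapsedMs", 0))
    total_slot_ms = int(query_stats.get("totalSlotMs", 0))

    # (2) Largest number of parallelizable work units across all stages.
    max_parallel = max((int(s.get("parallelInputs", 0)) for s in plan), default=0)

    # Assumption: average concurrent slots = slot-milliseconds / wall-clock milliseconds.
    avg_slots = total_slot_ms / elapsed_ms if elapsed_ms else 0.0

    return {
        "completed_units": completed_units,
        "max_parallel_inputs": max_parallel,
        "avg_slots": avg_slots,
    }


# Made-up sample data in the shape of `statistics.query`.
sample = {
    "totalSlotMs": "120000",
    "queryPlan": [
        {"name": "S00: Input", "parallelInputs": "40"},
        {"name": "S01: Output", "parallelInputs": "8"},
    ],
    "timeline": [
        {"elapsedMs": "1000", "completedUnits": "20"},
        {"elapsedMs": "4000", "completedUnits": "48"},
    ],
}

summary = slot_summary(sample)
print(summary)
```

With the sample values, the summary reports 48 completed units, a maximum of 40 parallel inputs, and an average of 30 slots.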
If, after this, you also want to know whether the 2000 parallel slots allocated across your entire on-demand project are sufficient, you would need to find the point in time, across all queries running in your project, at which slot utilisation is at its maximum. This is not a trivial task, but Stackdriver Monitoring gives the clearest view of it.

Waiting time of SUMO

I am using SUMO for traffic signal control and want to optimize the signal phases to minimize some objectives. During the process, I use the traci module to read the state of the traffic junction. The confusing part is traci.lane.getWaitingTime.
I don't know how the waiting time is calculated, and when I compare it against the output of two detectors, it seems too large.
Can someone explain how the waiting time is calculated in SUMO?
The waiting time essentially counts the number of seconds a vehicle has had a speed of less than 0.1 m/s. In the case of traci.lane, this means it is the number of (nearly) standing vehicles multiplied by the time step length (since traci.lane returns the values for the last step).
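The calculation described above can be sketched as a small function. This only mirrors the explanation in this answer and is not taken from SUMO's source; lane_waiting_time and its parameters are hypothetical names. In a TraCI script the speeds would presumably come from calling traci.vehicle.getSpeed on each ID returned by traci.lane.getLastStepVehicleIDs.

```python
WAITING_SPEED_THRESHOLD = 0.1  # m/s, the threshold named in the answer above


def lane_waiting_time(speeds, step_length=1.0):
    """Waiting time for one step: halted vehicles times the step length (seconds)."""
    halted = sum(1 for v in speeds if v < WAITING_SPEED_THRESHOLD)
    return halted * step_length


# Four vehicles on the lane, two of them (nearly) standing.
print(lane_waiting_time([0.0, 0.05, 3.2, 13.9], step_length=1.0))  # 2.0
```

Under this reading, a lane with many queued vehicles reports a large value even for a single step, which may explain why the number looks high next to a detector reading.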

How to reduce time allotted for a batch of HITs?

Today I created a small batch of 20 categorization HITs with the name Grammatical or Ungrammatical using the web UI. Can you tell me the easiest way to manage this batch so that I can reduce its time allotted from 1 hour to 15 minutes and also remove the Masters qualification requirement? This is a very simple task that's set to auto-approve within 1 hour, and I am fine with that. I just need to make it more lucrative for people to attempt this at the penny rate.
You need to register a new HITType with the relevant properties (reduced time and no masters qualification) and then perform a ChangeHITTypeOfHIT operation on all of the HITs in the batch.
API documentation here: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_ChangeHITTypeOfHITOperation.html

How much time is going to take a process - by WCF Service

I’m here with another question this time.
I have an application built to move data from one database to another. It also handles validation and comparison between the databases. When we start moving the data from source to destination, it takes a while, as it always deals with thousands of records. We use a WCF service and SQL Server on the server side and WPF on the client side to handle this.
Now I have a requirement to tell the user how long the process is going to take, based on the number of records in the source database (eventually that is what I am going to create in the destination database), right before the user starts the move.
Now my real question: what is the best way to do this and get an estimated time out of it?
Thanks, I appreciate your help.
If your estimates are going to be updated during the upload process, you can take the time already spent, divide it by the number of processed records, and multiply by the number of remaining records. This gives you a continuously updated estimate of the remaining time:
TimeSpan spent = DateTime.Now - startTime;
// Work in ticks: TimeSpan has no division operator on .NET Framework. Assumes numberOfProcessedRecords > 0.
TimeSpan remaining = TimeSpan.FromTicks(spent.Ticks / numberOfProcessedRecords * numberOfRemainingRecords);
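For illustration, the same running-average estimate as a minimal, self-contained Python sketch. The function name estimate_remaining and the sample numbers are made up; the guard for zero processed records is an addition, since no average exists before the first record completes.

```python
from datetime import timedelta


def estimate_remaining(elapsed, processed, remaining):
    """Estimate time left as (average time per processed record) * records left."""
    if processed <= 0:
        return None  # no basis for an estimate yet
    return (elapsed / processed) * remaining


# 1500 records took 30 seconds, 4500 records are still to go.
eta = estimate_remaining(timedelta(seconds=30), processed=1500, remaining=4500)
print(eta)  # 0:01:30
```

The estimate gets steadier as more records are processed, so it may be worth waiting for a few hundred records before showing it to the user.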

SQL connection lifetime

I am working on an API to query a database server (Oracle in my case) to retrieve massive amounts of data. (This is actually a layer on top of JDBC.)
The API I created tries to avoid, as much as possible, loading all of the queried data into memory. That is, I prefer to iterate over the result set and process the returned rows one by one instead of loading every row into memory and processing them later.
But I am wondering if this is the best practice since it has some issues:
The result set is kept open during the whole processing; if processing takes as long as retrieving the data, my result set will be open twice as long.
Doing another query inside my processing loop means opening another result set while I am already using one; it may not be a good idea to open too many result sets simultaneously.
On the other hand, it has some advantages:
I never have more than one row of data in memory for a result set. Since my queries tend to return around 100k rows, it may be worth it.
Since my framework is heavily based on functional programming concepts, I never rely on multiple rows being in memory at the same time.
Starting the processing on the first rows returned while the database engine is still returning other rows is a great performance boost.
In response to Gandalf, I add some more information:
I will always have to process the entire result set
I am not doing any aggregation of rows
I am integrating with a master data management application and retrieving data in order to either validate them or export them using many different formats (to the ERP, to the web platform, etc.)
There is no universal answer. I personally implemented both solutions dozens of times.
It depends on what matters more to you: memory or network traffic.
If you have a fast network connection (LAN) and a poor client machine, then fetch data row by row from the server.
If you work over the Internet, then batch fetching will help you.
You can tune the prefetch count or your database layer's properties to find a happy medium.
The rule of thumb is: fetch as much as you can hold without noticing it.
If you need a more detailed analysis, there are six factors involved:
Row generation response time / rate (how soon Oracle generates the first row / last row)
Row delivery response time / rate (how soon can you get first row / last row)
Row processing response time / rate (how soon can you show first row / last row)
One of them will be the bottleneck.
As a rule, rate and response time pull against each other.
With prefetching, you can trade the row delivery response time against the row delivery rate: a higher prefetch count increases the rate but makes the first row arrive later, while a lower prefetch count does the opposite.
Choose which one is more important to you.
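A minimal sketch of the batching idea, using Python's built-in sqlite3 as a stand-in for Oracle/JDBC. The table, the iter_rows helper and the batch size are all made up, and because an in-memory database has no network in between, this only shows the control flow and not the performance effect. The batch_size here plays the role that the prefetch count (for example the JDBC fetch size) plays in the answer above.

```python
import sqlite3


def iter_rows(cursor, batch_size):
    """Yield rows one at a time while pulling them from the driver in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1000)],
)

cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
total = 0
for row_id, name in iter_rows(cursor, batch_size=100):
    total += row_id  # stand-in for real per-row processing
print(total)  # 500500
conn.close()
```

The caller still sees one row at a time, so the processing code stays the same whichever batch size turns out to work best.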
You can also do the following: create separate threads for fetching and processing.
Select just enough rows to keep the user occupied in low-prefetch mode (which responds quickly), then switch to high-prefetch mode.
It will fetch the rows in the background and you can process them in the background too, while the user browses over the first rows.
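The separate-threads idea can be sketched with a bounded queue between a fetching thread and a processing loop. Everything here is illustrative: the generator stands in for a database result set, and fetch_rows, process_rows and the queue size are hypothetical names and values.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the result set


def fetch_rows(source, out_queue):
    """Producer: push rows into the queue; put() blocks while the queue is full."""
    for row in source:
        out_queue.put(row)
    out_queue.put(_DONE)


def process_rows(in_queue, handle):
    """Consumer: handle rows as they arrive and return how many were processed."""
    count = 0
    while True:
        row = in_queue.get()
        if row is _DONE:
            return count
        handle(row)
        count += 1


rows = ((i, f"item-{i}") for i in range(500))  # stand-in for a result set
buffer = queue.Queue(maxsize=50)  # bounds how many rows sit in memory at once
seen = []

producer = threading.Thread(target=fetch_rows, args=(rows, buffer))
producer.start()
processed = process_rows(buffer, seen.append)
producer.join()
print(processed)  # 500
```

The queue size caps memory use much like a prefetch count does, while letting fetching and processing overlap.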