Following up on this GWT-RPC question (and answer #1) re. field size checking, I would like to know the right way to check pre-deserialization for max data size sent to server, something like if request data size > X then abort the request. Valuing simplicity and based on answer on aforementioned question/answer, I am inclined to believe checking for max overall request size would suffice, finer grained checks (i.e., field level checks) could be deferred to post-deserialization, but I am open to any best-practice suggestion.
Tech stack of interest: GWT-RPC client-server communication with Apache-Tomcat front-end web-server.
I suppose a first step would be to globally limit the size of any request (LimitRequestBody in httpd.conf or/and others?).
Are there finer-grained checks like something that can be set per RPC request? If so where, how? How much security value do finer grain checks bring over one global setting?
To frame the question more specifically with an example, let's suppose we have the two following RPC request signatures on the same servlet:
public void rpc1(A a, B b) throws MyException;
public void rpc2(C c, D d) throws MyException;
Suppose I approximately know the following max sizes:
a: 10 kB
b: 40 kB
c: 1 M B
d: 1 kB
Then I expect the following max sizes:
rpc1: 50 kB
rpc2: 1 MB
In the context of this example, my questions are:
Where/how to configure the max size of any request -- i.e., 1 MB in my above example? I believe it is LimitRequestBody in httpd.conf but not 100% sure whether it is the only parameter for this purpose.
If possible, where/how to configure max size per servlet -- i.e., max size of any rpc in my servlet is 1 MB?
If possible, where/how to configure/check max size per rpc request -- i.e., max rpc1 size is 50 kB and max rpc2 size is 1 MB?
If possible, where/how to configure/check max size per rpc request argument -- i.e., a is 10 kB, b is 40 kB, c is 1 MB, and d is 1 kB. I suspect it makes practical sense to do post-deserialization, doesn't it?
For practical purposes based of cost/benefit, what level of pre-deserialization checking is generally recommended -- 1. global, 2. servlet, 3. rpc, 4. object-argument? Stated differently, what is roughly the cost-complexity on one hand and the added value on the other hand of each of the above pre-deserialization level checks?
Thanks much in advance.
Based on what I have learned since I asked the question, my own answer and strategy until someone can show me better is:
First line of defense and check is Apache's LimitRequestBody set in httpd.conf. It is the overall max for all rpc calls across all servlets.
Second line of defense is servlet pre-deserialization by overriding GWT AbstractRemoteServiceServlet.readContent. For instance, one could do it as shown further below I suppose. This was the heart of what I was fishing for in this question.
Then one can further check each rpc call argument post-deserialization. One could conveniently use the JSR 303 validation both on the server and client side -- see references StackOverflow and gwt r.e. client side.
Example on how to override AbstractRemoteServiceServlet.readContent:
#Override
protected String readContent(HttpServletRequest request) throws ServletException, IOException
{
final int contentLength = request.getContentLength();
// _maxRequestSize should be large enough to be applicable to all rpc calls within this servlet.
if (contentLength > _maxRequestSize)
throw new IOException("Request too large");
final String requestPayload = super.readContent(request);
return requestPayload;
}
See this question in case the max request size if > 2GB.
From a security perspective, this strategy seems quite reasonable to me to control the size of data users send to server.
Related
I am using below settings:
allowOverwrite: false
nodeParallelOperations: 1
autoFlushFrequency: 10
perNodeBufferSize: 5000000
My records size is around 2000 bytes. And see the "grid-data-loader-flusher"
thread stats as below:
Thread Count Average Longest Duration
grid-data-loader-flusher-#100 38 4,737,793.579 30,427,862 180,036,156
What would be the best configurations for Data streamer?
Thanks
Its good to have parallel streaming mode for data streamer. You can achieve this by collecting you key-value records in java Map and call the streamer.addData() method in parallel mode over that map. Here is the snippet.
maptoStream.entrySet().parallelStream().forEach(streamer::addData);
Also, if you are setting allowOverWrite to false then you cant use your custom stream receiver to process your collection of records. In this case it will skip the record(s) if it is already there in cache.
Regarding buffersize, you need to wait till buffer gets full each time to get it flushed automatically to cache. flush frequency comes to your rescue in this case and it will do periodic flushing. so whatever condition first satisfies(either buffer gets full or flush frequency reach) it will do flush. I preferred calling manual flush after above method call.
I observed that streamer works well with much more big collection on which you will call streamer.addData() method in parallel.
I to want get and historize queue metrics for the "Enqueued, Dequeued an Size" (Terminology formerly met on ActiveMQ).
The moving charts provided in the management plugin are not enough for the monitoring that I need to do.
So with RabbitMQ, I'm getting data from https://rabbitmq-server:15672/api/queues/myvhost
This returns json.. for a queue, I can obtain real life production data like :
"messages":0, // for "Size"
"message_stats":{
"deliver_get":171528, // for "Dequeued"
"ack":162348,
"redeliver":9513,
"deliver_no_ack":0,
"deliver":171528,
"get":0,
"publish":51293 // for "Enqueued"
(...)
I'm in particular surprised by the publish counter:
Its value can even decrease between 2 measures done with a couple of minutes of delay ! (see sample chart around 17:00)
As you can see on my data, the deliver_get is significantly larger than the publish.
https://my-rabbitmq:15672/doc/stats.html doesn't give a lot of details that could explain what I actually notice.
Also, under the message_stats object that I obtain, I'm missing the some counters like confirm and return which could be related to the enqueuing.
Are there relationships between these metrics ? (like deliver_get + messages = redeliver + publish.. but that one doesn't work with my figures)
Is there another more detailed documentation about these metrics ?
I wrote a sequential REST API crawler in http4s & fs2 here:
https://gist.github.com/NicolasRouquette/656ed7a2d6984ce0995fd78a3aec2566
This is to query a REST API service to get a starting set of IDs, fetch elements for a batch of IDs and continue based on the cross-reference IDs found in these elements until there are no new IDs to fetch and return a map of all elements fetched.
This works; however, the performance is inadequate -- too slow!
Since I don't have access to the server, I tried experimenting with varying batch sizes, from 10, 50, 100, 200, 500 and even batching all IDs in a single query. Query time increases significantly with batch size.
At large sizes (500 and all), I even got HTTP 500 responses from the server.
I would like to experiment with batching parallel queries in a load-balancing fashion using a pool of threads; however, it is unclear to me how to do this based on the fs2 docs.
Can someone provide suggestions how to achieve this?
Regarding using http4s & fs2: Well, I found this library fairly easy to use for simple client-side programming. Given the emphasis on supporting tasks, streams, etc..., I presume that batching parallel queries should be doable somehow.
fs2.concurrent.join will allow you to run multiple streams concurrently. The specific section in the guide is available at https://github.com/functional-streams-for-scala/fs2/blob/v0.9.7/docs/guide.md#concurrency
For your use case you could take your queue of ids, chunk them, create a http task and then wrap it in a stream. You would then run this stream of streams concurrently with join and combine the results.
def createHttpRequest(ids: Seq[ID]): Task[(ElementMap, Set[ID])] = ???
def fetch(queue: Set[ID]): Task[(ElementMap, Set[ID])] = {
val resultStreams = Stream.emits(queue.toSeq)
.vectorChunkN(batchSize)
.map(createHttpRequest)
.map(Stream.eval)
val resultStream = fs2.concurrent.join(maxOpen)(resultStreams)
resultStream.runFold((Map.empty[ID, Element], Set.empty[ID])) {
case ((a, b), (_a, _b)) => (a ++ _a, b ++ _b)
}
}
I am stuck at a point where I am configuring the DCM module and the current parameter I am trying to configure DcmTimStrP2AdjustServer,
The requirement is P2CAN_SERVER_MAX = 25ms; P2STARCAN_SERVER_MAX = 5000ms;
Is DcmDspSessionP2ServerMax the same as P2CAN_SERVER_MAX? and if it is the same
What is the need for DcmTimStrP2AdjustServer and how do I find the best value for DcmTimStrP2AdjustServer.(The values all should be a multiple of DcmTaskTime which I find to be logical).
DcmTaskTime = 5ms;
I am following Autosar 4.0.3, using ETAS tool for configuring the parameters.
To fulfill your requirement, you need to configure respectively
DcmDspSessionP2ServerMax & DcmDspSessionP2StarServerMax for each session control in the DcmDspSessionRows at Dcm/DcmConfigSet/DcmDsp/DcmDspSession/.
i.e.
DcmDspSessionP2ServerMax 25
DcmDspSessionP2StarServerMax 5000
There is no DcmTimStrP2AdjustServer, but I guess you're referring to DcmTimStrP2ServerAdjust instead. DcmTimStrP2ServerAdjust & DcmTimStrP2StarServerAdjust should be configured to a multiple of your DcmTaskTime (5ms in your case, so i.e. 5ms, 10ms, 15, ms, ... is applicable) and are used to safeguard that the response is available on the bus before triggering the P2 or P2* timeouts. In your case you may want to set these values to the same values as in the DcmDspSessionRows if there is no other specification given, because the chosen timeout values there are already multiples of your DcmTaskTime:
DcmTimStrP2ServerAdjust 25
DcmTimStrP2StarServerAdjust 5000
The adjust value is an internal value, in order to adjust the delay between the Dcm Transmit Request and the message being actually on the Bus.
The definition of P2ServerMax and P2*ServerMax and their corresponding Adjust values is the same:
This parameter is used to guarantee that the diagnostic response is available on the bus before reaching P2 by adjusting the current DcmDspSessionP2ServerMax. This parameter mainly represents the software architecture dependent communication delay between the time the transmission is initiated by DCM and the time when the message is actually transmitted to the bus
I have a job that periodically does some work involving ServerXmlHttpRquest to perform an HTTP POST. The job runs every 60 seconds.
And normally it runs without issue. But there's about a 1 in 50,000 chance (every two or three months) that it will hang:
IXMLHttpRequest http = new ServerXmlHttpRequest();
http.open("POST", deleteUrl, false, "", "");
http.send(stuffToDelete); <---hang
When it hangs, not even the Task Scheduler (with the option enabled to kill the job if it takes longer than 3 minutes to run) can end the task. I have to connect to the remote customer's network, get on the server, and use Task Manager to kill the process.
And then its good for another month or three.
Eventually i started using Task Manager to create a process dump,
so i could analyze where the hang is. After five crash dumps (over the last 11 months or so) i get a consistent picture:
ntdll.dll!_NtWaitForMultipleObjects#20()
KERNELBASE.dll!_WaitForMultipleObjectsEx#20()
user32.dll!MsgWaitForMultipleObjectsEx()
user32.dll!_MsgWaitForMultipleObjects#20()
urlmon.dll!CTransaction::CompleteOperation(int fNested) Line 2496
urlmon.dll!CTransaction::StartEx(IUri * pIUri, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4453 C++
urlmon.dll!CTransaction::Start(const wchar_t * pwzURL, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4515 C++
msxml3.dll!URLMONRequest::send()
msxml3.dll!XMLHttp::send()
Contoso.exe!FrobImporter.TFrobImporter.DeleteFrobs Line 971
Contoso.exe!FrobImporter.TFrobImporter.ImportCore Line 1583
Contoso.exe!FrobImporter.TFrobImporter.RunImport Line 1070
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.HandleFrobImport Line 433
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.CoreExecute Line 71
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.Execute Line 84
Contoso.exe!Contoso.Contoso Line 167
kernel32.dll!#BaseThreadInitThunk#12()
ntdll.dll!__RtlUserThreadStart()
ntdll.dll!__RtlUserThreadStart#8()
So i do a ServerXmlHttpRequest.send, and it never returns. It will sit there for days (causing the system to miss financial transactions, until come Sunday night i get a call that it's broken).
It is of no help unless someone knows how to debug code, but the registers in the stalled thread at the time of the dump are:
EAX 00000030
EBX 00000000
ECX 00000000
EDX 00000000
ESI 002CAC08
EDI 00000001
EIP 732A08A7
ESP 0018F684
EBP 0018F6C8
EFL 00000000
Windows Server 2012 R2
Microsoft IIS/8.5
Default timeouts of ServerXmlHttpRequest
You can use serverXmlHttpRequest.setTimeouts(...) to configure the four classes of timeouts:
resolveTimeout: The value is applied to mapping host names (such as "www.microsoft.com") to IP addresses; the default value is infinite, meaning no timeout.
connectTimeout: A long integer. The value is applied to establishing a communication socket with the target server, with a default timeout value of 60 seconds.
sendTimeout: The value applies to sending an individual packet of request data (if any) on the communication socket to the target server. A large request sent to a server will normally be broken up into multiple packets; the send timeout applies to sending each packet individually. The default value is 30 seconds.
receiveTimeout: The value applies to receiving a packet of response data from the target server. Large responses will be broken up into multiple packets; the receive timeout applies to fetching each packet of data off the socket. The default value is 30 seconds.
The KB305053 (a server that decides to keep the connection open will cause serverXmlHttpRequest to wait for the connection to close) seems like it plausibly could be the issue. But the 30 second default timeout would have taken care of that.
Possible workaround - Add myself to a Job
The Windows Task Scheduler is unable to terminate the task; even though the option is enabled to do do.
I will look into using the Windows Job API to add my self process to a job, and use SetInformationJobObject to set a time limit on my process:
CreateJobObject
AssignProcessToJobObject
SetInformationJobObject
to limit my process to three minutes of execution time:
PerProcessUserTimeLimit
If LimitFlags specifies
JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process
user-mode execution time limit, in 100-nanosecond ticks. Otherwise,
this member is ignored.
The system periodically checks to determine
whether each process associated with the job has accumulated more
user-mode time than the set limit. If it has, the process is
terminated.
If the job is nested, the effective limit is the most
restrictive limit in the job chain.
Although since Task Scheduler uses Job objects to also limit a task's time, i'm not hopeful that the Job Object can limit a job either.
Edit: Job objects cannot limit a process by process time - only user time. And with a process idle waiting for an object, it will not accumulate any user time - certainly not three minutes worth.
Bonus Reading
How can a ServerXMLHTTP GET request hang? (GET, not POST)
KB305053: ServerXMLHTTP Stops Responding When You Send a POST Request (which says the timeout should expire; where mine does not)
MS Forums: oHttp.Send - Hangs (HEAD, not POST)
MS Forums: ASP to test SOAP WebService using MSXML2.ServerXMLHTTP Send hangs
CC to MS Support Forums
Consider switching to a newer, supported API.
msxml6.dll using MSXML2.ServerXMLHTTP.6.0
winhttpcom.dll using WinHttp.WinHttpRequest.5.1.
The msxml3.dll library is no longer supported and is only kept around for compatibility reasons. Plus, there were a number of security and stability improvements included with msxml4.dll (and newer) that you are missing out on.