Splunk query to collect non-unique values as a comma-separated list along with group by on other columns

The Splunk query <my search_criteria> | stats count by Proxy, API, VERB, ClientApp produces the table below.
Proxy                    API                 VERB  ClientApp  count
CUSTOMER_OFFICE_CLIENTS  clients/{clientId}  GET   co_web         5
CUSTOMER_OFFICE_CLIENTS  clients/{clientId}  GET   co_mobile      6
CUSTOMER_OFFICE_CLIENTS  clients/{clientId}  GET   co_tab         4
CUSTOMER_OFFICE_CLIENTS  clients             POST  co_web        57
CUSTOMER_OFFICE_CLIENTS  clients             POST  co_mobile     34
CUSTOMER_OFFICE_CLIENTS  clients             POST  co_tab        50
Is there a way to group by Proxy, API, VERB and collect the ClientApp values as a comma-separated list, as follows, with a Splunk query?
Proxy                    API                 VERB  ClientApp                  count
CUSTOMER_OFFICE_CLIENTS  clients/{clientId}  GET   co_web, co_mobile, co_tab     15
CUSTOMER_OFFICE_CLIENTS  clients             POST  co_web, co_mobile, co_tab    141

You could use values() to return all of the unique ClientApp values in each row.
| stats values(ClientApp) count by Proxy, API, VERB
To get the ClientApp values in a comma-separated list, use the mvjoin function:
| stats values(ClientApp) as ClientApp count by Proxy, API, VERB
| eval ClientApp = mvjoin(ClientApp, ",")
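If you are starting from the already-aggregated table rather than from raw events, a second stats pass can merge the rows; a sketch (note that sum(count) reproduces the expected totals, e.g. 5 + 6 + 4 = 15 for the GET row):
| stats count by Proxy, API, VERB, ClientApp
| stats values(ClientApp) as ClientApp sum(count) as count by Proxy, API, VERB
| eval ClientApp = mvjoin(ClientApp, ", ")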

Related

Which http status code is the appropriate for a request with N items and some of them failed validation?

I'm working on a REST API service that handles roughly 50 req/s in parallel. Every request comes with 100 items; some of them are valid and others fail validation.
For example, take a request to insert a list of 100 products where 10 of them do not meet the specifications of the input schema. I'm wondering what HTTP code would be appropriate to respond with, considering that the service processes the valid items (90 items) and returns the IDs, plus error details for the failed items (10 items).
201 (Created), 202 (Accepted), 400 (Bad Request), 422 (Unprocessable Entity), or none of them?
Thanks beforehand

How to filter the response via API

I wanted to know if this is possible. I have 2 APIs I am testing.
API 1 gives a list of total jobs posted by the user.
Response =
"jobId": 15596, "jobTitle": "PHP developer"
API 2 gives the following response:
"total CVs": 19, "0-7days": 12, "status": "New Resume"
meaning that in the New Resume bucket we have a total of 19 CVs, and of those 19 CVs, 12 have an aging of 0-7 days. This response basically relates to the jobs posted.
When I hit the APIs I am getting the correct numbers, but on the front end API 1 will be used as a dropdown to select a job, and then the New Resume bucket, aging, and total CVs will be shown for that job.
I wanted to know whether it is possible to test the two APIs together, using a filter of sorts like on the front end, or whether the only way to test is to check that the response I am getting is correct.

Cloudwatch Logs Insights working with multiple #messages

I have the following query with the following output:
Query:
filter #message like /A:|B:/
Output:
[INFO] 2020-07-28T09:20:48.406Z requestid A: [{'Delivery': OK, 'Entry': 12323 }]
[INFO] 2020-07-28T09:20:48.407Z requestid B: {'MyValue':0}
I would like to print ONLY the A message when in the B message 'MyValue' = 0. For the example above, I would expect the following output:
Output:
[INFO] 2020-07-28T09:20:48.406Z requestid A: [{'Delivery': OK, 'Entry': 12323 }]
For the next example
[INFO] 2020-07-28T09:20:48.406Z requestid A: [{'Delivery': OK, 'Entry': 12323 }]
[INFO] 2020-07-28T09:20:48.407Z requestid B: {'MyValue':12}
The output should be empty
I can't do something like this, because I would lose the A message:
filter #message like /A:|B:/
filter MyValue = 0
Any ideas?
If anyone is still interested, there IS a way to get the first and last message when grouping by a field. So if you can fit your data into pairs of messages, it might help.
For example, given an API Gateway access log (each row is a #message):
2021-09-14T14:09:00.452+03:00 (01c53288-5d25-*******) Extended Request Id: ***************
2021-09-14T14:09:00.452+03:00 (01c53288-5d25-*******) Verifying Usage Plan for request: 01c53288-5d25-*******. API Key: API Stage: **************/dev
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) API Key authorized because method 'ANY /path/{proxy+}' does not require API Key. Request will not contribute to throttle or quota limits
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) Usage Plan check succeeded for API Key and API Stage **************/dev
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) Starting execution for request: 01c53288-5d25-*******
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) HTTP Method: GET, Resource Path: /path/json.json
2021-09-14T14:09:00.468+03:00 (01c53288-5d25-*******) Method completed with status: 304
We can get the method, URI, and return code from the last 2 rows.
To do this, I parse the relevant data into fields, and then retrieve them by aggregating by request ID (which I also parse out).
The magic is using the stats functions sortsFirst() and sortsLast() and grouping by #reqid (see the AWS docs).
Note: IMO, don't use earliest() and latest(), as they depend on the built-in #timestamp and behaved oddly for me when 2 sequential messages had the same timestamp.
So, for example, using this query:
filter #message like "Method"
| parse #message /\((?<#reqid>.*?)\) (.*?) (Method: (?<#method>.*?), )?(.*?:)* (?<#data>[^\ ]*)/
| sort #timestamp desc
| stats sortsFirst(#method) as #reqMethod, sortsFirst(#data) as #reqPath, sortsLast(#data) as #reqCode by #reqid
| limit 20
We would get the following desired output:
#reqid #reqMethod #reqPath #reqCode
f42e2b44-b858-45cb-***************** GET /path-******.json 304
fecddb03-3804-4ff5-***************** OPTIONS /path-******.json 200
e8e47185-6280-4e1e-***************** GET /path-******.json 304
e4fa9a0c-6d75-4e26-***************** GET /path-******.json 304
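Applying the same pairing idea to the original A/B question might look like the sketch below. Assumptions: the request ID can be parsed out of #message (the layout of the sample lines above is assumed), 'MyValue' appears only in B messages, and since filtering on aggregated values may not be supported, this version just surfaces the A message and B's MyValue side by side per request. (In standard Logs Insights syntax the built-in fields are written with an @ prefix, e.g. @message; the # notation here follows the answer above.)
filter #message like /A:|B:/
| parse #message /Z (?<reqid>\S+) [AB]:/
| parse #message /'MyValue':\s*(?<bValue>\d+)/
| stats sortsFirst(#message) as msgA, sortsLast(bValue) as bValue by reqid
Rows where bValue is 0 are the requests whose A message you are after.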

How to delete/remove unfetched URLs from NUTCH Database (CrawlDB)

I want to crawl a new URL list using Nutch, but there are some unfetched URLs available:
bin/nutch readdb -stats
WebTable statistics start
Statistics for WebTable:
retry 0: 3403
retry 1: 25
retry 2: 2
status 4 (status_redir_temp): 5
status 5 (status_redir_perm): 26
retry 3: 1
status 2 (status_fetched): 704
jobs: {db_stats-job_local_0001={jobName=db_stats, jobID=job_local_0001, counters={Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=227, REDUCE_INPUT_RECORDS=13, SPILLED_RECORDS=26, VIRTUAL_MEMORY_BYTES=0, MAP_INPUT_RECORDS=3431, SPLIT_RAW_BYTES=1059, MAP_OUTPUT_BYTES=181843, REDUCE_SHUFFLE_BYTES=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_INPUT_GROUPS=13, COMBINE_OUTPUT_RECORDS=13, REDUCE_OUTPUT_RECORDS=13, MAP_OUTPUT_RECORDS=13724, COMBINE_INPUT_RECORDS=13724, CPU_MILLISECONDS=0, COMMITTED_HEAP_BYTES=718675968}, File Input Format Counters ={BYTES_READ=0}, File Output Format Counters ={BYTES_WRITTEN=397}, FileSystemCounters={FILE_BYTES_WRITTEN=1034761, FILE_BYTES_READ=912539}}}}
max score: 1.0
status 1 (status_unfetched): 2679
min score: 0.0
status 3 (status_gone): 17
TOTAL urls: 3431
avg score: 0.0043631596
WebTable statistics: done
So, how can I remove them from the Nutch database? Thanks.
You could use CrawlDbMerger, but you would only be able to filter by URL and not by status. The Generator job already has support for using JEXL expressions, but as far as I remember we don't have that feature built into the crawl DB now.
One way would be to list all the URLs with status_unfetched (readdb), write some regexes to block them (using the normal URL filter), and then use the CrawlDbMerger to filter the CrawlDB with this filter enabled; your URLs should disappear.
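A rough sketch of that workflow for a 1.x-style CrawlDB (the stats output above suggests a 2.x Gora-backed WebTable, and flags vary across versions, so treat every option here as an assumption to check against the bin/nutch readdb usage output):
# 1. Dump the CrawlDB and extract the unfetched URLs (paths and flags are assumptions)
bin/nutch readdb crawl/crawldb -dump dump_dir -format csv
grep db_unfetched dump_dir/part-* | cut -d',' -f1 > unfetched_urls.txt
# 2. Add a deny rule per URL to conf/regex-urlfilter.txt, e.g.
#    -^http://example.com/some/unfetched/page$
# 3. Rewrite the CrawlDB with URL filtering enabled (CrawlDbMerger)
bin/nutch mergedb crawl/crawldb_filtered crawl/crawldb -filter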

JMeter: Resetting Count Value

I'm performing basic API testing of CRUD calls to the database. In JMeter, my test plan has 3 thread groups of 1 thread each, with Loop Controllers and Counters set up in each group. The reason for the Counters is that, when saving results to a file, I want to append the counter value to the file's prefix.
The issue is the counters never get reset. So, for example:
Where Count = 1 for all groups, I would expect:
Thread Group 1, filename_1.json
Thread Group 2, filename_2.json
Thread Group 3, filename_3.xml
Where Group 1 Count = 3, Group 2 Count = 2, Group 3 Count = 1, I would expect:
Thread Group 1, filename_1.json, filename_2.json, filename_3.json
Thread Group 2, filename_4.json and filename_5.json
Thread Group 3, filename_6.xml
Instead, where Count = 1 for all groups I'm getting results like:
Thread Group 1, filename_11.json
Thread Group 2, filename_14.json
Thread Group 3, filename_18.xml
After much searching and trying multiple suggestions, I'm still not getting what I expect. Below is a sample of how the test plan is configured.
Any suggestions are much appreciated.
Thread Group 1
HTTP Header Manager (application/json)
Loop Controller
Counter (Start=1, Increment=1, Maximum=100, Num Format=null, Ref Name=LoopCounter1)
HTTP Request (CREATE)
RegEx (RefName=newRequest, Reg Ex = "id":(.+?)\,"displayName", Template=$1$, Match No.=1, Default=NONE)
BeanShell Assertion (Name=newRequest, Param=${__setProperty(newRequest,${newRequest},)})
Save Response to file (File prefix=requestResult_${LoopCounter}, Var Name=newRequestFile)
Loop Controller
HTTP Request (READ)
HTTP Request (UPDATE)
HTTP Request (DELETE)
Thread Group 2
HTTP Header Manager (application/json)
Loop Controller
Counter (Start=1, Increment=1, Maximum=100, Num Format=null, Ref Name=LoopCounter2)
HTTP Request (CREATE)
RegEx (RefName=newContractId, Reg Ex = "id":(.+?)\,"terminationType", Template=$1$, Match No.=1, Default=NONE)
BeanShell Assertion (Name=newContractId, Param=${__setProperty(newContractId,${newContractId},)})
Save Response to file (File prefix=contractRecords_${LoopCounter2}, Var Name=newContractFile)
Loop Controller
HTTP Request (READ)
HTTP Request (UPDATE)
HTTP Request (DELETE)
Thread Group 3
HTTP Header Manager (application/xml)
Loop Controller
Counter (Start=1, Increment=1, Maximum=100, Num Format=null, Ref Name=LoopCounter3)
HTTP Request (CREATE)
RegEx (RefName=newPricingId, Reg Ex = "id":(.+?)\,"terminationType", Template=$1$, Match No.=1, Default=NONE)
BeanShell Assertion (Name=newPricingId, Param=${__setProperty(newPricingId,${newPricingId},)})
Save Response to file (File prefix=pricingRecords_${LoopCounter3}, Var Name=newPricingFile)
Loop Controller
HTTP Request (READ)
HTTP Request (UPDATE)
HTTP Request (DELETE)
UPDATE
I'm closer to the desired results. With the "Reset counter on each Thread Group" option enabled, I would expect Thread Group 2's count to reset to 0. However, it continues from the previous thread group. I need to reset the Counter within each Thread Group. Here's why:
Thread Group 2
HTTP Header Manager (application/json)
Loop Controller
Counter (Start=1, Increment=1, Maximum=100, Num Format=null, Ref Name=LoopCounter2)
HTTP POST Request (CREATE)
${__FileToString(${payloadArchive}/${__eval(contract_${LoopCounter})}.json,,)}
As you can see, I am passing a different file into the HTTP Request's body with each loop of Thread Group 2. Each .json file contains unique elements based on the unique constraints of the database. The files are named "contract_01.json", "contract_02.json", "contract_03.json", etc. This is why I want Thread Group 2 to restart its counter.
SOLVED (?)
The following Counter configurations seem to provide the desired results.
Thread Group 1
HTTP Header Manager (application/json)
Loop Controller
Counter
Start=null,
Increment=null,
Maximum=null,
Num Format=null,
Ref Name=LoopCounter1
HTTP POST Request (CREATE)
Thread Group 2
HTTP Header Manager (application/json)
Loop Controller
Counter
Start=1,
Increment=null,
Maximum=null,
Num Format=null,
Ref Name=LoopCounter2
HTTP POST Request (CREATE)
${__FileToString(${payloadArchive}/${__eval(contract_${LoopCounter})}.json,,)}
As you can see, with each loop of Thread Group 2 I am passing a different file into the HTTP Request's body. (Each file contains unique elements based on the unique constraints of the database.) The files are named "contract_1.json", "contract_2.json", etc. Hence why I wanted Thread Group 2 to restart its counter.
It is now working and grabbing the correct file contents on each loop. However, I'm not sure why Start has to be null for Counter 1 and 1 for Counter 2.
If anyone sees a flaw in this, I'd appreciate knowing why and how to correct it. I've only been using JMeter for 1 week, with no Java (or any programming) background.
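As an alternative sketch (my assumption, not part of the original post): a JSR223 PreProcessor with Groovy can maintain a counter scoped purely to the current thread's variables, sidestepping the Counter element's reset quirks. The variable name LoopCounter2 mirrors the configuration above:
// JSR223 PreProcessor (Groovy), placed before the CREATE request.
// 'vars' is the JMeterVariables object for the current thread, so this
// counter starts from 1 independently in each thread group.
String key = 'LoopCounter2'
int current = (vars.get(key) ?: '0') as int
vars.put(key, String.valueOf(current + 1))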