Summation of a subdocument field based on a group: unexpected behaviour - mongodb-query

*I have a document structure in which the Salary document is embedded in the Employee_Detail document. As per the MongoDB documentation, we can use $unwind to deconstruct a document and then use the aggregation pipeline, but it's not working. I am using the script below.*
{
    "_id" : ObjectId("5763d4a54da83b98f269878a"),
    "First_Name" : "fgfg",
    "Department" : "QA",
    "Salary" : {
        "HRA" : "1200",
        "Basic" : "2000"
    }
}
I want to get the sum of the basic salary grouped by department. The expected output is:
Department   Total_Basic
QA           2000
I used the following code to get the output: $unwind to deconstruct the document, then an aggregation pipeline to group by department and sum the basic salary.
db.Employee_Detail.aggregate([
    { $unwind: "$Salary" },
    { $group: { "_id": "$Department", total_Basic: { $sum: "$Salary.Basic" } } }
])
But I get the result below:
Department   Total_Basic
QA           0
I think $unwind is not working. Please suggest.

Your main problem is that the field Basic is a string. Second, you do not need $unwind unless the field Salary contains an array.
So first perform an update to convert Basic and HRA to floats (see this stackoverflow question).
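A minimal mongo-shell sketch of that conversion (an assumption on my part: every Salary.Basic and Salary.HRA is a numeric string; adjust the collection and field names to your schema):
db.Employee_Detail.find({ "Salary.Basic": { $type: "string" } }).forEach(function (doc) {
    // parseFloat turns the stored strings into numbers so $sum can add them
    db.Employee_Detail.updateOne(
        { _id: doc._id },
        { $set: {
            "Salary.Basic": parseFloat(doc.Salary.Basic),
            "Salary.HRA": parseFloat(doc.Salary.HRA)
        } }
    );
});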
And then an aggregate operation like this will give you the desired result:
db.Employee_Detail.aggregate([
    { $group: { "_id": "$Department", total_Basic: { $sum: "$Salary.Basic" } } }
])


Get a json attribute with variable name in Postgres

Not an SQL guru here.
I'm trying to write a query that gets a few columns of a table, and only the "icon" value of the JSON column below (named weather). I got to a point where I can list all the attributes right after sessions, which are timestamps, but I've had no luck iterating over them and joining them to the rest of the table.
I also have the feeling that it wasn't very clever to store that value as an attribute name, especially as it's already stored in the "dt" value. Can anybody confirm whether this is best practice or not?
And could somebody help me get the "icon" value?
{
    "lat": 43.6423,
    "lon": -72.2518,
    "timezone": "America/New_York",
    "timezone_offset": -14400,
    "sessions": {
        "1651078174": {
            "dt": 1651078174,
            "sunrise": 1651052825,
            "sunset": 1651103155,
            "temp": 48.45,
            "feels_like": 43.63,
            "pressure": 1009,
            "humidity": 68,
            "dew_point": 38.39,
            "uvi": 5,
            "clouds": 100,
            "visibility": 10000,
            "wind_speed": 11.5,
            "wind_deg": 310,
            "weather": [
                {
                    "id": 804,
                    "main": "Clouds",
                    "description": "overcast clouds",
                    "icon": "04d"
                }
            ]
        }
    }
}
If you have only one icon and multiple sessions, you can run the query below.
If you have multiple icons, you need to apply another CTE layer to extract them (see the sketch after the result below).
For more JSON functions see https://www.postgresql.org/docs/current/functions-json.html
WITH CTE AS (
SELECT value FROM json_each('{
    "lat": 43.6423,
    "lon": -72.2518,
    "timezone": "America/New_York",
    "timezone_offset": -14400,
    "sessions": {
        "1651078174": {
            "dt": 1651078174,
            "sunrise": 1651052825,
            "sunset": 1651103155,
            "temp": 48.45,
            "feels_like": 43.63,
            "pressure": 1009,
            "humidity": 68,
            "dew_point": 38.39,
            "uvi": 5,
            "clouds": 100,
            "visibility": 10000,
            "wind_speed": 11.5,
            "wind_deg": 310,
            "weather": [
                {
                    "id": 804,
                    "main": "Clouds",
                    "description": "overcast clouds",
                    "icon": "04d"
                }
            ]
        }
    }
}') WHERE key = 'sessions')
SELECT json_data.key AS session,
       json_data.value -> 'weather' -> 0 ->> 'icon'
FROM CTE, json_each(CTE.value) json_data
session | ?column?
:--------- | :-------
1651078174 | 04d
db<>fiddle here
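If a session's weather array can hold several entries, a sketch of that extra layer might look like this (my_table and weather_json are hypothetical names for the poster's table and column; json_array_elements is used here because weather is a JSON array):
WITH cte AS (
    SELECT j.value
    FROM my_table, json_each(my_table.weather_json) j
    WHERE j.key = 'sessions'
), session_data AS (
    SELECT s.key AS session, s.value -> 'weather' AS weather
    FROM cte, json_each(cte.value) s
)
SELECT session, w ->> 'icon' AS icon
FROM session_data, json_array_elements(weather) w;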

How to use a function in select along with all the records in Sequelize?

Here is a Sequelize query that retrieves a transformed value based on a table column value.
courses.findAll({
    attributes: [[sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']]
});
The above Sequelize query returns the same result as the following SQL query:
select to_char(bs.session_date, 'Day') as days from courses bs;
Expected output:
I want the transformed value in attributes along with all the records, like below. I know we can list all the column names in the attributes array, but that is tedious. Is there any shortcut similar to the asterisk in a SQL query?
select to_char(bs.session_date, 'Day') as days, * from courses bs;
I tried the Sequelize query below, but no luck.
courses.findAll({
    attributes: [[sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days'], '*']
});
The attributes option can be passed an object as well as an array of fields for finer tuning in situations like this. It's briefly addressed in the documentation.
courses.findAll({
    attributes: {
        include: [
            [sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']
        ]
    }
});
By using include we're adding fields to the courses.* selection. Likewise we can also include an exclude parameter in the attributes object which will remove fields from the courses.* selection.
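For instance, a small sketch of the exclude variant (createdAt/updatedAt are stand-in column names, not taken from the question):
courses.findAll({
    attributes: {
        // drop the timestamp columns from the courses.* selection...
        exclude: ['createdAt', 'updatedAt'],
        // ...while still appending the computed "days" field
        include: [
            [sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']
        ]
    }
});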
There is one shortcut to achieve the asterisk kind of selection in Sequelize, which can be done as follows:
// Get all the column names in an array
let attributes = Object.keys(yourModel.rawAttributes);
courses.findAll({
    attributes: [
        ...attributes,
        [sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']
    ]
});
This is a workaround; there may be a better option.

Using Athena to get terminatingrule from rulegrouplist in AWS WAF logs

I followed these instructions to get my AWS WAF data into an Athena table.
I would like to query the data to find the latest requests with an action of BLOCK. This query works:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  rulegrouplist
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
LIMIT 100;
My issue is cleanly identifying the "terminatingrule" - the reason the request was blocked. As an example, a result has
terminatingrule = AWS-AWSManagedRulesCommonRuleSet
And
rulegrouplist = [
    {
        "nonterminatingmatchingrules": [],
        "rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
        "terminatingrule": "null",
        "excludedrules": "null"
    },
    {
        "nonterminatingmatchingrules": [],
        "rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
        "terminatingrule": "null",
        "excludedrules": "null"
    },
    {
        "nonterminatingmatchingrules": [],
        "rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
        "terminatingrule": "null",
        "excludedrules": "null"
    },
    {
        "nonterminatingmatchingrules": [],
        "rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
        "terminatingrule": {
            "rulematchdetails": "null",
            "action": "BLOCK",
            "ruleid": "NoUserAgent_HEADER"
        },
        "excludedrules": "null"
    }
]
The piece of data I would like separated into a column is rulegrouplist[terminatingrule].ruleid which has a value of NoUserAgent_HEADER
AWS provide useful information on querying nested Athena arrays, but I have been unable to get the result I want.
I have framed this as an AWS question but since Athena uses SQL queries, it's likely that anyone with good SQL skills could work this out.
It's not entirely clear to me exactly what you want, but I'm going to assume you are after the array element where terminatingrule is not "null" (I will also assume that if there are multiple you want the first).
The documentation you link to says that the type of the rulegrouplist column is array<string>. The reason why it is string and not a complex type is that there seem to be multiple different schemas for this column, one example being that the terminatingrule property is either the string "null" or a struct/object – something that can't be described using Athena's type system.
This is not a problem, however. When dealing with JSON there's a whole set of JSON functions that can be used. Here's one way to use json_extract combined with filter and element_at to remove array elements where the terminatingrule property is the string "null" and then pick the first of the remaining elements:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  element_at(
    filter(
      rulegrouplist,
      rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
    ),
    1
  ) AS first_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
You say you want the "latest", which to me is ambiguous and could mean both first non-null and last non-null element. The query above will return the first non-null element, and if you want the last you can change the second argument to element_at to -1 (Athena's array indexing starts from 1, and -1 is counting from the end).
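For the last-element case the only change is the index (a sketch, same assumptions as the query above):
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  element_at(
    filter(
      rulegrouplist,
      rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
    ),
    -1  -- a negative index counts from the end of the array
  ) AS last_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC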
To return the individual ruleid element of the json:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  json_extract(
    element_at(
      filter(
        rulegrouplist,
        rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
      ),
      1
    ),
    '$.terminatingrule.ruleid'
  ) AS ruleid
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
I had the same issue but the solution posted by Theo didn't work for me, even though the table was created according to the instructions linked to in the original post.
Here is what worked for me, which is basically the same as Theo's solution, but without the json conversion:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  rulegrouplist,
  element_at(
    filter(ruleGroupList, ruleGroup -> ruleGroup.terminatingRule IS NOT NULL),
    1
  ).terminatingRule.ruleId AS ruleId
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
LIMIT 100;

SQL to retrieve user IDs from JSON in a table column

I have a table with columns dep_id and dep_value.
dep_value holds JSON data that looks like this:
{
    "users": [
        { "uid": "0" },
        { "uid": "1" },
        { "uid": "2" }
    ]
}
I need SQL that can extract all the values: 0, 1, 2.
I tried using a regex in SQL, but I am not sure how to pattern match in SQL.
SELECT REGEXP_COUNT(dep_value, 'uid') AS user_count
FROM (
    SELECT dep_value FROM users WHERE dep_id = '123'
);
I used this SQL to get the count of uids; similarly, I need to get which uids they are.
Prior to 12c, the APEX_JSON module is useful for parsing JSON without relying on regular expressions. Please refer to this answer for a good example of the APEX_JSON module in action.
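A rough PL/SQL sketch of that approach (hedged: it assumes APEX is installed, that dep_value matches the sample above, and it uses APEX_JSON's %d path placeholder):
DECLARE
    l_count PLS_INTEGER;
BEGIN
    FOR r IN (SELECT dep_value FROM users WHERE dep_id = '123') LOOP
        apex_json.parse(r.dep_value);  -- fills the package-level parser state
        l_count := apex_json.get_count(p_path => 'users');
        FOR i IN 1 .. l_count LOOP
            -- 'users[%d].uid' resolves to users[1].uid, users[2].uid, ...
            dbms_output.put_line(apex_json.get_varchar2(p_path => 'users[%d].uid', p0 => i));
        END LOOP;
    END LOOP;
END;
/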
If your requirement is limited to parsing all the available uids from the JSON string, regardless of what depth they reside at within the JSON, a regex solution like this may be used:
SELECT REGEXP_SUBSTR(dep_value, '"uid" *?: *?"(\d+)"', 1, level, null, 1) AS uid
FROM users
WHERE dep_id = 123
CONNECT BY level <= REGEXP_COUNT(dep_value, '"uid" *?: *?"(\d+)"')
    AND PRIOR dep_id = dep_id          -- you may skip these two lines when
    AND PRIOR sys_guid() IS NOT NULL;  -- running for a single, unique dep_id
Demo

Query data inside an attribute array in a json column in Postgres 9.6

I have a table, say types, which has a JSON column, say location, that looks like this:
{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
} ...
]
}
Each row in the table has this data, and all rows have "type": "state" in it. I want to extract just the value of "type": "state" from every row in the table and put it in a new column. I checked out several questions on SO, like:
Query for element of array in JSON column
Index for finding an element in a JSON array
Query for array elements inside JSON type
but could not get it working. I do not need to query on this. I need the value of this column. I apologize in advance if I missed something.
CREATE TABLE t (data json);

INSERT INTO t VALUES ('{"attribute":[{"type": "state","value": "CA"},{"type": "distance","value": "200.00"}]}'::json);

SELECT elem ->> 'value' AS state
FROM t, json_array_elements(t.data -> 'attribute') elem
WHERE elem ->> 'type' = 'state';
| state |
| :---- |
| CA |
dbfiddle here
I mainly use Redshift where there is a built-in function to do this. So on the off-chance you're there, check it out.
redshift docs
It looks like Postgres has a similar function set:
https://www.postgresql.org/docs/current/static/functions-json.html
I think you'll need to chain three functions together to make this work.
SELECT
    your_field::json -> 'attribute' -> 0 -> 'value'
FROM
    your_table
What I'm trying is a JSON extract by key name, followed by a JSON array extract by index (always the 1st, if your example is consistent with the full data), followed finally by another extract by key name.
Edit: got it working for your example
SELECT
    '{ "attribute":[
        {
            "type": "state",
            "value": "CA"
        },
        {
            "type": "distance",
            "value": "200.00"
        }
    ]
    }'::json -> 'attribute' -> 0 -> 'value'
Returns "CA"
2nd edit: nested querying
@McNets has the right, better answer. But in this dive, I discovered you can nest queries in Postgres! How frickin' cool!
I stored the json as a text field in a dummy table and successfully ran this:
SELECT
    (SELECT value
     FROM json_to_recordset(my_column::json -> 'attribute') AS x(type text, value text)
     WHERE type = 'state')
FROM dummy_table