Get sum as an additional column instead of the only column - SQL

I'd like to sum up all the order amounts, grouped per user from my database, for a specific set of users.
I'm using .sum() and .groupBy() to do this, like so:
knex('orders')
  .select(['id_user', 'order_amount'])
  .whereIn('id_user', ['user-foo', 'user-bar'])
  .sum('order_amount')
  .groupBy('id_user')
This returns the sums:
[ { sum: 500 }, { sum: 600 } ]
But now there's no way to know which sum corresponds to which user.
This would be my ideal result:
[ { id_user: 'user-foo', sum: 500 },
  { id_user: 'user-bar', sum: 600 } ]
How can I also get the id_user column for each sum?

You'll need to use knex.raw() for that:
knex.select('id_user', knex.raw('SUM(order_amount)')).from('orders').groupBy('id_user');
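Alternatively, the aggregate can be aliased through the query builder itself. A minimal sketch, keeping the whereIn filter from the question (the object-alias form of .sum() assumes a reasonably recent knex version):
knex('orders')
  .select('id_user')
  .sum({ sum: 'order_amount' }) // alias the aggregate so each row comes back as { id_user, sum }
  .whereIn('id_user', ['user-foo', 'user-bar'])
  .groupBy('id_user');
// expected result: [ { id_user: 'user-foo', sum: 500 }, { id_user: 'user-bar', sum: 600 } ]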


Can I create an array of values in one column, based on array overlaps in another column?

I have two datasets that I'm trying to consolidate to represent all of the unique touch points for a given user. I've gotten as far as using ARRAY_AGG to aggregate everything down to a single session identifier, but now I want to consolidate the identifiers themselves and am stuck.
The source data looks like this:
Session_GUID | User_GUID | Interaction_GUID
Session_1    | User_1    | Interact_A
Session_1    | User_1    | Interact_B
Session_1    | User_2    | Interact_C
Session_2    | User_2    | Interact_D
Session_3    | User_3    | Interact_C
Session_4    | User_4    | Interact_E
And I've aggregated it down with a simple
SELECT
    Session_GUID,
    ARRAY_AGG(DISTINCT User_GUID) AS User_GUID_Array,
    ARRAY_AGG(DISTINCT Interaction_GUID) AS Interaction_GUID_Array
FROM
    source_table
GROUP BY
    Session_GUID
Which gets me here:
Session_GUID | User_GUID_Array    | Interaction_GUID_Array
Session_1    | [ User_1, User_2 ] | [ Interact_A, Interact_B, Interact_C ]
Session_2    | [ User_2 ]         | [ Interact_D ]
Session_3    | [ User_3 ]         | [ Interact_C ]
Session_4    | [ User_4 ]         | [ Interact_E ]
Now I'd like to aggregate everything based on matches in either of the two arrays.
So from the above, Session_1 and Session_2 get grouped together based on User_GUID matches, and Session_3 gets added too based on Interaction_GUID matches.
This seems like it should be doable with some sort of "do another ARRAY_AGG if these intersect/overlap conditions are met" logic, but I'm at the limits of my SQL knowledge and haven't been able to figure it out.
The end result I'm looking for is this:
Session_Array                       | User_GUID_Array            | Interaction_GUID_Array
[ Session_1, Session_2, Session_3 ] | [ User_1, User_2, User_3 ] | [ Interact_A, Interact_B, Interact_C, Interact_D ]
[ Session_4 ]                       | [ User_4 ]                 | [ Interact_E ]
Grouping by more than one column usually requires a recursive CTE, but in this case the grouping is by array intersection. One way to accomplish this is with a user-defined table function (UDTF) that maintains two two-dimensional arrays, one for each column. As each row goes through, the function checks whether it has already seen any of the row's values in either of those arrays. If it finds an overlap with one of the existing groups, it returns that group's number; otherwise it starts a new group. A CTE can then use those group numbers for the final array aggregation.
Note that this approach only works for small partitions. In this example the partition is the constant 1, which means the entire table; if the table is large, the UDTF will run out of memory. It therefore requires a partition key (a date, an ID of some sort, etc.) that keeps partitions small, perhaps a few thousand rows each. If the partitions are significantly larger than that, this approach won't work.
create or replace function GROUP_ANY(ARR1 array, ARR2 array)
returns table(GROUP_NUMBER float)
language javascript
as
$$
{
    initialize: function (argumentInfo, context) {
        this.isInitialized = false;
        this.groupNumber = 0;
        this.arr1 = [];  // one entry per group: all ARR1 values seen for that group
        this.arr2 = [];  // one entry per group: all ARR2 values seen for that group
    },
    processRow: function (row, rowWriter, context) {
        var arraysIntersect;
        var g;
        if (!this.isInitialized) {
            // The first row starts group 0.
            this.isInitialized = true;
            this.arr1[0] = row.ARR1;
            this.arr2[0] = row.ARR2;
        } else {
            arraysIntersect = false;
            for (g = 0; g <= this.groupNumber; g++) {
                // If the row overlaps an existing group in either column, merge it in.
                if (arraysOverlap(this.arr1[g], row.ARR1) || arraysOverlap(this.arr2[g], row.ARR2)) {
                    this.arr1[g] = this.arr1[g].concat(row.ARR1);
                    this.arr2[g] = this.arr2[g].concat(row.ARR2);
                    arraysIntersect = true;
                }
                if (arraysIntersect) {
                    break;
                }
            }
            if (!arraysIntersect) {
                // No overlap with any existing group: start a new one.
                this.arr1.push(row.ARR1);
                this.arr2.push(row.ARR2);
                this.groupNumber++;
            }
        }
        if (arraysIntersect) {
            rowWriter.writeRow({GROUP_NUMBER: g});
        } else {
            rowWriter.writeRow({GROUP_NUMBER: this.groupNumber});
        }
        function arraysOverlap(arr1, arr2) {
            return arr1.some(r => arr2.includes(r));
        }
    },
    finalize: function (rowWriter, context) {/*...*/},
}
$$;
create or replace table T1(Session_GUID string, User_GUID string, Interaction_GUID string);
insert into T1 (Session_GUID, User_GUID, Interaction_GUID) values
('Session_1', 'User_1', 'Interact_A'),
('Session_1', 'User_1', 'Interact_B'),
('Session_1', 'User_2', 'Interact_C'),
('Session_2', 'User_2', 'Interact_D'),
('Session_3', 'User_3', 'Interact_C'),
('Session_4', 'User_4', 'Interact_E'),
('Session_5', 'User_5', 'Interact_F'),
('Session_6', 'User_4', 'Interact_G'),
('Session_7', 'User_6', 'Interact_E'),
('Session_8', 'User_8', 'Interact_H')
;
with SESSIONS as
(
    select Session_GUID
          ,array_unique_agg(User_GUID) USER_GUID
          ,array_unique_agg(Interaction_GUID) INTERACTION_GUID
    from T1
    group by Session_GUID
), GROUPS as
(
    select * from SESSIONS,
        table(group_any(USER_GUID, INTERACTION_GUID)
              over (partition by 1 order by SESSION_GUID))
)
select  array_agg(SESSION_GUID) SESSION_GUIDS
       ,array_union_agg(USER_GUID) USER_GUIDS
       ,array_union_agg(INTERACTION_GUID) INTERACTION_GUIDS
from GROUPS
group by GROUP_NUMBER
;
Output:
SESSION_GUIDS                             | USER_GUIDS                       | INTERACTION_GUIDS
[ "Session_5" ]                           | [ "User_5" ]                     | [ "Interact_F" ]
[ "Session_1", "Session_2", "Session_3" ] | [ "User_1", "User_2", "User_3" ] | [ "Interact_A", "Interact_B", "Interact_C", "Interact_D" ]
[ "Session_8" ]                           | [ "User_8" ]                     | [ "Interact_H" ]
[ "Session_4", "Session_6", "Session_7" ] | [ "User_4", "User_6" ]           | [ "Interact_E", "Interact_G" ]

Querying Line Items of Order with JSON Functions in BigQuery

I've been banging my head here for the past 2 hours with all the available JSON_... functions in BigQuery. I've read quite a few questions here, but no matter what I try, I never succeed in extracting the "amounts" from my JSON below.
This is my JSON stored in a BQ column:
{
    "lines": [
        {
            "id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
            "charges": [
                {
                    "type": "price",
                    "amount": 50.0
                },
                {
                    "type": "discount",
                    "amount": -40.00
                }
            ]
        },
        {
            "id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
            "charges": [
                {
                    "type": "price",
                    "amount": 20.00
                },
                {
                    "type": "discount",
                    "amount": 0.00
                }
            ]
        }
    ]
}
Imagine the above being an order containing multiple items.
I am trying to get a sum of all amounts => 50-40+20+0. The result needs to be 30 = the total order price.
Is it possible to pull all the amount values and then have them summed up just via SQL without any custom JS functions? I guess the summing is the easy part - getting the amounts into an array is the challenge here.
Use below
select (
select sum(cast(json_value(charge, '$.amount') as float64))
from unnest(json_extract_array(order_as_json, '$.lines')) line,
unnest(json_extract_array(line, '$.charges')) charge
) total
from your_table
If applied to the sample data in your question, the output is 30.
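To sanity-check it without a table, the same query can be run against the question's JSON inlined as a one-row CTE (your_table and order_as_json match the names assumed in the query above):
with your_table as (
  select '''{"lines": [
    {"id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
     "charges": [{"type": "price", "amount": 50.0}, {"type": "discount", "amount": -40.00}]},
    {"id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
     "charges": [{"type": "price", "amount": 20.00}, {"type": "discount", "amount": 0.00}]}
  ]}''' as order_as_json
)
select (
  select sum(cast(json_value(charge, '$.amount') as float64))
  from unnest(json_extract_array(order_as_json, '$.lines')) line,
       unnest(json_extract_array(line, '$.charges')) charge
) total
from your_table
-- total: 30.0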

Fetching second maximum date json object (postgres)

I'm trying to fetch the JSON object with the second max date from a jsonb column.
Here is the jsonb column value:
{
    "id": "90909",
    "records": [
        {
            "name": "john",
            "date": "2016-06-16"
        },
        {
            "name": "kiran",
            "date": "2017-06-16"
        },
        {
            "name": "koiy",
            "date": "2018-06-16"
        }
    ]
}
How do I select the JSON object with the second maximum date from the jsonb column?
Expected output:
{
    "name": "kiran",
    "date": "2017-06-16"
}
And if there is only one object inside records, that object should count as the second max date.
Any suggestions would be helpful.
One way of doing it could be: get all the dates, sort them in descending order, then take the second one. All these steps can be done in a single query, as in the sketch below.
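A minimal sketch, assuming the jsonb column is named value in a table t (it expands the objects with jsonb_array_elements rather than literally building an array, which amounts to the same steps):
select s.elem
from t
cross join lateral (
    select e as elem
    from jsonb_array_elements(t.value -> 'records') e
    order by (e ->> 'date')::date desc
    -- skip the max date, unless there is only one record
    offset least(jsonb_array_length(t.value -> 'records') - 1, 1)
    limit 1
) s;
With the sample value this returns {"name": "kiran", "date": "2017-06-16"}; with a single-element records array the offset becomes 0, so the lone object is returned, as required.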

MongoDB Aggregate $min not calculating

I am using this aggregation:
db.quantum_auto_keys.aggregate([
  { $match: { table_name: 'PIZZA_ORDERS' } },
  {
    $group: {
      _id: {
        onDate: {
          $dateToString: { format: '%Y-%m-%d', date: '$created_on', timezone: 'America/Los_Angeles' }
        },
        table_name: '$table_name'
      },
      min: { $min: '$last_number' }
    }
  },
  { $sort: { _id: 1 } }
]);
It ignores the onDate grouping and returns the min for the collection where table_name = PIZZA_ORDERS.
When I use $max it calculates the maximum pizza orders by day. $count also returns the number of orders per day correctly.
How should I go about getting the minimum and maximum values via Aggregate or is there a different way to get that information from my collection?
I updated to MongoDB 4.4, and it turned out table_name was not the correct grouping key. Changing the _id to include another field got the $min calculation to be correct.
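For reference, here is a sketch (not the poster's exact fix, which depends on a field not shown in the question) of a single $group computing both min and max per day, using only fields that do appear in the question:
db.quantum_auto_keys.aggregate([
  { $match: { table_name: 'PIZZA_ORDERS' } },
  {
    $group: {
      _id: {
        onDate: {
          $dateToString: { format: '%Y-%m-%d', date: '$created_on', timezone: 'America/Los_Angeles' }
        },
        table_name: '$table_name'
      },
      min: { $min: '$last_number' },  // smallest last_number per day
      max: { $max: '$last_number' }   // largest last_number per day
    }
  },
  { $sort: { '_id.onDate': 1 } }
]);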

Rally Lookback: help fetching all history based on future state

Probably a lookback newbie question, but how do I return all of the history for stories based on an attribute that gets set later in their history?
Specifically, I want to load all of the history for all stories/defects in my project that have an accepted date in the last two weeks.
The following query doesn't work because it (of course) only returns those history records where the accepted date matches the query. What I actually want is all of the history records for any defect/story that is eventually accepted after that date...
filters : [
    {
        property : "_TypeHierarchy",
        value : { $nin : [ -51009, -51012, -51031, -51078 ] }
    },
    {
        property : "_ProjectHierarchy",
        value : this.getContext().getProject().ObjectID
    },
    {
        property : "AcceptedDate",
        value : { $gt : Ext.Date.format(twoWeeksBack, 'Y-m-d') }
    }
]
Thanks to Nick's help, I divided this into two queries. The first grabs the final history record for stories/defects with an accepted date. I accumulate the object IDs from that list, then kick off the second query, which finds the entire history of each object returned by the first query.
Note that I'm caching some variables in the "window" scope - that's my lame workaround for the fact that I can never quite figure out what "this" refers to when I need it...
window.projectId = this.getContext().getProject().ObjectID;
I also end up flushing window.objectIds (where I store the results from the first query) when I execute the query, so I don't accumulate results across reloads. I'm sure there's a better way to do this, but I struggle with scope in JavaScript.
Filter for the first query:
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "AcceptedDate",
value : {
$gt : Ext.Date.format(monthBack, 'Y-m-d')
}
}, {
property : "_ValidTo",
value : {
$gt : '3000-01-01'
}
} ]
Filter for second query:
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "ObjectID",
value : {
$in : window.objectIds
}
}, {
property : "c_Kanban",
value : {
$exists : true
}
} ]
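As an aside, instead of stashing results on window, the first query's results can be handed to the second query through a load-callback closure. A sketch, assuming Rally App SDK 2.x (store config names can vary by SDK version; firstQueryFilters stands for the first filter block above, and loadFullHistory is a hypothetical helper that runs the second query):
var firstStore = Ext.create('Rally.data.lookback.SnapshotStore', {
    fetch : [ 'ObjectID' ],
    filters : firstQueryFilters,
    listeners : {
        load : function (store, records) {
            // the closure keeps objectIds out of window scope
            var objectIds = Ext.Array.map(records, function (r) {
                return r.get('ObjectID');
            });
            loadFullHistory(objectIds); // kick off the second query
        }
    }
});
firstStore.load();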
Here's an alternative query that will return only the snapshots representing the transition into the Accepted state.
find:{
_TypeHierarchy: { $in : [ -51038, -51006 ] },
_ProjectHierarchy: 999999,
ScheduleState: { $gte: "Accepted" },
"_PreviousValues.ScheduleState": {$lt: "Accepted", $exists: true},
AcceptedDate: { $gte: "2014-02-01TZ" }
}
A second query is still required if you need the full history of the stories/defects, but this should at least give you a cleaner initial list. Also note that Project: 999999 limits results to the given project, while _ProjectHierarchy also matches stories/defects in child projects.
In case you are interested, the query is similar to scenario #5 in the Lookback API documentation at https://rally1.rallydev.com/analytics/doc/.
If I understand the question, you want to get stories that are currently accepted, and you want the returned results to include snapshots from the time when they were not yet accepted. Before you write code, you may test an equivalent query in the browser and see if the results look as expected.
Here is an example - you will have to change OIDs.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"_ProjectHierarchy":12352608219,"_TypeHierarchy":"HierarchicalRequirement","ScheduleState":"Accepted",_ValidFrom:{$gte: "2013-11-01",$lt: "2014-01-01"}}},sort:[{"ObjectID": 1},{_ValidFrom: 1}]&fields=["Name","ScheduleState","PlanEstimate"]&hydrate=["ScheduleState"]
You are correct that a query like this: find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}
will return one snapshot per story that satisfies it.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}&fields=true&start=0&pagesize=1000
but a query like this: find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}
will return the whole history of the 3 artifacts:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}&fields=true&start=0&pagesize=1000
To illustrate, here are some numbers: the last query returns 17 results for me. I checked each story's revision history; the numbers of revisions per story are 5, 5, and 7 respectively, and their sum equals the total result count returned by the query.
On the other hand, the number of stories that meet find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}} is 13, and the query based on the accepted date returns 13 results, one snapshot per story.