How to aggregate based on other rows that share a key?


I have a table in the following format:
I feel like this should be simple but I'm struggling to come up with a performant query that can perform aggregations based on other rows with a shared key. For example, I want to sum the rows for a user with the key MediaLength but only if the rows with the key Score that share the event_id are greater than or equal to 3.
The result from a simple sum:
SELECT SUM(value::float) FROM data WHERE key = 'MediaLength' AND user_id = '9765f312-0d0b-4db0-b4c5-217eec81d7c3'
Result: 40
The result I am trying to achieve here is 15. In the table above you can see the rows are children of an event. I only want to sum the value column where key = 'MediaLength' and its sister row with key = 'Score' has value >= 3.
This is the query I have tried so far, but it seems a bit messy and fails with a "more than one row returned by a subquery" error:
select
  sum(value::float)
    filter (where (
      select d.value::float
      from data d
      where d.event_id = event_id
        and d.key = 'Score'
    ) >= 3)
from data
where user_id = '9765f312-0d0b-4db0-b4c5-217eec81d7c3'
This is a simple example but in the future I would need to filter on potentially multiple other keys as well, so any advice on how to extend that is also hugely appreciated.

I only want to sum the value column where key = 'MediaLength' and its sister row with key = 'Score' has value >= 3.
SELECT sum(value::float)  -- why the cast?
FROM   data d
WHERE  user_id = '9765f312-0d0b-4db0-b4c5-217eec81d7c3'
AND    key = 'MediaLength'
AND    EXISTS (
   SELECT FROM data ds
   WHERE  ds.event_id = d.event_id
   AND    ds.user_id = d.user_id  -- !
   AND    ds.key = 'Score'
   AND    ds.value >= 3
   );
Here, rows with key = 'MediaLength' qualify if any sister passes the filter. (There may be more sisters failing the test.)
If there can only ever be a single qualifying sister row (enforced by a unique constraint / index?), a self-join is a bit simpler:
SELECT sum(d.value::float)  -- qualify "value": both d and ds have that column
FROM   data d
JOIN   data ds USING (event_id, user_id)
WHERE  d.user_id = '9765f312-0d0b-4db0-b4c5-217eec81d7c3'
AND    d.key = 'MediaLength'
AND    ds.key = 'Score'
AND    ds.value >= 3;
The self-join would produce multiple result rows for multiple qualifying sister rows, so each MediaLength value would be counted once per match and inflate the sum.
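To make the difference between the two variants concrete, here is a minimal sketch using Python's built-in sqlite3 module. The rows are hypothetical (the question's table is not reproduced here), the user and event IDs are shortened to 'u1', 'e1', 'e2', and the Postgres cast value::float is written as CAST(... AS REAL) because SQLite has no :: syntax. Event e1 deliberately has two qualifying Score rows.

```python
import sqlite3

# Hypothetical sample rows: (user_id, event_id, key, value)
rows = [
    ("u1", "e1", "MediaLength", "15"),
    ("u1", "e1", "Score", "3"),
    ("u1", "e1", "Score", "4"),   # second qualifying sister row
    ("u1", "e2", "MediaLength", "25"),
    ("u1", "e2", "Score", "1"),   # fails the >= 3 test
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (user_id TEXT, event_id TEXT, key TEXT, value TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# EXISTS: each MediaLength row is counted at most once.
exists_sum = conn.execute("""
    SELECT sum(CAST(d.value AS REAL))
    FROM data d
    WHERE d.user_id = 'u1'
      AND d.key = 'MediaLength'
      AND EXISTS (
          SELECT 1 FROM data ds
          WHERE ds.event_id = d.event_id
            AND ds.user_id = d.user_id
            AND ds.key = 'Score'
            AND CAST(ds.value AS REAL) >= 3
      )
""").fetchone()[0]

# Self-join: the MediaLength row of e1 appears once per matching Score row.
join_sum = conn.execute("""
    SELECT sum(CAST(d.value AS REAL))
    FROM data d
    JOIN data ds ON ds.event_id = d.event_id AND ds.user_id = d.user_id
    WHERE d.user_id = 'u1'
      AND d.key = 'MediaLength'
      AND ds.key = 'Score'
      AND CAST(ds.value AS REAL) >= 3
""").fetchone()[0]

print(exists_sum)  # 15.0
print(join_sum)    # 30.0 (15 counted twice)
```

With at most one Score row per event, both queries would return the same sum.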
At its core, this can be cast as a relational-division problem. Especially since ...
in the future I would need to filter on potentially multiple other keys as well
See:
How to filter SQL results in a has-many-through relation
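One simple way to extend the EXISTS approach to more keys is to add one EXISTS clause per condition. The sketch below again uses Python's sqlite3 module with hypothetical data; the extra key 'Completed' and its values are made up for illustration and do not come from the question.

```python
import sqlite3

# Hypothetical sample rows: (user_id, event_id, key, value)
rows = [
    ("u1", "e1", "MediaLength", "15"),
    ("u1", "e1", "Score", "3"),
    ("u1", "e1", "Completed", "1"),
    ("u1", "e2", "MediaLength", "25"),
    ("u1", "e2", "Score", "5"),
    ("u1", "e2", "Completed", "0"),   # fails the second condition
    ("u1", "e3", "MediaLength", "10"),
    ("u1", "e3", "Score", "4"),       # no Completed row at all
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (user_id TEXT, event_id TEXT, key TEXT, value TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# One EXISTS per required sister row; all of them must hold.
multi_key_sum = conn.execute("""
    SELECT sum(CAST(d.value AS REAL))
    FROM data d
    WHERE d.user_id = 'u1'
      AND d.key = 'MediaLength'
      AND EXISTS (
          SELECT 1 FROM data s
          WHERE s.event_id = d.event_id AND s.user_id = d.user_id
            AND s.key = 'Score' AND CAST(s.value AS REAL) >= 3
      )
      AND EXISTS (
          SELECT 1 FROM data c
          WHERE c.event_id = d.event_id AND c.user_id = d.user_id
            AND c.key = 'Completed' AND c.value = '1'
      )
""").fetchone()[0]

print(multi_key_sum)  # 15.0 (only e1 satisfies both conditions)
```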

Related

Select / Merge user specific rows with additional fallback rows in PostgreSQL

Setup: Postgresql table with a customer_id and a request_id column (+ additional not relevant data).
The rows with customer_id set to NULL work as a fallback/default.
Example what the table looks like:
Goal: I want to select all rows from the table for a given customer (e.g. where customer_id = 2).
For any existent request_id: If there are no entries for the given customer, return the fallback rows (where customer is null).
So the result should look like this:
Any idea how to write the select statement for postgresql? I'm kind of stuck and couldn't really find anything helpful so far. Thanks!

This is a strange requirement.
select t.*
from t
where t.customer_id = 2
   or (t.customer_id is null
       and not exists (select 1
                       from t t2
                       where t2.request_id = t.request_id
                         and t2.customer_id = 2));
For performance, I would recommend an index on (request_id, customer_id).
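A small runnable sketch of that query, using Python's sqlite3 module. The sample rows and the payload column are hypothetical, since the question's example table is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customer_id INTEGER, request_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (None, 1, "default-1"),   # fallback, overridden by customer 2
        (None, 2, "default-2"),   # fallback, no row for customer 2
        (None, 3, "default-3"),   # fallback, no row for customer 2
        (2, 1, "cust2-1"),
        (3, 2, "cust3-2"),        # belongs to another customer
    ],
)

result = conn.execute("""
    SELECT t.request_id, t.payload
    FROM t
    WHERE t.customer_id = 2
       OR (t.customer_id IS NULL
           AND NOT EXISTS (SELECT 1 FROM t t2
                           WHERE t2.request_id = t.request_id
                             AND t2.customer_id = 2))
    ORDER BY t.request_id
""").fetchall()

print(result)  # [(1, 'cust2-1'), (2, 'default-2'), (3, 'default-3')]
```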

Select all rows of a SQL table which do not share a name

I have a table, call it widgets which has columns name and created_at, among others. I want to run a query that returns the count of all the rows of widgets which share the same name and have been created within a millisecond of each other.
This is the query that I have come up with, but it returns a number greater than the total number of rows in the table, can someone point out where I am going wrong?
SELECT COUNT (DISTINCT "t1"."id")
FROM
"tasks" "t1" ,"tasks" "t2"
WHERE
"t1"."name" = "t2"."name"
AND
date_trunc('milliseconds',"t1"."created_at") = date_trunc('milliseconds',"t2"."created_at")

You should add the condition:
and "t1"."id" <> "t2"."id"
where "id" is a primary key. In the absence of a primary key, you can use ctid:
and "t1".ctid <> "t2".ctid
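To see the effect of excluding self-matches, here is a minimal sketch with Python's sqlite3 module. The rows are hypothetical, and because SQLite has no date_trunc, the timestamp is assumed to be stored already truncated, as integer milliseconds in a made-up column created_ms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT, created_ms INTEGER)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        (1, "a", 1000),
        (2, "a", 1000),   # same name and millisecond as id 1
        (3, "a", 2000),   # same name, different millisecond
        (4, "b", 1000),   # different name
    ],
)

base = """
    SELECT COUNT(DISTINCT t1.id)
    FROM tasks t1
    JOIN tasks t2
      ON t1.name = t2.name
     AND t1.created_ms = t2.created_ms
"""

# Every row matches itself, so all four rows are counted.
with_self = conn.execute(base).fetchone()[0]

# Excluding self-matches leaves only rows that have a real partner.
without_self = conn.execute(base + " AND t1.id <> t2.id").fetchone()[0]

print(with_self, without_self)  # 4 2
```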

How to compare records and update one column

It is a little bit tricky for me so I need your help :)
I want to update the column Relevant to 0 WHERE Contract_Status_Code is 10 OR the Date_Contract_start YEAR is the same AND the Ranking_Value is lower than the other one ON all records that have the same VIN.
So I want to compare all records which have the same VIN.
Few examples to illustrate it:
I have there two records with the VIN = 123456. One of them (ID = 6847) has a higher Ranking_Value (7) than the other one (3). The YEAR is the same as well so I want to update the column relevant to 0 where the ID is 8105.
Two records with the VIN = 654321. Both of them have the same Ranking_Value but the record with the id = 11012 has the value 10 for the column Contract_Status_Code so I want to update the relevant column to 0 where the ID = 11012.
The last two records... They have the VIN = 171819. The first one (ID = 11578) has the higher Ranking_Value. But they have a different year where the contract has started. So I don't want to update both.
It is also possible that there are three or four records with the same VIN.
I hope you understand my problem. I'm from Germany so sorry for my English :)

Assuming your ID column is unique (or an identity column), I suggest the following query:
With cte
As
(Select a.Id, a.VIN From Table a
Join (Select max(Ranking_Value) ranks, VIN, Year(Date_Contract_start) yr From Table Group By VIN, Year(Date_Contract_start)) b
on a.VIN=b.VIN And Year(a.Date_Contract_start) = b.yr And a.Ranking_Value = b.ranks) -- match on the year as well, since the subquery groups by it
update table
set Relevant = 0
where (Contract_Status_Code = 10) Or
ID Not In (Select id from cte)

update table1
set Relevant = 0
where Contract_Status_Code = 10
or (VIN,Year,Ranking_value) not in(
select VIN,Year,max(Ranking_Value)
from table1
group by VIN,Year
)
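As a cross-check of the rule from the question, here is a sketch in Python's sqlite3 module that uses a correlated EXISTS instead of the NOT IN forms above (a different formulation, not either answerer's query). The table name contracts, the dates, the status code 0, and the IDs 11011 and 11579 are hypothetical; the other IDs, VINs, and ranking values follow the question's examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contracts (
        ID INTEGER PRIMARY KEY,
        VIN TEXT,
        Contract_Status_Code INTEGER,
        Date_Contract_start TEXT,
        Ranking_Value INTEGER,
        Relevant INTEGER DEFAULT 1
    )
""")
conn.executemany(
    "INSERT INTO contracts (ID, VIN, Contract_Status_Code, Date_Contract_start, Ranking_Value) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (6847, "123456", 0, "2015-03-01", 7),
        (8105, "123456", 0, "2015-09-01", 3),    # same year, lower ranking
        (11011, "654321", 0, "2014-01-01", 5),
        (11012, "654321", 10, "2014-06-01", 5),  # status code 10
        (11578, "171819", 0, "2016-01-01", 9),
        (11579, "171819", 0, "2017-01-01", 2),   # lower ranking, but different year
    ],
)

# A row becomes irrelevant if it has status 10, or if another row with the
# same VIN and the same start year has a strictly higher ranking.
conn.execute("""
    UPDATE contracts
    SET Relevant = 0
    WHERE Contract_Status_Code = 10
       OR EXISTS (
           SELECT 1 FROM contracts o
           WHERE o.VIN = contracts.VIN
             AND strftime('%Y', o.Date_Contract_start)
               = strftime('%Y', contracts.Date_Contract_start)
             AND o.Ranking_Value > contracts.Ranking_Value
       )
""")

irrelevant_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM contracts WHERE Relevant = 0 ORDER BY ID")]
print(irrelevant_ids)  # [8105, 11012]
```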

in sql how to return single row of data from more than one row in the same table

I have a single table of activities, some labelled 'Assessment' (type_id of 50) and some 'Counselling' (type_id of 9) with dates of the activities. I need to compare these dates to find how long people wait for counselling after assessment. The table contains rows for many people, and that is the primary key of 'id'. My problem is how to produce a result row with both the assessment details and the counselling details for the same person, so that I can compare the dates. I've tried joining the table to itself, and tried nested subqueries, I just can't fathom it. I'm using Access 2010 btw.
Please forgive my stupidity, but here's an example of joining the table to itself that doesn't work, producing nothing (not surprising):
Table looks like:
ID   TYPE_ID   ACTIVITY_DATE_TIME
----------------------------------
1    9         20130411
1    50        20130511
2    9         20130511
3    9         20130511
In the above the last two rows have only had assessment so I want to ignore them, and just work on the situation where there's both assessment and counselling 'type-id'
SELECT
civicrm_activity.id, civicrm_activity.type_id,
civicrm_activity.activity_date_time,
civicrm_activity_1.type_id,
civicrm_activity_1.activity_date_time
FROM
civicrm_activity INNER JOIN civicrm_activity AS civicrm_activity_1
ON civicrm_activity.id = civicrm_activity_1.id
WHERE
civicrm_activity.type_id=9
AND civicrm_activity_1.type_id=50;
I'm actually wondering whether this is in fact not possible to do with SQL? I hope it is possible? Thank you for your patience!

Sounds to me like you only want to get the ID numbers where you have a TYPE_ID entry of both 9 and 50.
SELECT DISTINCT id FROM civicrm_activity WHERE type_id = '9' AND id IN (SELECT id FROM civicrm_activity WHERE type_id = '50');
This will give you a list of ids that have entries with both type_id 9 and 50. With that list you can now go and get the specifics.
Use this SQL for the time of type_id 9
SELECT activity_date_time FROM civicrm_activity WHERE id = 'id_from_last_sql' AND type_id = '9'
Use this SQL for the time of type_id 50
SELECT activity_date_time FROM civicrm_activity WHERE id = 'id_from_last_sql' AND type_id = '50'

Your query looks OK to me, too. The one problem might be that you use only one table alias. I don't know, but perhaps Access treats the table name "specially" such that, in effect, the WHERE clause says
WHERE
civicrm_activity.type_id=9
AND civicrm_activity.type_id=50;
That would certainly explain zero rows returned!
To fix that, use an alias for each table. I suggest shorter ones,
SELECT A.id, A.type_id, A.activity_date_time,
       B.type_id, B.activity_date_time
FROM civicrm_activity AS A
INNER JOIN civicrm_activity AS B  -- Access requires INNER JOIN; a bare JOIN is a syntax error
        ON A.id = B.id
WHERE A.type_id = 9
  AND B.type_id = 50;
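The aliased self-join can be tried against the four sample rows from the question. This sketch runs it in SQLite through Python's sqlite3 module rather than in Access, so it only illustrates the join logic, not Access-specific behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE civicrm_activity (id INTEGER, type_id INTEGER, activity_date_time INTEGER)"
)
conn.executemany(
    "INSERT INTO civicrm_activity VALUES (?, ?, ?)",
    [
        (1, 9, 20130411),
        (1, 50, 20130511),
        (2, 9, 20130511),
        (3, 9, 20130511),
    ],
)

# One result row per person who has both a type 9 and a type 50 activity.
pairs = conn.execute("""
    SELECT A.id, A.activity_date_time, B.activity_date_time
    FROM civicrm_activity AS A
    INNER JOIN civicrm_activity AS B ON A.id = B.id
    WHERE A.type_id = 9
      AND B.type_id = 50
""").fetchall()

print(pairs)  # [(1, 20130411, 20130511)]
```

Only id 1 comes back, because ids 2 and 3 have no type 50 row to pair with.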

SQL count query, using where clause from 2 different tables

I am new to SQL. I need to run a one-time query at a few different sites to get a count. The query needs to give me a count of all records based on a where clause. But I'm having trouble figuring out the syntax.
Here's what I tried:
SELECT COUNT(KEYS.IDXKEYID) FROM KEYS, KEYFLAGS
WHERE IDXLEVELID = 1
AND KEYFLAGS.BKEYSEVERMADE = -1
Which gave me a crazy number.
Basically, IDXKEYID is a primary key, and exists in both the KEYS and KEYFLAGS table. I want a count of all IDXKEYID records in the database that meet the above WHERE clause criteria. I just want 1 simple result in 1 column/row.
COUNT
-----
12346
Thanks in advance!

SELECT COUNT(DISTINCT KEYS.IDXKEYID) -- count each key only once
FROM KEYS, KEYFLAGS
WHERE KEYS.IDXLEVELID = 1
AND KEYFLAGS.BKEYSEVERMADE = -1
AND KEYS.IDXKEYID = KEYFLAGS.IDXKEYID -- you're missing this link
Or you can write it using EXISTS
SELECT COUNT(1) -- count each key only once
FROM KEYS
WHERE KEYS.IDXLEVELID = 1
AND EXISTS (
SELECT *
FROM KEYFLAGS
WHERE KEYS.IDXKEYID = KEYFLAGS.IDXKEYID -- correlate
AND KEYFLAGS.BKEYSEVERMADE = -1)
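The "crazy number" comes from the missing link between the two tables: without it, every qualifying KEYS row is paired with every qualifying KEYFLAGS row. A minimal sketch with Python's sqlite3 module and three hypothetical keys shows the three variants side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE KEYS (IDXKEYID INTEGER PRIMARY KEY, IDXLEVELID INTEGER)")
conn.execute("CREATE TABLE KEYFLAGS (IDXKEYID INTEGER PRIMARY KEY, BKEYSEVERMADE INTEGER)")
conn.executemany("INSERT INTO KEYS VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
conn.executemany("INSERT INTO KEYFLAGS VALUES (?, ?)", [(1, -1), (2, 0), (3, -1)])

# No join condition: 2 level-1 keys x 2 flagged rows = 4 combinations.
unlinked = conn.execute("""
    SELECT COUNT(KEYS.IDXKEYID)
    FROM KEYS, KEYFLAGS
    WHERE KEYS.IDXLEVELID = 1
      AND KEYFLAGS.BKEYSEVERMADE = -1
""").fetchone()[0]

# With the link, only key 1 is both level 1 and flagged.
linked = conn.execute("""
    SELECT COUNT(DISTINCT KEYS.IDXKEYID)
    FROM KEYS, KEYFLAGS
    WHERE KEYS.IDXLEVELID = 1
      AND KEYFLAGS.BKEYSEVERMADE = -1
      AND KEYS.IDXKEYID = KEYFLAGS.IDXKEYID
""").fetchone()[0]

# Equivalent EXISTS form.
via_exists = conn.execute("""
    SELECT COUNT(1)
    FROM KEYS
    WHERE KEYS.IDXLEVELID = 1
      AND EXISTS (SELECT 1 FROM KEYFLAGS
                  WHERE KEYFLAGS.IDXKEYID = KEYS.IDXKEYID
                    AND KEYFLAGS.BKEYSEVERMADE = -1)
""").fetchone()[0]

print(unlinked, linked, via_exists)  # 4 1 1
```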
