Multiple sum/counts across multiple tables in PostgreSQL

Multiple sum/counts across multiple tables in PostgreSQL - sql

I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's just a syntax/punctuation issue that I'm just missing.
I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected for them and how many DNA samples we have of different types for each of them There are three tables that are pertinent to my problem:
Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.
name | birth
-----+-----------
A21 | 1968-07-01
AAR | 2002-03-30
ABB | 1998-09-10
ABD | 2005-03-15
ABE | 1986-01-01
Table: "babtissue" tracks information, including the below three columns, about different tissues that have been collected over the years. Some lines in this table represent tissue samples that we no longer have, but are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.
name | sample_type | avail
-----+-------------+------
A21 | BLOOD | Y
A21 | BLOOD | Y
A21 | TISSUE | N
ABB | BLOOD | Y
ABB | TISSUE | Y
Table: "dna" is similar to babtissue.
name | sample_type | avail
-----+-------------+------
ABB | GDNA | N
ABB | WGA | Y
ACC | WGA | N
ALE | GDNA | Y
ALE | GDNA | Y
Altogether, I'm trying to write a query that will return every name from biograph and tells me in one column how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...
name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21 | 2 | 0 | 0 | 0 | ?
AAR | 0 | 0 | 0 | 0 | ?
ABB | 1 | 1 | 0 | 1 | ?
ACC | 0 | 0 | 0 | 0 | ?
ALE | 0 | 0 | 2 | 0 | ?
(Apologies for the weird formatting above, I'm not very familiar with writing this way)
The latest version of the query that I've tried:
select b.name,
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas
from biograph b
left join babtissue t on b.name=t.name
left join dna d on b.name=d.name
where b.name is not NULL
group by b.name
order by b.name
I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong--too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.
Any ideas?

The numbers are too high because you're joining to babtissue and then also to dna, which is going to cause duplicates.
You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...
SELECT
SQ.name,
SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
SQ.gdnas,
SQ.wgas
FROM
(
SELECT
B.name,
SUM(CASE WHEN D.sample_type = 'GDNA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
SUM(CASE WHEN D.sample_type = 'WGA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
FROM
biograph B
LEFT JOIN dna D ON D.name = B.name
GROUP BY
B.name
) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name
Can the name really be NULL?

I don't know about the "avail" column, but this should give you the other columns you're looking for:
SELECT b.name,
COALESCE (t.bloodsamps, 0) AS bloodsamps,
COALESCE (t.tissuesamps, 0) AS tissuesamps
COALESCE (d.gdnas, 0) AS gdnas
COALESCE (d.wgas, 0) AS wgas
FROM biograph b
LEFT JOIN (
SELECT name,
SUM(CASE WHEN sample_type = 'BLOOD' THEN 1 ELSE 0 END) AS bloodsamps,
SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
FROM babtissue
WHERE avail = 'Y'
GROUP BY name
) t
ON (t.name = b.name)
LEFT JOIN (
SELECT name,
SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
SUM(CASE WHEN sample_type = 'WGA' THEN 1 ELSE 0 END) AS wgas
FROM dna
WHERE avail = 'Y'
GROUP BY name
) d
ON (d.name = b.name)
;

Related

How to do multiple actions in case when then in sql?

I want to do something like this:
select sum(case ttt.ind = 1 then 1 else 0 end) from ttt
I want to add a column to this query, called myresult which indicates if the value of ttt.istry is equal to 1.
Maybe like:
select
sum(case ttt.ind = 1 then 1, ttt.istry as myresult else 0 end)
from ttt
of course I got an error...
How would I do that?
My data is:
ttt.ind | ttt.istry
--------+----------
1 | 0
0 | 1
1 | 1
and so on...
Expected result:
ttt.ind | ttt.istry | myresult | sum
--------+-----------+----------+------
1 | 0 | 0 | 2
0 | 1 | null | 2
1 | 1 | 1 | 2

You don't say which database so I'll assume it's a modern one. You can use a window function and a CASE clause to do this.
For example:
select
ind,
istry,
case when ind = 1 then istry end as myresult,
sum(ind) over() as sum
from ttt
See live example at SQL Fiddle.

Your logic is a bit hard to follow, but your result set suggests:
select ind, istry,
(case when istry = 1 then 1
when sum(istry) over (partition by ind) = 1 then 0
end),
sum(ttt.ind) over () as sum_ind
from ttt;

value = 0 while other line items > 0

I'd like to know how to write a query where unique IDs with multiple line items will have 1 line item with a particular code, let's call it A4445, where charge amount is = 0, while all the other lines with other codes other than A4445 linked to that ID would be greater than 0 like the following. If the data looked like this:
ID | line item | code | Charge |
---------+-----------+--------+---------
3334400 | 1 | A4445 | 32.50 |
3334400 | 2 | B0021 | 0.00 |
3334400 | 3 | B0666 | 9.00 |
but I want IDs that have the A4445 code = 0.00 and the other lines with a charge amount greater > 0
ID | line item | code | Charge |
---------+-----------+--------+---------
3334422 | 1 | A4445 | 0.00 |
3334422 | 2 | B0021 | 12.30 |
3334422 | 3 | B0666 | 9.00 |
I'm currently using the union all function but I don't think it's working. This is my query:
Select
ID,
Line item,
Code,
Charge
from
claim
where
code = 'A4445'
and charge = 0.00
union all
Select
ID,
Line item,
Code,
Charge
from
claim
where
code <> 'A4445'
and charge > 0.00
I'm not sure how to articulate this but hopefully the above illustration will give you an idea of what I'm looking for

I believe you want:
select c.id
from claim c
group by c.id
having max(case when c.code = 'A4445' then c.charge end) = 0 and
min(case when c.code <> 'A4445' then c.charge end) > 0;
This assumes that charge is never negative (you have no such examples in your data). It is easy enough to support negative charges, but the logic is a little bit more complicated.
If you want the original rows, you can use this with join, exists, or in to get those values. However, I would probably use not exists:
select c.*
from claims c
where not exists (select 1
from claims c2
where c2.id = c.id and c2.code = 'A4445' and c2.charge <> 0
) and
not exists (select 1
from claims c2
where c2.id = c.id and c2.code <> 'A4445' and c2.charge = 0
);

Search for records with same value in one column but varying values in a another

Apologies for my very ambiguous title, but i've been working on this for the better part of a day and can't get anywhere so i'm probably clouded.. Let me present sample data and explain what I'm trying to do:
+------+------+
| ID | UW |
+------+------+
| 1 | I |
| 1 | I |
| 3 | I |
| 3 | I |
| 3 | C |
| 3 | C |
| 4 | C |
| 4 | C |
I'm trying to find the count of IDs where there are both "I" and "C" in the UW column, so in the example above the count would be: 1 (for ID #3). Since ID 1 has only "I" and ID 4 has only "C" values in "UW" field. Thanks in advance for helping me with this, much appreciated.

Here is one way:
SELECT COUNT(DISTINCT A.ID) N
FROM dbo.YourTable A
WHERE EXISTS(SELECT 1 FROM dbo.YourTable
WHERE ID = A.ID
AND UW IN ('I','C'));
And another:
SELECT COUNT(*)
FROM ( SELECT ID
FROM dbo.YourTable
WHERE UW IN ('I','C')
GROUP BY ID
HAVING COUNT(DISTINCT UW) = 2) A;

You can use group by and having to get the ids that meet the conditions:
select id
from table t
group by id
having sum(case when uw = 'I' then 1 else 0 end) > 0 and
sum(case when uw = 'C' then 1 else 0 end) > 0;
You can then count these with a subquery:
select count(*)
from (select id
from table t
group by id
having sum(case when uw = 'I' then 1 else 0 end) > 0 and
sum(case when uw = 'C' then 1 else 0 end) > 0
) t
I like to formulate these problems this way, because the having clause is very general on the types of conditions that it can support.

Rewriting SQL query to get record id and customer number

Sample data
+-------------------+-------------+-----------------+---------------------+
| RECORD_ID | CUST_NO | IsAccntClosed | Code |
+-------------------+-------------+-----------------+---------------------+
|159045 | 2439123 | N | 13 |
+-------------------+-------------+-----------------+---------------------+
|159048 | 6376150 | Y | 13 |
+-------------------+-------------+-----------------+---------------------+
|159048 | 9513035 | N | 13 |
+-------------------+-------------+-----------------+---------------------+
|159049 | 2398524 | N | 12 |
+-------------------+-------------+-----------------+---------------------+
|159049 | 6349269 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159049 | 6350690 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159049 | 6372163 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159049 | 6393810 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159049 | 6402062 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159050 | 2677512 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159050 | 6349382 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159050 | 6378137 | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
|159051 | 2336197 | N | 12 |
+-------------------+-------------+-----------------+---------------------+
|159051 | 6349293 | N | 12 |
+-------------------+-------------+-----------------+---------------------+
|159051 | 6350682 | N | 12 |
+-------------------+-------------+-----------------+---------------------+
|159051 | 6367895 | N | 12 |
+-------------------+-------------+-----------------+---------------------+
|159060 | yyyyyy | Y | 12 |
+-------------------+-------------+-----------------+---------------------+
IsAccntClosed column indicates if the account is Open (Y) or account is closed (Y).
I need to select Record_ID and cust_no for only those rows for which which Record_Id satisfies one of the below condition :
1. Only one cust account is open , there might be one or multiple closed customers
2. No open customer and only one closed customer
Expected output :
159045 2439123
159048 9513035
159049 2398524
159060 yyyyyy
A query like this would take each row as a single group and the count will come as 1
select RECORD_ID, CUST_NO, IsAccntClosed, count(IsAccntClosed), Code
from table1
group by RECORD_ID, CUST_NO, IsAccntClosed, Code
Any suggestions on how this query could be written to get the expected output?

Being IsAccntClosed a CHAR column, you should be able to get what you want with a query like this:
SELECT a.record_id,
(CASE
WHEN a.CountOpen=1 THEN a.CustNoOpen
ELSE a.CustNoClosed
END) AS cust_no
FROM (
SELECT b.record_id,
MAX(CASE WHEN b.IsAccntClosed='N' THEN b.cust_no ELSE NULL END) AS CustNoOpen ,
SUM(CASE WHEN b.IsAccntClosed='N' THEN 1 ELSE 0 END) AS CountOpen ,
MAX(CASE WHEN b.IsAccntClosed='Y' THEN b.cust_no ELSE NULL END) AS CustNoClosed,
SUM(CASE WHEN b.IsAccntClosed='Y' THEN 1 ELSE 0 END) AS CountClosed
FROM table1 b
GROUP BY b.record_id
) a
WHERE a.CountOpen=1 OR (a.CountOpen=0 AND a.CountClosed=1)
The inner query is grouping the table. It counts the open and closed accounts and takes one (random) cust_no of any of the closed accounts and one (random) of any of the open accounts, per group.
The outer query filters the data and cleans everything up, placing the open or closed cust_no in the output result column.
Notice that the WHERE condition of the outer query has the collateral effect that, since you are looking for records that have just a single open or a single closed account, those random cust_no which have been selected by the inner query, are now significant.
EDIT: I fixed the query and tested it on SQLFiddle.

In the below I added another record:
insert into tbl values (159060, 'zzzzzz', 'N', 12);
to illustrate what would happen if a record_id has just one open cust_no and just one closed cust_no. Note how in the result the cust_no returned is the zzzzz one, because that account is open, which you mentioned wanting to take precendence over closed, in the event of a tie 1:1 (zzzzzz should take over yyyyyy in this case, because yyyyyy is closed whereas zzzzzz is open)
Fiddle: http://sqlfiddle.com/#!4/5cb60/1/0
with one_open as
(select record_id
from tbl
where IsAccntClosed = 'N'
group by record_id
having count(distinct cust_no) = 1),
one_closed as
(select record_id
from tbl
where IsAccntClosed = 'Y'
group by record_id
having count(distinct cust_no) = 1),
bothy as
(select record_id from one_open intersect select record_id from one_closed)
select *
from tbl
where (record_id in (select record_id from one_open) and
IsAccntClosed = 'N')
or (record_id not in (select record_id from one_open) and
record_id in (select record_id from one_closed) and
IsAccntClosed = 'Y' and
record_id not in (select record_id from bothy))

select
record_id,
cust_no
from (
select
record_id,
cust_no,
count(case when isAccntClosed='Y' then 1 else null end)
over (partition by record_id) closed_accounts,
count(case when isAccntClosed='N' then 1 else null end)
over (partition by record_id) open_accounts
from
table1
)
where (open_accounts = 1)
or (open_accounts = 0 and closed_accounts = 1)

You can do this with conditional aggregation:
select RECORD_ID,
(case when sum(case when IsAccntClosed = 'N' then 1 else 0 end) = 1
then max(case when IsAccntClosed = 'N' then max(CUST_NO) end)
else max(CUST_NO)
end) as cust_no
from table1
group by RECORD_ID
having sum(case when IsAccntClosed = 'N' then 1 else 0 end) = 1 or
(sum(case when IsAccntClosed = 'N' then 1 else 0 end) = 0 and
sum(case when IsAccntClosed = 'Y' then 1 else 0 end) = 1
)
It is probably easier to understand the logic using subqueries:
select recordId,
(case when numOpen > 0 then OpenCustNo else closedCustNo end) as CustNo
from (select t1.RecordId, sum(case when IsAccntClosed = 'N' then 1 else 0 end) as numOpen,
sum(case when IsAccntOpen = 'N' then 1 else 0 end) as numClosed,
max(case when IsAccntClosed = 'N' then cust_no end) as OpenCustNo,
max(case when IsAccntClosed = 'Y' then cust_no end) as ClosedCustNo
from table1 t1
group by record_id
) r
where numOpen = 1 or numOpen = 0 and numClosed = 1;

Return count(*) even if 0

I have the following query:
select bb.Name, COUNT(*) as Num from BOutcome bo
JOIN BOffers bb ON bo.ID = bb.BOutcomeID
WHERE bo.EventID = 123 AND bo.OfferTypeID = 321 AND bb.NumA > bb.NumB
GROUP BY bb.Name
The table looks like:
Name | Num A | Num B
A | 10 | 3
B | 2 | 3
C | 10 | 3
A | 9 | 3
B | 2 | 3
C | 9 | 3
The expected output should be:
Name | Count
A | 2
B | 0
C | 2
Because when name is A and C then Num A is bigger to times than Num B and when Name is B, in both records Num A is lower than Num B.
My current output is:
Name | Count
A | 2
C | 2
Because B's output is 0, i am not getting it back in my query.
What is wrong with my query? how should I get it back?

Here is my guess. I think this is a much simpler approach than all of the left/right join hoops people have been spinning their wheels on. Since the output of the query relies only on columns in the left table, there is no need for an explicit join at all:
SELECT
bb.Name,
[Count] = SUM(CASE WHEN bb.NumA > bb.NumB THEN 1 ELSE 0 END)
-- just FYI, the above could also be written as:
-- [Count] = COUNT(CASE WHEN bb.NumA > bb.NumB THEN 1 END)
FROM dbo.BOffers AS bb
WHERE EXISTS
(
SELECT 1 FROM dbo.BOutcome
WHERE ID = bb.BOutcomeID
AND EventID = 123
AND OfferTypeID = 321
)
GROUP BY bb.Name;
Of course, we're not really sure that both Name and NumA/NumB are in the left table, since the OP talks about two tables but only shows one table in the sample data. My guess is based on the query he says is "working" but missing rows because of the explicit join.

Another wild guess. Feel free to downvote:
SELECT ba.Name, COUNT(bb.BOutcomeID) as Num
FROM
( SELECT DISTINCT ba.Name
FROM
BOutcome AS b
JOIN
BOffers AS ba
ON ba.BOutcomeID = b.ID
WHERE b.EventID = 123
AND b.OfferTypeID = 321
) AS ba
LEFT JOIN
BOffers AS bb
ON AND bb.Name = ba.Name
AND bb.NumA > bb.NumB
GROUP BY ba.Name ;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Multiple sum/counts across multiple tables in PostgreSQL - sql

Related

How to do multiple actions in case when then in sql?

value = 0 while other line items > 0

Search for records with same value in one column but varying values in a another

Rewriting SQL query to get record id and customer number

Return count(*) even if 0

Categories

Resources