Getting counts of row data - sql

There's a table variable (I'll write it as a regular table here)
CREATE TABLE TEST (memberid int, producttype varchar(7))
This table has hundreds of thousands of rows, but for this example I've added far fewer:
Insert into test values(1,'book')
Insert into test values(1,'clothes')
Insert into test values(2,'book')
Insert into test values(3,'book')
Insert into test values(4,'clothes')
Insert into test values(5,'book')
Insert into test values(5,'clothes')
Insert into test values(6,'book')
Insert into test values(7,'book')
I need to get:
the memberids that have 'book' only
the memberids that have 'clothes' only
the memberids that have both 'book' & 'clothes'
e.g.
Member Book Clothes Both
1 0 0 1
2 1 0 0
3 1 0 0
4 0 1 0
5 0 0 1
6 1 0 0
7 1 0 0
I managed to get it working with sub-queries, but because of the size of the table it can take over 2 minutes to run.
I would appreciate it if anyone could suggest a better way to achieve this.

One method uses conditional aggregation:
select
memberid,
case when max(producttype) = 'book' then 1 else 0 end book,
case when min(producttype) = 'clothes' then 1 else 0 end clothes,
case when min(producttype) <> max(producttype) then 1 else 0 end both
from test
group by memberid
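This works because there are only two possible producttypes: 'book' sorts before 'clothes', so a maximum of 'book' means the member has only books, a minimum of 'clothes' means only clothes, and a minimum that differs from the maximum means both. If you actually have more types, then you need expressions that are more complicated (and possibly more efficient), such as: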
case when count(*) = sum(case when producttype = 'book' then 1 end)
then 1
else 0
end book
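For anyone who wants to try the count-based variant end to end, here is a minimal sketch on the sample data from the question. It runs the SQL through Python's built-in sqlite3 module instead of SQL Server, which is an assumption made only so the example is self-contained. The `both_types` alias is also my own choice, since BOTH is a reserved word in some databases.

```python
import sqlite3

# In-memory SQLite table standing in for the table variable (assumption:
# SQLite is used only so the sketch runs anywhere; the question is SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (memberid INT, producttype VARCHAR(7))")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "book"), (1, "clothes"), (2, "book"), (3, "book"), (4, "clothes"),
     (5, "book"), (5, "clothes"), (6, "book"), (7, "book")],
)

# One pass over the table: count each type per member, then compare the counts.
query = """
SELECT memberid,
       CASE WHEN COUNT(*) = SUM(CASE WHEN producttype = 'book' THEN 1 ELSE 0 END)
            THEN 1 ELSE 0 END AS book,
       CASE WHEN COUNT(*) = SUM(CASE WHEN producttype = 'clothes' THEN 1 ELSE 0 END)
            THEN 1 ELSE 0 END AS clothes,
       CASE WHEN SUM(CASE WHEN producttype = 'book' THEN 1 ELSE 0 END) > 0
             AND SUM(CASE WHEN producttype = 'clothes' THEN 1 ELSE 0 END) > 0
            THEN 1 ELSE 0 END AS both_types
FROM test
GROUP BY memberid
ORDER BY memberid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because each member is reduced to a single row in one scan, this should avoid the repeated lookups that sub-queries tend to cause, though the actual gain depends on your data and plan.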

Use a CTE to flag whether each member has book and/or clothes:
with cte as (
select memberid,
count(distinct case when producttype = 'book' then 1 end) book_flag,
count(distinct case when producttype = 'clothes' then 1 end) clothes_flag
from test
group by memberid
)
select memberid,
case when book_flag > clothes_flag then 1 else 0 end book,
case when clothes_flag > book_flag then 1 else 0 end clothes,
book_flag * clothes_flag both
from cte
See the demo.
Results:
> memberid | book | clothes | both
> -------: | ---: | ------: | ---:
> 1 | 0 | 0 | 1
> 2 | 1 | 0 | 0
> 3 | 1 | 0 | 0
> 4 | 0 | 1 | 0
> 5 | 0 | 0 | 1
> 6 | 1 | 0 | 0
> 7 | 1 | 0 | 0

A table variable with hundreds of thousands of rows is going to be problematic for you.
If you check your query plan, you'll likely see that the optimizer expects that table variable to only contain one row.
Changing the structure to a local temp table, and perhaps adding an index to producttype, should significantly improve the performance of the query even before you optimize your code.
CREATE TABLE #TEST (memberid int, producttype varchar(7));
CREATE NONCLUSTERED INDEX tempTest ON #TEST(producttype);

Related

How to do multiple actions in case when then in sql?

I want to do something like this:
select sum(case when ttt.ind = 1 then 1 else 0 end) from ttt
I want to add a column to this query, called myresult which indicates if the value of ttt.istry is equal to 1.
Maybe like:
select
sum(case when ttt.ind = 1 then 1, ttt.istry as myresult else 0 end)
from ttt
of course I got an error...
How would I do that?
My data is:
ttt.ind | ttt.istry
--------+----------
1 | 0
0 | 1
1 | 1
and so on...
Expected result:
ttt.ind | ttt.istry | myresult | sum
--------+-----------+----------+------
1 | 0 | 0 | 2
0 | 1 | null | 2
1 | 1 | 1 | 2
You don't say which database you're using, so I'll assume it's a modern one. You can use a window function and a CASE expression to do this.
For example:
select
ind,
istry,
case when ind = 1 then istry end as myresult,
sum(ind) over() as sum
from ttt
See live example at SQL Fiddle.
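In case the fiddle is unavailable, here is a small sketch of the same query on the three sample rows, run through Python's sqlite3 module. It assumes SQLite 3.25 or later (for window functions), and the `id` column is something I added purely to make the row order deterministic; it is not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The id column is an assumption, added only to keep the output order stable.
conn.execute("CREATE TABLE ttt (id INT, ind INT, istry INT)")
conn.executemany(
    "INSERT INTO ttt VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 0, 1), (3, 1, 1)],
)

query = """
SELECT ind,
       istry,
       CASE WHEN ind = 1 THEN istry END AS myresult,  -- NULL when ind <> 1
       SUM(ind) OVER () AS sum_ind                    -- same total on every row
FROM ttt
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The empty `OVER ()` makes the sum span the whole result set, so no GROUP BY is needed and the detail rows are kept.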
Your logic is a bit hard to follow, but your result set suggests:
select ind, istry,
(case when istry = 1 then 1
when sum(istry) over (partition by ind) = 1 then 0
end),
sum(ttt.ind) over () as sum_ind
from ttt;

SQL using more two columns with case

I can't find a good explanation for my problem.
I have a table:
user | 70Y | hospital
-------+-------+----------
1 | 18 | 1
2 | 70 | 1
3 | 90 | 0
I need to find how many people are older than 70 and, of those, how many are in the hospital.
I'm using this to count the people older than 70:
SUM(CASE WHEN 70y > 70 THEN 1 ELSE 0 END) AS 'old_person'
but how do I find out whether they are in the hospital?
What I'm expecting from a table is:
| old_person | old_person_in_hospital|
+------------+-----------------------+
| 18 | 1 |
And if I wanted to add more columns, say a check for people older than 40, what is the easiest way to do so?
What I'd expect from the table then:
            | old_person | 40y_person |
------------+------------+------------+
            | 18         | 16         |
in hospital | 1          | 2          |
You need a case for each column:
select
SUM(Case when [70y] > 70 then 1 else 0 end) old_person,
SUM(Case when [70y] > 70 and hospital = 1 then 1 else 0 end) old_person_in_hospital
from tablename
Use another case for the in-hospital count:
select SUM(Case when [70y] > 70 then 1 else 0 end) as old_person,
sum(Case when [70y] > 70 and hospital = 1 then 1 else 0 end) as hospital
from tablename
How about moving the condition to the where clause?
select count(*) as old_person,
sum(hospital) as old_person_in_hospital
from tablename
where [70y] > 70;
If you want to add more age groups, then you could use conditional aggregation. However, I might suggest that you use aggregation instead and put the results in different rows. For instance:
select (age / 10) as decade,
count(*) as num_people,
sum(hospital) as num_in_hospital
from tablename
group by (age / 10);
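If you do want the extra age group as columns, as the question asks, conditional aggregation with one expression per column might look like the sketch below. It uses Python's sqlite3 module and the three rows from the question; the table name `people` and the column names `user_id` and `age` are assumptions (`age` stands in for the question's 70Y column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Names are assumptions: "age" stands in for the question's 70Y column.
conn.execute("CREATE TABLE people (user_id INT, age INT, hospital INT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, 18, 1), (2, 70, 1), (3, 90, 0)],
)

# Each output column is its own SUM(CASE ...); adding an age group means
# adding one more pair of expressions.
query = """
SELECT SUM(CASE WHEN age > 70 THEN 1 ELSE 0 END) AS old_person,
       SUM(CASE WHEN age > 70 AND hospital = 1 THEN 1 ELSE 0 END) AS old_person_in_hospital,
       SUM(CASE WHEN age > 40 THEN 1 ELSE 0 END) AS over_40,
       SUM(CASE WHEN age > 40 AND hospital = 1 THEN 1 ELSE 0 END) AS over_40_in_hospital
FROM people
"""
result = conn.execute(query).fetchone()
print(result)
```

Note that the person aged exactly 70 is not counted as old here, because the condition from the question is a strict `> 70`.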

Counting occurrences of a value in multiple columns - postgres

I have a table called fixtures (I have simplified for this example) and would like to populate the last two columns (*_plus_mc_per) with the percentage of times occurred for each number with a query run against the mc_* columns. It would look like this as an example
#mc = Match Corner # mc_per = Match Corner Percentage
| mc_0 | mc_1 | mc_3 | mc_4 | match_count | one_plus_mc_per | two_plus_mc_per |
| 1 | 4 | 3 | null | 3 | 100 | 66 |
At the point where I run my query it looks like
#mc = Match Corner # mc_per = Match Corner Percentage
| mc_0 | mc_1 | mc_3 | mc_4 | match_count | one_plus_mc_per | two_plus_mc_per |
| 1 | 4 | 3 | null | 3 | null | null |
So starting with the query for one_plus_mc_per I can do this
SELECT COUNT(*) FROM fixtures WHERE coalesce(mc_0,0) >= 1 AND id = 182;
# Using coalesce for dealing with null, will return a 0 if value null
This returns
| count |
| 1 |
If I run this query on each column individually the results returned would be
| count | count | count | count |
| 1 | 1 | 1 | 0 |
Thus enabling me to add all the column values up and divide by my match count. This makes sense (and I thank dmfay for getting me to think about his suggestion in a previous question)
My problem is that I can't run this query 4 times, for example, as that is very inefficient. My SQL fu is not strong, and I was looking for a way to do this in one call to the database, so that I can then take that percentage value and update the percentage column.
Thanks
Try this:
SELECT
SUM(CASE WHEN coalesce(mc_0,0) >= 1 THEN 1 ELSE 0 END) count_0,
SUM(CASE WHEN coalesce(mc_1,0) >= 1 THEN 1 ELSE 0 END) count_1,
SUM(CASE WHEN coalesce(mc_3,0) >= 1 THEN 1 ELSE 0 END) count_3,
SUM(CASE WHEN coalesce(mc_4,0) >= 1 THEN 1 ELSE 0 END) count_4
FROM
fixtures
WHERE id = 182;
It will return the counts for all the columns in a single query.
I am not sure, though, what the use of id = id in your query is, as it will always be true.
If you want, for each row, the number of mc_* columns that meet the >= 1 condition, try this:
SELECT
(CASE WHEN coalesce(mc_0,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_1,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_3,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_4,0) >= 1 THEN 1 ELSE 0 END) as count
FROM
fixtures
WHERE id = 182;
UPDATE:
Calculating one_plus_mc_per
SELECT
CAST((CASE WHEN coalesce(mc_0,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_1,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_3,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_4,0) >= 1 THEN 1 ELSE 0 END)AS DECIMAL)/match_count as one_plus_mc_per
FROM
fixtures
WHERE id = 182;
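To get the values as whole percentages, like the 100 and 66 in the question, the ratio can be multiplied by 100 before dividing. Here is a rough sketch using Python's sqlite3 module and the sample row; `at_least` is a hypothetical helper that only builds the repeated CASE expressions, and the integer division (which truncates 66.67 to 66) is an assumption about the rounding you want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fixtures "
    "(id INT, mc_0 INT, mc_1 INT, mc_3 INT, mc_4 INT, match_count INT)"
)
conn.execute("INSERT INTO fixtures VALUES (182, 1, 4, 3, NULL, 3)")

MC_COLUMNS = ["mc_0", "mc_1", "mc_3", "mc_4"]


def at_least(n):
    """Hypothetical helper: SQL expression counting the mc_* columns >= n."""
    return " + ".join(
        f"CASE WHEN COALESCE({col}, 0) >= {n} THEN 1 ELSE 0 END"
        for col in MC_COLUMNS
    )


# Integer arithmetic throughout, so the division truncates toward zero.
query = f"""
SELECT 100 * ({at_least(1)}) / match_count AS one_plus_mc_per,
       100 * ({at_least(2)}) / match_count AS two_plus_mc_per
FROM fixtures
WHERE id = 182
"""
result = conn.execute(query).fetchone()
print(result)
```

Both percentages come back from a single statement, so the same expressions could be reused in an UPDATE if you want to store them.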
Postgres has very nice capabilities for answering this type of question:
SELECT COUNT(*) FILTER (WHERE mc_0 >= 1) as count_0,
COUNT(*) FILTER (WHERE mc_1 >= 1) as count_1,
COUNT(*) FILTER (WHERE mc_3 >= 1) as count_3,
COUNT(*) FILTER (WHERE mc_4 >= 1) as count_4,
AVG ( (mc_0 >= 1)::int + (mc_1 >= 1)::int + (mc_3 >= 1)::int + (mc_4 >= 1)::int
) as one_plus_mc_per
FROM fixtures
WHERE id = 182;
The FILTER is ANSI-standard syntax. The conversion of booleans to numbers is a very convenient construct.

Count who paid group by 1, 2 or 3+

I have a payment table like the example below and I need a query that gives me how many IDs paid (AMOUNT > 0) 1 time, 2 times, 3 or more times. Example:
+----+--------+
| ID | AMOUNT |
+----+--------+
| 1 | 50 |
| 1 | 0 |
| 2 | 10 |
| 2 | 20 |
| 2 | 15 |
| 2 | 10 |
| 3 | 80 |
+----+--------+
I expect the result:
+-----------+------------+-------------+
| 1 payment | 2 payments | 3+ payments |
+-----------+------------+-------------+
| 2 | 0 | 1 |
+-----------+------------+-------------+
ID 1: Paid 1 time (50). The other payment is 0, so I did not count. So, 1 person paid 1 time.
ID 2: Paid 3 times (10,20,15). So, 1 person paid 3 or more times.
ID 3: Paid 1 time (80). So, 2 persons paid 1 time.
I'm doing this manually in Excel right now, but I'm pretty sure there is a more practical solution. Any ideas?
A little sub-query will do the trick
Declare @YourTable table (ID int, AMOUNT int)
Insert into @YourTable values
( 1 , 50 ),
( 1 , 0) ,
( 2 , 10) ,
( 2 , 20) ,
( 2 , 15) ,
( 2 , 10) ,
( 3 , 80)
Select [1_Payment] = sum(case when Cnt=1 then 1 else 0 end)
,[2_Payment] = sum(case when Cnt=2 then 1 else 0 end)
,[3_Payment] = sum(case when Cnt>2 then 1 else 0 end)
From (
Select id
,Cnt=count(*)
From @YourTable
Where Amount<>0
Group By ID
) A
Returns
1_Payment 2_Payment 3_Payment
2 0 1
To get the output you want, try using a common table expression to shape the data and then SELECT from that:
with c as (
select count(*) count from mytable where amount > 0 group by id)
select
sum(case count when 1 then 1 else 0 end) "1 Payment"
, sum(case count when 2 then 1 else 0 end) "2 Payments"
, sum(case when count > 2 then 1 else 0 end) "3+ Payments"
from c
Here is an example you can play with to see how the query is working.
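For a version that runs without any external site, here is a minimal sketch of the same two-step idea using Python's sqlite3 module and the rows from the question. The table name `payments` and the column aliases are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INT, amount INT)")  # hypothetical name
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 50), (1, 0), (2, 10), (2, 20), (2, 15), (2, 10), (3, 80)],
)

# Step 1 (CTE): number of non-zero payments per id.
# Step 2: bucket those per-id counts into 1, 2 and 3-or-more.
query = """
WITH c AS (
    SELECT id, COUNT(*) AS cnt
    FROM payments
    WHERE amount > 0
    GROUP BY id
)
SELECT SUM(CASE WHEN cnt = 1 THEN 1 ELSE 0 END) AS one_payment,
       SUM(CASE WHEN cnt = 2 THEN 1 ELSE 0 END) AS two_payments,
       SUM(CASE WHEN cnt >= 3 THEN 1 ELSE 0 END) AS three_plus_payments
FROM c
"""
result = conn.execute(query).fetchone()
print(result)
```

One thing to keep in mind: an ID whose payments are all 0 is filtered out before grouping, so it does not show up in any bucket.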

Search for records with same value in one column but varying values in a another

Apologies for my very ambiguous title, but I've been working on this for the better part of a day and can't get anywhere, so my thinking is probably clouded. Let me present sample data and explain what I'm trying to do:
+------+------+
| ID | UW |
+------+------+
| 1 | I |
| 1 | I |
| 3 | I |
| 3 | I |
| 3 | C |
| 3 | C |
| 4 | C |
| 4 | C |
I'm trying to find the count of IDs where there are both "I" and "C" in the UW column, so in the example above the count would be 1 (for ID #3), since ID 1 has only "I" and ID 4 has only "C" values in the "UW" field. Thanks in advance for helping me with this, much appreciated.
Here is one way:
SELECT COUNT(DISTINCT A.ID) N
FROM dbo.YourTable A
WHERE A.UW = 'I'
AND EXISTS(SELECT 1 FROM dbo.YourTable
WHERE ID = A.ID
AND UW = 'C');
And another:
SELECT COUNT(*)
FROM ( SELECT ID
FROM dbo.YourTable
WHERE UW IN ('I','C')
GROUP BY ID
HAVING COUNT(DISTINCT UW) = 2) A;
You can use group by and having to get the ids that meet the conditions:
select id
from YourTable t
group by id
having sum(case when uw = 'I' then 1 else 0 end) > 0 and
sum(case when uw = 'C' then 1 else 0 end) > 0;
You can then count these with a subquery:
select count(*)
from (select id
from YourTable t
group by id
having sum(case when uw = 'I' then 1 else 0 end) > 0 and
sum(case when uw = 'C' then 1 else 0 end) > 0
) t
I like to formulate these problems this way because the having clause is very flexible in the kinds of conditions it can express.
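As a quick check of the group by / having approach on the sample data, here is a small sketch using Python's sqlite3 module. The table name `your_table` is an assumption, and the HAVING query is reused as a sub-query to get the final count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INT, uw TEXT)")  # hypothetical name
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "I"), (1, "I"), (3, "I"), (3, "I"),
     (3, "C"), (3, "C"), (4, "C"), (4, "C")],
)

# IDs that have at least one 'I' row and at least one 'C' row.
having_query = """
SELECT id
FROM your_table
GROUP BY id
HAVING SUM(CASE WHEN uw = 'I' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN uw = 'C' THEN 1 ELSE 0 END) > 0
"""
ids = [row[0] for row in conn.execute(having_query)]

# Wrap the same query in a sub-query to count the qualifying IDs.
count = conn.execute(f"SELECT COUNT(*) FROM ({having_query}) AS t").fetchone()[0]
print(ids, count)
```

Adding a third required value would just mean one more `SUM(CASE ...) > 0` condition in the HAVING clause.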