Count most occurring word in row SQL Server


I'm trying to get the number of times a certain word occurs in each row of a query result.
For example:
Name | Chemistry | Physics | Biology | Maths
-------+-----------+-----------+-----------+--------
John | Excellent | Good | Good | Poor
Kelvin | Excellent | Excellent | Excellent | Poor
For each row, I want to get something like:
Name | Excellent | Good | Poor
-------+-----------+------+-------
John | 1 | 2 | 1
Kelvin | 3 | 0 | 1

Just add them up using case expressions:
select name,
(case when chemistry = 'Excellent' then 1 else 0 end +
case when physics = 'Excellent' then 1 else 0 end +
case when biology = 'Excellent' then 1 else 0 end +
case when maths = 'Excellent' then 1 else 0 end
) as num_excellents,
. . .
from t;
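To check the per-row CASE sum against the sample data, here is a minimal, self-contained sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server and an in-memory table named t (the helper names are hypothetical). It builds one sum of CASE expressions per grade, following the same pattern as the Excellent column.

```python
import sqlite3

# Subject columns and grades from the example table (trusted constants,
# so building the SQL text from them is safe here).
SUBJECTS = ["chemistry", "physics", "biology", "maths"]
GRADES = ["Excellent", "Good", "Poor"]


def build_query():
    """Build one 'sum of CASE expressions' column per grade."""
    parts = []
    for grade in GRADES:
        # Each CASE yields 1 when that subject column holds the grade,
        # so adding them up counts the matches across the row.
        cases = " + ".join(
            f"CASE WHEN {col} = '{grade}' THEN 1 ELSE 0 END" for col in SUBJECTS
        )
        parts.append(f"({cases}) AS {grade.lower()}")
    return "SELECT name, " + ", ".join(parts) + " FROM t ORDER BY name"


def grade_counts():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (name TEXT, chemistry TEXT, physics TEXT, biology TEXT, maths TEXT)"
    )
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
        [
            ("John", "Excellent", "Good", "Good", "Poor"),
            ("Kelvin", "Excellent", "Excellent", "Excellent", "Poor"),
        ],
    )
    rows = conn.execute(build_query()).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in grade_counts():
        print(row)  # ('John', 1, 2, 1) then ('Kelvin', 3, 0, 1)
```

The output matches the expected table in the question.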
A fancier method uses CROSS APPLY with aggregation:
select t.name, s.*
from t cross apply
(select sum(case when marks = 'Excellent' then 1 else 0 end) as excellent,
sum(case when marks = 'Good' then 1 else 0 end) as good,
sum(case when marks = 'Poor' then 1 else 0 end) as poor
from (values (t.chemistry), (t.physics), (t.biology), (t.maths)
) v(marks)
) s;
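SQLite has no CROSS APPLY, so the following sketch swaps in a different but related technique: it unpivots the four subject columns with UNION ALL (one row per name and mark, which is what the VALUES list inside CROSS APPLY produces) and then applies the same conditional aggregation with GROUP BY. The in-memory table and function name are assumptions for illustration.

```python
import sqlite3

# Unpivot with UNION ALL (stand-in for CROSS APPLY (VALUES ...)),
# then count each grade per name with SUM(CASE ...).
UNPIVOT_QUERY = """
WITH marks(name, mark) AS (
    SELECT name, chemistry FROM t
    UNION ALL SELECT name, physics FROM t
    UNION ALL SELECT name, biology FROM t
    UNION ALL SELECT name, maths FROM t
)
SELECT name,
       SUM(CASE WHEN mark = 'Excellent' THEN 1 ELSE 0 END) AS excellent,
       SUM(CASE WHEN mark = 'Good' THEN 1 ELSE 0 END) AS good,
       SUM(CASE WHEN mark = 'Poor' THEN 1 ELSE 0 END) AS poor
FROM marks
GROUP BY name
ORDER BY name
"""


def grade_counts_unpivot():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (name TEXT, chemistry TEXT, physics TEXT, biology TEXT, maths TEXT)"
    )
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
        [
            ("John", "Excellent", "Good", "Good", "Poor"),
            ("Kelvin", "Excellent", "Excellent", "Excellent", "Poor"),
        ],
    )
    rows = conn.execute(UNPIVOT_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in grade_counts_unpivot():
        print(row)
```

Adding a new grade only means adding one SUM(CASE ...) line, and adding a subject only means adding one UNION ALL branch.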

Related

How to do multiple actions in case when then in sql?

I want to do something like this:
select sum(case when ttt.ind = 1 then 1 else 0 end) from ttt
I want to add a column to this query, called myresult, which indicates whether the value of ttt.istry is equal to 1.
Something like:
select
sum(case ttt.ind = 1 then 1, ttt.istry as myresult else 0 end)
from ttt
Of course, I got an error.
How would I do that?
My data is:
ttt.ind | ttt.istry
--------+----------
1 | 0
0 | 1
1 | 1
and so on...
Expected result:
ttt.ind | ttt.istry | myresult | sum
--------+-----------+----------+------
1 | 0 | 0 | 2
0 | 1 | null | 2
1 | 1 | 1 | 2

You don't say which database, so I'll assume a modern one that supports window functions. You can combine a window function with a CASE expression to do this.
For example:
select
ind,
istry,
case when ind = 1 then istry end as myresult,
sum(ind) over() as sum
from ttt
See live example at SQL Fiddle.
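Here is a minimal runnable version of that query against the sample rows, assuming Python's sqlite3 module with SQLite 3.25 or newer (the first release with window functions). The id column is a hypothetical addition so the output order is deterministic, and the total is aliased sum_ind rather than sum.

```python
import sqlite3

QUERY = """
SELECT ind,
       istry,
       CASE WHEN ind = 1 THEN istry END AS myresult,  -- NULL when ind <> 1
       SUM(ind) OVER () AS sum_ind                    -- same grand total on every row
FROM ttt
ORDER BY id
"""


def myresult_rows():
    conn = sqlite3.connect(":memory:")
    # Hypothetical id column, used only to keep the rows in insertion order.
    conn.execute("CREATE TABLE ttt (id INTEGER PRIMARY KEY, ind INTEGER, istry INTEGER)")
    conn.executemany(
        "INSERT INTO ttt (ind, istry) VALUES (?, ?)", [(1, 0), (0, 1), (1, 1)]
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in myresult_rows():
        print(row)  # (1, 0, 0, 2), (0, 1, None, 2), (1, 1, 1, 2)
```

The empty OVER () makes the whole table a single window, which is why the total repeats on each row instead of collapsing the rows as a plain SUM would.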

Your logic is a bit hard to follow, but your result set suggests:
select ind, istry,
(case when istry = 1 then 1
when sum(istry) over (partition by ind) = 1 then 0
end) as myresult,
sum(ind) over () as sum_ind
from ttt;

T-SQL Crosstab count query

I have the following dataset:
...and I want to do a crosstab of sorts, counting the data against specific criteria, e.g.:
Colour criteria: String contains "Blue", "Red", "Yellow" or "Green" (not case sensitive)
Type criteria: String contains "Car", "Lorry", or "Bus" (not case sensitive)
... and I would like the result to look like the following:
Is there an SQL query that I can run on the original data to produce the result I'm looking for?

You can use CROSS APPLY with conditional aggregation; CROSS APPLY simplifies the generation of the list of colours:
select c.colour,
sum(case when v.VehicleData like '%Car%' then 1 else 0 end) Car,
sum(case when v.VehicleData like '%Lorry%' then 1 else 0 end) Lorry,
sum(case when v.VehicleData like '%Bus%' then 1 else 0 end) Bus
from vehicles v
cross apply (values ('Blue'), ('Red'), ('Yellow'), ('Green')
) AS c(colour)
where v.VehicleData like '%' + c.colour + '%'
group by c.colour
Output:
colour Car Lorry Bus
Blue 3 1 0
Red 1 2 0
Yellow 0 1 1
Green 0 0 2
Demo on dbfiddle

With conditional aggregation:
select c.colour,
count(case when t.VehicleData like '%Car%' then 1 end) Car,
count(case when t.VehicleData like '%Lorry%' then 1 end) Lorry,
count(case when t.VehicleData like '%Bus%' then 1 end) Bus
from (
select 'Blue' colour union all
select 'Red' union all
select 'Yellow' union all
select 'Green'
) c left join tbl1 t
on t.VehicleData like '%' + c.colour + '%'
group by c.colour
See the demo.
Results:
> colour | Car | Lorry | Bus
> :----- | --: | ----: | --:
> Blue | 3 | 1 | 0
> Red | 1 | 2 | 0
> Yellow | 0 | 1 | 1
> Green | 0 | 0 | 2
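The two answers differ in one detail: the CROSS APPLY version filters with WHERE, so a colour with no matching rows disappears, while the LEFT JOIN version keeps it with zero counts. The sketch below shows this with Python's sqlite3 module, using hypothetical sample rows (the question's dataset is not shown) and a hypothetical pos column to keep the colour order. An INNER JOIN on the LIKE condition stands in for CROSS APPLY plus WHERE, SQLite uses || for string concatenation instead of +, and SQLite's LIKE is case-insensitive for ASCII by default, matching the "not case sensitive" requirement.

```python
import sqlite3

# Hypothetical sample rows; note that no row mentions Green.
VEHICLES = ["Blue Car", "red car", "Blue Lorry", "Yellow Bus"]

COLOURS_CTE = """
WITH colours(colour, pos) AS (
    VALUES ('Blue', 1), ('Red', 2), ('Yellow', 3), ('Green', 4)
)
"""

COUNTS = """
       COUNT(CASE WHEN t.VehicleData LIKE '%Car%' THEN 1 END) AS car,
       COUNT(CASE WHEN t.VehicleData LIKE '%Lorry%' THEN 1 END) AS lorry,
       COUNT(CASE WHEN t.VehicleData LIKE '%Bus%' THEN 1 END) AS bus
"""


def crosstab(join_kind):
    """join_kind is 'INNER' (like CROSS APPLY + WHERE) or 'LEFT'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl1 (VehicleData TEXT)")
    conn.executemany("INSERT INTO tbl1 VALUES (?)", [(v,) for v in VEHICLES])
    query = (
        COLOURS_CTE
        + "SELECT c.colour,"
        + COUNTS
        + f"FROM colours c {join_kind} JOIN tbl1 t "
        + "ON t.VehicleData LIKE '%' || c.colour || '%' "
        + "GROUP BY c.colour, c.pos ORDER BY c.pos"
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(crosstab("INNER"))  # Green is missing
    print(crosstab("LEFT"))   # Green appears with 0, 0, 0
```

COUNT(CASE ... THEN 1 END) ignores the NULLs produced for unmatched colours, which is why the LEFT JOIN rows come out as zeros.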

SQL using more than one column with CASE

I can't find a good explanation for my problem.
I have a table:
user | 70Y | hospital
-------+-------+----------
1 | 18 | 1
2 | 70 | 1
3 | 90 | 0
I need to find how many people are older than 70 and, of those, how many are in the hospital.
I'm using this to check whether a person is older than 70:
SUM(CASE WHEN 70y > 70 THEN 1 ELSE 0 END) AS 'old_person'
but how do I check whether that person is in the hospital?
What I expect from the table is:
| old_person | old_person_in_hospital|
+------------+-----------------------+
| 18 | 1 |
And if I wanted to add more columns, say a check for people over 40, what is the easiest way to do so?
What I expect from the table:
            | old_person | 40y_person |
------------+------------+------------+
            | 18         | 16         |
in hospital | 1          | 2          |

You need a case for each column:
select
SUM(Case when [70y] > 70 then 1 else 0 end) old_person,
SUM(Case when [70y] > 70 and hospital = 1 then 1 else 0 end) old_person_in_hospital
from tablename
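Here is a minimal runnable check of that conditional aggregation on the three sample rows, assuming Python's sqlite3 module (which, like SQL Server, accepts [70y] as a bracketed identifier). The "user" column is renamed user_id for this sketch. It also adds the over-40 columns from the follow-up question: each extra column is just another SUM(CASE ...) with its own condition.

```python
import sqlite3

QUERY = """
SELECT SUM(CASE WHEN [70y] > 70 THEN 1 ELSE 0 END) AS old_person,
       SUM(CASE WHEN [70y] > 70 AND hospital = 1 THEN 1 ELSE 0 END) AS old_person_in_hospital,
       SUM(CASE WHEN [70y] > 40 THEN 1 ELSE 0 END) AS over_40,
       SUM(CASE WHEN [70y] > 40 AND hospital = 1 THEN 1 ELSE 0 END) AS over_40_in_hospital
FROM tablename
"""


def age_counts():
    conn = sqlite3.connect(":memory:")
    # user_id stands in for the question's "user" column.
    conn.execute("CREATE TABLE tablename (user_id INTEGER, [70y] INTEGER, hospital INTEGER)")
    conn.executemany(
        "INSERT INTO tablename VALUES (?, ?, ?)", [(1, 18, 1), (2, 70, 1), (3, 90, 0)]
    )
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(age_counts())  # (1, 0, 2, 1)
```

With this data only user 3 is over 70 (70 is not > 70), and that user is not in the hospital.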

Use another CASE for the in-hospital count:
select SUM(Case when [70y] > 70 then 1 else 0 end) as old_person,
SUM(Case when [70y] > 70 and hospital = 1 then 1 else 0 end) as hospital
from tablename

How about moving the condition to the where clause?
select count(*) as old_person,
sum(hospital) as old_person_in_hospital
from tablename
where [70y] > 70;
If you want to add more age groups, then you could use conditional aggregation. However, I might suggest that you use aggregation instead and put the results in different rows. For instance:
select ([70y] / 10) as decade,
count(*) as num_people,
sum(hospital) as num_in_hospital
from tablename
group by ([70y] / 10);
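As a quick sketch of the one-row-per-age-group idea, assuming Python's sqlite3 module and the sample rows from the question: integer division by 10 maps each age to its decade, so every decade becomes its own row and no new columns are needed when age groups are added.

```python
import sqlite3


def decade_counts():
    conn = sqlite3.connect(":memory:")
    # user_id stands in for the question's "user" column.
    conn.execute("CREATE TABLE tablename (user_id INTEGER, [70y] INTEGER, hospital INTEGER)")
    conn.executemany(
        "INSERT INTO tablename VALUES (?, ?, ?)", [(1, 18, 1), (2, 70, 1), (3, 90, 0)]
    )
    rows = conn.execute(
        """
        SELECT ([70y] / 10) AS decade,   -- integer division: 18 -> 1, 70 -> 7
               COUNT(*) AS num_people,
               SUM(hospital) AS num_in_hospital
        FROM tablename
        GROUP BY ([70y] / 10)
        ORDER BY decade
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in decade_counts():
        print(row)  # (1, 1, 1), (7, 1, 1), (9, 1, 0)
```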

Counting occurrences of a value in multiple columns - postgres

I have a table called fixtures (simplified for this example) and would like to populate the last two columns (*_plus_mc_per) with the percentage of occurrences for each number, using a query run against the mc_* columns. As an example, it would look like this:
#mc = Match Corner # mc_per = Match Corner Percentage
| mc_0 | mc_1 | mc_3 | mc_4 | match_count | one_plus_mc_per | two_plus_mc_per |
| 1 | 4 | 3 | null | 3 | 100 | 66 |
At the point where I run my query, it looks like this:
#mc = Match Corner # mc_per = Match Corner Percentage
| mc_0 | mc_1 | mc_3 | mc_4 | match_count | one_plus_mc_per | two_plus_mc_per |
| 1 | 4 | 3 | null | 3 | null | null |
So, starting with the query for one_plus_mc_per, I can do this:
SELECT COUNT(*) FROM fixtures WHERE coalesce(mc_0,0) >= 1 AND id = 182;
# Using coalesce to deal with nulls: it returns 0 if the value is null
This returns
| count |
| 1 |
If I run this query on each column individually, the results returned would be:
| count | count | count | count |
| 1 | 1 | 1 | 0 |
This lets me add up all the column values and divide by my match count. This makes sense (and I thank dmfay for getting me to think about his suggestion in a previous question).
My problem is that running this query four times, for example, is very inefficient. My SQL-fu is not strong, and I was looking for a way to do this in one call to the database, so that I can then take that percentage value and update the percentage column.
Thanks

Try this:
SELECT
SUM(CASE WHEN coalesce(mc_0,0) >= 1 THEN 1 ELSE 0 END) count_0,
SUM(CASE WHEN coalesce(mc_1,0) >= 1 THEN 1 ELSE 0 END) count_1,
SUM(CASE WHEN coalesce(mc_3,0) >= 1 THEN 1 ELSE 0 END) count_3,
SUM(CASE WHEN coalesce(mc_4,0) >= 1 THEN 1 ELSE 0 END) count_4
FROM
fixtures
WHERE id = 182;
It returns the count for every column in a single query.
I am not sure, though, what the use of id = id in your query is, as it will always be true.
If you want, for every row, the number of mc_* columns that are >= 1, try this:
SELECT
(CASE WHEN coalesce(mc_0,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_1,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_3,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_4,0) >= 1 THEN 1 ELSE 0 END) as count
FROM
fixtures
WHERE id = 182;
UPDATE:
Calculating one_plus_mc_per as a percentage:
SELECT
100 * CAST((CASE WHEN coalesce(mc_0,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_1,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_3,0) >= 1 THEN 1 ELSE 0 END +
CASE WHEN coalesce(mc_4,0) >= 1 THEN 1 ELSE 0 END) AS DECIMAL)/match_count as one_plus_mc_per
FROM
fixtures
WHERE id = 182;
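Since the goal is to store the percentages, here is a minimal sketch that computes both one_plus_mc_per and two_plus_mc_per in a single UPDATE. It assumes Python's sqlite3 module in place of Postgres, and the helper names are hypothetical. NULLIF(match_count, 0) is an added guard: it turns a zero match count into NULL rather than a division error (Postgres raises an error on division by zero).

```python
import sqlite3

MC_COLUMNS = ["mc_0", "mc_1", "mc_3", "mc_4"]


def count_at_least(n):
    """Sum of CASE expressions: how many mc_* columns are >= n (NULL counts as 0)."""
    return " + ".join(
        f"CASE WHEN COALESCE({col}, 0) >= {n} THEN 1 ELSE 0 END" for col in MC_COLUMNS
    )


UPDATE_SQL = f"""
UPDATE fixtures
SET one_plus_mc_per = 100.0 * ({count_at_least(1)}) / NULLIF(match_count, 0),
    two_plus_mc_per = 100.0 * ({count_at_least(2)}) / NULLIF(match_count, 0)
WHERE id = ?
"""


def update_percentages(fixture_id):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixtures (id INTEGER PRIMARY KEY, mc_0 INTEGER, mc_1 INTEGER, "
        "mc_3 INTEGER, mc_4 INTEGER, match_count INTEGER, "
        "one_plus_mc_per REAL, two_plus_mc_per REAL)"
    )
    conn.execute(
        "INSERT INTO fixtures (id, mc_0, mc_1, mc_3, mc_4, match_count) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (182, 1, 4, 3, None, 3),
    )
    conn.execute(UPDATE_SQL, (fixture_id,))
    row = conn.execute(
        "SELECT one_plus_mc_per, two_plus_mc_per FROM fixtures WHERE id = ?",
        (fixture_id,),
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(update_percentages(182))  # one_plus = 100.0, two_plus is about 66.67
```

The 100.0 literal forces non-integer division, so 2 of 3 matches gives roughly 66.67 rather than 0.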

Postgres has very nice capabilities for answering this type of question:
SELECT COUNT(*) FILTER (WHERE mc_0 >= 1) as count_0,
COUNT(*) FILTER (WHERE mc_1 >= 1) as count_1,
COUNT(*) FILTER (WHERE mc_3 >= 1) as count_3,
COUNT(*) FILTER (WHERE mc_4 >= 1) as count_4,
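AVG(100.0 * ((coalesce(mc_0, 0) >= 1)::int + (coalesce(mc_1, 0) >= 1)::int +
(coalesce(mc_3, 0) >= 1)::int + (coalesce(mc_4, 0) >= 1)::int) / match_count
) as one_plus_mc_per
FROM fixtures
WHERE id = 182;
The FILTER clause is ANSI-standard syntax. Converting booleans to integers with ::int is a very convenient construct; the coalesce() calls keep a NULL column (such as mc_4 in your example) from turning the whole sum into NULL.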
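When adding up comparison results, a NULL column makes the entire sum NULL, which silently drops the row from an aggregate. The following sketch demonstrates this with Python's sqlite3 module and the example row (where mc_4 is NULL). In SQLite, comparisons already evaluate to the integers 0 and 1, so no ::int cast is needed there; in Postgres the ::int cast is required.

```python
import sqlite3

COLUMNS = ["mc_0", "mc_1", "mc_3", "mc_4"]

# Plain comparisons: NULL >= 1 is NULL, and 1 + NULL is NULL.
RAW = " + ".join(f"({c} >= 1)" for c in COLUMNS)
# Same sum with COALESCE, treating a NULL column as 0.
SAFE = " + ".join(f"(COALESCE({c}, 0) >= 1)" for c in COLUMNS)


def boolean_sums():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixtures (id INTEGER, mc_0 INTEGER, mc_1 INTEGER, "
        "mc_3 INTEGER, mc_4 INTEGER)"
    )
    conn.execute("INSERT INTO fixtures VALUES (182, 1, 4, 3, NULL)")
    row = conn.execute(
        f"SELECT {RAW} AS raw_sum, {SAFE} AS safe_sum FROM fixtures WHERE id = 182"
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(boolean_sums())  # (None, 3)
```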

Multiple sum/counts across multiple tables in PostgreSQL

I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's a syntax/punctuation issue that I'm missing.
I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected and how many DNA samples of different types we have. There are three tables that are pertinent to my problem:
Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.
name | birth
-----+-----------
A21 | 1968-07-01
AAR | 2002-03-30
ABB | 1998-09-10
ABD | 2005-03-15
ABE | 1986-01-01
Table: "babtissue" tracks information, including the below three columns, about different tissues that have been collected over the years. Some lines in this table represent tissue samples that we no longer have, but are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.
name | sample_type | avail
-----+-------------+------
A21 | BLOOD | Y
A21 | BLOOD | Y
A21 | TISSUE | N
ABB | BLOOD | Y
ABB | TISSUE | Y
Table: "dna" is similar to babtissue.
name | sample_type | avail
-----+-------------+------
ABB | GDNA | N
ABB | WGA | Y
ACC | WGA | N
ALE | GDNA | Y
ALE | GDNA | Y
Altogether, I'm trying to write a query that will return every name from biograph and tell me, in separate columns, how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...
name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21 | 2 | 0 | 0 | 0 | ?
AAR | 0 | 0 | 0 | 0 | ?
ABB | 1 | 1 | 0 | 1 | ?
ACC | 0 | 0 | 0 | 0 | ?
ALE | 0 | 0 | 2 | 0 | ?
(Apologies for the weird formatting above; I'm not very familiar with writing this way.)
The latest version of the query that I've tried:
select b.name,
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas
from biograph b
left join babtissue t on b.name=t.name
left join dna d on b.name=d.name
where b.name is not NULL
group by b.name
order by b.name
I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong--too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.
Any ideas?

The numbers are too high because you're joining to babtissue and then also to dna: every babtissue row for a name is paired with every dna row for that name, so the counts get multiplied.
You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...
SELECT
SQ.name,
SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
SQ.gdnas,
SQ.wgas
FROM
(
SELECT
B.name,
SUM(CASE WHEN D.sample_type = 'GDNA' AND D.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
SUM(CASE WHEN D.sample_type = 'WGA' AND D.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
FROM
biograph B
LEFT JOIN dna D ON D.name = B.name
GROUP BY
B.name
) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name
Can the name really be NULL?

I don't know about the "avail" column, but this should give you the other columns you're looking for:
SELECT b.name,
COALESCE (t.bloodsamps, 0) AS bloodsamps,
COALESCE (t.tissuesamps, 0) AS tissuesamps,
COALESCE (d.gdnas, 0) AS gdnas,
COALESCE (d.wgas, 0) AS wgas
FROM biograph b
LEFT JOIN (
SELECT name,
SUM(CASE WHEN sample_type = 'BLOOD' THEN 1 ELSE 0 END) AS bloodsamps,
SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
FROM babtissue
WHERE avail = 'Y'
GROUP BY name
) t
ON (t.name = b.name)
LEFT JOIN (
SELECT name,
SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
SUM(CASE WHEN sample_type = 'WGA' THEN 1 ELSE 0 END) AS wgas
FROM dna
WHERE avail = 'Y'
GROUP BY name
) d
ON (d.name = b.name)
;
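To see the join fan-out and the pre-aggregation fix side by side, here is a minimal sketch using Python's sqlite3 module and the sample rows from the question. The run() helper and the ORDER BY are additions for this demo. For ABB, the two babtissue rows each pair with the two dna rows, so the original query reports 2 blood, 2 tissue and 2 WGA samples, while aggregating each table before joining gives 1, 1 and 1.

```python
import sqlite3

NAIVE = """
SELECT b.name,
       SUM(CASE WHEN t.sample_type = 'BLOOD' AND t.avail = 'Y' THEN 1 ELSE 0 END),
       SUM(CASE WHEN t.sample_type = 'TISSUE' AND t.avail = 'Y' THEN 1 ELSE 0 END),
       SUM(CASE WHEN d.sample_type = 'GDNA' AND d.avail = 'Y' THEN 1 ELSE 0 END),
       SUM(CASE WHEN d.sample_type = 'WGA' AND d.avail = 'Y' THEN 1 ELSE 0 END)
FROM biograph b
LEFT JOIN babtissue t ON b.name = t.name
LEFT JOIN dna d ON b.name = d.name
GROUP BY b.name
ORDER BY b.name
"""

# Aggregate each sample table per name first, then join one row per name.
PRE_AGGREGATED = """
SELECT b.name,
       COALESCE(t.bloodsamps, 0), COALESCE(t.tissuesamps, 0),
       COALESCE(d.gdnas, 0), COALESCE(d.wgas, 0)
FROM biograph b
LEFT JOIN (SELECT name,
                  SUM(CASE WHEN sample_type = 'BLOOD' THEN 1 ELSE 0 END) AS bloodsamps,
                  SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
           FROM babtissue WHERE avail = 'Y' GROUP BY name) t ON t.name = b.name
LEFT JOIN (SELECT name,
                  SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
                  SUM(CASE WHEN sample_type = 'WGA' THEN 1 ELSE 0 END) AS wgas
           FROM dna WHERE avail = 'Y' GROUP BY name) d ON d.name = b.name
ORDER BY b.name
"""


def run(query):
    """Load the sample rows into an in-memory database; return {name: counts}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE biograph (name TEXT, birth TEXT);"
        "CREATE TABLE babtissue (name TEXT, sample_type TEXT, avail TEXT);"
        "CREATE TABLE dna (name TEXT, sample_type TEXT, avail TEXT);"
    )
    conn.executemany("INSERT INTO biograph VALUES (?, ?)", [
        ("A21", "1968-07-01"), ("AAR", "2002-03-30"), ("ABB", "1998-09-10"),
        ("ABD", "2005-03-15"), ("ABE", "1986-01-01"),
    ])
    conn.executemany("INSERT INTO babtissue VALUES (?, ?, ?)", [
        ("A21", "BLOOD", "Y"), ("A21", "BLOOD", "Y"), ("A21", "TISSUE", "N"),
        ("ABB", "BLOOD", "Y"), ("ABB", "TISSUE", "Y"),
    ])
    conn.executemany("INSERT INTO dna VALUES (?, ?, ?)", [
        ("ABB", "GDNA", "N"), ("ABB", "WGA", "Y"), ("ACC", "WGA", "N"),
        ("ALE", "GDNA", "Y"), ("ALE", "GDNA", "Y"),
    ])
    result = {row[0]: row[1:] for row in conn.execute(query)}
    conn.close()
    return result


if __name__ == "__main__":
    print("naive:", run(NAIVE)["ABB"])                   # (2, 2, 0, 2)
    print("pre-aggregated:", run(PRE_AGGREGATED)["ABB"])  # (1, 1, 0, 1)
```

Because each derived table has at most one row per name, the final joins cannot multiply rows, and COALESCE turns missing names into zeros.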
