Multiple Word Count in SQL - sql

I have a list of words I need to find in a specific column, "description of what happened",
which holds up to 500 or more characters. The script below works.
However, how do I replace the Name column values 1, 2, 3 with the actual word I am looking for, with the total next to it?
I just can't get it to display; it's probably something simple.
select GROUPING_ID ( Amoxicillin ,Atorvastatin ) as Name ,count(*) as Total
from ( select case when [description_of_what_happened] like '%Amoxicillin%'
then 1 else 0 end as Amoxicillin ,
case when [description_of_what_happened] like '%Atorvastatin%'
then 1 else 0 end as Atorvastatin
FROM "NAME OF TABLE" ) t
group by grouping sets (() ,(Amoxicillin),(Atorvastatin))
having coalesce (Amoxicillin,1) != 0 and coalesce (Atorvastatin,1) != 0
order by grouping_id (Amoxicillin,Atorvastatin)
Row 3 is the total; I need rows 1 and 2 to show the name of the product.
The result is as below:
Name Total
1 7
2 9
3 4112

You can use strings instead of flags:
select coalesce(Amoxicillin, Atorvastatin, 'Total') as Name,
count(*) as Total
from (select (case when [description_of_what_happened] like '%Amoxicillin%'
then 'Amoxicillin'
end) as Amoxicillin ,
(case when [description_of_what_happened] like '%Atorvastatin%'
then 'Atorvastatin'
end
) as Atorvastatin
from "NAME OF TABLE"
) t
where Amoxicillin is not null or Atorvastatin is not null
group by grouping sets ((), (Amoxicillin), (Atorvastatin))
order by name;
Note that I also moved the logic in the having to the where.
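If the word list grows, another option is to keep the words in a small lookup table and join on LIKE, so each word shows up by name without its own CASE branch. This is only a sketch run against SQLite from Python: the table names incidents and search_words and the sample rows are made up for illustration, and the total row counts descriptions that match at least one word.

```python
import sqlite3

# Hypothetical sample data; the real table and column come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (description_of_what_happened TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?)",
    [
        ("Patient was given Amoxicillin twice",),
        ("Atorvastatin dose missed",),
        ("Amoxicillin and Atorvastatin dispensed together",),
        ("Unrelated incident",),
    ],
)

# The words to search for live in their own (temporary) table.
words = ["Amoxicillin", "Atorvastatin"]
conn.execute("CREATE TEMP TABLE search_words (word TEXT)")
conn.executemany("INSERT INTO search_words VALUES (?)", [(w,) for w in words])

# One row per word with the number of matching descriptions, followed by
# a 'Total' row counting descriptions that contain at least one word.
query = """
SELECT w.word AS Name, COUNT(*) AS Total, 0 AS sort_key
FROM search_words AS w
JOIN incidents AS i
  ON i.description_of_what_happened LIKE '%' || w.word || '%'
GROUP BY w.word
UNION ALL
SELECT 'Total', COUNT(*), 1
FROM incidents AS i
WHERE EXISTS (
    SELECT 1 FROM search_words AS w
    WHERE i.description_of_what_happened LIKE '%' || w.word || '%'
)
ORDER BY sort_key, Name
"""
word_totals = [(name, total) for name, total, _ in conn.execute(query)]
for name, total in word_totals:
    print(name, total)
```

Adding a word is then just another row in search_words; the query itself does not change.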

Related

Is There a Way to Automate the Conversion of SQL Rows to Column Using Case?

I was playing with the usa_names dataset on BigQuery. To visualize the top 10 names between 1910 and 2020, I had to GROUP BY year and create a new column for each of the 10 names using CASE.
Now I would like to visualize the top 100, and I want to know if there is a way to automate the CASE, so that I don't have to write a WHEN ... THEN clause for each name in order to create a column for it.
I used the following SQL query to first get the top 10 names:
SELECT
name,
SUM(number) AS total
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
total DESC
LIMIT
10
I then used the following query to convert each name row to a column:
SELECT
year,
SUM(CASE WHEN name = 'James' THEN number ELSE 0 END) AS James,
SUM(CASE WHEN name = 'John' THEN number ELSE 0 END) AS John,
SUM(CASE WHEN name = 'Robert' THEN number ELSE 0 END) AS Robert,
SUM(CASE WHEN name = 'Michael' THEN number ELSE 0 END) AS Michael,
SUM(CASE WHEN name = 'William' THEN number ELSE 0 END) AS William,
SUM(CASE WHEN name = 'Mary' THEN number ELSE 0 END) AS Mary,
SUM(CASE WHEN name = 'Richard' THEN number ELSE 0 END) AS Richard,
SUM(CASE WHEN name = 'Joseph' THEN number ELSE 0 END) AS Joseph,
SUM(CASE WHEN name = 'Charles' THEN number ELSE 0 END) AS Charles,
SUM(CASE WHEN name = 'Thomas' THEN number ELSE 0 END) AS Thomas
FROM
bigquery-public-data.usa_names.usa_1910_current
GROUP BY
year
ORDER BY
year
I want to achieve the same result without having to first pull out the name and manually enter them into the CASE statements.
Also, this won't be needed if there is a way to visualize the data directly without having to convert the names from row to columns.
Thanks.
You need to combine 2 capabilities:
row to column: PIVOT clause
scripting to automate the query finding the top 10 names
declare top_names default ((
select concat("'", string_agg(name, "','"), "'")
from (
-- your query in question
SELECT
name
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
SUM(number) DESC
LIMIT
10
)));
select top_names;
The output is:
'James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'
The PIVOT query you will need is:
SELECT * FROM
(select year, name, sum(number) number
from bigquery-public-data.usa_names.usa_1910_current
group by year, name
)
PIVOT(SUM(number) FOR name IN ('James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'
))
which produces the same output as your second query.
To stick the 2 together, you will need something like:
execute immediate concat(
"""
SELECT * FROM
(select year, name, sum(number) number
from bigquery-public-data.usa_names.usa_1910_current
group by year, name
)
PIVOT(SUM(number) FOR name IN (
""",
top_names,
"))");
You shouldn't need to create a column for each name. Your first query is sufficient (you would just need to change the limit to 100). Based on the question's tags I'm assuming you're using Tableau, so it would be as simple as choosing your desired visualisation (say a bar chart) and placing names on one axis and total on the other axis.
Based on your follow-up comment, it would look like this:
SELECT
name,
year,
SUM(number) AS total
From bigquery-public-data.usa_names.usa_1910_current
WHERE name IN
(
SELECT name
FROM
(
SELECT
name,
SUM(number) AS total
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
total DESC
LIMIT
100
))
GROUP BY name, year
You could also look into using calculated fields within Tableau on the raw data to achieve the desired visualisation.
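If the query is run from client code rather than from a BigQuery script, the CASE columns from the question can also be generated programmatically. Below is a rough sketch of that idea using SQLite from Python. The local usa_names table, its sample rows, TOP_N and the quote_ident helper are all made up for illustration; with BigQuery you would send the generated SQL through your client library instead.

```python
import sqlite3

# Tiny stand-in for the public dataset (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usa_names (year INTEGER, name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO usa_names VALUES (?, ?, ?)",
    [
        (1910, "James", 50), (1910, "Mary", 70), (1910, "Ann", 5),
        (1911, "James", 60), (1911, "Mary", 65), (1911, "Ann", 4),
    ],
)

TOP_N = 2  # would be 100 for the real question

# Step 1: find the top names, most frequent first.
top_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM usa_names GROUP BY name "
        "ORDER BY SUM(number) DESC LIMIT ?",
        (TOP_N,),
    )
]

def quote_ident(name):
    """Quote a value for use as a column alias (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'

# Step 2: generate one SUM(CASE ...) column per name. The name being
# compared is passed as a bound parameter; only the alias is inlined.
columns = ",\n  ".join(
    f"SUM(CASE WHEN name = ? THEN number ELSE 0 END) AS {quote_ident(n)}"
    for n in top_names
)
pivot_sql = f"SELECT year,\n  {columns}\nFROM usa_names GROUP BY year ORDER BY year"
pivot_rows = conn.execute(pivot_sql, top_names).fetchall()
print(pivot_sql)
print(pivot_rows)
```

The trade-off is the same as with EXECUTE IMMEDIATE: the column list depends on the data, so the SQL text has to be built at run time.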

Trying to combine multiples of a key ID into single row, but with different values in columns

T-SQL - SQL Server
I'm building a report to very specific requirements. I'm trying to combine multiples of a key ID into single rows, but there are different values in some of the columns, so GROUP BY won't work.
SELECT count(tt.Person_ID) as CandCount, tt.Person_ID,
CASE e.EthnicSuperCategoryID WHEN CandCount > 1 THEN 10 ELSE e.EthnicSuperCategoryID END as EthnicSuperCategoryID,
CASE e.Ethnicity_Id WHEN 1 THEN 1 ELSE 0 END as Black ,
CASE e.Ethnicity_Id WHEN 2 THEN 1 ELSE 0 END as White ,
CASE e.Ethnicity_Id WHEN 3 THEN 1 ELSE 0 END as Asian,
etc
FROM T_1 TT
JOINS
WHERE
GROUP
Msg 102, Level 15, State 1, Line 4
Incorrect syntax near '>'.
Here are the results (without the first CASE). Note that person 3 stated multiple ethnicities.
SELECT count(tt.Person_ID) as CandCount, tt.Person_ID,
CASE e.Ethnicity_Id WHEN 1 THEN 1 ELSE 0 END as Black ,
CASE e.Ethnicity_Id WHEN 2 THEN 1 ELSE 0 END as White ,
CASE e.Ethnicity_Id WHEN 3 THEN 1 ELSE 0 END as Asian,
etc
FROM T_1 TT
JOINS
WHERE
GROUP
That’s expected, but the goal would be to assign multiple ethnicities to Ethnicity_Id of 10 (multiple). I also want them grouped on a single line.
So the end result would look like this:
So my issue is twofold. If the candidate has more than one ethnicity, assign the record to Ethnicity_Id of 10. I also need duplicated person IDs grouped into a single row, while displaying all of the results of the columns.
This should bring your desired result:
SELECT Person_ID
, ISNULL(ID_Dummy,Ethnicity_ID) Ethnicity_ID
, MAX(Black) Black
, MAX(White) White
, MAX(Asian) Asian
FROM #T T
OUTER APPLY(SELECT MAX(10) FROM #T T2
WHERE T2.Person_ID = T.Person_ID
AND T2.Ethnicity_ID <> T.Ethnicity_ID
)EthnicityOverride(ID_Dummy)
GROUP BY Person_ID, ISNULL(ID_Dummy,Ethnicity_ID)
You want conditional aggregation. Your query is incomplete, but the idea is:
select
person_id,
sum(case when ethnicity_id = 1 then 1 else 0 end) as black,
sum(case when ethnicity_id = 2 then 1 else 0 end) as white,
sum(case when ethnicity_id = 3 then 1 else 0 end) as asian
from ...
where ...
group by person_id
You might want max() instead of sum(). Also, I did not get the logic for the second column in the desired results; maybe that's just count(*).
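To make the conditional aggregation idea concrete, here is a small sketch run against SQLite from Python. The person_ethnicity table and its rows are invented for the example, and the override to 10 for people with more than one distinct ethnicity is my reading of the question, not something the incomplete query confirms.

```python
import sqlite3

# Hypothetical data: person 3 stated two ethnicities.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_ethnicity (person_id INTEGER, ethnicity_id INTEGER)"
)
conn.executemany(
    "INSERT INTO person_ethnicity VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (3, 3)],
)

# One row per person: MAX() turns the per-row 0/1 flags into a single
# flag per column, and COUNT(DISTINCT ...) decides whether the
# ethnicity id is replaced by 10 ("multiple").
query = """
SELECT person_id,
       CASE WHEN COUNT(DISTINCT ethnicity_id) > 1 THEN 10
            ELSE MAX(ethnicity_id) END AS ethnicity_id,
       MAX(CASE WHEN ethnicity_id = 1 THEN 1 ELSE 0 END) AS black,
       MAX(CASE WHEN ethnicity_id = 2 THEN 1 ELSE 0 END) AS white,
       MAX(CASE WHEN ethnicity_id = 3 THEN 1 ELSE 0 END) AS asian
FROM person_ethnicity
GROUP BY person_id
ORDER BY person_id
"""
person_rows = conn.execute(query).fetchall()
for row in person_rows:
    print(row)
```

The same SELECT should carry over to SQL Server largely unchanged, since it only uses CASE, MAX and COUNT(DISTINCT ...).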
This would be my approach
SELECT
person_id,
CASE WHEN flag = 1 THEN Ethnicity_Id ELSE 10 END AS Ethnicity_Id,
[1] as black,
[2] as white,
[3] as asian
FROM
(
SELECT
person_id,
Ethnicity_Id as columns,
1 as n,
MAX(Ethnicity_Id) over(PARTITION BY person_id) as Ethnicity_Id,
COUNT(Ethnicity_Id) over(PARTITION BY person_id) as flag
FROM
#example
) AS SourceTable
PIVOT
(
MAX(n) FOR columns IN ([1], [2], [3])
) AS PivotTable;
Pivot the Ethnicity_Id column into multiple columns, using the constant 1 so the output complies with your expected result.
Use MAX(Ethnicity_Id) with PARTITION BY to keep the original Ethnicity_Id.
Use COUNT(Ethnicity_Id) to flag whether Ethnicity_Id needs to be replaced with 10 because there is more than one row for that person_id.
If you need to add more ethnicities, add their ids in ... IN ([1], [2], [3]) ... and in the select list.

Proportion request sql

There is a table of accidents, and I need to output the share of accidents with reason 2 among all accidents. I wrote this code, but I cannot make it work:
select ((select count("ID") from "DTP" where "REASON"=2)/count("REASON"))
from "DTP"
group by "ID"
Something like this (not tested):
select id, count(case reason when 2 then 1 end)/count(*) as proportion
from your_table
-- where ... (if you need to filter, for example by date)
group by id
;
count(*) counts all the rows in a group (that is, all the rows for each separate id). The case expression returns 1 when the reason is 2 and it returns null otherwise; count counts only non-null values, so it will count the rows where the reason is 2.
You can use avg():
select "ID",
avg(case when "REASON" = 2 then 1.0 else 0 end)
from "DTP"
group by "ID"
This produces the ratio for each id -- based on your sample query. If you only want one row for all the data, then:
select avg(case when "REASON" = 2 then 1.0 else 0 end)
from "DTP";
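One thing to watch with the count(...)/count(*) form is integer division: some engines (SQLite, PostgreSQL and SQL Server among them) truncate the result when both operands are integers, which is why the avg() version uses 1.0. The sketch below shows the difference in SQLite from Python; the four sample rows are made up.

```python
import sqlite3

# Hypothetical sample: two of the four accidents have reason 2.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "DTP" ("ID" INTEGER, "REASON" INTEGER)')
conn.executemany(
    'INSERT INTO "DTP" VALUES (?, ?)', [(1, 2), (2, 1), (3, 2), (4, 3)]
)

# Integer count divided by integer count: truncated in SQLite.
int_ratio = conn.execute(
    'SELECT COUNT(CASE "REASON" WHEN 2 THEN 1 END) / COUNT(*) FROM "DTP"'
).fetchone()[0]

# AVG over 1.0 / 0 gives the share as a fraction.
avg_ratio = conn.execute(
    'SELECT AVG(CASE WHEN "REASON" = 2 THEN 1.0 ELSE 0 END) FROM "DTP"'
).fetchone()[0]

print(int_ratio, avg_ratio)
```

If you prefer the count form, multiplying the numerator by 1.0 avoids the truncation.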

GROUP BY with COUNT condition

I have a result set such as:
Code No
1 *
1 -
1 4
1
1
Now I basically want a query that has 2 columns: a count for the total amount and a count for those that don't have numbers.
Code No_Number Total
1 4 5
I'm assuming this needs a GROUP BY and a COUNT, but how can I do the 2 different counts in a query like this?
This is what I had so far, but I am a bit stuck with the rest of it:
SELECT CODE,NO
Sum(Case when No IN ('*', '-', '') then 1 else 0 end) as Count
I think you basically just need GROUP BY:
SELECT CODE,
SUM(Case when No IN ('*', '-', '') then 1 else 0 end) as Count,
COUNT(*) as total
FROM t
GROUP BY CODE;
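Here is a quick way to try this out, sketched with SQLite from Python. The sample rows mirror the question, but whether the blank values are empty strings or NULLs is an assumption, so the example has one of each and uses COALESCE to treat them alike.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (Code INTEGER, "No" TEXT)')
# Assumed sample data: one blank is an empty string, the other is NULL.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "*"), (1, "-"), (1, "4"), (1, ""), (1, None)],
)

# SUM over a 0/1 CASE counts the rows without a number;
# COUNT(*) counts every row in the group.
query = """
SELECT Code,
       SUM(CASE WHEN COALESCE("No", '') IN ('*', '-', '') THEN 1 ELSE 0 END)
           AS No_Number,
       COUNT(*) AS Total
FROM t
GROUP BY Code
"""
count_rows = conn.execute(query).fetchall()
print(count_rows)
```

Without the COALESCE, a NULL in No would not match the IN list and that row would be left out of No_Number.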
Well, this took a moment :-), but here it is. I have used a CASE expression to populate the No_Number column: it yields 1 when the value in the original table is not a digit and NULL otherwise. COUNT ignores NULLs, so only the non-numeric rows are counted. Note that the pattern '[0-9]' matches a single digit only, and that rows where [No] is NULL are not counted.
If the result set is in a table or temp table:
SELECT Code,
COUNT(CASE WHEN [No] NOT LIKE '[0-9]' THEN 1 ELSE NULL END) AS No_Number,
COUNT(Code) AS Total
FROM <tablename>
GROUP BY Code
If the result set is the product of a previous query you can use a CTE (Common Table Expression) to arrive at the required result or you could include parts of this code in the earlier query.

select result set row to columns transformation

I have a table remarks with columns id, story_id, and like; like can be +1 or -1.
I want my select query to return the following columns story_id, total, n_like, n_dislike where total = n_like + n_dislike without sub queries.
I am currently doing a group by on like and selecting like as like_t, count(like) as total which is giving me an output like
-- like_t --+ --- total --
-1 | 2
1 | 6
and returning two rows in result set. But what I want is to get 1 row where n_like is 6 and n_dislike is 2 and total is 8
First, LIKE is a reserved word in PostgreSQL, so you have to double-quote it. Maybe a better name should be picked for this column.
CREATE TABLE testbed (id int4, story_id int4, "like" int2);
INSERT INTO testbed VALUES
(1,1,'+1'),(1,1,'+1'),(1,1,'+1'),
(1,1,'+1'),(1,1,'+1'),(1,1,'+1'),
(1,1,'-1'),(1,1,'-1');
SELECT
story_id,
sum(CASE WHEN "like" > 0 THEN abs("like") ELSE 0 END) AS n_like,
sum(CASE WHEN "like" < 0 THEN abs("like") ELSE 0 END) AS n_dislike,
count(story_id) AS total
-- for cases +2 / -3 in the "like" field, use following construct instead
-- sum(abs("like")) AS total
FROM testbed
GROUP BY story_id;
I used abs("like") for cases when you'll have +2 or -3 in your "like" column.
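If the column only ever holds +1 and -1, the same single-row result can be had with plain 0/1 flags. The sketch below uses SQLite from Python; the remarks table matches the question, the eight sample rows are made up, and total is taken as COUNT(*) on the assumption that there are no zero values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE remarks (id INTEGER, story_id INTEGER, "like" INTEGER)'
)
# Hypothetical data: six likes and two dislikes for story 1.
rows = [(i, 1, 1) for i in range(1, 7)] + [(7, 1, -1), (8, 1, -1)]
conn.executemany("INSERT INTO remarks VALUES (?, ?, ?)", rows)

# One row per story, no subqueries: each SUM counts one kind of remark.
query = """
SELECT story_id,
       COUNT(*) AS total,
       SUM(CASE WHEN "like" > 0 THEN 1 ELSE 0 END) AS n_like,
       SUM(CASE WHEN "like" < 0 THEN 1 ELSE 0 END) AS n_dislike
FROM remarks
GROUP BY story_id
"""
like_rows = conn.execute(query).fetchall()
print(like_rows)
```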