Identifying data with different occurrences - sql

How to find the animals with different test dates?

If you want to find animals which have 2 or more distinct dates then you can use COUNT along with DISTINCT:
SELECT animals_key, anim_ident
FROM jt3
GROUP BY animals_key, anim_ident
HAVING COUNT(DISTINCT test_date) > 1
If you want to also include test date information, then you can join the above query to the original table.

Related

Confused with the Group By function in SQL

Q1: After using the Group By function, why does it only output one row of each group at most? Does this mean that having is supposed to filter the group rather than filter the records in each group?
Q2: I want to find the records in each group whose ages are greater than the average age of that group. I tried the following, but it returns nothing. How should I fix this?
SELECT *, avg(age) FROM Mytable Group By country Having age > avg(age)
Thanks!!!!
You can calculate the average age for each country in a subquery and join that to your table for filtering:
SELECT mt.*, MtAvg.AvgAge
FROM Mytable mt
inner join
(
select mtavgs.country
, avg(mtavgs.age) as AvgAge
from Mytable mtavgs
group by mtavgs.country
) MTAvg
on mtavg.country=mt.country
and mt.Age > mtavg.AvgAge
GROUP BY returns always 1 row per unique combination of values in the GROUP BY columns listed (provided that they are not removed by a HAVING clause). The subquery in our example (alias: MTAvg) will calculate a single row per country. We will use its results for filtering the main table rows by applying the condition in the INNER JOIN clause; we will also report that average by including the calculated average age.
GROUP BY is a keyword that is called an aggregate function. Check this out here for further reading SQL Group By tutorial
What it does is it lumps all the results together into one row. In your example it would lump all the results with the same country together.
Not quite sure what exactly your query needs to be to solve your exact problem. I would however look into what are called window functions in SQL. I believe what you first need to do is write a window function to find the average age in each group. Then you can write a query to return the results you need
Depending on your dbms type and version, you may be able to use a "window function" that will calculate the average per country and with this approach it makes the calculation available on every row. Once that data is present as a "derived table" you can simply use a where clause to filter for the ages that are greater then the calculated average per country.
SELECT mt.*
FROM (
SELECT *
, avg(age) OVER(PARTITION BY country) AS AvgAge
FROM Mytable
) mt
WHERE mt.Age > mt.AvgAge

Combine multiple date fields into one on query

I have a requirement to create a report that counts a total from 2 date fields into one. A simplified example of the table I'm querying is:
ID, FirstName, LastName, InitialApplicationDate, UpdatedApplicationDate
I need to query the two date fields in a way that creates similar output to the following:
Date | TotalApplications
I would need the date output to include both InitialApplicationDate and
UpdatedApplicationDate fields and the TotalApplications output to be a count of the total for both types of date fields. Originally I thought maybe a Union would work however that returns 2 separate records for each date. Any ideas how I might accomplish this?
The simplest way, I think, is to unpivot using apply and then aggregate:
select v.thedate, count(*)
from t cross apply
(values (InitialApplicationDate), (UpdatedApplicationDate)) v(thedate)
group by v.thedate;
You might want to add where thedate is not null if either column could be NULL.
Note that the above will count the same application twice, once for each date. That appears to be your intention.

SELECT list expression references column user_id which is neither grouped nor aggregated at [8:5]

I have 2 data sets. One of all patients who got ill (endo-2) and one of a special group of patients that also exists in endo-2 called "xp-56"
I've been trying to run this query and I'm not sure why it isn't working. I want to do counts of 3 columns in endo-2 of those patients that belong in the xp-56 table.
this is the code I've been using with the following error
SELECT list expression references column user_id which is neither grouped nor aggregated at [8:5]
how do I fix this so I never make the same mistake again!
SELECT
Virus_Exposure,
Medical_Delivery,
Number_of_Site
FROM
(
SELECT
medical_id,
COUNT(DISTINCT Virus_id) AS Virus_Exposure,
COUNT(EndoCrin_id) AS Medical_Delivery,
COUNT (site_id_clinic) AS Number_of_Site
FROM
`endo-2`
WHERE
_PARTITIONTIME BETWEEN TIMESTAMP("2017-12-15")
AND TIMESTAMP("2018-01-10")) AS a
RIGHT JOIN
(
SELECT
medical_id
FROM
`xp-56`
ORDER BY
medical_id DESC) AS b
ON
a.medical_id=b.medical_id
GROUP BY
medical_id
Why doesnt the medical_id in table a work?
Why not just do this?
SELECT e.medical_id,
COUNT(DISTINCT e.Virus_id) AS Virus_Exposure,
COUNT(e.EndoCrin_id) AS Medical_Delivery,
COUNT(e.site_id_clinic) AS Number_of_Site
FROM `endo-2` e JOIN
`xp-56` x
ON x.medical_id = e.medical_id
WHERE e._PARTITIONTIME BETWEEN TIMESTAMP("2017-12-15") AND TIMESTAMP("2018-01-10")
GROUP BY e.medical_id;

PSQL not counting all entries

I am writing a count query which counts attendances at events over a few years. However there is a column named status in which it has extra information about attendance. Apology is listed when the person did not attend and thus should be excluded. status has various entries including group's letters, attendance information, guest status, as well as NULL entries.
The problem is that not all of the attendances are counted in full. They are counted in full when the status NOT LIKE 'Apology' is removed.
How can I get it to count all of the entries? Rather than a limited selection of 364 out of 1583.
I am using psql v9.3.11 in pgAdmin III v1.18.1.
The following is the relevant code:
SELECT b.attendances,
COUNT(b.attendances)
FROM (
SELECT COUNT(a.event_id) as attendances
FROM (SELECT DISTINCT event_id AS event_id,
member_pers_id AS member_pers_id
FROM event_attendance
WHERE status NOT LIKE 'Apology') a
GROUP BY member_pers_id
ORDER BY attendances) b
GROUP BY attendances
ORDER BY attendances ASC
I suspect the NOT LIKE is filtering out additional values, such as NULL. If so, then test for this explicitly.
Also, it makes no sense to do so many aggregations. One is sufficient:
SELECT COUNT(a.event_id) as attendances
FROM (SELECT DISTINCT event_id, member_pers_id
FROM event_attendance
WHERE status NOT LIKE 'Apology' OR status IS NULL
) a
And I'm not sure if the SELECT DISTINCT is necessary

SQL Query to find if different values exist for a column

I have a temporary table with three columns
pay_id,
id_client_grp,
id_user
Basically i want to ensure that this table should have all the rows having same client group and same id_user if not i want to know which pay_id is the culprit and throw error to user.
Can somebody help me with a query.
Thanks,
Rishi
When you say 'culprit,' I assume you mean the pay_id(s) that are not like the others, assuming there is a majority.
The problem is all of the pay_id's could potentially become culprits once your SELECT COUNT(DISTINCT id_client_grp, id_user) returns > 1 record, if there is a relatively even distribution. It is difficult to program for this scenario, since you will need to determine what exactly a majority is.
Your best bet will be to return all distinct combinations of those 3 fields, then decide where to go from there based on your business logic.
So could this question be asked like this:
If I wanted to add a unique index on my table across the three columns: client group, id user, pay id, identify those that break the unique condition where we have non unique pay id for a client group and id user??
select a.id_client_grp, a.id_user, a.pay_id , a.count from (
/* this should return 1 row per client group and user, */
/* if the pay id is the same for all */
select id_client_grp, id_user, pay_id, count(1) as count
from table t
group by id_client_grp, id_user ) a
group by a.id_client_grp, a.id_user
/* if we have more than one row per client group and user, then we have a dupe, so report them all */
having count (1) > 1
If you want all the rows to have the same values for some set of columns (your question is not entirely clear to me as t9o what you want to be the same)
Do you know going in WHICH pay_id, id_client_grp all the rows should be? Or do you not care, as long as they are all the same?
If you know the values you are looking for, simply test for rows that are not set to those desired values
Select distinct id_user
From tempTable
Where pay_id <> #PayIdValue
Or id_client_grp <> #ClientGroupIDValue
If you don't care, and just want them all to be the same, and they're not, then you need to specify which of the more than one set of values IS the "culprit" as you said...
If you want some other question answered. please explain more clearly...
Based on yr comment, then, to determine if there is more than one id_client_grp, pay_id
Select Count(Distinct id_client_grp, pay_id)
From tempTable
If this = 1 then every record has the same values for these 2 fields.... Any other value indicates that three is more than one set of distinct values in the table.
SELECT DISTINCT p.pay_id,
t.[count]
FROM rishi_table p
INNER JOIN ( SELECT id_client_grp, id_user, COUNT(*) As 'count'
FROM rishi_table
GROUP BY id_client_grp, id_user
HAVING COUNT(*) > 1 ) t
ON p.id_client_grp = t.id_client_grp AND p.id_user = t.id_user
basically create a set with the dupes, and bounce that against the main table to get your offending list.
SELECT DISTINCT id_client_grp, id_user
should let you do something like
IF ##ROWCOUNT > 1 THEN
...
Or possibly SELECT COUNT(DISTINCT id_client_grp, id_user) ...
but that's more vendor-dependent as to its availability and proper syntax.