Distinct count mismatch - sql

This is my initial table structure.
MEMBER_ID ITEM_ID ACCOUNT
1 3 A
1 4 A
2 1 B
3 4 B
4 4 B
5 4 A
6 2 A
When I want the distinct number of members I do
Select COUNT(DISTINCT MEMBER_ID) FROM TABLE A
I get 6, the expected answer
When I do
SELECT COUNT(DISTINCT MEMBER_ID),ACCOUNT FROM TABLE A GROUP BY 2
I get something like A=4 and B=3. What do you think is the disconnect here?
Thanks

I find the results highly unlikely. You would, however, get 4 and 3 if the data were slightly different:
MEMBER_ID ITEM_ID ACCOUNT
1 3 A
1 4 B
2 1 B
3 4 B
4 4 A
5 4 A
6 2 A
With the GROUP BY, MEMBER_ID = 1 would be counted twice: once for A and once for B. My guess is that something like this is happening in your real data. COUNT(DISTINCT) is not additive. So, when you break it apart using a GROUP BY, the sum of the per-group values is not (necessarily) the value for all the data. This differs from COUNT(*) and SUM(), whose per-group results add up to the overall total, and from MIN() and MAX(), which can be recombined by taking the MIN() or MAX() of the per-group results. AVG() is also not additive, although it is easily recalculated from per-group sums and counts.
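To make the non-additivity concrete, here is a minimal sketch that loads the modified sample data above into an in-memory SQLite database through Python's sqlite3 module. The table name members and the lowercase column names are assumptions made for the example; the original post only calls it TABLE A.

```python
import sqlite3

# In-memory database holding the modified sample data from the answer above.
# The table name "members" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (member_id INTEGER, item_id INTEGER, account TEXT)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [(1, 3, "A"), (1, 4, "B"), (2, 1, "B"), (3, 4, "B"),
     (4, 4, "A"), (5, 4, "A"), (6, 2, "A")],
)

# Distinct members over the whole table.
total = conn.execute(
    "SELECT COUNT(DISTINCT member_id) FROM members"
).fetchone()[0]

# Distinct members per account; member 1 appears under both A and B.
per_account = dict(conn.execute(
    "SELECT account, COUNT(DISTINCT member_id) FROM members GROUP BY account"
).fetchall())

print(total)                      # 6
print(per_account)                # A has 4 distinct members, B has 3
print(sum(per_account.values()))  # 7, one more than the overall distinct count
```

The per-account counts sum to 7 because member 1 is counted once under each account, while the overall distinct count stays at 6.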

Related

How to group, count consecutive dates and use it as filter in Netezza

I'm trying to group consecutive dates, count the consecutive dates, and use that count as filter.
I have a table that currently looks like:
pat_id admin_dates admin_grp daily_admin
-------------------------------------------------
1 08/20/2018 1 2 doses
1 08/21/2018 1 3 doses
1 08/22/2018 1 1 doses
1 10/05/2018 2 3 doses
1 12/10/2018 3 4 doses
2 01/05/2019 1 1 doses
2 02/10/2019 2 2 doses
2 02/11/2019 2 2 doses
where admin_grp is grouping consecutive dates per pat_id.
I want to exclude all rows that have fewer than 3 consecutive dates for the same pat_id. In this example, only the combination pat_id = 1 and admin_grp = 1 has 3 consecutive dates, and that is what I would like to see in the result. My desired output would be:
pat_id admin_dates admin_grp daily_admin
-------------------------------------------------
1 08/20/2018 1 2 doses
1 08/21/2018 1 3 doses
1 08/22/2018 1 1 doses
I honestly have no idea how to do this. My attempt failed to count how many rows share the same admin_grp value within the same pat_id, let alone use that count as a filter. If anyone could help out or suggest ideas on how to tackle this, it would be greatly appreciated.
Assuming that any admin_grp only contains consecutive days, you just need to count the records per (pat_id, admin_grp) and keep the groups that have 3 or more records.
Eg:
-- replace your_table with the actual table name
select x.*
from (select t.*
            ,count(*) over(partition by pat_id, admin_grp) as cnt
      from your_table t
     ) x
where x.cnt >= 3
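As a quick check of the window-count approach, here is a small sketch that runs the same idea against the sample rows from the question using Python's sqlite3 module rather than Netezza. The table name admins is an assumption, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question; the table name "admins" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admins ("
    "pat_id INTEGER, admin_dates TEXT, admin_grp INTEGER, daily_admin TEXT)"
)
conn.executemany(
    "INSERT INTO admins VALUES (?, ?, ?, ?)",
    [(1, "08/20/2018", 1, "2 doses"),
     (1, "08/21/2018", 1, "3 doses"),
     (1, "08/22/2018", 1, "1 doses"),
     (1, "10/05/2018", 2, "3 doses"),
     (1, "12/10/2018", 3, "4 doses"),
     (2, "01/05/2019", 1, "1 doses"),
     (2, "02/10/2019", 2, "2 doses"),
     (2, "02/11/2019", 2, "2 doses")],
)

# Count the rows in each (pat_id, admin_grp) group with a window function,
# then keep only the rows whose group has at least 3 members.
query = """
SELECT x.pat_id, x.admin_dates, x.admin_grp, x.daily_admin
FROM (SELECT t.*,
             COUNT(*) OVER (PARTITION BY pat_id, admin_grp) AS cnt
      FROM admins t) x
WHERE x.cnt >= 3
ORDER BY x.pat_id, x.admin_dates
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

Only the three rows for pat_id = 1, admin_grp = 1 survive the filter, which matches the desired output in the question.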
Short answer: join the table with itself on 'pat_id' and filter appropriately:
Select a.* from TABLE a
join (Select * from TABLE where daily_admin='3 doses') b
using (pat_id)
Where a.daily_admin in ('1 doses', '2 doses', '3 doses')
Btw: too bad the 'daily_admin' column is not an integer... better data model would have made the Where statement slightly simpler :)

Querying duplicates table into related sets

We have a process that creates a table of duplicate records based on some arbitrary rules (details not relevant).
Every record gets checked against all other records and if a suspected duplicate is found both it and the duplicate are stored in a dupes table to be manually reviewed.
This results in a table something like this:
dupId, originalId, duplicateId
1 1 2
2 1 3
3 1 4
4 2 3
5 2 4
6 3 4
7 5 6
8 5 7
9 6 7
10 8 9
You can see here record #1 has 3 other records it is similar to (#2,#3 and #4) and they are each similar to each other.
Record #5 has 2 duplicates (#6 and #7) and record #8 has only 1 (#9).
I want to query the duplicates into sets, so my results would look something like this:
setId recordId
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
But I am too old/slow/tired/rubbish and a bit out of my depth here.
Currently, when checking for duplicates if the record pairing is already in the table we don't insert it twice (i.e. you don't see both sides of the duplicate pairing) but can easily do so if it makes the querying simpler.
Any advice much appreciated!
Duplicates seem to be transitive, so you have all pairs. That is, the "original" id that never appears as a duplicate has the information you need.
But it is not included in the duplicates and you want that. So:
select dense_rank() over (order by originalid) as setid, duplicateid
from ((select originalid, duplicateid
       from t
       where not exists (select 1 from t t2 where t.originalid = t2.duplicateid)
      ) union all
      (select distinct originalid, originalid
       from t
       where not exists (select 1 from t t2 where t.originalid = t2.duplicateid)
      )
     ) i
order by setid;
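The query above relies on every pair within a set being present in the table. If that cannot be guaranteed, one alternative (a different technique from the answer, done outside SQL) is to group the ids with a union-find structure. The sketch below is a minimal Python version; the function name group_duplicates is made up for the example.

```python
def group_duplicates(pairs):
    """Group record ids into sets from (originalId, duplicateId) pairs."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for original, duplicate in pairs:
        root_a, root_b = find(original), find(duplicate)
        if root_a != root_b:
            # Keep the smaller id as the root so set numbering is stable.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Number the sets 1, 2, 3, ... in order of their smallest record id.
    roots = sorted({find(x) for x in list(parent)})
    set_id = {root: i for i, root in enumerate(roots, start=1)}
    return sorted((set_id[find(x)], x) for x in list(parent))


# Pairs from the dupes table in the question: (originalId, duplicateId).
pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (5, 6), (5, 7), (6, 7), (8, 9)]
sets = group_duplicates(pairs)
for set_number, record_id in sets:
    print(set_number, record_id)
```

With the sample pairs this yields set 1 for records 1 to 4, set 2 for records 5 to 7, and set 3 for records 8 and 9. It gives the same grouping even when some pairs are missing, for example with only (1, 2) and (2, 3) recorded.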

SQL Calculations With Multi-Group Affiliations

I'm attempting to have a function or view that is able to calculate and roll up various counts while being able to search on a many to many affiliation.
Here is an example data set:
Invoice Table:
InvoiceID LocationID StatusID
1 5 1
2 5 1
3 5 1
4 5 2
5 7 2
5 7 1
5 7 2
Group Table:
GroupID GroupName
1 Group 1
2 Group 2
GroupToLocation Table:
GroupToLocationID GroupID LocationID
1 1 5
2 2 5
3 2 7
I have gotten to the point where I could sum up the various statuses per location and get this:
LocationID Status1 Status2
5 3 1
7 1 2
Location 5 has 3 invoices with a status of 1 and 1 invoice with a status of 2, while Location 7 has 1 invoice with status 1 and 2 invoices with status 2.
There are two groups, and Location 5 is in both, while Location 7 is only in the second. I need to be able to set it up where I can append a where statement like this:
select * from vw_GroupCounts
where GroupName = 'Group 2'
or
select Invoice, SUM(*) from vw_GroupCounts
where GroupName = 'Group 2'
And have that result in only getting Location 7. Whenever I do this, because I have to use left joins or something along those lines, the counts are duplicated for each group that the Location is affiliated with. I know I could do something along the lines of a subquery and pass the GroupName into that, but the system I am working with uses a dynamic query builder that appends WHERE statements based on user input.
I don't mind using view, or functions, or any number of functions inside of functions, but I hope there is a way to do what I'm looking for.
Since locations 5 and 7 are both in Group 2, searching for Group 2 in the WHERE clause after joining all the tables returns all records in this case. This isn't duplication; it is just the way the data is. A different join wouldn't change this; only changing the data would. Let me know if I am misunderstanding something, though.
Here is how you would join them to do that search.
Here it is with your first example of the location and status count.
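A minimal sketch of what such a join could look like, run through Python's sqlite3 module with the sample data from the question. This is not the answerer's original query. The view body is an assumption (only the name vw_GroupCounts comes from the question), and the group table is called Groups here because GROUP is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoice (InvoiceID INTEGER, LocationID INTEGER, StatusID INTEGER);
CREATE TABLE Groups (GroupID INTEGER, GroupName TEXT);
CREATE TABLE GroupToLocation (
    GroupToLocationID INTEGER, GroupID INTEGER, LocationID INTEGER);

-- Rows exactly as listed in the question.
INSERT INTO Invoice VALUES
    (1, 5, 1), (2, 5, 1), (3, 5, 1), (4, 5, 2), (5, 7, 2), (5, 7, 1), (5, 7, 2);
INSERT INTO Groups VALUES (1, 'Group 1'), (2, 'Group 2');
INSERT INTO GroupToLocation VALUES (1, 1, 5), (2, 2, 5), (3, 2, 7);

-- Hypothetical view body: one row per (group, location) with status counts,
-- so a WHERE on GroupName can be appended from outside.
CREATE VIEW vw_GroupCounts AS
SELECT g.GroupName,
       i.LocationID,
       SUM(CASE WHEN i.StatusID = 1 THEN 1 ELSE 0 END) AS Status1,
       SUM(CASE WHEN i.StatusID = 2 THEN 1 ELSE 0 END) AS Status2
FROM Groups g
JOIN GroupToLocation gl ON gl.GroupID = g.GroupID
JOIN Invoice i ON i.LocationID = gl.LocationID
GROUP BY g.GroupName, i.LocationID;
""")

# Counts for one group only; each location appears once within a group.
group2 = conn.execute(
    "SELECT LocationID, Status1, Status2 FROM vw_GroupCounts "
    "WHERE GroupName = 'Group 2' ORDER BY LocationID"
).fetchall()

# Without the filter, Location 5 shows up once per group it belongs to.
all_rows = conn.execute(
    "SELECT GroupName, LocationID, Status1, Status2 FROM vw_GroupCounts "
    "ORDER BY GroupName, LocationID"
).fetchall()

print(group2)
print(all_rows)
```

Filtering on 'Group 2' returns both locations 5 and 7, as described above. Location 5 only appears twice when no group filter is applied, so totals should be summed per group rather than across the whole view.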

SQL - Order by amount of occurrences

It's my first question here, so I hope I can explain it well enough.
I want to order my data by amount of occurrences in the table.
My table is like this:
id Daynr
1 2
1 4
2 4
2 5
2 6
3 1
4 2
4 5
And I want it to sort it like this:
id Daynr
3 1
1 2
1 4
4 2
4 5
2 4
2 5
2 6
Player #3 has one day in the table, and Player #1 has 2.
My table is named "dayid"
Both id and Daynr are foreign keys, together making it a primary key
I hope this explains my problem well enough. Please ask if you need more information; it's my first time here.
Thanks in advance
You can do this by counting the number of times that things occur for each id. Most databases support window functions, so you can do this as:
select id, daynr
from (select t.*, count(*) over (partition by id) as cnt
      from dayid t
     ) t
order by cnt, id;
You can also express this as a join:
select t.id, t.daynr
from dayid as t inner join
     (select id, count(*) as cnt
      from dayid
      group by id
     ) as tg
     on t.id = tg.id
order by tg.cnt, t.id;
Note that both of these include the id in the order by. That way, if two ids have the same count, all rows for the id will appear together.
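Here is a small sketch that runs the join version against the sample data from the question, using Python's sqlite3 module. The extra daynr in the ORDER BY is an addition for the example so that rows within one id come back in a fixed order.

```python
import sqlite3

# Table "dayid" with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dayid (id INTEGER, daynr INTEGER)")
conn.executemany(
    "INSERT INTO dayid VALUES (?, ?)",
    [(1, 2), (1, 4), (2, 4), (2, 5), (2, 6), (3, 1), (4, 2), (4, 5)],
)

# Join each row to the number of rows its id has, then sort by that count.
# Sorting by id next keeps all rows of one id together when counts tie.
ordered = conn.execute("""
    SELECT t.id, t.daynr
    FROM dayid AS t
    INNER JOIN (SELECT id, COUNT(*) AS cnt
                FROM dayid
                GROUP BY id) AS tg
            ON t.id = tg.id
    ORDER BY tg.cnt, t.id, t.daynr
""").fetchall()

for row in ordered:
    print(row)
```

The rows come back with id 3 first (one day), then ids 1 and 4 (two days each), then id 2 (three days), which is the order asked for in the question.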

Get rows with single values using SQLite

Using SQLite, I'd like to get all rows whose value in a specific column occurs only once in that column. For example, from the following table:
A B
1 2
2 1
3 2
4 3
5 1
6 1
7 2
8 4
9 2
Here I'd like to get only rows 4 and 8, as their values (3 and 4) occur only once in the entire column.
You could use a query like this:
SELECT *
FROM mytable
WHERE B IN (SELECT B FROM mytable GROUP BY B HAVING COUNT(DISTINCT A)=1)
The subquery returns all B values that are present only once (you could also use HAVING COUNT(*)=1 in this case); the outer query returns all rows whose B value is returned by the subquery.
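For a quick check, the query can be run against the sample table with Python's sqlite3 module. This is a minimal sketch; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 2), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1), (7, 2), (8, 4), (9, 2)],
)

# Keep only rows whose B value occurs exactly once in the column.
singles = conn.execute("""
    SELECT *
    FROM mytable
    WHERE B IN (SELECT B FROM mytable GROUP BY B HAVING COUNT(*) = 1)
    ORDER BY A
""").fetchall()

print(singles)  # [(4, 3), (8, 4)]
```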