Using Groupby with additional filter - sql

I have the data in Initial format:
STEP 1: To find out the users having more than 1 record and show those records. This was achieved using the below.
SELECT ID,
USER,
STATUS
FROM TABLE
WHERE USER in
(SELECT USER
FROM TABLE
GROUP BY USER
HAVING COUNT(*) > 1)
STEP 2: From the above set of records, find the records for which all the values are either 1 or 2. So the data should be something like:
Can I get some suggestions on how to achieve that? Note that STATUS is NVARCHAR, hence aggregate functions can't be used.

The simplest thing is to check that the status is the same in your subquery. Assuming that status only takes on the values 1 and 2:
SELECT t.ID, t.USER, t.STATUS
FROM TABLE t
WHERE t.USER IN (SELECT t2.USER
FROM TABLE t2
GROUP BY t2.USER
HAVING COUNT(*) > 1 AND
MIN(t2.status) = MAX(t2.status)
);
If there are other status values and you particularly care about 1 and 2, you would use:
SELECT t.ID, t.USER, t.STATUS
FROM TABLE t
WHERE t.USER IN (SELECT t2.USER
FROM TABLE t2
GROUP BY t2.USER
HAVING COUNT(*) > 1 AND
MIN(t2.status) = MAX(t2.status) AND
MIN(t2.status) IN ('1', '2')
);
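Since the question says STATUS is NVARCHAR, it may help to see that MIN and MAX also work on text columns. The following is only a sketch, run against SQLite through Python's sqlite3 module rather than SQL Server; the table name `records` and the column `user_name` are hypothetical stand-ins for TABLE and USER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: status is stored as text, as in the question.
conn.execute("CREATE TABLE records (id INTEGER, user_name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "a", "1"), (2, "a", "1"),  # two rows, same status: kept
        (3, "b", "1"), (4, "b", "2"),  # two rows, mixed status: dropped
        (5, "c", "2"),                 # only one row: dropped
        (6, "d", "3"), (7, "d", "3"),  # same status, but not 1 or 2: dropped
    ],
)

# MIN = MAX means every row of the user carries the same status value.
same_status_rows = conn.execute(
    """
    SELECT t.id, t.user_name, t.status
    FROM records t
    WHERE t.user_name IN (SELECT t2.user_name
                          FROM records t2
                          GROUP BY t2.user_name
                          HAVING COUNT(*) > 1
                             AND MIN(t2.status) = MAX(t2.status)
                             AND MIN(t2.status) IN ('1', '2'))
    ORDER BY t.id
    """
).fetchall()
print(same_status_rows)
```

Only the two rows of user `a` come back in this sample data.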

Please check if this helps
SELECT ID,
[USER],
[STATUS]
FROM TABLE
WHERE [USER] in
(SELECT [USER]
FROM TABLE
GROUP BY [USER]
HAVING COUNT([USER]) > 1 AND ((MIN(STATUS) != MAX(STATUS) AND COUNT(STATUS) > 2) OR (MIN(STATUS) = MAX(STATUS))))

Related

Does Presto support NOT IN constructs?

I have a query of the form:
SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry')
In my_table there are multiple rows for each person, and I want to exclude those people who have ever had status "hungry". This is a construct I regard as standard and have used in other SQL dialects, but this brings me back an empty result set in Athena.
On the other hand, the plain old IN construction works as expected.
Can anyone explain how I can write this query in Presto? I found another article on SO that seems to imply it works correctly, so I am a bit nonplussed.
Do not use NOT IN with a subquery. If the subquery returns any NULL value, NOT IN returns no rows at all. Note: this is how SQL works, not a peculiarity of any particular database.
Instead, use NOT EXISTS:
SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT 1
FROM my_table t2
WHERE t2.status = 'hungry' AND
t2.person_id = t.person_id
);
Actually, I might suggest aggregation for this instead -- you are already doing aggregation essentially with the SELECT DISTINCT:
select person_id
from my_table t
group by person_id
having sum(case when status = 'hungry' then 1 else 0 end) = 0;
Using conditional aggregation:
SELECT person_id
FROM my_table m
GROUP BY person_id
HAVING COUNT(CASE WHEN status='hungry' THEN 1 END)=0
I would do aggregation :
SELECT person_id
FROM my_table
GROUP BY person_id
HAVING SUM(CASE WHEN status = 'hungry' THEN 1 ELSE 0 END) = 0;
If you want the full row, use NOT EXISTS; NOT IN would return no rows if the subquery returns a NULL:
SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT 1
FROM my_table t1
WHERE t1.status = 'hungry' AND
t1.person_id = t.person_id
);
I feel compelled to point out that you can solve this by just excluding the NULLs explicitly from the subquery, and sticking with the NOT IN construct:
SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry' AND person_id IS NOT NULL)
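The NULL behaviour described in the answers above can be reproduced with a small experiment. This is a sketch against SQLite via Python's sqlite3 module, not Presto or Athena, and the sample rows are made up; the one that matters is the 'hungry' row whose person_id is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (person_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "happy"), (1, "hungry"), (2, "happy"), (3, "sleepy"), (None, "hungry")],
)

# Plain NOT IN: the subquery yields 1 and NULL, so no comparison is ever TRUE.
not_in = conn.execute(
    """
    SELECT DISTINCT person_id FROM my_table
    WHERE person_id NOT IN (SELECT person_id FROM my_table
                            WHERE status = 'hungry')
    ORDER BY person_id
    """
).fetchall()

# NOT IN with NULLs filtered out of the subquery.
not_in_filtered = conn.execute(
    """
    SELECT DISTINCT person_id FROM my_table
    WHERE person_id NOT IN (SELECT person_id FROM my_table
                            WHERE status = 'hungry' AND person_id IS NOT NULL)
    ORDER BY person_id
    """
).fetchall()

# NOT EXISTS: the outer row with a NULL person_id also survives, because
# "t2.person_id = t.person_id" is never true for a NULL.
not_exists = conn.execute(
    """
    SELECT DISTINCT t.person_id FROM my_table t
    WHERE NOT EXISTS (SELECT 1 FROM my_table t2
                      WHERE t2.status = 'hungry'
                        AND t2.person_id = t.person_id)
    ORDER BY t.person_id
    """
).fetchall()

print(not_in, not_in_filtered, not_exists)
```

In this data the plain NOT IN returns nothing, the filtered NOT IN returns persons 2 and 3, and NOT EXISTS returns persons 2 and 3 plus the NULL person_id itself, so the variants are not fully interchangeable when the key column is nullable.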

Classic ASP / MSSQL - Remove returned results based on certain conditions

I have a little sql query, like so
SELECT * FROM table
This returns a bunch of results, and I output the following fields:
ID
UserID
Amount
Date
What I want to do is get the most recent entry for each UserID (based on ID); then, if the amount is 0, do not return ANY results for that UserID.
select t1.*
from your_table t1
join
(
select userid, max(date) as mdate
from your_table
group by userid
having sum(case when amount = 0 then 1 else 0 end) = 0
) t2 on t1.userid = t2.userid and t1.date = t2.mdate
In the subquery you group by the user and select only those having no amount of zero. In that select you use max(date) as mdate to get the latest date for each user.
That subquery can be joined to the original table to get the complete record and not just the userid.
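To see what the join with the HAVING filter returns, here is a sketch using Python's sqlite3 module (SQLite, not MSSQL) with made-up rows. Note one consequence of the HAVING clause as written: a user is dropped if any of their rows has amount 0, not only when the latest row does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows are invented for illustration; dates are ISO strings so that
# MAX(date) picks the latest one.
conn.execute(
    "CREATE TABLE your_table (id INTEGER, userid INTEGER, amount INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, 10, 5, "2020-01-01"),
        (2, 10, 7, "2020-02-01"),  # latest row of user 10
        (3, 20, 0, "2020-01-15"),  # an older zero amount for user 20
        (4, 20, 9, "2020-03-01"),  # latest row of user 20, non-zero
        (5, 30, 4, "2020-01-05"),  # only row of user 30
    ],
)

latest_rows = conn.execute(
    """
    SELECT t1.id, t1.userid, t1.amount, t1.date
    FROM your_table t1
    JOIN (SELECT userid, MAX(date) AS mdate
          FROM your_table
          GROUP BY userid
          HAVING SUM(CASE WHEN amount = 0 THEN 1 ELSE 0 END) = 0
         ) t2 ON t1.userid = t2.userid AND t1.date = t2.mdate
    ORDER BY t1.id
    """
).fetchall()
print(latest_rows)
```

User 20 is excluded here even though their most recent amount is 9, because an earlier row has amount 0.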
try this
WITH cte AS
(
SELECT
MAX(ID) OVER (PARTITION BY UserID) MaxIDForUserID,
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY ID DESC) rn,
UserID,
Amount,
Date
FROM TableName
)
SELECT * FROM cte WHERE rn = 1 AND Amount != 0

Get latest sql rows based on latest date and per user

I have the following table:
RowId, UserId, Date
1, 1, 1/1/01
2, 1, 2/1/01
3, 2, 5/1/01
4, 1, 3/1/01
5, 2, 9/1/01
I want to get the latest records based on date and per UserId, but as part of the following query (I cannot change this query because it is auto-generated by a tool, but I can pass in anything starting with AND...):
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND (
-- everything which needs to be done goes here . . .
)
I have tried similar query, but get an error:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
EDIT: Database is Sql Server 2008
You could use a NOT EXISTS condition:
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND NOT EXISTS (
SELECT *
FROM MyTable AS t
WHERE t.UserId = MyTable.UserId
AND t.Date > MyTable.Date
)
;
Note that if a user has more than one row with the same latest Date value, the query will return all such entries. If necessary, you can modify the subquery's condition slightly to make sure only one row is returned:
WHERE t.UserId = MyTable.UserId
AND (t.Date > MyTable.Date
OR t.Date = MyTable.Date AND t.RowId > MyTable.RowId
)
With the above condition, if two or more rows with the same Date exist for the same user, the one with the greater RowId value will be returned.
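The effect of the tie-breaking condition can be checked with a short sketch. It runs against SQLite through Python's sqlite3 module rather than SQL Server 2008. The dates are rewritten as ISO strings so they compare correctly, and row 6 is an invented extra row that ties with row 5 on Date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (RowId INTEGER, UserId INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, 1, "2001-01-01"),
        (2, 1, "2001-02-01"),
        (3, 2, "2001-05-01"),
        (4, 1, "2001-03-01"),
        (5, 2, "2001-09-01"),
        (6, 2, "2001-09-01"),  # invented row: same latest Date as row 5
    ],
)

# Without the tie-breaker, both rows sharing the latest Date are returned.
with_ties = conn.execute(
    """
    SELECT RowId, UserId, Date FROM MyTable
    WHERE 1 = 1
      AND NOT EXISTS (SELECT * FROM MyTable AS t
                      WHERE t.UserId = MyTable.UserId
                        AND t.Date > MyTable.Date)
    ORDER BY RowId
    """
).fetchall()

# With the tie-breaker, the greater RowId wins among equal dates.
one_per_user = conn.execute(
    """
    SELECT RowId, UserId, Date FROM MyTable
    WHERE 1 = 1
      AND NOT EXISTS (SELECT * FROM MyTable AS t
                      WHERE t.UserId = MyTable.UserId
                        AND (t.Date > MyTable.Date
                             OR t.Date = MyTable.Date AND t.RowId > MyTable.RowId))
    ORDER BY RowId
    """
).fetchall()
print(with_ties, one_per_user)
```

The first query returns rows 4, 5 and 6; the second returns only rows 4 and 6.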
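Assuming you can modify anything within the AND clause, you can use a query like this if you are using T-SQL. Note that it picks MAX(RowId) per user, so it assumes the row with the highest RowId is also the one with the latest Date: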
SELECT RowId, UserId, [Date]
FROM #myTable
WHERE 1 = 1
AND (
RowId IN (
SELECT D.RowId
FROM (
SELECT DISTINCT MAX(RowId) AS RowId, UserId, MAX([Date]) AS [Date]
FROM #myTable
GROUP BY UserId
) AS D
)
)
Try:
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND EXISTS
(SELECT 1
FROM (SELECT UserId, MAX(Date) MaxDate
FROM MyTable
GROUP BY UserId) m
WHERE m.UserId = MyTable.UserId and m.MaxDate = MyTable.Date)
Assuming that RowID is an identity column:
SELECT t1.RowId, t1.UserId, t1.Date
FROM MyTable t1
WHERE 1 = 1
AND t1.RowID IN (
SELECT TOP 1 t2.RowID
FROM MyTable t2
WHERE t1.UserId = t2.UserId
AND t2.Date = (SELECT MAX(t3.Date) FROM MyTable t3
WHERE t2.UserID = t3.UserId)
)

SQL how to select a group of records based on some statistics of this group?

For example, I have a record set with three columns:
id,week,count
1,1,10;
1,2,20;
1,3,30;
2,1,3;
2,2,2;
2,3,15;
What I want is just the data of IDs whose average count is > 10. Then, in this example data, the data of id=1 will be selected.
Thanks.
SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10
SELECT *
FROM YourTable
WHERE id IN (SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10)
Or, if you are using an Access database (where IN happens to have horrendous performance for whatever reason), you can use:
SELECT t2.*
FROM (SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10) AS t1
INNER JOIN YourTable AS t2 ON t1.id = t2.id
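As a quick check that the IN form and the JOIN form select the same rows, here is a sketch that loads the question's sample data into SQLite via Python's sqlite3 module (an assumption made for convenience; the thread does not name a database other than Access).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, week INTEGER, count INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 3), (2, 2, 2), (2, 3, 15)],
)

# Variant 1: filter the detail rows with IN on the qualifying ids.
via_in = conn.execute(
    """
    SELECT * FROM YourTable
    WHERE id IN (SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10)
    ORDER BY id, week
    """
).fetchall()

# Variant 2: join the detail rows to the qualifying ids.
via_join = conn.execute(
    """
    SELECT t2.*
    FROM (SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10) AS t1
    INNER JOIN YourTable AS t2 ON t1.id = t2.id
    ORDER BY t2.id, t2.week
    """
).fetchall()
print(via_in, via_join)
```

Both variants return the three rows of id 1 (average 20), while id 2 (average about 6.7) is filtered out.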
In most databases, you can also do this with window functions:
select t.*
from (select t.*, avg(count) over (partition by id) as avgcount
from t
) t
where avgcount > 10

SQL query: how to distinct count of a column group by another column

In my table I need to know if each ID has one and only one ID_name. How can I write such query?
I tried:
select ID, count(distinct ID_name) as count_name
from table
group by ID
having count_name > 1
But it takes forever to run.
Any thoughts?
select ID
from YourTable
group by
ID
having count(distinct ID_name) > 1
or
select *
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.ID = yt2.ID
and yt1.ID_Name <> yt2.ID_Name
)
Now, most ID columns are defined as primary key and are unique. So in a regular database you'd expect both queries to return an empty set.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank -- numbers the distinct names within each ID
from YourTable ip
) tt
group by tt.ID
This gives you every ID with its total number of ID_name values.
If you want only those IDs which have more than one name associated, just add a where clause,
e.g.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank -- numbers the distinct names within each ID
from YourTable ip
) tt
where tt.myRank > 1
group by tt.ID