sql aggregate data - sql

this is not a specific dbms question, but a generic sql problem.
i have this dataset
userid | objecteid| count
--------------------------
1 | 1 | 12
1 | 2 | 15
1 | 3 | 6
2 | 4 | 30
2 | 1 | 1
2 | 5 | 9
with one query i need to find: for each user, the object with the maximum count
looking for a result like this:
userid | objecteid| count
--------------------------
1 | 2 | 15
2 | 4 | 30
because the object 2 has the max count for user 1 and the object 4 has the max count for user 2

This can easily be solved using window functions.
The following is standard ANSI SQL:
select userid, objecteid, "count"
from (
select userid, objecteid, "count",
max("count") over (partition by userid) as max_cnt
from the_table
) t
where "count" = max_cnt;
If there are two objects with the same count, both will be returned.
Alternatively this can also be done using row_number() instead:
select userid, objecteid, "count"
from (
select userid, objecteid, "count",
row_number() over (partition by userid order by "count" desc) as rn
from the_table
) t
where rn = 1;
Unlike the first query, this will only pick one row if a user has more than one object with the same count. If you want those duplicates returned, use dense_rank() instead of row_number()
SQLFiddle: http://sqlfiddle.com/#!15/f02a9/1

try this
Select * from tableName
where count in (
Select Max(count)
from tableName
group by userid
)

Related

Order By Id and Limit Offset By Id from a table

I have an issue similar to the following query:
select name, number, id
from tableName
order by id
limit 10 offset 5
But in this case I only take the 10 elements from the group with offset 5
Is there a way to set limit and offset by id?
For example if I have a set:
|------------------------------------|---|---------------------------------------|
| Ana | 1 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
| Joe | 2 | 64ed0011-ef54-4708-a64a-f85228149651 |
and if I have skip 1 I should get
|------------------------------------|---|---------------------------------------|
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
I think that you want to filter by row_number():
select name, number, id
from (
select t.*, row_number() over(partition by name order by id) rn
from mytable t
) t
where
rn >= :number_of_records_per_group_to_skip
and rn < :number_of_records_per_group_to_skip + :number_of_records_per_group_to_keep
The query ranks records by id withing groups of records having the same name, and then filters using two parameters:
:number_of_records_per_group_to_skip: how many records per group should be skipped
:number_of_records_per_group_to_skip: how many records per group should be kept (after skipping :number_of_records_per_group_to_skip records)
This might not be the answer you are looking for but it gives you the results your example shows:
select name, number, id
from (
select * from tableName
order by id
limit 3 offset 0
) d
where id > 1;
Best regards,
Bjarni

Get row which matched in each group

I am trying to make a sql query. I got some results from 2 tables below. Below results are good for me. Now I want those values which is present in each group. for example, A and B is present in each group(in each ID). so i want only A and B in result. and also i want make my query dynamic. Could anyone help?
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
In the following query, I have placed your current query into a CTE for further use. We can try selecting those values for which every ID in your current result appears. This would imply that such values are associated with every ID.
WITH cte AS (
-- your current query
)
SELECT Value
FROM cte
GROUP BY Value
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte);
Demo
The solution is simple - you can do this in two ways at least. Group by letters (Value), aggregate IDs with SUM or COUNT (distinct values in ID). Having that, choose those letters that have the value for SUM(ID) or COUNT(ID).
select Value from MyTable group by Value
having SUM(ID) = (SELECT SUM(DISTINCT ID) from MyTable)
select Value from MyTable group by Value
having COUNT(ID) = (SELECT COUNT(DISTINCT ID) from MyTable)
Use This
WITH CTE
AS
(
SELECT
Value,
Cnt = COUNT(DISTINCT ID)
FROM T1
GROUP BY Value
)
SELECT
Value
FROM CTE
WHERE Cnt = (SELECT COUNT(DISTINCT ID) FROM T1)

SQL Server: Select only one row of rows that has the same ID on some coulmn

I have a table that has 3 columns:
- ID
- FROM
- TO
And i have data like that
-----------------------
ID | FROM | TO
1 | 2 | 1
2 | 5 | 1
3 | 7 | 1
4 | 2 | 1
5 | 2 | 1
6 | 9 | 1
7 | 3 | 1
8 | 4 | 1
9 | 5 | 1
I would like to create a query that selects all rows where TO = 1 and i don't want to display rows that was previously retrieved, for example i have multiple rows where FROM = 2 and TO = 1, i just need to retrieve that row only once.
My table doesn't really look like this but i am giving a small example because my aim is to collect all FROM numbers but without any redundancy.
use distinct keyword
select distinct m.from,m.to from mytable as m;
Use DISTINCT
SELECT DISTINCT from,to FROM yourTable WHERE to = 1
You just have to group by the columns you want to display:
select [from] from mytable group by [from]
If you want to see how many froms you have all you have to do is:
select [from], count(*) from mytable group by [from]
You could use distinct but it would slower than group by but require more memory.
Please read here if you want an explanation on the difference between group by and distinct:
Huge performance difference when using group by vs distinct
Not sure what exactly you meant select distinct [FROM] from TableName where [TO] = 1
OR
may be you need single row for every distinct [FROM] value for given [TO] ?
;with cte as (
select ID, [FROM], [TO],
rn = row_number() over (partition by [FROM] order by ID)
from TableName
where [TO] = 1
)
select ID, [FROM], [TO]
from cte
where rn=1

sql query distinct with Row_Number

I am fighting with the distinct keyword in sql.
I just want to display all row numbers of unique (distinct) values in a column & so I tried:
SELECT DISTINCT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNum
FROM table
WHERE fid = 64
however the below code giving me the distinct values:
SELECT distinct id FROM table WHERE fid = 64
but when tried it with Row_Number.
then it is not working.
This can be done very simple, you were pretty close already
SELECT distinct id, DENSE_RANK() OVER (ORDER BY id) AS RowNum
FROM table
WHERE fid = 64
Use this:
SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS RowNum FROM
(SELECT DISTINCT id FROM table WHERE fid = 64) Base
and put the "output" of a query as the "input" of another.
Using CTE:
; WITH Base AS (
SELECT DISTINCT id FROM table WHERE fid = 64
)
SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS RowNum FROM Base
The two queries should be equivalent.
Technically you could
SELECT DISTINCT id, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS RowNum
FROM table
WHERE fid = 64
but if you increase the number of DISTINCT fields, you have to put all these fields in the PARTITION BY, so for example
SELECT DISTINCT id, description,
ROW_NUMBER() OVER (PARTITION BY id, description ORDER BY id) AS RowNum
FROM table
WHERE fid = 64
I even hope you comprehend that you are going against standard naming conventions here, id should probably be a primary key, so unique by definition, so a DISTINCT would be useless on it, unless you coupled the query with some JOINs/UNION ALL...
This article covers an interesting relationship between ROW_NUMBER() and DENSE_RANK() (the RANK() function is not treated specifically). When you need a generated ROW_NUMBER() on a SELECT DISTINCT statement, the ROW_NUMBER() will produce distinct values before they are removed by the DISTINCT keyword. E.g. this query
SELECT DISTINCT
v,
ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... might produce this result (DISTINCT has no effect):
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| a | 2 |
| a | 3 |
| b | 4 |
| c | 5 |
| c | 6 |
| d | 7 |
| e | 8 |
+---+------------+
Whereas this query:
SELECT DISTINCT
v,
DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... produces what you probably want in this case:
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
+---+------------+
Note that the ORDER BY clause of the DENSE_RANK() function will need all other columns from the SELECT DISTINCT clause to work properly.
All three functions in comparison
Using PostgreSQL / Sybase / SQL standard syntax (WINDOW clause):
SELECT
v,
ROW_NUMBER() OVER (window) row_number,
RANK() OVER (window) rank,
DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v
... you'll get:
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
Using DISTINCT causes issues as you add fields and it can also mask problems in your select. Use GROUP BY as an alternative like this:
SELECT id
,ROW_NUMBER() OVER (ORDER BY id) AS RowNum
FROM table
where fid = 64
group by id
Then you can add other interesting information from your select like this:
,count(*) as thecount
or
,max(description) as description
How about something like
;WITH DistinctVals AS (
SELECT distinct id
FROM table
where fid = 64
)
SELECT id,
ROW_NUMBER() OVER (ORDER BY id) AS RowNum
FROM DistinctVals
SQL Fiddle DEMO
You could also try
SELECT distinct id, DENSE_RANK() OVER (ORDER BY id) AS RowNum
FROM #mytable
where fid = 64
SQL Fiddle DEMO
Try this:
;WITH CTE AS (
SELECT DISTINCT id FROM table WHERE fid = 64
)
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNum
FROM cte
WHERE fid = 64
Try this
SELECT distinct id
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNum
FROM table
WHERE fid = 64) t
Or use RANK() instead of row number and select records DISTINCT rank
SELECT id
FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS RowNum
FROM table
WHERE fid = 64) t
WHERE t.RowNum=1
This also returns the distinct ids
Question is too old and my answer might not add much but here are my two cents for making query a little useful:
;WITH DistinctRecords AS (
SELECT DISTINCT [col1,col2,col3,..]
FROM tableName
where [my condition]
),
serialize AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY [colNameAsNeeded] ORDER BY [colNameNeeded]) AS Sr,*
FROM DistinctRecords
)
SELECT * FROM serialize
Usefulness of using two cte's lies in the fact that now you can use serialized record much easily in your query and do count(*) etc very easily.
DistinctRecords will select all distinct records and serialize apply serial numbers to distinct records. after wards you can use final serialized result for your purposes without clutter.
Partition By might not be needed in most cases

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
Demo on SQLFiddle
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.