SQL query for displaying clusters with their size

I have a table with different cluster IDs:
ID
1
1
2
2
2
3
3
3
3
4
4
I want to display each cluster size together with the number of clusters that have that size.
For example for the above table Expected Output:
Cluster Size | No of Clusters (with that size)
2 | 2
3 | 1
4 | 1
I wrote a query that counts the clusters of one specified size:
Select COUNT(*) from
(SELECT ID, COUNT(ID) as cnt
FROM [Table] group by ID having COUNT(*) =3) as TC;
For the table above, this returns "1".
However, I want a query that returns every cluster size together with the number of clusters of that size.

select [Cluster Size], Count(*) as [No of Clusters]
from (
select count(*) as [Cluster Size]
from Table1
group by ID
) a
group by [Cluster Size]
Output:
| CLUSTER SIZE | NO OF CLUSTERS |
---------------------------------
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
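The two-level GROUP BY above can be tried without a SQL Server instance. The following is a minimal sketch that runs the same idea through Python's built-in sqlite3 module; the function name `cluster_size_histogram` and the use of SQLite are assumptions made for illustration, not part of the original answer.

```python
import sqlite3


def cluster_size_histogram(ids):
    """Return (cluster_size, number_of_clusters) pairs for a list of cluster IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INTEGER)")
    conn.executemany("INSERT INTO Table1 (ID) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        SELECT ClusterSize, COUNT(*) AS NumOfClusters
        FROM (
            -- inner query: one row per cluster, holding its size
            SELECT COUNT(*) AS ClusterSize
            FROM Table1
            GROUP BY ID
        ) AS sizes
        -- outer query: how many clusters share each size
        GROUP BY ClusterSize
        ORDER BY ClusterSize
        """
    ).fetchall()
    conn.close()
    return rows


# Sample data from the question
print(cluster_size_histogram([1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4]))
# prints [(2, 2), (3, 1), (4, 1)]
```

The ORDER BY is added only to make the output order deterministic; the original query leaves the order unspecified.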

SELECT c AS ClusterSize, COUNT(*) AS NumOfClusters
FROM
(
SELECT COUNT(*) AS c, ID
FROM #table
GROUP BY ID
)A
GROUP BY c

If I understood your requirements this query should do the job:
SELECT cl.size, COUNT(cl.size)
FROM
(SELECT id, COUNT(id) [Size] FROM Table1 GROUP BY id) cl
GROUP BY cl.size
Here is a link to SQL Fiddle: http://sqlfiddle.com/#!3/56de3/7

Related

Find the count of IDs that have the same value

I'd like to get a count of all the IDs that share the same value (Drops) with other IDs. For instance, the table below shows that IDs 1 and 3 both have A drops, so the query would count them. Similarly, IDs 7 and 18 both have B drops, so that's another two IDs. In total, 4 IDs share a value with another ID, and that's what my query should return.
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 2 | C |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
I've tried several approaches, but the following query was my last attempt:
With cte1 (Id1, D1) as
(
select Id, Drops
from Posts
),
cte2 (Id2, D2) as
(
select Id, Drops
from Posts
)
Select count(distinct c1.Id1) newcnt, c1.D1
from cte1 c1
left outer join cte2 c2 on c1.D1 = c2.D2
group by c1.D1
Written out in full, the result would be a single value, but the records the query should be counting look as follows:
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
Any advice would be great. Thanks
You can use a CTE to generate a list of Drops values that have more than one corresponding ID value, and then JOIN that to Posts to find all rows which have a Drops value that has more than one Post:
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT P.*
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output:
ID Drops
1 A
3 A
7 B
18 B
If desired you can then count those posts in total (or grouped by Drops value):
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT COUNT(*) AS newcnt
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output
newcnt
4
Demo on SQLFiddle
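As a quick way to check the counting query above, here is a small sketch that runs it against the question's sample rows using Python's sqlite3 module. The helper name `count_ids_sharing_drops` and the CTE name `shared` are made up for this example, and SQLite stands in for the asker's database.

```python
import sqlite3


def count_ids_sharing_drops(posts):
    """Count the rows whose Drops value also appears on at least one other row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Posts (ID INTEGER, Drops TEXT)")
    conn.executemany("INSERT INTO Posts (ID, Drops) VALUES (?, ?)", posts)
    (newcnt,) = conn.execute(
        """
        WITH shared AS (
            -- Drops values that occur on more than one row
            SELECT Drops
            FROM Posts
            GROUP BY Drops
            HAVING COUNT(*) > 1
        )
        SELECT COUNT(*) AS newcnt
        FROM Posts P
        JOIN shared ON P.Drops = shared.Drops
        """
    ).fetchone()
    conn.close()
    return newcnt


# Sample data from the question
sample = [(1, "A"), (2, "C"), (3, "A"), (7, "B"), (18, "B")]
print(count_ids_sharing_drops(sample))  # prints 4
```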
You can use dense_rank() to solve this. Within each Drops value, rows with the same ID receive the same rank, so counting the distinct ranks gives the number of distinct IDs per Drops value.
Here is the demo.
with cte as
(
select
drops,
count(distinct rnk) as newCnt
from
( select
*,
dense_rank() over (partition by drops order by id) as rnk
from myTable
) t
group by
drops
having count(distinct rnk) > 1
)
select
sum(newCnt) as newCnt
from cte
Output:
|newcnt |
|------ |
| 4 |
First group the count of the ids for your drops and then sum the values greater than 1.
select sum(countdrops) as total from
(select drops , count(id) as countdrops from yourtable group by drops) as temp
where countdrops > 1;

How to include row totals in pivot statement in Oracle?

I have a table, ta, with status and title columns (see the demo linked below).
Using a pivot statement, I am able to break down the count by title
select * from (
select * from ta
)
pivot (
COUNT(title)
for title in ( 'worker', 'manager') )
So the result looks like this:
STATUS 'worker' 'manager'
started 3 1
finished 4 5
ready 3 4
What I need is to add a third column for the row totals:
STATUS 'worker' 'manager' Total
started 3 1 4
finished 4 5 9
ready 3 4 7
Any idea how I can accomplish this within the same statement?
demo is at http://sqlfiddle.com/#!4/740fd/1
I would just use conditional aggregation rather than pivot. This gives you the extra flexibility that you need:
select
status,
sum(case when title = 'worker' then 1 else 0 end) worker,
sum(case when title = 'manager' then 1 else 0 end) manager,
count(*) total
from ta
group by status
Demo on DB Fiddle:
STATUS | WORKER | MANAGER | TOTAL
:------- | -----: | ------: | ----:
started | 3 | 1 | 4
finished | 4 | 5 | 9
ready | 3 | 4 | 7
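The conditional-aggregation approach is plain SQL, so it can be sketched outside Oracle as well. The example below runs it through Python's sqlite3 module; the function name `status_totals` is invented here, and the sample rows are hypothetical, chosen only to reproduce the counts shown in the question. Note that `count(*)` equals worker + manager only when the table contains no other titles.

```python
import sqlite3


def status_totals(rows):
    """Per-status counts of workers and managers plus a row total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ta (status TEXT, title TEXT)")
    conn.executemany("INSERT INTO ta (status, title) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT status,
               SUM(CASE WHEN title = 'worker'  THEN 1 ELSE 0 END) AS worker,
               SUM(CASE WHEN title = 'manager' THEN 1 ELSE 0 END) AS manager,
               COUNT(*) AS total  -- counts every row for the status
        FROM ta
        GROUP BY status
        ORDER BY status
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical rows that reproduce the counts from the question
sample = (
    [("started", "worker")] * 3 + [("started", "manager")] * 1
    + [("finished", "worker")] * 4 + [("finished", "manager")] * 5
    + [("ready", "worker")] * 3 + [("ready", "manager")] * 4
)
print(status_totals(sample))
# prints [('finished', 4, 5, 9), ('ready', 3, 4, 7), ('started', 3, 1, 4)]
```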
Use SUM() over CASE expressions to count each title, and COUNT(*) to get the row total:
select
status,
sum(case
when title = 'worker'
then 1
else 0
end) worker,
sum(case
when title = 'manager'
then 1
else 0
end) manager,
count(*) total
from ta
group by status
Give the pivoted query an alias (such as q) so that all of its columns can be selected with q.*, and then add the pivoted columns together to produce a total column next to them:
select q.*, worker + manager as total
from ta
pivot
(
count(title)
for title in ( 'worker' as worker, 'manager' as manager )
) q
I think the other examples are much simpler, but here is a different approach using cube and grouping before pivoting:
select *
from (
select decode(grouping(title),1,'total',0,title) title,
status,
count(*) cnt
from ta
group by status, cube(title) )
pivot(
sum(cnt) for title in ('worker','manager','total')
)
Output:
| STATUS | 'worker' | 'manager' | 'total' |
|----------|----------|-----------|---------|
| finished | 4 | 5 | 9 |
| ready | 3 | 4 | 7 |
| started | 3 | 1 | 4 |
http://sqlfiddle.com/#!4/740fd/13/0
Adding the cube into the group by clause will give you a subtotal for that column. It will show as null in that column by default. You can use the grouping function in the select clause to differentiate between the total row and the normal rows (the total row will be 1, normal rows are 0). Using a decode will force those total rows to be 'total' which becomes one of the values that you can pivot on.

Get row which matched in each group

I am trying to write a SQL query. The results below come from a query over 2 tables, and they are what I expect so far. Now I want only the values that are present in every group. For example, A and B are present in each group (for each ID), so I want only A and B in the result. I also want the query to be dynamic. Could anyone help?
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
In the following query, I have placed your current query into a CTE for further use. We can try selecting those values for which every ID in your current result appears. This would imply that such values are associated with every ID.
WITH cte AS (
-- your current query
)
SELECT Value
FROM cte
GROUP BY Value
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte);
Demo
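To see the HAVING COUNT(DISTINCT ID) comparison at work, here is a minimal sketch using Python's sqlite3 module with the sample rows from the question. The table name `T1` and the function name `values_in_every_group` are assumptions for illustration; in the real case the CTE would wrap the asker's existing query instead.

```python
import sqlite3


def values_in_every_group(pairs):
    """Return the values that occur for every distinct ID in (ID, Value) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (ID INTEGER, Value TEXT)")
    conn.executemany("INSERT INTO T1 (ID, Value) VALUES (?, ?)", pairs)
    rows = conn.execute(
        """
        SELECT Value
        FROM T1
        GROUP BY Value
        -- keep a value only if it is linked to as many distinct IDs as exist overall
        HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM T1)
        ORDER BY Value
        """
    ).fetchall()
    conn.close()
    return [value for (value,) in rows]


# Sample data from the question
sample = [
    (1, "A"), (1, "B"), (1, "C"), (1, "D"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (3, "B"),
]
print(values_in_every_group(sample))  # prints ['A', 'B']
```

Because both sides of the comparison use COUNT(DISTINCT ID), a duplicated (ID, Value) row does not change the result.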
The solution is simple, and you can do it in at least two ways. Group by the letter (Value), aggregate the IDs with SUM or COUNT, and keep the letters whose aggregate equals the SUM or COUNT of the distinct IDs in the whole table. Both queries assume each (ID, Value) pair appears only once; the SUM variant additionally assumes all IDs are positive.
select Value from MyTable group by Value
having SUM(ID) = (SELECT SUM(DISTINCT ID) from MyTable);
select Value from MyTable group by Value
having COUNT(ID) = (SELECT COUNT(DISTINCT ID) from MyTable);
Use this:
WITH CTE
AS
(
SELECT
Value,
Cnt = COUNT(DISTINCT ID)
FROM T1
GROUP BY Value
)
SELECT
Value
FROM CTE
WHERE Cnt = (SELECT COUNT(DISTINCT ID) FROM T1)

SQL Server: Select only one row from rows that have the same ID in some column

I have a table that has 3 columns:
- ID
- FROM
- TO
And I have data like this:
-----------------------
ID | FROM | TO
1 | 2 | 1
2 | 5 | 1
3 | 7 | 1
4 | 2 | 1
5 | 2 | 1
6 | 9 | 1
7 | 3 | 1
8 | 4 | 1
9 | 5 | 1
I would like to create a query that selects all rows where TO = 1 without returning a combination that was already retrieved. For example, I have multiple rows where FROM = 2 and TO = 1, and I need to retrieve that row only once.
My table doesn't really look like this; it is a small example. My aim is to collect all FROM numbers without any redundancy.
Use the DISTINCT keyword (FROM and TO are reserved words, so they need brackets):
select distinct m.[from], m.[to] from mytable as m;
Use DISTINCT:
SELECT DISTINCT [from], [to] FROM yourTable WHERE [to] = 1
You just have to group by the columns you want to display:
select [from] from mytable group by [from]
If you want to see how many froms you have all you have to do is:
select [from], count(*) from mytable group by [from]
You could use DISTINCT instead, but in some cases it can be slower than GROUP BY or need more memory.
For an explanation of the difference between GROUP BY and DISTINCT, see:
Huge performance difference when using group by vs distinct
Not sure exactly what you meant. Perhaps:
select distinct [FROM] from TableName where [TO] = 1
Or maybe you need a single row for every distinct [FROM] value for a given [TO]?
;with cte as (
select ID, [FROM], [TO],
rn = row_number() over (partition by [FROM] order by ID)
from TableName
where [TO] = 1
)
select ID, [FROM], [TO]
from cte
where rn=1
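The row_number() variant keeps the whole first row (lowest ID) for each FROM value, not just the distinct FROM numbers. Below is a small sketch of that behaviour using Python's sqlite3 module, which supports window functions from SQLite 3.25 onward; the function name `first_row_per_from` is hypothetical, and the column names are double-quoted here because SQLite also treats FROM and TO as keywords.

```python
import sqlite3


def first_row_per_from(rows, to_value):
    """For a given TO value, keep only the lowest-ID row of each FROM value."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE TableName (ID INTEGER, "FROM" INTEGER, "TO" INTEGER)')
    conn.executemany('INSERT INTO TableName (ID, "FROM", "TO") VALUES (?, ?, ?)', rows)
    result = conn.execute(
        """
        WITH cte AS (
            SELECT ID, "FROM", "TO",
                   -- number the rows of each FROM value, lowest ID first
                   ROW_NUMBER() OVER (PARTITION BY "FROM" ORDER BY ID) AS rn
            FROM TableName
            WHERE "TO" = ?
        )
        SELECT ID, "FROM", "TO"
        FROM cte
        WHERE rn = 1
        ORDER BY ID
        """,
        (to_value,),
    ).fetchall()
    conn.close()
    return result


# Sample data from the question: (ID, FROM, TO)
sample = [
    (1, 2, 1), (2, 5, 1), (3, 7, 1), (4, 2, 1), (5, 2, 1),
    (6, 9, 1), (7, 3, 1), (8, 4, 1), (9, 5, 1),
]
print(first_row_per_from(sample, 1))
# prints [(1, 2, 1), (2, 5, 1), (3, 7, 1), (6, 9, 1), (7, 3, 1), (8, 4, 1)]
```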

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
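The LAG-plus-running-sum technique can be checked on the clarified example ("1,1,2,2,3,4" followed by "1,2,4,6"). This is a rough sketch using Python's sqlite3 module, assuming SQLite 3.25 or later for window functions; the function name `assign_groups` is made up for this example, and CASE is used in place of SQL Server's IIF.

```python
import sqlite3


def assign_groups(row_numbers):
    """Return (ID, RowNumber, Grp); a new group starts whenever RowNumber drops."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER)")
    conn.executemany(
        "INSERT INTO YourTable (ID, RowNumber) VALUES (?, ?)",
        list(enumerate(row_numbers, start=1)),  # sequential IDs starting at 1
    )
    result = conn.execute(
        """
        WITH T1 AS (
            SELECT ID, RowNumber,
                   LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
            FROM YourTable
        ), T2 AS (
            SELECT ID, RowNumber,
                   -- flag the first row and every row whose predecessor is higher
                   CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                        THEN 1 ELSE 0 END AS NewGroup
            FROM T1
        )
        SELECT ID, RowNumber,
               -- running sum of the flags is the group number
               SUM(NewGroup) OVER (ORDER BY ID
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
        FROM T2
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return result


for row in assign_groups([1, 1, 2, 2, 3, 4, 1, 2, 4, 6]):
    print(row)
# IDs 1-6 get Grp 1, IDs 7-10 get Grp 2
```

Repeated values such as "1,1" stay in the same group because only a strictly higher previous value starts a new one.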
If the IDs are truly sequential and RowNumber increases by exactly 1 within each group, you can do:
select t.*,
(id - rowNumber) as grp
from t
You can also use a recursive CTE (this assumes IDs start at 1 with no gaps and every group starts with RowNumber = 1):
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
OPTION (MAXRECURSION 0) -- lift the default limit of 100 recursion levels
Demo on SQLFiddle
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL Server 2005. You could also use rank() instead if you don't need the group numbers to be consecutive.