Counting in sql and subas - sql

I have the following code
select ID, count(*) from
( select ID, service_type from database
group by 1,2) suba
group by 1
having count (*) > 1
And I get a table where I see the IDs and a count of changes, similar to this:
ID | Count(*)
5675 | 2
5695 | 3
5855 | 2
5625 | 4
5725 | 3
Can someone explain to me how to group the count(*) values themselves, so that I get a table showing how many IDs have each count, similar to...
count (*) | number
2 | 2
3 | 2
4 | 1
and so forth. Can someone also explain to me what suba means?
MY NEWEST CODE:
select suba.id, count(*) from
( select id, service_type from table_name
group by 1,2) as suba
group by 1
having count (*) > 1

Haven't tried it, but I think this should work
select NoOfChanges, count (*) from
(
select suba.id, count(*) as NoOfChanges from
( select id, service_type from table_name
group by 1,2) as suba
group by 1
having count (*) > 1
)
subtableb
group by NoOfChanges
You can think of that as
select NoOfChanges, count (*) from subtableb
group by NoOfChanges
where subtableb isn't a real table but the result set of your previous query.
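To check the nested query against the numbers in the question, here is a minimal sketch using Python's built-in sqlite3 module. The service_type values and the extra id 5999 are made up for illustration, and the GROUP BY clauses name the columns explicitly instead of using positions; everything else follows the query above.

```python
import sqlite3

# Hypothetical sample data: one row per (id, service_type) observation.
rows = [
    (5675, "a"), (5675, "b"),
    (5695, "a"), (5695, "b"), (5695, "c"),
    (5855, "a"), (5855, "b"),
    (5625, "a"), (5625, "b"), (5625, "c"), (5625, "d"),
    (5725, "a"), (5725, "b"), (5725, "c"),
    (5999, "a"), (5999, "a"),  # one distinct service type only, removed by HAVING
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, service_type TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)

query = """
SELECT NoOfChanges, COUNT(*) AS number
FROM (
    -- per id: how many distinct service types it has
    SELECT suba.id, COUNT(*) AS NoOfChanges
    FROM (
        -- collapse duplicates to one row per (id, service_type)
        SELECT id, service_type
        FROM table_name
        GROUP BY id, service_type
    ) AS suba
    GROUP BY suba.id
    HAVING COUNT(*) > 1
) AS subtableb
GROUP BY NoOfChanges
ORDER BY NoOfChanges
"""
histogram = conn.execute(query).fetchall()
print(histogram)
```

With this sample data the result has one row per distinct count, matching the table asked for in the question.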

suba is the alias of the subquery. A subquery in the FROM clause generally needs an alias (and each table or subquery a unique name) so you can refer to it in other parts of the query and disambiguate columns. Note that the AS keyword is optional and was simply omitted between the closing parenthesis and "suba" in your first query.

Related

Postgresql query to filter latest data based on 2 columns

Table structure first.
users table
id
1
2
3
sites table
id
1
2
site_memberships table
site_id | user_id | created_on
1 | 1 | 1
1 | 1 | 2
1 | 1 | 3
2 | 1 | 1
2 | 1 | 2
1 | 2 | 2
1 | 2 | 3
Assume that the higher the created_on number, the more recent the record.
Expected Output
site_id | user_id | created_on
1 | 1 | 3
2 | 1 | 2
1 | 2 | 3
In other words, I need the latest record for each user for each site membership.
Tried the following query, but this does not seem to work.
select * from users inner join
(
SELECT ROW_NUMBER () OVER (
PARTITION BY sm.user_id,
sm.created_on
), sm.*
from site_memberships sm
inner join sites s on sm.site_id=s.id
) site_memberships
ON site_memberships.user_id = users.user_id where row_number=1
I think you have overcomplicated the problem you want to solve.
You seem to want aggregation:
select site_id, user_id, max(created_on)
from site_memberships sm
group by site_id, user_id;
If you had additional columns that you wanted, you could use distinct on instead:
select distinct on (site_id, user_id) sm.*
from site_memberships sm
order by site_id, user_id, created_on desc;
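As a quick check of the aggregation answer against the sample data, here is a small sketch with Python's sqlite3 module. SQLite has no DISTINCT ON, so the second query swaps in ROW_NUMBER() partitioned by (site_id, user_id) as a portable way to keep whole rows; that variant is my substitution, not part of the answer, and it assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_memberships (site_id INTEGER, user_id INTEGER, created_on INTEGER)"
)
conn.executemany(
    "INSERT INTO site_memberships VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 1, 1), (2, 1, 2), (1, 2, 2), (1, 2, 3)],
)

# Aggregation: the latest created_on per (site_id, user_id).
aggregated = conn.execute("""
    SELECT site_id, user_id, MAX(created_on) AS created_on
    FROM site_memberships
    GROUP BY site_id, user_id
    ORDER BY site_id, user_id
""").fetchall()

# Window-function variant: number the rows of each (site_id, user_id) group,
# newest first, and keep only the first row of each group.
ranked = conn.execute("""
    SELECT site_id, user_id, created_on
    FROM (
        SELECT sm.*,
               ROW_NUMBER() OVER (
                   PARTITION BY site_id, user_id
                   ORDER BY created_on DESC
               ) AS rn
        FROM site_memberships sm
    ) AS t
    WHERE rn = 1
    ORDER BY site_id, user_id
""").fetchall()

print(aggregated)
print(ranked)
```

Note that the partition is on (site_id, user_id) with created_on only in the ORDER BY; partitioning by created_on, as in the attempted query, puts nearly every row in its own group.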

How to select IDs that have at least two specific instances in a given column

I'm working with a medical claim table in pyspark and I want to return only userid's that have at least 2 claim_ids. My table looks something like this:
claim_id | userid | diagnosis_type | claim_type
1 | 1 | C100 | M
2 | 1 | C100a | M
3 | 2 | D50 | F
5 | 3 | G200 | M
6 | 3 | C100 | M
7 | 4 | C100a | M
8 | 4 | D50 | F
9 | 4 | A25 | F
From this example, I would want to return userid's 1, 3, and 4 only. Currently I'm building a temp table to count all of the distinct instances of the claim_ids
create table temp.claim_count as
select distinct userid, count(distinct claim_id) as claims
from medical_claims
group by userid
and then pulling from this table when the number of claim_id >1
select distinct userid
from medical_claims
where userid in (
select distinct userid
from temp.claim_count
where claims>1)
Is there a better / more efficient way of doing this?
If you want only the ids, then use group by:
select userid, count(*) as claims
from medical_claims
group by userid
having count(*) > 1;
If you want the original rows, then use window functions:
select mc.*
from (select mc.*, count(*) over (partition by userid) as num_claims
from medical_claims mc
) mc
where num_claims > 1;
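Both variants can be tried on the sample table with a short sqlite3 sketch. This runs on SQLite rather than Spark, so treat it as an illustration of the SQL only; the window query assumes SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medical_claims "
    "(claim_id INTEGER, userid INTEGER, diagnosis_type TEXT, claim_type TEXT)"
)
conn.executemany(
    "INSERT INTO medical_claims VALUES (?, ?, ?, ?)",
    [
        (1, 1, "C100", "M"), (2, 1, "C100a", "M"), (3, 2, "D50", "F"),
        (5, 3, "G200", "M"), (6, 3, "C100", "M"), (7, 4, "C100a", "M"),
        (8, 4, "D50", "F"), (9, 4, "A25", "F"),
    ],
)

# Ids only: group by user and keep the groups with more than one claim.
user_ids = [
    row[0]
    for row in conn.execute("""
        SELECT userid
        FROM medical_claims
        GROUP BY userid
        HAVING COUNT(*) > 1
        ORDER BY userid
    """)
]

# Original rows: attach the per-user claim count to every row, then filter.
claim_ids = [
    row[0]
    for row in conn.execute("""
        SELECT mc.claim_id
        FROM (
            SELECT mc.*, COUNT(*) OVER (PARTITION BY userid) AS num_claims
            FROM medical_claims mc
        ) AS mc
        WHERE num_claims > 1
        ORDER BY mc.claim_id
    """)
]

print(user_ids)
print(claim_ids)
```

Neither query needs the intermediate temp table, which is the main saving over the original approach.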

How to get MAX Hike in Min month?

Below is the table:
Name | Hike% | Month
------------------------
A 7 1
A 6 2
A 8 3
b 4 1
b 7 2
b 7 3
Result should be:
Name | Hike% | Month
------------------------
A 8 3
b 7 2
Here is one way of doing this:
SELECT Name, [Hike%], Month
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Hike%] DESC, Month) rn
FROM yourTable
) t
WHERE rn = 1
ORDER BY Name;
If you instead want to return multiple records per name, in the case where two or more records are tied for the greatest hike%, replace ROW_NUMBER with RANK and drop Month from the ORDER BY (with Month still in the ordering, tied hikes would get different ranks).
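To see the difference between the two window functions on the sample data, here is a hedged sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table name hikes and the column name hike_pct are placeholders chosen to avoid quoting "Hike%"; the RANK query orders by the hike alone so that tied rows share rank 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hikes (name TEXT, hike_pct INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO hikes VALUES (?, ?, ?)",
    [("A", 7, 1), ("A", 6, 2), ("A", 8, 3), ("b", 4, 1), ("b", 7, 2), ("b", 7, 3)],
)

# ROW_NUMBER: exactly one row per name; ties on the hike go to the earliest month.
one_per_name = conn.execute("""
    SELECT name, hike_pct, month
    FROM (
        SELECT h.*,
               ROW_NUMBER() OVER (
                   PARTITION BY name ORDER BY hike_pct DESC, month
               ) AS rn
        FROM hikes h
    ) AS t
    WHERE rn = 1
    ORDER BY name
""").fetchall()

# RANK ordered by the hike only: every row tied for the maximum gets rank 1.
with_ties = conn.execute("""
    SELECT name, hike_pct, month
    FROM (
        SELECT h.*,
               RANK() OVER (PARTITION BY name ORDER BY hike_pct DESC) AS rnk
        FROM hikes h
    ) AS t
    WHERE rnk = 1
    ORDER BY name, month
""").fetchall()

print(one_per_name)
print(with_ties)
```

For name b, the first query keeps only month 2, while the second keeps both months 2 and 3.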
Use a correlated subquery:
select Name,min(Hike) as Hike,min(Month) as Month
from
(
select * from tablename a
where Hike in (select max(Hike) from tablename b where a.name=b.name)
)A group by Name
You can use something similar to the below:
SELECT Name, MAX(Hike), Month
FROM table
GROUP BY Name, Month
Hope this helps :)

Can the result set of an inner query be displayed with the final result set?

I have a table that contains some data say
====================
Record | Record_Count
1 | 12
3 | 87
5 | 43
6 | 54
1 | 43
3 | 32
5 | 65
6 | 43
I have a query that returns Record Count sum grouped by Record
select record,sum(record_count)
FROM table_name
WHERE <conditions>
GROUP BY record
ORDER BY sum(record_count)
The result is something like this
====================
Record | Record_Count
1 | 55
3 | 119
5 | 108
6 | 97
Now I also want a grand total of record_count (Sum of all record Count).
The thing is I want the above result set along with the grand total also.
I had tried this
select sum(subquery.record_count)
from (
select record,sum(record_count)
FROM table_name
WHERE <conditions>
GROUP BY record
ORDER BY sum(record_count) ) as subquery
But by using this I am losing the individual record_count sum.
So my question is: can I get a result set that contains the record_count sum for each record and the grand total of record_count in a single query?
You may use union to achieve what you need:
(select cast(record as varchar(16)) record,sum(record_count) from schema.table
group by 1)
union
(select 'Grand_Total' as record,sum(record_count) from schema.table
group by 1);
If your DB supports group by ... with rollup, you may also use:
select ifnull(record, 'Grand Total') record, sum(record_count) total_count
from schema.table
group by record asc with rollup
To save some typing, a common table expression (cte) can be used:
with cte as
(
select record, sum(record_count) rsum
FROM table_name
WHERE <conditions>
GROUP BY record
)
select record, rsum from cte
union all
select 'Grand_Total', sum(rsum) from cte
You can use PostgreSQL's window functions.
For example:
SELECT record, record_count, sum(record_count) OVER()
FROM (
SELECT record, sum(record_count) record_count
FROM table_name
WHERE <conditions>
GROUP BY record
ORDER BY sum(record_count)
) as subquery
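The CTE and window-function answers can be compared side by side on the sample data. The sketch below uses Python's sqlite3 module instead of PostgreSQL or MySQL, leaves out the WHERE clause (the original conditions are elided), and casts record to text in the UNION ALL branch so both branches have the same type; those choices are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (record INTEGER, record_count INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, 12), (3, 87), (5, 43), (6, 54), (1, 43), (3, 32), (5, 65), (6, 43)],
)

# CTE + UNION ALL: the grand total arrives as one extra row.
extra_row = dict(conn.execute("""
    WITH cte AS (
        SELECT record, SUM(record_count) AS rsum
        FROM table_name
        GROUP BY record
    )
    SELECT CAST(record AS TEXT), rsum FROM cte
    UNION ALL
    SELECT 'Grand_Total', SUM(rsum) FROM cte
""").fetchall())

# Window function: the grand total arrives as an extra column on every row.
extra_column = conn.execute("""
    SELECT record, record_count, SUM(record_count) OVER () AS grand_total
    FROM (
        SELECT record, SUM(record_count) AS record_count
        FROM table_name
        GROUP BY record
    ) AS subquery
    ORDER BY record_count
""").fetchall()

print(extra_row)
print(extra_column)
```

Which shape is preferable depends on the consumer: an extra row is convenient for reports, while an extra column makes it easy to compute each record's share of the total.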

How to find records that are associated to the same group more than once?

I need to find if a record id is on the same group id more than once.
group id | record id | comments
--------------------------------
3 1
3 1
4 2
In this case, the record with id 1 is being associated with group id 3 twice.
Is there a query that can provide all records with this behaviour?
Thanks
This query should do the job:
select t.group_id, t.record_id
from [your_table] t
group by t.group_id, t.record_id
having count(*) > 1
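A small sqlite3 sketch confirms the behaviour on the sample rows. The table name group_records is a placeholder for [your_table].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# group_records is a hypothetical table name standing in for [your_table].
conn.execute(
    "CREATE TABLE group_records (group_id INTEGER, record_id INTEGER, comments TEXT)"
)
conn.executemany(
    "INSERT INTO group_records (group_id, record_id) VALUES (?, ?)",
    [(3, 1), (3, 1), (4, 2)],
)

# One output row per (group_id, record_id) pair that occurs more than once.
duplicates = conn.execute("""
    SELECT t.group_id, t.record_id
    FROM group_records t
    GROUP BY t.group_id, t.record_id
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)
```

Adding count(*) to the select list would also show how many times each pair is repeated.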