Is there a way to display the first two results of each unique id? - sql

I work in healthcare. In a Postgres database, we have a table member IDs and dates. I'm trying to pull the latest two dates for each member ID.
Simplified sample data:
A 1
B 1
B 2
C 1
C 5
C 7
D 1
D 2
D 3
D 4
Desired result:
A 1
B 1
B 2
C 1
C 5
D 1
D 2

I get a strong feeling this is for a homework assignment and would recommend that you look into partitioning and specifically rank() function by yourself first before looking at my solution.
Moreover, you have not specified how you received the initial result you provided, so I'll have to assume you just did select letter_column, number_column from my_table; to achieve the result.
So, what you actually want here is partition the initial query result into groups by the letter_column and select the first two rows in each. rank() function lets you assign each row a number, counting within groups:
select letter_column,
number_column,
rank() over (partition by letter_column order by number_column) as rank
from my_table;
Since it's a function, you can't use it in a predicate in the same query, so you'll have to build another query around this one, this time filtering the results where rank is over 2:
with ranked_results as (select letter_column,
number_column,
rank() over (partition by letter_column order by number_column asc) as rank
from my_table mt)
select letter_column,
number_column
from ranked_results
where rank < 3;
Here's an SQLFiddle to play around: http://sqlfiddle.com/#!15/e90744/1/0
Hope this helps!

Related

Snowflake: Repeating rows based on column value

How to repeat rows based on column value in snowflake using sql.
I tried a few methods but not working such as dual and connect by.
I have two columns: Id and Quantity.
For each ID, there are different values of Quantity.
So if you have a count, you can use a generator:
with ten_rows as (
select row_number() over (order by null) as rn
from table(generator(ROWCOUNT=>10))
), data(id, count) as (
select * from values
(1,2),
(2,4)
)
SELECT
d.*
,r.rn
from data as d
join ten_rows as r
on d.count >= r.rn
order by 1,3;
ID
COUNT
RN
1
2
1
1
2
2
2
4
1
2
4
2
2
4
3
2
4
4
Ok let's start by generating some data. We will create 10 rows, with a QTY. The QTY will be randomly chosen as 1 or 2.
Next we want to duplicate the rows with a QTY of 2 and leave the QTY =1 as they are.
Obviously you can change all parameters above to suit your needs - this solution works super fast and in my opinion way better than table generation.
Simply stack SPLIT_TO_TABLE(), REPEAT() with a LATERAL() join and voila.
WITH TEN_ROWS AS (SELECT ROW_NUMBER()OVER(ORDER BY NULL)SOME_ID,UNIFORM(1,2,RANDOM())QTY FROM TABLE(GENERATOR(ROWCOUNT=>10)))
SELECT
TEN_ROWS.*
FROM
TEN_ROWS,LATERAL SPLIT_TO_TABLE(REPEAT('hire me $10/hour',QTY-1),'hire me $10/hour')ALTERNATIVE_APPROACH;

Get occurrence count of specific categories in a table

Looking to get the transition count of categories from a table. For Name type B, category transitions from Good to Bad so count is 2. For Name type A, it transitions from Good - Moderate - Good - Moderate - Bad, hence gets a count of 5.
Any help would be appreciated.
This is my input data:
Name
order no
category
A
1
Good
A
2
Good
A
3
MODERATE
A
4
Good
A
5
MODERATE
A
6
Bad
A
7
Bad
B
1
Good
B
2
Good
B
3
Good
B
4
BAD
And this is my desired output:
Name
category_transition_count
A
5
B
2
select name
,count(cnt) as category_transition_count
from
(select name
,case when category <> lag(category) over(partition by Name order by order_no) or lag(category) over(partition by Name order by order_no) is null then 1 end as cnt
from t) t
group by name
name
category_transition_count
A
5
B
2
Fiddle
You could use the lag window function to get the category of the previous row, and then compare it with the current row to see if it changed, and count those occurrences. Note that by definition the lag of the first value is null, which can't be different from the current value. so you'll need to handle that explicitly:
SELECT name, COUNT(changed) + 1
FROM (SELECT name,
CASE WHEN category <> LAG(category) OVER (PARTITION BY name ORDER BY order_no ASC)
THEN 1
END AS changed
FROM mytable) t
GROUP BY name
SQLFiddle (PostgreSQL) demo

How to consecutively count everything greater than or equal to itself in SQL?

Let's say if I have a table that contains Equipment IDs of equipments for each Equipment Type and Equipment Age, how can I do a Count Distinct of Equipment IDs that have at least that Equipment Age.
For example, let's say this is all the data we have:
equipment_type
equipment_id
equipment_age
Screwdriver
A123
1
Screwdriver
A234
2
Screwdriver
A345
2
Screwdriver
A456
2
Screwdriver
A567
3
I would like the output to be:
equipment_type
equipment_age
count_of_equipment_at_least_this_age
Screwdriver
1
5
Screwdriver
2
4
Screwdriver
3
1
Reason is there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers at least 2 days old and only 1 screwdriver at least 3 days old.
So far I was only able to do count of equipments that falls within each equipment_age (like this query shown below), but not "at least that equipment_age".
SELECT
equipment_type,
equipment_age,
COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2
Consider below join-less solution
select distinct
equipment_type,
equipment_age,
count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
partition by equipment_type
order by equipment_age
range between current row and unbounded following
)
if applied to sample data in your question - output is
Use a self join approach:
SELECT
e1.equipment_type,
e1.equipment_age,
COUNT(*) AS count_of_equipments
FROM equipment_table e1
INNER JOIN equipment_table e2
ON e2.equipment_type = e1.equipment_type AND
e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;
GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or windowing functions to get those. One way:
SELECT
equipment_type,
equipment_age,
(Select COUNT(*)
from equipment_table cnt
where cnt.equipment_type = a.equipment_type
AND cnt.equipment_age >= a.equipment_age
) as count_of_equipments
FROM equipment_table a
GROUP BY 1, 2, 3
I am not sure if your environment supports this syntax, though. If not, let us know we will find another way.

SQL obtaining items ranked by their count(*)

I have been attempting the following query for a while- not sure how to approach this issue I'm having.
I need to obtain bands that cover the second most styles of music - including all equal bands if there is a tie for second. For example for the table band_style,
Band_id | Style
---------------------
1 Rock
2 Pop
1 Punk
3 Classical
1 Metal
2 Rock
4 Pop
4 Rap
The returned result should be
Band_id | Num_styles
2 2
4 2
My initial attempt at a solution:
SELECT band_id, COUNT(*) AS num_styles FROM band_style
GROUP BY band_id HAVING COUNT(*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id));
So this gives me the count of all the bands with less styles than the maximum. Now, I'd like to take ALL rows which have the maximum value of this query. I do not want to use rownum or limit because from what I've experienced this doesn't work too well in the case of ties. I am also wondering if there is a way to wrap this in another MAX function, but I don't really see how.
Any help with this issue would be appreciated- also think this would be useful to know to see if it can be applied to 3rd, 4th highest, etc.
(Using Oracle/SQLPlus)
Assuming this is a large data file and we do not necessarily know what the "second highest count" is.
UPDATE: this almost works- gets all bands with less than max number of styles. But calling MAX doesn't seem to be working, as the table returned still has all values of NUM except the max..
WITH data AS (
SELECT band_id, COUNT(*) AS NUM FROM band_style GROUP BY band_id HAVING COUNT (*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id)))
SELECT data.band_id, data.NUM FROM data
INNER JOIN ( SELECT band_id m, MAX(NUM) n
FROM data GROUP BY band_id
) t
ON t.band_id = data.band_id
AND t.NUM = data.NUM;
If you have to stick with mysql, this sql will be much more difficult. But if you could switch to mariadb or oracle this should work.
with data as (
select
band_id, count(*) styles,
dense_rank() over (order by count(*) desc) place
from
table1 group by band_id)
select * from data where place=2
http://sqlfiddle.com/#!4/dc3f6/12
Your friend here is the window function dense_rank.
The output is:
BAND_ID STYLES PLACE
2 2 2
4 2 2
And here to avoid some missunderstandings, due to place 2 is here styles 2.
http://sqlfiddle.com/#!4/2be32/3
Now the styles count is different from the place id.
BAND_ID STYLES PLACE
4 3 2
This illustrates that dense_rank does not know the second highest count value beforehand.

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key but i am not actually using table it points to.
I am trying to design a query that groups the results by Random and Date and select the MAX Date within the grouping then gives me the associated ID.
IF i do the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results i need would look something like this
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Also another question is there could be times where there are many of the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This will select only those who doesn't have a bigger date for corresponding random value.
Can also be done using IN() :
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and s.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY DateField, Random
SQL Fiddle example here: SQL Fiddle
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.