Query Table by select Distinct items given a datestamp - sql

I'm trying to write a query that selects distinct uid's but I want to choose those distinct uid's given an order on a modified_at column.
Example:
Table_A
uid data_value modified_at
=== ========== ===========
1 a 1/1/2016
1 b 1/2/2016
1 c 1/3/2016
2 d 1/1/2016
2 e 1/2/2016
3 f 3/1/2016
3 g 3/3/2016
3 h 3/4/2016
4 i 2/1/2016
5 j 1/5/2016
5 k 1/6/2016
So I want to select distinct uid's that have been modified most recently.
I'm not sure if there's a quick query that would allow me to do this, rather than pulling the information separately into a script and modifying it there.
Right now, all I can do is
select distinct uid, data_value, modified_at
from Table_A (...and other stuff if I want to join and do things)

You can use DISTINCT ON:
SELECT DISTINCT ON (uid) uid, data_value, modified_at
FROM Table_A
ORDER BY uid, modified_at DESC

Use the window function row_number(), with CTE syntax for readability:
WITH cte as (
SELECT *,
row_number() over (PARTITION BY uid ORDER BY modified_at DESC) as rn
FROM Table_A
)
SELECT *
FROM cte
WHERE rn = 1
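
To try the row_number() approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an in-memory SQLite database (3.25 or newer, for window functions) standing in for the real server, and it rewrites the sample dates as ISO strings (read as month/day/year) so that they sort correctly as text. DISTINCT ON is PostgreSQL-specific, so it is not shown here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table_A (uid INTEGER, data_value TEXT, modified_at TEXT)"
)
# Assumption: sample dates rewritten as ISO strings so text ordering works.
conn.executemany(
    "INSERT INTO Table_A VALUES (?, ?, ?)",
    [
        (1, "a", "2016-01-01"), (1, "b", "2016-01-02"), (1, "c", "2016-01-03"),
        (2, "d", "2016-01-01"), (2, "e", "2016-01-02"),
        (3, "f", "2016-03-01"), (3, "g", "2016-03-03"), (3, "h", "2016-03-04"),
        (4, "i", "2016-02-01"),
        (5, "j", "2016-01-05"), (5, "k", "2016-01-06"),
    ],
)

# Number the rows of each uid from newest to oldest, then keep the newest.
latest_rows = conn.execute(
    """
    WITH cte AS (
        SELECT uid, data_value, modified_at,
               row_number() OVER (PARTITION BY uid ORDER BY modified_at DESC) AS rn
        FROM Table_A
    )
    SELECT uid, data_value, modified_at
    FROM cte
    WHERE rn = 1
    ORDER BY uid
    """
).fetchall()

for row in latest_rows:
    print(row)
```

Each uid comes back once, paired with the data_value of its most recent modified_at.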

Related

Get only the last row in a sequence. SQL Server

I have a table like this:
ID Seq Prod
-----------------
1 001 1
2 002 1
3 001 2
4 002 2
5 003 2
I want to make a query that only gets the last "Seq" of each product, so the expected output will be something like this:
ID Seq Prod
-----------------
2 002 1
5 003 2
Any help?
A simple way is a correlated subquery:
select t.*
from t
where t.seq = (select max(t2.seq) from t t2 where t2.prod = t.prod);
For performance, you want an index on (prod, seq).
The above often has the best performance. But another way to write the query is to use window functions:
select t.*
from (select t.*, row_number() over (partition by prod order by seq desc) as seqnum
from t
) t
where seqnum = 1;
Yet another option is using WITH TIES
Select top 1 * with ties
From YourTable
Order By row_number() over (partition by prod order by seq desc)
Full Disclosure:
Gordon's answer is a nudge more performant (+1), but WITH TIES does not generate an extra column.
You could use a sub-query that finds the maximum ID by Prod. In the following example, replace 'myTable' with your table name:
SELECT t.*
FROM myTable t
INNER JOIN (
SELECT MAX(ID) AS ID,
Prod
FROM myTable
GROUP BY Prod
) a ON a.ID = t.ID
Output:
ID Seq Prod
2 002 1
5 003 2
You can write a correlated subquery as:
select T.ID,T.Seq,T.Prod
from #T1 T
where T.ID = (select max(T_Inner.ID)
from #T1 T_Inner
where T_Inner.Prod = T.Prod
group by T_Inner.Prod
)
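
The max(seq) and max(ID) variants above can be compared side by side. This is a rough sketch in Python's sqlite3 module, with an in-memory SQLite database assumed as a stand-in for SQL Server; the index name idx_t_prod_seq is made up for illustration. Seq is stored as zero-padded text, so max() compares it correctly as a string.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, seq TEXT, prod INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "001", 1), (2, "002", 1), (3, "001", 2), (4, "002", 2), (5, "003", 2)],
)
# Index on (prod, seq) as suggested above; the index name is hypothetical.
conn.execute("CREATE INDEX idx_t_prod_seq ON t (prod, seq)")

# Variant 1: keep the row whose seq is the maximum for its product.
by_max_seq = conn.execute(
    """
    SELECT t.id, t.seq, t.prod
    FROM t
    WHERE t.seq = (SELECT max(t2.seq) FROM t t2 WHERE t2.prod = t.prod)
    ORDER BY t.id
    """
).fetchall()

# Variant 2: keep the row with the maximum id per product.
by_max_id = conn.execute(
    """
    SELECT t.id, t.seq, t.prod
    FROM t
    INNER JOIN (SELECT max(id) AS id, prod FROM t GROUP BY prod) a
        ON a.id = t.id
    ORDER BY t.id
    """
).fetchall()

print(by_max_seq)
print(by_max_id)
```

Both variants agree on this sample only because ID happens to grow with Seq within each product. If that does not hold in the real table, the max(seq) form is the one that matches the question.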

Select count of total records and also distinct records

I have a table such as this:
PalmId | UserId | CreatedDate
1 | 1 | 2018-03-08 14:18:27.077
1 | 2 | 2018-03-08 14:18:27.077
1 | 3 | 2018-03-08 14:18:27.077
1 | 1 | 2018-03-08 14:18:27.077
I wish to know how many dates were created for Palm 1, and I also wish to know how many users have created those dates for Palm 1. So the outcome for the first is 4 and the outcome for the second is 3.
I am wondering if I can do that in a single query, as opposed to having to do a subquery and a join on itself as in the example below.
SELECT MT.[PalmId], COUNT(*) AS TotalDates, T1.[TotalUsers]
FROM [MyTable] MT
LEFT OUTER JOIN (
SELECT MT2.[PalmId], COUNT(*) AS TotalUsers
FROM [MyTable] MT2
GROUP BY MT2.[UserId]
) T1 ON T1.[PalmId] = MT.[PalmId]
GROUP BY MT.[PalmId], T1.[TotalUsers]
Based on your table, you could do something like this:
select count(distinct UserId) as N_Users,
count(CreatedDate) as created_date, -- count(*) would also count rows where CreatedDate is NULL
palmid
from your_table
group by palmid
If you want "4" and "3", then I think you want:
SELECT MT.PalmId, COUNT(*) AS NumRows, COUNT(DISTINCT mt.UserId) as NumUsers
FROM MyTable MT
GROUP BY MT.PalmId
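
A quick way to check the single-query version against the sample rows is sketched below with Python's sqlite3 module. The in-memory SQLite database is an assumption standing in for the real server; the table and column names come from the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (PalmId INTEGER, UserId INTEGER, CreatedDate TEXT)"
)
created = "2018-03-08 14:18:27.077"
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 1, created), (1, 2, created), (1, 3, created), (1, 1, created)],
)

# COUNT(*) counts every row; COUNT(DISTINCT UserId) counts each user once.
counts = conn.execute(
    """
    SELECT PalmId, COUNT(*) AS NumRows, COUNT(DISTINCT UserId) AS NumUsers
    FROM MyTable
    GROUP BY PalmId
    """
).fetchall()

print(counts)
```

For Palm 1 this gives 4 rows and 3 distinct users, with no self-join needed.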

find all rows after the recent update using oracle

I tried below query to bring all rows after last Action="UNLOCKED", but ORDER BY is not allowed in subquery it seems.
SELECT *
FROM TABLE
WHERE id >= (SELECT MAX(id)
FROM TABLE
WHERE ACTION='UNLOCKED' AND action_id=123
ORDER BY CREATE_DATE DESC);
Sample data
Id action_id Action ... CREATE_DATE
1 123 ADD 03/18/2018
2 123 Unlocked 03/19/2018
3 123 Updated1 03/19/2018
4 123 Updated2 03/19/2018
5 123 Unlocked 03/20/2018
6 123 Updated3 03/20/2018
7 123 Updated4 03/20/2018
Output should be the rows with id 5, 6 and 7. What should I use to get this output?
You could use an inner join on a subselect for the max create_date:
select * from TABLE
INNER JOIN (
select max(CREATE_DATE) max_date
from TABLE
where Action = 'Unlocked' ) T on t.max_date = TABLE.CREATE_DATE
You need not order the inner query because it will return only one value. You can do it as follows
SELECT * FROM TABLE WHERE id >= (select max(id) from TABLE where ACTION='UNLOCKED' and action_id=123);
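
The max(id) version can be tried on the sample data with the sketch below, written with Python's sqlite3 module. Several things here are assumptions: SQLite stands in for Oracle, the table is called action_log because TABLE is a reserved word, the Action column is renamed action_name, the dates are rewritten as ISO strings, and upper() is used because the sample data shows 'Unlocked' while the query compares against 'UNLOCKED'.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
# Hypothetical names: action_log (table) and action_name (the Action column).
conn.execute(
    "CREATE TABLE action_log ("
    "id INTEGER PRIMARY KEY, action_id INTEGER, "
    "action_name TEXT, create_date TEXT)"
)
conn.executemany(
    "INSERT INTO action_log VALUES (?, ?, ?, ?)",
    [
        (1, 123, "ADD", "2018-03-18"),
        (2, 123, "Unlocked", "2018-03-19"),
        (3, 123, "Updated1", "2018-03-19"),
        (4, 123, "Updated2", "2018-03-19"),
        (5, 123, "Unlocked", "2018-03-20"),
        (6, 123, "Updated3", "2018-03-20"),
        (7, 123, "Updated4", "2018-03-20"),
    ],
)

# The subquery returns one value (the id of the last unlock), so it needs
# no ORDER BY. upper() makes the comparison case-insensitive (assumption).
after_unlock = conn.execute(
    """
    SELECT id, action_name
    FROM action_log
    WHERE id >= (SELECT max(id)
                 FROM action_log
                 WHERE upper(action_name) = 'UNLOCKED' AND action_id = 123)
    ORDER BY id
    """
).fetchall()

print(after_unlock)
```

This relies on id increasing over time, as it does in the sample.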

Ranking of a tuple in another table

So I have 2 tables, team A and team B, each with their scores. I want the rank of the score of every member of team A within team B, using standard SQL or Vertica, as shown below.
Team A Table
user score
-------------
asa 100
bre 200
cqw 50
duy 50
Team B Table
user score
------------
gfh 20
ewr 80
kil 70
cvb 90
Output:
Team A Table
user score rank in team B
------------------------------
asa 100 1
bre 200 1
cqw 50 4
duy 50 4
Try this - and this only works in Vertica.
INTERPOLATE PREVIOUS VALUE is an outer-join predicate specific to Vertica that joins two tables on non-equal columns, using the 'last known' value in the outer-joined table to make a match succeed.
WITH
-- input, don't use in query itself
table_a (the_user,score) AS (
SELECT 'asa',100
UNION ALL SELECT 'bre',200
UNION ALL SELECT 'cqw',50
UNION ALL SELECT 'duy',50
)
,
table_b(the_user,score) AS (
SELECT 'gfh',20
UNION ALL SELECT 'ewr',80
UNION ALL SELECT 'kil',70
UNION ALL SELECT 'cvb',90
)
-- end of input - start WITH clause here
,
ranked_b AS (
SELECT
RANK() OVER(ORDER BY score DESC) AS the_rank
, *
FROM table_b
)
SELECT
a.the_user AS a_user
, a.score AS a_score
, b.the_rank AS rank_in_team_b
FROM table_a a
LEFT JOIN ranked_b b
ON a.score INTERPOLATE PREVIOUS VALUE b.score
ORDER BY 1
;
a_user|a_score|rank_in_team_b
asa | 100| 1
bre | 200| 1
cqw | 50| 4
duy | 50| 4
Simple correlated query should do:
select
a.*,
(select count(*) + 1 from table_b b where b.score > a.score) rank_in_b
from table_a a;
All you need to do is count the number of people in table b with a higher score than the current user, and add 1 to that count to get the rank.
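
The counting approach is plain SQL, so it can be checked outside Vertica. Below is a small sketch with Python's sqlite3 module, assuming an in-memory SQLite database and the table_a / table_b names and the_user column used in the Vertica answer.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (the_user TEXT, score INTEGER)")
conn.execute("CREATE TABLE table_b (the_user TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [("asa", 100), ("bre", 200), ("cqw", 50), ("duy", 50)],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)",
    [("gfh", 20), ("ewr", 80), ("kil", 70), ("cvb", 90)],
)

# Rank = 1 + number of team B members with a strictly higher score.
ranks = conn.execute(
    """
    SELECT a.the_user, a.score,
           (SELECT count(*) + 1 FROM table_b b WHERE b.score > a.score) AS rank_in_b
    FROM table_a a
    ORDER BY a.the_user
    """
).fetchall()

for row in ranks:
    print(row)
```

Scores of 100 and 200 beat everyone in team B and get rank 1; a score of 50 has three team B scores above it and gets rank 4, matching the expected output.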

How to select all records of n groups?

I want to select the records of the top n groups. My data looks like this:
Table 'runner':
id gid status rtime
---------------------------
100 5550 1 2016-08-19
200 5550 2 2016-08-22
300 5550 1 2016-08-30
100 6050 3 2016-09-01
200 6050 1 2016-09-02
100 6250 1 2016-09-11
200 6250 1 2016-09-15
300 6250 3 2016-09-19
Table 'static'
id description env
-------------------------------
100 something 1 somewhere 1
200 something 2 somewhere 2
300 something 3 somewhere 3
The unit id (id) is unique within the group but not unique in its column, because an instance of the group is generated regularly. The group id (gid) is assigned to every unit but will not generate on more than one instance.
Now, combining the tables and selecting everything or filtering by a specific value is easy, but how do I select all records of, for example, the first two groups without directly referring to the group ids?
Expected result would be:
id gid description status rtime
--------------------------------------
300 6250 something 2 3 2016-09-19
200 6250 something 1 1 2016-09-15
100 6250 something 3 1 2016-09-11
200 6050 something 2 1 2016-09-02
100 6050 something 1 3 2016-09-01
Extra Question: When I filter for a timeframe like this:
[...]
WHERE runner.rtime BETWEEN '2016-08-25' AND '2016-09-16'
Is there a simple way of ensuring, that groups are not cut off but either appear with all their records or not at all?
You can use a ROW_NUMBER() to do this. First, create a query to rank groups:
SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid
Then use this as a derived table to get your other info, and use a where clause to filter to the number of groups you want to see. For instance, the query below returns the top 5 groups (RN <= 5):
SELECT id, R.gid, description, status, rtime
FROM (SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid) G
INNER JOIN Runner R on R.gid = G.gid
INNER JOIN Static S on S.id = R.id
WHERE RN <= 5 --Change this to see more or fewer groups
For your second question about dates, you can do this with a subquery like so:
SELECT *
FROM Runner
WHERE gid IN (SELECT gid
FROM Runner
WHERE rtime BETWEEN '2016-08-25' AND '2016-09-16')
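
To see what the IN subquery does with a group that straddles the edge of the time frame, here is a sketch using Python's sqlite3 module. The in-memory SQLite database and the narrower window (2016-09-01 to 2016-09-16) are assumptions chosen to make the effect visible. The second query is not from the answers; it is a hypothetical variant for a stricter reading of the question, where a group is kept only if all of its records fall inside the window.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runner (id INTEGER, gid INTEGER, status INTEGER, rtime TEXT)"
)
conn.executemany(
    "INSERT INTO runner VALUES (?, ?, ?, ?)",
    [
        (100, 5550, 1, "2016-08-19"), (200, 5550, 2, "2016-08-22"),
        (300, 5550, 1, "2016-08-30"),
        (100, 6050, 3, "2016-09-01"), (200, 6050, 1, "2016-09-02"),
        (100, 6250, 1, "2016-09-11"), (200, 6250, 1, "2016-09-15"),
        (300, 6250, 3, "2016-09-19"),
    ],
)

# Reading 1 (the answer above): keep every record of any group that has
# at least one record inside the window.
any_overlap = conn.execute(
    """
    SELECT id, gid, status, rtime
    FROM runner
    WHERE gid IN (SELECT gid
                  FROM runner
                  WHERE rtime BETWEEN '2016-09-01' AND '2016-09-16')
    ORDER BY gid, rtime
    """
).fetchall()

# Reading 2 (hypothetical variant): keep only groups whose records all
# lie inside the window.
fully_inside = conn.execute(
    """
    SELECT id, gid, status, rtime
    FROM runner
    WHERE gid IN (SELECT gid
                  FROM runner
                  GROUP BY gid
                  HAVING min(rtime) >= '2016-09-01'
                     AND max(rtime) <= '2016-09-16')
    ORDER BY gid, rtime
    """
).fetchall()

print(any_overlap)
print(fully_inside)
```

With this window, the first query returns group 6250 in full, including its 2016-09-19 record that lies outside the window, while the second query returns only group 6050.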
Hmmm. I suspect this might do what you want:
select top (1) with ties r.*
from runner r
order by min(rtime) over (partition by gid), gid;
At least, this will get the complete first group.
In any case, the idea is to include gid as a key in the order by and to use top with ties.
You can do the following:
with report as(
select n.id,n.gid,m.description,n.status,n.rtime, dense_rank() over(order by gid desc) as RowNum
from #table1 n
inner join #table2 m on n.id = m.id )
select id,gid,description,status,rtime
from report
where RowNum<=2 -- <-- here n=2
order by gid desc,rtime desc
DENSE_RANK looks like an ideal solution here:
Select * From
(
select DENSE_RANK() over (order by gid desc) as D_RN, r.*
from runner r
) A
Where D_RN = 1
No need to use ranking functions (ROW_NUMBER, DENSE_RANK etc).
SELECT r.id, gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
WHERE gid IN (
SELECT TOP 2 gid FROM runner GROUP BY gid ORDER BY gid DESC
)
ORDER BY rtime DESC;
The same using CTE:
WITH grouped
AS
(
SELECT TOP 2 gid
FROM runner GROUP BY gid ORDER BY gid DESC
)
SELECT r.id, grouped.gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
INNER JOIN grouped ON r.gid = grouped.gid
ORDER BY rtime DESC;
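
For a runnable check of the "top n groups" idea, the sketch below uses the dense_rank() form with Python's sqlite3 module. It assumes an in-memory SQLite database (3.25 or newer, for window functions) in place of SQL Server, which is also why TOP is not used. Like the answers above, it treats a higher gid as a more recent group.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runner (id INTEGER, gid INTEGER, status INTEGER, rtime TEXT)"
)
conn.execute("CREATE TABLE static (id INTEGER, description TEXT, env TEXT)")
conn.executemany(
    "INSERT INTO runner VALUES (?, ?, ?, ?)",
    [
        (100, 5550, 1, "2016-08-19"), (200, 5550, 2, "2016-08-22"),
        (300, 5550, 1, "2016-08-30"),
        (100, 6050, 3, "2016-09-01"), (200, 6050, 1, "2016-09-02"),
        (100, 6250, 1, "2016-09-11"), (200, 6250, 1, "2016-09-15"),
        (300, 6250, 3, "2016-09-19"),
    ],
)
conn.executemany(
    "INSERT INTO static VALUES (?, ?, ?)",
    [
        (100, "something 1", "somewhere 1"),
        (200, "something 2", "somewhere 2"),
        (300, "something 3", "somewhere 3"),
    ],
)

n_groups = 2  # how many groups to keep

# dense_rank() gives every row of the same gid the same rank, so filtering
# on the rank keeps or drops whole groups.
top_groups = conn.execute(
    """
    WITH report AS (
        SELECT r.id, r.gid, s.description, r.status, r.rtime,
               dense_rank() OVER (ORDER BY r.gid DESC) AS group_rank
        FROM runner r
        INNER JOIN static s ON s.id = r.id
    )
    SELECT id, gid, description, status, rtime
    FROM report
    WHERE group_rank <= ?
    ORDER BY gid DESC, rtime DESC
    """,
    (n_groups,),
).fetchall()

for row in top_groups:
    print(row)
```

With n_groups = 2 this returns all three records of group 6250 followed by both records of group 6050, and nothing from group 5550.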