How to select all records of n groups? - sql

I want to select the records of the top n groups. My data looks like this:
Table 'runner':
id gid status rtime
---------------------------
100 5550 1 2016-08-19
200 5550 2 2016-08-22
300 5550 1 2016-08-30
100 6050 3 2016-09-01
200 6050 1 2016-09-02
100 6250 1 2016-09-11
200 6250 1 2016-09-15
300 6250 3 2016-09-19
Table 'static'
id description env
-------------------------------
100 something 1 somewhere 1
200 something 2 somewhere 2
300 something 3 somewhere 3
The unit id (id) is unique within the group but not unique in its column, because an instance of the group is generated regularly. The group id (gid) is assigned to every unit but will not generate on more than one instance.
Now, combining the tables and selecting everything or filtering by a specific value is easy, but how do I select all records of, for example, the first two groups without directly referring to the group ids?
Expected result would be:
id gid description status rtime
--------------------------------------
300 6250 something 2 3 2016-09-19
200 6250 something 1 1 2016-09-15
100 6250 something 3 1 2016-09-11
200 6050 something 2 1 2016-09-02
100 6050 something 1 3 2016-09-01
Extra Question: When I filter for a timeframe like this:
[...]
WHERE runner.rtime BETWEEN '2016-08-25' AND '2016-09-16'
Is there a simple way of ensuring that groups are not cut off, but either appear with all their records or not at all?

You can use a ROW_NUMBER() to do this. First, create a query to rank groups:
SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid
Then use this as a derived table to get your other info, and use a WHERE clause to limit the number of groups you want to see. For instance, the query below returns the top 5 groups (RN <= 5):
SELECT R.id, R.gid, S.description, R.status, R.rtime
FROM (SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid) G
INNER JOIN Runner R on R.gid = G.gid
INNER JOIN Static S on S.id = R.id
WHERE RN <= 5 --Change this to see more or fewer groups
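As a rough, runnable sketch of the derived-table approach above, the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption for illustration only (window functions need SQLite 3.25 or newer), and the helper names make_db and top_groups are hypothetical. The data is the sample from the question.

```python
import sqlite3

def make_db():
    """Build an in-memory copy of the question's two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE runner (id INTEGER, gid INTEGER, status INTEGER, rtime TEXT);
        CREATE TABLE static (id INTEGER, description TEXT, env TEXT);
        INSERT INTO runner VALUES
            (100, 5550, 1, '2016-08-19'), (200, 5550, 2, '2016-08-22'),
            (300, 5550, 1, '2016-08-30'), (100, 6050, 3, '2016-09-01'),
            (200, 6050, 1, '2016-09-02'), (100, 6250, 1, '2016-09-11'),
            (200, 6250, 1, '2016-09-15'), (300, 6250, 3, '2016-09-19');
        INSERT INTO static VALUES
            (100, 'something 1', 'somewhere 1'),
            (200, 'something 2', 'somewhere 2'),
            (300, 'something 3', 'somewhere 3');
    """)
    return conn

def top_groups(conn, n):
    """Return every runner row that belongs to one of the n highest gids."""
    sql = """
        SELECT r.id, r.gid, s.description, r.status, r.rtime
        FROM (SELECT gid, ROW_NUMBER() OVER (ORDER BY gid DESC) AS rn
              FROM runner
              GROUP BY gid) AS g              -- one numbered row per group
        JOIN runner AS r ON r.gid = g.gid     -- back to the detail rows
        JOIN static AS s ON s.id = r.id
        WHERE g.rn <= ?
        ORDER BY r.gid DESC, r.rtime DESC
    """
    return conn.execute(sql, (n,)).fetchall()

if __name__ == "__main__":
    for row in top_groups(make_db(), 2):
        print(row)
```

Because the numbering is done on the grouped rows (one row per gid), the filter keeps or drops whole groups and never cuts one in half.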
For your second question about dates, you can use a subquery that returns every record of any group with at least one record in the range:
SELECT *
FROM Runner
WHERE gid IN (SELECT gid
FROM Runner
WHERE rtime BETWEEN '2016-08-25' AND '2016-09-16')
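The subquery above keeps a group whole as soon as one of its records falls inside the range. If "not at all" should instead mean that a group is dropped unless all of its records are inside the range, a HAVING on MIN/MAX is one possible variant. The sketch below compares both, again using Python's sqlite3 as an assumed stand-in engine; the function names are hypothetical.

```python
import sqlite3

def make_runner():
    """In-memory copy of the question's runner table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runner (id INTEGER, gid INTEGER, status INTEGER, rtime TEXT)")
    conn.executemany("INSERT INTO runner VALUES (?, ?, ?, ?)", [
        (100, 5550, 1, "2016-08-19"), (200, 5550, 2, "2016-08-22"),
        (300, 5550, 1, "2016-08-30"), (100, 6050, 3, "2016-09-01"),
        (200, 6050, 1, "2016-09-02"), (100, 6250, 1, "2016-09-11"),
        (200, 6250, 1, "2016-09-15"), (300, 6250, 3, "2016-09-19"),
    ])
    return conn

def rows_of_groups_touching(conn, start, end):
    """All rows of groups that have at least one record in [start, end]."""
    sql = """
        SELECT id, gid, rtime FROM runner
        WHERE gid IN (SELECT gid FROM runner WHERE rtime BETWEEN ? AND ?)
        ORDER BY gid, rtime
    """
    return conn.execute(sql, (start, end)).fetchall()

def rows_of_groups_inside(conn, start, end):
    """All rows of groups whose records ALL lie in [start, end]."""
    sql = """
        SELECT id, gid, rtime FROM runner
        WHERE gid IN (SELECT gid FROM runner
                      GROUP BY gid
                      HAVING MIN(rtime) >= ? AND MAX(rtime) <= ?)
        ORDER BY gid, rtime
    """
    return conn.execute(sql, (start, end)).fetchall()

if __name__ == "__main__":
    conn = make_runner()
    print(rows_of_groups_touching(conn, "2016-08-25", "2016-09-16"))
    print(rows_of_groups_inside(conn, "2016-08-25", "2016-09-16"))
```

With the sample data, the first variant returns all three groups complete, while the second returns only group 6050, the one group that lies entirely inside the range.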

Hmmm. I suspect this might do what you want:
select top (1) with ties r.*
from runner r
order by min(rtime) over (partition by gid), gid;
At least, this will get the complete first group.
In any case, the idea is to include gid as a key in the order by and to use top with ties.

You can do the following:
with report as(
select n.id,n.gid,m.description,n.status,n.rtime, dense_rank() over(order by gid desc) as RowNum
from #table1 n
inner join #table2 m on n.id = m.id )
select id,gid,description,status,rtime
from report
where RowNum<=2 -- <-- here n=2
order by gid desc,rtime desc

DENSE_RANK looks like an ideal solution here:
Select * From
(
select DENSE_RANK() over (order by gid desc) as D_RN, r.*
from runner r
) A
Where D_RN = 1 -- returns the top group only; use D_RN <= n for the top n groups
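To see why DENSE_RANK fits here: applied directly to the detail rows, it gives every row of the same gid the same rank, whereas ROW_NUMBER would number the rows 1 to 8 and could split a group. A small sketch, assuming Python's sqlite3 (SQLite 3.25+) as a stand-in engine and hypothetical helper names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runner (id INTEGER, gid INTEGER, status INTEGER, rtime TEXT)")
conn.executemany("INSERT INTO runner VALUES (?, ?, ?, ?)", [
    (100, 5550, 1, "2016-08-19"), (200, 5550, 2, "2016-08-22"),
    (300, 5550, 1, "2016-08-30"), (100, 6050, 3, "2016-09-01"),
    (200, 6050, 1, "2016-09-02"), (100, 6250, 1, "2016-09-11"),
    (200, 6250, 1, "2016-09-15"), (300, 6250, 3, "2016-09-19"),
])

def group_ranks(conn):
    """One (gid, dense rank) pair per group, highest gid first."""
    sql = """
        SELECT DISTINCT gid, DENSE_RANK() OVER (ORDER BY gid DESC) AS d_rn
        FROM runner
        ORDER BY d_rn
    """
    return conn.execute(sql).fetchall()

def top_n_rows(conn, n):
    """All rows whose group is among the n highest gids."""
    sql = """
        SELECT id, gid, rtime
        FROM (SELECT r.*, DENSE_RANK() OVER (ORDER BY gid DESC) AS d_rn
              FROM runner r)
        WHERE d_rn <= ?
        ORDER BY gid DESC, rtime DESC
    """
    return conn.execute(sql, (n,)).fetchall()

if __name__ == "__main__":
    print(group_ranks(conn))
    print(top_n_rows(conn, 2))
```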

No need to use ranking functions (ROW_NUMBER, DENSE_RANK etc).
SELECT r.id, gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
WHERE gid IN (
SELECT TOP 2 gid FROM runner GROUP BY gid ORDER BY gid DESC
)
ORDER BY rtime DESC;
The same using CTE:
WITH grouped
AS
(
SELECT TOP 2 gid
FROM runner GROUP BY gid ORDER BY gid DESC
)
SELECT r.id, grouped.gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
INNER JOIN grouped ON r.gid = grouped.gid
ORDER BY rtime DESC;

Related

Get only the last row in a sequence. SQL Server

I have a table like this:
ID Seq Prod
-----------------
1 001 1
2 002 1
3 001 2
4 002 2
5 003 2
I want to make a query that only gets the last "Seq" of each product, so the expected output will be something like this:
ID Seq Prod
-----------------
2 002 1
5 003 2
Any help?
A simple way is a correlated subquery:
select t.*
from t
where t.seq = (select max(t2.seq) from t t2 where t2.prod = t.prod);
For performance, you want an index on (prod, seq).
The above often has the best performance. But another way to write the query is to use window functions:
select t.*
from (select t.*, row_number() over (partition by prod order by seq desc) as seqnum
from t
) t
where seqnum = 1;
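Both forms above can be checked against the sample table with a small script. This is only a sketch, assuming Python's sqlite3 (SQLite 3.25+) in place of SQL Server; the index name t_prod_seq and the function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, seq TEXT, prod INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "001", 1), (2, "002", 1), (3, "001", 2), (4, "002", 2), (5, "003", 2),
])
# The index suggested in the answer, so the MAX(seq) lookup per prod is cheap.
conn.execute("CREATE INDEX t_prod_seq ON t (prod, seq)")

def last_by_subquery(conn):
    """Correlated subquery: keep rows whose seq is the max for their prod."""
    sql = """
        SELECT t.id, t.seq, t.prod
        FROM t
        WHERE t.seq = (SELECT MAX(t2.seq) FROM t AS t2 WHERE t2.prod = t.prod)
        ORDER BY t.id
    """
    return conn.execute(sql).fetchall()

def last_by_row_number(conn):
    """Window function: number rows per prod by seq descending, keep number 1."""
    sql = """
        SELECT id, seq, prod
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY prod ORDER BY seq DESC) AS seqnum
              FROM t)
        WHERE seqnum = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    print(last_by_subquery(conn))
    print(last_by_row_number(conn))
```

One difference worth keeping in mind: if two rows of a product share the maximum seq, the correlated subquery returns both, while ROW_NUMBER returns exactly one of them.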
Yet another option is using WITH TIES
Select top 1 with ties *
From YourTable
Order By row_number() over (partition by prod order by seq desc)
Full Disclosure:
Gordon's answer is a nudge more performant (+1), but WITH TIES does not generate an extra column.
You could use a sub-query that finds the maximum ID by Prod. In the following example, replace 'myTable' with your table name:
SELECT t.*
FROM myTable t
INNER JOIN (
SELECT MAX(ID) AS ID,
Prod
FROM myTable
GROUP BY Prod
) a ON a.ID = t.ID
Output:
ID Seq Prod
2 002 1
5 003 2
You can write a correlated subquery as:
select T.ID,T.Seq,T.Prod
from #T1 T
where T.ID = (select max(T_Inner.ID)
from #T1 T_Inner
where T_Inner.Prod = T.Prod
group by T_Inner.Prod
)
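One caveat about the two MAX(ID) answers above: they pick the row with the highest ID per Prod, which equals the last Seq only when IDs are assigned in Seq order. The sketch below illustrates this with Python's sqlite3 as an assumed stand-in engine; the rows for product 3 are invented for the demonstration and are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, seq TEXT, prod INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "001", 1), (2, "002", 1), (3, "001", 2), (4, "002", 2), (5, "003", 2),
    # Hypothetical extra rows: product 3 was inserted out of Seq order.
    (6, "002", 3), (7, "001", 3),
])

def last_by_max_id(conn):
    """Join against MAX(id) per prod, as in the sub-query answer."""
    sql = """
        SELECT t.id, t.seq, t.prod
        FROM t
        JOIN (SELECT MAX(id) AS id, prod FROM t GROUP BY prod) AS a ON a.id = t.id
        ORDER BY t.id
    """
    return conn.execute(sql).fetchall()

def last_by_max_seq(conn):
    """Compare on seq itself, so insertion order does not matter."""
    sql = """
        SELECT t.id, t.seq, t.prod
        FROM t
        WHERE t.seq = (SELECT MAX(t2.seq) FROM t AS t2 WHERE t2.prod = t.prod)
        ORDER BY t.id
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    print(last_by_max_id(conn))
    print(last_by_max_seq(conn))
```

For products 1 and 2 both queries agree; for the out-of-order product 3 the MAX(id) variant returns the row with seq 001 and the MAX(seq) variant the row with seq 002.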

Return most results for a match based on a preferential order of keywords

I have built a program to index keywords in text files and put them to the database.
My tables are simple:
FILE_ID|Name
------------
1 | a.txt
2 | b.txt
3 | c.txt
KEYWORD_ID|FILE_ID|Hits
-----------------------
1 | 1 | 55
2 | 1 | 10
3 | 1 | 88
1 | 2 | 44
2 | 2 | 15
1 | 3 | 199
2 | 3 | 1
3 | 3 | 4
There is no primary key in this table. I didn't find it necessary.
Now I'd like to search which file has most hits to certain keywords.
If I have only one keyword it is easy:
select top 10 *
from words
where keyword_id=1
order by hits desc
Let's say I want to search for files with keywords 1 and 3 (both must be present, and the first keyword has the highest importance). After many hours I came up with this:
select top 10 k.*
from
(
select file_id,
max(hits) as maxhits
from words
where keyword_id=3
group by file_id
) as x
inner join words as k
on (k.file_id = x.file_id
and k.keyword_id=1)
order by k.hits desc
How to make this right? Especially if I want to search with N keywords. Would it be better to use a temp table and work with that?
If searching with keyword 1 and 3 I want FILE_ID 3 and 1 returned, in this order (because file_id 3 has higher hit count for keyword 1)
Not sure, but (based on your comment) maybe this is what you need?
(I used the table declaration from @scsimon's answer.)
declare @words table (KEYWORD_ID int, [FILE_ID] int, HITS int)
insert into @words
values
(1,1,55),
(2,1,10),
(3,1,88),
(1,2,44),
(2,2,15),
(1,3,199),
(2,3,1),
(3,3,4)
select [FILE_ID] from (
select *, row_number() over(partition by KEYWORD_ID order by HITS desc) rn from @words
where KEYWORD_ID in(1,3)
)t
where rn = 1
order by hits desc
Assuming that all relevant keywords to be found are stored in a table Ktable with two columns, ID and KEYWORD_ID, the query would be:
SELECT
FILE_ID,
SUM(Hits) NetHits,
SUM(1.0 * Hits / K.ID) WeightedHits -- 1.0 * avoids integer division
FROM
Words w JOIN Ktable K
on w.KEYWORD_ID= K.KEYWORD_ID
GROUP BY FILE_ID
HAVING count(1) = (SELECT COUNT(1) FROM Ktable )
ORDER BY 2 DESC,3 DESC
The same query using window functions would be:
SELECT
DISTINCT
FILE_ID,
NetHitsPerFile,
WeightedHits
FROM
(
SELECT
w.FILE_ID,
SUM(w.Hits) OVER (PARTITION BY w.FILE_ID) NetHitsPerFile,
COUNT(*) OVER (PARTITION BY w.FILE_ID) MatchedKeywords, -- keywords found in this file
SUM(1.0 * w.Hits / K.ID) OVER (PARTITION BY w.FILE_ID) WeightedHits
FROM
Words w JOIN Ktable K
on w.KEYWORD_ID= K.KEYWORD_ID
)T
WHERE MatchedKeywords = (SELECT COUNT(1) FROM Ktable)
ORDER BY NetHitsPerFile DESC, WeightedHits DESC
Here's one way. The INNER JOIN limits the FILE_IDs to those that contain every KEYWORD_ID you specify, by checking that the distinct count equals the number of keywords. In the example below we search for 2 KEYWORD_IDs, and the HAVING clause checks that each FILE_ID has 2 distinct KEYWORD_IDs associated with it. If you only want to see the rows for the KEYWORD_IDs you specify, add the same WHERE clause to the outer query as well.
declare @words table (KEYWORD_ID int, [FILE_ID] int, HITS int)
insert into @words
values
(1,1,55),
(2,1,10),
(3,1,88),
(1,2,44),
(2,2,15),
(1,3,199),
(2,3,1),
(3,3,4)
select top 10 w.*
from @words w
inner join
(select [FILE_ID]
from @words
where KEYWORD_ID in (1,3)
group by [FILE_ID]
having count(distinct KEYWORD_ID) = 2
) x on x.[FILE_ID] = w.[FILE_ID]
order by HITS desc
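The HAVING COUNT(DISTINCT ...) check generalizes to N keywords. The sketch below is one possible way to do that and to rank by the hits of the first (most important) keyword, as the question asks. It assumes Python's sqlite3 as a stand-in engine; the function name files_with_all and the ordering expression are illustrative choices, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (keyword_id INTEGER, file_id INTEGER, hits INTEGER)")
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", [
    (1, 1, 55), (2, 1, 10), (3, 1, 88),
    (1, 2, 44), (2, 2, 15),
    (1, 3, 199), (2, 3, 1), (3, 3, 4),
])

def files_with_all(conn, keyword_ids):
    """file_ids containing every given keyword, best hits for keyword_ids[0] first."""
    marks = ",".join("?" * len(keyword_ids))
    sql = f"""
        SELECT file_id
        FROM words
        WHERE keyword_id IN ({marks})
        GROUP BY file_id
        HAVING COUNT(DISTINCT keyword_id) = ?      -- all keywords present
        ORDER BY MAX(CASE WHEN keyword_id = ? THEN hits END) DESC
    """
    params = [*keyword_ids, len(set(keyword_ids)), keyword_ids[0]]
    return [row[0] for row in conn.execute(sql, params)]

if __name__ == "__main__":
    print(files_with_all(conn, [1, 3]))
```

For keywords 1 and 3 this yields file 3 before file 1, matching the order requested in the question; file 2 is dropped because it lacks keyword 3.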
You can use top (n) with ties for your query as below:
declare @n int = 10 --10 in your scenario
select top (@n) with ties *
from (
select w.*, f.name from #words w inner join #files f
on w.[FILE_ID] = f.[file_id]
) a
order by (row_number() over (partition by a.[file_id] order by hits desc)-1)/@n +1

I want to give serial numbers for particular IDs in existing columns

I have a Products table with 5000 records, and I need to update the serial numbers (the Recommended column) for 2000 of them.
old table
Id Name Price Recommended
45 Lotus 450 500
55 Cherry 560 500
56 Berry 789 566
new table
Id Name Price Recommended
45 Lotus 450 1
55 Cherry 560 2
56 Berry 789 3
You can't, unfortunately, use a window function directly in the set clause. You could, however, use it in a subquery, and then join that query on your table when updating:
UPDATE p
SET p.recommended = rn
FROM products p
JOIN (SELECT id, ROW_NUMBER() OVER (ORDER BY recommended) AS rn
FROM products) r ON p.id = r.id
select row_number() over ( order by (select null)), *
from OldTable
You can try with a CTE:
;WITH cte AS
(
SELECT
id,
ROW_NUMBER() OVER (ORDER BY recommended) AS rn,
name,
price
FROM Products
)
UPDATE cte
SET recommended = rn
-- if you have any condition put here for example ( where rn <= 2000)
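Since only some of the rows should be renumbered, here is a minimal sketch of the same idea restricted to a given set of IDs. It assumes Python's sqlite3 (SQLite 3.25+) as a stand-in engine, so the UPDATE uses a correlated subquery instead of T-SQL's UPDATE ... FROM or an updatable CTE. The function name renumber, the extra 'Apple' row, and ordering by id are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER, recommended INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (45, "Lotus", 450, 500), (55, "Cherry", 560, 500), (56, "Berry", 789, 566),
    (60, "Apple", 100, 700),   # hypothetical row that must stay untouched
])

def renumber(conn, ids):
    """Set recommended = 1, 2, 3, ... (in id order) for the given ids only."""
    marks = ",".join("?" * len(ids))
    sql = f"""
        UPDATE products
        SET recommended = (
            SELECT r.rn
            FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
                  FROM products
                  WHERE id IN ({marks})) AS r
            WHERE r.id = products.id
        )
        WHERE id IN ({marks})
    """
    conn.execute(sql, [*ids, *ids])   # the id list is bound twice
    conn.commit()

if __name__ == "__main__":
    renumber(conn, [45, 55, 56])
    print(conn.execute("SELECT id, recommended FROM products ORDER BY id").fetchall())
```

Ordering the window by id rather than by the old recommended value keeps the numbering deterministic when several rows share the same old value (500 appears twice in the sample).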

Query Table by select Distinct items given a datestamp

I'm trying to write a query that selects distinct uid's but I want to choose those distinct uid's given an order on a modified_at column.
Example:
Table_A
uid data_value modified_at
=== ========== ===========
1 a 1/1/2016
1 b 1/2/2016
1 c 1/3/2016
2 d 1/1/2016
2 e 1/2/2016
3 f 3/1/2016
3 g 3/3/2016
3 h 3/4/2016
4 i 2/1/2016
5 j 1/5/2016
5 k 1/6/2016
So I want to select distinct uid's that have been modified most recently.
I'm not sure if there's a quick query that would allow me to do this rather than pulling the information separately into a script and modifying it there.
Right now, all I can do is
select distinct uid, data_value, modified_at
from Table_A (...and other stuff if I want to join and do things)
You can use DISTINCT ON:
SELECT DISTINCT ON (uid) uid, data_value, modified_at
FROM Table_A
ORDER BY uid, modified_at DESC
Use the window function row_number(), with CTE syntax for better readability:
WITH cte as (
SELECT *,
row_number() over (PARTITION BY uid ORDER BY modified_at DESC) as rn
FROM Table_A
)
SELECT *
FROM cte
WHERE rn = 1
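DISTINCT ON is specific to PostgreSQL, while the row_number() form is portable. As a quick check of the portable form, the sketch below runs it with Python's sqlite3 (an assumed stand-in, SQLite 3.25+). It also assumes modified_at holds real dates; they are stored here as ISO strings so that they sort correctly, which text like 1/3/2016 would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (uid INTEGER, data_value TEXT, modified_at TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", [
    (1, "a", "2016-01-01"), (1, "b", "2016-01-02"), (1, "c", "2016-01-03"),
    (2, "d", "2016-01-01"), (2, "e", "2016-01-02"),
    (3, "f", "2016-03-01"), (3, "g", "2016-03-03"), (3, "h", "2016-03-04"),
    (4, "i", "2016-02-01"),
    (5, "j", "2016-01-05"), (5, "k", "2016-01-06"),
])

def latest_per_uid(conn):
    """Most recently modified row for each uid."""
    sql = """
        WITH cte AS (
            SELECT uid, data_value, modified_at,
                   ROW_NUMBER() OVER (PARTITION BY uid
                                      ORDER BY modified_at DESC) AS rn
            FROM table_a
        )
        SELECT uid, data_value, modified_at
        FROM cte
        WHERE rn = 1
        ORDER BY uid
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    for row in latest_per_uid(conn):
        print(row)
```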

Join a dynamic number of rows in postgres

Let's say I have the following tables:
Batch Items
---+----- ---+----------+--------
id | size id | batch_id | quality
---+----- ---+----------+--------
1 | 10 1 | 1 | 9
2 | 2 2 | 1 | 10
3 | 2 | 1
4 | 2 | 2
5 | 2 | 1
6 | 2 | 9
I have batches of items. They are sent in batches of size batch.size. An item is broken if its quality is <= 3.
I want to know the number of broken items in the last batches sent:
batch_id | broken_item_count
---------+---------------------
1 | 0
2 | 2 (and not 3)
My idea is the following:
SELECT batch.id as batch_id, COUNT(broken_items.*) as broken_item_count
FROM batch
INNER JOIN (
SELECT id
FROM items
WHERE items.quality <= 3
ORDER BY items.id asc
LIMIT batch.size -- invalid reference to FROM-clause entry for table "batch"
) broken_items ON broken_items.batch_id = batch.id
(I would ORDER BY items.shipped_at. But for simplicity, I order by items.id)
But this query shows me the error I put as the comment.
How can I limit the number of joined items based on the batch.size that is different for each row ?
Is there any other way to achieve what I want ?
SELECT b.id AS batch_id
, count(i.quality < 4 OR NULL) AS broken_item_count
FROM batch b
LEFT JOIN (
SELECT batch_id, quality
, row_number() OVER (PARTITION BY batch_id ORDER BY id DESC) AS rn
FROM items
) i ON i.batch_id = b.id
AND i.rn <= b.size
GROUP BY 1
ORDER BY 1;
This is much like @Clodoaldo's answer, but with a couple of differences. Most importantly:
You want to count the broken items in the last batches sent, so we have to ORDER BY id DESC
If there can be batches without items at all you need to use LEFT JOIN instead of a plain JOIN or those batches are excluded.
Consequently, the check i.rn <= b.size needs to move from the WHERE clause to the JOIN clause.
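The points above can be tried out on the question's data with a small script. This is a sketch under assumptions: Python's sqlite3 (SQLite 3.25+) stands in for PostgreSQL, COUNT(CASE ...) replaces the count(... OR NULL) idiom, batch 3 is an invented batch without items, and broken_counts with its newest_first flag is a hypothetical helper for switching the sort direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch (id INTEGER PRIMARY KEY, size INTEGER);
    CREATE TABLE items (id INTEGER PRIMARY KEY, batch_id INTEGER, quality INTEGER);
    INSERT INTO batch VALUES (1, 10), (2, 2), (3, 5);   -- batch 3 has no items
    INSERT INTO items VALUES
        (1, 1, 9), (2, 1, 10), (3, 2, 1), (4, 2, 2), (5, 2, 1), (6, 2, 9);
""")

def broken_counts(conn, newest_first=True):
    """Broken items among the first batch.size items of each batch."""
    direction = "DESC" if newest_first else "ASC"   # fixed choices, not user input
    sql = f"""
        SELECT b.id, COUNT(CASE WHEN i.quality <= 3 THEN 1 END)
        FROM batch AS b
        LEFT JOIN (
            SELECT batch_id, quality,
                   ROW_NUMBER() OVER (PARTITION BY batch_id
                                      ORDER BY id {direction}) AS rn
            FROM items
        ) AS i ON i.batch_id = b.id
              AND i.rn <= b.size        -- per-batch limit lives in the join
        GROUP BY b.id
        ORDER BY b.id
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    print(broken_counts(conn, newest_first=True))
    print(broken_counts(conn, newest_first=False))
```

On this sample, ordering by id ascending gives 2 broken items for batch 2, the figure shown in the question, while descending order gives 1, so the sort direction should follow whichever items count as "last sent" in the real data. The empty batch 3 appears with a count of 0 only because the size check sits in the join condition of the LEFT JOIN.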
select
b.id as batch_id,
count(quality <= 3 or null) as broken_item_count
from
batch b
inner join (
select
id, quality, batch_id,
row_number() over (partition by batch_id order by id) as rn
from items
) i on i.batch_id = b.id
where rn <= b.size
group by b.id
order by b.id
From what I understand the count of defective items cannot be greater than the batch size.
EDIT: After reading your comments, I think using the RANK() function, and then join by rank and size should work for you. The following query attempts that.
SELECT b.id,
SUM(CASE WHEN i1.quality <= 3 THEN 1 ELSE 0 END) as broken_item_count
FROM BATCH as b
LEFT JOIN (SELECT i.id, i.batch_id, i.quality,
RANK() OVER(PARTITION BY i.batch_id ORDER BY i.id) as RANK
FROM ITEMS as i) as i1 ON b.id = i1.batch_id AND i1.RANK <= b.size
GROUP BY b.id
EDIT2: Updated the query with a LEFT JOIN to cover the case where there are no samples in some batch.