Get the latest records per Group By SQL


I have the following table:
CREATE TABLE orders (
id INT PRIMARY KEY IDENTITY,
oDate DATE NOT NULL,
oName VARCHAR(32) NOT NULL,
oItem INT,
oQty INT
-- ...
);
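SET IDENTITY_INSERT orders ON; -- required to insert explicit values into the IDENTITY column
INSERT INTO orders (id, oDate, oName, oItem, oQty)
VALUES
(1, '2016-01-01', 'A', 1, 2),
(2, '2016-01-01', 'A', 2, 1),
(3, '2016-01-01', 'B', 1, 3),
(4, '2016-01-02', 'B', 1, 2),
(5, '2016-01-02', 'C', 1, 2),
(6, '2016-01-03', 'B', 2, 1),
(7, '2016-01-03', 'B', 1, 4),
(8, '2016-01-04', 'A', 1, 3)
;
SET IDENTITY_INSERT orders OFF;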
I want to get the most recent rows (of which there might be multiple) for each name. For the sample data, the results should be:
id  oDate       oName  oItem  oQty  ...
5   2016-01-02  C      1      2
6   2016-01-03  B      2      1
7   2016-01-03  B      1      4
8   2016-01-04  A      1      3
The query might be something like:
SELECT oDate, oName, oItem, oQty, ...
FROM orders
WHERE oDate = ???
GROUP BY oName
ORDER BY oDate, id
Besides missing the expression (represented by ???) to calculate the desired values for oDate, this statement is invalid as it selects columns that are neither grouped nor aggregates.
Does anyone know how to get this result?

The RANK() window function lets you, well, rank rows within each partition, and then you can just select the top-ranked ones. Because rows that tie on oDate get the same rank, every row from a name's latest date is kept:
SELECT id, oDate, oName, oItem, oQty
FROM (SELECT id, oDate, oName, oItem, oQty,
RANK() OVER (PARTITION BY oName ORDER BY oDate DESC) AS rk
FROM orders) t
WHERE rk = 1
ORDER BY oDate, id;
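To check the RANK() approach end to end, here is a small sketch that loads the question's sample rows into an in-memory SQLite database. SQLite (3.25 or later, for window functions) is an assumption standing in for SQL Server here, and oDate is stored as ISO-8601 text, which sorts correctly:

```python
import sqlite3

# The question's sample data: (id, oDate, oName, oItem, oQty).
ORDERS = [
    (1, "2016-01-01", "A", 1, 2),
    (2, "2016-01-01", "A", 2, 1),
    (3, "2016-01-01", "B", 1, 3),
    (4, "2016-01-02", "B", 1, 2),
    (5, "2016-01-02", "C", 1, 2),
    (6, "2016-01-03", "B", 2, 1),
    (7, "2016-01-03", "B", 1, 4),
    (8, "2016-01-04", "A", 1, 3),
]


def latest_per_name():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, oDate TEXT, "
                "oName TEXT, oItem INTEGER, oQty INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ORDERS)
    # RANK() gives every row that ties on a name's latest oDate rank 1.
    sql = """
        SELECT id, oDate, oName, oItem, oQty
        FROM (SELECT id, oDate, oName, oItem, oQty,
                     RANK() OVER (PARTITION BY oName ORDER BY oDate DESC) AS rk
              FROM orders) t
        WHERE rk = 1
        ORDER BY oDate, id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


for row in latest_per_name():
    print(row)
```

Ids 5, 6, 7 and 8 come back, matching the expected output; B keeps both of its 2016-01-03 rows because they tie at rank 1.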

This is a generic query that doesn't use analytic functions:
SELECT a.*
FROM orders a
INNER JOIN
(SELECT max(oDate) AS maxDate,
oName
FROM orders
GROUP BY oName
)
b ON a.oName = b.oName
AND a.oDate = b.maxDate
ORDER BY a.oDate, a.id;
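The self-join version can be tried the same way; this is a sketch on an in-memory SQLite database (an assumption, standing in for your real server). It groups by oName only, so every row that ties on a name's latest date is kept:

```python
import sqlite3

# The question's sample data: (id, oDate, oName, oItem, oQty).
ORDERS = [
    (1, "2016-01-01", "A", 1, 2), (2, "2016-01-01", "A", 2, 1),
    (3, "2016-01-01", "B", 1, 3), (4, "2016-01-02", "B", 1, 2),
    (5, "2016-01-02", "C", 1, 2), (6, "2016-01-03", "B", 2, 1),
    (7, "2016-01-03", "B", 1, 4), (8, "2016-01-04", "A", 1, 3),
]


def latest_via_join():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, oDate TEXT, "
                "oName TEXT, oItem INTEGER, oQty INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ORDERS)
    # Plain GROUP BY + join back on (name, max date): no window functions needed.
    sql = """
        SELECT a.id, a.oDate, a.oName, a.oItem, a.oQty
        FROM orders a
        JOIN (SELECT oName, MAX(oDate) AS maxDate
              FROM orders
              GROUP BY oName) b
          ON a.oName = b.oName AND a.oDate = b.maxDate
        ORDER BY a.oDate, a.id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


print([r[0] for r in latest_via_join()])
```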

Add a primary key to the table, say an auto-increment id field. Then ordering by id will get you the latest rows; that is the traditional way. With your table as it is, you can only order by oDate, but the same date appears multiple times, so that alone won't solve your problem.

I think you need a query like this:
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY oName ORDER BY oDate DESC) seq
FROM yourTable) t
WHERE (seq <= 2)
ORDER BY oDate;

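You can use ROW_NUMBER() as follows. Note that this partitions by both oName and oItem, so it returns the latest row for each name/item pair: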
select oDate, oName, oItem, oQty, oRemarks
from (
select *, row_number() over(partition by oName, oItem order by oDate desc) rn
from #t
)x
where rn = 1
order by oDate
OUTPUT
oDate       oName  oItem  oQty  oRemarks
2016-01-01  A      001    2
2016-01-01  A      002    1     test
2016-01-02  C      001    2
2016-01-03  B      001    4
2016-01-03  B      002    1
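For comparison, here is the ROW_NUMBER() query partitioned by both oName and oItem, run on the question's current sample data in an in-memory SQLite database (an assumed stand-in for SQL Server). Because each name/item pair gets its own latest row, it also returns id 2, A's only order for item 2:

```python
import sqlite3

# The question's sample data: (id, oDate, oName, oItem, oQty).
ORDERS = [
    (1, "2016-01-01", "A", 1, 2), (2, "2016-01-01", "A", 2, 1),
    (3, "2016-01-01", "B", 1, 3), (4, "2016-01-02", "B", 1, 2),
    (5, "2016-01-02", "C", 1, 2), (6, "2016-01-03", "B", 2, 1),
    (7, "2016-01-03", "B", 1, 4), (8, "2016-01-04", "A", 1, 3),
]


def latest_per_name_and_item():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, oDate TEXT, "
                "oName TEXT, oItem INTEGER, oQty INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ORDERS)
    # One numbering per (oName, oItem) pair, newest first; keep row 1 of each.
    sql = """
        SELECT id
        FROM (SELECT id,
                     ROW_NUMBER() OVER (PARTITION BY oName, oItem
                                        ORDER BY oDate DESC) AS rn
              FROM orders) x
        WHERE rn = 1
        ORDER BY id
    """
    ids = [i for (i,) in con.execute(sql)]
    con.close()
    return ids


print(latest_per_name_and_item())
```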

Related

GROUP by Largest String for all the substrings

I have a table like this, where some rows have the same grp but different names. I want to group them by name so that, after removing non-alphanumeric characters, all the substrings are aggregated together and grouped under the largest string. A NULL value is considered a substring of every string.
grp | name | value
1 | ab&c | 10
1 | abc d e | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
Desired result
grp | name | value
1 | abcde | 111
1 | xy | 34
2 | fgh | 87
My query:
Select grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g') name, sum(value) value
from table
group by grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g');
Result
grp | name | value
1 | abc | 10
1 | abcde | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
What changes should I make in my query?

To solve this problem, I did the following (all of the code below is also available in a fiddle). First, create the table:
CREATE TABLE test
(
grp SMALLINT NOT NULL,
name TEXT NULL,
value SMALLINT NOT NULL
);
Then populate it with your data plus some extra rows for testing:
INSERT INTO test VALUES
(1, 'ab&c', 10),
(1, 'abc d e', 56),
(1, 'ab', 21),
(1, 'a', 23),
(1, NULL, 1000000),
(1, 'r*&%$s', 100), -- added for testing.
(1, 'rs__t', 101),
(1, 'rs__tu', 101),
(1, 'xy', 1111),
(1, NULL, 1000000),
(2, 'fgh', 87),
(2, 'fgh', 13), -- For Charlieface
(2, NULL, 1000000),
(2, 'x', 50),
(2, 'x', 150),
(2, 'x----y', 100);
Then, you can use this query:
WITH t1 AS
(
SELECT
grp, n_str,
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str),
CASE
WHEN
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str) IS NULL
OR
POSITION
(
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str)
IN
n_str
) = 0
THEN 1
ELSE 0
END AS change,
value
FROM
test t1
CROSS JOIN LATERAL
(
VALUES
(
REGEXP_REPLACE(name,'[^a-zA-Z0-9]+', '', 'g')
)
) AS v(n_str)
WHERE n_str IS NOT NULL
), t2 AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY grp, s_change ORDER BY grp, n_str DESC) AS rn,
grp, n_str,
SUM(value) OVER (PARTITION BY grp, s_change) AS s_val,
MAX(LENGTH(n_str)) OVER (PARTITION BY grp) AS max_nom
FROM
(
SELECT
grp, n_str, change,
SUM(change) OVER (ORDER BY grp, n_str) AS s_change,
value
FROM
t1
ORDER BY grp, n_str DESC
) AS sub1
), t3 AS
(
SELECT
grp, SUM(value) AS null_sum
FROM
test
WHERE name IS NULL
GROUP BY grp
)
SELECT x.grp, x.n_str, x.s_val + COALESCE(y.null_sum, 0)
FROM t2 x
LEFT JOIN t3 y -- LEFT JOIN keeps groups that have no NULL names
ON x.grp = y.grp
WHERE x.max_nom = LENGTH(x.n_str)
UNION
SELECT grp, n_str, s_val
FROM
t2 WHERE max_nom != LENGTH(n_str) AND rn = 1
ORDER BY grp, n_str;
Result:
grp n_str ?column?
1 abcde 2000110
1 rstu 302
1 xy 1111
2 fgh 1000100
2 xy 300
A few points to note:
Please always provide a fiddle when you ask questions such as this one with tables and data - it provides a single source of truth for the question and eliminates duplication of effort on the part of those trying to help you!
You haven't been very clear about what, exactly, should happen with NULLs - do the values count towards the SUM()? You can vary the CASE statement as required.
What should happen when there's a tie in string length? I've included an example in the fiddle where both tied strings are returned, but you may wish to break ties alphabetically (or by some other method).
There appears to be an error in your provided sums for the values (even taking account of counting or not values for NULL for the name field).
Finally, you don't really want to GROUP BY the largest string; you want to group by the grp field, SUM() the values of the matching records in each grp, and then pick out the longest alphanumeric string in that grouping. It would be interesting to know why you want to do this.
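The chaining logic of the query can be mirrored in plain Python to check the numbers by hand. This is an illustrative sketch, not the answer's SQL: it strips non-alphanumerics, sorts the strings within each grp, starts a new chain whenever the previous string is not contained in the current one (the `change` flag), sums each chain, and adds the NULL rows' total to the chain(s) whose string has the maximum length. The function name and the (grp, name, value) tuple layout are assumptions:

```python
import re
from collections import defaultdict


def group_by_longest(rows):
    """rows: iterable of (grp, name, value); name may be None (SQL NULL)."""
    null_sum = defaultdict(int)
    names = defaultdict(list)
    for grp, name, value in rows:
        if name is None:
            null_sum[grp] += value  # NULL names go to the longest string (t3)
        else:
            names[grp].append((re.sub(r"[^a-zA-Z0-9]+", "", name), value))

    result = []
    for grp in sorted(names):
        chains = []  # each entry: [last (longest) string in the chain, SUM(value)]
        prev = None
        for s, value in sorted(names[grp]):
            if prev is None or prev not in s:  # same test as change = 1 in t1
                chains.append([s, 0])
            # Each member contains the previous one, so the last is the longest.
            chains[-1][0] = s
            chains[-1][1] += value
            prev = s
        max_len = max(len(s) for s, _ in chains)  # max_nom in t2
        for s, total in chains:
            if len(s) == max_len:
                total += null_sum[grp]  # tied lengths each receive the NULL sum
            result.append((grp, s, total))
    return result


question_rows = [
    (1, "ab&c", 10), (1, "abc d e", 56), (1, "ab", 21), (1, "a", 23),
    (1, "xy", 34), (1, None, 1), (2, "fgh", 87),
]
print(group_by_longest(question_rows))
```

On the question's rows this yields (1, 'abcde', 111), (1, 'xy', 34) and (2, 'fgh', 87). Like the SQL it mirrors, chains are only detected between neighbours in sorted order, so a substring that does not sort next to its longer string starts its own chain.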

SQL Occurrence of Sequence Number

I want to find any Name that has 4 or more consecutive SeqNo values in a row.
Even if there is a break in SeqNo, I still need that Name as long as 4 or more of its rows are consecutive.
Example:
SeqNo Name
10 | A
15 | A
16 | A
17 | A
18 | A
9 | B
10 | B
13 | B
14 | B
6 | C
7 | C
9 | C
10 | C
OUTPUT:
A
Below is a script for anyone helping:
create table testseq (Id int, Name char)
INSERT into testseq values
(10, 'A'),
(15, 'A'),
(16, 'A'),
(17, 'A'),
(18, 'A'),
(9, 'B'),
(10, 'B'),
(13, 'B'),
(14, 'B'),
(6, 'C'),
(7, 'C'),
(9, 'C'),
(10, 'C')
SELECT * FROM testseq

You can use some gaps-and-islands techniques for this.
If you want names that have at least 4 consecutive records where seqno is increasing by 1, then you can use the difference between seqno and row_number() to define the groups (it stays constant along each run), and then aggregate:
select distinct name
from (
select t.*, row_number() over(partition by name order by seqno) rn
from testseq t
) t
group by name, rn - seqno
having count(*) >= 4
For your sample data, this returns A: its seqno values 15, 16, 17 and 18 form a run of 4, while B and C only have runs of 2.
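Here is a sketch of the gaps-and-islands query running on the question's data in an in-memory SQLite database (an assumed stand-in for SQL Server). The column is called seqno to match the query, although the question's script names it Id, and the run length is a parameter so the grouping is easy to experiment with:

```python
import sqlite3

# The question's data as (seqno, name).
TESTSEQ = [(10, "A"), (15, "A"), (16, "A"), (17, "A"), (18, "A"),
           (9, "B"), (10, "B"), (13, "B"), (14, "B"),
           (6, "C"), (7, "C"), (9, "C"), (10, "C")]


def names_with_run(rows, min_len=4):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testseq (seqno INTEGER, name TEXT)")
    con.executemany("INSERT INTO testseq VALUES (?, ?)", rows)
    # Along a run of consecutive seqno values, rn and seqno both grow by 1,
    # so rn - seqno is constant and identifies the island.
    sql = """
        SELECT DISTINCT name
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY name ORDER BY seqno) AS rn
              FROM testseq t) t
        GROUP BY name, rn - seqno
        HAVING COUNT(*) >= ?
        ORDER BY name
    """
    names = [name for (name,) in con.execute(sql, (min_len,))]
    con.close()
    return names


print(names_with_run(TESTSEQ))
```

With min_len=2, B and C qualify as well, since each has two runs of length 2.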

I don't really view this as a "gaps-and-islands" problem. You are just looking for a minimum number of adjacent rows. This is easily handled using lag() or lead():
select t.*
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
This looks three rows ahead within the same name. If that row's seqno is exactly seqno + 3, the four rows starting at the current one form an unbroken run (assuming seqno values are unique per name).
If you just want the names, without duplicates:
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
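A quick way to try the lead() version is the same kind of in-memory SQLite sketch (SQLite 3.25+ assumed; the table is named testseq and the column seqno to match the answers, while the question's script calls it Id):

```python
import sqlite3

# The question's data as (seqno, name).
TESTSEQ = [(10, "A"), (15, "A"), (16, "A"), (17, "A"), (18, "A"),
           (9, "B"), (10, "B"), (13, "B"), (14, "B"),
           (6, "C"), (7, "C"), (9, "C"), (10, "C")]


def names_with_run_lead(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testseq (seqno INTEGER, name TEXT)")
    con.executemany("INSERT INTO testseq VALUES (?, ?)", rows)
    # If the row three positions ahead (same name) is exactly seqno + 3,
    # the four rows in between are consecutive.
    sql = """
        SELECT DISTINCT name
        FROM (SELECT t.*,
                     LEAD(seqno, 3) OVER (PARTITION BY name ORDER BY seqno)
                         AS seqno_name_3
              FROM testseq t) t
        WHERE seqno_name_3 = seqno + 3
        ORDER BY name
    """
    names = [name for (name,) in con.execute(sql)]
    con.close()
    return names


print(names_with_run_lead(TESTSEQ))
```

For A, the row with seqno 15 sees 18 three rows ahead, so A is returned; B and C never find a row three ahead that matches.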
If the sequence numbers can have gaps (but are otherwise adjacent):
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3,
lead(seqno, 3) over (order by seqno) as seqno_3
from t
) t
where seqno_name_3 = seqno_3;

A solution in plain SQL, no LAG() or LEAD() or ROW_NUMBER():
SELECT t1.Name
FROM testseq t1
WHERE (
SELECT count(t2.Id)
FROM testseq t2
WHERE t2.Name=t1.Name
and t2.Id between t1.Id and t1.Id+3
GROUP BY t2.Name)>=4
GROUP BY t1.Name;
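The correlated-subquery idea can be sketched on an in-memory SQLite database as well. One deliberate deviation from the answer: this sketch counts DISTINCT Id values, so duplicate rows for the same Id cannot inflate the count. The run length parameter and function name are illustrative:

```python
import sqlite3

# The question's script uses (Id, Name).
TESTSEQ = [(10, "A"), (15, "A"), (16, "A"), (17, "A"), (18, "A"),
           (9, "B"), (10, "B"), (13, "B"), (14, "B"),
           (6, "C"), (7, "C"), (9, "C"), (10, "C")]


def names_with_run_no_windows(rows, min_len=4):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testseq (Id INTEGER, Name TEXT)")
    con.executemany("INSERT INTO testseq VALUES (?, ?)", rows)
    # For each row, count the distinct Ids of the same Name in
    # [Id, Id + min_len - 1]; the window is full only if every value is present.
    sql = """
        SELECT t1.Name
        FROM testseq t1
        WHERE (SELECT COUNT(DISTINCT t2.Id)
               FROM testseq t2
               WHERE t2.Name = t1.Name
                 AND t2.Id BETWEEN t1.Id AND t1.Id + ? - 1) >= ?
        GROUP BY t1.Name
        ORDER BY t1.Name
    """
    names = [name for (name,) in con.execute(sql, (min_len, min_len))]
    con.close()
    return names


print(names_with_run_no_windows(TESTSEQ))
```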

Rows Columns Traverse

I have data in the following format:
id | idnew
1 | 2
3 | 4
2 |
4 | 7
6 | 8
7 |
The result should be something like this, with each id followed by its idnew:
1
2
3
4
2
4
7
6
8
7
Thanks in advance

This returns all id values in id order, followed by all non-NULL idnew values in idnew order:
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNumber
FROM myTable
UNION ALL
SELECT idnew, ROW_NUMBER() OVER (ORDER BY idnew) +
(SELECT COUNT(*) FROM dbo.myTable) AS RowNumber
FROM myTable
WHERE idnew IS NOT NULL
) a
ORDER BY RowNumber
This assumes the id column is not nullable.
NOTE: If you want to keep the NULL values from the idnew column AND maintain the order, then remove the WHERE clause and ORDER BY id in the second select:
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNumber
FROM myTable
UNION ALL
SELECT idnew, ROW_NUMBER() OVER (ORDER BY id) +
(SELECT COUNT(*) FROM dbo.myTable) AS RowNumber
FROM myTable
) a
ORDER BY RowNumber

This is fully tested, try it here: https://rextester.com/DVZXO21058
Setting up the table as you described:
CREATE TABLE myTable (id INT, idnew INT);
INSERT INTO myTable (id, idnew)
VALUES (1, 2),
(3, 4),
(2, NULL),
(4, 7),
(6, 8),
(7, NULL);
SELECT * FROM myTable;
Here is the query to do the trick:
SELECT mixed_id FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS row_num,
id,
idnew
FROM myTable
) AS x
UNPIVOT
(
mixed_id for item in (id, idnew)
) AS y
WHERE mixed_id IS NOT NULL
ORDER BY row_num, item;
To keep the query simple, this takes advantage of the fact that the string 'id' sorts ahead of 'idnew' in the item column. I don't believe string ordering is the key issue here.
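The underlying idea, sorting first by source row and then by position within the row, can be sketched without UNPIVOT. This sketch uses SQLite, which has no UNPIVOT or CROSS APPLY, so it unions the two columns with an explicit position tag; it also assumes SQLite's rowid (insertion order) is an acceptable stand-in for "row order", which SQL Server does not guarantee:

```python
import sqlite3

# The question's rows as (id, idnew); None stands for NULL.
MYTABLE = [(1, 2), (3, 4), (2, None), (4, 7), (6, 8), (7, None)]


def interleave(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (id INTEGER, idnew INTEGER)")
    con.executemany("INSERT INTO myTable (id, idnew) VALUES (?, ?)", rows)
    # Tag each value with its source row (rowid) and its position in the row
    # (0 = id, 1 = idnew), drop NULLs, then sort on both tags.
    sql = """
        SELECT v
        FROM (SELECT rowid AS r, 0 AS pos, id AS v FROM myTable
              UNION ALL
              SELECT rowid, 1, idnew FROM myTable) u
        WHERE v IS NOT NULL
        ORDER BY r, pos
    """
    values = [v for (v,) in con.execute(sql)]
    con.close()
    return values


print(interleave(MYTABLE))
```

The explicit pos column plays the same role as ordering by item ('id' before 'idnew') in the UNPIVOT query.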

Using Cross Apply
;WITH CTE (id,idnew)
AS
(
SELECT 1,2 UNION ALL
SELECT 3,4 UNION ALL
SELECT 2,NULL UNION ALL
SELECT 4,7 UNION ALL
SELECT 6,8 UNION ALL
SELECT 7,NULL
)
SELECT New
FROM CTE
CROSS APPLY (VALUES (id), (idnew)) AS Dt (New)
WHERE Dt.New IS NOT NULL
Result
New
---
1
2
3
4
2
4
7
6
8
7

Fluently adding values in t-sql

I have a table like this:
Items Date Price
1 2016-01-01 10
1 2016-01-02 15
1 2016-01-03 null
1 2016-01-04 null
1 2016-01-05 8
1 2016-01-06 null
1 2016-01-07 null
1 2016-01-08 null
2 2016-01-01 14
2 2016-01-02 7
2 2016-01-03 null
2 2016-01-04 null
2 2016-01-05 16
2 2016-01-06 null
2 2016-01-07 null
2 2016-01-08 5
Now I want to update the NULL values: the difference between the prices before and after each run of NULLs must be spread evenly across it.
Example:
1 2016-01-02 15 to
1 2016-01-05 8
15 to 8 = -7
-7 / 3 = -2.333333
1 2016-01-02 15
1 2016-01-03 12.666667
1 2016-01-04 10.333333
1 2016-01-05 8
It shouldn't be done with cursors. Helper tables would be OK.

This is really where you want the ignore nulls option on lag() and lead(). Alas.
An alternative is to use outer apply:
select t.*,
coalesce(t.price,
tprev.price +
datediff(day, tprev.date, t.date) * (tnext.price - tprev.price) * 1.0 / datediff(day, tprev.date, tnext.date)
) as est_price
from t outer apply
(select top 1 t2.*
from t t2
where t2.Items = t.Items and
t2.date <= t.date and
t2.price is not null
order by t2.date desc
) tprev outer apply
(select top 1 t2.*
from t t2
where t2.Items = t.Items and
t2.date >= t.date and
t2.price is not null
order by t2.date asc
) tnext ;
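The arithmetic takes the difference between the next and previous known prices, divides it by the number of days between them, and multiplies by the number of days since the previous known price. The * 1.0 avoids integer division when Price is an integer column.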
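The same arithmetic can be sketched in plain Python to sanity-check the numbers from the question. This illustrates the formula only, not T-SQL; the (item, day, price) row layout and the function name are assumptions:

```python
from datetime import date


def interpolate(rows):
    """rows: list of (item, day, price) with price None for missing values.

    Each None becomes previous + elapsed_days * (next - previous) / total_days,
    using the nearest known prices of the same item before and after it;
    it stays None when either neighbour is missing.
    """
    result = []
    for item, day, price in rows:
        if price is None:
            known = [(d, p) for i, d, p in rows if i == item and p is not None]
            before = [k for k in known if k[0] < day]
            after = [k for k in known if k[0] > day]
            if before and after:
                d0, p0 = max(before)  # latest known price before this day
                d1, p1 = min(after)   # earliest known price after this day
                price = p0 + (day - d0).days * (p1 - p0) / (d1 - d0).days
        result.append((item, day, price))
    return result


d = lambda n: date(2016, 1, n)
data = [(1, d(1), 10), (1, d(2), 15), (1, d(3), None), (1, d(4), None),
        (1, d(5), 8), (1, d(6), None), (1, d(7), None), (1, d(8), None),
        (2, d(1), 14), (2, d(2), 7), (2, d(3), None), (2, d(4), None),
        (2, d(5), 16), (2, d(6), None), (2, d(7), None), (2, d(8), 5)]
for row in interpolate(data):
    print(row)
```

On the question's data this gives about 12.67 and 10.33 for item 1 on 2016-01-03 and 2016-01-04, matching the example, and leaves item 1's last three days as None because there is no later price.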

WITH T1 AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Items ORDER BY Date) AS RN,
FORMAT(ROW_NUMBER() OVER (PARTITION BY Items ORDER BY Date),'D10') + FORMAT(Price,'0000000000.000000') AS RnPr
FROM YourTable
), T2 AS
(
SELECT *,
MAX(RnPr) OVER (PARTITION BY Items ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev,
MIN(RnPr) OVER (PARTITION BY Items ORDER BY Date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next
FROM T1
), T3 AS
(
SELECT Items,
Date,
Price,
RnPr,
InterpolatedPrice = IIF(Price IS NOT NULL,prevPrice,prevPrice + (RN - prevRN) * (nextPrice - prevPrice)/NULLIF(nextRN - prevRN,0))
FROM T2
CROSS APPLY (VALUES(CAST(SUBSTRING(prev,11,17) AS decimal(16,6)),
CAST(LEFT(prev, 10) AS INT),
CAST(SUBSTRING(next,11,17) AS decimal(16,6)),
CAST(LEFT(next, 10) AS INT)
)) V(prevPrice,prevRN,nextPrice,nextRN)
)
--UPDATE T3 SET Price = InterpolatedPrice
SELECT *
FROM T3
ORDER BY Items,
Date
The row number and price are bundled together in a single column (RnPr above), and sorting RnPr gives the same order as sorting by row number. RnPr is NULL whenever Price is NULL, and MIN and MAX both ignore NULLs. So MAX(RnPr) over the frame BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW finds the previous non-NULL price (or the current one, if it isn't NULL). Similarly, MIN(RnPr) over the frame BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING finds the next one.
This can then be cracked apart to recover the price and row number, as in T3 above.
If you are happy with the results, remove the final SELECT and uncomment the UPDATE.
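The packing trick can be sketched in Python to see why it works. The 10-digit row number and 17-character price widths copy the FORMAT patterns above; the function names are illustrative, and the loops emulate the running MAX and MIN window frames rather than calling any SQL engine:

```python
def pack(rn, price):
    # 10-digit row number + 17-character price, like
    # FORMAT(rn, 'D10') + FORMAT(Price, '0000000000.000000').
    return f"{rn:010d}{price:017.6f}"


def unpack(key):
    # The LEFT(..., 10) / SUBSTRING(..., 11, 17) step: split the key again.
    return int(key[:10]), float(key[10:])


def neighbours(prices):
    """For each position return (prev, next): the (row number, price) of the
    nearest non-None price at or before / at or after it, or None."""
    keys = [None if p is None else pack(rn, p)
            for rn, p in enumerate(prices, start=1)]

    prev, running = [], None
    for k in keys:  # running MAX from UNBOUNDED PRECEDING to CURRENT ROW
        if k is not None and (running is None or k > running):
            running = k
        prev.append(running)

    nxt, running = [], None
    for k in reversed(keys):  # running MIN from CURRENT ROW to UNBOUNDED FOLLOWING
        if k is not None and (running is None or k < running):
            running = k
        nxt.append(running)
    nxt.reverse()

    return [(None if a is None else unpack(a), None if b is None else unpack(b))
            for a, b in zip(prev, nxt)]


print(neighbours([10, 15, None, None, 8, None]))
```

Because the row number comes first and is unique, string comparison of the packed keys is decided by the row number alone, so the price simply rides along with it.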

Just replace "YourTable" (4 of them) with your actual table name.
If you are happy with the results, then comment out the select and un-comment the UPDATE and WHERE.
Select A.Items,A.Date,
--Update YourTable Set
Price = IsNull(A.Price,((DateDiff(DD,B.Date,A.Date)/(DateDiff(DD,B.Date,C.Date)+0.0))*(C.Price - B.Price)) + B.Price)
From YourTable A
Outer Apply (Select Top 1 Date,Price from YourTable Where Items=A.Items and Date<A.Date and Price is not Null and A.Price is null Order by Date Desc) B
Outer Apply (Select Top 1 Date,Price from YourTable Where Items=A.Items and Date>A.Date and Price is not Null and A.Price is null Order by Date) C
--Where Price is NULL
Note that the prices for item 1 from 2016-01-06 to 2016-01-08 remain NULL, because there is no later price to interpolate toward.

You can do this by using a series of window functions in common table expressions.
T1 assigns a row number to each row (rn).
T2 finds the row numbers of the nearest non-NULL Price at or before and at or after the current row (rnb / rna).
The final SELECT calculates the adjusted Price by looking up those prices with lag() and lead().
declare @T table (Items int, Date date, Price float)
insert into @T (Items, Date, Price) values
(1, '2016-01-01', 10), (1, '2016-01-02', 15), (1, '2016-01-03', null), (1, '2016-01-04', null), (1, '2016-01-05', 8), (1, '2016-01-06', null), (1, '2016-01-07', null), (1, '2016-01-08', null), (2, '2016-01-01', 14), (2, '2016-01-02', 7), (2, '2016-01-03', null), (2, '2016-01-04', null), (2, '2016-01-05', 16), (2, '2016-01-06', null), (2, '2016-01-07', null), (2, '2016-01-08', 5)
;with T1 as
(
select *,
row_number() over (order by Items, Date) as rn
from @T
),
T2 as
(
select *,
max(case when price is null then null else rn end) over (partition by Items order by Date) as rnb,
min(case when price is null then null else rn end) over (partition by Items order by Date desc) as rna
from T1
)
select Items, Date,
isnull(price,
lag(Price, rn-rnb, Price) over (order by rn) -
(
lag(Price, rn-rnb, Price) over (order by rn) -
lead(Price, rna-rn, Price) over (order by rn)
) / nullif(rna-rnb, 0) * (rn-rnb)
) as Price
from T2
order by Items, Date

select based on specific values

I have this table:
ID NO.
111 6
222 7
333 9
111 8
333 4
222 3
111 7
222 5
333 2
I want to select only 2 rows for each ID from the table, where the NO. column equals specific values.
For example, I tried this query, but I didn't get the expected result:
SELECT top 2 * FROM mytable where [NO.] in
(select [NO.] from mytable )
Expected result:
111 6
111 8
222 7
222 3
333 9
333 3

You seem to want to select two rows in the table for each id, based on a condition on the NO. column. One method for this uses row_number():
select t.*
from (select t.*, row_number() over (partition by id order by id) as seqnum
from mytable t
where <condition goes here>
) t
where seqnum <= 2;
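As a sketch of the row_number() pattern on the question's data, here it is in an in-memory SQLite database. Two assumptions: "first two" means insertion order, which SQLite exposes as rowid (SQL Server has no equivalent, so you would order by a real column), and the NO. column is renamed to num, a hypothetical name chosen to avoid quoting:

```python
import sqlite3

# The question's rows as (ID, num); num stands in for the NO. column.
MYTABLE = [(111, 6), (222, 7), (333, 9), (111, 8), (333, 4),
           (222, 3), (111, 7), (222, 5), (333, 2)]


def first_two_per_id(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (ID INTEGER, num INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    # Number the rows of each ID in insertion order and keep the first two.
    sql = """
        SELECT ID, num
        FROM (SELECT ID, num,
                     ROW_NUMBER() OVER (PARTITION BY ID ORDER BY rowid) AS seqnum
              FROM mytable
              -- WHERE <condition goes here>
             ) t
        WHERE seqnum <= 2
        ORDER BY ID, seqnum
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(first_two_per_id(MYTABLE))
```

This returns 111/6, 111/8, 222/7, 222/3, 333/9 and 333/4, i.e. the expected result if (333, 3) was meant to be (333, 4).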

I'm guessing (333,3) is a mistake and you expect (333,2). If not I have no idea.
SELECT
ua.ID
, ua.[NO.]
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY t.[NO.] ASC) AS RowNum
, t.ID
, t.[NO.]
FROM dbo.t1 AS t
UNION ALL
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY t.[NO.] DESC)
, ID
, t.[NO.]
FROM dbo.t1 AS t
) ua
WHERE ua.RowNum = 1
ORDER BY ID, ua.[NO.] DESC
If you're just trying to get the top 2 values for each group, you need something to define the order, i.e., a third column. Then you don't need UNION ALL; just use WHERE ua.RowNum < 3.

/*Select 2 random rows per id where the number of rows per id can vary between 1 and infinity
A good article on this:*/
--https://www.mssqltips.com/sqlservertip/3157/different-ways-to-get-random-data-for-sql-server-data-sampling/
DECLARE @TABLE TABLE(ID INT,NO INT)
INSERT INTO @TABLE
VALUES
(111, 6),
(222, 7),
(333 , 9),
(111 , 8),
(333 , 4),
(222 , 3),
(111 , 7),
(222 , 5),
(333 , 2)
select t.* from
(
Select s.* ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY randomnumber) ROWNUMBER
from
(
SELECT ID,NO,
(ABS(CHECKSUM(NEWID())) % 100001) + ((ABS(CHECKSUM(NEWID())) % 100001) * 0.00001) [randomnumber]
FROM @TABLE
) s
) t
where t.rownumber < 3
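The same random-sampling idea can be sketched on an in-memory SQLite database, where ordering the window by SQLite's random() plays the role of the NEWID()-based random number. The num column name is a hypothetical stand-in for NO.:

```python
import sqlite3

# The question's rows as (ID, num).
MYTABLE = [(111, 6), (222, 7), (333, 9), (111, 8), (333, 4),
           (222, 3), (111, 7), (222, 5), (333, 2)]


def random_two_per_id(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (ID INTEGER, num INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    # A fresh random ordering inside each ID partition; keep the first two.
    sql = """
        SELECT ID, num
        FROM (SELECT ID, num,
                     ROW_NUMBER() OVER (PARTITION BY ID ORDER BY random()) AS rn
              FROM mytable) t
        WHERE rn < 3
        ORDER BY ID
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(random_two_per_id(MYTABLE))
```

Each run returns two rows per ID (or one, if an ID has only a single row), chosen at random.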
