SQL Server: Sum a specific number of rows based on another column

Here are the important columns in my table
ItemId RowID CalculatedNum
1 1 3
1 2 0
1 3 5
1 4 25
1 5 0
1 6 8
1 7 14
1 8 2
.....
The rowID increments to 141 before the ItemID increments to 2. This cycle repeats for about 122 million rows.
I need to SUM the CalculatedNum field in groups of 6. So sum 1-6, then 7-12, etc. I know I end up with an odd number at the end. I can discard the last three rows (numbers 139, 140 and 141). I need it to start the SUM cycle again when I get to the next ItemID.
I know I need to group by the ItemID but I am having trouble trying to figure out how to get SQL to SUM just 6 CalculatedNum's at a time. Everything else I have come across SUMs based on a column where the values are the same.
I did find something on Microsoft's site that used the ROW_NUMBER function but I couldn't quite make sense of it. Please let me know if this question is not clear.
Thank you

You need to group by (RowId - 1) / 6 and ItemId. Like this:
drop table if exists dbo.Items;
create table dbo.Items (
ItemId int
, RowId int
, CalculatedNum int
);
insert into dbo.Items (ItemId, RowId, CalculatedNum)
values (1, 1, 3), (1, 2, 0), (1, 3, 5), (1, 4, 25)
, (1, 5, 0), (1, 6, 8), (1, 7, 14), (1, 8, 2);
select
tt.ItemId
, sum(tt.CalculatedNum) as CalcSum
from (
select
*
, (t.RowId - 1) / 6 as Grp
from dbo.Items t
) tt
group by tt.ItemId, tt.Grp

You could use integer division and group by. SQL Server does not accept a SELECT alias in GROUP BY or HAVING, so the expression is repeated there.
SELECT ItemId, (RowId-1)/6 as Batch, sum(CalculatedNum) as CalcSum
FROM your_table GROUP BY ItemId, (RowId-1)/6
To discard incomplete batches:
SELECT ItemId, (RowId-1)/6 as Batch, sum(CalculatedNum) as CalcSum, count(*) as Cnt
FROM your_table GROUP BY ItemId, (RowId-1)/6 HAVING count(*) = 6
EDIT: Fix an off by one error.
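The batching arithmetic can be checked on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; dividing two integers truncates in both). The table name items and the second item's values are hypothetical: each item gets 8 rows, so each has one full batch of 6 and one incomplete batch of 2 that the HAVING clause drops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (ItemId INT, RowId INT, CalculatedNum INT)")

# Item 1 reuses the question's first 8 values; item 2 (hypothetical) doubles them.
values = [3, 0, 5, 25, 0, 8, 14, 2]
rows = [(item, i + 1, v * item) for item in (1, 2) for i, v in enumerate(values)]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

# (RowId - 1) / 6 is integer division: RowIds 1-6 -> batch 0, 7-12 -> batch 1, ...
batches = conn.execute(
    """
    SELECT ItemId, MIN(RowId) AS StartingRow, SUM(CalculatedNum) AS CalcSum
    FROM items
    GROUP BY ItemId, (RowId - 1) / 6
    HAVING COUNT(*) = 6          -- discard incomplete trailing batches
    ORDER BY ItemId, StartingRow
    """
).fetchall()

print(batches)
```

Only the batch starting at row 1 survives for each item, because rows 7 and 8 form a batch of just two rows.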

To make sure you are processing 6 rows at a time, you can try the modulo operator: https://technet.microsoft.com/fr-fr/library/ms173482(v=sql.110).aspx
Hope this can help.

Thanks everyone. This was really helpful.
Here is what we ended up with.
SELECT ItemID, MIN(RowID) AS StartingRow, SUM(CalculatedNum)
FROM dbo.table
GROUP BY ItemID, (RowID - 1) / 6
ORDER BY ItemID, StartingRow
I am not sure why it did not like the integer division in the select statement but I checked the results against a sample of the data and the math is correct.

Related

Partially sort an SQL table according to column values

Suppose I have a table that looks like this:
product, color
1, 1
2, 1
3, 1
4, 2
5, 2
6, 2
7, 3
8, 3
Would it be possible to rearrange the table so that the products alternate by color? For example, in this case the answer would be:
product, color
1, 1
4, 2
7, 3
2, 1
5, 2
8, 3
3, 1
6, 2
Sure, you can order by NEWID()
Let's make the test data;
IF OBJECT_ID('tempdb..#TestData') IS NOT NULL DROP TABLE #TestData
GO
CREATE TABLE #TestData (product int, colour int)
INSERT INTO #TestData (product, colour)
VALUES
(1,1)
,(2,1)
,(3,1)
,(4,2)
,(5,2)
,(6,2)
,(7,3)
,(8,3)
Then run the query on this;
SELECT
product
,colour
FROM #TestData
ORDER BY NEWID()
Which gives a random order of the data like this;
product colour
4 2
1 1
5 2
7 3
6 2
3 1
2 1
8 3
Edit: I've just seen that you seem to want to order with some pattern in the colour column, not random. I'm going to leave this answer anyway as a random result.
I would select a random number as third column and sort by that random number. In pseudo code:
SELECT PRODUCT,
COLOR,
RANDOM_NUMBER()
FROM YOUR_TABLE
ORDER BY 3
The generation of a random number depends on your database. In Oracle, it would be dbms_random.random.
You can get rid of the random number by re-selecting from the table as follows:
SELECT PRODUCT,
COLOR
FROM (SELECT PRODUCT,
COLOR,
RANDOM_NUMBER()
FROM YOUR_TABLE
ORDER BY 3)
Sounds like a job for row_number:
SELECT product, color, ROW_NUMBER() OVER (PARTITION BY color ORDER BY product) AS rn
FROM YourTable
ORDER BY rn, color
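To see the interleaving on the question's data, here is a minimal sketch that runs the same idea through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and window functions need SQLite 3.25 or newer; the table name products and the alias rn are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product INT, color INT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 3), (8, 3)],
)

# rn numbers the products within each color; sorting by rn first interleaves the colors.
interleaved = conn.execute(
    """
    SELECT product, color
    FROM (
        SELECT product, color,
               ROW_NUMBER() OVER (PARTITION BY color ORDER BY product) AS rn
        FROM products
    )
    ORDER BY rn, color
    """
).fetchall()

print(interleaved)
```

Every color contributes its first product, then its second, and so on; colors that run out of products simply drop out of later rounds.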

Delete rows per user and eventType keeping N rows

I want to delete the oldest entries in a table and keep N rows. This is fairly simple to do.
DELETE TOP(1000) from TABLE ORDER BY [date] DESC
But I want to delete rows based on User and EventType. So if we set N=50, I want to keep the newest 50 records per User and EventType.
My Table looks like this:
API_eventLog
- uid (PK, int, not null)
- eventTypeID (FK, int, not null)
- userGUID (FK, uniqueIdentifier, not null)
- date (datetime, not null)
- ...
There is a similar question already on SO that has the following answer, but unfortunately it is for SQL Server 2005.
;WITH Dealers AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY DealerID ORDER BY SomeTimeStamp DESC) RowID
FROM MyDealersTable
)
DELETE
FROM Dealers
WHERE RowID > 50
Is there a solution to this in SQL Server 2000 with good performance? All I can think of myself is a cursor-based solution, but that is way too slow to be executed frequently.
Example Data:
[uid] [EventTypeID] [userGUID] [date]
1 1 5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B 2013-11-17
2 2 5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B 2013-11-17
3 3 5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B 2013-11-18
4 4 5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B 2013-11-18
5 1 5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B 2013-11-19
6 1 5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B 2013-11-22
7 1 17941D18-CC79-4C29-BBBA-9CBE60993E43 2013-11-06
8 2 17941D18-CC79-4C29-BBBA-9CBE60993E43 2013-11-17
9 3 17941D18-CC79-4C29-BBBA-9CBE60993E43 2013-12-01
10 2 17941D18-CC79-4C29-BBBA-9CBE60993E43 2013-12-07
11 2 17941D18-CC79-4C29-BBBA-9CBE60993E43 2013-12-18
12 1 17941D18-CC79-4C29-BBBA-9CBE60993E43 2013-12-20
In the above example, given N=2, I would like to delete the rows with [uid] 1 and 8 (i.e. keeping the 2 newest rows per User and EventTypeID).
The following query should simulate the ROW_NUMBER function in SQL Server 2000.
DELETE API_eventLog
WHERE uid IN
(SELECT
a1.uid
FROM API_eventLog AS a1
WHERE
-- number of newer rows for the same user and event type
(SELECT COUNT(1)
FROM API_eventLog AS a2
WHERE a2.userGUID = a1.userGUID
AND a2.eventTypeID = a1.eventTypeID
AND (a2.date > a1.date)) >= 1);
SELECT * FROM API_eventLog;
Replace the 1 with 50, or whatever number of rows you want to keep.
I can't test it in SQL Server 2000, but here is a working example in SQL Server 2008.
EDIT:
Seems like I made a small mistake. The date check should be greater than instead of less than if you want to keep the newest rows. I updated my example.
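The counting approach can be tried on the question's sample data. This is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server 2000 (an assumption; only the correlated subquery matters here). The GUIDs are shortened, and the last row is given uid 12 so that uid stays unique. A row is deleted when at least N newer rows exist for the same user and event type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE API_eventLog (uid INT PRIMARY KEY, eventTypeID INT, userGUID TEXT, date TEXT)"
)
A, B = "5B1DCB9D", "17941D18"  # shortened stand-ins for the two GUIDs
conn.executemany(
    "INSERT INTO API_eventLog VALUES (?, ?, ?, ?)",
    [
        (1, 1, A, "2013-11-17"), (2, 2, A, "2013-11-17"), (3, 3, A, "2013-11-18"),
        (4, 4, A, "2013-11-18"), (5, 1, A, "2013-11-19"), (6, 1, A, "2013-11-22"),
        (7, 1, B, "2013-11-06"), (8, 2, B, "2013-11-17"), (9, 3, B, "2013-12-01"),
        (10, 2, B, "2013-12-07"), (11, 2, B, "2013-12-18"), (12, 1, B, "2013-12-20"),
    ],
)

keep = 2  # N: number of newest rows to keep per (userGUID, eventTypeID)

# Delete every row that has at least `keep` newer rows in its own group.
conn.execute(
    """
    DELETE FROM API_eventLog
    WHERE uid IN (
        SELECT a1.uid
        FROM API_eventLog AS a1
        WHERE (SELECT COUNT(*)
               FROM API_eventLog AS a2
               WHERE a2.userGUID = a1.userGUID
                 AND a2.eventTypeID = a1.eventTypeID
                 AND a2.date > a1.date) >= ?
    )
    """,
    (keep,),
)

remaining = [row[0] for row in conn.execute("SELECT uid FROM API_eventLog ORDER BY uid")]
print(remaining)
```

Rows 1 and 8 are removed, as the question expects. Note that rows sharing the same date within a group do not count as newer than each other, so ties can leave more than N rows behind.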
Sorry @David, I had to step away; mine is almost the same. I have assumed a row number generated per userGUID and per EventTypeID, in ascending date order. Also, in real life a given userGUID and event type may not have 50 or more rows. What should happen in that case?
Declare @API_eventLog table(uid int,EventTypeID int,userGUID uniqueIdentifier,date1 date)
insert into @API_eventLog
values(1, 1, '5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B', '2013-11-17'),
(2, 2, '5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B', '2013-11-17'),
(3, 3, '5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B', '2013-11-18'),
(4, 4, '5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B', '2013-11-18'),
(5, 1, '5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B', '2013-11-19'),
(6, 1, '5B1DCB9D-4EC7-4AAE-BEB1-DC1EA90EA06B', '2013-11-22'),
(7, 1, '17941D18-CC79-4C29-BBBA-9CBE60993E43', '2013-11-06'),
(8, 2, '17941D18-CC79-4C29-BBBA-9CBE60993E43', '2013-11-17'),
(9, 3, '17941D18-CC79-4C29-BBBA-9CBE60993E43', '2013-12-01'),
(10, 2, '17941D18-CC79-4C29-BBBA-9CBE60993E43', '2013-12-07'),
(11, 2, '17941D18-CC79-4C29-BBBA-9CBE60993E43', '2013-12-18'),
(12, 1, '17941D18-CC79-4C29-BBBA-9CBE60993E43', '2013-12-20')
Declare @i int=1
--you can test this other sample data
select * from
(select *,
(select count(*) from @API_eventLog b where b.userGUID=a.userGUID and a.EventTypeID=b.EventTypeID and b.date1<=a.date1) rn
from @API_eventLog a)t4
where rn<=@i
-- you can perform this
delete from t4 from
(select *,
(select count(*) from @API_eventLog b where b.userGUID=a.userGUID and a.EventTypeID=b.EventTypeID and b.date1<=a.date1) rn
from @API_eventLog a)t4
where rn<=@i

MSSQL ORDER BY Passed List

I am using Lucene to perform queries on a subset of SQL data, which returns a scored list of RecordIDs, e.g. 11,5,3,25,30.
I want to use this list to retrieve a set of results from the full SQL Table by RecordIDs.
So SELECT * FROM MyFullRecord
where RecordID in (11,5,3,25,30)
I would like the retrieved list to maintain the scored order.
I can do it by using an Order by like so;
ORDER BY (CASE WHEN RecordID = 11 THEN 0
WHEN RecordID = 5 THEN 1
WHEN RecordID = 3 THEN 2
WHEN RecordID = 25 THEN 3
WHEN RecordID = 30 THEN 4
END)
I am concerned about the load on the server, especially if I am passing long lists of RecordIDs. Does anyone have experience of this, or how can I determine an optimum list length?
Are there any other ways to achieve this functionality in MSSQL?
Roger
You can record your list into a table or table variable with sorting priorities.
And then join your table with this sorting one.
DECLARE @tSortOrder TABLE (RecordID INT, SortOrder INT)
INSERT INTO @tSortOrder (RecordID, SortOrder)
SELECT 11, 1 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 3, 3 UNION ALL
SELECT 25, 4 UNION ALL
SELECT 30, 5
SELECT *
FROM yourTable T
INNER JOIN @tSortOrder S ON T.RecordID = S.RecordID
ORDER BY S.SortOrder
Instead of creating a searched order by statement, you could create an in memory table to join. It's easier on the eyes and definitely scales better.
SQL Statement
SELECT mfr.*
FROM MyFullRecord mfr
INNER JOIN (
SELECT *
FROM (VALUES (1, 11),
(2, 5),
(3, 3),
(4, 25),
(5, 30)
) q(ID, RecordID)
) q ON q.RecordID = mfr.RecordID
ORDER BY
q.ID
Look here for a fiddle
Something like:
SELECT * FROM MyFullRecord where RecordID in (11,5,3,25,30)
ORDER BY
CHARINDEX(','+CAST(RecordID AS varchar)+',',
','+'11,5,3,25,30'+',')
SQLFiddle demo
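When the ID list comes from application code, the join-against-an-ordered-list idea can be built with bound parameters instead of string concatenation. The following is a rough sketch in Python with sqlite3 standing in for SQL Server (an assumption); the Title column, the sample rows, and the CTE name wanted are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyFullRecord (RecordID INT PRIMARY KEY, Title TEXT)")
conn.executemany(
    "INSERT INTO MyFullRecord VALUES (?, ?)",
    [(i, f"record {i}") for i in range(1, 41)],
)

scored_ids = [11, 5, 3, 25, 30]  # order as returned by the search engine

# One "(?, ?)" pair per ID: (position, RecordID). All values are bound as parameters.
placeholders = ", ".join("(?, ?)" for _ in scored_ids)
params = [x for pos_id in enumerate(scored_ids) for x in pos_id]

ordered = conn.execute(
    f"""
    WITH wanted(pos, RecordID) AS (VALUES {placeholders})
    SELECT r.RecordID
    FROM MyFullRecord AS r
    JOIN wanted AS w ON w.RecordID = r.RecordID
    ORDER BY w.pos
    """,
    params,
).fetchall()

result_ids = [row[0] for row in ordered]
print(result_ids)
```

The inner join doubles as the filter, so no separate IN list is needed. Database drivers cap the number of bound parameters per statement, so very long lists may be better loaded into a temporary table first.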

In MySQL, how can I select multiple rows and have them returned in the order I specified?

I know I can select multiple rows like this:
select * FROM table WHERE id in (1, 2, 3, 10, 100);
And I get the results returned in order: 1, 2, 3, 10, 100
But, what if I need to have the results returned in a specific order? When I try this:
select * FROM table WHERE id in (2, 100, 3, 1, 10);
I still get the results returned in the same order: 1, 2, 3, 10, 100
Is there a way to get the results returned in the exact order that I ask for?
(There are limitations due to the way the site is set up that won't allow me to ORDER BY using another field value)
From the way you worded that, I'm not sure whether using ORDER BY is completely impossible or whether you just can't order by some other field. So, at the risk of submitting a useless answer, this is how you'd typically order your results in such a situation.
SELECT *
FROM table
WHERE id in (2, 100, 3, 1, 10)
ORDER BY FIELD (id, 2, 100, 3, 1, 10)
Unless you are able to do ORDER BY, there is no guaranteed way.
The sort you are getting is due to the way MySQL executes the query: it combines all range scans over the ranges defined by the IN list into a single range scan.
Usually, you force the order using one of these ways:
Create a temporary table with the value and the sorter, fill it with your values and order by the sorter:
CREATE TABLE t_values (value INT NOT NULL PRIMARY KEY, sorter INT NOT NULL);
INSERT
INTO t_values
VALUES
(2, 1),
(100, 2),
(3, 3),
(1, 4),
(10, 5);
SELECT m.*
FROM t_values v
JOIN mytable m
ON m.id = v.value
ORDER BY
sorter
Do the same with an in-place rowset:
SELECT m.*
FROM (
SELECT 2 AS value, 1 AS sorter
UNION ALL
SELECT 100 AS value, 2 AS sorter
UNION ALL
SELECT 3 AS value, 3 AS sorter
UNION ALL
SELECT 1 AS value, 4 AS sorter
UNION ALL
SELECT 10 AS value, 5 AS sorter
) v
JOIN mytable m
ON m.id = v.value
ORDER BY
sorter
Use CASE clause:
SELECT *
FROM mytable m
WHERE id IN (1, 2, 3, 10, 100)
ORDER BY
CASE id
WHEN 2 THEN 1
WHEN 100 THEN 2
WHEN 3 THEN 3
WHEN 1 THEN 4
WHEN 10 THEN 5
END
You can impose an order, but only based on the value(s) of one or more columns.
To get the rows back in the order you specify in the example you would need to add a second column, called a "sortkey" whose values can be used to sort the rows in the desired sequence,
using the ORDER BY clause. In your example:
Value Sortkey
----- -------
1 4
2 1
3 3
10 5
100 2
select value FROM table where ... order by sortkey;
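If the list of ids is only known at run time, the CASE-based ordering can be generated from it. Below is a small sketch in Python using sqlite3 as a stand-in for MySQL (an assumption; FIELD() is MySQL-specific, while CASE is portable). The table name mytable follows the answers above, and the extra id 7 is a made-up row that should not appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT PRIMARY KEY)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in (1, 2, 3, 10, 100, 7)])

wanted = [2, 100, 3, 1, 10]  # desired output order

in_list = ", ".join("?" for _ in wanted)
# One "WHEN ? THEN <position>" arm per id; positions are generated integers, ids are bound.
case_arms = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(wanted)))

sql = f"SELECT id FROM mytable WHERE id IN ({in_list}) ORDER BY CASE id {case_arms} END"
# The IN placeholders come first in the statement, then the CASE placeholders.
ids = [row[0] for row in conn.execute(sql, wanted + wanted)]

print(ids)
```

The ids are passed twice because the statement contains two sets of placeholders: one for the IN filter and one for the CASE arms.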

Continuous sequences in SQL

Having a table with the following fields:
Order,Group,Sequence
it is required that all orders in a given group form a continuous sequence. For example: 1,2,3,4 or 4,5,6,7. How can I check, using a single SQL query, which orders do not comply with this rule? Thank you.
Example data:
Order Group Sequence
1 1 3
2 1 4
3 1 5
4 1 6
5 2 3
6 2 4
7 2 6
Expected result:
Order
5
6
7
Also accepted if the query returns only the group which has the wrong sequence, 2 for the example data.
Assuming that the sequences are generated and therefore cannot be duplicated:
SELECT [Group]
FROM theTable
GROUP BY [Group]
HAVING MAX(Sequence) - MIN(Sequence) <> (COUNT(*) - 1);
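The check relies on the fact that n distinct integers are consecutive exactly when max - min = n - 1. A minimal sketch on the question's example data, using Python's sqlite3 module as a stand-in for the target database (an assumption; the table name Orders is also assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Orders ("Order" INT, "Group" INT, Sequence INT)')
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 1, 6), (5, 2, 3), (6, 2, 4), (7, 2, 6)],
)

# Group 1: 6 - 3 = 3 and 4 rows - 1 = 3, so it passes.
# Group 2: 6 - 3 = 3 but 3 rows - 1 = 2, so it is reported.
broken_groups = [
    row[0]
    for row in conn.execute(
        """
        SELECT "Group"
        FROM Orders
        GROUP BY "Group"
        HAVING MAX(Sequence) - MIN(Sequence) <> COUNT(*) - 1
        ORDER BY "Group"
        """
    )
]
print(broken_groups)
```

If duplicates were possible, a group such as 3, 3, 5 would slip through, since 5 - 3 = 2 matches 3 rows - 1. Comparing COUNT(*) with COUNT(DISTINCT Sequence) in the same HAVING clause is one way to catch that case.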
How about this?
select Group from Table
group by Group
having count(Sequence) <= max(Sequence)-min(Sequence)
[Edit] This assumes that Sequence does not allow duplicates within a particular group. It might be better to use: count != max - min + 1
[Edit again] D'oh, still not perfect. Another query to flush out duplicates would take care of that though.
[Edit the last] The original query worked fine in sqlite, which is what I had available for a quick test. It is much more forgiving than SQL server. Thanks to Bell for the pointer.
Personally, I think I would consider rethinking the requirement. It is the nature of relational databases that gaps in sequences can easily occur due to records that are rolled back. For instance, suppose an order starts to create four items in it, but one fails for some reason and is rolled back. If you precomputed the sequences manually, you would then have a gap if the one rolled back is not the last one. In other scenarios, you might get a gap due to multiple users looking for sequence values at approximately the same time, or if at the last minute a customer deleted one record from the order. What are you honestly looking to gain from having contiguous sequences that you don't get from a parent-child relationship?
This SQL selects the orders 3 and 4, which have non-continuous sequences.
DECLARE @Orders TABLE ([Order] INTEGER, [Group] INTEGER, Sequence INTEGER)
INSERT INTO @Orders VALUES (1, 1, 0)
INSERT INTO @Orders VALUES (1, 2, 0)
INSERT INTO @Orders VALUES (1, 3, 0)
INSERT INTO @Orders VALUES (2, 4, 0)
INSERT INTO @Orders VALUES (2, 5, 0)
INSERT INTO @Orders VALUES (2, 6, 0)
INSERT INTO @Orders VALUES (3, 4, 0)
INSERT INTO @Orders VALUES (3, 6, 0)
INSERT INTO @Orders VALUES (4, 1, 0)
INSERT INTO @Orders VALUES (4, 2, 0)
INSERT INTO @Orders VALUES (4, 8, 0)
SELECT o1.[Order]
FROM @Orders o1
LEFT OUTER JOIN @Orders o2 ON o2.[Order] = o1.[Order] AND o2.[Group] = o1.[Group] + 1
WHERE o2.[Order] IS NULL
GROUP BY o1.[Order]
HAVING COUNT(*) > 1
So your table is in the form of
Order Group Sequence
1 1 4
1 1 5
1 1 7
..and you want to find out that 1,1,6 is missing?
With
select
min(Sequence) MinSequence,
max(Sequence) MaxSequence
from
Orders
group by
[Order],
[Group]
you can find out the bounds for a given Order and Group.
Now you can simulate the correct data by using a special numbers table, which just contains every single number you could ever use for a sequence. Here is a good example of such a numbers table. It's not important how you create it; you could also create an Excel file with all the numbers from x to y and import that Excel sheet.
In my example I assume such a numbers table called "Numbers" with only one column "n":
select
[Order],
[Group],
n Sequence
from
(select [Order], [Group], min(Sequence) MinSequence, max(Sequence) MaxSequence from Orders group by [Order], [Group]) MinMaxSequence
left join Numbers on n >= MinSequence and n <= MaxSequence
Put that SQL into a new view. In my example I will call the view "vwCorrectOrders".
This gives you the data where the sequences are continuous. Now you can join that data with the original data to find out which sequences are missing:
select
co.*
from
vwCorrectOrders co
left join Orders o on
co.[Order] = o.[Order]
and co.[Group] = o.[Group]
and co.Sequence = o.Sequence
where
o.Sequence is null
Should give you
Order Group Sequence
1 1 6
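The numbers-table steps can be folded into a single query for a quick check. This is a sketch under stated assumptions: Python's sqlite3 module stands in for SQL Server, the Numbers table covers 1 to 100 (an arbitrary range), and the view is replaced by an inline subquery named b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Orders ("Order" INT, "Group" INT, Sequence INT)')
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(1, 1, 4), (1, 1, 5), (1, 1, 7)])

# Helper table holding every candidate sequence number (1..100 here, an arbitrary range).
conn.execute("CREATE TABLE Numbers (n INT PRIMARY KEY)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(n,) for n in range(1, 101)])

missing = conn.execute(
    """
    SELECT b."Order", b."Group", Numbers.n AS Sequence
    FROM (
        -- bounds of the existing sequence per order and group
        SELECT "Order", "Group", MIN(Sequence) AS lo, MAX(Sequence) AS hi
        FROM Orders
        GROUP BY "Order", "Group"
    ) AS b
    JOIN Numbers ON Numbers.n BETWEEN b.lo AND b.hi
    LEFT JOIN Orders AS o
           ON o."Order" = b."Order" AND o."Group" = b."Group" AND o.Sequence = Numbers.n
    WHERE o.Sequence IS NULL     -- expected number with no matching row
    """
).fetchall()
print(missing)
```

Each expected number between the bounds is matched against the real rows, and only the unmatched ones are returned, which here is the single missing value 6.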
After a while I came up with the following solution. It seems to work but it is highly inefficient. Please add any improvement suggestions.
SELECT OrdMain.Order
FROM ((Orders AS OrdMain
LEFT OUTER JOIN Orders AS OrdPrev ON (OrdPrev.Group = OrdMain.Group) AND (OrdPrev.Sequence = OrdMain.Sequence - 1))
LEFT OUTER JOIN Orders AS OrdNext ON (OrdNext.Group = OrdMain.Group) AND (OrdNext.Sequence = OrdMain.Sequence + 1))
WHERE ((OrdMain.Sequence < (SELECT MAX(Sequence) FROM Orders OrdMax WHERE (OrdMax.Group = OrdMain.Group))) AND (OrdNext.Order IS NULL)) OR
((OrdMain.Sequence > (SELECT MIN(Sequence) FROM Orders OrdMin WHERE (OrdMin.Group = OrdMain.Group))) AND (OrdPrev.Order IS NULL))