Query to return all results except for the first record - sql

I have a archive table that has records of transactions per locationID.
A location will have 0, 1 or many rows in this table.
I need a SELECT query that will return rows for any location that has more than 1 row, and to skip the first entry.
e.g.
Transactions table
transactionId locationId amount
1 11 2343
2 11 23434
3 25 342
4 32 234
5 77 234
6 11 38938
7 43 234
8 43 1235
So given the above, since the locationID has multiple rows, I will get back all rows except for the first one (lowest transacton_id):
2 11 23434
6 11 38938
8 43 1235

You can use row_number to do this. This assumes there would be no duplicate transactionid's.
select transactionid,locationid,amount
from
(select t.*, row_number() over(partition by locationid order by transactionid) as rn
from transactions t) t
where rn > 1

The other answer is fine. You could also write it this way, it might give you a little insight into grouping practices:
SELECT Transactions.TransactionID, Transactions.locationID, Transactions.amount
FROM Transactions INNER JOIN
(SELECT locationID, MIN(TransactionID) AS MinTransaction,
COUNT(TransactionID) AS CountTransaction
FROM Transactions
GROUP BY locationID) TableSum ON Transactions.locationID = TableSum.locationID
WHERE (Transactions.TransactionID <> TableSum.MinTransaction) AND
(TableSum.CountTransaction > 1)

Related

Sql getting MAX and MIN values based on two columns for the ids from two others

I'm having difficulties figuring a query out, would someone be able to assist me with this?
Problem: 4 columns that represent results for the 2 separate tests. One of them taken in UK and another in US. Both of them are the same test and I need to find the highest and lowest score for the test taken in both countries. I also need to avoid using subqueries and temporary tables. Would appreciate theoretical ideas and actual solutions for the problem.
The table looks like this:
ResultID Test_UK Test_US Test_UK_Score Test_US_Score
1 1 2 48 11
2 4 1 21 24
3 3 1 55 71
4 5 6 18 78
5 7 4 19 49
6 1 3 23 69
7 5 2 98 35
8 6 7 41 47
The desired results I'm looking for:
TestID HighestScore LowestScore
1 71 23
2 35 11
3 69 55
4 49 21
5 98 18
6 78 41
7 47 19
I tried implementing a case of comparison, but I still ended up with subquery to pull out the final results. Also tried union, but it ends up in a sub query again. As far as I can think it shoul be a case when then query, but can't really come up with the logic for it, as it requires to match the ID's of the tests.
Thank you!
What I've tried and got the best results (still wrong)
select v.TestID,
max(case when Test_US_Score > Test_UK_Score then Test_UK_Score else null end) MaxS,
min(case when Test_UK_Score > Test_US_Score then Test_US_Score else null end) MinS
FROM ResultsDB rDB CROSS APPLY
(VALUES (Test_UK, 1), (Test_US, 0)
) V(testID, amount)
GROUP BY v.TestID
Extra
The answer provided by M. Kanarkowski is a perfect solution. I'm no expert on CTE, and a bit confused, how would it be possible to adapt this query to return the result ID of the row that min and max were found.
something like this:
TestID Result_ID_Max Result_ID_Min
1 3 6
2 7 1
3 6 3
Extra 2
The desired results of the query would me something like this.
The two last columns represent the IDs of the rows from the original table where the max and min values were found.
TestID HighestScore LowestScore Result_ID_Of_Max Result_ID_Of_Min
1 71 23 3 6
2 35 11 7 1
3 69 55 6 3
For example you can use union to have results from both countries togehter and then just pick the maximum and the minimum for your data.
with cte as (
select Test_UK as TestID, Test_UK_Score as score from yourTable
union all
select Test_US as TestID, Test_US_Score as score from yourTable
)
select
TestID
,max(score) as HighestScore
,min(score) as LowestScore
from cte
group by TestID
order by TestID
Extra:
I assumed that you want to have the additional column with the previous result. If not just take the above select and replace Test_UK_Score and Test_US_Score with ResultID.
with cte as (
select Test_UK as TestID, Test_UK_Score as score, ResultID from yourTable
union all
select Test_US as TestID, Test_US_Score as score, ResultID from yourTable
)
select
TestID
,max(score) as HighestScore
,min(score) as LowestScore
,max(ResultID) as Result_ID_Max
,min(ResultID) as Result_ID_Min
from cte
group by TestID
order by TestID

Filter and keep most recent duplicate

Please help me with this one, I'm stuck and cant figure out how to write my Query. I'm working with SQL Server 2014.
Table A (approx 65k ROWS) CEID = primary key
CEID State Checksum
1 2 666
2 2 666
3 2 666
4 2 333
5 2 333
6 9 333
7 9 111
8 9 111
9 9 741
10 2 656
Desired output
CEID State Checksum
3 2 666
6 9 333
8 9 111
9 9 741
10 2 656
I want to keep the row with highest CEID if "state" is equal for all duplicate checksums. If state differs but Checksum is equal i want to keep the row with highest CEID for State=9. Unique rows like CEID 9 and 10 should be included in result regardless of State.
This join returns all duplicates:
SELECT a1.*, a2.*
FROM tableA a1
INNER JOIN tableA a2 ON a1.ChecksumI = a2.ChecksumI
AND a1.CEID <> a2.CEID
I've also identified MAX(CEID) for each duplicate checksum with this query
SELECT a.Checksum, a.State, MAX(a.CEID) CEID_MAX ,COUNT(*) cnt
FROM tableA a
GROUP BY a.Checksum, a.State
HAVING COUNT(*) > 1
ORDER BY a.Checksum, a.State
With the first query, I can't figure out how to SELECT the row with the highest CEID per Checksum.
The problem I encounter with last one is that GROUP BY isn't allowed in subqueries when I try to join on it.
You can use row_number() with partition by checksum and order by State desc and CEID desc. Note both of your condition may be fulfill by ORDER BY State desc, CEID desc
And take the first row_number
;with
cte as
(
select *, rn = row_number() over (Partition by Checksum order by State desc, CEID desc)
from TableA
)
select *
from cte
where rn = 1
order by CEID;

How do I make a query that selects where the SUM equals a fixed value

I've spent that last couple of days searching for a way to make a SQL query that searches the database and returns records where the SUM of the same ID's equal or grater then the value provided.
For this I've been using the W3schools database to test it out in the products table.
More so what I've been trying to do:
SELECT * FROM products
WHERE supplierid=? and SUM(price) > 50
in the "where supplier id" would loop through same suppliers and sum of their price higher than 50 in this case return the records.
In this case it would read supplier ID 1 then add the price of all that supplier 18+19+10=47 now 47 < 50 so it will not print those records at the end. Next supplier ID 2 22+21.35=43.35 and again would not print those records until the sum of price is higher than 50 it will print
I'm working with a DB2 database.
SAMPLE data:
ProductID ProductName SupplierID CategoryID Price
1 Chais 1 1 18
2 Chang 1 1 19
3 Aniseed 1 2 10
4 Chef Anton 2 2 22
5 Chef Anton 2 2 21.35
6 Grandma's 3 2 25
7 Uncle Bob 3 7 30
8 Northwoods 3 2 40
9 Mishi 4 6 97
10 Ikura 4 8 31
11 Queso 5 4 21
12 Queso 5 4 38
13 Konbu 6 8 6
14 Tofu 6 7 23.25
How about:
select * from products where supplierid in (
select supplierid
from products
group by supplierid
having sum(price) > 50
);
The subquery finds out all the supplierid values that match your condition. The main (external) query retrieves all rows that match the list of supplierids.
not tested, but I would expect db2 to have analytic functions and CTEs, so perhaps:
with
basedata as (
select t.*
, sum(t.price) over(partition by t.supplierid) sum_price
from products t
)
select *
from basedata
where supplierid = ?
and sum_price > 50
The analytic function aggregates the price information but does not group the resultset, so you get the rows from your initial result, but restricted to those with an aggregated price value > 50.
The difference to a solution with a subquery is, that the use of the analytic function should be more efficient since it has to read the table only once to produce the result.

complex paratition sum in postgresql

I have tables as follow:
A deliveries
delveryid clientid deliverydate
1 10 2015-01-01
2 10 2015-02-02
3 11 2015-04-08
B items in deliveris
itemid deliveryid qty status
70 1 5 1
70 1 8 2
70 2 10 1
72 1 12 1
70 3 100 1
I need to add a column to my query that gives me the qty of each part in other deliveris of the same client.
meaning that for given data of client 10 and delivery id 1 I need to show:
itemid qty status qtyOther
70 5 1 10 //itemid 70 exists in delivery 2
70 8 2 10 //itemid 70 exists in delivery 2
72 12 1 0 //itemid 72 doesn't exists in other delivery of client 11
Since I need to add qtyOther to my existing qry i'm trying to avoid using Group By as it's a huge query and if I use SUM in select I will have to group by all items in select.
This is what I have so far:
Select ....., coalesce( SUM(a.qty) OVER (PARTITION BY a.itemid) ,0) AS qtyOther
FROM B b
LEFT JOIN A a USING
LEFT JOIN (other tables)
WHERE clientid=10 ....
This query gives me the total sum of qty per itemid for specific clientid, regardless of which delivery it is. How do I change it so it will consider the delivryid? I need something like:
coalesce( SUM(a.qty) OVER (PARTITION BY a.itemid) FROM B where deliveryid<>b.deliveryid ,0) AS qtyOther
Any suggestions how to do that?
Note: I can NOT change the condition in WHERE.
I think you just want to subtract out the total for the current delivery:
Select .....,
(coalesce( SUM(a.qty) OVER (PARTITION BY a.itemid), 0) -
coalesce( SUM(a.qty) OVER (PARTITION BY a.itemid, a.deliveryid), 0)
) as qtyOther

How to track how many times a column changed its value?

I have a table called crewWork as follows :
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int )
After the table was populated, I need to know how many times a change in apt occurred and how many times a change in floor occurred. Usually I expect to find 10 rows on each apt and 40-50 on each floor.
I could just write a scalar function for that, but I was wondering if there's any way to do that in t-SQL without having to write scalar functions.
Thanks
The data will look like this:
FloorNumber AptNumber WorkType simTime
1 1 12 10
1 1 12 25
1 1 13 35
1 1 13 47
1 2 12 52
1 2 12 59
1 2 13 68
1 1 14 75
1 4 12 79
1 4 12 89
1 4 13 92
1 4 14 105
1 3 12 115
1 3 13 129
1 3 14 138
2 1 12 142
2 1 12 150
2 1 14 168
2 1 14 171
2 3 12 180
2 3 13 190
2 3 13 200
2 3 14 205
3 3 14 216
3 4 12 228
3 4 12 231
3 4 14 249
3 4 13 260
3 1 12 280
3 1 13 295
2 1 14 315
2 2 12 328
2 2 14 346
I need the information for a report, I don't need to store it anywhere.
If you use the accepted answer as written now (1/6/2023), you get correct results with the OP dataset, but I think you can get wrong results with other data.
CONFIRMED: ACCEPTED ANSWER HAS A MISTAKE (as of 1/6/2023)
I explain the potential for wrong results in my comments on the accepted answer.
In this db<>fiddle, I demonstrate the wrong results. I use a slightly modified form of accepted answer (my syntax works in SQL Server and PostgreSQL). I use a slightly modified form of the OP's data (I change two rows). I demonstrate how the accepted answer can be changed slightly, to produce correct results.
The accepted answer is clever but needs a small change to produce correct results (as demonstrated in the above db<>fiddle and described here:
Instead of doing this as seen in the accepted answer COUNT(DISTINCT AptGroup)...
You should do thisCOUNT(DISTINCT CONCAT(AptGroup, '_', AptNumber))...
DDL:
SELECT * INTO crewWork FROM (VALUES
-- data from question, with a couple changes to demonstrate problems with the accepted answer
-- https://stackoverflow.com/q/8666295/1175496
--FloorNumber AptNumber WorkType simTime
(1, 1, 12, 10 ),
-- (1, 1, 12, 25 ), -- original
(2, 1, 12, 25 ), -- new, changing FloorNumber 1->2->1
(1, 1, 13, 35 ),
(1, 1, 13, 47 ),
(1, 2, 12, 52 ),
(1, 2, 12, 59 ),
(1, 2, 13, 68 ),
(1, 1, 14, 75 ),
(1, 4, 12, 79 ),
-- (1, 4, 12, 89 ), -- original
(1, 1, 12, 89 ), -- new , changing AptNumber 4->1->4 ges)
(1, 4, 13, 92 ),
(1, 4, 14, 105 ),
(1, 3, 12, 115 ),
...
DML:
;
WITH groupedWithConcats as (SELECT
*,
CONCAT(AptGroup,'_', AptNumber) as AptCombo,
CONCAT(FloorGroup,'_',FloorNumber) as FloorCombo
-- SQL SERVER doesnt have TEMPORARY keyword; Postgres doesn't understand # for temp tables
-- INTO TEMPORARY groupedWithConcats
FROM
(
SELECT
-- the columns shown in Andriy's answer:
-- https://stackoverflow.com/a/8667477/1175496
ROW_NUMBER() OVER ( ORDER BY simTime) as RN,
-- AptNumber
AptNumber,
ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as RN_Apt,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as AptGroup,
-- FloorNumber
FloorNumber,
ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as RN_Floor,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as FloorGroup
FROM crewWork
) grouped
)
-- if you want to see how the groupings work:
-- SELECT * FROM groupedWithConcats
-- otherwise just run this query to see the counts of "changes":
SELECT
COUNT(DISTINCT AptCombo)-1 as CountAptChangesWithConcat_Correct,
COUNT(DISTINCT AptGroup)-1 as CountAptChangesWithoutConcat_Wrong,
COUNT(DISTINCT FloorCombo)-1 as CountFloorChangesWithConcat_Correct,
COUNT(DISTINCT FloorGroup)-1 as CountFloorChangesWithoutConcat_Wrong
FROM groupedWithConcats;
ALTERNATIVE ANSWER
The accepted-answer may eventually get updated to remove the mistake. If that happens I can remove my warning but I still want leave you with this alternative way to produce the answer.
My approach goes like this: "check the previous row, if the value is different in previous row vs current row, then there is a change". SQL doesn't have idea or row order functions per se (at least not like in Excel for example; )
Instead, SQL has window functions. With SQL's window functions, you can use the window function RANK plus a self-JOIN technique as seen here to combine current row values and previous row values so you can compare them. Here is a db<>fiddle showing my approach, which I pasted below.
The intermediate table, showing the columns which has a value 1 if there is a change, 0 otherwise (i.e. FloorChange, AptChange), is shown at the bottom of the post...
DDL:
...same as above...
DML:
;
WITH rowNumbered AS (
SELECT
*,
ROW_NUMBER() OVER ( ORDER BY simTime) as RN
FROM crewWork
)
,joinedOnItself AS (
SELECT
rowNumbered.*,
rowNumberedRowShift.FloorNumber as FloorShift,
rowNumberedRowShift.AptNumber as AptShift,
CASE WHEN rowNumbered.FloorNumber <> rowNumberedRowShift.FloorNumber THEN 1 ELSE 0 END as FloorChange,
CASE WHEN rowNumbered.AptNumber <> rowNumberedRowShift.AptNumber THEN 1 ELSE 0 END as AptChange
FROM rowNumbered
LEFT OUTER JOIN rowNumbered as rowNumberedRowShift
ON rowNumbered.RN = (rowNumberedRowShift.RN+1)
)
-- if you want to see:
-- SELECT * FROM joinedOnItself;
SELECT
SUM(FloorChange) as FloorChanges,
SUM(AptChange) as AptChanges
FROM joinedOnItself;
Below see the first few rows of the intermediate table (joinedOnItself). This shows how my approach works. Note the last two columns, which have a value of 1 when there is a change in FloorNumber compared to FloorShift (noted in FloorChange), or a change in AptNumber compared to AptShift (noted in AptChange).
floornumber
aptnumber
worktype
simtime
rn
floorshift
aptshift
floorchange
aptchange
1
1
12
10
1
0
0
2
1
12
25
2
1
1
1
0
1
1
13
35
3
2
1
1
0
1
1
13
47
4
1
1
0
0
1
2
12
52
5
1
1
0
1
1
2
12
59
6
1
2
0
0
1
2
13
68
7
1
2
0
0
Note instead of using the window function RANK and JOIN, you could use the window function LAG to compare values in the current row to the previous row directly (no need to JOIN). I don't have that solution here, but it is described in the Wikipedia article example:
Window functions allow access to data in the records right before and after the current record.
If I am not missing anything, you could use the following method to find the number of changes:
determine groups of sequential rows with identical values;
count those groups;
subtract 1.
Apply the method individually for AptNumber and for FloorNumber.
The groups could be determined like in this answer, only there's isn't a Seq column in your case. Instead, another ROW_NUMBER() expression could be used. Here's an approximate solution:
;
WITH marked AS (
SELECT
FloorGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime),
AptGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime)
FROM crewWork
)
SELECT
FloorChanges = COUNT(DISTINCT FloorGroup) - 1,
AptChanges = COUNT(DISTINCT AptGroup) - 1
FROM marked
(I'm assuming here that the simTime column defines the timeline of changes.)
UPDATE
Below is a table that shows how the distinct groups are obtained for AptNumber.
AptNumber RN RN_Apt AptGroup (= RN - RN_Apt)
--------- -- ------ ---------
1 1 1 0
1 2 2 0
1 3 3 0
1 4 4 0
2 5 1 4
2 6 2 4
2 7 3 4
1 8 5 => 3
4 9 1 8
4 10 2 8
4 11 3 8
4 12 4 8
3 13 1 12
3 14 2 12
3 15 3 12
1 16 6 10
… … … …
Here RN is a pseudo-column that stands for ROW_NUMBER() OVER (ORDER BY simTime). You can see that this is just a sequence of rankings starting from 1.
Another pseudo-column, RN_Apt contains values produces by the other ROW_NUMBER, namely ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime). It contains rankings within individual groups of identical AptNumber values. You can see that, for a newly encountered value, the sequence starts over, and for a recurring one, it continues where it stopped last time.
You can also see from the table that if we subtract RN from RN_Apt (could be the other way round, doesn't matter in this situation), we get the value that uniquely identifies every distinct group of same AptNumber values. You might as well call that value a group ID.
So, now that we've got these IDs, it only remains for us to count them (count distinct values, of course). That will be the number of groups, and the number of changes is one less (assuming the first group is not counted as a change).
add an extra column changecount
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int ,changecount int)
increment changecount value for each updation
if want to know count for each field then add columns corresponding to it for changecount
Assuming that each record represents a different change, you can find changes per floor by:
select FloorNumber, count(*)
from crewWork
group by FloorNumber
And changes per apartment (assuming AptNumber uniquely identifies apartment) by:
select AptNumber, count(*)
from crewWork
group by AptNumber
Or (assuming AptNumber and FloorNumber together uniquely identifies apartment) by:
select FloorNumber, AptNumber, count(*)
from crewWork
group by FloorNumber, AptNumber