SQL to get first date and amount per account

I want to get back the date and amount of the first transaction per account in a transaction table. The table (GiftHeader) looks like this:
EntityID Date Amount
1 1/1/2027 00:00:00:00 1.00
1 2/1/2027 00:00:00:00 2.00
2 2/1/2027 00:00:00:00 4.00
2 3/1/2027 00:00:00:00 2.00
In this case, I would expect the following:
EntityID BatchDate Amount
1 1/1/2027 00:00:00:00 1.00
2 2/1/2027 00:00:00:00 4.00
Here's the SQL I'm using, which isn't working:
select DISTINCT entityid, min(BatchDate) as FirstGiftDate
from GiftHeader
group by EntityId,BatchDate
order by EntityId
Any help would be appreciated.
Regards,
Joshua Goodwin

You can use TOP 1 WITH TIES, as below:
Select top 1 with ties * from GiftHeader
order by row_number() over (partition by entityid order by [BatchDate])
The other, more traditional approach is:
Select * from (
Select *, RowN = row_number() over (partition by entityid order by BatchDate) from GiftHeader ) a
Where a.RowN = 1
Output:
+----------+-------------------------+--------+
| EntityId | BatchDate | Amount |
+----------+-------------------------+--------+
| 1 | 2027-01-01 00:00:00.000 | 1 |
| 2 | 2027-02-01 00:00:00.000 | 4 |
+----------+-------------------------+--------+
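As a rough way to try the ROW_NUMBER approach without a SQL Server instance, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, and it needs SQLite 3.25 or later for window functions); the ISO date strings are my own encoding of the sample data. It also shows why the query in the question returns every row: grouping by BatchDate as well as EntityId makes each row its own group.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GiftHeader (EntityId INTEGER, BatchDate TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO GiftHeader VALUES (?, ?, ?)",
    [
        (1, "2027-01-01", 1.00),
        (1, "2027-02-01", 2.00),
        (2, "2027-02-01", 4.00),
        (2, "2027-03-01", 2.00),
    ],
)

# Number each account's rows by date, then keep only the first one.
first_gifts = conn.execute(
    """
    SELECT EntityId, BatchDate, Amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY EntityId ORDER BY BatchDate) AS rn
        FROM GiftHeader
    ) AS numbered
    WHERE rn = 1
    ORDER BY EntityId
    """
).fetchall()

# The question's query: grouping by BatchDate too puts every row in its own group.
per_row_groups = conn.execute(
    "SELECT EntityId, MIN(BatchDate) FROM GiftHeader GROUP BY EntityId, BatchDate"
).fetchall()

print(first_gifts)          # [(1, '2027-01-01', 1.0), (2, '2027-02-01', 4.0)]
print(len(per_row_groups))  # 4
```

Grouping by EntityId alone would give the first date per account, but not the matching Amount, which is why the row-numbering approach is used.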

You can use ROW_NUMBER as follows:
SELECT EntityID,
Date,
Amount
FROM (SELECT ROW_NUMBER()
OVER (
PARTITION BY EntityID
ORDER BY Date) AS RN,
*
FROM GiftHeader) a
WHERE a.RN = 1

Related

Select row A if a condition satisfies else select row B for each group

We have 2 tables, bookings and docs
bookings
booking_id | name
100 | "Val1"
101 | "Val5"
102 | "Val6"
docs
doc_id | booking_id | doc_type_id
6 | 100 | 1
7 | 100 | 2
8 | 101 | 1
9 | 101 | 2
10 | 101 | 2
We need the result like this:
booking_id | doc_id
100 | 7
101 | 10
Essentially, we are trying to get the latest record of doc per booking, but if doc_type_id 2 is present, select the latest record of doc type 2 else select latest record of doc_type_id 1.
Is this possible to achieve with a performance friendly query as we need to apply this in a very huge query?
You can do it with FIRST_VALUE() window function by sorting properly the rows for each booking_id so that the rows with doc_type_id = 2 are returned first:
SELECT DISTINCT booking_id,
FIRST_VALUE(doc_id) OVER (PARTITION BY booking_id ORDER BY doc_type_id = 2 DESC, doc_id DESC) doc_id
FROM docs;
If you want full rows returned then you could use ROW_NUMBER() window function:
SELECT booking_id, doc_id, doc_type_id
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY booking_id ORDER BY doc_type_id = 2 DESC, doc_id DESC) rn
FROM docs
) t
WHERE rn = 1;
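To check the ordering logic, here is a small sketch using Python's sqlite3 module as a test bed (an assumption; the original question does not name SQLite, and window functions need SQLite 3.25 or later). It swaps the boolean sort key `doc_type_id = 2 DESC` for a CASE expression, which should behave the same way on engines that cannot sort on a comparison directly. The rows for booking 102 are hypothetical, added only to show the fallback to doc_type_id 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id INTEGER, booking_id INTEGER, doc_type_id INTEGER)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (6, 100, 1),
        (7, 100, 2),
        (8, 101, 1),
        (9, 101, 2),
        (10, 101, 2),
        # Hypothetical rows (not in the question): a booking with only type 1 docs.
        (11, 102, 1),
        (12, 102, 1),
    ],
)

# Type 2 rows sort first within each booking; within a type, the highest doc_id wins.
picked = conn.execute(
    """
    SELECT booking_id, doc_id
    FROM (
        SELECT booking_id, doc_id,
               ROW_NUMBER() OVER (
                   PARTITION BY booking_id
                   ORDER BY CASE WHEN doc_type_id = 2 THEN 0 ELSE 1 END, doc_id DESC
               ) AS rn
        FROM docs
    ) AS ranked
    WHERE rn = 1
    ORDER BY booking_id
    """
).fetchall()

print(picked)  # [(100, 7), (101, 10), (102, 12)]
```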

Selecting specific distinct column in SQL

I am trying to create a select statement that is distinct on one specific column, so that there are not multiple rows for the same fruit within each id. If there are multiple rows for a fruit under an id, I would like to use only 1 approved fruit in preference to the rotten fruit. If there is only 1 row for that fruit under that id, use it.
SELECT id, fruit, fruitweight, status
FROM myfruits
Raw data from current select
id | fruit | fruitweight | status
1 | apple | .2 | approved
1 | apple | .8 | approved
1 | apple | .1 | rotten
1 | orange | .5 | approved
2 | grape | .1 | rotten
2 | orange | .7 | approved
2 | orange | .5 | approved
How it should be formatted after constraint
id | fruit | fruitweight | status
1 | apple | .2 | approved
1 | orange | .5 | approved
2 | grape | .1 | rotten
2 | orange | .7 | approved
I can do something along the lines of select distinct id,fruit,fruitweight,status from myfruits,
but that will only take out the duplicates if all columns are the same.
CTE with aggregate and row_number.
create table #YourTable (id int, fruit varchar(64), fruitweight decimal(2,1), status varchar(64))
insert into #YourTable
values
(1,'apple',0.2,'approved'),
(1,'apple',0.8,'approved'),
(1,'apple',0.1,'rotten'),
(1,'orange',0.5,'approved'),
(2,'grape',0.1,'rotten'),
(2,'orange',0.7,'approved'),
(2,'orange',0.5,'approved')
;with cte as(
select
id
,fruit
,fruitweight = min(fruitweight)
,[status]
,RN = row_number() over (partition by id, fruit order by case when status = 'approved' then 1 else 2 end)
from
#YourTable
group by
id,fruit,status)
select
id
,fruit
,fruitweight
,status
from
cte
where RN = 1
Another method, without the aggregate, assuming you want the lowest fruitweight:
;with cte as(
select
id
,fruit
,fruitweight
,[status]
,RN = row_number() over (partition by id, fruit order by case when status = 'approved' then 1 else 2 end, fruitweight)
from
#YourTable)
select
id
,fruit
,fruitweight
,status
from
cte
where RN = 1
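For anyone who wants to see what the no-aggregate version returns, here is a sketch of the same query run through Python's sqlite3 module (a stand-in for SQL Server, so treat it as an approximation; window functions need SQLite 3.25 or later). Note that the tie-break on lowest fruitweight is this answer's assumption: it yields .5 for id 2 / orange, whereas the question's expected output lists .7, so the sort key may need adjusting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myfruits (id INTEGER, fruit TEXT, fruitweight REAL, status TEXT)")
conn.executemany(
    "INSERT INTO myfruits VALUES (?, ?, ?, ?)",
    [
        (1, "apple", 0.2, "approved"),
        (1, "apple", 0.8, "approved"),
        (1, "apple", 0.1, "rotten"),
        (1, "orange", 0.5, "approved"),
        (2, "grape", 0.1, "rotten"),
        (2, "orange", 0.7, "approved"),
        (2, "orange", 0.5, "approved"),
    ],
)

# Within each (id, fruit) group, approved rows sort ahead of rotten ones,
# and the lowest weight breaks ties between rows of the same status.
one_per_fruit = conn.execute(
    """
    SELECT id, fruit, fruitweight, status
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY id, fruit
                   ORDER BY CASE WHEN status = 'approved' THEN 1 ELSE 2 END, fruitweight
               ) AS rn
        FROM myfruits
    ) AS ranked
    WHERE rn = 1
    ORDER BY id, fruit
    """
).fetchall()

for row in one_per_fruit:
    print(row)
```

The grape row survives even though it is rotten, because it is the only row in its group.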
Another option is using the WITH TIES clause.
Example
Select top 1 with ties *
From YourTable
Order By Row_Number() over (Partition By id,fruit order by status,fruitweight)
A shorter version of scsimon's solution without aggregates.
If you have SQL Server < 2012, you'll have to use case instead of iif.
select
id
,fruit
,fruitweight
,status
from
(
select
id
,fruit
,fruitweight
,status
,rownum = row_number() over(partition by id, fruit order by iif(status = 'approved', 0, 1), fruitweight desc)
from myfruits
) x
where rownum = 1
EDIT: I started writing before scsimon edited his post to include a version without aggregates...

Select only 1 payment from a table with customers with multiple payments

I have a table called "payments" where I store all the payments of my customers, and I need to do a select to calculate the non-payment rate in a given month.
A customer can have multiple payments in that month, but I should count each customer only once: 1 if any of the payments was made and 0 if none of the payments was made.
Example:
+----+------------+--------+
| ID | DATEDUE | AMOUNT |
+----+------------+--------+
| 1 | 2016-11-01 | 0 |
| 1 | 2016-11-15 | 20.00 |
| 2 | 2016-11-10 | 0 |
+----+------------+--------+
The result I expect for the November rate is:
+----+------------+--------+
| ID | DATEDUE | AMOUNT |
+----+------------+--------+
| 1 | 2016-11-15 | 20.00 |
| 2 | 2016-11-10 | 0 |
+----+------------+--------+
So the rate will be 50%.
But if the select is:
SELECT * FROM payment WHERE DATEDUE BETWEEN '2016-11-01' AND '2016-11-30'
It will return 3 rows and the rate will be 66%, which is wrong. Ideas?
PS: This is a simpler example of the real table. The real query has a lot of columns, subselects, etc.
It sounds like you need to partition your results per customer.
SELECT TOP 1 WITH TIES
ID,
DATEDUE,
AMOUNT
FROM payment
WHERE DATEDUE BETWEEN '2016-11-01' AND '2016-11-30'
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC)
PS: The BETWEEN operator is frowned upon by some people. For clarity it might be better to avoid it:
What do BETWEEN and the devil have in common?
Try this
SELECT
id
, SUM(AMOUNT) AS AMOUNT
FROM
Payment
GROUP BY
id;
This might help if you want other columns.
WITH cte AS (
SELECT
id
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC ) AS RowNum
-- other columns
FROM
Payment
)
SELECT *
FROM
cte
WHERE
RowNum = 1;
To calculate the rate, you can use explicit division:
select 1 - count(distinct case when amount > 0 then id end) * 1.0 / count(distinct id)
from payment
where . . .;
Or, in a way that is perhaps easier to follow:
select avg(flag * 1.0)
from (select id, (case when max(amount) > 0 then 0 else 1 end) as flag
from payment
where . . .
group by id
) i
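To sanity-check the per-customer rate against the question's numbers, here is a sketch using Python's sqlite3 module as a stand-in database (an assumption). The half-open date range for November 2016 is my own choice for the elided WHERE clause, taken from the dates in the question. The per-row average is included only to reproduce the 66% figure the question calls wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (ID INTEGER, DATEDUE TEXT, AMOUNT REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [
        (1, "2016-11-01", 0.0),
        (1, "2016-11-15", 20.0),
        (2, "2016-11-10", 0.0),
    ],
)

# Per-row rate: counts customer 1's unpaid row even though that customer paid later.
per_row_rate = conn.execute(
    """
    SELECT AVG(CASE WHEN AMOUNT > 0 THEN 0.0 ELSE 1.0 END)
    FROM payment
    WHERE DATEDUE >= '2016-11-01' AND DATEDUE < '2016-12-01'
    """
).fetchone()[0]

# Per-customer rate: one flag per ID, 1 only if no payment in the month was made.
per_customer_rate = conn.execute(
    """
    SELECT AVG(flag * 1.0)
    FROM (
        SELECT ID, CASE WHEN MAX(AMOUNT) > 0 THEN 0 ELSE 1 END AS flag
        FROM payment
        WHERE DATEDUE >= '2016-11-01' AND DATEDUE < '2016-12-01'
        GROUP BY ID
    ) AS per_customer
    """
).fetchone()[0]

print(round(per_row_rate, 2))  # 0.67
print(per_customer_rate)       # 0.5
```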

Remove duplicate rows query result except for one in Microsoft SQL Server?

How would I delete all duplicate months from a Microsoft SQL Server table?
For example, with the following syntax I just created:
SELECT * FROM Cash WHERE Id = '2' AND TransactionDate between '2014/07/01' AND '2015/02/28'
and the query result is:
+----+-------------------------+
|Id | TransactionDate |
+----+-------------------------+
| 2 | 2014-07-22 00:00:00.000 |
| 2 | 2014-08-09 00:00:00.000 |
| 2 | 2014-08-25 00:00:00.000 |
| 2 | 2014-08-29 00:00:00.000 |
| 2 | 2015-01-27 00:00:00.000 |
| 2 | 2015-01-28 00:00:00.000 |
+----+-------------------------+
How would I remove the duplicate months so that only one row is returned for each month, like this result:
+----+-------------------------+
|Id | TransactionDate |
+----+-------------------------+
| 2 | 2014-07-22 00:00:00.000 |
| 2 | 2014-08-09 00:00:00.000 |
| 2 | 2015-01-27 00:00:00.000 |
+----+-------------------------+
You can do it with the help of ROW_NUMBER.
This will tell you which are the rows you are going to keep
SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28'
You can delete the other rows with a CTE.
with myCTE (id,transactionDate, firstTrans) AS (
SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28'
)
delete from myCTE where firstTrans <> 1
This will keep only one transaction for each month of each year.
EDIT:
Filter by the row number to return only the rows you want (note that SQL Server requires an alias on the derived table):
select id, transactionDate from (SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28') t where firstTrans = 1
When you run this query you will get the highest Id for each month in each year.
SELECT MAX(<IdColumn>) AS Id, YEAR(<DateColumn>) AS YE, MONTH(<DateColumn>) AS MO FROM <YourTable>
GROUP BY YEAR(<DateColumn>), MONTH(<DateColumn>)
If needed, you can later delete the rows whose Id is not in the result of this query.
Select only the first row per month
SELECT *
FROM Cash c
WHERE c.Id = '2'
AND c.TransactionDate between '2014/07/01' AND '2015/02/28'
AND NOT EXISTS ( SELECT 'a'
FROM Cash c2
WHERE c2.Id = c.Id
AND YEAR(c2.TransactionDate) * 100 + MONTH(c2.TransactionDate) = YEAR(c.TransactionDate) * 100 + MONTH(c.TransactionDate)
AND c2.TransactionDate < c.TransactionDate
)
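Here is a sketch of the first-row-per-month idea that can be run locally with Python's sqlite3 module (a stand-in for SQL Server, so this is an approximation). SQLite has no YEAR() or MONTH(), so the partition key uses strftime('%Y-%m', ...) instead; partitioning by Id as well is my addition so the query would still work without the Id filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cash (Id INTEGER, TransactionDate TEXT)")
conn.executemany(
    "INSERT INTO Cash VALUES (?, ?)",
    [
        (2, "2014-07-22"),
        (2, "2014-08-09"),
        (2, "2014-08-25"),
        (2, "2014-08-29"),
        (2, "2015-01-27"),
        (2, "2015-01-28"),
    ],
)

# Number the rows inside each (Id, year-month) bucket by date and keep the earliest.
first_per_month = conn.execute(
    """
    SELECT Id, TransactionDate
    FROM (
        SELECT Id, TransactionDate,
               ROW_NUMBER() OVER (
                   PARTITION BY Id, strftime('%Y-%m', TransactionDate)
                   ORDER BY TransactionDate
               ) AS firstTrans
        FROM Cash
        WHERE Id = 2
          AND TransactionDate >= '2014-07-01'
          AND TransactionDate < '2015-03-01'
    ) AS t
    WHERE firstTrans = 1
    ORDER BY TransactionDate
    """
).fetchall()

print(first_per_month)
# [(2, '2014-07-22'), (2, '2014-08-09'), (2, '2015-01-27')]
```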

Creating row_number that starts over for a subset of data

I'm pretty stumped with my latest issue. Here's how my data looks
ID Item Price Rank
1 A 1.00 1
2 A 2.00 2
3 A 2.00 2
4 A 4.00 3
5 B 1.23 1
6 B 2.24 2
7 B 4.40 3
The problem is when there is a 'tie' (see rows id = 2 and 3) , I want it to be:
ID Item Price Rank
1 A 1.00 1
2 A 2.00 2
3 A 2.00 3
4 A 4.00 4
I know I could do it really easily with a cursor, but I think performance-wise that would be terrible. I tried using window functions like LAG and Row_Number but you're still dealing with row by row decisions. And I couldn't come up with a way to iterate through and then start over when you get to the next item.
Can anyone think of a better way to deal with this instead of a cursor? Sadly, correcting the source data is not really an option.
Window Functions
Here you have to use row_number() instead of dense_rank() or rank()
select ID,Item,Price,row_number() over(partition by Item order by Price,ID)
from Table1
You are looking for both row_number() and a partition by clause:
select id, item, price, row_number() over (partition by item order by price, id) as rank
from Table1 t;
Maybe you can try the query below:
SELECT ID, ITEM, PRICE,
ROW_NUMBER() OVER
(
PARTITION BY ITEM
ORDER BY PRICE, ID
) AS RANK
FROM MY_TABLE;
This will get you output:
ID | Item | Price |Rank
1 | A | 1.00 | 1
2 | A | 2.00 | 2
3 | A | 2.00 | 3
4 | A | 4.00 | 4
5 | B | 1.23 | 1
6 | B | 2.24 | 2
7 | B | 4.40 | 3
Try this, using whichever ORDER BY you want:
select id, item, price, Row_Number() over (partition by item order by price, id) as Row_Number,
Dense_Rank() over (partition by item order by price, id) as Dense_Rank,
Rank() over (partition by item order by price, id) as Rank
from YourTableName;
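To make the difference between the three functions concrete, here is a small sketch run through Python's sqlite3 module (a stand-in engine, assumed here for convenience; window functions need SQLite 3.25 or later). Unlike the query above, RANK and DENSE_RANK are ordered by price alone, so the tie on rows 2 and 3 stays visible; once id is added to the ORDER BY there are no ties left and all three functions return the same numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Item TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "A", 1.00),
        (2, "A", 2.00),
        (3, "A", 2.00),
        (4, "A", 4.00),
        (5, "B", 1.23),
        (6, "B", 2.24),
        (7, "B", 4.40),
    ],
)

rows = conn.execute(
    """
    SELECT ID, Item, Price,
           ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Price, ID) AS row_num,
           RANK()       OVER (PARTITION BY Item ORDER BY Price)     AS rnk,
           DENSE_RANK() OVER (PARTITION BY Item ORDER BY Price)     AS dense_rnk
    FROM Table1
    ORDER BY ID
    """
).fetchall()

row_nums = [r[3] for r in rows]    # restarts per item, never repeats
ranks = [r[4] for r in rows]       # ties share a rank, then a gap follows
dense_ranks = [r[5] for r in rows] # ties share a rank, no gap

print(row_nums)     # [1, 2, 3, 4, 1, 2, 3]
print(ranks)        # [1, 2, 2, 4, 1, 2, 3]
print(dense_ranks)  # [1, 2, 2, 3, 1, 2, 3]
```

The Rank column in the question matches the DENSE_RANK output, and the desired result matches ROW_NUMBER with PARTITION BY Item.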