SQL server delete all rows that have a duplicate (inclusive) - sql

I have a table named Sales:
+----------+-----------------+------------+
| Salesman | Sales Portfolio | Month |
+----------+-----------------+------------+
| Kavi | 12500 | 2018-01-05 |
| Kavi | 12500 | 2018-02-28 |
| Kavi | 12500 | 2018-03-20 |
| Raj | 21055 | 2018-01-05 |
| Raj | 32015 | 2018-02-28 |
| Raj | 12000 | 2018-03-20 |
+----------+-----------------+------------+
If a Sales Portfolio value is duplicated, remove all rows including itself from the table. In the example above, 12500 is duplicated, so remove all rows where Sales Portfolio = 12500.
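Example expected output (only Raj displayed):
+----------+-----------------+------------+
| Salesman | Sales Portfolio | Month |
+----------+-----------------+------------+
| Raj | 21055 | 2018-01-05 |
| Raj | 32015 | 2018-02-28 |
| Raj | 12000 | 2018-03-20 |
+----------+-----------------+------------+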

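If you only want to display the expected output, try the following:
WITH cte AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY Salesman, [Sales Portfolio]) AS cnt
    FROM yourTable
)
SELECT Salesman, [Sales Portfolio], Month
FROM cte
WHERE cnt = 1;
This counts duplicates per salesman. If a value should count as duplicated no matter which salesman it belongs to, partition by [Sales Portfolio] alone.
If you also want to delete the rows that are not displayed, repeat the same CTE definition and replace the final SELECT with:
DELETE FROM cte WHERE cnt > 1;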
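For readers who want to try the idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. SQLite cannot delete through a CTE, so the sketch swaps in a GROUP BY ... HAVING subquery instead of the window function. The table name sales and the column name portfolio are assumptions made for illustration, and duplicates are judged on the portfolio value alone.

```python
import sqlite3


def delete_duplicated_portfolios(conn):
    """Delete every row whose portfolio value occurs more than once."""
    conn.execute(
        """
        DELETE FROM sales
        WHERE portfolio IN (
            SELECT portfolio
            FROM sales
            GROUP BY portfolio
            HAVING COUNT(*) > 1
        )
        """
    )


conn = sqlite3.connect(":memory:")
# Hypothetical schema mirroring the sample table from the question.
conn.execute("CREATE TABLE sales (salesman TEXT, portfolio INTEGER, month TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Kavi", 12500, "2018-01-05"),
        ("Kavi", 12500, "2018-02-28"),
        ("Kavi", 12500, "2018-03-20"),
        ("Raj", 21055, "2018-01-05"),
        ("Raj", 32015, "2018-02-28"),
        ("Raj", 12000, "2018-03-20"),
    ],
)

delete_duplicated_portfolios(conn)
remaining = conn.execute(
    "SELECT salesman, portfolio, month FROM sales ORDER BY month"
).fetchall()
print(remaining)
```

Only the three Raj rows should remain, because 12500 is the only value that appears more than once.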

Related

Find the first order of a supplier in a day using SQL

I am trying to write a query to return supplier ID (sup_id), order date and the order ID of the first order (based on earliest time).
+--------+--------+------------+--------+-----------------+
|orderid | sup_id | items | sales | order_ts |
+--------+--------+------------+--------+-----------------+
|1111132 | 3 | 1 | 27,0 | 24/04/17 13:00 |
|1111137 | 3 | 2 | 69,0 | 02/02/17 16:30 |
|1111147 | 1 | 1 | 87,0 | 25/04/17 08:25 |
|1111153 | 1 | 3 | 82,0 | 05/11/17 10:30 |
|1111155 | 2 | 1 | 29,0 | 03/07/17 02:30 |
|1111160 | 2 | 2 | 44,0 | 30/01/17 20:45 |
|....... | ... | ... | ... | ... ... |
+--------+--------+------------+--------+-----------------+
Output I am looking for:
+--------+--------+------------+
| sup_id | date | order_id |
+--------+--------+------------+
|....... | ... | ... |
+--------+--------+------------+
I tried using a subquery in the join clause as below but didn't know how to join it without having selected order_id.
SELECT sup_id, date(order_ts), order_id
FROM sales s
JOIN
(
SELECT sup_id, date(order_ts) as date, min(time(order_date))
FROM sales
GROUP BY merchant_id, date
) m
on ...
Kindly assist.
You can use not exists:
select *
from sales
where not exists (
    -- look for an order from the same supplier on the same day at an earlier time
    select *
    from sales as older
    where older.sup_id = sales.sup_id
      and older.order_ts < sales.order_ts
      and older.order_ts >= cast(sales.order_ts as date)
)
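The query below might not be the fastest in the world, but it should give you all the information you need. The subquery is restricted to the same supplier and the same day; without the day condition it would return only each supplier's first order overall.
select order_id, sup_id, items, sales, order_ts
from sales s
where order_ts = (
    select min(order_ts)
    from sales m
    where m.sup_id = s.sup_id
      and cast(m.order_ts as date) = cast(s.order_ts as date)
)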
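Alternatively, assuming orderid is an identity / auto-increment column (so the earliest order of a day also has the smallest id), you can aggregate per supplier for a given day:
select sup_id, min(order_ts), min(order_id)
from sales
where cast(order_ts as date) = '2022-03-15'
group by sup_id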
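Another common way to get the first row per group is ROW_NUMBER. The following is a minimal sketch, not taken from any of the answers above, run through Python's sqlite3 module (window functions need SQLite 3.25 or later). It assumes order_ts is stored as ISO-8601 text, and the last sample row is a hypothetical addition so that one supplier has two orders on the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderid INTEGER, sup_id INTEGER, order_ts TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1111132, 3, "2017-04-24 13:00"),
        (1111137, 3, "2017-02-02 16:30"),
        (1111147, 1, "2017-04-25 08:25"),
        # Hypothetical row: same supplier and day as 1111132, but earlier.
        (1111199, 3, "2017-04-24 09:15"),
    ],
)

FIRST_ORDER_PER_DAY = """
SELECT sup_id, order_date, orderid
FROM (
    SELECT sup_id,
           date(order_ts) AS order_date,
           orderid,
           -- number each supplier's orders within a day, earliest first
           ROW_NUMBER() OVER (
               PARTITION BY sup_id, date(order_ts)
               ORDER BY order_ts
           ) AS rn
    FROM sales
) AS ranked
WHERE rn = 1
ORDER BY sup_id, order_date
"""

first_orders = conn.execute(FIRST_ORDER_PER_DAY).fetchall()
print(first_orders)
```

Unlike the min(order_id) approach, this does not rely on order ids increasing with time.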

Subtracting previous row value from current row

I'm doing an aggregation like this:
select
date,
product,
count(*) as cnt
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
order by
product asc, date asc
This produces data which looks like this:
| date | product | cnt | difference |
|------------|---------|------|------------|
| 2020-03-31 | p1 | 100 | null |
| 2020-07-31 | p1 | 1000 | 900 |
| 2020-09-30 | p1 | 900 | -100 |
| 2020-12-31 | p1 | 1100 | 200 |
| 2020-03-31 | p2 | 200 | null |
| 2020-07-31 | p2 | 210 | 10 |
| ... | ... | ... | x |
But without the difference column. How could I make such a calculation? I could pivot the date column and subtract that way, but maybe there's a better way.
I was able to use lag with partition by and order by to get this to work. Within each product partition, ordering by date alone is enough:
select
    date,
    product,
    count,
    count - lag(count) over (partition by product order by date) as difference
from (
    select
        date,
        product,
        count(*) as count
    from
        t1
    where
        yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
    group by
        1, 2
) t
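To see the lag pattern run end to end, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later for window functions). The row counts are made-up small numbers rather than the figures from the question, and the sketch uses yyyy_mm_dd as the only date column, which is an assumption about the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (yyyy_mm_dd TEXT, product TEXT)")

# Hypothetical raw rows: (date, product, how many rows to insert).
raw_counts = [
    ("2020-03-31", "p1", 1),
    ("2020-07-31", "p1", 3),
    ("2020-09-30", "p1", 2),
    ("2020-03-31", "p2", 2),
    ("2020-07-31", "p2", 2),
]
for day, product, n in raw_counts:
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(day, product)] * n)

DIFF_QUERY = """
SELECT yyyy_mm_dd,
       product,
       cnt,
       -- the first row of each product has no previous row, so this is NULL
       cnt - LAG(cnt) OVER (PARTITION BY product ORDER BY yyyy_mm_dd) AS difference
FROM (
    SELECT yyyy_mm_dd, product, COUNT(*) AS cnt
    FROM t1
    GROUP BY yyyy_mm_dd, product
) AS t
ORDER BY product, yyyy_mm_dd
"""

diff_rows = conn.execute(DIFF_QUERY).fetchall()
for row in diff_rows:
    print(row)
```

The NULL for the first date of each product comes back as None in Python, matching the null entries in the desired output.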

How can I SELECT MAX(VALUE) from duplicate values which occur multiple time within each month?

I have records for each user which occur multiple times each month. I wish to select just the highest value from the repeated values for each month for each user.
Table schema
custacc
ID | ac_no | DODSTART | od_limit
---+--------+------------+----------
1 | 110011 | 2019-02-10 | 200,000
2 | 110011 | 2019-02-12 | 120,000
3 | 110014 | 2019-02-10 | 70,000
4 | 110014 | 2019-02-12 | 10,000
5 | 110009 | 2019-02-10 | 30,000
customer
ID | cust_no | name | cust_type
---+---------+-------+----------
1 | 110011 | Jame | M
2 | 110014 | Fred | N
3 | 110009 | Ahmed | M
How can I achieve this?
What I tried so far:
SELECT
custacc.ac_no,
custacc.od_limit,
custacc.DODSTART,
customer.name,
custacc.gl_no,
custacc.USERNAME,
customer.cust_type
FROM
custacc
LEFT JOIN
customer ON custacc.ac_no = customer.cust_no
INNER JOIN
(SELECT
MAX(DODSTART) LAST_UPDATE_DATE_TIME,
ac_no
FROM
custacc
GROUP BY
ac_no) s2 ON custacc.ac_no = s2.ac_no
AND custacc.DODSTART = s2.LAST_UPDATE_DATE_TIME
WHERE
custacc.od_limit != 0.00
The query doesn't return the expected result.
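Try this (add any other columns you need). It is an Oracle solution, since you didn't mention which database you use. Partition by the account and the month; partitioning by ID would put every row in its own group, because ID is unique:
SELECT ac_no, DODSTART, od_limit,
       MAX(od_limit) OVER (PARTITION BY ac_no, TRUNC(DODSTART, 'MM')) AS max_od_limit
FROM custacc;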
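If one row per account and month is enough, a plain GROUP BY also works. The sketch below swaps the window function for an aggregate and runs it through Python's sqlite3 module, so it uses SQLite's strftime rather than Oracle date functions. It assumes od_limit is stored as a number without thousands separators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE custacc (id INTEGER, ac_no INTEGER, dodstart TEXT, od_limit INTEGER)"
)
conn.executemany(
    "INSERT INTO custacc VALUES (?, ?, ?, ?)",
    [
        (1, 110011, "2019-02-10", 200000),
        (2, 110011, "2019-02-12", 120000),
        (3, 110014, "2019-02-10", 70000),
        (4, 110014, "2019-02-12", 10000),
        (5, 110009, "2019-02-10", 30000),
    ],
)

MAX_PER_MONTH = """
SELECT ac_no,
       strftime('%Y-%m', dodstart) AS year_month,
       MAX(od_limit) AS max_od_limit
FROM custacc
-- group on year and month together so February 2019 and February 2020 stay apart
GROUP BY ac_no, strftime('%Y-%m', dodstart)
ORDER BY ac_no
"""

monthly_max = conn.execute(MAX_PER_MONTH).fetchall()
print(monthly_max)
```

The result can then be joined to customer on ac_no = cust_no to pick up the name and cust_type columns.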

How do I pick most recent record for each company for each statement date?

I have a database that stores financial information for different companies for each financial statement date. Company A has records for 12/31/2016, 12/31/2015, and so on. My database also stores multiple records for each statement date per company if someone edits that record, such as if someone made a typo in the Cash line of the Balance Sheet for 12/31/2016. In this case, Company A would have two records for 12/31/2016, one for the initial entry and the latest one including the edit.
My query currently pulls every record for every 12 month statement date including all records for updates. I tried to insert a Rank so that it pulls only the most recent record for each statement date for each company, but then the query only pulls the most recent record for that company overall, ignoring prior statement dates.
My ideal results would be to have something like this:
COMPANY A | 12/31/16 | other info
COMPANY A | 12/31/15 | other info
COMPANY A | 12/31/14 | other info
COMPANY B | 12/31/16 | other info
Currently, without Rank, it pulls:
Stmt_id | company | FS_date    | fs_id | num_of_months | KSOR | last_update             | rank
000001  | Comp A  | 2018-03-31 | 1001  | 12            | KSOR | 2018-04-06 14:24:49.227 | 1
000002  | Comp A  | 2018-03-31 | 1001  | 12            | KSOR | 2018-04-06 10:49:22.530 | 2
000013  | Comp B  | 2018-01-31 | 2002  | 12            | KSOR | 2018-03-07 14:32:04.843 | 28
000015  | Comp B  | 2018-01-31 | 2002  | 12            | KSOR | 2018-03-07 12:48:34.533 | 29
000016  | Comp B  | 2018-01-31 | 2002  | 12            | KSOR | 2018-03-07 12:20:08.180 | 30
Here is my query:
WITH CTE
AS (
SELECT [Stmt_ID]
,[Company]
,[fs_date]
,[fs_ID]
,[NUMBER_OF_MONTHS]
,[KSOR]
,[LAST_update]
,RANK() OVER (
PARTITION BY [fs_date] ORDER BY [stmt_ID] DESC
) AS RANKNUM
FROM [dbo].[SIRV]
)
Select *
FROM CTE
WHERE RANKNUM = 1 AND [NUMBER_OF_MONTHS] = 12
order by [fs_date] desc
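You need the right partition by. Partition by both company and fs_date, so the numbering restarts for every statement date of every company: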
WITH CTE AS (
SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY company, fs_date
ORDER BY stmt_ID DESC
) as seqnum
FROM [dbo].[SIRV] s
)
SELECT CTE.*
FROM CTE
WHERE seqnum = 1 AND NUMBER_OF_MONTHS = 12
ORDER BY fs_date DESC;
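The effect of partitioning by both columns can be checked with a minimal sketch in Python's sqlite3 module (SQLite 3.25 or later). Two things here are assumptions rather than part of the answer: the sketch orders by last_update instead of stmt_ID, because in the sample data the lowest Stmt_id carries the latest timestamp, and the final row is a hypothetical older statement date added to show that earlier dates are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sirv (stmt_id INTEGER, company TEXT, fs_date TEXT, last_update TEXT)"
)
conn.executemany(
    "INSERT INTO sirv VALUES (?, ?, ?, ?)",
    [
        (1, "Comp A", "2018-03-31", "2018-04-06 14:24:49.227"),
        (2, "Comp A", "2018-03-31", "2018-04-06 10:49:22.530"),
        (13, "Comp B", "2018-01-31", "2018-03-07 14:32:04.843"),
        (15, "Comp B", "2018-01-31", "2018-03-07 12:48:34.533"),
        (16, "Comp B", "2018-01-31", "2018-03-07 12:20:08.180"),
        # Hypothetical row: an earlier statement date for Comp A.
        (20, "Comp A", "2017-03-31", "2017-04-10 09:00:00.000"),
    ],
)

LATEST_PER_STATEMENT = """
WITH ranked AS (
    SELECT s.*,
           -- restart the numbering for each company and statement date
           ROW_NUMBER() OVER (
               PARTITION BY company, fs_date
               ORDER BY last_update DESC
           ) AS seqnum
    FROM sirv AS s
)
SELECT company, fs_date, stmt_id
FROM ranked
WHERE seqnum = 1
ORDER BY fs_date DESC, company
"""

latest = conn.execute(LATEST_PER_STATEMENT).fetchall()
print(latest)
```

Each company keeps one row per statement date, so Comp A appears twice: once for 2018-03-31 and once for 2017-03-31.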

How to determine an Increase in Employee Salary from consecutive Contract Rows?

I have a problem with my query.
My table stores data like this:
ContractID | Staff_ID | EffectDate | End Date | Salary | active
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 0
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 1
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 1
I would like to write a query that returns a result like this:
ContractID | Staff_ID | EffectDate | End Date | Salary | Increase
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 50
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 50
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 200
-------------------------------------------------------------------------
The Increase column is calculated as the current contract's salary minus the previous contract's salary.
I use SQL Server 2008 R2.
Unfortunately, SQL Server 2008 R2 doesn't have LAG, but you can simulate obtaining the previous row (prev) in the scope of the current row (cur) by numbering the rows with ROW_NUMBER and self-joining to the previous numbered row in the same partition (by Staff_ID):
With CTE AS
(
SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary],[active],
ROW_NUMBER() OVER (Partition BY Staff_ID ORDER BY ContractID) AS Rnk
FROM Table1
)
SELECT cur.[ContractID], cur.[Staff_ID], cur.[EffectDate], cur.[End Date],
cur.[Salary], cur.Rnk,
CASE WHEN (cur.Rnk = 1) THEN 0 -- i.e. baseline salary
ELSE cur.Salary - prev.Salary END AS Increase
FROM CTE cur
LEFT OUTER JOIN CTE prev
ON cur.[Staff_ID] = prev.Staff_ID and cur.Rnk - 1 = prev.Rnk;
(If ContractID always incremented perfectly, we wouldn't need ROW_NUMBER and could join on consecutive ContractIDs, but I didn't want to make that assumption.)
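The numbering-plus-self-join pattern can be tried on the sample data with a minimal sketch in Python's sqlite3 module (SQLite 3.25 or later for ROW_NUMBER). The table name contracts and the snake_case column names are assumptions made for the sketch, and only the columns needed for the calculation are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (contract_id INTEGER, staff_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 150), (3, 1, 200), (4, 2, 500), (5, 2, 700)],
)

INCREASE_QUERY = """
WITH ranked AS (
    SELECT contract_id, staff_id, salary,
           ROW_NUMBER() OVER (PARTITION BY staff_id ORDER BY contract_id) AS rnk
    FROM contracts
)
SELECT cur.contract_id,
       cur.staff_id,
       cur.salary,
       CASE WHEN cur.rnk = 1 THEN 0            -- first contract: no increase
            ELSE cur.salary - prev.salary END AS increase
FROM ranked AS cur
LEFT JOIN ranked AS prev
       ON prev.staff_id = cur.staff_id
      AND prev.rnk = cur.rnk - 1               -- the row just before cur
ORDER BY cur.contract_id
"""

increases = conn.execute(INCREASE_QUERY).fetchall()
for row in increases:
    print(row)
```

The LEFT JOIN keeps each staff member's first contract even though it has no previous row to match.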
Edit
If you have SQL Server 2012 or later, the LEAD and LAG analytic functions make this kind of query much simpler:
SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary],
Salary - LAG(Salary, 1, Salary) OVER (Partition BY Staff_ID ORDER BY ContractID) AS Incr
FROM Table1
One trick here is the third argument to LAG, which is the default returned when there is no previous row. Because we are calculating salary increments, passing the current Salary as that default makes the first contract's increase Salary - Salary = 0.