SQL: How to use subqueries to remove duplicates based on multiple conditions

First, I want to remove duplicates by keeping the row with the latest date per ID. Then I want to remove any remaining duplicates by keeping the row with the role 'BLUE'.
Example table:
ID  STATUS    ROLE  DATE
1   ACTIVE    BLUE  Oct 20 2022
1   ACTIVE    RED   Dec 20 2022
2   ACTIVE    BLUE  Feb 02 2022
2   ACTIVE    RED   Feb 02 2022
3   INACTIVE  BLUE  Dec 03 2022
4   ACTIVE    RED   Dec 04 2022
Expected result:
ID  STATUS    ROLE  DATE
1   ACTIVE    RED   Dec 20 2022
2   ACTIVE    BLUE  Feb 02 2022
3   INACTIVE  BLUE  Dec 03 2022
4   ACTIVE    RED   Dec 04 2022
This is what I have so far:
SELECT a.ID,
a.STATUS,
a.ROLE,
a.DATE
FROM
(
SELECT ID, Max(DATE) as MaxDate
FROM WorkersTest
GROUP BY ID
) b
INNER JOIN WorkersTest as a
ON a.ID = b.ID
AND a.DATE = b.MaxDate
ORDER BY b."ID"
Then as you can see I still need to add the second filter/subquery...

First, we can use the following subquery to get each id with its latest date:
SELECT id, MAX(date) AS maxDate
FROM yourtable
GROUP BY id;
This subquery can then be joined back to the table in the full query like this:
SELECT y.id, y.status, y.role,
FORMAT(y.date, 'MMM dd yyyy') AS date
FROM yourtable y
JOIN
(SELECT id, MAX(date) AS maxDate
FROM yourtable
GROUP BY id) grouped
ON y.id = grouped.id
AND y.date = grouped.maxDate
ORDER BY y.id;
However, this lists both the "blue" and the "red" row when they share the same latest date.
Therefore, the result would be incorrect:
ID  STATUS    ROLE  DATE
1   ACTIVE    RED   Dec 20 2022
2   ACTIVE    BLUE  Feb 02 2022
2   ACTIVE    RED   Feb 02 2022
3   INACTIVE  BLUE  Dec 03 2022
4   ACTIVE    RED   Dec 04 2022
To also satisfy the condition of showing only the "blue" row in this case, there are several options. One of them is a further subquery with a window function such as ROW_NUMBER.
That could become necessary if there are more roles.
In our specific case with only two roles, we don't need it and can use MIN instead, because "BLUE" sorts alphabetically before "RED" (to get the "red" rows instead, we would use MAX).
So the query becomes:
SELECT y.id, y.status,
MIN(y.role) AS role,
FORMAT(y.date, 'MMM dd yyyy') AS date
FROM yourtable y
JOIN
(SELECT id, MAX(date) AS maxDate
FROM yourtable
GROUP BY id) grouped
ON y.id = grouped.id
AND y.date = grouped.maxDate
GROUP BY y.id, y.status, y.date
ORDER BY y.id;
This will produce the correct result:
ID  STATUS    ROLE  DATE
1   ACTIVE    RED   Dec 20 2022
2   ACTIVE    BLUE  Feb 02 2022
3   INACTIVE  BLUE  Dec 03 2022
4   ACTIVE    RED   Dec 04 2022
We can replicate this here: db<>fiddle
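The ROW_NUMBER alternative mentioned above is not spelled out in the answer, so here is a minimal sketch of how it might look. It runs against an in-memory SQLite database from Python purely so it is self-contained. The column name work_date, the ISO date strings and the CASE-based role priority are assumptions made for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with the sample rows from the question.
# "work_date" is a hypothetical column name; dates are stored as ISO strings
# so that plain string ordering matches chronological ordering.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WorkersTest (id INTEGER, status TEXT, role TEXT, work_date TEXT)"
)
conn.executemany(
    "INSERT INTO WorkersTest VALUES (?, ?, ?, ?)",
    [
        (1, "ACTIVE", "BLUE", "2022-10-20"),
        (1, "ACTIVE", "RED", "2022-12-20"),
        (2, "ACTIVE", "BLUE", "2022-02-02"),
        (2, "ACTIVE", "RED", "2022-02-02"),
        (3, "INACTIVE", "BLUE", "2022-12-03"),
        (4, "ACTIVE", "RED", "2022-12-04"),
    ],
)

# Number the rows of each id: latest date first, and on a date tie
# the 'BLUE' row first. Keeping only rn = 1 leaves one row per id.
query = """
SELECT id, status, role, work_date
FROM (
    SELECT w.*,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY work_date DESC,
                        CASE WHEN role = 'BLUE' THEN 0 ELSE 1 END
           ) AS rn
    FROM WorkersTest w
) ranked
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike MIN, the CASE expression does not depend on alphabetical order, so it can be extended with further WHEN branches if more roles with an explicit priority are added.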
A general hint: if possible, we should avoid using SQL keywords as column or table names (here "role" and "date").
The name "date" in particular is also not meaningful, because it does not tell us which kind of date it is. We should therefore prefer clear names such as "sellDate" or "quittingDate".

Related

SUM and Count in one SQL Query

I have this kind of data
time Members
-------------------------------------------------- -----------
Jun 23 2016 1
Jun 23 2016 1
Jun 23 2016 2
Jun 29 2016 6
Jul 11 2016 3
Jul 11 2016 1
Jul 13 2016 1
I obtained this data using this sql query
SELECT CONVERT (VARCHAR(12), a.registered_time), COUNT(b.member_id) AS Members
FROM b
Inner JOIN a ON b.mirror_id = a.mirror_id
GROUP BY
(a.registered_time) order by a.registered_time
I want to get the total per date. For example, June 23 2016 should have a total of 4 members, and so on. Is it possible to apply the SUM() function to COUNT()? How can I do this?

Convert the value to a date and include that in both the select and group by:
SELECT CONVERT(date, a.registered_time) as dte, COUNT(b.member_id) AS Members
FROM b JOIN
a
ON b.mirror_id = a.mirror_id
GROUP BY CONVERT(date, a.registered_time)
ORDER BY CONVERT(date, a.registered_time);
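The effect of grouping by the date part rather than the full timestamp can be tried out with a small sketch like the one below. It uses SQLite from Python so that it runs on its own. SQLite's date() function stands in for CONVERT(date, ...), and the rows in tables a and b are made-up sample data, not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (mirror_id INTEGER, registered_time TEXT)")
conn.execute("CREATE TABLE b (member_id INTEGER, mirror_id INTEGER)")

# Hypothetical data: two registrations on the same day at different times.
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [
        (1, "2016-06-23 09:15:00"),
        (2, "2016-06-23 14:30:00"),
        (3, "2016-06-29 08:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 3)],
)

# Grouping by the date part merges rows whose timestamps differ only in time,
# so COUNT already yields the per-day total and no SUM over COUNT is needed.
query = """
SELECT date(a.registered_time) AS dte, COUNT(b.member_id) AS members
FROM b
JOIN a ON b.mirror_id = a.mirror_id
GROUP BY date(a.registered_time)
ORDER BY dte
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```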

oracle pivot query suggestion

I have a simple table that has data like the following
FiscalYear Month Value
2013 01 10
2013 02 15
....
2014 01 15
2014 02 20
Using an Oracle (11g) PIVOT query, is it possible to get something like this?
Month 2013 2014
01 10 15
02 15 20

SELECT month, value_2013, value_2014
FROM (SELECT fiscalyear, month, value FROM your_table)
PIVOT (SUM (value)
FOR (fiscalyear)
IN ('2013' AS value_2013, '2014' AS value_2014))
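For comparison, the same reshaping can be sketched with conditional aggregation, which is a different technique from Oracle's PIVOT clause but produces the same layout on most databases. The sketch below uses SQLite from Python so it is runnable on its own. The table name your_table is taken from the answer, and the assumption that fiscalyear is numeric and month is text is mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (fiscalyear INTEGER, month TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(2013, "01", 10), (2013, "02", 15), (2014, "01", 15), (2014, "02", 20)],
)

# One output column per year: each SUM only sees the rows of "its" year,
# because CASE yields NULL for every other row and SUM ignores NULLs.
query = """
SELECT month,
       SUM(CASE WHEN fiscalyear = 2013 THEN value END) AS value_2013,
       SUM(CASE WHEN fiscalyear = 2014 THEN value END) AS value_2014
FROM your_table
GROUP BY month
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```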

Select Every Date for Date Range and Insert

Using SQL Server 2008
I have a table A which has start date, end date and value. For each date within the start date and end date in Table A, I need to insert (or update if already exists) that date in table B such that the value in this table is value in A/DateDiff(Day,StartDate of A,EndDate of A).
Example:
Table A
ID StartDate EndDate Value
1 01 Jan 2014 03 Jan 2014 33
2 01 Feb 2014 02 Feb 2014 20
3 02 Jan 2014 03 Jan 2014 10
Table B
ID Date Value
1 01 Jan 2014 11
2 02 Jan 2014 16
3 03 Jan 2014 16
4 01 Feb 2014 10
5 02 Feb 2014 10
The values are computed as follows: ID 1 covers 3 days, which means 11 units per day, so 1st, 2nd and 3rd Jan each get 11 units. The range 2nd Jan to 3rd Jan then adds a further 5 units per day, so 2nd and 3rd Jan become (11+5) 16. 1st and 2nd Feb have just one record, so they are simply 20/2 = 10.
I can think of a solution using loops, but want to avoid it entirely.
Is there any way I can achieve this through a set based solution? It is important for me to do this in bulk using set based approach.
I have been reading various articles, and it seems a CTE, calendar table or tally table might help. However, the examples I have seen require setting variables and passing a start date and end date, which I think would work for a single record but not for all records at once. Please suggest.
Thanks!

I think this should do it (DEMO):
;with cte as (
select
id
,startdate
,enddate
,value / (1+datediff(day, startdate, enddate)) as value
,startdate as date
from units
union all
select id, startdate, enddate, value, date+1 as date
from cte
where date < enddate
)
select
row_number() over (order by date) as ID
,date
,sum(value) as value
from cte
group by date
The idea is to use a Recursive CTE to explode the date ranges into one record per day. Also, the logic of value / (1+datediff(day, startdate, enddate)) distributes the total value evenly over the number of days in each range. Finally, we group by day and sum together all the values corresponding to that day to get the output:
| ID | DATE | VALUE |
|----|---------------------------------|-------|
| 1 | January, 01 2014 00:00:00+0000 | 11 |
| 2 | January, 02 2014 00:00:00+0000 | 16 |
| 3 | January, 03 2014 00:00:00+0000 | 16 |
| 4 | February, 01 2014 00:00:00+0000 | 10 |
| 5 | February, 02 2014 00:00:00+0000 | 10 |
From here you can join with your result table (Table B) by date, and update/insert the value as needed. That logic might look something like this (test it first of course before running in production!):
update B set B.VALUE = R.VALUE from TableB B join Result R on B.DATE = R.DATE
insert TableB (DATE, VALUE)
select DATE, VALUE from Result R where R.DATE not in (select DATE from TableB)
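The recursive-CTE expansion described above can also be tried in a self-contained form. The sketch below runs the same idea on SQLite from Python, not on SQL Server. The date arithmetic via julianday() and date(..., '+1 day') is SQLite-specific, the CTE and column names (days, per_day, day) are chosen for illustration, and the table name units follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE units (id INTEGER, startdate TEXT, enddate TEXT, value INTEGER)"
)
# Table A from the question, with dates stored as ISO strings.
conn.executemany(
    "INSERT INTO units VALUES (?, ?, ?, ?)",
    [
        (1, "2014-01-01", "2014-01-03", 33),
        (2, "2014-02-01", "2014-02-02", 20),
        (3, "2014-01-02", "2014-01-03", 10),
    ],
)

# Anchor member: one row per range with its per-day share (integer division).
# Recursive member: step one day forward until the end date is reached.
query = """
WITH RECURSIVE days(id, enddate, per_day, day) AS (
    SELECT id,
           enddate,
           value / (CAST(julianday(enddate) - julianday(startdate) AS INTEGER) + 1),
           startdate
    FROM units
    UNION ALL
    SELECT id, enddate, per_day, date(day, '+1 day')
    FROM days
    WHERE day < enddate
)
SELECT day, SUM(per_day) AS value
FROM days
GROUP BY day
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On SQL Server itself, note that recursive CTEs stop after 100 recursion levels by default, so date ranges longer than that need an OPTION (MAXRECURSION n) hint on the final statement.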

Oracle sql split amounts by weeks

So I have a table like:
UNIQUE_ID MONTH
abc 01
93j 01
acc 01
7as 01
oks 02
ais 02
asi 03
asd 04
etc
I query:
select count(unique_id) as amount, month
from table
group by month
now everything looks great:
AMOUNT MONTH
4 01
2 02
1 03
etc
Is there a way to get Oracle to split the amounts by weeks,
so that the result looks something like this?
AMOUNT WEEK
1 01
1 02
1 03
1 04
etc

Assuming you know the year (let's say 2014), you first need to generate all the weeks of the year:
select rownum as week_no
from all_objects
where rownum<53
Then work out which month each week falls in (for 2014):
select week_no, to_char(to_date('01-JAN-2014','DD-MON-YYYY')+7*(week_no-1),'MM') month_no
from
(select rownum as week_no
from all_objects
where rownum<53) weeks
Then join in your data
select week_no,month_no, test.unique_id from (
select week_no, to_char(to_date('01-JAN-2014','DD-MON-YYYY')+7*(week_no-1),'MM') month_no
from
(select rownum as week_no
from all_objects
where rownum<53) weeks) wm
join test on wm.month_no = test.tmonth
This gives you the data for each week, as described above. You can then redo your query and count by week instead of month.
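The week-to-month mapping from the second step can be checked with a small sketch. It uses SQLite from Python instead of Oracle, so the week numbers come from a recursive CTE rather than rownum over all_objects, and strftime('%m', ...) stands in for to_char(..., 'MM'). As in the answer, week N is simply taken to start 7*(N-1) days after 1 January 2014.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generate week numbers 1..52, then label each week with the month
# of its first day (1 January 2014 plus 7 * (week_no - 1) days).
query = """
WITH RECURSIVE weeks(week_no) AS (
    SELECT 1
    UNION ALL
    SELECT week_no + 1 FROM weeks WHERE week_no < 52
)
SELECT week_no,
       strftime('%m', date('2014-01-01', '+' || (7 * (week_no - 1)) || ' days')) AS month_no
FROM weeks
ORDER BY week_no
"""
rows = conn.execute(query).fetchall()
for week_no, month_no in rows[:6]:
    print(week_no, month_no)
```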

Three tables SQL query

I have a table (Vehicles) which contains a list of vehicles.
VehicleID
PlateNo
CurrentDriver
I also have a table (History) which contains the driver history for the vehicles:
HistoryID
VehicleID
ReceivedDate (vehicle receiving date)
DriverName
I have another table (Repairs) which contains the repairs for all the vehicles:
RepairID
VehicleID
RepairDate
RepairCost
Using SQL Server and based on the History table, I want to get all the RepairCost values between two dates for a given DriverName.
For example, I want to get all the RepairCost values for driver 'John Doe', between 01.01.2013 and 01.05.2013, who was allocated to three different vehicles in that period.
My query so far is:
SELECT H.DriverName, R.RepairCost, R.RepairDate
FROM Repairs AS R
INNER JOIN Vehicles AS V ON R.VehicleID = V.VehicleID
INNER JOIN History H ON H.VehicleID = V.VehicleID
WHERE H.DriverName = 'John'
AND R.RepairDate BETWEEN '01.01.2013' AND '04.01.2013'
There's also some sample data in a SQL Fiddle.
The problem seems to be that I'm getting all the results twice.
LATER EDIT:
My progress so far:
DECLARE @Driver varchar(50),@StartDt datetime, @EndDt datetime
SELECT @Driver = 'John Doe',@StartDt = '20130101' ,@EndDt = '20130501'
;With VehicleAllocation
AS
(
SELECT h.*,h1.ChangeDate
FROM History h
OUTER APPLY (SELECT MIN(ReceivedDate) AS ChangeDate
FROM History
WHERE VehicleID = h.VehicleID
AND DriverName <> h.DriverName
AND ReceivedDate > h.ReceivedDate
)h1
WHERE h.DriverName = @Driver
)
SELECT *
FROM VehicleAllocation h
INNER JOIN Repairs r
ON r.VehicleID = h.VehicleID
WHERE DriverName = @Driver
AND RepairDate >= @StartDt
AND RepairDate < @EndDt + 1
AND RepairDate BETWEEN h.ReceivedDate AND COALESCE(h.ChangeDate,RepairDate)
I discovered a problem with the line 'AND DriverName <> h.DriverName'. Why is that line useful? If the same driver name appeared several times in a row in the History table, it skipped to the last car delivery date for that driver name.
Sample data:
'History' table
ReceivedDate DriverName
04.11.2013 Mike
13.11.2013 Dan
15.11.2013 Dan
17.11.2013 Ryan
20.11.2013 Dan
22.11.2013 Ryan
25.11.2013 Mike
26.11.2013 Dan
29.11.2013 Ryan
04.12.2013 Dan
'Repairs' table
RepairDate RepairCost
05.11.2013 2615.30
14.11.2013 135.66
16.11.2013 4913.04
18.11.2013 538.92
21.11.2013 152.48
23.11.2013 5946.89
26.11.2013 3697.64
27.11.2013 734.01
30.11.2013 279.62
Query result
RepairDate RepairCost
07.11.2013 380.00
14.11.2013 135.66
16.11.2013 4913.04
16.11.2013 4913.04
21.11.2013 152.48
27.11.2013 734.01
As you can see in the query result, line 3 and 4 have the same value/date.
The query interval was 01-01-2013 <-> 31-12-2013.
Also, what if I want to get the SUM of different columns from different tables?
For example, SUM(Total) column from 'Repairs' table, SUM(Value) column from 'Tires' table...
How can I adapt the script?
Thanks!

I have no idea why you include the Vehicles table in your query, as you don't need any information from it.
You are getting "double" results because you match every repair (e.g. the one on Jan 15th) with every History record that has the same vehicle ID (there are three of those!). Two of those matches are for driver John, so you get two results.
What you want is to match only the driver who, according to your History table, was the driver at the time of the repair!
So I first matched each repair date with the corresponding ReceivedDate in the History table:
SELECT R1.Repairdate, Max(H1.ReceivedDate) as ReceivedDate
FROM Repairs R1
JOIN History H1
ON R1.VehicleID=H1.VehicleID
AND H1.ReceivedDate < R1.RepairDate
GROUP BY R1.RepairDate
I then used that query in a join to receive the wanted data:
SELECT R.RepairDate, H.DriverName, R.RepairCost
FROM Repairs AS R
JOIN History H ON R.VehicleID=H.VehicleID
JOIN (
SELECT R1.Repairdate, Max(H1.ReceivedDate) as ReceivedDate
FROM Repairs R1
JOIN History H1
ON R1.VehicleID=H1.VehicleID
AND H1.ReceivedDate < R1.RepairDate
GROUP BY R1.RepairDate)
AS H2
ON R.Repairdate = H2.Repairdate
AND H.ReceivedDate = H2.Receiveddate
WHERE R.RepairDate BETWEEN '01.01.2013' AND '04.01.2013'
AND H.DriverName = 'John'
This returns 6 records: http://sqlfiddle.com/#!3/fcebf/62
As a sanity check, leave out the entire WHERE clause on date and name, and include the vehicle number in the select.
You will get 14 repairs listed with the name of the driver who was driving the vehicle at that time. You can easily confirm that the driver linked to the vehicle at that time is correct according to your History data:
DRIVERNAME VEHICLEID REPAIRDATE REPAIRCOST
John 1 January, 15 2013 00:00:00+0000 10
Ryan 2 January, 18 2013 00:00:00+0000 15
Ryan 2 January, 22 2013 00:00:00+0000 15
John 1 February, 03 2013 00:00:00+0000 5
Ryan 2 February, 05 2013 00:00:00+0000 25
John 1 February, 10 2013 00:00:00+0000 10
John 2 February, 26 2013 00:00:00+0000 10
Ryan 1 March, 01 2013 00:00:00+0000 100
John 2 March, 03 2013 00:00:00+0000 30
John 2 March, 08 2013 00:00:00+0000 5
Ryan 1 March, 10 2013 00:00:00+0000 45
Ryan 1 March, 17 2013 00:00:00+0000 25
Ryan 2 March, 25 2013 00:00:00+0000 10
Ryan 2 March, 28 2013 00:00:00+0000 30
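The matching idea described above (attach each repair to the most recent History row before the repair date) can be sketched as follows. This is a variation on the answer's query rather than a copy: it looks up the latest ReceivedDate per vehicle with a correlated subquery, it runs on SQLite from Python, and all rows are made-up sample data rather than the data from the SQL Fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE History (HistoryID INTEGER, VehicleID INTEGER, "
    "ReceivedDate TEXT, DriverName TEXT)"
)
conn.execute(
    "CREATE TABLE Repairs (RepairID INTEGER, VehicleID INTEGER, "
    "RepairDate TEXT, RepairCost INTEGER)"
)
# Hypothetical data: each vehicle changes driver once.
conn.executemany(
    "INSERT INTO History VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2013-01-01", "John"),
        (2, 1, "2013-02-15", "Ryan"),
        (3, 2, "2013-01-01", "Ryan"),
        (4, 2, "2013-02-20", "John"),
    ],
)
conn.executemany(
    "INSERT INTO Repairs VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2013-01-15", 10),
        (2, 1, "2013-03-01", 100),
        (3, 2, "2013-01-18", 15),
        (4, 2, "2013-03-03", 30),
    ],
)

# For every repair, keep only the History row of the same vehicle whose
# ReceivedDate is the latest one before the repair, then filter by driver.
query = """
SELECT r.RepairDate, h.DriverName, r.RepairCost
FROM Repairs r
JOIN History h
  ON h.VehicleID = r.VehicleID
 AND h.ReceivedDate = (
       SELECT MAX(h2.ReceivedDate)
       FROM History h2
       WHERE h2.VehicleID = r.VehicleID
         AND h2.ReceivedDate < r.RepairDate
     )
WHERE h.DriverName = 'John'
ORDER BY r.RepairDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the subquery is correlated on VehicleID as well as on the repair date, two vehicles repaired on the same day cannot pick up each other's driver.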

If you add HistoryID to your query, you will notice that the rows are not duplicates; they have different HistoryID values. Try this query:
SELECT H.DriverName, R.RepairCost, R.RepairDate , H.HistoryID
FROM
Repairs AS R
INNER JOIN Vehicles AS V ON R.VehicleID=V.VehicleID
INNER JOIN History H ON H.VehicleID=V.VehicleID
WHERE H.DriverName='John' AND R.RepairDate BETWEEN '01.01.2013' AND '04.01.2013'
So I think something is wrong with your data.
You can eliminate these duplicate rows using DISTINCT, but I recommend double-checking your History table data.
