I'm working on an Access query and have hit a dead end. I want to delete all duplicate rows in a table that have the same values in the columns Brand, SerialNr, Seats and LastRepair and that have the value "2013" in the Year column.
I'm trying to delete all rows that have duplicates in those columns and the year 2013, so that there isn't a single one left. (Not just delete the duplicates so that only one is left, but delete all instances so that none are left.)
The original table looks like this:
Brand       SerialNr  Seats  Color   LastRepair  Year
Ford        145       4      Blue    01.01.2020  2010
Ford        145       4      Red     01.01.2020  2010
Ford        145       4      Red     01.01.2020  2013
Ford        145       4      Green   01.01.2020  2013
Porsche     146       2      White   01.01.2022  2013
Ferrari     146       2      White   01.01.2022  2013
Volkswagen  147       4      Blue    01.01.2021  2017
Volkswagen  147       4      Red     01.01.2021  2013
Volkswagen  147       4      Orange  01.01.2021  2013
And the outcome table should look like this:
Brand       SerialNr  Seats  Color   LastRepair  Year
Ford        145       4      Blue    01.01.2020  2010
Ford        145       4      Red     01.01.2020  2010
Porsche     146       2      White   01.01.2022  2013
Ferrari     146       2      White   01.01.2022  2013
Volkswagen  147       4      Blue    01.01.2021  2017
I tried doing it with this question, but I need the rows deleted if they have a duplicated value in those columns, so that there isn't a single one left that has the same year.
I also tried a "find duplicates" query with an outer join, but so far I have been unsuccessful in achieving the desired outcome. I'm thankful for any help.
DELETE Exists (SELECT 1
FROM carTable As t2
WHERE t1.Brand = t2.Brand AND t1.SerialNr = t2.SerialNr AND t1.Seats = t2.Seats AND t1.LastRepair = t2.LastRepair
HAVING Count(*) > 1
), t1.[FilNr], *
FROM carTable AS t1, carTable
WHERE (((Exists (SELECT 1
FROM carTable As t2
WHERE t1.Brand = t2.Brand AND t1.SerialNr = t2.SerialNr AND t1.Seats = t2.Seats AND t1.LastRepair = t2.LastRepair
HAVING Count(*) > 1
))<>False) AND ((t1.[year])=2013));
You can use an EXISTS subquery to identify duplicated rows and delete them.
In the subquery, we just select based on the columns you want to identify duplicates by, then check if the count is greater than 1 (since Count is an aggregate, it's in the HAVING clause).
DELETE * FROM t AS t1
WHERE EXISTS(
SELECT 1
FROM t As t2
WHERE t1.Brand = t2.Brand AND t1.SerialNr = t2.SerialNr AND t1.Seats = t2.Seats AND t1.LastRepair = t2.LastRepair
HAVING Count(*) > 1
)
AND Year = 2013
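As a rough way to check the idea against the sample data from the question, here is a minimal sketch using Python's sqlite3 module. It is not Access SQL: the table name `cars` is made up for the example, and the EXISTS ... HAVING test is written as a correlated `COUNT(*) > 1` subquery, which expresses the same condition in a form SQLite accepts across versions.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("Ford", 145, 4, "Blue", "01.01.2020", 2010),
    ("Ford", 145, 4, "Red", "01.01.2020", 2010),
    ("Ford", 145, 4, "Red", "01.01.2020", 2013),
    ("Ford", 145, 4, "Green", "01.01.2020", 2013),
    ("Porsche", 146, 2, "White", "01.01.2022", 2013),
    ("Ferrari", 146, 2, "White", "01.01.2022", 2013),
    ("Volkswagen", 147, 4, "Blue", "01.01.2021", 2017),
    ("Volkswagen", 147, 4, "Red", "01.01.2021", 2013),
    ("Volkswagen", 147, 4, "Orange", "01.01.2021", 2013),
]


def delete_all_duplicates(rows, year=2013):
    """Delete every row of the given year whose key columns occur more than once."""
    con = sqlite3.connect(":memory:")
    # "cars" is a hypothetical table name used only for this sketch.
    con.execute(
        "CREATE TABLE cars (Brand TEXT, SerialNr INTEGER, Seats INTEGER, "
        "Color TEXT, LastRepair TEXT, Year INTEGER)"
    )
    con.executemany("INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)", rows)
    con.execute(
        """
        DELETE FROM cars
        WHERE Year = ?
          AND (SELECT COUNT(*)
               FROM cars AS d
               WHERE d.Brand = cars.Brand
                 AND d.SerialNr = cars.SerialNr
                 AND d.Seats = cars.Seats
                 AND d.LastRepair = cars.LastRepair) > 1
        """,
        (year,),
    )
    remaining = con.execute("SELECT * FROM cars ORDER BY rowid").fetchall()
    con.close()
    return remaining


remaining = delete_all_duplicates(ROWS)
for row in remaining:
    print(row)
```

As in the query above, the duplicate count looks at all years and only the outer filter restricts the deletion to 2013. If a 2013 row should be deleted only when its duplicate is also from 2013, the year condition would have to be repeated inside the subquery.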
If your goal is to never have duplicate information in the "Brand" column, that can be accomplished in the table design itself. It's much more efficient to set up the table so that it limits what the user can input in certain circumstances. There are a couple of ways you can do this. You can set the primary key to the Brand column, or change the "Indexed" property of that column to "Yes (No Duplicates)". If you are using an auto-number as the ID field and plan on relating a table by that ID, then the index is your best bet.
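To illustrate what a no-duplicates index does, here is a small sketch using Python's sqlite3 module rather than the Access table designer. The table name `cars` and index name `idx_cars_brand` are assumptions for the example; a unique index in SQLite plays the role of "Yes (No Duplicates)".

```python
import sqlite3


def try_duplicate_brand():
    """Insert the same Brand twice into a table with a unique index on Brand."""
    con = sqlite3.connect(":memory:")
    # Auto-numbered ID as the key, plus a separate unique index on Brand.
    con.execute("CREATE TABLE cars (ID INTEGER PRIMARY KEY AUTOINCREMENT, Brand TEXT)")
    con.execute("CREATE UNIQUE INDEX idx_cars_brand ON cars (Brand)")
    con.execute("INSERT INTO cars (Brand) VALUES ('Ford')")
    try:
        con.execute("INSERT INTO cars (Brand) VALUES ('Ford')")
        rejected = False
    except sqlite3.IntegrityError:
        # The database refuses the second 'Ford' row.
        rejected = True
    count = con.execute("SELECT COUNT(*) FROM cars").fetchone()[0]
    con.close()
    return rejected, count


rejected, count = try_duplicate_brand()
print(rejected, count)
```

Note that an index on Brand alone would also block the legitimate repeated brands in the question's sample data, so this only fits when each brand really should appear once.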
I have a table of Cases with info like the following -
ID  CaseName  Date        Occupation
11  John      2020-01-01  Joiner
12  Mark      2019-10-10  Mechanic
And a table of Financial information like the following -
ID  CaseID  Date        Value
1   11      2020-01-01  1,000
2   11      2020-02-03  2,000
3   12      2019-10-10  3,000
4   12      2019-12-25  4,000
What I need to produce is a list of Cases including details of the most recent Financial value, for example -
ID  CaseName  Occupation  Latest Value
11  John      Joiner      2,000
12  Mark      Mechanic    4,000
Now I can join my tables easy enough with -
SELECT *
FROM Cases AS c
LEFT JOIN Financial AS f ON f.CaseID = c.ID
And I can find the most recent date per case from the financial table with -
SELECT CaseID, MAX(Date) AS LastDate
FROM Financial
GROUP BY CaseID
But I am struggling to find a way to bring these two together to produce the required results as per the table set out above.
A simple method is window functions:
SELECT *
FROM Cases c LEFT JOIN
(SELECT f.*, MAX(date) OVER (PARTITION BY CaseId) as max_date
FROM Financial f
) f
ON f.CaseID = c.ID AND f.max_date = f.date;
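Where window functions are not available, the two queries from the question can also be combined directly: join the per-case MAX(Date) result back to Financial to pick up the matching Value. The sketch below runs this against the sample data with Python's sqlite3 module. The aliases `m` and `LatestValue` are made up for the example, and values are stored as plain integers.

```python
import sqlite3


def latest_values():
    """Return each case with the Value of its most recent Financial row."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE Cases (ID INTEGER, CaseName TEXT, Date TEXT, Occupation TEXT);
        CREATE TABLE Financial (ID INTEGER, CaseID INTEGER, Date TEXT, Value INTEGER);
        INSERT INTO Cases VALUES (11, 'John', '2020-01-01', 'Joiner'),
                                 (12, 'Mark', '2019-10-10', 'Mechanic');
        INSERT INTO Financial VALUES (1, 11, '2020-01-01', 1000),
                                     (2, 11, '2020-02-03', 2000),
                                     (3, 12, '2019-10-10', 3000),
                                     (4, 12, '2019-12-25', 4000);
        """
    )
    rows = con.execute(
        """
        SELECT c.ID, c.CaseName, c.Occupation, f.Value AS LatestValue
        FROM Cases AS c
        LEFT JOIN (
            -- Most recent date per case, as in the question's second query.
            SELECT CaseID, MAX(Date) AS LastDate
            FROM Financial
            GROUP BY CaseID
        ) AS m ON m.CaseID = c.ID
        LEFT JOIN Financial AS f
               ON f.CaseID = m.CaseID AND f.Date = m.LastDate
        ORDER BY c.ID
        """
    ).fetchall()
    con.close()
    return rows


rows = latest_values()
for row in rows:
    print(row)
```

With either approach, a case that has two Financial rows on the same latest date will appear twice, and a case with no Financial rows still appears with an empty value because of the left joins.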
I have a large set of imperfect data, and from this data I am reverse engineering a table for the coding used.
For this particular task, it is known that all records with a specific division code should have the same group ID and plan ID (which are not included in the data). From another source I have been able to add a close but imperfect (and incomplete) mapping of the group ID and plan ID. Now I want to work backwards and build a division mapping table. I have gotten the data down to a format like this:
Division Year Group Plan Cnt
52 2019 30 101 9031
52 2020 30 101 9562
54 2019 60 602 3510
54 2020 60 602 3385
56 2019 76 904 1113
56 2020 76 905 1125
56 2020 76 001 6
The Division and Year columns should form a primary key. As you can see, 56, 2020 is not unique, but by looking at the Cnt column it is easy to see that the record with a count of 6 is a bad record and should be dropped.
What I need is a method to return each division and year pair once with the group and plan IDs that have the largest count.
Thank You
I found the answer using the Rank() function and WHERE clause:
SELECT *
FROM (
SELECT Division, Year, [Group], Plan_Cd
, RANK() OVER (PARTITION BY Division, Year ORDER BY Cnt DESC ) AS rk
FROM DivisionMap ) R
WHERE rk = 1
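The ranking approach can be tried on the sample rows from the question with the sketch below, which uses Python's sqlite3 module (window functions need SQLite 3.25 or later). The column names `GroupId` and `PlanId` are stand-ins chosen to avoid reserved words, and the plan is stored as text so that "001" keeps its leading zeros.

```python
import sqlite3

# Sample rows from the question: Division, Year, Group, Plan, Cnt.
ROWS = [
    (52, 2019, 30, "101", 9031),
    (52, 2020, 30, "101", 9562),
    (54, 2019, 60, "602", 3510),
    (54, 2020, 60, "602", 3385),
    (56, 2019, 76, "904", 1113),
    (56, 2020, 76, "905", 1125),
    (56, 2020, 76, "001", 6),
]


def best_mapping(rows):
    """Keep, per (Division, Year), the group/plan pair with the largest Cnt."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE DivisionMap (Division INTEGER, Year INTEGER, "
        "GroupId INTEGER, PlanId TEXT, Cnt INTEGER)"
    )
    con.executemany("INSERT INTO DivisionMap VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT Division, Year, GroupId, PlanId
        FROM (
            SELECT Division, Year, GroupId, PlanId,
                   RANK() OVER (PARTITION BY Division, Year
                                ORDER BY Cnt DESC) AS rk
            FROM DivisionMap
        ) AS r
        WHERE rk = 1
        ORDER BY Division, Year
        """
    ).fetchall()
    con.close()
    return result


mapping = best_mapping(ROWS)
for row in mapping:
    print(row)
```

RANK() gives tied rows the same rank, so two candidates with an identical top Cnt would both come back. ROW_NUMBER() in place of RANK() returns exactly one row per pair, at the cost of picking arbitrarily between ties.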
Please consider the following payment data:
customerID paymentID paymentType paymentDate paymentAmount
---------------------------------------------------------------------
1 1 A 2015-11-28 500
1 2 A 2015-11-29 -150
1 3 B 2016-03-07 300
2 4 A 2015-03-03 200
2 5 B 2016-05-25 -100
2 6 C 2016-06-24 700
1 7 B 2015-09-22 110
2 8 B 2016-01-03 400
I need to tally per year, per customer, the sum of the diverse payment types (A = invoice, B = credit note, etc), as follows:
year customerID paymentType paymentSum
-----------------------------------------------
2015 1 A 350 : paymentID 1 + 2
2015 1 B 110 : paymentID 7
2015 1 C 0
2015 2 A 200 : paymentID 4
2015 2 B 0
2015 2 C 0
2016 1 A 0
2016 1 B 300 : paymentID 3
2016 1 C 0
2016 2 A 0
2016 2 B 300 : paymentID 5 + 8
2016 2 C 700 : paymentId 6
It is important that there are values for every category (so for 2015, customer 1 has 0 payment value for type C, but still it is good to see this).
In reality, there are over 10 payment types and about 30 customers. The total date range is 10 years.
Is this possible to do in SQL only, and if so, could somebody show me how? If possible by using relatively easy queries so that I can learn from it, for instance by storing intermediate results in a #temptable.
Any help is greatly appreciated!
A simple GROUP BY with SUM() on paymentAmount will give you what you want:
select year = datepart(year, paymentDate),
customerID,
paymentType,
paymentSum = sum(paymentAmount)
from payment_data
group by datepart(year, paymentDate), customerID, paymentType
This is a simple query that generates the required 0s. Note that it may not be the most efficient way to generate this result set. If you already have lookup tables for customers or payment types, it would be preferable to use those rather than the CTEs [1] I use here:
create table #t (customerID int,paymentID int,paymentType char(1),paymentDate date,
paymentAmount int)
insert into #t(customerID,paymentID,paymentType,paymentDate,paymentAmount) values
(1,1,'A','20151128', 500),
(1,2,'A','20151129',-150),
(1,3,'B','20160307', 300),
(2,4,'A','20150303', 200),
(2,5,'B','20160525',-100),
(2,6,'C','20160624', 700),
(1,7,'B','20150922', 110),
(2,8,'B','20160103', 400)
;With Customers as (
select DISTINCT customerID from #t
), PaymentTypes as (
select DISTINCT paymentType from #t
), Years as (
select DISTINCT DATEPART(year,paymentDate) as Yr from #t
), Matrix as (
select
customerID,
paymentType,
Yr
from
Customers
cross join
PaymentTypes
cross join
Years
)
select
m.customerID,
m.paymentType,
m.Yr,
COALESCE(SUM(paymentAmount),0) as Total
from
Matrix m
left join
#t t
on
m.customerID = t.customerID and
m.paymentType = t.paymentType and
m.Yr = DATEPART(year,t.paymentDate)
group by
m.customerID,
m.paymentType,
m.Yr
Result:
customerID paymentType Yr Total
----------- ----------- ----------- -----------
1 A 2015 350
1 A 2016 0
1 B 2015 110
1 B 2016 300
1 C 2015 0
1 C 2016 0
2 A 2015 200
2 A 2016 0
2 B 2015 0
2 B 2016 300
2 C 2015 0
2 C 2016 700
(We may also want to play games with a numbers table and/or generate actual start and end dates for years if the date processing above needs to be able to use an index)
Note also how similar the top of my script is to the sample data in your question - except it's actual code that generates the sample data. You may wish to consider presenting sample code in such a way in the future since it simplifies the process of actually being able to test scripts in answers.
[1] CTEs - Common Table Expressions. They may be thought of as conceptually similar to temp tables - except we don't actually (necessarily) materialize the results. They also are incorporated into the single query that follows them and the whole query is optimized as a whole.
Your suggestion to use temp tables means that you'd be breaking this into multiple separate queries that then necessarily force SQL to perform the task in an order that we have selected rather than letting the optimizer choose the best approach for the above single query.
I have a table with data as follows
Person_ID Date Sale
1 2016-05-08 2686
1 2016-05-09 2688
1 2016-05-14 2689
1 2016-05-18 2691
1 2016-05-24 2693
1 2016-05-25 2694
1 2016-05-27 2695
and there are a million such IDs for different people. A sale count is recorded only when it increases; otherwise no row is written. Therefore the data for ID 2 can be different from ID 1.
Person_ID Date Sale
2 2016-05-10 26
2 2016-05-20 29
2 2016-05-18 30
2 2016-05-22 39
2 2016-05-25 40
Sale count of 29 on 5/20 means he sold 3 products on 20th, and had sold 26 till 5/10 with no sale in between these 2 dates.
Question: I want SQL/dynamic SQL to calculate the daily sales of all the agents and produce a report as follows:
ID Sale_511 Sale_512 Sale_513 -------------- Sale_519 Sale_520
2 0 0 0 --------------- 0 3
(29-26)
The question is how to use that data to calculate the report. Since I have data for 5/10 and 5/20, can I just write a query saying A - B = C?
Can anyone help? Thank you.
P.S - New to SQL so learning.
Using Sql Server 2008.
Most SQL dialects support the lag() function (SQL Server added it in 2012). You can get what you want as:
select person_id, date,
(sale - lag(sale) over (partition by person_id order by date)) as Daily_Sales
from t;
This produces one row per date for each person. This format is more typical for how SQL would return such results.
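A minimal sketch of the lag() approach on the sample rows for person 1, run through Python's sqlite3 module (SQLite 3.25 or later for window functions). The table name `t` follows the query above; the rest of the setup is made up for the example. The previous row is found by ordering by date within each person.

```python
import sqlite3

# Sample rows for person 1, copied from the question.
ROWS = [
    (1, "2016-05-08", 2686),
    (1, "2016-05-09", 2688),
    (1, "2016-05-14", 2689),
    (1, "2016-05-18", 2691),
    (1, "2016-05-24", 2693),
    (1, "2016-05-25", 2694),
    (1, "2016-05-27", 2695),
]


def daily_sales(rows):
    """Difference between each running total and the previous recorded one."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (person_id INTEGER, date TEXT, sale INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT person_id, date,
               sale - LAG(sale) OVER (PARTITION BY person_id
                                      ORDER BY date) AS daily_sales
        FROM t
        ORDER BY person_id, date
        """
    ).fetchall()
    con.close()
    return result


sales = daily_sales(ROWS)
for row in sales:
    print(row)
```

The first row for each person has no predecessor, so its difference comes back as NULL (None in Python). Days without a recorded row simply do not appear, so a report with one column per day would need those dates filled in separately.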
In SQL Server 2008, you can do:
select t.person_id, t.date,
(t.sale - t2.sale) as Daily_Sales
from t outer apply
(select top 1 t2.*
from t t2
where t2.person_id = t.person_id and t2.date < t.date
order by t2.date desc
) t2;