SQL Query to Count total rows grouping different columns - sql

I have a database that holds RMA return data. I want to write a query that returns the total number of returns (each return has a unique RMA number). I also need the number of units that have been returned more than once, and the number of units that were returned more than once for the same symptom. A record is created each time the unit goes to a station (the RMA, symptom, and date returned are propagated to each station record).
The data looks like this:
ID SN RMA SYMPTOM Station Date_Returned
21567 A001 84704 POWER FAULT DockRecv 01/01/2015
21568 A001 84704 POWER FAULT Repair 01/01/2015
21569 A001 84704 POWER FAULT Ship 01/01/2015
10235 A002 83494 NO DISPLAY DockRecv 02/20/2015
10236 A002 83494 NO DISPLAY Repair 02/20/2015
10237 A002 83494 NO DISPLAY Ship 02/20/2015
36548 A002 84283 ABNORMAL NOISE DockRecv 10/05/2015
36549 A002 84283 ABNORMAL NOISE Repair 10/05/2015
36550 A002 84283 ABNORMAL NOISE Ship 10/05/2015
38790 A003 83432 HDD FAULT DockRecv 09/15/2015
38791 A003 83432 HDD FAULT Repair 09/15/2015
38792 A003 83432 HDD FAULT Ship 09/15/2015
69613 A003 84276 HDD FAULT DockRecv 01/30/2016
69614 A003 84276 HDD FAULT Repair 01/30/2016
69615 A003 84276 HDD FAULT Ship 01/30/2016
56732 A004 82011 NFF DockRecv 12/01/2015
56733 A004 82011 NFF Repair 12/01/2015
56734 A004 82011 NFF Ship 12/01/2015
My Output needs to look like this:
Total_Returns Repeat_Return Same_Symptom_Return
6 2 1
A001(RMA 84704) is a single return.
A002 is a multiple return: RMA 83494 is the first return. After repair the unit is shipped out, and after some time in the field it is returned again under RMA 84283. When a unit is returned, it goes through 3 stations, and we create a record for each station (propagating the RMA, symptom, and date returned to each station record).
I can get Total_Returns with the code:
Select count(*) as totalcount
From
(
SELECT
[SN]
,[RMA]
FROM [dbo].[test]
Group by [SN],[RMA]
)as a

Three quite different methods are needed to arrive at the counts, so I have used 3 separate sub-queries. See this working at SQL Fiddle (but not on MS SQL Server) here: http://sqlfiddle.com/#!5/9df16/1
Result:
| Total_Count | Repeat_Return | Same_Symptom_Return |
|-------------|---------------|---------------------|
| 6 | 2 | 1 |
Query:
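select
  -- one row per return: collapse the station rows to distinct SN/RMA pairs
  (select count(*) from (
      SELECT DISTINCT SN, RMA
      FROM table1
    ) as r
  ) as Total_Count
  -- units that appear under more than one RMA
, (select count(*) from (
      SELECT SN
      FROM table1
      Group by SN
      having count(distinct RMA) > 1
    ) as m
  ) as Repeat_Return
  -- SN/symptom pairs that appear under more than one RMA
, (select count(*) from (
      SELECT SN, SYMPTOM
      FROM table1
      Group by SN, SYMPTOM
      having count(distinct RMA) > 1
    ) as s
  ) as Same_Symptom_Return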
Note: you should include "sql server" as a tag on your question (I presume that is what you are using because of the [dbo].[test]).

I got it to work... I'm sure there is a more streamlined way to write it...
SELECT
-- Get Total_Returned Count
(Select distinct
count(*) as 'Total_Returned'
From
( SELECT
[SN]
,[RMA]
FROM [dbo].[test]
Group by [SN],[RMA]
)a) AS 'Total_Returned'
-- Get Repeat_Return Count
,(Select distinct
[Repeat_Return] - COUNT(*) OVER() AS [Repeat_Return]
From
( SELECT
COUNT(*) OVER() AS [Repeat_Return]
,[SN]
,[RMA]
FROM [dbo].[test]
Group by [SN],[RMA]
)a Group by [SN],[Repeat_Return]) AS 'Repeat_Return'
-- Get Same_Symptom_Return Count
,(Select distinct
[Same_Symptom_Return] - COUNT(*) OVER() AS [Same_Symptom_Return]
From
( SELECT
COUNT(*) OVER() AS [Same_Symptom_Return]
,[SN]
,[RMA]
,SYMPTOM
FROM [dbo].[test]
Group by SN, SYMPTOM, RMA
)a Group by [SN], SYMPTOM, [Same_Symptom_Return]) AS 'Same_Symptom_Return'
Result:
|Total_Returned | Repeat_Return | Same_Symptom_Return |
|---------------|---------------|---------------------|
| 6 | 2 | 1 |
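One possibly more streamlined variant, offered as a sketch rather than a tested SQL Server solution: collapse the station rows to one row per return first, then take all three counts from that reduced set. The example runs the query through Python's sqlite3 module only so that it is self-contained. The CTE name rma_returns and the in-memory copy of the sample data are assumptions made for illustration.

```python
import sqlite3

# One tuple per return from the question's sample data (SN, RMA, SYMPTOM).
sample_returns = [
    ("A001", 84704, "POWER FAULT"),
    ("A002", 83494, "NO DISPLAY"),
    ("A002", 84283, "ABNORMAL NOISE"),
    ("A003", 83432, "HDD FAULT"),
    ("A003", 84276, "HDD FAULT"),
    ("A004", 82011, "NFF"),
]
stations = ("DockRecv", "Repair", "Ship")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (SN TEXT, RMA INTEGER, SYMPTOM TEXT, Station TEXT)")
# Each return produces one record per station, as described in the question.
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [(sn, rma, sym, st) for sn, rma, sym in sample_returns for st in stations],
)

query = """
WITH rma_returns AS (          -- hypothetical name: one row per return
    SELECT DISTINCT SN, RMA, SYMPTOM FROM test
)
SELECT
    (SELECT COUNT(*) FROM rma_returns) AS Total_Returns,
    (SELECT COUNT(*) FROM (SELECT SN FROM rma_returns
                           GROUP BY SN HAVING COUNT(*) > 1) AS m) AS Repeat_Return,
    (SELECT COUNT(*) FROM (SELECT SN, SYMPTOM FROM rma_returns
                           GROUP BY SN, SYMPTOM HAVING COUNT(*) > 1) AS s) AS Same_Symptom_Return
"""
result = conn.execute(query).fetchone()
print(result)  # (6, 2, 1)
```

Because the station rows are removed up front, none of the counts depends on every return having exactly three station records.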

Related

SQL Server: Select duplicate rows

I have a table:
personId Date location
abc123 15-09-2022 London
abc123 15-09-2022 Nottingham
efg321 12-09-2022 Leeds
abc123 13-09-2022 Birmingham
I want to select and return the duplicate rows based on Date and location columns, for example, in the above table: personId 'abc123' is present at location both 'London' and 'Nottingham' on the same date, so I would like to return these rows.
I have tried this query:
SELECT personId, Date FROM sampleTable GROUP BY personId, Date HAVING COUNT(*) > 1
But it gives me the count. I want the rows with all three columns. Expected result:
personId Date location
abc123 15-09-2022 London
abc123 15-09-2022 Nottingham
Can anyone please help me with this? Thanks
Try something like this:
SELECT
sampleTable.*
FROM
sampleTable
INNER JOIN -- acts as a filter here
(
SELECT
personId,
Date
FROM
sampleTable
GROUP BY
personId,
Date
HAVING
COUNT(*) > 1
) problemTable
ON sampleTable.personId = problemTable.personId
AND sampleTable.Date = problemTable.Date
ORDER BY
sampleTable.personId,
sampleTable.Date,
sampleTable.location
;
The derived problemTable calculates personId/Date combos that have multiple sampleTable rows. INNER JOINing sampleTable with problemTable, by nature of an INNER JOIN, returns an abridged version of sampleTable: one that only contains combos found within problemTable as well—and those are the ones you care about!
Using INNER JOIN as a filter mechanism is a common theme in SQL, so keep it in the back of your mind.
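To see the filter effect step by step, here is a small sketch that runs the same idea against the question's four rows. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption made only to keep the example self-contained); the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sampleTable (personId TEXT, Date TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO sampleTable VALUES (?, ?, ?)",
    [
        ("abc123", "15-09-2022", "London"),
        ("abc123", "15-09-2022", "Nottingham"),
        ("efg321", "12-09-2022", "Leeds"),
        ("abc123", "13-09-2022", "Birmingham"),
    ],
)

# Step 1: on its own, the derived table lists the personId/Date combos
# that occur on more than one row.
combos = conn.execute(
    "SELECT personId, Date FROM sampleTable "
    "GROUP BY personId, Date HAVING COUNT(*) > 1"
).fetchall()

# Step 2: joining back to sampleTable keeps only the full rows for those combos.
duplicates = conn.execute("""
    SELECT s.personId, s.Date, s.location
    FROM sampleTable s
    INNER JOIN (
        SELECT personId, Date
        FROM sampleTable
        GROUP BY personId, Date
        HAVING COUNT(*) > 1
    ) problemTable
      ON s.personId = problemTable.personId
     AND s.Date = problemTable.Date
    ORDER BY s.personId, s.Date, s.location
""").fetchall()

print(combos)      # [('abc123', '15-09-2022')]
print(duplicates)  # the London and Nottingham rows
```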
It's pretty easy using window functions.
The inner SQL returns the same table with an extra column that counts the rows sharing the same personid and date. The outer SQL then keeps only the rows where that count is greater than 1.
Inner SQL result:
personid date location dup_count
abc123 13-09-2022 Birmingham 1
abc123 15-09-2022 London 2
abc123 15-09-2022 Nottingham 2
efg321 12-09-2022 Leeds 1
Final result:
personid date location dup_count
abc123 15-09-2022 London 2
abc123 15-09-2022 Nottingham 2
SQL (the counter is named dup_count because CHECK is a reserved word in SQL Server):
WITH temp AS (
    SELECT
        personid,
        datecol,
        location,
        COUNT( personid ) OVER (PARTITION BY personid, datecol) AS dup_count
    FROM sampletable
)
SELECT *
FROM temp
WHERE dup_count > 1

Getting latest price of different products from control table

I have a control table in which prices are tracked per item number, date-wise.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting below values
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
My question: is my query right or wrong, even though I am getting my desired result?
Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.
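Before timing anything, it may be worth confirming that the variants return the same rows. The sketch below does that for the join and the correlated subquery on the question's sample data, using Python's sqlite3 module as a stand-in engine (an assumption; the dates are stored as ISO text so that MAX compares them correctly). The window function variant is left out only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER, ItemNo TEXT, Price INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (1, "a001", 100, "2003-01-01"),
        (2, "a001", 105, "2003-01-02"),
        (3, "a001", 110, "2003-01-03"),
        (4, "b100", 50, "2003-01-01"),
        (5, "b100", 55, "2003-01-02"),
        (6, "b100", 60, "2003-01-03"),
        (7, "c501", 35, "2003-01-01"),
        (8, "c501", 38, "2003-01-02"),
        (9, "c501", 42, "2003-01-03"),
        (10, "a001", 95, "2004-01-01"),
    ],
)

# Variant 1: join against the per-item maximum date (the question's query).
join_sql = """
SELECT pr.*
FROM prices pr
INNER JOIN (
    SELECT ItemNo, MAX(date) AS max_date
    FROM prices
    GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND pr.date = p.max_date
ORDER BY pr.ItemNo
"""

# Variant 2: correlated subquery that looks up the maximum date per row.
correlated_sql = """
SELECT p.*
FROM prices p
WHERE p.date = (SELECT MAX(p1.date) FROM prices p1 WHERE p1.ItemNo = p.ItemNo)
ORDER BY p.ItemNo
"""

join_rows = conn.execute(join_sql).fetchall()
correlated_rows = conn.execute(correlated_sql).fetchall()
print(join_rows == correlated_rows)  # True
print(join_rows)
```

Both variants return more than one row for an item if two rows share its latest date.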

SQL COUNT the number purchase between his first purchase and the follow 10 months

Every customer has a different first-time purchase date. I want to COUNT the number of purchases each customer made in the 10 months following their first purchase.
sample table
TransactionID Client_name PurchaseDate Revenue
11 John Lee 10/13/2014 327
12 John Lee 9/15/2015 873
13 John Lee 11/29/2015 1,938
14 Rebort Jo 8/18/2013 722
15 Rebort Jo 5/21/2014 525
16 Rebort Jo 2/4/2015 455
17 Rebort Jo 3/20/2016 599
18 Tina Pe 10/8/2014 213
19 Tina Pe 6/10/2016 3,494
20 Tina Pe 8/9/2016 411
My code below just uses the ROW_NUMBER function to identify the first purchase, but I don't know how to do the calculation. Or is there a better way to do it?
SELECT client_name,
purchasedate,
Dateadd(month, 10, purchasedate) TenMonth,
Row_number()
OVER (
partition BY client_name
ORDER BY client_name) RM
FROM mytable
You might try something like this - I assume you're using SQL Server from the presence of DATEADD() and the fact that you're using a window function (ROW_NUMBER()):
WITH myCTE AS (
SELECT TransactionID, Client_name, PurchaseDate, Revenue
, MIN(PurchaseDate) OVER ( PARTITION BY Client_name ) AS min_PurchaseDate
FROM myTable
)
SELECT Client_name, COUNT(*)
FROM myCTE
WHERE PurchaseDate <= DATEADD(month, 10, min_PurchaseDate)
GROUP BY Client_name
Here I'm creating a common table expression (CTE) with all the data, including the date of first purchase, then I grab a count of all the purchases within a 10-month timeframe.
Hope this helps.
Give this a whirl ... Subquery to get the min purchase date, then LEFT JOIN to the main table to have a WHERE clause for the ten month date range, then count.
SELECT mt.Client_name, COUNT(mt.PurchaseDate) as PurchaseCountFirstTenMonths
FROM myTable mt
LEFT JOIN (
    -- first purchase date per client
    SELECT Client_name, MIN(PurchaseDate) as MinPurchaseDate
    FROM myTable
    GROUP BY Client_name) mtmin
ON mt.Client_name = mtmin.Client_name
WHERE mt.PurchaseDate >= mtmin.MinPurchaseDate AND mt.PurchaseDate <= DATEADD(month, 10, mtmin.MinPurchaseDate)
GROUP BY mt.Client_name
ORDER BY mt.Client_name
btw I'm guessing there's some kind of ClientID involved, as nine character full name runs the risk of duplicates.
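As a rough check of the idea (join every purchase to the client's first purchase date, then keep the rows inside the 10-month window), here is a sketch against the question's sample data. It runs on Python's sqlite3 module, so `date(x, '+10 months')` stands in for SQL Server's `DATEADD(month, 10, x)`. That substitution, and storing the dates as ISO text and the revenue as plain integers, are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (TransactionID INTEGER, Client_name TEXT, "
    "PurchaseDate TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (11, "John Lee", "2014-10-13", 327),
        (12, "John Lee", "2015-09-15", 873),
        (13, "John Lee", "2015-11-29", 1938),
        (14, "Rebort Jo", "2013-08-18", 722),
        (15, "Rebort Jo", "2014-05-21", 525),
        (16, "Rebort Jo", "2015-02-04", 455),
        (17, "Rebort Jo", "2016-03-20", 599),
        (18, "Tina Pe", "2014-10-08", 213),
        (19, "Tina Pe", "2016-06-10", 3494),
        (20, "Tina Pe", "2016-08-09", 411),
    ],
)

query = """
SELECT mt.Client_name, COUNT(*) AS PurchaseCountFirstTenMonths
FROM myTable mt
JOIN (
    SELECT Client_name, MIN(PurchaseDate) AS MinPurchaseDate
    FROM myTable
    GROUP BY Client_name
) mtmin ON mt.Client_name = mtmin.Client_name
-- SQLite spelling of DATEADD(month, 10, MinPurchaseDate)
WHERE mt.PurchaseDate <= date(mtmin.MinPurchaseDate, '+10 months')
GROUP BY mt.Client_name
ORDER BY mt.Client_name
"""
counts = dict(conn.execute(query).fetchall())
print(counts)  # {'John Lee': 1, 'Rebort Jo': 2, 'Tina Pe': 1}
```

Note that the count includes the first purchase itself; subtract 1 if only follow-up purchases should be counted.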

MAX on group returns multiple values with same date but different times

I have followed many of the excellent pieces of advice on this site about selecting the MAX from a group of rows.
I have a history file and I only want the top date and comments for each project number. I am creating a derived table in a Boxi universe from this information. It all goes pretty well but if there are two entries for the same day but with different times they are both returned. This duplicates that entry on the subsequent report. Is there some way to make the MAX command go down to the time level of the date field?
Database is SQL Server 2005
-------------Sql used for derived table
Select
Projectno, Comment, CreatedOn
from
ReportHistory
Where
ReportHistory.ItemName=('ProjectCode1')
and
CreatedOn in(Select max(CreatedOn) FROM ReportHistory group by Projectno)
-------------------Example database
Projectno Comment Created on
1 Started 2013-01-04 11:04:00
2 Late 2013-01-06 11:22:00
3 Late 2013-01-07 11:06:00
1 On Time 2013-01-08 11:01:00 *these two both get selected*
1 Late 2013-01-08 12:05:00 *these two both get selected*
3 Back on schedule 2013-01-08 14:20:00
2 Still overdue 2013-01-09 09:01:00
MAX on a DATETIME data type does of course take the time into account; that is not what's wrong with your query. The problem is that you are not ensuring that the max value for CreatedOn belongs to the correct ProjectNo. You could use analytical functions for this:
;WITH CTE AS
(
SELECT Projectno,
Comment,
CreatedOn,
ROW_NUMBER() OVER(PARTITION BY ProjectNo ORDER BY CreatedOn DESC) RN
FROM ReportHistory
WHERE ReportHistory.ItemName = 'ProjectCode1'
)
SELECT Projectno, Comment, CreatedOn
FROM CTE
WHERE RN = 1
Query for when no projectno has two rows with the same date (SQL Fiddle example):
SELECT h.Projectno,
h.Comment,
h.[Created on]
FROM ReportHistory h
WHERE h.[Created on] =(Select max(h2.[Created on])
FROM ReportHistory h2
WHERE h2.Projectno = h.Projectno )
ORDER BY h.Projectno
Result:
| PROJECTNO | COMMENT | CREATED ON |
-----------------------------------------------------------------
| 1 | Late | January, 08 2013 12:05:00+0000 |
| 2 | Still overdue | January, 09 2013 09:01:00+0000 |
| 3 | Back on schedule | January, 08 2013 14:20:00+0000 |
Query for when a projectno has several rows with the same date:
SELECT h.Projectno,
MAX(h.Comment) AS Comment,
h.[Created on]
FROM ReportHistory h
WHERE h.[Created on] =(Select max(h2.[Created on])
FROM ReportHistory h2
WHERE h2.Projectno = h.Projectno )
GROUP BY h.Projectno,
h.[Created on]
ORDER BY h.Projectno
I think you get duplicates when the latest date of one project is identical to a date in a different project.
For example, add (4, 'On Time', '2013-01-08 11:01:00') to your data.
Then the result will be this (SQLFiddle),
but you need this result (SQLFiddle):
SELECT *
FROM ReportHistory t
WHERE t.ItemName=('ProjectCode1')
AND EXISTS (
SELECT 1
FROM ReportHistory
WHERE projectNo = t.projectNo
GROUP BY projectNo
HAVING MAX(CreatedOn) = t.CreatedOn
)
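The effect described in the answers can be reproduced with a small sketch: once another project's latest timestamp equals an older timestamp of project 1, the uncorrelated IN list lets that older row through, while a subquery correlated on Projectno does not. The example uses Python's sqlite3 module as a stand-in for SQL Server and leaves out the ItemName filter; both are simplifications assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ReportHistory (Projectno INTEGER, Comment TEXT, CreatedOn TEXT)")
conn.executemany(
    "INSERT INTO ReportHistory VALUES (?, ?, ?)",
    [
        (1, "Started", "2013-01-04 11:04:00"),
        (2, "Late", "2013-01-06 11:22:00"),
        (3, "Late", "2013-01-07 11:06:00"),
        (1, "On Time", "2013-01-08 11:01:00"),
        (1, "Late", "2013-01-08 12:05:00"),
        (3, "Back on schedule", "2013-01-08 14:20:00"),
        (2, "Still overdue", "2013-01-09 09:01:00"),
        # Extra row suggested above: project 4's latest entry shares a
        # timestamp with an older entry of project 1.
        (4, "On Time", "2013-01-08 11:01:00"),
    ],
)

# Uncorrelated: any row whose timestamp equals ANY project's maximum qualifies.
uncorrelated = conn.execute("""
    SELECT Projectno, Comment, CreatedOn
    FROM ReportHistory
    WHERE CreatedOn IN (SELECT MAX(CreatedOn) FROM ReportHistory GROUP BY Projectno)
    ORDER BY Projectno, CreatedOn
""").fetchall()

# Correlated: a row qualifies only if it holds the maximum of its OWN project.
correlated = conn.execute("""
    SELECT h.Projectno, h.Comment, h.CreatedOn
    FROM ReportHistory h
    WHERE h.CreatedOn = (SELECT MAX(h2.CreatedOn)
                         FROM ReportHistory h2
                         WHERE h2.Projectno = h.Projectno)
    ORDER BY h.Projectno
""").fetchall()

print([row[0] for row in uncorrelated])  # [1, 1, 2, 3, 4]
print([row[0] for row in correlated])    # [1, 2, 3, 4]
```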

SQL: How do I count the number of clients that have already bought the same product?

I have a table like the one below. It is a record of daily featured products and the customers that purchased them (similar to a daily deal site). A given client can only purchase a product one time per feature, but they may purchase the same product if it is featured multiple times.
FeatureID | ClientID | FeatureDate | ProductID
1 1002 2011-05-01 500
1 2333 2011-05-01 500
1 4458 2011-05-01 500
2 8888 2011-05-10 700
2 2333 2011-05-10 700
2 1111 2011-05-10 700
3 1002 2011-05-20 500
3 4444 2011-05-20 500
4 4444 2011-05-30 500
4 2333 2011-05-30 500
4 1002 2011-05-30 500
I want to count by FeatureID the number of clients that purchased FeatureID X AND who purchased the same productID during a previous feature.
For the table above the expected result would be:
FeatureID | CountofReturningClients
1 0
2 0
3 1
4 3
Ideally I would like to do this with SQL, but am also open to doing some manipulation in Excel/PowerPivot. Thanks!!
If you join your table to itself, you can find the data you're looking for. Be careful, because this query can take a long time if the table has a lot of data and is not indexed well.
SELECT t_current.FEATUREID, COUNT(DISTINCT t_prior.CLIENTID)
FROM table_name t_current
LEFT JOIN table_name t_prior
ON t_current.FEATUREDATE > t_prior.FEATUREDATE
AND t_current.CLIENTID = t_prior.CLIENTID
AND t_current.PRODUCTID = t_prior.PRODUCTID
GROUP BY t_current.FEATUREID
"Per feature, count the clients who match for any earlier Features with the same product"
SELECT
    Curr.FeatureID,
    COUNT(DISTINCT Prev.ClientID) AS CountofReturningClients --edit thanks to feedback
FROM
    MyTable Curr
    LEFT JOIN
    MyTable Prev ON Curr.FeatureID > Prev.FeatureID
        AND Curr.ClientID = Prev.ClientID
        AND Curr.ProductID = Prev.ProductID
GROUP BY
    Curr.FeatureID
Assumptions: You have a table called Features that is:
FeatureID, FeatureDate, ProductID
If not then you could always create one on the fly with a temporary table, cte or view.
Then:
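SELECT
    f.FeatureID
  , (
      -- Purchases is assumed to be the question's table (one row per client per feature)
      SELECT COUNT(DISTINCT prev.ClientID)
      FROM Purchases cur
      JOIN Purchases prev
        ON prev.ClientID = cur.ClientID
       AND prev.ProductID = cur.ProductID
       AND prev.FeatureDate < cur.FeatureDate
      WHERE cur.FeatureID = f.FeatureID
    ) as CountOfReturningClients
FROM Features f
ORDER BY f.FeatureID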
New to this, but wouldn't the following work?
SELECT FeatureID, (CASE WHEN COUNT(clientid) > 1 THEN COUNT(clientid) ELSE 0 END)
FROM table
GROUP BY featureID
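A quick way to compare the suggestions is to run them against the question's sample rows. The sketch below does that for the self-join and for the plain per-feature count, using Python's sqlite3 module as a stand-in engine; the table name purchases is an assumption, since the question does not name its table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (FeatureID INTEGER, ClientID INTEGER, "
    "FeatureDate TEXT, ProductID INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 1002, "2011-05-01", 500),
        (1, 2333, "2011-05-01", 500),
        (1, 4458, "2011-05-01", 500),
        (2, 8888, "2011-05-10", 700),
        (2, 2333, "2011-05-10", 700),
        (2, 1111, "2011-05-10", 700),
        (3, 1002, "2011-05-20", 500),
        (3, 4444, "2011-05-20", 500),
        (4, 4444, "2011-05-30", 500),
        (4, 2333, "2011-05-30", 500),
        (4, 1002, "2011-05-30", 500),
    ],
)

# Self-join: match each purchase to earlier purchases of the same product
# by the same client, then count the distinct matched clients per feature.
self_join = dict(conn.execute("""
    SELECT cur.FeatureID, COUNT(DISTINCT prev.ClientID)
    FROM purchases cur
    LEFT JOIN purchases prev
      ON prev.ClientID = cur.ClientID
     AND prev.ProductID = cur.ProductID
     AND prev.FeatureDate < cur.FeatureDate
    GROUP BY cur.FeatureID
""").fetchall())

# Plain count per feature: it never looks at earlier features.
plain_count = dict(conn.execute("""
    SELECT FeatureID,
           CASE WHEN COUNT(ClientID) > 1 THEN COUNT(ClientID) ELSE 0 END
    FROM purchases
    GROUP BY FeatureID
""").fetchall())

print(self_join)    # {1: 0, 2: 0, 3: 1, 4: 3}
print(plain_count)  # {1: 3, 2: 3, 3: 2, 4: 3}
```

On this data only the self-join reproduces the expected result. The plain count returns the number of buyers per feature, because it has no way to tell whether a client bought the product before.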