SQL query to find sum of all rows and count of duplicates

If data is in the following format:
SID TID Tdatetime QID QTotal
----------------------------------------
100 1 01/12/97 9:00AM 66 110
100 1 01/12/97 9:00AM 66 110
100 1 01/12/97 10:00AM 67 110
100 2 01/19/97 9:00AM 66 .
100 2 01/19/97 9:00AM 66 110
100 2 01/19/97 10:00AM 66 110
100 3 01/26/97 9:00AM 68 120
100 3 01/26/97 9:00AM 68 120
110 1 02/03/97 10:00AM 68 110
110 3 02/12/97 9:00AM 64 115
110 3 02/12/97 9:00AM 64 115
120 1 04/05/97 9:00AM 66 105
120 1 04/05/97 10:00AM 66 105
I would like to be able to write a query to sum the QTotal column for all rows and find the count of duplicate rows for the Tdatetime column.
The output would look like:
Year Total Count
97 | 1340 | 4
The third column in the result does not include the count of distinct rows in the table. And the output is grouped by the year in the TDateTime column.

The following query may help:
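SELECT
'YEAR ' + CAST(sub.theYear AS VARCHAR(4)) AS theYear,
-- one row per duplicated Tdatetime value, so this counts the duplicate groups
COUNT(sub.C) AS duplicateCount,
(SELECT SUM(QTotal) FROM MyTable WHERE YEAR(Tdatetime) = sub.theYear) AS total
FROM
(SELECT
YEAR(Tdatetime) AS theYear,
COUNT(Tdatetime) AS C
FROM MyTable
GROUP BY Tdatetime, YEAR(Tdatetime)
HAVING COUNT(Tdatetime) >= 2) AS sub
-- the outer aggregate needs its own GROUP BY on the year
GROUP BY sub.theYear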

This will work if you really want to group by the tDateTime column:
SELECT tDateTime, SUM(QTotal), COUNT(*)
FROM Table
GROUP BY tDateTime
HAVING COUNT(*) > 1
But your results look like you want to group by the Year in the tDateTime column. Is this correct?
If so try this:
SELECT DISTINCT YEAR (tDateTime), SUM(QTotal), Count(distinct tDateTime)
FROM Table
GROUP BY YEAR (tDateTime)
HAVING Count(distinct tDateTime) > 1

You must do a SELECT from this table GROUPing by QTotal, using COUNT(subSELECT from this table WHERE QTotal is the same). If only I had time I would write you the SQL statement, but it'll take some minutes.

Something like:
select Year(Tdatetime) ,sum(QTotal), count(1) from table group by year(Tdatetime )
or full date
select Tdatetime ,sum(QTotal), count(1) from table group by Tdatetime
Or your ugly syntax ( : ) )
select 'Year ' + cast(Year(tdatetime) as varchar(4))
+ '|' + cast(sum(QTotal) as varchar(31))
+ '|' + cast(count(1) as varchar(31))
from table group by year(Tdatetime )
Or do you want just the year? Sum all columns? Or just by year?

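SELECT
'Year ' + CAST(YEAR(t.Tdatetime) AS VARCHAR(4)) AS [Year],
SUM(t.QTotal) AS Total,
-- number of Tdatetime values that occur more than once (not correlated to the year)
(SELECT COUNT(*) FROM (
SELECT Tdatetime FROM MyTable GROUP BY Tdatetime
HAVING COUNT(QID) > 1) C) AS [Count]
FROM
MyTable t
GROUP BY
YEAR(t.Tdatetime)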
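For anyone who wants to check the expected 1340 / 4 against the sample rows, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. It is not the SQL Server query from the answers: the table name MyTable is borrowed from the first answer, the dates are rewritten as ISO strings, strftime('%Y', ...) stands in for YEAR(), and the "." in the sample is loaded as NULL. Unlike the query above, the duplicate count here is computed per year.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption
# made so SQLite can parse them), and the "." QTotal loaded as NULL.
rows = [
    (100, 1, "1997-01-12 09:00", 66, 110),
    (100, 1, "1997-01-12 09:00", 66, 110),
    (100, 1, "1997-01-12 10:00", 67, 110),
    (100, 2, "1997-01-19 09:00", 66, None),
    (100, 2, "1997-01-19 09:00", 66, 110),
    (100, 2, "1997-01-19 10:00", 66, 110),
    (100, 3, "1997-01-26 09:00", 68, 120),
    (100, 3, "1997-01-26 09:00", 68, 120),
    (110, 1, "1997-02-03 10:00", 68, 110),
    (110, 3, "1997-02-12 09:00", 64, 115),
    (110, 3, "1997-02-12 09:00", 64, 115),
    (120, 1, "1997-04-05 09:00", 66, 105),
    (120, 1, "1997-04-05 10:00", 66, 105),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (SID INT, TID INT, Tdatetime TEXT, QID INT, QTotal INT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT totals.yr, totals.total, COALESCE(dups.dup_count, 0) AS dup_count
FROM (SELECT strftime('%Y', Tdatetime) AS yr, SUM(QTotal) AS total
      FROM MyTable
      GROUP BY strftime('%Y', Tdatetime)) AS totals
LEFT JOIN (SELECT d.yr, COUNT(*) AS dup_count
           FROM (SELECT strftime('%Y', Tdatetime) AS yr
                 FROM MyTable
                 GROUP BY Tdatetime          -- one group per distinct timestamp
                 HAVING COUNT(*) > 1) AS d   -- keep only repeated timestamps
           GROUP BY d.yr) AS dups
       ON dups.yr = totals.yr
ORDER BY totals.yr
"""
result = conn.execute(query).fetchall()
print(result)
```

SUM skips the NULL QTotal, so the total is taken over the twelve rows that have a value.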

This is the first time I have asked a question on stackoverflow. It looks like I have lost my original ID info. I had to register to login and add comments to the question I posted.
To answer OMG Ponies question, this is a SQL Server 2008 database.
@Abe Miessler, the rows with SID 120 do not contain duplicates: the first row for SID 120 shows 9:00AM in the datetime column, and the second row shows 10:00AM.
@Zafer, your query is the accepted answer. I made a few minor tweaks to get it to work. Thanks.
Thanks due to Abe Miessler and the others for your help.

Related

SQL How to calculate Average time between Order Purchases? (do sql calculations based on next and previous row)

I have a simple table that contains the customer email, their order count (so if this is their 1st order, 3rd, 5th, etc), the date that order was created, the value of that order, and the total order count for that customer.
Here is what my table looks like
Email Order Date Value Total
r2n1w@gmail.com 1 12/1/2016 85 5
r2n1w@gmail.com 2 2/6/2017 125 5
r2n1w@gmail.com 3 2/17/2017 75 5
r2n1w@gmail.com 4 3/2/2017 65 5
r2n1w@gmail.com 5 3/20/2017 130 5
ation@gmail.com 1 2/12/2018 150 1
ylove@gmail.com 1 6/15/2018 36 3
ylove@gmail.com 2 7/16/2018 41 3
ylove@gmail.com 3 1/21/2019 140 3
keria@gmail.com 1 8/10/2018 54 2
keria@gmail.com 2 11/16/2018 65 2
What I want to do is calculate the average time between purchases for each customer. So let's take customer ylove. The first purchase is on 6/15/18. The next one is 7/16/18, so that's 31 days, and the next purchase is on 1/21/2019, so that is 189 days. The average time between orders would be 110 days.
But I have no idea how to make SQL look at the next row and calculate based on that, but then restart when it reaches a new customer.
Here is my query to get that table:
SELECT
F.CustomerEmail
,F.OrderCountBase
,F.Date_Created
,F.Total
,F.TotalOrdersBase
FROM #FullBase F
ORDER BY f.CustomerEmail
If anyone can give me some suggestions, that would be greatly appreciated.
And then maybe I can calculate value differences (in percentage). So for example, ylove spent $36 on their first order and $41 on their second, which is a 14% increase. Then their third order was $140, which is a 241% increase. So on average, this customer increased their purchase order value by about 128%. Unrelated to SQL, but is this the correct way of calculating a metric like this?
Looking at your sample, you could try taking the difference between the min and max dates and dividing it by the number of gaps (total - 1):
select email, datediff(day, min(Order_Date), max(Order_Date))/(total-1) as avg_days
from your_table
group by email, total
To also handle customers with only one order (where total-1 is 0):
select email,
case when total-1 > 0 then
datediff(day, min(Order_Date), max(Order_Date))/(total-1)
else datediff(day, min(Order_Date), max(Order_Date)) end as avg_days
from your_table
group by email, total
The simplest formulation is:
select email,
datediff(day, min(Order_Date), max(Order_Date)) / nullif(total-1, 0) as avg_days
from t
group by email, total;
You can see this is the case. Consider three orders with od1, od2, and od3 as the order dates. The average is:
( (od2 - od1) + (od3 - od2) ) / 2
Check the arithmetic:
--> ( od2 - od1 + od3 - od2 ) / 2
--> ( od3 - od1 ) / 2
This pretty obviously generalizes to more orders.
Hence the max() minus min().
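The telescoping argument can be checked numerically with ylove's three order dates from the question. This is only a quick sanity check in Python, not part of the SQL answer; the variable names are made up for illustration.

```python
from datetime import date

# ylove's order dates from the question's sample table
orders = [date(2018, 6, 15), date(2018, 7, 16), date(2019, 1, 21)]

# Gaps between consecutive orders, in days
gaps = [(later - earlier).days for earlier, later in zip(orders, orders[1:])]

# Average of the individual gaps
mean_of_gaps = sum(gaps) / len(gaps)

# Shortcut: (max - min) divided by the number of gaps
shortcut = (max(orders) - min(orders)).days / (len(orders) - 1)

print(gaps, mean_of_gaps, shortcut)
```

Both routes give 110 days, which matches the figure worked out by hand in the question.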

Combining Two Tables & Summing REV amts by Mth

Below are my two tables of data
Acct BillingDate REV
101 01/05/2018 5
101 01/30/2018 4
102 01/15/2018 2
103 01/4/2018 3
103 02/05/2018 2
106 03/06/2018 5
Acct BillingDate Lease_Rev
101 01/15/2018 2
102 01/16/2018 1
103 01/19/2018 2
104 02/05/2018 3
105 04/02/2018 1
Desired Output
Acct  Jan  Feb  Mar  Apr
101   11
102   3
103   5    2
104        3
105                  1
106             5
My SQL Script is Below:
SELECT [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,SUM(case when [NewSalesHistory].[billingdate] between '6/1/2016' and '6/30/2016' then REV else 0 end ) + [X].[Jun-16] AS 'Jun-16'
FROM [NewSalesHistory]
FULL join (SELECT [Account]
,SUM(case when [BWLease].[billingdate] between '6/1/2016' and '6/30/2016' then Lease_REV else 0 end ) as 'Jun-16'
FROM [AirgasPricing].[dbo].[BWLease]
GROUP BY [Account]) X ON [NewSalesHistory].[Account] = [X].[Account]
GROUP BY [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,[X].[Jun-16]
I am having trouble combining these tables. If there is a rev amt and lease rev amt then it will combine (sum) for that account. If there is not a lease rev amt (which is the majority of the time), it brings back NULLs for all other rev amts accounts in Table 1. Table one can have duplicate accounts with different Rev, while the Table two is one unique account only w Lease rev. The output above is how I would like to see the data.
What am I missing here? Thanks!
I would suggest union all and group by:
select acct,
sum(case when billingdate >= '2016-01-01' and billingdate < '2016-02-01' then rev end) as rev_201601,
sum(case when billingdate >= '2016-02-01' and billingdate < '2016-03-01' then rev end) as rev_201602,
. . .
from ((select nsh.acct, nsh.billingdate, nsh.rev
from NewSalesHistory nsh
) union all
(select bl.acct, bl.billingdate, bl.lease_rev as rev
from AirgasPricing..BWLease bl
)
) x
group by acct;
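To see the union all plus conditional sum pattern reproduce the desired output, here is a minimal sketch run against the two sample tables in SQLite via Python's sqlite3 module. Assumptions: the column names follow the sample tables (Acct, BillingDate, REV, Lease_Rev), the dates are stored as ISO strings so plain string comparison works, and the month ranges use 2018 to match the sample data rather than the 2016 ranges in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NewSalesHistory (Acct INT, BillingDate TEXT, REV INT);
CREATE TABLE BWLease (Acct INT, BillingDate TEXT, Lease_Rev INT);
INSERT INTO NewSalesHistory VALUES
  (101, '2018-01-05', 5), (101, '2018-01-30', 4), (102, '2018-01-15', 2),
  (103, '2018-01-04', 3), (103, '2018-02-05', 2), (106, '2018-03-06', 5);
INSERT INTO BWLease VALUES
  (101, '2018-01-15', 2), (102, '2018-01-16', 1), (103, '2018-01-19', 2),
  (104, '2018-02-05', 3), (105, '2018-04-02', 1);
""")

# Stack both revenue sources into one (Acct, BillingDate, rev) set first,
# then pivot by month with one conditional SUM per output column.
query = """
SELECT Acct,
  SUM(CASE WHEN BillingDate >= '2018-01-01' AND BillingDate < '2018-02-01' THEN rev END) AS Jan,
  SUM(CASE WHEN BillingDate >= '2018-02-01' AND BillingDate < '2018-03-01' THEN rev END) AS Feb,
  SUM(CASE WHEN BillingDate >= '2018-03-01' AND BillingDate < '2018-04-01' THEN rev END) AS Mar,
  SUM(CASE WHEN BillingDate >= '2018-04-01' AND BillingDate < '2018-05-01' THEN rev END) AS Apr
FROM (SELECT Acct, BillingDate, REV AS rev FROM NewSalesHistory
      UNION ALL
      SELECT Acct, BillingDate, Lease_Rev AS rev FROM BWLease) AS x
GROUP BY Acct
ORDER BY Acct
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Months with no rows come back as NULL (None in Python) rather than 0, which corresponds to the blank cells in the desired output. Accounts 104 and 105 appear even though they exist only in the lease table, which is what the join in the question was losing.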
Okay, so there are a few things going on here:
1) As Gordon Linoff mentioned you can perform a union all on the two tables. Be sure to limit your column selections and name your columns appropriately:
select
x as consistentname1,
y as consistentname2,
z as consistentname3
from [NewSalesHistory]
union all
select
a as consistentname1,
b as consistentname2,
c as consistentname3
from [BWLease]
2) Your desired result contains a pivoted month column. Generate a column with your desired granularity on the result of the union in step one. For example, by month:
concat(datepart(yy, Date_),'-',datename(mm,Date_)) as yyyyM
Then perform aggregation using a group by:
select sum(...) as desiredcolumnname
...
group by PK1, PK2, yyyyM
Finally, PIVOT to obtain your result: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
3) If you have other fields/columns that you wish to present then you first need to determine whether they are measures (can be aggregated) or are dimensions. That may be best addressed in a follow up question after you've achieved what you set out for in this part.
Hope it helps
As an aside, it seems like you are preparing data for reporting. Performing these transformations can be facilitated using a GUI such as MS Power Query. As long as your end goal is not data manipulation in the DB itself, you do not need to resort to raw sql.

Output number of occurrences of id in a table

PK Date ID
=== =========== ===
1 07/04/2017 22
2 07/05/2017 22
3 07/07/2017 03
4 07/08/2017 04
5 07/09/2017 22
6 07/09/2017 22
7 07/10/2017 05
8 07/11/2017 03
9 07/11/2017 03
10 07/11/2017 03
I want to count the number of times each ID occurred in a given week/month, something like this.
ID Count
22 3 --> counted only once for a given date, even though it occurred twice on 07/09/2017
03 2 --> same as above, incremented only once regardless of how many times it occurred on the same date
04 1
05 1
I'm trying to implement this in a Perl file and print the output to a CSV file, but I have no idea what query to execute.
Seems like a simple case of count distinct and group by:
SELECT Id, COUNT(DISTINCT [Date]) As [Count]
FROM TableName
WHERE [Date] >= @StartDate
AND [Date] <= @EndDate
GROUP BY Id
ORDER BY [Count] DESC
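As a quick check of the COUNT(DISTINCT ...) approach against the sample rows, here is a minimal sketch in SQLite via Python's sqlite3 module. The table name Visits and column name VisitDate are made up for the example, the dates are rewritten as ISO strings, ID is stored as text so the leading zeros survive, and the date range is passed as ordinary query parameters in place of @StartDate and @EndDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits (PK INT, VisitDate TEXT, ID TEXT)")
conn.executemany(
    "INSERT INTO Visits VALUES (?, ?, ?)",
    [
        (1, "2017-07-04", "22"), (2, "2017-07-05", "22"),
        (3, "2017-07-07", "03"), (4, "2017-07-08", "04"),
        (5, "2017-07-09", "22"), (6, "2017-07-09", "22"),
        (7, "2017-07-10", "05"), (8, "2017-07-11", "03"),
        (9, "2017-07-11", "03"), (10, "2017-07-11", "03"),
    ],
)

# COUNT(DISTINCT VisitDate) counts each date once per ID, however many
# rows that ID has on that date.
query = """
SELECT ID, COUNT(DISTINCT VisitDate) AS cnt
FROM Visits
WHERE VisitDate >= ? AND VisitDate <= ?
GROUP BY ID
ORDER BY cnt DESC, ID
"""
result = conn.execute(query, ("2017-07-01", "2017-07-31")).fetchall()
print(result)
```

The extra ORDER BY on ID only makes the order of the tied rows (04 and 05) deterministic.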
You can use COUNT with DISTINCT e.g.:
SELECT ID, COUNT(DISTINCT Date)
FROM table
GROUP BY ID;
You can read more about how to get the month from a date in "get month from a date" (it also works for year).
Your query will be:
select DATEPART(mm, [Date]) AS [month], COUNT(ID) AS [count] from table group by DATEPART(mm, [Date])
Hope that helped you.

SQL - Get column value based on another column's average between select rows

I've got a table something like..
[DateValueField][Hour][Value]
2014-09-01 1 200
...
2014-09-01 24 400
2014-09-02 1 220
...
2014-09-02 24 200
...
I need the same value for each DateValueField based on the average Value for Hour between 6-12 for example but have that display for all hours, not just 6-12. For instance...
[DateValueField][Hour][Value]
2014-09-01 1 300
...
2014-09-01 24 300
2014-09-02 1 190
...
2014-09-02 24 190
...
Query I'm trying is...
select DateValueField, Hour,
(select avg(Value) as Value from MyTable where Hour
between 6 and 12) as Value from MyTable
where DateValueField between '2014' and '2015'
group by DateValueField, Hour
order by DateValueField, Hour
But it gives me the Value as an average of ALL Values but I need it averaged out for that particular day between the hours I specify.
I'd appreciate some help/advice. Thanks!
You can use a derived table to get the average value between hours 6 and 12 grouped by date and then join that to your original table
select t1.DateValueField, t1.Hour, t2.avg_value
from MyTable t1
join (
select DateValueField, avg(Value) avg_value
from MyTable
where hour between 6 and 12
group by DateValueField
) t2 on t2.DateValueField = t1.DateValueField
order by t1.DateValueField, t1.Hour
Note: You may want to use a left join if some of your dates don't have values between hours 6 and 12 but you still want to retrieve all rows from MyTable.
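Here is a minimal sketch of the derived-table join, run in SQLite via Python's sqlite3 module. The question elides most of its data, so the hour 6 and hour 12 values below are invented purely for illustration; they were picked so that the daily averages come out at the 300 and 190 shown in the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (DateValueField TEXT, Hour INT, Value INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        # Hours 1 and 24 come from the question; hours 6 and 12 are invented.
        ("2014-09-01", 1, 200), ("2014-09-01", 6, 250),
        ("2014-09-01", 12, 350), ("2014-09-01", 24, 400),
        ("2014-09-02", 1, 220), ("2014-09-02", 6, 180),
        ("2014-09-02", 12, 200), ("2014-09-02", 24, 200),
    ],
)

# The derived table computes one average per day over hours 6-12;
# joining it back on the date repeats that average on every hourly row.
query = """
SELECT t1.DateValueField, t1.Hour, t2.avg_value
FROM MyTable t1
JOIN (SELECT DateValueField, AVG(Value) AS avg_value
      FROM MyTable
      WHERE Hour BETWEEN 6 AND 12
      GROUP BY DateValueField) t2
  ON t2.DateValueField = t1.DateValueField
ORDER BY t1.DateValueField, t1.Hour
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every row for a given day carries that day's 6-12 average, including hours 1 and 24, which fall outside the averaged window.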

Calculate MonthlyVolume - PIVOT SQL

I am using SQL server 2008 and I have following Table with millions of rows...Here are few sample records
Serial_Num ReadingDate M_Counter Dyn_Counter
XYZ 3/15/2014 100 190
XYZ 4/18/2014 140 240
XYZ 5/18/2014 200 380
ABC 3/12/2014 45 40
ABC 4/19/2014 120 110
ABC 5/21/2014 130 155
This table will always have only one reading for each month and no missing months....
and I would like to calculate the M_Counter and Dyn_Counter values for each month. For example, for XYZ the calculated counter value for May should be 60 = 200 (05/18/2014 value) - 140 (04/18/2014 value). I would like to insert the data into another table in the following way.
CalculatedYear CalculatedMonth Serial_Num M_Counter_Calc Dyn_Counter_Calc
2014 4 XYZ 40 50
2014 5 XYZ 60 140
2014 4 ABC 75 70
2014 5 ABC 10 45
Any help really appreciated!
If you're using MS SQL, something like this should work. The concept is to sort the dataset based on Serial_Num and ReadingDate. Add a sequential Row ID and store into a temp table. Join the table onto itself such that you match up the current row with the previous row where the serial numbers still match. If there wasn't a prior month's reading, the value will be null. We use Isnull( x, 0) to account for this when doing the calculations.
declare @Temp1 table
(
RowID int,
Serial_Num varchar(3),
ReadingDate datetime,
M_Counter int,
Dyn_Counter int
)
insert into @Temp1
select ROW_NUMBER() over (order by Serial_Num, ReadingDate), *
from MyTable T
select
Year(T1.ReadingDate) As CalculatedYear,
Month(T1.ReadingDate) as CalculatedMonth,
T1.Serial_Num,
T1.M_Counter - ISNULL(T2.M_Counter,0) as Calculated_M_Counter,
T1.Dyn_Counter - isnull(T2.Dyn_Counter,0) as Calculated_Dyn_Counter
from @Temp1 T1
left outer join @Temp1 T2 on T1.RowID = T2.RowID + 1 and T1.Serial_Num = T2.Serial_Num
order by T1.Serial_Num, Year(T1.ReadingDate), Month(T1.ReadingDate)
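The row-number self-join can be tried against the sample readings with the sketch below, which uses SQLite via Python's sqlite3 module instead of SQL Server. Assumptions: the table name Readings is made up, the dates are rewritten as ISO strings, a CTE replaces the table variable, and the SQLite build is 3.25 or newer (needed for ROW_NUMBER). It also swaps the left outer join for an inner join, which is a variation on the answer: the first reading of each serial number has no previous row and is dropped, matching the desired output that lists only April and May.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Readings "
    "(Serial_Num TEXT, ReadingDate TEXT, M_Counter INT, Dyn_Counter INT)"
)
conn.executemany(
    "INSERT INTO Readings VALUES (?, ?, ?, ?)",
    [
        ("XYZ", "2014-03-15", 100, 190),
        ("XYZ", "2014-04-18", 140, 240),
        ("XYZ", "2014-05-18", 200, 380),
        ("ABC", "2014-03-12", 45, 40),
        ("ABC", "2014-04-19", 120, 110),
        ("ABC", "2014-05-21", 130, 155),
    ],
)

# Number the rows by serial number and date, then pair each row with the
# row numbered one lower, provided it belongs to the same serial number.
query = """
WITH numbered AS (
  SELECT ROW_NUMBER() OVER (ORDER BY Serial_Num, ReadingDate) AS RowID,
         Serial_Num, ReadingDate, M_Counter, Dyn_Counter
  FROM Readings
)
SELECT CAST(strftime('%Y', cur.ReadingDate) AS INTEGER) AS CalculatedYear,
       CAST(strftime('%m', cur.ReadingDate) AS INTEGER) AS CalculatedMonth,
       cur.Serial_Num,
       cur.M_Counter - prev.M_Counter AS M_Counter_Calc,
       cur.Dyn_Counter - prev.Dyn_Counter AS Dyn_Counter_Calc
FROM numbered cur
JOIN numbered prev
  ON cur.RowID = prev.RowID + 1
 AND cur.Serial_Num = prev.Serial_Num
ORDER BY cur.Serial_Num, cur.ReadingDate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the original left outer join and ISNULL(..., 0), the March rows would also be returned, carrying the raw counter values as their "calculated" amounts.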