SUM GROUP BY giving undesired result

First 12 rows of Table T1:
Name Status Duration
Todd Active 60
Todd Active 60
Todd Active 60
Todd Schedu 60
Todd Schedu 60
Todd Schedu 120
Todd Schedu 120
Bran Active 30
Bran Active 30
Bran Active 60
Bran No Show 120
Bran No Show 120
If I run this query (or use a DISTINCT without the GROUP BY):
SELECT Name, Status, Duration
FROM Table T1
GROUP BY Name,Status,Duration
I get:
Name Status Duration
Todd Active 60
Todd Schedu 60
Todd Schedu 120
Bran Active 30
Bran Active 60
Bran No Show 120
From the above result, I want SUM(Duration) grouped by Name and Status:
Name Status Duration
Todd Active 60
Todd Schedu 180
Bran Active 90
Bran No Show 120
I'm trying this query to achieve the desired result:
SELECT Name, Status, SUM(Duration)
FROM Table T1
GROUP BY Name,Status
But I'm getting huge numbers for SUM(Duration). It's probably adding all the durations, not just the distinct durations, for each group of Name and Status.

One method to get what you want uses a subquery:
SELECT Name, Status, SUM(Duration)
FROM (SELECT Name, Status, Duration
FROM Table T1
GROUP BY Name,Status,Duration
) NSD
GROUP BY Name, Status;

You can use DISTINCT inside the SUM function. It will give you the expected result:
SELECT Name, Status, SUM(DISTINCT Duration)
FROM T1
GROUP BY Name,Status
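To see why the plain SUM inflates the totals and how the subquery and SUM(DISTINCT) answers above compare, here is a minimal sketch using Python's built-in sqlite3 module. It assumes only the 12 sample rows from the question; the in-memory SQLite table is a stand-in for the asker's real database, which is not named in the question.

```python
import sqlite3

# In-memory stand-in for the asker's table (assumption: only the 12 sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (Name TEXT, Status TEXT, Duration INTEGER)")
rows = [
    ("Todd", "Active", 60), ("Todd", "Active", 60), ("Todd", "Active", 60),
    ("Todd", "Schedu", 60), ("Todd", "Schedu", 60),
    ("Todd", "Schedu", 120), ("Todd", "Schedu", 120),
    ("Bran", "Active", 30), ("Bran", "Active", 30), ("Bran", "Active", 60),
    ("Bran", "No Show", 120), ("Bran", "No Show", 120),
]
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", rows)

# Plain SUM adds every row, duplicates included.
plain = conn.execute(
    "SELECT Name, Status, SUM(Duration) FROM T1 "
    "GROUP BY Name, Status ORDER BY Name, Status"
).fetchall()

# SUM(DISTINCT ...) adds each distinct duration once per group.
distinct = conn.execute(
    "SELECT Name, Status, SUM(DISTINCT Duration) FROM T1 "
    "GROUP BY Name, Status ORDER BY Name, Status"
).fetchall()

# De-duplicating in a subquery first gives the same totals.
subquery = conn.execute(
    "SELECT Name, Status, SUM(Duration) FROM ("
    "  SELECT Name, Status, Duration FROM T1 GROUP BY Name, Status, Duration"
    ") NSD GROUP BY Name, Status ORDER BY Name, Status"
).fetchall()

print(plain)
print(distinct)
print(subquery)
```

On this sample, the two de-duplicating forms agree with each other and with the desired result in the question, while the plain SUM is larger wherever a duration repeats.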

You could use a CTE:
WITH C1 AS(
SELECT Name, Status, Duration
FROM Table T1
GROUP BY Name,Status,Duration
)
SELECT Name,Status,SUM(Duration) FROM C1 GROUP BY Name,Status

with temp_cte
as
(select Name, Status, Duration
FROM dbo.test2
group by name,status,duration
)
select tc.name,tc.status,sum(tc.duration) from temp_cte as tc
group by tc.name,tc.status
order by name

Related

Frequency of Address changes in number of days SQL

Hi, I'm trying to find out how frequently a business changes its address. I've got two tables, one with trading addresses and the other with office addresses. The complicated part is that one id can have several sequence numbers. I need to find the difference between one address's create date and the next address's create date.
Trading address table
ID Create_date Seq_no Address
1 2002-03-23 1 20 bottle way
1 2002-05-23 2 12 sunset blvd
2 2003-01-14 1 76 moonrise ct
Office address table
ID Create_date Seq_no Address
1 2004-02-13 1 12 paper st
2 2005-03-01 1 30 pencil way
2 2005-04-01 2 25 mouse rd
2 2005-08-01 3 89 glass cct
My result set will be
Difference NumberOfID's
30 days 1
60 days 1
120 days 1
Other 2

I think I solved it. The steps are:
Did a union and created a separate column to find the actual sequence number for the union set.
Used the LEAD function to create a separate column that brings up the next date.
Used a date difference to find the actual difference between consecutive dates for each id.
Used a CASE statement to categorize the days and count the ids.
WITH BASE AS (
SELECT ID,SEQ_NO,CREATE_DATE
FROM TradingAddress
UNION ALL
SELECT ID,SEQ_NO,CREATE_DATE
FROM OfficeAddress
),
WORKINGS AS (
SELECT ID,CREATE_DATE,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY CREATE_DATE ASC) AS SNO,
LEAD(CREATE_DATE) OVER (PARTITION BY ID ORDER BY CREATE_DATE) AS REF_DATE,
DATEDIFF(DAY,CREATE_DATE,LEAD(CREATE_DATE) OVER (PARTITION BY ID ORDER BY CREATE_DATE)) AS DATE_DIFFERENCE
FROM BASE
),
WORKINGS_2 AS (
SELECT *,
CASE WHEN DATE_DIFFERENCE BETWEEN 1 AND 30 THEN '1-30 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 31 AND 60 THEN '31-60 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 61 AND 90 THEN '61-90 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 91 AND 120 THEN '91-120 DAYS'
ELSE 'MORE THAN 120 DAYS'
END AS DIFFERENCE_DAYS
FROM WORKINGS
WHERE REF_DATE IS NOT NULL
)
SELECT DIFFERENCE_DAYS,COUNT(DIFFERENCE_DAYS) AS NUMBEROFIDS
FROM WORKINGS_2
GROUP BY DIFFERENCE_DAYS
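The same union, LEAD, and bucketing steps can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: it loads the sample rows from the question, and because SQLite has no DATEDIFF it substitutes a julianday() subtraction, which is not what the answer above uses. It also needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TradingAddress (ID INTEGER, CREATE_DATE TEXT, SEQ_NO INTEGER, ADDRESS TEXT);
CREATE TABLE OfficeAddress  (ID INTEGER, CREATE_DATE TEXT, SEQ_NO INTEGER, ADDRESS TEXT);
INSERT INTO TradingAddress VALUES
  (1, '2002-03-23', 1, '20 bottle way'),
  (1, '2002-05-23', 2, '12 sunset blvd'),
  (2, '2003-01-14', 1, '76 moonrise ct');
INSERT INTO OfficeAddress VALUES
  (1, '2004-02-13', 1, '12 paper st'),
  (2, '2005-03-01', 1, '30 pencil way'),
  (2, '2005-04-01', 2, '25 mouse rd'),
  (2, '2005-08-01', 3, '89 glass cct');
""")

query = """
WITH BASE AS (
  SELECT ID, CREATE_DATE FROM TradingAddress
  UNION ALL
  SELECT ID, CREATE_DATE FROM OfficeAddress
),
WORKINGS AS (
  -- REF_DATE is the next address date for the same ID, if any
  SELECT ID, CREATE_DATE,
         LEAD(CREATE_DATE) OVER (PARTITION BY ID ORDER BY CREATE_DATE) AS REF_DATE
  FROM BASE
),
WORKINGS_2 AS (
  -- julianday() subtraction stands in for DATEDIFF(DAY, ...)
  SELECT CAST(julianday(REF_DATE) - julianday(CREATE_DATE) AS INTEGER) AS DATE_DIFFERENCE
  FROM WORKINGS
  WHERE REF_DATE IS NOT NULL
),
BUCKETS AS (
  SELECT CASE WHEN DATE_DIFFERENCE BETWEEN 1 AND 30 THEN '1-30 DAYS'
              WHEN DATE_DIFFERENCE BETWEEN 31 AND 60 THEN '31-60 DAYS'
              WHEN DATE_DIFFERENCE BETWEEN 61 AND 90 THEN '61-90 DAYS'
              WHEN DATE_DIFFERENCE BETWEEN 91 AND 120 THEN '91-120 DAYS'
              ELSE 'MORE THAN 120 DAYS'
         END AS DIFFERENCE_DAYS
  FROM WORKINGS_2
)
SELECT DIFFERENCE_DAYS, COUNT(*) AS NUMBEROFIDS
FROM BUCKETS
GROUP BY DIFFERENCE_DAYS
ORDER BY DIFFERENCE_DAYS
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

Note that the count is per address change, not per distinct id, so an id with several changes in the same bucket is counted more than once.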

You can do it this way:
SELECT DATEDIFF(day, t1.create_date, t2.create_date) AS 'yourdats', COUNT(*) AS ids
FROM test1 t1
JOIN test2 t2 ON t1.id = t2.id
GROUP BY DATEDIFF(day, t1.create_date, t2.create_date)

Inner Join - special time conditions

Given an hourly table A with full heart_rate records, e.g.:
User Hour Heart_rate
Joe 1 60
Joe 2 70
Joe 3 72
Joe 4 75
Joe 5 68
Joe 6 71
Joe 7 78
Joe 8 83
Joe 9 85
Joe 10 80
And a table B with the subset of hours where a purchase happened, e.g.:
User Hour Purchase
Joe 3 'Soda'
Joe 9 'Coke'
Joe 10 'Doughnut'
I want to keep only those records from A that are in B or at most 2 hours before a record in B, without duplication, while preserving both the heart_rate from A and the item purchased from B, so the outcome is:
User Hour Heart_rate Purchase
Joe 1 60 null
Joe 2 70 null
Joe 3 72 'Soda'
Joe 7 78 null
Joe 8 83 null
Joe 9 85 'Coke'
Joe 10 80 'Doughnut'
How can this result be achieved with an inner join, without duplication (in this case of hours 8 and 9)? This is an MWE; assume multiple users, and timestamps instead of hours.
The obvious solution is to combine
Inner Join + deduplication
Left join
Can this be achieved in a more elegant way?

You could use an INNER join of the tables and conditional aggregation for the deduplication:
SELECT a.User, a.Hour, a.Heart_rate,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
WHERE a.User = 'Joe' -- remove this line if you want results for all users
GROUP BY a.User, a.Hour, a.Heart_rate;
Or with MAX() window function:
SELECT DISTINCT a.*,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) OVER (PARTITION BY a.User, a.Hour) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour;
See the demo (for MySQL, but it is standard SQL).
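As a quick check of the conditional-aggregation query, the following sketch loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption here (the answer's demo targets MySQL), and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (User TEXT, Hour INTEGER, Heart_rate INTEGER)")
conn.execute("CREATE TABLE b (User TEXT, Hour INTEGER, Purchase TEXT)")

# Hours 1..10 with the heart rates from the question's table A.
heart_rates = [60, 70, 72, 75, 68, 71, 78, 83, 85, 80]
conn.executemany(
    "INSERT INTO a VALUES ('Joe', ?, ?)",
    list(enumerate(heart_rates, start=1)),
)
conn.executemany(
    "INSERT INTO b VALUES ('Joe', ?, ?)",
    [(3, "Soda"), (9, "Coke"), (10, "Doughnut")],
)

# Each row of a joins to every purchase 0-2 hours ahead of it; the GROUP BY
# collapses the duplicates and MAX(CASE ...) keeps only a same-hour purchase.
result = conn.execute("""
    SELECT a.User, a.Hour, a.Heart_rate,
           MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) AS Purchase
    FROM a INNER JOIN b
      ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
    GROUP BY a.User, a.Hour, a.Heart_rate
    ORDER BY a.User, a.Hour
""").fetchall()

for row in result:
    print(row)
```

Hours 8 and 9 each match two purchases in the join, but appear once in the output because of the grouping.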

Your solutions should work and sound good.
There is another way, using 3 SELECT statements.
The inner SELECT combines both tables with UNION ALL. Because only tables with the same columns can be combined, fields that exist in only one table have to be defined in the other one as well and set to null. The column hour_eat is added to see when the last purchase occurred. By sorting this table, we can achieve that under each row from table B lies the row of table A which occurs next.
In the middle SELECT statement, lag(Purchase) gets the last Purchase. If we only consider the rows from the first table, the Purchase value from the second table is now in the right place. This comes in handy if timestamps rather than fixed hours are used. The line with last_value calculates the time between the purchase and the measurement of the heart rate.
The outer SELECT filters the rows of interest: the last 2 hours before the purchase, and only the rows of the first table.
With
heart_tbl as (SELECT "Joe" as USER, row_number() over() Hour, Heart_rate from unnest([60,72,72,75,68,71,78,83,85,80]) Heart_rate ),
eat_tbl as (Select "Joe" as User ,3 Hour , 'Soda' as Purchase UNION ALL SELECT "Joe", 9, 'Coke' UNION ALL SELECT "Joe", 10, 'Doughnut' )
SELECT user, hour,heart_rate,Purchase_,hours_till_Purchase
from
(
SELECT *,
lag(Purchase) over (order by hour, heart_rate is not null) as Purchase_,
hour-last_value(hour_eat ignore nulls) over (order by hour desc,heart_rate is not null) as hours_till_Purchase
From # combine both tables to one table (ordered by hours)
(
SELECT user, hour,heart_rate, null as Purchase, null as hour_eat from heart_tbl
UNION ALL
Select user, hour, null as heart_rate, Purchase, hour from eat_tbl
)
)
Where heart_rate is not null and hours_till_Purchase >= -2
order by hour

Need sum of column while selecting other value

I have a table like this:
empID name amt Date
------------------------------------
1 mark 20 22-10
1 mark 30 22-10
2 kane 50 22-12
2 kane 60 22-12
3 mike 60 22-10
and I want to get an output like that
empID name amt Date TOTAL
-----------------------------------------
1 mark 20 22-10 220
1 mark 30 22-10 220
2 kane 50 22-12 220
2 kane 60 22-12 220
3 mike 60 22-10 220
I have used sum(amt) but it is returning only 1 row; I want other rows as well.

You can use the window function sum() over() without any partition or order by.
Example:
Select *
,[Total] = sum(amt) over()
From YourTable

You need a window function:
SELECT
empid
,name
,amt
,[date]
,SUM(amt) OVER(PARTITION BY '') AS Total -- as you show it
,SUM(amt) OVER(PARTITION BY empID) AS Total -- as I think you want it
FROM t
Documentation: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15

Since you need to display the grand total on each row, you don't need to partition or group by. Therefore, you can use SUM(amt) OVER ():
SELECT *,
SUM(amt) OVER () AS [Total]
FROM tabe_c
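A small runnable sketch of the empty OVER () clause, using Python's sqlite3 module with the sample rows from the question. The table name YourTable is borrowed from the first answer, and SQLite (3.25 or later, for window functions) is an assumption standing in for the asker's SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (empID INTEGER, name TEXT, amt INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, "mark", 20, "22-10"),
        (1, "mark", 30, "22-10"),
        (2, "kane", 50, "22-12"),
        (2, "kane", 60, "22-12"),
        (3, "mike", 60, "22-10"),
    ],
)

# An empty OVER () makes the whole result set one window, so every row
# keeps its own columns and also carries the grand total.
result = conn.execute(
    "SELECT empID, name, amt, Date, SUM(amt) OVER () AS Total FROM YourTable"
).fetchall()

for row in result:
    print(row)
```

Unlike a plain SUM(amt), which collapses the table to one row, the window form returns all five rows, each with a Total of 220.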

Get next value for each of values from next table CTE

I have the following table:
dbo.split
Name Time
Alex 120
John 80
John 300
Mary 500
Bob 900
And then another table dbo.travel
Name Time
Alex 150
Alex 160
Alex 170
John 90
John 100
John 310
Mary 550
Mary 600
Mary 499
Bob 800
Bob 700
For each value in table split I need to find the next value in table travel. I tried to do it with a CTE with ROW_NUMBER() to get the next value by group, but there's no way I can group by the correct value, since dbo.split can contain multiple values for the same name.
I'm looking for the following output:
Name Time TravelTime
Alex 120 150
John 80 90
John 300 310
Mary 500 550
Bob 900 NULL
Here's what I have so far but it fails because split table can have multiple records per person:
;with result as (
select t.*,
ROW_NUMBER() OVER (Partition BY t.Name order by t.Time) as rn
from travel t join split s
on t.Name = s.Name and t.TIME>s.Time
)

I would use apply:
select s.*, t.time
from split s outer apply
(select top (1) t.*
from travel t
where t.name = s.name and t.time > s.time
order by t.time asc
) t;
In this case, apply is doing essentially the same thing as a correlated subquery, so you could phrase it that way as well.
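The correlated-subquery phrasing mentioned above can be sketched as follows. This uses Python's sqlite3 module and SQLite syntax (LIMIT 1 in place of TOP (1)), which is an assumption made so the example runs anywhere; the data are the sample rows from the question, and ordering by rowid is only there to keep the split rows in their original order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE split (Name TEXT, Time INTEGER)")
conn.execute("CREATE TABLE travel (Name TEXT, Time INTEGER)")
conn.executemany(
    "INSERT INTO split VALUES (?, ?)",
    [("Alex", 120), ("John", 80), ("John", 300), ("Mary", 500), ("Bob", 900)],
)
conn.executemany(
    "INSERT INTO travel VALUES (?, ?)",
    [
        ("Alex", 150), ("Alex", 160), ("Alex", 170),
        ("John", 90), ("John", 100), ("John", 310),
        ("Mary", 550), ("Mary", 600), ("Mary", 499),
        ("Bob", 800), ("Bob", 700),
    ],
)

# For each split row, pick the smallest travel time that is later than it.
# The scalar subquery yields NULL when no such row exists (Bob).
result = conn.execute("""
    SELECT s.Name, s.Time,
           (SELECT t.Time
            FROM travel t
            WHERE t.Name = s.Name AND t.Time > s.Time
            ORDER BY t.Time
            LIMIT 1) AS TravelTime
    FROM split s
    ORDER BY s.rowid
""").fetchall()

for row in result:
    print(row)
```

Because the lookup runs once per split row, a name that appears several times in split (John here) gets its own next value for each row.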

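You can try the following:
Select * from (
Select t.name, t.time, t1.time as TravelTime,
-- number the later travel rows per split row, earliest first
Row_number() over (partition by t.name, t.time order by t1.time) rn
from split t
-- left join keeps split rows that have no later travel row (e.g. Bob)
Left Join travel t1 on t.time < t1.time and t.name = t1.name
) x
where rn = 1;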

SQL select specific group from table

I have a table named trades like this:
id trade_date trade_price trade_status seller_name
1 2015-01-02 150 open Alex
2 2015-03-04 500 close John
3 2015-04-02 850 close Otabek
4 2015-05-02 150 close Alex
5 2015-06-02 100 open Otabek
6 2015-07-02 200 open John
I want to sum up trade_price grouped by seller_name when last (by trade_date) trade_status was 'open'. That is:
sum_trade_price seller_name
700 John
950 Otabek
The rows where seller_name is Alex are skipped because the last trade_status was 'close'.
I can get the desired output with a nested select:
SELECT SUM(t1.trade_price), t1.seller_name
FROM trades t1
WHERE t1.seller_name NOT IN
(SELECT t2.seller_name FROM trades t2
WHERE t2.seller_name = t1.seller_name AND t2.trade_status = 'close'
ORDER BY t2.trade_date DESC LIMIT 1)
GROUP BY t1.seller_name
But it takes more than 1 minute to execute above query (I have approximately 100K rows).
Is there another way to handle it?
I am using PostgreSQL.

I would approach this with window functions:
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name ORDER BY trade_date desc) as last_trade_status
FROM trades t
) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name;

This should perform reasonably with an index on seller_name:
select
sum(trade_price) as sum_trade_price,
seller_name
from
trades
inner join
(
select distinct on (seller_name) seller_name, trade_status
from trades
order by seller_name, trade_date desc
) s using (seller_name)
where s.trade_status = 'open'
group by seller_name
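The window-function approach can be checked against the sample rows with a short sketch in Python's sqlite3 module. SQLite is an assumption standing in for PostgreSQL (it has FIRST_VALUE but not DISTINCT ON), and no claim is made here about performance on the asker's 100K rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER, trade_date TEXT, trade_price INTEGER, "
    "trade_status TEXT, seller_name TEXT)"
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2015-01-02", 150, "open", "Alex"),
        (2, "2015-03-04", 500, "close", "John"),
        (3, "2015-04-02", 850, "close", "Otabek"),
        (4, "2015-05-02", 150, "close", "Alex"),
        (5, "2015-06-02", 100, "open", "Otabek"),
        (6, "2015-07-02", 200, "open", "John"),
    ],
)

# FIRST_VALUE over the dates in descending order tags every row with the
# seller's most recent status; sellers whose latest trade is not 'open'
# are filtered out before the prices are summed.
result = conn.execute("""
    SELECT seller_name, SUM(trade_price) AS sum_trade_price
    FROM (SELECT t.*,
                 FIRST_VALUE(trade_status) OVER (
                     PARTITION BY seller_name ORDER BY trade_date DESC
                 ) AS last_trade_status
          FROM trades t) AS tagged
    WHERE last_trade_status = 'open'
    GROUP BY seller_name
    ORDER BY seller_name
""").fetchall()

print(result)
```

Alex drops out because his latest trade (2015-05-02) is 'close', which matches the expected output in the question.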
