I would like to get a result from a table:
Date Charges
22/04/2010 1764
22/04/2010 200
22/04/2010 761
22/04/2010 3985
22/04/2010 473
22/04/2010 677
22/04/2010 1361
22/04/2010 6232
22/04/2010 4095
23/04/2010 7224
23/04/2010 1748
23/04/2010 1355
23/04/2010 2095
23/04/2010 2063
23/04/2010 2331
23/04/2010 2331
23/04/2010 4473
23/04/2010 478
23/04/2010 1901
23/04/2010 1250
23/04/2010 1743
24/04/2010 1743
24/04/2010 3923
24/04/2010 1575
24/04/2010 1859
24/04/2010 2431
24/04/2010 1208
24/04/2010 158
24/04/2010 3246
24/04/2010 2898
24/04/2010 1517
24/04/2010 2368
24/04/2010 961
24/04/2010 4111
24/04/2010 3066
24/04/2010 740
25/04/2010 2651
25/04/2010 2693
25/04/2010 4847
25/04/2010 312
25/04/2010 1247
25/04/2010 5858
25/04/2010 1040
25/04/2010 941
25/04/2010 942
25/04/2010 1784
25/04/2010 418
25/04/2010 2248
25/04/2010 1834
25/04/2010 418
25/04/2010 2263
26/04/2010 2746
26/04/2010 942
26/04/2010 883
26/04/2010 3339
26/04/2010 3517
26/04/2010 761
26/04/2010 1738
26/04/2010 1370
26/04/2010 1501
26/04/2010 1197
26/04/2010 2452
26/04/2010 209
26/04/2010 1092
26/04/2010 4316
26/04/2010 1208
26/04/2010 1213
26/04/2010 2179
26/04/2010 1213
26/04/2010 1538
26/04/2010 1939
26/04/2010 956
26/04/2010 10715
26/04/2010 4321
26/04/2010 956
26/04/2010 2975
26/04/2010 798
26/04/2010 1738
where it shows the following fields:
Date, Count of >2500, Total of >2500, Total Count, and Grand Total, between 1/4/2010 and 30/4/2010
i.e.
22/4/2010, 3, 14312, 9, 19548
23/4/2010, 2, 11697, 12, 28992
24/4/2010, 5, 17244, 15, 31804
25/4/2010, 4, 16049, 15, 29496
26/4/2010, 7, 31929, 27, 57812
...
...
All help is much appreciated! Thanks in advance.
The basics would be to use SUM and CASE, something like:
SELECT
DATEADD(day,DATEDIFF(day,'20010101',DateTimeActivity),'20010101') as Date,
SUM(CASE WHEN Charges > 2500 THEN 1 ELSE 0 END) as Count2500,
SUM(CASE WHEN Charges > 2500 THEN Charges END) as Sum2500,
COUNT(*) as CountTotal,
SUM(Charges) as SumTotal
FROM
AccActivity
WHERE
DateTimeActivity >= '20100401' and
DateTimeActivity < '20100501'
GROUP BY
DATEADD(day,DATEDIFF(day,'20010101',DateTimeActivity),'20010101')
Updated based on your comment, to use real table/column names. I assume you want to include transactions which occur on 30th April.
Note that I'm using a safe date format for my date literals (YYYYMMDD) - most other formats are ambiguous based on the regional settings on the server.
Also, I'm using DATEADD(day,DATEDIFF(day,'20010101',DateTimeActivity),'20010101') to strip the time component from the datetime. It looks slightly funky, but it's reasonably fast, and the same pattern can be used to do other datetime conversions relatively easily (e.g. if you need to group on months, you can just change both day arguments to month, and the dates will all be set to the 1st of their respective months).
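For example, the month-level version of the same expression (just a sketch of that substitution, not tested against your data) would be:
DATEADD(month, DATEDIFF(month, '20010101', DateTimeActivity), '20010101')
The same change would need to go in both the SELECT and the GROUP BY.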
You can try with:
SELECT date,
count(if(charges>2500, 1, NULL)) as countGt2500,
sum(if(charges>2500, charges, 0)) as totalGt2500,
count(charges) as countTotal,
sum(charges) as sumTotal
FROM yourTable
WHERE date >= '2010/04/01'
AND date <= '2010/04/30'
GROUP BY date;
If you saved the full datetime in the date field, you have to extract the date part from the datetime; to do that you can use the DATE function in the following way:
SELECT DATE(date) as day,
count(if(charges>2500, 1, NULL)) as countGt2500,
sum(if(charges>2500, charges, 0)) as totalGt2500,
count(charges) as countTotal,
sum(charges) as sumTotal
FROM yourTable
WHERE date >= '2010/04/01'
AND date <= '2010/04/30'
GROUP BY day;
REVISED POST
I need a query with the desired output shown in bullet #2. Below is a simple query of the data for a specific inventoryno. Notice that avgcost can fluctuate for any given date. I need the highest avgcost on the most recent date, distinct to the inventoryno.
Note I have included sample snippets for additional reference; however, Stack Overflow links my images instead of pasting them directly here because I am a new OP.
Current query and output
select inventoryno, avgcost, dts
from invtrans
where DTS < '01-JAN-23'
order by dts desc;
INVENTORYNO  AVGCOST    DTS
264          52.36411   12/31/2022
264          52.36411   12/31/2022
264          52.36411   12/31/2022
507          149.83039  12/31/2022
6005         57.45968   12/31/2022
6005         57.45968   12/31/2022
6005         57.45968   12/31/2022
1518         4.05530    12/31/2022
1518         4.05530    12/31/2022
1518         4.05530    12/31/2022
1518         4.15254    12/31/2022
1518         4.15254    12/31/2022
1518         4.1525     12/31/2022
365          0.00000    2/31/2022
365          0.00000    2/31/2022
365          0.00000    2/31/2022
Snippet for above
My proposed query, which doesn't work due to 'not a single-group group function':
Select distinct inventoryno, Max(avgcost), max(dts)
from invtrans
where DTS < '01-JAN-23'
order by inventoryno;
DESIRED OUTPUT
INVENTORYNO  AVGCOST    DTS
264          52.36411   12/31/2022
507          149.83039  12/31/2022
6005         57.45968   12/31/2022
1518         4.15254    12/31/2022
365          0.00000    2/31/2022
Desired for above snippet
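For reference, one way to get exactly that shape in Oracle (a sketch only, assuming the KEEP (DENSE_RANK LAST) aggregate is available in your version) would be:
select inventoryno,
       -- among the rows with the latest dts per inventoryno, take the highest avgcost
       max(avgcost) keep (dense_rank last order by dts) as avgcost,
       max(dts) as dts
from invtrans
where DTS < '01-JAN-23'
group by inventoryno
order by inventoryno;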
I have included the raw table with a few rows below for better context.
Raw table for reference
select * from invtrans
KEY     SOURCE   INVENTORYNO  WAREHOUSENO  QUANTITY  QOH   AVGCOST  DTS        EMPNO  INVTRANSNO  TOTALAMT    CO_ID
1805    INVXFER  223          3            1200      2811  0.78377  5/22/2018  999    112029      940.80000   1
076394  PROJ     223          3            -513      2298  0.78376  5/23/2018  999    112030      -402.19000  1
111722  APVCHR   223          3            3430      5728  0.79380  6/1/2018   999    112033      2862.68000  1
073455  PROJ     223          3            -209      5519  0.79392  6/8/2018   999    112034      -163.86000  1
076142  PROJ     223          3            -75       5444  0.79396  6/12/2018  999    112035      -58.80000   1
073492  PROJ     223          3            -252      5192  0.79411  6/13/2018  999    112036      -197.57000  1
072377  PROJ     223          3            -1200     3992  0.79414  8/22/2018  999    112056      -952.80000  1
If anyone could assist me further, it would be ideal for the query below to contain the 'avgcost' column. Otherwise I can take the fixed query from step 2 and the one below into Excel and combine them there, but I would prefer not to.
Remember, Avgcost NEEDS to be the maximum avgcost based on the most recent date. I cannot figure it out. Thank you.
select inventoryno,
count(inventoryno),
MAX(DTS),
sum(quantity),
sum(totalamt)
from invtrans
where DTS < '01-JAN-23'
group by inventoryno
order by inventoryno;
INVENTORYNO  COUNT(INVENTORYNO)  MAX(DTS)                SUM(QUANTITY)  SUM(TOTALAMT)
1            103                 11/28/2022 7:07:46 AM   75             1153.46
10           888                 9/26/2022 9:31:20 AM    0              0
100          1287                12/31/2022              162            70486.77
1001         241                 11/28/2022 7:27:04 PM   181            14207.43
1002         759                 12/31/2022              566            76424.46
1003         936                 12/31/2022              120            25252.61
1004         263                 11/30/2022 10:48:00 AM  550            1627.62
1005         487                 11/28/2022 5:05:56 PM   750            4435.51
1006         9                   11/23/2022 8:38:05 AM   1311           504.63
1008         13                  11/30/2022 10:48:00 AM  0              0
1009         38                  10/31/2022 6:50:27 AM   90             2680.36
101          535                 12/31/2022              79             48153.44
102          238                 11/28/2022 6:42:01 PM   24             17802.91
1020         2                   12/13/2019              50             119.89
1021         262                 12/31/2022              2000           4844.37
1022         656                 11/23/2022 4:49:35 PM   300            1315.17
1023         1693                12/31/2022              1260           2002.56
1025         491                 11/28/2022 5:05:56 PM   225            864.75
1026         62                  9/23/2022 4:35:14 PM    375            11956.17
1027         109                 10/28/2022 8:44:21 AM   300            2157.97
1028         39                  9/4/2019 12:30:00 AM    50             244.62
Example output of what I ultimately need
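For what it's worth, the KEEP (DENSE_RANK LAST) idea sketched above could be folded into this summary query to carry avgcost along (again just a sketch, assuming Oracle and the column names shown):
select inventoryno,
       count(inventoryno),
       max(dts),
       sum(quantity),
       sum(totalamt),
       -- highest avgcost among the rows sharing the most recent dts
       max(avgcost) keep (dense_rank last order by dts) as avgcost
from invtrans
where DTS < '01-JAN-23'
group by inventoryno
order by inventoryno;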
I'm trying to do a weekly forecast in FBProphet for just 5 weeks ahead. The make_future_dataframe method doesn't seem to be working right: it makes the correct one-week intervals except for one gap between Jul 3 and Jul 5; every other interval is correct at 7 days, i.e. a week. Code and output below:
INPUT DATAFRAME
ds y
548 2010-01-01 3117
547 2010-01-08 2850
546 2010-01-15 2607
545 2010-01-22 2521
544 2010-01-29 2406
... ... ...
4 2020-06-05 2807
3 2020-06-12 2892
2 2020-06-19 3012
1 2020-06-26 3077
0 2020-07-03 3133
CODE
future = m.make_future_dataframe(periods=5, freq='W')
future.tail(9)
OUTPUT
ds
545 2020-06-12
546 2020-06-19
547 2020-06-26
548 2020-07-03
549 2020-07-05
550 2020-07-12
551 2020-07-19
552 2020-07-26
553 2020-08-02
All you need to do is create a dataframe with the dates you need for the predict method; utilizing the make_future_dataframe method is not necessary.
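For example (a minimal sketch, assuming m is your fitted Prophet model and df is the input dataframe shown above, with its dates in the ds column):
import pandas as pd

# build 5 future dates exactly 7 days apart, continuing from the last observed week
last_date = df['ds'].max()
future = pd.DataFrame({'ds': pd.date_range(start=last_date, periods=6, freq='7D')[1:]})

forecast = m.predict(future)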
I need to get the average time between phone calls per agent at one of our call centers. These are the queries I have thus far:
--This query gets all the agentIDs and their callstarts and callends, and as far as I know, attaches an incrementing row-number to them.
drop table #thuranin
SELECT AgentID, CallStart, CallEnd, row_number() over (order by (select NULL)) AS rowInt
INTO #thuranin
FROM Main.CallRecord
WHERE DialerPoolManagementID is null and incomingcall = 0
ORDER BY AgentID
--This query attempts to get the time between each call
drop table #ploto
SELECT nin.AgentID, (CAST(ISNULL(th.CallStart, 0) - nin.CallEnd AS float)) AS AverageTimeBetween
INTO #ploto
FROM #thuranin nin
LEFT JOIN #thuranin AS th ON th.rowInt = (SELECT MIN(rowInt) FROM #thuranin WHERE rowInt > #thuranin.rowInt AND AgentID=th.AgentID)
--This query should average the times by agent
SELECT agentID, AVG(ABS(AverageTimeBetween)) as avGTimebwn
FROM #ploto
WHERE ABS(AverageTimeBetween) > 20000
GROUP BY AgentID
However, this fails (hence why I'm here). It returns values in the -40000's, and I'm not entirely sure why. I need to get the average amount of time between a call end and a call start per agent.
I know that calls at the end of the day to start of the next day could be inflating that number, but I'm unsure of how to deal with that either.
Here's a sample of a hundred rows from the #thuranin temporary table, if it helps:
AgentID CallStart CallEnd rowInt
NULL 2013-05-29 13:48:39.000 2013-05-29 13:57:20.000 139541
191 2013-05-29 13:50:16.000 2013-05-29 13:50:43.000 139581
NULL 2013-05-29 13:52:04.000 2013-05-29 13:52:46.000 139621
115 2013-05-29 13:53:20.000 2013-05-29 13:53:21.000 139661
190 2013-05-29 13:56:27.000 2013-05-29 13:57:59.000 139701
NULL 2013-05-29 13:58:46.000 2013-05-29 13:59:44.000 139741
171 2013-05-03 18:37:07.000 2013-05-03 18:37:14.000 139781
NULL 2013-05-03 18:39:49.000 2013-05-03 18:41:52.000 139821
107 2013-05-03 18:42:32.000 2013-05-03 18:42:38.000 139861
184 2013-05-03 18:45:38.000 2013-05-03 18:46:08.000 139901
NULL 2013-05-03 18:47:07.000 2013-05-03 18:47:57.000 139941
31 2013-06-14 15:22:02.000 2013-06-14 15:22:44.000 139981
31 2013-06-14 15:24:47.000 2013-06-14 15:25:16.000 140021
31 2013-06-14 15:29:10.000 2013-06-14 15:29:11.000 140061
31 2013-06-14 15:33:57.000 2013-06-14 15:34:06.000 140101
31 2013-06-14 15:41:32.000 2013-06-14 15:42:18.000 140141
172 2013-04-24 21:48:47.000 2013-04-24 21:51:45.000 140181
169 2013-04-24 21:50:42.000 2013-04-24 21:50:53.000 140221
65 2013-04-24 21:52:47.000 2013-04-24 21:52:54.000 140261
169 2013-04-24 21:57:49.000 2013-04-24 21:57:57.000 140301
NULL 2013-04-24 22:04:59.000 2013-04-24 22:06:11.000 140341
31 2013-06-20 14:37:45.000 2013-06-20 14:38:29.000 140381
31 2013-06-20 14:40:27.000 2013-06-20 14:41:09.000 140421
31 2013-06-20 14:44:05.000 2013-06-20 14:44:39.000 140461
31 2013-06-20 14:50:53.000 2013-06-20 14:51:17.000 140501
31 2013-06-20 14:58:52.000 2013-06-20 14:59:24.000 140541
31 2013-07-10 19:54:21.000 2013-07-10 19:54:31.000 140581
31 2013-07-10 20:01:24.000 2013-07-10 20:01:51.000 140621
31 2013-07-10 20:06:23.000 2013-07-10 20:07:14.000 140661
31 2013-07-10 20:09:46.000 2013-07-10 20:09:56.000 140701
31 2013-07-10 20:12:10.000 2013-07-10 20:12:49.000 140741
31 2013-07-10 20:14:45.000 2013-07-10 20:14:59.000 140781
175 2013-07-01 22:35:54.000 2013-07-01 22:36:14.000 140821
191 2013-07-01 22:42:29.000 2013-07-01 22:43:43.000 140861
175 2013-07-01 22:49:42.000 2013-07-01 22:49:57.000 140901
107 2013-07-01 22:59:39.000 2013-07-01 23:00:48.000 140941
191 2013-07-01 23:09:52.000 2013-07-01 23:10:52.000 140981
NULL 2013-04-02 15:47:14.000 2013-04-02 15:48:06.000 141021
NULL 2013-04-02 15:48:48.000 2013-04-02 15:49:07.000 141061
NULL 2013-04-02 15:50:03.000 2013-04-02 15:50:53.000 141101
196 2013-04-02 15:52:05.000 2013-04-02 15:52:52.000 141141
NULL 2013-04-02 15:53:03.000 2013-04-02 15:53:06.000 141181
NULL 2013-05-08 16:17:54.000 2013-05-08 16:18:10.000 141221
140 2013-05-08 16:19:53.000 2013-05-08 16:20:05.000 141261
188 2013-05-08 16:21:34.000 2013-05-08 16:38:04.000 141301
NULL 2013-05-08 16:23:22.000 2013-05-08 16:25:02.000 141341
NULL 2013-05-08 16:25:16.000 2013-05-08 16:27:02.000 141381
31 2013-07-01 23:13:21.000 2013-07-01 23:14:24.000 141421
31 2013-07-01 23:24:23.000 2013-07-01 23:25:23.000 141461
31 2013-07-01 23:40:14.000 2013-07-01 23:40:50.000 141501
31 2013-07-01 23:44:35.000 2013-07-01 23:45:18.000 141541
31 2013-07-01 23:51:58.000 2013-07-01 23:54:33.000 141581
31 2013-07-02 13:03:17.000 2013-07-02 13:04:21.000 141621
158 2013-07-02 13:10:14.000 2013-07-02 13:11:09.000 141661
189 2013-07-02 13:13:48.000 2013-07-02 13:13:55.000 141701
202 2013-07-02 13:16:42.000 2013-07-02 13:16:42.000 141741
107 2013-07-02 13:19:31.000 2013-07-02 13:19:48.000 141781
31 2013-07-02 13:22:31.000 2013-07-02 13:24:44.000 141821
NULL 2013-03-21 18:59:22.000 2013-03-21 19:00:20.000 141861
NULL 2013-03-21 19:01:20.000 2013-03-21 19:01:30.000 141901
112 2013-03-21 19:03:29.000 2013-03-21 19:04:02.000 141941
159 2013-03-21 19:05:27.000 2013-03-21 19:06:31.000 141981
169 2013-03-21 19:07:25.000 2013-03-21 19:08:32.000 142021
NULL 2013-03-15 14:03:40.000 2013-03-15 14:04:14.000 142061
NULL 2013-03-15 14:04:41.000 2013-03-15 14:05:01.000 142101
NULL 2013-03-15 14:06:08.000 2013-03-15 14:07:10.000 142141
NULL 2013-03-15 14:07:47.000 2013-03-15 14:08:48.000 142181
65 2013-03-15 14:09:02.000 2013-03-15 14:09:17.000 142221
183 2013-05-21 17:25:14.000 2013-05-21 17:26:29.000 142261
NULL 2013-05-21 17:27:59.000 2013-05-21 17:28:35.000 142301
NULL 2013-05-21 17:31:42.000 2013-05-21 17:32:47.000 142341
166 2013-05-21 17:35:05.000 2013-05-21 17:36:01.000 142381
182 2013-05-21 17:37:38.000 2013-05-21 17:37:48.000 142421
166 2013-05-21 17:39:46.000 2013-05-21 17:40:21.000 142461
166 2013-04-24 22:13:50.000 2013-04-24 22:14:46.000 142501
65 2013-04-24 22:22:18.000 2013-04-24 22:22:21.000 142541
182 2013-04-24 22:25:54.000 2013-04-24 22:26:01.000 142581
116 2013-04-24 22:31:14.000 2013-04-24 22:31:23.000 142621
182 2013-04-24 22:35:55.000 2013-04-24 22:36:10.000 142661
31 2013-06-20 15:12:42.000 2013-06-20 15:13:39.000 142701
31 2013-06-20 15:20:08.000 2013-06-20 15:20:28.000 142741
31 2013-06-20 15:23:29.000 2013-06-20 15:23:45.000 142781
31 2013-06-20 15:26:39.000 2013-06-20 15:27:06.000 142821
31 2013-06-20 15:28:57.000 2013-06-20 15:29:44.000 142861
NULL 2013-04-24 22:37:50.000 2013-04-24 22:38:37.000 142901
NULL 2013-04-24 22:40:07.000 2013-04-24 22:41:41.000 142941
116 2013-04-24 22:45:09.000 2013-04-24 22:45:24.000 142981
187 2013-04-24 22:48:15.000 2013-04-24 22:48:24.000 143021
NULL 2013-04-24 22:54:57.000 2013-04-24 22:55:33.000 143061
NULL 2013-05-01 21:36:20.000 2013-05-01 21:37:44.000 143101
NULL 2013-05-01 21:39:56.000 2013-05-01 21:40:11.000 143141
NULL 2013-05-01 21:43:57.000 2013-05-01 21:46:34.000 143181
NULL 2013-05-01 21:49:29.000 2013-05-01 21:49:43.000 143221
NULL 2013-05-01 21:56:55.000 2013-05-01 21:57:26.000 143261
NULL 2013-05-01 22:03:51.000 2013-05-01 22:04:34.000 143301
85 2013-07-10 20:16:50.000 2013-07-10 20:16:52.000 143341
31 2013-07-10 20:19:46.000 2013-07-10 20:20:00.000 143381
31 2013-07-10 20:24:22.000 2013-07-10 20:25:03.000 143421
31 2013-07-10 20:26:23.000 2013-07-10 20:27:32.000 143461
31 2013-07-10 20:28:03.000 2013-07-10 20:28:51.000 143501
I really tried to understand your #ploto generation but couldn't. For example, I didn't understand the join, what you are selecting, or why you would want to get 0 when it is null (which means 1900/01/01). Also, subtracting datetimes and casting to float is a hard-to-follow way of getting time as a day fraction.
From your description this is what I inferred:
WITH ploto
AS (SELECT AgentId,
CallEnd,
LEAD(CallStart) OVER (PARTITION BY agentId ORDER BY CallStart) AS nextCall
FROM #thuranin)
SELECT AgentId,
AVG(DATEDIFF(SECOND, CallEnd, nextCall)) AS average
FROM ploto
WHERE nextCall > CallEnd
AND DATEDIFF(HOUR, callEnd, nextCall) < 6
GROUP BY AgentID;
With your sample data set output is (in seconds):
AgentId Average
NULL 287
31 326
65 1764
116 826
166 225
169 416
175 808
182 594
191 1569
And here is the SQLFiddle link.
EDIT: Some explanation. You didn't specify a version, so I assumed at least MS SQL 2012. There are CallStarts (and CallEnds) that overlap with the previous call (AgentId NULL); I removed them from the check. I also arbitrarily assumed that if there is more than 6 hours between a CallEnd and the next CallStart, the employee's workday has ended and they have gone home, so that gap shouldn't count.
I have a very large time series dataset, I would like to do a count() on close_p but a sum() on prd_vlm.
open_p high_p low_p close_p tot_vlm prd_vlm
datetime
2005-09-06 16:33:00 1234.25 1234.50 1234.25 1234.25 776 98
2005-09-06 16:34:00 1234.50 1234.75 1234.25 1234.50 1199 423
2005-09-06 16:35:00 1234.50 1234.50 1234.25 1234.50 1330 131
...
2017-06-25 18:41:00 2431.75 2432.00 2431.75 2432.00 5436 189
2017-06-25 18:42:00 2431.75 2432.25 2431.75 2432.25 5654 218
2017-06-25 18:43:00 2432.25 2432.75 2432.25 2432.75 5877 223
2017-06-25 18:44:00 2432.75 2432.75 2432.50 2432.75 5894 17
2017-06-25 18:45:00 2432.50 2432.50 2432.25 2432.25 6098 204
I can achieve this using the following code, but I was wondering if there is a better way of achieving this using an apply function.
group_count = df['close_p'].groupby(pd.TimeGrouper('D')).count()
group_volume = df['prd_vlm'].groupby(pd.TimeGrouper('D')).sum()
grouped = pd.concat([group_count,group_volume], axis=1)
print(grouped)
close_p prd_vlm
datetime
2005-09-06 232 4776.0
2005-09-07 1039 631548.0
2005-09-08 999 544112.0
2005-09-09 810 595044.0
You can use agg and apply different functions to different columns.
df.groupby(pd.TimeGrouper('D')).agg({'close_p':'count','prd_vlm':'sum'})
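On newer pandas versions where pd.TimeGrouper has been removed, the same idea should work with pd.Grouper (a sketch, assuming the datetime index shown above):
df.groupby(pd.Grouper(freq='D')).agg({'close_p': 'count', 'prd_vlm': 'sum'})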
I have a table with columns and values like:
ID Values FirstCol 2ndCol 3rdCol 4thCol 5thCol
1 1stValue 5466 34556 53536 54646 566
1 2ndValue 3544 957 667 1050 35363
1 3rdValue 1040 1041 4647 6477 1045
1 4thValue 1048 3546 1095 1151 65757
2 1stValue 845 5466 86578 885 859
2 2ndValue 35646 996 1300 7101 456467
2 3rdValue 102 46478 565 657 107
2 4thValue 5509 55110 1411 1152 1144
3 1stValue 845 854 847 884 675
3 2ndValue 984 994 4647 1041 1503
3 3rdValue 1602 1034 1034 1055 466
3 4thValue 1069 1610 6111 1124 1144
Now I want a result set in the form below; is this possible with a PIVOT or CASE statement?
ID Cols 1stValue 2ndValue 3rdValue 4thValue
1 FirstCol 5466 3544 1040 1048
1 2ndCol 34556 957 1041 3546
1 3rdCol 53536 667 4647 1095
1 4thCol 54646 1050 6477 1151
1 5thCol 566 35363 1045 65757
2 FirstCol 845 35646 102 5509
2 2ndCol 5466 996 46478 55110
2 3rdCol 86578 1300 565 1411
2 4thCol 885 7101 657 1152
2 5thCol 859 456467 107 1144
3 FirstCol 845 984 1602 1069
3 2ndCol 854 994 1034 1610
3 3rdCol 847 4647 1034 6111
3 4thCol 884 1041 1055 1124
3 5thCol 675 1503 466 1144
Assuming the table name is t1 this should do the trick:
SELECT * FROM t1
UNPIVOT (val FOR name IN ([FirstCol], [2ndCol], [3rdCol], [4thCol], [5thCol])) unpiv
PIVOT (SUM(val) FOR [Values] IN ([1stValue], [2ndValue], [3rdValue], [4thValue])) piv
There's a sorting issue; it'd be good to rename FirstCol to 1stCol, then ORDER BY ID, name would put it in the required order.
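i.e., something like this (a sketch of that suggestion, assuming the source column really is renamed to 1stCol):
SELECT * FROM t1
UNPIVOT (val FOR name IN ([1stCol], [2ndCol], [3rdCol], [4thCol], [5thCol])) unpiv
PIVOT (SUM(val) FOR [Values] IN ([1stValue], [2ndValue], [3rdValue], [4thValue])) piv
ORDER BY ID, name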