Creating Groups of Consecutive Values in Access Query - sql

To be clear, I'm not a developer, I'm just a business analyst trying to achieve something in Access which has stumped me.
I have a table of values as such:
Area Week
232 1
232 2
232 3
232 4
232 5
232 6
232 7
232 8
232 9
232 10
232 11
232 12
232 35
232 36
232 37
232 38
232 39
232 41
232 42
232 43
232 44
232 45
232 46
232 47
232 48
232 49
232 50
232 51
232 52
330 1
330 2
330 3
330 4
330 33
330 34
330 35
330 36
330 37
330 38
330 39
330 40
330 41
330 42
330 43
330 44
330 45
330 47
330 48
330 49
330 50
I would like to write a SQL query in Access that creates groupings as follows:
Area Code Week Start Week End
232 1 12
232 35 39
232 41 52
330 1 4
330 33 45
330 47 50
However, everything I have read leads me to the ROW_NUMBER() function, which is not native to Access.
I'm OK with general queries in Access, but am not very familiar with SQL.
How can I go about achieving my task?
Thanks
Mike

Use another database if you can! MS Access doesn't have good functionality for this (in general); in particular, it has no window functions.
You can do what you want in Access, but it is expensive:
select area, min(week) as [Week Start], max(week) as [Week End]
from (select t.*,
             (select count(*)
              from t as t2
              where t2.area = t.area and t2.week <= t.week
             ) as seqnum
      from t
     ) as t
group by area, (week - seqnum);
The correlated subquery is essentially doing row_number().
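Why this works: within a run of consecutive weeks, both week and seqnum increase by 1 per row, so week - seqnum stays constant across the run and changes whenever a gap appears; grouping on that constant isolates each run. Adapted to the question's schema, a sketch (assuming the table is named WeeklyData; substitute your real table name):

SELECT Area, MIN([Week]) AS [Week Start], MAX([Week]) AS [Week End]
FROM (SELECT w.Area, w.[Week],
             (SELECT COUNT(*)
              FROM WeeklyData AS w2
              WHERE w2.Area = w.Area AND w2.[Week] <= w.[Week]) AS seqnum
      FROM WeeklyData AS w) AS s
GROUP BY Area, ([Week] - seqnum)
ORDER BY Area, MIN([Week]);

Note the cost: the correlated COUNT(*) runs once per row, so this is roughly O(n²) over the table, which is what makes it "expensive".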

Related

pandas df add new column based on proportion of two other columns from another dataframe

I have df1, which has three base columns (loadgroup, cartons, blocks) plus their percentage columns, like this:
loadgroup cartons blocks cartonsPercent blocksPercent
1 2269 14 26% 21%
2 1168 13 13% 19%
3 937 8 11% 12%
4 2753 24 31% 35%
5 1686 9 19% 13%
total 8813 68 100% 100%
The interpretation is like this: 26% of df1's cartons, which is also 21% of its blocks, are assigned to loadgroup 1, etc. We can assume blocks are 1 to 68 and cartons are 1 to 8813.
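As a quick check on where the percentages come from: loadgroup 1 holds 2269/8813 ≈ 26% of the cartons and 14/68 ≈ 21% of the blocks.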
I also have df2, which also has cartons and blocks columns but does not have loadgroup.
My goal is to assign a loadgroup (1-5 as well) to each block in df2 (100 blocks, 29608 cartons in total) while keeping the proportions: for example, loadgroup 1 should again get about 26% of cartons and 21% of blocks, loadgroup 2 about 13% of cartons and 19% of blocks, etc.
df2 is like this:
block cartons
0 533
1 257
2 96
3 104
4 130
5 71
6 68
7 87
8 99
9 51
10 291
11 119
12 274
13 316
14 87
15 149
16 120
17 222
18 100
19 148
20 192
21 188
22 293
23 120
24 224
25 449
26 385
27 395
28 418
29 423
30 244
31 327
32 337
33 249
34 528
35 528
36 494
37 540
38 368
39 533
40 614
41 462
42 350
43 618
44 463
45 552
46 397
47 401
48 397
49 365
50 475
51 379
52 541
53 488
54 383
55 354
56 760
57 327
58 211
59 356
60 552
61 401
62 320
63 368
64 311
65 421
66 458
67 278
68 504
69 385
70 242
71 413
72 246
73 465
74 386
75 231
76 154
77 294
78 275
79 169
80 398
81 227
82 273
83 319
84 177
85 272
86 204
87 139
88 187
89 263
90 90
91 134
92 67
93 115
94 45
95 65
96 40
97 108
98 60
99 102
total: 100 blocks, 29608 cartons
I want to add a loadgroup column to df2 that keeps those proportions as close as possible. How can I do it, please? Thank you very much for the help.
I don't know how to pick the loadgroup based on both the cartons percent and the blocks percent, but generating a random loadgroup based on either one of them is easy.
Here is what I did: I generate 100,000 seeds first; then for each seed I add a column loadgroup1 based on cartons percent and loadgroup2 based on blocks percent, calculate both resulting percentages, compare them with df1's percentages, and record the total absolute difference. Over the 100,000 seeds I take the one with the minimum difference as my solution, which is sufficient for my job.
But this is not the optimal solution, and I am looking for a quick and easy way to do this. Hope somebody can help.
Here is my code.
import numpy as np
import pandas as pd

# Brute-force seed search: for each seed, draw a random loadgroup assignment and
# score how far its cartons/blocks percentages land from df1's targets.
# Assumes CartonsPercent and blocksPercent in df1 are numeric fractions summing
# to 1 (np.random.choice requires that), and that df1's index lines up with the
# loadgroup values so the subtraction below aligns rows correctly.
df = pd.DataFrame()
np.random.seed(10000)
seeds = np.random.randint(1, 1000000, size=100000)

df2.reset_index(inplace=True)  # make 'block' a regular column once, before the loop

for i in range(46530, 46537):  # the full search would be range(len(seeds))
    print(seeds[i])
    np.random.seed(seeds[i])
    # one candidate assignment driven by the cartons split, one by the blocks split
    df2['loadGroup1'] = np.random.choice(df1.loadgroup, len(df2), p=df1.CartonsPercent)
    df2['loadGroup2'] = np.random.choice(df1.loadgroup, len(df2), p=df1.blocksPercent)

    # score the cartons-driven assignment against the target percentages
    three = df2.groupby('loadGroup1').agg(Cartons=('cartons', 'sum'), blocks=('block', 'count'))
    three['CartonsPercent'] = three.Cartons / three.Cartons.sum()
    three['blocksPercent'] = three.blocks / three.blocks.sum()
    four = (df1[['CartonsPercent', 'blocksPercent']] - three[['CartonsPercent', 'blocksPercent']]).abs()
    subdf = pd.DataFrame({'i': [i], 'Seed': [seeds[i]], 'Percent': ['CartonsPercent'], 'AbsDiff': [four.sum().sum()]})
    df = pd.concat([df, subdf])

    # score the blocks-driven assignment the same way
    three = df2.groupby('loadGroup2').agg(Cartons=('cartons', 'sum'), blocks=('block', 'count'))
    three['CartonsPercent'] = three.Cartons / three.Cartons.sum()
    three['blocksPercent'] = three.blocks / three.blocks.sum()
    four = (df1[['CartonsPercent', 'blocksPercent']] - three[['CartonsPercent', 'blocksPercent']]).abs()
    subdf = pd.DataFrame({'i': [i], 'Seed': [seeds[i]], 'Percent': ['blocksPercent'], 'AbsDiff': [four.sum().sum()]})
    df = pd.concat([df, subdf])

df.sort_values(by='AbsDiff', ascending=True, inplace=True)
df = df.head(10)
Actually, the first row of df tells me the seed I am looking for; I kept 10 rows just out of curiosity.
Here is my solution.
block cartons loadgroup
0 533 4
1 257 1
2 96 4
3 104 4
4 130 4
5 71 2
6 68 1
7 87 4
8 99 4
9 51 4
10 291 4
11 119 2
12 274 2
13 316 4
14 87 4
15 149 5
16 120 3
17 222 2
18 100 2
19 148 2
20 192 3
21 188 4
22 293 1
23 120 2
24 224 4
25 449 1
26 385 5
27 395 3
28 418 1
29 423 4
30 244 5
31 327 1
32 337 5
33 249 4
34 528 1
35 528 1
36 494 5
37 540 3
38 368 2
39 533 4
40 614 5
41 462 4
42 350 5
43 618 4
44 463 2
45 552 1
46 397 3
47 401 3
48 397 1
49 365 1
50 475 4
51 379 1
52 541 1
53 488 2
54 383 2
55 354 1
56 760 5
57 327 4
58 211 2
59 356 5
60 552 4
61 401 1
62 320 1
63 368 3
64 311 3
65 421 2
66 458 5
67 278 4
68 504 5
69 385 4
70 242 4
71 413 1
72 246 2
73 465 5
74 386 4
75 231 1
76 154 4
77 294 4
78 275 1
79 169 4
80 398 4
81 227 4
82 273 1
83 319 3
84 177 4
85 272 5
86 204 3
87 139 1
88 187 4
89 263 4
90 90 4
91 134 4
92 67 3
93 115 3
94 45 2
95 65 2
96 40 4
97 108 2
98 60 2
99 102 1
Here are the summaries.
loadgroup cartons blocks cartonsPercent blocksPercent
1 7610 22 26% 22%
2 3912 18 13% 18%
3 3429 12 12% 12%
4 9269 35 31% 35%
5 5388 13 18% 13%
It's very close to my target though.

BigQuery SQL: determine the number of daily transactions given a moving counter

I've been stuck for hours writing a SQL query that would solve the following:
Given the history of a daily customer transaction counter, is it possible to work out exactly how many transactions were made each day?
Each data point represents the sum of all transactions made in the last 30 days (ignore the missing dates).
The counter decrements if the number of transactions made on the current day is smaller than the number of transactions that are no longer factored in because they were made 31 days ago; it increments otherwise.
The complete history of the counter is unavailable, so we don't know how the numbers evolved from the beginning, only from a certain point in time.
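Written as a recurrence (my reading of the rules above; whether the drop-out term is day d-30 or d-31 depends on whether the 30-day window includes the current day), with new(d) being the unknown number of transactions made on day d:

counter(d) - counter(d-1) = new(d) - new(d-31)
so: new(d) = counter(d) - counter(d-1) + new(d-31)

Recovering new(d) therefore requires the drop-out term new(d-31), which is itself unknown for the first days of a truncated history. Simply clamping negative differences to 0 discards that term, which is why the clamped series no longer sums to the counter.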
Please refer to the following table (for one offer_id):
transaction_date num_transactions
0 21/05/2022 25
1 22/05/2022 26
2 23/05/2022 25
3 24/05/2022 28
4 25/05/2022 30
5 26/05/2022 32
6 27/05/2022 33
7 28/05/2022 34
8 29/05/2022 33
9 30/05/2022 33
10 31/05/2022 34
11 01/06/2022 35
12 02/06/2022 35
13 03/06/2022 59
14 04/06/2022 73
15 07/06/2022 87
16 08/06/2022 98
17 09/06/2022 109
18 10/06/2022 120
19 11/06/2022 123
20 12/06/2022 122
21 13/06/2022 127
22 14/06/2022 142
23 15/06/2022 145
24 16/06/2022 148
25 17/06/2022 156
26 18/06/2022 162
27 19/06/2022 164
28 20/06/2022 167
29 21/06/2022 173
30 22/06/2022 185
31 23/06/2022 194
32 24/06/2022 206
33 25/06/2022 206
34 26/06/2022 208
35 28/06/2022 227
36 29/06/2022 237
37 30/06/2022 241
38 01/07/2022 248
39 02/07/2022 237
40 03/07/2022 230
41 04/07/2022 217
42 05/07/2022 208
43 06/07/2022 214
44 07/07/2022 216
45 08/07/2022 211
46 09/07/2022 203
47 10/07/2022 194
48 11/07/2022 192
49 12/07/2022 195
50 13/07/2022 193
51 14/07/2022 181
52 15/07/2022 174
53 16/07/2022 169
54 17/07/2022 162
55 18/07/2022 162
56 19/07/2022 164
57 20/07/2022 160
58 21/07/2022 163
59 22/07/2022 155
60 23/07/2022 144
61 24/07/2022 134
62 25/07/2022 139
63 26/07/2022 154
For each day (at least starting with 23/06) I'd like to be able to tell the day-by-day transaction counts in the preceding 30 days that sum up to the transaction counter on that day.
My current code in BigQuery SQL is below. It is obviously wrong: although the calculated counter evolution does sum up to the right numbers when negative values are included, I'm only interested in the actual transaction counts (so only positive numbers and 0 are in question) for each trailing 30-day window.
When I include a simple condition that rounds a decrement up to 0...
WHEN IFNULL(transactions_diff_yesterday + transaction_reference, 0) < 0
THEN 0
...the sum over the last 30 days never matches the counter.
-- bring forward the counter values from 1, 30 and 31 rows back, per offer
WITH base AS (
  SELECT
    *,
    LAG(num_transactions, 31) OVER (PARTITION BY offer_id ORDER BY transaction_date) AS transactions_31_days_ago,
    IFNULL(LAG(num_transactions, 30) OVER (PARTITION BY offer_id ORDER BY transaction_date), 0) AS transactions_30_days_ago,
    IFNULL(LAG(num_transactions, 1) OVER (PARTITION BY offer_id ORDER BY transaction_date), 0) AS transactions_yesterday
  FROM `my_table`
),
outer_base AS (
  SELECT
    *,
    IFNULL(num_transactions - transactions_yesterday, 0) AS transactions_diff_yesterday,
    IFNULL(transactions_30_days_ago - transactions_31_days_ago, 0) AS transaction_reference
  FROM base
)
SELECT
  *,
  CASE
    WHEN IFNULL(transactions_diff_yesterday + transaction_reference, 0) < 0 THEN 0
    ELSE IFNULL(transactions_diff_yesterday + transaction_reference, 0)
  END AS real_transactions
FROM outer_base;

Group questions by answers with SQL

SQL Server table:
userId QuestionId Question AnswerId Answer
32 98 What is the total salary in your family? 380 4000
32 99 How many are brothers? 385 5
33 98 What is the total salary in your family? 382 3000
33 99 How many are brothers? 385 5
34 98 What is the total salary in your family? 382 3000
34 99 How many are brothers? 385 5
35 98 What is the total salary in your family? 381 5000
35 99 How many are brothers? 384 4
36 98 What is the total salary in your family? 381 5000
36 99 How many are brothers? 383 3
37 98 What is the total salary in your family? 381 5000
37 99 How many are brothers? 383 3
38 98 What is the total salary in your family? 380 4000
38 99 How many are brothers? 385 5
39 98 What is the total salary in your family? 380 4000
39 99 How many are brothers? 385 5
41 98 What is the total salary in your family? 381 5000
41 99 How many are brothers? 383 3
I want to find, for each combination of answers to the two questions, how many users gave that combination.
Example: salary: 5000, brothers: 3, count = 3 users.
Desired output:
Question1Id Question2Id Answer1 Answer2 count
98 99 3000 5 2
98 99 4000 5 3
98 99 5000 3 3
98 99 5000 4 1
Here you go. Self-join the table on userId so each user's answer to question 98 and their answer to question 99 land on the same row, then group by the answer pair and count:
select
    a.questionid as question1id, b.questionid as question2id,
    a.answer as answer1, b.answer as answer2, count(*) as [count]
from mytable a
join mytable b on a.userid = b.userid
where a.questionid = 98
  and b.questionid = 99
group by a.questionid, b.questionid, a.answer, b.answer
(count is a reserved word in SQL Server, so the alias is bracketed.)

pandas how to filter and slice with multiple conditions

Using pandas, how do I return a dataframe filtered by the value 2 in the 'GEN' column and the value 20 in the 'AGE' column, excluding the columns named 'GEN' and 'BP'? Thanks in advance:)
AGE GEN BMI BP S1 S2 S3 S4 S5 S6 Y
59 2 32.1 101 157 93.2 38 4 4.8598 87 151
48 1 21.6 87 183 103.2 70 3 3.8918 69 75
72 2 30.5 93 156 93.6 41 4 4.6728 85 141
24 1 25.3 84 198 131.4 40 5 4.8903 89 206
50 1 23 101 192 125.4 52 4 4.2905 80 135
23 1 22.6 89 139 64.8 61 2 4.1897 68 97
20 2 22 90 160 99.6 50 3 3.9512 82 138
66 2 26.2 114 255 185 56 4.5 4.2485 92 63
60 2 32.1 83 179 119.4 42 4 4.4773 94 110
20 1 30 85 180 93.4 43 4 5.3845 88 310
You can do this -
# keep every column except GEN and BP
cols = df.columns[~df.columns.isin(['GEN', 'BP'])]
out = df.loc[(df['GEN'] == 2) & (df['AGE'] == 20), cols]
OR
# query() takes bare column names; quoting them would compare string literals
out = df.query("GEN == 2 and AGE == 20").loc[:, cols]

Group clause in SQL command

I have 3 tables: Deliveries, IssuedWarehouse, ReturnedStock.
Deliveries: ID, OrderNumber, Material, Width, Gauge, DelKG
IssuedWarehouse: OrderNumber, IssuedKG
ReturnedStock: OrderNumber, IssuedKG
What I'd like to do is group all the orders by Material, Width and Gauge and then sum the amount delivered, issued to the warehouse and issued back to stock.
This is the SQL that is really quite close:
SELECT
    DELIVERIES.Material,
    DELIVERIES.Width,
    DELIVERIES.Gauge,
    Count(DELIVERIES.OrderNo) AS [Orders Placed],
    Sum(DELIVERIES.DeldQtyKilos) AS [KG Delivered],
    Sum(IssuedWarehouse.[Qty Issued]) AS [Film Issued],
    Sum([Film Retns].[Qty Issued]) AS [Film Returned],
    [KG Delivered]-[Film Issued]+[Film Returned] AS [Qty Remaining]
FROM (DELIVERIES
INNER JOIN IssuedWarehouse
    ON DELIVERIES.OrderNo = IssuedWarehouse.[Order No From])
INNER JOIN [Film Retns]
    ON DELIVERIES.OrderNo = [Film Retns].[Order No From]
GROUP BY Material, Width, Gauge, ActDelDate
HAVING ActDelDate Between [start date] And [end date]
ORDER BY DELIVERIES.Material;
This groups the products almost perfectly. However if you take a look at the results:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
COEX-GLOSS 590 75 1 534 500 124 158
COEX-MATT 1080 80 1 4226 4226 52 52
CPP 660 38 8 6720 2768 1384 5336
CPP 666 47 1 5677 5716 536 497
CPP 690 65 2 1232 717 202 717
CPP 760 38 3 3444 1318 510 2636
CPP 770 38 4 4316 3318 2592 3590
CPP 786 38 2 672 442 212 442
CPP 800 47 1 1122 1122 116 116
CPP 810 47 1 1127 1134 69 62
CPP 810 47 2 2250 1285 320 1285
CPP 1460 38 12 6540 4704 2442 4278
LD 975 75 1 502 502 182 182
LDPE 450 50 1 252 252 50 50
LDPE 520 70 1 250 250 95 95
LDPE 570 65 2 504 295 86 295
LDPE 570 65 2 508 278 48 278
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
LDPE 670 75 1 248 248 80 80
LDPE 690 47 1 476 476 390 390
LDPE 790 38 2 2104 1122 140 1122
LDPE 790 50 1 286 286 134 134
LDPE 790 50 1 250 250 125 125
LDPE 810 30 1 4062 4062 100 100
LDPE 843 33 1 408 408 835 835
LDPE 850 80 1 412 412 34 34
LDPE 855 30 1 740 740 83 83
LDPE 880 60 1 304 304 130 130
LDPE 900 70 2 1000 650 500 850
LDPE 1017 60 1 1056 1056 174 174
OPP 25 1100 1 381 381 95 95
OPP 1000 30 2 1358 1112 300 546
OPP 1000 30 1 1492 1491 100 101
OPP 1200 20 1 418 417 461 462
PET 760 12 3 1227 1876 132 -517
You'll see that there are some materials that have the same width and gauge yet they are not grouped. I think this is because the delivered qty is different on the orders. For example:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
I would like these two rows to be grouped. They have the same material, width and gauge but the delivered qty is different therefore it hasn't grouped it.
Can anyone help me group these strange rows?
Your "problem" is that the deliveries occurred on different dates, and you're grouping by ActDelDate so the data splits, but because you haven't selected the ActDelDate column, this isn't obvious.
The fix is: Remove ActDelDate from the group by list
You should also remove the unnecessary brackets around the first join, and change
HAVING ActDelDate Between [start date] And [end date]
to
WHERE ActDelDate Between [start date] And [end date]
and have it before the GROUP BY
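Putting those fixes together, the query would look roughly like this (a sketch that keeps the question's table and column names as given; note that if this is Access, the parentheses around the first join are actually required by its join syntax, so they are left in, and [Qty Remaining] is expanded into the underlying SUMs since alias references in a SELECT list aren't portable):

SELECT
    DELIVERIES.Material,
    DELIVERIES.Width,
    DELIVERIES.Gauge,
    COUNT(DELIVERIES.OrderNo) AS [Orders Placed],
    SUM(DELIVERIES.DeldQtyKilos) AS [KG Delivered],
    SUM(IssuedWarehouse.[Qty Issued]) AS [Film Issued],
    SUM([Film Retns].[Qty Issued]) AS [Film Returned],
    SUM(DELIVERIES.DeldQtyKilos) - SUM(IssuedWarehouse.[Qty Issued])
        + SUM([Film Retns].[Qty Issued]) AS [Qty Remaining]
FROM (DELIVERIES
INNER JOIN IssuedWarehouse
    ON DELIVERIES.OrderNo = IssuedWarehouse.[Order No From])
INNER JOIN [Film Retns]
    ON DELIVERIES.OrderNo = [Film Retns].[Order No From]
WHERE ActDelDate BETWEEN [start date] AND [end date]
GROUP BY Material, Width, Gauge
ORDER BY DELIVERIES.Material;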
You are grouping by the delivery date, which is causing the rows to be split. Either omit the delivery date from the results and the GROUP BY, or take the min/max of the delivery date.