Good Day All,
I need assistance creating an update query that groups my data.
The data in my table is spatial in nature and can be thought of as a matrix that is 10 columns by 5 rows. I have the ObjectID, Row and Column, but I want the column DesiredResult, which is a 2x2 grouping of the rows and columns.
So the R,C pairs 1,1 1,2 2,1 and 2,2 will have a DesiredResult of 1, while 1,3 1,4 2,3 and 2,4 will have a DesiredResult of 2, and so on (see below for an example).
I was able to create the R and C columns using a combination of Quotient and Mod, so I assume I would do something similar, but I am stuck. How would I go about this query in MS Access?
ObjectID R C DesiredResult
1 1 1 1
2 1 2 1
3 1 3 2
4 1 4 2
5 1 5 3
6 1 6 3
7 1 7 4
8 1 8 4
9 1 9 5
10 1 10 5
11 2 1 1
12 2 2 1
13 2 3 2
14 2 4 2
15 2 5 3
16 2 6 3
17 2 7 4
18 2 8 4
19 2 9 5
20 2 10 5
21 3 1 6
22 3 2 6
23 3 3 7
24 3 4 7
25 3 5 8
26 3 6 8
27 3 7 9
28 3 8 9
29 3 9 10
30 3 10 10
31 4 1 6
32 4 2 6
33 4 3 7
34 4 4 7
35 4 5 8
36 4 6 8
37 4 7 9
38 4 8 9
39 4 9 10
40 4 10 10
41 5 1 11
42 5 2 11
43 5 3 12
44 5 4 12
45 5 5 13
46 5 6 13
47 5 7 14
48 5 8 14
49 5 9 15
50 5 10 15
Something like ... ?
SELECT a.Row, a.Col, Col\2 AS D1, Col Mod 2 AS D2, [D1]+[D2] AS Desired
FROM table AS a
ORDER BY a.Row, a.Col;
Remou had a close approximation but it turns out this gives me what I need. I needed both a row and a column index.
SELECT ObjectID, R, C,
Int(([C]-1)/2) AS ColIndex,
Int(([R]-1)/2) AS RowIndex,
[RowIndex]*5+[ColIndex]+1 AS DesiredResult
FROM Testing
ORDER BY ObjectID
The key to the query is the number 2 in both the column and row index, which is the grouping size, and the number 5 in DesiredResult, which is the number of groups across one row (10 columns / 2).
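The same arithmetic can be checked outside Access. Below is a minimal Python sketch, assuming the 10-column grid and 2x2 blocks from the question; the function name `block_id` and its parameters are made up for illustration and are not part of the original query.

```python
def block_id(r, c, n_cols=10, size=2):
    """Return the 1-based block number for 1-based row r and column c."""
    blocks_per_row = -(-n_cols // size)  # ceiling division: 10 columns / 2 = 5 blocks
    row_index = (r - 1) // size          # same as Int(([R]-1)/2)
    col_index = (c - 1) // size          # same as Int(([C]-1)/2)
    return row_index * blocks_per_row + col_index + 1

# Spot-check a few (R, C) pairs against the sample table
print(block_id(1, 1))   # 1
print(block_id(2, 4))   # 2
print(block_id(3, 9))   # 10
print(block_id(5, 10))  # 15
```

Changing `size` or `n_cols` should give the numbering for other block sizes or grid widths, which is why the multiplier is derived rather than hard-coded.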
Thanks !
I currently have a table with data like below.
How would I go about grouping by HeaderId and get the distinct HeaderId where it contains multiple specified items?
For example, I want to return the HeaderIds that contain both the NTNB and NMPTN locations. At the moment I run the two SQL statements below and then use C# to manually check whether they return the same HeaderId.
SELECT DISTINCT([HeaderId]) FROM [dbo].[timings] WHERE Location = 'NTNB'
SELECT DISTINCT([HeaderId]) FROM [dbo].[timings] WHERE Location = 'NMPTN'
For the data below, the expected result when looking for distinct HeaderIds containing both NTNB and NMPTN would be 4.
HeaderId Ordinal Location
3 0 KRKYLEJ
3 1 IRNVLJN
3 2 LGML
3 3 TRWLJN
3 4 STAPLFD
3 5 TOTODSL
4 0 CREWBHM
4 1 CREWBHJ
4 2 MADELEY
4 3 NTNB
4 4 STAFFRD
4 5 STAFTVJ
4 6 WHHSJN
4 7 COLWICH
4 8 RUGLYNJ
4 9 RUGL
4 10 LCHTNJ
4 11 AMNGTNJ
4 12 NNTN
4 13 RUGBTVJ
4 14 RUGBY
4 15 HMTNJ
4 16 LNGBKBY
4 17 NMPTN
4 18 HANSLPJ
4 19 MKNSCEN
4 20 DNBGHSJ
4 21 BLTCHLY
4 22 LEDBRNJ
4 23 TRING
4 24 BONENDJ
4 25 WATFDJ
4 26 HROW
4 27 WMBY
4 28 WLSDNBJ
4 29 HARLSJN
4 30 WLSDWLJ
4 31 CMDNJN
4 32 CMDNSTH
4 33 EUSTON
4 34 CMDNSTH
4 35 CMDNJN
4 36 QPRKAC
Aggregate by the HeaderId and then assert that both locations are present:
SELECT HeaderId
FROM timings
WHERE Location IN ('NTNB', 'NMPTN')
GROUP BY HeaderId
HAVING MIN(Location) <> MAX(Location)
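To see why the HAVING test works: after the WHERE filter, each group can only hold 'NTNB' and/or 'NMPTN', so MIN and MAX differ exactly when both are present. A small Python sketch of the same logic, using a few rows copied from the sample data (the function name `headers_with_both` is hypothetical):

```python
from collections import defaultdict

def headers_with_both(rows, first, second):
    """Return sorted HeaderIds whose rows include both locations."""
    seen = defaultdict(set)
    for header_id, _ordinal, location in rows:
        if location in (first, second):      # WHERE Location IN (...)
            seen[header_id].add(location)    # GROUP BY HeaderId
    # HAVING: both distinct values are present in the group
    return sorted(h for h, locs in seen.items() if len(locs) == 2)

rows = [
    (3, 0, "KRKYLEJ"),
    (3, 4, "STAPLFD"),
    (4, 3, "NTNB"),
    (4, 17, "NMPTN"),
    (4, 33, "EUSTON"),
]
print(headers_with_both(rows, "NTNB", "NMPTN"))  # [4]
```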
How can I solve the same problem as in the linked question, "Sum of group but keep the same value for each row in r", using pandas?
I can generate a separate df that has the sum for each group and then merge it with the original.
You can use groupby & transform as below to get your output.
df['sumx'] = df.groupby(['ID', 'Group'], sort=False)['x'].transform('sum')
df['sumy'] = df.groupby(['ID', 'Group'], sort=False)['y'].transform('sum')
df
output
ID Group x y sumx sumy
1 1 1 1 12 3 25
2 1 1 2 13 3 25
3 1 2 3 14 3 14
4 3 1 4 15 15 48
5 3 1 5 16 15 48
6 3 1 6 17 15 48
7 3 2 7 18 15 37
8 3 2 8 19 15 37
9 4 1 9 20 30 63
10 4 1 10 21 30 63
11 4 1 11 22 30 63
12 4 2 12 23 12 23
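For intuition about what transform does here, a plain-Python sketch (no pandas; the helper name `group_sum_per_row` is made up for illustration) that computes each group's total and then repeats it on every row of that group:

```python
from collections import defaultdict

def group_sum_per_row(keys, values):
    """For each row, return the sum of values over all rows sharing its key."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
    return [totals[key] for key in keys]

# First six rows of the output above: (ID, Group) keys with their x and y values
keys = [(1, 1), (1, 1), (1, 2), (3, 1), (3, 1), (3, 1)]
x = [1, 2, 3, 4, 5, 6]
y = [12, 13, 14, 15, 16, 17]
print(group_sum_per_row(keys, x))  # [3, 3, 3, 15, 15, 15]
print(group_sum_per_row(keys, y))  # [25, 25, 14, 48, 48, 48]
```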
I came across this behavior in RODBC (using SQL Server driver):
df1 = data.frame(matrix(c(1:20), nrow=10))
df1
which outputs
X1 X2
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
which makes sense. Then I save the table using RODBC
sqlSave(conout, df1, 'TEST')
Then I switch the two created columns:
df2 = df1[,c(2,1)]
df2
which outputs
X2 X1
1 11 1
2 12 2
3 13 3
4 14 4
5 15 5
6 16 6
7 17 7
8 18 8
9 19 9
10 20 10
which also makes sense.
Seeing those two tables, I see that X1 only contains 1:10 and X2 only contains 11:20. Now, when I do
sqlSave(conout, df2, 'TEST', append=TRUE, fast=FALSE)
sqlQuery(conout, 'SELECT * FROM TEST')
rownames X1 X2
1 1 1 11
2 2 2 12
3 3 3 13
4 4 4 14
5 5 5 15
6 6 6 16
7 7 7 17
8 8 8 18
9 9 9 19
10 10 10 20
11 1 11 1
12 2 12 2
13 3 13 3
14 4 14 4
15 5 15 5
16 6 16 6
17 7 17 7
18 8 18 8
19 9 19 9
20 10 20 10
which is definitely not what I saved. Now three questions:
How is this possible?
Where is this behavior explained in the RODBC manual?
How can I prevent this behavior without reordering my columns (the real case behind this example has > 300 columns)?
I want to find the outlier in multiple columns at a time and replace the outlier value with some other value based on two conditions.
sample dataset:
day phone_calls received
1 11 11
2 12 12
3 10 0
4 13 12
5 170 2
6 9 9
7 67 1
8 180 150
9 8 1
10 10 10
Find the outlier range; let's say the range is (8-50). Then replace the values: if a column value is less than 8, replace it with 8, and if it is greater than 50, replace it with 50.
Please help, I am new to pandas.
I think you need set_index with clip:
df = df.set_index('day').clip(8,50)
print (df)
phone_calls received
day
1 11 11
2 12 12
3 10 8
4 13 12
5 50 8
6 9 9
7 50 8
8 50 50
9 8 8
10 10 10
Or, similarly, use iloc to select all columns except the first:
df.iloc[:, 1:] = df.iloc[:, 1:].clip(8,50)
print (df)
day phone_calls received
0 1 11 11
1 2 12 12
2 3 10 8
3 4 13 12
4 5 50 8
5 6 9 9
6 7 50 8
7 8 50 50
8 9 8 8
9 10 10 10
EDIT: You can specify the columns in a list:
cols = ['phone_calls','received']
df[cols] = df[cols].clip(8,50)
print (df)
day phone_calls received
0 1 11 11
1 2 12 12
2 3 10 8
3 4 13 12
4 5 50 8
5 6 9 9
6 7 50 8
7 8 50 50
8 9 8 8
9 10 10 10
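clip limits each value to the given bounds. As a plain-Python illustration of that rule on the received column from the sample data (the function name `clamp` is an assumption for this sketch, not a pandas API):

```python
def clamp(value, lower, upper):
    """Limit value to the closed range [lower, upper]."""
    return max(lower, min(upper, value))

received = [11, 12, 0, 12, 2, 9, 1, 150, 1, 10]
print([clamp(v, 8, 50) for v in received])
# [11, 12, 8, 12, 8, 9, 8, 50, 8, 10]
```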
I have a problem assigning ranks in the scenario below. The running total is calculated from the Cnt field.
My SQL query should return Rank values like the output below. Each page should hold only 40 rows, so each rank should cover only 40 records. Whenever the running total crosses a multiple of 40, the rank should change.
It would be a great help if I could get a SQL query that returns these values.
select f1,f2,sum(f2) over(order by f1) running_total
from [dbo].[Sheet1$]
Output:
ID cnt Running Total Rank
1 4 4 1
2 5 9 1
3 4 13 1
4 4 17 1
5 4 21 1
6 5 26 1
7 4 30 1
8 4 34 1
9 4 38 1
10 4 42 2
11 4 46 2
12 4 50 2
13 4 54 2
14 4 58 2
15 4 62 2
16 4 66 2
17 4 70 2
18 4 74 2
19 4 78 2
20 4 82 3
21 4 86 3
22 4 90 3
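-- subtract 1 so a running total of exactly 40 stays in rank 1; add 1 so ranks start at 1
select f1,f2,sum(f2) over(order by f1) running_total, Floor((sum(f2) over(order by f1) - 1) / 40) + 1 [rank]
from [dbo].[Sheet1$]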
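A minimal Python sketch of numbering pages from a running total so that the ranks come out 1-based, as in the expected output. It assumes 40 rows per page and that a running total of exactly 40 still belongs to rank 1 (an assumption; the sample output does not include that case). The function name `page_ranks` is hypothetical.

```python
from itertools import accumulate

def page_ranks(counts, page_size=40):
    """Return (running_total, rank) pairs; ranks are 1-based page numbers."""
    result = []
    for total in accumulate(counts):
        rank = (total - 1) // page_size + 1
        result.append((total, rank))
    return result

counts = [4, 5, 4, 4, 4, 5, 4, 4, 4, 4, 4]  # cnt for IDs 1 to 11
for total, rank in page_ranks(counts):
    print(total, rank)
```

For the sample counts this yields rank 1 up to a running total of 38 and rank 2 from 42 onward.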