Creating a Lookup Matrix in Microsoft Access

I have the matrix below in Excel and want to import it into Access (2016) to then use in queries. The aim is to be able to look up values based on the row and column. E.g. lookup criteria of 10 and 117 should return 98.1.
Is this possible? I'm an Access novice and don't know where to start.
.       10     9     8     7     6     5     4     3     2     1    0
120  100.0  96.8  92.6  86.7  78.8  68.2  54.4  37.5  21.3   8.3  0.0
119   99.4  96.2  92.0  86.2  78.5  67.9  54.3  37.5  21.3   8.3  0.0
118   98.7  95.6  91.5  85.8  78.1  67.7  54.1  37.4  21.2   8.3  0.0
117   98.1  95.1  90.9  85.3  77.8  67.4  54.0  37.4  21.2   8.3  0.0
116   97.4  94.5  90.3  84.8  77.4  67.1  53.8  37.4  21.1   8.3  0.0
115   96.8  93.9  89.8  84.4  77.1  66.9  53.7  37.3  21.1   8.3  0.0

Consider creating a table with 3 columns to store this data:
Value1 - numeric
Value2 - numeric
LookupValue - currency
You can then use DLookup to get the value required:
?DLookup("LookupValue","LookupData","Value1=117 AND Value2=10")
If you have the values stored in variables, you need to concatenate them into the criteria string:
lngValue1 = 117
lngValue2 = 10
Debug.Print DLookup("LookupValue", "LookupData", "Value1=" & lngValue1 & " AND Value2=" & lngValue2)
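In practice the Excel matrix first has to be unpivoted from wide form into those three columns before importing. A rough sketch of that step using pandas on a small slice of the matrix (the names wide and long are just for illustration; the full matrix would be read from the spreadsheet):

```python
import pandas as pd

# A small slice of the matrix: rows are Value1 (120, 119),
# columns are Value2 (10, 9, 0)
wide = pd.DataFrame(
    {10: [100.0, 99.4], 9: [96.8, 96.2], 0: [0.0, 0.0]},
    index=[120, 119],
)
wide.index.name = "Value1"

# Unpivot: one row per (Value1, Value2) pair, matching the
# three-column Access table suggested above
long = wide.reset_index().melt(
    id_vars="Value1", var_name="Value2", value_name="LookupValue"
)

# Looking up a pair then mirrors the DLookup call
row = long[(long["Value1"] == 119) & (long["Value2"] == 9)]
print(row["LookupValue"].iloc[0])  # 96.2
```

Exporting `long` to CSV gives a file that imports cleanly into the Access table.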

Related

Passing some rows based on a threshold

I am wondering how I can adapt the awk call below (or add extra calls) to pass some rows based on a threshold (here, a percentage value). My xyz file:
21.4 7
21.5 141
21.6 4
21.7 43
21.8 26
21.9 133
22 305
22.1 216
22.2 93
22.3 33
22.4 13
22.7 23
22.8 2
22.9 10
23 39
23.1 22
23.2 33
23.3 8
23.4 9
23.5 2
23.6 270
23.7 724
23.8 2349
23.9 2
24 1
24.1 11
24.2 376
24.3 1452
24.4 92
with the following awk call I obtain the corresponding percentage of each value in a 3rd column:
awk 'FNR==NR { s+=$2; next; } { printf "%s\t%s\t%s%%\n", $1, $2, 100*$2/s }' xyz xyz | sort -k3 -g
which gives:
24 1 0.0155304%
22.8 2 0.0310607%
23.5 2 0.0310607%
23.9 2 0.0310607%
21.6 4 0.0621214%
21.4 7 0.108713%
23.3 8 0.124243%
23.4 9 0.139773%
22.9 10 0.155304%
24.1 11 0.170834%
22.4 13 0.201895%
23.1 22 0.341668%
22.7 23 0.357198%
21.8 26 0.403789%
22.3 33 0.512502%
23.2 33 0.512502%
23 39 0.605684%
21.7 43 0.667806%
24.4 92 1.42879%
22.2 93 1.44432%
21.9 133 2.06554%
21.5 141 2.18978%
22.1 216 3.35456%
23.6 270 4.1932%
22 305 4.73676%
24.2 376 5.83942%
23.7 724 11.244%
24.3 1452 22.5501%
23.8 2349 36.4808%
So, I want to automatically keep the last N rows whose 3rd-column percentages sum to just over 60%; in the case above that is 36.4808% + 22.5501% + 11.244% = 70.2749%:
23.7 724 11.244%
24.3 1452 22.5501%
23.8 2349 36.4808%
Any hints are appreciated.
That could be done in a single awk command, but I think this version is shorter (the final filter starts printing once the running total of the ascending percentages passes 40%, which leaves exactly the tail summing to at least 60%):
awk -v OFS="\t" 'FNR==NR { s+=$2; next; } { $3=100*$2/s "%" }1' xyz xyz |
sort -k3 -g |
awk '(t+=$3)>40'
prints out:
23.7 724 11.244%
24.3 1452 22.5501%
23.8 2349 36.4808%
Assumptions:
for rows with duplicate values in the 2nd column, precedence goes to rows with a higher row number (via awk/FNR)
One GNU awk idea:
awk '
BEGIN { OFS="\t" }
{
    a[$2][FNR]=$1           # [FNR] needed to distinguish between rows with duplicate values in 2nd column
    s+=$2
}
END {
    PROCINFO["sorted_in"]="#ind_num_desc"       # sort array by numeric index (descending order)
    for (i in a) {                              # loop through array (sorted by $2 descending)
        for (j in a[i]) {                       # loop through potential duplicate $2 values (sorted by FNR descending)
            pct=100*i/s
            out[--c]=a[i][j] OFS i OFS pct "%"  # build output line, store in out[] array, index= -1, -2, ...
            sum+=pct
            if (sum > 60) break                 # if the sum of percentages > 60 then break from loop
        }
        if (sum > 60) break                     # if the sum of percentages > 60 then break from loop
    }
    for (i=c; i<0; i++)                         # print contents of out[] array starting with lowest index and increasing to -1
        print out[i]
}
' xyz
NOTE: requires GNU awk for:
multidimensional arrays (aka array of arrays)
PROCINFO["sorted_in"] support
This generates:
23.7 724 11.244%
24.3 1452 22.5501%
23.8 2349 36.4808%
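For comparison, the same tail-selection logic is easy to express outside awk as well. A minimal Python sketch of the idea, run on an invented subset of the xyz data rather than the full file:

```python
# Invented subset of the xyz data: (value, count) pairs
rows = [(21.4, 7), (22.0, 305), (23.7, 724), (24.3, 1452), (23.8, 2349)]
total = sum(v for _, v in rows)

# Walk the rows from the largest share downwards and stop as soon as
# the accumulated percentage passes 60
kept, cum = [], 0.0
for x, v in sorted(rows, key=lambda r: r[1], reverse=True):
    pct = 100 * v / total
    kept.append((x, v, pct))
    cum += pct
    if cum > 60:
        break

kept.reverse()  # restore ascending order, as in the awk output
for x, v, pct in kept:
    print(f"{x}\t{v}\t{pct:.4f}%")
```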

Add column for percentages

I have a df that looks like this:
Total Initial Follow Sched Supp Any
0 5525 3663 968 296 65 533
I transposed the df because I have to add a column with the percentages based on the column 'Total'.
Now my df looks like this:
0
Total 5525
Initial 3663
Follow 968
Sched 296
Supp 65
Any 533
So, how can I add this percentage column?
The expected output looks like this
0 Percentage
Total 5525 100
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
Do you know how I can add this new column?
I'm working in jupyterlab with pandas and numpy
Divide column 0 by the scalar from the Total row with Series.div, then multiply by 100 with Series.mul and finally round with Series.round:
df['Percentage'] = df[0].div(df.loc['Total', 0]).mul(100).round(1)
print (df)
0 Percentage
Total 5525 100.0
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
Consider below df:
In [1328]: df
Out[1328]:
b
a
Total 5525
Initial 3663
Follow 968
Sched 296
Supp 65
Any 533
In [1327]: df['Perc'] = round(df.b.div(df.loc['Total', 'b']) * 100, 1)
In [1330]: df
Out[1330]:
b Perc
a
Total 5525 100.0
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
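For completeness, both steps (the transpose from the question and the percentage column from the answers) can be combined in one runnable snippet; the names df and out are just for illustration:

```python
import pandas as pd

# One-row frame as in the question
df = pd.DataFrame(
    [[5525, 3663, 968, 296, 65, 533]],
    columns=["Total", "Initial", "Follow", "Sched", "Supp", "Any"],
)

# Transpose so the categories become the index, then derive the
# percentage column from the Total row
out = df.T.rename(columns={0: "Count"})
out["Percentage"] = (out["Count"] / out.loc["Total", "Count"] * 100).round(1)
print(out)
```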

pandas filter df by aggregate failing

My df:
Plate Route Speed Dif Latitude Longitude
1 724TL054M RUTA 23 0 32.0 19.489872 -99.183970
2 0350021 RUTA 35 0 33.0 19.303572 -99.083700
3 0120480 RUTA 12 0 32.0 19.356400 -99.125694
4 1000106 RUTA 100 0 32.0 19.212614 -99.131874
5 0030719 RUTA 3 0 36.0 19.522831 -99.258500
... ... ... ... ... ... ...
1617762 923CH113M RUTA 104 0 33.0 19.334467 -99.016880
1617763 0120077 RUTA 12 0 32.0 19.302448 -99.084530
1617764 0470053 RUTA 47 0 33.0 19.399706 -99.209190
1617765 0400070 CETRAM 0 33.0 19.265041 -99.163290
1617766 0760175 RUTA 76 0 33.0 19.274513 -99.240150
I want to keep only those plates whose summed Dif (hence the groupby) is bigger than 3600 (1 hour, since Dif is in seconds), and discard the rest.
I tried (after a post from here):
df.groupby('Plate').filter(lambda x: x['Dif'].sum() > 3600)
But I still get about 60 plates with under 3600 as sum:
df.groupby('Plate').agg({'Dif':'sum'}).reset_index().nsmallest(60, 'Dif')
Plate Dif
952 655NZ035M 268.0
1122 949CH002C 814.0
446 0440220 1318.0
1124 949CH005C 1334.0
1042 698NZ011M 1434.0
1038 697NZ011M 1474.0
1 0010193 1509.0
282 0270302 1513.0
909 614NZ021M 1554.0
156 0140236 1570.0
425 0430092 1577.0
603 0620123 1586.0
510 0530029 1624.0
213 0180682 1651.0
736 0800126 1670.0
I have been at this for some hours and I can't solve it. Any help is appreciated.
Assign the result back; groupby(...).filter returns a new DataFrame instead of modifying df in place:
df = df.groupby('Plate').filter(lambda x: x['Dif'].sum() > 3600)
Then
df.groupby('Plate').agg({'Dif':'sum'}).reset_index().nsmallest(60, 'Dif')
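A self-contained sketch of why the reassignment matters, on invented toy data (the plates A/B/C and the Dif values are made up):

```python
import pandas as pd

# Invented toy data standing in for the Plate/Dif columns
df = pd.DataFrame({
    "Plate": ["A", "A", "B", "B", "C"],
    "Dif":   [2000.0, 2000.0, 100.0, 200.0, 5000.0],
})

# groupby(...).filter returns a NEW frame; nothing changes until the
# result is assigned back to df
df = df.groupby("Plate").filter(lambda g: g["Dif"].sum() > 3600)

# Only plates whose summed Dif exceeds 3600 remain (here A and C)
print(sorted(df["Plate"].unique()))
```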

Value error while trying to group by the pandas data frame and get the row having maximum value for one of the columns

I am new to data frames and am trying to group the data frame below by the field Name to get the rows having the maximum value for the column 'high'.
Name date high low
0 20MICRONS 06-03-2020 31.55 27.45
1 AGRITECH 06-03-2020 33.2 30.2
2 20MICRONS 09-03-2020 30 26.85
3 AGRITECH 09-03-2020 30.45 26.4
4 AGRITECH 11-03-2020 28.75 26.55
5 INFY 11-03-2020 695.95 669.05
6 20MICRONS 13-03-2020 24.7 19.45
7 AGRITECH 13-03-2020 26.45 22.55
8 INFY 06-03-2020 744 729.1
9 INFY 09-03-2020 725.85 697
10 20MICRONS 11-03-2020 28.25 24.65
11 20MICRONS 12-03-2020 28.7 21.5
12 AGRITECH 12-03-2020 28.5 24.85
13 INFY 12-03-2020 670 627.5
14 INFY 13-03-2020 667 570
required output with the maximum of 'high' column for all the stocks group-wise:
Name date high low
8 INFY 06-03-2020 744 729.1
1 AGRITECH 06-03-2020 33.2 30.2
0 20MICRONS 06-03-2020 31.55 27.45
Likewise it would be helpful to get the minimum as well. I have tried the max() and idxmax() functions as below but I receive a ValueError. Could you please help me?
df[['IndexName','date','high']].loc[df[['IndexName','date','high']].reset_index().groupby(['IndexName'])['high'].idxmax()]
did not help, but I was able to solve my problem as below using an additional for loop and comparison.
grouped = df.reset_index().groupby('Name')
dfinal = pd.DataFrame(columns=['Name', 'date', 'high', 'low'])
if sys_Arg_highlow == 1:
    for name, group in grouped:
        dfinal = dfinal.append(group[group['high'] == group['high'].max()])
else:
    for name, group in grouped:
        dfinal = dfinal.append(group[group['low'] == group['low'].min()])
However, I would like to know if there are better ways of doing this. Thank you.
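A loop-free alternative is the idxmax pattern the question was aiming for, which works as long as the frame's index is unique. A sketch on an invented subset of the data:

```python
import pandas as pd

# Small stand-in for the stock frame from the question
df = pd.DataFrame({
    "Name": ["INFY", "INFY", "AGRITECH", "AGRITECH"],
    "date": ["06-03-2020", "09-03-2020", "06-03-2020", "09-03-2020"],
    "high": [744.0, 725.85, 33.2, 30.45],
    "low":  [729.1, 697.0, 30.2, 26.4],
})

# idxmax gives the index label of each group's maximum 'high';
# swap in ["low"].idxmin() to get the minimum rows instead
result = df.loc[df.groupby("Name")["high"].idxmax()]
print(result)
```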

Plotting Charts for monthly counts per company

I want to create a program that prints out bar charts or CSV files for monthly counts per company. So I should have a graph for January which has all the companies on the x axis and the counts on the y axis.
I am able to split my date into month and year and I want that to be the heading. I am able to program my df table to be this:
Date Modified Company
2019-01 Apple 113 0.0
Blackberry 66 0.0
LG 73 0.0
Linux 115 0.0
Microsoft 187 0.0
Panasonic 336 0.0
Samsung 151 0.0
2019-02 Apple 151 0.0
Blackberry 163 0.0
LG 301 0.0
Linux 108 0.0
Microsoft 199 0.0
Panasonic 142 0.0
Samsung 304 0.0
2019-03 Apple 358 0.0
Blackberry 230 0.0
LG 288 0.0
Linux 464 0.0
Microsoft 53 0.0
Panasonic 113 0.0
Samsung 177 0.0
df = pd.read_csv("Sample_Data.csv")
df['Date Modified'] = pd.to_datetime(df['Date']).dt.to_period('M')
df = df.groupby(["Date Modified", "Company"]).sum()
print(df)
So there's currently nothing faulty with this program. I want to create monthly graphs with every company listed on the x axis and the count on the y axis, with a title containing the month and year, e.g. 2019-03 or 2019-02.
months = df.index.levels[0]
for month in months:
    data = df.loc[month]
    data.plot(kind='bar', align='center', title=str(month), legend=True)
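The month-by-month slicing that loop relies on can be checked without any plotting. A sketch on invented counts (the Count values are made up; in the real frame each slice feeds one bar chart):

```python
import pandas as pd

# Invented counts standing in for the grouped frame
df = pd.DataFrame({
    "Date Modified": ["2019-01", "2019-01", "2019-02", "2019-02"],
    "Company": ["Apple", "LG", "Apple", "LG"],
    "Count": [113, 73, 151, 301],
}).groupby(["Date Modified", "Company"]).sum()

# df.index.levels[0] holds the distinct months; df.loc[month] slices
# out one month's companies, which is exactly what each chart plots
for month in df.index.levels[0]:
    monthly = df.loc[month]
    print(month, dict(monthly["Count"]))
```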