Extracting and Parsing Table from HTML using VBA - vba

I am using Microsoft Office Version 1703.
I have been tasked with:
Creating a weekly Excel sheet using data from AccuWeather Professional for 10 specific locations and have that updated weekly.
Creating historical data going back 4 or 5 years for the same multiple locations. Ideally I'd like to take the time to automate this as it has been considered a long term project.
The process for doing this was originally Text to Columns in Excel. If I use Text to Columns, it imports the data as an array, and I have to use space as a delimiter to break it into columns and rows correctly before finally entering it by hand into the presentation sheet.
Here is a picture of the AccuWeather site and the information I'm attempting to grab:
When simply copying and pasting the data I receive this as an array for example:
TODAY'S DATE: 2-JUN-17
JUN-17 FOR Monticello White County Airp, IN (676') LAT=40.7N LON= 86.8W
TEMPERATURE PRECIPITATION
ACTUAL NORMAL
HI LO AVG HI LO AVG DEPT AMNT SNOW SNCVR HDD
1 81 48 65 78 55 66 -1 0.00 0.0e 0 0
2 M M M 78 55 67 M M 0.0 0 M
3 M M M 78 56 67 M M 0.0 0 M
4 M M M 79 56 67 M M 0.0 0 M
5 M M M 79 56 68 M M 0.0 0 M
6 M M M 79 57 68 M M 0.0 0 M
7 M M M 79 57 68 M M 0.0 0 M
8 M M M 80 57 69 M M 0.0 0 M
9 M M M 80 58 69 M M 0.0 0 M
10 M M M 80 58 69 M M 0.0 0 M
11 M M M 80 58 69 M M 0.0 0 M
12 M M M 81 58 70 M M 0.0 0 M
13 M M M 81 59 70 M M 0.0 0 M
14 M M M 81 59 70 M M 0.0 0 M
15 M M M 81 59 70 M M 0.0 0 M
16 M M M 81 59 70 M M 0.0 0 M
17 M M M 82 60 71 M M 0.0 0 M
18 M M M 82 60 71 M M 0.0 0 M
19 M M M 82 60 71 M M 0.0 0 M
20 M M M 82 60 71 M M 0.0 0 M
21 M M M 82 60 71 M M 0.0 0 M
22 M M M 82 61 72 M M 0.0 0 M
23 M M M 83 61 72 M M 0.0 0 M
24 M M M 83 61 72 M M 0.0 0 M
25 M M M 83 61 72 M M 0.0 0 M
26 M M M 83 61 72 M M 0.0 0 M
27 M M M 83 61 72 M M 0.0 0 M
28 M M M 83 61 72 M M 0.0 0 M
29 M M M 83 62 73 M M 0.0 0 M
30 M M M 84 62 73 M M 0.0 0 M
TOTALS FOR KMCX
HIGHEST TEMPERATURE 81 TOTAL PRECIP 0.00
LOWEST TEMPERATURE 48 TOTAL SNOWFALL 0.0
AVERAGE TEMPERATURE 64.5 NORMAL PRECIP 4.08
DEPARTURE FROM NORM -2.0 % OF NORMAL PRECIP 0
HEATING DEGREE DAYS 0
NORMAL DEGREE DAYS 0
shows up like this:
The HTML selector is:
body > center > table > tbody > tr > td.pageContent > table > tbody > tr:nth-child(2) > td > table > tbody > tr:nth-child(1) > td > font > table:nth-child(5) > tbody > tr > td > pre
The issue with doing a Web Query is that even if I have Internet Explorer save my password, it will not log in from a Web Query. I managed to frankenstein a VBA script that opens IE, logs in successfully, and navigates to the intended page. I imagine I could fairly easily create individual scripts in a sequence to grab the weather data for each specific location. The problem I'm having is writing a VBA script that grabs only what is inside the <pre> I referenced above. Right now the script selects everything, copies it, and pastes it into my sheet.
What I would ideally like to accomplish: navigate to AccuWeather Pro, log in successfully, pull up the historical data for a specific location, grab all the data shown above, import it into Excel, and format it into my presentation sheet automatically. It would be even nicer if I could get it to update automatically at least weekly.
Here is my VBA code:
Sub Test()
    Dim ieApp As Object
    Dim ieDoc As Object
    Sheets("Sheet1").Select
    Range("A1:A1000") = "" ' erase previous data
    Range("A1").Select
    Set ieApp = CreateObject("InternetExplorer.Application")
    With ieApp
        .Visible = True
        .Navigate "https://wwwl.accuweather.com/error.php?url=proa.accuweather.com/adcbin/professional/forecast_local.asp?zipcode=47960&mt=pro"
        Do While .Busy: DoEvents: Loop
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        Set ieDoc = .Document
        ' fill in the login form – View Source from your browser to get the control names
        With ieDoc.forms(0)
            .UserName.Value = "username"
            .Password.Value = "password"
            .Submit
        End With
        Do While .Busy: DoEvents: Loop
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        ' now that we’re in, go to the page we want
        .Visible = True
        .Navigate "http://proa.accuweather.com/adcbin/professional/historical_index.asp"
        Do While .Busy: DoEvents: Loop
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        .ExecWB 17, 0 ' // SelectAll
        .ExecWB 12, 2 ' // Copy selection
        ActiveSheet.PasteSpecial Format:="Text", link:=False, DisplayAsIcon:=False
        Range("A1").Select
        .Quit
        .Quit ' just to make sure
    End With
End Sub
I did my best to be as thorough, accurate, and correct with my question as possible, I apologize if I've committed any stack exchange social faux pas etc.
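For the parsing step described above (breaking the copied <pre> text into columns on whitespace), here is a minimal sketch in Python rather than VBA. It only illustrates the splitting logic. The column names are made up from the report header, and dropping a trailing letter such as the "e" in "0.0e" is an assumption about what that flag means.

```python
# Hypothetical column names derived from the report header (not an AccuWeather API).
COLUMNS = ["day", "hi", "lo", "avg", "norm_hi", "norm_lo", "norm_avg",
           "dept", "amnt", "snow", "sncvr", "hdd"]


def parse_daily_rows(pre_text):
    """Return one dict per daily row of the report; 'M' (missing) becomes None."""
    rows = []
    for line in pre_text.splitlines():
        tokens = line.split()
        # Daily rows start with the day of the month and have exactly 12 fields.
        if len(tokens) != len(COLUMNS) or not tokens[0].isdigit():
            continue
        values = []
        for token in tokens:
            if token == "M":
                values.append(None)
            else:
                # Assumption: a trailing lowercase letter is a flag, so drop it.
                values.append(float(token.rstrip("abcdefghijklmnopqrstuvwxyz")))
        row = dict(zip(COLUMNS, values))
        row["day"] = int(tokens[0])
        rows.append(row)
    return rows


sample = """HI LO AVG HI LO AVG DEPT AMNT SNOW SNCVR HDD
1 81 48 65 78 55 66 -1 0.00 0.0e 0 0
2 M M M 78 55 67 M M 0.0 0 M
TOTALS FOR KMCX"""

for row in parse_daily_rows(sample):
    print(row)
```

Header and totals lines are skipped because they either have a different number of fields or do not start with a day number.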

Print file with column numbers

My file looks something like this:
--------------------------VREV-C-SEQAETGPCRAMISRWYFDVTEGKCAPFFYGGCGGNRNNFDTEEYCMAVCG-----
--P-----------------------RRKL-C-ILHRNPGRCYDKIPAFYYNQKKKQCERFDWSGCGGNSNRFKTIEECRRTCIG----
--------------------------APDF-C-LEPPYDGPCRALHLRYFYNAKAGLCQTFYYGGCLAKRNNFESAEDCMRTC------
How can I add a header with the respective column numbers in a readable format, i.e. adding spaces so that a two-digit column number doesn't make it unreadable?
1 2 3 4 5 6 7 8 9 10 11 12 13 ....
- - - - - - - - - - - - - - - - - - - - - - - - - - V R E V...
I need the user to see this output to select the column number where he wants to cut.
The standard way of doing this is not to change the column spacing but to write multi-digit column numbers vertically. For example,
$ awk 'NR==1{n=length();
if(n>=10) {for(i=1;i<=n;i++) printf "%s", int(i/10); print ""}
for(i=1;i<=n;i++) printf "%s",i%10; print ""}1' file
00000000011111111112222222222333333333344444444445555555555666666666677777777778888888888
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
--------------------------VREV-C-SEQAETGPCRAMISRWYFDVTEGKCAPFFYGGCGGNRNNFDTEEYCMAVCG-----
--P-----------------------RRKL-C-ILHRNPGRCYDKIPAFYYNQKKKQCERFDWSGCGGNSNRFKTIEECRRTCIG----
--------------------------APDF-C-LEPPYDGPCRALHLRYFYNAKAGLCQTFYYGGCLAKRNNFESAEDCMRTC------
This handles up to 99 columns but can easily be extended to more digits.
For readability, perhaps you can group the blocks by 10.
... | sed -E 's/(.{10})/\1 /g'
0000000001 1111111112 2222222223 3333333334 4444444445 5555555556 6666666667 7777777778 888888888
1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 123456789
---------- ---------- ------VREV -C-SEQAETG PCRAMISRWY FDVTEGKCAP FFYGGCGGNR NNFDTEEYCM AVCG-----
--P------- ---------- ------RRKL -C-ILHRNPG RCYDKIPAFY YNQKKKQCER FDWSGCGGNS NRFKTIEECR RTCIG----
---------- ---------- ------APDF -C-LEPPYDG PCRALHLRYF YNAKAGLCQT FYYGGCLAKR NNFESAEDCM RTC------
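The vertical ruler idea extends to any width if one header line is printed per decimal digit. A minimal Python sketch of that generalisation follows; the function name `column_ruler` is made up for illustration.

```python
def column_ruler(width):
    """Return ruler lines for columns 1..width, most significant digit first."""
    digits = len(str(width))
    lines = []
    for place in range(digits - 1, -1, -1):
        # Digit of each column number at this decimal place (0 = ones, 1 = tens, ...).
        lines.append("".join(str(col // 10 ** place % 10)
                             for col in range(1, width + 1)))
    return lines


for line in column_ruler(12):
    print(line)
# prints:
# 000000000111
# 123456789012
```

Reading a column top to bottom gives its number, so column 10 reads "1" over "0".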
The format you described can be done as well, with every cell padded to three characters, though I'm not sure it is more useful:
$ awk 'BEGIN {FS=""; OFS="  "}
NR==1 {n=length(); for(i=1;i<=n;i++) printf "%-3s", i ; print ""}
{$1=$1}1' file
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
- - - - - - - - - - - - - - - - - - - - - - - - - - V R E V - C - S E Q A E T G P C R A M I S R W Y F D V T E G K C A P F F Y G G C G G N R N N F D T E E Y C M A V C G - - - - -
- - P - - - - - - - - - - - - - - - - - - - - - - - R R K L - C - I L H R N P G R C Y D K I P A F Y Y N Q K K K Q C E R F D W S G C G G N S N R F K T I E E C R R T C I G - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - A P D F - C - L E P P Y D G P C R A L H L R Y F Y N A K A G L C Q T F Y Y G G C L A K R N N F E S A E D C M R T C - - - - - -

SQL: Insert Rows and interpolate

I have a large SQL table and I want to add rows so that all issue ages 40-75 are present, each with a db_perk and accel_perk filled in by linear interpolation.
Here is a small portion of my data
class gender iss_age dur db_perk accel_perk ext_perk
111 F 40 1 0.1961 0.0025 0
111 F 45 1 0.2985 0.0033 0
111 F 50 1 0.472 0.0065 0
111 F 55 1 0.7075 0.01 0
111 F 60 1 1.0226 0.0238 0
111 F 65 1 1.5208 0.0551 0
111 F 70 1 2.3808 0.1296 0
111 F 75 1 4.0748 0.3242 0
I want my output to look something like this
class gender iss_age dur db_perk accel_perk ext_perk
111 F 40 1 0.1961 0.0025 0
111 F 41 1 0.21658 0.00266 0
111 F 42 1 0.23706 0.00282 0
111 F 43 1 0.25754 0.00298 0
111 F 44 1 0.27802 0.00314 0
111 F 45 1 0.2985 0.0033 0
I basically want all the columns except iss_age, db_perk, and accel_perk to be the same as in the row above.
Is there any way to do this?
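As a reference for the arithmetic only (this is Python, not SQL), here is a small sketch of the linear interpolation each inserted row needs. The function name and the tuple layout are assumptions made for the example. class, gender, dur and ext_perk are left out because they are simply copied from the surrounding rows.

```python
def interpolate_rows(knots):
    """knots: list of (iss_age, db_perk, accel_perk) tuples sorted by iss_age.

    Returns one tuple per integer age from the first knot to the last.
    """
    rows = []
    for (age0, db0, acc0), (age1, db1, acc1) in zip(knots, knots[1:]):
        for age in range(age0, age1):
            # Fraction of the way from the lower knot to the upper knot.
            t = (age - age0) / (age1 - age0)
            rows.append((age, db0 + t * (db1 - db0), acc0 + t * (acc1 - acc0)))
    rows.append(knots[-1])
    return rows


knots = [(40, 0.1961, 0.0025), (45, 0.2985, 0.0033)]
for age, db_perk, accel_perk in interpolate_rows(knots):
    print(age, round(db_perk, 5), round(accel_perk, 5))
```

Between two known ages the value moves by the same step each year, for example (0.2985 - 0.1961) / 5 = 0.02048 per year for db_perk between ages 40 and 45.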

Simple vernam cryptography

I want to learn vernam encryption.
First of all, can you confirm that the algorithm is the same for encoding and decoding?
I have read an exercise that says to decode this message with Pi:
01237 55235 31127 12189 87479 1592
I tried the vernam Python package like this:
py_vernam.vernam('01237552353112712189874791592','3.141592653589793238462643383')
or
py_vernam.vernam('01237552353112712189874791592','31415926535897932384626433832')
But it does not give me a readable message...
Thanks
"can you confirm me that the algorithm is the same for encoding and decoding?"
-> First of all, we are talking about encryption here, not encoding (at least in the first step). The notable difference between the two is that encryption involves a key.
Depending on which variant of Vernam you are handling, encryption and decryption may or may not be the same operation. For the binary variant they are: both are a simple XOR.
If you have got hold of the "let's do this by hand" or schoolbook variant, they are not, because it works on digits mod 10 rather than bits mod 2: encryption adds the key digit and decryption subtracts it.
The notation in blocks of 5 is an indication of the mod 10 variant, since the mod 2 variant usually just handles binary data.
01237552353112712189874791592 Ciphertext
31415926535897932384626433832 Key
========================================
70822636828325880805258368760 Text (encoded)
So finally we have to turn the encoded digits into characters (the second step).
It is up to the users of the cipher to give these numbers a meaning. Searching for your example message turns up a French page showing a substitution table, so let's have a look:
Substitution table:
Clair A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Chiffré 6 38 32 4 8 30 36 34 39 31 78 72 70 76 9 79 71 58 2 0 52 50 56 54 1 59
Result:
70 8 2 2 6 36 8 2 8 32 58 8 0 8 0 52 58 36 8 76 0
M E S S A G E S E C R E T E T U R G E N T
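The two steps above can be sketched in Python as follows. This is a hand-rolled illustration, not the API of the py_vernam package. The substitution table is copied from the one shown above, and the rule that codes starting with 3, 5 or 7 are two digits long is read off that table.

```python
# Substitution table from the answer: letter -> digit code.
TABLE = {
    "A": "6", "B": "38", "C": "32", "D": "4", "E": "8", "F": "30", "G": "36",
    "H": "34", "I": "39", "J": "31", "K": "78", "L": "72", "M": "70", "N": "76",
    "O": "9", "P": "79", "Q": "71", "R": "58", "S": "2", "T": "0", "U": "52",
    "V": "50", "W": "56", "X": "54", "Y": "1", "Z": "59",
}
DECODE = {code: letter for letter, code in TABLE.items()}


def subtract_mod10(ciphertext, key):
    """Step 1: decrypt digit by digit, (cipher - key) mod 10, with no borrowing."""
    return "".join(str((int(c) - int(k)) % 10) for c, k in zip(ciphertext, key))


def decode_digits(digits):
    """Step 2: turn the digit stream back into letters using the table."""
    letters = []
    i = 0
    while i < len(digits):
        # In this table every code starting with 3, 5 or 7 has two digits.
        size = 2 if digits[i] in "357" else 1
        letters.append(DECODE[digits[i:i + size]])
        i += size
    return "".join(letters)


ciphertext = "01237552353112712189874791592"
key = "31415926535897932384626433832"  # leading digits of pi, without the dot

encoded = subtract_mod10(ciphertext, key)
print(encoded)                 # 70822636828325880805258368760
print(decode_digits(encoded))  # MESSAGESECRETETURGENT
```

Encrypting would use the same table in the other direction and add the key digits mod 10 instead of subtracting them.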

Inserting an empty line between every two elements of a column (data frame + pandas)

My data frame looks something like this:
Games
0 CAR 20
1 DEN 21
2 TB 31
3 ATL 24
4 SD 27
5 KC 33
6 CIN 23
7 NYJ 22
import pandas as pd
df =pd.read_csv('weekone.txt',)
df.columns=['Games']
I'm trying to put a blank line in between every two elements (teams).
So I want it to look like this:
Games
0 CAR 20
1 DEN 21

2 TB 31
3 ATL 24

4 SD 27
5 KC 33

6 CIN 23
7 NYJ 22
But when I'm using this loop
for i in df2.index:
    if (df2.index[i])%2 == 1:
        df2.Games[i]=df2.Games[i]+('\n')
    else:
        df2.Games[i] = df2.Games[i]
I'm getting an output like this:
Games
0 CAR 20
1 DEN 21\n
2 TB 31
3 ATL 24\n
4 SD 27
5 KC 33\n
6 CIN 23
7 NYJ 22\n
What am I doing wrong? Thanks.
You can do it this way:
In [172]: x
Out[172]:
Games
0 CAR 20
1 DEN 21
2 TB 31
3 ATL 24
4 SD 27
5 KC 33
6 CIN 23
7 NYJ 22
In [173]: %paste
empty_line = pd.DataFrame([''], columns=x.columns, index=[''])
rslt = x.loc[:1]
g = x.groupby(x.index//2)
for i in range(1, len(g)):
    rslt = pd.concat([rslt, empty_line, g.get_group(i)])
## -- End pasted text --
In [174]: rslt
Out[174]:
Games
0 CAR 20
1 DEN 21

2 TB 31
3 ATL 24

4 SD 27
5 KC 33

6 CIN 23
7 NYJ 22
the index's dtype is object now:
In [178]: rslt.index.dtype
Out[178]: dtype('O')
or having -1 as an index for empty lines:
In [175]: %paste
empty_line = pd.DataFrame([''], columns=x.columns, index=[-1])
rslt = x.loc[:1]
g = x.groupby(x.index//2)
for i in range(1, len(g)):
    rslt = pd.concat([rslt, empty_line, g.get_group(i)])
## -- End pasted text --
In [176]: rslt
Out[176]:
Games
0 CAR 20
1 DEN 21
-1
2 TB 31
3 ATL 24
-1
4 SD 27
5 KC 33
-1
6 CIN 23
7 NYJ 22
index dtype:
In [181]: rslt.index.dtype
Out[181]: dtype('int64')
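One possible variation on the same idea collects the pairs and the separator rows in a list and concatenates once at the end. This is only a sketch. The frame `x` is rebuilt here from the values in the question, and the empty-string index label for separator rows is an arbitrary choice.

```python
import pandas as pd

x = pd.DataFrame({"Games": ["CAR 20", "DEN 21", "TB 31", "ATL 24",
                            "SD 27", "KC 33", "CIN 23", "NYJ 22"]})

# One blank separator row, reused between every pair.
empty_line = pd.DataFrame({"Games": [""]}, index=[""])

pieces = []
for start in range(0, len(x), 2):
    pieces.append(x.iloc[start:start + 2])  # one pair of teams
    pieces.append(empty_line)               # separator after the pair

rslt = pd.concat(pieces[:-1])               # drop the trailing separator
print(rslt)
```

Because the separator is a real row with an empty string in it, it prints as a blank-looking line, which a "\n" appended to an existing cell does not.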

How to subtract one dataframe from another?

First, let me set the stage.
I start with a pandas dataframe klmn, that looks like this:
In [15]: klmn
Out[15]:
K L M N
0 0 a -1.374201 35
1 0 b 1.415697 29
2 0 a 0.233841 18
3 0 b 1.550599 30
4 0 a -0.178370 63
5 0 b -1.235956 42
6 0 a 0.088046 2
7 0 b 0.074238 84
8 1 a 0.469924 44
9 1 b 1.231064 68
10 2 a -0.979462 73
11 2 b 0.322454 97
Next I split klmn into two dataframes, klmn0 and klmn1, according to the value in the 'K' column:
In [16]: k0 = klmn.groupby(klmn['K'] == 0)
In [17]: klmn0, klmn1 = [klmn.iloc[k0.indices[tf]] for tf in (True, False)]
In [18]: klmn0, klmn1
Out[18]:
( K L M N
0 0 a -1.374201 35
1 0 b 1.415697 29
2 0 a 0.233841 18
3 0 b 1.550599 30
4 0 a -0.178370 63
5 0 b -1.235956 42
6 0 a 0.088046 2
7 0 b 0.074238 84,
K L M N
8 1 a 0.469924 44
9 1 b 1.231064 68
10 2 a -0.979462 73
11 2 b 0.322454 97)
Finally, I compute the mean of the M column in klmn0, grouped by the value in the L column:
In [19]: m0 = klmn0.groupby('L')['M'].mean(); m0
Out[19]:
L
a -0.307671
b 0.451144
Name: M
Now, my question is, how can I subtract m0 from the M column of the klmn1 sub-dataframe, respecting the value in the L column? (By this I mean that m0['a'] gets subtracted from the M column of each row in klmn1 that has 'a' in the L column, and likewise for m0['b'].)
One could imagine doing this in a way that replaces the values in the M column of klmn1 with the new values (after subtracting the value from m0). Alternatively, one could imagine doing this in a way that leaves klmn1 unchanged, and instead produces a new dataframe klmn11 with an updated M column. I'm interested in both approaches.
If you set the index of your klmn1 dataframe to the column L, pandas will automatically align the indices with any series you subtract from it:
In [1]: klmn1.set_index('L')['M'] - m0
Out[1]:
L
a 0.777595
a -0.671791
b 0.779920
b -0.128690
Name: M
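To cover both approaches the question asks about while keeping klmn1's original row index, one option is to look up the m0 value for each row with `Series.map`. The sketch below rebuilds klmn1 and m0 from the values shown in the question; it is one way to do it, not necessarily what the answer above intended.

```python
import pandas as pd

# klmn1 and m0 rebuilt from the values shown in the question.
klmn1 = pd.DataFrame(
    {"K": [1, 1, 2, 2],
     "L": ["a", "b", "a", "b"],
     "M": [0.469924, 1.231064, -0.979462, 0.322454],
     "N": [44, 68, 73, 97]},
    index=[8, 9, 10, 11],
)
m0 = pd.Series({"a": -0.307671, "b": 0.451144}, name="M")

# Look up the m0 value matching each row's L; the result keeps klmn1's index.
offsets = klmn1["L"].map(m0)

# Approach 1: leave klmn1 unchanged and build a new frame klmn11.
klmn11 = klmn1.assign(M=klmn1["M"] - offsets)

# Approach 2: overwrite the M column of klmn1 itself.
klmn1["M"] = klmn1["M"] - offsets

print(klmn11)
```

The differences are the same numbers as in the output above, but they stay attached to rows 8 to 11 together with the K, L and N columns.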
Option #1:
df1.subtract(df2, fill_value=0)
Option #2:
df1.subtract(df2, fill_value=None)