Print file with column numbers - awk

My file looks something like this:
--------------------------VREV-C-SEQAETGPCRAMISRWYFDVTEGKCAPFFYGGCGGNRNNFDTEEYCMAVCG-----
--P-----------------------RRKL-C-ILHRNPGRCYDKIPAFYYNQKKKQCERFDWSGCGGNSNRFKTIEECRRTCIG----
--------------------------APDF-C-LEPPYDGPCRALHLRYFYNAKAGLCQTFYYGGCLAKRNNFESAEDCMRTC------
How can I add a header showing each column's number in a readable format, i.e. with added spaces so that two-digit column numbers stay aligned with their columns?
1 2 3 4 5 6 7 8 9 10 11 12 13 ....
- - - - - - - - - - - - - - - - - - - - - - - - - - V R E V...
The user needs to see this output in order to pick the column number at which to cut.

The standard way of doing this is not to change the column spacing but to write multi-digit column numbers vertically. For example,
$ awk 'NR==1{n=length();
    if(n>=10) {for(i=1;i<=n;i++) printf "%s", int(i/10); print ""}
    for(i=1;i<=n;i++) printf "%s",i%10; print ""}1' file
00000000011111111112222222222333333333344444444445555555555666666666677777777778888888888
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
--------------------------VREV-C-SEQAETGPCRAMISRWYFDVTEGKCAPFFYGGCGGNRNNFDTEEYCMAVCG-----
--P-----------------------RRKL-C-ILHRNPGRCYDKIPAFYYNQKKKQCERFDWSGCGGNSNRFKTIEECRRTCIG----
--------------------------APDF-C-LEPPYDGPCRALHLRYFYNAKAGLCQTFYYGGCLAKRNNFESAEDCMRTC------
This works for up to 99 columns and can easily be extended to more digits.
For readability, you can also group the columns in blocks of 10:
... | sed -E 's/(.{10})/\1 /g'
0000000001 1111111112 2222222223 3333333334 4444444445 5555555556 6666666667 7777777778 888888888
1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 123456789
---------- ---------- ------VREV -C-SEQAETG PCRAMISRWY FDVTEGKCAP FFYGGCGGNR NNFDTEEYCM AVCG-----
--P------- ---------- ------RRKL -C-ILHRNPG RCYDKIPAFY YNQKKKQCER FDWSGCGGNS NRFKTIEECR RTCIG----
---------- ---------- ------APDF -C-LEPPYDG PCRALHLRYF YNAKAGLCQT FYYGGCLAKR NNFESAEDCM RTC------
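For comparison, the vertical ruler can also be sketched in Python. This is only an illustration, not part of the awk answer: the function names `column_ruler` and `group_by` are made up here, and each digit row is derived from the 1-based column number, so column 10 reads 1 over 0. It handles any number of digits rather than stopping at 99 columns.

```python
def column_ruler(width):
    """Return ruler lines for columns 1..width, most significant digit first."""
    digits = len(str(width))
    lines = []
    for place in range(digits - 1, -1, -1):
        # Pick out one decimal digit of every column number
        lines.append("".join(str(col // 10 ** place % 10)
                             for col in range(1, width + 1)))
    return lines


def group_by(text, size=10):
    """Separate the text into blocks of `size` characters with single spaces."""
    return " ".join(text[i:i + size] for i in range(0, len(text), size))


if __name__ == "__main__":
    rows = ["--P-------RRKL-C-ILHRNPG", "----------APDF-C-LEPPYDG"]
    width = max(len(row) for row in rows)
    for line in column_ruler(width) + rows:
        print(group_by(line))
```

Unlike the sed command, `group_by` does not leave a trailing space after a final complete block.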
The format you described can be done as well, though I am not sure it is more useful. Note that splitting every character into its own field with an empty FS works in gawk and mawk but is not specified by POSIX, and that OFS has to be two spaces to match the three-character header cells:
$ awk 'BEGIN {FS=""; OFS="  "}
    NR==1 {n=length(); for(i=1;i<=n;i++) printf "%-3s", i ; print ""}
    {$1=$1}1' file
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  V  R  E  V  -  C  -  S  E  Q  A  E  T  G  P  C  R  A  M  I  S  R  W  Y  F  D  V  T  E  G  K  C  A  P  F  F  Y  G  G  C  G  G  N  R  N  N  F  D  T  E  E  Y  C  M  A  V  C  G  -  -  -  -  -
-  -  P  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  R  R  K  L  -  C  -  I  L  H  R  N  P  G  R  C  Y  D  K  I  P  A  F  Y  Y  N  Q  K  K  K  Q  C  E  R  F  D  W  S  G  C  G  G  N  S  N  R  F  K  T  I  E  E  C  R  R  T  C  I  G  -  -  -  -
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  A  P  D  F  -  C  -  L  E  P  P  Y  D  G  P  C  R  A  L  H  L  R  Y  F  Y  N  A  K  A  G  L  C  Q  T  F  Y  Y  G  G  C  L  A  K  R  N  N  F  E  S  A  E  D  C  M  R  T  C  -  -  -  -  -  -
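The spaced-out layout can be sketched in Python as well. This is an illustrative variant rather than a translation of the awk command: the function name `spaced_view` is hypothetical, and the cell width is derived from the widest column number instead of being fixed at three characters, so inputs with 100 or more columns keep a separating space.

```python
def spaced_view(rows):
    """Return a header of column numbers followed by the rows,
    with every character placed in a fixed-width cell."""
    width = max(len(row) for row in rows)
    cell = len(str(width)) + 1  # widest column number plus one separating space
    header = "".join(str(i).ljust(cell) for i in range(1, width + 1)).rstrip()
    body = ["".join(ch.ljust(cell) for ch in row).rstrip() for row in rows]
    return [header] + body


if __name__ == "__main__":
    for line in spaced_view(["--VREV-C-SEQ", "--RRKL-C-ILH"]):
        print(line)
```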

Related

Is there a way to reference the tuple below in a calculation?

I have this view:
x | y | z
-----+------+-----
a | 645 |
b | 46 |
c | 356 |
d | 509 |
Is there a way to write a query in which a row's z value references a different row?
For example, I want z to be the y value of the tuple below, minus 1.
So:
z.a = y.b - 1 = 46 - 1 = 45
z.b = y.c - 1 = 356 - 1 = 355
z.c = y.d - 1 = 509 - 1 = 508
You are describing the window function lead(), which lets you access any column of the "next" row (given a partition and an order by criterion). The last row has no next row, so its z is null:
select
    x,
    y,
    lead(y) over(order by x) - 1 as z
from mytable;
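To make the semantics of lead() concrete, here is a small Python sketch that mimics what the query computes on the sample rows. It is an emulation for illustration only, not how a database evaluates window functions, and the helper names `lead` and `with_z` are made up.

```python
def lead(values, offset=1, default=None):
    """Shift values up by `offset`, padding the tail with `default`."""
    values = list(values)
    return values[offset:] + [default] * min(offset, len(values))


def with_z(rows):
    """rows: (x, y) pairs. Returns (x, y, z) with z = next row's y - 1."""
    ordered = sorted(rows, key=lambda r: r[0])  # ORDER BY x
    next_y = lead([y for _, y in ordered])
    return [(x, y, None if n is None else n - 1)
            for (x, y), n in zip(ordered, next_y)]


if __name__ == "__main__":
    for row in with_z([("a", 645), ("b", 46), ("c", 356), ("d", 509)]):
        print(row)
```

The last tuple gets `None`, which corresponds to the null that lead() returns when there is no following row.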

SQL: Insert Rows and interpolate

I have a large SQL table and I want to add rows so that every issue age from 40 to 75 is present, with db_perk and accel_perk for the new ages filled in by linear interpolation.
Here is a small portion of my data
class gender iss_age dur db_perk accel_perk ext_perk
111 F 40 1 0.1961 0.0025 0
111 F 45 1 0.2985 0.0033 0
111 F 50 1 0.472 0.0065 0
111 F 55 1 0.7075 0.01 0
111 F 60 1 1.0226 0.0238 0
111 F 65 1 1.5208 0.0551 0
111 F 70 1 2.3808 0.1296 0
111 F 75 1 4.0748 0.3242 0
I want my output to look something like this
class gender iss_age dur db_perk accel_perk ext_perk
111 F 40 1 0.1961 0.0025 0
111 F 41 1 0.21658 0.00266 0
111 F 42 1 0.23706 0.00282 0
111 F 43 1 0.25754 0.00298 0
111 F 44 1 0.27802 0.00314 0
111 F 45 1 0.2985 0.0033 0
Basically, I want every column except iss_age, db_perk, and accel_perk to keep the value from the row above.
Is there any way to do this?
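The interpolation itself can be sketched outside the database. The Python below is only an illustration under stated assumptions: it handles one (class, gender, dur) group at a time, the function name `interpolate_ages` is hypothetical, and results are rounded to six decimals purely for display. Columns that are not interpolated are copied from the lower of the two neighbouring rows.

```python
def interpolate_ages(rows, fields=("db_perk", "accel_perk"), key="iss_age"):
    """Fill in every missing integer age between consecutive rows
    by linear interpolation of the given fields."""
    rows = sorted(rows, key=lambda r: r[key])
    if not rows:
        return []
    out = []
    for lo, hi in zip(rows, rows[1:]):
        span = hi[key] - lo[key]
        for age in range(lo[key], hi[key]):
            t = (age - lo[key]) / span      # 0 at lo, approaching 1 at hi
            new = dict(lo)                  # other columns copied from the row above
            new[key] = age
            for f in fields:
                new[f] = round(lo[f] + t * (hi[f] - lo[f]), 6)
            out.append(new)
    out.append(dict(rows[-1]))
    return out


if __name__ == "__main__":
    sample = [
        {"class": 111, "gender": "F", "iss_age": 40, "dur": 1,
         "db_perk": 0.1961, "accel_perk": 0.0025, "ext_perk": 0},
        {"class": 111, "gender": "F", "iss_age": 45, "dur": 1,
         "db_perk": 0.2985, "accel_perk": 0.0033, "ext_perk": 0},
    ]
    for row in interpolate_ages(sample):
        print(row)
```

In SQL the same idea would typically combine a generated series of ages with the neighbouring known rows; the details depend on the database in use.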

Extracting and Parsing Table from HTML using VBA

I am using Microsoft Office Version 1703.
I have been tasked with:
Creating a weekly Excel sheet using data from AccuWeather Professional for 10 specific locations and have that updated weekly.
Creating historical data going back 4 or 5 years for the same multiple locations. Ideally I'd like to take the time to automate this as it has been considered a long term project.
The process for doing this was originally Excel's Text to Columns. With Text to Columns, the data is imported as an array, and I have to use space as a delimiter to split it into columns and rows correctly before finally entering it by hand into the presentation sheet.
There is a picture of the accuweather site and the information I'm attempting to grab:
When simply copying and pasting the data I receive this as an array for example:
TODAY'S DATE: 2-JUN-17
JUN-17 FOR Monticello White County Airp, IN (676') LAT=40.7N LON= 86.8W
TEMPERATURE PRECIPITATION
ACTUAL NORMAL
HI LO AVG HI LO AVG DEPT AMNT SNOW SNCVR HDD
1 81 48 65 78 55 66 -1 0.00 0.0e 0 0
2 M M M 78 55 67 M M 0.0 0 M
3 M M M 78 56 67 M M 0.0 0 M
4 M M M 79 56 67 M M 0.0 0 M
5 M M M 79 56 68 M M 0.0 0 M
6 M M M 79 57 68 M M 0.0 0 M
7 M M M 79 57 68 M M 0.0 0 M
8 M M M 80 57 69 M M 0.0 0 M
9 M M M 80 58 69 M M 0.0 0 M
10 M M M 80 58 69 M M 0.0 0 M
11 M M M 80 58 69 M M 0.0 0 M
12 M M M 81 58 70 M M 0.0 0 M
13 M M M 81 59 70 M M 0.0 0 M
14 M M M 81 59 70 M M 0.0 0 M
15 M M M 81 59 70 M M 0.0 0 M
16 M M M 81 59 70 M M 0.0 0 M
17 M M M 82 60 71 M M 0.0 0 M
18 M M M 82 60 71 M M 0.0 0 M
19 M M M 82 60 71 M M 0.0 0 M
20 M M M 82 60 71 M M 0.0 0 M
21 M M M 82 60 71 M M 0.0 0 M
22 M M M 82 61 72 M M 0.0 0 M
23 M M M 83 61 72 M M 0.0 0 M
24 M M M 83 61 72 M M 0.0 0 M
25 M M M 83 61 72 M M 0.0 0 M
26 M M M 83 61 72 M M 0.0 0 M
27 M M M 83 61 72 M M 0.0 0 M
28 M M M 83 61 72 M M 0.0 0 M
29 M M M 83 62 73 M M 0.0 0 M
30 M M M 84 62 73 M M 0.0 0 M
TOTALS FOR KMCX
HIGHEST TEMPERATURE 81 TOTAL PRECIP 0.00
LOWEST TEMPERATURE 48 TOTAL SNOWFALL 0.0
AVERAGE TEMPERATURE 64.5 NORMAL PRECIP 4.08
DEPARTURE FROM NORM -2.0 % OF NORMAL PRECIP 0
HEATING DEGREE DAYS 0
NORMAL DEGREE DAYS 0
shows up like this:
The HTML selector is:
body > center > table > tbody > tr > td.pageContent > table > tbody > tr:nth-child(2) > td > table > tbody > tr:nth-child(1) > td > font > table:nth-child(5) > tbody > tr > td > pre
The issue with doing a Web Query is that even if I have Internet Explorer save my password, the Web Query will not log in. I managed to cobble together a VBA script that opens IE, logs in successfully, and navigates to the intended page. I imagine I could create individual scripts in a sequence to grab the weather data for each specific location fairly easily. The problem I'm having is writing a VBA script that grabs only what is inside the <pre> element I referenced above. Right now the script selects everything, copies it, and pastes it into my sheet.
What I would ideally like to accomplish is: navigate to AccuWeather Pro, log in successfully, pull up historical data for a specific location, grab all the data referenced above, import it into Excel, and format it for my presentation sheet automatically. It would be even nicer if it could update automatically at least weekly.
Here is my VBA code:
Sub Test()
    ' Late binding via CreateObject does not define the IE constants, so declare the one used below
    Const READYSTATE_COMPLETE As Long = 4
    Dim ieApp As Object
    Dim ieDoc As Object
    Sheets("Sheet1").Select
    Range("A1:A1000") = "" ' erase previous data
    Range("A1").Select
    Set ieApp = CreateObject("InternetExplorer.Application")
    With ieApp
        .Visible = True
        .Navigate "https://wwwl.accuweather.com/error.php?url=proa.accuweather.com/adcbin/professional/forecast_local.asp?zipcode=47960&mt=pro"
        Do While .Busy: DoEvents: Loop
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        Set ieDoc = .Document
        ' fill in the login form - View Source from your browser to get the control names
        With ieDoc.forms(0)
            .UserName.Value = "username"
            .Password.Value = "password"
            .Submit
        End With
        Do While .Busy: DoEvents: Loop
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        ' now that we're in, go to the page we want
        .Visible = True
        .Navigate "http://proa.accuweather.com/adcbin/professional/historical_index.asp"
        Do While .Busy: DoEvents: Loop
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        .ExecWB 17, 0 ' Select All
        .ExecWB 12, 2 ' Copy selection
        ActiveSheet.PasteSpecial Format:="Text", link:=False, DisplayAsIcon:=False
        Range("A1").Select
        .Quit
    End With
End Sub
I did my best to be as thorough, accurate, and correct with my question as possible, I apologize if I've committed any stack exchange social faux pas etc.
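The two steps the question describes, taking only the text inside the <pre> element and splitting the daily rows into columns, can be sketched as follows. The sketch is in Python rather than VBA and uses only the standard library; the column names in `COLUMNS` are guessed from the header row of the sample, and `pre_text` and `daily_rows` are hypothetical helper names. In VBA, the element could presumably be reached through the document's getElementsByTagName("pre") collection instead.

```python
from html.parser import HTMLParser


class PreText(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def pre_text(html):
    parser = PreText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Assumed names for the 12 whitespace-separated fields of a daily row
COLUMNS = ["day", "hi", "lo", "avg", "norm_hi", "norm_lo", "norm_avg",
           "dept", "amnt", "snow", "sncvr", "hdd"]


def daily_rows(text):
    """Keep only lines that start with a day number and have 12 fields.
    Values stay strings because the sample contains entries like 'M' and '0.0e'."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(COLUMNS) and parts[0].isdigit():
            rows.append(dict(zip(COLUMNS, parts)))
    return rows
```

Header and totals lines are skipped because they do not start with a day number or do not have twelve fields.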

inserting an empty line in between every two elements of a column (data frame + pandas)

My data frame looks something like this:
Games
0 CAR 20
1 DEN 21
2 TB 31
3 ATL 24
4 SD 27
5 KC 33
6 CIN 23
7 NYJ 22
import pandas as pd
df = pd.read_csv('weekone.txt')
df.columns = ['Games']
I'm trying to put a blank line in between every two elements (teams).
So I want it to look like this:
Games
0 CAR 20
1 DEN 21

2 TB 31
3 ATL 24

4 SD 27
5 KC 33

6 CIN 23
7 NYJ 22
But when I use this loop
for i in df2.index:
    if (df2.index[i])%2 == 1:
        df2.Games[i]=df2.Games[i]+('\n')
    else:
        df2.Games[i] = df2.Games[i]
I'm getting an output like this:
Games
0 CAR 20
1 DEN 21\n
2 TB 31
3 ATL 24\n
4 SD 27
5 KC 33\n
6 CIN 23
7 NYJ 22\n
What am I doing wrong? Thanks.
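Appending '\n' only changes the string stored in the cell, and pandas shows that character escaped when it prints the frame; it does not create a new row. To get a blank line you need an actual empty row, which you can insert this way: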
In [172]: x
Out[172]:
Games
0 CAR 20
1 DEN 21
2 TB 31
3 ATL 24
4 SD 27
5 KC 33
6 CIN 23
7 NYJ 22
In [173]: %paste
empty_line = pd.DataFrame([''], columns=x.columns, index=[''])
rslt = x.loc[:1]
g = x.groupby(x.index//2)
for i in range(1, len(g)):
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    rslt = pd.concat([rslt, empty_line, g.get_group(i)])
## -- End pasted text --
In [174]: rslt
Out[174]:
Games
0 CAR 20
1 DEN 21

2 TB 31
3 ATL 24

4 SD 27
5 KC 33

6 CIN 23
7 NYJ 22
Because the empty rows use '' as their label, the index's dtype is now object:
In [178]: rslt.index.dtype
Out[178]: dtype('O')
Alternatively, use -1 as the index label for the empty lines:
In [175]: %paste
empty_line = pd.DataFrame([''], columns=x.columns, index=[-1])
rslt = x.loc[:1]
g = x.groupby(x.index//2)
for i in range(1, len(g)):
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    rslt = pd.concat([rslt, empty_line, g.get_group(i)])
## -- End pasted text --
In [176]: rslt
Out[176]:
Games
0 CAR 20
1 DEN 21
-1
2 TB 31
3 ATL 24
-1
4 SD 27
5 KC 33
-1
6 CIN 23
7 NYJ 22
In this case the index dtype stays integer:
In [181]: rslt.index.dtype
Out[181]: dtype('int64')
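If the blank lines are only needed in printed or written output, and not inside the DataFrame itself, plain Python may be enough. The sketch below is an assumption about that use case, not part of the answer above; `with_blank_lines` is a made-up helper, and its input could come from something like `df['Games'].tolist()`.

```python
def with_blank_lines(lines, every=2):
    """Return the lines with an empty string inserted after each
    group of `every` items, except after the last group."""
    lines = list(lines)
    out = []
    for i, line in enumerate(lines, start=1):
        out.append(line)
        if i % every == 0 and i != len(lines):
            out.append("")
    return out


if __name__ == "__main__":
    games = ["CAR 20", "DEN 21", "TB 31", "ATL 24", "SD 27", "KC 33"]
    print("\n".join(with_blank_lines(games)))
```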

pasting files/multiple columns with different number of rows

Hi, I was trying to paste multiple files (each with a single column but a different number of rows) together, but the result wasn't what I was expecting. How can I solve that?
paste file1.txt file2.txt paste3.txt ... paste100 > out.txt
input file 1:
A
B
C
input file 2:
D
E
input file 3:
F
G
H
I
J
.......
......
Desired output:
A D F
B E G
C   H
    I
    J
Would this be the same if the files have multiple columns with different numbers of rows?
for example:
file1
A 1
B 2
C 3
file2
D 4
E 5
file3
F 6 %
G 7 &
H 8 #
I 9 #
J 10 ?
output:
A 1 D 4 F 6 %
B 2 E 5 G 7 &
C 3     H 8 #
        I 9 #
        J 10 ?
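Isn't the default behaviour of paste exactly what you ask? When one file runs out of lines, paste keeps going and leaves that file's field empty (the fields are tab-separated). The example uses process substitution, so it needs a shell such as bash or zsh: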
% paste <(echo "a
b
c
d") <(echo "1
2
3") <(echo "10
20
30
40
50
60")
a 1 10
b 2 20
c 3 30
d   40
    50
    60
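The fill-with-empty-fields behaviour can be illustrated in Python with itertools.zip_longest. This is a sketch that mirrors what paste does with inputs of unequal length, not a replacement for it; the function name `paste` and the idea of passing each file as a list of lines are assumptions made for the example.

```python
from itertools import zip_longest


def paste(*columns, delimiter="\t"):
    """Join the i-th line of every input with `delimiter`;
    inputs that have run out contribute an empty field."""
    return [delimiter.join(fields)
            for fields in zip_longest(*columns, fillvalue="")]


if __name__ == "__main__":
    file1 = ["A 1", "B 2", "C 3"]
    file2 = ["D 4", "E 5"]
    file3 = ["F 6 %", "G 7 &", "H 8 #", "I 9 #", "J 10 ?"]
    for line in paste(file1, file2, file3):
        print(line)
```

Because whole lines are joined, it makes no difference whether each file holds one column or several.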