I have a data file with five columns. When I use the printf command in awk, the output isn't aligned.
118 96 105 106 0
119 97 106 107 0
120 98 107 108 0
121 99 108 109 0
122 100 109 110 0
123 101 110 111 0
124 102 111 23 0
125 11 12 112 0
126 103 112 113 0
127 104 113 114 0
128 105 114 115 0
I need all columns to line up with the same separator, regardless of whether the numbers are in the tens, hundreds, or thousands, like this:
118 96 105 106 0
119 97 106 107 0
120 98 107 108 0
121 99 108 109 0
122 100 109 110 0
123 101 110 111 0
124 102 111 23 0
125 11 12 112 0
126 103 112 113 0
127 104 113 114 0
128 105 114 115 0
How can I do this using printf in awk?
I'm using this:
awk '{printf "%d %s %d %s %d %s %d %s %d\n", $1,"",$2,"",$3,"",$4,"",$5}' test
Use column on the output rather than trying to format it with awk:
$ column -t -R'1,2,3,4,5' file
118   96  105  106  0
119   97  106  107  0
120   98  107  108  0
121   99  108  109  0
122  100  109  110  0
123  101  110  111  0
124  102  111   23  0
125   11   12  112  0
126  103  112  113  0
127  104  113  114  0
128  105  114  115  0
Your version of column may already support -R0 which means "right align all columns" so you don't need to list them, see https://github.com/util-linux/util-linux/issues/1306.
As @glennjackman pointed out in the comments:
The BSD-derived column on MacOS does not have the -R option. Have to do rev file | column -t | rev on a mac
Most printf formats allow the width to be supplied as an argument via a * in the format specifier, e.g.:
printf "%*s", 5, "abc"
is evaluated as:
printf "%5s", "abc"
One awk idea making use of this printf/* feature:
awk '
FNR==NR { for (i=1;i<=NF;i++)
              w[i]= length($i) > w[i] ? length($i) : w[i]   # find max width for each column
          next
        }
        { pfx=""
          for (i=1;i<=NF;i++) {
              printf "%s%*s", pfx, w[i], $i
              pfx="  "                                      # (aligned) column delimiter == 2 spaces for columns 2 to NF
          }
          print ""                                          # terminate current line
        }
' five.dat five.dat
NOTES:
requires 2 passes of the input file (could be rewritten to use a single pass but will need to store the entire file in memory)
assumes the minimum delimiter between (aligned) columns is 2 spaces
This generates:
118   96  105  106  0
119   97  106  107  0
120   98  107  108  0
121   99  108  109  0
122  100  109  110  0
123  101  110  111  0
124  102  111   23  0
125   11   12  112  0
126  103  112  113  0
127  104  113  114  0
128  105  114  115  0
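For the single-pass variant mentioned in the notes, the whole input has to be held in memory before anything can be printed. A minimal sketch of that idea follows, written in Python rather than awk purely for illustration. The function name align_columns and the sample lines are made up for this example, and the two-space delimiter mirrors the assumption above.

```python
def align_columns(lines, sep="  "):
    """Right-align whitespace-separated fields to the widest entry per column."""
    rows = [line.split() for line in lines]  # whole input is kept in memory

    # Collect the maximum width seen in each column position.
    widths = {}
    for row in rows:
        for i, field in enumerate(row):
            widths[i] = max(widths.get(i, 0), len(field))

    # Emit every row with each field padded on the left to its column width.
    return [
        sep.join(f"{field:>{widths[i]}}" for i, field in enumerate(row))
        for row in rows
    ]


sample = ["118 96 105 106 0", "125 11 12 112 0", "124 102 111 23 0"]
aligned = align_columns(sample)
for line in aligned:
    print(line)
```

The trade-off is the same as in awk: one read of the file instead of two, at the cost of storing every row.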
The 3 easiest solutions are: 1) pipe the output to column -t, 2) use a tab separator (this doesn't completely align the text, but is sufficient for the sample input), and 3) print each column with a fixed width.
$ cat input
118 96 105 106 0
119 97 106 107 0
120 98 107 108 0
121 99 108 109 0
122 100 109 110 0
123 101 110 111 0
124 102 111 23 0
125 11 12 112 0
126 103 112 113 0
127 104 113 114 0
128 105 114 115 0
$ awk '($1=$1) || 1' OFS=\\t input
118 96 105 106 0
119 97 106 107 0
120 98 107 108 0
121 99 108 109 0
122 100 109 110 0
123 101 110 111 0
124 102 111 23 0
125 11 12 112 0
126 103 112 113 0
127 104 113 114 0
128 105 114 115 0
$ awk '{printf "%5s%5s%5s%5s%5s\n", $1, $2, $3, $4, $5}' input
  118   96  105  106    0
  119   97  106  107    0
  120   98  107  108    0
  121   99  108  109    0
  122  100  109  110    0
  123  101  110  111    0
  124  102  111   23    0
  125   11   12  112    0
  126  103  112  113    0
  127  104  113  114    0
  128  105  114  115    0
Here's one that requires two passes of the data (hence the file file at the end):
$ awk 'NR==FNR {                                  # first pass
    for(i=1;i<=NF;i++)
        if(m[i]=="" || m[i]<length($i))           # get max field widths
            m[i]=length($i)
    next
}
{                                                 # second pass
    for(i=1;i<=NF;i++)
        printf "%" m[i] "s%s",$i,(i==NF?ORS:"  ") # output two spaces in between
}' file file                                      # two passes, twice the file
Output:
118   96  105  106  0
119   97  106  107  0
120   98  107  108  0
121   99  108  109  0
122  100  109  110  0
123  101  110  111  0
124  102  111   23  0
125   11   12  112  0
126  103  112  113  0
127  104  113  114  0
128  105  114  115  0
Here is an awk to do that:
awk 'FNR==NR{for(i=1;i<=NF;i++) if (w[i]<length($i)) w[i]=length($i); next}
{for(i=1;i<=NF;i++) printf("%*s%s", w[i], $i, i<NF ? OFS : ORS)}
' file file
Prints:
118  96 105 106 0
119  97 106 107 0
120  98 107 108 0
121  99 108 109 0
122 100 109 110 0
123 101 110 111 0
124 102 111  23 0
125  11  12 112 0
126 103 112 113 0
127 104 113 114 0
128 105 114 115 0
If you want the field width to increase, add padding with a variable:
awk -v wp=5 'FNR==NR{for(i=1;i<=NF;i++) if (w[i]<length($i)+wp) w[i]=length($i)+wp; next}
{for(i=1;i<=NF;i++) printf("%*s%s", w[i], $i, i<NF ? OFS : ORS)}
' file file
     118       96      105      106      0
     119       97      106      107      0
     120       98      107      108      0
     121       99      108      109      0
     122      100      109      110      0
     123      101      110      111      0
     124      102      111       23      0
     125       11       12      112      0
     126      103      112      113      0
     127      104      113      114      0
     128      105      114      115      0
I want to ask how to select only the columns whose labels are even numbers.
This follows on from my previous question: how to make all rows drop the similar data and multiply float numbers.
df2 =df['hlogUs_dB'].str.split('[,:]',expand = True)
df2 = data.drop(["0"])
df2
0 1 2 3 4 5 6 7 8 9 ... 276 277 278 279 280 281 282 283 284 285
0 109 -3.4 110 -3.4 111 -3.4 112 -3.5 113 -3.5 ... 343 -4.3 344 -4.3 345 -4.2 346 -4.2 347 -4.2
1 109 -3.5 110 -3.5 111 -3.4 112 -3.4 113 -3.4 ... 343 -4.1 344 -4.2 345 -4.4 346 -4.4 347 -4.2
2 109 -3.7 110 -3.7 111 -3.8 112 -3.8 113 -3.8 ... 343 -4.2 344 -4.3 345 -4.3 346 -4.3 347 -4.3
3 109 -3.5 110 -3.6 111 -3.6 112 -3.6 113 -3.7 ... 343 -4.1 344 -4.1 345 -4.1 346 -4.1 347 -4.1
4 109 -3.7 110 -3.8 111 -3.8 112 -3.8 113 -3.8 ... 343 -4.2 344 -4.2 345 -4.2 346 -4.2 347 -4.3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
165 109 -5.2 110 -5.3 111 -5.5 112 -5.7 113 -5.9 ... 343 -5.4 344 -5.3 345 -5.2 346 -5.1 347 -5.1
166 109 -5.5 110 -5.6 111 -5.8 112 -6.1 113 -6.3 ... 343 -5.5 344 -5.4 345 -5.3 346 -5.2 347 -5.2
167 109 -6.0 110 -6.2 111 -6.4 112 -6.7 113 -7.1 ... 343 -4.9 344 -4.9 345 -4.9 346 -4.9 347 -4.9
168 109 -5.4 110 -5.5 111 -5.7 112 -5.9 113 -6.2 ... 343 -5.9 344 -5.7 345 -5.7 346 -5.6 347 -5.6
169 109 -5.9 110 -6.1 111 -6.4 112 -6.6 113 -7.0 ... 343 -5.7 344 -5.7 345 -5.7 346 -5.6 347 -5.6
170 rows × 286 columns
My question is how to select the even-numbered columns without manually typing every even column label,
such as:
df2[[0,2,4]]*= 2
I am currently stuck on how to write a condition on the column headers. I want to select the even-numbered ones only. I hope to find a suitable solution. Thank you in advance.
We can select all columns whose label modulo 2 is 0 (even):
even_cols = df.columns[(df.columns % 2) == 0]
even_cols:
Int64Index([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18,
...
266, 268, 270, 272, 274, 276, 278, 280, 282, 284],
dtype='int64', length=143)
Then operations can use the newly created index:
df[even_cols] *= 2
df:
0 1 2 3 4 5 6 ... 279 280 281 282 283 284 285
0 198 78 122 16 146 8 124 ... 8 102 61 168 52 148 25
1 18 31 78 59 44 80 116 ... 75 124 51 4 96 38 7
2 152 66 112 0 114 31 172 ... 18 186 19 84 29 36 0
3 80 99 152 25 34 31 106 ... 59 190 33 68 31 66 83
4 192 95 48 95 130 14 8 ... 6 74 79 40 46 198 65
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
165 170 58 110 75 196 11 50 ... 54 46 53 146 62 30 48
166 54 8 148 25 174 40 114 ... 6 180 32 94 44 142 16
167 42 48 48 31 126 60 86 ... 11 128 10 162 67 142 13
168 54 37 70 2 128 38 134 ... 85 166 40 142 57 54 52
169 164 41 146 40 64 44 28 ... 83 90 86 188 23 38 35
If we need every other column instead of columns based on numeric value, we can set the step of slicing to create a list of columns:
every_other_column = df.columns[::2]
df[every_other_column] *= 2
Or simply modify the DataFrame without creating a list of columns:
df.loc[:, ::2] *= 2
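The two approaches only coincide when the column labels start at 0 and increase by 1. A small sketch of the difference, using a made-up one-row frame whose labels start at 1 (the variable names are only for illustration):

```python
import pandas as pd

# Hypothetical frame: column labels 1..4, so label parity and position differ.
df = pd.DataFrame([[10, 20, 30, 40]], columns=[1, 2, 3, 4])

# Label-based: keep columns whose label is an even number.
even_label_cols = df.columns[df.columns % 2 == 0]

# Position-based: keep every other column, starting from the first one.
every_other_cols = df.columns[::2]

print(list(even_label_cols))   # labels 2 and 4
print(list(every_other_cols))  # labels 1 and 3
```

So use the modulus version when the label value matters, and the slice when only the position matters.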
Sample DataFrame:
import numpy as np
import pandas as pd
np.random.seed(5)
df = pd.DataFrame(np.random.randint(0, 100, (170, 286)))
print(df)
df:
0 1 2 3 4 5 6 ... 279 280 281 282 283 284 285
0 99 78 61 16 73 8 62 ... 8 51 61 84 52 74 25
1 9 31 39 59 22 80 58 ... 75 62 51 2 96 19 7
2 76 66 56 0 57 31 86 ... 18 93 19 42 29 18 0
3 40 99 76 25 17 31 53 ... 59 95 33 34 31 33 83
4 96 95 24 95 65 14 4 ... 6 37 79 20 46 99 65
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
165 85 58 55 75 98 11 25 ... 54 23 53 73 62 15 48
166 27 8 74 25 87 40 57 ... 6 90 32 47 44 71 16
167 21 48 24 31 63 60 43 ... 11 64 10 81 67 71 13
168 27 37 35 2 64 38 67 ... 85 83 40 71 57 27 52
169 82 41 73 40 32 44 14 ... 83 45 86 94 23 19 35
I have some measurement data that needs to be filtered. I read it into a DataFrame, like this:
df
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
I need to apply two different conditions at the same time, that is, to filter on 'RequestTime' and 'RequestID' as well as on 'ResponseTime' and 'ResponseID' using drop_duplicates(subset=). I have used the following commands to get the filtered result for each of the two conditions separately:
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['ResponseTime','ResponseID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['RequestTime','RequestID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
But how can I combine the two conditions so that both duplicate rows, row 3 and row 7, are dropped?
IIUC,
m = ~(df.duplicated(subset=['RequestTime','RequestID']) | df.duplicated(subset=['ResponseTime', 'ResponseID']))
df[m]
Output:
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
Create a mask (boolean series) to boolean index your dataframe.
Or chain methods:
df.drop_duplicates(subset=['RequestTime', 'RequestID']).drop_duplicates(subset=['ResponseTime', 'ResponseID'])
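On the sample data both versions return the same rows, but they are not equivalent in general: the mask flags duplicates against the original frame, while the chained call looks for the second kind of duplicate only among the rows that survived the first drop. A small sketch with made-up values where the results differ:

```python
import pandas as pd

# Hypothetical data: row 1 repeats row 0's request, row 2 repeats row 1's response.
df = pd.DataFrame({
    "RequestTime":  [1, 1, 2],
    "RequestID":    [1, 1, 2],
    "ResponseTime": [5, 6, 6],
    "ResponseID":   [5, 6, 6],
})
req = ["RequestTime", "RequestID"]
resp = ["ResponseTime", "ResponseID"]

# Mask: both duplicate checks run against the full, original frame.
mask = ~(df.duplicated(subset=req) | df.duplicated(subset=resp))
masked = df[mask]

# Chained: the second check only sees rows left after the first drop.
chained = df.drop_duplicates(subset=req).drop_duplicates(subset=resp)

print(list(masked.index))   # only row 0 survives
print(list(chained.index))  # rows 0 and 2 survive
```

Row 2 is kept by the chained version because the row it duplicated (row 1) was already removed. Which behaviour is wanted depends on the data.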
I have a space-separated string that contains 2304 (48x48) items. I simply need to save it as a 48x48 image file. Downloaded from here
var img = "70 80 82 72 58 58 60 63 54 58 60 48 89 115 121 119 115 110 98 91 84 84 90 99 110 126 143 153 158 171 169 172 169 165 129 110 113 107 95 79 66 62 56 57 61 52 43 41 65 61 58 57 56 69 75 70 65 56 54 105 146 154 151 151 155 155 150 147 147 148 152 158 164 172 177 182 186 189 188 190 188 180 167 116 95 103 97 77 72 62 55 58 54 56 52 44 50 43 54 64 63 71 68 64 52 66 119 156 161 164 163 164 167 168 170 174 175 176 178 179 183 187 190 195 197 198 197 198 195 191 190 145 86 100 90 65 57 60 54 51 41 49 56 47 38 44 63 55 46 52 54 55 83 138 157 158 165 168 172 171 173 176 179 179 180 182 185 187 189 189 192 197 200 199 196 198 200 198 197 177 91 87 96 58 58 59 51 42 37 41 47 45 37 35 36 30 41 47 59 94 141 159 161 161 164 170 171 172 176 178 179 182 183 183 187 189 192 192 194 195 200 200 199 199 200 201 197 193 111 71 108 69 55 61 51 42 43 56 54 44 24 29 31 45 61 72 100 136 150 159 163 162 163 170 172 171 174 177 177 180 187 186 187 189 192 192 194 195 196 197 199 200 201 200 197 201 137 58 98 92 57 62 53 47 41 40 51 43 24 35 52 63 75 104 129 143 149 158 162 164 166 171 173 172 174 178 178 179 187 188 188 191 193 194 195 198 199 199 197 198 197 197 197 201 164 52 78 87 69 58 56 50 54 39 44 42 26 31 49 65 91 119 134 145 147 152 159 163 167 171 170 169 174 178 178 179 187 187 185 187 190 188 187 191 197 201 199 199 200 197 196 197 182 58 62 77 61 60 55 49 59 52 54 44 22 30 47 68 102 123 136 144 148 150 153 157 167 172 173 170 171 177 179 178 186 190 186 189 196 193 191 194 190 190 192 197 201 203 199 194 189 69 48 74 56 60 57 50 59 59 51 41 20 34 47 79 111 132 139 143 145 147 150 151 160 169 172 171 167 171 177 177 174 180 182 181 192 196 189 192 198 195 194 196 198 201 202 195 189 70 39 69 61 61 61 53 59 59 45 40 26 40 61 93 124 135 138 142 144 146 151 152 158 165 168 168 165 161 164 173 172 167 172 167 180 198 198 193 199 195 194 198 200 198 197 195 190 65 35 68 59 59 62 57 60 59 50 44 32 54 90 115 132 137 138 140 144 146 146 156 165 168 174 176 176 175 168 168 169 
171 175 171 172 192 194 184 198 205 201 194 195 193 195 192 186 57 38 72 65 57 62 58 57 60 54 49 47 79 116 130 138 141 141 139 141 143 145 157 164 164 166 173 174 176 179 179 176 181 189 188 173 180 175 160 182 189 198 192 189 190 190 188 172 46 44 64 66 59 62 57 56 62 53 50 66 103 133 137 141 143 141 136 132 131 136 127 118 111 107 108 123 131 143 154 158 166 177 181 175 170 159 148 171 161 176 185 192 194 188 190 162 53 49 58 63 61 61 55 56 61 51 50 81 116 139 142 142 146 144 136 128 119 112 97 85 90 91 88 92 90 80 81 84 106 122 132 144 145 144 147 163 147 163 173 181 190 187 191 167 61 48 53 61 61 58 54 56 61 51 53 89 123 140 144 145 146 147 136 122 107 99 95 92 90 87 83 76 67 52 46 52 63 69 83 96 119 132 148 159 136 137 143 138 143 152 156 156 70 48 50 59 61 57 54 54 61 52 56 93 124 135 140 144 148 150 140 125 114 101 80 54 56 54 41 41 33 40 39 35 49 60 63 74 107 129 147 147 116 111 100 77 76 86 108 111 73 49 50 60 62 60 57 55 63 59 56 89 121 134 139 146 151 152 150 141 127 111 96 77 85 70 32 31 37 91 65 50 48 59 73 83 112 136 155 130 60 46 38 40 43 81 116 91 72 52 48 58 62 62 59 53 61 59 52 85 114 134 140 147 154 159 158 153 145 143 150 126 121 125 68 45 89 137 95 70 78 75 95 109 131 153 171 94 23 16 32 82 82 65 113 77 71 54 48 56 62 62 60 53 60 56 52 75 108 133 141 149 158 166 169 167 163 156 155 146 112 119 134 127 142 140 121 117 129 114 120 129 146 174 191 98 46 33 33 109 147 98 109 67 73 55 50 56 64 64 61 58 61 53 54 64 106 129 140 148 159 169 175 176 174 165 159 156 145 120 115 124 127 131 133 141 147 142 141 147 161 182 202 154 114 96 100 158 158 153 123 61 76 57 48 56 64 64 63 62 61 54 55 44 97 131 137 147 158 168 177 181 183 179 170 168 169 165 155 152 151 152 154 162 165 158 153 158 168 187 206 186 147 135 144 145 152 178 115 57 74 58 48 58 64 63 63 59 63 55 53 66 104 130 132 144 153 162 170 180 185 187 181 178 182 180 177 173 171 171 177 176 172 164 161 167 164 185 207 197 173 152 141 141 161 191 104 54 69 60 48 57 65 62 60 57 64 55 50 94 111 124 
130 135 150 159 163 172 179 184 184 178 178 177 173 171 174 177 178 176 169 165 161 163 161 180 205 201 183 171 177 178 180 194 101 55 65 60 47 55 65 63 59 58 63 57 52 90 105 117 122 130 143 153 157 163 171 174 182 183 182 178 174 175 175 177 175 172 163 161 159 157 162 178 200 201 188 181 172 177 187 198 98 57 63 61 48 52 61 64 63 60 65 57 51 95 104 113 117 127 136 145 152 156 162 162 165 173 177 182 183 183 180 181 177 165 153 154 152 153 160 174 193 200 188 185 180 182 192 196 101 60 60 56 49 50 60 66 64 62 64 59 53 99 104 111 112 118 132 142 147 155 158 160 159 162 171 176 184 186 183 180 169 154 141 135 145 155 164 180 196 205 188 189 188 189 193 192 98 61 64 55 49 49 60 66 63 64 63 60 57 99 105 108 112 113 125 139 143 150 155 158 164 169 174 176 182 183 182 177 163 141 133 147 151 164 170 185 200 210 194 188 192 186 185 180 88 64 67 60 46 50 59 65 64 64 64 59 56 101 103 108 109 109 118 134 143 143 147 155 159 166 171 174 177 179 178 172 153 129 143 161 159 166 171 186 197 207 203 185 191 183 179 164 73 67 67 66 48 50 57 65 65 63 64 61 57 103 108 114 112 110 115 128 138 144 145 152 156 159 164 168 172 172 169 161 139 125 147 156 161 162 164 180 188 188 197 185 187 181 180 137 65 70 68 70 52 47 53 62 65 63 65 61 58 105 109 112 120 113 112 122 134 141 149 150 153 155 159 164 167 167 162 152 134 115 126 119 106 99 109 141 158 150 155 175 184 176 175 106 63 70 68 68 50 46 50 57 63 63 64 61 59 107 110 110 117 117 114 117 128 137 147 148 150 153 156 161 162 163 156 150 148 105 70 45 26 25 47 73 74 79 128 177 180 173 157 77 66 68 67 68 52 49 51 56 62 62 62 62 60 101 107 108 114 115 114 117 125 134 143 148 149 152 154 158 160 158 155 160 158 132 88 73 73 64 52 66 91 138 160 174 173 171 125 64 67 63 64 68 54 50 49 54 60 60 60 62 60 98 105 105 109 111 114 117 125 131 139 145 148 153 153 156 157 156 161 168 165 153 139 122 115 105 89 103 150 182 161 171 173 162 89 64 64 62 64 69 56 48 49 56 58 60 59 62 60 89 99 108 106 109 111 119 120 125 134 140 146 152 153 153 153 156 
159 162 160 150 136 129 133 133 122 133 148 178 168 168 175 132 61 67 66 65 63 69 57 47 50 55 58 59 61 62 60 89 96 105 107 105 107 117 120 123 124 133 141 149 153 151 145 151 145 139 140 138 128 126 124 129 125 136 142 164 172 168 168 87 58 67 63 62 61 69 57 39 44 55 56 59 63 62 62 84 91 92 98 102 103 113 119 121 118 128 138 146 151 147 142 140 128 127 128 129 126 135 140 135 130 143 146 149 166 174 131 62 65 62 59 67 63 68 83 89 65 42 52 60 60 62 63 77 84 84 91 99 101 107 112 117 118 122 134 145 149 144 134 127 127 129 130 134 125 126 132 152 153 151 150 151 165 171 87 59 65 64 61 58 86 122 138 208 207 154 71 52 56 55 56 69 77 83 85 93 91 102 112 116 118 119 127 140 144 142 131 112 95 85 75 62 58 56 59 87 88 83 127 142 165 149 62 65 62 59 77 113 192 156 84 185 196 197 168 81 70 75 69 58 65 73 82 81 79 95 107 114 116 116 123 136 142 136 132 131 102 71 58 49 41 33 41 36 49 60 99 136 168 111 53 63 71 138 186 203 195 146 87 91 72 79 95 103 82 61 74 55 57 68 75 76 77 84 96 106 110 111 121 130 138 136 142 153 159 152 152 154 145 133 136 147 158 156 155 147 158 74 57 60 123 181 174 126 89 72 67 57 43 55 67 76 86 60 45 51 45 52 68 75 73 77 88 96 100 104 113 115 121 134 146 149 146 149 148 155 168 174 179 178 169 169 174 161 131 44 47 82 150 168 136 104 75 66 80 67 58 48 54 68 88 121 102 51 45 38 53 66 65 70 86 92 96 102 103 109 116 130 136 136 133 136 138 137 135 128 130 143 158 165 164 147 87 62 74 123 160 170 100 99 107 79 71 86 75 57 45 49 65 122 130 43 48 40 39 55 61 59 71 82 87 88 93 105 118 123 128 130 124 111 98 94 88 67 55 84 129 147 148 105 48 82 142 161 164 164 76 72 85 100 88 72 90 84 54 48 54 73 100 73 36 44 31 37 53 51 55 67 74 77 87 97 108 118 125 132 122 106 86 80 82 75 73 83 110 129 126 46 22 130 177 196 193 166 72 52 54 73 100 92 75 99 95 65 68 61 63 91 65 42 37 22 28 39 44 57 68 74 83 92 101 119 131 143 141 134 136 140 139 134 136 139 138 136 85 23 114 202 198 199 180 173 98 36 86 130 150 137 99 77 101 99 72 56 43 77 82 79 70 56 28 20 25 36 50 63 73 83 
98 111 124 139 156 160 159 169 168 165 163 159 149 114 43 26 133 183 192 177 152 137 130 125 139 173 195 186 137 101 88 101 105 70 46 77 72 84 87 87 81 64 37 20 31 40 46 65 88 108 110 125 149 157 153 162 164 158 159 154 140 78 21 11 61 144 168 173 157 138 150 148 132 159 182 183 136 106 116 95 106 109 82";
//save string as byte array
var arrrStr = img.Split(" ").Select(s => Convert.ToString(s)).ToArray();
var byt = arrrStr.Select(byte.Parse).ToArray();
//save the file by this array, the line below throws an exception.
using (System.Drawing.Image image = System.Drawing.Image.FromStream(new MemoryStream(byt)))
{
image.Save("output.jpg", ImageFormat.Jpeg); // Or Png
}
As you might guess, it doesn't work. How can I convert this pixel string to an image file? (The values were originally generated from an image file.)
I need to sort a long list of ID numbers into 'grids' of 8 ID numbers down (8 cells/rows) and 6 ID numbers across (6 columns), sorted from smallest to largest ID number. When one 'grid' is 'full', the numbers that cannot fit in the first grid should go on to form a second one, and so on. The last 4 cells of the last row should be blank. (This is a template for a lab procedure.)
I.e., this is the data I have:
column of ID numbers
and this is how I want it to be (but about 6 of these):
example 'grid'
Here's one method.
Sample data
import pandas as pd
import numpy as np
# Sorted list of string IDs
l = np.arange(0, 631, 1).astype('str')
Code
N = 44
# Ensure we can reshape last group
data = np.concatenate((l, np.repeat('', N-len(l)%N)))
# Split array, make a separate `DataFrame` for each grid.
data = [
pd.DataFrame(np.concatenate((x, np.repeat('', 4))).reshape(8,6))
for x in np.array_split(data, np.arange(N, len(l), N))
]
df = pd.concat(data, ignore_index=True) # If want a single df in the end
Output df:
0 1 2 3 4 5
0 0 1 2 3 4 5
1 6 7 8 9 10 11
2 12 13 14 15 16 17
3 18 19 20 21 22 23
4 24 25 26 27 28 29
5 30 31 32 33 34 35
6 36 37 38 39 40 41
7 42 43
8 44 45 46 47 48 49
9 50 51 52 53 54 55
10 56 57 58 59 60 61
11 62 63 64 65 66 67
12 68 69 70 71 72 73
13 74 75 76 77 78 79
14 80 81 82 83 84 85
15 86 87
16 88 89 90 91 92 93
...
110 608 609 610 611 612 613
111 614 615
112 616 617 618 619 620 621
113 622 623 624 625 626 627
114 628 629 630
115
116
117
118
119
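The same pad-and-reshape idea can also be wrapped in a small helper that returns one DataFrame per grid. This is only a sketch: the name make_grids and its parameters are made up here, and it assumes the last 4 cells of every grid stay blank, so each grid holds 44 IDs.

```python
import numpy as np
import pandas as pd


def make_grids(ids, rows=8, cols=6, blanks=4):
    """Split sorted IDs into rows x cols grids, leaving the last `blanks` cells empty."""
    per_grid = rows * cols - blanks  # usable cells per grid (44 by default)
    ids = sorted(ids)
    grids = []
    for start in range(0, len(ids), per_grid):
        chunk = [str(i) for i in ids[start:start + per_grid]]
        # Pad with empty strings so the chunk always fills a whole grid.
        chunk += [""] * (rows * cols - len(chunk))
        grids.append(pd.DataFrame(np.array(chunk, dtype=object).reshape(rows, cols)))
    return grids


grids = make_grids(range(88))  # 88 IDs -> exactly two grids of 44
print(len(grids))
print(grids[0])
```

Stepping through the list in chunks of 44 avoids creating an extra, completely empty grid when the number of IDs is an exact multiple of 44.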
func = lambda lst,n: np.pad(lst, (0,n*(1+len(lst)//n) - len(lst)), 'constant')
rows, cols = 8, 6
arr = np.arange(1, 283, 1) ##np.array(df.A)
new_df = pd.DataFrame(func(arr, rows*cols).reshape(-1,cols))
new_df
0 1 2 3 4 5
0 1 2 3 4 5 6
1 7 8 9 10 11 12
2 13 14 15 16 17 18
3 19 20 21 22 23 24
4 25 26 27 28 29 30
5 31 32 33 34 35 36
6 37 38 39 40 41 42
7 43 44 45 46 47 48
8 49 50 51 52 53 54
9 55 56 57 58 59 60
10 61 62 63 64 65 66
11 67 68 69 70 71 72
12 73 74 75 76 77 78
13 79 80 81 82 83 84
14 85 86 87 88 89 90
15 91 92 93 94 95 96
16 97 98 99 100 101 102
17 103 104 105 106 107 108
18 109 110 111 112 113 114
19 115 116 117 118 119 120
20 121 122 123 124 125 126
21 127 128 129 130 131 132
22 133 134 135 136 137 138
23 139 140 141 142 143 144
24 145 146 147 148 149 150
25 151 152 153 154 155 156
26 157 158 159 160 161 162
27 163 164 165 166 167 168
28 169 170 171 172 173 174
29 175 176 177 178 179 180
30 181 182 183 184 185 186
31 187 188 189 190 191 192
32 193 194 195 196 197 198
33 199 200 201 202 203 204
34 205 206 207 208 209 210
35 211 212 213 214 215 216
36 217 218 219 220 221 222
37 223 224 225 226 227 228
38 229 230 231 232 233 234
39 235 236 237 238 239 240
40 241 242 243 244 245 246
41 247 248 249 250 251 252
42 253 254 255 256 257 258
43 259 260 261 262 263 264
44 265 266 267 268 269 270
45 271 272 273 274 275 276
46 277 278 279 280 281 282
47 0 0 0 0 0 0
I think it's simplest to save this DataFrame to an Excel worksheet and then remove the trailing padded zeros manually. Hope this helps.
I have a file whose lines end with a number, a letter, or one or more semicolons:
file1.txt
1 101 111 BCX A#WWW 123
1 101 111 BCX A#WWW 123;;;;;;
1 298 306 CCC A#QQQ 234-ck
1 298 306 CCC A#QQQ 234-ck;
1 298 306 CCC A#QQQ 234-ck ;;
1 299 308 CCD A#QQQ 234-cJ
1 299 309 DDD A#ZZZ 345;678
1 299 309 DDD A#ZZZ 345;678
The output should be :
1 101 111 BCX A#WWW 123
1 101 111 BCX A#WWW 123
1 298 306 CCC A#QQQ 234-ck
1 298 306 CCC A#QQQ 234-ck
1 298 306 CCC A#QQQ 234-ck
1 299 308 CCD A#QQQ 234-cJ
1 299 309 DDD A#ZZZ 345;678
1 299 309 DDD A#ZZZ 345;678
What I do only removes one semicolon from the end:
cat file1.txt | sed 's/;$//g'
1 101 111 BCX A#WWW 123
1 101 111 BCX A#WWW 123;;;;;
1 298 306 CCC A#QQQ 234-ck
1 298 306 CCC A#QQQ 234-ck
1 298 306 CCC A#QQQ 234-ck ;
1 299 308 CCD A#QQQ 234-cJ
1 299 309 DDD A#ZZZ 345;678
1 299 309 DDD A#ZZZ 345;678
How can I remove all of the ";"s from the end of each line, back to the last letter/number?
Modify your sed command; there is no need for cat:
sed 's/;*$//' infile
Or using awk:
awk '{sub(/;*$/,"")}1' infile