awk + multiple spaces as the delimiter [duplicate]

I have a file with multiple spaces as the delimiter.
> cat file1.csv
col1  col2  col3  col4
col1  col2  col3
col1  col2
col1
col1  col2  col3  col4  col5
This is close to the output I want, but the command actually produces empty lines between the fields (which makes me wonder if my -F' {2,}' is working at all):
> awk -F' {2,}' 'NR==1{print $0}' file1.csv | tr " " "\n"
col1

col2

col3

col4
>
But I was hoping to do it all in awk using OFS; I'm not sure I'm doing it right:
> awk -F' {2,}' 'BEGIN{OFS=","} NR==1{print $0}' file1.csv
col1  col2  col3  col4
For my own reference, here is the comma-delimited version of the same task, which works the way I expect; I want to do the same thing with the space-delimited file:
> cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
> awk -F, 'NR==1{print $0}' file.csv
col1,col2,col3,col4
> awk -F, 'NR==1{print $0}' file.csv | tr "," "\n"
col1
col2
col3
col4
> awk -F, 'NR==1{print $0}' file.csv | tr "," "\n" | cat -n
1 col1
2 col2
3 col3
4 col4
I can just use sed to remove the blank lines, but I want to do it with awk and OFS as above:
> awk -F' {2,}' 'NR==1{print $0}' file1.csv | tr " " "\n" | sed '/^[[:space:]]*$/d'
col1
col2
col3
col4
>

Assumptions:
we're only interested in the 1st line of the file
each field/column of the (1st) line is to be printed on a separate line
Sample inputs:
$ head file?.csv
==> file1.csv <==
col1  col2  col3  col4
col1  col2  col3
col1  col2
col1
col1  col2  col3  col4  col5
==> file2.csv <==
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
For the 1st file (file1.csv) we can use the default input field separator (i.e., whitespace):
$ awk '{for (i=1;i<=NF;i++) print $i; exit}' file1.csv
col1
col2
col3
col4
For the 2nd file (file2.csv) we use a comma (,) as the input field separator:
$ awk -F',' '{for (i=1;i<=NF;i++) print $i; exit}' file2.csv
col1
col2
col3
col4
NOTE: in neither of these cases do we need to worry about setting the output field separator (OFS), since each field is printed on its own line.
If we absolutely, positively need to set, and use, a non-default OFS:
$ awk 'BEGIN{OFS="\n"} {$1=$1; print; exit}' file1.csv
col1
col2
col3
col4
$ awk -F',' 'BEGIN{OFS="\n"} {$1=$1; print; exit}' file2.csv
col1
col2
col3
col4
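A note on the $1=$1 trick used above: awk only rebuilds $0 with the new OFS once a field has been assigned to, so without that no-op assignment, print would emit the original line untouched. A minimal sketch of the before/after behavior (using a throwaway line on stdin and "-" as the OFS just to make the separator visible):
$ printf 'a  b  c\n' | awk -F' {2,}' 'BEGIN{OFS="-"} {print; $1=$1; print}'
a  b  c
a-b-c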

Given:
$ head file?.csv
==> file1.csv <==
col1  col2  col3  col4
col1  col2  col3
col1  col2
col1
col1  col2  col3  col4  col5
==> file2.csv <==
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
You can use head and sed.
For a space / tab separated file (note that \n in the replacement is a GNU sed extension):
$ head -1 file1.csv | sed -E 's/[[:blank:]]{1,}/\n/g'
col1
col2
col3
col4
Comma separated:
$ head -1 file2.csv | sed -E 's/,/\n/g'
col1
col2
col3
col4
You can also skip head altogether and use sed to 1) find line n, 2) do the replacement and 3) quit. Here for the fifth line:
$ sed -nE '5{s/([[:blank:]]{1,})/\n/g; p; q; }' file1.csv
col1
col2
col3
col4
col5
Just use 1 for the first line.
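For example, for the first line of file1.csv:
$ sed -nE '1{s/([[:blank:]]{1,})/\n/g; p; q; }' file1.csv
col1
col2
col3
col4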
Or similarly with awk:
$ awk -v ln=5 -F"[[:blank:]]{1,}" 'FNR==ln{for(i=1;i<=NF;i++) print $i; exit}' file1.csv
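which for line 5 should print each of its five fields on its own line:
col1
col2
col3
col4
col5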

If you just want to de-dupe it all down to col1 through col5, try the following. Given the combined contents of file1.csv and file2.csv (numbered here for reference):
1 col1  col2  col3  col4
2 col1  col2  col3
3 col1  col2
4 col1
5 col1  col2  col3  col4  col5
6 col1,col2,col3,col4
7 col1,col2,col3
8 col1,col2
9 col1
10 col1,col2,col3,col4,col5
Setting RS to a regex matching any run of commas or whitespace makes every token its own record (a regex RS is an extension; POSIX awk only defines single-character RS), and !__[$0]++ prints a record only the first time it is seen:
cat file1.csv file2.csv | {m,g,n}awk '!__[$0]++' RS='[,[:space:]]+'
col1
col2
col3
col4
col5

Related


Scatter multiple rows having duplicate columns to single unique row in postgresql

How do I scatter multiple duplicate rows into one row in SQL/PostgreSQL?
For example, let's say I am getting 3 rows of
col1 col2 col3
--------------
11   test rat
11   test cat
11   test test
I want something like this:
col1 col2 col3 col4
-------------------
11   test rat  cat
It's the same thing as groupBy in lodash. But how do I achieve the same in a PostgreSQL query?
You're looking for crosstab (it lives in the tablefunc extension, so you may need to run CREATE EXTENSION tablefunc; first):
postgres=# create table ab (col1 text, col2 text, col3 text);
CREATE TABLE
postgres=# insert into ab values ('t1','test','cat'),('t1','test','rat'),('t1','test','test');
INSERT 0 3
postgres=# select * from crosstab('select col1,col2,col3 from ab') as (col1 text, col2 text, col3 text, col4 text);
 col1 | col2 | col3 | col4
------+------+------+------
 t1   | cat  | rat  | test
(1 row)
Disclosure: I work for EnterpriseDB (EDB)

How can I import one column of a table into another table in SQL?

I have a table
| Col1 | Col2 | Col3 |
and I want a new table with these columns
| Col4 | Col5 | Col2 | Col6 |
with all the values of Col2 carried over into the new table.
Thanks!
If it is just about importing all the data of Col2 into another table, then suppose TB1 has columns (Col1, Col2, Col3) and TB2 has columns (Col4, Col5, Col2, Col6). Your query would look like this:
INSERT INTO TB2 (Col2) SELECT Col2 FROM TB1;
Otherwise, please give more detail, e.g. whether you want to update only the data that is not yet in TB2.

How to create column values by looking at other columns in a table? SQL

I have three columns in a table.
Requirement: the values of col2 and col3 should make up col1.
Below is the table I have right now, which needs to be changed:
col1          col2 col3
              AB   football
              AB   football
              ER   driving
              ER   driving
              TR   city
              TR   city
Below is the table it needs to be changed to:
col1          col2 col3
AB_football_1 AB   football
AB_football_2 AB   football
ER_driving_1  ER   driving
ER_driving_2  ER   driving
TR_city_1     TR   city
TR_city_2     TR   city
As you can see, col1 should take col2, an underscore, then col3, another underscore, and then a number incremented per distinct (col2, col3) pair.
Can this be approached within a CREATE, SELECT or INSERT statement, or a trigger? Any tips would be appreciated.
Try:
SELECT col2
       || '_'
       || col3
       || '_'
       || rank AS col1,
       col2,
       col3
  FROM (SELECT col2,
               col3,
               ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY col2) rank
          FROM my_table)
Output
+---------------+------+----------+
| COL1          | COL2 | COL3     |
+---------------+------+----------+
| AB_football_1 | AB   | football |
| AB_football_2 | AB   | football |
| ER_driving_1  | ER   | driving  |
| ER_driving_2  | ER   | driving  |
| TR_city_1     | TR   | city     |
| TR_city_2     | TR   | city     |
+---------------+------+----------+
/* table is */
col1     col2 col3
         test 123
/* Try this query */
UPDATE `demo`
SET `col1` = concat(col2, '_', col3);
/* Output will be */
col1     col2 col3
test_123 test 123
This is easy to do (at SELECT time) using the row_number() window function, something like this:
select
  col2 || '_' || col3 || '_' || row_number() over (partition by col2, col3 order by col2) as col1,
  col2,
  col3
from t

How to get unique rows in one file comparing with multiple files in awk

I have tab-delimited files as shown below and would like to get the output described below. I got part of the way with the commands shown, but could not reach the final result. The description is slightly lengthy to make the question clear.
file1.txt
col1 col2 col3 col4 col5
ID1 str1 234 cond1 0
ID1 str2 567 cond1 0
ID1 str3 789 cond1 1
ID1 str4 123 cond1 1
file2.txt
col1 col2 col3 col4 col5
ID2 str1 235 cond1 0
ID2 str2 567 cond2 1
ID2 str3 789 cond1 1
ID2 str4 123 cond2 0
file3.txt
col1 col2 col3 col4 col5
ID3 str1 235 cond1 0
ID3 str2 567 cond2 1
ID3 str3 789 cond1 1
I would like to find the rows in file1.txt that are unique when compared with the rest of the files, file2.txt and file3.txt. Columns col2 and col3 are used as the keys to search. There is an additional condition: delete a row only if col4=="cond1" where its keys col2 and col3 are found in file2.txt or file3.txt. Below are the code and output:
awk -F "\t" 'NR == 1 { OFS="\t"; print $0; next }
NR == FNR { a[$2,$3] = $0; next }
{ if ($4=="cond1") delete a[$2, $3] }
END { for (i in a) print a[i] }' file1.txt file2.txt file3.txt
Output:
col1 col2 col3 col4 col5
ID1 str1 234 cond1 0
ID1 str2 567 cond1 0
ID1 str4 123 cond1 1
Now, I would like to add additional columns with a list of col1 values and a count of col1 values from the files which do not meet the condition $4=="cond1" in file2.txt and file3.txt.
DESIRED OUTPUT
col1 col2 col3 col4 col5 col6 col7
ID1 str1 234 cond1 0 NA NA
ID1 str2 567 cond1 0 ID2,ID3 2
ID1 str4 123 cond1 1 ID2 1
Though str2 and 567 are present in file2.txt and file3.txt, the row from file1.txt is retained since col4=="cond2" in those files. Now the issue is how to get the additional columns col6 and col7. Any idea?
NOTE: This is a test case where file1 is compared with file2 and file3. In the real scenario there will be more files to compare against file1.
awk -vOFS="\t" '!c{c=$0"\tcol6\tcol7";next}NR==FNR{a[$2$3]=$0;next}{if($4=="cond1"){delete a[$2$3]}else{b[$2$3]=b[$2$3]?b[$2$3]","$1:$1}}END{print c;for(i in a){s=split(b[i],t,",");if(!s){b[i]=s="NA"}print a[i],b[i],s}}' file1.txt file2.txt file3.txt
col1 col2 col3 col4 col5 col6 col7
ID1 str2 567 cond1 0 ID2,ID3 2
ID1 str1 234 cond1 0 NA NA
ID1 str4 123 cond1 1 ID2 1
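The same logic spelled out over multiple lines with comments, purely for readability (a sketch equivalent to the one-liner above, with the lookup key built by concatenating col2 and col3):
awk -v OFS="\t" '
!c      { c = $0 "\tcol6\tcol7"; next }               # very first line: save header, extend with col6/col7
NR==FNR { a[$2$3] = $0; next }                        # file1.txt: store each row keyed on col2+col3
$4=="cond1" { delete a[$2$3]; next }                  # later files: a cond1 match removes the file1 row
        { b[$2$3] = b[$2$3] ? b[$2$3] "," $1 : $1 }   # otherwise collect the col1 values per key
END {
    print c
    for (i in a) {
        s = split(b[i], t, ",")                       # s = how many col1 values were collected
        if (!s) { b[i] = s = "NA" }                   # no non-cond1 matches at all -> NA NA
        print a[i], b[i], s
    }
}' file1.txt file2.txt file3.txt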