Creating scripts for obtaining data from a text file - awk

I have a text file named stat.txt which contains lines, each in the format
<User Name>-<IP>-<File Name>-<Size>. Each line contains a user name, an IP address, a file name and a download file size. I need to create a script userstat.awk which obtains the following data when the corresponding command is run:
userstat.awk u - will list all files
userstat.awk total - will list the total size of all files
So far, I have tried to list all the files for a user using default commands, but I can't do it with these commands.

Given stat.txt:
user-1.1.1.1-file.jpg-20
root-1.1.1.1-file.jpg-20
user-1.1.1.1-img.jpg-20
root-1.1.1.1-thing.jpg-20
You could use the command (improved by @ClasesWikner):
awk -F- '{print $3; s+=$4}END {print "total: " s}' stat.txt
To output:
file.jpg
file.jpg
img.jpg
thing.jpg
total: 80
As mentioned by @Scheff, this will not work when usernames or file names contain a -.
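To handle both modes from a single userstat.awk, here is a minimal sketch (assuming u is meant as a user-name argument, total is the literal word, and no field ever contains a -):
#!/usr/bin/awk -f
# Sketch: ./userstat.awk <user> stat.txt   or   ./userstat.awk total stat.txt
BEGIN { mode = ARGV[1]; ARGV[1] = ""; FS = "-" }  # consume the mode argument so awk does not treat it as a file name
mode == "total"               { s += $4 }         # accumulate download sizes
mode != "total" && $1 == mode { print $3 }        # list the files downloaded by the given user
END { if (mode == "total") print "total: " s }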

awk: if pattern is matched append some data

I have a data set created by a tool, in a file named test.deg. The file contents are as follows:
1 I0.XPDIN1 1.581e-01 1.507e-01 3.662e-04 3.891e-02
2 I0.XPXA1 1.577e-01 1.502e-01 3.653e-04 3.859e-02
3 I0.XPXA2 1.538e-01 1.444e-01 3.552e-04 3.471e-02
I have a second file, test.spf, containing the following information:
XPDIN1 XPDIN1#d XPDIN1#g XPDIN1#s VPP
XPXA1 XPXA1#d XPXA1#g XPXA1#s VPP
XPXA2 XPXA2#d XPXA2#g XPXA2#s VPP
I am trying to write an awk script that matches the instance name from test.deg to the instance name in test.spf. When the script finds a match, I would like the 5th column's contents appended to the end of the matched instance's line. Example output for I0.XPDIN1 in test.deg would be: XPDIN1 XPDIN1#d XPDIN1#g XPDIN1#s VPP 3.662e-04
The script needs to match the instance name from test.deg, after the I0. prefix, to the first instance name in test.spf, and then add the 5th column's data.
Thanks,
Bad Awk
With GNU Awk:
$ awk 'FNR==NR{a[$2]=$5; next} ("I0."$1 in a){$6=a["I0."$1]}1' test.deg test.spf
XPDIN1 XPDIN1#d XPDIN1#g XPDIN1#s VPP 3.662e-04
XPXA1 XPXA1#d XPXA1#g XPXA1#s VPP 3.653e-04
XPXA2 XPXA2#d XPXA2#g XPXA2#s VPP 3.552e-04
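The same logic written out with comments (no change in behaviour):
awk '
FNR == NR       { a[$2] = $5; next }   # first file (test.deg): remember column 5, keyed by the instance name in column 2
("I0." $1) in a { $6 = a["I0." $1] }   # second file (test.spf): if I0.<name> was seen, append its value as a 6th field
1                                      # print every test.spf line, modified or not
' test.deg test.spf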

How to replace strings in text with id from second text?

I've got two CSV files. The first file contains organism family names and connection weight information, but I need to change the file's format to load it into programs like Gephi. I have created a second file in which each family has an ID value. I haven't found a good example on this site of how to change the family names in the first file to the ids from the second file. Example of my files:
$ cat edge_file.csv
Source,Target,Weight,Type,From,To
Argasidae,Alcaligenaceae,0.040968439,undirected,A_Argasidae,B_Alcaligenaceae
Argasidae,Burkholderiaceae,0.796351574,undirected,A_Argasidae,B_Burkholderiaceae
Argasidae,Methylophilaceae,0.276912259,undirected,A_Argasidae,B_Methylophilaceae
Argasidae,Oxalobacteraceae,0.460508445,undirected,A_Argasidae,B_Oxalobacteraceae
Argasidae,Rhodocyclaceae,0.764558003,undirected,A_Argasidae,B_Rhodocyclaceae
Argasidae,Sphingomonadaceae,0.70198002,undirected,A_Argasidae,B_Sphingomonadaceae
Argasidae,Zoogloeaceae,0.034648156,undirected,A_Argasidae,B_Zoogloeaceae
Argasidae,Agaricaceae,0.190482976,undirected,A_Argasidae,F_Agaricaceae
Argasidae,Bulleribasidiaceae,0.841600859,undirected,A_Argasidae,F_Bulleribasidiaceae
Argasidae,Camptobasidiaceae,0.841600859,undirected,A_Argasidae,F_Camptobasidiaceae
Argasidae,Chrysozymaceae,0.190482976,undirected,A_Argasidae,F_Chrysozymaceae
Argasidae,Cryptococcaceae,0.055650172,undirected,A_Argasidae,F_Cryptococcaceae
$ cat id_file.csv
Id,Family
1,Argasidae
2,Buthidae
3,Alcaligenaceae
4,Burkholderiaceae
5,Methylophilaceae
6,Oxalobacteraceae
7,Rhodocyclaceae
8,Oppiidae
9,Sphingomonadaceae
10,Zoogloeaceae
11,Agaricaceae
12,Bulleribasidiaceae
13,Camptobasidiaceae
14,Chrysozymaceae
15,Cryptococcaceae
I basically want the edge_file.csv output to turn into the output below, where Source and Target have changed from family names to ids instead.
Source,Target,Weight,Type,From,To
1,3,0.040968439,undirected,A_Argasidae,B_Alcaligenaceae
1,4,0.796351574,undirected,A_Argasidae,B_Burkholderiaceae
1,5,0.276912259,undirected,A_Argasidae,B_Methylophilaceae
1,6,0.460508445,undirected,A_Argasidae,B_Oxalobacteraceae
1,7,0.764558003,undirected,A_Argasidae,B_Rhodocyclaceae
1,9,0.70198002,undirected,A_Argasidae,B_Sphingomonadaceae
1,10,0.034648156,undirected,A_Argasidae,B_Zoogloeaceae
1,11,0.190482976,undirected,A_Argasidae,F_Agaricaceae
1,12,0.841600859,undirected,A_Argasidae,F_Bulleribasidiaceae
1,13,0.841600859,undirected,A_Argasidae,F_Camptobasidiaceae
1,14,0.190482976,undirected,A_Argasidae,F_Chrysozymaceae
1,15,0.055650172,undirected,A_Argasidae,F_Cryptococcaceae
I haven't been able to figure it out with awk since I'm new to it, but I tried some variations from other examples here such as (just testing it out for the "Source" column):
awk 'NR==FNR{a[$1]=$1;next}{$1=a[$1];}1' edge_file.csv id_file.csv
Everything just prints out blank. My understanding is that I should create an array mapping the Source and Target columns in edge_file.csv, and then replace them with the first column from id_file.csv, which is the Id column. I can't get the syntax to work even for one column.
You're close. This one-liner should help (it reads id_file.csv first to build the Family-to-Id lookup, and passes the header line of edge_file.csv through unchanged):
awk -F, -v OFS=',' 'NR==FNR{a[$2]=$1;next} FNR==1{print;next} {$1=a[$1];$2=a[$2]}1' id_file.csv edge_file.csv
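Spelled out with comments (same logic; the original attempt printed blanks because it read the files in the opposite order, did not set the field separator with -F,, and mapped a field to itself rather than Family to Id):
awk -F, -v OFS=',' '
NR == FNR { a[$2] = $1; next }        # id_file.csv: map Family -> Id
FNR == 1  { print; next }             # edge_file.csv: keep the header as-is
          { $1 = a[$1]; $2 = a[$2] }  # replace Source and Target with their ids
1                                     # print the rewritten line
' id_file.csv edge_file.csv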

How can I insert the content of the variable into single quotes inside the INSERT INTO command?

I created a text file named "test.txt" (its content is shown below) and a script named insert.sh.
I run the command with ./insert.sh test.txt.
If the words/strings are in single quotes, it inserts them into the columns; it also inserts numbers without single quotes. The CSV that I will eventually use won't have single quotes, and I don't want to change the data.
How can I insert the content of the variable into single quotes inside the INSERT INTO command?
I am using psql.
Text file, test.txt
'one','ten','hundred'
'two','twenty','twohundred'
Script, insert.sh:
#!/bin/bash
while read cell
do
name=$cell
echo "$cell"
####Insert from txt into table####
sudo -u username -H -- psql -d insert_test -c "
INSERT INTO first (ten, hundred, thousend) VALUES ($cell);
"
done < $1
something like this:
INSERT INTO first (ten, hundred, thousend) VALUES (INSERT" $cell "QUOTES);
UPDATE:
I changed the code. I added the single quotes around $cell as you suggested.
#!/bin/bash
while read cell
do
name=$cell
echo "$cell"
####Insert from txt into table####
sudo -u username -H -- psql -d insert_test -c "
INSERT INTO first (ten, hundred, thousend) VALUES ('$cell');
"
done < $1
and I removed the quotes from the text file, since the CSV file that I want to use later won't have any single quotes.
New text file:
one,ten,hundred
two,twenty,twohundred
and I'm getting this error:
one,two,three
ERROR: INSERT has more target columns than expressions
LINE 2: INSERT INTO first (ten, hundred, thousend) VALUES ('one,two,...
You need to set the $IFS (Internal Field Separator) variable, which determines how Bash splits a line into fields. Since your file is CSV-like, the separator should be the , character, i.e. IFS=,. Note that if you change IFS globally and need to do other things later in your script, you should restore it to its original value, so store it in a temporary variable first, something like OLDIFS=$IFS.
read reads the entire line and splits it into fields according to $IFS, so you need to supply as many variables as there are fields on the line. For example, if a line has 3 words, give read 3 variables: for the line foo,baz,bar use read -r word1 word2 word3. If you don't give enough variables, read stores the rest of the line in the last one, which is exactly your problem.
So, a solution to your problem would be:
#!/bin/bash
OLDIFS=$IFS # Only needed if you change IFS globally later in the script.
while IFS=, read -r word1 word2 word3
do
sudo -u username -H -- psql -d insert_test -c \
"INSERT INTO first (ten, hundred, thousend) VALUES ('${word1}', '${word2}', '${word3}');"
done < "$1"
IFS=$OLDIFS # Restore the original value (see line 2).
# ...
NOTE: This is insecure because it is vulnerable to SQL injection. If you use this, only use it against a local database that doesn't hold any sensitive data.
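A somewhat safer variant (a sketch, assuming the same table and credentials as above) is to pass the values as psql variables and let psql do the quoting with :'var', feeding the statement on stdin so variable interpolation applies:
# Hypothetical sketch: psql quotes the values itself via :'w1', :'w2', :'w3'.
sudo -u username -H -- psql -d insert_test \
-v w1="$word1" -v w2="$word2" -v w3="$word3" <<'SQL'
INSERT INTO first (ten, hundred, thousend) VALUES (:'w1', :'w2', :'w3');
SQL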

Removing asterisk (*) from the end of a fasta sequence in a multi fasta file

I have a multifasta file containing predicted proteins from 2 ab initio tools. Every sequence has an asterisk (*) at the end, and I want to remove it from the file. My sequences look like this:
>snapgene1
SFLPSAEAIEKVLSHMSRRIIDDMKAELQQPEMRWFWP*
>snapgene2
SFLPSAEAIEKVLSHIIIIAAAAKKKPPFFDDMKAELQQPEMRWFWP*
I want the sequences like this:
>snapgene1
SFLPSAEAIEKVLSHMSRRIIDDMKAELQQPEMRWFWP
>snapgene2
SFLPSAEAIEKVLSHIIIIAAAAKKKPPFFDDMKAELQQPEMRWFWP
Can anyone help me with this? Thank you.
If the text is stored in a file "temp.txt", you can use the command:
sed -i 's/\*$//' temp.txt
In awk, if your fasta records are in a file named file:
$ awk '{sub(/\*$/,"")}1' file
>snapgene1
SFLPSAEAIEKVLSHMSRRIIDDMKAELQQPEMRWFWP
>snapgene2
SFLPSAEAIEKVLSHIIIIAAAAKKKPPFFDDMKAELQQPEMRWFWP
It replaces trailing * with nothing.

Executing the SQL from shell scripting

I have a table called query_master which has 4 columns; the 4th column holds an SQL query as its value. In total there are 5 entries in the table.
Table Structure:
S.No --> Key --> Title --> Query
1 --> 100 --> EG --> select * from dual
My objective is to fetch the SQL queries from query_master using a shell script and execute them. The output of each SQL query should be written to a separate log file, and the log file name should match the Title of that query.
Can you please help me achieve this scenario, using stored procedures or stored functions if that would be more suitable?
I need to achieve this using shell scripting.
Try this, assuming you're using mysql:
awk -F'\t' 'NR!=1 {system("mysql -u user -p -e \"" $4 "\" database")}' file
Where file is the file containing the table, user is the user, and database is the database. Alternatively, set these as variables instead of hard-coding them, like this:
awk -F'\t' -v db="database" -v user="user" 'NR!=1 {system("mysql -u " user " -p -e \"" $4 "\" " db)}' file
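To also write each query's output to a log file named after its Title (column 3), a sketch along the same lines (assuming tab-separated input and titles with no shell metacharacters):
awk -F'\t' -v db="database" -v user="user" '
NR != 1 {
    # run the query and redirect its output to <Title>.log
    system("mysql -u " user " -p -e \"" $4 "\" " db " > \"" $3 ".log\"")
}' file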
Make a shell script that accepts an SQL statement from the command line (or an input file, or stdin) and does everything for you: exporting ORACLE_HOME, tnsnames, username, password, redirecting output, calling sqlplus, output formatting, deleting column headers and other sqlplus settings.
With your magicsql.sh (after testing), aim for a solution like
magicsql.sh "select key, query from query_master order by key" | while read key query; do
magicsql.sh "${query}" > /tmp/${key}.out
done
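For illustration, a minimal sketch of such a magicsql.sh (the Oracle home path, TNS alias and credentials are placeholders; the sqlplus settings just suppress headers and feedback):
#!/bin/bash
# Hypothetical magicsql.sh: run the SQL statement given as $1 and print only the result rows.
export ORACLE_HOME=/opt/oracle/product/19c/dbhome_1   # placeholder: adjust to your installation
export PATH="$ORACLE_HOME/bin:$PATH"

sqlplus -s scott/tiger@mydb <<EOF
set heading off feedback off pagesize 0 linesize 32767 trimspool on
${1};
exit
EOF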