AWK multiline match

I have read several topics about my problem but nothing has resolved it!
I have this text in my my_text file:
Address dlkjfhadvkahvealkjvhfelkafver
Phone 4752935729527297
Discription fkdshkhglkhrtlghltkg
Misc 5897696h8ghgvjhgh578hg
Address klsfghtrgjgjktsrljgsjgm
Phone 5789058309809583
Discription dskjfvhfhgjvnwmrew
Misc h09v3n3vt7957jt795783hj
.....
.....
.....
And I want to filter this file's data by 3 (or more) line keywords such as Address, Phone, Misc.
I tried awk '/Address/,/Phone/,/Misc/' my_text but got an error!

You need to use the | (OR) operator.
Matched lines:
awk '/Address|Phone|Misc/{ print $0 }' your_text
Result:
Address dlkjfhadvkahvealkjvhfelkafver
Phone 4752935729527297
Misc 5897696h8ghgvjhgh578hg
Address klsfghtrgjgjktsrljgsjgm
Phone 5789058309809583
Misc h09v3n3vt7957jt795783hj
If you print $1 you get Address, Phone, or whichever keyword matched; printing $2 gives only the values.
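For example, to print only the values:
awk '/Address|Phone|Misc/{ print $2 }' my_text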

Is there a query using SQLite to count valid unique email addresses from 3 separate email address fields To, CC, BCC?

I have the following query I'm working with, which returns 1 per row, but I know there is more than one email address stored within the field, separated by semicolons:
SELECT UID, EmailToField,
EmailToField REGEXP '[a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+' AS valid_emailTo
FROM table
For example, my DB has:
UID | EmailTo                                    | EmailCC              | EmailBCC
001 | emailTo_1@domain.com; emailTo_2@domain.com | emailCC_1@domain.com | EmailBcc1@domain.com
Expecting results to show:
UID | validEmailToCcBcc_count
001 | 4
I used AWK instead of SQL to obtain the results. gsub() returns the number of substitutions it made, so this prints each line number followed by the count of valid email addresses on that line. The following worked!
awk '{print NR " " gsub(/[a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+/, "")}' test.csv > results.csv
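If you want the UID from the file rather than the line number, a minimal variant of the same idea (assuming the UID is the first comma-separated field of test.csv):
awk -F',' '{
  uid = $1     # save the UID before gsub() rewrites $0
  n = gsub(/[a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+/, "")
  print uid " " n
}' test.csv > results.csv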

Struggling with an awk script, need help to get this done, just need your suggestion or logic

I have a SQL file and need to filter the data:
-- Edit this file by adding your SQL below each question.
-------------------------------------------------------------------------------
-------------------------------------------------------------
-- The following queries are based on the 1994 census data.
-------------------------------------------------------------
.read 1994-census-summary-1.sql
-- 4. what is the average age of people from China?
select avg(age)
from census
where native_country ='China';
-- 5. what is the average age of people from Taiwan?
select avg(age)
from census
where native_country ='Taiwan';
-- 6. which native countries have "land" in their name?
select distinct(native_country)
from census
where native_country like '%land%';
--------------------------------------------------------------------------------------
-- The following queries are based on the courses-ddl.sql and courses-small.sql data
--------------------------------------------------------------------------------------
drop table census;
.read courses-ddl.sql
.read courses-small-1.sql
-- 11. what are the names of all students who have taken some course? Don't show duplicates.
select distinct(name)
from student
where tot_cred > 0;
-- 12. what are the names of departments that offer 4-credit courses? Don't list duplicates.
select distinct(dept_name)
from course
where credits=4;
-- 13. What are the names and IDs of all students who have received an A in a computer science class?
select distinct(name), id
from student natural join takes natural join course
where dept_name="Comp. Sci." and grade="A";
If I run
./script.awk -v ID=6 file.sql
Note that the problem id is passed to the awk script as variable ID on the command line, like this:
-v ID=6
how can I get the result like this?
Result:
select distinct(native_country) from census where native_country like '%land%';
With your shown samples, and with GNU awk, please try the following code using its match() function. Here id is an awk variable holding the value that should be checked against the lines of your Input_file. I have also used exit to print only the very first match and leave the program, to save some time/cycles; in case you have more than one match, simply remove it from the following code.
awk -v RS= -v id="6" '
match($0,/(\n|^)-- ([0-9]+)\.[^\n]*\n(select[^;]*;)/,arr) && arr[2]==id{
gsub(/\n/,"",arr[3])
print arr[3]
exit
}
' Input_file
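With the sample file and id=6, this prints:
select distinct(native_country) from census where native_country like '%land%';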
One option with awk could be matching the start of the line with -- 6. where 6 is the ID, then moving to the next line and setting a variable seen to mark that the part you want to print has started. Print all lines while seen is set, and reset seen to 0 when encountering an "empty" line. Concatenate the lines that you want in the output into a single line, and at the end remove the trailing space.
gawk -v ID=6 '
match($0, "^-- "ID"\\.") {
seen=1
next
}
/^[[:space:]]*$/ {
seen=0
}
seen {
a = a $0 " "
}
END {
sub(/ $/, "", a)
print a
}
' file.sql
Or as a single line
gawk -v ID=6 'match($0,"^-- "ID"\\."){seen=1;next};/^[[:space:]]*$/{seen=0};seen{a=a$0" "};END{sub(/ $/,"",a);print a}' file.sql
Output
select distinct(native_country) from census where native_country like '%land%';
Another option with GNU awk is setting the record separator to an "empty" line and using a regex with a capture group to match all lines after the initial -- ID match that do not start with a space:
gawk -v ID=6 '
match($0, "\\n-- "ID"\\.[^\\n]*\\n(([^[:space:]][^\\n]*(\\n|$))*)", m) {
gsub(/\n/, " ", m[1])
print m[1]
}
' RS='^[[:space:]]*$' file

Xargs, sqlplus and quote nightmare?

I have one big file containing data, for example :
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I simplified here.
I want to process each line and run some sqlplus statements on it.
Let's say I have one table with two columns, containing this:
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let's say I want to update the CONTENT value of those two rows to get this:
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
I have a lot of data and complex requests to execute in reality, so I have to use sqlplus, not tools like sqlloader.
So I process the input file with 5 threads, one line at a time, and define "\n" as the separator to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$sim_id" "'"$content"'"
And in the updateRequest.sql file (just containing a test):
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
And as a result, I get:
01740: missing double quote in identifier
If I set the "verify" parameter to on in the SQL script, I can see:
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems like one of the two single quotes (used to escape the second one) is missing...
I tried everything, but each time I get an error with a quote or double quote, either on the bash side or the SQL side... it's endless :/
I need the double quote for the "descr" part, and I need to handle the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat big |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
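Here {1} expands to the first ;-separated field (the ID), and {=2 s/'/''/g=} runs a perl substitution on the second field, doubling every single quote before it is embedded in the SQL string.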
But only if you do not have ; in the values. So given this input it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"
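If you want to keep the xargs approach instead, one way around the layered quoting is to feed sqlplus a heredoc rather than positional &-parameters. A minimal sketch of my_script.sh, assuming the generated file carries the raw value (single quotes not yet doubled); if your generator already doubles them, drop the escaping line:
#!/bin/bash
line="$1"
id=${line%%;*}                    # everything before the first ';'
content=${line#*;}                # everything after the first ';'
content_esc=${content//\'/\'\'}   # double each single quote for SQL
sqlplus -s "$DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA" <<EOF
set heading off
set feed off
set pages 0
update T_TABLE SET CONTENT = '$content_esc' WHERE ID = '$id';
commit;
EOF
Note that the heredoc still expands $ inside the value, which is fine for the sample data shown.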

awk replace serialized number lines and move up other lines

I have a file that has the following format
1 - descrio #944
name
address
2 - desanother #916
name
address
3 - somedes #957
name
address
and I want to get the output as:
Usercode #944, name, address
Usercode #916, name, address
Usercode #957, name, address
With awk, using the record number modulo 3 to rewrite each group's first line and to choose between ", " and "\n" as the output record separator:
awk 'NR%3 == 1{sub(/^.*#/, "Usercode #")};{ORS=NR%3?", ":"\n"};1' file
Usercode #944, name, address
Usercode #916, name, address
Usercode #957, name, address
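The same program spelled out with comments:
awk '
NR % 3 == 1 {                  # first line of each 3-line group
  sub(/^.*#/, "Usercode #")    # replace everything up to "#" with "Usercode #"
}
{ ORS = NR % 3 ? ", " : "\n" } # join lines within a group with ", ", end the group with a newline
1                              # print every (possibly modified) line
' file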
For a variable number of rows per record (GNU awk, since it relies on RT):
awk -v RS='(^|\n)[[:digit:]]+[[:blank:]]*-[[:blank:]]*' '{sub(/\n$/, "");
gsub(/\n/, ", "); printf "%s", $0""RT}END{print ""}' file
If you do not have # in any of your descriptions, try:
sed -e 's/.*#/Usercode #/;N;N;s/\n/, /g' input
You may also try this command, where paste joins every three lines with ~ before sed rewrites the header and the separators:
$ paste -d'~' - - - < file | sed 's/^[^#]*/Usercode /g;s/~/, /g'
Usercode #944, name, address
Usercode #916, name, address
Usercode #957, name, address

Separate text and pass it to a SQL

I'm using the latest Debian version.
I have this file:
2301,XT_ARTICLES
2101,XT_HOUSE_PHOTOS
301,XT_PDF
101611,XT_FIJOS
I want to split this text so I can put the ID and the name into a SQL statement. The statement must be repeated for each line of the file, but I don't know how to do it.
Can anybody help me, please?
Does this fit your needs? (\047 is the octal escape for a single quote, which avoids quoting problems inside the single-quoted awk program.)
awk -F',' '{print "INSERT INTO foobar VALUES("$1", \047"$2"\047);"}' file.txt
INSERT INTO foobar VALUES(2301, 'XT_ARTICLES');
INSERT INTO foobar VALUES(2101, 'XT_HOUSE_PHOTOS');
INSERT INTO foobar VALUES(301, 'XT_PDF');
INSERT INTO foobar VALUES(101611, 'XT_FIJOS');
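Equivalently, with printf the spacing is explicit (same \047 trick for the quotes):
awk -F',' '{printf "INSERT INTO foobar VALUES(%s, \047%s\047);\n", $1, $2}' file.txt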
If it's OK, just pipe that into MySQL:
awk -F',' '
BEGIN{
print "USE qux;"
}
{
print "INSERT INTO foobar VALUES("$1,",\047"$2"\047);"
}' file.txt | mysql