Is there a query using SQLite to count valid unique email addresses from 3 separate email address fields (To, CC, BCC)? - sql

I have the following query I'm working with, which returns 1 per row, but I know there is more than one email address stored within the field, separated by semicolons:
SELECT UID, EmailToField,
EmailToField REGEXP '[a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+' AS valid_emailTo
FROM table
For example, my DB has:
UID | EmailTo                                    | EmailCC              | EmailBCC
001 | emailTo_1@domain.com; emailTo_2@domain.com | emailCC_1@domain.com | EmailBcc1@domain.com
Expecting the results to show:
UID | validEmailToCcBcc_count
001 | 4

I used AWK instead of SQL to obtain the results; the following worked!
awk '{print NR " " gsub(/[a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+/, "")}' test.csv > results.csv
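Note that gsub counts every match, duplicates included, so the result is not strictly a count of unique addresses. A rough awk sketch of a de-duplicating variant, assuming each line holds only the semicolon-separated addresses (adjust the field splitting to the actual CSV layout):
awk -F';' '{
    split("", seen)                              # reset the per-line set of addresses
    n = 0
    for (i = 1; i <= NF; i++) {
        f = $i
        gsub(/^[ \t]+|[ \t]+$/, "", f)           # trim surrounding whitespace
        if (f ~ /^[a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+$/ && !(f in seen)) {
            seen[f] = 1
            n++
        }
    }
    print NR " " n                               # row number and its unique valid-address count
}' test.csv > results.csv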

Related

Struggling with an awk script and need help to get this done; just need your suggestion or logic

I have a SQL file and need to filter data from it:
-- Edit this file by adding your SQL below each question.
-------------------------------------------------------------------------------
-------------------------------------------------------------
-- The following queries are based on the 1994 census data.
-------------------------------------------------------------
.read 1994-census-summary-1.sql
-- 4. what is the average age of people from China?
select avg(age)
from census
where native_country ='China';
-- 5. what is the average age of people from Taiwan?
select avg(age)
from census
where native_country ='Taiwan';
-- 6. which native countries have "land" in their name?
select distinct(native_country)
from census
where native_country like '%land%';
--------------------------------------------------------------------------------------
-- The following queries are based on the courses-ddl.sql and courses-small.sql data
--------------------------------------------------------------------------------------
drop table census;
.read courses-ddl.sql
.read courses-small-1.sql
-- 11. what are the names of all students who have taken some course? Don't show duplicates.
select distinct(name)
from student
where tot_cred > 0;
-- 12. what are the names of departments that offer 4-credit courses? Don't list duplicates.
select distinct(dept_name)
from course
where credits=4;
-- 13. What are the names and IDs of all students who have received an A in a computer science class?
select distinct(name), id
from student natural join takes natural join course
where dept_name="Comp. Sci." and grade="A";
If I run
./script.awk -v ID=6 file.sql
Note that the problem id is passed to the awk script as variable ID on the command line, like this:
-v ID=6
How can I get the result like
Result:
select distinct(native_country) from census where native_country like '%land%';
With your shown samples, and in GNU awk, please try the following GNU awk code using its match function. Here id is an awk variable holding the value that should be checked against the lines of your Input_file. I have also used exit to get/print the very first match and leave the program, to save some time/cycles; in case you have more than one match, simply remove it from the following code.
awk -v RS= -v id="6" '
match($0,/(\n|^)-- ([0-9]+)\.[^\n]*\n(select[^;]*;)/,arr) && arr[2]==id{
gsub(/\n/,"",arr[3])
print arr[3]
exit
}
' Input_file
One option with awk could be matching the start of the line with -- 6., where 6 is the ID.
Then move to the next line, and set a variable to record that the start of the part you want to match has been seen.
Then collect all lines while seen is set.
Set seen to 0 when encountering an "empty" line.
Concatenate the lines that you want in the output into a single line, and at the end remove the trailing space.
gawk -v ID=6 '
match($0, "^-- "ID"\\.") {
seen=1
next
}
/^[[:space:]]*$/ {
seen=0
}
seen {
a = a $0 " "
}
END {
sub(/ $/, "", a)
print a
}
' file.sql
Or as a single line
gawk -v ID=6 'match($0,"^-- "ID"\\."){seen=1;next};/^[[:space:]]*$/{seen=0};seen{a=a$0" "};END{sub(/ $/,"",a);print a}' file.sql
Output
select distinct(native_country) from census where native_country like '%land%';
Another option with GNU awk is setting the record separator to an "empty" line and using a regex with a capture group to match all lines after the initial -- ID match that do not start with a space:
gawk -v ID=6 '
match($0, "\\n-- "ID"\\.[^\\n]*\\n(([^[:space:]][^\\n]*(\\n|$))*)", m) {
gsub(/\n/, " ", m[1])
print m[1]
}
' RS='^[[:space:]]*$' file

Xargs, sqlplus and quote nightmare?

I have one big file containing data, for example:
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I simplified here.
I want to process each line and run some sqlplus statements with it.
Let's say that I have one table, with two columns, like this:
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let's say I want to update the CONTENT value of those two rows to get this:
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality I have a lot of data and complex requests to execute, so I have to use sqlplus, not tools like SQL*Loader.
So I process the input file with 5 threads, one line at a time, and define "\n" as the separator to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$sim_id" "'"$content"'"
And in the updateRequest.sql file (just containing a test):
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
And as a result, I get:
01740: missing double quote in identifier
If I set the "verify" parameter to on in the SQL script, I can see:
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems like one of the two single quotes (used to escape the second quote) is missing...
I tried everything, but each time I get an error with quotes or double quotes, either on the bash side or the SQL side... it's endless :/
I need the double quotes for the "descr" part, and I need to handle the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat big |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
But only if you do not have ; in the values. So given this input it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"

SQL - When a column has a value from a list and a value not in that same list

Not sure of the best way to word this, but I'm looking for a way to specify a condition: when a value in a column has at least one value in a given list AND a value not in that same list, then that column's value should show up. An example table:
email program
john@john.com program1
john@john.com program2
john@john.com program3
jeff@jeff.com program3
jeff@jeff.com program4
steve@steve.com program1
steve@steve.com program2
If I have this table and a list of (program1, program2), I would like the corresponding email to show up if the programs associated with a given email match at least one in the given list AND the given email has a program NOT in the given list.
So for the table above and the given list above all we would have show up with the correct query would be:
email
john@john.com
Any help on this would be greatly appreciated. Note: this would be in Redshift/PostgreSQL
I like doing this with group by and having. Here is a pretty general approach:
select email
from t
group by email
having sum( (program = 'program1')::int ) > 0 and
sum( (program = 'program2')::int ) = 0;
In this case, "program1" is required and "program2" is not allowed. And, you can keep adding conditions -- as many as you like.
I forget if Redshift supports the :: syntax. You can always express this using standard SQL:
having sum( case when program = 'program1' then 1 else 0 end ) > 0 and
sum( case when program = 'program2' then 1 else 0 end ) = 0;
EDIT:
I think @dnswit is right on the parsing of the OP's question. The logic would be:
having sum( (program in ('program1', 'program2'))::int ) > 0 and
sum( (program not in ('program1', 'program2'))::int ) > 0;
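If the :: cast is not available, the same corrected logic can also be written in the portable CASE form (a sketch using the sample list from the question):
select email
from t
group by email
having sum(case when program in ('program1', 'program2') then 1 else 0 end) > 0
   and sum(case when program not in ('program1', 'program2') then 1 else 0 end) > 0;
Against the sample data this returns only john@john.com: jeff@jeff.com has no program from the list, and steve@steve.com has no program outside it.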
If you just want a single list of emails, no matter how many times they appear by having multiple programs, it is just:
select distinct email from tablename
First, your data table is constructed wrong; you should use a unique identifier so you can retrieve the program version you are specifying.
So your database should look like this:
email program1 program2 program3
john@john.com ProgVersion1 ProgVersion2 ProgVersion3
steve@steve.com ProgVersion1 ProgVersion2 ProgVersion3
If you look at the table above, you can now query to get the program value you need for the specified email. Your data fields for the table are email, program1, program2, and program3; when retrieving the values of the fields to be displayed, you do not need to repeat the email address multiple times for each version of the program. That redundancy would not be acceptable methodology.
SQL query you can use:
Instructions: you will create a parameter to use as a variable to query the data table from the list.
CREATE PROCEDURE spLoadMyProgramVersion
@email nvarchar(50)
AS
BEGIN
SELECT program1, program2, program3
FROM MyTableName
WHERE (email LIKE @email)
RETURN
END
This will allow you to load all your program versions in a list by just specifying the email address you want to load. This is a loading stored procedure; when you create a SqlCommand object, you can call the stored procedure with it.
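A call to the stored procedure might then look like this (T-SQL syntax, matching the answer; the email value is just the sample from the question):
EXEC spLoadMyProgramVersion @email = 'john@john.com';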

KSH-update : Get rows affected

Hello, I'm querying a Teradata database like this:
for var in `db2 -x "$other_query"`;
do
query_update_date="update test SET date =Null WHERE
name_test='$var '"
db2 -v "$query_update_date"
done
My query is executed, but I would like to print query_update_date only when one or more rows are affected (changed) by the update.
Example :
If I have
First query of the loop:
query_update_date="update test SET date =Null WHERE
name_test='John'"
and the second query of the loop:
query_update_date="update test SET date =Null WHERE
name_test='Jeff'"
and in my table before the query:
name_test date
Jeff 01/07/2016
John Null
After the query:
name_test date
Jeff Null
John Null
The date for John was already null, so it wasn't affected by the update.
And
db2 -v "$query_update_date"
prints my queries. What I want for the previous example is to print in my logs only:
query_update_date="update test SET date =Null WHERE
name_test='Jeff'"
Take snapshots of the table before and after you execute the query. Use whatever tools you like: how about (assuming SQL) spooling a "SELECT * FROM T" query into one file beforehand and into another file afterwards. Then use the UNIX diff command to compare the two files, and merely count the length of its output:
LINES_OUT=$(diff oldResults newResults | wc -l)
if [[ $LINES_OUT -gt 0 ]]
then
# log the query, however you do that
fi
If the diff produced any output, log the query; otherwise, don't.
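Applied to the loop from the question, a sketch of that snapshot approach could look like the following; before.txt, after.txt, and update.log are placeholder names, and the SELECT is assumed to cover exactly the rows the update can touch:
for var in $(db2 -x "$other_query"); do
    query_update_date="update test SET date =Null WHERE name_test='$var'"

    db2 -x "select name_test, date from test where name_test='$var'" > before.txt
    db2 "$query_update_date"
    db2 -x "select name_test, date from test where name_test='$var'" > after.txt

    # log the statement only when the snapshots differ, i.e. a row really changed
    if ! diff -q before.txt after.txt > /dev/null; then
        echo "$query_update_date" >> update.log
    fi
done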

How to ignore null SQL results within BASH?

I wrote a script that connects to the database and spools a list of orders being delivered to a certain customer and e-mails it to them, but some days the customer won't have any being delivered so they get a blank attachment.
What's the best way to skip a customer when the SQL results are empty?
In the script it loops through the account numbers set as a variable:
accounts=100...
for i in $accounts
do
do_data $i
do_mail $i
done
SQL like:
do_data () {
sqlplus -s "$user/$pass@$db" <<EOF
SPOOL $1.csv
SELECT order_no
FROM orders
WHERE customer_number = $1;
SPOOL OFF
EOF
}
Basically if do_data outputs nothing then it shouldn't get as far as do_mail.
I can see that the results are spooled into the $1.csv file.
Check whether the file is empty and, based on that, either call the do_mail function or not.
Make sure that Oracle will not output anything when no data is found.
Another way is to alter the SQL statement to return the string "NO DATA FOUND".
You can then check whether that string is in the output and act accordingly.
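As a sketch of the first suggestion: test the spool file with -s before mailing. This assumes heading, feedback and pagesize are turned off in the sqlplus session, so that a query returning no rows really leaves the file empty:
for i in $accounts
do
    do_data "$i"
    if [ -s "$i.csv" ]; then        # -s: file exists and is not empty
        do_mail "$i"
    else
        rm -f "$i.csv"              # optional: discard the empty spool file
    fi
done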