Perl - don't send duplicate SQL requests - sql

I have a Perl script and I don't want to send the same request twice.
The request is '2018-03-15 12:30:00', 'Metric A', 62 and I want to send it only once, not more.
In my MariaDB database I have duplicate rows:
SELECT time, measurement, valueOne FROM `metric_values`;
Results:
+---------------------+-------------+----------+
| time                | measurement | valueOne |
+---------------------+-------------+----------+
| 2018-03-15 12:30:00 | Metric A    | 62       |
| 2018-03-15 12:30:00 | Metric A    | 62       |
+---------------------+-------------+----------+
My Perl script:
use DBI;

open (FILE, 'logfile');

while (<FILE>) {
    ($word1, $word2, $word3, $word4, $word5, $word6, $word7, $word8, $word9,
     $word10, $word11, $word12, $word13, $word14) = split(" ");

    $word13 =~ s/[^\d.]//g;

    if ($word13 > 5) {
        if ($word2 eq "Jan") {
            $word2 = "01"
        }
        if ($word2 eq "Feb") {
            $word2 = "02"
        }
        if ($word2 eq "Mar") {
            $word2 = "03"
        }
        if ($word2 eq "Apr") {
            $word2 = "04"
        }
        if ($word2 eq "May") {
            $word2 = "05"
        }
        if ($word2 eq "Jun") {
            $word2 = "06"
        }
        if ($word2 eq "Jul") {
            $word2 = "07"
        }
        if ($word2 eq "Aug") {
            $word2 = "08"
        }
        if ($word2 eq "Sep") {
            $word2 = "09"
        }
        if ($word2 eq "Oct") {
            $word2 = "10"
        }
        if ($word2 eq "Nov") {
            $word2 = "11"
        }
        if ($word2 eq "Dec") {
            $word2 = "12"
        }

        print "'$word5-$word2-$word3 $word4', $word11, $word13 \n";
    }

    # Connect to the database.
    my $dbh = DBI->connect("DBI:mysql:database=db;host=ip",
                           "titi", 'mp!',
                           {'RaiseError' => 1});

    # The interpolated values are ('2018-03-15 12:30:00', 'Metric A', 62)
    my $sth = $dbh->prepare(
        "INSERT `metric_values` (time, measurement, valueOne) VALUES('$word5-$word2-$word3 $word4', $word11, $word13);")
        or die "prepare statement failed: $dbh->errstr()";
    $sth->execute() or die "execution failed: $dbh->errstr()";
    print $sth->rows . " rows found.\n";
    $sth->finish;
}
My log file:
Wed Oct 17 04:57:08 2018 : Resource = 'toto' cstep= 'titi' time =23.634s
Wed Oct 17 04:57:50 2018 : Resource = 'toto' cstep= 'titi' time =22.355s
Thanks for your response.

In a comment, you say this:
I execute this script every 5 minutes and that creates many identical lines in the table; I don't want duplicate lines in my table.
I think this is what is happening.
Every five minutes you run your program. Each time you run the program you use exactly the same log file as input. So the same records get processed every time and new copies of the data are inserted on each run.
There's nothing wrong with your existing code. It's doing exactly what you've asked it to do. It's just not clever enough. You need to make it cleverer. You have a few options.
Remove from the log file the records that have been processed. That way you only insert each record once.
Add a flag to each record in your log file which indicates that it has been added to the database. You can then check that flag when processing the file and only insert records that don't have the flag.
Add a unique index to your table to ensure that it can only contain one copy of each record. You'll then need to change your code so it ignores any duplicate-key errors that you get back from the database.
Use REPLACE instead of INSERT and ensure you have the correct primary key on your table to ensure that duplicate records aren't inserted.
Without knowing a lot more about your particular application, it's hard to know which of these options is the best approach for you. I suspect you'll find the REPLACE option the easiest to implement.
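As a rough illustration of the last two options, here is a minimal Perl/DBI sketch. It reuses the table, columns and connection details from the question; the unique-key name (uniq_metric) and the choice between INSERT IGNORE and REPLACE are only examples, not the definitive fix:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=db;host=ip', 'titi', 'mp!',
                       { RaiseError => 1 });

# One-off: make duplicates impossible at the database level
# (pick the columns that really identify a record).
# $dbh->do('ALTER TABLE metric_values
#               ADD UNIQUE KEY uniq_metric (time, measurement, valueOne)');

# Then either silently skip rows that already exist ...
my $sth = $dbh->prepare(
    'INSERT IGNORE INTO metric_values (time, measurement, valueOne)
     VALUES (?, ?, ?)'
);

# ... or overwrite them (REPLACE also relies on that unique/primary key):
# my $sth = $dbh->prepare(
#     'REPLACE INTO metric_values (time, measurement, valueOne)
#      VALUES (?, ?, ?)'
# );

$sth->execute('2018-03-15 12:30:00', 'Metric A', 62);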
Update: I hope you'll find some general comments on your code to be useful.
Your code to open the file works, of course, but it is some distance from current best practice. I recommend a) using a lexical filehandle, b) using the three-arg version of open() and c) checking the return value from the call.
open my $fh, '<', 'logfile'
or die "Could not open 'logfile': $!\n";
Using variables called $word1, $word2, etc is a terrible idea. A better idea would be to use an array:
my @words = split ' ';
If you really want individual variables, then please give them better names:
my ($day, $mon, $date, $time, $year, ... ) = split(' ');
Personally, I'd turn each record into a hash.
my @cols = qw[day mon date time year ... ];
# and then, in your loop
my %record;
@record{@cols} = split ' ';
Converting the month to a number the way you do it is clunky. Consider setting up a conversion hash.
my %months = (
Jan => 1,
Feb => 2,
...
);
Then your code becomes (assuming $mon instead of $word2):
$mon = sprintf '%02d', $months{$mon}
or die "$mon is not a valid month\n";
But, actually, you should use something like Time::Piece to deal with dates and times.
use Time::Piece;

my $timestamp = "$day $mon $date $time $year";
my $tp = Time::Piece->strptime($timestamp, '%a %b %d %H:%M:%S %Y');
say $tp->ymd, ' ', $tp->hms;
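Putting those suggestions together, here is a sketch of what the parsing loop could look like. The column names in @cols and the "elapsed time greater than 5" filter are assumptions based on the original script and the sample log lines, so adjust them to your real format:

use strict;
use warnings;
use feature 'say';
use Time::Piece;

# Column names guessed from the sample log line:
# Wed Oct 17 04:57:08 2018 : Resource = 'toto' cstep= 'titi' time =23.634s
my @cols = qw[day mon date time year colon resource_label eq resource
              cstep_label cstep time_label elapsed];

open my $fh, '<', 'logfile'
    or die "Could not open 'logfile': $!\n";

while (<$fh>) {
    my %record;
    @record{@cols} = split ' ';

    # Keep only digits and the decimal point in the elapsed time.
    $record{elapsed} =~ s/[^\d.]//g;
    next unless $record{elapsed} > 5;

    my $tp = Time::Piece->strptime(
        "$record{day} $record{mon} $record{date} $record{time} $record{year}",
        '%a %b %d %H:%M:%S %Y',
    );

    # These are the values you would hand to the database insert.
    say $tp->ymd, ' ', $tp->hms, " -> $record{cstep}, $record{elapsed}";
}

close $fh;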

Related

QlikView Convert 1753-01-01 00:00:00.000 to NULL

I am trying to convert data that comes from SQL as 1753-01-01 00:00:00.000 so that it is shown as NULL values in QlikView.
I do the following in the QlikView load statements:
SET NullTimeStamp = if ($1 = '1753-01-01 00:00:00', null(), $1);
Then use it in the LOAD:
LOAD
$(NullTimeStamp(YourDateField1)) AS YOURDATEFIELD1,
$(NullTimeStamp(YourDateField2)) AS YOURDATEFIELD2,
$(NullTimeStamp(YourDateField3)) AS YOURDATEFIELD3
However, I have many time and date fields in my tables, so I was wondering if there is a more elegant way of solving this issue?
I've done something similar in the past. The idea is to generate part of the load script in a variable and then use that variable as part of the next load script.
DummyData:
Load * Inline [
Something1 , Something2, Something3, Something4, Something5
1753-01-01 00:00:00.000, 2, 3, 4, 1753-01-01 00:00:00.000
];
SET NullTimeStamp = if ($1 = '1753-01-01 00:00:00', null(), $1);
// Define a temp table
// that holds list of fields that have to be checked with NullTimeStamp
Fields:
Load * Inline [
FieldNames
Something1
Something2
Something3
Something4
Something5
];
let FieldsConcatenation = '';
// loop through the NullTimeStamp-ed fields
for a = 1 to NoOfRows('Fields')
let f = FieldValue('FieldNames', a);
// concatenate each iteration to form part of the RealLoad table script
let FieldsConcatenation = '$(FieldsConcatenation)' & '$(NullTimeStamp(' & '$(f)' & ')) as ' & Upper('$(f)') & ',' & chr(13);
next
// remove the last comma
let FieldsConcatenation = left('$(FieldsConcatenation)', Index('$(FieldsConcatenation)', ',' , -1) -1);
// we don't need this anymore
Drop Table Fields;
// add FieldsConcatenation variable as part of the load script
RealLoad:
Load
$(FieldsConcatenation),
'a' as LoadTheRestHere
Resident
DummyData;
// we don't need this anymore
Drop Table DummyData;
FieldsConcatenation variable will have the following content:
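(Reconstructed from the loop above, since the original answer showed this as a screenshot; it should expand to roughly the following:)

$(NullTimeStamp(Something1)) as SOMETHING1,
$(NullTimeStamp(Something2)) as SOMETHING2,
$(NullTimeStamp(Something3)) as SOMETHING3,
$(NullTimeStamp(Something4)) as SOMETHING4,
$(NullTimeStamp(Something5)) as SOMETHING5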
(The original answer also included screenshots of the original table and of the final table after the load; they are not reproduced here.)

What is the proper way to use IF THEN in AQL?

I'm trying to use IF THEN style AQL, but the only relevant operator I could find in the AQL documentation was the ternary operator. I tried to add IF THEN syntax to my already working AQL but it gives syntax errors no matter what I try.
LET doc = DOCUMENT('xp/a-b')
LET now = DATE_NOW()
doc == null || now - doc.last >= 45e3 ?
    LET mult = (doc == null || now - doc.last >= 6e5 ? 1 : doc.multiplier)
    LET gained = FLOOR((RAND() * 3 + 3) * mult)
    UPSERT {_key: 'a-b'}
    INSERT {
        amount: gained,
        total: gained,
        multiplier: 1.1,
        last: now
    }
    UPDATE {
        amount: doc.amount + gained,
        total: doc.total + gained,
        multiplier: (mult < 4 ? FLOOR((mult + 0.1) * 10) / 10 : 4),
        last: now
    }
    IN xp
    RETURN NEW
:
    RETURN null
Gives the following error message:
stacktrace: ArangoError: AQL: syntax error, unexpected identifier near 'doc == null || now - doc.last >=...' at position 1:51 (while parsing)
The ternary operator cannot be used like an if/else construct in the way you tried. It is for conditional (sub-)expressions, like the one you use to calculate mult. It cannot stand by itself; there is nothing it could be returned or assigned to if you write it like an if-expression.
Moreover, it would require braces, but the actual problem is that the body contains operations like LET, UPSERT and RETURN. These are language constructs which cannot be used inside expressions.
If I understand correctly, you want to:
insert a new document if no document with key a-b exists yet in collection xp
if it does exist, then update it, but only if the last update was 45 seconds or longer ago
Does the following query work for you?
FOR id IN [ 'xp/a-b' ]
    LET doc = DOCUMENT(id)
    LET key = PARSE_IDENTIFIER(id).key
    LET now = DATE_NOW()
    FILTER doc == null || now - doc.last >= 45e3
    LET mult = (doc == null || now - doc.last >= 6e5 ? 1 : doc.multiplier)
    LET gained = FLOOR((RAND() * 3 + 3) * mult)
    UPSERT { _key: key }
    INSERT {
        _key: key,
        amount: gained,
        total: gained,
        multiplier: 1.1,
        last: now
    }
    UPDATE {
        amount: doc.amount + gained,
        total: doc.total + gained,
        multiplier: (mult < 4 ? FLOOR((mult + 0.1) * 10) / 10 : 4),
        last: now
    }
    IN xp
    RETURN NEW
I added _key to INSERT, otherwise the document will get an auto-generated key, which does not seem intended. Using a FOR loop and a FILTER acts like an IF construct (without ELSE). Because this is a data-modification query, it is not necessary to explicitly RETURN anything, and in your original query you RETURN null for the ELSE case anyway. While yours would result in [ null ], mine produces [ ] (a truly empty result) if you execute the query in quick succession and nothing gets updated or inserted.
Note that it is necessary to use PARSE_IDENTIFIER() to get the key from the document ID string and do INSERT { _key: key }. With INSERT { _key: doc._key } you would run into an invalid document key error in the insert case, because if there is no document xp/a-b, DOCUMENT() returns null and doc._key is therefore also null, leading to _key: null - which is invalid.

Tcl pass variable as parameters to function

Using Tcl, I want to pass variable parameters to a function.
I tried this code:
proc launch_proc { msg proc_name {params {}} } {
puts "params launch_proc is $params \n"
}
proc test { param } {
puts "param test is $param \n"
launch_proc "1.5.2 test param" test_standard {{*}$param param1 param2 param3"
}
}
test value
--> params launch_proc is {*}$param param1 param2 param3"
$param is not evaluated (I use Tcl 8.5).
Tcl has support for this using the keyword args as the last parameter in the argument list.
Here is an example directly from the wiki:
proc demo {first {second "none"} args} {
puts "first = $first"
puts "second = $second"
puts "args = $args"
}
demo one
demo one two
demo one two three four five
results in
first = one
second = none
args =
first = one
second = two
args =
first = one
second = two
args = three four five
You can use the expand syntax for this as well, which makes the following two calls to demo equivalent.
demo one two three four five
set params {three four five}
demo one two {*}$params
You're passing a list and need to instead send each item as a parameter to the proc.
proc test {p1 p2 p3} {
puts "$p1 - $p2 - $p3"
}
set value {one two three}
# This should work in tcl 8.5+
test {*}$value
set value {four five six}
# For tcl < 8.5
foreach {p1 p2 p3} $value {
test $p1 $p2 $p3
break
}
# or
set value {seven eight nine}
test [lindex $value 0] [lindex $value 1] [lindex $value 2]
Output:
$ tclsh test.tcl
one - two - three
four - five - six
seven - eight - nine
You need upvar, and use list when you construct the params list for launch_proc:
proc test {varname} {
upvar 1 $varname value
puts "param test is $varname => $value"
launch_proc "1.5.2 test param" test_standard [list {*}$value par1 par2 par3]
}
proc launch_proc {msg proc_name {params {}}} {
puts "params launch_proc: [join $params ,]"
}
set value {"a b c" "d e f" "g h i"}
test value
param test is value => "a b c" "d e f" "g h i"
params launch_proc: a b c,d e f,g h i,par1,par2,par3

How to convert a list of attribute-value pairs into a flat table whose columns are attributes

I'm trying to convert a CSV file containing 3 columns (ATTRIBUTE_NAME,ATTRIBUTE_VALUE,ID) into a flat table where each row is (ID,Attribute1,Attribute2,Attribute3,....). Samples of such tables are provided at the end.
Either Python, Perl or SQL is fine. Thank you very much and I really appreciate your time and efforts!
In fact, my question is very similar to this post, except that in my case the number of attributes is pretty big (~300) and not consistent across each ID, so hard coding each attribute might not be a practical solution.
For me, the challenging/difficult parts are:
There are approximately 270 million lines of input, and the total size of the input table is about 60 GB.
Some values (strings) contain a comma (,) within them, and the whole string is then enclosed in double quotes (") to make the reader aware of that. For example "JPMORGAN CHASE BANK, NA, TX" in ID=53.
The set of attributes is not the same across IDs. For example, the number of overall attributes is 8, but ID=53, 17 and 23 have only 7, 6 and 5 respectively. ID=17 does not have the attributes string_country and string_address, so output blank/nothing after the comma.
The input attribute-value table looks like this. In this sample input and output, we have 3 IDs, whose number of attributes can differ depending on whether we can obtain such attributes from the server or not.
ATTRIBUTE_NAME,ATTRIBUTE_VALUE,ID
num_integer,100,53
string_country,US (United States),53
string_address,FORT WORTH,53
num_double2,546.0,53
string_acc,My BankAcc,53
string_award,SILVER,53
string_bankname,"JPMORGAN CHASE BANK, NA, TX",53
num_integer,61,17
num_double,34.32,17
num_double2,200.541,17
string_acc,Your BankAcc,17
string_award,GOLD,17
string_bankname,CHASE BANK,17
num_integer,36,23
num_double,78.0,23
string_country,CA (Canada),23
string_address,VAN COUVER,23
string_acc,Her BankAcc,23
The output table should look like this. (The order of attributes in the columns is not fixed. It can be sorted alphabetically or by order-of-appearance.)
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,US (United States),FORT WORTH,546.0,My BankAcc,SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,Your BankAcc,GOLD,CHASE BANK
23,36,78.0,CA (Canada),VAN COUVER,,Her BankAcc,,
This program will do as you ask. It expects the name of the input file as a parameter on the command line.
Update Looking more carefully at the data I see that not all of the data fields are available for every ID. That makes things more complex if the fields are to be kept in the same order as they appear in the file.
This program works by scanning the file and accumulating all the data for output into hash %data. At the same time it builds a hash %headers, that keeps the position each header appears in the data for each ID value.
Once the file has been scanned, the collected headers are sorted by finding the first ID for each pair that includes information for both headers. The sort order for that pair within the complete set must be the same as the order they appeared in the data for that ID, so it's just a matter of comparing the two position values using <=>.
Once a sorted set of headers has been created, the %data hash is dumped, accessing the complete list of values for each ID using a hash slice.
Update 2 Now that I realise the sheer size of your data I can see that my second attempt was also flawed, as it tried to read all of the information into memory before outputting it. That isn't going to work unless you have a monster machine with about 1TB of memory!
You may get some mileage from this version. It scans twice through the file, the first time to read the data so that the full set of header names can be created and ordered, then again to read the data for each ID and output it.
Let me know if it's not working for you, as there are still things I can do to make it more memory-efficient.
use strict;
use warnings;
use 5.010;

use Text::CSV;
use Fcntl 'SEEK_SET';

my $csv = Text::CSV->new;

open my $fh, '<', $ARGV[0] or die qq{Unable to open "$ARGV[0]" for input: $!};

my %headers = ();
my $last_id;
my $header_num;
my $num_ids;

while (my $row = $csv->getline($fh)) {
    next if $. == 1;

    my ($key, $val, $id) = @$row;

    unless (defined $last_id and $id eq $last_id) {
        ++$num_ids;
        $header_num = 0;
        $last_id = $id;
        print STDERR "Processing ID $id\n";
    }

    $headers{$key}[$num_ids-1] = ++$header_num;
}

sub by_position {
    for my $id (0 .. $num_ids-1) {
        my ($posa, $posb) = map $headers{$_}[$id], our $a, our $b;
        return $posa <=> $posb if $posa and $posb;
    }
    0;
}

my @headers = sort by_position keys %headers;
%headers = ();

print STDERR "List of headers complete\n";

seek $fh, 0, SEEK_SET;
$. = 0;

$csv->combine('ID', @headers);
print $csv->string, "\n";

my %data = ();
$last_id = undef;

while (1) {
    my $row = $csv->getline($fh);
    next if $. == 1;

    if (not defined $row or defined $last_id and $last_id ne $row->[2]) {
        $csv->combine($last_id, @data{@headers});
        print $csv->string, "\n";
        %data = ();
    }

    last unless defined $row;

    my ($key, $val, $id) = @$row;
    $data{$key} = $val;
    $last_id = $id;
}
Output:
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,"US (United States)","FORT WORTH",546.0,"My BankAcc",SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,"Your BankAcc",GOLD,"CHASE BANK"
23,36,78.0,"CA (Canada)","VAN COUVER",,"Her BankAcc",,
Use Text::CSV from CPAN:
#!/usr/bin/env perl

use strict;
use warnings;

# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty

use Text::CSV;

my $col_csv     = Text::CSV->new();
my $id_attr_csv = Text::CSV->new({ eol => "\n", });

$col_csv->column_names( $col_csv->getline( *DATA ));

while( my $row = $col_csv->getline_hr( *DATA )){

    # do all the keys but skip if ID
    for my $attribute ( keys %$row ){
        next if $attribute eq 'ID';

        $id_attr_csv->print( *STDOUT, [ $attribute, $row->{$attribute}, $row->{ID}, ]);
    }
}
__DATA__
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,US (United States),FORT WORTH,546.0,My BankAcc,SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,Your BankAcc,GOLD,CHASE BANK
23,36,78.0,CA (Canada),VAN COUVER,,Her BankAcc,,

Hive combine column values based upon condition

I was wondering if it is possible to combine column values based upon a condition. Let me explain...
Let's say my data looks like this:
Id  name     offset
1   Jan      100
2   Janssen  104
3   Klaas    150
4   Jan      160
5   Janssen  164
And my output should be this:
Id  fullname     offsets
1   Jan Janssen  [ 100, 160 ]
I would like to combine the name values from two rows where the offsets of the two rows are no more than 1 character apart.
My question is whether this type of data manipulation is possible with Hive, and if it is, could someone share some code and an explanation?
Please be gentle, but this little piece of code returns somewhat what I want...
ArrayList<String> persons = new ArrayList<String>();
// write your code here
String _previous = "";

// Sample output from entities.txt
// USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
// USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
File file = new File("entities.txt");

try {
    //
    // Create a new Scanner object which will read the data
    // from the file passed in. To check if there are more
    // lines to read from it we check by calling the
    // scanner.hasNextLine() method. We then read line by
    // line till all lines are read.
    //
    Scanner scanner = new Scanner(file);
    while (scanner.hasNextLine()) {
        if (_previous == "" || _previous == null)
            _previous = scanner.nextLine();

        String _current = scanner.nextLine();

        // Compare the lines; check whether their offsets differ by 1
        int x = Integer.parseInt(_previous.split(",")[3]) + Integer.parseInt(_previous.split(",")[4]);
        int y = Integer.parseInt(_current.split(",")[4]);

        if (y - x == 1) {
            persons.add(_previous.split(",")[1] + " " + _current.split(",")[1]);
            if (scanner.hasNextLine()) {
                _current = scanner.nextLine();
            }
        } else {
            persons.add(_previous.split(",")[1]);
        }
        _previous = _current;
    }
} catch (Exception e) {
    e.printStackTrace();
}

for (String person : persons) {
    System.out.println(person);
}
Working off this piece of sample data:
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Richard,PERSON,7,2732
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2740
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2756
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3093
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3195
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,3220
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,10858
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,11063
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Ken,PERSON,3,11186
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,11234
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,17073
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,17095
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Stephanie,PERSON,9,17330
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Putt,PERSON,4,17340
Which produces this output
Richard Marottoli
Marottoli
Marottoli
Marottoli
Berkowitz
Berkowitz
Marottoli
Lea
Lea
Ken
Marottoli
Berkowitz
Lea
Stephanie Putt
Kind regards
Load the table using the CREATE TABLE statement below:
drop table if exists default.stack;
create external table default.stack
(junk string,
name string,
cat string,
len int,
off int
)
ROW FORMAT DELIMITED
FIELDS terminated by ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://nameservice1/....';
Use the query below to get your desired output.
select max(name), off from (
    select case when b.name is not null then
               concat(b.name, " ", a.name)
           else
               a.name
           end as name,
           case when b.off1 is not null
               then b.off1
               else a.off
           end as off
    from default.stack a
    left outer join (select name,
                            len + off + 1 as off,
                            off as off1
                     from default.stack) b
        on a.off = b.off
) a
group by off
order by off;
I have tested this; it generates your desired result.