I have a list of variables to check against the database. How can I detect that a variable is not in the database? What query should I use, and how do I print an error message once a variable is found to be missing?
My Code:
foreach my $sql (@Records) {
    my $variable = $sql->{'variable'};
    # The statement below selects rows where the variable DOES exist.
    # What should I change to select variables that do NOT exist?
    $sqlMySQL = "SELECT LOT FROM table WHERE LOT LIKE '%$variable%'";
}
# if not exist {
#     print("Not exist");
# }
Expected result:
As each $variable is checked against the database, if that $variable does not exist there, print out the $variable (or a "not exist" message).
Thanks for viewing, and for your comments and answers.
I would go about it similarly to the example below.
A list of variables - Place those variables in an array (aka a list)
What query should I use - One that will only select exactly what you need and store it in the best dataset for traversal (selectall_hashref)
While the $variable loops through the database - This would require a DBI call for each $variable, so instead loop through your array and check for existence in the hash.
EXAMPLE
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect( "dbi:SQLite:dbname=test.db", "USERNAME", "PASSWORD",
    { RaiseError => 1 },
) or die $DBI::errstr;

my @vars = ( 25, 30, 40 );

# Pull every LOT value into a hash keyed by LOT for fast existence checks
my $hash_ref = $dbh->selectall_hashref( 'SELECT LOT FROM table', 'LOT' );

$dbh->disconnect();

foreach (@vars) {
    if ( exists $hash_ref->{$_} ) {
        print $_ . "\n";
    }
    else {
        print "Does not exist\n";
    }
}
Something similar to that will pull all of the LOT column values for your table into a hash (keyed by LOT) that you can then compare against your array in a foreach loop.
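If you really need the partial LIKE matching from your original query rather than exact-key lookups, a per-variable check with a bind placeholder is an alternative. This is only a sketch, reusing the placeholder table and column names from your example, and it would have to run before the disconnect:

my $sth = $dbh->prepare('SELECT COUNT(*) FROM table WHERE LOT LIKE ?');

foreach my $variable (@vars) {
    # One execution per variable, keeping the original LIKE '%...%' semantics
    $sth->execute("%$variable%");
    my ($count) = $sth->fetchrow_array;
    print $count ? "$variable\n" : "$variable does not exist\n";
    $sth->finish;
}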
I'm trying to retrieve a bunch of rows using SQL (as a test, let's say 1000 rows in each iteration, up to a million rows) and store them in a file (a .store file in my case, but it could be a text file - it doesn't matter) in batches, to avoid an out-of-memory issue. I am running the SQL from within a Perl script.
I would appreciate it if anyone could share an example.
An example would be something like this -
sub query {
    my $test = "select * from employees";
    return $test;
}

# later in the code -
my $temp;
my $dataset = DBUtils::make_database_iterator({ query => query($temp) });

store $dataset, $result_file;
return;
The best I can offer you with the limited amount of information you have given is this, which uses the SELECT statement's LIMIT clause to retrieve a limited number of rows from the table.
Obviously you will have to provide actual values for the DSN, the name of the table, and the store_block subroutine yourself.
use strict;
use warnings;
use autodie;
use DBI;
my $blocksize = 1000;
my ($dsn, $user, $pass) = (...);
my $dbh = DBI->connect($dsn, $user, $pass);
my $sth = $dbh->prepare('SELECT * FROM table LIMIT ? OFFSET ?') or die $DBI::errstr;
open my $fh, '>', 'test.store';
for (my $n = 0; $sth->execute($blocksize, $n * $blocksize); ++$n) {
    my $block = $sth->fetchall_arrayref;
    last unless @$block;
    store_block($block, $fh);
}

close $fh;

sub store_block {
    my ($block, $fh) = @_;
    ...
}
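Since the question mentions a .store file, here is one possible way to flesh out store_block. This is only a sketch of my own that assumes you want Storable's serialisation, writing one frozen block per call to the already-open handle:

use Storable qw(nstore_fd fd_retrieve);

# Possible store_block: append each batch to the open handle as a separate
# Storable frame. The handle should be in binary mode for this.
sub store_block {
    my ($block, $fh) = @_;
    binmode $fh;
    nstore_fd($block, $fh);
}

# Reading the batches back later would then look like:
# open my $in, '<', 'test.store';
# binmode $in;
# while ( !eof($in) ) {
#     my $block = fd_retrieve($in);
#     ...   # process the rows in $block
# }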
You say you want to work in batches to avoid an out of memory error. This suggests you're doing something like this...
my @all_the_rows = query_the_database($sql);
store_stuff(@all_the_rows);
You want to avoid doing that as much as possible, for exactly the reason you gave: if the dataset grows large, you might run out of memory.
Instead, you can read one row at a time and write one row at a time using DBI.
use strict;
use warnings;
use DBI;
# The file you're writing results to
my $file = '...';
# Connect to the database using DBI
my $dbh = DBI->connect(
    ...however you do that...,
    { RaiseError => 1 }    # Turn on exceptions
);

# Prepare and execute the statement
my $sth = $dbh->prepare("SELECT * FROM employees");
$sth->execute;

# Get a row, write a row.
while ( my $row = $sth->fetchrow_arrayref ) {
    append_row_to_storage($row, $file);
}
I leave writing append_row_to_storage up to you.
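For completeness, here is one way append_row_to_storage might look. This is a sketch of my own, assuming a plain tab-separated text file rather than a Storable file:

# Hypothetical helper: append one row to a tab-separated text file.
# Opening and closing per row is simple but slow; for large result sets
# you would open the handle once outside the loop instead.
sub append_row_to_storage {
    my ($row, $file) = @_;
    open my $fh, '>>', $file or die "Cannot open '$file': $!";
    print {$fh} join("\t", map { defined $_ ? $_ : '' } @$row), "\n";
    close $fh;
}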
I'm trying to convert a csv file containing 3 columns (ATTRIBUTE_NAME,ATTRIBUTE_VALUE,ID) into a flat table whose each row is (ID,Attribute1,Attribute2,Attribute3,....). The samples of such tables are provided at the end.
Either Python, Perl or SQL is fine. Thank you very much and I really appreciate your time and efforts!
In fact, my question is very similar to this post, except that in my case the number of attributes is pretty big (~300) and not consistent across IDs, so hard-coding each attribute might not be a practical solution.
For me, the challenging/difficult parts are:
There are approximately 270 million lines of input, and the total size of the input table is about 60 GB.
Some individual (string) values contain a comma (,), and such a value is enclosed in double quotes (") to make the reader aware of that. For example "JPMORGAN CHASE BANK, NA, TX" in ID=53.
The set of attributes is not the same across IDs. For example, the overall number of attributes is 8, but ID=53, 17 and 23 have only 7, 6 and 5 attributes respectively. ID=17 does not have the attributes string_country and string_address, so the output should be blank/nothing after the comma.
The input attribute-value table looks like this. In this sample input and output, we have 3 IDs, whose numbers of attributes can differ depending on whether we could obtain those attributes from the server or not.
ATTRIBUTE_NAME,ATTRIBUTE_VALUE,ID
num_integer,100,53
string_country,US (United States),53
string_address,FORT WORTH,53
num_double2,546.0,53
string_acc,My BankAcc,53
string_award,SILVER,53
string_bankname,"JPMORGAN CHASE BANK, NA, TX",53
num_integer,61,17
num_double,34.32,17
num_double2,200.541,17
string_acc,Your BankAcc,17
string_award,GOLD,17
string_bankname,CHASE BANK,17
num_integer,36,23
num_double,78.0,23
string_country,CA (Canada),23
string_address,VAN COUVER,23
string_acc,Her BankAcc,23
The output table should look like this. (The order of attributes in the columns is not fixed. It can be sorted alphabetically or by order-of-appearance.)
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,US (United States),FORT WORTH,546.0,My BankAcc,SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,Your BankAcc,GOLD,CHASE BANK
23,36,78.0,CA (Canada),VAN COUVER,,Her BankAcc,,
This program will do as you ask. It expects the name of the input file as a parameter on the command line.
Update Looking more carefully at the data I see that not all of the data fields are available for every ID. That makes things more complex if the fields are to be kept in the same order as they appear in the file.
This program works by scanning the file and accumulating all the data for output into the hash %data. At the same time it builds a hash %headers that records the position at which each header appears in the data for each ID value.
Once the file has been scanned, the collected headers are sorted by finding the first ID for each pair that includes information for both headers. The sort order for that pair within the complete set must be the same as the order they appeared in the data for that ID, so it's just a matter of comparing the two position values using <=>.
Once a sorted set of headers has been created, the %data hash is dumped, accessing the complete list of values for each ID using a hash slice.
Update 2 Now that I realise the sheer size of your data I can see that my second attempt was also flawed, as it tried to read all of the information into memory before outputting it. That isn't going to work unless you have a monster machine with about 1TB of memory!
You may get some mileage from this version. It scans twice through the file, the first time to read the data so that the full set of header names can be created and ordered, then again to read the data for each ID and output it.
Let me know if it's not working for you, as there's still things I can do to make it more memory-efficient.
use strict;
use warnings;
use 5.010;
use Text::CSV;
use Fcntl 'SEEK_SET';
my $csv = Text::CSV->new;
open my $fh, '<', $ARGV[0] or die qq{Unable to open "$ARGV[0]" for input: $!};
my %headers = ();
my $last_id;
my $header_num;
my $num_ids;
while (my $row = $csv->getline($fh)) {
    next if $. == 1;    # skip the header row
    my ($key, $val, $id) = @$row;
    unless (defined $last_id and $id eq $last_id) {
        ++$num_ids;
        $header_num = 0;
        $last_id = $id;
        print STDERR "Processing ID $id\n";
    }
    $headers{$key}[$num_ids-1] = ++$header_num;
}

sub by_position {
    for my $id (0 .. $num_ids-1) {
        my ($posa, $posb) = map $headers{$_}[$id], our $a, our $b;
        return $posa <=> $posb if $posa and $posb;
    }
    0;
}

my @headers = sort by_position keys %headers;
%headers = ();
print STDERR "List of headers complete\n";

seek $fh, 0, SEEK_SET;
$. = 0;

$csv->combine('ID', @headers);
print $csv->string, "\n";

my %data = ();
$last_id = undef;

while (1) {
    my $row = $csv->getline($fh);
    next if $. == 1;    # skip the header row again on the second pass
    if (not defined $row or defined $last_id and $last_id ne $row->[2]) {
        $csv->combine($last_id, @data{@headers});
        print $csv->string, "\n";
        %data = ();
    }
    last unless defined $row;
    my ($key, $val, $id) = @$row;
    $data{$key} = $val;
    $last_id = $id;
}
Output:
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,"US (United States)","FORT WORTH",546.0,"My BankAcc",SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,"Your BankAcc",GOLD,"CHASE BANK"
23,36,78.0,"CA (Canada)","VAN COUVER",,"Her BankAcc",,
Use Text::CSV from CPAN:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Text::CSV;
my $col_csv = Text::CSV->new();
my $id_attr_csv = Text::CSV->new({ eol=>"\n", });
$col_csv->column_names( $col_csv->getline( *DATA ));
while ( my $row = $col_csv->getline_hr( *DATA ) ) {
    # do all the keys, but skip the ID column itself
    for my $attribute ( keys %$row ) {
        next if $attribute eq 'ID';
        $id_attr_csv->print( *STDOUT, [ $attribute, $row->{$attribute}, $row->{ID} ] );
    }
}
__DATA__
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,US (United States),FORT WORTH,546.0,My BankAcc,SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,Your BankAcc,GOLD,CHASE BANK
23,36,78.0,CA (Canada),VAN COUVER,,Her BankAcc,,
I want to optimize some of the SQL and just need an opinion on whether I should do it or leave it as is, and why. SQL queries are executed via PHP & Java; I will show an example in PHP which will give an idea of what I'm doing.
Main concerns are:
-Maintainability.
-Ease of altering tables without messing with all the legacy code
-Speed of SQL (is it a concern???)
-Readability
Example of what I have right now:
I take a LONG array from a customer (can't make it smaller, unfortunately) and update the existing values with the new values provided by the customer in the following way:
$i = 0;
foreach ($values as $value)
{
    $sql = "UPDATE $someTable SET someItem$i = '$value' WHERE username='$username'";
    mysql_query($sql, $con);
    $i += 1;
}
It's easy to see from the above example that if the array of values is long, then I execute a lot of SQL statements.
Should I instead do something like:
$i = 0;
$j = count($values);
$sql = "UPDATE $someTable SET ";
foreach ($values as $value)
{
    if ($i < $j - 1) // append all but the last value, each followed by a comma
    {
        $sql .= "someItem$i = '$value', ";
        $i += 1;
    }
}
$sql .= "someItem$i = '$value' WHERE username='$username'"; // add the last item and finish the statement
mysql_query($sql, $con); // execute query once
Or which way should it be done, and should I bother making these changes? (There are a lot of queries of this type, and they all have 100+ items.)
Thanks in advance.
The only way you'll get a definitive answer is to run both of these methods and profile it to see how long they take. With that said, I'm confident that running one UPDATE statement with a hundred name value pairs will be faster than running 100 UPDATE statements.
Don't run 100 separate UPDATE statements!
Use a MySQL wrapper class which, when given an array of name => value pairs, will return an SQL UPDATE statement. It's really simple. I'm just looking for the one we use now...
We use something like this (registration required) but adapted a little more to suit our needs. Really basic but very very handy.
For instance, the Update method is just this
/**
 * Generate SQL Update Query
 * @param string $table Target table name
 * @param array  $data  SQL Data (ColumnName => ColumnValue)
 * @param string $cond  SQL Condition
 * @return string
 **/
function update($table, $data, $cond = '')
{
    $sql = "UPDATE $table SET ";
    if (is_string($data)) {
        $sql .= $data;
    } else {
        foreach ($data as $k => $v) {
            $sql .= "`" . $k . "`" . " = " . SQL::quote($v) . ",";
        }
        $sql = SQL::trim($sql, ',');
    }
    if ($cond != '') $sql .= " WHERE $cond";
    $sql .= ";";
    return $sql;
}
If you can't change the code, make sure it is enclosed in a transaction (if the storage engine is InnoDB), so that non-unique indexes will not be updated before the transaction is committed (this will speed up the writes) and the new rows won't be flushed to disk on every statement.
If this is a MyISAM table, use UPDATE LOW_PRIORITY, or lock the table before the loop and unlock it after the loop.
Of course, I'm sure you have an index on the username column, but just to mention it - you need such an index.
I'm reverse engineering the relationships between a medium-sized number of tables (50+) in an Oracle database where there are no foreign keys defined between the tables. I can count (somewhat) on being able to match column names across tables. For example, column name "SomeDescriptiveName" is probably the same across the set of tables.
What I would like to be able to do is to find a better way of extracting some set of relationships based on those matching column names than manually going through the tables one by one. I could do something with Java DatabaseMetaData methods, but it seems like this is one of those tasks that someone has probably had to script before. Maybe extract the column names with Perl or some other scripting language, use the column names as hash keys, and add tables to an array pointed to by the hash key?
Does anyone have any tips or suggestions that might make this simpler or provide a good starting point? It's an ugly task; if foreign keys had already been defined, understanding the relationships would have been much easier.
Thanks.
You pretty much wrote the answer in your question.
# Assumes @tables holds objects that know their own name and column list
# (however you choose to pull that from the data dictionary).
my %column_tables;

foreach my $table (@tables) {
    foreach my $column ($table->columns) {
        push @{ $column_tables{$column} }, $table;
    }
}

print "Likely foreign key relationships:\n";
foreach my $column (keys %column_tables) {
    my @tables = @{ $column_tables{$column} };
    next if @tables < 2;

    print $column, ': ';
    foreach my $table (@tables) {
        print $table->name, ' ';
    }
    print "\n";
}
My strategy would be to use the Oracle system catalog to find columns that have the same column name and data type but belong to different tables, and also to check which of the columns is part of a table's primary or unique key.
Here's a query that may be close to doing this, but I don't have an Oracle instance handy to test it:
SELECT col1.table_name || '.' || col1.column_name || ' -> '
    || col2.table_name || '.' || col2.column_name
FROM all_tab_columns col1
JOIN all_tab_columns col2
  ON (col1.column_name = col2.column_name
  AND col1.data_type = col2.data_type)
JOIN all_cons_columns cc
  ON (col2.table_name = cc.table_name
  AND col2.column_name = cc.column_name)
JOIN all_constraints con
  ON (cc.constraint_name = con.constraint_name
  AND cc.table_name = con.table_name
  AND con.constraint_type IN ('P', 'U'))
WHERE col1.table_name != col2.table_name;
Of course, this won't catch columns that are related but have different names.
You can use a combination of three (or four) approaches, depending on how obfuscated the schema is:

Dynamic methods (observation):
- enable tracing in the RDBMS (or ODBC layer), then
- perform various activities in the application (ideally record creation), then
- identify which tables were altered in tight sequence, and with what column-value pairs;
- values occurring in more than one column during the sequence interval may indicate a foreign key relationship.

Static methods (just analyzing existing data, no need to have a running application):
- nomenclature: try to infer relationships from column names;
- statistical: look at the minimum/maximum (and possibly the average) of the unique values in all numerical columns, and attempt to match them across tables (see the sketch after this list).

Code reverse engineering: your last resort (unless dealing with scripts) - not for the faint of heart :)
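To illustrate the statistical idea, here is a rough sketch of my own. It assumes a DBI connection (the DSN and the hard-coded table/column pairs below are placeholders) and simply groups numeric columns whose MIN/MAX ranges coincide:

use strict;
use warnings;
use DBI;

# Sketch only: the connection details and the ($table, $column) pairs in
# @numeric_columns would come from your environment (e.g. ALL_TAB_COLUMNS).
my $dbh = DBI->connect('dbi:Oracle:MYDB', 'USER', 'PASS', { RaiseError => 1 });
my @numeric_columns = ( ['ORDERS', 'CUSTOMER_ID'], ['CUSTOMERS', 'ID'] );

my %by_range;
for my $pair (@numeric_columns) {
    my ($table, $column) = @$pair;
    my ($min, $max) = $dbh->selectrow_array(
        "SELECT MIN($column), MAX($column) FROM $table"
    );
    next unless defined $min;
    # Columns sharing the same value range are candidate relationships
    push @{ $by_range{"$min:$max"} }, "$table.$column";
}

for my $range (keys %by_range) {
    my @cols = @{ $by_range{$range} };
    print "Possible match ($range): @cols\n" if @cols > 1;
}

$dbh->disconnect;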
This is an interesting question. The approach I took was a brute force search for columns that matched types and values for a small sample set. You'll probably have to tweak the heuristics to provide good results for your schema. I ran this on a schema that didn't use auto-incremented keys and it worked well. The code is written for MySQL, but it's very easy to adapt to Oracle.
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");
my @list;

foreach my $table (show_tables()) {
    foreach my $column (show_columns($table)) {
        push @list, { table => $table, column => $column };
    }
}

foreach my $m (@list) {
    my @match;
    foreach my $f (@list) {
        if (($m->{table} ne $f->{table}) &&
            ($m->{column}{type} eq $f->{column}{type}) &&
            (samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
        {
            # For better confidence, add other heuristics such as
            # joining the tables and verifying that every value
            # appears in the master. Also it may be useful to exclude
            # columns in large tables without an index, although that
            # heuristic may fail for composite keys.
            #
            # Heuristics such as columns having the same name are too
            # brittle for many of the schemas I've worked with. It may
            # be too much to even require identical types.
            push @match, "$f->{table}.$f->{column}{name}";
        }
    }
    if (@match) {
        print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
    }
}

$dbh->disconnect();
exit;

sub show_tables {
    my $result = query("show tables");
    return ($result) ? @$result : ();
}

sub show_columns {
    my ($table) = @_;
    my $result = query("desc $table");
    my @columns;
    if ($result) {
        @columns = map {
            { name    => $_->[0],
              type    => $_->[1],
              samples => query("select distinct $_->[0] from $table limit 10") }
        } @$result;
    }
    return @columns;
}

sub samples_found {
    my ($table, $column, $samples) = @_;
    foreach my $v (@$samples) {
        my $result = query("select count(1) from $table where $column=?", $v);
        if (!$result || $result->[0] == 0) {
            return 0;
        }
    }
    return 1;
}

sub query {
    my ($sql, @binding) = @_;
    my $result = $dbh->selectall_arrayref($sql, undef, @binding);
    # Flatten single-column results into a simple list of values
    if ($result && $result->[0] && @{$result->[0]} == 1) {
        foreach my $row (@$result) {
            $row = $row->[0];
        }
    }
    return $result;
}