Best way to compare two hashes of hashes? - sql

Right now I have two hashes of hashes, one that I created by parsing a log file and one that I grab from SQL. I need to compare them to find out if the record from the log file already exists in the database. Right now I am iterating through each element to compare them:
foreach my $i (@record)
{
    foreach my $a (@{$data})
    {
        if ($i->{port} eq $a->{port} and $i->{name} eq $a->{name})
        {
            print "match found $i->{name}, updating record in table\n";
        }
        else
        {
            print "no match found for $tableDate $i->{port} $i->{owner} $i->{name} adding record to table\n";
            executeStatement("INSERT INTO client_usage (date, port, owner, name, emailed) VALUES (\'$tableDate\', \'$i->{port}\', \'$i->{owner}\', \'$i->{name}\', '0')");
        }
    }
}
Naturally, this takes a long time to run through as the database gets bigger. Is there a more efficient way of doing this? Can I compare the keys directly?

You have more than a hash of hashes. You have two lists, and each element in each list is a hash. Thus, you have to compare each item in one list with each item in the other list. Your algorithm's efficiency is O(n²) -- not because it's a hash of hashes, but because you're comparing each row in one list with each row in the other list.
Is it possible to go through your lists and turn each into a hash that is keyed by the port and name? That way, you go through each list once to create the indexing hash, then go through the hash once to do the comparison.
For example, to create the hash from the record:
my %record_hash;
foreach my $record_item (@record) {
    my $port = $record_item->{port};
    my $name = $record_item->{name};
    $record_hash{"$port:$name"} = $record_item;   # Or something like this...
}
Next, you'd do the same for your data list:
my %data_hash;
foreach my $data_item (@{$data}) {
    my $port = $data_item->{port};
    my $name = $data_item->{name};
    $data_hash{"$port:$name"} = $data_item;   # Or something like this...
}
Now you can go through your newly created hash just once:
foreach my $key (keys %record_hash) {
    my $i = $record_hash{$key};
    if (exists $data_hash{$key}) {
        print "match found $i->{name}, updating record in table\n";
    }
    else {
        print "no match found for $tableDate $i->{port} $i->{owner} $i->{name} adding record to table\n";
        executeStatement("INSERT INTO client_usage (date, port, owner, name, emailed) VALUES (\'$tableDate\', \'$i->{port}\', \'$i->{owner}\', \'$i->{name}\', '0')");
    }
}
Let's say you have 1000 elements in one list and 500 elements in the other. Your original algorithm would have to loop 500 * 1000 times (half a million times). By creating the index hashes, you loop 500 + 1000 times to build them, plus up to 1000 more times to do the comparison -- a few thousand iterations instead of half a million.
Another possibility: since you're already using a SQL database, why not do the whole thing as SQL queries? That is, don't fetch all the records up front. Instead, go through your data and, for each data item, fetch the matching record. If the record exists, you update it; if not, you create a new one. That may be even faster, because you're not turning the whole table into a list in order to turn it into a hash.
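A rough sketch of that per-record approach with DBI and placeholders (assuming a DBI handle $dbh rather than your executeStatement helper, and guessing at which columns an "update" should touch; adjust to your schema):
my $check  = $dbh->prepare('SELECT 1 FROM client_usage WHERE date = ? AND port = ? AND name = ?');
my $update = $dbh->prepare('UPDATE client_usage SET owner = ?, emailed = 0 WHERE date = ? AND port = ? AND name = ?');
my $insert = $dbh->prepare('INSERT INTO client_usage (date, port, owner, name, emailed) VALUES (?, ?, ?, ?, 0)');

foreach my $i (@record) {
    # one indexed lookup per log record instead of scanning the whole table in Perl
    $check->execute($tableDate, $i->{port}, $i->{name});
    if ($check->fetchrow_array) {
        $update->execute($i->{owner}, $tableDate, $i->{port}, $i->{name});
    }
    else {
        $insert->execute($tableDate, $i->{port}, $i->{owner}, $i->{name});
    }
    $check->finish;
}
With an index on (date, port, name), each lookup is a quick indexed probe rather than a full scan done in Perl.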
There's a way to tie SQL databases directly to hashes. That might be a good way to go too.
Are you using Perl-DBI?
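For instance, the CPAN module Tie::DBI ties a table to a Perl hash keyed by its primary key. A minimal sketch, assuming a MySQL database and that client_usage has an id primary key (the DSN and credentials are placeholders):
use Tie::DBI;

tie my %client_usage, 'Tie::DBI', {
    db       => 'dbi:mysql:mydb',
    user     => 'someuser',
    password => 'somepass',
    table    => 'client_usage',
    key      => 'id',
};

# each hash entry is a row, keyed by the primary key
print "row 42 exists\n" if exists $client_usage{42};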

How about using Data::Difference:
use Data::Difference qw(data_diff);
my @diff = data_diff(\%hash_a, \%hash_b);
@diff = (
    { 'a' => 'value', 'path' => [ 'data' ] }, # exists in 'a' but not in 'b'
    { 'b' => 'value', 'path' => [ 'data' ] }, # exists in 'b' but not in 'a'
);

Related

Get the difference from data in Swift and data in database

A table in the database has two columns: ID and Value.
In my project there is other data in a dictionary whose key is ID and whose value is Value.
I want to get the difference: data that is in the dictionary but not in the database. If both sets of data were in the database, I could use the SQL commands "EXCEPT" or "NOT EXISTS" to get the difference, as in the image below.
What is the best way to do this?
I use SQLiteDB, and the result of a query looks like this:
[["ID":"id1", "Value": "val1"], ["ID":"id2", "Value": "val2"],...]
Also notice that both columns should be considered when comparing these two data sets (the dictionary and the data in the db).
// here we get the intersection of the new data and the data we already have
let intersection = dataSet1.filter { data in dataSet2.contains(where: { $0.id == data.id }) }
// keep only the elements of dataSet1 that do NOT belong to the intersection
let dataSet1MinusDataSet2 = dataSet1.filter { data in !intersection.contains(where: { $0.id == data.id }) }
I've written this code without Xcode, so syntax errors are possible, but I think you will get the idea.

sql.eachRow only adds the last record into a list

Good day, I'm trying to add all the users from my db to a list and print it out in a frame. But the problem is that I am only retrieving the LAST record of the users table. The others are being ignored. Here's my code
table(selectionMode: ListSelectionModel.SINGLE_SELECTION) {
    sql.eachRow("select * from users") { row ->
        println row
        def staffList = []
        staffList.add(uname: row.uname, pwd: row.pwd)
        tableModel(list: staffList) {
            closureColumn(header: 'Username', read: { row1 -> return row1.uname })
            closureColumn(header: 'Password', read: { row1 -> return row1.pwd })
        }
    }
}
I think the problem is that you are redefining the staffList list inside the loop. Move it to before the loop, and you may have better results.

Is there a way to fetch records in bundles from a database?

I have a large MySQL database (several hundred thousand records). I use PDO to access it. I need to fetch data in units of approximately 100 records.
PDO::fetchAll results in too many records and exhausts the PC's memory.
PDO::fetch gets me one record only.
Is there a way to request the next n (say 100) records?
Thanks
PDO::fetch gets me one record only.
You can always make another call to fetch to get another record, and so on until all the records have been fetched, just as shown in every example:
while ($row = $stmt->fetch()) {
    print $row[0];
}
Note that you may also set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false in order to reduce the memory consumption.
The MySQL client-server protocol would allow fetching a certain number (>1) of rows from the result set of a statement in a single response packet (if the data fits within the 2^24-1 byte limit). The COM_STMT_FETCH command has the field num rows, which in the OP's case could be set to 100.
But the mysqlnd implementation currently sets this field explicitly and exclusively to 1.
So, yes, currently your only option seems to be an unbuffered query and living with the (small) network overhead of fetching the records one by one, e.g.
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array(
    PDO::ATTR_EMULATE_PREPARES => false,
    PDO::MYSQL_ATTR_DIRECT_QUERY => false,
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
));
setup($pdo);

// use an unbuffered query
// so the complete result set isn't transferred into the php instance's memory
// before ->execute() returns
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->prepare('SELECT id FROM soFoo WHERE id > ?');
$stmt->execute(array(5));

$rowsPerChunk = 10;
do {
    // not saying that you have to use LimitIterator, it's just one example of how you can process the chunks
    // and a demonstration that PDOStatement implements Traversable
    // all in all the example doesn't do very useful things ;-)
    $lit = new LimitIterator(new IteratorIterator($stmt), 0, $rowsPerChunk);
    doSomething($lit);
} while ($lit->getPosition() === $rowsPerChunk);

function doSomething(Iterator $it) {
    foreach ($it as $row) {
        printf('%2d ', $row['id']); // yeah, yeah, not that useful....
    }
    echo "\r\n-------\r\n";
}

function setup($pdo) {
    $pdo->exec('
        CREATE TEMPORARY TABLE soFoo (
            id int auto_increment,
            primary key(id)
        )
    ');
    $stmt = $pdo->prepare('INSERT INTO soFoo VALUES ()');
    for ($i = 0; $i < 44; $i++) {
        $stmt->execute();
    }
}

How can I store the results of a SQL query as a hash with unique keys?

I have a query that returns multiple rows:
select id,status from store where last_entry = <given_date>;
The returned rows look like:
id status
-----------------
1131A correct
1132B incorrect
1134G empty
I want to store the results like this:
$rows = [
{
ID1 => '1131A',
status1 => 'correct'
},
{
ID2 => '1132B',
status2 => 'incorrect'
},
{
ID3 => '1134G',
status3 => 'empty'
}
];
How can I do this?
What you are looking for is a hash of hashes in Perl. What you do is:
1. Iterate over the results of your query.
2. Split each entry by tab.
3. Create a hash with the id as key and the status as value.
Now, to store the hash created by each such query, you create another hash. Here the key could be something like the given_date in your case, so you could write
$parent_hash{$given_date} = \%child_hash;
This results in the parent hash holding a reference to each query's result.
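A minimal sketch of those steps (the tab-separated input lines and the @query_results, %child_hash, and %parent_hash names are just for illustration):
my %parent_hash;
my %child_hash;
foreach my $line (@query_results) {        # 1. iterate over the results of your query
    chomp $line;
    my ($id, $status) = split /\t/, $line; # 2. split each entry by tab
    $child_hash{$id} = $status;            # 3. id as key, status as value
}
$parent_hash{$given_date} = \%child_hash;  # keep this query's result under its date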
For more you can refer to these resources:
http://perldoc.perl.org/perlref.html
http://www.thegeekstuff.com/2010/06/perl-array-reference-examples/
Have a look at the DBI documentation.
Here is the part of a script that does what you want:
my $rows;
while (my $hash_ref = $sth->fetchrow_hashref) {
    push @$rows, $hash_ref;
}
You can do this by passing a Slice option to DBI's selectall_arrayref:
my $results = $dbh->selectall_arrayref(
    'select id,status from store where last_entry = ?',
    { Slice => {} },
    $last_entry
);
This will return an array reference with each row stored in a hash. Note that since hash keys must be unique, you will run into problems if you have duplicate column names in your query.
This is the kind of question that raises an immediate red flag. It's somewhat of an odd request to want a collection (array/array reference) of data structures that are heterogeneous; the whole point of a collection is that its elements share the same shape. If you tell us what you intend to do with the data rather than what you want the data to look like, we can probably suggest a better solution.
You want something like this:
# select the data as an array of hashes - returned as an arrayref
my $rows = $dbh->selectall_arrayref($the_query, {Slice => {}}, @any_search_params);

# now make the id keys unique
my $i = 1;
foreach my $row (@$rows) {
    # remove each column and assign the value to a uniquely named column
    # by adding a numeric suffix
    $row->{"ID" . $i}     = delete $row->{ID};
    $row->{"status" . $i} = delete $row->{status};
    $i += 1;
}
Add your own error checking.
You said "save as a hash," but your example is an array of hashes; a true hash of hashes would need a slightly different method, for example:
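A rough sketch of that variant, using DBI's selectall_hashref to key the outer hash by the id column (the query and $dbh are taken from the answer above; adjust the key column as needed):
my $by_id = $dbh->selectall_hashref(
    'select id, status from store where last_entry = ?',
    'id',          # key the outer hash by the id column
    undef,
    $last_entry,
);
# $by_id->{'1131A'}{status} would then be 'correct'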

(drupal) a difficult piece of code to understand: get the titles of articles under the same term

if ($node->taxonomy) {
    $query = 'SELECT DISTINCT(t.nid), n.nid, n.title FROM {node} n INNER JOIN {term_node} t ON n.nid = t.nid WHERE n.nid != %d AND (';
    $args = array($node->nid);
    $tids = array();
    foreach ($node->taxonomy as $term) {
        $tids[] = 't.tid = %d';
        $args[] = $term->tid;
    }
    $query .= implode(' OR ', $tids) . ')';
    $result = db_query_range($query, $args, 0, 10);
    while ($o = db_fetch_object($result)) {
        echo l($o->title, 'node/' . $o->nid);
    }
}
The code is from a Drupal guru. It is used in node.tpl.php to get the titles of articles under the same term. I have researched it for two days and understand some parts of it, but I still don't understand the principle behind the code. I hope someone can explain it in more detail for me. Many thanks.
Short version:
It gets the array of tags of the node, retrieves the first 10 nodes that use at least one of these tags and outputs a link for each of these 10 results.
Detailed version:
First of all, the variable "$node" is an object that contains the data about a specific node (e.g. a Page or Story node).
For example, "$node->title" would be the title of that node.
"$node->taxonomy" tests is that node is tagged (because if it has no tags, it cannot retrieve the other nodes using the same tag(s).
When there is one or several tags associated with that node/page/story, $node->taxonomy is an array .
Now about the SQL query:
"node" is the database table that stores the base fields (non-CCK) of every node.
"term_node" is the database table that contains the combination of tag (which is called a "taxonomy term") and node.
In both tables, "nid" is the "unique Node ID" (which is an internal autoincremented number). Because this column is in both tables, this is how the tables are joined together.
In "term_node", "tid" is the "unique Term ID" (which is also an internal autoincremented number).
The "node" table is aliased "n", therefore "n.nid" means "the Node ID stored in table node".
The "term_node" table is aliased "t", therefore "t.tid" means "the Term ID stored in table term_node".
The "foreach" loop goes thru the array of tags to extract the TermID of each tag used by the node in order to add it in the SQL query, and implode converts to a string.
The loop stores a piece of SQL query for each tag in variable $tids and stores the actual value in variable $args because Drupal database calls are safer when the arguments are passed separately from the SQL query: "%d" means "integer number".
"db_query_range" is a function that selects multiple rows in the database: here, "0 10" means "retrieve the first 10 results".
"db_fetch_object" in the "while" loop retrieves each result and stores it in the variable "$o", which is an object.
Therefore "$o->title" contains the value of the column "title" retrieved by the SQL query.
The function "l" is the drupal functin that creates the code for an HTML link: the first argument is the name of the link, the second argument is the drupal path: in Drupal, any node can be accessed by default using "www.yoursite.com/node/NodeID",
which is why it gives the path "node/123" (where 123 is the "Node ID").
This function is useful because it transparently handles custom paths, so if your node has a custom path to access it using "www.yoursite.com/my-great-page" instead, it will create a link to that page instead of "www.yoursite.com/node/123" automatically.
I wouldn't exactly call the guy who wrote this a guru; you could do this a lot more cleanly. Anyway, what he does is create a query that looks like this:
SELECT DISTINCT(t.nid), n.nid, n.title FROM {node} n
INNER JOIN {term_node} t ON n.nid = t.nid
WHERE n.nid != %d
AND (t.tid = %d OR t.tid = %d OR ... t.tid = %d);
The end result is that he selects all the node ids and titles (only once) that share at least one term with the selected node, but isn't the node itself.