I have several semantic triples. Some examples:
Porky,species,pig // Porky's species is "pig"
Bob,sister,May // Bob's sister is May
May,brother,Sam // May's brother is Sam
Sam,wife,Jane // Sam's wife is Jane
... and so on ...
I store each triple in 6 different hashes. Example:
$ijk{Porky}{species}{pig} = 1;
$ikj{Porky}{pig}{species} = 1;
$jik{species}{Porky}{pig} = 1;
$jki{species}{pig}{Porky} = 1;
$kij{pig}{Porky}{species} = 1;
$kji{pig}{species}{Porky} = 1;
This lets me efficiently ask questions like:
What species is Porky? (keys %{$ijk{Porky}{species}})
List all pigs (keys %{$jki{species}{pig}})
What information do I have on Porky? (keys %{$ijk{Porky}})
List all species (keys %{$jik{species}})
and so on. Note that none of the examples above go through a list one element at a time. They all take me "instantly" to my answer. In other words, each answer is a hash value. Of course, the answer itself may be a list, but I don't traverse any lists to get to that answer.
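If the main concern is the six separate declarations, you can keep the same six indexes inside one hash and generate them from a table of permutations. This is a minimal sketch in plain Perl with core modules only. The names %index, %order and add_triple are hypothetical. It saves no memory; it only removes the repetition.

```perl
use strict;
use warnings;

# Hypothetical: all six indexes live in one hash, keyed by permutation name.
my %index;

# Positions of (subject, predicate, object) used by each permutation.
my %order = (
    ijk => [0, 1, 2], ikj => [0, 2, 1],
    jik => [1, 0, 2], jki => [1, 2, 0],
    kij => [2, 0, 1], kji => [2, 1, 0],
);

# Store one triple under all six orderings.
sub add_triple {
    my @triple = @_;
    for my $perm (keys %order) {
        my ($x, $y, $z) = @triple[ @{ $order{$perm} } ];
        $index{$perm}{$x}{$y}{$z} = 1;
    }
}

add_triple(split /,/) for 'Porky,species,pig', 'Bob,sister,May',
                          'May,brother,Sam', 'Sam,wife,Jane';

# Same lookups as with the separate hashes, one level deeper.
print "Porky is a: ", join(',', keys %{ $index{ijk}{Porky}{species} }), "\n";
print "All pigs: ",   join(',', keys %{ $index{jki}{species}{pig} }), "\n";
```

Looking up a key that is not there (e.g. `$index{ijk}{Nobody}{species}`) autovivifies the intermediate hashes. Check with `exists` at each level if that matters.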
However, defining 6 separate hashes seems really inefficient. Is there
an easier way to do this without using an external database engine
(for this question, SQLite3 counts as an external database engine)?
Or have I just replicated a small subset of SQL into Perl?
EDIT: I guess what I'm trying to say: I love associative arrays, but they seem to be the wrong data structure for this job. What's the right data structure here, and what Perl module implements it?
Have you looked at using RDF::Trine? It has DBI-backed stores, but it also has in-memory stores. If you need persistence, it can parse and serialize RDF/XML, Turtle, N-Triples, etc.
Example:
use strict;
use warnings;
use RDF::Trine qw(statement literal);
my $ns = RDF::Trine::Namespace->new("http://example.com/");
my $data = RDF::Trine::Model->new;
$data->add_statement(statement $ns->Peppa, $ns->species, $ns->Pig);
$data->add_statement(statement $ns->Peppa, $ns->name, literal 'Peppa');
$data->add_statement(statement $ns->George, $ns->species, $ns->Pig);
$data->add_statement(statement $ns->George, $ns->name, literal 'George');
$data->add_statement(statement $ns->Suzy, $ns->species, $ns->Sheep);
$data->add_statement(statement $ns->Suzy, $ns->name, literal 'Suzy');
print "Here are the pigs...\n";
for my $pig ($data->subjects($ns->species, $ns->Pig)) {
    my ($name) = $data->objects($pig, $ns->name);
    print $name->literal_value, "\n";
}
print "Let's dump all the data...\n";
my $ser = RDF::Trine::Serializer::Turtle->new;
print $ser->serialize_model_to_string($data), "\n";
RDF::Trine is quite a big framework, so it has a bit of a compile-time penalty. At run time, though, it's relatively fast.
RDF::Trine can be combined with RDF::Query if you wish to query your data using SPARQL.
use RDF::Query;
my $q = RDF::Query->new('
PREFIX : <http://example.com/>
SELECT ?name
WHERE {
?thing :species :Pig ;
:name ?name .
}
');
my $r = $q->execute($data);
print "Here are the pigs...\n";
while (my $row = $r->next) {
    print $row->{name}->literal_value, "\n";
}
RDF::Query supports both SPARQL 1.0 and SPARQL 1.1. RDF::Trine and RDF::Query are both written by Gregory Williams, who was a member of the SPARQL 1.1 Working Group. RDF::Query was one of the first implementations to achieve 100% on the SPARQL 1.1 Query test suite. (It may have even been the first?)
"Inefficient" isn't really the right word here: you're trading memory for speed, which is generally how it works.
The only real alternative is to store each triple once, as a distinct value, and then keep just three "indexes" into them:
my $row = [ "Porky", "species", "pig" ];
push @{$subject_index{Porky}}, $row;
push @{$relation_index{species}}, $row;
push @{$target_index{pig}}, $row;
To do something like "list all pigs", you would have to find the intersection of $relation_index{species} and $target_index{pig}. You can do that manually or with your favorite set implementation.
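Here is a sketch of the "store once, index three ways" idea with a manual intersection. Each triple is a single array reference shared by all three indexes. A reference stringifies to a unique address, so it can serve as a hash key for the membership test. The sample data and the helper name intersect_rows are made up for illustration.

```perl
use strict;
use warnings;

my (%subject_index, %relation_index, %target_index);

# Each triple is stored once; all three indexes hold the same array reference.
for my $line ('Porky,species,pig', 'Babe,species,pig',
              'Sylvester,species,cat', 'Porky,color,pink') {
    my $row = [ split /,/, $line ];
    push @{ $subject_index{ $row->[0] } },  $row;
    push @{ $relation_index{ $row->[1] } }, $row;
    push @{ $target_index{ $row->[2] } },   $row;
}

# Hypothetical helper: rows present in both lists, in the order of $left.
# A reference stringifies to a unique address (e.g. "ARRAY(0x...)"),
# so it works as a hash key for the membership test.
sub intersect_rows {
    my ($left, $right) = @_;
    my %in_right;
    $in_right{$_} = 1 for @{ $right || [] };
    return grep { $in_right{$_} } @{ $left || [] };
}

# "List all pigs": rows with relation 'species' AND target 'pig'.
my @pig_rows = intersect_rows($relation_index{species}, $target_index{pig});
my @pigs     = map { $_->[0] } @pig_rows;
print "All pigs: @pigs\n";   # All pigs: Porky Babe
```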
Then wrap it all up in a nice object interface, and you've basically implemented INNER JOIN. :)
A single hash of hashes should be sufficient:
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dump qw(dump);
my %data;
while (<DATA>) {
    chomp;
    my ($name, $type, $value) = split ',';
    $data{$name}{$type} = $value;
}
# What species is Porky?
print "Porky's species is: $data{Porky}{species}\n";
# List all pigs
print "All pigs: " . join(',', grep {defined $data{$_}{species} && $data{$_}{species} eq 'pig'} keys %data) . "\n";
# What information do I have on Porky?
print "Info on Porky: " . dump($data{Porky}) . "\n";
# List all species
print "All species: " . join(',', uniq grep defined, map $_->{species}, values %data) . "\n";
__DATA__
Porky,species,pig
Bob,sister,May
May,brother,Sam
Sam,wife,Jane
Outputs:
Porky's species is: pig
All pigs: Porky
Info on Porky: { species => "pig" }
All species: pig
I think you are mixing categories and values, such as name=Porky and species=pig.
Given your example, I'd go with something like this:
my %hash;
$hash{name}{Porky}{species}{pig} = 1;
$hash{species}{pig}{name}{Porky} = 1;
$hash{name}{Bob}{sister}{May} = 1;
$hash{sister}{May}{name}{Bob} = 1;
$hash{name}{May}{brother}{Sam} = 1;
$hash{brother}{Sam}{name}{May} = 1;
$hash{name}{Sam}{wife}{Jane} = 1;
$hash{wife}{Jane}{name}{Sam} = 1;
Yes, this has some apparent redundancy, since we can easily distinguish most names from other values. But the 3rd-level hash key is also a top-level hash key, which can be used to get more information on an element.
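The sketch below shows how the questions from the original post read against this layout. It uses the assignments above; the query variable names are just for illustration.

```perl
use strict;
use warnings;

my %hash;
$hash{name}{Porky}{species}{pig} = 1;
$hash{species}{pig}{name}{Porky} = 1;
$hash{name}{Bob}{sister}{May}    = 1;
$hash{sister}{May}{name}{Bob}    = 1;
$hash{name}{May}{brother}{Sam}   = 1;
$hash{brother}{Sam}{name}{May}   = 1;

# What species is Porky?
my @species = keys %{ $hash{name}{Porky}{species} };

# List all pigs: enter through the "species" top-level key instead.
my @pigs = keys %{ $hash{species}{pig}{name} };

# Follow a link: Bob's sister is the value "May", and the same string is a
# key under $hash{name}, so the structure also tells us what we know about her.
my ($sister)  = keys %{ $hash{name}{Bob}{sister} };
my @relations = keys %{ $hash{name}{$sister} };

print "Porky is a: @species\n";     # Porky is a: pig
print "Pigs: @pigs\n";              # Pigs: Porky
print "$sister has: @relations\n";  # May has: brother
```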
Or have I just replicated a small subset of SQL into Perl?
It's pretty easy to start using actual SQL with an in-memory SQLite database.
#!/usr/bin/perl
use warnings; use strict;
use DBI;
my $dbh = DBI->connect("dbi:SQLite::memory:", "", "", {
sqlite_use_immediate_transaction => 0,
RaiseError => 1,
});
$dbh->do("CREATE TABLE triple(subject, predicate, object)");
$dbh->do("CREATE INDEX triple_subject ON triple(subject)");
$dbh->do("CREATE INDEX triple_predicate ON triple(predicate)");
$dbh->do("CREATE INDEX triple_object ON triple(object)");
for ([qw<Porky species pig>],
     [qw<Porky color pink>],
     [qw<Sylvester species cat>]) {
    $dbh->do("INSERT INTO triple(subject, predicate, object) VALUES (?, ?, ?)", {}, @$_);
}
use JSON;
# Bind 'species' as a placeholder value; canonical => 1 sorts the JSON keys
print to_json( $dbh->selectall_arrayref('SELECT * FROM triple WHERE predicate = ?', {Slice => {}}, 'species'), { canonical => 1 } );
Gives:
[{"object":"pig","predicate":"species","subject":"Porky"},
{"object":"cat","predicate":"species","subject":"Sylvester"}]
You can then query and index the data in a familiar way, and it scales well.
I have a list of variables to loop through against the database. How can I detect that a variable is not in the database? What query should I use? How do I print an error message once I detect that a variable is not in the database?
My Code:
foreach my $sql (@Records) {
    $variable = $sql->{'variable'};
    # The statement below selects the variable if it exists.
    # What should I change to make it select a variable that does not exist?
    $sqlMySQL = "Select LOT from table where LOT like '%$variable%'";
}
# If not exist {
#     print("Not exist")
# }
Expected result:
As $variable loops through the database, if the $variable does not exist in the database, print out the $variable or "Not exist".
Thanks for viewing, comments and answers.
I would go about it something like this:
A list of variables - Place those variables in an array (aka a list).
What query should I use - One that selects only what you need and stores it in the dataset best suited to lookups (selectall_hashref).
While the $variable loops through the database - That would require a DBI call for each $variable, so instead loop through your array and check each value's existence in the hash.
EXAMPLE
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbh =
  DBI->connect( "dbi:SQLite:dbname=test.db", "USERNAME", "PASSWORD",
    { RaiseError => 1 },
  ) or die $DBI::errstr;
my @vars = ( 25, 30, 40 );
# Key the result on the LOT column so each check is a single exists() lookup
my $hash_ref =
  $dbh->selectall_hashref( 'SELECT LOT FROM table', 'LOT' );
$dbh->disconnect();
foreach (@vars) {
    if ( exists $hash_ref->{$_} ) {
        print $_ . "\n";
    }
    else {
        print "$_ does not exist\n";
    }
}
Something like that will pull all the LOT column values from your table into a hash keyed by LOT. You can then check that hash against your array in a foreach loop.
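If you want the list of missing values rather than printing as you go, `grep` with `exists` collects them in one pass. In the sketch below, the hash is a hard-coded stand-in for what selectall_hashref would return when keyed on LOT, so the snippet runs without a database. The values are made up.

```perl
use strict;
use warnings;

# Stand-in for $dbh->selectall_hashref('SELECT LOT FROM table', 'LOT'):
# made-up data in which LOT values 25 and 40 exist in the table.
my $hash_ref = { 25 => { LOT => 25 }, 40 => { LOT => 40 } };

my @vars = ( 25, 30, 40 );

# Keep only the values that have no matching key in the hash.
my @missing = grep { !exists $hash_ref->{$_} } @vars;

print "$_ does not exist\n" for @missing;   # 30 does not exist
```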
I'm trying to retrieve a bunch of rows using SQL (as a test, let's say 1000 rows in each iteration, up to a million rows). I want to store them in a file in batches to avoid an out-of-memory issue. In my case it's a .store file, but it could be a text file; that doesn't matter. I am using SQL within a Perl script.
I'd appreciate it if anyone could share an example.
An example would be:
sub query {
    my $test = "select * from employees";
    return $test;
}
# later in the code:
my $dataset = DBUtils::make_database_iterator({query => query()});
}
store $dataset, $result_file;
return;
The best I can offer you with the limited amount of information you have given is this, which uses the SELECT statement's LIMIT clause to retrieve a limited number of rows from the table.
Obviously you will have to provide actual values for the DSN, the name of the table, and the store_block subroutine yourself.
use strict;
use warnings;
use autodie;
use DBI;
my $blocksize = 1000;
my ($dsn, $user, $pass) = (...);
my $dbh = DBI->connect($dsn, $user, $pass);
my $sth = $dbh->prepare('SELECT * FROM table LIMIT ? OFFSET ?') or die $DBI::errstr;
open my $fh, '>', 'test.store';
for (my $n = 0; $sth->execute($blocksize, $n * $blocksize); ++$n) {
    my $block = $sth->fetchall_arrayref;
    last unless @$block;
    store_block($block, $fh);
}
close $fh;
sub store_block {
    my ($block, $fh) = @_;
    ...
}
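store_block is left open above. Since the question mentions a .store file, one possible sketch uses the core Storable module. store_fd writes each block as its own image to the already-open handle, and fd_retrieve reads the images back one at a time. The read_blocks helper and the sample rows are hypothetical. The file should be opened in binary mode for this.

```perl
use strict;
use warnings;
use Storable qw(store_fd fd_retrieve);
use File::Temp qw(tempfile);

# One possible store_block: append the block to the open handle as its own
# Storable image. Several images can be written back to back on one handle.
sub store_block {
    my ($block, $fh) = @_;
    store_fd($block, $fh) or die "Cannot store block: $!";
}

# Hypothetical reader: pull the images back out one at a time until EOF.
sub read_blocks {
    my ($fh) = @_;
    my @blocks;
    push @blocks, fd_retrieve($fh) until eof $fh;
    return @blocks;
}

# Usage with made-up rows in place of $sth->fetchall_arrayref results.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
store_block([ [1, 'Ann'], [2, 'Bob'] ], $fh);
store_block([ [3, 'Cid'] ], $fh);
close $fh;

open my $in, '<:raw', $filename or die "Cannot open $filename: $!";
my @blocks = read_blocks($in);
close $in;
print scalar(@blocks), " blocks read\n";
```

Because each block is a separate image, the reader only ever holds one block in memory at a time if it processes them inside the loop instead of collecting them.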
You say you want to work in batches to avoid an out of memory error. This suggests you're doing something like this...
my @all_the_rows = query_the_database($sql);
store_stuff(@all_the_rows);
You want to avoid doing that as much as possible, for exactly the reason you gave: if the dataset grows large, you might run out of memory.
Instead, you can read one row at a time and write one row at a time using DBI.
use strict;
use warnings;
use DBI;
# The file you're writing results to
my $file = '...';
# Connect to the database using DBI
my $dbh = DBI->connect(
...however you do that...,
{RaiseError => 1} # Turn on exceptions
);
# Prepare and execute the statement
my $sth = $dbh->prepare("SELECT * FROM employees");
$sth->execute;
# Get a row, write a row.
while( my $row = $sth->fetchrow_arrayref ) {
    append_row_to_storage($row, $file);
}
I leave writing append_row_to_storage up to you.
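One way to fill in append_row_to_storage is sketched below. It writes each row as a tab-separated line and turns NULLs (undef) into empty fields. It escapes nothing, so it assumes the data contains no tabs or newlines; for real data a CSV module such as Text::CSV would be safer. The sample rows are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical implementation: append one row as a tab-separated line.
# Opening the file on every call keeps the signature used above, but it is
# slow; for big result sets, open once before the loop and pass the handle.
sub append_row_to_storage {
    my ($row, $file) = @_;
    open my $out, '>>', $file or die "Cannot open $file: $!";
    # DBI returns SQL NULL as undef; write it as an empty field.
    # Tabs or newlines inside values are NOT escaped.
    print {$out} join("\t", map { defined $_ ? $_ : '' } @$row), "\n";
    close $out or die "Cannot close $file: $!";
}

# Usage with made-up rows standing in for $sth->fetchrow_arrayref results.
my (undef, $file) = tempfile(UNLINK => 1);
append_row_to_storage($_, $file) for [1, 'Ann', undef], [2, 'Bob', 'Sales'];
```

Since only one row is in memory at a time, memory use stays flat no matter how many rows the query returns.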
function mysql_insert($data_array){
$sql = "insert into `". $this->table_name. '`';
$array_keys = array_keys($data_array);
$array_keys_comma = implode(",\n", preg_replace('/^(.*?)$/', "`$1`", $array_keys));
for($a=0,$b=count($data_array); $a<$b; $a++){ $question_marks .="?,"; }
$array_values = array_values($data_array);
$array_values_comma = implode(",", $array_values);
$sql.= " ($array_keys_comma) ";
$sql.= " values(". substr($question_marks, 0,-1) .")";
$prepare = $this->connDB->prepare($sql);
$insert = $prepare->execute(array($array_values_comma));
}
I want to create a universal function like this; $data_array comes from $_POST.
This function should work for every form, but I don't know what I'm doing wrong. :S
I don't know what is my wrong
That's easy to find out: the number of bound variables does not match the number of tokens.
I want to creat like this universal functions, $data_array-comes from $_POST
Here you go: Insert/update helper function using PDO
$array_values_comma is a string after you implode() the array, so you always pass an array of one element to your execute() function. You should pass $array_values instead.
Here's how I'd write this function:
function mysql_insert($data_array){
$columns = array_keys($data_array);
$column_list_delimited = implode(",",
array_map(function ($name) { return "`$name`"; }, $columns));
$question_marks = implode(",", array_fill(1, count($data_array), "?"));
$sql = "insert into `{$this->table_name}` ($column_list_delimited)
values ($question_marks)";
// always check for these functions returning FALSE, which indicates an error
// or alternatively set the PDO attribute to use exceptions
$prepare = $this->connDB->prepare($sql);
if ($prepare === false) {
trigger_error(print_r($this->connDB->errorInfo(),true), E_USER_ERROR);
}
$insert = $prepare->execute(array_values($data_array));
if ($insert === false) {
trigger_error(print_r($prepare->errorInfo(),true), E_USER_ERROR);
}
}
A further improvement would be to do some validation of $this->table_name and the keys of $data_array so you know they match an existing table and its columns.
See my answer to escaping column name with PDO for an example of validating column names.
I know that an if statement evaluates its condition as a Boolean.
<?php
if (isset($_GET['subj'])) {
$sel_subj = $_GET['subj'];
$sel_page = "";
?>
Can I use $sel_subj or $sel_page outside the if statement? My second question is about the while loop: can I use a variable outside it, or is it considered local to the loop?
while ($page = mysql_fetch_array($page_set)) {
echo "<li";
if ($page["id"] == $sel_page) { echo " class=\"selected\""; }
echo "><a href=\"content.php?page=" . urlencode($page["id"]) .
"\">{$page["menu_name"]}</a></li>";
}
Basically, yes: any variable defined inside an if or while block is available in the scope that contains the if or while. However, because it is defined inside a conditional, it might never have been set, in which case you would get an undefined-variable warning.
For example:
function foo(){
    $i=0;
    while($i==0){
        $i=1;
        $a=1;
    }
    echo $a;
    //$a is available here, although it might be undefined if the loop condition was never met
}
echo $a; //$a is not available here (it is local to foo())
You should ideally declare the variable first.