Perl: for (min .. max) uses random order, but I want it in order 0, 1, 2, ... - sql

As I am a total beginner to Perl, Oracle SQL and everything else, I have to write a script that parses an Excel file and writes the values into an Oracle database.
Everything is good so far, but the rows end up in the database in what looks like a random order.
for ($row_min .. $row_max) {...insert into db code $sheetValues[$_][col0] etc...}
I don't get why the rows are inserted in a random order.
And obviously: how can I get them in order? excel_row 0 => db_row 0, and so on.
The values in the array are in order! The number of rows is dynamic.
Thanks for your help, I hope you got all the information you need.
Edit:
&parseWrite;
sub parseWrite {
my @sheetValues;
my $worksheet = $workbook->worksheet(0);
my ($row_min, $row_max) = $worksheet->row_range();
print "| Zeile $row_min bis $row_max |";
my ($col_min, $col_max) = $worksheet->col_range();
print " Spalte $col_min bis $col_max |<br>";
for my $row ($row_min .. $row_max) {
for my $col ($col_min .. $col_max) {
my $cell = $worksheet->get_cell ($row,$col);
next unless $cell;
$sheetValues[$row][$col] = $cell->value();
print $sheetValues[$row][$col] .
"(".$row."," .$col.")"."<br>";
}
}
for ($row_min .. $row_max) {
my $sql="INSERT INTO t_excel (
a,b,c,d,e
) VALUES (
'$sheetValues[$_][0]',
'$sheetValues[$_][1]',
'$sheetValues[$_][2]',
'$sheetValues[$_][3]',
'$sheetValues[$_][4]',
'$sheetValues[$_][5]'
)";
$dbh->do($sql);
}
}
By "in order" I mean the order that my PL/SQL Developer 8.0.3 (provided by my company) shows for
SELECT * FROM t_excel;
(screenshot omitted)
But shell is at (2,0), maggie at (0,0) and 13 at (1,0) in the array.

The rows are being inserted in the order you expect. I believe the mistaken assumption here is that SELECT will return rows in the same order they're inserted. This is not true. While implementations may make it seem like it does, SELECT has no default order. You're thinking a table is basically like a big list, INSERT is adding to the end of it, and SELECT just iterates through it. That's not a bad approximation, but it can lead you to make bad assumptions. The reality is that you can say little for sure about how a table is stored.
SQL is a declarative language, which means you tell the computer what you want. This is different from most other types of language, where you tell the computer what to do. SELECT * FROM sometable says "give me all the rows and all their columns in the table". Since you didn't give an order, the database can return them in whatever order it likes. Contrast this with the procedural meaning, which would be "iterate through all the rows in the table" as if the table were some sort of list.
Most languages encourage you to take advantage of how data is stored. Declarative languages deliberately hide how the data is stored.
If you want your SELECT to be ordered, you have to give it an ORDER BY.
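For example, something along these lines (only a sketch: row_no is a hypothetical column, since the question only shows columns a..e; you would fill it with the Excel row number at insert time):
ALTER TABLE t_excel ADD (row_no NUMBER);
-- then, when reading the data back, ask for the order explicitly:
SELECT a, b, c, d, e
FROM t_excel
ORDER BY row_no;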

Related

SQL Query to JSONiq Query

I want to convert an SQL query into a JSONiq query. Is there already an implementation for this? If not, what do I need to know to be able to create a program that can do this?
I am not aware of an implementation; however, it is technically feasible and straightforward. JSONiq has 90% of its DNA coming from XQuery, which itself was partly designed by people who were also involved in SQL.
From a data model perspective, a table is mapped to a collection and each row of the table is mapped to a flat JSON object, i.e., all fields are atomic values, like so:
{
"Name" : "Turing",
"First" : "Alan",
"Job" : "Inventor"
}
Then, the mapping is done by converting SELECT-FROM-WHERE queries to FLWOR expressions, which provide a superset of SQL's functionality.
For example:
SELECT Name, First
FROM people
WHERE Job = "Inventor"
Can be mapped to:
for $person in collection("people")
where $person.Job eq "Inventor"
return project($person, ("Name", "First"))
More complicated queries can also be mapped quite straightforwardly:
SELECT Name, COUNT(*)
FROM people
WHERE Job = "Inventor"
GROUP BY Name
HAVING COUNT(*) >= 2
to:
for $person in collection("people")
where $person.Job eq "Inventor"
group by $name := $person.Name
where count($person) ge 2
return {
name: $name,
count: count($person)
}
Actually, if for had been called from and return had been called select, and if these keywords were written uppercase, the syntax of JSONiq would be very similar to that of SQL: it's only cosmetics.

Ordering results dynamically in marklogic

So, here is my question:
I want to order a list based on a set of variables that determine the sorting fields and the sort direction.
In essence, I need to do an "order by" dynamically.
EG:
declare function getSearchResults(
$query as cts:query,
$sort as xs:string*,
$direction as xs:string*
) as element()* {
let $results :=
cts:search(/*, $query)
let $sortFields := fn:tokenize($sort, "\|")
let $dec := $direction = 'desc' or $direction = 'descending'
let $sorted := sortByFields($results, $sortFields, $dec)
return $sorted
};
declare private function sortByFields ($results, $sortFields, $dec)
{
let $asc := fn:not($dec)
for $i in $results
order by
if ($sortFields[1]='id' and $asc) then $i//ldse:document/@id else (),
if ($sortFields[1]='id' and $dec) then $i//ldse:document/@id else () descending,
if ($sortFields[1]='title' and $asc) then $i//title else (),
if ($sortFields[1]='title' and $dec) then $i//title else () descending
return if (fn:count($sortFields) > 1 ) then (sortByFields($i,$sortFields[2 to fn:count($sortFields)],$dec)) else ($i)
};
This method will not work because it has to sort multiple times, and won't preserve the sort order each iteration.
I also tried this:
let $sortFields := fn:tokenize($sort, "\|")
let $dec := $direction = 'desc' or $direction = 'descending'
let $asc := fn:not($dec)
for $i in $results
for $j in 1 to fn:count($sortFields)
order by
if ($sortFields[$j]='id' and $asc) then $i//ldse:document/@id else (),
if ($sortFields[$j]='id' and $dec) then $i//ldse:document/@id else () descending,
if ($sortFields[$j]='title' and $asc) then $i//title else (),
if ($sortFields[$j]='title' and $dec) then $i//title else () descending
return $i
but this duplicates my data. (returns it ordered by each sort Field)
I would prefer not to use xdmp:eval because of code injection. Is there any way that I can do this?
Any help or suggestions would be much appreciated.
Thanks a bunch!
Several things to consider.
If you are doing ordering on the results of a cts:search(), then all the results must be returned before you can order them. That means you cannot do efficient pagination. E.g. if you have a million rows and want the top 100: if you order them dynamically, you have to fetch all 1 million rows. If this is an issue then more complex solutions are needed.
Implementation-wise, this is a good case for using function items ... but making the ascending/descending choice work requires static analysis. Alternatively, the ordering can always be ascending (or descending), with the computed value made positive/negative instead.
e.g.
for $r in cts:search(...)
order by myfunc($r, $criteria)
return $r
declare function myfunc( $r, $criteria ) as xs:double
{
  (: logic to map $r onto a natural ordering from -inf to +inf :)
  $ordering
};
but before digging into that, I would recommend looking at search:search() instead.
http://docs.marklogic.com/search:search#opt-sort-order
There are some very powerful features in this, including being able to define complex orderings as an XML element.
Ultimately, to do custom ordering efficiently you will probably need to create range indexes so that the ordering can be done in the server itself instead of in your code. For small data sets it is not as big an issue, but when you start searching over thousands, hundreds of thousands or millions of documents you can't effectively pull them all into memory on every search (and even if you can, it will be slow). Not only do the results all have to be pulled into memory to start sorting, but the XQuery code has to be evaluated for every term. Using indexes, the result set can often be returned directly in the right order without even loading the documents.
There are other techniques you can use, such as loading the results into a map or an array, creating a self-sorting tree structure, pre-creating custom subsets of the data, etc.
Take a look at the search:search library first ... you can even define your own search syntax for users to type in, all type- and injection-safe and very well optimized over the years.

num_rows in postgres always returns 1

I'm trying to do a SELECT COUNT(*) with Postgres.
What I need: Catch the rows affected by the query. It's a school system. If the student is not registered, do something (if).
What I tried:
$query = pg_query("SELECT COUNT(*) FROM inscritossimulado
WHERE codigo_da_escola = '".$CodEscola."'
AND codigo_do_simulado = '".$simulado."'
AND codigo_do_aluno = '".$aluno."'");
if(pg_num_rows($query) == 0)
{
echo "Error you're not registered!";
}
else
{
echo "Hello!";
}
Note: The student in question IS NOT REGISTERED, but the result is always 1 and not 0.
For some reason, when I "show" the query, the result is: "Resource id #21". But I have looked many times in the table, and the user is not there.
You are counting the number of rows in the result, and your query always returns a single row.
Your query says: return one row giving the number of students matching my criteria. If no one matches, you will get back one row with the value 0. If seven people match, you will get back one row with the value 7.
If you change your query to select * from ... you will get the right answer from pg_num_rows().
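For example (a sketch; the placeholders stand for the three values your PHP code interpolates):
SELECT *
FROM inscritossimulado
WHERE codigo_da_escola = '...'
AND codigo_do_simulado = '...'
AND codigo_do_aluno = '...';
-- pg_num_rows() on this result is the number of matching students: 0 if the student is not registered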
Actually, don't count at all. You don't need the count. Just check for existence, which is proven if a single row qualifies:
$query = pg_query(
'SELECT 1
FROM inscritossimulado
WHERE codigo_da_escola = $$' . $CodEscola . '$$
AND codigo_do_simulado = $$' . $simulado. '$$
AND codigo_do_aluno = $$' . $aluno . '$$
LIMIT 1');
Returns 1 row if found, else no row.
Using dollar-quoting in the SQL code, so we can use the safer and faster single quotes in PHP (I presume).
The problem with the aggregate function count() (besides being more expensive) is that it always returns a row - with the value 0 if no rows qualify.
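Another variant (just a sketch, with the same placeholder values as above) is EXISTS, which also always returns exactly one row, but then the thing to check is the boolean value in that row rather than the row count:
SELECT EXISTS (
  SELECT 1
  FROM inscritossimulado
  WHERE codigo_da_escola = '...'
  AND codigo_do_simulado = '...'
  AND codigo_do_aluno = '...'
) AS is_registered;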
But this still stinks. Don't use string concatenation, which is an open invitation for SQL injection. Rather use prepared statements ... Check out PDO ...

Tuples are not inserted sequentially in database table?

I am trying to insert 10 values of the format "typename_" + i, where i is the loop counter, into a table named roomtype with attributes typename (primary key, of SQL type character varying(45)) and samplephoto (it can be NULL and I am not dealing with it for now). What seems strange to me is that the tuples are inserted in a different order than the one in which the loop counter increments. That is:
typename_1
typename_10
typename_2
typename_3
...
I suppose it's not very important but I can't understand why this is happening. I am using PostgreSQL 9.3.4, pgAdmin III version 1.18.1 and Eclipse Kepler.
The Java code that creates the connection (using JDBC driver) and makes the query is:
import java.sql.*;
import java.util.Random;
public class DBC{
Connection _conn;
public DBC() throws Exception{
try{
Class.forName("org.postgresql.Driver");
}catch(java.lang.ClassNotFoundException e){
java.lang.System.err.print("ClassNotFoundException: Postgres Server JDBC");
java.lang.System.err.println(e.getMessage());
throw new Exception("No JDBC Driver found in Server");
}
try{
_conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/hotelreservation","user", "0000");
ZipfGenerator p = new ZipfGenerator(new Random(System.currentTimeMillis()));
_conn.setCatalog("jdbcTest");
Statement statement = _conn.createStatement();
String query;
for(int i = 1; i <= 10; i++){
String roomtype_typename = "typename_" + i;
query = "INSERT INTO roomtype VALUES ('" + roomtype_typename + "','" + "NULL" +"')";
System.out.println(i);
statement.execute(query);
}
}catch(SQLException E){
java.lang.System.out.println("SQLException: " + E.getMessage());
java.lang.System.out.println("SQLState: " + E.getSQLState());
java.lang.System.out.println("VendorError: " + E.getErrorCode());
throw E;
}
}
}
But what I get in the pgAdmin table is the out-of-sequence order shown above (screenshot omitted).
This is a misunderstanding. There is no "natural" order in a relational database table. While rows are normally inserted in sequence to the physical file holding a table, a wide range of activities can reshuffle physical order. And queries doing anything more than a basic (non-parallelized) sequential scan may return rows in any opportune order. That's according to standard SQL.
The order you see is arbitrary unless you add ORDER BY to the query.
pgAdmin3 by default orders rows by the primary key (unless specified otherwise). Your column is of type varchar and rows are ordered alphabetically (according to your current locale). All by design, all as it should be.
To sort rows like you seem to be expecting, you could pad some '0' in your text:
...
typename_0009
typename_0010
...
The proper solution would be to have a numeric column with just the number, though.
You may be interested in natural sort. You may also be interested in a serial column.
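For illustration, two sketches (PostgreSQL syntax; type_no is a hypothetical extra column, not part of the question's table):
-- natural sort: order by the number embedded in the name
SELECT typename FROM roomtype
ORDER BY (substring(typename from '\d+$'))::int;
-- or store the number in its own column and order by that
ALTER TABLE roomtype ADD COLUMN type_no integer;
SELECT typename FROM roomtype ORDER BY type_no;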
I guess that the output is ordered alphabetically ... if you only create typename_1 through typename_9, everything should be OK. You can also use typename_01 (padded with zeros) to get the correct order.
If you are unsure about that, you can also add a sleep between the insert statements and record the insert time in the database (as a column).
You are not seeing the order in which PostgreSQL stores the data, but rather the order in which pgadmin displays it.
The edit-table feature of pgAdmin automatically sorts the data by the primary key by default. That is what you are seeing.
In general, databases store table data in whatever order is convenient. Since you did not intentionally supply an ORDER BY you have no right to care what order it is actually in.

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec?

I'm looking for a more efficient way to run many column updates on the same table, like this:
UPDATE mytable
SET col = regexp_replace( col, 'foo', 'bar' )
WHERE col ~ 'foo';
Such that foo and bar will be a combination of 40 different regex replaces. I doubt even 25% of the dataset needs to be updated at all, but what I want to know is whether it is possible to cleanly achieve the following in SQL.
A single pass update
A single match of the regex, triggers a single replace
Not running all possible regexp_replaces if only one matches
Not updating all columns if only one needs the update
Not updating a row if no column has changed
I'm also curious: I know that in MySQL (bear with me)
UPDATE foo SET bar = 'baz'
has an implicit WHERE bar != 'baz' clause.
However, in PostgreSQL I know this doesn't exist. I think I could at least answer one of my questions if I knew how to skip a single row's update if the target columns weren't changed.
Something like
UPDATE mytable
SET col = *temp_var* = regexp_replace( col, 'foo', 'bar' )
WHERE col != *temp_var*
Do it in code. Open up a cursor, then: grab a row, run it through the 40 regular expressions, and if it changed, save it back. Repeat until the cursor doesn't give you any more rows.
Whether you do it that way or come up with the magical SQL expression, it's still going to be a row scan of the entire table, but the code will be much simpler.
Experimental Results
In response to criticism, I ran an experiment. I inserted 10,000 lines from a documentation file into a table with a serial primary key and a varchar column. Then I tested two ways to do the update. Method 1:
in a transaction:
opened up a cursor (select for update)
while reading 100 rows from the cursor returns any rows:
for each row:
for each regular expression:
do the gsub on the text column
update the row
This takes 1.16 seconds with a locally connected database.
Then the "big replace," a single mega-regex update:
update foo set t =
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(t,
E'\bcommit\b', E'COMMIT'),
E'\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b',
E'9ACF10762B5F3D3B1B33EA07792A936A25E45010'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b<cworth@cworth.org>\b',
E'<CWORTH@CWORTH.ORG>'), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:53:13\b', E'04:53:13'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bUpdate\b', E'UPDATE'),
E'\bversion\b', E'VERSION'),
E'\bto\b', E'TO'), E'\b2.9.1\b',
E'2.9.1'), E'\bcommit\b', E'COMMIT'),
E'\b61c89e56f361fa860f18985137d6bf53f48c16ac\b',
E'61C89E56F361FA860F18985137D6BF53F48C16AC'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b<cworth@cworth.org>\b',
E'<CWORTH@CWORTH.ORG>'), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:51:58\b', E'04:51:58'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bNEWS:\b', E'NEWS:'),
E'\bAdd\b', E'ADD'), E'\bnotes\b',
E'NOTES'), E'\bfor\b', E'FOR'),
E'\bthe\b', E'THE'), E'\b2.9.1\b',
E'2.9.1'), E'\brelease.\b',
E'RELEASE.'), E'\bThanks\b',
E'THANKS'), E'\bto\b', E'TO'),
E'\beveryone\b', E'EVERYONE'),
E'\bfor\b', E'FOR')
The mega-regex update takes 0.94 seconds to update.
At 0.94 seconds compared to 1.16, it's true that the mega-regex update is faster, running in 81% of the time of doing it in code. It is not, however, a lot faster. And ye Gods, look at that update statement. Do you want to write that, or try to figure out what went wrong when Postgres complains that you dropped a parenthesis somewhere?
Code
The code used was:
def stupid_regex_replace
  sql = Select.new
  sql.select('id')
  sql.select('t')
  sql.for_update
  sql.from(TABLE_NAME)
  Cursor.new('foo', sql, {}, @db) do |cursor|
    until (rows = cursor.fetch(100)).empty?
      for row in rows
        for regex, replacement in regexes
          row['t'] = row['t'].gsub(regex, replacement)
        end
        sql = Update.new(TABLE_NAME, @db)
        sql.set('t', row['t'])
        sql.where(['id = %s', row['id']])
        sql.exec
      end
    end
  end
end
I generated the regular expressions dynamically by taking words from the file; for each word "foo", its regular expression was "\bfoo\b" and its replacement string was "FOO" (the word uppercased). I used words from the file to make sure that replacements did happen. I made the test program spit out the regexes so you can see them. Each pair is a regex and the corresponding replacement string:
[[/\bcommit\b/, "COMMIT"],
[/\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b/,
"9ACF10762B5F3D3B1B33EA07792A936A25E45010"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth@cworth.org>\b/, "<CWORTH@CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:53:13\b/, "04:53:13"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bUpdate\b/, "UPDATE"],
[/\bversion\b/, "VERSION"],
[/\bto\b/, "TO"],
[/\b2.9.1\b/, "2.9.1"],
[/\bcommit\b/, "COMMIT"],
[/\b61c89e56f361fa860f18985137d6bf53f48c16ac\b/,
"61C89E56F361FA860F18985137D6BF53F48C16AC"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth@cworth.org>\b/, "<CWORTH@CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:51:58\b/, "04:51:58"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bNEWS:\b/, "NEWS:"],
[/\bAdd\b/, "ADD"],
[/\bnotes\b/, "NOTES"],
[/\bfor\b/, "FOR"],
[/\bthe\b/, "THE"],
[/\b2.9.1\b/, "2.9.1"],
[/\brelease.\b/, "RELEASE."],
[/\bThanks\b/, "THANKS"],
[/\bto\b/, "TO"],
[/\beveryone\b/, "EVERYONE"],
[/\bfor\b/, "FOR"]]
If this were a hand-generated list of regexes, and not automatically generated, my question is still appropriate: which would you rather have to create or maintain?
For the skip update, look at suppress_redundant_updates - see http://www.postgresql.org/docs/8.4/static/functions-trigger.html.
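A sketch of how that built-in trigger function is attached (mytable stands in for your real table; the trigger makes PostgreSQL skip any UPDATE that would leave the row unchanged):
CREATE TRIGGER zzz_suppress_redundant_updates
BEFORE UPDATE ON mytable
FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();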
This is not necessarily a win - but it might well be in your case.
Or perhaps you can just add that implicit check as an explicit one?
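In PostgreSQL that explicit check could look roughly like this for a single replace (a sketch only; IS DISTINCT FROM also behaves sanely if col can be NULL):
UPDATE mytable
SET col = regexp_replace(col, 'foo', 'bar')
WHERE col IS DISTINCT FROM regexp_replace(col, 'foo', 'bar');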