Full text search failure on PostgreSQL

I have a PostgreSQL database used to index text content.
The SearchVector column is populated successfully using the following code:
UPDATE public."DocumentFiles"
SET "SearchVector" = setweight(to_tsvector('pg_catalog.italian', coalesce("DocumentFileName", '')), 'A')
|| setweight(to_tsvector('pg_catalog.italian', coalesce("DocumentFileDescription", '')), 'B')
|| setweight(to_tsvector('pg_catalog.italian', coalesce("DocumentFileContentString", '')), 'B')
WHERE "DocumentFileID" = 123;
The content looks like the following:
'011989':1A '5':7A 'cdp':2A 'contonu':10A 'elettr':6A 'grupp':8A 'impiant':5A 'manual':3A 'uso':4A
But if I try to run a query for the singular or plural of manual (in Italian: manuale is singular, manuali is plural), it fails:
SELECT "DocumentFileID"
FROM public."DocumentFiles"
where "SearchVector"::tsvector ## 'manuali'::tsquery;
return nothing
SELECT "DocumentFileID"
FROM public."DocumentFiles"
where "SearchVector"::tsvector ## 'manuale'::tsquery;
return nothing
It only returns the record if I write exactly what is stored in the SearchVector field:
SELECT "DocumentFileID"
FROM public."DocumentFiles"
where "SearchVector"::tsvector ## 'manual'::tsquery;
What's wrong with it?

The problem is probably that the parameter default_text_search_config is not set to italian, so a different stemming algorithm is used.
Be explicit and use to_tsquery('italian', 'manuali') rather than 'manuali'::tsquery.
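For example, a minimal sketch against the table from the question (naming the configuration explicitly stems the search term the same way the vector was built; @@ is the tsquery match operator):
SHOW default_text_search_config; -- what a bare ::tsquery cast would use
SELECT "DocumentFileID"
FROM public."DocumentFiles"
WHERE "SearchVector"::tsvector @@ to_tsquery('italian', 'manuali');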


Query to check if we have any row that the input String starts with

Let's say I have a column with values
"Html", "Java", "JQuery", "Sqlite"
and if the user enters
"Java is my favorite programming language"
then the result should be
1 row selected with value "Java"
because the entered sentence starts with "Java",
but if the user enters
"My application database is in Sqlite"
then the query should return nothing:
0 rows selected
because the entered sentence does not start with "Sqlite".
I am trying the following query:
SELECT * FROM demo where 'Java' like demo.name; -- works
but if I enter long text then it fails:
SELECT * FROM demo where 'Java is my favorite programming language' like demo.name; -- fails
I think you want something like this:
SELECT * FROM demo WHERE yourText LIKE demo.name || '%' ;
Just put the column name on the left side of the LIKE operator, and concatenate a wildcard to the right side of your string keyword:
select * from demo where name like 'Java%' ; -- or like 'Sqlite%' etc
You can make it more generic, like:
select * from demo where name like ? || '%';
where ? is a bind parameter that represents the value provided by the user (you may also concatenate the wildcard in your application before passing the parameter).
Concatenate your column with a wildcard:
SELECT * FROM demo where 'Java is my favorite programming language' like demo.name || '%';
You can make your query like this:
Select * from SearchPojo where name GLOB '*' || :namestring || '*';
Not sure if I got the problem but ...
You can query these values (Html, Java, JQuery, Sqlite), put them in a collection such as a List, then grab the user's input and check it:
List<String> values = ...; // Html, Java, JQuery, Sqlite
for (String value : values) {
    if (userInput.contains(value)) {
        // print the matching value
        System.out.println("1 row selected with value \"" + value + "\"");
    }
}
It also supports cases where the user's input contains more than one option.

PL/SQL: Using a variable in UPDATEXML function

I need to update an XML node that contains a specific number. My query works if I hardcode the number (see figure 1), but I would like to make this dynamic by passing in a variable that contains a string (variableToBeReplaced). I currently wrote this (see figure 2), but it isn't reading the variable correctly, so no changes are made to the XML. Does anyone know how I can include a variable in an updatexml() function?
Figure 1
select updateXML(xmltype(xmlbod),'/LpnList/Lpn/LicensePlateNumber[text() = "12345"]','67890').getClobVal() doc
from myTable
where id = '1'
Figure 2
select updateXML(xmltype(xmlbod), '/LpnList/Lpn/LicensePlateNumber[text()= '|| variableToBeReplaced || ']','67890').getClobVal() doc
from myTable
where id = '1'
I was just missing double quotes ("") around the variable in the XPath:
select updateXML(xmltype(xmlbod), '/LpnList/Lpn/LicensePlateNumber[text()= "'|| variableToBeReplaced || '"]','67890').getClobVal() doc
from myTable
where id = '1'
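For illustration, a minimal, hypothetical sketch of the corrected call inside an anonymous PL/SQL block (the DECLARE wrapper and the sample value '12345' are my additions; the table and column names come from the question):
DECLARE
  variableToBeReplaced VARCHAR2(20) := '12345'; -- hypothetical sample value
  doc CLOB;
BEGIN
  SELECT updateXML(xmltype(xmlbod),
                   '/LpnList/Lpn/LicensePlateNumber[text()= "' || variableToBeReplaced || '"]',
                   '67890').getClobVal()
    INTO doc
    FROM myTable
    WHERE id = '1';
END;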

Perl DBI - binding a list

How do I bind a variable to a SQL set for an IN query in Perl DBI?
Example:
my @nature = ('TYPE1','TYPE2'); # This is normally populated from elsewhere
my $qh = $dbh->prepare(
"SELECT count(ref_no) FROM fm_fault WHERE nature IN ?"
) || die("Failed to prepare query: $DBI::errstr");
# Using the array here only takes the first entry in this example; using an array ref gives no result
# bind_param and named bind variables give similar results
$qh->execute(@nature) || die("Failed to execute query: $DBI::errstr");
print $qh->fetchrow_array();
The code as above returns only the count for TYPE1, while the required output is the sum of the counts for TYPE1 and TYPE2. Replacing the bind entry with a reference to @nature (\@nature) results in 0 results.
The main use-case for this is to allow a user to check multiple options using something like a checkbox group and it is to return all the results. A work-around is to construct a string to insert into the query - it works, however it needs a whole lot of filtering to avoid SQL injection issues and it is ugly...
In my case, the database is Oracle, ideally I want a generic solution that isn't affected by the database.
There should be as many ? placeholders as there are elements in @nature, i.e. IN (?,?,...)
my @nature = ('TYPE1','TYPE2');
my $pholders = join ",", ("?") x @nature;
my $qh = $dbh->prepare(
    "SELECT count(ref_no) FROM fm_fault WHERE nature IN ($pholders)"
) or die("Failed to prepare query: $DBI::errstr");
# Each element of @nature then binds to its own placeholder:
$qh->execute(@nature) or die("Failed to execute query: $DBI::errstr");
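With the two-element @nature above, the statement that actually gets prepared is:
SELECT count(ref_no) FROM fm_fault WHERE nature IN (?,?)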

How to use parameters with RPostgreSQL (to insert data)

I'm trying to insert data into a pre-existing PostgreSQL table using RPostgreSQL and I can't figure out the syntax for SQL parameters (prepared statements).
E.g. suppose I want to do the following
insert into mytable (a,b,c) values ($1,$2,$3)
How do I specify the parameters? dbSendQuery doesn't seem to understand the parameters if you just pass them in the ... argument.
I've found dbWriteTable can be used to dump an entire table, but won't let you specify the columns (so no good for defaults etc.). And anyway, I'll need to know this for other queries once I get the data in there (so I suppose this isn't really insert specific)!
Sure I'm just missing something obvious...
I was looking for the same thing, for the same reason: security.
Apparently dplyr package has the capacity that you are interested in. It's barely documented, but it's there. Scroll down to "Postgresql" in this vignette: http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html
To summarize, dplyr offers the functions sql() and escape(), which can be combined to produce a parametrized query. The SQL() function from the DBI package seems to work in exactly the same way.
> sql(paste0('SELECT * FROM blaah WHERE id = ', escape('random "\'stuff')))
<SQL> SELECT * FROM blaah WHERE id = 'random "''stuff'
It returns an object of classes "sql" and "character", so you can either pass it on to tbl() or possibly dbSendQuery() as well.
The escape() function correctly handles vectors as well, which I find most useful:
> sql(paste0('SELECT * FROM blaah WHERE id in ', escape(1:5)))
<SQL> SELECT * FROM blaah WHERE id in (1, 2, 3, 4, 5)
Same naturally works with variables as well:
> tmp <- c("asd", 2, date())
> sql(paste0('SELECT * FROM blaah WHERE id in ', escape(tmp)))
<SQL> SELECT * FROM blaah WHERE id in ('asd', '2', 'Tue Nov 18 15:19:08 2014')
I feel much safer now putting together queries.
As of the latest RPostgreSQL it should work:
db_connection <- dbConnect(dbDriver("PostgreSQL"), dbname = database_name,
host = "localhost", port = database_port, password=database_user_password,
user = database_user)
qry = "insert into mytable (a,b,c) values ($1,$2,$3)"
dbSendQuery(db_connection, qry, c(1, "some string", "some string with | ' "))
Here's a version using the DBI and RPostgres packages, and inserting multiple rows at once, since all these years later it's still very difficult to figure out from the documentation.
x <- data.frame(
a = c(1:10),
b = letters[1:10],
c = letters[11:20]
)
# insert your own connection info
con <- DBI::dbConnect(
RPostgres::Postgres(),
dbname = '',
host = '',
port = 5432,
user = '',
password = ''
)
RPostgres::dbSendQuery(
con,
"INSERT INTO mytable (a,b,c) VALUES ($1,$2,$3);",
list(
x$a,
x$b,
x$c
)
)
The help for dbBind() in the DBI package is the only place that explains how to format parameters:
The placeholder format is currently not specified by DBI; in the
future, a uniform placeholder syntax may be supported. Consult the
backend documentation for the supported formats.... Known examples are:
? (positional matching in order of appearance) in RMySQL and RSQLite
$1 (positional matching by index) in RPostgres and RSQLite
:name and $name (named matching) in RSQLite
? is also the placeholder for R package RJDBC.
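As a quick side-by-side, here is the insert from the question written in each placeholder style the help text lists (which one works depends on the backend):
-- '?' positional, in order of appearance (RMySQL, RSQLite):
insert into mytable (a,b,c) values (?,?,?)
-- '$1' positional by index (RPostgres, RSQLite):
insert into mytable (a,b,c) values ($1,$2,$3)
-- ':name' named matching (RSQLite):
insert into mytable (a,b,c) values (:a,:b,:c)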

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec?

I'm looking for a more efficient way to run many column updates on the same table, like this:
UPDATE TABLE table
SET col = regexp_replace( col, 'foo', 'bar' )
WHERE regexp_match( col, 'foo' ) IS NOT NULL;
where foo and bar will be a combination of 40 different regex replacements. I doubt even 25% of the dataset needs to be updated at all, but what I want to know is whether it is possible to cleanly achieve the following in SQL:
A single-pass update
A single match of a regex triggering a single replace
Not running all possible regexp_replaces if only one matches
Not updating all columns if only one needs the update
Not updating a row if no column has changed
I'm also curious: I know that in MySQL (bear with me)
UPDATE foo SET bar = 'baz'
has an implicit WHERE bar != 'baz' clause.
However, I know this doesn't exist in PostgreSQL. I think I could answer at least one of my questions if I knew how to skip a single row's update when the target columns weren't changed.
Something like
UPDATE TABLE table
SET col = *temp_var* = regexp_replace( col, 'foo', 'bar' )
WHERE col != *temp_var*
Do it in code. Open up a cursor, then: grab a row, run it through the 40 regular expressions, and if it changed, save it back. Repeat until the cursor doesn't give you any more rows.
Whether you do it that way or come up with the magical SQL expression, it's still going to be a row scan of the entire table, but the code will be much simpler.
Experimental Results
In response to criticism, I ran an experiment. I inserted 10,000 lines from a documentation file into a table with a serial primary key and a varchar column. Then I tested two ways to do the update. Method 1:
in a transaction:
opened up a cursor (select for update)
while reading 100 rows from the cursor returns any rows:
for each row:
for each regular expression:
do the gsub on the text column
update the row
This takes 1.16 seconds with a locally connected database.
Then the "big replace," a single mega-regex update:
update foo set t =
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(t,
E'\bcommit\b', E'COMMIT'),
E'\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b',
E'9ACF10762B5F3D3B1B33EA07792A936A25E45010'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b<cworth@cworth.org>\b',
E'<CWORTH@CWORTH.ORG>'), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:53:13\b', E'04:53:13'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bUpdate\b', E'UPDATE'),
E'\bversion\b', E'VERSION'),
E'\bto\b', E'TO'), E'\b2.9.1\b',
E'2.9.1'), E'\bcommit\b', E'COMMIT'),
E'\b61c89e56f361fa860f18985137d6bf53f48c16ac\b',
E'61C89E56F361FA860F18985137D6BF53F48C16AC'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b<cworth@cworth.org>\b',
E'<CWORTH@CWORTH.ORG>'), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:51:58\b', E'04:51:58'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bNEWS:\b', E'NEWS:'),
E'\bAdd\b', E'ADD'), E'\bnotes\b',
E'NOTES'), E'\bfor\b', E'FOR'),
E'\bthe\b', E'THE'), E'\b2.9.1\b',
E'2.9.1'), E'\brelease.\b',
E'RELEASE.'), E'\bThanks\b',
E'THANKS'), E'\bto\b', E'TO'),
E'\beveryone\b', E'EVERYONE'),
E'\bfor\b', E'FOR')
The mega-regex update takes 0.94 seconds to run.
At 0.94 seconds compared to 1.16, it's true that the mega-regex update is faster, running in 81% of the time of doing it in code. It is not, however, a lot faster. And ye Gods, look at that update statement. Do you want to write that, or try to figure out what went wrong when Postgres complains that you dropped a parenthesis somewhere?
Code
The code used was:
def stupid_regex_replace
  sql = Select.new
  sql.select('id')
  sql.select('t')
  sql.for_update
  sql.from(TABLE_NAME)
  Cursor.new('foo', sql, {}, @db) do |cursor|
    until (rows = cursor.fetch(100)).empty?
      for row in rows
        # Apply every regex to the row's text column
        for regex, replacement in regexes
          row['t'] = row['t'].gsub(regex, replacement)
        end
        # Save the (possibly changed) row back
        sql = Update.new(TABLE_NAME, @db)
        sql.set('t', row['t'])
        sql.where(['id = %s', row['id']])
        sql.exec
      end
    end
  end
end
I generated the regular expressions dynamically by taking words from the file; for each word "foo", its regular expression was "\bfoo\b" and its replacement string was "FOO" (the word uppercased). I used words from the file to make sure that replacements did happen. I made the test program spit out the regexes so you can see them. Each pair is a regex and the corresponding replacement string:
[[/\bcommit\b/, "COMMIT"],
[/\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b/,
"9ACF10762B5F3D3B1B33EA07792A936A25E45010"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth@cworth.org>\b/, "<CWORTH@CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:53:13\b/, "04:53:13"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bUpdate\b/, "UPDATE"],
[/\bversion\b/, "VERSION"],
[/\bto\b/, "TO"],
[/\b2.9.1\b/, "2.9.1"],
[/\bcommit\b/, "COMMIT"],
[/\b61c89e56f361fa860f18985137d6bf53f48c16ac\b/,
"61C89E56F361FA860F18985137D6BF53F48C16AC"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth@cworth.org>\b/, "<CWORTH@CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:51:58\b/, "04:51:58"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bNEWS:\b/, "NEWS:"],
[/\bAdd\b/, "ADD"],
[/\bnotes\b/, "NOTES"],
[/\bfor\b/, "FOR"],
[/\bthe\b/, "THE"],
[/\b2.9.1\b/, "2.9.1"],
[/\brelease.\b/, "RELEASE."],
[/\bThanks\b/, "THANKS"],
[/\bto\b/, "TO"],
[/\beveryone\b/, "EVERYONE"],
[/\bfor\b/, "FOR"]]
If this were a hand-generated list of regexes, and not automatically generated, my question is still appropriate: which would you rather have to create or maintain?
For the skip update, look at the suppress_redundant_updates_trigger() function - see http://www.postgresql.org/docs/8.4/static/functions-trigger.html.
This is not necessarily a win - but it might well be in your case.
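A minimal sketch of wiring that up, assuming a table named mytable (the trigger name is illustrative; the function comes from the linked docs):
CREATE TRIGGER z_skip_redundant_updates
BEFORE UPDATE ON mytable
FOR EACH ROW
EXECUTE PROCEDURE suppress_redundant_updates_trigger();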
Or perhaps you can just add that implicit check as an explicit one?
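For example, a sketch with the check made explicit for a single replacement (mytable, the 'g' flag, and the IS DISTINCT FROM guard are my additions; the chained 40-replace expression would slot in the same way):
UPDATE mytable
SET col = regexp_replace(col, 'foo', 'bar', 'g')
WHERE col IS DISTINCT FROM regexp_replace(col, 'foo', 'bar', 'g');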