How to index the name of an element in an XML column in Postgres - sql

I am trying to index the names of elements, but I keep running into this error:
ERROR: set-returning functions are not allowed in index expressions
This is what I have tried so far.
Sample xml:
<book><title>Manual</title><chapter>1</chapter></book>
DDL:
CREATE INDEX test2_element_name_idx
ON test2 USING GIN(xpath('local-name(/*)',unnest(xpath('//book/*', xml_data))));
Is it possible to index on element names? In the end I want to index on all elements that are under <book> (i.e. <title>, <chapter>).
One sample use case: I want to query (with xpath) how many books have a title. I believe that indexing would make such queries more efficient. Please correct me if my understanding is incorrect.

As stated by a_horse_with_no_name, you can't use a function that returns multiple rows in an index expression. What you can do instead is build an array from the rows returned by your function. Here is a proposed solution, which may need to be adjusted because I have never used the xml data type and functions (json is better :-) ):
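-- Collect the names of the elements under <book> into one text[] value per row.
-- text[] is used because xml has no comparison operators, so GIN cannot index xml[].
-- The function must be declared IMMUTABLE to be usable in an index expression.
CREATE OR REPLACE FUNCTION xml_data_agg(xml_data xml)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
$$
SELECT array(SELECT (xpath('local-name(/*)', x))[1]::text
FROM unnest(xpath('//book/*', xml_data)) AS x);
$$ ;
CREATE INDEX test2_element_name_idx
ON test2 USING GIN(xml_data_agg(xml_data));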
Then you can use this index in queries with a where clause of this form: WHERE xml_data_agg(xml_data) @> array[list_of_the_xml_elements_to_be_searched]
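To make the idea concrete outside the database, here is a minimal Python sketch of what the array expression is meant to produce for the sample XML, and of how an inverted index over those arrays (roughly what a GIN index maintains) answers "how many books have a title". The helper names `child_element_names` and `build_inverted_index` and the row ids are hypothetical; none of this is Postgres code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def child_element_names(xml_text):
    """Return the local names of all elements directly under any <book>."""
    root = ET.fromstring(xml_text)
    names = []
    for book in root.iter("book"):  # includes the root itself if it is <book>
        for child in book:
            # Drop a "{namespace}" prefix, mirroring XPath local-name().
            names.append(child.tag.rsplit("}", 1)[-1])
    return names


def build_inverted_index(rows):
    """Map each element name to the set of row ids whose XML contains it."""
    index = defaultdict(set)
    for row_id, xml_text in rows:
        for name in child_element_names(xml_text):
            index[name].add(row_id)
    return index


# Hypothetical table contents: (id, xml_data)
rows = [
    (1, "<book><title>Manual</title><chapter>1</chapter></book>"),
    (2, "<book><chapter>2</chapter></book>"),
]
index = build_inverted_index(rows)
books_with_title = len(index["title"])  # number of rows containing <title>
```

The lookup touches only the entry for "title" instead of re-parsing every document, which is the benefit the question hopes to get from indexing.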

Related

Trouble with SDO_OVERLAPBDYDISJOINT and spatial index

I am using Oracle 11g and need to know whether a specific point is inside the buffer of another point from a table with a spatial index. I am using the following statement:
SELECT A.fieldX
FROM TABLE A
WHERE
SDO_OVERLAPBDYDISJOINT(sdo_geom.sdo_buffer(A.geometry,2,0.1),SDO_GEOMETRY(2001,NULL
,SDO_POINT_TYPE(497644.6,2432725.8,NULL),NULL,NULL)) = 'TRUE';
And I get the following error:
13226. 00000 - "interface not supported without a spatial index"
Cause: The geometry table does not have a spatial index.
Action: Verify that the geometry table referenced in the spatial operator
has a spatial index on it.
The operator SDO_OVERLAPBDYDISJOINT only works on geometries from tables with a spatial index, and I understand that this error is caused by the buffer operator. But if I swap the order and put the SDO_POINT_TYPE first, I get the same error. Is there any way to use this operator, or a similar one, without a spatial index?
I don't want to use PL/SQL because I need to use the statement in VBA code.
Thanks a lot!!!
What you essentially want is to find all the geometries that are within some distance of another one. That is easier, and much more efficient, to do this way:
SELECT A.fieldX
FROM TABLE A
WHERE sdo_within_distance(A.geometry,SDO_GEOMETRY(2001,NULL,SDO_POINT_TYPE(497644.6,2432725.8,NULL),NULL,NULL),'distance=2') = 'TRUE';
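As a rough illustration of the predicate itself (not of Oracle's implementation), the following Python sketch tests whether points lie within a given distance of the query point. It assumes plain Cartesian coordinates, which is how geometries with a NULL SRID are normally treated; the function name `within_distance` and the candidate points are made up for the example.

```python
import math


def within_distance(point, target, distance):
    """True if `point` is at most `distance` units from `target` (Cartesian)."""
    return math.hypot(point[0] - target[0], point[1] - target[1]) <= distance


target = (497644.6, 2432725.8)  # the query point from the question
candidates = {
    "near": (497645.6, 2432726.8),  # roughly 1.41 units away
    "far": (497650.0, 2432725.8),   # roughly 5.4 units away
}
matches = [name for name, p in candidates.items() if within_distance(p, target, 2)]
```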
I think your problem is that A.geometry is indexed, but its buffer is not.
The first thing you should try is to use
SDO_OVERLAPBDYDISJOINT(A.geometry, buffer(sdo_point(...),2,0.1)) - and, while at it, it would be more correct to use SDO_INSIDE here.
If this does not work, check whether your index is indeed OK. You can easily test it using a specific id from your table - let's say 10 - by running:
select a.id from your_table a, your_table b where a.id=b.id and b.id=10 and sdo_equals(a.geometry,b.geometry)='TRUE';
If it returns your id (10 in my example), your index is OK.

PostgreSQL full-text fuzzy search ordered by rank

I'd like to implement full-text search on the users.about column. For this purpose I have created the following GIN index:
CREATE OR REPLACE FUNCTION make_tsvector(about TEXT)
RETURNS tsvector AS
$$
BEGIN
RETURN to_tsvector(about);
END
$$
LANGUAGE plpgsql IMMUTABLE;
CREATE INDEX IF NOT EXISTS idx_fts_users ON users
USING gin(make_tsvector(about));
How do I correctly construct the SQL query to search the users.about column for different query terms? For example, I'd like to use the following query string provided from the UI:
'java c# dephY php hadoop'
I'd like to be able to search by this string independently of word order and ideally with fuzzy search capability (as you may see, I misspelled dephY; I'd like to be able to find delphi in this case as well). The result must be sorted by rank. Please advise how to construct such a query in PostgreSQL.

Using Bookshelf to execute a query on Postgres JSONB array elements

I have a postgres table with jsonb array elements and I'm trying to do sql queries to extract the matching elements. I have the raw SQL query running from the postgres command line interface:
select * from movies where director @> any (array ['70', '45']::jsonb[])
This returns the results I'm looking for (all records from the movies table where the director jsonb elements contain any of the elements in the input array).
In the code, the value for ['70', '45'] would be a dynamic variable, i.e. fixArr, and the length of the array is unknown.
I'm trying to build this into my Bookshelf code but haven't been able to find any examples that address the complexity of the use case. I've tried the following approaches but none of them work:
models.Movies.where('director', '@> any', '(array' + JSON.stringify(fixArr) + '::jsonb[])').fetchAll()
ERROR: The operator "@> any" is not permitted
db.knex.raw('select * from movies where director @> any(array'+[JSON.stringify(fixArr)]+'::jsonb[])')
ERROR: column "45" does not exist
models.Movies.query('where', 'director', '@>', 'any (array', JSON.stringify(fixArr) + '::jsonb[])').fetchAll()
ERROR: invalid input syntax for type json
Can anyone help with this?
As you have noticed, neither knex nor bookshelf provides any support for making jsonb queries easier. As far as I know, the only knex-based ORM that supports jsonb queries nicely is Objection.js.
In your case I suppose a better operator for checking whether a jsonb column contains any of the given values would be ?|, so the query would look like:
const idsAsString = ids.map(val => `'${val}'`).join(',');
db.knex.raw(`select * from movies where director \\?| array[${idsAsString}]`);
More information on how to deal with jsonb queries and indexing with knex can be found here: https://www.vincit.fi/en/blog/objection-js-postgresql-power-json-queries/
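The snippet above builds the array literal by string interpolation, which breaks on values containing quotes and is open to SQL injection. A hedged sketch of the alternative, binding the values as parameters, is shown here in Python for brevity rather than in knex. The function name `build_any_match_query` is hypothetical, and the `%s` placeholder style is the one used by drivers such as psycopg2.

```python
def build_any_match_query(values):
    """Return (sql, params) matching rows whose jsonb `director` array
    contains any of `values` as a top-level string element."""
    if not values:
        raise ValueError("values must not be empty")
    placeholders = ", ".join(["%s"] * len(values))
    sql = f"select * from movies where director ?| array[{placeholders}]"
    # Values travel separately from the SQL text, so the driver quotes them.
    return sql, [str(v) for v in values]


sql, params = build_any_match_query([70, 45])
# A DB-API cursor would then run it as: cursor.execute(sql, params)
```

The number of placeholders follows the length of the input, which covers the case where the array length is not known in advance.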
No, you're just running into the limitations of that particular query builder and ORM.
The easiest way is using bookshelf.Model.query and knex.raw (whereRaw, etc.). Alias with AS and subclass your Bookshelf model to add these aliased attributes if you care about such things.
If you want things to look clean and abstracted through Bookshelf, you'll just need to denormalize the JSONB into flat tables. This might be the best approach if your JSONB is mostly flat and simple.
If you end up using lots of JSONB (it can be quite performant with appropriate indexes), then the Bookshelf ORM is wasted effort. The knex query builder is worth using only insofar as it handles escaping, quoting, etc.

How to optimize a XQUERY in a SELECT statement?

I am using an Oracle database. I have a table in which one of the columns is of type XMLTYPE. I need to count the records whose XML has a particular root element and satisfies one more condition. Suppose the stored XML documents have the following formats:
<ns1:Warehouse whNo="102" xmlns:ns1="xyz">
<ns1:Building></ns1:Building>
</ns1:Warehouse>
and
<ns1:Warehouse whNo="102" xmlns:ns1="xyz">
<ns1:Building>Owned</ns1:Building>
</ns1:Warehouse>
and there are other XML documents with root elements other than Warehouse.
Now I need to fetch the records whose root element is Warehouse and whose Building element is empty.
I wrote the following SQL query:
select count(XMLQuery('declare namespace ns1="xyz.com";
for $i in /*
where fn:local-name($i) eq "Warehouse"
and fn:string-length($i/ns1:Building ) = 0
return <Test>{$i/ns1:Building}</Test>'
PASSING xml_response RETURNING CONTENT)) countOfWareHouses
from test
Here, test is the name of the table and xml_response is the name of the XMLTYPE column in it.
This query works fine when there are few records. I tested it with around 500 records in the table and it took around 0.1s. But as the number of records grows, so does the time. With 5000 records it took ~11s, and on a production table, which currently holds 185000 records, the query never completes.
Please help me to optimize this query or the xquery.
Edit 1:
I tried using this:
select count(XMLQuery(
'declare namespace ns1 = "xyz";
for $i in /
return /ns1:Warehouse[not(ns1:Building/text())]'
PASSING xml_response RETURNING CONTENT))
from test
and
select count(XMLQuery(
'declare namespace ns1 = "xyz";
return /ns1:Warehouse[fn:string-length(ns1:Building)=0]'
PASSING xml_response RETURNING CONTENT))
from test
But this is not working.
When I try to run these, it asks for binding values for Building and Warehouse.
Instead of a where clause you should use predicates, which should be faster, for example:
ns1:Warehouse[string-length(ns1:Building)=0]
Do not use local-name(...) if not necessary. Node tests will probably be faster and enable index use. You're also able to remove the string-length(...) call.
Search for <Warehouse/> elements that do not have text nodes below their <Building/> node. If you also want to scan for arbitrary subnodes, use node() instead of text(). If you just want to make sure there's text somewhere, possibly as a child of other nodes, use ns1:Building//text() instead, for example in cases like this: <ns1:Building><foo>bar</foo></ns1:Building>.
This simple XPath expression is doing what you need:
/ns1:Warehouse[not(ns1:Building/text())]
If you need to construct those <Test/> elements, use
for $warehouse in /ns1:Warehouse[not(ns1:Building/text())]
return <Test>{$warehouse/ns1:Building}</Test>
which should be a real drop-in replacement to your XQuery.
I just realized that all you want is the number; in that case it is better to count in XQuery (though I cannot tell you how to read the single result):
count(/ns1:Warehouse[not(ns1:Building/text())])
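For readers who want to check the predicate's behaviour outside Oracle, here is a small Python sketch that applies the same test (a Warehouse root with no text node directly under any Building child) to a few sample documents. It uses the standard-library ElementTree rather than XQuery; the helper `is_empty_warehouse` and the third sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"ns1": "xyz"}  # namespace URI taken from the sample documents


def is_empty_warehouse(xml_text):
    """Mimic /ns1:Warehouse[not(ns1:Building/text())] for one document."""
    root = ET.fromstring(xml_text)
    if root.tag != "{xyz}Warehouse":
        return False
    for building in root.findall("ns1:Building", NS):
        # Text directly inside Building: its leading text or any child's tail.
        if building.text or any(child.tail for child in building):
            return False
    return True


docs = [
    '<ns1:Warehouse whNo="102" xmlns:ns1="xyz"><ns1:Building></ns1:Building></ns1:Warehouse>',
    '<ns1:Warehouse whNo="102" xmlns:ns1="xyz"><ns1:Building>Owned</ns1:Building></ns1:Warehouse>',
    '<ns1:Other xmlns:ns1="xyz"><ns1:Building></ns1:Building></ns1:Other>',
]
count_of_warehouses = sum(is_empty_warehouse(d) for d in docs)
```

Note that the namespace URI has to match the one in the documents exactly for the node test to match anything.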

n-grams from text in PostgreSQL

I am looking to create n-grams from a text column in PostgreSQL. I currently split the data (sentences) in a text column on whitespace into an array:
select regexp_split_to_array(sentenceData,E'\\s+') from tableName
Once I have this array, how do I go about creating a loop to find n-grams and writing each one to a row in another table?
Using unnest I can obtain all the elements of all the arrays on separate rows, and maybe I can then think of a way to get n-grams from a single column, but I'd lose the sentence boundaries, which I wish to preserve.
Sample SQL code for PostgreSQL to emulate the above scenario
create table tableName(sentenceData text);
INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');
INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');
INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');
select regexp_split_to_array(sentenceData,E'\\s+') from tableName;
select unnest(regexp_split_to_array(sentenceData,E'\\s+')) from tableName;
Check out pg_trgm: "The pg_trgm module provides functions and operators for determining the similarity of text based on trigram matching, as well as index operator classes that support fast searching for similar strings."
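Note that pg_trgm works on character trigrams, whereas the question asks about word n-grams per sentence. As a language-neutral illustration of the latter (not a PostgreSQL solution), here is a minimal Python sketch that builds word n-grams sentence by sentence, so no n-gram crosses a sentence boundary. The function names and the (sentence id, position, n-gram) row layout are assumptions.

```python
def word_ngrams(sentence, n):
    """Return the word n-grams of one sentence, split on whitespace."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_rows(sentences, n):
    """Yield (sentence_id, position, ngram) rows, one sentence at a time,
    so that no n-gram spans two sentences."""
    for sentence_id, sentence in enumerate(sentences, start=1):
        for position, ngram in enumerate(word_ngrams(sentence, n), start=1):
            yield sentence_id, position, ngram


sentences = ["This is a long sentence", "Just tonnes of grammar"]
rows = list(ngram_rows(sentences, 2))
```

Keeping the sentence id in each row is what preserves the boundaries; the same shape could be written to a second table, one row per n-gram.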