Parsing location from a search query in postgresql - sql

I have a table of location data that is stored in json format with an attributes column that contains data as below:-
{
"name" : "Common name of a place or a postcode",
"other_name":"Any aliases",
"country": "country"
}
This is indexed as follows:-
CREATE INDEX location_jsonb_ts_vector
ON location
USING gin (jsonb_to_tsvector('simple'::regconfig, attributes,'["string","numeric"]'::jsonb));
I can search this for a location using the query:-
SELECT *
FROM location
WHERE jsonb_to_tsvector('simple'::regconfig, attributes, '["string", "numeric"]'::jsonb) @@ plainto_tsquery('place name')
This works well if just using place names. But I want to search using more complex text strings such as:-
'coffee shops with wifi near charing cross'
'all restaurants within 10 miles of swindon centre'
'london nightlife'
I want to get the location found first and then strip it from the search text and go looking for the items in other tables using my location record to narrow down the scope.
This does not work with my current search mechanism as the intent and requirement pollute the text search vector and can cause odd results. I know this is a NLP problem and needs proper parsing of the search string, but this is for a small proof of concept and needs to work entirely in postgres via SQL or PL/PGSQL.
How can I modify my search to get better matches? I've tried splitting the text into keywords and looking for them individually, but they risk not bringing back results unless combined. For example, "Kings Cross" will bring back "Kings".

I've come up with a cheap and cheerful solution:-
WITH tsv AS (
    SELECT to_tsquery('english', 'football | matches | in | swindon') AS search_vector,
           'football matches in swindon' AS search_text
)
SELECT *
FROM (
    SELECT attributes,
           position(lower(attributes->>'name1') IN lower(search_text)) AS name1_position
    FROM location, tsv
    WHERE jsonb_to_tsvector('simple'::regconfig, attributes, '["string", "numeric"]'::jsonb) @@ search_vector
) loc
ORDER BY name1_position DESC
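To then strip the matched location from the search text, as described above, one rough follow-up step might look like this. It is only a sketch: it reuses the name1 key and keeps just the best-ranked row, and regexp_replace() treats the name as a pattern, so names containing regex metacharacters would need escaping.
WITH tsv AS (
    SELECT to_tsquery('english', 'football | matches | in | swindon') AS search_vector,
           'football matches in swindon' AS search_text
), best_loc AS (
    SELECT attributes,
           search_text,
           position(lower(attributes->>'name1') IN lower(search_text)) AS name1_position
    FROM location, tsv
    WHERE jsonb_to_tsvector('simple'::regconfig, attributes, '["string", "numeric"]'::jsonb) @@ search_vector
    ORDER BY name1_position DESC
    LIMIT 1
)
SELECT attributes,
       -- remove the matched place name, leaving e.g. 'football matches in'
       trim(regexp_replace(lower(search_text), lower(attributes->>'name1'), '', 'g')) AS remaining_terms
FROM best_loc;
The remaining_terms string can then be used against the other tables, with the matched location record narrowing the scope.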

Related

How to Query JSON Within A Database

I would like to query information from databases that were created in this format:
index | label    | key    | data
------+----------+--------+-----------------------------------------------------------
1     | sneaker  | UPC    | {"size": "value", "color": "value", "location": "shelf2"}
2     | location | shelf2 | {"height": "value", "row": "value", "column": "value"}
Where a large portion of the data is in one cell stored in a JSON array. To make matters a bit tricky, the attributes in the JSON aren't in any particular order, and sometimes they reference other cells, i.e. in the above example there is a "location" attribute which has more data in another row. Additionally, sometimes the data cell is a multidimensional array where values are nested inside another JSON array.
I’m seeking to do certain query tasks like
Find all locations that have a sneaker
Or find all sneakers with a particular color etc
What’s the industry accepted solution on how to do this?
These are sqlite databases that I’m currently using DB Browser for SQLite to query. Definitely open to better solutions if they exist.
The design that you have needs SQLite's JSON1 extension.
The tasks that you mention in your question can be accomplished with the use of functions like json_extract().
Find all locations that have a sneaker
SELECT t1.*
FROM tablename t1
WHERE t1.label = 'location'
AND EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.label = 'sneaker'
AND json_extract(t2.data, '$.location') = t1.key
)
Find all sneakers with a particular color
SELECT *
FROM tablename
WHERE label = 'sneaker'
AND json_extract(data, '$.color') = 'blue'
For more complicated tasks, such as getting values out of JSON arrays, there are other functions like json_each().
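For instance, a rough sketch of json_each() for the nested-array case (the nested "colors" array here is invented purely for illustration and is not from the question):
-- Expand a nested array inside data into one row per element,
-- e.g. for rows shaped like {"colors": ["red", "blue"]}.
SELECT t.label, t.key, j.value AS color
FROM tablename t, json_each(t.data, '$.colors') j
WHERE t.label = 'sneaker';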

How can I assign pre-determined codes (1,2,3, etc,) to a JSON-type column in PostgreSQL?

I'm extracting a table of 2000+ rows which are park details. One of the columns is JSON type. Image of the table
We have about 15 attributes like this, and we also have documentation of pre-determined codes assigned to each attribute.
Each row in the extracted table has a different set of attributes that you can see in the image. Right now, I have cast(parks.services AS text) AS "details" to get all the attributes for a particular park or extract just one of them using the code below:
CASE
WHEN cast(parks.services AS text) LIKE '%uncovered%' THEN '2'
WHEN cast(parks.services AS text) LIKE '%{covered%' THEN '1' END AS "details"
This time around, I need to extract these attributes by assigning them the codes. As an example, let's just say
Park 1 - {covered, handicap_access, elevator} to be {1,3,7}
Park 2 - {uncovered, always_open, handicap_access} to be {2,5,3}
I have thought of using a subquery to pre-assign the codes, but I cannot wrap my head around JSON operators; in fact, I don't know how to extract them across 2000+ rows.
It would be helpful if someone could guide me in this topic. Thanks a lot!
You should really think about normalizing your tables. Don't store arrays. You should add a mapping table to map the parks and the attribute codes. This makes everything much easier and more performant.
SELECT
t.name,
array_agg(c.code ORDER BY elems.index) as codes -- 3
FROM mytable t,
unnest(attributes) WITH ORDINALITY as elems(value, index) -- 1
JOIN codes c ON c.name = elems.value -- 2
GROUP BY t.name
1. Extract the array elements into one record per element. Add WITH ORDINALITY to preserve the original order.
2. Join your codes table on the element values.
3. Build the code arrays. To ensure the correct order, use the index values created by the WITH ORDINALITY clause.
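If you do normalize as suggested, the mapping table might look roughly like this. The park, attribute_code and park_attribute names are purely illustrative and not taken from the question:
-- Illustrative only: minimal normalized layout instead of an array or JSON column.
CREATE TABLE park           (id int PRIMARY KEY, name text);
CREATE TABLE attribute_code (id int PRIMARY KEY, name text);  -- e.g. 1 = covered, 2 = uncovered, ...
CREATE TABLE park_attribute (
    park_id      int REFERENCES park(id),
    attribute_id int REFERENCES attribute_code(id),
    PRIMARY KEY (park_id, attribute_id)
);

-- The per-park code list then falls out of a plain join:
SELECT p.name, array_agg(pa.attribute_id) AS codes
FROM park p
JOIN park_attribute pa ON pa.park_id = p.id
GROUP BY p.name;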

Index for comparing to beginning of every word in a column

So I have a table
id | name | gender
---+-----------------+-------
0 | Markus Meskanen | M
1 | Jack Jackson | M
2 | Jane Jackson | F
And I've created an index
CREATE INDEX people_name_idx ON people (LOWER(name));
And then I query with
SELECT * FROM people WHERE name LIKE LOWER('Jack%');
Here %(name)s stands for the user's input ('Jack' in the example). However, it now matches only against the beginning of the whole column value, but I'd like it to match the beginning of any of the words. I'd prefer not to use '%Jack%' since it would also return invalid results from the middle of a word.
Is there a way to create an index so that each word gets a separate row?
Edit: If the name is something long like 'Michael Jackson's First Son Bob' it should match to the beginning of any of the words, i.e. Mich would match to Michael and Fir would match to First but ackson wouldn't match to anything since it's not from the beginning.
Edit 2: And we have 3 million rows so performance is an issue, thus I'm looking at indexes mostly.
Postgres has two index types to help with full text searches: GIN and GIST indexes (and I think GIN is the more commonly used one).
There is a brief overview of these index types in the documentation, more extensive documentation for each index class, and plenty of blog posts on the subject.
These can speed up the searches that you are trying to do.
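For example (a sketch only, not from the original answer), a GIN index over a tsvector of the name column can back a word-prefix search; the 'simple' configuration avoids stemming of names:
-- Index the individual words of name.
CREATE INDEX people_name_fts_idx
    ON people
    USING gin (to_tsvector('simple', name));

-- Prefix match against the start of any word: 'Mich' finds 'Michael Jackson'.
SELECT *
FROM people
WHERE to_tsvector('simple', name) @@ to_tsquery('simple', 'Mich:*');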
The pg_trgm module does exactly what you want.
You need to create either:
CREATE INDEX people_name_idx ON people USING GIST (name gist_trgm_ops);
Or:
CREATE INDEX people_name_idx ON people USING GIN (name gin_trgm_ops);
After that, these queries could use one of the indexes above:
SELECT * FROM people WHERE name ILIKE '%Jack%';
SELECT * FROM people WHERE name ~* '\mJack';
As @GordonLinoff answered, full text search is also capable of prefix matching. But FTS is not designed to do that efficiently; it is best at matching lexemes. Still, if you want the best performance, I advise you to give it a try too and measure each approach. In FTS, your query looks something like this:
SELECT * FROM people WHERE to_tsvector('english', name) @@ to_tsquery('english', 'Jack:*');
Note: if your query filter (Jack) comes from user input, all of the queries above need some protection: in the ILIKE one you need to escape the % and _ characters, in the regexp one you need to escape a lot more, and in the FTS one you will need to parse the input with some parser and generate a valid tsquery, because to_tsquery() will raise an error if its parameter is not valid (and with plainto_tsquery() you cannot use a prefix-matching query).
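As a small illustration of the ILIKE escaping (just one way to do it; backslash is the default LIKE escape character in PostgreSQL, and 'Jack_%' here stands in for arbitrary user input):
SELECT *
FROM people
WHERE name ILIKE '%' ||
      -- escape the backslash first, then the LIKE wildcards % and _
      replace(replace(replace('Jack_%', '\', '\\'), '%', '\%'), '_', '\_')
      || '%';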
Note 2: the regexp variant with name ~* '\mJack' will work best with English names. If you want to cover the whole range of Unicode (i.e. names containing characters like æ), you'll need a slightly different pattern, something like:
SELECT * FROM people WHERE name ~* '(^|\s|,)Jack';
This will work with most names, and it will also behave like a real prefix match for names such as O'Brian.
You can use regular expressions to find text inside name:
create table ci(id int, name text);
insert into ci values
(1, 'John McEnroe Blackbird Petrus'),
(2, 'Michael Jackson and Blade');
select id, name
from ci
where name ~ 'Pe+'
;
Returns:
1 John McEnroe Blackbird Petrus
Or you can use something similar where substring(name from <regex exp>) is not null.
Check it here: http://rextester.com/LHA16094
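For instance, the substring() variant mentioned above might look like this (same sample table and pattern as in the answer):
-- Rows where the pattern finds no match return NULL and are filtered out.
SELECT id, name
FROM ci
WHERE substring(name from 'Pe+') IS NOT NULL;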
If you know that the words are space separated, you can do:
SELECT * FROM people WHERE name LIKE LOWER('Jack%') OR name LIKE LOWER('% Jack%');
For more control you can use RegEx with MySQL;
see https://dev.mysql.com/doc/refman/5.7/en/regexp.html

SQL Server CONTAINS search not giving result as expected

select * from table1 where contains(searchWord, '"*comfort*"')
I want the result to include
Uncomfortable
with the search word in the middle of the word, but it is showing
comfort xyz
only.
You do not need the CONTAINS function here. CONTAINS searches for precise or fuzzy (less precise) matches to single words and phrases, words within a certain distance of one another, or weighted matches in SQL Server; it matches against whole words in the full-text index, so it will not find 'comfort' in the middle of 'Uncomfortable'.
You need a simple LIKE predicate for the required result.
select * from table1 where searchWord like '%comfort%'

Search criteria difference between LIKE vs CONTAINS() in Oracle

I created a table with two columns. I inserted two rows:
id name
1 narsi reddy
2 narei sia
One column is simply NUMBER type and the other is CLOB type, so I decided to use a text index on it. I queried it by using CONTAINS.
query:
select * from emp where contains(name,'%a%e%')>0
2 narei sia
I expected both rows to come back, but they did not. But if I run the same search with LIKE, it gives what I wanted.
query:
select * from emp where name like '%a%e%'
ID NAME
1 (CLOB) narsi reddy
2 (CLOB) narei sia
2 rows selected
Finally I understood that LIKE searches the whole document or paragraph, but CONTAINS looks at individual words.
So how can I get the required output?
LIKE and CONTAINS are fundamentally different methods for searching.
LIKE is a very simple string pattern matcher - it recognises two wildcards (%) and (_) which match zero-or-more, or exactly-one, character respectively. In your case, %a%e% matches two records in your table - it looks for zero or more characters followed by a, followed by zero or more characters followed by e, followed by zero or more characters. It is also very simplistic in its return value: it either returns "matched" or "not matched" - no shades of grey.
CONTAINS is a powerful search tool that uses a context index, which builds a kind of word tree which can be searched using the CONTAINS search syntax. It can be used to search for a single word, a combination of words, and has a rich syntax of its own, such as boolean operators (AND, NEAR, ACCUM). It is also more powerful in that instead of returning a simple "matched" or "not matched", it returns a "score", which can be used to rank results in order of relevance; e.g. CONTAINS(col, 'dog NEAR cat') will return a higher score for a document where those two words are both found close together.
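To illustrate the scoring side (a sketch only, assuming the Oracle Text CONTEXT index from the question already exists on name; the label 1 ties the CONTAINS call to its SCORE):
-- Rank matches by relevance using the score returned by CONTAINS.
SELECT id, name, SCORE(1) AS relevance
FROM emp
WHERE CONTAINS(name, 'narei OR reddy', 1) > 0
ORDER BY SCORE(1) DESC;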
I believe that your CONTAINS query is matching 'narei sia' because the pattern '%a%e%' matches the word 'narei'. It does not match against 'narsi reddy' because neither word, taken individually, matches the pattern.
I assume you want to use CONTAINS instead of LIKE for performance reasons. I am not by any means an expert on CONTAINS query expressions, but I don't see a simple way to do the exact search you want, since you are looking for letters that can be in the same word or different words, but must occur in a given order. I think it may be best to do a combination of the two techniques:
WHERE CONTAINS(name,'%a% AND %e%') > 0
AND name LIKE '%a%e%'
I think this would allow the text index to be used to find candidate matches (anything which has at least one word containing 'a' and at least one word containing 'e'). These would then be filtered by the LIKE condition, enforcing the requirement that 'a' precede 'e' in the string.