Combine query to get all the matching search text in right order - sql

I have the following table:
postgres=# \d so_rum;
Table "public.so_rum"
Column | Type | Collation | Nullable | Default
-----------+-------------------------+-----------+----------+---------
id | integer | | |
title | character varying(1000) | | |
posts | text | | |
body | tsvector | | |
parent_id | integer | | |
Indexes:
"so_rum_body_idx" rum (body)
I wanted to do phrase search query, so I came up with the below query, for example:
select id from so_rum
where body @@ phraseto_tsquery('english','Is it possible to toggle the visibility');
This returns only the documents that match the entire phrase. However, there are documents where the distance between the lexemes is larger, and the above query doesn't return them. For example, 'it is something possible to do toggle between the. . . visibility' is not returned. I know I can get it returned by manually using a distance operator such as <2> in to_tsquery.
But I want to understand how to do this in the SQL statement itself, so that I get the results first with a distance of 1, then 2, and so on (maybe up to 6-7). Finally, append the results for the plain match on the search words, as in the following query:
select count(id) from so_rum
where body @@ to_tsquery('english','string & string . . . ')
Is it possible to do this in a single query with good performance?

I don't see a canned solution to this. It sounds like you need to use plainto_tsquery to get all the results with all the lexemes, and then implement your own custom ranking function to rank them by distance between the lexemes, and maybe filter out ones with the wrong order.
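As a rough sketch of that idea, and not a drop-in answer: the built-in ts_rank_cd (cover density ranking) already takes the proximity of the matching lexemes into account, so it can stand in for a hand-written distance ranking. It assumes body keeps lexeme positions, and it does not enforce word order; the extra sort key below only pushes exact phrase matches to the top. The aliases q, phrase, is_exact_phrase and proximity_rank are made up for illustration.

```sql
-- Filter with the AND-of-all-lexemes query, then sort by proximity.
-- ts_rank_cd is used here as a stand-in for a custom distance ranking.
SELECT s.id,
       s.body @@ phrase      AS is_exact_phrase,
       ts_rank_cd(s.body, q) AS proximity_rank
FROM   so_rum AS s
CROSS  JOIN plainto_tsquery('english', 'Is it possible to toggle the visibility') AS q
CROSS  JOIN phraseto_tsquery('english', 'Is it possible to toggle the visibility') AS phrase
WHERE  s.body @@ q
ORDER  BY is_exact_phrase DESC, proximity_rank DESC;
```

Rows where the words appear in the wrong order are still returned; they would have to be filtered out separately if that matters.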

Related

Split string and Pivot Result - SQL Server 2012

I am using SQL Server 2012 and I have a table called XMLData that looks like this:
| Tag | Attribute |
|--------------|-----------------------------|
| tag1 | Cantidad=222¬ClaveProdServ=1|
| tag1 | Cantidad=333¬ClaveProdServ=2|
The column Tag has many repeated values, what is different is the column Attribute that has a string of attributes separated by "¬". I want to separate the list of attributes and then pivot the table so the tags are the column names.
The result I want is like this:
| tag1 | tag1 |
|-----------------|----------------|
| Cantidad=222 | Cantidad=333 |
| ClaveProdServ=1 | ClaveProdServ=2|
I have a custom-made function that splits the string, since SQL Server 2012 doesn't have a built-in function that does this. The function receives a string and the delimiter as parameters, like so:
select *
from [dbo].[Split]('lol1,lol2,lol3,lol4',',')
this function will return this:
| item |
|--------|
| lol1 |
| lol2 |
| lol3 |
I can't find a way to pass the values of the column Attribute as parameter of this function, something like this:
SELECT *
FROM Split(A.Attribute,'¬'),XMLData A
And then use the values of the Tag column as the column names for each set of attributes.
My magic crystal ball tells me that you have, for whatever reason, decided to do it this way, and that any comments about not storing CSV data are just annoying to you.
However...
If this is just a syntax issue, try it like this:
SELECT t.Tag
,t.Attribute
,splitted.item
FROM YourTable AS t
CROSS APPLY dbo.Split(t.Attribute,'¬') AS splitted
Otherwise show some more relevant details. Please read How to ask a good SQL question and How to create a MCVE

How to find similar words in Full text search on postgresql?

I'm trying to use Full text search on postgresql:
select *
from entertainement
where to_tsvector('simple', name) @@ to_tsquery('simple', 'word_to_search:*')
This query works well and gives me what I want to display. However, I have seen that on some websites, when I enter a word that is not found, the site shows No result found for 'word_to_search' and suggests some similar words.
For example if I put the word activityng I got
No result found for activityng
But it gives me some suggestions containing the word activity. However, when I enter the word activityns I get:
No result found for activityns
But I get some suggestions containing the word activities. I don't understand the logic, because I think activityns is more similar to activity than to activities.
I tried to check the similarity of these words using similarity from pg_trgm and I got:
select similarity('activity','activityns');
similarity: 0,6666667
select similarity('activities','activityns');
similarity: 0,4666667
Are there any other ways to detect similarity between words that give more precise results?
FTS first reduces each token to a lexeme and then compares lexemes, while trigram similarity compares three-letter sequences. You can't compare the results of such different algorithms. Here is an example for FTS (showing why one is closer to another in your sample):
t=# with w(v) as (values('activityns'),('activity'),('activities'),('activit'))
select to_tsvector(v),v, to_tsvector(v) @@ to_tsquery('activ:*'),to_tsvector(v) @@ to_tsquery('activity'),to_tsvector(v) @@ to_tsquery('activit:*') from w;
to_tsvector | v | ?column? | ?column? | ?column?
---------------+------------+----------+----------+----------
'activityn':1 | activityns | t | f | t
'activ':1 | activity | t | t | f
'activ':1 | activities | t | t | f
'activit':1 | activit | t | f | t
(4 rows)
Look at the lexeme each word is reduced to (first column), and at what the wildcard queries return depending on that lexeme (columns 3, 4 and 5).
SELECT
courses.id,
courses.title,
courses.description,
rank_title,
rank_description,
similarity
FROM courses,
to_tsvector(courses.title || courses.description) document,
to_tsquery('sales') query,
NULLIF(ts_rank(to_tsvector(courses.title), query), 0) rank_title,
NULLIF(ts_rank(to_tsvector(courses.description), query), 0) rank_description,
SIMILARITY('sales', courses.title || courses.description) similarity
WHERE query @@ document OR similarity > 0
ORDER BY rank_title, rank_description, similarity DESC NULLS LAST
https://leandronsp.com/a-powerful-full-text-search-in-postgresql-in-less-than-20-lines
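For the "did you mean" style suggestions themselves, one option is the approach described in the pg_trgm documentation: build a table of the distinct words that occur in the searched column and look up the closest ones by trigram similarity. This is only a sketch; the table name words and the index name words_trgm_idx are assumptions, and the word list has to be refreshed when the data changes.

```sql
-- pg_trgm provides similarity() and the % operator.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Collect every distinct word of the searched column ("simple" config, no stemming).
-- "words" is a hypothetical helper table.
CREATE TABLE words AS
SELECT word
FROM   ts_stat('SELECT to_tsvector(''simple'', name) FROM entertainement');

-- Trigram index so the % operator does not have to scan the whole table.
CREATE INDEX words_trgm_idx ON words USING gin (word gin_trgm_ops);

-- Suggest the closest existing words for a search term that returned nothing.
SELECT word, similarity(word, 'activityns') AS sml
FROM   words
WHERE  word % 'activityns'
ORDER  BY sml DESC, word
LIMIT  5;
```

The % operator only keeps words above the pg_trgm similarity threshold, so very distant words never show up as suggestions.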

Unexpected result from CASE referencing another expression

The following statement always returns the result from st_area(st_buffer(polygon,100)).
select st_Area(polygon) as area,
case when area>100000 then st_area(st_buffer(polygon,500))
else st_area(st_buffer(polygon,100))
end from polygons limit 10;
area | st_area
------------------+------------------
383287.287473659 | 723738.615102036
47642.5395246768 | 192575.823383778
45546.753026985 | 174122.420564731
435204.455923533 | 725419.735987631
839954.564052786 | 1268251.88626391
315213.27742828 | 630424.785088617
966620.061916605 | 1447647.57269461
38446.6010009923 | 151584.647252579
82576.1182937309 | 238095.988431594
321682.125463567 | 695462.262796463
(10 rows)
st_area should have been the result of st_buffer(polygon,500) when area>100000 as shown below:
area | st_area
------------------+------------------
383287.287473659 | 2702203.34758147
47642.5395246768 | 192575.823383778
45546.753026985 | 174122.420564731
435204.455923533 | 2507469.89929028
839954.564052786 | 3568866.96452707
315213.27742828 | 2453576.33477712
966620.061916605 | 3953365.12876066
38446.6010009923 | 151584.647252579
82576.1182937309 | 238095.988431594
321682.125463567 | 2628693.69179652
(10 rows)
Can someone explain?
It doesn't become completely clear from the question (yet), but my educated guess is you want this:
SELECT st_Area(polygon) AS area -- or pick some other name!
, CASE WHEN st_Area(polygon) > 100000
THEN st_area(st_buffer(polygon,500))
ELSE st_area(st_buffer(polygon,100)) END AS st_area
FROM polygons
LIMIT 10;
You cannot reference the column alias (name of the output column) in another item of the same SELECT list. You can only reference input column names. So you have to repeat the expression or use a subquery:
SELECT area
, CASE WHEN area > 100000
THEN st_area(st_buffer(polygon,500))
ELSE st_area(st_buffer(polygon,100)) END AS st_area
FROM (SELECT st_Area(polygon) AS area, polygon FROM polygons LIMIT 10) sub;
Normally you would get an error immediately (column "area" does not exist). Evidently, there is another column named area in your base table, and the CASE expression refers to that one. Hence the confusion. Additional wisdom to take away from this:
It's better to use a name different from any input column when attaching an alias to an output column.
Always include table definitions in questions. Clarifies a lot.
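A third variant, sketched under the assumption of PostgreSQL 9.3 or later: compute the area once in a LATERAL subquery and always refer to it with a table qualifier, so it cannot be confused with a base-table column of the same name. The aliases p, a and buffered_area are made up for illustration.

```sql
-- a.area is the freshly computed value; p.area (if such a column exists) is never touched.
SELECT a.area,
       CASE WHEN a.area > 100000
            THEN st_area(st_buffer(p.polygon, 500))
            ELSE st_area(st_buffer(p.polygon, 100))
       END AS buffered_area
FROM   polygons AS p
CROSS  JOIN LATERAL (SELECT st_area(p.polygon) AS area) AS a
LIMIT  10;
```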

How to select everything that is NOT part of this string in database field?

First: I'm using Access 2010.
What I need to do is pull everything in a field out that is NOT a certain string. Say for example you have this:
00123457*A8V*
Those last 3 characters that are bolded are just an example; that portion can be any combination of numbers/letters and from 2-4 characters long. The 00123457 portion will always be the same. So what I would need to have returned by my query in the example above is the "A8V".
I have a vague idea of how to do this, which involved using the Right function, with (field length - the last position in that string). So what I had was
SELECT Right(Facility.ID, (Len([ID) - InstrRev([ID], "00123457")))
FROM Facility;
Logically, in my mind, this would work; however, Access 2010 complains that I am using the Right function incorrectly. Can someone here help me figure this out?
Many thanks!
Why not use a replace function?
REPLACE(Facility.ID, "00123457", "")
You are missing a closing square bracket here: Len([ID)
You also need to reverse this "00123457" in InStrRev(), but you don't need InStrRev(), just InStr().
If I understand correctly, you want the last three characters of the string.
The simple syntax: Right([string],3) will yield the results you desire.
(http://msdn.microsoft.com/en-us/library/ms177532.aspx)
For example:
(TABLE1)
| ID | STRING |
------------------------
| 1 | 001234567A8V |
| 2 | 008765432A8V |
| 3 | 005671234A8V |
So then you'd run this query:
SELECT Right([Table1].[STRING],3) AS Result FROM Table1;
And the Query returns:
(QUERY)
| RESULT |
---------------
| A8V |
| A8V |
| A8V |
EDIT:
After seeing the need for the end string to be 2-4 characters while the original, left portion of the string is 00123457 (8 characters), try this:
SELECT Right([Table1].[string], Len([Table1].[string]) - 8) AS Result
FROM table1;

Match a Query to a Regular Expression in SQL?

I'm trying to find a way to match a query to a regular expression in a database. As far as I can tell (although I'm no expert), while most DBMS like MySQL have a regex option for searching, you can only do something like:
Find all rows in Column 1 that match the regex in my query.
What I want to be able to do is the opposite, i.e.:
Find all rows in Column 1 such that the regex in Column 1 matches my query.
Simple example - say I had a database structured like so:
+----------+-----------+
| Column 1 | Column 2 |
+----------+-----------+
| [a-z]+ | whatever |
+----------+-----------+
| [\w]+ | whatever |
+----------+-----------+
| [0-9]+ | whatever |
+----------+-----------+
So if I queried "dog", I would want it to return the rows with [a-z]+ and [\w]+, and if I queried 123, it would return the row with [0-9]+.
If you know of a way to do this in SQL, a short SELECT example or a link with an example would be much appreciated.
For MySQL (and maybe other databases too):
SELECT * FROM table WHERE "dog" RLIKE(`Column 1`)
In PostgreSQL it would be:
SELECT * FROM table WHERE 'dog' ~ "Column 1";
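One caveat worth keeping in mind: the stored patterns are unanchored, so they match anywhere inside the input, and a query string such as dog123 would match all three example rows. If the whole input has to match, the stored pattern can be wrapped in anchors at query time. A PostgreSQL sketch, with patterns as a hypothetical table name:

```sql
-- Hypothetical table "patterns"; "Column 1" holds the stored regex.
-- Wrapping the stored pattern in ^(...)$ forces it to match the whole input.
SELECT *
FROM   patterns
WHERE  'dog' ~ ('^(' || "Column 1" || ')$');
```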