How to improve the performance of selecting the longest prefix within a table - sql

There is a table holding some values like:
id | prefix      | name
---+-------------+-------------------
 1 | record1     | name for record 1
 2 | record2     | name for record 2
 3 | record      | name for record 3
 4 | another rec | name for record 4
In order to select the longest prefix of a given text and return the name I use the following SQL:
select top 1 name from prefixes where :text like prefix + '%' order by prefix desc
This does exactly what I need: when I give the text record2 it returns name for record 2, when I give record1 it returns name for record 1, and if I give a it returns name for record 4.
The problem is that this query is executed a few times and the table is updated a lot. With a table of just 210000 rows, it takes around 300ms, and I would like to reduce this. Is there something that could be improved in the query or in the database itself?

I don't know Sybase internals really well. However, look at the plan to see if it is using the index. If so, is it doing a full scan of the index, or is the engine smart enough to understand the "like"?
My guess is that the engine is doing a full scan. You might be able to trick it into seeking to the right location by changing the query. A matching prefix can never sort after :text, so add an upper bound:
where prefix <= :text and :text like prefix + '%'
However, it will probably still scan everything before that point. You can fix this by also giving it a minimum place to search. Every non-empty matching prefix starts with the first character of :text:
where prefix >= substring(:text, 1, 1) and prefix <= :text and :text like prefix + '%'
(This assumes a collating sequence in which a string sorts at or after each of its own prefixes, which holds for a binary or ASCII collating sequence.)
Are your prefixes known in advance? That is, for "record1" is the prefix always "record"? Or are you considering "r", "re", and so on?
If the former, then add a new column which contains the "base" part of the prefix. Build an index on this column and change the join to equality. The engine will fetch only the records from the index.
The issue of having the column "name" in the index is to prevent the additional step of looking up the name on the data pages for the table. Once again, this depends on how Sybase optimizes the query. It should find the appropriate records only using the index and then look up the fields after applying the top 1. However, if it fetches all the values, then applies the top 1, having "name" in the index will be a benefit.
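One way to get an equality lookup like the one described above, even when the base prefixes are not known in advance, is to generate every leading substring of the search text in application code and ask the database for an exact match on any of them. The following is a minimal sketch and not Sybase-specific: it assumes Python's sqlite3 module with an in-memory copy of the sample table, and the helper name longest_prefix_name is made up for illustration.

```python
import sqlite3

def longest_prefix_name(conn, text):
    """Hypothetical helper: name of the longest stored prefix of `text`, or None."""
    # Every candidate is a leading substring of the search text.
    candidates = [text[:i] for i in range(1, len(text) + 1)]
    if not candidates:
        return None
    placeholders = ",".join("?" * len(candidates))
    # Equality (IN) lookups can use the index on prefix directly.
    row = conn.execute(
        "SELECT name FROM prefixes "
        f"WHERE prefix IN ({placeholders}) "
        "ORDER BY length(prefix) DESC LIMIT 1",
        candidates,
    ).fetchone()
    return row[0] if row else None

# Sample data taken from the question; the index name is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefixes (id INTEGER PRIMARY KEY, prefix TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_prefixes_prefix ON prefixes (prefix)")
conn.executemany(
    "INSERT INTO prefixes VALUES (?, ?, ?)",
    [
        (1, "record1", "name for record 1"),
        (2, "record2", "name for record 2"),
        (3, "record", "name for record 3"),
        (4, "another rec", "name for record 4"),
    ],
)

print(longest_prefix_name(conn, "record1"))
print(longest_prefix_name(conn, "records"))
```

The number of bound parameters grows with the length of the search text, so this only suits reasonably short inputs.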


Efficiently return words that match, or whose synonym(s), match a keyword

I have a database of industry-specific terms, each of which may have zero or more synonyms. Users of the system can search for terms by keyword and the results should include any term that contains the keyword or that has at least one synonym that contains the keyword. The result should then include the term and ONLY ONE of the matching synonyms.
Here's the setup... I have a term table with 2 fields: id and term. I also have a synonym table with 3 fields: id, termId, and synonym. So there would be data like:
term Table
id | term
-- | -----
1 | dog
2 | cat
3 | bird
synonym Table
id | termId | synonym
-- | ------ | --------
1 | 1 | canine
2 | 1 | man's best friend
3 | 2 | feline
A keyword search for (the letter) "i" should return the following as a result:
id | term | synonym
-- | ------ | --------
1 | dog | canine <- because of the "i" in "canine"
2 | cat | feline <- because of the "i" in "feline"
3 | bird | <- because of the "i" in "bird"
Notice how, even though both "dog" synonyms contain the letter "i", only one was returned in the result (doesn't matter which one).
Because I need to return all matches from the term table regardless of whether or not there's a synonym and I need no more than 1 matching synonym, I'm using an OUTER APPLY as follows:
SELECT
    term.id,
    term.term,
    synonyms.synonym
FROM
    term
    OUTER APPLY (
        SELECT
            TOP 1
            term.id,
            synonym.synonym
        FROM
            synonym
        WHERE
            term.id = synonym.termId
            AND synonym.synonym LIKE @keyword
    ) AS synonyms
WHERE
    term.term LIKE @keyword
    OR synonyms.synonym LIKE @keyword
There are indexes on term.term, synonym.termId and synonym.synonym. @keyword is always something like '%foo%'. The problem is that, with close to 50,000 terms (not that much for databases, I know, but...), the performance is horrible. Any thoughts on how this can be done more efficiently?
Just a note, one thing I had thought to try was flattening the synonyms into a comma-delimited list in the term table so that I could get around the OUTER APPLY. Unfortunately though, that list can easily exceed 900 characters which would then prevent SQL Server from adding an index to that column. So that's a no-go.
Thanks very much in advance.
You've got a lot of unnecessary logic in there. There's no telling what execution plan SQL Server will create for it. It's simpler and more efficient to split this up into two separate db calls and then merge them in your code:
Get matches based on synonyms:
SELECT
    term.id
    ,term.term
    ,synonym.synonym
FROM
    term
    INNER JOIN synonym ON term.id = synonym.termId
WHERE
    synonym.synonym LIKE @keyword
Get matches based on terms:
SELECT
    term.id
    ,term.term
FROM
    term
WHERE
    term.term LIKE @keyword
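The merge step in application code could look roughly like the following. This is only a sketch under stated assumptions: it uses Python with an in-memory SQLite database standing in for SQL Server, the function name search_terms is hypothetical, and "first synonym by id" is an arbitrary way of picking the single synonym to return.

```python
import sqlite3

def search_terms(conn, keyword):
    """Hypothetical helper: run the two queries separately and merge by term id."""
    pattern = f"%{keyword}%"
    results = {}
    # Query 1: terms with a matching synonym; keep only the first synonym per term.
    for term_id, term, synonym in conn.execute(
        "SELECT term.id, term.term, synonym.synonym "
        "FROM term INNER JOIN synonym ON term.id = synonym.termId "
        "WHERE synonym.synonym LIKE ? ORDER BY synonym.id",
        (pattern,),
    ):
        results.setdefault(term_id, (term_id, term, synonym))
    # Query 2: terms that match directly; add them only if not already present.
    for term_id, term in conn.execute(
        "SELECT id, term FROM term WHERE term LIKE ?", (pattern,)
    ):
        results.setdefault(term_id, (term_id, term, None))
    return [results[key] for key in sorted(results)]

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (id INTEGER PRIMARY KEY, term TEXT)")
conn.execute("CREATE TABLE synonym (id INTEGER PRIMARY KEY, termId INTEGER, synonym TEXT)")
conn.executemany("INSERT INTO term VALUES (?, ?)", [(1, "dog"), (2, "cat"), (3, "bird")])
conn.executemany(
    "INSERT INTO synonym VALUES (?, ?, ?)",
    [(1, 1, "canine"), (2, 1, "man's best friend"), (3, 2, "feline")],
)

for row in search_terms(conn, "i"):
    print(row)
```

Keying the merged results by term id is what guarantees at most one row, and therefore at most one synonym, per term.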
Regarding "flattening the synonyms into a comma-delimited list in the term table": have you considered using the Full Text Search feature? It would be much faster, even as your data grows.
You can put all synonyms (comma-delimited) in a "synonym" column and put a full-text index on it.
If you also want results for synonyms of the search words, I recommend using FREETEXT. This is an example:
SELECT Title, Text, * FROM [dbo].[Post] where freetext(Title, 'phone')
The previous query matches 'phone' by its meaning, not just the exact word. It also compares the inflectional forms of the words. In this case it can return any title that has 'mobile', 'telephone', 'smartphone', etc., provided the full-text thesaurus defines those words as synonyms.
Take a look at this article about SQL Server Full Text Search, hope it helps

SQL Server Full Text Search to find containing characters

I have a table with a column Document that has a full-text index.
Let's say I have this in the table:
| ID | Document |
| 1 | WINTER SUMMER SPRING OTHER |
My requirement is to find rows that contain 'ER'.
For this I am querying like this:
SELECT TOP 100
[FullTextSearch].[Document], [FullTextSearch].[ID]
FROM
[FullTextSearch]
WHERE
CONTAINS(Document, '"*ER*"')
But this is not working.
Please suggest the best way to do this using full-text search.
I expect the row with ID 1 to be returned.
You can use the LIKE operator to find the value.
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are two wildcards used in conjunction with the LIKE operator:
% - The percent sign represents zero, one, or multiple characters
_ - The underscore represents a single character
Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;
This query can help to find the result.
SELECT Document,ID FROM FullTextSearch
WHERE Document LIKE '%ER%';
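The two wildcards described above can be tried out quickly. This is a small sketch using Python's sqlite3 module rather than SQL Server, so details such as case sensitivity may differ; the second row and the helper name matching_ids are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FullTextSearch (ID INTEGER PRIMARY KEY, Document TEXT)")
conn.executemany(
    "INSERT INTO FullTextSearch VALUES (?, ?)",
    [(1, "WINTER SUMMER SPRING OTHER"), (2, "AUTUMN")],  # row 2 is made up
)

def matching_ids(pattern):
    """Hypothetical helper: IDs of rows whose Document matches the LIKE pattern."""
    rows = conn.execute(
        "SELECT ID FROM FullTextSearch WHERE Document LIKE ? ORDER BY ID",
        (pattern,),
    )
    return [row[0] for row in rows]

print(matching_ids("%ER%"))     # ER anywhere in the value
print(matching_ids("WINT_R%"))  # _ stands for exactly one character
print(matching_ids("ER%"))      # value must start with ER
```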
It's a wildcard query. This should work:
SELECT TOP 100
[FullTextSearch].[Document], [FullTextSearch].[ID]
FROM
[FullTextSearch]
WHERE
Document like '%ER%'
========OR=============
SELECT TOP 100
[FullTextSearch].[Document], [FullTextSearch].[ID]
FROM
[FullTextSearch]
WHERE
CONTAINS(Document, '%ER%')

Is condensing the number of columns in a database beneficial?

Say you want to record three numbers for every Movie record...let's say, :release_year, :box_office, and :budget.
Conventionally, using Rails, you would just add those three attributes to the Movie model and just call @movie.release_year, @movie.box_office, and @movie.budget.
Would it save any database space or provide any other benefits to condense all three numbers into one umbrella column?
So when adding the three numbers, it would go something like:
def update
  ...
  @movie.umbrella = [params[:movie_release_year],
                     params[:movie_box_office],
                     params[:movie_budget]].join(",")
end
So the final @movie.umbrella value would be along the lines of "2015,617293,748273".
And then in the controller, to access the three values, it would be something like
@umbrella_array = @movie.umbrella.strip.split(',').map(&:strip)
@release_year = @umbrella_array.first
@box_office = @umbrella_array.second
@budget = @umbrella_array.third
This way, it would be the same amount of data (actually a little more, with the extra commas) but stored only in one column. Would this be better in any way than three columns?
There is no benefit in squeezing such attributes in a single column. In fact, following that path will increase the complexity of your code and will limit your capabilities.
Here are some of the possible issues you'll face:
You will not be able to add indexes to speed up looking up or filtering records by a specific attribute value
You will not be able to query a specific attribute value
You will not be able to sort by a specific column value
The values will be stored and represented as Strings, rather than Integers
... and I can continue. There are no advantages, only disadvantages.
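To make the sorting and querying points concrete, here is a small sketch in Python with an in-memory SQLite table (not Rails). The movie titles and figures are invented, and both layouts are stored side by side purely for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, "
    "release_year INTEGER, box_office INTEGER, budget INTEGER, umbrella TEXT)"
)
# Invented sample rows: (id, title, release_year, box_office, budget).
rows = [
    (1, "A", 2015, 617293, 748273),
    (2, "B", 2009, 95000, 1200000),
    (3, "C", 2015, 2000000, 90000),
]
for movie_id, title, year, box_office, budget in rows:
    conn.execute(
        "INSERT INTO movies VALUES (?, ?, ?, ?, ?, ?)",
        (movie_id, title, year, box_office, budget, f"{year},{box_office},{budget}"),
    )
# A real column can be indexed; a fragment of a packed string cannot.
conn.execute("CREATE INDEX idx_movies_budget ON movies (budget)")

# Separate integer columns sort numerically and filter directly.
by_budget = [r[0] for r in conn.execute("SELECT title FROM movies ORDER BY budget")]
big_2015 = [
    r[0]
    for r in conn.execute(
        "SELECT title FROM movies WHERE release_year = 2015 AND budget > 100000"
    )
]
# The packed column only sorts as text, starting from the year.
by_umbrella = [r[0] for r in conn.execute("SELECT title FROM movies ORDER BY umbrella")]

print(by_budget, big_2015, by_umbrella)
```

With the packed column, ordering by budget or filtering on it would require parsing the string for every row first.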
I agree with the comments above. As an example, try using pg_column_size() to compare the sizes:
WITH test(data_txt,data_int,data_date) AS ( VALUES
('9999'::TEXT,9999::INTEGER,'2015-01-01'::DATE),
('99999999'::TEXT,99999999::INTEGER,'2015-02-02'::DATE),
('2015-02-02'::TEXT,99999999::INTEGER,'2015-02-02'::DATE)
)
SELECT pg_column_size(data_txt) AS txt_size,
pg_column_size(data_int) AS int_size,
pg_column_size(data_date) AS date_size
FROM test;
Result is :
txt_size | int_size | date_size
----------+----------+-----------
5 | 4 | 4
9 | 4 | 4
11 | 4 | 4
(3 rows)

Postgres text search on multiple rows

I have a table called 'exclude' that contains hashtags:
id | tag
---+----------
 1 | #oxford
 2 | #uk
 3 | #england
I have another table called 'post':
id | tags              | text
---+-------------------+-----------------------
 1 | #oxford #funtimes | Sitting in the sun
 2 | #oz               | Beach fun
 3 | #england          | Milk was a bad choice
In order to do a text search on the posts' tags I've been running a query as follows:
SELECT * FROM post WHERE to_tsvector(tags) @@ plainto_tsquery('mysearchterm')
However, I now want to be able to exclude all posts where some or all of the tags are in my exclude table. Is there any easy way to do this in SQL/Postgres?
I tried converting the tag rows into a single string and using it within the plainto_tsquery function, but it doesn't work. (I also don't know how to do a 'not equal' text search, so the logic is actually wrong, albeit along the right lines in my mind):
select * from post where to_tsvector(tags) @@ plainto_tsquery(
select array_to_string(array(select RTRIM(value) from exclude where key = 'tag'), ' ')
)
What version of PostgreSQL are you on? And how flexible is your schema design? In other words, can you change it at will? Or is this out of your control?
Two things immediately came to mind when I read your question. One is that you should be able to use an array and the @> (contains) or <@ (is contained by) operators.
Here is documentation
Second, you might be able to utilize an hstore and do a similar operation:
hstore @> hstore
It's not a true hstore, because you are not using a real key=>value pair. But, I guess you could do {tagname}=True or {tagname}=NULL. Might be a bit hackish.
You can see the hstore documentation (for PostgreSQL 9.1) and how to use it here
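The exclusion logic itself is a set-overlap test: a post is dropped as soon as it shares at least one tag with the exclude table. The sketch below shows that logic in plain Python rather than in PostgreSQL; the function name visible_posts and the dictionary layout are assumptions for illustration. With PostgreSQL arrays, the comparable test would likely use the overlap operator (&&) instead of @>.

```python
def visible_posts(posts, excluded_tags):
    """Hypothetical helper: keep posts that share no tag with excluded_tags."""
    excluded = set(excluded_tags)
    kept = []
    for post in posts:
        # Tags are stored as one space-separated string, as in the question.
        tags = set(post["tags"].split())
        if tags.isdisjoint(excluded):
            kept.append(post)
    return kept

# Sample data from the question.
posts = [
    {"id": 1, "tags": "#oxford #funtimes", "text": "Sitting in the sun"},
    {"id": 2, "tags": "#oz", "text": "Beach fun"},
    {"id": 3, "tags": "#england", "text": "Milk was a bad choice"},
]
excluded_tags = ["#oxford", "#uk", "#england"]

print([post["id"] for post in visible_posts(posts, excluded_tags)])
```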

How do I create sql query for searching partial matches?

I have a set of items in a db. Each item has a name and a description. I need to implement a search facility which takes a number of keywords and returns distinct items which have at least one of the keywords matching a word in the name or description.
For example, I have three items in the db:
1.item1 :
name : magic marker
description: a writing device which makes erasable marks on whiteboard
2.item2:
name: pall mall cigarettes
description: cigarette named after a street in london
3.item3:
name: XPigment Liner
description: for writing and drawing
A search using keyword 'writing' should return magic marker and XPigment Liner
A search using keyword 'mall' should return the second item
I tried using the LIKE and IN keywords separately.
For the IN keyword to work, the query has to be
SELECT DISTINCT name FROM mytable WHERE name IN ('pall mall cigarettes')
but
SELECT DISTINCT name FROM mytable WHERE name IN ('mall')
will return 0 rows.
I couldn't figure out how to make a query that accommodates both the name and description columns and allows partial word matches.
Can somebody help?
update:
I created the table through Hibernate, and for the description field I used the javax.persistence @Lob annotation. When I examined the table using psql, it shows:
...
id | bigint | not null
description | text |
name | character varying(255) |
...
One of the records in the table is like,
id | description | name
21 | 133414 | magic marker
First of all, this approach won't scale to large data sets; for that you'll need a separate index from words to items (like an inverted index).
If your data is not large, you can do
SELECT DISTINCT(name) FROM mytable WHERE name LIKE '%mall%' OR description LIKE '%mall%'
using OR if you have multiple keywords.
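For several keywords, the OR conditions can be generated instead of written by hand. This is a sketch only: it assumes Python's sqlite3 module with the question's three items loaded into an in-memory table, the function name search_items is hypothetical, and bound parameters are used so the keywords are never pasted into the SQL string.

```python
import sqlite3

def search_items(conn, keywords):
    """Hypothetical helper: names of items matching any keyword in name or description."""
    if not keywords:
        return []
    # One "name LIKE ? OR description LIKE ?" pair per keyword, joined with OR.
    clause = " OR ".join("name LIKE ? OR description LIKE ?" for _ in keywords)
    params = []
    for keyword in keywords:
        params += [f"%{keyword}%", f"%{keyword}%"]
    rows = conn.execute(
        f"SELECT DISTINCT name FROM mytable WHERE {clause} ORDER BY name", params
    )
    return [row[0] for row in rows]

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO mytable (name, description) VALUES (?, ?)",
    [
        ("magic marker", "a writing device which makes erasable marks on whiteboard"),
        ("pall mall cigarettes", "cigarette named after a street in london"),
        ("XPigment Liner", "for writing and drawing"),
    ],
)

print(search_items(conn, ["writing"]))
print(search_items(conn, ["mall"]))
```

Because the pattern is '%keyword%', this matches substrings, so a keyword such as 'mall' would also match 'small'.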
This may work as well on SQL Server. PostgreSQL has no CHARINDEX, so there use strpos(name, 'mall') > 0 instead.
SELECT *
FROM myTable
WHERE CHARINDEX('mall', name) > 0
OR CHARINDEX('mall', description) > 0
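For larger data sets, the separate words-to-items index suggested in the first answer could be prototyped as below. This is a minimal in-memory sketch in Python, not a database feature: the function names are hypothetical, and it matches whole words only, unlike the LIKE '%...%' queries above.

```python
import re
from collections import defaultdict

def build_inverted_index(items):
    """Hypothetical helper: map each lowercase word to the ids of items containing it."""
    index = defaultdict(set)
    for item_id, (name, description) in items.items():
        for word in re.findall(r"\w+", f"{name} {description}".lower()):
            index[word].add(item_id)
    return index

def search(index, keywords):
    """Return sorted ids of items containing at least one of the keywords."""
    found = set()
    for keyword in keywords:
        found |= index.get(keyword.lower(), set())
    return sorted(found)

# Sample data from the question, keyed by an assumed item id.
items = {
    1: ("magic marker", "a writing device which makes erasable marks on whiteboard"),
    2: ("pall mall cigarettes", "cigarette named after a street in london"),
    3: ("XPigment Liner", "for writing and drawing"),
}
index = build_inverted_index(items)

print(search(index, ["writing"]))
print(search(index, ["mall"]))
```

In a database, the same idea would be a table of (word, item id) rows with an index on the word column, or the engine's built-in full-text search.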