How do I improve full-text searching - SQL

I have a SQL Server 2008 database. This database has 2 tables:
Manufacturer
------------
ID
Name nvarchar(256)
Product
-------
ID
ManufacturerID
Name nvarchar(256)
My application has a search box. I do not know whether the user is going to provide a manufacturer name or a product name. In addition, I'm trying to be a little bit graceful and handle misspellings. To meet these criteria, I'm using the CONTAINSTABLE function, with which I have created the following query:
SELECT
*
FROM
[Manufacturer] m
INNER JOIN [Product] p ON m.[ID]=p.[ManufacturerID]
INNER JOIN CONTAINSTABLE(Manufacturer, Name, #searchQuery) r ON m.[ID]=r.[Key]
ORDER BY
r.[Rank]
My query performs VERY SLOWLY with the CONTAINSTABLE function. Without the second INNER JOIN, the query runs in less than 1 second. With the second INNER JOIN included, it runs in a little over 30 seconds (too long).
Can anyone provide some performance recommendations? I have no clue how to overcome this hurdle.
Thank you,

Select just the fields you need, not SELECT *.
Are indexes set on m and p?
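Also worth trying: CONTAINSTABLE accepts an optional top_n_by_rank argument that caps how many rows the full-text engine returns before the join, which often helps a lot on large tables. A sketch, assuming @searchQuery stands in for however your application passes the search text (the cap of 100 is arbitrary):
SELECT m.[ID], m.[Name], p.[ID], p.[Name], r.[Rank]
FROM CONTAINSTABLE([Manufacturer], [Name], @searchQuery, 100) r -- return only the 100 best-ranked matches
INNER JOIN [Manufacturer] m ON m.[ID] = r.[Key]
INNER JOIN [Product] p ON p.[ManufacturerID] = m.[ID]
ORDER BY r.[Rank] DESC;
Joining outward from CONTAINSTABLE also makes it clearer that the full-text result is what drives the query.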


How can I write a SQL query to display the ID, First Name, and Last Name of all players with more than 2000 career hits, based on a schema?

I am new to SQL and DB management. I am working on writing queries based on a schema, which you can find below. This is an exercise for me to get familiar with reading and writing queries on SQL Server for my job. Could you please help me define a query based on the schema and briefly explain the logic?
Thanks a lot!
SQL Server is my DBMS, and here is the question:
Display the ID, First Name, Last Name, and Hits of all players with more than 2000 career hits.
You can get this one by running the following query in Microsoft SQL Server:
SELECT
MLB_PLAYERS.ID,
MLB_PLAYERS.FIRST_NAME,
MLB_PLAYERS.LAST_NAME,
CAREER_STATS.HITS
FROM
MLB_PLAYERS LEFT JOIN CAREER_STATS ON MLB_PLAYERS.ID = CAREER_STATS.ID
WHERE
CAREER_STATS.HITS > 2000
So you have a simple structure to follow:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
But here you only need three of them: SELECT, FROM, and WHERE. With SELECT you decide which columns you want in the output. In FROM you choose the tables to take those columns from; if you use two different tables, you need to join them. I used a LEFT JOIN because I wanted to match hits to existing players; we match the tables on a shared key, in this case the player ID. Finally, WHERE applies conditions to the query.
I guess you could do it with a join and a group:
select p.FIRST_NAME,
p.LAST_NAME,
p.ID,
count(g.HITS) as hits
from MLB_PLAYERS p
left join KEY_GAMES_STATS g on p.ID = g.ID -- not sure how to link these 2 tables
group by p.FIRST_NAME,
p.LAST_NAME,
p.ID
having count(g.HITS) > 2000

How to optimize SQL subqueries?

Here is my scenario: I am trying to find records in a SQL table by name, so I wrote a subquery that uses a case-insensitive pattern match in Postgres. The query works fine, but it takes a lot of time. I checked why it is so slow, and the reason is the subquery: it hits every record in the table. How do I optimize the subquery?
SQL query:
SELECT
id, latitude, longitude, first_name, last_name,
contact_company_id, address, address2, city, state_id, zip, country_id, default_phone_id, last_contacted, image, contact_type_id
FROM contact
WHERE company_id = 001
AND contact_company_id IN (SELECT id FROM contactcompany WHERE lower(name) ~* 'jack')
When I run this query it takes about 2 seconds, and it is the subquery scanning every record in the contactcompany table that takes the time.
How can I optimize the subquery?
Try rewriting the subquery as an INNER JOIN with the main table; both queries give the same result.
Example here:
SELECT contact.id,
contact.latitude,
contact.longitude,
contact.first_name,
contact.last_name,
contact.contact_company_id,
contact.address,
contact.address2,
contact.city,
contact.state_id,
contact.zip,
contact.country_id,
contact.default_phone_id,
contact.last_contacted,
contact.image,
contact.contact_type_id
FROM contact AS contact
INNER JOIN contactcompany AS contactcompany ON contactcompany.id = contact.contact_company_id
WHERE contact.company_id = 001
AND contactcompany.name ~* 'jack' -- ~* is already case-insensitive, so lower() is unnecessary
I would start by writing the query using EXISTS. Then: company_id is either a string or a number. Let me guess that it is a string, because the constant is written with leading zeros. If so, use single quotes:
SELECT c.*
FROM contact c
WHERE company_id = '001' AND
EXISTS (SELECT 1
FROM contactcompany cc
WHERE cc.name ~* 'jack' AND
cc.id = c.contact_company_id
);
Then an index on contact(company_id, contact_company_id) makes sense, and for the subquery, contactcompany(id, name).
There may be other alternatives for writing the query, but your question has not provided much information on table sizes, current performance, or the data types.
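A sketch of those indexes (the names are illustrative); the trigram index is an extra suggestion of mine, since a plain btree index cannot serve the ~* regex match itself:
CREATE INDEX contact_company_idx
    ON contact (company_id, contact_company_id);
CREATE INDEX contactcompany_id_name_idx
    ON contactcompany (id, name);
-- ~* cannot use a btree index; a trigram index (pg_trgm extension) lets
-- Postgres answer case-insensitive pattern matches with an index scan.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX contactcompany_name_trgm_idx
    ON contactcompany USING gin (name gin_trgm_ops);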

Long SQL subquery trouble

I just registered and want to ask.
I have not been learning SQL queries for very long, and I ran into trouble when I decided to move a table to another database. I read a few articles about building long subqueries, but they did not help me.
Everything worked perfectly before I did this.
I just moved the table and have been trying to rewrite the query for a whole day.
update [dbo].Full
set [salary] = 1000
where [dbo].Full.id in (
select distinct k1.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.S_School)
) k1
where k1.id not in (
select distinct k2.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.Shool)
) k2,
List_School t3
where charindex (t3.NameApp, k2.Topic)>5
)
)
I moved the table List_School to database [DB_1] and I cannot get the query to work with it.
I can't just write [DB_1].dbo.List_School. Should I use one more subquery?
I even thought about creating a few temporary tables, but that could affect execution speed.
SQL gurus, please invest some of your time in me. Thank you in advance.
I will be happy with every hint you give me.
There appear to be a number of issues. You are comparing the User column to the topic_name column; the expected meaning of those column names suggests you are not comparing the correct columns. But that is a guess.
In the final subquery you have an old-style comma join on the table List_School with no equality join columns, so the join with k2 is evaluated as a Cartesian product (a cross join) filtered only by the charindex condition, which is not what you want in most situations. Again a guess, as no details of the actual problem data or error messages were provided.
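On the cross-database reference itself: SQL Server accepts a three-part name anywhere a table name is allowed, including inside a subquery, so the moved table can be referenced directly. A sketch of just the innermost block, assuming the table's columns are unchanged after the move (the comma join is made explicit at the same time):
select distinct k2.id
from (
    select id, Topic, [User] -- USER and FULL are reserved words, so bracket them
    from [dbo].[Full]
    where [User] not in (select distinct topic_name from [DB_1].dbo.S_School)
) k2
inner join [DB_1].dbo.List_School t3 -- the moved table, referenced by its three-part name
    on charindex(t3.NameApp, k2.Topic) > 5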

Why does breaking out this correlated subquery vastly improve performance?

I tried running this query against two tables of very different sizes - #temp has about 15,000 rows, and Member about 70,000,000, roughly 68,000,000 of which do not have CompanyID 307.
SELECT COUNT(*)
FROM #temp
WHERE CAST(individual_id as varchar) NOT IN (
SELECT IndividualID
FROM Member m
INNER JOIN Person p ON p.PersonID = m.PersonID
WHERE CompanyID <> 307)
This query ran for 18 hours, before I killed it and tried something else, which was:
SELECT IndividualID
INTO #source
FROM Member m
INNER JOIN Person p ON p.PersonID = m.PersonID
WHERE CompanyID <> 307
SELECT COUNT(*)
FROM #temp
WHERE CAST(individual_id AS VARCHAR) NOT IN (
SELECT IndividualID
FROM #source)
And this ran for less than a second before giving me a result.
I was pretty surprised by this. I'm a middle-tier developer rather than a SQL expert, and my understanding of what goes on under the hood is a little murky, but I would have presumed that, since the subquery in my first attempt is the exact same code asking for the exact same data as in the second attempt, the two would be roughly equivalent.
But that's obviously wrong. I can't look at the execution plan for my original query to see what SQL Server is trying to do. So can someone kindly explain why splitting the data out into a temp table is so much faster?
EDIT: Table schemas and indexes
The #temp table has two columns, Individual_ID int and Source_Code varchar(50)
Member and Person are more complex. They have 29 and 13 columns respectively, so I don't really want to post them in full. PersonID is an int and is the PK on Person and an FK on Member. IndividualID is a column on Person; this is not clear in the query as written.
I tried using a LEFT JOIN instead of NOT IN before asking the question. The performance on the second query wasn't noticeably different - both were sub-second. On the first query I let it run for an hour before stopping it, presuming it would make no significant difference.
I also added an index on #source, just like on the original table, so the performance impact should be identical.
First, your query has two faux pas that really stick out. You are converting to varchar(), but you do not include a length argument. This should not be allowed! The default length varies by context, so you need to be explicit.
Second, you are matching two keys in different tables and they seemingly have different types. Foreign key references should always have the same type. This can have a very big impact on performance. If you are dealing with tables that have millions of rows, then you need to pay some attention to the data structure.
To understand the difference in performance, you need to understand execution plans. The two queries have very different execution plans. My (educated) guess is that the first version is using a nested loop join algorithm, while the second version uses a more sophisticated one. In your case, this comes down to SQL Server's ability to maintain statistics on tables: instantiating the intermediate results in a temp table gives the optimizer statistics to work with, so it can produce a better query plan.
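If you cannot capture the actual plan, you can still compare the two runs numerically. A small sketch, plain T-SQL with nothing specific to your schema:
-- Compare the two versions by the I/O and CPU they report.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Run the NOT IN version here, then the temp-table version, and compare
-- the logical reads and CPU times printed in the Messages tab.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;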
The subject of how best to write this logic has been investigated a lot. Here is a very good discussion on the subject by Aaron Bertrand.
I do agree with Aaron on the preference for not exists in this case:
SELECT COUNT(*)
FROM #temp t
WHERE NOT EXISTS (SELECT 1
                  FROM Member m JOIN
                       Person p
                       ON p.PersonID = m.PersonID
                  WHERE m.CompanyID <> 307 AND p.IndividualID = t.individual_id
                 );
However, I don't know if this will have better performance in this particular case.
This line is probably what kills the first query:
WHERE CAST(individual_id as varchar) NOT IN
My guess would be that this forces a table scan rather than using any indexes.
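One way to keep the predicate index-friendly is to put the conversion on the outer column with an explicit length, so the inner column stays bare. A sketch, not a drop-in fix, assuming IndividualID is varchar(20) (the real length is not given in the question):
SELECT COUNT(*)
FROM #temp t
WHERE NOT EXISTS (SELECT 1
                  FROM Member m
                  INNER JOIN Person p ON p.PersonID = m.PersonID
                  WHERE m.CompanyID <> 307
                    -- cast the int side, with an explicit length, so an
                    -- index covering p.IndividualID can still be seeked
                    AND p.IndividualID = CAST(t.individual_id AS varchar(20)));
The real fix, as noted above, is to make the two key columns the same type in the first place.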

Improving performance on SQL query

I'm currently having performance problems with an expensive SQL query, and I'd like to improve it.
This is what the query looks like:
SELECT TOP 50 MovieID
FROM (SELECT [MovieID], COUNT(*) AS c
FROM [tblMovieTags]
WHERE [TagID] IN (SELECT TOP 7 [TagID]
FROM [tblMovieTags]
WHERE [MovieID]=12345
ORDER BY Relevance ASC)
GROUP BY [MovieID]
HAVING COUNT(*) > 1) a
INNER JOIN [tblMovies] m ON m.MovieID=a.MovieID
WHERE (Hidden=0) AND m.Active=1 AND m.Processed=1
ORDER BY c DESC, m.IMDB DESC
What I'm trying to do is find movies that have at least 2 matching tags with MovieID 12345.
The basic database schema: each movie has 4 to 5 tags, and I want a list of movies similar to a given movie based on its tags. A minimum of 2 tags must match.
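(The schema diagram from the original post did not survive; the DDL below is only my reconstruction, with guessed types, from the columns the query touches.)
CREATE TABLE tblMovies (
    MovieID   int PRIMARY KEY,
    IMDB      decimal(3,1), -- rating used in the ORDER BY; guessed type
    Hidden    bit NOT NULL,
    Active    bit NOT NULL,
    Processed bit NOT NULL
);
CREATE TABLE tblMovieTags (
    MovieTagID int PRIMARY KEY,
    MovieID    int NOT NULL REFERENCES tblMovies (MovieID),
    TagID      int NOT NULL,
    Relevance  int NOT NULL
);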
This query is causing my server problems as I have hundreds of concurrent users at any given time.
I have already created indexes based on execution plan suggestions, and that has made it quicker, but it's still not enough.
Is there anything I could do to make this faster?
I like to use temp tables because they can speed up your queries (if used correctly) and make them easier to read. Try the query below and see if it speeds things up any. There were a few fields (Hidden, IMDB) that weren't in your schema, so I left them out.
This query may, or may not, be exactly what you are looking for. The point of it is to show you how to use temp tables to increase the performance and improve readability. Some minor tweaks may be necessary.
SELECT TOP 7 [TagID], [MovieTagID], [MovieID]
INTO #MovieTags
FROM [tblMovieTags]
WHERE [MovieID] = 12345
ORDER BY Relevance ASC -- without an ORDER BY, TOP 7 picks arbitrary rows

SELECT mt.MovieID, COUNT(mt.MovieTagID) AS TagCount
INTO #Movies
FROM tblMovieTags mt
INNER JOIN #MovieTags t ON t.TagID = mt.TagID -- other movies sharing those tags
INNER JOIN tblMovies m ON m.MovieID = mt.MovieID AND m.Active = 1 AND m.Processed = 1
GROUP BY mt.MovieID
HAVING COUNT(mt.MovieTagID) > 1

SELECT TOP 50 * FROM #Movies ORDER BY TagCount DESC

DROP TABLE #MovieTags
DROP TABLE #Movies
Edit
Parameterized Queries
You will also want to use parameterized queries rather than concatenating your values into your SQL string. Check out this short, to-the-point blog post that explains why you should use parameterized queries. This, combined with the temp-table method, should improve your performance significantly.
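A minimal sketch of what that looks like in plain T-SQL via sp_executesql, assuming the movie id is the value currently being concatenated in (from application code, you would use your data access library's parameter support instead):
-- Parameterized version of the tag lookup.
EXEC sp_executesql
    N'SELECT TOP 7 [TagID]
      FROM [tblMovieTags]
      WHERE [MovieID] = @MovieID
      ORDER BY Relevance ASC;',
    N'@MovieID int',
    @MovieID = 12345;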
I want to see if there is some unnecessary processing happening in the query you wrote. Try the following query and let us know if it's faster or slower, and whether it even returns the same data.
I just threw this together, so no guarantees on perfect syntax:
SELECT TOP 7 [TagID]
INTO #MovieTags
FROM [tblMovieTags]
WHERE [MovieID]=12345
ORDER BY Relevance ASC -- match the tag selection in the original query
;WITH cte_movies AS
(
SELECT
mt.MovieID
,mt.TagID
FROM
tblMovieTags mt
INNER JOIN #MovieTags t ON mt.TagId = t.TagId
INNER JOIN tblMovies m ON mt.MovieID = m.MovieID
WHERE
(Hidden=0) AND m.Active=1 AND m.Processed=1
),
cte_movietags AS
(
SELECT
MovieId
,COUNT(MovieId) AS TagCount
FROM
cte_movies
GROUP BY MovieId
)
SELECT
MovieId
FROM
cte_movietags
WHERE
TagCount > 1
ORDER BY
MovieId
GO
DROP TABLE #MovieTags