SQL Fulltext: What items have not been indexed? - sql

I have a Fulltext index on one of my tables, which contains some metadata and a document blob (PDF, Doc, RTF, etc.).
Sometimes there is an error indexing a row and therefore the row cannot be returned in Fulltext searches.
What query could I use to find out what items have NOT been indexed?
I thought something like this:
Select * from MyTable where MyTableID NOT IN
(
select MyTableID from MyTable
where contains(Title, Title)
)
And then work out which rows were not returned. But the inner query is not syntactically correct and I can't work it out.
Any ideas?
Cheers
Aaron

Bad news and good news:
Bad news - There is no way to find out what items have not been indexed just by using a simple query.
Good news - You can add a datetime column to your fulltext table and store the insert date for each record in it. Then you can create a log table that contains the date of the last population. Using this table, you can find out which records have not been indexed since the last index population.
I don't know if I made myself clear, but I did exactly this today: I created a job that starts a population, and another job that checks whether the population is done and writes the last index population date to the log table.
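The log-table idea above can be sketched as follows. This is only an illustration of the comparison logic, run against an in-memory SQLite database rather than SQL Server; the `InsertedAt` column, the `PopulationLog` table and the sample rows are assumptions made for the example. Note that it finds rows added since the last population, not rows that failed during a population.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the full-text indexed table, with an insert timestamp.
    CREATE TABLE MyTable (
        MyTableID  INTEGER PRIMARY KEY,
        Title      TEXT NOT NULL,
        InsertedAt TEXT NOT NULL
    );
    -- Hypothetical log table: one row per completed index population.
    CREATE TABLE PopulationLog (
        CompletedAt TEXT NOT NULL
    );
""")

conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, "spec.pdf", "2024-01-01 09:00:00"),
    (2, "notes.rtf", "2024-01-01 11:30:00"),
    (3, "report.doc", "2024-01-01 13:15:00"),
])
# The job that detects a finished population would write this row.
conn.execute("INSERT INTO PopulationLog VALUES ('2024-01-01 12:00:00')")


def rows_not_yet_indexed(conn):
    """Rows inserted after the most recent logged population."""
    return conn.execute("""
        SELECT MyTableID, Title
        FROM MyTable
        WHERE InsertedAt > (SELECT COALESCE(MAX(CompletedAt), '')
                            FROM PopulationLog)
        ORDER BY MyTableID
    """).fetchall()


pending = rows_not_yet_indexed(conn)
```

ISO-formatted timestamps are stored as text here so that plain string comparison orders them correctly; with an empty log, every row counts as pending.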

Related

SQL or statement vs multiple select queries

I have a table with an id and a name.
I'm getting a list of ids and I need their names.
As far as I know, I have two options.
Create a for loop in my code which executes:
SELECT name from table where id=x
where x is always a number.
or write a single query like this:
SELECT name from table where id=1 OR id=2 OR id=3
The list of ids and names is enormous, so I think you wouldn't want that.
The problem with the ids is that an id is not always a number but a randomly generated id containing numbers and characters. So talking about ranges is not a solution.
I'm asking this from a performance point of view.
What's a nice solution for this problem?
SQLite has limits on the size of a query, so if there is no known upper limit on the number of IDs, you cannot use a single query.
When you are reading multiple rows (note: IN (1, 2, 3) is easier than many ORs), you don't know to which ID a name belongs unless you also SELECT that, or sort the results by the ID.
There should be no noticeable difference in performance; SQLite is an embedded database without client/server communication overhead, and the query does not need to be parsed again if you use a prepared statement.
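As a rough sketch of the points in this answer, the helper below queries in fixed-size chunks with bound parameters and selects the id alongside the name so each name can be matched to its id. The table name `items`, the function `names_by_id` and the chunk size of 500 are assumptions for illustration; the chunk size is simply chosen to stay well under SQLite's limit on bound parameters per statement.

```python
import sqlite3


def names_by_id(conn, ids, chunk_size=500):
    """Return {id: name} for the given ids, querying in chunks."""
    result = {}
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # One "?" placeholder per id; values are bound, never interpolated.
        placeholders = ",".join("?" * len(chunk))
        query = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        result.update(conn.execute(query, chunk).fetchall())
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("a1", "Alpha"), ("b2", "Beta"), ("c3", "Gamma")])

# "zz" does not exist, so it is simply absent from the result.
found = names_by_id(conn, ["c3", "a1", "zz"], chunk_size=2)
```

Returning a dictionary sidesteps the ordering question: the rows may come back in any order, but each name stays attached to its id.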
A "nice" solution is using the IN operator:
SELECT name from table where id in (1,2,3)
Also, the IN operator is syntactic sugar built for exactly this purpose.
SELECT name from table where id IN (1,2,3,4,5,6.....)
Assuming the list of IDs you need names for is available as an input temp table #InputIDTable:
SELECT name from table WHERE ID IN (SELECT id from #InputIDTable)

SQL Server Full Text Search: One to many relationships

I am trying to retrieve data from tickets that meet search matches. The relevant bits of data here are that a ticket has a name, and any number of comments.
Currently I'm matching a search against the ticket name like so:
JOIN freetexttable(Tickets,TIC_Name,'Test ') s1
ON TIC_PK = s1.[key]
Where the [key] from the full text catalog is equal to TIC_PK.
This works well for me, and gives me access to s1.rank, which is important for me to sort by.
Now my problem is that this method won't work for searching comments, because the key in the comment catalog is the comment PK, and doesn't give me any information I can use to link to the ticket.
I'm very perplexed about how to go about searching multiple descriptions and still getting a meaningful rank.
I'm pretty new to full-text search and might be missing something obvious.
Here's my current attempt at getting what I need:
WHERE TIC_PK IN(
SELECT DES_TIC_FK FROM freetexttable(TicketDescriptions, DES_Description,'Test Query') as t
join TicketDescriptions a on t.[key] = a.DES_PK
GROUP BY DES_TIC_FK
)
This gets me tickets with comments that match the search, but I don't think it's possible to sort by the rank data freetexttable returns with this method.
To search the name and comments at the same time and get the most meaningful rank you should put all of this info into the same table -- a new table -- populated from your existing tables via an ETL process.
The new table could look something like this:
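CREATE TABLE TicketsAndDescriptionsETL (
TIC_PK int NOT NULL,
TIC_Name varchar(100),
All_DES_Descriptions varchar(max),
-- Name the primary key so the full-text index can reference it
CONSTRAINT PK_TicketsAndDescriptionsETL PRIMARY KEY (TIC_PK)
)
GO
-- Assumes a default full-text catalog already exists
CREATE FULLTEXT INDEX ON TicketsAndDescriptionsETL (
TIC_Name LANGUAGE 'English',
All_DES_Descriptions LANGUAGE 'English'
)
KEY INDEX PK_TicketsAndDescriptionsETL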
Schedule this table to be populated either via a SQL job, triggers on the Tickets and TicketDescriptions tables, or some hook in your data layer. For tickets that have multiple TicketDescriptions records, combine the text of all of those comments into the All_DES_Descriptions column.
Then run your full text searches against this new table.
While this approach does add another cog to the machine, there's really no other way to perform full text searches across multiple tables and generate one rank.
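The "combine the text of all of those comments" step could look roughly like the sketch below. It runs against an in-memory SQLite database purely to show the aggregation, using SQLite's `group_concat`; on SQL Server a string-aggregation equivalent such as `STRING_AGG` (2017 and later) would take its place. The sample rows and the `rebuild_etl` function are assumptions for illustration, and no full-text index is involved here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tickets (TIC_PK INTEGER PRIMARY KEY, TIC_Name TEXT);
    CREATE TABLE TicketDescriptions (
        DES_PK          INTEGER PRIMARY KEY,
        DES_TIC_FK      INTEGER REFERENCES Tickets(TIC_PK),
        DES_Description TEXT
    );
    CREATE TABLE TicketsAndDescriptionsETL (
        TIC_PK               INTEGER PRIMARY KEY,
        TIC_Name             TEXT,
        All_DES_Descriptions TEXT
    );
    INSERT INTO Tickets VALUES (1, 'Printer offline'), (2, 'Login fails');
    INSERT INTO TicketDescriptions VALUES
        (10, 1, 'Rebooted the printer'),
        (11, 1, 'Replaced the cable'),
        (12, 2, 'Reset the password');
""")


def rebuild_etl(conn):
    """Repopulate the combined table: one row per ticket, comments joined."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM TicketsAndDescriptionsETL")
        conn.execute("""
            INSERT INTO TicketsAndDescriptionsETL
            SELECT t.TIC_PK, t.TIC_Name,
                   group_concat(d.DES_Description, ' ')
            FROM Tickets t
            LEFT JOIN TicketDescriptions d ON d.DES_TIC_FK = t.TIC_PK
            GROUP BY t.TIC_PK, t.TIC_Name
        """)


rebuild_etl(conn)
etl_rows = dict(conn.execute(
    "SELECT TIC_PK, All_DES_Descriptions FROM TicketsAndDescriptionsETL"))
```

A full rebuild like this is the simplest variant to schedule as a job; trigger-based or incremental updates would need to touch only the affected ticket rows.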

Full text search records not showing up

I have a column in my table to which I have added a Full Text Index. When I run a CONTAINS query against that column, it returns only matching records that were added to the table at least 3-4 hours ago. Records added more recently don't show up in the output, even though their text matches the search term.
Create table Table1 (Id int, Name varchar(20), Message varchar(1000), CreatedAt datetime)
Message is the column which has full text index.
Can someone please explain why SQL Server behaves this way and what I can do to fix it?
It sounds like your full text index isn't being populated after changes to the data. You should either set up automatic population of the index or perform manual population whenever you update the data.
More here: MSDN: Populate Full-Text Indexes

query - select data by first inserted [duplicate]

Possible Duplicate:
select bottom rows in natural order
Imagine that I have this table:
persons
The columns of the table are NAME and ID
and I insert this:
insert into persons values ('name','id');
insert into persons values ('John','1');
insert into persons values ('Jack','3');
insert into persons values ('Alice','2');
How can I select this information ordered by insertion? I would like my query to return:
NAME ID
name id
John 1
Jack 3
Alice 2
Is this possible without indexes (autoincrements)?
I'm pretty sure it's not. To my knowledge, SQL data order is not sequential with respect to insertion. The only idea I have is to store a timestamp with each insertion and sort by that timestamp.
This is not possible without adding a column or table containing a timestamp. You could add a timestamp column, or create another table containing IDs and a timestamp and insert into that at the same time.
You cannot make any assumptions about how the DBMS stores and retrieves data unless you specify an ORDER BY clause. For example, PostgreSQL uses MVCC: when you update a row, a new physical copy of the row is written, often at the end of the table's data file. A plain SELECT makes PostgreSQL use a sequential scan, so the last updated row may be returned last.
I have to agree with the other answers: without a specific field/column to do this, it's unreliable. That said, I don't think I have ever actually had a table without an index.
You will need something to order it by. There are many other approaches; for example, you could use some form of concat/join of strings and then split/separate the query results later.
--EDIT--
For what reason do you wish not to use these methods? time/autoinc
Unless you store some sort of order information during insert, the database does not automatically keep track of every record ever inserted and their order (this is probably a good thing ;) ). Autoincrement cannot really be avoided: even with a timestamp, two rows can hold the same value.
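To make the advice in these answers concrete, here is a minimal sketch that adds an autoincrementing column and sorts by it. It uses an in-memory SQLite database; the extra `seq` column is an assumption added for the example and is not part of the asker's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE persons (
        seq  INTEGER PRIMARY KEY AUTOINCREMENT,  -- records insertion order
        name TEXT,
        id   TEXT
    )
""")

# Same rows, in the same order, as in the question.
conn.executemany(
    "INSERT INTO persons (name, id) VALUES (?, ?)",
    [("name", "id"), ("John", "1"), ("Jack", "3"), ("Alice", "2")],
)

# Insertion order is only guaranteed because of the explicit ORDER BY.
in_insert_order = conn.execute(
    "SELECT name, id FROM persons ORDER BY seq"
).fetchall()
```

Unlike a timestamp, the counter never assigns the same value to two rows, which is why the last answer prefers it.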

How are these tasks done in SQL?

I have a table, and there is no column which stores a field of when the record/row was added. How can I get the latest entry into this table? There would be two cases in this:
Loop through entire table and get the largest ID, if a numeric ID is being used as the identifier. But this would be very inefficient for a large table.
If a random string is being used as the identifier (which is probably very, very bad practise), then this would require more thinking (I personally have no idea other than my first point above).
If I have one field in each row of my table which is numeric, and I want to add it up to get a total (so row 1 has a field which is 3, row 2 has a field which is 7, I want to add all these up and return the total), how would this be done?
Thanks
1) If the id is incremental, "select max(id) as latest from mytable". If a random string was used, there should still be an incremental numeric primary key in addition. Add it. There is no reason not to have one, and databases are optimized to use such a primary key for relations.
2) "select sum(mynumfield) as total from mytable"
For the last one, use SUM():
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
assuming they are all in the same column.
Your first question is a bit unclear, but if you want to know when a row was inserted (or updated), then the only way is to record the time when the insert/update occurs. Typically, you use a DEFAULT constraint for inserts and a trigger for updates.
If you want to know the maximum value (which may not necessarily be the last inserted row) then use MAX, as others have said:
SELECT MAX(SomeColumn) FROM dbo.SomeTable
If the column is indexed, MSSQL does not need to read the whole table to answer this query.
For the second question, just do this:
SELECT SUM(SomeColumn) FROM dbo.SomeTable
You might want to look into some SQL books and tutorials to pick up the basic syntax.
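Both queries from these answers can be tried end to end with a small script. The sketch below uses an in-memory SQLite database; the table layout and sample values (3, 7 and 5) are assumptions for illustration, with `Id` standing in for an incremental numeric key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SomeTable (
        Id         INTEGER PRIMARY KEY,   -- incremental numeric key
        SomeColumn INTEGER NOT NULL
    )
""")
conn.executemany("INSERT INTO SomeTable (SomeColumn) VALUES (?)",
                 [(3,), (7,), (5,)])

# Largest key; with an incremental key this is the most recent insert.
latest_id, = conn.execute("SELECT MAX(Id) FROM SomeTable").fetchone()

# Total of one numeric column across all rows.
total, = conn.execute("SELECT SUM(SomeColumn) FROM SomeTable").fetchone()
```

Both aggregates run inside the database, so the application never loops over the rows itself.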