SQL Server Full Text Search: One to many relationships - sql

I am trying to retrieve data from tickets that meet search matches. The relevant bits of data here are that a ticket has a name, and any number of comments.
Currently I'm matching a search against the ticket name like so:
JOIN freetexttable(Tickets,TIC_Name,'Test ') s1
ON TIC_PK = s1.[key]
Where the [key] from the full text catalog is equal to TIC_PK.
This works well for me, and gives me access to s1.rank, which is important for me to sort by.
Now my problem is that this method wont work for ticket searching, because the key in the comment catalog is the comment PK, an doesn't give me any information I can use to link to the ticket.
I'm very perplexed about how to go about searching multiple descriptions and still getting a meaningful rank.
I'm pretty knew to full-text search and might be missing something obvious.
Heres my current attempt at getting what I need:
WHERE TIC_PK IN(
SELECT DES_TIC_FK FROM freetexttable(TicketDescriptions, DES_Description,'Test Query') as t
join TicketDescriptions a on t.[key] = a.DES_PK
GROUP BY DES_TIC_FK
)
This gets me tickets with comments that match the search, but I dont think it's possible to sort by the rank data freetexttable returns with this method.

To search the name and comments at the same time and get the most meaningful rank you should put all of this info into the same table -- a new table -- populated from your existing tables via an ETL process.
The new table could look something like this:
CREATE TABLE TicketsAndDescriptionsETL (
TIC_PK int,
TIC_Name varchar(100),
All_DES_Descriptions varchar(max),
PRIMARY KEY (TIC_PK)
)
GO
CREATE FULLTEXT INDEX ON TicketsAndDescriptionsETL (
TIC_Name LANGUAGE 'English',
All_DES_Descriptions LANGUAGE 'English'
)
Schedule this table to be populated either via a SQL job, triggers on the Tickets and TicketDescriptions tables, or some hook in your data layer. For tickets that have multiple TicketDescriptions records, combine the text of all of those comments into the All_DES_Descriptions column.
Then run your full text searches against this new table.
While this approach does add another cog to the machine, there's really no other way to perform full text searches across multiple tables and generate one rank.

Related

stored procedure that returns in one row duplicate column names and converts 1->N to a string

I'm trying to put all the below in a single stored procedute that returns a single row because the data is up on Sql Azure and the rule for it is do everything in a single query with a single return.
I have the following tables:
Person (
PersonId
FirstName
...
)
CompanyDomains (
CompanyId
EmailDomain
)
Company (
CompanyId
CompanyName
Billing_PersonId
Admin_PersonId
...
)
I have two problems here. The first is I want to get all the elements of a Company row, and the 2 Person rows of data. That's easy with a join. But the columns for the 2 person columns will have duplicate names. I can do 'as' one by one, which is a pain as the database schema is still in a state of flux. Is there a global way to apply 'as' so all the columns brought in from Billing_PersonId get a Billing_ prepended to the column name and Admin_ prepended to the admin column name?
The second is there is a 1->N list of company domains. Is there a way to pull all those and add a column that is a single string that has "domain1;domain2;" in it? We have the distinct domains in the CompanyDomain table so we can quickly find the company that owns any domain. But a single string works fine when I'm reading the company in.
I know single SQL selects pretty well. But I've got very little experience with stored procedures (aside from calling them) and so what I'm asking here may be basic. If so, sorry. And again, this is for Sql Azure.
thanks - dave
If you are using Azure, then you application should be able to parse XML.
Write a stored procedure to join the three tables, select the data given an input like company id, and return an xml record containing information from all three.
Look at the following references.
FOR XML
http://msdn.microsoft.com/en-us/library/ms178107.aspx
CREATE PROCEUDRE
http://msdn.microsoft.com/en-us/library/ms187926.aspx
If you need more help, you need to post a simple schema with sample data.
USE MY EXAMPLE BELOW FOR SCHEMA + DATA
sql select a field into 2 columns
1 - Without detailed information, not one will be able to help you.
2 - Try it on your own. I can give you the answer but you will not learn anything.
Sincerely
John

Crystal Reports SQL Expression

I am having trouble figuring out how to pull a value from a secondary table, to use as selection criteria on a per-record basis.
I am working with Crystal Reports 2011 on Windows 7, over an ODBC connection to an Oracle 11g database.
I am creating an employee directory that utilizes information from two locations:
table: TEACHERS
view: PVSIS_CUSTOM_TEACHERS
The teachers table is set up with your predictable fields: id, lastname, firstname, telephone, address, city, state, zip, etc. etc. etc.
The view has the following fields available:
TEACHERID
FIELD_NAME
TEXT_VALUE
The database application I am using allows me to create "custom fields" that are related back to the main teachers table. In truth, the fields I am creating are actually stored in a separate table, but are then accessible through the PVSIS_CUSTOM_TEACHERS view. Since the database application allows for any number of "custom fields" to be created, the view can have any number of records in it that can be tied back to the records within the teachers table.
There are MANY custom fields that have been created, but for the purposes of my current project, only 3 of them matter:
empDirSupRecord
empDirSupPhone
empDirSupAddr
The view for my personal teacher record would look like this:
TeacherID Field_Name Text_Value
1 empDirSupRecord
1 empDirSupPhone 1
1 empDirSupAddr 1
1 AnotherField another_value
1 YetAnotherField yetanother_value
(This would indicate that I've asked for my phone and address to be suppressed, but would still want my name to be included in the directory)
These fields will each contain a '1' if the user has asked that their phone number, or address not be published, or if we need to suppress the entire record altogether.
When I first started my report, I pulled both the table and view into the database expert and linked them together with teachers.id = pvsis_custom_teachers.teacherid. However, this causes each teacher's name to print on the report once for every record with their teacher id in the view. Since that's not the behavior I want, I removed the view from the database expert, and tried using SQL Expression fields to retrieve the contents of the custom field. This is where I'm currently stuck. I would need to write the sql in a way that selects the correctly named field, for each of the teacher records as the record is being processed by the report.
Currently, my sql expressions statement is written as:
(SELECT text_value FROM pvsis_custom_teachers WHERE field_name = 'empDirSupRecord' AND teacherid = '1')
What I need to do is figure out how to get the report to intelligently select the record for teacherid = (whatever teacherid is currently being processed). I'm not sure if SQL Expression fields are the way to go to accomplish this, so am definitely open to alternate suggestions if my current approach will not work.
Thanks for taking a look. :-)
You're almost there with the SQL Expression. You can refer back to the main query with double quoted field names. So what you're looking for is:
case when "teacher"."id" is null then null
else (SELECT max(text_value)
FROM pvsis_custom_teachers
WHERE field_name = 'empDirSupRecord' AND teacherid = "teacher"."id")
end
Note that CR will likely complain without the null check and use of max(), since it wants to be sure that only a scalar will ever be returned.
The alternative, and likely less-performance-intensive way to do this, is to join the table and view like you first described. Then, you can group by {teacher.id} and keep track of each field name in the view via variables. This will require more work and more formulas, though. Something like this, for example:
//Place this formula in the Group Header
whileprintingrecords;
stringvar empDirSupRecord:="";
//Place this formula in the Details section
whileprintingrecords;
stringvar empDirSupRecord;
if {pvsis_custom_teachers.field_name} = 'empDirSupRecord'
then empDirSupRecord:={pvsis_custom_teachers.text_value}
//Place this formula in the Group Footer
whileprintingrecords;
stringvar empDirSupRecord;

Dynamically generate criteria in SQL

I have a Users table that contains dozens of columns like date of birth, year of vehicle owned, make and model of the vehicle, color and many other personal fields unrelated to the vehicle
There's also a 2nd table called Coupons that needs to be designed in a way to support a qualification like "user qualifies if younger than 30 yrs old", "user qualifies if vehicle is greater than 10 yrs old", "user qualifies if vehicle color is green".
When a user logs in, I need to present all coupons the user qualifies for. The problem that I'm having is that the coupon qualifications could be numerous, could have qualifiers like equal, greater than or less than and may have different combinations.
My only solution at this point is to store the actual sql string within one of the coupons table columns like
select * from Users where UserId = SOME_PLACEHOLDER and VehicleYear < 10
Then I could execute the sql for each coupon row and return true or false. Seems very inefficient as I would potentially have to execute 1000s of sql statements for each coupon code.
Any insight, help is appreciated. I do have server-side code where I could potentially be able to do looping.
Thank you.
Very difficult problem. Seems like users will be added at high volume speed, with coupons at a fairly regular frequency.
Adding SQL to a table to be used dynamically is workable - at least you'll get a fresh execution plan - BUT your plan cache may balloon up.
I have a feeling that running a single coupon for all users is probably likely to be your highest performing query because it's one single set of criteria which will be fairly selective on users first and total number of coupons is small, whereas running all coupons for a single user is separate criteria for each coupon for that user. Running all coupons for all users may still perform well, even though it's effectively a cross join first - I guess it is just going to depend.
Anyway, the case for all coupons for all users (or sliced either way, really) will be something like this:
SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
CASE WHEN <coupon.criteria> THEN <coupon.id> -- code generated from the coupon rules table
CASE WHEN <coupon.criteria> THEN <coupon.id> -- etc.
ELSE NULL
) = coupon.id
To generate the coupon rules, you can relatively easily do the string concatenation in a single swipe (and you can combine an individual rule lines design for a coupon with AND with a further inner template):
DECLARE #outer_template AS varchar(max) = 'SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
{template}
ELSE NULL
) = coupon.id
';
DECLARE #template AS varchar(max) = 'CASE WHEN {coupon.rule} THEN {coupon.id}{crlf}';
DECLARE #coupon AS TABLE (id INT, [rule] varchar(max));
INSERT INTO #coupon VALUES
(1, 'user.Age BETWEEN 20 AND 29')
,(2, 'user.Color = ''Yellow''');
DECLARE #sql AS varchar(MAX) = REPLACE(
#outer_template
,'{template}',
REPLACE((
SELECT REPLACE(REPLACE(
#template
,'{coupon.rule}', coupon.[rule])
, '{coupon.id}', coupon.id)
FROM #coupon AS coupon
FOR XML PATH('')
), '{crlf}', CHAR(13) + CHAR(10)));
PRINT #sql;
// EXEC (#sql);
There's ways to pretty that up - play with it here: https://data.stackexchange.com/stackoverflow/q/115098/
I would consider adding computed columns (possibly persisted and indexed) to assist. For instance, age - non-persisted computed column will likely perform better than a scalar function.
I would consider batching this with a table which says whether a coupon is valid for a user and when it was last validated.
Seems like ages can change and a user can become valid or invalid for a coupon as their birthday passes.
When a user logs in you could spawn a background job to update their coupons. On subsequent logons, there won't be any need to update (since it's not likely to change until the next day or a triggering event).
Just a few ideas.
I would also add that you should have a way to test a coupon before it is approved to ensure there are no syntax errors (since the SQL is ad hoc or arbitrary) - this can be done relatively easily - perhaps a test user table (test_user as user in the generated code template instead) is required to contain pass and fail rows and the coupon rule points to those. Not only does the EXEC have to work - the rows it returns should be the expected and only the expected rows for that coupon.
This is not an easy problem. Here are some quick ideas that may help depending on your domain requirements:
Restrict the type of criteria you will be filtering on so that you can use dynamic or non-dynamic sql to execute them efficiently. For example if you are going to only have integers between a range of min and max values as a criteria then the problem becomes simpler. (You only need to know the field name, and the min max values to describe a criterian, not the full where statement.)
Create a number of views which expose the attributes in a helpful way. Then perform queries against those views -- or have those views pre-select in some way. For example, an age group view that has a field which can contain the values < 21, 21-30, 30-45, >45. Then your select just needs to return the rows from this view that match these strings.
Create a table which stores the results of running your criteria matching query (This can be run off line by a back ground process). Then for a given user check for membership by looking where in the table this user's ID exists.
Thinking about this some more I realize all my suggestions are based on one idea.
A query for an individual user will work faster overall if you first perform an SQL query against all users and cache that result in some way. If every user is reproducing queries against the whole dataset you will lose efficiency. You need some way to cache results and reuse them.
Hope this helps -- comment if these ideas are not clear.
My first thought on an approach (similar to Hogan's) would be to test for coupon applicability at the time the coupon is created. Store those results in a table (User_Coupons for example). If any user data is changed, your system would then retest any changed users for which coupons are applicable to them. At coupon creation (or change) time it would only check versus that coupon. At use creation (or change) time it would only check versus that user.
The coupon criteria should be from a known set of possible criteria and any time that you want to add a new type of criteria, it would possibly involve a code change. For example, let's say that you have a table set up similar to this:
CREATE TABLE Coupon_Criteria (
coupon_id INT NOT NULL,
age_minimum SMALLINT NULL,
age_maximum SMALLINT NULL,
vehicle_color VARCHAR(20) NULL,
...
CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id)
)
If you wanted to add the ability to base a coupon on vehicle age then you would have to add a column to the table and likewise you would have to adjust your search code. You would use NULL values to indicate that the criteria is unused for that coupon.
An example query for the above table:
SELECT
CC.coupon_id
FROM
Users U
INNER JOIN Coupon_Criteria CC ON
(CC.age_maximum IS NULL OR dbo.f_GetAge(U.birthday) <= age_maximum) AND
(CC.age_minimum IS NULL OR dbo.f_GetAge(U.birthday) >= age_minimum) AND
(CC.vehicle_color IS NULL OR U.vehicle_color = CC.vehicle_color) AND
...
This can get unwieldy if the number of possible criteria gets to be very large.
Another possibility would be to save the coupon criteria in XML and have a business object for your application use that to determine eligibility. It could use the XML to generate a proper query against the User table (and any other necessary tables).
Here's another possibility. Each criteria could be given a query template which you could append to your queries. This would just involve updates to the data instead of DDL and could have good performance. It would involve dynamic SQL.
CREATE TABLE Coupons (
coupon_id INT NOT NULL,
description VARCHAR(2000) NOT NULL,
...
CONSTRAINT PK_Coupons PRIMARY KEY CLUSTERED (coupon_id)
)
CREATE TABLE Coupon_Criteria (
coupon_id INT NOT NULL,
criteria_num SMALLINT NOT NULL,
description VARCHAR(50) NOT NULL,
code_template VARCHAR(500) NOT NULL,
CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id, criteria_num),
CONSTRAINT FK_Coupon_Criteria_Coupon FOREIGN KEY (coupon_id) REFERENCES Coupons (coupon_id)
)
INSERT INTO Coupons (coupon_id, description)
VALUES (1, 'Young people save $200 on yellow vehicles!')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 1, 'Young people', 'dbo.Get_Age(U.birthday) <= 20')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 2, 'Yellow Vehicles', U.vehicle_color = ''Yellow''')
You could then build a query by simply concatenating all of the criteria for any given coupon. The big downside to this one is that it's only one-directional. Given a coupon you can easily find who is qualified for it, but given a user you cannot find all coupons for which they are eligible except by going through all of the coupons. My guess is that the second is what you'd probably be most interested in unfortunately. Maybe this will give you some other ideas though.
For example, you could potentially have it work the other way by having a set number of criteria in a table and for the coupon/criteria linking table indicate whether or not that criteria is active. When querying you could then include that in your query. In other words, the query would look something like:
WHERE
(CC.is_active = 0 OR <code from the code column>) AND
The querying gets very complex though since you either need to join once for every possible criteria or you need to query to compare the number of active requirements for a coupon versus the number that are fulfilled. That is possible in SQL, but it's similar to working with an EAV model - which is basically what this turns into: a variation on an EAV model (yuck)

SQL Fulltext: What items have not been indexed?

I have a Fulltext index on one of my tables which contains some metadata and a document blob (PDF or Doc or RTF etc)
Sometimes there is an error indexing a row and therefore the row cannot be returned in Fulltext searches.
What query could I use to find out what items have NOT been indexed?
I thought something like this:
Select * from MyTable where MyTableID NOT IN
(
select MyTableID from MyTable
where contains(Title, Title)
)
And then work out which rows were not returned. But the inner query is not syntactically correct and I cant work it out.
Any ideas?
Cheers
Aaron
Bad news and good news:
Bad news - There is no way to find out what items have not been indexed just by using a simple query.
Good News - You can add a datetime on your fulltext table and store the insert date for each record on it. Then, you can create a Log table that will contains the last date that a population was executed. Using this table you can find out wich records were not indexed since last index population.
I dont know if I made myself clear. I just did what i said today. I created a job that will start a population, and another job that will check if the population is done and populate the log table with the last index population date.

What is the preferred way of saving dynamic lists in database?

In our application user can create different lists (like sharepoint) for example a user can create a list of cars (name, model, brand) and a list of students (name, dob, address, nationality), e.t.c.
Our application should be able to query on different columns of the list so we can't just serialize each row and save it in one row.
Should I create a new table at runtime for each newly created list? If this was the best solution then probably Microsoft SharePoint would have done it as well I suppose?
Should I use the following schema
Lists (Id, Name)
ListColumns (Id, ListId, Name)
ListRows (Id, ListId)
ListData(RowId, ColumnId, Value)
Though a single row will create as many rows in list data table as there are columns in the list, this just doesn't feel right.
Have you dealt with this situation? How did you handle it in database?
what you did is called EAV (Entity-Attribute-Value Model).
For a list with 3 columns and 1000 entries:
1 record in Lists
3 records in ListColumns
and 3000 Entries in ListData
This is fine. I'm not a fan of creating tables on-the-fly because it could mess up your database and you would have to "generate" your SQL queries dynamically. I would get a strange feeling when users could CREATE/DROP/ALTER Tables in my database!
Another nice feature of the EAV model is that you could merge two lists easily without droping and altering a table.
Edit:
I think you need another table called ListRows that tells you which ListData records belong together in a row!
Well I've experienced something like this before - I don't want to share the actual table schema so lets do some thought exercises using some of the suggested table structures:
Lets have a lists table containing a list of all my lists
Lets also have a columns table containing the metadata (column names)
Now we need a values table which contains the column values
We also need a rows table which contains a list of all the rows, otherwise it gets very difficult to work out how many rows there actually are
To keep things simple lets just make everything a string (VARCAHR) and have a go at coming up with some queries:
Counting all the rows in a table
SELECT COUNT(*) FROM [rows]
JOIN [lists]
ON [rows].list_id = [Lists].id
WHERE [Lists].name = 'Cars'
Hmm, not too bad, compared to:
SELECT * FROM [Cars]
Inserting a row into a table
BEGIN TRANSACTION
DECLARE #row_id INT
DECLARE #list_id INT
SELECT #list_id = id FROM [lists] WHERE name = 'Cars'
INSERT INTO [rows] (list_id) VALUES (#list_id)
SELECT #row_id = ##IDENTITY
DECLARE #column_id INT
-- === Need one of these for each column ===
SELECT #column_id = id FROM [columns]
WHERE name = 'Make'
AND list_id = #list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (#column_id, #row_id, 'Rover')
-- === Need one of these for each column ===
SELECT #column_id = id FROM [columns]
WHERE name = 'Model'
AND list_id = #list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (#column_id, #row_id, 'Metro')
COMMIT TRANSACTION
Um, starting to get a little bit hairy compared to:
INSERT INTO [Cars] ([Make], [Model}) VALUES ('Rover', 'Metro')
Simple queries
I'm now getting bored of constructing tediously complex SQL statements so maybe you can have a go at coming up with equivalent queries for the followng statements:
SELECT [Model] FROM [Cars] WHRE [Make] = 'Rover'
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
WHERE [Owners].Age > 50
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
JOIN [Addresses] ON [Addresses].id = [Owners].address_id
WHERE [Addresses].City = 'London'
I hope you are beginning to get the idea...
In short - I've experienced this before and I can assure you that creating a database inside a database in this way is definitely a Bad Thing.
If you need to do anything but the most basic querying on these lists (and literally I mean "Can I have all the items in this list please?"), you should try and find an alternative.
As long as each user pretty much has their own database I'll definitely recommend the CREATE TABLE approach. Even if they don't I'd still recommend that you at least consider it.
Perhaps a potential solution would be the creating of lists can involve CREATE TABLE statements for those entities/lists?
It sounds like the db structure or schema can change at runtime, or at the user's command, so perhaps something like this might help?
User wants to create a new list of an entity never seen before. Call it Computer.
User defines the attributes (screensize, CpuSpeed, AmountRAM, NumberOfCores)
System allows user to create in the UI
system generally lets them all be strings, unless can tell when all supplied values are indeed dates or numbers.
build the CREATE scripts, execute them against the DB.
insert the data that the user defined into that new table.
Properly coded, we're working with the requirements given: let users create new entities. There was no mention of scale here. Of course, this requires all input to be sanitized, queries parameterized, actions logged, etc.
The negative comment below doesn't actually give any good reasons, but creates a bit of FUD. I'd be interested in addressing any concerns with this potential solution. We haven't heard about scale, security, performance, or usage (internal LAN vs. internet).
You should absolutely not dynamically create tables when your users create lists. That isn't how databases are meant to work.
Your schema is correct, and the pluralization is, in my opinion, also correct, though I would remove the camel case and call them lists, list_columns, list_rows and list_data.
I would further improve upon your schema by skipping rows and columns tables, they serve no purpose. Simply have a row/column number attached to each cell, and keep things sparse: Don't bother holding empty cells in the database. You retain the ability to query/sort based on row/column, your queries will be (potentially very much) faster because the number of list_cells will be reduced, and you won't have to do any crazy joining to link your data back to its table.
Here is the complete schema:
create table lists (
id int primary key,
name varchar(25) not null
);
create table list_cells (
id int primary key,
list_id int not null references lists(id)
on delete cascade on update cascade,
row int not null,
col int not null,
data varchar(25) not null
);
It sounds like you might have Sharepoint already deployed in your environment.
Consider integrating your application with Sharepoint, and have it be your datastore. No need to recreate all the things you like about Sharepoint, when you could leverage it.
It'd take a bit of configuring, but you could call SP web services to CRUD your list data for you.
inserting list data into Sharepoint via web services
reading SP lists via web services
Sharepoint 2010 can also expose lists via OData, which would be simple to consume from any application.