Maintaining a metadata table in SQL - sql

Can someone help giving me some direction to tackle a scenario like this.
A User table which contains all the user information, UserID is the primary key on User Table. I have another table called for example Comments, which holds all the comments created by any user. Comments table contains UserID as the foreign key. Now i have to rank the Users based on number of comments they added. The more comments a user added, the ranking goes up. I am trying to see what will be the best way to do this.
I would prefer to have another table, which basically contains all the attributes or statistics of a user(might have more attributes in future, right now only rank, based on comment count),rather than adding another column in User table itself.
If I create another table Called UserStats, and have UserID as the foreign Key, and have another column, called Rank, there is a possibility that everytime a user adds a comment, we might need to update the ranks. How do I write a SP that does this, Im not even sure, if this is the right way to do this.

This is not the right way to do this.
You don't want to be materializing those kinds of computed values until there is a performance problem - and you have options like Indexed Views to help you well before you get to the point of doing what you suggested.
Just create a View called UserRankings and have it look like:
SELECT c.UserId, COUNT(c.CommentId) [Ranking]
FROM Comments c
GROUP BY c.UserId
Not sure how you want to do your rankings, but you can also look at the RANK() and DENSE_RANK() functions in T-SQL: Ranking Functions (Transact-SQL)

You could do this from a query
SELECT UserID,
COUNT(UserID) CntOfUserID
FROM UserComments
GROUP BY UserID
ORDER BY COUNT(UserID) DESC
You could also do this using a ROW_NUMBER
DECLARE #Comments TABLE(
UserID INT,
Comment VARCHAR(MAX)
)
INSERT INTO #Comments SELECT 3, 'Foo'
INSERT INTO #Comments SELECT 3, 'Bar'
INSERT INTO #Comments SELECT 3, 'Tada'
INSERT INTO #Comments SELECT 2, 'T'
INSERT INTO #Comments SELECT 2, 'G'
SELECT UserID,
ROW_NUMBER() OVER (ORDER BY COUNT(UserID) DESC) ID
FROM #Comments
GROUP BY UserID

Storing that kind of information is actually a bad idea. The count of comments per user is something that can be calculated at any given time quickly and easily. And if your columns are properly indexed (on the foreign key,) the count operation ought to happen very quickly.
The only reason you might want to persist metadata is if the load on your database is fast and furious and you simply cannot afford to run select queries with counts per request. And that load will also inform whether you simply add a column to your user table or create a whole separate table. (The latter solution being the one for the most extreme server loads.)

A few comments:
Yes, I think you should keep the "score" metadata somewhere, otherwise, you'd have to run the scoring calc each time, which could ultimately get expensive.
Second, I don't think you should calculate an actual "rank" (vs other users). Just calculate a "score" (based on the number of comments posted), then your query can determine "rank" by retrieving scores in descending order.
Third, I would probably make a trigger that updates the "score" in the metadata table, based on each insert into the comments table.

Related

Create a row which does not violate a unique index

I have a table with Id, AccountId, PeriodId, and Comment. I have a unique index for (AccountId, PeriodId). I want to create such a row which does not violate the unique index. AccountId and PeriodId cannot be null, they are foreign keys.
My only idea is to cross join the Account and Period table to get all valid combination and discard the already existing combinations and chose one from the rest?
Is it a good approach or is there a better way?
Update 1: Some of you wanted to know the reason. I need this functionality in a program which adds a new row to the current table and only allows the user to modify the row after it is already in the db. This program now uses a default constructor to create a new row with AccountId1 and PeriodId1, empty comment. If this combination is available then after insertion the user can change it and provide a meaningful comment (there can be at most one comment for each (AccountId, PeriodId). But if the combination is not available then the original insertion will fail. So my task is to give a combination to the program which is not used, so it can be safely inserted.
As it turns out my original idea is quick enough. This query returns an unused (AccountId, PeriodId).
select top 1 *
from
(
select Account.Id as AccountId, [Period].Id as PeriodId
from Account cross join [Period]
except
select AccountId, PeriodId
from AccountPeriodFilename
) as T

Add column to SQL Table based on other rows in the table

I have a table that contains stop times for a transit system. The details aren't important, but my table essentially looks like this:
I am importing the data from a CSV file which contains everything except the next stop ID. I want to generate Next Stop ID to speed up some data processing I am going to do in my app.
For each row, the Next Stop ID should be the Stop ID from the next row with matching Trip ID and Service ID. The ordering should be based on the Stop Sequence, which will be increasing but not necessarily in order (1, 20, 21, 23, etc rather than 1,2,3,4...).
Here is an example of what I'm hoping it will look like. For simplicity, I kept all the service IDs the same and there are two Trip IDs. If there is no next stop I want that entry to just be blank.
I think it makes sense to do this entirely in SQL, but I'm not sure how best to do it. I know how I would do it in a standard programming language, but not SQL. Thank you for your help.
You can use lead():
select
t.*,
lead(stop_id)
over(partition by trip_id, service_id order by stop_sequence) next_stop_id
from mytable t
It is not ncessarily an good idea to actally store that derived information, since you can compute on the fly when needed (you can put the query in a view to make it easier to access it). But if you want this in an update, then, assuming that stop_id is the primary key of the table, that would look like:
update mytable
set next_stop_id = t.next_stop_id
from (
select
stop_id,
lead(stop_id) over(partition by trip_id, service_id order by stop_id) next_stop_id
from mytable
) t
where mytable.stop_id = t.stop_id

SQL or statement vs multiple select queries

I'm having a table with an id and a name.
I'm getting a list of id's and i need their names.
In my knowledge i have two options.
Create a forloop in my code which executes:
SELECT name from table where id=x
where x is always a number.
or I'm write a single query like this:
SELECT name from table where id=1 OR id=2 OR id=3
The list of id's and names is enormous so i think you wouldn't want that.
The problem of id's is the id is not always a number but a random generated id containting numbers and characters. So talking about ranges is not a solution.
I'm asking this in a performance point of view.
What's a nice solution for this problem?
SQLite has limits on the size of a query, so if there is no known upper limit on the number of IDs, you cannot use a single query.
When you are reading multiple rows (note: IN (1, 2, 3) is easier than many ORs), you don't know to which ID a name belongs unless you also SELECT that, or sort the results by the ID.
There should be no noticeable difference in performance; SQLite is an embedded database without client/server communication overhead, and the query does not need to be parsed again if you use a prepared statement.
A "nice" solution is using the INoperator:
SELECT name from table where id in (1,2,3)
Also, the IN operator is syntactic sugar built for exactly this purpose..
SELECT name from table where id IN (1,2,3,4,5,6.....)
Hoping that you are getting the list of ID's on which you have to perform a query for names as input temp table #InputIDTable,
SELECT name from table WHERE ID IN (SELECT id from #InputIDTable)

SQL Server Full Text Search: One to many relationships

I am trying to retrieve data from tickets that meet search matches. The relevant bits of data here are that a ticket has a name, and any number of comments.
Currently I'm matching a search against the ticket name like so:
JOIN freetexttable(Tickets,TIC_Name,'Test ') s1
ON TIC_PK = s1.[key]
Where the [key] from the full text catalog is equal to TIC_PK.
This works well for me, and gives me access to s1.rank, which is important for me to sort by.
Now my problem is that this method wont work for ticket searching, because the key in the comment catalog is the comment PK, an doesn't give me any information I can use to link to the ticket.
I'm very perplexed about how to go about searching multiple descriptions and still getting a meaningful rank.
I'm pretty knew to full-text search and might be missing something obvious.
Heres my current attempt at getting what I need:
WHERE TIC_PK IN(
SELECT DES_TIC_FK FROM freetexttable(TicketDescriptions, DES_Description,'Test Query') as t
join TicketDescriptions a on t.[key] = a.DES_PK
GROUP BY DES_TIC_FK
)
This gets me tickets with comments that match the search, but I dont think it's possible to sort by the rank data freetexttable returns with this method.
To search the name and comments at the same time and get the most meaningful rank you should put all of this info into the same table -- a new table -- populated from your existing tables via an ETL process.
The new table could look something like this:
CREATE TABLE TicketsAndDescriptionsETL (
TIC_PK int,
TIC_Name varchar(100),
All_DES_Descriptions varchar(max),
PRIMARY KEY (TIC_PK)
)
GO
CREATE FULLTEXT INDEX ON TicketsAndDescriptionsETL (
TIC_Name LANGUAGE 'English',
All_DES_Descriptions LANGUAGE 'English'
)
Schedule this table to be populated either via a SQL job, triggers on the Tickets and TicketDescriptions tables, or some hook in your data layer. For tickets that have multiple TicketDescriptions records, combine the text of all of those comments into the All_DES_Descriptions column.
Then run your full text searches against this new table.
While this approach does add another cog to the machine, there's really no other way to perform full text searches across multiple tables and generate one rank.

Best way to implement SQL Many-to-Many joins

I have three tables, and their relevant columns are:
tPerson
-> PersonID
tPersonStatusHistory
-> PersonStatusHistoryID
-> PersonID
-> StatusID
-> PersonStatusDate
Status
-> StatusID
I want to store a full history of all the Statuses that a Person has ever had. But I also want easy access to the current status.
Query to get the current status:
SELECT TOP 1 StatusID FROM tPersonStatusHistory
WHERE PersonID = ? ORDER BY PersonStatusDate DESC
What I want is a query that will fetch me a list of Person records, with their most recent StatusID as a column in the query.
We have tried the following approaches:
Including the above query as a sub-query in the select.
Adding a CurrentPersonStatusHistoryID column to the tPerson table and maintaining it using a computed column that calls a User-Defined-Function.
Maintaining the CurrentPersonStatusHistoryID column using a trigger on the tPersonStatusHistory table.
The query to pull up the Person records is quite high use, so I don't want to have to look up the History table each time. The trigger approach is closest to what I want, since the data is persisted in the Person table and is only changed when an update is made (which is by comparison not very often).
I find triggers difficult to maintain and I would prefer to stay away from them. I've also found that when doing an Insert-Select, or an Update query involving multiple records, the trigger is only called on the first record and not the others.
What I really want is to put some logic into the column definition of CurrentPersonStatusHistoryID, press Save and have it persisted and updated behind the scenes without my intervention.
Given that Many-to-Many relationships are common I was wondering if anyone else had come across a similar situation and had some insight into the highest performance, and preferably least hassle, way of implementing this.
Another approach is to use something like the following query, perhaps as a view. It will give you the most recent StatusID for each Person.
SELECT PersonID, StatusID
FROM (
SELECT PersonID, StatusID,
rank() OVER(PARTITION BY PersonID ORDER BY PersonStatusDate DESC) as rnk
FROM tPersonStatusHistory
) A
WHERE rnk = 1
I'm not sure that this satisfies your requirement for performance, but it's something you could look into.