I have a database that I'm running large queries against, and I want to simplify them with a view. Though there are more of them, the tables basically look like this (pseudocode):
CREATE TABLE document (
    Id int PRIMARY KEY
    /* more metadata columns */
)
CREATE TABLE name (
    Id int PRIMARY KEY,
    documentId int FOREIGN KEY REFERENCES document(Id),
    date DATETIME,
    text varchar(MAX)
)
CREATE TABLE description (
    Id int PRIMARY KEY,
    documentId int FOREIGN KEY REFERENCES document(Id),
    date DATETIME,
    text varchar(MAX)
)
So the idea is that the 'document' table contains the basic information about a document and the Id that ties the rest of the tables to the document. All the other tables hold individual attributes of the document that are updatable; each update gets its own row with a timestamp. What I want the view to pull is one row per document with the most up-to-date version of each attribute from the other tables (if this needs further elaboration or an example, please let me know and I will provide one). What is the least convoluted way I can pull this off? Using SQL Server 2008.
A view won't increase efficiency: a view is just a macro that expands into the underlying query.
There is no magic in a view, and you can suffer if you join onto it, because the expanded queries can get massive.
You can index a view, but indexed views work best with Enterprise Edition unless you want to use the NOEXPAND hint all over.
That said, the query is quite easy, unless you want to index the view, in which case you have limitations.
One approach is the CTE, as per Stuart Ainsworth's answer. Another is the "max one per group" approach I described here on dba.se. Neither of these is safe for indexed views.
You could use a CTE for each attribute inside the view to return the latest attribute values for the documentId, like so:
;WITH cName AS
    (SELECT *
     FROM (SELECT Id, documentId, date, text,
                  ranking = ROW_NUMBER() OVER (PARTITION BY documentId ORDER BY date DESC)
           FROM name) x
     WHERE ranking = 1),
.... [more CTEs here]
SELECT columnlist
FROM document d JOIN cName cn ON d.Id = cn.documentId
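For concreteness, here is a sketch of the full view with both attribute tables from the question filled in (the output column list is an assumption; the LEFT JOINs keep documents that have no attribute rows yet):

CREATE VIEW documentLatest AS
WITH cName AS
    (SELECT documentId, date, text
     FROM (SELECT documentId, date, text,
                  ranking = ROW_NUMBER() OVER (PARTITION BY documentId ORDER BY date DESC)
           FROM name) x
     WHERE ranking = 1),
cDescription AS
    (SELECT documentId, date, text
     FROM (SELECT documentId, date, text,
                  ranking = ROW_NUMBER() OVER (PARTITION BY documentId ORDER BY date DESC)
           FROM description) x
     WHERE ranking = 1)
SELECT d.Id,
       cn.text AS latestName,
       cd.text AS latestDescription
FROM document d
LEFT JOIN cName cn ON d.Id = cn.documentId
LEFT JOIN cDescription cd ON d.Id = cd.documentId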
SQL Server 2008 supports computed columns in indexes. So you could maintain a column, is_latest, set to 1 for the row with the latest time for that documentId. Queries can then filter on is_latest, and it would be much faster. Refer to http://msdn.microsoft.com/en-us/library/ms189292.aspx
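A sketch of one way to wire that up. Note that is_latest here is a plain flag maintained at write time (a computed column cannot look at other rows), and the index name, variables, and use of a filtered index are my own assumptions:

ALTER TABLE name ADD is_latest bit NOT NULL DEFAULT 0;

-- small filtered index (SQL Server 2008+) so current-row lookups are cheap
CREATE INDEX IX_name_latest ON name (documentId) INCLUDE (date, text)
    WHERE is_latest = 1;

-- on each new version: demote the old current row, insert the new one as current
DECLARE @id int = 101, @docId int = 1, @text varchar(MAX) = 'new value';
UPDATE name SET is_latest = 0 WHERE documentId = @docId AND is_latest = 1;
INSERT INTO name (Id, documentId, date, text, is_latest)
VALUES (@id, @docId, GETDATE(), @text, 1);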
I want to store users' settings in a postgresql database.
I would like to keep full history of their settings, and also be able to query the latest settings for a given user.
I have tried storing settings in a table like this:
CREATE TABLE customer (
customer_id INTEGER PRIMARY KEY,
name VARCHAR NOT NULL
);
CREATE TABLE customer_settings (
customer_id INTEGER REFERENCES customer NOT NULL,
sequence INTEGER NOT NULL, -- start at 1 and increase, set by the application
settings JSONB NOT NULL,
PRIMARY KEY(customer_id, sequence)
);
So customer_settings is an append-only log, per customer.
Then, to query the latest settings, I use a long query with a subquery that SELECTs the max sequence for the given customer_id and then selects the settings for that sequence.
This is awkward! I wonder if there is a better way? May I use a view or a trigger to make a second table, latest_customer_settings?
You can make a view. To get the latest settings for multiple customers in Postgres, I would recommend:
select distinct on (customer_id) cs.*
from customer_settings cs
order by customer_id, sequence desc;
And for this query, I would recommend an index on customer_settings(customer_id, sequence desc).
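Wrapped up as the view the question asks about (a sketch; the view and index names are mine):

create view latest_customer_settings as
select distinct on (customer_id) cs.*
from customer_settings cs
order by customer_id, sequence desc;

create index customer_settings_latest_idx
    on customer_settings (customer_id, sequence desc);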
In addition, you can have Postgres generate the sequence -- if you can deal with one overall sequence number for all customers.
CREATE TABLE customer_settings (
customer_settings_id bigserial primary key,
customer_id INTEGER REFERENCES customer NOT NULL,
settings JSONB NOT NULL
);
Then, the application does not need to set a sequence number. You can just insert customer_id and settings into the table.
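Under that schema the application just appends, and the latest row per customer is the one with the highest surrogate id (a sketch; the sample values are illustrative):

insert into customer_settings (customer_id, settings)
values (1, '{"theme": "dark"}');

select distinct on (customer_id) cs.*
from customer_settings cs
order by customer_id, customer_settings_id desc;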
Having the application maintain this information has some shortcomings. First, the application has to read from the database before it can insert anything into the table. Second, you can have race conditions if multiple threads are updating the table at the same time (in this case, for the same customer).
You can use the row_number() window function; it will get you each customer's latest settings:
with cte as (
    select cs.*,
           row_number() over (partition by c.customer_id order by cs.sequence desc) rn
    from customer c
    join customer_settings cs on c.customer_id = cs.customer_id
)
select * from cte where rn = 1
Assuming you just want the single latest log for a given user, and also assuming that the sequence is always increasing and unique, then actually you only need a simple query:
SELECT *
FROM customer_settings
WHERE customer_id = 123
ORDER BY sequence DESC
LIMIT 1;
I have a SQL Server table (as shown above), and I am ordering it in a table on my website using this command:
SELECT *
FROM [user]
ORDER BY idNum DESC;
This table (running on my website) has all the information my database holds (at least for the [user] table)
I have buttons to delete a row off the information (it gets the row number that I want to delete), as shown in this screenshot:
What I want to ask is: is there a way to delete a row using a row number? (Since I get a row number off the button click, I just want to delete that row.)
You could use a CTE here, deleting by a row number coming from the UI:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY idNum DESC) rn
    FROM [user]
)
DELETE
FROM cte
WHERE rn = <some value from UI>;
But many (most?) UI frameworks, e.g. Angular, can send the entire metadata for a user to the UI. So you would typically be able to delete using just the idNum value coming directly from the button in the UI. As @marc_s just commented, deleting using the primary key is a safe way to do deletions.
You can use ROW_NUMBER():
DELETE FROM [user]
WHERE idNum IN (SELECT idNum
                FROM (SELECT idNum,
                             ROW_NUMBER() OVER (ORDER BY idNum DESC) AS rw
                      FROM [user]) res
                WHERE res.rw = @rw)
@rw is your row number.
Your method of using row_number() is simply wrong: it is not thread-safe. Another user could be added to the database, throwing off all the "row numbers" that you have shown to users. Gosh, a user could load a page and keep the app open for a week, and a bunch of users could have been added or removed before that user gets around to deleting something.
Fundamentally, you have a malformed table. It doesn't have a primary key or even any unique constraints.
I would strongly advise:
create table Users (
userId int identity(1, 1) primary key,
. . .
);
This is the primary key that you should be using for deletion. You don't need to show the primary key to the user, as long as you can associate it with each row.
Primary keys are important for other reasons. In general, one uses a database because it stores more than one table, and the primary key is how you connect the different tables to each other (using foreign key relationships).
Can I select rows on row version?
I am querying a database table periodically for new rows.
I want to store the last row version and then read all rows from the previously stored row version.
I cannot add anything to the table, the PK is not generated sequentially, and there is no date field.
Is there any other way to get all the rows that are new since the last query?
I am creating a new table that contains all the primary keys of the rows that have been processed and will join on that table to get new rows, but I would like to know if there is a better way.
EDIT
This is the table structure:
Everything except product_id and stock_code is a field describing the product.
You can cast the rowversion to a bigint; then, when you read the rows again, you cast the column to bigint and compare against your previously stored value. The problem with this approach is the table scan each time you select based on the cast of the rowversion; this could be slow if your source table is large.
I haven't tried a persisted computed column of this, I'd be interested to know if it works well.
Sample code (Tested in SQL Server 2008R2):
DECLARE @Table TABLE
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Data VARCHAR(10) NOT NULL,
    LastChanged ROWVERSION NOT NULL
)
INSERT INTO @Table(Data)
VALUES('Hello'), ('World')
SELECT
    Id,
    Data,
    LastChanged,
    CAST(LastChanged AS BIGINT)
FROM
    @Table
-- remember the high-water mark...
DECLARE @Latest BIGINT = (SELECT MAX(CAST(LastChanged AS BIGINT)) FROM @Table)
-- ...and on the next read, compare against it
SELECT * FROM @Table WHERE CAST(LastChanged AS BIGINT) >= @Latest
EDIT: It seems I've misunderstood, and you don't actually have a ROWVERSION column, you just mentioned row version as a concept. In that case, SQL Server Change Data Capture would be the only thing left I could think of that fits the bill: http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx
Not sure if that fits your needs, as you'd need to be able to store the LSN of "the last time you looked" so you can query the CDC tables properly. It lends itself more to data loads than to typical queries.
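For illustration, a polling query against CDC looks roughly like this (a sketch; it assumes CDC has been enabled on a dbo.products table with the default capture instance name dbo_products):

-- read all changes in the captured LSN range; persist @to_lsn and start
-- just past it on the next poll
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_products');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_products(@from_lsn, @to_lsn, N'all');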
Assuming you can create a temporary table, the EXCEPT command seems to be what you need:
Copy your table into a temporary table.
The next time you look, select everything from your table EXCEPT everything from the temporary table, extract the keys you need from this
Make sure your temporary table is up to date again.
Note that your temporary table only needs to contain the keys you need. If this is just one column, you can go for a NOT IN rather than EXCEPT.
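A sketch of the pattern, with assumed table names (products as the source, seen_products as the tracking table holding just the keys):

-- new rows: keys in the source table that have not been processed yet
SELECT product_id FROM products
EXCEPT
SELECT product_id FROM seen_products;

-- then bring the tracking table up to date again
INSERT INTO seen_products (product_id)
SELECT product_id FROM products
EXCEPT
SELECT product_id FROM seen_products;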
I am developing an application that is required to store previous versions of database table rows to maintain a history of changes. I am recording the history in the same table, but I need the most current data to be accessible by a unique identifier that doesn't change with new versions. I have a few ideas on how this could be done and was just looking for opinions on the best way of doing it, or whether there is any reason not to use one of my ideas:
1. Create a new row for each row version, with a field to indicate which row is the current one. The drawback of this is that the new version has a different primary key, and any references to the old version will not return the current version.
2. When data is updated, the old row version is duplicated to a new row, and the new version replaces the old row. The current row can be accessed by the same primary key.
3. Add a second table with only a primary key, and add a column to the other table which is a foreign key to the new table's primary key. Use the same method as described in option 1 for storing multiple versions, and create a view which finds the current version by using the new table's primary key.
PeopleSoft uses (used?) "effective dated records". It took a little while to get the hang of it, but it served its purpose. The business key is always extended by an EFFDT column (effective date). So if you had a table EMPLOYEE[EMPLOYEE_ID, SALARY] it would become EMPLOYEE[EMPLOYEE_ID, EFFDT, SALARY].
To retrieve the employee's salary:
SELECT e.salary
FROM employee e
WHERE employee_id = :x
AND effdt = (SELECT MAX(effdt)
FROM employee
WHERE employee_id = :x
AND effdt <= SYSDATE)
An interesting application was future-dating records: you could give every employee a 10% increase effective Jan 1 next year, and pre-populate the table a few months beforehand. When SYSDATE crosses Jan 1, the new salary comes into effect. It was also good for running historical reports: instead of using SYSDATE, you plug in a date from the past in order to see the salaries (or exchange rates or whatever) as they would have been reported if run at that time in the past.
In this case, records are never updated or deleted, you just keep adding records with new effective dates. Makes for more verbose queries, but it works and starts becoming (dare I say) normal. There are lots of pages on this, for example: http://peoplesoft.wikidot.com/effective-dates-sequence-status
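A sketch of such a future-dated insert, in the same style as the query above (the date literal and the 10% figure are just the example from the text):

INSERT INTO employee (employee_id, effdt, salary)
SELECT e.employee_id, DATE '2026-01-01', e.salary * 1.10
FROM employee e
WHERE e.effdt = (SELECT MAX(effdt)
                 FROM employee
                 WHERE employee_id = e.employee_id
                 AND effdt <= SYSDATE);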
#3 is probably best, but if you wanted to keep the data in one table, I suppose you could add a datetime column that is populated with now() for each new row; then you could at least sort by date desc limit 1.
Overall, though, multiple versions really needs more information on what you want to do, functionally as much as programmatically.
Have you considered using AutoAudit?
AutoAudit is a SQL Server (2005, 2008) Code-Gen utility that creates
Audit Trail Triggers with:
Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing INT) columns to table
Insert event logged to Audit table
Updates old and new values logged to Audit table
Delete logs all final values to the Audit table
view to reconstruct deleted rows
UDF to reconstruct Row History
Schema Audit Trigger to track schema changes
Re-code-gens triggers when Alter Table changes the table
For me, history tables are always separate, so definitely I would go with that. Why create some complex versioning scheme where you need to look at the current production record? In reporting, this results in nasty unions that are really unnecessary.
Table has a primary key and who cares what else.
TableHist has these columns: an incrementing int/bigint primary key; the history-written date/time; who wrote the history row; a record type (I, U, or D for insert, update, delete); the PK from Table as an FK on TableHist; and then all the remaining columns of Table, with the same names.
If you create this history table structure and populate it via triggers on Table, you will have all versions of every row in the tables you care about and can easily determine the original record, every change, and the deletion records as well. AND if you are reporting, you only need to use your historical tables to get all of the information you'd like.
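A minimal sketch of that structure, assuming a hypothetical Product(ProductId, Name) table:

CREATE TABLE ProductHist (
    ProductHistId bigint IDENTITY(1,1) PRIMARY KEY,
    HistDate datetime NOT NULL DEFAULT GETDATE(),
    HistBy sysname NOT NULL DEFAULT SUSER_SNAME(),
    RecordType char(1) NOT NULL,   -- I, U, or D
    ProductId int NOT NULL,        -- PK of Product, FK here
    Name varchar(100) NULL         -- remaining columns mirror Product
);
GO
CREATE TRIGGER trProduct_Hist ON Product
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- inserts, and the "new" side of updates
    INSERT INTO ProductHist (RecordType, ProductId, Name)
    SELECT CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END,
           i.ProductId, i.Name
    FROM inserted i;
    -- deletes: rows in deleted with nothing inserted
    INSERT INTO ProductHist (RecordType, ProductId, Name)
    SELECT 'D', d.ProductId, d.Name
    FROM deleted d
    WHERE NOT EXISTS (SELECT 1 FROM inserted);
END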
create table table1 (
Id int identity(1,1) primary key,
[Key] varchar(max),
Data varchar(max)
)
go
create view view1 as
with q as (
    -- number rows per [Key], newest Id first, so r = 1 is the latest version
    select [Key], Data, row_number() over (partition by [Key] order by Id desc) as r
    from table1
)
select [Key], Data from q where r = 1
go
-- writes through the view simply append a new version row to the base table
create trigger trigger1 on view1 instead of update, insert as begin
    insert into table1
    select [Key], Data
    from (select distinct [Key], Data from inserted) a
end
go
insert into view1 values
('key1', 'foo')
,('key1', 'bar')
select * from view1
update view1
set Data='updated'
where [Key]='key1'
select * from view1
select * from table1
drop trigger trigger1
drop table table1
drop view view1
Results:

Key   Data
key1  foo

Key   Data
key1  updated

Id  Key   Data
1   key1  bar
2   key1  foo
3   key1  updated
I'm not sure if the distinct is needed.
Can someone help give me some direction to tackle a scenario like this?
A User table contains all the user information; UserID is the primary key on the User table. I have another table, called for example Comments, which holds all the comments created by any user; the Comments table contains UserID as a foreign key. Now I have to rank the users based on the number of comments they added: the more comments a user has added, the higher the ranking goes. I am trying to see what the best way to do this would be.
I would prefer to have another table which contains all the attributes or statistics of a user (it might have more attributes in the future; right now only rank, based on comment count), rather than adding another column to the User table itself.
If I create another table called UserStats, with UserID as the foreign key and another column called Rank, there is a possibility that every time a user adds a comment we might need to update the ranks. How do I write a stored procedure that does this? I'm not even sure if this is the right way to do it.
This is not the right way to do this.
You don't want to be materializing those kinds of computed values until there is a performance problem - and you have options like Indexed Views to help you well before you get to the point of doing what you suggested.
Just create a view called UserRankings and have it look like:
CREATE VIEW UserRankings AS
SELECT c.UserId, COUNT(c.CommentId) AS [Ranking]
FROM Comments c
GROUP BY c.UserId
Not sure how you want to do your rankings, but you can also look at the RANK() and DENSE_RANK() functions in T-SQL: Ranking Functions (Transact-SQL)
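For instance, a sketch using RANK() over the same Comments table, so users with equal comment counts share a rank:

SELECT c.UserId,
       RANK() OVER (ORDER BY COUNT(c.CommentId) DESC) AS [Rank]
FROM Comments c
GROUP BY c.UserId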
You could do this from a query
SELECT UserID,
COUNT(UserID) CntOfUserID
FROM UserComments
GROUP BY UserID
ORDER BY COUNT(UserID) DESC
You could also do this using a ROW_NUMBER
DECLARE #Comments TABLE(
UserID INT,
Comment VARCHAR(MAX)
)
INSERT INTO #Comments SELECT 3, 'Foo'
INSERT INTO #Comments SELECT 3, 'Bar'
INSERT INTO #Comments SELECT 3, 'Tada'
INSERT INTO #Comments SELECT 2, 'T'
INSERT INTO #Comments SELECT 2, 'G'
SELECT UserID,
ROW_NUMBER() OVER (ORDER BY COUNT(UserID) DESC) ID
FROM #Comments
GROUP BY UserID
Storing that kind of information is actually a bad idea. The count of comments per user is something that can be calculated at any given time, quickly and easily. And if your columns are properly indexed (on the foreign key), the count operation ought to happen very quickly.
The only reason you might want to persist metadata is if the load on your database is fast and furious and you simply cannot afford to run select queries with counts per request. And that load will also inform whether you simply add a column to your user table or create a whole separate table. (The latter solution being the one for the most extreme server loads.)
A few comments:
Yes, I think you should keep the "score" metadata somewhere, otherwise, you'd have to run the scoring calc each time, which could ultimately get expensive.
Second, I don't think you should calculate an actual "rank" (vs other users). Just calculate a "score" (based on the number of comments posted), then your query can determine "rank" by retrieving scores in descending order.
Third, I would probably make a trigger that updates the "score" in the metadata table, based on each insert into the comments table.
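A minimal sketch of such a trigger, assuming a hypothetical UserStats(UserID, Score) table that already has a row per user:

CREATE TRIGGER trComments_Score ON Comments
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- bump each affected user's score by the number of comments just inserted
    UPDATE us
    SET us.Score = us.Score + i.NewComments
    FROM UserStats us
    JOIN (SELECT UserID, COUNT(*) AS NewComments
          FROM inserted
          GROUP BY UserID) i ON i.UserID = us.UserID;
END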