Hi all, I wonder if someone could advise a more efficient way to select rows from a table that has roughly 60 million records in it. Each row has a date stored as an nvarchar, for example '20110527030000.106'. I want to select all rows that are 3 months or older based on this date field, so I'm only interested in the first part of the value: '20110527'. I have the following code to do that, but it's a bit slow and I'm wondering if there is a better way?
DECLARE @tempDate varchar(12)
SET @tempDate = CONVERT(varchar(12), DATEADD(m, -3, GETDATE()), 112)
SELECT *
FROM [TABLE A]
WHERE SUBSTRING([DATE_FIELD], 1, 8) < @tempDate
Your query not only can't use any index on [DATE_FIELD] (it does a full scan), it also applies the SUBSTRING() function to every value in the DATE_FIELD column.
Don't apply any function to the column; that way the index on [DATE_FIELD] can be used, and the function is applied only once, when @tempDate is calculated:
SELECT *
FROM [TABLE A]
WHERE [DATE_FIELD] < @tempDate
The < comparison works for varchar values. The following evaluates to true:
'20110526030000.106' < '20110527'
Is there any reason that the datetime is not stored as datetime type?
If you can modify the table you could add a datetime column and then run an update to populate it with the correct data.
If you can't modify the table, then you could create a new table with a datetime column, extract the keys from the table you want to query into it, and enforce a foreign key constraint across the tables. Then you can populate the datetime column as before and join the tables when querying.
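A minimal sketch of that side table, assuming [TABLE A] has an integer key column called Id (all names here are placeholders):

CREATE TABLE TableA_Dates (
    Id int NOT NULL PRIMARY KEY
        REFERENCES [TABLE A] (Id),  -- foreign key back to the main table
    DateValue datetime NOT NULL     -- parsed from the first 8 chars of DATE_FIELD
)
CREATE INDEX IX_TableA_Dates_DateValue ON TableA_Dates (DateValue)

INSERT INTO TableA_Dates (Id, DateValue)
SELECT Id, CONVERT(datetime, LEFT([DATE_FIELD], 8), 112)
FROM [TABLE A]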
If you can't modify anything, then I guess you could try benchmarking your solution against one that casts the varchar date into a datetime on the fly (with a user-defined function, for example). This may actually run faster.
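For example, a minimal sketch of the on-the-fly cast, written inline rather than as a UDF, and assuming the first 8 characters always form a valid YYYYMMDD date:

SELECT *
FROM [TABLE A]
WHERE CONVERT(datetime, LEFT([DATE_FIELD], 8), 112) < DATEADD(m, -3, GETDATE())

Note that this still applies a function to the column, so it can't use an index on [DATE_FIELD] either; only a benchmark will tell whether it beats the SUBSTRING version.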
Hope this helps you some.
If you can modify the database you could add a new flag column, isolder3months, and set it to 0 for each new entry.
A scheduled job can then set the flag once a day, checking only the entries where isolder3months = 0. This way you check/update only a fraction of your entries each day.
This solution is only practical if the 3-month cutoff is fixed and if this query is used often.
Your query would then look like:
SELECT *
FROM [TABLE A]
WHERE [isolder3months] = 1
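A sketch of the daily update the job would run (using the nvarchar date column from the question; the string comparison against the cutoff works as described above):

UPDATE [TABLE A]
SET [isolder3months] = 1
WHERE [isolder3months] = 0
  AND [DATE_FIELD] < CONVERT(varchar(8), DATEADD(m, -3, GETDATE()), 112)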
Related
I have a SQL Server table with just 3 columns, one of which is of type varbinary. The data in this column is actually a JSON document which, among other properties, contains information about when the data was last modified. Unfortunately the SQL table itself does not record when its rows were modified.
Now, when sorting and filtering the data, I of course don't want to fetch all rows just to find e.g. the latest 100 entries.
So my question is: does SQL Server somehow remember when a row was added/modified? I have tried adding a timestamp column, and it was applied to all existing rows, but seemingly in random order, so sorting on it doesn't work. I don't need a datetime or anything; I just want to be able to sort the records based on when they were last modified.
Thanks
For those looking to add a timestamp column of type DATETIME to an existing DB table, you can do it like so:
ALTER TABLE TestTable
ADD DateInserted DATETIME NOT NULL DEFAULT (GETDATE());
The existing records will automatically get a value equal to the date/time of the moment the column was added.
New records will get the current date/time upon insertion.
SQL Server will not track historically when a row was inserted or modified, so you need to rely on the JSON data to figure that out yourself. You are going to need a new column to make this efficient to query. Once you have your new column, you have some options:
Loop through all your records, populating the new column with the relevant value from the JSON data.
If your version of SQL Server is recent enough, you can query the JSON data directly. Populate this column using a query like this:
UPDATE MyTable
SET MyNewColumn = JSON_VALUE(JsonDataColumn, '$.Customer.DateCreated')
The downside of this method is that you need to maintain this column yourself whenever the JSON data changes.
Make SQL Server compute the value from the JSON automatically, for example:
ALTER TABLE MyTable
ADD MyNewColumn AS JSON_VALUE(JsonDataColumn, '$.Customer.DateCreated')
And create an index to make it efficient:
CREATE INDEX IX_MyTable_MyNewColumn
ON MyTable(MyNewColumn)
Use a new column, CreatedDate, and store the datetime every time you do an INSERT.
You could use GETDATE() to populate the column.
An UpdatedDate column can be used for updates.
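A minimal sketch of that pattern (the table name, the Id key, and the @-parameters are all placeholders):

ALTER TABLE MyTable
ADD CreatedDate datetime NOT NULL DEFAULT (GETDATE()),
    UpdatedDate datetime NULL

-- set UpdatedDate explicitly in every update statement:
UPDATE MyTable
SET JsonDataColumn = @newData, UpdatedDate = GETDATE()
WHERE Id = @id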
"...in order to find e.g. the latest 100 entries."
Timestamp is indeed what you need.
It's an ever-increasing value and it's updated automatically, so you are always able to find the most recently modified/inserted rows.
Here is an example:
create table dbo.test1 (id int);
insert into dbo.test1 values(1), (2), (3);
alter table dbo.test1 add ts timestamp;
update dbo.test1
set id = 10
where id = 2
select top 1 *
from dbo.test1
order by ts desc;
--id ts
--10 0x000000001FCFABD2
insert into dbo.test1 (id)
values (100);
select top 1 *
from dbo.test1
order by ts desc;
--id ts
--100 0x000000001FCFABD3
As you see, you always get the last modified/inserted row.
For your purpose just use
select top 100 *
...
order by ts desc;
Thanks. Apparently I didn't look hard enough before I posted this question. The question has been asked a couple of times before, and the answer is: nope! There is no easy solution to this.
SQL Server does not keep track of when a record was created or modified, which is essentially what I was looking for. So I will go for the next best solution, which is probably to create a datetime column, retrieve the modified date from the JSON document and then update the record. Or rather, the 1.4 million records :-(
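A sketch of that one-off migration; it assumes SQL Server 2016+ for JSON_VALUE, that the varbinary column holds UTF-16 text so it can be cast to nvarchar, and a made-up JSON path:

ALTER TABLE MyTable ADD ModifiedDate datetime2 NULL

UPDATE MyTable
SET ModifiedDate = TRY_CONVERT(datetime2,
        JSON_VALUE(CAST(JsonDataColumn AS nvarchar(max)), '$.lastModified'))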
This question is largely driven by curiosity, as I do have a working query (it just takes a little longer than I would like).
I have a table with 4 million rows. The only index on this table is an auto-increment BigInt ID. The query is looking for distinct values in one of the columns, but only going back 1 day. Unfortunately, the ReportDate column that is evaluated is not of the DateTime type, or even a BigInt, but is char(8) in the format of YYYYMMDD. So the query is a bit slow.
SELECT Category
FROM Reports
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Note that the date conversion in the above statement is simply converting it to a YYYYMMDD format for comparison.
I was wondering if there was a way to optimize this query based on the fact that I know that the only data I am interested in is at the "bottom" of the table. I was thinking of some sort of recursive SELECT function which gradually grew a temporary table that could be used for the final query.
For example, in pseudo-SQL:
N = 128
TemporaryTable = SELECT TOP {N} *
FROM Reports
ORDER BY ID DESC
/* Once we hit a date < Today, we can stop */
if(TemporaryTable does not contain ReportDate < Today)
N = N**2
Repeat Select
/* We now have a smallish table to do our query */
SELECT Category
FROM TemporaryTable
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Does that make sense? Is something like that possible?
This is on MS SQL Server 2008.
I might suggest that you do not need to convert the date that is stored as char data in YYYYMMDD format; that format is inherently sortable all by itself. I would instead convert the date you are comparing against into that format.
Also, the way you have the conversion written, it converts the current datetime for every individual row, so even storing that value once for the whole query could speed things up; but I think just converting the date you are searching for into that char format would help.
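For example, a sketch that does the conversion once, up front:

DECLARE @today char(8)
SET @today = CONVERT(char(8), GETDATE(), 112)

SELECT Category
FROM Reports
WHERE ReportDate = @today
GROUP BY Category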
I would also suggest getting the index(es) you need created, of course... but that's not the question you asked :P
Why not just create the index you need?
create index idx_Reports_ReportDate
on Reports(ReportDate, Category)
No, that doesn't make sense. The only way to optimize this query is to have a covering index for it:
CREATE INDEX ndxReportDateCategory ON Reports (ReportDate, Category);
Update
Considering your comment that you cannot modify the schema: then you should modify the schema. If you still can't, the answer still applies: the solution is to have an index.
And finally, to answer your question more directly: if you have a strong correlation between ID and ReportDate, the ID you seek is the biggest one that has a ReportDate smaller than the date you're after:
SELECT MAX(Id)
FROM Reports
WHERE ReportDate < 'YYYYMMDD';
This will do a reverse scan on the ID index and stop at the first ID prior to your desired date (i.e. it will not scan the entire table). You can then filter your reports based on this found max ID.
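For example (a sketch, assuming ID is the clustered primary key):

DECLARE @maxId bigint
SELECT @maxId = MAX(ID)
FROM Reports
WHERE ReportDate < CONVERT(varchar(8), GETDATE(), 112)

SELECT Category
FROM Reports
WHERE ID > @maxId   -- under the correlation assumption, only today's rows remain
GROUP BY Category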
I think you will find the discussion on SARGability on Rob Farley's blog to be very interesting reading in relation to your post topic.
http://blogs.lobsterpot.com.au/2010/01/22/sargable-functions-in-sql-server/
An interesting alternative approach that does not require you to modify the existing column data type would be to leverage computed columns.
alter table REPORTS
add castAsDate as CAST(ReportDate as date)
create index rf_so2 on REPORTS(castAsDate) include (ReportDate)
One of the query patterns I occasionally use to get into a log table with indexing similar to yours is to limit by subquery:
DECLARE @ReportDate varchar(8)
SET @ReportDate = CONVERT(varchar(8), GETDATE(), 112)
SELECT *
FROM
(
SELECT top 20000 *
FROM Reports
ORDER BY ID desc
) sub
WHERE sub.ReportDate = @ReportDate
20k/4M = 0.5% of the table is read.
Here's a loop solution. Note: you might want to make ID the primary key and index ReportDate in the temp table.
DECLARE @ReportDate varchar(8)
SET @ReportDate = CONVERT(varchar(8), GETDATE(), 112)
DECLARE @CurrentDate varchar(8), @MinKey bigint

SELECT TOP 2000 * INTO #MyTable
FROM Reports ORDER BY ID DESC

SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
FROM #MyTable

WHILE @ReportDate <= @CurrentDate
BEGIN
    -- SELECT INTO can only create the table once; use INSERT for later batches
    INSERT INTO #MyTable
    SELECT TOP 2000 *
    FROM Reports WHERE ID < @MinKey ORDER BY ID DESC

    -- stop if we ran out of rows, to avoid an infinite loop
    IF @@ROWCOUNT = 0 BREAK

    SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
    FROM #MyTable
END

SELECT * FROM #MyTable
WHERE ReportDate = @ReportDate

DROP TABLE #MyTable
I need a fast way to duplicate a DATETIME column in a table and give it a new name.
I have a column named myDate in my table called myResults, I need a query to make a new column in the table called newDate which has the exact same data as the myDate column.
Is there a faster way to do this than by doing the obvious 2 step approach of make a new column, and then copying all the data (it's a large table and I'm looking for the fastest approach)?
Obvious solution:
ALTER TABLE `myResults` ADD `newDate` DATETIME;
UPDATE `myResults` SET `newDate` = `myDate`;
UPDATE `table_name` SET `new_column` = `existing_column`;
The obvious solution is the only solution, unfortunately.
However note that in general you shouldn't be copying a column in relational databases.
If you just need a default in there, you can either choose what the default is statically or use a function call.
ALTER TABLE `myResults` ADD `newDate` DATETIME DEFAULT '2010-01-01';
or
ALTER TABLE `myResults` ADD `newDate` DATETIME DEFAULT current_timestamp;
Why would your workload ever demand a new datetime column that duplicates another column's data? This sounds like horrible practice. How about telling us what you're trying to achieve? You can pull a second column with the same data in a few different ways without actually duplicating the data:
SELECT date1 AS date_old, date1 AS date_new FROM `table`;
Or you can create a view:
CREATE VIEW virtual_table AS
SELECT date1 AS date_old, date1 AS date_new FROM `table`
;
SELECT * FROM virtual_table;
I need to create a stored procedure that, upon execution, checks if any new rows have been added to a table within the past 12 hours. If not, a warning email must be sent to a recipient.
I have the procedures for sending the email, but the problem is the query itself. I imagine I'd have to write an SQL command that uses the current date and compares it to the dates in the rows. But I'm a complete beginner in SQL, so I can't even use the right words to find anything on Google.
Short version:
Using MS SQL Server 2005, how can I check against the dates, then return a result based on whether new rows were created within the last 12 hours, and use that result to decide whether or not to send email?
Something like this should do what you wish:
Select ID
from TableName
where CreatedDate >= dateadd(hour,-12,getDate())
Hope this is clear but please feel free to pose further questions.
Cheers, John
Say your date field in the table is 'CreateDate' and it's of type DateTime.
Your time to compare with is: GETDATE()
(which returns date + time)
To get the datetime value of 12 hours before that, is done using DATEADD:
DATEADD(hour, -12, GETDATE())
so if we want the # of rows added in the last 12 hours, we'll do:
SELECT COUNT(*)
FROM [Table]
WHERE CreateDate >= DATEADD(hour, -12, GETDATE())
In your proc, you have to store the result of this query in a variable and check if it's > 0, so:
DECLARE @amount int
SELECT @amount = COUNT(*)
FROM [Table]
WHERE CreateDate >= DATEADD(hour, -12, GETDATE())
And then you'll check whether the @amount variable is > 0.
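For example, a sketch of that final check (Database Mail must be configured, and the recipient address is a placeholder):

IF @amount = 0
BEGIN
    -- no rows added in the last 12 hours: send the warning
    EXEC msdb.dbo.sp_send_dbmail
        @recipients = 'admin@example.com',
        @subject = 'Warning: no new rows in the last 12 hours'
END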
You could use a trigger; this link has several examples: http://msdn.microsoft.com/en-us/library/aa258254(SQL.80).aspx
USE pubs
IF EXISTS (SELECT name FROM sysobjects
WHERE name = 'reminder' AND type = 'TR')
DROP TRIGGER reminder
GO
CREATE TRIGGER reminder
ON titles
FOR INSERT, UPDATE, DELETE
AS
EXEC master..xp_sendmail 'MaryM',
'Don''t forget to print a report for the distributors.'
GO
If you do not want something to happen on each insert/update, you could copy data to another table, then examine that table every 12 hours, report on the rows in it, and delete them...
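A sketch of that idea, reusing the pubs titles table from the example above (the log table itself is made up):

CREATE TABLE titles_log (
    title_id varchar(6),
    logged_at datetime DEFAULT GETDATE()
)
GO
CREATE TRIGGER log_title_inserts
ON titles
FOR INSERT
AS
    INSERT INTO titles_log (title_id)
    SELECT title_id FROM inserted
GO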
Assuming you have on this table:
- either a unique auto-incrementing id, or
- a created_timestamp field containing the timestamp of the row's creation
then have a new table, reported_rows, with:
- report_timestamp
- last_id_seen (or last_timestamp_seen)
Fill reported_rows with the current values each time you send your email, and before sending the email, compare against the previously stored values so you know which rows have been added.
If the table has an identity field, you could also save the max value (as a bookmark) and next time check if there are any rows with an ID greater than your saved bookmark. This may be faster if the ID is the clustered key.
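A minimal sketch of the bookmark idea (all names are placeholders):

-- one-row table holding the last ID already reported on
CREATE TABLE ReportBookmark (LastIdSeen bigint NOT NULL)
INSERT INTO ReportBookmark VALUES (0)

-- when the job runs: count rows added since the bookmark
DECLARE @newRows int
SELECT @newRows = COUNT(*)
FROM MyTable
WHERE Id > (SELECT LastIdSeen FROM ReportBookmark)

-- after sending (or skipping) the email, advance the bookmark
UPDATE ReportBookmark
SET LastIdSeen = (SELECT MAX(Id) FROM MyTable)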
I have a table in a database that represents dates textually (i.e. "2008-11-09") and I would like to replace them with the UNIX timestamp. However, I don't think that MySQL is capable of doing the conversion on its own, so I'd like to write a little script to do the conversion. The way I can think to do it involves getting all the records in the table, iterating through them, and updating the database records. However, with no primary key, I can't easily get the exact record I need to update.
Is there a way to get MySQL to assign temporary IDs to records during a SELECT so that I refer back to them when doing UPDATEs?
Does this not do it?
UPDATE
MyTable
SET
MyTimeStamp = UNIX_TIMESTAMP(MyDateTime);
If for some reason you do have to iterate (the other answers cover the situation where you don't), I can think of two ways to do it (these aren't MySQL-specific):
Add a column to the table that's an auto-assigned number. Use that as the PK for your updates, then drop the column afterwards (or just keep it around for future use).
In a table with no defined PK, as long as there are no exact duplicate rows, you can use the entire row as a composite PK; just use every column in the row as your distinguishing characteristic. i.e., if the table has 3 columns, "name", "address", and "updated", do the following:
UPDATE mytable SET updated = [timestamp value] WHERE name = [name] AND address = [address] AND updated = [old timestamp]
Many data access frameworks use this exact strategy to implement optimistic concurrency.
No, you should be able to do this with a single update statement. If all of the dates are yyyy-mm-dd and they are just stored in some sort of text column instead of DATETIME, you can just move the data over. SQL would be like:
ALTER TABLE t ADD COLUMN dates DATETIME;
UPDATE t set t.dates=t.olddate;
This shouldn't be dependent on a PK because MySQL can scan through each row in the table. The only time PK's become an issue is if you need to update a single row, but the row may not be unique.
You can generate values during a SELECT using the MySQL user-variables feature, but these values do not refer to the row; they're temporary parts of the result set only. You can't use them in UPDATE statements.
SET @v := 0;
SELECT @v := @v + 1, mytable.* FROM mytable;
Here's how I'd solve the problem. You're going to have to create another column for your UNIX timestamps anyway, so add it first. Then convert the values in the old textual datetime column to UNIX timestamps and place them in the new column. Then drop the old column.
ALTER TABLE mytable ADD COLUMN unix_timestamp INT UNSIGNED NOT NULL DEFAULT 0;
UPDATE mytable
SET unix_timestamp = UNIX_TIMESTAMP( STR_TO_DATE( text_timestamp, '%Y-%m-%d' ) );
ALTER TABLE mytable DROP COLUMN text_timestamp;
Of course you should confirm that the conversion has been done correctly before you drop the old column!
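For example, a quick spot check (a sketch) before dropping the old column:

SELECT text_timestamp, unix_timestamp, FROM_UNIXTIME(unix_timestamp)
FROM mytable
LIMIT 10;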
See UNIX_TIMESTAMP() and STR_TO_DATE()