Delete old records and keep the 10 latest in SQL Compact

I'm using a SQL Compact database (.sdf) with MS SQL Server 2008.
In the table 'Job', each id has multiple jobs,
and a system regularly adds new jobs to the table.
I would like to keep the 10 latest records for each id, ordered by their 'datecompleted',
and delete the rest of the records.
How can I construct my query? I failed using a #temp table and a cursor.

Well, it is fast approaching Christmas, so here is my gift to you: an example script that demonstrates what I believe you are trying to achieve. No, I don't have a big white fluffy beard ;-)
CREATE TABLE TestJobSetTable
(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    JobID INT NOT NULL,
    DateCompleted DATETIME NOT NULL
);
--Create some test data
DECLARE @iX INT;
SET @iX = 0
WHILE(@iX < 15)
BEGIN
    INSERT INTO TestJobSetTable(JobID,DateCompleted) VALUES(1,getDate())
    INSERT INTO TestJobSetTable(JobID,DateCompleted) VALUES(34,getDate())
    SET @iX = @iX + 1;
    WAITFOR DELAY '00:00:00.010' -- small pause so the DateCompleted values differ
END
--Create some more test data, for when there may be job groups with fewer than 10 records.
SET @iX = 0
WHILE(@iX < 6)
BEGIN
    INSERT INTO TestJobSetTable(JobID,DateCompleted) VALUES(23,getDate())
    SET @iX = @iX + 1;
    WAITFOR DELAY '00:00:00.010'
END
--Review the data set
SELECT * FROM TestJobSetTable;
--Apply the deletion to the remainder of the data set.
WITH TenMostRecentCompletedJobs AS
(
    SELECT ID, JobID, DateCompleted
    FROM TestJobSetTable A
    WHERE ID IN
    (
        SELECT TOP 10 ID
        FROM TestJobSetTable
        WHERE JobID = A.JobID
        ORDER BY DateCompleted DESC
    )
)
--SELECT * FROM TenMostRecentCompletedJobs ORDER BY JobID, DateCompleted DESC;
DELETE FROM TestJobSetTable
WHERE ID NOT IN (SELECT ID FROM TenMostRecentCompletedJobs)
--Now only the data of interest remains
SELECT * FROM TestJobSetTable
DROP TABLE TestJobSetTable;

How about something like:
DELETE FROM Job
WHERE id NOT IN
(
    SELECT TOP 10 id
    FROM Job
    ORDER BY datecompleted DESC
)
This is assuming you're using version 3.5, because the nested SELECT is only available in that version or higher.
I did not read the question correctly: this keeps the 10 most recent rows overall rather than per id. I suspect something more along the lines of a CTE will solve the problem, using similar logic. You want to build a query that identifies the records you want to keep, as your starting point.
Using CTE on SQL Server Compact 3.5
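For reference, on full SQL Server (2005 and later) that CTE-based approach can be written with ROW_NUMBER. A minimal sketch; note it will not run on SQL Server Compact, which lacks ranking functions, so there the correlated-subquery pattern from the accepted answer applies:
WITH RankedJobs AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY datecompleted DESC) AS rn
    FROM Job
)
DELETE FROM RankedJobs -- deleting through the CTE removes the underlying Job rows
WHERE rn > 10;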

Related

Azure Data Warehouse Generate serial faster query

The environment is Azure DW.
I have a raw table like the one below:
ID  Start  End  Action       date
1   10     15   Processed    25-10-2019
2   55     105  In-Progress  21-10-2019
.....
I need to expand/transform the Start and End columns so that they become serial numbers:
SN  Action     date
10  Processed  25-10-2019
11  Processed  25-10-2019
12  Processed  25-10-2019
13  Processed  25-10-2019
14  Processed  25-10-2019
.....
Azure Data Warehouse doesn't support recursive CTEs or cursors, so I have tried a while loop:
create table #temp_output (SerialNumber int not null, startSerialNumber int not null, endSerialNumber int not null);
insert into #temp_output select startSerialNumber, startSerialNumber, endSerialNumber from dbo.raw
declare @rowcount int, @cnt int, @start int, @end int
set @cnt = 1
set @rowcount = (select count(*) from dbo.raw)
while @cnt <= @rowcount
begin
    select top (@cnt) @start = startSerialNumber from dbo.raw
    select top (@cnt) @end = endSerialNumber from dbo.raw
    while @start <= @end
    begin
        insert #temp_output
        select max(SerialNumber) + 1,
               startSerialNumber,
               endSerialNumber
        from #temp_output
        group by startSerialNumber, endSerialNumber
        having max(SerialNumber) < endSerialNumber
        set @start = @start + 1
    end
    set @cnt = @cnt + 1
end
select SerialNumber, startSerialNumber, endSerialNumber from #temp_output order by SerialNumber
However, this takes ages (I cancelled the query after 6 hours), as the raw table has 50 million rows.
I need a better way to do this.
Updated information 31-10-2019:
Distribution for the source table is hash. 500 DWU.
60 million rows in the source table.
The average difference between Start and End is 3000.
Start can be as high as 2 million.
No additional index on the main table.
Column count: 15.
Clustered columnstore index on the raw table.
Your sample is incomplete, but you don't need a loop: you can join to a tally table using BETWEEN.
If you have a tally table (which is a table that simply has the numbers from 1 to... 1 million in it):
SELECT T.TallyNumber As SN, E.Action, E.Date
FROM YourTable E
INNER JOIN TallyTable As T
ON T.TallyNumber BETWEEN E.Start AND E.End
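If no tally table exists yet, one way to build it on Azure SQL DW is a CTAS over cross-joined system views. A sketch only: dbo.TallyTable, the 10 million row target, and the ROUND_ROBIN distribution are assumptions to adjust:
CREATE TABLE dbo.TallyTable
WITH
(
    DISTRIBUTION = ROUND_ROBIN -- an assumption; pick a distribution that suits your joins
    ,CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT TOP 10000000
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS TallyNumber
FROM sys.columns AS a
CROSS JOIN sys.columns AS b
CROSS JOIN sys.columns AS c; -- cubing a system view yields far more rows than needed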
Since you are loading this into a new table, you should use CTAS
CREATE TABLE [dbo].[NewTable]
WITH
(
DISTRIBUTION = HASH([Start])
,CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT T.TallyNumber As SN, E.Action, E.Date
FROM YourTable E
INNER JOIN TallyTable As T
ON T.TallyNumber BETWEEN E.[Start] AND E.[End];
Note there is a whole lot of design around DISTRIBUTION, and you need to get it right for performance. The statement above is just an example; you should probably use a different hash.
You need to get the distribution of the two source tables, as well as the distribution of the target table, right for good performance.

SQL Server recent rows

I'm sure this is easy, but I have googled a lot and searched.
OK, I have a table WITHOUT dates etc. with 100000000000000000 records.
I want to see the latest entries, i.e.
Select top 200 *
from table
BUT I want to see the latest entries. Is there a row identifier that I could use in a table?
i.e.
select top 200 *
from table
order by rowidentifier Desc
Thanks
"Is there a row identifier that I could use in a table, i.e. select top 200 * from table order by rowidentifier Desc"
As already stated in the comments, there is not. The best approach is to have an identity, a timestamp, or some other means of identifying the records. Here is an alternative way using EXCEPT to get what you need, but the execution plan isn't the best... Play around with it and change as needed.
--testing purposes...
DECLARE @tbl TABLE(FirstName VARCHAR(50))
DECLARE @count INT = 0
WHILE (@count <= 12000)
BEGIN
    INSERT INTO @tbl(FirstName)
    SELECT 'RuPaul ' + CAST(@count AS VARCHAR(5))
    SET @count += 1
END
--adjust how many records you would like, example is 200
SELECT *
FROM @tbl
EXCEPT (SELECT TOP (SELECT COUNT(*) - 200 FROM @tbl) * FROM @tbl)
--faster than the above
DECLARE @tblCount AS INT = (SELECT COUNT(*) FROM @tbl) - 200
SELECT *
FROM @tbl
EXCEPT (SELECT TOP (@tblCount) * FROM @tbl)
On another note, you could create another table variable that has an ID column plus the other columns, and insert the records you need into it. Then you can perform other operations against that table, for example ORDER BY, etc.
What you could do:
ALTER TABLE TABLENAME
ADD ID INT IDENTITY
This will add an ID column to the table and automatically number every row. Then you have an identifier you can use...
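One caveat: when an IDENTITY column is added to an existing table, the values assigned to the existing rows do not necessarily reflect their insertion order; only rows inserted afterwards are numbered reliably. Once the column exists, the query becomes:
SELECT TOP 200 *
FROM TABLENAME
ORDER BY ID DESC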
Nope, in short, there is none, if you don't have a column dedicated as one (i.e. an IDENTITY, a SEQUENCE, or something similar). If you did, then you could get an ordered result back.

How to make efficient pagination with total count

We have a web application which helps organize biological experiments (users describe an experiment and upload experiment data). On the main page, we show the first 10 experiments, and below them: Previous Next 1 2 3 .. 30.
It bugs me how to make the total count and pagination efficient. Currently:
select count(id) from experiments; -- not very efficient on large datasets
But how does this scale when dealing with large datasets, say more than 200,000 records? I tried importing random experiments into the table, and it still performs quite OK (0.6 s for 300,000 experiments).
The other alternative I thought about is an additional statistics table (with columns tableName and recordsCount). After each insert into the experiments table I would increase recordsCount in statistics (inserting into one table and updating the other, using an SQL transaction of course). Vice versa for delete statements (recordsCount--).
For pagination, the most efficient way is to do where id > last_id, since SQL uses the index, of course. Is there any better way?
In case results are to be filtered, e.g. select * from experiments where name like 'name%', the statistics-table option fails; we need to get the total count as select count(id) from experiments where name like 'name%'.
The application was developed using Laravel 3, in case it makes any difference.
I would like to develop pagination that always performs the same: the number of records must affect neither pagination nor the total count.
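For reference, the keyset pattern the question describes looks like this in T-SQL (a minimal sketch; @last_id, the highest id shown on the previous page, is an assumed parameter):
SELECT TOP 10 *
FROM experiments
WHERE id > @last_id -- an index on id makes this a cheap seek
ORDER BY id;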
Try a query like the one below:
CREATE PROCEDURE [GetUsers]
(
    @Inactive BIT = NULL,
    @Name NVARCHAR(500),
    @Culture VARCHAR(5) = NULL,
    @SortExpression VARCHAR(50),
    @StartRowIndex INT,
    @MaxRowIndex INT,
    @Count INT OUTPUT
)
AS
BEGIN
    SELECT ROW_NUMBER()
    OVER
    (
        ORDER BY
        CASE WHEN @SortExpression = 'Name' THEN [User].[Name] END,
        CASE WHEN @SortExpression = 'Name DESC' THEN [User].[Name] END DESC
    ) AS RowIndex, [User].*
    INTO #tmpTable
    FROM [User] WITH (NOLOCK)
    WHERE (@Inactive IS NULL OR [User].[Inactive] = @Inactive)
    AND (@Culture IS NULL OR [User].[DefaultCulture] = @Culture)
    AND [User].Name LIKE '%' + @Name + '%'

    SELECT *
    FROM #tmpTable
    WHERE #tmpTable.RowIndex > @StartRowIndex
    AND #tmpTable.RowIndex < (@StartRowIndex + @MaxRowIndex + 1)

    SELECT @Count = COUNT(*) FROM #tmpTable

    IF OBJECT_ID('tempdb..#tmpTable') IS NOT NULL DROP TABLE #tmpTable;
END
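As a side note, on SQL Server 2012 and later the same page-plus-total can be had in one statement with OFFSET/FETCH and a windowed COUNT(*), with no temp table. A sketch reusing the parameter names above:
SELECT [User].*,
       COUNT(*) OVER () AS TotalCount -- total rows matching the filter, repeated on every row
FROM [User]
WHERE (@Inactive IS NULL OR [Inactive] = @Inactive)
  AND [Name] LIKE '%' + @Name + '%'
ORDER BY [Name]
OFFSET @StartRowIndex ROWS FETCH NEXT @MaxRowIndex ROWS ONLY;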

SQL Azure doesn't support 'select into' - Is there another way?

I have a very complicated table I'd like to take a temporary backup of whilst I make some changes. Normally, I'd just do the following:
SELECT *
INTO temp_User
FROM dbo.[User] AS u
Unfortunately I'm using Azure, and it appears this isn't supported:
Msg 40510, Level 16, State 1, Line 2
Statement 'SELECT INTO' is not supported in this version of SQL Server.
Is there a way to re-create this feature, potentially as a function? I could do it by scripting the table, creating it, and then inserting the data using a select statement, but given how frequently I use Azure, and how many databases I need to work on in this area, this is very unwieldy.
Azure requires a clustered index on all tables, therefore SELECT INTO is not supported.
You'll have to:
CREATE TABLE temp_User () --fill in table structure
INSERT INTO temp_User
SELECT *
FROM dbo.[User]
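For illustration only, a sketch of that pattern; the Id and Name columns here are hypothetical stand-ins for the real structure, which you get by scripting the table:
CREATE TABLE temp_User
(
    Id INT NOT NULL PRIMARY KEY CLUSTERED, -- Azure (pre-V12) requires a clustered index
    Name NVARCHAR(100) NULL
);
INSERT INTO temp_User (Id, Name)
SELECT Id, Name
FROM dbo.[User];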
To script the table easily you can write your own script or use one of the answers to this question:
Script CREATE Table SQL Server
Update: As Jordan B pointed out, V12 will include support for heaps (no clustered index requirement), which means SELECT INTO will work. At the moment the V12 Preview is available; Microsoft of course only recommends upgrading with test databases.
The new Azure DB Update preview has resolved this problem:
The V12 preview enables you to create a table that has no clustered index. This feature is especially helpful for its support of the T-SQL SELECT...INTO statement, which creates a table from a query result.
http://azure.microsoft.com/en-us/documentation/articles/sql-database-preview-whats-new/
Unfortunately it can't be done. Here is how I worked around it:
1. Open SQL Server Management Studio.
2. Right-click on the table.
3. Select Script as ... Create Table.
4. Edit the generated script to change the table name to the one you specified in your query.
5. Execute the query.
INSERT INTO temp_User
SELECT * FROM dbo.[User]
You can try the above. It's basically a select applied to an insert statement.
http://blog.sqlauthority.com/2011/08/10/sql-server-use-insert-into-select-instead-of-cursor/
Let's assume you have a table with Id, Column1 and Column2. Then this could be your solution:
CREATE TABLE YourTableName_TMP ....
GO
SET IDENTITY_INSERT YourTableName_TMP ON
GO
INSERT INTO YourTableName_TMP
([Id], [Column1], [Column2])
SELECT [Id], [Column1], [Column2]
FROM
(
    SELECT *
    FROM
    (
        SELECT [Id], [Column1], [Column2], ROW_NUMBER() OVER(ORDER BY ID DESC) AS RowNum
        FROM YourTableName
    ) AS numbered
    WHERE RowNum BETWEEN 0 AND 500000
) AS batch
GO
SET IDENTITY_INSERT YourTableName_TMP OFF
GO
First you create the temporary table, then you insert the rows in windows. It's a mess, I know. My experience is that executing this from SQL Server Management Studio on a client does approximately 200,000 rows a minute.
As written above, you need to rewrite your query from SELECT INTO to CREATE TABLE plus INSERT.
Here is my sample. Before:
select emrID, displayName --select into
into #tTable
from emrs

declare @emrid int
declare @counter int = 1
declare @displayName nvarchar(max)

while exists (select * from #tTable)
begin
    -- some business logic
    select top 1 @displayName = displayname
    from #tTable
    group by displayname

    update emrs set groupId = @counter where @displayName = displayname

    delete #tTable
    where @displayName = displayname

    set @counter = @counter + 1
end
drop table #tTable
Modified:
CREATE TABLE #tTable ([displayName] nvarchar(max)) --create the table first
INSERT INTO #tTable -- then insert from the select:
select displayName
from emrs

declare @emrid int
declare @counter int = 1
declare @displayName nvarchar(max)

while exists (select * from #tTable)
begin
    -- some business logic
    select top 1 @displayName = t.displayName
    from #tTable as t
    group by t.displayname

    update emrs set groupId = @counter where @displayName = displayname

    delete #tTable
    where @displayName = displayname

    set @counter = @counter + 1
end
drop table #tTable
Do not forget to drop your temp table.
Also, you can find a simpler example with a description here:
http://www.dnnsoftware.com/wiki/statement-select-into-is-not-supported-in-this-version-of-sql-server

SQL Server while loop

I have a select query that returns about 10 million rows, which I then need to insert into a new table.
I want the performance to be OK, so I want to insert them into the new table in batches of 10,000. To give an example, I created the simple select query below:
Insert into NewTable
Select top 10000 * from applications
But now I need to get the next 10,000 rows and insert them. Is there a way to iterate through the 10 million rows, inserting them in batches of 10,000? I'm using SQL Server 2008.
It will probably not be faster by batching it up; probably the opposite. One statement is the fastest version most of the time. It might just require large amounts of temp space and log, but it will be the fastest measured by the wall clock.
The reason is that SQL Server automatically builds a good plan that efficiently batches up all the work at once.
To answer your question: the statement as you wrote it produces undefined rows, because a table has no inherent order. You should add a clustering key, such as an ID column. That way you can walk along the table with a while loop, each time executing the following:
INSERT ...
SELECT TOP 10000 *
FROM T
WHERE ID > @lastMaxID
ORDER BY ID
Note that the ORDER BY is required for correctness.
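A minimal sketch of the complete loop under those assumptions (T is the source as above; dbo.NewTable is a hypothetical, initially empty target with the same schema, and ID is indexed):
DECLARE @lastMaxID INT = 0,
        @rows INT = 1;

WHILE @rows > 0
BEGIN
    INSERT INTO dbo.NewTable
    SELECT TOP 10000 *
    FROM T
    WHERE ID > @lastMaxID
    ORDER BY ID;

    SET @rows = @@ROWCOUNT;                         -- 0 once the source is exhausted
    SELECT @lastMaxID = ISNULL(MAX(ID), @lastMaxID) -- highest ID copied so far
    FROM dbo.NewTable;
END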
I wouldn't batch 10 million records. If you are batching an insert, use an indexed field to define your batches:
DECLARE @intFlag INT
SET @intFlag = 1
WHILE (@intFlag <= 10000000)
BEGIN
    INSERT INTO yourTable
    SELECT *
    FROM applications
    WHERE ID BETWEEN @intFlag AND @intFlag + 9999
    SET @intFlag = @intFlag + 10000
END
GO
Use a CTE or a WHILE loop to insert in batches:
;WITH q (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1
    FROM q
    WHERE n < 10000
)
INSERT INTO table1
SELECT * FROM q
OPTION (MAXRECURSION 10000) -- the default limit of 100 recursions would stop this early
OR
DECLARE @batch INT,
        @rowcounter INT,
        @maxrowcount INT
SET @batch = 10000
SET @rowcounter = 1
SELECT @maxrowcount = max(id) FROM table1
WHILE @rowcounter <= @maxrowcount
BEGIN
    INSERT INTO table2 (col1)
    SELECT col1
    FROM table1
    WHERE id BETWEEN @rowcounter AND (@rowcounter + @batch)
    -- Set @rowcounter to the start of the next batch
    SET @rowcounter = @rowcounter + @batch + 1;
END
As an option, you can export the query results to a flat file with bcp and BULK INSERT them into the table.
The BULK INSERT statement has a BATCHSIZE option to limit the number of rows; in your case BATCHSIZE = 10000 will work.
There is another option: create an SSIS package, select fast load in the OLE DB destination, and set 10000 in "Rows per batch:". It is probably the easiest solution.
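A sketch of the bcp plus BULK INSERT route; the file path, server name, and database/table names are assumptions:
-- From the command line, export the query results in character format (-c),
-- using a trusted connection (-T):
--   bcp "SELECT * FROM MyDb.dbo.applications" queryout C:\temp\apps.dat -c -S MyServer -T

-- Then load the file in 10,000-row batches:
BULK INSERT dbo.NewTable
FROM 'C:\temp\apps.dat'
WITH
(
    FIELDTERMINATOR = '\t', -- bcp -c defaults to tab-delimited fields
    ROWTERMINATOR = '\n',
    BATCHSIZE = 10000       -- commit every 10,000 rows
);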