How to delete rows in a big database after storing them in a new database (SQL Server) - sql-server-2012

I have a SQL Server database that stores a row per second; after a month or two the database gets very big, over 1 million rows. I want to know how to back up or delete the old rows once the data has been automatically copied to a new (archive) database.

I use a kind of "generic code" I built some years ago for archiving, and it works quite well. I have adapted it a little for you.
It assumes you have a field called DateRecord holding the date the record was added or last modified. It also assumes you have ProdDB and ArchiveDB, with a table of the same structure in both, except that the ID in ArchiveDB cannot be IDENTITY but can have a UNIQUE index:
-- Deletes from the Archive if older than one year
DELETE FROM ArchiveDB.dbo.YourTable
WHERE DATEDIFF(d, DateRecord, GETDATE()) > 365;

-- Insert into the ArchiveDB table if older than 30 days (adjust the cut-off to your needs)
INSERT INTO ArchiveDB.dbo.YourTable
SELECT *
FROM ProdDB.dbo.YourTable SRC
WHERE DateRecord < DATEADD(d, -30, GETDATE())
  AND NOT EXISTS (
      SELECT 1
      FROM ArchiveDB.dbo.YourTable DST
      WHERE SRC.[ID] = DST.[ID] );

-- Delete from ProdDB if older than 30 days and already moved to the ArchiveDB
DELETE SRC
FROM ProdDB.dbo.YourTable SRC
WHERE SRC.DateRecord < DATEADD(d, -30, GETDATE())
  AND EXISTS (
      SELECT 1
      FROM ArchiveDB.dbo.YourTable DST
      WHERE SRC.[ID] = DST.[ID] );
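If the tables are large, it can help to wrap these three steps in a stored procedure and run the deletes in small batches, so the transaction log and locking stay manageable; the procedure can then be scheduled with SQL Server Agent. This is only a sketch, reusing the table and column names above (the procedure name, batch size and cut-offs are assumptions to adapt to your system):
CREATE PROCEDURE dbo.usp_ArchiveYourTable
AS
BEGIN
    SET NOCOUNT ON;

    -- Purge archive rows older than one year, 5000 at a time
    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000) FROM ArchiveDB.dbo.YourTable
        WHERE DateRecord < DATEADD(d, -365, GETDATE());
        IF @@ROWCOUNT = 0 BREAK;
    END;

    -- Copy production rows older than 30 days that are not yet archived
    INSERT INTO ArchiveDB.dbo.YourTable
    SELECT SRC.*
    FROM ProdDB.dbo.YourTable SRC
    WHERE SRC.DateRecord < DATEADD(d, -30, GETDATE())
      AND NOT EXISTS (SELECT 1 FROM ArchiveDB.dbo.YourTable DST WHERE DST.[ID] = SRC.[ID]);

    -- Remove archived rows from production, again in batches
    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000) SRC
        FROM ProdDB.dbo.YourTable SRC
        WHERE SRC.DateRecord < DATEADD(d, -30, GETDATE())
          AND EXISTS (SELECT 1 FROM ArchiveDB.dbo.YourTable DST WHERE DST.[ID] = SRC.[ID]);
        IF @@ROWCOUNT = 0 BREAK;
    END;
END;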

I consulted @Angel M.'s code and got a result that works. Thanks, Angel.
BEGIN
    DECLARE @Ngayhientai DATETIME = GETDATE();
    DECLARE @ngayquakhu DATETIME;
    DECLARE @datrungay DATETIME;
    DECLARE @Daylakieuso INT;
    DECLARE @lastyear INT;

    -- Oldest date in the source table
    SELECT TOP(1) @ngayquakhu = daylabientable.DateTime
    FROM [DatabaseMobus].[dbo].TableModbus daylabientable
    ORDER BY daylabientable.DateTime ASC;

    SELECT @Daylakieuso = DATEDIFF(d, @ngayquakhu, @Ngayhientai);

    -- If the oldest source row is more than a day old, copy rows that are not yet in the backup table
    IF @Daylakieuso > 1
        INSERT INTO [DatabaseMobus].[dbo].[SaoluuTable]([STT],[RegisterModbus],[DateTime])
        SELECT [STT],[RegisterModbus],[DateTime]
        FROM [DatabaseMobus].[dbo].[TableModbus] nguondulieu
        WHERE NOT EXISTS(SELECT 1 FROM [DatabaseMobus].[dbo].[SaoluuTable] laylinkgoc
                         WHERE laylinkgoc.DateTime = nguondulieu.DateTime);

    -- Oldest date in the backup table
    SELECT TOP(1) @datrungay = honmotthang.DateTime
    FROM [DatabaseMobus].[dbo].[SaoluuTable] honmotthang
    ORDER BY honmotthang.DateTime ASC;

    SELECT @lastyear = DATEDIFF(d, @datrungay, @Ngayhientai);

    -- When the oldest backup row reaches 365 days, copy backup rows not already in the yearly archive
    IF @lastyear = 365
        INSERT INTO [DatabaseMobus].[dbo].[Dulieuhon1nam]([STT],[RegisterModbus],[DateTime])
        SELECT [STT],[RegisterModbus],[DateTime]
        FROM [DatabaseMobus].[dbo].[SaoluuTable] thoidinhe
        WHERE NOT EXISTS(SELECT 1 FROM [DatabaseMobus].[dbo].[Dulieuhon1nam] chisokhac
                         WHERE chisokhac.DateTime = thoidinhe.DateTime);

    -- Once the backup data is more than a year old, clear the backup table
    IF @lastyear > 365
        DELETE FROM [DatabaseMobus].[dbo].[SaoluuTable];
END

Related

Azure Data Warehouse - generate serial numbers with a faster query

The environment is Azure DW.
I have a raw table like the one below:
ID  Start  End  Action       Date
1   10     15   Processed    25-10-2019
2   55     105  In-Progress  21-10-2019
.....
I need to expand/transform the Start and End columns so that each row becomes a run of serial numbers:
SN  Action     Date
10  Processed  25-10-2019
11  Processed  25-10-2019
12  Processed  25-10-2019
13  Processed  25-10-2019
14  Processed  25-10-2019
.....
Azure Data Warehouse doesn't support recursive CTEs or cursors, so I have tried a while loop:
create table #temp_output (SerialNumber int not null, startSerialNumber int not null, endSerialNumber int not null);
insert into #temp_output select startSerialNumber, startSerialNumber, endSerialNumber from dbo.raw;
declare @rowcount int, @cnt int, @start int, @end int;
set @cnt = 1;
set @rowcount = (select count(*) from dbo.raw);
while @cnt <= @rowcount
begin
    select top (@cnt) @start = startSerialNumber from dbo.raw;
    select top (@cnt) @end = endSerialNumber from dbo.raw;
    while @start <= @end
    begin
        insert #temp_output
        select max(SerialNumber) + 1,
               startSerialNumber,
               endSerialNumber
        from #temp_output
        group by startSerialNumber, endSerialNumber
        having max(SerialNumber) < endSerialNumber;
        set @start = @start + 1;
    end
    set @cnt = @cnt + 1;
end
select SerialNumber, startSerialNumber, endSerialNumber from #temp_output order by SerialNumber;
However, this takes ages (6+ hours, at which point I cancelled the query) as the raw table has 50 million rows.
I need a better way to do this.
Updated information 31-10-2019
Distribution for the source table is hash; 500 DWU.
60 million rows in the source table.
Average difference between Start and End is 3000.
Start can be 2 million as well.
No index on the main table.
Column count is 15.
Clustered columnstore index on the raw table.
Your sample is incomplete, but you don't need a loop. You can join it to a tally table using BETWEEN.
If you have a tally table (which is a table that simply has numbers from 1 to... 1 million in it):
SELECT T.TallyNumber As SN, E.Action, E.Date
FROM YourTable E
INNER JOIN TallyTable As T
ON T.TallyNumber BETWEEN E.Start AND E.End
Since you are loading this into a new table, you should use CTAS:
CREATE TABLE [dbo].[NewTable]
WITH
(
DISTRIBUTION = HASH([Start])
,CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT T.TallyNumber As SN, E.Action, E.Date
FROM YourTable E
INNER JOIN TallyTable As T
ON T.TallyNumber BETWEEN E.[Start] AND E.[End];
Note there is a whole lot of design thinking around DISTRIBUTION, and you need to get it right for performance. The statement above is just an example; you should probably hash on a different column.
You need to get the distribution of the two source tables, as well as the distribution of the target table, right for good performance.
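If you don't already have a tally table, one way to build it is with CTAS and a cross join of system views; this is only a sketch (the table name, ROUND_ROBIN distribution and the 10 million row count are assumptions, so size it to your maximum End value):
-- Sketch: build a numbers/tally table without loops or recursive CTEs
CREATE TABLE dbo.TallyTable
WITH
(
    DISTRIBUTION = ROUND_ROBIN
    ,CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT TOP (10000000)
       CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS TallyNumber
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;
The cross join of sys.all_columns with itself is just a convenient way to get enough rows to number; any sufficiently large row source works.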

How to improve while loop insert performance in sql server?

Here is my SQL query. It inserts almost 6500+ rows from a temp table, but it takes 15+ minutes! How can I improve this? Thanks.
ALTER proc [dbo].[Process_bill]
    @userid varchar(10),
    @remark nvarchar(500),
    @tdate date,
    @pdate date
as
BEGIN
    IF OBJECT_ID('tempdb..#temptbl_bill', 'U') IS NOT NULL
        DROP TABLE #temptbl_bill;
    CREATE TABLE #temptbl_bill (
        RowID int IDENTITY(1, 1),
        ------------
    )
    -- insert into temp table
    DECLARE @NumberRecords int, @RowCounter int
    DECLARE @batch INT
    SET @batch = 300
    SET @NumberRecords = (SELECT COUNT(*) FROM #temptbl_bill)
    SET @RowCounter = 1
    SET NOCOUNT ON
    BEGIN TRANSACTION
    WHILE @RowCounter <= @NumberRecords
    BEGIN
        declare @clid int
        declare @hlid int
        declare @holdinNo nvarchar(150)
        declare @clientid nvarchar(100)
        declare @clientName nvarchar(50)
        declare @floor int
        declare @radius nvarchar(50)
        declare @bill money
        declare @others money
        declare @frate int
        declare @due money
        DECLARE @fine money
        DECLARE @rebate money
        IF @RowCounter > 0 AND ((@RowCounter % @batch = 0) OR (@RowCounter = @NumberRecords))
        BEGIN
            COMMIT TRANSACTION
            PRINT CONCAT('Transaction #', CEILING(@RowCounter / CAST(@batch AS FLOAT)), ' committed (', @RowCounter, ' rows)');
            BEGIN TRANSACTION
        END;
        -- multiple selects
        -- insert to destination table
        PRINT 'RowCount -' + cast(@RowCounter as varchar(20)) + ' batch -' + cast(@batch as varchar(20))
        SET @RowCounter = @RowCounter + 1;
    END
    COMMIT TRANSACTION
    PRINT CONCAT('Transaction #', CEILING(@RowCounter / CAST(@batch AS FLOAT)), ' committed (', @RowCounter, ' rows)');
    SET NOCOUNT OFF
    DROP TABLE #temptbl_bill
END
GO
As has been said in comments, the loop is completely unnecessary. The way to improve the performance of any loop is to remove it completely. Loops are a last resort in SQL.
As far as I can tell your insert can be written with a single statement:
INSERT tbl_bill(clid, hlid, holdingNo,ClientID, ClientName, billno, date_month, unit, others, fine, due, bill, rebate, remark, payment_date, inserted_by, inserted_date)
SELECT clid = c.id,
hlid = h.id,
h.holdinNo ,
c.clientdID,
clientName = CAST(c.clientName AS NVARCHAR(50)),
BillNo = CONCAT(h.holdinNo, MONTH(@tdate), YEAR(@tdate)),
date_month = @tDate,
unit = 0,
others = CASE WHEN h.hfloor = 0 THEN rs.frate * (h.hfloor - 1) ELSE 0 END,
fine = bs.FineRate * b.Due / 100,
due = b.Due,
bill = @bill, -- This is declared but never assigned
rebate = bs.rebate,
remark = @remark,
payment_date = @pdate,
inserted_by = @userid,
inserted_date = GETDATE()
FROM ( SELECT id, clientdID, ClientName
FROM tbl_client
WHERE status = 1
) AS c
INNER JOIN
( SELECT id, holdinNo, [floor], connect_radius
FROM tx_holding
WHERE status = 1
AND connect_radius <> '0'
AND type = 'Residential'
) AS h
ON c.id = h.clid
LEFT JOIN tbl_radius_setting AS rs
ON rs.radius= CONVERT(real,h.connect_radius)
AND rs.status = 1
AND rs.type = 'Non-Govt.'
LEFT JOIN tbl_bill_setting AS bs
ON bs.Status = 1
LEFT JOIN
( SELECT hlid,
SUM(netbill) AS Due
FROM tbl_bill AS b
WHERE date_month < @tdate
AND (b.ispay = 0 OR b.ispay IS NULL)
GROUP BY hlid
) AS b
ON b.hlid = h.id
WHERE NOT EXISTS
( SELECT 1
FROM tbl_bill AS tb
WHERE EOMONTH(@tdate) = EOMONTH(date_month)
AND tb.holdingNo = h.holdinNo
AND (tb.update_by IS NOT NULL OR tb.ispay=1)
);
Please take this with a pinch of salt; it was quite hard work trying to piece together the logic, so it may need some minor tweaks and corrections.
As well as adapting this to work as a single statement, I have made a number of modifications to your existing code:
Swapped NOT IN for NOT EXISTS to avoid any issues with null records. If holdingNo is not nullable they are equivalent; if holdingNo is nullable, NOT EXISTS is safer (see the short demonstration after this list) - Not Exists Vs Not IN
The join syntax you are using was replaced 27 years ago, so I switched from ANSI-89 join syntax to ANSI-92. - Bad habits to kick : using old-style JOINs
Changed the predicates YEAR(date_month) = YEAR(@tDate) AND MONTH(date_month) = MONTH(@tDate) to EOMONTH(@tdate) = EOMONTH(date_month). These are logically the same, but EOMONTH is sargable, whereas MONTH and YEAR are not.
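To see the NULL issue concretely, here is a small throwaway demonstration (the table variables are invented purely for illustration):
-- Why NOT IN misbehaves when the subquery can return NULL
DECLARE @a TABLE (x INT);
DECLARE @b TABLE (x INT);
INSERT INTO @a VALUES (1), (2);
INSERT INTO @b VALUES (2), (NULL);
-- Returns no rows at all: the NULL in @b makes every NOT IN comparison UNKNOWN
SELECT a.x FROM @a AS a WHERE a.x NOT IN (SELECT b.x FROM @b AS b);
-- Returns 1, as intended
SELECT a.x FROM @a AS a WHERE NOT EXISTS (SELECT 1 FROM @b AS b WHERE b.x = a.x);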
Then a few further links/suggestions that are directly related to changes I have made
Although I removed the while loop, don't fall into the trap of thinking this is better than a cursor. A properly declared cursor will outperform a while loop like yours (see the sketch after this list) - Bad Habits to Kick: Thinking a WHILE loop isn't a CURSOR
The general consensus is that prefixing object names is not a good idea. It should either be obvious from the context if an object is a table/view or function/procedure, or it should be irrelevant - i.e. There is no need to distinguish between a table or a view, and in fact, we may wish to change from one to the other, so having the prefix makes things worse, not better.
The average ratio of time spent reading code to time spent writing code is around 10:1 - it is therefore worth the effort to format your code as you write it so that it is easy to read. This is hugely subjective with SQL, and I would not recommend any particular conventions, but I cannot believe for a second you find your original code free-flowing and easy to read. It took me about 10 minutes just to unravel the first insert statement.
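For reference, "properly declared" means something along these lines (a throwaway skeleton unrelated to your billing tables; the LOCAL and FAST_FORWARD options are the important part):
-- Skeleton of a properly declared cursor: LOCAL scope, FAST_FORWARD (forward-only, read-only)
DECLARE @id INT;
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT object_id FROM sys.objects;
OPEN cur;
FETCH NEXT FROM cur INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- per-row work goes here
    FETCH NEXT FROM cur INTO @id;
END;
CLOSE cur;
DEALLOCATE cur;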
EDIT
The above is not correct: EOMONTH() is not sargable, so it does not perform any better than YEAR(x) = YEAR(y) AND MONTH(x) = MONTH(y), although it is still a bit simpler. If you want a truly sargable predicate you will need to create a start and end date from @tdate, so you can use:
DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000101')
to get the first day of the month for @tdate, then almost the same formula, but add the months to 1st February 1900 rather than 1st January to get the start of the next month:
DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000201')
So the following:
DECLARE @Tdate DATE = '2019-10-11';
SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000101'),
       DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000201');
Will return 1st October and 1st November respectively. Putting this back in your original query would give:
WHERE NOT EXISTS
      ( SELECT 1
        FROM tbl_bill AS tb
        WHERE date_month >= DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000101')
          AND date_month < DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000201')
          AND tb.holdingNo = h.holdinNo
          AND (tb.update_by IS NOT NULL OR tb.ispay = 1)
      );
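A sargable predicate only pays off if there is an index it can seek on. Something along these lines may help (the index name and included columns are assumptions based on the query above, so check them against the real schema and workload):
-- Assumed supporting index for the anti-join on tbl_bill; adjust to the real schema
CREATE NONCLUSTERED INDEX IX_tbl_bill_date_month_holdingNo
    ON tbl_bill (date_month, holdingNo)
    INCLUDE (update_by, ispay);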

SQL query with start and end dates - what is the best option?

I am using MS SQL Server 2005 at work to build a database. I have been told that most tables will hold 1,000,000 to 500,000,000 rows of data in the near future after it is built... I have not worked with datasets this large. Most of the time I don't even know what I should be considering when trying to work out the best way to set up the schema, the queries, and so on.
So... I need to know the start and end dates for something, and a value that is associated with an ID during that time frame. So... we can set the table up two different ways:
create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int) -- option 1: a single date per row
create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int) -- option 2: start and end dates per row
Which is better? How do I define better? I filled the first table with about 100,000 rows of data, and it takes about 10-12 seconds to produce the format of the second table, depending on the query...
select y.groupid,
y.dt as [start],
z.dt as [end],
(case when z.dt is null then 1 else 0 end) as latest,
y.i
from #x as y
outer apply (select top 1 *
from #x as x
where x.groupid = y.groupid and
x.dt > y.dt
order by x.dt asc) as z
or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx
Buuuuut... with the second table.... to insert a new row, I have to go look and see if there is a previous row and then if so update its end date. So... is it a question of performance when retrieving data vs insert/update things? It seems silly to store that end date twice but maybe...... not? What things should I be looking at?
This is what I used to generate my fake data... if you want to play with it for some reason (if you change the maximum of the random number to something higher it will generate the fake stuff a lot faster):
declare @dt datetime
declare @i int
declare @id int
set @id = 1
declare @rowcount int
set @rowcount = 0
declare @numrows int
while (@rowcount < 100000)
begin
    set @i = 1
    set @dt = getdate()
    set @numrows = Cast(((5 + 1) - 1) * Rand() + 1 As tinyint)
    while @i <= @numrows
    begin
        insert into #x values (@id, dateadd(d, @i, @dt), @i)
        set @i = @i + 1
    end
    set @rowcount = @rowcount + @numrows
    set @id = @id + 1
    print @rowcount
end
For your purposes, I think option 2 is the way to go for table design. This gives you flexibility, and will save you tons of work.
Having the effective date and end date will allow you to have a query that will only return currently effective data by having this in your where clause:
where GETDATE() between effectivedate and enddate
You can also then use it to join with other tables in a time-sensitive way.
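For example, a time-sensitive join against the second table design might look like this (OtherTable and its columns are made up purely for illustration):
-- Sketch: pick up the value of i that was effective on each event's date
-- OtherTable, EventID, EventDate and SomeGroupID are assumed names, not from the question
SELECT o.EventID,
       o.EventDate,
       x.i
FROM OtherTable AS o
INNER JOIN xxx_test2 AS x
        ON x.groupid = o.SomeGroupID
       AND o.EventDate >= x.start_dt
       AND o.EventDate <  x.end_dt; -- half-open range so a boundary date matches only one row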
Provided you set up the key properly and provide the right indexes, performance (on this table at least) should not be a problem.
For anyone who can use the LEAD analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving data from the first table (the one with only one date column) would be much, much quicker than without this feature:
select
groupid,
dt "start",
lead(dt) over (partition by groupid order by dt) "end",
case when lead(dt) over (partition by groupid order by dt) is null
then 1 else 0 end "latest",
i
from x
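An index keyed on the partition and order columns can help the window function avoid an explicit sort; a sketch (the index name is an assumption):
-- Assumed supporting index for LEAD(dt) OVER (PARTITION BY groupid ORDER BY dt)
CREATE NONCLUSTERED INDEX IX_xxx_test2_groupid_dt
    ON xxx_test2 (groupid, dt)
    INCLUDE (i);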

selecting max date in range, excluding multiple other date ranges

My first time posting.
I have a tricky task of finding the latest date within a range, but excluding multiple other date ranges. I have code that does work, but it seems awfully taxing.
I am selecting the MAX(Date) within a range. However, I have a table, bfShow, where each show has its own date-range (stored as DateStart and DateEnd). So I need the MAX(Date) within the range which does NOT have a show on that date (there may be 0 to 99 shows overlapping my date-range).
Note: I have dbo.fnSeqDates which works great (found via Google) and returns all dates within a range - makes for very fast filling in 6/1/12, 6/2/12, 6/3/12...6/30/12, etc.
What I'm doing (below) is creating a table with all the dates (within range) in it, then finding all the Shows within that range (#ShowIDs) and iterating through those shows, one at a time, deleting their dates (from #DateRange). Ultimately, #DateRange is left with only "empty" dates. Thus, the MAX(Date) remaining in #DateRange is my last date in the month without a show.
Again, my code below does work, but there's got to be a better way. Thoughts?
Thank you,
Todd
CREATE procedure spLastEmptyDate
    @DateStart date
    , @DateEnd date
as
begin
    -- VARS...
    declare @ShowID int
    declare @EmptyDate date
    -- TEMP TABLES...
    create table #DateRange(dDate date)
    create table #ShowIDs(ShowID int)
    -- LOAD ALL DATES IN RANGE (THIS MONTH-ISH)...
    insert into #DateRange(dDate)
    select SeqDate
    from dbo.fnSeqDates(@DateStart, @DateEnd)
    -- LOAD ALL SHOW IDs IN RANGE (THIS MONTH-ISH)...
    insert into #ShowIDs(ShowID)
    select s.ShowID
    from bfShow s
    where s.DateStart = @DateStart
    -- PRIME SHOW ID...
    set @ShowID = 0
    select @ShowID = min(ShowID)
    from #ShowIDs
    -- RUN THRU ALL, REMOVING DATES AS WE GO...
    while (@ShowID > 0)
    begin
        -- REMOVE FROM TEMP...
        delete DR
        from #DateRange DR
           , bfShow s
        where DR.dDate between s.DateStart and s.DateEnd
          and s.ShowID = @ShowID
        -- DROP THAT ONE FROM TEMP...
        delete from #ShowIDs
        where ShowID = @ShowID
        -- GET NEXT ID...
        set @ShowID = 0
        select @ShowID = min(ShowID)
        from #ShowIDs
    end
    -- GET LAST EMPTY SPOT...
    select @EmptyDate = max(dDate)
    from #DateRange
    -- CLEAN UP...
    drop table #DateRange
    drop table #ShowIDs
    -- RETURN DATA...
    select @EmptyDate as LastEmptyDateInRange
end
Let us know what version of SQL Server you're on, because that will help determine your options, but you should be able to use the BETWEEN operator in a JOIN between the fnSeqDates function (it's a table-valued function, so you can join to it directly rather than inserting the dates into a temp table) and the bfShow table:
SELECT TOP 1 tDate.SeqDate
FROM dbo.fnSeqDates('6/1/2012', '6/30/2012') tDate
LEFT JOIN bfShow tShow
ON tDate.SeqDate BETWEEN tShow.DateStart AND tShow.DateEnd
WHERE tShow.ShowID IS NULL -- no matches found
ORDER BY tDate.SeqDate DESC -- to pull the most recent date
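The same anti-join can also be phrased with NOT EXISTS, which some find reads more directly as "a date with no show" (same assumed function and table as above):
-- Equivalent anti-join written with NOT EXISTS
SELECT TOP 1 tDate.SeqDate
FROM dbo.fnSeqDates('6/1/2012', '6/30/2012') tDate
WHERE NOT EXISTS (SELECT 1
                  FROM bfShow tShow
                  WHERE tDate.SeqDate BETWEEN tShow.DateStart AND tShow.DateEnd)
ORDER BY tDate.SeqDate DESC -- most recent empty date first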
Okay, I thought I'd re-phrase the question, and try to expose some edge cases. I'm not using your function at all. If this isn't right, can you give an example where it fails?
create table bfShow (
DateStart date,
DateEnd date
)
go
CREATE procedure spLastEmptyDate
@DateStart date
, @DateEnd date
as
--Return @DateEnd, or, if that is within a show, find the contiguous
--region of shows covering it, and select the day before that
;with ShowsCovering as (
select DateStart,DateEnd from bfShow where DateStart <= @DateEnd and DateEnd >= @DateEnd
union all
select s1.DateStart,s2.DateEnd
from
bfShow s1
inner join
ShowsCovering s2
on
s1.DateStart < s2.DateStart and
(
--This join would be helped by an indexed computed column on bfShow, either Start-1 or End+1
s1.DateEnd >= s2.DateStart or
s1.DateEnd = DATEADD(day,-1,s2.DateStart)
)
where
s2.DateStart > @DateStart
), Earliest as (
select MIN(DateStart) as MinDate from ShowsCovering
)
--1) If there are no rows, the answer is @DateEnd
--2) If there are rows, and the MIN(DateStart) = @DateStart, then no day exists
--3) Otherwise, the answer is MIN(DateStart)-1
, Answer as (
select @DateEnd as Result where exists(select * from Earliest where MinDate is null)
union all
select DATEADD(day,-1,MinDate) from Earliest where MinDate > @DateStart
)
select Result from Answer
go
insert into bfShow(DateStart,DateEnd)
values ('20120601','20120612'),
('20120619','20120630')
go
exec spLastEmptyDate '20120601','20120625'
--Result = 2012-06-18
go
exec spLastEmptyDate '20120525','20120625'
--Result = 2012-06-18
go
exec spLastEmptyDate '20120601','20120705'
--Result = 2012-07-05
go
insert into bfShow(DateStart,DateEnd)
values ('20120613','20120618')
go
exec spLastEmptyDate '20120601','20120625'
--Result - no rows
By the way, in your current solution, these lines:
drop table #DateRange
drop table #ShowIDs
Are unnecessary. Temp tables created within a stored procedure are automatically dropped when the stored procedure exits. So you can avoid the little dance at the end and make the last line just select max(dDate) as LastEmptyDateInRange from #DateRange, if you want to continue using your solution.

delete old records and keep 10 latest in sql compact

I'm using a SQL Compact database (.sdf) with MS SQL Server 2008.
In the table 'Job', each id has multiple jobs.
There is a system that regularly adds jobs to the table.
I would like to keep the 10 latest records for each id, ordered by their 'datecompleted',
and delete the rest of the records.
How can I construct my query? I failed when using a #temp table and a cursor.
Well it is fast approaching Christmas, so here is my gift to you, an example script that demonstrates what I believe it is that you are trying to achieve. No I don't have a big white fluffy beard ;-)
CREATE TABLE TestJobSetTable
(
ID INT IDENTITY(1,1) not null PRIMARY KEY,
JobID INT not null,
DateCompleted DATETIME not null
);
--Create some test data
DECLARE @iX INT;
SET @iX = 0
WHILE(@iX < 15)
BEGIN
INSERT INTO TestJobSetTable(JobID,DateCompleted) VALUES(1,getDate())
INSERT INTO TestJobSetTable(JobID,DateCompleted) VALUES(34,getDate())
SET @iX = @iX + 1;
WAITFOR DELAY '00:00:0:01'
END
--Create some more test data, for when there may be job groups with less than 10 records.
SET @iX = 0
WHILE(@iX < 6)
BEGIN
INSERT INTO TestJobSetTable(JobID,DateCompleted) VALUES(23,getDate())
SET @iX = @iX + 1;
WAITFOR DELAY '00:00:0:01'
END
--Review the data set
SELECT * FROM TestJobSetTable;
--Apply the deletion to the remainder of the data set.
WITH TenMostRecentCompletedJobs AS
(
SELECT ID, JobID, DateCompleted
FROM TestJobSetTable A
WHERE ID in
(
SELECT TOP 10 ID
FROM TestJobSetTable
WHERE JobID = A.JobID
ORDER BY DateCompleted DESC
)
)
--SELECT * FROM TenMostRecentCompletedJobs ORDER BY JobID,DateCompleted desc;
DELETE FROM TestJobSetTable
WHERE ID NOT IN(SELECT ID FROM TenMostRecentCompletedJobs)
--Now only data of interest remains
SELECT * FROM TestJobSetTable
DROP TABLE TestJobSetTable;
How about something like:
DELETE FROM Job
WHERE id NOT IN (
    SELECT TOP 10 id
    FROM Job
    ORDER BY datecompleted)
This is assuming you're using 3.5 because nested SELECT is only available in this version or higher.
I did not read the question correctly. I suspect something more along the lines of a CTE will solve the problem, using similar logic. You want to build a query that identifies the records you want to keep, as your starting point.
Using CTE on SQL Server Compact 3.5
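Along those lines, a minimal sketch of "identify the records to keep, delete the rest" for a per-id top 10 might look like the following. It assumes Job has its own primary key column alongside the grouping id (called JobPK here, which is an invented name), and it is untested on SQL Server Compact, whose subquery support is more limited than full SQL Server's:
-- Sketch only: JobPK is an assumed key column; id is the grouping column from the question
-- Keeps the 10 most recent rows per id and deletes everything else
DELETE FROM Job
WHERE JobPK NOT IN (SELECT TOP 10 j2.JobPK
                    FROM Job AS j2
                    WHERE j2.id = Job.id
                    ORDER BY j2.datecompleted DESC);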