How to get data from SQL Server with a condition?

I am working with SQL Server.
My question: I have 15,000 records in my customer table. I want to process the first 5,000 records on one day, the next 5,000 on the following day, and so on, so that each daily run handles a limited number of records. The data in the customer table changes frequently. I also need the number of pending records that have not been processed yet. Any suggestions on how to do this would be helpful. Thanks.
Further details:
The table has a datetime stamp.
Fields: [first_name], [middle_name], [last_name], [created], [created_by], [customer_number]

The simplest way is to add two columns (if they do not exist already): updated_at and processed_at. updated_at is set whenever a row is updated. processed_at is set when your daily job starts processing that row. Your query then looks something like this:
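-- SQL Server uses TOP instead of LIMIT; rows that were never processed have a NULL processed_at
select top (5000) * from your_table where processed_at is null or updated_at > processed_at order by updated_at;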
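The same idea can be sketched outside the database to show how the daily batch and the pending count relate. This is only an illustration in Python, assuming in-memory rows shaped like the table; the names `needs_processing` and `take_daily_batch` are hypothetical, and a small batch size is used for the demonstration.

```python
from datetime import datetime

BATCH_SIZE = 5000  # daily limit taken from the question


def needs_processing(row):
    """A row is pending if it was never processed or changed after its last run."""
    return row["processed_at"] is None or row["updated_at"] > row["processed_at"]


def take_daily_batch(rows, now, batch_size=BATCH_SIZE):
    """Mark up to batch_size pending rows as processed; return (batch, still_pending)."""
    pending = sorted(
        (r for r in rows if needs_processing(r)),
        key=lambda r: r["updated_at"],  # oldest changes first
    )
    batch = pending[:batch_size]
    for row in batch:
        row["processed_at"] = now
    return batch, len(pending) - len(batch)


# Tiny demonstration: 5 hypothetical customers and a batch size of 2.
customers = [
    {"customer_number": n, "updated_at": datetime(2016, 1, n), "processed_at": None}
    for n in range(1, 6)
]
batch, still_pending = take_daily_batch(customers, datetime(2016, 2, 1), batch_size=2)
print([r["customer_number"] for r in batch], still_pending)
```

Running the job again on the next day picks up the next rows, and a row whose updated_at moves past its processed_at becomes pending again.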

I'm going to assume you have some form of ID in your table...
So you set a start date in your procedure, and compare to that (I have used '2016-01-01'):
with CTE as
(
select t1.*, row_number() over(order by customer_id) as r_ord
from Mytable t1
)
select CTE.*
from CTE
where (datediff(day, '2016-01-01', getdate()) % 3 = 0 and r_ord <= 5000)
or (datediff(day, '2016-01-01', getdate()) % 3 = 1 and r_ord between 5001 and 10000)
or (datediff(day, '2016-01-01', getdate()) % 3 = 2 and r_ord > 10000)

Related

SQL Server iterating through time series data

I am using SQL Server and wondering if it is possible to iterate through time series data until specific condition is met and based on that label my data in other table?
For example, let's say I have a table like this:
Id | Date       | Some_kind_of_event
---+------------+-------------------
1  | 2018-01-01 | dsdf...
1  | 2018-01-06 | sdfs...
1  | 2018-01-29 | fsdfs...
2  | 2018-05-10 | sdfs...
2  | 2018-05-11 | fgdf...
2  | 2018-05-12 | asda...
3  | 2018-02-15 | sgsd...
3  | 2018-02-16 | rgw...
3  | 2018-02-17 | sgs...
3  | 2018-02-28 | sgs...
For each key, I want to calculate the difference between each pair of adjacent events and find out whether any such difference is greater than 10 days. If so, I want to stop iterating for that key and put the label 'inactive' in my other table; otherwise the label is 'active'. After we finish with one key, we start with the next.
So, for example, id = 1 would get the label 'inactive' because two of its adjacent dates are more than 10 days apart. The final result would look like this:
Id | Label
---+---------
1  | inactive
2  | active
3  | inactive
Any ideas how to do that? Is it possible to do it with SQL?
When working with a DBMS you need to get away from the idea of thinking iteratively. Instead you need to try and think in sets. "Instead of thinking about what you want to do to a row, think about what you want to do to a column."
If I understand correctly, is this what you're after?
CREATE TABLE SomeEvent (ID int, EventDate date, EventName varchar(10));
INSERT INTO SomeEvent
VALUES (1,'20180101','dsdf...'),
(1,'20180106','sdfs...'),
(1,'20180129','fsdfs..'),
(2,'20180510','sdfs...'),
(2,'20180511','fgdf...'),
(2,'20180512','asda...'),
(3,'20180215','sgsd...'),
(3,'20180216','rgw....'),
(3,'20180217','sgs....'),
(3,'20180228','sgs....');
GO
WITH Gaps AS(
SELECT *,
DATEDIFF(DAY,LAG(EventDate) OVER (PARTITION BY ID ORDER BY EventDate),EventDate) AS EventGap
FROM SomeEvent)
SELECT ID,
CASE WHEN MAX(EventGap) > 10 THEN 'inactive' ELSE 'active' END AS Label
FROM Gaps
GROUP BY ID
ORDER BY ID;
GO
DROP TABLE SomeEvent;
GO
This assumes you are using SQL Server 2012+, as it uses the LAG function, and SQL Server 2008 has less than 12 months of any kind of support.
Try this. Note: replace #MyTable with your actual table.
WITH Diffs AS (
SELECT
Id
-- Partition by Id so that a gap is never measured across two different keys
,DATEDIFF(DAY,[Date],LEAD([Date]) OVER (PARTITION BY [Id] ORDER BY [Date])) Diff
FROM #MyTable)
SELECT
Id
,CASE WHEN MAX(Diff) > 10 THEN 'Inactive' ELSE 'Active' END AS Label
FROM Diffs
GROUP BY Id
Just to share another approach (without a CTE).
SELECT
ID
, CASE WHEN SUM(TotalDays) = (MAX(CNT) - 1) THEN 'Active' ELSE 'Inactive' END Label
FROM (
SELECT
ID
, EventDate
, CASE WHEN DATEDIFF(DAY, EventDate, LEAD(EventDate) OVER(PARTITION BY ID ORDER BY EventDate)) <= 10 THEN 1 ELSE 0 END TotalDays
, COUNT(ID) OVER(PARTITION BY ID) CNT
FROM EventsTable
) D
GROUP BY ID
The method counts how many records each ID has, and computes TotalDays from the difference in days between the current date and the next one: 1 if the difference is at most 10 days, otherwise 0 (the last row of each ID has no next date, so it also gets 0).
It then compares the two: if the sum of TotalDays equals the number of records for the ID minus one, every gap was within the limit and the ID is labelled Active; otherwise it is Inactive.
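For readers who want to check the labelling rule independently of any SQL dialect, here is a minimal sketch in Python. It assumes the events are available as (id, date) pairs; `label_ids` and `MAX_GAP_DAYS` are hypothetical names, and the data is the sample from the question.

```python
from datetime import date
from itertools import groupby

MAX_GAP_DAYS = 10  # threshold from the question


def label_ids(events, max_gap_days=MAX_GAP_DAYS):
    """events: iterable of (id, date). Returns {id: 'active' or 'inactive'}."""
    labels = {}
    ordered = sorted(events)  # by id, then by date
    for key, group in groupby(ordered, key=lambda e: e[0]):
        dates = [d for _, d in group]
        # Gap in days between each event and the next one for the same id.
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        labels[key] = "inactive" if any(g > max_gap_days for g in gaps) else "active"
    return labels


sample = [
    (1, date(2018, 1, 1)), (1, date(2018, 1, 6)), (1, date(2018, 1, 29)),
    (2, date(2018, 5, 10)), (2, date(2018, 5, 11)), (2, date(2018, 5, 12)),
    (3, date(2018, 2, 15)), (3, date(2018, 2, 16)), (3, date(2018, 2, 17)),
    (3, date(2018, 2, 28)),
]
labels = label_ids(sample)
print(labels)
```

An id with a single event has no gaps and is therefore labelled active, which matches what the MAX-over-NULL behaviour of the LAG-based query produces.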

SQL Server 2008 query, time in each status

I'm wondering if anybody can help with a query I am working on. I'm trying to gather information for 'Time in each status' from my call activity table.
I need to set up 3 time ranges in days: <3 days, 4-5 days, 6+ days, returning the number of days each CallID is spending in each status.
The trouble I'm having is that I need to identify from the table below when there was a status change. The table records any activity on the call (e.g. changed customer details), not just status changes.
Apologies if this is unclear, let me know if you need further details.
I'm using SQL Server 2008. Here is the table I'm using and related values:
CREATE TABLE Activity ( CallID varchar(30), Call_Date datetime, [User] varchar(30), Status varchar(10) );
INSERT INTO Activity VALUES (366,'2013/09/27 12:24:33',13,9);
INSERT INTO Activity VALUES (366,'2013/09/28 17:36:14',13,9);
INSERT INTO Activity VALUES (366,'2013/09/29 07:29:18',13,10);
INSERT INTO Activity VALUES (366,'2013/09/30 06:22:12',13,-1);
INSERT INTO Activity VALUES (367,'2013/09/27 12:13:16',9,6);
INSERT INTO Activity VALUES (367,'2013/09/27 12:25:03',9,6);
INSERT INTO Activity VALUES (367,'2013/09/29 12:25:29',9,6);
INSERT INTO Activity VALUES (367,'2013/09/30 12:45:55',9,7);
INSERT INTO Activity VALUES (367,'2013/10/01 12:46:04',9,8);
INSERT INTO Activity VALUES (367,'2013/10/02 15:12:27',9,-1);
INSERT INTO Activity VALUES (368,'2013/08/01 15:09:01',5,10);
INSERT INTO Activity VALUES (368,'2013/08/02 14:11:20',5,13);
INSERT INTO Activity VALUES (368,'2013/08/04 16:41:11',5,13);
INSERT INTO Activity VALUES (368,'2013/08/05 01:12:56',5,-1);
Desired Output 1: E.g. if CallID 35931 took 2 days to change from status 1 to status 2, 2 days would be added to the count in the <3 column
Status  <3 Days  4-5 days  6+ Days
------  -------  --------  -------
1       10       3         1
2       8        1         2
3       5        3         1
I'm stuck in the first stage trying to identify the rows where there are status changes and ignoring the rest. I'm working on a subquery which selects the top date for each change of status. It's bringing back negative values. See here:
select CallID, T2.[status], Call_Date,
sum(datediff(dd, nextDate, [Call_Date]) - (datediff(wk, nextDate, [Call_Date]) * 2) -
case when datepart(wk, nextDate) = 1 then 1 else 0 end +
case when datepart(wk, [Call_Date]) = 7 then 1 else 0 end) as TotalDays
from (select *,
(select MAX( T0.[Call_Date])
from [Activity] T0
where T0.[Call_Date] > T1.[Call_Date] and
T0.CallID = T1.CallID
) as nextDate
from [Activity] T1
) T2
where T2.[status] <> '-1'
group by Call_Date, T2.[status], CallID
Thanks for your help in advance.
First of all, I think you need only the rows with the minimum date for each ID and status, as they show a status change. This can be done with a CTE and ROW_NUMBER.
Then you should join the results so that the same record holds both the old status date and the new status date. For the first status of each call, the old-status columns will be NULL.
;WITH CallsCTE AS
(
SELECT CallId,
Call_Date,
Status,
ROW_NUMBER() OVER(PARTITION BY CallId, Status ORDER BY Call_Date) AS rn
FROM Activity
),
StatusChangesCTE AS
(
SELECT CallID,
Call_Date,
Status
FROM CallsCTE
WHERE rn = 1
)
SELECT Sold.*,
Snew.*
FROM StatusChangesCTE Snew
LEFT JOIN StatusChangesCTE Sold
ON Snew.CallID = Sold.CallID
AND Sold.Call_Date = (SELECT MAX(Call_Date) FROM StatusChangesCTE WHERE CallID = Sold.CallID AND Call_Date < Snew.Call_Date)
I think that you can find your way using the above, as you could use DateDiff on Snew.Call_Date and Sold.Call_Date to find the time needed for a status change.
Let me know if you need any more assistance.
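As a rough cross-check of that approach, the sketch below does the same two steps in Python: keep the first row per call and status, then measure the days between consecutive status changes. It assumes the Activity rows are available as (CallID, Call_Date, Status) tuples; `status_durations` is a hypothetical name, and days are counted as calendar-date differences, which is how DATEDIFF(day, ...) counts them.

```python
from datetime import datetime


def status_durations(activity):
    """activity: list of (call_id, call_date, status).
    Returns (call_id, status, days) for every status that was later left."""
    # Step 1: earliest row per (call, status) marks when that status began.
    first_seen = {}
    for call_id, call_date, status in sorted(activity):
        first_seen.setdefault((call_id, status), call_date)

    # Step 2: order the status changes per call and diff neighbouring dates.
    changes = {}
    for (call_id, status), when in first_seen.items():
        changes.setdefault(call_id, []).append((when, status))

    result = []
    for call_id, items in sorted(changes.items()):
        items.sort()
        for (start, status), (end, _next_status) in zip(items, items[1:]):
            result.append((call_id, status, (end.date() - start.date()).days))
    return result


activity = [
    (366, datetime(2013, 9, 27, 12, 24, 33), 9),
    (366, datetime(2013, 9, 28, 17, 36, 14), 9),
    (366, datetime(2013, 9, 29, 7, 29, 18), 10),
    (366, datetime(2013, 9, 30, 6, 22, 12), -1),
    (367, datetime(2013, 9, 27, 12, 13, 16), 6),
    (367, datetime(2013, 9, 27, 12, 25, 3), 6),
    (367, datetime(2013, 9, 29, 12, 25, 29), 6),
    (367, datetime(2013, 9, 30, 12, 45, 55), 7),
    (367, datetime(2013, 10, 1, 12, 46, 4), 8),
    (367, datetime(2013, 10, 2, 15, 12, 27), -1),
]
durations = status_durations(activity)
print(durations)
```

Like the query above, this treats the first appearance of a status as its only start, so a call that returns to an earlier status would need extra handling.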

Why would the query show data from the wrong month?

I have a query:
;with date_cte as(
SELECT r.starburst_dept_name,r.monthly_past_date as PrevDate,x.monthly_past_date as CurrDate,r.starburst_dept_average - x.starburst_dept_average as Average
FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
) r
JOIN
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
Where month(monthly_past_date) > month(DATEADD(m,-2,monthly_past_date))
) x
ON r.starburst_dept_name = x.starburst_dept_name AND r.rowid = x.rowid+1
Where r.starburst_dept_name is NOT NULL
)
Select *
From date_cte
Order by Average DESC
While testing, I altered the data in some columns to see why the query returns certain information. I don't understand why, when I run the query, it returns a row with a date from January (row 4 of the result) that should not be there.
The database has more rows with the exact same date, '2014-01-25 00:00:00.000', so I'm not sure why it would pick up only that row and compare the average.
Before running the query I did alter that row and change its date, but I'm not sure whether that has anything to do with it.
UPDATE:
I have added the SQL Fiddle.
What I would like to get is the difference between last month's average and the average from two months ago.
It was actually working until I altered the data.
I made the changes to test a certain situation, which obviously led to learning that there are flaws in the query.
Based on your SQL Fiddle, this keeps rows from earlier than two months ago out of the join.
SELECT
thismonth.starburst_dept_name
,lastmonth.monthtly_past_date [PrevDate]
,thismonth.monthtly_past_date [CurrDate]
,thismonth.starburst_dept_average - lastmonth.starburst_dept_average as Average
FROM dbo.cse_reports thismonth
inner join dbo.cse_reports lastmonth on
thismonth.starburst_dept_name = lastmonth.starburst_dept_name
AND month(DATEADD(MONTH,-1,thismonth.monthtly_past_date))=month(lastmonth.monthtly_past_date)
WHERE MONTH(thismonth.monthtly_past_date)=month(DATEADD(MONTH,-1,GETDATE()))
Order by thismonth.starburst_dept_average - lastmonth.starburst_dept_average DESC

Calculating current consecutive days from a table

I have what seems to be a common business request, but I can't find a clear solution. I have a daily report (among many) that is generated based on failed criteria and saved to a table. Each report has a type id that identifies which report it is, and an import event id that identifies the day the imports came in (a date column is added for extra clarity). I've added a sqlfiddle showing the basic schema of the table (renamed for privacy reasons).
http://www.sqlfiddle.com/#!3/81945/8
All reports currently generated are working fine, so nothing needs to be modified on the table. However, for one report (type 11), I need not only to pull the invoices that showed up today but also to add a column with the number of consecutive days, counted back from the run date and including the current day, that each invoice has been on the report. The result should look like the following, based on the schema provided:
INVOICE  MESSAGE  EVENT_DATE     CONSECUTIVE_DAYS_ON_REPORT
12345    Yes      July, 30 2013  6
54355    Yes      July, 30 2013  2
644644   Yes      July, 30 2013  4
I only need the latest consecutive days, not any other set that may show up. I've tried to run self joins to no avail, and my last attempt is also listed as part of the sqlfiddle file, to no avail. Any suggestions or ideas? I'm quite stuck at the moment.
FYI: I am working in SQL Server 2000! I have seen a lot of neat tricks that have come out in 2005 and 2008, but I can't access them.
Your help is greatly appreciated!
Something like this? http://www.sqlfiddle.com/#!3/81945/14
SELECT
[final].*,
[last].total_rows
FROM
tblEventInfo AS [final]
INNER JOIN
(
SELECT
[first_of_last].type_id,
[first_of_last].invoice,
MAX([all_of_last].event_date) AS event_date,
COUNT(*) AS total_rows
FROM
(
SELECT
[current].type_id,
[current].invoice,
MAX([current].event_date) AS event_date
FROM
tblEventInfo AS [current]
LEFT JOIN
tblEventInfo AS [previous]
ON [previous].type_id = [current].type_id
AND [previous].invoice = [current].invoice
AND [previous].event_date = [current].event_date-1
WHERE
[current].type_id = 11
AND [previous].type_id IS NULL
GROUP BY
[current].type_id,
[current].invoice
)
AS [first_of_last]
INNER JOIN
tblEventInfo AS [all_of_last]
ON [all_of_last].type_id = [first_of_last].type_id
AND [all_of_last].invoice = [first_of_last].invoice
AND [all_of_last].event_date >= [first_of_last].event_date
GROUP BY
[first_of_last].type_id,
[first_of_last].invoice
)
AS [last]
ON [last].type_id = [final].type_id
AND [last].invoice = [final].invoice
AND [last].event_date = [final].event_date
The inner most query looks up the starting record of the last block of consecutive records.
Then that joins on to all the records in that block of consecutive records, giving the final date and the count of rows (consecutive days).
Then that joins on to the row for the last day to get the message, etc.
Make sure that in reality you have an index on (type_id, invoice, event_date).
You have multiple problems. Tackle them separately and build up.
Problems:
1) Identifying consecutive ranges: subtract the row_number from the range column and group by the result
2) No ROW_NUMBER() functions in SQL 2000: Fake it with a correlated subquery.
3) You actually want DENSE_RANK() instead of ROW_NUMBER: Make a list of unique dates first.
Solutions:
3)
SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date
2)
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.event_date > t1.event_date),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
ORDER BY invoice,grp,event_date
1)
SELECT
t3.invoice AS INVOICE,
MAX(t3.event_date) AS EVENT_DATE,
COUNT(t3.event_date) AS CONSECUTIVE_DAYS_ON_REPORT
FROM (
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.id > t1.id),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
) t3
GROUP BY t3.invoice,t3.grp
The rest of your question is a little ambiguous. If two ranges are of equal length, do you want both or just the most recent? Should the output MESSAGE be 'Yes' if any message = 'Yes' or only if the most recent message = 'Yes'?
This should give you enough of a breadcrumb though
I had a similar requirement not long ago: a "Top 5" ranking with the number of consecutive periods spent in the Top 5. The only solution I found was to do it in a cursor. The cursor starts with a date, @daybefore; inside the loop, if the row's date does not match, quit the loop, otherwise set @daybefore = dateadd(dd, -1, @daybefore).
Let me know if you want an example. There just seem to be a large number of enthusiasts, who hit downvote when they see the word "cursor" even if they don't have a better solution...
Here, try a scalar function like this:
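CREATE FUNCTION ConsequtiveDays
(
@invoice bigint, @date datetime
)
RETURNS int
AS
BEGIN
-- DECLARE and SET are kept separate so this also works on SQL Server 2000
DECLARE @ct int, @Count_Date datetime, @Last_Date datetime
SET @ct = 0
SET @Last_Date = @date
DECLARE counter CURSOR LOCAL FAST_FORWARD
FOR
SELECT DISTINCT event_date FROM tblEventInfo
WHERE invoice = @invoice AND event_date <= @date
ORDER BY event_date DESC
OPEN counter
FETCH NEXT FROM counter
INTO @Count_Date
-- Walk backwards while each date is at most one day before the previous one
WHILE @@FETCH_STATUS = 0 AND DATEDIFF(dd,@Count_Date,@Last_Date) < 2
BEGIN
SET @ct = @ct + 1
SET @Last_Date = @Count_Date
FETCH NEXT FROM counter
INTO @Count_Date
END
CLOSE counter
DEALLOCATE counter
RETURN @ct
END
GO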
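The counting rule itself is small enough to sketch outside the database. The Python below is only an illustration, assuming the report dates for one invoice are already loaded; `latest_streak` is a hypothetical name and the dates are made-up sample values, not the contents of the sqlfiddle.

```python
from datetime import date, timedelta


def latest_streak(dates, run_date):
    """Count the consecutive days, ending at run_date, that appear in dates."""
    seen = set(dates)  # duplicates on the same day count once
    streak = 0
    day = run_date
    while day in seen:
        streak += 1
        day -= timedelta(days=1)
    return streak


# Hypothetical invoice: on the report July 20-22, absent July 23-24,
# then on it again every day from July 25 to July 30.
invoice_dates = [date(2013, 7, d) for d in (20, 21, 22, 25, 26, 27, 28, 29, 30)]
print(latest_streak(invoice_dates, date(2013, 7, 30)))
```

Only the most recent block is counted, because the walk stops at the first missing day; an invoice that is not on the report on the run date gets 0.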

SQL query group by nearby timestamp

I have a table with a timestamp column. I would like to be able to group by an identifier column (e.g. cusip), sum over another column (e.g. quantity), but only for rows that are within 30 seconds of each other, i.e. not in fixed 30 second bucket intervals. Given the data:
cusip| quantity| timestamp
============|=========|=============
BE0000310194| 100| 16:20:49.000
BE0000314238| 50| 16:38:38.110
BE0000314238| 50| 16:46:21.323
BE0000314238| 50| 16:46:35.323
I would like to write a query that returns:
cusip| quantity
============|=========
BE0000310194| 100
BE0000314238| 50
BE0000314238| 100
Edit:
In addition, it would greatly simplify things if I could also get the MIN(timestamp) out of the query.
Starting from Sean G's solution, I removed the GROUP BY over the whole table and adjusted a few parts for Oracle SQL.
First, after finding the previous timestamp, each row that starts a new group is assigned its own id as self_parent_id. Rows that fall within 30 seconds of the previous row get NULL instead.
Then each row takes the nearest preceding non-null self_parent_id, so that all rows for a cusip that are chained within 30 seconds fall under one group.
Since there is a CUSIP column, I assumed the dataset would be large market transaction data. Instead of using GROUP BY on the whole table, use PARTITION BY cusip and the final parent_id for better performance.
SELECT
id,
sub.parent_id,
sub.cusip,
timestamp,
quantity,
sum(sub.quantity) OVER(
PARTITION BY cusip, parent_id
) sum_quantity,
MIN(sub.timestamp) OVER(
PARTITION BY cusip, parent_id
) min_timestamp
FROM
(
SELECT
base_sub.*,
CASE
WHEN base_sub.self_parent_id IS NOT NULL THEN
base_sub.self_parent_id
ELSE
LAG(base_sub.self_parent_id) IGNORE NULLS OVER(
PARTITION BY cusip
ORDER BY
timestamp, id
)
END parent_id
FROM
(
SELECT
c.*,
CASE
-- Assumes timestamp is an Oracle TIMESTAMP column; EXTRACT(SECOND ...) would return only the seconds field of the interval
WHEN previous_timestamp IS NULL OR timestamp - previous_timestamp > INTERVAL '30' SECOND THEN
id
ELSE
NULL
END self_parent_id
FROM
(
SELECT
my_table.id,
my_table.cusip,
my_table.timestamp,
my_table.quantity,
LAG(my_table.timestamp) OVER(
PARTITION BY my_table.cusip
ORDER BY
my_table.timestamp, my_table.id
) previous_timestamp
FROM
my_table
) c
) base_sub
) sub
The following may help.
It groups rows into fixed 30-second periods starting from a given time, here '2012-01-01 00:00:00'. DATEDIFF counts the number of seconds between the starting time and the timestamp value, and dividing that by 30 gives the grouping column.
SELECT MIN(TimeColumn) AS TimeGroup, SUM(Quantity) AS TotalQuantity FROM YourTable
GROUP BY (DATEDIFF(ss, '2012-01-01', TimeColumn) / 30)
The minimum timestamp of each group is output as TimeGroup. You could use the maximum instead, or convert the grouping column value back to a time for display.
Looking at the above comments, I'm assuming Chris's first scenario is the one you want (all 3 get grouped even though values 1 and 3 are not within 30 seconds of each other, because each is within 30 seconds of value 2). I'm also going to assume that each row in your table has some unique ID called 'id'. You can do the following:
1) Create a new grouping by determining whether the preceding row in your partition is more than 30 seconds behind the current row (that is, whether you need to start a new 30-second grouping or continue the previous one). We'll call that parent_id.
2) Sum quantity over parent_id (plus any other aggregations).
The code could look like this:
select
sub.parent_id,
sub.cusip,
min(sub.timestamp) min_timestamp,
sum(sub.quantity) quantity
from
(
select
base_sub.*,
case
when base_sub.self_parent_id is not null
then base_sub.self_parent_id
else lag(base_sub.self_parent_id) ignore nulls over (
partition by
base_sub.cusip
order by
base_sub.timestamp,
base_sub.id
)
end parent_id
from
(
select
my_table.id,
my_table.cusip,
my_table.timestamp,
my_table.quantity,
lag(my_table.timestamp) over (
partition by
my_table.cusip
order by
my_table.timestamp,
my_table.id
) previous_timestamp,
case
when datediff(
second,
nvl(previous_timestamp, to_date('1900/01/01', 'yyyy/mm/dd')),
my_table.timestamp) > 30
then my_table.id
else null
end self_parent_id
from
my_table
) base_sub
) sub
group by
sub.parent_id,
sub.cusip
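To make the chained grouping concrete, here is a minimal sketch of the same rule in Python: a row joins the current group for its cusip when it is at most 30 seconds after the previous row, otherwise it starts a new group. `group_nearby` is a hypothetical name, and the date 2012-01-01 is an assumption added only because the sample data shows times without dates.

```python
from datetime import datetime, timedelta


def group_nearby(rows, window=timedelta(seconds=30)):
    """rows: (cusip, quantity, timestamp). Returns [(cusip, min_timestamp, total_quantity)]."""
    groups = []
    last = {}  # cusip -> (index of its current group, timestamp of its previous row)
    for cusip, qty, ts in sorted(rows, key=lambda r: (r[0], r[2])):
        if cusip in last and ts - last[cusip][1] <= window:
            # Within the window of the previous row: extend the current group.
            idx = last[cusip][0]
            c, min_ts, total = groups[idx]
            groups[idx] = (c, min_ts, total + qty)
        else:
            # First row for this cusip, or the gap is too large: start a new group.
            idx = len(groups)
            groups.append((cusip, ts, qty))
        last[cusip] = (idx, ts)
    return groups


def at(h, m, s, ms=0):
    """Sample time on an assumed date."""
    return datetime(2012, 1, 1, h, m, s, ms * 1000)


rows = [
    ("BE0000310194", 100, at(16, 20, 49)),
    ("BE0000314238", 50, at(16, 38, 38, 110)),
    ("BE0000314238", 50, at(16, 46, 21, 323)),
    ("BE0000314238", 50, at(16, 46, 35, 323)),
]
result = group_nearby(rows)
for cusip, min_ts, total in result:
    print(cusip, min_ts.time(), total)
```

Because each row is compared with the row just before it rather than with the start of the group, a long run of closely spaced rows ends up in one group, which is the first scenario described above.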