How to write a SQL query for the following case? - sql

I have a Change Report table with two columns, ChangedTime and FileName.
Assume the table has over 1000 records.
I need to query all the changes based on the following factors:
i) Interval (e.g. 1 minute)
ii) Number of files
For example, suppose the interval is 1 minute and the number of files is 10.
If more than 10 files changed within any 1-minute interval, I need to get all the changed files that fall in that interval.
Example:
i) Say we have 15 changes in the interval 11:52 to 11:53
ii) And 20 changes in the interval 12:58 to 12:59
Then my expected result would be 35 records.
Thanks in advance.

You need to aggregate by the interval and then do the count. Assuming that an interval starting at time 0 is ok, the following should work:
declare @interval int = 1;
declare @limit int = 10;
select sum(cnt)
from (select count(*) as cnt
from t
group by DATEDIFF(minute, 0, ChangedTime)/@interval
) t
where cnt >= @limit; -- use > instead if "more than 10" is meant strictly
If you have another time in mind for when intervals should start, then substitute that for 0.
EDIT:
For your particular query:
select sum(ChangedTime)
from (select count(*) as ChangedTime
from [MyDB].[dbo].[Log_Table.in_PC]
group by DATEDIFF(minute, 0, ChangedTime)/@interval
) t
where ChangedTime >= @limit;
You can't have a three part alias name on a subquery. t will do.
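The bucketing idea behind the query above can be sketched outside the database. This is a minimal Python illustration, not the answer's own code: the function name `changes_in_busy_intervals` and the sample file names are hypothetical, the epoch 1900-01-01 stands in for date 0 in `DATEDIFF(minute, 0, ...)`, and the comparison is strict (`>`) to follow the question's "more than 10". Unlike the `sum(cnt)` query, it returns the matching rows themselves rather than only their count.

```python
from collections import defaultdict
from datetime import datetime


def changes_in_busy_intervals(rows, interval_minutes=1, limit=10):
    """Return the rows that fall in any interval holding more than `limit` changes."""
    epoch = datetime(1900, 1, 1)  # assumed stand-in for date 0 in DATEDIFF(minute, 0, ...)
    buckets = defaultdict(list)
    for changed_time, file_name in rows:
        # Whole minutes since the epoch, then integer-divide into interval buckets.
        minutes = int((changed_time - epoch).total_seconds() // 60)
        buckets[minutes // interval_minutes].append((changed_time, file_name))
    busy = []
    for key in sorted(buckets):
        if len(buckets[key]) > limit:  # strict: "more than 10"
            busy.extend(buckets[key])
    return busy


# Hypothetical sample data: 15 changes at 11:52, 20 at 12:58, 3 at 13:05.
rows = (
    [(datetime(2014, 1, 1, 11, 52, s), f"a{s}.txt") for s in range(15)]
    + [(datetime(2014, 1, 1, 12, 58, s), f"b{s}.txt") for s in range(20)]
    + [(datetime(2014, 1, 1, 13, 5, s), f"c{s}.txt") for s in range(3)]
)
busy = changes_in_busy_intervals(rows, interval_minutes=1, limit=10)
print(len(busy))  # 35
```

With the question's example (15 changes in one minute and 20 in another), the three changes at 13:05 are dropped and 35 rows remain.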

Something like this should work:
You count the number of records using the COUNT() function.
Then you limit the selection with the WHERE clause:
SELECT COUNT(FileName)
FROM "YourTable"
WHERE ChangedTime >= "StartInterval"
AND ChangedTime <= "EndInterval";
Another construct that is useful in a WHERE clause is BETWEEN: http://msdn.microsoft.com/en-us/library/ms187922.aspx.
You didn't state which SQL database you are using, so I assume it's MSSQL.

-- count, per file, the pairs of its changes that lie one minute apart
select count(*) from (select a.FileName,
b.ChangedTime startTime,
a.ChangedTime endTime,
DATEDIFF ( minute , b.ChangedTime , a.ChangedTime ) timeInterval
from yourtable a, yourtable b
where a.FileName = b.FileName
and a.ChangedTime > b.ChangedTime
-- DATEDIFF(minute, start, end): the earlier time goes first so the result is positive
and DATEDIFF ( minute , b.ChangedTime , a.ChangedTime ) = 1) temp
group by temp.FileName

Related

Retrieve date from subtracting variable number of days from date on calendar table

In our system we have created a table that lists every day, extending out to 20230, with a special field that flags holidays and weekends.
SAMPLE BELOW:
DATE_FIELD HOLIDAY_FIELD
20200430 N
20200501 N
20200502 Y
20200503 Y
20200504 N
20200505 N
20200506 N
20200507 N
..............
My goal is to provide a date variable and subtract x number of days from the provided date.
The number of days is not constant; it can vary, so a hard-coded FETCH or LIMIT won't work.
I've already tried the code below, and it works just as I want if I always subtract 5 days from the given date:
select date_field
from table.calendar
where date_field <= '20200507' and holiday_field = 'N'
order by date_field desc
LIMIT 5,1
This gives me the result I want, '20200430', because it skips the weekends.
However I want to be able to do something like below:
select date_field
from table.calendar
where date_field <= (variable date) and holiday_field = 'N'
order by date_field desc
LIMIT (variable n),1
But from what I've read, you cannot specify a variable for FETCH or LIMIT.
Also, this select statement will be used in a subselect,
so it will most likely be used as shown below:
SELECT table1.*,
( select date_field
from table.calendar
where date_field <= (table1.date) and holiday_field = 'N'
order by date_field desc
LIMIT (table1.days n),1 ) AS DATE
from table1
order by table1.date
I've tried using row_number() but have no clue how to pass the date and days variables.
This starts from the absolute top of the list and goes down; I need it to start from a specific date.
with CALENDAR AS(
SELECT t.* FROM (
select date_field,
row_number() over() as rownum
from table.calendar X
where holiday_field = 'N'
order by date_field
) AS t
)
select table1.*, A.date_field
from table1
left join CALENDAR A on A.date_field <= table1.date and A.rownum = 5
I also understand I could easily do this in a user-defined function, but my ultimate goal is to produce SQL views to export to third-party software. There is a severe performance slowdown when using user functions in SQL views.
Any suggestions?
A solution with global variables:
CREATE OR REPLACE VARIABLE GV_DATE_FIELD VARCHAR(8) DEFAULT (TO_CHAR(CURRENT DATE, 'YYYYMMDD'));
CREATE OR REPLACE VARIABLE GV_DAYS INT DEFAULT 5;
CREATE OR REPLACE VIEW CALENDAR_V AS
select date_field
from
(
select date_field, rownumber() over (order by date_field desc) rn
from calendar
where date_field <= GV_DATE_FIELD and holiday_field = 'N'
)
where rn = GV_DAYS;
-- GV_DATE_FIELD == TO_CHAR(CURRENT DATE, 'YYYYMMDD')
-- GV_DAYS == 5
select * from CALENDAR_V;
SET GV_DAYS = 4;
-- GV_DATE_FIELD == TO_CHAR(CURRENT DATE, 'YYYYMMDD')
-- GV_DAYS == 4
select * from CALENDAR_V;
These two global variables are set to their default values for every session and work as parameters.
You may set their values explicitly (with the SET statement, as shown) before running a statement that uses them (the CALENDAR_V view in this case) to get the corresponding result.
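For the per-row case from the question, where every row of table1 carries its own date and day count, one set-based option is to number the working days per row and keep the one at position n + 1. The sketch below is only an illustration under stated assumptions: it runs against SQLite through Python's sqlite3 module rather than the poster's database, it needs SQLite 3.25 or newer for window functions, and the table1 columns `id`, `dt` and `days` are hypothetical names. The filter `rn = days + 1` mirrors the offset semantics of `LIMIT n,1` in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (date_field TEXT, holiday_field TEXT);
    INSERT INTO calendar VALUES
        ('20200430','N'),('20200501','N'),('20200502','Y'),('20200503','Y'),
        ('20200504','N'),('20200505','N'),('20200506','N'),('20200507','N');
    -- table1 is hypothetical: each row has its own start date and day count
    CREATE TABLE table1 (id INTEGER, dt TEXT, days INTEGER);
    INSERT INTO table1 VALUES (1,'20200507',5),(2,'20200507',2),(3,'20200503',1);
""")

query = """
    SELECT id, dt, days, date_field
    FROM (
        SELECT t.id, t.dt, t.days, c.date_field,
               -- number the working days on or before each row's date, newest first
               ROW_NUMBER() OVER (PARTITION BY t.id
                                  ORDER BY c.date_field DESC) AS rn
        FROM table1 t
        JOIN calendar c
          ON c.date_field <= t.dt AND c.holiday_field = 'N'
    ) AS numbered
    WHERE rn = days + 1   -- skip `days` working days, like LIMIT days,1
    ORDER BY id
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

For the first row (date 20200507, 5 days) this yields 20200430, the same value the fixed `LIMIT 5,1` query returns. Because the join pairs every table1 row with all earlier working days, a real calendar table would likely need a lower bound on `date_field` to keep the intermediate result small.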

Irregular grouping of timestamp variable

I have a table organized as follows:
id lateAt
1231235 2019/09/14
1242123 2019/09/13
3465345 NULL
5676548 2019/09/28
8986475 2019/09/23
Here lateAt is a timestamp of when a certain loan's payment became late. I need to look at these numbers daily, so for the current date there is a certain number of entries that have been late for 0-15, 15-30, 30-45, 45-60, 60-90 and 90+ days.
This is my desired output:
lateGroup Count
0-15 20
15-30 22
30-45 25
45-60 32
60-90 47
90+ 57
This is something I can easily calculate in R, but to get the results back to my BI dashboard I'd have to create a new table in my database, which I don't think is a good practice. What is the SQL-native approach to this problem?
I would define the "late groups" using ranges, then join against the number of days late:
with groups (grp) as (
values
(int4range(0,15, '[)')),
(int4range(15,30, '[)')),
(int4range(30,45, '[)')),
(int4range(45,60, '[)')),
(int4range(60,90, '[)')),
(int4range(90,null, '[)'))
)
select grp, count(t.user_id)
from groups g
left join the_table t on g.grp @> current_date - t.late_at
group by grp
order by grp;
int4range(0,15, '[)') creates a range from 0 (inclusive) to 15 (exclusive), so adjacent groups never overlap.
Online example: https://rextester.com/QJSN89445
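The half-open ranges above can be illustrated with a small Python sketch. It is not part of the original answer: the names `EDGES`, `LABELS`, `late_group` and `count_late_groups` are made up for the example, and the reference date is an assumption chosen so the sample rows from the question land in two groups. Each lower edge is inclusive and each upper edge exclusive, so a loan that is exactly 15 days late falls in 15-30 only.

```python
from bisect import bisect_right
from datetime import date

EDGES = [0, 15, 30, 45, 60, 90]  # inclusive lower bounds of each group
LABELS = ["0-15", "15-30", "30-45", "45-60", "60-90", "90+"]


def late_group(days_late):
    """Map a day count to its half-open group [lower, upper); None if not late yet."""
    if days_late < 0:
        return None
    return LABELS[bisect_right(EDGES, days_late) - 1]


def count_late_groups(late_dates, today):
    """Count entries per group; rows with no lateAt value (None) are skipped."""
    counts = {label: 0 for label in LABELS}
    for late_at in late_dates:
        if late_at is None:
            continue
        group = late_group((today - late_at).days)
        if group is not None:
            counts[group] += 1
    return counts


# Sample rows from the question; the reference date is assumed for illustration.
late_dates = [date(2019, 9, 14), date(2019, 9, 13), None,
              date(2019, 9, 28), date(2019, 9, 23)]
counts = count_late_groups(late_dates, today=date(2019, 10, 15))
print(counts)
```

Starting every group at zero keeps empty groups in the result, which is what the left join from the groups table achieves in the SQL version.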
The quick and dirty way to do this in SQL is:
SELECT '0-15' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 0
AND (CURRENT_DATE - t.lateAt) < 15
UNION
SELECT '15-30' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 15
AND (CURRENT_DATE - t.lateAt) < 30
UNION
SELECT '30-45' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 30
AND (CURRENT_DATE - t.lateAt) < 45
-- Etc...
For production code, you would want to do something more like Ross' answer.
You didn't mention which DBMS you're using, but nearly all of them will have a construct known as a "value constructor" like this:
select bins.lateGroup, bins.minVal, bins.maxVal FROM
(VALUES
('0-15',0,15),
('15-30',15.0001,30), -- increase by a small fraction so bins don't overlap
('30-45',30.0001,45),
('45-60',45.0001,60),
('60-90',60.0001,90),
('90-99999',90.0001,99999)
) AS bins(lateGroup,minVal,maxVal)
If your DBMS doesn't have it, then you can probably use UNION ALL:
SELECT '0-15' as lateGroup, 0 as minVal, 15 as maxVal
union all SELECT '15-30',15,30
union all SELECT '30-45',30,45
Then your complete query, with the sample data you provided, would look like this:
--- example from SQL Server 2012 SP1
--- first let's set up some sample data
create table #temp (id int, lateAt datetime);
INSERT #temp (id, lateAt) values
(1231235,'2019-09-14'),
(1242123,'2019-09-13'),
(3465345,NULL),
(5676548,'2019-09-28'),
(8986475,'2019-09-23');
--- here's the actual query
select lateGroup, count(*) as Count
from #temp as T,
(VALUES
('0-15',0,15),
('15-30',15.0001,30), -- increase by a small fraction so bins don't overlap
('30-45',30.0001,45),
('45-60',45.0001,60),
('60-90',60.0001,90),
('90-99999',90.0001,99999)
) AS bins(lateGroup,minVal,maxVal)
where datediff(day,lateAt,getdate()) between minVal and maxVal
group by lateGroup
order by lateGroup
--- remove our sample data
drop table #temp;
Here's the output:
lateGroup Count
15-30 2
30-45 2
Note: rows with null lateAt are not counted.
I think you can do it all in one clear query :
with cte_lategroup as
(
select *
from (values(0,15,'0-15'),(15,30,'15-30'),(30,45,'30-45')) as t (mini, maxi, designation)
)
select
t2.designation
, count(*)
from test t
left outer join cte_lategroup t2
on current_date - t.lateat >= t2.mini
and current_date - lateat < t2.maxi
group by t2.designation;
With a preset like yours :
create table test
(
id int
, lateAt date
);
insert into test
values (1231235, to_date('2019/09/14', 'yyyy/mm/dd'))
,(1242123, to_date('2019/09/13', 'yyyy/mm/dd'))
,(3465345, null)
,(5676548, to_date('2019/09/28', 'yyyy/mm/dd'))
,(8986475, to_date('2019/09/23', 'yyyy/mm/dd'));

How to query database for rows from next 5 days

How can I write a query in SQL Server that returns all rows for the next 5 days?
The problem is that it has to be days with records, so the next 5 days, might become something like, Today, Tomorrow, some day in next month, etc...
Basically I want to query the database for the records for the next non empty X days.
The table has a column called Date, which is what I want to filter.
Why not split the search into two queries? The first one finds the dates; the second uses that result to fetch the records whose dates are IN the set returned by the first query.
@Anagha is close; with a small modification it works:
SELECT *
FROM TABLE
WHERE DATE IN (
SELECT DISTINCT TOP 5 DATE
FROM TABLE
WHERE DATE >= referenceDate
ORDER BY DATE
)
You can use the following SQL query, which first fetches 5 distinct dates and then returns all rows for those dates:
declare @n int = 5;
select *
from myData
where
cast(datecol as date) in ( -- compare on the date part so rows with a time component still match
SELECT distinct top (@n) cast(datecol as date) as datecol
FROM myData
WHERE datecol >= '20180101'
ORDER BY datecol
)
Try this:
select date from table where date in (select distinct top 5 date
from table where date >= getdate() order by date)
If your values are dates, you can use dense_rank():
select t.*
from (select t.*, dense_rank() over (order by datecol) as seqnum
from t
where datecol >= cast(getdate() as date)
) t
where seqnum <= 5;
If the column has a time component and you still want to define days by midnight-to-midnight (as suggested by the question), just convert to date:
select t.*
from (select t.*,
dense_rank() over (order by cast(datetimecol as date)) as seqnum
from t
where datetimecol >= cast(getdate() as date)
) t
where seqnum <= 5;
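The dense_rank() approach can be tried on a small data set. The sketch below is an illustration under assumptions rather than the answer's exact code: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), `date(datetimecol)` plays the role of `cast(datetimecol as date)`, and the `val` column, the sample rows and the reference date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (datetimecol TEXT, val TEXT);
    INSERT INTO t VALUES
        ('2017-12-31 23:00:00', 'old'),
        ('2018-01-01 09:00:00', 'a'),
        ('2018-01-01 17:30:00', 'b'),
        ('2018-01-02 08:00:00', 'c'),
        ('2018-02-10 12:00:00', 'd'),
        ('2018-03-05 10:00:00', 'e');
""")

query = """
    SELECT datetimecol, val
    FROM (
        SELECT t.*,
               -- rows on the same calendar day share one rank; days without rows leave no gap
               DENSE_RANK() OVER (ORDER BY date(datetimecol)) AS seqnum
        FROM t
        WHERE datetimecol >= ?
    ) AS ranked
    WHERE seqnum <= ?
    ORDER BY datetimecol
"""
# Next 3 non-empty days on or after 2018-01-01 (both values are assumptions).
next_rows = conn.execute(query, ("2018-01-01", 3)).fetchall()
vals = [val for _, val in next_rows]
print(vals)  # ['a', 'b', 'c', 'd']
```

Both rows from 2018-01-01 share rank 1, so three non-empty days produce four rows, and the long gap before 2018-02-10 does not use up a rank.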

SQL Count of date values that dont match hour and day in select statement

I have a problem unique to a business process. My user needs to know how many dates, counted, are before a specific end time that do not match on the hour or the day.
Here is an example.
AAA, 2016-03-15 16:00:28.967, 2016-03-15 16:02:58.487, 2016-03-17 14:01:24.243
In the example above id AAA has 3 entries. I need to count only the ones that don't have a matching hour and day. So the actual count should come out to be 2.
I have to do this all in SQL and can't use a CTE. It needs to be either a sub select or some type of join.
Something like this.
SELECT id, date, (
SELECT COUNT(*)
FROM x
WHERE day!=day
AND hour!=hour AND date < z
) AS DateCount
Results would be AAA, 2
I am thinking some type of recursive comparison but I am not sure how to accomplish this without a CTE.
In SQL Server you can try something like this:
SELECT id, CONVERT(VARCHAR(13), [date], 120) AS [Date], COUNT(*) AS DateCount
FROM YourTable
WHERE [date] < @ENDDATE
GROUP BY id, CONVERT(VARCHAR(13), [date], 120)
SELECT a AS current_a, COUNT(*) AS b,day AS day, hour as hour,
(SELECT COUNT(*)
FROM t
WHERE day != day
AND hour != hour
AND date < z ) as datecount
FROM t GROUP BY a ORDER by b DESC
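To make the expected result from the question concrete, the counting rule can be sketched in Python: truncate each timestamp to its hour, then count the distinct truncated values per id among rows before the end time. The function name `count_distinct_hours` and the end time are assumptions made for this illustration only.

```python
from datetime import datetime


def count_distinct_hours(rows, end_time):
    """Per id, count the distinct (day, hour) buckets among rows earlier than end_time."""
    hours_by_id = {}
    for row_id, ts in rows:
        if ts < end_time:
            # Truncating to the hour makes entries in the same day and hour compare equal.
            bucket = ts.replace(minute=0, second=0, microsecond=0)
            hours_by_id.setdefault(row_id, set()).add(bucket)
    return {row_id: len(buckets) for row_id, buckets in hours_by_id.items()}


# The three entries for id AAA from the question; the end time is assumed.
rows = [
    ("AAA", datetime(2016, 3, 15, 16, 0, 28, 967000)),
    ("AAA", datetime(2016, 3, 15, 16, 2, 58, 487000)),
    ("AAA", datetime(2016, 3, 17, 14, 1, 24, 243000)),
]
date_counts = count_distinct_hours(rows, end_time=datetime(2016, 3, 18))
print(date_counts)  # {'AAA': 2}
```

The two entries in the 16:00 hour of 2016-03-15 collapse into one bucket, which gives the count of 2 that the question expects.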

SQL Server - Select all top of the hour records

I have a large table with records created every second and want to select only those records that were created at the top of each hour for the last 2 months. So we would get 24 selected records for every day over the last 60 days
The table structure is Dateandtime, Value1, Value2, etc
Many Thanks
You could group by the date part (cast(col1 as date)) and the hour part (datepart(hh, col1)). Then pick the minimum date for each hour, and filter on that:
select *
from YourTable yt
join (
select min(dateandtime) as dt
from YourTable
where datediff(day, dateandtime, getdate()) <= 60
group by
cast(dateandtime as date)
, datepart(hh, dateandtime)
) filter
on filter.dt = yt.dateandtime
Alternatively, you can group on a date format that only includes the date and the hour. For example, convert(varchar(13), getdate(), 120) returns 2013-05-11 18.
...
group by
convert(varchar(13), dateandtime, 120)
) filter
...
For clarity's sake, I would probably use a two-step, CTE-based approach (this works in SQL Server 2005 and newer - you didn't clearly specify which version of SQL Server you're using, so I'm just hoping you're not on an ancient version like 2000 anymore):
-- define a "base" CTE that truncates your "DateAndTime" column to the start
-- of its hour and makes that value accessible under its own name
;WITH BaseCTE AS
(
SELECT
ID, DateAndTime,
Value1, Value2,
-- truncating (rather than DATEPART(HOUR, ...)) keeps the same hour on different days apart
HourPart = DATEADD(HOUR, DATEDIFF(HOUR, 0, DateAndTime), 0)
FROM dbo.YourTable
WHERE DateAndTime >= @SomeThresholdDateHere
),
-- define a second CTE which "partitions" the data by this "HourPart",
-- and number all rows for each partition starting at 1. So each "last"
-- event for each hour is the one with the RN = 1 value
HourlyCTE AS
(
SELECT ID, DateAndTime, Value1, Value2,
RN = ROW_NUMBER() OVER(PARTITION BY HourPart ORDER BY DateAndTime DESC)
FROM BaseCTE
)
SELECT *
FROM HourlyCTE
WHERE RN=1
Also: I wasn't sure what exactly you mean by "top of the hour" - the row that's been created right at the beginning of each hour (e.g. at 04:00:00) - or rather the last row created in that hour's time span? If you mean the first one for each hour - then you'd need to change the ORDER BY DateAndTime DESC to ORDER BY DateAndTime ASC
You can use the EXISTS operator:
SELECT *
FROM dbo.tableName t
WHERE t.DateAndTime >= @YourDateCondition
AND EXISTS (
SELECT 1
FROM dbo.tableName t2
WHERE t2.Dateandtime >= DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime), 0)
AND t2.Dateandtime < DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime)+1, 0)
HAVING MAX(t2.Dateandtime) = t.Dateandtime
)
Or use the CROSS APPLY operator:
SELECT *
FROM dbo.test83 t CROSS APPLY (
SELECT 1
FROM dbo.test83 t2
WHERE t2.Dateandtime >= DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime), 0)
AND t2.Dateandtime < DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime)+1, 0)
HAVING MAX(t2.Dateandtime) = t.Dateandtime
) o(IsMatch)
WHERE t.DateAndTime >= @YourDateCondition
To improve performance, use this index:
CREATE INDEX x ON dbo.test83(DateAndTime) INCLUDE(Value1, Value2)
Try:
select * from mytable
where datepart(mi, dateandtime)=0 and
datepart(ss, dateandtime)=0 and
datediff(d, dateandtime, getdate()) <=60
You can use window functions for this:
select dateandtime, val1, val2, . . .
from (select t.*,
row_number() over (partition by cast(dateandtime as date), datepart(hour, dateandtime)
order by dateandtime
) as seqnum
from t
) t
where seqnum = 1
The function row_number() assigns a sequential number to the rows within each group defined by the partition clause, in this case each hour of each day. Within a group, rows are ordered by the dateandtime value, so the one closest to the top of the hour gets a value of 1. The outer query just selects this one record for each group.
You may need an additional filter clause to get records in the last 60 days. Use this in the subquery:
where dateandtime >= getdate() - 60
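The partition-and-number logic can be sketched in plain Python to show what the query keeps. This is an illustration only: the function name `first_row_per_hour`, the sample rows and the value of "now" are assumptions, and each row is a tuple whose first element is the timestamp.

```python
from datetime import datetime, timedelta


def first_row_per_hour(rows, now, days=60):
    """Keep the earliest row of each (date, hour) group within the last `days` days."""
    cutoff = now - timedelta(days=days)
    first = {}
    for row in sorted(rows, key=lambda r: r[0]):
        ts = row[0]
        if ts < cutoff:
            continue
        # Same grouping as: partition by cast(dateandtime as date), hour of dateandtime
        key = (ts.date(), ts.hour)
        first.setdefault(key, row)  # rows arrive in time order, so the first one wins
    return [first[key] for key in sorted(first)]


# Hypothetical sample rows: (dateandtime, value1)
rows = [
    (datetime(2013, 1, 1, 0, 0, 0), 0.5),      # older than 60 days, filtered out
    (datetime(2013, 5, 11, 18, 0, 1), 1.1),
    (datetime(2013, 5, 11, 18, 0, 0), 1.0),
    (datetime(2013, 5, 11, 18, 59, 59), 1.2),
    (datetime(2013, 5, 11, 19, 0, 2), 2.0),
    (datetime(2013, 5, 12, 18, 0, 0), 3.0),
]
hourly = first_row_per_hour(rows, now=datetime(2013, 5, 13))
for row in hourly:
    print(row)
```

Note that the 19:00 group keeps the row at 19:00:02 even though nothing was recorded at exactly 19:00:00. Taking the first row per hour tolerates such gaps, whereas a filter on minute = 0 and second = 0 would return nothing for that hour.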
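This helped me get the top of the hour, that is, anything that ends in ":00:00". Style 120 makes the yyyy-mm-dd hh:mi:ss format explicit instead of relying on the default string conversion:
WHERE CONVERT(VARCHAR(19), DATETIME, 120) LIKE '%:00:00'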