Is there a way (ideally without using a cursor) to count the number of transactions that the same user made in the previous hour?
That means that for this table
CREATE TABLE #TR (PK INT, TR_DATE DATETIME, USER_PK INT)
INSERT INTO #TR VALUES (1,'2018-07-31 06:02:00.000',10)
INSERT INTO #TR VALUES (2,'2018-07-31 06:36:00.000',10)
INSERT INTO #TR VALUES (3,'2018-07-31 06:55:00.000',10)
INSERT INTO #TR VALUES (4,'2018-07-31 07:10:00.000',10)
INSERT INTO #TR VALUES (5,'2018-07-31 09:05:00.000',10)
INSERT INTO #TR VALUES (6,'2018-07-31 06:05:00.000',11)
INSERT INTO #TR VALUES (7,'2018-07-31 06:55:00.000',11)
INSERT INTO #TR VALUES (8,'2018-07-31 07:10:00.000',11)
INSERT INTO #TR VALUES (9,'2018-07-31 06:12:00.000',12)
The result should be:
The solution could be something like: COUNT(*) OVER (PARTITION BY USER_PK ORDER BY TR_DATE ROWS BETWEEN ((WHERE DATEADD(HH,-1,PRECEDING.TR_DATE) > CURRENT ROW.TR_DATE) AND CURRENT ROW ...but I know that ROWS BETWEEN cannot be used like that.
I am guessing SQL Server based on the syntax. In SQL Server, you can use apply:
select tr.*, tr2.result
from #tr tr outer apply
(select count(*) as result
from #tr tr2
where tr2.user_pk = tr.user_pk and
tr2.tr_date > dateadd(hour, -1, tr.tr_date) and
tr2.tr_date <= tr.tr_date
) tr2;
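To make the windowing logic easy to try without a SQL Server instance, here is a minimal sketch of the same idea as a correlated subquery in SQLite, driven from Python. The table name tr, the function name count_previous_hour and the use of ISO-formatted text timestamps are assumptions made for this illustration; the window is the same half-open interval (one hour before the row, up to and including the row) used in the apply query above.

```python
import sqlite3

# Sample rows from the question: (PK, TR_DATE, USER_PK).
ROWS = [
    (1, "2018-07-31 06:02:00", 10),
    (2, "2018-07-31 06:36:00", 10),
    (3, "2018-07-31 06:55:00", 10),
    (4, "2018-07-31 07:10:00", 10),
    (5, "2018-07-31 09:05:00", 10),
    (6, "2018-07-31 06:05:00", 11),
    (7, "2018-07-31 06:55:00", 11),
    (8, "2018-07-31 07:10:00", 11),
    (9, "2018-07-31 06:12:00", 12),
]


def count_previous_hour(rows):
    """Return (pk, count) pairs, where count is the number of transactions
    by the same user in the hour ending at (and including) that row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tr (pk INTEGER, tr_date TEXT, user_pk INTEGER)")
    conn.executemany("INSERT INTO tr VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.pk,
               (SELECT COUNT(*)
                  FROM tr AS t2
                 WHERE t2.user_pk = t.user_pk
                   -- ISO text timestamps compare correctly as strings
                   AND t2.tr_date > datetime(t.tr_date, '-1 hour')
                   AND t2.tr_date <= t.tr_date) AS result
          FROM tr AS t
         ORDER BY t.pk
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for pk, result in count_previous_hour(ROWS):
        print(pk, result)
```

Each row counts itself, so the smallest possible value is 1. Subtract 1 if only earlier transactions should be counted.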
SELECT USER_PK, COUNT(*) AS TransactionCount
FROM #TR
WHERE DATEDIFF(MINUTE, TR_DATE, GETDATE()) <= 60
AND DATEDIFF(MINUTE, TR_DATE, GETDATE()) >= 0
GROUP BY USER_PK
You can replace GETDATE() with any reference time you want, but both conditions must use the same value.
Related
I have a hypothetical SQL table "EVENTS", with two columns, a UUID index column, and a DateTime column,
The table is populated with values ranging from 1900-01-01 to today, it is not ordered, there are numerous dates missing.
The query that I have to run is basically 'retrieve all events that happened at the requested date (start to the end of the day) or the closest previous date'
If I were looking for all events in a day that I know that exists in the database it would be something as simple as:
SELECT * FROM Events e
WHERE
e.date BETWEEN $START_OF_DAY AND $END_OF_DAY;
But if that date doesn't exist I must retrieve the latest date up to the requested date.
Grab current day, but if no records found, will return all records from the nearest previous day with records.
So in my sample data, Jan 2 returns 3 events dated Jan 1
SQL Server Solution
DECLARE @Input DATE = '2022-01-02' /*Try Jan 1,2,3, or 4*/
DROP TABLE IF EXISTS #Event
CREATE TABLE #Event (ID INT IDENTITY(1,1),EventDateTime DATETIME)
INSERT INTO #Event
VALUES
('2022-01-01 08:00')
,('2022-01-01 09:00')
,('2022-01-01 10:00')
,('2022-01-03 12:00')
SELECT TOP (1) WITH TIES *
FROM #Event AS A
CROSS APPLY (SELECT EventDate = CAST(EventDateTime AS DATE)) AS B
WHERE B.EventDate <= @Input
ORDER BY B.EventDate DESC
SQL Fiddle wasn't letting me create a variable, but here is the code, conceptually, for a more efficient version for MySQL. It grabs the desired date range in the first query, then uses it to filter in the second query. I think it should perform far better than the accepted answer, assuming you have an index on EventDateTime.
CREATE TABLE Event (
ID MEDIUMINT NOT NULL AUTO_INCREMENT
,EventDateTime DATETIME
,PRIMARY KEY (ID));
INSERT INTO Event (EventDateTime)
VALUES
('2022-01-01 08:00')
,('2022-01-01 09:00')
,('2022-01-01 10:00')
,('2022-01-03 12:00');
/*Need to save these off to variables to use in later query*/
SELECT TIMESTAMP(CAST(EventDateTime AS DATE)) AS StartRange
,TIMESTAMP(CAST(EventDateTime AS DATE)) + INTERVAL 1 DAY AS EndRange
FROM Event
WHERE EventDateTime < DATE_ADD('2022-01-04' /*Input*/,INTERVAL 1 DAY)
ORDER BY EventDateTime DESC
LIMIT 1;
SELECT *
FROM Event
WHERE EventDateTime >= StartRange
AND EventDateTime < EndRange
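The two-step idea above (first find the day of the latest event on or before the requested date, then fetch that whole day with a half-open range) can be sketched end to end as follows. This is an illustration in SQLite via Python rather than MySQL; the table name event, the column event_dt and the function name latest_day_events are hypothetical, and timestamps are assumed to be stored as ISO-formatted text.

```python
import sqlite3

EVENTS = [
    "2022-01-01 08:00:00",
    "2022-01-01 09:00:00",
    "2022-01-01 10:00:00",
    "2022-01-03 12:00:00",
]


def latest_day_events(events, requested_date):
    """Return all events on requested_date, or on the nearest earlier
    day that has events; an empty list if there is no such day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, event_dt TEXT)")
    conn.executemany(
        "INSERT INTO event (event_dt) VALUES (?)", [(e,) for e in events]
    )
    # Step 1: day boundaries of the latest event before the end of the
    # requested day (this plays the role of StartRange / EndRange).
    bounds = conn.execute(
        """
        SELECT date(event_dt), date(event_dt, '+1 day')
          FROM event
         WHERE event_dt < date(?, '+1 day')
         ORDER BY event_dt DESC
         LIMIT 1
        """,
        (requested_date,),
    ).fetchone()
    if bounds is None:
        conn.close()
        return []
    start_range, end_range = bounds
    # Step 2: half-open range filter, which an index on event_dt can serve.
    rows = conn.execute(
        "SELECT event_dt FROM event "
        "WHERE event_dt >= ? AND event_dt < ? ORDER BY event_dt",
        (start_range, end_range),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(latest_day_events(EVENTS, "2022-01-02"))
```

Both steps filter on the raw column rather than on a cast of it, which is the reason this shape is expected to use an index well.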
Calculate the most recent date, and do a self join. Although I'm using MySQL, I believe this is the most generic workaround.
CREATE TABLE d0207Event (ID INT ,EventDateTime DATETIME);
INSERT INTO d0207Event
VALUES
(1,'2022-01-01 08:00')
,(2,'2022-01-01 09:00')
,(3,'2022-01-01 10:00')
,(4,'2022-01-03 12:00');
INSERT INTO d0207Event
VALUES
(5, '2021-12-12 08:00');
select t1.*
from d0207Event t1,
(
select min(t1.dat) mindat
from (
select t1.*,
DATEDIFF('2022-01-02', cast(t1.EventDateTime as date)) dat
from d0207Event t1
) t1
where t1.dat >= 0
) t2
where DATEDIFF('2022-01-02', cast(t1.EventDateTime as date)) = t2.mindat
;
There are also more advanced syntaxes that can solve this problem better, depending on which database you use and your specific application scenario.
If your database supports analytic (window) functions, using one usually solves the efficiency problem well, since the Event table only needs to be queried once.
CREATE TABLE Event (
ID MEDIUMINT NOT NULL AUTO_INCREMENT
,EventDateTime DATETIME
,PRIMARY KEY (ID));
INSERT INTO Event (EventDateTime)
VALUES
('2022-01-01 08:00')
,('2022-01-01 09:00')
,('2022-01-01 10:00')
,('2022-01-03 12:00');
select *
from (
select t1.*,
first_value(cast(t1.EventDateTime as date))
over(order by cast(t1.EventDateTime as date) desc) fv
from event t1
where cast(t1.EventDateTime as date) <= '2022-01-03'
) t1
where cast(t1.EventDateTime as date) = fv
Creating a functional index on cast(t1.EventDateTime as date), or creating a virtual column directly, can make the query simpler; otherwise, using date_add() is a good approach.
Edited to live up to the rules here. Sorry about my first attempt.
I got the following sample data:
CREATE TABLE SampleData(
[Time] [time](7) NOT NULL
) ON [PRIMARY]
GO
INSERT INTO SampleData([Time]) VALUES ('01:00:00')
INSERT INTO SampleData([Time]) VALUES ('02:00:00')
INSERT INTO SampleData([Time]) VALUES ('02:00:00')
INSERT INTO SampleData([Time]) VALUES ('03:00:00')
INSERT INTO SampleData([Time]) VALUES ('03:00:00')
INSERT INTO SampleData([Time]) VALUES ('03:00:00')
INSERT INTO SampleData([Time]) VALUES ('04:00:00')
INSERT INTO SampleData([Time]) VALUES ('04:00:00')
INSERT INTO SampleData([Time]) VALUES ('04:00:00')
INSERT INTO SampleData([Time]) VALUES ('04:00:00')
GO
This is my query:
DECLARE @Counter INT
SET @Counter = 1
WHILE (@Counter <= 4)
BEGIN
SELECT Count([Time]) AS OrdersAmount
FROM SampleData
WHERE DATEPART(HOUR, [Time]) = @Counter
SET @Counter = @Counter + 1
END
This is the result of the query:
OrdersAmount
1
-----
OrdersAmount
2
-----
OrdersAmount
3
-----
OrdersAmount
4
So 4 separate tables. What I need is one table, with all values in it, each on its own row, like this:
OrdersAmount
1
2
3
4
I tried with cte and declaring a temp table, but I just can't make it work.
I don't have the data so can't test.
But if I get your problem right, this should work for you.
select PromisedPickupDt = cast(PromisedPickupDate as date),
[Hour] = datepart(hour, PromisedPickupDate),
HourlyAmount = sum(OrdersAmount)
from [FLX].[dbo].[NDA_SAP_Bestillinger]
where cast(PromisedPickupDate as date) = cast(getdate() as date)
group by cast(PromisedPickupDate as date), datepart(hour, PromisedPickupDate)
You need an Hours table to join and group against. You can create this using a VALUES constructor:
SELECT
Count(PromisedPickupDate) AS OrdersAmount -- count a joined column so that hours with no orders give 0, not 1
FROM (VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23)
) AS v(Hour)
LEFT JOIN [FLX].[dbo].[NDA_SAP_Bestillinger]
ON v.Hour = DATEPART(hour, PromisedPickupTime)
AND PromisedPickupDate >= CAST(CAST(GETDATE() AS date) AS datetime)
AND PromisedPickupDate < CAST(DATEADD(day, 1, CAST(GETDATE() AS date)) AS datetime)
GROUP BY v.Hour;
What is going on here is that we start with a filter by the current date.
Then we join a constructed Hours table against the hour-part of the date. Because it is on the left-side of a LEFT JOIN, we now have all the hours whether or not there is any value in that hour. We can now group by it.
Note the correct method to compare to a date range: a half-open interval, >= AND <.
Do not use YEAR/MONTH etc. on the column in join and filter predicates, as indexes cannot be used for this.
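As a small, self-contained illustration of the "hours table plus LEFT JOIN" pattern, the sketch below runs against the SampleData times from the question using SQLite from Python. The table name sample_data, the column t and the function name orders_per_hour are assumptions for the example, and the hours are generated with a recursive CTE instead of a VALUES list. It counts a column from the joined table, so empty hours report 0.

```python
import sqlite3

SAMPLE_TIMES = (
    ["01:00:00"] * 1 + ["02:00:00"] * 2 + ["03:00:00"] * 3 + ["04:00:00"] * 4
)


def orders_per_hour(times):
    """Return one (hour, count) row for each hour 0..23, in a single result set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_data (t TEXT NOT NULL)")
    conn.executemany("INSERT INTO sample_data VALUES (?)", [(t,) for t in times])
    rows = conn.execute(
        """
        WITH RECURSIVE hours(h) AS (
            SELECT 0
            UNION ALL
            SELECT h + 1 FROM hours WHERE h < 23
        )
        SELECT hours.h,
               COUNT(s.t) AS orders_amount  -- COUNT(column) skips the NULLs of unmatched hours
          FROM hours
          LEFT JOIN sample_data AS s
            ON CAST(strftime('%H', s.t) AS INTEGER) = hours.h
         GROUP BY hours.h
         ORDER BY hours.h
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for hour, amount in orders_per_hour(SAMPLE_TIMES):
        print(hour, amount)
```

If only the hours that actually have orders are wanted, a plain GROUP BY on the hour part of the time is enough and the hours table can be dropped.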
I'm trying to retrieve the latest set of rows from a source table containing a foreign key, a date and other fields present. A sample set of data could be:
create table #tmp (primaryId int, foreignKeyId int, startDate datetime,
otherfield varchar(50))
insert into #tmp values (1, 1, '1 jan 2010', 'test 1')
insert into #tmp values (2, 1, '1 jan 2011', 'test 2')
insert into #tmp values (3, 2, '1 jan 2013', 'test 3')
insert into #tmp values (4, 2, '1 jan 2012', 'test 4')
The form of data that I'm hoping to retrieve is:
foreignKeyId maxStartDate otherfield
------------ ----------------------- -------------------------------------------
1 2011-01-01 00:00:00.000 test 2
2 2013-01-01 00:00:00.000 test 3
That is, just one row per foreignKeyId showing the latest start date and associated other fields - the primaryId is irrelevant.
I've managed to come up with:
select t.foreignKeyId, t.startDate, t.otherField from #tmp t
inner join (
select foreignKeyId, max(startDate) as maxStartDate
from #tmp
group by foreignKeyId
) s
on t.foreignKeyId = s.foreignKeyId and s.maxStartDate = t.startDate
but (a) this uses inner queries, which I suspect may lead to performance issues, and (b) it gives repeated rows if two rows in the original table have the same foreignKeyId and startDate.
Is there a query that will return just the first match for each foreign key and start date?
Depending on your SQL Server version (ROW_NUMBER() requires 2005 or later), try the following:
select *
from (
select *, rnum = ROW_NUMBER() over (
partition by #tmp.foreignKeyId
order by #tmp.startDate desc)
from #tmp
) t
where t.rnum = 1
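For readers who want to experiment with the ROW_NUMBER() approach outside SQL Server, here is a minimal sketch in SQLite through Python. It assumes an SQLite build with window-function support (3.25 or later); the table name tmp, the function name latest_per_key and the tie-breaker on primaryId are additions for this illustration, the last one so that duplicate start dates still yield exactly one row per key.

```python
import sqlite3

SAMPLE = [
    (1, 1, "2010-01-01", "test 1"),
    (2, 1, "2011-01-01", "test 2"),
    (3, 2, "2013-01-01", "test 3"),
    (4, 2, "2012-01-01", "test 4"),
]


def latest_per_key(rows):
    """Return one (foreignKeyId, startDate, otherfield) row per foreign key:
    the row with the latest startDate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tmp (primaryId INTEGER, foreignKeyId INTEGER, "
        "startDate TEXT, otherfield TEXT)"
    )
    conn.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT foreignKeyId, startDate, otherfield
          FROM (SELECT tmp.*,
                       ROW_NUMBER() OVER (
                           PARTITION BY foreignKeyId
                           -- primaryId breaks ties between equal start dates
                           ORDER BY startDate DESC, primaryId DESC
                       ) AS rnum
                  FROM tmp) AS t
         WHERE t.rnum = 1
         ORDER BY foreignKeyId
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_key(SAMPLE):
        print(row)
```

Because ROW_NUMBER() assigns exactly one row the number 1 in each partition, this form cannot return repeated rows for a key, which addresses point (b) of the question.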
If you wanted to fix your attempt as opposed to re-engineering it then
select t.foreignKeyId, t.startDate, t.otherField from #tmp t
inner join (
select foreignKeyId, max(startDate) as maxStartDate, max(PrimaryId) as Latest
from #tmp
group by foreignKeyId
) s
on t.primaryId = s.latest
would have done the job, assuming PrimaryID increases over time.
Qualms about inner query would have been laid to rest as well assuming some indexes.
Following is the sample data. I need to make 3 copies of this data in t sql without using loop and return as one resultset. This is sample data not real.
42 South Yorkshire
43 Lancashire
44 Norfolk
Edit: I need multiple copies, and I don't know in advance how many; I have to decide this on the basis of dates. The range might be 1st Jan to 3rd Jan or 1st Jan to 8th Jan.
Thanks.
Don't know about better, but this is definitely more creative! You can use a CROSS JOIN.
EDIT: I put some code in to generate a date range. You can change the date range; the rows in #dates are your multiplier.
declare @startdate datetime
, @enddate datetime
create table #data1 ([id] int , [name] nvarchar(100))
create table #dates ([date] datetime)
INSERT #data1 SELECT 42, 'South Yorkshire'
INSERT #data1 SELECT 43, 'Lancashire'
INSERT #data1 SELECT 44, 'Norfolk'
set @startdate = '1Jan2010'
set @enddate = '3Jan2010'
WHILE (@startdate <= @enddate)
BEGIN
INSERT #dates SELECT @startdate
set @startdate=@startdate+1
END
SELECT [id] , [name] from #data1 cross join #dates
drop table #data1
drop table #dates
You could always use a CTE to do the dirty work.
Change the limit in WHERE Counter < 4 to control the number of copies. Counter starts at 0, so Counter < 4 returns each row 5 times; use Counter < 2 for 3 copies.
CREATE TABLE City (ID INTEGER PRIMARY KEY, Name VARCHAR(32))
INSERT INTO City VALUES (42, 'South Yorkshire')
INSERT INTO City VALUES (43, 'Lancashire')
INSERT INTO City VALUES (44, 'Norfolk')
/*
The CTE duplicates every row from City; Counter runs from 0
up to the limit given in the WHERE clause
*/
;WITH CityCTE (ID, Name, Counter) AS
(
SELECT c.ID, c.Name, 0 AS Counter
FROM City c
UNION ALL
SELECT c.ID, c.Name, Counter + 1
FROM City c
INNER JOIN CityCTE cte ON cte.ID = c.ID
WHERE Counter < 4
)
SELECT ID, Name
FROM CityCTE
ORDER BY 1, 2
DROP TABLE City
This may not be the most efficient way of doing it, but it should work.
(select ....)
union all
(select ....)
union all
(select ....)
Assume the table is named CountyPopulation:
SELECT * FROM CountyPopulation
UNION ALL
SELECT * FROM CountyPopulation
UNION ALL
SELECT * FROM CountyPopulation
Share and enjoy.
There is no need to use a cursor. The set-based approach would be to use a Calendar table. So first we make our calendar table which need only be done once and be somewhat permanent:
Create Table dbo.Calendar ( Date datetime not null Primary Key Clustered )
GO
; With Numbers As
(
Select ROW_NUMBER() OVER( ORDER BY S1.object_id ) As [Counter]
From sys.columns As s1
Cross Join sys.columns As s2
)
Insert dbo.Calendar([Date])
Select DateAdd(d, [Counter], '19000101')
From Numbers
Where [Counter] <= 100000
GO
I populated it with 100K dates, which reaches into the year 2173 (100,000 days after 1900-01-01). Obviously you can always expand it. Next we generate our test data:
Create Table dbo.Data(Id int not null, [Name] nvarchar(20) not null)
GO
Insert dbo.Data(Id, [Name]) Values(42,'South Yorkshire')
Insert dbo.Data(Id, [Name]) Values(43, 'Lancashire')
Insert dbo.Data(Id, [Name]) Values(44, 'Norfolk')
GO
Now the problem becomes trivial:
Declare @Start datetime
Declare @End datetime
Set @Start = '2010-01-01'
Set @End = '2010-01-03'
Select Dates.[Date], Id, [Name]
From dbo.Data
Cross Join (
Select [Date]
From dbo.Calendar
Where [Date] >= @Start
And [Date] <= @End
) As Dates
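The cross join against a date range can be tried out quickly with the sketch below, which uses SQLite from Python. Instead of a permanent Calendar table it generates the dates on the fly with a recursive CTE; that substitution, the table name data and the function name copies_per_date are assumptions for this example, and start is assumed to be on or before end.

```python
import sqlite3

DATA = [(42, "South Yorkshire"), (43, "Lancashire"), (44, "Norfolk")]


def copies_per_date(data, start, end):
    """Return one copy of every data row per date in [start, end] (inclusive).
    Assumes start <= end, both given as 'YYYY-MM-DD' strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER NOT NULL, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", data)
    rows = conn.execute(
        """
        WITH RECURSIVE dates(d) AS (
            SELECT date(:start)
            UNION ALL
            SELECT date(d, '+1 day') FROM dates WHERE d < date(:end)
        )
        SELECT dates.d, data.id, data.name
          FROM data
         CROSS JOIN dates          -- every data row paired with every date
         ORDER BY dates.d, data.id
        """,
        {"start": start, "end": end},
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in copies_per_date(DATA, "2010-01-01", "2010-01-03"):
        print(row)
```

The number of copies is simply the number of dates in the range, so 1st Jan to 3rd Jan gives 3 copies of each row and 1st Jan to 8th Jan gives 8.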
By far the best solution is CROSS JOIN. Most natural.
See my answer here: How to retrieve rows multiple times in SQL Server?
If you have a Numbers table lying around, it's even easier. You can DATEDIFF the dates to give you the filter on the Numbers table
I would like to randomly sort a result in a repeatable fashion for purposes such as paging. For this NEWID() is too random in that the same results cannot be re-obtained. Order by Rand(seed) would be ideal as with the same seed the same random collection would result. Unfortunately, the Rand() state resets with every row, does anyone have a solution?
declare @seed as int;
set @seed = 1000;
create table temp (
id int,
date datetime)
insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')
-- re-seeds for every item
select *, RAND(), RAND(id+@seed) as r from temp order by r
--1 2009-01-19 00:00:00.000 0.277720118060575 0.732224964471124
--2 2009-01-18 00:00:00.000 0.277720118060575 0.732243597442382
--3 2009-01-17 00:00:00.000 0.277720118060575 0.73226223041364
--4 2009-01-16 00:00:00.000 0.277720118060575 0.732280863384898
--5 2009-01-15 00:00:00.000 0.277720118060575 0.732299496356156
--6 2009-01-14 00:00:00.000 0.277720118060575 0.732318129327415
-- Note how the last column is +=~0.00002
drop table temp
-- interestingly this works:
select RAND(@seed), RAND()
--0.732206331499865 0.306382810665955
Note, I tried Rand(ID) but that just turns out to be sorted. Apparently Rand(n) < Rand(n+1)
Building off of gkrogers' hash suggestion, this works great. Any thoughts on performance?
declare @seed as int;
set @seed = 10;
create table temp (
id int,
date datetime)
insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')
-- re-seeds for every item
select *, HASHBYTES('md5',cast(id+@seed as varchar)) r
from temp order by r
--1 2009-01-19 00:00:00.000 0x6512BD43D9CAA6E02C990B0A82652DCA
--5 2009-01-15 00:00:00.000 0x9BF31C7FF062936A96D3C8BD1F8F2FF3
--4 2009-01-16 00:00:00.000 0xAAB3238922BCC25A6F606EB525FFDC56
--2 2009-01-18 00:00:00.000 0xC20AD4D76FE97759AA27A0C99BFF6710
--3 2009-01-17 00:00:00.000 0xC51CE410C124A10E0DB5E4B97FC2AF39
--6 2009-01-14 00:00:00.000 0xC74D97B01EAE257E44AA9D5BADE97BAF
drop table temp
EDIT: Note, the declaration of @seed and its use in the query could be replaced with a parameter, or with a constant int if dynamic SQL is used. (Declaring the variable in T-SQL fashion is not necessary.)
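The hash-ordering idea is easy to check outside the database. The sketch below reproduces it in Python with hashlib: each id is added to the seed, converted to text, hashed with MD5, and the rows are sorted by the raw digest bytes. The function name seeded_order is hypothetical; the technique and the seed value 10 are taken from the query above.

```python
import hashlib


def seeded_order(ids, seed):
    """Return ids in a repeatable pseudo-random order for the given seed,
    sorting by MD5(str(id + seed)) as the HASHBYTES query does."""

    def sort_key(row_id):
        # Same input the query hashes: the decimal text of id + seed.
        return hashlib.md5(str(row_id + seed).encode("ascii")).digest()

    return sorted(ids, key=sort_key)


if __name__ == "__main__":
    print(seeded_order([1, 2, 3, 4, 5, 6], 10))
```

The same seed always gives the same order, so a page can be re-fetched by slicing the sorted result. Note that because the key is id + seed, changing the seed by 1 shifts each row's hash onto its neighbour's previous value, so widely spaced seeds (or hashing the seed and id as separate text parts) may give more visibly different orders.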
You can use a value from each row to re-evaluate the rand function:
Select *, Rand(@seed + id) as r from temp order by r
Adding the ID ensures that rand is reseeded for each row. For a given seed value you will always get back the same sequence of rows (provided that the table does not change).
Creating a hash can be much more time consuming than creating a seeded random number.
To get more variation in the output of RAND([seed]), you need to make the [seed] vary significantly too. Possibly something like...
SELECT
*,
RAND(id * 9999) AS [r]
FROM
temp
ORDER BY
r
Using a constant ensures the replicability you asked for. But be careful of the result of (id * 9999) causing an overflow if you expect your table to get big enough...
SELECT *, checksum(id) AS r FROM table ORDER BY r
This kind of works. Although the output from checksum() does not look all that random to me. The MSDN Documentation states:
[...], we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.
But it may be faster.
After doing some reading this is an accepted method.
Select Rand(@seed) -- now rand is seeded
Select *, 0 * id + Rand() as r from temp order by r
Having id in the expression causes it to be reevaluated for every row, but multiplying it by 0 ensures that it does not affect the outcome of rand.
What a horrible way of doing things!
create table temp (
id int,
date datetime)
insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')
-- re-seeds for every item
select *, NEWID() r
from temp order by r
drop table temp
This has worked well for me in the past, and it can be applied to any table (just bolt on the ORDER BY clause):
SELECT *
FROM MY_TABLE
ORDER BY
(SELECT ABS(CAST(NEWID() AS BINARY(6)) % 1000) + 1);