SQL Server datetimeoffset data aggregation

I am currently developing an application that makes use of a SQL Server database. In this database I maintain a table called session, which has three fields: a session id (int), a date_created (datetimeoffset), and a date_expired (datetimeoffset).
I want to group my sessions into clusters such that the span from the minimum date_created to the maximum date_expired within a cluster is no more than 6 hours. Also, I don't want my groups to overlap, i.e. if session s1 belongs to group 1, it must not also be in group 2.
Any ideas?

I suggest you create four data groups covering the hours 0-6, 6-12, 12-18 and 18-24, so you can do it like this:
FYI: for the sake of simplicity, the CASE is on the created date only; you will need to use a DATEDIFF between your date_created and date_expired.
FYI2: change the values in the BETWEENs as it suits you. The query returns the values 1, 2, 3 and 4, which you can relabel as "0 to 6", "6 to 12" and so on.
with MyCTE as (
    select case
               when datepart(hh, date_created) between 0 and 5 then 1
               when datepart(hh, date_created) between 6 and 11 then 2
               when datepart(hh, date_created) between 12 and 17 then 3
               else 4
           end as myDatebucket,
           *
    from session
)
select myDatebucket, count(*)
from MyCTE
group by myDatebucket
order by myDatebucket
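To see the bucketing in action, here is a quick sketch using SQLite from Python rather than SQL Server, so strftime('%H', ...) stands in for DATEPART(hh, ...); the sample session rows are invented:

```python
import sqlite3

# Sketch of the hour-bucket grouping, using SQLite instead of SQL Server.
# The session table and its columns follow the question; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (session_id INTEGER, date_created TEXT, date_expired TEXT);
    INSERT INTO session VALUES
        (1, '2021-01-01 03:15:00', '2021-01-01 04:00:00'),
        (2, '2021-01-01 07:30:00', '2021-01-01 08:00:00'),
        (3, '2021-01-01 13:45:00', '2021-01-01 14:30:00'),
        (4, '2021-01-01 22:10:00', '2021-01-01 23:00:00');
""")
rows = conn.execute("""
    WITH MyCTE AS (
        SELECT CASE
                   WHEN CAST(strftime('%H', date_created) AS INTEGER) BETWEEN 0  AND 5  THEN 1
                   WHEN CAST(strftime('%H', date_created) AS INTEGER) BETWEEN 6  AND 11 THEN 2
                   WHEN CAST(strftime('%H', date_created) AS INTEGER) BETWEEN 12 AND 17 THEN 3
                   ELSE 4
               END AS myDatebucket, *
        FROM session
    )
    SELECT myDatebucket, COUNT(*) FROM MyCTE
    GROUP BY myDatebucket ORDER BY myDatebucket
""").fetchall()
print(rows)  # each of the four sample sessions lands in a different bucket
```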

Related

Using Parameter within timestamp_trunc in SQL Query for DataStudio

I am trying to use a custom parameter within DataStudio. The data is hosted in BigQuery.
SELECT
timestamp_trunc(o.created_at, #groupby) AS dateMain,
count(o.id) AS total_orders
FROM `x.default.orders` o
group by 1
When I try this, it returns the error "A valid date part name is required at [2:35]".
I basically need to group the dates using a parameter (e.g. day, week, month).
I have also included a screenshot of how I have created the parameter in Google DataStudio. There is a default value set which is "day".
A workaround that might do the trick here is to use a rollup in the group by with the different levels of aggregation of the date, since I am not sure you can pass a DS parameter to work like that.
See the following example for clarity:
with default_orders as (
select timestamp'2021-01-01' as created_at, 1 as id
union all
select timestamp'2021-01-01', 2
union all
select timestamp'2021-01-02', 3
union all
select timestamp'2021-01-03', 4
union all
select timestamp'2021-01-03', 5
union all
select timestamp'2021-01-04', 6
),
final as (
select
count(id) as count_orders,
timestamp_trunc(created_at, day) as days,
timestamp_trunc(created_at, week) as weeks,
timestamp_trunc(created_at, month) as months
from
default_orders
group by
rollup(days, weeks, months)
)
select * from final
The output, then, would be similar to the following:
count | days       | weeks      | months
------+------------+------------+-----------
    6 | null       | null       | null        <- this represents the overall total (counted 6 ids)
    2 | 2021-01-01 | null       | null        <- the 1st rollup level (day)
    2 | 2021-01-01 | 2020-12-27 | null        <- the 1st and 2nd (day, week)
    2 | 2021-01-01 | 2020-12-27 | 2021-01-01  <- all of them
And so on.
When visualizing this in Data Studio, you have two options: set the metric to Avg instead of Sum, because as you can see each stage duplicates the day-level counts; or add another step to the query and get rid of the nulls, like this:
select
*
from
final
where
days is not null and
weeks is not null and
months is not null
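To make the truncation levels concrete, here is a plain-Python sketch (not BigQuery) of what TIMESTAMP_TRUNC produces at each level for the sample orders above; the Sunday week start mirrors BigQuery's default:

```python
from datetime import date, timedelta
from collections import Counter

# Sketch of what TIMESTAMP_TRUNC at each level produces for the sample
# orders from the CTE above (plain Python standing in for BigQuery).
orders = [date(2021, 1, 1), date(2021, 1, 1), date(2021, 1, 2),
          date(2021, 1, 3), date(2021, 1, 3), date(2021, 1, 4)]

def trunc(d, part):
    if part == "day":
        return d
    if part == "week":                      # BigQuery weeks start on Sunday
        return d - timedelta(days=(d.weekday() + 1) % 7)
    if part == "month":
        return d.replace(day=1)

by_day = Counter(trunc(d, "day") for d in orders)
by_week = Counter(trunc(d, "week") for d in orders)
by_month = Counter(trunc(d, "month") for d in orders)
# 2021-01-01 truncates to the week of 2020-12-27, matching the rollup output
print(by_day[date(2021, 1, 1)], by_week[date(2020, 12, 27)], by_month[date(2021, 1, 1)])
```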

How to run a query for multiple independent date ranges?

I would like to run the below query that looks like this for week 1:
Select week(datetime), count(customer_call) from table where week(datetime) = 1 and week(orderdatetime) < 7
... but for weeks 2, 3, 4, 5 and 6 all in one query and with the 'week(orderdatetime)' to still be for the 6 weeks following the week(datetime) value.
This means that for 'week(datetime) = 2', 'week(orderdatetime)' would be between 2 and 7 and so on.
'datetime' is a datetime field denoting registration.
'customer_call' is a datetime field denoting when they called.
'orderdatetime' is a datetime field denoting when they ordered.
Thanks!
I think you want group by:
Select week(datetime), count(customer_call)
from table
where week(datetime) = 1 and week(orderdatetime) < 7
group by week(datetime);
I would also point out that week doesn't take the year into account, so you might want to include that in the group by or in a where filter.
EDIT:
If you want 6 weeks of cumulative counts, then use:
Select week(datetime), count(customer_call),
sum(count(customer_call)) over (order by week(datetime)
rows between 5 preceding and current row) as running_sum_6
from table
group by week(datetime);
Note: If you want to filter this to particular weeks, then make this a subquery and filter in the outer query.
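The cumulative count can be sketched with SQLite window functions from Python (the calls table is invented, and a precomputed wk column stands in for week(datetime), which SQLite lacks):

```python
import sqlite3

# Sketch of the 6-week running count using SQLite window functions.
# wk stands in for week(datetime); the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (wk INTEGER, customer_call TEXT);
    INSERT INTO calls VALUES (1,'a'),(1,'b'),(2,'c'),(3,'d'),(3,'e'),(3,'f');
""")
rows = conn.execute("""
    SELECT wk, COUNT(customer_call),
           SUM(COUNT(customer_call)) OVER (ORDER BY wk
               ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS running_sum_6
    FROM calls
    GROUP BY wk
""").fetchall()
print(rows)  # per-week counts plus the running sum over the trailing 6 weeks
```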

Oracle Database Temporal Query Implementation - Collapse Date Ranges

This is the result of one of my queries:
SURGERY_D
---------
01-APR-05
02-APR-05
03-APR-05
04-APR-05
05-APR-05
06-APR-05
07-APR-05
11-APR-05
12-APR-05
13-APR-05
14-APR-05
15-APR-05
16-APR-05
19-APR-05
20-APR-05
21-APR-05
22-APR-05
23-APR-05
24-APR-05
26-APR-05
27-APR-05
28-APR-05
29-APR-05
30-APR-05
I want to collapse the date ranges which are continuous, into intervals. For examples,
[01-APR-05, 07-APR-05], [11-APR-05, 16-APR-05] and so on.
In terms of temporal databases, I want to 'collapse' the dates. Any idea how to do that in Oracle? I am using version 11. I searched for it and read a book, but couldn't find/understand how to do it. It might be simple, but everyone has their own flaws and Oracle is mine. Also, I am new to SO, so my apologies if I have violated any rules. Thank you!
You can take advantage of the ROW_NUMBER analytical function to generate a unique, sequential number for each of the records (we'll assign that number to the dates in ascending order).
Then, you group the dates by difference between the date and the generated number - the consecutive dates will have the same difference:
Date        Number  Difference
01-APR-05      1        0     -- MIN(date_val) in group with diff. = 0
02-APR-05      2        0
03-APR-05      3        0
04-APR-05      4        0
05-APR-05      5        0
06-APR-05      6        0
07-APR-05      7        0     -- MAX(date_val) in group with diff. = 0
11-APR-05      8        3     -- MIN(date_val) in group with diff. = 3
12-APR-05      9        3
13-APR-05     10        3
14-APR-05     11        3
15-APR-05     12        3
16-APR-05     13        3     -- MAX(date_val) in group with diff. = 3
(For illustration, "Difference" treats each date as its day of month.)
Finally, you select the minimal and maximal date in each of the groups to get the beginning and ending of each range.
Here's the query:
SELECT
MIN(date_val) start_date,
MAX(date_val) end_date
FROM (
SELECT
date_val,
row_number() OVER (ORDER BY date_val) AS rn
FROM date_tab
)
GROUP BY date_val - rn
ORDER BY 1
;
Output:
START_DATE END_DATE
------------ ----------
01-04-2005 07-04-2005
11-04-2005 16-04-2005
19-04-2005 24-04-2005
26-04-2005 30-04-2005
You can check how that works on SQLFiddle: Dates ranges example
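The same gaps-and-islands trick can be sketched in SQLite from Python (julianday() stands in for Oracle's date arithmetic; the table and column names follow the query above):

```python
import sqlite3

# Sketch of the gaps-and-islands grouping in SQLite: the date minus its
# row number is constant within each run of consecutive dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_tab (date_val TEXT)")
conn.executemany("INSERT INTO date_tab VALUES (?)",
                 [(f"2005-04-{d:02d}",) for d in
                  list(range(1, 8)) + list(range(11, 17))])
rows = conn.execute("""
    SELECT MIN(date_val), MAX(date_val)
    FROM (
        SELECT date_val,
               julianday(date_val) - row_number() OVER (ORDER BY date_val) AS grp
        FROM date_tab
    )
    GROUP BY grp
    ORDER BY 1
""").fetchall()
print(rows)  # one (start, end) pair per run of consecutive dates
```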

Comparing data between reporting periods MS Access

I have a big table that contains records for each reporting period in my project.
The period is identified by an integer: 1, 2, 3, 4, 5, 6, 7.
Each period contains about 7000 rows of tasks each identified a unique ID. These tasks all have a percent complete column which is an integer.
I want to add a comparison column.
so it would look up the unique ID in the previous period, then return the difference in percent complete.
eg
for
Period: 8 Item Unique ID: 42w3wer324wer32 Percent complete: 50
it would find:
Period: 7 Item Unique ID: 42w3wer324wer32 Percent complete: 40
then fill in the field with: 10.
If it could not find the Item Unique ID in the previous period then it would default to 0.
thanks
As I understand your description, you could pull the data for period 8 like this:
SELECT item_id, pct_complete
FROM YourTable
WHERE rpt_period = 8;
And the previous period would be the same query except substituting 7 as the period.
So take the period 8 query and left join it to a subquery for period 7.
SELECT
y.item_id,
(y.pct_complete - Nz(sub.pct_complete, 0)) AS change_in_pct_complete
FROM YourTable AS y
LEFT JOIN
(
SELECT item_id, pct_complete
FROM YourTable
WHERE rpt_period = 7
) AS sub
ON y.item_id = sub.item_id
WHERE y.rpt_period = 8;
That Nz() expression will substitute 0 for Null when no period 7 match exists for a period 8 item_id.
If you need a query which will not be running from within an Access session, the Nz() function will not be available. In that case, you can use an IIf() expression ... it's not as concise, but it will get the job done.
IIf(sub.pct_complete Is Null, 0, sub.pct_complete)
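Here is a sketch of the same join in SQLite from Python; COALESCE() plays the role of Nz(), and the sample rows are invented:

```python
import sqlite3

# Sketch of the period-over-period comparison in SQLite. COALESCE()
# substitutes 0 for NULL just as Nz() does in Access; data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (rpt_period INTEGER, item_id TEXT, pct_complete INTEGER);
    INSERT INTO YourTable VALUES
        (7, '42w3wer324wer32', 40),
        (8, '42w3wer324wer32', 50),
        (8, 'brand_new_item', 25);
""")
rows = conn.execute("""
    SELECT y.item_id,
           y.pct_complete - COALESCE(sub.pct_complete, 0) AS change_in_pct
    FROM YourTable AS y
    LEFT JOIN (SELECT item_id, pct_complete
               FROM YourTable WHERE rpt_period = 7) AS sub
           ON y.item_id = sub.item_id
    WHERE y.rpt_period = 8
    ORDER BY y.item_id
""").fetchall()
print(rows)  # items with no period-7 row compare against 0
```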

Sql Query - Limiting query results

I am quite certain we cannot use the LIMIT clause for what I want to do - so wanted to find if there are any other ways we can accomplish this.
I have a table which captures which user visited which store. Every time a user visits a store, a row is inserted into this table.
Some of the fields are
shopping_id (primary key)
store_id
user_id
Now what I want is - for a given set of stores, find the top 5 users who have visited the store max number of times.
I can do this 1 store at a time as:
select store_id,user_id,count(1) as visits
from shopping
where store_id = 60
group by user_id,store_id
order by visits desc Limit 5
This will give me the 5 users who have visited store_id=60 the max times
What I want to do is provide a list of 10 store_ids and for each store fetch the 5 users who have visited that store max times
select store_id,user_id,count(1) as visits
from shopping
where store_id in (60,61,62,63,64,65,66)
group by user_id,store_id
order by visits desc Limit 5
This will not work as the Limit at the end will return only 5 rows rather than 5 rows for each store.
Any ideas on how I can achieve this? I can always write a loop and pass one store at a time, but wanted to know if there is a better way.
Using two user variables and counting consecutive rows with the same store_id; you can replace <= 5 with whatever limit you want:
SELECT a.*
FROM (
SELECT store_id, user_id, count(1) as visits
FROM shopping
WHERE store_id IN (60,61,62,63,64,65,66)
GROUP BY store_id, user_id
ORDER BY store_id, visits desc, user_id
) a,
(SELECT #prev:=-1, #count:=1) b
WHERE
CASE WHEN #prev<>a.store_id THEN
CASE WHEN #prev:=a.store_id THEN
#count:=1
END
ELSE
#count:=#count+1
END <= 5
Edit: as requested, some explanation.
The first subquery (a) groups and orders the data, so you will have rows like:
store_id | user_id | visits
---------+---------+-------
      60 |       1 |      5
      60 |       2 |      3
      60 |       3 |      1
      61 |       2 |      4
      61 |       3 |      2
The second subquery (b) initializes the user variable #prev to -1 and #count to 1.
Then we select all rows from subquery (a) that satisfy the condition in the CASE:
We check whether the previous store_id (#prev) differs from the current store_id.
Since #prev starts at -1, it cannot match the first store_id, so the <> condition is true; we then enter the inner CASE, which only serves to assign the current store_id to #prev. This is the trick that lets us change both user variables, #count and #prev, in the same condition.
If the previous store_id equals #prev, we just increment the #count variable.
Finally, we check that the count is within the limit we want, hence the <= 5.
So with our test data:
step | #prev | #count | store_id | user_id | visits
-----+-------+--------+----------+---------+-------
   0 |    -1 |      1 |          |         |
   1 |    60 |      1 |       60 |       1 |      5
   2 |    60 |      2 |       60 |       2 |      3
   3 |    60 |      3 |       60 |       3 |      1
   4 |    61 |      1 |       61 |       2 |      4
   5 |    61 |      2 |       61 |       3 |      2
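On MySQL 8+ or any engine with window functions, the user-variable trick can be replaced by ROW_NUMBER() partitioned per store; a sketch in SQLite from Python (sample data invented, limit set to 2 to keep the output short):

```python
import sqlite3

# Modern alternative to the user-variable trick: ROW_NUMBER() partitioned
# per store (works on MySQL 8+ and SQLite). Table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shopping (shopping_id INTEGER PRIMARY KEY,
                           store_id INTEGER, user_id INTEGER);
    INSERT INTO shopping (store_id, user_id) VALUES
        (60,1),(60,1),(60,1),(60,2),(60,2),(60,3),
        (61,2),(61,2),(61,3);
""")
rows = conn.execute("""
    SELECT store_id, user_id, visits
    FROM (
        SELECT store_id, user_id, COUNT(*) AS visits,
               ROW_NUMBER() OVER (PARTITION BY store_id
                                  ORDER BY COUNT(*) DESC, user_id) AS rn
        FROM shopping
        WHERE store_id IN (60, 61)
        GROUP BY store_id, user_id
    )
    WHERE rn <= 2            -- use rn <= 5 for the top five
    ORDER BY store_id, visits DESC
""").fetchall()
print(rows)  # top visitors per store, at most 2 rows per store
```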
The major concern here is the number of times you query the database.
If you query multiple times from your script, it is simply a waste of resources and should be avoided.
That is, you must NOT run a loop that executes the SQL repeatedly while incrementing some value (in your case 60, then 61, and so on).
Solution 1:
Create a view
Here is the solution
CREATE VIEW myView AS
(select store_id, user_id, count(1) as visits
 from shopping
 where store_id = 60
 group by user_id, store_id
 order by visits desc limit 5)
UNION
(select store_id, user_id, count(1) as visits
 from shopping
 where store_id = 61
 group by user_id, store_id
 order by visits desc limit 5)
UNION
(select store_id, user_id, count(1) as visits
 from shopping
 where store_id = 62
 group by user_id, store_id
 order by visits desc limit 5)
Now use
SELECT * from MyView
This is limited because you can't make it dynamic.
What if you need 60 to 100 instead of 60 to 66?
Solution 2:
Use Procedure.
I won't go into how to write a procedure because it's late at night and I have got to sleep. :)
The procedure must accept two values: an initial number (60) and a count (6).
Inside the procedure, create a temporary table (cursor) to store the data, then run a loop from the initial number for count iterations, in your case from 60 to 66.
Inside the loop, run the desired script, replacing 60 with the looping variable:
select store_id,user_id,count(1) as visits
from shopping
where store_id = 60
group by user_id,store_id
order by visits desc Limit 5
And append the result to the temporary table (cursor).
Hope this will solve your problem.
Sorry I couldn't give you the code. If you still need it, please send me a message; I will give it to you when I wake up tomorrow morning.
UNION may be what you are looking for.
-- first store
(select store_id,user_id,count(1) as visits
from shopping
where store_id = 60
group by user_id,store_id
order by visits desc Limit 5)
UNION ALL
-- second store
(select store_id,user_id,count(1) as visits
from shopping
where store_id = 61
group by user_id,store_id
order by visits desc Limit 5)
...
http://dev.mysql.com/doc/refman/5.0/en/union.html
If you do not need to save data about when a user visited a store, you could simply update the table each time a user visits a store instead of appending a new row.
Something like this:
INSERT INTO `user_store` (`user_id`, `store_id`, `visits`) VALUES ('USER', 'SHOP', 1)
ON DUPLICATE KEY UPDATE `visits` = `visits` + 1
But this alone would not work, because neither user_id nor store_id is unique on its own. You have to add a unique primary key, e.g. user#store or a composite key over both columns.
Another option would be to save this data (how often a user was in a store) in a separate table with the columns ID, user_id, store_id and visits, and increment visits every time you add a new row to your existing table.
To get the Top5 you can then use:
SELECT `visits`, `user_id` FROM `user_store_times` WHERE `store_id`=10 ORDER BY `visits` DESC LIMIT 5
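The counter-table idea can be sketched with SQLite's UPSERT syntax from Python (SQLite writes ON CONFLICT ... DO UPDATE where MySQL writes ON DUPLICATE KEY UPDATE; the composite primary key supplies the uniqueness the answer mentions):

```python
import sqlite3

# Sketch of the visit-counter table with an UPSERT. The composite primary
# key over (user_id, store_id) is what makes the duplicate-key path fire.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_store (
        user_id  INTEGER,
        store_id INTEGER,
        visits   INTEGER DEFAULT 1,
        PRIMARY KEY (user_id, store_id)
    )
""")
for user_id, store_id in [(1, 60), (1, 60), (2, 60), (1, 60)]:
    conn.execute("""
        INSERT INTO user_store (user_id, store_id, visits) VALUES (?, ?, 1)
        ON CONFLICT (user_id, store_id) DO UPDATE SET visits = visits + 1
    """, (user_id, store_id))
rows = conn.execute(
    "SELECT user_id, visits FROM user_store WHERE store_id = 60 "
    "ORDER BY visits DESC LIMIT 5").fetchall()
print(rows)  # user 1 visited three times, user 2 once
```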
The simplest way would be to issue 10 separate queries, one for each store. If you use parameterized queries (e.g. using PDO in PHP) this will be pretty fast since the query will be part-compiled.
If this still proves to be too resource-intensive, then another solution would be to cache the results in the store table - i.e. add a field that lists the top 5 users for each store as a simple comma-separated list. It does mean your database would not be 100% normalised but that shouldn't be a problem.