OK, this is tough to explain without drawing it out on a whiteboard, but here goes. I've tried to be as clear as possible, but let me know if this doesn't make sense.
I have an MS Access project that processes time series datasets from multiple source objects ("SOURCES") and multiple observation points ("RECEIVERS"), and identifies events of interest based on time and spatial proximity. This gives me a table of triggers of possibly related events with the following fields:
CORRELATION_ID
RECEIVER_EVENT_ID
RECEIVER_NAME
RECEIVER_START_DATETIME
RECEIVER_END_DATETIME
SOURCE_EVENT_ID
SOURCE_NAME
SOURCE_START_DATETIME
SOURCE_END_DATETIME
Because multiple source and receiver triggers can happen at overlapping times, or at times close to each other, I get a massive list of triggers, and I would like to refine it by grouping the triggers further based on additional criteria.
I would like to specify two criteria: the maximum allowable time gap between source events, MAX_SOURCE_GAP, and the maximum allowable gap between receiver events, MAX_RECEIVER_GAP. A gap is calculated as the start time of one trigger minus the end time of another.
If the events fall within this gap range, they need to be grouped, and the resulting group record must store the start time of the earliest event and the end time of the latest event. For the RECEIVER events, the RECEIVER_NAME must be the same (i.e. I don't want to group events from different RECEIVERS, because I still want to end up with a list of related RECEIVER<>SOURCE events). For the SOURCE events, the event must have been picked up by the same receiver; in other words, the RECEIVER_NAME again must be the same. I would also like the record to return a list of the names of the sources that are grouped. For this I was thinking I could implement Allen Browne's ConcatRelated() function.
Update: the third required criterion defines the relationship between the grouped source events and the grouped receiver events, MAX_SOURCE_TO_RECEIVER_DELAY. This is the maximum allowable time delay after the start time of a source within which the receiver can be triggered; in other words, startTime_receiver - startTime_source <= MAX_SOURCE_TO_RECEIVER_DELAY. The receiver also cannot trigger before the source, so startTime_receiver >= startTime_source.
I think this will basically require a few steps: at least one subquery to group the SOURCE events, at least one subquery to group the RECEIVER events, and then a step to combine them so I can return something like this:
RECEIVER_NAME
MIN-RECEIVER_START_DATETIME
MAX-RECEIVER_END_DATETIME
MIN-SOURCE_START_DATETIME
MAX-SOURCE_END_DATETIME
LIST_OF_SOURCES <-- field that looks like "SOURCE10, SOURCE24, SOURCE51", generated from Allen Browne's ConcatRelated() function.
I think I understand the methodology, but I am having trouble properly grouping things where there are more than two triggers. I can probably tackle concatenating the names of the sources with ConcatRelated() once I get the time grouping figured out.
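Not Access SQL, but the gap-merging itself is a simple sweep once events are sorted by start time, and that sweep is usually the part that breaks with more than two triggers. Here is a minimal Python sketch of the idea; the 30-minute MAX_SOURCE_GAP is an assumed value, and the sample times come from the SOURCE events of RECEIVER1 in the sample data.

```python
from datetime import datetime, timedelta

def group_events(events, max_gap):
    """Merge (start, end) events into groups where the gap between an
    event's start and the running group's end is within max_gap.
    Each resulting group spans earliest start to latest end."""
    groups = []
    for start, end in sorted(events):
        if groups and start - groups[-1][1] <= max_gap:
            # Extend the current group; end times need not be ordered,
            # so keep the maximum end seen so far.
            groups[-1][1] = max(groups[-1][1], end)
        else:
            groups.append([start, end])
    return [tuple(g) for g in groups]

ts = lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M")
sources = [
    (ts("2012-08-04 02:10"), ts("2012-08-04 02:40")),
    (ts("2012-08-04 02:30"), ts("2012-08-04 03:10")),
    (ts("2012-08-04 03:15"), ts("2012-08-04 03:30")),
    (ts("2012-08-04 05:01"), ts("2012-08-04 05:25")),
]
# With a 30-minute gap: two groups, 02:10-03:30 and 05:01-05:25.
print(group_events(sources, timedelta(minutes=30)))
```

The same sweep would be run separately per RECEIVER_NAME for the receiver events, which is what the "RECEIVER_NAME must be the same" constraint amounts to.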
--Update -
I have uploaded some sample data to SQLFiddle.com; see the sample dataset link.
The resulting table I am essentially trying to come up with would look like this for this sample data set:
| RECEIVER_NAME | MIN-RECEIVER_START_DATETIME | MAX-RECEIVER_END_DATETIME | SOURCE_LIST | MIN-SOURCE_START_DATETIME | MAX-SOURCE_END_DATETIME |
|---------------|-----------------------------|---------------------------|-------------------------|---------------------------|-------------------------|
| RECEIVER1 | 2012-04-08 05:08 | 2012-04-08 06:22 | SOURCE1,SOURCE2,SOURCE3 | 2012-04-08 02:10 | 2012-04-08 05:25 |
| RECEIVER2 | 2012-05-08 10:05 | 2012-05-08 14:55 | SOURCE1,SOURCE2 | 2012-05-08 10:01 | 2012-05-08 13:45 |
| RECEIVER2 | 2012-06-08 06:55 | 2012-06-08 21:19 | SOURCE2 | 2012-05-08 14:55 | 2012-05-08 16:22 |
As I mentioned in my comment, no criteria have been applied to yield this result; your events are grouped by RECEIVER_EVENT_ID and RECEIVER_START_DATETIME. (I guess RECEIVER_EVENT_ID is always related to RECEIVER_NAME, hence I chose the event ID, but you can also group by RECEIVER_NAME; it's up to you.)
For now, this gives:
240 | RECEIVER1 | August, 04 2012 05:08:00+0000
241 | RECEIVER2 | August, 05 2012 10:05:00+0000
242 | RECEIVER2 | August, 05 2012 14:15:00+0000
243 | RECEIVER2 | August, 06 2012 06:55:00+0000
Then you can find the min and max values related to the events being grouped.
If you would like to group events 241 and 242, you need to find a rule that groups them together.
Here is the code for grouping events by event ID and event start time:
Hope this gives you an idea about the GROUP_CONCAT() function in MySQL as well as the grouping. Let me know if you find the exact SQL statement for your question or a faster solution; I'm very much interested to see that too.
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE relatedEvents
(
CORRELATION_ID INT auto_increment primary key,
RECEIVER_EVENT_ID INT,
RECEIVER_NAME VARCHAR(20),
RECEIVER_START_DATETIME DATETIME,
RECEIVER_END_DATETIME DATETIME,
SOURCE_EVENT_ID INT,
SOURCE_NAME VARCHAR(20),
SOURCE_START_DATETIME DATETIME,
SOURCE_END_DATETIME DATETIME
);
INSERT INTO relatedEvents
(RECEIVER_EVENT_ID, RECEIVER_NAME, RECEIVER_START_DATETIME,
RECEIVER_END_DATETIME, SOURCE_EVENT_ID, SOURCE_NAME, SOURCE_START_DATETIME, SOURCE_END_DATETIME)
VALUES
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '1', 'SOURCE1', '2012-08-04 02:10', '2012-08-04 02:40'),
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '2', 'SOURCE2', '2012-08-04 02:30', '2012-08-04 03:10'),
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '3', 'SOURCE2', '2012-08-04 03:15', '2012-08-04 03:30'),
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '4', 'SOURCE3', '2012-08-04 05:01', '2012-08-04 05:25'),
('241', 'RECEIVER2', '2012-08-05 10:05:00', '2012-08-05 10:35', '5', 'SOURCE1', '2012-08-05 10:01', '2012-08-05 10:15'),
('241', 'RECEIVER2', '2012-08-05 10:05:00', '2012-08-05 10:35', '6', 'SOURCE2', '2012-08-05 12:15', '2012-08-05 12:17'),
('242', 'RECEIVER2', '2012-08-05 14:15:00', '2012-08-05 14:55', '7', 'SOURCE1', '2012-08-05 13:35', '2012-08-05 13:45'),
('243', 'RECEIVER2', '2012-08-06 06:55:00', '2012-08-06 21:19', '8', 'SOURCE2', '2012-08-05 14:55', '2012-08-05 16:22');
Query 1:
SELECT
  RECEIVER_EVENT_ID AS EVENT_ID,
  O_R.RECEIVER_NAME AS Receiver_name,
  (SELECT MIN(RECEIVER_START_DATETIME) FROM relatedEvents AS I_R
     WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID
     GROUP BY I_R.RECEIVER_EVENT_ID) AS min_r_st,
  (SELECT MAX(RECEIVER_END_DATETIME) FROM relatedEvents AS I_R
     WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID
     GROUP BY I_R.RECEIVER_EVENT_ID) AS max_r_et,
  (SELECT GROUP_CONCAT(DISTINCT SOURCE_NAME) FROM relatedEvents AS I_R
     WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID
     GROUP BY I_R.RECEIVER_EVENT_ID) AS Sources,
  (SELECT MIN(SOURCE_START_DATETIME) FROM relatedEvents AS I_R
     WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID
     GROUP BY I_R.RECEIVER_EVENT_ID) AS min_s_st,
  (SELECT MAX(SOURCE_END_DATETIME) FROM relatedEvents AS I_R
     WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID
     GROUP BY I_R.RECEIVER_EVENT_ID) AS max_s_et,
  COUNT(RECEIVER_START_DATETIME) AS RST
FROM relatedEvents AS O_R
GROUP BY RECEIVER_EVENT_ID, RECEIVER_START_DATETIME
ORDER BY RECEIVER_START_DATETIME ASC
Results:
| EVENT_ID | RECEIVER_NAME | MIN_R_ST | MAX_R_ET | SOURCES | MIN_S_ST | MAX_S_ET | RST |
|----------|---------------|-------------------------------|-------------------------------|-------------------------|-------------------------------|-------------------------------|-----|
| 240 | RECEIVER1 | August, 04 2012 05:08:00+0000 | August, 04 2012 06:22:00+0000 | SOURCE1,SOURCE2,SOURCE3 | August, 04 2012 02:10:00+0000 | August, 04 2012 05:25:00+0000 | 4 |
| 241 | RECEIVER2 | August, 05 2012 10:05:00+0000 | August, 05 2012 10:35:00+0000 | SOURCE1,SOURCE2 | August, 05 2012 10:01:00+0000 | August, 05 2012 12:17:00+0000 | 2 |
| 242 | RECEIVER2 | August, 05 2012 14:15:00+0000 | August, 05 2012 14:55:00+0000 | SOURCE1 | August, 05 2012 13:35:00+0000 | August, 05 2012 13:45:00+0000 | 1 |
| 243 | RECEIVER2 | August, 06 2012 06:55:00+0000 | August, 06 2012 21:19:00+0000 | SOURCE2 | August, 05 2012 14:55:00+0000 | August, 05 2012 16:22:00+0000 | 1 |
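Since every correlated subquery in the query above is keyed on the same RECEIVER_EVENT_ID as the outer GROUP BY, they can be collapsed into plain aggregates in a single grouped query. A sketch in Python with SQLite (which also has GROUP_CONCAT; the schema is cut down to the columns actually used):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relatedEvents (
  RECEIVER_EVENT_ID INT, RECEIVER_NAME TEXT,
  RECEIVER_START_DATETIME TEXT, RECEIVER_END_DATETIME TEXT,
  SOURCE_NAME TEXT, SOURCE_START_DATETIME TEXT, SOURCE_END_DATETIME TEXT
);
INSERT INTO relatedEvents VALUES
 (240,'RECEIVER1','2012-08-04 05:08','2012-08-04 06:22','SOURCE1','2012-08-04 02:10','2012-08-04 02:40'),
 (240,'RECEIVER1','2012-08-04 05:08','2012-08-04 06:22','SOURCE2','2012-08-04 02:30','2012-08-04 03:10'),
 (241,'RECEIVER2','2012-08-05 10:05','2012-08-05 10:35','SOURCE1','2012-08-05 10:01','2012-08-05 10:15');
""")
# One grouped query replaces all five correlated subqueries.
rows = conn.execute("""
SELECT RECEIVER_EVENT_ID,
       RECEIVER_NAME,
       MIN(RECEIVER_START_DATETIME),
       MAX(RECEIVER_END_DATETIME),
       GROUP_CONCAT(DISTINCT SOURCE_NAME),
       MIN(SOURCE_START_DATETIME),
       MAX(SOURCE_END_DATETIME)
FROM relatedEvents
GROUP BY RECEIVER_EVENT_ID, RECEIVER_NAME
ORDER BY RECEIVER_EVENT_ID
""").fetchall()
for r in rows:
    print(r)
```

The correlated-subquery form and this one return the same values; the aggregate form is just shorter and avoids re-scanning the table per column.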
Assume data with a structure like this (demo):
WITH CAL AS(
SELECT 2022 YR, '01' PERIOD UNION ALL
SELECT 2022 YR, '02' PERIOD UNION ALL
SELECT 2022 YR, '03' PERIOD UNION ALL
SELECT 2022 YR, '04' PERIOD UNION ALL
SELECT 2022 YR, '05' PERIOD UNION ALL
SELECT 2022 YR, '06' PERIOD UNION ALL
SELECT 2022 YR, '07' PERIOD UNION ALL
SELECT 2022 YR, '08' PERIOD UNION ALL
SELECT 2022 YR, '09' PERIOD UNION ALL
SELECT 2022 YR, '10' PERIOD UNION ALL
SELECT 2022 YR, '11' PERIOD UNION ALL
SELECT 2022 YR, '12' PERIOD ),
Data AS (
SELECT 2022 YR, '01' PERIOD, 10 qty UNION ALL
SELECT 2022 YR, '02' PERIOD, 5 qty UNION ALL
SELECT 2022 YR, '04' PERIOD, 10 qty UNION ALL
SELECT 2022 YR, '05' PERIOD, 7 qty UNION ALL
SELECT 2022 YR, '09' PERIOD, 1 qty)
SELECT *
FROM CAL A
LEFT JOIN data B
on A.YR = B.YR
and A.Period = B.Period
WHERE A.Period <10 and A.YR = 2022
ORDER by A.period
Giving us:
+------+--------+------+--------+-----+
| YR | PERIOD | YR | PERIOD | qty |
+------+--------+------+--------+-----+
| 2022 | 01 | 2022 | 01 | 10 |
| 2022 | 02 | 2022 | 02 | 5 |
| 2022 | 03 | | | |
| 2022 | 04 | 2022 | 04 | 10 |
| 2022 | 05 | 2022 | 05 | 7 |
| 2022 | 06 | | | |
| 2022 | 07 | | | |
| 2022 | 08 | | | |
| 2022 | 09 | 2022 | 09 | 1 |
+------+--------+------+--------+-----+
With Expected result of:
+------+--------+------+--------+-----+
| YR | PERIOD | YR | PERIOD | qty |
+------+--------+------+--------+-----+
| 2022 | 01 | 2022 | 01 | 10 |
| 2022 | 02 | 2022 | 02 | 5 |
| 2022 | 03 | 2022 | 03 | 5 | -- SQL derives
| 2022 | 04 | 2022 | 04 | 10 |
| 2022 | 05 | 2022 | 05 | 7 |
| 2022 | 06 | 2022 | 06 | 7 | -- SQL derives
| 2022 | 07 | 2022 | 07 | 7 | -- SQL derives
| 2022 | 08 | 2022 | 08 | 7 | -- SQL derives
| 2022 | 09 | 2022 | 09 | 1 |
+------+--------+------+--------+-----+
QUESTION:
How would one go about filling in the gaps in periods 03, 06, 07, and 08 with a record quantity referencing the nearest earlier period/year? Note the example is limited to one year, but the gap could be in period 01 of 2022, in which case we would need to return the 2021 period 12 quantity if populated, or keep going back until a quantity is found or no such record exists.
LIMITS:
I am unable to use table value functions. (No lateral, no Cross Apply)
I'm unable to use analytics (no LEAD/LAG).
Correlated subqueries are iffy.
Why the limits? This must be done in a HANA graphical calculation view, which supports neither of those concepts. I've not done enough with correlated subqueries yet to know whether one is possible.
I can create any number of inline views or materialized datasets needed.
STATISTICS:
This table has over a million rows and grows at a rate of products × locations × periods × years. So 1000 × 20 × 12 × 6 = 1.44 million+ rows in 6 years with just 20 locations and 1000 products...
Each product's inventory may be recorded at the end of a month for a given location. (No activity for a product/location means no record, hence a gap. A silly mainframe storage-saving technique used in an RDBMS... I mean, how do I know the system didn't just error on inserting the record for that material, or omit it for some reason?)
In the cases where it is not recorded, we need to fill in the gap. The example provided is broken down to the bare bones, without location and material, as I do not believe they are salient to a solution.
ISSUE:
I'll need to convert the SQL to a "HANA Graphical calculation view"
Yes, I know I could create a SQL Script to do this. This is not allowed.
Yes, I know I could create a table function to do this. This is not allowed.
This must be accomplished through a graphical calculation view, which supports basic SQL functions:
basic joins (INNER, OUTER, FULL OUTER, CROSS), filters, aggregation, and a basic rank at a significant performance impact if all records are evaluated (and a few other things), but no window functions, no CROSS APPLY, no LATERAL...
As to why: it has to do with maintenance and staffing. The staffed area is a reporting area that uses tools to create views used in universes. The area wishes to keep all scripts out of use, to keep employee costs lower, as SQL knowledge wouldn't be required for future staff positions. Though it helps!
For those familiar, this issue is sourced from the MBEWH table in an ECC implementation.
This can be done with graphical calculation views in SAP HANA.
It's not pretty and probably not very efficient, though.
Whether the people who are supposedly able to maintain graphical calc. views but not SQL statements will be able to successfully maintain this is rather questionable.
First, the approach in SQL, so that the approach becomes clear:
create column table calendar
( yr integer
, period nvarchar (2)
, primary key (yr, period))
insert into calendar
( select year (generated_period_start) as yr
, ABAP_NUMC( month(generated_period_start), 2) as period
from series_generate_date ('INTERVAL 1 MONTH', '2022-01-01', '2023-01-01'));
create column table data
( yr integer
, period nvarchar (2)
, qty integer
, primary key (yr, period));
insert into data values (2022, '01', 10);
insert into data values (2022, '02', 5);
insert into data values (2022, '04', 10);
insert into data values (2022, '05', 7);
insert into data values (2022, '09', 1);
SELECT *
FROM CALendar A
LEFT JOIN data B
on A.YR = B.YR
and A.Period = B.Period
WHERE A.Period <'10' and A.YR =2022
ORDER BY A.period;
/*
YR PERIOD YR PERIOD QTY
2,022 01 2,022 01 10
2,022 02 2,022 02 5
2,022 03 ? ? ?
2,022 04 2,022 04 10
2,022 05 2,022 05 7
2,022 06 ? ? ?
2,022 07 ? ? ?
2,022 08 ? ? ?
2,022 09 2,022 09 1
*/
The ABAP_NUMC() function creates ABAP NUMC strings (with leading zeroes) from integers. Other than this, it's pretty much the tables from the OP.
The general approach is to use the CALENDAR table as the main driving table that establishes for which dates/periods there will be output rows.
This is outer joined with the DATA table, leaving "missing" rows with NULL in the corresponding columns.
Next, the DATA table is joined again, this time on YEAR||PERIOD combinations that are strictly smaller than the YEAR||PERIOD from the CALENDAR table. This gives us rows for all the previous records in DATA.
Next, we need to pick which of the previous rows we want to look at.
This is done via the ROWNUM() function and a filter to the first record.
As graphical calculation views don't support ROWNUM() this can be exchanged with RANK() - this works as long as there are no two actual DATA records for the same YEAR||PERIOD combination.
Finally, in the projection we use COALESCE to switch between the actual information available in DATA and - if that is NULL - the previous period information.
/*
CAL_YR CAL_PER COALESCE(DAT_YR,PREV_YR) COALESCE(DAT_PER,PREV_PER) COALESCE(DAT_QTY,PREV_QTY)
2,022 01 2,022 01 10
2,022 02 2,022 02 5
2,022 03 2,022 02 5
2,022 04 2,022 04 10
2,022 05 2,022 05 7
2,022 06 2,022 05 7
2,022 07 2,022 05 7
2,022 08 2,022 05 7
2,022 09 2,022 09 1
*/
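The approach described above (calendar drives the output, self-join on smaller YEAR||PERIOD keys, keep only the greatest such key, then fetch its quantity) can be sanity-checked without any window functions. Here is a Python/SQLite sketch under the same sample data; table names are shortened, and the MAX-over-concatenated-key trick assumes the fixed-width NUMC period format:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cal(yr INT, period TEXT);
CREATE TABLE data(yr INT, period TEXT, qty INT);
INSERT INTO cal VALUES (2022,'01'),(2022,'02'),(2022,'03'),(2022,'04'),
 (2022,'05'),(2022,'06'),(2022,'07'),(2022,'08'),(2022,'09');
INSERT INTO data VALUES (2022,'01',10),(2022,'02',5),(2022,'04',10),
 (2022,'05',7),(2022,'09',1);
""")
rows = conn.execute("""
-- For every calendar period, find the latest data key at or before it
-- (plain join + GROUP BY, no window functions), then join back for qty.
SELECT c.yr, c.period, d.qty
FROM cal c
JOIN (SELECT c2.yr AS cyr, c2.period AS cper,
             MAX(d2.yr || d2.period) AS prev_key
      FROM cal c2
      JOIN data d2 ON (d2.yr || d2.period) <= (c2.yr || c2.period)
      GROUP BY c2.yr, c2.period) pick
  ON pick.cyr = c.yr AND pick.cper = c.period
JOIN data d ON (d.yr || d.period) = pick.prev_key
ORDER BY c.yr, c.period
""").fetchall()
for r in rows:
    print(r)
```

Calendar rows with no earlier data key drop out entirely, which matches the "no such record exists" case from the question; everything else is forward-filled from the nearest earlier period.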
So far, so good.
The graphical calc. view for that looks like this:
As it's cumbersome to screenshot every single node, I will include just the most important ones:
1. CAL_DAT_PREV
Since only equality joins are supported in graphical calc. views, we have to emulate the "larger than" join. For that, I created two calculated/constant columns, join_const, with the same value (integer 1 in this case) and joined on those.
2. PREVS_ARE_OLDER
This is the second part of the emulated "larger than" join: this projection simply filters out the records where prev_yr_per is larger than cal_yr_per. Equal values must be allowed here, since we don't want to lose records for which there is no smaller YEAR||PERIOD combination. Alternatively, one could insert an initial record into the DATA table that is guaranteed to be smaller than all other entries, e.g. YEAR = 0001 and PERIOD = 00 or something similar. If you're familiar with SAP application tables, then you've seen this approach.
By the way - for convenience reasons, I created calculated columns that combine the YEAR and PERIOD for the different tables - cal_yr_per, dat_yr_per, and prev_yr_per.
3. RANK_1
Here the rank is created over prev_yr_per, picking only the first one, and starting a new group for every new value of cal_yr_per.
This value is returned via Rank_Column.
4. REDUCE_PREV
The final piece of the puzzle: using a filter on Rank_Column = 1 we ensure to only get one "previous" row for every "calendar" row.
Also: by means of IF(ISNULL(...), ... , ...) we emulate COALESCE(...) in three calculated columns, aptly named FILL....
And that's the nuts and bolts of this solution.
"It works on my computer!" is probably the best I can say about it.
SELECT "CAL_YR", "CAL_PERIOD"
, "DAT_YR", "DAT_PER", "DAT_QTY"
, "FILL_YR", "FILL_QTY", "FILL_PER"
FROM "_SYS_BIC"."scratch/QTY_FILLUP"
ORDER BY "CAL_YR" asc, "CAL_PERIOD" asc;
/*
CAL_YR CAL_PERIOD DAT_YR DAT_PER DAT_QTY FILL_YR FILL_QTY FILL_PER
2,022 01 2,022 01 10 2,022 10 01
2,022 02 2,022 02 5 2,022 5 02
2,022 03 ? ? ? 2,022 5 02
2,022 04 2,022 04 10 2,022 10 04
2,022 05 2,022 05 7 2,022 7 05
2,022 06 ? ? ? 2,022 7 05
2,022 07 ? ? ? 2,022 7 05
2,022 08 ? ? ? 2,022 7 05
2,022 09 2,022 09 1 2,022 1 09
2,022 10 ? ? ? 2,022 1 09
2,022 11 ? ? ? 2,022 1 09
2,022 12 ? ? ? 2,022 1 09
*/
I am trying to write a query using the SQL Server 2012 LAG function to retrieve data from my [Order] table where the datetime difference between a row and the previous row is less than or equal to 2 minutes.
The result I'm expecting is
1234 April, 28 2012 09:00:00
1234 April, 28 2012 09:01:00
1234 April, 28 2012 09:03:00
5678 April, 28 2012 09:40:00
5678 April, 28 2012 09:42:00
5678 April, 28 2012 09:44:00
but I'm seeing
1234 April, 28 2012 09:00:00
1234 April, 28 2012 09:01:00
1234 April, 28 2012 09:03:00
5678 April, 28 2012 09:40:00
5678 April, 28 2012 09:42:00
5678 April, 28 2012 09:44:00
91011 April, 28 2012 10:00:00
The last row should not be returned. Here is what I have tried: SQL Fiddle
Any one with ideas?
Okay, first of all: I added a row to show where someone else's answer doesn't work, but they have since deleted it.
Now for the logic in my query. You said you want each row that is within two minutes of another row. That means you have to look not only backwards with LAG(), but also forwards with LEAD(). In your query, you returned rows where the previous time was NULL, so it simply returned the first value of each OrderNumber regardless of whether it was right or wrong. By chance, the first value of each of your OrderNumbers needed to be returned, until you got to the last OrderNumber, where it broke. My query corrects that and should work for all your data.
CREATE TABLE [Order]
(
OrderNumber VARCHAR(20) NOT NULL
, OrderDateTime DATETIME NOT NULL
);
INSERT [Order] (OrderNumber, OrderDateTime)
VALUES
('1234', '2012-04-28 09:00:00'),
('1234', '2012-04-28 09:01:00'),
('1234', '2012-04-28 09:03:00'),
('5678', '2012-04-28 09:40:00'),
('5678', '2012-04-28 09:42:00'),
('5678', '2012-04-28 09:44:00'),
('91011', '2012-04-28 10:00:00'),
('91011', '2012-04-28 10:25:00'),
('91011', '2012-04-28 10:27:00');
with Ordered as (
select
OrderNumber,
OrderDateTime,
LAG(OrderDateTime,1) over (
partition by OrderNumber
order by OrderDateTime
) as prev_time,
LEAD(OrderDateTime,1) over (
partition by OrderNumber
order by OrderDateTime
) as next_time
from [Order]
)
SELECT OrderNumber,
OrderDateTime
FROM Ordered
WHERE DATEDIFF(MINUTE,OrderDateTime,next_time) <= 2 --this says if the next value is less than or equal to two minutes away return it
OR DATEDIFF(MINUTE,prev_time,OrderDateTime) <= 2 --this says if the prev value is less than or equal to 2 minutes away return it
Results (remember I added a row):
OrderNumber OrderDateTime
-------------------- -----------------------
1234 2012-04-28 09:00:00.000
1234 2012-04-28 09:01:00.000
1234 2012-04-28 09:03:00.000
5678 2012-04-28 09:40:00.000
5678 2012-04-28 09:42:00.000
5678 2012-04-28 09:44:00.000
91011 2012-04-28 10:25:00.000
91011 2012-04-28 10:27:00.000
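The keep-if-near-a-neighbor rule can also be checked outside SQL. A small Python sketch over the sample rows; note it uses plain timedelta comparison, which matches DATEDIFF(MINUTE, ...) only when timestamps land on whole minutes, as they do here:

```python
from datetime import datetime, timedelta
from itertools import groupby

orders = [
    ("1234", "2012-04-28 09:00"), ("1234", "2012-04-28 09:01"),
    ("1234", "2012-04-28 09:03"), ("5678", "2012-04-28 09:40"),
    ("5678", "2012-04-28 09:42"), ("5678", "2012-04-28 09:44"),
    ("91011", "2012-04-28 10:00"), ("91011", "2012-04-28 10:25"),
    ("91011", "2012-04-28 10:27"),
]
fmt = "%Y-%m-%d %H:%M"
limit = timedelta(minutes=2)

kept = []
# Partition by OrderNumber (input is already sorted), then keep a row
# if either its predecessor or its successor is within the limit.
for num, grp in groupby(orders, key=lambda r: r[0]):
    times = [datetime.strptime(t, fmt) for _, t in grp]
    for i, t in enumerate(times):
        near_prev = i > 0 and t - times[i - 1] <= limit
        near_next = i + 1 < len(times) and times[i + 1] - t <= limit
        if near_prev or near_next:
            kept.append((num, t.strftime(fmt)))
print(kept)
```

As in the SQL answer, the lone 91011 row at 10:00 is the only one dropped: it is more than two minutes away from both of its neighbors.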
I'm having trouble with something that I thought would've been simple...
I have a simple model Statistic that stores a date (created_at), a user_fingerprint, and a structure_id. From that, I'd like to create a graph showing the number of visitors per day.
So I did
@structure.statistics.order('DATE(created_at) ASC').group('DATE(created_at)').count
Which works and return what I expect:
=> {Sat, 18 May 2014=>50, Mon, 19 May 2014=>90}
Now I'd like the same, but I want to collapse all rows sharing the same (created_at, user_fingerprint) pair. For instance:
| created_at | user_fingerprint | structure_id |
|----------------------|------------------|--------------|
| Sat, 18 May 2014 2PM | '124512341' | 12 |
| Sat, 18 May 2014 4PM | '124512341' | 12 |
| Mon, 19 May 2014 6PM | '124512341' | 12 |
With this data, I would have:
=> {Sat, 18 May 2014=>1, Mon, 19 May 2014=>1}
# instead of
=> {Sat, 18 May 2014=>2, Mon, 19 May 2014=>1}
I would be able to do it in Ruby but I wondered if I could directly do it with SQL & Arel.
Solution regarding your answers
Here is what I did at the end:
@impressions = {}
# The following ensures I will have a key even when there is no stat for a day.
(15.days.ago.to_date..Date.today).each { |date| @impressions[date] = 0 }
@structure.statistics.where( Statistic.arel_table[:created_at].gt(Date.today - 15.days) )
          .order('DATE(created_at) ASC')
          .group('DATE(created_at)')
          .select('DATE(created_at) as created_at, COUNT(DISTINCT(user_fingerprint)) as user_count')
          .each { |stat| @impressions[stat.created_at] = stat.user_count }
I need to do a bit of Ruby though but that's good for me.
Your query would look something like this (Oracle dialect):
select trunc(created_at), count(distinct user_fingerprint)
from statistic
group by trunc(created_at)
There is no SQL standard for getting the date portion out of a datetime field:
oracle: trunc(dt_column)
sql server: cast(dt_column As Date)
mysql: DATE(dt_column)
@structure.statistics.order('DATE(created_at) ASC').group('DATE(created_at)').select('count(distinct(user_fingerprint)) as user_count').first.user_count
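For reference, the distinct-per-day aggregation that ActiveRecord generates here can be sketched directly in SQL. A Python/SQLite version using the sample rows from the question (SQLite's DATE() plays the role of MySQL's DATE()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statistics(created_at TEXT, user_fingerprint TEXT, structure_id INT);
INSERT INTO statistics VALUES
 ('2014-05-18 14:00','124512341',12),
 ('2014-05-18 16:00','124512341',12),
 ('2014-05-19 18:00','124512341',12);
""")
# COUNT(DISTINCT ...) collapses repeat visits by the same fingerprint
# within a day into a single count.
rows = conn.execute("""
SELECT DATE(created_at) AS day,
       COUNT(DISTINCT user_fingerprint) AS user_count
FROM statistics
GROUP BY DATE(created_at)
ORDER BY day
""").fetchall()
print(dict(rows))
```

This yields one row per day with the fingerprint deduplicated, i.e. {18 May: 1, 19 May: 1} rather than {18 May: 2, 19 May: 1}.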
I have the table Distractions with the following columns:
id, startTime, endTime (possibly null)
I also have two parameters defining a period: pstart and pend.
I have to find all distractions within the period and count the hours.
For example, we have:
Distractions:
`id` `startTime` `endTime`
1 01.01.2014 00:00 03.01.2014 00:00
2 25.03.2014 00:00 02.04.2014 00:00
3 27.03.2014 00:00 null
The columns contain a time component, but don't use it.
The period is pstart = 01.01.2014 and pend = 31.03.2014.
For the example above, the result is:
for id = 1: 72 hours
for id = 2: 168 hours (7 days, from the 25th to the 31st, the end of the period)
for id = 3: 120 hours (5 days, from the 27th to the 31st; the distraction is not completed, therefore use the end of the period)
The sum is 360.
My code:
select
sum ((ds."endTime" - ds."startTime")*24) as hoursCount
from "Distractions" ds
--where ds."startTime" >= :pstart and ds."endTime" <= :pend
-- I don't know how to create where condition properly.
You'll have to take care of cases where date ranges fall outside the input range, and also account for starttime and endtime being null.
This WHERE clause keeps only the valid date ranges. I have substituted a null starttime with the earliest possible date and a null endtime with a date far in the future.
where coalesce(endtime,date'9999-12-31') >= :pstart
and coalesce(starttime,date'0000-01-01') <= :pend
Once you have filtered the records, you need to adjust the date values so that anything starting before the input :pstart is moved forward to :pstart, and anything ending after :pend is moved back to :pend. Subtracting these two gives the value you are looking for. But there is a catch: since the time is 00:00:00, subtracting the dates misses one full day, so add 1 to the difference.
SQL Fiddle
Oracle 11g R2 Schema Setup:
create table myt(
id number,
starttime date,
endtime date
);
insert into myt values( 1 ,date'2014-01-01', date'2014-01-03');
insert into myt values( 2 ,date'2014-03-25', date'2014-04-02');
insert into myt values( 3 ,date'2014-03-27', null);
insert into myt values( 4 ,null, date'2013-04-02');
insert into myt values( 5 ,date'2015-03-25', date'2015-04-02');
insert into myt values( 6 ,date'2013-12-25', date'2014-04-09');
insert into myt values( 7 ,date'2013-12-26', date'2014-01-09');
Query 1:
select id,
case when coalesce(starttime,date'0000-01-01') < date'2014-01-01'
then date'2014-01-01'
else starttime
end adj_starttime,
case when coalesce(endtime,date'9999-12-31') > date'2014-03-31'
then date'2014-03-31'
else endtime
end adj_endtime,
(case when coalesce(endtime,date'9999-12-31') > date'2014-03-31'
then date'2014-03-31'
else endtime
end -
case when coalesce(starttime,date'0000-01-01') < date'2014-01-01'
then date'2014-01-01'
else starttime
end
+ 1) * 24 hoursCount
from myt
where coalesce(endtime,date'9999-12-31') >= date'2014-01-01'
and coalesce(starttime,date'0000-01-01') <= date'2014-03-31'
order by 1
Results:
| ID | ADJ_STARTTIME | ADJ_ENDTIME | HOURSCOUNT |
|----|--------------------------------|--------------------------------|------------|
| 1 | January, 01 2014 00:00:00+0000 | January, 03 2014 00:00:00+0000 | 72 |
| 2 | March, 25 2014 00:00:00+0000 | March, 31 2014 00:00:00+0000 | 168 |
| 3 | March, 27 2014 00:00:00+0000 | March, 31 2014 00:00:00+0000 | 120 |
| 6 | January, 01 2014 00:00:00+0000 | March, 31 2014 00:00:00+0000 | 2160 |
| 7 | January, 01 2014 00:00:00+0000 | January, 09 2014 00:00:00+0000 | 216 |
I have a query returning a table which looks like:
| Location | November | December | January | February | March | ... | October |
|----------|----------|----------|---------|----------|-------|-----|---------|
| CT       | 30       | 70       | 80      | 90       | 60    | ... | 30      |
etc.
and I'd like it to look like:
| Location | Month    | Value |
|----------|----------|-------|
| CT       | November | 30    |
| CT       | December | 70    |
| CT       | January  | 80    |
| ...      | ...      | ...   |
| CT       | October  | 30    |
It looks like an unpivot, but I didn't pivot to get it into that form since the base table has the months as columns (the values are just sums of values grouped by location). I've seen plenty of rows-to-columns questions but I haven't found a good columns-to-rows answer, so I'm hoping someone can help me.
Thanks!
You will want to use the UNPIVOT function. This transforms the values of your columns into rows:
select location, month, value
from <yourquery here>
unpivot
(
value
for month in (January, February, March, April, May, June, July, August,
September, October, November, December)
) unpiv
See SQL Fiddle with Demo
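If you ever need the same reshaping on a database without UNPIVOT, a UNION ALL per month column is the portable equivalent. A sketch in Python/SQLite with a cut-down three-month table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pivoted(Location TEXT, November INT, December INT, January INT)")
conn.execute("INSERT INTO pivoted VALUES ('CT', 30, 70, 80)")
# SQLite has no UNPIVOT; one SELECT per source column, glued with
# UNION ALL, produces the same (Location, Month, Value) rows.
rows = conn.execute("""
SELECT Location, 'November' AS Month, November AS Value FROM pivoted
UNION ALL
SELECT Location, 'December', December FROM pivoted
UNION ALL
SELECT Location, 'January', January FROM pivoted
""").fetchall()
print(rows)
```

The trade-off is one table scan per column versus UNPIVOT's single pass, which rarely matters for a dozen month columns.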