The table I'm trying to get analytical results from is a record of voice calls. Each call (one row) has a duration in seconds (just an int value, not a datetime). I'm trying to get the number of records grouped into 15-second spans, like this:
+--------+--------+
| Period | Count  |
+--------+--------+
| 0-15   | 213421 |
| 15-30  | 231123 |
| 30-45  |   1234 |
+--------+--------+
The first period starts at 0 and the last ends at 86400 (one day in seconds).
I've tried some combinations with setting variables for the start and end and then incrementing them by 15, but I'm not having much success, and I can't understand what is going wrong.
Please help out. Thank you!
The following query fits your needs:
SELECT
(duration - duration % 15),
COUNT(*)
FROM
call
GROUP BY
(duration - duration % 15)
ORDER BY
1;
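As a quick sanity check, the same `duration - duration % 15` bucketing can be reproduced with Python's built-in sqlite3; the table name `call` follows the answer above, and the handful of sample durations is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call (duration INTEGER NOT NULL)")
conn.executemany("INSERT INTO call VALUES (?)",
                 [(3,), (14,), (15,), (29,), (31,), (44,)])

# duration - duration % 15 rounds each duration down to its bucket start
rows = conn.execute("""
    SELECT duration - duration % 15 AS bucket_start, COUNT(*)
    FROM call
    GROUP BY bucket_start
    ORDER BY 1
""").fetchall()
print(rows)  # -> [(0, 2), (15, 2), (30, 2)]
```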
You can also add some string formatting, in case you need the output exactly as you described it:
SELECT
(duration - duration % 15)::text || '-' || (duration - duration % 15 + 15)::text,
COUNT(*)
FROM
call
GROUP BY
(duration - duration % 15)
ORDER BY
1;
In MySQL:
SELECT CONCAT(span * 15, '-', span * 15 + 15), COUNT(*) AS cnt
FROM (
SELECT v.*, FLOOR(duration / 15) AS span
FROM voice_calls v
) q
GROUP BY
span
UPDATE:
The solution you posted will work, assuming calls always contains at least 5760 rows.
But you'd be better off creating a dummy rowset of 5760 rows and using it in an OUTER JOIN:
CREATE TABLE spans (span INT NOT NULL PRIMARY KEY);
INSERT
INTO spans
VALUES (0),
(1),
...
(5759)
SELECT span * 15, COUNT(calls.duration)
FROM spans
LEFT JOIN
calls
ON calls.duration >= span * 15
AND calls.duration < span * 15 + 15
GROUP BY
span
It will be more efficient and sane, as it can neither underflow (if there are fewer than 5760 rows in calls) nor take too much time if there are millions of rows there.
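A minimal sqlite3 sketch of the spans approach, scaled down to 4 spans instead of 5760; note COUNT(calls.duration) rather than COUNT(*), so spans with no matching calls report 0 instead of 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (span INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO spans VALUES (?)", [(i,) for i in range(4)])
conn.execute("CREATE TABLE calls (duration INTEGER NOT NULL)")
conn.executemany("INSERT INTO calls VALUES (?)", [(3,), (14,), (31,)])

# LEFT JOIN keeps every span; counting a calls column (not *) yields 0
# for spans that matched nothing
rows = conn.execute("""
    SELECT span * 15, COUNT(calls.duration)
    FROM spans
    LEFT JOIN calls
      ON calls.duration >= span * 15
     AND calls.duration <  span * 15 + 15
    GROUP BY span
    ORDER BY span
""").fetchall()
print(rows)  # -> [(0, 2), (15, 0), (30, 1), (45, 0)]
```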
I think a query like this should work; you'll have to pivot the results yourself on the display though (e.g., this query gets the results horizontally, you'll have to display them vertically):
SELECT
SUM(CASE WHEN CallLength >= 0 AND CallLength < 15 THEN 1 ELSE 0 END) AS ZeroToFifteen,
SUM(CASE WHEN CallLength >= 15 AND CallLength < 30 THEN 1 ELSE 0 END) AS FifteenToThirty
FROM CallTable
But after re-reading the question, putting case statements up to 86400 is probably out of the question... Oh well :)
This worked at last:
SET @a := -15, @b := 0;
SELECT t.start_time, t.end_time, count(c.duration)
FROM calls c,
(
SELECT (@a := @a + 15) AS start_time, (@b := @b + 15) AS end_time
FROM `calls`
GROUP BY
cdr_id
) AS t
WHERE c.duration >= t.start_time AND c.duration < t.end_time
GROUP BY
t.start_time, t.end_time
Related
Given the following table in PostgreSQL
CREATE TABLE my_table (
time TIMESTAMP NOT NULL,
minutes INTEGER NOT NULL);
I am forming a query to detect when the accumulated value of 'minutes' crosses an hour boundary. For example with the following data in the table:
time | minutes
-------------------------
<some timestamp> | 55
<some timestamp> | 4
I want to know how many minutes remain before we reach 60 (one hour). In the example the answer would be 1 since 55 + 4 + 1 = 60.
Further, I would like to know this at insert time, so if my last insert made the accumulated number of minutes cross an "hour boundary" I would like it to return boolean true.
My naive attempt, without the insert part, looks like this:
SELECT
make_timestamptz(
date_part('year', (SELECT current_timestamp))::int,
date_part('month', (SELECT current_timestamp))::int,
date_part('day', (SELECT current_timestamp))::int,
date_part('hour', (SELECT current_timestamp))::int,
0,
0
) AS current_hour,
SUM(minutes) as sum_minutes
FROM
my_table
WHERE
sum_minutes >= 60
I would then take a row count above 0 to mean we crossed the boundary. But it is hopelessly inelegant, and it does not work. Is this even possible? Would it be possible to make it somewhat performant?
I am using Timescaledb/PostgreSQL on linux.
Hmmm . . . insert doesn't really return values. But you can use a CTE to do the insert and then sum the values after the insert:
with i as (
insert into my_table ( . . . )
values ( . . . )
returning *
)
select ( coalesce(i.minutes, 0) + coalesce(t.minutes, 0) ) > 60
from (select sum(minutes) as minutes from i) i cross join
(select sum(minutes) as minutes from my_table) t
The INSERT could look like this:
WITH cur_sum AS (
SELECT coalesce(sum(minutes), 0) AS minutes
FROM my_table
WHERE date_trunc('hour', current_timestamp) = date_trunc('hour', time)
)
INSERT INTO my_table (time, minutes)
SELECT current_timestamp, 12 FROM cur_sum
RETURNING cur_sum.minutes + 12 > 60;
This example inserts 12 minutes at the current time.
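The boundary test itself is simple arithmetic; here is a plain-Python sketch of what the SQL above computes (taking "reach 60" to count as crossing, per the question's 55 + 4 + 1 = 60 example):

```python
# Does adding new_minutes to the minutes already accumulated this hour
# reach or cross the 60-minute boundary?
def crosses_hour(accumulated: int, new_minutes: int) -> bool:
    return accumulated < 60 <= accumulated + new_minutes

print(crosses_hour(55, 4))   # 59 total -> False
print(crosses_hour(55, 5))   # 60 total -> True
```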
I want to retrieve column data at 5-minute intervals for the whole day, from 6 in the morning to 12 at night, from an MS Access DB.
e.g. I have an EMP_DATA table.
The result will be:
I want an SQL query which gives me the above result.
If your data is complete, you can just filter on the minute and then use the mod operator:
select t.*
from t
where minute(time_value) mod 5 = 0;
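For reference, the same minute-mod-5 filter can be tried out with sqlite3, where minute(x) is spelled strftime('%M', x); the table name and timestamps below are invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (time_value TEXT NOT NULL)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("2023-01-01 06:00:00",),
                  ("2023-01-01 06:02:00",),
                  ("2023-01-01 06:05:00",),
                  ("2023-01-01 06:07:30",)])

# Keep only rows whose minute component is a multiple of 5
rows = conn.execute("""
    SELECT time_value
    FROM t
    WHERE CAST(strftime('%M', time_value) AS INTEGER) % 5 = 0
    ORDER BY time_value
""").fetchall()
print(rows)  # the 06:00 and 06:05 rows survive the filter
```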
I tried the query below (MS SQL), which gives the correct result:
select *
from yourTableName
where CAST(DATEPART(minute, Date_Time) as int) % 5 = 0
  and (CAST(DATEPART(second, Date_Time) as int) = 0
       OR CAST(DATEPART(second, Date_Time) as int) = 1)
order by Date_Time
CAST(DATEPART(minute, Date_Time) as int) % 5 = 0 keeps only the rows whose minute has remainder 0 when divided by 5, and CAST(DATEPART(second, Date_Time) as int) = 0 keeps the records at second 00 (second 01 is allowed too, in case logging is slightly off). So the expected results are retrieved.
I have a table organized as follows:
id lateAt
1231235 2019/09/14
1242123 2019/09/13
3465345 NULL
5676548 2019/09/28
8986475 2019/09/23
Where lateAt is a timestamp of when a certain loan's payment became late. So, for each current date (I need to look at these numbers daily) there's a certain number of entries which are late by 0-15, 15-30, 30-45, 45-60, 60-90 and 90+ days.
This is my desired output:
lateGroup Count
0-15 20
15-30 22
30-45 25
45-60 32
60-90 47
90+ 57
This is something I can easily calculate in R, but to get the results back to my BI dashboard I'd have to create a new table in my database, which I don't think is a good practice. What is the SQL-native approach to this problem?
I would define the "late groups" using a range type, then join against the number of days late:
with groups (grp) as (
values
(int4range(0,15, '[)')),
(int4range(15,30, '[)')),
(int4range(30,45, '[)')),
(int4range(45,60, '[)')),
(int4range(60,90, '[)')),
(int4range(90,null, '[)'))
)
select grp, count(t.id)
from groups g
left join the_table t on g.grp @> (current_date - t.lateAt)
group by grp
order by grp;
int4range(0,15, '[)') creates a range from 0 (inclusive) to 15 (exclusive).
Online example: https://rextester.com/QJSN89445
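The half-open [lo, hi) bucket logic is the same in any language; here is a small Python sketch with labels and edges mirroring the int4range rows above:

```python
import bisect

# Half-open buckets [lo, hi) matching the int4range(..., '[)') rows;
# 90+ is the open-ended last group.
edges = [0, 15, 30, 45, 60, 90]
labels = ["0-15", "15-30", "30-45", "45-60", "60-90", "90+"]

def late_group(days_late: int) -> str:
    # bisect_right counts how many bucket edges are <= days_late,
    # which picks the bucket whose lower edge applies
    return labels[bisect.bisect_right(edges, days_late) - 1]

print(late_group(0), late_group(15), late_group(89), late_group(200))
```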
The quick and dirty way to do this in SQL is:
SELECT '0-15' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 0
AND (CURRENT_DATE - t.lateAt) < 15
UNION
SELECT '15-30' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 15
AND (CURRENT_DATE - t.lateAt) < 30
UNION
SELECT '30-45' AS lateGroup,
COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 30
AND (CURRENT_DATE - t.lateAt) < 45
-- Etc...
For production code, you would want to do something more like Ross' answer.
You didn't mention which DBMS you're using, but nearly all of them will have a construct known as a "value constructor" like this:
select bins.lateGroup, bins.minVal, bins.maxVal FROM
(VALUES
('0-15',0,15),
('15-30',15.0001,30), -- increase by a small fraction so bins don't overlap
('30-45',30.0001,45),
('45-60',45.0001,60),
('60-90',60.0001,90),
('90-99999',90.0001,99999)
) AS bins(lateGroup,minVal,maxVal)
If your DBMS doesn't have it, then you can probably use UNION ALL:
SELECT '0-15' as lateGroup, 0 as minVal, 15 as maxVal
union all SELECT '15-30',15,30
union all SELECT '30-45',30,45
Then your complete query, with the sample data you provided, would look like this:
--- example from SQL Server 2012 SP1
--- first let's set up some sample data
create table #temp (id int, lateAt datetime);
INSERT #temp (id, lateAt) values
(1231235,'2019-09-14'),
(1242123,'2019-09-13'),
(3465345,NULL),
(5676548,'2019-09-28'),
(8986475,'2019-09-23');
--- here's the actual query
select lateGroup, count(*) as [Count]
from #temp as T,
(VALUES
('0-15',0,15),
('15-30',15.0001,30), -- increase by a small fraction so bins don't overlap
('30-45',30.0001,45),
('45-60',45.0001,60),
('60-90',60.0001,90),
('90-99999',90.0001,99999)
) AS bins(lateGroup,minVal,maxVal)
where datediff(day,lateAt,getdate()) between minVal and maxVal
group by lateGroup
order by lateGroup
--- remove our sample data
drop table #temp;
Here's the output:
lateGroup Count
15-30 2
30-45 2
Note: rows with null lateAt are not counted.
I think you can do it all in one clear query:
with cte_lategroup as
(
select *
from (values(0,15,'0-15'),(15,30,'15-30'),(30,45,'30-45')) as t (mini, maxi, designation)
)
select
t2.designation
, count(*)
from test t
left outer join cte_lategroup t2
on current_date - t.lateAt >= t2.mini
and current_date - t.lateAt < t2.maxi
group by t2.designation;
With a setup like yours:
create table test
(
id int
, lateAt date
);
insert into test
values (1231235, to_date('2019/09/14', 'yyyy/mm/dd'))
,(1242123, to_date('2019/09/13', 'yyyy/mm/dd'))
,(3465345, null)
,(5676548, to_date('2019/09/28', 'yyyy/mm/dd'))
,(8986475, to_date('2019/09/23', 'yyyy/mm/dd'));
I have something like this:
SELECT *
FROM (
SELECT prodid, date, time, tmp, rowid
FROM live_pilot_plant
WHERE date BETWEEN CONVERT(DATETIME, '3/19/2012', 101)
AND CONVERT(DATETIME, '3/31/2012', 101)
) b
WHERE b.rowid % 400 = 0
FYI: the reason for the CONVERT in the WHERE clause is that my date is stored as a varchar(10); I had to convert it to datetime in order to get the correct range of data. (I tried a bunch of different things and this worked.)
I'm wondering how I can return the data I want every 4 hours during those selected dates. I have data collected approximately every 5 seconds, with some breaks: e.g. data wasn't collected during a 2-hour period, but then continues at 5-second increments.
In my example I just used a modulo with my rowid, and the syntax works, but as I mentioned above there are periods where data isn't collected, so logic like "data arrives every 5 seconds, so multiply by 4 hours to know how many rows are in between" won't work.
My time column is a varchar column and is in the form hh:mm:ss
My ideal output is:
| prodid | date | time | tmp |
| 4 | 3/19/2012 | 10:00:00 | 2.3 |
| 7 | 3/19/2012 | 14:00:24 | 3.2 |
As you can see I can be a bit off (in terms of seconds) - I more so need the approximate value in terms of time.
Thank you in advance.
This should work
select prodid, date, time, tmp, rowid
from live_pilot_plant as lpp
inner join (
select min(prodid) as prodid -- is prodid your PK? If not, change it to rowid or whatever else your PK is
from live_pilot_plant
WHERE date BETWEEN CONVERT(DATETIME, '3/19/2012', 101) -- or whatever you want
AND CONVERT(DATETIME, '3/31/2012', 101) -- for better performance it is on the inner select
group by date,
floor( -- floor makes the trick
convert(float,convert(datetime, time)) -- assumes "time" column is a varchar containing data like '19:23:05'
* 6 -- 6 comes form 24 hours / 4 hours
)
) as filter on lpp.prodid = filter.prodid -- if prodid is not the PK also correct here.
A side note for everyone else who has date + time data in a single datetime field, say named "when_it_was": the group by can be as simple as:
group by floor(convert(float, when_it_was) * 6) -- again, 6 comes from 24/4
Something along the lines of the following should work. Basically, create date + time partitions, each partition representing a block of 4 hours, and pick the record with the highest rank from each partition:
select * from (
select *,
row_number() over (partition by date,cast(left( time, charindex( ':', time) - 1) as int) / 4 order by
date, time) as ranker from live_pilot_plant
) Z where ranker = 1
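The partition-and-pick-first idea can be sketched in plain Python, using hour // 4 as the 4-hour block key (the sample rows are invented, shaped like the question's ideal output):

```python
from itertools import groupby

# (date, time, tmp) rows, roughly every few hours
rows = [
    ("3/19/2012", "06:01:10", 1.1),
    ("3/19/2012", "07:30:00", 1.5),
    ("3/19/2012", "10:00:00", 2.3),
    ("3/19/2012", "14:00:24", 3.2),
]

def block(row):
    date, time, _ = row
    return (date, int(time.split(":")[0]) // 4)   # hour // 4 = 4-hour block

rows.sort(key=lambda r: (r[0], r[1]))             # order by date, time
# first row of each (date, block) group = ranker = 1
picked = [next(grp) for _, grp in groupby(rows, key=block)]
print(picked)  # one row per 4-hour block: 06:01, 10:00, 14:00
```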
Assuming rowid is a PK and increases with date/time: just convert the time field to a 4-hour interval number, convert(int, substring(time,1,2))/4, and select MIN(rowid) from each 4-hour group in a day:
select prodid, date, time, tmp, rowid from live_pilot_plant where rowid in
(
select min(rowid)
from live_pilot_plant
WHERE CONVERT(DATETIME, date, 101) BETWEEN CONVERT(DATETIME, '3/19/2012', 101)
AND CONVERT(DATETIME, '3/31/2012', 101)
group by date,convert(int,substring(time,1,2))/4
)
order by CONVERT(DATETIME, date, 101),time
I am trying to create a statement in SQL (for a table which holds stock symbols and prices on specified dates) with the average price over the last 5 days and over the last 15 days for each symbol.
Table columns:
symbol
open
high
close
date
The average price is calculated from last 5 days and last 15 days. I tried this for getting 1 symbol:
SELECT avg(close),
avg(`trd_qty`)
FROM (SELECT *
FROM cashmarket
WHERE symbol = 'hdil'
ORDER BY `M_day` desc
LIMIT 0,15 ) s
but I couldn't get the desired list showing the avg values for all symbols.
You can either do it with row numbers as suggested by astander, or you can do it with dates.
This solution will also take the last 15 days if you don't have rows for every day while the row number solution takes the last 15 rows. You have to decide which one works better for you.
EDIT: replaced AVG; CASE is used to avoid division by 0 in case no records are found within the period.
SELECT
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.close * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS close_5,
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.trd_qty * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS trd_qty_5,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.close * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS close_15,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.trd_qty * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS trd_qty_15
FROM
(
SELECT
cashmarket.*,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 15, 1, 0) AS is_15,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 5, 1, 0) AS is_5
FROM cashmarket
) c
The query returns the averages of close and trd_qty for the last 5 and the last 15 days. Current date is included, so it's actually today plus the last 4 days (replace < by <= to get current day plus 5 days).
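The indicator-column trick can be tried with sqlite3, where the IF(...) conditions become boolean expressions evaluating to 0/1; the ten rows of sample closes (1 through 10, oldest first) and the fixed "today" are invented:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cashmarket (symbol TEXT, close REAL, m_day TEXT)")
today = date(2024, 1, 20)
# Closes 1..10 on the last ten days, oldest first
conn.executemany("INSERT INTO cashmarket VALUES (?, ?, ?)",
                 [("hdil", i + 1, (today - timedelta(days=9 - i)).isoformat())
                  for i in range(10)])

# is_5 / is_15 are 1 inside the window, 0 outside, so
# SUM(close * flag) / SUM(flag) is the windowed average
row = conn.execute("""
    SELECT SUM(close * is_5) * 1.0 / SUM(is_5)   AS close_5,
           SUM(close * is_15) * 1.0 / SUM(is_15) AS close_15
    FROM (SELECT close,
                 julianday(?) - julianday(m_day) < 5  AS is_5,
                 julianday(?) - julianday(m_day) < 15 AS is_15
          FROM cashmarket)
""", (today.isoformat(), today.isoformat())).fetchone()
print(row)  # 5-day avg of closes 6..10 is 8.0; 15-day avg of 1..10 is 5.5
```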
Use:
SELECT DISTINCT
t.symbol,
x.avg_5_close,
y.avg_15_close
FROM CASHMARKET t
LEFT JOIN (SELECT cm_5.symbol,
AVG(cm_5.close) 'avg_5_close',
AVG(cm_5.trd_qty) 'avg_5_qty'
FROM CASHMARKET cm_5
WHERE cm_5.M_day BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
GROUP BY cm_5.symbol) x ON x.symbol = t.symbol
LEFT JOIN (SELECT cm_15.symbol,
AVG(cm_15.close) 'avg_15_close',
AVG(cm_15.trd_qty) 'avg_15_qty'
FROM CASHMARKET cm_15
WHERE cm_15.M_day BETWEEN DATE_SUB(NOW(), INTERVAL 15 DAY) AND NOW()
GROUP BY cm_15.symbol) y ON y.symbol = t.symbol
I'm unclear on what trd_qty is, or how it factors into your equation considering it isn't in your list of columns.
If you want to be able to specify a date rather than the current time, replace the NOW() with #your_date, an applicable variable. And you can change the interval values to suit, in case they should really be 7 and 21.
Have a look at How to number rows in MySQL.
You can create a row number per symbol, ordered by date descending.
Then retrieve the rows where the row number is between 1 and 15 and apply the AVG aggregation to whichever columns you wish.
trd_qty is the quantity traded on a particular day.
The days are not in order because the market operates only on weekdays, and there are holidays too, so the dates may not be continuous.