SQL query - limit and top

I am writing a SQL query to get the average duration of a transaction over the last 30 orders:
SELECT SUM (end_time - start_time) as sum, 30 as count
FROM orders
WHERE customer_id = ".$customer_id." AND status = 'end'
How can I edit the sum part at the start so I only get the first 30 to average? Currently it's taking the sum of every row.
cheers
Jack

You could select from a subquery that only retrieves the 30 most recent orders (by start_time).
The syntax may vary depending on your DBMS.
E.g.
SELECT sum(end_time - start_time),
       30 count
FROM   (SELECT start_time,
               end_time
        FROM   orders
        ORDER  BY start_time DESC
        LIMIT  30) x;
or
SELECT sum(end_time - start_time),
       30 count
FROM   (SELECT TOP 30
               start_time,
               end_time
        FROM   orders
        ORDER  BY start_time DESC) x;
or maybe others.
Maybe this helps to point you in the right direction.
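As a quick sanity check, here is the LIMIT variant run against SQLite from Python. The schema, the integer epoch-second timestamps, and the sample data are all assumptions for illustration; note the customer id is bound as a parameter rather than concatenated into the string as in the original PHP snippet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INT, status TEXT, start_time INT, end_time INT)"
)
# 40 finished orders, each lasting 10 seconds, start times 1 second apart
conn.executemany(
    "INSERT INTO orders VALUES (?, 'end', ?, ?)",
    [(1, i, i + 10) for i in range(40)],
)
# Average over the 30 most recent matching orders only
avg_duration = conn.execute(
    """
    SELECT AVG(end_time - start_time)
    FROM (SELECT start_time, end_time
          FROM orders
          WHERE customer_id = ? AND status = 'end'
          ORDER BY start_time DESC
          LIMIT 30) x
    """,
    (1,),
).fetchone()[0]
print(avg_duration)  # -> 10.0
```

Using AVG directly also avoids hard-coding 30 as the divisor, which would give a wrong average whenever fewer than 30 matching rows exist.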

Related

SQL to find when amount reached a certain value for the first time

I have a table that has 3 columns: user_id, date, amount. I need to find out on which date the amount reached 1 Million for the first time. The amount can go up or down on any given day.
I tried using partition by user_id order by date desc but I can't figure out how to find the exact date on which it reached 1 Million for the first time. I am exploring lead, lag functions. Any pointers would be appreciated.
You may use conditional aggregation as follows:
select user_id,
       min(case when amount >= 1000000 then date end) as expected_date
from table_name
group by user_id
And if you want to check where the amount reaches exactly 1M, use case when amount = 1000000 ...
If you meant that the amount is a cumulative total over increasing dates, then the query will be:
select user_id,
       min(case when cumulative_amount >= 1000000 then date end) as expected_date
from
(
    select *,
           sum(amount) over (partition by user_id order by date) cumulative_amount
    from table_name
) T
group by user_id;
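A small way to verify the cumulative variant, using Python's built-in SQLite (window functions need SQLite 3.25+); the sample rows below are invented so user 1 crosses 1M on the third day and user 2 on the first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (user_id INT, date TEXT, amount INT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", [
    (1, '2023-01-01', 400000),
    (1, '2023-01-02', 500000),
    (1, '2023-01-03', 200000),   # cumulative hits 1.1M here
    (2, '2023-01-01', 1200000),  # hits 1M on the first day
])
# min() ignores the NULLs produced by the CASE for pre-threshold rows
rows = conn.execute("""
    SELECT user_id,
           MIN(CASE WHEN cumulative_amount >= 1000000 THEN date END) AS expected_date
    FROM (SELECT *,
                 SUM(amount) OVER (PARTITION BY user_id ORDER BY date) AS cumulative_amount
          FROM table_name) t
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(rows)  # -> [(1, '2023-01-03'), (2, '2023-01-01')]
```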
Try this:
select date,
sum(amount) as totalamount
from tablename
group by date
having totalamount>=1000000
order by date asc
limit 1
This would summarize the amount for each day and return 1 record where it reached 1M for the first time.
And if you want it to be grouped for both date and user_id, add user_id in select and group by clauses.
select user_id, date,
sum(amount) as totalamount
from tablename
group by user_id,date
having totalamount>=1000000
order by date asc
limit 1
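For completeness, the per-day-total reading can be checked the same way in SQLite; the sample data below is invented so that the second day is the first whose daily total passes 1M:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (user_id INT, date TEXT, amount INT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", [
    (1, '2023-01-01', 600000),
    (2, '2023-01-01', 300000),   # day total 900k: below the threshold
    (1, '2023-01-02', 700000),
    (2, '2023-01-02', 400000),   # day total 1.1M: first day over 1M
])
row = conn.execute("""
    SELECT date, SUM(amount) AS totalamount
    FROM tablename
    GROUP BY date
    HAVING SUM(amount) >= 1000000
    ORDER BY date ASC
    LIMIT 1
""").fetchone()
print(row)  # -> ('2023-01-02', 1100000)
```

Note this answers "which day's total reached 1M", not "when did the running total reach 1M"; the cumulative version above covers the latter.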

How do I select a data every second with PostgreSQL?

I've got a SQL query that selects all the data between two dates, and now I would like to add a time-scale factor so that instead of returning all the data it returns one row every second, minute, or hour.
Do you know how I can achieve it?
My query :
"SELECT received_on, $1 FROM $2 WHERE $3 <= received_on AND received_on <= $4", [data_selected, table_name, date_1, date_2]
The table input:
As you can see, there are several rows within the same second; I would like to select only one per second.
If you want to select data every second, you may use the ROW_NUMBER() function partitioned by received_on as follows:
WITH DateGroups AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY received_on ORDER BY adc_v) AS rn
    FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn = 1
ORDER BY received_on
If you want to select data every minute or hour, you may use the extract function to get the epoch seconds of received_on, then divide by 60 to group by minute or by 3600 to group by hour.
epoch: For date and timestamp values, the number of seconds since 1970-01-01 00:00:00-00 (can be negative); for interval values, the total number of seconds in the interval
Group by minutes:
WITH DateGroups AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from received_on) / 60) ORDER BY adc_v) AS rn
    FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn = 1
ORDER BY received_on
Group by hours:
WITH DateGroups AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from received_on) / (60*60)) ORDER BY adc_v) AS rn
    FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn = 1
ORDER BY received_on
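The same idea can be tried with Python's built-in SQLite; extract(epoch from ...) is PostgreSQL syntax, so this sketch uses SQLite's strftime('%s', ...) as the equivalent. The per-minute grouping and sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (received_on TEXT, adc_v INT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)", [
    ('2022-07-29 15:52:00', 2510),
    ('2022-07-29 15:52:00', 2509),  # same second: only one row should survive
    ('2022-07-29 15:52:01', 2511),  # same minute as the rows above
    ('2022-07-29 15:53:30', 2512),  # different minute
])
# Integer-divide the epoch seconds by 60 so each minute forms one partition;
# rn = 1 keeps the row with the smallest adc_v per minute.
rows = conn.execute("""
    WITH DateGroups AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY CAST(strftime('%s', received_on) AS INTEGER) / 60
                   ORDER BY adc_v
               ) AS rn
        FROM table_name
    )
    SELECT received_on, adc_v
    FROM DateGroups
    WHERE rn = 1
    ORDER BY received_on
""").fetchall()
print(rows)  # -> [('2022-07-29 15:52:00', 2509), ('2022-07-29 15:53:30', 2512)]
```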
When there are several rows per second, and you only want one result row per second, you can decide to pick one of the rows for each second. This can be a randomly chosen row or you pick the row with the greatest or least value in a column as shown in Ahmed's answer.
It would be more typical, though, to aggregate your data per second. The columns show figures and you are interested in those figures. Your sample data shows the value 2509 twice and the value 2510 three times for the adc_v column at 2022-07-29, 15:52. Consider what you would like to see. Maybe you don't want this value to go below some boundary, so you show the minimum value MIN(adc_v) to see how low it went in the second. Or you want to see the value that occurred most often in the second, MODE(adc_v). Or you'd like to see the average value AVG(adc_v). Make this decision for every value, so as to get the information most vital to you.
select
    received_on,
    min(adc_v),
    avg(adc_i),
    ...
from mytable
group by received_on
order by received_on;
If you want this for another interval, say an hour instead of a second, truncate your received_on column accordingly. E.g.:
select
    date_trunc('hour', received_on) as received_hour,
    min(adc_v),
    avg(adc_i),
    ...
from mytable
group by date_trunc('hour', received_on)
order by date_trunc('hour', received_on);
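A rough SQLite sketch of this per-hour aggregation; SQLite has no date_trunc, so strftime stands in for it, and the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (received_on TEXT, adc_v INT, adc_i REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    ('2022-07-29 15:52:00', 2509, 1.0),
    ('2022-07-29 15:52:00', 2510, 3.0),  # same hour as the row above
    ('2022-07-29 16:05:00', 2500, 2.0),
])
# Truncate each timestamp to the hour, then aggregate within the hour
rows = conn.execute("""
    SELECT strftime('%Y-%m-%d %H:00:00', received_on) AS received_hour,
           MIN(adc_v),
           AVG(adc_i)
    FROM mytable
    GROUP BY received_hour
    ORDER BY received_hour
""").fetchall()
print(rows)  # -> [('2022-07-29 15:00:00', 2509, 2.0), ('2022-07-29 16:00:00', 2500, 2.0)]
```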

Teradata Query for extracting data based on time interval (10 minutes)

Can someone help me with a query in Teradata/SQL that I can use to extract all users that have more than 3 transactions within a 10-minute window? Below is an extract of the table in question.
Kind regards,
You can use lag()/lead() and time comparisons. To get the rows that have at least 2 earlier transactions within the preceding 10 minutes:
select t.*
from t
qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
If you only want the users:
select distinct userid
from (select t.*
      from t
      qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
     ) t
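QUALIFY is Teradata-specific, but the same logic can be checked with a plain subquery in Python's built-in SQLite (lag() needs SQLite 3.25+). Timestamps are compared as epoch seconds here, and the table and sample data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (userid INT, transaction_timestamp TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, '2023-01-01 10:00:00'),
    (1, '2023-01-01 10:03:00'),
    (1, '2023-01-01 10:07:00'),  # 3rd transaction within 10 minutes of the 1st
    (2, '2023-01-01 10:00:00'),
    (2, '2023-01-01 11:00:00'),
    (2, '2023-01-01 12:00:00'),  # never 3 within 10 minutes
])
# gap_seconds: seconds between this transaction and the one 2 rows earlier;
# a gap of at most 600 seconds means 3 transactions in a 10-minute window.
users = conn.execute("""
    SELECT DISTINCT userid
    FROM (SELECT userid,
                 strftime('%s', transaction_timestamp) -
                 LAG(strftime('%s', transaction_timestamp), 2)
                     OVER (PARTITION BY userid ORDER BY transaction_timestamp) AS gap_seconds
          FROM t) x
    WHERE gap_seconds <= 600
""").fetchall()
print(users)  # -> [(1,)]
```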

How to write a SQL query to find the first time when sum greater than a number?

I have a postgresql table:
create table orders
(
id int,
cost int,
time timestamp
);
How to write a PostgreSQL query to find the first time when sum(cost) is greater than 200?
For example:
id  cost  time
------------------
1   120   2019-10-10
2   50    2019-11-11
3   80    2019-12-12
4   60    2019-12-16
The first time sum(cost) is greater than 200 is 2019-12-12.
This is a variation of Nick's answer (which would be correct with an ORDER BY). However, this version is more efficient:
select o.*
from (select o.*,
             sum(o.cost) over (order by o.time) as running_cost
      from orders o
     ) o
where running_cost - cost < 200 and
      running_cost >= 200;
Note that this does not require an order by in the outer query to work correctly.
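To convince yourself the filter picks exactly the crossing row, here is the query run in Python's built-in SQLite (3.25+ for window functions) against the sample data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INT, cost INT, time TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 120, '2019-10-10'),
    (2, 50,  '2019-11-11'),
    (3, 80,  '2019-12-12'),  # running sum first reaches 200 here (250)
    (4, 60,  '2019-12-16'),
])
# Keep the row where the running total crosses 200: the total before this
# row (running_cost - cost) is still below 200, the total including it is not.
row = conn.execute("""
    SELECT o.*
    FROM (SELECT o.*,
                 SUM(o.cost) OVER (ORDER BY o.time) AS running_cost
          FROM orders o) o
    WHERE running_cost - cost < 200
      AND running_cost >= 200
""").fetchone()
print(row)  # -> (3, 80, '2019-12-12', 250)
```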
There is also almost a way to solve this without using a subquery:
select o.*
from orders o
order by (sum(cost) over (order by time) >= 200) desc,
time asc
limit 1;
The only issue is that this will return a row if no row matches the condition. You could get around this by using a subquery in the limit:
limit (case when (select sum(cost) from orders) >= 200 then 1 else 0 end)
But then a subquery would be needed.
For PostgreSQL, you can get this result by using a CTE to calculate the SUM of cost for rows up to and including the current one, and then selecting the first row which has total cost >= 200:
WITH CTE AS (
    SELECT time,
           SUM(cost) OVER (ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total
    FROM orders
)
SELECT *
FROM CTE
WHERE total >= 200
ORDER BY total
LIMIT 1
Output:
time        total
2019-12-12  250

Vertica Analytic function to count instances in a window

Let's say I have a dataset with two columns: ID and timestamp. My goal is to return IDs that have at least n timestamps in any 30-day window.
Here is an example:
ID Timestamp
1 '2019-01-01'
2 '2019-02-01'
3 '2019-03-01'
1 '2019-01-02'
1 '2019-01-04'
1 '2019-01-17'
So, let's say I want to return a list of IDs that have 3 timestamps in any 30 day window.
Given above, my resultset would just be ID = 1. I'm thinking some kind of windowing function would accomplish this, but I'm not positive.
Any chance you could help me write a query that accomplishes this?
A relatively simple way to do this involves lag()/lead():
select t.*
from (select t.*,
             lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
      from t
     ) t
where datediff(day, timestamp, timestamp_2) <= 30;
The lead() looks two rows ahead, at the third timestamp in a series. The where checks if this is within 30 days of the original one. The result is rows where this occurs.
If you just want the ids, then:
select distinct id
from (select t.*,
             lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
      from t
     ) t
where datediff(day, timestamp, timestamp_2) <= 30;
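Vertica's datediff isn't available in SQLite, but the same lead()-based check can be sketched with julianday(); the sample data follows the question, with the timestamp column shortened to ts to sidestep the keyword:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, ts TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, '2019-01-01'),
    (2, '2019-02-01'),
    (3, '2019-03-01'),
    (1, '2019-01-02'),
    (1, '2019-01-04'),
    (1, '2019-01-17'),
])
# lead(ts, 2) is the 3rd timestamp in a run of 3; if it falls within
# 30 days of the current one, the id has 3 timestamps in a 30-day window.
ids = conn.execute("""
    SELECT DISTINCT id
    FROM (SELECT id, ts,
                 LEAD(ts, 2) OVER (PARTITION BY id ORDER BY ts) AS ts_2
          FROM t) x
    WHERE julianday(ts_2) - julianday(ts) <= 30
""").fetchall()
print(ids)  # -> [(1,)]
```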