How to write ClickHouse's LIMIT BY in GORM? - go-gorm

The SQL statement I am running in ClickHouse is:
SELECT
    toStartOfInterval(create_time, INTERVAL 1 minute) AS time,
    src_ip,
    dst_ip,
    SUM(pack_size) AS p
FROM flow
WHERE create_time < FROM_UNIXTIME(1676455200)
  AND create_time >= FROM_UNIXTIME(1676433600)
GROUP BY time, src_ip, dst_ip
ORDER BY time, p DESC
LIMIT 5 BY time
But I don't know how to express the LIMIT BY clause in GORM. My code is as follows:
g.db.Table(g.table).
    Select("toStartOfInterval(create_time, INTERVAL 1 minute) AS time, src_ip, dst_ip, SUM(pack_size) AS p").
    Where("create_time < FROM_UNIXTIME(?) AND create_time >= FROM_UNIXTIME(?)", endTime, startTime).
    Group("time, src_ip, dst_ip").
    Order("time").Order("p DESC")
How should I write the LIMIT part of my SQL statement in GORM? I only found support for a plain LIMIT n, not for LIMIT n BY xxx.
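GORM has no clause builder for ClickHouse's LIMIT n BY, so one workaround is to drop to a raw query for this one statement. A minimal sketch, assuming gorm.io/gorm with the ClickHouse driver; FlowRow is a hypothetical result struct, and the table name is written literally here instead of coming from g.table:

type FlowRow struct { // assumes: import "time" and an initialized *gorm.DB in g.db
    Time  time.Time `gorm:"column:time"`
    SrcIP string    `gorm:"column:src_ip"`
    DstIP string    `gorm:"column:dst_ip"`
    P     uint64    `gorm:"column:p"`
}

var results []FlowRow
// Raw keeps GORM's parameter binding but lets the LIMIT BY clause be
// written verbatim, which the chain API cannot produce.
err := g.db.Raw(`
    SELECT toStartOfInterval(create_time, INTERVAL 1 minute) AS time,
           src_ip, dst_ip, SUM(pack_size) AS p
    FROM flow
    WHERE create_time < FROM_UNIXTIME(?)
      AND create_time >= FROM_UNIXTIME(?)
    GROUP BY time, src_ip, dst_ip
    ORDER BY time, p DESC
    LIMIT 5 BY time`,
    endTime, startTime).Scan(&results).Error

You lose the chain's composability for this query, but Raw still goes through GORM's connection and scanning, so the surrounding code does not need to change.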

Related

SQL BigQuery - How to write a COUNTIF statement applied to an INTERVAL column

I have a trip_duration column in interval format. I want to remove all observations less than 90 seconds and count how many observations match this condition.
My current SQL query is
WITH
org_table AS (
SELECT
ended_at - started_at as trip_duration
FROM `cyclistic-328701.12_month_user_data_cyclistic.20*`
)
SELECT
COUNTIF(x < 1:30) AS false_start
FROM trip_duration AS x;
It returns Syntax error: Expected ")" but got ":" at [8:16]
I have also tried
SELECT
COUNTIF(x < "0-0 0 0:1:30") AS false_start
FROM trip_duration AS x
It returns Table name "trip_duration" missing dataset while no default dataset is set in the request.
I've read through other questions and have not been able to write a solution.
My first thought is to cast trip_duration from INTERVAL to TIME format so COUNTIF statements can reference a TIME-formatted column instead of an INTERVAL one.
~ Marcus
The example below shows one way to handle intervals:
with trip_duration as (
select interval 120 second as x union all
select interval 10 second union all
select interval 2 minute union all
select interval 50 second
)
select
count(*) as all_starts,
countif(x < interval 90 second) as false_starts
from trip_duration
with output all_starts = 4 and false_starts = 2 (only the 10-second and 50-second rows fall under 90 seconds).
To keep only the rows whose duration is greater than 90 seconds:
SELECT
* # here is whatever field(s) you want to return
FROM
`cyclistic-328701.12_month_user_data_cyclistic.20*`
WHERE
TIMESTAMP_DIFF(ended_at, started_at, SECOND) > 90
You can read about the TIMESTAMP_DIFF function here.
To count the number of occurrences:
SELECT
COUNTIF(TIMESTAMP_DIFF(ended_at, started_at, SECOND) < 90) AS false_start,
COUNTIF(TIMESTAMP_DIFF(ended_at, started_at, SECOND) >= 90) AS non_false_start
FROM
`cyclistic-328701.12_month_user_data_cyclistic.20*`

Teradata Query for extracting data based on time interval (10 minutes)

Can someone help me with a query in Teradata/SQL that I can use to extract all users that have more than 3 transactions within a 10-minute window? Below is an extract of the table in question.
Kind regards,
You can use lag()/lead() and time comparisons. To get the rows that have 2 such transactions before them within 10 minutes:
select t.*
from t
qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
If you only want the users:
select distinct userid
from (select t.*
from t
qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
) t

SQL query - limit and top

I am writing a SQL command to try to get the average duration of a transaction for the last 30 orders:
SELECT SUM (end_time - start_time) as sum, 30 as count
FROM orders
WHERE customer_id = ".$customer_id." AND status = 'end'
How can I edit the SUM part at the start so I only average the first 30, as currently it's taking the sum of every row?
Cheers,
Jack
You could select from a subquery that only retrieves the 30 most recent orders (by start_time).
The syntax may vary depending on your DBMS.
E.g.
SELECT sum(end_time - start_time),
       30 count
FROM (SELECT start_time,
             end_time
      FROM orders
      ORDER BY start_time DESC
      LIMIT 30) x;
or
SELECT sum(end_time - start_time),
       30 count
FROM (SELECT TOP 30
             start_time,
             end_time
      FROM orders
      ORDER BY start_time DESC) x;
or maybe others.
Maybe this helps to point you in the right direction.

Selecting timeranges based on insertion date of matched result

I have a messages(id, inserted_at) table
I want to select the N most recent messages whose inserted_at column is within, say, 2 minutes of the single most recent message.
Is this possible?
You could do that with a sub select in the where clause:
select *
from messages
where inserted_at >=
( select max(inserted_at) - interval '90 minute'
from messages
)
order by inserted_at desc
limit 2
... and just specify the interval of your choice, and the limit value.
Note that the two conditions (record limit N, date limit) are in competition, and you may get fewer records than N, or else get some messages excluded although they are within the date/time limit.
See SQL fiddle
If you meant that the date/time condition was to be a minimum time difference, then turn around the where condition from >= to <=:
select *
from messages
where inserted_at <=
( select max(inserted_at) - interval '90 minute'
from messages
)
order by inserted_at desc
limit 2

Optimizing Max Value query

I wanted to ask for advice on how I could optimize my query. I hope to make it run faster, as the current speed takes away from the UX.
My program collects data every hour, and I want to optimize the query that takes the latest data and builds the top 100 people for a specific event:
SELECT a.user_id as user, nickname, value, s.created_on
FROM stats s,accounts a
WHERE a.user_id = s.user_id AND event_id = 1 AND s.created_on in
(SELECT created_on FROM stats WHERE created_on >= NOW() - '1 hour'::INTERVAL)
ORDER BY value desc
LIMIT 100
The query I have returns the top 100 from the last hour for event_id = 1, but I wish to optimize it; I believe the subquery is the root cause of the problem. I've tried other queries, but they end up with either duplicates or results that are not from the latest dataset.
Thank you
EDIT:
The accounts table contains [user_id, nickname];
the stats table contains [user_id, event_id, value, created_on].
NOW() - '1 hour'::INTERVAL is not MySQL syntax; perhaps you meant NOW() - INTERVAL 1 HOUR?
IN ( SELECT ... ) optimizes very poorly.
Not knowing the relationship between accounts and stats (1:1, 1:many, etc), I can only guess at what might work:
SELECT a.user_id AS user, nickname, value, s.created_on
FROM stats s, accounts a
WHERE a.user_id = s.user_id
  AND event_id = 1
  AND s.created_on >= NOW() - INTERVAL 1 HOUR
ORDER BY value DESC
LIMIT 100
INDEX(event_id, value) -- if they are both in `a`
INDEX(user_id, created_on)
or...
SELECT a.user_id AS user, nickname, value,
       ( SELECT MAX(created_on) FROM stats
         WHERE user_id = a.user_id ) AS created_on
FROM accounts AS a
WHERE event_id = 1
  AND EXISTS
      ( SELECT *
        FROM stats
        WHERE created_on >= NOW() - INTERVAL 1 HOUR
          AND user_id = a.user_id
      )
ORDER BY value DESC
LIMIT 100
INDEX(user_id, created_on)
INDEX(event_id, value)
Please provide:
SHOW CREATE TABLE for each table
EXPLAIN SELECT ...; for any reasonable candidates