I have a table with the following structure:
user_id (int) - event (str) - time (timestamp) - value (int)
Event can take several values: install, login, buy, etc.
I need to get all user records from before they updated the application.
For example, the new version of my application was released on 1 January 2019, but users may install the new version on any day.
How can I get sum(value) by the first and second versions?
I tried a self-join on the table, but I think that this is not the best solution.
Help me, please.
Here is the definition of your table (as I understood it from your comments and description):
CREATE TABLE user_events (
user_id integer,
event varchar,
time timestamp without time zone,
value integer
);
Here is the query you asked for:
SELECT
COUNT(user_id),
SUM(value)
FROM (
SELECT
DISTINCT ON (user_id)
user_id,time,value
FROM user_events
WHERE event='install'
ORDER BY user_id, time DESC
) last_installations
WHERE
time BETWEEN date '2018-01-01' AND date '2019-01-01';
Some explanations:
The inner query (last_installations) selects the last install event for each user.
The outer query keeps only installations of the first and second versions, and calculates SUM(value) (as you asked) and COUNT(user_id) (which I added for clarity: how many users are on versions 1 and 2 now).
UPDATE:
Sum of value for all events, by version:
SELECT
event,
CASE
WHEN time BETWEEN date '2018-01-01' AND timestamp '2018-05-31 23:59:59' THEN 1
WHEN time BETWEEN date '2018-06-01' AND timestamp '2018-12-31 23:59:59' THEN 2
WHEN time >= date '2019-01-01' THEN 3
ELSE 0 -- unknown version
END AS version,
SUM(value)
FROM user_events
GROUP BY 1,2
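If what you ultimately need is the total of value per installed version (attributing each user's events to the version that user last installed), a sketch along the same DISTINCT ON lines could look like this, reusing the version boundaries assumed above:
SELECT li.version,
       SUM(e.value) AS total_value
FROM (
    SELECT DISTINCT ON (user_id)
           user_id,
           CASE
               WHEN time < date '2018-06-01' THEN 1
               WHEN time < date '2019-01-01' THEN 2
               ELSE 3
           END AS version
    FROM user_events
    WHERE event = 'install'
    ORDER BY user_id, time DESC
) li
JOIN user_events e USING (user_id)
GROUP BY li.version
ORDER BY li.version;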
I have a script which calculates the metrics for yesterday automatically and inserts the data into the table, but I want to fill the table with all missing dates. Is it even possible to do this automatically, or should I manually start the script for each day?
Here is the simplified version of the script:
select sum(amount),id,yesterday
where date < yesterday
group by id
But, for example, the day before yesterday is also missing from the table, so I want the above script to execute, and also this script:
select sum(amount),id,day_before_yesterday
where date < day_before_yesterday
group by id
Use the last date you have in the target table:
select sum(amount),id, max(date)
from table
where date < (select max(date) - interval 1 day from target)
group by id
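If this is PostgreSQL, one way to backfill every missing day in a single statement is to generate the missing dates from the last date already present and re-run the aggregation for each of them. This is only a sketch; source, target, and the column names are placeholders for your real tables:
insert into target (total_amount, id, metric_date)
select sum(s.amount), s.id, d.metric_date
from generate_series(
         (select max(date) + interval '1 day' from target),
         current_date - interval '1 day',
         interval '1 day'
     ) as d(metric_date)
join source s on s.date < d.metric_date
group by s.id, d.metric_date;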
I'm looking through login logs (in Netezza) and trying to find users who have more than a certain number of logins in any 1-hour time period (any consecutive 60-minute period, as opposed to strictly a clock hour) since December 1st. I've viewed the following posts, but most seem to address searching within a specific time range, not ANY given time period. Thanks.
https://dba.stackexchange.com/questions/137660/counting-number-of-occurences-in-a-time-period
https://dba.stackexchange.com/questions/67881/calculating-the-maximum-seen-so-far-for-each-point-in-time
Count records per hour within a time span
You could use the analytic function lag to look back in a sorted sequence of timestamps and check whether the record that came 19 entries earlier is within an hour of the current one (the current row plus the 19 before it make 20 logins in that window):
with cte as (
select user_id,
login_time,
lag(login_time, 19) over (partition by user_id order by login_time) as lag_time
from userlog
order by user_id,
login_time
)
select user_id,
min(login_time) as login_time
from cte
where extract(epoch from (login_time - lag_time)) < 3600
group by user_id
The output will show the matching users, together with the first time at which they reached a twentieth login within an hour.
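If your threshold is not exactly 20, only the lag offset changes: for N logins in any rolling hour, look back N - 1 rows. A sketch for a threshold of 10, also restricted to logins since December 1st as the question asks (same userlog table and columns as above):
with cte as (
    select user_id,
           login_time,
           -- offset 9 = look for a 10th login within any rolling hour
           lag(login_time, 9) over (partition by user_id order by login_time) as lag_time
    from userlog
    where login_time >= timestamp '2018-12-01 00:00:00'
)
select user_id,
       min(login_time) as login_time
from cte
where extract(epoch from (login_time - lag_time)) < 3600
group by user_id;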
I think you might do something like this (I'll use a login table with user and datetime columns, for the sake of simplicity):
with connections as (
select ua.user
, ua.datetime
from user_logons ua
where ua.datetime >= timestamp'2018-12-01 00:00:00'
)
select ua.user
, ua.datetime
, (select count(*)
from connections ut
where ut.user = ua.user
and ut.datetime between ua.datetime and (ua.datetime + 1 hour)
) as consecutive_logons
from connections ua
It is up to you to adapt this to your columns (user, datetime).
It is up to you to find the date-add facilities (ua.datetime + 1 hour won't work); this is more or less dependent on the DB implementation, for example it is DATE_ADD in MySQL (https://www.w3schools.com/SQl/func_mysql_date_add.asp).
Due to the subquery (select count(*) ...), the whole query will not be the fastest, because it is a correlated subquery: it needs to be re-evaluated for each row.
The WITH clause is simply there to compute a subset of user_logons and minimize its cost. This might not be useful; however, it lessens the complexity of the query.
You might have better performance using a stored function or a language-driven (e.g. Java, PHP, ...) function.
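For reference, here is the same idea with the date arithmetic written out, assuming PostgreSQL-style interval syntax (Netezza is close to this, but verify on your version); the table and column names (user_logons, user_id, login_time) are only illustrative:
with connections as (
    select user_id, login_time
    from user_logons
    where login_time >= timestamp '2018-12-01 00:00:00'
)
select ua.user_id,
       ua.login_time,
       (select count(*)
        from connections ut
        where ut.user_id = ua.user_id
          and ut.login_time between ua.login_time
                                and ua.login_time + interval '1 hour'
       ) as consecutive_logons
from connections ua;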
I have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new IDs from today compared with yesterday (the IDs from today that are not present yesterday).
This needs to be done with just one query, with maximum efficiency, because the table will have 4-5 million records.
As a Java developer I am able to do this with 2 queries, but doing it with just one is beyond my knowledge, so any help would be much appreciated.
EDIT: the date format is dd/mm/yyyy, and each ID may appear 0 or 1 times per day.
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
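For completeness, the plain single-column index that point (2) relies on could be as simple as this (the index name is made up):
create index your_table_dt_ix on your_table (dt);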
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
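Since the question's edit says each ID appears at most once per day, a sketch of that lag() variant could look like this (using the your_table / dt names from the first answer):
select id
from (
    select id,
           dt,
           lag(dt) over (partition by id order by dt) as prev_dt
    from your_table
    where dt >= trunc(sysdate) - 1
      and dt <  trunc(sysdate) + 1
) t
where dt >= trunc(sysdate)   -- the row is from today
  and prev_dt is null;       -- and there was no row for that id yesterday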
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function-based index. Without it, the query would have to full-scan the table, which would probably be quite slow.
create index idx on mytable (trunc(date), id);
I have a table Questions. How can I get a count of all questions asked in a week?
More generically, how can I bucket records by the week they were created in?
Questions
id      created_at             title
------------------------------------------------------
1       2014-12-31 09:43:42    "Add things"
2       2013-11-23 02:48:55    "How do I ruby?"
3       2015-01-15 15:11:19    "How do I python?"
...
I'm using SQLite, but PG answers are fine too.
Or if you have the answer using Rails ActiveRecord, that is amazing, but not required.
I've been trying to use DATEPART() but haven't come up with anything successful yet: http://msdn.microsoft.com/en-us/library/ms174420.aspx
In PostgreSQL it's as easy as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
FROM Questions
If you wanted to get the # of questions per week, simply do the following:
SELECT date_trunc('week', created_at) created_week, COUNT(*) weekly_cnt
FROM Questions
GROUP BY date_trunc('week', created_at)
Hope this helps. Note that date_trunc() will return a timestamp (truncated to the start of the week) and not a number (i.e., it won't return the ordinal number of the week in the year).
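If the ordinal week number is what you actually want, extract() returns it; a sketch (PostgreSQL, using ISO week numbering, so isoyear is paired with week):
SELECT extract(isoyear from created_at) AS yr,
       extract(week from created_at) AS wk,
       COUNT(*) AS weekly_cnt
FROM Questions
GROUP BY 1, 2
ORDER BY 1, 2;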
Update:
Also, if you wanted to accomplish both in a single query you could do so as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
, COUNT(*) OVER ( PARTITION BY date_trunc('week', created_at) ) weekly_cnt
FROM Questions
In the above query I'm using COUNT(*) as a window function and partitioning by the week in which the question was created.
If the created_at field is already indexed, I would simply look for all rows with a created_at value between X and Y. That way the index can be used.
For instance, to get rows with a created_at value in the 3rd week of 2015, you would run:
select *
from questions
where created_at between '2015-01-11' and '2015-01-17'
This would allow the index to be used.
If you want to be able to specify a week in the where clause, you could use the date_part or extract functions to add a column to this table storing the year and week #, and then index that column so that queries can take advantage of it.
If you don't want to add the column, you could of course use either function in the where clause and query against the table, but you won't be able to take advantage of any indexes.
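A sketch of that add-a-column approach (PostgreSQL; the column and index names are made up, and to_char's ISO year/week codes are used so the values sort correctly):
ALTER TABLE questions ADD COLUMN created_week text;

UPDATE questions
SET created_week = to_char(created_at, 'IYYY-IW');  -- ISO year-week, e.g. '2015-03'

CREATE INDEX questions_created_week_idx ON questions (created_week);

SELECT created_week, COUNT(*) AS weekly_cnt
FROM questions
WHERE created_week = '2015-03'
GROUP BY created_week;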
Because you mentioned not wanting to add a column to the table, I would recommend adding a function based index.
For example, if your ddl were:
create table questions
(
id int,
created_at timestamp,
title varchar(20)
);
insert into questions values
(1, '2014-12-31 09:43:42','"Add things"'),
(2, '2013-11-23 02:48:55','"How do I ruby?"'),
(3, '2015-01-15 15:11:19','"How do I python?"');
create or replace function to_week(ts timestamp)
returns text
as 'select concat(extract(year from ts),extract(week from ts))'
language sql
immutable
returns null on null input;
create index week_idx on questions (to_week(created_at));
You could run:
select q.*, to_week(created_at) as week
from questions q
where to_week(created_at) = '20153';
And get:
| ID | CREATED_AT | TITLE | WEEK |
|----|--------------------------------|--------------------|-------|
| 3 | January, 15 2015 15:11:19+0000 | "How do I python?" | 20153 |
(reflecting the third week of 2015, i.e. '20153')
Fiddle: http://sqlfiddle.com/#!15/c77cd/3/0
You could similarly run:
select q.*,
concat(extract(year from created_at), extract(week from created_at)) as week
from questions q
where concat(extract(year from created_at), extract(week from created_at)) =
'20153';
Fiddle: http://sqlfiddle.com/#!15/18c1e/3/0
But it would not take advantage of the function-based index, because the expression in the WHERE clause does not match the indexed expression (to_week(created_at)). In addition, it would not use any index you might have on the created_at field because, while that field might be indexed, you really aren't searching on that field. You are searching on the result of a function applied against that field. So the index on the column itself cannot be used.
If the table is large you will either want a function based index or a column holding that week that is itself indexed.
SQLite has no native datetime type like MS SQL Server does, so the answer may depend on how you are storing dates. Not all T-SQL will work in SQLite.
You can store a datetime as an integer that counts seconds since 1970-01-01 00:00 UTC (the Unix epoch). There are 604,800 seconds in a week. So you could query on an expression like
rawdatetime / 604800 -- iff rawdatetime is integer
More on handling datetimes in SQLite here: https://www.sqlite.org/datatype3.html
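For example, assuming created_at is stored as integer Unix seconds, a grouping query could be as simple as this sketch (note the buckets are aligned to the Unix epoch rather than to calendar week boundaries):
SELECT created_at / 604800 AS week_bucket,
       COUNT(*) AS weekly_cnt
FROM Questions
GROUP BY created_at / 604800;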
Get the week number using strftime('%V').
Store it in DB, and use it to identify in which week a question was asked
http://apidock.com/ruby/DateTime/strftime
SQL can do it too, with DATE_FORMAT(datetime, '%u') (MySQL).
So use:
SELECT DATE_FORMAT(column,'%u') FROM Table
I have an events-based table, and I would like to produce a query that counts, by minute, the number of events that were occurring.
For example, I have an event table like:
CREATE TABLE events (
session_id TEXT,
event TEXT,
time_stamp DATETIME
)
Which I have transformed into the following type of table:
CREATE TABLE sessions (
session_id TEXT,
start_ts DATETIME,
end_ts DATETIME,
duration INTEGER
);
Now I want to create a query that would group the sessions by a count of those that were active during a particular minute, where I would essentially get back something like:
TIME_INTERVAL ACTIVE_SESSIONS
------------- ---------------
18:00 1
18:01 5
18:02 3
18:03 0
18:04 2
OK, I think I got more of what I wanted. It doesn't account for intervals that are empty, but it is good enough for what I need.
select strftime('%Y-%m-%dT%H:%M:00.000',start_ts) TIME_INTERVAL,
(select count(session_id)
from sessions s2
where strftime('%Y-%m-%dT%H:%M:00.000',s1.start_ts) between s2.start_ts and s2.end_ts) ACTIVE_SESSIONS
from sessions s1
group by strftime('%Y-%m-%dT%H:%M:00.000',start_ts);
This will generate a row for each minute in which a session started, with a count of the sessions that had started (start_ts) but hadn't yet finished (end_ts) at that minute.
PostgreSQL allows the following query.
In contrast to your example, this returns an additional column for the day, and it omits the minutes where nothing happened (count=0).
select
day, hour, minute, count(*)
from
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),
(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),
(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),
(50),(51),(52),(53),(54),(55),(56),(57),(58),(59))
as minutes (minute),
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23))
as hours (hour),
(select distinct cast(start_ts as date) from sessions
union
select distinct cast(end_ts as date) from sessions)
as days (day),
sessions
where
(day,hour,minute)
between (cast(start_ts as date),extract(hour from start_ts),extract(minute from start_ts))
and (cast(end_ts as date), extract(hour from end_ts), extract(minute from end_ts))
group by
day, hour, minute
order by
day, hour, minute;
This isn't exactly your query, but I think it could help. Did you look into the SQLite R-Tree module? This would allow you to create a virtual index on the start/stop time:
CREATE VIRTUAL TABLE sessions_index USING rtree (id, start, end);
Then you could search via:
SELECT * FROM sessions_index WHERE end >= <first minute> AND start <= <last minute>;
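A sketch of how the R-tree could be populated and queried, assuming start_ts/end_ts are stored in a format SQLite's date functions understand; R-tree ids must be integers, so the implicit rowid of sessions is used, and the timestamps are converted to Unix seconds so the index columns are numeric:
-- Populate the R-tree from the sessions table.
INSERT INTO sessions_index (id, start, end)
SELECT rowid,
       CAST(strftime('%s', start_ts) AS INTEGER),
       CAST(strftime('%s', end_ts) AS INTEGER)
FROM sessions;

-- Sessions active at any point during a given minute, joined back for the details.
SELECT s.*
FROM sessions_index i
JOIN sessions s ON s.rowid = i.id
WHERE i.end >= CAST(strftime('%s', '2019-01-01 18:00:00') AS INTEGER)
  AND i.start <= CAST(strftime('%s', '2019-01-01 18:00:59') AS INTEGER);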