Another newbie PostgreSQL question.
I have something like this:
CREATE TABLE "user" ( -- "user" is a reserved word in PostgreSQL, so it must be quoted
userID bigserial primary key,
name varchar(50) NOT NULL,
created timestamp NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE session (
sessionID bigserial primary key,
userID bigint NOT NULL, -- bigint, to match the bigserial key it references
lastAction timestamp NULL DEFAULT NULL,
created timestamp NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE action (
actionID bigserial primary key,
sessionID bigint NOT NULL,
lastAction timestamp NULL DEFAULT NULL,
created timestamp NULL DEFAULT CURRENT_TIMESTAMP
);
A user can have many sessions, each with multiple session actions.
Each user has sessions which expire, in which case a new one is inserted and any action they take is catalogued there.
My question is: how do I grab actions for a single user, only from his sessions, and only if they happened within the last day, the last 2 days, the last week, the last month, or over all time?
I've looked at the docs and I think interval arithmetic is what I'm looking for, but so far I only know how to use it to expire sessions:
(part of a join here) e.lastAction >= now() - interval '4 hours'
That one either returns what I need or it doesn't. But how do I make it return all the records created since 1 day ago, 2 days ago, etc.? SQL syntax and logic are still a bit confusing to me.
So in an ideal world I'd ask a question like: how many actions has this user taken in the last 2 days? I have the relationships and timestamps in place, but every query I've written has met with failure.
I'm not sure which timestamp you want from the action table -- created or lastAction. In any case, the query you want is a basic join, where you filter on the user id and the timestamp:
select a.*
from action a join
     session s
     on a.sessionid = s.sessionid
where s.userid = v_userid and
      a.created >= now() - interval '1 day';
If you want the number of actions in the past two days, you would use aggregation:
select count(*)
from action a join
     session s
     on a.sessionid = s.sessionid
where s.userid = v_userid and
      a.created >= now() - interval '2 days';
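To cover the whole menu of ranges (1 day, 2 days, a week, a month, all time) without writing a query per range, one option is to pass the interval itself as a parameter. A minimal sketch, assuming a text parameter v_window (e.g. '2 days', '1 week', '1 month') where NULL means all time:
select count(*)
from action a
join session s on a.sessionid = s.sessionid
where s.userid = v_userid
  -- a NULL window means "all time"; otherwise cast the text to an interval
  and (v_window is null or a.created >= now() - v_window::interval);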
Sorry for titlegore.
Users can have at most 3 active posts, and each post is active for 48 hours.
I was thinking about just putting active_post_{1/2/3}_id and active_post_{1/2/3}_expires_at on the users table, but wondering if there is a better way to handle something like this.
I would store only the timestamp of the post and use middle-tier logic to restrict the number of active posts to three.
If you have a table like so:
create table posts (
id int generated always as identity,
user_id int not null references users(id),
created_at timestamptz not null,
post_text text not null
);
You can get the number of active posts with this query and disable the user's ability to create a new post if the result is three or more.
select count(*)
from posts
where user_id = ?
and created_at > now() - interval '48 hours';
This could be defeated by a determined attacker through multiple active sessions in your application, but if that is a concern, then I would use the same logic to restrict visible posts to only three per user. When pulling the list of posts to display:
with rnums as (
select user_id, created_at, post_text,
row_number() over (partition by user_id
order by created_at desc) as rn
from posts
where created_at > now() - interval '48 hours'
)
select user_id, created_at, post_text
from rnums
where rn <= 3
order by user_id, created_at desc;
If you want to use PostgreSQL to enforce this constraint, then you would need to bring triggers into the mix.
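For example, a BEFORE INSERT trigger along these lines could reject the fourth active post. This is only a sketch, assuming the posts table above and PostgreSQL 11+ (use EXECUTE PROCEDURE on older versions); note it shares the race-condition caveat of the application-side check unless you also lock:
create function check_active_posts() returns trigger as $$
begin
    -- count this user's posts from the last 48 hours
    if (select count(*)
        from posts
        where user_id = new.user_id
          and created_at > now() - interval '48 hours') >= 3 then
        raise exception 'user % already has 3 active posts', new.user_id;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger limit_active_posts
    before insert on posts
    for each row execute function check_active_posts();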
I'm using a Postgres database and my problem involves two tables; simplified versions of them are below.
CREATE TABLE events(
id SERIAL PRIMARY KEY NOT NULL,
max_persons INTEGER NOT NULL
);
and
CREATE TABLE requests(
id SERIAL PRIMARY KEY NOT NULL,
confirmed BOOLEAN NOT NULL,
creation_time TIMESTAMP DEFAULT NOW(),
event_id INTEGER NOT NULL /*foreign key*/
);
There are n events and each event can have up to events.max_persons participants. New requests need to be confirmed and are valid for up to 30 minutes. After that period a request is ignored if it was not confirmed.
Now what I want to do is only insert a new request, when the sum of all confirmed requests and all requests that are still valid, but not confirmed, is less than events.max_persons.
I already have a query to select a single event. Here is a simplified version of it, just to give you an idea of how it should work:
SELECT
    e.id,
    SUM(CASE WHEN r.confirmed THEN 1 ELSE 0 END) AS number_confirmed,
    SUM(CASE WHEN NOT r.confirmed AND r.creation_time > (CURRENT_TIMESTAMP - INTERVAL '30 MINUTE') THEN 1 ELSE 0 END) AS number_reserved,
    e.max_persons
FROM events e
JOIN requests r ON r.event_id = e.id
WHERE e.id = ?
AND (r.confirmed OR r.creation_time > (CURRENT_TIMESTAMP - INTERVAL '30 MINUTE'))
GROUP BY e.id, e.max_persons
HAVING SUM(CASE WHEN r.confirmed OR r.creation_time > (CURRENT_TIMESTAMP - INTERVAL '30 MINUTE') THEN 1 ELSE 0 END) < e.max_persons;
Is it possible to achieve this with a single INSERT command?
You could do that like this:
INSERT INTO requests
SELECT * FROM (VALUES (...)) AS new_row
WHERE ...
and write a WHERE clause that is only true if your condition is satisfied.
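Concretely, the conditional insert could look like this. A sketch only; the selected values for the new row are placeholders:
INSERT INTO requests (confirmed, event_id)
SELECT false, e.id
FROM events e
WHERE e.id = ?
  -- insert only while confirmed plus still-valid unconfirmed
  -- requests stay below the event's capacity
  AND (SELECT count(*)
       FROM requests r
       WHERE r.event_id = e.id
         AND (r.confirmed
              OR r.creation_time > current_timestamp - interval '30 minutes')
      ) < e.max_persons;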
But there is a fundamental problem with that approach, namely that it is subject to a race condition.
If two such statements run at the same time, both may find the condition satisfied, but once each one has added its row and committed, the condition can be violated. That is because neither statement can see the effects of the other before they commit.
There are two solutions for this:
Lock the table before you test and insert (a sketch follows below). That is simple, but very bad for concurrency.
Use SERIALIZABLE transactions throughout. Then this should cause a serialization error, and one of the statements has to be retried and will find the condition violated when it does.
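If you go the locking route, the shape would be something like this (a sketch):
BEGIN;
LOCK TABLE requests IN EXCLUSIVE MODE; -- blocks concurrent writers, not readers
-- run the conditional INSERT shown above
COMMIT;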
I'm looking for input on getting a COUNT of records that were 'active' in a certain date range.
CREATE TABLE member (
id int identity,
name varchar,
active bit
)
The scenario is one where member numbers fluctuate over time. So I could have linear growth where I have 10 members at the beginning of the month and 20 at the end. Currently we go off the number of CURRENTLY ACTIVE members (as marked by an 'active' flag in the DB) AT THE TIME OF THE REPORT. This is hardly accurate, and worse, 6 months from now my "members" figure may be substantially different than now. And since I'm doing averages per user, if I run a report now and again 6 months from now, the figures will probably be different.
I don't think a simple "dateActive" and "dateInactive" will do the trick... due to members coming and going and coming back etc. so:
JOE may be active 12-1, deactivated 12-8, and activated again 12-20,
so JOE counts as being a 'member' for 8 days and then 11 days, for a total of 19 days,
but the revolving-door status of members means keeping a separate table (presumably) of UserId, status, date:
CREATE TABLE memberstatus (
member_id int,
status bit, -- 0 for inactive, 1 for active
date date
)
(Adding this table would make the 'active' field in member obsolete.)
In order to get a "good" average of members per month (or over a date range), it seems I'd need to compute a daily average and then take an average of those averages over x days. Or is there some way in SQL to do this already?
This extra "status" table would allow an accurate count going back in time. So when you have a revenue or cost figure that doesn't change and is not an aggregate (it's fixed), and you want cost per member for last June, you certainly don't want to use your current member count; you want last June's.
Is this how it's done? I know it's one way, but is it the 'better' way...
@Gordon - I got ya, but I guess I was looking at records like this:
Members
1 Joe
2 Tom
3 Sue
MemberStatus
1 1 '12-01-2014'
1 0 '12-08-2014'
1 1 '12-20-2014'
In this way I only need the last record for a user to get their current status, but I can track back and "know" their status on any given day.
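With that layout, "status on any given day" is just the latest record at or before that day. A sketch (SQL Server syntax, with @member_id and @as_of as assumed parameters):
SELECT TOP 1 status
FROM memberstatus
WHERE member_id = @member_id
  AND [date] <= @as_of -- latest change on or before the day in question
ORDER BY [date] DESC;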
If I'm understanding your method, it might look like this:
CREATE TABLE memberstatus (
member_id int,
active_date date,
inactive_date date
)
So on the 1st through the 7th the record would look like this:
1 '12-01-2014' null
and on the 8th it would change to
1 '12-01-2014' '12-08-2014'
then on the 20th:
1 '12-01-2014' '12-08-2014'
1 '12-20-2014' null
Although I can get the same data out, it seems more difficult without any benefit. Am I missing something?
You could also use a two-table method to get a one-to-many relationship for working periods. For example, you have a User table:
User
UserID int, UserName varchar
and an Activity table that holds ranges
Activity
ActivityID int, UserID int, startDate date, (duration int or endDate date)
Then whenever you wanted information you could do something like (for example)...
SELECT User.UserName, count(*) from Activity
LEFT OUTER JOIN User ON User.UserID = Activity.UserID
WHERE startDate >= '2014-01-01' AND startDate < '2015-01-01'
GROUP BY User.UserID, User.UserName
...to get a count, grouped by user (and labeled by username), of the times they became active in 2014.
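If you go with the endDate option, total days of membership falls out of a similar query. A sketch (SQL Server syntax, treating an open-ended range as active through today):
SELECT Activity.UserID,
       SUM(DATEDIFF(day, startDate, COALESCE(endDate, GETDATE()))) AS daysActive
FROM Activity
GROUP BY Activity.UserID;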
I have used two main ways to accomplish what you want. First would be something like this:
CREATE TABLE [MemberStatus](
[MemberID] [int] NOT NULL,
[ActiveBeginDate] [date] NOT NULL,
[ActiveEndDate] [date] NULL,
CONSTRAINT [PK_MemberStatus] PRIMARY KEY CLUSTERED
(
[MemberID] ASC,
[ActiveBeginDate] ASC
)
)
Every time a member becomes active, you add an entry, and when they become inactive you update their ActiveEndDate to the current date.
This is easy to maintain, but can be hard to query. Another option is to do basically what you are suggesting: you can create a scheduled job to run at the end of each day and add entries to the table.
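For the first layout, note that a head count for any single day is still one query. A sketch (with @day as an assumed parameter):
SELECT COUNT(*) AS activeCount
FROM [MemberStatus]
WHERE [ActiveBeginDate] <= @day
  AND ([ActiveEndDate] IS NULL OR [ActiveEndDate] > @day); -- still active on @day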
I recommend setting up your tables so that you store more data, but in exchange the structure supports much simpler queries to achieve the reporting you require.
-- whenever a user's status changes, we update this table with the new "active"
-- bit, and we set "activeLastModified" to today.
CREATE TABLE member (
id int identity,
name varchar,
active bit,
activeLastModified date
)
-- whenever a user's status changes, we insert a new record here
-- with "startDate" set to the current "activeLastModified" field in member,
-- and "endDate" set to today (date of status change).
CREATE TABLE memberStatusHistory (
member_id int,
status bit, -- 0 for inactive, 1 for active
startDate date,
endDate date,
days int
)
As for the report you're trying to create (average # of actives in a given month), I think you need yet another table. Pure SQL over these table definitions alone won't get you there comfortably; pulling that data out of them is possible, but it requires procedural code.
If you ran something like this once-per-day and stored it in a table, then it would be easy to calculate weekly, monthly and yearly averages:
INSERT INTO myStatsTable (date, activeSum, inactiveSum)
SELECT
    GETDATE(), -- based on DBMS, e.g., "current_date" for Postgres
    active.cnt,
    inactive.cnt
FROM
    (SELECT COUNT(id) AS cnt FROM member WHERE active = 1) active
CROSS JOIN
    (SELECT COUNT(id) AS cnt FROM member WHERE active = 0) inactive
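Once that job has been running, period averages fall straight out of the stats table. A sketch of a monthly average (SQL Server syntax, assuming the myStatsTable above):
SELECT YEAR([date]) AS yr,
       MONTH([date]) AS mo,
       AVG(1.0 * activeSum) AS avgActive -- 1.0 * forces decimal rather than integer division
FROM myStatsTable
GROUP BY YEAR([date]), MONTH([date]);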
I'm a beginner in SQL, working on Oracle. My problem is doing interval arithmetic: adding to a timestamp column an integer value that lives in another table, interpreted as minutes.
To test my schema I used a data generator. As a result, some of the generated data is not reliable, and I need to check for overlaps between two appointments, i.e. when the same patient is invited to two treatments that overlap.
I have a treatment_appointment table that contains these attributes:
treatment_appointment(app_id NUMBER(38) NOT NULL,
[fk] care_id NUMBER(38) NOT NULL,
[fk] doctor_id NUMBER(38) NOT NULL,
[fk] room_id NUMBER(38) NOT NULL,
[fk] branch_id NUMBER(38) NOT NULL,
[fk] patient_id NUMBER(38) NOT NULL,
appointment_time TIMESTAMP NOT NULL)
Below is the code I wrote, and it gets an error message:
SELECT app1.app_id
FROM treatment_appointment app1
INNER JOIN treatment_appointment app2
ON app1.patient_id = app2.patient_id
WHERE app1.appointment_time >= app2.appointment_time AND
app1.appointment_time <=
app2.appointment_time + interval (to_char(select care_categories.care_duration where app2.care_id = care_categories.care_id)) minute
AND
app1.app_id != app2.app_id
The error message is:
ORA-00936: missing expression
Sorry about my English and thanks for answering my question!
You can only use a fixed string value for an INTERVAL literal, not a variable, an expression or a column value. But you can use the NUMTODSINTERVAL function to convert a number of minutes into an interval. Instead of:
interval (to_char(select care_categories.care_duration
where app2.care_id = care_categories.care_id)) minute
Use:
numtodsinterval((select cc.care_duration
                 from care_categories cc
                 where app2.care_id = cc.care_id), 'MINUTE')
Although you should join to that table in the main query rather than doing a subquery for every row:
SELECT app1.app_id
FROM treatment_appointment app1
INNER JOIN treatment_appointment app2
ON app1.patient_id = app2.patient_id
INNER JOIN care_categories cc
ON app2.care_id = cc.care_id
WHERE app1.appointment_time >= app2.appointment_time AND
app1.appointment_time <=
app2.appointment_time + numtodsinterval(cc.care_duration, 'MINUTE') AND
app1.app_id != app2.app_id
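If you prefer to stay close to interval literals, Oracle also allows multiplying a fixed one-minute literal by a number, so the right-hand side of the comparison could equivalently be written as (a sketch of the same predicate):
app2.appointment_time + cc.care_duration * INTERVAL '1' MINUTE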
I have a table called prices which includes the closing price of stocks that I am tracking daily.
Here is the schema:
CREATE TABLE `prices` (
`id` int(21) NOT NULL auto_increment,
`ticker` varchar(21) NOT NULL,
`price` decimal(7,2) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ticker` (`ticker`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=2200 ;
I am trying to calculate the % price drop for anything that has a price value greater than 0 for today and yesterday. Over time, this table will be huge and I am worried about performance. I assume this will have to be done on the MySQL side rather than PHP because LIMIT will be needed here.
How do I take the last 2 dates and do the % drop calculation in MySQL though?
Any advice would be greatly appreciated.
One problem I see right off the bat is using a timestamp data type for the date; this will complicate your SQL query for two reasons. You will have to use a range or convert to an actual date in your WHERE clause, but more importantly, since you state that you are interested in today's closing price and yesterday's closing price, you will have to keep track of the days when the market is open. Monday's query is different than Tuesday's through Friday's, and any day the market is closed for a holiday will have to be accounted for as well.
I would add a column like mktDay and increment it each day the market is open for business. Another approach might be to include a previousClose column, which makes your calculation trivial. I realize this violates normal form, but it saves an expensive self-join in your query.
If you cannot change the structure, then you will do a self-join to get yesterday's close, and you can calculate the % change and order by that % change if you wish.
Below is Eric's code, cleaned up a bit; it executed on my server running MySQL 5.0.27:
select
p_today.`ticker`,
p_today.`date`,
p_yest.price as `open`,
p_today.price as `close`,
((p_today.price - p_yest.price)/p_yest.price) as `change`
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.`date`) = date(p_yest.`date`) + INTERVAL 1 DAY
and p_today.price > 0
and p_yest.price > 0
and date(p_today.`date`) = CURRENT_DATE
order by `change` desc
limit 10
Note the back-ticks, as some of your column names and Eric's aliases are reserved words.
Also note that using a WHERE clause for the first table makes for a less expensive query: the WHERE gets executed first, and the self-join only has to consider the rows with a price greater than zero and today's date.
select
p_today.`ticker`,
p_today.`date`,
p_yest.price as `open`,
p_today.price as `close`,
((p_today.price - p_yest.price)/p_yest.price) as `change`
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.`date`) = date(p_yest.`date`) + INTERVAL 1 DAY
and p_yest.price > 0
where p_today.price > 0
and date(p_today.`date`) = CURRENT_DATE
order by `change` desc
limit 10
Scott brings up a great point about consecutive market days. I recommend handling this with a connector table like:
CREATE TABLE `market_days` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL DEFAULT '0000-00-00',
PRIMARY KEY USING BTREE (`market_day`),
UNIQUE KEY USING BTREE (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
As more market days elapse, just INSERT new date values in the table. market_day will increment accordingly.
When inserting prices data, look up LAST_INSERT_ID() for today's row, or the market_day value corresponding to a given date for past values.
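For example, the daily flow might look like this. A sketch; the ticker_id and price values are placeholders:
INSERT INTO market_days (`date`) VALUES (CURRENT_DATE);
SET @md = LAST_INSERT_ID(); -- the market_day just assigned
INSERT INTO prices (market_day, ticker_id, price)
VALUES (@md, 1, 42.50); -- example ticker_id and closing price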
As for the prices table itself, you can make storage, SELECT and INSERT operations much more efficient with a useful PRIMARY KEY and no AUTO_INCREMENT column. In the schema below, your PRIMARY KEY contains intrinsically useful information and isn't just a convention to identify unique rows. Using MEDIUMINT (3 bytes) instead of INT (4 bytes) saves an extra byte per row and more importantly 2 bytes per row in the PRIMARY KEY - all while still affording over 16 million possible dates and ticker symbols (each).
CREATE TABLE `prices` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`price` decimal (7,2) NOT NULL DEFAULT '00000.00',
PRIMARY KEY USING BTREE (`market_day`,`ticker_id`),
KEY `ticker_id` USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
;
In this schema each row is unique across each pair of market_day and ticker_id. Here ticker_id corresponds to a list of ticker symbols in a tickers table with a similar schema to the market_days table:
CREATE TABLE `tickers` (
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`ticker_symbol` VARCHAR(5),
`company_name` VARCHAR(50),
/* etc */
PRIMARY KEY USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
This yields a similar query to others proposed, but with two important differences: 1) There's no functional transformation on the date column, which destroys MySQL's ability to use keys on the join; in the query below MySQL will use part of the PRIMARY KEY to join on market_day. 2) MySQL can only use one key per JOIN or WHERE clause. In this query MySQL will use the full width of the PRIMARY KEY (market_day and ticker_id) whereas in the previous query it could only use one (MySQL will usually pick the more selective of the two).
SELECT
`market_days`.`date`,
`tickers`.`ticker_symbol`,
`yesterday`.`price` AS `close_yesterday`,
`today`.`price` AS `close_today`,
(`today`.`price` - `yesterday`.`price`) / (`yesterday`.`price`) AS `pct_change`
FROM
`prices` AS `today`
LEFT JOIN
`prices` AS `yesterday`
ON /* uses PRIMARY KEY */
`yesterday`.`market_day` = `today`.`market_day` - 1 /* this will join NULL for `today`.`market_day` = 0 */
AND
`yesterday`.`ticker_id` = `today`.`ticker_id`
INNER JOIN
`market_days` /* uses first 3 bytes of PRIMARY KEY */
ON
`market_days`.`market_day` = `today`.`market_day`
INNER JOIN
`tickers` /* uses KEY (`ticker_id`) */
ON
`tickers`.`ticker_id` = `today`.`ticker_id`
WHERE
`today`.`price` > 0
AND
`yesterday`.`price` > 0
;
A finer point is the need to also join against tickers and market_days in order to display the actual ticker_symbol and date, but these operations are very fast since they make use of keys.
Essentially, you can just join the table to itself to find the given % change. Then, order by change descending to get the largest changers on top. You could even order by abs(change) if you want the largest swings.
select
p_today.ticker,
p_today.date,
p_yest.price as open,
p_today.price as close,
-- Don't have to worry about 0 division here
(p_today.price - p_yest.price)/p_yest.price as change
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.date) = date(date_add(p_yest.date, interval 1 day))
and p_today.price > 0
and p_yest.price > 0
and date(p_today.date) = CURRENT_DATE
order by change desc
limit 10