I have the following assignment. I believe my guess is correct, but I could not find anything confirming my assumption about how frequency works with the COUNT function.
What was the most popular bike route (start/end station combination) in DC’s bike-share program over the first 3 months of 2012? How many times was the route taken?
• duration seconds: duration of the ride (in seconds)
• start time, end time: datetimes representing the beginning and end of the ride
• start station, end station: name of the start and end stations for the ride
This is the code I wrote. I wanted to check whether my guess is correct that the most popular route is a matter of frequency, which I compute with COUNT over the start/end station combination.
I would appreciate it if someone could confirm that my guess is right.
SELECT start_station, end_station, count(*) AS ct_route_taken
FROM tutorial.dc_bikeshare_q1_2012
GROUP BY start_station, end_station
ORDER BY ct_route_taken DESC
LIMIT 1;
Just count(*).
The name of the table would indicate that we need no WHERE clause.
If that's misleading and it covers a greater time interval, add a (proper!) WHERE clause like this:
WHERE start_time >= '2012-01-01'
AND start_time < '2012-04-01'
A filter with BETWEEN and an upper bound of '2012-03-31' would eliminate most of '2012-03-31', since start_time is supposed to be a "datetime" type. Depending on which type exactly, and on the time zone in which "the first 3 months" are supposed to be measured, we might also need to adjust for time zone.
See:
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_BETWEEN_.28especially_with_timestamps.29
Ignoring time zones altogether in Rails and PostgreSQL
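To illustrate the difference between the two filters, here is a minimal sketch in Python using the standard-library sqlite3 module. The rides table and its three rows are made up for the demonstration. SQLite compares the text timestamps lexically, which for these values gives the same outcome as a timestamp column compared against date literals that are read as midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rides (start_station TEXT, end_station TEXT, start_time TEXT)"
)
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?)",
    [
        ("A", "B", "2012-01-01 00:00:00"),  # first instant of Q1
        ("A", "B", "2012-03-31 18:45:00"),  # late on the last day of Q1
        ("A", "B", "2012-04-01 00:00:00"),  # first instant of Q2, must be excluded
    ],
)

# BETWEEN with a date-only upper bound stops at midnight of 2012-03-31,
# so the ride at 18:45 on that day is lost.
between_count = conn.execute(
    "SELECT count(*) FROM rides "
    "WHERE start_time BETWEEN '2012-01-01' AND '2012-03-31'"
).fetchone()[0]

# The half-open range keeps all of 2012-03-31 and excludes 2012-04-01.
half_open_count = conn.execute(
    "SELECT count(*) FROM rides "
    "WHERE start_time >= '2012-01-01' AND start_time < '2012-04-01'"
).fetchone()[0]

print(between_count, half_open_count)
```

The half-open form counts both Q1 rides, while the BETWEEN form silently drops the one on the afternoon of the last day.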
From the description, the query looks OK, provided the start station and end station names are written identically for each and every station. However, without looking at the table data it is a little difficult to confirm.
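The point about consistent station names can be sketched as follows, again with Python's sqlite3 module. The rides table and the station names are hypothetical and are not taken from the real data set. One row has a trailing space in the start station, and GROUP BY treats it as a separate route.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (start_station TEXT, end_station TEXT)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [
        ("Union Station", "Capitol"),
        ("Union Station", "Capitol"),
        ("Union Station ", "Capitol"),  # trailing space: grouped as a different route
        ("Capitol", "Union Station"),   # reverse direction is also a different route
    ],
)

query = """
SELECT start_station, end_station, count(*) AS ct_route_taken
FROM rides
GROUP BY start_station, end_station
ORDER BY ct_route_taken DESC
LIMIT 1
"""
top_route = conn.execute(query).fetchone()
print(top_route)
```

Here the top route is reported with a count of 2 rather than 3, because the misspelled row falls into its own group.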
I would like to write an SQL statement to find the number of users that are using a channel by date and time. Let me give you an example:
Let's call this table Data:
Date        Start  End
01.01.2020  17:00  17:30
01.01.2020  17:01  17:03
01.01.2020  17:29  18:30
Data is a table that shows when a user started a connection on a channel and when the connection was closed. A connection can be made at any time, which means from 00:00 until the next day.
What I am trying to achieve is to count the maximum number of connections that were open at the same time over a long period of time, say 1st February to 1st April.
My idea was to make another table with timestamps in Excel. The table would contain a timestamp for every minute of a specific date.
Then I tried to make a statement like:
SELECT *
FROM Data,Timestamps
WHERE Timestamps.Time BETWEEN Data.Start AND Data.End;
Logically, this statement does what it is supposed to do. The only problem is that it performs so poorly that it never finishes, given the number of timestamps and the amount of data it has to check.
Could anybody help me with this problem? Any other ideas I can try or how to improve my statement?
Regards!
Why do you create the other table in Excel and not directly in MS Access? And why don't you set up proper indexes on the timestamps? That can speed the query up considerably.
By the way, I think your statement returns one row for every connection that matches each timestamp, so the number of rows produced will be enormous. You should rather try
SELECT Timestamps.Time, COUNT(*)
FROM Data,Timestamps
WHERE Timestamps.Time BETWEEN Data.Start AND Data.End
GROUP BY Timestamps.Time;
But sorry if the syntax in MS Access is different.
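As a rough check of the grouped query, here is a small sketch using Python's sqlite3 module rather than MS Access, so the syntax is only an approximation of what Access accepts. The column names start_time, end_time, and t are renamed stand-ins (END is a keyword in SQLite), and the timestamps table covers only 17:00 to 18:30 for the three sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (start_time TEXT, end_time TEXT)")
# The primary key gives the timestamps an index, as suggested above.
conn.execute("CREATE TABLE timestamps (t TEXT PRIMARY KEY)")

conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("17:00", "17:30"), ("17:01", "17:03"), ("17:29", "18:30")],
)

# One row per minute from 17:00 to 18:30 inclusive, formatted as HH:MM.
minutes = [(f"{m // 60:02d}:{m % 60:02d}",) for m in range(17 * 60, 18 * 60 + 31)]
conn.executemany("INSERT INTO timestamps VALUES (?)", minutes)

# Count open connections per minute, then keep the busiest (earliest on ties).
query = """
SELECT ts.t, COUNT(*) AS connections
FROM data AS d
JOIN timestamps AS ts ON ts.t BETWEEN d.start_time AND d.end_time
GROUP BY ts.t
ORDER BY connections DESC, ts.t
LIMIT 1
"""
peak = conn.execute(query).fetchone()
print(peak)
```

With the sample rows, the peak is two simultaneous connections, first reached at 17:01.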
I have a table that contains something similar to the following columns:
infopath_form_id (integer)
form_type (integer)
approver (varchar)
event_timestamp (datetime)
This table contains the approval history for an InfoPath form, and each form that is submitted in the system is given a unique infopath_form_id for this history to be stored against. There is no consistent number of approvers for each form (it differs based on the value of the transaction), but there are always at least two approvers for a form. Each approval task is written as another row to the table, and only the history of previous approvals is stored within this table.
What I need to find out is the average time that is taken between approvals for each form type. I've tried tackling this every which way using partitions but I'm getting stuck given that there isn't a fixed number of approvers for each form. How should I approach this problem?
I believe you want this:
SELECT infopath_form_id
, DATEDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) / CAST(COUNT(*) - 1 AS FLOAT) AS avg_minutes_between_approvals
FROM Table
GROUP BY infopath_form_id
That will give you the average number of minutes between consecutive approvals for each infopath_form_id.
Explanation of functions used:
MIN() returns the earliest date
MAX() returns the latest date
DATEDIFF() returns the difference between two dates in a given unit (Minutes in this example)
COUNT() returns the number of rows per grouping item (i.e. infopath_form_id)
So simply divide the total minutes elapsed between the first and last record by one less than the number of records, which gives you the average number of minutes between events.
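The arithmetic can be checked with a small sketch in Python's sqlite3 module. SQLite has no DATEDIFF, so this version converts the timestamps to epoch seconds with strftime('%s', ...) instead. The table name approvals and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE approvals (
           infopath_form_id INTEGER,
           form_type        INTEGER,
           approver         TEXT,
           event_timestamp  TEXT
       )"""
)
conn.executemany(
    "INSERT INTO approvals VALUES (?, ?, ?, ?)",
    [
        (1, 10, "alice", "2020-01-01 10:00:00"),
        (1, 10, "bob",   "2020-01-01 10:30:00"),  # 30 minutes after alice
        (1, 10, "carol", "2020-01-01 11:30:00"),  # 60 minutes after bob
        (2, 10, "alice", "2020-01-02 09:00:00"),
        (2, 10, "bob",   "2020-01-02 09:20:00"),  # 20 minutes after alice
    ],
)

# (last - first) in minutes, divided by the number of gaps (rows - 1).
query = """
SELECT infopath_form_id,
       (CAST(strftime('%s', MAX(event_timestamp)) AS INTEGER)
        - CAST(strftime('%s', MIN(event_timestamp)) AS INTEGER)) / 60.0
       / (COUNT(*) - 1) AS avg_minutes
FROM approvals
GROUP BY infopath_form_id
ORDER BY infopath_form_id
"""
averages = conn.execute(query).fetchall()
print(averages)
```

Form 1 has gaps of 30 and 60 minutes, so its average is 45 minutes, and form 2 has a single gap of 20 minutes. Because every form has at least two approvers, the divisor is never zero.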
Hi, I am doing a project and I am stuck on the following. The question asks me to make a booking entry for a travel agency using previous records such as bookingid, customerid, flightID number, passenger details, etc. The booking can have a status of reserved or held. If the seat is confirmed right away, it is reserved; if not, the passenger has 24 hours to reserve it and change it from held to reserved status. If the seat isn't booked after 24 hours, it changes to expired status.
So far, what I was able to come up with is
INSERT (values) INTO the different tables and when it is booked right bookingid.status = R or bookingid.status = bookingtime > 24 = E
I'm without a clue here, so I would appreciate some help!
There are many ways of doing this, but the easiest would be to initialize the booking status to reserved if the seat is booked right away, and to held if it is not. You would then rely on a view (or a similar approach) to get the dynamically calculated status. If the user reserves the booking at a later time, you simply update the booking status to reserved.
Note that I don't necessarily suggest representing statuses as strings; that's just for the example.
SELECT
CASE
WHEN status = 'held' AND DATEDIFF(hh, booking_date, CURRENT_TIMESTAMP) > 24 THEN 'expired'
ELSE status
END AS status
FROM booking
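Here is a minimal sketch of the same idea, run through Python's sqlite3 module. SQLite has no DATEDIFF, so the age of the booking is computed from epoch seconds instead. The booking_id column, the sample rows, and the fixed as_of timestamp (used in place of the current time so the result is reproducible) are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking (booking_id INTEGER, status TEXT, booking_date TEXT)"
)
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?)",
    [
        (1, "reserved", "2020-01-01 08:00:00"),  # confirmed right away
        (2, "held",     "2020-01-02 09:00:00"),  # held for 3 hours at as_of
        (3, "held",     "2020-01-01 08:00:00"),  # held for 28 hours at as_of
    ],
)

as_of = "2020-01-02 12:00:00"  # stand-in for the current time

# The stored status never becomes 'expired'; it is derived when reading.
query = """
SELECT booking_id,
       CASE
         WHEN status = 'held'
              AND CAST(strftime('%s', :as_of) AS INTEGER)
                  - CAST(strftime('%s', booking_date) AS INTEGER) > 24 * 3600
         THEN 'expired'
         ELSE status
       END AS status
FROM booking
ORDER BY booking_id
"""
statuses = conn.execute(query, {"as_of": as_of}).fetchall()
print(statuses)
```

Only the booking that has been held for more than 24 hours is reported as expired, and no scheduled job is needed to flip the stored value.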
I want to record user states and then be able to report historically based on the record of changes we've kept. I'm trying to do this in SQL (using PostgreSQL) and I have a proposed structure for recording user changes like the following.
CREATE TABLE users (
userid SERIAL NOT NULL PRIMARY KEY,
name VARCHAR(40),
status CHAR NOT NULL
);
CREATE TABLE status_log (
logid SERIAL,
userid INTEGER NOT NULL REFERENCES users(userid),
status CHAR NOT NULL,
logcreated TIMESTAMP
);
That's my proposed table structure, based on the data.
For the status field, 'a' represents an active user and 's' represents a suspended user:
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 's', '2008-01-01');
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 'a', '2008-02-01');
So this user was suspended on 1st Jan and active again on 1st of February.
If I wanted to get a suspended list of customers on 15th January 2008, then userid 1 should show up. If I get a suspended list of customers on 15th February 2008, then userid 1 should not show up.
1) Is this the best way to structure this data for this kind of query?
2) How do I query the data in either this structure or in your proposed modified structure so that I can simply have a date (say 15th January) and find a list of customers that had an active status on that date in SQL only? Is this a job for SQL?
This can be done, but would be a lot more efficient if you stored the end date of each log. With your model you have to do something like:
select l1.userid
from status_log l1
where l1.status='s'
and l1.logcreated = (select max(l2.logcreated)
from status_log l2
where l2.userid = l1.userid
and l2.logcreated <= date '2008-02-15'
);
With the additional column it would be more like:
select userid
from status_log
where status='s'
and logcreated <= date '2008-02-15'
and logsuperseded >= date '2008-02-15';
(Apologies for any syntax errors, I don't know PostgreSQL.)
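The first query, against the original status_log structure, can be tried out with the two sample rows from the question. This is a sketch using Python's sqlite3 module rather than PostgreSQL, with dates stored as ISO text, and the helper name suspended_on is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE status_log (
           logid      INTEGER PRIMARY KEY,
           userid     INTEGER NOT NULL,
           status     TEXT NOT NULL,
           logcreated TEXT
       )"""
)
conn.executemany(
    "INSERT INTO status_log (userid, status, logcreated) VALUES (?, ?, ?)",
    [(1, "s", "2008-01-01"), (1, "a", "2008-02-01")],
)

def suspended_on(day):
    """Return users whose most recent log entry on or before `day` is 's'."""
    query = """
    SELECT l1.userid
    FROM status_log l1
    WHERE l1.status = 's'
      AND l1.logcreated = (SELECT MAX(l2.logcreated)
                           FROM status_log l2
                           WHERE l2.userid = l1.userid
                             AND l2.logcreated <= ?)
    """
    return [row[0] for row in conn.execute(query, (day,))]

print(suspended_on("2008-01-15"), suspended_on("2008-02-15"))
```

User 1 appears in the list for 15th January and not for 15th February, which matches the behaviour asked for in the question.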
To address some further issues raised by Phil:
A user might get moved from active, to suspended, to cancelled, to active again. This is a simplified version, in reality, there are even more states and people can be moved directly from one state to another.
This would appear in the table like this:
userid  from        to          status
FRED    2008-01-01  2008-01-31  s
FRED    2008-02-01  2008-02-07  c
FRED    2008-02-08              a
I used a null for the "to" date of the current record. I could have used a future date like 2999-12-31 but null is preferable in some ways.
Additionally, there would be no "end date" for the current status either, so I think this slightly breaks your query?
Yes, my query would have to be re-written as
select userid
from status_log
where status='s'
and logcreated <= date '2008-02-15'
and (logsuperseded is null or logsuperseded >= date '2008-02-15');
A downside of this design is that whenever the user's status changes you have to end-date their current status_log row as well as create a new one. However, that isn't difficult, and I think the query advantage probably outweighs this.
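The end-dating step and the rewritten query can be sketched together as follows, using Python's sqlite3 module. The helpers change_status and users_with_status are hypothetical names, and the choice to close the old row on the day before the new one starts is an assumption taken from the FRED table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE status_log (
           userid        TEXT NOT NULL,
           status        TEXT NOT NULL,
           logcreated    TEXT NOT NULL,
           logsuperseded TEXT          -- NULL marks the current status
       )"""
)

def change_status(userid, status, day):
    """End-date the user's open row (if any), then insert the new status."""
    conn.execute(
        "UPDATE status_log SET logsuperseded = date(?, '-1 day') "
        "WHERE userid = ? AND logsuperseded IS NULL",
        (day, userid),
    )
    conn.execute(
        "INSERT INTO status_log VALUES (?, ?, ?, NULL)", (userid, status, day)
    )

def users_with_status(status, day):
    """Return users whose status interval covers `day`."""
    query = """
    SELECT userid
    FROM status_log
    WHERE status = ?
      AND logcreated <= ?
      AND (logsuperseded IS NULL OR logsuperseded >= ?)
    """
    return [row[0] for row in conn.execute(query, (status, day, day))]

change_status("FRED", "s", "2008-01-01")
change_status("FRED", "c", "2008-02-01")
change_status("FRED", "a", "2008-02-08")

rows = conn.execute("SELECT * FROM status_log ORDER BY logcreated").fetchall()
print(rows)
print(users_with_status("a", "2008-02-15"))
```

After the three changes the table holds the same three intervals as the FRED example, and the lookup for any date is a plain range check with no subquery.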
Does Postgres support analytic queries? This would give the active users on 2008-02-15
select userid
from
(
select logid,
userid,
status,
logcreated,
max(logcreated) over (partition by userid) max_logcreated_by_user
from status_log
where logcreated <= date '2008-02-15'
) t
where logcreated = max_logcreated_by_user
and status = 'a';
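The window-function version can be tried in the same way. This sketch uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. User 2 and the helper name active_on are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE status_log (
           logid      INTEGER PRIMARY KEY,
           userid     INTEGER NOT NULL,
           status     TEXT NOT NULL,
           logcreated TEXT
       )"""
)
conn.executemany(
    "INSERT INTO status_log (userid, status, logcreated) VALUES (?, ?, ?)",
    [
        (1, "s", "2008-01-01"),
        (1, "a", "2008-02-01"),
        (2, "a", "2008-01-05"),  # hypothetical second user
        (2, "s", "2008-02-10"),
    ],
)

def active_on(day):
    """Return users whose latest log entry on or before `day` is 'a'."""
    query = """
    SELECT userid
    FROM (
        SELECT userid, status, logcreated,
               MAX(logcreated) OVER (PARTITION BY userid) AS max_logcreated_by_user
        FROM status_log
        WHERE logcreated <= ?
    ) AS t
    WHERE logcreated = max_logcreated_by_user
      AND status = 'a'
    ORDER BY userid
    """
    return [row[0] for row in conn.execute(query, (day,))]

print(active_on("2008-01-15"), active_on("2008-02-15"))
```

On 15th January only user 2 is active, and on 15th February only user 1 is, since each user's latest entry before the date decides the result.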
#Tony the "end" date isn't necessarily applicable.
A user might get moved from active, to suspended, to cancelled, to active again. This is a simplified version, in reality, there are even more states and people can be moved directly from one state to another.
Additionally, there would be no "end date" for the current status either, so I think this slightly breaks your query?
#Phil
I like Tony's solution. It seems to model the situation described most appropriately. Any particular user has a status for a given period of time (a minute, an hour, a day, etc.), but it is for a duration, not an instant in time. Since you want to know who was active during a certain period of time, modeling the information as a duration seems like the best approach.
I am not sure that additional statuses are a problem. If someone is active, then suspended, then cancelled, then active again, each of those statuses would be applicable for a given duration, would they not? It may be a very short duration, such as a few seconds or a minute, but each would still apply for a length of time.
Are you concerned that a person's status can change multiple times in a given day, but you want to know who was active for a given day? If so, then you just need to define more specifically what it means to be active on a given day. If it is enough that they were active for any part of that day, then Tony's answer works well as is. If they would have to be active for a certain amount of time in a given day, then Tony's solution could be modified to determine the length of time (in hours, minutes, or days) and to add further restrictions in the WHERE clause to retrieve the proper date, status, and length of time in that status.
As for there being no "end date" for the current status, that is no problem either, as long as the end date is nullable. Simply use something like "WHERE enddate >= '2008-08-15' OR enddate IS NULL".