Finding time concurrency with an Oracle select - sql

I have the following data in a DB table and would like to find concurrent transactions based on the start and end time.
Patient | Start Time | End Time
--------|------------|---------
John | 08:31A | 10:49A
Jim | 10:14A | 10:30A
Jerry | 10:15A | 10:28A
Alice | 10:18A | 12:29P
Bobby | 10:32A | 10:49A
Sally | 10:46A | 10:55A
Jane | 10:52A | 11:29A
Jules | 10:54A | 11:40A
Adam | 10:58A | 11:25A
Ben | 11:00A | 11:20A
Ann | 11:31A | 11:56A
Chris | 11:49A | 11:57A
Nick | 12:00P | 12:21P
Dave | 12:00P | 12:35P
Steve | 12:23P | 12:29P
If I want to find any overlapping times with a particular input, how would I write it? For example say I want to find overlaps with the 10:58A-11:25A time. This needs to find potential time concurrencies. I am looking for a count of the maximum number of concurrent overlaps. In my example for 10:58A-11:25A, I would want to see the following (count would be 5):
Patient | Start Time | End Time
--------|------------|---------
Alice | 10:18A | 12:29P
Jane | 10:52A | 11:29A
Jules | 10:54A | 11:40A
Adam | 10:58A | 11:25A
Ben | 11:00A | 11:20A
In my second example, some of the concurrent times overlap with the time I am looking for, but they are over before another range starts. So, say I am looking for 10:46A-10:55A. I would expect a count of 4. In this example 08:31A-10:49A is dropped because it is over before 10:52A-11:29A starts. The same goes for 10:32A-10:49A: it was over before 10:52A-11:29A started. So the most that are concurrent within the 10:46A-10:55A range is 4, even though overall there are 6 overlapping rows (2 dropped).
Patient | Start Time | End Time
--------|------------|---------
Alice | 10:18A | 12:29P
Sally | 10:46A | 10:55A
Jane | 10:52A | 11:29A
Jules | 10:54A | 11:40A
Can this be done with a sql statement?

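The easiest and cleanest method is to exclude the rows that cannot overlap.
Keep all rows except:
rows that end before the start time
rows that begin after the end time
Sample (your_table, start_time, end_time and the bind variables :begin_time and :end_time are placeholder names; START is a reserved word in Oracle, so avoid it as a column name):
select *
from your_table t
where not (
      t.end_time < :begin_time
   or t.start_time > :end_time
)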
Sample in sqlfiddle
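
The filter above returns every row that touches the window, while the question asks for the largest number of rows active at the same moment inside it. One way to get that count is a sweep over start and end events. The following is only a minimal sketch in Python rather than Oracle SQL; `to_minutes` and `max_concurrent` are hypothetical helper names, and end times are assumed to be inclusive.

```python
# Patient rows from the question: (name, start, end).
ROWS = [
    ("John", "08:31A", "10:49A"), ("Jim", "10:14A", "10:30A"),
    ("Jerry", "10:15A", "10:28A"), ("Alice", "10:18A", "12:29P"),
    ("Bobby", "10:32A", "10:49A"), ("Sally", "10:46A", "10:55A"),
    ("Jane", "10:52A", "11:29A"), ("Jules", "10:54A", "11:40A"),
    ("Adam", "10:58A", "11:25A"), ("Ben", "11:00A", "11:20A"),
    ("Ann", "11:31A", "11:56A"), ("Chris", "11:49A", "11:57A"),
    ("Nick", "12:00P", "12:21P"), ("Dave", "12:00P", "12:35P"),
    ("Steve", "12:23P", "12:29P"),
]


def to_minutes(label):
    """Convert a label such as '12:29P' to minutes since midnight."""
    hh, mm = label[:-1].split(":")
    hour = int(hh) % 12 + (12 if label[-1] == "P" else 0)
    return hour * 60 + int(mm)


def max_concurrent(rows, window_start, window_end):
    """Largest number of rows active at one moment inside the window."""
    lo, hi = to_minutes(window_start), to_minutes(window_end)
    events = []
    for _name, start, end in rows:
        s, e = to_minutes(start), to_minutes(end)
        if e < lo or s > hi:
            continue  # same exclusion rule as the SQL filter above
        # Clip each row to the window; the middle value makes a start
        # sort before an end at the same minute (inclusive end times).
        events.append((max(s, lo), 0, 1))
        events.append((min(e, hi), 1, -1))
    events.sort()
    current = best = 0
    for _time, _order, delta in events:
        current += delta
        best = max(best, current)
    return best


print(max_concurrent(ROWS, "10:58A", "11:25A"))
print(max_concurrent(ROWS, "10:46A", "10:55A"))
```

With the sample rows this gives 5 for the 10:58A-11:25A window and 4 for the 10:46A-10:55A window, matching the counts in the question.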

Related

Counting Records Over a Rolling Period of Time SQL (Based on Last Record)

I'm relatively new to SQL and trying to pull the following from an Oracle SQL database.
Let's say I have a table of users and the time that they logged in that looks like this:
Name LOG_IN
Jim 13:00:05
Patrick 13:02:23
Steve 13:02:44
Emma 13:03:16
Steve 13:04:44
Jim 13:04:05
Jim 13:05:05
Jim 13:05:06
Patrick 13:05:17
Emma 13:05:18
Steve 13:08:13
Say I want to run a report which tells me each user's last login and all of that user's logins within the 5 minutes before it. If I were doing this in another language, I would just use a for loop to get the last login, then count back through the previous logins and check whether each login time falls within the 5-minute window. I am unsure how I would accomplish the same thing with Oracle SQL. For example, for someone like Jim, his last login is at 13:05:06, so I would want all the times he logged in between 13:00:06 and 13:05:06, which would be:
Name LOG_IN
Jim 13:04:05
Jim 13:05:05
Jim 13:05:06
So the very first login (at 13:00:05) would not be included because it's not in the range.
The same report would return results for the other users as well, so for Steve, the following would be returned:
Name LOG_IN
Steve 13:04:44
Steve 13:08:13
And the first login (at 13:02:44) would not be returned.
When I first looked at this, I thought the requirement was to pull all transactions within 5 minutes of the time of the report, but I have since learned I need to do this rolling period calculation based on the last login.
select a.Name, a.LOG_IN
from <table_name> a
where a.LOG_IN >
      -- latest login of the same user, minus 5 minutes
      (select max(b.LOG_IN) from <table_name> b where b.Name = a.Name) - interval '5' minute;
Here's a sqlfiddle link:
http://sqlfiddle.com/#!4/b231e/11/0
(don't know how long it will be persistent...)
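
For comparison, the same "last login minus 5 minutes, per user" rule can be sketched outside the database. This is only an illustration in Python using the sample rows from the question; `recent_logins` is a hypothetical helper name, and the lower bound is exclusive, as in the query above.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (name, login time).
LOGINS = [
    ("Jim", "13:00:05"), ("Patrick", "13:02:23"), ("Steve", "13:02:44"),
    ("Emma", "13:03:16"), ("Steve", "13:04:44"), ("Jim", "13:04:05"),
    ("Jim", "13:05:05"), ("Jim", "13:05:06"), ("Patrick", "13:05:17"),
    ("Emma", "13:05:18"), ("Steve", "13:08:13"),
]


def recent_logins(logins, minutes=5):
    """Keep each user's logins that fall within `minutes` before their last one."""
    parsed = [(name, datetime.strptime(t, "%H:%M:%S")) for name, t in logins]

    # Step 1: latest login per user (the correlated max() in the SQL).
    last = {}
    for name, ts in parsed:
        if name not in last or ts > last[name]:
            last[name] = ts

    # Step 2: keep rows strictly later than (last login - window).
    window = timedelta(minutes=minutes)
    kept = sorted((name, ts) for name, ts in parsed if ts > last[name] - window)
    return [(name, ts.strftime("%H:%M:%S")) for name, ts in kept]


for name, login in recent_logins(LOGINS):
    print(name, login)
```

For Jim this keeps 13:04:05, 13:05:05 and 13:05:06, and for Steve it keeps 13:04:44 and 13:08:13, as described in the question.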

Keeping an updated tally of changing records

I have a list of students and their subjects:
id | student | subject
---|---------|--------
1 | adam | math
2 | bob | english
3 | charlie | math
4 | dan | english
5 | erik | math
And I create a tally from the above list aggregating how many students are there in each subject:
id | subject | students
---|---------|--------
1 | math | 3
2 | english | 2
The student list will keep on expanding and this aggregation will be done at regular intervals.
The reason I'm keeping the Tally in a separate table in the first place is because the original table is supposed to be massive (this is just a simplification of my original problem) and so querying the original table for a current tally on-the-fly is unfeasible to do quickly enough.
Anyways, so the aggregating is pretty straight forward as long as the students don't change their subject.
But now I want to add a feature to allow students to change their subject.
My previous approach was this: while updating the Tally, I keep a counter variable up to which row of students I've already accounted for. Next time I only consider records added after that row.
Also the reason why I keep a counter is because the Students table is massive, and I don't want to scan the whole table every time as it won't scale well.
It works fine if all students are unique and no one changes their subject.
But it breaks apart now because I can no longer account for rows that come before the counter and were updated.
My second approach was using an updated_at field (instead of the counter) and keeping track of newly modified rows that way.
But I still don't know how to actually update the Tally accurately.
Say Erik changes his subject from "math" to "english" in the above scenario. When I run the script to update the Tally, it does find the newly updated row, but it simply says {"erik": "english"}. How would I know what it changed from? I need to know this to correctly decrement "math" in the Tally table while incrementing "english".
Is there a way this can be solved?
To summarize my question again, I want to find a way to be able to update the Tally table accurately (a process that runs at regular interval) with the updated/modified rows in the Student table.
I'm using NodeJS and PostgreSQL if it matters.
Why don't you update the tally at the moment a student adds, removes, or changes a subject?
When a student adds a new subject, just increment: UPDATE tbl_tally SET students = students + 1 WHERE subject = :subject;
When a student removes a subject, just decrement: UPDATE tbl_tally SET students = students - 1 WHERE subject = :subject;
When a student changes subject, decrement the old subject by one and increment the new subject by one:
UPDATE tbl_tally SET students = students - 1 WHERE subject = :old_subject;
UPDATE tbl_tally SET students = students + 1 WHERE subject = :new_subject;
I am not familiar with PostgreSQL, but in MySQL you can even do this with a trigger, and PostgreSQL supports triggers as well.
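
The trigger idea can be sketched as follows. This is an assumption-laden illustration, not PostgreSQL code: it uses Python's built-in sqlite3 module as a stand-in database, and the table, trigger and function names (`students`, `tally`, `make_db`, `current_tally`) are hypothetical. The point it shows is that a row-level trigger sees both the OLD and the NEW subject, which answers the "what did it change from" problem.

```python
import sqlite3

SCHEMA = """
CREATE TABLE students (id INTEGER PRIMARY KEY, student TEXT, subject TEXT);
CREATE TABLE tally (subject TEXT PRIMARY KEY, students INTEGER NOT NULL DEFAULT 0);

-- A new student row: make sure the subject exists, then increment it.
CREATE TRIGGER students_added AFTER INSERT ON students
BEGIN
    INSERT OR IGNORE INTO tally (subject, students) VALUES (NEW.subject, 0);
    UPDATE tally SET students = students + 1 WHERE subject = NEW.subject;
END;

-- A subject change: decrement the old subject, increment the new one.
CREATE TRIGGER students_changed AFTER UPDATE OF subject ON students
WHEN OLD.subject <> NEW.subject
BEGIN
    UPDATE tally SET students = students - 1 WHERE subject = OLD.subject;
    INSERT OR IGNORE INTO tally (subject, students) VALUES (NEW.subject, 0);
    UPDATE tally SET students = students + 1 WHERE subject = NEW.subject;
END;

-- A removed student row: decrement the subject it had.
CREATE TRIGGER students_removed AFTER DELETE ON students
BEGIN
    UPDATE tally SET students = students - 1 WHERE subject = OLD.subject;
END;
"""


def make_db():
    """In-memory database loaded with the five students from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO students (student, subject) VALUES (?, ?)",
        [("adam", "math"), ("bob", "english"), ("charlie", "math"),
         ("dan", "english"), ("erik", "math")],
    )
    return conn


def current_tally(conn):
    """Return the tally table as {subject: students}."""
    return dict(conn.execute("SELECT subject, students FROM tally"))


conn = make_db()
print(current_tally(conn))
conn.execute("UPDATE students SET subject = 'english' WHERE student = 'erik'")
print(current_tally(conn))
```

After the inserts the tally is math 3 and english 2; after Erik's change it is math 2 and english 3, without rescanning the students table.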

I am wondering if there is an elegant way to apply either a combination of query, Arrayformula, sort, functions in Google Sheets to do the following

Google Sheets Problem. I have a master list that has columns which are employers, job post, # of spots, parameter x, parameter y,...etc.
"Master Sheet" #a tab
Employers Job Spots
John Cleaner 1
Mike Cleaner 2
John Cleaner 3
John Server 5
Alice Cook 1
Dave Cook 1
Mary Cleaner 3
Alice Server 5
Alice Cleaner 2
Dave Server 4
Mike Server 3
Alice Server 1
This is what I would like "Output Sheet" #another tab with two columns. 1st is Jobs and 2nd is # of employers that account for 80% of the jobs in that category plus any additional filters. The idea is to give a single # that gives an 80/20 rule type metric. The trick is to Sort one column from highest to lowest first. I can do this but in multiple steps that seem annoyingly inefficient. I wonder if there is a better way where I can put everything in one cell and drag down or do a query function. The output looks like below.
Job | # of employers that account for ~80% of all the jobs in that category + filters
----|--------
Cleaner | ~3
Cook | 1
Server | ~3
#because total Cleaner jobs is 11. 80% is 8.8. And sorting employers highest to lowest (after accounting for duplicates), 3 employers represent 80% of the Cleaner jobs available. Server total is 18, 80% is 14.4, so ~3 employers represent 80% of the Server jobs available.
Thank you all for your help.
To take 80%:
=query(A15:C26, "Select B, sum(C)*80/100 group by B label B 'Job'")
you will get
{8.8, 1.6, 14.4}
From there you can continue by yourself.
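
The remaining steps (sum spots per employer, sort highest to lowest, count employers until the 80% target is reached) are sketched below in Python rather than as a Sheets formula, purely to make the logic concrete. `employers_for_share` is a hypothetical name, and the sketch assumes the strict reading "smallest number of employers whose cumulative spots reach at least 80%". Under that reading Cook comes out as 2 rather than the 1 listed in the question, because a single employer covers only 50% of the Cook spots.

```python
from collections import defaultdict

# Rows from the "Master Sheet" in the question: (employer, job, spots).
MASTER = [
    ("John", "Cleaner", 1), ("Mike", "Cleaner", 2), ("John", "Cleaner", 3),
    ("John", "Server", 5), ("Alice", "Cook", 1), ("Dave", "Cook", 1),
    ("Mary", "Cleaner", 3), ("Alice", "Server", 5), ("Alice", "Cleaner", 2),
    ("Dave", "Server", 4), ("Mike", "Server", 3), ("Alice", "Server", 1),
]


def employers_for_share(rows, share=0.8):
    """For each job, count the top employers needed to reach `share` of its spots."""
    # Combine duplicate (employer, job) rows first.
    totals = defaultdict(lambda: defaultdict(int))
    for employer, job, spots in rows:
        totals[job][employer] += spots

    result = {}
    for job, per_employer in totals.items():
        target = share * sum(per_employer.values())
        running = 0
        count = 0
        # Walk employers from most spots to fewest until the target is met.
        for spots in sorted(per_employer.values(), reverse=True):
            running += spots
            count += 1
            if running >= target:
                break
        result[job] = count
    return result


print(employers_for_share(MASTER))
```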

Oracle Find duplicate records that are similar but aren't exact matches

I'm trying to find a way to search a text field to identify rows that could be similar so I can identify if they are duplicates that should be merged. For example if my data looks like this:
MyText_Column
Bob
Bobby
Robert
Jane
Janey
Janie
Joe
John
Johnathan
A GROUP BY won't work because none of the values are exactly the same, but if I could have a query that would return a list of the likelihood that one row is similar would be great. Maybe there's a better layout but what I am imagining is a result like this:
Query Result
Search Compare Likely_Match
Bob Bobby 96%
Bob Robert 12%
Bob Jane 0%
Bob Janey 0%
.....
Jane Janey 87%
Jane Janie 69%
Jane Joe 12%
Then, with a result like that, I could sort by likelihood and visually scan to determine whether the results are duplicates or not.
The UTL_MATCH package has a couple of methods to do that-- my guess is that you would want to use the Jaro-Winkler similarity algorithm. Something like
SELECT a.mytext_column search,
       b.mytext_column compare,
       utl_match.jaro_winkler_similarity( a.mytext_column, b.mytext_column ) similarity
  FROM table_name a,
       table_name b
 WHERE a.<<primary key>> != b.<<primary key>>
 ORDER BY utl_match.jaro_winkler_similarity( a.mytext_column, b.mytext_column ) desc
That will generate a result set of N * (N-1) rows which may be unwieldy depending on the number of rows in the original data set. You may want to restrict things by only returning the best matches for a particular search term or only returning the rows that have a similarity score greater than some threshold.
You could also use the SOUNDEX function.
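
To illustrate the shape of the pairwise comparison and the threshold idea outside Oracle, here is a rough Python sketch. It does not use Jaro-Winkler: it swaps in `difflib.SequenceMatcher` from the standard library, so the scores will differ from what UTL_MATCH returns. `similarity` and `likely_matches` are hypothetical helper names.

```python
from difflib import SequenceMatcher
from itertools import permutations

# Values from the question's MyText_Column.
NAMES = ["Bob", "Bobby", "Robert", "Jane", "Janey", "Janie",
         "Joe", "John", "Johnathan"]


def similarity(a, b):
    """Score 0-100 from SequenceMatcher (a stand-in, not Jaro-Winkler)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def likely_matches(names, threshold=0):
    """All ordered pairs (search, compare, score) at or above the threshold, best first."""
    pairs = [(a, b, similarity(a, b)) for a, b in permutations(names, 2)]
    kept = [p for p in pairs if p[2] >= threshold]
    return sorted(kept, key=lambda p: p[2], reverse=True)


# Without a threshold there are N * (N - 1) rows; a threshold trims the list.
print(len(likely_matches(NAMES)))
for search, compare, score in likely_matches(NAMES, threshold=70):
    print(search, compare, score)
```

Raising the threshold plays the same role as restricting the SQL result to rows above some similarity score.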

How to tally and store votes for a web site?

I am using SQL Server 2005.
I have a site where people can vote on awesome motorcycles. Each time a user votes, there is one vote for the first bike and one vote against the second bike, so two votes are stored in the database. The vote table looks like this:
VoteID | VoteDate | BikeID | Vote
-------|----------|--------|-----
1 | 2012-01-12 | 123 | 1
2 | 2012-01-12 | 125 | 0
3 | 2012-01-12 | 126 | 0
4 | 2012-01-12 | 129 | 1
I want to tally the votes for each bike quite frequently, say each hour. My idea is to store the tally as a percentage of contests won versus lost on the bike table as an attribute of the bike. So, if a bike won 10 contests and lost 20 contests, it would have a score (tally) of 33. I would tally up daily, weekly, and monthly scores.
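BikeID | BikeName | DailyTally | WeeklyTally | MonthlyTally
-------|----------|------------|-------------|-------------
1 | Big Dog | 5 | 10 | 50
2 | Big Cat | 3 | 15 | 40
3 | Small Dog | 9 | 8 | 0
4 | Fish Face | 19 | 21 | 0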
Right now, there are about 500 votes per day being cast. We anticipate 2500 - 5000 per day in the next month or so.
What is the best way to tally the data and what is the best way to store it? Should the tallies be on their own table? Should a trigger be used to run a new tally each time a bike is voted on? Should a stored procedure be run hourly to get all tallies?
Any ideas would be very helpful!
Store your VoteDate as a datetime value instead of just date.
For your tallies, you can just make that a view and calculate it on the fly. This should be very simple to do using GROUP BY and DATEPART functions. If you need exact code for how to do this, please open a new question.
For that low volume of rows it doesn't make any sense to store aggregations in a table when you can just calculate them whenever you want to see them and get accurate and immediate results that are up-to-date.
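
As a rough illustration of calculating the tally on the fly, the sketch below runs a GROUP BY over the vote table for a given period. It is not SQL Server code: it uses Python's sqlite3 module as a stand-in, stores VoteDate as ISO text instead of a datetime column, and filters with a plain date comparison rather than DATEPART. `make_votes` and `tally_since` are hypothetical names, and the sample votes are made up to mirror the "10 won, 20 lost" case from the question.

```python
import sqlite3


def make_votes(rows):
    """Create an in-memory Vote table from (VoteDate, BikeID, Vote) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Vote ("
        "VoteID INTEGER PRIMARY KEY, VoteDate TEXT, BikeID INTEGER, Vote INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Vote (VoteDate, BikeID, Vote) VALUES (?, ?, ?)", rows
    )
    return conn


def tally_since(conn, since):
    """Percentage of contests won per bike for votes cast on or after `since`."""
    sql = """
        SELECT BikeID,
               CAST(ROUND(100.0 * SUM(Vote) / COUNT(*)) AS INTEGER) AS Tally
        FROM Vote
        WHERE VoteDate >= ?
        GROUP BY BikeID
        ORDER BY BikeID
    """
    return dict(conn.execute(sql, (since,)))


# Hypothetical data: bike 123 wins 10 and loses 20 contests on one day,
# plus one older win that falls outside the daily window.
sample = [("2012-01-12 10:00:00", 123, 1)] * 10
sample += [("2012-01-12 11:00:00", 123, 0)] * 20
sample += [("2011-12-01 09:00:00", 123, 1)]

votes = make_votes(sample)
print(tally_since(votes, "2012-01-12"))
```

Calling the same function with a date one week or one month back gives the weekly and monthly figures from the same rows, so nothing needs to be stored on the bike table.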
I agree with @JNK: try a view or just a normal stored proc to calculate the outputs on the fly. If you find it becomes too slow as your data grows, I would investigate other routes then (like caching the data in another table). It is probably worth keeping it simple to start with; you can always reuse the logic from the SP/view later if you do want to set up a scheduled task.
Edit:
Removed the indexed view suggestion: as per @Damien_The_Unbeliever's comment, it's not deterministic and I'm stupid :)