How to query based on moduloed date - sql

I'm writing a task that will run daily, but it should only pick up users whose created_at falls a whole number of weeks ago. I want to do something along the lines of
User.where("created_at.days_ago % 7 = 0")
How might I do this?
EDIT
For reference, the task is for verifying a user's email. They can continue using the product without verifying for some amount of time, but I want to email them periodically (once per week) to verify. I'm using the Heroku Scheduler to do this, and the maximum time between runs it allows is 1 day, which is why I need only the people who are on exactly 1-week increments from when they were created.

You could look at generating a list of the dates themselves, using something along the lines of:
((User.minimum(:created_at).to_date)..(Date.today)).to_a.select{|d| (Date.today - d) % 7 == 0}
Since created_at is a timestamp you'd probably need to apply a SQL function to it, to truncate it to a date.
days = ((Date.today-1.years)..(Date.today)).to_a.select{|d| (Date.today - d) % 7 == 0}
User.where("created_at::date in (?)", days)
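The date-list idea above can be sketched outside Rails as well. This is a minimal illustration in Python, not the answerer's code: the function names are hypothetical, and like the Ruby snippet it treats an age of 0 days (created today) as a match.

```python
from datetime import date, timedelta


def weekly_anniversaries(today, earliest):
    """Dates between `earliest` and `today` that are a whole number of weeks before `today`."""
    days_back = (today - earliest).days
    # Step backwards from today in 7-day jumps: 0, 7, 14, ... days ago.
    return [today - timedelta(days=n) for n in range(0, days_back + 1, 7)]


def is_weekly_anniversary(created_on, today):
    """True when `created_on` is 0, 7, 14, ... days before `today`."""
    age_in_days = (today - created_on).days
    return age_in_days >= 0 and age_in_days % 7 == 0
```

The resulting list plays the same role as `days` in the query above: it is what gets passed to the `IN (?)` clause after created_at has been truncated to a date.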

Related

How to find where a total condition exist

I am trying to create a report that will show how long an automated sprinkler system has run for. The system is comprised of several sprinklers, with each one keeping track of only itself, and then sends that information to a database. My problem is that each sprinkler has its own run time (I.E. if 5 sprinklers all ran at the same time for 10 minutes, it would report back a total run time of 50 minutes), and I want to know only the net amount of run time - in this example, it would be 10 minutes.
The database consists of a time stamp and a boolean: it records the time stamp every time a sprinkler is turned on or off (its on/off state is indicated by the 1/0 of the boolean).
So, to figure out the total net time the system was on each day - whether it was 1 sprinkler running or all of them - I need to check the database for time frames where no sprinklers were turned on at all (or where ANY sprinkler at all was turned on). I would think the beginning of the query would look something like
SELECT * FROM MyTable
WHERE MyBoolean = 0
AND [ ... ]
But I'm not sure what the conditional statements that would follow the AND would be like to check the time stamps.
Is there a query I can send to the database that will report back this format of information?
EDIT:
Here's the table the data is recorded to - it's literally just a name, a boolean, and a datetime of when the boolean was changed, and that's the entire database
Every time a sprinkler turns on the number of running sprinklers increments by 1, and every time one turns off the number decrements by 1. If you transform the data so you get this:
timestamp on/off
07:00:05 1
07:03:10 1
07:05:45 -1
then you have a sequence of events in order; which sprinklers they refer to is irrelevant. (I've changed the zeros to -1 for reasons that will become evident in a moment. You can do this with "(2 * value) - 1")
Now put a running total together:
select a.timestamp, (SELECT SUM(b.on_off)
FROM sprinkler_events b
WHERE b.timestamp <= a.timestamp) as run_total
from sprinkler_events a
order by a.timestamp;
where sprinkler_events is the transformed data I listed above. This will give you:
timestamp run_total
07:00:05 1
07:03:10 2
07:05:45 1
and so on. Every row in this which has a run total of zero is a time at which all sprinklers were turned off, which I think is what you're looking for. If you need to sum the time they were on or off, you'll need to do additional processing: search for "date difference between consecutive rows" and you'll see solutions for that.
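The additional processing mentioned above can be sketched in a few lines. This is an illustration in Python rather than SQL, under the assumptions that every "off" event follows a matching "on" event and that an interval still open at the end of the data is not counted; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def net_on_time(events):
    """Total time during which at least one sprinkler was on.

    `events` is an iterable of (timestamp, value) pairs, value 1 = on, 0 = off.
    """
    running = 0          # number of sprinklers currently on
    on_since = None      # start of the current "system on" interval
    total = timedelta(0)
    for timestamp, value in sorted(events):
        running += 2 * value - 1          # maps 1 -> +1 and 0 -> -1
        if running > 0 and on_since is None:
            on_since = timestamp          # system just went from off to on
        elif running == 0 and on_since is not None:
            total += timestamp - on_since  # system just went fully off
            on_since = None
    return total
```

Restricting the input to one day's events gives the net run time for that day.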
You might consider looking for whether all the sprinklers are currently off. For example:
SELECT COUNT (DISTINCT s._NAME) AS sprinklers_currently_off
FROM (
SELECT
_NAME,
_VALUE,
_TIMESTAMP,
ROW_NUMBER() OVER (PARTITION BY _NAME ORDER BY _TIMESTAMP DESC, _VALUE) AS latest_rec
FROM sprinklers
) s
WHERE
_VALUE = 0
AND latest_rec = 1
The inner query orders the records so that you can get the latest status of all the sprinklers, and the outer query counts how many are currently off. If you have 10 sprinklers you would report them all off when this query returns 10.
You could modify this by applying a date range to the inner query if you wanted to look into the past, but this should get you on the right track.
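To make the logic of that query concrete, here is a rough Python equivalent of "keep the latest record per sprinkler, then count the ones that are off". The function name and the row layout (name, value, timestamp) are assumptions for illustration; ties on the timestamp are resolved in favour of value 0, mirroring the `ORDER BY _TIMESTAMP DESC, _VALUE` in the query.

```python
def count_currently_off(rows):
    """Count sprinklers whose most recent record has value 0.

    `rows` is an iterable of (name, value, timestamp) tuples.
    """
    latest = {}  # name -> (rank, value) of the newest record seen so far
    for name, value, timestamp in rows:
        # Newer timestamps rank higher; on a tie, value 0 outranks value 1.
        rank = (timestamp, -value)
        if name not in latest or rank > latest[name][0]:
            latest[name] = (rank, value)
    return sum(1 for _, value in latest.values() if value == 0)
```

With 10 sprinklers, the whole system is off when this returns 10.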

How To Compare two dates, SQL/SSIS Task

So I've got a task that takes a random 20% of a table's results from the previous day to use as a control group. These results are put into a table, and then shoved into a .CSV file for use by the employer.
That works perfectly well. The only problem is, it's in a group of tasks that are often tested, which means that when the task gets repeated, more random data gets dumped into the file - meaning manual deletion of rows. I'm looking for a fix.
Because the process is run once a day, a unique key is the TransactionDateID, formatted INT (20150603). I need to check against that column to make sure that nothing has been run on that same day. The problem is exacerbated because it involves yesterday's records.
For example, in order to check today's date to see if it has been run, getDate() would be used to get today's date, then converted to INT (20150604). But I can't simply check to see if there is a numerical difference of 1, because once the month switches, a simple +1 will throw the entire thing out of whack:
(20150630) + 1 = (20150631) =/= (20150701)
I'm just wondering if this is going to be casting/converting back and forth because of the difference in variable types, or if there's something I can do with a BIT to add a column if the task has been completed for the day, something along those lines.
A colleague suggested using MAX(TransactionDateID) and then checking getDate() against that column.
Unfortunately, I run into a problem the following day:
Initial task run at 2015-06-04-09:30:ss:mm
2015-06-04-11:45:ss:mm etc.. > 2015-06-04-09:30:ss:mm, DO NOT RUN
2015-06-05-09:30:ss:mm etc.. > 2015-06-04-09:30:ss:mm, I want it to run ...
To convert your day to a formatted int, try this:
DECLARE @today date = getdate()
select year(@today) * 10000 + month(@today) * 100 + day(@today)
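The month-rollover problem goes away if the subtraction is done on a real date and only the result is converted to the integer key. The sketch below shows that idea in Python; the function names are hypothetical, and it assumes the stored TransactionDateID is the date of the records (that is, yesterday relative to the run).

```python
from datetime import date, timedelta


def date_key(d):
    """Format a date as a yyyymmdd integer, e.g. 2015-06-04 -> 20150604."""
    return d.year * 10000 + d.month * 100 + d.day


def key_to_date(key):
    """Inverse of date_key."""
    return date(key // 10000, key // 100 % 100, key % 100)


def already_ran(max_transaction_date_id, today):
    """True if yesterday's records are already present in the table.

    `max_transaction_date_id` is the result of MAX(TransactionDateID),
    or None when the table is empty.
    """
    yesterday_key = date_key(today - timedelta(days=1))
    return max_transaction_date_id is not None and max_transaction_date_id >= yesterday_key
```

The same pattern applies in T-SQL: subtract a day from the date value first, then build the integer from the result.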

SQL query for finding finished records published for different time periods

So imagine there is a blog and the owner can publish his posts for different time periods, for example:
1 week
2 weeks
1 month
2 months
6 months
after which the post should be deleted.
The time period is set in the column named, well, time_period, with an integer value in it: 0 stands for 1 week, 1 stands for 2 weeks, and so on.
The date of creation is set in created_at column.
And so I need to write a sql query to fetch all the records that have expired publication date and there are 2 ways I can see to solve this problem:
1) Write a query with a lot of conditions like:
.where("(time_period = 0 AND created_at <= :one_week_ago) OR
(time_period = 1 AND created_at <= :two_weeks_ago)",
one_week_ago: Time.now - 1.week,
two_weeks_ago: Time.now - 2.weeks)
but for all the conditions
2) Or simply fetch all the records (with find_each method) and check each one for meeting the requirements
and I will probably go with the second one, but I just wonder if there's an efficient way to write a query for this kind of situation? Maybe some database functions or something like that?
If I were doing this then I would, for my own sanity, add a date/datetime field to store the actual expiry time, which you can simply compare with the current time to see if a post has expired. This would be set from the time_period option in a before_save callback.
class Post < ActiveRecord::Base
  before_save :set_expiry_date

  def set_expiry_date
    self.expiry_date = calculate_expiry_date
  end

  def calculate_expiry_date
    # logic which takes time_period and multiplies it by weeks or whatever you do
  end
end
Now you can just look in the database and see which posts have expired. Keep it simple.
I agree with @a_horse_with_no_name. However, if that is not an option you might be able to do something like:
where created_at <= case time_period when 0 then :one_week_ago
when 1 then :two_weeks_ago
...
end
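The expiry-date calculation left as a placeholder in the first answer could look roughly like the following. This is a sketch in Python under stated assumptions, not the poster's code: the question only specifies 0 = 1 week and 1 = 2 weeks, so the codes 2, 3 and 4 for 1, 2 and 6 months are guesses, and `add_months` clamps to the last day of shorter months.

```python
import calendar
from datetime import datetime, timedelta

# Hypothetical mapping: only codes 0 and 1 are given in the question.
PERIODS = {
    0: ("weeks", 1),
    1: ("weeks", 2),
    2: ("months", 1),
    3: ("months", 2),
    4: ("months", 6),
}


def add_months(moment, months):
    """Add calendar months, clamping the day (Jan 31 + 1 month -> Feb 28/29)."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def expiry_date(created_at, time_period):
    """Expiry timestamp for a post created at `created_at`."""
    unit, amount = PERIODS[time_period]
    if unit == "weeks":
        return created_at + timedelta(weeks=amount)
    return add_months(created_at, amount)
```

Once the expiry is stored in its own column, finding expired posts is a single comparison against the current time, with no per-period branching in the query.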

In Crystal Report print only first record in group and leave it summable

I have a table that lists every task an operator completed during a day. This is gathered by a Shop Floor Control program. There is also a column that has the total hours worked that day, this field comes from their time punches. The table looks something like this:
Operator 1 Bestupid 0.5 8 5/12/1986
Operator 1 BeProductive 0.1 8 5/12/1986
Operator 1 Bestupidagain 3.2 8 5/12/1986
Operator 1 Belazy 0.7 8 5/13/1986
Operator 2 BetheBest 1.7 9.25 5/12/1986
I am trying to get an efficiency out of this by summing the process hours and comparing it to the hours worked. The problem is that when I do any kind of summary on the hours worked column it sums EVERY DETAIL LINE.
I have tried:
If Previous (groupingfield) = (groupingfield) Then
HoursWorked = 0
Else
HoursWorked = HoursWorked
I have tried a global three formula trick, but neither of the above leave me with a summable field, I get "A summary has been specified on a non-recurring field"
I currently use a global variable, reset in the group header, but not WhilePrintinganything. However it is missing some records and upon occasion I will get two hoursworked > 0 in the same group :(
Any ideas?
I just want to clarify, I have three groups:
Groups: Work Center --> Operator --> Date
I can summarize the process hours across any group and that's fine. However, the hours worked prints on every detail line even though it really should only print once per Date. Therefore when I summarize the Hours Worked for an operator the total is WAY off because it is adding up 8hours for each entry instead of 8 hours for each day.
Try grouping by the operators. Then create a running total for the process hours that sum for each record and reset on change of group. In the group footer you can display the running total and any other stats for that operator you care to.
Try another running total for the daily hours but pick maximum as the type of summary. Since all the records for the day will have the same hours work the maximum will be correct. Reset with the change of the date group and you should be good to go.
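The arithmetic behind the two running totals can be checked outside Crystal Reports. The sketch below, in Python, sums process hours per operator but takes hours worked only once per operator and date (via the maximum, as suggested above). The function name and the row layout are assumptions for illustration.

```python
from collections import defaultdict


def efficiency_by_operator(rows):
    """Return {operator: (process_hours, hours_worked, efficiency)}.

    `rows` is an iterable of (operator, process_hours, hours_worked, work_date).
    """
    process = defaultdict(float)
    worked_per_day = {}
    for operator, process_hours, hours_worked, work_date in rows:
        process[operator] += process_hours
        key = (operator, work_date)
        # Every detail line for a day repeats the same hours worked,
        # so the maximum per (operator, date) counts it exactly once.
        worked_per_day[key] = max(worked_per_day.get(key, 0.0), hours_worked)

    worked = defaultdict(float)
    for (operator, _), hours in worked_per_day.items():
        worked[operator] += hours

    return {
        op: (process[op], worked[op], process[op] / worked[op] if worked[op] else None)
        for op in process
    }
```

With the sample table from the question, Operator 1 ends up with 16 hours worked (8 for each of two days) instead of 32.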

Simultaneous calls from CDR

I need to come up with an analysis of simultaneous events, when having only the start time and duration of each event.
Details
I've a standard CDR call detail record, that contains among others:
calldate (timedate of each call start)
duration (int, seconds of call duration)
channel (a string)
What I need to come up with is some sort of analysis of simultaneous calls on each second, for a given timedate period. For example, a graph of simultaneous calls we had yesterday.
(The problem is the same if we have visitor logs with durations on a website and wish to obtain simultaneous clients for a group of web pages.)
What would your algorithm be?
I can iterate over records in the given period and fill an array, where each bucket of the array corresponds to 1 second in the overall period. This works and seems to be fast, but if the time period is big (say, 1 year), I would need lots of memory (3600 x 24 x 365 x 4 bytes, roughly 120 MB).
This is for a web-based, interactive app, so my memory footprint should be small enough.
Edit
By simultaneous, I mean all calls on a given second. A second would be my minimum unit. I cannot use something bigger (an hour, for example) because all calls during an hour do not need to be held at the same time.
I would implement this on the database. Using a GROUP BY clause with DATEPART, you could get a list of simultaneous calls for whatever time period you wanted, by second, minute, hour, whatever.
On the web side, you would only have to display the histogram that is returned by the query.
@eric-z-beard: I would really like to be able to implement this on the database. I like your proposal, and while it seems to lead to something, I don't quite fully understand it. Could you elaborate? Please recall that each call will span several seconds, and each second needs to count. If using DATEPART (or something like it on MySQL), what second should be used for the GROUP BY? See the note on simultaneous.
Elaborating over this, I found a way to solve it using a temporary table. Assuming temp holds all seconds from tStart to tEnd, I could do
SELECT temp.second, count(call.id)
FROM call, temp
WHERE temp.second BETWEEN call.start AND call.start + call.duration
GROUP BY temp.second
Then, as suggested, the web app should use this as a histogram.
You can use a static Numbers table for lots of SQL tricks like this. The Numbers table simply contains integers from 0 to n for n like 10000.
Then your temp table never needs to be created, and instead is a subquery like:
SELECT StartTime + Numbers.Number AS Second
FROM Numbers
You can create a table 'simultaneous_calls' with 3 fields:
yyyymmdd Char(8),
day_second Number, -- second of the day,
count Number -- count of simultaneous calls
Your web service can take 'count' value from this table and make some statistics.
Simultaneous_calls table will be filled by some batch program which will be started every day after end of the day.
Assuming that you use Oracle, the batch may start a PL/SQL procedure which does the following:
Appends table with 24 * 3600 = 86400 records for each second of the day, with default 'count' value = 0.
Defines the 'day_cdrs' cursor for the query:
Select to_char(calldate, 'yyyymmdd') yyyymmdd,
(calldate - trunc(calldate)) * 24 * 3600 starting_second,
duration duration
From cdrs
Where cdrs.calldate >= Trunc(Sysdate -1)
And cdrs.calldate
Iterates the cursor to increment 'count' field for the seconds of the call:
For cdr in day_cdrs
Loop
Update simultaneous_calls
Set count = count + 1
Where yyyymmdd = cdr.yyyymmdd
And day_second Between cdr.starting_second And cdr.starting_second + cdr.duration;
End Loop;
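If the memory cost of one bucket per second is the main concern, a different technique from the ones in the answers above is a sweep over start and end events, which only stores the moments where the count changes. The following is a minimal Python sketch of that idea; the function name is hypothetical, and it assumes a call occupies the half-open interval from calldate up to (but not including) calldate + duration.

```python
from datetime import datetime, timedelta


def concurrency_steps(calls):
    """Return [(time, active_calls)] at each moment the number of calls changes.

    `calls` is an iterable of (start, duration_seconds) pairs.
    """
    deltas = {}
    for start, duration in calls:
        end = start + timedelta(seconds=duration)
        deltas[start] = deltas.get(start, 0) + 1   # a call begins
        deltas[end] = deltas.get(end, 0) - 1       # a call ends

    steps = []
    active = 0
    for moment in sorted(deltas):
        if deltas[moment] == 0:
            continue  # a start and an end cancel out, no visible change
        active += deltas[moment]
        steps.append((moment, active))
    return steps
```

Memory use grows with the number of calls rather than the number of seconds in the period, and the resulting step list can be plotted directly or expanded to per-second values only for the range being displayed.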