Summarizing records by date in Rails 3 ActiveRecord

I have a large table with many records that share a timestamp. I want a result set with a column summed per timestamp. I can see how to simply use the 'sum' method to get a column's total; I need, however, to group by a date column, which is far less obvious. I know I could use 'find_by_sql', but it would be hideous to code since I have to do this for over 20 columns. I assume AR must have some magic to do this which escapes me?
Data set example:
table/model: games/Game
player_name, points_scored, game_date
john, 20, 08-20-2012
sue, 30, 08-20-2012
john, 12, 08-21-2012
sue, 10, 08-21-2012
What I want to see in my results is:
game_date, total_points
08-20-2012, 50
08-21-2012, 22
Here is a crude example of what the SQL query would look like:
SELECT game_date, SUM(points_scored)
FROM games
GROUP BY game_date
Mind you, I actually have 20 'score' columns to SUM by timestamp.
How can I simply use AR to do this? Thanks in advance.

Ok. It took some digging and playing around but I figured it out. I was hoping to find something better than 'find_by_sql' and I did, but it isn't a whole lot better. Again, knowing that I need to SUM 20+ columns by timestamp, here is the solution in the context of the example above.
results = Game.select('game_date, SUM(points_scored) as "points_scored"').group('game_date')
Now, that doesn't look so bad, but I have to type the 20+ SUM() expressions into that 'select' method. It doesn't save a whole lot of work over 'find_by_sql', but it works.
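For reference, a sketch of the SQL this chain generates once more score columns are added; rebounds and assists here are hypothetical extra columns, not part of the original schema:
SELECT game_date,
       SUM(points_scored) AS points_scored,
       SUM(rebounds) AS rebounds,   -- hypothetical extra score column
       SUM(assists) AS assists      -- hypothetical extra score column
FROM games
GROUP BY game_date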

Related

Partition By Logic

I have a dataset that has roughly 1 million rows. Without hard-coding any claims, what would be the way to get the resulting output? From my research I determined that something like DENSE_RANK or ROW_NUMBER() with a partition expression should do the trick. Is there a way to use DENSE_RANK to say "go down the list of PATNO, and if the PATNO is the same, keep going, but if it changes, group the rows above and make them one claim"?
The dates do not matter in this case. Basically I just want a way to tell SQL to automatically recognize sets of claims based on PATNO. Sometimes there are 50 lines with the same PATNO that make up one claim, and sometimes only 1-2 lines with the same PATNO make up a claim.
If you want the sum of charges for a patno, then you want a group by, I think:
select patno, sum(charges)
from t
group by patno;
I think you are overcomplicating the problem.
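That said, the DENSE_RANK idea from the question can also work for labeling claims; a minimal sketch, assuming the table is named claims (each distinct PATNO gets its own claim number, so all rows sharing a PATNO fall into one claim):
SELECT patno,
       charges,
       DENSE_RANK() OVER (ORDER BY patno) AS claim_num
FROM claims;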

SQL: Reduce resultset to X rows?

I have the following MYSQL table:
measuredata:
- ID (bigint)
- timestamp
- entityid
- value (double)
The table contains over 1 billion entries. I want to be able to visualize any time window. The window can range in size from one day to many years. There are measurement values roughly every minute in the DB.
So the number of entries for a time window can vary widely, from a few hundred to several thousand or millions.
Those values are meant to be visualized in a graphical chart diagram on a web page.
If the chart is, let's say, 800px wide, it does not make sense to fetch thousands of rows from the database when the time window is quite big. I cannot show more than 800 values on this chart anyhow.
So, is there a way to reduce the result set directly on the DB side?
I know "average" and "sum" etc. as aggregate functions. But how can I, for example, aggregate 100k rows from a big time window down to, let's say, 800 final rows?
Just fetching those 100k rows and letting the chart do the magic is not the preferred option; transfer size is one reason why.
Isn't there something on the DB side I can use?
Something like avg() to shrink X rows to Y averaged rows?
Or some simple magic to just skip every Nth row to shrink X to Y?
update:
Although I'm using MySQL right now, I'm not tied to it. If PostgreSQL, for instance, provides a feature that could solve the issue, I'm willing to switch databases.
update2:
I may have found a possible solution: https://mike.depalatis.net/blog/postgres-time-series-database.html
See section "Data aggregation".
The key is not to use a Unix timestamp but a date, "trunc" it, average the values, and group by the truncated date. That could work for me, but it would require a rework of my table structure. Hmm... maybe there's more... still researching...
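For illustration, a minimal sketch of that aggregation style in PostgreSQL, assuming the column were stored as a native timestamp type rather than a Unix integer (the day-wide bucket and the entity filter are placeholder values):
SELECT date_trunc('day', "timestamp") AS bucket,
       avg(value) AS avg_value
FROM measuredata
WHERE entityid = 38
GROUP BY bucket
ORDER BY bucket;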
update3:
Inspired by update 2, I came up with this query:
SELECT (`timestamp` - (`timestamp` % 86400)) AS aggtimestamp, `entity`, `value`
FROM `measuredata`
WHERE `entity` = 38 AND `timestamp` > UNIX_TIMESTAMP('2019-01-25')
GROUP BY aggtimestamp
It works, but my DB/index/structure doesn't seem optimized for this: a query over the last year took ~75 sec (slow test machine) and returns only one value per day. This can be combined with avg(value), but that further increases the query time (~82 sec). I will see whether it's possible to optimize this further. But I now have an idea of how "downsampling" data works, especially aggregation in combination with "group by".
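For reference, a sketch of the same query with the average folded in, as described (names as in the query above):
SELECT (`timestamp` - (`timestamp` % 86400)) AS aggtimestamp,
       `entity`,
       AVG(`value`) AS avg_value
FROM `measuredata`
WHERE `entity` = 38 AND `timestamp` > UNIX_TIMESTAMP('2019-01-25')
GROUP BY aggtimestamp, `entity`;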
There is probably no efficient way to do this. But, if you want, you can break the rows into equal sized groups and then fetch, say, the first row from each group. Here is one method:
select md.*
from (select md.*,
             row_number() over (partition by tile order by timestamp) as seqnum
      from (select md.*, ntile(800) over (order by timestamp) as tile
            from measuredata md
            where . . .  -- your filtering conditions here
           ) md
     ) md
where seqnum = 1;
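As a rough alternative matching the "skip every Nth row" idea from the question, a sketch for MySQL 8+ (the entity filter and the step of 125 are placeholder values):
SELECT t.*
FROM (SELECT md.*,
             ROW_NUMBER() OVER (ORDER BY `timestamp`) AS rn
      FROM measuredata md
      WHERE entityid = 38) t
WHERE rn % 125 = 1;  -- keep every 125th row, e.g. 100k rows -> ~800 points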

Order by in subquery behaving differently than native sql query?

So I am honestly a little puzzled by this!
I have a query that returns a set of transactions that contain both repair costs and an odometer reading at the time of repair on the master level. To get an accurate cost-per-mile reading, I need a subquery to get both the first meter reading between a start date and an end date, and an ending meter reading.
(select top 1 wf2.ro_num
 from wotrans wotr2
 left join wofile wf2
   on wotr2.rop_ro_num = wf2.ro_num
   and wotr2.rop_fac = wf2.ro_fac
 where wotr.rop_veh_num = wotr2.rop_veh_num
   and wotr.rop_veh_facility = wotr2.rop_veh_facility
   and ((@sdate = '01/01/1900 00:00:00' and wotr2.rop_tran_date = 0)
     or ([dbo].[udf_RTA_ConvertDateInt](@sdate) <= wotr2.rop_tran_date
       and [dbo].[udf_RTA_ConvertDateInt](@edate) >= wotr2.rop_tran_date))
 order by wotr2.rop_tran_date asc) as highMeter
The reason I have the tables aliased as xx2 is because those tables are also used in the main query, and I don't want these to interact with each other except to pull the correct vehicle number and facility.
Basically, when I run the main query it returns a value that is not correct; it returns the one that is second (keep in mind that the first and second have the same date). But when I take the subquery, copy and paste it into its own query, and run it, it returns the correct value.
I do have a workaround for this, but I am just curious as to why this is happening. I have searched quite a bit and found not much (other than the fact that people don't like ORDER BYs in subqueries). Talking to one of my friends who also does quite a bit of SQL scripting, it looks to us as if the subquery is ordering differently than the subquery by itself when there are multiple rows with the same value for the ORDER BY (i.e. 10 dates of 08/05/2016).
Any ideas would be helpful!
Like I said, I have a workaround that works in this one case, but I don't know yet whether it will work on a larger dataset.
Let me know if you want more code.
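For what it's worth, a likely explanation: SQL guarantees no particular order among rows that tie on the ORDER BY key, so the subquery's TOP 1 may legitimately pick a different one of the tied rows than the standalone query does. A deterministic tiebreaker on a unique column stabilizes it; a sketch against the subquery above, assuming ro_num is unique:
order by wotr2.rop_tran_date asc, wf2.ro_num asc) as highMeter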

creating table with correct data

I'm having problems finding the correct data. I have a table which contains customers (CustomerID). Each customer is connected to a certain phone number (PhoneNr). Every number starts with 2-9.
Every customer has a call center (CallCenterID) they can call if needed.
I want to know how many customers call each call center, broken down by starting digit 2-9 of the phone number.
So I want to know, for example, how many calls a call center gets from every customer with 5 as the starting digit of their phone number.
So far, so good. My code in SQL:
Select CallCenter, Count(Customers) AS Number
from ******
Where PhoneNumber Like '45%' --Just need the numbers from Danish customers.
Group By Callcenter;
I'm new to much of this, but I've tried the whole day to come up with the right result.
Right now I'm getting every call center and the number of calls to each.
Can anyone help me?
:)
If I'm understanding correctly, you want the counts for all CallCenters broken down by the first digit of the PhoneNumber:
SELECT CallCenter, SUBSTR(PhoneNumber, 1, 1) as startsWith, COUNT(*) as number
FROM myTable
GROUP BY CallCenter, SUBSTR(PhoneNumber, 1, 1)
ORDER BY 2, 3
If that's not what you wanted, please explain your question a bit better.
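One hedged aside: if the stored numbers include the Danish country prefix from the question's LIKE '45%' filter, the digit the question cares about would be the third character, e.g.:
SELECT CallCenter, SUBSTR(PhoneNumber, 3, 1) AS startsWith, COUNT(*) AS number
FROM myTable
WHERE PhoneNumber LIKE '45%'
GROUP BY CallCenter, SUBSTR(PhoneNumber, 3, 1)
ORDER BY 1, 2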

Convert row data into columns Access 07 without using PIVOT

I am on a work term from school. I am not very comfortable using SQL; I am trying to get a hold of it...
My supervisor gave me a task for a user in which I need to take row data and make columns. We used the Crosstab Wizard and automagically created the SQL to get what we needed.
Basically, we have a table like this:
ReqNumber  Year  FilledFlag (checkbox)  FilledBy
1          2012  (not checked)          ITSchoolBoy
1          2012  (checked)              GradStudent
1          2012  (not checked)          HighSchooler
2          etc., etc.
What the user would like is a listing of all of the req numbers and what is checked.
Our automatic pivot code gives us all of the FilledBy options (there are 9 in total) as column headings, and groups it all by reqnumber.
How can you do this without the pivot? I would like to wrap my head around this. Nearest I can find is something like:
SELECT
  SUM(IIF(FilledBy = 'ITSchoolboy', 1, 0)) AS ITSchoolboy,
  SUM(IIF(FilledBy = 'GradStudent', 1, 0)) AS GradStudent, etc.
FROM myTable
Could anyone help explain this to me? Point me in the direction of a guide? I've been searching for the better part of a day now, and even though I am a student, I don't think this will be smiled upon for too long. But I would really like to know!
I think your boss' suggestion could work if you GROUP BY ReqNumber.
SELECT
  ReqNumber,
  SUM(IIF(FilledBy = 'ITSchoolboy', 1, 0)) AS ITSchoolboy,
  SUM(IIF(FilledBy = 'GradStudent', 1, 0)) AS GradStudent,
  etc.
FROM myTable
GROUP BY ReqNumber;
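One assumption worth flagging: if only checked rows should count, fold FilledFlag into the condition (Access checkboxes store True as -1); a sketch:
SELECT
  ReqNumber,
  SUM(IIF(FilledBy = 'ITSchoolboy' AND FilledFlag = True, 1, 0)) AS ITSchoolboy,
  SUM(IIF(FilledBy = 'GradStudent' AND FilledFlag = True, 1, 0)) AS GradStudent
FROM myTable
GROUP BY ReqNumber;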
A different approach would be to JOIN multiple subqueries. This example pulls in 2 of your categories. If you need to extend it to 9 categories, you would have a whole lot of joining going on.
SELECT
  itsb.ReqNumber,
  itsb.ITSchoolboy,
  grad.GradStudent
FROM
  (SELECT ReqNumber, FilledFlag AS ITSchoolboy
   FROM myTable
   WHERE FilledBy = "ITSchoolboy") AS itsb
INNER JOIN
  (SELECT ReqNumber, FilledFlag AS GradStudent
   FROM myTable
   WHERE FilledBy = "GradStudent") AS grad
  ON itsb.ReqNumber = grad.ReqNumber
Please notice I'm not suggesting you should use this approach. However, since you asked about alternatives to your pivot approach (which works) ... this is one. Stay tuned in case someone else offers a simpler alternative. :-)