Grouping, totaling in Rails and Active Record

I'm trying to group a series of records in Active Record so I can do some calculations to normalize the quantity attribute of each record. For example:
A user enters a date and a quantity. Dates are not unique, so I may have 10-20 quantities for each date. I need to work with only the totals for each day, not every individual record, because after determining the highest and lowest values I convert each one, basically by dividing by n (which is usually 10).
This is what I'm doing right now:
def heat_map(project, word_count, n_div)
  return "freezing" if word_count == 0
  words = project.words
  counts = words.map(&:quantity)
  max = counts.max
  min = counts.min
  return "max" if word_count == max
  return "min" if word_count == min
  # Width of one heat bucket across the min..max range
  break_point = (max - min).to_f / n_div.to_f
  # Bucket index for this word count
  heat_index = ((word_count - min).to_f / break_point).to_i
end
This works great if I display a table of all the word counts, but I'm trying to apply the heat map to a calendar that displays running totals for each day. This obviously doesn't total the days, so I end up with numbers that are outside the normal scale.
I can't figure out a way to group the word counts and total them by day before I do the normalization. I tried doing a group_by and then adding the map call, but I got an undefined method error. Any ideas? I'm also open to better/cleaner ways of normalizing the word counts.

Hard to answer without knowing a bit more about your models. So I'm going to assume that the date you're interested in is just the created_at date in the words table. I'm assuming that you have a field in your words table called word where you store the actual word.
I'm also assuming that you might have multiple entries for the same word (possibly with different quantities) in the one day.
So, this will give you an ordered hash of summed quantities per word per day:
project.words.group('DATE(created_at)').group('word').sum('quantity')
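For reference, that relation translates to roughly the following SQL (assuming a project_id foreign key on words - a guess, like the column names above):
SELECT DATE(created_at) AS day, word, SUM(quantity) AS total_quantity
FROM words
WHERE project_id = 1   -- stand-in for the actual project's id
GROUP BY DATE(created_at), word;
Active Record returns this as a hash keyed by [day, word] pairs; dropping the second group('word') call would give one summed quantity per day, which is what the calendar heat map needs.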
If those guesses make no sense, then perhaps you can give a bit more detail about the structure of your models.

Related

Summing the value of the aggregate function Last()

I am trying to design a report in SSRS that returns the last opening stock for each product in the dataset. To achieve this I used
=Last(Fields!CustProdAdj_new_openingstockValue.Value).
This works fine. But where I encountered a problem is in getting the sum of these opening stock values across all the products. I tried using
=sum(Last(Fields!CustProdAdj_new_openingstockValue.Value))
but I got the error message
[Error on Preview]
Is there another way to go about this?
I have tried using Aggregate() and RunningValue(), to no avail.
This is the dataset.
This is the report layout.
This is the preview when using Max().
This is probably easier to do directly in the dataset query, but assuming you cannot change that, then this should work...
This assumes your data is ordered by the CustProdAdj_createdon column and that this is a date, datetime or some other ordered value; change this bit if required.
=SUM(
    IIF(Fields!CustProdAdj_createdon.Value = Max(Fields!CustProdAdj_createdon.Value, "MyRowGroupNameHere"),
        Fields!CustProdAdj_new_openingstockValue.Value,
        0)
)
Change MyRowGroupNameHere to the name of the row group, spelled exactly as it appears in the row group panel below the main design panel. It is case sensitive, and the quotes are required.
What this does is, for each row within the row group "MyRowGroupNameHere", compare the CustProdAdj_createdon date to the max CustProdAdj_createdon across the row group. If it is the same, then return the CustProdAdj_new_openingstockValue, else return 0.
This will return the value required on only 1 record within the group.
For example, if you had 1 row per day then only on the last day of the month would a value be returned other than 0 because only the date of the last record would match the maximum date within the group.
Then it simply sums up the results.
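For completeness, the "directly in the dataset query" route mentioned at the start could look something like this against a SQL Server source - an untested sketch, where CustProdAdj and product_id are guessed names for the underlying table and its product key:
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY product_id               -- hypothetical product key column
               ORDER BY CustProdAdj_createdon DESC   -- newest adjustment first
           ) AS rn
    FROM CustProdAdj                                 -- hypothetical table name
) AS latest
WHERE rn = 1;  -- one row per product: its last opening stock
Summing CustProdAdj_new_openingstockValue over that result then gives the report total without any Sum(Last()) gymnastics.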

Microsoft Access query to retrieve random transactions with seemingly impossible criteria

I was asked to assist with developing a report to retrieve a 25% sample of random transactions within a specific date range. I am not a programmer but I was able to devise the following fairly quickly:
SELECT TOP 25 PERCENT account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data, log.to_data, log.tran_id, log.init
FROM account INNER JOIN log ON account.ACCT = log.ACCT
GROUP BY account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data, log.to_data, log.tran_id, log.init
HAVING (((log.DATE) Between #2/7/2018# And #6/15/2018#) AND ((log.action_txt)="mod" Or (log.action_txt)="del") AND ((log.init)="J1X"))
ORDER BY log.tran_dt
This returns 25% of the records within the date range. Each record row is unique but each account number potentially has multiple records on each day. In some cases the records have the same date and tran_id as well.
Upon further discussion with the requester, he actually wants to see all of the transactions for 25% of the accounts that have activity on each day within the date range. Thus if there were 100 accounts on 3/1/2018 with records in this table, he wants to see all of the transactions for 25 of those accounts; if there were 60 accounts on 3/2/2018 with records in this table, he wants to see all of the transactions for 15 of those accounts; and so on.
I was thinking that an Access module would work best in this scenario as I believe there are multiple parts to this. I figured that I need a function to loop through the date range and for each day:
1. Count the account numbers only one time
2. Return all of the transactions for 25% of the total accounts
But as I mentioned, I am not a programmer and I am exhausted from searching possible solutions for the many parts.
I think the key to your question is that you only really need a pseudo-random selection of results for your report, so you can force the random number generator to reorder your results based on a value in the record and the current time.
Something like this should work. I assume your action_txt field is a text field; I pull out the length of that field for each record and apply it to the current date/time to create a pseudo-random number that can be sorted.
All I really do is change your ORDER BY line
See if this works for you
SELECT TOP 25 PERCENT
account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data,
log.to_data, log.tran_id, log.init
FROM account
INNER JOIN log ON account.ACCT = log.ACCT
GROUP BY account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data, log.to_data, log.tran_id, log.init
HAVING (((log.DATE) Between #2/7/2018# And #6/15/2018#) AND ((log.action_txt)="mod" Or (log.action_txt)="del") AND ((log.init)="J1X"))
ORDER BY Rnd(CLng(Now()*Len(log.action_txt))-(Now()*Len(log.action_txt)));
Modified from a similar idea in another Stack Overflow question and response.
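Note that this still samples 25% of the result rows rather than 25% of the accounts per day, as the requester described. If that stricter reading matters, here is an untested sketch of one possible shape for it (Access SQL, assuming ACCT is numeric; Rnd with a negative argument returns the same value for the same seed, so each account gets a repeatable pseudo-random sort key):
SELECT account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.tran_id
FROM account INNER JOIN log ON account.ACCT = log.ACCT
WHERE log.DATE Between #2/7/2018# And #6/15/2018#
AND log.ACCT In (
    SELECT TOP 25 PERCENT l2.ACCT
    FROM log AS l2
    WHERE l2.DATE = log.DATE
    GROUP BY l2.ACCT
    ORDER BY Rnd(-l2.ACCT)
);
The correlated subquery picks 25% of the distinct accounts active on each date, and the outer query then returns all of those accounts' transactions.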

Wrapping a range of data

How would I select a rolling/wrapping* set of rows from a table?
I am trying to select a number of records (per type, 2 or 3) for each day, wrapping when I 'run out'.
Eg.
2018-03-15: YyBiz, ZzCo, AaPlace
2018-03-16: BbLocation, CcStreet, DdInc
These are rendered within a SSRS report for Dynamics CRM, so I can do light post-query operations.
Currently I get to:
2018-03-15: YyBiz, ZzCo
2018-03-16: AaPlace, BbLocation, CcStreet
First, getting a number for each record with:
SELECT name, ROW_NUMBER() OVER (PARTITION BY type ORDER BY name) as RN
FROM table
Within SSRS, I then adjust RN to reflect the number of each type I need:
OnPageNum = FLOOR((RN+num_of_type-1)/num_of_type)-1
--Shift RN to be 0-indexed.
Resulting in AaPlace, BbLocation and CcStreet having a PageNum of 0, DdInc of 1, ... YyBiz and ZzCo of 8.
Then using an SSRS Table/Matrix linked to the dataset, I set the row filter to something like:
RowFilter = MOD(DateNum, NumPages(type)) == OnPageNum
Where DateNum is essentially days since epoch, and each page has a separate table and day passed in.
At this point, it is showing only N records of each type per page, but if the total number of records of a type isn't a multiple of the number per page for that type, there will be pages with fewer records than required.
Is there an easier way to approach this/what's the next step?
*Wrapping as in the wraparound found in video games: seamlessly resetting to 0.
To achieve this effect, I found that offsetting the RowNumber by -DateNum*num_of_type (negative for positive ordering), then taking it modulo COUNT(type), provides the correct "wrap around" effect.
To achieve the desired pagination, that result then just has to be divided by num_of_type and floored, as below:
RowFilter: FLOOR(((RN-DateNum*num_of_type) % count(type))/num_of_type) == 0
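If the dataset query can be touched after all, the same wrap-around can be computed there instead of in SSRS expressions. A sketch assuming a SQL Server dataset; @DateNum, @NumOfType and the table name are placeholders, not from the original report:
-- Compute the wrapped, 0-indexed row number per type in the dataset itself
DECLARE @DateNum int = 17605;     -- days since epoch for the report date
DECLARE @NumOfType int = 3;       -- records to show per type per day

SELECT name, type,
       ((ROW_NUMBER() OVER (PARTITION BY type ORDER BY name) - 1
         + @DateNum * @NumOfType)
        % COUNT(*) OVER (PARTITION BY type)) AS WrappedRN
FROM dbo.MyTable;                 -- hypothetical table name
Rows with WrappedRN < @NumOfType are that day's picks. The offset is added rather than subtracted here, which wraps in the opposite direction but keeps the modulo operand non-negative.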

Select case mess in JFreeChart

I have a column (CLIENTE_X_HORA, a numeric field) that I bucket into intervals, counting the rows that fall in each interval. I have 3 text fields (number of intervals, value between intervals, and initial value). When I fill in only the first two (5 intervals with a value of 1000), the query runs flawlessly and generates the expected bar chart.
Query (with the first two text fields set):
SELECT INTERVAL, COUNT(*) TOTAL FROM (
  SELECT CASE WHEN CLIENTE_X_HORA>0 AND CLIENTE_X_HORA<=1000.00 THEN '0<CLIENTE_X_HORA<=1000.00'
              WHEN CLIENTE_X_HORA>1000.00 AND CLIENTE_X_HORA<=2000.00 THEN '1000.00<CLIENTE_X_HORA<=2000.00'
              WHEN CLIENTE_X_HORA>2000.00 AND CLIENTE_X_HORA<=3000.00 THEN '2000.00<CLIENTE_X_HORA<=3000.00'
              WHEN CLIENTE_X_HORA>3000.00 AND CLIENTE_X_HORA<=4000.00 THEN '3000.00<CLIENTE_X_HORA<=4000.00'
              ELSE '4000.00<CLIENTE_X_HORA' END INTERVAL, CLIENTE_X_HORA
  FROM SGD_CAUSA)
GROUP BY INTERVAL ORDER BY TOTAL
The bar chart comes out as expected.
The problem is when I also fill in the last field (initial value, for example 2000): my bar chart goes crazy (I believe it is adding up the discarded values below 2000):
That ELSE (>6000) should be much smaller than it is showing. How can I solve that?
Best Regards,
DDias
CLARIFICATION from OP:
The query is the same as above but begins at 2000:
SELECT CASE WHEN CLIENTE_X_HORA>2000 AND CLIENTE_X_HORA<=3000.00 ... and ends at 6000: ELSE '6000.00<CLIENTE_X_HORA' END INTERVAL, CLIENTE_X_HORA FROM SGD_CAUSA) GROUP BY INTERVAL ORDER BY TOTAL
Putting the result in table form is impractical (we are talking about over 87 thousand rows). This happens whenever I give an initial value different from ZERO.
Your ELSE is just that: it includes everything that is not matched by the specific WHENs.
So if you do not start from zero, that last column will include everything below the lowest limit in addition to everything greater than the highest limit.
If you do not want this behavior, do not use ELSE at all; use WHEN CLIENTE_X_HORA > 6000.00 (or whatever your highest limit is) as the last condition.
EDIT:
In your internal query, filter out (with WHERE) the values that are below the lowest limit.
Since the unneeded low range is gone, you no longer need the extra highest-limit WHEN suggested above, and you can even go back to using ELSE.
If your lowest limit is zero, then you will be filtering everything below 0, which I assume is nothing.
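Putting that together, the reworked query for an initial value of 2000 might look like this (same table and column as the question; a sketch, not tested against the real schema):
SELECT INTERVAL, COUNT(*) TOTAL FROM (
  SELECT CASE WHEN CLIENTE_X_HORA>2000.00 AND CLIENTE_X_HORA<=3000.00 THEN '2000.00<CLIENTE_X_HORA<=3000.00'
              WHEN CLIENTE_X_HORA>3000.00 AND CLIENTE_X_HORA<=4000.00 THEN '3000.00<CLIENTE_X_HORA<=4000.00'
              WHEN CLIENTE_X_HORA>4000.00 AND CLIENTE_X_HORA<=5000.00 THEN '4000.00<CLIENTE_X_HORA<=5000.00'
              WHEN CLIENTE_X_HORA>5000.00 AND CLIENTE_X_HORA<=6000.00 THEN '5000.00<CLIENTE_X_HORA<=6000.00'
              ELSE '6000.00<CLIENTE_X_HORA' END INTERVAL, CLIENTE_X_HORA
  FROM SGD_CAUSA
  WHERE CLIENTE_X_HORA > 2000.00)
GROUP BY INTERVAL ORDER BY TOTAL
The WHERE clause discards everything at or below the initial value, so the ELSE bucket now only catches values above the highest limit.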

How to add "weights" to a MySQL table and select random values according to these?

I want to create a table with each row containing some sort of weight. Then I want to select random values with probability equal to (weight of that row)/(weight of all rows). For example, having 5 rows with weights 1, 2, 3, 4 and 5, over 1000 draws I'd get the first row approximately 1/15 * 1000 ≈ 67 times, and so on.
The table is to be filled manually. Then I'll take a random value from it. But I want the ability to change the probabilities at the filling stage.
I found this nice little algorithm in Quod Libet. You could probably translate it to some procedural SQL.
function WeightedShuffle(list of items with weights):
    max_score ← the sum of every item’s weight
    choice ← random number in the range [0, max_score)
    current ← 0
    for each item (i, weight) in items:
        current ← current + weight
        if current ≥ choice or i is the last item:
            return item i
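For instance, with MySQL 8's window functions, that algorithm becomes a single weighted pick along these lines (a sketch assuming a hypothetical items(id, weight) table):
-- Random threshold in [0, total weight), computed once
SET @choice = (SELECT RAND() * SUM(weight) FROM items);

SELECT id
FROM (
    SELECT id,
           SUM(weight) OVER (ORDER BY id) AS running_total  -- the "current" accumulator
    FROM items
) AS t
WHERE running_total >= @choice   -- first row whose cumulative weight reaches the threshold
ORDER BY running_total
LIMIT 1;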
The easiest (and maybe best/safest?) way to do this is to add those rows to the table as many times as you want the weight to be - say I want "Tree" to be found 2x more often than "Dog": I insert it twice, insert "Dog" once, and just select elements at random one by one.
If the rows are complex/big, then it would be best to create a separate table (weighted_Elements or something) in which you'll just have foreign keys to the real rows, inserted as many times as the weights dictate.
The best possible scenario (if I understand your question properly) is to set up your table as you normally would and then add two columns, both INTs.
Column 1: Weight - this column would hold your weight value, going from -X to +X, X being the highest value you want to have as a weight (e.g. X=100 gives -100 to 100). This value is populated to give the row an actual weight and to increase or decrease the probability of it coming up.
Column 2: Count - this column would hold the count of how many times this row has come up; it is needed only if you want to use fair weighting. Fair weighting prevents one row from always showing up (e.g. if you have one row weighted at 100 and another at 2, the row with 100 will always show up; this column allows weight 2 to become more 'valuable' as you accumulate more weight-100 results). This column should be incremented by 1 each time a row result is pulled, but you can make the logic more advanced later so it adds the weight, etc.
Logic: it's really simple now. Your query simply has to request all rows as you normally would, then make an extra select that (you can change the logic here to whatever you want) takes the weights, subtracts the count, and orders by that column.
The end result should be a table where your weights appear more often, up to a certain point where the system evenly distributes itself out (leave out column 2 to avoid that), and you will have a system that always returns the same weighted order unless you offset the base of the query (e.g. LIMIT [RANDOM NUMBER], [NUMBER OF ROWS TO RETURN]).
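A minimal sketch of that ordering idea in MySQL, assuming a hypothetical weighted_elements table (Count is backtick-quoted because it collides with the COUNT() function):
SELECT *, (Weight - `Count`) AS effective_weight
FROM weighted_elements
ORDER BY effective_weight DESC
LIMIT 1;

-- Record the pick so its priority drops on later draws
-- (42 stands in for the id of the row just returned)
UPDATE weighted_elements SET `Count` = `Count` + 1 WHERE id = 42;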
I'm not an expert in probability theory, but assuming you have a column called WEIGHT, how about
select FIELD_1, ... FIELD_N, (rand() * WEIGHT) as SCORE
from YOURTABLE
order by SCORE desc
limit 0, 10
This would give you 10 records, but you can change the limit clause, of course.
This problem is called weighted Reservoir Sampling (https://en.wikipedia.org/wiki/Reservoir_sampling).
The A-Res algorithm is easy to implement in SQL:
SELECT *
FROM table
ORDER BY pow(rand(), 1 / weight) DESC
LIMIT 10;
I came looking for the answer to the same question - I decided to come up with this:
id weight
1 5
2 1
SELECT * FROM table ORDER BY RAND()/weight
It's not exact - but it is using random, so I might not expect exact. I ran it 70 times and got number 2 ten times. I would have expected 1/6th, but I got 1/7th. I'd say that's pretty close. I'd have to run a script to do it a few thousand times to get a really good idea of whether it's working.