Understanding GROUP BY statements in Rails - SQL

Given an invoices table like this:
invoice_date  customer  total
2012/01/01    A         780
2013/05/01    A         3800
2013/12/01    A         1500
2012/07/01    B         15
2013/03/01    B         21
Say that I want all of the following:
the count of invoices of each customer for each year
the sum of the amounts of all the invoices of each customer for each year
the max amount among all the invoices of each customer for each year
In SQL that is very easy:
SELECT CUSTOMER, YEAR(invoice_date) AS INVOICE_YEAR, MAX(total) AS MAX_TOTAL, SUM(total) AS SUM_AMOUNTS, COUNT(*) AS INVOICES_NUM FROM invoices GROUP BY YEAR(invoice_date), CUSTOMER;
(the function to extract the year from a date may be YEAR(date) or something else depending on the database server; on SQLite it is strftime('%Y', invoice_date))
OK, I've tried to translate this into Rails/ActiveRecord:
Invoice.count(:group => 'customer')
This works, but how can I get count, sum and max together?
The idea I'm familiar with is that (in SQL) a GROUP BY generates the rows (or, to be precise, determines which rows exist in the result table), and then you pass an arbitrary number of aggregate functions, each applied to the set of underlying rows behind a single result row. E.g., GROUP BY customer means one row for customer A and one row for customer B; then I can pass as many aggregate functions as I want: count(*), max(total), max(date), min(total), to list just the most common.
Looking at the Rails ActiveRecord API, it seems that you're supposed to apply just one function at a time, because the group is an argument of count. What if I want multiple aggregate functions, say max, sum, etc.?
Second attempt
irb> i = Invoice.select('customer, sum(total)').group('customer')
Invoice Load (0.3ms) SELECT customer, sum(total) AS TOTAL_GROUP FROM "invoices" GROUP BY customer
=> [#, #]
That is: it doesn't give back the field with the sum...

Well it does, it just doesn't get printed out.
Say your query is i = Invoice.select('customer, sum(total) as sum_total').group('customer')
So i is an array (technically it's not an array, but that's not important here) containing all the results. i[0].sum_total gives you the sum for the first customer, but of course you should iterate over i to get everything you want.
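To illustrate that a single GROUP BY can carry several aliased aggregates at once, here is a minimal sketch using Python's standard sqlite3 module instead of Rails. The table layout follows the question; the dates are rewritten with dashes (an assumption, since SQLite's strftime expects ISO-style dates). The same aliased select list could be passed as a string to Invoice.select(...).group(...), as in the answer above.

```python
import sqlite3

# In-memory database with the sample invoices from the question.
# Assumption: dates are stored as ISO strings (YYYY-MM-DD) so strftime can parse them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_date TEXT, customer TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        ("2012-01-01", "A", 780),
        ("2013-05-01", "A", 3800),
        ("2013-12-01", "A", 1500),
        ("2012-07-01", "B", 15),
        ("2013-03-01", "B", 21),
    ],
)

# One GROUP BY, three aggregates, each with its own alias.
rows = conn.execute(
    """
    SELECT customer,
           strftime('%Y', invoice_date) AS invoice_year,
           COUNT(*)   AS invoices_num,
           SUM(total) AS sum_total,
           MAX(total) AS max_total
    FROM invoices
    GROUP BY customer, strftime('%Y', invoice_date)
    ORDER BY customer, invoice_year
    """
).fetchall()

for row in rows:
    print(row)
```

Each result row carries all three aggregates for one (customer, year) pair, which is what the aliases make accessible on the ActiveRecord objects.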

Related

I want NAV price as per (Today date minus 1) date

I have two tables. One is NAV, where each product's new price is updated daily. The second is TDK, where item-wise stock is available.
Now I want a summary report per buyer name, showing the product-wise totals together with the latest price from the first table.
I have tried below query...
SELECT dbo.TDK.buyer, dbo.NAV.Product_Name, sum(dbo.TDK.TD_UNITS) as Units, sum(dbo.TDK.TD_AMT) as 'Amount',dbo.NAV.NAValue
FROM dbo.TDK INNER JOIN
dbo.NAV
ON dbo.TDK.Products = dbo.NAV.Product_Name
group by dbo.TDK.buyer, dbo.NAV.Product_Name, dbo.NAV.NAValue
Important: Common columns in both tables...
Table one NAV has column as Products
Table two TDK has column as Product_Name
If NAV has 4 NAValue records for one product, this query shows 4 lines with the same total.
What I need:
I want this query to show only one line with the latest NAValue price.
I also want to display one more column with Units*NAValue (latest) as "Latest Market Value".
Please guide.
What field contains the quote date? I am assuming you have a DATETIME field, quoteDate, in the dbo.NAV table, and that you only store the date part (i.e. midnight, time = 00:00:00).
SELECT
t.buyer,
n.Product_Name,
sum(t.TD_UNITS) as Units,
sum(t.TD_AMT) as 'Amount',
n.NAValue
FROM dbo.TDK t
INNER JOIN dbo.NAV n
ON t.Products = n.Product_Name
AND n.quoteDate > getdate()-2
group by t.buyer, n.Product_Name, n.NAValue, n.QuoteDate
GetDate() gives you the current date and time. Subtracting 2 moves the cutoff to a point before yesterday's midnight but after the midnight of the day before yesterday, so yesterday's quote passes the filter.
Also, add n.quoteDate to your select and group by. Even though you don't strictly need it, it helps if one day you have bad data with two records in the NAV table, one at midnight and another at 6 PM.
Your code looks like SQL Server. I think you just want APPLY:
SELECT t.buyer, n.Product_Name, t.TD_UNITS as Units, t.TD_AMT as Amount, n.NAValue
FROM dbo.TDK t CROSS APPLY
(SELECT TOP (1) n.*
FROM dbo.NAV n
WHERE t.Products = n.Product_Name
ORDER BY ?? DESC -- however you define "latest"
) n;
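The "one latest price per product" idea can be sketched in a runnable form. This example uses Python's sqlite3 module; SQLite has no CROSS APPLY, so a correlated MAX() subquery is swapped in for it. The quote_date column and the sample rows are hypothetical, since the question does not say how "latest" is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- quote_date is an assumed column; the question does not name one.
    CREATE TABLE nav (product_name TEXT, navalue REAL, quote_date TEXT);
    CREATE TABLE tdk (buyer TEXT, products TEXT, td_units INTEGER, td_amt REAL);
    INSERT INTO nav VALUES
        ('P1', 10.0, '2024-01-01'),
        ('P1', 11.0, '2024-01-02'),
        ('P1', 12.0, '2024-01-03'),
        ('P2',  5.0, '2024-01-03');
    INSERT INTO tdk VALUES
        ('Ann', 'P1', 3, 30.0),
        ('Ann', 'P1', 2, 22.0),
        ('Bob', 'P2', 4, 20.0);
    """
)

# Join each TDK row only to the most recent NAV row of its product,
# then aggregate per buyer and product.
rows = conn.execute(
    """
    SELECT t.buyer,
           t.products,
           SUM(t.td_units)             AS units,
           SUM(t.td_amt)               AS amount,
           n.navalue,
           SUM(t.td_units) * n.navalue AS latest_market_value
    FROM tdk t
    JOIN nav n ON n.product_name = t.products
    WHERE n.quote_date = (SELECT MAX(n2.quote_date)
                          FROM nav n2
                          WHERE n2.product_name = t.products)
    GROUP BY t.buyer, t.products, n.navalue
    ORDER BY t.buyer, t.products
    """
).fetchall()

for row in rows:
    print(row)
```

Because only one NAV row per product survives the filter, the three historical prices of P1 no longer multiply the output lines.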

Find most recent date of purchase in user day table

I'm trying to put together a query that will fetch the date, purchase amount, and number of transactions of the last time each user made a purchase. I am pulling from a user day table that contains a row for each time a user does anything in the app, purchase or not. Basically all I am trying to get is the most recent date in which the number of transactions field was greater than zero. The below query returns all days of purchase made by a particular user when all I'm looking for is the last purchase so just the 1st row shown in the attached screenshot is what I am trying to get.
screen shot of query and result set
select tuid, max(event_day),
purchases_day_rev as last_dop_rev,
purchases_day_num as last_dop_quantity,
purchases_day_rev/nullif(purchases_day_num,0) as last_dop_spend_pp
from
(select tuid, event_day,purchases_day_rev,purchases_day_num
from
app.user_day
where purchases_day_num > 0
and tuid='122d665e-1d71-4319-bb0d-05c7f37a28b0'
group by 1,2,3,4) a
group by 1,3,4,5
I'm not going to comment on the logic of your query... if all you want is the first row of your result set, you can try:
<your query here> ORDER BY 2 DESC LIMIT 1 ;
Where ORDER BY 2 DESC orders the result set on max(event_day) and LIMIT 1 extracts only the first row.
I don't know all of the ins and outs of your data, but I don't understand why you are grouping within the subquery without any aggregate function (sum, average, min, max, etc). With that said, I would try something like this:
select a.tuid
,a.event_day
,a.purchases_day_rev as last_dop_rev
,a.purchases_day_num as last_dop_quantity
,a.purchases_day_rev/nullif(a.purchases_day_num,0) as last_day_spend_pp
from app.user_day a
inner join
(
select tuid
,max(event_day) as MAX_DAY
from app.user_day
where purchases_day_num > 0
and tuid='122d665e-1d71-4319-bb0d-05c7f37a28b0'
group by 1
) b
on a.tuid = b.tuid
and a.event_day = b.max_day;
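The join-on-maximum pattern can be tried out on a small data set. The following is a sketch in Python with sqlite3; the table is named user_day without the app schema, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_day (
        tuid TEXT,
        event_day TEXT,
        purchases_day_rev REAL,
        purchases_day_num INTEGER
    );
    -- Hypothetical sample data: u1 has a later day without purchases.
    INSERT INTO user_day VALUES
        ('u1', '2024-01-01', 10.0, 2),
        ('u1', '2024-01-05', 30.0, 3),
        ('u1', '2024-01-09',  0.0, 0),
        ('u2', '2024-01-03',  8.0, 1);
    """
)

# The subquery finds each user's last purchase day; the join pulls that day's row.
rows = conn.execute(
    """
    SELECT a.tuid,
           a.event_day,
           a.purchases_day_rev AS last_dop_rev,
           a.purchases_day_num AS last_dop_quantity,
           a.purchases_day_rev / NULLIF(a.purchases_day_num, 0) AS last_dop_spend_pp
    FROM user_day a
    JOIN (
        SELECT tuid, MAX(event_day) AS max_day
        FROM user_day
        WHERE purchases_day_num > 0
        GROUP BY tuid
    ) b ON a.tuid = b.tuid AND a.event_day = b.max_day
    ORDER BY a.tuid
    """
).fetchall()

for row in rows:
    print(row)
```

Dropping the tuid filter from the subquery, as done here, returns the last purchase for every user in one pass.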

SQL-How to Sum Data of Clients Over Time?

Goal: SUM/AVG Client Data over multiple dates/transactions.
Detailed Question: How do I properly Group clients ('PlayerID') then SUM the int(MinsPlayed), then AVG (AvgBet)?
Current Issue: my Results are giving individual transactions day by day over the 90 day time period instead of the SUM/AVG over the 90 days.
Current Script/Results: FirstName-Riley is showing each individual daily transaction instead of 1 total SUM/AVG over set time period
Firstly, you don't need to use DISTINCT as you are going to be aggregating the results using GROUP BY, so you can take that out.
The reason you are returning a row for each transaction is that your GROUP BY clause includes the column you are trying to aggregate (e.g. TimePlayed). Typically, you only want to GROUP BY the columns that are not being aggregated, so remove all the columns from the GROUP BY clause that you are aggregating using SUM or AVG (TimePlayed, PlayerSkill etc.).
Here's your current SQL:
SELECT DISTINCT CDS_StatDetail.PlayerID,
StatType,
FirstName,
LastName,
Email,
SUM(TimePlayed)/60 AS MinsPlayed,
SUM(CashIn) AS AvgBet,
SUM(PlayerSkill) AS AvgSkillRating,
SUM(PlayerSpeed) AS Speed,
CustomFlag1
FROM CDS_Player INNER JOIN CDS_StatDetail
ON CDS_Player.Player_ID = CDS_StatDetail.PlayerID
WHERE StatType='PIT' AND CDS_StatDetail.GamingDate >= '1/02/17' and CDS_StatDetail.GamingDate <= '4/02/2017' AND CustomFlag1='N'
GROUP BY CDS_StatDetail.PlayerID, StatType, FirstName, LastName, Email, TimePlayed, CashIn, PlayerSkill, PlayerSpeed, CustomFlag1
ORDER BY CDS_StatDetail.PlayerID
You want something like:
SELECT CDS_StatDetail.PlayerID,
SUM(TimePlayed)/60 AS MinsPlayed,
AVG(CashIn) AS AvgBet,
AVG(PlayerSkill) AS AvgSkillRating,
SUM(PlayerSpeed) AS Speed
FROM CDS_Player INNER JOIN CDS_StatDetail
ON CDS_Player.Player_ID = CDS_StatDetail.PlayerID
WHERE StatType='PIT' AND CDS_StatDetail.GamingDate BETWEEN '2017-01-02' AND '2017-04-02' AND CustomFlag1='N'
GROUP BY CDS_StatDetail.PlayerID
Next time, please copy and paste your query as text rather than just linking to a screenshot.
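The effect of grouping by a column that is also being aggregated can be seen on a toy table. This is a sketch in Python with sqlite3; stat_detail and its columns are simplified, hypothetical stand-ins for CDS_StatDetail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical stand-in for CDS_StatDetail.
    CREATE TABLE stat_detail (player_id INTEGER, time_played INTEGER, cash_in REAL);
    INSERT INTO stat_detail VALUES
        (1,  600, 10.0),
        (1, 1200, 30.0),
        (2,  300,  5.0);
    """
)

select_list = "player_id, SUM(time_played) / 60 AS mins_played, AVG(cash_in) AS avg_bet"

# Grouping by the aggregated columns too: every distinct transaction is its own group.
per_transaction = conn.execute(
    f"SELECT {select_list} FROM stat_detail "
    "GROUP BY player_id, time_played, cash_in "
    "ORDER BY player_id, time_played"
).fetchall()

# Grouping only by the non-aggregated column: one row per player.
per_player = conn.execute(
    f"SELECT {select_list} FROM stat_detail "
    "GROUP BY player_id ORDER BY player_id"
).fetchall()

print(per_transaction)
print(per_player)
```

The first query returns one row per transaction, the second one row per player with the totals and averages over all of that player's rows.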

Subtracting 2 values from a query and sub-query using CROSS JOIN in SQL

I have a question that I'm having trouble answering.
Find out what is the difference in number of invoices and total of invoiced products between May and June.
One way of doing it is to use sub-queries: one for June and the other one for May, and to subtract the results of the two queries. Since each of the two subqueries will return one row you can (should) use CROSS JOIN, which does not require the "on" clause since you join "all" the rows from one table (i.e. subquery) to all the rows from the other one.
To find the month of a certain date, you can use MONTH function.
Here is the Erwin document
This is what I got so far. I have no idea how to use CROSS JOIN in this situation
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 5
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 6
If you guys can help that would be great.
Thanks
The problem statement suggests you use the following steps:
Construct a query, with a single result row giving the values for June.
Construct a query, with a single result row giving the values for May.
Compare the results of the two queries.
The issue is that, in SQL, it's not super easy to do that third step. One way to do it is by doing a cross join, which yields a row containing all the values from both subqueries; it's then easy to use SELECT (b - a) ... to get the differences you're looking for. This isn't the only way to do the third step, but what you have definitely doesn't work.
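The three steps can be sketched end to end. The example below uses Python with sqlite3; the invoice and invoice_line tables, their columns and the sample rows are assumptions, since the schema document is not reproduced here. "Total of invoiced products" is interpreted as the number of invoice lines, and strftime('%m', ...) stands in for MONTH(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema and data for illustration only.
    CREATE TABLE invoice (inv_id INTEGER, inv_date TEXT);
    CREATE TABLE invoice_line (inv_id INTEGER, product_id INTEGER);
    INSERT INTO invoice VALUES
        (1, '2024-05-03'), (2, '2024-05-20'),
        (3, '2024-06-01'), (4, '2024-06-15'), (5, '2024-06-30');
    INSERT INTO invoice_line VALUES
        (1, 10), (1, 11), (2, 10),
        (3, 10), (3, 11), (3, 12), (4, 13), (5, 10);
    """
)

# Each subquery yields exactly one row, so the CROSS JOIN yields exactly one row
# holding both months' values, which can then be subtracted column by column.
month_totals = """
    SELECT COUNT(DISTINCT i.inv_id) AS invoices,
           COUNT(l.product_id)      AS products
    FROM invoice i
    LEFT JOIN invoice_line l ON l.inv_id = i.inv_id
    WHERE strftime('%m', i.inv_date) = '{month}'
"""

diff = conn.execute(
    f"""
    SELECT june.invoices - may.invoices AS invoice_diff,
           june.products - may.products AS product_diff
    FROM ({month_totals.format(month='06')}) AS june
    CROSS JOIN ({month_totals.format(month='05')}) AS may
    """
).fetchone()

print(diff)
```

No ON clause is needed because a one-row by one-row cross join can only produce a single combined row.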
Can't you do something with subqueries? I haven't tested this, but something like the below should give you 4 columns: invoices and products for May and June.
select june.june_invoices, june.products as june_products, may.may_invoices, may.products as may_products
from (
select 'stuff' a, count(*) as june_invoices, sum(products) as products from invoices
where month = 'june'
) june , (
select 'stuff' a, count(*) as may_invoices, sum(products) as products from invoices
where month = 'may'
) may
where june.a = may.a

Aggregated data from transactional table for sparklines

I'm working on a Ruby on Rails app which contains a list type of report. Two columns within that report are aggregations from a transactional table.
So let's say we have these two tables:
**items**
id
name
group
price
**transactions**
id
item_id
type
date
qty
These two tables are connected with item_id in the transactions table.
Now I want to show a set of rows from the items table in a table, with two calculated columns:
Calculated column 1 (Sparkline data):
Sparkline for the item's transactions with type="actuals" for the last 12 months. The result from the database should be text with the aggregated qty for each month, separated by commas. Example:
15,20,0,12,44,33,6,4,33,23,11,65
Calculated column 2 (6m total sale):
Total qty for the item over the last 6 months multiplied by price.
So the results would show columns like these:
Item name - Sparkline data - 6m total sale
The result could be many thousands of lines, but would probably be paged.
So the question is: what is the most straightforward way of doing this in Rails models without sacrificing too much performance? Although this is a Ruby on Rails question, the answer might be more of an SQL-type solution.
The core sql could be something similar:
select
  i.id,
  i.name,
  y.sparkline,
  i.price * s.sumqty as totalsale6m
from items i
left join (
  select
    x.item_id,
    group_concat(x.sumqty order by x.datemonth asc separator ',') as sparkline
  from (
    select
      t.item_id,
      -- year-month, so the order stays chronological across a year boundary
      date_format(t.date, '%Y-%m') as datemonth,
      sum(t.qty) as sumqty
    from transactions t
    where t.type = 'actuals'
      and t.date > date_sub(now(), interval 1 year)
    group by t.item_id, datemonth
  ) x
  group by x.item_id
) y on i.id = y.item_id
left join (
  select
    t.item_id,
    sum(t.qty) as sumqty
  from transactions t
  where t.date > date_sub(now(), interval 6 month)
  group by t.item_id
) s on i.id = s.item_id
A few comments:
I wasn't able to test it without real data.
If there are gaps in the sales (no sales in a given month), then the list will not contain 12 elements. In this case you need to adjust the x and y subqueries.
If you only need the result for a few given items, you can probably push the item id filter deeper into the subqueries to save time.
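One way to handle the gap problem is to let the database return only the months that have sales and pad the missing months with zeros in application code. The sketch below does this in Python with sqlite3 rather than in a Rails model; the helper names last_12_months and sparkline are hypothetical, and the sample rows are made up.

```python
import sqlite3
from datetime import date


def last_12_months(today):
    """Return 'YYYY-MM' keys for the 12 months ending with today's month, oldest first."""
    keys = []
    year, month = today.year, today.month
    for _ in range(12):
        keys.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return keys[::-1]


def sparkline(conn, item_id, today):
    """Build a 12-element comma-separated string, using 0 for months without sales."""
    months = last_12_months(today)
    totals = dict(
        conn.execute(
            """
            SELECT strftime('%Y-%m', "date"), SUM(qty)
            FROM transactions
            WHERE item_id = ?
              AND type = 'actuals'
              AND strftime('%Y-%m', "date") >= ?
            GROUP BY strftime('%Y-%m', "date")
            """,
            (item_id, months[0]),
        ).fetchall()
    )
    return ",".join(str(totals.get(m, 0)) for m in months)


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE transactions '
    '(id INTEGER PRIMARY KEY, item_id INTEGER, type TEXT, "date" TEXT, qty INTEGER)'
)
conn.executemany(
    'INSERT INTO transactions (item_id, type, "date", qty) VALUES (?, ?, ?, ?)',
    [
        (1, "actuals", "2024-01-10", 15),
        (1, "actuals", "2024-01-20", 5),
        (1, "actuals", "2024-03-05", 12),
        (1, "forecast", "2024-03-06", 99),  # other type, ignored
        (1, "actuals", "2023-02-01", 7),    # older than 12 months, ignored
    ],
)

line = sparkline(conn, 1, date(2024, 3, 15))
print(line)
```

Padding in application code keeps the SQL simple; the alternative is to join against a generated list of months inside the query.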