SubQuery Aggregates in ActiveRecord


I'm trying to avoid using raw SQL in my Rails app, but I need to run a much larger version of this:
SELECT ds.product_id,
( SELECT SUM(units) FROM daily_sales WHERE (date BETWEEN '2015-01-01' AND '2015-01-08') AND service_type = 1 ) as wk1,
( SELECT SUM(units) FROM daily_sales WHERE (date BETWEEN '2015-01-09' AND '2015-01-16') AND service_type = 1 ) as wk2
FROM daily_sales as ds group by ds.product_id
I'm sure it can be done, but I'm struggling to write this as an Active Record statement. Can anyone help?

If you must do this in a single query, you'll need to write some SQL for the CASE expressions. The following is what you need:
ranges = [ # ordered array of all your date ranges
  Date.new(2015, 1, 1)..Date.new(2015, 1, 8),
  Date.new(2015, 1, 9)..Date.new(2015, 1, 16)
]
overall_range = ranges.first.min..ranges.last.max
# Number the weeks from 1 so the labels are 'week1', 'week2', ...
grouping_sub_str = ranges.map.with_index(1) do |range, i|
  "WHEN (date BETWEEN '#{range.min}' AND '#{range.max}') THEN 'week#{i}'"
end.join(' ')
grouping_condition = "CASE #{grouping_sub_str} END"
grouping_columns = ['product_id', grouping_condition]
# Keep the service_type = 1 filter from the original query
DailySale.where(date: overall_range, service_type: 1)
         .group(grouping_columns)
         .sum(:units)
That will produce a hash with array keys and numeric values. A key will be of the form [product_id, 'week1'] and the value will be the corresponding sum of units for that week.
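To see what that hash looks like without a Rails app, here is a minimal sketch using Python's standard-library sqlite3 module. It runs SQL roughly equivalent to what the Ruby snippet builds (the exact SQL Active Record emits may differ), including the service_type = 1 filter; the table layout and sample rows are invented for illustration.

```python
import sqlite3
from datetime import date

# Invented week ranges (inclusive), mirroring the Ruby `ranges` array.
RANGES = [
    (date(2015, 1, 1), date(2015, 1, 8)),
    (date(2015, 1, 9), date(2015, 1, 16)),
]


def demo_connection():
    """In-memory daily_sales table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales "
        "(product_id INTEGER, date TEXT, units INTEGER, service_type INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
        [
            (1, "2015-01-02", 5, 1),
            (1, "2015-01-07", 3, 1),
            (1, "2015-01-10", 4, 1),
            (2, "2015-01-12", 6, 1),
            (2, "2015-01-12", 100, 2),  # other service type: filtered out
            (1, "2015-01-20", 50, 1),   # outside the overall date range
        ],
    )
    return conn


def weekly_sums(conn):
    """Group by product_id plus a CASE-computed week label, like the Ruby answer."""
    whens = " ".join(
        f"WHEN (date BETWEEN ? AND ?) THEN 'week{i}'"
        for i in range(1, len(RANGES) + 1)
    )
    sql = (
        f"SELECT product_id, CASE {whens} END AS week, SUM(units) "
        "FROM daily_sales "
        "WHERE service_type = 1 AND date BETWEEN ? AND ? "
        "GROUP BY product_id, week"
    )
    # Placeholders bind left to right: first the CASE bounds, then the overall range.
    params = [d.isoformat() for start_end in RANGES for d in start_end]
    params += [RANGES[0][0].isoformat(), RANGES[-1][1].isoformat()]
    # Build {(product_id, week_label): total_units}, the same shape as the Rails hash.
    return {(pid, week): total for pid, week, total in conn.execute(sql, params)}


print(weekly_sums(demo_connection()))
```

With these rows the result should equal {(1, 'week1'): 8, (1, 'week2'): 4, (2, 'week2'): 6}: the service_type 2 row and the row after 2015-01-16 are excluded.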

Simplify your SQL to the following (conditional aggregation) and try converting that:
SELECT ds.product_id
     , SUM(CASE WHEN date BETWEEN '2015-01-01' AND '2015-01-08' AND service_type = 1
                THEN units
           END) AS wk1
     , SUM(CASE WHEN date BETWEEN '2015-01-09' AND '2015-01-16' AND service_type = 1
                THEN units
           END) AS wk2
FROM daily_sales AS ds
GROUP BY ds.product_id
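A runnable sketch of that conditional aggregation, using Python's sqlite3 module with an invented daily_sales table. Because each CASE has no ELSE branch, a product with no matching sales in a week gets NULL (None in Python) for that column; wrap the SUM in COALESCE(..., 0) if you prefer zeros.

```python
import sqlite3

PIVOT_SQL = """
SELECT ds.product_id
     , SUM(CASE WHEN date BETWEEN '2015-01-01' AND '2015-01-08' AND service_type = 1
                THEN units
           END) AS wk1
     , SUM(CASE WHEN date BETWEEN '2015-01-09' AND '2015-01-16' AND service_type = 1
                THEN units
           END) AS wk2
FROM daily_sales AS ds
GROUP BY ds.product_id
ORDER BY ds.product_id
"""


def demo_connection():
    """In-memory daily_sales table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales "
        "(product_id INTEGER, date TEXT, units INTEGER, service_type INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
        [
            (1, "2015-01-02", 5, 1),
            (1, "2015-01-07", 3, 1),
            (1, "2015-01-10", 4, 1),
            (2, "2015-01-12", 6, 1),
            (2, "2015-01-12", 100, 2),  # other service type: not counted
        ],
    )
    return conn


def weekly_pivot(conn):
    """One row per product, one column per week (NULL when nothing matched)."""
    return conn.execute(PIVOT_SQL).fetchall()


for row in weekly_pivot(demo_connection()):
    print(row)
```

Only one pass over daily_sales is needed, which is why this shape is usually preferred over one scalar subquery per week.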

Every Rails developer sooner or later hits his/her head against the walls of the Active Record query interface, only to find the solution in Arel.
Arel gives you the flexibility you need to build your query without resorting to loops and the like. I am not going to give runnable code, rather some hints on how to do it yourself:
1. We are going to use Arel tables to build our query. For a model called, for example, Product, getting the Arel table is as easy as products = Product.arel_table
2. Getting the sum of a column looks like daily_sales.project(daily_sales[:units].sum).where(daily_sales[:date].gteq(BEGIN_DATE)).where(daily_sales[:date].lteq(END_DATE)). You can chain as many wheres as you want; they are translated into SQL ANDs.
3. Since we need multiple sums in our end result, you need to make use of Common Table Expressions (CTEs). Take a look at the docs and this answer for more info on this.
4. You can use the CTEs from step 3 in combination with group, and you are done!
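For reference, here is the kind of plain SQL the CTE hint points toward: one grouped CTE per week, LEFT JOINed back to the list of products. This is not Arel code, just a sketch of the target query run through Python's sqlite3 module; the schema, the `products` CTE, and the sample rows are assumptions made for illustration.

```python
import sqlite3

CTE_SQL = """
WITH wk1 AS (
    SELECT product_id, SUM(units) AS units
    FROM daily_sales
    WHERE service_type = 1 AND date BETWEEN '2015-01-01' AND '2015-01-08'
    GROUP BY product_id
), wk2 AS (
    SELECT product_id, SUM(units) AS units
    FROM daily_sales
    WHERE service_type = 1 AND date BETWEEN '2015-01-09' AND '2015-01-16'
    GROUP BY product_id
), products AS (
    -- hypothetical: every product that appears in daily_sales
    SELECT DISTINCT product_id FROM daily_sales
)
SELECT p.product_id, wk1.units AS wk1, wk2.units AS wk2
FROM products AS p
LEFT JOIN wk1 ON wk1.product_id = p.product_id
LEFT JOIN wk2 ON wk2.product_id = p.product_id
ORDER BY p.product_id
"""


def demo_connection():
    """In-memory daily_sales table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales "
        "(product_id INTEGER, date TEXT, units INTEGER, service_type INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
        [
            (1, "2015-01-02", 5, 1),
            (1, "2015-01-07", 3, 1),
            (1, "2015-01-10", 4, 1),
            (2, "2015-01-12", 6, 1),
            (2, "2015-01-12", 100, 2),  # other service type: not counted
        ],
    )
    return conn


def weekly_ctes(conn):
    """Per-product weekly totals built from one CTE per week."""
    return conn.execute(CTE_SQL).fetchall()


print(weekly_ctes(demo_connection()))
```

The LEFT JOINs keep products that had no sales in a given week, which then show NULL for that week.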

Related

Select only the row with the max value, but the column with this info is a SUM()

I have the following query:
SELECT DISTINCT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT
FROM TGFCAB CAB, TGFPAR PAR, TSIBAI BAI
WHERE CAB.CODPARC = PAR.CODPARC
AND PAR.CODBAI = BAI.CODBAI
AND CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI
Company names and neighborhoods are hidden in the result for obvious reasons.
For those who don't read Romance languages: the query currently returns the clients, company name, company neighborhood, and the total value of movements.
The WHERE clause only keeps sales movements of companies from one specific city.
Note that in the SELECT list, the column returning the total sales value is a SUM().
My goal is to return only the company with the maximum value in this column; if there's a tie, display both of them.
This is where I'm struggling, because I can't find a simple solution. I tried to use
WHERE AMOUNT = MAX(AMOUNT)
But, as expected, it didn't work.

You tagged the question with a whole bunch of different databases; do you really use all of them?
The "PL/SQL" tag suggests Oracle. If that's so, here's one option:
with temp as
  -- this is your current query
  (select columns,
          sum(vlrnota) as amount
   from ...
   where ...
   group by ...
  )
-- query that returns what you asked for
select *
from temp t
where t.amount = (select max(a.amount)
                  from temp a
                 );
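A runnable sketch of that pattern in Python's sqlite3 module. A single made-up `sales` table stands in for the three joined tables, and the CTE is named `totals` here because TEMP is a keyword in SQLite. Clients 10 and 20 tie, so both come back.

```python
import sqlite3

TOP_SQL = """
WITH totals AS (
    -- stands in for the question's grouped query
    SELECT codparc, SUM(vlrnota) AS amount
    FROM sales
    GROUP BY codparc
)
SELECT t.codparc, t.amount
FROM totals AS t
WHERE t.amount = (SELECT MAX(a.amount) FROM totals AS a)
ORDER BY t.codparc
"""


def demo_connection():
    """In-memory sales table with invented rows; clients 10 and 20 tie at 120."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (codparc INTEGER, vlrnota INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(10, 50), (10, 70), (20, 120), (30, 30)],
    )
    return conn


def top_clients(conn):
    """Every client whose total equals the maximum total (ties included)."""
    return conn.execute(TOP_SQL).fetchall()


print(top_clients(demo_connection()))
```

Comparing against MAX() with `=` rather than using LIMIT 1 is what keeps the tied rows.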

You should be able to achieve the same without the scalar subquery by using the window function MAX() OVER ():
WITH T AS (
SELECT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT,
MAX(SUM(VLRNOTA)) OVER () AS MAMOUNT -- window over the grouped sums, not the raw rows
FROM TGFCAB CAB
JOIN TGFPAR PAR ON PAR.CODPARC = CAB.CODPARC
JOIN TSIBAI BAI ON BAI.CODBAI = PAR.CODBAI
WHERE CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY CAB.CODPARC, PAR.RAZAOSOCIAL, BAI.NOMEBAI
)
SELECT CODPARC, RAZAOSOCIAL, NOMEBAI, AMOUNT
FROM T
WHERE AMOUNT=MAMOUNT
Note that it's usually (always) better to join tables using clear, explicit JOIN syntax. This should work on both Oracle and SQL Server.
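The same tie-preserving filter can be sketched in SQLite through Python's sqlite3 module. To keep the sketch simple, it applies MAX() OVER () to an already-grouped CTE in a second step instead of nesting MAX(SUM(...)) in one SELECT; the single `sales` table stands in for the three joined tables and is made up. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

WINDOW_SQL = """
WITH t AS (
    SELECT codparc, SUM(vlrnota) AS amount
    FROM sales
    GROUP BY codparc
), with_max AS (
    -- MAX() OVER () attaches the overall maximum to every grouped row
    SELECT codparc, amount, MAX(amount) OVER () AS mamount
    FROM t
)
SELECT codparc, amount
FROM with_max
WHERE amount = mamount
ORDER BY codparc
"""


def demo_connection():
    """In-memory sales table with invented rows; clients 10 and 20 tie at 120."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (codparc INTEGER, vlrnota INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(10, 50), (10, 70), (20, 120), (30, 30)],
    )
    return conn


def top_clients_window(conn):
    """Rows whose total equals the window maximum (ties included)."""
    return conn.execute(WINDOW_SQL).fetchall()


print(top_clients_window(demo_connection()))
```

The window is evaluated once over the grouped rows, so the CTE is not scanned a second time as it is with the subquery version.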

Getting a query result taken from the same data but with temporary var

I have what should be a simple thing to do.
Well, maybe not, but surely someone somewhere can help me out :P
I have a simple data structure that contains:
expedition date
delivery date
transaction type
I need to create a query that
orders the rows by a date that depends on the transaction type
(i.e., using the expedition date for transactions of type "selling" and the delivery date for transactions of type "purchasing").
I was wondering whether there is a more efficient way to do this than
fetching the same data twice with different WHERE clauses (while adding a column, tempDate, used to order them) and then wrapping these two queries in another SELECT to which I add the ORDER BY on tempDate.
The initial fetch I would do twice works on many tables (many, many, many joins).
Basically, my current solution is:
Select * from
(
  Select ..., date_exp as dateTemp
  from ...
  where conditions* And dateRelatedCondition
  UNION
  Select ..., date_livraison as dateTemp
  from ...
  Where conditions* And NOT(dateRelatedCondition)
) as comboSelect
Order By MIN(comboSelect.dateTemp)
           OVER(PARTITION BY(REF_product)),
         (REF_product),
         comboSelect.dateTemp asc;
* Those conditions are the same in both inner SELECT queries.
Thank you for your time.

Without the UNION:
dateRelatedCondition should be removed from the WHERE clause and put into the SELECT list, like this:
CASE WHEN dateRelatedCondition THEN date_exp ELSE date_livraison END as dateTemp
Without the subquery:
in ORDER BY, repeat the same expression inside the window function:
Order By MIN(CASE WHEN dateRelatedCondition THEN date_exp ELSE date_livraison END)
OVER(PARTITION BY(REF_product)),
(REF_product),
dateTemp asc
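A runnable sketch of both suggestions together, the CASE sort key in the SELECT list and the same CASE inside MIN() OVER (PARTITION BY ...) in ORDER BY, using Python's sqlite3 module. The `transactions` table, its `tx_type` column, and the dates are invented; `tx_type = 'selling'` plays the role of dateRelatedCondition. Requires SQLite 3.25+ for window functions.

```python
import sqlite3

SORT_SQL = """
SELECT ref_product,
       tx_type,
       CASE WHEN tx_type = 'selling' THEN date_exp ELSE date_livraison END AS dateTemp
FROM transactions
ORDER BY MIN(CASE WHEN tx_type = 'selling' THEN date_exp ELSE date_livraison END)
             OVER (PARTITION BY ref_product),   -- earliest sort date per product
         ref_product,                           -- keep each product's rows together
         dateTemp                               -- then by each row's own sort date
"""


def demo_connection():
    """In-memory transactions table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(ref_product TEXT, tx_type TEXT, date_exp TEXT, date_livraison TEXT)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?)",
        [
            ("B", "selling", "2024-01-05", "2024-01-20"),
            ("B", "purchasing", "2024-01-01", "2024-01-09"),
            ("A", "purchasing", "2024-01-02", "2024-01-07"),
            ("A", "selling", "2024-01-10", "2024-01-30"),
        ],
    )
    return conn


def sorted_transactions(conn):
    """Single query: no UNION and no wrapping subquery."""
    return conn.execute(SORT_SQL).fetchall()


for row in sorted_transactions(demo_connection()):
    print(row)
```

Product B comes first because its earliest sort date (2024-01-05, a selling row) is before A's (2024-01-07, a purchasing row).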

You mean like this?
ORDER BY CASE
WHEN TransactionType = 'Selling' THEN ExpeditionDate
WHEN TransactionType = 'purchasing' THEN DeliveryDate
END

Calculated column syntax when using a group by function Teradata

I'm trying to include a column calculated as a % of OTYPE.
i.e.:
Order type | Status | volume of orders at each status | % of all orders at this status
SELECT
T.OTYPE,
STATUS_CD,
COUNT(STATUS_CD) AS STATVOL,
(STATVOL / COUNT(ROW_ID)) * 100
FROM Database.S_ORDER O
LEFT JOIN /* Finding definitions for status codes & attaching */
(
SELECT
ROW_ID AS TYPEJOIN,
"NAME" AS OTYPE
FROM database.S_ORDER_TYPE
) T
ON T.TYPEJOIN = ORDER_TYPE_ID
GROUP BY (T.OTYPE, STATUS_CD)
/*Excludes pending and pending online orders */
WHERE CAST(CREATED AS DATE) = '2018/09/21' AND STATUS_CD <> 'Pending'
AND STATUS_CD <> 'Pending-Online'
ORDER BY T.OTYPE, STATUS_CD DESC
OTYPE | STATUS_CD | STATVOL | TOTALPERC
Add New Service | Provisioning | 2,740 | 100
Add New Service | In-transit | 13 | 100
Add New Service | Error - Provisioning | 568 | 100
Add New Service | Error - Integration | 1 | 100
Add New Service | Complete | 14,387 | 100
The current output just puts 100 on every line; I need it to be a % of total orders.
Could anyone help out a Teradata & SQL student?
What makes this difficult is that my understanding of the GROUP BY and COUNT syntax is tenuous. It took some fiddling to get it displayed as I have it, and I'm not sure how to introduce a calculated column within this combination.
Thanks in advance

There are a couple of places where the total could be computed, but this is the way I would do it. I also cleaned up your other subquery, which was not required, and changed the date to a non-ambiguous format (change it back if it causes an issue in Teradata).
SELECT
T."NAME" as OTYPE,
STATUS_CD,
COUNT(STATUS_CD) AS STATVOL,
COUNT(STATUS_CD)*100/TotalVol as Pct
FROM database.S_ORDER O
LEFT JOIN EDWPRDR_VW40_SBLCPY.S_ORDER_TYPE T on T.ROW_ID = ORDER_TYPE_ID
cross join (select count(*) as TotalVol from database.S_ORDER) Tot
WHERE CAST(CREATED AS DATE) = '2018-09-21' AND STATUS_CD <> 'Pending' AND STATUS_CD <> 'Pending-Online'
GROUP BY T."NAME", STATUS_CD, TotalVol
ORDER BY T."NAME", STATUS_CD DESC

A WHERE clause comes before a GROUP BY clause, so the query shown in the question isn't valid.
Always prefix every column reference with the relevant table alias. Below, I have assumed that any column you did not qualify belongs to the orders table.
You probably do not need a subquery for this left join. While there are times when a subquery is needed or good for performance, this does not appear to be the case here.
Most modern SQL databases provide "window functions", and Teradata does too. They are extremely useful: here, when you combine count() with an OVER clause, you get the total over all rows without needing another subquery or join.
Because the question provides neither sample data nor an expected result, I do not actually know which numbers you need for your percentage calculation. Instead, I have chosen to show you different ways to count so that you can pick the right ones. I suspect you are getting 100 on every row because count(status_cd) is equal to count(row_id); you need to count status_cd differently from how you count row_id. NB: count(column) increases by 1 for every non-null value.
I also changed the way your date filter is applied. It is not efficient to change the data on every row to suit constants in a WHERE clause. Leave the data untouched and alter the way you apply the filter to suit the data; this is almost always more efficient (search for "sargable").
SELECT
  t."NAME" AS OTYPE
, o.STATUS_CD
, COUNT(o.STATUS_CD) count_status
, COUNT(t.ROW_ID) count_row_id
, SUM(COUNT(t.ROW_ID)) OVER () count_row_id_over
FROM database.S_ORDER o
LEFT JOIN database.S_ORDER_TYPE t ON t.ROW_ID = o.ORDER_TYPE_ID
/*Excludes pending and pending online orders */
WHERE o.CREATED >= '2018-09-21' AND o.CREATED < '2018-09-22'
AND o.STATUS_CD <> 'Pending'
AND o.STATUS_CD <> 'Pending-Online'
GROUP BY
  t."NAME"
, o.STATUS_CD
ORDER BY
  t."NAME"
, o.STATUS_CD DESC
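The NB point above is the key to the constant 100: COUNT(column) skips NULLs, while COUNT(*) counts every row. A tiny sketch with Python's sqlite3 module and three invented rows shows the difference.

```python
import sqlite3


def count_demo():
    """Compare COUNT(*) with COUNT(column) on rows that contain NULLs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE s_order (row_id INTEGER, status_cd TEXT, order_type_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO s_order VALUES (?, ?, ?)",
        [
            (1, "Complete", 7),
            (2, "Complete", None),  # no order type
            (3, None, 7),           # no status
        ],
    )
    return conn.execute(
        "SELECT COUNT(*), COUNT(status_cd), COUNT(order_type_id) FROM s_order"
    ).fetchone()


print(count_demo())  # (3, 2, 2)
```

Within a single GROUP BY group, two counts over columns that are both non-NULL will always be equal, so dividing one by the other gives 100 on every row.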

As @TomC already noted, there's no need for the join to a derived table. The simplest way to get the percentage is based on a group sum. I also changed the date to a Standard SQL date literal and moved the WHERE before the GROUP BY.
SELECT
t."NAME",
o.STATUS_CD,
Count(o.STATUS_CD) AS STATVOL,
-- rule of thumb: multiply first then divide, otherwise you will get unexpected results
-- (Teradata rounds after each calculation)
100.00 * STATVOL / Sum(STATVOL) Over ()
FROM database.S_ORDER AS O
/* Finding definitions for status codes & attaching */
LEFT JOIN database.S_ORDER_TYPE AS t
ON t.ROW_ID = o.ORDER_TYPE_ID
/*Excludes pending and pending online orders */
-- if o.CREATED is a Timestamp there's no need to apply the CAST
WHERE Cast(o.CREATED AS DATE) = DATE '2018-09-21'
AND o.STATUS_CD NOT IN ('Pending', 'Pending-Online')
GROUP BY t."NAME", o.STATUS_CD
ORDER BY t."NAME", o.STATUS_CD DESC
By the way, you probably don't need an outer join; an inner join should return the same result.
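A runnable sketch of the group-sum percentage using Python's sqlite3 module. The tables and rows are invented, and the window SUM is applied to a grouped CTE rather than directly to the COUNT alias as the Teradata query does; the arithmetic (multiply by 100.0 first, then divide by the overall total) is the same. Requires SQLite 3.25+.

```python
import sqlite3

PCT_SQL = """
WITH per_status AS (
    SELECT t.name AS otype, o.status_cd, COUNT(o.status_cd) AS statvol
    FROM s_order AS o
    JOIN s_order_type AS t ON t.row_id = o.order_type_id
    WHERE o.status_cd NOT IN ('Pending', 'Pending-Online')
    GROUP BY t.name, o.status_cd
)
SELECT otype, status_cd, statvol,
       -- multiply first, then divide by the total over all grouped rows
       100.0 * statvol / SUM(statvol) OVER () AS totalperc
FROM per_status
ORDER BY otype, status_cd DESC
"""


def demo_connection():
    """In-memory order tables with made-up rows (two pending rows are excluded)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE s_order_type (row_id INTEGER, name TEXT);
        CREATE TABLE s_order (row_id INTEGER, status_cd TEXT, order_type_id INTEGER);
        INSERT INTO s_order_type VALUES (1, 'Add New Service');
        INSERT INTO s_order VALUES
            (1, 'Complete', 1), (2, 'Complete', 1), (3, 'Complete', 1),
            (4, 'Error - Provisioning', 1),
            (5, 'Pending', 1), (6, 'Pending-Online', 1);
        """
    )
    return conn


def status_percentages(conn):
    """Each status's share of all non-pending orders, as a percentage."""
    return conn.execute(PCT_SQL).fetchall()


for row in status_percentages(demo_connection()):
    print(row)
```

OVER () divides by the grand total across all order types; OVER (PARTITION BY otype) would instead give each status's share within its own order type.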

Postgresql query for every day sold stock count

I have a CRM project that maintains product sales orders for every organization.
I want to count the stock sold each day. I have managed to do it by looping over dates, but that is obviously a poor method and it takes a lot of time and memory.
Please help me do it in a single query. Is that possible?
Here is my database structure for your reference.
product : id (PK), name
organization : id (PK), name
sales_order : id (PK), product_id (FK), organization_id (FK), sold_stock, sold_date(epoch time)
Expected output for a selected month:
organization | product | day1_sold_stock | day2_sold_stock | ..... | day30_sold_stock
http://sqlfiddle.com/#!15/e1dc3/3

Create the tablefunc extension:
CREATE EXTENSION IF NOT EXISTS tablefunc;
Query:
select "proId" as ProductId ,product_name as ProductName,organizationName as OrganizationName,
coalesce( "1-day",0) as "1-day" ,coalesce( "2-day",0) as "2-day" ,coalesce( "3-day",0) as "3-day" ,
coalesce( "4-day",0) as "4-day" ,coalesce( "5-day",0) as "5-day" ,coalesce( "6-day",0) as "6-day" ,
coalesce( "7-day",0) as "7-day" ,coalesce( "8-day",0) as "8-day" ,coalesce( "9-day",0) as "9-day" ,
coalesce("10-day",0) as "10-day" ,coalesce("11-day",0) as "11-day" ,coalesce("12-day",0) as "12-day" ,
coalesce("13-day",0) as "13-day" ,coalesce("14-day",0) as "14-day" ,coalesce("15-day",0) as"15-day" ,
coalesce("16-day",0) as "16-day" ,coalesce("17-day",0) as "17-day" ,coalesce("18-day",0) as "18-day" ,
coalesce("19-day",0) as "19-day" ,coalesce("20-day",0) as "20-day" ,coalesce("21-day",0) as"21-day" ,
coalesce("22-day",0) as "22-day" ,coalesce("23-day",0) as "23-day" ,coalesce("24-day",0) as "24-day" ,
coalesce("25-day",0) as "25-day" ,coalesce("26-day",0) as "26-day" ,coalesce("27-day",0) as"27-day" ,
coalesce("28-day",0) as "28-day" ,coalesce("29-day",0) as "29-day" ,coalesce("30-day",0) as "30-day" ,
coalesce("31-day",0) as"31-day"
from crosstab(
'select hist.product_id,pr.name,o.name,EXTRACT(day FROM TO_TIMESTAMP(hist.sold_date/1000)),sum(sold_stock)
from sales_order hist
left join product pr on pr.id = hist.product_id
left join organization o on o.id = hist.organization_id
where EXTRACT(MONTH FROM TO_TIMESTAMP(hist.sold_date/1000)) =5
and EXTRACT(YEAR FROM TO_TIMESTAMP(hist.sold_date/1000)) = 2017
group by hist.product_id,pr.name,EXTRACT(day FROM TO_TIMESTAMP(hist.sold_date/1000)),o.name
order by o.name,pr.name',
'select d from generate_series(1,31) d')
as ("proId" int ,product_name text,organizationName text,
"1-day" float,"2-day" float,"3-day" float,"4-day" float,"5-day" float,"6-day" float
,"7-day" float,"8-day" float,"9-day" float,"10-day" float,"11-day" float,"12-day" float,"13-day" float,"14-day" float,"15-day" float,"16-day" float,"17-day" float
,"18-day" float,"19-day" float,"20-day" float,"21-day" float,"22-day" float,"23-day" float,"24-day" float,"25-day" float,"26-day" float,"27-day" float,"28-day" float,
"29-day" float,"30-day" float,"31-day" float);
Note that this uses the PostgreSQL crosstab() function from tablefunc. I used coalesce to handle NULL values, so that days with no sales show 0 instead of NULL.
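Writing the 31 day columns out twice (once in the COALESCE select list, once in the AS (...) definition) is error-prone. A small generator can produce both; below is a hypothetical Python helper, not part of tablefunc, that mirrors the column names used in the answer. The number of value columns must match the number of rows the category query returns (generate_series(1, 31) above), otherwise crosstab() will complain.

```python
def crosstab_clauses(days=31):
    """Build the SELECT list and the AS (...) column definitions for a day pivot.

    Hypothetical helper: names mirror the answer's query. `days` must match the
    number of rows produced by the crosstab category query.
    """
    day_cols = [f'"{d}-day"' for d in range(1, days + 1)]
    select_list = ", ".join(
        ['"proId" AS ProductId', "product_name AS ProductName",
         "organizationName AS OrganizationName"]
        # COALESCE turns days without sales into 0
        + [f"coalesce({c}, 0) AS {c}" for c in day_cols]
    )
    column_defs = ", ".join(
        ['"proId" int', "product_name text", "organizationName text"]
        + [f"{c} float" for c in day_cols]
    )
    return select_list, column_defs


select_list, column_defs = crosstab_clauses()
print(column_defs)
```

The generated strings still have to be spliced into the crosstab() query by whatever code builds it; PostgreSQL itself cannot infer the column list at planning time.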

The following query does the same:
select o.name,
p.name,
sum(case when extract(day from to_timestamp(sold_date)) = 1 then sold_stock else 0 end) as day1_sold_stock,
sum(case when extract(day from to_timestamp(sold_date)) = 2 then sold_stock else 0 end) as day2_sold_stock,
sum(case when extract(day from to_timestamp(sold_date)) = 3 then sold_stock else 0 end) as day3_sold_stock
from sales_order so,
organization o,
product p
where so.organization_id=o.id
and so.product_id=p.id
group by o.name,
p.name;
I have only provided the logic for 3 days; you can implement the same for the rest of the days.
Basically, first join the tables on their ids, then for each day convert the epoch to a timestamp, extract the day, and sum sold_stock only for rows that match that day.
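Rather than typing the remaining 28 CASE columns by hand, they can be generated in a loop. Here is a sketch using Python's sqlite3 module, assuming sold_date holds epoch seconds (as to_timestamp(sold_date) above implies); SQLite's strftime(..., 'unixepoch') replaces PostgreSQL's extract/to_timestamp, and the month filter plus the sample rows are invented.

```python
import sqlite3
from datetime import datetime, timezone


def epoch(y, m, d):
    """Epoch seconds at noon UTC, so the day never shifts."""
    return int(datetime(y, m, d, 12, tzinfo=timezone.utc).timestamp())


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales_order (id INTEGER PRIMARY KEY, product_id INTEGER,
            organization_id INTEGER, sold_stock INTEGER, sold_date INTEGER);
        INSERT INTO product VALUES (1, 'Widget');
        INSERT INTO organization VALUES (1, 'Acme');
        """
    )
    conn.executemany(
        "INSERT INTO sales_order (product_id, organization_id, sold_stock, sold_date) "
        "VALUES (1, 1, ?, ?)",
        [(5, epoch(2017, 5, 1)), (2, epoch(2017, 5, 1)),
         (7, epoch(2017, 5, 3)), (9, epoch(2017, 6, 1))],  # June row is filtered out
    )
    return conn


def daily_pivot(conn, year_month, days=31):
    """One SUM(CASE ...) column per day of the month, generated in a loop."""
    day_expr = "CAST(strftime('%d', so.sold_date, 'unixepoch') AS INTEGER)"
    columns = ",\n       ".join(
        f"SUM(CASE WHEN {day_expr} = {d} THEN so.sold_stock ELSE 0 END)"
        f" AS day{d}_sold_stock"
        for d in range(1, days + 1)
    )
    sql = f"""
        SELECT o.name, p.name,
               {columns}
        FROM sales_order AS so
        JOIN organization AS o ON so.organization_id = o.id
        JOIN product AS p ON so.product_id = p.id
        WHERE strftime('%Y-%m', so.sold_date, 'unixepoch') = ?
        GROUP BY o.name, p.name
    """
    return conn.execute(sql, (year_month,)).fetchall()


rows = daily_pivot(demo_connection(), "2017-05")
print(rows[0][:6])
```

Only integers generated by range() are interpolated into the SQL text; the month itself is passed as a bound parameter.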

You have a few options here but it is important to understand the limitations first.
The big limitation is that the planner needs to know the shape of the returned record before the planning stage, so it has to be defined explicitly, not dynamically. There are various ways of getting around this. At the end of the day, you are probably going to have something like Bavesh's answer, but there are some tools that may help.
Secondly, you may want to aggregate by date in a simple query joining the three tables and then pivot.
For the second approach, you could:
1. Do a simple query, then pull the data into Excel or similar and create a pivot table there. This is probably the easiest solution.
2. Use the tablefunc extension to create the crosstab for you.
Then we get to the first problem: if you are always doing 30 days, it is easy, if tedious. But if you want every day of a month, the number of columns varies and you run into the row-length problem. What you can do here is build a dynamic query in a PL/pgSQL function and return a refcursor. In this case the actual planning takes place inside the function, so the planner doesn't need to worry about the row shape at the outer level. Then you call FETCH on the returned cursor.

MySQL to PostgreSQL: GROUP BY issues

So I decided to try out PostgreSQL instead of MySQL, but I am having some slight conversion problems. This is a query of mine that pulls data from four tables and returns it all in one result.
I am at a loss as to how to express this in PostgreSQL, and specifically in Django, but I am leaving Django for another question, so bonus points if you can Django-fy it, but no worries if you just write it in pure SQL.
SELECT links.id, links.created, links.url, links.title, user.username, category.title, SUM(votes.karma_delta) AS karma, SUM(IF(votes.user_id = 1, votes.karma_delta, 0)) AS user_vote
FROM links
LEFT OUTER JOIN `users` `user` ON (`links`.`user_id`=`user`.`id`)
LEFT OUTER JOIN `categories` `category` ON (`links`.`category_id`=`category`.`id`)
LEFT OUTER JOIN `votes` `votes` ON (`votes`.`link_id`=`links`.`id`)
WHERE (links.id = votes.link_id)
GROUP BY votes.link_id
ORDER BY (SUM(votes.karma_delta) - 1) / POW((TIMESTAMPDIFF(HOUR, links.created, NOW()) + 2), 1.5) DESC
LIMIT 20
The IF in the SELECT was where my first troubles began. It seems to be IF true/false THEN stuff ELSE other stuff END IF, yet I can't get the syntax right. I tried to use Navicat's SQL builder, but it constantly wanted me to place everything I had selected into the GROUP BY, and I think that is all kinds of wrong.
In summary, what I am looking for is to make this MySQL query work in PostgreSQL. Thank you.
Current Progress
Just want to thank everybody for their help. This is what I have so far:
SELECT links_link.id, links_link.created, links_link.url, links_link.title, links_category.title, SUM(links_vote.karma_delta) AS karma, SUM(CASE WHEN links_vote.user_id = 1 THEN links_vote.karma_delta ELSE 0 END) AS user_vote
FROM links_link
LEFT OUTER JOIN auth_user ON (links_link.user_id = auth_user.id)
LEFT OUTER JOIN links_category ON (links_link.category_id = links_category.id)
LEFT OUTER JOIN links_vote ON (links_vote.link_id = links_link.id)
WHERE (links_link.id = links_vote.link_id)
GROUP BY links_link.id, links_link.created, links_link.url, links_link.title, links_category.title
ORDER BY links_link.created DESC
LIMIT 20
I had to make some table name changes, and I am still working on my ORDER BY, so until then I'm just going to cop out and order by created. Thanks again!

Have a look at the GROUP BY documentation:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
You need to include in the GROUP BY every SELECT column that is not part of an aggregate function.

A few things:
Drop the backticks
Use a CASE expression instead of IF(): CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END
Change your TIMESTAMPDIFF to DATE_TRUNC('hour', now()) - DATE_TRUNC('hour', links.created). You will then need to convert the resulting interval into a number of hours (for example with EXTRACT(EPOCH FROM ...) / 3600); it would be much easier to compare timestamps.
Fix your GROUP BY and ORDER BY

Try replacing the IF with a CASE:
SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END)
You also have to explicitly name every column or calculated column you use in the GROUP BY clause.
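A runnable sketch of both fixes (CASE in place of MySQL's IF(), and every non-aggregated column listed in GROUP BY), using Python's sqlite3 module with invented, simplified `links` and `votes` tables. It uses a plain LEFT JOIN without the question's WHERE (links.id = votes.link_id), which would silently turn the outer join into an inner one.

```python
import sqlite3

VOTES_SQL = """
SELECT l.id,
       l.title,
       SUM(v.karma_delta) AS karma,
       -- MySQL's IF(cond, a, b) becomes a standard CASE expression
       SUM(CASE WHEN v.user_id = 1 THEN v.karma_delta ELSE 0 END) AS user_vote
FROM links AS l
LEFT JOIN votes AS v ON v.link_id = l.id
-- every non-aggregated SELECT column is listed here
GROUP BY l.id, l.title
ORDER BY l.id
"""


def demo_connection():
    """In-memory links/votes tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE links (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE votes (link_id INTEGER, user_id INTEGER, karma_delta INTEGER);
        INSERT INTO links VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
        INSERT INTO votes VALUES (1, 1, 1), (1, 2, 1), (2, 2, -1);
        """
    )
    return conn


def link_scores(conn):
    """Total karma per link plus the karma contributed by user 1."""
    return conn.execute(VOTES_SQL).fetchall()


for row in link_scores(demo_connection()):
    print(row)
```

Link 3 has no votes, so its karma is NULL (None) while user_vote is 0 because of the ELSE 0; use COALESCE(SUM(v.karma_delta), 0) if you want 0 there too.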
