Negative comparison with ActiveRecord - sql

I have two tables:
sales
period_id
customer_id
product_id
value
total_sales
period_id
customer_id
product_id
value
I want to select every combination of period_id and customer_id that exists on sales but don't exists on total_sales. I believe there's a short way to do this. But every approach I thought involves N+1 queries.
How can I do this?

In my opinion this is precisely the case where it's reasonable to diverge from the purist ActiveRecord approach and bring some SQL to the table. This should do the trick perfectly well:
Sale.find_by_sql("
SELECT * FROM sales WHERE NOT EXISTS (
SELECT 1 FROM total_sales
WHERE total_sales.period_id = sales.period_id AND
total_sales.customer_id = sales.customer_id)
")
I suppose it also can be done in a more ActiveRecordish style, but it will most certainly lose clarity of the code.
Of course if you ever drop this to you codebase, you really should wrap it in a method and leave in the model, so that SQL code at least doesn't bleed into the controller.
Hope this helps!
P.S. Oh, here's the SQL fiddle I used to test this out: http://sqlfiddle.com/#!15/eaf1d/1

How about you first get the list that matches, then use that to get the rows that you actually want:
unwanted_ids = Sale.joins('inner join total_sales on sales.period_id = total_sales.period_id and sales.customer_id = total_sales.customer_id').pluck('distinct sales.id')
wanted_rows = unwanted_ids.any? Sale.where('id not in (?)', unwanted_ids) : Sale.all
wanted_rows.select(:period_id, :customer_id) #if you only want these two
This way you only need two queries.
EDIT:
You can also do this:
Sale.joins('left outer join total_sales on sales.period_id = total_sales.period_id and sales.customer_id = total_sales.customer_id').where('total_sales.period_id is null')
Does it in one query

Related

Select only the row with the max value, but the column with this info is a SUM()

I have the following query:
SELECT DISTINCT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT
FROM TGFCAB CAB, TGFPAR PAR, TSIBAI BAI
WHERE CAB.CODPARC = PAR.CODPARC
AND PAR.CODBAI = BAI.CODBAI
AND CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI
Which the result is this. Company names and neighborhood hid for obvious reasons
The query at the moment, for those who don't understand Latin languages, is giving me clients, company name, company neighborhood, and the total value of movements.
in the WHERE clause it is only filtering sales movements of companies from an established city.
But if you notice in the Select statement, the column that is retuning the value that aggregates the total amount of value of sales is a SUM().
My goal is to return only the company that have the maximum value of this column, if its a tie, display both of em.
This is where i'm struggling, cause i can't seem to find a simple solution. I tried to use
WHERE AMOUNT = MAX(AMOUNT)
But as expected it didn't work
You tagged the question with the whole bunch of different databases; do you really use all of them?
Because, "PL/SQL" reads as "Oracle". If that's so, here's one option.
with temp as
-- this is your current query
(select columns,
sum(vrlnota) as amount
from ...
where ...
)
-- query that returns what you asked for
select *
from temp t
where t.amount = (select max(a.amount)
from temp a
);
You should be able to achieve the same without the need for a subquery using window over() function,
WITH T AS (
SELECT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT,
MAX(VLRNOTA) over() AS MAMOUNT
FROM TGFCAB CAB
JOIN TGFPAR PAR ON PAR.CODPARC = CAB.CODPARC
JOIN TSIBAI BAI ON BAI.CODBAI = PAR.CODBAI
WHERE CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY CAB.CODPARC, PAR.RAZAOSOCIAL, BAI.NOMEBAI
)
SELECT CODPARC, RAZAOSOCIAL, NOMEBAI, AMOUNT
FROM T
WHERE AMOUNT=MAMOUNT
Note it's usually (always) beneficial to join tables using clear explicit join syntax. This should be fine cross-platform between Oracle & SQL Server.

How to use SUM in this situation?

I have the following tables below and their schema:
INV
id, product code, name, ucost, tcost, desc, type, qoh
1,123,CPASS 700,1.00,5.00,CPASS 700 Lorem, COM,5
2,456,Shelf 5,2.00,6.00,Shelf 5 KJ, BR,3
GRP
id,type,desc
1,COM,COMPASS
2,BR,SHELF
Currently I have a query like this:
SELECT INV.*,GRP.DESCR AS CATEGORY
FROM INV LEFT JOIN GRP ON INV.TYPE = GRP.TYPE
WHERE INV.QOH = 0
There is no problems with that query.
Right now,I want to know the SUM of the TCOST of every INV record where their QOH is 0.
In this situation, does that I mean all I have to do is to write a separate query like the one below:
SELECT SUM(TCOST)
FROM INV
WHERE QOH = 0
Does it make any sense for me to try and combine those two queries as one ?
First understand that SUM is the aggregate function hence either you can run the Query like
(SELECT SUM(TCOST) FROM INV WHERE QOH=0) as total
This will return Sum of TCOST in INV Table for mentioned condition.
Another approach is finding the Sum based on the some column (e.g. Type)
you could write query like
SELECT Type , SUM(TCOST) FROM INV WHERE QOH=0 GROUP BY type ;
Its not clear on what criteria you want to sum . But I think above two approaches would provide you fare idea .
Mmm, you could maybe use a correlated query, though i'm not sure it's the best approach since I'm not sure I understand what your attempting to do:
SELECT INV.*,
GRP.DESCR AS CATEGORY ,
(SELECT SUM(TCOST) FROM INV WHERE QOH=0) as your_sum
FROM INV LEFT JOIN GRP ON INV.TYPE = GRP.TYPE
WHERE INV.QOH = 0
If you want only one value for the sum(), then your query is fine. If you want a new column with the sum, then use window functions:
SELECT INV.*, GRP.DESCR AS CATEGORY,
SUM(INV.TCOST) OVER () as sum_at_zero
FROM INV LEFT JOIN
GRP
ON INV.TYPE = GRP.TYPE
WHERE INV.QOH = 0;
It does not make sense to combine the queries by adding a row to the first one, because the columns are very different. A SQL result set requires that all rows have the same columns.

How can I access a selected column from my first select-statement in my third-level subslect?

I have a table "Bed" and a table "Component". Between those two I have a m:n relation and the table "BedComponent", where I store the Bed-ID and the Component-ID.
Every Component has a price. And now I want to write a select-statement that gives me the sum of prices for a certain bed.
This is what I have:
SELECT Bed.idBed, Bed.name, SUM(src.price) AS summe, Bed.idCustomer
FROM Bed,
(SELECT price
FROM dbo.Component AS C
WHERE (C.idComponent IN
(SELECT idComponent
FROM dbo.BedComponent AS BC
WHERE 1 = BC.idBed))) AS src
GROUP BY dbo.Bed.idBed, dbo.Bed.name, dbo.Bed.idCustomer;
This statement works. But of course I don't want to write the bed-ID hard coded into my select as it will always calculate the price for bed 1. Instead of the "1" i want to have the current bed-id.
I work with MS SQL Server
Thanks for your help.
I think you want:
select b.idBed, b.name, SUM(src.price) AS summe, b.idCustomer
from bed b join
bedcomponent bc
on b.idBed = bc.idBed join
component c
on c.idComponent = bc.idComponent
group by b.idBed, b.name, b.idCustomer;
The idCustomer looks strange to me in the select and group by, but I don't know what you are trying to achieve.
Also note the use of table aliases, which make the query easier to write and to read.

SUM(a*b) not working

I have a PHP page running in postgres. I have 3 tables - workorders, wo_parts and part2vendor. I am trying to multiply 2 table column row datas together, ie wo_parts has a field called qty and part2vendor has a field called cost. These 2 are joined by wo_parts.pn and part2vendor.pn. I have created a query like this:
$scoreCostQuery = "SELECT SUM(part2vendor.cost*wo_parts.qty) as total_score
FROM part2vendor
INNER JOIN wo_parts
ON (wo_parts.pn=part2vendor.pn)
WHERE workorder=$workorder";
But if I add the costs of the parts multiplied by the qauntities supplied, it adds to a different number than what the script is doing. Help....I am new to this but if someone can show me in SQL I can modify it for postgres. Thanks
Without seeing example data, there's no way for us to know why you're query totals are coming out differently that when you do the math by hand. It could be a bad join, so you are getting more/less records than you expected. It's also possible that your calculations are off. Pick an example with the smallest number of associated records & compare.
My suggestion is to add a GROUP BY to the query:
SELECT SUM(p.cost * wp.qty) as total_score
FROM part2vendor p
JOIN wo_parts wp ON wp.pn = p.pn
WHERE workorder = $workorder
GROUP BY workorder
FYI: MySQL was designed to allow flexibility in the GROUP BY, while no other db I've used does - it's a source of numerous questions on SO "why does this work in MySQL when it doesn't work on db x...".
To Check that your Quantities are correct:
SELECT wp.qty,
p.cost
FROM WO_PARTS wp
JOIN PART2VENDOR p ON p.pn = wp.pn
WHERE p.workorder = $workorder
Check that the numbers are correct for a given order.
You could try a sub-query instead.
(Note, I don't have a Postgres installation to test this on so consider this more like pseudo code than a working example... It does work in MySQL tho)
SELECT
SUM(p.`score`) AS 'total_score'
FROM part2vendor AS p2v
INNER JOIN (
SELECT pn, cost * qty AS `score`
FROM wo_parts
) AS p
ON p.pn = p2v.pn
WHERE p2n.workorder=$workorder"
In the question, you say the cost column is in part2vendor, but in the query you reference wo_parts.cost. If the wo_parts table has its own cost column, that's the source of the problem.

Why does added RAND() cause MySQL to overload?

OK I have this query which gives me DISTINCT product_series, plus all the other fields in the table:
SELECT pi.*
FROM (
SELECT DISTINCT product_series
FROM cart_product
) pd
JOIN cart_product pi
ON pi.product_id =
(
SELECT product_id
FROM cart_product po
WHERE product_brand = "everlon"
AND product_type = "'.$type.'"
AND product_available = "yes"
AND product_price_contact = "no"
AND product_series != ""
AND po.product_series = pd.product_series
ORDER BY product_price
LIMIT 1
) ORDER BY product_price
This works fine. I am also ordering by price so I can get the starting price for each series. Nice.
However today my boss told me that all the products thats are showing up from this query are of metal_type white gold And he wants to show random metal types. so I added RAND() to the order by after the ORDER BY price so that I will still get the lowest price, but a random metal in the lowest price.. here is the new query:
SELECT pi.*
FROM (
SELECT DISTINCT product_series
FROM cart_product
) pd
JOIN cart_product pi
ON pi.product_id =
(
SELECT product_id
FROM cart_product po
WHERE product_brand = "everlon"
AND product_type = "'.$type.'"
AND product_available = "yes"
AND product_price_contact = "no"
AND product_series != ""
AND po.product_series = pd.product_series
ORDER BY product_price, RAND()
LIMIT 1
) ORDER BY product_price, RAND()
When I run this query, MySQL completely shuts down and tells me that there are too many connections And I get a phone call from the host admin asking me what the hell I did.
I didn't believe that could be just from added RAND() to the query and I thought it had to be a coincidence. I waited a few hours after everything was fixed and ran the query again. Immediately... same issue.
So what is going on? Because I have no clue. Is there something wrong with my query?
Thanks!!!!
Using RAND() for ORDER BY is not a good idea, because it does not scale as the data increases. You can see more information on it, including two alternatives you can adapt, in my answer to this question.
Here's a blog post that explains the issue quite well, and workarounds:
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
And here's a similar warning against ORDER BY RAND() for MySQL, I think the cause is basically the same there:
http://www.webtrenches.com/post.cfm/avoid-rand-in-mysql
Depending on the number of products in your site, that function call is going to execute once per record, potentially slowing the query down.. considerably.
The Too Many Connections error is probably due to this query blocking others while it tries to compute those numbers.
Find another way. ;)
Instead, you can generate random numbers on the programming language you're using, instead of the MySQL side, as rand() is being called for each row
If you know how many records you have you can select a random record like this (this is Perl):
$handle->Sql("SELECT COUNT(0) AS nor FROM table");
$handle->FetchRow();
$nor = $handle->Data('nor');
$rand = int(rand()*$nor)+1;
$handle->Sql("SELECT * FROM table LIMIT $rand,1");
$handle->FetchRow();
.
.
.