How to extract MAX value with multiple conditions in SQL

I'm dumping an SQL sales/order table and running the following Excel array command to find the highest value, for a particular order:
{ =MAX(IF([ORDER]=[#[ORDER]];IF([PRODUCT]=[#PRODUCT];[QTY]))) }
This checks, across all rows belonging to the same order and the same product, what the highest QTY is. But because it is an array formula, it freezes Excel for many minutes.
Can I do something similar directly in SQL?

You can get the maximum value with the aggregate function MAX and apply the conditions in a WHERE clause, as below.
SELECT MAX(QTY)
FROM TABLE
WHERE [ORDER] = #ORDER AND [PRODUCT] = #PRODUCT
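And if you want the MAX QTY value for every ORDER and PRODUCT combination, use GROUP BY as below. Selecting the grouping columns as well shows which combination each maximum belongs to.
SELECT [ORDER], [PRODUCT], MAX(QTY) AS MAX_QTY
FROM TABLE
GROUP BY [ORDER], [PRODUCT]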
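As a rough, runnable sketch of the two queries above, the following uses Python's built-in sqlite3 module purely as a convenient way to execute the SQL. The table name `sales` and the sample rows are hypothetical, and SQLite quotes the reserved word with double quotes rather than brackets.

```python
import sqlite3

# In-memory database with a hypothetical sales table (names and rows are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sales ("order" INTEGER, product TEXT, qty INTEGER)')
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "A", 5), (1, "A", 9), (1, "B", 2), (2, "A", 7)],
)

# Highest qty for one specific order/product pair (the WHERE variant).
single_max = conn.execute(
    'SELECT MAX(qty) FROM sales WHERE "order" = ? AND product = ?',
    (1, "A"),
).fetchone()[0]

# Highest qty for every order/product pair (the GROUP BY variant).
group_max = conn.execute(
    'SELECT "order", product, MAX(qty) AS max_qty '
    'FROM sales GROUP BY "order", product '
    'ORDER BY "order", product'
).fetchall()

print(single_max)
print(group_max)
```

The GROUP BY form returns one row per order/product pair, which replaces the per-row Excel array formula with a single pass over the table.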

If you want the value per row, then you would use window functions:
select t.*, max(qty) over (partition by [order], product) as max_qty
from t;
Note: order is a very bad name for a column because it is a SQL keyword. If that is the real name, you need to escape it.
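To illustrate the per-row variant, here is a minimal sketch run through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later); the `sales` table and its rows are made up for the example.

```python
import sqlite3

# Hypothetical sales data; "order" is quoted because it is a reserved word.
win_conn = sqlite3.connect(":memory:")
win_conn.execute('CREATE TABLE sales ("order" INTEGER, product TEXT, qty INTEGER)')
win_conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "A", 5), (1, "A", 9), (1, "B", 2), (2, "A", 7)],
)

# Every original row is kept; max_qty repeats the group maximum on each row.
window_rows = win_conn.execute(
    'SELECT "order", product, qty, '
    'MAX(qty) OVER (PARTITION BY "order", product) AS max_qty '
    'FROM sales ORDER BY "order", product, qty'
).fetchall()

for row in window_rows:
    print(row)
```

Unlike GROUP BY, the window function does not collapse rows, so this matches the Excel layout where each row carries the maximum for its own order and product.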

Related

Bigquery - how to aggregate data based on conditions

I have a simple table like the following, which has product, price, cost and category. price and cost can be null.
This table is updated from time to time. Now I want a daily summary of the table content grouped by category, showing for each category how many products have no price, how many have a price, and how many have a price that is higher than the cost, so the result table would look like the following:
I think I can run a query every day by setting up a scheduled query in BigQuery, so that three rows of data are appended to the result table every day.
But the problem is, how can I get those three rows? I know I can use GROUP BY, but how do I get the counts with conditions such as not null, larger than, etc.?
You seem to want window functions:
select t.*,
countif(price is null) over (partition by date) as products_no_price,
countif(price <= cost) over (partition by date) as products_price_lower_than_cost
from t;
You can run this code on a table that has a date column. In fact, you don't need to store the last two columns.
If you want to insert the first table into the second, then there is no date and you can simply use:
select t.*,
countif(price is null) over () as products_no_price,
countif(price <= cost) over () as products_price_lower_than_cost
from t;
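Since the question asks for counts grouped by category, a GROUP BY with conditional aggregation may be closer to the desired summary. The sketch below runs in SQLite through Python's sqlite3 module, so it uses SUM(CASE ...) where BigQuery would use COUNTIF; the `products` table name and rows are assumptions.

```python
import sqlite3

# Hypothetical product table; price and cost may be NULL (None in Python).
cat_conn = sqlite3.connect(":memory:")
cat_conn.execute(
    "CREATE TABLE products (product TEXT, price REAL, cost REAL, category TEXT)"
)
cat_conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("p1", None, 3.0, "x"),
        ("p2", 10.0, 4.0, "x"),
        ("p3", 2.0, 5.0, "x"),
        ("p4", None, None, "y"),
        ("p5", 8.0, 1.0, "y"),
    ],
)

# One row per category with three conditional counts.
# A comparison involving NULL is not true, so it falls into the ELSE branch.
summary = cat_conn.execute(
    "SELECT category, "
    "SUM(CASE WHEN price IS NULL THEN 1 ELSE 0 END) AS no_price, "
    "SUM(CASE WHEN price IS NOT NULL THEN 1 ELSE 0 END) AS has_price, "
    "SUM(CASE WHEN price > cost THEN 1 ELSE 0 END) AS price_above_cost "
    "FROM products GROUP BY category ORDER BY category"
).fetchall()

print(summary)
```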

SQL to find best row in group based on multiple columns?

Let's say I have an Oracle table with measurements in different categories:
CREATE TABLE measurements (
category CHAR(8),
value NUMBER,
error NUMBER,
created DATE
)
Now I want to find the "best" row in each category, where "best" is defined like this:
It has the lowest error.
If there are multiple measurements with the same error, the one that was created most recently is considered to be the best.
This is a variation of the greatest N per group problem, but including two columns instead of one. How can I express this in SQL?
Use ROW_NUMBER:
WITH cte AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY category ORDER BY error, created DESC) rn
FROM measurements m
)
SELECT category, value, error, created
FROM cte
WHERE rn = 1;
For a brief explanation, the PARTITION BY clause instructs the DB to generate a separate row number for each group of records in the same category. The ORDER BY clause places those records with the smallest error first. Should two or more records in the same category be tied with the lowest error, then the next sorting level would place the record with the most recent creation date first.
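The ROW_NUMBER approach can be tried on a small data set. This sketch uses SQLite (3.25 or later) via Python's sqlite3 module instead of Oracle, with dates stored as ISO strings; the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical measurements; category "a" has two rows tied on the lowest error.
m_conn = sqlite3.connect(":memory:")
m_conn.execute(
    "CREATE TABLE measurements (category TEXT, value REAL, error REAL, created TEXT)"
)
m_conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        ("a", 1.0, 0.5, "2020-01-01"),
        ("a", 2.0, 0.1, "2020-01-02"),
        ("a", 3.0, 0.1, "2020-01-05"),
        ("b", 4.0, 0.3, "2020-01-03"),
    ],
)

# Rank rows inside each category: lowest error first, newest first on ties.
best_rows = m_conn.execute(
    "WITH cte AS ("
    "  SELECT m.*, ROW_NUMBER() OVER ("
    "    PARTITION BY category ORDER BY error, created DESC) AS rn"
    "  FROM measurements m"
    ") "
    "SELECT category, value, error, created FROM cte "
    "WHERE rn = 1 ORDER BY category"
).fetchall()

print(best_rows)
```

For category "a" the two rows with error 0.1 tie, and the later creation date wins.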

What's the difference between these two SQL queries?

Question: Select the item and per unit price for each item in the items_ordered table. Hint: Divide the price by the quantity.
1.
select item, sum(price)/sum(quantity)
from items_ordered
group by item;
2.
select item, price/quantity
from items_ordered
group by item;
Have a look at the results for flashlights. The first query shows the average price correctly, but the second one only takes 28/4 and shows 7, ignoring the 4.5 a few rows down. Could someone please explain why this is the case?
The table data comes from an external website.
SUM() is an aggregate (group) function, so that essentially says: go get me all the prices and quantities by item, and add them all up to return them in one row.
MySQL is quite forgiving when grouping things and will try to retrieve a rowset (which is why your second example returns something - albeit wrong).
Generally, if you are GROUPing by a column (item in your example), you get one row back per distinct value of that column (one row per item).
Try running the SQL below to see what that looks like.
SELECT item
, SUM(price) AS sum_price
, SUM(quantity) AS sum_quantity
, COUNT(*) AS item_count
, SUM(price) / SUM(quantity) AS avg_price_per_quant
FROM items_ordered
GROUP BY item
ORDER BY item ASC
The first query returns the average unit price for each item; the second returns price/quantity from whichever single row MySQL happens to pick for each item (often the first one it encounters). The second query only runs in MySQL; it would raise an error in SQL Server because price and quantity are neither grouped nor wrapped in an aggregate function. For more details, see the post "Why does MySQL allow "group by" queries WITHOUT aggregate functions?".
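A small sketch of the difference, run in SQLite through Python's sqlite3 module. The rows are hypothetical: two flashlight orders (28 for 4 units, 4.5 for 1 unit, an assumed reading of the question's figures) plus one unrelated item.

```python
import sqlite3

# Hypothetical order rows; the real table in the question is not shown.
i_conn = sqlite3.connect(":memory:")
i_conn.execute(
    "CREATE TABLE items_ordered (item TEXT, price REAL, quantity INTEGER)"
)
i_conn.executemany(
    "INSERT INTO items_ordered VALUES (?, ?, ?)",
    [("flashlight", 28.0, 4), ("flashlight", 4.5, 1), ("tent", 80.0, 2)],
)

# Aggregated: total price divided by total quantity, one row per item.
avg_unit_price = i_conn.execute(
    "SELECT item, SUM(price) / SUM(quantity) "
    "FROM items_ordered GROUP BY item ORDER BY item"
).fetchall()

# Not aggregated: one unit price per order row, so flashlight appears twice.
per_row_unit_price = i_conn.execute(
    "SELECT item, price / quantity AS unit_price "
    "FROM items_ordered ORDER BY item, unit_price"
).fetchall()

print(avg_unit_price)
print(per_row_unit_price)
```

Grouping by item without an aggregate would have to pick just one of the two flashlight rows, which is why the second query in the question shows only 7.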

Distinct with order by clause

I want to get the distinct Category values and order the result by the curdate column.
select distinct(Category)'Category' from sizes order by curdate desc
But this simple query is generating errors.
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
I'm afraid you have the same constraint for SELECT DISTINCT as for GROUP BY clauses: namely, you cannot make use of a field that's not declared in the fields list, because it simply doesn't know which curdate to use when sorting in case there are several rows with different curdate values for the same Category.
EDIT: try something like:
SELECT Category FROM sizes GROUP BY Category ORDER BY MAX(curdate) DESC
Replace MAX with MIN or whatever suits you.
EDIT2: In this case, MAX(curdate) doesn't even have to be present in the field list since it's used in an aggregate function.
with cte as
(
select
Category,
[CurDate],
row_number() over(partition by Category order by [CurDate]) as rn
from sizes
)
select
Category
from cte
where rn = 1
order by [CurDate]
You look to be after a list of all the categories, with a date associated with each one. Whether you want the earliest first or latest first, you should be able to do one of the following:
SELECT Category, MAX(curdate) FROM sizes GROUP BY Category
Or:
SELECT Category, MIN(curdate) FROM sizes GROUP BY Category
Depending on whether you want the most recent or earliest dates associated with each category. If you need the list to then be ORDERed by the dates, add one of the following onto the end:
ORDER BY MAX(curdate)
ORDER BY MIN(curdate)
Curdate must be in your select statement as well; right now you are only specifying Category.
It is exactly what the error says: it cannot order a distinct list if the sort field is not part of the select. The reason is that there may be multiple sort values for each of the distinct values selected. Suppose the data looks like this:
Category CurDate
AAA 1/1/2011
BBB 2/1/2011
AAA 3/1/2011
Should AAA be before or after BBB in the distinct list? If you just ordered by the date without the distinct, you would get it in both positions. Since SQL doesn't know which date should be associated with the distinct category, it will not let you sort by the date.
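Using the three sample rows above, the ambiguity disappears once an aggregate picks one date per category. A minimal sketch in SQLite via Python's sqlite3 module; the dates are rewritten as ISO strings (an assumption about the original format) so that they sort correctly as text.

```python
import sqlite3

# The three sample rows, with dates as ISO text (assumed to be month/day/year).
s_conn = sqlite3.connect(":memory:")
s_conn.execute("CREATE TABLE sizes (Category TEXT, curdate TEXT)")
s_conn.executemany(
    "INSERT INTO sizes VALUES (?, ?)",
    [("AAA", "2011-01-01"), ("BBB", "2011-02-01"), ("AAA", "2011-03-01")],
)

# Sort categories by their most recent date, newest first.
by_latest = [
    r[0]
    for r in s_conn.execute(
        "SELECT Category FROM sizes GROUP BY Category ORDER BY MAX(curdate) DESC"
    )
]

# Sort categories by their earliest date, newest first.
by_earliest = [
    r[0]
    for r in s_conn.execute(
        "SELECT Category FROM sizes GROUP BY Category ORDER BY MIN(curdate) DESC"
    )
]

print(by_latest)
print(by_earliest)
```

The two orderings differ, which is exactly the choice the database cannot make on its own with SELECT DISTINCT.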
As the error message says, you can't order by a column that isn't selected when using SELECT DISTINCT (same problem as with GROUP BY...). Change your query to this:
SELECT DISTINCT category, curdate FROM sizes ORDER BY curdate DESC
EDIT: replying to your comment:
If you want to select the distinct category with the last date for every category, you'll have to change your query a bit. I can think of two possibilities for this: using MAX() like Costi Ciudatu posted, or doing some crazy stuff with subselects. The first one would be the better approach.

MySQL Single Row Returned From Temporary Table

I am running the following queries against a database:
CREATE TEMPORARY TABLE med_error_third_party_tmp
SELECT `med_error_category`.description AS category, `med_error_third_party_category`.error_count AS error_count
FROM
`med_error_category` INNER JOIN `med_error_third_party_category` ON med_error_category.`id` = `med_error_third_party_category`.`category`
WHERE
year = 2003
GROUP BY `med_error_category`.id;
The only problem is that when I create the temporary table and do a select * on it, it returns multiple rows, but the query below returns only one row. It seems to always return a single row unless I specify a GROUP BY, but then it returns a percentage of 1.0, as it should with a GROUP BY.
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
Here are the server specs:
Server version: 5.0.77
Protocol version: 10
Server: Localhost via UNIX socket
Does anybody see what is causing this?
Standard SQL requires you to specify a GROUP BY clause if any selected column is not wrapped in an aggregate function (e.g. MIN, MAX, COUNT, SUM, AVG, etc.) while another one is, but MySQL supports "hidden columns in the GROUP BY" -- which is why:
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
...runs without error. The problem with the functionality is that because there's no GROUP BY, the SUM is the SUM of the error_count column for the entire table. But the other column values are completely arbitrary - they can't be relied upon.
This:
SELECT category,
error_count/(SELECT SUM(error_count)
FROM med_error_third_party_tmp) AS percentage
FROM med_error_third_party_tmp;
...will give you a percentage on a per row basis -- category values will be duplicated because there's no grouping.
This:
SELECT category,
SUM(error_count)/x.total AS percentage
FROM med_error_third_party_tmp
JOIN (SELECT SUM(error_count) AS total
FROM med_error_third_party_tmp) x
GROUP BY category
...will give you a percentage per category: the sum of that category's error_count values divided by the sum of the error_count values for the entire table.
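The per-category percentage can be checked on a toy table. This sketch runs in SQLite through Python's sqlite3 module with invented rows; note that SQLite, unlike MySQL, does integer division on integers, so the example multiplies by 1.0 first.

```python
import sqlite3

# Hypothetical contents for the temporary table from the question.
p_conn = sqlite3.connect(":memory:")
p_conn.execute(
    "CREATE TABLE med_error_third_party_tmp (category TEXT, error_count INTEGER)"
)
p_conn.executemany(
    "INSERT INTO med_error_third_party_tmp VALUES (?, ?)",
    [("dose", 1), ("dose", 2), ("label", 7)],
)

# Join every row to the single-row grand total, then group by category.
# "* 1.0" forces floating-point division in SQLite (not needed in MySQL).
percentages = p_conn.execute(
    "SELECT category, SUM(error_count) * 1.0 / x.total AS percentage "
    "FROM med_error_third_party_tmp "
    "JOIN (SELECT SUM(error_count) AS total "
    "      FROM med_error_third_party_tmp) x "
    "GROUP BY category ORDER BY category"
).fetchall()

print(percentages)
```

The percentages across all categories add up to 1, which is a quick sanity check for this kind of query.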
Another way to do it, without the temp table as a separate item:
select category, error_count/sum(error_count) "Percentage"
from (SELECT mec.description category
, metpc.error_count
FROM med_error_category mec
, med_error_third_party_category metpc
WHERE mec.id = metpc.category
AND year = 2003
GROUP BY mec.id
) t; -- MySQL requires an alias for every derived table
I think you will notice that the percentage is unchanging over the categories. This is probably not what you want; you probably want to group the errors by category as well.