Adding other columns into a join with a group by - sql

In Oracle 11g Express, I have the following query:
select t1.product_name, SUM(t1.product_cost_per_month)
FROM table1 t1 INNER JOIN table2 t2 on (t1.product_name = t2.product_name)
WHERE t2.date > sysdate
GROUP BY t1.product_name
This works: it returns the summed monthly cost of each product, grouped by product, after a certain date (I use sysdate here only as an example).
However, I would like to display some additional description of each product, e.g. the vendor. So I use this code:
select t1.product_name, SUM(t1.product_cost_per_month), t2.vendor
FROM table1 t1 INNER JOIN table2 t2 ON (t1.product_name = t2.product_name)
WHERE t2.date > sysdate
GROUP BY t1.product_name
This doesn't work because every selected column must either appear in the GROUP BY clause or have an aggregate function applied to it, and an aggregate function for something like "vendor" seems meaningless. So is there a way to do this?
I will probably write a short PL/SQL routine to solve it, but I am wondering whether there is a purely SQL way to do this.

Vendor should also be included in the GROUP BY clause.
GROUP BY t1.product_name, t2.vendor
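To see the effect of adding vendor to the GROUP BY, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The sample rows are invented, the question's date column is renamed valid_until (a hypothetical name), and SQLite's date('now') stands in for sysdate.

```python
import sqlite3

# Hypothetical sample data; table names follow the question.
# "valid_until" is an assumed stand-in for the question's "date" column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (product_name TEXT, product_cost_per_month REAL);
CREATE TABLE table2 (product_name TEXT, vendor TEXT, valid_until TEXT);
INSERT INTO table1 VALUES ('widget', 10), ('widget', 15), ('gadget', 7);
INSERT INTO table2 VALUES ('widget', 'Acme', '2999-01-01'),
                          ('gadget', 'Globex', '2999-01-01');
""")

# vendor is listed in GROUP BY, so it can be selected without an aggregate.
rows = conn.execute("""
SELECT t1.product_name, SUM(t1.product_cost_per_month), t2.vendor
FROM table1 t1
INNER JOIN table2 t2 ON t1.product_name = t2.product_name
WHERE t2.valid_until > date('now')
GROUP BY t1.product_name, t2.vendor
ORDER BY t1.product_name
""").fetchall()

for row in rows:
    print(row)
```

As long as each product has exactly one vendor, adding vendor to the GROUP BY does not change the number of groups; a product with several vendors would be split into one row per vendor.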
Another technique to achieve what you're doing would be a nested query:
SELECT t1.product_name,
(
select sum(product_cost_per_month)
from table2 t2
where
t1.product_name = t2.product_name
and t2.date > sysdate
) as total_product_cost,
t1.another_field,
t1.another_field2,
t1.another_field3
FROM table1 t1
(Apologies for any errors, I didn't test this but this should give you the gist of it)


How to join two SQL tables by extracting maximum numbers from one then into another?

As others have commented, I'm now going to add some code:
Imported tables
table3
Case No. is the primary key. Each row reports one patient on a report date. Depending on whether the patient is imported or local, the corresponding cumulative column increases. On some days there are no cases, so a date such as 25/01/2020 is skipped.
table2
Report date has no duplicate.
Now, I want to join the tables. Example outcome here:
The maximum cumulative of each date is joined into the new table. So although 26/01/2020 of table3 shows the increase from 6, 7, to 8, I only want the highest cumulative number there.
Thanks for letting me know how my previous query could be improved. Your opinion helps me a lot.
I have tried Gordon Linoff's by substituting the actual names (which I initially omitted because I thought they were ambiguous).
His code is as follows (I've upvoted):
SELECT t3.`Report date`,
max(max(t3.cumulative_local)) over (order by t3.`Report date`),
max(max(t3.cumulative_import)) over (order by t3.`Report date`)
from table3 t3 left join
table2 t2
using (`Report date`)
group by t2.`Report date`;
But I got an error
Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'new.t3.Report date' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Anyways I am now experimenting. Both answers helped. If you know how to fix 1055, let me know, or if you could propose another solution. Thanks
I think you just want aggregation and window functions:
select t1.date,
max(max(cumulativea)) over (order by t1.date),
max(max(cumulativeb)) over (order by t1.date)
from table1 t1 left join
table2 t2
on t1.date = t2.date
group by t1.date;
This returns the maximum values of the two columns up to each date, which is, I think, what you are trying to describe.
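A minimal sketch of that idea, run on SQLite through Python's sqlite3 module instead of MySQL. The calendar and cases tables and their rows are hypothetical stand-ins for the date table and the case table. The aggregation is done in a CTE first and the window function is applied afterwards, which should be equivalent to the nested max(max(...)) form. Note that the query selects and groups by the same column, which is what the 1055 message quoted in the question complains about. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical stand-ins: "calendar" has one row per date (no duplicates),
# "cases" has one row per patient with a cumulative counter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (report_date TEXT PRIMARY KEY);
CREATE TABLE cases (report_date TEXT, cumulative_local INTEGER);
INSERT INTO calendar VALUES
  ('2020-01-24'), ('2020-01-25'), ('2020-01-26'), ('2020-01-27');
INSERT INTO cases VALUES
  ('2020-01-24', 5), ('2020-01-26', 6), ('2020-01-26', 7),
  ('2020-01-26', 8), ('2020-01-27', 8);
""")

running = conn.execute("""
WITH daily AS (
  -- Step 1: highest cumulative value reported on each calendar date
  -- (NULL for dates without any case).
  SELECT c.report_date, MAX(k.cumulative_local) AS day_max
  FROM calendar c
  LEFT JOIN cases k ON k.report_date = c.report_date
  GROUP BY c.report_date
)
-- Step 2: running maximum up to each date, which carries the last
-- known value forward over dates that had no cases.
SELECT report_date,
       MAX(day_max) OVER (ORDER BY report_date) AS cumulative_local
FROM daily
ORDER BY report_date
""").fetchall()

for row in running:
    print(row)
```

In this sample, 2020-01-25 has no cases, so it inherits the value 5 from the previous day, and 2020-01-26 shows only its highest value, 8.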
I don't understand why you have cumulA and cumulB in table1. I suppose they are meant to store the maximum cumulA and cumulB for each day.
You must first self-join table2 to find the row holding the maximum for each date (with a GROUP BY on the date):
SELECT t2.id, t2.date, td.cA
FROM t2
JOIN (
SELECT date AS d2, MAX(cumulA) AS cA
FROM t2
GROUP BY date
) AS td
ON t2.date = td.d2
AND t2.cumulA = td.cA
ORDER BY t2.date
Then you left join table1 to the result of that self-join, so that every day appears:
SELECT * FROM `t1` LEFT JOIN t2 ON t1.date = t2.date ORDER BY t1.date
Here are the two joins combined:
SELECT * FROM `t1` LEFT JOIN (
SELECT t2.id, t2.date, td.cA
FROM t2
JOIN (
SELECT date AS d2, MAX(cumulA) AS cA
FROM t2
GROUP BY date
) AS td
ON t2.date = td.d2
AND t2.cumulA = td.cA
) AS tt
ON t1.date = tt.date ORDER BY t1.date
Do the same for cumulB.
Finally (I suppose), you insert the result into table1.
I hope this answers your question.
Good luck.
_Teddy_

Tuning SQL query : subquery with aggregate function on the same table

The following query takes approximately 30 seconds to give results.
table1 contains ~20m rows
table2 contains ~10000 rows
I'm trying to find a way to improve performance. Any ideas?
declare @PreviousMonthDate datetime
select @PreviousMonthDate = (SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19000101', GETDATE()) - 1, '19000101') as [PreviousMonthDate])
select
distinct(t1.code), t1.ent, t3.lib, t3.typ from table1 t1, table2 t3
where (select min(t2.dat) from table1 t2 where t2.code=t1.code) >@PreviousMonthDate
and t1.ent in ('XXX')
and t1.code=t3.cod
and t1.dat>@PreviousMonthDate
Thanks
This is your query, more sensibly written:
select t1.code, t1.ent, t2.lib, t2.typ
from table1 t1 join
table2 t2
on t1.code = t2.cod
where not exists (select 1
from table1 tt1
where tt1.code = t1.code and
tt1.dat <= @PreviousMonthDate
) and
t1.ent = 'XXX' and
t1.dat > @PreviousMonthDate;
For this query, you want the following indexes:
table1(ent, dat, code) -- for the where
table1(code, dat) -- for the subquery
table2(cod, lib, typ) -- for the join
Notes:
Table aliases should make sense. t3 for table2 is cognitively dissonant, even though I know these are made up names.
not exists (especially with the right indexes) should be faster than the aggregation subquery.
The indexes will satisfy the where clause, reducing the data needed for filtering.
select distinct is a single construct: distinct is a keyword that applies to the whole select list, not a function, so the parentheses do nothing.
Never use comma in the FROM clause. Always use proper, explicit, standard JOIN syntax.
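The claim that NOT EXISTS can replace the MIN subquery can be checked on a toy data set. This is only a sketch on SQLite via Python's sqlite3 module, not SQL Server: the rows are invented, a fixed cutoff string stands in for the PreviousMonthDate variable, and DISTINCT is kept in both queries so the results are directly comparable. It says nothing about timings on a 20m-row table.

```python
import sqlite3

# Invented sample rows; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (code TEXT, ent TEXT, dat TEXT);
CREATE TABLE table2 (cod TEXT, lib TEXT, typ TEXT);
INSERT INTO table1 VALUES
  ('A', 'XXX', '2024-01-10'),  -- A has a row before the cutoff: excluded
  ('A', 'XXX', '2024-03-05'),
  ('B', 'XXX', '2024-03-02'),  -- B only has rows after the cutoff: kept
  ('B', 'XXX', '2024-03-09'),
  ('C', 'YYY', '2024-03-04');  -- different ent: excluded
INSERT INTO table2 VALUES
  ('A', 'lib A', 't1'), ('B', 'lib B', 't2'), ('C', 'lib C', 't3');
""")
cutoff = "2024-02-01"  # stands in for the PreviousMonthDate variable

# Original shape: the earliest date per code must be after the cutoff.
with_min = conn.execute("""
SELECT DISTINCT t1.code, t1.ent, t2.lib, t2.typ
FROM table1 t1 JOIN table2 t2 ON t1.code = t2.cod
WHERE (SELECT MIN(tt1.dat) FROM table1 tt1 WHERE tt1.code = t1.code) > ?
  AND t1.ent = 'XXX' AND t1.dat > ?
ORDER BY t1.code
""", (cutoff, cutoff)).fetchall()

# Rewritten shape: no row for that code exists on or before the cutoff.
with_not_exists = conn.execute("""
SELECT DISTINCT t1.code, t1.ent, t2.lib, t2.typ
FROM table1 t1 JOIN table2 t2 ON t1.code = t2.cod
WHERE NOT EXISTS (SELECT 1 FROM table1 tt1
                  WHERE tt1.code = t1.code AND tt1.dat <= ?)
  AND t1.ent = 'XXX' AND t1.dat > ?
ORDER BY t1.code
""", (cutoff, cutoff)).fetchall()

print(with_min)
print(with_not_exists)
```

Both queries return the same single row for code B here. Without DISTINCT, B would appear once per matching table1 row, so whether to keep it depends on what the result is used for.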

Sum Column in Joined Table and add as column SQL

So say I have two tables in Oracle SQL (not the actual data, but simple enough to illustrate my question).
Table1 that contains only Order_id and Order_quantity and Table2 that contains only Order_id and Order_price
Then I join them as follows
Select T1.Order_id,
T1.Order_quantity,
T2.Order_price,
T1.Order_quantity*T2.Order_price As "Order_amount",
Sum(Order_amount) As "Total_Sales"
from Table1 T1
inner join Table2 T2
on T1.Order_id = T2.Order_id
So essentially I want to have two extra columns: one that is the product of columns from the two tables, and another that is the sum of that column over my joined table (so every row will show the same value). However, since the usual form is
SUM(variable_name) From Table_Name
can I assign a name to my joined table and then refer to it? I tried the following, but I'm getting a "SQL command not properly ended" error:
Select T1.Order_id,
T1.Order_quantity,
T2.Order_price,
T1.Order_quantity*T2.Order_price As "Order_amount",
Sum(Order_amount) from New_Table As "Total_Sales"
from (Table1 T1
inner join Table2 T2
on T1.Order_id = T2.Order_id) As New_Table
Thanks for any assistance, apologies as I have a pretty naive understanding of SQL at present
I think you just want a window function:
select T1.Order_id, T1.Order_quantity, T2.Order_price,
T1.Order_quantity*T2.Order_price As order_amount,
sum(T1.Order_quantity*T2.Order_price) over () As Total_Sales
from Table1 T1 inner join
Table2 T2
on T1.Order_id = T2.Order_id
You cannot re-use the alias order_amount in the select. You need to repeat the expression -- or use a subquery or CTE to define it.
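The CTE option mentioned above could look roughly like this. It is a sketch run on SQLite through Python's sqlite3 module rather than Oracle, with invented rows; the CTE name priced is made up for the example. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Invented sample rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Order_id INTEGER, Order_quantity INTEGER);
CREATE TABLE Table2 (Order_id INTEGER, Order_price INTEGER);
INSERT INTO Table1 VALUES (1, 2), (2, 3);
INSERT INTO Table2 VALUES (1, 10), (2, 5);
""")

rows = conn.execute("""
WITH priced AS (
  -- Define order_amount once so the outer query can refer to it by name.
  SELECT T1.Order_id, T1.Order_quantity, T2.Order_price,
         T1.Order_quantity * T2.Order_price AS order_amount
  FROM Table1 T1
  INNER JOIN Table2 T2 ON T1.Order_id = T2.Order_id
)
SELECT Order_id, Order_quantity, Order_price, order_amount,
       SUM(order_amount) OVER () AS total_sales  -- same total on every row
FROM priced
ORDER BY Order_id
""").fetchall()

for row in rows:
    print(row)
```

With these rows the amounts are 20 and 15, and both rows carry the total 35.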
If your DBMS doesn't support window functions, you can use subqueries instead. The total must not be correlated with the outer order, otherwise it would only sum that single order:
select t1.Order_id, t1.Order_quantity,
(select t1.Order_quantity * t2.Order_price
from table2 t2
where t2.Order_id = t1.Order_id) as Order_amount,
(select sum(a.Order_quantity * b.Order_price)
from table1 a
inner join table2 b
on b.Order_id = a.Order_id) as Total_Sales
from table1 t1;

SQL query for Top 5 of every category

I have a table that has three columns: Category, Timestamp and Value.
What I want is a SQL select that will give me the 5 most recent values of each category. How would I go about and do that?
I tried this:
select
a."Category",
b."Timestamp",
b."Value"
from
(select "Category" from "Table" group by "Category" order by "Category") a,
(select a."Category", c."Timestamp", c."Value" from "Table" c
where c."Category" = a."Category" limit 5) b
Unfortunately, it won't allow it because "subquery in FROM cannot refer to other relations of same query level".
I'm using PostgreSQL 8.3, by the way.
Any help will be appreciated.
SELECT t1.category, t1.timestamp, t1.value, COUNT(*) as latest
FROM foo t1
JOIN foo t2 ON t1.category = t2.category AND t1.timestamp <= t2.timestamp
GROUP BY t1.category, t1.timestamp, t1.value
HAVING COUNT(*) <= 5;
Note: Try this out and see if it performs suitably for your needs. It will not scale well for large groups.
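To make the counting trick concrete, here is a small sketch on SQLite via Python's sqlite3 module (not PostgreSQL 8.3). The rows are invented, the timestamp column is shortened to ts, and the limit is 2 instead of 5 to keep the data small. Each row is joined to every row of the same category that is at least as recent, so COUNT(*) is the row's rank from the newest.

```python
import sqlite3

# Invented sample rows; "ts" is an assumed short name for the timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (category TEXT, ts INTEGER, value INTEGER);
INSERT INTO foo VALUES
  ('a', 1, 10), ('a', 2, 20), ('a', 3, 30),
  ('b', 1, 100), ('b', 2, 200);
""")
TOP_N = 2  # the question asks for 5; 2 keeps the example short

rows = conn.execute("""
SELECT t1.category, t1.ts, t1.value, COUNT(*) AS latest
FROM foo t1
JOIN foo t2 ON t1.category = t2.category AND t1.ts <= t2.ts
GROUP BY t1.category, t1.ts, t1.value
HAVING COUNT(*) <= ?
ORDER BY t1.category, t1.ts DESC
""", (TOP_N,)).fetchall()

for row in rows:
    print(row)
```

The oldest row of category a (ts = 1) has three rows at least as recent as itself, so it is filtered out. Ties on the timestamp within a category would give tied rows the same count, which can return more or fewer than N rows.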

How to convert a SQL subquery to a join

I have two tables with a 1:n relationship: "content" and "versioned-content-data" (for example, an article entity and all the versions created of that article). I would like to create a view that displays the top version of each "content".
Currently I use this query (with a simple subquery):
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1
WHERE (version = (SELECT MAX(version) AS topversion
FROM mytable
WHERE (fk_idothertable = t1.fk_idothertable)))
The subquery is actually a query to the same table that extracts the highest version of a specific item. Notice that the versioned items will have the same fk_idothertable.
In SQL Server I tried to create an indexed view of this query but it seems I'm not able since subqueries are not allowed in indexed views. So... here's my question... Can you think of a way to convert this query to some sort of query with JOINs?
It seems like indexed views cannot contain:
subqueries
common table expressions
derived tables
HAVING clauses
I'm desperate. Any other ideas are welcome :-)
Thanks a lot!
This probably won't help if table is already in production but the right way to model this is to make version = 0 the permanent version and always increment the version of OLDER material. So when you insert a new version you would say:
UPDATE thetable SET version = version + 1 WHERE id = :id
INSERT INTO thetable (id, version, title, ...) VALUES (:id, 0, :title, ...)
Then this query would just be
SELECT id, title, ... FROM thetable WHERE version = 0
No subqueries, no MAX aggregation. You always know what the current version is. You never have to select max(version) in order to insert the new record.
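A rough sketch of this scheme using Python's sqlite3 module. The helper add_version is a hypothetical name, and the table is cut down to three columns. The update and the insert run in one transaction so that a reader never sees two rows with version 0 for the same id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (id INTEGER, version INTEGER, title TEXT)")


def add_version(conn, item_id, title):
    """Hypothetical helper: age the existing versions, then insert the new one as 0."""
    with conn:  # commits both statements together, rolls back on error
        conn.execute(
            "UPDATE thetable SET version = version + 1 WHERE id = ?",
            (item_id,),
        )
        conn.execute(
            "INSERT INTO thetable (id, version, title) VALUES (?, 0, ?)",
            (item_id, title),
        )


add_version(conn, 1, "first draft")
add_version(conn, 1, "second draft")
add_version(conn, 2, "other article")

# The current version of every item is simply version = 0.
current = conn.execute(
    "SELECT id, title FROM thetable WHERE version = 0 ORDER BY id"
).fetchall()
# Older versions of item 1 have been pushed to higher numbers.
history = conn.execute(
    "SELECT version, title FROM thetable WHERE id = 1 ORDER BY version"
).fetchall()
print(current)
print(history)
```

The trade-off is that every insert rewrites all older rows of that item, so this fits data with few versions per item better than data with long histories.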
Maybe something like this?
SELECT
t2.id,
t2.title,
t2.contenttext,
t2.fk_idothertable,
t2.version
FROM mytable t1, mytable t2
WHERE t1.fk_idothertable = t2.fk_idothertable
GROUP BY t2.id, t2.title, t2.contenttext, t2.fk_idothertable, t2.version
HAVING t2.version=MAX(t1.version)
Just a wild guess...
You might be able to turn the MAX into a derived table (with its own alias) that does the GROUP BY.
It might look something like this:
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1 JOIN
(SELECT fk_idothertable, MAX(version) AS topversion
FROM mytable
GROUP BY fk_idothertable) as t2
ON t1.fk_idothertable = t2.fk_idothertable AND t1.version = t2.topversion
I think FerranB was close but didn't quite have the grouping right:
with
latest_versions as (
select
max(version) as latest_version,
fk_idothertable
from
mytable
group by
fk_idothertable
)
select
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
from
mytable as t1
join latest_versions on (t1.version = latest_versions.latest_version
and t1.fk_idothertable = latest_versions.fk_idothertable);
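The join condition is the part that is easy to get wrong, so here is a small check of it on SQLite through Python's sqlite3 module, with invented rows. It only tests the query logic; it does not address whether SQL Server will accept the query in an indexed view. The second query joins on the version alone, to show why the fk_idothertable condition is needed as well.

```python
import sqlite3

# Invented sample rows; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT, contenttext TEXT,
                      fk_idothertable INTEGER, version INTEGER);
INSERT INTO mytable VALUES
  (1, 'Intro v1', 'old',  100, 1),
  (2, 'Intro v2', 'new',  100, 2),
  (3, 'Guide v1', 'only', 200, 1);
""")

# Join on both the item key and its highest version.
top_versions = conn.execute("""
SELECT t1.id, t1.title, t1.contenttext, t1.fk_idothertable, t1.version
FROM mytable AS t1
JOIN (SELECT fk_idothertable, MAX(version) AS topversion
      FROM mytable
      GROUP BY fk_idothertable) AS t2
  ON t1.fk_idothertable = t2.fk_idothertable
 AND t1.version = t2.topversion
ORDER BY t1.id
""").fetchall()

# Joining on the version alone also matches row 1, because version 1
# happens to be the top version of a different item (200).
version_only = conn.execute("""
SELECT t1.id
FROM mytable AS t1
JOIN (SELECT fk_idothertable, MAX(version) AS topversion
      FROM mytable
      GROUP BY fk_idothertable) AS t2
  ON t1.version = t2.topversion
ORDER BY t1.id
""").fetchall()

print(top_versions)
print(version_only)
```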
If SQL Server accepts LIMIT clause, I think the following should work:
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1 ORDER BY t1.version DESC LIMIT 1;
(DESC sorts in descending order; LIMIT 1 keeps only the first row, and a
DBMS usually optimizes well when it sees LIMIT.)
I don't know how efficient this would be, but:
SELECT t1.*, t2.version
FROM mytable AS t1
JOIN (
SELECT mytable.fk_idothertable, MAX(mytable.version) AS version
FROM mytable
GROUP BY mytable.fk_idothertable
) t2 ON t1.fk_idothertable = t2.fk_idothertable AND t1.version = t2.version
Like this...I assume that the 'mytable' in the subquery was a different actual table...so I called it mytable2. If it was the same table then this will still work, but then I imagine that fk_idothertable will just be 'id'.
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1
INNER JOIN (SELECT MAX(Version) AS topversion,fk_idothertable FROM mytable2 GROUP BY fk_idothertable) t2
ON t1.id = t2.fk_idothertable AND t1.version = t2.topversion
Hope this helps