Joining two tables with aggregates - sql

I've got two tables described below:
CREATE TABLE categories
(
id integer NOT NULL,
category integer NOT NULL,
name text,
CONSTRAINT kjhfskfew PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE products_
(
id integer NOT NULL,
date date,
id_employee integer,
CONSTRAINT grh PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
Now I have to build a report that needs the following information:
categories.category; categories.name (all of them, so string_agg is fine, since many names can be assigned to one category); and products_.id_employee, not as a comma-separated list like the names, but only the one with the newest date (and this is my problem).
I have already tried constructions such as:
SELECT
DISTINCT ON (category ) category,
string_agg(name, ','),
(SELECT
id_employee
FROM products_
WHERE date = (SELECT
max(date)
FROM products_
WHERE id IN (SELECT
id
FROM categories
WHERE id = c.id)))
FROM categories c
ORDER BY category;
But PostgreSQL says that the subquery returns too many rows...
Please help!
EXAMPLE INSERTS:
INSERT INTO categories(
id, category, name)
VALUES (1,22,'car'),(2,22,'bike'),(3,22,'boat'),(4,33,'soap'),(5,44,'chicken');
INSERT INTO products_(
id, date, id_employee)
VALUES (1,'2009-11-09',11),(2,'2010-09-09',2),(3,'2013-01-01',4),(5,'2014-09-01',90);
OK, I've solved this problem.
This one works just fine:
WITH max_date AS (
SELECT
category,
max(date) AS date,
string_agg(name, ',') AS names
FROM test.products_
JOIN test.categories c
USING (id)
GROUP BY c.category
)
SELECT
max(id_employee) AS id_employee,
md.category,
names
FROM test.products_ p
LEFT JOIN max_date md
USING (date)
LEFT JOIN test.categories
USING (category)
WHERE p.date = md.date AND p.id IN (SELECT
id
FROM test.categories
WHERE category = md.category)
GROUP BY category, names;

It seems that id is being used to join the two tables, which seems strange to me.
In any case, the base query for the category names is:
SELECT c.category, string_agg(c.name, ',')
FROM categories c
group by c.category;
The question is: how to get the employee with the most recent date? This approach uses the row_number() function:
SELECT c.category, string_agg(c.name, ','), cp.id_employee
FROM categories c left outer join
(select c.category, c.name, p.id_employee,
row_number() over (partition by c.category order by p.date desc nulls last) as seqnum
from categories c left outer join
products_ p
on c.id = p.id
) cp
on cp.category = c.category and
cp.seqnum = 1
group by c.category, cp.id_employee;
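For anyone who wants to try the row_number() idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module and the example inserts from the question. It is only an approximation of the PostgreSQL query: group_concat stands in for string_agg, the date column is stored as ISO text, and the alias "latest" is my own name. It assumes an SQLite build with window functions (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, category INTEGER NOT NULL, name TEXT);
CREATE TABLE products_  (id INTEGER PRIMARY KEY, date TEXT, id_employee INTEGER);
INSERT INTO categories (id, category, name)
VALUES (1,22,'car'),(2,22,'bike'),(3,22,'boat'),(4,33,'soap'),(5,44,'chicken');
INSERT INTO products_ (id, date, id_employee)
VALUES (1,'2009-11-09',11),(2,'2010-09-09',2),(3,'2013-01-01',4),(5,'2014-09-01',90);
""")

query = """
SELECT c.category,
       group_concat(c.name, ',') AS names,   -- SQLite stand-in for string_agg
       latest.id_employee
FROM categories c
LEFT JOIN (
    -- rank the products of each category, newest date first
    SELECT c2.category, p.id_employee,
           row_number() OVER (PARTITION BY c2.category
                              ORDER BY p.date DESC) AS seqnum
    FROM categories c2
    JOIN products_ p ON p.id = c2.id
) latest
  ON latest.category = c.category AND latest.seqnum = 1
GROUP BY c.category, latest.id_employee
ORDER BY c.category
"""

rows = conn.execute(query).fetchall()
for category, names, id_employee in rows:
    print(category, names, id_employee)
```

The derived table uses an inner join, so categories without any product simply get a NULL employee from the outer LEFT JOIN. The order of names inside the concatenated string is not guaranteed.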

Related

Left outer joins aggregate first

I have the following tables
CREATE TABLE categories(
id SERIAL
);
CREATE TABLE category_translations(
id SERIAL,
name varchar not null,
locale varchar not null,
category_id integer not null
);
CREATE TABLE products(
id SERIAL,
category_id integer not null
);
CREATE TABLE line_items(
id SERIAL,
total_cents integer,
product_id integer not null
);
What I'm trying to do is output each category name together with the sum of total_cents over its associated line_items. Something like:
name | sum_total_cents
Fresh foods | 100000
Dry products | 532000
There is a uniqueness constraint so that only one name per locale is stored for each category. So a category has one row in the category_translations table for each locale.
What I currently have is
SELECT SUM(line_items.total_cents) AS sum_total_cents, ???
FROM line_items INNER JOIN products ON products.id = line_items.product_id
INNER JOIN categories ON categories.id = products.category_id
LEFT OUTER JOIN category_translations ON category_translations.category_id = categories.id
WHERE category_translations.locale ='en'
GROUP BY categories.id
I'm looking for an aggregate function that returns the first name for the category. The only missing piece is what to write instead of the ???, as I keep getting "must appear in the GROUP BY clause or be used in an aggregate function" errors. In pseudo-code, I'm looking for a FIRST() aggregate in PostgreSQL.
Assuming you want one arbitrary name from any locale, you can do:
select
c.id,
(select name from category_translations t
where t.category_id = c.id limit 1) as name,
sum(i.total_cents) as sum_total_cents
from categories c
left join products p on p.category_id = c.id
left join line_items i on i.product_id = p.id
group by c.id, name
Alternatively, if you want the category name for the locale 'en' then you can do:
select
c.id,
(select t.name from category_translations t
where t.category_id = c.id and t.locale ='en') as name,
sum(i.total_cents) as sum_total_cents
from categories c
left join products p on p.category_id = c.id
left join line_items i on i.product_id = p.id
group by c.id, name
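As a variation on the second query, the locale filter can also go into the ON clause of a LEFT JOIN, so the name is just another grouped column. Below is a rough, self-contained sketch with Python's sqlite3 module. The sample rows and the French translation are made up for illustration (chosen so the totals match the example output in the question), and the UNIQUE constraint is my reading of the uniqueness rule described there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY);
CREATE TABLE category_translations (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    locale TEXT NOT NULL,
    category_id INTEGER NOT NULL,
    UNIQUE (category_id, locale)      -- assumed form of the uniqueness constraint
);
CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL);
CREATE TABLE line_items (
    id INTEGER PRIMARY KEY,
    total_cents INTEGER,
    product_id INTEGER NOT NULL
);
-- hypothetical sample rows
INSERT INTO categories VALUES (1),(2);
INSERT INTO category_translations (name, locale, category_id) VALUES
    ('Fresh foods','en',1),('Produits frais','fr',1),('Dry products','en',2);
INSERT INTO products VALUES (1,1),(2,1),(3,2);
INSERT INTO line_items (total_cents, product_id) VALUES
    (60000,1),(40000,2),(532000,3);
""")

query = """
SELECT t.name, SUM(i.total_cents) AS sum_total_cents
FROM categories c
LEFT JOIN category_translations t
       ON t.category_id = c.id AND t.locale = 'en'   -- filter in ON, not WHERE
LEFT JOIN products p   ON p.category_id = c.id
LEFT JOIN line_items i ON i.product_id = p.id
GROUP BY c.id, t.name
ORDER BY c.id
"""

rows = conn.execute(query).fetchall()
for name, sum_total_cents in rows:
    print(name, sum_total_cents)
```

Because at most one 'en' row exists per category, the extra join does not multiply the line_items rows, and a category without an English name still shows up with a NULL name.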

SQL INNER JOIN entity

I want to execute this query :
-- The most expensive item sold ever
SELECT
c.itemID, c.itemName
FROM
item AS c
JOIN
(SELECT
b.itemID as 'itemid', MAX(b.item_initialPrice) AS 'MaxPrice'
FROM
buyeritem AS a
INNER JOIN
item AS b ON a.item_ID = b.itemID) AS d ON c.itemID = d.itemid
GROUP BY
c.itemID, c.itemName;
My item table looks like this:
create table item
(
itemID int IDENTITY(1000, 1) NOT NULL,
itemName varchar(15) NOT NULL,
Item_desc varchar(255),
Item_initialPrice MONEY,
ItemQty int,
ownerID int NOT NULL,
condition varchar(20) NOT NULL,
PRIMARY KEY (itemID),
FOREIGN KEY (ownerID) REFERENCES seller (sellerID)
);
The problem is that column item.itemID is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. I tried to add a group by clause at the end
group by c.itemID, c.itemName
but I still get the same error? I don't really know where the problem comes from.
I also have this query
-- The most active seller(the one who has offered the most number of items)
SELECT
a.ownerID, b.sellerName
FROM
item AS a
INNER JOIN
seller AS b ON a.ownerID = b.sellerID
GROUP BY
a.ownerID, b.sellerName
ORDER BY
COUNT(a.itemID) DESC;
I want to add itemQty from the item table above alongside ownerID and sellerName. What would be the best way to achieve that?
The error comes from the derived table, not the outer query: it selects b.itemID next to MAX(b.item_initialPrice) without a GROUP BY. Add GROUP BY b.itemID inside the subquery. The outer query has no aggregate in its select list, so its GROUP BY can be replaced with DISTINCT:
SELECT distinct c.itemID, c.itemName
FROM item AS c
JOIN (
SELECT b.itemID as itemid, MAX(b.item_initialPrice) AS MaxPrice FROM buyeritem AS a
INNER JOIN item AS b ON a.item_ID = b.itemID
GROUP BY b.itemID) as d
ON c.itemID = d.itemid ;
For the second query:
Select a.* from
(
SELECT a.ownerID, b.sellerName, count(distinct a.itemID) as item_qty
FROM item AS a
INNER JOIN seller AS b ON a.ownerID = b.sellerID
GROUP BY a.ownerID,b.sellerName
) a
order by item_qty DESC
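If "itemQty" in the question means the stock quantity column rather than the number of listings, both can be returned side by side. The sketch below runs on SQLite through Python's sqlite3 module, not SQL Server. The seller table layout, the sample rows, and the aliases item_count and total_qty are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- trimmed-down, hypothetical versions of the tables in the question
CREATE TABLE seller (sellerID INTEGER PRIMARY KEY, sellerName TEXT NOT NULL);
CREATE TABLE item (
    itemID INTEGER PRIMARY KEY,
    itemName TEXT NOT NULL,
    ItemQty INTEGER,
    ownerID INTEGER NOT NULL REFERENCES seller (sellerID)
);
INSERT INTO seller VALUES (1,'Ann'),(2,'Bob');
INSERT INTO item VALUES
    (1000,'lamp',2,1),(1001,'desk',1,1),(1002,'chair',5,2);
""")

query = """
SELECT a.ownerID,
       b.sellerName,
       COUNT(a.itemID) AS item_count,   -- number of items offered
       SUM(a.ItemQty)  AS total_qty     -- summed quantity across those items
FROM item AS a
INNER JOIN seller AS b ON a.ownerID = b.sellerID
GROUP BY a.ownerID, b.sellerName
ORDER BY item_count DESC
"""

rows = conn.execute(query).fetchall()
for owner_id, seller_name, item_count, total_qty in rows:
    print(owner_id, seller_name, item_count, total_qty)
```

Every non-aggregated column in the select list appears in the GROUP BY, which is the rule SQL Server enforces with the error quoted in the question.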

Sql join solution is possible for below case?

Can the case below be solved with joins? I have already solved it with window functions.
Relation: In the tables below, each order in the Orders table, is associated with a given Customer through the cust_id foreign key column that references the ID column in the Customer table.
Question: Find the largest order amount for each salesperson and the associated order number, along with the customer to whom that order belongs and sales person name.
Create Table Salesperson
(
ID int,
name varchar(100),
age float,
salary money
);
Create Table Orders
(
Number int,
order_date datetime,
cust_id int,
salesperson_id int,
Amount money
);
Create Table Customer
(
ID int,
name varchar(100),
city varchar(100),
IndustryType varchar(100)
);
insert into Salesperson values
( 1,'Rohit',25,50000),
( 2,'Pramod',25,50000),
( 3,'Atul',25,50000);
insert into Orders values
( 1,getdate(),101,1,50000),
( 2,getdate(),101,1,500000),
( 3,getdate(),102,1,10000),
( 4,getdate(),101,2,5000),
( 5,getdate(),102,2,700000),
( 6,getdate(),102,2,10000);
insert into Customer values
( 101,'Altu','bhopal','IT'),
( 102,'bltu','bhopal','ITES'),
( 103,'cltu','bhopal','NW');
Solution with window functions:
with CTE_MaxAmount
as
(
select max(amount) over (partition by salesperson_id ) as amount,
dense_rank() over (partition by salesperson_id order by amount desc) as rowid,
cust_id,
salesperson_id,number
from Orders with(nolock)
)
select ct.amount,
ct.cust_id,
c.name as customername,
s.name as salesman,
ct.salesperson_id,
number as OrderNumber
from Customer c
join CTE_MaxAmount ct
on (c.id = ct.cust_id)
join Salesperson s
on (s.id = ct.salesperson_id)
where rowid = 1;
I'm breaking with my personal policy not to answer homework questions because the question is an opportunity to show how easily English is translated into SQL. The question is phrased exactly as the query can be built up.
find the largest order amount for each salesperson
select max(Amount) as Amount, salesperson_id from Orders group by salesperson_id
and the associated order number
select o.Number, M.salesperson_id, M.Amount
from Orders as o join (
select max(Amount) as amount, salesperson_id
from Orders group by salesperson_id
) as M
on o.salesperson_id = M.salesperson_id
and o.Amount = M.Amount
along with the customer
select c.name, o.Number, M.salesperson_id, M.Amount
from Orders as o join (
select max(Amount) as amount, salesperson_id
from Orders group by salesperson_id
) as M
on o.salesperson_id = M.salesperson_id
and o.Amount = M.Amount
join Customer as c
on o.cust_id = c.ID
and sales person name
select s.name as 'salesperson',
c.name as 'customer',
o.Number, M.salesperson_id, M.Amount
from Orders as o join (
select max(Amount) as amount, salesperson_id
from Orders group by salesperson_id
) as M
on o.salesperson_id = M.salesperson_id
and o.Amount = M.Amount
join Customer as c
on o.cust_id = c.ID
join Salesperson as s
on o.salesperson_id = s.ID
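To check the final join-only query against the sample data from the question, here is a small sketch using Python's sqlite3 module. It is SQLite rather than SQL Server, so date('now') stands in for getdate() and the money columns are plain integers; otherwise the tables and rows are the ones posted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salesperson (ID INTEGER, name TEXT, age REAL, salary INTEGER);
CREATE TABLE Orders (Number INTEGER, order_date TEXT, cust_id INTEGER,
                     salesperson_id INTEGER, Amount INTEGER);
CREATE TABLE Customer (ID INTEGER, name TEXT, city TEXT, IndustryType TEXT);
INSERT INTO Salesperson VALUES
    (1,'Rohit',25,50000),(2,'Pramod',25,50000),(3,'Atul',25,50000);
INSERT INTO Orders VALUES
    (1,date('now'),101,1,50000),(2,date('now'),101,1,500000),
    (3,date('now'),102,1,10000),(4,date('now'),101,2,5000),
    (5,date('now'),102,2,700000),(6,date('now'),102,2,10000);
INSERT INTO Customer VALUES
    (101,'Altu','bhopal','IT'),(102,'bltu','bhopal','ITES'),(103,'cltu','bhopal','NW');
""")

query = """
SELECT s.name AS salesperson, c.name AS customer, o.Number, m.Amount
FROM Orders AS o
JOIN (
    -- largest order amount for each salesperson
    SELECT salesperson_id, MAX(Amount) AS Amount
    FROM Orders
    GROUP BY salesperson_id
) AS m
  ON o.salesperson_id = m.salesperson_id AND o.Amount = m.Amount
JOIN Customer    AS c ON o.cust_id = c.ID
JOIN Salesperson AS s ON o.salesperson_id = s.ID
ORDER BY s.ID
"""

rows = conn.execute(query).fetchall()
for salesperson, customer, number, amount in rows:
    print(salesperson, customer, number, amount)
```

Atul has no orders, so the inner joins leave him out. If two orders of one salesperson tie for the maximum amount, this approach returns both.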

Getting first line of a LEFT OUTER JOIN

I have 3 tables:
(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ADDRESS FROM ADDRESSES
WHERE ROWNUM <2
ORDER BY UPDATED_DATE DESC)c
ON a.ID = c.ID
An ID can have only one name but multiple addresses, and I only want the latest one. This query returns the address as null even when there is an address. I guess that is because it fetches only the first address from the whole table and then tries to LEFT JOIN it on the ID, which it cannot find. What is the correct way of writing this query?
Try KEEP DENSE_RANK
Data source:
CREATE TABLE person
(person_id int primary key, firstname varchar2(4), lastname varchar2(9))
/
INSERT ALL
INTO person (person_id, firstname, lastname)
VALUES (1, 'john', 'lennon')
INTO person (person_id, firstname, lastname)
VALUES (2, 'paul', 'mccartney')
SELECT * FROM dual;
CREATE TABLE address
(person_id int, address_id int primary key, city varchar2(8))
/
INSERT ALL
INTO address (person_id, address_id, city)
VALUES (1, 1, 'new york')
INTO address (person_id, address_id, city)
VALUES (1, 2, 'england')
INTO address (person_id, address_id, city)
VALUES (1, 3, 'japan')
INTO address (person_id, address_id, city)
VALUES (2, 4, 'london')
SELECT * FROM dual;
Query:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select person_id,
min(city) -- can change this to max(city). will work regardless of min/max
-- important you do this to get the recent: keep(dense_rank last)
keep(dense_rank last order by address_id)
as recent_city
from address
group by person_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/2
Not all databases have functionality similar to Oracle's KEEP DENSE_RANK. You can use a plain window function instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city, x.pick_one_only
from person p
left join (
select
person_id,
row_number() over(partition by person_id order by address_id desc) as pick_one_only,
city as recent_city
from address
) x on x.person_id = p.person_id and x.pick_one_only = 1
Live test: http://www.sqlfiddle.com/#!4/7b1c9/48
Or use tuple testing, which should work on databases that don't support window functions:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
person_id,city as recent_city
from address
where (person_id,address_id) in
(select person_id, max(address_id)
from address
group by person_id)
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/21
Not all databases support tuple testing like in the preceding code, though. You can use a JOIN instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
address.person_id,address.city as recent_city
from address
join
(
select person_id, max(address_id) as recent_id
from address
group by person_id
) r
ON address.person_id = r.person_id
AND address.address_id = r.recent_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/24
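If the fiddle links stop working, the window-function and plain-JOIN variants can be compared locally. This is a minimal sketch on SQLite via Python's sqlite3 module, not Oracle, so KEEP DENSE_RANK itself is not shown. The person and address rows are the ones from the data source above, plus one extra person without an address (ringo) that I added to show the LEFT JOIN returning NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE address (person_id INTEGER, address_id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO person VALUES (1,'john','lennon'),(2,'paul','mccartney'),
                          (3,'ringo','starr');   -- hypothetical extra row
INSERT INTO address VALUES (1,1,'new york'),(1,2,'england'),(1,3,'japan'),(2,4,'london');
""")

# Variant 1: row_number() picks the newest address per person.
window_sql = """
SELECT p.firstname, x.city
FROM person p
LEFT JOIN (
    SELECT person_id, city,
           row_number() OVER (PARTITION BY person_id
                              ORDER BY address_id DESC) AS rn
    FROM address
) x ON x.person_id = p.person_id AND x.rn = 1
ORDER BY p.person_id
"""

# Variant 2: join back to the MAX(address_id) of each person.
join_sql = """
SELECT p.firstname, a.city
FROM person p
LEFT JOIN (
    SELECT person_id, MAX(address_id) AS recent_id
    FROM address
    GROUP BY person_id
) r ON r.person_id = p.person_id
LEFT JOIN address a ON a.address_id = r.recent_id
ORDER BY p.person_id
"""

window_rows = conn.execute(window_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(window_rows)
print(join_rows)
```

Both variants keep the row-limiting condition inside the join (the ON clause or the derived table), which is what keeps people without an address in the result.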
You can use the analytic function RANK
(SELECT DISTINCT ID
FROM IDS) a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES) b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ID, ADDRESS,
rank() over (partition by id
order by updated_date desc) rnk
FROM ADDRESSES) c
ON ( a.ID = c.ID
and c.rnk = 1)
Without having access to any database at the moment, you should be able to do
(SELECT DISTINCT ID
FROM IDS) a LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b ON a.ID = b.ID LEFT OUTER JOIN
(SELECT TOP 1 ADDRESS
FROM ADDRESSES
ORDER BY UPDATED_DATE DESC) c ON a.ID = c.ID
As you might see, the "TOP 1" at 'Address' will only return the first row of the result set.
Also, are you sure that a.ID and c.ID are the same?
I would imagine you need something like .... c ON a.ID = c.AddressID
If not, I'm not entirely sure how you link multiple addresses to a single ID.
(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ID, ADDRESS, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY UPDATED_DATE DESC) RN
FROM ADDRESSES
)c
ON a.ID = c.ID
AND c.RN=1

mysql query with double join

I have 3 tables, but I have only managed to join the count from one of them. See below.
The query below works like a charm, but I need to add another "count" from another table.
There is a 3rd table called "ci_nomatch" that contains a reference to ci_address_book.reference,
which can have multiple entries (many to many), but I only need the count from that table.
So if ci_address_book had entries called "item1", "item2", "item3"
and ci_nomatch had "1,item1,user1" and "2,item1,user4",
I would like the query to return "2" for item1.
Any ideas? I tried another join, but it tells me that the reference does not exist, while it does!
SELECT c.*, IFNULL(p.total, 0) AS matchcount
FROM ci_address_book c
LEFT JOIN (
SELECT addressbook_id, COUNT(match_id) AS total
FROM ci_matched_sanctions
GROUP BY addressbook_id
) AS p
ON c.id=p.addressbook_id
ORDER BY matchcount DESC
LIMIT 0,15
You could subquery it directly in the select
SELECT c.*, IFNULL(p.total, 0) AS matchcount,
(SELECT COUNT(*) FROM ci_nomatch n WHERE n.reference = c.reference) AS othercount
FROM ci_address_book c
LEFT JOIN (
SELECT addressbook_id, COUNT(match_id) AS total
FROM ci_matched_sanctions
GROUP BY addressbook_id
) AS p
ON c.id=p.addressbook_id
ORDER BY matchcount DESC
LIMIT 0,15
Update in response to a comment: an extra column "(matchcount - othercount) AS deducted" is best added by wrapping the whole query in a subquery.
SELECT *, matchcount - othercount AS deducted
FROM
(
SELECT c.* , IFNULL( p.total, 0 ) AS matchcount, (
SELECT COUNT( * ) FROM ci_falsepositives n
WHERE n.addressbook_id = c.reference ) AS othercount
FROM ci_address_book c
LEFT JOIN (
SELECT addressbook_id, COUNT( match_id ) AS total
FROM ci_matched_sanctions GROUP BY addressbook_id ) AS p
ON c.id = p.addressbook_id ORDER BY matchcount DESC LIMIT 0 , 15
) S
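Another way to get the second count is a second pre-aggregated LEFT JOIN, mirroring the one that already works. The sketch below uses SQLite through Python's sqlite3 module instead of MySQL. The column layout of ci_nomatch (id, reference, username) is guessed from the "1,item1,user1" example in the question, and the ci_matched_sanctions rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ci_address_book (id INTEGER PRIMARY KEY, reference TEXT);
CREATE TABLE ci_matched_sanctions (match_id INTEGER PRIMARY KEY, addressbook_id INTEGER);
-- assumed layout, inferred from the "1,item1,user1" example
CREATE TABLE ci_nomatch (id INTEGER PRIMARY KEY, reference TEXT, username TEXT);
INSERT INTO ci_address_book VALUES (1,'item1'),(2,'item2'),(3,'item3');
INSERT INTO ci_matched_sanctions VALUES (1,1),(2,1),(3,2);   -- hypothetical rows
INSERT INTO ci_nomatch VALUES (1,'item1','user1'),(2,'item1','user4');
""")

query = """
SELECT c.id, c.reference,
       IFNULL(p.total, 0) AS matchcount,
       IFNULL(n.total, 0) AS nomatchcount
FROM ci_address_book c
LEFT JOIN (
    SELECT addressbook_id, COUNT(match_id) AS total
    FROM ci_matched_sanctions
    GROUP BY addressbook_id
) AS p ON c.id = p.addressbook_id
LEFT JOIN (
    -- count ci_nomatch rows per reference before joining
    SELECT reference, COUNT(*) AS total
    FROM ci_nomatch
    GROUP BY reference
) AS n ON n.reference = c.reference
ORDER BY matchcount DESC, c.id
LIMIT 15
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Aggregating each side before joining keeps the two counts independent, so rows from one table cannot inflate the count from the other.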