PostgreSQL returning repeating data on multiple join - sql

I can't figure out why my output is duplicating, but I'm sure the cause is incorrect joins. I'm new to SQL in general, so this may be an inefficient query.
SELECT a.id, a.number, a.number2, b.id, b.number3, b.number4, c.id, c.score, (a.number - b.number3) as a_b_difference, (a.number2 - b.number4) as a_b_difference3 FROM file a
INNER JOIN file b on a.id = b.id
INNER JOIN file c on a.id = c.id
I want to subtract the two fields and combine all three files on the id. However, my result populates with repeating data.
This is a sample of what I am getting:

c.id is a.id, so that join is not needed:
SELECT a.id, a.number, a.number2, b.id, b.number3, b.number4, a.id, a.score,
(a.number - b.number3) as a_b_difference,
(a.number2 - b.number4) as a_b_difference3
FROM file a INNER JOIN
file b
ON a.id = b.id;
As for the rest of your problem, I don't see duplication. If too many rows are being generated, perhaps you need to fix your JOIN conditions. You could use SELECT DISTINCT if entire rows are duplicated. If you want one row from within a group, you can use GROUP BY or DISTINCT ON.
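To see how a join on a non-unique id multiplies rows, here is a minimal sketch. It assumes a made-up file table in which id 1 appears twice, and it uses Python's built-in sqlite3 module instead of PostgreSQL so it can run anywhere. The data is hypothetical, not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the asker's "file" table; id 1 appears twice.
conn.execute("CREATE TABLE file (id INTEGER, number INTEGER)")
conn.executemany("INSERT INTO file VALUES (?, ?)", [(1, 10), (1, 20), (2, 30)])

# Self-join on id: each id=1 row pairs with both id=1 rows (2 x 2 = 4 rows).
two_way = conn.execute(
    "SELECT a.id, a.number, b.number FROM file a "
    "INNER JOIN file b ON a.id = b.id "
    "ORDER BY a.id, a.number, b.number"
).fetchall()

# A third copy of the table multiplies again (2 x 2 x 2 = 8 rows for id=1).
three_way_count = conn.execute(
    "SELECT COUNT(*) FROM file a "
    "INNER JOIN file b ON a.id = b.id "
    "INNER JOIN file c ON a.id = c.id"
).fetchone()[0]

print(len(two_way), three_way_count)
```

If id were unique in every joined table, each join would return exactly one row per id, so repeated output usually means the join key is not unique on at least one side.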

SELECT distinct a.id, a.number, a.number2, b.id, b.number3, b.number4, c.id, c.score, (a.number - b.number3) as a_b_difference, (a.number2 - b.number4) as a_b_difference3 FROM file a
INNER JOIN file b on a.id = b.id
INNER JOIN file c on a.id = c.id

Related

ERROR: column must appear in the GROUP BY clause or be used in an aggregate function when using two joins

I have the following PostgreSQL:
select
A.*,
B.child,
REGEXP_MATCHES(A.b_number, '([^.]*--[0-9]*).*') as number,
sum(cast(A.amount AS decimal)) as sum_amount,
count(A.amount) as cnt_amount
into result
from B
join A on B.name = A.name and B.parent = A.id
join C on A.name = C.name and B.child = C.id
group by A.name, A.unit, number;
select * from result;
But I get the following error:
SQL Error [42803]: ERROR: column "A.index" must appear in the GROUP BY clause or be used in an aggregate function.
What is the reason for this?
I tried adding A.index to the GROUP BY clause but it only kept asking for different columns. I also tried creating a subquery but failed because I have two joins and I'm trying to create a new table result.
Here is a version with the GROUP BY problem corrected:
SELECT
A.name,
A.unit,
B.child,
REGEXP_MATCHES(A.b_number, '([^.]*--[0-9]*).*') AS number,
SUM(CAST(A.amount AS decimal)) AS sum_amount,
COUNT(A.amount) AS cnt_amount
INTO result
FROM B
INNER JOIN A ON B.name = A.name AND B.parent = A.id
INNER JOIN C ON A.name = C.name AND B.child = C.id
GROUP BY
A.name,
A.unit,
B.child,
number;
Note that every column/alias which appears in the SELECT clause also appears in GROUP BY. Exceptions to this are columns which appear inside aggregate functions. In that case, it is OK for them to not appear in GROUP BY.

Query returning too many results

SQL query that returns expected 29 results for a.id = 366
select a.name, c.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
where a.id = 366
GROUP BY a.name, c.name
order by MAX(b.renew_date), MAX(b.date) desc;
The SQL below returns 34 results, with multiple rows wherever two different providers supplied the same course. I know these extra results appear because I added e.name to the list of returned columns. But all that is needed is the 29 entries with the latest date and the provider's name.
select a.name, c.name, e.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
GROUP BY a.name, c.name, e.name
order by MAX(b.renew_date), MAX(b.date) desc;
Can anyone rework this code to return a single DISTINCT provider name with the MAX(renew_date) for each course?
This returns exactly one row per distinct combination of (a.name, c.name):
The one with the latest renew_date.
Among these, the one with the latest date (may differ from global max(date)!).
Among these, the one with the alphabetically first e.name:
SELECT DISTINCT ON (a.name, c.name)
a.name AS a_name, c.name AS c_name, e.name AS e_name
, b.renew_date, b.date
FROM boson_course c
JOIN boson_coursedetail b on c.id = b.course_id
JOIN boson_coursedetail_attendance d on d.coursedetail_id = b.id
JOIN boson_employee a on a.id = d.employee_id
JOIN boson_provider e on b.provider_id = e.id
WHERE a.id = 366
ORDER BY a.name, c.name
, b.renew_date DESC NULLS LAST
, b.date DESC NULLS LAST
, e.name;
The result is sorted by a_name, c_name first. If you need your original sort order, wrap this in a subquery:
SELECT *
FROM (<query from above>) sub
ORDER BY renew_date DESC NULLS LAST
, date DESC NULLS LAST
, a_name, c_name, e_name;
Explanation for DISTINCT ON:
Select first row in each GROUP BY group?
Why DESC NULLS LAST?
PostgreSQL sort by datetime asc, null first?
Aside: Don't use basic type names like date as column names. Also, name is hardly ever a good name. As you can see, we have to use aliases to make this query useful. Some general advice on naming conventions:
How to implement a many-to-many relationship in PostgreSQL?
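The tie-breaking order described in this answer (latest renew_date, then latest date, then alphabetically first provider) can be checked on a small data set. The sketch below is only an approximation: SQLite has no DISTINCT ON, so it swaps in ROW_NUMBER() (which needs SQLite 3.25 or later), and the single detail table with its columns and rows is hypothetical, flattened from the joined tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per (course, provider) delivery.
conn.execute(
    "CREATE TABLE detail (course TEXT, provider TEXT, renew_date TEXT, held_on TEXT)"
)
conn.executemany(
    "INSERT INTO detail VALUES (?, ?, ?, ?)",
    [
        ("SQL", "Beta", "2020-01-01", "2019-01-01"),
        ("SQL", "Alpha", "2021-01-01", "2020-01-01"),
        ("SQL", "Zeta", "2021-01-01", "2020-01-01"),
        ("Go", "Gamma", "2019-05-01", "2018-05-01"),
    ],
)

# Keep the first row per course under the same ordering the answer uses:
# renew_date DESC, then date DESC, then provider name ascending.
latest = conn.execute(
    """
    SELECT course, provider, renew_date, held_on
    FROM (
        SELECT d.*,
               ROW_NUMBER() OVER (
                   PARTITION BY course
                   ORDER BY renew_date DESC, held_on DESC, provider
               ) AS rn
        FROM detail d
    ) AS sub
    WHERE rn = 1
    ORDER BY course
    """
).fetchall()

print(latest)
```

Two providers tie on both dates for the SQL course, and the alphabetical tie-breaker picks Alpha, so each course still yields exactly one row.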
Try using distinct on:
select distinct on (a.name, c.name, e.name) a.name, c.name, e.name,
B.date, b.renew_date as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
ORDER BY a.name, c.name, e.name, B.date desc;

Finding the next result when none is found?

I am attempting to traverse a hierarchy with a CTE and it works fine in one scenario but not another and that is where I am stuck.
Given the query;
;WITH BOMcte (ID, Code, BomName , ProductID, ProductCode, ProductName , ParentAssemblyID )
AS
(
SELECT b.id,
b.code,
b.name,
p.id,
p.default_code,
p.name_template,
b.bom_id
FROM mrp_bom AS b
INNER JOIN product_product p on b.product_id = p.id
WHERE b.bom_id IS NULL
and b.id = @AssemblyID
UNION ALL
SELECT b.id,
b.code,
b.name,
p.id,
p.default_code,
p.name_template,
b.bom_id
FROM mrp_bom AS b
INNER JOIN product_product p on b.product_id = p.ID
INNER JOIN BOMcte AS cte ON b.bom_id = cte.ID
)
SELECT BoM.* FROM BOMcte BoM
The query works just as I expected because the BoM drills down to the child boms on the column bom_id.
In code (from OpenERP) when a child BoM isn't found, (no bom_id) a child product is searched for based on the product_id:
sids = bom_obj.search(cr, uid, [('bom_id','=',False),('product_id','=',bom.product_id.id)])
I am wondering if there is a method I can use to accomplish the same thing in SQL. Once the CTE doesn't return rows, check with the product_id and a null bom_id. I had thought about another recursive member but I don't think that's what I am looking for.
I know my question probably isn't very clear but, any suggestions?
SQL Fiddle example data here: http://sqlfiddle.com/#!3/b9052/1
The join condition HABO suggested, on b.bom_id = cte.ID or ( b.bom_id is NULL and b.product_id = cte.product_id ), which you already tried, doesn't work because the recursion never logically terminates.
However, you do have a terminating condition: do the product lookup once, when no children are found. The easiest way to do that is to add a UNION whose second query checks that a row has no child in BOMcte:
WHERE NOT EXISTS (SELECT * FROM BOMcte bc WHERE b.id = bc.PARENTASSEMBLYID)
Full SQL
;WITH BOMcte (ID, Code, BomName , ProductID, ProductCode, ProductName , ParentAssemblyID )
AS
(
SELECT b.id,
b.code,
b.name,
p.id,
p.default_code,
p.name_template,
b.bom_id
FROM mrp_bom AS b
INNER JOIN product_product p on b.product_id = p.id
WHERE b.bom_id IS NULL
and b.id = @AssemblyID
UNION ALL
SELECT b.id,
b.code,
b.name,
p.id,
p.default_code,
p.name_template,
b.bom_id
FROM mrp_bom AS b
INNER JOIN product_product p on b.product_id = p.ID
INNER JOIN BOMcte AS cte ON b.bom_id = cte.ID
)
SELECT * FROM BOMcte
UNION
SELECT b.id,
b.code,
b.name,
p.id,
p.default_code,
p.name_template,
b.bom_id
FROM mrp_bom AS b
INNER JOIN product_product p on b.product_id = p.id
WHERE NOT EXISTS (SELECT * FROM BOMcte bc WHERE b.id = bc.PARENTASSEMBLYID)
Note: It may be possible to encode the terminating expression in the CTE using an incrementing LEVEL value like those found in the MSDN article on Recursive Queries
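As a rough illustration of the incrementing level idea, here is a sketch of a recursive CTE that carries a Level column. It only shows the level bookkeeping, not the product_id fallback from the question. It runs on SQLite through Python's sqlite3 module (so it uses WITH RECURSIVE and a bound parameter rather than SQL Server syntax), and the trimmed-down mrp_bom table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down mrp_bom: bom_id points at the parent BoM row.
conn.execute(
    "CREATE TABLE mrp_bom (id INTEGER PRIMARY KEY, bom_id INTEGER, product_id INTEGER)"
)
conn.executemany(
    "INSERT INTO mrp_bom VALUES (?, ?, ?)",
    [(1, None, 100), (2, 1, 101), (3, 1, 102), (4, 2, 103), (5, None, 200)],
)

assembly_id = 1  # plays the role of the assembly parameter in the question

rows = conn.execute(
    """
    WITH RECURSIVE BOMcte (ID, ParentAssemblyID, Level) AS (
        -- Anchor member: the top-level assembly starts at level 0.
        SELECT b.id, b.bom_id, 0
        FROM mrp_bom AS b
        WHERE b.bom_id IS NULL AND b.id = ?
        UNION ALL
        -- Recursive member: each child sits one level below its parent.
        SELECT b.id, b.bom_id, cte.Level + 1
        FROM mrp_bom AS b
        INNER JOIN BOMcte AS cte ON b.bom_id = cte.ID
    )
    SELECT ID, ParentAssemblyID, Level FROM BOMcte ORDER BY Level, ID
    """,
    (assembly_id,),
).fetchall()

print(rows)
```

With a Level column available, a condition such as a maximum depth could be added to the recursive member to force termination, though whether that fits the OpenERP lookup is untested here.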
I'm a little unclear on what you are trying to do, but something like this for your final join may do it:
on b.bom_id = cte.ID or ( b.bom_id is NULL and b.product_id = cte.product_id )

Multi table joins - can I add an outer join to this?

I'm having a problem moving from a situation where an Outer Join works, to where it fails.
Working (pseudo code example)
SELECT a.number, a.name, b.ref, c.ref, c.firmref
FROM jobs a, teams b LEFT OUTER JOIN teamfirms c ON b.ref = c.team
WHERE a.ref = b.job
There is a many to one relationship between jobs and teams (many teams per job) that is always populated
There may or may not be firms in table c, but the query above gives me the result I would expect (approx 5000 records)
The problem comes when I want to bring in the details about the teams from a fourth table
The code I am trying is below
SELECT a.number, a.name, b.ref, c.ref, c.firmref, d.name
FROM jobs a, teams b LEFT OUTER JOIN teamfirms c ON b.ref = c.team, firms d
WHERE a.ref = b.job
AND d.ref = c.firmref
At this point the NULLS that I am trying to capture disappear and I drop approx 500 records
What am I doing wrong?
take a whack at this.
select a.number, a.name, b.ref, c.ref, c.firmref, d.name
from jobs a left outer join teams b on b.job = a.ref
left outer join teamfirms c on b.ref = c.team
left outer join firms d on c.firmref = d.ref
-- left outer join <other_table> e on a.<column> = e.<column>  (placeholder for any further tables)
or you could do
select a.number, a.name, b.ref, c.ref, c.firmref, d.name
from
jobs a, teams b, teamfirms c, firms d
where
a.ref = b.job
and b.ref = c.team
and c.firmref = d.ref
one or the other... not both.
Just to throw this in for good measure...
You use INNER JOIN to return all rows
from both tables where there is a
match, i.e. in the resulting table all
the rows and columns will have values.
LEFT OUTER JOIN returns all the rows
from the first table, even if there
are no matches in the second table.
RIGHT OUTER JOIN returns all the rows
from the second table, even if there
are no matches in the first table.
You are mixing ANSI 89 and 92 JOIN syntax (implicit and explicit JOINs). Try converting the entire query to explicit JOINs. The problem is likely that the new JOIN you're adding (implicit syntax) is INNER and wants to be OUTER, or that you want to resolve the JOINs in a different order (which you can do with parens once you write them all as OUTER JOINs)
Try, the following:
SELECT
a.number, a.name, b.ref, c.ref, c.firmref, d.name
FROM
jobs a, teams b
LEFT OUTER JOIN teamfirms c ON b.ref = c.team
LEFT OUTER JOIN firms d on c.firmref = d.ref
WHERE a.ref = b.job
If it works, you could then try to turn the 2nd LEFT OUTER into an INNER. Possibly incorrectly I've generally left it as an outer when I've needed this sort of thing.
Here is my attempt:
SELECT a.number, a.name, b.ref, c.ref, c.firmref, d.name
FROM jobs a
join teams b on (b.job = a.ref)
LEFT OUTER JOIN teamfirms c ON (b.ref = c.team)
LEFT OUTER JOIN firms d on (d.ref = c.firmref)
This will join all jobs to teams, and if a teamfirm exists it will also bring in the firm details. If there is no teamfirm relationship, you still get your NULLs.
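To make the difference concrete, here is a small sketch comparing the asker's mixed-syntax query with the all-explicit version. It uses Python's sqlite3 module rather than the asker's database, and the table contents (one job, two teams, only one of which has a firm) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (ref INTEGER, name TEXT);
    CREATE TABLE teams (ref INTEGER, job INTEGER);
    CREATE TABLE teamfirms (ref INTEGER, team INTEGER, firmref INTEGER);
    CREATE TABLE firms (ref INTEGER, name TEXT);
    INSERT INTO jobs VALUES (1, 'Job one');
    INSERT INTO teams VALUES (10, 1), (11, 1);   -- team 11 has no firm
    INSERT INTO teamfirms VALUES (100, 10, 500);
    INSERT INTO firms VALUES (500, 'Acme');
    """
)

# Mixed syntax from the question: the WHERE test d.ref = c.firmref is never
# true when c.firmref is NULL, so the outer-joined row for team 11 is dropped.
mixed = conn.execute(
    "SELECT b.ref, d.name "
    "FROM jobs a, teams b LEFT OUTER JOIN teamfirms c ON b.ref = c.team, firms d "
    "WHERE a.ref = b.job AND d.ref = c.firmref "
    "ORDER BY b.ref"
).fetchall()

# All-explicit joins: firms is outer-joined too, so team 11 survives with NULL.
explicit = conn.execute(
    "SELECT b.ref, d.name "
    "FROM jobs a "
    "JOIN teams b ON b.job = a.ref "
    "LEFT OUTER JOIN teamfirms c ON b.ref = c.team "
    "LEFT OUTER JOIN firms d ON d.ref = c.firmref "
    "ORDER BY b.ref"
).fetchall()

print(mixed)
print(explicit)
```

The first query loses the team without a firm, which matches the roughly 500 missing records described in the question. The second keeps it, with NULL for the firm name.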
Try the following:
SELECT a.number, a.name, b.ref, c.ref, c.firmref, d.name
FROM jobs a, teams b LEFT OUTER JOIN teamfirms c ON b.ref = c.team
LEFT OUTER JOIN firms d ON c.firmref = d.ref
WHERE a.ref = b.job

Simple SQL question about getting rows and associated counts

this oughta be an easy one.
My question is very similar to this one; basically, I've got a table of posts, a table of comments with a foreign key for the post_id, and a table of votes with a foreign key for the post id. I'd like to do a single query and get back a result set containing one row per post, along with the count of associated comments and votes.
From the question I've linked to above, it seems that for getting a table back containing just a row for each post and a comment count, this is the right approach:
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
GROUP BY a.ID, a.Title
I thought adding vote count would be as easy as adding another left join, as in
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments, COUNT(v.id) AS NumVotes
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY a.ID, a.Title
but I'm getting bad numbers back. What am I missing?
SELECT
a.ID,
a.Title,
COUNT(DISTINCT c.ID) AS NumComments,
COUNT(DISTINCT v.id) AS NumVotes
FROM
Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY
a.ID,
a.Title
SELECT id, title,
(
SELECT COUNT(*)
FROM comments c
WHERE c.ParentID = a.ID
) AS NumComments,
(
SELECT COUNT(*)
FROM votes v
WHERE v.ParentID = a.ID
) AS NumVotes
FROM articles a
try:
COUNT(DISTINCT c.ID) AS NumComments
You are thinking in trees, not recordsets.
In the recordset, you get each comment and each vote returned multiple times, combined with each other. Run the query without the GROUP BY and the COUNT to see what I mean.
The solution is simple: use COUNT(DISTINCT c.ID) and COUNT(DISTINCT v.ID)
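Here is a small sketch of that effect, using Python's sqlite3 module and hypothetical data: one article with 3 comments and 2 votes, and one article with neither. The table and column names follow the question, but the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Articles (ID INTEGER, Title TEXT);
    CREATE TABLE Comments (ID INTEGER, ParentID INTEGER);
    CREATE TABLE Votes (ID INTEGER, ParentID INTEGER);
    INSERT INTO Articles VALUES (1, 'A'), (2, 'B');
    INSERT INTO Comments VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO Votes VALUES (1, 1), (2, 1);
    """
)

joins = (
    "FROM Articles a "
    "LEFT JOIN Comments c ON c.ParentID = a.ID "
    "LEFT JOIN Votes v ON v.ParentID = a.ID "
    "GROUP BY a.ID, a.Title ORDER BY a.ID"
)

# Plain COUNT sees every comment paired with every vote (3 x 2 = 6 rows).
plain = conn.execute(
    "SELECT a.ID, a.Title, COUNT(c.ID), COUNT(v.ID) " + joins
).fetchall()

# COUNT(DISTINCT ...) collapses the repeated ids back to the true counts.
distinct = conn.execute(
    "SELECT a.ID, a.Title, COUNT(DISTINCT c.ID), COUNT(DISTINCT v.ID) " + joins
).fetchall()

print(plain)
print(distinct)
```

The article with no comments or votes still shows up with zero counts in both queries, because COUNT over a column skips the NULLs that the LEFT JOINs produce.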