Syntax error when combining LEFT JOIN and SELECT

I'm getting a syntax error at LEFT JOIN. To combine the two queries, I used a left join and wrapped the second query in brackets. I'm not sure where the problem is:
SELECT DISTINCT a.order_id
FROM fact.outbound AS a
ORDER BY Rand()
LIMIT 5
LEFT JOIN (
SELECT
outbound.marketplace_name,
outbound.product_type,
outbound.mpid,
outbound.order_id,
outbound.sku,
pbdh.mpid,
pbdh.product_type,
pbdh.validated_exp_reach,
pbdh.ultimate_sales_rank_de,
pbdh.ultimate_sales_rank_fr,
(
pbdh.very_good_stock_count + good_stock_count + new_Stock_count
) as total_stock
FROM
fact.outbound AS outbound
LEFT JOIN reporting_layer.pricing_bi_data_historisation AS pbdh ON outbound.mpid = pbdh.mpid
AND trunc(outbound.ordered_date) = trunc(pbdh.importdate)
WHERE
outbound.ordered_date > '2022-01-01'
AND pbdh.importdate > '2022-01-01'
LIMIT
5
) AS b ON a.orderid = b.order_id
Error:
You have an error in your SQL syntax; it seems the error is around: 'LEFT JOIN ( SELECT outbound.marketplace_name, outbound.product_t' at line 9
What could be the reason?

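A JOIN is part of the FROM clause, so it cannot come after ORDER BY and LIMIT, which end the SELECT. Place the first limit logic into a separate subquery, and then join the two subqueries: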
SELECT DISTINCT a.order_id
FROM
(
SELECT order_id
FROM fact.outbound
ORDER BY Rand()
LIMIT 5
) a
LEFT JOIN
(
SELECT
outbound.marketplace_name,
outbound.product_type,
outbound.mpid,
outbound.order_id,
outbound.sku,
pbdh.mpid,
pbdh.product_type,
pbdh.validated_exp_reach,
pbdh.ultimate_sales_rank_de,
pbdh.ultimate_sales_rank_fr,
(pbdh.very_good_stock_count +
good_stock_count + new_Stock_count) AS total_stock
FROM fact.outbound AS outbound
LEFT JOIN reporting_layer.pricing_bi_data_historisation AS pbdh
ON outbound.mpid = pbdh.mpid AND
TRUNC(outbound.ordered_date) = TRUNC(pbdh.importdate)
WHERE outbound.ordered_date > '2022-01-01' AND
pbdh.importdate > '2022-01-01'
-- there should be an ORDER BY clause here...
LIMIT 5
) AS b
ON a.order_id = b.order_id;
Note that the select clause of the b subquery can be reduced to just the order_id, as no values from this subquery are actually selected in the end.
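As a minimal sketch of the same pattern (sample first in a derived table, then join), the following runs against an in-memory SQLite database through Python's sqlite3 module. The tables outbound and details and their rows are hypothetical stand-ins for the tables in the question, and SQLite spells the random function RANDOM() rather than Rand().

```python
import sqlite3

# Hypothetical miniature versions of the tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outbound (order_id INTEGER, sku TEXT);
    CREATE TABLE details  (order_id INTEGER, total_stock INTEGER);
    INSERT INTO outbound VALUES
        (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f'), (7, 'g');
    INSERT INTO details VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")

# ORDER BY ... LIMIT lives inside the derived table "a";
# the LEFT JOIN is applied to the already-sampled rows.
sampled = conn.execute("""
    SELECT a.order_id, b.total_stock
    FROM (
        SELECT order_id
        FROM outbound
        ORDER BY RANDOM()   -- SQLite spelling of Rand()
        LIMIT 3
    ) AS a
    LEFT JOIN details AS b ON a.order_id = b.order_id
""").fetchall()

for row in sampled:
    print(row)
```

Each run prints three randomly chosen orders; orders without a matching details row come back with None for total_stock.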

You can skip the LEFT JOIN, since no b columns are selected. (And SELECT DISTINCT makes sure any duplicates are eliminated.)
SELECT DISTINCT order_id
FROM fact.outbound
ORDER BY Rand()
LIMIT 5

Related

Postgres, how to limit number of rows returned from joined tables

I have the following query that returns the data I want. However, for the joined tables, I want to limit the number of rows returned, and preferably be able to specify the limit for each joined table.
I tried using limit with the select itself, but that doesn't seem to be supported.
Is this possible? I am using Postgres 11.
select array_to_json(array_agg(t)) from (
select
tbl_327.field_43,tbl_327.field_1,tbl_327.field_2,
jsonb_agg(distinct jsonb_build_object('id',tbl_332.id,'data',tbl_332.fullname)) as field_7,
jsonb_agg(distinct jsonb_build_object('id',tbl_312.id,'data',tbl_312.fullname)) as field_33
from schema_1.tbl_327 tbl_327
left join schema_1.tbl_327_to_tbl_332_field_7 field_7 on field_7.tbl_327_id=tbl_327.id
left join schema_1.tbl_332_customid tbl_332 on tbl_332.id = field_7.tbl_332_id
left join schema_1.tbl_327_to_tbl_312_field_33 field_33 on field_33.tbl_327_id=tbl_327.id
left join schema_1.tbl_312_customid tbl_312 on tbl_312.id = field_33.tbl_312_id
group by tbl_327.field_43,tbl_327.field_1,tbl_327.field_2
) t
UPDATED
Here is my new query. I simplified it, but it no longer returns correct data: field_4 now contains rows that aren't associated with the record. Do I have something wrong?
select array_to_json(array_agg(t)) from (
select
tbl_342.field_1,tbl_342.field_2,tbl_342.id,
jsonb_agg(distinct jsonb_build_object('id',tbl_312.id,'data',tbl_312.fullname)) as field_4
from schema_1.tbl_342 tbl_342
left join lateral (
select distinct field_4.*
from schema_1.tbl_342_to_tbl_312_field_4 field_4
where field_4.tbl_342_id=tbl_342.id
limit 50) field_4 on true
left join lateral (
select distinct tbl_312.*
from schema_1.tbl_312_customid tbl_312
where tbl_312_id = field_4.tbl_312_id
limit 5
) tbl_312 on true
group by tbl_342.field_1,tbl_342.field_2,tbl_342.id
) t
One approach is to turn each left join into a lateral join; you can then set the limit within each subquery:
select array_to_json(array_agg(t)) from (
select
tbl_327.field_43,tbl_327.field_1,tbl_327.field_2,
jsonb_agg(distinct jsonb_build_object('id',tbl_332.id,'data',tbl_332.fullname)) as field_7,
jsonb_agg(distinct jsonb_build_object('id',tbl_312.id,'data',tbl_312.fullname)) as field_33
from schema_1.tbl_327 tbl_327
left join lateral (
select field_7.*
from schema_1.tbl_327_to_tbl_332_field_7 field_7
where field_7.tbl_327_id=tbl_327.id
order by ...
limit 5
) field_7 on true
left join lateral (
select tbl_332.*
from schema_1.tbl_332_customid tbl_332
where tbl_332.id = field_7.tbl_332_id
order by ??
limit 5
) tbl_332 on true
left join lateral ...
group by tbl_327.field_43,tbl_327.field_1,tbl_327.field_2
) t
Note that you need an order by to go along with limit in order to get stable results. Replace the placeholders (... and ??) in the query with the relevant column or set of columns.

LEFT JOIN LATERAL showing error with SELECT

I am trying to run the query below:
SELECT
tc.ID_NUMBER AS AFC_RPP_Number,
hc.BUSINESS AS Business,
hc.DIRECTOR AS Director,
tc.REASON_FOR_REVISION AS Description_of_Change
FROM alo_gg.AWS_PIM tc
left join lateral(
select BUSINESS,DIRECTOR
FROM alo_ggg.tracker
WHERE START_DATE <= tc.DATE AND SO = tc.SO
ORDER BY START_DATE DESC
LIMIT 1
) hc;
The query above shows this error:
ERROR: syntax error at or near "SELECT"
left join lateral (SELECT BUSINESS,DIRECTOR...
If I run the subquery separately it gives me a result, but with lateral it gives me an error.
You need to add ON TRUE and remove the comma:
SELECT
tc.ID_NUMBER AS AFC_RPP_Number,
hc.BUSINESS AS Business,
hc.DIRECTOR AS Director,
tc.REASON_FOR_REVISION AS Description_of_Change
FROM alo_gg.AWS_PIM tc -- removing comma
left join lateral(
select BUSINESS,DIRECTOR
FROM alo_ggg.tracker
WHERE START_DATE <= tc.DATE AND SO = tc.SO
ORDER BY START_DATE DESC
LIMIT 1
) hc ON TRUE; -- adding `ON` clause

must appear in the group by clause in sql

I have a SQL statement and I am trying to add an ORDER BY. When I add it, I get an error:
ERROR: column "items.id" must appear in the GROUP BY clause or be used in an aggregate function
My query is:
WITH "has_children_cte"
AS (SELECT DISTINCT "parent_id" AS "item_id",
1 AS "has_children"
FROM "items")
SELECT "item_category_id",
Count(*) AS "count"
FROM "items"
INNER JOIN "items" AS "root_item"
ON ( "root_item"."id" = "items"."root_id" )
LEFT JOIN "item_types"
ON ( "items"."item_type_id" = "item_types"."id" )
LEFT JOIN "item_categories"
ON ( "item_categories"."id" = "item_types"."item_category_id" )
INNER JOIN "order_items"
ON ( "items"."order_item_id" = "order_items"."id" )
INNER JOIN "orders"
ON ( "order_items"."order_id" = "orders"."id" )
LEFT JOIN "has_children_cte"
ON ( "items"."id" = "has_children_cte"."item_id" )
WHERE ( ( "items"."parent_id" IS NULL )
AND ( "items"."state" != 'discarded' ) )
GROUP BY "item_category_id"
ORDER BY "items"."id";
I added the ORDER BY "items"."id"; and then got this error. When I tried adding items.id to the GROUP BY, I got wrong results.
Unfortunately I am unable to resolve this error.
The ORDER BY (logically) takes place after the aggregation. And after the aggregation, "items"."id" is not available in each row.
So just use an aggregation function:
ORDER BY MIN("items"."id")
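A small sketch of ordering grouped rows by an aggregate, run on SQLite through Python's sqlite3 module. The single items table and its rows are invented for illustration. Note that SQLite is more lenient than PostgreSQL and would not raise the error from the question, so this only demonstrates the fix, not the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for the "items" table.
    CREATE TABLE items (id INTEGER PRIMARY KEY, item_category_id INTEGER);
    INSERT INTO items VALUES (1, 30), (2, 10), (3, 30), (4, 20), (5, 10);
""")

# After GROUP BY there is one row per category, so the sort key must be
# an aggregate over the group: here, the smallest item id in each group.
rows = conn.execute("""
    SELECT item_category_id, COUNT(*) AS "count"
    FROM items
    GROUP BY item_category_id
    ORDER BY MIN(id)
""").fetchall()

print(rows)  # [(30, 2), (10, 2), (20, 1)]
```

Category 30 sorts first because its smallest item id is 1, then category 10 (smallest id 2), then category 20 (smallest id 4).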

How to use aggregated result in order by clause with math operation in Postgres

I have a query as below
select "products".*,
AVG(score_values.score) as average_scores,
(select count(*) from "comments" where "products"."id" = "comments"."product_id") as comments_count
from "products"
inner join "score_values" on "products"."id" = "score_values"."product_id" and "score_values"."active" = 1
group by "products"."id"
order by average_scores desc
limit 5
When I add a math operator to the ORDER BY clause, I get an error saying the column does not exist.
order by average_scores * 0.9 + comments_count * 5 / 1000 desc
[42703] ERROR: column "average_scores" does not exist
How can I solve this problem?
You have two options:
Repeat the expressions in the ORDER BY clause:
ORDER BY AVG(score_values.score) * 0.9
+ (select count(*) from "comments"
where "products"."id" = "comments"."product_id") * 5 / 1000
Use a subquery like GMB's answer suggests.
The second option is the better one.
Note that this behavior is documented:
A sort_expression can also be the column label or number of an output column, as in:
SELECT a + b AS sum, c FROM table1 ORDER BY sum;
SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
both of which sort by the first output column. Note that an output column name has to stand alone, that is, it cannot be used in an expression — for example, this is not correct:
SELECT a + b AS sum, c FROM table1 ORDER BY sum + c; -- wrong
This restriction is made to reduce ambiguity.
You can use aggregate expressions in the order by clause. The aggregates will be calculated once. Joining both comments and score_values multiplies the rows per product, so count distinct comments (this assumes comments has an id column) and use a left join so that products without comments are kept.
select
products.*,
avg(score_values.score) as average_scores,
count(distinct comments.id) as comments_count
from products
left join comments on products.id = comments.product_id
inner join score_values on products.id = score_values.product_id and score_values.active = 1
group by products.id
order by avg(score_values.score) * 0.9 + count(distinct comments.id) * 5 / 1000 desc
limit 5
You could work around this by wrapping your query as a subquery and ordering in the outer query. The limit must move to the outer query as well, otherwise it would keep five arbitrary rows before the sort is applied:
select *
from (
select "products".*,
AVG(score_values.score) as average_scores,
(select count(*) from "comments" where "products"."id" = "comments"."product_id") as comments_count
from "products"
inner join "score_values" on "products"."id" = "score_values"."product_id" and "score_values"."active" = 1
group by "products"."id"
) x
order by average_scores * 0.9 + comments_count * 5 / 1000 desc
limit 5

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
"SalesOrderItemId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
This selects the latest row per "EsnId" from "EsnsSalesOrderItems", ordered by the "createdAt" column. You can use any column that allows you to define an order on the rows that suits you. The later join to "SalesOrderItems" then has to use t."SalesOrderItemId".
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by.
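Here is a runnable sketch of the row_number() technique, using SQLite (version 3.25 or later, which added window functions) through Python's sqlite3 module. The table name, the snake_case column names and the rows are assumptions made for the example; the two rows for esn_id 6 mirror the dates given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for "EsnsSalesOrderItems".
    CREATE TABLE esns_sales_order_items (
        esn_id INTEGER,
        sales_order_item_id INTEGER,
        created_at TEXT
    );
    INSERT INTO esns_sales_order_items VALUES
        (6, 100, '2012-06-19'),
        (6, 101, '2012-07-19'),
        (7, 102, '2012-05-01');
""")

# Number the rows within each esn_id from newest to oldest,
# then keep only the newest one (rn = 1).
latest = conn.execute("""
    SELECT esn_id, sales_order_item_id, created_at
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY esn_id
                   ORDER BY created_at DESC
               ) AS rn
        FROM esns_sales_order_items
    ) AS t
    WHERE rn = 1
    ORDER BY esn_id
""").fetchall()

print(latest)  # [(6, 101, '2012-07-19'), (7, 102, '2012-05-01')]
```

Only the 2012-07-19 entry survives for esn_id 6, which is the behaviour the question asks for.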
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | inner JOIN LATERAL (a right join cannot be used when the lateral subquery references the left-hand table)
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL Server, you would use CROSS APPLY (or rather OUTER APPLY, since we need a left join), which evaluates the subquery or table-valued function once for each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE ON TRUE -- LEFT JOIN LATERAL still requires an ON clause
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...
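A concrete version of the subquery-in-ON idea, sketched with SQLite through Python's sqlite3 module. The contacts and units tables and their rows are hypothetical. The example adds an ORDER BY inside the subquery so that "first" is well defined, and uses a left join so that contacts without any unit are still returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: each contact can have several units over time.
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE units (
        id INTEGER PRIMARY KEY,
        contact_id INTEGER,
        date_from TEXT,
        unit TEXT
    );
    INSERT INTO contacts VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO units VALUES
        (10, 1, '2020-01-01', 'Sales'),
        (11, 1, '2021-01-01', 'Ops'),
        (12, 2, '2019-06-01', 'IT');
""")

# The correlated subquery picks exactly one units.id per contact
# (the earliest by date_from), so the join cannot multiply rows.
first_units = conn.execute("""
    SELECT c.name, u.unit
    FROM contacts c
    LEFT JOIN units u ON u.id = (
        SELECT u2.id
        FROM units u2
        WHERE u2.contact_id = c.id
        ORDER BY u2.date_from
        LIMIT 1
    )
    ORDER BY c.id
""").fetchall()

print(first_units)  # [('Ann', 'Sales'), ('Bob', 'IT'), ('Cy', None)]
```

Ann has two units but appears once, with the earlier one; Cy has none and still appears, with None for the unit.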