Set limit to array_agg() - sql

I have the following Postgres query:
SELECT array_agg("Esns".id )
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2
LIMIT 2;
The limit will affect the rows. I want it to limit the array_agg() to 2 items. The following query works but I get my output with each entry in quotes:
SELECT array_agg ("temp")
FROM (
SELECT "Esns".id
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2
LIMIT 4
) as "temp" ;
This give me the following output
{(13),(14),(15),(12)}
Any ideas?

select id[1], id[2]
from (
SELECT array_agg("Esns".id ) as id
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2
) s
or if you want the output as array you can slice it:
SELECT (array_agg("Esns".id ))[1:2] as id_array
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2

The parentheses (not "quotes") in the result are decorators for the row literals. You are building an array of whole rows (which happen to contain only a single column). Instead, aggregate only the column.
Also, direct array construction from a query result is typically simpler and faster:
SELECT ARRAY (
SELECT e.id
FROM public."Esns" e
JOIN public."PurchaseOrderItems" p ON p.id = e."PurchaseOrderItemId"
WHERE p."GradeId" = 2
-- ORDER BY ???
LIMIT 4 -- or 2?
)
You need to ORDER BY something if you want a stable result and / or pick certain rows. Otherwise the result is arbitrary and can change with every next call.
While being at it I rewrote the query with explicit JOIN syntax, which is generally preferable, and used table aliases to simplify.

Related

Agregating a subquery

I try to find what I missed in the code to retrieve the value of "Last_Maintenance" in a table called "Interventions".
I try to understand the order rules of SQL and the particularities of subqueries without success.
Did I missed something, something basic or an important step?
---Interventions with PkState "Schedule_Visit" with the Last_Maintenance aggregation
SELECT Interventions.ID AS Nro_Inter,
--Interventions.PlacesList AS Nro_Place,
MaintenanceContracts.Num AS Nro_Contract,
Interventions.TentativeDate AS Schedule_Visit,
--MaintenanceContracts.NumberOfVisits AS Number_Visits_Contracts,
--Interventions.VisitNumber AS Visit_Number,
(SELECT MAX(Interventions.AssignmentDate)
FROM Interventions
WHERE PkState = 'AE4B42CF-0003-4796-89F2-2881527DFB26' AND PkMaintenanceContract IS NOT NULL) AS Last_Maintenance --PkState "Maintenance Executed"
FROM Interventions
INNER JOIN MaintenanceContracts ON MaintenanceContracts.Pk = Interventions.PkMaintenanceContract
WHERE PkState = 'AE4B42CF-0000-4796-89F2-2881527ABC26' AND PkMaintenanceContract IS NOT NULL --PkState "Schedule_Visit"
GROUP BY Interventions.AssignmentDate,
Interventions.ID,
Interventions.PlacesList,
MaintenanceContracts.Num,
Interventions.TentativeDate,
MaintenanceContracts.NumberOfVisits,
Interventions.VisitNumber
ORDER BY Nro_Contract
I try to use GROUP BY and HAVING clause in a sub query, I did not succeed. Clearly I am lacking some understanding.
Output
The output of "Last_Maintenance" is the last date of entire contracts in the DB, which is not the desirable output. The desirable output is to know the last date the maintenance was executed for each row, meaning, for each "Nro-Contract". Somehow I need to aggregate like I did below.
In opposition of what mention I did succeed in another table.
In the table Contracts I did had success as you can see.
SELECT
MaintenanceContracts.Num AS Nro_Contract,
MAX(Interventions.AssignmentDate) AS Last_Maintenance
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
FROM MaintenanceContracts
INNER JOIN Interventions ON Interventions.PkMaintenanceContract = MaintenanceContracts.Pk
WHERE MaintenanceContracts.ActiveContract = 2 OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
GROUP BY MaintenanceContracts.Num, MaintenanceContracts.Name,
MaintenanceContracts.StartDate,
MaintenanceContracts.EndDate
ORDER BY Nro_Contract
I am struggling to understanding how nested queries works and how I can leverage in a simple manner the queries.
I think you're mixed up in how aggregation works. The MAX function will get a single MAX value over the entire dataset. What you're trying to do is get a MAX for each unique ID. For that, you either use derived tables, subqueries or windowed functions. I'm a fan of using the ROW_NUMBER() function to assign a sequence number. If you do it correctly, you can use that row number to get just the most recent record from a dataset. From your description, it sounds like you always want to have the contract and then get some max values for that contract. If that is the case, then you're second query is closer to what you need. Using windowed functions in derived queries has the added benefit of not having to worry about using the GROUP BY clause. Try this:
SELECT
MaintenanceContracts.Num AS Nro_Contract,
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
i.AssignmentDate as Last_Maintenance
FROM MaintenanceContracts
INNER JOIN (
SELECT *
--This fuction will order the records for each maintenance contract.
--The most recent intervention will have a row_num = 1
, ROW_NUMBER() OVER(PARTITION BY PkMaintenanceContract ORDER BY AssignmentDate DESC) as row_num
FROM Interventions
) as i
ON i.PkMaintenanceContract = MaintenanceContracts.Pk
AND i.row_num = 1 --Used to get the most recent intervention.
WHERE MaintenanceContracts.ActiveContract = 2
OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
ORDER BY Nro_Contract
;

Assistance with SQL Query (aggregating)

I have a requirement to create a Sales report and I have a sql query:
SELECT --top 1
t.branch_no as TBranchNo,
t.workstation_no as TWorkstation,
t.tender_ref_no as TSaleRefNo,
t.tender_line_no as TLineNo,
t.tender_code as TCode,
T.contribution as TContribution,
l.sale_line_no as SaleLineNo
FROM TENDER_LINES t
LEFT JOIN SALES_TX_LINES l
on t.branch_no = l.branch_no and t.workstation_no = l.workstation_no and t.tender_ref_no = l.sale_tx_no
where l.sale_tx_no = 2000293 OR l.sale_tx_no = 1005246 --OR sale_tx_no = 1005261
order by t.tender_ref_no asc,
l.sale_line_no desc
The results of the query look like the following:
The results I am trying to achieve is:
With only 1 line for transaction 2 either SaleLineNo 1 or 2, while still have=ing both lines for transaction 1 because the TCode is different.
Thanks
I am using SSQL2012.
Not exactly sure on what data you have, but you might want to try
GROUP BY TlineNo, TCode ...
But you have to keep a look on not to group by something that would result in duplicate contribution values.
You can use the ROW_NUMBER function that allows to partition the rows in groups, and number the lines inside each group starting by one. If you choose the right columns to define the partition, and keep only the rows with "row_number = 1`, you have solved the first part of your problem, i.e. discarding the lines that don't have to appear in the report. (See the sample sin the linked documentation, they're quite clear).
Once you have solved this problem, you simply have to repeat what you're doing, but on the result of this data, instead of the original data. You can use a view, a CTE, or a subselect to achieve your result, i.e.
With view:
CREATE VIEW FilteredData AS -- here the rank function query, then selct from the view
SELECT --here your current query --
FROM FilteredData
With CTE
WITH -- here the rank function query
SELECT -- your current querym, from the CTE
With subselect
SELECT -- your current query
FROM (SELECT FROM -- here the rank function query -- )
Appreciate your assistance with my query. After playing around, I have found a solution that works just as I want. It is as below: I did a Group by as hinted by #Yogesh86 on a few fields.
SELECT
MAX(t.branch_no) as TBranchNo,
Max(t.workstation_no) as TWorkstation,
t.tender_ref_no as TSaleRefNo,
Max(t.tender_line_no) as TLineNo,
t.tender_code as TCode,
MAx(T.contribution) as TContribution,
MAX(l.sale_line_no) as SaleLineNo
FROM TENDER_LINES t
LEFT JOIN SALES_TX_LINES l
on t.branch_no = l.branch_no and t.workstation_no = l.workstation_no and t.tender_ref_no = l.sale_tx_no
where l.sale_tx_no = 2000293 OR l.sale_tx_no = 1005246 --OR sale_tx_no = 1005261
GROUP BY
t.tender_ref_no,
t.tender_line_no,
t.tender_code

How to do an In Statement with a sub query returning 2 columns and one of the columns is a Count

I have this query:
SELECT *
FROM GUITARS.FENDER
WHERE FENDER.GUITARTYPE IN (
SELECT GUITARTYPE,Count(*)
FROM GUITARS.GUITAR_TYPE
WHERE GuitarColor = 'RED'
Group By GUITARTYPE
Having Count(*) = 1)
Basically I want to make sure I am only checking the Guitartypes that don't have duplicates with a count. The issue is the IN is only checking for 1 column, but i need the count(*)in there for instances of more than one guitar type. Is there a way to make this query work, or possible another way around doing the count.
You don't need to have the count() returned in the select statement, having the group by and the count() is sufficient.
SELECT *
FROM GUITARS.FENDER
WHERE FENDER.GUITARTYPE IN (
SELECT GUITARTYPE
FROM GUITARS.GUITAR_TYPE
WHERE GuitarColor = 'RED'
Group By GUITARTYPE
Having Count(*) = 1)
Adding the code so it looks right.

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG (media_files. ID) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine, the goal was to extract album/artist combinations and in the result set have an array of media files ids for that particular combo.
The problem is that I have another column in media files, which is artwork.
artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).
Is that possible to do with only SQL/Window Functions (first_value over..)?
Yes, it's possible. First, let's tweak your query by adding alias and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like an pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate and it isn't really worth implementing it for the job. You could use min(artwork) or max(artwork) and you may find that this actually performs better than the later solutions.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can at a performance cost randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
Abitrary / random pick
If you are looking for a random selection, #Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For arbitrary selection - not truly random, the result will depend on the physical order of rows in the table and implementation-specifics:
WITH x AS (
SELECT m.album, m.artist, m.id, m.artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
)
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM (
SELECT album, artist, array_agg(id) AS media_file_ids
FROM x
) a
JOIN (
SELECT DISTINCT ON (1,2) album, artist, artwork
FROM x
) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);

MAX Subquery in SQL Anywhere Returning Error

In sqlanywhere 12 I wrote the following query which returns two rows of data:
SELECT "eDatabase"."Vendor"."VEN_CompanyName", "eDatabase"."OrderingInfo"."ORD_Timestamp"
FROM "eDatabase"."OrderingInfo"
JOIN "eDatabase"."Vendor"
ON "eDatabase"."OrderingInfo"."ORD_VEN_FK" = "eDatabase"."Vendor"."VEN_PK"
WHERE ORD_INV_FK='7853' AND ORD_DefaultSupplier = 1
Which returns:
'**United Natural Foods IN','2018-02-07 15:05:15.513'
'Flora ','2018-02-07 14:40:07.491'
I would like to only return the row with the maximum timestamp in the column "ORD_Timestamp". After simply trying to select by MAX("eDatabase"."OrderingInfo"."ORD_Timestamp") I found a number of posts describing how that method doesn't work and to use a subquery to obtain the results.
I'm having difficulty creating the subquery in a way that works and with the following query I'm getting a syntax error on my last "ON":
SELECT "eDatabase"."Vendor"."VEN_CompanyName", "eDatabase"."OrderingInfo"."ORD_Timestamp"
FROM ( "eDatabase"."OrderingInfo"
JOIN
"eDatabase"."OrderingInfo"
ON "eDatabase"."Vendor"."VEN_PK" = "eDatabase"."OrderingInfo"."ORD_VEN_FK" )
INNER JOIN
(SELECT "eDatabase"."Vendor"."VEN_CompanyName", MAX("eDatabase"."OrderingInfo"."ORD_Timestamp")
FROM "eDatabase"."OrderingInfo")
ON "eDatabase"."Vendor"."VEN_PK" = "eDatabase"."OrderingInfo"."ORD_VEN_FK"
WHERE ORD_INV_FK='7853' AND ORD_DefaultSupplier = 1
Does anyone know how I can adjust this to make the query correctly select only the max ORD_Timestamp row?
try this:
SELECT TOP 1 "eDatabase"."Vendor"."VEN_CompanyName", "eDatabase"."OrderingInfo"."ORD_Timestamp"
FROM "eDatabase"."OrderingInfo"
JOIN "eDatabase"."Vendor"
ON "eDatabase"."OrderingInfo"."ORD_VEN_FK" = "eDatabase"."Vendor"."VEN_PK"
WHERE ORD_INV_FK='7853' AND ORD_DefaultSupplier = 1
order by "ORD_Timestamp" desc
this orders them biggest on to and say only hsow the top row