Parent-child sql query with order by and limit - sql

I have two tables DOCUMENT and ATTRIBUTES like these
DOCUMENT(id),
ATTRIBUTE(name, value, doc_fk).
I need to run a query that works like this "abstract query"
select top 100 documents
where $state='COMPLETED'
order by $creationDate
Where $state and $creationDate are two attributes.
Note that the limit is on documents, not attributes, and sort and filter are on two different attributes. The final query should return all document attributes, not only the filtered/sorted ones.
I was able to write this with a very complex query and I'm looking for better alternatives. I could post my solution if useful, but I do not want to point you in the, possibly, wrong direction.
It's ok to get a FEW extra documents, like 1000 instead of 100, and filter/sort in memory.
Could be ok for the limit not to be exact, like 74 instead of the required limit 100, but not too far from it.
Extra "soft" requirements:
the query should work with several databases (oracle, mysql and sqlserver), so weird analytic functions should be avoided unless available on all platforms
should work with JPA (eclipselink 2.4.0 implementation)
The expected output is something like this
DOC_ID ATTRIBUTE_NAME VALUE
123 state COMPLETED
123 creationDate 21/11/2012
123 userid someone
456 state COMPLETED
...

Ah, the flaws of an EAV design.
Try this.
select
top 100
document.*
from document
inner join attribute astate on document.id = astate.doc_fk
and astate.name='state'
and astate.value = 'completed'
inner join attribute acreation on document.id = acreation.doc_fk
and acreation.name='creationdate'
order by cast(acreation.value as date)
But it's only going to get more complicated if you persist with this EAV structure.
(PS. MySQL doesn't use TOP, but LIMIT instead)

SELECT doc_id, attr_name, attr_val, creationDate FROM
(
SELECT * FROM (
SELECT
doc.id as 'doc_id', attr.name as 'attr_name', null as 'attr_val', attr.value as 'creationDate'
FROM
ATTRIBUTE attr
LEFT JOIN
DOCUMENT doc ON attr.doc_fk = doc.id
WHERE
attr.name='creationDate'
ORDER BY creationDate desc;
) AS dt1
UNION ALL
SELECT * FROM(
SELECT
doc.id as 'doc_id', attr.name as 'attr_name', attr.value as 'attr_val', null as 'creationDate'
FROM
ATTRIBUTE attr
LEFT JOIN
DOCUMENT doc ON attr.doc_fk = doc.id;
) as dt2
) as dt0 GROUP BY doc_id ORDER by creationDate desc LIMIT 100;
Derived table 1 (dt1) gives you all the date attributes - to enable order your results by document's creation date.
Derived table 2 gives you all the attribute.. all put together by "union all", enables you to group by document, then order by the date of creation.
Hope this is in the right direction.

Related

Agregating a subquery

I try to find what I missed in the code to retrieve the value of "Last_Maintenance" in a table called "Interventions".
I try to understand the order rules of SQL and the particularities of subqueries without success.
Did I missed something, something basic or an important step?
---Interventions with PkState "Schedule_Visit" with the Last_Maintenance aggregation
SELECT Interventions.ID AS Nro_Inter,
--Interventions.PlacesList AS Nro_Place,
MaintenanceContracts.Num AS Nro_Contract,
Interventions.TentativeDate AS Schedule_Visit,
--MaintenanceContracts.NumberOfVisits AS Number_Visits_Contracts,
--Interventions.VisitNumber AS Visit_Number,
(SELECT MAX(Interventions.AssignmentDate)
FROM Interventions
WHERE PkState = 'AE4B42CF-0003-4796-89F2-2881527DFB26' AND PkMaintenanceContract IS NOT NULL) AS Last_Maintenance --PkState "Maintenance Executed"
FROM Interventions
INNER JOIN MaintenanceContracts ON MaintenanceContracts.Pk = Interventions.PkMaintenanceContract
WHERE PkState = 'AE4B42CF-0000-4796-89F2-2881527ABC26' AND PkMaintenanceContract IS NOT NULL --PkState "Schedule_Visit"
GROUP BY Interventions.AssignmentDate,
Interventions.ID,
Interventions.PlacesList,
MaintenanceContracts.Num,
Interventions.TentativeDate,
MaintenanceContracts.NumberOfVisits,
Interventions.VisitNumber
ORDER BY Nro_Contract
I try to use GROUP BY and HAVING clause in a sub query, I did not succeed. Clearly I am lacking some understanding.
Output
The output of "Last_Maintenance" is the last date of entire contracts in the DB, which is not the desirable output. The desirable output is to know the last date the maintenance was executed for each row, meaning, for each "Nro-Contract". Somehow I need to aggregate like I did below.
In opposition of what mention I did succeed in another table.
In the table Contracts I did had success as you can see.
SELECT
MaintenanceContracts.Num AS Nro_Contract,
MAX(Interventions.AssignmentDate) AS Last_Maintenance
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
FROM MaintenanceContracts
INNER JOIN Interventions ON Interventions.PkMaintenanceContract = MaintenanceContracts.Pk
WHERE MaintenanceContracts.ActiveContract = 2 OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
GROUP BY MaintenanceContracts.Num, MaintenanceContracts.Name,
MaintenanceContracts.StartDate,
MaintenanceContracts.EndDate
ORDER BY Nro_Contract
I am struggling to understanding how nested queries works and how I can leverage in a simple manner the queries.
I think you're mixed up in how aggregation works. The MAX function will get a single MAX value over the entire dataset. What you're trying to do is get a MAX for each unique ID. For that, you either use derived tables, subqueries or windowed functions. I'm a fan of using the ROW_NUMBER() function to assign a sequence number. If you do it correctly, you can use that row number to get just the most recent record from a dataset. From your description, it sounds like you always want to have the contract and then get some max values for that contract. If that is the case, then you're second query is closer to what you need. Using windowed functions in derived queries has the added benefit of not having to worry about using the GROUP BY clause. Try this:
SELECT
MaintenanceContracts.Num AS Nro_Contract,
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
i.AssignmentDate as Last_Maintenance
FROM MaintenanceContracts
INNER JOIN (
SELECT *
--This fuction will order the records for each maintenance contract.
--The most recent intervention will have a row_num = 1
, ROW_NUMBER() OVER(PARTITION BY PkMaintenanceContract ORDER BY AssignmentDate DESC) as row_num
FROM Interventions
) as i
ON i.PkMaintenanceContract = MaintenanceContracts.Pk
AND i.row_num = 1 --Used to get the most recent intervention.
WHERE MaintenanceContracts.ActiveContract = 2
OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
ORDER BY Nro_Contract
;

SAP Query IMRG Measure documents

I'm learning SAP queries.
I want to get all the Measure documents from an equipement.
To do that, I use 3 tables :
EQUI, IMPTT, IMRG
The query works but I have all documents instead I only want to get the last one by Date. But I can't do that. I'm sure that I have to add a custom field, but I have tried but none of them works.
For example, my last code :
select min( IMRG~INVTS ) IMRG~RECDV
from IMRG inner join IMPTT on
IMRG~POINT = IMPTT~POINT into (INVTS, IMRGVAL)
where IMRG~POINT = IMPTT-POINT AND
IMPTT~MPOBJ = EQUI-OBJNR
and IMRG~CANCL = '' group by IMRG~MDOCM IMRG~RECDV.
ENDSELECT.
Thanks for your help.
You will need to get the date from IMRG, and the inverted timestamp field, so the MIN() of this will be the most recent - that looks correct.
However your GROUP BY looks wrong. You should be grouping on the IMPTT~POINT field so that you get one record per measurement point. Note that one Point IMPTT can have many measurements (IMRG), so something like this:
SELECT EQUI-OBJNR, IMPTT~POINT, MIN(IMRG~IMRC_INVTS)
...
GROUP BY EQUI-OBJNR, IMPTT~POINT
If I got you correctly, you are trying to get the freshest measurement of the equipment disregard of measurement point. So you can try this query, which is not so beautiful, but it just works.
SELECT objnr COUNT(*) MIN( invts )
FROM equi AS eq
JOIN imptt AS tt
ON tt~mpobj = eq~objnr
JOIN imrg AS ig
ON ig~point = tt~point
INTO (wa_objnr, count, wa_invts)
WHERE ig~cancl = ''
GROUP BY objnr.
SELECT SINGLE recdv FROM imrg JOIN imptt ON imptt~point = imrg~point INTO wa_imrgval WHERE invts = wa_invts AND imptt~mpobj = wa_objnr.
WRITE: / wa_objnr, count, wa_invts, wa_imrgval.
ENDSELECT.

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG (media_files. ID) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine, the goal was to extract album/artist combinations and in the result set have an array of media files ids for that particular combo.
The problem is that I have another column in media files, which is artwork.
artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).
Is that possible to do with only SQL/Window Functions (first_value over..)?
Yes, it's possible. First, let's tweak your query by adding alias and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like an pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate and it isn't really worth implementing it for the job. You could use min(artwork) or max(artwork) and you may find that this actually performs better than the later solutions.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can at a performance cost randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
Abitrary / random pick
If you are looking for a random selection, #Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For arbitrary selection - not truly random, the result will depend on the physical order of rows in the table and implementation-specifics:
WITH x AS (
SELECT m.album, m.artist, m.id, m.artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
)
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM (
SELECT album, artist, array_agg(id) AS media_file_ids
FROM x
) a
JOIN (
SELECT DISTINCT ON (1,2) album, artist, artwork
FROM x
) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);

SQL query on a condition

I'm writing a query to retrieve translated content. I want it so that if there isn't a translation for the given language id, it automatically returns the translation for the default language, with Id 1.
select Translation.Title
,Translation.Summary
from Translation
where Translation.FkLanguageId = 3
-- If there is no LanguageId of 3, select the record with LanguageId of 1.
I'm working in MS SQL but I think the theory is not DBMS-specific.
Thanks in advance.
This assumes one row per Translation only, based on how you phrased the question. If you have multiple rows per FkLanguageId and I've misunderstood, please let us know and the query becomes more complex of course
select TOP 1
Translation.Title
,Translation.Summary
from
Translation
where
Translation.FkLanguageId IN (1, 3)
ORDER BY
FkLanguageId DESC
You'd use LIMIT in another RDBMS
Assuming the table contains different phrases grouped by PhraseId
WITH Trans As
(
select Translation.Title
,Translation.Summary
,ROW_NUMBER() OVER (PARTITION BY PhraseId ORDER BY FkLanguageId DESC) RN
from Translation
where Translation.FkLanguageId IN (1,3)
)
SELECT *
FROM Trans WHERE RN=1
This assumes the existance of a TranslationKey that associates one "topic" with several different translation languages:
SELECT
isnull(tX.Title, t1.Title) Title
,isnull(tX.Summary, t1.Summary) Summary
from Translation t1
left outer join Translation tX
on tx.TranslationKey = t1.Translationkey
and tx.FkLanguageId = #TargetLanguageId
where t1.FkLanguageId = 1 -- "Default
Maybe this is a dirty solution, but it can help you
if not exists(select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3)
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 1
else
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3
Since your reference to pastie.org shows that you're looking up phrases or specific menu item names in a table I'm going to assume that there is a phrase ID to identify the phrases in question.
SELECT ISNULL(forn_lang.Title, default_lang.Title) Title,
ISNULL(forn_lang.Summary, default_lang.Summary) Summary
FROM Translation default_lang
LEFT OUTER JOIN Translation forn_lang ON default_lang.PhraseID = forn_lang.PhraseID AND forn_lang.FkLanguageId = 3
WHERE default_lang.FkLanguageId = 1

NHibernate Return Values

I am currently working on a project using NHiberate as the DAL with .NET 2.0 and NHibernate 2.2.
Today I came to a point where I had to join a bunch of entities/collections to get what I want. That is fine.
What got me was that I do not want the query to return a list of objects of a certain entity type but rather the result would include various properties from different entities.
The following query is not what I am doing but it is kind of query that I am talking about here.
select order.id, sum(price.amount), count(item)
from Order as order
join order.lineItems as item
join item.product as product,
Catalog as catalog
join catalog.prices as price
where order.paid = false
and order.customer = :customer
and price.product = product
and catalog.effectiveDate < sysdate
and catalog.effectiveDate >= all (
select cat.effectiveDate
from Catalog as cat
where cat.effectiveDate < sysdate
)
group by order
having sum(price.amount) > :minAmount
order by sum(price.amount) desc
My question is, in this case what type result is supposed to be returned? It is certainly not of type Order, neither is of type LineItems.
Thanks for your help!
John
you can always use List of object[] for returning data and it will work fine.
This is called a projection, and it happens any time you specify an explicit select clause that contains rows from various tables (or even aggregate / summary data from a single table).
Using LINQ you can create anonymous objects to store these rows of data, like this:
var crunchies = (from foo in bar
where foo.baz == quux
select new { foo.corge, foo.grault }).ToList();
Then you can do crunchies[0].corge for example to pull out the rows & columns.
If you are using NHibernate.Linq this will "just work".
If you're using HQL or Criteria API, then what Fahad mentioned will work. You'll get a List<object[]> as a result, and the index of the array references the order of the columns that you returned in your select clause.