DQL Select returns several rows for repeatable values - documentum

I am using Documentum Developer Edition 6.6. I have run the following DQL:
select "r_object_id", "r_modify_date", "r_version_label","i_position" ,"object_name" from "dm_document" where FOLDER (ID('0bde75d18000cfa4')) and "r_object_type"='dm_document' order by "r_modify_date" asc, "i_position" desc
I expected the DQL to return one row for each dm_document object. I remember that my earlier requests with this DQL did exactly that: one row for each document. But today I see that for some of the dm_document objects only one row is returned, whereas for other dm_document objects several rows are returned per object, like the following:
09de75d18000d514 7/28/2011 3:41 PM 1.0,CURRENT -1,-2 Doc1
09de75d18000d515 7/28/2011 3:41 PM 1.0 -1 Doc2
...
09de75d18000d515 7/28/2011 3:41 PM CURRENT -2 Doc2
In other words, for 09de75d18000d514 one row was returned (with the repeatable "r_version_label" and "i_position" as arrays), whereas for another document, 09de75d18000d515, the repeatable properties were returned as separate rows.
Why is that? To me this looks like a bug, because the documents 09de75d18000d514 and 09de75d18000d515 have no essential differences; they are just ordinary dm_document instances, nothing more.
And the more important question is: what can I do? I see that the problem disappears if I remove "i_position" desc from the DQL; then each dm_document object is returned as a single row. But I needed this "i_position" desc sorting to have "r_version_label" sorted according to the corresponding values of "i_position" (each item of the "r_version_label" array corresponds to an item of the "i_position" array that contains its "position number").
Maybe my assumption was wrong that Documentum would order "r_version_label" according to "i_position" because I specified "i_position" desc? If so, I now see only one way to cope with this:
I use the DQL without "i_position" desc
My software (it uses DFS) sorts the "r_version_label" items itself after the DQL has returned the results, using their indexes from "i_position"
Maybe some better solution is available?

I assume you want to get rows which have r_version_label in the same order as in the objects, had they been fetched.
I know that for that you can use 'order by r_object_id, i_position desc'.
Since you want ordering on r_modify_date as well, you could try 'order by r_modify_date asc, r_object_id, i_position desc' or just do the date sorting in your code.
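If you end up with one row per repeating value (as with 'order by r_object_id, i_position desc'), the rows still have to be folded back into one record per document on the client. Here is a minimal sketch of that step in plain Python. It is not DFS code: the row tuples, the ISO date strings and the collapse helper are assumptions made up for illustration, shaped like the result in the question.

```python
from itertools import groupby

# Hypothetical flattened rows, shaped like the DQL result in the question:
# (r_object_id, r_modify_date, r_version_label, i_position, object_name)
# Dates are written as ISO strings here so that they sort as plain text.
rows = [
    ("09de75d18000d514", "2011-07-28 15:41", "CURRENT", -2, "Doc1"),
    ("09de75d18000d514", "2011-07-28 15:41", "1.0", -1, "Doc1"),
    ("09de75d18000d515", "2011-07-28 15:41", "CURRENT", -2, "Doc2"),
    ("09de75d18000d515", "2011-07-28 15:41", "1.0", -1, "Doc2"),
]


def collapse(rows):
    """Return one dict per object, with labels ordered by i_position descending."""
    # Same ordering as 'order by r_modify_date asc, r_object_id, i_position desc'.
    ordered = sorted(rows, key=lambda r: (r[1], r[0], -r[3]))
    result = []
    for (modify_date, object_id), group in groupby(ordered, key=lambda r: (r[1], r[0])):
        group = list(group)
        result.append({
            "r_object_id": object_id,
            "r_modify_date": modify_date,
            "object_name": group[0][4],
            "r_version_label": [r[2] for r in group],
        })
    return result


documents = collapse(rows)
```

Each entry in documents then carries its version labels as one list, in i_position order, whether the server returned them in one row or in several.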

Related

How to get the data of the newly accessed record by a query on PostgreSQL using its internal variables and functions?

Let's say I have the following 'items' table in my PostgreSQL database:
id  item  value
1   a     10
2   b     20
3   c     30
For some reason I can't control I need to run the following query:
select max(value) from items;
which will return 30 as the result.
At this point, I know that I can find the record that contains that value using simple select statements, etc. That's not the actual problem.
My real questions are:
Does PostgreSQL know (behind the scenes) what the ID of that record is, although the query shows only the max value of the column 'value'?
If yes, can I have access to that information and, therefore, get the ID and other data from the found record?
I'm not allowed to create indexes and sequences, or change the way the max value is retrieved. That's a given. I need to work from that point onward and find a solution (which I have, actually, from regular query work).
I'm just guessing that the database knows in which record that information (30) is and that I could have access to it.
I've been searching for an answer for a couple of hours but wasn't able to find anything.
What am I missing? Any ideas?
Note: postgres (PostgreSQL) 12.5 (Ubuntu 12.5-0ubuntu0.20.10.1)
You can simply extract the whole record that contains max(value) w/o bothering about Postgres internals like this:
select id, item, "value"
from items
order by "value" desc
limit 1;
I do not think that using undocumented "behind the scenes" ways is a good idea at all. The planner is smart enough to do exactly what you need w/o extra work.
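To see the difference between the two queries side by side, here is a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL (that substitution is my assumption, purely so the example runs anywhere); the table and data are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL 'items' table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE items (id INTEGER, item TEXT, "value" INTEGER)')
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 20), (3, "c", 30)],
)

# The aggregate returns only the value; the row it came from is not exposed.
max_value = conn.execute('SELECT max("value") FROM items').fetchone()[0]

# Sorting and limiting returns the whole record that holds the maximum.
top_row = conn.execute(
    'SELECT id, item, "value" FROM items ORDER BY "value" DESC LIMIT 1'
).fetchone()
```

Here max_value is just the number, while top_row is the full (id, item, value) tuple. If several rows share the maximum, LIMIT 1 returns only one of them, and which one is not defined unless you add a tie-breaker to the ORDER BY.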

Why is there no `select last` or `select bottom` in SQL Server like there is `select top`?

I know this might probably sound like a stupid question, but please bear with me.
In SQL-server we have
SELECT TOP N ...
now in that we can get the first n rows in ascending order (by default), cool. If we want records to be sorted on any other column, we just specify that in the order by clause, something like this...
SELECT TOP N ... ORDER BY [ColumnName]
Even more cool. But what if I want the last row? I just write something like this...
SELECT TOP N ... ORDER BY [ColumnName] DESC
But there is a slight concern with that. I said concern and not issue because it isn't actually an issue. This way I could get the last row based on that column, but what if I want the last row that was inserted? I know about SCOPE_IDENTITY, IDENT_CURRENT and @@IDENTITY, but consider a heap (a table without a clustered index) without any identity column, and multiple accesses from many places (please don't go too far into how and when these multiple operations are happening; that doesn't concern the main point). So in this case there doesn't seem to be an easy way to find which row was actually inserted last. Some might answer this as
If you do a select * from [table] the last row shown in the sql result window will be the last one inserted.
To anyone thinking about this: this is not actually the case, at least not always, and it is not something you can rely on (msdn, please read the Advanced Scanning section).
So the question boils down to this, as in the title itself. Why doesn't SQL Server have a
SELECT LAST
or say
SELECT BOTTOM
or something like that, where we don't have to specify the Order By and then it would give the last record inserted in the table at the time of executing the query (again I am not going into details about how would this result in case of uncommitted reads or phantom reads).
But if someone still argues that we can't talk about this without talking about these read levels, then, for them, we could make it behave the same way as TOP works, just the opposite. But if your argument is then that we don't need it, as we can always do
SELECT TOP N ... ORDER BY [ColumnName] DESC
then I really don't know what to say here. I know we can do that, but is there any relational reason, or some semantic reason, or some other reason why we don't have or can't have this SELECT LAST/BOTTOM? I am not looking for a way to do it with ORDER BY; I am looking for the reason why we don't have it or can't have it.
Extra
I don't know much about how NoSQL works, but I've worked (just a little bit) with MongoDB and Elasticsearch, and there doesn't seem to be anything like this there either. Is the reason they don't have it that no one ever had it before, or is it for some reason not plausible?
UPDATE
I don't need to know whether I need to specify order by descending or not. Please read the question and understand my concern before answering or commenting. I know how I will get the last row. That's not even the question; the main question boils down to why no select last/bottom like its counterpart.
UPDATE 2
After the answers from Vladimir and Pieter, I just wanted to add that I know the order is not guaranteed if I do a SELECT TOP without ORDER BY. I know that what I wrote earlier in the question might give the impression that I don't know that's the case, but if you look a little further down, I have given a link to msdn and have mentioned that SELECT TOP without ORDER BY doesn't guarantee any ordering. So please don't add to your answer that my statement is wrong, as I have already clarified that myself a couple of lines later (where I provided the link to msdn).
You can think of it like this.
SELECT TOP N without ORDER BY returns some N rows, neither first, nor last, just some. Which rows it returns is not defined. You can run the same statement 10 times and get 10 different sets of rows each time.
So, if the server had a syntax SELECT LAST N, then result of this statement without ORDER BY would again be undefined, which is exactly what you get with existing SELECT TOP N without ORDER BY.
You have stressed in your question that you know and understand what I've written below, but I'll still keep it to make it clear for everyone reading this later.
Your first phrase in the question
In SQL-server we have SELECT TOP N ... now in that we can get the
first n rows in ascending order (by default), cool.
is not correct. With SELECT TOP N without ORDER BY you get N "random" rows. Well, not really random, the server doesn't jump randomly from row to row on purpose. It chooses some deterministic way to scan through the table, but there could be many different ways to scan the table and server is free to change the chosen path when it wants. This is what is meant by "undefined".
The server doesn't track the order in which rows were inserted into the table, so again your assumption that results of SELECT TOP N without ORDER BY are determined by the order in which rows were inserted in the table is not correct.
So, the answer to your final question
why no select last/bottom like its counterpart.
is:
without ORDER BY results of SELECT LAST N would be exactly the same as results of SELECT TOP N - undefined.
with ORDER BY result of SELECT LAST N ... ORDER BY X ASC is exactly the same as result of SELECT TOP N ... ORDER BY X DESC.
So, there is no point to have two key words that do the same thing.
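The second point can be checked with a small experiment. This is only a sketch: it uses Python's sqlite3 module with LIMIT in place of T-SQL's TOP (my substitution, so it runs without a SQL Server instance), and the "LAST N" side is emulated on the client because no such keyword exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (5, 1, 4, 2, 3)])

n = 2

# Equivalent of SELECT TOP N ... ORDER BY x DESC, written with LIMIT.
top_desc = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY x DESC LIMIT ?", (n,))]

# Hypothetical SELECT LAST N ... ORDER BY x ASC: sort ascending, keep the final n rows.
all_asc = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY x ASC")]
last_asc = all_asc[-n:]
```

Both variants pick the same rows (5 and 4 here); only the order in which they are presented differs, and that is trivially fixed by sorting the small result again.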
There is a good point in the Pieter's answer: the word TOP is somewhat misleading. It really means LIMIT result set to some number of rows.
By the way, since SQL Server 2012 they added support for ANSI standard OFFSET:
OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
[
FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
]
Here adding another keyword was justified because it is ANSI standard AND it adds important functionality, pagination, which didn't exist before.
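For readers who have not used offset-based pagination, here is a rough sketch of the idea. It runs against SQLite through Python's sqlite3 module, which spells it LIMIT ... OFFSET rather than OFFSET ... FETCH; the table, the data and the fetch_page helper are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 8)],
)


def fetch_page(conn, page, page_size):
    """Return one page of rows; pages are numbered from 1."""
    offset = (page - 1) * page_size
    # ORDER BY is what makes the pages stable; without it they are undefined.
    cur = conn.execute(
        "SELECT id, name FROM t ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


page2 = fetch_page(conn, 2, 3)
page3 = fetch_page(conn, 3, 3)
```

Note that pagination depends on ORDER BY just as TOP does: the offset only means something relative to a defined ordering.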
I would like to thank @Razort4x here for providing a very good link to MSDN in his question. The "Advanced Scanning" section there has an excellent example of a mechanism called "merry-go-round scanning", which demonstrates why the order of the results returned from a SELECT statement cannot be guaranteed without an ORDER BY clause.
This concept is often misunderstood and I've seen many questions here on SO that would greatly benefit if they had a quote from that link.
The answer to your question
Why doesn't SQL Server have a SELECT LAST or say SELECT BOTTOM or
something like that, where we don't have to specify the ORDER BY and
then it would give the last record inserted in the table at the time
of executing the query (again I am not going into details about how
would this result in case of uncommitted reads or phantom reads).
is:
The devil is in the details that you want to omit. To know which record was the "last inserted in the table at the time of executing the query" (and to know this in a somewhat consistent/non-random manner) the server would need to keep track of this information somehow. Even if it is possible in all scenarios of multiple simultaneously running transactions, it is most likely costly from the performance point of view. Not every SELECT would request this information (in fact very few or none at all), but the overhead of tracking this information would always be there.
So, you can think of it like this: by default the server doesn't do anything specific to know/keep track of the order in which the rows were inserted, because it affects performance, but if you need to know that, you can use, for example, an IDENTITY column. Microsoft could have designed the server engine in such a way that it required an IDENTITY column in every table, but they made it optional, which is good in my opinion. I know better than the server which of my tables need an IDENTITY column and which do not.
Summary
I'd like to summarise that you can look at SELECT LAST without ORDER BY in two different ways.
1) When you expect SELECT LAST to behave in line with existing SELECT TOP. In this case result is undefined for both LAST and TOP, i.e. result is effectively the same. In this case it boils down to (not) having another keyword. Language developers (T-SQL language in this case) are always reluctant to add keywords, unless there are good reasons for it. In this case it is clearly avoidable.
2) When you expect SELECT LAST to behave as SELECT LAST INSERTED ROW. This should, by the way, extend the same expectation to SELECT TOP, which would have to behave as SELECT FIRST INSERTED ROW, or else new keywords LAST_INSERTED and FIRST_INSERTED would be needed to keep the existing keyword TOP intact. In this case it boils down to the performance and added overhead of such behaviour. At the moment the server allows you to avoid this performance penalty if you don't need this information. If you do need it, IDENTITY is a pretty good solution if you use it carefully.
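As a rough illustration of the IDENTITY approach, here is a sketch using SQLite's AUTOINCREMENT column through Python's sqlite3 module as a stand-in for a SQL Server IDENTITY column (that substitution, and the events table with its seq and payload columns, are my assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq plays the role of an IDENTITY column: the engine assigns increasing values.
conn.execute(
    "CREATE TABLE events (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)
for payload in ("zebra", "apple", "mango"):
    conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))

# "Last inserted" is now an ordinary ORDER BY on the tracking column.
last_inserted = conn.execute(
    "SELECT seq, payload FROM events ORDER BY seq DESC LIMIT 1"
).fetchone()
```

The insert order is now explicit data that you pay for only in the tables where you ask for it. The "carefully" part still applies: with concurrent transactions, the order in which such values are handed out does not have to match the order in which the transactions commit.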
There is no select last because there is no need for it. Consider a "select top 1 * from table" . Top 1 would get you the first row that is returned. And then the process stops.
But there are no guarantees about ordering if you don't specify an order by. So it may as well be any row in the dataset you get back.
Now do a "select last 1 * from table". Now the database will have to process all the rows in order to get you the last one.
And because ordering is non-deterministic, it may as well be the same result as from the select "top 1".
See where the problem comes from now? Without an order by, top and last are actually the same, only "last" will take more time. And with an order by, there's really only a need for top.
SELECT TOP N ...
now in that we can get the first n rows in ascending order (by
default), cool. If we want records to be sorted on any other column,
we just specify that in the order by clause, something like this...
What you say here is totally wrong and absolutely NOT how it works. There is no guarantee on what order you get. Ascending order on what ?
create table mytest(id int, id2 int)
insert into mytest(id,id2)values(1,5),(2,4),(3,3),(4,2),(5,1)
select top 1 * from mytest
select * from mytest
create clustered index myindex on mytest(id2)
select top 1 * from mytest
select * from mytest
insert into mytest(id,id2)values(6,0)
select top 1 * from mytest
Try this code line by line and see what you get with the last "select top 1".....you get in this case the last inserted record.
update
I think you understand that "select top 1 * from table" basically means: "Select a random row from the table".
So what would last mean? "Select the last random row from the table?" Wouldn't the last random row from a table be conceptually the same as saying any 1 random row from the table? And if that's true, top and last are the same, so there is no need for last.
Update 2
In hindsight I was happier with the syntax MySQL uses: LIMIT.
Top doesn't say anything about ordering; it is only there to specify the number of rows to be returned.
Limits the rows returned in a query result set to a specified number of rows or percentage of rows in SQL Server 2014.
The reasons why SELECT LAST_INSERTED does not make sense.
It cannot be easily applied to non-heap tables.
Heap data can be freely moved by the DBMS, so that "natural" order is subject to change. To keep it, the system needs some additional mechanism, which seems to be a useless waste.
If really desired it can be simulated by adding some 'auto-increment' column.
SQL Server ordering is arbitrary unless otherwise stated. It's set based, therefore you must define what your set is. Correct: SCOPE_IDENTITY() or the OUTPUT clause is the right way to capture the last inserted record. Why would you do inserts on a heap that you need to reference chronologically anyway? That's super bad database design.

When no 'Order by' is specified, what order does a query choose for your record set?

I was always under the impression that a query with no 'Order by' specified would order the results by what was specified in the where clause.
For instance, my where clause states:
WHERE RESULTS_I_AM_SEARCHING_FOR IN
ITEM 1
ITEM 2
ITEM 3
I would have imagined that the results returned for items 1, 2 and 3 would be in the order specified in the where, however this is not the case. Does anyone know what order it sorts them in when not specified?
Thanks and sorry for the really basic question!
Damon
If you don't specify an ORDER BY, then there is NO ORDER defined.
The results can be returned in an arbitrary order - and that might change over time, too.
There is no "natural order" or anything like that in a relational database (at least in all that I know of). The only way to get a reliable ordering is by explicitly specifying an ORDER BY clause.
Update: for those who still don't believe me - here's two excellent blog posts that illustrate this point (with code samples!) :
Conor Cunningham (Architect on the Core SQL Server Engine team): No Seatbelt - Expecting Order without ORDER BY
Alexander Kuznetsov: Without ORDER BY, there is no default sort order (post in the Web Archive)
With SQL Server, if no ORDER BY is specified, the results are returned in the quickest way possible.
Therefore without an ORDER BY, make no assumptions about the order.
As it was already said you should never rely on the "default order" because it doesn't exist. Anyway if you still want to know some curious details about sql server implementation you can check this out:
http://exacthelp.blogspot.co.uk/2012/10/default-order-of-select-statement-in.html

How to retain the order of results while using a IN Clause in DB2?

I need to get a set of results using the IN clause, but the results are returned in the default ordering. Is there a way to maintain the order of the IN clause in DB2?
ORDER BY FIELD would be a solution in MySQL, but is there an equivalent in DB2?
As I understand it, you want to do this:
select foo from table where bar in (3, 1, 2);
and order by which item bar matched. i.e. bar = 3 comes first, followed by 1, then 2.
I don't think there is a built-in way to do what you want in DB2.
However, take a look at this recent question, which discusses workarounds.
If you want results in a particular order, ORDER BY is how to do it. SQL does not guarantee an order unless you use ORDER BY. There is no relationship whatsoever between a sort order of a result set and the way you choose to list items in any IN() clause. An IN() clause has nothing to do with it.
Note that a specific sort order may be obtained at any time without ORDER BY purely by luck. However, it is not guaranteed. If rows change over time, a different sort order might show up without warning.
This is SQL behavior in general, not something specific to DB2. DB2 simply behaves the way SQL is intended to work.
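One workaround that is commonly used for this is to put the desired order into the ORDER BY itself, with a CASE expression that maps each IN value to its position in the list. The sketch below builds such a query in Python and runs it against SQLite via sqlite3 (a stand-in for DB2, which is my assumption; CASE is standard SQL, but I have not run this against DB2). The table tbl and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (foo TEXT, bar INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("one", 1), ("two", 2), ("three", 3), ("four", 4)],
)

wanted = [3, 1, 2]  # the order in which the matches should come back

# One placeholder per value for the IN list ...
placeholders = ", ".join("?" for _ in wanted)
# ... and one WHEN arm per value, ranking it by its position in the list.
case_arms = " ".join(f"WHEN ? THEN {rank}" for rank, _ in enumerate(wanted))

sql = (
    f"SELECT foo FROM tbl WHERE bar IN ({placeholders}) "
    f"ORDER BY CASE bar {case_arms} END"
)
# The values are bound twice: once for IN, once for the CASE arms.
rows = [r[0] for r in conn.execute(sql, wanted + wanted)]
```

The generated statement reads like select foo from tbl where bar in (3, 1, 2) order by case bar when 3 then 0 when 1 then 1 when 2 then 2 end, so the ordering still comes from ORDER BY and not from the IN list itself.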

Can I set NHibernate's default "OrderBy" to be "CreatedDate" not "Id"?

This is an oddball question I figure.
Can I get NHibernate to ask SQL to sort data by CreatedDate by default unless I set an OrderBy in my HQL or Criteria? I'm interested in knowing whether this sort can be accomplished at the DB level to avoid bringing in LINQ.
The reason is that I use GUIDs for Ids and when I do something like this:
Sheet sheet = sheetRepository.Get(_someGUID);
IList<SheetLineItem> lineItems = sheet.LineItems;
to fetch all of the lineItems, they come back in whatever arbitrary way that SQL sorts that fetch, which I figure is GUID. At some point I'll add ordinals to my line items, but for now, I just want to use CreatedDate as the sort criteria. I don't want to be forced to do:
IList<SheetLineItem> lineItems = sheetLineItemRepository.GetAll(_sheetGUID);
and then write that method to sort by CreatedDate. I figure if everything is just sorted on CreatedDate by default, that would be fine, unless specifically requested otherwise.
You don't need to write a method to do the sorting, just use LINQ's OrderBy extension method:
sheetLineItemRepository.GetAll(_sheetGUID).OrderBy(x => x.CreatedDate);
You could put a clustered index on CreatedDate in the database and then you will probably get the records in this order but you definitely shouldn't rely on it.
No, you can't. The best solution (as you mention in a comment) is to set the order-by attribute in the collection mapping. Note that the value should be set to the database column name, not the property name.