PostgreSQL ORDER BY with VIEWs - sql

Let's say I want to write a simple SELECT query that uses a VIEW:
CREATE TEMP VIEW people AS
SELECT
p.person_id
,p.full_name
,p.phone
FROM person p
ORDER BY p.last_name;
SELECT
p.*
,h.address
,h.appraisal
FROM people p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.last_name, h.appraisal;
The obvious problem here is that p.last_name is no longer available when I go to perform the final ORDER BY.
How can I sort the final query so that the original sequence of the people view follows through to the final query?
The simple solution here is to just include p.last_name in the view. I don't want to do that: my real-world example (much more complicated) makes that a problem.
I've done similar things with temp tables in the past. For example, I create the table with CREATE TEMP TABLE testing WITH OIDS and then do an ORDER BY testing.oid to pass through the original sequence.
Is it possible to do the same with views?

This is possible if you use row_number() over().
Here is an example:
SELECT
p.*
,h.address
,h.appraisal
FROM (SELECT *, row_number() over() rn FROM people) p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.rn, h.appraisal;
And here is the SQL Fiddle you can test with.
As @Erwin Brandstetter correctly points out, using rank() will produce the correct results and allow for sorting on additional fields (in this case, appraisal).
SELECT
p.*
,h.address
,h.appraisal
FROM (SELECT *, rank() over() rn FROM people) p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.rn, h.appraisal;
Think about it this way: row_number() gives every row a unique value, so sorting on it first leaves nothing for any later sort key to decide. rank() gives tied rows the same value, so additional fields can still be sorted on.
Good luck.

Building on the idea of @sgeddes, but using rank() instead:
SELECT p.*
, h.address
, h.appraisal
FROM (SELECT *, rank() OVER () AS rnk FROM people) p
LEFT JOIN homes h ON h.person_id = p.person_id
ORDER BY p.rnk, h.appraisal;
db<>fiddle here - demonstrating the difference
Old sqlfiddle

Create a row_number column and use it in the select.
CREATE TEMP VIEW people AS
SELECT
row_number() over(order by p.last_name) as i
,p.person_id
,p.full_name
,p.phone
FROM person p;
SELECT
p.*
,h.address
,h.appraisal
FROM people p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.i, h.appraisal
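A minimal sketch of this last approach, run through Python's sqlite3 module as a stand-in for PostgreSQL (it assumes an SQLite build of 3.25 or later for window functions). The sample rows are made up; table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, full_name TEXT,
                     last_name TEXT, phone TEXT);
CREATE TABLE homes (person_id INTEGER, address TEXT, appraisal INTEGER);
INSERT INTO person VALUES
  (1, 'Ann Young',  'Young',  '555-0101'),
  (2, 'Bob Adams',  'Adams',  '555-0102'),
  (3, 'Cy Miller',  'Miller', '555-0103');
INSERT INTO homes VALUES
  (1, '9 Elm St',  300000),
  (2, '4 Oak Ave', 250000),
  (2, '1 Pine Rd', 150000);

-- The view exposes its sort position as column i instead of last_name.
CREATE TEMP VIEW people AS
SELECT row_number() OVER (ORDER BY p.last_name) AS i,
       p.person_id, p.full_name, p.phone
FROM person p;
""")

# The outer query sorts by the view's position first, then by appraisal.
rows = conn.execute("""
SELECT p.full_name, h.address, h.appraisal
FROM people p
LEFT JOIN homes h ON h.person_id = p.person_id
ORDER BY p.i, h.appraisal
""").fetchall()

for row in rows:
    print(row)
```

Because the position is computed with an explicit ORDER BY inside the view, the final sort does not rely on the view returning rows in any particular order.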

Related

Why does the optimizer decide to self-join a table?

I'm analyzing my query that looks like this:
WITH Project_UnfinishedCount AS (
SELECT P.Title AS title, COUNT(T.ID) AS cnt
FROM PROJECT P LEFT JOIN TASK T on P.ID = T.ID_PROJECT AND T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
)
SELECT Title
FROM Project_UnfinishedCount
WHERE cnt = (
SELECT MAX(cnt)
FROM Project_UnfinishedCount
);
It returns a title of a project that has the biggest number of unfinished tasks in it.
Here is its execution plan:
I wonder why it has steps 6-8, which look like a self-join of the project table. It then stores the result of the join as a view, but according to the rows and bytes columns the view is the same as the project table. Why does it do this?
I'd also like to know what steps 2 and 1 stand for. I guess step 2 stores the result of my CTE for use in steps 10-14, and step 1 removes the rows from the view that don't have the 'cnt' value returned by the subquery. Is this a correct guess?
In addition to the comments above, when you reference a CTE more than once, there is a heuristic that tells the optimizer to materialize the CTE, which is why you see the temp table transformation.
A few other comments/questions regarding this query. I'm assuming that the relationship is that a PROJECT can have 0 or more tasks, and each TASK is for one and only one PROJECT. In that case, I wonder why you have an outer join? Moreover, you are filtering on the ACTUAL_FINISH_DATE column inside the join condition. This means that if you have a project where all the tasks are complete, the outer join still materializes a row for it with NULL task columns. COUNT(T.ID) skips that NULL and reports 0, but a COUNT(*) would make your query results appear to indicate that there was 1 unfinished task. So, unless you need projects with no unfinished tasks in the result, I think your CTE should look more like:
SELECT P.Title AS title, COUNT(T.ID) AS cnt
FROM PROJECT P
JOIN TASK T on P.ID = T.ID_PROJECT
WHERE T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
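How a fully finished project is counted under the outer join depends on what is being counted. The sketch below illustrates this with Python's sqlite3 module standing in for Oracle; the two projects and three tasks are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, id_project INTEGER,
                   actual_finish_date TEXT);
INSERT INTO project VALUES (1, 'Alpha'), (2, 'Beta');
-- Alpha has two unfinished tasks; Beta's only task is finished.
INSERT INTO task VALUES (10, 1, NULL), (11, 1, NULL), (12, 2, '2020-01-01');
""")

# COUNT(t.id) ignores the NULL-extended row; COUNT(*) counts it.
counts = conn.execute("""
SELECT p.title, COUNT(t.id) AS cnt_ids, COUNT(*) AS cnt_rows
FROM project p
LEFT JOIN task t
  ON p.id = t.id_project AND t.actual_finish_date IS NULL
GROUP BY p.id, p.title
ORDER BY p.title
""").fetchall()

for row in counts:
    print(row)
```

For the finished project, the column counting task ids stays at 0 while the row count is 1, which is the distinction to watch for when keeping the outer join.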
With all that being said, these "find the match (count, max etc) within a group" type of queries are often more efficient when written as a window function. That way you can eliminate the self join. This can make a big performance difference when you have millions or billions of rows. So for example, your query could be re-written as:
SELECT TITLE, CNT
FROM (
SELECT P.Title AS title, COUNT(T.ID) AS cnt
, RANK() OVER( ORDER BY COUNT(*) DESC ) RNK
FROM PROJECT P
JOIN TASK T on P.ID = T.ID_PROJECT
WHERE T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
)
WHERE RNK=1
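As a rough illustration of the window-function rewrite, here is a sketch using Python's sqlite3 module in place of Oracle (SQLite 3.25 or later assumed). The sample data is invented, and the aggregation is moved into an inner query purely to keep the sketch simple; that is a choice made here, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, id_project INTEGER,
                   actual_finish_date TEXT);
INSERT INTO project VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO task VALUES
  (10, 1, NULL), (11, 1, NULL),          -- Alpha: 2 unfinished
  (12, 2, NULL), (13, 2, '2020-01-01'),  -- Beta: 1 unfinished, 1 finished
  (14, 3, NULL), (15, 3, NULL);          -- Gamma: 2 unfinished
""")

# Count unfinished tasks per project, rank the counts, keep rank 1.
top = conn.execute("""
SELECT title, cnt
FROM (
    SELECT title, cnt, RANK() OVER (ORDER BY cnt DESC) AS rnk
    FROM (
        SELECT p.title AS title, COUNT(t.id) AS cnt
        FROM project p
        JOIN task t ON p.id = t.id_project
        WHERE t.actual_finish_date IS NULL
        GROUP BY p.id, p.title
    )
)
WHERE rnk = 1
ORDER BY title
""").fetchall()

print(top)
```

Since RANK() gives tied counts the same rank, every project sharing the maximum is returned, matching the behaviour of the original `cnt = MAX(cnt)` filter.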

simple SQL subquery

It seems I'm a complete idiot when it comes to SQL...
All I need is to get one value from another table, but there are multiple rows with the same customerId in the second table, and I need the one with the highest timestamp.
CREATE OR REPLACE VIEW CUS_SETTINGS as
SELECT
c.id as id,
c.LANG as Language,
c.ALLOWEMAIL as AllowEmail,
l.CONFIRMED as confirmed
FROM cus.CUSTOMER c
????? something with l
/
A LEFT JOIN brings back every matching row, so I get duplicate ids.
What I need is probably a subquery, but I can't get it to work...
(SELECT CONFIRMED FROM settings WHERE ?? c.id == l.id ?? AND MAX(TIMESTAMP) )
I've tried many, many variations of joins and subqueries, but for some reason SQL is just too confusing...
You can use a ROW_NUMBER() in the subquery:
SELECT c.id as id, c.LANG as Language, c.ALLOWEMAIL as AllowEmail,
s.CONFIRMED as confirmed
FROM cus.CUSTOMER c JOIN
(SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.timestamp DESC) as seqnum
FROM settings s
) s
ON s.id = c.id AND s.seqnum = 1;
Note: You might want a LEFT JOIN if you want to keep all customers, even those with no settings.
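A small sketch of the ROW_NUMBER() approach with the LEFT JOIN variant, using Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later assumed). The rows are invented, and the cus schema prefix is dropped because the in-memory database has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, lang TEXT, allowemail INTEGER);
CREATE TABLE settings (id INTEGER, confirmed TEXT, timestamp TEXT);
INSERT INTO customer VALUES (1, 'en', 1), (2, 'fi', 0), (3, 'de', 1);
-- Customer 3 has no settings rows at all.
INSERT INTO settings VALUES
  (1, 'N', '2020-01-01'), (1, 'Y', '2020-06-01'),
  (2, 'Y', '2019-03-01'), (2, 'N', '2019-09-01');
""")

# seqnum = 1 marks the newest settings row for each customer id.
latest = conn.execute("""
SELECT c.id, c.lang, c.allowemail, s.confirmed
FROM customer c
LEFT JOIN (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY s.id
                              ORDER BY s.timestamp DESC) AS seqnum
    FROM settings s
) s ON s.id = c.id AND s.seqnum = 1
ORDER BY c.id
""").fetchall()

for row in latest:
    print(row)
```

Keeping `s.seqnum = 1` in the ON clause rather than in a WHERE clause is what lets the customer without settings survive the join with a NULL confirmed value.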
You can use analytic functions. A MAX() OVER (PARTITION BY ...) clause can give you the maximum timestamp for each id.
Analytic Functions Docs 11gR2
SELECT SECONDD.CONFIRMED
FROM CUSTOMER CU
INNER JOIN
(SELECT
*
FROM (SELECT SECONDD.*,
MAX (SECONDD.TIMESTAMP) OVER (PARTITION BY SECONDD.ID)
AS MAXTIMESTAMP
FROM SETTINGS SECONDD)
WHERE TIMESTAMP = MAXTIMESTAMP) SECONDD
ON SECONDD.ID = CU.ID
Don't worry; while this sounds very basic, it isn't :-)
The easiest way to get the CONFIRMED for the latest TIMESTAMP in Oracle is KEEP LAST. E.g.:
SELECT customer_id, MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp)
FROM settings
GROUP BY customer_id;
The related CREATE VIEW statement:
CREATE OR REPLACE VIEW cus_settings as
SELECT
c.id AS id,
c.lang AS language,
c.allowemail AS allowemail,
l.last_confirmed AS confirmed
FROM cus.customer c
LEFT JOIN
(
SELECT
customer_id,
MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp) AS last_confirmed
FROM settings
GROUP BY customer_id
) l ON l.customer_id = c.id;
Or:
CREATE OR REPLACE VIEW cus_settings as
SELECT
c.id AS id,
c.lang AS language,
c.allowemail AS allowemail,
(
SELECT MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp)
FROM settings s
WHERE s.customer_id = c.id
) AS confirmed
FROM cus.customer c;

Sqlite - get numbered rows

I am retrieving list of persons from a database and each person has some points. What I want to achieve is to get all person information along with person's points and rank. Points are calculated on the go, because they are not stored within the entity and the query looks something like that:
SELECT p.<some person attributes>, s.points, [here I need rank] as rank
FROM Persons p LEFT JOIN <subquery calculating points> s
ON p.id = s.personId
ORDER BY s.points DESC
In my select part I need to get a position in ranking of a person (what is basically order of returned rows, since I order it by points, right?)
Is there any sql/sqlite column or function to return that?
This is exactly what window functions are for. Specifically, dense_rank will also take care of pesky edge-cases where several users have the same number of points:
SELECT p.<some person attributes>, s.points,
DENSE_RANK() OVER (ORDER BY points DESC) as "rank"
FROM Persons p
LEFT JOIN <subquery calculating points> s ON p.id = s.personId
ORDER BY s.points DESC
Unfortunately, SQLite did not support window functions before version 3.25.0. On older versions you pretty much need to resort to a correlated subquery:
with s as (
<subquery calculating points>
)
select p.<some person attributes>, s.points,
(select 1 + count(*)
from s s2
where s2.points > s.points
) as rank
from Persons p left join
s
on p.id = s.personId
order by s.points desc;
This specifically implements rank() over (order by points desc). Similar logic can be used for dense_rank() or row_number() if that is what you really need.
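The correlated-subquery logic can be tried out with a short sketch through Python's sqlite3 module. The scores table below is a hypothetical placeholder for the points subquery, and the alias is spelled rnk only to avoid any clash with the function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical stand-in for the subquery that calculates points.
CREATE TABLE scores (personId INTEGER, points INTEGER);
INSERT INTO persons VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee');
INSERT INTO scores VALUES (1, 50), (2, 70), (3, 50), (4, 20);
""")

# A row's rank is 1 plus the number of rows with strictly more points.
ranked = conn.execute("""
WITH s AS (
    SELECT personId, points FROM scores
)
SELECT p.name, s.points,
       (SELECT 1 + COUNT(*)
        FROM s s2
        WHERE s2.points > s.points) AS rnk
FROM persons p
LEFT JOIN s ON p.id = s.personId
ORDER BY s.points DESC, p.name
""").fetchall()

for row in ranked:
    print(row)
```

The two people tied on 50 points share a rank and the next rank is skipped, which is the gap behaviour of rank() rather than dense_rank().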

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want the rows to be distinct by pageId, so that in the end no pageId appears more than once (no duplicates).
Any ideas?
Thanks
Baaroz
Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ... along with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If however, this makes no difference, it means the other columns have different values, so the combinations of column values creates distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
In the case that simply adding distinct does not cover you, you need to also remove the columns that are different from row to row, or use aggregate functions to get aggregate values per cartlines.
Example - total quantity per distinct pageId:
select cartlines.pageId, sum(cartlines.quantity) as totalQuantity
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
group by cartlines.pageId
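To get one row per pageId with a total, the aggregate needs a GROUP BY on pageId. A minimal sketch, using Python's sqlite3 module with invented sample rows (the column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER,
                        pageId INTEGER, quantity INTEGER, price REAL);
INSERT INTO orders VALUES (1, 5), (2, 5), (3, 7);
INSERT INTO cartlines VALUES
  (1, 1, 100, 2, 9.5),
  (2, 1, 200, 1, 4.0),
  (3, 2, 100, 3, 9.5),
  (4, 3, 100, 10, 9.5);  -- belongs to user 7, so it is filtered out
""")

# One output row per pageId, with the quantities of user 5 summed up.
totals = conn.execute("""
SELECT c.pageId, SUM(c.quantity) AS totalQuantity
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE o.userId = 5
GROUP BY c.pageId
ORDER BY c.pageId
""").fetchall()

print(totals)
```

Columns such as cartlines.id cannot appear alongside the sum unless they are aggregated too, since they differ from row to row within a pageId.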
If this is still not what you wish, you need to give us data and specify better what it is you want.

How to reference a custom field in SQL

I am using mssql and am having trouble using a subquery. The real query is quite complicated, but it has the same structure as this:
select
customerName,
customerId,
(
select count(*)
from Purchases
where Purchases.customerId=customerData.customerId
) as numberTransactions
from customerData
And what I want to do is order the table by the number of transactions, but when I use
order by numberTransactions
It tells me there is no such field. Is it possible to do this? Should I be using some sort of special keyword, such as this, or self?
Use the field number (the column's position in the select list), in this case:
order by 3
Sometimes you have to wrestle with SQL's syntax (expected scope of clauses)
SELECT *
FROM
(
select
customerName,
customerId,
(
select count(*)
from Purchases
where Purchases.customerId=customerData.customerId
) as numberTransactions
from customerData
) as sub
order by sub.numberTransactions
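The derived-table form can be exercised with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server, so it only shows the shape of the query, not SQL Server's own alias-scoping rules; the sample rows and the purchaseId column are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customerData (customerId INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE Purchases (purchaseId INTEGER PRIMARY KEY, customerId INTEGER);
INSERT INTO customerData VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Purchases (customerId) VALUES (1), (2), (2), (2), (1);
""")

# Inside the derived table the count is an ordinary column, so the
# outer ORDER BY can reference it by name.
ordered = conn.execute("""
SELECT *
FROM (
    SELECT customerName,
           customerId,
           (SELECT COUNT(*)
            FROM Purchases
            WHERE Purchases.customerId = customerData.customerId
           ) AS numberTransactions
    FROM customerData
) AS sub
ORDER BY sub.numberTransactions
""").fetchall()

for row in ordered:
    print(row)
```

Note that the customer with no purchases is kept with a count of 0, which an inner join against Purchases would drop.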
Also, a solution using JOIN is correct. Look at the query plan, SQL Server should give identical plans for both solutions.
Do an inner join. It's much easier and more readable.
select
c.customerName,
c.customerID,
count(*) as numberTransactions
from
customerdata c inner join purchases p on c.customerID = p.customerID
group by c.customerName, c.customerID
order by numberTransactions
EDIT: Hey Nathan,
You realize you can inner join this whole table as a sub right?
Select T.*, T2.*
From T inner join
(select
c.customerName,
c.customerID,
count(*) as numberTransactions
from
customerdata c inner join purchases p on c.customerID = p.customerID
group by c.customerName, c.customerID
) T2 on T.CustomerID = T2.CustomerID
order by T2.numberTransactions
Or if that's no good you can construct your queries using temporary tables (#T1 etc)
There are better ways to get your result, but just from your example query, this will work on SQL2000 or better if you wrap your alias in single ticks ('numberTransactions') and then call ORDER BY 'numberTransactions':
select
customerName,
customerId,
(
select count(*)
from Purchases
where Purchases.customerId=customerData.customerId
) as 'numberTransactions'
from customerData
ORDER BY 'numberTransactions'
The same thing could be achieved by using GROUP BY and a JOIN, and you'll be rid of the subquery. This might be faster too.
I think you can do this in SQL2005, but not SQL2000.
You need to duplicate your logic. SQL Server isn't very smart about columns that you've named but that aren't part of the dataset in your FROM clause.
So use
select
customerName,
customerId,
(
select count(*)
from Purchases p
where p.customerId = c.customerId
) as numberTransactions
from customerData c
order by (select count(*) from purchases p where p.customerID = c.customerid)
Also, use aliases, they make your code easier to read and maintain. ;)