I have a query that pulls the following table, but what I'm really interested in grabbing are the highlighted rows that my results are generating. I was trying to write a case statement within the query, but I realized that I'm omitting some of the grp_mkt records I'm trying to keep. The logic is essentially: I want the records of segments in markets other than grp_mkt, plus a segment's grp_mkt record only when that segment has no record outside grp_mkt. I could probably join the same query to itself to find this, but the tables are massive (impression-level data) and I'd rather not pull them again.
It seems that you want to pull each segment once, giving priority to any market other than grp_mkt. Here is one way:
with t as (
      select t.*,
             row_number() over (partition by segment
                                order by (case when market = 'grp_mkt' then 1 else 2 end) desc
                               ) as seqnum
      from <your query/table here> t
     )
select t.*
from t
where seqnum = 1;
If a segment could have multiple non-grp_mkt rows and you want all of them, then using rank() instead of row_number() would work, as sketched below.
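A minimal sketch of that rank() variant (same placeholder for your query/table, and still assuming grp_mkt is a value of the market column):
with t as (
      select t.*,
             rank() over (partition by segment
                          order by (case when market = 'grp_mkt' then 1 else 2 end) desc
                         ) as seqnum
      from <your query/table here> t
     )
select t.*
from t
where seqnum = 1;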
I have a table where I save authors and songs, with other columns. The same song can appear multiple times, and it obviously always comes from the same author. I would like to select the author that has the least songs, including the repeated ones, aka the one that is listened to the least.
The final table should show only one author name.
Clearly, one step is to find the count for every author. This can be done with an elementary aggregate query. Then, if you order by that count, you can just select the first row, which solves your problem. One approach is to use ROWNUM in an outer query. This is a very elementary approach, quite efficient, and it works in all versions of Oracle (it doesn't use any advanced features).
select author
from (
      select author
      from your_table
      group by author
      order by count(*)
     )
where rownum = 1;
Note that in the subquery we don't need to select the count (since we don't need it in the output). We can still use it in order by in the subquery, which is all we need it for.
The only tricky part here is to remember that you need to order the rows in the subquery, and then apply the ROWNUM filter in the outer query. This is because ORDER BY is the very last thing that is processed in any query - it comes after ROWNUM is assigned to the rows in the output. So, moving the ROWNUM filter into the subquery (doing everything in a single query, instead of a subquery and an outer query) does not work.
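As an aside, if you happen to be on Oracle 12c or later (an assumption, it isn't stated in the question), the same idea can be written without the outer query by using the row-limiting clause:
select author
from your_table
group by author
order by count(*)
fetch first 1 row only;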
You can use analytical functions as follows:
Select *
From (Select t.*,
             Row_number() over (order by cnt_author) as rn
      From (Select t.*,
                   Count(*) over (partition by author) as cnt_author
            From your_table t) t
     ) t
Where rn = 1
I have a table called project and a view called downtime_report_overview. The downtime_report_overview consists of the table downtimeReport (id, startTime, stopTime, downTimeCauseId, employeeId, ...) and the joined downtimeCause.name.
Thanks to Gorden's reply (postgres - select one specific row of another table and store it as column), I am able to include an active downtime (stopTime = null) as a column in the project query, via an array aggregate with a filter. Since I might need to add more properties to the downtime_report_overview (e.g. metadata like username) in the near future, I was wondering if there is a way to extract the correct downtimeReport row only once.
In the example below I am using the array aggregation 3 times: once each for id, startTime and causeName. On the one hand it seems verbose, and on the other I'm not even certain that it will select the correct downtime row for all 3 columns.
SELECT
COUNT(downtime_report_overview."downtimeReportId") AS "downtimeReportsTotalCount",
FLOOR(date_part('epoch'::text, sum(downtime_report_overview."stopTime" - downtime_report_overview."startTime")))::integer AS "downtimeReportsTotalDurationInSeconds",
(array_agg(downtime_report_overview."downtimeReportId" ORDER BY downtime_report_overview."startTime" DESC) FILTER (WHERE downtime_report_overview."stopTime" IS null))[1] AS "activeDownTimeReportId",
(array_agg(downtime_report_overview."startTime" ORDER BY downtime_report_overview."startTime" DESC) FILTER (WHERE downtime_report_overview."stopTime" IS null))[1] AS "activeDownTimeReportStartTime",
(array_agg(downtime_report_overview."downtimeCauseName" ORDER BY downtime_report_overview."startTime" DESC) FILTER (WHERE downtime_report_overview."stopTime" IS null))[1] AS "activeDownTimeReportCauseName"
...
There are several ways to approach this. Obviously, you can write a separate expression for each column. Or, you can play around with manipulating an entire row as a record.
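To illustrate the record idea with the names from the question: you can aggregate the whole view row once and pull fields out of it in an outer query. The grouping key below ("projectId") is only a placeholder, since the rest of the query is elided in the question; substitute whatever you actually group by.
SELECT x."projectId",
       x."downtimeReportsTotalCount",
       (x."activeReport")."downtimeReportId"   AS "activeDownTimeReportId",
       (x."activeReport")."startTime"          AS "activeDownTimeReportStartTime",
       (x."activeReport")."downtimeCauseName"  AS "activeDownTimeReportCauseName"
FROM (SELECT o."projectId",                                         -- placeholder grouping key
             COUNT(o."downtimeReportId")                 AS "downtimeReportsTotalCount",
             (array_agg(o ORDER BY o."startTime" DESC)
                FILTER (WHERE o."stopTime" IS NULL))[1]  AS "activeReport"
      FROM downtime_report_overview o
      GROUP BY o."projectId"
     ) x;
This pulls the winning row only once, but whether it is any faster than the three separate array_agg calls is a different question; the approach below avoids the aggregates for that row entirely.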
In this case, perhaps the simplest approach is to separate the aggregation and getting the row of interest. Based on the original question, the code would look like:
SELECT p.*, tt.*
FROM (SELECT p."projectId",
             count(t."timeTrackId") as "timeTracksTotalCount",
             floor(date_part('epoch'::text, sum(t."stopTime" - t."startTime")))::integer AS "timeTracksTotalDurationInSeconds"
      FROM project p LEFT JOIN
           time_track t
           ON t."fkProjectId" = p."projectId"
      GROUP BY p."projectId"
     ) p LEFT JOIN
     (SELECT DISTINCT ON (tt."fkProjectId") tt.*
      FROM time_track tt
      WHERE tt."stopTime" is null
      ORDER BY tt."fkProjectId", tt."startTime" desc
     ) tt
     ON tt."fkProjectId" = p."projectId";
This is a follow-up question to Retrieving last record in each group from database - SQL Server 2005/2008
In the answers, this example was provided to retrieve the last record for each group of parameters (the example below retrieves the last update for each value of computername):
select t.*
from t
where t.lastupdate = (select max(t2.lastupdate)
from t t2
where t2.computername = t.computername
);
In my case, however, "lastupdate" is not unique (some updates come in batches and have the same lastupdate value, and if two updates for the same "computername" come in the same batch, you will get non-unique output for "computername + lastupdate").
Suppose I also have a field "rowId" that is simply auto-incremented. The mitigation would be to include another criterion in the query: take the max(rowId) as well.
NB: while the example uses the time-specific name "lastupdate", the actual selection criteria may not be related to time at all.
I would therefore like to ask: what would be the most performant query that selects the last record in each group, based both on the "group-defining parameter" (in the case above, "computername") and, among ties, on the maximal rowId?
If you don't have uniqueness, then row_number() is simpler:
select t.*
from (select t.*,
             row_number() over (partition by computername
                                order by lastupdate desc, rowid desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
With the right indexes, the correlated subquery is usually faster. However, the performance difference is not that great.
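For comparison, a sketch of the correlated version extended with rowId as the tie-breaker, together with an index that would support it (the index name is just illustrative):
select t.*
from t
where t.rowId = (select top (1) t2.rowId
                 from t t2
                 where t2.computername = t.computername
                 order by t2.lastupdate desc, t2.rowId desc
                );

-- supporting index (illustrative name):
create index ix_t_computername_lastupdate_rowid
    on t (computername, lastupdate desc, rowId desc);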
I have a table that keeps costs of products. I'd like to get the average cost AND last buying invoice for each product.
My solution was to create a sub-select to get the last buying invoice, but unfortunately I'm getting
ORA-00904: "B"."CODPROD": invalid identifier
My query is
SELECT (b.cod_aux) product,
-- here goes code to get average cost,
(SELECT round(valorultent, 2)
FROM (SELECT valorultent
FROM pchistest
WHERE codprod = b.codprod
ORDER BY dtultent DESC)
WHERE ROWNUM = 1)
FROM pchistest a, pcembalagem b
WHERE a.codprod = b.codprod
GROUP BY a.codprod, b.cod_aux
ORDER BY b.cod_aux
In short, what I'm doing in the sub-select is ordering in descending order and getting the first row for the given product b.codprod.
Your problem is that you can't reference a column from the outer query (here, b.codprod) more than one sub-query level deep. According to the comments, this was changed in 12c, but I haven't had a chance to try it, as the data warehouse that I use is still on 11g.
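For what it's worth, the correlated reference is fine one level down, so a scalar sub-select that avoids the second nesting level would also get past the ORA-00904. A sketch of just the "last invoice value" part, using Oracle's KEEP (DENSE_RANK FIRST) aggregate (not the approach taken below):
SELECT b.cod_aux AS product,
       (SELECT ROUND(MAX(valorultent) KEEP (DENSE_RANK FIRST ORDER BY dtultent DESC), 2)
        FROM pchistest
        WHERE codprod = b.codprod) AS valorultent
FROM pcembalagem b
ORDER BY b.cod_aux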
I would use something like this:
SELECT b.cod_aux AS product
,ROUND (r.valorultent, 2) AS valorultent
FROM pchistest a
JOIN pcembalagem b ON (a.codprod = b.codprod)
JOIN (SELECT valorultent
,codprod
,ROW_NUMBER() OVER (PARTITION BY codprod
ORDER BY dtultent DESC)
AS row_no
FROM pchistest) r ON (r.row_no = 1 AND r.codprod = b.codprod)
GROUP BY a.codprod, b.cod_aux, r.valorultent
ORDER BY b.cod_aux
I avoid sub-queries in SELECT statements. Most of the time, the optimizer wants to run a SELECT for each item in the cursor, OR it does some crazy nested loops. If you do it as a sub-query in the JOIN, Oracle will normally process it as part of the rows you are joining, which is usually more efficient. Finally, apply your per-item functions (in this case, the ROUND) in the final, outermost query. That way Oracle only runs them on the rows you actually return, not on ALL rows. It should get this right on its own, but it can get confused by complex queries.
The ROW_NUMBER() OVER (PARTITION BY ..) is where the magic happens. This adds a row number within each group of CODPRODs, which lets you pluck the top row from each CODPROD, so you can get the newest/oldest/greatest/least/etc. from your sub-query. It is also great for filtering duplicates, as sketched below.
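For instance, a minimal sketch of that duplicate-filtering use, keeping one row per codprod (newest by dtultent; adapt the ordering to whatever defines the row you want to keep):
SELECT *
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY codprod
                                ORDER BY dtultent DESC) AS row_no
      FROM pchistest t)
WHERE row_no = 1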
I hope this isn't a repeated discussion. I did search through the forums and didn't find anything that related to my problem. I love this site by the way. It has helped me for a couple years now and I usually will get all my questions solved by searching this site.
Anyway, I am running into an issue in SQL with the ROW_NUMBER() function. I created a view that joins a different view and a table, and in the view I pulled fields from both the view and the table, but I also created two fields: a ROW_NUMBER() field called seq_number and another field called seq_alpha.
The seq_number field is:
ROW_NUMBER() over(order by book_date, room, start) as seq_number,
The seq_alpha field is a CASE expression based on the row number, which gives an alpha letter instead.
For example
case ROW_NUMBER() over(order by book_date, Room, Start)
when 1 then 'A'
when 2 then 'B'
when 3 then 'C'
....
End as seq_alpha
When I created the view for testing purposes I used a WHERE clause and everything worked exactly the way it should. I then commented out the WHERE clause and had the view created.
Then, after the view was created, I tried to query it using the same WHERE clause that worked during testing, but now like this:
select *
from created_view
where (same as I used for testing)
But now what happens is that the seq_number field looks at the entire view instead of letting the WHERE clause filter the results. All the other data pulls correctly, but the seq_number and seq_alpha fields don't. So instead of the seq_number being 1-22 for my 22 results, it is in the 400,000 range, and the seq_alpha field doesn't display anything at all because I only went up to 51 in my CASE.
Has anyone had similar issues with trying to pull a row_number field after it is created and the field not filtering the results with the where clause?
Thanks for your help in advance!
EDIT
After Mikael's response, it seems unlikely that I can create a row_number field in a view, query the view afterward, and have it work the way I want. So my next question is: is there a way to create an alpha sequence based on the row number that I can put in a view and query later, and have it work correctly with the WHERE clause? Or do I just need to create the alpha sequence field every time I pull this view?
row_number() enumerates the rows it sees, and it is evaluated after the where clause of the query it appears in has been applied.
This will enumerate the whole table
select ID,
row_number() over(order by ID) as rn
from YourTable
and this will enumerate all rows where ID < 10
select ID,
row_number() over(order by ID) as rn
from YourTable
where ID < 10
If you create a view that queries the whole table
create view ViewYourTable as
select ID,
row_number() over(order by ID) as rn
from YourTable
and then query the view with a where clause ID < 10
select T.*
from ViewYourTable as T
where T.ID < 10
you are doing the same as if you used a derived table.
select T.*
from (
select ID,
row_number() over(order by ID) as rn
from YourTable
) as T
where T.ID < 10
The where clause is applied after the rows are enumerated.
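So, if the numbering (and the derived seq_alpha) should reflect only the filtered rows, the filter has to go inside the query that computes row_number(), for example in a derived table written at query time rather than inside the view (a sketch against the same YourTable):
select T.ID,
       row_number() over(order by T.ID) as rn
from (
      select ID
      from YourTable
      where ID < 10
     ) as T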