sql Row_Number function is not working after created in table - sql

I hope this isn't a repeated discussion. I did search through the forums and didn't find anything that related to my problem. I love this site by the way. It has helped me for a couple years now and I usually will get all my questions solved by searching this site.
Anyway, I am running into an issue in SQL with the ROW_NUMBER() function. I created a view that joins another view and a table, and in it I pulled fields from both. I also created two extra fields: a ROW_NUMBER() field called seq_number and another field called seq_alpha.
Seq Number field is:
ROW_NUMBER() over(order by book_date, room, start) as seq_number,
The seq_alpha field is a case field that is based on what the row number is to give an alpha letter instead.
For example
case ROW_NUMBER() over(order by book_date, Room, Start)
when 1 then 'A'
when 2 then 'B'
when 3 then 'C'
....
End as seq_alpha
When I was building the view, I used a WHERE clause for testing purposes and everything worked exactly the way it should. I then commented out the WHERE clause and created the view.
Then after the view was created I tried to pull the created view and used the same Where clause that worked when creating the view but now did it like:
select *
from created_view
where (same as I used for testing)
But now what happens is the seq_number field looks at the entire view instead of letting the where clause filter out the results. All the other data pulls correctly, but the seq_number and seq_alpha fields don't. So instead of having the seq number be 1-22 in my 22 results it is 400,000 range and the seq_alpha field doesn't even display anything because I only went up to 51 in my case.
Has anyone had similar issues with trying to pull a row_number field after it is created and the field not filtering the results with the where clause?
Thanks for your help in advance!
EDIT
After Mikael's response it seems unlikely that I can create a row_number field in a view, query the view afterward, and have it work the way I want. So my next question is: is there a way to create an alpha sequence based on the row number that I can put in a view, query later, and have it work correctly based on the where clause? Or do I just need to create the alpha sequence field every time I pull this view?
J

row_number() enumerates the rows it sees at the point where it is executed, which is after the where clause of its own query has been applied.
This will enumerate the whole table
select ID,
row_number() over(order by ID) as rn
from YourTable
and this will enumerate all rows where ID < 10
select ID,
row_number() over(order by ID) as rn
from YourTable
where ID < 10
If you create a view that queries the whole table
create view ViewYourTable as
select ID,
row_number() over(order by ID) as rn
from YourTable
and then query the view with a where clause ID < 10
select T.*
from ViewYourTable as T
where T.ID < 10
you are doing the same as if you used a derived table.
select T.*
from (
select ID,
row_number() over(order by ID) as rn
from YourTable
) as T
where T.ID < 10
The where clause is applied after the rows are enumerated.
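As for the follow-up in the EDIT, one possible way around this is to leave the sequence columns out of the view and compute them in the query that applies the filter, so the rows are filtered first and numbered second. This is only a sketch: booking_view_base is a hypothetical name for the view without seq_number and seq_alpha, the filter value is made up, and only the first three case branches from the question are shown.

```sql
-- booking_view_base is a hypothetical view WITHOUT the sequence columns.
-- The filter on book_date is an invented example value.
select T.*,
       case T.seq_number
           when 1 then 'A'
           when 2 then 'B'
           when 3 then 'C'
           -- add further branches as needed
       end as seq_alpha
from (
    select V.*,
           row_number() over(order by V.book_date, V.room, V.start) as seq_number
    from booking_view_base as V
    where V.book_date = '2012-01-01'   -- the filter is applied first
) as T;                                 -- so the numbering starts at 1
```

Because the where clause sits in the same query block as row_number(), the numbering covers only the filtered rows.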

Related

Postgres - aggregate information once and add multiple properties as columns

I have a table called project and a view called downtime_report_overview. The downtime_report_overview consists of the table downtimeReport (id, startTime, stopTime, downTimeCauseId, employeeId, ...) and the joined downtimeCause.name.
Thanks to Gorden's reply (postgres - select one specfic row of another table and store it as column), I am able to include an active downtime (stopTime = null) as a column in the project query via an array aggregate and a filter. Since I might need to add more properties to the downtime_report_overview (e.g. metadata like username) in the near future, I was wondering if there is a way to extract the correct downtimeReport only once.
In the example below I am using the array aggregation 3 times: once each for id, startTime and causeName. On the one hand it seems verbose, and on the other I'm not even certain that it will select the same downtime row for all 3 columns.
SELECT
COUNT(downtime_report_overview."downtimeReportId") AS "downtimeReportsTotalCount",
FLOOR(date_part('epoch'::text, sum(downtime_report_overview."stopTime" - downtime_report_overview."startTime")))::integer AS "downtimeReportsTotalDurationInSeconds",
(array_agg(downtime_report_overview."downtimeReportId" ORDER BY downtime_report_overview."startTime" DESC) FILTER (WHERE downtime_report_overview."stopTime" IS null))[1] AS "activeDownTimeReportId",
(array_agg(downtime_report_overview."startTime" ORDER BY downtime_report_overview."startTime" DESC) FILTER (WHERE downtime_report_overview."stopTime" IS null))[1] AS "activeDownTimeReportStartTime",
(array_agg(downtime_report_overview."downtimeCauseName" ORDER BY downtime_report_overview."startTime" DESC) FILTER (WHERE downtime_report_overview."stopTime" IS null))[1] AS "activeDownTimeReportCauseName"
...
There are several ways to approach this. Obviously, you can write a separate expression for each column. Or, you can play around with manipulating an entire row as a record.
In this case, perhaps the simplest approach is to separate the aggregation and getting the row of interest. Based on the original question, the code would look like:
SELECT p.*, tt.*
FROM (SELECT p."projectId",
             count(t."timeTrackId") AS "timeTracksTotalCount",
             floor(date_part('epoch'::text, sum(t."stopTime" - t."startTime")))::integer AS "timeTracksTotalDurationInSeconds"
      FROM project p LEFT JOIN
           time_track t
           ON t."fkProjectId" = p."projectId"
      GROUP BY p."projectId"
     ) p LEFT JOIN
     (SELECT DISTINCT ON (tt."fkProjectId") tt.*
      FROM time_track tt
      WHERE tt."stopTime" IS NULL
      ORDER BY tt."fkProjectId", tt."startTime" DESC
     ) tt
     ON tt."fkProjectId" = p."projectId";
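The other option mentioned above, treating the entire row as a record, might look roughly like the following in Postgres. It aggregates the whole row of downtime_report_overview once and then reads the individual fields from that single record, so all three columns come from the same row. This is a sketch under assumptions: the grouping column "fkProjectId" is hypothetical (the question does not show the FROM or GROUP BY part), and the other column names are taken from the question.

```sql
-- Assumes downtime_report_overview has a (hypothetical) "fkProjectId" column.
SELECT s."fkProjectId",
       s."downtimeReportsTotalCount",
       (s.active)."downtimeReportId"  AS "activeDownTimeReportId",
       (s.active)."startTime"         AS "activeDownTimeReportStartTime",
       (s.active)."downtimeCauseName" AS "activeDownTimeReportCauseName"
FROM (
    SELECT d."fkProjectId",
           count(d."downtimeReportId") AS "downtimeReportsTotalCount",
           -- Aggregate whole rows, keep only the latest open downtime.
           (array_agg(d ORDER BY d."startTime" DESC)
                FILTER (WHERE d."stopTime" IS NULL))[1] AS active
    FROM downtime_report_overview d
    GROUP BY d."fkProjectId"
) s;
```

Adding another property later then only needs one more (s.active)."..." field reference rather than another aggregate.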

Most Efficient Way To Filter Revisioned Order Numbers

I have an issue with filtering out orders in my T-SQL database. We use the format "Q123456" to assign an order number, and this is appended with "-X" (X being the revision number). I need to eliminate all of the older revisions (and the original without the "-X") if a newer one is present.
I know that I can use wildcard characters for this and check the strings against each other, but all of my answers seem extremely inefficient, running a ridiculous number of comparisons. The run time gets out of control very quickly, and I imagine there is a much better structure to utilize.
Thank you in advance. If you need any more information, let me know and I'll do my best to provide it.
EDIT: Example data, you have orders:
Q123456
Q123456-1
Q123456-2
Q134567
I need to remove Q123456 and Q123456-1 from the results of the query, because of the presence of Q123456-2. Q134567 will also be a result, as it is the most up-to-date entry for that order.
If we can assume:
All order numbers are 7 characters long, except when they have a revision.
No revision exceeds '-9', or, if it does, all revisions are displayed as -xx with a leading zero (-01 ... -10). If not, we have to do some more work: take the part of the order number to the right of the -, cast it to integer and order by it, assigning 0 when no - exists.
Order_Num is the ORDER_NUMBER column in your table.
We can use a window function and a common table expression to generate a row number within each order number series (for example Q123456, Q123456-1, Q123456-2) and then select only the row with the highest order number (row number 1 when ordered by Order_Num descending).
WITH CTE AS (SELECT A.*, ROW_NUMBER() OVER (PARTITION BY SUBSTRING(A.Order_Num, 1, 7)
                                            ORDER BY A.Order_Num DESC) AS RN
             FROM YourTable A)
SELECT * FROM CTE WHERE RN = 1
Without a CTE (note that the derived table needs an alias)...
SELECT *
FROM (SELECT A.*, ROW_NUMBER() OVER (PARTITION BY SUBSTRING(A.Order_Num, 1, 7)
                                     ORDER BY A.Order_Num DESC) AS RN
      FROM YourTable A) B
WHERE RN = 1
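If the revision assumption does not hold (revisions above 9 without a leading zero), the "more work" described in the assumptions could be sketched like this in T-SQL: order by the numeric part after the - instead of by the string, treating a missing revision as 0. YourTable is a placeholder name, and the sketch assumes everything after the - is numeric, since the CAST would fail otherwise.

```sql
SELECT *
FROM (SELECT A.*,
             ROW_NUMBER() OVER (
                 PARTITION BY LEFT(A.Order_Num, 7)
                 ORDER BY CASE
                              -- Revision present: compare it as a number.
                              WHEN CHARINDEX('-', A.Order_Num) > 0
                              THEN CAST(SUBSTRING(A.Order_Num,
                                                  CHARINDEX('-', A.Order_Num) + 1,
                                                  LEN(A.Order_Num)) AS int)
                              -- No revision: treat the original as revision 0.
                              ELSE 0
                          END DESC) AS RN
      FROM YourTable A) B
WHERE RN = 1;
```

With this ordering, Q123456-10 ranks above Q123456-9, which a plain string comparison would get wrong.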

SQL Eliminate Duplicates with NO ID

I have a table with the following Columns...
Node, Date_Time, Market, Price
I would like to delete all but 1 record for each Node, Date time.
SELECT Node, Date_Time, MAX(Price)
FROM Hourly_Data
Group BY Node, Date_Time
That gets the results I would like to see, but I can't figure out how to remove the other records.
Note - There is no ID for this table
Here are some steps that are more of a workaround than a single command, but the approach will work in any relational database:
Create a new table that looks just like the one you already have
Insert the data computed by your group-by query into the newly created table
Drop the old table
Rename the new table to the name the old one used to have
Just remember that locking takes place, so you need some maintenance time to perform this action.
There are simpler ways to achieve this, but they are DBMS-specific.
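The steps above can be sketched as follows. The idea is portable, but the exact statements are not; this version uses SQL Server syntax (SELECT ... INTO and sp_rename), and Hourly_Data_New is a hypothetical name. Note that the group-by query from the question does not select Market, so this sketch leaves that column out of the new table.

```sql
-- 1. Create an empty table with the same column types (no rows match 1 = 0).
SELECT Node, Date_Time, Price
INTO Hourly_Data_New
FROM Hourly_Data
WHERE 1 = 0;

-- 2. Insert one row per (Node, Date_Time) using the group-by query.
INSERT INTO Hourly_Data_New (Node, Date_Time, Price)
SELECT Node, Date_Time, MAX(Price)
FROM Hourly_Data
GROUP BY Node, Date_Time;

-- 3. Drop the old table.
DROP TABLE Hourly_Data;

-- 4. Give the new table the old name.
EXEC sp_rename 'Hourly_Data_New', 'Hourly_Data';
```

Indexes, constraints and permissions on the old table are not carried over by this sketch and would need to be recreated.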
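Here is an easy SQL Server method that creates a row number within a CTE and deletes from it. Note that deleting through a CTE like this is SQL Server behavior; other RDBMSs that support window functions and common table expressions (PostgreSQL, for example) do not accept a CTE as the target of a DELETE and need a different form.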
;WITH cte AS (
SELECT
*
,RowNum = ROW_NUMBER() OVER (PARTITION BY Node, Date_Time ORDER BY Price DESC)
FROM
Hourly_Data
)
DELETE
FROM
cte
WHERE
RowNum > 1

Need (Teradata) SQL Help On Selecting Specific Rows

I have a query that returns the following table, but what I'm really interested in grabbing are the highlighted rows in my results. I was trying to write a case statement within the query, but I realized that I'm omitting some of the grp_mkt records I'm trying to keep. The logic is essentially that I want records of segments not in grp_mkt AND segments if you are not in grp_mkt. I could probably join the same query to itself to find this, but the tables are massive (impression-level data), so I'd rather not pull them again.
It seems that you want to pull segments once, with the prioritization to any market but grp_market. Here is one way:
with t as (
select t.*,
row_number() over (partition by segment order by (case when market = grp_mkt then 1 else 2 end) desc) as seqnum
from <your query/table here> t
)
select t.*
from t
where seqnum = 1;
If you could have multiple segments (non-grpmkt) and you want all of them, then using rank() instead of row_number() would work.

select group by in sql multiple columns

I have three columns in one table (code, code alt and product). The code column has duplicate data. I want to return all results without repeating values in the code column. I tried this:
Select code, code alt, product from table
where code in
(
select code from table
group by code
having count (code)=1
)
but not all results appear.
Thanks
If you want to keep only one row for each code out of a group of rows with the same code, you need to decide which of those rows you want to keep.
You need some criterion by which you can rank the rows that share a code and then select the one with, for example, the highest rank. The script below keeps only one row, chosen at random, for each code.
This is just an example to show the idea, and it is intended for SQL Server, because you did not say which DBMS you use.
with [src] as (
select code, [code alt], product, rank() over(partition by code order by newid()) [rank]
from [table])
select * from [src] where [rank] = 1
Ranking Functions