Incrementing Row_Number based on change in value of a column - sql

I have a sql view which contains data from 3 linked entities (Title > Edition > SKU). The data in this view is used to search on ANY field within the 3 entities. For example, if you specify a condition title.regionid = '14' the view returns 4,000 unique rows (1 per SKU), which belong to 765 unique Editions, which belong to 456 unique Titles.
What I need is to enable paging based on Titles using Row_Number(). So
SELECT * FROM myview WHERE title.regionid = '14' AND Row BETWEEN 0 AND 35
The problem is that my Row column needs to count the rows by Title, not by SKU, so from a result set of 4,000 rows, if the first title contains 12 editions and 65 SKUs, the row number for all 65 rows should be 1, because they belong to the same Title.
I cannot use GroupBy because my view contains 40+ columns all of which can be searched on via the WHERE clause.
Here's the query:
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY a.TitleSort ASC) AS Row, a.*
FROM (SELECT * FROM v_AdvancedSearch
WHERE
istitledeleted = 0
--AND ISBN = '1-4157-5842-5'
--AND etc
) AS a
) d
WHERE
Row BETWEEN 0 AND 35
In the first page there are 35 rows which only belong to 4 titles, but the Row column counts by row so it stops there, whereas if it counted by Title I would get 387 rows for page 1... How can I accomplish paging in this situation?

WITH Titles AS
(
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY a.TitleSort ASC) AS Row, a.*
FROM (SELECT DISTINCT TitleSORT, TitleId FROM v_AdvancedSearch
WHERE
istitledeleted = 0
--AND ISBN = '1-4157-5842-5'
--AND PictureFormat = 'Widescreen'
--AND UPC = '0-9736-14381-6-0'
--AND Edition = 'Standard'
) AS a
) d
WHERE
Row BETWEEN 0 AND 35
)
SELECT * FROM v_AdvancedSearch V
INNER JOIN Titles ON Titles.TitleId = V.TitleId
WHERE istitledeleted = 0
--CONDITIONS NEED TO BE REPEATED HERE
--AND ISBN = '1-4157-5842-5'
ORDER BY V.TitleSort ASC

This form works best for me
WITH
[cte] AS (
SELECT
DENSE_RANK ( ) OVER ( ORDER BY [v].[TitleSort], [v].[TitleId] ) AS [ordinal],
[v].[TitleSort],
[v].[TitleId]
--,
--field list,
--etc
FROM [v_AdvancedSearch] AS [v]
WHERE
[v].[istitledeleted] = 0
--AND
--additional conditions AND
--etc
)
SELECT
[v].[ordinal],
[v].[TitleSort],
[v].[TitleId]
--,
--field list,
--etc
FROM [cte] AS [v]
WHERE
[v].[ordinal] >= 0 AND
[v].[ordinal] <= 35
ORDER BY [v].[ordinal];
No need for DISTINCT or GROUP BY
No need to repeat the criteria conditions
Just gotta have that explicit field list
https://learn.microsoft.com/en-us/sql/t-sql/functions/dense-rank-transact-sql
https://www.google.com/search?q=%22sql+server%22+%22select+star%22

Related

Selecting Top Row in Calculated Column

I need to subtract the top row in a table that has multiple records from another table that has one row. One table has assets with one date and the other has multiple assets grouped by older dates. I am also limiting the results to times when the newer asset is greater than 40% or less than 40% the older asset.
I have already tried using the row_number function to pull the top row from the second table but am having trouble with the subquery.
Select
p.pid, e.coname, p.seq, p.valmo, p.valyr, p.assets,
(case
when ((p.assets-p1.assets)/p.assets) * 100 <= -40
or ((p.assets-p1.assets)/p.assets) * 100 >=40
and p.assets <> p1.assets
then ((p.assets - p1.assets) / p.assets) * 100
end) as "PercentDiff"
from
pen_plans p
join
pen_plans_archive p1 on p.pid = p1.pid and p.seq = p1.seq
join
entities e on p.pid = e.pid
where
p.assets > 500000 and e.mmd = 'A'
order by
VALYR desc
So I need to subtract the top row in "pen_plans_archive" from the assets in "pen_plans". I've tried to combine something like this in a subquery into the above:
select assets from (select assets row_number() over (partition by assets
order by valyr DESC) as R
from pen_plans_archive) RS
where R=1 order by valyr DESC
The "assets" column definition is Number(12,0).
I expect the query to produce the columns, PID, CONAME, SEQ, VALMO, VALYR, ASSETS, and the Calculated Column PERCENTDIFF with no null values.
The first query produces null values and also is subtracting every asset figure in pen_plans_archive from pen_plans which is not what I need.
Are you just trying to do the Top function?
Select TOP 1 <column>

Find duplicates in MS SQL table

I know that this question has been asked several times but I still cannot figure out why my query is returning values which are not duplicates. I want my query to return only the records which have identical value in the column Credit. The query executes without any errors but values which are not duplicated are also being returned. This is my query:
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
That's because you're matching on the credit field back to your table, which contains duplicates. You need to isolate the rows that are duplicated with ROW_NUMBER:
;WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CREDIT ORDER BY (SELECT NULL)) AS RN
FROM _bvGLTransactionsFull)
Select
CTE.AccountDesc,
_bvGLAccountsFinancial.Description,
CTE.TxDate,
CTE.Description,
CTE.Credit,
CTE.Reference,
CTE.UserName
From
_bvGLAccountsFinancial Inner Join
CTE On _bvGLAccountsFinancial.AccountLink = CTE.AccountLink
WHERE CTE.RN > 1
Group By
CTE.AccountDesc, _bvGLAccountsFinancial.Description,
CTE.TxDate, CTE.Description,
CTE.Credit, CTE.Reference,
CTE.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(CTE.Reference), CTE.TrCode
Having
CTE.TxDate > 01 / 11 / 2014 And
CTE.Reference Like '5_____' And
CTE.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Just as a side note, I would consider using aliases to shorten your queries and make them more readable. Prefixing the table name before each column in a join is very difficult to read.
I trust your code in terms of extracting all data per your criteria. With this, let me have a different approach and see your script "as-is". So then, lets keep first all the records in a temp.
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
-- temp table
INTO #tmpTable
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Then remove the "single occurrence" data by creating a row index and remove all those 1 time indexes.
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
You can change your preference through PARTITION BY <column name>.
Should you have any concerns, please raise it first as these are so far how I understood your case.
EDIT : To include those credits that has duplicates.
SELECT
tmp1.*
FROM #tmpTable tmp1
RIGHT JOIN (
SELECT
Credit
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
) AS tmp2
ON tmp1.Credit = tmp2.Credit

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod.
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the record where the rptperiod doesnt flow concurrently for the account. For example 344 200803 would have an X next to it since it goes from 200801 to 200803.
This is for about 19321 rows and I want it on a company basis so between different companies I dont care what the numbers are, I just want the same company to show where there is breaks in the number pattern.
Any Ideas??
Thanks!
OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)
Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less that This row's RptPeriod
else 'x' -- -- Preceding row's RptPeriod is not less that This row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)
WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T
SELECT *
FROM report r
LEFT JOIN report r2
ON r.accountnumber = r.accountnumber
AND {r2.rptperiod is one day after r.rptPeriod}
JOIN report r3
ON r3.accountNumber = r.accountNumber
AND r3.rptperiod > r1.rptPeriod
WHERE r2.rptPeriod IS NULL
AND r3 IS NOT NULL
I'm not sure of sql servers date logic syntax, but hopefully you get the idea. r will be all the records where the next rptPeriod is NULL (r2) and there exists at least one greater rptPeriod (r3). The query isn't super straight forward I guess, but if you have an index on the two columns, it'll probably be the most efficent way to get your data.
Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
SELECT
AccountNumber,
RptPeriod,
CASE ISNULL(CASE WHEN r1.RptPeriod / 100 < r2.RptPeriod / 100 THEN 12 ELSE 0 END
+ r1.RptPeriod - r2.RptPeriod, 1) AS Chk
WHEN 1 THEN ''
ELSE 'X'
END
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table, (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.
The primary key output of such a select might look more random and thus less succeptable to mathematical locating (since some results would be filtered, out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5)
order by primary_key where active=1 limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how due to the gaps in the primary keys caused by the example where condition (for example becaseu there are many inactive items), I'm no longer getting the closest 5 above and 5 below, instead I'm getting the closest 1 below and the closest 9 above, instead.
There's a lot of ways to do it if you run two queries with a programming language, but here's one way to do it in one SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.
Here's another way to do it with analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause. So instead you need to use subqueries or CTE's. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.
For similar query I use analytic functions without CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
such query works more faster for me than CTE with join (answer from Scott Bailey). But of course less elegant
You could do this utilizing row_number() (available as of 8.4). This may not be the correct syntax (not familiar with postgresql), but hopefully the idea will be illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.
If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.

Selecting a single (random) row for an SQL join

I've got an sql query that selects data from several tables, but I only want to match a single(randomly selected) row from another table.
Easier to show some code, I guess ;)
Table K is (k_id, selected)
Table C is (c_id, image)
Table S is (c_id, date)
Table M is (c_id, k_id, score)
All ID-columns are primary keys, with appropriate FK constraints.
What I want, in english, is for eack row in K that has selected = 1 to get a random row from C where there exists a row in M with (K_id, C_id), where the score is higher than a given value, and where c.image is not null and there is a row in s with c_id
Something like:
select k.k_id, c.c_id, m.score
from k,c,m,s
where k.selected = 1
and m.score > some_value
and m.k_id = k.k_id
and m.c_id = c.c_id
and c.image is not null
and s.c_id = c.c_id;
The only problem is this returns all the rows in C that match the criteria - I only want one...
I can see how to do it using PL/SQL to select all relevent rows into a collection and then select a random one, but I'm stuck as to how to select a random one.
you can use the 'order by dbms_random.random' instruction with your query.
i.e.:
SELECT column FROM
(
SELECT column FROM table
ORDER BY dbms_random.value
)
WHERE rownum = 1
References:
http://awads.net/wp/2005/08/09/order-by-no-order/
http://www.petefreitag.com/item/466.cfm
with analytics:
SELECT k_id, c_id, score
FROM (SELECT k.k_id, c.c_id, m.score,
row_number() over(PARTITION BY k.k_id ORDER BY NULL) rk
FROM k, c, m, s
WHERE k.selected = 1
AND m.score > some_value
AND m.k_id = k.k_id
AND m.c_id = c.c_id
AND c.image IS NOT NULL
AND s.c_id = c.c_id)
WHERE rk = 1
This will select one row that satisfies your criteria per k_id. This will likely select the same set of rows if you run the query several times. If you want more randomness (each run produces a different set of rows), you would replace ORDER BY NULL by ORDER BY dbms_random.value
I'm not too familiar with oracle SQL, but try using LIMIT random(), if there is such a function available.