Complicated logic of group by and partition - sql

I have the table as per the script below.
The final output I want is shown in the screenshot.
The logic to be implemented is:
If SUM(FPR_QTY) > QPA, use QPA once, without summing it up.
Otherwise, use FPR_QTY.
For example: for the first 4 rows, TOT_FPR (2) > QPA (1), so I only need 1.
For the remaining 4 rows, TOT_FPR (20) < QPA (50), so I need to use TOT_FPR.
So, ultimately, I want 1 + 20 = 21 against each record.
Please let me know if my explanation is not clear.
create table #TEMP
(QPA int
,FPR_QTY int
, key1 varchar(2)
, key2 varchar(10)
)
insert into #TEMP values
(1,1,'K1','kk1')
,(1,0,'k1','kk1')
,(1,1,'k1','kk1')
,(1,0,'k1','kk1')
,(50,5,'k2','kk1')
,(50,5,'k2','kk1')
,(50,5,'k2','kk1')
,(50,5,'k2','kk1')
select *
,SUM(FPR_QTY) OVER (PARTITION BY key1) AS TOT_FPR
from #TEMP

Here you go... I was able to write the query with two SELECTs. I wish I could accept my own answer, as it is the simplest and runs without failure.
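-- Outer query: if a key1 group's total exceeds QPA, only the group's first row
-- contributes QPA (the other rows contribute 0); otherwise every row contributes FPR_QTY.
select *
,SUM(IIF(TOT_FPR > QPA, IIF(QPA_IND = 1, QPA, 0), FPR_QTY)) OVER (PARTITION BY key2) AS FINAL
from
(
select *
,SUM(FPR_QTY) OVER (PARTITION BY key1) AS TOT_FPR             -- group total per key1
,ROW_NUMBER() OVER (PARTITION BY key1 order by QPA) AS QPA_IND -- marks one row per key1
from #TEMP
) T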
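An alternative sketch that avoids the ROW_NUMBER() trick, assuming QPA is the same on every row of a key1 group and each key1 belongs to a single key2 (both true in the sample data): collapse each key1 group to one capped value with GROUP BY, total those values per key2, and join the total back to the detail rows. It expects the #TEMP table from the question's script. Under the default case-insensitive collation, 'K1' and 'k1' land in the same group (as they do in the PARTITION BY above), so every row should get FINAL = 21.

```sql
-- Assumes the #TEMP table from the question's script already exists.
-- Step 1: collapse each key1 group to one capped value (QPA once, or the group total).
;WITH per_key1 AS (
    SELECT key1, key2,
           CASE WHEN SUM(FPR_QTY) > MAX(QPA) THEN MAX(QPA)  -- assumes QPA is constant per key1
                ELSE SUM(FPR_QTY)
           END AS capped_qty
    FROM #TEMP
    GROUP BY key1, key2
),
-- Step 2: total the capped values for each key2.
per_key2 AS (
    SELECT key2, SUM(capped_qty) AS FINAL
    FROM per_key1
    GROUP BY key2
)
-- Step 3: attach the total to every detail row.
SELECT t.*, k.FINAL
FROM #TEMP AS t
JOIN per_key2 AS k ON k.key2 = t.key2;
```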

Related

Show the incremental count for repeats in SQL SERVER

Say you have a table where one column has repeat values. How can I add another column that shows how many times that value has shown up SO FAR (top-down).
E.g. you have a column, say "ccode", and in ccode the value "R52" repeats twice. Rather than joining the final count (2), I want the first appearance of R52 to have count = 1, the second count = 2, and so on...
CREATE TABLE Temp
(
ccode varchar(50),
name varchar(50),
Val1 varchar(50),
g_Name varchar(50),
ce_hybrid varchar(50)
)
INSERT INTO Temp VALUES
( 'R52' , 'adam#email.ca' , 1, 'WALT', '3P'),
( 'R52' , 'adam#email.ca' , 2 , 'KEN', '3P'),
( 'R00' , 'alison#email.ca' , 1 , 'QUIN', '3P')
SELECT ccode, name, [1_G_Name], [2_G_Name], [1_Hybrids], [2_Hybrids] FROM
(
SELECT ccode, name, col, val FROM(
SELECT *, Val1+'_G_Name' as Col, g_Name as Val FROM Temp
UNION
SELECT *, Val1+'_Hybrids' as Col, ce_hybrid as Val FROM Temp
) t
) tt
PIVOT ( max(val) for Col in ([1_G_Name], [2_G_Name], [1_Hybrids], [2_Hybrids]) ) AS pvt
For a better idea: http://sqlfiddle.com/#!18/6160d/2
I want a table like the one above, but with the Val1 column generated afterwards (dynamically) from the repeats SO FAR in the table (top-down).
The output of this query is CORRECT. But say my table didn't have the Val1 column:
INSERT INTO Temp VALUES
( 'R52', 'adam#email.ca', 'WALT', '3P'),
( 'R52', 'adam#email.ca', 'KEN', '3P'),
( 'R00', 'alison#email.ca', 'QUIN', '3P')
How would I add a Val1 column with the values (1, 2, 1), based on the repeat count as described above?
Required Output:
I got an answer thanks to an amazing senior developer at work, and I would feel bad if I didn't share it:
SELECT *, rank() over (partition by ccode order by g_name) Val1 FROM Temp
Use RANK() with PARTITION BY. Partitioning by ccode restarts the numbering at 1 for each ccode and increases it each time the same ccode appears again in the table (in g_name order).
Example 1: http://sqlfiddle.com/#!18/95a0d5/6
Example 2: http://sqlfiddle.com/#!18/569d8/1
Example 3: http://sqlfiddle.com/#!18/41bf32/1
In example 3, notice that because we ordered by g_name and ccode = R52 has two identical names (KEN and KEN), both get Val1 = 2, and the next row gets 4 (3 is skipped).
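To see the tie behaviour side by side, here is a small sketch with hypothetical data in a table variable @Demo (not the question's Temp table). If what you want is a strict running count of appearances (1, 2, 3, ...), ROW_NUMBER() is the ranking function that never repeats or skips a value; RANK() and DENSE_RANK() both give tied rows the same number. Note that "top-down" only means something through the ORDER BY, because a table has no inherent row order.

```sql
-- Hypothetical sample: R52 appears three times, twice with g_Name = 'KEN'.
DECLARE @Demo TABLE (ccode varchar(50), g_Name varchar(50));
INSERT INTO @Demo (ccode, g_Name) VALUES
    ('R52', 'WALT'), ('R52', 'KEN'), ('R52', 'KEN'), ('R00', 'QUIN');

-- For R52 the rows sort as KEN, KEN, WALT:
SELECT ccode, g_Name,
       ROW_NUMBER() OVER (PARTITION BY ccode ORDER BY g_Name) AS row_num,   -- 1, 2, 3 (tie order arbitrary)
       RANK()       OVER (PARTITION BY ccode ORDER BY g_Name) AS rnk,       -- 1, 1, 3 (gap after the tie)
       DENSE_RANK() OVER (PARTITION BY ccode ORDER BY g_Name) AS dense_rnk  -- 1, 1, 2 (no gap)
FROM @Demo
ORDER BY ccode, g_Name;
```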
I ignored the rest of the code regarding pivot since my question was more regarding this rank/partition. I'm not super familiar with it other than what was explained over a call, but hope it helps someone.
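For anyone who does want the pivot as well, here is a sketch that derives Val1 with ROW_NUMBER() and feeds it into the same column-name construction as the question's query. It assumes a hypothetical seq column that records the original row order; with ORDER BY g_Name instead, R52 would get KEN = 1 and WALT = 2, which is the reverse of the (1, 2, 1) expected in the question.

```sql
-- Hypothetical version of Temp without Val1; seq is an assumed column recording row order.
DECLARE @Temp TABLE (
    seq       int PRIMARY KEY,
    ccode     varchar(50),
    name      varchar(50),
    g_Name    varchar(50),
    ce_hybrid varchar(50)
);
INSERT INTO @Temp (seq, ccode, name, g_Name, ce_hybrid) VALUES
    (1, 'R52', 'adam#email.ca',   'WALT', '3P'),
    (2, 'R52', 'adam#email.ca',   'KEN',  '3P'),
    (3, 'R00', 'alison#email.ca', 'QUIN', '3P');

WITH numbered AS (
    -- Val1 = 1 for the first appearance of a ccode, 2 for the second, ... (by seq)
    SELECT ccode, name, g_Name, ce_hybrid,
           CAST(ROW_NUMBER() OVER (PARTITION BY ccode ORDER BY seq) AS varchar(10)) AS Val1
    FROM @Temp
),
unpivoted AS (
    -- Same column names as the question's query: <n>_G_Name and <n>_Hybrids
    SELECT ccode, name, Val1 + '_G_Name' AS Col, g_Name AS Val FROM numbered
    UNION ALL
    SELECT ccode, name, Val1 + '_Hybrids', ce_hybrid FROM numbered
)
SELECT ccode, name, [1_G_Name], [2_G_Name], [1_Hybrids], [2_Hybrids]
FROM unpivoted
PIVOT (MAX(Val) FOR Col IN ([1_G_Name], [2_G_Name], [1_Hybrids], [2_Hybrids])) AS pvt;
```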
P.S. What would be a better title for this question?

How to use a special while loop in tsql, do while numeric

I'm loading some quite nasty data through Azure Data Factory.
This is how the data looks after being loaded; it consists of 2 parts:
1. Metadata of a test
2. Actual measurements of the test -> the measurement is numeric
Imagine I have about 10 such 'packages' of 1. Metadata + 2. Measurements.
What I'm looking for is the following:
The number column with 1, 2, ... is what I'm looking for!
My screenshot stops early, but imagine this continues until id = 10.
I guess a while loop is necessary here...
Query before:
SELECT Field1 FROM Input
Query after:
SELECT GeneratedId, Field1 FROM Input
Thanks a lot in advance!
EDIT: added a hint:
Here is a solution; it requires SQL Server 2012 or later (for LEAD).
Start by getting an Id column on your data. If you can do this before the script, even better; if not, try something like this...
CREATE TABLE #InputTable (
Id INT IDENTITY(1, 1),
TestData NVARCHAR(MAX) )
INSERT INTO #InputTable (TestData)
SELECT Field1 FROM Input
Now write a query that gets the GeneratedId of each package, plus the Id where the package starts and the Id where the next one starts. Since the first record of each package starts with 'title', select all records LIKE 'title%' and use ROW_NUMBER, Id and LEAD for GeneratedId, StartId and EndId respectively. EndId is the Id of the next package's title row, so it is used as an exclusive upper bound.
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%'
Lastly, join this to the input in order to get all the records, with the correct GeneratedId.
SELECT
package.GeneratedId, i.TestData
FROM (
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%' ) package
INNER JOIN #InputTable i
ON i.Id >= package.StartId
AND (package.EndId IS NULL OR i.Id < package.EndId)
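A join-free alternative (also SQL Server 2012+): a running SUM that adds 1 each time a 'title%' row appears gives every row the number of its package directly. It rests on the same assumptions as the answer above: Id reflects the original row order, and every package starts with a 'title%' row (rows before the first title would get 0). The sample rows below are hypothetical, since the real data is only in the screenshot. One thing worth checking: an INSERT ... SELECT without ORDER BY does not guarantee that IDENTITY values follow the source order, so generating the Id upstream, as suggested above, is the safer route.

```sql
-- Hypothetical sample data in the shape of #InputTable (Id reflects the original order).
DECLARE @Input TABLE (Id int PRIMARY KEY, TestData nvarchar(max));
INSERT INTO @Input (Id, TestData) VALUES
    (1, N'title: test A'), (2, N'operator: X'), (3, N'12.5'), (4, N'13.1'),
    (5, N'title: test B'), (6, N'operator: Y'), (7, N'9.8');

SELECT
    -- Number of 'title%' rows seen so far = the package this row belongs to
    GeneratedId = SUM(CASE WHEN TestData LIKE N'title%' THEN 1 ELSE 0 END)
                  OVER (ORDER BY Id ROWS UNBOUNDED PRECEDING),
    TestData
FROM @Input
ORDER BY Id;
```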

Select column values from DB for which the subsequent row does not have a specified value

I have a table, say MyTable, with two columns, Id and Data, containing the following records:
Id  Data
-----------
1   ABCDE00
2   DEFGH11
3   CCCCC21
4   AAAAA00
5   BBBBB10
6   vvvvv00
7   xxxxx88
What I want is all records whose Data ends with the string 00 and whose subsequent row does not end with 11.
So with this condition my output should be:
AAAAA00
vvvvv00
Any help would be appreciated.
This answer makes some assumptions:
You have a column specifying the ordering. Let me call it id.
By "subsequent row" you mean the row with the next highest id.
You are using SQL Server 2012+.
In that case, lead() does what you want:
select t.*
from (select t.*, lead(data) over (order by id) as next_data
from t
) t
where data like '%00' and (next_data not like '%11' or next_data is null);
Earlier versions of SQL Server have alternative methods for calculating next_data.
If anyone is not using SQL Server 2012, they can try this:
declare @t table(id int identity(1,1),col1 varchar(100))
insert into @t values
('ABCDE00')
,('DEFGH11')
,('CCCCC21')
,('AAAAA00')
,('BBBBB10')
,('vvvvv00')
,('xxxxx88')
-- Flag rows ending with '00' (their reversed value starts with '00')
;With CTE as
(
select *,case when CHARINDEX('00',reverse(col1))=1 then 1 end
End00 from @t
)
-- Rows that directly follow a '00' row and do not themselves end with '11'
,CTE1 as
(
select a.id,a.col1 from cte A
where exists
(select id from cte b where a.id=b.id+1 and b.end00 is not null)
and CHARINDEX('11',reverse(a.col1))<>1
)
-- Return the '00' row that precedes each of those rows
select a.id,a.col1 from cte A
where exists
(select id from cte1 b where a.id=b.id-1 )
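Another pre-2012 sketch that skips the string reversal: OUTER APPLY fetches the row with the next-highest id (so gaps in id don't matter), and LIKE tests the suffix directly. A last row ending in 00 gets a NULL next value and is kept, matching the LEAD() answer above. The table variable repeats the question's sample data, with id given explicitly so the order is unambiguous; it should return ids 4 (AAAAA00) and 6 (vvvvv00).

```sql
DECLARE @t TABLE (id int PRIMARY KEY, col1 varchar(100));
INSERT INTO @t (id, col1) VALUES
    (1, 'ABCDE00'), (2, 'DEFGH11'), (3, 'CCCCC21'), (4, 'AAAAA00'),
    (5, 'BBBBB10'), (6, 'vvvvv00'), (7, 'xxxxx88');

SELECT a.id, a.col1
FROM @t AS a
OUTER APPLY (
    -- The row with the next-highest id (no row for the last one, so col1 is NULL)
    SELECT TOP (1) b.col1
    FROM @t AS b
    WHERE b.id > a.id
    ORDER BY b.id
) AS nxt
WHERE a.col1 LIKE '%00'
  AND (nxt.col1 IS NULL OR nxt.col1 NOT LIKE '%11')
ORDER BY a.id;
```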

Find gaps in auto incremented values

Imagine having a table as the one below:
create table test (
id int identity(1,1),
[some] int,
[columns] int
)
This table then gets used a lot: rows are inserted and deleted, and over time there may be gaps in the numbers that were once auto-incremented. For example, if at some point I run the following query:
select top 10 id from test order by id
I might get something like
3
4
6
7
9
10
13
14
18
19
How do I design a query that returns the missing values 1,2,5,8 etc?
The easiest way is to get ranges of missing values:
select (id + 1) as firstmissing, (nextid - 1) as lastmissing
from (select t.id, lead(id) over (order by id) as nextid
from test t
) t
where nextid is not null and nextid <> id + 1;
Note this uses the lead() function, which is available in SQL Server 2012+. You can do something similar with apply or a subquery in earlier versions. Here is an example:
select (id + 1) as firstmissing, (nextid - 1) as lastmissing
from (select t.id, tt.id as nextid
from test t cross apply
(select top 1 id
from test t2
where t2.id > t.id
order by id
) tt
) t
where nextid is not null and nextid <> id + 1;
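Both queries above only report gaps between existing ids, so the 1-2 at the start of the question's example is not listed. Here is a sketch that adds that leading gap, assuming numbering is meant to start at 1 (SQL Server 2012+ for LEAD). It uses a hypothetical #gaps table holding the ids from the question, and should return the ranges 1-2, 5-5, 8-8, 11-12 and 15-17.

```sql
-- Hypothetical sample matching the ids listed in the question.
CREATE TABLE #gaps (id int PRIMARY KEY);
INSERT INTO #gaps (id) VALUES (3), (4), (6), (7), (9), (10), (13), (14), (18), (19);

-- Gap before the smallest id (HAVING without GROUP BY treats the table as one group).
SELECT 1 AS firstmissing, MIN(id) - 1 AS lastmissing
FROM #gaps
HAVING MIN(id) > 1
UNION ALL
-- Gaps between consecutive ids, as in the LEAD() query above.
SELECT id + 1, nextid - 1
FROM (SELECT id, LEAD(id) OVER (ORDER BY id) AS nextid
      FROM #gaps) AS t
WHERE nextid IS NOT NULL AND nextid <> id + 1
ORDER BY firstmissing;
```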
A simple way is to use a recursive CTE:
;WITH cte
AS (SELECT 1 id
UNION ALL
SELECT id + 1 id from cte
WHERE id < (SELECT Max(id)
FROM tablename))
SELECT *
FROM cte
WHERE id NOT IN(SELECT id
FROM tablename)
OPTION (MAXRECURSION 0) -- the default limit of 100 recursion levels would fail for larger max ids
Note: this starts from 1. If you want to start from the minimum value in your table, replace
"SELECT 1 id" with "SELECT Min(id) id FROM tablename".
Why does it matter? I'm not trying to be snarky, but this question is usually asked in the context of "I want to fill in the gaps" or "I want to compress my id values to be contiguous". In either case, the answer is "don't do it". In your example, there was at some point a row with id = 5. If you do either of the above, you'll be assigning a different, unrelated set of business data to that id. If anything outside your database references the id, you've just invented a problem you didn't have before. The id should be treated as immutable and arbitrary for all intents and purposes. If you really require it to be gapless, don't use identity and never do a hard delete (i.e. if you need to deactivate a row, add a column which says whether it's active or not).
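A minimal sketch of the soft-delete approach described above, using a hypothetical #items table with an IsActive flag: a row is deactivated with an UPDATE instead of being deleted, so its id is never freed, and normal reads filter on the flag. (This keeps ids from being reused; it does not by itself make an identity column gapless, since failed inserts can still consume identity values.)

```sql
-- Hypothetical table: identity key plus an IsActive flag instead of hard deletes.
CREATE TABLE #items (
    id       int IDENTITY(1,1) PRIMARY KEY,
    payload  int,
    IsActive bit NOT NULL DEFAULT 1
);
INSERT INTO #items (payload) VALUES (10), (20), (30);

-- Deactivate a row instead of deleting it; its id stays taken.
UPDATE #items SET IsActive = 0 WHERE id = 2;

-- Normal reads filter on the flag.
SELECT id, payload
FROM #items
WHERE IsActive = 1;
```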

SQL Server: row present in one query, missing in another

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principle: money always moves between accounts, so a transaction is 2 or more TransactionParts rows, decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being shown by my view. I assumed there must be an issue with the way I've written my view, and ran a query to see if anything else was missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results, not even for Transaction 30! Thinking I must be an idiot, I ran the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans for select * and for select [Transaction]. They are 1000 lines each, hence hosted elsewhere.
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);
Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
The row_number() order among duplicate NAMEs isn't specified, so SQL Server assigns numbers in whatever order suits the execution plan chosen for the required output columns. In the second query both cte references use the same access path; in the first, they use different access paths, so the row numbering differs between them.
Suggested Solution
You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])
Contrary to what you might expect, the CTE will most likely not be materialised (which would have ensured consistent numbering on both sides of the self-join). So the query assumes a correlation between the ROW_NUMBER() values on both sides that may well not hold for records with a duplicate [Date].
What if you try ROW_NUMBER() over (order by t.[Date], t.ID)? That guarantees a consistent row numbering when dates are tied. (Or use some other column or combination of columns that distinguishes the records, if ID won't do.)
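On SQL Server 2012 or later, the self-join can also be replaced by a windowed SUM, which computes each account's running balance in one pass; ordering by [Date] and then transaction ID breaks ties between equal dates deterministically. This is a sketch of an alternative technique, not a drop-in replacement for the view (it has no RowNumber column). The temp tables are hypothetical stand-ins for Accounting.Transactions and Accounting.TransactionParts, and the CTE repeats the grouping done by CondensedEntryView. With this data, account 23 should show 50, 80 and 90.

```sql
-- Hypothetical stand-ins for Accounting.Transactions and Accounting.TransactionParts.
CREATE TABLE #Transactions (ID int PRIMARY KEY, [Date] date NOT NULL);
CREATE TABLE #Parts ([Transaction] int NOT NULL, Account smallint NOT NULL, Amount money NOT NULL);

-- Transactions 2 and 3 share a date; transaction 3 has two parts on account 23.
INSERT INTO #Transactions (ID, [Date]) VALUES (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-02');
INSERT INTO #Parts ([Transaction], Account, Amount) VALUES
    (1, 10, -50), (1, 23, 50),
    (2, 10, -30), (2, 23, 30),
    (3, 10, -10), (3, 23, 6), (3, 23, 4);

WITH condensed AS (
    -- Same grouping as CondensedEntryView: one row per (transaction, account)
    SELECT [Transaction], Account, SUM(Amount) AS Amount
    FROM #Parts
    GROUP BY [Transaction], Account
)
SELECT t.ID AS [Transaction], c.Account,
       -- Running balance per account; t.ID breaks ties between equal dates
       SUM(c.Amount) OVER (PARTITION BY c.Account
                           ORDER BY t.[Date], t.ID
                           ROWS UNBOUNDED PRECEDING) AS Balance
FROM #Transactions AS t
JOIN condensed AS c ON c.[Transaction] = t.ID
ORDER BY c.Account, t.[Date], t.ID;
```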
If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?
It seems you are reading dirty entries (someone else is deleting/inserting data at the same time).
Try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
I've tried this code (which seems equivalent to yours)
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows
RowNumber  Transaction  Account
1          2            70
2          3            70