LAG() function in SQL Server 2008

I have looked at a few other questions about this problem. We are trying to get a stored procedure working that contains the LAG() function. However, the machine we are now installing an instance on runs SQL Server 2008, so we can't use LAG(), which was introduced in SQL Server 2012:
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=#colID and se.IsDeleted=0
order by se.SetID
What I've tried so far (edited to reflect Zohar Peled's suggestion):
SELECT se.SetID,se.SetName,se.ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level,
QuestionType
FROM tblSet se
left join tblSet se2 on se.ParentSetId = se2.ParentSetId -1
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where se.CollectionId=#colID and se.IsDeleted=0
order by se.SetID
It does not seem to return the same records when I run the two queries side by side, and the level values also differ.
I have put some of the output into an HTML-formatted table. The first results come from the version containing LAG(), and the second come from the new version, where the levels do not match:
https://jsfiddle.net/gyn8Lv3u/

LAG() can be implemented using a self-join, as Jeroen wrote in his comment, or by using a correlated subquery. In this case it's a simple LAG(), so the correlated subquery is also simple:
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (
(
SELECT TOP 1 ParentSetId
FROM tblSet seInner
WHERE seInner.ParentSetId < se.ParentSetId
ORDER BY seInner.ParentSetId DESC
)
<> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=#colID and se.IsDeleted=0
order by se.SetID
If you needed an offset greater than 1, it would be harder to implement with a correlated subquery, and a self-join would be a much easier solution.
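A row-numbering self-join is one way to get exact LAG() behavior on SQL Server 2008, since ROW_NUMBER() and common table expressions are already available there. The sketch below tries the idea with Python's built-in sqlite3 module as a stand-in database. It assumes SQLite 3.25 or later, which has both ROW_NUMBER() and a native LAG() to compare against. The sets table and its rows are hypothetical. The unique SetID is added to the ORDER BY as a tiebreaker so that "previous row" is well defined. Note that a filter like seInner.ParentSetId < se.ParentSetId skips rows with an equal ParentSetId, so it does not reproduce LAG() when ParentSetId has duplicates. Numbering the rows first avoids that problem.

```python
import sqlite3

# Hypothetical sample data: (SetID, ParentSetId); ParentSetId has duplicates.
ROWS = [(1, 10), (2, 10), (3, 20), (4, 20), (5, 20), (6, 30)]

def lag_comparison():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sets (SetID INTEGER PRIMARY KEY, ParentSetId INTEGER)")
    conn.executemany("INSERT INTO sets VALUES (?, ?)", ROWS)

    # Emulation: number the rows, then join each row to the one numbered one lower.
    emulated = conn.execute("""
        WITH numbered AS (
            SELECT SetID, ParentSetId,
                   ROW_NUMBER() OVER (ORDER BY ParentSetId, SetID) AS rn
            FROM sets
        )
        SELECT cur.SetID,
               CASE WHEN prev.ParentSetId <> cur.ParentSetId THEN 2 ELSE 1 END AS level
        FROM numbered AS cur
        LEFT JOIN numbered AS prev ON prev.rn = cur.rn - 1
        ORDER BY cur.SetID
    """).fetchall()

    # Reference: the native LAG() that SQL Server 2008 lacks.
    native = conn.execute("""
        SELECT SetID,
               CASE WHEN LAG(ParentSetId) OVER (ORDER BY ParentSetId, SetID) <> ParentSetId
                    THEN 2 ELSE 1 END AS level
        FROM sets
        ORDER BY SetID
    """).fetchall()
    conn.close()
    return emulated, native

emulated, native = lag_comparison()
print(emulated)  # [(1, 1), (2, 1), (3, 2), (4, 1), (5, 1), (6, 2)]
```

Joining on prev.rn = cur.rn - k instead would emulate LAG(ParentSetId, k). For the first row, prev is NULL, so the comparison is not true and the CASE falls through to 1, just as it does with LAG().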

Sample data and desired results would help. This construct:
(case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1
end) as level
is quite strange. You are lagging the only column used in the ORDER BY, and that makes sense. But then you are comparing the lagged value with the same column, which implies that there are duplicates.
If you have duplicates, then order by se.ParentSetId is unstable. That is, the "previous" row is indeterminate because of the duplicate values being ordered. You can run the query twice and get different results.
I am guessing you want one row with the value 1 for each parent set id. If so, then in either version of SQL Server you would use:
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level
This also has the problem of unstable ordering. You can fix it by changing the ORDER BY to the ordering you really want, for example by adding a unique column such as se.SetID as a tiebreaker.
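The following is a minimal sketch of that fix. It uses Python's sqlite3 module as a stand-in for SQL Server and assumes SQLite 3.25+ for window functions. The sets table is hypothetical, and choosing SetID as the tiebreaker is an assumption. Ordering by a unique column inside each partition makes it deterministic which row gets level 1.

```python
import sqlite3

# Hypothetical sample data: (SetID, ParentSetId).
ROWS = [(1, 10), (2, 10), (3, 20), (4, 20), (5, 20), (6, 30)]

def levels():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sets (SetID INTEGER PRIMARY KEY, ParentSetId INTEGER)")
    conn.executemany("INSERT INTO sets VALUES (?, ?)", ROWS)
    # Row 1 of each ParentSetId partition (lowest SetID) gets level 1, the rest level 2.
    result = conn.execute("""
        SELECT SetID,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY ParentSetId
                                            ORDER BY SetID) = 1
                    THEN 1 ELSE 2 END AS level
        FROM sets
        ORDER BY SetID
    """).fetchall()
    conn.close()
    return result

print(levels())  # [(1, 1), (2, 2), (3, 1), (4, 2), (5, 2), (6, 1)]
```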

Related

Find records in SQL Server 2008 R2

Please see the image below to understand my issue. I have a table as shown in the picture below, and I need to get only the highlighted (yellow) records. What is the best way to find these records?
In SQL Server 2012+, you can use the lead() and lag() functions. However, this is not available in SQL Server 2008. Here is a method using outer apply:
select t.*
from t outer apply
(select top 1 tprev.*
from t tprev
where tprev.time < t.time
order by tprev.time desc
) tprev outer apply
(select top 1 tnext.*
from t tnext
where tnext.time > t.time
order by tnext.time asc
) tnext
where (t.cardtype = 1 and tnext.cardtype = 2) or
(t.cardtype = 2 and tprev.cardtype = 1);
With your sample data, it would also be possible to use self-joins on the id column. This seems unsafe, though, because there could be gaps in that column's values.
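SQLite has no OUTER APPLY. The sketch below therefore substitutes correlated scalar subqueries with ORDER BY ... LIMIT 1, which play the role of TOP 1 in T-SQL, to fetch the previous and next card type by time. It runs through Python's sqlite3 module. The question's image is not available, so the table t and its rows are hypothetical.

```python
import sqlite3

# Hypothetical swipe log: (id, time, cardtype).
ROWS = [
    (1, "08:00", 1), (2, "08:05", 2),
    (3, "09:00", 1), (4, "09:10", 1),
    (5, "09:20", 2), (6, "10:00", 2),
]

def highlighted_ids():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, time TEXT, cardtype INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    rows = conn.execute("""
        SELECT id FROM (
            SELECT t.id, t.cardtype,
                   -- previous row by time (what the first OUTER APPLY fetches)
                   (SELECT p.cardtype FROM t AS p WHERE p.time < t.time
                    ORDER BY p.time DESC LIMIT 1) AS prev_type,
                   -- next row by time (what the second OUTER APPLY fetches)
                   (SELECT n.cardtype FROM t AS n WHERE n.time > t.time
                    ORDER BY n.time ASC LIMIT 1) AS next_type
            FROM t
        ) AS x
        WHERE (cardtype = 1 AND next_type = 2)
           OR (cardtype = 2 AND prev_type = 1)
        ORDER BY id
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(highlighted_ids())  # [1, 2, 4, 5]
```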
I haven't tried this, but I think it should work. First, create a view of the table in your question that includes the row number as a column:
CREATE VIEW v AS
SELECT
ROW_NUMBER() OVER(ORDER BY id) AS rownum,
id,
time,
card,
card_type
FROM [table] -- replace [table] with your table name
Then, you can get all the rows of type 1 followed by a row of type 2 like this:
SELECT
a.id,
-- And so on...
FROM v AS a
JOIN v AS b ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
And all the rows of type 2 preceded by a row of type 1 like this:
SELECT
b.id,
-- And so on...
FROM v AS b
JOIN v AS a ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
To get them both in the same set of results, you can just use UNION ALL. Technically, you don't need the view. You could use nested selects instead, but since you will need to query the table four times it might be nice to have it as a view.
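Here is a sketch of the view-plus-self-join approach combined with UNION ALL. It uses Python's sqlite3 module and assumes SQLite 3.25+ so that the view can use ROW_NUMBER(). The swipes table and its rows are hypothetical. The ids deliberately have gaps to show that joining on rownum, rather than on id, still pairs adjacent rows.

```python
import sqlite3

# Hypothetical data with gaps in id: (id, time, card_type).
ROWS = [(1, "08:00", 1), (2, "08:05", 2), (5, "09:00", 1),
        (9, "09:10", 1), (10, "09:20", 2), (12, "10:00", 2)]

def highlighted_ids():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE swipes (id INTEGER PRIMARY KEY, time TEXT, card_type INTEGER)")
    conn.executemany("INSERT INTO swipes VALUES (?, ?, ?)", ROWS)
    # The view numbers rows 1, 2, 3, ... without gaps, whatever the id values are.
    conn.execute("""
        CREATE VIEW v AS
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS rownum, id, time, card_type
        FROM swipes
    """)
    rows = conn.execute("""
        SELECT a.id FROM v AS a JOIN v AS b ON b.rownum = a.rownum + 1
        WHERE a.card_type = 1 AND b.card_type = 2      -- type 1 followed by type 2
        UNION ALL
        SELECT b.id FROM v AS b JOIN v AS a ON b.rownum = a.rownum + 1
        WHERE a.card_type = 1 AND b.card_type = 2      -- type 2 preceded by type 1
        ORDER BY 1
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(highlighted_ids())  # [1, 2, 9, 10]
```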
Also, if the ID is contiguous (it goes 1, 2, 3 without any gaps), you don't need the row number and can just use the ID instead.
Here is code you can run in SQL Server:
select * from Table_name where id in (1,2,6,7,195,160,164,165)

Optimization of multiple aggregate sorting in SQL

I have a Postgres query written for the Spree Commerce store that sorts all of its products in the following order: In stock (then first available), Backorder (then first available), Sold out (then first available).
To chain it with Rails scopes, I had to put it in the ORDER BY clause rather than anywhere else. The query works and is fairly performant, but it is complex. I was curious whether anyone with more knowledge could discuss a better way to do it. I'm interested in performance, but also in different ways to approach the problem.
ORDER BY (
SELECT
CASE
WHEN tt.count_on_hand > 0
THEN 2
WHEN zz.backorderable = true
THEN 1
ELSE 0
END
FROM (
SELECT
row_number() OVER (dpartition),
z.id,
bool_or(backorderable) OVER (dpartition) as backorderable
FROM (
SELECT DISTINCT ON (spree_variants.id) spree_products.id, spree_stock_items.backorderable as backorderable
FROM spree_products
JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id" AND "spree_variants"."deleted_at" IS NULL
JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id" AND "spree_stock_items"."deleted_at" IS NULL
JOIN "spree_stock_locations" ON spree_stock_locations.id=spree_stock_items.stock_location_id
WHERE spree_stock_locations.active = true
) z window dpartition as (PARTITION by id)
) zz
JOIN (
SELECT
row_number() OVER (dpartition),
t.id,
sum(count_on_hand) OVER (dpartition) as count_on_hand
FROM (
SELECT DISTINCT ON (spree_variants.id) spree_products.id, spree_stock_items.count_on_hand as count_on_hand
FROM spree_products
JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id" AND "spree_variants"."deleted_at" IS NULL
JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id" AND "spree_stock_items"."deleted_at" IS NULL
) t window dpartition as (PARTITION by id)
) tt ON tt.row_number = 1 AND tt.id = spree_products.id
WHERE zz.row_number = 1 AND zz.id=spree_products.id
) DESC, available_on DESC
The FROM shown above determines whether or not a product is backorderable, and the JOIN shown above determines the stock in inventory. Note that these are very similar queries, except that I need to determine whether something is backorderable based on a location's ability to support backorders and its state (WHERE spree_stock_locations.active = true).
Thanks for any advice!
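One alternative worth measuring is to compute a single rank per product with GROUP BY instead of two window-function subqueries. The sketch below is hypothetical. It flattens the Spree tables into an assumed stock_items table that carries a location_active flag. It uses MAX over 0/1 values where PostgreSQL would use bool_or, and it runs on Python's sqlite3 rather than PostgreSQL. Stock is summed across all locations, but backorderability only counts at active locations, mirroring the description above. The sketch does not reproduce the DISTINCT ON (spree_variants.id) step.

```python
import sqlite3

def ranked_products():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, available_on TEXT);
        -- Hypothetical flattened stock table: one row per stock item.
        CREATE TABLE stock_items (product_id INTEGER, count_on_hand INTEGER,
                                  backorderable INTEGER, location_active INTEGER);
        INSERT INTO products VALUES (1, '2020-01-01'), (2, '2020-02-01'),
                                    (3, '2020-03-01'), (4, '2020-04-01');
        INSERT INTO stock_items VALUES
            (1, 5, 0, 1),   -- in stock
            (2, 0, 1, 1),   -- backorderable at an active location
            (3, 0, 1, 0),   -- backorderable only at an inactive location
            (4, 3, 0, 1);   -- in stock
    """)
    rows = conn.execute("""
        SELECT p.id
        FROM products AS p
        JOIN (
            SELECT product_id,
                   CASE
                       WHEN SUM(count_on_hand) > 0 THEN 2
                       -- only active locations count towards backorderability
                       WHEN MAX(CASE WHEN location_active = 1
                                     THEN backorderable ELSE 0 END) = 1 THEN 1
                       ELSE 0
                   END AS stock_rank
            FROM stock_items
            GROUP BY product_id
        ) AS s ON s.product_id = p.id
        ORDER BY s.stock_rank DESC, p.available_on DESC
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(ranked_products())  # [4, 1, 2, 3]
```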

How to perform running sum (balance) in SQL

I have two SQL tables:
unit_transaction
unit_detail_transactions
(table schemas here: http://sqlfiddle.com/#!3/e3204/2 )
What I need is an SQL query that generates a table with balances. Right now I have the query below, but it breaks when two transactions have the same date: the balance is then not calculated correctly.
SELECT
ft.transactionid,
ft.date,
ft.reference,
ft.transactiontype,
CASE ftd.isdebit WHEN 1 THEN MAX(ftd.debitaccountid) ELSE MAX(ftd.creditaccountid) END as financialaccountname,
CAST(COUNT(0) as tinyint) as totaldetailrecords,
ftd.isdebit,
SUM(ftd.amount) as amount,
balance.amount as balance
FROM unit_transaction_details ftd
JOIN unit_transactions ft ON ft.transactionid = ftd.transactionid
JOIN
(
SELECT DISTINCT
a.transactionid,
SUM(CASE b.isdebit WHEN 1 THEN b.amount ELSE -ABS(b.amount) END) as amount
--SUM(b.debit-b.credit) as amount
FROM unit_transaction_details a
JOIN unit_transactions ft ON ft.transactionid = a.transactionid
CROSS JOIN unit_transaction_details b
JOIN unit_transactions ft2 ON ft2.transactionid = b.transactionid
WHERE (ft2.date <= ft.date)
AND ft.unitid = 1
AND ft2.unitid = 1
AND a.masterentity = 'CONDO-A'
GROUP BY a.transactionid,a.amount
) balance ON balance.transactionid = ft.transactionid
WHERE
ft.unitid = 1
AND ftd.isactive = 1
GROUP BY
ft.transactionid,
ft.date,
ft.reference,
ft.transactiontype,
ftd.isdebit,
balance.amount
ORDER BY ft.date DESC
The result of the query is this:
Any clue how to write a correct query that shows the right balances, ordered by transaction date in descending order?
Thanks a lot.
EDIT: I have thought of two possible solutions.
The problem occurs when two transactions have the same date, so here is what I'm going to do:
Save both the date and the time in the "date" column. That way no two transactions will have exactly the same date.
OR
Create a "priority" column and set a priority for each record: if a record with the same date already exists and has priority = 1, the current record gets priority 2.
What do you think?
There are two ways to do a running sum. I am going to show the syntax on a simpler table, to give you an idea.
Some databases (Oracle, PostgreSQL, SQL Server 2012 and later, Teradata, DB2, for instance) support cumulative sums directly. For this you use a window function like the following:
select sum(<val>) over (partition by <column> order by <ordering column>)
from t
This is a window function that calculates the running sum of <val> for each group of records identified by <column>. The order of the sum is determined by <ordering column>.
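The following sketch runs that window-function form through Python's sqlite3 module, assuming SQLite 3.25+. It uses a hypothetical ledger table in which two transactions share a date. It illustrates why same-date rows go wrong. With ORDER BY date alone, the default window frame (RANGE from the start up to the current row) treats rows with equal dates as peers, so they get the same running total. Adding a unique tiebreaker such as transactionid to the window's ORDER BY gives each row its own balance.

```python
import sqlite3

# Hypothetical ledger: transactions 2 and 3 share the same date.
ROWS = [(1, "2020-01-01", 100), (2, "2020-01-02", -30),
        (3, "2020-01-02", 50), (4, "2020-01-03", -20)]

def running_balances(tiebreak=True):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (transactionid INTEGER PRIMARY KEY, "
                 "date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO ledger VALUES (?, ?, ?)", ROWS)
    # With ORDER BY date alone, same-date rows are peers and share one running total.
    order = "date, transactionid" if tiebreak else "date"
    rows = conn.execute(f"""
        SELECT transactionid,
               SUM(amount) OVER (ORDER BY {order}) AS balance
        FROM ledger
        ORDER BY date DESC, transactionid DESC
    """).fetchall()
    conn.close()
    return rows

print(running_balances())       # [(4, 100), (3, 120), (2, 70), (1, 100)]
print(running_balances(False))  # [(4, 100), (3, 120), (2, 120), (1, 100)]
```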
Alas, many databases don't support this functionality, so you would need a self-join to do this in a single SELECT query:
select t.<column>, t.<ordering column>, sum(tprev.<val>) as cumsum
from t left join
t tprev
on t.<column> = tprev.<column> and
t.<ordering column> >= tprev.<ordering column>
group by t.<column>, t.<ordering column>
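The self-join works without window functions, so it fits SQL Server 2008. The sketch below runs it through Python's sqlite3 module on a hypothetical ledger table. To handle transactions that share a date, it compares (date, transactionid) instead of date alone. This is the same idea as the "priority" column proposed in the question, but it reuses an existing unique key. Using transactionid as the tiebreaker is an assumption.

```python
import sqlite3

# Hypothetical ledger: transactions 2 and 3 share the same date.
ROWS = [(1, "2020-01-01", 100), (2, "2020-01-02", -30),
        (3, "2020-01-02", 50), (4, "2020-01-03", -20)]

def running_balances():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (transactionid INTEGER PRIMARY KEY, "
                 "date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO ledger VALUES (?, ?, ?)", ROWS)
    rows = conn.execute("""
        SELECT t.transactionid, SUM(tprev.amount) AS balance
        FROM ledger AS t
        JOIN ledger AS tprev
          -- earlier date, or same date with a lower-or-equal id (the tiebreaker)
          ON tprev.date < t.date
          OR (tprev.date = t.date AND tprev.transactionid <= t.transactionid)
        GROUP BY t.transactionid, t.date
        ORDER BY t.date DESC, t.transactionid DESC
    """).fetchall()
    conn.close()
    return rows

print(running_balances())  # [(4, 100), (3, 120), (2, 70), (1, 100)]
```

Each row always matches itself, so a plain inner join suffices here. The self-join does O(n²) work per group, which is why the window-function form is preferable where it is available.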
There is also the possibility of creating another table and using a cursor to assign the cumulative sum, or of doing the sum at the application level.

Duplicate results returned from query when distinct is used

On a current project at I need to paginate results returned from SQL. I have hit a corner case: the query can accept identifiers as part of the WHERE clause. Normally this isn't an issue. In one case, however, a single identifier is passed in that has a one-to-many relationship with one of the tables the query joins on, so the query returns multiple rows. That issue was fixed by adding DISTINCT to the query. The following query returns the correct result of one row (all table and field names have been changed, of course):
select distinct [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
However, when I introduce pagination into the query, it returns multiple rows when it should only return one. The method I am using for pagination is as follows:
select [item_id]
from (
select distinct [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
) as [items]
where [items].[row_num] between 0 and 100
I worry that adding DISTINCT to the outer query will return an incorrect number of results, and I am unsure how else to fix this issue. The database I am querying is MS SQL Server 2008.
About five minutes after posting the question, a possible solution occurred to me. Grouping by item_id (and any sort criteria) should leave only one instance of each item and solve the issue. After testing, this is the query I was left with:
select [item_id]
from (
select [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
group by [item_table].[item_id], [item_table].[pub_date]
) as [items]
where [items].[row_num] between 0 and 100
I don't see where the DISTINCT is adding any value in your first query. The results are [item_table].[item_id] and [row_num]. Because the value of [row_num] is already distinct, the combination of [item_table].[item_id] and [row_num] will be distinct. When adding the DISTINCT keyword to the query, no rows are excluded.
In the second query, your results return [item_id] from the subquery where [row_num] meets the criteria. If there were duplicate [item_id] values in the subquery, there will be duplicates in the final results, but now you don't display [row_num] to distinguish them.
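The sketch below checks this explanation through Python's sqlite3 module, assuming SQLite 3.25+. The item_table and child tables and their rows are hypothetical. ROW_NUMBER() is computed before DISTINCT is applied, so every joined row gets its own number and DISTINCT has nothing to remove. GROUP BY collapses the rows first, so ROW_NUMBER() then numbers one row per item.

```python
import sqlite3

def compare():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item_table (item_id INTEGER PRIMARY KEY, pub_date TEXT);
        -- Hypothetical one-to-many child table: item 1404309 has three children.
        CREATE TABLE child (item_id INTEGER, tag TEXT);
        INSERT INTO item_table VALUES (1404309, '2015-06-01');
        INSERT INTO child VALUES (1404309, 'a'), (1404309, 'b'), (1404309, 'c');
    """)
    # DISTINCT runs after ROW_NUMBER(), so the three joined rows stay distinct.
    with_distinct = conn.execute("""
        SELECT DISTINCT i.item_id,
               ROW_NUMBER() OVER (ORDER BY i.pub_date DESC, i.item_id) AS row_num
        FROM item_table AS i LEFT JOIN child AS c ON c.item_id = i.item_id
    """).fetchall()
    # GROUP BY runs before ROW_NUMBER(), so only one row per item gets numbered.
    with_group_by = conn.execute("""
        SELECT i.item_id,
               ROW_NUMBER() OVER (ORDER BY i.pub_date DESC, i.item_id) AS row_num
        FROM item_table AS i LEFT JOIN child AS c ON c.item_id = i.item_id
        GROUP BY i.item_id, i.pub_date
    """).fetchall()
    conn.close()
    return with_distinct, with_group_by

d, g = compare()
print(sorted(d))  # [(1404309, 1), (1404309, 2), (1404309, 3)]
print(g)          # [(1404309, 1)]
```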

Sql Server Query Selecting Top and grouping by

SpousesTable: SpouseID
SpousePreviousAddressesTable: PreviousAddressID, SpouseID, FromDate, AddressTypeID
What I have now updates only the single most recent address in the whole table, setting AddressTypeID = 1 on it regardless of SpouseID.
I want to set AddressTypeID = 1 on the most recent SpousePreviousAddresses row for each unique SpouseID in the SpousePreviousAddresses table.
UPDATE spa
SET spa.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa INNER JOIN Spouses ON spa.SpouseID = Spouses.SpouseID,
(SELECT TOP 1 SpousePreviousAddresses.* FROM SpousePreviousAddresses
INNER JOIN Spouses AS s ON SpousePreviousAddresses.SpouseID = s.SpouseID
WHERE SpousePreviousAddresses.CountryID = 181 ORDER BY SpousePreviousAddresses.FromDate DESC) as us
WHERE spa.PreviousAddressID = us.PreviousAddressID
I think I need a group by but my sql isn't all that hot. Thanks.
Update: this is working
I was wrong earlier about having found a solution. Below is the solution I am going with:
WITH result AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) AS rowNumber, *
FROM SpousePreviousAddresses
WHERE CountryID = 181
)
UPDATE result
SET AddressTypeID = 1
FROM result WHERE rowNumber = 1
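The following sketch runs that per-spouse ROW_NUMBER() logic through Python's sqlite3 module, assuming SQLite 3.25+. The rows are hypothetical. SQLite cannot UPDATE through a CTE the way SQL Server can. The sketch therefore swaps in an UPDATE that matches on PreviousAddressID against the rows numbered 1. Filtering CountryID = 181 inside the CTE means the numbering only considers those addresses.

```python
import sqlite3

def mark_most_recent():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SpousePreviousAddresses (
            PreviousAddressID INTEGER PRIMARY KEY, SpouseID INTEGER,
            FromDate TEXT, AddressTypeID INTEGER, CountryID INTEGER);
        INSERT INTO SpousePreviousAddresses VALUES
            (1, 10, '2001-01-01', 0, 181), (2, 10, '2005-01-01', 0, 181),
            (3, 20, '2003-01-01', 0, 181), (4, 20, '2008-01-01', 0, 99),
            (5, 30, '2002-01-01', 0, 181);
    """)
    # SQLite cannot UPDATE through a CTE, so update by key instead.
    conn.execute("""
        WITH result AS (
            SELECT PreviousAddressID,
                   ROW_NUMBER() OVER (PARTITION BY SpouseID
                                      ORDER BY FromDate DESC) AS rowNumber
            FROM SpousePreviousAddresses
            WHERE CountryID = 181
        )
        UPDATE SpousePreviousAddresses
        SET AddressTypeID = 1
        WHERE PreviousAddressID IN (SELECT PreviousAddressID FROM result
                                    WHERE rowNumber = 1)
    """)
    marked = [r[0] for r in conn.execute(
        "SELECT PreviousAddressID FROM SpousePreviousAddresses "
        "WHERE AddressTypeID = 1 ORDER BY 1")]
    conn.close()
    return marked

print(mark_most_recent())  # [2, 3, 5]
```

Spouse 20's newest address (id 4) is outside country 181, so its most recent country-181 address (id 3) is the one marked.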
Presuming you are using SQL Server 2005 (based on the error message you got from your previous attempt), probably the most straightforward way to do this is to use the ROW_NUMBER() function together with a common table expression. I think this might do what you are looking for:
WITH result AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) as rowNumber,
*
FROM
SpousePreviousAddresses
WHERE CountryID = 181 -- number only the addresses being considered
)
UPDATE spa
SET
AddressTypeID = 1
FROM
SpousePreviousAddresses spa
INNER JOIN result r ON spa.SpouseId = r.SpouseId
WHERE r.rowNumber = 1
AND spa.PreviousAddressID = r.PreviousAddressID
In SQL Server 2005, ROW_NUMBER() is one of the most powerful functions around. It is very useful in lots of situations, and the time spent learning about it will be repaid many times over.
The CTE simplifies the code a bit, as it removes the need for a temporary table of some kind to store the intermediate result.
The resulting query should be fast and efficient. I know the SELECT in the CTE uses *, which is a bit of overkill since we don't need all the columns, but it may help anyone who wants to see what is happening inside the query.
Here's one way to do it:
UPDATE spa1
SET spa1.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa1
LEFT OUTER JOIN SpousePreviousAddresses AS spa2
ON (spa1.SpouseID = spa2.SpouseID AND spa1.FromDate < spa2.FromDate)
WHERE spa1.CountryID = 181 AND spa2.SpouseID IS NULL;
In other words, update the row spa1 for which no other row spa2 exists with the same spouse and a greater (more recent) date.
There's exactly one row for each value of SpouseID that has the greatest date compared to all other rows (if any) with the same SpouseID, assuming no two rows for the same SpouseID share the same FromDate.
There's no need to use a GROUP BY, because there's kind of an implicit grouping done by the join.
update: I think you misunderstand the purpose of the OUTER JOIN. If there is no row spa2 that matches all the join conditions, then all columns of spa2.* are returned as NULL. That's how outer joins work. So you can search for the cases where spa1 has no matching row spa2 by testing that spa2.SpouseID IS NULL.
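To see the outer-join trick in action, the sketch below runs the same join as a SELECT to list the rows the UPDATE would touch. It runs through Python's sqlite3 module on hypothetical rows. spa1 survives only where the LEFT OUTER JOIN finds no later spa2, so spa2.SpouseID comes back NULL. Running the join as a SELECT first is also a cheap way to preview an UPDATE ... FROM in SQL Server.

```python
import sqlite3

def rows_to_update():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SpousePreviousAddresses (
            PreviousAddressID INTEGER PRIMARY KEY, SpouseID INTEGER,
            FromDate TEXT, AddressTypeID INTEGER, CountryID INTEGER);
        INSERT INTO SpousePreviousAddresses VALUES
            (1, 10, '2001-01-01', 0, 181), (2, 10, '2005-01-01', 0, 181),
            (3, 20, '2003-01-01', 0, 181), (4, 30, '2002-01-01', 0, 181),
            (5, 30, '2009-01-01', 0, 181);
    """)
    # Same anti-join as the UPDATE: keep spa1 only when no later spa2 exists.
    ids = [r[0] for r in conn.execute("""
        SELECT spa1.PreviousAddressID
        FROM SpousePreviousAddresses AS spa1
        LEFT OUTER JOIN SpousePreviousAddresses AS spa2
          ON spa1.SpouseID = spa2.SpouseID AND spa1.FromDate < spa2.FromDate
        WHERE spa1.CountryID = 181 AND spa2.SpouseID IS NULL
        ORDER BY spa1.PreviousAddressID
    """)]
    conn.close()
    return ids

print(rows_to_update())  # [2, 3, 5]
```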
UPDATE spa SET spa.AddressTypeID = 1
WHERE spa.SpouseID IN (
SELECT DISTINCT s1.SpouseID FROM Spa S1, SpousePreviousAddresses S2
WHERE s1.SpouseID = s2.SpouseID
AND s2.CountryID = 181
AND s1.PreviousAddressId = s2.PreviousAddressId
ORDER BY S2.FromDate DESC)
Just a guess.