Duplicate rows when trying to grab last row of another table - sql

I am trying to grab the last (latest) blog article's link for each month using a stored procedure but I cannot seem to find a way past my problem.
Currently, my code below repeats the same value (the latest blog article's link) in the 'LINK' column on every row:
SELECT AVG(DATEPART(mm, b.blog_date)) AS MonthNum --CANNOT USE MONTHNUM IN ORDER BY UNLESS WRAPPED WITH AVG() [average], weird but works
, CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24)) AS MONTH
, CAST(DATEPART(YEAR, b.blog_date) AS varchar(4)) AS YEAR
, CAST(count(b.blog_content) AS varchar(24)) as ARTICLES
, (SELECT TOP (1) b.blog_url
FROM Management.Blog
WHERE (website_owner_id = 2)
GROUP BY blog_date
, blog_url
ORDER BY blog_date DESC
) AS LINK
, CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24)) + CAST(DATEPART(YEAR, b.blog_date) AS varchar(4)) AS ID
, blog_date as DATE
FROM Management.Blog b
WHERE b.website_owner_id = 2
GROUP BY CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24))
, CAST(DATEPART(YEAR, b.blog_date) AS varchar(4))
, b.blog_url
, blog_date
, CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24)) + CAST(DATEPART(YEAR, b.blog_date) AS varchar(4))
ORDER BY DATE DESC
I understand the code is horrible to read (and probably to execute on the SQL server too), but I am new to SQL Server (coming from MySQL, where I have only really had to use basic select queries), and I am open to any suggestions for changing the query and/or table design.
Essentially there should be no duplicates of the ID column (which is only really added in to assist in removing the duplicates and can be omitted if need be).

Without sample data I'm unable to test whether this would work.
After FROM Management.Blog b, add this:
INNER JOIN(
SELECT MonthNum = DATEPART(MONTH, BL.blog_date)
,blog_date
,RN = ROW_NUMBER()OVER(PARTITION BY YEAR(BL.blog_date), MONTH(BL.blog_date) ORDER BY BL.blog_date DESC)
,BL.blog_url
FROM Management.Blog BL
WHERE BL.website_owner_id = 2
) X ON B.blog_date = X.blog_date
AND X.RN = 1
Replace
(SELECT TOP (1) b.blog_url
FROM Management.Blog
WHERE (website_owner_id = 2)
GROUP BY blog_date
, blog_url
ORDER BY blog_date DESC
) AS LINK
with
X.blog_url AS [LINK]
change this in GROUP BY
, b.blog_url
with
, x.blog_url

A non-functioning query usually does not do a very good job of conveying what someone wants. Based on your explanation:
I am trying to grab the last (latest) blog article's link for each month
I would expect something like this:
SELECT b.*
FROM (SELECT b.*,
ROW_NUMBER() OVER (PARTITION BY YEAR(b.blog_date), MONTH(b.blog_date), b.website_owner_id
ORDER BY blog_date DESC
) as seqnum
FROM Management.Blog b
) b
WHERE b.website_owner_id = 2 AND
seqnum = 1;
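A minimal runnable sketch of the "latest row per month" idea, assuming SQLite (through Python's sqlite3 module, with a SQLite build new enough to have window functions) as a stand-in for SQL Server. The table name blog, the sample rows, and the use of strftime in place of YEAR()/MONTH() are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog (blog_url TEXT, blog_date TEXT, website_owner_id INTEGER);
INSERT INTO blog VALUES
  ('/jan-first',  '2015-01-03', 2),
  ('/jan-last',   '2015-01-28', 2),
  ('/feb-only',   '2015-02-10', 2),
  ('/other-site', '2015-02-20', 1);
""")

# Number the articles inside each month, newest first, and keep row 1.
# COUNT(*) OVER the same partition gives the article count per month.
latest_per_month = conn.execute("""
SELECT ym, articles, blog_url
FROM (SELECT strftime('%Y-%m', blog_date) AS ym,
             blog_url,
             COUNT(*) OVER (PARTITION BY strftime('%Y-%m', blog_date)) AS articles,
             ROW_NUMBER() OVER (PARTITION BY strftime('%Y-%m', blog_date)
                                ORDER BY blog_date DESC) AS seqnum
      FROM blog
      WHERE website_owner_id = 2) AS b
WHERE seqnum = 1
ORDER BY ym DESC
""").fetchall()

print(latest_per_month)
```

Each month appears once, with the link of its most recent article, so no duplicate month rows remain.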

SQL : Repeat patterns for given date range

I have tried a few things, took some help from the internet, and got a solution that repeats each employee's pattern over a given date range.
However, I am facing one challenge: if there is a weekend in between, I lose the pattern's continuity. My requirement is that after any weekend or any applied leave, the pattern should continue where it left off.
Please find the SQL Fiddle for more clarity on my requirement.
Also find the attached screenshot of my current output and the expected result.
For the screenshot I have taken only one employee with the default weekend (Sat, Sun or 6, 7), but in the SQL Fiddle each employee has a different week off.
Check below: SQL Fiddle
Prepare two intermediate tables using the SQL you presented and the SQL described in my previous answer as follows:
TempWeekOff(Txt , i , WeekOffId , EmployeeID)
AS (
SELECT STUFF(WeekOff, 1, CHARINDEX(@delimiter, WeekOff+@delimiter+'~'), ''), 1 , CAST(LEFT(WeekOff, CHARINDEX(@delimiter, WeekOff+@delimiter+'~')-1) AS VARCHAR(MAX)),
EmployeeID
FROM RuleTableTemp
UNION ALL
SELECT STUFF(Txt, 1, CHARINDEX(@delimiter, Txt+@delimiter+'~'), '')
, i + 1
, CAST(LEFT(Txt, CHARINDEX(@delimiter, Txt+@delimiter+'~')-1) AS VARCHAR(MAX))
, EmployeeID
FROM TempWeekOff
WHERE Txt > ''
),
TempDates(i_count,Dates,dd,EmployeeID,SortCount,WeekOffId)
AS (
SELECT
i_count,
DATEADD(DAY, i_count, @startDate) AS Dates ,
DATEPART(DW,DATEADD(DAY, i_count, @startDate)) as dd,
EmpID,
sortCount,
WeekOffId
FROM (SELECT DATEDIFF(DAY, @startDate, @endDate) + 1) AS t_datediff(t_days)
CROSS APPLY (SELECT TOP (t_days) ROW_NUMBER() OVER(ORDER BY (SELECT 0) ) - 1 FROM E8) AS t_dateadd(i_count)
CROSS APPLY (SELECT DISTINCT EmployeeID FROM RuleTableTemp) AS t(EmpID)
CROSS APPLY (SELECT COUNT(Sort) FROM PatternXFrequency WHERE EmployeeID = EmpID ) AS EmpPattern(sortCount)
LEFT OUTER JOIN TempWeekOff ON EmpID = TempWeekOff.EmployeeID AND DATEPART(DW,DATEADD(DAY, i_count, @startDate)) = TempWeekOff.WeekOffId
)
Please check if you can get the expected output with the following SQL.
SELECT
d.EmployeeID,
d.Dates,
d.dd,
p.ShiftId
FROM (SELECT *,((ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Dates)-1) % SortCount) AS i FROM TempDates WHERE WeekOffId IS NULL) AS d
INNER JOIN PatternXFrequency p ON p.EmployeeID = d.EmployeeID AND d.i = p.Sort
UNION
SELECT
d.EmployeeID,
d.Dates,
d.dd,
NULL
FROM (SELECT * FROM TempDates WHERE WeekOffId IS NOT NULL) AS d
ORDER BY 1,2
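The key step above is to number only the working days and take that number modulo the pattern length, so days off do not consume a pattern slot. A small sketch of just that step, assuming SQLite via Python's sqlite3 (window functions required) instead of SQL Server, a single employee, a fixed Saturday/Sunday week off, and a hypothetical three-shift pattern table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pattern (sort INTEGER, shift_id TEXT);
INSERT INTO pattern VALUES (0, 'A'), (1, 'B'), (2, 'C');
""")

schedule = conn.execute("""
WITH RECURSIVE dates(d) AS (
    -- 2016-01-07 is a Thursday; generate one week of dates
    SELECT '2016-01-07'
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < '2016-01-13'
),
working AS (
    -- Number working days only; strftime('%w') is 0 for Sunday, 6 for Saturday
    SELECT d,
           (ROW_NUMBER() OVER (ORDER BY d) - 1) % (SELECT COUNT(*) FROM pattern) AS i
    FROM dates
    WHERE strftime('%w', d) NOT IN ('0', '6')
)
SELECT dates.d, pattern.shift_id
FROM dates
LEFT JOIN working ON working.d = dates.d
LEFT JOIN pattern ON pattern.sort = working.i
ORDER BY dates.d
""").fetchall()

for row in schedule:
    print(row)
```

The weekend rows come back with a NULL shift, and the pattern resumes on Monday with the shift that follows Friday's.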

Stored proc gives different result set on different server

I have put together a stored procedure on my dev machine, which runs SQL Server 10.50.6220 (Express). It works correctly and returns the expected (and consistent) results.
I have then done a full backup and restored it to a test machine running SQL Server 10.50.6000.34. The stored proc on the new server now returns incorrect results; what's more, the results it returns are different each time it is run.
ALTER PROCEDURE [dbo].[Get_Station_Utilisation]
@From NVARCHAR(50),
@To NVARCHAR(50)
AS
IF @From='' SET @From = NULL
IF @To='' SET @To = NULL
SELECT T.StationID As [Station ID], dbo.Stations.StationName As [Station Name], T.StaffWorking As [Workers], T.Mins
FROM
(SELECT StatsID As StationID, [Count] As StaffWorking, SUM(Duration) AS Mins
FROM
(SELECT dbo.Active_Clockings.StationID AS StatsID, COUNT(*) AS [Count], DATEDIFF(Minute, Times2.Time, Times1.Time) AS Duration
FROM
(SELECT TOP 100 PERCENT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
ORDER BY [Time] DESC) AS Times1 JOIN
(SELECT TOP 100 PERCENT ROW_NUMBER() OVER (ORDER BY(SELECT 1)) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
ORDER BY [Time] DESC) AS Times2 ON Times1.rownum = Times2.rownum + 1 JOIN
dbo.Active_Clockings ON Times1.Time > dbo.Active_Clockings.StartTime AND Times2.Time < dbo.Active_Clockings.FinishTime
AND (@From IS NULL OR (dbo.Active_Clockings.FinishTime > CAST(@From as date)))
AND (@To IS NULL OR dbo.Active_Clockings.FinishTime < DATEADD(Day, 1, CAST(@To as date)))
GROUP BY Times1.rownum, Times2.Time, Times1.Time, dbo.Active_Clockings.StationID) AS Totals
GROUP BY [Count], StatsID
) AS T INNER JOIN
dbo.Stations ON T.StationID= dbo.Stations.ID
ORDER BY T.StationID, T.StaffWorking ASC
Each row of underlying data contains a staff member, the station they were working at and their start & finish times. The purpose of the stored proc is to return a list of stations, along with the number of mins that each quantity of workers were at that station, as below:
My question is, what could be causing the incorrect and inconsistent results on the test server? And what can I do to fix it?
I have read this, possibly related, question:
Stored proc gives different result set than tsql, only on some servers
and have tried creating local variables for the parameters but it does not seem to have any effect.
what could be causing the inconsistent results
Non-deterministic ordering
ROW_NUMBER() OVER (ORDER BY(SELECT 1))
By ORDER BY(SELECT 1) you are telling the optimiser here that you don't care in which order the rows will be numbered. I didn't analyse the whole query, but is it really the case?
Another bit that has a strong smell is SELECT TOP 100 PERCENT with some ORDER BY in the inner/subquery. It looks like you think that adding ORDER BY like this in the inner query guarantees something. It doesn't.
If you need your row numbers ordered by [Time] DESC, then put it in ROW_NUMBER:
ROW_NUMBER() OVER (ORDER BY [Time] DESC)
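To illustrate why the ordering belongs inside ROW_NUMBER, here is a small sketch that numbers the distinct clock times by the time value itself and then pairs each time with the next one. It assumes SQLite via Python's sqlite3 (window functions required) rather than SQL Server, and the clockings table and its two rows are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clockings (start_time TEXT, finish_time TEXT);
INSERT INTO clockings VALUES
  ('2016-01-01 09:00', '2016-01-01 12:00'),
  ('2016-01-01 10:00', '2016-01-01 11:00');
""")

intervals = conn.execute("""
WITH all_times AS (
    SELECT start_time AS t FROM clockings
    UNION
    SELECT finish_time FROM clockings
),
numbered AS (
    -- The numbering is tied to the time value, so it is repeatable
    SELECT t, ROW_NUMBER() OVER (ORDER BY t ASC) AS rownum
    FROM all_times
)
SELECT earlier.t, later.t
FROM numbered AS later
JOIN numbered AS earlier ON later.rownum = earlier.rownum + 1
ORDER BY earlier.t
""").fetchall()

print(intervals)
```

Because rownum now follows the time order, joining rownum to rownum + 1 always yields adjacent time slices, whichever plan the server picks.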
Thanks to @Vladimir, I have managed to tweak the stored procedure so that it returns the correct results. As suggested, I moved the sorting behavior to the ROW_NUMBER function, rather than the ORDER BY clause (although it actually needed to be ASC, not DESC).
I will mark his answer as correct but thought I would post my final code here for completeness:
ALTER PROCEDURE [dbo].[Get_Station_Utilisation]
@From NVARCHAR(50),
@To NVARCHAR(50)
AS
IF @From='' SET @From = NULL
IF @To='' SET @To = NULL
SELECT T.StationID As [Station ID], dbo.Stations.StationName As [Station Name], T.StaffWorking As [Workers], T.Mins
FROM
(SELECT StatsID As StationID, [Count] As StaffWorking, SUM(Duration) AS Mins
FROM
(SELECT dbo.Active_Clockings.StationID AS StatsID, COUNT(*) AS [Count], DATEDIFF(Minute, Times2.Time, Times1.Time) AS Duration
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY [Time] ASC) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
) AS Times1 JOIN
(SELECT ROW_NUMBER() OVER (ORDER BY [Time] ASC) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
) AS Times2 ON Times1.rownum = Times2.rownum + 1 JOIN
dbo.Active_Clockings ON Times1.Time > dbo.Active_Clockings.StartTime AND Times2.Time < dbo.Active_Clockings.FinishTime
AND (@From IS NULL OR (dbo.Active_Clockings.FinishTime > CAST(@From as date)))
AND (@To IS NULL OR dbo.Active_Clockings.FinishTime < DATEADD(Day, 1, CAST(@To as date)))
GROUP BY Times1.rownum, Times2.Time, Times1.Time, dbo.Active_Clockings.StationID) AS Totals
GROUP BY [Count], StatsID
) AS T INNER JOIN
dbo.Stations ON T.StationID= dbo.Stations.ID
ORDER BY T.StationID, T.StaffWorking ASC

Counting New Unique Values in Growing Time Window

I have a large table of users (as a guid), some associated values, and a time stamp of when each row was inserted. A user might be associated with many rows in this table.
guid | <other columns> | insertdate
I want to count for each month: how many unique new users were inserted. It's easy to do manually:
select count(distinct guid)
from table
where insertdate >= '20060201' and insertdate < '20060301'
and guid not in (select guid from table where
insertdate >= '20060101' and insertdate < '20060201')
How could this be done for each successive month in sql?
I thought to use a rank function to associate clearly each guid with a month:
select guid
,dense_rank() over ( order by datepart(YYYY, insertdate),
datepart(m, insertdate)) as MonthRank
from table
and then iterate upon each rank value:
declare @no_times int
declare @counter int = 1
set @no_times = select count(distinct concat(datepart(year, t.TransactionDateTime),
datepart(month, t.TransactionDateTime))) from table
while @no_times > 0 do
(
select count(*), @counter
where guid not in (select guid from table where rank = @counter)
and rank = @int + 1
@counter += 1
@no_times -= 1
union all
)
end
I know this strategy is probably the wrong way to go about things.
Ideally, I would like a result set to look like this:
MonthRank | NoNewUsers
I would be extremely interested and grateful if a sql wizard could point me in the right direction.
SELECT
DATEPART(year,t.insertdate) AS YearNum
,DATEPART(mm,t.insertdate) as MonthNum
,COUNT(DISTINCT t.guid) AS NoNewUsers
,DENSE_RANK() OVER (ORDER BY COUNT(DISTINCT t.guid) DESC) AS MonthRank
FROM
table t
LEFT JOIN table t2
ON t.guid = t2.guid
AND t.insertdate > t2.insertdate
WHERE
t2.guid IS NULL
GROUP BY
DATEPART(year,t.insertdate)
,DATEPART(mm,t.insertdate)
Use a left join to check whether the guid ever appeared with an earlier insert date; if it didn't, count it using aggregation as you normally would. If you want to add a rank to see which month(s) have the highest number of new users, you can use your DENSE_RANK() function, but because you are already grouping by what you want, you do not need a partition clause.
If you want the first time that a guid entered, then your query doesn't exactly work. You can get the first time with two aggregations:
select year(first_insertdate), month(first_insertdate), count(*)
from (select t.guid, min(insertdate) as first_insertdate
from t
group by t.guid
) t
group by year(first_insertdate), month(first_insertdate)
order by year(first_insertdate), month(first_insertdate);
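A runnable sketch of the two-aggregation approach, assuming SQLite via Python's sqlite3 in place of SQL Server; the sample rows are invented and strftime('%Y-%m', ...) stands in for the year()/month() pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (guid TEXT, insertdate TEXT);
INSERT INTO t VALUES
  ('a', '2006-01-05'), ('a', '2006-02-07'),
  ('b', '2006-01-20'),
  ('c', '2006-02-11'), ('c', '2006-03-01'),
  ('d', '2006-03-15');
""")

# Inner query: first insert date per guid. Outer query: count guids per first month.
new_users = conn.execute("""
SELECT strftime('%Y-%m', first_insertdate) AS ym, COUNT(*) AS new_users
FROM (SELECT guid, MIN(insertdate) AS first_insertdate
      FROM t
      GROUP BY guid) AS f
GROUP BY ym
ORDER BY ym
""").fetchall()

print(new_users)
```

Users a and c appear in later months too, but each is counted only in the month of its first insert.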
If you are looking for counting guids each time they skip a month, then you can use lag():
select year(insertdate), month(insertdate), count(*)
from (select t.*,
lag(insertdate) over (partition by guid order by insertdate) as prev_insertdate
from t
) t
where prev_insertdate is null or
datediff(month, prev_insertdate, insertdate) >= 2
group by year(insertdate), month(insertdate)
order by year(insertdate), month(insertdate);
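The lag() variant can be sketched the same way. This assumes SQLite via Python's sqlite3 (window functions required); since SQLite has no datediff(month, ...), the example computes a month index as year * 12 + month and compares indexes instead, and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (guid TEXT, insertdate TEXT);
INSERT INTO t VALUES
  ('a', '2006-01-05'), ('a', '2006-02-07'), ('a', '2006-04-10'),
  ('b', '2006-02-01');
""")

reappearing = conn.execute("""
WITH m AS (
    SELECT guid, insertdate,
           CAST(strftime('%Y', insertdate) AS INTEGER) * 12
             + CAST(strftime('%m', insertdate) AS INTEGER) AS month_index
    FROM t
),
with_prev AS (
    SELECT m.*,
           LAG(month_index) OVER (PARTITION BY guid ORDER BY insertdate) AS prev_month_index
    FROM m
)
-- Count a row when the guid is new, or when it skipped at least one month
SELECT strftime('%Y-%m', insertdate) AS ym, COUNT(*) AS n
FROM with_prev
WHERE prev_month_index IS NULL
   OR month_index - prev_month_index >= 2
GROUP BY ym
ORDER BY ym
""").fetchall()

print(reappearing)
```

Here guid a is counted in January (first appearance) and again in April (it skipped March), but not in February.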
I solved it with the terrible while loop, then a friend helped me to solve it more efficiently in another way.
The loop version:
--ranked by month
select t.TransactionID
,t.BuyerUserID
,concat(datepart(year, t.InsertDate), datepart(month,
t.InsertDate)) MonthRankName
,dense_rank() over ( order by datepart(YYYY, t.InsertDate),
datepart(m, t.InsertDate)) as MonthRank
into #ranked
from table t;
--iterate
declare @counter int = 1
declare @no_times int
select @no_times = count(distinct concat(datepart(year, t.InsertDate),
datepart(month, t.InsertDate))) from table t;
select count(distinct r.guid) as NewUnique, r.Monthrank into #results
from #ranked r
where r.MonthRank = 1 group by r.MonthRank;
while @no_times > 1
begin
insert into #results
select count(distinct rt.guid) as NewUnique, @counter + 1 as MonthRank
from #ranked rt
where rt.guid not in
(
select rt2.guid from #ranked rt2
where rt2.MonthRank = @counter
)
and rt.MonthRank = @counter + 1
set @counter = @counter+1
set @no_times = @no_times-1
end
select * from #results r
This turned out to run pretty slowly (as you might expect).
What turned out to be faster by a factor of 10 was this method:
select t.guid,
cast (concat(datepart(year, min(t.InsertDate)),
case when datepart(month, min(t.InsertDate)) < 10 then
'0'+cast( datepart(month, min(t.InsertDate)) as varchar(10))
else cast (datepart(month, min(t.InsertDate)) as varchar(10)) end
) as int) as MonthRankName
into #NewUnique
from table t
group by t.guid;
select count(1) as NewUniques, t.MonthRankName from #NewUnique t
group by t.MonthRankName
order by t.MonthRankName
This simply identifies the very first month each guid appears, then counts how many of these occur in each month, with a bit of a hack to get YearMonth formatted nicely (this seems to be more efficient than format([date], 'yyyyMM'), but I need to experiment more on that).

SQL - Replace repeated rows with null values while preserving number of rows

I am trying to get only one instance of a year instead of 12 because I am using this column in a lookup table to provide parameters to a report. Because I am using both monthly and yearly data, I am trying to get them both in the same table.
I have a table like this
--Date--------Year
--------------------
1/2012-------2012
2/2012-------2012
3/2012-------2012
4/2012-------2012
5/2012-------2012
6/2012-------2012
7/2012-------2012
8/2012-------2012
9/2012-------2012
10/2012------2012
11/2012------2012
12/2012------2012
1/2013-------2013
2/2013-------2013
And this is my desired table
--Date--------Year
--------------------
1/2012-------2012
2/2012-------null
3/2012-------null
4/2012-------null
5/2012-------null
6/2012-------null
7/2012-------null
8/2012-------null
9/2012-------null
10/2012------null
11/2012------null
12/2012------null
1/2013-------2013
2/2013-------null
Can someone give me an idea of how to solve a problem like this?
The code I am using right now is
SELECT CAST(MONTH(rmp.EcoDate) AS Varchar(2)) + '/' + CAST(YEAR(rmp.EcoDate) AS varchar(4)) AS Date, Year(rmp.EcoDate) as EcoYear
FROM PhdRpt.ReportCaseList_542 AS rcl INNER JOIN
CaseCases AS cc ON rcl.CaseCaseId = cc.CaseCaseId INNER JOIN
PhdRpt.RptMonthlyProduction_542 AS rmp ON rcl.ReportRunCaseId = rmp.ReportRunCaseId
GROUP BY rmp.EcoDate
You can do this by enumerating the rows within a year. Then update all but the first:
with toupdate as (
select t.*, row_number() over (partition by [year] order by [date]) as seqnum
from t
)
update toupdate
set [year] = NULL
where seqnum > 1;
If you want this as a select statement:
with ts as (
select t.*, row_number() over (partition by [year] order by [date]) as seqnum
from t
)
select [date],
(case when seqnum = 1 then [year] end) as [year]
from ts;
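A small runnable version of the select form, assuming SQLite via Python's sqlite3 (window functions required) and a hypothetical table with a real date column month_start and an integer yr. Note that the row_number must be ordered by a true date: ordering by a text value such as '10/2012' would sort October before February.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (month_start TEXT, yr INTEGER);
INSERT INTO t VALUES
  ('2012-11-01', 2012), ('2012-12-01', 2012), ('2012-10-01', 2012),
  ('2013-01-01', 2013), ('2013-02-01', 2013);
""")

# Keep the year only on the first row of each year; the row count is unchanged.
rows_year = conn.execute("""
WITH ts AS (
    SELECT month_start, yr,
           ROW_NUMBER() OVER (PARTITION BY yr ORDER BY month_start) AS seqnum
    FROM t
)
SELECT month_start,
       CASE WHEN seqnum = 1 THEN yr END AS yr
FROM ts
ORDER BY month_start
""").fetchall()

for row in rows_year:
    print(row)
```

A CASE expression without an ELSE branch yields NULL, which is what blanks out the repeated years.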

Order by year in post date

My sql query is:
SELECT DISTINCT
SUBSTRING(DATENAME(MONTH, PostDate), 1, 3) + '-' + CAST(YEAR(PostDate) AS VARCHAR(4)) AS PostArchive,
Posts = COUNT(*)
FROM
Post WHERE Verified=1
GROUP BY
SUBSTRING(DATENAME(MONTH, PostDate), 1, 3) + '-' + CAST(YEAR(PostDate) AS VARCHAR(4)),
YEAR(PostDate), MONTH(PostDate)
ORDER BY PostArchive
It gives a result like this:
PostArchive Posts
------------------------
Mar-2009 1
Mar-2010 1
May-2005 1
May-2011 1
May-2012 1
May-2013 1
But I want the result ordered by date (year), like this:
PostArchive Posts
------------------------
May-2005 1
Mar-2009 1
Mar-2010 1
May-2011 1
May-2012 1
May-2013 1
I searched and found this link but was unable to solve my problem.
I tried:
ORDER BY CONVERT(DateTime, PostArchive,101) DESC
But it gives me an error:
Invalid column name 'PostArchive'.
Is there any way to do this, or am I going about it the wrong way? Thanks.
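The reason for the error is that PostArchive is an alias defined in the SELECT list, which is effectively the output of the query. SQL Server lets ORDER BY refer to such an alias on its own (which is why ORDER BY PostArchive works), but not inside an expression such as CONVERT(DateTime, PostArchive, 101). Inside an expression, names are resolved against the query's input columns, which in this case means PostDate.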
I assume that you didn't really mean that you want to order it by
year, but instead by year/month. The ordering issue that you have is
because you are ordering it as a character and not as a date.
You don't need DISTINCT, since you already GROUP BY.
Main problem is that you already converted to VARCHAR. Hence, months
are unsortable.
-- Create a CTE (inline view)
WITH T AS (
SELECT YEAR(PostDate) PostYear
, MONTH(PostDate) PostMM
, SUBSTRING(DATENAME(MONTH, PostDate),1,3) PostMonth
, COUNT(*) Posts
FROM Post
WHERE Verified = 1
GROUP BY YEAR(PostDate)
, MONTH(PostDate)
, DATENAME(MONTH, PostDate)
)
-- Build your date string
SELECT PostMonth + '-' + CAST(PostYear AS VARCHAR(4)) AS PostArchive
, Posts
FROM T
-- Sort it by the components separately
ORDER BY PostYear
-- Don't use the character, otherwise, Aug will come before Mar
, PostMM
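The same idea as a runnable sketch, assuming SQLite via Python's sqlite3 instead of SQL Server. SQLite has no DATENAME, so the example cuts the three-letter month name out of a fixed string; the post table and its rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (post_date TEXT, verified INTEGER);
INSERT INTO post VALUES
  ('2009-03-10', 1), ('2005-05-02', 1), ('2010-03-15', 1),
  ('2011-05-20', 1), ('2011-05-21', 1), ('2012-08-01', 0);
""")

# Group and sort on numeric year/month; build the display label only at the end.
archive = conn.execute("""
WITH t AS (
    SELECT CAST(strftime('%Y', post_date) AS INTEGER) AS post_year,
           CAST(strftime('%m', post_date) AS INTEGER) AS post_mm,
           COUNT(*) AS posts
    FROM post
    WHERE verified = 1
    GROUP BY post_year, post_mm
)
SELECT substr('JanFebMarAprMayJunJulAugSepOctNovDec', 3 * post_mm - 2, 3)
         || '-' || post_year AS post_archive,
       posts
FROM t
ORDER BY post_year, post_mm
""").fetchall()

print(archive)
```

Sorting on the numeric components gives chronological order, whereas sorting on the label would put 'Mar-2009' ahead of 'May-2005'.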
I used a CTE to get the result; try this:
with tempTable (PostArchiveMonth , PostArchiveYear , PostArchiveMonthName , Posts )
as (
select month(PostDate) , YEAR(PostDate) , SUBSTRING(DATENAME(MONTH, PostDate), 1, 3)
, COUNT(*)
FROM Post
WHERE Verified=1
group by MONTH(PostDate) ,YEAR( PostDate)
,SUBSTRING(DATENAME(MONTH, PostDate), 1, 3)
)
select PostArchiveMonthName +'-' + CAST(PostArchiveYear AS varchar(4)) as PostArchive , Posts
from tempTable
order by PostArchiveYear , PostArchiveMonth
Try
SELECT
SUBSTRING(DATENAME(MONTH, PostDate), 1, 3) + '-' + CAST(YEAR(PostDate) AS VARCHAR(4)) AS PostArchive,
Posts = COUNT(*)
FROM
Post WHERE Verified=1
GROUP BY
SUBSTRING(DATENAME(MONTH, PostDate), 1, 3) + '-' + CAST(YEAR(PostDate) AS VARCHAR(4)),
YEAR(PostDate), MONTH(PostDate)
Order by Year(PostDate), Month(PostDate)
Try to change this:
ORDER BY PostArchive
...to this...
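ORDER BY YEAR(PostDate), MONTH(PostDate)
You will also need to drop DISTINCT (the GROUP BY already makes the rows unique), because SQL Server requires ORDER BY items to appear in the select list when SELECT DISTINCT is used.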