Match between tables with New, Not New - sql

I am still learning T-SQL and I'm new here, so forgive me if I've not done this right.
I have a table that loads a new day's data each day. Each day's load carries the report date of the previous day.
I want to get yesterday's data (e.g. 17/09/2019) from the table and compare it with the data in the same table from the day before that (e.g. 16/09/2019). I want to check the reference number: if the reference number appears on the day before, I want it to say Not New, and if it does not appear on the day before, I want it to say New.
The columns I have are:
ReferenceNumber, ReportData, NewAppt
The NewAppt column is where the New/Not New outcome will go.

Something like this should work:
WITH Yesterday AS (
SELECT DISTINCT
ReferenceNumber,
CONVERT(DATE, ReportDate) AS ReportDate
FROM
MyTable
WHERE
CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -1, GETDATE()))
),
DayBeforeYesterday AS (
SELECT DISTINCT
ReferenceNumber
FROM
MyTable
WHERE
CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -2, GETDATE()))
)
SELECT
y.ReferenceNumber,
y.ReportDate,
CASE
WHEN x.ReferenceNumber IS NOT NULL THEN 0
ELSE 1
END AS NewAppointment
FROM
Yesterday y
LEFT JOIN DayBeforeYesterday x ON x.ReferenceNumber = y.ReferenceNumber;
Make a list of all the DISTINCT reference numbers from each day, and then join them into one big list, with the logic to see if there was a reference number yesterday that was also in the list from the day before yesterday.

I suppose your column ReportData is some sort of 'date' type and contains only the date (no time).
Furthermore, for each date, there should be at most 1 record for a specific ReferenceNumber.
In that case, try this:
SELECT t1.ReferenceNumber,
t1.ReportData,
CASE
WHEN t2.ReferenceNumber IS NULL THEN 'New'
ELSE 'Not New'
END AS NewAppt
FROM my_table t1
LEFT OUTER JOIN my_table t2
ON t1.ReferenceNumber = t2.ReferenceNumber
AND t2.ReportData = DATEADD(day, -1, t1.ReportData)
WHERE t1.ReportData = '2019-09-17';

Here's an approach using LAG which removes the need to join to the same table several times and instead just checks the preceding row for that Reference Number.
Note that in my interpretation of your request if a Reference Number disappears for a day and then returns then it's flagged as new. You can adapt the query to simply check if the number has appeared at any point in the past if that's not what you need.
CREATE TABLE #TestData (ReferenceNumber int,Reportdata date)
INSERT INTO #TestData
VALUES (1,'2019-01-16'),(1,'2019-01-17'),(1,'2019-01-18'),(2,'2019-01-18'),(3,'2019-01-17'),(3,'2019-01-18'),(4,'2019-01-17')
SELECT
ReferenceNumber
,ReportData
,IIF(
LAG(ReportData) OVER(PARTITION BY ReferenceNumber ORDER BY ReportData)
= dateadd(day,-1,ReportData)
,'Not New'
,'New'
) AS NewAppt
FROM #TestData
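To see what the LAG approach yields on the sample rows above, here is a minimal sketch that runs the same logic in SQLite through Python's sqlite3 module. It is an adaptation, not the T-SQL itself: the table name TestData (without the #), the use of CASE instead of IIF, and date(x, '-1 day') in place of DATEADD are substitutions made so the example is self-contained, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# In-memory SQLite database holding the same rows as #TestData above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestData (ReferenceNumber INTEGER, ReportData TEXT)")
conn.executemany(
    "INSERT INTO TestData VALUES (?, ?)",
    [(1, "2019-01-16"), (1, "2019-01-17"), (1, "2019-01-18"), (2, "2019-01-18"),
     (3, "2019-01-17"), (3, "2019-01-18"), (4, "2019-01-17")],
)

# date(x, '-1 day') stands in for DATEADD(day, -1, x).
# A row is 'Not New' only when the previous row for the same
# reference number is dated exactly one day earlier.
rows = conn.execute("""
    SELECT ReferenceNumber,
           ReportData,
           CASE WHEN LAG(ReportData) OVER (PARTITION BY ReferenceNumber
                                           ORDER BY ReportData)
                     = date(ReportData, '-1 day')
                THEN 'Not New' ELSE 'New' END AS NewAppt
    FROM TestData
    ORDER BY ReferenceNumber, ReportData
""").fetchall()

for row in rows:
    print(row)
```

On this data, reference 1 is New on 2019-01-16 and Not New on the two following days, while references 2 and 4 only ever appear as New.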

Related

SUM with GROUP BY don't display zero sums

I'm trying to get data from a single table, grouped by CURR. I have 12 conditions listed, but some of them sum to zero.
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR
FROM XXX
WHERE DATEPART(year, CONVERT(dateTime, DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, DATE)) = 1
GROUP BY CURR
This returns 3 values, but I want 12 values, including the zero sums. I tried this with CASE, but could not get it to work.
Thanks...
What the others have been trying to tell you is that you need a "list" of the CURR values you want to see in your results. Generally, these would come from another table in a properly normalized database. Do you have one? It seems not but it is worth asking. A properly normalized database would generally have one.
So how do you create this list dynamically? Let us assume that your existing table has at least one row for every CURR value you desire in your resultset - even if that row has a date that falls outside of your period of interest. We can use that to form this list and then outer join that list to your existing query that does the summing.
with curr_list as (select distinct CURR from dbo.XXX)
select curr_list.CURR,
isnull(sum(tbl.AMOUNT), 0) as TOPL
from curr_list left join dbo.XXX as tbl
on curr_list.CURR = tbl.CURR
and tbl.[DATE] >= '20180101'
and tbl.[DATE] < '20180201'
group by curr_list.CURR
order by curr_list.CURR;
That should work assuming I made no typos. The CTE (named curr_list) creates the list of ID values that you want to see in your resultset. When you outer join that to your transaction data you will get at least one row for each CURR value. You then sum the amounts to aggregate those rows into a single row for each CURR value. Notice the change to the date criteria. Your original approach prevents the optimizer from using any useful indexes on that column.
If your existing table does not have a row for every value of CURR you want in your results, then you can simply change the cte and hardcode the values you desire.
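The effect of keeping the date criteria in the join condition can be checked with a small sketch. It runs in SQLite via Python's sqlite3 module rather than SQL Server, with COALESCE standing in for ISNULL, and the four sample rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX (CURR TEXT, AMOUNT INTEGER, [DATE] TEXT)")
# Hypothetical rows: only USD has transactions in January 2018.
conn.executemany(
    "INSERT INTO XXX VALUES (?, ?, ?)",
    [("USD", 100, "2018-01-05"), ("USD", 50, "2018-01-20"),
     ("EUR", 70, "2018-02-03"), ("GBP", 30, "2017-12-31")],
)

# Date filter inside ON: unmatched currencies survive with a NULL amount.
with_on = conn.execute("""
    WITH curr_list AS (SELECT DISTINCT CURR FROM XXX)
    SELECT c.CURR, COALESCE(SUM(t.AMOUNT), 0) AS TOPL
    FROM curr_list AS c
    LEFT JOIN XXX AS t
           ON t.CURR = c.CURR
          AND t.[DATE] >= '2018-01-01'
          AND t.[DATE] <  '2018-02-01'
    GROUP BY c.CURR
    ORDER BY c.CURR
""").fetchall()

# Same filter moved to WHERE: rows outside January are discarded
# after the join, so the zero-sum currencies disappear again.
with_where = conn.execute("""
    WITH curr_list AS (SELECT DISTINCT CURR FROM XXX)
    SELECT c.CURR, COALESCE(SUM(t.AMOUNT), 0) AS TOPL
    FROM curr_list AS c
    LEFT JOIN XXX AS t ON t.CURR = c.CURR
    WHERE t.[DATE] >= '2018-01-01'
      AND t.[DATE] <  '2018-02-01'
    GROUP BY c.CURR
    ORDER BY c.CURR
""").fetchall()

print(with_on)
print(with_where)
```

With the filter in ON, EUR and GBP come back with a total of 0; with the filter in WHERE, only USD remains.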
I think I see what you are trying to do.
You have 12 CURR values (I'm guessing currencies?) and want the total transaction amount for January 2018, grouped by currency.
If that is the case, here is what you should try:
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR_TABLE.CURR
FROM CURR_TABLE
LEFT JOIN XXX ON XXX.CURR = CURR_TABLE.CURR
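AND DATEPART(year, CONVERT(dateTime, XXX.DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, XXX.DATE)) = 1
GROUP BY CURR_TABLE.CURR
That way you'll get every currency listed, even if no record for that currency exists for that month. The date conditions belong in the ON clause: in a WHERE clause they would filter out the unmatched rows that the LEFT JOIN is there to keep.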
EDIT:
I don't like this kind of syntax when it can be avoided, but you can also build the list with a derived table:
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR_TABLE.CURR
FROM (SELECT distinct CURR FROM XXX) CURR_TABLE
LEFT JOIN XXX ON XXX.CURR = CURR_TABLE.CURR
AND DATEPART(year, CONVERT(dateTime, XXX.DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, XXX.DATE)) = 1
GROUP BY CURR_TABLE.CURR
Please execute below query and compare with your expected output.
SELECT CURR, SUM(ISNULL(AMOUNT,0)) AS TOPL,
CURR
FROM XXX
WHERE DATEPART(year, CONVERT(dateTime, DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, DATE)) = 1
GROUP BY CURR

Query to check number of records created in a month.

My table gets a new record with a timestamp each day an integration is successful. I am trying to create a query that checks (preferably automatically) the number of days in a month against the number of records in the table within a time frame.
For example, January has 31 days, so I would like to know on how many days in January my process was not successful. If the number of records is less than 31, then I know the job failed 31 - x times.
I tried the following but was not getting very far:
SELECT COUNT (DISTINCT CompleteDate)
FROM table
WHERE CompleteDate BETWEEN '01/01/2015' AND '01/31/2015'
Every 7 days the system executes the job twice, so I get two records on the same day. I am trying to determine the number of days on which nothing happened (failures), so I assume some truncation of the date field is needed?
One way to do this is to use a calendar/date table as the main source of dates in the range and left join with that and count the number of null values.
In absence of a proper date table you can generate a range of dates using a number sequence like the one found in the master..spt_values table:
select count(*) failed
from (
select dateadd(day, number, '2015-01-01') date
from master..spt_values where type='P' and number < 365
) a
left join your_table b on a.date = cast(b.CompleteDate as date)
where b.CompleteDate is null
and a.date BETWEEN '01/01/2015' AND '01/31/2015'
Assuming you have an Integers table, this query pulls all dates for which no record is found in the target table:
declare @StartDate datetime = '01/01/2013',
@EndDate datetime = '12/31/2013'
;with d as (
select *, date = dateadd(d, i - 1 , @StartDate)
from dbo.Integers
where i <= datediff(d, @StartDate, @EndDate) + 1
)
select d.date
from d
where not exists (
select 1 from <target> t
where DATEADD(dd, DATEDIFF(dd, 0, t.<timestamp>), 0) = DATEADD(dd, DATEDIFF(dd, 0, d.date), 0)
)
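The calendar-plus-NOT-EXISTS idea used above can be tried end to end with a small sketch. It runs in SQLite through Python's sqlite3 module, so the day list comes from a recursive CTE instead of an Integers table and date() replaces the DATEADD/DATEDIFF truncation; the table name runs and its contents are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (CompleteDate TEXT)")

# Hypothetical data: one run on every day of January 2015 except the
# 10th and 20th, plus a second run on the 7th (the duplicate-day case).
first = date(2015, 1, 1)
stamps = [f"{first + timedelta(days=i)} 02:00:00"
          for i in range(31) if i + 1 not in (10, 20)]
stamps.append("2015-01-07 14:00:00")
conn.executemany("INSERT INTO runs VALUES (?)", [(s,) for s in stamps])

# Generate every day of the month, then keep the days for which no
# run exists once the timestamp is truncated to its date part.
missing = [r[0] for r in conn.execute("""
    WITH RECURSIVE days(d) AS (
        SELECT '2015-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2015-01-31'
    )
    SELECT d
    FROM days
    WHERE NOT EXISTS (SELECT 1 FROM runs AS r
                      WHERE date(r.CompleteDate) = days.d)
    ORDER BY d
""")]

print(missing)       # the failed days
print(len(missing))  # the failure count for the month
```

Because the comparison is made on the truncated date, the extra run on the 7th does not affect the result.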
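BETWEEN is not safe here: if CompleteDate carries a time of day, any record after midnight on 31 January falls outside the range. Use a half-open range and count distinct dates instead: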
SELECT 31 - count(distinct(convert(date, CompleteDate)))
FROM table
WHERE CompleteDate >= '01/01/2015' AND CompleteDate < '02/01/2015'
You can use the following query:
SELECT DATEDIFF(day, t.d, dateadd(month, 1, t.d)) - COUNT(DISTINCT CAST(CompleteDate AS DATE))
FROM mytable
CROSS APPLY (SELECT CAST(YEAR(CompleteDate) AS VARCHAR(4)) +
RIGHT('0' + CAST(MONTH(CompleteDate) AS VARCHAR(2)), 2) +
'01') t(d)
GROUP BY t.d
Explanation:
1) The CROSS APPLY-ed value, i.e. t.d, is the ANSI string of the first day of the month of CompleteDate, e.g. '20150101' for 12/01/2015 or 18/01/2015.
2) DATEDIFF uses this value, i.e. t.d, to calculate the number of days in the month that CompleteDate belongs to.
3) GROUP BY essentially groups by (Year, Month), hence the COUNT(DISTINCT ...) returns the number of distinct dates with a record in each month.
4) The values returned by the query are the differences [2] - [3], i.e. the number of failures per month, for each (Year, Month) of your initial data.
If you want to query a specific Year, Month then just simply add a WHERE clause to the above:
WHERE YEAR(CompleteDate) = 2015 AND MONTH(CompleteDate) = 1

Pending Monthly SQL Counts

The below query returns accurate info, I just haven't had any luck trying to make this:
1) More dynamic so I'm not repeating the same line of code every month
2) Formatted differently, so just 2 columns of month + year are needed to view pending counts by field1 + field2
Example code (basically, sum when the OPEN date is on or before the last day of the month and the CLOSE date comes after the month or the item is still open):
SELECT
SUM(CAST(case when OPENDATE <= '2014-11-30 23:59:59'
and ((CLOSED >= '2014-12-01')
or (CLOSED is null)) then '1' else '0' end as int)) Nov14
,SUM(CAST(case when OPENDATE <= '2014-12-31 23:59:59'
and ((CLOSED >= '2015-01-01')
or (CLOSED is null)) then '1' else '0' end as int)) Dec14
,SUM(CAST(case when OPENDATE <= '2015-01-30 23:59:59'
and ((CLOSED >= '2015-02-01')
or (CLOSED is null)) then '1' else '0' end as int)) Jan15
,FIELD1,FIELD2
FROM T
GROUP BY FIELD1,FIELD2
Results:
FIELD1 FIELD2 NOV14 DEC14 JAN15
A A 2 5 7
A B 6 8 4
C A 5 6 5
…
Instead of:
COUNT FIELD1 FIELD2 MO YR
14 A A 12 2014
18 A B 12 2014
16 C A 1 2015
...
Is there a way to get this in one shot? Sorry if this is a repeat topic, I've looked at some boards and they've helped me get closing counts.. but using a range between two date fields, I haven't had any luck.
Thanks in advance
One way to do it is to use a table of numbers or calendar table.
In the code below the table Numbers has a column Number, which contains integer numbers starting from 1. There are many ways to generate such table.
You can do it on the fly, or have the actual table. I personally have such table in the database with 100,000 rows.
The first CROSS APPLY effectively creates a column CurrentMonth, so that I don't have to repeat the call to DATEADD many times later.
Second CROSS APPLY is your query that you want to run for each month. It can be as complicated as needed, it can return more than one row if needed.
-- Start and end dates should be the first day of the month
DECLARE @StartDate date = '20141201';
DECLARE @EndDate date = '20150201';
SELECT
CurrentMonth
,FIELD1
,FIELD2
,Counts
FROM
Numbers
CROSS APPLY
(
SELECT DATEADD(month, Numbers.Number-1, @StartDate) AS CurrentMonth
) AS CA_Month
CROSS APPLY
(
SELECT
FIELD1
,FIELD2
,COUNT(*) AS Counts
FROM T
WHERE
OPENDATE < CurrentMonth
AND (CLOSED >= CurrentMonth OR CLOSED IS NULL)
GROUP BY
FIELD1
,FIELD2
) AS CA
WHERE
Numbers.Number < DATEDIFF(month, @StartDate, @EndDate) + 1
;
If you provide a table with sample data and expected output, I could verify that the query produces correct results.
The solution is written in SQL Server 2008.
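For readers who want to check the per-month pending logic on concrete rows, here is a minimal sketch of the same idea in SQLite via Python's sqlite3 module. It is not the SQL Server query above: a recursive CTE replaces the Numbers table, a plain JOIN replaces CROSS APPLY, and the three rows in T are invented. Each month boundary m counts the items opened before m that were closed on or after m, or never closed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (FIELD1 TEXT, FIELD2 TEXT, OPENDATE TEXT, CLOSED TEXT)"
)
# Hypothetical items; CLOSED is NULL while an item is still open.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [("A", "A", "2014-11-10", None),
     ("A", "A", "2014-11-20", "2014-12-15"),
     ("A", "B", "2014-12-05", "2015-01-20")],
)

# months(m) lists the first day of each month from 2014-12-01
# through 2015-02-01; every item pending at that boundary is counted.
rows = conn.execute("""
    WITH RECURSIVE months(m) AS (
        SELECT '2014-12-01'
        UNION ALL
        SELECT date(m, '+1 month') FROM months WHERE m < '2015-02-01'
    )
    SELECT m, FIELD1, FIELD2, COUNT(*) AS Counts
    FROM months
    JOIN T ON T.OPENDATE < months.m
          AND (T.CLOSED >= months.m OR T.CLOSED IS NULL)
    GROUP BY m, FIELD1, FIELD2
    ORDER BY m, FIELD1, FIELD2
""").fetchall()

for row in rows:
    print(row)
```

As with the CROSS APPLY version, a month boundary with nothing pending produces no row at all rather than a zero count.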
Like this:
SELECT
FIELD1,FIELD2,datepart(month, OPENDATE), datepart(year, OPENDATE), sum(1)
FROM T
GROUP BY FIELD1,FIELD2, datepart(month, OPENDATE), datepart(year, OPENDATE)
But this of course is just based on OPENDATE, if you need to have the same thing calculated into several months, that's going to be more difficult, and you'll probably need a calendar "table" that you'll have to cross apply with this data.

SQL to identify missing week

I have a database table with the following structure -
Week_End Sales
2009-11-01 43223.43
2009-11-08 4324.23
2009-11-15 64343.23
...
Week_End is a datetime column, and the date increments by 7 days with each new entry.
What I want is a SQL statement that will identify if there is a week missing in the sequence. So, if the table contained the following data -
Week_End Sales
2009-11-01 43223.43
2009-11-08 4324.23
2009-11-22 64343.73
...
The query would return 2009-11-15.
Is this possible? I am using SQL Server 2008, btw.
You've already accepted an answer so I guess you don't need this, but I was almost finished with it anyway and it has one advantage that the selected solution doesn't have: it doesn't require updating every year. Here it is:
SELECT T1.*
FROM Table1 T1
LEFT JOIN Table1 T2
ON T2.Week_End = DATEADD(week, 1, T1.Week_End)
WHERE T2.Week_End IS NULL
AND T1.Week_End <> (SELECT MAX(Week_End) FROM Table1)
It is based on Andemar's solution, but handles the changing year too, and doesn't require the existence of the Sales column.
Join the table on itself to search for consecutive rows:
select a.*
from YourTable a
left join YourTable b
on datepart(wk,b.Week_End) = datepart(wk,a.Week_End) + 1
-- No next week
where b.sales is null
-- Not the last week
and datepart(wk,a.Week_End) <> (
select datepart(wk,max(Week_End)) from YourTable
)
This should return any weeks without a next week.
Assuming your "week_end" dates are always going to be the Sundays of the week, you could try a CTE - a common table expression that lists out all the Sundays for 2009, and then do an outer join against your table.
All those rows missing from your table will have a NULL value for their "week_end" in the select:
;WITH Sundays2009 AS
(
SELECT CAST('20090104' AS DATETIME) AS Sunday
UNION ALL
SELECT
DATEADD(DAY, 7, cte.Sunday)
FROM
Sundays2009 cte
WHERE
DATEADD(DAY, 7, cte.Sunday) < '20100101'
)
SELECT
sun.Sunday 'Missing week end date'
FROM
Sundays2009 sun
LEFT OUTER JOIN
dbo.YourTable tbl ON sun.Sunday = tbl.week_end
WHERE
tbl.week_end IS NULL
I know this has already been answered, but can I suggest something really simple?
/* First make a list of weeks using a table of numbers (mine is dbo.nums(num), starting with 1) */
WITH AllWeeks AS (
SELECT DATEADD(week,num-1,w.FirstWeek) AS eachWeek
FROM
dbo.nums
JOIN
(SELECT MIN(week_end) AS FirstWeek, MAX(week_end) as LastWeek FROM yourTable) w
ON num <= DATEDIFF(week,FirstWeek,LastWeek)
)
/* Now just look for ones that don't exist in your table */
SELECT w.eachWeek AS MissingWeek
FROM AllWeeks w
WHERE NOT EXISTS (SELECT * FROM yourTable t WHERE t.week_end = w.eachWeek)
;
If you know the range you want to look over, you don't need to use the MIN/MAX subquery in the CTE.
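The generate-all-weeks-then-look-for-gaps approach can be sketched on the question's sample data as follows. This runs in SQLite via Python's sqlite3 module, so a recursive CTE stands in for the numbers table and date(w, '+7 days') for DATEADD(week, 1, w); the table name weekly_sales is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly_sales (Week_End TEXT, Sales REAL)")
# The question's second sample: the week ending 2009-11-15 is absent.
conn.executemany(
    "INSERT INTO weekly_sales VALUES (?, ?)",
    [("2009-11-01", 43223.43), ("2009-11-08", 4324.23),
     ("2009-11-22", 64343.73)],
)

# Walk from the first to the last Week_End in 7-day steps and keep
# the steps that have no matching row.
missing = [r[0] for r in conn.execute("""
    WITH RECURSIVE weeks(w) AS (
        SELECT MIN(Week_End) FROM weekly_sales
        UNION ALL
        SELECT date(w, '+7 days') FROM weeks
        WHERE w < (SELECT MAX(Week_End) FROM weekly_sales)
    )
    SELECT w
    FROM weeks
    WHERE NOT EXISTS (SELECT 1 FROM weekly_sales AS s
                      WHERE s.Week_End = w)
    ORDER BY w
""")]

print(missing)
```

Unlike the self-join variants, this returns the missing date itself rather than the row that precedes the gap, and it reports every week of a multi-week gap.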

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
1) Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
2) Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using e.g. your DB's date functions and a derived table.
I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a couple of million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever
WHILE (@StartDate <= GETDATE())
BEGIN
INSERT INTO @DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)
SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END
-- This gives me a table of all days since whenever
-- (you could set @StartDate to the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM @DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents,
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks` GROUP BY course
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex but the same principle should apply. You may indeed need a table of dates for your second query
Option 1
You can create a temp table and insert dates with the range and do a left outer join with the usagelog
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM qq
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' AS DATETIME) + ndate as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This generates 10,000 dates on the fly, which covers roughly 27 years.
By default SQL Server limits a recursive CTE to 100 recursion levels (this can be changed with OPTION (MAXRECURSION n)), which is why each inner CTE returns only 100 rows.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.
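A compact way to confirm that an outer join against generated dates yields zero rows for quiet days is the sketch below. It uses SQLite through Python's sqlite3 module, with a single recursive CTE and date() arithmetic rather than the two cross-joined CTEs and DATETIME addition above; the usagelog rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usagelog (userid INTEGER, entryts TEXT)")
# Hypothetical logins: three on 1 January, none on the 2nd, one on the 3rd.
conn.executemany(
    "INSERT INTO usagelog VALUES (?, ?)",
    [(1, "2000-01-01 09:00:00"), (1, "2000-01-01 17:30:00"),
     (2, "2000-01-01 10:15:00"), (1, "2000-01-03 12:00:00")],
)

# COUNT(userid) ignores the NULLs produced by the LEFT JOIN, so a day
# without logins is reported as 0 instead of being dropped.
rows = conn.execute("""
    WITH RECURSIVE dates(d) AS (
        SELECT '2000-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2000-01-03'
    )
    SELECT d AS day,
           COUNT(userid) AS numlogins,
           COUNT(DISTINCT userid) AS numusers
    FROM dates
    LEFT JOIN usagelog
           ON entryts >= d
          AND entryts < date(d, '+1 day')
    GROUP BY d
    ORDER BY d
""").fetchall()

for row in rows:
    print(row)
```

The half-open range on entryts keeps the join condition free of functions applied to the logged column, which is the same reason the answer above compares against date boundaries.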