Summary data even when department is missing for a day - sql

I have data submitted by several departments that I need to summarise to output on a report.
Most days, every department submits data. Some days, a department might miss submitting data.
I need to reflect a zero value entry for that department for the day, rather than skipping it.
I don't know why, but this is striking me as a difficult challenge.
If my data looks like this:
Date, Department, Employee
1 May 2016, First, Fred
1 May 2016, First, Wilma
1 May 2016, Second, Betty
1 May 2016, Second, Barney
2 May 2016, Second, Betty
3 May 2016, First, Wilma
3 May 2016, Second, Betty
3 May 2016, Second, Barney
If I do a count(*) on this data, the output I am hoping for is:
1 May 2016, First, 2
1 May 2016, Second, 2
2 May 2016, First, 0
2 May 2016, Second, 1
3 May 2016, First, 1
3 May 2016, Second, 2
It's the 3rd line, "2 May 2016, First, 0", that I can't get my output to include.
My underlying data is more complex than the above, but the above is a reasonable simplified representation of the problem.
I'm at the point where I'm messing around with cursors trying to 'build' this recordset, so I think that's a clue that I need to ask for help.

Assuming that your main table is:
create table mydata
(ReportDate date,
department varchar2(20),
Employee varchar2(20));
We can use the below query:
with dates (reportDate) as
(select to_date('01-05-2016','dd-mm-yyyy') + rownum -1
from all_objects
where rownum <=
to_date('03-05-2016','dd-mm-yyyy')-to_date('01-05-2016','dd-mm-yyyy')+1 ),
departments( department) as
( select 'First' from dual
union all
select 'Second' from dual) ,
AllReports ( reportDate, Department) as
(select dt.reportDate,
dp.department
from dates dt
cross join
departments dp )
select ar.reportDate, ar.department, count(md.employee)
from AllReports ar
left join myData md
on ar.ReportDate = md.reportDate and
ar.department = md.department
group by ar.reportDate, ar.department
order by 1, 2
First we generate the dates we are interested in (in our sample, 01-05-2016 to 03-05-2016). That is the dates subquery in the WITH clause.
Next we generate the list of departments: the departments subquery.
We cross join the two to generate every possible (date, department) combination: the AllReports subquery.
Finally we LEFT JOIN that to your main table. count(md.employee) counts only the non-null matches, so a combination with no rows in mydata is reported as 0 instead of being skipped.
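The same steps can be tried outside Oracle. Below is a minimal sketch, assuming SQLite driven from Python's sqlite3 module instead of Oracle: dates are stored as ISO text, the date list comes from a recursive CTE rather than all_objects, and (as a further assumption) the department list is read from mydata itself instead of being hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mydata (ReportDate TEXT, Department TEXT, Employee TEXT);
INSERT INTO mydata VALUES
  ('2016-05-01', 'First',  'Fred'),
  ('2016-05-01', 'First',  'Wilma'),
  ('2016-05-01', 'Second', 'Betty'),
  ('2016-05-01', 'Second', 'Barney'),
  ('2016-05-02', 'Second', 'Betty'),
  ('2016-05-03', 'First',  'Wilma'),
  ('2016-05-03', 'Second', 'Betty'),
  ('2016-05-03', 'Second', 'Barney');
""")

query = """
WITH RECURSIVE dates (ReportDate) AS (
    -- one row per day in the reporting range
    SELECT '2016-05-01'
    UNION ALL
    SELECT date(ReportDate, '+1 day') FROM dates WHERE ReportDate < '2016-05-03'
),
departments (Department) AS (
    -- assumption: every department appears at least once in mydata
    SELECT DISTINCT Department FROM mydata
)
SELECT d.ReportDate, p.Department, COUNT(m.Employee)
FROM dates AS d
CROSS JOIN departments AS p
LEFT JOIN mydata AS m
       ON m.ReportDate = d.ReportDate AND m.Department = p.Department
GROUP BY d.ReportDate, p.Department
ORDER BY d.ReportDate, p.Department
"""

report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

If a department can be absent for the whole range, a separate department table would be a safer source for the list than SELECT DISTINCT.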

SQL Parent Child Relationship

I would like to know how to create a parent/child relationship for a set of specific months. Let's say we have an employee John and I want to know all the people working under John, so I would do a CTE like this:
WITH CTE
AS
(
SELECT @EmployeeIdTmp as EmployeeId,
0 AS [Level]
UNION ALL
SELECT em.[EmployeeId],
[Level] + 1
FROM Employee em
INNER JOIN CTE t
ON em.[ManagerId] = t.EmployeeId
WHERE (em.[ManagerId] <> em.[EmployeeId]
AND em.[ManagerId] IS NOT NULL)
)
SELECT EmployeeId, [Level]
FROM CTE
In this CTE I have a specific WHERE condition, but it doesn't matter; it's just business rules :)
This works perfectly on SQL Server 2008 R2. Now I need to build my hierarchy based not only on the current month; I also need to look back, for example, two months.
Looking at a single month is fine, but as soon as I extend the logic to cover more than one month I get stuck in a circular reference. That makes sense, because John could have Maria working for him in January and the same relationship exists again in February. My question is how to build a hierarchy based on what happened over a period of time, for example between January and February.
I'm sure there is a way to do it but mine is not :)
Sorry, I'll provide more data. Let's say I need to run a report for January and February 2015. The company has one organization hierarchy in January, but it could be different in February because an employee changed manager or left the company. All of these changes need to be reflected in my treeview for that period.
Here an example of my treeview:
For January:
John
Maria
Julia
Darin
For February:
John
Maria
Julia
Nicolas
Darin
If I pick a date range from January to February, I should see a combination of both, including the new employee Nicolas in February. I have a table that keeps a history of the employee/manager hierarchy for each month, so yes, the same data can be repeated from month to month.
Table Employee:
EmployeeId int
ManagerId int
PeriodId int
The PeriodId column is a number that represents a month/year. For example, my hierarchy for January will have PeriodId = 1, February will have PeriodId = 2, and so on; the PeriodId is unique per month/year.
I have a table value function with the CTE above that receives a manager and returns all employees under him and the level.
My CTE including the PeriodId looks like this:
WITH CTE
AS
(
SELECT @EmployeeIdTmp as EmployeeId,
0 AS [Level]
UNION ALL
SELECT em.[EmployeeId],
[Level] + 1
FROM Employee em
INNER JOIN #PeriodIds p
ON em.[PeriodId] = p.[PeriodId]
INNER JOIN CTE t
ON em.[ManagerId] = t.EmployeeId
WHERE (em.[ManagerId] <> em.[EmployeeId]
AND em.[ManagerId] IS NOT NULL)
)
SELECT EmployeeId, [Level]
FROM CTE
When I check a single month, all is good, but as soon as I try to get data for two months, for example, the query takes too much time and repeats rows more than twice, even though I'm specifying just two months.
If I understand correctly, the question can be reduced to "how to prevent circular traversal in CTE queries". Look here for an answer: TSQL CTE: How to avoid circular traversal?
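One common way to stop a recursive CTE from looping is to carry the path visited so far and skip any employee already on it. Collapsing the selected periods to distinct (employee, manager) pairs first also keeps the same edge from being followed once per month. The following is only a sketch under stated assumptions: it uses SQLite through Python's sqlite3 rather than SQL Server (T-SQL would use + and CHARINDEX where this uses || and instr), the ids are hypothetical (1 = John, 2 = Maria, 3 = Julia, 4 = Darin, 5 = Nicolas), the reporting lines are invented, and one deliberately cyclic row is included to exercise the guard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmployeeId INT, ManagerId INT, PeriodId INT);
INSERT INTO Employee VALUES
  (2, 1, 1), (3, 2, 1), (4, 1, 1),             -- period 1 (January)
  (2, 1, 2), (3, 2, 2), (5, 2, 2), (4, 1, 2),  -- period 2 (February)
  (1, 4, 2);                                   -- hypothetical cycle: 1 -> 4 -> 1
""")

query = """
WITH RECURSIVE
edges AS (
    -- each manager/employee pair once, however many periods repeat it
    SELECT DISTINCT EmployeeId, ManagerId
    FROM Employee
    WHERE PeriodId IN (1, 2)
      AND ManagerId IS NOT NULL
      AND ManagerId <> EmployeeId
),
cte (EmployeeId, Level, Path) AS (
    SELECT ?, 0, '/' || ? || '/'
    UNION ALL
    SELECT e.EmployeeId, t.Level + 1, t.Path || e.EmployeeId || '/'
    FROM edges AS e
    JOIN cte AS t ON e.ManagerId = t.EmployeeId
    -- cycle guard: never revisit an employee already on this path
    WHERE instr(t.Path, '/' || e.EmployeeId || '/') = 0
)
SELECT EmployeeId, MIN(Level) AS Level
FROM cte
GROUP BY EmployeeId
ORDER BY 2, 1
"""

root = 1  # John
hierarchy = conn.execute(query, (root, root)).fetchall()
print(hierarchy)
```

Taking MIN(Level) per employee is one possible choice for someone who sits at different depths in different months; a report might equally keep one row per period.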

SQL statement to match dates that are the closest?

I have the following table, let's call it Names:
Name Id Date
Dirk 1 27-01-2015
Jan 2 31-01-2015
Thomas 3 21-02-2015
Next I have another table called Consumption:
Id Date Consumption
1 26-01-2015 30
1 01-01-2015 20
2 01-01-2015 10
2 05-05-2015 20
I think doing this in SQL will be fastest, since the table contains about 1.5 million rows.
The problem is as follows: for each Id in the Names table, I would like to find the row in the Consumption table whose date is closest to the date in Names. So, for example, Dirk consumes about 30 on 27-01-2015. If two dates have the same "difference", I would like to calculate the average consumption over those two dates.
While I know how to join, I do not know how to code the difference part.
Thanks.
DBMS is Microsoft SQL Server 2012.
I believe that my question differs from the one mentioned in the comments, because it is much more complicated since it involves comparison of dates between two tables rather than having one date and comparing it with the rest of the dates in the table.
This is how you could do it in SQL Server:
SELECT Id, Name, AVG(Consumption)
FROM (
SELECT n.Id, Name, Consumption,
RANK() OVER (PARTITION BY n.Id
ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date]))) AS rnk
FROM Names AS n
INNER JOIN Consumption AS c ON n.Id = c.Id ) t
WHERE t.rnk = 1
GROUP BY Id, Name
Using RANK with PARTITION BY n.Id and ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date])) you can locate all matching records per Id: all records with the smallest difference in days are going to have rnk = 1.
Then, using AVG in the outer query, you are calculating the average value of Consumption between all matching records.
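To see the ranking and the tie handling in action, here is a minimal sketch of the same query, assuming SQLite 3.25 or later (for window functions) through Python's sqlite3 instead of SQL Server. Dates are stored as ISO text and julianday() stands in for DATEDIFF. The last Consumption row is hypothetical, added only so that Id 2 has two dates exactly 30 days away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Names (Name TEXT, Id INT, Date TEXT);
CREATE TABLE Consumption (Id INT, Date TEXT, Consumption REAL);
INSERT INTO Names VALUES
  ('Dirk', 1, '2015-01-27'), ('Jan', 2, '2015-01-31'), ('Thomas', 3, '2015-02-21');
INSERT INTO Consumption VALUES
  (1, '2015-01-26', 30), (1, '2015-01-01', 20),
  (2, '2015-01-01', 10), (2, '2015-05-05', 20),
  (2, '2015-03-02', 30);  -- hypothetical row that creates a tie for Id 2
""")

query = """
SELECT Id, Name, AVG(Consumption) AS AvgConsumption
FROM (
    SELECT n.Id AS Id, n.Name AS Name, c.Consumption AS Consumption,
           RANK() OVER (
               PARTITION BY n.Id
               ORDER BY ABS(julianday(n.Date) - julianday(c.Date))
           ) AS rnk
    FROM Names AS n
    INNER JOIN Consumption AS c ON n.Id = c.Id
) AS t
WHERE rnk = 1          -- rows with the smallest date difference, ties included
GROUP BY Id, Name
ORDER BY Id
"""

closest = conn.execute(query).fetchall()
print(closest)
```

Thomas does not appear because the INNER JOIN finds no Consumption rows for Id 3; a LEFT JOIN would keep him with a NULL average.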

count occurrences for each week using db2

I am looking for some general advice rather than a solution. My problem is that I have a list of dates per person where due to administrative procedures, a person may have multiple records stored for this one instance, yet the date recorded is when the data was entered in as this person is passed through the paper trail. I understand this is quite difficult to explain so I'll give an example:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2000-01-01 B
1 2000-01-02 C
1 2003-04-01 A
1 2003-04-03 A
where I want to know how many valid records a person has by removing annoying audits that have recorded the date as the day the data was entered, rather than the date the person first arrives in the dataset. So for the above person I am only interested in:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2003-04-01 A
What makes this problem difficult is that I do not have the luxury of an audit column (the audit column here is just to show how the data is collected). I merely have dates. So one crude way to count real events (and ignore repeated audit data) is to look at the individual weeks within a person's history and, if one or more records exist for a given week, add 1 to my counter. This way, even though there are multiple records spread over a few days, I count the succession of dates as one record (which, after all, I am counting by date).
So does anyone know of any db2 functions that could help me solve this problem?
If you can live with standard weeks it's pretty simple:
select
person, year(dt), week(dt), min(dt), min(audit)
from
blah
group by
person, year(dt), week(dt)
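As a rough illustration of the standard-week grouping, here is a sketch that assumes SQLite through Python's sqlite3 rather than DB2: strftime('%Y', ...) and strftime('%W', ...) stand in for year() and week(), and the audit column is left out because the question says it does not really exist. Note that %W counts weeks starting on Monday, which may not match DB2's week() numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blah (person INT, dt TEXT);
INSERT INTO blah VALUES
  (1, '2000-01-01'), (1, '2000-01-01'), (1, '2000-01-02'),
  (1, '2003-04-01'), (1, '2003-04-03');
""")

weekly = conn.execute("""
SELECT person,
       strftime('%Y', dt) AS yr,
       strftime('%W', dt) AS wk,   -- week of year, Monday as first day
       MIN(dt)            AS first_dt
FROM blah
GROUP BY person, strftime('%Y', dt), strftime('%W', dt)
ORDER BY person, first_dt
""").fetchall()

for row in weekly:
    print(row)
print("events for person 1:", len(weekly))
```

Calendar weeks do split a run of dates that straddles a week or year boundary, which is the limitation the seven-day-range variant addresses.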
If you need seven-day ranges starting with the first date you'd need to generate your own week numbers, a calendar of sorts, e.g. like so:
with minmax(mindt, maxdt) as ( -- date range of the "calendar"
select min(dt), max(dt)
from blah
),
cal(dt,i) as ( -- fill the range with every date, count days
select mindt, 0
from minmax
union all
select dt+1 day , i+1
from cal
where dt < (select maxdt from minmax) and i < 100000
)
select
person, year(blah.dt), wk, min(blah.dt), min(audit)
from
(select dt, int(i/7)+1 as wk from cal) t -- generate week numbers
inner join
blah
on t.dt = blah.dt
group by person, year(blah.dt), wk
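The seven-day ranges can also be derived arithmetically, without materialising a calendar. This is a sketch under assumptions, not the query above: it uses SQLite through Python's sqlite3 instead of DB2, computes the week number as the whole number of seven-day steps since the earliest date in the table, and groups by that number alone (no separate year column, no audit column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blah (person INT, dt TEXT);
INSERT INTO blah VALUES
  (1, '2000-01-01'), (1, '2000-01-01'), (1, '2000-01-02'),
  (1, '2003-04-01'), (1, '2003-04-03');
""")

buckets = conn.execute("""
SELECT person, wk, MIN(dt) AS first_dt, COUNT(*) AS raw_rows
FROM (
    SELECT person, dt,
           -- days since the earliest date, split into blocks of seven
           CAST((julianday(dt) - (SELECT MIN(julianday(dt)) FROM blah)) / 7
                AS INTEGER) + 1 AS wk
    FROM blah
) AS t
GROUP BY person, wk
ORDER BY person, wk
""").fetchall()

for row in buckets:
    print(row)
```

Each output row is one counted event; raw_rows shows how many audit-style duplicates were folded into it.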

SQL: Can GROUP BY contain an expression as a field?

I want to group a set of dated records by year, when the date is to the day. Something like:
SELECT venue, YEAR(date) AS yr, SUM(guests) AS yr_guests
FROM Events
...
GROUP BY venue, YEAR(date);
The above is giving me results instead of an error, but the results are not grouping by year and venue; they do not appear to be grouping at all.
My brute force solution would be a nested subquery: add the YEAR() AS yr as an extra column in the subquery, then do the grouping on yr in the outer query. I'm just trying to learn to do as much as possible without nesting, because nesting usually seems horribly inefficient.
I would tell you the exact SQL implementation I'm using, but I've had trouble discovering it. (I'm working through the problems on http://www.sql-ex.ru/ and if you can tell what they're using, I'd love to know.) Edited to add: Per test in comments, it is probably not SQL Server.
Edited to add the results I am getting (note the first two should be summed):
venue | yr | yr_guests
1 2012 15
1 2012 35
2 2012 12
1 2008 15
I expect those first two lines to instead be summed as
1 2012 50
Works Fine in SQL Server 2008.
See working Example here: http://sqlfiddle.com/#!3/3b0f9/6
Code pasted Below.
Create The Events Table
CREATE TABLE [Events]
( Venue INT NOT NULL,
[Date] DATETIME NOT NULL,
Guests INT NOT NULL
)
Insert the Rows.
INSERT INTO [Events] VALUES
(1,convert(datetime,'2012'),15),
(1,convert(datetime,'2012'),35),
(2,convert(datetime,'2012'),12),
(1,convert(datetime,'2008'),15);
GO
-- Testing, select newly inserted rows.
--SELECT * FROM [Events]
--GO
Run the GROUP BY Sql.
SELECT Venue, YEAR(date) AS yr, SUM(guests) AS yr_guests
FROM Events
GROUP BY venue, YEAR(date);
See the Output Results.
VENUE YR YR_GUESTS
1 2008 15
1 2012 50
2 2012 12
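For readers without SQL Server at hand, the same check can be reproduced elsewhere. A minimal sketch, assuming SQLite through Python's sqlite3: strftime('%Y', ...) replaces YEAR(), and the specific days are hypothetical (the original inserts only give the year).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Events (venue INT, date TEXT, guests INT);
INSERT INTO Events VALUES
  (1, '2012-03-10', 15),
  (1, '2012-09-21', 35),
  (2, '2012-06-05', 12),
  (1, '2008-11-02', 15);
""")

# The GROUP BY repeats the expression rather than relying on the alias,
# which is the form accepted by the widest range of engines.
yearly = conn.execute("""
SELECT venue,
       CAST(strftime('%Y', date) AS INTEGER) AS yr,
       SUM(guests) AS yr_guests
FROM Events
GROUP BY venue, CAST(strftime('%Y', date) AS INTEGER)
ORDER BY venue, yr
""").fetchall()

for row in yearly:
    print(row)
```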
It depends on your database engine (or SQL dialect).
To be sure it works across different database systems and versions, use a subquery:
SELECT venue, theyear, SUM(guests) AS yr_guests
FROM (
SELECT venue, YEAR(date) AS theyear, guests
FROM Events
) AS t
GROUP BY venue, theyear
The subquery builds a derived table with the columns
venue, theyear, guests
1, 2012, 15
1, 2012, 35
2, 2012, 12
... and so on
and the outer query then groups those rows by venue and year and sums the guests.

Advice on database design / SQL for retrieving data with chronological order

I am creating a database that will help keep track of which employees have been on a certain training course. I would like to get some guidance on the best way to design the database.
Specifically, each employee must attend the training course each year and my database needs to keep a history of all the dates on which they have attended the course in the past.
The end user will use the software as a planning tool to help them book future course dates for employees. When they select a given employee they will see:
(a) Last attendance date
(b) Projected future attendance date(i.e. last attendance date + 1 calendar year)
In terms of my database, any given employee may have multiple past course attendance dates:
EmpName AttandanceDate
Joe Bloggs 1st Jan 2007
Joe Bloggs 4th Jan 2008
Joe Bloggs 3rd Jan 2009
Joe Bloggs 8th Jan 2010
My question is what is the best way to set up the database to make it easy to retrieve the most recent course attendance date? In the example above, the most recent would be 8th Jan 2010.
Is there a good way to use SQL to sort by date and pick the MAX date?
My other idea was to add a column called ‘MostRecent’ and just set this to TRUE.
EmpName AttandanceDate MostRecent
Joe Bloggs 1st Jan 2007 False
Joe Bloggs 4th Jan 2008 False
Joe Bloggs 3rd Jan 2009 False
Joe Bloggs 8th Jan 2010 True
I wondered if this would simplify the SQL i.e.
SELECT Joe Bloggs WHERE MostRecent = ‘TRUE’
Also, when the user updates a given employee’s attendance record (i.e. with latest attendance date) I could use SQL to:
Search for the employee and set the MostRecent value to FALSE
Add a new record with MostRecent set to TRUE?
Would anybody recommended either method over the other? Or do you have a completely different way of solving this problem?
To get the last attendance date use the group function called MAX, i.e.
SELECT MAX(AttandanceDate)
FROM course_data
WHERE employee_name = 'Joe Bloggs'
To get the max attendance date for all the employees:
SELECT employee_name, MAX(AttandanceDate)
FROM course_data
GROUP BY employee_name
ORDER BY employee_name
The query above will NOT return rows for employees who haven't attended any courses. To include them, LEFT JOIN from the employee table instead:
SELECT A.employee_name, B.AttandanceDate
FROM employee AS A
LEFT JOIN (
SELECT employee_id, MAX(AttandanceDate) AS AttandanceDate
FROM course_data
GROUP BY employee_id
) AS B ON A.id = B.employee_id
ORDER BY A.employee_name
For employees who haven't attended any course, the query will return a NULL AttandanceDate.
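The LEFT JOIN approach also covers the "projected future attendance date" from the question. Here is a minimal sketch, assuming SQLite through Python's sqlite3, ISO text dates, and the employee/course_data layout used in the query above; 'Jane Doe' is a hypothetical employee with no attendance yet, and date(..., '+1 year') is SQLite's way of adding a calendar year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, employee_name TEXT);
CREATE TABLE course_data (employee_id INT, AttandanceDate TEXT);
INSERT INTO employee VALUES (1, 'Joe Bloggs'), (2, 'Jane Doe');
INSERT INTO course_data VALUES
  (1, '2007-01-01'), (1, '2008-01-04'), (1, '2009-01-03'), (1, '2010-01-08');
""")

plan = conn.execute("""
SELECT e.employee_name,
       MAX(c.AttandanceDate)                   AS last_attended,
       date(MAX(c.AttandanceDate), '+1 year')  AS next_due
FROM employee AS e
LEFT JOIN course_data AS c ON c.employee_id = e.id
GROUP BY e.id, e.employee_name
ORDER BY e.employee_name
""").fetchall()

for row in plan:
    print(row)
```

Because the latest date is derived with MAX at query time, no MostRecent flag has to be kept in sync when a new attendance row is added.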
The flag is redundant. Another way to get an employee's last attendance date:
select top 1 AttandanceDate
from course_data
WHERE employee_name = 'Joe Bloggs'
order by AttandanceDate desc
This may already be the case, but the output from the AttandanceDate column makes me suspect that the column may not be a datetime column. Most RDBMSs have some sort of date, time, and/or datetime data type for storing this information, in which case KandadaBoggu's and OMG Ponies' responses are perfect. But if you are storing your dates as strings, you WILL have issues trying any of their suggestions.
Using a datetime data type usually also opens up the possibility of extracting date details,
e.g. SELECT YEAR('2008-01-01') will return 2008 as an integer.
If you are running SQL Server 2005, 2008 or later, you can use row_number() to do something like the following. This will list everyone with their most recent attendance.
with temp1 as
(select *
, (row_number() over (partition by EmpName order by AttandanceDate desc))
as [CourseAttendanceOrder]
from AttendanceHistory)
select *
from temp1
where CourseAttendanceOrder = 1
order by EmpName
This could be put into a view so you can use it as needed.
However, if you always will be focused on one individual at a time, it may be more efficient to make a stored procedure that can use statements like select max(AttandanceDate) for just the person you are working on.