Is there a way to select rows in database per day? - sql

I have to create a SQL query which returns the employees with 0 absences and 0 hours worked per day within an interval chosen by a user. I have my view v_personnel (rows about the employees), exp_mat_abs (rows about the absences of my employees) and exp_mat_mo (rows about the hours worked by the employees). I am stuck: I can't find a way to get the employees with 0 absences and 0 hours worked per day within the chosen interval.
Can someone help me with it?

So you have a record in the employees table for each employee, but there will be no record in hours worked or hours absent if the non-working employee was not absent. Aside from the slight contradiction in how someone can not work but also not be absent :) (I assume you mean they're on some kind of no-work-no-pay deal), what you want is:
employees
left join hoursworked on ...
left join absent on ...
Employees with zero hours worked and zero absences will have NULL in every column from each of the hours/absence tables.
If you just want these employees, you can filter on that:
WHERE hoursworkedtable.primarykeycolumnname IS NULL
If you want to turn the null into zero you can:
SELECT COALESCE(somenullcolumn, 0) as somenullcolumn
The only other thing I wanted to point out is that hours worked and hours absent aren't necessarily related concepts. You might want to group each of them up (e.g. into a time period) before you join, and join on both the employee and the day; otherwise multiple rows per employee in one table multiply against the rows of the other (a partial Cartesian product). Basically, find some way of reducing the rows coming out of the worked and absence tables so there is only one row from each per employee (per time period) before you join:
employees
left join (select ... from hoursworked group by...) on ...
left join (select ... from absent group by...) on ...
If you don't, then multiple rows from either of them, per employee, will cause data from the other table to duplicate.
Edit: so you need a row for every day per employee, even for days with no data:
employee e
cross join
generate_series('2019-01-10'::timestamp, '2019-01-21'::timestamp, '1 day'::interval) d
left outer join (select empid, thedate, ... from hoursworked group by empid, thedate) w
on w.empid = e.empid and w.thedate = d.d
left outer join (select empid, thedate, ... from absent group by empid, thedate) a
on a.empid = e.empid and a.thedate = d.d
We ask generate_series to make us a list of dates between the 10th and the 21st, and cross join the employees to it. This gives us one row per employee per day. Now we can left join our hours and absences onto this and get hours worked per day (or NULL if they didn't work) and absences per day (or NULL if they weren't absent).
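The per-employee-per-day idea can be sketched end to end. This is a minimal illustration rather than the asker's real schema: it runs on SQLite through Python's sqlite3 module so it is self-contained, swaps generate_series for a recursive CTE (SQLite has no generate_series by default), and the table names, columns and sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample data, for illustration only.
conn.executescript("""
CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hoursworked (empid INTEGER, thedate TEXT, hours REAL);
CREATE TABLE absent (empid INTEGER, thedate TEXT, hours REAL);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO hoursworked VALUES
    (1, '2019-01-10', 4), (1, '2019-01-10', 4), (2, '2019-01-11', 8);
INSERT INTO absent VALUES (2, '2019-01-10', 8);
""")

query = """
WITH RECURSIVE days(d) AS (          -- stand-in for generate_series
    SELECT :start
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < :end
)
SELECT e.empid, days.d
FROM employee e
CROSS JOIN days                      -- one row per employee per day
LEFT JOIN (SELECT empid, thedate, SUM(hours) AS hours
           FROM hoursworked GROUP BY empid, thedate) w
       ON w.empid = e.empid AND w.thedate = days.d
LEFT JOIN (SELECT empid, thedate, SUM(hours) AS hours
           FROM absent GROUP BY empid, thedate) a
       ON a.empid = e.empid AND a.thedate = days.d
WHERE w.empid IS NULL AND a.empid IS NULL   -- neither worked nor absent
ORDER BY e.empid, days.d
"""
rows = conn.execute(query, {"start": "2019-01-10", "end": "2019-01-11"}).fetchall()
print(rows)
```

With this sample data only employee 1 on 2019-01-11 has neither hours nor an absence. In PostgreSQL the same shape should work with generate_series in place of the recursive CTE.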

Related

SQL Server one-to-many relational IF query

I have an employees table that has many employee_records. The employee_records table has a column named event_type and it is an enum that can be either hire-date, promotion, termination, title-change, or rehire. I am attempting to calculate an employees total time employed and make some calculations based on how long they have been employed.
How can I add a column that gives me the total days that they have been employed?
Essentially, I need to see if they have a record with the event_type = termination and if they do, then I need to see if they have a rehire date, and if they do, then I need to use their rehire date as the first day of their employment and calculate their time of employment that way.
As for a result, I simply need a days_employed column that reflects the actual amount of days they have been employed.
Here is what I have so far.
SELECT
employees.id,
first_name,
last_name,
email,
event_type,
CASE
WHEN DATEDIFF(DAY, employee_records.created_at, SYSDATETIME()) < 365 * 5
THEN 1
WHEN DATEDIFF(DAY, employee_records.created_at, SYSDATETIME()) < 365 * 10
THEN 2
ELSE 3
END AS benfits_type,
DATEDIFF(DAY, employee_records.created_at, SYSDATETIME()) AS days_employed,
employee_records.created_at AS hire_date
FROM
employees
JOIN
employee_records ON employees.id = employee_records.employee_id
ORDER BY
employees.id ASC;
Here is an example of how you could do this. I'll post the query first and then walk through my explanation. If I understood you correctly, you were not looking for total days the employee was hired, but rather the total days of the employee's most recent employment at the company (Max hire date to max termination date or today).
;WITH hired
AS (SELECT employee_records.employee_id id,
Max(employee_records.created_at) created_at
FROM employee_records
WHERE event_type = 'hire-date'
GROUP BY employee_records.employee_id),
latest
AS (SELECT employees.id
id,
Isnull(Cast(Max(employee_records.created_at) AS DATE), Getdate()
)
created_at
FROM employees
LEFT JOIN employee_records
ON employees.id = employee_records.employee_id
AND employee_records.event_type = 'termination'
GROUP BY employees.id)
SELECT *,
Datediff(day, hired, latestday) DaysEmployeed
FROM (SELECT hired.id,
hired.created_at AS Hired,
CASE
WHEN hired.created_at > latest.created_at THEN Cast(
Getdate() AS DATE)
ELSE latest.created_at
END AS LatestDay
FROM hired
INNER JOIN latest
ON hired.id = latest.id) JoinedCTEs
First of all, I know you mentioned event_type is an enum, but for ease of explanation I used a varchar.
Two CTEs to start.
First there is "hired" which will get you the latest hire date. So if an employee has multiple hire dates, it grabs the latest date.
Second there is "latest" which is the latest date an employee has a termination date, but also uses today's date as a placeholder date if an employee has never been terminated.
The final query joins the two CTEs and does a datediff by day to determine how many days an employee has been at the company. If the termination date is earlier than the hire date (An employee who was hired, terminated, rehired and is still with the company), it will take today's date as the latest date to count.
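A variation that also treats a rehire as the start of the current employment can be sketched as follows. It is an illustration under assumptions, not the answer's exact query: it runs on SQLite via Python's sqlite3 (julianday instead of DATEDIFF), event_type is stored as text, "today" is passed in as a parameter so the result is reproducible, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: employee 1 was rehired, employee 2 was terminated.
conn.executescript("""
CREATE TABLE employee_records (employee_id INTEGER, event_type TEXT, created_at TEXT);
INSERT INTO employee_records VALUES
    (1, 'hire-date',   '2020-01-01'),
    (1, 'termination', '2020-03-01'),
    (1, 'rehire',      '2020-06-01'),
    (2, 'hire-date',   '2020-01-01'),
    (2, 'termination', '2020-01-31');
""")

query = """
WITH started AS (                    -- latest hire or rehire per employee
    SELECT employee_id, MAX(created_at) AS start_day
    FROM employee_records
    WHERE event_type IN ('hire-date', 'rehire')
    GROUP BY employee_id
)
SELECT s.employee_id,
       CAST(julianday(COALESCE(MIN(t.created_at), :today))
            - julianday(s.start_day) AS INTEGER) AS days_employed
FROM started s
LEFT JOIN employee_records t         -- a termination on or after that start, if any
       ON t.employee_id = s.employee_id
      AND t.event_type = 'termination'
      AND t.created_at >= s.start_day
GROUP BY s.employee_id, s.start_day
ORDER BY s.employee_id
"""
rows = conn.execute(query, {"today": "2020-06-11"}).fetchall()
print(rows)
```

Employee 1 counts from the rehire date up to the supplied "today"; employee 2 counts from the hire date to the termination. On SQL Server the same structure should carry over with DATEDIFF(DAY, ...) and GETDATE().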

Average Group size per month Over previous ten years

I need to find the average size (average number of employees) of all the groups (employers) that we do business with per month for the last ten years.
I have no problem getting the average group size for any single month. For the current month I can use the following:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
group by ER.EmployerName
This will give me a list of how many employees are in each group. I can then copy and paste the column into Excel to get the average for the current month.
For the previous month, I want to exclude any employees that were added after that month. I have a query for this too:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
group by ER.EmployerName
That will exclude all employees that were added this month. I can continue doing this all the way back ten years, but I know there is a better way. I have no problem running this query 120 times, copying and pasting the results into Excel to compute the average. However, I'd rather learn a more efficient way to do this.
Another question: I can't do the following. Does anyone know a way around it?
Select avg(count(*))
Thanks in advance guys!!
Edit: Employees that have been terminated can be found like this. NULL are employees that are currently employed.
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
join Gen_Info gne on gne.id = EE.newuserid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
and (gne.TerminationDate is NULL OR gne.TerminationDate < DATEADD(day, -14,GETDATE()))
group by ER.EmployerName
Are you after a query that shows the count by the year and month they were added? If so, this seems pretty straightforward.
This uses the MySQL date functions YEAR and MONTH.
Select AVG(cnt) FROM (
Select count(*) cnt, Year(dateAdded), Month(dateAdded)
from System_Users su
join system_Employers se on se.employerid = su.employerid
group by Year(dateAdded), Month(dateAdded)) B
The inner query breaks the counts out by year and month. We then wrap that in an outer query to show the average.
--2nd attempt but I'm Brain FriDay'd out.
This uses a common table expression (CTE) to generate the count of employees by year and month, and then averages by month.
If this isn't what you're after, sample data with expected results would help better frame the question, and I could stop making assumptions about what you need/want.
With CTE AS (
Select Year(dateAdded) YR, Month(dateAdded) MO, count(*) over (partition by Year(dateAdded), Month(dateAdded) order by dateAdded Asc) as RunningTotal
from System_Users su
join system_Employers se on se.employerid = su.employerid)
Select avg(RunningTotal), MO from CTE
group by MO;
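One way to avoid running the query 120 times, and to get around avg(count(*)), is to generate the month list once and average a per-employer count in an outer query. The sketch below is an assumption-laden illustration: it runs on SQLite via Python's sqlite3, uses a simplified Employees table with made-up rows, ignores terminations, and leaves out employers with no employees yet in a given month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema and data.
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, employerid INTEGER, dateadded TEXT);
INSERT INTO Employees (employerid, dateadded) VALUES
    (1, '2015-01-10'), (1, '2015-02-05'), (1, '2015-03-20'),
    (2, '2015-01-15');
""")

query = """
WITH RECURSIVE months(month_end) AS (     -- one row per month end
    SELECT :first
    UNION ALL
    SELECT date(month_end, 'start of month', '+2 months', '-1 day')
    FROM months WHERE month_end < :last
)
SELECT month_end, AVG(cnt) AS avg_group_size
FROM (                                    -- inner query: size of each group per month
    SELECT m.month_end, e.employerid, COUNT(*) AS cnt
    FROM months m
    JOIN Employees e ON e.dateadded <= m.month_end
    GROUP BY m.month_end, e.employerid
) per_group
GROUP BY month_end                        -- outer query: average of those counts
ORDER BY month_end
"""
rows = conn.execute(query, {"first": "2015-01-31", "last": "2015-03-31"}).fetchall()
print(rows)
```

The nesting is the general workaround: aggregate functions cannot be nested directly, so the COUNT goes in a derived table and the AVG is applied to its result. A termination filter like the one in the question's edit could be added to the inner join condition.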

How can I make generate_series work for part of a month?

I'm using filled_months to fill blank months in and group data by months. The problem is I can't seem to make it work for querying partial months (e.g. 2016-09-01 to 2016-09-15), it always counts the full month. Can someone point me in the right direction?
with filled_months AS
(SELECT
month,
0 AS blank_count
FROM generate_series(date_trunc('month',date('2016-09-01')), date_trunc('month',date('2016-09-15')), '1 month') AS
month)
SELECT to_char(mnth.month, 'YYYY Mon') AS month_year,
count(distinct places.id)
FROM filled_months mnth
left outer join restaurants
ON date_trunc('month', restaurants.created_at) = mnth.month
left outer join places
ON restaurants.places_id = places.id
WHERE places.id IS NULL OR restaurants.id IS NULL
GROUP BY mnth.month
ORDER BY mnth.month
If I understand your quandary correctly, I don't think you even need the generate series here. I think you can get by with a between in your where clause and let the grouping in SQL handle the rest:
SELECT
to_char(r.created_at, 'YYYY Mon') AS month,
count(distinct p.id)
FROM
restaurants r
left join places p ON r.places_id = p.id
WHERE
r.created_at between '2016-09-01' and '2016-09-15'
GROUP BY month
ORDER BY month
I tested this on three records with dates, 9/1/16, 9/15/16 and 9/30/16, and it gave a count of two. When I expanded the range to 9/30 it correctly gave three.
DISCLAIMER: I didn't understand what this meant:
places.id IS NULL OR restaurants.id IS NULL
If this doesn't work, then perhaps you can add some sample data to your question, along with some expected results.
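If the blank months still need to appear in the output, one option is to keep the month series and put the date range in the join condition rather than in WHERE, so that months with no matching rows survive the left join with a count of zero. The sketch below assumes a simplified restaurants table without the places join, uses made-up rows, and runs on SQLite via Python's sqlite3 with a recursive CTE standing in for generate_series.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the 2016-10-20 row falls outside the queried range.
conn.executescript("""
CREATE TABLE restaurants (id INTEGER PRIMARY KEY, created_at TEXT);
INSERT INTO restaurants (created_at) VALUES
    ('2016-09-10'), ('2016-09-20'), ('2016-10-20');
""")

query = """
WITH RECURSIVE filled_months(month) AS (  -- first day of each month in the range
    SELECT date(:start, 'start of month')
    UNION ALL
    SELECT date(month, '+1 month') FROM filled_months
    WHERE date(month, '+1 month') <= date(:end, 'start of month')
)
SELECT strftime('%Y-%m', m.month) AS month_year, COUNT(r.id) AS n
FROM filled_months m
LEFT JOIN restaurants r
       ON date(r.created_at, 'start of month') = m.month
      AND r.created_at BETWEEN :start AND :end   -- partial-month limit lives here
GROUP BY m.month
ORDER BY m.month
"""
rows = conn.execute(query, {"start": "2016-09-01", "end": "2016-10-15"}).fetchall()
print(rows)
```

October is still listed, with a count of zero, because the only October row is after the end of the range. In PostgreSQL the same condition could be added to the ON clause of the join against the generate_series months.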

How to join two queries with different GROUP BY levels, leaving some records null

In MS Access, using SQL, I've combined two queries with inner join that both require the user to input a Start Date and End Date range. The first query (query 1) lists the count of how many people have left the program, the situation with which they left, the month, and the year they left. It is grouping the Count function first on the year, then month, then the leaving situation (which can be any of 5 options), which means that there are multiple records for each month (but not necessarily the same number of records for each month). The second query (query 2) counts the number of people we've admitted to the program, the month, and the year. It is grouped first on the year, then month. So, with this query, there is only one record per month. My inner join combines the queries correctly, except that it repeats query 2's values multiple times, depending on how many records each of query 1's months have. Is there a way to have query 2's values only listed once per month, therefore leaving the rest of the records for that month null?
Here's query 1:
SELECT Count(clients.ssn) AS CountOfDepartures, clients.[leaving situation], a.monthname, a.year1, a.month1
FROM clients INNER JOIN (SELECT month(clients.[departure date]) AS Month1, year(clients.[departure date]) AS Year1, months.monthname, clients.ssn FROM clients
INNER JOIN months ON month(clients.[departure date])=months.monthnumber WHERE clients.[departure date] BETWEEN [Enter Start Date] AND [Enter End Date]) AS A
ON clients.ssn=a.ssn
GROUP BY a.year1, a.monthname, clients.[leaving situation], a.month1
ORDER BY a.year1 DESC , a.month1 DESC;
Here's query 2
SELECT Count(clients.ssn) AS CountofIntakes, b.monthname, b.year2, b.month2
FROM clients
INNER JOIN (SELECT month(clients.prog_start) AS Month2, year(clients.prog_start) AS Year2, months.monthname, clients.ssn
FROM clients INNER JOIN months ON month(clients.prog_start)=months.monthnumber WHERE clients.prog_start BETWEEN [Enter Start Date] AND [Enter End Date]) AS B
ON clients.ssn=b.ssn
GROUP BY b.monthname, b.year2, b.month2
ORDER BY b.year2 DESC , b.month2 DESC;
Here's how I combined them, but it gives me the repeating values:
SELECT countofdeparturesbyleavingsituationmonth.countofdepartures, countofdeparturesbyleavingsituationmonth.[leaving situation], countofdeparturesbyleavingsituationmonth.monthname, countofdeparturesbyleavingsituationmonth.year1, countofdeparturesbyleavingsituationmonth.month1, countofintakesbymonth.countofintakes
FROM countofdeparturesbyleavingsituationmonth
INNER JOIN countofintakesbymonth ON (countofdeparturesbyleavingsituationmonth.monthname=countofintakesbymonth.monthname) AND (countofdeparturesbyleavingsituationmonth.year1=countofintakesbymonth.year2) AND (countofdeparturesbyleavingsituationmonth.monthname=countofintakesbymonth.monthname)
ORDER BY year1 DESC , month1 DESC;
The CLIENTS table has a record for each client with a bunch of columns for different clinical data (I work for a non-profit drug/alcohol rehabilitation center). The MONTHS table just has the twelve months written out in one column with a corresponding number in the second column. I use the inner join with the MONTHS table in order to list the monthname rather than the number (though I just realized that I could probably just do this in my report with the MonthName function...). Any advice is very appreciated!
Also, I'm asking this question because when I make a report on this query the SUM value for the total report (and per calendar year) for the intakes is incorrect because, in the query, each month has multiple intake values. So the SUM is much larger than it should be. Beyond that issue, this query generates the correct report.
You could use the DISTINCT clause:
SELECT DISTINCT
queryA.people,
Null as situation,
queryA.[month],
queryA.[year]
FROM
queryA INNER JOIN queryB
ON (queryA.people=queryB.people)
AND (queryA.[month]=queryB.[month])
AND (queryA.[year]=queryB.[year])

employee database with varying salary over time

I have the following tables:
PROJECTS - project_id, name
EMPLOYEES - employee_id, name
SALARY - employee_id, date, per_hour
HOURS - log_id, project_id, employee_id, date, num_hours
I need to query how much a project is costing. Problem is that Salary can vary. For example, a person can get a raise.
The SALARY table logs the per_hour charge for an employee. With every change in cost being recorded with its date.
How can I query this information to make sure that the log from the HOURS table is always matched to the right entry from the SALARY table? The right match being: depending on the date of the hours log, get the row from the SALARY table with the highest date before the log's date.
I.e. if the work was performed on Feb 14th, get the row for this employee from the SALARY table with the highest date that is still before the 14th.
Thank you,
What you need is an end date on SALARY. When a new record is inserted into SALARY for an employee, the previous record with the highest date (or better yet, a current flag set to 'Y' as recommended by cletus) should have its end date column set to the same date as the start date for the new record.
This should work with your current schema but be aware that it may be slow.
SELECT
SUM(h.num_hours * s.per_hour) AS cost
FROM PROJECTS p
INNER JOIN HOURS h
ON p.project_id = h.project_id
INNER JOIN (
SELECT
s1.employee_id,
s1.date AS start_date,
MIN(s2.date) AS end_date -- NULL for the employee's latest rate
FROM SALARY s1
LEFT JOIN SALARY s2 -- LEFT JOIN keeps the latest rate, which has no later row
ON s1.employee_id = s2.employee_id
AND s1.date < s2.date
GROUP BY
s1.employee_id,
s1.date) s
ON h.employee_id = s.employee_id
AND h.date >= s.start_date
AND (h.date < s.end_date OR s.end_date IS NULL)
In the 'Hours' table actually log the value of the salary that you use (don't link it based on ID). This will give you more flexibility in the future.
I have found the easiest way to handle queries spanning dates like this is to store a StartDate and an EndDate, where the EndDate is NULL for the current salary. I use a trigger to make sure there is only ever one NULL value for EndDate, and that there are no overlapping date ranges, or gaps between the ranges. StartDate is made not nullable, since that is never a valid value.
Then your join is pretty simple:
select h.num_hours, s.per_hour
from hours h
inner join salary s on h.employee_id = s.employee_id
and h.date >= s.StartDate and (h.date <= s.EndDate or s.EndDate is null)
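With the original schema (no end date column), the "latest salary row on or before the log date" match can also be written as a correlated subquery. The sketch below is one possible formulation, not a tuned solution: it runs on SQLite via Python's sqlite3, the sample rows are made up, and it assumes a rate takes effect on its own date (<= rather than strictly before).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: employee 1 gets a raise on 2009-02-10.
conn.executescript("""
CREATE TABLE salary (employee_id INTEGER, date TEXT, per_hour REAL);
CREATE TABLE hours (log_id INTEGER PRIMARY KEY, project_id INTEGER,
                    employee_id INTEGER, date TEXT, num_hours REAL);
INSERT INTO salary VALUES (1, '2009-01-01', 10), (1, '2009-02-10', 20);
INSERT INTO hours VALUES
    (1, 100, 1, '2009-02-01', 5),   -- before the raise: 5 h at 10
    (2, 100, 1, '2009-02-14', 2);   -- after the raise:  2 h at 20
""")

query = """
SELECT h.project_id, SUM(h.num_hours * s.per_hour) AS cost
FROM hours h
JOIN salary s
  ON s.employee_id = h.employee_id
 AND s.date = (SELECT MAX(s2.date)          -- latest rate in effect on the log date
               FROM salary s2
               WHERE s2.employee_id = h.employee_id
                 AND s2.date <= h.date)
GROUP BY h.project_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each hours row picks exactly one salary row, so the project cost here is 5 * 10 + 2 * 20. Hours logged before an employee's first salary row find no match and drop out of the sum, which may or may not be what is wanted.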