How to replace null with 0 in conditional selection - google-bigquery

I have 100 supervisors in my list, and I would like to count how many employees were under each one's supervision at the beginning of 01/01/2018.
This is the code I tried. However, supervisors who have no employees disappear from the result. I want to keep their names and show 0 as the number of employees if they don't have any.
select
  Supervisor,
  IFNULL(COUNT(EmpID), 0) AS start_headcount
from
  `T1`
where
  (Last_hire_date < date'2018-01-01' AND term_date >= date'2018-01-01')
  OR (Last_hire_date < date'2018-01-01' AND term_date is null)
group by
  1
order by
  1 asc
Only 92 supervisors, the ones who have employees under them, appear in the result. The other 8 supervisors, who have no employees, are simply gone. I cannot figure out how to fix this.
Can anyone help with this?

The query below is for BigQuery Standard SQL:
#standardSQL
SELECT
Supervisor,
COUNTIF(
(Last_hire_date < DATE '2018-01-01' AND term_date >= DATE '2018-01-01' )
OR
(Last_hire_date < DATE'2018-01-01' AND term_date IS NULL)
) AS start_headcount
FROM
`T1`
GROUP BY
1
ORDER BY
1 ASC
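The problem with the original query is that the filtering happens in the WHERE clause, which removes non-matching rows before GROUP BY runs. Supervisors with no matching rows therefore never form a group and are not shown.
So instead, I moved that condition into COUNTIF(), replacing the IFNULL(COUNT()) expression. Every supervisor still forms a group, and COUNTIF() returns 0 for a group in which no row satisfies the condition. (The IFNULL() was never needed anyway, since COUNT() returns 0 rather than NULL.)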
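To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module with a few invented rows. SQLite has no COUNTIF(), so SUM(CASE WHEN ... THEN 1 ELSE 0 END) stands in for it; the table and column names follow the question, but the data and the ISO-string date format are assumptions.

```python
import sqlite3

# Toy data; names mirror the question (T1, Supervisor, EmpID, Last_hire_date,
# term_date), but the rows are invented. Dates are ISO strings, which compare
# correctly as text in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Supervisor TEXT, EmpID INTEGER, Last_hire_date TEXT, term_date TEXT);
INSERT INTO T1 VALUES
  ('Alice', 1, '2017-05-01', NULL),          -- still employed
  ('Alice', 2, '2016-03-15', '2018-06-30'),  -- terminated after the cutoff
  ('Bob',   3, '2018-02-01', NULL),          -- hired after the cutoff
  ('Carol', 4, '2015-01-10', '2017-12-31');  -- terminated before the cutoff
""")

# The headcount condition from the question.
ACTIVE = """((Last_hire_date < '2018-01-01' AND term_date >= '2018-01-01')
          OR (Last_hire_date < '2018-01-01' AND term_date IS NULL))"""

# Filtering in WHERE drops rows before GROUP BY, so Bob and Carol disappear.
where_rows = conn.execute(f"""
    SELECT Supervisor, COUNT(EmpID) FROM T1
    WHERE {ACTIVE}
    GROUP BY Supervisor ORDER BY Supervisor""").fetchall()

# Conditional aggregation keeps every supervisor's group. SUM(CASE ...) stands
# in for COUNTIF and returns 0 when no row in the group matches.
countif_rows = conn.execute(f"""
    SELECT Supervisor,
           SUM(CASE WHEN {ACTIVE} THEN 1 ELSE 0 END) AS start_headcount
    FROM T1
    GROUP BY Supervisor ORDER BY Supervisor""").fetchall()

print(where_rows)    # [('Alice', 2)]
print(countif_rows)  # [('Alice', 2), ('Bob', 0), ('Carol', 0)]
```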
If your data is stored such that you need to take DISTINCT into account (the query above does not count distinct employee IDs as the headcount), the version below addresses that case:
#standardSQL
SELECT
Supervisor,
COUNT(DISTINCT
IF(
(Last_hire_date < DATE '2018-01-01' AND term_date >= DATE '2018-01-01' )
OR
(Last_hire_date < DATE'2018-01-01' AND term_date IS NULL),
EmpID,
NULL
)
) AS start_headcount
FROM
`T1`
GROUP BY
1
ORDER BY
1 ASC
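A similar sqlite3 sketch (invented rows, SQLite syntax rather than BigQuery) shows why the DISTINCT version matters when the same EmpID is stored more than once. Here CASE without ELSE plays the role of IF(cond, EmpID, NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Supervisor TEXT, EmpID INTEGER, Last_hire_date TEXT, term_date TEXT);
INSERT INTO T1 VALUES
  ('Alice', 1, '2017-05-01', NULL),
  ('Alice', 1, '2017-05-01', NULL),          -- same employee stored twice
  ('Alice', 2, '2016-03-15', '2018-06-30'),
  ('Bob',   3, '2018-02-01', NULL);          -- no qualifying employee
""")

ACTIVE = """((Last_hire_date < '2018-01-01' AND term_date >= '2018-01-01')
          OR (Last_hire_date < '2018-01-01' AND term_date IS NULL))"""

# CASE without ELSE yields NULL for non-matching rows; COUNT ignores NULLs,
# and DISTINCT collapses the duplicated EmpID.
rows = conn.execute(f"""
    SELECT Supervisor,
           COUNT(CASE WHEN {ACTIVE} THEN EmpID END)          AS matching_rows,
           COUNT(DISTINCT CASE WHEN {ACTIVE} THEN EmpID END) AS start_headcount
    FROM T1
    GROUP BY Supervisor ORDER BY Supervisor""").fetchall()

print(rows)  # [('Alice', 3, 2), ('Bob', 0, 0)]
```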

Related

Trying to display data that has more than a day gap between two dates on the same user?

I'm trying to create a query in Toad for Oracle that pulls users who have had more than a one-day gap between their previous and current supervisor(s) with a Supervisor Type of 'Registered Principal'.
For example, if a user has a supervisor with an end date of 10/20/2019, I would expect to see a supervisor assigned by 10/21/2019. If not, I want those exceptions displayed, since as of 10/22/2019 there is a one-day gap. An end date of '12/31/9999' means the supervisor is current.
SELECT DISTINCT a.AssocID, a.SupervisorAssocID, TRUNC(a.StartDate),
TRUNC(a.EndDate), a.SupervisorType
FROM TableName a
INNER JOIN (SELECT AssocID, StartDate, EndDate
FROM TableName
) b ON a.AssocID = b.AssocID
WHERE a.StartDate != TRUNC(b.StartDate)
AND TRUNC(b.EndDate) > a.StartDate
AND a.StartDate != TRUNC(b.EndDate)
AND a.SupervisorType = 'Registered Principal';
I expect to see only users who have had a gap of more than one day between supervisors.
You can use the LEAD analytic function to get the next start date for each associate:
SELECT *
FROM (
SELECT a.*,
LEAD( startdate ) OVER (
PARTITION BY AssocId
ORDER BY StartDate ASC
) AS next_startdate
FROM tablename a
-- WHERE SupervisorType = 'Registered Principal'
)
WHERE SupervisorType = 'Registered Principal'
AND TRUNC( enddate ) + INTERVAL '1' DAY < TRUNC( next_startdate )
Note: it's unclear where you want to filter on SupervisorType. Your query makes it seem like the filter belongs in the outer query, but it could belong in the inner query if you only want to consider gaps between Registered Principals and not between any other type of supervisor.
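The LEAD approach can be tried locally with a small sqlite3 sketch. The rows are made up, dates are assumed to be ISO strings, and SQLite's date(EndDate, '+1 day') replaces Oracle's TRUNC(enddate) + INTERVAL '1' DAY. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (AssocID INTEGER, SupervisorAssocID INTEGER,
                        StartDate TEXT, EndDate TEXT, SupervisorType TEXT);
INSERT INTO tablename VALUES
  (100, 901, '2019-01-01', '2019-10-20', 'Registered Principal'),
  (100, 902, '2019-10-21', '9999-12-31', 'Registered Principal'),  -- next day: no gap
  (200, 903, '2019-01-01', '2019-10-20', 'Registered Principal'),
  (200, 904, '2019-10-25', '9999-12-31', 'Registered Principal');  -- 5-day gap
""")

gaps = conn.execute("""
    SELECT AssocID, SupervisorAssocID, EndDate, next_startdate
    FROM (
        SELECT a.*,
               LEAD(StartDate) OVER (PARTITION BY AssocID
                                     ORDER BY StartDate) AS next_startdate
        FROM tablename a
    )
    WHERE SupervisorType = 'Registered Principal'
      -- the last assignment has no next start date, so LEAD gives NULL and it drops out
      AND date(EndDate, '+1 day') < date(next_startdate)""").fetchall()

print(gaps)  # [(200, 903, '2019-10-20', '2019-10-25')]
```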

What is a better alternative to a "helper" table in an Oracle database?

Let's say I have an 'employees' table with employee start and end dates, like so:
employees
employee_id  start_date  end_date
53           '19901117'  '99991231'
54           '19910208'  '20010512'
55           '19910415'  '20120130'
...          ...         ...
And let's say I want to get the monthly count of employees who were employed at the end of the month. So the resulting data set I'm after would look like:
month       count of employees
'20150131'  120
'20150228'  118
'20150331'  122
...         ...
The best way I currently know how to do this is to create a "helper" table to join onto, such as:
helper_tbl
month
'20150131'
'20150228'
'20150331'
...
And then do a query like so:
SELECT t0b.month,
count(t0a.employee_id)
FROM employees t0a
JOIN helper_tbl t0b
ON t0b.month BETWEEN t0a.start_date AND t0a.end_date
GROUP BY t0b.month
However, this is a somewhat annoying solution to me, because it means I have to create these little helper tables all the time and they clutter up my schema. I feel like other people must run into the same need for "helper" tables, and I'm guessing people have figured out a better way to go about this that isn't so manual. Or do you all really just keep creating "helper" tables like I do to get around these situations?
I understand this question is a bit open-ended for Stack Overflow, so let me offer a more closed-ended version: "Given just the 'employees' table, what would YOU do to get the resulting data set that I showed above?"
You can use a CTE to generate all the month values, either from a fixed starting point or based on the earliest date in your table:
with months (month) as (
select add_months(first_month, level - 1)
from (
select trunc(min(start_date), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select * from months;
With data whose earliest start date is 1990-11-17, as in your example, that generates 333 rows, running up to the current month:
MONTH
-------------------
1990-11-01 00:00:00
1990-12-01 00:00:00
1991-01-01 00:00:00
1991-02-01 00:00:00
1991-03-01 00:00:00
...
2018-06-01 00:00:00
2018-07-01 00:00:00
You can then use that in a query that joins to your table, something like:
with months (month) as (
select add_months(first_month, level - 1)
from (
select trunc(min(start_date), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select m.month, count(e.employee_id) as employees -- count(*) would return 1 for a month with no employees
from months m
left join employees e
on e.start_date <= add_months(m.month, 1)
and (e.end_date is null or e.end_date >= add_months(m.month, 1))
group by m.month
order by m.month;
Presumably you want to include people who are still employed, so you need to allow for the end date being null (unless you're using a magic end-date value for people who are still employed, such as the '99991231' in your sample data).
With dates stored as strings it's a bit more complicated, but you can generate the month information in a similar way:
with months (month, start_date, end_date) as (
select add_months(first_month, level - 1),
to_char(add_months(first_month, level - 1), 'YYYYMMDD'),
to_char(last_day(add_months(first_month, level - 1)), 'YYYYMMDD')
from (
select trunc(min(to_date(start_date, 'YYYYMMDD')), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select m.month, m.start_date, m.end_date, count(e.employee_id) as employees
from months m
left join employees e
on e.start_date <= m.end_date
and (e.end_date is null or e.end_date > m.end_date)
group by m.month, m.start_date, m.end_date
order by m.month;
Very lightly tested with a small amount of made-up data and both seem to work.
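For trying the calendar idea without Oracle, here is a rough sqlite3 equivalent. SQLite has no CONNECT BY, so a recursive CTE generates the months instead; the fixed start and end months (in place of sysdate), the ISO date strings, and the rows are assumptions. It also shows why the count is taken over a column of the outer-joined table: COUNT(*) returns 1 for a month with no matching employees, while COUNT(e.employee_id) returns 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO employees VALUES
  (53, '1990-11-17', '9999-12-31'),
  (54, '1991-02-08', '1991-03-15'),
  (55, '1991-04-15', NULL);
""")

rows = conn.execute("""
    WITH RECURSIVE months(month) AS (
        SELECT '1990-10-01'                       -- fixed starting month
        UNION ALL
        SELECT date(month, '+1 month') FROM months
        WHERE month < '1991-05-01'                -- fixed last month instead of sysdate
    )
    SELECT m.month,
           COUNT(e.employee_id) AS employees,     -- 0 for months with no match
           COUNT(*)             AS count_star     -- 1 for months with no match
    FROM months m
    LEFT JOIN employees e
      ON e.start_date <= date(m.month, '+1 month', '-1 day')
     AND (e.end_date IS NULL OR e.end_date >= date(m.month, '+1 month', '-1 day'))
    GROUP BY m.month
    ORDER BY m.month""").fetchall()

for row in rows:
    print(row)
```

Here "employed at the end of the month" is taken to mean started on or before the last day of the month and not ended before it; adjust the two comparisons if your definition differs.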
If you want to get the employees who were employed at the end of the month, you can use the LAST_DAY function in the WHERE clause of your query. You can also use that function in the GROUP BY clause. Your query would then look like this:
SELECT LAST_DAY(start_date), COUNT(1)
FROM employees
WHERE start_date = LAST_DAY(start_date)
GROUP BY LAST_DAY(start_date)
Or, if you just want to count employees by the month in which they started, use this query:
SELECT LAST_DAY(start_date), COUNT(1)
FROM employees
GROUP BY LAST_DAY(start_date)

RedShift: Alternative to 'where in' to compare annual login activity

Here are the two cases:
Members Lost: Get the distinct count of user ids from 365 days ago who haven't had any activity since then
Members Added: Get the distinct count of user ids from today who don't exist in the previous 365 days.
Here are the SQL statements I've been writing. Logically I feel like this should work (and it does for sample data), but the dataset is 5 million+ rows and it takes forever! Is there any way to do this more efficiently? (base_date is a calendar table that I'm joining on to build out a 2-year trend. I figured this was faster than joining the 5-million-row table to itself...)
-- Members Lost
SELECT
effective_date,
COUNT(DISTINCT dwuserid) as members_lost
FROM base_date
LEFT JOIN site_visit
-- Get Login Activity for 365th day
ON DATEDIFF(day, srclogindate, effective_date) = 365
WHERE dwuserid NOT IN (
-- Get Distinct Login activity for Current Day (PY) + 1 to Current Day (CY) (i.e. 2013-01-02 to 2014-01-01)
SELECT DISTINCT dwuserid
FROM site_visit b
WHERE DATEDIFF(day, b.srclogindate, effective_date) BETWEEN 0 AND 364
)
GROUP BY effective_date
ORDER BY effective_date;
-- Members Added
SELECT
effective_date,
COUNT(DISTINCT dwuserid) as members_added
FROM base_date
LEFT JOIN site_visit ON srclogindate = effective_date
WHERE dwuserid NOT IN (
SELECT DISTINCT dwuserid
FROM site_visit b
WHERE DATEDIFF(day, b.srclogindate, effective_date) BETWEEN 1 AND 365
)
GROUP BY effective_date
ORDER BY effective_date;
Thanks in advance for any help.
UPDATE
Thanks to JohnR for pointing me in the right direction. I had to tweak your response a bit because I need to know, for any login day, how many users were "Member Added" or "Member Lost", so it had to be a 365-day rolling window looking back or looking forward. Finding the IDs that didn't have a match in the LEFT JOIN was much faster.
-- Trim data down to one user login per day
CREATE TABLE base_login AS
SELECT DISTINCT "dwuserid", "srclogindate"
FROM site_visit;
-- Members Lost
SELECT
current."srclogindate",
COUNT(DISTINCT current."dwuserid") as "members_lost"
FROM base_login current
LEFT JOIN base_login future
ON current."dwuserid" = future."dwuserid"
AND current."srclogindate" < future."srclogindate"
AND DATEADD(day, 365, current."srclogindate") >= future."srclogindate"
WHERE future."dwuserid" IS NULL
GROUP BY current."srclogindate";
-- Members Added
SELECT
current."srclogindate",
COUNT(DISTINCT current."dwuserid") as "members_added"
FROM base_login current
LEFT JOIN base_login past
ON current."dwuserid" = past."dwuserid"
AND current."srclogindate" > past."srclogindate"
AND DATEADD(day, 365, past."srclogindate") >= current."srclogindate"
WHERE past."dwuserid" IS NULL
GROUP BY current."srclogindate";
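A small sqlite3 sketch of the LEFT JOIN anti-join from the update, with invented logins. It keeps the update's join conditions, except that SQLite's date(x, '+365 day') replaces DATEADD and the alias current is renamed cur to stay clear of SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_visit (dwuserid INTEGER, srclogindate TEXT);
INSERT INTO site_visit VALUES
  (1, '2013-01-01'), (1, '2013-01-01'), (1, '2013-06-01'),
  (2, '2013-01-01'),
  (3, '2013-06-01'), (3, '2014-07-01');
-- one row per user per login day, as in the update
CREATE TABLE base_login AS SELECT DISTINCT dwuserid, srclogindate FROM site_visit;
""")

# Lost: rows with no later login by the same user within 365 days.
lost = conn.execute("""
    SELECT cur.srclogindate, COUNT(DISTINCT cur.dwuserid)
    FROM base_login cur
    LEFT JOIN base_login future
      ON cur.dwuserid = future.dwuserid
     AND cur.srclogindate < future.srclogindate
     AND date(cur.srclogindate, '+365 day') >= future.srclogindate
    WHERE future.dwuserid IS NULL
    GROUP BY cur.srclogindate ORDER BY cur.srclogindate""").fetchall()

# Added: rows with no earlier login by the same user within the previous 365 days.
added = conn.execute("""
    SELECT cur.srclogindate, COUNT(DISTINCT cur.dwuserid)
    FROM base_login cur
    LEFT JOIN base_login past
      ON cur.dwuserid = past.dwuserid
     AND cur.srclogindate > past.srclogindate
     AND date(past.srclogindate, '+365 day') >= cur.srclogindate
    WHERE past.dwuserid IS NULL
    GROUP BY cur.srclogindate ORDER BY cur.srclogindate""").fetchall()

print(lost)   # [('2013-01-01', 1), ('2013-06-01', 2), ('2014-07-01', 1)]
print(added)  # [('2013-01-01', 2), ('2013-06-01', 1), ('2014-07-01', 1)]
```

User 3 counts as both lost on 2013-06-01 and added again on 2014-07-01, because the gap between those logins exceeds 365 days.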
NOT IN should generally be avoided here: the subquery is correlated with effective_date, which can force the visit data to be scanned again for every date.
Instead of joining to the site_visit table (which is presumably huge), try joining to a subquery that selects each user ID together with its first and most recent login dates; that way, there is only one row per user instead of one row per visit.
For example:
SELECT dwuserid, min (srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
You could then simplify the queries to something like:
-- Members Lost: Last login was between 12 and 13 months ago
SELECT
COUNT(*)
FROM
(
SELECT dwuserid, min(srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
) AS u
WHERE
last_login BETWEEN current_date - interval '13 months' and current_date - interval '12 months'
-- Members Added: First visit in last 12 months
SELECT
COUNT(*)
FROM
(
SELECT dwuserid, min(srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
) AS u
WHERE
first_login > current_date - interval '12 months'
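The first/last-login idea can likewise be sketched with sqlite3. A fixed reference date (the hypothetical REF) stands in for current_date so the result is reproducible, SQLite's date(?, '-12 months') replaces the Redshift interval arithmetic, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_visit (dwuserid INTEGER, srclogindate TEXT);
INSERT INTO site_visit VALUES
  (1, '2012-06-01'), (1, '2012-12-15'),   -- last login 12-13 months before REF
  (2, '2012-05-01'), (2, '2013-08-01'),   -- still active
  (3, '2013-03-01');                      -- first login within the last 12 months
""")

REF = '2014-01-01'  # hypothetical stand-in for current_date

# One row per user instead of one row per visit.
per_user = """(SELECT dwuserid, MIN(srclogindate) AS first_login,
                      MAX(srclogindate) AS last_login
               FROM site_visit GROUP BY dwuserid) AS u"""

lost = conn.execute(f"""
    SELECT COUNT(*) FROM {per_user}
    WHERE last_login BETWEEN date(?, '-13 months') AND date(?, '-12 months')""",
    (REF, REF)).fetchone()[0]

added = conn.execute(f"""
    SELECT COUNT(*) FROM {per_user}
    WHERE first_login > date(?, '-12 months')""", (REF,)).fetchone()[0]

print(lost, added)  # 1 1
```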

How to simplify this oracle sql query?

I have written the following SELECT to get the previous, different grade value from the jobs table.
This works, but is it possible to simplify the code so that it doesn't have three levels?
select value_1
from ( select distinct
value_1,
date_from,
date_to,
emp_id,
(select o.value_1
from jobs o
where o.emp_id=w.emp_id
and (
(o.date_to >= sysdate and o.date_from <= sysdate) or
(o.date_from <= sysdate and o.date_to is null)
)
) current_grade
from jobs w
where w.emp_id = t.emp_id
order by date_from desc
)
where value_1 != current_grade
and date_from <= sysdate
and rownum=1
and t.emp_id=123
order by date_from desc,
value_1,
emp_id
What is it supposed to do? I want to select the previous, different grade value from the jobs table. This table stores the positions of each employee, with date_from and date_to; in addition, value_1 stores the grade symbol. What matters to me is selecting the previous different grade value, which could have changed as far back as three positions earlier.
I don't think you can get away from a three-level query in this instance, but it can be simplified. As I noted in my comment, the ORDER BY in the outer query is superfluous, and you would actually get incorrect results if the ORDER BY in the second query were not there. Oracle's rownum does not work like other databases' Top-N queries: rownum is assigned before ORDER BY is applied, so using rownum = 1 with an ORDER BY in the same query block will not necessarily return the highest row.
This should produce the desired result, and is slightly more compact:
SELECT
value_1
FROM
(
SELECT
value_1
FROM
jobs w
WHERE
date_from <= sysdate
and emp_id=123
and value_1 != (SELECT value_1
FROM jobs o
WHERE o.emp_id = w.emp_id
AND (o.date_to >= sysdate and o.date_from <= sysdate
OR o.date_from <= sysdate and o.date_to is null))
ORDER BY date_from desc
)
WHERE
rownum = 1
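Here is a sqlite3 sketch of the same query with made-up grades and a fixed reference date (the :ref parameter is a hypothetical stand-in for sysdate). SQLite has no rownum, so LIMIT 1 is used instead; unlike rownum, LIMIT is applied after ORDER BY in the same query block, so no extra level is needed there. In Oracle 12c and later, FETCH FIRST 1 ROW ONLY behaves similarly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (emp_id INTEGER, value_1 TEXT, date_from TEXT, date_to TEXT);
INSERT INTO jobs VALUES
  (123, 'G1', '2015-01-01', '2016-12-31'),
  (123, 'G2', '2017-01-01', '2017-12-31'),
  (123, 'G3', '2018-01-01', '2018-12-31'),
  (123, 'G3', '2019-01-01', NULL);        -- current position, same grade as the one before
""")

prev = conn.execute("""
    SELECT value_1
    FROM jobs w
    WHERE date_from <= :ref
      AND emp_id = 123
      -- exclude rows whose grade equals the current grade
      AND value_1 != (SELECT value_1 FROM jobs o
                      WHERE o.emp_id = w.emp_id
                        AND ((o.date_to >= :ref AND o.date_from <= :ref)
                          OR (o.date_from <= :ref AND o.date_to IS NULL)))
    ORDER BY date_from DESC
    LIMIT 1""", {"ref": "2020-01-01"}).fetchone()

print(prev)  # ('G2',)
```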
You can do it with a single table access by getting value_1 for the latest date_to in the past:
select value_1 from jobs where date_to < sysdate and emp_id = 123
If you need the latest job role, order by date_to descending and take the first row.

Total Count of Active Employees by Date

I have in the past written queries that give me counts by date (hires, terminations, etc...) as follows:
SELECT per.date_start AS "Date",
COUNT(peo.EMPLOYEE_NUMBER) AS "Hires"
FROM hr.per_all_people_f peo,
hr.per_periods_of_service per
WHERE per.date_start BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
AND per.date_start BETWEEN :PerStart AND :PerEnd
AND per.person_id = peo.person_id
GROUP BY per.date_start
I am now looking to create a count of active employees by date; however, I am not sure how to date the query, since I use a date range to determine whether an employee is active:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.current_employee_flag = 'Y'
and TRUNC(sysdate) BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
Here is a simple way to get started. This works for all the effective start and end dates in your data:
select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
It works by adding one person for each start and subtracting one for each end (via num) and computing a cumulative sum. This might produce duplicate dates, so you might also aggregate to eliminate those duplicates:
select thedate, max(numActives)
from (select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
) t
group by thedate;
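A quick way to check the +1/-1 running-sum logic is a sqlite3 sketch with three invented employees in a hypothetical emp table. The end-date branch skips NULL end dates so that current employees are not subtracted; that filter, the table name, and the data are assumptions. Window functions need SQLite 3.25 or later, and SQLite does not accept parentheses around the members of a UNION ALL, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (emp_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO emp VALUES
  (1, '2020-01-01', '2020-03-31'),
  (2, '2020-02-01', NULL),          -- still active: contributes no -1 event
  (3, '2020-02-01', '2020-02-29');
""")

rows = conn.execute("""
    SELECT thedate, MAX(numActives) AS numActives
    FROM (
        -- cumulative sum of +1 (start) and -1 (end) events in date order
        SELECT thedate, SUM(num) OVER (ORDER BY thedate) AS numActives
        FROM (
            SELECT start_date AS thedate, 1 AS num FROM emp
            UNION ALL
            SELECT end_date, -1 FROM emp WHERE end_date IS NOT NULL
        ) AS dates
    ) AS t
    GROUP BY thedate
    ORDER BY thedate""").fetchall()

print(rows)  # [('2020-01-01', 1), ('2020-02-01', 3), ('2020-02-29', 2), ('2020-03-31', 1)]
```

With the default window frame, rows that share a date get the same running total, so the MAX() mainly serves to collapse those duplicates into one row per date.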
If you really want all dates, then it is best to start with a calendar table, and use a simple variation on your original query:
select c.thedate, count(pos.person_id) as NumActives
from calendar c left outer join
hr.per_periods_of_service pos
on c.thedate between pos.effective_start_date and pos.effective_end_date
group by c.thedate;
If you want to count all employees who were active during the entire input date range:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.EFFECTIVE_START_DATE <= :StartDate
AND (peo.EFFECTIVE_END_DATE IS NULL OR peo.EFFECTIVE_END_DATE >= :EndDate)
Here is my example, based on Gordon Linoff's answer, with a small modification: in the subtracting part of the union, every record appeared with -1 in num even when EndDate was NULL, so I filter those rows out.
use AdventureWorksDW2012 -- selects the database to work with in MS SSMS;
-- this line may not work on other platforms
select
t.thedate
,max(t.numActives) AS "Total Active Employees"
from (
select
dates.thedate
,SUM(dates.num) over (order by dates.thedate) as numActives
from
(
(
select
StartDate as thedate
,1 as num
from DimEmployee
)
union all
(
select
EndDate as thedate
,-1 as num
from DimEmployee
where EndDate IS NOT NULL
)
) AS dates
) AS t
group by thedate
ORDER BY thedate
This worked for me; I hope it helps somebody.
I was able to get the results I was looking for with the following:
--Active Team Members by Date
SELECT "a_date",
COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo,
(SELECT DATE '2012-04-01'-1 + LEVEL AS "a_date"
FROM dual
CONNECT BY LEVEL <= DATE '2012-04-30'+2 - DATE '2012-04-01'-1
)
WHERE peo.current_employee_flag = 'Y'
AND "a_date" BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
GROUP BY "a_date"
ORDER BY "a_date"