Sum the total employment time/year for employees / plsql [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a 'job' table, shown below.
How do I find the total years of employment for all the employees?
+--------+-----------+-----------------+------------------+
| emplid | record | action | effect_dt |
+--------+-----------+-----------------+------------------+
| 1 | 0 | terminate | 8/9/2010 |
| 1 | 0 | hire | 5/6/2006 |
| 1 | 0 | terminate | 3/4/2005 |
| 1 | 0 | hire | 1/1/2003 |
| 1 | 1 | hire | 1/1/2006 |
| 1 | 1 | terminate | 5/5/2004 |
| 1 | 1 | promote | 4/4/2003 |
| 1 | 1 | hire | 3/3/2002 |
| 1 | 1 | terminate | 2/2/2001 |
| 1 | 1 | hire | 1/1/2000 |
| 2 | 0 | rehire | 6/7/2013 |
| 2 | 0 | terminate | 5/6/2011 |
| 2 | 0 | rehire | 3/3/2010 |
| 2 | 0 | terminate | 2/2/2009 |
| 2 | 0 | hire | 1/1/2008 |
+--------+-----------+-----------------+------------------+

If you are using Oracle 12c or later, use MATCH_RECOGNIZE to pair each (re)hire entry with its following terminate entry, use MONTHS_BETWEEN to compute the duration of each pair in months, then group and sum to get the total in years:
SELECT emplid,
       record,
       SUM(
         MONTHS_BETWEEN(
           COALESCE( terminate_dt, SYSDATE ),
           hire_dt
         ) / 12
       ) AS hire_years
FROM   job
MATCH_RECOGNIZE(
  PARTITION BY emplid, record
  ORDER BY effect_dt
  MEASURES
    FIRST( hire.effect_dt )     AS hire_dt,
    LAST( terminate.effect_dt ) AS terminate_dt
  ONE ROW PER MATCH
  PATTERN ( hire changes* terminate? )
  DEFINE
    hire      AS hire.action IN ( 'hire', 'rehire' ),
    changes   AS changes.action NOT IN ( 'hire', 'rehire', 'terminate' ),
    terminate AS terminate.action IN ( 'terminate' )
)
GROUP BY
       emplid,
       record
(Assuming that if an employee has been hired but there is no later termination entry then they are still employed.)
Which, for the sample data:
CREATE TABLE job ( emplid, record, action, effect_dt ) AS
SELECT 1, 0, 'terminate', DATE '2010-09-08' FROM DUAL UNION ALL
SELECT 1, 0, 'hire', DATE '2006-06-05' FROM DUAL UNION ALL
SELECT 1, 0, 'terminate', DATE '2005-04-03' FROM DUAL UNION ALL
SELECT 1, 0, 'hire', DATE '2003-01-01' FROM DUAL UNION ALL
SELECT 1, 1, 'hire', DATE '2006-01-01' FROM DUAL UNION ALL
SELECT 1, 1, 'terminate', DATE '2004-05-05' FROM DUAL UNION ALL
SELECT 1, 1, 'promote', DATE '2003-04-04' FROM DUAL UNION ALL
SELECT 1, 1, 'hire', DATE '2002-03-03' FROM DUAL UNION ALL
SELECT 1, 1, 'terminate', DATE '2001-02-02' FROM DUAL UNION ALL
SELECT 1, 1, 'hire', DATE '2000-01-01' FROM DUAL UNION ALL
SELECT 2, 0, 'rehire', DATE '2013-07-06' FROM DUAL UNION ALL
SELECT 2, 0, 'terminate', DATE '2011-06-05' FROM DUAL UNION ALL
SELECT 2, 0, 'rehire', DATE '2010-03-03' FROM DUAL UNION ALL
SELECT 2, 0, 'terminate', DATE '2009-02-02' FROM DUAL UNION ALL
SELECT 2, 0, 'hire', DATE '2008-01-01' FROM DUAL;
Outputs:
EMPLID | RECORD | HIRE_YEARS
-----: | -----: | ----------------------------------------:
1 | 0 | 6.51344086021505376344086021505376344086
1 | 1 | 18.18784323974512146555157307845479888494
2 | 0 | 9.75773571286340103544404619673436877738
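As a cross-check of the pairing logic outside Oracle, here is a minimal Python sketch (not part of the original answer). The names employment_spells and total_years are illustrative, and the year count is approximated as days / 365.25 rather than MONTHS_BETWEEN / 12, so results can differ slightly from the Oracle output in the later decimals.

```python
from datetime import date

HIRE_ACTIONS = {"hire", "rehire"}

def employment_spells(events):
    """Pair each (re)hire with the next terminate, scanning in date order.

    A hire with no later terminate yields an open spell (end is None),
    mirroring the optional `terminate?` in the row pattern.
    """
    spells, open_hire = [], None
    for effect_dt, action in sorted(events):
        if action in HIRE_ACTIONS:
            if open_hire is not None:          # previous hire was never terminated
                spells.append((open_hire, None))
            open_hire = effect_dt
        elif action == "terminate" and open_hire is not None:
            spells.append((open_hire, effect_dt))
            open_hire = None
        # any other action (e.g. "promote") leaves the open spell unchanged
    if open_hire is not None:
        spells.append((open_hire, None))
    return spells

def total_years(events, today):
    """Sum spell lengths; open spells run to `today`. Approximation: days / 365.25."""
    days = sum(((end or today) - start).days
               for start, end in employment_spells(events))
    return days / 365.25

# Rows for emplid = 1, record = 0 from the sample data.
emp1_rec0 = [
    (date(2010, 9, 8), "terminate"),
    (date(2006, 6, 5), "hire"),
    (date(2005, 4, 3), "terminate"),
    (date(2003, 1, 1), "hire"),
]
print(round(total_years(emp1_rec0, date.today()), 2))  # 6.51
```

Both spells in this partition are closed, so the result does not depend on the current date; partitions that end with an open hire keep growing, as they do with SYSDATE in the query.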


Create column with timeframe relative to other column in SQL

Suppose I have the following table t_1 where every row represents a day:
+------+------------+-------+
| week | date | val |
+------+------------+-------+
| 1 | 2022-02-07 | 1 | <- Monday
| 1 | 2022-02-08 | 2 |
| 1 | 2022-02-09 | 3 |
| 1 | 2022-02-10 | 4 | <- Thursday
| 1 | 2022-02-11 | 5 |
| 1 | 2022-02-12 | 6 |
| 1 | 2022-02-13 | 7 |
| 2 | 2022-02-14 | 8 | <- Monday
| 2 | 2022-02-15 | 9 |
| 2 | 2022-02-16 | 10 |
| 2 | 2022-02-17 | 11 | <- Thursday
| 2 | 2022-02-18 | 12 |
| 2 | 2022-02-19 | 13 |
| 2 | 2022-02-20 | 14 |
+------+------------+-------+
How can I create the following table t_2 from t_1?
+------------+------------+-----------+------------+
| date_start | date_end | val_cur | val_prev |
+------------+------------+-----------+------------+
| 2022-02-14 | 2022-02-17 | 38 | 10 |
+------------+------------+-----------+------------+
Here val_cur is defined as the sum of values of the current timeframe (i.e. the sum of values between date_start and date_end) and val_prev is defined as the sum of values of the previous timeframe (i.e. the current timeframe minus one week).
-- Bigquery Standard SQL
WITH t_1 AS
(SELECT 1 AS week, '2022-02-07' AS date, 1 AS val UNION ALL
SELECT 1, '2022-02-08', 2 UNION ALL
SELECT 1, '2022-02-09', 3 UNION ALL
SELECT 1, '2022-02-10', 4 UNION ALL
SELECT 1, '2022-02-11', 5 UNION ALL
SELECT 1, '2022-02-12', 6 UNION ALL
SELECT 1, '2022-02-13', 7 UNION ALL
SELECT 2, '2022-02-14', 8 UNION ALL
SELECT 2, '2022-02-15', 9 UNION ALL
SELECT 2, '2022-02-16', 10 UNION ALL
SELECT 2, '2022-02-17', 11 UNION ALL
SELECT 2, '2022-02-18', 12 UNION ALL
SELECT 2, '2022-02-19', 13 UNION ALL
SELECT 2, '2022-02-20', 14)
SELECT '2022-02-14' AS date_start, '2022-02-17' AS date_stop, sum(val) AS val_cur
FROM t_1
WHERE date >= '2022-02-14' AND date <= '2022-02-17'
Output:
+-----+------------+------------+---------+
| Row | date_start | date_stop | val_cur |
+-----+------------+------------+---------+
| 1 | 2022-02-14 | 2022-02-17 | 38 |
+-----+------------+------------+---------+
But how do I get the last column?
Consider the approach below:
with your_table as (
select 1 as week, date '2022-02-07' as date, 1 as val union all
select 1, '2022-02-08', 2 union all
select 1, '2022-02-09', 3 union all
select 1, '2022-02-10', 4 union all
select 1, '2022-02-11', 5 union all
select 1, '2022-02-12', 6 union all
select 1, '2022-02-13', 7 union all
select 2, '2022-02-14', 8 union all
select 2, '2022-02-15', 9 union all
select 2, '2022-02-16', 10 union all
select 2, '2022-02-17', 11 union all
select 2, '2022-02-18', 12 union all
select 2, '2022-02-19', 13 union all
select 2, '2022-02-20', 14
), timeframe as (
select date '2022-02-14' as date_start, date '2022-02-17' as date_stop
)
select date_start, date_stop,
sum(if(date between date_start and date_stop,val, 0)) as val_cur,
sum(if(date between date_start - 7 and date_stop - 7,val, 0)) as val_prev
from your_table, timeframe
group by date_start, date_stop
For the sample data, this returns val_cur = 38 and val_prev = 10 for the timeframe 2022-02-14 to 2022-02-17.
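The same conditional-sum idea can be sketched in plain Python to check the arithmetic. This is an illustration only: window_sums and its shift_days parameter are hypothetical names, and the rows are rebuilt from the sample table.

```python
from datetime import date, timedelta

# (date, val) rows from t_1: 2022-02-07 .. 2022-02-20 with val 1 .. 14
rows = [(date(2022, 2, 7) + timedelta(days=i), i + 1) for i in range(14)]

def window_sums(rows, date_start, date_stop, shift_days=7):
    """Return (val_cur, val_prev): the sum inside the window and the sum
    inside the same window shifted back by shift_days."""
    shift = timedelta(days=shift_days)
    val_cur = sum(v for d, v in rows if date_start <= d <= date_stop)
    val_prev = sum(v for d, v in rows
                   if date_start - shift <= d <= date_stop - shift)
    return val_cur, val_prev

print(window_sums(rows, date(2022, 2, 14), date(2022, 2, 17)))  # (38, 10)
```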

Print the rows that have the largest combination across two columns (oracle)

How can I display rows as in the example below using window functions?
The algorithm is as follows:
Group by “clusterid” where it is not null. If “issuedate” and “operdate” are equal within each group, display all rows for the “publid” that has the largest number of unique “publid + inn” combinations.
Example
|*inn*|*publid*|*clusterid*|*issuedate*|*operdate*|
|-----|--------|-----------|-----------|----------|
| 333 | 1 | 12 | 01-01-21 | 05-01-21 |
| 222 | 1 | 12 | 01-01-21 | 05-01-21 |
| 333 | 2 | 12 | 01-01-21 | 05-01-21 |
| 222 | 2 | 12 | 01-01-21 | 05-01-21 |
| 111 | 2 | 12 | 01-01-21 | 05-01-21 |
|-----|--------|-----------|-----------|----------|
Result
|*inn*|*publid*|*clusterid*|*issuedate*|*operdate*|
|-----|--------|-----------|-----------|----------|
| 333 | 2 | 12 | 01-01-21 | 05-01-21 |
| 222 | 2 | 12 | 01-01-21 | 05-01-21 |
| 111 | 2 | 12 | 01-01-21 | 05-01-21 |
|-----|--------|-----------|-----------|----------|
I have been trying to write this query for a long time without success. I have the following idea, but it is not entirely correct:
SELECT a.*
FROM (SELECT m.*, RANK() OVER (PARTITION BY clusterid order by issuedate desc, operdate desc, count(inn) desc) AS rn
FROM table as m
GROUP BY publid
WHERE clusterid is not null
) AS a
WHERE a.rn = 1
This is how I understood it:
with test (inn, publid, clusterid, issuedate, operdate) as
  (select 333, 1, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 222, 1, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 333, 2, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 222, 2, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 111, 2, 12, date '2021-01-01', date '2021-01-05' from dual
  ),
temp as
  (select inn, publid, clusterid, issuedate, operdate,
          row_number() over (partition by clusterid, inn order by publid desc) rn
   from test
  )
select inn, publid, clusterid, issuedate, operdate
from temp
where rn = 1;
Output:
INN PUBLID CLUSTERID ISSUEDAT OPERDATE
---------- ---------- ---------- -------- --------
111 2 12 01.01.21 05.01.21
222 2 12 01.01.21 05.01.21
333 2 12 01.01.21 05.01.21
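Another reading of the question is to count the distinct inn values per publid inside each cluster and keep the rows of the publid with the highest count. The Python sketch below illustrates that reading only; it is not the answer's query, rows_for_top_publid is a hypothetical name, and the date columns are left out because they are identical in the example. Tied publids are all kept, as RANK() would do.

```python
from collections import defaultdict

# (inn, publid, clusterid) rows from the example
rows = [(333, 1, 12), (222, 1, 12), (333, 2, 12), (222, 2, 12), (111, 2, 12)]

def rows_for_top_publid(rows):
    """Keep rows whose publid has the most distinct inn values in its cluster."""
    inns = defaultdict(set)                  # (clusterid, publid) -> {inn}
    for inn, publid, clusterid in rows:
        if clusterid is not None:
            inns[(clusterid, publid)].add(inn)
    best = defaultdict(int)                  # clusterid -> highest distinct count
    for (clusterid, _publid), inn_set in inns.items():
        best[clusterid] = max(best[clusterid], len(inn_set))
    return [r for r in rows
            if r[2] is not None and len(inns[(r[2], r[1])]) == best[r[2]]]

print(rows_for_top_publid(rows))  # [(333, 2, 12), (222, 2, 12), (111, 2, 12)]
```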

How to calculate sum of rows that is last 1-7, 8-14, 15-21 etc. by group in sql as separate columns [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I would like to calculate the sum of the rows in the last 1-7, 8-14, 15-21, etc. positions per group in SQL, as separate columns:
Input Data:
Expected Result:
You can use analytic functions:
SELECT t.*,
       CASE COUNT(*) OVER (
              PARTITION BY grp
              ORDER BY rw
              ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
            )
       WHEN 7
       THEN SUM( vol ) OVER (
              PARTITION BY grp
              ORDER BY rw
              ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
            )
       END AS last7,
       CASE COUNT(*) OVER (
              PARTITION BY grp
              ORDER BY rw
              ROWS BETWEEN 7 FOLLOWING AND 13 FOLLOWING
            )
       WHEN 7
       THEN SUM( vol ) OVER (
              PARTITION BY grp
              ORDER BY rw
              ROWS BETWEEN 7 FOLLOWING AND 13 FOLLOWING
            )
       END AS last8_14
FROM   table_name t
Which, for your sample data:
CREATE TABLE table_name ( rw, grp, vol ) AS
SELECT 1, 'A', 1 FROM DUAL UNION ALL
SELECT 2, 'A', 2 FROM DUAL UNION ALL
SELECT 3, 'A', 3 FROM DUAL UNION ALL
SELECT 4, 'A', 4 FROM DUAL UNION ALL
SELECT 5, 'A', 2 FROM DUAL UNION ALL
SELECT 6, 'A', 3 FROM DUAL UNION ALL
SELECT 7, 'A', 4 FROM DUAL UNION ALL
SELECT 8, 'A', 5 FROM DUAL UNION ALL
SELECT 9, 'A', 5 FROM DUAL UNION ALL
SELECT 10, 'A', 6 FROM DUAL UNION ALL
SELECT 11, 'A', 7 FROM DUAL UNION ALL
SELECT 12, 'A', 3 FROM DUAL UNION ALL
SELECT 13, 'A', 4 FROM DUAL UNION ALL
SELECT 14, 'A', 5 FROM DUAL UNION ALL
SELECT 15, 'A', 4 FROM DUAL;
Outputs:
RW | GRP | VOL | LAST7 | LAST8_14
-: | :-- | --: | ----: | -------:
1 | A | 1 | 19 | 35
2 | A | 2 | 23 | 34
3 | A | 3 | 26 | null
4 | A | 4 | 29 | null
5 | A | 2 | 32 | null
6 | A | 3 | 33 | null
7 | A | 4 | 34 | null
8 | A | 5 | 35 | null
9 | A | 5 | 34 | null
10 | A | 6 | null | null
11 | A | 7 | null | null
12 | A | 3 | null | null
13 | A | 4 | null | null
14 | A | 5 | null | null
15 | A | 4 | null | null
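The CASE COUNT(*) ... WHEN 7 guard returns a sum only when the whole 7-row window exists. A small Python sketch of the same rule, using the vol values of the sample data (full_window_sum is an illustrative name, not part of the answer):

```python
vol = [1, 2, 3, 4, 2, 3, 4, 5, 5, 6, 7, 3, 4, 5, 4]  # one group, ordered by rw

def full_window_sum(values, i, first, last):
    """Sum values[i+first .. i+last]; None unless the whole window fits."""
    lo, hi = i + first, i + last
    if hi >= len(values):
        return None
    return sum(values[lo:hi + 1])

# (last7, last8_14) per row: windows of 0..6 and 7..13 rows after the current one
result = [(full_window_sum(vol, i, 0, 6), full_window_sum(vol, i, 7, 13))
          for i in range(len(vol))]
print(result[0], result[2], result[9])  # (19, 35) (26, None) (None, None)
```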

Filter rows within an analytical function - Oracle

I need to filter rows within an analytical function (for example: lag).
Is there a way to do that efficiently without sub-query (It is a very large table)?
This is the table:
And the expected result should look like the following:
This looks like a typical use of analytic functions. In this case, I think a cumulative max() seems appropriate:
-- DATE is a reserved word in Oracle, so the column name is quoted
select t.*,
       max(case when is_valid = 1 then "DATE" end) over
           (partition by client_id
            order by "DATE"
            rows between unbounded preceding and 1 preceding
           ) as last_valid_session
from t;
It is hard to think of a more concise way to implement this logic, although lag() or last_value() could also be used:
-- DATE is a reserved word in Oracle, so the column name is quoted
select t.*,
       lag(case when is_valid = 1 then "DATE" end ignore nulls) over
           (partition by client_id
            order by "DATE"
           ) as last_valid_session
from t;
Use LAG with IGNORE NULLS and a CASE expression to filter for only valid dates:
Oracle Setup:
CREATE TABLE test_data ( session_id, client_id, is_valid, "DATE" ) AS
SELECT 1, 11, 0, DATE '2018-01-01' FROM DUAL UNION ALL
SELECT 2, 22, 1, DATE '2018-01-02' FROM DUAL UNION ALL
SELECT 3, 33, 0, DATE '2018-01-03' FROM DUAL UNION ALL
SELECT 4, 11, 1, DATE '2018-01-04' FROM DUAL UNION ALL
SELECT 5, 22, 0, DATE '2018-01-05' FROM DUAL UNION ALL
SELECT 6, 33, 1, DATE '2018-01-06' FROM DUAL UNION ALL
SELECT 7, 11, 0, DATE '2018-01-07' FROM DUAL UNION ALL
SELECT 8, 22, 1, DATE '2018-01-08' FROM DUAL UNION ALL
SELECT 9, 33, 0, DATE '2018-01-09' FROM DUAL UNION ALL
SELECT 10, 11, 1, DATE '2018-01-10' FROM DUAL;
Query:
SELECT t.*,
LAG( CASE is_valid WHEN 1 THEN "DATE" END )
IGNORE NULLS
OVER ( PARTITION BY client_id ORDER BY "DATE" )
AS last_valid_session
FROM test_data t
ORDER BY session_id
Output:
SESSION_ID | CLIENT_ID | IS_VALID | DATE | LAST_VALID_SESSION
---------: | --------: | -------: | :-------- | :-----------------
1 | 11 | 0 | 01-JAN-18 | null
2 | 22 | 1 | 02-JAN-18 | null
3 | 33 | 0 | 03-JAN-18 | null
4 | 11 | 1 | 04-JAN-18 | null
5 | 22 | 0 | 05-JAN-18 | 02-JAN-18
6 | 33 | 1 | 06-JAN-18 | null
7 | 11 | 0 | 07-JAN-18 | 04-JAN-18
8 | 22 | 1 | 08-JAN-18 | 02-JAN-18
9 | 33 | 0 | 09-JAN-18 | 06-JAN-18
10 | 11 | 1 | 10-JAN-18 | 04-JAN-18
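What LAG ... IGNORE NULLS computes here can be sketched in Python as a single ordered pass that remembers, per client, the date of the most recent valid session seen so far. This is an illustration under the assumption that dates are unique per client; last_valid_sessions is a hypothetical name.

```python
from datetime import date

# (session_id, client_id, is_valid, session_date), matching the setup rows:
# clients cycle 11, 22, 33 and is_valid alternates 0, 1
sessions = [(i + 1, (11, 22, 33)[i % 3], i % 2, date(2018, 1, i + 1))
            for i in range(10)]

def last_valid_sessions(sessions):
    """Map session_id -> date of the latest earlier valid session of that client."""
    last_valid = {}   # client_id -> date of most recent valid session so far
    out = {}
    for session_id, client_id, is_valid, d in sorted(sessions, key=lambda s: s[3]):
        out[session_id] = last_valid.get(client_id)  # only strictly earlier rows
        if is_valid == 1:
            last_valid[client_id] = d
    return out

result = last_valid_sessions(sessions)
print(result[5], result[7])  # 2018-01-02 2018-01-04
```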

Generate the rank/number if the difference between consecutive rows is less than 10 days

I need a Hive query that calculates the date difference between consecutive records of the same txn type, and assigns the same number if the difference is less than 10 days, otherwise a new number.
Input table
+--------+----------+-------------+
| Txn_id | Txn_type | Txn_date |
+--------+----------+-------------+
| 1 | T100 | 26-Aug-2015 |
| 2 | T100 | 03-Nov-2015 |
| 3 | T100 | 05-Dec-2015 |
| 4 | T100 | 08-Dec-2015 |
| 5 | T100 | 25-Jan-2016 |
| 6 | T111 | 26-Jan-2016 |
| 7 | T200 | 02-Feb-2016 |
| 8 | T200 | 07-May-2016 |
| 9 | T200 | 12-May-2016 |
| 10 | T200 | 20-May-2016 |
+--------+----------+-------------+
Expected output
+--------+----------+-------------+--------+
| Txn_id | Txn_type | Txn_date | Number |
+--------+----------+-------------+--------+
| 1 | T100 | 26-Aug-2015 | 1 |
| 2 | T100 | 03-Nov-2015 | 2 |
| 3 | T100 | 05-Dec-2015 | 3 |
| 4 | T100 | 08-Dec-2015 | 3 |
| 5 | T100 | 25-Jan-2016 | 4 |
| 6 | T111 | 26-Jan-2016 | 1 |
| 7 | T200 | 02-Feb-2016 | 1 |
| 8 | T200 | 07-May-2016 | 2 |
| 9 | T200 | 12-May-2016 | 2 |
| 10 | T200 | 20-May-2016 | 2 |
+--------+----------+-------------+--------+
Not sure if "less than 10 days" means strict or non-strict inequality, but otherwise:
with
  inputs ( txn_id, txn_type, txn_date ) as (
    select 1, 'T100', to_date('26-Aug-2015', 'dd-Mon-yyyy') from dual union all
    select 2, 'T100', to_date('03-Nov-2015', 'dd-Mon-yyyy') from dual union all
    select 3, 'T100', to_date('05-Dec-2015', 'dd-Mon-yyyy') from dual union all
    select 4, 'T100', to_date('08-Dec-2015', 'dd-Mon-yyyy') from dual union all
    select 5, 'T100', to_date('25-Jan-2016', 'dd-Mon-yyyy') from dual union all
    select 6, 'T111', to_date('26-Jan-2016', 'dd-Mon-yyyy') from dual union all
    select 7, 'T200', to_date('02-Feb-2016', 'dd-Mon-yyyy') from dual union all
    select 8, 'T200', to_date('07-May-2016', 'dd-Mon-yyyy') from dual union all
    select 9, 'T200', to_date('12-May-2016', 'dd-Mon-yyyy') from dual union all
    select 10, 'T200', to_date('20-May-2016', 'dd-Mon-yyyy') from dual
  ),
  prep ( txn_id, txn_type, txn_date, ct ) as (
    select txn_id, txn_type, txn_date,
           -- ct = 0 continues the current group, ct = 1 starts a new one
           case when txn_date < lag(txn_date) over (partition by txn_type
                                                    order by txn_date) + 10 then 0 else 1 end
    from inputs
  )
select txn_id, txn_type, txn_date,
       sum(ct) over (partition by txn_type order by txn_date) as number_
from prep;
I used number_ as a column name; don't use reserved Oracle words for table or column names unless your life depends on it, and not even then.
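The flag-then-running-sum technique can be sketched in Python to check it against the expected output. This is only an illustration: number_groups and gap_days are hypothetical names, and a gap of exactly 10 days starts a new group, matching the strict "<" comparison in the query above.

```python
from datetime import date
from itertools import groupby

txns = [
    (1, "T100", date(2015, 8, 26)), (2, "T100", date(2015, 11, 3)),
    (3, "T100", date(2015, 12, 5)), (4, "T100", date(2015, 12, 8)),
    (5, "T100", date(2016, 1, 25)), (6, "T111", date(2016, 1, 26)),
    (7, "T200", date(2016, 2, 2)),  (8, "T200", date(2016, 5, 7)),
    (9, "T200", date(2016, 5, 12)), (10, "T200", date(2016, 5, 20)),
]

def number_groups(txns, gap_days=10):
    """Map txn_id -> group number, restarting at 1 for every txn_type."""
    out = {}
    ordered = sorted(txns, key=lambda t: (t[1], t[2]))   # by type, then date
    for _type, group in groupby(ordered, key=lambda t: t[1]):
        number, prev = 0, None
        for txn_id, _, txn_date in group:
            # first row of a type, or a gap of gap_days or more: new number
            if prev is None or (txn_date - prev).days >= gap_days:
                number += 1
            prev = txn_date
            out[txn_id] = number
    return out

numbers = number_groups(txns)
print([numbers[i] for i in range(1, 11)])  # [1, 2, 3, 3, 4, 1, 1, 2, 2, 2]
```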
Use a common table expression to mark the rows that have a difference of more than 10 days and then count those to get the new number.
with test_data as (
SELECT 1 txn_id, 'T100' txn_type, to_date('26-AUG-2015','DD-MON-YYYY') txn_date from dual union all
SELECT 2 txn_id, 'T100', to_date('03-NOV-2015','DD-MON-YYYY') from dual union all
SELECT 3 txn_id, 'T100', to_date('05-DEC-2015','DD-MON-YYYY') from dual union all
SELECT 4 txn_id, 'T100', to_date('08-DEC-2015','DD-MON-YYYY') from dual union all
SELECT 5 txn_id, 'T100', to_date('25-JAN-2016','DD-MON-YYYY') from dual union all
SELECT 6 txn_id, 'T111', to_date('26-JAN-2016','DD-MON-YYYY') from dual union all
SELECT 7 txn_id, 'T200', to_date('02-FEB-2016','DD-MON-YYYY') from dual union all
SELECT 8 txn_id, 'T200', to_date('07-MAY-2016','DD-MON-YYYY') from dual union all
SELECT 9 txn_id, 'T200', to_date('12-MAY-2016','DD-MON-YYYY') from dual union all
SELECT 10 txn_id, 'T200', to_date('20-MAY-2016','DD-MON-YYYY') from dual),
markers as (
select td.*,
case when td.txn_date - nvl(lag(td.txn_date)
over ( partition by txn_type order by txn_id ), td.txn_date-9999) > 10
THEN 'Y' ELSE NULL end new_txn_marker from test_data td )
SELECT txn_id, txn_type,txn_date,
count(new_txn_marker) over ( partition by txn_type order by txn_id ) "NUMBER"
FROM markers;