Suppose I have the following table t_1 where every row represents a day:
+------+------------+-------+
| week | date | val |
+------+------------+-------+
| 1 | 2022-02-07 | 1 | <- Monday
| 1 | 2022-02-08 | 2 |
| 1 | 2022-02-09 | 3 |
| 1 | 2022-02-10 | 4 | <- Thursday
| 1 | 2022-02-11 | 5 |
| 1 | 2022-02-12 | 6 |
| 1 | 2022-02-13 | 7 |
| 2 | 2022-02-14 | 8 | <- Monday
| 2 | 2022-02-15 | 9 |
| 2 | 2022-02-16 | 10 |
| 2 | 2022-02-17 | 11 | <- Thursday
| 2 | 2022-02-18 | 12 |
| 2 | 2022-02-19 | 13 |
| 2 | 2022-02-20 | 14 |
+------+------------+-------+
How can I create the following table t_2 from t_1?
+------------+------------+-----------+------------+
| date_start | date_end | val_cur | val_prev |
+------------+------------+-----------+------------+
| 2022-02-14 | 2022-02-17 | 38 | 10 |
+------------+------------+-----------+------------+
Here val_cur is defined as the sum of values of the current timeframe (i.e. the sum of values between date_start and date_end) and val_prev is defined as the sum of values of the previous timeframe (i.e. the current timeframe minus one week).
-- BigQuery Standard SQL
WITH t_1 AS
(SELECT 1 AS week, '2022-02-07' AS date, 1 AS val UNION ALL
SELECT 1, '2022-02-08', 2 UNION ALL
SELECT 1, '2022-02-09', 3 UNION ALL
SELECT 1, '2022-02-10', 4 UNION ALL
SELECT 1, '2022-02-11', 5 UNION ALL
SELECT 1, '2022-02-12', 6 UNION ALL
SELECT 1, '2022-02-13', 7 UNION ALL
SELECT 2, '2022-02-14', 8 UNION ALL
SELECT 2, '2022-02-15', 9 UNION ALL
SELECT 2, '2022-02-16', 10 UNION ALL
SELECT 2, '2022-02-17', 11 UNION ALL
SELECT 2, '2022-02-18', 12 UNION ALL
SELECT 2, '2022-02-19', 13 UNION ALL
SELECT 2, '2022-02-20', 14)
SELECT '2022-02-14' AS date_start, '2022-02-17' AS date_stop, sum(val) AS val_cur
FROM t_1
WHERE date >= '2022-02-14' AND date <= '2022-02-17'
Output:
+-----+------------+------------+---------+
| Row | date_start | date_stop | val_cur |
+-----+------------+------------+---------+
| 1 | 2022-02-14 | 2022-02-17 | 38 |
+-----+------------+------------+---------+
But how do I get the last column?
Consider the approach below:
with your_table as (
select 1 as week, date '2022-02-07' as date, 1 as val union all
select 1, '2022-02-08', 2 union all
select 1, '2022-02-09', 3 union all
select 1, '2022-02-10', 4 union all
select 1, '2022-02-11', 5 union all
select 1, '2022-02-12', 6 union all
select 1, '2022-02-13', 7 union all
select 2, '2022-02-14', 8 union all
select 2, '2022-02-15', 9 union all
select 2, '2022-02-16', 10 union all
select 2, '2022-02-17', 11 union all
select 2, '2022-02-18', 12 union all
select 2, '2022-02-19', 13 union all
select 2, '2022-02-20', 14
), timeframe as (
select date '2022-02-14' as date_start, date '2022-02-17' as date_stop
)
select date_start, date_stop,
sum(if(date between date_start and date_stop,val, 0)) as val_cur,
sum(if(date between date_start - 7 and date_stop - 7,val, 0)) as val_prev
from your_table, timeframe
group by date_start, date_stop
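with output:
+------------+------------+---------+----------+
| date_start | date_stop | val_cur | val_prev |
+------------+------------+---------+----------+
| 2022-02-14 | 2022-02-17 | 38 | 10 |
+------------+------------+---------+----------+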
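To try the conditional-aggregation idea without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the answer's exact query: SQLite's date(x, '-7 days') stands in for BigQuery's date_start - 7, and CASE replaces IF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_1 (week INTEGER, date TEXT, val INTEGER)")
# Same 14 sample rows as in the question: 2022-02-07 .. 2022-02-20, val 1..14.
sample = [((i // 7) + 1, f"2022-02-{7 + i:02d}", i + 1) for i in range(14)]
conn.executemany("INSERT INTO t_1 VALUES (?, ?, ?)", sample)

# One pass over the table: each row contributes to val_cur, val_prev, or neither.
QUERY = """
WITH timeframe AS (
  SELECT ? AS date_start, ? AS date_stop
)
SELECT date_start, date_stop,
       SUM(CASE WHEN date BETWEEN date_start AND date_stop
                THEN val ELSE 0 END) AS val_cur,
       SUM(CASE WHEN date BETWEEN date(date_start, '-7 days')
                              AND date(date_stop, '-7 days')
                THEN val ELSE 0 END) AS val_prev
FROM t_1, timeframe
GROUP BY date_start, date_stop
"""
result = conn.execute(QUERY, ("2022-02-14", "2022-02-17")).fetchone()
print(result)
```

Because the timeframe is a one-row CTE, changing the two bound parameters is enough to move both windows together.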
I have a table of bills with the following structure:
id | store_name | sum | payment_date
1 | Amazon | 10 | 11.05.2022
2 | Amazon | 20 | 11.05.2022
3 | Ebay | 15 | 11.05.2022
4 | AppleStore | 13 | 11.05.2022
5 | Google Play| 6 | 11.05.2022
I need to select all rows from the table and add a "priority" field based on the bill sum: the 2 rows with the highest sums get priority 1, the next 2 rows get priority 2, and the rest get 0:
id | store_name | sum | payment_date | priority
2 | Amazon | 20 | 11.05.2022 | 1
3 | Ebay | 15 | 11.05.2022 | 1
4 | AppleStore | 13 | 11.05.2022 | 2
1 | Amazon | 10 | 11.05.2022 | 2
5 | Google Play| 6 | 11.05.2022 | 0
In addition, the table contains bills from various days (field payment_date), and the priority should be assigned separately within each day.
Order the rows for each day and then assign priority based on the row number:
SELECT t.*,
CASE ROW_NUMBER()
OVER (PARTITION BY TRUNC(payment_date) ORDER BY sum DESC)
WHEN 1 THEN 1
WHEN 2 THEN 1
WHEN 3 THEN 2
WHEN 4 THEN 2
ELSE 0
END AS priority
FROM table_name t
Which, for the sample data:
CREATE TABLE table_name (id, store_name, sum, payment_date) AS
SELECT 1, 'Amazon', 10, DATE '2022-05-11' FROM DUAL UNION ALL
SELECT 2, 'Amazon', 20, DATE '2022-05-11' FROM DUAL UNION ALL
SELECT 3, 'Ebay', 15, DATE '2022-05-11' FROM DUAL UNION ALL
SELECT 4, 'Apple Store', 13, DATE '2022-05-11' FROM DUAL UNION ALL
SELECT 5, 'Google Play', 6, DATE '2022-05-11' FROM DUAL;
Outputs:
ID | STORE_NAME  | SUM | PAYMENT_DATE        | PRIORITY
-: | :---------- | --: | :------------------ | -------:
 2 | Amazon      |  20 | 2022-05-11 00:00:00 |        1
 3 | Ebay        |  15 | 2022-05-11 00:00:00 |        1
 4 | Apple Store |  13 | 2022-05-11 00:00:00 |        2
 1 | Amazon      |  10 | 2022-05-11 00:00:00 |        2
 5 | Google Play |   6 | 2022-05-11 00:00:00 |        0
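The question also asks that the ranking restart for every payment date. The following sketch, run through Python's sqlite3 module rather than Oracle, checks that behaviour. The table name bills, the column name amount (used instead of sum), and the two rows for 2022-05-12 are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bills (id INTEGER, store_name TEXT, amount INTEGER, payment_date TEXT)"
)
conn.executemany("INSERT INTO bills VALUES (?, ?, ?, ?)", [
    (1, "Amazon", 10, "2022-05-11"),
    (2, "Amazon", 20, "2022-05-11"),
    (3, "Ebay", 15, "2022-05-11"),
    (4, "Apple Store", 13, "2022-05-11"),
    (5, "Google Play", 6, "2022-05-11"),
    # Hypothetical second day, to show the row numbering restarting.
    (6, "Ebay", 7, "2022-05-12"),
    (7, "Amazon", 9, "2022-05-12"),
])

# Number the rows within each day by descending amount, then map the
# row number to a priority bucket.
QUERY = """
SELECT id,
       CASE WHEN rn <= 2 THEN 1
            WHEN rn <= 4 THEN 2
            ELSE 0 END AS priority
FROM (
  SELECT id, payment_date, amount,
         ROW_NUMBER() OVER (PARTITION BY payment_date ORDER BY amount DESC) AS rn
  FROM bills
)
ORDER BY payment_date, amount DESC
"""
priorities = conn.execute(QUERY).fetchall()
print(priorities)
```

Note that ROW_NUMBER breaks ties between equal amounts arbitrarily; adding id to the ORDER BY would make the result deterministic.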
How can I select the rows shown in the example below using window functions?
The algorithm is as follows:
Group by “clusterid” where it is not null. If “issuedate” and “operdate” are the same within each group, display all rows of the “publid” that has the largest number of unique “publid + inn” combinations.
Example
|*inn*|*publid*|*clusterid*|*issuedate*|*operdate*|
|-----|--------|-----------|-----------|----------|
| 333 | 1 | 12 | 01-01-21 | 05-01-21 |
| 222 | 1 | 12 | 01-01-21 | 05-01-21 |
| 333 | 2 | 12 | 01-01-21 | 05-01-21 |
| 222 | 2 | 12 | 01-01-21 | 05-01-21 |
| 111 | 2 | 12 | 01-01-21 | 05-01-21 |
|-----|--------|-----------|-----------|----------|
Result
|*inn*|*publid*|*clusterid*|*issuedate*|*operdate*|
|-----|--------|-----------|-----------|----------|
| 333 | 2 | 12 | 01-01-21 | 05-01-21 |
| 222 | 2 | 12 | 01-01-21 | 05-01-21 |
| 111 | 2 | 12 | 01-01-21 | 05-01-21 |
|-----|--------|-----------|-----------|----------|
I have been trying to write this query for a long time without success. Here is my current idea, which is not entirely correct:
SELECT a.*
FROM (SELECT m.*, RANK() OVER (PARTITION BY clusterid order by issuedate desc, operdate desc, count(inn) desc) AS rn
FROM table as m
GROUP BY publid
WHERE clusterid is not null
) AS a
WHERE a.rn = 1
This is how I understood it:
with test (inn, publid, clusterid, issuedate, operdate) as
  (select 333, 1, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 222, 1, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 333, 2, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 222, 2, 12, date '2021-01-01', date '2021-01-05' from dual union all
   select 111, 2, 12, date '2021-01-01', date '2021-01-05' from dual
  ),
temp as
  (select inn, publid, clusterid, issuedate, operdate,
          row_number() over (partition by clusterid, inn order by publid desc) rn
   from test
  )
select inn, publid, clusterid, issuedate, operdate
from temp
where rn = 1;
Output:
INN PUBLID CLUSTERID ISSUEDAT OPERDATE
---------- ---------- ---------- -------- --------
111 2 12 01.01.21 05.01.21
222 2 12 01.01.21 05.01.21
333 2 12 01.01.21 05.01.21
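A different reading of the question is to count the distinct inn values per publid and keep the publid with the highest count. The sketch below shows that alternative, run through Python's sqlite3 module. It is only one possible interpretation: it ignores the issuedate/operdate condition, and the CTE names counts and ranked are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (inn INTEGER, publid INTEGER, clusterid INTEGER,"
    " issuedate TEXT, operdate TEXT)"
)
conn.executemany("INSERT INTO test VALUES (?, ?, 12, '2021-01-01', '2021-01-05')", [
    (333, 1), (222, 1), (333, 2), (222, 2), (111, 2),
])

QUERY = """
WITH counts AS (          -- distinct inn values per (clusterid, publid)
  SELECT clusterid, publid, COUNT(DISTINCT inn) AS n_inn
  FROM test
  WHERE clusterid IS NOT NULL
  GROUP BY clusterid, publid
), ranked AS (            -- rank the publids inside each cluster
  SELECT clusterid, publid,
         RANK() OVER (PARTITION BY clusterid ORDER BY n_inn DESC) AS rnk
  FROM counts
)
SELECT t.inn, t.publid
FROM test t
JOIN ranked r ON r.clusterid = t.clusterid AND r.publid = t.publid
WHERE r.rnk = 1
ORDER BY t.inn
"""
kept = conn.execute(QUERY).fetchall()
print(kept)
```

Because RANK is used, two publid values tied for the highest count would both be returned; ROW_NUMBER with a tie-breaker would keep just one.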
Given the 'job' table below, how do I find the total years of employment for all the employees?
+--------+-----------+-----------------+------------------+
| emplid | record | action | effect dt |
+--------+-----------+-----------------+------------------+
| 1 | 0 | terminate | 8/9/2010 |
| 1 | 0 | hire | 5/6/2006 |
| 1 | 0 | terminate | 3/4/2005 |
| 1 | 0 | hire | 1/1/2003 |
| 1 | 1 | hire | 1/1/2006 |
| 1 | 1 | terminate | 5/5/2004 |
| 1 | 1 | promote | 4/4/2003 |
| 1 | 1 | hire | 3/3/2002 |
| 1 | 1 | terminate | 2/2/2001 |
| 1 | 1 | hire | 1/1/2000 |
| 2 | 0 | rehire | 6/7/2013 |
| 2 | 0 | terminate | 5/6/2011 |
| 2 | 0 | rehire | 3/3/2010 |
| 2 | 0 | terminate | 2/2/2009 |
| 2 | 0 | hire | 1/1/2008 |
+--------+-----------+-----------------+------------------+
If you are using Oracle 12c or later, use MATCH_RECOGNIZE to find pairs of (re)hire/terminate entries, use MONTHS_BETWEEN to compute the duration of each pair, then group and sum to get the total:
SELECT emplid,
record,
SUM(
MONTHS_BETWEEN(
COALESCE( terminate_dt, SYSDATE ),
hire_dt
) / 12
) AS hire_years
FROM job
MATCH_RECOGNIZE(
PARTITION BY emplid, record
ORDER BY effect_dt
MEASURES
FIRST( hire.effect_dt ) AS hire_dt,
LAST( terminate.effect_dt ) AS terminate_dt
ONE ROW PER MATCH
PATTERN ( hire changes* terminate? )
DEFINE
hire AS hire.action IN ( 'hire', 'rehire' ),
changes AS changes.action NOT IN ( 'hire', 'rehire', 'terminate' ),
terminate AS terminate.action IN ( 'terminate' )
)
GROUP BY
emplid,
record
(Assuming that if an employee has been hired but there is no later termination entry then they are still employed.)
Which, for the sample data:
CREATE TABLE job ( emplid, record, action, effect_dt ) AS
SELECT 1, 0, 'terminate', DATE '2010-09-08' FROM DUAL UNION ALL
SELECT 1, 0, 'hire', DATE '2006-06-05' FROM DUAL UNION ALL
SELECT 1, 0, 'terminate', DATE '2005-04-03' FROM DUAL UNION ALL
SELECT 1, 0, 'hire', DATE '2003-01-01' FROM DUAL UNION ALL
SELECT 1, 1, 'hire', DATE '2006-01-01' FROM DUAL UNION ALL
SELECT 1, 1, 'terminate', DATE '2004-05-05' FROM DUAL UNION ALL
SELECT 1, 1, 'promote', DATE '2003-04-04' FROM DUAL UNION ALL
SELECT 1, 1, 'hire', DATE '2002-03-03' FROM DUAL UNION ALL
SELECT 1, 1, 'terminate', DATE '2001-02-02' FROM DUAL UNION ALL
SELECT 1, 1, 'hire', DATE '2000-01-01' FROM DUAL UNION ALL
SELECT 2, 0, 'rehire', DATE '2013-07-06' FROM DUAL UNION ALL
SELECT 2, 0, 'terminate', DATE '2011-06-05' FROM DUAL UNION ALL
SELECT 2, 0, 'rehire', DATE '2010-03-03' FROM DUAL UNION ALL
SELECT 2, 0, 'terminate', DATE '2009-02-02' FROM DUAL UNION ALL
SELECT 2, 0, 'hire', DATE '2008-01-01' FROM DUAL;
Outputs:
EMPLID | RECORD | HIRE_YEARS
-----: | -----: | ----------------------------------------:
1 | 0 | 6.51344086021505376344086021505376344086
1 | 1 | 18.18784323974512146555157307845479888494
2 | 0 | 9.75773571286340103544404619673436877738
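For readers without MATCH_RECOGNIZE, the pairing logic can be sketched procedurally. The Python below is an illustration, not a translation of the query: the function name employed_days and the as_of parameter (standing in for SYSDATE) are hypothetical, a hire that arrives while a spell is already open is ignored, and durations are counted in days rather than with MONTHS_BETWEEN, so the figures will differ slightly from the output above.

```python
from collections import defaultdict
from datetime import date

HIRE_ACTIONS = {"hire", "rehire"}


def employed_days(events, as_of):
    """Sum employed days per (emplid, record); open spells run until as_of."""
    by_key = defaultdict(list)
    for emplid, record, action, effect_dt in events:
        by_key[(emplid, record)].append((effect_dt, action))

    totals = {}
    for key, rows in by_key.items():
        total, hired_on = 0, None
        for effect_dt, action in sorted(rows):      # chronological order
            if action in HIRE_ACTIONS and hired_on is None:
                hired_on = effect_dt                # a spell starts
            elif action == "terminate" and hired_on is not None:
                total += (effect_dt - hired_on).days
                hired_on = None                     # the spell ends
            # any other action (e.g. 'promote') leaves the spell untouched
        if hired_on is not None:                    # still employed at as_of
            total += (as_of - hired_on).days
        totals[key] = total
    return totals


events = [
    (1, 0, "terminate", date(2010, 9, 8)),
    (1, 0, "hire", date(2006, 6, 5)),
    (1, 0, "terminate", date(2005, 4, 3)),
    (1, 0, "hire", date(2003, 1, 1)),
    (2, 0, "hire", date(2008, 1, 1)),               # no termination yet
]
totals = employed_days(events, as_of=date(2009, 1, 1))
years = {key: days / 365.25 for key, days in totals.items()}
print(totals, years)
```

Passing as_of explicitly keeps the result reproducible, which a query based on SYSDATE is not.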
I would like to calculate, per group, the sum over the last 1-7 rows, rows 8-14, rows 15-21, and so on, each as a separate column in SQL.
You can use analytic functions:
SELECT t.*,
CASE COUNT(*) OVER (
PARTITION BY grp
ORDER BY rw
ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
)
WHEN 7
THEN SUM( vol ) OVER (
PARTITION BY grp
ORDER BY rw
ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
)
END AS last7,
CASE COUNT(*) OVER (
PARTITION BY grp
ORDER BY rw
ROWS BETWEEN 7 FOLLOWING AND 13 FOLLOWING
)
WHEN 7
THEN SUM( vol ) OVER (
PARTITION BY grp
ORDER BY rw
ROWS BETWEEN 7 FOLLOWING AND 13 FOLLOWING
)
END AS last8_14
FROM table_name t
Which, for your sample data:
CREATE TABLE table_name ( rw, grp, vol ) AS
SELECT 1, 'A', 1 FROM DUAL UNION ALL
SELECT 2, 'A', 2 FROM DUAL UNION ALL
SELECT 3, 'A', 3 FROM DUAL UNION ALL
SELECT 4, 'A', 4 FROM DUAL UNION ALL
SELECT 5, 'A', 2 FROM DUAL UNION ALL
SELECT 6, 'A', 3 FROM DUAL UNION ALL
SELECT 7, 'A', 4 FROM DUAL UNION ALL
SELECT 8, 'A', 5 FROM DUAL UNION ALL
SELECT 9, 'A', 5 FROM DUAL UNION ALL
SELECT 10, 'A', 6 FROM DUAL UNION ALL
SELECT 11, 'A', 7 FROM DUAL UNION ALL
SELECT 12, 'A', 3 FROM DUAL UNION ALL
SELECT 13, 'A', 4 FROM DUAL UNION ALL
SELECT 14, 'A', 5 FROM DUAL UNION ALL
SELECT 15, 'A', 4 FROM DUAL;
Outputs:
RW | GRP | VOL | LAST7 | LAST8_14
-: | :-- | --: | ----: | -------:
1 | A | 1 | 19 | 35
2 | A | 2 | 23 | 34
3 | A | 3 | 26 | null
4 | A | 4 | 29 | null
5 | A | 2 | 32 | null
6 | A | 3 | 33 | null
7 | A | 4 | 34 | null
8 | A | 5 | 35 | null
9 | A | 5 | 34 | null
10 | A | 6 | null | null
11 | A | 7 | null | null
12 | A | 3 | null | null
13 | A | 4 | null | null
14 | A | 5 | null | null
15 | A | 4 | null | null
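The COUNT(*) check in the query above is what turns incomplete windows into NULL. A scaled-down sketch, using a 3-row frame on five rows and run through Python's sqlite3 module, makes that easier to see. The column name next3 and the named window w are choices made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (rw INTEGER, grp TEXT, vol INTEGER)")
conn.executemany("INSERT INTO table_name VALUES (?, 'A', ?)",
                 [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)])

# The frame covers the current row and the 2 rows after it. Near the end of
# the partition the frame holds fewer than 3 rows, so the CASE yields NULL.
QUERY = """
SELECT rw,
       CASE COUNT(*) OVER w WHEN 3 THEN SUM(vol) OVER w END AS next3
FROM table_name
WINDOW w AS (PARTITION BY grp ORDER BY rw
             ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
ORDER BY rw
"""
sums = conn.execute(QUERY).fetchall()
print(sums)
```

Without the CASE wrapper, the last two rows would report partial sums (9 and 5) instead of NULL.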
I need a Hive query that calculates the date difference between consecutive records of the same txn type and assigns the same number if the difference is less than 10 days, or a new number otherwise.
Input table
+--------+----------+-------------+
| Txn_id | Txn_type | Txn_date |
+--------+----------+-------------+
| 1 | T100 | 26-Aug-2015 |
| 2 | T100 | 03-Nov-2015 |
| 3 | T100 | 05-Dec-2015 |
| 4 | T100 | 08-Dec-2015 |
| 5 | T100 | 25-Jan-2016 |
| 6 | T111 | 26-Jan-2016 |
| 7 | T200 | 02-Feb-2016 |
| 8 | T200 | 07-May-2016 |
| 9 | T200 | 12-May-2016 |
| 10 | T200 | 20-May-2016 |
+--------+----------+-------------+
Expected output
+--------+----------+-------------+--------+
| Txn_id | Txn_type | Txn_date | Number |
+--------+----------+-------------+--------+
| 1 | T100 | 26-Aug-2015 | 1 |
| 2 | T100 | 03-Nov-2015 | 2 |
| 3 | T100 | 05-Dec-2015 | 3 |
| 4 | T100 | 08-Dec-2015 | 3 |
| 5 | T100 | 25-Jan-2016 | 4 |
| 6 | T111 | 26-Jan-2016 | 1 |
| 7 | T200 | 02-Feb-2016 | 1 |
| 8 | T200 | 07-May-2016 | 2 |
| 9 | T200 | 12-May-2016 | 2 |
| 10 | T200 | 20-May-2016 | 2 |
+--------+----------+-------------+--------+
Not sure if "less than 10 days" means strict or non-strict inequality, but otherwise:
with
inputs ( txn_id, txn_type, txn_date ) as (
select 1, 'T100', to_date('26-Aug-2015', 'dd-Mon-yyyy') from dual union all
select 2, 'T100', to_date('03-Nov-2015', 'dd-Mon-yyyy') from dual union all
select 3, 'T100', to_date('05-Dec-2015', 'dd-Mon-yyyy') from dual union all
select 4, 'T100', to_date('08-Dec-2015', 'dd-Mon-yyyy') from dual union all
select 5, 'T100', to_date('25-Jan-2016', 'dd-Mon-yyyy') from dual union all
select 6, 'T111', to_date('26-Jan-2016', 'dd-Mon-yyyy') from dual union all
select 7, 'T200', to_date('02-Feb-2016', 'dd-Mon-yyyy') from dual union all
select 8, 'T200', to_date('07-May-2016', 'dd-Mon-yyyy') from dual union all
select 9, 'T200', to_date('12-May-2016', 'dd-Mon-yyyy') from dual union all
select 10, 'T200', to_date('20-May-2016', 'dd-Mon-yyyy') from dual
),
prep ( txn_id, txn_type, txn_date, ct ) as (
select txn_id, txn_type, txn_date,
case when txn_date < lag(txn_date) over (partition by txn_type
order by txn_date) + 10 then 0 else 1 end
from inputs
)
select txn_id, txn_type, txn_date,
sum(ct) over (partition by txn_type order by txn_date) as number_
from prep;
I used number_ as a column name; don't use reserved Oracle words for table or column names unless your life depends on it, and not even then.
Use a common table expression to mark the rows that have a difference of more than 10 days and then count those to get the new number.
with test_data as (
SELECT 1 txn_id, 'T100' txn_type, to_date('26-AUG-2015','DD-MON-YYYY') txn_date from dual union all
SELECT 2 txn_id, 'T100', to_date('03-NOV-2015','DD-MON-YYYY') from dual union all
SELECT 3 txn_id, 'T100', to_date('05-DEC-2015','DD-MON-YYYY') from dual union all
SELECT 4 txn_id, 'T100', to_date('08-DEC-2015','DD-MON-YYYY') from dual union all
SELECT 5 txn_id, 'T100', to_date('25-JAN-2016','DD-MON-YYYY') from dual union all
SELECT 6 txn_id, 'T111', to_date('26-JAN-2016','DD-MON-YYYY') from dual union all
SELECT 7 txn_id, 'T200', to_date('02-FEB-2016','DD-MON-YYYY') from dual union all
SELECT 8 txn_id, 'T200', to_date('07-MAY-2016','DD-MON-YYYY') from dual union all
SELECT 9 txn_id, 'T200', to_date('12-MAY-2016','DD-MON-YYYY') from dual union all
SELECT 10 txn_id, 'T200', to_date('20-MAY-2016','DD-MON-YYYY') from dual),
markers as (
select td.*,
case when td.txn_date - nvl(lag(td.txn_date)
over ( partition by txn_type order by txn_id ), td.txn_date-9999) > 10
THEN 'Y' ELSE NULL end new_txn_marker from test_data td )
SELECT txn_id, txn_type,txn_date,
count(new_txn_marker) over ( partition by txn_type order by txn_id ) "NUMBER"
FROM markers;
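Both answers follow the same two-step pattern: flag the rows that start a new group, then take a running sum of the flags. Here is a minimal sketch of that pattern run through Python's sqlite3 module rather than Oracle or Hive. It assumes ISO-formatted date strings and SQLite's julianday for the day difference, uses a strict "less than 10" comparison, and the names txns and starts_group are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (txn_id INTEGER, txn_type TEXT, txn_date TEXT)")
conn.executemany("INSERT INTO txns VALUES (?, ?, ?)", [
    (1, "T100", "2015-08-26"), (2, "T100", "2015-11-03"),
    (3, "T100", "2015-12-05"), (4, "T100", "2015-12-08"),
    (5, "T100", "2016-01-25"), (6, "T111", "2016-01-26"),
    (7, "T200", "2016-02-02"), (8, "T200", "2016-05-07"),
    (9, "T200", "2016-05-12"), (10, "T200", "2016-05-20"),
])

# Step 1: starts_group is 0 when the previous row of the same type is less
# than 10 days earlier, else 1 (including the first row, where LAG is NULL).
# Step 2: the running sum of the flags is the group number.
QUERY = """
WITH prep AS (
  SELECT txn_id, txn_type, txn_date,
         CASE WHEN julianday(txn_date)
                   - julianday(LAG(txn_date) OVER (PARTITION BY txn_type
                                                   ORDER BY txn_date)) < 10
              THEN 0 ELSE 1 END AS starts_group
  FROM txns
)
SELECT txn_id,
       SUM(starts_group) OVER (PARTITION BY txn_type ORDER BY txn_date) AS number_
FROM prep
ORDER BY txn_id
"""
numbers = [n for _, n in conn.execute(QUERY)]
print(numbers)
```

The printed list can be compared against the Number column of the expected output in the question.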