Combine data in multiple rows in Oracle - sql

I have Oracle 12c, so please answer using Oracle syntax. I want to combine data from multiple rows into one row; please see the expected result below for an example. I tried the PIVOT function, but it did not work for me because I want to pivot Call_day from the previous row and the latest row, and I want the list of columns shown in "Expected result" below. Thank you for your help.
Data in the table:
Acct_num Call_day Call_code Start_day_To_Call
1 04/23/2018 AA 04/02/2018
1 04/24/2018 NULL 04/02/2018
1 04/25/2018 CC 04/02/2018
2 04/26/2018 ZZ 05/02/2018
2 04/27/2018 CC 05/02/2018
If multiple calls were made within the Start_day_To_Call date, I want the two latest calls pivoted as shown below:
Expected result:
Acct_num Call_day1 Call_day2 Call_code1 Call_code2 Start_day_To_Call
1 04/24/2018 04/25/2018 NULL CC 04/02/2018
2 04/26/2018 04/27/2018 ZZ CC 05/02/2018

If you only want the last two days, you can use this query.
It first gets the last call for each acct_num, then finds the previous call, and then fills in the call codes for both. If performance is a concern, you can use an id column instead.
select p.acct_num,
       p.prev_last_day,
       (select z.call_code
          from test_tbl z
         where z.acct_num = p.acct_num
           and z.call_day = p.prev_last_day) prev_call_code,
       p.last_day,
       (select z.call_code
          from test_tbl z
         where z.acct_num = p.acct_num
           and z.call_day = p.last_day) last_call_code,
       p.start_day_to_call
  from (select x.acct_num,
               max(x.call_day) last_day,
               max((select max(y.call_day)
                      from test_tbl y
                     where y.acct_num = x.acct_num
                       and y.call_day < x.call_day)) prev_last_day,
               min(x.start_day_to_call) start_day_to_call
          from test_tbl x
         group by x.acct_num) p
 order by p.acct_num
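As an alternative sketch of the same idea, the two latest calls per account can be ranked with ROW_NUMBER() and then folded into one row with conditional aggregation. This is not the query above; it runs against an in-memory SQLite database through Python so it can be tried anywhere, and the ISO date strings are an assumption made for the demo. The SQL itself uses only standard window-function syntax, which Oracle 12c should also accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_tbl (acct_num INTEGER, call_day TEXT, call_code TEXT, start_day_to_call TEXT);
INSERT INTO test_tbl VALUES
  (1, '2018-04-23', 'AA', '2018-04-02'),
  (1, '2018-04-24', NULL, '2018-04-02'),
  (1, '2018-04-25', 'CC', '2018-04-02'),
  (2, '2018-04-26', 'ZZ', '2018-05-02'),
  (2, '2018-04-27', 'CC', '2018-05-02');
""")

# rn = 1 is the latest call per account, rn = 2 the one before it.
query = """
SELECT acct_num,
       MAX(CASE WHEN rn = 2 THEN call_day END)  AS call_day1,
       MAX(CASE WHEN rn = 1 THEN call_day END)  AS call_day2,
       MAX(CASE WHEN rn = 2 THEN call_code END) AS call_code1,
       MAX(CASE WHEN rn = 1 THEN call_code END) AS call_code2,
       MIN(start_day_to_call)                   AS start_day_to_call
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY acct_num ORDER BY call_day DESC) AS rn
    FROM test_tbl t
) ranked
WHERE rn <= 2
GROUP BY acct_num
ORDER BY acct_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The table is scanned once instead of once per scalar subquery, which may matter on larger tables.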

Related

Having a hard time building an aggregate SQL query

I am new at SQL and have a pretty good knowledge of basic stuff but I am stuck with my request.
My request gets me the following table (except for the last column on the right-hand side):
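Team | Variable | Date       | Value | Column_I_need_to_add
-----------------------------------------------------------
A    | aa       | 2022/05/01 | 100   | 0
A    | aa       | 2022/06/01 | 25    | 0
A    | aa       | 2022/07/01 | 580   | 0
A    | ad       | 2022/08/01 | 50    | 605
B    | aa       | 2021/05/01 | 75    | 0
B    | aa       | 2021/06/01 | 110   | 0
B    | aa       | 2021/07/01 | 514   | 0
B    | ad       | 2021/08/01 | 213   | 624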
What I cannot wrap my head around is how to code the last column, which fills the rows for the ad variable by summing the values of the aa variable of the same team, but only for the two months prior to the date of the ad variable.
Here is the script I have so far, that gets me the first four columns:
SELECT
team.Team,
Var.Variable,
TO_DATE(Var.Year||'-'||LPAD(Var.Month,2,'00')||'-'||'01','YYYY-MM-DD')AS Date ,
Var.value
FROM table1 as Var
join table2 as team
on Var.code=team.code
---This last join with table3 is only there to add other columns that are not relevant to this problem.
---join table3 as detail_var on Var.variable=detail_var.code_var
After further reading, I was not happy with my previous answer, which used OUTER APPLY. So I did some more digging, and this is what I came up with (now for Postgres 13).
It is cleaner and more concise. If you want to see the previous answer, please look at the edit history.
SELECT
    team.Team
    ,var.Variable
    ,var.Date
    ,var.value
    ,CASE
        WHEN var.Variable='ad' THEN
            (SELECT sum(value) FROM table1
             WHERE
                 (TO_DATE(Year||'-'||LPAD(Month::varchar(2),2,'0')||'-'||'01','YYYY-MM-DD')
                     BETWEEN (var.Date - INTERVAL '2 month') AND var.Date)
                 AND Variable = 'aa'
                 AND code = var.code)
        ELSE null
     END as past2monthsValue
FROM (
    -- this sub query to change Year & Month to Date Type Value
    -- this Date Type Value (Date) will be used to compare dates
    -- (var.Date) in the above sub-query
    SELECT
        code,
        Variable,
        TO_DATE(Year||'-'||LPAD(Month::varchar(2),2,'0')||'-'||'01','YYYY-MM-DD') AS Date,
        value
    FROM table1
) var
JOIN table2 AS team ON var.code=team.code
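To check the logic against the sample data in the question, here is a small sketch of the same correlated-subquery approach on an in-memory SQLite database, driven from Python. Several things are assumptions for the demo: the codes 'c1' and 'c2' are made up, dates are built as ISO strings with printf instead of TO_DATE, the window is written as a half-open range (two months back, up to but excluding the ad date), and ELSE 0 is used so the output matches the column in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (code TEXT, variable TEXT, year INTEGER, month INTEGER, value INTEGER);
CREATE TABLE table2 (code TEXT, team TEXT);
INSERT INTO table2 VALUES ('c1', 'A'), ('c2', 'B');  -- hypothetical codes
INSERT INTO table1 VALUES
  ('c1', 'aa', 2022, 5, 100), ('c1', 'aa', 2022, 6, 25),
  ('c1', 'aa', 2022, 7, 580), ('c1', 'ad', 2022, 8, 50),
  ('c2', 'aa', 2021, 5, 75),  ('c2', 'aa', 2021, 6, 110),
  ('c2', 'aa', 2021, 7, 514), ('c2', 'ad', 2021, 8, 213);
""")

query = """
WITH var AS (
    -- turn year and month into a sortable ISO date string
    SELECT code, variable, value,
           printf('%04d-%02d-01', year, month) AS d
    FROM table1
)
SELECT team.team, var.variable, var.d, var.value,
       CASE WHEN var.variable = 'ad' THEN
            (SELECT SUM(p.value)
             FROM var AS p
             WHERE p.code = var.code
               AND p.variable = 'aa'
               AND p.d >= date(var.d, '-2 months')
               AND p.d < var.d)
            ELSE 0
       END AS past2months_value
FROM var
JOIN table2 AS team ON team.code = var.code
ORDER BY team.team, var.d
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For team A the ad row on 2022-08-01 sums June and July (25 + 580), and for team B it sums 110 + 514, which are the 605 and 624 the question asks for.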

SQL/Power BI How to expand table according dates

I have a table like below, where a new record is created when there is a change in the status of a task.
task | status | last update
---------------------------
A    | 1      | 28/04/2022
A    | 3      | 01/05/2022
A    | 5      | 05/05/2022
B    | 1      | 28/04/2022
B    | 3      | 03/05/2022
B    | 4      | 05/05/2022
The problem is that I need to plot a graph within a time range, where I know the status of each item regardless of the date it was changed/created. With that, I think the easiest is to transform to the table below:
task | status | last update
---------------------------
A    | 1      | 28/04/2022
A    | 1      | 29/04/2022
A    | 1      | 30/04/2022
A    | 3      | 01/05/2022
A    | 3      | 02/05/2022
A    | 3      | 03/05/2022
A    | 3      | 04/05/2022
A    | 5      | 05/05/2022
B    | 1      | 28/04/2022
B    | 1      | 29/04/2022
B    | 1      | 30/04/2022
B    | 1      | 01/05/2022
B    | 1      | 02/05/2022
B    | 3      | 03/05/2022
B    | 3      | 04/05/2022
B    | 4      | 05/05/2022
However, I can't think of a way to do it, either directly in Power BI or in SQL (I'm connecting to a Redshift database through a SQL query).
Could you please help me?
Thanks
You can create the below visual using the standard line chart visualization. In the visualization settings, go to the "Shapes" menu and turn the "Stepped" view on.
While not necessary, it may be best practice to create a date dimension table with daily values spanning from the minimum update date to the maximum update date.
Dates = CALENDAR(MIN(Tasks[last update]),MAX(Tasks[last update]))
You can then create a one to many relationship between Dates and Tasks.
demo
A very similar question: How to do forward fill as a PL/PGSQL function
I don't know the actual differences between Amazon Redshift and PostgreSQL.
The demo is based on PostgreSQL 14. It may not work on Redshift.
Basic idea: for every distinct task, get the min and max last_update dates, then use the generate_series function to expand the dates between them. The key point is first_value(status): once you expand the dates, the status is null for every date that has no row in the table, so a running count of non-null statuses is used as the partition to fill the gaps. If you want to dig deeper, you can read the manual: https://www.postgresql.org/docs/14/plpgsql.html
CREATE OR REPLACE FUNCTION test_expand ()
RETURNS TABLE (
_date1 date,
_first_ctask text,
_first_cstatus bigint
)
AS $$
DECLARE
distinct_task record;
max_last_update date;
min_last_update date;
_sql text;
BEGIN
FOR distinct_task IN SELECT DISTINCT
task
FROM
test_1
ORDER BY
1 LOOP
min_last_update := (
SELECT
min(last_update)
FROM
test_1
WHERE
task = distinct_task.task
LIMIT 1);
max_last_update := (
SELECT
max(last_update)
FROM
test_1
WHERE
task = distinct_task.task
LIMIT 1);
-- The LEFT JOIN is restricted to the current task so that rows of other
-- tasks with the same last_update are not picked up.
_sql := format($dml$ WITH cte AS (
SELECT
date1::date, $task$%s$task$ AS _task, status, count(status) OVER (ORDER BY date1) AS c_s FROM (
SELECT
generate_series($a$%s$a$::date, $b$%s$b$::date, interval '1 day')) g (date1)
LEFT JOIN test_1 ON date1 = last_update AND task = $task$%s$task$)
SELECT
date1, _task, first_value(status) OVER (PARTITION BY c_s ORDER BY date1, status)
FROM cte $dml$, distinct_task.task, min_last_update, max_last_update, distinct_task.task);
RETURN query EXECUTE _sql;
END LOOP;
RETURN;
END;
$$
LANGUAGE plpgsql;
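The same forward-fill idea can also be sketched as a single query without a function or generate_series. The version below is an illustration only: it runs on an in-memory SQLite database from Python, builds the calendar with a recursive CTE, and stores dates as ISO strings, all of which are assumptions for the demo. Whether Redshift accepts an equivalent query has not been checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_1 (task TEXT, status INTEGER, last_update TEXT);
INSERT INTO test_1 VALUES
  ('A', 1, '2022-04-28'), ('A', 3, '2022-05-01'), ('A', 5, '2022-05-05'),
  ('B', 1, '2022-04-28'), ('B', 3, '2022-05-03'), ('B', 4, '2022-05-05');
""")

query = """
WITH RECURSIVE bounds AS (
    -- first and last update per task
    SELECT task, MIN(last_update) AS d, MAX(last_update) AS max_d
    FROM test_1
    GROUP BY task
),
days AS (
    -- one row per task and calendar day between its bounds
    SELECT task, d, max_d FROM bounds
    UNION ALL
    SELECT task, date(d, '+1 day'), max_d FROM days WHERE d < max_d
),
joined AS (
    -- grp increases by one at every day that has a real status row
    SELECT days.task, days.d, t.status,
           COUNT(t.status) OVER (PARTITION BY days.task ORDER BY days.d) AS grp
    FROM days
    LEFT JOIN test_1 AS t
      ON t.task = days.task AND t.last_update = days.d
)
SELECT task, d,
       -- each (task, grp) group holds exactly one non-null status
       MAX(status) OVER (PARTITION BY task, grp) AS status
FROM joined
ORDER BY task, d
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each task ends up with one row per day from its first to its last update, carrying the most recent status forward.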

how to turn a wide table into a long table

I have a wide table that looks like this:
Case REFERENCE | OUTCOME_EMP_SITUATION | MONTH1_EMP_SITUATION | MONTH1_REASON      | MONTH3_EMP_SITUATION | MONTH3_REASON      | MONTH6_EMP_SITUATION | MONTH6_REASON
12345          | Employed              | Employed             | Outcome at 1 month | Employed             | Outcome at 3 month | Employed             | Outcome at 6 month
These are survey results that people completed after they finished an employment program. They complete the survey 4 times: once immediately after finishing the program, and then after 1, 3 and 6 months. The problem is that the results for immediately after program completion are in one table (the Outcome table) and the 1/3/6 month checkpoint results are in another table (the Checkpointinfo table). I would like to combine those tables to create a long table so that, instead of having "Outcome" in 5 different columns, I would have it in one column, and it would look like this:
Case Reference | Outcome_emp_situation | Month_Reason
12345          | Employed              | NULL
12345          | Employed              | Outcome at 1 month
12345          | Employed              | Outcome at 3 month
12345          | Employed              | Outcome at 6 month
I was wondering if anyone could please help me out to turn this wide query into a long table query.
Here is the query for the wide table:
Select
ch.CASEREFERENCE, oc.OUTCOME_DATE, oc.OUTCOME_REFERENCE_ID, oc.OUTCOME_EMP_SITUATION, oc.OUTCOME_EMPLOYMENT_TYPE, oc.OUTCOME_NUM_JOBS, oc.OUTCOME_NAICS_DESC, oc.OUTCOME_JOB_NATURE,
oc.OUTCOME_WORK_HOURS, oc.OUTCOME_WAGE, oc.OUTCOME_STUDENT_STATUS, oc.OUTCOME_GOT_SERVICE, oc.OUTCOME_RIGHT_SERVICE, oc.OUTCOME_RECOMMEND_PROGRAM,
ck1.REASONCODE AS REASONCODE1,
CASE WHEN ck1.REASONCODE = 'OT1' THEN "Outcome at 1 month" END MONTH1_REASON,
ck1.MONTH_START_DATE AS MONTH1_START_DATE, ck1.MONTH_END_DATE AS MONTH1_END_DATE, ck1.MONTH_OUTCOME_EMP_SITUATION AS MONTH1_OUTCOME_EMP_SITUATION,
ck1.MONTH_EMPLOYMENT_TYPE AS MONTH1_EMPLOYMENT_TYPE, ck1.MONTH_NUM_JOBS AS MONTH1_NUM_JOBS, ck1.MONTH_NAICS_DESC AS MONTH1_NAICS_DESC, ck1.MONTH_JOB_NATURE AS MONTH1_JOB_NATURE,
ck1.MONTH_WORK_HOURS AS MONTH1_WORK_HOURS, ck1.MONTH_WAGE AS MONTH1_WAGE, ck1.MONTH_STUDENT_STATUS AS MONTH1_STUDENT_STATUS, ck1.MONTH_GOT_SERVICE AS MONTH1_GOT_SERVICE,
ck1.MONTH_RIGHT_SERVICE AS MONTH1_RIGHT_SERVICE, ck1.MONTH_RECOMMEND_PROGRAM AS MONTH1_RECOMMEND_PROGRAM, ck1.MONTH_RESUBMIT_MILESTONE AS MONTH1_RESUBMIT_MILESTONE,
ck1.MONTH_MILESTONE_ACHIEVED AS MONTH1_MILESTONE_ACHIEVED, ck1.MONTH_APPROVED_DATE AS MONTH1_APPROVED_DATE,
ck3.REASONCODE AS REASONCODE3,
CASE WHEN ck3.REASONCODE = 'OT3' THEN "Outcome at 3 month" END MONTH3_REASON,
ck3.MONTH_START_DATE AS MONTH3_START_DATE, ck3.MONTH_END_DATE AS MONTH3_END_DATE, ck3.MONTH_OUTCOME_EMP_SITUATION AS MONTH3_OUTCOME_EMP_SITUATION,
ck3.MONTH_EMPLOYMENT_TYPE AS MONTH3_EMPLOYMENT_TYPE, ck3.MONTH_NUM_JOBS AS MONTH3_NUM_JOBS, ck3.MONTH_NAICS_DESC AS MONTH3_NAICS_DESC, ck3.MONTH_JOB_NATURE AS MONTH3_JOB_NATURE,
ck3.MONTH_WORK_HOURS AS MONTH3_WORK_HOURS, ck3.MONTH_WAGE AS MONTH3_WAGE, ck3.MONTH_STUDENT_STATUS AS MONTH3_STUDENT_STATUS, ck3.MONTH_GOT_SERVICE AS MONTH3_GOT_SERVICE,
ck3.MONTH_RIGHT_SERVICE AS MONTH3_RIGHT_SERVICE, ck3.MONTH_RECOMMEND_PROGRAM AS MONTH3_RECOMMEND_PROGRAM, ck3.MONTH_RESUBMIT_MILESTONE AS MONTH3_RESUBMIT_MILESTONE,
ck3.MONTH_MILESTONE_ACHIEVED AS MONTH3_MILESTONE_ACHIEVED, ck3.MONTH_APPROVED_DATE AS MONTH3_APPROVED_DATE,
ck6.REASONCODE AS REASONCODE6,
CASE WHEN ck6.REASONCODE = 'OT6' THEN "Outcome at 6 month" END MONTH6_REASON,
ck6.MONTH_START_DATE AS MONTH6_START_DATE, ck6.MONTH_END_DATE AS MONTH6_END_DATE, ck6.MONTH_OUTCOME_EMP_SITUATION AS MONTH6_OUTCOME_EMP_SITUATION,
ck6.MONTH_EMPLOYMENT_TYPE AS MONTH6_EMPLOYMENT_TYPE, ck6.MONTH_NUM_JOBS AS MONTH6_NUM_JOBS, ck6.MONTH_NAICS_DESC AS MONTH6_NAICS_DESC, ck6.MONTH_JOB_NATURE AS MONTH6_JOB_NATURE,
ck6.MONTH_WORK_HOURS AS MONTH6_WORK_HOURS, ck6.MONTH_WAGE AS MONTH6_WAGE, ck6.MONTH_STUDENT_STATUS AS MONTH6_STUDENT_STATUS, ck6.MONTH_GOT_SERVICE AS MONTH6_GOT_SERVICE,
ck6.MONTH_RIGHT_SERVICE AS MONTH6_RIGHT_SERVICE, ck6.MONTH_RECOMMEND_PROGRAM AS MONTH6_RECOMMEND_PROGRAM, ck6.MONTH_RESUBMIT_MILESTONE AS MONTH6_RESUBMIT_MILESTONE,
ck6.MONTH_MILESTONE_ACHIEVED AS MONTH6_MILESTONE_ACHIEVED, ck6.MONTH_APPROVED_DATE AS MONTH6_APPROVED_DATE
FROM PROGRAM as pg
LEFT JOIN CASEINFO as ch ON pg.CASEID = ch.CASEID
LEFT JOIN OUTCOME as oc ON pg.CASEID = oc.CASEID
LEFT JOIN ( SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT1')ck1 ON pg.CASEID = ck1.CASEID
LEFT JOIN ( SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT3')ck3 ON pg.CASEID = ck3.CASEID
LEFT JOIN ( SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT6')ck6 ON pg.CASEID = ck6.CASEID
If someone could please help me turn this wide table into a long table, it would be much appreciated.
thank you
You need to unpivot the outcome and reason columns. But first you need an extra column for the overall reason. This is the query:
with a as (
select 12345 as case_reference,
'Employed' as OUTCOME_EMP_SITUATION,
'Employed' as MONTH1_EMP_SITUATION,
'Outcome at 1 month' as MONTH1_REASON,
'Employed' as MONTH3_EMP_SITUATION,
'Outcome at 3 month' as MONTH3_REASON,
'Employed' as MONTH6_EMP_SITUATION,
'Outcome at 6 month' as MONTH6_REASON
from dual
)
select
case_reference,
outcome_emp_situation,
month_reason
from (
select a.*,
cast(null as varchar2(1000)) as reason
from a
) a
unpivot(
(Outcome_emp_situation, Month_Reason)
for mon in (
(OUTCOME_EMP_SITUATION, reason) as 0,
(MONTH1_EMP_SITUATION, MONTH1_REASON) as 1,
(MONTH3_EMP_SITUATION, MONTH3_REASON) as 3,
(MONTH6_EMP_SITUATION, MONTH6_REASON) as 6
)
)
order by mon asc
CASE_REFERENCE | OUTCOME_EMP_SITUATION | MONTH_REASON
-------------: | :-------------------- | :-----------------
12345 | Employed | null
12345 | Employed | Outcome at 1 month
12345 | Employed | Outcome at 3 month
12345 | Employed | Outcome at 6 month
UPD: The explanation is below.
The tuple right after the UNPIVOT keyword lists the result column names; the column after the FOR keyword identifies which column group produced the values. The tuples inside IN define the column groups: for each group, the values of its columns are passed to the corresponding (by position) columns of the result tuple, and a new row is generated with the FOR column set to the value given after the AS keyword.
So if you need more columns to be transferred to each row, you need to add new columns to the result tuple (after UNPIVOT) and to each column group inside IN. If for some reason you do not have enough columns to pass for some groups, you can wrap your source query in an outer select and add dummy (or constant-valued) columns for those groups.
Note:
The datatypes of the tuples should match (or be convertible according to the default datatype precedence). That is, the members at the same position in each tuple should have the same type; members at different positions may have different types.
You can reuse the same column in multiple groups and positions.
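For databases without UNPIVOT, the same column groups can be written out by hand as one SELECT per group combined with UNION ALL. The sketch below shows that substitute technique (not UNPIVOT itself) on an in-memory SQLite database from Python; the table name wide is hypothetical, and the NULL in the first branch plays the role of the dummy reason column described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wide (
    case_reference INTEGER,
    outcome_emp_situation TEXT,
    month1_emp_situation TEXT, month1_reason TEXT,
    month3_emp_situation TEXT, month3_reason TEXT,
    month6_emp_situation TEXT, month6_reason TEXT
);
INSERT INTO wide VALUES
  (12345, 'Employed',
   'Employed', 'Outcome at 1 month',
   'Employed', 'Outcome at 3 month',
   'Employed', 'Outcome at 6 month');
""")

# One branch per column group; mon is the hand-written counterpart of the FOR column.
query = """
SELECT case_reference, outcome_emp_situation, month_reason
FROM (
    SELECT case_reference, 0 AS mon, outcome_emp_situation, NULL AS month_reason FROM wide
    UNION ALL
    SELECT case_reference, 1, month1_emp_situation, month1_reason FROM wide
    UNION ALL
    SELECT case_reference, 3, month3_emp_situation, month3_reason FROM wide
    UNION ALL
    SELECT case_reference, 6, month6_emp_situation, month6_reason FROM wide
)
ORDER BY case_reference, mon
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Transferring more columns per row means adding them to every branch in the same position, which mirrors adding them to the result tuple and to each group inside IN.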

how to select one tuple in rows based on variable field value

I'm quite new to SQL and I'd like to write a SELECT statement that retrieves only the first row of each set, based on a column value. I'll try to make it clearer with a table example.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetches the first line for each of chip_id=1,2,3
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
I'd probably:
set a variable to 0
order your table by chip_id
read the table row by row
if the row's chip_id is greater than the variable, store the row in a result array and increment the variable
loop until done
return your result array
Though depending on your DB, query and versions, you'll probably get unpredictable/unreliable results.
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
             row_number() over (partition by chip_id order by rand()) as seqnum
      from your_table
     ) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
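As a small runnable sketch of that last point, here is the same row_number() pattern with a real column in the ORDER BY, so the chosen row is deterministic. It uses an in-memory SQLite database from Python; the table name samples is hypothetical, and ordering by sample_id is just one possible definition of "first".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (chip_id INTEGER, sample_id INTEGER);
INSERT INTO samples VALUES
  (1, 45), (1, 55), (1, 5986),
  (2, 453), (2, 12),
  (3, 4567), (3, 9);
""")

# "First" is defined here as the smallest sample_id within each chip_id.
query = """
SELECT chip_id, sample_id
FROM (SELECT chip_id, sample_id,
             ROW_NUMBER() OVER (PARTITION BY chip_id ORDER BY sample_id) AS seqnum
      FROM samples) t
WHERE seqnum = 1
ORDER BY chip_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```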
Provided I understood your output, if you are using PostgreSQL 9, you can use this:
SELECT chip_id ,
string_agg(sample_id::text, ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, generally you want the max, the min, or some other values to represent your group. You can do sums, count, all kind of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM your_table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.

SQL update statement from a history table based on timestamp

I'm trying to write an update statement in Oracle that will find an attribute from a history table based on a timestamp. So, for example the data looks like:
TABLE A
A_ID TIMESTAMP ATTR
---------------------------------
1 5/27/2012 10:30:00 AM ?
TABLE B
B_ID A_ID TIMESTAMP ATTR
---------------------------------------
1 1 5/26/2012 9:01:08 AM W
2 1 5/27/2012 8:38:21 AM X
3 1 5/28/2012 9:01:01 AM Y
4 1 5/29/2012 11:37:54 PM Z
The lower bound is >= B.TIMESTAMP, but I'm not sure how to write the upper bound as < B."the next TIMESTAMP". So, in the example above the attribute on table A should update to "X".
This seems like a fairly common use case. I've seen this post, but it looks like a satisfactory answer was never reached, so I thought I'd post again.
UPDATE A SET attr = (
SELECT b1.attr
FROM B b1
INNER JOIN (
SELECT MAX(b3.timestamp) mx FROM B b3
WHERE b3.timestamp < A.timestamp
) b2 ON b1.timestamp = b2.mx
)
I can't remember if Oracle will allow me to use table A within the inner join sub query... Would you mind trying it?
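One way to sidestep that doubt is to keep the reference to A at a single nesting level, directly inside the scalar subquery, and to match on A_ID as well. The sketch below shows the idea on an in-memory SQLite database from Python, so it is not Oracle syntax: the column is named ts, timestamps are ISO strings, and ORDER BY ... LIMIT 1 picks the latest history row. In Oracle the same "latest row at or before" lookup would need a different form, for example MAX(attr) KEEP (DENSE_RANK LAST ORDER BY ...), which has not been run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (a_id INTEGER, ts TEXT, attr TEXT);
CREATE TABLE b (b_id INTEGER, a_id INTEGER, ts TEXT, attr TEXT);
INSERT INTO a VALUES (1, '2012-05-27 10:30:00', NULL);
INSERT INTO b VALUES
  (1, 1, '2012-05-26 09:01:08', 'W'),
  (2, 1, '2012-05-27 08:38:21', 'X'),
  (3, 1, '2012-05-28 09:01:01', 'Y'),
  (4, 1, '2012-05-29 23:37:54', 'Z');
""")

# For each row of a, take the attr of the latest b row that is not later than a.ts.
conn.execute("""
UPDATE a
SET attr = (
    SELECT b.attr
    FROM b
    WHERE b.a_id = a.a_id
      AND b.ts <= a.ts
    ORDER BY b.ts DESC
    LIMIT 1
)
""")
attr = conn.execute("SELECT attr FROM a WHERE a_id = 1").fetchone()[0]
print(attr)
```

Taking the latest history row at or before A's timestamp makes the upper bound implicit, so there is no need to look up "the next TIMESTAMP".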