Merge multiple rows into single row with json column - sql

I want to merge multiple rows into a single row with a JSON column in SQL Server.
Source table:
id  studyid  eye   measurements   subjectref  type  source  dateOn
1   1s       left  {"a":1,"b":2}  1reference  k     abc     10/12/2022
2   1s       left  {"c":"1a"}     1reference  o     abc     10/12/2022
3   1s       left  {"d":2}        1reference  or    abc     10/12/2022
4   2s       left  {"a":1,"b":2}  1reference  k     abc     01/11/2022
Desired output:
studyId  eye   measurements                  subjectref  source  dateOn
1s       left  {"a":1,"b":2,"c":"1a","d":2}  1reference  abc     10/12/2022
2s       left  {"a":1,"b":2}                 1reference  abc     01/11/2022
Can you please help with this?

If (as discussed in your deleted previous question) the JSON is as simple as shown and there is no possibility of needing to merge different rows having the same key, you can just do this with string concatenation. DB Fiddle
The TRIM removes the opening and closing braces from each measurements document. The documents in each group get concatenated together with a comma, and then opening and closing braces are appended onto the result.
SELECT studyId,
eye,
measurements = '{' + STRING_AGG(TRIM('{}' FROM measurements), ',') + '}',
subjectRef,
source,
dateOn
FROM YourTable
GROUP BY studyId,
eye,
subjectRef,
source,
dateOn
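For reference, a minimal repro of the sample data is below. The table name YourTable and the ISO date literals are assumptions taken from the query above; note that STRING_AGG and the TRIM(characters FROM ...) syntax both require SQL Server 2017 or later.
CREATE TABLE YourTable (
    id           int,
    studyId      varchar(10),
    eye          varchar(10),
    measurements nvarchar(max),
    subjectRef   varchar(20),
    type         varchar(10),
    source       varchar(10),
    dateOn       date
);

-- Sample rows from the question (dates assumed to be DD/MM/YYYY)
INSERT INTO YourTable VALUES
    (1, '1s', 'left', N'{"a":1,"b":2}', '1reference', 'k',  'abc', '2022-12-10'),
    (2, '1s', 'left', N'{"c":"1a"}',    '1reference', 'o',  'abc', '2022-12-10'),
    (3, '1s', 'left', N'{"d":2}',       '1reference', 'or', 'abc', '2022-12-10'),
    (4, '2s', 'left', N'{"a":1,"b":2}', '1reference', 'k',  'abc', '2022-11-01');
Running the grouped query above against this table returns the two merged rows from the desired output (the order of keys inside the merged JSON document is not guaranteed).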

how to turn a wide table into a long table

I have a wide table that looks like this:
Case REFERENCE | OUTCOME_EMP_SITUATION | MONTH1_EMP_SITUATION | MONTH1_REASON | MONTH3_EMP_SITUATION | MONTH3_REASON | MONTH6_EMP_SITUATION | MONTH6_REASON
12345 | Employed | Employed | Outcome at 1 month | Employed | Outcome at 3 month | Employed | Outcome at 6 month
These are survey results that people completed after they finished an employment program. They complete the survey 4 times: once immediately after finishing the program, and then after 1/3/6 months. The problem is, the results for immediately after program completion are in one table (Outcome table) and the 1/3/6 month checkpoint results are in another table (Checkpointinfo table). I would like to combine those tables to create a long table so that instead of having "Outcome" in 5 different columns, I would have it in one column, and it would look like this:
Case Reference | Outcome_emp_situation | Month_Reason
12345 | Employed | NULL
12345 | Employed | Outcome at 1 month
12345 | Employed | Outcome at 3 month
12345 | Employed | Outcome at 6 month
I was wondering if anyone could please help me out to turn this wide query into a long table query.
Here is the query for the wide table:
Select
ch.CASEREFERENCE, oc.OUTCOME_DATE, oc.OUTCOME_REFERENCE_ID, oc.OUTCOME_EMP_SITUATION, oc.OUTCOME_EMPLOYMENT_TYPE, oc.OUTCOME_NUM_JOBS, oc.OUTCOME_NAICS_DESC, oc.OUTCOME_JOB_NATURE,
oc.OUTCOME_WORK_HOURS, oc.OUTCOME_WAGE, oc.OUTCOME_STUDENT_STATUS, oc.OUTCOME_GOT_SERVICE, oc.OUTCOME_RIGHT_SERVICE, oc.OUTCOME_RECOMMEND_PROGRAM,
ck1.REASONCODE AS REASONCODE1,
CASE WHEN ck1.REASONCODE = 'OT1' THEN 'Outcome at 1 month' END MONTH1_REASON,
ck1.MONTH_START_DATE AS MONTH1_START_DATE, ck1.MONTH_END_DATE AS MONTH1_END_DATE, ck1.MONTH_OUTCOME_EMP_SITUATION AS MONTH1_OUTCOME_EMP_SITUATION,
ck1.MONTH_EMPLOYMENT_TYPE AS MONTH1_EMPLOYMENT_TYPE, ck1.MONTH_NUM_JOBS AS MONTH1_NUM_JOBS, ck1.MONTH_NAICS_DESC AS MONTH1_NAICS_DESC, ck1.MONTH_JOB_NATURE AS MONTH1_JOB_NATURE,
ck1.MONTH_WORK_HOURS AS MONTH1_WORK_HOURS, ck1.MONTH_WAGE AS MONTH1_WAGE, ck1.MONTH_STUDENT_STATUS AS MONTH1_STUDENT_STATUS, ck1.MONTH_GOT_SERVICE AS MONTH1_GOT_SERVICE,
ck1.MONTH_RIGHT_SERVICE AS MONTH1_RIGHT_SERVICE, ck1.MONTH_RECOMMEND_PROGRAM AS MONTH1_RECOMMEND_PROGRAM, ck1.MONTH_RESUBMIT_MILESTONE AS MONTH1_RESUBMIT_MILESTONE,
ck1.MONTH_MILESTONE_ACHIEVED AS MONTH1_MILESTONE_ACHIEVED, ck1.MONTH_APPROVED_DATE AS MONTH1_APPROVED_DATE,
ck3.REASONCODE AS REASONCODE3,
CASE WHEN ck3.REASONCODE = 'OT3' THEN 'Outcome at 3 month' END MONTH3_REASON,
ck3.MONTH_START_DATE AS MONTH3_START_DATE, ck3.MONTH_END_DATE AS MONTH3_END_DATE, ck3.MONTH_OUTCOME_EMP_SITUATION AS MONTH3_OUTCOME_EMP_SITUATION,
ck3.MONTH_EMPLOYMENT_TYPE AS MONTH3_EMPLOYMENT_TYPE, ck3.MONTH_NUM_JOBS AS MONTH3_NUM_JOBS, ck3.MONTH_NAICS_DESC AS MONTH3_NAICS_DESC, ck3.MONTH_JOB_NATURE AS MONTH3_JOB_NATURE,
ck3.MONTH_WORK_HOURS AS MONTH3_WORK_HOURS, ck3.MONTH_WAGE AS MONTH3_WAGE, ck3.MONTH_STUDENT_STATUS AS MONTH3_STUDENT_STATUS, ck3.MONTH_GOT_SERVICE AS MONTH3_GOT_SERVICE,
ck3.MONTH_RIGHT_SERVICE AS MONTH3_RIGHT_SERVICE, ck3.MONTH_RECOMMEND_PROGRAM AS MONTH3_RECOMMEND_PROGRAM, ck3.MONTH_RESUBMIT_MILESTONE AS MONTH3_RESUBMIT_MILESTONE,
ck3.MONTH_MILESTONE_ACHIEVED AS MONTH3_MILESTONE_ACHIEVED, ck3.MONTH_APPROVED_DATE AS MONTH3_APPROVED_DATE,
ck6.REASONCODE AS REASONCODE6,
CASE WHEN ck6.REASONCODE = 'OT6' THEN 'Outcome at 6 month' END MONTH6_REASON,
ck6.MONTH_START_DATE AS MONTH6_START_DATE, ck6.MONTH_END_DATE AS MONTH6_END_DATE, ck6.MONTH_OUTCOME_EMP_SITUATION AS MONTH6_OUTCOME_EMP_SITUATION,
ck6.MONTH_EMPLOYMENT_TYPE AS MONTH6_EMPLOYMENT_TYPE, ck6.MONTH_NUM_JOBS AS MONTH6_NUM_JOBS, ck6.MONTH_NAICS_DESC AS MONTH6_NAICS_DESC, ck6.MONTH_JOB_NATURE AS MONTH6_JOB_NATURE,
ck6.MONTH_WORK_HOURS AS MONTH6_WORK_HOURS, ck6.MONTH_WAGE AS MONTH6_WAGE, ck6.MONTH_STUDENT_STATUS AS MONTH6_STUDENT_STATUS, ck6.MONTH_GOT_SERVICE AS MONTH6_GOT_SERVICE,
ck6.MONTH_RIGHT_SERVICE AS MONTH6_RIGHT_SERVICE, ck6.MONTH_RECOMMEND_PROGRAM AS MONTH6_RECOMMEND_PROGRAM, ck6.MONTH_RESUBMIT_MILESTONE AS MONTH6_RESUBMIT_MILESTONE,
ck6.MONTH_MILESTONE_ACHIEVED AS MONTH6_MILESTONE_ACHIEVED, ck6.MONTH_APPROVED_DATE AS MONTH6_APPROVED_DATE
FROM PROGRAM as pg
LEFT JOIN CASEINFO as ch ON pg.CASEID = ch.CASEID
LEFT JOIN OUTCOME as oc ON pg.CASEID = oc.CASEID
LEFT JOIN (SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT1') ck1 ON pg.CASEID = ck1.CASEID
LEFT JOIN (SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT3') ck3 ON pg.CASEID = ck3.CASEID
LEFT JOIN (SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT6') ck6 ON pg.CASEID = ck6.CASEID
If someone could please help me turn this wide table into a long table, it would be much appreciated.
thank you
You need to unpivot the outcome and reason columns. But first you need an extra column for the overall reason. This is the query:
with a as (
select 12345 as case_reference,
'Employed' as OUTCOME_EMP_SITUATION,
'Employed' as MONTH1_EMP_SITUATION,
'Outcome at 1 month' as MONTH1_REASON,
'Employed' as MONTH3_EMP_SITUATION,
'Outcome at 3 month' as MONTH3_REASON,
'Employed' as MONTH6_EMP_SITUATION,
'Outcome at 6 month' as MONTH6_REASON
from dual
)
select
case_reference,
outcome_emp_situation,
month_reason
from (
select a.*,
cast(null as varchar2(1000)) as reason
from a
) a
unpivot(
(Outcome_emp_situation, Month_Reason)
for mon in (
(OUTCOME_EMP_SITUATION, reason) as 0,
(MONTH1_EMP_SITUATION, MONTH1_REASON) as 1,
(MONTH3_EMP_SITUATION, MONTH3_REASON) as 3,
(MONTH6_EMP_SITUATION, MONTH6_REASON) as 6
)
)
order by mon asc
CASE_REFERENCE | OUTCOME_EMP_SITUATION | MONTH_REASON
-------------: | :-------------------- | :-----------------
12345 | Employed | null
12345 | Employed | Outcome at 1 month
12345 | Employed | Outcome at 3 month
12345 | Employed | Outcome at 6 month
db<>fiddle here
UPD: The explanation below.
The tuple just after the unpivot keyword names the result columns; the column after the for keyword identifies which column group produced the values. The tuples inside in define the column groups: for each group, that group's column values are passed to the corresponding (by position) columns of the result tuple, and a new row is generated with the for column set to the value given after the as keyword.
So if you need more columns to be transferred to each row, you need to add new columns to the result tuple (after unpivot) and to each column group inside in. If for some reason you do not have enough columns for some groups, you can wrap your source query in an outer select and add dummy (or constant-valued) columns for those groups; see the sketch after the notes below.
Note:
Datatypes should match across the tuples (or be convertible according to default datatype precedence), i.e. each tuple's member at the same position should have the same type, while members at different positions may have different types.
You can reuse the same column in multiple groups and positions.
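For example, to also carry the checkpoint start dates through to each row, extend both the result tuple and every group in lockstep. This is only a sketch: it assumes the source query exposes the MONTH1/3/6_START_DATE columns from the question, and dummy_date is a made-up padding column for the group that has no date of its own.
select
    case_reference,
    outcome_emp_situation,
    month_reason,
    month_start_date
from (
    select a.*,
           cast(null as varchar2(1000)) as reason,
           cast(null as date) as dummy_date
    from a
) a
unpivot(
    (outcome_emp_situation, month_reason, month_start_date)
    for mon in (
        (OUTCOME_EMP_SITUATION, reason,        dummy_date)        as 0,
        (MONTH1_EMP_SITUATION,  MONTH1_REASON, MONTH1_START_DATE) as 1,
        (MONTH3_EMP_SITUATION,  MONTH3_REASON, MONTH3_START_DATE) as 3,
        (MONTH6_EMP_SITUATION,  MONTH6_REASON, MONTH6_START_DATE) as 6
    )
)
order by mon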

Concatenating codes to obtain sum

For the past 2 days I've been trying to solve this problem, but I can't even seem to find the right terms to google it.
I have 3 tables.
This one, with client codes that changed:
ActualCode=11111111 PreviousCode=44444444
And these two tables with value 1 and value 2:
ActualCode=11111111, Value1= 50,00, Value2= 0,00
PreviousCode=44444444, Value1= 0,00, Value2= 50,00
I need to sum the values for each relation of Previous and Actual codes from the first table.
I.E.
For
ActualCode=11111111, PreviousCode=44444444
I need to be able to get:
Code=11111111 Value1=50,00 Value2=50,00
Looking forward to your answer :D
Thanks,
P
You can join the tables and sum the values:
select c.actualcode,
sum(ac.value1) + sum(pc.value1) as value1,
sum(ac.value2) + sum(pc.value2) as value2
from codes c
join actualcodes ac on c.actualcode = ac.actualcode
join previouscodes pc on c.previouscode = pc.previouscode
group by c.actualcode;
Rextester Demo
If you could have values in the main table that don't have corresponding rows in the values tables, then you should use outer joins instead.
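A sketch of that variant, wrapping the sums in coalesce so a missing side counts as zero:
select c.actualcode,
       coalesce(sum(ac.value1), 0) + coalesce(sum(pc.value1), 0) as value1,
       coalesce(sum(ac.value2), 0) + coalesce(sum(pc.value2), 0) as value2
from codes c
left join actualcodes ac on c.actualcode = ac.actualcode
left join previouscodes pc on c.previouscode = pc.previouscode
group by c.actualcode;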

DB2 concatenate unique column values into one row, comma separated

Two tables:
Parts Table:
Part_Number Load_Date TQTY
m-123 19940102 32
1234Cf 20010809 3
wf9-2 20160421 14
Locations Table:
PartNo Condition Location QTY
m-123 U A02 2
1234Cf S A02 3
m-123 U B01 1
wf9-2 S A06 7
m-123 S A18 29
wf9-2 U F16 7
Result:
Part_Number Load_Date TQTY U_LOC UQTY S_LOC SQTY
m-123 19940102 32 A02,B01 3 A18 29
1234Cf 20010809 3 A02 3
wf9-2 20160421 14 F16 7 A06 7
I am having trouble finding a solution to this with my current DB2 version. I am not completely sure how to find the version, but it is running on an AS400 system, and it seems the version of DB2 is tied to the OS version, which on this box is: Operating system: i5/OS, Version: V5R4M0.
(I tried some commands to get the DB2 version using these suggestions Here, but none of them worked, as most of the commenters there also reported.)
Regarding concatenating multiple rows of column data into one row, I have come across many articles recommending XMLAGG or XMLSERIALIZE, Here and Here, but I get an error stating the function is not recognized.
Not sure where to go from here, as there seem to be solutions, but I can't get those already suggested functions to work.
EDIT:
Using the accepted answer and explanation, as well as the example HERE, I got a basic idea of recursion from a simple example; and it was HERE, with the "SELECT rownumber() over(partition by category)" statements, that really helped pull it all together. Once I understood that statement, of course.
I also learned to make sure the data used in the recursion is narrowed down as much as possible and only joined up with the extra data later. This makes for exponentially faster results. That seems pretty obvious, but while trying to figure all of this out it wasn't, and my query was pretty slow; once I understood what was actually happening, it was easier to make adjustments for really fast results.
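For anyone else trying this, here is a minimal sketch of that row_number-driven recursion against the locations table from the accepted answer below (an illustration of the idea, not the exact query I ended up with): numbering the rows first means each recursive step joins exactly one next location (rn = rn + 1) instead of every remaining location, which is what keeps it fast.
with numbered as (
    -- number each location within its part/condition group
    select part_number, condition, location, qty,
           row_number() over (partition by part_number, condition
                              order by location) as rn
    from locations
),
rec (part_number, condition, csv, qty, rn) as (
    -- prime with the first location of each group
    select part_number, condition,
           cast(location as varchar(128)), qty, rn
    from numbered
    where rn = 1
    union all
    -- append exactly the next numbered location
    select r.part_number, r.condition,
           r.csv || ',' || n.location, r.qty + n.qty, n.rn
    from rec r
    join numbered n
      on n.part_number = r.part_number
     and n.condition = r.condition
     and n.rn = r.rn + 1
)
-- keep only the row holding the complete list per group
select part_number, condition, csv, qty
from rec r
where rn = (select max(rn)
            from numbered n
            where n.part_number = r.part_number
              and n.condition = r.condition);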
This is rather complicated, so I will show all my work:
Table definitions
create table parts
(part_number Varchar(64),
load_date Date,
total_qty Dec(5,0));
create table locations
(part_number Varchar(64),
condition Char(1),
location Char(3),
qty Dec(5,0));
insert into parts
values ('m-123', '1994-01-02', 32),
('1234Cf', '2001-08-09', 3),
('wf9-2', '2016-04-21', 14);
insert into locations
values ('m-123', 'U', 'A02', 2),
('1234Cf', 'S', 'A02', 3),
('m-123', 'U', 'B01', 1),
('wf9-2', 'S', 'A06', 7),
('m-123', 'S', 'A18', 29),
('wf9-2', 'U', 'F16', 7);
The query:
with -- CTEs
-- This collects locations into a comma separated list
tmp (part_number, condition, location, csv, level) as (
select part_number, condition, min(location),
cast(min(location) as varchar(128)), 1
from locations
group by part_number, condition
union all
select a.part_number, a.condition, b.location,
a.csv || ',' || b.location, a.level + 1
from tmp a
join locations b using (part_number, condition)
where a.csv not like '%' || b.location || '%'
and b.location > a.location),
-- This chooses the correct csv list, and adds quantity for the condition
tmp2 (part_number, condition, csv, qty) as (
select t.part_number, t.condition, t.csv,
(select sum(qty) qty
from locations
where part_number = t.part_number
and condition = t.condition)
from tmp t
where level = (select max(level)
from tmp
where part_number = t.part_number
and condition = t.condition))
-- This is the final select that combines the parts file with
-- the second stage CTE and arranges things horizontally by condition
select p.part_number, p.load_date,
(select sum(qty)
from locations
where part_number = p.part_number) as total_qty,
coalesce(u.csv, '') as u_loc,
coalesce(u.qty, 0) as uqty,
coalesce(s.csv, '') as s_loc,
coalesce(s.qty, 0) as sqty
from parts p
left outer join tmp2 u
on u.part_number = p.part_number and u.condition = 'U'
left outer join tmp2 s
on s.part_number = p.part_number and s.condition = 'S'
order by p.load_date;
EDIT I have had to add some extra bits in here to support more than two locations for a part/condition, and I have made the column naming in the CTEs more consistent. Ok, so let me explain this a bit. There are three parts to this query, two CTEs and the final select; you can see the three parts separated by comments. The first CTE is a recursive CTE. Its purpose is to produce the comma separated location list. You should be able to run the select by itself to see just what it does. tmp is the table name; part_number, condition, location, csv, and level are the column names. A recursive CTE needs a SELECT to prime the CTE and a UNION ALL with a SELECT that fills in the next details. In this case the priming SELECT retrieves a part number, a condition, and the first location (alphabetically) for that combination. level is set to 1. If you run just the priming select, you will get:
part_number condition location csv level
----------- --------- -------- --- -----
1234Cf      S         A02      A02 1
m-123       S         A18      A18 1
m-123       U         A02      A02 1
wf9-2       U         F16      F16 1
wf9-2       S         A06      A06 1
Note one line per part/condition. The remainder of the recursive CTE will fill in the remaining locations in csv, but it will actually add additional records, so we need to filter the results here and later. Records are processed as they are added. The first rows listed above are joined with the location file on part_number and condition. Note in the priming select I have a cast of the second min(location) to a varchar(128). This leaves room for the csv column to expand. Without this, it will still expand, but not enough to hold more than 2 locations.
The second select in the recursive CTE concatenates a comma and the next location onto the end of csv. The specific bit that does this is a.csv || ',' || b.location. It also increments the level column, which helps us keep track of where we are in the query; eventually, the row with the highest level is the one we want to use. We also need a way to end the recursion, and some filters to reduce the number of rows added to the temporary result set.
If we have 2 locations, A02 and B02, left unchecked we will get the following rows: A02, A02,A02, A02,B02, A02,A02,A02, A02,B02,A02, A02,A02,B02, A02,B02,B02, ... ad infinitum. The anti-duplication filter where a.csv not like '%' || b.location || '%' is sufficient for two locations to end the recursion and minimize rows: for locations A02 and B02 we will get only the rows A02 and A02,B02. Note that none of the other results from the unchecked example with duplicate locations are returned.
Adding a third location C02 will yield, with the anti-duplication filter, the following rows: A02, A02,B02, A02,C02, A02,B02,C02, A02,C02,B02. No duplicates here, but we do have redundant rows, and as you add locations, it gets worse. This is where we need a way to detect redundant rows. Since we are starting with the lowest location, we can always make sure that locations added to csv are greater than the previously added location. To do that, all we need is a column in the result set that indicates which location was added last (we could interrogate csv, but that is harder). This is why we need the location column in tmp; it lets us write the filter b.location > a.location. In the above 3-location example, this filter prevents row A02,C02,B02, leaving just a single row with all three locations. Adding more than three locations to the locations file will cause the number of rows in tmp to expand even more, but for each part and condition there will be only one row with all locations, and it will contain all locations in ascending order.
The second CTE does two things. First, it filters TMP to drop all but the rows containing all locations for a given part/condition. Second, it accumulates the total quantity for each part/condition.
The bit that performs the filtering is in the where clause:
where level = (select max(level)
from tmp
where part_number = t.part_number
and condition = t.condition))
Pretty straight forward. The bit that accumulates the total quantity for a part/condition is also an easy to understand sub-query:
(select sum(qty) qty
from locations
where part_number = t.part_number
and condition = t.condition)
The final piece of this monster query is the main select. It joins the parts file with the results of the second CTE to form the ultimate result set:
select p.part_number, p.load_date,
(select sum(qty) from locations where part_number = p.part_number) as total_qty,
coalesce(u.csv, '') as u_loc, coalesce(u.qty, 0) as uqty,
coalesce(s.csv, '') as s_loc, coalesce(s.qty, 0) as sqty
from parts p
left outer join tmp2 u
on u.part_number = p.part_number and u.condition = 'U'
left outer join tmp2 s
on s.part_number = p.part_number and s.condition = 'S'
order by p.load_date
Bits of note are the subquery to retrieve the total quantity from the locations table. You could use the tqty field in parts, but that can get out of sync with the actual quantities in the locations table. In addition there are two left outer joins with tmp2, one for condition U, and another for condition S. These construct the horizontal array of Location/Quantity in the result row. The last thing is the coalesce functions. These give null values (when a result from an outer join is missing) a default value.
End of EDIT
The final result is:
part_number load_date tqty u_loc uqty s_loc sqty
----------- ---------- ---- ------- ---- ----- ----
m-123 1994-01-02 32 A02,B01 3 A18 29
1234Cf 2001-08-09 3 0 A02 3
wf9-2 2016-04-21 14 F16 7 A06 7
Note: XMLAGG and XMLSERIALIZE became available in DB2 for i 7.1, and LISTAGG became available in DB2 for i 7.2. The most recent version as of 8/9/2017 is 7.3. As you are on V5R4, it is likely you will need not only a software upgrade but also a hardware upgrade to get current.
No idea what the rules are for UQTY, S_LOC, and SQTY, but here is the column you asked about:
SELECT
P.Part_Number,
P.Load_Date,
P.TQTY,
LISTAGG(L.Location, ', ') WITHIN GROUP (ORDER BY L.Location) AS U_LOC
FROM "Parts Table" AS P
LEFT JOIN "Locations Table" AS L ON P.Part_Number = L.Part_Number
GROUP BY P.Part_Number, P.Load_Date, P.TQTY
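If the U/S split is driven purely by the Condition column (an assumption; as noted above, the rules for those columns are a guess), the remaining columns could be filled in with conditional aggregation. LISTAGG skips NULLs, so each CASE feeds only that condition's locations:
SELECT
    P.Part_Number,
    P.Load_Date,
    P.TQTY,
    LISTAGG(CASE WHEN L.Condition = 'U' THEN L.Location END, ',')
        WITHIN GROUP (ORDER BY L.Location) AS U_LOC,
    SUM(CASE WHEN L.Condition = 'U' THEN L.QTY ELSE 0 END) AS UQTY,
    LISTAGG(CASE WHEN L.Condition = 'S' THEN L.Location END, ',')
        WITHIN GROUP (ORDER BY L.Location) AS S_LOC,
    SUM(CASE WHEN L.Condition = 'S' THEN L.QTY ELSE 0 END) AS SQTY
FROM "Parts Table" AS P
LEFT JOIN "Locations Table" AS L ON P.Part_Number = L.Part_Number
GROUP BY P.Part_Number, P.Load_Date, P.TQTY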

Postgres complicated SQL query

I'm trying to create a query through Postgres. I have a table like this:
Date       | Meter  | Period | Value
2017-05-23 | meter1 | 1      | 2.01
2017-05-23 | meter1 | 3      | 4.01
2017-05-23 | meter1 | 4      | 0.00
The table represents meter readings. There should be 48 meter readings per day. However, sometimes a reading does not come through, so there is no record for that period. Also, a meter reading can be 0.00.
What I need to do is generate a series that gives me something like this if I search for meter readings for any given meter on a particular date:
Period----Total
1 --- 2.01
2 --- null
3 --- 4.01
4 --- 0.00
.
.
.
It needs to give me 48 readings even if there is no record. I'm trying to populate a google graph from a JSON response and the graph accepts null values but not 'no' values.
I appreciate any help.
You can use generate_series to generate the numbers list and left join your table on to that and show missing periods as null.
select g.val, t.value
from generate_series(1,48) g(val)
left join tbl t on t.period = g.val and t.date = '2017-05-23' --change this value as required
If you need it for all the dates and meters in the table, cross join generated periods with dates and meters and left join the table on to that.
select dm.date, dm.meter, g.val, t.value
from generate_series(1,48) g(val)
cross join (select distinct date, meter from tbl) dm
left join tbl t on t.period = g.val and t.date = dm.date and t.meter = dm.meter
where dm.date = date '2017-05-23' --change this value as required
and dm.meter = 'meter1' --change this value as required

SQL - Select the longest substrings

I have data like this:
AB
ABC
ABCD
ABCDE
EF
EFG
IJ
IJK
IJKL
and I just want to get ABCDE, EFG, IJKL. How can I do that in Oracle SQL?
The strings are at least 2 characters long but don't have a fixed length; they can be anywhere from 2 to 100 characters.
In the event that you mean "longest string for each sequence of strings", the answer is a little different -- you are not guaranteed that all have a length of 4. Instead, you want to find the strings where adding a letter isn't another string.
select t.str
from table t
where not exists (select 1
from table t2
where substr(t2.str, 1, length(t.str)) = t.str and
length(t2.str) = length(t.str) + 1
);
Do note that performance of this query will not be great if you have even a moderate number of rows.
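The prefix test can also be phrased with LIKE, since '_' matches exactly one character; this is just a rewording of the same check, and it assumes the strings contain no LIKE wildcard characters (% or _):
select t.str
from table t
where not exists (select 1
                  from table t2
                  where t2.str like t.str || '_'
                 );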
Select all rows where the string is not a substring of any other row. It's not clear if this is what you want though.
select t.str
from table t
where not exists (
    select 1
    from table t2
    where t2.str <> t.str
      and instr(t2.str, t.str) > 0
);