Json aggregation of data with missing values

I have a simple table of supplements
create table consumption (
account bigint not null,
date date not null,
supplement text not null check (supplement in ('multiVitamin', 'calMag', 'omega3', 'potassium', 'salt', 'antiOxidant', 'enzymes')),
quantity integer not null default 0
);
And I want to fetch what people have consumed per day. This would be an example of my desired output:
[
{
"date" : "2016-01-01",
"multiVitamin" : 7,
"calMag" : 0,
"omega3" : 3,
"potassium" : 3,
"salt" : 2,
"antiOxidant" : 0,
"enzymes" : 1
},
{
"date" : "2016-01-02",
"multiVitamin" : 2,
"calMag" : 1,
"omega3" : 1,
"potassium" : 2,
"salt" : 2,
"antiOxidant" : 0,
"enzymes" : 1
}
]
I'm confused about how to get those values into a JSON object and coalesce them so that I return 0 if no supplements were entered for that day. Every day should return all supplements. This is what I have so far. It is very far from complete, but at least it fetches rows for the selected dates.
WITH duration_amount AS (
SELECT date_trunc('day', date)::date AS date_group, json_build_object('quantity', SUM(consumption.quantity) )::jsonb->'quantity' as supplement
FROM consumption
WHERE account = 1667
GROUP BY date_group
)
SELECT DISTINCT date_group, supplement
FROM (
SELECT generate_series(date_trunc('day', '2016-10-20'::date), '2016-10-28'::date, '1 day') AS date_group
) x
LEFT JOIN duration_amount
USING (date_group)
ORDER BY date_group DESC

Example data:
insert into consumption values
(1667, '2016-10-21', 'multiVitamin', 1),
(1667, '2016-10-21', 'calMag', 2),
(1667, '2016-10-22', 'multiVitamin', 3),
(1667, '2016-10-22', 'calMag', 4),
(1667, '2016-10-22', 'omega3', 5);
You should prepare a template table containing rows for all possible values. In the example it will contain 14 rows (a cross join of 2 days with 7 supplements). Next, left join your table to it, using coalesce() for missing values:
select
date_group::date as date,
supplement_group as supplement,
coalesce(quantity, 0) quantity
from generate_series('2016-10-21'::date, '2016-10-22', '1 day') as date_group
cross join (
values
('multiVitamin'), ('calMag'), ('omega3'),
('potassium'), ('salt'), ('antiOxidant'), ('enzymes')
) as supplements(supplement_group)
left join consumption
on date_group = date
and supplement = supplement_group
and account = 1667;
    date    |  supplement  | quantity
------------+--------------+----------
 2016-10-21 | multiVitamin |        1
 2016-10-21 | calMag       |        2
 2016-10-21 | omega3       |        0
 2016-10-21 | potassium    |        0
 2016-10-21 | salt         |        0
 2016-10-21 | antiOxidant  |        0
 2016-10-21 | enzymes      |        0
 2016-10-22 | multiVitamin |        3
 2016-10-22 | calMag       |        4
 2016-10-22 | omega3       |        5
 2016-10-22 | potassium    |        0
 2016-10-22 | salt         |        0
 2016-10-22 | antiOxidant  |        0
 2016-10-22 | enzymes      |        0
(14 rows)
The result can be easily aggregated to jsonb, see the full example here.
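The linked example is not reproduced on this page, so here is a minimal sketch of that last aggregation step. It is an assumption-laden stand-in, not the original answer's code: it runs against SQLite through Python's sqlite3 module instead of PostgreSQL, builds the day list with a recursive CTE because SQLite has no generate_series by default, and folds the rows into JSON in Python. The function name consumption_json is hypothetical. In PostgreSQL itself the folding could be done inside the query with jsonb_object_agg and jsonb_agg.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table consumption (
        account integer not null,
        date text not null,
        supplement text not null,
        quantity integer not null default 0
    )""")
conn.executemany(
    "insert into consumption values (?, ?, ?, ?)",
    [(1667, "2016-10-21", "multiVitamin", 1),
     (1667, "2016-10-21", "calMag", 2),
     (1667, "2016-10-22", "multiVitamin", 3),
     (1667, "2016-10-22", "calMag", 4),
     (1667, "2016-10-22", "omega3", 5)])

# Template rows: every day in the range crossed with every supplement,
# then left joined to the real data so missing pairs become 0.
QUERY = """
with recursive days(day) as (
    select ?
    union all
    select date(day, '+1 day') from days where day < ?
),
supplements(name) as (
    values ('multiVitamin'), ('calMag'), ('omega3'),
           ('potassium'), ('salt'), ('antiOxidant'), ('enzymes')
)
select d.day, s.name, coalesce(sum(c.quantity), 0)
from days d
cross join supplements s
left join consumption c
    on c.date = d.day and c.supplement = s.name and c.account = ?
group by d.day, s.name
order by d.day
"""

def consumption_json(account, start, end):
    """Hypothetical helper: one JSON object per day, keyed by supplement."""
    days = {}
    for day, name, quantity in conn.execute(QUERY, (start, end, account)):
        days.setdefault(day, {"date": day})[name] = quantity
    return json.dumps(list(days.values()))

report = consumption_json(1667, "2016-10-21", "2016-10-22")
print(report)
```

Each day comes back as one object with all seven supplement keys, whether or not anything was entered for that day.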

Related

SQL query: Fill missing values with 0

I have a table with gaps in data at certain times (see there is no data between 37 & 46). I need to fill in those gaps with 0 for better display on the frontend.
date | mydata
--------+-----------------
911130 | 10
911131 | 11
911132 | 9
911133 | 6
911134 | 5
911135 | 5
911136 | 10
911137 | 8
911146 | 4
911147 | 5
911148 | 9
911149 | 14
911150 | 8
The times are sequential integers (UNIX timestamps initially). I have aggregated my data into 5 minute time buckets.
The frontend query will pass in a start & end time and aggregate the data into larger buckets. For example:
SELECT
(five_min_time / 6) AS date,
SUM(mydata) AS mydata
FROM mydata_table_five_min
WHERE
five_min_time BETWEEN (1640000000 / 300) AND (1640086400 / 300)
GROUP BY date
ORDER BY date ASC;
I would like to be able to get a result:
date | mydata
--------+-----------------
911130 | 10
911131 | 11
911132 | 9
911133 | 6
911134 | 5
911135 | 5
911136 | 10
911137 | 8
911138 | 0
911139 | 0
911140 | 0
911141 | 0
911142 | 0
911143 | 0
911144 | 0
911145 | 0
911146 | 4
911147 | 5
911148 | 9
911149 | 14
911150 | 8
As a note, this query is being run in AWS Redshift.

I am not sure whether a recursive CTE works in Redshift, but something like this works in PostgreSQL.
with recursive rcte as (
select
min(half_hour) as n,
max(half_hour) as n_max
from cte_data
union all
select n+1, n_max
from rcte
where n < n_max
)
, cte_data as (
select
(five_min_time / 6) as half_hour,
sum(mydata) as mydata
from mydata_table_five_min
where five_min_time between (date_part('epoch','2021-12-20 12:00'::timestamp)::int / 300)
and (date_part('epoch','2021-12-21 12:00'::timestamp)::int / 300)
group by half_hour
)
select n as date
--, to_timestamp(n*6*300) as dt
, coalesce(t.mydata, 0) as mydata
from rcte c
left join cte_data t
on t.half_hour = c.n
order by date;
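The same gap-filling idea can be tried without a PostgreSQL or Redshift instance. The sketch below is an assumption-based reduction of the query above: it uses SQLite through Python's sqlite3 module, a hypothetical table named buckets that already holds the aggregated values, and only four sample rows taken from the question. The recursive CTE counts from the smallest to the largest bucket, and the left join plus coalesce fills the holes with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated table standing in for cte_data
conn.execute("create table buckets (bucket integer primary key, mydata integer)")
conn.executemany(
    "insert into buckets values (?, ?)",
    [(911136, 10), (911137, 8), (911146, 4), (911147, 5)])

GAP_FILL = """
with recursive series(n, n_max) as (
    select min(bucket), max(bucket) from buckets
    union all
    select n + 1, n_max from series where n < n_max
)
select s.n, coalesce(b.mydata, 0)
from series s
left join buckets b on b.bucket = s.n
order by s.n
"""

filled = conn.execute(GAP_FILL).fetchall()
for bucket, value in filled:
    print(bucket, value)
```

Buckets 911138 through 911145 have no source rows, so they come back with a value of 0.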

Add an IF CASE or something similar inside a query

I have a query result in which the "PROJECT NUMBER" field appears twice for one project, so where there should be a single COUNT of all the samples, the count appears twice. Any idea how I can solve this?
Update:
I can now show the table data structure
Here's more or less the info of the tables:
SAMPLE table
+---------------+-------------+--------+-------------+----------------+
| sample_number | project     | status | template    | parent_aliquot |
+---------------+-------------+--------+-------------+----------------+
| 1             | S/180318/01 | C      | KPS_DEFAULT | 50             |
| 2             | S/180320/01 | I      | KPS_DEFAULT | 100            |
+---------------+-------------+--------+-------------+----------------+
KPS_SMP_DUE_DATE_WEEK_PIVOT_VW table (this table shows the amount of samples of a project that have expired in a year)
+---------------+--------------+----------+-------------+----+----+-----+-----+
| project | product | year | department | w0 | w1 | ... | w53 |
+---------------+--------------+----------+-------------+----+----+-----+-----+
| S/000260/02 | Product v1 | 2020 | MICRO | 1 | | | 1 |
| S/180146/04 | Product v2 | 2021 | QC | | 2 | | 3 |
+---------------+--------------+----------+-------------+----+----+-----+-----+
Here's the code that I have at the moment:
Select
temp.PROJECT,
temp.CUSTOMER_DESC,
temp.PRODUCT,
ISNULL(temp.PREVIOUS_SAMPLES,0),
ISNULL(temp.W3,0) as 'W3',
ISNULL(temp.W4,0) as 'W4',
ISNULL(temp.W5,0) as 'W5',
ISNULL(temp.W6,0) as 'W6',
sum(
ISNULL(temp.PREVIOUS_SAMPLES,0)
+ ISNULL(temp.W3,0)
+ ISNULL(temp.W4,0)
+ ISNULL(temp.W5,0)
+ ISNULL(temp.W6,0)
) as 'TOTAL'
from (
SELECT
vw.PROJECT,
vw.CUSTOMER_DESC,
vw.PRODUCT,
(
SELECT COUNT(s.SAMPLE_NUMBER)
FROM SAMPLE s
WHERE s.PROJECT = vw.PROJECT
AND s.STATUS IN ('I', 'U', 'P')
and s.TEMPLATE = 'KPS_DEFAULT'
AND s.PARENT_ALIQUOT > 0
) as 'PREVIOUS_SAMPLES',
ISNULL(vw.W3,0) as 'W3',
ISNULL(vw.W4,0) as 'W4',
ISNULL(vw.W5,0) as 'W5',
ISNULL(vw.W6,0) as 'W6'
FROM KPS_SMP_DUE_DATE_WEEK_PIVOT_VW vw
WHERE vw.DEPARTMENT IN ('QC')
GROUP BY
vw.PROJECT,
vw.CUSTOMER_DESC,
vw.PRODUCT,
vw.W3,
vw.W4,
vw.W5,
vw.W6
) as TEMP
group by
temp.PROJECT,
temp.CUSTOMER_DESC,
temp.PRODUCT,
temp.PREVIOUS_SAMPLES,
temp.W3,
temp.W4,
temp.W5,
temp.W6
UNION
Select
'Total',
'-',
'-',
sum(ISNULL(temp.PREVIOUS_SAMPLES,0) ),
sum( ISNULL(temp.W3,0) ),
sum( ISNULL(temp.W4,0) ),
sum( ISNULL(temp.W5,0) ),
sum( ISNULL(temp.W6,0) ),
sum(
ISNULL(temp.PREVIOUS_SAMPLES,0)
+ ISNULL(temp.W3,0)
+ ISNULL(temp.W4,0)
+ ISNULL(temp.W5,0)
+ ISNULL(temp.W6,0)
)
from (
SELECT
vw.PROJECT,
vw.CUSTOMER_DESC,
vw.PRODUCT,
(
SELECT COUNT(s.SAMPLE_NUMBER)
FROM SAMPLE s
WHERE s.PROJECT = vw.PROJECT
AND s.STATUS IN ('I', 'U', 'P')
and s.TEMPLATE = 'KPS_DEFAULT'
AND s.PARENT_ALIQUOT > 0
) as 'PREVIOUS_SAMPLES',
ISNULL(vw.W3,0) as 'W3',
ISNULL(vw.W4,0) as 'W4',
ISNULL(vw.W5,0) as 'W5',
ISNULL(vw.W6,0) as 'W6'
FROM KPS_SMP_DUE_DATE_WEEK_PIVOT_VW vw
WHERE vw.DEPARTMENT IN ('QC')
GROUP BY
vw.PROJECT,
vw.CUSTOMER_DESC,
vw.PRODUCT,
vw.W3,
vw.W4,
vw.W5,
vw.W6
) as TEMP
As seen on the image, I marked the repeated project.

Your query is way too complicated for anyone else to really figure out. I would suggest that you simplify it, and in the process you will probably find the issue.
I can say that your question claims that you want one row per project. However, that is not what the code does.
Your GROUP BY is:
group by
temp.PROJECT,
temp.CUSTOMER_DESC,
temp.PRODUCT,
temp.PREVIOUS_SAMPLES,
temp.W3,
temp.W4,
temp.W5,
temp.W6
This says that you want one row for each unique combination of those values. If you want one row per project, I assume you would simply have:
group by temp.PROJECT
Of course, the rest of the code would need to be adjusted to handle this.
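To make the difference concrete, here is a small sketch using SQLite through Python's sqlite3 module rather than SQL Server. The table pivot_vw and its three rows are hypothetical stand-ins for the view in the question, reduced to two week columns. The same project appears on two rows with different week values, which is an assumption about how the duplicate arises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of KPS_SMP_DUE_DATE_WEEK_PIVOT_VW
conn.execute(
    "create table pivot_vw (project text, product text, w3 integer, w4 integer)")
conn.executemany(
    "insert into pivot_vw values (?, ?, ?, ?)",
    [("S/180146/04", "Product v2", 2, None),
     ("S/180146/04", "Product v2", None, 3),
     ("S/000260/02", "Product v1", 1, None)])

# Grouping by the week columns keeps one row per distinct combination,
# so the project with two different week rows shows up twice.
wide = conn.execute("""
    select project, coalesce(w3, 0), coalesce(w4, 0)
    from pivot_vw
    group by project, product, w3, w4
""").fetchall()

# Grouping by project only, and summing the week columns, gives one row each.
narrow = conn.execute("""
    select project, sum(coalesce(w3, 0)), sum(coalesce(w4, 0))
    from pivot_vw
    group by project
    order by project
""").fetchall()

print(len(wide), "rows when grouping by every column")
print(narrow)
```

The week columns have to move into aggregates such as sum() once they leave the GROUP BY, which is the adjustment to the rest of the code mentioned above.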

SQL Query to change the column format to row for each value

I have a query that fetches the data somewhat like -
#1
select
assignment_number,
ldg_name,
effective_start_date,
tax_reporting_name,
payroll_relationship_number,
filing_status,
allowance,
additional_tax_extra_withold,
exemp_fit,
exemp_medicare,
exemp_wage_accumulation,
exemp_unemployment
from
( SELECT PPRD.payroll_relationship_number,
PAAM.assignment_number,
PLDG.name
LDG_NAME,
To_char(PAAM.effective_start_date, 'YYYY-MM-DD')
effective_start_date,
(SELECT DISTINCT HOU.name
FROM pay_dir_rep_cards_f PDRCF,
hr_organization_units HOU,
pay_dir_rep_card_usages_f PDRCUF,
pay_rel_groups_dn PRGD1
WHERE PDRCF.dir_card_id = PDCF.dir_card_id
AND HOU.organization_id = PDRCF.tax_unit_id
AND PDRCUF.dir_rep_card_id = PDRCF.dir_rep_card_id
AND PRGD1.relationship_group_id =
PDRCUF.relationship_group_id
AND PDRCF.dir_card_comp_id = PDCCF2.dir_card_comp_id
AND trunc(sysdate) -11 BETWEEN PDRCF.effective_start_date AND
PDRCF.effective_end_date
AND trunc(sysdate) -11 BETWEEN
PRGD1.start_date AND PRGD1.end_date)
TAX_REPORTING_NAME,
(SELECT DISTINCT Decode (dir_information_char1, '1', 'Single',
'2', 'Married',
'3',
'Married and withholding at higher single rate')
FROM pay_dir_card_components_f PDCCF3,
pay_dir_comp_details_f PDCDF3
WHERE PDCCF3.dir_card_id = PDCF.dir_card_id
AND PDCCF3.dir_card_comp_id = PDCDF3.dir_card_comp_id
AND PDCDF3.dir_information_category =
'HRX_US_WTH_FEDERAL_INCOME_TAX'
AND trunc(sysdate) -11 BETWEEN PDCDF3.effective_start_date
AND
PDCDF3.effective_end_date)
FILING_STATUS,
exemp_fit,
PDFC.exemp_medicare,
PDFC.exemp_wage_accumulation,
PDFC.exemp_unemployment,
PDFC.exemp_social_security,
PDFC.regular_rate,
PDFC.regular_amount,
PDFC.supplemental_rate,
PDFC.supplemental_amt,
PDFC.irs_lock_in_date,
PDFC.statutory_employee,
PDFC.cumulative_taxation,
PDFC.primary_address,
PDFC.state_disability_calc,
PDFC.state_unemp_calc,
PDFC.qualifying_dependent,
PDFC.other_dependent,
PDFC.total_dependent,
PDFC.other_income,
PDFC.deduction_amount,
PDFC.max_federal_allowance ,
PDFC.allowance,
PDFC.additional_tax_extra_withold
from
per_all_assignments_m paam,
per_all_payroll_relationship PPRD,
per_legislative_table PLDG,
Per_cards PDCF,
Per_components PDFC
WHERE PAAM.ASSIGNMENT_NUMBER = PPRD.ASSIGNMENT_NUMBER
AND PPRD.LEGAL_UNIT = PLDG.LEGAL_UNIT
AND PDCF.dir_card_id = PDFC.dir_card_id
AND PDFC.ASSIGNMENT_ID = PAAM.ASSIGNMENT_ID
AND trunc(sysdate) -11 BETWEEN PAAM.effective_start_date
AND
PAAM.effective_end_date)
The above query gives me the correct output in this format:
assignment_number | ldg_name | effective_start_date | tax_reporting_name | payroll_relationship_number | filing_status | allowance | additional_tax_extra_withold | exemp_fit | exemp_medicare | exemp_wage_accumulation | exemp_unemployment
10 | US | 02-Aug-2020 | Ontario | 10-1 | Single | 1000 | 10 | Y | N | NULL | NULL
I need these columns converted into rows, in this format:
ASSIGNMENT_NUMBER | ValueDefinitionName
10                | filing_status
10                | allowance
10                | additional_tax_extra_withold
10                | exemp_fit
10                | exemp_medicare
That is, only the column names that are not null for an assignment_number.
I also need this format:
ASSIGNMENT_NUMBER | ValueDefinitionName          | Value1
10                | filing_status                | Single
10                | allowance                    | 1000
10                | additional_tax_extra_withold | 10
10                | exemp_fit                    | Y
10                | exemp_medicare               | N
Since exemp_wage_accumulation and exemp_unemployment are null, they should not be included.
Is there any way to achieve this using my first query (#1)?

You should be able to use unpivot for this...
Here is a way to do this. I replaced your large query with a small CTE named "data" that has the appropriate data types. You would need to replace the query in the "data" block with your large query, and make sure the data types of all the unpivoted columns are consistent (in this case every non-varchar column is cast to varchar).
After that you can use unpivot as follows. By default it excludes NULL values, which is why the two empty columns do not appear in the result:
with data
as (select assignment_number
,ldg_name
,to_char(effective_start_date,'dd-mon-yyyy') as effective_start_date
,tax_reporting_name
,payroll_relationship_number
,filing_status
,to_char(allowance) as allowance
,to_char(additional_tax_extra_withold) as additional_tax_extra_withold
,exemp_fit
,exemp_medicare
,exemp_wage_accumulation
,exemp_unemployment
from t
)
select *
from data
unpivot (value1 for valuedefinitions in (
ldg_name
,effective_start_date
,tax_reporting_name
,payroll_relationship_number
,filing_status
,allowance
,additional_tax_extra_withold
,exemp_fit
,exemp_medicare
,exemp_wage_accumulation
,exemp_unemployment
)
)
+-------------------+------------------------------+-------------+
| ASSIGNMENT_NUMBER | VALUEDEFINITIONS | VALUE1 |
+-------------------+------------------------------+-------------+
| 10 | LDG_NAME | US |
| 10 | EFFECTIVE_START_DATE | 02-aug-2020 |
| 10 | TAX_REPORTING_NAME | Ontario |
| 10 | PAYROLL_RELATIONSHIP_NUMBER | 10-1 |
| 10 | FILING_STATUS | Single |
| 10 | ALLOWANCE | 1000 |
| 10 | ADDITIONAL_TAX_EXTRA_WITHOLD | 10 |
| 10 | EXEMP_FIT | Y |
| 10 | EXEMP_MEDICARE | N |
+-------------------+------------------------------+-------------+
Here is a db fiddle link
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=9d211abbede82e276464333018a70731

You can try the method below, but it will work only if all columns except assignment_number are of the same type, i.e. all varchar or all integer. You can use your main query as a CTE.
Note: This is not a complete working query, just a hint to get you started.
with table1 as (...your existing query...)
SELECT ASSIGNMENT_NUMBER, 'FILING_STATUS' AS VALUEDEFINITIONNAME, filing_status as value1 from table1 where filing_status is not null
union
SELECT ASSIGNMENT_NUMBER, 'allowance', allowance from table1 where allowance is not null
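For anyone who wants to see the union approach run end to end, here is a sketch under a few assumptions: it uses SQLite through Python's sqlite3 module instead of Oracle, table1 is a plain table holding the single sample row from the question, and every value is cast to text so the branches have one common type. The column list and the generated SQL are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the CTE "table1"; only the columns to be unpivoted are kept
conn.execute("""
    create table table1 (
        assignment_number integer,
        filing_status text,
        allowance integer,
        additional_tax_extra_withold integer,
        exemp_fit text,
        exemp_medicare text,
        exemp_wage_accumulation text,
        exemp_unemployment text
    )""")
conn.execute(
    "insert into table1 values (10, 'Single', 1000, 10, 'Y', 'N', null, null)")

COLUMNS = [
    "filing_status", "allowance", "additional_tax_extra_withold",
    "exemp_fit", "exemp_medicare",
    "exemp_wage_accumulation", "exemp_unemployment",
]

# One SELECT per column; the names come from the fixed list above, never
# from user input. The "is not null" filter drops the empty columns.
branches = [
    f"select assignment_number, '{col}' as value_definition_name, "
    f"cast({col} as text) as value1 from table1 where {col} is not null"
    for col in COLUMNS
]
unpivoted = conn.execute(" union all ".join(branches)).fetchall()

for row in unpivoted:
    print(row)
```

union all is used here because the branches cannot produce duplicate rows, so the duplicate elimination that a plain union performs is not needed.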

Cascade functions SQL to go from tree structure to flat data structure

I have 2 tables :
table1 t_SearchCriteria:
------------------------------
ID | VALUE | IDParent |
-----|------------|------------|
0 | root | -1 |
-----|------------|------------|
1 | JAMES | 0 |
-----|------------|------------|
2 | ISAC | 0 |
-----|------------|------------|
3 | LISA | 1 |
-----|------------|------------|
4 | Andrew | 3 |
-----|------------|------------|
5 | LISA | 2 |
-----|------------|------------|
6 | EZREAL | 5 |
-----|------------|------------|
10 | twitch | 2 |
-----|------------|------------|
13 | LUX | 0 |
-----|------------|------------|
14 | LISA | 13 |
-----|------------|------------|
15 | EZREAL | 14 |
-----|------------|------------|
EDIT: here is a representation of the tree:
_______root_______
/ | \
JAMES ISAC LUX
| / \ |
LISA TWITCH LISA LISA
| | |
Andrew EZREAL EZREAL
and my second table is like the following :
table t_Path
idPath|Person1| Son |grandSon|grandgrandSon|
------|-------|-------|--------|-------------|
1 |root |JAMES | LISA |ANDREW |
------|-------|-------|--------|-------------|
2 |root |ISAC | LISA |EZREAL |
------|-------|-------|--------|-------------|
3 |root |ISAC | NULL |TWITCH |
------|-------|-------|--------|-------------|
4 |root |LUX | NULL | NULL |
------|-------|-------|--------|-------------|
5 |root |LUX | LISA | NULL |
------|-------|-------|--------|-------------|
6 |root |LUX | LISA | EZREAL |
------|-------|-------|--------|-------------|
What I need is a way (a function or procedure) that starts from table 2 (t_Path) and finds the ID in the t_SearchCriteria table of each path's leaf (the value of grandgrandSon if it is not null, otherwise grandSon if it is not null, and so on).
Since the same node value can appear more than once in the t_SearchCriteria table, a node is uniquely identified by its value together with its parent's value and its grandparent's value (and there is another rule: a parent can't have two children with the same name).
I have tried to write a function, but I couldn't find a way to call a function inside another function, or to work with objects as in C# or another programming language.
I need a function that takes an int ID, which is the ID of a path from table t_Path, and figures out the leaf of the path (this part is done). The problem is how to get the ID of that leaf from the t_SearchCriteria table, since multiple criteria can have the same value (name), even with the same parent name; the grandparent's value makes the difference.
for example of execution:
Select FunctionGetCriteriaId(6)
will return 15
where 6 is the id of the path : 6 |root |LUX | LISA | EZREAL |
and 15 is the id of the criteria : 15 | EZREAL | 14 |
Can anyone help me to figure this out please?
EDIT: to be more specific, the function takes the ID of the path in table 2, for example 5 ( 5 |root |LUX | LISA | NULL |), and returns the ID of "LISA" (the leaf, not the other nodes) in table 1, which is 14 (of course respecting the rules stated above).
EDIT 2:
updated the uniqueness condition in the tree

You can do this easily using LEFT JOINs, MAX and COALESCE. This is a full working example. You can play with it:
DECLARE @t_SearchCriteria TABLE
(
[ID] SMALLINT
,[Value] VARCHAR(12)
,[IDParent] SMALLINT
);
INSERT INTO @t_SearchCriteria ([ID], [Value], [IDParent])
VALUES (0, 'root', -1)
,(1, 'JAMES', 0)
,(2, 'ISAC', 0)
,(3, 'LISA', 1)
,(4, 'Andrew', 3)
,(5, 'LISA', 2)
,(6, 'EZREAL', 5)
,(10, 'twitch', 2)
,(13, 'LUX', 0)
,(14, 'LISA', 13)
,(15, 'EZREAL', 14);
DECLARE @t_Path TABLE
(
[idPath] SMALLINT
,[Person1] VARCHAR(12)
,[Son] VARCHAR(12)
,[grandSon] VARCHAR(12)
,[grandgrandSon] VARCHAR(12)
);
INSERT INTO @t_Path ([idPath], [Person1], [Son], [grandSon], [grandgrandSon])
VALUES (1, 'root', 'JAMES', 'LISA', 'ANDREW')
,(2, 'root', 'ISAC', 'LISA', 'EZREAL')
,(3, 'root', 'ISAC', 'TWITCH', NULL)
,(4, 'root', 'LUX', NULL, NULL)
,(5, 'root', 'LUX', 'LISA', NULL)
,(6, 'root', 'LUX', 'LISA', 'EZREAL');
-- the function input parameter
DECLARE @idPath SMALLINT = 5;
-- the function body
DECLARE @Person1 VARCHAR(12)
,@Son VARCHAR(12)
,@grandSon VARCHAR(12)
,@grandgrandSon VARCHAR(12);
SELECT @Person1 = [Person1]
,@Son = [Son]
,@grandSon = [grandSon]
,@grandgrandSon = [grandgrandSon]
FROM @t_Path P
WHERE P.[idPath] = @idPath;
SELECT COALESCE(MAX(S5.[ID]), MAX(S4.[ID]), MAX(S3.[ID]), MAX(S2.[ID]), MAX(S1.[ID]))
FROM @t_SearchCriteria S1
LEFT JOIN @t_SearchCriteria S2
ON S1.[ID] = S2.[IDParent]
AND S1.[Value] = @Person1
LEFT JOIN @t_SearchCriteria S3
ON S2.[ID] = S3.[IDParent]
AND S2.[Value] = @Son
AND @Person1 IS NOT NULL
LEFT JOIN @t_SearchCriteria S4
ON S3.[ID] = S4.[IDParent]
AND S3.[Value] = @grandSon
AND @grandgrandSon IS NOT NULL
LEFT JOIN @t_SearchCriteria S5
ON S4.[ID] = S5.[IDParent]
AND S4.[Value] = @grandgrandSon
WHERE S1.[Value] = @Person1;
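As a cross-check on the expected results, the lookup can also be written as a plain level-by-level walk. This is a different technique from the join-based query above, not a translation of it. It is sketched with SQLite through Python's sqlite3 module instead of SQL Server. The function name get_criteria_id is hypothetical, the sample rows are the ones from the answer's INSERT statements, and names are compared case-insensitively on the assumption that 'ANDREW' in t_Path is meant to match 'Andrew' in t_SearchCriteria.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_SearchCriteria (ID integer, Value text, IDParent integer)")
conn.executemany(
    "insert into t_SearchCriteria values (?, ?, ?)",
    [(0, "root", -1), (1, "JAMES", 0), (2, "ISAC", 0), (3, "LISA", 1),
     (4, "Andrew", 3), (5, "LISA", 2), (6, "EZREAL", 5), (10, "twitch", 2),
     (13, "LUX", 0), (14, "LISA", 13), (15, "EZREAL", 14)])
conn.execute(
    "create table t_Path (idPath integer, Person1 text, Son text, "
    "grandSon text, grandgrandSon text)")
conn.executemany(
    "insert into t_Path values (?, ?, ?, ?, ?)",
    [(1, "root", "JAMES", "LISA", "ANDREW"),
     (2, "root", "ISAC", "LISA", "EZREAL"),
     (3, "root", "ISAC", "TWITCH", None),
     (4, "root", "LUX", None, None),
     (5, "root", "LUX", "LISA", None),
     (6, "root", "LUX", "LISA", "EZREAL")])

def get_criteria_id(id_path):
    """Hypothetical helper: return the ID of the leaf of the given path."""
    path = conn.execute(
        "select Person1, Son, grandSon, grandgrandSon "
        "from t_Path where idPath = ?", (id_path,)).fetchone()
    if path is None:
        return None
    node_id = -1  # IDParent of the root row
    for value in path:
        if value is None:
            break  # the previous level was the leaf
        # A parent cannot have two children with the same name, so
        # (parent ID, name) identifies at most one row.
        row = conn.execute(
            "select ID from t_SearchCriteria "
            "where IDParent = ? and Value = ? collate nocase",
            (node_id, value)).fetchone()
        if row is None:
            return None  # the path does not exist in the tree
        node_id = row[0]
    return node_id

print(get_criteria_id(5))
print(get_criteria_id(6))
```

Because each step is keyed on the parent's ID rather than the parent's name, repeated names such as LISA are disambiguated by their position in the tree.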

SQLite strftime function to get grouped data by months

I have a table with the following structure and data:
I would like to get data grouped by month within a given date range (for example, from 2014-01-01 to 2014-12-31). Some months may have no data, but the result still needs a row for them showing 0.
Result should have following format:
MONTH | DIALS_CNT | APPT_CNT | CONVERS_CNT | CANNOT_REACH_CNT |
2014-01 | 100 | 50 | 20 | 30 |
2014-02 | 100 | 40 | 30 | 30 |
2014-03 | 0 | 0 | 0 | 0 |
etc..
WHERE
APPT_CNT = WHERE call.result = APPT
CONVERS_CNT = WHERE call.result = CONV_NO_APPT
CANNOT_REACH_CNT = WHERE call.result = CANNOT_REACH
How can I do this using the strftime function?
Many thanks for any help or example.

SELECT Month,
(SELECT COUNT(*)
FROM MyTable
WHERE date LIKE Month || '%'
) AS Dials_Cnt,
(SELECT SUM(Call_Result = 'APPT')
FROM MyTable
WHERE date LIKE Month || '%'
) AS Appt_Cnt,
...
FROM (SELECT '2014-01' AS Month UNION ALL
SELECT '2014-02' UNION ALL
SELECT '2014-03' UNION ALL
...
SELECT '2014-12')
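If typing out the twelve months by hand is not practical, the month list can be generated instead. The following is a sketch of that variation, not the query above: it assumes the table is called MyTable with a text column date in YYYY-MM-DD form and a column Call_Result, as in the answer, and the five sample rows are made up. It is run through Python's sqlite3 module so it can be tried directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table MyTable (date text, Call_Result text)")
# Made-up sample rows; note that March has no calls at all
conn.executemany(
    "insert into MyTable values (?, ?)",
    [("2014-01-05", "APPT"), ("2014-01-17", "CONV_NO_APPT"),
     ("2014-01-20", "CANNOT_REACH"), ("2014-02-02", "APPT"),
     ("2014-04-11", "APPT")])

# The recursive CTE yields the first day of every month in the range;
# strftime('%Y-%m', ...) turns both sides of the join into a month key.
MONTHLY = """
with recursive months(first_day) as (
    select date(?, 'start of month')
    union all
    select date(first_day, '+1 month') from months
    where date(first_day, '+1 month') <= ?
)
select strftime('%Y-%m', m.first_day) as month,
       count(t.date) as dials_cnt,
       coalesce(sum(t.Call_Result = 'APPT'), 0) as appt_cnt,
       coalesce(sum(t.Call_Result = 'CONV_NO_APPT'), 0) as convers_cnt,
       coalesce(sum(t.Call_Result = 'CANNOT_REACH'), 0) as cannot_reach_cnt
from months m
left join MyTable t
    on strftime('%Y-%m', t.date) = strftime('%Y-%m', m.first_day)
group by m.first_day
order by m.first_day
"""

rows = conn.execute(MONTHLY, ("2014-01-01", "2014-12-31")).fetchall()
for row in rows:
    print(row)
```

count(t.date) counts only joined rows, so an empty month reports 0 dials, and coalesce turns the NULL that sum() returns for an empty month into 0.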
