SQL query: Fill missing values with 0 - sql

I have a table with gaps in the data at certain times (note that there is no data between 911137 and 911146). I need to fill those gaps with 0 for better display on the frontend.
date | mydata
--------+-----------------
911130 | 10
911131 | 11
911132 | 9
911133 | 6
911134 | 5
911135 | 5
911136 | 10
911137 | 8
911146 | 4
911147 | 5
911148 | 9
911149 | 14
911150 | 8
The times are sequential integers (originally UNIX timestamps). I have aggregated my data into 5-minute time buckets.
The frontend query passes in a start and end time and aggregates the data into larger buckets. For example:
SELECT
(five_min_time / 6) AS date,
SUM(mydata) AS mydata
FROM mydata_table_five_min
WHERE
five_min_time BETWEEN (1640000000 / 300) AND (1640086400 / 300)
GROUP BY date
ORDER BY date ASC;
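The bucketing above relies on integer division: dividing a UNIX timestamp by 300 gives a 5-minute bucket index, and dividing that index by 6 merges six 5-minute buckets into one 30-minute bucket. A minimal Python sketch of the same arithmetic, assuming non-negative timestamps (for which Python's floor division `//` matches the truncating integer division in the SQL); the function names are hypothetical:

```python
def five_min_bucket(epoch_seconds: int) -> int:
    """Index of the 5-minute bucket that contains a UNIX timestamp."""
    return epoch_seconds // 300


def half_hour_bucket(five_min_index: int) -> int:
    """Merge six consecutive 5-minute buckets into one 30-minute bucket."""
    return five_min_index // 6


# The bounds used in the example query.
start = five_min_bucket(1640000000)
end = five_min_bucket(1640086400)
print(start, end)                                      # 5466666 5466954
print(half_hour_bucket(start), half_hour_bucket(end))  # 911111 911159
```

So this one-day window spans half-hour buckets 911111 to 911159, and the sample values 911130 to 911150 fall inside it.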
I would like to be able to get a result:
date | mydata
--------+-----------------
911130 | 10
911131 | 11
911132 | 9
911133 | 6
911134 | 5
911135 | 5
911136 | 10
911137 | 8
911138 | 0
911139 | 0
911140 | 0
911141 | 0
911142 | 0
911143 | 0
911144 | 0
911145 | 0
911146 | 4
911147 | 5
911148 | 9
911149 | 14
911150 | 8
Note: this query runs on AWS Redshift.

I'm not sure whether a recursive CTE works in Redshift, but something like this works in PostgreSQL:
with recursive rcte as (
select
min(half_hour) as n,
max(half_hour) as n_max
from cte_data
union all
select n+1, n_max
from rcte
where n < n_max
)
, cte_data as (
select
(five_min_time / 6) as half_hour,
sum(mydata) as mydata
from mydata_table_five_min
where five_min_time between (date_part('epoch','2021-12-20 12:00'::timestamp)::int / 300)
and (date_part('epoch','2021-12-21 12:00'::timestamp)::int / 300)
group by half_hour
)
select n as date
--, to_timestamp(n*6*300) as dt
, coalesce(t.mydata, 0) as mydata
from rcte c
left join cte_data t
on t.half_hour = c.n
order by date;
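To try the recursive-CTE gap fill locally, here is a sketch using Python's built-in sqlite3 module. Assumptions: SQLite syntax rather than PostgreSQL or Redshift, data already at half-hour granularity in a hypothetical table `half_hour_data`, and a series running from the smallest to the largest bucket present, as in the answer above (the integer-division step is left out to keep the focus on the gap fill):

```python
import sqlite3

# Sample rows from the question: buckets 911138-911145 are missing.
rows = [
    (911130, 10), (911131, 11), (911132, 9), (911133, 6), (911134, 5),
    (911135, 5), (911136, 10), (911137, 8), (911146, 4), (911147, 5),
    (911148, 9), (911149, 14), (911150, 8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE half_hour_data (half_hour INTEGER, mydata INTEGER)")
conn.executemany("INSERT INTO half_hour_data VALUES (?, ?)", rows)

GAP_FILL_SQL = """
WITH RECURSIVE series(n, n_max) AS (
    -- Anchor: start at the smallest bucket and remember the largest.
    SELECT MIN(half_hour), MAX(half_hour) FROM half_hour_data
    UNION ALL
    -- Recursive step: emit the next bucket until the maximum is reached.
    SELECT n + 1, n_max FROM series WHERE n < n_max
)
SELECT s.n AS date, COALESCE(t.mydata, 0) AS mydata
FROM series s
LEFT JOIN half_hour_data t ON t.half_hour = s.n
ORDER BY s.n
"""

filled = conn.execute(GAP_FILL_SQL).fetchall()
for date, value in filled:
    print(date, value)
```

Because the series is anchored on the MIN/MAX of the data, gaps at the very start or end of the requested window are not filled; seeding the anchor with the window's start and end buckets instead would cover those as well.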

Related

SQL Query to change the column format to row for each value

I have a query that fetches data roughly like this:
#1
select
assignment_number,
ldg_name,
effective_start_date,
tax_reporting_name,
payroll_relationship_number,
filing_status,
allowance,
additional_tax_extra_withold,
exemp_fit,
exemp_medicare,
exemp_wage_accumulation,
exemp_unemployment
from
( SELECT PPRD.payroll_relationship_number,
PAAM.assignment_number,
PLDG.name
LDG_NAME,
To_char(PAAM.effective_start_date, 'YYYY-MM-DD')
effective_start_date,
(SELECT DISTINCT HOU.name
FROM pay_dir_rep_cards_f PDRCF,
hr_organization_units HOU,
pay_dir_rep_card_usages_f PDRCUF,
pay_rel_groups_dn PRGD1
WHERE PDRCF.dir_card_id = PDCF.dir_card_id
AND HOU.organization_id = PDRCF.tax_unit_id
AND PDRCUF.dir_rep_card_id = PDRCF.dir_rep_card_id
AND PRGD1.relationship_group_id =
PDRCUF.relationship_group_id
AND PDRCF.dir_card_comp_id = PDCCF2.dir_card_comp_id
AND trunc(sysdate) -11 BETWEEN PDRCF.effective_start_date AND
PDRCF.effective_end_date
AND trunc(sysdate) -11 BETWEEN
PRGD1.start_date AND PRGD1.end_date)
TAX_REPORTING_NAME,
(SELECT DISTINCT Decode (dir_information_char1, '1', 'Single',
'2', 'Married',
'3',
'Married and withholding at higher single rate')
FROM pay_dir_card_components_f PDCCF3,
pay_dir_comp_details_f PDCDF3
WHERE PDCCF3.dir_card_id = PDCF.dir_card_id
AND PDCCF3.dir_card_comp_id = PDCDF3.dir_card_comp_id
AND PDCDF3.dir_information_category =
'HRX_US_WTH_FEDERAL_INCOME_TAX'
AND trunc(sysdate) -11 BETWEEN PDCDF3.effective_start_date
AND
PDCDF3.effective_end_date)
FILING_STATUS,
exemp_fit,
PDFC.exemp_medicare,
PDFC.exemp_wage_accumulation,
PDFC.exemp_unemployment,
PDFC.exemp_social_security,
PDFC.regular_rate,
PDFC.regular_amount,
PDFC.supplemental_rate,
PDFC.supplemental_amt,
PDFC.irs_lock_in_date,
PDFC.statutory_employee,
PDFC.cumulative_taxation,
PDFC.primary_address,
PDFC.state_disability_calc,
PDFC.state_unemp_calc,
PDFC.qualifying_dependent,
PDFC.other_dependent,
PDFC.total_dependent,
PDFC.other_income,
PDFC.deduction_amount,
PDFC.max_federal_allowance ,
PDFC.allowance,
PDFC.additional_tax_extra_withold
from
per_all_assignments_m paam,
per_all_payroll_relationship PPRD,
per_legislative_table PLDG,
Per_cards PDCF,
Per_components PDFC
WHERE PAAM.ASSIGNMENT_NUMBER = PPRD.ASSIGNMENT_NUMBER
AND PPRD.LEGAL_UNIT = PLDG.LEGAL_UNIT
AND PDCF.dir_card_id = PDFC.dir_card_id
AND PDFC.ASSIGNMENT_ID = PAAM.ASSIGNMENT_ID
AND trunc(sysdate) -11 BETWEEN PAAM.effective_start_date
AND
PAAM.effective_end_date)
The query above gives me the correct output in this format:
assignment_number | ldg_name | effective_start_date | tax_reporting_name | payroll_relationship_number | filing_status | allowance | additional_tax_extra_withold | exemp_fit | exemp_medicare | exemp_wage_accumulation | exemp_unemployment
10 | US | 02-Aug-2020 | Ontario | 10-1 | Single | 1000 | 10 | Y | N | (null) | (null)
I need these columns turned into rows, in this format:
ASSIGNMENT_NUMBER ValueDefinitionName
10 filing_status
10 allowance
10 additional_tax_extra_withold
10 exemp_fit
10 exemp_medicare
i.e. one row for each column whose value is not null for the assignment_number.
Also, in this format, including the values:
ASSIGNMENT_NUMBER ValueDefinitionName Value1
10 filing_status Single
10 allowance 1000
10 additional_tax_extra_withold 10
10 exemp_fit Y
10 exemp_medicare N
Since exemp_wage_accumulation and exemp_unemployment are null, they should not be included.
Is there any way to achieve this using my first query (#1)?
You should be able to use UNPIVOT for this.
Here is one way to do it. I replaced your large query with a temporary "data" block that has the appropriate data types. You would need to replace the query in the "data" block with your large query, and also make sure that all of the unpivoted columns have a consistent data type (here, all of the non-varchar columns are cast to varchar).
After that, you can use UNPIVOT as follows:
with data
as (select assignment_number
,ldg_name
,to_char(effective_start_date,'dd-mon-yyyy') as effective_start_date
,tax_reporting_name
,payroll_relationship_number
,filing_status
,to_char(allowance) as allowance
,to_char(additional_tax_extra_withold) as additional_tax_extra_withold
,exemp_fit
,exemp_medicare
,exemp_wage_accumulation
,exemp_unemployment
from t
)
select *
from data
unpivot (value1 for valuedefinitions in (
ldg_name
,effective_start_date
,tax_reporting_name
,payroll_relationship_number
,filing_status
,allowance
,additional_tax_extra_withold
,exemp_fit
,exemp_medicare
,exemp_wage_accumulation
,exemp_unemployment
)
)
+-------------------+------------------------------+-------------+
| ASSIGNMENT_NUMBER | VALUEDEFINITIONS | VALUE1 |
+-------------------+------------------------------+-------------+
| 10 | LDG_NAME | US |
| 10 | EFFECTIVE_START_DATE | 02-aug-2020 |
| 10 | TAX_REPORTING_NAME | Ontario |
| 10 | PAYROLL_RELATIONSHIP_NUMBER | 10-1 |
| 10 | FILING_STATUS | Single |
| 10 | ALLOWANCE | 1000 |
| 10 | ADDITIONAL_TAX_EXTRA_WITHOLD | 10 |
| 10 | EXEMP_FIT | Y |
| 10 | EXEMP_MEDICARE | N |
+-------------------+------------------------------+-------------+
Here is a db fiddle link
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=9d211abbede82e276464333018a70731
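The null columns drop out of the result because Oracle's UNPIVOT defaults to EXCLUDE NULLS (INCLUDE NULLS would keep them). A small Python sketch of that behaviour, using a hypothetical `unpivot` helper and the sample row from the question, with the two null columns represented as `None`:

```python
from typing import Any, Dict, Iterable, List, Tuple


def unpivot(row: Dict[str, Any], key: str, columns: Iterable[str]) -> List[Tuple[Any, str, Any]]:
    """Turn one wide row into (key, COLUMN_NAME, value) rows, skipping nulls."""
    return [
        (row[key], col.upper(), row[col])
        for col in columns
        if row[col] is not None  # same effect as UNPIVOT's default EXCLUDE NULLS
    ]


# Sample row from the question; numeric values are already strings,
# mirroring the to_char() casts in the "data" block.
row = {
    "assignment_number": 10,
    "filing_status": "Single",
    "allowance": "1000",
    "additional_tax_extra_withold": "10",
    "exemp_fit": "Y",
    "exemp_medicare": "N",
    "exemp_wage_accumulation": None,
    "exemp_unemployment": None,
}
value_columns = [c for c in row if c != "assignment_number"]
for out in unpivot(row, "assignment_number", value_columns):
    print(out)
```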
You can try the method below, but it will work only if all columns except assignment_number have the same type (e.g. all varchar or all integer). You can put your main query in a CTE.
Note: This is not a complete working query, just a hint to get you started.
with table1 as (...Your existing query...)
SELECT ASSIGNMENT_NUMBER, 'filing_status' AS ValueDefinitionName, filing_status AS value1 FROM table1 WHERE filing_status IS NOT NULL
UNION ALL
SELECT ASSIGNMENT_NUMBER, 'allowance', allowance FROM table1 WHERE allowance IS NOT NULL
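The UNION ALL approach can also get around the same-type restriction by casting every value to text in each branch. A runnable sketch in SQLite via Python's sqlite3 module; the table1 layout below is a hypothetical subset of the question's columns, and in Oracle TO_CHAR would play the role of SQLite's CAST(... AS TEXT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE table1 (
        assignment_number INTEGER,
        filing_status TEXT,
        allowance INTEGER,
        exemp_fit TEXT,
        exemp_wage_accumulation TEXT
    )"""
)
conn.execute("INSERT INTO table1 VALUES (10, 'Single', 1000, 'Y', NULL)")

# One branch per column: a string literal for the column name, the value
# cast to text so every branch has the same type, and a filter that drops nulls.
UNPIVOT_SQL = """
SELECT assignment_number, 'filing_status' AS value_definition_name,
       CAST(filing_status AS TEXT) AS value1
FROM table1 WHERE filing_status IS NOT NULL
UNION ALL
SELECT assignment_number, 'allowance', CAST(allowance AS TEXT)
FROM table1 WHERE allowance IS NOT NULL
UNION ALL
SELECT assignment_number, 'exemp_fit', CAST(exemp_fit AS TEXT)
FROM table1 WHERE exemp_fit IS NOT NULL
UNION ALL
SELECT assignment_number, 'exemp_wage_accumulation', CAST(exemp_wage_accumulation AS TEXT)
FROM table1 WHERE exemp_wage_accumulation IS NOT NULL
"""
rows = conn.execute(UNPIVOT_SQL).fetchall()
for r in rows:
    print(r)
```

String literals in single quotes give the column names; double quotes would be read as identifiers, not text.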

query by specific value to query

Hi, I am trying to write a stored procedure in PostgreSQL, and I have to fill a table (vol_raleos) from 3 others. These are the tables:
super
zona | sitio | manejo
1 | 1 | 1
2 | 2 | 2
datos_vol_raleos
zona | sitio | manejo |vol_prodn
1 | 1 | 10 | 0
2 | 2 | 15 | 0
datos_manejos
manejoVR | manejoSuper
10 | 1
15 | 2
table to fill
vol_raleos
zona | sitio | manejo |vol_prodn
1 | 1 | 1 | 0
2 | 2 | 2 | 0
What I do is take the data in datos_vol_raleos and verify that it exists in super, but first I must convert its manejo value (a manejoVR code) into the corresponding manejoSuper value using the table datos_manejos.
INSERT INTO vol_raleos
(zona, sitio, manejo, edad, densidad, vol_prod1, vol_prod2, ..., vol_prod36)
select zona, sitio, manejo, edad, densidad, vol_prod1, vol_prod2, ..., vol_prod36
from (
select volr.*, sup.zona, sup.sitio, sup.manejo, dm.manejo,
from datos_vol_raleos volr
left join super sup on (sup.zona = volr.zona and sup.sitio = volr.sitio and sup.manejo = volr.manejo) selrs
order by zona, sitio, manejo, edad, densidad
) sel_min_max;
So here I don't know how to get the manejoSuper value from datos_manejos in order to compare it afterwards.
You can insert from a select with a couple of joins. For example:
insert into vol_raleos (zona, sitio, manejo, vol_prodn)
select d.zona, d.sitio, m.manejoSuper, d.vol_prodn
from datos_vol_raleos d
join datos_manejos m on m.manejoVR = d.manejo
join super s on (s.zona, s.sitio, s.manejo) = (d.zona, d.sitio, m.manejoSuper);
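A sketch of a join-based insert of this kind, run in SQLite through Python's sqlite3 module: each datos_vol_raleos row is mapped through datos_manejos and kept only if the translated key exists in super. It uses just the columns shown in the question's sample tables (edad, densidad and vol_prod1 to vol_prod36 are left out), and the column types are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE super (zona INTEGER, sitio INTEGER, manejo INTEGER);
    CREATE TABLE datos_vol_raleos (zona INTEGER, sitio INTEGER, manejo INTEGER, vol_prodn REAL);
    CREATE TABLE datos_manejos (manejoVR INTEGER, manejoSuper INTEGER);
    CREATE TABLE vol_raleos (zona INTEGER, sitio INTEGER, manejo INTEGER, vol_prodn REAL);

    INSERT INTO super VALUES (1, 1, 1), (2, 2, 2);
    INSERT INTO datos_vol_raleos VALUES (1, 1, 10, 0), (2, 2, 15, 0);
    INSERT INTO datos_manejos VALUES (10, 1), (15, 2);
    """
)

conn.execute(
    """
    INSERT INTO vol_raleos (zona, sitio, manejo, vol_prodn)
    SELECT d.zona, d.sitio, m.manejoSuper, d.vol_prodn
    FROM datos_vol_raleos d
    -- translate the VR code into the code used by super
    JOIN datos_manejos m ON m.manejoVR = d.manejo
    -- keep only rows whose translated key exists in super
    JOIN super s ON s.zona = d.zona AND s.sitio = d.sitio AND s.manejo = m.manejoSuper
    """
)
result = conn.execute("SELECT * FROM vol_raleos ORDER BY zona").fetchall()
print(result)
```

Inner joins do the filtering here: a row whose manejo has no mapping, or whose translated key is missing from super, simply produces no output row.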

How to formulate a SQL query to identify sets of matches across a table

I have a relation containing airline routes and the airports that flights on those routes pass over. I'm trying to identify which routes skip over the same airports.
I have narrowed the relation down to a query whose result should be possible to manipulate into my desired result:
SELECT route_id, airport_id
FROM routes_airports
WHERE stops = false
ORDER BY route_id, airport_id;
I would like to match routes to one another when they skip exactly the same set of airport_id values, and return the routes that have such a match.
For example, if routes 5 and 7 both skipped airports 10, 15 and 20, they should be matched together, but not with, say, route 10, which only skips airports 10 and 20.
route_id | skipped_airport_id
----------+------------
1 | 76
2 | 21*
2 | 22*
4 | 42
5 | 21*
5 | 22*
7 | 15
7 | 16
7 | 17
7 | 18
7 | 46
9 | 26
11 | 19
14 | 45*
14 | 46*
14 | 47*
15 | 45*
15 | 46*
15 | 47*
17 | 78
20 | 20
For the example data above, I would like a result containing just the routes that have a match, such as:
route_id
----------
2
5
14
15
You can do this by aggregating all skipped airports into an array and then finding the routes whose arrays are identical:
with skipped as (
select route_id, array_agg(airport_id order by airport_id) skipped_airports
from routes_airports
where stops = false
group by route_id
)
select s1.*
from skipped s1
where exists (select *
from skipped s2
where s1.route_id <> s2.route_id
and s1.skipped_airports = s2.skipped_airports);
This returns:
route_id | skipped_airports
---------+-----------------
2 | {21,22}
5 | {21,22}
14 | {45,46,47}
15 | {45,46,47}
Online example: https://rextester.com/MJPJ90714
Try something like this:
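SELECT route_id, STRING_AGG(airport_id::text, ',' ORDER BY airport_id) AS airports
FROM routes_airports
WHERE stops = FALSE
GROUP BY route_id
ORDER BY 2
This collects each route's airport_ids into a single sorted string and orders by that column, so routes that skip the same airports end up next to each other. To return only the routes that share a list, group by that string: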
WITH
skips AS
(
SELECT route_id, STRING_AGG(airport_id::text, ',' ORDER BY airport_id) AS airport_ids
FROM routes_airports
WHERE stops = false
GROUP BY route_id
)
SELECT airport_ids, STRING_AGG(route_id::text, ',' ORDER BY route_id) AS route_ids
FROM skips
GROUP BY airport_ids
HAVING COUNT(*) > 1
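These answers share one idea: build a canonical, order-independent key for each route's set of skipped airports, then keep the keys shared by more than one route. A plain-Python sketch of that idea, using a frozenset as the key and the (route_id, skipped_airport_id) pairs from the question's example; the function name is hypothetical:

```python
from collections import defaultdict

pairs = [
    (1, 76), (2, 21), (2, 22), (4, 42), (5, 21), (5, 22),
    (7, 15), (7, 16), (7, 17), (7, 18), (7, 46), (9, 26), (11, 19),
    (14, 45), (14, 46), (14, 47), (15, 45), (15, 46), (15, 47),
    (17, 78), (20, 20),
]


def routes_with_matching_skips(pairs):
    # Collect each route's skipped airports into a set.
    skipped = defaultdict(set)
    for route_id, airport_id in pairs:
        skipped[route_id].add(airport_id)

    # Group routes by their (hashable, order-independent) set of airports.
    by_airports = defaultdict(list)
    for route_id, airports in skipped.items():
        by_airports[frozenset(airports)].append(route_id)

    # Keep only routes whose airport set is shared with at least one other route.
    return sorted(r for routes in by_airports.values() if len(routes) > 1 for r in routes)


print(routes_with_matching_skips(pairs))  # [2, 5, 14, 15]
```

Set equality is exact, so a route that skips only a subset of another route's airports is not matched, which is the behaviour the question asks for.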

Oracle SQL - Efficiently calculate number of concurrent phone calls

I know that this question is essentially a duplicate of an older question of mine, but quite a few things have changed since I asked that one, so I thought I'd ask a new question.
I have a table that holds phone call records which has the following fields:
END: Holds the timestamp of when a call ended - Data Type: DATE
LINE: Holds the phone line that was used for a call - Data Type: NUMBER
CALLDURATION: Holds the duration of a call in seconds - Data Type: NUMBER
The table has entries like this:
END LINE CALLDURATION
---------------------- ------------------- -----------------------
25/01/2012 14:05:10 6 65
25/01/2012 14:08:51 7 1142
25/01/2012 14:20:36 5 860
I need to create a query that returns the number of concurrent phone calls based on the data in that table, calculated per interval: the result should contain a new row only when a call starts or ends. As long as the number of concurrent phone calls stays the same, there should be no additional row in the output.
To make this clearer, here is everything the query should return for the example rows from the previous table:
TIMESTAMP LINE CALLDURATION STATUS CURRENTLYUSEDLINES
---------------------- ----- ------------- ------- -------------------
25/01/2012 13:49:49 7 1142 1 1
25/01/2012 14:04:05 6 65 1 2
25/01/2012 14:05:10 6 65 -1 1
25/01/2012 14:06:16 5 860 1 2
25/01/2012 14:08:51 7 1142 -1 1
25/01/2012 14:20:36 5 860 -1 0
I got the following example query from a colleague, but unfortunately I do not fully understand it, and it also does not work exactly as it should: for calls with a duration of 0 seconds it sometimes shows -1 in the CURRENTLYUSEDLINES column:
SELECT COALESCE (SUM (STATUS) OVER (ORDER BY END ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING), 0) CURRENTLYUSEDLINES
FROM (SELECT END - CALLDURATION / 86400 AS TIMESTAMP,
LINE,
CALLDURATION,
1 AS STATUS
FROM t_calls
UNION ALL
SELECT END,
LINE,
CALLDURATION,
-1 AS STATUS
FROM t_calls) t
ORDER BY 1;
Now I am supposed to make that query work like in the example but I'm not sure how to do that.
Could someone help me out with this or at least explain this query so I can try fixing it myself?
I think this will solve your problem:
SELECT TIMESTAMP,
SUM(SUM(STATUS)) OVER (ORDER BY TIMESTAMP) as CURRENTLYUSEDLINES
FROM ((SELECT END - CALLDURATION / (24*60*60) AS TIMESTAMP,
COUNT(*) AS STATUS
FROM t_calls
GROUP BY END - CALLDURATION / (24*60*60)
) UNION ALL
(SELECT END, - COUNT(*) AS STATUS
FROM t_calls
GROUP BY END
)
) t
GROUP BY TIMESTAMP
ORDER BY 1;
This is a slight simplification of your query. Because it aggregates the events by timestamp before computing the running total, you should get 0s, but not negative values.
You are getting negative values because, when several events share a timestamp (for example, the start and end of a 0-second call), the "ends" can be processed before the "starts". This query handles all events at a timestamp "at the same time", because there is only one row per timestamp.
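Both queries implement a sweep line: each call becomes a +1 event at its start and a -1 event at its end, and a running sum in time order gives the number of lines in use. Netting all events that share a timestamp before summing, as the GROUP BY TIMESTAMP above does, is what keeps the count from going negative. A minimal Python sketch, assuming calls are given as (end, duration_in_seconds) pairs with end as a datetime; the function name is hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def concurrent_calls(calls):
    """Return [(timestamp, lines_in_use)] for every start/end instant."""
    delta = defaultdict(int)
    for end, duration in calls:
        start = end - timedelta(seconds=duration)
        delta[start] += 1  # call begins
        delta[end] -= 1    # call ends

    result, running = [], 0
    for ts in sorted(delta):
        running += delta[ts]  # net change of all events at this instant
        result.append((ts, running))
    return result


calls = [
    (datetime(2012, 1, 25, 14, 5, 10), 65),
    (datetime(2012, 1, 25, 14, 8, 51), 1142),
    (datetime(2012, 1, 25, 14, 20, 36), 860),
]
for ts, lines in concurrent_calls(calls):
    print(ts, lines)
```

For a 0-second call the +1 and -1 land on the same key and cancel, so the running total never dips below zero.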
You can use an UNPIVOT (using a similar technique to my answer here):
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( END, LINE, CALLDURATION ) AS
SELECT CAST( TIMESTAMP '2012-01-25 14:05:10' AS DATE ), 6, 65 FROM DUAL UNION ALL
SELECT CAST( TIMESTAMP '2012-01-25 14:08:51' AS DATE ), 7, 1142 FROM DUAL UNION ALL
SELECT CAST( TIMESTAMP '2012-01-25 14:20:36' AS DATE ), 5, 860 FROM DUAL;
Query 1:
SELECT p.*,
SUM( status ) OVER ( ORDER BY dt, status DESC ) AS currentlyusedlines
FROM (
SELECT end - callduration / 86400 As dt,
t.*
FROM table_name t
)
UNPIVOT( dt FOR status IN ( dt As 1, end AS -1 ) ) p
ORDER BY dt, status DESC
Results:
| LINE | CALLDURATION | STATUS | DT | CURRENTLYUSEDLINES |
|------|--------------|--------|----------------------|--------------------|
| 7 | 1142 | 1 | 2012-01-25T13:49:49Z | 1 |
| 6 | 65 | 1 | 2012-01-25T14:04:05Z | 2 |
| 6 | 65 | -1 | 2012-01-25T14:05:10Z | 1 |
| 5 | 860 | 1 | 2012-01-25T14:06:16Z | 2 |
| 7 | 1142 | -1 | 2012-01-25T14:08:51Z | 1 |
| 5 | 860 | -1 | 2012-01-25T14:20:36Z | 0 |
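The other way to avoid negative counts, used in the UNPIVOT query, is to break ties so that starts sort before ends at the same instant (`status DESC` in the window's ORDER BY). SQLite has no UNPIVOT, so this sketch builds the start and end rows with UNION ALL instead. It runs through Python's sqlite3 module (window functions need SQLite 3.25 or later), stores times as 'YYYY-MM-DD HH:MM:SS' text, renames END to end_ts, and adds a hypothetical 0-second call to show the tie-break:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_calls (end_ts TEXT, line INTEGER, callduration INTEGER)")
conn.executemany(
    "INSERT INTO t_calls VALUES (?, ?, ?)",
    [
        ("2012-01-25 14:05:10", 6, 65),
        ("2012-01-25 14:08:51", 7, 1142),
        ("2012-01-25 14:20:36", 5, 860),
        ("2012-01-25 14:30:00", 8, 0),  # hypothetical zero-length call
    ],
)

QUERY = """
SELECT dt, line, status,
       -- running total; at equal dt, +1 rows sort (and are summed) before -1 rows
       SUM(status) OVER (ORDER BY dt, status DESC) AS currentlyusedlines
FROM (
    SELECT datetime(end_ts, '-' || callduration || ' seconds') AS dt, line, 1 AS status
    FROM t_calls
    UNION ALL
    SELECT end_ts, line, -1 FROM t_calls
)
ORDER BY dt, status DESC
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```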

SQLite strftime function to get grouped data by months

I have a table with the following structure and data:
I would like to get data grouped by month within a given date range (for example, from 2014-01-01 to 2014-12-31). Data may not be available for some months, but I still need those months in the result, with 0 as the value.
The result should have the following format:
MONTH | DIALS_CNT | APPT_CNT | CONVERS_CNT | CANNOT_REACH_CNT |
2014-01 | 100 | 50 | 20 | 30 |
2014-02 | 100 | 40 | 30 | 30 |
2014-03 | 0 | 0 | 0 | 0 |
etc.
where:
APPT_CNT = number of calls where call.result = APPT
CONVERS_CNT = number of calls where call.result = CONV_NO_APPT
CANNOT_REACH_CNT = number of calls where call.result = CANNOT_REACH
How can I do this using the strftime function?
Many thanks for any help or example.
SELECT Month,
(SELECT COUNT(*)
FROM MyTable
WHERE date LIKE Month || '%'
) AS Dials_Cnt,
(SELECT COALESCE(SUM(Call_Result = 'APPT'), 0)
FROM MyTable
WHERE date LIKE Month || '%'
) AS Appt_Cnt,
...
FROM (SELECT '2014-01' AS Month UNION ALL
SELECT '2014-02' UNION ALL
SELECT '2014-03' UNION ALL
...
SELECT '2014-12')
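Since the question asks about strftime, here is an alternative sketch that generates the months with a recursive CTE and joins on strftime('%Y-%m', ...), so months without data come out as 0. It assumes a hypothetical table calls(call_date TEXT, result TEXT) with dates stored as 'YYYY-MM-DD' text and, following the answer above, that DIALS_CNT counts every call; it runs through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2014-01-05", "APPT"),
        ("2014-01-17", "CANNOT_REACH"),
        ("2014-02-03", "CONV_NO_APPT"),
        ("2013-12-31", "APPT"),  # outside the range, must be ignored
    ],
)

MONTHLY_SQL = """
WITH RECURSIVE months(month_start) AS (
    SELECT date(:range_start)
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months
    WHERE date(month_start, '+1 month') <= date(:range_end)
)
SELECT strftime('%Y-%m', m.month_start) AS month,
       COUNT(c.call_date) AS dials_cnt,
       -- SUM over zero matched rows is NULL, so default it to 0
       COALESCE(SUM(c.result = 'APPT'), 0) AS appt_cnt,
       COALESCE(SUM(c.result = 'CONV_NO_APPT'), 0) AS convers_cnt,
       COALESCE(SUM(c.result = 'CANNOT_REACH'), 0) AS cannot_reach_cnt
FROM months m
LEFT JOIN calls c
       ON strftime('%Y-%m', c.call_date) = strftime('%Y-%m', m.month_start)
GROUP BY m.month_start
ORDER BY m.month_start
"""
rows = conn.execute(
    MONTHLY_SQL, {"range_start": "2014-01-01", "range_end": "2014-12-31"}
).fetchall()
for row in rows:
    print(row)
```

Generating the months on the SQL side avoids listing them by hand, and the LEFT JOIN keeps a row for every generated month even when no calls match.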