PostgreSQL timeseries query with outer join

In a PostgreSQL 9.5 database there is a table metrics_raw containing various metrics (the type column is varchar).
The types are, for example, TRA and RTC.
I'm executing the following SQL to get year-to-date monthly aggregations:
SELECT
    count(*),
    "ticks"."ts" AS "timestamp"
FROM
    "metrics_raw"
RIGHT OUTER JOIN
    generate_series('2016-01-01'::timestamp, '2016-10-10'::timestamp, '1 month'::interval) AS ticks(ts)
    ON "ticks"."ts" = date_trunc('month', "metrics_raw"."timestamp")
WHERE
    "metrics_raw"."type" = 'TRA' OR
    "metrics_raw"."type" IS NULL
GROUP BY "ticks"."ts"
ORDER BY "ticks"."ts"
The table contains some records of the TRA type (~10 records) and 1 record of the RTC type.
Executing the query for TRA I get 10 rows as expected, but for RTC I get only 7 rows. Stranger still, with no RTC metrics at all I also get 10 rows.
Where is the mistake?

Thanks #Nemeros, I fixed the query. The problem was the WHERE clause: for a month that contains only metrics of other types, the outer join produces rows with a non-matching type (neither the requested type nor NULL), so the whole tick is filtered out. Moving the type filter into a subquery applies it before the join:
SELECT
    "ticks"."ts" AS "timestamp"
FROM
    generate_series('2016-01-01'::timestamp, '2016-10-10'::timestamp, '1 month'::interval) AS ticks(ts)
LEFT OUTER JOIN
    (
        SELECT *
        FROM "metrics_raw"
        WHERE "metrics_raw"."type" = 'TRA'
    ) AS "metrics"
    ON "ticks"."ts" = date_trunc('month', "metrics"."timestamp")
GROUP BY "ticks"."ts"
ORDER BY "ticks"."ts"
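Note that the fixed query dropped the count. To restore a per-month count that is 0 for empty months, count a column from the outer-joined subquery instead of using count(*), since count(*) would count the NULL-extended row as 1. A minimal sketch along those lines, using the same tables as above:
SELECT
    "ticks"."ts" AS "timestamp",
    count("metrics"."type") AS "count" -- NULL-extended rows contribute 0
FROM
    generate_series('2016-01-01'::timestamp, '2016-10-10'::timestamp, '1 month'::interval) AS ticks(ts)
LEFT OUTER JOIN
    (SELECT * FROM "metrics_raw" WHERE "type" = 'TRA') AS "metrics"
    ON "ticks"."ts" = date_trunc('month', "metrics"."timestamp")
GROUP BY "ticks"."ts"
ORDER BY "ticks"."ts"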

Related

SQL - Grouping by Last Day of Quarter

I currently have a query running to average survey scores for agents. We use the date range of the LastDayOfTheQuarter and 180 days back to calculate these scores, and I ran into an issue for the current quarter.
One of my agents hasn't received any surveys in 2020, which causes the query not to pull the current LastDayOfQuarter and the 180 days back of results.
The code I am using:
SELECT
    `Agent`,
    U.`Position`,
    U.`BranchDescription`,
    ADDDATE(`LastDayOfQuarter`, -180) AS MinDate,
    `LastDayOfQuarter` AS MaxDate,
    COUNT(DISTINCT `Response ID`) AS SurveyCount,
    AVG(CASE WHEN `Question ID` = 'Q1_2' THEN `Answer Value` END) AS EngagedScore,
    AVG(CASE WHEN `Question ID` = 'Q1_3' THEN `Answer Value` END) AS KnowledgableScore,
    AVG(CASE WHEN `Question ID` = 'Q1_6' THEN `Answer Value` END) AS ValuedScore
FROM qualtrics_responses
LEFT JOIN date D
    ON D.`Date` = DATE(`End Date`)
LEFT JOIN `users` U
    ON U.`UserID` = `Agent ID`
WHERE `Agent` IS NOT NULL
    AND DATE(`End Date`) <= `LastDayOfQuarter`
    AND DATE(`End Date`) >= ADDDATE(`LastDayOfQuarter`, -180)
GROUP BY `Agent`, ADDDATE(`LastDayOfQuarter`, -180)
I know the issue is due to the way I am joining the dates: since he doesn't have a result in the current year, the End Date-to-Date join isn't grabbing the desired date range. I can't seem to come up with any alternatives. Any help is appreciated.
I make the assumption that the table date in your query is a calendar table that stores the starts and ends of the quarters (most likely with one row per date in the quarter; see the sketch after the notes below).
If so, you can solve this problem by rearranging the joins: first CROSS JOIN the users and the calendar table to generate all possible combinations, then bring in the surveys table with a LEFT JOIN:
SELECT
    U.UserID,
    U.Position,
    U.BranchDescription,
    D.LastDayOfQuarter - INTERVAL 180 DAY AS MinDate,
    D.LastDayOfQuarter AS MaxDate,
    COUNT(DISTINCT Q.ResponseID) AS SurveyCount,
    AVG(CASE WHEN Q.QuestionID = 'Q1_2' THEN Q.AnswerValue END) AS EngagedScore,
    AVG(CASE WHEN Q.QuestionID = 'Q1_3' THEN Q.AnswerValue END) AS KnowledgableScore,
    AVG(CASE WHEN Q.QuestionID = 'Q1_6' THEN Q.AnswerValue END) AS ValuedScore
FROM date D
CROSS JOIN users U
LEFT JOIN qualtrics_responses Q
    ON  Q.EndDate >= D.Date
    AND Q.EndDate <  D.Date + INTERVAL 1 DAY
    AND U.UserID = Q.AgentID
    AND Q.Agent IS NOT NULL
GROUP BY
    U.UserID,
    U.Position,
    U.BranchDescription,
    D.LastDayOfQuarter
Notes:
I adapted the date arithmetic; this assumes that you are using MySQL, as the syntax of the query suggests.
You should really qualify all the columns in the query by prefixing them with the alias of the table they belong to; this makes the query much easier to understand. I gave it a try, but you might need to review it.
All non-aggregated columns should appear in the GROUP BY clause (also see the comment from Eric); this is a requirement in most databases, and good practice anywhere.
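For reference, a rough sketch of the calendar table shape this answer assumes; the real table may well hold more columns:
-- Hypothetical shape of the `date` calendar table: one row per calendar date,
-- each carrying the last day of its quarter.
CREATE TABLE `date` (
    `Date`             DATE NOT NULL PRIMARY KEY,
    `LastDayOfQuarter` DATE NOT NULL
);
-- e.g. every date in Q1 2020 maps to 2020-03-31:
INSERT INTO `date` VALUES ('2020-01-01', '2020-03-31'), ('2020-01-02', '2020-03-31');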

Datetime SQL statement (Working in SQL Developer)

I'm new to the SQL scene, but I've started to gather some data that makes sense to me after learning a little about SQL Developer. However, I do need help with a query.
My goal:
To use the current criteria I have and select records only when the date-time value is within 5 minutes of the latest date-time. Here is my current SQL statement:
SELECT ABAMS.T_WORKORDER_HIST.LINE_NO AS Line,
ABAMS.T_WORKORDER_HIST.STATE AS State,
ASMBLYTST.V_SEQ_SERIAL_ALL.BUILD_DATE,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO_EXT,
ASMBLYTST.V_SEQ_SERIAL_ALL.UPD_REASON_CODE,
ABAMS.V_SERIAL_LINESET.LINESET_DATE AS "Lineset Time",
ABAMS.T_WORKORDER_HIST.SERIAL_NO AS ESN,
ABAMS.T_WORKORDER_HIST.ITEM_NO AS "Shop Order",
ABAMS.T_WORKORDER_HIST.CUST_NAME AS Customer,
ABAMS.T_ITEM_POLICY.PL_LOC_DROP_ZONE_NO AS PLDZ,
ABAMS.T_WORKORDER_HIST.CONFIG_NO AS Configuration,
ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN AS "Last Sta",
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_ASMBLY_LOC,
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_MES_LOC,
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_ASMBLY_TIME,
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_MES_TIME
FROM ABAMS.T_WORKORDER_HIST
LEFT JOIN ABAMS.V_SERIAL_LINESET
ON ABAMS.V_SERIAL_LINESET.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
LEFT JOIN ASMBLYTST.V_EDP_ENG_LAST_ABSN
ON ASMBLYTST.V_EDP_ENG_LAST_ABSN.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
LEFT JOIN ASMBLYTST.V_SEQ_SERIAL_ALL
ON ASMBLYTST.V_SEQ_SERIAL_ALL.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
LEFT JOIN ABAMS.T_ITEM_POLICY
ON ABAMS.T_ITEM_POLICY.ITEM_NO = ABAMS.T_WORKORDER_HIST.ITEM_NO
LEFT JOIN ABAMS.T_CUR_STATUS
ON ABAMS.T_CUR_STATUS.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
INNER JOIN ASMBLYTST.V_LAST_ENG_LOCATION
ON ASMBLYTST.V_LAST_ENG_LOCATION.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
WHERE ABAMS.T_WORKORDER_HIST.LINE_NO = 10
AND (ABAMS.T_WORKORDER_HIST.STATE = 'PROD'
OR ABAMS.T_WORKORDER_HIST.STATE = 'SCHED')
AND ASMBLYTST.V_SEQ_SERIAL_ALL.BUILD_DATE BETWEEN TRUNC(SysDate) - 10 AND TRUNC(SysDate) + 1
AND (ABAMS.V_SERIAL_LINESET.LINESET_DATE IS NOT NULL
OR ABAMS.V_SERIAL_LINESET.LINESET_DATE IS NULL)
AND (ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN < '1800'
OR ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN IS NULL)
ORDER BY ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN DESC Nulls Last,
ABAMS.V_SERIAL_LINESET.LINESET_DATE Nulls Last,
ASMBLYTST.V_SEQ_SERIAL_ALL.BUILD_DATE,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO_EXT
Here are some of the records I get from the table
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_ASMBLY_TIME
2018-06-14 01:28:25
2018-06-14 01:29:26
2018-06-14 01:27:30
2018-06-13 22:44:03
2018-06-14 01:28:45
2018-06-14 01:27:37
2018-06-14 01:27:41
What I essentially want is for
2018-06-13 22:44:03
to be excluded from the query, because it is not within the 5-minute window from the latest record, which in this data set is
2018-06-14 01:29:26
The one dynamic problem I seem to have is that the date-time values are constantly updating.
Any ideas?
Thank you!
Here are two different solutions; each uses a table called ASET.
ASET contains 20 records, 1 minute apart:
WITH
    aset (ttime, cnt) AS
    (SELECT systimestamp AS ttime, 1 AS cnt
     FROM DUAL
     UNION ALL
     SELECT ttime + INTERVAL '1' MINUTE AS ttime, cnt + 1 AS cnt
     FROM aset
     WHERE cnt < 20)
SELECT * FROM aset;
Now, using ASET for our data, the following query finds the maximum date in ASET and restricts the results to the six records within 5 minutes of that maximum:
SELECT *
FROM aset
WHERE ttime >= (SELECT MAX(ttime) FROM aset) - INTERVAL '5' MINUTE;
An alternative is to use an analytic function:
WITH bset AS
    (SELECT ttime, cnt, MAX(ttime) OVER () - ttime AS delta
     FROM aset)
SELECT *
FROM bset
WHERE delta <= INTERVAL '5' MINUTE
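As a quick sanity check against the sample timestamps from the question, the same pattern keeps the six rows near 2018-06-14 01:29:26 and drops 2018-06-13 22:44:03; a self-contained sketch:
-- Self-contained check using the sample LAST_ASMBLY_TIME values from the question.
WITH sample AS (
    SELECT TO_TIMESTAMP('2018-06-14 01:28:25', 'YYYY-MM-DD HH24:MI:SS') AS ttime FROM DUAL UNION ALL
    SELECT TO_TIMESTAMP('2018-06-14 01:29:26', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL UNION ALL
    SELECT TO_TIMESTAMP('2018-06-14 01:27:30', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL UNION ALL
    SELECT TO_TIMESTAMP('2018-06-13 22:44:03', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL UNION ALL
    SELECT TO_TIMESTAMP('2018-06-14 01:28:45', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL UNION ALL
    SELECT TO_TIMESTAMP('2018-06-14 01:27:37', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL UNION ALL
    SELECT TO_TIMESTAMP('2018-06-14 01:27:41', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL
)
SELECT ttime
FROM sample
WHERE ttime >= (SELECT MAX(ttime) FROM sample) - INTERVAL '5' MINUTE;
-- returns the six rows from 01:27:30 through 01:29:26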

Teradata spool space issue on running a sub query with Count

I am using the query below to calculate business days between two dates for all of the order numbers. Business days are already available in the Teradata table Common_WorkingCalendar. But I'm facing a spool space issue when I execute the query, even though I have ample space available in my data lab. The query needs to be optimized; I appreciate any inputs.
SELECT
    tx."OrderNumber",
    (SELECT COUNT(1)
     FROM Common_WorkingCalendar
     WHERE CalDate BETWEEN CAST(tx."TimeStamp" AS DATE) AND CAST(mf.ShipDate AS DATE)) AS BusDays
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
    ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
    ON mf.OrderNumber = ff.OrderNumber
This is a very inefficient way to get this count, because the correlated subquery results in a product join.
The recommended approach is to add a sequential number to your calendar that increases only on business days, calculated using SUM(CASE WHEN businessDay THEN 1 ELSE 0 END) OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING) (see the sketch after the query below); then it's just two joins, one for the start date and one for the end date.
If this calculation is needed a lot, you had better add a new column; otherwise you can do it on the fly:
WITH cte AS
(
    SELECT CalDate,
           -- as this table only contains business days, a plain row number is enough
           ROW_NUMBER() OVER (ORDER BY CalDate) AS DayNo
    FROM Common_WorkingCalendar
)
SELECT
    tx."OrderNumber",
    to_dt.DayNo - from_dt.DayNo AS BusDays
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
    ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
    ON mf.OrderNumber = ff.OrderNumber
JOIN cte AS from_dt
    ON from_dt.CalDate = CAST(tx."TimeStamp" AS DATE)
JOIN cte AS to_dt
    ON to_dt.CalDate = CAST(mf.ShipDate AS DATE)
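If your calendar instead holds every date plus a business-day flag, the running sum mentioned above replaces the row number; a sketch, where businessDay is a hypothetical indicator column:
-- Variant for a calendar with one row per date and a business-day flag
-- (businessDay = 1 is an assumed indicator column, not from the original post).
WITH cte AS
(
    SELECT CalDate,
           SUM(CASE WHEN businessDay = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING) AS DayNo
    FROM Common_WorkingCalendar
)
SELECT CalDate, DayNo
FROM cte;
The two joins on from_dt and to_dt then stay exactly as in the query above.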

JOIN on DATEPART month and year is causing extra rows

I have two tables that contain a date field. This date field is one of the JOIN causes that I would like to implement, but I only want to JOIN on the month and year, not the day. The # of records about triple when I attempt to do so. I'm guessing there is something wrong with my query? Or is this even possible? I'm using Postgres
SELECT a.load_date, a.mandt, a.vbeln, a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN tableB b
    ON  date_part('month'::text, a.erdat) = date_part('month'::text, b.gdatu)
    AND date_part('year'::text, a.erdat) = date_part('year'::text, b.gdatu)
EDIT: Here is my full code:
SELECT a.mandt, a.vbeln,
a.erdat, a.erzet, a.ernam, a.angdt, a.audat, a.vbtyp, a.trvog,
a.auart, a.submi, a.lifsk, a.faksk, a.netwr, a.waerk, a.vkorg, a.vtweg, a.spart,
a.vkgrp, a.vkbur, a.knumv, a.vdatu, a.vprgr, a.kalsm, a.vsbed, a.fkara, a.awahr,
a.bstnk, a.bstdk, a.telf1, a.kunnr, a.stafo, a.stwae, a.aedat, a.kvgr1,a.kvgr2,
a.kvgr3, a.kokrs, a.kkber, a.knkli, a.sbgrp, a.ctlpc, a.cmwae, a.cmfre, a.cmngv,
a.amtbl, a.hityp_pr, a.abrvw, a.vgbel, a.objnr, a.bukrs_vf, a.taxk1,a.xblnr,
a.vgtyp, a.abhod, a.abhov, a.stceg_l, a.landtx, a.fmbdat, a.vsnmr_v, a.handle,
a.yybcawv1, a.yybcawv2, a.yybcawv3, a.yyawv1dat, a.yyawv2dat, a.yybcawvc,
a.kvgr5, a.augru, a.autlf, a.bname, a.bnddt, a.bsark, a.cmnup, a.fiscalper,
a.fiscalyr, a.gwldt, a.ihrez, a.intind, a.intsum, a.rplnr, a.taxk2, a.yybabt,
a.yybemail, a.yybfax, a.yybname, a.yybphone, a.yyexporter, a.yypaypal_id,
a.yysd_projid, a.zone, a.zuonr, a.zz_campaign_id, a.zzedate, a.zzrev_cat_01,
a.zzrev_cat_02, a.zzrev_cat_03, a.zzrev_cat_04, a.zzrev_cat_05, a.zzrev_cat_06,
a.zzrev_cat_07, a.zzrev_cat_08, a.zzsdate, a.mahdt,
CASE
WHEN b.fcurr::text = 'USD'::text THEN a.netwr
WHEN b.fcurr::text = 'JPY'::text AND b.kurst::text = 'M'::text THEN a.netwr * b.ukurs / 10::numeric
WHEN b.fcurr::text = 'KRW'::text AND b.kurst::text = 'M'::text THEN a.netwr * b.ukurs / 10::numeric
WHEN b.kurst::text = 'M'::text THEN a.netwr * b.ukurs
ELSE a.netwr
END AS net_value_trans_currency_netwr
FROM src.sap_vbak a
JOIN src.sap_tcurr b
ON a.waerk::text = b.fcurr::text
AND date_part('MONTH'::text, a.erdat::timestamp with time zone) = date_part('MONTH'::text, b.gdatu::timestamp with time zone)
AND date_part('YEAR'::text, a.erdat::timestamp with time zone) = date_part('YEAR'::text, b.gdatu::timestamp with time zone);
I'm attempting to get currency conversions based on the dates (month and year only) in each of the tables. Some of the currency conversions are different (hence the CASE statement for the net_value_trans_currency_netwr field). I want the net_value_trans_currency_netwr field to be a new column that displays the currency conversion in USD. The original table has over 5 million rows; after the joins I end up with far more. From what I gather, I'm getting a full join. How can I do what I'm trying to do without the join creating more rows than needed?
You get duplicate rows because you are INNER JOINing on the month and year, which are not unique. This causes a cross join. For example, take these rows with dates:
Row  Date        Month  Year
1    01/01/2014  01     14
2    02/01/2014  01     14
The result of the above join has 4 rows, not 2:
1) Month from (1), Year from (1)
2) Month from (1), Year from (2)
3) Month from (2), Year from (1)
4) Month from (2), Year from (2)
If you want to avoid this, you need to include something else in the join that makes each match unique. Adding the day may help, but again, if you have more than one date recorded on the same day, you will get a duplicate. Have a think about what else you could include in the join.
Use date_trunc() to simplify the query:
SELECT a.load_date, a.mandt, a.vbeln, a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN tableB b
    ON date_trunc('month', a.erdat) = date_trunc('month', b.gdatu);
Plus, you probably want to restrict the join further; as written it is a limited cross join producing a Cartesian product per month. If you have 3 rows for March 2014 in tableA and 4 rows for March 2014 in tableB, you already produce 12 rows in the result.
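One common fix, sketched below under the assumption that tableB holds one conversion rate per day and you want a single rate per month (here the latest; in the full query you would also add the currency to the DISTINCT ON list and the join):
-- Pre-aggregate the rate table to one row per month so the join cannot multiply rows.
SELECT a.load_date, a.mandt, a.vbeln, a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN (
    SELECT DISTINCT ON (date_trunc('month', gdatu)) *
    FROM tableB
    ORDER BY date_trunc('month', gdatu), gdatu DESC -- latest rate in the month wins
) b ON date_trunc('month', a.erdat) = date_trunc('month', b.gdatu);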

TERADATA: Aggregate across multiple tables

Consider the following query, where aggregation happens across two tables, Sales and Promo, and the aggregate values are then used in a calculation.
SELECT
    sales.article_id,
    avg((sales.euro_value - ZEROIFNULL(promo.euro_value)) /
        NULLIFZERO(sales.qty - ZEROIFNULL(promo.qty)))
FROM
    ( SELECT
          sales.article_id,
          sum(sales.euro_value) AS euro_value,
          sum(sales.qty) AS qty
      FROM SALES_TABLE sales
      WHERE year >= 2011
      GROUP BY article_id
    ) sales
LEFT OUTER JOIN
    ( SELECT
          promo.article_id,
          sum(promo.euro_value) AS euro_value,
          sum(promo.qty) AS qty
      FROM PROMOTION_TABLE promo
      WHERE year >= 2011
      GROUP BY article_id
    ) promo
    ON sales.article_id = promo.article_id
GROUP BY sales.article_id;
Some notes on the query:
Both inner queries return a huge number of rows due to the large number of articles. Running EXPLAIN on Teradata, the inner queries themselves take very little time, but the join takes a long time.
Assume a primary key on article_id is present and both tables are partitioned by year.
The LEFT OUTER JOIN is used because the second table contains optional data.
So, can you suggest a better way of writing this query? Thanks for reading this far :)
Not really sure how the avg function got into the mix (after grouping, each derived table has only one row per article_id, so there is nothing to average), so I'm removing it.
SELECT article_id,
       (SUM(sales_value) - SUM(promo_value)) /
       NULLIFZERO(SUM(sales_qty) - SUM(promo_qty)) -- keep the original divide-by-zero guard
FROM (
    SELECT
        article_id,
        sum(euro_value) AS sales_value,
        sum(qty) AS sales_qty,
        0 AS promo_value,
        0 AS promo_qty
    FROM SALES_TABLE
    WHERE year >= 2011
    GROUP BY article_id
    UNION ALL
    SELECT
        article_id,
        0 AS sales_value,
        0 AS sales_qty,
        sum(euro_value) AS promo_value,
        sum(qty) AS promo_qty
    FROM PROMOTION_TABLE -- this branch must read the promo table, not SALES_TABLE
    WHERE year >= 2011
    GROUP BY article_id
) AS comb
GROUP BY article_id;