JOIN on DATEPART month and year is causing extra rows - sql

I have two tables that contain a date field. This date field is one of the JOIN conditions that I would like to implement, but I only want to JOIN on the month and year, not the day. The number of records roughly triples when I attempt to do so. I'm guessing there is something wrong with my query? Or is this even possible? I'm using Postgres.
SELECT a.load_date , a.mandt, a.vbeln,a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN tableB b
ON date_part('month'::text, a.erdat) = date_part('month'::text, b.gdatu)
AND date_part('year'::text, a.erdat) = date_part('year'::text, b.gdatu)
EDIT Here is my full code
SELECT a.mandt, a.vbeln,
a.erdat, a.erzet, a.ernam, a.angdt, a.audat, a.vbtyp, a.trvog,
a.auart, a.submi, a.lifsk, a.faksk, a.netwr, a.waerk, a.vkorg, a.vtweg, a.spart,
a.vkgrp, a.vkbur, a.knumv, a.vdatu, a.vprgr, a.kalsm, a.vsbed, a.fkara, a.awahr,
a.bstnk, a.bstdk, a.telf1, a.kunnr, a.stafo, a.stwae, a.aedat, a.kvgr1,a.kvgr2,
a.kvgr3, a.kokrs, a.kkber, a.knkli, a.sbgrp, a.ctlpc, a.cmwae, a.cmfre, a.cmngv,
a.amtbl, a.hityp_pr, a.abrvw, a.vgbel, a.objnr, a.bukrs_vf, a.taxk1,a.xblnr,
a.vgtyp, a.abhod, a.abhov, a.stceg_l, a.landtx, a.fmbdat, a.vsnmr_v, a.handle,
a.yybcawv1, a.yybcawv2, a.yybcawv3, a.yyawv1dat, a.yyawv2dat, a.yybcawvc,
a.kvgr5, a.augru, a.autlf, a.bname, a.bnddt, a.bsark, a.cmnup, a.fiscalper,
a.fiscalyr, a.gwldt, a.ihrez, a.intind, a.intsum, a.rplnr, a.taxk2, a.yybabt,
a.yybemail, a.yybfax, a.yybname, a.yybphone, a.yyexporter, a.yypaypal_id,
a.yysd_projid, a.zone, a.zuonr, a.zz_campaign_id, a.zzedate, a.zzrev_cat_01,
a.zzrev_cat_02, a.zzrev_cat_03, a.zzrev_cat_04, a.zzrev_cat_05, a.zzrev_cat_06,
a.zzrev_cat_07, a.zzrev_cat_08, a.zzsdate, a.mahdt,
CASE
WHEN b.fcurr::text = 'USD'::text THEN a.netwr
WHEN b.fcurr::text = 'JPY'::text AND b.kurst::text = 'M'::text THEN a.netwr * b.ukurs / 10::numeric
WHEN b.fcurr::text = 'KRW'::text AND b.kurst::text = 'M'::text THEN a.netwr * b.ukurs / 10::numeric
WHEN b.kurst::text = 'M'::text THEN a.netwr * b.ukurs
ELSE a.netwr
END AS net_value_trans_currency_netwr
FROM src.sap_vbak a
JOIN src.sap_tcurr b
ON a.waerk::text = b.fcurr::text
AND date_part('MONTH'::text, a.erdat::timestamp with time zone) = date_part('MONTH'::text, b.gdatu::timestamp with time zone)
AND date_part('YEAR'::text, a.erdat::timestamp with time zone) = date_part('YEAR'::text, b.gdatu::timestamp with time zone);
I'm attempting to get currency conversions based on the dates (month and year only) in each of the tables. Some of the currency conversions are different (see the CASE statement for the net_value_trans_currency_netwr field). I want the net_value_trans_currency_netwr field to be a new column that displays the value converted to USD. The original table has over 5 million rows. After the join I end up with far more rows. From what I gather, I'm getting a full join. How can I do what I'm trying to do without the join creating more rows than needed?

You get duplicate rows because you are INNER JOINing on the month and year, which are not unique. Within each month this behaves like a cross join, e.g.
Example rows with dates (both fall in the same month and year)
Date Month Year
1 01/01/2014 01 14
2 02/01/2014 01 14
If both tables contain these two rows, the join returns 4 rows, not 2:
1) Row (1) from the first table with row (1) from the second
2) Row (1) from the first table with row (2) from the second
3) Row (2) from the first table with row (1) from the second
4) Row (2) from the first table with row (2) from the second
To avoid this, you need to include something else in the join that makes each match unique. Adding the day may help, but if more than one row is recorded on the same day you will still get duplicates. Think about what else you could include in the join.

Use date_trunc() to simplify the query:
SELECT a.load_date, a.mandt, a.vbeln,a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN tableB b ON date_trunc('month', a.erdat)
= date_trunc('month', b.gdatu);
Plus, you probably want to restrict the join further. This is a limited cross join resulting in a Cartesian product. If you have 3 rows for March 2014 in tableA and 4 rows for March 2014 in tableB, you already produce 12 rows in the result.
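To make the row multiplication concrete, here is a minimal sketch that runs the month-level join against an in-memory SQLite database from Python. The sample dates are made up, strftime('%Y-%m', ...) stands in for Postgres date_trunc('month', ...), and picking the latest tableB row per month is only one hypothetical way to restrict the join, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, erdat TEXT);
    CREATE TABLE tableB (id INTEGER, gdatu TEXT);
    -- 3 rows for March 2014 in tableA (made-up sample data)
    INSERT INTO tableA VALUES (1, '2014-03-03'), (2, '2014-03-14'), (3, '2014-03-28');
    -- 4 rows for March 2014 in tableB (made-up sample data)
    INSERT INTO tableB VALUES (1, '2014-03-01'), (2, '2014-03-10'),
                              (3, '2014-03-20'), (4, '2014-03-31');
""")

# Month-level join: every tableA row pairs with every tableB row of that month.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM tableA a
    JOIN tableB b
      ON strftime('%Y-%m', a.erdat) = strftime('%Y-%m', b.gdatu)
""").fetchone()[0]

# Assumption: one tableB row per month is wanted, here the latest one.
one_per_month = conn.execute("""
    SELECT COUNT(*)
    FROM tableA a
    JOIN (SELECT strftime('%Y-%m', gdatu) AS ym, MAX(gdatu) AS gdatu
          FROM tableB
          GROUP BY ym) b
      ON strftime('%Y-%m', a.erdat) = b.ym
""").fetchone()[0]

print(joined, one_per_month)
```

Once tableB is reduced to a single row per month, the join returns exactly one row per tableA row; which row to keep per month is a business decision.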

Related

PostgreSQL - How to avoid too many JOINs?

We have a company_employee_count table tracking employee count by quarter. The table has a column for the previous quarter's count to quickly identify the quarters in which the employee count increased or decreased compared to the previous quarter. Now we need to write a SQL query to find the companies that have consistently increased their employee count over the last N quarters, i.e. if we pass the four quarters 2019_Q1, 2019_Q2, 2019_Q3, 2019_Q4, we want the companies that have an employee_count in all four quarters and whose count in each quarter is higher than in the previous one.
TABLE company_employee_count{
company_id
employee_count
quarter (Stored as 2019_Q1, 2019_Q2)
prev_quarter_employee_count
}
We are using postgresql.
I am currently using a query like this which has JOINs on each quarter data.
select * from
(select employee_count,company_id from company_employee_count where quarter='2019_Q4') as q4_19
inner join
(select employee_count,company_id from company_employee_count where quarter='2019_Q3') as q3_19
on q4_19.company_id=q3_19.company_id
inner join
(select employee_count,company_id from company_employee_count where quarter='2019_Q2') as q2_19
on q2_19.company_id=q3_19.company_id
inner join
(select employee_count,company_id from company_employee_count where quarter='2019_Q1') as q1_19
on q2_19.company_id=q1_19.company_id
where
q4_19.employee_count > q3_19.employee_count and
q3_19.employee_count > q2_19.employee_count and
q2_19.employee_count > q1_19.employee_count
I want to avoid JOINing quarterly data and be able to somehow leverage prev_quarter_employee_count.
Appreciate any help/suggestions.
Use conditional aggregation and a subquery:
select c.*
from (select company_id,
max(employee_count) filter (where quarter = '2019_Q4') as q4_19,
max(employee_count) filter (where quarter = '2019_Q3') as q3_19,
max(employee_count) filter (where quarter = '2019_Q2') as q2_19,
max(employee_count) filter (where quarter = '2019_Q1') as q1_19
from company_employee_count
where quarter in ('2019_Q1', '2019_Q2', '2019_Q3', '2019_Q4')
group by company_id
) c
where q4_19 > q3_19 and q3_19 > q2_19 and q2_19 > q1_19
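The conditional aggregation above can be tried end to end with a small sketch like the following. It uses Python's sqlite3 with made-up rows, and it swaps FILTER for the equivalent MAX(CASE WHEN ... END) form so it also runs on engines without FILTER support; the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company_employee_count (
        company_id INTEGER, employee_count INTEGER, quarter TEXT);
    -- Made-up data: company 1 grows every quarter, company 2 dips in Q2,
    -- company 3 has no row for 2019_Q1.
    INSERT INTO company_employee_count VALUES
        (1, 10, '2019_Q1'), (1, 12, '2019_Q2'), (1, 15, '2019_Q3'), (1, 20, '2019_Q4'),
        (2, 10, '2019_Q1'), (2,  9, '2019_Q2'), (2, 15, '2019_Q3'), (2, 20, '2019_Q4'),
        (3,  5, '2019_Q2'), (3,  6, '2019_Q3'), (3,  7, '2019_Q4');
""")

rows = conn.execute("""
    SELECT c.company_id
    FROM (SELECT company_id,
                 MAX(CASE WHEN quarter = '2019_Q4' THEN employee_count END) AS q4_19,
                 MAX(CASE WHEN quarter = '2019_Q3' THEN employee_count END) AS q3_19,
                 MAX(CASE WHEN quarter = '2019_Q2' THEN employee_count END) AS q2_19,
                 MAX(CASE WHEN quarter = '2019_Q1' THEN employee_count END) AS q1_19
          FROM company_employee_count
          WHERE quarter IN ('2019_Q1', '2019_Q2', '2019_Q3', '2019_Q4')
          GROUP BY company_id) c
    WHERE q4_19 > q3_19 AND q3_19 > q2_19 AND q2_19 > q1_19
    ORDER BY c.company_id
""").fetchall()

growing = [r[0] for r in rows]
print(growing)
```

A company with a missing quarter gets NULL for that column, so the comparison is not true and the company is dropped, which matches the requirement that all four quarters be present.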

SQL - Grouping by Last Day of Quarter

I currently have a query running to average survey scores for agents. We use the date range of the LastDayOfTheQuarter and 180 days back to calculate these scores. I ran into an issue for this current quarter.
One of my agents hasn't received any surveys in 2020 which is causing the query to not pull the current lastdayofquarter and 180 days back of results.
The code I am using:
SELECT
Agent,
U.Position,
U.BranchDescription,
(ADDDATE(LastDayOfQuarter, -180)) AS MinDate,
(LastDayOfQuarter) AS MaxDate,
COUNT(DISTINCT `Response ID`) as SurveyCount,
AVG(CASE WHEN `Question ID` = 'Q1_2' THEN `Answer Value` END) AS EngagedScore,
AVG(CASE WHEN `Question ID` = 'Q1_3' THEN `Answer Value` END) AS KnowledgableScore,
AVG(CASE WHEN `Question ID` = 'Q1_6' THEN `Answer Value` END) AS ValuedScore
FROM qualtrics_responses
LEFT JOIN date D
ON (D.`Date`) = (DATE(`End Date`))
LEFT JOIN `users` U
ON U.`UserID` = `Agent ID`
WHERE `Agent` IS NOT NULL
AND DATE(`End Date`) <= (`LastDayOfQuarter`)
AND DATE(`End Date`) >= (ADDDATE(`LastDayOfQuarter`, -180))
GROUP BY `Agent`, (ADDDATE(`LastDayOfQuarter`, -180))
I know the issue is due to the way I am joining the dates: since he doesn't have a result in the current year, the end-date-to-date join isn't grabbing the desired date range. I can't seem to come up with any alternatives. Any help is appreciated.
I make the assumption that table date in your query is a calendar table, that stores the starts and ends of the quarters (most likely with one row per date in the quarter).
If so, you can solve this problem by rearranging the joins: first cross join the users and the calendar table to generate all possible combinations, then bring in the surveys table with a left join:
SELECT
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter - interval 180 day AS MinDate,
D.LastDayOfQuarter AS MaxDate,
COUNT(DISTINCT Q.ResponseID) as SurveyCount,
AVG(CASE WHEN Q.QuestionID = 'Q1_2' THEN Q.AnswerValue END) AS EngagedScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_3' THEN Q.AnswerValue END) AS KnowledgableScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_6' THEN Q.AnswerValue END) AS ValuedScore
FROM date D
CROSS JOIN users U
LEFT JOIN qualtrics_responses Q
ON Q.EndDate >= D.Date
AND Q.EndDate < D.Date + interval 1 day
AND U.UserID = Q.AgentID
AND Q.Agent IS NOT NULL
GROUP BY
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter
Notes:
I adapted the date arithmetic - this assumes that you are using MySQL, as the syntax of the query suggests
You should really qualify all the columns in the query by prefixing them with the alias of the table they belong to; this makes the query much easier to understand. I gave it a try, but you might need to review that.
All non-aggregated columns should appear in the group by clause (also see the comment from Eric); this is a requirement in most databases, and good practice anywhere
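The idea of generating every user/period combination first and only then left-joining the surveys can be sketched as follows. This is a simplified illustration in Python's sqlite3, not the MySQL query itself: the quarters table (one row per quarter end), the column names and the sample rows are all assumptions, and the 180-day window from the question is applied directly in the join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE quarters (last_day TEXT);   -- hypothetical: one row per quarter end
    CREATE TABLE responses (agent_id INTEGER, end_date TEXT, score REAL);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO quarters VALUES ('2019-12-31'), ('2020-03-31');
    -- Bob has no survey inside the 180 days before 2020-03-31.
    INSERT INTO responses VALUES
        (1, '2019-11-05', 4.0), (1, '2020-02-10', 5.0), (2, '2019-08-15', 3.0);
""")

rows = conn.execute("""
    SELECT u.name,
           q.last_day,
           COUNT(r.agent_id) AS survey_count,
           AVG(r.score)      AS avg_score
    FROM quarters q
    CROSS JOIN users u                       -- every agent for every quarter
    LEFT JOIN responses r                    -- surveys are optional
           ON r.agent_id = u.user_id
          AND r.end_date <= q.last_day
          AND r.end_date >= date(q.last_day, '-180 days')
    GROUP BY u.user_id, u.name, q.last_day
    ORDER BY u.name, q.last_day
""").fetchall()

for row in rows:
    print(row)
```

Because the surveys come in through a LEFT JOIN, an agent without any survey in the window still gets a row for that quarter, with a count of 0 and a NULL average.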

YTD for the below query

I want to add a year-to-date component to this code. I have tried some other ways, but I am not getting what I would like to see. Can someone please help me revise this to include the YTD in addition to the month-to-date that is already there?
SELECT
COST__DESC,
ST.AD_SRV_MTN AS MONTH_OF_AD,
COUNT(DISTINCT CM.CM_NBR) AS CMS,
MEM_MO AS MBR_MTH,
CMS/MBR_MTH*1000 AS CMS_PER_1000
FROM XTR.FT_CM AS CM
JOIN XTR.FT_ST AS ST ON ST.CM_NBR = CM.CM_NBR
JOIN XTR.DIM_MED_CST AS MC ON ST.CST_CK = MC.CST_CK
JOIN XTR.DIM_AF AS AFF ON ST.PRO_CK = AFF.AFF_CK
JOIN XTR.DIM_ADJDCTN_STAT AS A_S ON ST.ADJDCTN_STAT_CK = A_S.ADJDCTN_STAT_CK
JOIN XTR.DIM_ADJ_OT AS OT ON ST.ADJ_CK = OT.ADJ_CK
LEFT JOIN
(SELECT
CALENDAR_YEAR_MONTH as YEAR_MO,
SUM(MBR.COUNT_NBR) as MEM_MO
FROM XTR.FT_MBR_MONTHS MBR
INNER JOIN DIM_MBR_C ON MBR.DB_MBR_CK = DIM_MBR_C.DB_MBR_CK
AND MBR.DATE_CK BETWEEN DIM_MBR_C.DB_eff_date_ck
AND DIM_MBR_C.DB_END_DATE_CK
INNER JOIN DIM_DATE DT ON ELI_DATE_CK = DT.DATE_CK
WHERE MBR.F_C_CK = 500058321 AND YEAR_MO >= 201701
GROUP BY 1) MM ON ST.AD_SRV_MTN = MM.YEAR_MO
WHERE ST.F_C_CK = 500058321 AND ST.ST_START_DATE_CK >= 20200101
AND ST.AD_SRV_MTN > 201912 AND MC.MED_DESC IN ('Er', 'IP')
AND ST.AD_SRV_MTN < ((EXTRACT (YEAR FROM CURRENT_DATE) *100) +
EXTRACT (MONTH FROM CURRENT_DATE))
GROUP BY 1,2,4
ORDER BY 1,2
Honestly, I don't really get your SQL and what is being counted, but you can work with dates quite easily in Teradata, as dates are stored (and can be used) internally as INTEGER. Just keep in mind that year 1900 counts as year 0 and that the format is YYYYMMDD.
So e.g. 16-Apr-2020 in format YYYYMMDD is 20200416, and if you take 1900 as 0 you end up with 1200416, which is the internal format. Just try SELECT CURRENT_DATE (INT);. So if you want to compare year numbers, you just have to divide by 10000.
With this you can implement YTD as SUM (CASE WHEN CURRENT_DATE/10000 = <YourDateField>/10000 THEN <YourKPI> else 0 END) as YourKPI_YTD. Counting can be done by SUM...THEN 1 ELSE 0 END....
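The integer arithmetic described above can be checked outside the database with a short Python sketch. The helper names and the sample rows are hypothetical; only the encoding (year minus 1900, then MMDD) comes from the answer.

```python
from datetime import date


def teradata_int(d: date) -> int:
    """Encode a date the way the answer describes: (year - 1900) * 10000 + MMDD."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


def same_year(a: date, b: date) -> bool:
    """Integer-divide by 10000 to strip month and day, then compare year numbers."""
    return teradata_int(a) // 10000 == teradata_int(b) // 10000


# Hypothetical (service date, KPI value) rows.
rows = [(date(2019, 12, 30), 5), (date(2020, 1, 15), 3), (date(2020, 4, 2), 4)]
today = date(2020, 4, 16)

# Mirrors SUM(CASE WHEN CURRENT_DATE/10000 = d/10000 THEN kpi ELSE 0 END).
ytd = sum(kpi if same_year(today, d) else 0 for d, kpi in rows)

print(teradata_int(today), ytd)
```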

Access SQL statement: if no entries are valid, return the last one

I am trying to get a SQL statement which solves the following issue.
I have a table "calendar" which includes only one column "date". This table has 12 entries for each month in 2019 (01.31.2019, 02.28.2019 and so on). The second table "values" (which I get from an ERP system) has three columns, "from", "to" and "amount" (e.g. 01.01.2019, 06.30.2019, 50 and 08.01.2019, 08.31.2019, 100).
I have this simple statement which checks which entry is valid on the specific date:
SELECT Calendar.Date, Values.From, Values.To, Values.Amount
FROM Calendar, [Values]
WHERE Calendar.Date >= Values.From
AND Calendar.Date <= Values.To;
There is no valid entry (in the table "values") for July, September, October, November and December.
In the case there is no valid entry the last entry should be used. In July it would be 50 and for September, October ... it would be 100.
I tried subquery and left joins, but I never got the wanted result.
Does anybody have an idea or, better, a solution for this issue? I appreciate any support.
I think that you are looking for an additional join on the Values table, that will return the last entry before the current date. When the first (LEFT) JOIN does not succeed, you can use the result returned by the second one.
To locate the last entry before the current date, we can use a NOT EXISTS condition with a correlated subquery.
SELECT
c.Date,
Nz(v.From, v1.From) AS [From],
Nz(v.To, v1.To) AS [To],
Nz(v.Amount, v1.Amount) AS [Amount]
FROM Calendar AS c
LEFT JOIN [Values] AS v
ON c.Date >= v.From AND c.Date <= v.To
LEFT JOIN [Values] AS v1
ON v1.To < c.Date
AND NOT EXISTS (
SELECT 1 FROM [Values] v2 WHERE v2.To < c.Date AND v2.To > v1.To
)
PS: it has long been good practice in SQL to avoid old-school implicit joins and always use explicit JOINs.
You can do it with a LEFT JOIN and a subquery to get the last amount:
SELECT c.Date, v.From, v.To,
Nz(
v.Amount,
(SELECT MAX([Values].Amount) FROM [Values] WHERE [Values].From =
(SELECT MAX([Values].From) FROM [Values] WHERE [Values].From <= c.Date))
) AS Amount
FROM Calendar AS c LEFT JOIN [Values] AS v
ON c.Date>=v.From AND c.Date<=v.To;
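The fallback logic can be exercised with the data from the question in a small sketch. It runs on SQLite via Python rather than Access, so COALESCE replaces Nz, the table and column names (calendar, vals, cal_date, from_date, to_date) are renamed stand-ins, and the fallback subquery uses ORDER BY ... LIMIT 1 instead of the nested MAX; treat it as an illustration of the idea, not as Access syntax.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (cal_date TEXT);
    CREATE TABLE vals (from_date TEXT, to_date TEXT, amount INTEGER);
    INSERT INTO vals VALUES ('2019-01-01', '2019-06-30', 50),
                            ('2019-08-01', '2019-08-31', 100);
""")

# One calendar row per month end of 2019, stored as ISO strings so that
# plain string comparison orders them correctly.
month_ends = [
    (f"2019-{m:02d}-{calendar.monthrange(2019, m)[1]:02d}",) for m in range(1, 13)
]
conn.executemany("INSERT INTO calendar VALUES (?)", month_ends)

rows = conn.execute("""
    SELECT c.cal_date,
           COALESCE(v.amount,
                    -- fallback: the most recent entry that ended before this date
                    (SELECT v2.amount
                     FROM vals v2
                     WHERE v2.to_date < c.cal_date
                     ORDER BY v2.to_date DESC
                     LIMIT 1)) AS amount
    FROM calendar c
    LEFT JOIN vals v
           ON c.cal_date >= v.from_date AND c.cal_date <= v.to_date
    ORDER BY c.cal_date
""").fetchall()

amounts = [amount for _, amount in rows]
print(amounts)
```

With the sample data, July falls back to 50 and September through December fall back to 100, as the question asks.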

Teradata spool space issue on running a sub query with Count

I am using the query below to calculate business days between two dates for all the order numbers. Business days are already available in the Teradata table Common_WorkingCalendar. But I'm facing a spool space issue when I execute the query. I have ample space available in my data lab. I need to optimize the query. Any input is appreciated.
SELECT
tx."OrderNumber",
(SELECT COUNT(1) FROM Common_WorkingCalendar
WHERE CalDate between Cast(tx."TimeStamp" as date) and Cast(mf.ShipDate as date)) as BusDays
from StoreFulfillment ff
inner join StoreTransmission tx
on tx.OrderNumber = ff.OrderNumber
inner join StoreMerchandiseFulfillment mf
on mf.OrderNumber = ff.OrderNumber
This is a very inefficient way to get this count which results in a product join.
The recommended approach is adding a sequential number to your calendar which increases only on business days (calculated using SUM(CASE WHEN businessDay THEN 1 ELSE 0 END) OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING)), then it's two joins, for the start date and the end date.
If this calculation is needed a lot, you had better add a new column; otherwise you can do it on the fly:
WITH cte AS
(
SELECT CalDate,
-- as this table only contains business days you can use this instead
ROW_NUMBER() OVER (ORDER BY CalDate) AS DayNo
FROM Common_WorkingCalendar
)
SELECT
tx."OrderNumber",
to_dt.DayNo - from_dt.DayNo AS BusDays
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
ON mf.OrderNumber = ff.OrderNumber
JOIN cte AS from_dt
ON from_dt.CalDate = Cast(tx."TimeStamp" AS DATE)
JOIN cte AS to_dt
ON to_dt.CalDate = Cast(mf.ShipDate AS DATE)
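For the variant where the calendar holds every date plus a business-day flag, the running-sum numbering can be sketched like this. It uses SQLite (3.25 or later, for window functions) through Python; the cal and orders tables, their columns and the sample dates are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical full calendar: Mon 2020-04-13 to Mon 2020-04-20,
    -- with the weekend flagged as non-business days.
    CREATE TABLE cal (cal_date TEXT, is_business_day INTEGER);
    INSERT INTO cal VALUES
        ('2020-04-13', 1), ('2020-04-14', 1), ('2020-04-15', 1), ('2020-04-16', 1),
        ('2020-04-17', 1), ('2020-04-18', 0), ('2020-04-19', 0), ('2020-04-20', 1);
    CREATE TABLE orders (order_number INTEGER, sent_date TEXT, ship_date TEXT);
    INSERT INTO orders VALUES (1, '2020-04-14', '2020-04-20'),
                              (2, '2020-04-13', '2020-04-15');
""")

rows = conn.execute("""
    WITH numbered AS (
        SELECT cal_date,
               -- running count that only increases on business days
               SUM(is_business_day) OVER (
                   ORDER BY cal_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS day_no
        FROM cal
    )
    SELECT o.order_number,
           to_dt.day_no - from_dt.day_no AS bus_days
    FROM orders o
    JOIN numbered AS from_dt ON from_dt.cal_date = o.sent_date
    JOIN numbered AS to_dt   ON to_dt.cal_date = o.ship_date
    ORDER BY o.order_number
""").fetchall()

print(rows)
```

Note that the difference does not count the start date itself, whereas the original BETWEEN-based count is inclusive; add 1 if the start day should be included and is a business day.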