PostgreSQL - How to avoid too many JOINs?

We have a company_employee_count table that tracks employee count by quarter. The table has a column for the previous quarter's count, so we can quickly identify
the quarters in which the employee count increased or decreased compared to the previous quarter. Now we need to write a SQL query to find
the companies that have consistently increased their employee count over the last N quarters. For example, if we pass the four quarters 2019_Q1, 2019_Q2, 2019_Q3, 2019_Q4,
we want the companies that have an employee_count in all four quarters and whose count in each quarter is higher than in the previous quarter.
TABLE company_employee_count{
company_id
employee_count
quarter (Stored as 2019_Q1, 2019_Q2)
prev_quarter_employee_count
}
We are using postgresql.
I am currently using a query like this which has JOINs on each quarter data.
select * from
(select employee_count,company_id from company_employee_count where quarter='2019_Q4') as q4_19
inner join
(select employee_count,company_id from company_employee_count where quarter='2019_Q3') as q3_19
on q4_19.company_id=q3_19.company_id
inner join
(select employee_count,company_id from company_employee_count where quarter='2019_Q2') as q2_19
on q2_19.company_id=q3_19.company_id
inner join
(select employee_count,company_id from company_employee_count where quarter='2019_Q1') as q1_19
on q2_19.company_id=q1_19.company_id
where
q4_19.employee_count > q3_19.employee_count and
q3_19.employee_count > q2_19.employee_count and
q2_19.employee_count > q1_19.employee_count
I want to avoid JOINing quarterly data and be able to somehow leverage prev_quarter_employee_count.
Appreciate any help/suggestions.

Use conditional aggregation and a subquery:
select c.*
from (select company_id,
             max(employee_count) filter (where quarter = '2019_Q4') as q4_19,
             max(employee_count) filter (where quarter = '2019_Q3') as q3_19,
             max(employee_count) filter (where quarter = '2019_Q2') as q2_19,
             max(employee_count) filter (where quarter = '2019_Q1') as q1_19
      from company_employee_count
      where quarter in ('2019_Q1', '2019_Q2', '2019_Q3', '2019_Q4')
      group by company_id
     ) c
where q4_19 > q3_19 and q3_19 > q2_19 and q2_19 > q1_19
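Since the question asks about leveraging prev_quarter_employee_count, here is a minimal sketch of that variant. It rests on assumptions not stated in the question: that there is at most one row per company and quarter, and that prev_quarter_employee_count holds the count of the immediately preceding quarter and is NULL when the company has no row for that quarter.

```sql
-- Sketch only: relies on the assumptions described above.
-- Q1 does not need to be read, because a non-NULL prev_quarter_employee_count
-- on the Q2 row is assumed to imply that a Q1 row exists.
select company_id
from company_employee_count
where quarter in ('2019_Q2', '2019_Q3', '2019_Q4')
group by company_id
-- all three quarters must be present and each must exceed its predecessor;
-- a NULL on either side of the comparison is not counted
having count(*) filter (where employee_count > prev_quarter_employee_count) = 3;
```

For N quarters the list would contain the last N - 1 quarters and the constant 3 would become N - 1.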

Related

SQL - Grouping by Last Day of Quarter

I currently have a query running to average survey scores for agents. We use the date range of the LastDayOfTheQuarter and 180 days back to calculate these scores. I ran into an issue for this current quarter.
One of my agents hasn't received any surveys in 2020 which is causing the query to not pull the current lastdayofquarter and 180 days back of results.
The code I am using:
SELECT
Agent,
U.Position,
U.BranchDescription,
(ADDDATE(LastDayOfQuarter, -180)) AS MinDate,
(LastDayOfQuarter) AS MaxDate,
COUNT(DISTINCT `Response ID`) as SurveyCount,
AVG(CASE WHEN `Question ID` = 'Q1_2' THEN `Answer Value` END) AS EngagedScore,
AVG(CASE WHEN `Question ID` = 'Q1_3' THEN `Answer Value` END) AS KnowledgableScore,
AVG(CASE WHEN `Question ID` = 'Q1_6' THEN `Answer Value` END) AS ValuedScore
FROM qualtrics_responses
LEFT JOIN date D
ON (D.`Date`) = (DATE(`End Date`))
LEFT JOIN `users` U
ON U.`UserID` = `Agent ID`
WHERE `Agent` IS NOT NULL
AND DATE(`End Date`) <= (`LastDayOfQuarter`)
AND DATE(`End Date`) >= (ADDDATE(`LastDayOfQuarter`, -180))
GROUP BY `Agent`, (ADDDATE(`LastDayOfQuarter`, -180))
I know the issue is due to the way I am joining the dates: since he doesn't have a result in the current year, the End Date to Date join isn't picking up the desired date range. I can't seem to come up with any alternatives. Any help is appreciated.
I make the assumption that table date in your query is a calendar table, that stores the starts and ends of the quarters (most likely with one row per date in the quarter).
If so, you can solve this problem by rearranging the joins: first cross join the users and the calendar table to generate all possible combinations, then bring in the surveys table with a left join:
SELECT
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter - interval 180 day AS MinDate,
D.LastDayOfQuarter AS MaxDate,
COUNT(DISTINCT Q.ResponseID) as SurveyCount,
AVG(CASE WHEN Q.QuestionID = 'Q1_2' THEN Q.AnswerValue END) AS EngagedScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_3' THEN Q.AnswerValue END) AS KnowledgableScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_6' THEN Q.AnswerValue END) AS ValuedScore
FROM date D
CROSS JOIN users U
LEFT JOIN qualtrics_responses Q
ON Q.EndDate >= D.Date
AND Q.EndDate < D.Date + interval 1 day
AND U.UserID = Q.AgentID
AND Q.Agent IS NOT NULL
GROUP BY
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter
Notes:
I adapted the date arithmetic; this assumes that you are using MySQL, as the syntax of the query suggests.
You should really qualify all the columns in the query by prefixing them with the alias of the table they belong to; this makes the query much easier to understand. I had a try at it, but you might need to review that.
All non-aggregated columns should appear in the group by clause (also see the comment from Eric); this is a requirement in most databases, and good practice anywhere.
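If the intent is to average every survey that falls within the 180 days up to each quarter end, a variant of the same idea is to cross join the users with the distinct quarter ends and match responses on the date window directly. This is only a sketch in MySQL syntax: the space-free column names (EndDate, AgentID, ResponseID, QuestionID, AnswerValue) follow the renaming used in the query above and are assumptions about the real schema.

```sql
-- Sketch only: column names are assumed, see the note above.
SELECT
    U.UserID,
    U.Position,
    U.BranchDescription,
    D.LastDayOfQuarter - INTERVAL 180 DAY AS MinDate,
    D.LastDayOfQuarter AS MaxDate,
    COUNT(DISTINCT Q.ResponseID) AS SurveyCount,
    AVG(CASE WHEN Q.QuestionID = 'Q1_2' THEN Q.AnswerValue END) AS EngagedScore
FROM (SELECT DISTINCT LastDayOfQuarter FROM `date`) D  -- one row per quarter end
CROSS JOIN users U
LEFT JOIN qualtrics_responses Q
    ON Q.AgentID = U.UserID
    -- responses inside the 180-day window ending on the quarter's last day
    AND Q.EndDate >= D.LastDayOfQuarter - INTERVAL 180 DAY
    AND Q.EndDate < D.LastDayOfQuarter + INTERVAL 1 DAY
GROUP BY
    U.UserID,
    U.Position,
    U.BranchDescription,
    D.LastDayOfQuarter;
```

Agents without any survey in a window still get a row, with a SurveyCount of 0 and NULL scores. In practice you would probably also filter D to the quarters you actually report on.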

Teradata spool space issue on running a sub query with Count

I am using the query below to calculate the business days between two dates for all order numbers. The business days are already available in the Teradata table Common_WorkingCalendar. However, I'm running into a spool space issue when I execute the query, even though I have ample space available in my data lab. I need to optimize the query and would appreciate any input.
SELECT
tx."OrderNumber",
(SELECT COUNT(1) FROM Common_WorkingCalendar
WHERE CalDate between Cast(tx."TimeStamp" as date) and Cast(mf.ShipDate as date)) as BusDays
from StoreFulfillment ff
inner join StoreTransmission tx
on tx.OrderNumber = ff.OrderNumber
inner join StoreMerchandiseFulfillment mf
on mf.OrderNumber = ff.OrderNumber
This is a very inefficient way to get the count, because it results in a product join.
The recommended approach is to add a sequential number to your calendar that increases only on business days (calculated using SUM(CASE WHEN businessDay THEN 1 ELSE 0 END) OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING)); then it's just two joins, one for the start date and one for the end date.
If this calculation is needed a lot, you had better add a new column; otherwise you can do it on the fly:
WITH cte AS
(
SELECT CalDate,
-- as this table only contains business days you can use this instead
ROW_NUMBER() OVER (ORDER BY CalDate) AS DayNo
FROM Common_WorkingCalendar
)
SELECT
tx."OrderNumber",
to_dt.DayNo - from_dt.DayNo AS BusDays
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
ON mf.OrderNumber = ff.OrderNumber
JOIN cte AS from_dt
ON from_dt.CalDate = Cast(tx."TimeStamp" AS DATE)
JOIN cte AS to_dt
ON to_dt.CalDate = Cast(mf.ShipDate AS DATE)
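For a calendar that holds every date plus a business-day flag, the running-sum numbering mentioned above might look like the following sketch. The table Full_Calendar and its column IsBusinessDay are hypothetical names used for illustration; they do not appear in the question.

```sql
-- Sketch only: Full_Calendar and IsBusinessDay are hypothetical.
WITH cal AS
(
    SELECT CalDate,
           -- DayNo only increases on business days
           SUM(CASE WHEN IsBusinessDay = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING) AS DayNo
    FROM Full_Calendar
)
SELECT
    tx.OrderNumber,
    to_dt.DayNo - from_dt.DayNo AS BusDays
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
    ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
    ON mf.OrderNumber = ff.OrderNumber
JOIN cal AS from_dt
    ON from_dt.CalDate = Cast(tx."TimeStamp" AS DATE)
JOIN cal AS to_dt
    ON to_dt.CalDate = Cast(mf.ShipDate AS DATE);
```

Note that the difference counts the business days after the start date up to and including the end date, whereas the original BETWEEN count also includes the start date; add 1 when the start date is itself a business day if you need the same number.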

Revenue year by year SQL Server query

I have the following query which provides me with the item and item details, values, rate and quantity across each location.
I am trying to get the yearly revenue based on the start and end dates. For example, if the chosen range is 2013-2015, the final result should have three columns: one for 2013 revenue, one for 2014 revenue and one for 2015 revenue.
I am a newbie and still not an expert in writing queries, but here is what I have currently:
SELECT
department,
item,
itemdesc,
qty1,
qty2,
rate_1,
rate_2,
SUM(mm.days*mm.rate*mm.qty)
FROM
items it
LEFT JOIN
(SELECT
i.days, i.rate, i.qty, ii.todate, ii.itemid
FROM
invoiceofitems ii
JOIN
invoices i on i.id = ii.id
WHERE
ii.todate BETWEEN #StartDate and #EndDate) mm ON mm.itemid = it.itemid
GROUP BY
department,
item,
itemdesc,
qty1, qty2,
rate_1, rate_2
ORDER BY
item
However, this does not provide me with a year to year aggregation of invoice revenue that I require.
I know this is possible to achieve via iterating through this. But how would I accomplish this and where would I start on this?
Would I need to know the start and end date of each year and iterate through that and then add a counter to the year until year= EndDate?
I'm extremely confused. Help would be appreciated.
I hope that PIVOT and YEAR help you to solve this problem (some columns are omitted):
;WITH SRC(department,item, ... , rate_2, yr, calculation) AS
(SELECT it.department, it.item, ..., it.rate_2, YEAR(ii.todate) as yr,
(i.days * i.rate *i.qty) as calculation
FROM items it
LEFT JOIN invoiceofitems ii ON ii.itemid = it.itemid
JOIN invoices i ON i.id = ii.id)
SELECT department,item, ..., [2013],[2014],[2015]
FROM SRC
PIVOT
(SUM(calculation) FOR yr IN ([2013],[2014],[2015])) PVT
The YEAR function returns only the year part of your date, which makes grouping easier. PIVOT then rotates the grouped data from rows to columns.
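The same result can also be produced without PIVOT, using conditional aggregation. This is a sketch of that alternative technique, not the query above: it keeps only department and item from the column list, and it uses LEFT JOIN for both tables so that items without invoices stay in the result with NULL revenue.

```sql
-- Sketch only: an alternative to PIVOT, with a reduced column list.
SELECT it.department,
       it.item,
       SUM(CASE WHEN YEAR(ii.todate) = 2013 THEN i.days * i.rate * i.qty END) AS revenue_2013,
       SUM(CASE WHEN YEAR(ii.todate) = 2014 THEN i.days * i.rate * i.qty END) AS revenue_2014,
       SUM(CASE WHEN YEAR(ii.todate) = 2015 THEN i.days * i.rate * i.qty END) AS revenue_2015
FROM items it
LEFT JOIN invoiceofitems ii ON ii.itemid = it.itemid
LEFT JOIN invoices i ON i.id = ii.id
GROUP BY it.department, it.item
ORDER BY it.item;
```

Either way, the list of years is fixed in the query text; a variable range of years would need dynamic SQL.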

JOIN on DATEPART month and year is causing extra rows

I have two tables that contain a date field. This date field is one of the JOIN conditions that I would like to use, but I only want to JOIN on the month and year, not the day. The number of records roughly triples when I attempt to do so. I'm guessing there is something wrong with my query? Or is this even possible? I'm using Postgres.
SELECT a.load_date , a.mandt, a.vbeln,a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN tableB b
ON date_part('month'::text, a.erdat) = date_part('month'::text, b.gdatu)
AND date_part('year'::text, a.erdat) = date_part('year'::text, b.gdatu)
EDIT Here is my full code
SELECT a.mandt, a.vbeln,
a.erdat, a.erzet, a.ernam, a.angdt, a.audat, a.vbtyp, a.trvog,
a.auart, a.submi, a.lifsk, a.faksk, a.netwr, a.waerk, a.vkorg, a.vtweg, a.spart,
a.vkgrp, a.vkbur, a.knumv, a.vdatu, a.vprgr, a.kalsm, a.vsbed, a.fkara, a.awahr,
a.bstnk, a.bstdk, a.telf1, a.kunnr, a.stafo, a.stwae, a.aedat, a.kvgr1,a.kvgr2,
a.kvgr3, a.kokrs, a.kkber, a.knkli, a.sbgrp, a.ctlpc, a.cmwae, a.cmfre, a.cmngv,
a.amtbl, a.hityp_pr, a.abrvw, a.vgbel, a.objnr, a.bukrs_vf, a.taxk1,a.xblnr,
a.vgtyp, a.abhod, a.abhov, a.stceg_l, a.landtx, a.fmbdat, a.vsnmr_v, a.handle,
a.yybcawv1, a.yybcawv2, a.yybcawv3, a.yyawv1dat, a.yyawv2dat, a.yybcawvc,
a.kvgr5, a.augru, a.autlf, a.bname, a.bnddt, a.bsark, a.cmnup, a.fiscalper,
a.fiscalyr, a.gwldt, a.ihrez, a.intind, a.intsum, a.rplnr, a.taxk2, a.yybabt,
a.yybemail, a.yybfax, a.yybname, a.yybphone, a.yyexporter, a.yypaypal_id,
a.yysd_projid, a.zone, a.zuonr, a.zz_campaign_id, a.zzedate, a.zzrev_cat_01,
a.zzrev_cat_02, a.zzrev_cat_03, a.zzrev_cat_04, a.zzrev_cat_05, a.zzrev_cat_06,
a.zzrev_cat_07, a.zzrev_cat_08, a.zzsdate, a.mahdt,
CASE
WHEN b.fcurr::text = 'USD'::text THEN a.netwr
WHEN b.fcurr::text = 'JPY'::text AND b.kurst::text = 'M'::text THEN a.netwr * b.ukurs / 10::numeric
WHEN b.fcurr::text = 'KRW'::text AND b.kurst::text = 'M'::text THEN a.netwr * b.ukurs / 10::numeric
WHEN b.kurst::text = 'M'::text THEN a.netwr * b.ukurs
ELSE a.netwr
END AS net_value_trans_currency_netwr
FROM src.sap_vbak a
JOIN src.sap_tcurr b
ON a.waerk::text = b.fcurr::text
AND date_part('MONTH'::text, a.erdat::timestamp with time zone) = date_part('MONTH'::text, b.gdatu::timestamp with time zone)
AND date_part('YEAR'::text, a.erdat::timestamp with time zone) = date_part('YEAR'::text, b.gdatu::timestamp with time zone);
I'm attempting to get currency conversions based on the dates (month and year only) in each of the tables. Some of the currency conversions are different (see the CASE expression for the net_value_trans_currency_netwr field). I want the net_value_trans_currency_netwr field to be a new row that displays the currency conversion in USD. The original table has over 5 million rows. After the join I end up with far more rows. From what I gather, I'm getting a full join. How can I do what I'm trying to do without the join creating more rows than needed?
You get duplicate rows because you are INNER JOINing on the month and year, which are not unique. Within each month this behaves like a cross join, e.g.
Example rows with dates:
Row  Date        Month  Year
1    01/01/2014  01     14
2    02/01/2014  01     14
The result of the above join has 4 rows, not 2!
1) Month from (1) Year from (1)
2) Month from (1) Year from (2)
3) Month from (2) Year from (1)
4) Month from (2) Year from (2)
If you want to avoid this, you need to include something else in the join that makes each match unique. Adding the day may help, but if you have more than one row recorded on the same day you will again get duplicates. Have a think about what else you could include in the join.
Use date_trunc() to simplify the query:
SELECT a.load_date, a.mandt, a.vbeln,a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN tableB b ON date_trunc('month', a.erdat)
= date_trunc('month', b.gdatu);
Plus, you probably want to restrict the join further. As written, it is a limited cross join that produces a Cartesian product within each month: if you have 3 rows for March 2014 in tableA and 4 rows for March 2014 in tableB, you already get 12 rows in the result.
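One way to restrict the join is to reduce tableB to a single row per month before joining, so that each row of tableA matches at most once. The sketch below uses PostgreSQL's DISTINCT ON and keeps the row with the latest gdatu in each month; which row should represent a month is an assumption here and depends on your data.

```sql
-- Sketch only: keeping the latest gdatu per month is an assumed rule.
SELECT a.load_date, a.mandt, a.vbeln, a.posnr, a.matnr, b.tfed
FROM tableA a
JOIN (
    -- one row per month: the first row in each month's ORDER BY group
    SELECT DISTINCT ON (date_trunc('month', gdatu))
           date_trunc('month', gdatu) AS month_start,
           tfed
    FROM tableB
    ORDER BY date_trunc('month', gdatu), gdatu DESC
) b ON date_trunc('month', a.erdat) = b.month_start;
```

For the currency table in the full query, the DISTINCT ON list and the ORDER BY would also need to include the currency and rate type columns (fcurr and kurst), so that one rate per currency, rate type and month is kept.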

TERADATA: Aggregate across multiple tables

Consider the following query where aggregation happens across two tables: Sales and Promo and the aggregate values are again used in a calculation.
SELECT
sales.article_id,
avg((sales.euro_value - ZEROIFNULL(promo.euro_value)) / NULLIFZERO(sales.qty - ZEROIFNULL(promo.qty)))
FROM
( SELECT
sales.article_id,
sum(sales.euro_value) AS euro_value,
sum(sales.qty) AS qty
from SALES_TABLE sales
where year >= 2011
group by article_id
) sales
LEFT OUTER JOIN
( SELECT
promo.article_id,
sum(promo.euro_value) AS euro_value,
sum(promo.qty) AS qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) promo
ON sales.article_id = promo.article_id
GROUP BY sales.article_id;
Some notes on the query:
Both inner queries return a huge number of rows because of the large number of articles. According to EXPLAIN on Teradata, the inner queries themselves take very little time, but the join takes a long time.
Assume a primary key on article_id is present and both tables are partitioned by year.
A Left Outer Join is used because the second table contains optional data.
So, can you suggest a better way of writing this query? Thanks for reading this far :)
Not really sure how the avg function got into the mix, so I'm removing it.
SELECT article_id,
(SUM(sales_value) - SUM(promo_value)) /
NULLIFZERO(SUM(sales_qty) - SUM(promo_qty)) -- avoid division by zero, as in the original query
FROM (
SELECT
article_id,
sum(euro_value) AS sales_value,
sum(qty) AS sales_qty,
0 AS promo_value,
0 AS promo_qty
from SALES_TABLE sales
where year >= 2011
group by article_id
UNION ALL
SELECT
article_id,
0 AS sales_value,
0 AS sales_qty,
sum(euro_value) AS promo_value,
sum(qty) AS promo_qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) AS comb
GROUP BY article_id;