Presto SQL full outer join - sql

I have a full outer join that isn't behaving the way I expected.
I'm joining on 4 columns. However, one of the columns has some blank values, which causes some rows to come up unjoined. When I take this column out, the query works. However, I need this column in the report because it's the level we report the numbers at.
Query in question (the column that doesn't work is OM.line_item_id):
SELECT DBM.dated, DBM.line_item, DBM.line_item_id, DBM.insertion_order, DBM.insertion_order_id, DBM.device_type, DBM.market, DBM.impressions, DBM.clicks, DBM.amount_spent_EUR, OM.orders, OM.revenue, OM.device
FROM
(
SELECT
DATE_FORMAT(DATE_PARSE(date,'%Y/%m/%d'),'%Y-%m-%d') AS dated, line_item, line_item_id, insertion_order, insertion_order_id, device_type, trim(SPLIT_PART(insertion_order,'|',3)) AS market, cast(impressions as double) as impressions, cast(clicks as double) as clicks, CAST(media_cost_advertiser_currency AS DOUBLE)*1.15 AS amount_spent_EUR
FROM ralph_lauren_google_sheet_dbm_data
WHERE dated >= '2019-03-31'
AND dated <= {{days_ago 1}}
GROUP BY 1,2,3,4,5,6,7,8,9,10
)DBM
FULL outer JOIN
(
SELECT dated, line_item_id, device, market, sum(orders) as orders, sum(revenue)+sum(shipping_revenue)-sum(coupon_discount) as revenue
FROM
(
select
dated,
utm_content_v21 as line_item_id,
order_currency_code_v33_evar33,
case lower(mobile_device_type)
when 'other' then 'Desktop'
when 'tablet' then 'Tablet'
when 'mobile phone' then 'Smart Phone'
else 'Other'
End as device,
case geosegmentation_countries
when 'united kingdom' then 'UK'
when 'germany' then 'DE'
when 'france' then 'FR'
when 'italy' then 'IT'
when 'spain' then 'ES'
else 'other'
end as market,
sum(cast(orders as bigint))as orders,
case
WHEN lower(order_currency_code_v33_evar33) LIKE '%gbp%' THEN sum(TRY_CAST(revenue AS DOUBLE)*1.15)
ELSE Sum(TRY_CAST(revenue AS DOUBLE)*1)
END as revenue,
CASE
WHEN lower(order_currency_code_v33_evar33) LIKE '%gbp%' THEN sum(TRY_CAST(order_level_shipping_revenue_e62_event62 AS DOUBLE)*1.15)
ELSE sum(TRY_CAST(order_level_shipping_revenue_e62_event62 AS DOUBLE)*1)
END as shipping_revenue,
CASE
WHEN lower(order_currency_code_v33_evar33) LIKE '%gbp%' THEN sum(TRY_CAST(order_level_coupon_discount_e77_event77 AS DOUBLE)*1.15)
ELSE sum(TRY_CAST(order_level_coupon_discount_e77_event77 AS DOUBLE)*1)
END as coupon_discount
from ralph_lauren_ftp_all_eu_markets_ltc
WHERE dated >= '2019-03-31'
AND dated <= {{days_ago 1}}
and last_touch_channel like 'Retargeting'
and lower(utm_medium_v21) not like '%fbig%'
and cast(orders as bigint) > 0
group by
1,2,3,4,5
)
GROUP BY
1,2,3 ,4-- revenue numbers are getting duplicated for some reason
)OM
ON
DBM.dated = OM.dated AND DBM.line_item_id = OM.line_item_id and DBM.device_type = OM.device AND DBM.market = OM.market
Wouldn't a full outer join still match the rows when the other three columns match?
Thanks
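A join only pairs rows when every condition in the ON clause is true, and a comparison against NULL is never true, so rows whose line_item_id is NULL stay unmatched even when the other keys agree. Below is a minimal sketch of that behaviour, assuming the blanks are NULLs; it uses Python's sqlite3 with a LEFT JOIN and made-up tables `dbm` and `om` rather than the real Presto tables. Presto also documents a null-safe comparison, IS NOT DISTINCT FROM, which may be worth trying in place of the COALESCE shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbm (dated TEXT, line_item_id TEXT, impressions INTEGER);
CREATE TABLE om  (dated TEXT, line_item_id TEXT, orders INTEGER);
INSERT INTO dbm VALUES ('2019-04-01', '101', 500), ('2019-04-01', NULL, 300);
INSERT INTO om  VALUES ('2019-04-01', '101', 4),   ('2019-04-01', NULL, 2);
""")

# Plain equality: the NULL-keyed row never satisfies the ON clause.
strict = conn.execute("""
    SELECT dbm.line_item_id, dbm.impressions, om.orders
    FROM dbm
    LEFT JOIN om
      ON dbm.dated = om.dated
     AND dbm.line_item_id = om.line_item_id
    ORDER BY dbm.impressions
""").fetchall()

# Mapping NULL keys to a sentinel ('' here, an assumption) lets them pair up.
normalised = conn.execute("""
    SELECT dbm.line_item_id, dbm.impressions, om.orders
    FROM dbm
    LEFT JOIN om
      ON dbm.dated = om.dated
     AND COALESCE(dbm.line_item_id, '') = COALESCE(om.line_item_id, '')
    ORDER BY dbm.impressions
""").fetchall()

print(strict)      # the NULL-keyed row has no orders attached
print(normalised)  # the NULL-keyed row now picks up its orders
```

If the blanks are empty strings on one side and NULLs on the other, the same COALESCE normalisation applies; if they are empty strings on both sides, plain equality already matches them and the cause is elsewhere.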

Related

Calculate the percentage change with respect to the previous year, quarter and category

Using ORACLE SQL, I have a query that gives me the output in the following table(posting an image). However, I need to figure out a way to get the percentage change between year, quarter and metric in an additional column.
Example: Year 2022, Q1, apple against Year 2021, Q1, apple.
I'm relatively new to SQL so I'm not sure if I need to sort the output differently to use the function LEAD, or if there is a better way to do it in general.
My current query with my attempt at the percent change with lead (that didn't work) is like this:
`
SELECT s.YEAR
, t.quarter
, CASE WHEN fruits IN ('tangerine','lemon') THEN 'orange'
ELSE fruits
END metric
, COUNT(DISTINCT s.ID) AS COUNT
-- , ROUND((COUNT(UNIQUE s.ID) - LEAD(COUNT(UNIQUE s.ID)) OVER (ORDER BY t.quarter))/COUNT(UNIQUE s.ID)*100,2) pct_change
FROM s
JOIN sc
ON S.KEY = SC.KEY
JOIN c
ON SC.KEY = C.KEY
JOIN t
ON s.quarter = t.quarter
WHERE S.YEAR BETWEEN '2021' AND '2022'
AND s.quarter IN ('1','2','3')
GROUP BY S.YEAR
, t.quarter
,CASE WHEN fruits IN ('tangerine','lemon') THEN 'orange'
ELSE fruits
END
`
WITH my_query AS (
...your original query as above goes here...
)
SELECT curr.year, curr.quarter, curr.metric
, curr.count AS current_count
-- no prior-year row, or a prior count of 0, gives NULL instead of a divide-by-zero error
, CASE WHEN prev.count IS NULL OR prev.count = 0 THEN NULL
ELSE (curr.count - prev.count) / prev.count * 100 END AS pct_chg
-- the alias is prev rather than prior because PRIOR is a reserved word in Oracle
FROM my_query curr LEFT OUTER JOIN my_query prev
ON curr.year - 1 = prev.year AND
curr.quarter = prev.quarter AND
curr.metric = prev.metric
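The self-join idea can be tried on a toy table. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 rather than Oracle, and the table `my_query` with columns `year`, `quarter`, `metric`, `cnt` is a hypothetical stand-in for the aggregated query result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_query (year INTEGER, quarter INTEGER, metric TEXT, cnt INTEGER);
INSERT INTO my_query VALUES
  (2021, 1, 'apple', 40), (2022, 1, 'apple', 50),
  (2021, 1, 'orange', 0), (2022, 1, 'orange', 8),
  (2022, 2, 'apple', 30);
""")

# Pair each 2022 row with the same quarter and metric one year earlier.
rows = conn.execute("""
    SELECT curr.year, curr.quarter, curr.metric, curr.cnt,
           CASE WHEN prev.cnt IS NULL OR prev.cnt = 0 THEN NULL
                ELSE ROUND((curr.cnt - prev.cnt) * 100.0 / prev.cnt, 2)
           END AS pct_chg
    FROM my_query curr
    LEFT JOIN my_query prev
      ON curr.year - 1 = prev.year
     AND curr.quarter = prev.quarter
     AND curr.metric = prev.metric
    WHERE curr.year = 2022
    ORDER BY curr.quarter, curr.metric
""").fetchall()

for row in rows:
    print(row)
```

Rows with no prior-year match, or a prior count of zero, come back with a NULL percentage rather than an error.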

SQL Case When Slowing Down Query

What I'm looking to do is quantify the total value of purchases and the number of months in which a purchase was made within three different timeframes by account. I only want to look at accounts who made a purchase between 1-1-2020 and 4-1-2021.
I'm wondering if there is a more streamlined way to pull in the fields I'm creating using CASE WHEN below (maybe through a series of queries to create the calculations and the left joining?). This query is taking extremely long to pull back, so I'd like to enhance this code where I can. All of my code and desired output is listed below. Thank you!
Creating a temporary table to pull account numbers:
DROP TABLE IF EXISTS #accounts
SELECT DISTINCT s.account_no, c.code, c.code_desc
INTO #accounts
FROM sales AS s
LEFT JOIN customer AS c ON s.account_no = c.account_no
WHERE s.tran_date BETWEEN '2020-01-01' AND '2021-04-01'
GROUP BY s.account_no, c.code, c.code_desc;
Confirming row counts:
SELECT COUNT (*)
FROM #accounts;
Creating Sales and Sales period count columns for three timeframes:
SELECT
s.account_no, c.code, c.code_desc,
SUM(CASE
WHEN s.tran_date BETWEEN '2020-01-01' AND '2021-04-01'
THEN VALUE_USD
END) AS Total_Spend_Pre,
SUM(CASE
WHEN s.tran_date BETWEEN '2021-04-01' AND '2022-03-31'
THEN VALUE_USD
END) Total_Spend_During,
SUM(CASE
WHEN s.tran_date > '2022-04-01'
THEN VALUE_USD
END) Total_Spend_Post,
COUNT(DISTINCT CASE WHEN s.tran_date BETWEEN '2020-01-01' AND '2021-04-01' THEN CONCAT(s.bk_month, s.bk_year) END) Pre_Periods,
COUNT(DISTINCT CASE WHEN s.tran_date BETWEEN '2021-04-01' AND '2022-03-31' THEN CONCAT(s.bk_month, s.bk_year) END) During_Periods,
COUNT(DISTINCT CASE WHEN s.tran_date > '2022-04-01' THEN CONCAT(s.bk_month, s.bk_year) END) Post_Periods
FROM
sales AS s
LEFT JOIN
customer AS c ON s.account_no = c.account_no
WHERE
c.account_no IN (SELECT DISTINCT account_no
FROM #accounts)
GROUP BY
s.account_no, c.code, c.code_desc;
Desired output:
account_no  code  code_desc  Total_Spend_Pre  Total_Spend_During  Total_Spend_Post  Pre_Periods  During_Periods  Post_Periods
25          1234  OTHER      1000             2005                500               2            14              5
11          5678  PC         500              100                 2220              5            11              2
You can join your date ranges to the dataset and tag each row with its period, as below. This gives one row per period for each group (up to 3 rows). If you need them in a single row, apply a PIVOT over the result.
;With DateRanges AS (
SELECT CAST('2020-01-01' AS DATE) StartDate, CAST('2021-04-01' AS DATE) EndDate, 'Pre' Tag UNION
SELECT '2021-04-01', '2022-03-31', 'During' UNION
SELECT '2022-04-01', Null, 'Post'
)
SELECT s.account_no, c.code, c.code_desc, d.Tag,
SUM(VALUE_USD) AS Total_Spend,
COUNT(DISTINCT CONCAT(s.bk_month, s.bk_year)) RecordCount
FROM sales as s
-- tag each sale with the period it falls in; an open-ended period has a NULL EndDate
INNER JOIN DateRanges d ON s.tran_date BETWEEN d.StartDate AND ISNULL(d.EndDate, s.tran_date)
LEFT JOIN customer as c
ON s.account_no = c.account_no
WHERE c.account_no IN (SELECT DISTINCT account_no FROM #accounts)
GROUP BY s.account_no, c.code, c.code_desc, d.Tag;
with [cte_accountActivityPeriods] as (
select [PeriodOrdinal] = 1, [PeriodName] = 'Total Spend Pre', [PeriodStart] = convert(date,'2020-01-01',23) , [PeriodFinish] = convert(date,'2021-03-31',23) union
select [PeriodOrdinal] = 2, [PeriodName] = 'Total Spend During', [PeriodStart] = convert(date,'2021-04-01',23) , [PeriodFinish] = convert(date,'2022-03-31',23) union
select [PeriodOrdinal] = 3, [PeriodName] = 'Total Spend Post', [PeriodStart] = convert(date,'2022-04-01',23) , [PeriodFinish] = convert(date,'9999-12-31',23)
)
, [cte_allsalesForActivityPeriod] as (
SELECT s.account_no, s.bk_month, s.bk_year, p.[PeriodOrdinal], s.tran_date, s.value_usd
FROM sales as s
inner join [cte_accountActivityPeriods] p
on s.[tran_date] between p.[PeriodStart] and p.[PeriodFinish]
)
, [cte_uniqueAccounts] as ( /*Unique and qualifying Accounts*/
select distinct accs.[account_no] from [cte_allsalesForActivityPeriod]
inner join #accounts accs on accs.[account_no] = [cte_allsalesForActivityPeriod].[account_no]
)
, [cte_AllSalesAggregatedByPeriod] as (
select account_no, [PeriodOrdinal], bk_month, bk_year, [PeriodTotalSpend] = sum([value_usd])
from [cte_allsalesForActivityPeriod]
group by account_no, [PeriodOrdinal], bk_month, bk_year
)
, [cte_PeriodAnalysis] as (
select account_no, [PeriodOrdinal], [ActivePeriods] = count(distinct concat(bk_month, bk_year))
from [cte_AllSalesAggregatedByPeriod]
group by account_no, [PeriodOrdinal]
)
, [cte_pivot_clumsily] as (
/* Collapse to one row per account: sum the monthly spend and pick up the period count for each period */
select [cte_uniqueAccounts].[account_no]
, [Total_Spend_Pre] = sum(case when [SaleVal].[PeriodOrdinal] in (1) then [SaleVal].[PeriodTotalSpend] else 0 end)
, [Total_Spend_During] = sum(case when [SaleVal].[PeriodOrdinal] in (2) then [SaleVal].[PeriodTotalSpend] else 0 end)
, [Total_Spend_Post] = sum(case when [SaleVal].[PeriodOrdinal] in (3) then [SaleVal].[PeriodTotalSpend] else 0 end)
, [Pre_Periods] = max(case when [SalePrd].[PeriodOrdinal] in (1) then [SalePrd].[ActivePeriods] else 0 end)
, [During_Periods] = max(case when [SalePrd].[PeriodOrdinal] in (2) then [SalePrd].[ActivePeriods] else 0 end)
, [Post_Periods] = max(case when [SalePrd].[PeriodOrdinal] in (3) then [SalePrd].[ActivePeriods] else 0 end)
from [cte_uniqueAccounts]
left join [cte_AllSalesAggregatedByPeriod] [SaleVal] on [SaleVal].[account_no] = [cte_uniqueAccounts].[account_no]
/* match on the period as well, so each spend row pairs with exactly one period-count row */
left join [cte_PeriodAnalysis] [SalePrd] on [SalePrd].[account_no] = [SaleVal].[account_no] and [SalePrd].[PeriodOrdinal] = [SaleVal].[PeriodOrdinal]
group by [cte_uniqueAccounts].[account_no]
)
select c.code, c.code_desc, [cte_pivot_clumsily].*
from [cte_pivot_clumsily]
LEFT JOIN customer as c
ON [cte_pivot_clumsily].account_no = c.account_no
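Both approaches come down to tagging each sale with its period and then aggregating once per account. A compact sketch of that idea follows, run on SQLite through Python's sqlite3 rather than SQL Server; the `periods` table, the non-overlapping boundaries, the sample rows, and the use of the year-month prefix of `tran_date` in place of `bk_month`/`bk_year` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (account_no INTEGER, tran_date TEXT, value_usd REAL);
CREATE TABLE periods (tag TEXT, start_date TEXT, end_date TEXT);
INSERT INTO periods VALUES
  ('Pre',    '2020-01-01', '2021-03-31'),
  ('During', '2021-04-01', '2022-03-31'),
  ('Post',   '2022-04-01', '9999-12-31');
INSERT INTO sales VALUES
  (25, '2020-02-10', 400), (25, '2020-02-20', 100), (25, '2020-05-01', 500),
  (25, '2021-06-15', 2005), (25, '2022-05-01', 500),
  (11, '2020-03-03', 500);
""")

# Each sale joins to exactly one period, so the sums are not duplicated.
rows = conn.execute("""
    SELECT s.account_no,
           SUM(CASE WHEN p.tag = 'Pre'    THEN s.value_usd ELSE 0 END) AS total_spend_pre,
           SUM(CASE WHEN p.tag = 'During' THEN s.value_usd ELSE 0 END) AS total_spend_during,
           SUM(CASE WHEN p.tag = 'Post'   THEN s.value_usd ELSE 0 END) AS total_spend_post,
           COUNT(DISTINCT CASE WHEN p.tag = 'Pre'
                               THEN substr(s.tran_date, 1, 7) END)     AS pre_periods
    FROM sales s
    JOIN periods p
      ON s.tran_date BETWEEN p.start_date AND p.end_date
    GROUP BY s.account_no
    ORDER BY s.account_no
""").fetchall()

for row in rows:
    print(row)
```

Because the period boundaries do not overlap, a sale on a boundary date lands in one period only, which keeps the per-period totals from double counting.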

T-SQL Using CTE to aggregate totals for matching and non-matching periods

Matched Sales are provided by the join, It's getting the unmatched that is eluding me.
CTE
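-- YourTable stands in for the real sales table
WITH PriorSalesCTE AS
(
SELECT Item,
Variant,
SUM(Sales) AS Sales
FROM YourTable
WHERE Date BETWEEN '20200701' AND '20200705'
GROUP BY Item, Variant
),
CurrentSalesCTE AS
(
SELECT Item,
Variant,
SUM(Sales) AS Sales
FROM YourTable
WHERE Date BETWEEN '20210701' AND '20210705'
GROUP BY Item, Variant
)
SELECT
SUM(cs.Sales) AS MatchedSales
FROM PriorSalesCTE ps JOIN CurrentSalesCTE cs
ON cs.Item = ps.Item
AND cs.Variant = ps.Variant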
Now I need the empty spaces on both sides
I need the sales for items sold in 2020 but not sold in 2021 – Lost Sales
Conversely, sales for 2021 that did not sell in 2020 – New Sales.
I tried adding these in the CTE as separate sections of the CTE, but the join doesn’t give me what I need.
Any suggestions? Is the CTE simply preventing me from getting everything, and should I maybe add a UNION ALL query to get the unmatched values?
For your actual query, you could use a FULL JOIN, which also returns the unmatched rows from either side.
But I think there is another solution: you don't need to join separate queries for this, you can just use conditional aggregation:
WITH SalesByItem AS (
SELECT
t.Item,
t.Variant,
Sales2020 = SUM(CASE WHEN Date BETWEEN '20200701' and '20200705' THEN t.Sales END),
Sales2021 = SUM(CASE WHEN Date BETWEEN '20210701' and '20210705' THEN t.Sales END)
FROM YourTable t
WHERE (Date BETWEEN '20200701' and '20200705'
OR Date BETWEEN '20210701' and '20210705')
GROUP BY
t.Item,
t.Variant
)
SELECT
NewSales = SUM(CASE WHEN Sales2020 IS NULL THEN Sales2021 END),
MatchedSales = SUM(CASE WHEN Sales2020 IS NOT NULL AND Sales2021 IS NOT NULL THEN Sales2021 END),
LostSales = SUM(CASE WHEN Sales2021 IS NULL THEN Sales2020 END)
FROM SalesByItem s;
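To see the conditional aggregation at work, here is a small sketch on SQLite through Python's sqlite3 (not T-SQL). The table `sales`, its columns, and the sample rows are made up for illustration: item A sells in both years, B only in 2020, C only in 2021.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (item TEXT, variant TEXT, sale_date TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('A', 'x', '2020-07-02', 10), ('A', 'x', '2021-07-03', 15),
  ('B', 'x', '2020-07-01', 7),
  ('C', 'y', '2021-07-04', 20), ('C', 'y', '2021-07-05', 5);
""")

# One row per item/variant with a total per year; a year with no sales stays NULL.
new_sales, matched_sales, lost_sales = conn.execute("""
    WITH by_item AS (
        SELECT item, variant,
               SUM(CASE WHEN sale_date BETWEEN '2020-07-01' AND '2020-07-05'
                        THEN amount END) AS sales_2020,
               SUM(CASE WHEN sale_date BETWEEN '2021-07-01' AND '2021-07-05'
                        THEN amount END) AS sales_2021
        FROM sales
        WHERE sale_date BETWEEN '2020-07-01' AND '2020-07-05'
           OR sale_date BETWEEN '2021-07-01' AND '2021-07-05'
        GROUP BY item, variant
    )
    SELECT SUM(CASE WHEN sales_2020 IS NULL THEN sales_2021 END),
           SUM(CASE WHEN sales_2020 IS NOT NULL AND sales_2021 IS NOT NULL
                    THEN sales_2021 END),
           SUM(CASE WHEN sales_2021 IS NULL THEN sales_2020 END)
    FROM by_item
""").fetchone()

print(new_sales, matched_sales, lost_sales)
```

The NULL left by a CASE with no ELSE is what marks an item as absent from a year, so the outer query can classify each item without any join.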

SQL query to use group by to get the sum of two different columns within a date range

I have two tables for an employee: time track and absence.
person_number  Measure  start_Date   end_date     Time_type
73636          10       01-Jan-2020  02-Jan-2020  Double
73636          24       06-Jan-2020  08-jan-2020  Double
73636          10       15-Jan-2020  25-Jan-2020  Regular Pay
73636          11.9     06-Jan-2020  08-jan-2020  Double
73636          27       10-Jan-2020  15-Jan-2020  Regular Pay
Absence det
person_number  start_Date   end_date     duration  Absence_type
73636          05-Jan-2020  10-Jan-2020  10        Vacation
73636          06-Jan-2020  18-jan-2020  9         Paid Leave
73636          20-Jan-2020  21-jan-2020  1         Paid Leave
Now when I pass the from and to dates as 01-Jan-2020 and 31-Jan-2020, the output should look like this:
Person_Number  Double  Regular  Hour_code   hour_amount
73636          31.9    37       Paid Leave  10
The hour_code should contain only "Paid Leave" and no other absence types.
I have written the query below for this:
SELECT
distinct person_number,
sum(
CASE
WHEN elements = 'Double' THEN measure
END
) AS OT_Hours,
sum(
CASE
WHEN elements LIKE 'Regular Pay%' THEN measure
END
) AS regular_measure_hours,
sum(
CASE
WHEN absence_name IN ('Paid Leave') THEN absence_duration
END
) AS hour3_amount,
max(
CASE
WHEN absence_name IN ('Paid Leave') THEN 'Paid Leave'
END
) AS hour3_code
FROM
(
select
person_number,
Time_type elements,
Absence_type absence_name,
duration,
measure
from
time_track_tab,
abs_tab,
per_all_people_F papf
where
time_track_tab.person_id = abs_tab.person_id
and abs_tab.person_id = papf.person_id
and abs_tab.Absence_type = 'Paid Leave'
)
group by
person_number
This gives me multiple rows of output, and the sums are wrong, because between the from and to dates there are different dates present for both absence and time track.
My requirement is to calculate the sum of all the duration and measure values within the parameter dates. How can I tweak my query to get the correct sums between these dates?
Is there a way to use PARTITION BY, GROUP BY, or anything else to calculate these correctly?
You probably need to group both tables first then join them together to avoid the cross join.
select person_number, TimeTrack.DoublePay, TimeTrack.Regular,
Absenses.Hour_code, Absenses.hour_amount from
per_all_people_F papf,
(select
person_id, sum(duration) as hour_amount, Absence_type as Hour_code
from
abs_tab
where
abs_tab.Absence_type = 'Paid Leave'
and
start_Date between '2020-01-01' and '2020-01-31'
group by person_id,Absence_type
) Absenses,
(select
person_id,
sum(case when Time_type = 'Double' then Measure end) as DoublePay,
sum(case when Time_type = 'Regular Pay' then Measure end) as Regular
from time_track_tab
where
start_Date between '2020-01-01' and '2020-01-31'
group by person_id
) TimeTrack
where
papf.person_id = TimeTrack.person_id
and
papf.person_id = Absenses.person_id
and
papf.person_id = 73636
I made a SqlFiddle if you want to play with it
http://sqlfiddle.com/#!9/03e460/36
Also, my 2 cents: I'd recommend left outer joining from the per_all_people_F table, or else people without absences will get filtered out.
See whether something like this is what you need:
select * from
(SELECT person_number,
sum(
CASE
WHEN Time_type = 'Double' THEN measure
END
) AS Double,
sum(
CASE
WHEN Time_type = ('Regular Pay') THEN measure
END
) AS regular
from time_track_tab
group by person_number
) A
inner join
(SELECT
person_number,
sum(
CASE
WHEN Absence_type = 'Vacation' THEN duration
END
) AS Vacation,
sum(
CASE
WHEN Absence_type = ('Paid Leave') THEN duration
END
) AS paidLeave
from abs_tab
group by person_number
)B on A.person_number = B.person_number
Here is the fiddle:
http://sqlfiddle.com/#!4/21253/2
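The reason for grouping each table before joining is that a direct join pairs every time-track row with every absence row for the person, so each value is summed several times. A small sketch of the difference follows, on SQLite through Python's sqlite3; the tables `time_track` and `absence` and their rows are simplified, made-up stand-ins (whole numbers, no dates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_track (person_id INTEGER, time_type TEXT, measure INTEGER);
CREATE TABLE absence (person_id INTEGER, absence_type TEXT, duration INTEGER);
INSERT INTO time_track VALUES (1, 'Double', 10), (1, 'Double', 24), (1, 'Regular Pay', 37);
INSERT INTO absence VALUES (1, 'Paid Leave', 9), (1, 'Paid Leave', 1), (1, 'Vacation', 10);
""")

# Joining first: 3 time rows x 2 paid-leave rows = 6 rows, so the sums are inflated.
naive = conn.execute("""
    SELECT SUM(CASE WHEN t.time_type = 'Double' THEN t.measure END) AS double_hours,
           SUM(a.duration) AS leave_hours
    FROM time_track t
    JOIN absence a ON a.person_id = t.person_id
    WHERE a.absence_type = 'Paid Leave'
""").fetchone()

# Aggregating each table to one row per person first, then joining.
grouped = conn.execute("""
    SELECT tt.person_id, tt.double_hours, tt.regular_hours, ab.leave_hours
    FROM (SELECT person_id,
                 SUM(CASE WHEN time_type = 'Double' THEN measure END) AS double_hours,
                 SUM(CASE WHEN time_type = 'Regular Pay' THEN measure END) AS regular_hours
          FROM time_track
          GROUP BY person_id) tt
    LEFT JOIN (SELECT person_id, SUM(duration) AS leave_hours
               FROM absence
               WHERE absence_type = 'Paid Leave'
               GROUP BY person_id) ab
      ON ab.person_id = tt.person_id
""").fetchone()

print(naive)    # inflated totals
print(grouped)  # one row per person with the expected totals
```

The LEFT JOIN in the second query keeps people who have time-track rows but no paid leave; their leave_hours simply comes back as NULL.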

merging transactions but keeping all demography columns

I have multiple transactions for the same people. I want to merge them and get the total spent for each person, but keep all of their demographic variables in the same table. When I tried things like SELECT DISTINCT, it just removed some duplicates instead of merging. My goal is to put the customers into 2 groups, Low and High value.
High value is either someone who bought 1 item for $950 or someone who bought multiple items and spent more than $2500. (Each transaction is a single item; no transaction has more than 1 item.)
Here is my code so far (I am preparing it for SAS):
Select
CUS.FirstName
,CUS.LastName
,CUS.NumberChildrenAtHome
,CUS.CommuteDistance
,CUS.CustomerKey
,FIS.SalesAmount
,CUS.Gender
,CUS.MaritalStatus
,CUS.HouseOwnerFlag
,CUS.NumberCarsOwned
,CUS.YearlyIncome
,CUS.TotalChildren
,CUS.EnglishEducation AS Education
,floor(DATEDIFF(DAY,BirthDate,getdate()))/365.25 AS AGE
,CASE
WHEN FIS.UnitPrice >=950 OR FIS.SalesAmount >=2500 THEN 'High Value'
ELSE 'Low Value'
END AS 'Customer Value'
From dbo.FactInternetSales AS FIS
LEFT JOIN DBO.DimCustomer AS CUS
ON FIS.CustomerKey = CUS.CustomerKey
LEFT JOIN dbo.DimSalesTerritory AS DST
ON FIS.SalesTerritoryKey = DST.SalesTerritoryKey
This is my first time using this type of question/answer site, so sorry if I did something wrong.
Thank you
You should first sum the amounts per customer with GROUP BY, then categorise based on the totals. The inner query gives you the totals for each person. The outer query sets the category based on your condition.
SELECT
innerQuery.*,
-- High Value: one item priced at 950 or more, or a total spend of 2500 or more
CASE WHEN Max_Unit_Price >= 950
OR Total_Sales_Amount >= 2500 THEN 'High Value' ELSE 'Low Value' END AS 'Customer Value'
FROM
(
Select
CUS.FirstName,
CUS.LastName,
CUS.NumberChildrenAtHome,
CUS.CommuteDistance,
CUS.CustomerKey,
CUS.Gender,
CUS.MaritalStatus,
CUS.HouseOwnerFlag,
CUS.NumberCarsOwned,
CUS.YearlyIncome,
CUS.TotalChildren,
CUS.EnglishEducation AS Education,
floor(DATEDIFF(DAY, BirthDate, getdate()) / 365.25) AS AGE,
-- MAX rather than SUM, because the 950 test applies to a single item
MAX(FIS.UnitPrice) AS Max_Unit_Price,
SUM(FIS.SalesAmount) AS Total_Sales_Amount
From
dbo.FactInternetSales AS FIS
LEFT JOIN DBO.DimCustomer AS CUS ON FIS.CustomerKey = CUS.CustomerKey
LEFT JOIN dbo.DimSalesTerritory AS DST ON FIS.SalesTerritoryKey = DST.SalesTerritoryKey
-- FIS.SalesAmount is left out of the grouping so each customer collapses to one row
GROUP BY
CUS.FirstName,
CUS.LastName,
CUS.NumberChildrenAtHome,
CUS.CommuteDistance,
CUS.CustomerKey,
CUS.Gender,
CUS.MaritalStatus,
CUS.HouseOwnerFlag,
CUS.NumberCarsOwned,
CUS.YearlyIncome,
CUS.TotalChildren,
CUS.EnglishEducation,
BirthDate
) innerQuery
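The aggregate-then-categorise pattern can be checked on a few rows. This is a sketch on SQLite through Python's sqlite3, not SQL Server, with made-up `customer` and `sales` tables. It assumes the single-item test should look at the most expensive item (MAX of the unit price), which is one reading of the question, and that the total test applies to the summed sales amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_key INTEGER, first_name TEXT);
CREATE TABLE sales (customer_key INTEGER, unit_price INTEGER, sales_amount INTEGER);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO sales VALUES
  (1, 950, 950),
  (2, 900, 900), (2, 900, 900), (2, 800, 800),
  (3, 400, 400), (3, 500, 500);
""")

# Inner query: one row per customer. Outer query: label that row.
customer_values = conn.execute("""
    SELECT t.customer_key, t.first_name, t.total_sales_amount,
           CASE WHEN t.max_unit_price >= 950 OR t.total_sales_amount >= 2500
                THEN 'High Value' ELSE 'Low Value' END AS customer_value
    FROM (SELECT c.customer_key, c.first_name,
                 MAX(s.unit_price)   AS max_unit_price,
                 SUM(s.sales_amount) AS total_sales_amount
          FROM sales s
          LEFT JOIN customer c ON c.customer_key = s.customer_key
          GROUP BY c.customer_key, c.first_name) t
    ORDER BY t.customer_key
""").fetchall()

for row in customer_values:
    print(row)
```

Ann qualifies through a single 950 item, Bob through a 2600 total across three cheaper items, and Cy through neither. Grouping only by customer columns is what collapses the transactions into one row each.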