I have the task of developing a SQL query for analysing the behaviour of a customer through time.
I started with two tables: a calendar table (one row per year-month-day across several years) and a sales table (with an ID and the purchase date I am interested in). This is the query:
SELECT [Spice Id], FORMAT([Fecha venta],'yyyyMM') AS Purchase_Date
INTO #Sale_date
FROM SALES
WHERE [Spice Id] IS NOT NULL
GROUP BY [Spice Id], [Fecha venta]
Then I use a cross join so that every date is available even when the customer has no purchase on it, and a WHERE clause to limit the table to the range I want. Query below:
SELECT [Spice Id], year, YearMonth, Purchase_Date, (Purchase_Date) AS First_purchase, (Purchase_Date) AS Last_purchase
INTO #Sorted
FROM #calendar
CROSS JOIN #Sale_date
WHERE year > 2019
AND Purchase_Date > 202001
AND Purchase_Date < FORMAT(GETDATE(),'yyyyMM')
AND YearMonth BETWEEN purchase_date AND FORMAT(GETDATE(), 'yyyyMM')
GROUP BY
[Spice Id], Year, YearMonth, Purchase_Date
ORDER BY
[Spice Id], Year ASC, YearMonth, Purchase_date
As you can see, First_purchase and Last_purchase start out as plain copies of Purchase_Date, so I update both values with the following:
----------UPDATE MIN
UPDATE #Sorted
SET #SORTED.First_purchase = t1.minimo
FROM #Sorted
INNER JOIN
(SELECT [SPICE ID],MIN([First_purchase]) AS minimo
FROM #Sorted
GROUP BY [Spice Id])
AS t1 on t1.[spice id] = #Sorted.[spice id]
--------------Update Max
UPDATE #Sorted
SET #SORTED.Last_purchase = t1.maximo
FROM #SORTED
INNER JOIN
(SELECT [SPICE ID],MAX([Last_purchase]) AS maximo
FROM #Sorted
where Purchase_Date <= YearMonth
GROUP BY [Spice Id])
AS t1 on t1.[spice id] = #Sorted.[spice id]
Once I had updated both values, I checked the result for one specific ID to make things clearer.
There are some mistakes. Purchase_Date is not correctly ordered, but I don't mind that much because I can drop that column and keep just First_purchase and Last_purchase. My big trouble is with Last_purchase, which should change over time and update with each new purchase. I don't know whether you can spot a mistake in my logic or know a better way to get there.
I hope it is clear enough, thank you very much for your help!!
I'm not entirely sure what your desired output is, but the update logic to get the first and last purchase dates is entirely unnecessary. You can use something like the following to always obtain these values:
min(Purchase_Date) over (partition by [Spice Id]) as First_purchase,
max(Purchase_Date) over (partition by [Spice Id]) as Last_purchase
If you want Purchase_Date to be ordered correctly, you need to re-evaluate your ORDER BY clause: since Purchase_Date is last in the clause, it is only ordered after the preceding columns have been ordered.
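For the "last purchase that changes over time" part of the question, a conditional aggregate over the cross join may be enough. The following is a minimal sketch, not the asker's actual setup: it uses Python's built-in sqlite3 instead of SQL Server, hypothetical tables calendar and sale_date with 'yyyyMM' text months, and it assumes the desired output is one row per customer and month from the first purchase onwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for #calendar and #Sale_date (months as 'yyyyMM' text)
    CREATE TABLE calendar (YearMonth TEXT);
    CREATE TABLE sale_date (spice_id INTEGER, Purchase_Date TEXT);
    INSERT INTO calendar VALUES ('202001'), ('202002'), ('202003'), ('202004');
    INSERT INTO sale_date VALUES (1, '202001'), (1, '202003');
""")

rows = conn.execute("""
    SELECT s.spice_id,
           c.YearMonth,
           -- first purchase of the customer overall
           MIN(s.Purchase_Date) AS First_purchase,
           -- latest purchase made on or before this calendar month
           MAX(CASE WHEN s.Purchase_Date <= c.YearMonth
                    THEN s.Purchase_Date END) AS Last_purchase
    FROM calendar AS c
    CROSS JOIN sale_date AS s
    GROUP BY s.spice_id, c.YearMonth
    -- drop months before the customer's first purchase
    HAVING c.YearMonth >= MIN(s.Purchase_Date)
    ORDER BY s.spice_id, c.YearMonth
""").fetchall()

for row in rows:
    print(row)
```

With this sample data, Last_purchase stays at 202001 until the month 202003 and then moves forward, while First_purchase stays fixed, so no UPDATE pass is needed.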
I'm trying to match the inventory for the day to the SKU.
Amazon_orders has the SKUs I want to match
Invhistory2 has the SKU, a timestamp (more than one per day, which is the issue), and the inventory quantity
I'm trying to create a subquery that averages the inventory for the day, then join the timestamp and SKU to the SKUs on the amazon orders table. Null values are no issue here.
My code looks like this:
(SELECT AVG(quantity) AS qty FROM `perfect-obelisk-289514.inventory_history.invhistory2`) AS qtyok
FROM `perfect-obelisk-289514.reports.flat_file_orders_by_order_datereport` Amazon_Orders
LEFT JOIN (SELECT AVG(quantity) AS qty, sku, CAST(snapshot_date AS DATE) AS invdate
FROM `perfect-obelisk-289514.inventory_history.invhistory2`
GROUP BY sku, invdate) AS table2
ON (
Amazon_Orders.sku = table2.sku AND CAST(LEFT(Amazon_orders.purchase_date,10) AS DATE) = table2.invdate
)
The issue is that I get the same average for each row; the quantity is not being joined using SKU and date.
As you may notice, I'm a beginner. I have searched thoroughly but can't find the solution, so any help is appreciated!
Thanks for the help!
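I managed to solve it. My code was redundant: the scalar subquery in the SELECT list averaged the whole inventory table, which is why every row showed the same value. I just needed this join: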
LEFT JOIN (SELECT AVG(quantity) AS qty,
sku,
CAST(snapshot_date AS DATE) AS invdate
FROM `perfect-obelisk-289514.inventory_history.invhistory2`
GROUP BY sku, invdate) AS table2
ON
(Amazon_Orders.sku = table2.sku AND CAST(LEFT(Amazon_orders.purchase_date,10) AS DATE) = table2.invdate)
This gives me what I need
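To see the accepted approach in isolation, here is a small sketch using Python's built-in sqlite3 rather than BigQuery. The table names orders and invhistory and the sample rows are made up for illustration; the idea is only that the inventory is averaged per SKU and day inside the derived table, and the order is then matched on SKU plus the date part of its timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the two tables in the question
    CREATE TABLE orders (sku TEXT, purchase_date TEXT);
    CREATE TABLE invhistory (sku TEXT, snapshot_date TEXT, quantity REAL);
    INSERT INTO orders VALUES
        ('A', '2020-09-01T10:00:00'),
        ('B', '2020-09-01T11:00:00'),
        ('A', '2020-09-02T09:00:00');
    INSERT INTO invhistory VALUES
        ('A', '2020-09-01 08:00:00', 10),
        ('A', '2020-09-01 20:00:00', 20),
        ('B', '2020-09-01 08:00:00', 5);
""")

rows = conn.execute("""
    SELECT o.sku,
           substr(o.purchase_date, 1, 10) AS order_day,
           t.qty                          -- per-SKU, per-day average
    FROM orders AS o
    LEFT JOIN (
        SELECT sku,
               date(snapshot_date) AS invdate,
               AVG(quantity)       AS qty
        FROM invhistory
        GROUP BY sku, date(snapshot_date)
    ) AS t
      ON o.sku = t.sku
     AND substr(o.purchase_date, 1, 10) = t.invdate
    ORDER BY o.sku, order_day
""").fetchall()

for row in rows:
    print(row)
```

The order for SKU A on 2020-09-02 has no snapshot that day, so the LEFT JOIN keeps it with a NULL quantity, which matches the remark that null values are not a problem.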
I am trying to wrap my head around this issue. I am sure the answer exists here a million times over, but I am evidently not searching for the right question.
I have a huge sales table [SALES] and I am extracting the following:
SELECT DISTINCT S1.[ORDER ID], S1.SUPPLIER, SUM(S1.[ORDER TOTAL]) AS SUPPLIERTOTAL
FROM [SALES] S1
LEFT JOIN
(
Select s2.[Order ID], S2.[Supplier], S2.[Supplier Colour], SUM(S2.[Order TOTAL]) AS COLOURTOTAL
FROM [SALES]
WHERE [SALES].[SALESDATE] Between '20160101' and '20170101'
) AS s2
ON s1.[Order ID] = s2.[Order ID]
I have thrown this code together as an illustration, as I am not at my work PC at present. My issue is that when I do get the sub-select to work, the first select still produces the correct order value.
E.g., let's say the manufacturer was Ford and the total value was 100000 over ten orders: it returns the 100000 correctly. On the sub-select, however, it appears to take the total value and multiply it by the total number of rows in the table. I am trying to work out what is going on with the data and the query but cannot see the issue.
The only factor, if it is of any influence, is that the table has no primary key, but as I am providing referential integrity with the join I didn't believe that would matter.
Is anyone able to answer, or has anyone come across this issue?
A bit of guessing here, as the question is not too clear, but I think you are looking for something like this:
SELECT S1.[ORDER ID], S1.SUPPLIER, SUM(S1.[ORDER TOTAL]) AS SUPPLIERTOTAL, SUM(S2.COLOURTOTAL) as COLOURTOTAL
FROM [SALES] S1
LEFT JOIN
(
SELECT [Order ID], [Supplier], [Supplier Colour], [Order TOTAL] AS COLOURTOTAL
FROM [SALES]
WHERE [SALESDATE] BETWEEN '20160101' AND '20170101'
) AS s2
ON s1.[Order ID] = s2.[Order ID]
GROUP BY S1.[ORDER ID], S1.SUPPLIER
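The multiplication described in the question is typical of a join that matches each row of a table against several rows of the same order before summing. The sketch below reproduces that effect and shows one possible way around it, aggregating each side before joining. It uses Python's built-in sqlite3, a hypothetical simplified sales table, and made-up rows, so treat it as an illustration rather than a drop-in fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified SALES table: one order with two colour lines
    CREATE TABLE sales (order_id INTEGER, supplier TEXT, colour TEXT,
                        order_total INTEGER, salesdate TEXT);
    INSERT INTO sales VALUES
        (1, 'Ford', 'Red',  100, '20160301'),
        (1, 'Ford', 'Blue', 200, '20160301');
""")

# Joining row-to-row on order_id pairs each of the 2 rows with both rows,
# so every order_total is counted twice before SUM runs.
naive = conn.execute("""
    SELECT s1.order_id, SUM(s1.order_total) AS supplier_total
    FROM sales AS s1
    LEFT JOIN sales AS s2 ON s1.order_id = s2.order_id
    GROUP BY s1.order_id
""").fetchall()

# Aggregating each side first keeps the totals intact after the join.
fixed = conn.execute("""
    SELECT s1.order_id, s1.supplier, s1.supplier_total,
           s2.colour, s2.colour_total
    FROM (SELECT order_id, supplier, SUM(order_total) AS supplier_total
          FROM sales
          GROUP BY order_id, supplier) AS s1
    LEFT JOIN (SELECT order_id, colour, SUM(order_total) AS colour_total
               FROM sales
               WHERE salesdate BETWEEN '20160101' AND '20170101'
               GROUP BY order_id, colour) AS s2
      ON s1.order_id = s2.order_id
    ORDER BY s1.order_id, s2.colour
""").fetchall()

print(naive)
print(fixed)
```

The real total for the order is 300, while the row-to-row join reports 600. In the pre-aggregated version each colour line carries the correct supplier total alongside its own colour total.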
I'm trying to sum a certain column over a certain date range. The kicker is that I want this to be a CTE, because I'll have to use it multiple times as part of a larger query. Since it's a CTE, it has to have the date column as well as the sum and ID columns, meaning I have to group by date AND ID. That will cause my results to be grouped by ID and date, giving me not a single sum over the date range, but a bunch of sums, one for each day.
To make it simple, say we have:
create table orders (
id int primary key,
itemID int foreign key references items(id),
orderDate datetime,
salesRep int foreign key references salesReps(id),
price int,
amountShipped int);
Now, we want to get the total money a given sales rep made during a fiscal year, broken down by item. That is, ignoring the fiscal year bit:
select itemName, sum(price) as totalSales, sum(amountShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName
Simple enough. But when you add anything else, even the price, the query spits out way more rows than you wanted.
select itemName, price, sum(price) as totalSales, sum(amountShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName, price
Now, each group is (name, price) instead of just (name). This is kind of pseudocode, but in my database, just this change causes my result set to jump from 13 to 32 rows. Add to that the date range, and you really have a problem:
select itemName, price, sum(price) as totalSales, sum(amountShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between 150101 and 151231
group by itemName, price
This is identical to the last example. The trouble is making it a CTE:
with totals as (
select itemName, price, sum(price) as totalSales, sum(amountShipped) as totalShipped, orderDate as startDate, orderDate as endDate
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between startDate and endDate
group by itemName, price, startDate, endDate
)
select totals_2015.itemName as itemName_2015, totals_2015.price as price_2015, ...
totals_2016.itemName as itemName_2016, ...
from (
select * from totals
where startDate = 150101 and endDate = 151231
) totals_2015
join (
select *
from totals
where startDate = 160101 and endDate = 160412
) totals_2016
on totals_2015.itemName = totals_2016.itemName
Now the grouping in the CTE is way off, even further off than after adding the price. I've thought about breaking the price query into its own subquery inside the CTE, but I can't escape needing to group by the dates in order to get the date range. Can anyone see a way around this? I hope I've made things clear enough. This is running against an IBM iSeries machine. Thank you!
Depending on what you are looking for, this might be a better approach:
select 'by sales rep' breakdown
, salesRep
, '' year
, sum(price * amountShipped) amount
from etc
group by salesRep
union
select 'by sales rep and year' breakdown
, salesRep
, convert(char(4),orderDate, 120) year
, sum(price * amountShipped) amount
from etc
group by salesRep, convert(char(4),orderDate, 120)
etc
When possible, group by the id columns or foreign keys: those columns are usually indexed already, so you will often get faster results. This holds for most databases.
with cte as (
select id, rep, sum(sales) as sls, count(distinct itemid) as did, count(*) as cnt
from somewhere
where date between x and y
group by id, rep
)
select * from cte order by rep
Or, a fancier version:
with cte as (
select id, rep, sum(sales) as sls, count(distinct itemid) as did, count(*) as cnt
from somewhere
where date between x and y
group by id, rep
)
select * from cte join reps on cte.rep = reps.rep order by sls desc
I eventually found a solution, and it doesn't need a CTE at all. I wanted the CTE to avoid code duplication, but this works almost as well. Here's a thread explaining summing conditionally that does exactly what I was looking for.
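The linked thread is not reproduced here, so the following is only my reading of "summing conditionally": put each date range inside a CASE expression within SUM, so that one GROUP BY on the item yields one column per period. The sketch uses Python's built-in sqlite3 rather than DB2 on iSeries, the simplified orders and items tables from the question with numeric yymmdd dates, and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified versions of the tables in the question; dates are yymmdd integers
    CREATE TABLE items (id INTEGER PRIMARY KEY, itemName TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, itemID INTEGER, orderDate INTEGER,
                         salesRep INTEGER, price INTEGER, amountShipped INTEGER);
    INSERT INTO items VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO orders VALUES
        (1, 1, 150310, 1234, 10, 1),
        (2, 1, 150920, 1234, 15, 2),
        (3, 1, 160115, 1234, 20, 1),
        (4, 2, 160301, 1234,  7, 3),
        (5, 1, 150505, 9999, 99, 1);   -- other sales rep, filtered out
""")

rows = conn.execute("""
    SELECT i.itemName,
           -- each SUM only counts rows that fall inside its own date range
           SUM(CASE WHEN o.orderDate BETWEEN 150101 AND 151231
                    THEN o.price ELSE 0 END) AS totalSales_2015,
           SUM(CASE WHEN o.orderDate BETWEEN 160101 AND 160412
                    THEN o.price ELSE 0 END) AS totalSales_2016
    FROM orders AS o
    JOIN items AS i ON i.id = o.itemID
    WHERE o.salesRep = 1234
    GROUP BY i.itemName
    ORDER BY i.itemName
""").fetchall()

for row in rows:
    print(row)
```

Because the date ranges live inside the aggregates, neither the date nor the price has to appear in the GROUP BY, which is what kept splitting the groups in the CTE attempts.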
I have a DB table ActInfo with a schema like this:
customerName  customerEmail  CertificateID  Activated_On
A             A@xxx.com      xxxx           2013-05-20 04:02:39.000
A             A@xxxx.com     xxxxx          2013-09-11 03:09:34.000
A             A@xxxx.com     xxxxx          2013-04-03 06:09:34.000
We can see from the data above that A has activated a certificate three times in a year, and I need to raise a warning when that happens.
Is it possible with a stored procedure to check whether a user has activated a certificate more than twice in the same year? It does not matter whether the certificate IDs are the same.
Would anyone please help with this query?
Many thanks in advance.
In this approach, I start with the most recent activation per customer, then outer join to all the activations for that customer within the past year. We ultimately return the customer and the count of activations within the past year.
SELECT
DerivedLastActivationByCustomer.CustomerName AS [Customer Name],
COUNT(ActInfo.CustomerName) AS [Activations in Past Year]
FROM
(
SELECT
CustomerName,
MAX(Activated_On) AS [Last Activation]
FROM
ActInfo
GROUP BY
CustomerName
) DerivedLastActivationByCustomer
LEFT OUTER JOIN ActInfo ON DerivedLastActivationByCustomer.CustomerName = ActInfo.CustomerName AND DATEDIFF(d, ActInfo.Activated_On, DerivedLastActivationByCustomer.[Last Activation]) < 365
GROUP BY
DerivedLastActivationByCustomer.CustomerName
Now, if you want to turn this into a stored procedure, you have options. You don't specify how this SP should work. In the simplest possible form, you could use just the above query and return the recordset.
Or, you could take the customer as an input parameter (i.e. @Cust), then use it as part of the WHERE clause of the inner query to only return info on that one specific customer.
Another possible approach would be to filter the outermost SELECT statement to only return those with three or more activations. Because the filter is on an aggregate, it has to be a HAVING clause after the GROUP BY rather than a WHERE clause (i.e. HAVING COUNT(ActInfo.CustomerName) >= 3).
Based on comments, the SP would be:
CREATE PROCEDURE [dbo].[GetActivationsInPriorYear]
(
@Cust nvarchar(max),
@ActivationCount int OUTPUT
)
AS
-- A SELECT that assigns a variable cannot also return columns,
-- so only the count is selected here.
SELECT
@ActivationCount = COUNT(ActInfo.CustomerName) -- activations in the past year
FROM
(
SELECT
CustomerName,
MAX(Activated_On) AS [Last Activation]
FROM
ActInfo
WHERE
ActInfo.CustomerName = @Cust
GROUP BY
CustomerName
) DerivedLastActivationByCustomer
LEFT OUTER JOIN ActInfo ON DerivedLastActivationByCustomer.CustomerName = ActInfo.CustomerName AND DATEDIFF(d, ActInfo.Activated_On, DerivedLastActivationByCustomer.[Last Activation]) < 365
GROUP BY
DerivedLastActivationByCustomer.CustomerName
GO
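The variant that returns only customers with three or more activations can be tried out without SQL Server. The sketch below uses Python's built-in sqlite3, so DATEDIFF is replaced by a difference of julianday() values on the date part, which is an approximation of the same idea. The helper name frequent_activators, customer B, and the example.com addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ActInfo (customerName TEXT, customerEmail TEXT,
                          CertificateID TEXT, Activated_On TEXT);
    INSERT INTO ActInfo VALUES
        ('A', 'a@example.com', 'c1', '2013-05-20 04:02:39.000'),
        ('A', 'a@example.com', 'c2', '2013-09-11 03:09:34.000'),
        ('A', 'a@example.com', 'c2', '2013-04-03 06:09:34.000'),
        ('B', 'b@example.com', 'c3', '2013-06-01 00:00:00.000'),
        ('B', 'b@example.com', 'c3', '2011-06-01 00:00:00.000');
""")

def frequent_activators(conn, min_count=3):
    """Customers with at least min_count activations in the 365 days
    leading up to their most recent activation (hypothetical helper)."""
    return conn.execute("""
        SELECT latest.customerName,
               COUNT(a.customerName) AS activations
        FROM (SELECT customerName, MAX(Activated_On) AS last_on
              FROM ActInfo
              GROUP BY customerName) AS latest
        LEFT JOIN ActInfo AS a
          ON a.customerName = latest.customerName
         AND julianday(date(latest.last_on))
             - julianday(date(a.Activated_On)) < 365
        GROUP BY latest.customerName
        HAVING COUNT(a.customerName) >= ?
        ORDER BY latest.customerName
    """, (min_count,)).fetchall()

print(frequent_activators(conn))      # customers that should trigger the warning
print(frequent_activators(conn, 1))   # every customer with their count
```

Customer A has three activations within a year of the latest one and is returned by default. Customer B's older activation is two years back, so B only appears when the threshold is lowered to 1.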
I have a query, where I need the MIN of a DateTime field and then I need the value of a corresponding field in the same row.
Now, I have something like this. However, I cannot get the Price field without also putting it in an aggregate function, which is not what I want.
SELECT MIN([Registration Time]), Price FROM MyData WHERE [Product Series] = 'XXXXX'
I need the MIN of the Registration Time field, and then I just want the corresponding Price field for that row. How do I get that?
I also need my WHERE clause as shown.
I'm sure I've overlooked something really obvious. I am using SQL Server 2008.
If you want just one record with [Registration Time], Price, it'd be as simple as this:
select top 1 [Registration Time], Price
from MyData
where [Product Series] = 'XXXXX'
order by [Registration Time]
If you want the minimum [Registration Time] and corresponding Price for all [Product Series], then there are a few approaches, for example, using the row_number() function:
with cte as (
select
[Registration Time], Price, [Product Series],
row_number() over(partition by [Product Series] order by [Registration Time]) as rn
from MyData
)
select
[Registration Time], Price, [Product Series]
from cte
where rn = 1
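The row_number() approach can be checked on a few rows. This sketch runs the same pattern through Python's built-in sqlite3 instead of SQL Server 2008, which assumes the bundled SQLite is 3.25 or newer (the first version with window functions). The sample rows and the second series name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyData ("Registration Time" TEXT, Price REAL, "Product Series" TEXT);
    INSERT INTO MyData VALUES
        ('2012-01-03 10:00:00',  9.5, 'XXXXX'),
        ('2012-01-01 09:00:00', 12.0, 'XXXXX'),
        ('2012-02-01 09:00:00',  3.0, 'YYYYY');
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT "Registration Time", Price, "Product Series",
               -- number the rows of each series from earliest to latest
               ROW_NUMBER() OVER (PARTITION BY "Product Series"
                                  ORDER BY "Registration Time") AS rn
        FROM MyData
    )
    SELECT "Product Series", "Registration Time", Price
    FROM cte
    WHERE rn = 1            -- keep only the earliest row per series
    ORDER BY "Product Series"
""").fetchall()

for row in rows:
    print(row)
```

Each series comes back with the price from its earliest registration, not the minimum price: for XXXXX that is 12.0, even though a later row has a lower price.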