In a single SELECT statement, count the number of orders within several time ranges - sql

Thanks in advance for any thoughts, advice, and suggestions!
System: SQL Server 2008 R2
I need to count, for a given customer, the number of repurchases within several different time intervals (date ranges) and display these counts in a single table. I have this working with several successive common table expressions (CTEs), which I join together at the end. This approach, however, is cumbersome and rather slow.
The SQL code I expected to be shortest and fastest, however, does not work, for several reasons, and returns error messages such as:
“the subqueries (SELECT COUNT(…)) return several values and hence cannot be used as an expression”
or:
“An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.”
Please find below a sample table (WDB), the desired result table (WDB_result), and the SQL code that needs improvement. Thanks a lot to everyone who can help!
Sample WDB Table:
CustomerID: customer ID
InNo: invoice number
OrderDate: order date
Result table WDB_result:
Columns
A) total number of repurchases
B) number of repurchases within the first 3 months
C) number of repurchases within the first 6 months
D) number of repurchases within the first 12 months
E) number of repurchases within the last 3 months
F) number of repurchases within the last 6 months
G) number of repurchases within the last 12 months
Sample SQL code to calculate columns A, B, and E:
SELECT
CustomerID
, (COUNT(InNo) OVER (PARTITION by CustomerID) -1) as Norepurchases_Total
, (SELECT (COUNT(InNo) OVER (PARTITION by CustomerID) -1) as Count3
FROM WDB
WHERE OrderDate between MIN(OrderDate) and DATEADD(month, 3, MIN(OrderDate))
) as Norepurchases_1st_3months
, (SELECT (COUNT(InNo) OVER (PARTITION by CustomerID) -1) as Count3
FROM WDB
WHERE OrderDate between MAX(OrderDate) and DATEPART(y, DATEADD(m, -3, getdate()))
) as NoRepurchases_Last_3months
FROM WDB;

Typically, what I would do in a situation like this is something like:
-- @CurrentDate, @ThreeMonthsAgo and @SixMonthsAgo are set beforehand,
-- e.g. SET @ThreeMonthsAgo = DATEADD(month, -3, @CurrentDate)
SELECT CustomerID,
    SUM(
        CASE
            WHEN OrderDate > @ThreeMonthsAgo AND OrderDate <= @CurrentDate
            THEN 1
            ELSE 0
        END
    ) InLast3Months,
    SUM(
        CASE
            WHEN OrderDate > @SixMonthsAgo AND OrderDate <= @ThreeMonthsAgo
            THEN 1
            ELSE 0
        END
    ) InLast3To6Months,
    ...
FROM YourTable
GROUP BY CustomerID
This lets you define the bucket boundaries beforehand as variables, as shown, and then count how many orders fall into each bucket.
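A runnable sketch of the same conditional-aggregation idea, extended to the question's columns A, B and E. It uses Python's built-in `sqlite3` so it runs anywhere; the sample rows, the column aliases, and the fixed reference date `2024-06-30` (standing in for GETDATE()) are hypothetical. It also assumes a "repurchase" is any order dated after the customer's first order. For "first N months" columns, the bucket boundary is taken from a derived table of each customer's MIN(OrderDate) instead of a variable; in SQL Server you would use DATEADD(month, ...) where SQLite uses `date(..., '+3 months')`.

```python
import sqlite3

# In-memory stand-in for WDB; OrderDate is ISO 'YYYY-MM-DD' text so that
# string comparison and SQLite's date() modifiers behave like dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WDB (CustomerID INTEGER, InNo INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO WDB VALUES (?, ?, ?)",
    [
        (1, 101, "2023-01-10"), (1, 102, "2023-02-15"), (1, 103, "2023-03-20"),
        (1, 104, "2023-09-01"), (1, 105, "2024-05-01"),
        (2, 201, "2024-04-01"), (2, 202, "2024-05-15"),
        (3, 301, "2023-06-01"),
    ],
)

QUERY = """
SELECT o.CustomerID,
       -- A) every order after the first order date counts as a repurchase
       SUM(CASE WHEN o.OrderDate > f.FirstOrder THEN 1 ELSE 0 END) AS Repurchases_Total,
       -- B) repurchases within 3 months of the customer's first order
       SUM(CASE WHEN o.OrderDate > f.FirstOrder
                 AND o.OrderDate <= date(f.FirstOrder, '+3 months')
                THEN 1 ELSE 0 END) AS Repurchases_First_3_Months,
       -- E) repurchases in the 3 months up to the reference date
       SUM(CASE WHEN o.OrderDate > f.FirstOrder
                 AND o.OrderDate > date(:ref, '-3 months')
                 AND o.OrderDate <= :ref
                THEN 1 ELSE 0 END) AS Repurchases_Last_3_Months
FROM WDB AS o
JOIN (SELECT CustomerID, MIN(OrderDate) AS FirstOrder
      FROM WDB
      GROUP BY CustomerID) AS f
  ON f.CustomerID = o.CustomerID
GROUP BY o.CustomerID
ORDER BY o.CustomerID
"""

# A fixed reference date keeps the output reproducible.
rows = conn.execute(QUERY, {"ref": "2024-06-30"}).fetchall()
for row in rows:
    print(row)
```

Columns C, D, F and G follow the same pattern with '+6 months', '+12 months', and so on, all in one pass over the table.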

This is a very interesting query, and I think what you're after can be achieved if you read over this Stack Overflow article on multiple aggregate functions.
Applying the same concept as is used in this question should solve your problem.

Related

Executing an aggregate function within a CASE without GROUP BY

I am trying to assign a specific code to a client, using a CASE expression, based on the number of gifts they have given in the past 6 months. I am unable to use WITH because of limitations of the software I am writing the query in; it only allows SELECT statements. I am unsure how to get a distinct count from another table (transaction data) and use it in the conditions of the CASE I have currently built (based on my client information table). Does anyone know of any workarounds for this? I cannot GROUP BY clientID at the end of my query because not all of my columns are aggregates, and I only need to group by clientID for this particular WHEN branch of the CASE. I have looked into the OVER() clause, but I need the date range I am evaluating to be dynamic (counting transactions over the last six months), and the number of rows included varies, because the transaction count varies from month to month. Also, the software I am building this in does not recognize the PARTITION BY clause of OVER().
Any help would be great!
EDIT:
It is not letting me attach an image, so I have added the two sections of code that I need help with!
WITH "6MonthGiftCount" (
"ConstituentID"
,"GiftCount"
)
AS (
SELECT "GiftView"."ConstituentID", COUNT(DISTINCT "GiftView"."GiftID") FROM "GiftView" WHERE MONTHS_BETWEEN(getdate(), "GiftView"."GiftDate") <= 6 GROUP BY "GiftView"."ConstituentID")
SELECT...CASE
WHEN "6MonthGiftCount"."GiftCount" >= 4
THEN 'A010'
)
Perform your grouping/COUNT(1) in a subquery to obtain the total number of donations per ConstituentID, then JOIN this total into your main query, which uses the new column in its CASE expression.
select
hist.*,
case when timesDonated > 5 then 'gracious donor'
when timesDonated > 3 then 'repeated donor'
when timesDonated >= 1 then 'donor'
else null end as donorCode
from gifthistory hist
left join ( /* your grouping subquery here, pretending to be a new table */
select
personID,
count(1) as timesDonated
from gifthistory i
WHERE abs(months_between(giftDate, sysdate)) <= 6
group by personid ) grp on hist.personid = grp.personID
order by 1;
Naturally, the syntax will vary by database; you didn't specify which one you are using, but you should be able to adapt this template to it. It works in both Oracle and SQL Server once the month calculation is adjusted (SQL Server has no MONTHS_BETWEEN or SYSDATE; use DATEADD/DATEDIFF and GETDATE() instead).
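A runnable sketch of the "grouping subquery as a pretend table" pattern, using Python's built-in `sqlite3`. The `clients` table, the sample gifts, and the fixed reference date `2024-06-30` are hypothetical; the main query runs over a client table (as in the question) rather than over `gifthistory` itself, and SQLite's `date(:ref, '-6 months')` stands in for the month calculation. No WITH and no PARTITION BY are needed, only a derived table and a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (personID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gifthistory (giftID INTEGER PRIMARY KEY, personID INTEGER, giftDate TEXT);
""")
conn.executemany("INSERT INTO clients VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")])
gifts = (
    [(1, d) for d in ("2024-01-05", "2024-02-05", "2024-03-05",
                      "2024-04-05", "2024-05-05", "2024-06-05")]
    + [(2, "2024-03-01"), (2, "2024-06-01"), (2, "2023-01-01")]
    + [(3, "2022-05-01")]
    + [(4, d) for d in ("2024-02-01", "2024-03-01", "2024-04-01", "2024-05-01")]
)
conn.executemany("INSERT INTO gifthistory (personID, giftDate) VALUES (?, ?)", gifts)

QUERY = """
SELECT c.personID,
       CASE WHEN g.timesDonated > 5 THEN 'gracious donor'
            WHEN g.timesDonated > 3 THEN 'repeated donor'
            WHEN g.timesDonated >= 1 THEN 'donor'
            ELSE NULL END AS donorCode
FROM clients AS c
-- the grouping subquery behaves like a small table: one row per person
LEFT JOIN (SELECT personID, COUNT(*) AS timesDonated
           FROM gifthistory
           WHERE giftDate > date(:ref, '-6 months') AND giftDate <= :ref
           GROUP BY personID) AS g
  ON g.personID = c.personID
ORDER BY c.personID
"""
rows = conn.execute(QUERY, {"ref": "2024-06-30"}).fetchall()
print(rows)
```

Clients with no gifts in the window get no row from the subquery, so the LEFT JOIN leaves `timesDonated` NULL and the CASE falls through to NULL.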

Impala get the difference between 2 dates excluding weekends

I'm trying to get the day difference between 2 dates in Impala but I need to exclude weekends.
I know it should be something like this but I'm not sure how the weekend piece would go...
DATEDIFF(resolution_date,created_date)
Thanks!
One approach to such a task is to enumerate every day in the range and then filter out the weekend days before counting.
Some databases have specific features to generate date series, while others offer recursive common table expressions. Impala does not support recursive queries, so we need to look at alternative solutions.
If you have a table with at least as many rows as the maximum number of days in a range, you can use row_number() to offset the starting date, and then conditional aggregation to count working days.
Assuming that your table is called mytable, with column id as primary key, and that the big table is called bigtable, you would do:
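select
t.id,
sum(
-- dayofweek() returns 1 (Sunday) through 7 (Saturday), so 2-6 are Monday-Friday
case when dayofweek(date_add(t.created_date, n.rn)) between 2 and 6
then 1 else 0 end
) no_days
from mytable t
-- rn = 0, 1, 2, ...: one row per day offset, borrowed from any large enough table
inner join (select row_number() over(order by 1) - 1 rn from bigtable) n
on t.resolution_date > date_add(t.created_date, n.rn)
group by t.id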
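To check the logic without an Impala cluster, here is a sketch of the same technique in SQLite via Python's `sqlite3`. SQLite has no `dayofweek()` or `date_add()`, so `strftime('%w', ...)` (0 = Sunday ... 6 = Saturday) and `date(x, '+N days')` are swapped in, and `row_number()` is ordered by `rowid`. The `mytable` and `bigtable` contents are hypothetical. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, created_date TEXT, resolution_date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(1, "2024-01-01", "2024-01-08"),   # Monday -> next Monday
                  (2, "2024-01-05", "2024-01-09")])  # Friday -> Tuesday
# bigtable only needs enough rows to cover the longest range (here 10 days).
conn.execute("CREATE TABLE bigtable (x INTEGER)")
conn.executemany("INSERT INTO bigtable VALUES (?)", [(i,) for i in range(10)])

QUERY = """
SELECT t.id,
       -- strftime('%w') gives 0 = Sunday ... 6 = Saturday, so 1-5 are weekdays
       SUM(CASE WHEN CAST(strftime('%w', date(t.created_date, '+' || n.rn || ' days')) AS INTEGER)
                     BETWEEN 1 AND 5
                THEN 1 ELSE 0 END) AS no_days
FROM mytable AS t
JOIN (SELECT row_number() OVER (ORDER BY rowid) - 1 AS rn FROM bigtable) AS n
  -- one joined row per day from created_date up to (not including) resolution_date
  ON t.resolution_date > date(t.created_date, '+' || n.rn || ' days')
GROUP BY t.id
ORDER BY t.id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Because of the inner join, a row whose created_date equals its resolution_date produces no joined rows and drops out; a LEFT JOIN plus COALESCE(..., 0) would keep it with a count of 0.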

Redshift: Find MAX in list disregarding non-incremental numbers

I work for a sports film analysis company. We have teams with unique team IDs and I would like to find the number of consecutive weeks they have uploaded film to our site moving backwards from today. Each upload also has its own row in a separate table that I can join on teamid and has a unique date of when it was uploaded. So far I put together a simple query that pulls each unique DATEDIFF(week) value and groups on teamid.
Select teamid, MAX(weekdiff)
from (Select teamid, DATEDIFF(week, dateuploaded, GETDATE()) as weekdiff
from leroy_events
group by teamid, weekdiff) w
group by teamid
What I am given is a list of teamIDs and unique weekly date differences. I would like to then find the max for each teamID without breaking an increment of 1. For example, if my data set is:
Team datediff
11453 0
11453 1
11453 2
11453 5
11453 7
11453 13
I would like the max value for team: 11453 to be 2.
Any ideas would be awesome.
I have simplified your example by assuming a table that already has a weekdiff column; that corresponds to the value you calculate with DATEDIFF.
First, I use the LAG() window function to attach the previous weekdiff value (in weekdiff order) to each row.
Then, a WHERE condition keeps only the rows whose previous value is current_value - 1, and I take max(weekdiff) over those rows.
Data:
create table leroy_events ( teamid int, weekdiff int);
insert into leroy_events values (11453,0),(11453,1),(11453,2),(11453,5),(11453,7),(11453,13);
Code:
WITH initial_data AS (
Select
teamid,
weekdiff,
lag(weekdiff,1) over (partition by teamid order by weekdiff) as lag_weekdiff
from
leroy_events
)
SELECT
teamid,
max(weekdiff) AS max_weekdiff_consecutive
FROM
initial_data
WHERE weekdiff = lag_weekdiff + 1 -- this ensures retrieving max() without breaking your consecutive increment
GROUP BY 1
SQLFiddle with your sample data to see how this code works.
Result:
teamid max_weekdiff_consecutive
11453 2
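A sketch that reproduces the LAG() query above in SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or later). Team 11453 uses the question's data. Team 20001, with weeks 0, 1, 2, 5, 6, is a hypothetical addition showing a limitation. The WHERE filter keeps any row whose predecessor is one less, including a consecutive pair that starts after a gap. So that team gets 6, not the expected 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leroy_events (teamid INTEGER, weekdiff INTEGER)")
conn.executemany("INSERT INTO leroy_events VALUES (?, ?)",
                 [(11453, w) for w in (0, 1, 2, 5, 7, 13)]
                 + [(20001, w) for w in (0, 1, 2, 5, 6)])  # hypothetical extra team

QUERY = """
WITH initial_data AS (
    SELECT teamid,
           weekdiff,
           -- previous weekdiff for the same team (NULL on the team's first row)
           LAG(weekdiff, 1) OVER (PARTITION BY teamid ORDER BY weekdiff) AS lag_weekdiff
    FROM leroy_events
)
SELECT teamid, MAX(weekdiff) AS max_weekdiff_consecutive
FROM initial_data
WHERE weekdiff = lag_weekdiff + 1
GROUP BY teamid
ORDER BY teamid
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```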
You can use SQL window functions to probe relationships between rows of the table. In this case the lag() function can be used to look at the previous row relative to a given order and grouping. That way you can determine whether a given row is part of a group of consecutive rows.
You still need to aggregate or filter to reduce the rows for each group of interest (i.e. each team) to one. In this case it's convenient to aggregate; overall, it might look like this:
select
team,
case min(datediff)
when 0 then max(datediff)
else -1
end as max_weeks
from (
select
team,
datediff,
case
when (lag(datediff) over (partition by team order by datediff) != datediff - 1)
then 0
else 1
end as is_consec
from diffs
) cd
where is_consec = 1
group by team
The inline view just adds an is_consec column to the data, marking whether each row is part of a group of consecutive rows. The outer query filters on that column (you cannot filter directly on a window function), and chooses the maximum datediff from the remaining rows for each team.
There are a few subtleties there:
The case expression in the inline view is written as it is to exploit the fact that the lag() computed for the first row of each partition will be NULL, which compares neither equal nor unequal to any value: the != test yields NULL, the WHEN branch is not taken, and the CASE falls through to ELSE 1. Thus the first row in each partition is always marked consecutive.
The case testing min(datediff) in the outer select clause picks up teams that have no record with datediff = 0, and assigns -1 to column max_weeks for them.
It would also have been possible to mark rows non-consecutive if the first in their group did not have datediff = 0, but then you would lose such teams from the results altogether.
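Both LAG()-based filters above can overshoot when a consecutive pair follows a gap. For weeks 0, 1, 2, 5, 6, rows 5 and 6 are not part of the run that starts at week 0, yet week 6 survives the filter. One alternative, sketched here (a swapped-in gaps-and-islands technique, not either answer's method), compares each weekdiff with its 0-based ROW_NUMBER(). With distinct weekdiffs sorted ascending, the two are equal only while the run 0, 1, 2, ... is unbroken. It keeps the -1 convention for teams without a week-0 upload. It runs in SQLite via Python's `sqlite3`; teams 20001 and 30002 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leroy_events (teamid INTEGER, weekdiff INTEGER)")
conn.executemany("INSERT INTO leroy_events VALUES (?, ?)",
                 [(11453, w) for w in (0, 1, 2, 5, 7, 13)]
                 + [(20001, w) for w in (0, 1, 2, 5, 6)]   # hypothetical
                 + [(30002, w) for w in (1, 2)])           # hypothetical: no upload this week

QUERY = """
SELECT teamid,
       -- weekdiff equals its 0-based position only while 0, 1, 2, ... is unbroken;
       -- teams with no week 0 match nothing and get -1
       COALESCE(MAX(CASE WHEN weekdiff = rn THEN weekdiff END), -1) AS max_weeks
FROM (
    SELECT teamid,
           weekdiff,
           ROW_NUMBER() OVER (PARTITION BY teamid ORDER BY weekdiff) - 1 AS rn
    FROM leroy_events
) AS numbered
GROUP BY teamid
ORDER BY teamid
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

This assumes weekdiff values are distinct per team, which the question's inner GROUP BY teamid, weekdiff already guarantees.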

SQL: Average value per day

I have a table called ‘tweets’. It includes (among others) the columns ‘tweet_id’, ‘created_at’ (dd/mm/yyyy hh:mm:ss), ‘classified’ and ‘processed_text’. Within the ‘processed_text’ column there are certain strings such as ‘{TICKER|IBM}’, which I will refer to as ticker-strings.
My goal is to get the average value of ‘classified’ per ticker-string per day. The column ‘classified’ contains the numerical values -1, 0 and 1.
At this moment, I have a working SQL query for the average value of ‘classified’ for one ticker-string per day. See the script below.
SELECT Date( `created_at` ) , AVG( `classified` ) AS Classified
FROM `tweets`
WHERE `processed_text` LIKE '%{TICKER|IBM}%'
GROUP BY Date( `created_at` )
There are however two problems with this script:
It does not include days on which no ‘processed_text’ contained {TICKER|IBM}. I would like it to output the value zero in that case.
I have 100+ different ticker-strings and would therefore like a script that can process multiple strings at the same time. I could also do them manually, one by one, but that would cost me a great deal of time.
When I had a similar question for counting the ‘tweet_id’s per ticker-string, somebody else suggested using the following:
SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG,
coalesce(BAC, 0) AS BAC
FROM dates d LEFT JOIN
(SELECT DATE(created_at) AS date,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
END) as IBM,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
END) as GOOG,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
END) as BAC
FROM tweets
GROUP BY date
) t
ON d.date = t.date;
This script worked perfectly for counting the tweet_ids per ticker-string. As I stated, however, I am now looking for the average classified score per ticker-string. My question is therefore: could someone show me how to adjust this script so that it calculates the average classified score per ticker-string per day?
SELECT d.date, t.ticker, COALESCE(COUNT(DISTINCT t.tweet_id), 0) AS tweets
FROM dates d
LEFT JOIN
(SELECT DATE(created_at) AS date,
tweet_id,
-- the text between '{TICKER|' and the next '}'
SUBSTR(processed_text,
LOCATE('{TICKER|', processed_text) + 8,
LOCATE('}', processed_text, LOCATE('{TICKER|', processed_text))
- LOCATE('{TICKER|', processed_text) - 8) AS ticker
FROM tweets
WHERE processed_text LIKE '%{TICKER|%') t
ON d.date = t.date
GROUP BY d.date, t.ticker
This will put each ticker on its own row, not a column. If you want them moved to columns, you have to pivot the result. How you do this depends on the DBMS. Some have built-in features for creating pivot tables. Others (e.g. MySQL) do not and you have to write tricky code to do it; if you know all the possible values ahead of time, it's not too hard, but if they can change you have to write dynamic SQL in a stored procedure.
See MySQL pivot table for how to do it in MySQL.
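The question itself asks for the average of `classified`, with a zero on days when a ticker has no tweets. One way to get both, sketched here, is to extract the ticker per row as in the answer, CROSS JOIN the calendar with the distinct tickers so every (day, ticker) pair exists, LEFT JOIN the tweets, and wrap AVG() in COALESCE. It runs in SQLite via Python's `sqlite3`, where `instr()`/`substr()` replace MySQL's LOCATE()/SUBSTR(). It assumes `created_at` is stored as ISO 'YYYY-MM-DD HH:MM:SS' text; the sample rows are hypothetical. Like the answer, it reads only the first ticker-string in each tweet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (date TEXT PRIMARY KEY);
CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, created_at TEXT,
                     classified INTEGER, processed_text TEXT);
""")
conn.executemany("INSERT INTO dates VALUES (?)", [("2024-01-01",), ("2024-01-02",)])
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", [
    (1, "2024-01-01 09:00:00", 1, "buy {TICKER|IBM} now"),
    (2, "2024-01-01 10:00:00", -1, "sell {TICKER|IBM}"),
    (3, "2024-01-01 11:00:00", 1, "{TICKER|GOOG} up"),
    (4, "2024-01-02 12:00:00", 1, "hold {TICKER|GOOG}"),
    (5, "2024-01-02 13:00:00", 0, "no ticker here"),
])

QUERY = """
WITH tagged AS (
    -- first ticker-string in each tweet: the text between '{TICKER|' and the next '}'
    SELECT date(created_at) AS day,
           classified,
           substr(processed_text,
                  instr(processed_text, '{TICKER|') + 8,
                  instr(substr(processed_text, instr(processed_text, '{TICKER|') + 8), '}') - 1
           ) AS ticker
    FROM tweets
    WHERE instr(processed_text, '{TICKER|') > 0
),
tickers AS (SELECT DISTINCT ticker FROM tagged)
SELECT d.date, k.ticker, COALESCE(AVG(t.classified), 0) AS avg_classified
FROM dates AS d
CROSS JOIN tickers AS k   -- every (day, ticker) pair, even with no tweets
LEFT JOIN tagged AS t ON t.day = d.date AND t.ticker = k.ticker
GROUP BY d.date, k.ticker
ORDER BY d.date, k.ticker
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Note that 0 is also a genuine average (e.g. one positive and one negative tweet), so dropping the COALESCE and keeping NULL is the way to tell "no tweets" apart from "neutral".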

Find closest date in SQL Server

I have a table dbo.X with DateTime column Y which may have hundreds of records.
My stored procedure has a parameter @CurrentDate. I want to find the date in column Y of table dbo.X that is earlier than, and closest to, @CurrentDate.
How can I find it?
The WHERE clause matches all rows with a date earlier than @CurrentDate and, since they are sorted in descending order, TOP 1 returns the one closest to @CurrentDate.
SELECT TOP 1 *
FROM x
WHERE x.date < @CurrentDate
ORDER BY x.date DESC
Use DATEDIFF and order the result by how many days or seconds lie between each date and the input value.
Something like this:
-- earlier dates give a positive gap; the smallest gap is the closest date
select top 1 rowId, dateCol, datediff(second, dateCol, @CurrentDate) as SecondsBetweenDates
from myTable
where dateCol < @CurrentDate
order by datediff(second, dateCol, @CurrentDate)
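A runnable check that the two approaches agree, written against SQLite via Python's `sqlite3`. `LIMIT 1` plays the role of TOP 1, and `julianday()` differences stand in for DATEDIFF. The table contents and the helper function names are hypothetical. The DATEDIFF variant must order by the gap measured from the row's date up to the parameter, smallest first; ordering the reversed (negative) difference ascending would return the *farthest* earlier date instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (id INTEGER PRIMARY KEY, Y TEXT)")
conn.executemany("INSERT INTO X (Y) VALUES (?)", [
    ("2024-01-01 08:00:00",), ("2024-03-15 12:00:00",),
    ("2024-03-20 09:30:00",), ("2024-05-01 00:00:00",),
])

def closest_before(current):
    """Latest Y strictly earlier than `current` (ORDER BY ... DESC), or None."""
    row = conn.execute(
        "SELECT Y FROM X WHERE Y < ? ORDER BY Y DESC LIMIT 1", (current,)
    ).fetchone()
    return row[0] if row else None

def closest_before_by_distance(current):
    """Same result, ordering by the (positive) gap between Y and `current`."""
    row = conn.execute(
        "SELECT Y FROM X WHERE Y < ? "
        "ORDER BY julianday(?) - julianday(Y) ASC LIMIT 1",
        (current, current),
    ).fetchone()
    return row[0] if row else None

print(closest_before("2024-03-20 10:00:00"))
print(closest_before_by_distance("2024-03-20 10:00:00"))
```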
I think I have a better solution for this problem.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed, so some rates have zero values. I want to fill the missing data with the rate for the same currency that is closest in time to the missing rate. Basically, I want to get the FXRatesID of the nearest non-zero rate, which I will then substitute. (The substitution itself is not shown in this example.)
1) To start off, let's identify the missing rates:
[Image: query showing the missing rates, i.e. rows with a rate value of zero]
2) Next, let's identify the rates that are not missing.
[Image: query showing the rates that are not missing]
3) This query is where the magic happens. I have made an assumption here, which can be removed but was added to improve the performance of the query: I expect to find a substitute transaction on the same day as the missing / zero transaction (the cast(... as date) condition in the join).
The key is the ROW_NUMBER() call: for each missing transaction it numbers the non-missing ones starting at 1 for the shortest time difference, so the next closest transaction gets a RowNum of 2, and so on.
Please note that the join must also match on currency so that I do not mix up currency types. That is, I don't want to substitute an AUD rate with CHF values; I want the closest rate for the same currency.
[Image: combining the two data sets with a row number to identify the nearest transaction]
4) Finally, let's keep only the rows where RowNum is 1.
[Image: the final query]
The full query is as follows:
; with cte_zero_rates as
(
Select *
from fxrates
where (spot_exp = 0 or spot_exp = 0)
),
cte_non_zero_rates as
(
Select *
from fxrates
where (spot_exp > 0 and spot_exp > 0)
)
,cte_Nearest_Transaction as
(
select z.FXRatesID as Zero_FXRatesID
,z.importDate as Zero_importDate
,z.currency as Zero_Currency
,nz.currency as NonZero_Currency
,nz.FXRatesID as NonZero_FXRatesID
,nz.spot_imp
,nz.importDate as NonZero_importDate
,DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifference
,ROW_NUMBER() Over(partition by z.FXRatesID order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
from cte_zero_rates z
left join cte_non_zero_rates nz on nz.currency = z.currency
and cast(nz.importDate as date) = cast(z.importDate as date)
--order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID
,n.Zero_Currency
,n.Zero_importDate
,n.NonZero_importDate
,DATEDIFF(s, n.NonZero_importDate,n.Zero_importDate) as Delay_In_Seconds
,n.NonZero_Currency
,n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate
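A condensed, runnable version of the same nearest-match idea in SQLite via Python's `sqlite3` (window functions need SQLite 3.25 or later). `julianday()` differences replace DATEDIFF(ss, ...), `date()` replaces the cast to date, and only the zero/non-zero row ids are returned. The table rows are hypothetical, and the zero test is simplified to `spot_exp` alone, mirroring the conditions in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fxrates (FXRatesID INTEGER PRIMARY KEY, currency TEXT,
                importDate TEXT, spot_imp REAL, spot_exp REAL)""")
conn.executemany("INSERT INTO fxrates VALUES (?, ?, ?, ?, ?)", [
    (1, "AUD", "2024-01-02 09:00:00", 0.66, 0.0),   # zero rate to fill
    (2, "AUD", "2024-01-02 08:00:00", 0.65, 0.65),
    (3, "AUD", "2024-01-02 09:20:00", 0.67, 0.67),
    (4, "CHF", "2024-01-02 09:10:00", 1.10, 1.10),
    (5, "CHF", "2024-01-02 12:00:00", 0.0, 0.0),    # zero rate to fill
    (6, "CHF", "2024-01-03 12:00:00", 1.11, 1.11),  # next day: excluded by same-day rule
])

QUERY = """
WITH zero_rates AS (SELECT * FROM fxrates WHERE spot_exp = 0),
     non_zero_rates AS (SELECT * FROM fxrates WHERE spot_exp > 0),
     nearest AS (
         SELECT z.FXRatesID AS zero_id,
                nz.FXRatesID AS nonzero_id,
                -- 1 = smallest absolute time gap (julianday() differences are in days)
                ROW_NUMBER() OVER (
                    PARTITION BY z.FXRatesID
                    ORDER BY ABS(julianday(nz.importDate) - julianday(z.importDate))
                ) AS rn
         FROM zero_rates AS z
         LEFT JOIN non_zero_rates AS nz
           ON nz.currency = z.currency                 -- never mix currencies
          AND date(nz.importDate) = date(z.importDate) -- same-day assumption
     )
SELECT zero_id, nonzero_id
FROM nearest
WHERE rn = 1 AND nonzero_id IS NOT NULL
ORDER BY zero_id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

The AUD gap is filled from the 09:20 rate (20 minutes away) rather than the 08:00 one, and the CHF gap uses the same-day 09:10 rate even though a closer-looking rate exists on the next day.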