I have a table, books, that records how many books each user donated and when:
username books date
____________________________________
Jon 3 2017-06-12
Jon 2 2017-05-20
Mary 4 2017-05-12
I want something like
username This month Previous Month
_______________________________________________
Jon 3 2
You will need some kind of grouping or conditional sum operation. The simplest way to get the result in your question, though it may not scale to your real needs, is:
select username
,sum(case when dateadd(month,datediff(month,0,[date]),0) = dateadd(month,datediff(month,0,getdate()),0) then books else 0 end) as ThisMonth
,sum(case when dateadd(month,datediff(month,0,[date]),0) = dateadd(month,datediff(month,0,getdate())-1,0) then books else 0 end) as PreviousMonth
from books
where username = 'Jon'
group by username
Obviously, with more details in your question you could get a better suited answer, and one that is not specific to SQL Server should you be on a different DBMS. That said, the conditional and date logic carries over to most other DBMSs; you will mainly need to change the function names.
This works by counting the whole months between an arbitrary start date (the 0s in the functions above) and a date value, then adding that many months back onto the same start date, which gives the first day of that date's month. Your date value and today (the getdate() function) therefore produce the same month start whenever they fall within the same calendar month. By adding one month fewer (the -1 in the script above) you get the start of last month, which you can then compare to your date value to get the number of books for last month.
The sum and group by are there to 'flatten' your data into one row per username value.
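To see the month-bucketing logic outside SQL, here is a rough Python sketch of the same conditional sum. It is only an illustration under assumptions: the function names (month_start, previous_month_start, monthly_totals) are made up for this example, and today is passed in explicitly rather than read from the clock (the SQL uses getdate()) so the result is repeatable.

```python
from datetime import date


def month_start(d: date) -> date:
    """Truncate a date to the first day of its month."""
    return d.replace(day=1)


def previous_month_start(d: date) -> date:
    """First day of the month before the one containing d."""
    first = month_start(d)
    if first.month == 1:
        return date(first.year - 1, 12, 1)
    return date(first.year, first.month - 1, 1)


def monthly_totals(rows, today):
    """Return {username: (this_month_total, previous_month_total)}."""
    this_start = month_start(today)
    prev_start = previous_month_start(today)
    totals = {}
    for username, books, donated_on in rows:
        this_month, prev_month = totals.get(username, (0, 0))
        bucket = month_start(donated_on)
        if bucket == this_start:
            this_month += books      # same role as the first CASE ... END
        elif bucket == prev_start:
            prev_month += books      # same role as the second CASE ... END
        totals[username] = (this_month, prev_month)
    return totals


# Rows taken from the question; "today" is an assumed date in June 2017.
rows = [
    ("Jon", 3, date(2017, 6, 12)),
    ("Jon", 2, date(2017, 5, 20)),
    ("Mary", 4, date(2017, 5, 12)),
]
result = monthly_totals(rows, today=date(2017, 6, 15))
print(result)  # {'Jon': (3, 2), 'Mary': (0, 4)}
```

Unlike the query, this sketch does not filter on username = 'Jon', so Mary shows up as well.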
Related
Let me start by saying that I am somewhat new to SQL/Snowflake and have been putting together queries for roughly 2 months. Some of my query language may not be ideal and I fully understand if there's a better, more efficient way to execute this query. Any and all input is appreciated. Also, this particular query is being developed in Snowflake.
My current query is pulling customer volumes by department and date based on a 45 day window with a 24 day lookback from current date and a 21 day look forward based on scheduled appointments. Each date is grouped based on where it falls within that 45 day window: current week (today through next 7 days), Week 1 (forward-looking days 8-14), and Week 2 (forward-looking days 15-21). I have been working to try and build out a comparison column that, for any date that lands within either the Week 1 or Week 2 group, will pull in prior period volumes from either 14 days prior (Week 1) or 21 days prior (Week 2) but am getting nowhere. Is there a best-practice for this type of column? Generic example of the current output is attached. Please note that the 'Prior Wk' column in the sample output was manually populated in an effort to illustrate the way this column should ideally work.
I have tried several different iterations of count(case...) similar to that listed below; however, the 'Prior Wk' column returns the count of encounters/scheduled encounters for the same day rather than those that occurred 14 or 21 days ago.
Count(Case When datediff(dd,SCHED_DTTM,getdate())
between -21 and -7 then 1 else null end
) as "Prior Wk"
I've tried to use an IFF statement as shown below, but no values return.
(IFF(ENCOUNTER_DATE > dateadd(dd,8,getdate()),
count(case when ENC_STATUS in ('Phone','InPerson') AND
datediff(dd,ENCOUNTER_Date,getdate()) between 7 and 14 then 1
else null end), '0')
) as "Prior Wk"
Also have attempted creating and using a temporary table (example included) but have not managed to successfully pull information from the temp table that didn't completely disrupt my encounter/scheduled counts. Please note for this approach I've only focused on the 14 day group and have not begun to look at the 21 day/Week 2 group. My attempt to use the temp table to resolve the problem centered around the following clause (temp table alias: "Date1"):
CASE when AHS.GL_Number = "DATEVISIT1"."GL_NUMBER" AND
datevisit1.lookback14 = dateadd(dd,14,PE.CONTACT_Date)
then "DATEVISIT1"."ENC_Count"
else null end
as "Prior Wk"
I am extremely appreciative of any insight on the current best practices around pulling prior period data into a column alongside current period data. Any misuse of terminology on my part is not deliberate.
I'm struggling to understand your requirement, but it sounds like you need window functions (https://docs.snowflake.com/en/sql-reference/functions-analytic.html), in this case most likely a SUM window function. The LAG window function (https://docs.snowflake.com/en/sql-reference/functions/lag.html) might also be of some help.
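As a rough illustration of what "pull in the value from N days earlier" means, here is a small Python sketch. Everything in it is an assumption made for the example: the department name, the dates, the counts and the function name with_prior_period are hypothetical. One thing worth keeping in mind is that LAG counts rows, not days, so it only lines up with "14 days ago" when there is exactly one row per department per day with no gaps; the sketch looks the earlier date up by key instead, which is closer to a self-join on date.

```python
from datetime import date, timedelta


def with_prior_period(daily_counts, lookback_days):
    """Pair each (department, day) count with the count lookback_days earlier.

    Returns a list of (department, day, count, prior_count) tuples;
    prior_count is None when no row exists for the earlier day.
    """
    out = []
    for (dept, day), count in sorted(daily_counts.items()):
        earlier_day = day - timedelta(days=lookback_days)
        prior = daily_counts.get((dept, earlier_day))
        out.append((dept, day, count, prior))
    return out


# Hypothetical daily volumes keyed by (department, date).
daily_counts = {
    ("Cardiology", date(2024, 3, 1)): 12,
    ("Cardiology", date(2024, 3, 15)): 9,
    ("Cardiology", date(2024, 3, 22)): 7,
}

for row in with_prior_period(daily_counts, lookback_days=14):
    print(row)
```

Only the 15 March row finds a match here (the 1 March count of 12); the other two rows have no data 14 days earlier and get None.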
I am pretty new to SQL, but I need to use it for my new job because the project requires it. I'm not an IT person, so it is more difficult for me, and this is the first time I am working with SQL professionally.
Hopefully you can help me with it. (Sorry for my English, I am a non-native speaker.)
I need to start a query where I get unequal IDs from 2 different reference dates.
So I have one Table with following data:
DATES ID AMOUNT SID
201910 122424 99999 1
201911 41241242 99999 2
201912 12412424 -22222 3
...
GOAL:
The IDs from DATES 201911 should be compared with those from 201910,
and the query should show only the unmatched IDs, i.e. IDs that appear on one of the two dates but not the other.
From that result, the AMOUNT should be summed up and grouped by SID.
If you have two dates and you want sids that are only on one of them, then:
select sid
from t
where date in (201911, 201910)
group by sid
having count(distinct date) = 1;
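The "appears on exactly one of the two dates" test can be sketched in Python as follows. This is an illustration, not a translation of the query above: it applies the count-distinct-dates idea per ID (my reading of the question) and then sums AMOUNT per SID. The sample rows and the function name unmatched_amount_by_sid are hypothetical.

```python
from collections import defaultdict


def unmatched_amount_by_sid(rows, date_a, date_b):
    """Sum amounts per sid for ids seen on exactly one of the two dates.

    rows are (dates, id, amount, sid) tuples.
    """
    # Step 1: collect the distinct dates each id appears on (the GROUP BY).
    dates_per_id = defaultdict(set)
    for dates, id_, amount, sid in rows:
        if dates in (date_a, date_b):
            dates_per_id[id_].add(dates)

    # Step 2: keep ids with exactly one distinct date (the HAVING ... = 1).
    unmatched = {id_ for id_, seen in dates_per_id.items() if len(seen) == 1}

    # Step 3: sum the amounts of those ids per sid.
    totals = defaultdict(int)
    for dates, id_, amount, sid in rows:
        if dates in (date_a, date_b) and id_ in unmatched:
            totals[sid] += amount
    return dict(totals)


# Hypothetical rows in the (DATES, ID, AMOUNT, SID) layout of the question.
rows = [
    (201910, 100, 500, 1),
    (201911, 100, 500, 1),  # id 100 is on both dates, so it is matched
    (201910, 200, 300, 1),  # only on 201910
    (201911, 300, -50, 2),  # only on 201911
    (201912, 400, 999, 3),  # outside the two reference dates
]
print(unmatched_amount_by_sid(rows, 201910, 201911))  # {1: 300, 2: -50}
```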
I have a table with sales information at the transaction level. We want to institute a new model where we compensate sales reps if a customer makes a purchase after more than a year of dormancy. To figure out how much this would have cost historically, I want to add a column with a flag for whether or not each purchase was the Buyer's first in the past 365 days. What I'd like to do is a rowcount in Powerpivot, for all sales made by that customer in the past 365 days, and wrap it in an IF to set the result to 0 or 1.
Example:
Order Date Buyer First Purchase in Year?
1/1/2015 1 1
1/2/2015 2 1
2/1/2015 1 0
4/1/2015 2 0
3/1/2016 2 1
5/1/2017 2 1
Any assistance would be greatly appreciated.
Excellent business use case! It's quite relevant in the business world.
To break this down for you, I will create 3 columns: 2 with some calculations, and 1 with the result. Once you understand how I did this, you can combine all 3 column formulas into a single column for your dataset, if you like.
Here's a picture of the results:
So here are the 3 columns that I created:
Last Purchase - in order to run this calculation, you need to know when the buyer made their last purchase.
CALCULATE(MAX([Order Date]),FILTER(Table1,[Order Date]<EARLIER([Order Date]) && [Buyer]=EARLIER([Buyer])))
Days Since Last Purchase - now you can compare the Last Purchase date to the current Order Date.
DATEDIFF([Last Purchase],[Order Date],DAY)
First Purchase in 1 Year - finally, the results column. This simply checks to see if it has been more than 365 days since the last purchase OR if the last purchase column is blank (which means it was the first purchase), and creates the flag you want.
IF([Days Since Last Purchase]>365 || ISBLANK([Days Since Last Purchase]),1,0)
Now, you can easily combine the logic of these 3 columns into a single column and get what you want. Hope this helps!
One note I wanted to add is that for this type of analysis it's not a wise move to do row counts as you had originally suggested, as your dataset can easily expand later on (what if you wanted to add more attribute columns?) and then you would have problems. So this solution that I shared with you is much more robust.
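If it helps to see the three columns as plain logic, here is a small Python sketch that mirrors them row by row. It is an illustration only: the sample orders are hypothetical (loosely modelled on the question's table), and the function name flag_first_purchase_in_year is made up. Like the FILTER above, it only looks at strictly earlier order dates for the same buyer.

```python
from datetime import date


def flag_first_purchase_in_year(orders, window_days=365):
    """Return a 1/0 flag per order, in the same order as the input.

    orders are (order_date, buyer) tuples. A purchase is flagged 1 when the
    buyer has no earlier purchase, or the latest earlier purchase is more
    than window_days before it.
    """
    flags = []
    for order_date, buyer in orders:
        # "Last Purchase": latest strictly earlier order by the same buyer.
        earlier = [d for d, b in orders if b == buyer and d < order_date]
        if not earlier:
            flags.append(1)  # blank Last Purchase means first ever purchase
            continue
        # "Days Since Last Purchase"
        days_since = (order_date - max(earlier)).days
        # "First Purchase in 1 Year"
        flags.append(1 if days_since > window_days else 0)
    return flags


# Hypothetical sample orders as (order date, buyer).
orders = [
    (date(2015, 1, 1), 1),
    (date(2015, 1, 2), 2),
    (date(2015, 2, 1), 1),   # 31 days after buyer 1's previous order
    (date(2015, 4, 1), 2),   # 89 days after buyer 2's previous order
    (date(2016, 6, 1), 2),   # 427 days after buyer 2's previous order
]
print(flag_first_purchase_in_year(orders))  # [1, 1, 0, 0, 1]
```

The nested scan is quadratic in the number of orders, which is fine for a sketch but would want sorting or grouping first on a large table.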
My dataset provides a monthly snapshot of customer accounts. Below is a very simplified version:
Date_ID | Acc_ID
------- | -------
20160430| 1
20160430| 2
20160430| 3
20160531| 1
20160531| 2
20160531| 3
20160531| 4
20160531| 5
20160531| 6
20160531| 7
20160630| 4
20160630| 5
20160630| 6
20160630| 7
20160630| 8
Customers can open or close their accounts, and I want to calculate the number of 'new' customers every month. The number of 'exited' customers will also be helpful if this is possible.
So in the above example, I should get the following result:
Month | New Customers
------- | -------
20160430| 3
20160531| 4
20160630| 1
Basically I want to compare distinct account numbers in the selected and previous month, any that exist in the selected month and not previous are new members, any that were there last month and not in the selected are exited.
I've searched but I can't seem to find any similar problems, and I hardly know where to start myself - I've tried using CALCULATE and FILTER along with DATEADD to filter the data to get two months, and then count the unique values. My PowerPivot skills aren't up to scratch to solve this on my own however!
Getting the new users is relatively straightforward - I'd add a calculated column which counts rows for that user in earlier months and if they don't exist then they are a new user:
=IF(CALCULATE(COUNTROWS(data),
FILTER(data, [Acc_ID] = EARLIER([Acc_ID])
&& [Date_ID] < EARLIER([Date_ID]))) = BLANK(),
"new",
"existing")
Once this is in place (with the calculated column named customer_type), you can simply write a measure for new_users:
=CALCULATE(COUNTROWS(data), data[customer_type] = "new")
Getting the cancelled users is a little harder because it means you have to be able to look backwards to the prior month - none of the time intelligence stuff in PowerPivot will work out of the box here as you don't have a true date column.
It's nearly always good practice to have a separate date table in your PowerPivot models and it is a good way to solve this problem - essentially the table should be 1 record per date with a unique key that can be used to create a relationship. Perhaps post back with a few more details.
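For reference, the "no rows for this account in any earlier month" test from the calculated column can be sketched in Python like this. It is a sketch under assumptions: the function name new_customers_per_month is made up, and the rows are the snapshot data from the question.

```python
def new_customers_per_month(rows):
    """Count accounts whose earliest snapshot is the given month.

    rows are (date_id, acc_id) tuples; returns {date_id: new_count}.
    """
    # Earliest month each account appears in.
    first_seen = {}
    for date_id, acc_id in rows:
        if acc_id not in first_seen or date_id < first_seen[acc_id]:
            first_seen[acc_id] = date_id

    # An account is "new" only in the month it is first seen.
    counts = {date_id: 0 for date_id, _ in rows}
    for date_id in first_seen.values():
        counts[date_id] += 1
    return dict(sorted(counts.items()))


# Snapshot rows from the question as (Date_ID, Acc_ID).
snapshots = (
    [(20160430, a) for a in (1, 2, 3)]
    + [(20160531, a) for a in (1, 2, 3, 4, 5, 6, 7)]
    + [(20160630, a) for a in (4, 5, 6, 7, 8)]
)
print(new_customers_per_month(snapshots))
# {20160430: 3, 20160531: 4, 20160630: 1}
```

Note that this treats an account as new only the first time it ever appears, which matches the question's expected result for this data but would differ from a strict "not in the previous month" comparison if an account closed and later reopened.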
This is an alternative method to Jacob's which also works. It avoids creating a calculated column, but I actually find the calculated column useful to use as a flag against other measures.
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, LASTDATE('Dates'[Date])
)
) - CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, FIRSTDATE('Dates'[Date]) - 1
)
)
It basically uses the dates table to make a distinct count of all Acc_ID from the beginning of time until the day before the first day of the selected period, and subtracts that from the distinct count of all Acc_ID from the beginning of time until the last day of the selected period. This is essentially the number of new distinct Acc_IDs, although you can't work out which Acc_IDs these are using this method.
I could then calculate 'exited accounts' by taking the previous month's total as 'existing accounts':
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATEADD('Dates'[Date], -1, MONTH)
)
Then adding the 'new accounts', and subtracting the 'total accounts':
=DISTINCTCOUNT('Accounts'[Acc_ID])
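The arithmetic behind these measures can be checked with a short Python sketch. It is illustrative only: monthly_movement is a made-up name, the rows are the snapshot data from the question, and the exited figure relies on the assumption that an account never reopens after closing (otherwise the cumulative "new" count would miss the returning account and the subtraction would be off).

```python
def monthly_movement(rows):
    """Return {month: {"new": n, "exited": m}} from (date_id, acc_id) rows."""
    by_month = {}
    for date_id, acc_id in rows:
        by_month.setdefault(date_id, set()).add(acc_id)

    result = {}
    seen_before = set()   # every account seen up to the previous month
    previous = set()      # accounts in the previous month's snapshot
    for month in sorted(by_month):
        current = by_month[month]
        # Cumulative distinct count through this month minus the cumulative
        # distinct count through the prior period.
        new = len(seen_before | current) - len(seen_before)
        # Previous month's total + new accounts - this month's total.
        exited = len(previous) + new - len(current)
        result[month] = {"new": new, "exited": exited}
        seen_before |= current
        previous = current
    return result


# Snapshot rows from the question as (Date_ID, Acc_ID).
snapshots = (
    [(20160430, a) for a in (1, 2, 3)]
    + [(20160531, a) for a in (1, 2, 3, 4, 5, 6, 7)]
    + [(20160630, a) for a in (4, 5, 6, 7, 8)]
)
for month, counts in monthly_movement(snapshots).items():
    print(month, counts)
```

For June this gives 1 new account (8) and 3 exited accounts (1, 2 and 3), which is 7 + 1 - 5.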
I know
Format(DateAdd("s",-1,
DateAdd("q",DateDiff("q","1/1/1900",
DateAdd("yyyy",-1,Date())),"1/1/1900")),
"Short Date")
returns the last day of a quarter 1 year ago.
All of the NAV_Dates are the last day of each quarter, and have a value associated with them which makes the row unique. (Closing value titled as NetAssetValue)
How can I use that (or something similar) to get the value associated with the prior year-end quarterly date and compare it with the current quarter's ending value? Note: I do not have to use this; it's just the only SQL I know that returns something close to what I need.
The table's values would be set up similar to this:
+----------+--------------+
|NAV_Date |NetAssetValue |
+----------+--------------+
|12/31/2012| $4,000|
+----------+--------------+
|03/31/2013| $5,000|
+----------+--------------+
The Year to Date would then be (5,000/4,000) - 1 and saved as a percent. Another example would be:
+----------+--------------+
|NAV_Date |NetAssetValue |
+----------+--------------+
|12/31/2012| $4,000|
+----------+--------------+
|06/30/2013| $4,025|
+----------+--------------+
Year to Date calculation: (4,025/4,000) - 1 and saved as a percent.
I know it involves a nested subquery (or possibly more than one) and that we'd essentially have to capture the current quarter's end, use that value, and capture the prior year's quarter end and use that value also. Just not quite sure how to do it.
You were on the right track considering a correlated subquery for this. I think you want the subquery to return the year-end NetAssetValue for each quarterly record.
I hope the WHERE clause in this query makes the logic clear. However it would force the subquery to run the Year() function against every row in the table. Even so, you may be satisfied with the performance if the table is small enough.
SELECT
y1.NAV_Date,
y1.NetAssetValue,
(
SELECT TOP 1 y2.NetAssetValue
FROM YourTable AS y2
WHERE Year(y2.NAV_Date) = Year(y1.NAV_Date)
ORDER BY y2.NAV_Date DESC
) AS YearEndValue
FROM YourTable AS y1;
I think the following WHERE clause should offer better performance than the one above, assuming NAV_Date is indexed. However, you may find it less intuitive. If so, try the first version and then work on this one later if you need it:
WHERE y2.NAV_Date <= DateSerial(Year(y1.NAV_Date), 12, 31)
Beware, in the current year, the query will return NetAssetValue from the most recent quarterly record as YearEndValue, even though the year hasn't ended. I don't know what else you would want in that situation.
Finally, the query should give you NetAssetValue and YearEndValue for each quarter. All you have left is to add your calculation which uses those values.
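As a rough sketch of that last step, here is the calculation in Python using the sample values from the question. Two things are assumptions on my part: the function names (year_end_value, year_to_date) are made up, and the lookup uses the previous calendar year's last record as the base, because that is what the question's worked examples divide by (the subquery above matches on the same year as the row instead).

```python
from datetime import date


def year_end_value(rows, year):
    """Latest NetAssetValue recorded in the given calendar year, or None."""
    in_year = [(nav_date, value) for nav_date, value in rows if nav_date.year == year]
    if not in_year:
        return None
    return max(in_year, key=lambda r: r[0])[1]


def year_to_date(rows):
    """Return (nav_date, ytd) pairs, where ytd = value / prior year end - 1.

    ytd is None when there is no record for the prior year.
    """
    out = []
    for nav_date, value in rows:
        base = year_end_value(rows, nav_date.year - 1)
        ytd = None if base is None else value / base - 1
        out.append((nav_date, ytd))
    return out


# Sample values from the question as (NAV_Date, NetAssetValue).
rows = [
    (date(2012, 12, 31), 4000.0),
    (date(2013, 3, 31), 5000.0),
    (date(2013, 6, 30), 4025.0),
]
for nav_date, ytd in year_to_date(rows):
    print(nav_date, "n/a" if ytd is None else f"{ytd:.3%}")
```

For 03/31/2013 this works out to (5,000/4,000) - 1 = 25%, and for 06/30/2013 to (4,025/4,000) - 1 = 0.625%; the 12/31/2012 row has no prior year-end record in the sample, so it comes back as None.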