Finding column count with null data in two other columns - sql

I'm attempting to find the number of subjects that have only a baseline weight and no weight at 6 months or 12 months. The database provided has roughly 8000 entries and we need to create a query to find this information. They mention it can be achieved through joining, but I keep getting results that are tied to the 6-month and 12-month weights when I only need the subjects with a baseline weight. For instance, I easily found the data for the people that have both a baseline weight and a 6-month weight:
SELECT DEMO.ID, DEMO.BL_WGT, [SIX MOS].WEIGHT_6MOS
FROM DEMO RIGHT JOIN [SIX MOS] ON DEMO.ID = [SIX MOS].ID;
I can't for the life of me understand how to qualify this data to find only those entries with a baseline weight. Any help would be appreciated, thank you! Here is the exact question from the assignment.
Part V. Create a new table named WT_LOSS_ALL that has the following fields:
ID Number
Baseline Weight
6-Month Weight
12-Month Weight
Use the proper join(s) in order to answer the following questions:
How many participants have only Baseline weight? Baseline & 6 Month Weight? All 3 measures? (5 points)

Try this.
This assumes that a missing weight is stored as NULL and that the baseline weight always has a value. The names 6mos and 12mos stand in for the 6-month and 12-month weight columns of WT_LOSS_ALL; they are bracketed because names that begin with a digit generally need delimiting.
Baseline weight only:
SELECT COUNT(ID) FROM WT_LOSS_ALL WHERE [6mos] IS NULL AND [12mos] IS NULL;
Baseline weight and 6 months:
SELECT COUNT(ID) FROM WT_LOSS_ALL WHERE [6mos] IS NOT NULL AND [12mos] IS NULL;
All 3:
SELECT COUNT(ID) FROM WT_LOSS_ALL WHERE [6mos] IS NOT NULL AND [12mos] IS NOT NULL;
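For anyone who wants to try the whole flow end to end, here is a rough sketch using Python's built-in sqlite3 module rather than the database from the assignment. DEMO, [SIX MOS], BL_WGT and WEIGHT_6MOS come from the question; the TWELVE_MOS table, the WEIGHT_12MOS column and the sample rows are assumptions made up for illustration. The point it shows is that a LEFT JOIN starting from DEMO keeps every participant, whereas the RIGHT JOIN to [SIX MOS] in the question keeps only participants that have a 6-month row.

```python
import sqlite3


def count_weight_groups(demo, six, twelve):
    """Build WT_LOSS_ALL with LEFT JOINs and count the three groups."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE DEMO (ID INTEGER PRIMARY KEY, BL_WGT REAL)")
    cur.execute("CREATE TABLE [SIX MOS] (ID INTEGER PRIMARY KEY, WEIGHT_6MOS REAL)")
    # Hypothetical 12-month table; the real name is not given in the question.
    cur.execute("CREATE TABLE TWELVE_MOS (ID INTEGER PRIMARY KEY, WEIGHT_12MOS REAL)")
    cur.executemany("INSERT INTO DEMO VALUES (?, ?)", demo)
    cur.executemany("INSERT INTO [SIX MOS] VALUES (?, ?)", six)
    cur.executemany("INSERT INTO TWELVE_MOS VALUES (?, ?)", twelve)

    # LEFT JOIN from DEMO keeps participants with no follow-up rows;
    # their follow-up weights come back as NULL.
    cur.execute("""
        CREATE TABLE WT_LOSS_ALL AS
        SELECT d.ID, d.BL_WGT, s.WEIGHT_6MOS, t.WEIGHT_12MOS
        FROM DEMO AS d
        LEFT JOIN [SIX MOS] AS s ON d.ID = s.ID
        LEFT JOIN TWELVE_MOS AS t ON d.ID = t.ID
    """)

    def count(where):
        return cur.execute(
            "SELECT COUNT(ID) FROM WT_LOSS_ALL WHERE " + where
        ).fetchone()[0]

    counts = {
        "baseline_only": count("WEIGHT_6MOS IS NULL AND WEIGHT_12MOS IS NULL"),
        "baseline_and_6": count("WEIGHT_6MOS IS NOT NULL AND WEIGHT_12MOS IS NULL"),
        "all_three": count("WEIGHT_6MOS IS NOT NULL AND WEIGHT_12MOS IS NOT NULL"),
    }
    conn.close()
    return counts


# Made-up sample data: IDs 1 and 4 have only a baseline weight.
demo_rows = [(1, 200), (2, 180), (3, 220), (4, 190), (5, 210)]
six_rows = [(2, 175), (3, 215), (5, 205)]
twelve_rows = [(3, 210)]
print(count_weight_groups(demo_rows, six_rows, twelve_rows))
```

Note that a participant with a 12-month weight but no 6-month weight falls into none of the three groups, so the counts need not add up to the table size.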

Related

How to calculate on created fields? Why is the calculation wrong?

I am working on a workforce analysis project and created some fields with CASE WHEN conditional calculations in Google Data Studio. The new fields were created successfully, but I couldn't do further calculations based on them.
From my raw data, I generated start_headcount, new_hires, terminated and end_headcount by applying the CASE WHEN calculations. However, I failed at the next step, which is calculating the turnover rate and retention rate.
The formula for Turnover rate is
terms/((start_headcount+end_headcount)/2)
for retention is
end_headcount/start_headcount
However, the result is wrong. Part of my table is as below:
Supervisor  sheadcount  newhire  terms  eheadcount  turnover  Retention
A           1           3        1      3           200%      0%
B           6           2        2      6           200%      500%
C           6           1        3      4           600%      300%
The turnover rate for A should be 1/((1+3)/2)=50%, and for B it should be 2/((6+6)/2)=33.33%.
I don't know why it is going wrong. Can anyone help?
For example, this is what I wrote to compute start_headcount for each employee:
CASE
WHEN Last Hire Date<'2018-01-01' AND Termination Date>= '2018-01-01'
OR Last Hire Date<'2018-01-01' AND Termination Date IS NULL
THEN 1
ELSE 0
END
This means that an employee who meets the above condition gets a 1, and the employees are then grouped under a supervisor. I think this might be why the summed turnover rate is wrong: it is calculated on each record and then summed up, rather than calculated on the grouped data.
Most likely you are trying to do both steps within the same query, so newly created fields like start_headcount are not yet visible within the same SELECT statement. Instead, put the first calculation in a subquery, as in the example below:
#standardSQL
SELECT *, terms/((start_headcount+end_headcount)/2) AS turnover
FROM (
<query for your first step>
)
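As a rough illustration of the subquery pattern, the sketch below runs it in SQLite through Python's sqlite3 module, not in Data Studio itself. The flags table and its per-employee 0/1 rows are made up so that the totals match supervisors A and B in the question; the inner query aggregates per supervisor and the outer query applies the turnover formula to those aggregated values.

```python
import sqlite3


def turnover_by_supervisor(rows):
    """rows: (supervisor, start_headcount, terms, end_headcount) per employee."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table holding the per-employee 0/1 flags from the CASE step.
    conn.execute(
        "CREATE TABLE flags (supervisor TEXT, start_headcount INTEGER, "
        "terms INTEGER, end_headcount INTEGER)"
    )
    conn.executemany("INSERT INTO flags VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT supervisor,
               -- Dividing by 2.0 keeps the arithmetic in floating point.
               terms / ((start_headcount + end_headcount) / 2.0) AS turnover
        FROM (
            -- First step: aggregate the flags per supervisor.
            SELECT supervisor,
                   SUM(start_headcount) AS start_headcount,
                   SUM(terms) AS terms,
                   SUM(end_headcount) AS end_headcount
            FROM flags
            GROUP BY supervisor
        )
        ORDER BY supervisor
    """).fetchall()
    conn.close()
    return dict(result)


# Made-up employee rows whose sums match A (1, 1, 3) and B (6, 2, 6).
employee_flags = (
    [("A", 1, 1, 0)] + [("A", 0, 0, 1)] * 3
    + [("B", 1, 0, 1)] * 4 + [("B", 1, 1, 0)] * 2 + [("B", 0, 0, 1)] * 2
)
print(turnover_by_supervisor(employee_flags))
```

With these rows the turnover for A comes out as 0.5 and for B as one third, which matches the values the question expects.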

Expression for - Row by row variance on grouped datasets

I am having trouble creating an expression in SSRS.
I'd like to calculate the difference between two figures. The columns are in separate datasets and are grouped. They also show a total at the end of each group.
E.g.
          Dataset 1                       Dataset 2
Month     Workshops which Ran   Month     Workshops which Ran   Variance
Apr       40                    Apr       30                    10
May       50                    May       40                    10
Jun       45                    Jun       35                    10
Q1 Total  135                   Q1 Total  105                   30
The quarters then carry on, but you get the picture.
Is there a way to make an expression to calculate the variance column even though the data is grouped and in different datasets?
Any help would really be appreciated :)
Will
If we assume the following:
There could be gaps in either dataset, so we use a full outer join and coalesce.
You want the absolute difference for variance (no negatives).
You want to display the month and workshops which ran in all cases.
Neither dataset spans more than one year. (If they did, the aggregate datasets would need to contain the year along with the month, and the year would have to be included in the join.)
The Q1 total value (and the other totals) exists in both datasets and is spelled the same.
Then a query like this would work:
SELECT DS1.Month as [DS1 Month]
, DS1.[Workshops which Ran] as [DS1 Workshops which Ran]
, DS2.Month as [DS2 Month]
, DS2.[Workshops which Ran] as [DS2 Workshops which Ran]
, abs(coalesce(DS1.[Workshops which Ran],0) - coalesce(DS2.[Workshops which Ran],0)) as [Variance]
FROM Dataset1 DS1
FULL OUTER JOIN Dataset2 DS2
on DS1.Month = DS2.Month
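To make the join logic above concrete, here is a small sketch in plain Python, not SSRS or T-SQL. It mimics a full outer join on Month with a coalesce to 0 and an absolute difference. The function name and the dictionary layout are assumptions for illustration; the figures are the ones from the question.

```python
def variance_rows(ds1, ds2):
    """Full outer join of two {month: workshops} dicts with absolute variance."""
    # Union of the keys, keeping ds1's order first (the "full outer" part).
    months = list(ds1) + [m for m in ds2 if m not in ds1]
    rows = []
    for month in months:
        left = ds1.get(month)    # None plays the role of NULL
        right = ds2.get(month)
        # "or 0" plays the role of COALESCE(value, 0).
        variance = abs((left or 0) - (right or 0))
        rows.append((month, left, right, variance))
    return rows


dataset1 = {"Apr": 40, "May": 50, "Jun": 45, "Q1 Total": 135}
dataset2 = {"Apr": 30, "May": 40, "Jun": 35, "Q1 Total": 105}
for row in variance_rows(dataset1, dataset2):
    print(row)
```

A month that appears in only one dataset still produces a row, with None on the missing side and the other side's value as the variance.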
The best way is to create a single dataset with all your data in one place. If you can't do this for whatever reason, and the data in the datasets is more detailed than the aggregated data shown in your example, then check this post.
http://salvoz.com/blog/2013/05/27/sum-result-of-ssrs-lookupset-function/

Calculating the number of new ID numbers per month in powerpivot

My dataset provides a monthly snapshot of customer accounts. Below is a very simplified version:
Date_ID | Acc_ID
------- | -------
20160430| 1
20160430| 2
20160430| 3
20160531| 1
20160531| 2
20160531| 3
20160531| 4
20160531| 5
20160531| 6
20160531| 7
20160630| 4
20160630| 5
20160630| 6
20160630| 7
20160630| 8
Customers can open or close their accounts, and I want to calculate the number of 'new' customers every month. The number of 'exited' customers will also be helpful if this is possible.
So in the above example, I should get the following result:
Month | New Customers
------- | -------
20160430| 3
20160531| 4
20160630| 1
Basically I want to compare the distinct account numbers in the selected month and the previous month. Any that exist in the selected month but not in the previous month are new members; any that were there last month but not in the selected month have exited.
I've searched but I can't seem to find any similar problems, and I hardly know where to start myself - I've tried using CALCULATE and FILTER along with DATEADD to filter the data to get two months, and then count the unique values. My PowerPivot skills aren't up to scratch to solve this on my own however!
Getting the new users is relatively straightforward - I'd add a calculated column which counts rows for that user in earlier months and if they don't exist then they are a new user:
=IF(CALCULATE(COUNTROWS(data),
FILTER(data, [Acc_ID] = EARLIER([Acc_ID])
&& [Date_ID] < EARLIER([Date_ID]))) = BLANK(),
"new",
"existing")
Once this is in place you can simply write a measure for new_users:
=CALCULATE(COUNTROWS(data), data[customer_type] = "new")
Getting the cancelled users is a little harder because it means you have to be able to look backwards to the prior month - none of the time intelligence stuff in PowerPivot will work out of the box here as you don't have a true date column.
It's nearly always good practice to have a separate date table in your PowerPivot models and it is a good way to solve this problem - essentially the table should be 1 record per date with a unique key that can be used to create a relationship. Perhaps post back with a few more details.
This is an alternative method to Jacob's which also works. It avoids creating a calculated column, but I actually find the calculated column useful as a flag against other measures.
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, LASTDATE('Dates'[Date])
)
) - CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, FIRSTDATE('Dates'[Date]) - 1
)
)
It basically uses the dates table to make a distinct count of all Acc_ID from the beginning of time until the first day of the period of time selected, and subtracts that from the distinct count of all Acc_ID from the beginning of time until the last day of the period of time selected. This is essentially the number of new distinct Acc_ID, although you can't work out which Acc_ID's these are using this method.
I could then calculate 'exited accounts' by taking the previous month's total as 'existing accounts':
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATEADD('Dates'[Date], -1, MONTH)
)
Then adding the 'new accounts', and subtracting the 'total accounts':
=DISTINCTCOUNT('Accounts'[Acc_ID])
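To sanity-check the arithmetic outside PowerPivot, the month-over-month comparison described in the question can be sketched in plain Python with sets. This is only an illustration of the logic, not DAX; the function name is an assumption, and the snapshot rows are the sample data from the question.

```python
from collections import defaultdict


def new_and_exited(snapshots):
    """snapshots: iterable of (Date_ID, Acc_ID). Returns {Date_ID: (new, exited)}."""
    accounts_by_month = defaultdict(set)
    for date_id, acc_id in snapshots:
        accounts_by_month[date_id].add(acc_id)

    result = {}
    previous = set()  # no earlier month, so everything in the first month is new
    for date_id in sorted(accounts_by_month):
        current = accounts_by_month[date_id]
        new = len(current - previous)      # in this month, not in the previous one
        exited = len(previous - current)   # in the previous month, not in this one
        # Same identity as above: previous total + new - current total = exited.
        assert len(previous) + new - len(current) == exited
        result[date_id] = (new, exited)
        previous = current
    return result


sample = (
    [(20160430, a) for a in (1, 2, 3)]
    + [(20160531, a) for a in (1, 2, 3, 4, 5, 6, 7)]
    + [(20160630, a) for a in (4, 5, 6, 7, 8)]
)
print(new_and_exited(sample))
```

On the sample data this gives 3, 4 and 1 new accounts for the three months, matching the expected result in the question, and 3 exited accounts in 20160630.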

SQL change over time query

I have created 2 tables. One table has 4 fields: a unique name, a date and 3 figures. The second table contains the same fields but records the output of a merge function, so it also has the date on which the update or insert happened. What I want to do is retrieve either the difference between 2 days or the totals for the 2 days, to work out how much the value has changed over the day. The merge function only updates a row if a value has changed, or inserts a row if the value is new.
So far I have this:
select sum(Change_Table_1.Disk_Space) as total,
Change_Table_1.Date_Updated
from VM_Info
left join Change_Table_1
on VM_Info.VM_Unique = Change_Table_1.VM_Unique
where VM_Info.Agency = 'test'
group by Change_Table_1.Date_Updated
But this just returns the sum of the values updated on that day rather than the difference between the two days. One answer to this question would be to add all new records to the table, but this would contain a number of duplicates. So in my head, what I want it to do is loop over the current figures for the day, then loop over the next day, but also include all values that haven't been updated. Sorry if I haven't explained this well. What I want to achieve is some sort of change of the total over time. If it's poor design, I'm in a position to accept that too.
Any help is much appreciated.
Maybe this explains it better: show me the total for day 1; for day 2, show the same value if it hasn't changed and the new value if it has, and so on.
OK, to elaborate further.
The Change_Table looks like this:
vm   date created  action  value_Field1  value_field_2  Disk_Space
abc  14/10/2013    insert  5             5              30
def  14/10/2013    insert  5             5              75
abc  15/10/2013    update  5             5              75
So the output I want for the 14th is the total of the last column, 105. On the 15th, abc has changed from 30 to 75; def hasn't changed but still needs to be included, giving 150.
So the output would look like this:
date disk_Space
14/10/2013 105
15/10/2013 150
Does this help? If not, can you provide a few rows of sample data, and an example of the desired result?
select
(VM_Info.Disk_Space - Change_Table_1.Disk_Space) as DiskSpaceChange,
Change_Table_1.Date_Updated
from
VM_Info
left join Change_Table_1 on VM_Info.VM_Unique = Change_Table_1.VM_Unique and VM_Info.Date = Change_Table_1.Date_Updated
where
VM_Info.Agency = 'test'
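For the running total described in the question (each day's total uses the latest known Disk_Space of every VM, whether or not it changed that day), one possible approach is a correlated subquery that picks each VM's most recent row as of each date. The sketch below tries this in SQLite via Python's sqlite3 module. It assumes the change table has the columns VM_Unique, Date_Updated and Disk_Space used in the question's query, stores dates as ISO strings so they compare correctly, and leaves out the join to VM_Info and the Agency filter.

```python
import sqlite3


def disk_space_by_day(changes):
    """changes: (VM_Unique, Date_Updated as 'YYYY-MM-DD', Disk_Space) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Change_Table_1 "
        "(VM_Unique TEXT, Date_Updated TEXT, Disk_Space INTEGER)"
    )
    conn.executemany("INSERT INTO Change_Table_1 VALUES (?, ?, ?)", changes)
    rows = conn.execute("""
        SELECT d.Date_Updated, SUM(c.Disk_Space) AS total
        FROM (SELECT DISTINCT Date_Updated FROM Change_Table_1) AS d
        CROSS JOIN Change_Table_1 AS c
        -- Keep only each VM's latest row on or before the reporting date.
        WHERE c.Date_Updated = (
            SELECT MAX(c2.Date_Updated)
            FROM Change_Table_1 AS c2
            WHERE c2.VM_Unique = c.VM_Unique
              AND c2.Date_Updated <= d.Date_Updated
        )
        GROUP BY d.Date_Updated
        ORDER BY d.Date_Updated
    """).fetchall()
    conn.close()
    return rows


sample_changes = [
    ("abc", "2013-10-14", 30),
    ("def", "2013-10-14", 75),
    ("abc", "2013-10-15", 75),
]
print(disk_space_by_day(sample_changes))
```

On the sample rows from the question this yields 105 for the 14th and 150 for the 15th. The sketch also assumes at most one row per VM per date; several changes on the same day would all be counted.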

Dax formula to calculate cumulative students

I am building my first cube in SSAS 2012 Tabular modeling. I have one fact table that contains the following columns:
TermDate StudentKey PaperKey marks CumulativeNoOfStudents
20100601 1 1 70 2
20100601 2 1 70 2
20100601 3 1 69 3
20100601 4 2 68 1
Now I need to generate the cumulative number of students (the 5th column) as a calculated column against each row using DAX.
Can someone help me build the DAX formula, please?
On the basis that your StudentKey is numeric, sequential and unique you can use the following:
=CALCULATE(COUNTROWS(Table), FILTER(Table,Table[StudentKey]<=EARLIER(Table[StudentKey])))
Assuming your table is called 'Table'
HTH
Jacob
Under similar assumptions (StudentKey is numeric, your date table is DimDate with Date as its unique column, and the fact table is named FactStudent), you can also use the formula below.
Cumulative No of Students :=CALCULATE (CountRows(FactStudent), FILTER(ALL(DimDate[Date]), DimDate[Date] <= MAX(DimDate[Date])))
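The sample values in the question (2, 2, 3, 1) do not look like a running count over StudentKey. One reading that reproduces them, and it is only a guess from the four sample rows, is: for each row, count the students with the same TermDate and PaperKey whose marks are greater than or equal to that row's marks. A plain Python sketch of that reading, useful for checking whatever DAX formula you end up with:

```python
def cumulative_students(rows):
    """rows: (TermDate, StudentKey, PaperKey, marks). Returns one count per row."""
    counts = []
    for term, _student, paper, marks in rows:
        # Count rows in the same term and paper with marks at least as high.
        count = sum(
            1
            for other_term, _other_student, other_paper, other_marks in rows
            if other_term == term and other_paper == paper and other_marks >= marks
        )
        counts.append(count)
    return counts


sample = [
    (20100601, 1, 1, 70),
    (20100601, 2, 1, 70),
    (20100601, 3, 1, 69),
    (20100601, 4, 2, 68),
]
print(cumulative_students(sample))
```

If this reading is right, the filter in a calculated column would need to compare PaperKey, TermDate and marks against the current row rather than StudentKey alone.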