MDX Sum Function - sum

We have this data...
Date PhonesSold
1/1/2011 - 1
1/2/2011 - 3
1/3/2011 - 2
1/4/2011 - 4
Is it possible to create an MDX query that results in the following results?
Date TotalPhonesSold
1/1/2011 - 1
1/2/2011 - 4
1/3/2011 - 6
1/4/2011 - 10
I'm guessing I'll use the SUM function. It has this syntax.
Sum( Set_Expression [ , Numeric_Expression ] )
Now, how do I specify the set?

So you want a running total?
Speed up Running Total MDX calculated measure?
http://www.sqldev.org/sql-server-analysis-services/mdx-running-total-calculated-member-18945.shtml
http://social.msdn.microsoft.com/Forums/en/sqlanalysisservices/thread/100521f2-8533-4fd1-9071-96dc5373572c
This looks like a good one for performance:
http://social.msdn.microsoft.com/Forums/en/sqlanalysisservices/thread/fcd061a7-e941-4c4a-9945-59e1b48c422e
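For each date, a running total sums every value from the first date up to and including that date, so that range of dates is what the set passed to Sum has to describe. The following is a plain Python illustration of that idea (not MDX), using only the four sample rows from the question; the variable names are made up for the sketch.

```python
from itertools import accumulate

# Sample rows from the question: (date, phones sold on that day).
daily = [("1/1/2011", 1), ("1/2/2011", 3), ("1/3/2011", 2), ("1/4/2011", 4)]

# accumulate() yields the sum of all values up to and including each position,
# which mirrors "first date through current date" evaluated row by row.
running = list(accumulate(sold for _, sold in daily))

running_total = [(day, total) for (day, _), total in zip(daily, running)]

for day, total in running_total:
    print(f"{day} - {total}")
```

The printed rows match the TotalPhonesSold column requested in the question.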

Related

Access Query: Match Two FKs, Select Record with Max (Latest) Time, Return 3d Field From Record

I have an Access table (Logs) like this:
pk  modID  relID  DateTime        TxType
1   1234   22.3   10/1/22 04:00   1
2   1234   23.1   10/10/22 06:00  1
3   1234   23.1   10/11/22 07:00  2
4   1234   23.1   10/12/22 08:00  3
5   4321   22.3   10/2/22 06:00   7
6   4321   23.1   10/10/22 06:00  1
7   4321   23.1   10/11/22 07:30  3
Trying to write a query as part of a function that searches this table:
for all records matching a given modID and relID (e.g. 1234 and 23.1),
picks the most recent one (the MAX of DateTime),
returns the TxType for that record.
However, I am a bit new to Access and its query structure is vexing me. I landed on the query below, but because I have to include a Total/Aggregate function for TxType, I had to choose either Group By (not what I want) or Last (closer, but it returns junk results). The SQL for my query is currently:
SELECT Last(Logs.TxType) AS LastOfTxType, Max(Logs.DateTime) AS MaxOfDateTime
FROM Logs
GROUP BY Logs.dmID, Logs.relID
HAVING (((Logs.dmID)=[EnterdmID]) AND ((Logs.relID)=[EnterrelID]));
It returns a TxType value when I pass it the right parameters, but not from the correct record. I would like to get rid of the Last() bit, but if I remove it Access complains that TxType is not part of an aggregate function.
Can anyone point me in the right direction here?
Have you tried this?
SELECT TOP 1 TxType
FROM Logs
WHERE (((Logs.dmID)=[EnterdmID]) AND ((Logs.relID)=[EnterrelID]))
ORDER BY DateTime DESC;
That will give you the latest single data row based on your DateTime field and other criteria.
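To see why ordering by DateTime descending and keeping the first row picks the right TxType, here is a small sketch that replays the sample table in an in-memory SQLite database from Python. SQLite is only a stand-in for Access here: it uses LIMIT 1 where Access uses TOP 1, the timestamps are rewritten in ISO format (assuming the originals are month/day/year) so that text ordering matches chronological ordering, and the column is named modID as in the table. The helper latest_tx_type is a hypothetical name for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Logs (pk INTEGER, modID INTEGER, relID TEXT, '
    '"DateTime" TEXT, TxType INTEGER)'
)
# Sample rows from the question, with dates assumed to be m/d/yy.
rows = [
    (1, 1234, "22.3", "2022-10-01 04:00", 1),
    (2, 1234, "23.1", "2022-10-10 06:00", 1),
    (3, 1234, "23.1", "2022-10-11 07:00", 2),
    (4, 1234, "23.1", "2022-10-12 08:00", 3),
    (5, 4321, "22.3", "2022-10-02 06:00", 7),
    (6, 4321, "23.1", "2022-10-10 06:00", 1),
    (7, 4321, "23.1", "2022-10-11 07:30", 3),
]
conn.executemany("INSERT INTO Logs VALUES (?, ?, ?, ?, ?)", rows)


def latest_tx_type(mod_id, rel_id):
    """Return TxType of the most recent matching row, or None if nothing matches."""
    row = conn.execute(
        'SELECT TxType FROM Logs WHERE modID = ? AND relID = ? '
        'ORDER BY "DateTime" DESC LIMIT 1',
        (mod_id, rel_id),
    ).fetchone()
    return row[0] if row else None


print(latest_tx_type(1234, "23.1"))
```

No aggregate function is needed because the sort, not a GROUP BY, decides which record is returned.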

how to handle the missing values like this and date format for regression?

I want to build a regression model from this dataset (the first two columns are the independent variables and the last one is the dependent variable). I have imported the dataset using dataset=pd.read_csv('data.csv').
I have built models before, but never with a date-formatted column as an independent variable, so how should I handle the date format when building the regression model?
Also, how should I handle the 0 values in the dataset?
My dataset, in .csv format, looks like this:
Month/Day, Sales, Revenue
01/01 , 0 , 0
01/02 , 100000, 0
01/03 , 400000, 0
01/06 ,300000, 0
01/07 ,950000, 1000000
01/08 ,10000, 15000
01/10 ,909000, 1000000
01/30 ,12200, 12000
02/01 ,950000, 1000000
02/09 ,10000, 15000
02/13 ,909000, 1000000
02/15 ,12200, 12000
I don't know how to handle this date format and the 0 values.
Here's a start. I saved your data into a file and stripped all the whitespace.
import pandas as pd
df = pd.read_csv('20180112-2.csv')
df['Month/Day'] = pd.to_datetime(df['Month/Day'], format = '%m/%d')
print(df)
Output:
Month/Day Sales Revenue
0 1900-01-01 0 0
1 1900-01-02 100000 0
2 1900-01-03 400000 0
3 1900-01-06 300000 0
4 1900-01-07 950000 1000000
5 1900-01-08 10000 15000
6 1900-01-10 909000 1000000
7 1900-01-30 12200 12000
8 1900-02-01 950000 1000000
9 1900-02-09 10000 15000
10 1900-02-13 909000 1000000
11 1900-02-15 12200 12000
The year defaults to 1900 since it is not provided in your data. If you need to change it, that's an additional, different question. To change the year, see: Pandas: Change day
# Replace the default year 1900 with the year you need (2017 here)
df['Month/Day'] = df['Month/Day'].apply(lambda d: d.replace(year=2017))
print(df)
Output:
Month/Day Sales Revenue
0 2017-01-01 0 0
1 2017-01-02 100000 0
2 2017-01-03 400000 0
3 2017-01-06 300000 0
4 2017-01-07 950000 1000000
5 2017-01-08 10000 15000
6 2017-01-10 909000 1000000
7 2017-01-30 12200 12000
8 2017-02-01 950000 1000000
9 2017-02-09 10000 15000
10 2017-02-13 909000 1000000
11 2017-02-15 12200 12000
Finally, to find the correlation between the numeric columns, just use corr() on them:
print(df[['Sales', 'Revenue']].corr())
Output:
Sales Revenue
Sales 1.000000 0.953077
Revenue 0.953077 1.000000
How to handle missing data?
There are a number of ways to replace it: with the mean, with the median, with a moving-average window, or even with an RF (random forest) approach or similar (MICE and so on).
For the 'Sales' column you can try any of these methods.
For the 'Revenue' column it is better not to use any of them, especially if you have many missing values (it will harm the model). Just remove the rows with missing values in the 'Revenue' column.
By the way, a few ML methods accept missing values: XGBoost and, to some extent, trees/forests. For the latter you may replace zeroes with some very different value such as -999999.
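As a minimal sketch of that advice, the snippet below assumes that a 0 in the question's data really means "missing" (the question does not confirm this). It fills the missing Sales values with the column median and drops the rows where Revenue is missing. The variable names are illustrative only.

```python
import pandas as pd

# Sales and Revenue columns from the question's CSV.
df = pd.DataFrame({
    "Sales": [0, 100000, 400000, 300000, 950000, 10000,
              909000, 12200, 950000, 10000, 909000, 12200],
    "Revenue": [0, 0, 0, 0, 1000000, 15000,
                1000000, 12000, 1000000, 15000, 1000000, 12000],
})

# Assumption: zeros stand for missing values, so turn them into NaN.
clean = df.mask(df == 0)

# Independent variable: impute with the median of the observed values.
sales_median = clean["Sales"].median()
clean["Sales"] = clean["Sales"].fillna(sales_median)

# Target variable: do not impute, drop the rows instead.
clean = clean.dropna(subset=["Revenue"])

print(clean)
```

Swapping the median for a mean or a rolling-window average only changes the line that computes the fill value.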
What to do with the data?
Many things related to feature engineering can be done here:
1. Day of week
2. Weekday or weekend
3. Day in month (number)
4. Pre- or post-holiday
5. Week number
6. Month number
7. Year number
8. Indicators of other factors (for example, if it is fruit sales data you can add some boolean columns related to it)
9. And so on...
Almost every feature here should be preprocessed with one-hot encoding.
And, of course, remove correlated features if you use linear models.
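A few of the listed features can be derived with the pandas datetime accessor, as in the sketch below. The four dates and the column names are chosen only for illustration, and which columns to one-hot encode is a modelling choice rather than a rule.

```python
import pandas as pd

# A handful of illustrative dates (year assumed to be 2017).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-07", "2017-02-15"])
})

df["day_of_week"] = df["Date"].dt.dayofweek        # Monday=0 ... Sunday=6
df["is_weekend"] = df["day_of_week"] >= 5          # Saturday or Sunday
df["day_of_month"] = df["Date"].dt.day
df["week_number"] = df["Date"].dt.isocalendar().week.astype(int)  # ISO week
df["month"] = df["Date"].dt.month

# One-hot encode the categorical calendar features.
features = pd.get_dummies(df.drop(columns="Date"), columns=["day_of_week", "month"])

print(features.columns.tolist())
```

Holiday flags and domain-specific indicators need external information, so they are left out here.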

Checking for maximum value in the same column repeatedly and replacing with new found maximum

I'm trying to find the maximum value in a column and carry it forward until a larger value is found, at which point the new maximum replaces it. This needs to be done in SQL, as the data lives in MS SQL Server.
The algorithm would basically be to keep a global variable for MaxDelay and keep overwriting it with each new maximum, or to run two 'for' loops, one for the current row and one for row-1, until the next project is found.
Should I be using a mix of a recursive query and a function in SQL?
I tried Lag(), but I'm unable to take the Max() of Lag() due to the windowed-function limitation.
SELECT *,
(select max(v) from (VALUES([Delay],Lag([Delay], 1) OVER(ORDER BY [Milestones], [Project Name])))as value(v)) as [MaxDate]
FROM dbo.[Scenario Testing with Previous Row]
error:
Msg 4108, Level 15, State 1, Line 2
Windowed functions can only appear in the SELECT or ORDER BY clauses.
I would appreciate it if someone could give me some pointers.
Project Name  Milestones  Baseline Date  Actual Date  Delay  Max Delay  Worst Case Date
Project 1     MS_1        12/12/2016     15/12/2016   3      3          15/12/2016
Project 1     MS_2        14/12/2016     16/12/2016   2      3          17/12/2016
Project 1     MS_3        31/12/2016     09/01/2017   9      9          09/01/2017
Project 1     MS_4        11/01/2017     12/01/2017   1      9          20/01/2017
Project 1     MS_5        21/01/2017     24/01/2017   3      9          30/01/2017
Project 1     MS_6        01/02/2017     15/02/2017   14     14         15/02/2017
Project 1     MS_7        15/02/2017     16/02/2017   1      14         01/03/2017
Project 1     MS_8        26/02/2017     26/02/2017   0      14         12/03/2017
Project 1     MS_9        31/03/2017     31/03/2017   0      14         14/04/2017
If you want a cumulative max, then use the appropriate function. Something like this:
select t.*,
max(delay) over (order by [Milestones], [Project Name]) as running_max
from dbo.[Scenario Testing with Previous Row] t;
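The cumulative max computed by the window function can be checked against the sample table with a short Python sketch. The worst-case date rule used here (baseline date plus the running max delay in days) is inferred from the sample rows, not stated in the question.

```python
from datetime import date, timedelta
from itertools import accumulate

# (milestone, baseline date, delay in days) from the sample table.
rows = [
    ("MS_1", date(2016, 12, 12), 3),
    ("MS_2", date(2016, 12, 14), 2),
    ("MS_3", date(2016, 12, 31), 9),
    ("MS_4", date(2017, 1, 11), 1),
    ("MS_5", date(2017, 1, 21), 3),
    ("MS_6", date(2017, 2, 1), 14),
    ("MS_7", date(2017, 2, 15), 1),
    ("MS_8", date(2017, 2, 26), 0),
    ("MS_9", date(2017, 3, 31), 0),
]

# accumulate(..., max) keeps the largest delay seen so far at each row.
running_max = list(accumulate((delay for _, _, delay in rows), max))

# Assumed rule: worst case date = baseline date + running max delay.
worst_case = [
    baseline + timedelta(days=m)
    for (_, baseline, _), m in zip(rows, running_max)
]

for (name, _, _), m, w in zip(rows, running_max, worst_case):
    print(name, m, w.strftime("%d/%m/%Y"))
```

Both derived columns agree with the Max Delay and Worst Case Date values shown in the question.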

Max date among records and across tables - SQL Server

I tried my best to provide the data in table format, but it does not render well on Stack Overflow, so I am attaching a snapshot of the 2 tables. Apologies for the formatting.
SQL Server 2012
MS Table
mId  tdId  name                      dueDate
1    1     forecastedDate            1/1/2015
2    1     hypercareDate             11/30/2016
3    1     LOE 1                     7/4/2016
4    1     LOE 2                     7/4/2016
5    1     demo for yy test          10/15/2016
6    1     Implementation – testing  7/4/2016
7    1     Phased Rollout – final    7/4/2016
8    2     forecastedDate            1/7/2016
9    2     hypercareDate             11/12/2016
10   2     domain - Forte            NULL
11   2     Fortis completion         1/1/2016
12   2     Certification             NULL
13   2     Implementation            7/4/2016
-----------------------------------------------
MSRevised
mId  revisedDate
1    1/5/2015
1    1/8/2015
3    3/25/2017
2    2/1/2016
2    12/30/2016
3    4/28/2016
4    4/28/2016
5    10/1/2016
6    7/28/2016
7    7/28/2016
8    4/28/2016
9    8/4/2016
9    5/28/2016
11   10/4/2016
11   10/5/2016
13   11/1/2016
----------------------------------------
The required output is
1. I will be passing in the 'tId' number, for instance 1; let's call it tId (1).
2. I want to compare all of tId (1)'s milestones (except hypercareDate) with tId (1)'s forecastedDate milestone.
3. Return whether any milestone date (other than hypercareDate) is greater than the forecastedDate.
The above 3 steps are simple, but I first have to compare each milestone's date with its corresponding revised dates, if any, from the revised table, and pick the max date among them; that max is what needs to be compared with the forecastedDate.
I managed to solve this. Posting the answer in the hope that it helps somebody.
-- Insert the milestones to check into a temp table
-- (@maxForecastedDate and @maxmilestoneDate are assumed to be declared earlier)
INSERT INTO #mstab
SELECT [mId]
, [tId]
, [msDate]
FROM [dbo].[MS]
WHERE ([msName] NOT LIKE 'forecastedDate' AND [msName] NOT LIKE 'hypercareDate');
-- This scalar function gets the max date between the forecasted due date and the forecasted revised date
SELECT @maxForecastedDate = [dbo].[fnGetMaxDate] ( 'forecastedDate');
-- This gets the max date from the temp table, to be compared with the forecasted date
SET @maxmilestoneDate = (SELECT MAX(maxDate)
FROM ( SELECT ms.msDueDate AS dueDate
, mr.msRevisedDate AS revDate
FROM #mstab as ms
LEFT JOIN [MSRev] as mr on ms.msId = mr.msId
) maxDate
UNPIVOT (maxDate FOR DateCols IN (dueDate, revDate))up );
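The logic of the steps can be traced in plain Python on the sample rows for tdId 1. This is only a sketch of the comparison, not a translation of the T-SQL above: the helper effective_date and the variable names are made up here, and dates are assumed to be month/day/year.

```python
from datetime import date

# Rows for tdId 1 from the MS table: (mId, name, dueDate).
milestones = [
    (1, "forecastedDate", date(2015, 1, 1)),
    (2, "hypercareDate", date(2016, 11, 30)),
    (3, "LOE 1", date(2016, 7, 4)),
    (4, "LOE 2", date(2016, 7, 4)),
    (5, "demo for yy test", date(2016, 10, 15)),
    (6, "Implementation – testing", date(2016, 7, 4)),
    (7, "Phased Rollout – final", date(2016, 7, 4)),
]
# Rows from MSRevised for those milestones: (mId, revisedDate).
revised = [
    (1, date(2015, 1, 5)), (1, date(2015, 1, 8)), (3, date(2017, 3, 25)),
    (2, date(2016, 2, 1)), (2, date(2016, 12, 30)), (3, date(2016, 4, 28)),
    (4, date(2016, 4, 28)), (5, date(2016, 10, 1)), (6, date(2016, 7, 28)),
    (7, date(2016, 7, 28)),
]


def effective_date(m_id, due_date):
    """Latest of the due date and any revised dates for one milestone."""
    candidates = [d for rid, d in revised if rid == m_id]
    if due_date is not None:  # dueDate can be NULL in the MS table
        candidates.append(due_date)
    return max(candidates) if candidates else None


effective = {name: effective_date(m_id, due) for m_id, name, due in milestones}
forecast = effective["forecastedDate"]

# Every milestone except forecastedDate and hypercareDate takes part.
others = [
    d for name, d in effective.items()
    if name not in ("forecastedDate", "hypercareDate") and d is not None
]
max_milestone_date = max(others)
exceeds_forecast = max_milestone_date > forecast

print(forecast, max_milestone_date, exceeds_forecast)
```

For tdId 1 the revised forecast is 1/8/2015 and the latest milestone date is 3/25/2017, so the check reports that a milestone exceeds the forecast.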

Counting Instances of Unique Value in Field

Suppose you have a table in SQL:
Prices
------
13.99
14.00
52.00
52.00
52.00
13.99
How would you count the number of times each DISTINCT value has been entered? An example of such a count would output:
13.99 - 2 times.
14.00 - 1 time.
52.00 - 3 times.
OR perhaps:
3 (i.e. 13.99, 14.00, 52.00)
Can anyone advise? Cheers.
How about:
SELECT Prices, COUNT(*) FROM TheTable GROUP BY Prices
I can't say I've tried it on MySQL, but I'd expect it to work...
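Both requested outputs can be tried quickly from Python with an in-memory SQLite database. SQLite is used here only because it ships with Python; it is an assumption of the sketch, not the asker's database. The per-value counts come from the GROUP BY query above (with an ORDER BY added for stable output), and the single number comes from COUNT(DISTINCT ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (Prices REAL)")
conn.executemany(
    "INSERT INTO TheTable VALUES (?)",
    [(13.99,), (14.00,), (52.00,), (52.00,), (52.00,), (13.99,)],
)

# How many times each distinct price occurs.
per_value = conn.execute(
    "SELECT Prices, COUNT(*) FROM TheTable GROUP BY Prices ORDER BY Prices"
).fetchall()

# How many distinct prices there are in total.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT Prices) FROM TheTable"
).fetchone()[0]

for price, times in per_value:
    print(f"{price:.2f} - {times}")
print(distinct_count)
```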