Named set vs 'grouped set' in MDX - ssas

To get a named set, for example the years between 2015-2018 (four values -- 2015, 2016, 2017, 2018) I can do:
WITH set [2015-2018] as {[Season].[Season].[Season].&[2015]:[Season].[Season].[Season].&[2018]}
SELECT {
[Measures].[Wins]
} ON COLUMNS, {
[2015-2018]
} ON ROWS
FROM [Nfl2]
And it gives me:
Wins
2015 256
2016 254
2017 256
2018 254
How would I create a single grouped set (or whatever it is termed in MDX) where I could do the following:
Wins
2015 256
2016 254
2017 256
2018 254
2015-2018 1020 *** how to do this in mdx?

In MDX the term is not "grouped set" but member: your aim is to create a calculated member that holds the sum of wins from 2015 to 2018. To do that, specify an aggregation function for the calculated member [2015-2018], like this:
WITH
Member [Season].[Season].[All].[2015-2018] AS SUM({[Season].[Season].&[2015]:[Season].[Season].&[2018]})
set [2015-2018-All] as {[Season].[Season].&[2015]:[Season].[Season].&[2018],[Season].[Season].[All].[2015-2018] }
SELECT {
[Measures].[Wins]
} ON COLUMNS, {
[2015-2018-All]
} ON ROWS
FROM [Nfl2]

Related

Different filters for rows and columns in MDX

I have the following MDX query
SELECT
{[Year].[2020],[Year].[2019],[Year].[2020]} on 0,
{[Sales].[GER],[Position].[EU],[Position].[US],[Position].[BL]} on 1
FROM
[DB]
WHERE ([Period].[FULL],[Content].[ALL],[CUR].[EUR])
Returning this table:
YEAR 2020 (€) 2019 (€) 2020 (€)
Position
GER
EU
US
BL
However, I want to apply a different filter to each row and each column. For columns, I want the currency to change, and for rows I want the Period to change.
My table should look therefore like this:
YEAR 2020 ($) 2019 (€) 2020 (€)
Position
GER (YE)
EU (YB)
US (YE)
BL (YB)
I tried using subselect and filter but it did not work.
Any ideas?
The row and column axes are defined by sets. A set contains tuples, and a tuple contains members. So here the columns axis is a set of 3 tuples, and each tuple specifies two members: the year and the currency. That's how to apply a different filter to each column.
SELECT
{
([Year].[2020] ,[CUR].[USD]),
([Year].[2019] ,[CUR].[EUR]),
([Year].[2020] ,[CUR].[EUR])
} on 0,
{[Sales].[GER],[Position].[EU],[Position].[US],[Position].[BL]} on 1
FROM
[DB]
WHERE ([Period].[FULL],[Content].[ALL])

How to group years in decades in sqlite3 in jupyter notebook?

I'm supposed to find the decade D with the largest number of films and the total number of films in D. A decade is a sequence of 10 consecutive years. For example, say in your database you have movie information starting from 1965. Then the first decade is 1965, 1966, ..., 1974; the second one is 1967, 1968, ..., 1976 and so on.
I'm supposed to implement this in a Jupyter notebook where I imported sqlite3.
I wrote the following code for it.
Select count(*) as total_films,concat(decade,'-',decade+9)
FROM (Select floor(YEAR('year')/10)*10 as decade FROM movie) t
GROUP BY decade
Order BY total_films desc;
However, the notebook threw errors like "no such function: floor", "no such function: Year" and "no such function: concat".
Therefore, after going through sqlite documentation I changed code to
Select count(*) as total_films,decade||'-'||decade+9
FROM (Select cast(strftime('%Y',year)/10 as int)*10 as decade FROM movie) t
GROUP BY decade
Order BY total_films desc;
However, I got an incorrect output :
count(*) decade||'-'||decade+9
0 117 NaN
1 3358 -461.0
Would appreciate insights on why this is happening.
Updating question after going through comments by c.Perkins
1) I began, checking the type of year column
using the query PRAGMA table_info(movie)
Got the following result
cid name type notnull dflt_value pk
0 0 index INTEGER 0 None 0
1 1 MID TEXT 0 None 0
2 2 title TEXT 0 None 0
3 3 year TEXT 0 None 0
4 4 rating REAL 0 None 0
5 5 num_votes INTEGER 0 None 0
Since the year column is of type TEXT, I cast it to int with the cast function and checked for nulls or NaN using SELECT CAST(year as int) as yr FROM MOVIE WHERE yr is null
I didn't get any results, therefore it appears there are no nulls. However, on using the query SELECT CAST(year as int) as yr FROM MOVIE order by yr asc I see a lot of zeros in the year column
yr
0 0
1 0
2 0
3 0
4 0
-
-
-
-
3445 2018
3446 2018
3447 2018
3448 2018
3449 2018
3450 2018
From the above we see that the column holds the plain year value rather than a date or timestamp, which is why strftime('%Y', year) did not yield a usable result, as mentioned in the comment.
Therefore, keeping all the above in mind, I changed the inner query to
SELECT (CAST( (year/10) as int) *10) as decade FROM MOVIE WHERE decade!=0 order by decade asc
Output for the above query :
decade
0 1930
1 1930
2 1930
3 1930
4 1930
5 1930
6 1940
7 1940
8 1940
-
-
-
3353 2010
3354 2010
3355 2010
3356 2010
3357 2010
Finally, placing this inner query, in the first query I wrote above
Select count(*) as total_films,decade||'-'||decade+9 as period
FROM (SELECT (CAST( (year/10) as int) *10) as decade FROM MOVIE WHERE decade!=0 order by decade asc)
GROUP BY decade
Output :
total_films period
0 6 1939
1 12 1949
2 71 1959
3 145 1969
4 254 1979
5 342 1989
6 551 1999
7 959 2009
8 1018 2019
As far as I can see, the only issue is with the period column: instead of showing 1930-1939 it only shows 1939, and so on. If using || is not right, is there any other function that could be used? concat is not working.
Thanks in advance.
Pending updates to the question as requested in comments, here are a few immediate points that might help solve the problem without having all details:
Does the movie.year column contain null values? Likewise non-numeric or non-date values? The NaN (Not A Number) result likely indicates null or invalid data in the source. (Technically there is no such NaN value in SQLite, so I'm assuming that the question data is copied from some other data grid or processed output.)
What type of data is in the column movie.year? Does it contain full ISO-8601 date strings or a Julian-date numeric value? Or does it only contain the year (as the column name implies)? If it only contains the year (as a string or integer), then the function call like strftime('%Y', year) will NOT return what you expect and is unnecessary. Just refer to the column directly.
I suspect this is where the -461.0 is coming from.
The operator / is an "integer division" operator if both operands are integers. A valid isolated year value will be an integer and the literal 10 is of course an integer, so integer division will automatically drop any decimal part and return only the integer part of the division without having to explicitly cast to integer.
According to the sqlite docs, the concatenation operator || has higher precedence than +. That means in the expression decade||'-'||decade+9, the concatenation is applied first, so that one possible intermediate would be '1930-1930'+9. (Technically I would consider this result undefined since the string value does not contain a basic data type. In practice on my system, the string is apparently interpreted as 1930 and the overall result is the integer value 1939. Either way you will get unexpected bogus results rather than the desired string.) Wrapping the addition in parentheses, as in decade||'-'||(decade+9), forces the arithmetic to happen before the concatenation.
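To make the precedence point concrete, here is a minimal sketch using Python's built-in sqlite3 module, as in the notebook setup described in the question. The table contents are made-up sample rows (an assumption, not the asker's data), and the empty string stands in for the values that cast to 0. It keeps the question's own calendar-decade grouping and only parenthesizes the addition.

```python
import sqlite3

# In-memory database with a tiny, made-up stand-in for the movie table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, year TEXT)")
conn.executemany(
    "INSERT INTO movie (title, year) VALUES (?, ?)",
    [("a", "1931"), ("b", "1938"), ("c", "1945"),
     ("d", "2011"), ("e", "2015"), ("f", "2018"),
     ("g", "")],  # empty string casts to 0 and is filtered out below
)

query = """
SELECT COUNT(*) AS total_films,
       decade || '-' || (decade + 9) AS period  -- add first, then concatenate
FROM (SELECT CAST(year AS INTEGER) / 10 * 10 AS decade FROM movie)
WHERE decade != 0
GROUP BY decade
ORDER BY total_films DESC
"""

results = conn.execute(query).fetchall()
for total_films, period in results:
    print(total_films, period)
```

With the parentheses in place, the period column comes back as text such as 2010-2019 instead of a bare number.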

Combining various results from various queries/tables/views in summary table

I've been pulling data from several different databases to create a summarised table of information all hinging on specific columns that they would have in common.
All of the tables have 3 columns in common:
Year
Month
Client
Other than this, they are a mixture of counts,sums,calculations and just general queries on various aspects of a client. I'm trying to map out a basic summary page on how each client is. My dream was to pull all of this into a centralised DB, with detailed information intact into tables. Then to have a series of views on each of these to summarise these tables 1 view per table. Then to have a summary table/view grouping all the views by year/month/client.
However, I'm struggling to put everything together. I've got raw data in the tables like:
Ordernumber / Lines/ Client/Year/Month
with the view doing:
Count of orders / sum of lines / client/year/month.
However due to the variation with the views I can't do something like a UNION.
Example data (of the views)
View1
Year Month Count Sum ClientCode
2017 May 18 146 A
2017 May 7 110 B
2017 May 2 17 C
View2
Year Month CountOfOrders CountOfFiles SumOfLines ClientCode
2017 May 8 2 140 A
2017 May 7 6 25 B
Dream goal would be:
Year Month ClientCode Count Sum CountOfOrders CountOfFiles SumOfLines
2017 May A 18 146 8 2 140
2017 May B 7 110 7 6 25
2017 May C 2 17 0 0 0
Any advice would be great. I've tried doing a UNION ALL, so that I could do WHERE ALL_TABLES = Year 2017, Month = May. But I realised that UNIONs won't work, as they merge rows, not columns.
You can join views just like tables... Seems a LEFT JOIN is what you want here, with COALESCE() to handle nulls:
SELECT V1.Year, V1.Month, V1.ClientCode, V1.Count, V1.Sum,
COALESCE(V2.CountOfOrders,0) AS CountOfOrders,
COALESCE(V2.CountOfFiles,0) AS CountOfFiles,
COALESCE(V2.SumOfLines,0) AS SumOfLines
FROM View1 V1
LEFT JOIN View2 V2 ON V1.Year = V2.Year
AND V1.Month = V2.Month
AND V1.ClientCode = V2.ClientCode
The only thing to note: you will need more logic if there are Year/Month/ClientCode combinations that exist in View2 but not in View1.
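As a rough check of the LEFT JOIN plus COALESCE approach, the sketch below loads the example rows from the question into an in-memory SQLite database through Python's sqlite3 module. Plain tables stand in for the two views, and SQLite stands in for whatever database the asker actually uses; both are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain tables stand in for View1 and View2; the rows are the question's example data.
conn.executescript("""
CREATE TABLE View1 (Year INTEGER, Month TEXT, "Count" INTEGER, "Sum" INTEGER,
                    ClientCode TEXT);
CREATE TABLE View2 (Year INTEGER, Month TEXT, CountOfOrders INTEGER,
                    CountOfFiles INTEGER, SumOfLines INTEGER, ClientCode TEXT);
INSERT INTO View1 VALUES (2017, 'May', 18, 146, 'A'),
                         (2017, 'May', 7, 110, 'B'),
                         (2017, 'May', 2, 17, 'C');
INSERT INTO View2 VALUES (2017, 'May', 8, 2, 140, 'A'),
                         (2017, 'May', 7, 6, 25, 'B');
""")

query = """
SELECT V1.Year, V1.Month, V1.ClientCode, V1."Count", V1."Sum",
       COALESCE(V2.CountOfOrders, 0) AS CountOfOrders,
       COALESCE(V2.CountOfFiles, 0)  AS CountOfFiles,
       COALESCE(V2.SumOfLines, 0)    AS SumOfLines
FROM View1 V1
LEFT JOIN View2 V2 ON V1.Year = V2.Year
                  AND V1.Month = V2.Month
                  AND V1.ClientCode = V2.ClientCode
ORDER BY V1.ClientCode
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Client C has no matching row in View2, so its three View2 columns fall back to 0, which matches the "dream goal" table in the question.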

SQL: use IIF to compare 2 values in the same row

I am using SQL in MS Access and need to compare 2 values in a row to determine the result of a 3rd field.
The "Yearmove" field does not have a value in all cases. Here is the test code I am running
SELECT z.SCHOOL, z.Year as Year, ProgBuild.BldgNo, ProgMoveBuild.yearmove AS Yearmove,
IIF([Year]=[Yearmove], 1, 0) AS Result
The results are fine in the cases where "Yearmove" is blank, and show 0 as expected. Anyplace "Yearmove" has a number the Result field reads "#ERROR", so both when the condition is met and when it is not. I have tried populating the "YEAR" field with both text (e.g. "2014") and non-text (e.g. 2014) but get the same result. All data is numeric.
Here is what a section of the results look like:
SCHOOL Year BldgNo Yearmove Result
254 2014 254 0
256 2014 256 0
260 2014 260 0
261 2014 261 0
262 2014 202 0
301 2014 301 0
307 2014 307 2019 #ERROR
313 2014 313 2019 #ERROR
314 2014 314 0
321 2014 321 0
322 2014 322 0
This query throws an error, "Data type mismatch in criteria expression":
SELECT IIf('2014'=2014, 1, 0)
So I suspect your Year and yearmove fields are different datatypes: one is numeric; and the other is text.
Find out what datatypes you're dealing with ...
SELECT
z.Year,
TypeName(z.Year) AS TypeOfYear,
ProgMoveBuild.yearmove,
TypeName(ProgMoveBuild.yearmove) AS TypeOfyearmove
FROM [whatever you have now as your data source(s)]
If changing the datatype of one of those fields is not practical, you can cast a field's values to another type for the IIf condition in your query. CStr() will cast a numeric value to a string, and Val() will cast a string value to a number. However, either of those functions will complain when you give them Nulls, so you can use Nz() to substitute something else for Null before you feed the value to CStr() or Val().

Find Best-Fit-Line and Correlation Between Two Tables in Microsoft Access?

I have two queries in a Microsoft Access database. They are named Average and Home_Runs. They both share the same first three columns Name, [Year] and Month.
Query: Average
Name Year Month Average
Cabrera 2013 5 .379
Fielder 2013 5 .245
Martinez 2013 5 .235
Cabrera 2013 6 .378
Fielder 2013 6 .278
Martinez 2013 6 .240
Query: Home_Runs
Name Year Month Home Runs
Cabrera 2013 5 12
Fielder 2013 5 2
Martinez 2013 5 2
Cabrera 2013 6 9
Fielder 2013 6 4
Martinez 2013 6 4
I need to offset the data before I begin the calculations. I need to determine how the Home Runs from one month relate to the Average from the previous month. So it is not a direct month-to-month comparison; I need to perform a month-to-previous-month comparison.
I need to calculate two things from these two queries.
First: With Average being the X-axis and Home_Runs being the Y-Axis. I need to find the correlation between these data points.
Second: With Average being the X-axis and Home_Runs being the Y-Axis. I need to find the equation of the best-fit-line between all of these data points. More specifically I need to find the value of the Y variable when the X variable equals certain values.
Additional Information:
In the end I need to return a table that looks like this:
Calculation Tier 1 Tier 2 Tier 3 Correlation
Average to Home Runs .04 3.00 6.00 .80
What is the best way to accomplish these things?
Here is the SQL Fiddle example for you to play with and tweak to get it exactly right:
SELECT (Avg(A.Paverage * H.HomeRuns) - Avg(A.Paverage) * Avg(H.HomeRuns)) /
(StDevP(A.Paverage) * StDevP(H.HomeRuns)) AS Correlation,
(Sum(A.Paverage * H.HomeRuns) - (Sum(A.Paverage) * Sum(H.HomeRuns) /
Count(*))) / (Sum(A.Paverage * A.Paverage) - (Sum(A.Paverage) * Sum(A.Paverage) / Count(*))) AS LineBestFit
FROM Averages AS A
INNER JOIN Home_Runs AS H
ON (A.Pname = H.Pname)
AND (A.Pyear = H.Pyear)
AND ((A.Pmonth + 1) = H.Pmonth)
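To sanity-check the formulas outside Access, here is a small Python sketch of the same arithmetic on the sample rows from the question. It pairs each month's home runs with the same player's average from the previous month, then computes the Pearson correlation and the least-squares line. The dictionary layout and the predict helper are hypothetical names introduced for illustration, and year rollover (January paired with the previous December) is deliberately not handled.

```python
from math import sqrt

# Sample rows from the question, keyed by (name, year, month).
averages = {
    ("Cabrera", 2013, 5): 0.379, ("Fielder", 2013, 5): 0.245,
    ("Martinez", 2013, 5): 0.235, ("Cabrera", 2013, 6): 0.378,
    ("Fielder", 2013, 6): 0.278, ("Martinez", 2013, 6): 0.240,
}
home_runs = {
    ("Cabrera", 2013, 5): 12, ("Fielder", 2013, 5): 2,
    ("Martinez", 2013, 5): 2, ("Cabrera", 2013, 6): 9,
    ("Fielder", 2013, 6): 4, ("Martinez", 2013, 6): 4,
}

# Pair each home-run total with that player's average from the previous month.
pairs = [
    (averages[(name, year, month - 1)], hr)
    for (name, year, month), hr in home_runs.items()
    if (name, year, month - 1) in averages
]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
syy = sum((y - mean_y) ** 2 for _, y in pairs)

correlation = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
slope = sxy / sxx                     # least-squares slope
intercept = mean_y - slope * mean_x   # least-squares intercept


def predict(x):
    """Value of Y (home runs) on the best-fit line at X (average) = x."""
    return slope * x + intercept


print(n, correlation, slope, intercept)
```

The slope alone is not enough to read off Y for a given X; the intercept is needed as well, which is why the sketch computes both before evaluating the line.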