MDX (how to have various Date Dimensions in a single X Axis?) - mdx

I have a cube containing facts, each with at least one date dimension:
Facts: Fact_Sales, Fact_Leads, Fact_Budget
Dimension columns: FactorDateID, LeadIssueDateID, BudgetDateID
Now I need a bar chart with dates on the X axis and measures from all of these facts on the Y axis (as bar values). Would you please guide me on how to do that?
Note: I am forced to keep all facts in a single cube, and I also have to use different date dimensions for these different dates (BudgetDate, FactorDate, ...) because they really differ from each other.
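One common approach, sketched below with hypothetical cube, measure, and hierarchy names, is to pick one of the date dimensions as the shared X axis and map the other facts' measures onto it with LinkMember, which translates the current member of one date hierarchy into the equivalent member of another:

```mdx
WITH
  -- map the lead and budget measures onto the factor-date axis;
  -- all object names here are assumptions, not your cube's actual names
  MEMBER [Measures].[Leads by Factor Date] AS
    ( [Measures].[Lead Count],
      LINKMEMBER( [Factor Date].[Calendar].CurrentMember,
                  [Lead Issue Date].[Calendar] ) )
  MEMBER [Measures].[Budget by Factor Date] AS
    ( [Measures].[Budget Amount],
      LINKMEMBER( [Factor Date].[Calendar].CurrentMember,
                  [Budget Date].[Calendar] ) )
SELECT
  { [Measures].[Sales Amount],
    [Measures].[Leads by Factor Date],
    [Measures].[Budget by Factor Date] } ON COLUMNS,
  [Factor Date].[Calendar].[Month].Members ON ROWS
FROM [MyCube]
```

LinkMember requires the hierarchies to have matching structure, which holds naturally when the different date usages are role-playing copies of one shared date dimension.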

Related

data visualization across time dimension

We have
set A
set B
time dimension T
We would like to see
A minus B
B minus A
A intersect B
across Time T
So far we only use Excel to plot the periods we want to compare.
E.g., say the grain of the time dimension is one month.
If we are comparing
Between 2019Q1 vs 2019Q2
Between 2019Q2 vs 2019Q3
Between 2019Q3 vs 2019Q4
we would have 9 Excel files generated (3 comparisons × 3 set operations).
Is there a better way to do this?
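If each set can be exported as (time, member) rows, a pandas sketch can compute all three set operations per period in one pass, so no per-comparison files are needed. The frame and column names below are made up for illustration:

```python
import pandas as pd

# Hypothetical data: one row per (month, member id) for each set
a = pd.DataFrame({"month": ["2019-01", "2019-01", "2019-02"], "id": [1, 2, 2]})
b = pd.DataFrame({"month": ["2019-01", "2019-02", "2019-02"], "id": [2, 2, 3]})

# An outer merge tags each (month, id) pair as left_only (A minus B),
# right_only (B minus A), or both (A intersect B)
merged = a.merge(b, on=["month", "id"], how="outer", indicator=True)
counts = (merged.groupby(["month", "_merge"], observed=False)
                .size()
                .unstack(fill_value=0))
print(counts)
```

The resulting table has one row per month and one column per set operation, which plots directly as a line or bar chart across the whole time dimension instead of one file per pair of quarters.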

Delete duplicate number in text cell (Teradata database) [duplicate]

This question already has an answer here:
Delete duplicates GPS coordinates in column in each row
(1 answer)
Closed 3 years ago.
I have columns in which coordinates are presented as text. Each set of coordinates sits in one cell: all the coordinates for a row are stored in a single table cell, as text. I have more than 1,000 cells, and each contains more than 100 coordinates.
For example:
23.453411011874813 41.74245395132344, 23.453972640029299 41.74214208390741, 23.453977029220994 41.741827739090233, 23.454523642352295 41.741515869012523, 23.441100249526403 41.741203996333724, 23.441661846243466 41.740892121053918,
23.456223434003668 41.74058024317317, 23.441661846243466 41.740892121053918
In the case of repeating coordinates, I need to delete the last of them (bold in the example) and delete the coordinate located between them (italic in the example).
Please tell me how this can be done?
Thanks a lot!
OLAP functions will be your friend:
- ROW_NUMBER() will identify the 2nd, 3rd, ... occurrences
- with a COUNT() OLAP function you can identify the duplicates
- with CASE and a MAX(...) OVER (... ROWS UNBOUNDED PRECEDING) you can tag the rows between the 1st and 2nd occurrences
Two crucial questions you have to answer for a concrete solution:
- by which criteria are your rows ordered (I guess a column with timestamps that isn't shown...)?
- what happens if a coordinate occurs 3 times (or even more)? Delete everything between 1st and last, just between 1st and 2nd, or always between odd and even occurrences?
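If the cleanup can happen outside the database, the rule in the question (delete the later duplicate together with everything between the two occurrences) is the classic loop-removal pass over the coordinate list. A sketch in Python, assuming pairs are comma-separated within the cell as in the example:

```python
def remove_loops(cell_text):
    """Drop each repeated coordinate pair together with everything
    between it and its first occurrence (loop removal)."""
    pairs = [p.strip() for p in cell_text.split(",") if p.strip()]
    out = []
    index = {}  # pair -> position of its first occurrence in `out`
    for p in pairs:
        if p in index:
            # cut the list back to the first occurrence, dropping the loop
            out = out[: index[p] + 1]
            index = {q: i for i, q in enumerate(out)}
        else:
            index[p] = len(out)
            out.append(p)
    return ", ".join(out)
```

Note this implements one specific answer to the "occurs 3 times" question above: it always cuts back to the first occurrence.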

How to select columns based on value they contain pandas

I am working in pandas with a dataset that describes the population of a certain country per year. The dataset is construed in a weird way wherein the years aren't the columns themselves but rather the years are a value within the first row of the set. The dataset describes every year from 1960 up until now, but I only need 1970, 1980, 1990, etc. For this purpose I've created a list with all those years and tried to make a new dataset which is equivalent to the old one but only has the columns that contain a value from said list, so I don't have all this extra info I'm not using. Online I can only find instructions for removing rows or selecting by column name; since neither of those criteria applies in this situation, I thought I should ask here.
The dataset is a CSV file which I've downloaded off some world population site. Here is a link to a screenshot of the data.
As you can see the years are given in scientific notation for some years, which is also how I've added them to my list.
pop = pd.read_csv('./maps/API_SP.POP.TOTL_DS2_en_csv_v2_10576638.csv',
header=None, engine='python', skiprows=4)
display(pop)
years = ['1.970000e+03','1.980000e+03','1.990000e+03','2.000000e+03','2.010000e+03','2.015000e+03', 'Country Name']
pop[pop.columns[pop.isin(years).any()]]
This is one of the things I've tried so far which I thought made the most sense, but I am still very new to pandas so any help would be greatly appreciated.
Using the data at https://data.worldbank.org/indicator/sp.pop.totl, copied into pastebin (first time using the service, so apologies if it doesn't work for some reason):
# actual code using CSV file saved to desktop
#df = pd.read_csv(<path to CSV>, skiprows=4)
# pastebin for reproducibility
df = pd.read_csv(r'https://pastebin.com/raw/LmdGySCf',sep='\t')
# manually select years and other columns of interest
colsX = ['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
'1990', '1995', '2000']
dfX = df[colsX]
# select every fifth year
year_cols = df.filter(regex='19|20', axis=1).columns
colsY = year_cols[[int(col) % 5 == 0 for col in year_cols]]
dfY = df[colsY]
As a general comment:
The dataset is construed in a weird way wherein the years aren't the columns themselves but rather the years are a value within the first row of the set.
This is not correct. Viewing the CSV file, it is quite clear that row 5 (Country Name, Country Code, Indicator Name, Indicator Code, 1960, 1961, ...) does contain the column names. You have read the data into pandas in such a way that those values are not treated as column names; your first step, before trying to subset your data, should be to ensure you have read in the data properly -- which, in this case, would give you column headers named for each year.
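Once the headers are read correctly, selecting the wanted years is a simple column filter. A self-contained sketch, with a tiny made-up frame standing in for the World Bank data after reading it with skiprows=4:

```python
import pandas as pd

# Miniature stand-in for the World Bank layout with proper headers
df = pd.DataFrame({
    "Country Name": ["Aruba"],
    "Country Code": ["ABW"],
    "1960": [54211], "1961": [55438], "1965": [58782], "1970": [59063],
})

# keep the metadata columns plus every year divisible by 5
keep = [c for c in df.columns if not c.isdigit() or int(c) % 5 == 0]
subset = df[keep]
print(list(subset.columns))
```

Because the year headers are now ordinary column names, there is no need to match values inside the rows (or to deal with scientific notation) at all.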

Querying a DataFrame values from a specific year

I have a pandas DataFrame I have created from weather data that shows the high and low temperatures by day from 2005-2015. I want to be able to query my DataFrame so that it only shows the values from the year 2015. Is there any way to do this without first reformatting the datetime values to show only the year (i.e., without applying strftime('%Y') first)?
DataFrame Creation:
df=pd.read_csv('data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv')
df['Date']=pd.to_datetime(df.Date)
df['Date'] = df['Date'].dt.strftime('%m-%d-%y')
Attempt to Query:
daily_df=df[df['Date']==datetime.date(year=2015)]
Error: it asks for a month and a day to be specified.
Data:
An NOAA dataset has been stored in the file data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv. The data for this assignment comes from a subset of The National Centers for Environmental Information (NCEI) Daily Global Historical Climatology Network (GHCN-Daily). The GHCN-Daily is comprised of daily climate records from thousands of land surface stations across the globe.
Each row in the assignment datafile corresponds to a single observation.
The following variables are provided to you:
id : station identification code
date : date in YYYY-MM-DD format (e.g. 2012-01-24 = January 24, 2012)
element : indicator of element type
TMAX : Maximum temperature (tenths of degrees C)
TMIN : Minimum temperature (tenths of degrees C)
value : data value for element (tenths of degrees C)
Image of DataFrame:
I resolved this by adding a column with just the year and then querying on that, but there has to be a better way to do this?
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%d-%m-%y')
df['Year'] = pd.to_datetime(df['Date']).dt.strftime('%y')
daily_df = df[df['Year'] == '15']
return daily_df
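A simpler route, sketched below on a tiny made-up frame rather than the actual NOAA file, is to keep the column as a real datetime and filter with the .dt.year accessor, so no string formatting is needed at any point:

```python
import pandas as pd

# Hypothetical miniature of the daily weather file
df = pd.DataFrame({
    "Date": ["2014-12-31", "2015-01-01", "2015-06-15"],
    "Data_Value": [105, 122, 301],
})
df["Date"] = pd.to_datetime(df["Date"])

# keep the dates as datetimes and compare the year directly
daily_df = df[df["Date"].dt.year == 2015]
print(len(daily_df))
```

Leaving the column as a datetime also keeps the other .dt accessors (month, dayofyear, ...) available for later grouping.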

MDX and SSAS - Subtotal on Filtering in Power BI

I have a sales cube deployed in SSAS that needs to be displayed using Power BI. Assume that I have arranged a layout, simulated with Excel, like the following: a dimension (Product, with 5 members) and two measures (Measures 1 and Measure 2).
Measures 3 is a calculated member
([Measures].[Measures 1], [Product].CurrentMember.Parent)*[Measures].[Measure 2]
If viewed in Excel, the calculation will look something like this.
The user can then filter Product on any data point, simulated below with A, B, and C as the product filter, producing the values below.
This is where I have no clue how to get the grand total of the measures under filtering, like the yellow columns. What should I know to achieve this?
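One common pattern, sketched here with assumed hierarchy names (the attribute and level names are guesses, not your cube's actual metadata), is to define the calculated member so that above the leaf level it sums the per-product results over only the products that survive the filter, rather than evaluating the formula once at the All member:

```mdx
WITH MEMBER [Measures].[Measures 3] AS
  IIF(
    [Product].[Product].CurrentMember.Level.Ordinal = 0,
    -- at the All / grand-total level, sum the per-product results;
    -- EXISTING restricts the set to the members left after filtering
    SUM(
      EXISTING [Product].[Product].[Product].Members,
      ([Measures].[Measures 1], [Product].[Product].CurrentMember.Parent)
        * [Measures].[Measure 2]
    ),
    -- at the leaf level, the original formula
    ([Measures].[Measures 1], [Product].[Product].CurrentMember.Parent)
      * [Measures].[Measure 2]
  )
```

This way the total row Power BI requests recomputes as the sum of the visible (filtered) rows, which is what the yellow columns appear to show.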