How many axes can we use in MDX practically? - ssas

I heard that there are around 128 axes in MDX.
AXIS(0) or simply 0 – Columns
AXIS(1) or simply 1 – Rows
AXIS(2) or simply 2 – Pages
AXIS(3) or simply 3 – Sections
So far I have used only two of them, Columns (0) and Rows (1).
I am just curious about when or why I would use the other MDX axes.
As far as I know, SSMS only supports two axes.

How:
select ... on 0, ... on 1, ... on 2 and so on .... from [cube]
Where:
Any client that will not crash on an unexpected result format ;-)
When / Why:
A client could take advantage of several axes to render the result in 3D using three axes. Even if the client does not render the result in 3D, it might be useful to ask the server to return the result split over three axes for ad hoc (or easier) processing.

I do not know of any standard client that supports this.
But a typical application comes to mind: some years ago (before I was working with Analysis Services), we had a client who required one and the same report for ten countries and five markets on fifty PowerPoint slides. If we had used Analysis Services at that time, we might have written a custom client application that uses a four-dimensional report and could thus get the data for all fifty PowerPoint slides with a single MDX query.
You need not think of OLAP dimensions as dimensions in space. You can also think of them (as the axis aliases suggest) as, e.g., pages and chapters.
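To make the fifty-slides idea a bit more concrete, here is a minimal Python sketch of what such a custom client might do with a four-axis result. The cell layout (a dict keyed by column, row, page and section) and all names in it are assumptions for illustration only; a real client would receive whatever cellset structure its OLAP library exposes.

```python
from collections import defaultdict


def split_into_slides(cells):
    """Group a four-axis result into one 2-D table per (page, section) pair.

    `cells` maps (column, row, page, section) tuples to values. This layout
    is a hypothetical stand-in for a real multi-axis cellset.
    """
    slides = defaultdict(dict)
    for (column, row, page, section), value in cells.items():
        # Axes 2 and 3 pick the slide; axes 0 and 1 address the cell on it.
        slides[(page, section)][(row, column)] = value
    return dict(slides)


# Toy data: 2 measures x 3 months x 10 countries x 5 markets.
countries = [f"Country {i}" for i in range(1, 11)]
markets = [f"Market {i}" for i in range(1, 6)]
cells = {
    (measure, month, country, market): 0.0
    for measure in ("Sales", "Units")
    for month in ("Jan", "Feb", "Mar")
    for country in countries
    for market in markets
}

slides = split_into_slides(cells)
print(len(slides))  # 50
```

Each of the fifty entries is an ordinary rows-by-columns table, so the client code that fills one slide never has to know about the extra axes.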


How to sample rows from a table with a specific probability?

I'm using BigQuery at my new position, and I'm totally new to SQL/BigQuery.
I'm testing a machine learning model and monitoring an A/B test with a different ratio, e.g., 3 vs. 10. To compare the A/B results, e.g., the number of page views, I want to make the ratios equal first so that I can compare them easily. For example, say we have a table with 13 records (3 are from A and 10 are from B). In addition, each row contains an id field that is identical. What I want to do is extract only 3 samples out of the 10 for B so that the sample count matches A.
I'm trying to use the FARM_FINGERPRINT function to map fields to integers. Then I'm taking ABS and then calculating MOD to convert the integer numbers to a specific range, e.g., [0, 10). Eventually, I would like to get 3 in 10 items using the following line:
However, I found that even if I run the A/B test with exactly the same ML model and a different A/B ratio, the results differ between A and B (they should be the same, because A and B run the same ML model with just a different ratio). This made me suspect that the above implementation introduces some bias into the sampling. I also read this post and confirmed that FARM_FINGERPRINT might not produce a randomly distributed result.
*There's a critical reason why I cannot simply multiply B by 3/10, which is confidential and which I cannot disclose here.
Is there a better way to accomplish the equally distributed sampling?
Thank you in advance. (I'm sorry if the question is vague, as I'm hiding the confidential parts.)
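For readers who want to experiment with the hash-then-MOD idea outside BigQuery, here is a minimal Python sketch of it. It uses SHA-256 from the standard library as a stand-in for FARM_FINGERPRINT (an assumption, the two hashes are not the same function), and the `user-N` ids are made up.

```python
import hashlib


def bucket(record_id: str, buckets: int = 10) -> int:
    """Map an id to a bucket in [0, buckets) via a hash (SHA-256 stand-in)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo buckets.
    return int.from_bytes(digest[:8], "big") % buckets


def keep(record_id: str, numerator: int = 3, denominator: int = 10) -> bool:
    """Keep roughly numerator/denominator of all ids, deterministically."""
    return bucket(record_id, denominator) < numerator


# Hypothetical ids for illustration.
ids = [f"user-{i}" for i in range(100_000)]
sampled = [record_id for record_id in ids if keep(record_id)]
print(len(sampled) / len(ids))  # close to 0.3, but not exactly 0.3
```

Note that a filter like this keeps each row with probability 3/10, so the retained count fluctuates around 30% rather than being exactly 3 out of every 10 rows. On a table as small as 13 records the deviation can be large.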

R-error: "number of levels of each grouping factor must be < number of observations"

I'm relatively new to R and have to perform a Linear Mixed Model-Analysis on some data for my university studies.
To describe my data ("data_complete_group"):
I tested flow (variable: "fss_M") for individual soccer players deriving from 4 different teams. Each team (and therefore each player in the related team) was allocated to one of two conditions of the variable "group".
Each person also completed three different surveys on three different days and therefore had a personalized "ID" (which is also a variable in the model).
The variable "team_num" represents the related team for each player.
I now want to test whether the group factor has a significant influence on the flow score.
The model looks as follows:
model1 <- lmer(fss_M ~ group + (1 + group|team_num/ID), data = data_complete_group)
If I understood it correctly, this means that "fss_M ~ group" is the fixed effect with "1 + group|team_num/ID" as random effect.
Unfortunately I get an error message when I want to run the code:
Error: number of levels of each grouping factor must be < number of observations (problems: ID:team_num)
In contrast to that, the analysis works when I remove the term for the random effect.
How can I understand this? What's wrong with the code for the analysis with fixed + random effect?
I'm glad for every answer to this, thanks a lot!
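The condition named in the error message can be checked by hand: count the distinct ID-within-team combinations and compare that with the number of rows. Below is a minimal Python sketch of that check on made-up data. Both toy layouts are assumptions; the question does not say how data_complete_group is actually laid out.

```python
def count_levels(rows, keys):
    """Number of distinct combinations of the given columns."""
    return len({tuple(row[key] for key in keys) for row in rows})


def levels_below_observations(rows, keys):
    """True when the grouping factor has fewer levels than there are rows."""
    return count_levels(rows, keys) < len(rows)


# Hypothetical layout 1: one row per player (e.g. scores already averaged).
one_row_per_player = [
    {"team_num": 1, "ID": "p1", "fss_M": 4.1},
    {"team_num": 1, "ID": "p2", "fss_M": 3.8},
    {"team_num": 2, "ID": "p3", "fss_M": 4.5},
]

# Hypothetical layout 2: one row per player per survey day.
one_row_per_survey = [
    {**row, "day": day} for row in one_row_per_player for day in (1, 2, 3)
]

print(levels_below_observations(one_row_per_player, ("team_num", "ID")))  # False
print(levels_below_observations(one_row_per_survey, ("team_num", "ID")))  # True
```

If the real data look like the first layout (as many ID:team_num levels as rows), that would be consistent with the message, since a per-player random effect then cannot be separated from the residual.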

Determining which polygon contains the majority of a line - Oracle Spatial

I have an Oracle database (11g Spatial) that includes a series of area polygons and water mains. I'm trying to attribute each of these mains to the area in which it is contained, and for the most part this is straightforward enough (using the SDO_CONTAINS function), but I'm not sure how to deal with mains that straddle multiple polygons due to errors in digitisation.
In cases like this, what I'd ideally like to do is attribute a main to an area polygon if the majority of its length (>50%) is contained within it. I know that I can use the SDO_RELATE function to determine every polygon that any given main interacts with, but I don't know how to then go about determining how much of its length is contained within each area.
The principle is like this:
Correlate mains and areas. Assuming you have many mains and many areas, the most efficient approach is to use SDO_JOIN.
For each couple (main/area) returned, compute their intersection (SDO_GEOM.SDO_INTERSECTION) and measure the length of that intersection (SDO_GEOM.SDO_LENGTH).
From those results, retain for each main the area where the length is the maximum.
If you want a full SQL example, allow me a bit of time to write that using sample data.
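In the meantime, the principle above (intersect, measure, keep the maximum) can be sketched outside the database. This is a simplified Python illustration, not Oracle code: the areas are assumed to be axis-aligned rectangles so that the intersection length can be computed with Liang-Barsky clipping, and all names and coordinates are made up.

```python
import math


def clipped_length(p0, p1, rect):
    """Length of the part of segment p0-p1 inside an axis-aligned rectangle.

    rect is (xmin, ymin, xmax, ymax). Uses Liang-Barsky clipping.
    """
    (x0, y0), (x1, y1) = p0, p1
    xmin, ymin, xmax, ymax = rect
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return 0.0  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t_enter = max(t_enter, t)  # segment enters through this edge
        else:
            t_exit = min(t_exit, t)    # segment leaves through this edge
    if t_enter >= t_exit:
        return 0.0
    return (t_exit - t_enter) * math.hypot(dx, dy)


def majority_area(main, areas):
    """Return the area id holding the greatest length of the main, plus all lengths."""
    lengths = {
        area_id: sum(clipped_length(a, b, rect) for a, b in zip(main, main[1:]))
        for area_id, rect in areas.items()
    }
    return max(lengths, key=lengths.get), lengths


# Two adjacent areas and a main that straddles their shared edge at x = 10.
areas = {"A": (0, 0, 10, 10), "B": (10, 0, 20, 10)}
main = [(2, 5), (13, 5)]
best, lengths = majority_area(main, areas)
print(best)  # A
```

Here about 8 of the main's 11 length units fall in A and about 3 in B, so the main is attributed to A. With real polygons the intersection and length steps would be done by the spatial functions named above.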

SDK2 query for counting: which is more efficient?

I have an app that is displaying metrics about defects in a project.
I have the option of making one query that returns all the defects, and from that I can break out about four different metrics (How many defects escaped QA in 90 days, 180 days, and then the same metrics again but only counting sev1/sev2 defects).
I could make four queries and limit the results to one so that I just get a count for each. Or I could make one query that encompasses them all (all defects that escaped QA in 180 days) and then count up the difference.
I'm figuring that the number of defects that escaped QA in the last six months will generally be less than 100, and certainly less than 500 in the worst case.
Which would you do: four queries with one result each, or one single query that on average might return 50, perhaps 500 in the worst case?
And I guess the key question is: where are the inflection points? Perhaps I have more metrics tomorrow (who knows, 8?) and a different average defect count. Is there a rule of thumb I could use to help choose which approach?
Well, I would probably make the series of four queries and use the result count. If you are expecting 500 defects, the single query will end up being three requests of up to 200 defects each anyway.
The solution where you do each individual query and use the total result count would be safe even with a very large number of defects. Plus, I usually find it to be a bad plan to think that I know the data sets that an App will be dealing with. Most of my Apps end up living much longer and being used on larger datasets than I intended.
The max page size is 200, so it sounds like you'd be requesting between 1 and 3 pages to get all the data vs. 4 queries with a page size of 1 and using the TotalResultCount...
You'd definitely have less aggregation code to write if you use the multi query approach (letting the server do the counting for you based on your supplied filters).
I'd guess the 4 independent queries might be faster but it would be interesting to hear back your experimental results...
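As a rough rule of thumb for the inflection point, one can compare just the number of round trips each approach needs. The Python sketch below does that arithmetic using the page size of 200 mentioned above. It is only a request-count comparison and assumes nothing about payload size, latency or parallel requests, all of which matter in practice.

```python
import math

MAX_PAGE_SIZE = 200  # maximum page size quoted in this thread


def requests_for_single_query(defect_count, page_size=MAX_PAGE_SIZE):
    """Pages needed to fetch every defect with one paged query."""
    return max(1, math.ceil(defect_count / page_size))


def requests_for_count_queries(metric_count):
    """One page-size-1 query per metric, reading only the total result count."""
    return metric_count


for defects in (50, 500, 1000):
    print(defects,
          requests_for_single_query(defects),
          requests_for_count_queries(4))
# 50 1 4
# 500 3 4
# 1000 5 4
```

By this measure alone, the single query needs fewer requests until the defect count passes 200 times the number of metrics (800 defects for four metrics), while the count queries stay at a fixed cost no matter how large the data set grows.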

SQL Server 2008+ : Best method for detecting if two polygons overlap?

We have an application that has a database full of polygons (currently stored as points) that a .net app pulls out and checks if they overlap.
It occurred to me that it would be much nicer to convert these point arrays to polygon / polyline objects within the database and use SQL to get a bool of whether they overlap or not.
I have seen different methods suggested to do this, but none of the examples given were quite in line with my needs.
I would be very happy to receive input from those kind enough to offer their experience.
In response to questions: It is indeed 2D, and yes, any crossover of the two is considered true. The polygons have n points and can be concave. The polygons will be saved one per row (after the data conversion task) as polygons (i.e. the polygon type; it might be called something else, spatial / geom, my memory is not on my side right now).
You can use .STIntersection with .STAsText() to test for overlapping polygons. (I really hate the terminology Microsoft has used (or whoever set the standard terms). "Touching," in my mind, should be a test for whether or not two geometry/geography shapes overlap at all, not just share a border.)
If @RadiusGeom is a geometry representing a radius from a point, the following will return every census tract whose intersection with it (a geometry that represents the area where two geometries overlap) is not empty.
-- Keep only the tracts whose intersection with @RadiusGeom is non-empty
SELECT CT.ID AS CTID, CT.[Geom] AS CensusTractGeom
FROM CensusTracts CT
WHERE CT.[Geom].STIntersection(@RadiusGeom).STAsText() <> 'GEOMETRYCOLLECTION EMPTY'
If your geometry field is spatially indexed, this runs pretty quickly. I ran this on 66,000 US CT records in about 3 seconds. There may be a better way, but since no one else had an answer, this was my attempt at an answer for you. Hope it helps!
Calculate and store the bounding rectangle of each polygon in a set of new fields within the row which is associated with that polygon. (I assume you have one; if not, create one.) When your dotnet app has a polygon and is looking for overlapping polygons, it can fetch from the database only those polygons whose bounding rectangles overlap, using a relatively simple SQL SELECT statement. Those polygons should be relatively few, so this will be efficient. Then, your dotnet app can perform the finer polygon overlap calculations in order to determine which ones of those really overlap.
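A minimal sketch of that bounding-rectangle prefilter, written in Python for brevity rather than as SQL or .NET code. The polygon data and function names are made up for illustration; in the setup described above the rectangles would be stored columns and the overlap test would be the WHERE clause.

```python
def bounding_box(polygon):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True when two axis-aligned rectangles share at least one point."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def candidates(target, stored):
    """Ids of stored polygons whose bounding box overlaps the target's."""
    target_box = bounding_box(target)
    return [
        polygon_id
        for polygon_id, polygon in stored.items()
        if boxes_overlap(target_box, bounding_box(polygon))
    ]


# Hypothetical data: a triangle and two stored squares.
target = [(0, 0), (4, 0), (0, 4)]
stored = {
    "near": [(3, 3), (6, 3), (6, 6), (3, 6)],
    "far": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
print(candidates(target, stored))  # ['near']
```

The "near" square is a deliberate false positive: its rectangle overlaps the triangle's rectangle, but the shapes themselves do not touch. That is why the finer polygon test is still needed on the candidates.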
Okay, I got another idea, so I am posting it as a different answer. I think my previous answer with the bounding polygons probably has some merit on its own, even if it was to reduce the number of polygons fetched from the database by a small percentage, but this one is probably better.
MSSQL supports integration with the CLR since version 2005. This means that you can define your own data type in an assembly, register the assembly with MSSQL, and from that moment on MSSQL will be accepting your user-defined data type as a valid type for a column, and it will be invoking your assembly to perform operations with your user-defined data type.
An example article for this technique on the CodeProject: Creating User-Defined Data Types in SQL Server 2005
I have never used this mechanism, so I do not know details about it, but I presume that you should be able to either define a new operation on your data type, or perhaps overload some existing operation like "less-than", so that you can check if one polygon intersects another. This is likely to speed things up a lot.