Get Distinct User Count Over Particular Time MDX QUERY - ssas

I need some mdx help.
Cube Details:
Measures.Users -> Distinct Count on Users.
I want to find an mdx equivalent of this query:
Select a.shopId , Month(TransactionDate) Month_Transaction,
Year(TransactionDate) Year_Transaction,
count(distinct b.UserID) UniqueUserCount
FROM [dbo].[shop] a
JOIN users b ON a.UserID = b.UserID
where TransactionDate >= '2018-01-01'
Group by a.shopId ,Month(TransactionDate), Year(TransactionDate)
This is what I have so far; it produces the unique count irrespective of date, but I want the unique count within the date range. Please let me know how to achieve this.
SELECT {
[Date].[Month].&[2020]&[2020-Q3]&[2020-09],
[Date].[Month].&[2020]&[2020-Q4]&[2020-10],
[Date].[Month].&[2020]&[2020-Q4]&[2020-11],
[Date].[Month].&[2020]&[2020-Q4]&[2020-12]
} ON COLUMNS, NON EMPTY
{
[ShopLocations].[Hierarchy].[Shop]
} ON ROWS
FROM [ShopperCube]
where (Measures.Users)

The built-in distinct count measure gives the most flexibility. It sounds like you already have one as Measures.Users? Is the measure group for Users connected to your Date and ShopLocations dimensions?
To help get that working, I would review the Distinct Count pattern in the Many-to-Many Revolution paper. That approach gives a no-code solution that is more flexible and probably faster to run:
https://sqlbi.com/whitepapers/many2many

You can use the DistinctCount MDX function.
The official documentation is not very clear, but the general principle is that you pass a set to this function and it returns the number of distinct, non-empty tuples in that set.
A sample MDX query:
WITH SET MySet AS
{ [Dim User].[User Id].Children }
MEMBER Measures.SetDistinctCount AS
DISTINCTCOUNT(MySet)
SELECT { Measures.SetDistinctCount, Measures.Amount } ON 0
, { [Dim Date].[Date Key].AllMembers } ON 1
FROM [Mine]
To validate this, here is my setup:
[Screenshot: query result]
The following sample SQL creates the tables so you can try it with different data:
IF OBJECT_ID('FactTransaction') IS NOT NULL
DROP TABLE FactTransaction
GO
CREATE TABLE FactTransaction (ShopId INT, TransactionDateKey INT, UserId INT, Amount INT)
GO
IF OBJECT_ID('DimDate') IS NOT NULL
DROP TABLE DimDate
GO
CREATE TABLE DimDate(DateKey INT, FullDate DATE)
GO
IF OBJECT_ID('DimUser') IS NOT NULL
DROP TABLE DimUser
GO
CREATE TABLE DimUser(UserId INT, UserName VARCHAR(50))
GO
IF OBJECT_ID('DimShop') IS NOT NULL
DROP TABLE DimShop
GO
CREATE TABLE DimShop(ShopId INT, ShopName VARCHAR(50))
GO
--Shop 1
INSERT INTO FactTransaction values(1, 20210101, 1, 10)
INSERT INTO FactTransaction values(1, 20210101, 2, 5)
INSERT INTO FactTransaction values(1, 20210101, 3, 20)
INSERT INTO FactTransaction values(1, 20210102, 2, 10)
INSERT INTO FactTransaction values(1, 20210102, 4, 15)
INSERT INTO FactTransaction values(1, 20210103, 3, 5)
INSERT INTO FactTransaction values(1, 20210103, 4, 10)
INSERT INTO FactTransaction values(1, 20210103, 5, 20)
INSERT INTO FactTransaction values(1, 20210103, 1, 20)
--Shop 2
INSERT INTO FactTransaction values(2, 20210103, 2, 10)
INSERT INTO FactTransaction values(2, 20210103, 2, 5)
INSERT INTO FactTransaction values(2, 20210103, 2, 20)
GO
INSERT INTO DimDate VALUES(20210101, '2021-01-01')
INSERT INTO DimDate VALUES(20210102, '2021-01-02')
INSERT INTO DimDate VALUES(20210103, '2021-01-03')
GO
INSERT INTO DimUser VALUES(1, 'First')
INSERT INTO DimUser VALUES(2, 'Second')
INSERT INTO DimUser VALUES(3, 'Third')
INSERT INTO DimUser VALUES(4, 'Fourth')
INSERT INTO DimUser VALUES(5, 'Fifth')
GO
INSERT INTO DimShop VALUES(1, 'Shop 1')
INSERT INTO DimShop VALUES(2, 'Shop 2')
GO
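To have reference numbers to compare the cube output against, the fact rows above can be counted directly in SQL. This is only a minimal sketch: it loads the same FactTransaction rows into an in-memory SQLite database (an assumption made purely so the example is self-contained; the original setup is SQL Server) and counts distinct users per shop per day and per shop over the whole date range.

```python
import sqlite3

# Fact rows copied from the script above:
# (ShopId, TransactionDateKey, UserId, Amount)
rows = [
    (1, 20210101, 1, 10), (1, 20210101, 2, 5), (1, 20210101, 3, 20),
    (1, 20210102, 2, 10), (1, 20210102, 4, 15),
    (1, 20210103, 3, 5), (1, 20210103, 4, 10),
    (1, 20210103, 5, 20), (1, 20210103, 1, 20),
    (2, 20210103, 2, 10), (2, 20210103, 2, 5), (2, 20210103, 2, 20),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactTransaction "
    "(ShopId INT, TransactionDateKey INT, UserId INT, Amount INT)"
)
conn.executemany("INSERT INTO FactTransaction VALUES (?, ?, ?, ?)", rows)

# Distinct users per shop per day
per_day = conn.execute("""
    SELECT ShopId, TransactionDateKey, COUNT(DISTINCT UserId)
    FROM FactTransaction
    GROUP BY ShopId, TransactionDateKey
    ORDER BY ShopId, TransactionDateKey
""").fetchall()

# Distinct users per shop over the whole date range
per_shop = conn.execute("""
    SELECT ShopId, COUNT(DISTINCT UserId)
    FROM FactTransaction
    WHERE TransactionDateKey BETWEEN 20210101 AND 20210103
    GROUP BY ShopId
    ORDER BY ShopId
""").fetchall()

print(per_day)   # [(1, 20210101, 3), (1, 20210102, 2), (1, 20210103, 4), (2, 20210103, 1)]
print(per_shop)  # [(1, 5), (2, 1)]
```

Note that the daily counts for shop 1 add up to 9 while the count over the range is 5: distinct counts are not additive, so the range total has to be computed over the range itself rather than by summing the per-day cells.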

Related

Sample observations per group without replacement in SQL

Using the provided table I would like to sample let's say 2 users per day so that users assigned to the two days are different. Of course the problem I have is more sophisticated, but this simple example gives the idea.
drop table if exists test;
create table test (
user_id int,
day_of_week int);
insert into test values (1, 1);
insert into test values (1, 2);
insert into test values (2, 1);
insert into test values (2, 2);
insert into test values (3, 1);
insert into test values (3, 2);
insert into test values (4, 1);
insert into test values (4, 2);
insert into test values (5, 1);
insert into test values (5, 2);
insert into test values (6, 1);
insert into test values (6, 2);
The expected results would look like this:
create table results (
user_id int,
day_of_week int);
insert into results values (1, 1);
insert into results values (2, 1);
insert into results values (3, 2);
insert into results values (6, 2);
You can use window functions. Here is an example, although the details depend on your database (the function for random numbers varies by database):
select t.*
from (select t.*, row_number() over (partition by day_of_week order by random()) as seqnum
from test t
) t
where seqnum <= 2;
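The query above samples each day independently, so the same user can come back for both days. One possible way to meet the requirement that the users differ across days is to first keep a single random day per user and only then sample per day. The sketch below runs that idea against the question's test table in an in-memory SQLite database (an assumption for the sake of a runnable example; it needs SQLite 3.25 or later for window functions, and the CTE name and the pick column are made up for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (user_id INT, day_of_week INT)")
# Same data as the question: users 1..6, each present on days 1 and 2
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(u, d) for u in range(1, 7) for d in (1, 2)],
)

sample = conn.execute("""
    WITH one_day_per_user AS (
        -- step 1: keep a single, randomly chosen day for each user
        SELECT user_id, day_of_week
        FROM (SELECT t.*,
                     row_number() OVER (PARTITION BY user_id
                                        ORDER BY random()) AS pick
              FROM test t)
        WHERE pick = 1
    )
    -- step 2: sample up to 2 of the remaining users per day
    SELECT user_id, day_of_week
    FROM (SELECT o.*,
                 row_number() OVER (PARTITION BY day_of_week
                                    ORDER BY random()) AS seqnum
          FROM one_day_per_user o)
    WHERE seqnum <= 2
    ORDER BY day_of_week, user_id
""").fetchall()

print(sample)  # random; no user_id appears more than once
```

The trade-off is that a day can end up with fewer than 2 users when the random day assignment in step 1 is lopsided, so this is a starting point rather than a guaranteed balanced sample.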

How to insert multiple row values into SQL

I have the following 3 tables:
CREATE TABLE Tests (
Test_ID INT,
TestName VARCHAR(50));
INSERT INTO Tests VALUES (1, 'SQL Test');
INSERT INTO Tests VALUES (2, 'C# Test');
INSERT INTO Tests VALUES (3, 'Java Test');
CREATE TABLE Users (
[User_ID] INT,
UserName VARCHAR(50));
INSERT INTO Users VALUES (1, 'Joe');
INSERT INTO Users VALUES (2, 'Jack');
INSERT INTO Users VALUES (3, 'Jane');
CREATE TABLE UserTests (
ID INT,
[User_ID] INT,
Test_ID INT,
Completed INT);
INSERT INTO UserTests VALUES (1, 1, 1, 0);
INSERT INTO UserTests VALUES (2, 1, 2, 1);
INSERT INTO UserTests VALUES (3, 1, 3, 1);
INSERT INTO UserTests VALUES (4, 2, 1, 0);
INSERT INTO UserTests VALUES (5, 2, 2, 0);
INSERT INTO UserTests VALUES (6, 2, 3, 0);
INSERT INTO UserTests VALUES (7, 3, 1, 1);
INSERT INTO UserTests VALUES (8, 3, 2, 1);
INSERT INTO UserTests VALUES (9, 3, 3, 1);
I would like to create some rule/trigger so that when a new user gets added to the Users table, an entry for each Test and that user's Id will get added to the UserTests table.
Something like this if the new user ID is 5:
INSERT dbo.UserTest
(USER_ID, TEST_ID, Completed)
VALUES
(5, SELECT TEST_ID FROM Tests, 0)
That syntax is of course wrong but to give an idea of what I expect to happen.
So I expect that statement to add these values to the UserTests table:
User ID| Test ID| Completed
5 | 1 | 0
5 | 2 | 0
5 | 3 | 0
You can use an AFTER INSERT trigger on the Users table:
CREATE TRIGGER tr_user ON Users
AFTER INSERT
AS BEGIN
-- one UserTests row per (newly inserted user, test) pair
INSERT INTO UserTests ([User_ID], Test_ID, Completed)
SELECT i.[User_ID], t.Test_ID, 0
FROM inserted i CROSS JOIN Tests t
END
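To try the trigger idea without a SQL Server instance, here is a rough equivalent in SQLite, driven from Python. It is a sketch under assumptions: SQLite triggers fire once per row and expose the new row as NEW instead of the inserted pseudo-table, the ID column is declared INTEGER PRIMARY KEY here so that it fills itself in, and the user name 'New user' is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tests (Test_ID INT, TestName VARCHAR(50));
    CREATE TABLE Users (User_ID INT, UserName VARCHAR(50));
    -- assumption: ID auto-assigns so the trigger does not have to supply it
    CREATE TABLE UserTests (ID INTEGER PRIMARY KEY,
                            User_ID INT, Test_ID INT, Completed INT);
    INSERT INTO Tests VALUES (1, 'SQL Test'), (2, 'C# Test'), (3, 'Java Test');

    -- runs once for every row inserted into Users
    CREATE TRIGGER tr_user AFTER INSERT ON Users
    BEGIN
        INSERT INTO UserTests (User_ID, Test_ID, Completed)
        SELECT NEW.User_ID, Test_ID, 0 FROM Tests;
    END;
""")

conn.execute("INSERT INTO Users VALUES (5, 'New user')")
added = conn.execute(
    "SELECT User_ID, Test_ID, Completed FROM UserTests ORDER BY Test_ID"
).fetchall()
print(added)  # [(5, 1, 0), (5, 2, 0), (5, 3, 0)]
```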
Here's a SQL Fiddle that finds missing records and inserts them.
The SELECT:
select u.user_id, t.test_id, 0 as Completed
from users u
cross join tests t
where not exists (
select 1
from usertests ut
where ut.user_id = u.user_id and ut.test_id = t.test_id)
Putting INSERT INTO UserTests (User_Id, Test_Id, Completed) in front of the SELECT will insert these records.
You can add a user id on to the where clause to do it for a single user if required. It is re-runnable so it won't re-insert test ids for a user that already has them, but will add new ones if new tests are introduced.
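The re-runnable behaviour can be checked with a small sketch. It uses an in-memory SQLite database and a reduced data set (both assumptions for illustration: user 1 already has all three tests, and a made-up user 5 has none), then runs the same INSERT ... SELECT twice and compares row counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tests (Test_ID INT, TestName VARCHAR(50));
    CREATE TABLE Users (User_ID INT, UserName VARCHAR(50));
    CREATE TABLE UserTests (User_ID INT, Test_ID INT, Completed INT);
    INSERT INTO Tests VALUES (1, 'SQL Test'), (2, 'C# Test'), (3, 'Java Test');
    INSERT INTO Users VALUES (1, 'Joe'), (5, 'New user');
    INSERT INTO UserTests VALUES (1, 1, 0), (1, 2, 1), (1, 3, 1);
""")

# Insert only the (user, test) pairs that are not in UserTests yet
FILL_MISSING = """
    INSERT INTO UserTests (User_ID, Test_ID, Completed)
    SELECT u.User_ID, t.Test_ID, 0
    FROM Users u CROSS JOIN Tests t
    WHERE NOT EXISTS (SELECT 1 FROM UserTests ut
                      WHERE ut.User_ID = u.User_ID
                        AND ut.Test_ID = t.Test_ID)
"""

def row_count():
    return conn.execute("SELECT COUNT(*) FROM UserTests").fetchone()[0]

before = row_count()
conn.execute(FILL_MISSING)
after_first = row_count()
conn.execute(FILL_MISSING)  # second run finds nothing missing
after_second = row_count()

print(before, after_first, after_second)  # 3 6 6
```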

Update / Merge table with summing values

I have a schema:
http://sqlfiddle.com/#!4/e9917/1
CREATE TABLE test_table (
id NUMBER,
period NUMBER,
amount NUMBER
);
INSERT INTO test_table VALUES (1000, 1, 100);
INSERT INTO test_table VALUES (1000, 1, 500);
INSERT INTO test_table VALUES (1001, 1, 200);
INSERT INTO test_table VALUES (1001, 2, 300);
INSERT INTO test_table VALUES (1002, 1, 900);
INSERT INTO test_table VALUES (1002, 1, 250);
I want to update the amount field by summing the amounts of records that share the same (id, period) pair, so that after the operation the table looks like this:
ID   | period | amount
1000 | 1      | 600
1001 | 1      | 200
1001 | 2      | 300
1002 | 1      | 1150
I couldn't figure out how.
EDIT:
In actual case this table is populated by insertion operation from other 2 tables:
CREATE TABLE some_table1(
id NUMBER,
period NUMBER,
amount NUMBER
);
INSERT INTO some_table1 VALUES (1000, 1, 100);
INSERT INTO some_table1 VALUES (1000, 1, 500);
INSERT INTO some_table1 VALUES (1001, 1, 200);
INSERT INTO some_table1 VALUES (1001, 2, 300);
INSERT INTO some_table1 VALUES (1002, 1, 900);
INSERT INTO some_table1 VALUES (1002, 1, 250);
CREATE TABLE some_table2(
id NUMBER,
period NUMBER,
amount NUMBER
);
INSERT INTO some_table2 VALUES (1000, 1, 30);
INSERT INTO some_table2 VALUES (1000, 1, 20);
INSERT INTO some_table2 VALUES (1001, 1, 15);
INSERT INTO some_table2 VALUES (1001, 2, 20);
INSERT INTO some_table2 VALUES (1002, 1, 50);
INSERT INTO some_table2 VALUES (1002, 1, 60);
Duplicates occur when the two insertions are done:
INSERT INTO TEST_TABLE (id,period,amount) SELECT id,period,amount from some_table1;
INSERT INTO TEST_TABLE (id,period,amount) SELECT id,period,amount from some_table2;
new sqlfiddle link: http://sqlfiddle.com/#!4/cd45b/1
Maybe it can be solved during the insertion from the two tables.
A script like this would do what you want:
CREATE TABLE test_table_summary (
id NUMBER,
period NUMBER,
amount NUMBER
);
INSERT INTO test_table_summary (id, period, amount)
SELECT id, period, SUM(amount) AS total_amount FROM test_table
GROUP BY id, period;
DELETE FROM test_table;
INSERT INTO test_table (id, period, amount)
SELECT id, period, total_amount FROM test_table_summary;
DROP TABLE test_table_summary;
But you should actually decide if test_table is to have a primary key and the total amount or all the detail data. It's not a good solution to use one table for both.
Given what you have added, I'd say you can use the Oracle MERGE INTO statement. Aggregate the source first so that each (id, period) pair appears in it only once, and run the statement once per source table:
MERGE INTO test_table t
USING (SELECT id, period, SUM(amount) AS amount FROM some_table1 GROUP BY id, period) s
ON (t.id=s.id AND t.period=s.period)
WHEN MATCHED THEN UPDATE SET t.amount=t.amount+s.amount
WHEN NOT MATCHED THEN INSERT (t.id, t.period, t.amount)
VALUES (s.id, s.period, s.amount);
Beware though: this works only if test_table has no duplicate (id, period) rows to begin with. So if your table is already messed up, you still have to reinitialize it properly once (and maybe add a unique key on (id, period) to avoid problems in the future).
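Since the question also asks whether this could be solved during the insertion from the two tables, another option is to stack both sources with UNION ALL and aggregate before inserting. This is a different technique from MERGE, shown here only as a sketch in an in-memory SQLite database (an assumption for runnability; the column types are simplified to INT). The INSERT ... SELECT itself should carry over to Oracle, where the inline view needs no alias either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_table  (id INT, period INT, amount INT);
    CREATE TABLE some_table1 (id INT, period INT, amount INT);
    CREATE TABLE some_table2 (id INT, period INT, amount INT);
    INSERT INTO some_table1 VALUES
        (1000, 1, 100), (1000, 1, 500), (1001, 1, 200),
        (1001, 2, 300), (1002, 1, 900), (1002, 1, 250);
    INSERT INTO some_table2 VALUES
        (1000, 1, 30), (1000, 1, 20), (1001, 1, 15),
        (1001, 2, 20), (1002, 1, 50), (1002, 1, 60);

    -- stack both sources, then collapse to one row per (id, period)
    INSERT INTO test_table (id, period, amount)
    SELECT id, period, SUM(amount)
    FROM (SELECT id, period, amount FROM some_table1
          UNION ALL
          SELECT id, period, amount FROM some_table2)
    GROUP BY id, period;
""")

totals = conn.execute(
    "SELECT id, period, amount FROM test_table ORDER BY id, period"
).fetchall()
print(totals)
# [(1000, 1, 650), (1001, 1, 215), (1001, 2, 320), (1002, 1, 1260)]
```

Because the target is filled in a single pass, it never holds duplicate (id, period) rows, which is also the precondition the MERGE approach relies on.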

Is it possible to do this in SQL (summing rows over different criteria per user)

I have an SQL table that looks like this:
CREATE TABLE diet_watch (
entry_date date NOT NULL,
user_id int default 1,
weight double precision NOT NULL
);
INSERT INTO diet_watch VALUES ('2001-01-01', 1, 128.2);
INSERT INTO diet_watch VALUES ('2001-01-02', 1, 121.2);
INSERT INTO diet_watch VALUES ('2001-01-03', 1, 100.6);
INSERT INTO diet_watch VALUES ('2001-01-04', 1, 303.7);
INSERT INTO diet_watch VALUES ('2001-01-05', 1, 121.0);
INSERT INTO diet_watch VALUES ('2001-01-01', 2, 121.0);
INSERT INTO diet_watch VALUES ('2001-01-06', 2, 128.0);
INSERT INTO diet_watch VALUES ('2001-01-07', 2, 138.0);
INSERT INTO diet_watch VALUES ('2001-01-01', 3, 128.2);
INSERT INTO diet_watch VALUES ('2001-01-02', 3, 125.5);
INSERT INTO diet_watch VALUES ('2001-01-03', 3, 112.8);
INSERT INTO diet_watch VALUES ('2001-01-06', 3, 111.2);
I further have this table:
CREATE TABLE summing_period (
user_id INT NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL);
insert into summing_period VALUES (1, '2001-01-01', '2001-01-03');
insert into summing_period VALUES (2, '2001-01-02', '2001-01-06');
insert into summing_period VALUES (3, '2001-01-03', '2001-01-06');
I want to write a query that returns DISTINCT ROWS with the following columns:
the user_id
the sum of the weights in table diet_watch between the specified dates in table summing_period (for the user_id)
So the result of the query based on the data in table summing period should be:
1,350.0
2,128.0
3,224.0
Unfortunately, this time I have reached the limit of my SQL-fu, and I have no idea how to even get started writing the SQL. Ideally, the solution should be ANSI SQL (i.e. database-agnostic). However, since I am developing against a PostgreSQL 8.4 backend, if the solution is database-specific, it must at least run on PG.
SELECT
sp.user_id, SUM(dw.weight)
FROM
summing_period sp,
diet_watch dw
WHERE
dw.user_id = sp.user_id AND
dw.entry_date >= sp.start_date AND
dw.entry_date <= sp.end_date
GROUP BY
sp.user_id
This query joins each diet_watch row to the summing_period row that has the same user_id, keeping only the diet_watch rows whose entry_date falls within that period's range.
The SELECT then asks for the SUM of the weights for each different user_id (as a result of the GROUP BY user_id).
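The same query can be written with an explicit JOIN and BETWEEN, and checked against the expected figures from the question. The sketch below does that in an in-memory SQLite database (an assumption for the sake of a runnable example; dates are stored as ISO text, which compares correctly as strings, and the sums are rounded in Python to sidestep floating-point noise).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE diet_watch (entry_date TEXT NOT NULL,
                             user_id INT DEFAULT 1,
                             weight REAL NOT NULL);
    CREATE TABLE summing_period (user_id INT NOT NULL,
                                 start_date TEXT NOT NULL,
                                 end_date TEXT NOT NULL);
    INSERT INTO diet_watch VALUES
        ('2001-01-01', 1, 128.2), ('2001-01-02', 1, 121.2),
        ('2001-01-03', 1, 100.6), ('2001-01-04', 1, 303.7),
        ('2001-01-05', 1, 121.0),
        ('2001-01-01', 2, 121.0), ('2001-01-06', 2, 128.0),
        ('2001-01-07', 2, 138.0),
        ('2001-01-01', 3, 128.2), ('2001-01-02', 3, 125.5),
        ('2001-01-03', 3, 112.8), ('2001-01-06', 3, 111.2);
    INSERT INTO summing_period VALUES
        (1, '2001-01-01', '2001-01-03'),
        (2, '2001-01-02', '2001-01-06'),
        (3, '2001-01-03', '2001-01-06');
""")

raw = conn.execute("""
    SELECT sp.user_id, SUM(dw.weight)
    FROM summing_period sp
    JOIN diet_watch dw
      ON dw.user_id = sp.user_id
     AND dw.entry_date BETWEEN sp.start_date AND sp.end_date
    GROUP BY sp.user_id
    ORDER BY sp.user_id
""").fetchall()

# Round to one decimal place, matching the precision of the input data
totals = [(user_id, round(total, 1)) for user_id, total in raw]
print(totals)  # [(1, 350.0), (2, 128.0), (3, 224.0)]
```

One design point to be aware of: with an inner join, a user whose period contains no diet_watch rows drops out of the result entirely; a LEFT JOIN from summing_period would keep such users with a NULL sum.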

Oracle PIVOT, twice?

I have been trying to move away from using DECODE to pivot rows in Oracle 11g, where there is a handy PIVOT function. But I may have found a limitation:
I'm trying to return 2 columns for each value in the base table. Something like:
SELECT somethingId, splitId1, splitName1, splitId2, splitName2
FROM (SELECT somethingId, splitId
FROM SOMETHING JOIN SPLIT ON ... )
PIVOT ( MAX(splitId) FOR displayOrder IN (1 AS splitId1, 2 AS splitId2),
MAX(splitName) FOR displayOrder IN (1 AS splitName1, 2 as splitName2)
)
I can do this with DECODE, but I can't wrestle the syntax to let me do it with PIVOT. Is this even possible? Seems like it wouldn't be too hard for the function to handle.
Edit: is StackOverflow maybe not the right Overflow for SQL questions?
Edit: anyone out there?
From oracle-developer.net it would appear that it can be done like this:
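-- each aggregate gets an alias; the output columns are named <IN alias>_<aggregate alias>
SELECT somethingId, split1_id, split1_name, split2_id, split2_name
FROM (SELECT somethingId, displayOrder, splitId, splitName
FROM SOMETHING JOIN SPLIT ON ... )
PIVOT ( MAX(splitId) AS id,
MAX(splitName) AS name
FOR displayOrder IN (1 AS split1, 2 AS split2)
)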
I'm not sure from what you provided what the data looks like or what exactly you would like. Perhaps if you posted the DECODE version of the query that returns the data you are looking for, and/or the definition of the source data, we could better answer your question. Something like this would be helpful:
create table something (somethingId Number(3), displayOrder Number(3)
, splitID Number(3));
insert into something values (1, 1, 10);
insert into something values (2, 1, 11);
insert into something values (3, 1, 12);
insert into something values (4, 1, 13);
insert into something values (5, 2, 14);
insert into something values (6, 2, 15);
insert into something values (7, 2, 16);
create table split (SplitID Number(3), SplitName Varchar2(30));
insert into split values (10, 'Bob');
insert into split values (11, 'Carrie');
insert into split values (12, 'Alice');
insert into split values (13, 'Timothy');
insert into split values (14, 'Sue');
insert into split values (15, 'Peter');
insert into split values (16, 'Adam');
SELECT *
FROM (
SELECT somethingID, displayOrder, so.SplitID, sp.splitname
FROM SOMETHING so JOIN SPLIT sp ON so.splitID = sp.SplitID
)
PIVOT ( MAX(splitId) id, MAX(splitName) name
FOR (displayOrder, displayOrder) IN ((1, 1) AS split, (2, 2) as splitname)
);
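To see the shape of result a two-aggregate pivot is aiming for, the sample data above can be pivoted with conditional aggregation, which is the portable CASE counterpart of the DECODE approach. This is only a sketch in an in-memory SQLite database (an assumption, since SQLite has no PIVOT clause); the output column names split1_id, split1_name, split2_id and split2_name are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE something (somethingId INT, displayOrder INT, splitID INT);
    CREATE TABLE split (SplitID INT, SplitName VARCHAR(30));
    INSERT INTO something VALUES
        (1, 1, 10), (2, 1, 11), (3, 1, 12), (4, 1, 13),
        (5, 2, 14), (6, 2, 15), (7, 2, 16);
    INSERT INTO split VALUES
        (10, 'Bob'), (11, 'Carrie'), (12, 'Alice'), (13, 'Timothy'),
        (14, 'Sue'), (15, 'Peter'), (16, 'Adam');
""")

pivoted = conn.execute("""
    SELECT so.somethingId,
           -- one (id, name) column pair per displayOrder value
           MAX(CASE WHEN so.displayOrder = 1 THEN so.splitID   END) AS split1_id,
           MAX(CASE WHEN so.displayOrder = 1 THEN sp.SplitName END) AS split1_name,
           MAX(CASE WHEN so.displayOrder = 2 THEN so.splitID   END) AS split2_id,
           MAX(CASE WHEN so.displayOrder = 2 THEN sp.SplitName END) AS split2_name
    FROM something so
    JOIN split sp ON so.splitID = sp.SplitID
    GROUP BY so.somethingId
    ORDER BY so.somethingId
""").fetchall()

for row in pivoted:
    print(row)
# first row: (1, 10, 'Bob', None, None)
# fifth row: (5, None, None, 14, 'Sue')
```

With this sample data every somethingId has a single displayOrder, so each row fills only one of the two column pairs; data with both display orders per somethingId would fill all four columns.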