How to use the equivalent SQL with sub query and where clause in SSAS MDX Cube to create a measure? - ssas

I'm trying to "convert" a SQL query into an MDX expression to create a measure in an existing SSAS Multidimensional cube.
In SQL I have the table below:
Product  ProductCode   OperatorName  Step        StepSequence  StepDate    StepTimeinDays
-----------------------------------------------------------------------------------------
Chair    Black_Chair   Joseph Smith  0_Start     0             02/05/2019  0
Chair    Black_Chair   Joseph Smith  1_Material  1             02/05/2019  0
Chair    Black_Chair   Joseph Smith  2_Saw       2             02/05/2019  1
Chair    Black_Chair   Joseph Smith  3_Build     3             03/05/2019  1
Chair    Black_Chair   Joseph Smith  4_Paint     4             04/05/2019  0
Chair    Black_Chair   Joseph Smith  5_Varnish   5             04/05/2019  0
Chair    Black_Chair   Joseph Smith  6_Packing   6             04/05/2019  1
Table    Table_2350    John Doe      0_Start     0             02/05/2019  0
Table    Table_2350    John Doe      1_Material  1             02/05/2019  0
Table    Table_2350    John Doe      3_Build     3             02/05/2019  0
Table    Table_2350    John Doe      4_Paint     4             02/05/2019  0
Table    Table_2350    John Doe      5_Varnish   5             02/05/2019  1
Table    Table_2350    John Doe      6_Packing   6             03/05/2019  1
Chair    Black_Chair   Joseph Doe    0_Start     0             03/05/2019  0
Chair    Black_Chair   Joseph Doe    1_Material  1             03/05/2019  1
Chair    Black_Chair   Joseph Doe    2_Saw       2             04/05/2019  0
Chair    Black_Chair   Joseph Doe    3_Build     3             04/05/2019  1
Chair    Black_Chair   Joseph Doe    4_Paint     4             05/05/2019  1
Chair    Yellow_Chair  Joseph Smith  0_Start     0             04/05/2019  0
Chair    Yellow_Chair  Joseph Smith  1_Material  1             04/05/2019  1
Chair    Yellow_Chair  Joseph Smith  3_Build     3             05/05/2019  0
Chair    Yellow_Chair  Joseph Smith  4_Paint     4             05/05/2019  1
Chair    Black_Chair   John Doe      0_Start     0             04/05/2019  0
Chair    Black_Chair   John Doe      1_Material  1             04/05/2019  1
Chair    Black_Chair   John Doe      3_Build     3             05/05/2019  0
Chair    Black_Chair   John Doe      4_Paint     4             05/05/2019  1
In SQL, I run the query below (provided by Dale in another question that was specific to SQL):
select Step
    , sum(StepTimeinDays) as SumofStepTimeinDays
from dbo.build_process t1
where exists (
    select 1
    from dbo.build_process t2
    where t1.ProductCode = t2.ProductCode and t1.OperatorName = t2.OperatorName
    and t2.Step = '2_Saw'
)
group by Step
order by Step asc;
This filters the table down to the ProductCode/OperatorName blocks where "2_Saw" is present:
Product  ProductCode   OperatorName  Step        StepSequence  StepDate    StepTimeinDays
-----------------------------------------------------------------------------------------
Chair    Black_Chair   Joseph Smith  0_Start     0             02/05/2019  0
Chair    Black_Chair   Joseph Smith  1_Material  1             02/05/2019  0
Chair    Black_Chair   Joseph Smith  2_Saw       2             02/05/2019  1
Chair    Black_Chair   Joseph Smith  3_Build     3             03/05/2019  1
Chair    Black_Chair   Joseph Smith  4_Paint     4             04/05/2019  0
Chair    Black_Chair   Joseph Smith  5_Varnish   5             04/05/2019  0
Chair    Black_Chair   Joseph Smith  6_Packing   6             04/05/2019  1
Chair    Black_Chair   Joseph Doe    0_Start     0             03/05/2019  0
Chair    Black_Chair   Joseph Doe    1_Material  1             03/05/2019  1
Chair    Black_Chair   Joseph Doe    2_Saw       2             04/05/2019  0
Chair    Black_Chair   Joseph Doe    3_Build     3             04/05/2019  1
Chair    Black_Chair   Joseph Doe    4_Paint     4             05/05/2019  1
The final result of this SQL query is below (this is the same result that I need to get in the SSAS cube):
Step        Sum of StepTimeinDays
---------------------------------
0_Start     0
1_Material  1
2_Saw       1
3_Build     2
4_Paint     1
5_Varnish   0
6_Packing   1
For example, for the Total Percent measure I used the expression below and it works fine.
([Measures].[StepTimeinDays],[Process].[Step].CurrentMember)/([Measures].[StepTimeinDays],[Process].[Step].[All])
Now that the same "table" is in an SSAS multidimensional cube, I'm trying to get the same result as the SQL query from an expression in the SSAS expression field to create a measure, so far without success.
I tried several MDX expressions, and the closest I got is below:
([Measures].[StepTimeinDays],[Process].[Step].[All],[Process].[Step].&[2_Saw])
But this returns the wrong value. Other expressions with parent returned syntax errors.

This was solved in another forum. The measure could be created without curly braces around the set of data:
sum ( NonEmpty ( [Product].[OperatorName].[OperatorName] * [Product].[ProductCode].[ProductCode], ([Measures].[StepTimeinDays], [Product].[Step].&[2_Saw]) ), [Measures].[StepTimeinDays] )
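For readers more at home in SQL than MDX, the logic of that expression can be sketched in plain Python. This is only an illustration of the semantics as I understand them, not SSAS code. `BLOCKS` holds the sample rows from the question keyed by (ProductCode, OperatorName); `pairs_with_step` and `measure` are names made up for this sketch, standing in for the NonEmpty over the crossjoin and for the outer Sum evaluated at one Step member. It assumes a cell counts as non-empty whenever a fact row exists, even if its value is 0.

```python
# StepTimeinDays per Step for each (ProductCode, OperatorName) block,
# copied from the sample table in the question.
BLOCKS = {
    ("Black_Chair", "Joseph Smith"): {"0_Start": 0, "1_Material": 0, "2_Saw": 1,
                                      "3_Build": 1, "4_Paint": 0, "5_Varnish": 0,
                                      "6_Packing": 1},
    ("Table_2350", "John Doe"): {"0_Start": 0, "1_Material": 0, "3_Build": 0,
                                 "4_Paint": 0, "5_Varnish": 1, "6_Packing": 1},
    ("Black_Chair", "Joseph Doe"): {"0_Start": 0, "1_Material": 1, "2_Saw": 0,
                                    "3_Build": 1, "4_Paint": 1},
    ("Yellow_Chair", "Joseph Smith"): {"0_Start": 0, "1_Material": 1,
                                       "3_Build": 0, "4_Paint": 1},
    ("Black_Chair", "John Doe"): {"0_Start": 0, "1_Material": 1,
                                  "3_Build": 0, "4_Paint": 1},
}

STEPS = ["0_Start", "1_Material", "2_Saw", "3_Build",
         "4_Paint", "5_Varnish", "6_Packing"]


def pairs_with_step(blocks, step):
    """Rough analogue of NonEmpty(OperatorName * ProductCode, (measure, step)):
    keep only the pairs that have a fact row for the given step."""
    return [pair for pair, steps in blocks.items() if step in steps]


def measure(blocks, current_step, required_step="2_Saw"):
    """Rough analogue of the outer Sum: add up the measure at the current
    Step member over the pairs that survived the NonEmpty filter."""
    kept = pairs_with_step(blocks, required_step)
    return sum(blocks[pair].get(current_step, 0) for pair in kept)


result = {step: measure(BLOCKS, step) for step in STEPS}
for step, total in result.items():
    print(step, total)
```

With the sample rows, only the Joseph Smith and Joseph Doe Black_Chair blocks survive the filter, which matches the filtered table shown in the question.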

Related

How to put data from a specific column into an one unique row?

I have a table named dbo.build_main_process that works like a transaction table, with three (sometimes four) different dates for the same record. Sometimes the date is null.
Product  ProductCode   OperatorName  Step        StepDate    BuildLocation
--------------------------------------------------------------------------
Chair    Black_Chair   Joseph Smith  0_Start     2019-05-02  ZA
Chair    Black_Chair   Joseph Smith  1_Material  2019-05-03  ZA
Chair    Black_Chair   Joseph Smith  3_Build     2019-05-06  ZA
Table    Table_2350    John Doe      0_Start     2019-05-03  MX
Table    Table_2350    John Doe      1_Material  2019-05-06  MX
Table    Table_2350    John Doe      3_Build     2019-05-07  MX
Chair    Black_Chair   Joseph Doe    0_Start     2019-05-03  AR
Chair    Black_Chair   Joseph Doe    1_Material  null        AR
Chair    Black_Chair   Joseph Doe    3_Build     2019-05-06  AR
Chair    Yellow_Chair  Joseph Smith  0_Start     2019-05-07  ZA
Chair    Yellow_Chair  Joseph Smith  1_Material  2019-05-08  ZA
Chair    Yellow_Chair  Joseph Smith  3_Build     null        ZA
Chair    Black_Chair   John Doe      0_Start     null        MX
Chair    Black_Chair   John Doe      1_Material  2019-05-07  MX
Chair    Black_Chair   John Doe      3_Build     2019-05-09  MX
I need to select the data like an Excel pivot, with the dates in columns and one unique row per record:
Product  ProductCode   OperatorName  0_Start_Date  1_Material_Date  3_Build_Date  BuildLocation
-----------------------------------------------------------------------------------------------
Chair    Black_Chair   Joseph Smith  2019-05-02    2019-05-03       2019-05-06    ZA
Table    Table_2350    John Doe      2019-05-03    2019-05-06       2019-05-07    MX
Chair    Black_Chair   Joseph Doe    2019-05-03    null             2019-05-06    AR
Chair    Yellow_Chair  Joseph Smith  2019-05-07    2019-05-08       null          ZA
Chair    Black_Chair   John Doe      null          2019-05-07       2019-05-09    MX
I'm trying to join the table to itself to get unique records with the dates in columns:
select
t1.Product
,t1.ProductCode
,t1.OperatorName
,case when t1.Step = '0_Start' then StepDate end as 0_Start_Date
,case when t2.Step = '1_Material' then StepDate end as 1_Material_Date
,case when t3.Step = '3_Build' then StepDate end as 3_Build_Date
,BuildLocation
from
dbo.build_main_process as t1
inner join dbo.build_main_process as t2
on t1.ProductCode = t2.ProductCode
and t1.OperatorName = t2.OperatorName
inner join dbo.build_main_process as t3
on t1.ProductCode = t3.ProductCode
and t1.OperatorName = t3.OperatorName
go
I'm not sure why the result does not show unique rows. My result is very similar to this:
Product  ProductCode   OperatorName  0_Start_Date  1_Material_Date  3_Build_Date  BuildLocation
-----------------------------------------------------------------------------------------------
Chair    Black_Chair   Joseph Smith  null          null             null          ZA
Chair    Black_Chair   Joseph Smith  2019-05-02    2019-05-03       null          ZA
Chair    Black_Chair   Joseph Smith  2019-05-02    2019-05-03       2019-05-06    ZA
Chair    Black_Chair   Joseph Smith  2019-05-02    2019-05-03       null          ZA
Chair    Black_Chair   Joseph Smith  null          2019-05-03       null          ZA
Chair    Black_Chair   Joseph Smith  null          2019-05-03       null          ZA
Chair    Black_Chair   Joseph Smith  null          2019-05-03       null          ZA
Table    Table_2350    John Doe      null          null             null          MX
Table    Table_2350    John Doe      2019-05-03    null             null          MX
Table    Table_2350    John Doe      2019-05-03    2019-05-06       2019-05-07    MX
Table    Table_2350    John Doe      2019-05-03    2019-05-06       2019-05-07    MX
Table    Table_2350    John Doe      2019-05-03    null             2019-05-07    MX
Table    Table_2350    John Doe      null          null             null          MX
Chair    Black_Chair   Joseph Doe    null          null             null          AR
Chair    Black_Chair   Joseph Doe    2019-05-03    null             null          AR
Chair    Black_Chair   Joseph Doe    2019-05-03    null             2019-05-06    AR
Chair    Black_Chair   Joseph Doe    2019-05-03    null             2019-05-06    AR
Chair    Black_Chair   Joseph Doe    2019-05-03    null             2019-05-06    AR
Chair    Black_Chair   Joseph Doe    null          null             null          AR
Chair    Black_Chair   Joseph Doe    null          null             null          AR
Chair    Yellow_Chair  Joseph Smith  null          null             null          ZA
Chair    Yellow_Chair  Joseph Smith  null          null             null          ZA
Chair    Yellow_Chair  Joseph Smith  null          null             null          ZA
Chair    Yellow_Chair  Joseph Smith  2019-05-07    null             null          ZA
Chair    Yellow_Chair  Joseph Smith  2019-05-07    2019-05-08       null          ZA
Chair    Yellow_Chair  Joseph Smith  2019-05-07    2019-05-08       null          ZA
Chair    Yellow_Chair  Joseph Smith  null          null             null          ZA
Chair    Yellow_Chair  Joseph Smith  null          2019-05-08       null          ZA
Chair    Yellow_Chair  Joseph Smith  null          null             null          ZA
Chair    Black_Chair   John Doe      null          2019-05-07       2019-05-09    MX
Chair    Black_Chair   John Doe      null          2019-05-07       null          MX
Chair    Black_Chair   John Doe      null          2019-05-07       2019-05-09    MX
Chair    Black_Chair   John Doe      null          2019-05-07       2019-05-09    MX
Chair    Black_Chair   John Doe      null          2019-05-07       null          MX
The DDL is below:
---Create table
create table [dbo].[build_main_process](
[Product] varchar(50) NOT NULL,
[ProductCode] varchar(50) NOT NULL,
[OperatorName] varchar(50) NOT NULL,
[Step] varchar(15) NULL,
[StepDate] date NULL,
[StepTimeinDays] int NULL,
[BuildLocation] varchar(2) NULL
) on [primary]
go
---Populate table (StepTimeinDays is not supplied, so the columns are listed explicitly)
insert into [dbo].[build_main_process]
([Product], [ProductCode], [OperatorName], [Step], [StepDate], [BuildLocation]) values
('Chair','Black_Chair','Joseph Smith','0_Start','2019-05-02','ZA'),
('Chair','Black_Chair','Joseph Smith','1_Material','2019-05-03','ZA'),
('Chair','Black_Chair','Joseph Smith','3_Build','2019-05-06','ZA'),
('Table','Table_2350','John Doe','0_Start','2019-05-03','MX'),
('Table','Table_2350','John Doe','1_Material','2019-05-06','MX'),
('Table','Table_2350','John Doe','3_Build','2019-05-07','MX'),
('Chair','Black_Chair','Joseph Doe','0_Start','2019-05-03','AR'),
('Chair','Black_Chair','Joseph Doe','1_Material',null,'AR'),
('Chair','Black_Chair','Joseph Doe','3_Build','2019-05-06','AR'),
('Chair','Yellow_Chair','Joseph Smith','0_Start','2019-05-07','ZA'),
('Chair','Yellow_Chair','Joseph Smith','1_Material','2019-05-08','ZA'),
('Chair','Yellow_Chair','Joseph Smith','3_Build',null,'ZA'),
('Chair','Black_Chair','John Doe','0_Start',null,'MX'),
('Chair','Black_Chair','John Doe','1_Material','2019-05-07','MX'),
('Chair','Black_Chair','John Doe','3_Build','2019-05-09','MX')
go
What you want is a PIVOT. I believe the below query does what you require.
-- implicit grouping
SELECT piv.Product, piv.ProductCode, piv.OperatorName, piv.BuildLocation, piv.[0_Start], piv.[1_Material], piv.[3_Build]
FROM #build_process BP
PIVOT (
MAX(StepDate)
FOR Step IN ([0_Start], [1_Material], [3_Build])
) piv
It's worth noting that there is an implicit grouping in a pivot by all non-pivoted columns. In this case, we group by StepTimeInDays but do not show it. For your data schema, assuming valid data, this doesn't seem significant. However, we can make the grouping explicit as below. This is sometimes considered a best practice when working with pivots to make it more obvious how your data ends up grouped.
-- explicit grouping
SELECT piv.Product, piv.ProductCode, piv.OperatorName, piv.BuildLocation, piv.[0_Start], piv.[1_Material], piv.[3_Build]
FROM (
SELECT BP.Product, BP.ProductCode, BP.OperatorName, BP.BuildLocation, BP.Step, MAX(BP.StepDate) StepDate
FROM #build_process BP
GROUP BY BP.Product, BP.ProductCode, BP.OperatorName, BP.BuildLocation, BP.Step
) grouped
PIVOT (
MAX(StepDate)
FOR Step IN ([0_Start], [1_Material], [3_Build])
) piv
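If you want to try the idea without a SQL Server instance, here is a small sketch that gets the same shape with conditional aggregation (`MAX(CASE WHEN ...)` plus `GROUP BY`) instead of `PIVOT`. To be clear, this is a swapped-in technique, not the PIVOT syntax above: it runs on SQLite through Python's `sqlite3` module, which has no `PIVOT` operator, and the in-memory table is just the sample data from the question.

```python
import sqlite3

# Sample rows from the question: Product, ProductCode, OperatorName,
# Step, StepDate, BuildLocation (None stands for NULL).
ROWS = [
    ("Chair", "Black_Chair", "Joseph Smith", "0_Start", "2019-05-02", "ZA"),
    ("Chair", "Black_Chair", "Joseph Smith", "1_Material", "2019-05-03", "ZA"),
    ("Chair", "Black_Chair", "Joseph Smith", "3_Build", "2019-05-06", "ZA"),
    ("Table", "Table_2350", "John Doe", "0_Start", "2019-05-03", "MX"),
    ("Table", "Table_2350", "John Doe", "1_Material", "2019-05-06", "MX"),
    ("Table", "Table_2350", "John Doe", "3_Build", "2019-05-07", "MX"),
    ("Chair", "Black_Chair", "Joseph Doe", "0_Start", "2019-05-03", "AR"),
    ("Chair", "Black_Chair", "Joseph Doe", "1_Material", None, "AR"),
    ("Chair", "Black_Chair", "Joseph Doe", "3_Build", "2019-05-06", "AR"),
    ("Chair", "Yellow_Chair", "Joseph Smith", "0_Start", "2019-05-07", "ZA"),
    ("Chair", "Yellow_Chair", "Joseph Smith", "1_Material", "2019-05-08", "ZA"),
    ("Chair", "Yellow_Chair", "Joseph Smith", "3_Build", None, "ZA"),
    ("Chair", "Black_Chair", "John Doe", "0_Start", None, "MX"),
    ("Chair", "Black_Chair", "John Doe", "1_Material", "2019-05-07", "MX"),
    ("Chair", "Black_Chair", "John Doe", "3_Build", "2019-05-09", "MX"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE build_main_process (Product TEXT, ProductCode TEXT, "
    "OperatorName TEXT, Step TEXT, StepDate TEXT, BuildLocation TEXT)"
)
conn.executemany("INSERT INTO build_main_process VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# One output row per group; each CASE picks the date of a single step and
# MAX collapses the group (MAX over only NULLs stays NULL).
QUERY = """
SELECT Product, ProductCode, OperatorName,
       MAX(CASE WHEN Step = '0_Start'    THEN StepDate END) AS "0_Start_Date",
       MAX(CASE WHEN Step = '1_Material' THEN StepDate END) AS "1_Material_Date",
       MAX(CASE WHEN Step = '3_Build'    THEN StepDate END) AS "3_Build_Date",
       BuildLocation
FROM build_main_process
GROUP BY Product, ProductCode, OperatorName, BuildLocation
ORDER BY ProductCode, OperatorName
"""
pivoted = conn.execute(QUERY).fetchall()
for row in pivoted:
    print(row)
```

The explicit `GROUP BY` here does the same job as the explicit grouping in the second PIVOT query: it makes it obvious which columns define one output row.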
I changed your query a bit and got the result below
SELECT t1.product,
t1.productcode,
t1.operatorname,
CASE
WHEN t1.step = '0_Start' THEN t1.stepdate
END AS "0_Start_Date",
CASE
WHEN t1.step = '1_Material' THEN t1.stepdate
END AS "1_Material_Date",
CASE
WHEN t1.step = '3_Build' THEN t1.stepdate
END AS "3_Build_Date",
t1.buildlocation
FROM dbo.build_main_process AS t1
INNER JOIN dbo.build_main_process AS t2
ON t1.productcode = t2.productcode
AND t1.operatorname = t2.operatorname
AND t1.step = t2.step
AND t1.stepdate = t2.stepdate
AND t1.buildlocation = t2.buildlocation

Sum of days in specific when a specific row is present

I have a table named dbo.build_process. Usually I need to sum the Step Time in Days grouped by Step, and this is simple. Now I have been asked to do the same, but only when step "2_Saw" is available.
Product  ProductCode   OperatorName  Step        StepSequence  StepDate    StepTimeinDays
-----------------------------------------------------------------------------------------
Chair    Black_Chair   Joseph Smith  0_Start     0             02/05/2019  0
Chair    Black_Chair   Joseph Smith  1_Material  1             02/05/2019  0
Chair    Black_Chair   Joseph Smith  2_Saw       2             02/05/2019  1
Chair    Black_Chair   Joseph Smith  3_Build     3             03/05/2019  1
Chair    Black_Chair   Joseph Smith  4_Paint     4             04/05/2019  0
Chair    Black_Chair   Joseph Smith  5_Varnish   5             04/05/2019  0
Chair    Black_Chair   Joseph Smith  6_Packing   6             04/05/2019  1
Table    Table_2350    John Doe      0_Start     0             02/05/2019  0
Table    Table_2350    John Doe      1_Material  1             02/05/2019  0
Table    Table_2350    John Doe      3_Build     3             02/05/2019  0
Table    Table_2350    John Doe      4_Paint     4             02/05/2019  0
Table    Table_2350    John Doe      5_Varnish   5             02/05/2019  1
Table    Table_2350    John Doe      6_Packing   6             03/05/2019  1
Chair    Black_Chair   Joseph Doe    0_Start     0             03/05/2019  0
Chair    Black_Chair   Joseph Doe    1_Material  1             03/05/2019  1
Chair    Black_Chair   Joseph Doe    2_Saw       2             04/05/2019  0
Chair    Black_Chair   Joseph Doe    3_Build     3             04/05/2019  1
Chair    Black_Chair   Joseph Doe    4_Paint     4             05/05/2019  1
Chair    Yellow_Chair  Joseph Smith  0_Start     0             04/05/2019  0
Chair    Yellow_Chair  Joseph Smith  1_Material  1             04/05/2019  1
Chair    Yellow_Chair  Joseph Smith  3_Build     3             05/05/2019  0
Chair    Yellow_Chair  Joseph Smith  4_Paint     4             05/05/2019  1
Chair    Black_Chair   John Doe      0_Start     0             04/05/2019  0
Chair    Black_Chair   John Doe      1_Material  1             04/05/2019  1
Chair    Black_Chair   John Doe      3_Build     3             05/05/2019  0
Chair    Black_Chair   John Doe      4_Paint     4             05/05/2019  1
When I run the query
select
Step,
sum(StepTimeinDays) as SumofStepTimeinDays
from
dbo.build_process
group by
Step
order by
Step asc
The return is:
Step        SumofStepTimeinDays
-------------------------------
0_Start     0
1_Material  3
2_Saw       1
3_Build     2
4_Paint     3
5_Varnish   1
6_Packing   2
I need to return the sum to Step Time in Days only for "blocks" where I have the Step 2_Saw:
Product  ProductCode   OperatorName  Step        StepSequence  StepDate    StepTimeinDays
-----------------------------------------------------------------------------------------
Chair    Black_Chair   Joseph Smith  0_Start     0             02/05/2019  0
Chair    Black_Chair   Joseph Smith  1_Material  1             02/05/2019  0
Chair    Black_Chair   Joseph Smith  2_Saw       2             02/05/2019  1
Chair    Black_Chair   Joseph Smith  3_Build     3             03/05/2019  1
Chair    Black_Chair   Joseph Smith  4_Paint     4             04/05/2019  0
Chair    Black_Chair   Joseph Smith  5_Varnish   5             04/05/2019  0
Chair    Black_Chair   Joseph Smith  6_Packing   6             04/05/2019  1
Chair    Black_Chair   Joseph Doe    0_Start     0             03/05/2019  0
Chair    Black_Chair   Joseph Doe    1_Material  1             03/05/2019  1
Chair    Black_Chair   Joseph Doe    2_Saw       2             04/05/2019  0
Chair    Black_Chair   Joseph Doe    3_Build     3             04/05/2019  1
Chair    Black_Chair   Joseph Doe    4_Paint     4             05/05/2019  1
The expected result is:
Step        Sum of StepTimeinDays
---------------------------------
0_Start     0
1_Material  1
2_Saw       1
3_Build     2
4_Paint     1
5_Varnish   0
6_Packing   1
I tried to use over partition but couldn't get the expected result.
select
Step,
sum(StepTimeinDays) as SumofStepTimeinDays
from(
select
t.*,
row_number() over(partition by ProductCode, OperatorName, Step order by StepSequence asc)
from
dbo.build_process
) as t
group by
Step
order by
Step asc
Below is the DDL to create and populate the table in SQL Server:
--Create the table
CREATE TABLE [dbo].[build_process](
[Product] varchar(50) NOT NULL,
[ProductCode] varchar(50) NOT NULL,
[OperatorName] varchar(50) NOT NULL,
[Step] varchar(15) NULL,
[StepSequence] int NULL,
[StepDate] date NULL,
[StepTimeinDays] int NULL
) ON [PRIMARY]
GO
--Populate the table
INSERT INTO [dbo].[build_process] VALUES
('Chair','Black_Chair','Joseph Smith','0_Start',0,'2019-05-02',0),
('Chair','Black_Chair','Joseph Smith','1_Material',1,'2019-05-02',0),
('Chair','Black_Chair','Joseph Smith','2_Saw',2,'2019-05-02',1),
('Chair','Black_Chair','Joseph Smith','3_Build',3,'2019-05-03',1),
('Chair','Black_Chair','Joseph Smith','4_Paint',4,'2019-05-04',0),
('Chair','Black_Chair','Joseph Smith','5_Varnish',5,'2019-05-04',0),
('Chair','Black_Chair','Joseph Smith','6_Packing',6,'2019-05-04',1),
('Table','Table_2350','John Doe','0_Start',0,'2019-05-02',0),
('Table','Table_2350','John Doe','1_Material',1,'2019-05-02',0),
('Table','Table_2350','John Doe','3_Build',3,'2019-05-02',0),
('Table','Table_2350','John Doe','4_Paint',4,'2019-05-02',0),
('Table','Table_2350','John Doe','5_Varnish',5,'2019-05-02',1),
('Table','Table_2350','John Doe','6_Packing',6,'2019-05-03',1),
('Chair','Black_Chair','Joseph Doe','0_Start',0,'2019-05-03',0),
('Chair','Black_Chair','Joseph Doe','1_Material',1,'2019-05-03',1),
('Chair','Black_Chair','Joseph Doe','2_Saw',2,'2019-05-04',0),
('Chair','Black_Chair','Joseph Doe','3_Build',3,'2019-05-04',1),
('Chair','Black_Chair','Joseph Doe','4_Paint',4,'2019-05-05',1),
('Chair','Yellow_Chair','Joseph Smith','0_Start',0,'2019-05-04',0),
('Chair','Yellow_Chair','Joseph Smith','1_Material',1,'2019-05-04',1),
('Chair','Yellow_Chair','Joseph Smith','3_Build',3,'2019-05-04',0),
('Chair','Yellow_Chair','Joseph Smith','4_Paint',4,'2019-05-05',1),
('Chair','Black_Chair','John Doe','0_Start',0,'2019-05-04',0),
('Chair','Black_Chair','John Doe','1_Material',1,'2019-05-04',1),
('Chair','Black_Chair','John Doe','3_Build',3,'2019-05-05',0),
('Chair','Black_Chair','John Doe','4_Paint',4,'2019-05-05',1)
GO
You can just use an exists clause to check whether step "2_Saw" exists.
select Step
    , sum(StepTimeinDays) as SumofStepTimeinDays
from dbo.build_process t1
where exists (
    select 1
    from dbo.build_process t2
    where t1.ProductCode = t2.ProductCode and t1.OperatorName = t2.OperatorName
    and t2.Step = '2_Saw'
)
group by Step
order by Step asc;
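As a quick check, the same EXISTS query can be run against the sample rows in an in-memory SQLite database from Python. This is only a sketch under a couple of assumptions: SQLite stands in for SQL Server, the `dbo.` schema prefix is dropped, and only the four columns the query touches are loaded.

```python
import sqlite3

# (ProductCode, OperatorName) blocks and their (Step, StepTimeinDays) rows,
# copied from the INSERT statement in the question.
BLOCKS = {
    ("Black_Chair", "Joseph Smith"): [("0_Start", 0), ("1_Material", 0), ("2_Saw", 1),
                                      ("3_Build", 1), ("4_Paint", 0), ("5_Varnish", 0),
                                      ("6_Packing", 1)],
    ("Table_2350", "John Doe"): [("0_Start", 0), ("1_Material", 0), ("3_Build", 0),
                                 ("4_Paint", 0), ("5_Varnish", 1), ("6_Packing", 1)],
    ("Black_Chair", "Joseph Doe"): [("0_Start", 0), ("1_Material", 1), ("2_Saw", 0),
                                    ("3_Build", 1), ("4_Paint", 1)],
    ("Yellow_Chair", "Joseph Smith"): [("0_Start", 0), ("1_Material", 1),
                                       ("3_Build", 0), ("4_Paint", 1)],
    ("Black_Chair", "John Doe"): [("0_Start", 0), ("1_Material", 1),
                                  ("3_Build", 0), ("4_Paint", 1)],
}
rows = [(code, op, step, days)
        for (code, op), steps in BLOCKS.items()
        for step, days in steps]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE build_process (ProductCode TEXT, OperatorName TEXT, "
    "Step TEXT, StepTimeinDays INTEGER)"
)
conn.executemany("INSERT INTO build_process VALUES (?, ?, ?, ?)", rows)

# Keep a row only if its ProductCode/OperatorName block also has a 2_Saw row.
QUERY = """
SELECT Step, SUM(StepTimeinDays) AS SumofStepTimeinDays
FROM build_process t1
WHERE EXISTS (
    SELECT 1
    FROM build_process t2
    WHERE t1.ProductCode = t2.ProductCode
      AND t1.OperatorName = t2.OperatorName
      AND t2.Step = '2_Saw'
)
GROUP BY Step
ORDER BY Step ASC
"""
sums = conn.execute(QUERY).fetchall()
for step, total in sums:
    print(step, total)
```

The correlated subquery is evaluated per outer row, so every row of a block is kept or dropped together, which is exactly the "blocks" behaviour asked for.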

How to change the Row Number behavior in my T-SQL Query?

I am using SQL Server 2014 and I have the following T-SQL query running against a table (tbl1).
Extract of tbl1:
emp_code Name Address Company
---------------------------------------
100 Peter London ABC
125 Allan Cambridge DCE
125 Allan Cambridge DCE
115 John Suffolk ABC
115 John Suffolk XYZ
154 Mary Highlands ABC
154 Mary Bristol ABC
124 Mary Chester ABC
My T-SQL query stands as follows:
SELECT
[emp_code],
[Name],
[Address],
[Company],
ROW_NUMBER() OVER (PARTITION BY [emp_code] ORDER BY [Address]) AS RowNumber
FROM
[tbl1]
Output from above query:
emp_code Name Address Company RowNumber
--------------------------------------------------------
100 Peter London ABC 1
125 Allan Cambridge DCE 1
125 Allan Cambridge DCE 2
115 John Suffolk ABC 1
115 John Suffolk XYZ 2
154 Mary Highlands ABC 1
154 Mary Bristol ABC 2
154 Mary Chester ABC 3
Output I'm after:
emp_code Name Address Company RowNumber
---------------------------------------------------------
100 Peter London ABC 1
125 Allan Cambridge DCE 1
125 Allan Cambridge DCE 1
115 John Suffolk ABC 1
115 John Suffolk XYZ 1
154 Mary Highlands ABC 1
154 Mary Bristol ABC 2
154 Mary Chester ABC 3
I want my RowNumber (or change the column name if need be) to change based on the [Address] column for each [emp_code]. If the employee has the SAME address, it should have the same value (that is, 1). Else, it should give the values as in the case of employee "Mary" (above output).
I am assuming the Row_Number() function is not the right one to be used for what I'm after.
Any help would be appreciated.
I think you want DENSE_RANK here rather than ROW_NUMBER():
SELECT [emp_code], [Name], [Address], [Company],
DENSE_RANK() OVER (PARTITION BY [emp_code]
ORDER BY [Address]) AS DenseRank
FROM [tbl1];
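Here is a small runnable sketch of the same DENSE_RANK query, using SQLite through Python's `sqlite3` module as a stand-in for SQL Server (an assumption on my part; it needs SQLite 3.25 or later for window functions). The rows are the extract of tbl1 from the question, selected by `emp_code`.

```python
import sqlite3

# Extract of tbl1 from the question: emp_code, Name, Address, Company.
ROWS = [
    (100, "Peter", "London", "ABC"),
    (125, "Allan", "Cambridge", "DCE"),
    (125, "Allan", "Cambridge", "DCE"),
    (115, "John", "Suffolk", "ABC"),
    (115, "John", "Suffolk", "XYZ"),
    (154, "Mary", "Highlands", "ABC"),
    (154, "Mary", "Bristol", "ABC"),
    (124, "Mary", "Chester", "ABC"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl1 (emp_code INTEGER, Name TEXT, Address TEXT, Company TEXT)"
)
conn.executemany("INSERT INTO tbl1 VALUES (?, ?, ?, ?)", ROWS)

# Rows that tie on Address within one emp_code share a rank, and the next
# distinct Address gets the next integer with no gaps.
QUERY = """
SELECT emp_code, Name, Address, Company,
       DENSE_RANK() OVER (PARTITION BY emp_code ORDER BY Address) AS DenseRank
FROM tbl1
ORDER BY emp_code, Address, Company
"""
ranked = conn.execute(QUERY).fetchall()
for row in ranked:
    print(row)
```

Note that the rank follows the alphabetical order of [Address] within each [emp_code], so for employee 154 "Bristol" gets 1 and "Highlands" gets 2, which is a different order from the one listed in the question's desired output.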

SQL number of first names belonging to two families

Say I have a table with three columns: the first column is the ID, the second contains first names and the third contains last names. There can be rows with the same first name but different last names. However, the same first name cannot occur twice with the same last name in the table.
ID  First_Name  Last_Name
-------------------------
0   John        SMITH
1   John        BROWN
2   John        JONES
3   John        WILLIAMS
4   John        MILLER
5   John        DAVIS
6   John        WILSON
7   John        TAYLOR
8   John        WHITE
9   John        CLARK
10  Michael     SMITH
11  Michael     BROWN
12  James       JONES
13  James       WILLIAMS
14  Robert      MILLER
15  Robert      DAVIS
16  Robert      WILSON
17  Robert      BROWN
18  Robert      JONES
19  Robert      WILLIAMS
20  Jennifer    MILLER
21  Jennifer    DAVIS
22  Jennifer    SMITH
23  Jennifer    BROWN
24  Jennifer    JONES
25  Jennifer    WILLIAMS
26  Jennifer    WILSON
27  Jennifer    TAYLOR
28  Jennifer    WHITE
How do I get a matrix M whose rows and columns are all the possible last names, where M(f1,f2) is the number of first names that occur with both f1 and f2 in the table?
i.e to get this result
Thank you for your help
Getting a (dynamic) matrix as the result of a pure SQL query is not doable.
The best you can do is to get the 2 last names in 2 columns, like this:
SELECT
t1.last_n name_a, t2.last_n name_b,
count(case when t1.first_n = t2.first_n then 1 end) count_match
FROM data_table t1,data_table t2
GROUP BY 1,2
ORDER BY 1,2
The results would look like this (this is like the expected results just not in a matrix):
Results
name_a name_b count_match
BROWN BROWN 4
BROWN CLARK 1
BROWN DAVIS 3
BROWN JONES 3
BROWN MILLER 3
BROWN SMITH 3
BROWN TAYLOR 2
BROWN WHITE 2
BROWN WILLIAMS 3
BROWN WILSON 3
CLARK BROWN 1
CLARK CLARK 1
CLARK DAVIS 1
CLARK JONES 1
CLARK MILLER 1
CLARK SMITH 1
CLARK TAYLOR 1
CLARK WHITE 1
CLARK WILLIAMS 1
CLARK WILSON 1
DAVIS BROWN 3
DAVIS CLARK 1
DAVIS DAVIS 3
DAVIS JONES 3
DAVIS MILLER 3
DAVIS SMITH 2
DAVIS TAYLOR 2
DAVIS WHITE 2
DAVIS WILLIAMS 3
DAVIS WILSON 3
JONES BROWN 3
JONES CLARK 1
JONES DAVIS 3
....
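To check the counts against the sample data, here is a sketch that runs the same self-join on an in-memory SQLite database from Python and then reshapes the pairs into a nested dict on the client side. The table and column names (`data_table`, `first_n`, `last_n`) follow the query above; using SQLite, spelling out the `GROUP BY` columns instead of `1,2`, and the `matrix` dict are my own choices for this illustration.

```python
import sqlite3

# First name -> last names it occurs with, copied from the question's table.
NAMES = {
    "John": ["SMITH", "BROWN", "JONES", "WILLIAMS", "MILLER",
             "DAVIS", "WILSON", "TAYLOR", "WHITE", "CLARK"],
    "Michael": ["SMITH", "BROWN"],
    "James": ["JONES", "WILLIAMS"],
    "Robert": ["MILLER", "DAVIS", "WILSON", "BROWN", "JONES", "WILLIAMS"],
    "Jennifer": ["MILLER", "DAVIS", "SMITH", "BROWN", "JONES",
                 "WILLIAMS", "WILSON", "TAYLOR", "WHITE"],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (first_n TEXT, last_n TEXT)")
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?)",
    [(first, last) for first, lasts in NAMES.items() for last in lasts],
)

# Cross join every row with every row, then count, per pair of last names,
# the row pairs that share a first name.
QUERY = """
SELECT t1.last_n AS name_a, t2.last_n AS name_b,
       COUNT(CASE WHEN t1.first_n = t2.first_n THEN 1 END) AS count_match
FROM data_table t1, data_table t2
GROUP BY t1.last_n, t2.last_n
ORDER BY t1.last_n, t2.last_n
"""
pairs = conn.execute(QUERY).fetchall()

# Reshape the (name_a, name_b, count) rows into matrix[name_a][name_b].
matrix = {}
for name_a, name_b, count_match in pairs:
    matrix.setdefault(name_a, {})[name_b] = count_match

print(matrix["BROWN"])
```

Doing the final reshaping outside SQL sidesteps the dynamic-column problem, since the set of last names does not have to be known when the query is written.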

SQL Queries (Difference between tables)

I'm trying to find a difference between two tables. The tables are
Sample Data
PERSON_PHOTO
ID USERID FNAME
801 uid01 Geroge
801 uid05 George
803 uid01 George
901 uid01 Alice
201 uid01 Alice
330 uid01 Alice
802 uid05 Alice
803 uid05 Alice
804 uid05 Alice
901 uid05 Alice
701 uid05 Alice
201 uid05 Alice
101 uid05 Alice
330 uid05 Alice
501 uid05 Alice
501 uid12 Jane
330 uid12 Jane
101 uid12 Jane
201 uid12 Jane
701 uid12 Jane
801 uid12 Jane
901 uid12 Jane
101 uid07 Mary
101 uid03 Mary
201 uid03 Mary
801 uid03 Mary
901 uid03 Mary
201 uid15 Tom
801 uid15 Tom
Table VALID_FRIEND
FNAME USERID
Bill uid02
George uid01
Mary uid07
Jane uid12
Tom uid15
Alice uid05
Mary uid03
SAMPLE OUTPUT
USERID PHOTOS NOT IN
uid02 0
uid01 5
uid07 9
uid12 3
uid15 8
uid05 8
uid03 6
The query I'm trying to write should find the number of photos that each person is not in, output as the USERID and the number of photos that user is not currently in. I know I need to find the count of distinct PID values in PERSON_PHOTO and subtract the count of photos for each USERID. Thanks for any help.