Showing one row for each calendar week in SQL

I have a SQL query which pulls unit sales by item, by week:
SELECT sls_vendor,
sls_item,
sls_units,
DATEPART(week, sls_week) AS sls_date
FROM mytable
Assume I'm looking at an 8-week period, but not every item/vendor combination has a full 8 weeks of sales. I still need the query to show a null value for the missing weeks, so that it returns 8 rows for each item/vendor combination regardless of whether sales exist.
I tried creating a temp table which has the numbers 28 to 35 and performing a left join on the query above, but that doesn't show null values. The results are no different than running the original query alone.
I can think of how this would be done using a crosstab/pivot query, but isn't this something the join should be doing?
Edit: Updated to show my join query. Datetable just has 8 rows with 1 incremental number for each calendar week.
SELECT * FROM #datetable
LEFT JOIN
(SELECT
sls_vendor,
sls_item,
sls_units,
datepart(week,sls_week) AS sls_date
FROM mytable) AS QRY
ON temp_week = qry.sls_date

Your method should work just fine:
;with mytable as (
select 1 as sls_vendor, 'Test' as sls_item, 30 as sls_units, '8/7/2011' as sls_week union
select 1 as sls_vendor, 'Test' as sls_item, 30 as sls_units, '8/14/2011' as sls_week union
select 1 as sls_vendor, 'Test' as sls_item, 30 as sls_units, '8/21/2011' as sls_week
)
,datetable as (
select 28 as temp_week union
select 29 union
select 30 union
select 31 union
select 32 union
select 33 union
select 34 union
select 35
)
SELECT * FROM datetable
LEFT JOIN
(SELECT
sls_vendor,
sls_item,
sls_units,
datepart(week,sls_week) AS sls_date
FROM mytable) AS QRY
ON temp_week=qry.sls_date
Output:
temp_week sls_vendor sls_item sls_units sls_date
28 NULL NULL NULL NULL
29 NULL NULL NULL NULL
30 NULL NULL NULL NULL
31 NULL NULL NULL NULL
32 NULL NULL NULL NULL
33 1 Test 30 33
34 1 Test 30 34
35 1 Test 30 35
Edit: If you want to include all week values for every sales vendor, include a cross join with the distinct selection of vendors:
;with mytable as (
select 1 as sls_vendor, 'Test' as sls_item, 30 as sls_units, '8/7/2011' as sls_week union
select 2 as sls_vendor, 'Test' as sls_item, 30 as sls_units, '8/14/2011' as sls_week union
select 3 as sls_vendor, 'Test' as sls_item, 30 as sls_units, '8/21/2011' as sls_week
)
,datetable as (
select 28 as temp_week union
select 29 union
select 30 union
select 31 union
select 32 union
select 33 union
select 34 union
select 35
)
SELECT * FROM datetable
cross join (select distinct sls_vendor from mytable) v
LEFT JOIN
(SELECT
sls_vendor,
sls_item,
sls_units,
datepart(week,sls_week) AS sls_date
FROM mytable) AS QRY
ON temp_week=qry.sls_date and v.sls_vendor=qry.sls_vendor
Output:
temp_week sls_vendor sls_vendor sls_item sls_units sls_date
28 1 NULL NULL NULL NULL
29 1 NULL NULL NULL NULL
30 1 NULL NULL NULL NULL
31 1 NULL NULL NULL NULL
32 1 NULL NULL NULL NULL
33 1 1 Test 30 33
34 1 NULL NULL NULL NULL
35 1 NULL NULL NULL NULL
28 2 NULL NULL NULL NULL
29 2 NULL NULL NULL NULL
30 2 NULL NULL NULL NULL
31 2 NULL NULL NULL NULL
32 2 NULL NULL NULL NULL
33 2 NULL NULL NULL NULL
34 2 2 Test 30 34
35 2 NULL NULL NULL NULL
28 3 NULL NULL NULL NULL
29 3 NULL NULL NULL NULL
30 3 NULL NULL NULL NULL
31 3 NULL NULL NULL NULL
32 3 NULL NULL NULL NULL
33 3 NULL NULL NULL NULL
34 3 NULL NULL NULL NULL
35 3 3 Test 30 35
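The question asks for every item/vendor combination rather than every vendor, so the cross join can be taken over distinct (vendor, item) pairs instead. Below is a minimal sketch of that variation, run through Python's sqlite3 module so it is self-contained. The table names `sales` and `weeks` and the precomputed `sls_week_no` column are assumptions for illustration only; SQLite has no DATEPART, so the week number is stored directly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for mytable and #datetable
    CREATE TABLE sales (sls_vendor INT, sls_item TEXT, sls_units INT, sls_week_no INT);
    INSERT INTO sales VALUES (1, 'Test', 30, 33), (2, 'Test', 30, 34), (2, 'Other', 5, 34);
    CREATE TABLE weeks (temp_week INT);
    INSERT INTO weeks VALUES (28), (29), (30), (31), (32), (33), (34), (35);
""")

# Build the full grid of weeks x (vendor, item), then attach sales where they exist.
rows = conn.execute("""
    SELECT w.temp_week, c.sls_vendor, c.sls_item, s.sls_units
    FROM weeks w
    CROSS JOIN (SELECT DISTINCT sls_vendor, sls_item FROM sales) c
    LEFT JOIN sales s
           ON s.sls_week_no = w.temp_week
          AND s.sls_vendor  = c.sls_vendor
          AND s.sls_item    = c.sls_item
    ORDER BY c.sls_vendor, c.sls_item, w.temp_week
""").fetchall()

for row in rows:
    print(row)
```

Each of the three combinations gets eight rows, with sls_units left NULL (None in Python) for weeks without sales.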

Does it work for you?
SELECT sls_vendor,
sls_item,
sls_units,
DATEPART(WEEK, sls_week) AS sls_date
FROM (
SELECT VALUE = 28 UNION ALL
SELECT VALUE = 29 UNION ALL
SELECT VALUE = 30 UNION ALL
SELECT VALUE = 31 UNION ALL
SELECT VALUE = 32 UNION ALL
SELECT VALUE = 33 UNION ALL
SELECT VALUE = 34 UNION ALL
SELECT VALUE = 35
) dates
LEFT JOIN mytable m
ON dates.value = DATEPART(WEEK, m.sls_week)

The following query works on Data.StackExchange. It gets the top post per week by score.
WITH weeksyears
AS (SELECT w.NUMBER AS week,
y.NUMBER AS year
FROM (SELECT v.NUMBER
FROM MASTER..spt_values v
WHERE TYPE = 'P'
AND v.NUMBER BETWEEN 1 AND 52) w,
(SELECT v.NUMBER
FROM MASTER..spt_values v
WHERE TYPE = 'P'
AND v.NUMBER BETWEEN 2008 AND 2012) y),
topPostPerWeek
AS (SELECT score,
Datepart(week, creationdate) week,
Datepart(YEAR, creationdate) YEAR,
Row_number() OVER (PARTITION BY Datepart(wk, creationdate),
Datepart(
YEAR,
creationdate) ORDER BY score DESC) row
FROM posts)
SELECT *
FROM weeksyears wy
LEFT JOIN topPostPerWeek t
ON wy.week = t.week
AND wy.YEAR = t.YEAR
WHERE row = 1
OR row IS NULL
ORDER BY wy.YEAR, wy.WEEK
Every row before week 38 of 2008 is empty except for week and year, as are the rows after week 35 of 2011.
However, if you edit the query and remove OR row IS NULL, it behaves just like an INNER JOIN.
My guess is that something in your WHERE clause refers to the "RIGHT" table. Just add OR [rightTable.field] IS NULL and you'll be fine.
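To see the effect in isolation, here is a small sketch using Python's sqlite3 module. The `weeks` and `sales` tables and their columns are made up for the demonstration. It compares three placements of a filter on the right-hand table: in WHERE, in WHERE with OR ... IS NULL, and in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical demo tables
    CREATE TABLE weeks (wk INT);
    INSERT INTO weeks VALUES (33), (34), (35);
    CREATE TABLE sales (wk INT, vendor INT, units INT);
    INSERT INTO sales VALUES (33, 1, 30), (34, 2, 10);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Filter on the right table in WHERE: unmatched weeks are discarded.
in_where = run("""
    SELECT w.wk, s.units FROM weeks w
    LEFT JOIN sales s ON s.wk = w.wk
    WHERE s.vendor = 1
    ORDER BY w.wk
""")

# Adding OR ... IS NULL keeps weeks that matched nothing at all.
or_null = run("""
    SELECT w.wk, s.units FROM weeks w
    LEFT JOIN sales s ON s.wk = w.wk
    WHERE s.vendor = 1 OR s.vendor IS NULL
    ORDER BY w.wk
""")

# Moving the filter into ON keeps every week.
in_on = run("""
    SELECT w.wk, s.units FROM weeks w
    LEFT JOIN sales s ON s.wk = w.wk AND s.vendor = 1
    ORDER BY w.wk
""")

print(in_where)
print(or_null)
print(in_on)
```

In this toy data, week 34 still drops out of the OR ... IS NULL version because it matched a row for a different vendor. If every week has to appear, putting the condition in the ON clause may be the safer choice.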

Related

SQL Permutation of Columns

I have a purchases table that looks like this:
store_id. industry_code amt_age_18_24 amt_age_25-34 amt_men amt_women
1 1000 100 20 80 40
2 2000 100 100 130 70
What I'm trying to do is find every permutation of purchases by age and gender for each store. Something like this, where each row is unique:
store_id. industry_code amt_age_18_24 amt_age_25-34 amt_men amt_women
1 1000 100 NULL 80 NULL
1 1000 100 NULL NULL 40
1 1000 NULL 20 80 NULL
1 1000 NULL 20 NULL 40
2 2000 100 NULL 130 NULL
2 2000 100 NULL NULL 70
2 2000 NULL 100 130 NULL
2 2000 NULL 100 NULL 70
What's the best way to do this? A self join?
This looks like union all:
select store_id, industry_code, amt_age_18_24, null as amt_age_25_34, amt_men, null as amt_women
from t
union all
select store_id, industry_code, amt_age_18_24, null as amt_age_25_34, null as amt_men, amt_women
from t
union all
. . .
Here is an approach using a cross join with a derived table that contains "column masks":
select
t.store_id,
t.industry_code,
t.amt_age_18_24 * x.amt_age_18_24 as amt_age_18_24,
t.amt_age_25_34 * x.amt_age_25_34 as amt_age_25_34,
t.amt_men * x.amt_men as amt_men,
t.amt_women * x.amt_women as amt_women
from mytable t
cross join (
select 1 as amt_age_18_24, null as amt_age_25_34, 1 as amt_men, null as amt_women
union all select 1, null, null, 1
union all select null, 1, 1, null
union all select null, 1, null, 1
) x
The upside is that this does not require scanning the table multiple times, as opposed to the union all approach.
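For anyone who wants to try the mask idea quickly, here is a sketch run through Python's sqlite3 module with the two sample rows from the question. The table name `purchases` and the mask column names `m_18_24`, `m_25_34`, `m_men` and `m_women` are placeholders chosen for this example. It relies on the fact that multiplying by NULL yields NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical table holding the question's sample data
    CREATE TABLE purchases (
        store_id INT, industry_code INT,
        amt_age_18_24 INT, amt_age_25_34 INT, amt_men INT, amt_women INT
    );
    INSERT INTO purchases VALUES (1, 1000, 100, 20, 80, 40), (2, 2000, 100, 100, 130, 70);
""")

# Each mask row keeps one age column and one gender column (1) and blanks the rest (NULL).
rows = conn.execute("""
    SELECT t.store_id, t.industry_code,
           t.amt_age_18_24 * x.m_18_24 AS amt_age_18_24,
           t.amt_age_25_34 * x.m_25_34 AS amt_age_25_34,
           t.amt_men       * x.m_men   AS amt_men,
           t.amt_women     * x.m_women AS amt_women
    FROM purchases t
    CROSS JOIN (
        SELECT 1 AS m_18_24, NULL AS m_25_34, 1 AS m_men, NULL AS m_women
        UNION ALL SELECT 1, NULL, NULL, 1
        UNION ALL SELECT NULL, 1, 1, NULL
        UNION ALL SELECT NULL, 1, NULL, 1
    ) x
""").fetchall()

for row in sorted(rows, key=repr):
    print(row)
```

Each store produces four rows, one per age/gender pairing.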
You can use union for each permutation as you wish:
select store_id, industry_code, amt_age_18_24, null as amt_age_25_34, amt_men, null as amt_women
from t
union all
select store_id, industry_code, amt_age_18_24, null as amt_age_25_34, null as amt_men, amt_women
from t
Repeat this for as many column combinations as you need.

SQL - spread previous values from one column into multiple new columns

I have a SQL table of Customer_ID, showing Payments by Year. The first (of many) customer appears like this:
ID Payment Year
112 0 2004
112 0 2005
112 0 2006
112 9592 2007
112 12332 2008
112 9234 2011
112 5400 2012
112 7392 2014
112 8321 2015
Note that some years are missing. I need to create 10 new columns, showing the Payments in the previous 10 years, for each row. The resulting table should look like this:
ID Payment Year T-1 T-2 T-3 T-4 T-5 T-6 T-7 T-8 T-9 T-10
112 0 2004 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
112 0 2005 0 NULL NULL NULL NULL NULL NULL NULL NULL NULL
112 0 2006 0 0 NULL NULL NULL NULL NULL NULL NULL NULL
112 9592 2007 0 0 0 NULL NULL NULL NULL NULL NULL NULL
112 12332 2008 9592 0 0 0 NULL NULL NULL NULL NULL NULL
112 9234 2011 NULL NULL 12332 9592 0 0 0 NULL NULL NULL
112 5400 2012 9234 NULL NULL 12332 9592 0 0 0 NULL NULL
112 7392 2014 NULL 5400 9234 NULL NULL 12332 9592 0 0 0
112 8321 2015 7392 NULL 5400 9234 NULL NULL 12332 9592 0 0
I am well aware that this is a large duplication of data, and so seems like a strange thing to do. However, I would still like to do it! (the data is being prepared for a predictive model, in which previous payments (and other info) will be used to predict the current year's payment)
I'm not really sure where to start with this. I have been looking at using pivot, but can't figure out how to get it to select values from a customer's previous year.
I would very much like to do this in SQL. If that is not possible I may be able to copy the table into R - but SQL is my preference.
Any help much appreciated.
You could use lag() if you had full data:
select t.*,
lag(payment, 1) over (partition by id order by year) as t_1,
lag(payment, 2) over (partition by id order by year) as t_2,
. . .
from t;
However, for your situation with missing intermediate years, left join may be simpler:
select t.*,
t1.payment as t_1,
t2.payment as t_2,
. . .
from t left join
t t1
on t1.id = t.id and
t1.year = t.year - 1 left join
t t2
on t2.id = t.id and
t2.year = t.year - 2 left join
. . .;
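Here is a runnable sketch of the self-join idea, using Python's sqlite3 module and a subset of the question's rows. The table name `payments` is an assumption, and only three offsets are shown. The point is that the 2011 row gets NULL for T-1 and T-2 because 2010 and 2009 are missing, which is exactly what the desired output asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical table with a gap between 2008 and 2011
    CREATE TABLE payments (id INT, payment INT, year INT);
    INSERT INTO payments VALUES
        (112, 0, 2006), (112, 9592, 2007), (112, 12332, 2008),
        (112, 9234, 2011), (112, 5400, 2012);
""")

# One self-join per offset; each alias is matched on its own id and year.
rows = conn.execute("""
    SELECT t.year, t.payment,
           t1.payment AS t_1,
           t2.payment AS t_2,
           t3.payment AS t_3
    FROM payments t
    LEFT JOIN payments t1 ON t1.id = t.id AND t1.year = t.year - 1
    LEFT JOIN payments t2 ON t2.id = t.id AND t2.year = t.year - 2
    LEFT JOIN payments t3 ON t3.id = t.id AND t3.year = t.year - 3
    ORDER BY t.year
""").fetchall()

for row in rows:
    print(row)
```

A plain lag(payment, 1) ordered by year would instead hand the 2008 payment to the 2011 row, since it counts rows rather than calendar years.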
I think your friend will be LAG
Here's an implementation:
Create Table #t (
ID int,
Payment int,
Yr int
)
Insert Into #t Values(112,0,2004)
Insert Into #t Values(112,0,2005)
Insert Into #t Values(112,0,2006)
Insert Into #t Values(112,9592,2007)
Insert Into #t Values(112,12332,2008)
Insert Into #t Values(112,9234,2011)
Insert Into #t Values(112,5400,2012)
Insert Into #t Values(112,7392,2014)
Insert Into #t Values(112,8321,2015)
Insert Into #t Values(113,0,2009)
Insert Into #t Values(113,9234,2011)
Insert Into #t Values(113,5400,2013)
Insert Into #t Values(113,8321,2015)
;with E1(n) as (Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1)
,E2(n) as (Select 1 From E1 a, E1 b)
,E4(n) as (Select 1 From E2 a, E2 b)
,E5(n) as (Select row_number() over(order by isnull(null,1)) From E4 a, E1 b)
,IDYears as (
Select z.ID, Yr = y.n
From (
Select
Id,
MinYear = min(Yr),
MaxYear = max(Yr)
From #t a
Group By Id
) z
Inner Join E5 y On y.n between z.MinYear and z.MaxYear
)
Select
*,
[t-1] = Lag(B.Payment, 1) Over(Partition By a.ID Order By a.Yr),
[t-2] = Lag(B.Payment, 2) Over(Partition By a.ID Order By a.Yr),
[t-3] = Lag(B.Payment, 3) Over(Partition By a.ID Order By a.Yr),
[t-4] = Lag(B.Payment, 4) Over(Partition By a.ID Order By a.Yr),
[t-5] = Lag(B.Payment, 5) Over(Partition By a.ID Order By a.Yr),
[t-6] = Lag(B.Payment, 6) Over(Partition By a.ID Order By a.Yr),
[t-7] = Lag(B.Payment, 7) Over(Partition By a.ID Order By a.Yr),
[t-8] = Lag(B.Payment, 8) Over(Partition By a.ID Order By a.Yr),
[t-9] = Lag(B.Payment, 9) Over(Partition By a.ID Order By a.Yr),
[t-10] = Lag(B.Payment, 10) Over(Partition By a.ID Order By a.Yr)
From IDYears a
Left Join #t b On a.ID = b.ID and a.Yr = b.Yr
Order By A.ID

SQL join multiple tables with/without data

I have no idea how to create an SQL statement to join 4 tables.
1) The 'Vendor' table will always match entries from each table on Vendor #
2) Each of the remaining 3 will match to each other by Vendor # & Seq #
3) Any combination of the 3 can have data (or not)
4) I don't want to select from the Vendor table unless I get a hit on at least one of the 3
VENDOR
Vendor # Name
-------- ----
1 Tom Smith
2 Bruce Lee
3 Seamus O’Leary
4 Jonathan Stewart
5 Benjamin Franklin
Month Range Selected
Vendor # Seq # MonthFrom MonthTo
-------- ----- --------- -------
1 1 3 6
1 2 7 9
3 2 5 6
Week Selected
Vendor # Seq # Week #
-------- ----- ------
1 1 3
3 1 4
4 1 1
Day Selected
Vendor # Seq # Day #
1 1 15
1 2 25
2 1 12
4 1 05
5 1 19
Desired Table (Joined)
Vendor# Name Seq# MonthFrom MonthTo Week# Day#
1 Tom Smith 1 3 6 3 15
1 Tom Smith 2 7 9 NULL 25
2 Bruce Lee 1 NULL NULL NULL 12
3 Seamus O’Leary 1 NULL NULL 4 NULL
3 Seamus O’Leary 2 5 6 NULL NULL
4 Jonathan Stewart 1 NULL NULL 1 05
5 Benjamin Franklin 1 NULL NULL NULL 19
The trick being that any of the 3 (not including 'Vendor') can or cannot have data and I only want a row returned if there is something from one or more of the 3.
Any Advice?
To join on Vendor and Seq, we first need all possible combinations. Then we can filter the tables based on these combinations. I ran the following in SQL Server:
Setup
create table #Vendors (id int, name varchar(20));
create table #MonthRangeSelected (vendor int, seq int null, monthFrom int null, monthTo int null);
create table #WeekSelected (vendor int, seq int null, week int null);
create table #DaySelected (vendor int, seq int null, day int null);
insert into #Vendors
select 1, 'Tom Smith'
union all
select 2, 'Bruce Lee'
union all
select 3, 'Seamus O’Leary'
union all
select 4, 'Jonathan Stewart'
union all
select 5, 'Benjamin Franklin';
insert into #MonthRangeSelected
select 1, 1, 3, 6
union all
select 1, 2, 7, 9
union all
select 3, 2, 5, 6;
insert into #WeekSelected
select 1, 1, 3
union all
select 3, 1, 4
union all
select 4, 1, 1;
insert into #DaySelected
select 1, 1, 15
union all
select 1, 2, 25
union all
select 2, 1, 12
union all
select 4, 1, 05
union all
select 5, 1, 19;
Query
select v.Id, v.name, combinations.seq, MonthFrom, MonthTo, Week, Day
from #Vendors v
inner join (select m.vendor, m.seq
from #MonthRangeSelected m
union
select w.vendor, w.seq
from #WeekSelected w
union
select d.vendor, d.seq
from #DaySelected d) combinations
on combinations.vendor = v.id
left join #MonthRangeSelected m
on m.Vendor = combinations.vendor
and m.seq = combinations.seq
left join #WeekSelected w
on w.Vendor = combinations.vendor
and w.seq = combinations.seq
left join #DaySelected d
on d.Vendor = combinations.vendor
and d.seq = combinations.seq
where (MonthFrom is not null
or MonthTo is not null
or Week is not null
or Day is not null)
And this is the result:
Id name seq MonthFrom MonthTo Week Day
1 Tom Smith 1 3 6 3 15
1 Tom Smith 2 7 9 NULL 25
2 Bruce Lee 1 NULL NULL NULL 12
3 Seamus O’Leary 1 NULL NULL 4 NULL
3 Seamus O’Leary 2 5 6 NULL NULL
4 Jonathan Stewart 1 NULL NULL 1 5
5 Benjamin Franklin 1 NULL NULL NULL 19
This is more complicated than it sounds. According to the result, you do not want a cartesian product when there are multiple matches in a table. So, you need to take seqnum into account.
select v.Vendor, v.name, coalesce(m.seq, w.seq, d.seq) as Seq,
m.MonthFrom, m.MonthTo, w.Week, d.Day
from Vendors v left join
MonthRangeSelected m
on v.Vendor = m.Vendor full join
WeekSelected w
on v.Vendor = w.Vendor and m.seq = w.seq full join
DaySelected d
on v.Vendor = d.Vendor and d.seq in (w.seq, m.seq)
where m.Vendor is not null or
w.Vendor is not null or
d.Vendor is not null;
Strange things can happen when using full join, particularly if you want any filtering. An alternative approach uses union all and group by:
select mwd.Vendor, v.name, mwd.seq,
max(MonthFrom) as MonthFrom, max(MonthTo) as monthTo,
max(Week) as week, max(Day) as day
from ((select m.Vendor, m.seq, m.MonthFrom, m.MonthTo, NULL as week, NULL as day
from month m
) union all
(select w.Vendor, w.seq, NULL as MonthFrom, NULL as MonthTo, w.week, NULL as day
from week w
) union all
(select d.Vendor, d.seq, NULL as MonthFrom, NULL as MonthTo, NULL as week, d.day
from day d
)
) mwd join
Vendor v
on v.vendor = mwd.vendor
group by mwd.Vendor, v.name, mwd.seq;
Note that this version needs the Vendor table only to look up the name; the Vendor and Seq values come entirely from the three detail tables.
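As a sanity check, the union all plus group by approach can be run against the question's sample data. The sketch below uses Python's sqlite3 module. The table names `vendor`, `month_sel`, `week_sel` and `day_sel` and the snake_case column names are placeholders picked for this example, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema mirroring the four tables in the question
    CREATE TABLE vendor    (vendor INT, name TEXT);
    CREATE TABLE month_sel (vendor INT, seq INT, month_from INT, month_to INT);
    CREATE TABLE week_sel  (vendor INT, seq INT, week_no INT);
    CREATE TABLE day_sel   (vendor INT, seq INT, day_no INT);
""")
conn.executemany("INSERT INTO vendor VALUES (?, ?)", [
    (1, "Tom Smith"), (2, "Bruce Lee"), (3, "Seamus O’Leary"),
    (4, "Jonathan Stewart"), (5, "Benjamin Franklin"),
])
conn.executemany("INSERT INTO month_sel VALUES (?, ?, ?, ?)",
                 [(1, 1, 3, 6), (1, 2, 7, 9), (3, 2, 5, 6)])
conn.executemany("INSERT INTO week_sel VALUES (?, ?, ?)",
                 [(1, 1, 3), (3, 1, 4), (4, 1, 1)])
conn.executemany("INSERT INTO day_sel VALUES (?, ?, ?)",
                 [(1, 1, 15), (1, 2, 25), (2, 1, 12), (4, 1, 5), (5, 1, 19)])

# Stack the three tables into one shape, then collapse each (vendor, seq) with MAX.
rows = conn.execute("""
    SELECT mwd.vendor, v.name, mwd.seq,
           MAX(month_from), MAX(month_to), MAX(week_no), MAX(day_no)
    FROM (
        SELECT vendor, seq, month_from, month_to, NULL AS week_no, NULL AS day_no FROM month_sel
        UNION ALL
        SELECT vendor, seq, NULL, NULL, week_no, NULL FROM week_sel
        UNION ALL
        SELECT vendor, seq, NULL, NULL, NULL, day_no FROM day_sel
    ) mwd
    JOIN vendor v ON v.vendor = mwd.vendor
    GROUP BY mwd.vendor, v.name, mwd.seq
    ORDER BY mwd.vendor, mwd.seq
""").fetchall()

for row in rows:
    print(row)
```

MAX ignores NULLs, so each group keeps whichever table supplied a value and stays NULL where none did. This assumes at most one row per (vendor, seq) in each detail table, as in the sample data.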
You should left outer join to each of the 3 tables, and then include the following in your where clause:
(MonthFrom is not null or Week# is not null or Day# is not null)
It sounds like you can inner join to the Vendor table.
I believe that this will do what you need:
SELECT
V.[Vendor#], -- I'll never understand why people insist on using names that require brackets
V.Name,
COALESCE(M.[Seq#], W.[Seq#], D.[Seq#]) AS [Seq#],
M.MonthFrom,
M.MonthTo,
W.[Week#],
D.[Day#]
FROM
Vendor V
LEFT OUTER JOIN MonthRange M ON M.[Vendor#] = V.[Vendor#]
LEFT OUTER JOIN Week W ON W.[Vendor#] = V.[Vendor#]
LEFT OUTER JOIN Day D ON D.[Vendor#] = V.[Vendor#]
WHERE
(
M.[Vendor#] IS NOT NULL OR
W.[Vendor#] IS NOT NULL OR
D.[Vendor#] IS NOT NULL
) AND
(M.[Seq#] = W.[Seq#] OR M.[Seq#] IS NULL OR W.[Seq#] IS NULL) AND
(M.[Seq#] = D.[Seq#] OR M.[Seq#] IS NULL OR D.[Seq#] IS NULL) AND
(D.[Seq#] = W.[Seq#] OR D.[Seq#] IS NULL OR W.[Seq#] IS NULL)
You need full outer joins on the three tables, so as to get all vendor and seq number combinations. Join these with vendor and you are done:
select vendorno, v.name, x.seqno, x.monthfrom, x.monthto, x.weekno, x.dayno
from vendor v
join
(
select vendorno, seqno, m.monthfrom, m.monthto, w.weekno, d.dayno
from monthsel m
full outer join weeksel w using (vendorno, seqno)
full outer join daysel d using (vendorno, seqno)
) x using(vendorno)
order by vendorno, x.seqno;
UPDATE: Without a USING clause the same query gets slightly less readable (and thus slightly more error-prone):
select v.vendorno, v.name, x.seqno, x.monthfrom, x.monthto, x.weekno, x.dayno
from vendor v
join
(
select
coalesce(m.vendorno, w.vendorno, d.vendorno) as vendorno,
coalesce(m.seqno, w.seqno, d.seqno) as seqno,
m.monthfrom, m.monthto, w.weekno, d.dayno
from monthsel m
full outer join weeksel w on w.vendorno = m.vendorno and w.seqno = m.seqno
full outer join daysel d on d.vendorno in (m.vendorno, w.vendorno)
and d.seqno in (m.seqno, w.seqno)
) x on x.vendorno = v.vendorno
order by v.vendorno, x.seqno;
(Hope I didn't mix things up here. It's easy to make copy & paste errors with such a query. So if it doesn't work properly, look out for typos.)

SQL Server 2008 Group Based on a Sequence

I'm struggling to work out whether it is possible to use SQL Server 2008 to assign a sequence without having to use cursors. Let's say I have the following table which defines a driver's driving route going from one location to another (null means he is going from home):
RouteID SourceLocationID DestinationLocationID DriverID Created Updated
------- ---------------- --------------------- -------- ------- -------
1 NULL 219 1 10:20 10:23
2 219 266 1 10:21 10:24
3 266 NULL 1 10:22 10:25
4 NULL 54 2 10:23 10:26
5 54 NULL 2 10:24 10:27
6 NULL 300 1 10:25 10:28
7 300 NULL 1 10:26 10:29
I want to group the records between the rows where sourceLID is NULL and the destinationLID is null, so I get the following (generating a sequence number for each grouping set):
DriverID DestinationLocationID TripNumber
-------- --------------------- ----------
1 219 1 (his first trip)
1 266 1
1 300 2 (his second trip)
2 54 1
Is there a way I could use GROUP BY here rather than cursors?
a quick try:
with cte as
( select DestinationLocationID
, DriverID
, tripid = row_number()
over ( partition by driverid
order by DestinationLocationID)
from table1
where sourcelocationid is NULL
UNION ALL
select table1.DestinationLocationID
, table1.DriverID
, cte.tripid
from table1
join cte on table1.SourceLocationID=cte.DestinationLocationID
and table1.DriverID=cte.DriverID
where cte.DestinationLocationID is not null
)
select * from cte
Try this:
select driverid, destinationlocationid, count(destinationlocationid) from
(
select driverid, destinationlocationid from table1 where sourcelocationid is NULL
union all
select driverid, sourcelocationid from table1 where destinationlocationid is NULL
)A group by driverid, destinationlocationid
Try this,
Create Table #t (RouteID int, SourceLocationID int,DestinationLocationID int
,DriverID int,Created time, Updated time)
insert into #t
values(1, NULL, 219, 1, '10:20','10:23'),
(2 ,219,266, 1, '10:21','10:24'),
(3,266, NULL, 1, '10:22','10:25'),
(4, NULL, 54, 2, '10:23','10:26'),
(5,54, NULL, 2, '10:24','10:27'),
(6,NULL,300, 1, '10:25','10:28'),
(7,300,NULL, 1, '10:26','10:29')
;
WITH CTE
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY DriverID ORDER BY Created
) RN
FROM #t
)
,CTE1
AS (
SELECT *
,1 TripNumber
FROM CTE
WHERE RN = 1
UNION ALL
SELECT A.*
,CASE
WHEN A.SourceLocationID IS NULL
THEN B.TripNumber + 1
ELSE B.TripNumber
END
FROM CTE1 B
INNER JOIN CTE A ON B.DriverID = A.DriverID
WHERE A.RN > B.RN
)
SELECT DISTINCT DestinationLocationID
,DriverID
,TripNumber
FROM CTE1
WHERE DestinationLocationID IS NOT NULL
ORDER BY DriverID
Use a correlated sub-query to count previous trips, plus 1 to get this trip number.
select DriverID,
DestinationLocationID,
(select count(*) + 1
from routes t2
where t1.DriverID = t2.DriverID
and t1.RouteID > t2.RouteID
and DestinationLocationID IS NULL) as TripNumber
from routes t1
where DestinationLocationID IS NOT NULL
order by DriverID, DestinationLocationID;
Executes like this:
SQL>select DriverID,
SQL& DestinationLocationID,
SQL& (select count(*) + 1
SQL& from routes t2
SQL& where t1.DriverID = t2.DriverID
SQL& and t1.RouteID > t2.RouteID
SQL& and DestinationLocationID IS NULL) as TripNumber
SQL&from routes t1
SQL&where DestinationLocationID IS NOT NULL
SQL&order by DriverID, DestinationLocationID;
DriverID DestinationLocationID TripNumber
=========== ===================== ============
1 219 1
1 266 1
1 300 2
2 54 1
4 rows found

SQL Server Query to find CHI-SQUARE Values (Not Working)

I am trying to find the Chi-Square test from my following SQL Server Query on the sample data:
SELECT sessionnumber, sessioncount, timespent, expected, dev, dev*dev/expected as chi_square
FROM (SELECT clusters.sessionnumber, clusters.sessioncount, clusters.timespent,
(dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as expected,
clusters.cnt-(dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as dev
FROM clusters JOIN
(SELECT sessionnumber, SUM(cnt) as cnt FROM clusters
GROUP BY sessionnumber) dim1 ON clusters.sessionnumber = dim1.sessionnumber JOIN
(SELECT sessioncount, SUM(cnt) as cnt FROM clusters
GROUP BY sessioncount) dim2 ON clusters.sessioncount = dim2.sessioncount JOIN
(SELECT timespent, SUM(cnt) as cnt FROM clusters
GROUP BY timespent) dim3 ON clusters.timespent = dim3.timespent CROSS JOIN
(SELECT SUM(cnt) as cnt FROM clusters) dimall) a
My table has this sort of sample data:
sessionnumber sessioncount timespent cnt
1 17 28 NULL
2 22 8 NULL
3 1 1 NULL
4 1 1 NULL
5 8 111 NULL
6 8 65 NULL
7 11 5 NULL
8 1 1 NULL
9 62 64 NULL
10 6 42 NULL
The problem is that this query runs without error but produces no useful output. The output it gives me looks like this:
sessionnumber sessioncount timespent expected dev chi_square
1 17 28 NULL NULL NULL
2 22 8 NULL NULL NULL
3 1 1 NULL NULL NULL
4 1 1 NULL NULL NULL
5 8 111 NULL NULL NULL
6 8 65 NULL NULL NULL
7 11 5 NULL NULL NULL
8 1 1 NULL NULL NULL
9 62 64 NULL NULL NULL
10 6 42 NULL NULL NULL
How can I get rid of this problem? I have tried everything I can think of. Thanks in advance for telling me what I'm doing wrong!
In your sample data, cnt is NULL, so the results are also NULL. You can replace these NULL values with a default value (1, for example; I don't know the context) using ISNULL, like this:
SELECT sessionnumber, SUM(ISNULL(cnt, 1)) as cnt FROM clusters GROUP BY sessionnumber
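The NULL propagation is easy to reproduce in miniature. The sketch below uses Python's sqlite3 module with a two-row stand-in for the `clusters` table. COALESCE is used in place of ISNULL because ISNULL is specific to SQL Server, while COALESCE is standard SQL and works in both. The default of 1 is only an assumption, as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cut-down stand-in for the clusters table, with cnt left NULL
    CREATE TABLE clusters (sessionnumber INT, cnt INT);
    INSERT INTO clusters VALUES (1, NULL), (2, NULL);
""")

# SUM over only NULL values is NULL, not 0.
raw_sum = conn.execute("SELECT SUM(cnt) FROM clusters").fetchone()[0]

# Substituting a default before aggregating gives a usable number.
defaulted_sum = conn.execute(
    "SELECT SUM(COALESCE(cnt, 1)) FROM clusters"
).fetchone()[0]

# Any arithmetic that touches NULL is NULL as well, which is why
# expected, dev and chi_square all come back NULL.
arith = conn.execute("SELECT 5 - NULL, NULL * NULL / 4").fetchone()

print(raw_sum, defaulted_sum, arith)
```

Whether defaulting is statistically meaningful depends on what cnt is supposed to hold. If it should be a real count, populating the column is probably the better fix.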