SQL Server Query to find CHI-SQUARE Values (Not Working) - sql

I am trying to find the Chi-Square test from my following SQL Server Query on the sample data:
SELECT sessionnumber, sessioncount, timespent, expected, dev, dev*dev/expected as chi_square
FROM (SELECT clusters.sessionnumber, clusters.sessioncount, clusters.timespent,
(dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as expected,
clusters.cnt-(dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as dev
FROM clusters JOIN
(SELECT sessionnumber, SUM(cnt) as cnt FROM clusters
GROUP BY sessionnumber) dim1 ON clusters.sessionnumber = dim1.sessionnumber JOIN
(SELECT sessioncount, SUM(cnt) as cnt FROM clusters
GROUP BY sessioncount) dim2 ON clusters.sessioncount = dim2.sessioncount JOIN
(SELECT timespent, SUM(cnt) as cnt FROM clusters
GROUP BY timespent) dim3 ON clusters.timespent = dim3.timespent CROSS JOIN
(SELECT SUM(cnt) as cnt FROM clusters) dimall) a
My table has this sort of sample data:
sessionnumber sessioncount timespent cnt
1 17 28 NULL
2 22 8 NULL
3 1 1 NULL
4 1 1 NULL
5 8 111 NULL
6 8 65 NULL
7 11 5 NULL
8 1 1 NULL
9 62 64 NULL
10 6 42 NULL
The problem is that this query runs without errors but returns wrong output, or rather no useful output at all. The output it gives me looks like this:
sessionnumber sessioncount timespent expected dev chi_square
1 17 28 NULL NULL NULL
2 22 8 NULL NULL NULL
3 1 1 NULL NULL NULL
4 1 1 NULL NULL NULL
5 8 111 NULL NULL NULL
6 8 65 NULL NULL NULL
7 11 5 NULL NULL NULL
8 1 1 NULL NULL NULL
9 62 64 NULL NULL NULL
10 6 42 NULL NULL NULL
How can I get rid of this problem? I have tried everything I can think of. Thanks in advance for telling me what I'm doing wrong!

In your sample data, cnt is NULL in every row, so every result is NULL as well. You can replace these NULL values with a default value using ISNULL (1 for example; I don't know the context), like this:
SELECT sessionnumber, SUM(ISNULL(cnt, 1)) as cnt FROM clusters GROUP BY sessionnumber
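To see why everything comes out as NULL, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server (an assumption made only so the example is runnable). SQLite has no two-argument ISNULL, so COALESCE stands in for it, and the three-row clusters table is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clusters (sessionnumber INTEGER, cnt INTEGER)")
# Hypothetical rows mirroring the question: cnt is NULL everywhere.
conn.executemany("INSERT INTO clusters VALUES (?, ?)",
                 [(1, None), (2, None), (3, None)])

# SUM over a column that holds only NULLs yields NULL, not 0.
raw_total = conn.execute("SELECT SUM(cnt) FROM clusters").fetchone()[0]

# COALESCE (standing in for SQL Server's ISNULL) substitutes 1 per row.
filled_total = conn.execute(
    "SELECT SUM(COALESCE(cnt, 1)) FROM clusters").fetchone()[0]

# Any arithmetic with NULL is NULL, which is why expected, dev and
# chi_square all end up NULL once the sums are NULL.
null_arith = conn.execute("SELECT 5 - NULL").fetchone()[0]

print(raw_total, filled_total, null_arith)
```

Whether 1 is a sensible default depends on what cnt is supposed to hold; if it is meant to be a real count, populating the column is the better fix.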

Related

row_number() but only increment value after a specific value in a column

Query: SELECT (row_number() OVER ()) as grp, * from tbl
Edit: the rows below are returned by a pgrouting shortest path function and it does have a sequence.
seq grp id
1 1 8
2 2 3
3 3 2
4 4 null
5 5 324
6 6 82
7 7 89
8 8 null
9 9 1
10 10 2
11 11 90
12 12 null
How do I make it so that the grp column is only incremented after a null value on id - and also keep the same order of rows
seq grp id
1 1 8
2 1 3
3 1 2
4 1 null
5 2 324
6 2 82
7 2 89
8 2 null
9 3 1
10 3 2
11 3 90
12 3 null
Using a cumulative SUM aggregation is a possible approach:
SELECT
SUM( -- 2
CASE WHEN id IS NULL THEN 1 ELSE 0 END -- 1
) OVER (ORDER BY seq) as grp,
id
FROM mytable
1. If the current value (in seq order!) is NULL, map it to 1, else to 0. This gives you runs of zeros delimited by a 1 at each NULL record.
2. The cumulative SUM() window function adds these values up in order, so the running sum increases by one at each NULL record.
This yields:
0 8
0 3
0 2
1 null
1 324
1 82
1 89
2 null
2 1
2 2
2 90
3 null
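As you can see, each group starts at a NULL record, but you expect the NULL record to end its group.
This can be achieved by adding another window function: LAG(), which lets each row look at the id of the previous row (aliased next_id in the query). A new group then starts on the row after a NULL, and the first row, which has no previous row, starts group 1: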
SELECT
SUM(
CASE WHEN next_id IS NULL THEN 1 ELSE 0 END
) OVER (ORDER BY seq) as grp,
id
FROM (
SELECT
LAG(id) OVER (ORDER BY seq) as next_id,
seq,
id
FROM mytable
) s
The result is your expected one:
1 8
1 3
1 2
1 null
2 324
2 82
2 89
2 null
3 1
3 2
3 90
3 null
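For anyone who wants to check this without a PostgreSQL instance, the same LAG() plus cumulative SUM() idea can be run with Python's built-in sqlite3 module. This is only a sketch: it assumes an SQLite build with window functions (3.25 or later), and the alias prev_id is my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (seq INTEGER, id INTEGER)")
ids = [8, 3, 2, None, 324, 82, 89, None, 1, 2, 90, None]
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 list(enumerate(ids, start=1)))

query = """
SELECT SUM(CASE WHEN prev_id IS NULL THEN 1 ELSE 0 END)
           OVER (ORDER BY seq) AS grp,
       id
FROM (SELECT LAG(id) OVER (ORDER BY seq) AS prev_id, seq, id
      FROM mytable) s
ORDER BY seq
"""
# prev_id is NULL on the first row and on every row that follows a
# NULL id, so the running sum steps up exactly at those rows.
groups = [grp for grp, _ in conn.execute(query)]
print(groups)
```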

Repeat a string based on a column - BigQuery/SQL Standard

It should be very easy, but I am stuck on this.
It is as simple as the table below: I have a column named "category", and once a row has a category, I want this value repeated 'n' times, let's say 10 times (or whatever I want), in a new column.
I've tried to use FIRST_VALUE(), but there is no pattern to when a category will appear, so most of the time I get 'null' as the repetition.
I've seen ROW_NUMBER() OVER(PRECEDING AND FOLLOWING), but I can't use a string in this, just an aggregation, and I don't want to calculate, I want to classify. I even tried using CASE WHEN xxx category * 10 etc., or category + 10, but of course that doesn't work.
Any suggestions? Thanks!
What I've tried:
WITH table1 AS(
SELECT
date,
hour,
minute,
category,
ROW_NUMBER() OVER() AS rn
FROM table1),
table2 AS(
SELECT
*,
CASE
WHEN category IS NOT NULL THEN 1
ELSE 0
END AS flag_category
FROM table1)
SELECT
*,
CASE
WHEN flag_category = 1
THEN (SELECT
a.category,
FROM table2 AS a
INNER JOIN table2 AS b
ON a.rn = b.rn + 10)
ELSE '-'
END AS category_repetition
FROM table2
WHAT I HAVE                            WHAT I WANT
date hour minute qty category category_repetition
20210412 0 0 2 null null
20210412 0 1 0 null null
20210412 0 2 6 null null
20210412 0 3 7 null null
20210412 0 4 7 null null
20210412 0 5 6 null null
20210412 0 6 3 null null
20210412 0 7 8 null null
20210412 0 8 4 null null
20210412 0 9 3 category A category A
20210412 0 10 4 null category A
20210412 0 11 0 null category A
20210412 0 12 5 null category A
20210412 0 13 2 null category A
20210412 0 14 3 null category A
20210412 0 15 3 null category A
20210412 0 16 4 null category A
20210412 0 17 3 null category A
20210412 0 18 5 null category A
20210412 0 19 4 null category A
You seem to want last_value(ignore nulls):
select t.*,
last_value(category ignore nulls) over (order by date, hour, minute) as category_repetition
from t;
I'm not sure what the "10" means in the question. This should produce the data that you want, based on the sample data and results.
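If the "10" is meant as a cap on how many rows carry the category, one possible reading can be sketched as follows. This uses Python's built-in sqlite3 rather than BigQuery, and since SQLite has no IGNORE NULLS, a running COUNT(category) is used to group each labelled row with the rows after it. The function name, the parameter n and the reduced (minute, category) table are all assumptions for illustration.

```python
import sqlite3

def repeat_category(rows, n):
    """rows: (minute, category) pairs; n: how many rows, counting the
    labelled row itself, carry each category (hypothetical parameter)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (minute INTEGER, category TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT minute,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY grp ORDER BY minute) <= :n
                    THEN MAX(category) OVER (PARTITION BY grp)
               END AS category_repetition
        FROM (SELECT minute, category,
                     -- running count of non-NULL categories: every labelled
                     -- row opens a new group that its followers share
                     COUNT(category) OVER (ORDER BY minute) AS grp
              FROM t) AS g
        ORDER BY minute
    """
    return conn.execute(query, {"n": n}).fetchall()

# Minutes 0..19 with a single category at minute 9, as in the question.
sample = [(m, "category A" if m == 9 else None) for m in range(20)]
filled = repeat_category(sample, 11)
print(filled)
```

With n = 11 this reproduces the sample output (minutes 9 to 19 labelled); a smaller n stops the repetition earlier.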

SQL: subset data: select id when time_id for id satisfy a condition from another column

I have a data (dt) in SQL like the following:
ID time_id act rd
11 1 1 1
11 2 4 1
11 3 7 0
12 1 8 1
12 2 2 0
12 3 4 1
12 4 3 1
12 5 4 1
13 1 4 1
13 2 1 0
15 1 3 1
16 1 8 0
16 2 8 0
16 3 8 0
16 4 8 0
16 5 8 0
and I want to take the subset of this data such that only the IDs (with their corresponding time_id, act, rd) that have a row with time_id == 5 are retained. The desired output is the following:
ID time_id act rd
12 1 8 1
12 2 2 0
12 3 4 1
12 4 3 1
12 5 4 1
16 1 8 0
16 2 8 0
16 3 8 0
16 4 8 0
16 5 8 0
I know I should use a HAVING clause somehow, but I have not been successful so far (it returns empty output). Below is my attempt:
SELECT * FROM dt
GROUP BY ID
Having min(time_id) == 5;
This query:
select id from tablename where time_id = 5
returns all the ids that you want in the results.
Use it with the operator IN:
select *
from tablename
where id in (select id from tablename where time_id = 5)
You can use a correlated subquery with exists:
select t.*
from t
where exists (select 1 from t t2 where t2.id = t.id and t2.time_id = 5);
WITH temp AS
(
SELECT id FROM tab WHERE time_id = 5
)
SELECT * FROM tab t join temp tp on(t.id=tp.id);
check this query
select * from table t1 join (select distinct ID from table t where time_id = 5) t2 on t1.id =t2.id;
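A quick way to convince yourself that the IN and EXISTS variants return the same rows is to run them on the sample data. The sketch below uses Python's built-in sqlite3 as a stand-in engine (an assumption; the question does not name its database). It also includes a HAVING-based subquery, since the question asked about HAVING; that variant is my own addition, not taken from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dt (id INTEGER, time_id INTEGER, act INTEGER, rd INTEGER)")
conn.executemany("INSERT INTO dt VALUES (?, ?, ?, ?)", [
    (11, 1, 1, 1), (11, 2, 4, 1), (11, 3, 7, 0),
    (12, 1, 8, 1), (12, 2, 2, 0), (12, 3, 4, 1), (12, 4, 3, 1), (12, 5, 4, 1),
    (13, 1, 4, 1), (13, 2, 1, 0),
    (15, 1, 3, 1),
    (16, 1, 8, 0), (16, 2, 8, 0), (16, 3, 8, 0), (16, 4, 8, 0), (16, 5, 8, 0),
])

in_rows = conn.execute("""
    SELECT * FROM dt
    WHERE id IN (SELECT id FROM dt WHERE time_id = 5)
    ORDER BY id, time_id
""").fetchall()

exists_rows = conn.execute("""
    SELECT t.* FROM dt t
    WHERE EXISTS (SELECT 1 FROM dt t2 WHERE t2.id = t.id AND t2.time_id = 5)
    ORDER BY t.id, t.time_id
""").fetchall()

# HAVING works on groups, so it can pick the ids but not return the
# individual rows; it would have to feed an IN or a join like the above.
having_ids = [row[0] for row in conn.execute("""
    SELECT id FROM dt
    GROUP BY id
    HAVING MAX(CASE WHEN time_id = 5 THEN 1 ELSE 0 END) = 1
    ORDER BY id
""")]

print(in_rows == exists_rows, having_ids)
```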

Sequencing and re-setting in SQL Server 2008

I am actually new to SQL server 2008, and I am trying to sequence and re-set a number in a table. The source is something like:
Row Refrec FLAG
1 5 NULL
2 4 X
3 3 NULL
4 2 NULL
5 1 Y
6 5 A
7 4 B
8 3 NULL
9 2 NULL
10 1 NULL
The result should look like:
Row Refrec FLAG SEQUENCE
1 5 NULL NULL
2 4 X 0
3 3 NULL 1
4 2 NULL 2
5 1 Y 0
6 5 A 0
7 4 B 0
8 3 NULL 1
9 2 NULL 2
10 1 NULL 3
Thanks!
It looks like you want to enumerate the sequence values for NULL values, setting all the other values to 0. I'm not sure why the first value is NULL, but that is easily fixed.
The following may do what you want:
select t.*,
(case when flag is not null then 0
-- flag is part of the partition so that flagged rows do not share a counter with the NULL rows
else row_number() over (partition by flag, seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from table t
) t;
If you really care about the first value:
select t.*,
(case when row = 1 then NULL
when flag is not null then 0
-- flag is part of the partition so that flagged rows do not share a counter with the NULL rows
else row_number() over (partition by flag, seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from table t
) t;
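Here is a small check of the "difference of row numbers" idea against the sample data. It is a sketch under a few assumptions: it runs on Python's built-in sqlite3 (window functions need SQLite 3.25 or later, whereas ROW_NUMBER() with PARTITION BY is already available in SQL Server 2008), the table is called src, and the Row column is renamed rn to avoid keyword clashes. Note that flag is included in the outer partition; without it, a flagged row can land in the same partition as the NULL rows that follow it and shift their numbering by one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (rn INTEGER, refrec INTEGER, flag TEXT)")
flags = [None, "X", None, None, "Y", "A", "B", None, None, None]
refrecs = [5, 4, 3, 2, 1, 5, 4, 3, 2, 1]
conn.executemany("INSERT INTO src VALUES (?, ?, ?)",
                 [(i + 1, refrecs[i], flags[i]) for i in range(10)])

query = """
SELECT rn,
       CASE WHEN rn = 1 THEN NULL
            WHEN flag IS NOT NULL THEN 0
            ELSE ROW_NUMBER() OVER (PARTITION BY flag, seqnum - rn ORDER BY rn)
       END AS sequence
FROM (SELECT rn, flag,
             -- seqnum - rn is constant within each run of consecutive
             -- rows that share the same flag (here: the NULL runs)
             ROW_NUMBER() OVER (PARTITION BY flag ORDER BY rn) AS seqnum
      FROM src) t
ORDER BY rn
"""
sequence = [seq for _, seq in conn.execute(query)]
print(sequence)
```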

Grouping Hierarchical data (parentID+ID) and running sum?

I have the following data:
ID parentID Text Price
1 Root
2 1 Flowers
3 1 Electro
4 2 Rose 10
5 2 Violet 5
6 4 Red Rose 12
7 3 Television 100
8 3 Radio 70
9 8 Webradio 90
I am trying to group this data with Reporting Services 2008 and have a sum of the price per group of level 1 (Flowers/Electro) and for level 0 (Root).
I have a table grouped on [ID] with a recursive parent of [parentID], and I am able to calculate the sum for level 0 (just one more row in the table outside the group), but somehow I am not able to create sums per group, as SSRS "creates" groups per level. My desired result looks like this:
ID Text Price
1 Root
|2 Flowers
|-4 Rose 10
|-5 Violet 5
| |-6 Red Rose 12
| Group Sum-->27
|3 Electro
|-7 Television 100
|-8 Radio 70
|-9 Webradio 90
Group Sum-->260
----------------------
Total 287
(indentation of ID just added for level clarification)
With my current approach I cannot get the group sums, so I figured out I would need the following data structure:
ID parentID Text Price level0 level1 level2 level3
1 Root 1
2 1 Flowers 1 1
3 1 Electro 1 2
4 2 Rose 10 1 1 1
5 2 Violet 5 1 1 2
6 4 Red Rose 12 1 1 1 1
7 3 Television 100 1 2 1
8 3 Radio 70 1 2 2
9 8 Webradio 90 1 2 2 1
When having the above structure I can create an outer grouping of level0, with child groupings level1, level2, level3 accordingly . When now having a "group sum" on level1, and the total sum outside the group I have EXACTLY what I want.
My question is the following:
How do I either achieve my desired result with my current data structure, or how do I convert my current data structure (outer left joins?) into the "new data structure" temporarily - so I can run my report off of the temp table?
Thanks for taking your time,
Dennis
WITH q AS
(
SELECT id, parentId, price
FROM mytable
UNION ALL
SELECT p.id, p.parentID, q.price
FROM q
JOIN mytable p
ON p.id = q.parentID
)
SELECT id, SUM(price)
FROM q
GROUP BY
id
Update:
A test script to check:
DECLARE @table TABLE (id INT NOT NULL PRIMARY KEY, parentID INT, txt VARCHAR(200) NOT NULL, price MONEY)
INSERT
INTO @table
SELECT 1, NULL, 'Root', NULL
UNION ALL
SELECT 2, 1, 'Flowers', NULL
UNION ALL
SELECT 3, 1, 'Electro', NULL
UNION ALL
SELECT 4, 2, 'Rose', 10
UNION ALL
SELECT 5, 2, 'Violet', 5
UNION ALL
SELECT 6, 4, 'Red Rose', 12
UNION ALL
SELECT 7, 3, 'Television', 100
UNION ALL
SELECT 8, 3, 'Radio', 70
UNION ALL
SELECT 9, 8, 'Webradio', 90;
WITH q AS
(
SELECT id, parentId, price
FROM @table
UNION ALL
SELECT p.id, p.parentID, q.price
FROM q
JOIN @table p
ON p.id = q.parentID
)
SELECT t.*, psum
FROM (
SELECT id, SUM(price) AS psum
FROM q
GROUP BY
id
) qo
JOIN @table t
ON t.id = qo.id
Here's the result:
1 NULL Root NULL 287,00
2 1 Flowers NULL 27,00
3 1 Electro NULL 260,00
4 2 Rose 10,00 22,00
5 2 Violet 5,00 5,00
6 4 Red Rose 12,00 12,00
7 3 Television 100,00 100,00
8 3 Radio 70,00 160,00
9 8 Webradio 90,00 90,00
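The same recursive CTE can be reproduced outside SQL Server, for example with Python's built-in sqlite3. This is a sketch only: SQLite needs the RECURSIVE keyword and an explicit column list, the table name mytable is taken from the first query, and prices are stored as integers instead of MONEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable
                (id INTEGER PRIMARY KEY, parentID INTEGER, txt TEXT, price INTEGER)""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, None, "Root", None), (2, 1, "Flowers", None), (3, 1, "Electro", None),
    (4, 2, "Rose", 10), (5, 2, "Violet", 5), (6, 4, "Red Rose", 12),
    (7, 3, "Television", 100), (8, 3, "Radio", 70), (9, 8, "Webradio", 90),
])

query = """
WITH RECURSIVE q(id, parentID, price) AS (
    -- anchor: every row contributes its own price to itself
    SELECT id, parentID, price FROM mytable
    UNION ALL
    -- step: pass that price up to the parent, one level at a time
    SELECT p.id, p.parentID, q.price
    FROM q JOIN mytable p ON p.id = q.parentID
)
SELECT id, SUM(price) FROM q GROUP BY id ORDER BY id
"""
subtree_sums = dict(conn.execute(query).fetchall())
print(subtree_sums)
```

Each price is counted once for the row itself and once for every ancestor, which is why Root ends up with the grand total.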
I found a really ugly way to do what I want - maybe there is something better?
SELECT A.Text, A.Price,
CASE
WHEN D.Text IS NULL
THEN
CASE
WHEN C.Text IS NULL
THEN
CASE
WHEN B.Text IS NULL
THEN
A.ID
ELSE B.ID
END
ELSE C.ID
END
ELSE D.ID
END
AS LEV0,
CASE
WHEN D.Text IS NULL
THEN
CASE
WHEN C.Text IS NULL
THEN
CASE
WHEN B.Text IS NULL
THEN
NULL
ELSE A.ID
END
ELSE B.ID
END
ELSE C.ID
END
AS LEV1,
CASE
WHEN D.Text IS NULL
THEN
CASE
WHEN C.Text IS NULL
THEN
NULL
ELSE A.ID
END
ELSE B.ID
END
AS LEV2,
CASE
WHEN D.Text IS NULL
THEN NULL
ELSE A.ID
END
AS LEV3
FROM dbo.testOld AS A LEFT OUTER JOIN
dbo.testOld AS B ON A.parentID = B.ID LEFT OUTER JOIN
dbo.testOld AS C ON B.parentID = C.ID LEFT OUTER JOIN
dbo.testOld AS D ON C.parentID = D.ID
Output of this is:
Text Price LEV0 LEV1 LEV2 LEV3
---------- ----------- ----------- ----------- ----------- -----------
Root NULL 1 NULL NULL NULL
Flowers NULL 1 3 NULL NULL
Electro NULL 1 4 NULL NULL
Television 100 1 4 5 NULL
Radio 70 1 4 6 NULL
Rose 10 1 3 7 NULL
Violet 5 1 3 8 NULL
Webradio 90 1 4 5 14
Red Rose 12 1 3 7 15
With this structure I can go ahead and create 4 nested groups on the LEV0-3 columns including subtotals per group (as shown above in my desired result).
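A possibly less ugly way to get the level columns is to let a recursive CTE build the path from the root down to each row and then split that path. The sketch below is an alternative I am suggesting, not what was used above; it runs on Python's built-in sqlite3, uses the sample IDs from the question rather than those of dbo.testOld, and assumes at most four levels (MAX_DEPTH is a made-up name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable
                (id INTEGER PRIMARY KEY, parentID INTEGER, txt TEXT, price INTEGER)""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, None, "Root", None), (2, 1, "Flowers", None), (3, 1, "Electro", None),
    (4, 2, "Rose", 10), (5, 2, "Violet", 5), (6, 4, "Red Rose", 12),
    (7, 3, "Television", 100), (8, 3, "Radio", 70), (9, 8, "Webradio", 90),
])

query = """
WITH RECURSIVE tree(id, path) AS (
    -- start at the root and walk down, appending each child's id
    SELECT id, CAST(id AS TEXT) FROM mytable WHERE parentID IS NULL
    UNION ALL
    SELECT c.id, tree.path || '/' || c.id
    FROM mytable c JOIN tree ON c.parentID = tree.id
)
SELECT id, path FROM tree ORDER BY id
"""

MAX_DEPTH = 4  # assumed maximum depth, matching LEV0..LEV3
levels = {}
for node_id, path in conn.execute(query):
    ids = [int(part) for part in path.split("/")]
    # pad with None so every row has LEV0..LEV3
    levels[node_id] = ids + [None] * (MAX_DEPTH - len(ids))

for node_id, lev in levels.items():
    print(node_id, lev)
```

In SQL Server the same path could be built with WITH (no RECURSIVE keyword) and string concatenation, and the split into LEV0 to LEV3 would then happen in the dataset query or in the report.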