Repeat a string based on a column - BigQuery/SQL Standard - sql

This should be very easy, but I am stuck on it.
It is as simple as the image shows: I have a column named "category", and once a row has a category, I want that value repeated 'n' times, let's say 10 times (or however many I want), in a new column.
I've tried FIRST_VALUE(), but there is no pattern to when the category appears, so most of the time I get 'null' instead of the repeated value.
I've seen ROW_NUMBER() OVER() with PRECEDING AND FOLLOWING, but I can't use a string there, only an aggregation, and I don't want to calculate anything, I want to classify. I even tried CASE WHEN xxx category * 10 etc., or category + 10, but of course that doesn't work.
Any suggestions? Thanks!
What I've tried:
WITH table1 AS(
SELECT
date,
hour,
minute,
category,
ROW_NUMBER() OVER() AS rn
FROM table1),
table2 AS(
SELECT
*,
CASE
WHEN category IS NOT NULL THEN 1
ELSE 0
END AS flag_category
FROM table1)
SELECT
*,
CASE
WHEN flag_category = 1
THEN (SELECT
a.category,
FROM table2 AS a
INNER JOIN table2 AS b
ON a.rn = b.rn + 10)
ELSE '-'
END AS category_repetition
FROM table2
Sample data (the columns up to category are what I have; category_repetition is what I want):
date hour minute qty category category_repetition
20210412 0 0 2 null null
20210412 0 1 0 null null
20210412 0 2 6 null null
20210412 0 3 7 null null
20210412 0 4 7 null null
20210412 0 5 6 null null
20210412 0 6 3 null null
20210412 0 7 8 null null
20210412 0 8 4 null null
20210412 0 9 3 category A category A
20210412 0 10 4 null category A
20210412 0 11 0 null category A
20210412 0 12 5 null category A
20210412 0 13 2 null category A
20210412 0 14 3 null category A
20210412 0 15 3 null category A
20210412 0 16 4 null category A
20210412 0 17 3 null category A
20210412 0 18 5 null category A
20210412 0 19 4 null category A

You seem to want last_value(ignore nulls):
select t.*,
       last_value(category ignore nulls) over (order by date, hour, minute) as category_repetition
from t;
I'm not sure what the "10" means in the question. This should produce the data that you want, based on the sample data and results.
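If the "10" is meant as a cap (the category row plus the 10 rows after it, as in the sample), one possible sketch is to number the rows inside each filled run and blank out everything past the cap. The example below runs the SQL through Python's sqlite3 only so it can be tried locally; SQLite has no IGNORE NULLS, so a running COUNT(category) stands in for it. The table name t, the 25 sample minutes and the parameter N are assumptions for illustration, and the same CTE shape should carry over to BigQuery but is untested there.

```python
import sqlite3

# Hypothetical sample: minutes 0..24, with a category appearing only at minute 9.
rows = [("20210412", 0, m, "category A" if m == 9 else None) for m in range(25)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, hour INTEGER, minute INTEGER, category TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

N = 10  # assumption: number of rows after the category row that repeat the value

query = """
WITH grouped AS (
    SELECT t.*,
           -- COUNT skips NULLs, so this running count only increases on rows with a category
           COUNT(category) OVER (ORDER BY date, hour, minute) AS grp
    FROM t
),
numbered AS (
    SELECT g.*,
           -- each group holds at most one non-NULL category, so MAX picks it up
           MAX(category) OVER (PARTITION BY grp) AS filled,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY date, hour, minute) AS pos
    FROM grouped g
)
SELECT minute,
       -- keep the category row itself plus the N rows that follow it
       CASE WHEN pos <= ? + 1 THEN filled END AS category_repetition
FROM numbered
ORDER BY date, hour, minute
"""
result = conn.execute(query, (N,)).fetchall()

for minute, category_repetition in result:
    print(minute, category_repetition)
```

With this sample, minutes 9 through 19 carry the category and every other row stays NULL. Without the cap, the plain last_value(... ignore nulls) version is simpler.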

Related

row_number() but only increment value after a specific value in a column

Query: SELECT (row_number() OVER ()) as grp, * from tbl
Edit: the rows below are returned by a pgrouting shortest path function and it does have a sequence.
seq grp id
1 1 8
2 2 3
3 3 2
4 4 null
5 5 324
6 6 82
7 7 89
8 8 null
9 9 1
10 10 2
11 11 90
12 12 null
How do I make the grp column increment only after a null value in id, while keeping the same order of rows?
seq grp id
1 1 8
2 1 3
3 1 2
4 1 null
5 2 324
6 2 82
7 2 89
8 2 null
9 3 1
10 3 2
11 3 90
12 3 null
Using a cumulative SUM aggregation is a possible approach:
SELECT
SUM( -- 2
CASE WHEN id IS NULL THEN 1 ELSE 0 END -- 1
) OVER (ORDER BY seq) as grp,
id
FROM mytable
1. If the current (ordered!) value is NULL, make it 1, else 0. Now you have a run of zeros, delimited by a 1 at each NULL record. If you sum these values cumulatively, the sum increases at each NULL record.
2. Execute the cumulative SUM() using a window function.
This yields:
0 8
0 3
0 2
1 null
1 324
1 82
1 89
2 null
2 1
2 2
2 90
3 null
As you can see, each group starts at a NULL record, but you expect it to end there.
This can be achieved by adding another window function: LAG(), which makes the previous row's value available on the current row:
SELECT
    SUM(
        CASE WHEN prev_id IS NULL THEN 1 ELSE 0 END
    ) OVER (ORDER BY seq) as grp,
    id
FROM (
    SELECT
        LAG(id) OVER (ORDER BY seq) as prev_id, -- id of the previous row
        seq,
        id
    FROM mytable
) s
The result is your expected one:
1 8
1 3
1 2
1 null
2 324
2 82
2 89
2 null
3 1
3 2
3 90
3 null
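To check the LAG() plus cumulative SUM() idea against the sample rows, here is a small sketch that runs the query in SQLite through Python's sqlite3. The table name mytable and the seq and id columns come from the answer; the in-memory setup is only an assumption for the test, and the column holding the lagged value is called prev_id here.

```python
import sqlite3

# Sample ids from the question, in seq order (None stands for SQL NULL).
ids = [8, 3, 2, None, 324, 82, 89, None, 1, 2, 90, None]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (seq INTEGER, id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", list(enumerate(ids, start=1)))

query = """
SELECT seq,
       -- a new group starts whenever the previous row's id was NULL
       -- (the first row has no previous row, so it also counts as a start)
       SUM(CASE WHEN prev_id IS NULL THEN 1 ELSE 0 END) OVER (ORDER BY seq) AS grp,
       id
FROM (
    SELECT seq, id, LAG(id) OVER (ORDER BY seq) AS prev_id
    FROM mytable
) s
ORDER BY seq
"""
groups = [grp for _, grp, _ in conn.execute(query)]
print(groups)
```

Because the first row has no predecessor, its LAG() is NULL and the numbering starts at 1 rather than 0.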

How do I find corresponding row data based on max column values?

I want to take the max value of each partitioned block and find the corresponding show_id (in the same row). I then want to use that single show_id as the 'winner' and flag all rows in the same partition that have a matching show_id.
I am having trouble implementing this, especially the window function. I have hit multiple errors saying that the subquery is not supported, or "must appear in the GROUP BY clause or be used in an aggregate function".
subQ1 as (
    select subQ0.*,
           case
               when show_id =
                    (select id from (select show_id, max(rn_max_0)
                                     over (partition by tv_id, show_id)))
               then 1
               else 0
           end as winner_flag
    from subQ0
)
What I have:
tv_id show_id partition_count
1 42 1
1 42 2
1 42 3
1 7 1
2 12 1
2 12 2
2 12 3
2 27 1
What I want:
tv_id show_id partition_count flag
1 42 1 1
1 42 2 1
1 42 3 1
1 7 1 0
2 12 1 1
2 12 2 1
2 12 3 1
2 27 1 0
Because tv_id 1 has the most connections to show_id 42, those rows get flagged.
Ideally, something similar to SQL select only rows with max value on a column, but the partitions and grouping have led to issues. This dataset also has billions of rows so a union would be a nightmare.
Thanks in advance!
For each tv_id, you seem to want the show_id that appears the most. If so:
select s.*,
(case when cnt = max(cnt) over (partition by tv_id)
then 1 else 0
end) as flag
from (select s.*, count(*) over (partition by tv_id, show_id) as cnt
from subQ0 s
) s;
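As a quick check of the count-then-compare approach, the sketch below loads the sample rows into SQLite through Python's sqlite3 and runs essentially the same query. The row_no column is hypothetical and exists only to keep the output in the question's row order; subQ0 is created as a plain table here, although in the question it is a CTE.

```python
import sqlite3

# (tv_id, show_id) pairs from the question's sample, in the order shown there.
data = [(1, 42), (1, 42), (1, 42), (1, 7), (2, 12), (2, 12), (2, 12), (2, 27)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subQ0 (row_no INTEGER, tv_id INTEGER, show_id INTEGER)")
conn.executemany(
    "INSERT INTO subQ0 VALUES (?, ?, ?)",
    [(i, tv, show) for i, (tv, show) in enumerate(data, start=1)],
)

query = """
SELECT s.tv_id, s.show_id,
       -- flag rows whose show count equals the largest show count for that tv_id
       CASE WHEN cnt = MAX(cnt) OVER (PARTITION BY tv_id) THEN 1 ELSE 0 END AS flag
FROM (
    SELECT s.*, COUNT(*) OVER (PARTITION BY tv_id, show_id) AS cnt
    FROM subQ0 s
) s
ORDER BY s.row_no
"""
flags = [flag for _, _, flag in conn.execute(query)]
print(flags)
```

One thing to keep in mind: if two shows tie for the highest count within a tv_id, both get flagged, so an extra tie-breaker would be needed if exactly one winner is required.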

Sequencing and re-setting in SQL Server 2008

I am actually new to SQL server 2008, and I am trying to sequence and re-set a number in a table. The source is something like:
Row Refrec FLAG
1 5 NULL
2 4 X
3 3 NULL
4 2 NULL
5 1 Y
6 5 A
7 4 B
8 3 NULL
9 2 NULL
10 1 NULL
The result should look like:
Row Refrec FLAG SEQUENCE
1 5 NULL NULL
2 4 X 0
3 3 NULL 1
4 2 NULL 2
5 1 Y 0
6 5 A 0
7 4 B 0
8 3 NULL 1
9 2 NULL 2
10 1 NULL 3
Thanks!
It looks like you want to enumerate the sequence values for NULL values, setting all the other values to 0. I'm not sure why the first value is NULL, but that is easily fixed.
The following may do what you want:
select t.*,
       (case when flag is not null then 0
             -- partitioning by flag as well keeps flagged rows out of the NULL runs
             else row_number() over (partition by flag, seqnum - row order by row)
        end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
      from table t
     ) t;
If you really care about the first value:
select t.*,
       (case when row = 1 then NULL
             when flag is not null then 0
             -- partitioning by flag as well keeps flagged rows out of the NULL runs
             else row_number() over (partition by flag, seqnum - row order by row)
        end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
      from table t
     ) t;
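The difference-of-row-numbers trick is easy to get subtly wrong, so here is a small sketch that runs it on the sample data in SQLite through Python's sqlite3. The column is named row_no instead of row, and the table is named t; both are assumptions for the test. This sketch partitions by flag as well as by seqnum - row_no, because a flagged row can otherwise share the same difference as a run of NULL rows and shift their numbering.

```python
import sqlite3

# FLAG values for rows 1..10 from the question (None stands for SQL NULL).
flags = [None, "X", None, None, "Y", "A", "B", None, None, None]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_no INTEGER, flag TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(flags, start=1)))

query = """
SELECT row_no,
       CASE WHEN row_no = 1 THEN NULL
            WHEN flag IS NOT NULL THEN 0
            -- consecutive NULL rows share the same (seqnum - row_no), forming one run
            ELSE ROW_NUMBER() OVER (PARTITION BY flag, seqnum - row_no ORDER BY row_no)
       END AS sequence
FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY flag ORDER BY row_no) AS seqnum
    FROM t
) s
ORDER BY row_no
"""
sequence = [seq for _, seq in conn.execute(query)]
print(sequence)
```

On the sample this gives NULL for row 1, 0 for every flagged row, and a counter that restarts at 1 for each run of NULL flags.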

SQL Server Query to find CHI-SQUARE Values (Not Working)

I am trying to find the Chi-Square test from my following SQL Server Query on the sample data:
SELECT sessionnumber, sessioncount, timespent, expected, dev, dev*dev/expected as chi_square
FROM (SELECT clusters.sessionnumber, clusters.sessioncount, clusters.timespent,
(dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as expected,
clusters.cnt-(dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as dev
FROM clusters JOIN
(SELECT sessionnumber, SUM(cnt) as cnt FROM clusters
GROUP BY sessionnumber) dim1 ON clusters.sessionnumber = dim1.sessionnumber JOIN
(SELECT sessioncount, SUM(cnt) as cnt FROM clusters
GROUP BY sessioncount) dim2 ON clusters.sessioncount = dim2.sessioncount JOIN
(SELECT timespent, SUM(cnt) as cnt FROM clusters
GROUP BY timespent) dim3 ON clusters.timespent = dim3.timespent CROSS JOIN
(SELECT SUM(cnt) as cnt FROM clusters) dimall) a
My table has this sort of sample data:
sessionnumber sessioncount timespent cnt
1 17 28 NULL
2 22 8 NULL
3 1 1 NULL
4 1 1 NULL
5 8 111 NULL
6 8 65 NULL
7 11 5 NULL
8 1 1 NULL
9 62 64 NULL
10 6 42 NULL
The problem is that this query runs without errors but gives the wrong output, or really no output at all. The output it gives me looks like this:
sessionnumber sessioncount timespent expected dev chi_square
1 17 28 NULL NULL NULL
2 22 8 NULL NULL NULL
3 1 1 NULL NULL NULL
4 1 1 NULL NULL NULL
5 8 111 NULL NULL NULL
6 8 65 NULL NULL NULL
7 11 5 NULL NULL NULL
8 1 1 NULL NULL NULL
9 62 64 NULL NULL NULL
10 6 42 NULL NULL NULL
How can I fix this? I have tried everything I can think of. Thanks in advance for telling me what I'm doing wrong!
In your sample data, cnt is NULL, so every result computed from it is also NULL. You can replace these NULL values with a default (1 for example; I don't know the context) using ISNULL, like this:
SELECT sessionnumber, SUM(ISNULL(cnt, 1)) as cnt FROM clusters GROUP BY sessionnumber
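To see the NULL propagation in isolation, here is a minimal sketch using SQLite through Python's sqlite3. SQLite has no two-argument ISNULL(), so the standard COALESCE() is used as the equivalent; the three-row clusters table is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clusters (sessionnumber INTEGER, cnt INTEGER)")
# Every cnt is NULL, as in the question's sample data.
conn.executemany("INSERT INTO clusters VALUES (?, ?)", [(1, None), (2, None), (3, None)])

# SUM over only NULLs is NULL, and any arithmetic on NULL stays NULL.
raw_total = conn.execute("SELECT SUM(cnt) FROM clusters").fetchone()[0]
derived = conn.execute("SELECT SUM(cnt) * 2 FROM clusters").fetchone()[0]

# Replacing NULL with a default before aggregating gives a usable number.
filled_total = conn.execute("SELECT SUM(COALESCE(cnt, 1)) FROM clusters").fetchone()[0]

print(raw_total, derived, filled_total)
```

Whether 1 is a sensible default depends on what cnt is supposed to count; if the column should hold real counts, populating it is probably the better fix.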

SQL Query to filter record for particular record count

I have a table with the columns Identity, RecordId, Type, Reading and IsDeleted. Identity is the auto-increment primary key, RecordId is an integer that can have duplicate values, Type is the type of reading and is either 'one' or 'average', Reading is any integer value, and IsDeleted is a bit that is 0 or 1, i.e. false or true.
Now I want a query that returns the records of the table as follows: if COUNT(Id) for a RecordId is greater than 2, display all the records of that RecordId.
If COUNT(Id) == 2 for a RecordId and the Reading values of both records (the 'one' and the 'average' type) are the same, display only the average record.
If COUNT(Id) == 1, display only that record.
For example :
Id RecordId Type Reading IsDeleted
1 1 one 4 0
2 1 one 5 0
3 1 one 6 0
4 1 average 5 0
5 2 one 1 0
6 2 one 3 0
7 2 average 2 0
8 3 one 2 0
9 3 average 2 0
10 4 one 5 0
11 4 average 6 0
12 5 one 7 0
And the result should be:
Id RecordId Type Reading IsDeleted
1 1 one 4 0
2 1 one 5 0
3 1 one 6 0
4 1 average 5 0
5 2 one 1 0
6 2 one 3 0
7 2 average 2 0
9 3 average 2 0
10 4 one 5 0
11 4 average 6 0
12 5 one 7 0
In short, I want to skip a 'one' type reading when there is an average reading with the same value and there is no more than one 'one' type reading for that RecordId.
Check out this program:
DECLARE @t TABLE(ID INT IDENTITY,RecordId INT,[Type] VARCHAR(10),Reading INT,IsDeleted BIT)
INSERT INTO @t VALUES
(1,'one',4,0),(1,'one',5,0),(1,'one',6,0),(1,'average',5,0),(2,'one',1,0),(2,'one',3,0),
(2,'average',2,0),(3,'one',2,0),(3,'average',2,0),(4,'one',5,0),(4,'average',6,0),(5,'one',7,0),
(6,'average',6,0),(6,'average',6,0),(7,'one',6,0),(7,'one',6,0)
--SELECT * FROM @t
;WITH GetAllRecordsCount AS
(
SELECT *,Cnt = COUNT(RecordId) OVER(PARTITION BY RecordId ORDER BY RecordId)
FROM @t
)
-- Condition 1 : When COUNT(RecordId) for each RecordId is greater than 2
-- then display all the records of that RecordId.
, GetRecordsWithCountMoreThan2 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt > 2
)
-- Get all records where count = 2
, GetRecordsWithCountEquals2 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt = 2
)
-- Condition 3 : When COUNT(RecordId) == 1 then display only that record.
, GetRecordsWithCountEquals1 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt = 1
)
-- Condition 1: When COUNT(RecordId) > 2
SELECT * FROM GetRecordsWithCountMoreThan2 UNION ALL
-- Condition 2 : When COUNT(RecordId) == 2 for that specific RecordId and Reading value of
-- both i.e. 'one' or 'average' type of the records are same then display only
-- average record.
SELECT t1.* FROM GetRecordsWithCountEquals2 t1
JOIN (Select RecordId From GetRecordsWithCountEquals2 Where [Type] = ('one') )X
ON t1.RecordId = X.RecordId
AND t1.Type = 'average' UNION ALL
-- Condition 3: When COUNT(RecordId) = 1
SELECT * FROM GetRecordsWithCountEquals1
Result
ID  RecordId  Type     Reading  IsDeleted  Cnt
1   1         one      4        0          4
2   1         one      5        0          4
3   1         one      6        0          4
4   1         average  5        0          4
5   2         one      1        0          3
6   2         one      3        0          3
7   2         average  2        0          3
9   3         average  2        0          2
11  4         average  6        0          2
12  5         one      7        0          1
;with a as
(
select Id,RecordId,Type,Reading,IsDeleted, count(*) over (partition by RecordId, Reading) cnt,
row_number() over (partition by RecordId, Reading order by Type, RecordId) rn
from table
)
select Id,RecordId,Type,Reading,IsDeleted
from a where cnt <> 2 or rn = 1
Assuming your table is named the_table, let's do this:
select main.*
from the_table as main
inner join (
select recordId, count(Id) as num, count(distinct Reading) as reading_num
from the_table
group by recordId
) as counter on counter.recordId=main.recordId
where num=1 or num>2 or reading_num=2 or main.type='average';
Untested, but it should be some variant of that.
EDIT: test here on Fiddle.
The short summary is that we join the table with an aggregated version of itself, then filter on the count criteria you mentioned (num=1: show it; num=2: show just the average record if the readings are the same, otherwise show both; num>2: show all records).
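For anyone who wants to verify the join-with-aggregate query without a fiddle, here is a small sketch that runs it on the question's twelve sample rows in SQLite through Python's sqlite3. The table name the_table is the one assumed in the answer; the in-memory setup and the final ORDER BY are additions for the test.

```python
import sqlite3

# (RecordId, Type, Reading) for Id 1..12, taken from the question's example.
rows = [
    (1, "one", 4), (1, "one", 5), (1, "one", 6), (1, "average", 5),
    (2, "one", 1), (2, "one", 3), (2, "average", 2),
    (3, "one", 2), (3, "average", 2),
    (4, "one", 5), (4, "average", 6),
    (5, "one", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (Id INTEGER PRIMARY KEY, RecordId INTEGER, "
    "Type TEXT, Reading INTEGER, IsDeleted INTEGER)"
)
conn.executemany(
    "INSERT INTO the_table (RecordId, Type, Reading, IsDeleted) VALUES (?, ?, ?, 0)",
    rows,
)

query = """
SELECT main.Id
FROM the_table AS main
INNER JOIN (
    SELECT RecordId, COUNT(Id) AS num, COUNT(DISTINCT Reading) AS reading_num
    FROM the_table
    GROUP BY RecordId
) AS counter ON counter.RecordId = main.RecordId
-- keep singletons, groups of 3+, pairs with differing readings, and any average row
WHERE num = 1 OR num > 2 OR reading_num = 2 OR main.Type = 'average'
ORDER BY main.Id
"""
kept_ids = [row[0] for row in conn.execute(query)]
print(kept_ids)
```

On this sample only Id 8 is dropped, which matches the result requested in the question. Pairs that do not consist of one 'one' and one 'average' row are not covered by the sample, so their handling would need separate checking.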