Capture non-occurring data from dynamic input data - SQL

I have data rules like the ones below (id|group|status|occurrence):
1|Group1|Mandatory|1st occurrence
2|Group1|Optional|1st occurrence
3|Group1|Mandatory|1st occurrence
1|Group1|Mandatory|2nd occurrence
2|Group1|Optional|2nd occurrence
3|Group1|Mandatory|2nd occurrence
4|Group2|Mandatory|1st occurrence
5|Group2|Mandatory|1st occurrence
6|Group2|Optional|1st occurrence
As you can see, Group 1 appears twice for records 1, 2 and 3, which means Group 1 can occur a minimum of one time and a maximum of two times. The last column shows in which occurrence of the group that record appears. Mandatory records must always occur; optional records may or may not occur in the input data. Either way, I need to capture what is missing.
Here is my input data. It is the only column I have in the input:
1
2
3
1
2
4
5
Is there any way to identify which records are missing from the input data according to the data rules table? In this example, the output should say that Mandatory record 3 is missing from Group 1 in the second occurrence. The input data and the data rules table are the only information available.
If anything needs to be added to get the desired result, I would like to hear what it is. All suggestions are welcome.
Thanks

I think you need something like this:
with input as (
    select column_value id,
           count(1) over (partition by column_value order by null
                          rows between unbounded preceding and current row) cnt
    from table(sys.odcinumberlist(1, 2, 3, 1, 2, 4, 5)))
select *
from data
where status = 'Mandatory'
  and (id, occurence) not in (select id, cnt from input)
Demo output:
ID GRP STATUS OCCURENCE
---- ---------- ---------- ---------
3 Group1 Mandatory 2
Count how many times each id appears in the input data and compare the result with the mandatory occurrences in your data.
Edit: explanation
select column_value id,
       count(1) over (partition by column_value order by null
                      rows between unbounded preceding and current row) cnt
from table(sys.odcinumberlist(1, 2, 3, 1, 2, 4, 5))
This part simulates your input data. table(sys.odcinumberlist(1, 2, 3, 1, 2, 4, 5)) just stands in for the inputs; these ids are probably stored in some table, so select them from there. For each id I compute a running occurrence number using the analytic version of count(), which gives:
id cnt
--- ---
1 1
1 2
2 1
2 2
3 1
4 1
5 1
Next, these pairs are compared with the mandatory (id, occurence) pairs in your data. If a pair is missing, the final select returns that row through the not in clause.
This is how I understood your question. You may need some modifications, but now you have some hints. Hope this helps (and sorry for my bad English ;-) ).
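The same idea can be sketched outside the database. This is only an illustration in Python, not the Oracle query itself: the names rules, input_ids, seen_pairs and missing_mandatory are made up for the example, and the fourth column of each rule is assumed to be the occurrence number as an integer.

```python
from collections import defaultdict

# Rules table from the question: (id, group, status, occurrence).
rules = [
    (1, "Group1", "Mandatory", 1),
    (2, "Group1", "Optional", 1),
    (3, "Group1", "Mandatory", 1),
    (1, "Group1", "Mandatory", 2),
    (2, "Group1", "Optional", 2),
    (3, "Group1", "Mandatory", 2),
    (4, "Group2", "Mandatory", 1),
    (5, "Group2", "Mandatory", 1),
    (6, "Group2", "Optional", 1),
]
input_ids = [1, 2, 3, 1, 2, 4, 5]


def seen_pairs(ids):
    """Return the set of (id, running occurrence number) pairs in the input."""
    counts = defaultdict(int)
    pairs = set()
    for i in ids:
        counts[i] += 1          # running count, like the analytic count()
        pairs.add((i, counts[i]))
    return pairs


def missing_mandatory(rules, ids):
    """Mandatory rules whose (id, occurrence) pair never shows up in the input."""
    seen = seen_pairs(ids)
    return [r for r in rules if r[2] == "Mandatory" and (r[0], r[3]) not in seen]


print(missing_mandatory(rules, input_ids))
# [(3, 'Group1', 'Mandatory', 2)]
```

One caveat with this approach, in SQL or in Python: if Group 1 legitimately occurs only once in the input, every mandatory record of its second occurrence is reported as missing, so you may need extra logic to decide whether a second occurrence was started at all.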

Related

Select query to fetch required data from SQL table

I have some data like this as shown below:
Acc_Id || Row_No
1 1
2 1
2 2
2 3
3 1
3 2
3 3
3 4
and I need a query to get the results as shown below:
Acc_Id || Row_No
1 1
2 3
3 4
Please consider that I'm a beginner in SQL.
I assume you want the count of rows:
SELECT Acc_Id, COUNT(*)
FROM Table
GROUP BY Acc_Id
Try this:
select Acc_Id, MAX(Row_No)
from table
group by Acc_Id
As a beginner, this is probably your first exposure to aggregation and grouping. You may want to look at the documentation on group by now that this problem has motivated your interest in a solution. Grouping operates by looking at rows with common values in the columns you specify and collapsing them into a single row that represents the group. In your case, the values in Acc_Id are the names of your groups.
The other answers are both correct in that the final two columns are going to be equivalent with your data.
select Acc_Id, count(*), max(Row_No)
from T
group by Acc_Id;
If you have gaps in the numbering, they won't be the same. You'll have to decide whether you're actually looking for a count of rows or the maximum of a value within a column. At this point you can also consider a number of other aggregate functions that will be useful to you in the future. (Note that the actual values here are pretty much meaningless in this context.)
select Acc_Id, min(Row_No), sum(Row_No), avg(Row_No)
from T
group by Acc_Id;
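To see the difference between count(*) and max(Row_No) when the numbering has gaps, here is a small sketch that runs the grouping query against an in-memory SQLite database from Python. The data is hypothetical: it is the question's table with the row (2, 2) removed, so Acc_Id 2 has a gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Acc_Id INTEGER, Row_No INTEGER)")

# Hypothetical data: Acc_Id 2 has Row_No 1 and 3 but no 2 (a gap).
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3), (3, 4)],
)

rows = conn.execute(
    "SELECT Acc_Id, COUNT(*), MAX(Row_No) FROM T GROUP BY Acc_Id ORDER BY Acc_Id"
).fetchall()

print(rows)
# [(1, 1, 1), (2, 2, 3), (3, 4, 4)]
```

For Acc_Id 2 the count is 2 but the maximum is 3, which is why you have to decide which of the two you actually want.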

Find preceding and following rows for a matching row in BigQuery?

Is it possible to find rows preceding and following a matching rows in a BigQuery query? For example if I do:
select textPayload from logs.logs_20160709 where textPayload like "%something%"
and say that I get these results back:
something A
something B
How can I also show the 3 rows preceding and following the matching rows? Something like this:
some text 1
some text 2
some text 3
something A
some text 4
some text 5
some text 6
some text 90
some text 91
some text 92
something B
some text 93
some text 94
some text 95
Is this possible and if so how?
While on Zuma Beach, I was thinking about how to avoid the CROSS JOIN in my original answer.
Check the query below. It should be much cheaper, especially for a big data set:
SELECT textPayload
FROM (
  SELECT textPayload,
    SUM(match) OVER(ORDER BY ts ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS flag
  FROM (
    SELECT textPayload, ts, IF(textPayload CONTAINS 'something', 1, 0) AS match
    FROM YourTable
  )
)
WHERE flag > 0
Of course, another way to avoid the cross join is to use BigQuery Standard SQL. Still, the solution above, with no joins at all, is better than my original answer.
I think one piece is missing in your example: an extra field that defines the order, so I added a ts field for this in my answer. This means I assume your table has two fields involved: textPayload and ts.
Try the query below. It should give you exactly what you need:
SELECT
  all.textPayload
FROM (
  SELECT start, finish
  FROM (
    SELECT textPayload,
      LAG(ts, 3) OVER(ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS start,
      LEAD(ts, 3) OVER(ORDER BY ts ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS finish
    FROM YourTable
  )
  WHERE textPayload CONTAINS 'something'
) AS matches
CROSS JOIN YourTable AS all
WHERE all.ts BETWEEN matches.start AND matches.finish
Please note: depending on the type of your ts field, you might need to cast it in the query. Hopefully not.
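The join-free version works by marking each row with 1 or 0 and then summing those marks over a window of 3 rows on each side; any row whose window sum is positive is within 3 rows of a match. A rough Python sketch of that logic, assuming the rows are already sorted by ts (the function name with_context and the sample lines "early line" and "late line" are invented for the illustration):

```python
def with_context(rows, needle, radius=3):
    """Keep every row that lies within `radius` rows of a row containing `needle`.

    `rows` is assumed to be already ordered (by ts in the SQL version).
    """
    # 1 where the row matches, 0 otherwise (the `match` column in the query).
    match = [1 if needle in text else 0 for text in rows]
    kept = []
    for i, text in enumerate(rows):
        lo = max(0, i - radius)
        hi = min(len(rows), i + radius + 1)
        # Windowed sum: ROWS BETWEEN radius PRECEDING AND radius FOLLOWING.
        if sum(match[lo:hi]) > 0:
            kept.append(text)
    return kept


rows = [
    "early line",
    "some text 1", "some text 2", "some text 3",
    "something A",
    "some text 4", "some text 5", "some text 6",
    "late line",
]
result = with_context(rows, "something")
print(result)
# ['some text 1', 'some text 2', 'some text 3', 'something A',
#  'some text 4', 'some text 5', 'some text 6']
```

Unlike the LAG/LEAD version, the windowed sum still keeps the neighbours of a match that sits within the first or last 3 rows, because the window is simply truncated at the edges.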

Group rows into sets of 5

TableA
Col1
----------
1
2
3
4....all the way to 27
I want to add a second column that assigns a number to groups of 5.
Results
Col1 Col2
----- ------
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2...and so on
The 6th group should have 2 rows in it.
NTILE doesn't accomplish what I want because of the way NTILE handles the groups if they aren't divisible by the integer.
If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.
This is clearly demonstrated in this SQL Fiddle. Groups 4, 5 and 6 each have 4 rows while the rest have 5. I have started some solutions, but they were getting lengthy, and I feel like I'm missing something and that this could be done in a single line.
You can use this:
;WITH CTE AS
(
    SELECT col1,
           RN = ROW_NUMBER() OVER(ORDER BY col1)
    FROM TableA
)
SELECT col1, (RN-1)/5+1 col2
FROM CTE;
In your sample data, col1 is a consecutive sequence without gaps, so you could use it directly (if it's an INT) without ROW_NUMBER(). If it isn't, this answer still works. Here is the modified sqlfiddle.
A bit of math can go a long way. Subtracting 1 from all values puts the 5s (the edge cases) into the previous group and the 6s into the next. Flooring the division by your group size and adding one gives the result you're looking for. Also, the SQLFiddle example here fixes your iterative insert: the table only went up to 27.
SELECT col1,
floor((col1-1)/5)+1 as grpNum
FROM tableA
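The arithmetic is easy to check by hand. Here is a small Python sketch of the same formula, (row number - 1) // group size + 1, where the row number comes from the sorted position so that gaps in the values do not matter (as with ROW_NUMBER() above). The helper name assign_groups is made up for the example.

```python
from collections import Counter


def assign_groups(values, size=5):
    """Pair each value with its group number, using its 1-based sorted position."""
    return [(v, (rn - 1) // size + 1) for rn, v in enumerate(sorted(values), start=1)]


groups = assign_groups(range(1, 28))          # Col1 = 1 .. 27
sizes = Counter(g for _, g in groups)         # rows per group

print(groups[:6])
# [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 2)]
print(dict(sizes))
# {1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 2}
```

The sixth group ends up with the 2 leftover rows, which is the behaviour NTILE cannot give you.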

Missing gaps in recurring series within a group

We have a table with the following data:
Id,ItemId,SeqNumber,DateTimeTrx
1,100,254,2011-12-01 09:00:00
2,100,1,2011-12-01 09:10:00
3,200,7,2011-12-02 11:00:00
4,200,5,2011-12-02 10:00:00
5,100,255,2011-12-01 09:05:00
6,200,3,2011-12-02 09:00:00
7,300,0,2011-12-03 10:00:00
8,300,255,2011-12-03 11:00:00
9,300,1,2011-12-03 10:30:00
Id is an identity column.
The sequence for an ItemId starts from 0, goes up to 255 and then resets to 0. All this information is stored in a table called Item. The order of sequence numbers is determined by DateTimeTrx, but such data can enter the system at any time. The expected output is shown below:
ItemId,PrevorNext,SeqNumber,DateTimeTrx,MissingNumber
100,Previous,255,2011-12-01 09:05:00,0
100,Next,1,2011-12-01 09:10:00,0
200,Previous,3,2011-12-02 09:00:00,4
200,Next,5,2011-12-02 10:00:00,4
200,Previous,5,2011-12-02 10:00:00,6
200,Next,7,2011-12-02 11:00:00,6
300,Previous,1,2011-12-03 10:30:00,2
300,Next,255,2011-12-03 16:30:00,2
We need to get the rows immediately before and after each missing sequence number. In the example above, for ItemId 300 the record with sequence 1 entered first (2011-12-03 10:30:00) and then 255 (2011-12-03 16:30:00), so the missing number here is 2: 1 is previous, 255 is next, and 2 is the first missing number. For ItemId 100, the record with sequence 255 entered first (2011-12-01 09:05:00) and then 1 (2011-12-01 09:10:00), so 255 is previous, 1 is next, and 0 is the first missing number.
In the expected result above, the MissingNumber column is the first missing number that occurs; it is included just to illustrate the example.
We will not have a case where a complete series resets at one time, i.e. it can be either a series rundown from 255 to 0, as for ItemId 100, or 0 to 255, as for ItemId 300. Hence we need to identify missing sequence numbers both in ascending order (0,1,...255) and in descending order (254,254,0,2), etc.
How can we accomplish this in T-SQL?
Could work like this:
;WITH b AS (
    SELECT *
          ,row_number() OVER (ORDER BY ItemId, DateTimeTrx, SeqNumber) AS rn
    FROM   tbl
    ), x AS (
    SELECT
           b.Id
          ,b.ItemId AS prev_Itm
          ,b.SeqNumber AS prev_Seq
          ,c.ItemId AS next_Itm
          ,c.SeqNumber AS next_Seq
    FROM   b
    JOIN   b c ON c.rn = b.rn + 1                  -- next row
    WHERE  c.ItemId = b.ItemId                     -- only with same ItemId
    AND    c.SeqNumber <> (b.SeqNumber + 1)%256    -- Seq cycles modulo 256
    )
SELECT Id, prev_Itm, 'Previous' AS PrevNext, prev_Seq
FROM   x
UNION  ALL
SELECT Id, next_Itm ,'Next', next_Seq
FROM   x
ORDER  BY Id, PrevNext DESC
Produces exactly the requested result.
See a complete working demo on data.SE.
This solution takes gaps in the Id column into consideration, as there is no mention of a gapless sequence of Ids in the question.
Edit2: Answer to updated question:
I updated the CTE in the query above to match your latest version, or so I think.
Use the columns that define the sequence of rows. Add as many columns to your ORDER BY clause as necessary to break ties.
The explanation in your latest update is not entirely clear to me, but I think you only need to squeeze in DateTimeTrx to achieve what you want. I additionally have SeqNumber in the ORDER BY to break ties left by identical DateTimeTrx values. I edited the query above.
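To make the modulo-256 check concrete, here is a rough Python sketch of the same logic: sort by (ItemId, DateTimeTrx, SeqNumber), pair each row with the next one, and report pairs within the same ItemId where the next SeqNumber is not (previous + 1) % 256. It is only an illustration of the comparison, not a replacement for the T-SQL; the function name find_gaps and the output tuple layout (ItemId, previous Seq, next Seq, first missing number) are choices made for this example.

```python
# Sample rows from the question: (Id, ItemId, SeqNumber, DateTimeTrx).
rows = [
    (1, 100, 254, "2011-12-01 09:00:00"),
    (2, 100, 1,   "2011-12-01 09:10:00"),
    (3, 200, 7,   "2011-12-02 11:00:00"),
    (4, 200, 5,   "2011-12-02 10:00:00"),
    (5, 100, 255, "2011-12-01 09:05:00"),
    (6, 200, 3,   "2011-12-02 09:00:00"),
    (7, 300, 0,   "2011-12-03 10:00:00"),
    (8, 300, 255, "2011-12-03 11:00:00"),
    (9, 300, 1,   "2011-12-03 10:30:00"),
]


def find_gaps(rows, modulus=256):
    """Return (ItemId, prev_seq, next_seq, first_missing) for each break in the sequence."""
    # Same ordering as the row_number() in the CTE; ISO timestamps sort as strings.
    ordered = sorted(rows, key=lambda r: (r[1], r[3], r[2]))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        expected = (prev[2] + 1) % modulus       # sequence cycles modulo 256
        if prev[1] == nxt[1] and nxt[2] != expected:
            gaps.append((prev[1], prev[2], nxt[2], expected))
    return gaps


for gap in find_gaps(rows):
    print(gap)
# (100, 255, 1, 0)
# (200, 3, 5, 4)
# (200, 5, 7, 6)
# (300, 1, 255, 2)
```

The fourth value in each tuple corresponds to the MissingNumber column in the expected output of the question.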

SQL COUNT of COUNT

I have some data I am querying. The table is composed of two columns - a unique ID, and a value. I would like to count the number of times each unique value appears (which can easily be done with a COUNT and GROUP BY), but I then want to be able to count that. So, I would like to see how many items appear twice, three times, etc.
So for the following data (ID, val)...
1, 2
2, 2
3, 1
4, 2
5, 1
6, 7
7, 1
The intermediate step would be (val, count)...
1, 3
2, 3
7, 1
And I would like to have (count_from_above, new_count)...
3, 2 -- since three appears twice in the previous table
1, 1 -- since one appears once in the previous table
Is there any query which can do that? If it helps, I'm working with Postgres. Thanks!
Try something like this:
select
    times,
    count(1)
from ( select
           val,
           count(*) as times   -- how many rows share each val
       from table              -- "table" is a placeholder for your table name
       group by val ) a
group by times
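The nested aggregation can be tried on the sample data from the question. The sketch below uses an in-memory SQLite database from Python rather than Postgres, and the table name items is invented for the example; the inner query counts rows per val and the outer query counts how often each of those counts occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 2), (2, 2), (3, 1), (4, 2), (5, 1), (6, 7), (7, 1)],
)

rows = conn.execute(
    """
    SELECT times, COUNT(*) AS new_count
    FROM (SELECT val, COUNT(*) AS times      -- intermediate step: (val, count)
          FROM items
          GROUP BY val) AS a
    GROUP BY times
    ORDER BY times DESC
    """
).fetchall()

print(rows)
# [(3, 2), (1, 1)]
```

This matches the (count_from_above, new_count) result asked for: a count of 3 appears twice and a count of 1 appears once.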