ORACLE: How to check for and remove repeating column values

For the "B numbers" - (the number to which a call was made), please limit each occurrence to 3. That is, from the list of "A numbers"-(the number that made the call to the "B number"), we may have multiple persons calling the same "B number". In instances where the "B number" appears more than 3 times in the total dial list, please remove them from the subsequent "A numbers" that they may show up for.
I want to figure out how I can check for and remove these repeating "B numbers" when they occur more than 3 times.
Here is a sample list showing the table structure.
So where a B number occurs more than three times, I want to keep the A number but remove the B number. Any thoughts?

Limiting your results to 3 B Numbers at most is easy using the row_number() analytic function.
select a_number, b_number
from (select a_number, b_number,
             row_number() over (partition by b_number order by null) as rn
      from your_table)
where rn <= 3
However, the above query is not explicit about which 3 rows it will preserve (order by null).
If you want to keep the first 3 occurrences of a B Number in your list, then you need a way to explicitly define the order of your list. Do you have some timestamp field perhaps?
In any case, whatever field(s) define(s) the order of your list, use that in the order by clause of the row_number() function call:
row_number() over (partition by b_number order by pick_an_ordering_column)
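As a minimal sketch of the approach above, the following runs the same row_number() pattern against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) only stands in for Oracle here, and the table name calls and the ordering column call_time are hypothetical. The second query is one possible variant, not part of the answer above, that keeps every A number and blanks the B number from its fourth occurrence onward, which is what the question's last sentence seems to ask for.

```python
import sqlite3

# Hypothetical table/column names (calls, call_time); SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (a_number TEXT, b_number TEXT, call_time INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("A1", "B1", 1),
        ("A2", "B1", 2),
        ("A3", "B1", 3),
        ("A4", "B1", 4),  # fourth occurrence of B1
        ("A5", "B2", 5),
    ],
)

# Number the rows within each b_number by call_time (the assumed ordering column).
numbered = """
    SELECT a_number, b_number,
           ROW_NUMBER() OVER (PARTITION BY b_number ORDER BY call_time) AS rn
    FROM calls
"""

# Variant 1: drop rows beyond the third occurrence of a B number.
kept = conn.execute(
    f"SELECT a_number, b_number FROM ({numbered}) WHERE rn <= 3 ORDER BY a_number"
).fetchall()

# Variant 2 (assumption about the intent): keep the A number, blank the B number.
blanked = conn.execute(
    f"""SELECT a_number, CASE WHEN rn <= 3 THEN b_number END AS b_number
        FROM ({numbered}) ORDER BY a_number"""
).fetchall()

print(kept)
print(blanked)
```

With a real ordering column, the first three occurrences are chosen deterministically instead of arbitrarily.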

Related

Calculating the mode/median/most frequent observation in categorical variables in SQL impala

I would like to calculate the mode/median or better, most frequent observation of a categorical variable within my query.
E.g., if the variable has the following string values:
dog, dog, dog, cat, cat, then I want to get dog, since it's 3 vs 2.
Is there any function that does that? I tried APPX_MEDIAN() but it only returns the first 10 characters as median and I do not want that.
Also, I would like to get the most frequent observation with respect to date if there is a tie-break.
Thank you!
The most frequent observation is the mode, and you can calculate it as follows.
A single-valued mode can be calculated on a value column like this: get the count per value and pick the row with the highest count.
select count(*),value from mytable group by value order by 1 desc limit 1
Now, in case you have multiple modes, you need to join the per-value counts back to the maximum count to find all matches.
select orig.v as value from
(select count(*) c, value v from mytable group by value) orig
join (select count(*) cmode from mytable group by value order by 1 desc limit 1) cmode
ON orig.c = cmode.cmode
This computes the count for every value and then matches those counts against the maximum count. If only one value reaches the maximum count you get 1 row; if two values reach it you get 2 rows, and so on.
Calculating the median is a little trickier: it gives you the middle value, which is not necessarily the most frequent one.
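To illustrate the two mode queries, here is a small sketch that runs them on an in-memory SQLite database from Python. SQLite is only a stand-in for Impala, and the sample data (with a deliberate tie between dog and cat) is made up for the example.

```python
import sqlite3

# SQLite stands in for Impala; the data is invented, with a tie on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("dog",), ("dog",), ("dog",), ("cat",), ("cat",), ("cat",), ("bird",)],
)

# Single mode: highest count wins; with a tie, which value comes back is arbitrary.
single = conn.execute(
    "SELECT COUNT(*), value FROM mytable GROUP BY value ORDER BY 1 DESC LIMIT 1"
).fetchone()

# All modes: join the per-value counts to the maximum count.
all_modes = [
    row[0]
    for row in conn.execute(
        """
        SELECT orig.v
        FROM (SELECT COUNT(*) AS c, value AS v FROM mytable GROUP BY value) orig
        JOIN (SELECT COUNT(*) AS cmode FROM mytable
              GROUP BY value ORDER BY 1 DESC LIMIT 1) top
          ON orig.c = top.cmode
        ORDER BY orig.v
        """
    )
]

print(single)
print(all_modes)
```

Breaking a tie by date, as the question asks, would need an extra sort key (for example the latest date per value) in the ORDER BY of the single-mode query; that column is not shown in the question, so it is left out here.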

SQL: Apply sequence number to a column based on nth occurrence of each distinct value

I have a table with a column of values where each value occurs a variable number of times (i.e., one value may occur 1 time, and another value may occur 3 times). I need to add a column that identifies the occurrence sequence # of its corresponding value.
Input Table
SOURCE_VAL
a
a
b
c
c
c
Output table
SEQUENCE_VAL  SOURCE_VAL
1             a
2             a
1             b
1             c
2             c
3             c
What would the SQL for this be to generate the SEQUENCE_VAL column based on SOURCE_VAL?
You are looking for row_number(). Without an ordering column, you can use:
select t.*,
row_number() over (partition by source_val order by source_val) as sequence_val
from t
order by source_val, sequence_val;
Note: This assumes that you do not care about the ordering of the value. If you have another column that does specify the ordering for each source_val, then use that in the order by.
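A quick way to check the query against the sample input is to run it on an in-memory SQLite database from Python. This is only a sketch: SQLite (3.25 or later) is assumed as a stand-in engine, and the table name t comes from the answer above.

```python
import sqlite3

# Rebuild the sample input table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (source_val TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("a",), ("a",), ("b",), ("c",), ("c",), ("c",)],
)

# Number each row within its source_val partition.
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (PARTITION BY source_val ORDER BY source_val)
               AS sequence_val,
           source_val
    FROM t
    ORDER BY source_val, sequence_val
    """
).fetchall()

for sequence_val, source_val in rows:
    print(sequence_val, source_val)
```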

While loop until find a word

I have only 2 columns in SQL Server. The first column (name) starts with the value "abc" in the first row, and the block ends in the 8th row ("Endabc").
I need to fill in the second column (shown in red):
starting from the row where the first column is "abc" and continuing until the row with "Endabc", update the second column and put 'abc' in all those rows.
How can I do this?
Thanks.
Your question pre-supposes an ordering column. SQL tables represent unordered sets. So there is no ordering unless that information is in a column.
You can use a conditional sum to identify the groups and then spread the values. Assuming that the ends are not really needed (because a new value starts right away):
select t.*,
       max(case when name not like 'End%' then name end) over (partition by grp) as imputed_name
from (select t.*,
             count(case when name not like 'End%' then name end) over (order by <ordering col>) as grp
      from t
     ) t
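The conditional-count idea can be tried out with the sketch below, which runs the query on an in-memory SQLite database from Python. Several things are assumptions rather than facts from the question: an id column plays the role of the ordering column, the rows between a start value and its End marker are NULL, and SQLite (3.25 or later) stands in for SQL Server.

```python
import sqlite3

# Assumed layout: id is the ordering column; rows between markers are NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "abc"), (2, None), (3, None), (4, "Endabc"),
        (5, "xyz"), (6, None), (7, "Endxyz"),
    ],
)

# Inner query: the running count of start values gives each block a group id.
# Outer query: spread the block's start value over every row in the group.
rows = conn.execute(
    """
    SELECT id, name,
           MAX(CASE WHEN name NOT LIKE 'End%' THEN name END)
               OVER (PARTITION BY grp) AS imputed_name
    FROM (SELECT t.*,
                 COUNT(CASE WHEN name NOT LIKE 'End%' THEN name END)
                     OVER (ORDER BY id) AS grp
          FROM t) AS s
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

NULL rows and End rows are not counted, so the group number only increases when a new start value appears.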

Suppress repeating values on subsequent rows

item_no_parent  item_no_child  item_name  text
123             3              xxx        The item is resistant to water
123             5              yyy        The item is resistant to heat
123             6              zzz        The item is ....
I will give the parent item_no as input and retrieve the child item numbers. Then I have to check each child item's text: if an item has the same text as another item, I should not display its item_name; otherwise I should.
The row_number() analytic function is a neat way of implementing such distinct queries:
SELECT item_name
FROM (SELECT item_name,
             ROW_NUMBER() OVER (PARTITION BY text ORDER BY 1) AS rn
      FROM items
      WHERE item_no_parent = 123)
WHERE rn = 1
EDIT:
Some explanation, as requested in the comments - row_number is an analytic function (sometimes also referred to as a windowing function). It returns one result per row of input (like a row function), but takes into account all the other rows too (like an aggregate function). In this case, row_number simply returns the number of current row (i.e., a simple counter). This counting is done per different value of text (the partition by clause). row_number requires an order by clause so it knows in which order to count these rows. Since here we don't care about which row (per different value of text) comes first, I simply order by a constant 1.
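The behaviour described above can be reproduced with the following sketch, run on an in-memory SQLite database from Python. The table contents are invented, the parent column is assumed to be called item_no_parent, and the window is ordered by item_no_child rather than by a constant so that the surviving row per text is predictable; SQLite (3.25 or later) stands in for Oracle.

```python
import sqlite3

# Invented sample data; two child items share the same text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "item_no_parent INTEGER, item_no_child INTEGER, item_name TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (123, 3, "xxx", "The item is resistant to water"),
        (123, 5, "yyy", "The item is resistant to heat"),
        (123, 6, "zzz", "The item is resistant to water"),  # repeats child 3's text
    ],
)

# Keep only the first row per distinct text (rn = 1).
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT item_name
        FROM (SELECT item_name,
                     ROW_NUMBER() OVER (PARTITION BY text
                                        ORDER BY item_no_child) AS rn
              FROM items
              WHERE item_no_parent = 123)
        WHERE rn = 1
        ORDER BY item_name
        """
    )
]

print(names)
```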

Manually specify starting value for Row_Number()

I want to define the start of ROW_NUMBER() as 3258170 instead of 1.
I am using the following SQL query
SELECT ROW_NUMBER() over(order by (select 3258170)) as 'idd'.
However, the above query is not working. By not working I mean it executes, but the numbering does not start from 3258170. Can somebody help me?
The reason I want to specify the row number is that I am inserting rows from one table into another. In the first table the last record's row number is 3258169, and when I insert new records I want their row numbers to start from 3258170.
Just add the value to the result of row_number():
select 3258170 - 1 + row_number() over (order by (select NULL)) as idd
The order by clause of row_number() is specifying what column is used for the order by. By specifying a constant there, you are simply saying "everything has the same value for ordering purposes". It has nothing, nothing at all to do with the first value chosen.
To avoid confusion, I replaced the constant value with NULL. In SQL Server, I have observed that this assigns a sequential number without actually sorting the rows -- an observed performance advantage, but not one that I've seen documented, so we can't depend on it.
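A small sketch of the offset arithmetic, run on an in-memory SQLite database from Python. The table src and its name column are hypothetical, and SQLite (3.25 or later) stands in for SQL Server; the window is ordered by a real column here so the output is deterministic.

```python
import sqlite3

# Hypothetical source table with three rows to be numbered.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (name TEXT)")
conn.executemany("INSERT INTO src VALUES (?)", [("x",), ("y",), ("z",)])

# row_number() always starts at 1, so add (start - 1) to shift the first value.
rows = conn.execute(
    """
    SELECT 3258170 - 1 + ROW_NUMBER() OVER (ORDER BY name) AS idd, name
    FROM src
    ORDER BY idd
    """
).fetchall()

print(rows)
```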
I feel this is easier
ROW_NUMBER() OVER(ORDER BY Field) - 1 AS FieldAlias (To start from 0)
ROW_NUMBER() OVER(ORDER BY Field) + 3258169 AS FieldAlias (To start from 3258170)
Sometimes ROW_NUMBER() may not be the best solution, especially when there could be duplicate records in the underlying data set (for JOIN queries etc.). This may result in more rows being returned than expected. You may consider creating a SEQUENCE, which in some cases can be considered a cleaner solution.
For example:
CREATE SEQUENCE myRowNumberId
START WITH 1
INCREMENT BY 1
GO
SELECT NEXT VALUE FOR myRowNumberId AS 'idd' -- your query
GO
DROP SEQUENCE myRowNumberId; -- just to clean-up after ourselves
GO
The downside is that sequences may be difficult to use in complex queries with DISTINCT, window functions etc. See the complete sequence documentation for details.
I had a situation where I was importing a hierarchical structure into an application where a seq number had to be unique within each hierarchical level and start at 110 (for ease of subsequent manual insertion). The data beforehand looked like this...
Level  Prod        Type  Component      Quantity     Seq
1      P00210005   R     NZ1500         57.90000000  120
1      P00210005   C     P00210005M     1.00000000   120
2      P00210005M  R     M/C Operation  20.00000000  110
2      P00210005M  C     P00210006      1.00000000   110
2      P00210005M  C     P00210007      1.00000000   110
I wanted the row_number() function to generate the new sequence numbers, but adding 10 and then multiplying by 10 did not work as expected, because multiplication is evaluated before addition. To force the order of the arithmetic you have to enclose the addition, including the entire row_number() call with its partition clause, in brackets.
So, my solution for this problem was
,10*(10+row_number() over (partition by Level order by Type desc, [Seq] asc)) [NewSeq]
Note the position of the brackets to allow the multiplication to occur after the addition.
Level  Prod        Type  Component      Quantity     [Seq]  [NewSeq]
1      P00210005   R     NZ1500         57.90000000  120    110
1      P00210005   C     P00210005M     1.00000000   120    120
2      P00210005M  R     M/C Operation  20.00000000  110    110
2      P00210005M  C     P00210006      1.00000000   110    120
2      P00210005M  C     P00210007      1.00000000   110    130
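The effect of the brackets can be checked with this sketch, which computes the expression with and without them on an in-memory SQLite database from Python. The table name bom and the shortened column names lvl, typ, and seq are hypothetical, and SQLite (3.25 or later) stands in for SQL Server.

```python
import sqlite3

# Hypothetical table mirroring the Level / Type / Component / Seq columns above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bom (lvl INTEGER, typ TEXT, component TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO bom VALUES (?, ?, ?, ?)",
    [
        (1, "R", "NZ1500", 120),
        (1, "C", "P00210005M", 120),
        (2, "R", "M/C Operation", 110),
        (2, "C", "P00210006", 110),
        (2, "C", "P00210007", 110),
    ],
)

# new_seq: addition forced first by the brackets, then multiplied by 10.
# no_brackets: multiplication binds tighter, so this is 10 + (rn * 10).
rows = conn.execute(
    """
    SELECT lvl,
           10 * (10 + ROW_NUMBER() OVER (PARTITION BY lvl
                                         ORDER BY typ DESC, seq ASC)) AS new_seq,
           10 + ROW_NUMBER() OVER (PARTITION BY lvl
                                   ORDER BY typ DESC, seq ASC) * 10 AS no_brackets
    FROM bom
    ORDER BY lvl, new_seq
    """
).fetchall()

for row in rows:
    print(row)
```

The bracketed form yields 110, 120, 130 within each level, while the unbracketed form yields 20, 30, 40.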
ROW_NUMBER() OVER(ORDER BY Field) + 2862717 AS FieldAlias (To start from 2862718)