SQL (PL/SQL): number a set of rows

I attended an interview recently, and the interviewer asked me to number the occurrences of 'A', 'B', 'C' and so on. In terms of tables and columns: there is a table tab with a column col, and the values in col are 'A', 'B', 'C', etc.
create table tab226 (col varchar2(3) );
insert into tab226 VALUES ('A');
insert into tab226 VALUES ('B');
insert into tab226 VALUES ('C');
insert into tab226 VALUES ('B');
insert into tab226 VALUES ('A');
insert into tab226 VALUES ('C');
insert into tab226 VALUES ('C');
insert into tab226 VALUES ('A');
insert into tab226 VALUES ('B');
The expected output is:
The interviewer told me I could use SQL or PL/SQL. I thought about it for almost 10 minutes but couldn't come up with a plan, let alone a solution. Does anyone know whether this can be done in Oracle SQL or PL/SQL?

It doesn't make much sense to me, but would this do?
SQL> select col,
count(*) over (partition by col order by rowid) exp_output
from tab226
order by rowid;
COL EXP_OUTPUT
--- ----------
A 1
B 1
C 1
B 2
A 2
C 2
C 3
A 3
B 3
9 rows selected.
SQL>

As written, you cannot accomplish what they are asking for consistently. The problem is that SQL tables represent unordered sets, so there is no way to run a query on the original data and recover the insertion order.
However, the final column appears to simply be an enumeration, so you can use row_number() for that:
select col, row_number() over (partition by col order by NULL)
from tab226;
But if you have an ordering column (say id in this example), then you would do:
select col, row_number() over (partition by col order by id)
from tab226
order by id;
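A runnable sketch of the row_number() approach, using Python's built-in sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later). The id column is a hypothetical addition that records insertion order explicitly; without such a column the numbering order is not well defined.

```python
import sqlite3


def number_occurrences(values):
    """Number each occurrence of a value in insertion order (1, 2, 3, ...)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical id column: it records insertion order explicitly.
    conn.execute("CREATE TABLE tab226 (id INTEGER PRIMARY KEY, col TEXT)")
    conn.executemany("INSERT INTO tab226 (col) VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT col,
               ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS exp_output
        FROM tab226
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows
```

Called with the question's nine values, it reproduces the A/B/C numbering shown in the first answer.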

Related

Select rows that do not have the opposite value in another row

I am trying to write a query that selects only the rows that do not have an opposite value. For example, if a column (Payment) has two negative values (-11) and three positive values (11), the two negative values cancel out two of the positive ones and one positive value remains. I may be explaining this badly, but any help is appreciated.
Table:
CREATE TABLE hamzachecks(
ID VARCHAR(2) NOT NULL
-- CHECK is a reserved word in SQL Server, so the column is named chk (as in the answers)
,chk VARCHAR(10) NOT NULL
,Payment VARCHAR(8) NOT NULL
);
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('1','9549549544','-112.96');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('2','9549549544','-112.96');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('3','9549549544','112.96');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('4','9549549544','112.96');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('5','9549549544','-165.92');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('6','9549549544','225.92');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('7','9549549544','-299.3');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('8','9549549544','-299.3');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('9','9549549544','-299.3');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('10','9549549544','299.3');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('11','9549549544','299.3');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('12','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('13','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('14','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('15','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('16','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('17','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('18','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('19','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('20','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('21','9549549544','-415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('22','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('23','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('24','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('25','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('26','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('27','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('28','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('29','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('30','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('31','9549549544','415.14');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('32','9549549544','-1024.22');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('33','9549549544','1024.22');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('34','9549549578','-253.77');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('35','9549549578','253.77');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('36','9549549578','-3332.16');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('37','9549549578','-6664.29');
INSERT INTO hamzachecks(ID,chk,Payment) VALUES ('38','9549549578','6664.29');
The basic logic is the same as @Marty's answer: assign a row number to each row within the same chk/Payment combination.
SELECT chk
,id
,Payment
,Row_Number() Over (PARTITION BY chk, Payment ORDER BY Id) AS rn
FROM hamzachecks
order by
chk
-- remove the leading '-'; for numeric data use abs(Payment)
,substring(Payment, charindex('-', Payment)+1, 8000)
,rn
,Payment
If there's a matching row, both rows share the same rn:
...
9549549544 3 112.96 1 -- 1st group
9549549544 1 -112.96 1 -- matching value: remove
9549549544 4 112.96 2 -- 2nd group
9549549544 2 -112.96 2 -- matching value: remove
...
9549549544 10 299.3 1 -- 1st group
9549549544 7 -299.3 1 -- matching value: remove
9549549544 11 299.3 2 -- 2nd group
9549549544 8 -299.3 2 -- matching value: remove
9549549544 9 -299.3 3 -- 3rd group, no matching value: keep
...
Now remove the groups with two rows using aggregation:
;WITH cte AS
(
SELECT chk
,id
,Payment
,Row_Number() Over (PARTITION BY chk, Payment ORDER BY Id) AS rn
FROM hamzachecks
)
SELECT
Min(id)
,chk
,max(Payment)
FROM cte
GROUP BY
chk
-- remove the leading '-'; for numeric data use abs(Payment)
,substring(Payment, charindex('-', Payment)+1, 8000)
,rn
HAVING count(*) = 1
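A minimal sketch of the rn-plus-aggregation approach, again using Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25+ for window functions). SQLite has no CHARINDEX, so ltrim(payment, '-') is swapped in to strip the sign; the DATA list is only a compressed form of the question's 38 INSERT statements, and the function name is hypothetical.

```python
import sqlite3

# The question's rows as (chk, payment, repeat count); ids 1..38 are
# assigned in this order, matching the original INSERT statements.
DATA = [
    ("9549549544", "-112.96", 2), ("9549549544", "112.96", 2),
    ("9549549544", "-165.92", 1), ("9549549544", "225.92", 1),
    ("9549549544", "-299.3", 3), ("9549549544", "299.3", 2),
    ("9549549544", "-415.14", 10), ("9549549544", "415.14", 10),
    ("9549549544", "-1024.22", 1), ("9549549544", "1024.22", 1),
    ("9549549578", "-253.77", 1), ("9549549578", "253.77", 1),
    ("9549549578", "-3332.16", 1),
    ("9549549578", "-6664.29", 1), ("9549549578", "6664.29", 1),
]


def unmatched_payments():
    conn = sqlite3.connect(":memory:")
    # Payment stays text, as in the question.
    conn.execute("CREATE TABLE hamzachecks (id INTEGER PRIMARY KEY, chk TEXT, payment TEXT)")
    rows = [(chk, pay) for chk, pay, times in DATA for _ in range(times)]
    conn.executemany("INSERT INTO hamzachecks (chk, payment) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH cte AS (
            SELECT chk, id, payment,
                   ROW_NUMBER() OVER (PARTITION BY chk, payment ORDER BY id) AS rn
            FROM hamzachecks
        )
        SELECT MIN(id), chk, MAX(payment)
        FROM cte
        -- ltrim(payment, '-') drops the sign, like the SUBSTRING/CHARINDEX expression
        GROUP BY chk, ltrim(payment, '-'), rn
        HAVING COUNT(*) = 1   -- a group of two is a cancelling +/- pair
        ORDER BY MIN(id)
        """
    ).fetchall()
    conn.close()
    return result
```

On the question's data this leaves ids 5, 6, 9 and 36: the unmatched -165.92 and 225.92, the third -299.3, and -3332.16 on the second check.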
I'm going to ignore possible considerations for why you want to do this particular job in this particular way.
In short, the following SQL gives the result you are asking for:
;with b (id, payment, rn) as
(select id, payment, row_number() over (partition by Payment order by Id) as rn
from hamzachecks
where payment < 0),
a (id, payment, rn) as
(select id, payment, row_number() over (partition by Payment order by Id) as rn
from hamzachecks
where payment > 0),
matched_ids (id1,id2) as
(select a.id id1, b.id id2
from a
join b on a.payment = -b.payment and a.rn = b.rn)
select *
from hamzachecks
where id not in (select id1 from matched_ids union select id2 from matched_ids)
order by id
This requires a change to your schema: define the Payment column as MONEY (or another numeric type) instead of VARCHAR(8).
This is a starting point for further work, as this solution ignores the CHECK column.
Explanation: first we split the source table into two groups: a, above zero (payment > 0), and b, below zero (payment < 0). Each group gets an extra column, row_number() ..., which gives each row a "rowID" based on its position among the rows with the same payment (partition by Payment); it basically says "this is the n-th row with this Payment". The join then matches each row with the row in the other group that has the opposite Payment and the same rowID, so we now know all rows that have an opposite value. The main select ignores those rows and returns only the ones that don't have an opposite value.
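The matched-pairs idea can be sketched the same way with Python's sqlite3 (SQLite 3.25+). Two assumptions differ from the answer: payment is stored as REAL, standing in for the MONEY type the answer recommends, and chk is added to the partition and the join so that only payments on the same check cancel out. The names DATA and unmatched_payments_by_pairing are hypothetical.

```python
import sqlite3

# The question's rows as (chk, payment, repeat count); ids 1..38 follow this order.
DATA = [
    ("9549549544", "-112.96", 2), ("9549549544", "112.96", 2),
    ("9549549544", "-165.92", 1), ("9549549544", "225.92", 1),
    ("9549549544", "-299.3", 3), ("9549549544", "299.3", 2),
    ("9549549544", "-415.14", 10), ("9549549544", "415.14", 10),
    ("9549549544", "-1024.22", 1), ("9549549544", "1024.22", 1),
    ("9549549578", "-253.77", 1), ("9549549578", "253.77", 1),
    ("9549549578", "-3332.16", 1),
    ("9549549578", "-6664.29", 1), ("9549549578", "6664.29", 1),
]


def unmatched_payments_by_pairing():
    conn = sqlite3.connect(":memory:")
    # Numeric payment column, as the answer recommends (REAL stands in for MONEY).
    conn.execute("CREATE TABLE hamzachecks (id INTEGER PRIMARY KEY, chk TEXT, payment REAL)")
    rows = [(chk, float(pay)) for chk, pay, times in DATA for _ in range(times)]
    conn.executemany("INSERT INTO hamzachecks (chk, payment) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH b AS (  -- below zero
            SELECT id, chk, payment,
                   ROW_NUMBER() OVER (PARTITION BY chk, payment ORDER BY id) AS rn
            FROM hamzachecks WHERE payment < 0
        ),
        a AS (  -- above zero
            SELECT id, chk, payment,
                   ROW_NUMBER() OVER (PARTITION BY chk, payment ORDER BY id) AS rn
            FROM hamzachecks WHERE payment > 0
        ),
        matched_ids AS (  -- n-th positive paired with n-th opposite negative
            SELECT a.id AS id1, b.id AS id2
            FROM a JOIN b
              ON a.chk = b.chk AND a.payment = -b.payment AND a.rn = b.rn
        )
        SELECT id, chk, payment
        FROM hamzachecks
        WHERE id NOT IN (SELECT id1 FROM matched_ids UNION SELECT id2 FROM matched_ids)
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result
```

Both sketches agree on the survivors (ids 5, 6, 9 and 36), which is a quick sanity check that the pairing and the grouping formulations are equivalent on this data.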

Order guarantee for identity assignment in multi-row insert in SQL Server

When using a Table Value Constructor (http://msdn.microsoft.com/en-us/library/dd776382(v=sql.100).aspx) to insert multiple rows, is the order of any identity column populated guaranteed to match the rows in the TVC?
E.g.
CREATE TABLE A (a int identity(1, 1), b int)
INSERT INTO A(b) VALUES (1), (2)
Does the engine guarantee that the values of a are assigned in the same order as b, i.e. in this case a=1, b=1 and a=2, b=2?
Piggybacking on my comment above: an INSERT ... SELECT ... ORDER BY is guaranteed to generate identity values in the order given by the ORDER BY (#4 in this blog).
So, assuming you want identity generation to be based on CategoryId, you can use the table value constructor as follows to accomplish your goal (I'm not sure whether this satisfies your other constraints):
insert into thetable(CategoryId, CategoryName)
select *
from
(values
(101, 'Bikes'),
(103, 'Clothes'),
(102, 'Accessories')
) AS Category(CategoryID, CategoryName)
order by CategoryId
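A sketch of the same INSERT ... SELECT ... ORDER BY pattern in SQLite via Python's sqlite3, with INTEGER PRIMARY KEY standing in for the IDENTITY column. SQLite names the columns of a bare VALUES list column1, column2, so those replace the Category(CategoryID, CategoryName) alias. This only illustrates the pattern; what SQLite does here says nothing about SQL Server's guarantees.

```python
import sqlite3


def insert_categories_in_order():
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY stands in for SQL Server's IDENTITY column.
    conn.execute(
        "CREATE TABLE thetable (id INTEGER PRIMARY KEY, CategoryId INTEGER, CategoryName TEXT)"
    )
    # A bare VALUES list in SQLite exposes its columns as column1, column2, ...
    conn.execute(
        """
        INSERT INTO thetable (CategoryId, CategoryName)
        SELECT column1, column2
        FROM (VALUES (101, 'Bikes'), (103, 'Clothes'), (102, 'Accessories'))
        ORDER BY column1
        """
    )
    rows = conn.execute(
        "SELECT id, CategoryId, CategoryName FROM thetable ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```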
It depends. This holds as long as you insert the records in one shot. For example, if after inserting you delete the record where a=2 and then re-insert the value b=2, the identity column's value will be the last identity value issued plus the increment (3 here), not max(a)+1.
To demonstrate:
DECLARE @Sample TABLE
(a int identity(1, 1), b int)
Insert into @Sample values (1),(2)
Select * from @Sample
a b
1 1
2 2
Delete from @Sample where a=2
Insert into @Sample values (2)
Select * from @Sample
a b
1 1
3 2
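The same experiment can be sketched in SQLite through Python's sqlite3, where two kinds of key behave differently: AUTOINCREMENT never reuses a value, like the identity column in the output above, while a plain INTEGER PRIMARY KEY normally takes max(a)+1. The function and table names are hypothetical.

```python
import sqlite3


def reinsert_after_delete(autoincrement):
    conn = sqlite3.connect(":memory:")
    # AUTOINCREMENT remembers the largest key ever issued; plain INTEGER PRIMARY KEY does not.
    key = "INTEGER PRIMARY KEY AUTOINCREMENT" if autoincrement else "INTEGER PRIMARY KEY"
    conn.execute(f"CREATE TABLE sample (a {key}, b INTEGER)")
    conn.execute("INSERT INTO sample (b) VALUES (1), (2)")
    conn.execute("DELETE FROM sample WHERE a = 2")
    conn.execute("INSERT INTO sample (b) VALUES (2)")
    rows = conn.execute("SELECT a, b FROM sample ORDER BY a").fetchall()
    conn.close()
    return rows
```

With autoincrement=True the re-inserted row gets a=3, matching the SQL Server output; with autoincrement=False it gets a=2.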

Select records with order of IN clause

I have
SELECT * FROM Table1 WHERE Col1 IN(4,2,6)
I want to return the records in the order I specify in the IN clause
(first the records with Col1=4, then Col1=2, ...).
I can use
SELECT * FROM Table1 WHERE Col1 = 4
UNION ALL
SELECT * FROM Table1 WHERE Col1 = 2
UNION ALL ...
but I don't want to do that, because I want to put this in a stored procedure rather than generate the query.
I know it's a bit late, but one way to do it is:
SELECT *
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,6')
Or
SELECT CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,6') AS s_order,
*
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY s_order
You have a couple of options. The simplest may be to put the IN parameters (they are parameters, right?) into a separate table, together with the order in which you received them, and ORDER BY that column.
The solution is along this line:
SELECT * FROM Table1
WHERE Col1 IN(4,2,6)
ORDER BY
CASE Col1
WHEN 4 THEN 1
WHEN 2 THEN 2
WHEN 6 THEN 3
END
-- scratch table: the requested values ([in]) and their positions ([order])
select top 0 0 'in', 0 'order' into #i
insert into #i values(4,1)
insert into #i values(2,2)
insert into #i values(6,3)
select t.* from Table1 t inner join #i i on i.[in] = t.[col1] order by i.[order]
Replace the IN values with a table that includes a sort-order column to be used in the query (and be sure to expose the sort order to the calling application):
WITH OtherTable (Col1, sort_seq)
AS
(
SELECT Col1, sort_seq
FROM (
VALUES (4, 1),
(2, 2),
(6, 3)
) AS OtherTable (Col1, sort_seq)
)
SELECT T1.Col1, O1.sort_seq
FROM Table1 AS T1
INNER JOIN OtherTable AS O1
ON T1.Col1 = O1.Col1
ORDER
BY sort_seq;
In your stored proc, rather than using a CTE, split the values into a table (a scratch base table, a temp table, a function that returns a table, etc.) with the sort column populated appropriately.
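A minimal sketch of the scratch-table approach using Python's sqlite3: the requested values are written into a temp table together with their positions, and the query orders by that position column. Table1 and OtherTable mirror the names above; the helper function is hypothetical.

```python
import sqlite3


def select_in_list_order(table_values, wanted):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (Col1 INTEGER)")
    conn.executemany("INSERT INTO Table1 (Col1) VALUES (?)", [(v,) for v in table_values])
    # Scratch table: each requested value with its 1-based position in the list.
    conn.execute("CREATE TEMP TABLE OtherTable (Col1 INTEGER, sort_seq INTEGER)")
    conn.executemany(
        "INSERT INTO OtherTable VALUES (?, ?)",
        [(v, i) for i, v in enumerate(wanted, start=1)],
    )
    rows = conn.execute(
        """
        SELECT T1.Col1, O1.sort_seq
        FROM Table1 AS T1
        JOIN OtherTable AS O1 ON T1.Col1 = O1.Col1
        ORDER BY O1.sort_seq
        """
    ).fetchall()
    conn.close()
    return rows
```

The join doubles as the IN filter, so values not in the list (1, 3 and 5 in the usage below) simply drop out.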
I have found another solution. It's similar to the answer from onedaywhen, but it's a little shorter.
SELECT sort.n, Table1.Col1
FROM (VALUES (1, 4), (2, 2), (3, 6)) AS sort(pos, n)
JOIN Table1
ON Table1.Col1 = sort.n
ORDER BY sort.pos -- without an ORDER BY the join's output order is not guaranteed
I am thinking about this problem in two different ways because I can't decide whether it is a programming problem or a data architecture problem. Check out the code below, which uses "famous" TV animals. Let's say that we are tracking dolphins, horses, bears, dogs and orangutans. We want to return only the horses, bears, and dogs, with bears sorting ahead of horses and horses ahead of dogs. I personally prefer to treat this as an architecture problem, but I can see it as a programming problem too. Let me know if you have questions.
CREATE TABLE #AnimalType (
AnimalTypeId INT NOT NULL PRIMARY KEY
, AnimalType VARCHAR(50) NOT NULL
, SortOrder INT NOT NULL)
INSERT INTO #AnimalType VALUES (1,'Dolphin',5)
INSERT INTO #AnimalType VALUES (2,'Horse',2)
INSERT INTO #AnimalType VALUES (3,'Bear',1)
INSERT INTO #AnimalType VALUES (4,'Dog',4)
INSERT INTO #AnimalType VALUES (5,'Orangutan',3)
CREATE TABLE #Actor (
ActorId INT NOT NULL PRIMARY KEY
, ActorName VARCHAR(50) NOT NULL
, AnimalTypeId INT NOT NULL)
INSERT INTO #Actor VALUES (1,'Benji',4)
INSERT INTO #Actor VALUES (2,'Lassie',4)
INSERT INTO #Actor VALUES (3,'Rin Tin Tin',4)
INSERT INTO #Actor VALUES (4,'Gentle Ben',3)
INSERT INTO #Actor VALUES (5,'Trigger',2)
INSERT INTO #Actor VALUES (6,'Flipper',1)
INSERT INTO #Actor VALUES (7,'CJ',5)
INSERT INTO #Actor VALUES (8,'Mr. Ed',2)
INSERT INTO #Actor VALUES (9,'Tiger',4)
/* If you believe this is a programming problem then this code works */
SELECT *
FROM #Actor a
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY case when a.AnimalTypeId = 3 then 1
when a.AnimalTypeId = 2 then 2
when a.AnimalTypeId = 4 then 3 end
/* If you believe that this is a data architecture problem then this code works */
SELECT *
FROM #Actor a
JOIN #AnimalType at ON a.AnimalTypeId = at.AnimalTypeId
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY at.SortOrder
DROP TABLE #Actor
DROP TABLE #AnimalType
ORDER BY CHARINDEX(','+convert(varchar,status)+',' ,
',rejected,active,submitted,approved,')
Just put a comma before and after the string you are searching in (the second parameter of CHARINDEX), and surround the first parameter with commas as well.
That way one value cannot match inside a longer one (for example, 2 inside 12).
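The effect of the surrounding commas can be checked with Python's sqlite3. SQLite's instr(haystack, needle) stands in for SQL Server's CHARINDEX(needle, haystack) (note the reversed argument order), and the sample values are made up to trigger a false match: without delimiters, 2 and 21 both report position 1 in '21,1,2', so 2 would sort ahead of 1.

```python
import sqlite3


def order_by_position(table_values, csv_list):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (Col1 INTEGER)")
    conn.executemany("INSERT INTO Table1 (Col1) VALUES (?)", [(v,) for v in table_values])
    # Undelimited search: a short value can match inside a longer one.
    naive = dict(
        conn.execute(
            "SELECT Col1, instr(?, CAST(Col1 AS TEXT)) FROM Table1", (csv_list,)
        ).fetchall()
    )
    # Delimited search: wrap both the list and the value in commas.
    ordered = conn.execute(
        """
        SELECT Col1
        FROM Table1
        WHERE instr(',' || ? || ',', ',' || CAST(Col1 AS TEXT) || ',') > 0
        ORDER BY instr(',' || ? || ',', ',' || CAST(Col1 AS TEXT) || ',')
        """,
        (csv_list, csv_list),
    ).fetchall()
    conn.close()
    return naive, [row[0] for row in ordered]
```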

SQL query for selecting only first occurrences of rows with same data in the first column

Is there a neat SQL query that returns only the first occurrence among rows that share the same value in the first column? That is, if I have rows like
blah something
blah somethingelse
foo blah
bar blah
foo hello
the query should give me the first, third and fourth rows (because the first row is the first occurrence of "blah" in the first column, the third row is the first occurrence of "foo", and the fourth row is the first occurrence of "bar").
I'm using H2 database engine, if that matters.
Update: sorry about the unclear table definition; here it is more clearly. "blah", "foo" etc. denote the value of the first column in each row.
blah [rest of columns of first row]
blah [rest of columns of second row]
foo [-""- third row]
bar [-""- fourth row]
foo [-""- fifth row]
If by "first" you mean alphabetically first on column 2, here is some SQL to get those rows:
create table #tmp (
c1 char(20),
c2 char(20)
)
insert #tmp values ('blah','something')
insert #tmp values ('blah','somethingelse')
insert #tmp values ('foo','ahhhh')
insert #tmp values ('foo','blah')
insert #tmp values ('bar','blah')
insert #tmp values ('foo','hello')
select c1, min(c2) c2 from #tmp
group by c1
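An analytic (window) function could do the trick:
Select *
from (
Select row_number() over (partition by c1 order by c2) as myRank, t.* -- order by whichever column defines "first"
from myTable t ) ranked
where myRank = 1
But at the time of writing, this was only a priority-2 item on the H2 roadmap for V1.3.x: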
http://www.h2database.com/html/roadmap.html?highlight=RANK&search=rank#firstFound
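A sketch of the window-function version in Python's sqlite3 (SQLite 3.25+), with ROW_NUMBER() in place of RANK() so that exactly one row per c1 survives. The id column is a hypothetical addition: "first occurrence" is only meaningful with a column that records order.

```python
import sqlite3


def first_occurrences(rows):
    conn = sqlite3.connect(":memory:")
    # Hypothetical id column records insertion order, which defines "first".
    conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, c1 TEXT, c2 TEXT)")
    conn.executemany("INSERT INTO myTable (c1, c2) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, c1, c2
        FROM (
            SELECT ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY id) AS myRank, t.*
            FROM myTable t
        ) ranked
        WHERE myRank = 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result
```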
I think this does what you want but I'm not 100% sure. (Based on MS SQL Server too.)
create table #t
(
PKCol int identity(1,1),
Col1 varchar(200)
)
Insert Into #t
Values ('blah something')
Insert Into #t
Values ('blah something else')
Insert Into #t
Values ('foo blah')
Insert Into #t
Values ('bar blah')
Insert Into #t
Values ('foo hello')
Select t.*
From #t t
Join (
Select min(PKCol) as 'IDToSelect'
From #t
Group By Left(Col1, CharIndex(space(1), col1))
)q on t.PKCol = q.IDToSelect
drop table #t
If you are interested in the fastest possible query, it is important to have an index on the first column of the table, so that the query processor can scan the values from that index. The fastest solution is then probably an 'outer' query that gets the distinct c1 values, plus an 'inner' (nested) query that gets one of the possible values of the second column:
drop table test;
create table test(c1 char(20), c2 char(20));
create index idx_c1 on test(c1);
-- insert some data (H2 specific)
insert into test select 'bl' || (x/1000), x from system_range(1, 100000);
-- the fastest query (64 ms)
select c1, (select i.c2 from test i where i.c1=o.c1 limit 1) from test o group by c1;
-- the shortest query (385 ms)
select c1, min(c2) c2 from test group by c1;

Test the sequentiality of a column with a single SQL query

I have a table that contains sets of sequential data, like this:
ID set_ID some_column n
1 'set-1' 'aaaaaaaaaa' 1
2 'set-1' 'bbbbbbbbbb' 2
3 'set-1' 'cccccccccc' 3
4 'set-2' 'dddddddddd' 1
5 'set-2' 'eeeeeeeeee' 2
6 'set-3' 'ffffffffff' 2
7 'set-3' 'gggggggggg' 1
At the end of a transaction that makes several types of modifications to those rows, I would like to ensure that within a single set, all the values of "n" are still sequential (rollback otherwise). They do not need to be in the same order according to the PK, just sequential, like 1-2-3 or 3-1-2, but not like 1-3-4 or 1-2-3-3-4.
Because there might be thousands of rows within a single set, I would prefer to do the check in the database, to avoid the overhead of fetching the data just to verify it after making some small changes.
Also there is the issue of concurrency. The way locking in InnoDB (repeatable read) works (as I understand) is that if I have an index on "n" then InnoDB also locks the "gaps" between values. If I combine set_ID and n to a single index, would that eliminate the problem of phantom rows appearing?
Looks to me like a common problem. Any brilliant ideas?
Thanks!
Note: using MySQL + InnoDB
Look for sequences where max - min + 1 > count:
IF EXISTS (SELECT set_ID
FROM mytable
GROUP BY set_ID
HAVING MAX(n) - MIN(n) + 1 > COUNT(n)
)
ROLLBACK
If the sequence must start at 1, do this instead:
IF EXISTS (SELECT set_ID
FROM mytable
GROUP BY set_ID
HAVING MIN(n) <> 1 OR MAX(n) <> COUNT(n)
)
ROLLBACK
You also need to avoid duplicate sequence numbers. But this can be done by creating a unique key on set_ID and n.
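A sketch of both checks in Python's sqlite3, with a few hypothetical extra sets (set-4 to set-6) added to exercise a gap, a duplicate, and a set that does not start at 1. COUNT(DISTINCT n) <> COUNT(*) is included so that duplicates are caught even without the unique key; in a real transaction you would roll back whenever the returned list is non-empty.

```python
import sqlite3

ROWS = [
    (1, "set-1", 1), (2, "set-1", 2), (3, "set-1", 3),
    (4, "set-2", 1), (5, "set-2", 2),
    (6, "set-3", 2), (7, "set-3", 1),
    # Hypothetical extra sets, one per failure mode:
    (8, "set-4", 1), (9, "set-4", 3), (10, "set-4", 4),     # gap
    (11, "set-5", 1), (12, "set-5", 2), (13, "set-5", 2),   # duplicate
    (14, "set-6", 2), (15, "set-6", 3),                     # does not start at 1
]


def bad_sets(must_start_at_one):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (ID INTEGER PRIMARY KEY, set_ID TEXT, n INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    if must_start_at_one:
        # Valid iff the values are exactly 1..COUNT(*) with no repeats.
        having = "MIN(n) <> 1 OR MAX(n) <> COUNT(*) OR COUNT(DISTINCT n) <> COUNT(*)"
    else:
        # Valid iff the values form one unbroken run with no repeats.
        having = "MAX(n) - MIN(n) + 1 <> COUNT(*) OR COUNT(DISTINCT n) <> COUNT(*)"
    result = conn.execute(
        f"SELECT set_ID FROM mytable GROUP BY set_ID HAVING {having} ORDER BY set_ID"
    ).fetchall()
    conn.close()
    return [row[0] for row in result]
```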
Try this:
IF EXISTS (SELECT set_ID
FROM mytable
GROUP BY set_ID
HAVING MIN(n) <> 1 OR MAX(n) <> COUNT(*) OR COUNT(DISTINCT n) <> COUNT(*)
)
ROLLBACK
This works on SQL Server (I don't have MySQL to try it out):
DECLARE @YourTable table (ID int, set_ID char(5), some_column char(10),n int)
INSERT @YourTable VALUES (1, 'set-1' ,'aaaaaaaaaa' ,1)
INSERT @YourTable VALUES (2, 'set-1' ,'bbbbbbbbbb' ,2)
INSERT @YourTable VALUES (3, 'set-1' ,'cccccccccc' ,3)
INSERT @YourTable VALUES (4, 'set-2' ,'dddddddddd' ,1)
INSERT @YourTable VALUES (5, 'set-2' ,'eeeeeeeeee' ,2)
INSERT @YourTable VALUES (6, 'set-3' ,'ffffffffff' ,2)
INSERT @YourTable VALUES (7, 'set-3' ,'gggggggggg' ,1)
INSERT @YourTable VALUES (8, 'set-3' ,'ffffffffff' ,4)
INSERT @YourTable VALUES (9, 'set-3' ,'ffffffffff' ,4)
--this will list all "bad" sets: not starting at 1, with a gap, or with a duplicate n
SELECT set_ID
FROM @YourTable
GROUP BY set_ID
HAVING MIN(n) <> 1 OR MAX(n) <> COUNT(*) OR COUNT(DISTINCT n) <> COUNT(*)
OUTPUT:
set_ID
------
set-3