How to split and display distinct letters from a word in SQL? - sql

Yesterday in a job interview session I was asked this question and I had no clue about it. Suppose I have a word "Manhattan " I want to display only the letters 'M','A','N','H','T'
in SQL. How to do it?
Any help is appreciated.

Well, here is my solution (sqlfiddle) - it aims to use a "Relational SQL" operations, which may have been what the interviewer was going for conceptually.
Most of the work done is simply to turn the string into a set of (pos, letter) records as the relevant final applied DQL is a mere SELECT with a grouping and ordering applied.
select letter
from (
-- All of this just to get a set of (pos, letter)
select ns.n as pos, substring(ss.s, ns.n, 1) as letter
from (select 'MANHATTAN' as s) as ss
cross join (
-- Or use another form to create a "numbers table"
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9)) as X(n)
) as ns
) as pairs
group by letter -- guarantees distinctness
order by min(pos) -- ensure output is ordered MANHT
The above query works in SQL Server 2008, but the "Numbers Table" may have to be altered for other vendors. Otherwise, there is nothing used that is vendor specific - no CTE, or cross application of a function, or procedural language code ..
That being said, the above is to show a conceptual approach - SQL is designed for use with sets and relations and multiplicity across records; the above example is, in some sense, merely a perversion of such.
Examining the intermediate relation,
select ns.n as pos, substring(ss.s, ns.n, 1) as letter
from (select 'MANHATTAN' as s) as ss
cross join (
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9)) as X(n)
) as ns
uses a cross join to generate the Cartesian product of the string (1 row) with the numbers (9 rows); the substring function is then applied with the string and each number to obtain each character in accordance with its position. The resulting set contains the records-
POS LETTER
1 M
2 A
3 N
..
9 N
Then the outer select groups each record according to the letter and the resulting records are ordered by the minimum (first) occurrence position of the letter that establishing the grouping. (Without the order by the letters would have been distinct but the final order would not be guaranteed.)

One way (if using SQL Server) is with a recursive CTE (Commom Table Expression).
DECLARE #source nvarchar(100) = 'MANHATTAN'
;
WITH cte AS (
SELECT SUBSTRING(#source, 1, 1) AS c1, 1 as Pos
WHERE LEN(#source) > 0
UNION ALL
SELECT SUBSTRING(#source, Pos + 1, 1) AS c1, Pos + 1 as Pos
FROM cte
WHERE Pos < LEN(#source)
)
SELECT DISTINCT c1 from cte
SqlFiddle for this is here. I had to inline the #source for SqlFiddle, but the code above works fine in Sql Server.
The first SELECT generates the initial row(in this case 'M', 1). The second SELECT is the recursive part that generates the subsequent rows, with the Pos column getting incremented each time until the termination condition WHERE Pos < LEN(#source) is finally met. The final select removes the duplicates. Internally, SELECT DISTINCT sorts the rows in order to facilitate the removal of duplicates, which is why the final output happens to be in alphabetic order. Since you didn't specify order as a requirement, I left it as-is. But you could modify it to use a GROUP instead, that ordered on MIN(Pos) if you needed the output in the characters' original order.
This same technique can be used for things like generating all the Bigrams for a string, with just a small change to the general structure above.

declare #charr varchar(99)
declare #lp int
set #charr='Manhattan'
set #lp=1
DECLARE #T1 TABLE (
FLD VARCHAR(max)
)
while(#lp<=LEN(#charr))
begin
if(not exists(select * from #T1 where FLD=(select SUBSTRING(#charr,#lp,1))))
begin
insert into #T1
select SUBSTRING(#charr,#lp,1)
end
set #lp=#lp+1
end
select * from #T1
check this it may help u

Here's an Oracle version of #user2864740's answer. The only difference is how you construct the "numbers table" (plus slight differences in aliasing)
select letter
from (
select ns.n as pos, substr(ss.s, ns.n, 1) as letter
from (select 'MANHATTAN' as s from dual) ss
cross join (
SELECT LEVEL as n
FROM DUAL
CONNECT BY LEVEL <= 9
ORDER BY LEVEL) ns
) pairs
group by letter
order by min(pos)

Related

Count Similar Substrings SQL query

I've tried a few scenarios and googled a lot, but still can't find a solution.
I have a table of user names with entries something like the below:
UserName
Cakes420
18Jack01
18Jack04
16Jack22
22Jack16
Mapple7609
Chrom44
chrom22
chrom77
013Cake
016Cake
122Cake
123Cake87
So I need a query that checks for all records that share 4 or more (in sequence) characters in the table.
So I need to return something like :
Characters
Times Used
Names Sharing
Cake
5
Cakes420, 013Cake, 016Cake, 122Cake, 123Cake87
Chro
3
Chrom44, chrom22, chrom77
or anything similar as I'd prefer not to repeat patterns, but hey, at this stage if it returns the values properly, I don't mind.
The shared characters can naturally appear in any place in the string, which is what makes this so difficult.
Should you do this in T-SQL? Probably not.
Can you do this in T-SQL? Yes.
Sample data
create table Names
(
Name nvarchar(20)
);
insert into Names (Name) values
('Cakes420'),
('18Jack01'),
('18Jack04'),
('16Jack22'),
('22Jack16'),
('Mapple7609'),
('Chrom44'),
('chrom22'),
('chrom77'),
('013Cake'),
('016Cake'),
('122Cake'),
('123Cake87');
Solution
Using STRING_AGG() for easy concatenation. Available from SQL Server 2017. Alternatives available for older SQL versions (use the search box on this site, there are many examples).
with rcte as
(
select n.Name,
convert(nvarchar(4), substring(n.Name, 1, 4)) as Part,
1 as PartFrom
from Names n
where len(n.Name) >= 4
union all
select r.Name,
convert(nvarchar(4), substring(r.Name, r.PartFrom+1, r.PartFrom+4)),
r.PartFrom+1
from rcte r
where len(r.Name) >= r.PartFrom+4
),
cte_count as
(
select r.Part,
count(1) as PartCount
from rcte r
where r.Part not like '%[0-9]%' -- exclude parts with numbers in them
group by r.Part
having count(1) > 1
)
select c.Part,
c.PartCount,
string_agg(r.Name, ', ') as Names
from cte_count c
join rcte r
on r.Part = c.Part
group by c.Part,
c.PartCount
order by c.Part;
Result
Part PartCount Names
---- --------- ----------------------------------------------
Cake 5 Cakes420, 123Cake87, 122Cake, 016Cake, 013Cake
Chro 3 Chrom44, chrom22, chrom77
hrom 3 chrom77, chrom22, Chrom44
Jack 4 22Jack16, 16Jack22, 18Jack04, 18Jack01
Fiddle to see it in action with the intermediate CTE results.
Let's use Itzik Ben-Gan's Tally Function to break out a list of substrings, then group them. This is called N-Gram, after the more common Trigram which is 3-character substrings.
I've removed one extra cross-join from the function to speed it up slightly, it's now good for up to varchar(65536):
CREATE OR ALTER FUNCTION dbo.GetNums(#num AS BIGINT)
RETURNS TABLE
AS
RETURN
WITH
L0 AS ( SELECT 1 AS c
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L2 )
SELECT TOP(#num)
rownum AS rn
FROM Nums
ORDER BY rownum;
GO
DECLARE #substringLen int = 4;
SELECT
Characters,
[Times Used] = COUNT(*),
[Names Sharing] = STRING_AGG(Username, ', ')
FROM (
SELECT DISTINCT
-- remove DISTINCT if you want to know about multiple in a single username
t.Username,
Characters = SUBSTRING(t.Username, n.rn, #substringLen)
FROM myTable t
CROSS APPLY dbo.GetNums (LEN(t.UserName) - #substringLen + 1) n
) t
GROUP BY t.Characters
HAVING COUNT(*) > 1

SQL Server 2008: duplicate a row n-times, where n is a value in a field

In SQL Server 2018 I have three tables:
T1 (idService, dateStart, dateStop)
T2 (idService, totalCostOfService)
T3 (idService, companyName)
Using joins, I created a view:
V1 (idService, dateStart, dateStop, totalCostOfService, companyName)
And we are fine. I can do my selects on the view and obtain the list of services done.
What I would like to do now is to duplicate every row of the view n times, where n=dateStart-dateStop; every row should have a "new" totalCostOfService = totalCostOfService/n.
I can do that using a temporary table, declaring variables, insert in temp using some while etc. etc. Let's call it "the procedure"
But what I would like to understand is:
is it possibile to do that directly with a select on V1? If not, is it possible to save "the procedure" as a view so that I can have it as a easy select?
Sorry if my question looks somewhat stupid, but I'm totally new with SQL. I tried searching here and on google but I couldn't find what an answer to my questions.
Thank you!
Rather than an rCTE (which is RBAR), you could use a Tally Table:
WITH N AS (
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1
CROSS JOIN N N2 --100
CROSS JOIN N N3 --1000
CROSS JOIN N N4) --10000
SELECT *
FROM YourTable
JOIN Tally T ON T.I <= dateStart-dateStop --Assumes dateStart and DateStop are integer values, even though their name implies otherwise
--If they are dates, then use DATEDIFF(DAY, dateStart, dateEnd)
That tally will generate numbers up to 10000 (which over 27 years worth of days. That should be far more than enough).
I will assume the existence of a numbers table which has the column val for the individual value numbers. If you don't, you will find plenty by searching around.
Add this in the end of the FROM clause of your view:
cross apply (select datediff(day,T1.dateStart,T1.dateStop)+1 as n_days)q1 -- number of days INCLUDING start
cross apply (select dateadd(day,T1.dateStart,n.val) as day_of_charge)q2 from numbers n where n.val between 0 and n_days-1)
Then you will be able to have the following field on your SELECT:
T2.totalCostOfService/n_days as totalCostOfService
I'll add a numbers table solution shortly.
You can use a recursive CTE:
with cte as (
select idService, dateStart, dateStop,
totalCostOfService / (datediff(day, datestop, datestart) + 1) as dailyCostOfService,
companyName
from v1
union all
select idService,
dateadd(day, 1, dateStart),
dateStop,
dailyCostOfService
companyName
from cte
)
select idservice, dateStart as dateOfService,
dailyCostOfService, companyName
from cte;
Note that if there are more than 100 days in any row, then you will need to add OPTION (MAXRECURSION 0).

SQL to find missing numbers in sequence starting from min?

I have found several examples of SQL queries which will find missing numbers in a sequence. For example this one:
Select T1.val+1
from table T1
where not exists(select val from table T2 where T2.val = T1.val + 1);
This will only find gaps in an existing sequence. I would like to find gaps in a sequence starting from a minimum.
For example, if the values in my sequence are 2, 4 then the query above will return 3,5.
I would like to specify that my sequence must start at 0 so I would like the query to return 0,1,3,5.
How can I add the minimum value to my query?
A few answers to questions below:
There is no maximum, only a minimum
The DB is oracle
This is quite easy in Postgres:
select x.i as missing_sequence_value
from (
select i
from generate_series(0,5) i -- 0,5 are the lower and upper bounds you want
) x
left join the_table t on t.val = x.i
where t.val is null;
SQLFiddle: http://www.sqlfiddle.com/#!15/acb07/1
Edit
The Oracle solution is a bit more complex because generating the numbers requires a workaround
with numbers as (
select level - 1 as val
from dual
connect by level <= (select max(val) + 2 from the_table) -- this is the maximum
), number_range as (
select val
from numbers
where val >= 0 -- this is the minimum
)
select nr.val as missing_sequence_value
from number_range nr
left join the_table t on t.val = nr.val
where t.val is null;
SQLFiddle: http://www.sqlfiddle.com/#!4/71584/4
The idea (in both cases) is to generate a list of numbers you are interested in (from 0 to 5) and then doing an outer join against the values in your table. The rows where the outer join does not return something from your table (that's the condition where t.val is null) are the values that are missing.
The Oracle solution requires two common table expressions ("CTE", the "with" things) because you can't add a where level >= x in the first CTE that generates the numbers.
Note that the connect by level <= ... is relies on an undocumented (and unsupported) way of using connect by. But so many people are using that to get a "number generator" that I doubt that Oracle will actually remove this.
If you have to option to use a common table expression you can generate a sequence of numbers and use that as the source of numbers.
The variables #start and #end defines the range of numbers (you could easily use max(val) from yourtable as end instead).
This example is for MS SQL Server (but CTEs is a SQL 99 feature supported by many databases):
declare #start int, #end int
select #start=0, #end=5
;With sequence(num) as
(
select #start as num
union all
select num + 1
from sequence
where num < #end
)
select * from sequence seq
where not exists(select val from YourTable where YourTable.val = seq.num)
Option (MaxRecursion 1000)
Sample SQL Fiddle
Just presenting 2 small variations to suggestions above, both are for Oracle.
First is a small variation to one presented by jpw using a recursive Common Table Expression (CTE); simply to demonstrate that this technique is also available in Oracle.
WITH
seq (val)
AS (
SELECT 0 FROM dual
UNION ALL
SELECT val + 1
FROM seq
WHERE val < (
SELECT MAX(val) FROM the_table -- note below
)
)
SELECT
seq.val AS missing_sequence_value
FROM seq
LEFT JOIN the_table t
ON seq.val = t.val
WHERE t.val IS NULL
ORDER BY
missing_sequence_value
;
This variation at SQLfiddle
Notable difference to SQL Server: you can use a subquery to limit the recursion
Also, Oracle documentation often refers to Subquery Factoring e.g. subquery_factoring_clause::= instead of CTE
Second is a variation on the use of connect by level as used by a_horse_with_no_name
Level is a pseudo column available in Oracle's hierarchical queries and the root of a hierarchy is 1. When using connect by level, by default this will commence at 1
For this variation I just wished to demonstrate that it does not need to be coupled with CTE's at all, and hence the syntax can be quite concise.
SELECT
seq.val AS missing_sequence_value
FROM (
SELECT
level - 1 AS val
FROM dual
CONNECT BY LEVEL <= (SELECT max(val) FROM the_table)
) seq
LEFT JOIN the_table t
ON seq.val = t.val
WHERE t.val IS NULL
ORDER BY
missing_sequence_value
;
This variation at SQLfiddle

Returning the lowest integer not in a list in SQL

Supposed you have a table T(A) with only positive integers allowed, like:
1,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18
In the above example, the result is 10. We always can use ORDER BY and DISTINCT to sort and remove duplicates. However, to find the lowest integer not in the list, I came up with the following SQL query:
select list.x + 1
from (select x from (select distinct a as x from T order by a)) as list, T
where list.x + 1 not in T limit 1;
My idea is start a counter and 1, check if that counter is in list: if it is, return it, otherwise increment and look again. However, I have to start that counter as 1, and then increment. That query works most of the cases, by there are some corner cases like in 1. How can I accomplish that in SQL or should I go about a completely different direction to solve this problem?
Because SQL works on sets, the intermediate SELECT DISTINCT a AS x FROM t ORDER BY a is redundant.
The basic technique of looking for a gap in a column of integers is to find where the current entry plus 1 does not exist. This requires a self-join of some sort.
Your query is not far off, but I think it can be simplified to:
SELECT MIN(a) + 1
FROM t
WHERE a + 1 NOT IN (SELECT a FROM t)
The NOT IN acts as a sort of self-join. This won't produce anything from an empty table, but should be OK otherwise.
SQL Fiddle
select min(y.a) as a
from
t x
right join
(
select a + 1 as a from t
union
select 1
) y on y.a = x.a
where x.a is null
It will work even in an empty table
SELECT min(t.a) - 1
FROM t
LEFT JOIN t t1 ON t1.a = t.a - 1
WHERE t1.a IS NULL
AND t.a > 1; -- exclude 0
This finds the smallest number greater than 1, where the next-smaller number is not in the same table. That missing number is returned.
This works even for a missing 1. There are multiple answers checking in the opposite direction. All of them would fail with a missing 1.
SQL Fiddle.
You can do the following, although you may also want to define a range - in which case you might need a couple of UNIONs
SELECT x.id+1
FROM my_table x
LEFT
JOIN my_table y
ON x.id+1 = y.id
WHERE y.id IS NULL
ORDER
BY x.id LIMIT 1;
You can always create a table with all of the numbers from 1 to X and then join that table with the table you are comparing. Then just find the TOP value in your SELECT statement that isn't present in the table you are comparing
SELECT TOP 1 table_with_all_numbers.number, table_with_missing_numbers.number
FROM table_with_all_numbers
LEFT JOIN table_with_missing_numbers
ON table_with_missing_numbers.number = table_with_all_numbers.number
WHERE table_with_missing_numbers.number IS NULL
ORDER BY table_with_all_numbers.number ASC;
In SQLite 3.8.3 or later, you can use a recursive common table expression to create a counter.
Here, we stop counting when we find a value not in the table:
WITH RECURSIVE counter(c) AS (
SELECT 1
UNION ALL
SELECT c + 1 FROM counter WHERE c IN t)
SELECT max(c) FROM counter;
(This works for an empty table or a missing 1.)
This query ranks (starting from rank 1) each distinct number in ascending order and selects the lowest rank that's less than its number. If no rank is lower than its number (i.e. there are no gaps in the table) the query returns the max number + 1.
select coalesce(min(number),1) from (
select min(cnt) number
from (
select
number,
(select count(*) from (select distinct number from numbers) b where b.number <= a.number) as cnt
from (select distinct number from numbers) a
) t1 where number > cnt
union
select max(number) + 1 number from numbers
) t1
http://sqlfiddle.com/#!7/720cc/3
Just another method, using EXCEPT this time:
SELECT a + 1 AS missing FROM T
EXCEPT
SELECT a FROM T
ORDER BY missing
LIMIT 1;

Ordering a SQL query based on the value in a column determining the value of another column in the next row

My table looks like this:
Value Previous Next
37 NULL 42
42 37 3
3 42 79
79 3 NULL
Except, that the table is all out of order. (There are no duplicates, so that is not an issue.) I was wondering if there was any way to make a query that would order the output, basically saying "Next row 'value' = this row 'next'" as it's shown above ?
I have no control over the database and how this data is stored. I am just trying to retrieve it and organize it. SQL Server I believe 2008.
I realize that this wouldn't be difficult to reorganize afterwards, but I was just curious if I could write a query that just did that out of the box so I wouldn't have to worry about it.
This should do what you need:
WITH CTE AS (
SELECT YourTable.*, 0 Depth
FROM YourTable
WHERE Previous IS NULL
UNION ALL
SELECT YourTable.*, Depth + 1
FROM YourTable JOIN CTE
ON YourTable.Value = CTE.Next
)
SELECT * FROM CTE
ORDER BY Depth;
[SQL Fiddle] (Referential integrity and indexes omitted for brevity.)
We use a recursive common table expression (CTE) to travel from the head of the list (WHERE Previous IS NULL) to the trailing nodes (ON YourTable.Value = CTE.Next) and at the same time memorize the depth of the recursion that was needed to reach the current node (in Depth).
In the end, we simply sort by the depth of recursion that was needed to reach each of the nodes (ORDER BY Depth).
Use a recursive query, with the one i list here you can have multiple paths along your linked list:
with cte (Value, Previous, Next, Level)
as
(
select Value, Previous, Next, 0 as Level
from data
where Previous is null
union all
select d.Value, d.Previous, d.Next, Level + 1
from data d
inner join cte c on d.Previous = c.Value
)
select * from cte
fiddle here
If you are using Oracle, try Starts with- connect by
select ... start with initial-condition connect by
nocycle recursive-condition;
EDIT: For SQL-Server, use WITH syntax as below:
WITH rec(value, previous, next) AS
(SELECT value, previous, next
FROM table1
WHERE previous is null
UNION ALL
SELECT nextRec.value, nextRec.previous, nextRec.next
FROM table1 as nextRec, rec
WHERE rec.next = nextRec.value)
SELECT value, previous, next FROM rec;
One way to do this is with a join:
select t.*
from t left outer join
t tnext
on t.next = tnext.val
order by tnext.value
However, won't this do?
select t.*
from t
order by t.next
Something like this should work:
With Parent As (
Select
Value,
Previous,
Next
From
table
Where
Previous Is Null
Union All
Select
t.Value,
t.Previous,
t.Next
From
table t
Inner Join
Parent
On Parent.Next = t.Value
)
Select
*
From
Parent
Example