Is it possible to group by a combination of rows? - sql

I have a result that gives me a range of values to query from my database:
Start  End
-----  ---
1      3
5      6
137    139
From those, I need to query the database for the records in that range, which might return something like:
Id   Name
---  ----
1    foo
2    bar
3    baz

Id   Name
---  ----
5    foo
6    baz

Id   Name
---  ----
137  foo
138  bar
139  baz
I want to group those results, keeping any one of the id ranges since they all correlate to the same thing. For example, 1-3 is the same as 137-139, so it would have a count of 2; the reported 'range' can be either of the two:
RangeStart  RangeEnd  Count
----------  --------  -----
137         139       2
5           6         1
Also note that order matters for the grouping, so foo/bar/baz is not the same as foo/baz/bar.
How can this be accomplished?
EDIT: I have the beginning result (start,end) and I only care about the end result (RangeStart,RangeEnd,Count). I don't actually need the intermediate results, I just use them as explanation.

Here are two queries:

The first one concatenates the strings into groups based on the ranges and then shows the first range for each group of strings. It also has the total number of times the string appeared.

The second one shows the concatenated strings and their respective totals.
Setup:

DECLARE @Tags TABLE (
    TagID INT,
    Tag VARCHAR(3)
)

INSERT @Tags
SELECT 1, 'Foo' UNION ALL
SELECT 2, 'Bar' UNION ALL
SELECT 3, 'Baz' UNION ALL
SELECT 4, 'Foo' UNION ALL
SELECT 5, 'Bar' UNION ALL
SELECT 6, 'Baz'

DECLARE @Ranges TABLE (
    StartRange INT,
    EndRange INT
)

INSERT @Ranges
SELECT 1,3 UNION ALL
SELECT 2,3 UNION ALL
SELECT 3,4 UNION ALL
SELECT 4,6
Query To Show First Ranges and Results:
/* Get the first start and end ranges with a match and */
/* the total number of occurrences of that match       */
SELECT
    StartRange,
    EndRange,
    Total
FROM (
    SELECT
        StartRange,
        EndRange,
        Csv,
        ROW_NUMBER() OVER (PARTITION BY Csv ORDER BY StartRange ASC) AS RowNum,
        ROW_NUMBER() OVER (PARTITION BY Csv ORDER BY StartRange DESC) AS Total
    FROM (
        /* For each range and its associated Tag values, */
        /* concatenate the tags together using FOR XML   */
        /* and the STUFF function                        */
        SELECT
            StartRange,
            EndRange,
            (
                SELECT STUFF(
                    (SELECT ',' + Tag
                     FROM @Tags
                     WHERE TagID BETWEEN r.StartRange AND r.EndRange
                     ORDER BY TagID
                     FOR XML PATH('')), 1, 1, '')
            ) AS Csv
        FROM @Ranges r
    ) t1
) t2
WHERE RowNum = 1
ORDER BY StartRange, EndRange
/* Results */
StartRange  EndRange  Total
----------  --------  -----
1           3         2
2           3         1
3           4         1
Query to show concatenated strings and totals:
/* Get the concatenated tags and their respective totals */
SELECT
    Csv,
    COUNT(*) AS Total
FROM (
    /* For each range and its associated Tag values, */
    /* concatenate the tags together using FOR XML   */
    /* and the STUFF function                        */
    SELECT
        StartRange,
        EndRange,
        (
            SELECT STUFF(
                (SELECT ',' + Tag
                 FROM @Tags
                 WHERE TagID BETWEEN r.StartRange AND r.EndRange
                 ORDER BY TagID
                 FOR XML PATH('')), 1, 1, '')
        ) AS Csv
    FROM @Ranges r
) t1
GROUP BY Csv
ORDER BY Csv
/* Results */
Csv          Total
-----------  -----
Bar,Baz      1
Baz,Foo      1
Foo,Bar,Baz  2
String concatenation method courtesy of Jeremiah Peschka
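On SQL Server 2017 or later, the FOR XML PATH / STUFF trick can usually be replaced with STRING_AGG. A minimal sketch of the second query under that assumption, reusing the same @Tags and @Ranges table variables (not part of the original answer):

/* SQL Server 2017+ variant: STRING_AGG instead of FOR XML PATH / STUFF */
SELECT
    Csv,
    COUNT(*) AS Total
FROM (
    SELECT
        r.StartRange,
        r.EndRange,
        (SELECT STRING_AGG(Tag, ',') WITHIN GROUP (ORDER BY TagID)
         FROM @Tags
         WHERE TagID BETWEEN r.StartRange AND r.EndRange) AS Csv
    FROM @Ranges r
) t1
GROUP BY Csv
ORDER BY Csv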

Related

SQL grouping by distinct values in a multi-value string column

I want to perform a group-by based on the distinct values in a string column that has multiple values. The column holds a list of strings in a standard format, separated by commas. The potential values are only a, b, c, d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a          | 2
b          | 3
c          | 2
d          | 1
For all of the below I used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a, b, c, d, you can try one of these:
Take note that this only works if you have no values that are substrings of one another (like test and test_new), because then test would also be joined with all the test_new rows and the count would not match.
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a          | 2
b          | 3
c          | 2
d          | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2       | 3       | 2       | 1
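A variant for MySQL that avoids the substring caveat above is FIND_IN_SET, which matches whole comma-separated items rather than arbitrary substrings. A minimal sketch against the same tmp table (not from the original answer):

select tb1.collection, COUNT(*) as count
from tmp
join (
    select "a" COLLATE utf8_general_ci as collection
    union select "b" COLLATE utf8_general_ci
    union select "c" COLLATE utf8_general_ci
    union select "d" COLLATE utf8_general_ci
) tb1
  on FIND_IN_SET(tb1.collection, tmp.test) > 0   -- match whole list items only
group by tb1.collection;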
What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is to JOIN, so let's CROSS JOIN your input table with a static table of consecutive numbers and take only the numbers less than or equal to the number of elements in the collection. Then we'll use the split_part function to read the item at the correct index. Once we have the exploded table, we'll do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where number of items matches
group by 1
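For completeness, a minimal sketch of the JSON variant mentioned above, using JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT (note the element index is 0-based, so the index table starts at 0; not from the original answer):

with
index as (
  select 0 as i
  union all select 1
  union all select 2
  union all select 3
),
agg as (
  select '["a","b"]' as collection
  union all select '["b","c"]'
  union all select '["b","c","a"]'
  union all select '["d"]'
)
select
  json_extract_array_element_text(agg.collection, index.i) as item,
  count(*)
from index, agg
where json_array_length(agg.collection) > index.i -- only indexes that exist in the array
group by 1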

Where clause between union all in sql?

I have a query that vertically expands data by using a UNION. Below are the 2 sample tables:
create table #temp1(_row_ord int,CID int,_data varchar(10))
insert #temp1
values
(1,1001,'text1'),
(2,1001,'text2'),
(4,1002,'text1'),
(5,1002,'text2')
create table #temp2(_row_ord int,CID int,_data varchar(10))
insert #temp2
values
(1,1001,'sample1'),
(2,1001,'sample2'),
(4,1002,'sample1'),
(5,1002,'sample2')
--My query
select * from #temp1
union
select * from #temp2 where CID in (select CID from #temp1)
order by _row_ord,CID
drop table #temp1,#temp2
So my current output is just the interleaved result of that query. I want to group the details of every client together, but I am unable to use a WHERE clause across the UNION to do that. My desired output has all rows for each CID grouped together.
Any help? ORDER BY alone is not helping me.
I can imagine you want all of the rows for a CID sorted by _row_ord from the first table before the ones from the second table. And the CID should be the outermost sort criteria.
If that's right, you can select literals from your tables. Let the literal for the first table be less than that of the second table. Then first sort by CID, then that literal and finally by _row_ord.
SELECT cid,
_data
FROM (SELECT 1 s,
_row_ord,
cid,
_data
FROM #temp1
UNION ALL
SELECT 2 s,
_row_ord,
cid,
_data
FROM #temp2) x
ORDER BY cid,
s,
_row_ord;
db<>fiddle
If I correctly understand your need, you need the output to be sorted the way that #temp1 rows appear before #temp2 rows for each cid value.
What you could do is generate additional column ordnum assigning values for each table, just for sorting purposes, and then get rid of it in the outer select statement.
select cid, _data
from (
select 1 as ordnum, *
from #temp1
union all
select 2 as ordnum, *
from #temp2 t2
where exists (
select 1
from #temp1 t1
where t1.cid = t2.cid
)
) q
order by cid, ordnum
I have also rewritten your WHERE condition into an equivalent form using the EXISTS operator, which should work faster.
Live DEMO - click me!
Output
cid   _data
1001  text1
1001  text2
1001  sample1
1001  sample2
1002  text1
1002  text2
1002  sample1
1002  sample2
Use WITH. Here is my first try with your SQL:
create table #temp1(_row_ord int,CID int,_data varchar(10))
insert #temp1
values
(1,1001,'text1'),
(2,1001,'text2'),
(4,1002,'text1'),
(5,1002,'text2')
create table #temp2(_row_ord int,CID int,_data varchar(10))
insert #temp2
values
(1,1001,'sample1'),
(2,1001,'sample2'),
(4,1002,'sample1'),
(5,1002,'sample2');
WITH result( _row_ord, CID,_data) AS
(
--My query
select * from #temp1
union
select * from #temp2 where CID in (select CID from #temp1)
)
select * from result order by CID, _data
drop table #temp1,#temp2
result
_row_ord  CID   _data
1         1001  sample1
2         1001  sample2
1         1001  text1
2         1001  text2
4         1002  sample1
5         1002  sample2
4         1002  text1
5         1002  text2
UNION is placed between two result set blocks and forms a single result set block. If you want a WHERE clause on a particular block, you can put it on that block:
select a from a where a = 1
union
select z from z

select a from a
union
select z from z where z = 1

select a from a where a = 1
union
select z from z where z = 1
The first query in a union defines column names in the output. You can wrap an output in brackets, alias it and do a where on the whole lot:
select * from
(
select a as newname from a where a = 1
union
select z from z where z = 2
) o
where o.newname = 3
It is important to note that a.a and z.z will combine into a new column, o.newname. As a result, saying where o.newname will filter on all rows from both a and z (the rows from z are also stacked into the newname column). The outer query knows only about o.newname, it knows nothing of a or z
Side note, the query above produces nothing because we know that only rows where a.a is 1 and z.z is 2 are output by the union as o.newname. This o.newname is then filtered to only output rows that are 3, but no rows are 3
select * from
(
select a as newname from a
union
select z from z
) o
where o.newname = 3
This query will pick up any rows in a or z where a.a is 3 or z.z is 3, thanks to the filtering of the resulting union

sql server: finding names with repetitive char

I want to find names with repeated characters in them, for example:
ana
asdbbiop
a1for1
#mail#ban
I found a similar question here, but that one was about consecutive repeated characters.
I know 2 ways:
1. using a helper table and a join
2. a very long WHERE ... LIKE statement
Is there any way to do it with regex?
This code will help you find the repeated characters in each word and the count of those characters.
Sample Data
DECLARE @Table TABLE (ID INT, String NVARCHAR(100))

INSERT INTO @Table
SELECT 1,'ana' UNION ALL
SELECT 2,'asdbbiop' UNION ALL
SELECT 3,'a1for1' UNION ALL
SELECT 4,'#mail#ban'

SELECT * FROM @Table
Using a recursive CTE we get the expected result:
;With cte
AS
(
SELECT ID, String
,CAST(LEFT(String,1)AS VARCHAR(10)) AS Letter
,RIGHT(String,LEN(String)-1) AS Remainder
FROM @Table
WHERE LEN(String)>1
UNION ALL
SELECT ID, String
,CAST(LEFT(Remainder,1)AS VARCHAR(10)) AS Letter
,RIGHT(Remainder,LEN(Remainder)-1) AS Remainder
FROM cte
WHERE LEn(Remainder)>0
)
SELECT ID,
String,
letter,
Ascii(letter) AS CharCode,
Count(letter) AS CountOfLetter
FROM cte
GROUP BY ID,
String,
letter,
Ascii(letter)
HAVING COUNT(letter)>1
Result
ID  String     letter  CharCode  CountOfLetter
--  ---------  ------  --------  -------------
1   ana        a       97        2
2   asdbbiop   b       98        2
3   a1for1     1       49        2
4   #mail#ban  #       64        2
4   #mail#ban  a       97        2
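As a side note, the "helper table and join" idea from the question can also be done without recursion by joining onto an inline numbers (tally) table. A minimal sketch against the same @Table variable, not from the original answer (sys.all_objects is just used here as a convenient row source):

SELECT t.ID,
       t.String,
       SUBSTRING(t.String, n.n, 1) AS Letter,
       COUNT(*) AS CountOfLetter
FROM @Table t
JOIN (SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) n          -- inline tally of 1..100
    ON n.n <= LEN(t.String)            -- one row per character position
GROUP BY t.ID, t.String, SUBSTRING(t.String, n.n, 1)
HAVING COUNT(*) > 1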

Fixing duplicate rows in a table

I have a table like below
DECLARE @ProductTotals TABLE
(
    id int,
    value nvarchar(50)
)
which has the following values:
1, 'abc'
2, 'abc'
1, 'abc'
3, 'abc'
I want to update this table so that it has the following values
1, 'abc'
2, 'abc_1'
1, 'abc'
3, 'abc_2'
Could someone help me out with this?
Use a cursor to move over the table and try to insert every row into a second temporary table. If you get a collision (technically detected with a SELECT), you can run a second query to get the maximum number (if any) that's appended to your item.
Once you know what maximum number is used (use ISNULL to cover the case of the first duplicate), just run an UPDATE on your original table and keep going with your scan.
Are you looking to remove duplicates, or just change the values so they aren't duplicates?
To change the values, use:
update producttotals
set value = 'abc_1'
where id =2;
update producttotals
set value = 'abc_2'
where id =3;
To find duplicate rows, do a
select id, value
from producttotals
group by id, value
having count(*) > 1;
Assuming SQL Server 2005 or greater
DECLARE @ProductTotals TABLE
(
    id int,
    value nvarchar(50)
)

INSERT INTO @ProductTotals
VALUES (1, 'abc'),
       (2, 'abc'),
       (1, 'abc'),
       (3, 'abc')

;WITH CTE as
    (SELECT
        ROW_NUMBER() OVER (Partition by value order by id) rn,
        id,
        value
    FROM
        @ProductTotals),
new_values as (
    SELECT
        pt.id,
        pt.value,
        pt.value + '_' + CAST( ROW_NUMBER() OVER (partition by pt.value order by pt.id) as varchar) new_value
    FROM
        @ProductTotals pt
        INNER JOIN CTE
            ON pt.id = CTE.id
            and pt.value = CTE.value
    WHERE
        pt.id NOT IN (SELECT id FROM CTE WHERE rn = 1)) --remove any with the lowest ID for the value
UPDATE
    pt
SET
    pt.value = nv.new_value
FROM
    @ProductTotals pt
    inner join new_values nv
        ON pt.id = nv.id and pt.value = nv.value

SELECT * FROM @ProductTotals
Will produce the following
id           value
-----------  --------------------------------------------------
1            abc
2            abc_1
1            abc
3            abc_2
Explanation of the SQL
The first CTE creates a row number per value, so the numbering restarts whenever it sees a new value:
rn    id    value
----  ----  -----
1     1     abc
2     1     abc
3     2     abc
4     3     abc
The second CTE, called new_values, ignores any IDs that are associated with an rn of 1. So rn 1 and rn 2 both get removed because they share the same ID. It also uses ROW_NUMBER() again to determine the number for the new_value:
id    value  new_value
----  -----  ---------
2     abc    abc_1
3     abc    abc_2
The final statement just updates the old value with the new value.
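For what it's worth, the same rename can be done by updating through a single CTE. A minimal sketch (not from the original answer), assuming the same @ProductTotals variable and using DENSE_RANK so that both rows with the lowest id keep the original value:

;WITH ranked AS (
    SELECT value,
           DENSE_RANK() OVER (PARTITION BY value ORDER BY id) AS dr
    FROM @ProductTotals
)
UPDATE ranked                                          -- updating through the CTE hits the base table
SET value = value + '_' + CAST(dr - 1 AS varchar(10))
WHERE dr > 1                                           -- leave the lowest-id group untouched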

Pivot values of a column based on a search string

Note: I would like to do this in a single SQL statement, not PL/SQL, a cursor loop, etc.
I have data that looks like this:
ID String
-- ------
01 2~3~1~4
02 0~3~4~6
03 1~4~5~1
I want to provide a report that somehow pivots the values of the String column into distinct rows such as:
Value  "Total number in table"
-----  -----------------------
1      3
2      1
3      2
4      3
5      1
6      1
How do I go about doing this? It's like a pivot table but I am trying to pivot the data in a column, rather than pivoting the columns in the table.
Note that in the real application, I do not actually know what the values of the String column are; I only know that the separator between values is '~'.
Given this test data:
CREATE TABLE tt (ID INTEGER, VALUE VARCHAR2(100));
INSERT INTO tt VALUES (1,'2~3~1~4');
INSERT INTO tt VALUES (2,'0~3~4~6');
INSERT INTO tt VALUES (3,'1~4~5~1');
This query:
SELECT VALUE, COUNT(*) "Total number in table"
FROM (SELECT tt.ID, SUBSTR(qq.value, sp, ep-sp) VALUE
FROM (SELECT id, value
, INSTR('~'||value, '~', 1, L) sp -- 1st posn of substr at this level
, INSTR(value||'~', '~', 1, L) ep -- posn of delimiter at this level
FROM tt JOIN (SELECT LEVEL L FROM dual CONNECT BY LEVEL < 20) q -- 20 is max #substrings
ON LENGTH(value)-LENGTH(REPLACE(value,'~'))+1 >= L
) qq JOIN tt on qq.id = tt.id)
GROUP BY VALUE
ORDER BY VALUE;
Results in:
VALUE       Total number in table
----------  ---------------------
0           1
1           3
2           1
3           2
4           3
5           1
6           1

7 rows selected
You can adjust the maximum number of items in your search string by adjusting the "LEVEL < 20" to "LEVEL < your_max_items".
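On Oracle 11g or later, a similar split can be written with REGEXP_COUNT and REGEXP_SUBSTR in place of the INSTR/SUBSTR arithmetic. A minimal sketch against the same tt table (not from the original answer, still capped at 20 items per string):

SELECT VALUE, COUNT(*) "Total number in table"
FROM (SELECT REGEXP_SUBSTR(t.value, '[^~]+', 1, q.L) VALUE   -- L-th '~'-separated token
      FROM tt t
      JOIN (SELECT LEVEL L FROM dual CONNECT BY LEVEL < 20) q
        ON q.L <= REGEXP_COUNT(t.value, '~') + 1              -- one row per token in the string
     )
GROUP BY VALUE
ORDER BY VALUE;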