T-SQL: counting '1' in a string, by position

There are category fields like this:
"101011111000000101010011000101..." Every position in this string represents a certain category when set to "1".
So "1" means set and "0" means not set.
I would like to count the "1"s for each category and order the categories by that count, descending.
My current solution looks like this:
SELECT COUNT(SUBSTRING([Interests], 1, 1)) AS xcount, 1 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 1, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 2, 1)) AS xcount, 2 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 2, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 3, 1)) AS xcount, 3 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 3, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 4, 1)) AS xcount, 4 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 4, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 5, 1)) AS xcount, 5 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 5, 1) = '1'
ORDER BY xcount DESC
Is there a better or faster way to count those categories?

SELECT SUM(CASE WHEN SUBSTRING([Interests], _ID.ID, 1) = '1' THEN 1 ELSE 0 END) AS xcount, _ID.ID
FROM [db1].[dbo].[Contacts], (VALUES (1),(2),(3),(4),(5)) AS _ID(ID)
GROUP BY _ID.ID
ORDER BY xcount DESC
For more categories, just extend the _ID sequence.
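If the list of positions gets long, it can be generated instead of typed out. Below is a minimal sketch of that idea, written in Python with the standard-library sqlite3 module rather than T-SQL. The table name contacts, the three sample rows, and the recursive CTE named positions are assumptions made for illustration, and SQLite's substr stands in for SUBSTRING.

```python
import sqlite3

# Hypothetical sample rows standing in for [db1].[dbo].[Contacts].[Interests].
rows = [("10101",), ("11100",), ("10001",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (interests TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?)", rows)

# The recursive CTE produces 1..n, replacing the hand-written VALUES list.
query = """
WITH RECURSIVE positions(id) AS (
    SELECT 1
    UNION ALL
    SELECT id + 1 FROM positions WHERE id < :n
)
SELECT SUM(CASE WHEN substr(c.interests, p.id, 1) = '1' THEN 1 ELSE 0 END) AS xcount,
       p.id
FROM contacts AS c CROSS JOIN positions AS p
GROUP BY p.id
ORDER BY xcount DESC, p.id
"""
counts = conn.execute(query, {"n": 5}).fetchall()
print(counts)  # list of (xcount, position) pairs, highest count first
```

The second sort key (p.id) is only there to make ties come out in a stable order.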

This will count the number of '1' characters in a string consisting only of 0 and 1:
declare @s varchar(100) = '101011111000000101010011000101';
select cnt = len(@s) - len(replace(@s, '0', ''))

Related

How to find the highest numbered version of text?

For example, I have data of the form text+digits:
Supra1, Supra2, ..., SupraN in column1 (translated_description).
select
*
from
oe.product_descriptions
where
translated_description like '%Supra%';
I need to extract the value from another column (column2) for the row with the highest number, e.g. N=30 for Supra30 in column1.

If all of the values in column1 have numbers with the same number of digits, you can order by it and use the fetch first syntax:
SELECT column2
FROM mytable
WHERE column1 LIKE 'Supra%'
ORDER BY column1 DESC
FETCH FIRST ROW ONLY
If the number of digits in column1 varies, you'll have to extract them, convert the number, and sort numerically:
SELECT column2
FROM mytable
WHERE column1 LIKE 'Supra%'
ORDER BY TO_NUMBER(REPLACE(column1, 'Supra', '')) DESC
FETCH FIRST ROW ONLY
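Why the number of digits matters can be seen in a small sketch. This is Python, not Oracle SQL, and the sample names are made up for illustration; supra_number is a hypothetical helper that mirrors TO_NUMBER(REPLACE(column1, 'Supra', '')).

```python
# Hypothetical values of column1 with a varying number of digits.
names = ["Supra1", "Supra2", "Supra10", "Supra30", "Supra9"]

# Plain string ordering compares character by character,
# so "Supra9" sorts after "Supra30" ('9' > '3').
lexicographic_max = max(names)

def supra_number(name: str) -> int:
    """Strip the prefix and convert the rest to a number."""
    return int(name.replace("Supra", ""))

# Ordering by the extracted number gives the intended result.
numeric_max = max(names, key=supra_number)

print(lexicographic_max, numeric_max)
```

With equal-width numbers (for example Supra01 to Supra30) both orderings would agree, which is the case the first query relies on.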

Try using regexp_substr to extract the number and then apply max on it:
SELECT max(to_number(regexp_substr(t.translated_description, 'Supra([0-9]+)', 1, 1, NULL, 1)))
FROM oe.product_descriptions t
This will extract the number, assuming that the format of the content of the column is SOMETEXTnumber

If there could be more than one Supra in the column, and if the numbers after the word Supra could be 1 or 2 digits long, then (just for testing) you could have data like this:
WITH
tbl AS
(
Select 1 "ID", '10GB Removable HDD ... Supra7 disk drives ... transfer rate up to 160MB/s' "COL_1" From Dual Union All
Select 2 "ID", 'Some words ... Supra9 some more words ... and numbers 16 - 32GB' "COL_1" From Dual Union All
Select 3 "ID", 'Words, words, ... Supra12, and Supra13 are considered better than Supra15... words, words' "COL_1" From Dual
),
In this case you should check where the Supra words are located within the string and get rid of everything in front. So, here is a CTE checking for up to three occurrences of the word Supra within the text:
supras AS
(
Select
ID,
CASE WHEN InStr(Upper(COL_1), 'SUPRA', 1, 1) > 0 THEN SubStr(COL_1, InStr(Upper(COL_1), 'SUPRA', 1, 1)) END "SUPRA_1",
CASE WHEN InStr(Upper(COL_1), 'SUPRA', 1, 2) > 0 THEN SubStr(COL_1, InStr(Upper(COL_1), 'SUPRA', 1, 2)) END "SUPRA_2",
CASE WHEN InStr(Upper(COL_1), 'SUPRA', 1, 3) > 0 THEN SubStr(COL_1, InStr(Upper(COL_1), 'SUPRA', 1, 3)) END "SUPRA_3"
From
tbl
)
ID SUPRA_1 SUPRA_2 SUPRA_3
---------- ----------------------------------------------------------------------------------------- ------------------------------------------------------------- --------------------------
1 Supra7 disk drives ... transfer rate up to 160MB/s
2 Supra9 some more words ... and numbers 16 - 32GB
3 Supra12, and Supra13 are considered better than Supra15... words, words Supra13 are considered better than Supra15... words, words Supra15... words, words
And, finally, this resulting dataset can be transformed to just numbers, among which you select the MAX one:
-- Main SQL
SELECT MAX(GREATEST(SUPRA_1, SUPRA_2, SUPRA_3)) "MAX_SUPRA"
FROM
(
Select
ID,
CASE WHEN INSTR('0123456789', SubStr(SUPRA_1, 6, 1)) > 0 And INSTR('0123456789', SubStr(SUPRA_1, 7, 1)) > 0 THEN To_Number(SubStr(SUPRA_1, 6, 2))
WHEN INSTR('0123456789', SubStr(SUPRA_1, 6, 1)) > 0 THEN To_Number(SubStr(SUPRA_1, 6, 1))
ELSE 0
END "SUPRA_1",
--
CASE WHEN INSTR('0123456789', SubStr(SUPRA_2, 6, 1)) > 0 And INSTR('0123456789', SubStr(SUPRA_2, 7, 1)) > 0 THEN To_Number(SubStr(SUPRA_2, 6, 2))
WHEN INSTR('0123456789', SubStr(SUPRA_2, 6, 1)) > 0 THEN To_Number(SubStr(SUPRA_2, 6, 1))
ELSE 0
END "SUPRA_2",
--
CASE WHEN INSTR('0123456789', SubStr(SUPRA_3, 6, 1)) > 0 And INSTR('0123456789', SubStr(SUPRA_3, 7, 1)) > 0 THEN To_Number(SubStr(SUPRA_3, 6, 2))
WHEN INSTR('0123456789', SubStr(SUPRA_3, 6, 1)) > 0 THEN To_Number(SubStr(SUPRA_3, 6, 1))
ELSE 0
END "SUPRA_3"
From
supras
)
MAX_SUPRA
----------
15
-- Below is result of the inner query in main sql
ID SUPRA_1 SUPRA_2 SUPRA_3
---------- ---------- ---------- ----------
1 7 0 0
2 9 0 0
3 12 13 15
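For comparison, the same extraction can be sketched without the limit of three occurrences or two digits. This is a Python sketch using the standard re module, not Oracle SQL; it reuses the three test strings from the tbl CTE, and supra_numbers is a hypothetical helper. Unlike the query above, rows without any Supra produce an empty list rather than 0.

```python
import re

# The three test strings from the tbl CTE.
texts = [
    "10GB Removable HDD ... Supra7 disk drives ... transfer rate up to 160MB/s",
    "Some words ... Supra9 some more words ... and numbers 16 - 32GB",
    "Words, words, ... Supra12, and Supra13 are considered better than Supra15... words, words",
]

def supra_numbers(text: str) -> list:
    """Return every number that directly follows 'Supra' (case-insensitive)."""
    return [int(n) for n in re.findall(r"supra(\d+)", text, flags=re.IGNORECASE)]

per_row = [supra_numbers(t) for t in texts]
max_supra = max(n for nums in per_row for n in nums)

print(per_row, max_supra)
```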

If you use SQL Server, you can try this query:
select ID as Column1,CONVERT(INT,SUBSTRING(ID,6, (LEN(ID)-5))) as Column2
from T
order by CONVERT(INT,SUBSTRING(ID,6, (LEN(ID)-5))) desc

SQL Server exclusive select on column value

Let's say I am returning the following table from a select:
CaseId DocId DocumentTypeId DocumentType     ExpirationDate
------ ----- -------------- ---------------- --------------
1      1     1              I797             01/02/23
1      2     2              I94              01/02/23
1      3     3              Some Other Value 01/02/23
I want to select ONLY the row with DocumentType = 'I797', then if there is no 'I797', I want to select ONLY the row where DocumentType = 'I94'; failing to find either of those two I want to take all rows with any other value of DocumentType.
Using SQL Server ideally.
I think I'm looking for an XOR clause but can't work out how to do that in SQL Server or to get all other values.

Similar to @siggemannen's answer:
select top 1 with ties
case when DocumentType='I797' then 1
when DocumentType='I94' then 2
else 3
end gr
,docs.*
from docs
order by
case when DocumentType='I797' then 1
when DocumentType='I94' then 2
else 3
end
Shortest:
select top 1 with ties
docs.*
from docs
order by
case when DocumentType='I797' then 1
when DocumentType='I94' then 2
else 3
end

Something like this perhaps:
select *
from (
select t.*, DENSE_RANK() OVER(ORDER BY CASE WHEN DocumentType = 'I797' THEN 0 WHEN DocumentType = 'I94' THEN 1 ELSE 2 END) AS prioorder
from
(
VALUES
(1, 1, 1, N'I797', N'01/02/23')
, (1, 2, 2, N'I94', N'01/02/23')
, (1, 3, 3, N'Some Other Value', N'01/02/23')
, (1, 4, 3, N'Super Sekret', N'01/02/23')
) t (CaseId,DocId,DocumentTypeId,DocumentType,ExpirationDate)
) x
WHERE x.prioorder = 1
The idea is to rank rows 1, 2, 3 depending on document type. Since we rank "the rest" the same, you will get all rows if both I797 and I94 are missing.
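The three cases (I797 present, only I94 present, neither present) can be checked with a small sketch. It is written in Python with the standard-library sqlite3 module, not SQL Server, and assumes SQLite 3.25 or later for window functions. The docs table, the pick helper, and the trimmed column list are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (CaseId INT, DocId INT, DocumentTypeId INT, DocumentType TEXT)"
)

# Same ranking idea as above: the best available priority always gets dense rank 1.
RANKED = """
SELECT DocId, DocumentType FROM (
    SELECT d.*,
           DENSE_RANK() OVER (
               ORDER BY CASE DocumentType WHEN 'I797' THEN 0 WHEN 'I94' THEN 1 ELSE 2 END
           ) AS prioorder
    FROM docs AS d
) AS x
WHERE x.prioorder = 1
ORDER BY DocId
"""

def pick(rows):
    """Reload the hypothetical docs table with `rows` and run the ranked query."""
    conn.execute("DELETE FROM docs")
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", rows)
    return conn.execute(RANKED).fetchall()

all_rows = [
    (1, 1, 1, "I797"),
    (1, 2, 2, "I94"),
    (1, 3, 3, "Some Other Value"),
    (1, 4, 3, "Super Sekret"),
]
with_i797 = pick(all_rows)         # I797 present
without_i797 = pick(all_rows[1:])  # only I94 and others
neither = pick(all_rows[2:])       # only other document types
print(with_i797, without_i797, neither)
```

Note that this ranks across the whole table. If the result should be chosen per case, a PARTITION BY CaseId inside the OVER clause would probably be needed.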

select * from YourTable where DocumentType = 'I797'
union
select * from YourTable t where DocumentType = 'I94' and (not exists (select * from YourTable where DocumentType = 'I797'))
union
select * from YourTable t where (not exists (select * from YourTable where DocumentType = 'I797' or DocumentType = 'I94' ))

BigQuery SQL query to Indicate a sequence of 3 rows sharing the same value

I need a query that, every time the indicator column turns to zero and there are 3 zeros in a row, assigns those rows a unique group number.
Here is some sample data:
select 0 as offset, 1 as indicator, -1 as grp union all
select 1, 1, -1 union all
select 2, 1, -1 union all
select 3, 1, -1 union all
select 4, 1, -1 union all
select 5, 1, -1 union all
select 6, 1, -1 union all
select 7, 0, 1 union all
select 8, 0, 1 union all
select 9, 0, 1 union all
select 10, 1, -1 union all
select 11, 0, 2 union all
select 12, 0, 2 union all
select 13, 0, 2 union all
select 14, 1, -1 union all
select 15, 1, -1 union all
select 16, 1, -1
In this example there are two sequences of 3 zeros, indicated as grp=1 and grp=2.

Consider the approach below:
select offset, indicator, if(grp = 0, -1, grp) as grp
from (
select offset, indicator, dense_rank() over(order by pregroup) - 1 as grp
from (
select offset, indicator,
if(countif(indicator = 0) over(partition by pregroup) = 3 and indicator = 0, pregroup, -1) as pregroup
from (
select offset, indicator, count(*) over win - countif(indicator = 0) over win as pregroup
from your_table
window win as (order by offset)
)
)
)
This was applied to a slightly modified version of the sample data in your question (with a sequence of 4 zeros, just for test purposes); the output image is not reproduced here.
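The expected labels can be cross-checked outside BigQuery with a short sketch. This is Python, not BigQuery SQL, and label_zero_runs is a hypothetical helper. It assumes, like the countif(...) = 3 test in the query above, that only runs of exactly three zeros count as a group.

```python
from itertools import groupby

def label_zero_runs(indicators, run_length=3):
    """Number each run of exactly `run_length` zeros; every other row gets -1."""
    labels = []
    next_group = 1
    for value, run in groupby(indicators):
        size = len(list(run))
        if value == 0 and size == run_length:
            labels.extend([next_group] * size)
            next_group += 1
        else:
            labels.extend([-1] * size)
    return labels

# The indicator column from the question, ordered by offset.
indicators = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
labels = label_zero_runs(indicators)
print(labels)
```

A run of four zeros gets -1 throughout under this assumption; if longer runs should count as well, the size test would have to change to size >= run_length.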

The below query solves this.
Firstly it assigns all of the desired groups a tag.
Secondly, we get the row number for them and use integer casting on row_number to assign them a unique group number.
with data as (select 0 as offset, 1 as indicator, -1 as grp union all
select 1, 1, -1 union all
select 2, 1, -1 union all
select 3, 1, -1 union all
select 4, 1, -1 union all
select 5, 1, -1 union all
select 6, 1, -1 union all
select 7, 0, 1 union all
select 8, 0, 1 union all
select 9, 0, 1 union all
select 10, 1, -1 union all
select 11, 0, 2 union all
select 12, 0, 2 union all
select 13, 0, 2 union all
select 14, 1, -1 union all
select 15, 1, -1 union all
select 16, 1, -1 ),
tagged as (select
*,
-- mark as part of the group if both indicators in front, both indicators behind, or one indicator in front and behind are 0.
case
when indicator = 0 and lead(indicator) over(order by offset) = 0 and lead(indicator, 2) over(order by offset) = 0 then true
when indicator = 0 and lead(indicator) over(order by offset) = 0 and lag(indicator) over(order by offset) = 0 then true
when indicator = 0 and lag(indicator) over(order by offset) = 0 and lag(indicator, 2) over(order by offset) = 0 then true
else false
end as part_of_group
from data),
group_tags as (
select
*,
-- use cast as int to acquire the group number from the row number
CAST((row_number() over(order by offset) + 1)/3 AS INT) as group_tag
from
tagged
where
part_of_group = true)
-- rejoin this data back together
select
d.*,
gt.group_tag
from data as d
left join
group_tags as gt
on
d.offset = gt.offset

You may consider the approach below as well:
WITH partitions AS (
SELECT *, indicator = 0 AND COUNT(div) OVER (PARTITION BY div, indicator) = 3 AS flag
FROM (
SELECT *, SUM(indicator) OVER (ORDER BY offset) AS div FROM sample_data
)
)
SELECT offset, indicator, IF(flag, DENSE_RANK() OVER w, -1) AS grp
FROM partitions
WINDOW w AS (PARTITION BY CASE WHEN flag THEN 0 ELSE 1 END ORDER BY div)
ORDER BY offset;

Fewest number of buckets to bag elements in bigquery

I have a matrix with buckets and elements like below. If an element can fit in a bucket, the corresponding cell is 1.
For example: if you look at the image, element x can fit in buckets a, b and c, but not in d and e.
I want to find the fewest buckets needed to group my elements. In this case, buckets c and d could group all the elements in just two buckets.
Any idea if I can do this in BigQuery dynamically and efficiently? The original data is not as simple as this.
select "element-x" as element , 1 as bucketa, 1 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
union all
select "element-y" as element , 0 as bucketa, 0 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
union all
select "element-z" as element , 1 as bucketa, 0 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
union all
select "element-p" as element , 0 as bucketa, 0 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete
union all
select "element-q" as element , 1 as bucketa, 0 as bucketb, 0 as bucketc, 1 as bucketd, 0 as buckete
union all
select "element-r" as element , 0 as bucketa, 1 as bucketb, 0 as bucketc, 1 as bucketd, 1 as buckete

Consider the solution below. You need to provide accurate data in the matrix CTE and adjust the buckets_elements CTE accordingly so that it reflects all buckets in the matrix. The rest of the CTEs and the final query will do the work for you!
with matrix as (
select "element-x" as element, 1 as bucketa, 1 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete union all
select "element-y", 0, 0, 1, 0, 0 union all
select "element-z", 1, 0, 1, 0, 0 union all
select "element-p", 0, 0, 1, 0, 0 union all
select "element-q", 1, 0, 0, 1, 0 union all
select "element-r", 0, 1, 0, 1, 1
), buckets_elements as (
select array[struct(a), struct(b), struct(c), struct(d), struct(e)] buckets
from (
select
array_agg(if(bucketa = 1, element, null) ignore nulls) a,
array_agg(if(bucketb = 1, element, null) ignore nulls) b,
array_agg(if(bucketc = 1, element, null) ignore nulls) c,
array_agg(if(bucketd = 1, element, null) ignore nulls) d,
array_agg(if(buckete = 1, element, null) ignore nulls) e
from matrix
)
), columns_names as (
select
regexp_extract_all(to_json_string((select as struct * except(element) from unnest([t]))), r'"([^"]+)"') cols
from matrix t limit 1
), columns_index as (
select generate_array(0, array_length(cols) - 1) as arr
from columns_names
), buckets_combinations as (
select
(select array_agg(
case when n & (1<<pos) <> 0 then arr[offset(pos)] end
ignore nulls)
from unnest(generate_array(0, array_length(arr) - 1)) pos
) as combo
from columns_index cross join
unnest(generate_array(1, cast(power(2, array_length(arr)) - 1 as int64))) n
)
select
array(select cols[offset(i)] from columns_names, unnest(combo) i) winners
from (
select combo,
rank() over(order by (select count(distinct el) from unnest(val) v, unnest(v.a) el) desc, array_length(combo)) as rnk
from (
select any_value(c).combo, array_agg(buckets[offset(i)]) val
from buckets_combinations c, unnest(combo) i, buckets_elements b
group by format('%t', c)
)
)
where rnk = 1
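The underlying idea, trying bucket combinations and keeping the smallest ones that cover every element, can be sketched in a few lines. This is Python, not BigQuery SQL; it uses the sample matrix from the question, and smallest_covers is a hypothetical helper. It is a brute-force search, so it only suits a small number of buckets (this is the set cover problem, which is NP-hard in general).

```python
from itertools import combinations

# Sample matrix from the question: element -> buckets it fits in.
matrix = {
    "element-x": {"bucketa", "bucketb", "bucketc"},
    "element-y": {"bucketc"},
    "element-z": {"bucketa", "bucketc"},
    "element-p": {"bucketc"},
    "element-q": {"bucketa", "bucketd"},
    "element-r": {"bucketb", "bucketd", "buckete"},
}

def smallest_covers(matrix):
    """Return every smallest bucket combination in which each element fits some bucket."""
    buckets = sorted(set().union(*matrix.values()))
    for size in range(1, len(buckets) + 1):
        covers = [
            combo
            for combo in combinations(buckets, size)
            if all(fits & set(combo) for fits in matrix.values())
        ]
        if covers:
            return covers  # stop at the first size that works
    return []  # some element fits no bucket at all

winners = smallest_covers(matrix)
print(winners)
```

Trying sizes in increasing order means the search stops as soon as a cover is found, instead of enumerating all 2^n - 1 combinations.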

How to convert below query from Oracle to Postgres having connect by?

I am converting my application DB from Oracle to Postgres. I am stuck on a function that uses CONNECT BY syntax. Below is the Oracle query.
PROCEDURE Get_Report_Data(parm_Billing_Month VARCHAR2, OUT Ref_Cur) IS
BEGIN
OPEN p_Data FOR
SELECT CASE
WHEN Id = 1 THEN
'Amount < 10000'
WHEN Id = 2 THEN
'10000-15000'
WHEN Id = 3 THEN
'15000-20000'
ELSE
'Amount > 20000'
END "Range",
SUM(Nvl(N1, 0)) N1,
SUM(Nvl(N2, 0)) N2,
SUM(Nvl(C1, 0)) C1,
SUM(Nvl(C2, 0)) C2,
SUM(Nvl(C3, 0)) C3,
SUM(Nvl(S1, 0)) S1,
SUM(Nvl(S2, 0)) S2,
COUNT(Site_Id) "No of Sites"
FROM (SELECT CASE
WHEN Nvl(Ed.Actual_Bill, 0) < 10000 THEN
1
WHEN Ed.Actual_Bill < 15000 THEN
2
WHEN Ed.Actual_Bill < 20000 THEN
3
ELSE
4
END Amount_Sort,
Decode(Er.Region_Id, 1, 1, 0) N1,
Decode(Er.Region_Id, 2, 1, 0) N2,
Decode(Er.Region_Id, 3, 1, 0) C1,
Decode(Er.Region_Id, 4, 1, 0) C2,
Decode(Er.Region_Id, 5, 1, 0) C3,
Decode(Er.Region_Id, 6, 1, 0) S1,
Decode(Er.Region_Id, 7, 1, 0) S2,
Ed.Site_Id
FROM Tbl_Details Ed,
Tbl_Site Es,
Tbl_Region Er,
Tbl_Subregion Esr
WHERE Ed.Site_Id = Es.Site_Id
AND Es.Subregion_Id = Esr.Subregion_Id
AND Esr.Region_Id = Er.Region_Id
AND Ed.Billing_Month_f = parm_Billing_Month) Data,
(SELECT Regexp_Substr('1,2,3,4,', '[^,]+', 1, Rownum) Id
FROM Dual
CONNECT BY Rownum <= Length('1,2,3,4,') -
Length(REPLACE('1,2,3,4,', ','))) All_Value
WHERE Data.Amount_Sort(+) = All_Value.Id
GROUP BY All_Value.Id
ORDER BY AVG(All_Value.Id);
END;
When I convert this query to Postgres, I make some changes, like Ref_Cur to refcursor and NVL to the COALESCE function, but I am still unable to resolve the CONNECT BY syntax. Some people suggested using CTEs, but I am unable to get it working. Any help?
Edit
For random googlers, below is the answer to my problem above. Special thanks to MTO.
WHERE Ed.Site_Id = Es.Site_Id
AND Es.Subregion_Id = Esr.Subregion_Id
AND Esr.Region_Id = Er.Region_Id
AND Ed.Billing_Month_f = p_Billing_Month) data
Right Outer Join (Select 1 as Id union All
Select 2 as Id union All
Select 3 as Id union All
Select 4 as Id) all_value
On data.Amount_Sort = all_value.Id
GROUP BY all_value.Id
ORDER BY AVG(all_value.Id);

The "generation" of IDs can be simplified in Postgres.
Either use a values() clause:
Right Outer Join ( values (1),(2),(3),(4) ) as all_value(id) On data.Amount_Sort = all_value.Id
or, if those are always consecutive numbers, use generate_series():
Right Outer Join generate_series(1,4) as all_value(id) On data.Amount_Sort = all_value.Id
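Since CTEs were mentioned in the question, here is a minimal sketch of that variant as well. It runs in Python with the standard-library sqlite3 module, not Postgres, although WITH RECURSIVE is also accepted by PostgreSQL. The data table and its three rows are made up for illustration and stand in for the inner query that computes Amount_Sort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the inner "data" subquery: no site falls in ranges 2 and 4.
conn.execute("CREATE TABLE data (site_id INT, amount_sort INT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(101, 1), (102, 1), (103, 3)])

# The recursive CTE generates ids 1..4, replacing the CONNECT BY row generator.
# Joining from the id list outward keeps ranges that have no matching sites.
query = """
WITH RECURSIVE all_value(id) AS (
    SELECT 1
    UNION ALL
    SELECT id + 1 FROM all_value WHERE id < 4
)
SELECT all_value.id, COUNT(data.site_id) AS no_of_sites
FROM all_value
LEFT JOIN data ON data.amount_sort = all_value.id
GROUP BY all_value.id
ORDER BY all_value.id
"""
result = conn.execute(query).fetchall()
print(result)  # one row per id, including ids with zero sites
```

The LEFT JOIN from all_value plays the same role as the Right Outer Join onto all_value in the snippets above, and as the (+) operator in the original Oracle query.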

Since your hierarchical query appears to be using static strings, you can convert this:
SELECT Regexp_Substr('1,2,3,4,', '[^,]+', 1, Rownum) Id
FROM Dual
CONNECT BY Rownum <= Length('1,2,3,4,') - Length(REPLACE('1,2,3,4,', ','))
To:
SELECT 1 AS id FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL UNION ALL
SELECT 4 FROM DUAL
Which should then be simpler to convert to PostgreSQL.
