SQL: select with disctint left

SQL: select with disctint left - sql

table
id name data
---------------
1 name1 data1
2 name2 data2
3 name3 data3
4 name4 data1abc
5 name5 data2abc
6 name6 data1abcd
7 name7 data2abcde
The output that I need is rows with ids 3, 6 and 7.
I need to search for distinct data terms. The terms data1, data1abc, data1abcd should all be counted as one term and the rows with unique most characters should be returned i.e., data1abcd, data2abcde, data3
Can you help please?
This is what I have written so far, it doesnt work:
SELECT *
FROM table
WHERE LEFT(data, 5) = (
SELECT distinct LEFT(data, 5)
FROM table
)

This will work for any string length:
select s
from (
select
s,
case
when s = left(lag(s) over (order by s desc), length(s))
then false else true
end as u
from t
) t
where u
order by s
;
s
------------
data1abcd
data2abcde
data3
Sample data:
create table t (id int, name text, s text);
insert into t (id, name, s) values
(1, 'name1', 'data1'),
(2, 'name2', 'data2'),
(3, 'name3', 'data3'),
(4, 'name4', 'data1abc'),
(5, 'name5', 'data2abc'),
(6, 'name6', 'data1abcd'),
(7, 'name7', 'data2abcde');

You could partition the rows by the first five characters and take the longest:
SELECT name, data
FROM (SELECT name,
data,
RANK() OVER (PARTITION BY LEFT(data, 5)
ORDER BY LENGTH(data) DESC) AS rk
FROM mytable) t
WHERE rk = 1
Note:
The rank function allows for ties (e.g., data1a and data1b have the same length, so they could both have the rank of 1 if there a no longer strings). If you do not want to allow for ties, you should use row_number instead.

It seems that you want distinct non-numeric values. You may change data_len as len(data_r) if you want the longest non-numeric records
ALTER TABLE <yourtable>
ADD
data_r as REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE (data, '0', ''),'1', ''),'2', ''),'3', ''),'4', ''),'5', ''),'6', ''),'7', ''),'8', ''),'9', '')
,data_len as len(data)
SELECT * FROM <yourtable> t
WHERE EXISTS ( SELECT MAX(data_len), data_r
FROM <yourtable> t1
WHERE t.data_r = t1.data_r
AND t.data_len = t1.data_len
GROUP BY data_r
)

Related

Find matching first 7 chars to identify duplicates

I'm trying to identify duplicate state_num that are failing validation. The R is causing issues with validation, but I want to just search the first 7 characters and find the duplicate values, so that it returns the row that has an R in the string and the row that doesn't. The column is a type: char(15) But when trying to run a query it is not finding the matching 7 characters. My table only showing how it should look, its not showing what is actually being returned. It basically is just finding the state and only finding non R state_num in results. It should be returning around 480 rows but is returning like 20k rows and not just showing the duplicates
I've tried querying a bunch of different ways but i've spen the last hour only being able to return the R row if i ad AND state_num[8] = 'R' to the end of the query. Which defeats what I'm trying to find the duplicate first 7 characters. This is an informix db.
My Query:
SELECT id_ref, cont_ref, formatted, state_num, type, state
FROM state_form sf1
WHERE EXISTS (select cont_ref, san
FROM state_form sf2
WHERE sf1.cont_ref = sf2.cont_ref and left(sf1.state_num,7) = LEFT(sf2.state_num,7)
GROUP BY cont_ref, state_num
HAVING COUNT(state_num) > 1)
AND state = 'MT';
This is what I'd like my results to return:
id_ref
cont_ref
formatted
state_num
type
state
658311
5237
71-75011R
7175011R
Y
MT
1459
5237
71-75011
7175011
I
MT
7501
555678
99-67894
9967894
I
MT
345443
555678
99-67894R
9967894R
Y
MT

Here are a couple options producing the same results. This may need to be changed if you need to identify the 8th character as something such as a Letter. That is, this will also catch 12345678 and 1234567.
create table my_data (
id_ref integer,
cont_ref integer,
state_num varchar(20),
type varchar(5),
state varchar(5)
);
insert into my_data values
(1, 5237, '7175011R', 'Y', 'MT'),
(2, 5237, '7175011', 'I', 'MT'),
(3, 6789, '7878787', 'Y', 'CA'),
(4, 6789, '7878787R', 'I', 'CA'),
(5, 555678, '9967894', 'I', 'MT'),
(6, 555678, '9967894R', 'Y', 'MT'),
(7, 98765, '123456', 'I', 'MT');
Query #1
with dupes as (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
)
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;
Query #2
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
where m.cont_ref in (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
);
id_ref
cont_ref
state_num
type
state
1
5237
7175011R
Y
MT
2
5237
7175011
I
MT
5
555678
9967894
I
MT
6
555678
9967894R
Y
MT
View on DB Fiddle
UPDATE
If Informix does not want to group by left(column, 7), then you could get the target cont_ref values using this. Here's the CTE method, but you could also do with sub-query.
with dupes as (
select cont_ref
from (
select cont_ref, left(state_num, 7) as left_seven
from my_data
where state = 'MT'
)z
group by cont_ref
having count(*) > 1
)
select m.*
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;

How to select just the rows in a table that fit a criteria , avoiding duplicates in SQL

So I have a df like follows:
USER Value object
0001 V V
0002 A NULL
0002 C C
0003 A NULL
0004 A NULL
0004 A NULL
0003 V V
So I basically want USER to be the unique id for each row of this DF. If there is an A in the Value column, I only want it if that's the only option for the ID. So there are two 002's, I only want to see the instance where it is not A , so C.
Because 0004 doesn't have a non-A Value, I'll take the A.
Final result:
USER Value
0001 V
0002 C
0003 V
0004 A

I think you are looking for the following:
select user,
'A' as value
from tbl
group by user
having sum(case when value = 'A' then 1 else 0 end) > 0
and sum(case when value <> 'A' then 1 else 0 end) = 0
union all
select user,
value
from tbl
where value <> 'A'
order by user;
See Fiddle:
http://www.sqlfiddle.com/#!9/b28f4c/2/0
The desired result is achieved with your example data. However, your example data does not contain any users having more than one non-A value row. The above query will keep all of them. If you only want to keep one or some, explain how to pick which you want.

This will return the one Value per tuple, returning A at last resort (if A is the smallest of the potential values):
select USER, max(Value) as value from Table
group by User
or, this might return multiple users if they have several tuples with different object (when not null)
select distinct user, coalesce(object, value)
from table ;

Here's a solution if you don't like typing :-)
select
distinct USR
,VAL
from
TBL
qualify
max(ascii(VAL)) over (partition by USR ) = ascii(VAL)
Copy|Paste|Run in snowflake:
CREATE or replace TABLE tbl( USR varchar(4), VAL varchar(1), OBJ varchar(4));
INSERT INTO tbl (USR,VAL,OBJ)
VALUES
('0001', 'V', 'V'),
('0002', 'A', NULL),
('0002', 'C', 'C'),
('0003', 'A', NULL),
('0004', 'A', NULL),
('0004', 'A', NULL),
('0003', 'V', 'V');
select
distinct USR
,VAL
from
TBL
qualify
max(ascii(VAL)) over (partition by USR ) = ascii(VAL);

You can try the following if you are using SQL-Server
select distinct USER
,Value
from
(
select *,rank() over (partition by USER order by Value desc) as ranking
from your_table_name
) as t
where ranking =1

Conditional group by with window function in Snowflake query

I have a table in Snowflake in following format:
create temp_test(name string, split string, value int)
insert into temp_test
values ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400), ('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000), ('B', 'd',4000), ('B','e', 5000)
First step, I needed only top 2 value per name (sorted on value), so I used following query to get that:
select name, split, value,
row_number() over (PARTITION BY (name) order by value desc) as row_num
from temp_test
qualify row_num <= 2
Which gives me following resultset:
NAME SPLIT VALUE ROW_NUM
A e 500 1
A d 400 2
B e 5000 1
B d 4000 2
Now, I need to sum values other than Top 2 and put it in a different Split named as "Others", like this:
NAME SPLIT VALUE
A e 500
A d 400
A Others 600
B e 5000
B d 4000
B Others 6000
How to do that in Snowflake query or SQL in general?

with data as (
select name, split, value,
row_number() over (partition by (name) order by value desc) as row_num
from temp_test
)
select
name,
case when row_num <= 2 then split else 'Others' end as split,
sum(value) as value
from data
group by name, case when row_num <= 2 then row_num else 3 end

Shawnt00's answer is good, but for the record in Snowflake this can be written simpler:
Firstly the group by at the end can refer to the results by index or name:
GROUP BY 1,2
or
GROUP BY name, split
also as the CASE only has too branches an IFF can be used and seems you are using a CTE to add the row_number you can push the IFF into the CTE also
WITH data AS (
SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS row_num,
IFF(row_num < 3, split, 'Others') as n_split
FROM VALUES ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400),
('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000),
('B', 'd',4000), ('B','e', 5000)
v(name, split, value)
)
SELECT
name,
n_split,
SUM(value) AS value
FROM data
GROUP BY name, n_split;
and if super keen on small SQL push the ROW_NUMBER into the IFF:
WITH data AS (
SELECT name, value,
IFF(ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) < 3, split, 'Others') as n_split
FROM VALUES ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400),
('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000),
('B', 'd',4000), ('B','e', 5000)
v(name, split, value)
)
SELECT
name,
n_split AS split,
SUM(value) AS value
FROM data
GROUP BY name, n_split;
gives:
NAME SPLIT VALUE
A e 500
A d 400
A Others 600
B e 5000
B d 4000
B Others 6000

SQL Server loop through a table for every 5 rows

I need to write a stored procedure or table function to return a new data table as a new data source.
I wish to loop through the original table for every 5 rows base on the invoice ID column (it's possible not start from 1), the first 5 rows add to the left of the new table and the second 5 rows add to the right of the new table, the third 5 rows to the left and so on.
For example, Here is the original table:
Here is the expect table:
Thanks in advance!

declare #rowCount int = 5;
with cte as (
select *,( (IN_InvoiceID-1) / #rowCount ) % 2 group1
,( (IN_InvoiceID-1) / #rowCount ) group2
,IN_InvoiceID % #rowCount group3
from T
)
select * from cte
select T1.INID,T1.IN_InvoiceID,T1.IN_InvoiceAmount,T2.INID,T2.IN_InvoiceID,T2.IN_InvoiceAmount
from CTE T1
left join CTE T2 on T2.group1 = 1 and T1.group2 = T2.group2-1 and T1.group3 = T2.group3
where T1.group1 = 0
Test DDL
CREATE TABLE T
([INID] varchar(38), [IN_InvoiceID] int, [IN_InvoiceAmount] int)
;
INSERT INTO T
([INID], [IN_InvoiceID], [IN_InvoiceAmount])
VALUES
('DB3E17E6-35C5-41:121-93B1-F809BF6B2972', 1, 2999),
('3212F048-8213-4FCC-AB64-121485B77D4E43', 2, 3737),
('E3526373-A204-40F5-801C-7F8302A4E5E2', 3, 3175),
('76CC9C19-BF79-4E8A-8034-A33805AD3390', 4, 391),
('EC7A2FBC-B62D-4865-88DE-A8097975F125', 5, 1206),
('52AD3046-21331-4F0A-BD1D-67F232C54244', 6, 402),
('CA48F132-A9F5-4516-9E58-CDEE6644AAD1', 7, 1996),
('02E10C31-CAB2-4220-B66A-CEE5E67A9378', 8, 3906),
('98F1EEFF-B07A-4B65-87F4-E165264284DD', 9, 2575),
('91EBDD8B-B73C-470C-8900-DD66078483DB', 10, 2965),
('6E2490E5-C4DE-4833-877F-1590F7BDC1B8', 11, 1603),
('00985921-AC3C-4E3E-BAE1-7F58302F831A', 12, 1302)
;
Result:

Could you please check article Display Data in Multiple Columns using SQL showing with example case how a database developer can show the list of data rows in a columnar mode using Row_Number() function and mode arithmetic expression
You need to add additional columns from the same row that is different in the sample

Seems as if you want to split the table into 2 tables with alternating 5 rows. An easy way to do this would be:
Take data into a temp table having an extra column (lets say
grouping_id)
Update the grouping id so that each 5 rows have the same id. You can
use in_invoiceId % 5 (the nod function). After this step the first 5
rows will have grouping_id 0, next 5 will have 1, next will have 2
(assuming your invoice id is incremented +1 for all rows).
You can just do a normal select with where clause for odd and even grouping_id

Ideally, you can manage with the 2 tables Master and detail table.
But due to my curiosity, I am able to solve and give the answer as
Declare #table table(id int identity, invoice_id int)
; WITH Numbers AS
(
SELECT n = 1
UNION ALL
SELECT n + 1
FROM Numbers
WHERE n+1 <= 50
)
insert into #table SELECT n
FROM Numbers
Select (a.id )%5 ,* from #table a join #table b on a.id+5 = b.id and a.id != b.id
;WITH Numbers AS
(
SELECT n = 1, o = 5
UNION ALL
SELECT n + 10, o = o+10
FROM Numbers
WHERE n+1 <= 50
)
select a.id ParentId,a.invoice_id ParentInvoiceId, --b.n, b.o,
c.invoice_id childInvoiceID from #table a
join Numbers b on a.id between b.n and b.o
left join #table c on a.id + 5 = c.id

Here is my solution
First i create grps based on whether the in_invoiceid is divisible by 5 or not.(Ignore the remainders)
After that i create a category to indicate between alternative groups(ie by checking if the remainder is 0 or otherise)
Then its a matter of dense_ranking the records on the basis of the category field ordered by in_invoiceid
Lastly a join with category=1 rows with same dense_rank as those records in category=0
create table Invoicetable(IN_ID varchar(100), IN_InvoiceID int)
INSERT INTO Invoicetable (IN_ID, IN_InvoiceID)
VALUES
('2345-BCDE-6645-1DDF', 1),
('2345-BCDE-6645-3DDF', 2),
('2345-BCDE-6645-4DDF', 3),
('2345-BCDE-6645-5DDF', 4),
('2345-BCDE-6645-6DDF', 5),
('2345-BCDE-6645-7DDF', 6),
('2345-BCDE-6645-aDDF', 7),
('2345-BCDE-6645-sDDF', 8),
('2345-BCDE-6645-dDDF', 9),
('2345-BCDE-6645-dDDF', 10),
('2345-BCDE-6645-dDDF', 11),
('2345-BCDE-6645-dDDF', 12);
with data
as (
select *
,(in_invoiceid-1)/5 as grp
,case when ((in_invoiceid-1)/5)%2=0 then '1' else '0' end as category
,dense_rank() over(partition by case when ((in_invoiceid-1)/5)%2=0 then '1' else '0' end
order by in_invoiceid) as rnk
from invoicetable a
)
select *
from data a
left join data b
on a.rnk=b.rnk
and b.category=0
where a.category=1
Here is db fiddle link.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=287f101737c580ca271940764b2536ae

You may try with the following approach. Dividing the table is done with (((ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) - 1) / 5) % 2 = 0) which groups records in left and right groups.
CREATE TABLE #InvoiceTable(
IN_ID varchar(24),
IN_InvoiceID int
)
INSERT INTO #InvoiceTable (IN_ID, IN_InvoiceID)
VALUES
('2345-BCDE-6645-1DDF', 1),
('2345-BCDE-6645-3DDF', 2),
('2345-BCDE-6645-4DDF', 3),
('2345-BCDE-6645-5DDF', 4),
('2345-BCDE-6645-6DDF', 5),
('2345-BCDE-6645-7DDF', 6),
('2345-BCDE-6645-aDDF', 7),
('2345-BCDE-6645-sDDF', 8),
('2345-BCDE-6645-dDDF', 9),
('2345-BCDE-6645-dDDF', 10),
('2345-BCDE-6645-dDDF', 11),
('2345-BCDE-6645-dDDF', 12);
WITH cte AS (
SELECT
IN_ID,
IN_InvoiceID,
CASE
WHEN (((ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) - 1) / 5) % 2 = 0) THEN 'L'
ELSE 'R'
END AS IN_Position
FROM #InvoiceTable
),
cteL AS (
SELECT IN_ID, IN_InvoiceID, ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) AS IN_RowNumber
FROM cte
WHERE IN_Position = 'L'
),
cteR AS (
SELECT IN_ID, IN_InvoiceID, ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) AS IN_RowNumber
FROM cte
WHERE IN_Position = 'R'
)
SELECT cteL.IN_ID, cteL.IN_InvoiceID, cteR.IN_ID, cteR.IN_InvoiceID
FROM cteL
LEFT JOIN cteR ON (cteL.IN_RowNumber = cteR.IN_RowNumber)
Output:
IN_ID IN_InvoiceID IN_ID IN_InvoiceID
2345-BCDE-6645-1DDF 1 2345-BCDE-6645-7DDF 6
2345-BCDE-6645-3DDF 2 2345-BCDE-6645-aDDF 7
2345-BCDE-6645-4DDF 3 2345-BCDE-6645-sDDF 8
2345-BCDE-6645-5DDF 4 2345-BCDE-6645-dDDF 9
2345-BCDE-6645-6DDF 5 2345-BCDE-6645-dDDF 10
2345-BCDE-6645-dDDF 11 NULL NULL
2345-BCDE-6645-dDDF 12 NULL NULL

sql generate code based on three column values

I have three columns
suppose
row no column1 column2 column3
1 A B C
2 A B C
3 D E F
4 G H I
5 G H C
I want to generate code by combining these three column values
For Eg.
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001
by checking combination of three columns
logic is that
if values of three columns are same then like first time it shows 'ABC001'
and 2nd time it shows 'ABC002'

You can try this:
I dont know what you want for logic with 00, but you can add them manuel or let the rn decide for you
declare #mytable table (rowno int,col1 nvarchar(50),col2 nvarchar(50),col3 nvarchar(50)
)
insert into #mytable
values
(1,'A', 'B', 'C'),
(2,'A', 'B', 'C'),
(3,'D', 'E', 'F'),
(4,'G', 'H', 'I'),
(5,'G', 'H', 'C')
Select rowno,col1,col2,col3,
case when rn >= 10 and rn < 100 then concatcol+'0'+cast(rn as nvarchar(50))
when rn >= 100 then concatcol+cast(rn as nvarchar(50))
else concatcol+'00'+cast(rn as nvarchar(50)) end as ConcatCol from (
select rowno,col1,col2,col3
,Col1+col2+col3 as ConcatCol,ROW_NUMBER() over(partition by col1,col2,col3 order by rowno) as rn from #mytable
) x
order by rowno
My case when makes sure when you hit number 10 it writes ABC010 and when it hits above 100 it writes ABC100 else if its under 10 it writes ABC001 and so on.
Result

TSQL: CONCAT(column1,column2,column3,RIGHT(REPLICATE("0", 3) + LEFT(row_no, 3), 3))

You should combine your columns like below :
SELECT CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(ORDER BY
(
SELECT NULL
)))+') '+DATA AS Data
FROM
(
SELECT column1+column2+column3+'00'+CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(PARTITION BY column1,
column2,
column3 ORDER BY
(
SELECT NULL
))) DATA
FROM <table_name>
) T;
Result :
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001

MySQL:
CONCAT(column1,column2,column3,LPAD(row_no, 3, '0'))
[you will need to enclose the 'row no' in ticks if there is a space in the name of the field instead of underscore.]

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL: select with disctint left - sql

Related

Find matching first 7 chars to identify duplicates

How to select just the rows in a table that fit a criteria , avoiding duplicates in SQL

Conditional group by with window function in Snowflake query

SQL Server loop through a table for every 5 rows

sql generate code based on three column values

Categories

Resources