postgresql find common patterns of minimum length between strings - sql

I'm using PostgreSQL 9.3 and I'm trying to inner join two tables on common string patterns of a minimum length.
I'm also new to SQL, so please be patient.
For example:
TABLE 1
ID DATA
1 '1234,5678,1234,1111'
2 '1111,2222'
3 '4321'
TABLE 2
IDa DATA
1a '1111,2222,1234,5678,4321'
2a '1111,3837,2222'
3a '4321'
joining DATA column on strings matching more than 9 chars would yield:
IDa ID DATA
1a 2 '1111,2222'
1a 1 '1234,5678'
I had some success using LIKE but I can't force a minimum match length condition(or at least I don't know how). I'm assuming a regex is the solution here but I haven't been able to write one that accomplished what I'm looking for.

Your examples match on 2 x 4 characters, not more than 9 chars.
I suggest using array types (int[]) instead of character types, in combination with the handy intersection operator & from the additional module intarray. More details:
- Error when creating unaccent extension on PostgreSQL
- Postgresql intarray error: undefined symbol: pfree
Query could look like this
SELECT t2.ida, t1.id, t1.data & t2.data AS intersecting_data
FROM tbl1 t1
JOIN tbl2 t2 ON array_length(t1.data & t2.data, 1) = 2; -- or "> 1" ?
Not very efficient, this kind of cross join does not scale well.
Normalize
Faster alternative: a normalized schema with 1 row per data item. Then the operation boils down to a relational-division.
tbl1_data
tbl1_id item
1 1234
1 5678
1 1234
1 1111
2 1111
...
tbl2_data
tbl2_id item
1a 1111
1a 2222
...
Then the query could be:
SELECT tbl1_id, tbl2_id, array_agg(item) AS data
FROM tbl1_data d1
JOIN tbl2_data d2 USING (item)
GROUP BY 1,2
HAVING count(*) = 2; -- or "> 1" ?
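A minimal sketch of the normalized approach, run against SQLite through Python's sqlite3 module rather than PostgreSQL (so array_agg is left out and only the number of shared items is reported). The rows are the ones from the question, split into one item per row. Duplicates are removed before the join so that the repeated 1234 in row 1 is not counted twice; that de-duplication is an addition to the query above, not part of it. Note that this measures set overlap and ignores whether the shared items are adjacent in the original strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1_data (tbl1_id INTEGER, item INTEGER);
    CREATE TABLE tbl2_data (tbl2_id TEXT, item INTEGER);
    INSERT INTO tbl1_data VALUES
        (1, 1234), (1, 5678), (1, 1234), (1, 1111),
        (2, 1111), (2, 2222),
        (3, 4321);
    INSERT INTO tbl2_data VALUES
        ('1a', 1111), ('1a', 2222), ('1a', 1234), ('1a', 5678), ('1a', 4321),
        ('2a', 1111), ('2a', 3837), ('2a', 2222),
        ('3a', 4321);
""")

# Join on the shared item, then keep only pairs sharing at least two items.
rows = conn.execute("""
    SELECT d1.tbl1_id, d2.tbl2_id, COUNT(*) AS shared
    FROM (SELECT DISTINCT tbl1_id, item FROM tbl1_data) AS d1
    JOIN (SELECT DISTINCT tbl2_id, item FROM tbl2_data) AS d2 USING (item)
    GROUP BY d1.tbl1_id, d2.tbl2_id
    HAVING COUNT(*) >= 2
    ORDER BY d1.tbl1_id, d2.tbl2_id
""").fetchall()

for row in rows:
    print(row)
```

Because adjacency is ignored, the pair (2, '2a') also qualifies here even though '1111' and '2222' are not next to each other in '1111,3837,2222'.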

Related

How to Join Table with Different Length of ID

I have two tables that look like this:
Table_1
|termid |Nominal        |Total |
|-------|---------------|------|
|1234   |75.000.000     |1     |
|123    |11.432.105.000 |61    |
|2345   |339.660.000    |3     |
|234    |199.800.000    |2     |
|12345  |3.760.079.000  |29    |
Table_2
|tid      |type  |region      |locatin                 |merk    |
|---------|------|------------|------------------------|--------|
|00012345 |PSW01 |Jakarta I   |JKT1-LANTAMAL           |HYOSUNG |
|DTBA234  |EDC   |Jakarta I   |JKT1-RKB BRI            |HYOSUNG |
|00001234 |PSW01 |Jakarta I   |JKT1-APOTIK KIMIA FARMA |HYOSUNG |
|EDC2345  |EDC   |Jakarta III |JKT1-KPU JAKARTA PUSAT  |WINCOR  |
|00000123 |PSW01 |Jakarta I   |JKT-SPBU CIDENG         |HYOSUNG |
I want to left join the tables with this query:
SELECT *
FROM Table_1 AS t1
LEFT JOIN Table_2 AS t2
ON t1.Termid = CAST(t2.tid AS INT)
The query runs perfectly when I exclude the EDC type. But since I want to include every row, I get an error like:
Conversion failed when converting the varchar value 'DTBA234' to data type int.
I know the error occurs because some values contain non-numeric characters, but I don't know how to solve a case like the one above.
Can you help me?
Thank you.
** Note : Sorry for my english
You can extract the integer from the tid column of Table_2 and join it to the termid column of Table_1.
Assuming the digits in tid form a single run at the end of the value, you can do the following:
SELECT *
FROM Table_1 AS t1
LEFT JOIN Table_2 AS t2
ON t1.Termid = CAST(SUBSTRING(t2.tid, PATINDEX('%[0-9]%', t2.tid), LEN(t2.tid)) AS INT)
NOTE: If tid has any value like EDC2345DBTA123, this won't work.
SQL Server doesn't support regexp_replace like other databases do, so there is no direct way, but you can use SUBSTRING(id, PATINDEX(...)) instead, as in the reference below:
How to find and remove alphabet letters in in a column on MS sql server
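To see the idea run end to end, here is a small sketch using SQLite through Python's sqlite3 module as a stand-in for SQL Server. SQLite has no PATINDEX, so the leading letters are stripped with ltrim instead; this swap assumes the non-numeric part is an uppercase A-Z prefix, which holds for the sample rows but is an assumption about the real data. Only the tid column of Table_2 is created, since the other columns play no part in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (termid INTEGER, Total INTEGER);
    CREATE TABLE Table_2 (tid TEXT);
    INSERT INTO Table_1 VALUES (1234, 1), (123, 61), (2345, 3), (234, 2), (12345, 29);
    INSERT INTO Table_2 VALUES
        ('00012345'), ('DTBA234'), ('00001234'), ('EDC2345'), ('00000123');
""")

# Strip any leading uppercase letters, then cast the remaining digits to an integer.
# The cast also drops the leading zeros of values such as '00001234'.
matches = conn.execute("""
    SELECT t1.termid, t2.tid
    FROM Table_1 AS t1
    LEFT JOIN Table_2 AS t2
      ON t1.termid = CAST(ltrim(t2.tid, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') AS INTEGER)
    ORDER BY t1.termid
""").fetchall()

for termid, tid in matches:
    print(termid, tid)
```

As with the PATINDEX version, a value such as EDC2345DBTA123 would not be handled correctly by this sketch.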

Need to join 2 tables using substring of different lengths on one column with length stated in second table

I need to join two tables on a prefix of a column in Table 1, but the prefix length varies from row to row. Table 2 states the length: its Length column says how many leading characters of ShortRef to use, and the same number of leading characters of RefNr in Table 1 forms the other side of the join. ShortRef has a fixed length of 9, so trimming it is easy; Table 1 is my problem. I am not sure how to do this in SSMS, or whether it is possible.
Since Table 2 determines how much to substring, I currently don't see a solution, and I don't know whether LIKE would work here.
Example:
TABLE 1
|RefNr |
|----------------|
|1234567890101234|
|9876543210090876|
|1234569000100223|
TABLE 2
|ShortRef | Length | Name |
|---------|--------|------|
|123456789|8 |Alice |
|123456909|8 |Cindy |
|987654999|6 |Ben |
RESULTS SHOULD BE:
|RefNr | Substr Table1&2 based on Length in Table2 | Name |
|----------------|-------------------------------------------|------|
|1234567890101234| 12345678 |Alice |
|9876543210090876| 987654 |Ben |
|1234569000100223| 12345690 |Cindy |
I don't know if this is exactly what you want, but I got the correct output for the example tables with this:
SELECT RefNr
,tb2."Substring Table1 based on length in Table2"
,tb2.Name
FROM Table1
INNER JOIN
(SELECT SUBSTRING(ShortRef, 1, Length) as "Substring Table1 based on length in Table2",
Name,
Length
FROM Table2) as tb2
ON SUBSTRING(RefNr, 1, tb2.Length) = tb2."Substring Table1 based on length in Table2"
It looks like you want to join the tables together with string operations:
select t1.*, left(t2.ShortRef, t2.length), t2.name
from table1 t1 join
table2 t2
on left(t2.ShortRef, t2.length) = left(t1.refnr, t2.length);
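The prefix join can be checked against the sample data with a short sketch. It uses SQLite through Python's sqlite3 module, so substr(x, 1, n) stands in for SQL Server's LEFT(x, n); RefNr and ShortRef are assumed to be stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (RefNr TEXT);
    CREATE TABLE Table2 (ShortRef TEXT, Length INTEGER, Name TEXT);
    INSERT INTO Table1 VALUES
        ('1234567890101234'), ('9876543210090876'), ('1234569000100223');
    INSERT INTO Table2 VALUES
        ('123456789', 8, 'Alice'),
        ('123456909', 8, 'Cindy'),
        ('987654999', 6, 'Ben');
""")

# Compare the first t2.Length characters of both columns.
joined = conn.execute("""
    SELECT t1.RefNr,
           substr(t2.ShortRef, 1, t2.Length) AS prefix,
           t2.Name
    FROM Table1 AS t1
    JOIN Table2 AS t2
      ON substr(t1.RefNr, 1, t2.Length) = substr(t2.ShortRef, 1, t2.Length)
    ORDER BY t2.Name
""").fetchall()

for row in joined:
    print(row)
```

Since the prefix length comes from Table2, the join condition has to be evaluated per pair of rows, which is unlikely to use an index on RefNr.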

Select row with shortest string in one column if there are duplicates in another column?

Let's say I have a database with rows like this
ID PNR NAME
1 35 Television
2 35 Television, flat screen
3 35 Television, CRT
4 87 Hat
5 99 Cup
6 99 Cup, small
I want to select each individual type of item (television, hat, cup) - but for the ones that have multiple entries in PNR I only want to select the one with the shortest NAME. So the result set would be
ID PNR NAME
1 35 Television
4 87 Hat
5 99 Cup
How would I construct such a query using SQLite? Is it even possible, or do I need to do this filtering in the application code?
Since SQLite 3.7.11, you can use MIN() or MAX() to select a row in a group:
SELECT ID,
PNR,
Name,
min(length(Name))
FROM MyTable
GROUP BY PNR;
You can use MIN(length(name))-aggregate function to find out the minimum length of several names; the slightly tricky thing is to get corresponding ID and NAME into the result. The following query should work:
select mt1.ID, mt1.PNR, mt1.Name
from MyTable mt1 inner join (
select pnr, min(length(Name)) as minlength
from MyTable group by pnr) mt2
on mt1.pnr = mt2.pnr and length(mt1.Name) = mt2.minlength
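Both approaches can be tried side by side with Python's sqlite3 module. This is only a sketch on the sample rows from the question; the ORDER BY clauses are added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ID INTEGER, PNR INTEGER, Name TEXT);
    INSERT INTO MyTable VALUES
        (1, 35, 'Television'),
        (2, 35, 'Television, flat screen'),
        (3, 35, 'Television, CRT'),
        (4, 87, 'Hat'),
        (5, 99, 'Cup'),
        (6, 99, 'Cup, small');
""")

# Approach 1: with a single MIN() aggregate, SQLite takes the bare columns
# (ID, Name) from the row that holds the minimum.
by_min = conn.execute("""
    SELECT ID, PNR, Name, MIN(length(Name))
    FROM MyTable
    GROUP BY PNR
    ORDER BY PNR
""").fetchall()

# Approach 2: portable join against the per-PNR minimum length.
by_join = conn.execute("""
    SELECT mt1.ID, mt1.PNR, mt1.Name
    FROM MyTable AS mt1
    JOIN (SELECT PNR, MIN(length(Name)) AS minlength
          FROM MyTable
          GROUP BY PNR) AS mt2
      ON mt1.PNR = mt2.PNR AND length(mt1.Name) = mt2.minlength
    ORDER BY mt1.ID
""").fetchall()

print(by_min)
print(by_join)
```

The two differ when several names in one PNR share the minimum length: the join returns all of them, while the MIN() form returns one row per PNR.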

Sybase SQL CASE with CAST

I have a Sybase table (which I can't alter) that I am trying to get into a specific table format. The table contains three columns all which are string values, with an id (which is not unique), a "position" which is a number that represents a field name, and a field column that is the value. The table looks like:
id position field
100 0 John
100 1 25
100 2 Jane
100 3 50
101 0 Dave
101 3 30
Position 0 means "SalesRep1", Position 1 means "SR1Commission", Position 2 means "SalesRep2", and Position 3 means "SR2Commission".
I am trying to get a table that looks like following, with the Commission columns being decimals instead of strings:
id SalesRep1 SR1Commission SalesRep2 SR2Commission
100 John 25 Jane 50
101 Dave 30 NULL NULL
I've gotten close using CASE, but I end up with only one value per row and not sure there's a way to do what I want. I also have problems with trying to get CAST included to change the commission values from strings to decimals. Here's what I have so far:
SELECT id,
CASE "position" WHEN '0' THEN field END AS SalesRep1,
CASE "position" WHEN '1' THEN field END AS SR1Commission,
CASE "position" WHEN '2' THEN field END AS SalesRep2,
CASE "position" WHEN '3' THEN field END AS SR2Commission
FROM v_custom_field WHERE id = ?
This gives me the following result when querying for id 100:
id SalesRep1 SR1Commission SalesRep2 SR2Commission
100 John NULL NULL NULL
100 NULL 25 NULL NULL
100 NULL NULL Jane NULL
100 NULL NULL NULL 50
This is close, but I want to 'collapse' the rows down into one row per id, as well as cast the commission values to numbers. I tried adding a CAST(field AS DECIMAL), but I'm not sure if this is even the right direction to go. I also looked into PIVOT, but Sybase doesn't seem to support that.
This is known as an entity-attribute-value table. They're a pain to work with because they're one step removed from being relational data, but they're very common for user-defined fields in applications.
If you can't use PIVOT, you'll need to do something like this:
SELECT DISTINCT s.id,
f0.field AS SalesRep1,
CAST(f1.field AS DECIMAL(20,5)) AS SR1Commission,
f2.field AS SalesRep2,
CAST(f3.field AS DECIMAL(20,5)) AS SR2Commission
FROM UnnamedSalesTable s
LEFT JOIN UnnamedSalesTable f0
ON f0.id = s.id AND f0.position = 0
LEFT JOIN UnnamedSalesTable f1
ON f1.id = s.id AND f1.position = 1
LEFT JOIN UnnamedSalesTable f2
ON f2.id = s.id AND f2.position = 2
LEFT JOIN UnnamedSalesTable f3
ON f3.id = s.id AND f3.position = 3
It's not very fast because it's a ton of self-joins followed by a DISTINCT, but it does work.
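A different technique that avoids the self-joins is conditional aggregation: wrap each CASE in MAX() and group by id, which collapses the rows into one per id. The sketch below demonstrates it on SQLite through Python's sqlite3 module, not on Sybase, so treat it as untested there; REAL stands in for DECIMAL. The sample rows are hypothetical and arranged to follow the stated mapping (0 = SalesRep1, 1 = SR1Commission, 2 = SalesRep2, 3 = SR2Commission).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v_custom_field (id TEXT, "position" TEXT, field TEXT);
    INSERT INTO v_custom_field VALUES
        ('100', '0', 'John'), ('100', '1', '25'),
        ('100', '2', 'Jane'), ('100', '3', '50'),
        ('101', '0', 'Dave'), ('101', '1', '30');
""")

# MAX() ignores the NULLs produced by the non-matching CASE branches,
# so each column keeps the single value found for that position.
pivoted = conn.execute("""
    SELECT id,
           MAX(CASE WHEN "position" = '0' THEN field END) AS SalesRep1,
           CAST(MAX(CASE WHEN "position" = '1' THEN field END) AS REAL) AS SR1Commission,
           MAX(CASE WHEN "position" = '2' THEN field END) AS SalesRep2,
           CAST(MAX(CASE WHEN "position" = '3' THEN field END) AS REAL) AS SR2Commission
    FROM v_custom_field
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in pivoted:
    print(row)
```

This form scans the table once. It assumes at most one row per id and position; if there were several, MAX() would silently pick one of them.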

SQL - Select the longest substrings

I have the data like that.
AB
ABC
ABCD
ABCDE
EF
EFG
IJ
IJK
IJKL
and I just want to get ABCDE, EFG, IJKL. How can I do that in Oracle SQL?
The strings are at least 2 characters long but have no fixed length; they can be anywhere from 2 to 100 characters.
In the event that you mean "longest string for each sequence of strings", the answer is a little different -- you are not guaranteed that all have a length of 4. Instead, you want to find the strings where adding a letter isn't another string.
select t.str
from table t
where not exists (select 1
from table t2
where substr(t2.str, 1, length(t.str)) = t.str and
length(t2.str) = length(t.str) + 1
);
Do note that performance of this query will not be great if you have even a moderate number of rows.
Alternatively, select all rows whose string is not a substring of any other row's string. It's not clear whether this is what you want, though.
select t.str
from table t
where not exists (
select 1
from table t2
where t2.str <> t.str and
instr(t2.str, t.str) > 0
);
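A runnable sketch of a prefix-based variant, using SQLite through Python's sqlite3 module instead of Oracle: a string is kept when no other string starts with it. The table name strings is hypothetical, and substr/length are the SQLite spellings of Oracle's SUBSTR/LENGTH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE strings (str TEXT);
    INSERT INTO strings VALUES
        ('AB'), ('ABC'), ('ABCD'), ('ABCDE'),
        ('EF'), ('EFG'),
        ('IJ'), ('IJK'), ('IJKL');
""")

# Keep a row only if no *other* row has it as a prefix.
longest = [row[0] for row in conn.execute("""
    SELECT t.str
    FROM strings AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM strings AS t2
        WHERE t2.str <> t.str
          AND substr(t2.str, 1, length(t.str)) = t.str
    )
    ORDER BY t.str
""")]

print(longest)
```

The t2.str <> t.str condition matters: without it every row matches itself and the query returns nothing.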