Get rows where value is not a substring in another row - sql

I'm writing recursive sql against a table that contains circular references.
No problem! I read that you can build a unique path to prevent infinite loops. Now I need to filter the list down to only the last record in the chain. I must be doing something wrong though. -edit I'm adding more records to this sample to make it more clear why just selecting the longest record doesn't work.
This is an example table:
create table strings (id int, string varchar(200));
insert into strings values (1, '1');
insert into strings values (2, '1,2');
insert into strings values (3, '1,2,3');
insert into strings values (4, '1,2,3,4');
insert into strings values (5, '5');
And my query:
select * from strings str1 where not exists
(
select * from strings str2
where str2.id <> str1.id
and str1.string || '%' like str2.string
)
I'd expect to only get the last records
| id | string |
|----|---------|
| 4 | 1,2,3,4 |
| 5 | 5 |
Instead I get them all
| id | string |
|----|---------|
| 1 | 1 |
| 2 | 1,2 |
| 3 | 1,2,3 |
| 4 | 1,2,3,4 |
| 5 | 5 |
Link to sql fiddle: http://sqlfiddle.com/#!15/7a974/1

My problem was all around the 'LIKE' comparison.
select * from strings str1
where not exists
(
select
*
from
strings str2
where
str2.id <> str1.id
and str2.string like str1.string || '%'
)

Related

Count string occurances within a list column - Snowflake/SQL

I have a table with a column that contains a list of strings like below:
EXAMPLE:
STRING User_ID [...]
"[""null"",""personal"",""Other""]" 2122213 ....
"[""Other"",""to_dos_and_thing""]" 2132214 ....
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]" 4342323 ....
QUESTION:
I want to be able to get a count of the amount of times each unique string appears (strings are seperable within the strings column by commas) but only know how to do the following:
SELECT u.STRING, count(u.USERID) as cnt
FROM table u
group by u.STRING
order by cnt desc;
However the above method doesn't work as it only counts the number of user ids that use a specific grouping of strings.
The ideal output using the example above would like this!
DESIRED OUTPUT:
STRING COUNT_Instances
"null" 1223
"personal" 543
"Other" 324
"to_dos_and_thing" 221
"getting_things_done" 146
"Work!!!!!" 22
Based on your description, here is my sample table:
create table u (user_id number, string varchar);
insert into u values
(2122213, '"[""null"",""personal"",""Other""]"'),
(2132214, '"[""Other"",""to_dos_and_thing""]"'),
(2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
I used SPLIT_TO_TABLE to split each string as a row, and then REGEXP_SUBSTR to clean the data. So here's the query and output:
select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string , ',' ) s
GROUP BY extracted
order by count(*) DESC;
+---------------------+----------+
| EXTRACTED | COUNT(*) |
+---------------------+----------+
| Other | 2 |
| null | 1 |
| personal | 1 |
| to_dos_and_thing | 1 |
| getting_things_done | 1 |
| TO_dos_and_thing | 1 |
| Work!!!!! | 1 |
+---------------------+----------+
SPLIT_TO_TABLE https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html
REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html

SQL Server - Ordering Combined Number Strings Prior To Column Insert

I have 2 string columns (thousands of rows) with ordered numbers in each string (there can be zero to ten numbers in each string). Example:
+------------------+------------+
| ColString1 | ColString2 |
+------------------+------------+
| 1;3;5;12; | 4;6' |
+------------------+------------+
| 1;5;10 | 2;26; |
+------------------+------------+
| 4;7; | 3; |
+------------------+------------+
The end result is to combine these 2 columns, sort the numbers in
ascending order and then put each number into individual columns (smallest, 2nd smallest etc).
e.g. Colstring1 is 1;3;5;12; and ColString2 is 4;6; needs to return 1;3;4;5;6;12; which I then use xml to allocated into columns.
Everthing works fine using xml apart from the step to order the numbers (i.e I'm getting 1;3;5;12;4;6; when I combine the strings i.e. not in ascending order).
I've tried put them into a JSON array first to order, thinking I could do a top[1] etc but that did not work.
Any help on how to combine the 2 columns and order them before inserting into columns:
Steps so far:
Example data:
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, ColString1 VARCHAR(50), ColString2 VARCHAR(50));
INSERT INTO #tbl (ColString1, ColString2)
VALUES
('1;3;5;12;', '4;6;'),
('1;5;10;', '2;26;'),
('14;', '3;8;');
XML Approach (Combines strings and puts into columns but not in the correct order):
;WITH Split_Numbers (xmlname)
AS
(
SELECT
CONVERT(XML,'<Names><name>'
+ REPLACE ( LEFT(ColString1+ColString2,LEN(ColString1+ColString2) - 1),';', '</name><name>') + '</name></Names>') AS xmlname
FROM #tbl
)
SELECT
xmlname.value('/Names[1]/name[1]','int') AS Number1,
xmlname.value('/Names[1]/name[2]','int') AS Number2,
xmlname.value('/Names[1]/name[3]','int') AS Number3,
xmlname.value('/Names[1]/name[4]','int') AS Number4,
xmlname.value('/Names[1]/name[5]','int') AS Number5
--etc for additional columns
FROM Split_Numbers
Current Output: numbers not in correct order,
+---------+---------+---------+---------+---------+
| Number1 | Number2 | Number3 | Number4 | Number5 |
+---------+---------+---------+---------+---------+
| 1 | 3 | 5 | 12 | 4 |
| 1 | 5 | 10 | 2 | 26 |
| 14 | 3 | 8 | NULL | NULL |
+---------+---------+---------+---------+---------+
Desired Output: numbers in ascending order.
+---------+---------+---------+---------+---------+
| Number1 | Number2 | Number3 | Number4 | Number5 |
+---------+---------+---------+---------+---------+
| 1 | 3 | 4 | 5 | 6 |
| 1 | 2 | 5 | 10 | 26 |
| 3 | 8 | 14 | NULL | NULL |
+---------+---------+---------+---------+---------+
JSON Approach: combines the columns into a JSON array but I still can't order correctly when in JSON format.
REPLACE ( CONCAT('[', LEFT(ColString1+ColString2,LEN(ColString1+ColString2) - 1), ']') ,';',',')
Any help will be greatly appreciated whether there is a way to order the xml or JSON string prior to entry. Happy to consider an alternative way if there is an easier solution.
You can use string_agg() and string_split():
select t.*, newstring
from t cross apply
(select string_agg(value, ',') order by (value) as newstring
from (select s1.value
from unnest(colstring1, ',') s1
union all
select s2.value
from unnest(colstring2, ',') s2
) s
) s;
That said, you should probably put your effort into fixing the data model. Storing numbers in strings is bad. Storing multiple values in a string is bad, bad. If the numbers are foreign references to other tables, that is bad, bad, bad, bad, bad.
While waiting for a DDL and sample data population, etc., here is a conceptual example for you. It is using XQuery and its FLWOR expression.
CTE does most of the heavy lifting:
Concatenates both columns values into one string. CONCAT() function protects against NULL values.
Converts it into XML data type.
Sorts XML elements by converting their values to int data type in the FLWOR expression.
Filters out XML elements with no legit values.
The rest is trivial.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, col1 VARCHAR(100), col2 VARCHAR(100));
INSERT INTO #tbl (col1, col2)
VALUES
('1;3;5;12;', '4;6;'),
('1;5;10;', '2;26;');
-- DDL and sample data population, end
DECLARE #separator CHAR(1) = ';';
;WITH rs AS
(
SELECT *
, CAST('<root><r><![CDATA[' +
REPLACE(CONCAT(col1, col2), #separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML).query('<root>
{
for $x in /root/r[text()]
order by xs:int($x)
return $x
}
</root>') AS sortedXML
FROM #tbl
)
SELECT ID
, c.value('(r[1]/text())[1]','INT') AS Number1
, c.value('(r[2]/text())[1]','INT') AS Number2
, c.value('(r[3]/text())[1]','INT') AS Number3
-- continue with the rest of the columns
FROM rs CROSS APPLY sortedXML.nodes('/root') AS t(c);
Output
+----+---------+---------+---------+
| ID | Number1 | Number2 | Number3 |
+----+---------+---------+---------+
| 1 | 1 | 3 | 4 |
| 2 | 1 | 2 | 5 |
+----+---------+---------+---------+

postgres insert data from an other table inside array type columns

I have tow table on Postgres 11 like so, with some ARRAY types columns.
CREATE TABLE test (
id INT UNIQUE,
category TEXT NOT NULL,
quantitie NUMERIC,
quantities INT[],
dates INT[]
);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (1, 'cat1', 33, ARRAY[66], ARRAY[123678]);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (2, 'cat2', 99, ARRAY[22], ARRAY[879889]);
CREATE TABLE test2 (
idweb INT UNIQUE,
quantities INT[],
dates INT[]
);
INSERT INTO test2 (idweb, quantities, dates) VALUES (1, ARRAY[34], ARRAY[8776]);
INSERT INTO test2 (idweb, quantities, dates) VALUES (3, ARRAY[67], ARRAY[5443]);
I'm trying to update data from table test2 to table test only on rows with same id. inside ARRAY of table test and keeping originals values.
I use INSERT on conflict,
how to update only 2 columns quantities and dates.
running the sql under i've got also an error that i don't understand the origin.
Schema Error: error: column "quantitie" is of type numeric but expression is of type integer[]
INSERT INTO test (SELECT * FROM test2 WHERE idweb IN (SELECT id FROM test))
ON CONFLICT (id)
DO UPDATE
SET
quantities = array_cat(EXCLUDED.quantities, test.quantities),
dates = array_cat(EXCLUDED.dates, test.dates);
https://www.db-fiddle.com/f/rs8BpjDUCciyZVwu5efNJE/0
is there a better way to update table test from table test2, or where i'm missing the sql?
update to show result needed on table test:
**Schema (PostgreSQL v11)**
| id | quantitie | quantities | dates | category |
| --- | --------- | ---------- | ----------- | --------- |
| 2 | 99 | 22 | 879889 | cat2 |
| 1 | 33 | 34,66 | 8776,123678 | cat1 |
Basically, your query fails because the structures of the tables do not match - so you cannot insert into test select * from test2.
You could work around this by adding "fake" columns to the select list, like so:
insert into test
select idweb, 'foo', 0, quantities, dates from test2 where idweb in (select id from test)
on conflict (id)
do update set
quantities = array_cat(excluded.quantities, test.quantities),
dates = array_cat(excluded.dates, test.dates);
But this looks much more convoluted than needed. Essentially, you want an update statement, so I would just recommend:
update test
set
dates = test2.dates || test.dates,
quantities = test2.quantities || test.quantities
from test2
where test.id = test2.idweb
Note that this ues || concatenation operator instead of array_cat() - it is shorter to write.
Demo on DB Fiddle:
id | category | quantitie | quantities | dates
-: | :------- | --------: | :--------- | :------------
2 | cat2 | 99 | {22} | {879889}
1 | cat1 | 33 | {34,66} | {8776,123678}

Use IN to compare Array of Values against a table of data

I want to compare an array of values against the the rows of a table and return only the rows in which the data are different.
Suppose I have myTable:
| ItemCode | ItemName | FrgnName |
|----------|----------|----------|
| CD1 | Apple | Mela |
| CD2 | Mirror | Specchio |
| CD3 | Bag | Borsa |
Now using the SQL instruction IN I would like to compare the rows above against an array of values pasted from an excel file and so in theory I would have to write something like:
WHERE NOT IN (
ARRAY[CD1, Apple, Mella],
ARRAY[CD2, Miror, Specchio],
ARRAY[CD3, Bag, Borsa]
)
The QUERY should return rows 1 and 2 "MELLA" and "MIROR" are in fact typos.
You could use a VALUES expression to emulate a table of those arrays, like so:
... myTable AS t
LEFT JOIN (
VALUES (1, 'CD1','Apple','Mella')
, (1, 'CD2', 'Miror', 'Specchio')
, (1, 'CD3', 'Bag', 'Borsa')
) AS v(rowPresence, a, b, c)
ON t.ItemCode = v.a AND t.ItemName = v.b AND t.FrgnName = v.c
WHERE v.rowPresence IS NULL
Technically, in your scenario, you can do without the "rowPresence" field I added, since none of the values in your arrays are NULL any would do; I basically added it to point to a more general case.

Insert deleted data, plus some hard-coded values, with output statement

I want to move rows from one table to another, and delete from [foo] output deleted.[col] into [bar] (col) looks like a good option.
But the columns aren't identical. So I want to insert some hard-coded values (and ideally programmatically-determined values) into the destination table.
I set up a couple tables to demonstrate.
create table delete_output_test (
thing1 int not null,
thing2 varchar(50),
thing3 varchar(50)
)
create table delete_output_test2 (
thing1 int not null,
thing2 varchar(50),
thing3 varchar(50),
thing4 int
)
insert into delete_output_test values (0, 'hello', 'world'),
(1, 'it''s', 'me'),
(2, 'i', 'was'),
(3, 'wondering', 'if')
Now moving from one table to another works fine if I'm not too needy...
delete from delete_output_test2
output deleted.thing1,
deleted.thing2,
deleted.thing3
into delete_output_test
(thing1,
thing2,
thing3)
But what if I want to populate that last column?
delete from delete_output_test2
output deleted.thing1,
deleted.thing2,
deleted.thing3
into delete_output_test
(thing1,
thing2,
thing3,
4)
Incorrect syntax near '4'. Expecting '.', ID, PSEUDOCOL, or QUOTED_ID.
I'm fairly new to SQL, so I'm not even sure what those things are.
So why can't I hard-code a value to insert? Or even replace the 4 with some select statement if I want to get clever?
Well, delete_output_test doesn't have a column named 4 or thing4 but delete_output_test2 does. So you can do this:
delete from delete_output_test
output deleted.thing1,
deleted.thing2,
deleted.thing3,
4
into delete_output_test2
(thing1,
thing2,
thing3,
thing4);
select * from delete_output_test2;
rextester demo: http://rextester.com/CVZOB61339
returns:
+--------+-----------+--------+--------+
| thing1 | thing2 | thing3 | thing4 |
+--------+-----------+--------+--------+
| 0 | hello | world | 4 |
| 1 | it's | me | 4 |
| 2 | i | was | 4 |
| 3 | wondering | if | 4 |
+--------+-----------+--------+--------+
The requirement is a little curious, but I think you can do it using a CTE or subquery:
with todelete as (
select dot.*, 4 as col4
from delete_output_test
)
delete from todelete
output deleted.thing1, deleted.thing2, deleted.thing3, deleted.col4
into delete_output_test2(thing1, thing2, thing3, col4);
You need to be sure that delete_output_test has space for the additional column.