Take the difference between two lists in PostgreSQL

I have two columns in my table, each holding multiple values, and I want to get the values that appear in one column but not in the other.
It's best described by an example:
including_ids | excluding_ids
--------------+--------------
123, 456      | 456, 789
I want to create a new column of all the including_ids that are not in the excluding_ids, so in the above example:
including_ids | excluding_ids | remaining_ids
--------------+---------------+--------------
123, 456      | 456, 789      | 123
If easier, I could also represent the values as lists or arrays or something like that.

You can use arrays for that:
CREATE TABLE mytable (including_ids integer[], excluding_ids integer[]);
INSERT INTO mytable VALUES ('{123,456}', '{456,789}');
INSERT INTO mytable VALUES ('{1,2,3}', '{3,4,5}');
Then you can get the result you want like this:
SELECT (SELECT array_agg(i)
        FROM unnest(m.including_ids) AS arr(i)
        WHERE NOT ARRAY[i] <@ m.excluding_ids)
FROM mytable AS m;
array_agg
-----------
{123}
{1,2}
(2 rows)
But, as jarlh commented, using arrays or other composite data types is often a bad idea if you need to manipulate the individual values inside the database a lot. A more normalized data model is usually the better idea: the queries become simpler, and performance will be better.
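For comparison, a minimal sketch of that normalized alternative (the table and column names here are illustrative, not from the question):

CREATE TABLE including (item_id integer, id integer);
CREATE TABLE excluding (item_id integer, id integer);

-- the remaining ids per item are then a simple anti-join
SELECT i.item_id, i.id
FROM including AS i
WHERE NOT EXISTS (SELECT 1
                  FROM excluding AS e
                  WHERE e.item_id = i.item_id
                    AND e.id = i.id);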

You could also use the intarray extension:
CREATE EXTENSION intarray;
SELECT including_ids - excluding_ids as remaining_ids
FROM mytable;
remaining_ids
───────────────
{123}
{1,2}
(2 rows)

Related

Split non-atomic value into multiple rows with PostgreSQL

I have some non-atomic data in a database like this:
ID | Component ID List
---+------------------
1  | 123, 456
2  | 123, 345
I need to transform that table into a view that provides the "Component ID List" in a way that lets me use joins. Expected result:
ID | Component ID List
---+------------------
1  | 123
1  | 456
2  | 123
2  | 345
Because I have this case in quite a few tables, I am looking for a reusable way to perform this action, e.g. with a SQL function. The tables have different column names, so the function would need a parameter, like this:
SELECT *, split_values("Component ID List") FROM xyz
I know the best way would be to fix the problem in the raw data, but that's not possible in this case.
Any suggestions on how best to solve this?
You can use unnest(string_to_array(Component_ID_List, ', ')):
SELECT ID,
       unnest(string_to_array(Component_ID_List, ', ')) AS Component_ID_List
FROM table_name;
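To make this reusable across tables, one option is to wrap that expression in a set-returning SQL function. A minimal sketch, assuming the list columns are text (the name split_values comes from the question; everything else is illustrative):

CREATE FUNCTION split_values(list text)
RETURNS SETOF text
LANGUAGE sql IMMUTABLE AS
$$ SELECT unnest(string_to_array(list, ', ')) $$;

-- usage; the column name must be quoted because it contains spaces
SELECT "ID", split_values("Component ID List") AS component_id
FROM xyz;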

Data field - search and write value in new data field (Oracle)

Sorry, I don't know how to describe that as a title.
With a query (example: SELECT PKEY, TRUNC(CREATEDFORMAT), STATISTICS FROM BUSINESS_DATA WHERE STATISTICS LIKE '% business_%'), I can display all data that contains the value "business_xxxxxx".
For example, the data field can have the following content: c01_ad; concierge_beendet; business_start; or also skill_my; pre_initial_markt; business_request; topIntMaster; concierge_start; c01_start;
Is it possible to output the corresponding value in another column, just in the query result?
So that the output looks like this, for example:
PKEY | TRUNC(CREATEDFORMAT) | NEW_STATISTICS
-----+----------------------+-----------------
1    | 13.06.2020           | business_start
2    | 14.06.2020           | business_request
That means removing everything that does not start with business_xxx. Is this possible in an SQL query? RegEx would not be the right tool, I think.
I think you want:
select pkey,
       trunc(createdformat) createddate,
       regexp_substr(statistics, 'business_[^;]*') new_statistics
from business_data
where statistics like '% business_%';
You can also use the following regexp_substr:
select regexp_substr(str, 'business_[^;]+') as result
from
  -- sample data
  (select 'skill_my; pre_initial_markt; business_request; topIntMaster; concierge_start; c01_start;' as str from dual
   union all
   select 'c01_ad; concierge_beendet; business_start;' from dual);

RESULT
--------------------------------------------------------------------------------
business_request
business_start

Querying array of text in postgres

I have an array I want to store in Postgres. One of my major use cases is to see whether any of the records has an array that contains a given string.
E.g.:
| A | ["NY", "Paris", "Milan"] |
| B | ["Paris", "NY"] |
| C | [] |
| D | ["Milan"] |
Does there exist a row with Paris in the array? Which rows have Milan in the array? And so on.
I have two options for storing the column: I can either make it of type text[], or convert it into JSON such as {"cities": ["NY", "Paris", "Milan"]} and store it in a JSONB field.
However, I am not sure which would allow the fastest querying for my use case. Is one of them obviously better? Am I tying myself down in any way by choosing one over the other? And once I have chosen, how do I query the DB?
As you seem to be storing simple lists of values, I would recommend using the array datatype over JSON, which better fits more complex cases (nested data structures, associative arrays, ...).
To check whether an element occurs at any position in the array, you can use the ANY() construct.
Here is a query that will return all records where the array stored in column cities contains 'Paris' :
SELECT t.* FROM mytable t WHERE 'Paris' = ANY(t.cities);
Yields:
id cities
---------------------------
A ["NY","Paris","Milan"]
B ["Paris","NY"]
For more information:
Postgres Arrays Documentation
Postgres Arrays Tutorial
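A note on speed, since the question asks about the fastest querying: the = ANY(...) form generally cannot use an index, but the equivalent containment operator @> can use a GIN index on the array column. A minimal sketch, assuming the table and column above:

-- GIN index on the text[] column (supports @>, <@, &&, =)
CREATE INDEX mytable_cities_gin ON mytable USING gin (cities);

-- containment form of the same search, which can use that index
SELECT * FROM mytable WHERE cities @> ARRAY['Paris'];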
I've noticed it is better to use JSONB when it is a simple key-value store, as in, for instance, when you want to store arbitrary info on a row and you're not sure what the columns (keys) would be:
info = {"a": "apple", "b": "ball"}
For use cases like yours, it would be better to design the DB with simple tables, so you can use joins and indexes to your advantage.
You could restructure the tables like :
Location
id | name
----------
1 | Paris
2 | NY
3 | Milan
Other table (with a foreign key to the location table)
user | location_id
--------------------
A | 1
A | 3
B | 2
Using this set of tables, it would be easy to query all users with location Paris using joins.
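For example, a minimal sketch of that join, assuming the second table is named user_location (all names here are illustrative):

-- "user" must be quoted because it is a reserved word in Postgres
SELECT ul."user"
FROM user_location AS ul
JOIN location AS l ON l.id = ul.location_id
WHERE l.name = 'Paris';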

In Postgres: Select columns from a set of arrays of columns and check a condition on all of them

I have a table with an id column and indicator columns x1, x2, x3, x4, y1, y2, y3, y4, each holding 0 or 1.
I want to perform counts over different sets of columns (all subsets where there is at least one element from X and one element from Y). How can I do that in Postgres?
For example, I may have {x1,x2,y3}, {x4,y1,y2,y3}, etc. I want to count the number of "id"s having 1 in every column of a set. So for the first set:
SELECT COUNT(id) FROM tbl WHERE x1 = 1 AND x2 = 1 AND y3 = 1;
and for the second set:
SELECT COUNT(id) FROM tbl WHERE x4 = 1 AND y1 = 1 AND y2 = 1 AND y3 = 1;
Is it possible to write a loop that goes over all these sets and queries the table accordingly? There will be more than 10000 sets, so it cannot be done manually.
You should be able to convert the table columns to an array using ARRAY[col1, col2, ...], then use the array_positions function, setting its second parameter to the value you're checking for. So, given your example above, this query:
SELECT id,
       array_positions(ARRAY[x1, x2, x3, x4, y1, y2, y3, y4], 1)
FROM tbl
ORDER BY id;
will yield this result:
+----+-----------------+
| id | array_positions |
+----+-----------------+
| a  | {1,4,5}         |
| b  | {1,2,4,7}       |
| c  | {1,2,3,4,6,7,8} |
+----+-----------------+
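To finish the counting step across many sets, one option is to store each set as an array of required positions and compare it against array_positions with the containment operator. A minimal sketch (the sets table and its positions column are illustrative names, not from the question):

-- each row holds one set as its required positions, e.g. {1,2,7} for {x1,x2,y3}
CREATE TABLE sets (positions integer[]);

-- count, per set, the ids whose 1-positions cover all required positions
SELECT s.positions,
       COUNT(t.id) FILTER (WHERE s.positions <@ array_positions(
           ARRAY[t.x1, t.x2, t.x3, t.x4, t.y1, t.y2, t.y3, t.y4], 1)) AS cnt
FROM sets AS s
CROSS JOIN tbl AS t
GROUP BY s.positions;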

SQL composite key value vs string

I have a list of integers with 1 to N elements (N < 24).
At the moment, I see two solutions to manage these values in a SQL database (I think it is the same for MySQL and Microsoft SQL Server).
Solution 1: use a VARCHAR with , separating the integer values:
aaa | 40,50,50,10,600,200
aab | 40,50,600,200
aac | 40,50,50,10,600,200,500,1
Solution 2: create a new table with a composite primary key (key, id), where id is the index of the element in the list, plus a value column:
aaa | 0 | 40
aaa | 1 | 50
aaa | 2 | 50
....
aab | 0 | 40
aab | 1 | 50
aab | 2 | 600
....
Which is the better solution, considering that I have many items of data to load and I need to refresh this data many times?
Thanks
Edit:
My use case is that I always refresh/read the whole list for a key in a single call and never access elements one by one, which is why I think the first approach is better. And all the math, like avg or max, I want to do on the client.
Usually the second approach is preferable. One advantage is ease of access:
-- Value at index 3 of aaa
select value from mytable where key = 'aaa' and id = 3;
-- Average value of aaa
select avg(value) from mytable where key = 'aaa';
-- Average number of values per key
select avg(cnt) from (select count(*) as cnt from mytable group by key) counted;
Another is data consistency. You can add simple constraints to your columns, such as to allow only integers from, say, 1 to 700 and positions only up to 23.
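For instance, a minimal PostgreSQL-flavored sketch of such a table with those constraints (all names are illustrative):

CREATE TABLE mytable (
  key   varchar(8) NOT NULL,
  id    integer    NOT NULL CHECK (id BETWEEN 0 AND 23),    -- position in the list
  value integer    NOT NULL CHECK (value BETWEEN 1 AND 700),
  PRIMARY KEY (key, id)
);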
There is an exception to the above, though. If you use the database only to store the list as is and you don't want to select separate values or even aggregate them, i.e. if this is just a string to the DBMS and your queries don't care about its content, then store it as a simple string. Why not?
The second solution that you propose is the classic way of doing this; I would recommend that.
The first solution scales quite terribly and has a hundred other problems.