Remove one, non-unique value from a 2d array - sql

To expand on my answered question here:
Remove one, non-unique value from an array
Given this table in PostgreSQL 9.6:
CREATE TABLE test_table (
id int PRIMARY KEY
, test_array text[][]
);
With a row like:
INSERT INTO test_table (id, test_array)
VALUES (1 , '{ {A,AA},{A,AB},{B,AA},{B,AB} }');
How would I remove an element (one inner array) from test_array:
a) matching its [0] value,
b) matching both its [0] and [1] values.
I am getting an exception when using array_position:
searching for elements in multidimensional arrays is not supported
Also, how would an update query be constructed based on this matching?
I'm not sure that I can build the query the same way as for a 1d array.
Any help is appreciated.

Decided to normalize instead (in this instance, breaking the arrays into two tables with reference keys), per a_horse_with_no_name's recommendation.
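A minimal sketch of what that normalization could look like (the table and column names here are made up for illustration; the DELETE removes every matching row, so a surrogate key would be needed to remove exactly one):

CREATE TABLE test_parent (
    id int PRIMARY KEY
);

CREATE TABLE test_element (
    parent_id int REFERENCES test_parent (id)
  , val_0     text
  , val_1     text
);

-- (a) remove elements matching only the first value
DELETE FROM test_element
WHERE parent_id = 1
  AND val_0 = 'A';

-- (b) remove elements matching both values
DELETE FROM test_element
WHERE parent_id = 1
  AND val_0 = 'A'
  AND val_1 = 'AB';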

Related

BigQuery insert values AS, assume nulls for missing columns

Imagine there is a table with 1000 columns.
I want to add a row with values for 20 columns and assume NULLs for the rest.
INSERT VALUES syntax can be used for that:
INSERT INTO `tbl` (
date,
p,
... # 18 more names
)
VALUES(
DATE('2020-02-01'),
'p3',
... # 18 more values
)
The problem with this is that it is hard to tell which value corresponds to which column, and if you need to change or comment out a value you have to edit two places.
INSERT ... SELECT syntax can also be used:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
... # 980 more NULL AS column
Then if I need to comment out some column, only one line has to be commented out.
But obviously having to spell out 980 NULLs is an inconvenience.
What is the way to combine both approaches? To achieve something like:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
The query above doesn't work; the error is Inserted row has wrong column count; Has 20, expected 1000.
Your first version is really the only one you should ever be using for SQL inserts. It ensures that every target column is explicitly mentioned, and is unambiguous with regard to where the literals in the VALUES clause should go. You can use the version which does not explicitly mention column names. At first, it might seem that you are saving yourself some code. But realize that there is a column list which will be used, and it is the list of all the table's columns, in whatever their positions from definition are. Your code might work, but appreciate that any addition/removal of a column, or changing of column order, can totally break your insert script. For this reason, most will strongly advocate for the first version.
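A toy illustration of that fragility, using a hypothetical two-column table t (not from the question):

-- positional insert: relies on the table's current column count and order
CREATE TABLE t (a INT64, b STRING);
INSERT INTO t VALUES (1, 'x');

-- after ALTER TABLE t ADD COLUMN c DATE, the positional insert above fails
-- with a column-count error, while the explicit form keeps working:
INSERT INTO t (a, b) VALUES (1, 'x');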
You can try the following solution; it combines the two approaches you highlighted:
INSERT INTO `tbl` (date, p, 18 other column names)
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
A couple of things to consider here:
The other 980 columns need to be nullable, i.e. able to hold NULL values.
The 18 columns in the INSERT column list and in the SELECT must be in the same order, so the data goes into the correct columns.
To avoid any confusion, use aliases in the SELECT that match the target column names in the INSERT; that removes any ambiguity.
Hopefully this works for you.
In BigQuery, the best way to do what you're describing is to first load to a staging table. I'll assume you can get the values you want to insert into JSON format with keys that correspond to the target column names.
values.json
{"date": "2020-01-01", "p": "p3", "column": "value", ... }
Then generate a schema file for the target table and save it locally
bq show --schema project:dataset.tbl > schema.json
Load the new data to the staging table using the target schema. This gives you "named" null values for each column present in the target schema but missing from your json, bypassing the need to write them out.
bq load --replace --source_format=NEWLINE_DELIMITED_JSON \
project:dataset.stg_tbl values.json schema.json
Now the insert select statement works every time
insert into `project:dataset.tbl`
select * from `project:dataset.stg_tbl`
Not a pure SQL solution but I managed this by loading my staging table with data then running something like:
from google.cloud import bigquery

client = bigquery.Client()

# project_id / dataset_name are placeholders for your own values.
# Fetch both tables and map column name -> schema field
table1 = client.get_table(f"{project_id}.{dataset_name}.table1")
table1_col_map = {field.name: field for field in table1.schema}
table2 = client.get_table(f"{project_id}.{dataset_name}.table2")
table2_col_map = {field.name: field for field in table2.schema}

# Merge the two column maps; table1's definition wins for shared columns
combined_schema = {**table2_col_map, **table1_col_map}

# Assign the widened schema back to table1 and push it to BigQuery
table1.schema = list(combined_schema.values())
client.update_table(table1, ["schema"])
Explanation:
This retrieves both schemas and converts each into a dictionary with the column name as key and the actual field info from the SDK as value. The two are then combined with dictionary unpacking (the order of unpacking determines which table's columns take precedence when a column is common to both). Finally the combined schema is assigned back to table1 and used to update the table, adding the missing columns with NULLs.

Filter an SQL Array text[] for matching value containing a parameter

I have a table with a TEXT[] column. I want to return all rows that have at least one of the array value that contains my parameter.
Right now I'm doing WHERE array_to_string(arr, ',') ilike '%myString%'
But I feel there must be a better-optimized way of doing that kind of search.
I would also like to search for values beginning or ending with my parameter.
CREATE TABLE IF NOT EXISTS my_table
(
id BIGSERIAL,
col_array TEXT[],
CONSTRAINT my_table_pkey PRIMARY KEY (id)
)
insert into my_table(col_array)
VALUES ('{ABC,DEF}'),
('{FGH,IJK}'),
('{LMN}'),
('{OPQ}');
select * from my_table where ARRAY_TO_STRING(col_array, ',') ilike '%F%';
This works, as it returns only the first 2 rows.
You can find a sqlfiddle here: http://sqlfiddle.com/#!17/09632/7
I would use a sub-query:
select t.*
from my_table t
where exists (select *
from unnest(t.col_array) as x(e)
where x.e ilike '%F%')
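The same pattern covers the prefix/suffix searches from the question; just move the % wildcard (sketched against the my_table example above):

-- values beginning with the parameter
select t.*
from my_table t
where exists (select 1
              from unnest(t.col_array) as x(e)
              where x.e ilike 'F%');

-- values ending with the parameter
select t.*
from my_table t
where exists (select 1
              from unnest(t.col_array) as x(e)
              where x.e ilike '%F');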
You might want to re-consider your decision to de-normalize your model.
Quote from the manual
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
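For completeness, a sketch of what that separate table could look like (names are made up; the trigram index assumes the pg_trgm extension is available, which is what makes the ILIKE '%F%' predicate indexable):

CREATE TABLE my_table_value (
    id        bigint REFERENCES my_table (id)
  , col_value text
);

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX my_table_value_trgm_idx
    ON my_table_value USING gin (col_value gin_trgm_ops);

-- the search becomes a join with an indexable predicate
SELECT DISTINCT t.id
FROM my_table t
JOIN my_table_value v ON v.id = t.id
WHERE v.col_value ILIKE '%F%';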

PostgreSQL insert into and select from an array of user defined objects

I've been having some issues while trying to learn PostgreSQL. I've created a relational object called person and then a table consisting of a primary integer key and an array of person objects. I have a feeling it's the way I am inserting rows into this array however, I am unsure of how to access specific columns of the object as well (Ex. person.name).
Currently the only way I was able to insert a row is as follows however, I think it may just be making a string object instead of the proper person object.
INSERT INTO towns VALUES (0, '{"(bob,blue,springfield,33)"}');
For reference the schema I created is:
CREATE TYPE person AS (
name text,
favorite_color text,
hometown text,
age integer
);
CREATE TABLE towns (
town_id integer PRIMARY KEY,
people person[]
);
That's one of the reasons I prefer the array[...] syntax over '{...}'. You don't need to think about nesting quotes:
INSERT INTO towns (town_id, people)
VALUES
(0, array[('bob','blue','springfield',33)::person]);
('bob','blue','springfield',33)::person creates a record of type person, and array[...] makes that a single-element array. You have to cast the anonymous record created with (...) to person to make this work. If you want to insert multiple person records into the array, it's a bit less typing to cast the whole array at the end:
array[('bob','blue','springfield',33), ('arthur','red','sleepytown',42)]::person[]
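For example (a sketch reusing the towns table from the question):

INSERT INTO towns (town_id, people)
VALUES
  (1, array[('bob','blue','springfield',33),
            ('arthur','red','sleepytown',42)]::person[]);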
To select a specific field of a record in the array, you can use e.g.:
select town_id, (people[1]).name
from towns
You are combining two things here: a column whose type is an array of the composite type person.
To insert a composite value you write something like ROW('bob','blue','springfield',33); note that the ROW keyword is optional.
For array types you use curly braces inside single quotes, like '{val1,val2}'; in your case you are adding only one element to the array, and that element is a person value.
Your example should look like:
INSERT INTO towns VALUES (0, '{"(bob,blue,springfield,33)"}');
Note that inside the array literal the double quotes wrap the whole person value, not the individual fields.
References: Composite Types, Arrays
To access a field of the person value, you have to wrap it in parentheses, e.g. (people[1]).name
The only way I could get this to work was by inserting in the following way:
INSERT INTO towns VALUES (0, (array[ROW('bob','blue','springfield',33)::person]));
And to select you would have to do this:
select (people[1]).age from towns;
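If you need every element rather than a fixed index, a sketch with unnest():

-- one output row per person in the array
select t.town_id, p.name, p.age
from towns t, unnest(t.people) as p;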

Find position(s) in array matching a given sub-array

Given this table:
CREATE TABLE datasets.travel(path integer[], path_timediff double precision[]);
INSERT INTO datasets.travel
VALUES (array[50,49,49,49,49,50], array[NULL,438,12,496,17,435]);
I am looking for some kind of function or query in PostgreSQL that, for a given input array[49,50], will find the matching consecutive index positions in path (here [5,6]) and the corresponding element in path_timediff (here 435, at array index 6).
My ultimate purpose is to find all such occurrences of [49,50] in path and all the corresponding elements in path_timediff. How can I do that?
Assuming you have a primary key in your table you did not show:
CREATE TABLE datasets.travel (
travel_id serial PRIMARY KEY
, path integer[]
, path_timediff float8[]
);
Here is one way with generate_subscripts() in a LATERAL join:
SELECT t.travel_id, i+1 AS position, path_timediff[i+1] AS timediff
FROM (SELECT * FROM datasets.travel WHERE path @> ARRAY[49,50]) t
, generate_subscripts(t.path, 1) i
WHERE path[i:i+1] = ARRAY[49,50];
This finds all matches, not just the first.
i+1 works for a sub-array of length 2. Generalize with i + array_length(sub_array, 1) - 1.
The subquery is not strictly necessary, but can use a GIN index on (path) for a fast pre-selection:
(SELECT * FROM datasets.travel WHERE path @> ARRAY[49,50])
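A sketch of that supporting index (the index name is arbitrary):

CREATE INDEX travel_path_gin_idx ON datasets.travel USING gin (path);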
Related:
How to access array internal index with postgreSQL?
Parallel unnest() and sort order in PostgreSQL
PostgreSQL unnest() with element number

postgreSQL hstore if contains value

Is there a way to check whether a value already exists in the hstore, in the query itself?
I have to store various values per row (each row is an "item").
I need to be able to check whether the id already exists in the database in one of the hstore rows, without selecting everything first and doing loops etc. in PHP.
hstore seems to be the only data type that offers something like that and also allows you to select the column for that row into an array.
hstore may not be the best data type to store data like that, but there isn't anything better available.
The whole project uses 9.2 and I cannot change that; json is only in 9.3.
The exist() function tests for the existence of a key. To determine whether the key '42' exists anywhere in the hstore . . .
select *
from (select test_id, exist(test_hs, '42') key_exists
from test) x
where key_exists = true;
 test_id | key_exists
---------+------------
       2 | t
The svals() function returns values as a set. You can query the result to determine whether a particular value exists.
select *
from (select test_id, svals(test_hs) vals
from test) x
where vals = 'Wibble';
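As an alternative sketch, hstore's avals() returns the values as an array, so the same check can be written without a derived table:

select test_id
from test
where 'Wibble' = any (avals(test_hs));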
hstore Operators and Functions
create table test (
test_id serial primary key,
test_hs hstore not null
);
insert into test (test_hs) values (hstore('a', 'b'));
insert into test (test_hs) values (hstore('42', 'Wibble'));