When to store data in a dedicated table vs. serialized? How to make the call? - sql

I'm using a PostgreSQL install to house my data.
My Post model has an attribute called "selection" which currently stores data in a TEXT column in the form "x1,x2,x3,x4,x5...".
When I need to access this data I split it on the comma and do my thing with it.
I'm prototyping an app, so I quickly did the easiest thing while writing it, but now I can see an alternative: create a table for "selections" and associate it back to the post, with an individual row for each bit.
My question is: how or when do I make the choice between storing data like this or not?
Thank you

PostgreSQL has array types - so you can use a "text[]" type
postgres=# create table xxx(a text[]);
CREATE TABLE
postgres=# insert into xxx values(array['x1','x2']);
INSERT 0 1
postgres=# insert into xxx values(array['x1','x2','x3']);
INSERT 0 1
postgres=# select * from xxx where 'x1' = ANY(a);
a
------------
{x1,x2}
{x1,x2,x3}
(2 rows)
postgres=# select * from xxx where 'x3' = ANY(a);
a
------------
{x1,x2,x3}
(1 row)
You can use an index for large tables too.
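For instance, a GIN index supports the array containment operators; note that 'x1' = ANY(a) is not directly index-supported, while the equivalent containment form is:
CREATE INDEX xxx_a_gin_idx ON xxx USING gin (a);
-- index-supported form of the queries above:
SELECT * FROM xxx WHERE a @> ARRAY['x1'];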

If those values represent data elements that live in other tables in your database, then I would never store them as a comma-separated string.
SQL in general is optimized for set-based operations, not for string parsing.
The only scenario that I can think of where the string version may be easier/faster is if you want to find a specific set of values and ONLY those values, i.e. Col = 'A1, B2, C3, d4'.
Otherwise, if you want to check individual fields or do other comparisons, storing that data in a normalized table is the best course of action. It's more extensible, easier and more efficient to check for specific values, and will make other operations on that table quicker (since you store less data in-row for that main table).
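A normalized layout for the original question might look like this (a sketch; the posts table and all names here are assumptions):
CREATE TABLE selections (
    id      serial PRIMARY KEY
  , post_id integer NOT NULL REFERENCES posts(id)  -- assumed parent table
  , value   text NOT NULL
);

-- all selections for one post:
SELECT value FROM selections WHERE post_id = 1;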

Related

How can I create a calculated column when creating a table in PostgreSQL, like SQL Server's LineTotal AS Price * Quantity? [duplicate]

Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx
Postgres 12 or newer
STORED generated columns were introduced with Postgres 12 - as defined in the SQL standard and implemented by some RDBMSs, including DB2, MySQL, and Oracle; similar to the "computed columns" of SQL Server.
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
fiddle
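Usage is then transparent; the column is computed on write. A quick check with the table above:
INSERT INTO tbl (int1, int2) VALUES (3, 4);
SELECT product FROM tbl;  -- returns 12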
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
Can also be supported with a matching expression index - provided the function is IMMUTABLE. A minimal sketch, assuming the tbl(int1, int2) example table from above:
CREATE FUNCTION col(tbl) RETURNS bigint LANGUAGE sql IMMUTABLE AS
$$SELECT $1.int1 * $1.int2$$;  -- your computed expression here

CREATE INDEX ON tbl (col(tbl));
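The emulated column can then be queried with attribute notation, like a real column (assuming the function sketched above):
SELECT tbl.col FROM tbl;   -- attribute notation
SELECT col(tbl) FROM tbl;  -- equivalent functional notation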
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.
YES you can!! The solution should be easy, safe, and performant...
I'm new to postgresql, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works... they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note, however, that you can't update/insert into a view, only into the source table, but if you really want, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself).
Edit: it seems that if the query involves competing indexes, the planner may sometimes not use the expression index at all. The choice seems to be data-dependent.
One way to do this is with a trigger!
CREATE TABLE computed(
one SERIAL,
two INT NOT NULL
);
CREATE OR REPLACE FUNCTION computed_two_trg()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
BEGIN
NEW.two = NEW.one * 2;
RETURN NEW;
END
$BODY$;
CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger is fired before the row is inserted or updated. It sets the field we want to compute on the NEW record and then returns that record.
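A quick check (assuming the table starts empty; any value supplied for two is overwritten):
INSERT INTO computed (two) VALUES (0);
SELECT one, two FROM computed;  -- one = 1, two = 2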
PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
db<>fiddle demo
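Note that you cannot write to a generated column directly; its value always comes from the expression. A quick check (ignoring the elided columns):
INSERT INTO people (height_cm) VALUES (254);
SELECT height_in FROM people;  -- 100 (= 254 / 2.54)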
Well, not sure if this is what you mean, but Postgres normally supports this kind of "dummy" ETL syntax.
I created an empty column in a table and then needed to fill it with calculated values depending on other values in the row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so basic I suspect it is not what you are looking for.
Obviously it is not dynamic; you run it once. But there is no obstacle to putting it into a trigger.
Example of adding an empty virtual column to a select list (some_table is a placeholder):
SELECT t.*
, (SELECT * FROM (VALUES ('')) a("virtual_col"))
FROM some_table t;
Example of creating two virtual columns with values:
SELECT *
From (values (45,'Completed')
, (1,'In Progress')
, (1,'Waiting')
, (1,'Loading')
) A("Count","Status")
order by "Count" desc
I have code that works and uses the term CALCULATED. I'm not on pure PostgreSQL, though; we run on PADB.
Here is how it's used:
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest ) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;
A lightweight solution with a CHECK constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);
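Any attempt to store a different value is rejected:
INSERT INTO example (discriminator) VALUES (1);
-- ERROR: new row for relation "example" violates check constraint "example_discriminator_check"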

How to insert into existing table column value

I am VERY new to SQL and I am having a little trouble.
Let's say I have a table called data, with two columns: class and name.
I want to create the column math if it doesn't exist, and give it a value of John.
I can do this with:
INSERT INTO data VALUES ('math','John')
But if I change John to Steve, I want math to have a value of "John","Steve".
But instead, it creates another row called "math" with a value of "Steve", how can I make this insert into the same column?
Thanks
I would strongly recommend against storing a CSV list of names in your table. CSV is hard to query, update, and maintain. Actually, there is nothing at all wrong with your current approach:
class | name
math | John
math | Steve
Data in this format can easily be queried, because it is relational. And if you need to add, update, or remove a name associated with a class, it becomes a one record affair without having to deal with CSV. Note that if you really need a CSV representation, you can still achieve that using SQLite's GROUP_CONCAT() function:
SELECT class, GROUP_CONCAT(name, ',') AS names
FROM yourTable
GROUP BY class
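With the sample data above, that yields one row per class (the order of names within the list is not guaranteed):
class | names
------+------------
math  | John,Steve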
Not totally sure about SQLite, but the normal command is usually something like:
ALTER TABLE table_name
ADD column_name datatype;
Then INSERT INTO as before.

Store SQL query result (1 column) as Array

After running my query I get a one-column result:
5
6
98
101
Is there a way to store this result as array so that I can use it later
in queries like
WHERE NOT IN ('5','6','98','101')
I am aware of storing single variable results but is this possible?
I cannot use a #Table variable, as I will be rerunning the query in the future and it goes out of scope.
There are multiple ways of storing that column data, like using temporary tables, a view, or a table-valued function, but IMO there is no need to store that column data anywhere. You can use the column directly in a subquery, as below, or perform a JOIN, which would be a much better option than NOT IN (see the anti-join sketch after the example):
select * from
table2
where some_column not in (select column1 from this_table);
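The anti-join could look like this (a sketch, reusing the assumed table and column names from above). Unlike NOT IN, it also behaves sanely when column1 contains NULLs:
SELECT t2.*
FROM table2 t2
LEFT JOIN this_table t1 ON t1.column1 = t2.some_column
WHERE t1.column1 IS NULL;  -- keeps only rows with no match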
While this method is not recommended, storing an array in a single column can be done using CSV (comma-separated values). Simply create a VARCHAR column and store a string containing the values in a specific order, each value separated by a comma. You can later fetch the string and parse it with a string parser, e.g. the .split() function in Python. AGAIN, I do not recommend doing this; I would instead use multiple columns, one for each value, and access them that way.
Using separate columns would make it easy to use in a Stored Procedure.

Using string values inside "IN" in Informix

In database table in Informix I have columns like this:
Table name is MYTABLE
key 1234
value 'POCO','LOCD','MACD'
Now I want to use this in a query like this
select * from table where symbol in (select value from MYTABLE where key='1234');
But this query is not working as value is stored as char and
output of select value from MYTABLE where key='1234' would be something like
"'POCO','LOCD',MACD'"
Is there a way to make this work. I want to achieve this in a single query.
Please suggest a better approach.
You cannot interpolate values like that. The database optimiser cannot be expected to know that value is going to return a string that looks like a list, which is to be interpreted in a list context.
Since you ask for a better approach…
That design breaks numerous fundamental rules about how databases should be structured. At a minimum, the 'value' column should be a COLLECTION data type, so that its role as a list of values is properly articulated. Personally I would create a standard, relational bridging table:
MYTABLE
key    col1  col2
1234   ..    ..

MYVALUE
key    value
1234   POCO
1234   LOCD
1234   MACD
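With that structure, the original query becomes a plain subquery (a sketch; some_table stands in for the unnamed table in the question):
SELECT t.*
FROM some_table t
WHERE t.symbol IN (SELECT v.value FROM myvalue v WHERE v.key = 1234);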
This is not the easy way out suggested by others, but it is the right answer.

PostgreSQL - best way to return an array of key-value pairs

I'm trying to select a number of fields, one of which needs to be an array with each element of the array containing two values. Each array item needs to contain a name (character varying) and an ID (numeric). I know how to return an array of single values (using the ARRAY keyword) but I'm unsure of how to return an array of an object which in itself contains two values.
The query is something like
SELECT
t.field1,
t.field2,
ARRAY(--with each element containing two values i.e. {'TheName', 1 })
FROM MyTable t
I read that one way to do this is by selecting the values into a type and then creating an array of that type. Problem is, the rest of the function is already returning a type (which means I would then have nested types - is that OK? If so, how would you read this data back in application code - i.e. with a .Net data provider like NPGSQL?)
Any help is much appreciated.
ARRAYs can only hold elements of the same type
Your example displays a text and an integer value (no single quotes around 1). It is generally impossible to mix types in an array. To get those values into an array you have to create a composite type and then form an ARRAY of that composite type like you already mentioned yourself.
Alternatively you can use the data types json in Postgres 9.2+, jsonb in Postgres 9.4+ or hstore for key-value pairs.
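For example, with the jsonb functions available in Postgres 9.5+ (table and column names here are hypothetical):
SELECT jsonb_agg(jsonb_build_object('name', t.name, 'id', t.id)) AS pairs
FROM mytable t;
-- e.g. [{"id": 1, "name": "TheName"}, {"id": 2, "name": "Other"}]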
Of course, you can cast the integer to text, and work with a two-dimensional text array. Consider the two syntax variants for array input in the demo below and consult the manual on array input.
There is a limitation to overcome. If you try to aggregate an ARRAY (built from key and value) into a two-dimensional array, the aggregate function array_agg() or the ARRAY constructor errors out:
ERROR: could not find array type for data type text[]
There are ways around it, though.
Aggregate key-value pairs into a 2-dimensional array
PostgreSQL 9.1 with standard_conforming_strings = on:
CREATE TEMP TABLE tbl(
id int
,txt text
,txtarr text[]
);
The column txtarr is just there to demonstrate syntax variants in the INSERT command. The third row is spiked with meta-characters:
INSERT INTO tbl VALUES
(1, 'foo', '{{1,foo1},{2,bar1},{3,baz1}}')
,(2, 'bar', ARRAY[['1','foo2'],['2','bar2'],['3','baz2']])
,(3, '}b",a{r''', '{{1,foo3},{2,bar3},{3,baz3}}'); -- txt has meta-characters
SELECT * FROM tbl;
Simple case: aggregate two integers (I use the same one twice) into a two-dimensional int array:
Update: Better with custom aggregate function
With the polymorphic type anyarray it works for all base types:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[id,id]]) AS x -- for int
,array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS y -- or text
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array.
Update for Postgres 9.5+
Postgres now ships a variant of array_agg() accepting array input and you can replace my custom function from above with this:
The manual:
array_agg(expression)
...
input arrays concatenated into array of one
higher dimension (inputs must all have same dimensionality, and cannot
be empty or NULL)
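With that, the earlier call simplifies to (same sample table):
SELECT array_agg(ARRAY[id, id])        AS x  -- int
     , array_agg(ARRAY[id::text, txt]) AS y  -- text
FROM tbl;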
I suspect that without having more knowledge of your application I'm not going to be able to get you all the way to the result you need. But we can get pretty far. For starters, there is the ROW function:
# SELECT 'foo', ROW(3, 'Bob');
?column? | row
----------+---------
foo | (3,Bob)
(1 row)
So that right there lets you bundle a whole row into a cell. You could also make things more explicit by making a type for it:
# CREATE TYPE person(id INTEGER, name VARCHAR);
CREATE TYPE
# SELECT now(), row(3, 'Bob')::person;
now | row
-------------------------------+---------
2012-02-03 10:46:13.279512-07 | (3,Bob)
(1 row)
Incidentally, whenever you make a table, PostgreSQL makes a type of the same name, so if you already have a table like this you also have a type. For example:
# DROP TYPE person;
DROP TYPE
# CREATE TABLE people (id SERIAL, name VARCHAR);
NOTICE: CREATE TABLE will create implicit sequence "people_id_seq" for serial column "people.id"
CREATE TABLE
# SELECT 'foo', row(3, 'Bob')::people;
?column? | row
----------+---------
foo | (3,Bob)
(1 row)
See in the third query there I used people just like a type.
Now this is not likely to be as much help as you'd think for two reasons:
1. I can't find any convenient syntax for pulling data out of the nested row.
I may be missing something, but I just don't see many people using this syntax. The only example I see in the documentation is a function taking a row value as an argument and doing something with it. I don't see an example of pulling the row out of the cell and querying against parts of it. It seems like you can package the data up this way, but it's hard to deconstruct after that. You'll wind up having to make a lot of stored procedures.
2. Your language's PostgreSQL driver may not be able to handle row-valued data nested in a row.
I can't speak for NPGSQL, but since this is a very PostgreSQL-specific feature you're not going to find support for it in libraries that support other databases. For example, Hibernate isn't going to be able to handle fetching an object stored as a cell value in a row. I'm not even sure the JDBC would be able to give Hibernate the information usefully, so the problem could go quite deep.
So, what you're doing here is feasible provided you can live without a lot of the niceties. I would recommend against pursuing it though, because it's going to be an uphill battle the whole way, unless I'm really misinformed.
A simple way without hstore
SELECT
jsonb_agg(to_jsonb (t))
FROM (
SELECT
unnest(ARRAY ['foo', 'bar', 'baz']) AS table_name
) t
>>> [{"table_name": "foo"}, {"table_name": "bar"}, {"table_name": "baz"}]