Can SQL determine which values from a set of possible column values do not exist?

I have a unique column. I also have a known set of elements that are possible values for the column. I need to know which of the possible values are not already in the table, and as such, are suitable for insertion.
Is this possible with SQL or is post processing required?
Currently, I am using the "in" operator to select all rows where the column value equals an element in my set. Then I remove all matched elements from my set via post processing.

Stick the allowed values in a temporary table allowed, then use a subquery with NOT IN:
SELECT *
FROM allowed
WHERE allowed.val NOT IN (
    SELECT maintable.val
    FROM maintable
)
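A quick sketch of setting up that temporary table first (column name assumed):
CREATE TEMP TABLE allowed (val text);
INSERT INTO allowed VALUES ('foo'), ('bar'), ('baz');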
Some DBs will allow you to build up a table "in place", instead of having to create a separate table. E.g. in PostgreSQL (any version):
SELECT *
FROM (
    SELECT 'foo' AS val
    UNION ALL SELECT 'bar'
    UNION ALL SELECT 'baz' -- etc.
) inplace_allowed
WHERE inplace_allowed.val NOT IN (
    SELECT maintable.val
    FROM maintable
)
More modern versions of PostgreSQL (and perhaps other DBs) will let you use the slightly nicer VALUES syntax to do the same thing.
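For example, a sketch using VALUES (column name assumed):
SELECT val
FROM (VALUES ('foo'), ('bar'), ('baz')) AS inplace_allowed(val)
WHERE val NOT IN (
    SELECT maintable.val
    FROM maintable
)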

To do this entirely in SQL you will need to create a separate table with one column. Each row holds one value from the known set of elements. Assuming the table is called ElementList and the other table is called Existing:
SELECT * FROM ElementList WHERE Element NOT IN
(SELECT DISTINCT Element FROM Existing)
Depending on what database engine you're using you may be able to use a temporary table to create and hold the list without saving it permanently in the database. However, storing the list of allowed elements is valuable for constraining the Element column in the Existing table (and for presenting the user with allowed Elements in the user interface).
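A hedged sketch of that constraint, reusing the names above:
CREATE TABLE ElementList (Element varchar(50) PRIMARY KEY);
CREATE TABLE Existing (
    Element varchar(50) REFERENCES ElementList (Element)
);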

Related

How can I create a calculated column when creating a table in PostgreSQL, for example SQL Server's LineTotal AS Price * Quantity [duplicate]

Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx
Postgres 12 or newer
STORED generated columns were introduced with Postgres 12, as defined in the SQL standard and implemented by some RDBMSs, including DB2, MySQL, and Oracle. SQL Server has the similar "computed columns".
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
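A quick usage sketch; the generated column cannot be written directly, only read:
INSERT INTO tbl (int1, int2) VALUES (3, 4);
SELECT product FROM tbl;  -- 12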
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
This can also be supported with a matching expression index, provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
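A minimal concrete sketch, using a hypothetical item table and expression:
CREATE TABLE item (price numeric, qty int);

CREATE FUNCTION total(item) RETURNS numeric
LANGUAGE sql IMMUTABLE AS
$$SELECT $1.price * $1.qty$$;

CREATE INDEX ON item (total(item));  -- possible because the function is IMMUTABLE

SELECT item.total FROM item;  -- attribute notation: looks like a column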
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.
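A minimal sketch, assuming a plain table tbl(int1, int2) without the generated column from above:
CREATE MATERIALIZED VIEW tbl_mv AS
SELECT int1, int2, int1 * int2 AS product
FROM tbl;

REFRESH MATERIALIZED VIEW tbl_mv;  -- recompute after the base table changes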
YES you can!! The solution should be easy, safe, and performant...
I'm new to PostgreSQL, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with EXPLAIN.
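For example, a hypothetical check:
EXPLAIN SELECT * FROM some_table
WHERE MD5(some_string_field) = MD5('some value');
-- the plan should show a scan on some_string_field_md5_index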
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works; they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note, however, that you can't update/insert into a view, only into the source table, but if you really want, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself).
Edit: it seems that if the query involves competing indexes, the planner may sometimes not use the expression index at all. The choice appears to be data-dependent.
One way to do this is with a trigger!
CREATE TABLE computed(
one SERIAL,
two INT NOT NULL
);
CREATE OR REPLACE FUNCTION computed_two_trg()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
BEGIN
NEW.two = NEW.one * 2;
RETURN NEW;
END
$BODY$;
CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger fires before the row is inserted or updated. It sets the field we want to compute on the NEW record and then returns that record.
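A quick usage sketch:
INSERT INTO computed (two) VALUES (0);  -- the trigger overwrites two with one * 2
SELECT one, two FROM computed;          -- first row: one = 1, two = 2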
PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
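A brief usage sketch (pretending the elided columns are absent):
INSERT INTO people (height_cm) VALUES (180);
SELECT height_cm, height_in FROM people;  -- 180 and roughly 70.87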
Well, not sure if this is what you mean, but Postgres normally supports this "dummy" ETL approach.
I created one empty column in the table and then filled it with calculated values depending on the other values in the row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so simple I suspect it is not what you are looking for.
Obviously it is not dynamic; you run it once. But there is no obstacle to putting it into a trigger.
Example of creating an empty virtual column:
,(SELECT *
  FROM (VALUES ('')) A("virtual_col"))
Example of creating two virtual columns with values:
SELECT *
FROM (VALUES (45, 'Completed')
           , (1, 'In Progress')
           , (1, 'Waiting')
           , (1, 'Loading')
     ) A("Count", "Status")
ORDER BY "Count" DESC
I have code that works and uses the term CALCULATED. I'm not on pure PostgreSQL, though; we run on PADB.
Here is how it's used:
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;
A lightweight solution with a CHECK constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);
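A quick sketch of how it behaves:
INSERT INTO example DEFAULT VALUES;              -- OK: discriminator = 0
INSERT INTO example (discriminator) VALUES (1);  -- rejected by the CHECK constraint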

How to subscript a Postgres column

I have a Postgres query:
SELECT main
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t
Which returns a single column of values in the format (value_a, value_b) in each row. I want the outer query to format those values so that all the value_a's and value_b's are in their own separate columns.
Is there an easy way to do this?
You can abuse row_to_json to do this, but it is probably best to avoid anonymous record types in the first place.
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t
To give a concrete example (after running pgbench -i):
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
SELECT
CASE WHEN 1=1 THEN (aid, bid)
END as main
FROM pgbench_accounts
LIMIT 100) inner_t;
But it only works in v10 and up.
This is more of an explanation than an actual answer. But it won't fit into a comment.
The thing is, SQL is a strictly typed language. Postgres demands to know the number and data types in the SELECT list at call time. The *-expansion in SELECT * FROM .. is based on registered types. Postgres knows the columns of a table because the structure is saved in the catalog tables.
The expression nested in your construct (col_a, col_b) is short for ROW(col_a, col_b) and a ROW constructor creates an anonymous record. The manual:
By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type — either the row type of a table, or a composite type created with CREATE TYPE AS.
Postgres does not know how to expand an anonymous record. *-expansion does not work.
You could cast like the manual says. But that's only an option if the type is stable, i.e. you always put in the same number of columns with the same data types. And that still would not preserve column names.
So, for the best solution, first define:
Obviously you want to preserve column values.
Do you also want to preserve column names?
Do you also want to preserve column types?
And:
Is the number of columns in the expression always the same?
Are data types always the same?
Is the CASE condition stable, or based on other columns?
If the true aim of the game is to fit multiple values into a single CASE expression and you only care about the values, create a text array instead:
SELECT main[1] AS col_a, main[2] AS col_b
FROM (
SELECT CASE WHEN true THEN ARRAY[col_a::text, col_b::text] END AS main
FROM table1
LIMIT 100
) inner_t;
You lose name and type. You can cast and add aliases if you know name & type.
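For instance, a sketch assuming col_a is an integer and col_b is text:
SELECT main[1]::int AS col_a, main[2] AS col_b
FROM (
   SELECT CASE WHEN true THEN ARRAY[col_a::text, col_b::text] END AS main
   FROM table1
   LIMIT 100
) inner_t;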
Otherwise, you have to describe your use case more closely in the question.
Try the query below. A record has no direct text form that split_part() accepts, so cast it to text and trim the surrounding parentheses first. Note that this simple parsing breaks if the values themselves contain commas or quotes:
SELECT split_part(btrim(main::text, '()'), ',', 1) AS val1
     , split_part(btrim(main::text, '()'), ',', 2) AS val2
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t

SqlQuery and SqlFieldsQuery

It looks like SqlQuery only supports SQL that starts with select *? Doesn't it support SQL that selects only some columns, like
select id, name from person, and map the columns to the corresponding POJO?
If I use SqlFieldsQuery to run SQL, the result is a QueryCursor of List (each List contains one record of the result). But if the SQL starts with select *, then this List's contents differ from those of a fields query like:
select id, name, age from person
For select *, each List is constructed with 3 parts:
the first element is the key of the cache
the second element is the POJO object that contains the data
the trailing elements are the values for each column.
Why was it designed this way? If I don't know what SQL SqlFieldsQuery runs, then I need additional effort to figure out what the List contains.
SqlQuery returns key and value objects, while SqlFieldsQuery allows to select specific fields. Which one to use depends on your use case.
Currently select * indeed includes the predefined _key and _val fields, and this will be improved in the future. However, it's generally good practice to list the fields you want to fetch when running SQL queries (this is true for any SQL database, not only Ignite). This way your code is protected from unexpected behavior if the schema changes, for example.

Oracle Selecting Columns with IN clause which includes NULL values

So I am comparing two Oracle databases by grabbing random rows in database A and searching for those rows in database B based on their key columns. Then I compare the returned rows in Java.
I am using the following query to find rows in database B using the key columns from database A:
select * from mytable
Where (Key_Column_A,Key_Column_B,Key_Column_C)
in (('1','A', 'cat'),('2','B', 'dog'),('3','C', ''));
This works just fine for the first two sets of keys, but the third key ('3','C', '') does not work because there is a null value in the third column. Changing the statement to ('3','C', NULL) or changing the SQL to
select * from mytable
Where (Key_Column_A,Key_Column_B,Key_Column_C)
in ((('1','A', 'cat'),('2','B', 'dog'),('3','C', ''))
OR (Key_Column_A,Key_Column_B,Key_Column_C) IS NULL);
will not work either.
Is there a way to include a null column in an IN clause? And if not, is there a way to do the same thing efficiently? (My only current solution is to check that there are no nullable columns in my keys, which would make this process rather inefficient and somewhat messy.)
You can use NVL to substitute a sentinel value that never occurs in the real data. I think this would work:
select * from mytable
Where (NVL(Key_Column_A,'~'),NVL(Key_Column_B,'~'),NVL(Key_Column_C,'~'))
in (('1','A', 'cat'),('2','B', 'dog'),('3','C', '~'));
Note that Oracle treats '' as NULL, so NVL(col, '') would be a no-op; a real sentinel is required.
I am not sure about (Key_Column_A,Key_Column_B,Key_Column_C) IS NULL. Wouldn't this imply that all of the columns (A,B,C) are NULL?

sqlite data from one db to another

I have 2 sqlite databases, and I'm trying to insert data from one database to another. For example, "db-1.sqlite" has a table '1table' with 2 columns ('name', 'state'). Also, "db-2.sqlite" has a table '2table' with 2 columns ('name', 'url'). Both tables contain a list of 'name' values that are mostly common with each other but randomized, so the id of each row does not match.
I want to insert the values for the 'url' column into the db-1's table, but I want to make sure each url value goes to its corresponding 'name' value.
So far, I have done this:
> sqlite3 db-1.sqlite
sqlite> alter table 1table add column url;
sqlite> attach database 'db-2.sqlite' as db2;
Now, the part I'm not sure about:
sqlite> insert into 1table(url) select db2.2table.url from db2.2table where 1table.name==db2.2table.name
If you look at what I wrote above, you can tell what I'm trying to accomplish, but it is incorrect. If I can get any help on the matter, I'd be very grateful!!
The equality comparison operator in SQL is =, not ==.
Also, I suspect that you should be updating 1table rather than inserting into it.
Finally, your table names start with digits, so you need to quote them.
This SQL should work better:
update `1table`
set url = (select db2.`2table`.url
from db2.`2table`
where `1table`.name = db2.`2table`.name);
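To sanity-check the result afterwards, something like:
select name from `1table` where url is null; -- names with no match in db2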