Keep strings case-insensitively unique in a database - sql

I'd like to ensure that when a user wants to register a username in my system (web application), the user name is unique even if case is not regarded. So if a user named "SuperMan" is already registered, no other user must be allowed to register as "superman" or "SUPERman". This must be checked at database level.
In my current implementation, I do the following:
select count(*) from user where lower(name) = lower(?) for update;
-- If the count is greater than 0, abort with an error
-- Determine a new ID for the user
insert into user (id, name, …) values (?, ?, …);
I'm not sure whether the "for update" will lock the database far enough so that other users won't be able to register with an invalid name between the two SQL statements above. Probably it's no 100% safe solution. Unfortunately I cannot use unique keys in SQL because they will only compare case-sensitively.
Is there another solution to this? How about the following, to add more safety?
select count(*) from user where lower(name) = lower(?) for update;
-- If the count is greater than 0, abort with an error
-- Determine a new ID for the user
insert into user (id, name, …) values (?, ?, …);
-- Now count again
select count(*) from user where lower(name) = lower(?);
-- If the count is now greater than 1, do the following:
delete from user where id = ?;
-- Or alternatively roll back the transaction

The way I've done that is make a second column for the username that's either converted to all caps or all lower case. That I put a unique index on that. I use a trigger to generate the value in this other column so the code doesn't have to worry about it.
Come to think of it, you could use a function based index to get the same result:
CREATE UNIQUE INDEX user_name_ui1 ON user (lower(name))

Your assumption that an unique constraint will compare case sensitively is incorrect. They will compare according to the collation of the column. All commercial databases worth talking about support collations. Simply choose a case insensitive collation for your column, and declare a unique constraint on it. See:
Character Sets and Collations That MySQL Supports
Using SQL Server Collations
Oracle Linguistic Sorting and String Searching
SQLite collating sequences
...
Doing the enforcement by a lookup and before insert is not only extremely inefficient, is also incorrect under concurrency.

You can also change the way unique constraints work by changing the collation for that column: Unique constraint on table column

Related

Postgres ATOMIC stored procedure INSERT INTO . . . SELECT with one parameter and one set of rows from a table

I am trying to write a stored procedure to let a dev assign new user identities to a specified group when they don't already have one (i.e. insert a parameter and the output of a select statement into a joining table) without hand-writing every pair of foreign keys as values to do so. I know how I'd do it in T-SQL/SQL Server but I'm working with a preexisting/unfamiliar Postgres database. I would strongly prefer to keep my stored procedures as LANGUAGE SQL/BEGIN ATOMIC and this + online examples being simplified and/or using constants has made it difficult for me to get my bearings.
Apologies in advance for length, this is me trying to articulate why I do not believe this question is a duplicate based on what I've been able to find searching on my own but I may have overcorrected.
Schema (abstracted from the most identifying parts; these are not the original table names and I am not in a position to change what anything is called; I am also leaving out indexing for simplicity's sake) is like:
create table IF NOT EXISTS user_identities (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY NOT NULL,
[more columns not relevant to this query)
)
create table IF NOT EXISTS user_groups (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY NOT NULL,
name TEXT NOT NULL
)
create table IF NOT EXISTS group_identities (
user_id BIGINT REFERENCES user_identities(id) ON DELETE RESTRICT NOT NULL,
group_id BIGINT REFERENCES user_groups(id) ON DELETE RESTRICT NOT NULL
)
Expected dev behavior:
Add all predetermined identities intended to belong to a group in a single batch
Add identifying information for the new group (it is going to take a lot of convincing to bring the people involved around to using nested stored procedures for this if I ever can)
Bring the joining table up to date accordingly (what I've been asked to streamline).
If this were SQL Server I would do (error handling omitted for time and putting aside whether EXCEPT or NOT IN would be best for now, please)
create OR alter proc add_identities_to_group
#group_name varchar(50) NULL
as BEGIN
declare #use_group_id int
if #group_name is NULL
set #use_group_id = (select Top 1 id from user_groups where id not in (select group_id from group_identities) order by id asc)
ELSE set #use_group_id = (select id from user_groups where name = #group_name)
insert into group_identities (user_id, group_id)
select #use_group_id, id from user_identities
where id not in (select user_id from group_identities)
END
GO
Obviously this is not going to fly in Postgres; part of why I want to stick with atomic stored procedures is staying in "neutral" SQL, both to be closer to my comfort zone and because I don't know what other languages the database is currently set up for, but my existing education has played kind of fast and loose with differentiating what was T-SQL specific at any point.
I am aware that this is not going to run for a wide variety of reasons because I'm still trying to internalize the syntax, but the bad/conceptual draft I have written so that I have anything to stare at is:
create OR replace procedure add_identities_to_groups(
group_name text default NULL ) language SQL
BEGIN ATOMIC
declare use_group_id integer
if group_name is NULL
set use_group_id = (select Top 1 id from user_groups
where id not in (select user_id from group_identities)
order by id asc)
ELSE set use_group_id = (select id from user_groups where name = group_name) ;
insert into group_identities (group_id, user_id)
select use_group_id, id from user_identities
where id not in (select user_id from group_identities)
END ;
GO ;
Issues:
Have not found either answers for how to do this with the combination of a single variable and a column with BEGIN ATOMIC or hard confirmation that it wouldn't work (e.g. can atomic stored procedures just not accept parameters? I cannot find an answer to this on my own). (This is part of why existing answers that I can find here and elsewhere haven't been clarifying for me.)
~~Don't know how to compensate for Postgres's not differentiating variables and parameters from column names at all. (This is why examples using a hardcoded constant haven't helped, and they make up virtually all of what I can find off StackOverflow itself.)~~ Not a problem if Postgres will handle that intelligently within the atomic block but that's one of the things I hadn't been able to confirm on my own.
Google results for "vanilla" SQL unpredictably saturated with SQL Server anyway, while my lack of familiarity with Postgres is not doing me any favors but I don't know anyone personally who has more experience than I do.
because I don't know what other languages the database is currently set up for
All supported Postgres versions always include PL/pgSQL.
If you want to use procedural elements like variables or conditional statements like IF you need PL/pgSQL. So your procedure has to be defined with language plpgsql - that removes the possibility to use the ANSI standard BEGIN ATOMIC syntax.
Don't know how to compensate for Postgres's not differentiating variables and parameters from column names at all.
You don't. Most people simply using naming conventions to do that. In my environment we use p_ for parameters and l_ for "local" variables. Use whatever you prefer.
Quote from the manual
By default, PL/pgSQL will report an error if a name in an SQL statement could refer to either a variable or a table column. You can fix such a problem by renaming the variable or column, or by qualifying the ambiguous reference, or by telling PL/pgSQL which interpretation to prefer.
The simplest solution is to rename the variable or column. A common coding rule is to use a different naming convention for PL/pgSQL variables than you use for column names. For example, if you consistently name function variables v_something while none of your column names start with v_, no conflicts will occur.
As documented in the manual the body for a procedure written in PL/pgSQL (or any other language that is not SQL) must be provided as a string. This is typically done using dollar quoting to make writing the source easier.
As documented in the manual, if you want to store the result of a single row query in a variable, use select ... into from ....
As documented in the manual an IF statement needs a THEN
As documented in the manual there is no TOP clause in Postgres (or standard SQL). Use limit or the standard compliant fetch first 1 rows only instead.
To avoid a clash between names of variables and column names, most people use some kind of prefix for parameters and variables. This also helps to identify them in the code.
In Postgres it's usually faster to use NOT EXISTS instead of NOT IN.
In Postgres statements are terminated with ;. GO isn't a SQL command in SQL Server either - it's a client side thing supported by SSMS. To my knowledge, there is no SQL tool that works with Postgres that supports the GO "batch terminator" the same way SSMS does.
So a direct translation of your T-SQL code to PL/pgSQL could look like this:
create or replace procedure add_identities_to_groups(p_group_name text default NULL)
language plpgsql
as
$$ --<< start of PL/pgSQL code
declare --<< start a block for all variables
l_use_group_id integer;
begin --<< start the actual code
if p_group_name is NULL THEN --<< then required
select id
into l_use_group_id
from user_groups ug
where not exists (select * from group_identities gi where gi.id = ug.user_id)
order by ug.id asc
limit 1;
ELSE
select id
into l_use_group_id
from user_groups
where name = p_group_name;
end if;
insert into group_identities (group_id, user_id)
select l_use_group_id, id
from user_identities ui
where not exists (select * from group_identities gi where gi.user_id = ui.id);
END;
$$
;

Why do you only need double quotation marks in SQL for particular cases?

I have a column in my database table named UID. For some reason queries fail unless I surround the column name with double quotation marks (" "). None of the other columns require these quotation marks.
For example, this doesn't work:
SELECT user_name FROM user_table WHERE UID = '...'
But this does:
SELECT user_name FROM user_table WHERE "UID" = '...'
Is UID some kind of keyword? Why is it only happening to that column? How do I know if I need to use double quotes for other columns?
By the way, I'm running JDK 1.8_221 and using an oracle JDBC driver if that makes a difference.
Yes, it is about keywords. You can double quote everything (tables, columns) to avoid this but I can understand you don't want to do this.
To have a list of standard keywords: SQL Keywords
But you can see UID is not in this list as I assume it is a reserved keyword by your database implementation. I had the same problem with a table called "order" as it contains orders. ORDER is a keyword so I had to quote it each time.
So best is to test your statements using a SQL client tool.
Since you mention Oracle: Oracle keywords: "You can obtain a list of keywords by querying the V$RESERVED_WORDS data dictionary view."
If your create table command for user_table looks something like this:
create table user_table ("UID" varchar2(10))
then you will have to use quotes around UID in your query. This query:
select * from user_table where UID = 'somestring'
means to use the Oracle predefined UID pseudo column and your table's UID column will not be accessed.
If your table doesn't have a user-defined UID column, then using "UID" should fail.
My guess is your table does indeed have a UID column and when you say it "doesn't work" without using the quotes you probably mean it motivates an ORA-1722.
The type of failure, when using UID without quotes, depends on the content of the string 'somestring'. If the content of that string can be cast as a number then you probably won't get the rows you expect. If it cannot be cast as a number then you'll get an ORA-1722.
As an aside, if you try to execute this, then you'll get an ORA-904:
create table user_table (UID number)
Yes, it is keywords and return
UID returns an integer that uniquely identifies the session user (the
user who logged on).
By default, Oracle identifiers (table names, column names, etc.) are case-insensitive. You can make them case-sensitive by using quotes around them when creating them (eg: SELECT * FROM "My_Table" WHERE "my_field" = 1). SQL keywords (SELECT, WHERE, JOIN, etc.) are always case-insensitive.
You can use it for more information here.

oracle - Unique constraints ORA-00001 vs Case Sensitive Records

I tried to insert into a table with a unique constraints on 2 columns and encountered the unique constraints error.
Select distinct query returns the following records:
Record 1:
ColA:A001 ColB:TV set A001
Record 2:
ColA:A001 ColB:Tv set A001
Does oracle do case sensitive comparison when validating against unique constraints? For instance, in the above scenario, we can see all the values are the same except the ColB (TV verus Tv). Distinct showes they are two different records whereas unique constraints seems to think they are the same?
Can anyone help to clarify?
Thanks!
By default, Oracle is case sensitive. So, TV is not the same as Tv. If the constraint is implemented as:
create unique index table_colb on table(colb)
(or the equivalent in the create table statement), then the two will be different. If you want case insensitivity, then use a functional index:
create unique index table_colb on table(lower(colb))
Oracle is Case Sensitive You may check The DDL SQL Fiddle Demo.
Check the NLS_COMP ans NLS_SORT
select * from NLS_INSTANCE_PARAMETERS where parameter in ('NLS_SORT','NLS_COMP');

informix check if table exists and then read the value

I have a table in informix (Version 11.50.UC4) called NextRecordID with just one column called id and it will have one row. What I want to do is copy this value into another table. But don't want my query to fail if this table does not exist. Something like
if table NextRecordID exists
then insert into sometable values ('NextRecordID', (select id from NextRecordID))
else insert into sometable values ('NextRecordID', 1)
I ended up using the below SQL query. Its not ANSI SQL but works the informix server I am using.
insert into sometable values ('NextRecordID',
select case (select 1 from systables where tabname='nextrecordid')
when 1 then (select nextid from nextrecordid)
else (select 1 from systables where tabname='systables') end
from systables where tabname='systables');
What is happening here is within insert query I get the value to be inserted by using select query. Now that select query is interesting. It uses case statement of Informix. I have written a select query to check if the table nextrecordid exists in systables and return 1 if it exists. If this query returns 1, I query the table nextrecordid for the value or else I wrote a query to return the default value 1. This work for me.
You should be able to do this by checking the systables table.
Thank you for including server version information - it makes answering your question easier.
You've not indicated which language(s) you are using.
Normally, though, you design a program to expect a certain schema (certain tables to be present), and then fail - preferably under control - if those tables are not present. Also, it is not clear whether you would get into problems because of repeated execution of the second INSERT statement. Nor is it clear when the NextRecordID table is updated - presumably, once the value has been used, it must be updated.
You should look at SERIAL (BIGSERIAL) and see whether that is appropriate for you.
You should also look at whether a SEQUENCE would be appropriate to use here - it certainly looks rather like it might be applicable.
As Adam Hughes points out, if you want to check whether the NextRecordID table is present in the database, you would look in the systables table. Be aware, though, that your search will need to be against an all lower-case name (nextrecordid).
Also, MODE ANSI databases complicate life - you have to worry about the table's owner (because there could be multiple tables called nextrecordid in a MODE ANSI database). Most likely, you don't have to worry about that - any more than you are likely to have to worry about delimited identifiers for table "someone"."NextRecordID" (which is a different table from someone.NextRecordID).

MySQL - Set default value for field as a string concatenation function

I have a table that looks a bit like this actors(forename, surname, stage_name);
I want to update stage_name to have a default value of
forename." ".surname
So that
insert into actors(forename, surname) values ('Stack', 'Overflow');
would produce the record
'Stack' 'Overflow' 'Stack Overflow'
Is this possible?
Thanks :)
MySQL does not support computed columns or expressions in the DEFAULT option of a column definition.
You can do this in a trigger (MySQL 5.0 or greater required):
CREATE TRIGGER format_stage_name
BEFORE INSERT ON actors
FOR EACH ROW
BEGIN
SET NEW.stage_name = CONCAT(NEW.forename, ' ', NEW.surname);
END
You may also want to create a similar trigger BEFORE UPDATE.
Watch out for NULL in forename and surname, because concat of a NULL with any other string produces a NULL. Use COALESCE() on each column or on the concatenated string as appropriate.
edit: The following example sets stage_name only if it's NULL. Otherwise you can specify the stage_name in your INSERT statement, and it'll be preserved.
CREATE TRIGGER format_stage_name
BEFORE INSERT ON actors
FOR EACH ROW
BEGIN
IF (NEW.stage_name IS NULL) THEN
SET NEW.stage_name = CONCAT(NEW.forename, ' ', NEW.surname);
END IF;
END
According to 10.1.4. Data Type Default Values no, you can't do that. You can only use a constant or CURRENT_TIMESTAMP.
OTOH if you're pretty up-to-date, you could probably use a trigger to accomplish the same thing.
My first thought is if you have the two values in other fields what is the compelling need for redundantly storing them in a third field? It flies in the face of normalization and efficiency.
If you simply want to store the concatenated value then you can simply create a view (or IMSNHO even better a stored procedure) that concatenates the values into a pseudo actor field and perform your reads from the view/sproc instead of the table directly.
If you absolutely must store the concatenated value you could handle this in two ways:
1) Use a stored procedure to do your inserts instead of straight SQL. This way you can receive the values and construct a value for the field you wish to populate then build the insert statement including a concatenated value for the actors field.
2) So I don't draw too many flames, treat this suggestion with kid gloves. Use only as a last resort. You could hack this behavior by adding a trigger to build the value if it is left null. Generally, triggers are not good. They add unseen cost and interactions to fairly simple interactions. You can, though, use the CREATE TRIGGER to update the actors field after a record is inserted or updated. Here is the reference page.
As of MySQL 8.0.13, you can use DEFAULT clause for a column which can be a literal constant or an expression.
If you want to use an expression then, simply enclose the required expression within parentheses.
(concat(forename," ",surname))
There are two ways to accomplish what you are trying to do as per my knowledge:
(important: consider backing up your table first before running below queries)
1- Drop the column "stage_name" all together and create a new one with DEFAULT constraint.
ALTER TABLE actors ADD COLUMN stage_name VARCHAR(20) DEFAULT (concat(forename," ",surname))
2- This will update newer entries in the column "stage_name" but not the old ones.
ALTER TABLE actors alter stage_name set DEFAULT (concat(forename," ",surname));
After that, if you need to update the previous values in the column "stage_name" then simply run:
UPDATE actors SET stage_name=(concat(forename," ",surname));
I believe this should solve your problem.