In Postgres, can you update a CITEXT value to a different casing?

Running a PostgreSQL database.
I have a table with CITEXT columns for case-insensitivity. When I try to update a CITEXT value to the same word in a different casing, it does not work. Postgres reports 1 row updated, as it targeted 1 row, but the value is not changed.
E.g.
Table schema - users:

Column    | Type
----------+-------------------
user_id   | PRIMARY KEY SERIAL
user_name | CITEXT
age       | INT

Example row:

user_id | user_name | age
--------+-----------+----
1       | ayanaMi   | 99
SQL command:
UPDATE users SET user_name = 'Ayanami' WHERE user_id = 1
The above command reports 1 row updated, but the casing does not change. I assume this is because Postgres sees them as the same value.
The docs state:
If you'd like to match case-sensitively, you can cast the operator's arguments to text.
https://www.postgresql.org/docs/9.1/citext.html
I can force a case sensitive search by using CAST as such:
SELECT * FROM users WHERE CAST(user_name AS TEXT) = 'Ayanami'
[returns no rows]
Is there a way to force case sensitive updating?
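For example, would combining the documented cast with the update be the right approach? A rough sketch of what I mean (just an illustration, not tested):

-- Sketch only: use the text cast so the comparison is case-sensitive,
-- i.e. the row is targeted only when the stored casing differs
UPDATE users
SET    user_name = 'Ayanami'
WHERE  user_id = 1
  AND  user_name::text <> 'Ayanami';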

Related

SQL - for each row, make slug from a different column

I am new to writing complex SQL queries. I am on Postgres 14 and I am executing a migration file built around the following concept.
I want to add a string field named slug which will be required, like:
$this->addSql('ALTER TABLE tag ADD COLUMN slug VARCHAR NOT NULL');
As the table is already populated and my new column needs to be required,
I want to:
create a nullable column, then update the slug column with valid not-null values, and finally ALTER the column to set the NOT NULL constraint!
My table (with potential slug column):
| id | name              | type | slug             |
|----|-------------------|------|------------------|
| 1  | LARPs: The Series |      | larps-the-series |
| 2  | #MyHaliburton.    |      | my-haliburton    |
Catch:
The added column called "slug" is a 'slugified' version of the name (all lower case, punctuation removed, spaces replaced with dashes).
I started with:
UPDATE tag SET slug = lower(name),
    slug = replace(slug, ':', '-'),
    slug = replace(slug, '#', ''),
...
Is this the right way to cover all the cases? And then, how do I do it for all the fields? Should I use FOR EACH?
Thanks
You can do the slug generation in one command, combining a call to LOWER with two calls to REGEXP_REPLACE: one strips leading and trailing non-alphabetic characters, and the other replaces runs of non-alphabetic characters inside the string with a - and also inserts a - where a lower-case character is followed by an upper-case character:
ALTER TABLE tag ADD COLUMN slug VARCHAR;

UPDATE tag
SET slug = LOWER(
        REGEXP_REPLACE(
            REGEXP_REPLACE(name, '^[^A-Za-z]+|[^A-Za-z]+$', '', 'g'),
            '[^A-Za-z]+|(?<=[a-z])(?=[A-Z])', '-', 'g')
    );

ALTER TABLE tag ALTER COLUMN slug SET NOT NULL;
Output (for your sample input):

| id | name              | type | slug             |
|----|-------------------|------|------------------|
| 1  | LARPs: The Series |      | larps-the-series |
| 2  | #MyHaliburton.    |      | my-haliburton    |
Demo on db-fiddle

How can I update the table in SQL?

I've created a table called Channel for Youtubers; the code is below:
create table Channel (
    codChannel int primary key,
    name varchar(50) not null,
    age float not null,
    subscribers int not null,
    views int not null
)
In this table, there are 2 channels:
| codChannel | name            | age | subscribers | views       |
|------------|-----------------|-----|-------------|-------------|
| 1          | PewDiePie       | 28  | 58506205    | 16654168214 |
| 2          | Grandtour Games | 15  | 429         | 29463       |
So, I want to edit the age of "Grandtour Games" to "18". How can I do that with update?
Is my code right?
update age from Grandtour Games where age='18'
No, in update, you'll have to follow this sequence:
update tableName set columnWanted = 'newValue' where columnName = 'elementName'
In your code, put this:
update Channel set age=18 where name='Grandtour Games'
Comments below:
/* Channel is the name of the table you'll update,
   SET assigns the new value to age in your case,
   WHERE name='Grandtour Games' specifies that the Channel you want to update is the one named Grandtour Games */
ALTER TABLE changes the schema (adding, updating, or removing columns or keys, that kind of thing).
UPDATE changes the data in the table without changing the schema.
So the two are really quite different.
Here is your answer -
-> ALTER is a DDL (Data Definition Language) statement, while UPDATE is a DML (Data Manipulation Language) statement.
-> ALTER is used to change the structure of the table (add/remove a field or index, etc.), whereas UPDATE is used to change the data.
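For illustration, a minimal contrast between the two (the added column here is only a hypothetical example):

-- DDL: ALTER changes the table's structure (hypothetical new column)
ALTER TABLE Channel ADD country VARCHAR(50);

-- DML: UPDATE changes the data stored in existing rows
UPDATE Channel SET age = 18 WHERE name = 'Grandtour Games';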
Hope this helps!

Report Query - variable nr of columns

I have a table which "indexes" the captured documents for an employer's employees. Each document has a unique ID, a URI, a date captured, a staffid (FK), and a document type column named "tag".
gv2=# \d staffdoc
Table "zoozooland_789166.staffdoc"
   Column   |            Type             |       Modifiers
------------+-----------------------------+-----------------------
 id         | uuid                        | not null
 asseturi   | text                        | not null
 created    | timestamp without time zone | not null
 created_by | uuid                        | not null
 tag        | text                        | not null
 is_active  | boolean                     | not null default true
 staffid    | uuid                        | not null
Foreign-key constraints:
"staffdoc_staffid_fkey" FOREIGN KEY (staffid) REFERENCES staffmember(id)
Inherits: assetdoc
I want to write a report that will be used to highlight missing documents.
For each staffmember the report should have a column for each document tag type, so the nr of columns is unknown up front.
Currently I do all of this in the application - generate a list of all possible tags (SELECT DISTINCT tag FROM table), generate a list of all possible staff-IDs, then for each staff ID I run multiple queries to get the document with the biggest value in the created column for each tag value.
I'm pretty sure I should at least be able to optimise it to one query per document type (tag value), fetching the most recent document for each staff id, which would be a good-enough optimisation.
The typical scenario is 4 or 5 document tag values (document types), so running 5 queries is much more acceptable than running 5 x nr-of-staff queries.
In the final report I have the following columns:
Staff-member Name, doctype1, doctype2, doctype3, etc.
The name is "joined" from the staffmember table. The value in the doctype columns is the LAST (MAX) value of dates for that doc tag for that staff member, or "None" if the document is missing for that staffmember.
FWIW I'm using Postgres 9.5
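For the per-tag step I imagine something along these lines, using DISTINCT ON (just a sketch; 'contract' is a made-up tag value):

-- Most recent active document per staff member for a single tag value
SELECT DISTINCT ON (d.staffid)
       d.staffid,
       d.created,
       d.asseturi
FROM   staffdoc d
WHERE  d.tag = 'contract'
  AND  d.is_active
ORDER  BY d.staffid, d.created DESC;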

Check constraint for a flag column

The database is MS SQL Server.
Data example:
| Name | defaultValue | value    |
|------|--------------|----------|
| one  | true         | valone   |
| one  | false        | valtwo   |
| one  | false        | valthree |
I'm after a way of constraining the table such that each 'Name' can only have one row with 'defaultValue' set to true.
Create a computed column like this:
ALTER TABLE yourtable
ADD ValueCheck AS CASE defaultValue
                      WHEN 1 THEN 1
                      WHEN 0 THEN NULL
                  END
and then add a unique constraint on (Name, ValueCheck).
I liked Michael's idea, but it will only allow you one false value per Name in SQL Server. To avoid this, how about using
ALTER TABLE yourtable
ADD [ValueCheck] AS
    (CASE [defaultValue] WHEN (1) THEN ('~Default#?#') /* Magic string! */
          ELSE value END) PERSISTED
and then adding a unique constraint on (Name, ValueCheck).
I am assuming that Name, value combinations will be unique. If the value column does not allow NULLs, then using NULL rather than the magic string would be preferable; otherwise choose a string that cannot appear in the data (e.g. 101 characters long if the value column only allows 100 chars).
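For completeness, the unique-constraint step for either variant might look something like this (the constraint name is arbitrary):

-- Enforce at most one row per (Name, ValueCheck) pair
ALTER TABLE yourtable
ADD CONSTRAINT UQ_yourtable_Name_ValueCheck UNIQUE (Name, ValueCheck);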
You can use a TRIGGER to validate this constraint on update or insert events and roll back the transaction if the constraint would be violated.
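A rough sketch of such a trigger in T-SQL, assuming the same table and columns as above (the trigger name is made up):

-- Reject any statement that leaves a Name with more than one default row
CREATE TRIGGER trg_one_default_per_name
ON yourtable
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT Name
        FROM   yourtable
        WHERE  defaultValue = 1
        GROUP  BY Name
        HAVING COUNT(*) > 1
    )
    BEGIN
        RAISERROR ('Only one default row is allowed per Name.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;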

Making PostgreSQL a little more error tolerant?

This is sort of a general question that has come up in several contexts; the example below is representative but not exhaustive. I am interested in any ways of learning to work with Postgres on imperfect (but close enough) data sources.
The specific case -- I am using Postgres with PostGIS for working with government data published in shapefiles and XML. Using the shp2pgsql module distributed with PostGIS (for example on this dataset) I often get a schema like this:
   Column   |         Type          |
------------+-----------------------+-
 gid        | integer               |
 st_fips    | character varying(7)  |
 sfips      | character varying(5)  |
 county_fip | character varying(12) |
 cfips      | character varying(6)  |
 pl_fips    | character varying(7)  |
 id         | character varying(7)  |
 elevation  | character varying(11) |
 pop_1990   | integer               |
 population | character varying(12) |
 name       | character varying(32) |
 st         | character varying(12) |
 state      | character varying(16) |
 warngenlev | character varying(13) |
 warngentyp | character varying(13) |
 watch_warn | character varying(14) |
 zwatch_war | bigint                |
 prog_disc  | bigint                |
 zprog_disc | bigint                |
 comboflag  | bigint                |
 land_water | character varying(13) |
 recnum     | integer               |
 lon        | numeric               |
 lat        | numeric               |
 the_geom   | geometry              |
I know that at least 10 of those varchars -- the fips, elevation, population, etc., should be ints; but when trying to cast them as such I get errors. In general I think I could solve most of my problems by allowing Postgres to accept an empty string as a default value for a column -- say 0 or -1 for an int type -- when altering a column and changing the type. Is this possible?
If I create the table before importing with the type declarations generated from the original data source, I get better types than with shp2pgsql, and can iterate over the source entries feeding them to the database, discarding any failed inserts. The fundamental problem is that if I have 1% bad fields, evenly distributed over 25 columns, I will lose 25% of my data since a given insert will fail if any field is bad. I would love to be able to make a best-effort insert and fix any problems later, rather than lose that many rows.
Any input from people having dealt with similar problems is welcome -- I am not a MySQL guy trying to batter PostgreSQL into making all the same mistakes I am used to -- just dealing with data I don't have full control over.
Could you produce a SQL file from shp2pgsql and do some massaging of the data before executing it? If the data is in COPY format, it should be easy to parse and change "" to "\N" (insert as null) for columns.
Another possibility would be to use shp2pgsql to load the data into a staging table where all the fields are defined as just 'text' type, and then use an INSERT...SELECT statement to copy the data to your final location, with the possibility of massaging the data in the SELECT to convert blank strings to null etc.
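A minimal sketch of that staging-table approach, assuming a staging table named places_staging loaded by shp2pgsql with every column as text, and a typed destination table named places:

-- Copy from the all-text staging table into the typed table,
-- turning empty strings into NULLs before the cast
INSERT INTO places (name, elevation, population)
SELECT name,
       NULLIF(elevation, '')::int,
       NULLIF(population, '')::int
FROM   places_staging;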
I don't think there's a way to override the behaviour of how strings are converted to ints and so on: possibly you could create your own type or domain, and define an implicit cast that was more lenient... but this sounds pretty nasty, since the types are really just artifacts of how your data arrives in the system and not something you want to keep around after that.
You asked about fixing it up when changing the column type: you can do that too, for example:
steve#steve#[local] =# create table test_table(id serial primary key, testvalue text not null);
NOTICE: CREATE TABLE will create implicit sequence "test_table_id_seq" for serial column "test_table.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_table_pkey" for table "test_table"
CREATE TABLE
steve#steve#[local] =# insert into test_table(testvalue) values('1'),('0'),('');
INSERT 0 3
steve#steve#[local] =# alter table test_table alter column testvalue type int using case testvalue when '' then 0 else testvalue::int end;
ALTER TABLE
steve#steve#[local] =# select * from test_table;
 id | testvalue
----+-----------
  1 |         1
  2 |         0
  3 |         0
(3 rows)
Which is almost equivalent to the "staging table" idea I suggested above, except that now the staging table is your final table. Altering a column type like this requires rewriting the entire table anyway: so actually, using a staging table and reformatting multiple columns at once is likely to be more efficient.