Update semicolon-separated IDs in a column (SQL)

I've been trying to figure this problem out for a while now; I hope someone can help.
I have semicolon-separated CategoryIDs in my Article table (e.g. ;2;34;5;9;), and I have a Category table with an old CategoryID column and a new CategoryID column. Now I'd like to update the CategoryIDs in the Article table with the new CategoryIDs based on the original ones, so that e.g. ;2;34;5;9; becomes ;564;344;753;944;.
The reason I need this is that I'm merging tables from two databases with the same table setup.
I know this is a bad design, but it's been like this for ages. I can't really think of a solution for this; any help would be really appreciated.

Yes, it is a bad design.
I suppose you are using MySQL. I've used the MySQL-specific GROUP_CONCAT function, but you can emulate it in MSSQL as well. I also assume you need to replace ALL old IDs with new ones.
SQLFiddle demo
Tables:
create table article(id int, CategoryID varchar(200));
create table Category(OldCategoryId int, newCategoryId int);
and the update that replaces the groups:
update article,
  (select
     article.id,
     CONCAT(';',
            GROUP_CONCAT(cast(newCategoryID as char) separator ';'),
            ';') newGroups
   from category
   left join article
     on CategoryId like CONCAT('%;', CAST(oldCategoryId as char), ';%')
   group by article.id) t1
SET article.CategoryId = t1.newGroups
where article.id = t1.id;

For such matters MySQL provides the function find_in_set. SQL Server (if that's what you are working with), however, does not directly support an IN predicate against a CSV string, nor does it provide find_in_set; the IN predicate needs a rowset to work against.
Check the CTE examples at the link above.
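To illustrate, here is a minimal sketch of how FIND_IN_SET could match the old IDs in MySQL. The function expects a comma-separated list, so the semicolons are translated first; table and column names follow the question's schema:
-- join each article to its old/new category mapping rows
select a.id, c.OldCategoryId, c.newCategoryId
from article a
join Category c
  on find_in_set(c.OldCategoryId,
                 replace(trim(both ';' from a.CategoryID), ';', ',')) > 0;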

You need a script or program that constructs the old-to-new CategoryID mapping, then walks through your Article table and replaces each value by looking it up in the map.
A procedural program might do (if your DBMS supports stored procedures / PL/SQL); otherwise it may be easier to do this in Java / C# / PHP.
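As a hedged sketch of the in-database variant (MySQL syntax, reusing the question's table names): loop over the mapping and apply each old-to-new pair as a delimited string replacement. This assumes the old and new ID ranges do not overlap, otherwise a value could be rewritten twice.
DELIMITER //
CREATE PROCEDURE remap_category_ids()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE v_old INT;
  DECLARE v_new INT;
  DECLARE cur CURSOR FOR SELECT OldCategoryId, newCategoryId FROM Category;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO v_old, v_new;
    IF done THEN LEAVE read_loop; END IF;
    -- ';old;' -> ';new;'; the delimiter is re-inserted, so an adjacent
    -- ID is still matched by a later iteration of the loop
    UPDATE article
    SET CategoryID = REPLACE(CategoryID,
                             CONCAT(';', v_old, ';'),
                             CONCAT(';', v_new, ';'));
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;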

Related

How can I create a calculated column when creating a table in PostgreSQL, like SQL Server's LineTotal AS Price * Quantity? [duplicate]

Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx
Postgres 12 or newer
STORED generated columns were introduced with Postgres 12 - as defined in the SQL standard and implemented by some RDBMS, including DB2, MySQL, and Oracle. They are similar to the "computed columns" of SQL Server.
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
fiddle
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
Can also be supported with a matching expression index - provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
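To make the pattern concrete, here is a hypothetical sketch (using a fresh table t to avoid clashing with the tbl example above; all names are illustrative, not taken from the linked answer):
CREATE TABLE t (int1 int, int2 int);

-- the function takes the whole row type, so it can be called
-- with attribute notation: t.product
CREATE FUNCTION product(t) RETURNS bigint
  LANGUAGE sql IMMUTABLE
  AS $$ SELECT $1.int1::bigint * $1.int2 $$;

-- matching expression index, possible because the function is IMMUTABLE
CREATE INDEX ON t (product(t));

-- attribute notation: the function call reads like a column
SELECT t.product FROM t;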
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.
YES you can! The solution can be easy, safe, and performant.
I'm new to PostgreSQL, but it seems you can create computed columns by using an expression index paired with a view (the view is optional, but it makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
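For example (a hypothetical check; the actual plan depends on your data and planner settings):
EXPLAIN
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
-- look for a scan using some_string_field_md5_index in the output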
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works; they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note, however, that you can't update/insert into a plain view, only into the source table; if you really want to, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself), as sketched below.
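A hedged sketch of that rule-based redirect, assuming some_table has just the two columns used above (this is the untested idea from the paragraph, not a recommendation):
CREATE RULE some_table_augmented_ins AS
ON INSERT TO some_table_augmented
DO INSTEAD
  -- forward the insert to the source table; the md5 column is derived,
  -- so it is simply not written
  INSERT INTO some_table (some_field, some_string_field)
  VALUES (NEW.some_field, NEW.some_string_field);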
Edit: it seems that if the query involves competing indexes, the planner engine may sometimes not use the expression index at all. The choice seems to be data dependent.
One way to do this is with a trigger!
CREATE TABLE computed(
  one SERIAL,
  two INT NOT NULL
);

CREATE OR REPLACE FUNCTION computed_two_trg()
  RETURNS trigger
  LANGUAGE plpgsql
  SECURITY DEFINER
AS $BODY$
BEGIN
  NEW.two := NEW.one * 2;
  RETURN NEW;
END
$BODY$;

CREATE TRIGGER computed_500
  BEFORE INSERT OR UPDATE
  ON computed
  FOR EACH ROW
  EXECUTE PROCEDURE computed_two_trg();
The trigger fires before the row is inserted or updated. It sets the computed field on the NEW record and then returns that record.
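A quick usage sketch: because the BEFORE trigger fills in two before the NOT NULL constraint is checked, the column can simply be omitted on insert:
INSERT INTO computed DEFAULT VALUES;  -- trigger sets two = one * 2
SELECT one, two FROM computed;        -- e.g. returns (1, 2)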
PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) STORED
);
db<>fiddle demo
Well, I'm not sure if this is what you mean, but Postgres normally supports "dummy" ETL syntax.
I created one empty column in the table and then needed to fill it with calculated records depending on the values in each row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so basic I suspect it is not what you are looking for.
Obviously it is not dynamic; you run it once. But there is no obstacle to putting it into a trigger.
Example of creating an empty virtual column
,(SELECT *
From (values (''))
A("virtual_col"))
Example of creating two virtual columns with values
SELECT *
From (values (45,'Completed')
, (1,'In Progress')
, (1,'Waiting')
, (1,'Loading')
) A("Count","Status")
order by "Count" desc
I have code that works and uses the term CALCULATED. I'm not on pure PostgreSQL though; we run on PADB.
Here is how it's used:
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest ) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;
A lightweight solution with a CHECK constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);

Postgres ATOMIC stored procedure INSERT INTO . . . SELECT with one parameter and one set of rows from a table

I am trying to write a stored procedure to let a dev assign new user identities to a specified group when they don't already have one (i.e. insert a parameter and the output of a select statement into a joining table) without hand-writing every pair of foreign keys. I know how I'd do it in T-SQL / SQL Server, but I'm working with a preexisting, unfamiliar Postgres database. I would strongly prefer to keep my stored procedures as LANGUAGE SQL / BEGIN ATOMIC, and this, plus online examples being simplified and/or using constants, has made it difficult for me to get my bearings.
Apologies in advance for the length; this is me trying to articulate why I do not believe this question is a duplicate, based on what I've been able to find searching on my own, but I may have overcorrected.
Schema (abstracted from the most identifying parts; these are not the original table names and I am not in a position to change what anything is called; I am also leaving out indexing for simplicity's sake) is like:
create table IF NOT EXISTS user_identities (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY NOT NULL,
  [more columns not relevant to this query]
);
create table IF NOT EXISTS user_groups (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY NOT NULL,
  name TEXT NOT NULL
);
create table IF NOT EXISTS group_identities (
  user_id BIGINT REFERENCES user_identities(id) ON DELETE RESTRICT NOT NULL,
  group_id BIGINT REFERENCES user_groups(id) ON DELETE RESTRICT NOT NULL
);
Expected dev behavior:
Add all predetermined identities intended to belong to a group in a single batch
Add identifying information for the new group (it is going to take a lot of convincing to bring the people involved around to using nested stored procedures for this if I ever can)
Bring the joining table up to date accordingly (what I've been asked to streamline).
If this were SQL Server I would do the following (error handling omitted for brevity, and putting aside whether EXCEPT or NOT IN would be best for now):
create OR alter proc add_identities_to_group
  @group_name varchar(50) = NULL
as BEGIN
  declare @use_group_id int
  if @group_name is NULL
    set @use_group_id = (select Top 1 id from user_groups
                         where id not in (select group_id from group_identities)
                         order by id asc)
  ELSE
    set @use_group_id = (select id from user_groups where name = @group_name)
  insert into group_identities (group_id, user_id)
  select @use_group_id, id from user_identities
  where id not in (select user_id from group_identities)
END
GO
Obviously this is not going to fly in Postgres; part of why I want to stick with atomic stored procedures is staying in "neutral" SQL, both to be closer to my comfort zone and because I don't know what other languages the database is currently set up for, but my existing education has played kind of fast and loose with differentiating what was T-SQL specific at any point.
I am aware that this is not going to run for a wide variety of reasons because I'm still trying to internalize the syntax, but the bad/conceptual draft I have written so that I have anything to stare at is:
create OR replace procedure add_identities_to_groups(
group_name text default NULL ) language SQL
BEGIN ATOMIC
declare use_group_id integer
if group_name is NULL
set use_group_id = (select Top 1 id from user_groups
where id not in (select user_id from group_identities)
order by id asc)
ELSE set use_group_id = (select id from user_groups where name = group_name) ;
insert into group_identities (group_id, user_id)
select use_group_id, id from user_identities
where id not in (select user_id from group_identities)
END ;
GO ;
Issues:
I have not found answers for how to do this with the combination of a single variable and a column under BEGIN ATOMIC, nor hard confirmation that it wouldn't work (e.g. can atomic stored procedures just not accept parameters? I cannot find an answer to this on my own). (This is part of why existing answers that I can find here and elsewhere haven't been clarifying for me.)
~~Don't know how to compensate for Postgres's not differentiating variables and parameters from column names at all. (This is why examples using a hardcoded constant haven't helped, and they make up virtually all of what I can find off StackOverflow itself.)~~ Not a problem if Postgres will handle that intelligently within the atomic block but that's one of the things I hadn't been able to confirm on my own.
Google results for "vanilla" SQL are unpredictably saturated with SQL Server anyway, and my lack of familiarity with Postgres is not doing me any favors, but I don't know anyone personally who has more experience than I do.
because I don't know what other languages the database is currently set up for
All supported Postgres versions always include PL/pgSQL.
If you want to use procedural elements like variables or conditional statements like IF, you need PL/pgSQL. So your procedure has to be defined with language plpgsql - which removes the possibility of using the ANSI-standard BEGIN ATOMIC syntax.
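For contrast, here is a minimal sketch of what a LANGUAGE SQL / BEGIN ATOMIC procedure (Postgres 14+) can express: parameters are fine, but there are no variables or IF, so the group would have to be passed in or resolved inline (hypothetical signature, reusing the question's tables):
create or replace procedure add_all_identities(p_group_id bigint)
language sql
begin atomic
  -- assign every identity that has no group yet to the given group
  insert into group_identities (group_id, user_id)
  select p_group_id, ui.id
  from user_identities ui
  where not exists (select 1 from group_identities gi where gi.user_id = ui.id);
end;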
Don't know how to compensate for Postgres's not differentiating variables and parameters from column names at all.
You don't. Most people simply use naming conventions for that. In my environment we use p_ for parameters and l_ for "local" variables. Use whatever you prefer.
Quote from the manual
By default, PL/pgSQL will report an error if a name in an SQL statement could refer to either a variable or a table column. You can fix such a problem by renaming the variable or column, or by qualifying the ambiguous reference, or by telling PL/pgSQL which interpretation to prefer.
The simplest solution is to rename the variable or column. A common coding rule is to use a different naming convention for PL/pgSQL variables than you use for column names. For example, if you consistently name function variables v_something while none of your column names start with v_, no conflicts will occur.
As documented in the manual the body for a procedure written in PL/pgSQL (or any other language that is not SQL) must be provided as a string. This is typically done using dollar quoting to make writing the source easier.
As documented in the manual, if you want to store the result of a single row query in a variable, use select ... into from ....
As documented in the manual, an IF statement needs a THEN.
As documented in the manual there is no TOP clause in Postgres (or standard SQL). Use limit or the standard compliant fetch first 1 rows only instead.
To avoid a clash between names of variables and column names, most people use some kind of prefix for parameters and variables. This also helps to identify them in the code.
In Postgres it's usually faster to use NOT EXISTS instead of NOT IN.
In Postgres statements are terminated with ;. GO isn't a SQL command in SQL Server either - it's a client side thing supported by SSMS. To my knowledge, there is no SQL tool that works with Postgres that supports the GO "batch terminator" the same way SSMS does.
So a direct translation of your T-SQL code to PL/pgSQL could look like this:
create or replace procedure add_identities_to_groups(p_group_name text default NULL)
language plpgsql
as
$$ --<< start of PL/pgSQL code
declare --<< start a block for all variables
l_use_group_id integer;
begin --<< start the actual code
if p_group_name is NULL THEN --<< then required
select id
into l_use_group_id
from user_groups ug
where not exists (select * from group_identities gi where gi.group_id = ug.id)
order by ug.id asc
limit 1;
ELSE
select id
into l_use_group_id
from user_groups
where name = p_group_name;
end if;
insert into group_identities (group_id, user_id)
select l_use_group_id, id
from user_identities ui
where not exists (select * from group_identities gi where gi.user_id = ui.id);
END;
$$
;
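Usage would then look like this (the group name is hypothetical):
call add_identities_to_groups();             -- NULL default: picks the first group without identities
call add_identities_to_groups('new_hires');  -- targets a named group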

Add column with substring of other column in SQL (Snowflake)

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
One obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though, and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could change the CREATE TABLE table2 AS ... to INSERT OVERWRITE INTO table1 ..., and at least that wouldn't store an extra copy of the whole thing; a sketch of that idea follows.
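For illustration, a hedged sketch of that INSERT OVERWRITE idea in Snowflake (the column list is abbreviated; in practice every remaining column of table1 would be spelled out):
INSERT OVERWRITE INTO table1
SELECT SUBSTR(id_col, 2) AS id_col, some_col1, some_col2
FROM table1;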
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.
I'm not sure of the exact syntax/functions in Snowflake, but generally speaking there are a few different ways of achieving this.
I guess the general approach that would work universally is using the SUBSTRING function that's available in any database.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
| Code  | Desc                                    |
+-------+-----------------------------------------+
| 0001  | 1First Character Will be Removed        |
| 0002  | xCharacter to be Removed                |
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the "SUBSTRING" function may vary across databases. In Oracle, for example, the function is "SUBSTR". You just need to find the Snowflake equivalent.
Another approach that would work at least in SQL Server and MySQL would be using the "RIGHT" function:
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you want to update the actual data within the table. In that case you can use the same function as above in an update statement:
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))
You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-
There is no need to specify the length, as is evidenced from the following simple test harness:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;

SQL-92 (Filemaker): How can I UPDATE a list of sequential numbers?

I need to re-assign all SortIDs, starting from 1 up to MAX(SortID), for a subset of records of table Beleg, using SQL-92, after one of the SortIDs has changed (for example from 444 to 444.1). I have tried several ways (for example SET @a:=0; UPDATE table SET field=@a:=@a+1 WHERE whatever='whatever' ORDER BY field2), but it didn't work, as these solutions all need a particular dialect of SQL, like SQL Server or Oracle, etc.
The SQL that I use is SQL-92, implemented in FileMaker (INSERT and UPDATE are available, though, but nothing fancy).
Thanks for any hint!
Gary
From what I know, SQL-92 is a standard, not a language. So you can say you are using T-SQL, which is mostly SQL-92 compliant, but you can't say you program SQL Server in SQL-92. The same applies to FileMaker.
I suppose you are trying to update your table through ODBC? The UPDATE statement looks OK, but there are no variables in FileMaker SQL (and I am not sure using a variable inside the query would give you the result you expect; I think you would set SortID in every row to 1). You are thinking of something like window functions with row() in T-SQL, but I do not think this functionality is available.
The easiest solution is to use FileMaker itself; resetting the numbering for a column is really a trivial task which takes seconds. Do you need help with this?
Edit:
I was referring to the T-SQL functions rank() and row_number(); there is no row() function in T-SQL.
I finally got the answer from Ziggy Crueltyfree Zeitgeister on the Database Administrators copy of my question.
He suggested breaking this down into multiple steps, using a temporary table to store the results:
CREATE TABLE sorting (sid numeric(10,10), rn int);
INSERT INTO sorting (sid, rn)
SELECT SortID, RecordNumber FROM Beleg
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210
ORDER BY SortID;
UPDATE Beleg SET SortID = (SELECT rn FROM sorting WHERE sid=Beleg.SortID)
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210;
DROP TABLE sorting;
Of course! I just keep the table definition in FileMaker (letting FileMaker do the type coercion that way), and do the filling and deleting from it with my function: RenumberSortID ().

Stored Procedure for converting rows to columns in SQL Server

I'm looking for an efficient way to convert rows to columns in SQL Server. I did this in Toad for Oracle, but now I need it in SQL Server.
This is my example:
CID SENTENCE
1 Hello; Hi;
2 Why; What;
The result should be like
CID SENTENCE
1 Hello
1 Hi
2 Why
2 What
Would you please help me with it?
I would advise you to rethink your database design. It's almost never a good idea to store data in a delimited string in any relational database.
If it's impossible to change your database design, you will need a UDF to split the strings.
There are many different approaches to splitting strings in SQL Server; read this article on the differences between the common ways.
You can probably change your chosen split-string function to take the cid as well as the sentence as a variable and have it return the data exactly in your desired output shape for each row in your table.
Then all you have to do is select from your table with an inner join to the UDF on the cid, as sketched below.
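For instance, on SQL Server 2016+ the built-in STRING_SPLIT can stand in for a hand-written UDF (a sketch against the question's sample data; the table name YourTable is hypothetical, and the WHERE clause drops the empty entry produced by the trailing semicolon):
SELECT t.CID, LTRIM(s.value) AS SENTENCE
FROM YourTable t
CROSS APPLY STRING_SPLIT(t.SENTENCE, ';') s
WHERE s.value <> '';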
try
declare @var table (CID int, SENTENCE varchar(50))
insert into @var(CID,SENTENCE) values
  (1,'Hello; Hi;'),
  (2,'Why; What;')

select cid, t.c.value('.','varchar(50)') as val
from (select cid,
             -- strip the trailing ';' with STUFF, then turn each remaining ';'
             -- into an XML tag boundary so every item becomes its own <t> node
             x = cast('<t>' + replace(stuff(sentence, len(sentence), 1, ''), ';', '</t><t>') + '</t>' as xml)
      from @var) a
cross apply x.nodes('/t') t(c)