I have the following table:
Id  Category
1   some thing
2   value
This table contains a lot of rows, and what I'm trying to do is update all the Category values so that the first letter of every word is capitalized. For example, some thing should become Some Thing.
At the moment this is what I have:
UPDATE MyTable
SET Category = (SELECT UPPER(LEFT(Category,1))+LOWER(SUBSTRING(Category,2,LEN(Category))) FROM MyTable WHERE Id = 1)
WHERE Id = 1;
But there are two problems. First, UPPER(LEFT(Category,1)) only capitalizes the first letter of the whole value, so it only works for single-word values (hello => Hello, but hello world => Hello world instead of Hello World). Second, I'd need to run this query X times, changing the WHERE Id = X condition each time. So how can I update all the rows at once? I was thinking of a cursor, but I don't have much experience with them.
Here is a fiddle to play with.
You can split the words apart, capitalize each one, then stitch the words back together. And no, you shouldn't be worrying about subqueries, Id values or cursors: always approach updating a set of rows as a set-based operation, not one row at a time.
;WITH cte AS
(
SELECT Id, NewCat = STRING_AGG(CONCAT(
UPPER(LEFT(value,1)),
SUBSTRING(value,2,LEN(value))), ' ')
WITHIN GROUP (ORDER BY CHARINDEX(value, Category))
FROM
(
SELECT t.Id, t.Category, s.value
FROM dbo.MyTable AS t
CROSS APPLY STRING_SPLIT(Category, ' ') AS s
) AS x GROUP BY Id
)
UPDATE t
SET t.Category = cte.NewCat
FROM dbo.MyTable AS t
INNER JOIN cte ON t.Id = cte.Id;
This assumes your category doesn't contain non-consecutive duplicate words; for example, bora frickin bora would get messed up (whereas bora bora frickin would be fine). It also assumes a case-insensitive collation (which could be catered for if necessary).
In Azure SQL Database and in SQL Server 2022 or later, you can pass the enable_ordinal argument to STRING_SPLIT() and order by the ordinal column it returns; on older versions you have to rely on hacks like CHARINDEX().
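To see why ordering by CHARINDEX() breaks on non-consecutive duplicates while ordering by an ordinal does not, here is a small Python sketch that mimics both orderings in memory. It is an illustration, not T-SQL: the function names are hypothetical, and Python's sort is stable, whereas SQL Server guarantees no particular order among tied rows.

```python
def capitalize_word(word: str) -> str:
    # Same effect as CONCAT(UPPER(LEFT(value, 1)), SUBSTRING(value, 2, ...)):
    # only the first character changes; the rest of the word is kept as-is.
    return word[:1].upper() + word[1:]


def rebuild_by_first_occurrence(category: str) -> str:
    # Mimics WITHIN GROUP (ORDER BY CHARINDEX(value, Category)): every word is placed
    # where its FIRST occurrence appears, so repeated words end up next to each other.
    words = category.split(" ")
    ordered = sorted(words, key=category.find)
    return " ".join(capitalize_word(w) for w in ordered)


def rebuild_by_ordinal(category: str) -> str:
    # Mimics ordering by the ordinal column of STRING_SPLIT(..., ' ', 1):
    # each word keeps its own position, duplicates included.
    numbered = list(enumerate(category.split(" "), start=1))
    ordered = sorted(numbered, key=lambda pair: pair[0])
    return " ".join(capitalize_word(word) for _, word in ordered)


for value in ["some thing", "bora bora frickin", "bora frickin bora"]:
    print(value, "->", rebuild_by_first_occurrence(value), "|", rebuild_by_ordinal(value))
```

Only the last sample differs between the two strategies, which is exactly the duplicate case described above.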
Updated db<>fiddle (thank you for the head start!)
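The set-based point is not specific to SQL Server. Here is a minimal sketch using Python's built-in sqlite3 module. SQLite has neither STRING_SPLIT nor STRING_AGG, so it swaps in a hypothetical Python user-defined function, capitalize_words, registered with create_function. What it shows is that a single UPDATE rewrites every row, with no cursor and no per-Id loop.

```python
import sqlite3


def capitalize_words(value):
    # Hypothetical helper: upper-cases the first letter of each space-separated word
    # and leaves the rest of each word untouched; NULL stays NULL.
    if value is None:
        return None
    return " ".join(w[:1].upper() + w[1:] for w in value.split(" "))


conn = sqlite3.connect(":memory:")
conn.create_function("capitalize_words", 1, capitalize_words)
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Category TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [(1, "some thing"), (2, "value")])

# One set-based statement touches every row at once.
conn.execute("UPDATE MyTable SET Category = capitalize_words(Category)")

rows = conn.execute("SELECT Id, Category FROM MyTable ORDER BY Id").fetchall()
print(rows)
```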
I have the following table:
id symbol_01 symbol_02
1 abc xyz
2 kjh okd
3 que qid
I need a query that ensures symbol_01 and symbol_02 are both contained in a list of valid symbols. In other words, I would need something like this:
select *
from mytable
where symbol_01 in (
select valid_symbols
from somewhere)
and symbol_02 in (
select valid_symbols
from somewhere)
The above example would work correctly, but the subquery used to determine the list of valid symbols is identical both times and is quite large. It would be very inefficient to run it twice as in the example.
Is there a way to do this without duplicating two identical subqueries?
Another approach:
select *
from mytable t1
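where (case when t1.symbol_01 = t1.symbol_02 then 1 else 2 end) =
(select count(distinct symbol)
from valid_symbols vs
where vs.symbol in (t1.symbol_01, t1.symbol_02));
This assumes that the valid symbols are stored in a table valid_symbols that has a column named symbol. The CASE expression covers rows where both columns hold the same symbol, which count(distinct symbol) counts only once. The query would also benefit from an index on valid_symbols.symbol.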
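The count-based check has to account for rows whose two columns hold the same symbol, because count(distinct symbol) counts that symbol once. Below is a minimal sqlite3 sketch with hypothetical sample data (row 3 repeats 'abc'). It compares a constant 2 against a CASE-adjusted expectation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, symbol_01 TEXT, symbol_02 TEXT);
    CREATE TABLE valid_symbols (symbol TEXT PRIMARY KEY);
    INSERT INTO mytable VALUES (1, 'abc', 'xyz'), (2, 'kjh', 'okd'), (3, 'abc', 'abc');
    INSERT INTO valid_symbols VALUES ('abc'), ('xyz'), ('kjh');
    """
)

COUNT_QUERY = """
SELECT id FROM mytable t1
WHERE {expected} = (SELECT COUNT(DISTINCT symbol)
                    FROM valid_symbols vs
                    WHERE vs.symbol IN (t1.symbol_01, t1.symbol_02))
ORDER BY id
"""

# A constant 2 drops row 3, even though both of its (identical) symbols are valid.
plain = [r[0] for r in conn.execute(COUNT_QUERY.format(expected="2"))]
# Expecting 1 distinct match when both columns are equal keeps row 3.
adjusted = [r[0] for r in conn.execute(COUNT_QUERY.format(
    expected="CASE WHEN t1.symbol_01 = t1.symbol_02 THEN 1 ELSE 2 END"))]
print(plain, adjusted)
```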
You could try using a CTE like this:
WITH ValidSymbols AS (
SELECT DISTINCT valid_symbol
FROM somewhere
)
SELECT mt.*
FROM MyTable mt
INNER JOIN ValidSymbols v1
ON mt.symbol_01 = v1.valid_symbol
INNER JOIN ValidSymbols v2
ON mt.symbol_02 = v2.valid_symbol
From a performance perspective, your query is the right way to do this. I would write it as:
select *
from mytable t
where exists (select 1
from valid_symbols vs
where t.symbol_01 = vs.valid_symbol
) and
exists (select 1
from valid_symbols vs
where t.symbol_02 = vs.valid_symbol
) ;
The important component is that you need an index on valid_symbols(valid_symbol). With this index, the lookup should be pretty fast. Appropriate indexes can even work if valid_symbols is a view, although the effect depends on the complexity of the view.
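Here is a minimal sqlite3 sketch of the two-EXISTS query together with the recommended index. It uses the question's sample rows and a hypothetical list of valid symbols; SQLite simply stands in for whatever database the question uses. The query-plan printout is there only for inspection, and its wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, symbol_01 TEXT, symbol_02 TEXT);
    CREATE TABLE valid_symbols (valid_symbol TEXT);
    -- The index recommended above: each EXISTS probe can become an index lookup.
    CREATE INDEX ix_valid_symbols ON valid_symbols (valid_symbol);
    INSERT INTO mytable VALUES (1, 'abc', 'xyz'), (2, 'kjh', 'okd'), (3, 'que', 'qid');
    INSERT INTO valid_symbols VALUES ('abc'), ('xyz'), ('kjh'), ('que'), ('qid');
    """
)

rows = conn.execute(
    """
    SELECT t.id FROM mytable t
    WHERE EXISTS (SELECT 1 FROM valid_symbols vs WHERE vs.valid_symbol = t.symbol_01)
      AND EXISTS (SELECT 1 FROM valid_symbols vs WHERE vs.valid_symbol = t.symbol_02)
    ORDER BY t.id
    """
).fetchall()
print(rows)  # row 2 is excluded because 'okd' is not a valid symbol

# Inspect how SQLite resolves a single lookup; the exact text varies by version.
for step in conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM valid_symbols WHERE valid_symbol = 'abc'"
):
    print(step)
```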
You seem to have a situation where you have two foreign key relationships. If you explicitly declare these relationships, then the database will enforce that the columns in your table match the valid symbols.
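Here is a sketch of the foreign-key route in SQLite. It assumes valid_symbols.symbol can be declared as a primary key. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas most other databases enforce declared constraints by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE valid_symbols (symbol TEXT PRIMARY KEY);
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        symbol_01 TEXT NOT NULL REFERENCES valid_symbols (symbol),
        symbol_02 TEXT NOT NULL REFERENCES valid_symbols (symbol)
    );
    INSERT INTO valid_symbols VALUES ('abc'), ('xyz');
    """
)

conn.execute("INSERT INTO mytable VALUES (1, 'abc', 'xyz')")  # both symbols valid: accepted

try:
    conn.execute("INSERT INTO mytable VALUES (2, 'abc', 'okd')")  # 'okd' is not valid
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("invalid row rejected:", rejected)
```

With the constraints in place, no query-time check is needed: invalid rows never get in.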
Consider the following:
create table tmp.x (i integer, t text);
create table tmp.y (i integer, t text);
delete from tmp.x;
delete from tmp.y;
insert into tmp.x values (1, 'hi');
insert into tmp.y values(1, 'there');
insert into tmp.y values(1, 'wow');
In the above, there is one row in table x, which I want to update. In table y, there are two rows, both of which I want to "feed data into" the update.
Below is my attempt:
update tmp.x
set t = x.t || y.t
from ( select * from tmp.y order by t desc ) y
where y.i = x.i;
select * from tmp.x;
I want the value of x.t to be 'hiwowthere' but the value ends up being 'hiwow'. I believe the cause of this is that the subquery in the update statement returns two rows (the y.t value of 'wow' being returned first), and the where clause y.i = x.i only matches the first row.
Can I achieve the desired outcome using a single update statement, and if so, how?
UPDATE: The use of the text type above was for illustration purposes only. I do not actually want to modify textual content, but rather JSON content using the json_set function that I posted here (How do I modify fields inside the new PostgreSQL JSON datatype?), although I'm hoping the principle could be applied to any function, such as the fictional concat_string(column_name, 'string-to-append').
UPDATE 2: Rather than waste time on this issue, I actually wrote a small function to accomplish it. However, it would still be nice to know if this is possible, and if so, how.
What you can do is to build up a concatenated string using string_agg, grouped by the integer i, which you can then join onto during the update:
update tmp.x
set t = x.t || y.txt
from (select i, string_agg(t, '') as txt
from(
select tmp.y.i,tmp.y.t
from tmp.y
order by t desc
) z
group by z.i) y
where y.i = x.i ;
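The ORDER BY in the derived table is not guaranteed to carry through to string_agg; to preserve the order reliably, put it inside the aggregate call, as in string_agg(t, '' order by t desc). SqlFiddle here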
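The original attempt yields 'hiwow' because of how PostgreSQL handles UPDATE ... FROM. When several source rows join to one target row, only one of them is used, and which one is not predictable. Every candidate value is also computed from the pre-update x.t. Aggregating first removes that ambiguity. The Python sketch below simulates both behaviours in memory. The dictionaries are hypothetical stand-ins for tmp.x and tmp.y, and the "keep the first candidate" rule is just one possible outcome of the unpredictable choice.

```python
# Hypothetical in-memory stand-ins for tmp.x and tmp.y from the question.
x = {1: "hi"}
y = [(1, "there"), (1, "wow")]


def join_update(x, y):
    # Like UPDATE ... FROM with several matching source rows: each candidate is built
    # from the ORIGINAL x value, and only one candidate per target row survives.
    # This sketch keeps the first one in "t DESC" order.
    result = dict(x)
    chosen = set()
    for i, t in sorted(y, key=lambda row: row[1], reverse=True):
        if i in x and i not in chosen:
            result[i] = x[i] + t
            chosen.add(i)
    return result


def aggregate_then_update(x, y):
    # Like string_agg(t, '' ORDER BY t DESC) ... GROUP BY i, then one match per target row.
    aggregated = {}
    for i in {i for i, _ in y}:
        aggregated[i] = "".join(sorted((t for j, t in y if j == i), reverse=True))
    # A missing group leaves the value unchanged, like coalesce(..., '').
    return {i: t + aggregated.get(i, "") for i, t in x.items()}


print(join_update(x, y))
print(aggregate_then_update(x, y))
```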
Use string_agg, as follows:
update tmp.x x
-- coalesce keeps rows of x that have no matching rows in y unchanged
set t = x.t || coalesce((
select string_agg(t, '' order by t desc)
from tmp.y where i = x.i
), '');
SQLFiddle
with cte as (
select y.i, string_agg(y.t, '' order by y.t desc) as txt
from tmp.y
group by y.i
)
update tmp.x set t = x.t || cte.txt
from cte where cte.i = x.i;
Not sure where to start on this one. I inherited a table that has a list of part numbers, some active and some inactive. If a part number is inactive, the next valid part number is entered; if it is active, there is no NextPartNumber. Users want to search on a part number and find all of the next part numbers in its chain.
Basically the table looks like this.
PartNumber Varchar(20), Active Varchar(3), NextPartNumber Varchar(20).
Problem is I do not know how many part numbers are in the chain. Here is a sample of the data:
100X No XYZ
XYZ No 45A6
45A6 Yes
QWER No RT98
RT98 No POUL1
POUL1 No N9HGT
N9HGT No FGH12
FGH12 Yes
I can write a query like this, but since I don't know how many part numbers there are, this won't work.
Select A.PartNumber, A.NextPartNumber, B.PartNumber, B.NextPartNumber, C.PartNumber, C.NextPartNumber
FROM tblPartTable as A
inner join
tblPartTable as B
on B.PartNumber = A.NextPartNumber
inner join
tblPartTable as C
on C.PartNumber = B.NextPartNumber
where A.PartNumber = '100X'
With SQL Server (which I'm assuming you're using, since your earlier questions have been about it), you can use a recursive common table expression to get the searched-for part and all its successors; there is no need to loop manually:
WITH cte AS (
-- Base condition, where do we start the search?
SELECT t.* FROM tblPartTable t WHERE t.PartNumber = '100X'
UNION ALL
-- Continue condition, how do we find the next part from the current one?
SELECT t.* FROM tblPartTable t JOIN cte ON t.PartNumber = cte.NextPartNumber
)
SELECT partnumber, active FROM cte;
An SQLfiddle to test with.
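The same approach works on most other RDBMSs, including MySQL 8.0 and later, although PostgreSQL and MySQL require WITH RECURSIVE instead of plain WITH.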
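The same recursion can be tried in SQLite through Python's sqlite3 module, using the sample data from the question. Storing a NULL NextPartNumber for active parts is an assumption of this sketch. SQLite spells the construct WITH RECURSIVE. The depth column and the depth < 100 guard are additions of this sketch. They loosely mirror SQL Server's default MAXRECURSION limit of 100, in case the data ever contains a cycle, and give the output a deterministic order.

```python
import sqlite3

ROWS = [
    ("100X", "No", "XYZ"), ("XYZ", "No", "45A6"), ("45A6", "Yes", None),
    ("QWER", "No", "RT98"), ("RT98", "No", "POUL1"), ("POUL1", "No", "N9HGT"),
    ("N9HGT", "No", "FGH12"), ("FGH12", "Yes", None),
]

CHAIN_SQL = """
WITH RECURSIVE cte(PartNumber, Active, NextPartNumber, depth) AS (
    -- Base condition: the part we start the search from
    SELECT PartNumber, Active, NextPartNumber, 0
    FROM tblPartTable WHERE PartNumber = ?
    UNION ALL
    -- Continue condition: follow NextPartNumber to the next row
    SELECT t.PartNumber, t.Active, t.NextPartNumber, cte.depth + 1
    FROM tblPartTable AS t
    JOIN cte ON t.PartNumber = cte.NextPartNumber
    WHERE cte.depth < 100
)
SELECT PartNumber, Active FROM cte ORDER BY depth
"""


def part_chain(conn, start):
    return conn.execute(CHAIN_SQL, (start,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblPartTable (PartNumber VARCHAR(20) PRIMARY KEY, "
    "Active VARCHAR(3), NextPartNumber VARCHAR(20))"
)
conn.executemany("INSERT INTO tblPartTable VALUES (?, ?, ?)", ROWS)
print(part_chain(conn, "100X"))
print(part_chain(conn, "QWER"))
```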
I have a table 'tblRandomString' with the following data:
ID ItemValue
1 *Test"
2 ?Test*
I have another table, 'tblSearchCharReplacement', with the following data:
Original Replacement
* `star`
? `quest`
" `quot`
; `semi`
Now I want to apply these replacements to the ItemValue values.
I tried this:
Update T1
SET ItemValue = REPLACE(ItemValue,[Original],[Replacement])
FROM dbo.tblRandomString T1
JOIN
dbo.tblSearchCharReplacement T2
ON T2.Original IN ('"',';','*','?')
But it doesn't help, because only one replacement is applied per row: each row is updated only once, even when it joins to several replacement rows.
One solution would be to use a CTE to perform multiple replacements where they exist.
Is there a simpler way?
Sample data:
create table #RandomString (ID int not null,ItemValue varchar(500) not null)
insert into #RandomString(ID,ItemValue) values
(1,'*Test"'),
(2,'?Test*')
create table #SearchCharReplacement (Original varchar(500) not null,Replacement varchar(500) not null)
insert into #SearchCharReplacement(Original,Replacement) values
('*','`star`'),
('?','`quest`'),
('"','`quot`'),
(';','`semi`')
And the UPDATE:
;With Replacements as (
select
ID,ItemValue,0 as RepCount
from
#RandomString
union all
select
ID,SUBSTRING(REPLACE(ItemValue,Original,Replacement),1,500),rs.RepCount+1
from
Replacements rs
inner join
#SearchCharReplacement scr
on
CHARINDEX(scr.Original,rs.ItemValue) > 0
), FinalReplacements as (
select
ID,ItemValue,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RepCount desc) as rn
from
Replacements
)
update rs
set ItemValue = fr.ItemValue
from
#RandomString rs
inner join
FinalReplacements fr
on
rs.ID = fr.ID and
rn = 1
Which produces:
select * from #RandomString
ID ItemValue
----------- -----------------------
1 `star`Test`quot`
2 `quest`Test`star`
This starts with the unaltered texts (the first SELECT in Replacements) and then applies any valid replacement to them (the second SELECT in Replacements). The second SELECT is applied repeatedly to the rows it has just produced, until no new rows are produced. This is called a recursive common table expression (CTE).
We then use a second, non-recursive CTE, FinalReplacements, to number the rows produced by the first CTE per ID, giving row number 1 to the row with the highest RepCount, i.e. the one produced by the most replacement steps. Logically, that row is the result of applying the last applicable replacement, so it no longer contains any of the original characters to be replaced. We therefore use row number 1 to perform the update back against the original table.
This query does more work than strictly necessary, because each possible order of applying the matching replacements is explored separately; for a small number of replacement rows, it's not likely to be too inefficient. It could be tidied up by defining a single order in which to apply the replacements.
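The "single order" idea can be sketched in Python by folding the replacement list over the text. Each pair is applied once, in a fixed order, which is also what nested REPLACE calls do. REPLACEMENTS is a hypothetical in-memory copy of tblSearchCharReplacement. With a fixed order, a replacement whose output contained a later Original character would be replaced again; that is harmless for these backtick-wrapped values.

```python
from functools import reduce

# Hypothetical in-memory copy of tblSearchCharReplacement, in one fixed application order.
REPLACEMENTS = [("*", "`star`"), ("?", "`quest`"), ('"', "`quot`"), (";", "`semi`")]


def apply_replacements(text: str, replacements=REPLACEMENTS) -> str:
    # Apply each (original, replacement) pair once, in list order.
    return reduce(lambda acc, pair: acc.replace(pair[0], pair[1]), replacements, text)


for value in ['*Test"', "?Test*"]:
    print(value, "->", apply_replacements(value))
```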
Will skipping the join table and nesting REPLACE functions work?
Or do you need to actually get the data from the other table?
-- perform 4 replaces in a single update statement
UPDATE T1
SET ItemValue = REPLACE(
                  REPLACE(
                    REPLACE(
                      REPLACE(ItemValue, '*', 'star'),
                    '?', 'quest'),
                  '"', 'quot'),
                ';', 'semi')
FROM dbo.tblRandomString AS T1
Note: REPLACE matches its search string literally, so *, ? and " need no escaping; only a single quote would have to be doubled ('') inside a string literal.
I understand how to use the WITH clause for recursive queries (!!), but I'm having problems understanding its general use / power.
For example, the following query updates one record whose id is determined by a subquery returning the id of the first record by timestamp:
update global.prospect psp
set status=status||'*'
where psp.psp_id=(
select p2.psp_id
from global.prospect p2
where p2.status='new' or p2.status='reset'
order by p2.request_ts
limit 1 )
returning psp.*;
Would this be a good candidate for using a WITH wrapper instead of the relatively ugly sub-query? If so, why?
If there can be concurrent write access to the involved tables, there are race conditions in the following queries. Consider:
Postgres UPDATE … LIMIT 1
Your example can use a CTE (common table expression), but it will give you nothing a subquery couldn't do:
WITH x AS (
SELECT psp_id
FROM global.prospect
WHERE status IN ('new', 'reset')
ORDER BY request_ts
LIMIT 1
)
UPDATE global.prospect psp
SET status = status || '*'
FROM x
WHERE psp.psp_id = x.psp_id
RETURNING psp.*;
The returned row will be the updated version.
If you want to insert the returned row into another table, that's where a WITH clause becomes essential:
WITH x AS (
SELECT psp_id
FROM global.prospect
WHERE status IN ('new', 'reset')
ORDER BY request_ts
LIMIT 1
)
, y AS (
UPDATE global.prospect psp
SET status = status || '*'
FROM x
WHERE psp.psp_id = x.psp_id
RETURNING psp.*
)
INSERT INTO z
SELECT *
FROM y;
Data-modifying queries using CTEs were added with PostgreSQL 9.1.
The manual about WITH queries (CTEs).
WITH lets you define "temporary tables" for use in a SELECT query. For example, I recently wrote a query like this, to calculate changes between two sets:
-- Let o be the set of old things, and n be the set of new things.
WITH o AS (SELECT * FROM things(OLD)),
n AS (SELECT * FROM things(NEW))
-- Select both the set of things whose value changed,
-- and the set of things in the old set but not in the new set.
SELECT o.key, n.value
FROM o
LEFT JOIN n ON o.key = n.key
WHERE o.value IS DISTINCT FROM n.value
UNION ALL
-- Select the set of things in the new set but not in the old set.
SELECT n.key, n.value
FROM o
RIGHT JOIN n ON o.key = n.key
WHERE o.key IS NULL;
By defining the "tables" o and n at the top, I was able to avoid repeating the expressions things(OLD) and things(NEW).
Sure, we could probably eliminate the UNION ALL using a FULL JOIN, but I wasn't able to do that in my particular case.
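The FULL JOIN idea boils down to a single pass over the union of keys. The Python sketch below shows that in dictionary terms. The old and new dicts are hypothetical stand-ins for things(OLD) and things(NEW), and None plays the role of NULL in the output. Unlike the SQL version, the dict membership test can tell a missing key apart from a NULL value.

```python
def changed_things(old: dict, new: dict) -> list:
    # One pass over the union of keys, which is what a FULL JOIN on key would produce.
    changes = []
    for key in sorted(old.keys() | new.keys()):
        in_both = key in old and key in new
        # Keys only in old (removed) or only in new (added) always count as changes;
        # keys in both count only if the value differs, like IS DISTINCT FROM.
        if not in_both or old[key] != new[key]:
            changes.append((key, new.get(key)))
    return changes


old = {"a": 1, "b": 2, "c": 3}
new = {"a": 1, "b": 20, "d": 4}
print(changed_things(old, new))
```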
If I understand your query correctly, it does this:
1. Find the oldest row in global.prospect whose status is 'new' or 'reset'.
2. Mark it by adding an asterisk to its status.
3. Return the row (including our tweak to status).
I don't think WITH will simplify anything in your case. It may be slightly more elegant to use a FROM clause, though:
update global.prospect psp
set status = status || '*'
from ( select psp_id
from global.prospect
where status = 'new' or status = 'reset'
order by request_ts
limit 1
) p2
where psp.psp_id = p2.psp_id
returning psp.*;
Untested. Let me know if it works.
It's pretty much exactly what you have already, except:
This can be easily extended to update multiple rows. In your version, which uses a subquery expression, the query would fail if the subquery were changed to yield multiple rows.
I did not alias global.prospect in the subquery, so it's a bit easier to read. Since this uses a FROM clause, you'll get an error if you accidentally reference the table being updated.
In your version, the subquery expression is conceptually evaluated for every row. Although PostgreSQL should optimize this and evaluate the expression only once, this optimization goes away if you accidentally reference a column of psp or add a volatile expression.
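Here is a minimal sketch of the "claim the oldest queued rows" update, run in SQLite through Python's sqlite3 module rather than PostgreSQL. It omits RETURNING and UPDATE ... FROM, since older SQLite builds lack them, and uses WHERE psp_id IN (...) so that raising the LIMIT updates several rows. The prospect table and its rows are hypothetical. The sketch does nothing about the concurrency race conditions mentioned earlier.

```python
import sqlite3


def make_demo():
    # Hypothetical stand-in for global.prospect.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prospect (psp_id INTEGER PRIMARY KEY, status TEXT, request_ts TEXT)")
    conn.executemany(
        "INSERT INTO prospect VALUES (?, ?, ?)",
        [(1, "new", "2024-01-02"), (2, "reset", "2024-01-01"),
         (3, "done", "2023-12-31"), (4, "new", "2024-01-03")],
    )
    return conn


def claim_oldest(conn, n=1):
    # Append '*' to the n oldest rows whose status is 'new' or 'reset'.
    # IN (rather than =) keeps working when the CTE yields more than one id.
    conn.execute(
        """
        WITH x AS (
            SELECT psp_id FROM prospect
            WHERE status IN ('new', 'reset')
            ORDER BY request_ts
            LIMIT ?
        )
        UPDATE prospect
        SET status = status || '*'
        WHERE psp_id IN (SELECT psp_id FROM x)
        """,
        (n,),
    )


def statuses(conn):
    return dict(conn.execute("SELECT psp_id, status FROM prospect ORDER BY psp_id"))


conn = make_demo()
claim_oldest(conn)        # the oldest 'new'/'reset' row is psp_id 2
after_one = statuses(conn)
claim_oldest(conn, n=2)   # psp_id 2 is now 'reset*', so 1 and 4 are claimed next
after_three = statuses(conn)
print(after_one)
print(after_three)
```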