sqlite string replace/delete - sql

I have a column in my database table with name tags which contains comma separated strings and it has records like this-
index | tags
-------------
1 | a,b,c
2 | b
3 | c
4 | z
5 | b,a,c
6 | p,f,w
7 | a,c,b
(for simplicity i am denoting strings with characters)
Now i want to replace/delete particular string.
Delete - say I want to delete b from all rows. If tags column become empty after this operation that row/record should be deleted (index 2 in this case). My records should look like this after this operation.
index | tags
-------------
1 | a,c
3 | c
4 | z
5 | a,c
6 | p,f,w
7 | a,c
Replace - say I want to replace all a with k on original records
index | tags
-------------
1 | k,b,c
2 | b
3 | c
4 | z
5 | b,k,c
6 | p,f,w
7 | k,c,b
Question - I am thinking of using replace function somehow but not sure how to meet above requirement with that. Can i do this in a single sql command? If not please suggest best way to do this (may be multiple sql commands).

I use MSSQL, I'm not sure sqlite. But, you use REPLACE function, like this:
To remove b:
UPDATE Your_Table
SET tags = REPLACE(REPLACE(tags, ',b', ' '), 'b,', ' ')
UPDATE Your_Table
SET tags = NULL WHERE tags = 'b'
To replace a with k:
UPDATE Your_Table
SET tags = REPLACE(tags, 'a', 'k')

Related

How to split string into rows by number of characters in Bigquery?

if I have a table for example:
mydataset.itempf containing:
id | item
1 | ABCDEFGHIJKL
2 | ZXDFKDLFKFGF
And I would like the "item" field to be split by 4 characters into different rows like:
id | item
1 | ABCD
1 | EFGH
1 | IJKL
2 | ZXDF
2 | KDLF
2 | KFGF
How can I write this in bigquery? Please help.
Consider below approach
select id, item
from your_table,
unnest(regexp_extract_all(item, r'.{1,4}')) item
if applied to sample data in your question - output is
Use the Substring with the Count Method, That Should make it easier to see which ones are longer than others.

Recursive self join over file data

I know there are many questions about recursive self joins, but they're mostly in a hierarchical data structure as follows:
ID | Value | Parent id
-----------------------------
But I was wondering if there was a way to do this in a specific case that I have where I don't necessarily have a parent id. My data will look like this when I initially load the file.
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
Essentially, its a CSV file where each row in the table is a line in the file. Lines 1 and 5 identify an object header and lines 3, 4, 7, and 8 identify the rows belonging to the object. The object header lines can have only 40 attributes which is why the object is broken up across multiple sections in the CSV file.
What I'd like to do is take the table, separate out the record # column, and join it with itself multiple times so it achieves something like this:
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,5,6,7,8,...
2 | *,record,abc,efg,hij,lmn,opq,rst
3 | ,,1,x,y,z,t,u,v,...
4 | ,,2,q,r,s,l,m,n,...
I know its probably possible, I'm just not sure where to start. My initial idea was to create a view that separates out the first and second columns in a view, and use the view as a way of joining in a repeated fashion on those two columns. However, I have some problems:
I don't know how many sections will occur in the file for the same
object
The file can contain other objects as well so joining on the first two columns would be problematic if you have something like
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
9 | ,4,Data,1,2,3,4,...
10 | *,record,lmn,opq,rst,...
11 | ,,1,t,u,v,...
In the above case, my plan could join rows from the Data object in row 9 with the first rows of the Formula object by matching the record value of 1.
UPDATE
I know this is somewhat confusing. I tried doing this with C# a while back, but I had to basically write a recursive decent parser to parse the specific file format and it simply took to long because I had to get it in the database afterwards and it was too much for entity framework. It was taking hours just to convert one file since these files are excessively large.
Either way, #Nolan Shang has the closest result to what I want. The only difference is this (sorry for the bad formatting):
+----+------------+------------------------------------------+-----------------------+
| ID | header | x | value
|
+----+------------+------------------------------------------+-----------------------+
| 1 | 3,Formula, | ,1,2,3,4,5,6,7,8 |3,Formula,1,2,3,4,5,6,7,8 |
| 2 | ,, | ,1,x,y,z,t,u,v | ,1,x,y,z,t,u,v |
| 3 | ,, | ,2,q,r,s,l,m,n | ,2,q,r,s,l,m,n |
| 4 | *,record, | ,abc,efg,hij,lmn,opq,rst |*,record,abc,efg,hij,lmn,opq,rst |
| 5 | ,4, | ,Data,1,2,3,4 |,4,Data,1,2,3,4 |
| 6 | *,record, | ,lmn,opq,rst | ,lmn,opq,rst |
| 7 | ,, | ,1,t,u,v | ,1,t,u,v |
+----+------------+------------------------------------------+-----------------------------------------------+
I agree that it would be better to export this to a scripting language and do it there. This will be a lot of work in TSQL.
You've intimated that there are other possible scenarios you haven't shown, so I obviously can't give a comprehensive solution. I'm guessing this isn't something you need to do quickly on a repeated basis. More of a one-time transformation, so performance isn't an issue.
One approach would be to do a LEFT JOIN to a hard-coded table of the possible identifying sub-strings like:
3,Formula,
*,record,
,,1,
,,2,
,4,Data,
Looks like it pretty much has to be human-selected and hard-coded because I can't find a reliable pattern that can be used to SELECT only these sub-strings.
Then you SELECT from this artificially-created table (or derived table, or CTE) and LEFT JOIN to your actual table with a LIKE to get all the rows that use each of these values as their starting substring, strip out the starting characters to get the rest of the string, and use the STUFF..FOR XML trick to build the desired Line.
How you get the ID column depends on what you want, for instance in your second example, I don't know what ID you want for the ,4,Data,... line. Do you want 5 because that's the next number in the results, or do you want 9 because that's the ID of the first occurrance of that sub-string? Code accordingly. If you want 5 it's a ROW_NUMBER(). If you want 9, you can add an ID column to the artificial table you created at the start of this approach.
BTW, there's really nothing recursive about what you need done, so if you're still thinking in those terms, now would be a good time to stop. This is more of a "Group Concatenation" problem.
Here is a sample, but has some different with you need.
It is because I use the value the second comma as group header, so the ,,1 and ,,2 will be treated as same group, if you can use a parent id to indicated a group will be better
DECLARE #testdata TABLE(ID int,Line varchar(8000))
INSERT INTO #testdata
SELECT 1,'3,Formula,1,2,3,4,...' UNION ALL
SELECT 2,'*,record,abc,efg,hij,...' UNION ALL
SELECT 3,',,1,x,y,z,...' UNION ALL
SELECT 4,',,2,q,r,s,...' UNION ALL
SELECT 5,'3,Formula,5,6,7,8,...' UNION ALL
SELECT 6,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 7,',,1,t,u,v,...' UNION ALL
SELECT 8,',,2,l,m,n,...' UNION ALL
SELECT 9,',4,Data,1,2,3,4,...' UNION ALL
SELECT 10,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 11,',,1,t,u,v,...'
;WITH t AS(
SELECT *,REPLACE(SUBSTRING(t.Line,LEN(c.header)+1,LEN(t.Line)),',...','') AS data
FROM #testdata AS t
CROSS APPLY(VALUES(LEFT(t.Line,CHARINDEX(',',t.Line, CHARINDEX(',',t.Line)+1 )))) c(header)
)
SELECT MIN(ID) AS ID,t.header,c.x,t.header+STUFF(c.x,1,1,'') AS value
FROM t
OUTER APPLY(SELECT ','+tb.data FROM t AS tb WHERE tb.header=t.header FOR XML PATH('') ) c(x)
GROUP BY t.header,c.x
+----+------------+------------------------------------------+-----------------------------------------------+
| ID | header | x | value |
+----+------------+------------------------------------------+-----------------------------------------------+
| 1 | 3,Formula, | ,1,2,3,4,5,6,7,8 | 3,Formula,1,2,3,4,5,6,7,8 |
| 3 | ,, | ,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v | ,,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v |
| 2 | *,record, | ,abc,efg,hij,lmn,opq,rst,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst,lmn,opq,rst |
| 9 | ,4, | ,Data,1,2,3,4 | ,4,Data,1,2,3,4 |
+----+------------+------------------------------------------+-----------------------------------------------+

Get row values as new columns [duplicate]

This question already has answers here:
Create a pivot table with PostgreSQL
(3 answers)
Closed 8 years ago.
I was wondering if it would be possible to get all values of rows with the same ID and present them as new columns, via a query.
For example, if I have the following table:
ID | VALUE
1 | a
1 | b
1 | c
2 | a
2 | b
[...]
I want to present it as:
ID | VALUE1 | VALUE2 | VALUE3 [...]
1 | a | b | c
2 | a | b | -
Thank you for any help
A query wouldn't do it. Unless you do 3 seperate querys.
SELECT ID,VALUE1 FROM Table
SELECT ID,VALUE2 FROM Table
ect...
If you have a problem with your database values not being recursive, then i would set up your table differently.
ID | VALUE
1 | a
1 | b
1 | c
2 | a
2 | b
[...]
You should set up the Table atributes like that rather than your first table.
if you are going to set up your tables differently I would do insert Statements.
INSERT INTO newTable (ID, VALUE)
SELECT ID,VALUE1 FROM oldTable
INSERT INTO newTable (ID, VALUE)
SELECT ID,VALUE2 FROM oldTable
ect..
Another possible way to do it is to display it in your application. Take php for instance.
foreach($sqlArray as $var){
echo $var['id'] ' | ' $var['value1']
echo $var['id'] ' | ' $var['value2']
echo $var['id'] ' | ' $var['value3']
}

Renumber duplicates to make them unique

Table "public.t"
Column | Type | Modifiers
--------+---------+-----------
code | text |
grid | integer |
The codigo column, although of type text, has a numeric sequence which has
duplicates. The grid column is a unique sequence.
select * from t order by grid;
code | grid
------+------
1 | 1
1 | 2
1 | 3
2 | 4
2 | 5
2 | 6
3 | 7
The goal is to eliminate the duplicates in the code column to make it unique. The result should be similar to:
code | grid
------+------
1 | 1
6 | 2
4 | 3
2 | 4
7 | 5
5 | 6
3 | 7
The version is 8.2 (no window functions).
create table t (code text, grid integer);
insert into t values
('1',1),
('1',2),
('1',3),
('2',4),
('2',6),
('3',7),
('2',5);
This is the solution that worked.
drop sequence if exists s;
create temporary sequence s;
select setval('s', (select max(cast(code as integer)) m from t));
update t
set code = i
from (
select code, grid, nextval('s') i
from (
select code, max(grid) grid
from t
group by code
having count(*) > 1
order by grid
) q
) s
where
t.code = s.code
and
t.grid = s.grid
The problem with it is that the update command must be repeated until there are no more duplicates. It is just a "it is not perfect" problem as it is a one time operation only.
Export (and remove) everything but code column (maybe you could subquery for export, and remove just duplicated row). Make code primary with something such auto increment behaviour and reimport everything. code column should be automatically generated.

How to sort sql result using a pre defined series of rows

i have a table like this one:
--------------------------------
id | name
--------------------------------
1 | aa
2 | aa
3 | aa
4 | aa
5 | bb
6 | bb
... one million more ...
and i like to obtain an arbitrary number of rows in a pre defined sequence and the other rows ordered by their name. e.g. in another table i have a short sequence with 3 id's:
sequ_no | id | pos
-----------------------
1 | 3 | 0
1 | 1 | 1
1 | 2 | 2
2 | 65535 | 0
2 | 45 | 1
... one million more ...
sequence 1 defines the following series of id's: [ 3, 1, 2]. how to obtain the three rows of the first table in this order and the rest of the rows ordered by their name asc?
how in PostgreSQL and how in mySQL? how would a solution look like in hql (hibernate query language)?
an idea i have is to first query and sort the rows which are defined in the sequence and than concat the other rows which are not in the sequence. but this involves tow queries, can it be done with one?
Update: The final result for the sample sequence [ 3, 1, 2](as defined above) should look like this:
id | name
----------------------------------
3 | aa
1 | aa
2 | aa
4 | aa
5 | bb
6 | bb
... one million more ...
i need this query to create a pagination through a product table where part of the squence of products is a defined sequence and the rest of the products will be ordered by a clause i dont know yet.
I'm not sure I understand the exact requirement, but won't this work:
SELECT ids.id, ids.name
FROM ids_table ids LEFT OUTER JOIN sequences_table seq
WHERE ids.id = seq.id
ORDER BY seq.sequ_no, seq.pos, ids.name, ids.id
One way: assign a position (e.g. 0) to each id that doesn't have a position yet, UNION the result with the second table, join the result with the first table, and ORDER BY seq_no, pos, name.