Split a string into rows using pure SQLite - sql

Using SQLite, I'd like to split a string in the following way.
Input string:
C:\Users\fidel\Desktop\Temp
and have the query return these rows:
C:\
C:\Users\
C:\Users\fidel\
C:\Users\fidel\Desktop\
C:\Users\fidel\Desktop\Temp
In other words, I'd like to split a file path into its constituent paths. Is there a way to do this in pure SQLite?

This is possible with a recursive common table expression:
WITH RECURSIVE split(s, last, rest) AS (
VALUES('', '', 'C:\Users\fidel\Desktop\Temp')
UNION ALL
SELECT s || substr(rest, 1, 1),
substr(rest, 1, 1),
substr(rest, 2)
FROM split
WHERE rest <> ''
)
SELECT s
FROM split
WHERE rest = ''
OR last = '\';
(You did not ask for a reasonable way.)
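To try the query above against arbitrary input, the literal can be replaced by a bound parameter. The following is a minimal sketch using Python's standard sqlite3 module; the function name split_path and the added ORDER BY length(s) (to make the row order explicit) are assumptions, not part of the answer.

```python
import sqlite3

# Same recursive CTE as above; the input string is bound as a parameter.
# ORDER BY length(s) is an addition so the row order does not depend on
# how SQLite happens to emit the recursive rows.
SPLIT_SQL = r"""
WITH RECURSIVE split(s, last, rest) AS (
  VALUES('', '', ?)
  UNION ALL
  SELECT s || substr(rest, 1, 1),
         substr(rest, 1, 1),
         substr(rest, 2)
  FROM split
  WHERE rest <> ''
)
SELECT s
FROM split
WHERE rest = ''
   OR last = '\'
ORDER BY length(s)
"""


def split_path(path):
    """Return every prefix of `path` that ends at a backslash, plus the full path."""
    con = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in con.execute(SPLIT_SQL, (path,))]
    finally:
        con.close()


if __name__ == "__main__":
    for prefix in split_path(r"C:\Users\fidel\Desktop\Temp"):
        print(prefix)
```

The CTE produces one recursion step per character, so this is only sensible for short strings.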

Recursive CTE:
WITH RECURSIVE cte(org, part, rest, pos) AS (
VALUES('C:\Users\fidel\Desktop\Temp', '','C:\Users\fidel\Desktop\Temp'|| '\', 0)
UNION ALL
SELECT org,
SUBSTR(org,1, pos + INSTR(rest, '\')),
SUBSTR(rest, INSTR(rest, '\')+1),
pos + INSTR(rest, '\')
FROM cte
WHERE INSTR(rest, '\') > 0
)
SELECT *
FROM cte
WHERE pos <> 0
ORDER BY pos;
Output:
╔═════════════════════════════╗
║ part ║
╠═════════════════════════════╣
║ C:\ ║
║ C:\Users\ ║
║ C:\Users\fidel\ ║
║ C:\Users\fidel\Desktop\ ║
║ C:\Users\fidel\Desktop\Temp ║
╚═════════════════════════════╝
How it works:
org - original string does not change
part - simply `LEFT` equivalent of original string taking pos number of chars
rest - simply `RIGHT` equivalent, rest of org string
pos - running offset in org of the most recently found `\` (so part is the first pos characters of org)
Trace:
╔══════════════════════════════╦══════════════════════════════╦════════════════════════════╦═════╗
║ org ║ part ║ rest ║ pos ║
╠══════════════════════════════╬══════════════════════════════╬════════════════════════════╬═════╣
║ C:\Users\fidel\Desktop\Temp ║ C:\ ║ Users\fidel\Desktop\Temp\ ║ 3 ║
║ C:\Users\fidel\Desktop\Temp ║ C:\Users\ ║ fidel\Desktop\Temp\ ║ 9 ║
║ C:\Users\fidel\Desktop\Temp ║ C:\Users\fidel\ ║ Desktop\Temp\ ║ 15 ║
║ C:\Users\fidel\Desktop\Temp ║ C:\Users\fidel\Desktop\ ║ Temp\ ║ 23 ║
║ C:\Users\fidel\Desktop\Temp ║ C:\Users\fidel\Desktop\Temp ║ ║ 28 ║
╚══════════════════════════════╩══════════════════════════════╩════════════════════════════╩═════╝

If you want to search for the values individually, use the code below:
WITH RECURSIVE split(content, last, rest) AS (
VALUES('', '', 'value1§value2§value3§value4§value5§value6§value7')
UNION ALL
SELECT
CASE WHEN last = '§'
THEN
substr(rest, 1, 1)
ELSE
content || substr(rest, 1, 1)
END,
substr(rest, 1, 1),
substr(rest, 2)
FROM split
WHERE rest <> ''
)
SELECT
REPLACE(content, '§','') AS 'ValueSplit'
FROM
split
WHERE
last = '§' OR rest ='';
Result:
**ValueSplit**
value1
value2
value3
value4
value5
value6
value7
I hope I can help people with the same problem.

There's a simpler alternative to the recursive CTE, which can also be applied to a number of file paths in a result set (or generally to any delimited strings that you want to "split" into multiple rows by a separator).
SQLite has the JSON1 extension. It is available in SQLite >= 3.9.0 (2015-10-14), and sqlite3 is almost always compiled with it now (e.g. Ubuntu, Debian, official Python Docker images and so on). You can check this with PRAGMA compile_options, and this answer has a little more detail on it.
JSON1 has json_each, which is one of the two table-valued functions in the extension that "walk the JSON value provided as their first argument and return one row for each element."
Hence if you can turn your string into a JSON array string, this function will do the rest. And it's not hard to do.
const sql = `
WITH input(filename) AS (
VALUES
('/etc/redis/redis.conf'),
('/run/redis/redis-server.pid'),
('/var/log/redis-server.log')
), tmp AS (
SELECT
filename,
'["' || replace(filename, '/', '", "') || '"]' as filename_array
FROM input
)
SELECT (
SELECT group_concat(ip.value, '/')
FROM json_each(filename_array) ip
WHERE ip.key <= p.key
) AS path
FROM tmp, json_each(filename_array) AS p
WHERE p.key > 0 -- key is the array index; index 0 is the empty string before the leading separator
`
async function run() {
const wasmUrl = 'https://cdnjs.cloudflare.com/ajax/libs/sql.js/1.5.0/sql-wasm.wasm'
const sqljs = await window.initSqlJs({locateFile: file => wasmUrl})
const db = new sqljs.Database()
const results = db.exec(sql)
ko.applyBindings(results[0])
}
run()
<script src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.2/knockout-min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/sql.js/1.5.0/sql-wasm.min.js"></script>
<table>
<thead>
<tr data-bind="foreach: columns"><th data-bind="text: $data"></th></tr>
</thead>
<tbody data-bind="foreach: values">
<tr data-bind="foreach: $data"><td data-bind="text: $data"></td></tr>
</tbody>
</table>
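The same idea can be tried outside the browser. Below is a minimal sketch with Python's standard sqlite3 module, assuming the bundled SQLite includes the JSON functions. It relies on json_each's `key` column, the zero-based array index. The table name input and the function name path_prefixes are hypothetical.

```python
import sqlite3

# Assumptions: filenames contain no double quotes or backslashes (they would
# break the hand-built JSON array), and group_concat sees the array elements
# in array order, which SQLite does in practice but does not formally promise.
PREFIX_SQL = """
WITH tmp AS (
  SELECT filename,
         '["' || replace(filename, '/', '","') || '"]' AS filename_array
  FROM input
)
SELECT tmp.filename,
       (SELECT group_concat(ip.value, '/')
        FROM json_each(tmp.filename_array) AS ip
        WHERE ip.key <= p.key) AS path
FROM tmp, json_each(tmp.filename_array) AS p
WHERE p.key > 0   -- key 0 is the empty string before the leading '/'
ORDER BY tmp.filename, p.key
"""


def path_prefixes(filenames):
    """Return (filename, prefix) pairs for every leading sub-path of each filename."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE input(filename TEXT)")
        con.executemany("INSERT INTO input VALUES (?)", [(f,) for f in filenames])
        return con.execute(PREFIX_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for filename, path in path_prefixes(["/etc/redis/redis.conf"]):
        print(path)
```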

Inspired by Lukasz Szozda's answer:
WITH RECURSIVE cte("pre","post") AS (
VALUES('C:', 'Users\fidel\Desktop\Temp' || '\')
UNION ALL
SELECT "pre" || '\' || left("post", position('\' in "post")-1),
substring("post" from position('\' in "post")+1)
FROM cte
WHERE "post" > ''
)
SELECT "pre" FROM cte
(tested on PostgreSQL)
The idea is now to replace the VALUES line
VALUES('C:', 'Users\fidel\Desktop\Temp' || '\')
with placeholders like
VALUES(?, ? || '\')
which have been pre-split in the programming language that is going to run the SQL statement above against the database.
Reading the SQLite docs, I see that substring(... from ...) has to be replaced by substr(..., ...) and position(... in ...) is to be replaced by instr(..., ...) with parameters swapped.
Very annoying for me since I wanted SQL code that runs on both PostgreSQL and SQLite.
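For reference, a SQLite translation along those lines might look like the sketch below, run through Python's standard sqlite3 module. The pre-split happens in Python with str.partition; the function name prefixes and the ORDER BY length(pre) are assumptions added for illustration.

```python
import sqlite3

# SQLite variant of the query above: substring(... from ...) becomes
# substr(..., ...), position(x in y) becomes instr(y, x), and left(s, n)
# becomes substr(s, 1, n).
PREFIX_SQL = r"""
WITH RECURSIVE cte(pre, post) AS (
  VALUES(?, ? || '\')
  UNION ALL
  SELECT pre || '\' || substr(post, 1, instr(post, '\') - 1),
         substr(post, instr(post, '\') + 1)
  FROM cte
  WHERE post > ''
)
SELECT pre FROM cte ORDER BY length(pre)
"""


def prefixes(path):
    """Pre-split `path` at the first backslash and let SQLite build the prefixes."""
    head, _, tail = path.partition("\\")
    con = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in con.execute(PREFIX_SQL, (head, tail))]
    finally:
        con.close()


if __name__ == "__main__":
    for p in prefixes(r"C:\Users\fidel\Desktop\Temp"):
        print(p)
```

Unlike the earlier answers, this variant returns the prefixes without a trailing backslash and includes the bare drive ('C:') as the first row.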

Simple split using json_each from JSON1:
create table demo as select 'Split,comma,separated,string,into,rows' as DemoString;
select row_number() over () as Part, parts.value as Splitted
from demo, json_each('["'||replace(demo.DemoString,',','","')||'"]') parts;

Related

Convert a string of HEX to multiple integers

I have a list of HEX strings that I need to convert to 3 different integers (int32, int32, int16)
For example:
0x6d,0xb,0x0,0x0,0x6e,0x3,0x0,0x0,0x4e,0x0
Would be converted to:
First 4 = 0xb6d = 2925
Next 4 = 0x36e = 878
Last 2 = 0x4e = 78
If possible, each one should be a separate column.
For example:
2925 | 878 | 78
Any idea how to do it in Google BigQuery SQL?
Thanks.
Consider the approach below:
select * from (
select col, div(offset, 4) grp, cast('0x' || string_agg(replace(el, '0x', ''), '' order by mod(offset, 4) desc) as int64) val
from your_table, unnest(split(col)) el with offset
group by col, grp
)
pivot (any_value(val) as val for grp in (0, 1, 2))
If applied to the sample data in your question as
with your_table as (
select '0x6d,0xb,0x0,0x0,0x6e,0x3,0x0,0x0,0x4e,0x0' col
)
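the output is a single row with val_0 = 2925, val_1 = 878 and val_2 = 78, matching the values in the question.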
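The query groups the tokens four at a time and concatenates each group in descending offset order, which amounts to reading each group as a little-endian integer. That arithmetic can be checked outside BigQuery with a short Python sketch; the function name decode and the widths parameter are hypothetical, and the values are treated as unsigned.

```python
def decode(col, widths=(4, 4, 2)):
    """Decode a comma-separated list of hex bytes into little-endian integers.

    `widths` gives the byte width of each output field (int32, int32, int16
    in the question). Values are read as unsigned, which is an assumption.
    """
    # int(tok, 16) accepts the '0x' prefix, so '0x6d' -> 109.
    data = bytes(int(tok, 16) for tok in col.split(","))
    values, start = [], 0
    for width in widths:
        values.append(int.from_bytes(data[start:start + width], "little"))
        start += width
    return values


if __name__ == "__main__":
    print(decode("0x6d,0xb,0x0,0x0,0x6e,0x3,0x0,0x0,0x4e,0x0"))
```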

postgresql Multiple identical conditions are unified into one parameter

I have a SQL query that needs to convert a string column to an array, and I have to filter on this column. The SQL looks like this:
select
parent_line,
string_to_array(parent_line, '-')
from
bx_crm.department
where
status = 0 and
'851' = ANY(string_to_array(parent_line, '-')) and
array_length(string_to_array(parent_line, '-'), 1) = 5;
parent_line is a varchar(50) column; its data looks like 0-1-851-88.
Questions:
string_to_array(parent_line, '-') appears many times in my SQL.
How many times is string_to_array(parent_line, '-') calculated for each row: once or three times?
How can I turn string_to_array(parent_line, '-') into a parameter? In the end, my SQL might look like this:
depts = string_to_array(parent_line, '-')
select
parent_line,
depts
from
bx_crm.department
where
status = 0 and
'851' = ANY(depts) and
array_length(depts, 1) = 5;
Postgres supports lateral joins which can simplify this logic:
select parent_line, v.parents, status, ... other columns ...
from bx_crm.department d cross join lateral
(values (string_to_array(parent_line, '-'))) v(parents)
where d.status = 0 and
cardinality(v.parents) = 5 and
'851' = any(v.parents)
Use a derived table:
select *
from (
select parent_line,
string_to_array(parent_line, '-') as parents,
status,
... other columns ...
from bx_crm.department
) x
where status = 0
and cardinality(parents) = 5
and '851' = any(parents)

How to show nullable decimal values without unnecessary zeros?

I have a table like below:
CREATE TABLE a
(
ID INT,
V DECIMAL(28, 10) NULL
)
INSERT INTO a(ID, V)
VALUES(1, 12.345)
INSERT INTO a(ID)
VALUES(2)
The desired output is like
╔════╤═══════╗
║ ID │ V ║
╠════╪═══════╣
║ 1 │ 12.35 ║
╟────┼───────╢
║ 2 │ ║
╚════╧═══════╝
But with this query, I got NULL for row 2:
SELECT ID, ROUND([V], 2) AS V
FROM a;
ID V
1 12.3500000000
2 NULL
With this query I got unnecessary zeros for row 2:
SELECT ID, ROUND(ISNULL(CAST(V AS VARCHAR(50)), ''), 2) AS V
FROM a;
ID V
1 12.35
2 0
Can anybody help please? Database is SQL Server 2005.
UPDATED:
This query will result in unwanted scale:
SELECT ID, ISNULL(CAST(ROUND(V, 2) AS VARCHAR(50)), '') AS V
FROM a;
ID V
1 12.3500000000
2
You can't do this, because you have V as a number when you Round
select 1 union select ''
This leads to output
1
0
So, what you need to do is round V then convert to a string while keeping the rounding. Easier said than done :
SELECT ID, case when v is null then '' else left(CAST(ROUND(V,2) AS VARCHAR(50)),charindex('.', CAST(ROUND(V,2) AS VARCHAR(50)))+2) end AS V
FROM a;
I just use some brute force string manipulation to find the decimal and round that way
If you want to keep the NULL value as NULL, then simply remove the ISNULL():
SELECT ID, ROUND(CAST(V AS VARCHAR(50)), 2) AS V
FROM a;
If you know the precision that you want, then I would suggest using either STR() or casting to the decimal with the specified precision.
You should ROUND() to 2 decimal places, then CAST() to string, followed by ISNULL() to convert NULL to an empty string.
Query:
SELECT ID, ISNULL(CAST(ROUND(V, 2) AS VARCHAR(50)), '') AS V
FROM a;
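The order of operations discussed in these answers (fix the scale first, then convert to text, then map NULL to an empty string) can be illustrated outside SQL Server. This is a minimal Python sketch with the standard decimal module, not T-SQL; the function name format_nullable is hypothetical, and half-up rounding is an assumption about the desired behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_nullable(value, places=2):
    """Format a nullable decimal with a fixed scale; None becomes ''."""
    if value is None:
        return ""
    # Quantizing fixes the scale, so the text form carries exactly `places`
    # decimals instead of the column's full scale (10 in the question).
    quantum = Decimal(1).scaleb(-places)  # Decimal('0.01') for places=2
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for v in (Decimal("12.345"), None):
        print(repr(format_nullable(v)))
```

The quantize step plays the role that casting to a decimal type with the wanted scale would play in SQL: rounding alone keeps the original scale, which is why the trailing zeros survive in the question's output.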

Eliminating duplicates from a COALESCED column in a stored procedure?

I'm writing a stored procedure. The query that I'm using takes rows that are identical in every way except for 2 columns and combines them into one row and coalesces the two rows into one. Now I'm running into another issue. Sometimes there are duplicate values in those two rows and I want to eliminate the duplicates.
Example:
TeamID Team City State Equipment
1 Thunder OKC OK Basketball, Basketball, Basketball, Shorts, Jersey, Jersey
I want it to be like this:
TeamID Team City State Equipment
1 Thunder OKC OK Basketball, Shorts, Jersey
Here is the query that I'm using that combines the rows.
SELECT DISTINCT
AssignedOfficeID, AssignedOffice, OperatorID, OperatorName, RigMasterID, DrillerRigNumber, WellID,
County, State, WellName, CompanyMan, CompanyManPhone, DateStart, DateStop, Representative, RepresentativeID, RepresentativeAssignedID, RepresentativeAssigned,
PricePerDay, CotNumber, CustomerOrderTicketNumber,
Equipment = STUFF((SELECT ', ' + COALESCE(RentalEquipmentAbbreviation, EquipmentAbbreviation, '')
FROM #ActiveRigsInfo AS ARI2
WHERE ARI2.AssignedOfficeID = ARI1.AssignedOfficeID AND ARI2.AssignedOfficeID = ARI1.AssignedOfficeID
AND ARI2.OperatorID = ARI1.OperatorID AND ARI2.OperatorName = ARI1.OperatorName
AND ARI2.RigMasterID = ARI1.RigMasterID AND ARI2.DrillerRigNumber = ARI1.DrillerRigNumber
AND ARI2.WellID = ARI1.WellID AND ARI2.County = ARI1.County AND ARI2.State = ARI1.State
AND ARI2.WellName = ARI1.WellName AND ARI2.CompanyMan = ARI2.CompanyMan AND ARI2.CompanyManPhone = ARI1.CompanyManPhone
AND ARI2.DateStart = ARI1.DateStart AND ARI2.Representative = ARI1.Representative
AND ARI2.CotNumber = ARI1.CotNumber
FOR XML PATH(''), TYPE).value('.[1]', 'nvarchar(max)'),1,2,'')
FROM #ActiveRigsInfo AS ARI1
ORDER BY AssignedOffice, OperatorID, RigMasterID;
Is there a way to do this when creating the stored procedure? Or is there a way that I can alter my query to do this?
All I want to do is take out the duplicates from the coalesced columns. The query does as expected except for that.
Thanks. I hope that makes sense.
You need DISTINCT in your subquery:
SELECT DISTINCT
AssignedOfficeID, AssignedOffice, OperatorID, OperatorName, RigMasterID, DrillerRigNumber, WellID,
County, State, WellName, CompanyMan, CompanyManPhone, DateStart, DateStop, Representative, RepresentativeID, RepresentativeAssignedID, RepresentativeAssigned,
PricePerDay, CotNumber, CustomerOrderTicketNumber,
Equipment = STUFF((SELECT DISTINCT ', ' + COALESCE(RentalEquipmentAbbreviation, EquipmentAbbreviation, '')
FROM #ActiveRigsInfo AS ARI2
WHERE ARI2.AssignedOfficeID = ARI1.AssignedOfficeID AND ARI2.AssignedOfficeID = ARI1.AssignedOfficeID
AND ARI2.OperatorID = ARI1.OperatorID AND ARI2.OperatorName = ARI1.OperatorName
AND ARI2.RigMasterID = ARI1.RigMasterID AND ARI2.DrillerRigNumber = ARI1.DrillerRigNumber
AND ARI2.WellID = ARI1.WellID AND ARI2.County = ARI1.County AND ARI2.State = ARI1.State
AND ARI2.WellName = ARI1.WellName AND ARI2.CompanyMan = ARI1.CompanyMan AND ARI2.CompanyManPhone = ARI1.CompanyManPhone
AND ARI2.DateStart = ARI1.DateStart AND ARI2.Representative = ARI1.Representative
AND ARI2.CotNumber = ARI1.CotNumber
FOR XML PATH(''), TYPE).value('.[1]', 'nvarchar(max)'),1,2,'')
FROM #ActiveRigsInfo AS ARI1
ORDER BY AssignedOffice, OperatorID, RigMasterID;
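The underlying idea, removing duplicates before the values are concatenated, can be tried on a small scale. The sketch below uses SQLite's group_concat(DISTINCT ...) through Python's sqlite3 module rather than STUFF/FOR XML PATH, so it illustrates the principle and is not a drop-in for SQL Server. The table rigs, its columns and the function name are hypothetical.

```python
import sqlite3


def equipment_by_team(rows):
    """rows: iterable of (team_id, equipment) pairs, possibly with duplicates.

    Returns {team_id: sorted list of distinct equipment names}.
    Assumes equipment names contain no commas.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE rigs(team_id INTEGER, equipment TEXT)")
        con.executemany("INSERT INTO rigs VALUES (?, ?)", rows)
        sql = """
            SELECT team_id, group_concat(DISTINCT equipment)
            FROM rigs
            GROUP BY team_id
        """
        # The order inside the concatenated string is not guaranteed,
        # so the items are sorted after splitting.
        return {team: sorted(items.split(",")) for team, items in con.execute(sql)}
    finally:
        con.close()


if __name__ == "__main__":
    sample = [(1, "Basketball"), (1, "Basketball"), (1, "Shorts"),
              (1, "Jersey"), (1, "Jersey")]
    print(equipment_by_team(sample))
```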

SQL - field concatenation, based on variable

I have a need to build a string from Last Name, First Name, Middle Initial according to the following rules:
If the Last Name is unique, just return the Last Name.
If the Last Name isn't unique, but the first letter of the First Name is unique, return Last Name + first letter of First Name.
If the Last Name and first letter of the First Name are not unique, return the Last Name + first letter of First Name + Middle Initial.
For example, the table might be:
MDC MDLast MDFirst MDInit
3 Jones Fred A
21 Smith Sam D
32 Brown Tom E
42 Brown Ted A
55 Smith Al D
The query should return:
MDC MDFormattedName
3 Jones
21 Smith S
32 Brown TE
42 Brown TA
55 Smith A
I've written up a query that almost works, but it is using several nested queries, and will still need several more to (possibly) make a workable solution, and is so inefficient. I'm sure there is a 'proper' way to implement this (for SQL Server 2005, BTW).
This is what I've got so far. It doesn't work: because of the aggregations I lose the IDs and can't do the final join to get ID/Name pairs.
select
CASE
WHEN CountLastFirst > 1 THEN
CASE WHEN MDInit IS NOT NULL THEN MDLastFirst + LEFT(MDInit,1) ELSE MDLastFirst END
WHEN CountLastFirst = 1 AND CountLast > 1 THEN MDLastFirst
ELSE MDLast
END as MDName
FROM
(
select x.MDLast, CountLast, MDLastFirst, CountLastFirst FROM
(
select MDLast,Count(MDLast) as CountLast FROM
MDList
GROUP BY MDLast) as x
INNER JOIN
(select MDLast, MDLastFirst,Count(MDLastFirst) as CountLastFirst FROM
(
select MDLast,
MDLast + ' ' + LEFT(MDFirst,1) as MDLastFirst
From MDList
) as a
GROUP BY MDLastFirst, MDLast) as y ON x.MDLast = y.MDLast
) as z
Assuming a table name of MDCTable, this should work:
SELECT MDCTable.MDC,
CASE MDCCount.NameCount
WHEN 1
THEN MDCTable.MDLast
ELSE
CASE MDFormat1Count
WHEN 1
THEN MDFormat1.MDFormat1Name
ELSE MDCTable.MDLast + ' ' + upper(left(MDCTable.MDFirst, 1)) +
MDCTable.MDInit
END
END AS MDFormattedName
FROM MDCTable
INNER JOIN
(
SELECT COUNT(MDLast) as NameCount, MDLast
FROM MDCTable
GROUP BY MDLast
) MDCCount ON MDCCount.MDLast = MDCTable.MDLast
INNER JOIN (
SELECT COUNT(MDLast + left(MDFirst, 1)) as MDFormat1Count, MDLast + ' ' +
left(MDFirst, 1) AS MDFormat1Name
FROM MDCTable
GROUP BY MDLast + ' ' + left(MDFirst, 1)
) MDFormat1 ON MDCTable.MDLast + ' ' + left(MDCTable.MDFirst, 1) =
MDFormat1.MDFormat1Name
ORDER BY MDCTable.MDC
Have you considered performing this operation in your application instead of directly in an SQL statement? Unless you have a good reason to do this directly in SQL, this is almost always the preferable approach for situations like this.
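If the formatting is moved into the application, the three rules reduce to two counting passes over the rows. Here is a minimal Python sketch; the function name format_names and the tuple layout (MDC, MDLast, MDFirst, MDInit) are assumptions for illustration.

```python
from collections import Counter


def format_names(rows):
    """rows: iterable of (mdc, last, first, init) tuples.

    Returns {mdc: formatted name} following the three rules in the question.
    """
    rows = list(rows)
    # How often each last name, and each (last name, first initial) pair, occurs.
    last_counts = Counter(last for _, last, _, _ in rows)
    pair_counts = Counter((last, first[:1].upper()) for _, last, first, _ in rows)

    result = {}
    for mdc, last, first, init in rows:
        initial = first[:1].upper()
        if last_counts[last] == 1:
            name = last
        elif pair_counts[(last, initial)] == 1:
            name = f"{last} {initial}"
        else:
            # A missing middle initial is treated as empty.
            name = f"{last} {initial}{init or ''}"
        result[mdc] = name
    return result


if __name__ == "__main__":
    table = [
        (3, "Jones", "Fred", "A"),
        (21, "Smith", "Sam", "D"),
        (32, "Brown", "Tom", "E"),
        (42, "Brown", "Ted", "A"),
        (55, "Smith", "Al", "D"),
    ]
    for mdc, name in sorted(format_names(table).items()):
        print(mdc, name)
```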