Sort the digits of a numerical string - sql

I need to sort all the digits in some string values in Postgres.
For instance:
"70005" ==> "00057"
"70001" ==> "00017"
"32451" ==> "12345"
I can't cast the strings to integer or bigint because of constraints in my logic. Is it possible to do this?

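Use a recursive CTE. Take the first character of the source: if it is '0', ignore it; otherwise prepend it to the target string.
Then use LPAD to left-pad the result with '0' up to length 10. Note that this reverses the non-zero digits, which matches the two samples in the demo but is not a general sort.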
SQL DEMO
WITH RECURSIVE cte (id, source, target) as (
SELECT 1 as id, '70001' as source , '' as target
UNION
SELECT 2 as id, '70005' as source , '' as target
UNION ALL
SELECT id,
substring(source from 2 for length(source)-1) as source,
CASE WHEN substring(source from 1 for 1) = '0' THEN target
ELSE substring(source from 1 for 1) || target
END
FROM cte
WHERE length(source) > 0
), reverse as (
SELECT id,
target,
row_number() over (partition by id
order by length(target) desc) rn
FROM cte
)
SELECT id, LPAD(target::text, 10, '0')
FROM reverse
WHERE rn = 1
OUTPUT
| id | lpad |
|----|------------|
| 1 | 0000000017 |
| 2 | 0000000057 |
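To make the behaviour of the recursive CTE easier to check, here is a rough Python model of the same steps. It is only a sketch: the function name `cte_result` and the `width` parameter are made up for illustration, and Python's `rjust` stands in for LPAD (unlike LPAD, it never truncates).

```python
def cte_result(source: str, width: int = 10) -> str:
    """Model of the recursive CTE: walk the string left to right,
    skip '0', and prepend every other character to the target."""
    target = ""
    for ch in source:
        if ch != "0":
            target = ch + target  # prepend, so the digits end up reversed
    # Stand-in for LPAD(target, width, '0'): pad on the left with zeros.
    return target.rjust(width, "0")


print(cte_result("70001"))  # 0000000017
print(cte_result("70005"))  # 0000000057
print(cte_result("32451"))  # 0000015423 (reversed, not sorted)
```

The last call shows that an input whose non-zero digits are not already in descending order comes back reversed rather than sorted.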

Assuming that your data is organized like this:
Table: strings
| id | string |
|----+---------|
| 1 | '70005' |
| 2 | '70001' |
etc...
Then you can use a query like this:
SELECT all_digits.id,
array_to_string(array_agg(all_digits.digit ORDER BY all_digits.digit), '')
FROM (
SELECT strings.id, digits.digit
FROM strings, unnest(string_to_array(strings.string, NULL)) digits(digit)
) all_digits
GROUP BY all_digits.id
This query splits each string into one row per character, then aggregates the characters back into a string in sorted order (the ORDER BY inside array_agg does the sorting).
There's a SQL fiddle here: http://sqlfiddle.com/#!15/7f7fb0/14
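The same split, sort, and re-aggregate idea can be sketched outside the database. This is a minimal Python illustration, not part of the original answer, and `sort_digits` is a hypothetical helper name. Because it only handles characters, nothing is cast to a number and leading zeros survive.

```python
def sort_digits(value: str) -> str:
    # list(value) plays the role of unnest(string_to_array(value, NULL)),
    # sorted() the ORDER BY inside array_agg, and join() array_to_string.
    return "".join(sorted(list(value)))


for s in ("70005", "70001", "32451"):
    print(s, "==>", sort_digits(s))
```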

Related

Sort each character in a string from a specific column in Snowflake SQL

I am trying to alphabetically sort each value in a column with Snowflake. For example I have:
| NAME |
| ---- |
| abc |
| bca |
| acb |
and want
| NAME |
| ---- |
| abc |
| abc |
| abc |
How would I go about doing that? I've tried using SPLIT and then ordering the rows, but that doesn't seem to work without a specific delimiter.
Use REGEXP_REPLACE to introduce a separator between each character, STRTOK_SPLIT_TO_TABLE to get the individual letters as rows, and LISTAGG to combine them again as a sorted string:
SELECT tab.col, LISTAGG(s.value) WITHIN GROUP (ORDER BY s.value) AS result
FROM tab
, TABLE(STRTOK_SPLIT_TO_TABLE(REGEXP_REPLACE(tab.col, '(.)', '\\1~'), '~')) AS s
GROUP BY tab.col;
For sample data:
CREATE OR REPLACE TABLE tab
AS
SELECT 'abc' AS col UNION
SELECT 'bca' UNION
SELECT 'acb';
Output: result is 'abc' for each of the three rows.
A similar implementation to Lukasz's, but using regexp_extract_all to extract the individual characters as an array, which we then split into rows using flatten. listagg then stitches them back together in the order specified in the within group clause.
with cte (col) as
(select 'abc' union
select 'bca' union
select 'acb')
select col, listagg(b.value) within group (order by b.value) as col2
from cte, lateral flatten(regexp_extract_all(col,'.')) b
group by col;
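The two Snowflake answers differ only in how they break the string into characters. The Python sketch below traces both routes; the function names and the `sep` parameter are invented for illustration, and the separator version assumes the separator character never occurs in the data.

```python
import re


def sort_via_separator(value: str, sep: str = "~") -> str:
    # Append the separator after every character: 'bca' -> 'b~c~a~'
    # (the REGEXP_REPLACE(col, '(.)', '\\1~') step).
    marked = re.sub(r"(.)", r"\1" + sep, value)
    # Split on the separator and drop empty tokens, as a tokenizer would.
    tokens = [tok for tok in marked.split(sep) if tok]
    # Sort and concatenate (the LISTAGG ... WITHIN GROUP (ORDER BY ...) step).
    return "".join(sorted(tokens))


def sort_via_extract_all(value: str) -> str:
    # Pull out every single character as a list (the regexp_extract_all step),
    # then sort and concatenate.
    chars = re.findall(r".", value)
    return "".join(sorted(chars))


for name in ("abc", "bca", "acb"):
    print(name, sort_via_separator(name), sort_via_extract_all(name))
```

The extract-all route avoids having to pick a separator that is safe for the data.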

Extract the number from the Title column in SQL

Please help me write a query that extracts the number after "_tid-" from the Title column.
Use Vertica's regular expression functions. They behave like Perl regular expressions.
So, with your input, search for the first group that follows _tid- and consists of consecutive digits (\d)...
WITH
-- your input, don't use in final query ...
indata(Title,Extract_ID) AS (
SELECT 'sdffsdvprocessortype%3Alnjklel&text=&textSearch=&pageSize=10&SSid=psj6tcd5b1g7_tid-87945','87945'
UNION ALL SELECT 'https://www.google.com/hk/en/age.html?SSid=ps_c8x4v1r2a3_tid-8952777456','8952777456'
UNION ALL SELECT 'https://www.google.com/hk/en/ge/dhci.html?SSid=ps_7228fk5sbh_tid-5879','5879'
UNION ALL SELECT 'https://www.google.com/fr/fr/c/3328412?q=%3%3AtSearch=&pageSize=10&SSid=pseydgg8h2_tid-9858867','9858867'
UNION ALL SELECT 'https://www.google.com/fr/fr/seas/1011028701?SSid=ps_yne5j6fmqv_tid-6879582','6879582'
UNION ALL SELECT 'https://www.google.com/il/en/sera/p/1010192786?SSid=ps_gydi5nk673_tid-5577484126','5577484126'
UNION ALL SELECT 'pid=ps_98qcokfh3_tid-548965&q=%3Arelevance%3Afacet_Processorstype','548965'
UNION ALL SELECT 'pid=ps_345ey5na9_tid-95861469rq=%3relevance%3Afacet_Processo','95861469'
UNION ALL SELECT 'npyamjhsgx_tid-002154785%20/p/1010192775?SSid=ps_npyamjhsgx_tid-002154785','002154785'
UNION ALL SELECT 'https://www.google.com/us/en/ke.html?ssid=ps_wc998kn__tid-0012889','0012889'
)
-- end of your input, real query starts here ...
SELECT
REGEXP_SUBSTR(
title -- input string
, '_tid-(\d+)' -- regular expression (note the bit in parentheses, that's the first group)
, 1 -- starting point
, 1 -- occurrence ordinal number
, '' -- modifier (case insensitive, etc.; check the Perl documentation)
, 1 -- ordinal number of the parenthesized group to return
) AS calc_extract
, *
FROM indata;
-- out calc_extract | Title | Extract_ID
-- out --------------+------------------------------------------------------------------------------------------------+------------
-- out 87945 | sdffsdvprocessortype%3Alnjklel&text=&textSearch=&pageSize=10&SSid=psj6tcd5b1g7_tid-87945 | 87945
-- out 8952777456 | https://www.google.com/hk/en/age.html?SSid=ps_c8x4v1r2a3_tid-8952777456 | 8952777456
-- out 5879 | https://www.google.com/hk/en/ge/dhci.html?SSid=ps_7228fk5sbh_tid-5879 | 5879
-- out 9858867 | https://www.google.com/fr/fr/c/3328412?q=%3%3AtSearch=&pageSize=10&SSid=pseydgg8h2_tid-9858867 | 9858867
-- out 6879582 | https://www.google.com/fr/fr/seas/1011028701?SSid=ps_yne5j6fmqv_tid-6879582 | 6879582
-- out 5577484126 | https://www.google.com/il/en/sera/p/1010192786?SSid=ps_gydi5nk673_tid-5577484126 | 5577484126
-- out 548965 | pid=ps_98qcokfh3_tid-548965&q=%3Arelevance%3Afacet_Processorstype | 548965
-- out 95861469 | pid=ps_345ey5na9_tid-95861469rq=%3relevance%3Afacet_Processo | 95861469
-- out 002154785 | npyamjhsgx_tid-002154785%20/p/1010192775?SSid=ps_npyamjhsgx_tid-002154785 | 002154785
-- out 0012889 | https://www.google.com/us/en/ke.html?ssid=ps_wc998kn__tid-0012889 | 0012889
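The pattern can be tried on its own before running it in the database. The following Python sketch applies the same regular expression with the standard `re` module; `extract_tid` is a hypothetical helper, and the third sample string is made up to show the no-match case.

```python
import re

# Same pattern as in the query: literal "_tid-" followed by a captured run of digits.
TID_PATTERN = re.compile(r"_tid-(\d+)")


def extract_tid(title: str):
    """Return the digits after the first '_tid-' as a string, or None if absent."""
    match = TID_PATTERN.search(title)
    return match.group(1) if match else None


samples = [
    "pid=ps_345ey5na9_tid-95861469rq=%3relevance%3Afacet_Processo",
    "https://www.google.com/us/en/ke.html?ssid=ps_wc998kn__tid-0012889",
    "no id in this string",  # made-up input without a match
]
for s in samples:
    print(extract_tid(s))
```

Keeping the result as a string preserves leading zeros such as in 0012889.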

SQL Order random rows based on 2 columns

How to sort this table in Oracle9:
START | END | VALUE
A | F | 1
D | H | 9
F | C | 8
C | D | 12
To make it look like this?:
START | END | VALUE
A | F | 1
F | C | 8
C | D | 12
D | H | 9
Goal is to start every next row with the end from the previous row.
This cannot be done with the order by clause alone, as it would have to find the record without a predecessor first, then find the next record comparing end and start column of the two records etc. This is an iterative process for which you need a recursive query.
That recursive query would find the first record, then the next and so on, giving them sequence numbers. Then you'd use the result and order by those generated numbers.
Here is how to do it in standard SQL. However, Oracle supports this only from 11g Release 2 onwards. In Oracle 9 you'll have to use CONNECT BY, with which I am not familiar. Hopefully you or someone else can convert the query:
with chain(startkey, endkey, value, pos) as
(
select startkey, endkey, value, 1 as pos
from mytable
where not exists (select * from mytable prev where prev.endkey = mytable.startkey)
union all
select mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1 as pos
from chain
join mytable on mytable.startkey = chain.endkey
)
select startkey, endkey, value
from chain
order by pos;
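The iteration that the recursive query performs can be written out as a plain loop. This is a Python sketch of the idea only, not a replacement for the SQL; `order_chain` is a hypothetical name, and it assumes every start key is unique and the rows form one open chain (no cycle).

```python
def order_chain(rows):
    """rows: list of (start, end, value) tuples forming a single open chain."""
    by_start = {row[0]: row for row in rows}
    ends = {row[1] for row in rows}
    # Anchor member: the row whose start is nobody's end (the NOT EXISTS part).
    current = next(row for row in rows if row[0] not in ends)
    ordered = []
    while current is not None:
        ordered.append(current)
        # Recursive member: find the row whose start equals the current end.
        current = by_start.get(current[1])
    return ordered


rows = [("A", "F", 1), ("D", "H", 9), ("F", "C", 8), ("C", "D", 12)]
for row in order_chain(rows):
    print(row)
```

The position in the returned list corresponds to the pos column that the query orders by.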
UPDATE: As you say the data is cyclic, you'd have to change the above query to start with an arbitrarily chosen row and stop once it has gone all the way around:
with chain(startkey, endkey, value, pos) as
(
select startkey, endkey, value, 1 as pos
from mytable
where rownum = 1
union all
select mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1 as pos
from chain
join mytable on mytable.startkey = chain.endkey
)
cycle startkey set cycle to 1 default 0
select startkey, endkey, value
from chain
where cycle = 0
order by pos;
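For cyclic data the loop needs a stop condition, which is the job of the cycle clause in the query. A minimal Python sketch of that idea follows; `order_cycle` is a hypothetical name, and the sample rows are invented (the last link is changed to point back to A so that the data forms a loop).

```python
def order_cycle(rows):
    """rows: list of (start, end, value) tuples that form a closed loop."""
    by_start = {row[0]: row for row in rows}
    current = rows[0]  # arbitrary starting row, like "where rownum = 1"
    seen = set()       # start keys already visited, used for cycle detection
    ordered = []
    while current is not None and current[0] not in seen:
        seen.add(current[0])
        ordered.append(current)
        current = by_start.get(current[1])
    return ordered


# Invented cyclic sample: D now leads back to A.
rows = [("A", "F", 1), ("D", "A", 9), ("F", "C", 8), ("C", "D", 12)]
for row in order_cycle(rows):
    print(row)
```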

Get even / odd / all numbers between two numbers

I want to display all the numbers (even / odd / mixed) between two numbers (1-9; 2-10; 11-20) in one (or two) column.
Example initial data:
| rang  |      | r1 | r2 |
|-------|      |----|----|
| 1-9   |      | 1  | 9  |
| 2-10  |      | 2  | 10 |
| 11-20 |  or  | 11 | 20 |
CREATE TABLE initialtableone(rang TEXT);
INSERT INTO initialtableone(rang) VALUES
('1-9'),
('2-10'),
('11-20');
CREATE TABLE initialtabletwo(r1 NUMERIC, r2 NUMERIC);
INSERT INTO initialtabletwo(r1, r2) VALUES
('1', '9'),
('2', '10'),
('11', '20');
Result:
| output |
----------------------------------
| 1,3,5,7,9 |
| 2,4,6,8,10 |
| 11,12,13,14,15,16,17,18,19,20 |
Something like this:
create table ranges (range varchar);
insert into ranges
values
('1-9'),
('2-10'),
('11-20');
with bounds as (
select row_number() over (order by range) as rn,
range,
(regexp_split_to_array(range,'-'))[1]::int as start_value,
(regexp_split_to_array(range,'-'))[2]::int as end_value
from ranges
)
select rn, range, string_agg(i::text, ',' order by i.ordinality)
from bounds b
cross join lateral generate_series(b.start_value, b.end_value) with ordinality i
group by rn, range
This outputs:
rn | range | string_agg
---+-------+------------------------------
3 | 2-10 | 2,3,4,5,6,7,8,9,10
1 | 1-9 | 1,2,3,4,5,6,7,8,9
2 | 11-20 | 11,12,13,14,15,16,17,18,19,20
Building on your first example, simplified, but with PK:
CREATE TABLE tbl1 (
tbl1_id serial PRIMARY KEY -- optional
, rang text -- can be NULL ?
);
Use split_part() to extract the lower and upper bounds (regexp_split_to_array() would be needlessly expensive and error-prone), and generate_series() to generate the numbers.
Use a LATERAL join and aggregate the set immediately to simplify aggregation. An ARRAY constructor is fastest in this case:
SELECT t.tbl1_id, a.output -- array; added id is optional
FROM (
SELECT tbl1_id
, split_part(rang, '-', 1)::int AS a
, split_part(rang, '-', 2)::int AS z
FROM tbl1
) t
, LATERAL (
SELECT ARRAY( -- preserves rows with NULL
SELECT g FROM generate_series(a, z, CASE WHEN (z-a)%2 = 0 THEN 2 ELSE 1 END) g
) AS output
) a;
AIUI, you want every number in the range only if upper and lower bound are a mix of even and odd numbers. Else, only return every 2nd number, resulting in even / odd numbers for those cases. This expression implements the calculation of the interval:
CASE WHEN (z-a)%2 = 0 THEN 2 ELSE 1 END
Result as desired:
output
-----------------------------
1,3,5,7,9
2,4,6,8,10
11,12,13,14,15,16,17,18,19,20
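The step rule is easy to verify by hand with a small model. The Python sketch below mirrors split_part, the CASE expression, and generate_series; `expand` is a hypothetical helper, and it assumes non-negative bounds with the lower bound first.

```python
def expand(rang: str) -> str:
    # split_part(rang, '-', 1) and split_part(rang, '-', 2)
    lower_text, upper_text = rang.split("-", 1)
    a, z = int(lower_text), int(upper_text)
    # CASE WHEN (z-a)%2 = 0 THEN 2 ELSE 1 END: same parity at both ends -> step 2
    step = 2 if (z - a) % 2 == 0 else 1
    # generate_series(a, z, step) includes the upper bound, hence z + 1 here.
    return ",".join(str(n) for n in range(a, z + 1, step))


for rang in ("1-9", "2-10", "11-20"):
    print(expand(rang))
```

When both bounds are odd or both are even, their difference is even, so stepping by 2 from the lower bound lands exactly on the upper bound.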
You do not need WITH ORDINALITY in this case, because the order of elements is guaranteed.
The aggregate function array_agg() makes the query slightly shorter (but slower) - or use string_agg() to produce a string directly, depending on your desired output format:
SELECT a.output -- string
FROM (
SELECT split_part(rang, '-', 1)::int AS a
, split_part(rang, '-', 2)::int AS z
FROM tbl1
) t
, LATERAL (
SELECT string_agg(g::text, ',') AS output
FROM generate_series(a, z, CASE WHEN (z-a)%2 = 0 THEN 2 ELSE 1 END) g
) a;
Note a subtle difference when using an aggregate function or ARRAY constructor in the LATERAL subquery: normally, rows with rang IS NULL are excluded from the result because the LATERAL subquery returns no row.
If you aggregate the result immediately, "no row" is transformed to one row with a NULL value, so the original row is preserved. I added demos to the fiddle.
SQL Fiddle.
You do not need a CTE for this, which would be more expensive.
Aside: The type conversion to integer removes leading / trailing white space automatically, so a string like this works as well for rang: ' 1 - 3'.

bigquery split string to chars

Suppose I have a table, in which one of the columns is a string:
id | value
________________
1 | HELLO
----------------
2 | BYE
How would I split each STRING into its chars, to create the following table:
id | value
________________
1 | H
----------------
1 | E
----------------
1 | L
----------------
1 | L
....
?
You can use the SPLIT function with an empty string as the delimiter, i.e.
SELECT id, SPLIT(value, '') value FROM Table
Note that SPLIT returns a repeated field. If you want flat results (this wasn't clear from your question), you would use
SELECT * FROM
FLATTEN((SELECT id, SPLIT(value, '') value FROM Table), value)
Apparently, if you pass an empty delimiter, it works:
select id, split(str, '')
from (
select 1 as id, "HELLO" as str
)
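The difference between the repeated field that SPLIT returns and the flattened result can be pictured with ordinary lists. This Python sketch is only an analogy for the two query shapes above; the variable names are made up.

```python
rows = [(1, "HELLO"), (2, "BYE")]

# Analogue of SPLIT(value, ''): each row keeps its id and gets a list of characters.
nested = [(row_id, list(value)) for row_id, value in rows]

# Analogue of FLATTEN: one output row per (id, character) pair.
flat = [(row_id, ch) for row_id, chars in nested for ch in chars]

print(nested[0])
for row in flat[:5]:
    print(row)
```

The nested form has one row per input row, while the flat form has one row per character, which is the shape the question asks for.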