How to get all combinations (ordered sampling without replacement) in regex - sql

I'm trying to match a comma-separated string of numbers to a certain pattern within an sql query. I used regular expressions for similar problems in the past successfully, so I'm trying to get them working here as well. The problem is as follows:
The string may contain any number in a range (e.g. 1-4) exactly 0-1 times.
Two numbers are comma-separated
The numbers have to be in ascending order
(I think this is kind of a case of ordered sampling without replacement)
Sticking with the example of 1-4, the following entries should match:
1
1,2
1,3
1,4
1,2,3
1,2,4
1,3,4
1,2,3,4
2
2,3
2,4
3
3,4
4
and these should not:
q dawda 323123 a3 a1 1aa,1234 4321 a4,32,1a 1112222334411
1,,2,33,444, 11,12,a 234 2,2,3 33 3,3,3 3,34 34 123 1,4,4,4a 1,444
The best try I currently have is:
\b[1-4][\,]?[2-4]?[\,]?[3-4]?[\,]?[4]?\b
This still has two major drawbacks:
It delivers quite a lot of false positives. Numbers are not eliminated after they occurred once.
It will get rather long, when the range of numbers increases, e.g. 1-18 is already possible as well, bigger ranges are thinkable of.
I used regexpal for testing purposes.
Side notes:
As I'm using sql it would be possible to implement some algorithm in another language to generate all the possible combinations and save them in a table that can be used for joining, see e.g. How to get all possible combinations of a list’s elements?. I would like to only rely on that as a last resort, as the creation of new tables will be involved and these will contain a lot of entries.
The resulting sql statement that uses the regex should run on both Postgres and Oracle.
The set of positive examples is also referred to as "powerset".
Edit: Clarified the list of positive examples

I wouldn't use Regex for this, as e.g. the requirements "have to be unique" and "have to be in ascending order" can't really be expressed with a regular expression (at least I can't think of a way to do that).
As you also need to have an expression that is identical in Postgres and Oracle, I would create a function that checks such a list and then hide the DBMS specific implementation in that function.
For Postgres I would use its array handling features to implement that function:
create or replace function is_valid(p_input text)
returns boolean
as
$$
select coalesce(array_agg(x order by x) = string_to_array(p_input, ','), false)
from (
select distinct x
from unnest(string_to_array(p_input,',')) as t(x)
where x ~ '^[0-9]+$' -- only numbers
) t
where x::int between 1 and 4 -- the cast is safe as the inner query only returns valid numbers
$$
language sql;
The inner query returns all (distinct) elements from the input list as individual numbers. The outer query then aggregates that back for values in the desired range and numeric order. If that result isn't the same as the input, the input isn't valid.
Then with the following sample data:
with sample_data (input) as (
values
('1'),
('1,2'),
('1,3'),
('1,4'),
('1,2,3'),
('1,2,4'),
('foo'),
('1aa,1234'),
('1,,2,33,444,')
)
select input, is_valid(input)
from sample_data;
It will return:
input | is_valid
-------------+---------
1 | true
1,2 | true
1,3 | true
1,4 | true
1,2,3 | true
1,2,4 | true
foo | false
1aa,1234 | false
1,,2,33,444, | false
If you want to use the same function in Postgres and Oracle you probably need to use returns integer in Postgres as Oracle still doesn't support a boolean data type in SQL
Oracle's string processing functions are less powerful than Postgres' functions (e.g. no string_to_array or unnest), but you can probably implement a similar logic in PL/SQL as well (albeit more complicated)

Related

How can I "dynamically" split a varchar column by specific characters?

I have a column that stores 2 values. Example below:
| Column 1 |
|some title1 =ExtractThis ; Source Title12 = ExtractThis2|
I want to remove 'ExtractThis' into one column and 'ExtractThis2' into another column. I've tried using a substring but it doesn't work as the data in column 1 is variable and therefore it doesn't always carve out my intended values. SQL below:
SELECT substring(d.Column1,13,24) FROM dbo.Table d
This returns 'Extract This' but for other columns it either takes too much or too little. Is there a function or combination of functions that will allow me to split consistently on the character? This is consistent in my column unlike my length count.
select substring(col1,CHARINDEX('=',col1)+1,CHARINDEX (';',col1)-CHARINDEX ('=',col1)-1) Val1,
substring(col1,CHARINDEX('=',col1,CHARINDEX (';',col1))+1,LEN(col1)) Val2
from #data
there is duplicate calculation that can be reduced from 5 to 3 to each line.
but I want to believe this simple optimization done by SQL SERVER.

PostgreSQL: order by column, with specific NON-NULL value LAST

When I discovered NULLS LAST, I kinda hoped it could be generalised to 'X LAST' in a CASE statement in the ORDER BY portion of a query.
Not so, it would seem.
I'm trying to sort a table by two columns (easy), but get the output in a specific order (easy), with one specific value of one column to appear last (got it done... ugly).
Let's say that the columns are zone and status (don't blame me for naming a column zone - I didn't name them). status only takes 2 values ('U' and 'S'), whereas zone can take any of about 100 values.
One subset of zone's values is (in pseudo-regexp) IN[0-7]Z, and those are first in the result. That's easy to do with a CASE.
zone can also take the value 'Future', which should appear LAST in the result.
In my typical kludgy-munge way, I have simply imposed a CASE value of 1000 as follows:
group by zone, status
order by (
case when zone='IN1Z' then 1
when zone='IN2Z' then 2
when zone='IN3Z' then 3
.
. -- other IN[X]Z etc
.
when zone = 'Future' then 1000
else 11 -- [number of defined cases +1]
end), zone, status
This works, but it's obviously a kludge, and I wonder if there might be one-liner doing the same.
Is there a cleaner way to achieve the same result?
Postgres allows boolean values in the ORDER BY clause, so here is your generalised 'X LAST':
ORDER BY (my_column = 'X')
The expression evaluates to boolean, resulting values sort this way:
FALSE (0)
TRUE (1)
NULL
Since we deal with non-null values, that's all we need. Here is your one-liner:
...
ORDER BY (zone = 'Future'), zone, status;
Related:
Sorting null values after all others, except special
Select query but show the result from record number 3
SQL two criteria from one group-by
I'm not familiar postgreSQL specifically, but I've worked with similar problems in MS SQL server. As far as I know, the only "nice" way to solve a problem like this is to create a separate table of zone values and assign each one a sort sequence.
For example, let's call the table ZoneSequence:
Zone | Sequence
------ | --------
IN1Z | 1
IN2Z | 2
IN3Z | 3
Future | 1000
And so on. Then you simply join ZoneSequence into your query, and sort by the Sequence column (make sure to add good indexes!).
The good thing about this method is that it's easy to maintain when new zone codes are created, as they likely will be.

How does order by clause works if two values are equal?

This is my NEWSPAPER table.
National News A 1
Sports D 1
Editorials A 12
Business E 1
Weather C 2
Television B 7
Births F 7
Classified F 8
Modern Life B 1
Comics C 4
Movies B 4
Bridge B 2
Obituaries F 6
Doctor Is In F 6
When i run this query
select feature,section,page from NEWSPAPER
where section = 'F'
order by page;
It gives this output
Doctor Is In F 6
Obituaries F 6
Births F 7
Classified F 8
But in Kevin Loney's Oracle 10g Complete Reference the output is like this
Obituaries F 6
Doctor Is In F 6
Births F 7
Classified F 8
Please help me understand how is it happening?
If you need reliable, reproducible ordering to occur when two values in your ORDER BY clause's first column are the same, you should always provide another, secondary column to also order on. While you might be able to assume that they will sort themselves based on order entered (almost always the case to my knowledge, but be aware that the SQL standard does not specify any form of default ordering) or index, you never should (unless it is specifically documented as such for the engine you are using--and even then I'd personally never rely on that).
Your query, if you wanted alphabetical sorting by feature within each page, should be:
SELECT feature,section,page FROM NEWSPAPER
WHERE section = 'F'
ORDER BY page, feature;
In relational databases, tables are sets and are unordered. The order by clause is used primarily for output purposes (and a few other cases such as a subquery containing rownum).
This is a good place to start. The SQL standard does not specify what has to happen when the keys on an order by are the same. And this is for good reason. Different techniques can be used for sorting. Some might be stable (preserving original order). Some methods might not be.
Focus on whether the same rows are in the sets, not their ordering. By the way, I would consider this an unfortunate example. The book should not have ambiguous sorts in its examples.
When you use the SELECT statement to query data from a table, the order which rows appear in the result set may not be what you expected.
In some cases, the rows that appear in the result set are in the order that they are stored in the table physically. However, in case the query optimizer uses an index to process the query, the rows will appear as they are stored in the index key order. For this reason, the order of rows in the result set is undetermined or unpredictable.
The query optimizer is a built-in software component in the database
system that determines the most efficient way for an SQL statement to
query the requested data.

substring and trim in Teradata

I am working in Teradata with some descriptive data that needs to be transformed from a gerneric varchar(60) into the different field lengths based on the type of data element and the attribute value. So I need to take whatever is in the Varchar(60) and based on field 'ABCD' act on field 'XYZ'. In this case XYZ is a varchar(3). To do this I am using CASE logic within my select. What I want to do is
eliminate all occurances of non alphabet/numeric data. All I want left are upper case Alpha chars and numbers.
In this case "Where abcd = 'GROUP' then xyz should come out as a '000', '002', 'A', 'C'
eliminate extra padding
Shift everything Right
abcd xyz
1 GROUP NULL
2 GROUP $
3 GROUP 000000000000000000000000000000000000000000000000000000000000
4 GROUP 000000000000000000000000000000000000000000000000000000000002
5 GROUP A
6 GROUP C
7 GROUP r
To do this I have tried TRIM and SUBSTR amongst several other things that did not work. I have pasted what I have working now, but I am not reliably working through the data within the select. I am really looking for some options on how to better work with strings in Teradata. I have been working out of the "SQL Functions, Operators, Expressions and Predicates" online PDF. Is there a better reference. We are on TD 13
SELECT abcd
, CASE
-- xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
WHEN abcd= 'GROUP'
THEN(
CASE
WHEN SUBSTR(tx.abcd,60, 4) = 0
THEN (
SUBSTR(tx.abcd,60, 3)
)
ELSE
TRIM (TRAILING FROM tx.abcd)
END
)
END AS abcd
FROM db.descr tx
WHERE tx.abcd IS IN ( 'GROUP')
The end result should look like this
abcd xyz
1 GROUP 000
2 GROUP 002
3 GROUP A
4 GROUP C
I will have to deal with approx 60 different "abcd" types, but they should all conform to the type of data I am currently seeing.. ie.. mixed case, non numeric, non alphabet, padded, etc..
I know there is a better way, but I have come in several circles trying to figure this out over the weekend and need a little push in the right direction.
Thanks in advance,
Pat
The SQL below uses the CHARACTER_LENGTH function to first determine if there is a need to perform what amounts to a RIGHT(tx.xyz, 3) using the native functions in Teradata 13.x. I think this may accomplish what you are looking to do. I hope I have not misinterpreted your explanation:
SELECT CASE WHEN tx.abcd = 'GROUP'
AND CHARACTER_LENGTH(TRIM(BOTH FROM tx.xyz) > 3
THEN SUBSTRING(tx.xyz FROM (CHARACTER_LENGTH(TRIM(BOTH FROM tx.xyz)) - 3))
ELSE tx.abcd
END
FROM db.descr tx;
EDIT: Fixed parenthesis in SUBSTRING

SQLite - sql question 101

I would do something like this:
select * from cars_table where body not equal to null.
select * from cars_table where values not equal to null And id = "3"
I know the syntax for 'not equal' is <>, but I get an empty results.
For the second part, I want to get a result set where it only returns the columns that have a value. So, if the value is null, then don't include that column.
Thanks
You cannot use equality operators for nulls, you must use is null and is not null.
So your first query would be:
select * from cars_table where body is not null;
There's no easy way to do your second operation (excluding columns that are null). I'm assuming you mean don't display the column if all rows have NULL for that column, since it would be even harder to do this on a row-by-row basis. A select statement expects a list of columns to show, and will faithfully show them whether they are null or not.
The only possibility that springs to mind is a series (one per column) of selects with grouping to determine if the column only has nulls, then dynamically construct a new query with only columns that didn't meet that criteria.
But that's incredibly messy and not at all suited to SQL - perhaps if you could tell us the rationale behind the request, there may be another solution.
Comparing to NULL is not done with <>, but with is null and is not null :
select *
from cars_table
where body is not null
And :
select *
from cars_table
where values is not null
and id = '3'
NULL is not a normal value, and has to be dealt with differently than standard values.
In SQL, NULL is an unknown value. It is neither equal-to nor not-equal-to any other value. All comparison operators (=, <>, <, >, etc.) return false if either of the values being compared is NULL. (I don't know how old you are, so I can't say that you're 24, but I also can't say that you're not 24.)
As already mentioned, you have to use IS NULL or IS NOT NULL instead when testing for NULL values.
Thank you all for your answers.
Well, I have a table with a bunch of columns.
And I am going to search a particular car, and get the values for that car...
car | manual | automatic | sedan | power brakes | jet engine
honda | true | false | false | true | false
mazda | false | true | false | true | true
So, I want to do ->
select * from car_table where car='mazda' and values not equal to false
So then I just iterate over the cursor result and fill up the table with the appropriate columns and values. Where values are columns. I guess I replace values with * for columns
I know I could do it programmatically, but was thinking I could do this just by sql