What is the best way to represent similar inputs for rules in a relational database? - sql

I am designing a rules database, which takes a number of inputs and determines the appropriate output.
There is a set of ~20 input types that are very similar (each can be represented with Id, Value), which together drive a set of ~20 output types. Each input type has a number of possible values. Each output type has a number of possible values as well.
An example rule would be InputTypeA + InputTypeB determines OutputTypeA or InputTypeA determines OutputTypeB. n input types can drive an output, but currently the maximum number is 3.
What is the best way to store the input values? I am considering either:
Storing them in one table, with an InputType table FK'd to determine the type of the input value (they will all be strings from a db type perspective).
Storing each input type in a separate table.
How can I represent the relationships? I could have a separate table for each relationship, but this system should be as flexible as possible. I'm concerned that will be a large number of tables, and make it difficult to add new rules.

I would think of this as 3 major concepts.
Condition: what needs to be true for a rule to fire?
Rule: when a rule fires, which results are true?
Result: which outputs fire, and with which values?
Conditions are modelled as one or more inputs with their values (your example uses InputTypeA + InputTypeB so I assume a rule has one or more qualifying inputs).
Results are modelled as one or more outputs, with their values.
Specifically:
Condition
-----------
ID
RuleID
InputType
InputValue
Rule
----
ID
Result
-------
ID
RuleID
OutputType
OutputValue
If there is more logic to input types and output types, you can create a lookup table with a foreign key relationship.
Example might be:
Condition
ID RuleID InputType InputValue
1 1 DrivingIn30zone 1
2 1 CurrentSpeed 40
Rule
ID Name
1 Don't break the speed limit
Result
ID RuleID OutputType OutputValue
1 1 SetSpeed 29
You would then search for conditions which match your current input values to see what the output states are:
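A minimal sketch of such a lookup, using Python's sqlite3 module so that it is runnable. The CurrentInput table, the matching_outputs helper, and the exact-string-equality matching are assumptions made for illustration, not part of the schema above. A rule fires when none of its conditions is left unmatched by the current inputs.

```python
import sqlite3

SCHEMA = """
CREATE TABLE "Rule" (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE "Condition" (ID INTEGER PRIMARY KEY, RuleID INTEGER,
                          InputType TEXT, InputValue TEXT);
CREATE TABLE "Result" (ID INTEGER PRIMARY KEY, RuleID INTEGER,
                       OutputType TEXT, OutputValue TEXT);
-- Hypothetical scratch table holding the inputs being evaluated.
CREATE TABLE CurrentInput (InputType TEXT PRIMARY KEY, InputValue TEXT);
"""

# A rule matches when it has no condition that lacks a matching current input.
MATCH_SQL = """
SELECT r.ID, r.Name, res.OutputType, res.OutputValue
FROM "Rule" r
JOIN "Result" res ON res.RuleID = r.ID
WHERE NOT EXISTS (
    SELECT 1 FROM "Condition" c
    WHERE c.RuleID = r.ID
      AND NOT EXISTS (
          SELECT 1 FROM CurrentInput i
          WHERE i.InputType = c.InputType
            AND i.InputValue = c.InputValue
      )
)
ORDER BY r.ID, res.ID
"""


def build_db():
    """Create an in-memory database holding the speed-limit example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute('INSERT INTO "Rule" VALUES (1, ?)',
                 ("Don't break the speed limit",))
    conn.executemany('INSERT INTO "Condition" VALUES (?, ?, ?, ?)',
                     [(1, 1, "DrivingIn30zone", "1"),
                      (2, 1, "CurrentSpeed", "40")])
    conn.execute('INSERT INTO "Result" VALUES (1, 1, ?, ?)',
                 ("SetSpeed", "29"))
    return conn


def matching_outputs(conn, inputs):
    """Return (rule id, rule name, output type, output value) for fired rules."""
    conn.execute("DELETE FROM CurrentInput")
    conn.executemany("INSERT INTO CurrentInput VALUES (?, ?)",
                     [(k, str(v)) for k, v in inputs.items()])
    return conn.execute(MATCH_SQL).fetchall()


conn = build_db()
fired = matching_outputs(conn, {"DrivingIn30zone": 1, "CurrentSpeed": 40})
not_fired = matching_outputs(conn, {"CurrentSpeed": 40})
```

Note that with this query a rule that has no conditions at all would always fire, and values are compared as strings, so a rule such as "speed above 30" would need an operator column or similar extension.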

Related

How to get all combinations (ordered sampling without replacement) in regex

I'm trying to match a comma-separated string of numbers to a certain pattern within an sql query. I used regular expressions for similar problems in the past successfully, so I'm trying to get them working here as well. The problem is as follows:
The string may contain any number in a range (e.g. 1-4) exactly 0-1 times.
Two numbers are comma-separated
The numbers have to be in ascending order
(I think this is kind of a case of ordered sampling without replacement)
Sticking with the example of 1-4, the following entries should match:
1
1,2
1,3
1,4
1,2,3
1,2,4
1,3,4
1,2,3,4
2
2,3
2,4
3
3,4
4
and these should not:
q dawda 323123 a3 a1 1aa,1234 4321 a4,32,1a 1112222334411
1,,2,33,444, 11,12,a 234 2,2,3 33 3,3,3 3,34 34 123 1,4,4,4a 1,444
The best try I currently have is:
\b[1-4][\,]?[2-4]?[\,]?[3-4]?[\,]?[4]?\b
This still has two major drawbacks:
It delivers quite a lot of false positives. Numbers are not eliminated after they occurred once.
It will get rather long when the range of numbers increases, e.g. 1-18 is already possible as well, and larger ranges are conceivable.
I used regexpal for testing purposes.
Side notes:
As I'm using sql it would be possible to implement some algorithm in another language to generate all the possible combinations and save them in a table that can be used for joining, see e.g. How to get all possible combinations of a list’s elements?. I would like to only rely on that as a last resort, as the creation of new tables will be involved and these will contain a lot of entries.
The resulting sql statement that uses the regex should run on both Postgres and Oracle.
The set of positive examples is also referred to as "powerset".
Edit: Clarified the list of positive examples
I wouldn't use Regex for this, as e.g. the requirements "have to be unique" and "have to be in ascending order" can't really be expressed with a regular expression (at least I can't think of a way to do that).
As you also need to have an expression that is identical in Postgres and Oracle, I would create a function that checks such a list and then hide the DBMS specific implementation in that function.
For Postgres I would use its array handling features to implement that function:
create or replace function is_valid(p_input text)
  returns boolean
as
$$
  -- rebuild the list from its distinct, in-range numbers in numeric order;
  -- the input is valid only if the rebuilt list equals the original
  select coalesce(array_agg(x order by x::int) = string_to_array(p_input, ','), false)
  from (
    select distinct x
    from unnest(string_to_array(p_input,',')) as t(x)
    where x ~ '^[0-9]+$' -- only numbers
  ) t
  where x::int between 1 and 4 -- the cast is safe as the inner query only returns valid numbers
$$
language sql;
The inner query returns all (distinct) elements from the input list as individual numbers. The outer query then aggregates that back for values in the desired range and numeric order. If that result isn't the same as the input, the input isn't valid.
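The same rule can also be stated procedurally, which may help when porting it to PL/SQL. The following is a sketch in Python rather than SQL; the low and high parameters are assumptions added here so the range is not hard-coded. It checks that every element is a plain number, that each lies in the range, and that the list is strictly ascending (which also rules out duplicates).

```python
def is_valid(text, low=1, high=4):
    """Return True if text is a comma-separated, strictly ascending list
    of integers, each within [low, high]."""
    parts = text.split(",")
    # Every element must consist of ASCII digits only (rejects '', 'a3', '1aa').
    if not all(p.isascii() and p.isdigit() for p in parts):
        return False
    numbers = [int(p) for p in parts]
    # Every number must lie within the allowed range.
    if any(n < low or n > high for n in numbers):
        return False
    # Strictly ascending order implies that no number occurs twice.
    return all(a < b for a, b in zip(numbers, numbers[1:]))


accepted = [s for s in ["1", "1,3", "1,2,3,4", "3,4"] if is_valid(s)]
rejected = [s for s in ["2,2,3", "4321", "1,444", "3,34", "1,,2,33,444,", "q"]
            if not is_valid(s)]
```

With a wider range, for example is_valid("2,10,18", 1, 18), the comparison is numeric, so 10 sorts after 2.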
Then with the following sample data:
with sample_data (input) as (
values
('1'),
('1,2'),
('1,3'),
('1,4'),
('1,2,3'),
('1,2,4'),
('foo'),
('1aa,1234'),
('1,,2,33,444,')
)
select input, is_valid(input)
from sample_data;
It will return:
input | is_valid
-------------+---------
1 | true
1,2 | true
1,3 | true
1,4 | true
1,2,3 | true
1,2,4 | true
foo | false
1aa,1234 | false
1,,2,33,444, | false
If you want to use the same function in Postgres and Oracle, you probably need to use returns integer in Postgres, as Oracle still doesn't support a boolean data type in SQL.
Oracle's string processing functions are less powerful than Postgres' functions (e.g. no string_to_array or unnest), but you can probably implement similar logic in PL/SQL as well (albeit in a more complicated way).

SQL add column value based on another column ACCESS

What I'm trying to do is add another column to an existing table whose value will depend on an already existing column in the table. For example say I have this table:
Table1
|Letter|
A
C
R
A
I want to create another column (for example, numbers) that is chosen based on the letters. So let's say A corresponds with 10, C with 3 and R with 32 (this was chosen at random). My resulting table should be like this:
|Letter| Number |
A | 10
C | 3
R | 32
A | 10
Can anyone help me write a query that does this? I have over 20 different cases, so the simpler it looks the better.
Thanks in advance!
Options:
Build a table that associates [Letter] with the numeric value. Include this table in query by joining on the common [Letter] fields.
A very long Switch() expression. However, query design grid cell has a limit of 1024 characters.
Better to provide example with your real data and criteria.
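The lookup-table option can be sketched as follows. This uses generic SQL run through Python's sqlite3 module purely for illustration, not Access itself; the LetterMap table name is an assumption, and the mapping values are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Letter TEXT);
-- Hypothetical lookup table: one row per letter, however many cases exist.
CREATE TABLE LetterMap (Letter TEXT PRIMARY KEY, Number INTEGER);
""")
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("A",), ("C",), ("R",), ("A",)])
conn.executemany("INSERT INTO LetterMap VALUES (?, ?)",
                 [("A", 10), ("C", 3), ("R", 32)])

# LEFT JOIN keeps letters that have no mapping yet (their Number is NULL).
rows = conn.execute("""
    SELECT t.Letter, m.Number
    FROM Table1 AS t
    LEFT JOIN LetterMap AS m ON m.Letter = t.Letter
    ORDER BY t.rowid
""").fetchall()
```

Adding a 21st case then means inserting one row into the lookup table rather than editing the query.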

How to insert uneven data rows into matrix in SAS?

I have an originations data set with loan ids. I then have a corresponding dataset with performance data for each of these loans ids, which can be anywhere from 10-40 rows in the performance data set.
The start date of each of the performance loans is not the same either, although some do overlap. What I want to do is take every loan id group in the performance data set, and then create a row of a certain column value across all occurrences in the data set. It doesn't matter if they start on different dates, I just want to align the values as this is the first value for loan id x and y.
For example:
ID Date Val
3 201601 100
3 201602 102
3 201603 103
--> Result:
ID Val1 Val2 Val3
3 100 102 103
I'm having two issues. One is the differing size of performance data for each id. I can't construct a matrix with differing lengths of rows. I'm assuming I'll need to append 0's to the end of each row to meet a predefined width.
My second issue is that I'm not sure how to read through the performance data set to group loans, extract the value column, turn that column into a row for that id, and then insert it into a matrix. I know how I would do this in Python but I need to use SAS. I can construct tables in SAS, but I'm not sure how to append rows, only columns.
If someone could provide some guidance on this it'd be a great help.
For anyone who runs into a similar issue: it ended up being only a few lines of code.
proc transpose data = new_data
out = new_data1;
var trans_state;
by id;
run;
The output will be
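To illustrate the shape of that result, here is a sketch of the same long-to-wide reshaping in plain Python (not SAS). The second loan id and its values are made up for the example, and None stands in for the missing values that pad shorter rows, so no zero-filling is needed.

```python
def transpose_by_id(rows):
    """Turn (id, date, value) rows, already sorted by id and date,
    into one list of values per id, padded to a common width."""
    grouped = {}
    for loan_id, _date, value in rows:
        grouped.setdefault(loan_id, []).append(value)
    width = max((len(values) for values in grouped.values()), default=0)
    # Pad shorter histories with None so every row has the same length.
    return {loan_id: values + [None] * (width - len(values))
            for loan_id, values in grouped.items()}


performance = [
    (3, 201601, 100),
    (3, 201602, 102),
    (3, 201603, 103),
    (7, 201605, 55),   # hypothetical second loan with a shorter history
    (7, 201606, 57),
]
wide = transpose_by_id(performance)
```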

Hive table with dynamic number of columns

TestTable
inputsCOLUMN
3-300-150-150-R
3-200-100-100-A
5-500-00-500-A
output
3_open 3_spent 3_closing 3_type 5_open 5_spent 5_closing 5_type
-------- --------- ----------- -------- -------- --------- ----------- --------
300 150 150 R 500 00 500 A
200 100 100 A
Above is the input table, called TestTable. It has two columns that contain rows of data (strings).
There is also a desired output table whose column names are derived from the input string.
Each column name is the first number in the string plus a fixed suffix, like CONCAT(split(inputsCOLUMN,'\\-')[0],'-','type').
That is the desired output. The query below does not work as intended, I think because building an alias with CONCAT is not allowed. Is there a way to get the desired output?
SELECT split(inputsCOLUMN,'\\-')[1] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','open'),
split(inputsCOLUMN,'\\-')[2] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','spent'),
split(inputsCOLUMN,'\\-')[3] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','closing'),
split(inputsCOLUMN,'\\-')[4] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','type')
Hive cannot have a dynamic number of columns, and it cannot have dynamic column names. It must be able to determine the entire schema (column count, types, and names) at query planning time, without looking at any data.
It's also not clear to me how exactly you're matching up input records into a single row. For example, how do you know which "3" record corresponds to which "5" record?
If you knew that, for example, there would always be a "3" record and a "5" record and you could commit to those being the only column names, and if you had a consistent way of matching up records to "flatten" this data, then it is possible, but difficult. I've done almost this exact operation before, and it involved a custom UDTF and a custom UDAF, and some code to auto-generate the actual query, which ended up being hundreds of lines long in some cases. I would re-evaluate why you want to do this in the first place and see if you can come up with another approach.

What is the best way to reassign ordinal number of a move operation

I have a column in the sql server called "Ordinal" that is used to indicate the display order of the rows. It starts from 0 and skips 10 for the next row. so we have something like this:
Id Ordinal
1 0
2 20
3 10
It skips 10 because we wanted to be able to move an item in between other items (based on ordinal) without having to reassign ordinal numbers for the entire table.
As you can imagine, ordinal numbers will eventually need to be reassigned for a move-in-between operation, either on the surrounding rows or for the entire table, once the unused ordinal numbers between the target items are used up.
Is there an algorithm I can use to reassign ordinal numbers for the move operation efficiently, taking into account the long-term maintainability of the table and minimizing the number of updates?
You can re-number the sequences using a somewhat complicated UPDATE statement:
UPDATE u
SET u.sequence = 10 * (c.num_below-1)
FROM test u
JOIN (
SELECT t.id, count(*) AS num_below
FROM test t
JOIN test tr ON tr.sequence <= t.sequence
GROUP BY t.id
) c ON c.id=u.id
The idea is to count the rows whose sequence is less than or equal to that of the current row, subtract one so that the first row gets zero, multiply by ten, and assign the result as the new sequence.
The content of test before the UPDATE:
ID Sequence
__ ________
1 0
2 20
3 10
4 12
The content of test after the UPDATE:
ID Sequence
__ ________
1 0
2 30
3 10
4 20
Now the sequence numbers are evenly spread again, so you can continue inserting in the middle until you run out of new sequence numbers; then you can re-number again.
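A runnable sketch of the same renumbering, using Python's sqlite3 module instead of SQL Server. The counts come from the same self-join as above, but they are read first and applied in a second step, since the UPDATE ... FROM ... JOIN form is SQL Server syntax. It assumes sequence values are unique; the renumber helper and its step parameter are illustrative names.

```python
import sqlite3

# Same self-join as in the UPDATE: count rows at or below each row's sequence.
RANK_SQL = """
SELECT t.id, count(*) AS num_below
FROM test t
JOIN test tr ON tr.sequence <= t.sequence
GROUP BY t.id
"""


def renumber(conn, step=10):
    """Spread the sequence values evenly again: 0, step, 2*step, ..."""
    ranks = conn.execute(RANK_SQL).fetchall()  # read everything before writing
    conn.executemany("UPDATE test SET sequence = ? WHERE id = ?",
                     [(step * (num_below - 1), row_id)
                      for row_id, num_below in ranks])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, sequence INTEGER)")
# The question's ordinals plus a row with id 4 squeezed in at 12.
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(1, 0), (2, 20), (3, 10), (4, 12)])
renumber(conn)
after = conn.execute("SELECT id, sequence FROM test ORDER BY id").fetchall()
```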
These won't answer your question directly--I just thought I might suggest some other approaches:
One possibility--don't try to do it by hand. Have your software manage the numbers. If they need re-writing, just save them with new numbers.
A second--use a "Linked List" instead. In each record store the index of the next record you want displayed, then have your code load that directly into a linked list.
Yet another simple approach. Let's say you're inserting a new record with an ordinal equal to x.
First, check whether there is a row whose ordinal value equals x. If there is, update all records whose ordinal value is equal to or greater than x, increasing them by y. Then it is safe to insert the new record.
This way you don't run an update on every insert and, of course, you keep the order.
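This last approach might look roughly like the following sketch, again using Python's sqlite3 module for a runnable example. The items table, the insert_at helper, and the default y of 10 are assumptions for illustration.

```python
import sqlite3


def insert_at(conn, item_id, x, y=10):
    """Insert item_id at ordinal x, shifting later rows by y only on a clash."""
    clash = conn.execute("SELECT 1 FROM items WHERE ordinal = ?",
                         (x,)).fetchone()
    if clash:
        # Make room: move every row at or after x up by y.
        conn.execute("UPDATE items SET ordinal = ordinal + ? WHERE ordinal >= ?",
                     (y, x))
    conn.execute("INSERT INTO items (id, ordinal) VALUES (?, ?)", (item_id, x))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, ordinal INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 0), (2, 20), (3, 10)])

insert_at(conn, 4, 15)   # free slot: no update needed
insert_at(conn, 5, 10)   # clash with id 3: rows at 10 and above shift by 10
order = [row[0] for row in
         conn.execute("SELECT id FROM items ORDER BY ordinal")]
```

Shifting by y rather than by 1 keeps the gaps between the shifted rows intact, so later inserts between them still find free ordinals.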