Rule Table (With wild cards) - sql

I have searched the web for something similar to my question, but have not found anything. I am looking for a way to apply "rules" with a SQL query.
My input data schema:
a: Int (NOT NULL)
b: Int (NOT NULL)
c: Int (NOT NULL)
The rule table schema:
a: Int (NULLABLE)
b: Int (NULLABLE)
c: Int (NULLABLE)
result: Int (NOT NULL)
There could be multiple rules which could "match" with the data. NULL represents a wildcard (could be any value). For example:
Input Data
a | b | c
1 | 2 | 3
Rules table:
a | b | c | result
1 | 2 | NULL| 99
1 | NULL| NULL| 101
1 | 2 | 3 | 203
When the rules are applied, it should match with the row which has the most matches (row 3 in this case).
I have come up with a query, which appears to be working, but it is not perfect. It can be slow if the "rules" table gets significant in size, and I'm worried there are edge cases I could be missing.
SELECT input.*,
COALESCE(rule.result, -1) as 'RuleResult'
FROM dbo.input input
OUTER APPLY (
SELECT TOP 1 result
FROM dbo.RuleTable rt
WHERE (input.a = rt.a OR rt.a IS NULL)
AND (input.b = rt.b OR rt.b IS NULL)
AND (input.c = rt.c OR rt.c IS NULL)
ORDER BY rt.a DESC, rt.b DESC, rt.c DESC
) rule
The idea is: The outer apply will run the query for each row in the input. The ORDER BY clause will set the priority of the rules, and will have the columns with the least number of NULL values at the top. The top row then becomes the result. The ORDER BY clause needs to align with the business need. If there is one rule with an 'a' value, and another with a 'b' value, an input row could match with two rules which only have a single matching condition. Then one still needs to be chosen.
My question: Is this query optimal? Am I missing anything? Are there resources about this out there I have not found? Is there a better way of doing this?
Update 1: After reading replies and discussing this with other people, here are some additional thoughts (still need to test these out):
Can we reverse the join? Basically join the rule table to the data
Can we eliminate input data that we know no rules apply to? Is this possible with NULLS (wildcard) values, or would this help?
Note: I'm still working through the thought process on this, so I may not be super clear yet.

Maybe the ORDER BY in the OUTER APPLY could use a small change
ORDER BY (iif(rt.a is null,0,1)
+iif(rt.b is null,0,1)
+iif(rt.c is null,0,1)) desc,
rt.a desc, rt.b desc, rt.c desc
Sample Data
create table input (a int, b int, c int);
insert into input values
(1,2,3),
(1,2,null),
(1,null,null),
(4,5,6);
create table RuleTable (a int, b int, c int, result int);
insert into RuleTable values
(1,2,null, 120),
(1,null,null, 100),
(1,2,3, 123),
(4,null,null, 400),
(null,5,6, 056);
Query
SELECT input.*
, COALESCE(ruled.result, -1) as RuleResult
FROM input input
OUTER APPLY (
SELECT TOP 1 result
FROM RuleTable as rt
WHERE (input.a = rt.a OR rt.a IS NULL)
AND (input.b = rt.b OR rt.b IS NULL)
AND (input.c = rt.c OR rt.c IS NULL)
ORDER BY (iif(rt.a is null,0,1)
+iif(rt.b is null,0,1)
+iif(rt.c is null,0,1)) desc,
rt.a desc, rt.b desc, rt.c desc
) ruled
ORDER BY a desc, b desc, c desc;
Result
a
b
c
RuleResult
4
5
6
56
1
2
3
123
1
2
null
120
1
null
null
100
Test on db<>fiddle here

Related

Pivot with column name in Postgres

I have the following table tbl:
column1 | column2 | column 3
-----------------------------------
1 | 'value1' | 3
2 | 'value2' | 4
How to do "pivot" with column names to produce output like:
column1 | 1 | 2
column2 | 'value1' |'value2'
column3 | 3 | 4
As has been commented, the issue of data types is undefined in the question.
If you are OK with all result columns being type text (every data type can be converted to text), you can use one of these:
Plain SQL
WITH cte AS (
SELECT nu.*
FROM tbl t
, LATERAL (
VALUES
(1, t.column1::text)
, (2, t.column2)
, (3, t.column3::text)
) nu(rn, c)
)
SELECT *
FROM (TABLE cte OFFSET 0 LIMIT 3) c1
JOIN (TABLE cte OFFSET 3 LIMIT 3) c2 USING (rn);
The same with useful column names:
WITH cte AS (
SELECT nu.*
FROM tbl t
, LATERAL (
VALUES
('column1', t.column1::text)
, ('column2', t.column2)
, ('column3', t.column3::text)
) nu(rn, c)
)
SELECT * FROM (
SELECT *
FROM (TABLE cte OFFSET 0 LIMIT 3) c1
JOIN (TABLE cte OFFSET 3 LIMIT 3) c2 USING (rn)
) t (key, row1, row2);
Works in any modern version of Postgres.
The SQL string has to be adapted to the number of rows and columns. See fiddles below!
Using a document type as stepping stone
Makes for shorter code.
With many rows and many columns, performance of the SQL solution may scale better because the intermediate derived table is smaller.
(The thread is limited as you can't have more than ~ 1600 table columns in Postgres.)
Since everything is converted to text anyway, hstore seems most efficient. See:
Key value pair in PostgreSQL
SELECT key
, arr[1] AS row1
, arr[2] AS row2
FROM (
SELECT x.key, array_agg(x.value) AS arr
FROM tbl t, each(hstore(t)) x
GROUP BY 1
) sub
ORDER BY 1;
Technically speaking we would have to enforce the right sort order when in array_agg(), but that should work without explicit ORDER BY. To be absolutely sure you can add one: array_agg(x.value ORDER BY t.ctid) Using ctid for lack of information.
You can do the same with JSON functions in (Postgres 9.3+). Just replace each(hstore(t) with json_each_text(row_to_json(t). The rest is identical.
These fiddles demonstrate how to scale each query:
Original example with 2 rows of 3 columns:
db<>fiddle here
Scaled up to 3 rows of 4 columns:
db<>fiddle here

SQL Select Where Opposite Match Does Not Exist

Trying to compare between two columns and check if there are no records that exist with the reversal between those two columns. Other Words looking for instances where 1-> 3 exists but 3->1 does not exist. If 1->2 and 2->1 exists we will still consider 1 to be part of the results.
Table = Betweens
start_id | end_id
1 | 2
2 | 1
1 | 3
1 would be added since it is a start to an end with no opposite present of 3,1. Though it did not get added until the 3rd entry since 1 and 2 had an opposite.
So, eventually it will just return names where the reversal does not exist.
I then want to join another table where the number from the previous problem has its name installed on it.
Table = Names
id | name
1 | Mars
2 | Earth
3 | Jupiter
So results will just be the names of those that don't have an opposite.
You can use a not exists condition:
select t1.start_id, t1.end_id
from the_table t1
where not exists (select *
from the_table t2
where t2.end_id = t1.start_id
and t2.start_id = t1.end_id);
I'm not sure about your data volume, so with your ask, below query will supply desired result for you in Sql Server.
create table TableBetweens
(start_id INT,
end_id INT
)
INSERT INTO TableBetweens VALUES(1,2)
INSERT INTO TableBetweens VALUES(2,1)
INSERT INTO TableBetweens VALUES(1,3)
create table TableNames
(id INT,
NAME VARCHAR(50)
)
INSERT INTO TableNames VALUES(1,'Mars')
INSERT INTO TableNames VALUES(2,'Earth')
INSERT INTO TableNames VALUES(3,'Jupiter')
SELECT *
FROM TableNames c
WHERE c.id IN (
SELECT nameid1.nameid
FROM (SELECT a.start_id, a.end_id
FROM TableBetweens a
LEFT JOIN TableBetweens b
ON CONCAT(a.start_id,a.end_id) = CONCAT(b.end_id,b.start_id)
WHERE b.end_id IS NULL
AND b.start_id IS NULL) filterData
UNPIVOT
(
nameid
FOR id IN (filterData.start_id,filterData.end_id)
) AS nameid1
)

SQL Unpivot / Coalesce multiple columns into one column based on values

I am trying to unpivot / coalesce multiple columns into one value, based on values in the target columns.
Given the following sample data:
CREATE TABLE SourceData (
Id INT
,Column1Text VARCHAR(10)
,Column1Value INT
,Column2Text VARCHAR(10)
,Column2Value INT
,Column3Text VARCHAR(10)
,Column3Value INT
)
INSERT INTO SourceData
SELECT 1, NULL, NULL, NULL, NULL, NULL, NULL UNION
SELECT 2, 'Text', 1, NULL, NULL, NULL, NULL UNION
SELECT 3, 'Text', 2, 'Text 2', 1, NULL, NULL UNION
SELECT 4, NULL, NULL, NULL, 1, NULL, NULL
I am trying to produce the following result set:
Id ColumnText
----------- ----------
1 NULL
2 Text
3 Text 2
4 NULL
Where ColumnXText column values become one "ColumnText" value per row, based on the following criteria:
If all ColumnX columns are NULL, then ColumnText = NULL
If a ColumnXValue value is "1" and the ColumnXText IS NULL, then
ColumnText = NULL
If a ColumnXValue value is "1" and the ColumnXText IS NOT NULL, then
ColumnText = ColumnXText.
There are no records with more than one ColumnXValue of "1".
What I'd tried is in this SQL Fiddle # http://sqlfiddle.com/#!6/f2e18/2
I'd tried (shown in SQL fiddle):
Unpivioting with CROSS / OUTER APPLY. I fell down on this approach because I was not able to get WHERE conditions to produce the expected results.
I'd also tried using UNPIVOT, but had no luck.
I was thinking of a brute-force approach that did not seem to be correct. The real source table has 44MM rows. I do not control the schema of the source table.
Please let me know if there's a simpler approach than a brute-force tangle of CASE WHENs. Thank you.
I don't think there is much mileage in trying to be too clever with this
SELECT
Id,
CASE
WHEN COLUMN1VALUE = 1 THEN COLUMN1TEXT
WHEN COLUMN2VALUE = 1 THEN COLUMN2TEXT
WHEN COLUMN3VALUE = 1 THEN COLUMN3TEXT
End as ColumnText
From
Sourcedata
I did have them in 321 order, but considered that the right answer might be hit sooner if the checking is done in 123 order instead (fewer checks, if there are 44million rows, might be significant)
Considering you have 44 million rows, you really don't want to experiment to much to join table on itself with apply or something like that. You need just go through it once, and that's best with simple CASE, what you call "brute-force" approach:
SELECT
Id
, CASE WHEN Column1Value = 1 THEN Column1Text
WHEN Column2Value = 1 THEN Column2Text
WHEN Column3Value = 1 THEN Column3Text
END AS ColumnText
FROM SourceData
But, if you really want to get fancy and write something without case, you could use UNION to merge different columns into one, and then join on it:
wITH CTE_Union AS
(
SELECT Id, Column1Text AS ColumnText, Column1Value AS ColumnValue
FROM SourceData
UNION ALL
SELECT Id, Column2Text, Column2Value FROM SourceData
UNION ALL
SELECT Id, Column3Text, Column3Value FROM SourceData
)
SELECT s.Id, u.ColumnText
FROM SourceData s
LEFT JOIN CTE_Union u ON s.Id = u.id and u.ColumnValue = 1
But I guarantee first approach will outperform this by a margin of 4 to 1
If you do not want to use a case expression, then you can use another outer apply() on a common table expression (or subquery/derived table) of your original unpivot with outer apply():
;with cte as (
select s.Id, oa.ColumnText, oa.ColumnValue
from sourcedata s
outer apply (values
(s.Column1Text, s.Column1Value)
, (s.Column2Text, s.Column2Value)
, (s.Column3Text, s.Column3Value)
) oa (ColumnText, ColumnValue)
)
select s.Id, x.ColumnText
from sourcedata s
outer apply (
select top 1 cte.ColumnText
from cte
where cte.Id = s.Id
and cte.ColumnValue = 1
) x
rextester demo: http://rextester.com/TMBR41346
returns:
+----+------------+
| Id | ColumnText |
+----+------------+
| 1 | NULL |
| 2 | Text |
| 3 | Text 2 |
| 4 | NULL |
+----+------------+
This will give you the first non-null text value in order. It seems as if this is what you are trying to accomplish.
select ID, Coalesce(Column3Text,Column2Text,Column1Text) ColumnText
from SourceData

Determine source on COALESCE fields

I have two tables table which are identical in structure but belong to different schemas (schemas A and B). All rows in question will always appear in the A.table but may or may not appear in B.table. B.table is essentially an override for the defaults in A.table.
As such my query uses a COALESCE on each field similar to:
SELECT COALESCE(B.id, A.id) as id,
COALESCE(B.foo, A.foo) as foo,
COALESCE(B.bar, A.bar) as bar
FROM A.table LEFT JOIN B.table ON (A.id = B.id)
WHERE A.id in (1, 2, 3)
This works great, but I also want to add the source of the data. In the example above, assuming id=2 existed in B.table but not 1 or 3, I would want to include some indication that A is the source for 1 and 3 and B is the source for 2.
So the data might look like the following
+---------------------------------+
| id | foo | bar | source |
+---------------------------------+
| 1 | a | b | A |
| 2 | c | d | B |
| 3 | e | f | A |
+---------------------------------+
I don't really care what the value of source is as long as I can distinguish A from B.
I am no pgsql expert (not by a long shot) but I have tinkered around with EXISTS and a subquery but have had no luck so far.
As records showing the default value (from A.table) have NULLs for B.id, all you need is to add this column specification to your query:
CASE WHEN B.id IS NULL THEN 'A' ELSE 'B' END AS Source
The USING clause would simplify the query you have:
SELECT id
, COALESCE(B.foo, A.foo) AS foo
, COALESCE(B.bar, A.bar) AS bar
, CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source -- like #Terje provided
FROM a
LEFT JOIN b USING (id)
WHERE a.id IN (1, 2, 3);
But typically, this alternative query should serve you better:
SELECT x.* -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT *, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT *, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
Advantages:
You don't have to add another COALESCE construct for every column you want to add to the result.
The same query works for any number of columns in a and b.
The query even works if the column names are not identical. Only number and data types of columns must match.
Of course, you can always list selected, compatible columns as well:
SELECT * -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT foo, bar, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT foo2, bar17, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
The first SELECT determines names, data types and number of columns.
This query doesn't break if columns in b are not defined NOT NULL.
COALESCE cannot tell the difference between b.foo IS NULL and no row with matching id in b. So the source of any result column (except id) can still be 'A', even if the result row says 'B' - if any relevant column in b can be NULL.
My alternative returns all values from b if the row exists - including NULL values. So the result can be different if columns in b can be NULL. It depends on your requirements which behavior is desirable.
Either query assumes that id is defined as primary key (so exactly 1 or 0 rows per given id value).
Related:
Select first record if none match
What is the difference between LATERAL and a subquery in PostgreSQL?

Select exactly 5 rows from a table

I have an odd requirement which ideally should be solved in SQL, not the surrounding app.
I need to select exactly 5 rows regardless of how many are actually available. In practice the number of rows available will usually be less than 5 and on some rare occasions it will be more than 5. The "extra" rows should have null in every column.
The app is written in a technology that isn't Turing Complete. This requirement is much more difficult to solve in the app's code than you might imagine! To describe it, the app is effectively a transformer: It takes in a bunch of queries and spits out a report. So please understand the app is NOT written in a "programming language" in the traditional sense.
So for example, if I have a table:
A | B
-----
1 | X
2 | Y
3 | Z
Then a valid result would be
A | B
-----------
2 | Y
1 | X
3 | Z
null | null
null | null
I know this is an unusual requirement. Sadly it can't be solved in the application due to the technology being used.
Ideally this shouldn't require changes to the database but if there is no other way that changes can be arranged.
Any suggestions?
You can do something like this:
select top 5 a, b
from (select a, b, 1 as priority from t union all
select null, null, 2 cross join
(values(1, 2, 3, 4, 5)) v(5)
) x
order by priority;
That is, create dummy rows, append them, and then choose the first five.
I do think that this work should be done in the app, but you can do it in SQL.
Create Table #Test (A int, B int)
Insert #Test Values (1,1)
Insert #Test Values (2,1)
Insert #Test Values (3,1)
Select Top 5 * From
(
Select A, B From #Test
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
) A
Wrap this in a stored proc..
declare #rowcount int
select top 5* from dbo.test
set #rowcount=##rowcount
if #rowcount<5
Begin
select * from dbo.test
union all
select null from dbo.numbers where n<=5-#rowcount
End
If you use some sort of tally table (although the numbers themselves do not matter, only that the table has enough records), you can use it to create the dummy rows. e.g. using sys.columns:
select top 5 a,b from
(
select a, b, 0 ord from yourTable
union all
select null a,null b, 1 from sys.columns
) t
order by ord
The advantage of the tally would be that if you need another number of rows in the future, you only need to change the top x (provided the tally table has enough rows)
Get those 3 records from your table.
Take a counter variable.
and then from your code add the NULL content until your counter gets 5.