Storing operators with operands in a table in SQL Server

I work at a company that sells many versions of a product to several different resellers, and each reseller adds parameters that change the resale price of the product.
For example, we sell a vehicle service contract where, for a certain vehicle, the reserve price of the contract is $36. The dealer marks up every reserve by 30% (to about $47), adds a premium of $33 to the reserve price (now $80), and adds a set of fees--like commissions and administrative costs--to bring the contract total to $235.
The reserve price is the same for every dealer on this program, but they all use different increases that are either flat or a percentage. There are of course dozens of parameters for each contract.
My question is this: can I store a table of parameters like "x*1.3" or "y+33" that are indexed to a unique ID, and then join or cross apply that table to one full of values like the reserve price mentioned above?
I looked at SQL Server table-valued parameters, but I can't tell from the MSDN examples whether they apply to my case.
Thanks so much for your kind replies.
EDIT:
As I feared, my example seems to be a little too esoteric (my fault). So consider this:
Twinings recommends different temperatures for brewing various kinds of tea. Depending on your elevation, your boiling point might be different. So there must be a way to store a table of values that looks like this--
[image: table of recommended brewing temperatures by elevation] (source: twinings.co.uk)
A user enters a ZIP code that has a corresponding elevation, and SQL Server calculates and returns the correct brew temperature for you. Is that any better an example?
Again, thanks to those who have already contributed.

I don't know if I like this solution, but it does seem to at least work. The only real way to iteratively construct totals is to use some form of "loop", and the most set-based way of doing that these days is with a recursive CTE:
declare @actions table (ID int identity(1,1) not null, ApplicationOrder int not null,
    Multiply decimal(12,4), AddValue decimal(12,4))
insert into @actions (ApplicationOrder,Multiply,AddValue) values
(1,1.3,null),
(2,null,33),
(3,null,155)
declare @todo table (ID int not null, Reserve decimal(12,4))
insert into @todo(ID,Reserve) values (1,36)
;With Applied as (
    select
        t.ID, Reserve as Computed, 0 as ApplicationOrder
    from
        @todo t
    union all
    select
        a.ID,
        CONVERT(decimal(12,4),
            ((a.Computed * COALESCE(act.Multiply,1)) + COALESCE(act.AddValue,0))),
        act.ApplicationOrder
    from
        Applied a
        inner join
        @actions act
        on
            a.ApplicationOrder = act.ApplicationOrder - 1
), IdentifyFinal as (
    select
        *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ApplicationOrder desc) as rn
    from
        Applied
)
select
    *
from
    IdentifyFinal
where
    rn = 1
Here I've got a simple single set of actions to apply to each price (in @actions) and a set of prices to apply them to (in @todo). I then use the recursive CTE to apply each action in turn.
My result:
ID Computed ApplicationOrder rn
----------- --------------------------------------- ---------------- --------------------
1 234.8000 3 1
Which isn't far off your $235 :-)
I appreciate that you may have different actions to apply to each particular price, and so my @actions may instead, for you, be something that works out which rules to apply in each case. That may be one or more CTEs before mine that do that work, possibly using another ROW_NUMBER() expression to work out the correct ApplicationOrder values. You may also need more columns and join conditions in the CTE to satisfy this.
Note that I've modelled the actions so that each can apply a multiplication and/or an add at each stage. You may want to play around with that sort of idea (or e.g. add a "rounding" flag of some kind as well so that we might well end up with the $235 value).
Applied ends up containing the initial values and each intermediate value as well. The IdentifyFinal CTE gets us just the final results, but you may want to select from Applied instead just to see how it worked.
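For example, a rough sketch of the rounding idea mentioned above (the RoundToWhole column is hypothetical, not part of the code above):
declare @actions table (
    ID int identity(1,1) not null,
    ApplicationOrder int not null,
    Multiply decimal(12,4),
    AddValue decimal(12,4),
    RoundToWhole bit not null default(0)
)
-- In the recursive member, the computed value would then become:
--   CONVERT(decimal(12,4),
--       CASE WHEN act.RoundToWhole = 1
--            THEN ROUND((a.Computed * COALESCE(act.Multiply,1)) + COALESCE(act.AddValue,0), 0)
--            ELSE (a.Computed * COALESCE(act.Multiply,1)) + COALESCE(act.AddValue,0)
--       END)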

You can use a very simple structure to store costs:
DECLARE @costs TABLE (
    ID INT,
    Perc DECIMAL(18, 6),
    Flat DECIMAL(18, 6)
);
The Perc column represents a percentage of the base price. It is possible to store complex calculations in this structure, but it gets ugly. For example, if we have:
Base Price: $100
Flat Fee: $20
Tax: 11.5%
Processing Fee: 3%
Then it will be stored as follows (the 1.0 rows carry the base price itself, i.e. 100% of it, through the SUM):
INSERT INTO @costs VALUES
-- OP's example ($36 reserve)
(1, 1.0, NULL),
(1, 0.3, NULL),
(1, NULL, 33.0),
(1, NULL, 155.0),
-- the example above ($100 base price)
(2, 1.0, NULL),
(2, NULL, 20.0),
(2, 0.115, NULL),
(2, NULL, 20.0 * 0.115),
(2, 0.03, NULL),
(2, NULL, 20.0 * 0.03),
(2, 0.115 * 0.03, NULL),
(2, NULL, 20 * 0.115 * 0.03);
And queried as:
DECLARE @tests TABLE (
    ID INT,
    BasePrice DECIMAL(18, 2)
);
INSERT INTO @tests VALUES
(1, 36.0),
(2, 100.0);
SELECT t.ID, SUM(
    BasePrice * COALESCE(Perc, 0) +
    COALESCE(Flat, 0)
) AS TotalPrice
FROM @tests t
INNER JOIN @costs c ON t.ID = c.ID
GROUP BY t.ID;
ID | TotalPrice
---+-------------
1 | 234.80000000
2 | 137.81400000
The other, better solution is to use a structure such as the following:
DECLARE @costs TABLE (
    ID INT,
    CalcOrder INT,
    PercOfBase DECIMAL(18, 6),
    PercOfPrev DECIMAL(18, 6),
    FlatAmount DECIMAL(18, 6)
);
Where CalcOrder represents the order in which the calculation is done (e.g. tax before processing fee). PercOfBase and PercOfPrev specify whether the base price or the running total is multiplied. This lets you handle situations where, for example, a commission is added on the base price but must not be included in tax, and vice versa. This approach requires a recursive or iterative query; a rough sketch follows.
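For instance, reusing the recursive-CTE pattern from the earlier answer (assuming @costs now holds rows in this CalcOrder structure and @tests holds the base prices; untested):
;WITH Applied AS (
    SELECT t.ID, t.BasePrice,
           CONVERT(DECIMAL(18, 6), t.BasePrice) AS Running,
           0 AS CalcOrder
    FROM @tests t
    UNION ALL
    SELECT a.ID, a.BasePrice,
           CONVERT(DECIMAL(18, 6),
               a.Running
               + a.BasePrice * COALESCE(c.PercOfBase, 0)
               + a.Running * COALESCE(c.PercOfPrev, 0)
               + COALESCE(c.FlatAmount, 0)),
           c.CalcOrder
    FROM Applied a
    INNER JOIN @costs c
        ON c.ID = a.ID
        AND c.CalcOrder = a.CalcOrder + 1
), IdentifyFinal AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CalcOrder DESC) AS rn
    FROM Applied
)
SELECT ID, Running AS TotalPrice
FROM IdentifyFinal
WHERE rn = 1;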

Select from a list in a parameter value

I am trying to have a dropdown with subject areas for a school report. The problem I am running into is that in my database, the subjects are grouped by grade and subject instead of just subject. So when I look at gt.standardid in (@SubjectArea) for "Literacy", the standard IDs for Literacy are (54,61,68,75,88,235), one for each grade level, but I want it to show all of them as Literacy. In my parameter "@SubjectArea" I have specific values I want to add for each subject area, so for the label "Literacy" I want it to select the StandardIds (54,61,68,75,88,235). I am not sure how to accomplish this.
Select
CS.subjectArea
,CS.Name As Group_Name
,GT.Abbreviation
,GT.Name
,GT.standardID
From GradingTask as GT
inner join CurriculumStandard CS
on GT.Standardid = CS.standardid
where GT.ARCHIVED = 0
and GT.standardid in (@SubjectArea)
ORDER BY GT.seq
I would try a cascading parameter approach.
You can have the first parameter be a pre-defined list. The specific values are not important, but they will be used in the next step.
Ideally your IDs would be in a table already, but if not you can use something like this:
declare @SubjectIDs as table
(
    [SubjectName] nvarchar(50),
    [SubjectID] int
);
insert into @SubjectIDs
(
    [SubjectName],
    [SubjectID]
)
values
('Literacy', 54),
('Literacy', 61),
('Literacy', 68),
('Literacy', 75),
('Literacy', 88),
('Literacy', 235);
select
    SubjectID
from @SubjectIDs
where SubjectName in (@SubjectList);
Make this into a data set. I'm going to call it DS_SubjectIDs.
Make a new hidden or internal parameter called SubjectIDs, and set it to get its values from the DS_SubjectIDs dataset.
You can now use the parameter for SubjectIDs in your final query.
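For example, the final dataset query would be the same as the original, just filtering on the hidden parameter instead of the visible one:
where GT.ARCHIVED = 0
and GT.standardid in (@SubjectIDs)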

How to speed up a slow MariaDB SQL query that has a flat BNL join?

I'm having problems with a slow SQL query running on the following system:
Operating system: Debian 11 (bullseye)
Database: MariaDB 10.5.15 (the version packaged for bullseye)
The table schemas and some sample data (no DB Fiddle as it doesn't support MariaDB):
DROP TABLE IF EXISTS item_prices;
DROP TABLE IF EXISTS prices;
DROP TABLE IF EXISTS item_orders;
CREATE TABLE item_orders
(
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
ordered_date DATE NOT NULL
) Engine=InnoDB;
CREATE TABLE prices
(
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
selected_flag TINYINT UNSIGNED NOT NULL
) Engine=InnoDB;
CREATE TABLE item_prices
(
item_order_id INT UNSIGNED NOT NULL,
price_id INT UNSIGNED NOT NULL,
PRIMARY KEY (item_order_id, price_id),
FOREIGN KEY (item_order_id) REFERENCES item_orders(id),
FOREIGN KEY (price_id) REFERENCES prices(id)
) Engine=InnoDB;
INSERT INTO item_orders VALUES (1, '2022-01-01');
INSERT INTO item_orders VALUES (2, '2022-02-01');
INSERT INTO item_orders VALUES (3, '2022-03-01');
INSERT INTO prices VALUES (1, 0);
INSERT INTO prices VALUES (2, 0);
INSERT INTO prices VALUES (3, 1);
INSERT INTO prices VALUES (4, 0);
INSERT INTO prices VALUES (5, 0);
INSERT INTO prices VALUES (6, 1);
INSERT INTO item_prices VALUES (1, 1);
INSERT INTO item_prices VALUES (1, 2);
INSERT INTO item_prices VALUES (1, 3);
INSERT INTO item_prices VALUES (2, 4);
INSERT INTO item_prices VALUES (2, 5);
INSERT INTO item_prices VALUES (3, 6);
A high-level overview of the table usage is:
For any given month, there will be thousands of rows in item_orders.
A row in item_orders will link to zero or more rows in item_prices (item_orders.id = item_prices.item_order_id).
A row in item_prices will have exactly one linked row in prices (item_prices.price_id = prices.id).
For any given row in item_orders, there will be zero or one row in prices where the selected_flag is 1 (item_orders.id = item_prices.item_order_id AND item_prices.price_id = prices.id AND prices.selected_flag = 1). This is enforced by the application rather than the database (i.e. it's not defined as a CONSTRAINT).
What I want to get, in a single query, are:
The number of rows in item_orders.
The number of rows in item_orders where the related selected_flag is 1.
At the moment I have the following query:
SELECT
COUNT(item_orders.id) AS item_order_count,
SUM(CASE WHEN prices.id IS NOT NULL THEN 1 ELSE 0 END) AS item_order_selected_count
FROM
item_orders
LEFT JOIN prices ON prices.id IN (
SELECT price_id
FROM item_prices
WHERE
item_prices.item_order_id = item_orders.id)
AND prices.selected_flag = 1
This query returns the correct data (item_order_count = 3, item_order_selected_count = 2), but it takes a long time (over 10 seconds) to run on a live dataset, which is too slow for users (it is a heavily-used report, refreshed repeatedly through the day). I think the problem is the subquery in the LEFT JOIN, as removing the LEFT JOIN and the associated SUM reduces the query time to around 0.1 seconds. Also, the EXPLAIN output for the join has this in the Extra column:
Using where; Using join buffer (flat, BNL join)
Searching for 'flat BNL join' reveals a lot of information, of which the summary seems to be: 'BNL joins are slow, avoid them if you can'.
Is it possible to rewrite this query to return the same information, but avoiding the BNL join?
Things I've considered already:
All the ID columns are indexed (item_orders.id, prices.id, item_prices.item_order_id, item_prices.price_id).
Splitting the query in two - one for item_order_count (no JOIN), the other for item_order_selected_count (INNER JOIN, as I only need rows which match). This works but isn't ideal as I want to build up this query to return more data (I've stripped it back to the minimum for this question). Also, I'm trying to keep the query output as close as possible to what the user will see, as that makes debugging easier and makes the database (which is optimised for that workload) do the work, rather than the application.
Changing the MariaDB configuration: Some of the search results for BNL joins suggest changing configuration options, however I'm wary of doing this as there are hundreds of other queries in the application and I don't want to cause a regression (e.g. speed up this query but accidentally slow down all the others).
Upgrading MariaDB: This would be a last resort as it would involve using a version different to that packaged with Debian, might break other parts of the application, and the system has just been through a major upgrade.
Not sure whether this will be any faster but worth a try (table joins on indexed foreign keys are fast and sometimes simplicity is king...)
SELECT
(SELECT COUNT(*) FROM item_orders) AS item_order_count,
(SELECT COUNT(*)
FROM item_orders io
JOIN item_prices ip
ON io.id = ip.item_order_id
JOIN prices p
ON ip.price_id = p.id
WHERE p.selected_flag = 1) AS item_order_selected_count;
I came back to this question this week as the performance got even worse as the number of rows increased, to the point where it was taking over 2 minutes to run the query (with around 100,000 rows in the item_orders table, so hardly 'big data').
I remembered that it was possible to list multiple tables in the FROM clause and wondered if the same was true of a LEFT JOIN. It turns out this is the case and the query can be rewritten as:
SELECT
COUNT(item_orders.id) AS item_order_count,
SUM(CASE WHEN prices.id IS NOT NULL THEN 1 ELSE 0 END) AS item_order_selected_count
FROM
item_orders
LEFT JOIN (item_prices, prices) ON
item_prices.item_order_id = item_orders.id
AND prices.id = item_prices.price_id
AND prices.selected_flag = 1
This returns the same results but takes less than a second to execute. Unfortunately I don't know any relational algebra to prove this, but effectively what I am saying is 'only LEFT JOIN where everything matches on both item_prices and prices'.
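If it helps readability, the comma syntax inside the LEFT JOIN should be equivalent to nesting an explicit inner join (same semantics as far as I know; worth verifying on your own data):
SELECT
COUNT(item_orders.id) AS item_order_count,
SUM(CASE WHEN prices.id IS NOT NULL THEN 1 ELSE 0 END) AS item_order_selected_count
FROM
item_orders
LEFT JOIN (item_prices INNER JOIN prices
ON prices.id = item_prices.price_id
AND prices.selected_flag = 1)
ON item_prices.item_order_id = item_orders.id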

Clustering/Similarity between text cells in a Postgres aggregate

I've got a table that has a text column and some other identifying features. I want to be able to group by one of the features and find out whether the text values in each group are similar or not. I want to use this to determine whether there are multiple groups in my data or a single group (with some possible bad spelling), so that I can provide a rough "confidence" value showing whether the aggregate represents a single group or not.
CREATE TABLE data_test (
Id serial primary key,
Name VARCHAR(70) NOT NULL,
Job VARCHAR(100) NOT NULL);
INSERT INTO data_test
(Name, Job)
VALUES
('John', 'Astronaut'),
('John', 'Astronaut'),
('Ann', 'Sales'),
('Jon', 'Astronaut'),
('Jason', 'Sales'),
('Pranav', 'Sales'),
('Todd', 'Sales'),
('John', 'Astronaut');
I'd like to run a query that was something like:
select
Job,
count(Name),
Similarity_Agg(Name)
from data_test
group by Job;
and receive
Job count Similarity
Sales 4 0.1
Astronaut 4 0.9
Basically showing that Astronaut names are very similar (or, more likely in my data, all the rows are referring to a single astronaut) and the Sales names aren't (more people working in sales than in space). I see there is a Postgres Module that can handle comparing two strings but it doesn't seem to have any aggregate functions in it.
Any ideas?
One option is a self-join:
select
d.job,
count(distinct d.id) cnt,
avg(similarity(d.name, d1.name)) avg_similarity
from data_test d
inner join data_test d1 on d1.job = d.job
group by d.job
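Note that similarity() comes from the pg_trgm module (presumably the one mentioned in the question), which has to be enabled once per database:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Also be aware that the self-join compares every row with itself, and those pairs always score 1.0, which skews the average upward; adding d1.id <> d.id to the join would exclude them, at the cost of dropping groups that contain only a single row.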

Parent/Child Tables Query Pattern

Suppose I have the following parent/child table relationship in my database:
TABLE offer_master( offer_id int primary key, ..., scope varchar )
TABLE offer_detail( offer_detail_id int primary key, offer_id int foreign key, customer_id int, ... )
where offer_master.scope can take on the value
INDIVIDUAL: when the offer is made to particular customers. In this case,
whenever a row is inserted into offer_master, a corresponding row is
added to offer_detail for each customer to which the offer has been extended.
e.g.
INSERT INTO offer_master( 1, ..., 'INDIVIDUAL' );
INSERT INTO offer_detail( offer_detail_id, offer_id, customer_id, ... )
VALUES ( 1, 1, 100, ... )
INSERT INTO offer_detail( offer_detail_id, offer_id, customer_id, ... )
VALUES ( 2, 1, 101, ... )
GLOBAL: when the offer is made to all customers. In this case,
new offers can be added to the parent table as follows:
INSERT INTO offer_master( 2, ..., 'GLOBAL' );
INSERT INTO offer_master( 3, ..., 'GLOBAL' );
but a child row is added to offer_detail only
when a customer indicates some interest in the offer. So
it may be the case that, at some later point we will have
INSERT INTO offer_detail( offer_detail_id, offer_id, customer_id, ... )
VALUES ( 4, 3, 100, ... )
Given this situation, suppose we would like to query the database
to obtain all offers which have been extended to customer 100;
this includes 3 types of offers:
offers which have been extended specifically to customer 100.
global offers which customer 100 showed no interest in.
global offers which customer 100 did show interest in.
I see two approaches:
Using a Subquery:
SELECT *
FROM offer_master
WHERE offer_id in (
SELECT offer_id
FROM offer_detail
WHERE customer_id = 100 )
OR scope = 'GLOBAL'
Using a UNION
SELECT om.*
FROM offer_master om INNER JOIN
offer_detail od
ON om.offer_id = od.offer_id
WHERE od.customer_id = 100
UNION
SELECT *
FROM offer_master
WHERE scope = 'GLOBAL'
Note: a UNION ALL cannot be used since a global offer
which a customer has shown interest in would be duplicated.
My question is:
Does this query pattern have a name?
Which of the two query methods is preferable?
Should the database design be improved in some way?
I'm not aware of a pattern name.
To me, the second query is clearer but I think either is OK.
offer_detail seems to be a dual purpose table which is a bit of a red flag to me. You might have separate tables for the customers in an individual offer, and the customers who have expressed interest.
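For example, a minimal sketch of that split, in the same shorthand as above (table names invented here):
TABLE offer_recipient( offer_id int foreign key, customer_id int, ... )  -- customers an INDIVIDUAL offer was extended to
TABLE offer_interest( offer_id int foreign key, customer_id int, ... )   -- customers who have expressed interest (individual or global)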

Calculating percent of votes inside mysql statement

UPDATE polls_options SET `votes`=`votes`+1, `percent`=ROUND((`votes`+1) / (SELECT voters FROM polls WHERE poll_id=? LIMIT 1) * 100,1)
WHERE option_id=?
AND poll_id=?
I don't have table data yet to test it properly. :)
And by the way, what data type should percentages be stored as in the database?
Thanks for the help!
You don't say what database you're using (PostgreSQL, MySQL, Oracle, etc.) but if you're using MySQL you could get away with using a TINYINT datatype. You're rounding to an integer anyway, and assuming your percentages will always be between 0 and 100, you'll be fine.
Your problem seems to be that you don't have any test data so you are unable to test the syntax of your query. But that is a problem you can easily solve yourself and it doesn't even take that long:
Just make up some data and use that to test.
This isn't as hard as it might sound. For example, here I create two polls, the first of which has four votes and the second of which has two votes. I then try to add a vote to option 1 of poll 1 using your query.
CREATE TABLE polls_options (
poll_id INT NOT NULL,
option_id INT NOT NULL,
votes INT NOT NULL,
percent FLOAT NOT NULL
);
INSERT INTO polls_options (poll_id, option_id, votes, percent) VALUES
(1, 1, 1, 25),
(1, 2, 3, 75),
(2, 1, 1, 50),
(2, 2, 1, 50);
CREATE TABLE polls (poll_id INT NOT NULL, voters INT NOT NULL);
INSERT INTO polls (poll_id, voters) VALUES
(1, 4),
(2, 2);
UPDATE polls_options
SET votes = votes + 1,
percent = ROUND((votes + 1) / (SELECT voters FROM polls WHERE poll_id = 1 LIMIT 1) * 100,1)
WHERE option_id = 1
AND poll_id = 1;
SELECT * FROM polls_options;
Here are the results:
poll_id option_id votes percent
1 1 2 75
1 2 3 75
2 1 1 50
2 2 1 50
You can see that there are a number of problems:
The polls table isn't updated yet, so the total vote count for poll 1 is wrong (4 instead of 5). Notice that you don't even need this table: it duplicates information that can already be derived from the polls_options table. Having to keep these two tables in sync is extra work. If you need to adjust the results for some reason, for example to remove some spam voting, you will have to remember to update both tables. It's unnecessary extra work and an extra source of errors.
Even if you had remembered to update the polls table first, the percentage for option 1 is still calculated incorrectly: it is calculated as 3/5 instead of 2/5, because MySQL applies SET assignments left to right, so the percent expression sees the already-incremented votes value and effectively does this calculation: ((votes + 1) + 1).
The percentage for option 2 isn't updated, causing the total percentage for poll 1 to be greater than 100.
You probably shouldn't even be storing the percentage in the database. Instead of persisting this value, consider calculating it on the fly only when you need it.
You might want to reconsider your table design to avoid redundant data. Consider normalizing your table structure. If you do this then all the problems I listed above will be solved and your statements will be much simpler.
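For example, a rough sketch of computing the percentages on the fly from the votes alone (no stored percent column, and no separate polls table needed; this form avoids window functions so it also works on older MySQL versions):
SELECT po.option_id,
       po.votes,
       ROUND(po.votes / t.total * 100, 1) AS percent
FROM polls_options po
INNER JOIN (SELECT poll_id, SUM(votes) AS total
            FROM polls_options
            GROUP BY poll_id) t ON t.poll_id = po.poll_id
WHERE po.poll_id = 1;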
Good luck!