Storing and querying pricing structures which differ for each row? - SQL

I've been struggling to think of a good way to store this data...
Each row is a doctor's practice. Each practice has a price which differs for each age group. For example one practice might have this pricing structure, where the left is the age and the right is the price:
0: 0
13: 20
21: 35
35: 35
60: 20
However the structure varies from practice to practice. So another practice might have this:
0: 5
25: 50
40: 35
60: 20
I need to be able to fetch the price for any given age.
Currently I have all the prices stored in a JSONB column on each row. I grab the JSON object in Node.js and run an algorithm to get the price for an age. Surely this isn't ideal, but I can't think of a good way to store and query this in Postgres.
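For illustration, the same lookup can be expressed directly against the JSONB (the table and column names practice, id, and prices are placeholders):

select p.value::numeric as price
from practice
cross join lateral jsonb_each_text(practice.prices) as p(age, value)
where practice.id = 42
  and p.age::int <= 25          -- 25 = the age being priced
order by p.age::int desc        -- highest bracket at or below that age
limit 1;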
So far I've considered having another table for fees and storing each age/price couple like this:
name | age | price
but that seems kinda clumsy and would create many thousands of rows. The other idea I had was to have columns for the cost at each possible age on the practice row like this:
name | 0 | 5 | 6 | 13 | 16 | 18 | 21 | 25 | 40 | 60
but that also seems incredibly clumsy and unwieldy.
Any ideas? Or maybe I'm better off just keeping it in JSON?

I would store the prices in a separate table that defines the "interval" in which the price is valid:
create table practice
(
    id   integer not null primary key,
    name text
);

create table prices
(
    practice_id integer not null references practice,
    from_age    integer not null,
    to_age      integer,            -- null = no upper bound (top age bracket)
    price       integer not null,
    constraint valid_interval check (from_age < to_age)
);
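For illustration, the first pricing structure from the question would be stored like this (the practice id 42 is made up, and to_age is taken as the last age at which the price still applies):

insert into practice (id, name) values (42, 'Example practice');

insert into prices (practice_id, from_age, to_age, price) values
    (42,  0, 12,    0),
    (42, 13, 20,   20),
    (42, 21, 34,   35),
    (42, 35, 59,   35),
    (42, 60, null, 20);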
You can also add an exclusion constraint to prevent overlapping intervals. Note that this needs the btree_gist extension, because a plain GiST index can't handle the integer equality on practice_id:

create extension if not exists btree_gist;

alter table prices
    add constraint no_overlap
    exclude using gist (practice_id with =, int4range(from_age, to_age) with &&);
Of course, instead of using the two columns from_age and to_age, you could put that into a single int4range column, which has the benefit of being indexable. But as you typically query per practice, the practice_id condition already leaves only a handful of rows to go through, so an index on the range column isn't necessary (and won't help much).
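A minimal sketch of that variant (the column name age_range is my choice; remember int4range defaults to an inclusive lower and exclusive upper bound):

create table prices
(
    practice_id integer   not null references practice,
    age_range   int4range not null,
    price       integer   not null,
    constraint no_overlap
        exclude using gist (practice_id with =, age_range with &&)
);

-- @> tests whether the range contains the value
select price
from prices
where practice_id = 42
  and age_range @> 25;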
To query the price, you just do:

select price
from prices
where practice_id = 42
  and 25 >= from_age
  and (25 <= to_age or to_age is null)

(A plain "25 between from_age and to_age" would miss the open-ended top bracket, because a comparison against a null to_age is never true.)
As the condition practice_id = 42 already narrows this down to just 5 or 6 rows, this is quite fast, even if you have thousands of practices.

If I understand it correctly, the price depends on the combination of practice and age. How about making a table with 3 columns:

Practice_ID | Age | Price

where Practice_ID and Age together form the composite primary key. That way you can query for the price by practice and age.
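A minimal sketch of that (the table name is mine; note this stores one row per practice per age, not per bracket):

create table practice_prices
(
    practice_id integer not null,
    age         integer not null,
    price       integer not null,
    primary key (practice_id, age)
);

select price
from practice_prices
where practice_id = 42
  and age = 25;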

Related

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
| KEY      | DATA   |
|----------|--------|
| V9864653 | 180288 |
| V9864653 | 22189  |
| V9864811 | 11464  |
| V9864811 | 12688  |
What I am having troubles with is when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
| KEY      | DATA   |
|----------|--------|
| V9864811 | 11464  |
| V9864653 | 180288 |
For some reason it's pulling the MIN value for V9864811, but not V9864653. If I invert that and put MAX instead of MIN, it pulls the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report where this data comes from changes from month to month, so there could be different keys that end up being duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.
Is your DATA column numeric, or a VARCHAR?
If it's a VARCHAR, MIN and MAX compare the values as strings, and as a string '180288' sorts before '22189', which is exactly the behaviour you're seeing. It's better to change the column to a number if you can, maybe an integer if you aren't dealing with any fractions and it's just round numbers.
If not, then you could cast the values to integers in the query, but if there are lots of transactions or it's a big table, that will be slow and not ideal. It's bad practice to do that if you could just change the data type!
SELECT KEY, MIN(CAST(DATA as Int))
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1)
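For the DELETE the question mentions, a hedged sketch (assuming DATA values are distinct within a KEY) that removes everything except the highest mileage per key; keys with a single row are untouched, since nothing is below their maximum:

DELETE FROM dbo.MileageImport m
WHERE CAST(m.DATA AS INT) < (
    SELECT MAX(CAST(d.DATA AS INT))
    FROM dbo.MileageImport d
    WHERE d.KEY = m.KEY
);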

SQL composite key value vs string

I have a list of integers with 1 to N elements (N < 24).
At the moment, there are two solutions for managing this value in a SQL database (I think it is the same for MySQL and Microsoft SQL Server).
Solution 1: use a VARCHAR and , to separate the integer values:
aaa | 40,50,50,10,600,200
aab | 40,50,600,200
aac | 40,50,50,10,600,200,500,1
Solution 2: create a new table with a composite primary key (key, id), where id is the index of the element in the list, plus a value column:
aaa | 0 | 40
aaa | 1 | 50
aaa | 2 | 50
....
aab | 0 | 40
aab | 1 | 50
aab | 2 | 600
....
Which is the better solution, considering that I have many items of data to load and I need to refresh this data many times?
Thanks
Edit:
My usage pattern is: I always refresh/read the whole list for a key in a single call and never access the values one by one, which is why I think the first approach is better. And all the maths, like avg or max, I want to do on the client.
Usually the second approach is preferable. One advantage is ease of access:
-- Value at position 3 for key 'aaa'
select value from mytable where key = 'aaa' and pos = 3;
-- Average value for key 'aaa'
select avg(value) from mytable where key = 'aaa';
-- Average number of values per key
select avg(cnt) from (select count(*) as cnt from mytable group by key) counted;
Another is data consistency. You can add simple constraints to your columns, such as to allow only integers from, say, 1 to 700 and positions only up to 23.
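A minimal sketch of such constraints (table and column names follow the queries above; the key width is a guess):

create table mytable
(
    key   varchar(3) not null,
    pos   integer    not null check (pos between 0 and 23),
    value integer    not null check (value between 1 and 700),
    primary key (key, pos)
);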
There is an exception to the above, though. If you use the database only to store the list as is and you don't want to select separate values or even aggregate them, i.e. if this is just a string to the DBMS and your queries don't care about its content, then store it as a simple string. Why not?
The second solution that you propose is the classic way of doing this; I would recommend it.
The first solution scales terribly and causes a hundred other problems.

SQL Server Primary Key for a range lookup

I have a static dataset that correlates a range of numbers to some metadata, e.g.
+--------+--------+-------+--------+----------------+
| Min | Max |Country|CardType| Issuing Bank |
+--------+--------+-------+--------+----------------+
| 400011 | 400051 | USA |VISA | Bank of America|
+--------+--------+-------+--------+----------------+
| 400052 | 400062 | UK |MAESTRO | HSBC |
+--------+--------+-------+--------+----------------+
I wish to look up the data for some arbitrary single value:
SELECT *
FROM SomeTable
WHERE Min <= 400030
AND Max >= 400030
I have about 200k of these range mappings, and am wondering what the best table structure for SQL Server is.
A composite key doesn't seem correct due to the fact that most of the time, the value being looked up will be in between the two range values stored on disk. Similarly, only indexing the first column doesn't seem to be selective enough.
I know that 200k rows is fairly insignificant, and I can get by without doing much, but let's assume the number of rows could be orders of magnitude greater.
If you usually search on both min and max then a compound key on (min, max) is appropriate. The engine will find all rows where min is less than X, then search within those results to find the rows where max is greater than Y.
The index would also be useful if you do searches on min only, but would not be applicable if you search only on max.
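A minimal sketch of that compound key as an index (the index name is mine):

CREATE INDEX IX_SomeTable_MinMax ON SomeTable ([Min], [Max]);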
You can index the first number and then do the lookup like this:

select t.*,
       (select top 1 s.country
        from static s
        where t.num >= s.firstnum
        order by s.firstnum desc   -- the closest range start at or below num
       ) as country
from sometable t;

Or use outer apply:

select t.*, s.country
from sometable t outer apply
     (select top 1 s.country
      from static s
      where t.num >= s.firstnum
      order by s.firstnum desc
     ) s;

Note the descending sort: you want the greatest firstnum that is still at or below num, not the smallest. This should take advantage of an index on static(firstnum) or static(firstnum, country). This does not check against the second number. If that is important, use outer apply and do the check outside the subquery.
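A hedged sketch of that check outside the subquery (secondnum is my name for the upper-bound column):

select t.*,
       case when t.num <= s.secondnum then s.country end as country
from sometable t outer apply
     (select top 1 s.country, s.secondnum
      from static s
      where t.num >= s.firstnum
      order by s.firstnum desc
     ) s;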
I would specify the primary key on (Min, Max). Queries are as simple as:

SELECT *
FROM SomeTable
WHERE @Value BETWEEN Min AND Max

I'd also define a constraint to enforce that Min <= Max. Then I would create a trigger to enforce uniqueness of ranges and prevent the database from storing an overlapping range.
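A minimal sketch of that table (all names are mine; Min and Max are bracketed to avoid clashing with the aggregate functions):

CREATE TABLE CardRanges
(
    [Min]       int          NOT NULL,
    [Max]       int          NOT NULL,
    Country     varchar(50)  NOT NULL,
    CardType    varchar(20)  NOT NULL,
    IssuingBank varchar(100) NOT NULL,
    CONSTRAINT PK_CardRanges PRIMARY KEY ([Min], [Max]),
    CONSTRAINT CK_ValidRange CHECK ([Min] <= [Max])
);

DECLARE @Value int = 400030;
SELECT * FROM CardRanges WHERE @Value BETWEEN [Min] AND [Max];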
I believe it is easier/faster if you create a trigger for INSERT and then fill in the related computed columns: country, issuing bank, card-number length.
That way you do the calculation only once per row, instead of across 200k rows every time you run a query. Of course there is a space cost, but the queries will be much easier to maintain.
I remember once I had to calculate some sin and cos values to compute distances, so I just filled the calculated columns once.
After your update I think it is even easier:
+--------+--------+-------+--------+----------------+----------+
| Min | Max |Country|CardType| Issuing Bank | TypeID |
+--------+--------+-------+--------+----------------+----------+
| 400011 | 400051 | USA |VISA | Bank of America| 1 |
+--------+--------+-------+--------+----------------+----------+
| 400052 | 400062 | UK |MAESTRO | HSBC | 2 |
+--------+--------+-------+--------+----------------+----------+
Then your Card table would also get a TypeID column referencing this one.
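A hedged sketch of such an INSERT trigger (the Card table, its CardNumber column, and the CardRanges lookup table are all assumptions of mine):

CREATE TRIGGER trg_Card_SetTypeID ON Card
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- stamp each new card with the TypeID of the range its 6-digit prefix falls in
    UPDATE c
    SET TypeID = r.TypeID
    FROM Card AS c
    JOIN inserted AS i
        ON i.CardNumber = c.CardNumber
    JOIN CardRanges AS r
        ON CAST(LEFT(i.CardNumber, 6) AS int) BETWEEN r.[Min] AND r.[Max];
END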

Convert any string to an integer

Simply put, I'd like to be able to convert any string to an integer, preferably being able to restrict the size of the integer and ensure that the result is always identical. In other words: is there a hashing function, supported by Oracle, that returns a numeric value, and can that value have a maximum?
To provide some context if needed, I have two tables that have the following, simplified, format:
Table 1                Table 2
id | sequence_number   id | sequence_number
--------------------   --------------------
 1 | 1                  1 | 2QD44561
 1 | 2                  1 | 6HH00244
 2 | 1                  2 | 5DH08133
 3 | 1                  3 | 7RD03098
 4 | 2                  4 | 8BF02466
The column sequence_number is number(3) in Table 1 and varchar2(11) in Table 2; it is part of the primary key in both tables.
The data is externally provided and cannot be changed; in Table 1 it is, I believe, created by a simple sequence but in Table 2 has a meaning. The data is made up but representative.
Someone has promised that we would output a number(3) field. While this is fine for the column in the first table, it causes problems for the second.
I would like to convert sequence_number to an integer (easy) that is less than 1000 (harder) and, if at all possible, deterministic (seemingly impossible). That means I would like '2QD44561' to always return, say, 586. It does not matter much if two strings return the same number.
For simply converting to a number I can use utl_raw.cast_to_number():
select utl_raw.cast_to_number((utl_raw.cast_to_raw('2QD44561'))) from dual;
UTL_RAW.CAST_TO_NUMBER((UTL_RAW.CAST_TO_RAW('2QD44561')))
---------------------------------------------------------
-2.033E+25
But as you can see, this isn't less than 1000.
I've also been playing around with dbms_crypto and utl_encode to see if I could come up with something, but I've not managed to get a small integer. Is there a way?
How about ora_hash?
select ora_hash(sequence_number, 999) from table_2;
... will produce values from 0 to 999, i.e. a maximum of 3 digits. You could also seed it with the id I suppose, but I'm not sure that adds much with so few values, and I'm not sure you'd want that anyway.
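A quick sanity check that the result is stable for the question's example value:

-- same input and same bucket count give the same hash every time
select ora_hash('2QD44561', 999) as hashed from dual;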
You are talking about using a hash function. There are lots of solutions out there; SHA-1 is very common.
But just FYI, when you say "restrict the size of the integer", understand that you will then be mapping an infinite set of strings onto a limited set of values. So while equal strings will always map to the same value, they will not be the only strings that map to that value: collisions are unavoidable.

Is this SELECT and ORDER BY query the most efficient way I could have done it?

In my journey to learn SQL, I'm writing various queries on an old database of mine, but getting into more complex things, I want to make sure I'm not over-engineering this. I have a table Agent, with different agents offering different prices for cities. Multiple agents can serve the same city, each with different prices. I wanted to run a query that returns the total cost of hiring all of the agents for any given city, ordered from the most expensive city down.
WITH orderedPrices AS (
    SELECT SUM(agtFMPrice) OVER (PARTITION BY agtCity) AS IX
    FROM Agent
)
SELECT IX
FROM orderedPrices
ORDER BY IX DESC
I found that when I did it without the CTE orderedPrices, it wouldn't order the prices (I assume because it's an aggregate function, or whatever they're called). Did I do this in the best way I could have, or could it be simplified?
Also, if you're feeling particularly bored, go ahead and give me a new assignment/query to do on this table. I could use the practice.
What you have written in English doesn't seem to quite match what you have written in SQL.
English:
- One record per City
- One field per record, showing the total cost of all associated agents
SQL:
- One record per Agent
- One field per record, showing the total cost of all agents in the same city
AgentID | agtCity | agtFMPrice
---------+---------+------------
1 | 1 | 10
2 | 1 | 20
3 | 2 | 30
4 | 2 | 10
5 | 2 | 25
Results of SQL version   Results of English version
----------------------   --------------------------
30                       30
30                       65
65
65
65
If you want the English version, I'd do this...
SELECT
    agtCity,
    SUM(agtFMPrice) AS IX
FROM
    Agent
GROUP BY
    agtCity
ORDER BY
    SUM(agtFMPrice) DESC
To assist performance, the table could (should?) also have an index on (agtCity).
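A minimal sketch of that index (the name is mine; on SQL Server you could also add INCLUDE (agtFMPrice) to make it covering):

CREATE INDEX IX_Agent_agtCity ON Agent (agtCity);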