How to handle a range within a data field - sql

I have a set of data with ranges of numbers that are saved to the field itself. So for example, in the age column there are entries like "60-64", "65+" and in the income field "30,000-40,000". Is there a way to query these fields and treat them as number ranges? So a query for 52500 would match the "50,000-60,000" income range?
Preprocessing the input is my current best idea: I would map the user input to the matching range string for each of these fields before querying the database. But I was wondering if there is a better way.
Assume that I cannot modify the database or create a new database at all.

There is no easy way with SQLite that I know of, and you would certainly be better off restructuring all your range columns into two columns each: range_start and range_end.
If your ranges are fixed ranges, you can get the minimum / maximum from a separate table:
create table age_ranges (
name varchar(16) unique not null
, range_start integer unique not null
, range_end integer unique not null
);
insert into age_ranges (name, range_start,range_end) values ('60-64',60,64);
insert into age_ranges (name, range_start,range_end) values ('65+',65,999);
create table participant (
name varchar(16) unique not null
, age integer not null
, income integer not null
);
insert into participant (name, age, income) values ('Joe Blow', 65, 900);
insert into participant (name, age, income) values ('Jane Doe', 61, 1900);
create table question (
question varchar(64) not null
, relevant_age varchar(32) not null
);
insert into question (question, relevant_age) values ('What is your favourite non-beige color?', '65+');
insert into question (question, relevant_age) values ('What is your favourite car?', '60-64');
select
p.name,
q.question,
q.relevant_age
from participant p
join age_ranges r on (r.range_start <= p.age and p.age <= r.range_end)
join question q on q.relevant_age = r.name
SQL Fiddle
Alternatively, you can also try to parse the range start and range end out of the text using string functions such as substr() and instr(), but the performance will likely be bad.
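A minimal sketch of that string-parsing approach in SQLite, assuming a hypothetical table survey_data with a text column age_range holding values like '60-64' or '65+', and an example search value of 62:
select d.*
from survey_data d
where ( d.age_range like '%+'        -- open-ended ranges such as '65+'
        and cast(replace(d.age_range, '+', '') as integer) <= 62 )
   or ( instr(d.age_range, '-') > 0  -- closed ranges such as '60-64'
        and cast(substr(d.age_range, 1, instr(d.age_range, '-') - 1) as integer) <= 62
        and 62 <= cast(substr(d.age_range, instr(d.age_range, '-') + 1) as integer) );
For the income column you would also strip the thousands separators first, e.g. replace(income_range, ',', ''), before casting. Every row has to be parsed on every query, which is another reason the lookup-table approach above is preferable.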

Related

Single columns from several rows into several columns in one record, but allow NULL in the rows

I'm trying to combine single columns from multiple rows into several columns in one record.
Say we have a database of people and they've all chosen 2 numbers, but some people have only chosen 1 number and haven't submitted their 2nd number.
This is a simplified example, in my actual production database, it's scaled up to several of these 'numbers'.
From this example, person 3 hasn't chosen their 2nd number yet.
I tried this query:
SELECT ppl.*,
cn1.chosen_num AS first_num,
cn2.chosen_num AS second_num
FROM people AS ppl
LEFT JOIN
chosenNumbers AS cn1
LEFT JOIN
chosenNumbers AS cn2
WHERE ppl.numid = cn1.personid
AND ppl.numid = cn2.personid
AND cn1.type = 'first'
AND cn2.type = 'second'
But it doesn't return any information on Person3, since they haven't chosen their second number yet. However, I want data on EVERYONE involved in this number guessing, but I want to be able to see the first and second numbers they've guessed.
Here is a dump of the sample database where I'm testing this.
BEGIN TRANSACTION;
CREATE TABLE people (numid INTEGER PRIMARY KEY, name TEXT);
INSERT INTO people VALUES(1,'Person1');
INSERT INTO people VALUES(2,'Person2');
INSERT INTO people VALUES(3,'Person3');
CREATE TABLE chosenNumbers (numid INTEGER PRIMARY KEY, chosen_num INTEGER, type TEXT, personid INTEGER, FOREIGN KEY(personid) REFERENCES people(numid));
INSERT INTO chosenNumbers VALUES(1,101,'first',1);
INSERT INTO chosenNumbers VALUES(2,102,'second',1);
INSERT INTO chosenNumbers VALUES(3,201,'first',2);
INSERT INTO chosenNumbers VALUES(4,202,'second',2);
-- Person 3 hasn't chosen their 2nd number yet..
-- But I want data on them, and the query above
-- doesn't work.
INSERT INTO chosenNumbers VALUES(5,301,'first',3);
COMMIT;
I'd also appreciate being told how I could scale this up to say, 3 numbers, or 4 numbers, or even more than that.
You can use conditional aggregation:
SELECT p.*,
MAX(CASE WHEN c.type = 'first' THEN c.chosen_num END) AS first_num,
MAX(CASE WHEN c.type = 'second' THEN c.chosen_num END) AS second_num
FROM people AS p LEFT JOIN chosenNumbers AS c
ON p.numid = c.personid
GROUP BY p.numid;
You can expand the code for more columns.
See the demo.
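For instance, if a hypothetical third number were stored with type = 'third', the same pattern extends with one more CASE branch (a sketch):
SELECT p.*,
MAX(CASE WHEN c.type = 'first' THEN c.chosen_num END) AS first_num,
MAX(CASE WHEN c.type = 'second' THEN c.chosen_num END) AS second_num,
MAX(CASE WHEN c.type = 'third' THEN c.chosen_num END) AS third_num
FROM people AS p LEFT JOIN chosenNumbers AS c
ON p.numid = c.personid
GROUP BY p.numid;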

Ambiguity with column reference [duplicate]

This question already has answers here: SQL column reference "id" is ambiguous (5 answers). Closed 4 months ago.
I tried to run a simple piece of code as follows:
Create Table weather (
city varchar(80),
temp_lo int,
temp_hi int,
prcp real,
date date
);
Insert Into weather Values ('A', -5, 40, 25, '2018-01-10');
Insert Into weather Values ('B', 5, 45, 15, '2018-02-10');
Create Table cities (
city varchar(80),
location point
);
Insert Into cities Values ('A', '(12,10)');
Insert Into cities Values ('B', '(6,4)');
Insert Into cities Values ('C', '(18,13)');
Select * From cities, weather Where city = 'A'
But what I get is
ERROR: column reference "city" is ambiguous.
What is wrong with my code?
If I were you I'd model things slightly differently.
To normalise things a little, we'll start with the cities table and make a few changes:
create table cities (
city_id integer primary key,
city_name varchar(100),
location point
);
Note that I've used an integer to denote the ID and primary key of the table, and stored the name of the city separately. This gives you a nice, easy-to-maintain lookup table. By using an integer as the primary key, we'll also use less space in the weather table when we're storing data.
Create Table weather (
city_id integer,
temp_lo int,
temp_hi int,
prcp real,
record_date date
);
Note that I'm storing the id of the city rather than the name. Also, I've renamed date as it's not a good idea to name columns after SQL reserved words.
Ensure that we use IDs in the test data:
Insert Into weather Values (1, -5, 40, 25, '2018-01-10');
Insert Into weather Values (2, 5, 45, 15, '2018-02-10');
Insert Into cities Values (1,'A', '(12,10)');
Insert Into cities Values (2,'B', '(6,4)');
Insert Into cities Values (3,'C', '(18,13)');
Your old query:
Select * From cities, weather Where city = 'A'
The name was ambiguous because both tables have a city column, and the database engine doesn't know which city you mean (it doesn't automatically know if it needs to use cities.city or weather.city). The query also performs a cartesian product, as you have not joined the tables together.
Using the changes I have made above, you'd require something like:
Select *
From cities, weather
Where cities.city_id = weather.city_id
and city_name = 'A';
or, using newer join syntax:
Select *
From cities
join weather on cities.city_id = weather.city_id
Where city_name = 'A';
The two queries are functionally equivalent; these days most people prefer the second form, as it can prevent mistakes (e.g. forgetting to actually join the tables in the WHERE clause).
Both tables, cities and weather, have a column called city. In your WHERE clause you filter on city = 'A'; which table's city is it referring to?
You can tell the engine which one you want to filter on by prefixing the column with its table name:
Select * From cities, weather Where cities.city = 'A'
You can also refer to tables with alias:
Select *
From cities AS C, weather AS W
Where C.city = 'A'
But most importantly, make sure that you join the tables together, unless you want every record from one table to be combined with every record from the other without any criterion (a Cartesian product). You can join them with an explicit INNER JOIN:
Select
*
From
cities AS C
INNER JOIN weather AS W ON C.city = W.city
Where
C.city = 'A'
In the example you mention, this query is used:
SELECT *
FROM weather, cities
WHERE city = name;
But there, the cities table has a name column (instead of the city column you used). So that WHERE clause links the weather and cities tables together: city is a weather column and name is a cities column, and there is no ambiguity because the two columns have different names.

Inserting multiple records in database table using PK from another table

I have DB2 table "organization" which holds organizations data including the following columns
organization_id (PK), name, description
Some organizations have been deleted, so a lot of "organization_id" values (i.e. rows) don't exist anymore; the IDs are not continuous like 1, 2, 3, 4, 5... but more like 1, 2, 5, 7, 11, 12, 21....
Then there is another table "title" with some other data, and it contains organization_id from the organization table as a FK.
Now there is some data which I have to insert for all organizations: a title that is going to be shown for all of them in the web app.
In total there are approximately 3000 records to be added.
If I were to do it one by one, it would look like this:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
VALUES
(
'This is new title',
XXXX,
CURRENT TIMESTAMP,
1,
1,
1
);
where XXXX represents the "organization_id" that I should get from the "organization" table, so that the insert is done only for existing organization_ids.
So only "organization_id" changes, matching the "organization_id" values from the "organization" table.
What would be the best way to do it?
I checked several similar questions but none of them seems to be quite the same as this:
SQL Server 2008 Insert with WHILE LOOP
The while-loop answer iterates over continuous IDs, and the other answer also assumes that the ID is auto-incremented.
Same here:
How to use a SQL for loop to insert rows into database?
Not sure about this one (as the question itself is not quite clear):
Inserting a multiple records in a table with while loop
Any advice on this? How should I do it?
If you seriously want a row in title for every organization record, with the exact same data, something like this should work:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organization o
;
You shouldn't need the column aliases in the select, but I am including them for readability and good measure.
https://www.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafymultrow.htm
And, in case your process errors out or needs to be rerun, you can also do something like this to insert a record into title only if that organization_id and title combination does not already exist.
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organization o
LEFT JOIN Title t
ON o.organization_id = t.organization_id
AND t.name = 'This is new title'
WHERE
t.organization_id IS NULL
;
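The same anti-join can also be written with a NOT EXISTS subquery; this is just a sketch of the equivalent form (DB2 supports both), in case you find it easier to read:
INSERT INTO title
(name, organization_id, datetime_added, added_by, special_fl, title_type_id)
SELECT
'This is new title',
o.organization_id,
CURRENT TIMESTAMP,
1,
1,
1
FROM organization o
WHERE NOT EXISTS
(
SELECT 1
FROM title t
WHERE t.organization_id = o.organization_id
AND t.name = 'This is new title'
)
;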

Add rows to a table then loop back and add more rows for a different userid

I have a table with 4 columns - ID, ClubID, FitnessTestNameID and DisplayName
I have another table called Club and it has ID and Name
I want to add two rows of data to the 1st table for each club
I can write a statement like this, but can someone tell me how to create a loop so that I can insert the two rows, increment @clubid by 1 and then loop back again?
declare @clubid int
set @clubid = 1
insert FitnessTestsByClub (ClubID,FitnessTestNameID,DisplayName)
values (@clubid,'1','Height (cm)')
insert FitnessTestsByClub (ClubID,FitnessTestNameID,DisplayName)
values (@clubid,'2','Weight (kg)')
You can probably do this with one statement only. No need for loops:
INSERT INTO FitnessTestsByClub
(ClubID, FitnessTestNameID, DisplayName)
SELECT
c.ID, v.FitnessTestNameID, v.DisplayName
FROM
Club AS c
CROSS JOIN
( VALUES
(1, 'Height (cm)'),
(2, 'Weight (kg)')
) AS v (FitnessTestNameID, DisplayName)
WHERE
NOT EXISTS -- a condition so no duplicates
( SELECT * -- are inserted
FROM FitnessTestsByClub AS f -- and the statement can be run again
WHERE f.ClubID = c.ID -- in the future, when more clubs
) -- have been added.
;
The Table Value Constructor syntax above (the (VALUES ...) construction) is valid in SQL Server 2008 and later.
There is a nice article with lots of useful examples of how to use them, by Robert Sheldon: Table Value Constructors in SQL Server 2008
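On an older version without table value constructors, a derived table built with SELECT ... UNION ALL would do the same job; a sketch, otherwise identical to the query above:
INSERT INTO FitnessTestsByClub
(ClubID, FitnessTestNameID, DisplayName)
SELECT
c.ID, v.FitnessTestNameID, v.DisplayName
FROM
Club AS c
CROSS JOIN
( SELECT 1 AS FitnessTestNameID, 'Height (cm)' AS DisplayName
UNION ALL
SELECT 2, 'Weight (kg)'
) AS v
WHERE
NOT EXISTS -- same duplicate guard as above
( SELECT *
FROM FitnessTestsByClub AS f
WHERE f.ClubID = c.ID
)
;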

Is there a standard SQL Table design for overriding 'big picture' default values with lower level details?

Here's an example. Suppose we are trying to calculate a service charge.
Say sales in the USA attract a 10 dollar charge, sales in the UK attract a 20 dollar charge
So far it's easy - we are starting to imagine a table that lists charges by country.
Now let's assume that Alaska and Hawaii are treated as special cases: they are both 15 dollars.
That suggests a table keyed by state: Alaska and Hawaii are charged at 15, but presumably we need 48 (redundant) rows all saying 10. This gives us a maintenance problem; our user only wants to type 10 once, NOT 48 times. It does not sit well with the UK either, since the UK does not have states.
Suppose we throw in another couple of cross-cutting rules.
If you order by phone there is a 10% supplement on the charge.
If you order via the web there is a 10% discount.
But for some reason best known to the owners of the business the web/phone supplement/discount are not applied in Hawaii.
It seems to me that this is quite a common kind of problem, and there is probably a well-known arrangement of tables to store the data. Most cases get handled by broad-brush answers, but there are some very detailed low-level variations. They give rise to a huge number of theoretical combinations, most of which are never used.
You can group states/countries into categories and assign charges to the categories not to the states/countries.
"Alaska and Hawaii are charged at 15, but presumably we need 48 (redundant) rows all saying 10."
No, you need three rows: two for Alaska and Hawaii, and one for the Continental United States.
All of the other rules appear to be additive. There's one record in some table for each rule. If the rule is not triggered/matched, the charge is not added.
One answer:
create table charge ( entity char(2), amount int );
insert into charge (entity, amount) values ('DF', 10); -- default
insert into charge (entity, amount) values ('AK', 15); -- Alaska
insert into charge (entity, amount) values ('HI', 15); -- Hawaii
Then:
select coalesce( (select amount from charge where entity = 'DC'),
                 (select amount from charge where entity = 'DF') ) as amount;
gets you the default amount for any entity that has no row of its own (here 'DC').
Alternatively:
select amount
from charge
where entity = coalesce( (select entity from charge where entity = 'DC'), 'DF' );
In other words, use a null result and coalesce to either replace non-existent results with a default result, or to replace a non-listed entity with a default entity.
Do you want a general technique/idiom, or a detailed design for a specific case? This is a general idiom.
If it's a specific case, look at what Robert Harvey said: "All of the other rules appear to be additive." If so, your design becomes very simple: a single table of charges, or (better) a table of charges, a table of jurisdictions, and a many-to-many relation between them. Again, this only works in the additive case:
create table surcharges ( id int not null primary key,
description varchar(50) not null, amount int );
create table jurisdiction ( id int not null primary key,
name varchar(50) not null, abbr char(5) );
create table jurisdiction_surcharge ( id int not null primary key,
jurisdiction_id int not null references jurisdiction(id),
surcharge_id int not null references surcharges(id) );
insert into surcharges (id, description, amount) values ( 1, 'Outside Continental US', 15 );
insert into jurisdiction (id, name, abbr) values ( 1, 'Mainland US', 'CONUS');
insert into jurisdiction (id, name, abbr) values ( 2, 'Alaska', 'AK');
insert into jurisdiction (id, name, abbr) values ( 3, 'Hawaii', 'HI');
insert into jurisdiction_surcharge (id, jurisdiction_id, surcharge_id) values ( 1, 2, 1 );
insert into jurisdiction_surcharge (id, jurisdiction_id, surcharge_id) values ( 2, 3, 1 );
List charges:
select a.*
from surcharges a
join jurisdiction_surcharge b on (a.id = b.surcharge_id)
join jurisdiction c on (c.id = b.jurisdiction_id)
where c.abbr = 'AK';
Sum charges:
select sum(a.amount)
from surcharges a
join jurisdiction_surcharge b on (a.id = b.surcharge_id)
join jurisdiction c on (c.id = b.jurisdiction_id)
where c.abbr = 'AK';
I'd be inclined to allow for multiple means of identifying the area where a given service charge applies, like so:
Create Table ServiceCharges
(
Country char(3) not null -- ISO 3166-1 alpha-3 code
, StateOrProvince nvarchar(25) null -- ideally a code but given multiple
-- countries, that may not be feasible
, City nvarchar(128) null -- again name
, PostalCode varchar(10) null
, Rate decimal(16,4) not null
, Constraint CK_ServiceCharges_RateGTEZero Check ( Rate >= 0 )
, Constraint CK_ServiceCharges_MinEntry Check ( Case
When Country Is Null
And StateOrProvince Is Null
And City Is Null
And PostalCode Is Null Then 0
Else 1
End = 1 )
);
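For illustration, the charges from the question might then be captured as rows like these (a sketch; the alpha-3 codes and rates are only examples):
Insert Into ServiceCharges (Country, StateOrProvince, City, PostalCode, Rate) Values ('USA', null, null, null, 10.00); -- country-wide default
Insert Into ServiceCharges (Country, StateOrProvince, City, PostalCode, Rate) Values ('USA', 'AK', null, null, 15.00); -- state-level override
Insert Into ServiceCharges (Country, StateOrProvince, City, PostalCode, Rate) Values ('USA', 'HI', null, null, 15.00);
Insert Into ServiceCharges (Country, StateOrProvince, City, PostalCode, Rate) Values ('GBR', null, null, null, 20.00); -- the UK: a country-level row only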
This could also be split apart so that the locations are maintained in a separate table with a surrogate LocationId column applied in the ServiceCharges table. This design does not account for overlap which brings the question: What should happen in the case of overlap? What happens if the USA has one charge but TX has another? Does the TX rate trump? If so, that means that the most specific location wins and we can determine specificity by the existence of a Postal Code or next a City or next a StateOrProvince or next only a country.