How to calculate age from birthday when creating a table in SQL?

This is my code:
CREATE TABLE patients (
    PatientID int PRIMARY KEY NOT NULL,
    FirstName varchar(40) NOT NULL,
    LastName varchar(40) NOT NULL,
    PatientBirthday date,
    PatientAge int AS (year(CURRENT_TIMESTAMP) - year(PatientBirthday))
);
But whenever I run it, I get a syntax error highlighting AS.

I don't think you can. The expressions available in a calculated field are limited. If you remove the line with the calculated field, create the table, then switch to design view and try to add the calculated field manually, you'll get the design interface and a list of the available functions. YEAR is listed, but NOW isn't, and I think NOW is the Access equivalent of the ANSI CURRENT_TIMESTAMP. If you attempt YEAR(NOW()), the editor rejects it.
YEAR([PatientBirthday]) is accepted by the editor, so that part isn't the issue. Sorry, I think this is not possible, at least not the way you want to do it. I think you will have to calculate the age during a SELECT, UPDATE the ages after insert, or hardcode the current year: 2021 - YEAR([PatientBirthday]). I know that's less than ideal. It would also be possible to store the current year in a separate table, update it with VBA at database startup, and use that field in the calculation.
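For example, here's a minimal sketch of computing the age at query time in Access SQL, against the patients table above. The Format comparison is a commonly cited Access idiom (True is -1 in Access, so it subtracts a year when the birthday hasn't occurred yet this year); worth verifying on your own data:
SELECT PatientID,
       DateDiff("yyyy", [PatientBirthday], Date())
         + (Format(Date(), "mmdd") < Format([PatientBirthday], "mmdd")) AS PatientAge
FROM patients;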
Access can be very hacky at times. I really would suggest using a full featured ANSI compliant DBMS if you have the choice.

Related

How to produce a reproducible column of random integers in SQL

I have a table of patients, with a unique patientID column. This patientID cannot be shared with study teams, so I need a randomised set of unique patient identifiers to share instead. The struggle is that there will be several study teams, so every time a randomised identifier is produced, it needs to differ from the identifiers produced for other studies. To make it even more complicated, we need to be able to reproduce the same set of random identifiers for a study at any point (if the study needs to re-run the data, for example).
I have looked into the RAND() and NEWID() functions but not managed to figure out a solution. I think this may be possible using RAND() with a seed, and a while loop, but I haven't used these before.
Can anyone provide a solution that allows me to share several randomised sets of unique identifiers, that never have the same identifier for the same patient, and which can be re-run to produce the same list?
Thanks in advance to anyone that helps with this!
NEWID() should work as long as you use the correct datatype.
A UNIQUEIDENTIFIER value should be unique across the entire database/server. See full details from the link below:
sqlshack.com/understanding-the-guid-data-type-in-sql-server
DECLARE @UNI UNIQUEIDENTIFIER
SET @UNI = NEWID()
SELECT @UNI
Comments from link:
As mentioned earlier, GUID values are unique across tables, databases, and servers. GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.
One method is to use the patientid as a seed to rand():
select rand(checksum(patientid))
This returns a value between 0 and 1. You can multiply by a large number.
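For instance, a rough sketch (the table and column names are assumptions, and note this does not guarantee uniqueness, so collisions between different patients would still need to be checked for):
SELECT patientid,
       CAST(rand(checksum(patientid)) * 1000000000 AS int) AS random_id
FROM patients;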
That said, I think you should keep a list of patients in each study -- so you don't have to reproduce the results. Reproducing results seems dangerous, especially for something like a "study" that could have an impact on health.
This is too much for a comment. It's not entirely clear from your description and comments what you are asking for, but it appears you want to associate a new random ID value with each existing patient's ID, presumably being able to tie it back to the source ID, and to produce the same random ID repeatedly at a later date.
It sounds like you'll need an intermediary table to store the randomly produced IDs (otherwise, being random how do you guarantee to get the same value for the same PatientID?)
Could you therefore have a table something like
create table Synonyms (
    Id int not null identity(1,1) primary key,
    PatientId int not null,
    RandomId uniqueidentifier not null default newid(),
    CreateDate datetime not null default getdate()
)
PatientId is the foreign key to the actual Id of the Patient.
Each time you need a new random PatientId, insert the PatientIDs into this table and then join to it when querying out the patient data, supplying the RandomId instead. That way, you can reproduce the same random Id each time it's needed.
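As a rough sketch of that flow (the Patients table and its column names are assumptions):
-- One batch of random ids per run; CreateDate distinguishes the runs
INSERT INTO Synonyms (PatientId)
SELECT p.PatientID
FROM Patients AS p;

-- Re-running a study's extract later: join back on the run you want
SELECT s.RandomId, p.FirstName, p.LastName -- shareable columns only, never the real id
FROM Patients AS p
INNER JOIN Synonyms AS s ON s.PatientId = p.PatientID
WHERE CAST(s.CreateDate AS date) = '2021-06-01'; -- the original run date, for example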
You could have a view that always provides the most recent RandomId value for each PatientId, or by some mechanism to track which "version" a report gets.
If you need a new Id for the patient, insert its Id again and you are guaranteed to be able to get back the same Id via whatever logic you need - e.g. you could have a ReportNo column as a sequence partitioned by PatientId, or any number of other ways.
If you prefer to avoid a GUID you could make it an int and use a function to generate it by checking it's not already used, possibly a computed column with an inline function that selects top 1 from a numbers table that doesn't already exist as a RandomId... or something like that!
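For example, a hypothetical way to draw an unused int id for that variant, assuming a pre-populated Numbers table and an int RandomId column:
SELECT TOP (1) n.Number
FROM Numbers AS n
WHERE NOT EXISTS (SELECT 1 FROM Synonyms AS s WHERE s.RandomId = n.Number)
ORDER BY NEWID(); -- random order, so the pick is unpredictable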
I may have completely misunderstood, hopefully it might give you some ideas though.

Best way to store thousands of numbers in SQL?

I have about 250 products and they all come with 52 weeks of history + 52 weeks of forecasts. I need to store these numbers in SQL but can't figure out the best way of doing it. I've only used databases a couple of times before so my knowledge is pretty limited...
I thought about using plain text and read/write with separators. But it felt bad in so many ways and made the entire database kinda useless.
Then I thought of adding 52 columns to the table, but I read it was a bad idea.
So now I'm back to where I began and it's just a table with
[ID] [WEEK_NUMBERS] [HISTORY_NUMBERS] [FORECAST_NUMBERS]
Is this the best way of doing it?
Are the ~25,000-30,000 rows a problem?
You would use a table something like this:
create table ProductHistory (
    ProductHistoryId int identity primary key,
    ProductId int not null references Products(ProductId),
    Type varchar(255) not null,
    Week date not null, -- storing this as a date is a guess
    Number decimal(38, 10), -- should probably be decimal, but the scale and precision might be overkill
    constraint chk_ProductHistory_type check (Type in ('Forecast', 'Actual'))
);
This is an example. It is unclear:
Should each row have a column for Forecast and Actual, or should they be on separate rows?
How should the week be stored?
What is the right type for Number?
But the idea is the same... at least one row per product/week combination.
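For example, one way to read the data back with forecast and actual side by side (a sketch against the table above):
SELECT ProductId, Week,
       MAX(CASE WHEN Type = 'Actual' THEN Number END) AS ActualNumber,
       MAX(CASE WHEN Type = 'Forecast' THEN Number END) AS ForecastNumber
FROM ProductHistory
GROUP BY ProductId, Week
ORDER BY ProductId, Week;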

Combining amendsments to database table fields

I am interested to understand the correct approach to solving a database problem I find myself faced with.
I have a table (with ~45 columns) that holds information about stock, its pricing, and a lot of information relating to its packaging, discounting, etc.
The stock is accessed via a web application using ADO (VB6) by constructing TSQL queries in numerous places.
There is now a need to hold a new table with a very cut down list of the above columns to allow some users to override parts of the stock information from the source (mainly descriptions and such).
The problem I'm faced with is coming up with a way to (perhaps) construct a view of the two tables such that the software still thinks it is talking to the first table (changing the software is simply a no-go) when in fact it is the first table amended by the second, perhaps via some sort of UNION.
To present a simple example, suppose you have a stock table as:
CREATE TABLE Stock
(
    id int NOT NULL,
    ref varchar(20) NOT NULL,
    short_description varchar(50) NOT NULL,
    long_description varchar(255) NOT NULL,
    ...many other columns
)
and an amendments table as:
CREATE TABLE StockAmendments
(
    id int NOT NULL,
    ref varchar(20) NOT NULL,
    short_description varchar(50) NOT NULL
)
The idea would be to rename Stock as StockSource and to build a view called Stock which amends StockSource with StockAmendments (in this case a potentially different short_description). This way the software does not need to know about the change.
Is this possible?
This should be doable. I haven't done t-sql in a very long time but something like this:
CREATE VIEW Stock
AS
SELECT
    ss.id,
    ss.ref,
    -- fall back to the source description when no amendment exists
    COALESCE(sa.short_description, ss.short_description) AS short_description,
    ss.long_description
FROM StockSource ss
LEFT JOIN StockAmendments sa -- LEFT JOIN so stock rows without amendments still appear
    ON ss.id = sa.id AND ss.ref = sa.ref
One thing to worry about is query performance depending on indexes, etc. If this is a problem, you might be better off creating a real 'Stock' table based off of StockSource and putting a trigger on StockAmendments to update the 'Stock' table.
Yes, it is possible using updatable views.
Rename Stock to StockSource and build an updatable view Stock over StockSource and StockAmendments. This way you won't have to change the VB6 code.
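If writes through the view are needed as well, here is a rough T-SQL sketch of an INSTEAD OF trigger (it assumes an amendment row already exists for each updated stock item; an upsert would be needed otherwise):
CREATE TRIGGER trg_Stock_Update ON Stock
INSTEAD OF UPDATE
AS
BEGIN
    -- Route description edits to the amendments table, leaving StockSource untouched
    UPDATE sa
    SET sa.short_description = i.short_description
    FROM StockAmendments AS sa
    INNER JOIN inserted AS i ON i.id = sa.id AND i.ref = sa.ref;
END;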

Multiple Wildcard Counts in Same Query

One of my job functions is being responsible for mining and marketing on a large newsletter subscription database. Each one of my newsletters has four columns (newsletter_status, newsletter_datejoined, newsletter_dateunsub, and newsletter_unsubmid).
In addition to these columns, I also have a master unsub column that our customer service dept. can update to accommodate irate subscribers who wish to be removed from all our mailings, and another column, emailaddress_status, that gets updated if a hard bounce (or a set number of soft bounces) occurs.
When I pull a count for current valid subscribers for one list I use the following syntax:
select count (*) from subscriber_db
WHERE (emailaddress_status = 'VALID' OR emailaddress_status IS NULL)
AND newsletter_status = 'Y'
and unsub = 'N' and newsletter_datejoined >= '2013-01-01';
What I'd like to have is one query that looks for all columns with %_status, with the aforementioned criteria ordered by current count size.
I'd like the output to show one row per newsletter with its current subscriber count, ordered by count.
I've searched around the web for months looking for something similar, but other than running the queries individually in a terminal and exporting the results, I've not been able to get them all in one query.
I'm running PostgreSQL 9.2.3.
A proper test case would be each aggregate total matching the counts I get when running the individual queries.
Here's my obfuscated table definition showing ordinal position, column type, character limit, and nullability.
Your schema is absolutely horrifying:
24 ***_status text YES
25 ***_status text YES
26 ***_status text YES
27 ***_status text YES
28 ***_status text YES
29 ***_status text YES
where I presume the masked *** is something like the name of a publication/newsletter/etc.
You need to read about data normalization or you're going to have a problem that keeps on growing until you hit PostgreSQL's row-size limit.
Since each item of interest is in a different column the only way to solve this with your existing schema is to write dynamic SQL using PL/PgSQL's EXECUTE format(...) USING .... You might consider this as an interim option only, but it's a bit like using a pile driver to jam the square peg into the round hole because a hammer wasn't big enough.
There are no column name wildcards in SQL, like *_status or %_status. Columns are a fixed component of the row, with different types and meanings. Whenever you find yourself wishing for something like this it's a sign that your design needs to be re-thought.
I'm not going to write an example since (a) this is an email marketing company and (b) the "obfuscated" schema is completely unusable for any kind of testing without lots of manual work re-writing it. (In future, please provide CREATE TABLE and INSERT statements for your dummy data, or better yet, a link to http://sqlfiddle.com/.) You'll find lots of examples of dynamic SQL in PL/PgSQL - and warnings about how to avoid the resulting SQL injection risks by proper use of format - with a quick search of Stack Overflow. I've written a bunch in the past.
Please, for your sanity and the sanity of whoever else needs to work on this system, normalize your schema.
You can create a view over the normalized tables to present the old structure, giving you time to adapt your applications. With a bit more work you can even define a DO INSTEAD view trigger (newer Pg versions) or RULE (older Pg versions) to make the view updateable and insertable, so your app can't even tell that anything has changed - though this comes at a performance cost so it's better to adapt the app if possible.
Start with something like this:
CREATE TABLE subscriber (
    id serial primary key,
    email_address text not null,
    -- please read http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
    -- for why I merged "fname" and "lname" into one field:
    realname text,
    -- Store birth month/year as a "date" with a "CHECK" constraint forcing it to be the 1st day
    -- of the month. Much easier to work with.
    birthmonth date,
    CONSTRAINT birthmonth_must_be_day_1 CHECK ( extract(day from birthmonth) = 1),
    postcode text,
    -- Congratulations! You made "gender" a "text" field to start with, you avoided
    -- one of the most common mistakes in schema design, the boolean/binary gender
    -- field!
    gender text,
    -- What's MSO? Should have a COMMENT ON...
    mso text,
    source text,
    -- Maintain these with a trigger. If you want modified to update when any child record
    -- changes you can do that with triggers on subscription and reducedfreq_subscription.
    created_on timestamp not null default current_timestamp,
    last_modified timestamp not null,
    -- Use the native PostgreSQL UUID type, after running CREATE EXTENSION "uuid-ossp";
    uuid uuid not null,
    uuid2 uuid not null,
    brand text,
    -- etc etc
);
CREATE TABLE reducedfreq_subscription (
    id serial primary key,
    subscriber_id integer not null references subscriber(id),
    -- Suspect this was just a boolean stored as text in your schema, in which case
    -- delete it.
    reducedfreqsub text,
    reducedfreqpref text,
    -- plural, might be a comma list? Should be in sub-table ("join table")
    -- if so, but without sample data can only guess.
    reducedfreqtopics text,
    -- date can be NOT NULL since the row won't exist unless they joined
    reducedfreq_datejoined date not null,
    reducedfreq_dateunsub date
);
CREATE TABLE subscription (
    id serial primary key,
    subscriber_id integer not null references subscriber(id),
    sub_name text not null,
    status text not null,
    datejoined date not null,
    dateunsub date
);
CREATE TABLE subscriber_activity (
    -- assumed: each activity row should tie back to a subscriber
    subscriber_id integer not null references subscriber(id),
    last_click timestamptz,
    last_open timestamptz,
    last_hardbounce timestamptz,
    last_softbounce timestamptz,
    last_successful_mailing timestamptz
);
To call it merely "horrifying" shows a great deal of tact and kindness on your part. Thank You. :) I inherited this schema only recently (which was originally created by the folks at StrongMail).
I have a full relational DB re-arch project on my roadmap this year - the sample normalization is very much in line with what I'd been working on. Very interesting insight on realname; I hadn't really thought about that. I suppose the only reason StrongMail had it broken out was for first-name email personalization.
MSO is multiple systems operator (cable company). We're a large lifestyle media company, and the newsletters we produce are on food, travel, homes and gardening.
I'm creating a Fiddle for this - I'm new here so going forward I'll be more mindful of what you guys need to be able to help. Thank you!

Dynamically generate criteria in SQL

I have a Users table that contains dozens of columns, like date of birth, year of vehicle owned, make and model of the vehicle, color, and many other personal fields unrelated to the vehicle.
There's also a 2nd table called Coupons that needs to be designed in a way to support a qualification like "user qualifies if younger than 30 yrs old", "user qualifies if vehicle is greater than 10 yrs old", "user qualifies if vehicle color is green".
When a user logs in, I need to present all coupons the user qualifies for. The problem that I'm having is that the coupon qualifications could be numerous, could have qualifiers like equal, greater than or less than and may have different combinations.
My only solution at this point is to store the actual SQL string within one of the Coupons table columns, like:
select * from Users where UserId = SOME_PLACEHOLDER and VehicleYear < 10
Then I could execute the sql for each coupon row and return true or false. Seems very inefficient as I would potentially have to execute 1000s of sql statements for each coupon code.
Any insight, help is appreciated. I do have server-side code where I could potentially be able to do looping.
Thank you.
Very difficult problem. Seems like users will be added at high volume, while coupons are created at a fairly regular frequency.
Adding SQL to a table to be used dynamically is workable - at least you'll get a fresh execution plan - BUT your plan cache may balloon.
I have a feeling that running a single coupon for all users is likely to be your highest-performing query: it's one set of criteria, which will be fairly selective on users first, and the total number of coupons is small. Running all coupons for a single user means separate criteria for each coupon for that user. Running all coupons for all users may still perform well, even though it's effectively a cross join first - I guess it is just going to depend.
Anyway, the case for all coupons for all users (or sliced either way, really) will be something like this:
SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
    CASE
        WHEN <coupon.criteria> THEN <coupon.id> -- code generated from the coupon rules table
        WHEN <coupon.criteria> THEN <coupon.id> -- etc.
        ELSE NULL
    END
) = coupon.id
To generate the coupon rules, you can relatively easily do the string concatenation in a single swipe (and you can combine a design with individual rule lines per coupon, ANDed together, via a further inner template):
DECLARE @outer_template AS varchar(max) = 'SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
CASE
{template}
ELSE NULL
END
) = coupon.id
';
DECLARE @template AS varchar(max) = 'WHEN {coupon.rule} THEN {coupon.id}{crlf}';
DECLARE @coupon AS TABLE (id INT, [rule] varchar(max));
INSERT INTO @coupon VALUES
    (1, 'user.Age BETWEEN 20 AND 29')
   ,(2, 'user.Color = ''Yellow''');
DECLARE @sql AS varchar(max) = REPLACE(
    @outer_template
   ,'{template}',
    REPLACE((
        SELECT REPLACE(REPLACE(
            @template
           ,'{coupon.rule}', coupon.[rule])
           ,'{coupon.id}', CONVERT(varchar(11), coupon.id))
        FROM @coupon AS coupon
        FOR XML PATH('')
    ), '{crlf}', CHAR(13) + CHAR(10)));
PRINT @sql;
-- EXEC (@sql);
There are ways to pretty that up - play with it here: https://data.stackexchange.com/stackoverflow/q/115098/
I would consider adding computed columns (possibly persisted and indexed) to assist. For instance, age: a non-persisted computed column will likely perform better than a scalar function.
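A sketch of that computed column (date_of_birth is an assumed column name; this one cannot be persisted or indexed because GETDATE() is non-deterministic, hence the non-persisted suggestion):
-- DATEDIFF(year, ...) counts year boundaries, so this is an approximation of true age
ALTER TABLE Users
ADD Age AS DATEDIFF(year, date_of_birth, GETDATE());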
I would consider batching this with a table which says whether a coupon is valid for a user and when it was last validated.
Seems like ages can change and a user can become valid or invalid for a coupon as their birthday passes.
When a user logs in you could spawn a background job to update their coupons. On subsequent logons, there won't be any need to update (since it's not likely to change until the next day or a triggering event).
Just a few ideas.
I would also add that you should have a way to test a coupon before it is approved, to ensure there are no syntax errors (since the SQL is ad hoc and arbitrary). This can be done relatively easily - perhaps a test-user table (test_user aliased as user in the generated code template) containing known pass and fail rows, with the coupon rule pointing to those. Not only does the EXEC have to work - the rows it returns should be exactly the expected rows for that coupon, and only those.
This is not an easy problem. Here are some quick ideas that may help depending on your domain requirements:
Restrict the type of criteria you will be filtering on so that you can use dynamic or non-dynamic SQL to execute them efficiently. For example, if you are only going to have integers between a range of min and max values as criteria, then the problem becomes simpler. (You only need to know the field name and the min/max values to describe a criterion, not the full WHERE statement.)
Create a number of views which expose the attributes in a helpful way, then perform queries against those views - or have those views pre-select in some way. For example, an age-group view with a field that can contain the values < 21, 21-30, 30-45, > 45; your select then just needs to return the rows from this view that match these strings (see the sketch after these ideas).
Create a table which stores the results of running your criteria matching query (This can be run off line by a back ground process). Then for a given user check for membership by looking where in the table this user's ID exists.
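A sketch of the second idea, T-SQL flavored, with the band edges and the date_of_birth column assumed:
CREATE VIEW UserAgeGroups AS
SELECT UserId,
       CASE
           WHEN DATEDIFF(year, date_of_birth, GETDATE()) < 21 THEN '< 21'
           WHEN DATEDIFF(year, date_of_birth, GETDATE()) <= 30 THEN '21-30'
           WHEN DATEDIFF(year, date_of_birth, GETDATE()) <= 45 THEN '30-45'
           ELSE '> 45'
       END AS age_group
FROM Users;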
Thinking about this some more I realize all my suggestions are based on one idea.
A query for an individual user will work faster overall if you first perform an SQL query against all users and cache that result in some way. If every user is reproducing queries against the whole dataset you will lose efficiency. You need some way to cache results and reuse them.
Hope this helps -- comment if these ideas are not clear.
My first thought on an approach (similar to Hogan's) would be to test for coupon applicability at the time the coupon is created. Store those results in a table (User_Coupons, for example). If any user data is changed, your system would then retest any changed users for which coupons are applicable to them. At coupon creation (or change) time, it would only check versus that coupon. At user creation (or change) time, it would only check versus that user.
The coupon criteria should be from a known set of possible criteria and any time that you want to add a new type of criteria, it would possibly involve a code change. For example, let's say that you have a table set up similar to this:
CREATE TABLE Coupon_Criteria (
    coupon_id INT NOT NULL,
    age_minimum SMALLINT NULL,
    age_maximum SMALLINT NULL,
    vehicle_color VARCHAR(20) NULL,
    ...
    CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id)
)
If you wanted to add the ability to base a coupon on vehicle age, then you would have to add a column to the table and likewise adjust your search code. You would use NULL values to indicate that a criterion is unused for that coupon.
An example query for the above table:
SELECT
    CC.coupon_id
FROM
    Users U
INNER JOIN Coupon_Criteria CC ON
    (CC.age_maximum IS NULL OR dbo.f_GetAge(U.birthday) <= CC.age_maximum) AND
    (CC.age_minimum IS NULL OR dbo.f_GetAge(U.birthday) >= CC.age_minimum) AND
    (CC.vehicle_color IS NULL OR U.vehicle_color = CC.vehicle_color) AND
    ...
This can get unwieldy if the number of possible criteria gets to be very large.
Another possibility would be to save the coupon criteria in XML and have a business object for your application use that to determine eligibility. It could use the XML to generate a proper query against the User table (and any other necessary tables).
Here's another possibility. Each criteria could be given a query template which you could append to your queries. This would just involve updates to the data instead of DDL and could have good performance. It would involve dynamic SQL.
CREATE TABLE Coupons (
    coupon_id INT NOT NULL,
    description VARCHAR(2000) NOT NULL,
    ...
    CONSTRAINT PK_Coupons PRIMARY KEY CLUSTERED (coupon_id)
)
CREATE TABLE Coupon_Criteria (
    coupon_id INT NOT NULL,
    criteria_num SMALLINT NOT NULL,
    description VARCHAR(50) NOT NULL,
    code_template VARCHAR(500) NOT NULL,
    CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id, criteria_num),
    CONSTRAINT FK_Coupon_Criteria_Coupon FOREIGN KEY (coupon_id) REFERENCES Coupons (coupon_id)
)
INSERT INTO Coupons (coupon_id, description)
VALUES (1, 'Young people save $200 on yellow vehicles!')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 1, 'Young people', 'dbo.Get_Age(U.birthday) <= 20')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 2, 'Yellow Vehicles', 'U.vehicle_color = ''Yellow''')
You could then build a query by simply concatenating all of the criteria for any given coupon. The big downside is that it's only one-directional: given a coupon you can easily find who is qualified for it, but given a user you cannot find all coupons for which they are eligible except by going through all of the coupons. My guess is that the second is what you'd probably be most interested in, unfortunately. Maybe this will give you some other ideas though.
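A sketch of that concatenation for a single coupon (U.user_id is an assumed column; STRING_AGG needs SQL Server 2017+, and older versions can use the FOR XML PATH trick shown earlier):
DECLARE @coupon_id int = 1;
DECLARE @where varchar(max);

-- AND together every criterion stored for this coupon
SELECT @where = STRING_AGG(CAST(code_template AS varchar(max)), ' AND ')
FROM Coupon_Criteria
WHERE coupon_id = @coupon_id;

DECLARE @sql varchar(max) = 'SELECT U.user_id FROM Users U WHERE ' + @where;
EXEC (@sql);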
For example, you could potentially have it work the other way by having a set number of criteria in a table and for the coupon/criteria linking table indicate whether or not that criteria is active. When querying you could then include that in your query. In other words, the query would look something like:
WHERE
(CC.is_active = 0 OR <code from the code column>) AND
The querying gets very complex though, since you either need to join once for every possible criterion or you need to compare the number of active requirements for a coupon versus the number that are fulfilled. That is possible in SQL, but it's similar to working with an EAV model - which is basically what this turns into: a variation on an EAV model (yuck).