Generating a primary key unique across multiple databases - sql

Operational databases of identical structure are running in several countries.
country A has table Users with column user_id
country B has table Users with column user_id
country C has table Users with column user_id
When data from all three databases is brought to the staging area for further data warehousing purposes, all three operational tables are integrated into a single table Users with dwh_user_id.
The logic looks like the following:
if the record comes from A then dwh_user_id = 1000000 + user_id
if the record comes from B then dwh_user_id = 4000000 + user_id
if the record comes from C then dwh_user_id = 8000000 + user_id
I have a strong feeling that it is a very bad approach. What would be a better approach?
(user_id + country_iso_code maybe?)

In general, it's a terrible idea to inject logic into your primary key in this way. It really sets you up for failure: what if country A gets more than 3000000 user records, so its range runs into country B's?
There are a variety of solutions.
Ideally, you include the column "country" in all tables, and use that together with the ID as the primary key. This keeps the logic identical between master and country records.
If you're working with a legacy system, and cannot modify the country tables, but can modify the master table, add the key there, populate it during load, and use the combination of country and ID as the primary key.
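As a quick sketch of that compound-key idea (SQLite via Python's sqlite3 here; the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging table: the natural key is (country, user_id), so the same
    -- user_id can safely arrive from several source databases.
    CREATE TABLE dwh_users (
        country TEXT NOT NULL,    -- e.g. code of the source database
        user_id INTEGER NOT NULL,
        name    TEXT,
        PRIMARY KEY (country, user_id)
    );
""")

# The same user_id coming from different countries does not collide.
conn.execute("INSERT INTO dwh_users VALUES ('A', 1, 'Alice')")
conn.execute("INSERT INTO dwh_users VALUES ('B', 1, 'Bob')")

# A duplicate (country, user_id) pair is rejected by the primary key.
try:
    conn.execute("INSERT INTO dwh_users VALUES ('A', 1, 'Alice again')")
except sqlite3.IntegrityError:
    print("duplicate rejected")
```

No arithmetic on the ID, so no range can ever overflow into another country's.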

The way we handle this scenario in Ajilius is to add metadata columns to the load. Values like SERVER_NAME or DATABASE_NAME might provide enough unique information to make a compound key unique.
An alternative scenario is to generate a GUID for each row at extract or load time, which would then uniquely identify each row.
The data vault guys like to use a hash across the row, but in this case it would only work if no row was ever a complete duplicate.
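A row-hash surrogate key of the kind mentioned above can be sketched like this (plain Python; the separator and choice of fields are illustrative assumptions, and as noted it only works if no two rows hash the same input):

```python
import hashlib

def row_hash(*fields):
    # Join the fields with a separator unlikely to occur in the data,
    # then hash; the digest is a deterministic surrogate key.
    joined = "||".join(str(f) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Including the source system in the hashed fields keeps identical
# user rows from different countries distinct.
key_a = row_hash("A", 12345, "Alice")
key_b = row_hash("B", 12345, "Alice")
print(key_a != key_b)
```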

This is why the uniqueidentifier data type exists.
If you can't change to that, I would put each one in a different table and then union them in a view. Something like:
create view vWorld
as
select 1 as CountryId, user_id
from SpainUsers
UNION ALL
select 2 as CountryId, user_id
from USUsers

The most efficient way to do this would be:
If the record is from country A, then user_id * 0, hence dwh_user_id = 0.
If the record is from country B, then (user_id * 0) - 1, hence dwh_user_id = -1.
If the record is from country C, then (user_id * 0) + 1, hence dwh_user_id = 1.
I am suggesting this logic assuming dwh_user_id is supposed to be a number field.

Related

How can I create a check to make sure only one entry in the column can have a specific value based on an id from a different column in SQL?

I am trying to create a new table in SQL Developer that has four columns. One column holds a numerical value called ORG_ID; this ORG_ID can be the same across multiple entries in the table. Another column is called DEFAULT_FLAG; this column only contains a Y or N character denoting whether it is the default entry for that ORG_ID.
I am trying to create a CHECK in the DEFAULT_FLAG column that makes sure there is only one entry with a Y for all entries with the same ORG_ID. Here is an example of what it would look like:
xxxx|xxxx|ORG_ID|DEFAULT_FLAG
xxxx|xxxx|123456| Y
xxxx|xxxx|123456| N
xxxx|xxxx|987654| Y
xxxx|xxxx|567495| Y
In the above table, the second entry for ORG_ID 123456 would need to be rejected if Y was inserted as the DEFAULT_FLAG.
I'm a little new to SQL, so I've done my research and know I need to use a constraint and a check on the column. I tried writing my own, but it did not work; the code is below.
default_flag varchar(1)
constraint one_default Check(ORG_ID AND DEFAULT_FLAG != "Y"),
This is too long for a comment.
You are trying to use a check constraint for something it is not designed for. You have an org_id. You should have an organizations table that uses this id as its primary key.
Then the flag you want to store should be in the organizations table. Voila! The flag is only stored once, and you don't need to worry about keeping it in sync between different rows.
Create a unique index for all ORG_ID entries with a 'Y', so each ORG_ID can only have one row with a 'Y':
create unique index idx on mytable(case when default_flag = 'Y' then org_id end)
I think a technically-better solution than the one from Thorsten Kettner, but using the same idea, is
CREATE UNIQUE INDEX one_default_y ON mytable(org_id)
WHERE default_flag = 'Y';
But let me also suggest that a table organization_defaults with two columns, one the ID for an organization and the other the ID for mytable, is a better approach, as suggested in comments to the OP.
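For what it's worth, the filtered-index idea can be tried end to end in SQLite, which supports partial unique indexes with the same WHERE syntax (the index name and the Python harness here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        org_id       INTEGER NOT NULL,
        default_flag TEXT NOT NULL CHECK (default_flag IN ('Y', 'N'))
    );
    -- Only rows with default_flag = 'Y' participate in the index,
    -- so each org_id can have at most one 'Y' row.
    CREATE UNIQUE INDEX one_default_y ON mytable(org_id)
    WHERE default_flag = 'Y';
""")

conn.execute("INSERT INTO mytable VALUES (123456, 'Y')")
conn.execute("INSERT INTO mytable VALUES (123456, 'N')")  # allowed: not a 'Y' row
try:
    conn.execute("INSERT INTO mytable VALUES (123456, 'Y')")  # second 'Y' for same org
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The second 'Y' row for ORG_ID 123456 is rejected, while any number of 'N' rows remain allowed.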

MS-Access 2007: Query for names that have two or more different values in another field

Hello & thank you in advance.
I have an access db that has the following information about mammals we captured. Each capture has a unique ID, which is the capture table's primary key: "capture_id". The mammals (depending on species) have ear tags that we use to track them from year to year and day to day. These are in a field called "id_code". I have the sex of the mammal as it was recorded at capture time in another field called sex.
I want a query that will return all instances of an id_code IF the sex changes even once for that id.
Example: Animal E555 was caught 4 times, 3 times someone recorded this animal as a F and once as a M.
I've managed to get it to display this info by stacking about 5 queries on top of each other (query for recaptured animals -> query for all records of animals from the 1st query -> query for unique combos of id & sex (via just using those two columns & requiring "Unique Values") -> query that pulls only duplicate id values from that last one and pulls back up all capture records of those ids). However, this is clearly not the right way to do it: it is then not updateable (which I need, since this is for data quality control), and for some reason it also returns duplicates of each of those records...
I realize that this could be solved two other ways:
Using R to pull up these records (I want none of this data to have to leave the database though, because we're working on getting it into one place after 35 years of collecting! And my boss can't use R and I'm seasonal, so I want him to just have to open a query)
Creating a table that tracks all animal id's as an animal index. However, this would make entering the data more difficult and also require someone to go back through 20,000 records and create a brand new animal id for every one because you can't give ear tags to voles & things so they don't get a unique identifier in the field.
Help!
It is quite simple to do with a single query. As a bonus, the query will be updatable, not duplicated, and simple to use:
SELECT mammals.ID, mammals.Sex, mammals.id_code, mammals.date_recorded
FROM mammals
WHERE mammals.id_code IN
    (SELECT id_code
     FROM (SELECT DISTINCT id_code, sex FROM [mammals]) a
     GROUP BY id_code
     HAVING COUNT(*) > 1
    );
The reason why you see a sub-query inside a sub-query is because Access does not support COUNT(DISTINCT). With any other "normal" database you would write:
SELECT mammals.ID, mammals.Sex, mammals.id_code, mammals.date_recorded
FROM mammals
WHERE mammals.id_code IN
    (SELECT id_code
     FROM [mammals]
     GROUP BY id_code
     HAVING COUNT(DISTINCT Sex) > 1
    );
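Outside of Access, the COUNT(DISTINCT) version can be verified directly. Here is a small self-contained run in SQLite via Python, with sample data invented to match the E555 example from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mammals (
        ID            INTEGER PRIMARY KEY,
        id_code       TEXT,
        Sex           TEXT,
        date_recorded TEXT
    );
    INSERT INTO mammals (id_code, Sex, date_recorded) VALUES
        ('E555', 'F', '2020-05-01'),
        ('E555', 'F', '2020-06-01'),
        ('E555', 'M', '2021-05-01'),  -- recorded sex changed: flag all E555 rows
        ('E111', 'M', '2020-05-01'); -- consistent: not returned
""")

# Return every capture of any id_code recorded with more than one sex.
rows = conn.execute("""
    SELECT ID, Sex, id_code, date_recorded
    FROM mammals
    WHERE id_code IN (
        SELECT id_code
        FROM mammals
        GROUP BY id_code
        HAVING COUNT(DISTINCT Sex) > 1
    )
    ORDER BY ID
""").fetchall()
print(len(rows))
```

All three E555 captures come back (ready for correction), and the consistently-recorded E111 does not.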

Can this table structure work or should it change

I have an existing table which is expected to work for a new piece of functionality. My opinion is that a new table is needed to achieve the objective, and I would like an opinion on whether it can work as is, or whether the new table is a must. The issue is a query returning more records than it should; I believe this is why:
There is a table called postcodes. Over time this has really become a town table because different town names have been entered so it has multiple records for most postcodes. In reference to the query below the relevant fields in the postcode table are:
postcode.postcode - the actual postcode, as mentioned this is not unique
postcode.twcid - is a foreign key to the forecast table, this is not unique either
The relevant fields in the forecast table are:
forecast.twcid - identifier for the table, however not unique, because there are four days' worth of forecasts in the table. Only ever four, never more, never less.
And here is the query:
select * from forecast
LEFT OUTER JOIN postcodes ON forecast.TWCID = postcodes.TWCID
WHERE postcodes.postcode = 3123
order by forecast.twcid, forecast.theDate;
Because there are two records in the postcodes table for 3123, the results are doubled up: two forecasts for day 1, two for day 2, etc.
Given that the relationship between postcodes and forecast is many to many (there are multiple records in the postcodes table for each postcode and twcid, and there are multiple records for each twcid in the forecast table because it always holds four days' worth of forecasts), is there a way to re-write the query to only get four forecast records for a postcode?
Or is my idea of creating a new postcode table which has unique records for each postcode necessary?
You have a problem that postcodes can be in multiple towns. And towns can have multiple postcodes. In the United States, the US Census Bureau and the US Post Office have defined very extensive geographies for various coding schemes. The basic idea is that a zip code has a "main" town.
I would suggest that you either create a separate table with one row per postcode and its main town, or add a field to your existing table indicating the main town. You can guarantee the uniqueness of this field with a filtered index:
create unique index postcode_postcode_maintown on postcodes(postcode) where IsMainTown = 1;
You might need the same thing for IsMainPostcode.
(Filtered indexes are a very nice feature in SQL Server.)
With this construct, you can change your query to:
select *
from forecast LEFT OUTER JOIN
postcodes
ON forecast.TWCID = postcodes.TWCID and postcodes.IsMainPostcode = 1
WHERE postcodes.postcode = 3123
order by forecast.twcid, forecast.theDate;
You should really never have a table without a primary key. Primary keys are, by definition, unique. The primary key should be the target for your foreign keys.
You're having problems because you're fighting against a poor database design.

Many to many relationship and MySQL

If I wanted to make a database with subscribers (think YouTube), my thought is to have one table containing user information such as user id, email, etc. Then another table (the subscription table) containing 2 columns: one for the user id and one for a new subscriber's user id.
So if my user id is 101 and user 312 subscribes to me, my subscription table would be updated with a new row containing 101 in column 1 and 312 in column 2.
My issue with this is that every time 101 gets a new subscriber, their id is added to the subscription table, meaning I can't really set a primary key for the subscription table, as a user id can be present many times (once for each of their subscribers) and a primary key requires a unique value.
Also, in the event that there's a lot of subscribing going on, won't it be very slow to search for all of 101's followers, since every row will have to be searched and checked for whether 101 is in the first column, to then read the subscriber's user id from the second column?
Is there's a more optimal solution to my problem?
Thanks!
In your case, the pairs (user_id, subscriber_id) are unique (a user can't have two subscriptions for another user, can they?). So make a compound primary key consisting of both fields if you need one.
Regarding the speed of querying your subscription table: think about the queries you'll run on the table, and add appropriate indexes. A common operation might be "give me a list of all my subscribers", which would translate to something like
SELECT subscriber_id FROM subscriptions WHERE user_id = 123;
(possibly as part of a join). If you have indexed the user_id column, this query can be run quite efficiently.
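A minimal sketch of the compound key plus that subscriber query, using SQLite via Python (the ids are from the question; the harness is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (
        user_id       INTEGER NOT NULL,  -- the channel being subscribed to
        subscriber_id INTEGER NOT NULL,  -- the user who subscribes
        PRIMARY KEY (user_id, subscriber_id)
    );
""")

conn.execute("INSERT INTO subscriptions VALUES (101, 312)")
conn.execute("INSERT INTO subscriptions VALUES (101, 555)")

# A duplicate subscription is rejected by the compound primary key.
try:
    conn.execute("INSERT INTO subscriptions VALUES (101, 312)")
except sqlite3.IntegrityError:
    pass

# "Give me all of 101's subscribers" uses the primary key's index,
# since user_id is its leading column.
subs = [r[0] for r in conn.execute(
    "SELECT subscriber_id FROM subscriptions WHERE user_id = 101 "
    "ORDER BY subscriber_id")]
print(subs)
```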
A primary key can be made of two columns: user_id and subscriber_id in your case. And since the search will only be on integer values (no text search), it will be fast.
more informations here : https://stackoverflow.com/a/2642799/1338574

Inserting data into a table with new and old data from another two tables

I have a table named Queue_info with the following structure:
Queue_Id number(10)
Movie_Id number(10)
User_Id Varchar2(20)
Status Varchar2(20)
Reserved_date date
I have two other tables: Movie_info, which has many columns including Movie_Id, and User_info, which has many columns including User_Id.
In the first table, movie_id and user_id are foreign keys referencing movie_info(movie_id) and user_info(user_id).
My problem is that if I insert any value into either Movie_info or User_info, the Queue_info table should be updated with a new entry for every user or for every movie.
For example
If a new movie is inserted into Movie_info, then Queue_info should be updated so that, for every user, the status of that new movie is awaiting.
Use triggers. By using triggers you can update all tables related to your table; for example, when 1 row is inserted into table 1, 1 row is inserted into table 2 too.
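A trigger of that shape can be sketched in SQLite via Python (the column subset and the 'awaiting' status follow the question; the trigger name and sample data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User_info  (User_Id TEXT PRIMARY KEY);
    CREATE TABLE Movie_info (Movie_Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Queue_info (
        Movie_Id INTEGER,
        User_Id  TEXT,
        Status   TEXT,
        PRIMARY KEY (Movie_Id, User_Id)
    );
    -- Whenever a new movie arrives, create an 'awaiting' queue row
    -- for every existing user.
    CREATE TRIGGER movie_fanout AFTER INSERT ON Movie_info
    BEGIN
        INSERT INTO Queue_info (Movie_Id, User_Id, Status)
        SELECT NEW.Movie_Id, User_Id, 'awaiting' FROM User_info;
    END;
""")

conn.execute("INSERT INTO User_info VALUES ('u1')")
conn.execute("INSERT INTO User_info VALUES ('u2')")
conn.execute("INSERT INTO Movie_info VALUES (1, 'Example Movie')")

n = conn.execute("SELECT COUNT(*) FROM Queue_info").fetchone()[0]
print(n)
```

One movie insert fans out into one queue row per user, which is exactly the growth pattern the next answer warns about.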
Some notes first:
I really like that you have a standardized way to name tables and fields. I would use Queue instead of Queue_info, Movie instead of Movie_info, etc..., as all tables have information - don't they? - and we all know that. I'd also choose MovieId instead of Movie_Id, ReservedDate instead of Reserved_date, but that's a matter of personal taste (allergy to underscores).
What I wanted to stress is that choosing one way for naming and keeping it is very good.
What I don't like is that, while your structure seems normalized, you use a Varchar type for the User_Id key. Primary (and foreign) keys are best if they are small and constant in size. This mainly helps in keeping index sizes small (so more efficient), and secondly, since the keys are the only values stored repeatedly in the db, it helps keep the db size small.
Now, to your question, do you really need this? I mean, you may end up having in your database thousands of movies and users. Do you want to add a thousand rows in the Queue table whenever a new movie is inserted? Or another thousand rows when a new user is registered? Or 50 thousand rows when a new list with 50 new movies arrives (and is inserted in the db)?
With 10K movies and 2K users, you'll have a 20M-row table. There is no problem with a table of that size, and one or more triggers will serve your need. What happens if you have 100K movies and 50K users, though? A 5G-row table. You can deal with that too, but perhaps you can just keep in that table only the movies that a user is interested in (or has borrowed, or has seen, whatever the purpose of the db is). And if you want a list of movies that a certain user has not yet been interested in, check for those Movie_Id values that do not exist in the table, with something like this:
SELECT Movie_Id, Movie_Title
FROM Movie_info AS m
WHERE NOT EXISTS
    ( SELECT *
      FROM Queue_info AS q
      WHERE q.Movie_Id = m.Movie_Id
        AND q.User_Id = #UserId
    )
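Here is the NOT EXISTS pattern as a runnable sketch (SQLite via Python, with #UserId replaced by a bound parameter; sample data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie_info (Movie_Id INTEGER PRIMARY KEY, Movie_Title TEXT);
    CREATE TABLE Queue_info (Movie_Id INTEGER, User_Id TEXT);
    INSERT INTO Movie_info VALUES (1, 'Seen'), (2, 'Unseen');
    INSERT INTO Queue_info VALUES (1, 'u1');  -- u1 already queued movie 1
""")

# Movies that user u1 has no Queue_info row for.
rows = conn.execute("""
    SELECT Movie_Id, Movie_Title
    FROM Movie_info AS m
    WHERE NOT EXISTS
        ( SELECT *
          FROM Queue_info AS q
          WHERE q.Movie_Id = m.Movie_Id
            AND q.User_Id = ?
        )
""", ("u1",)).fetchall()
print(rows)
```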