How can I get a complete list of unique values from multiple columns in multiple tables?

Back story: I have an odd situation. An organization affiliated with my own provides us with a database that we use heavily; a few other organizations also use it, but our records are easily more than 90% of the total. We have a web frontend for data entry that is connected to the live database, but we only get the backend data as an Access file of selected tables that are sent to us periodically.
That's a hassle in general, but a critical problem that I run into in every report is differentiating records produced by our organization from others'. Records are identified by the staff member who created them, but I don't have (and am unlikely to get) the users table itself, which means I have to manually keep a list of which user IDs correspond to which users, whether those users belong to our organization, etc. Right now, I'm building a sort of shadow DB that links to the data extract and has queries that append that kind of information onto the data tables, so when I pull out a list of records, I can get them by user ID, name, organization, role, etc.
The problem: not all users create or modify records of all types, so the user IDs I need to make this list complete are scattered across several tables. How can I create a list of unique user IDs from across all of these tables? I'm currently using a union of the IDs from the two biggest tables, but I don't know if I can stack subquery upon subquery to make that work - and I'm kind of hesitant to dive into writing that for Access without knowing if it will ultimately work. I'm interested in other methods, too.
TL;DR: What's the simplest way to get a column of the unique values of several columns that are spread across several tables?

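Combine SELECT queries on each of the tables into a UNION query. UNION (unlike UNION ALL) discards duplicate rows, so each distinct UserID appears only once no matter how many tables it occurs in. You can chain more than two SELECTs this way; no nested subqueries are needed.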
SELECT UserID FROM Table1
UNION
SELECT UserID FROM Table2
UNION
SELECT UserID FROM Table3;
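To see the deduplication at work, here is a minimal sketch that runs the same kind of UNION against an in-memory SQLite database through Python's sqlite3 module. The table names, the ModifiedBy column and the sample IDs are hypothetical, and the sketch was not tested against Access itself. The third table shows that the ID column does not need the same name in every table. An alias keeps the branches lined up, and the result column takes its name from the first SELECT.

```python
import sqlite3

# In-memory stand-in for the Access extract; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (UserID INTEGER);
    CREATE TABLE Table2 (UserID INTEGER);
    CREATE TABLE Table3 (ModifiedBy INTEGER);  -- same kind of ID, different column name
    INSERT INTO Table1 VALUES (1), (2), (2);
    INSERT INTO Table2 VALUES (2), (3);
    INSERT INTO Table3 VALUES (3), (4);
""")


def user_ids(conn, op="UNION"):
    """Stack the ID columns of all three tables with the given set operator."""
    sql = f"""
        SELECT UserID FROM Table1
        {op}
        SELECT UserID FROM Table2
        {op}
        SELECT ModifiedBy AS UserID FROM Table3  -- alias so every branch lines up
        ORDER BY UserID
    """
    return [row[0] for row in conn.execute(sql)]


unique_ids = user_ids(conn)               # UNION drops duplicates
all_ids = user_ids(conn, op="UNION ALL")  # UNION ALL keeps every row
print(unique_ids)  # [1, 2, 3, 4]
print(all_ids)     # [1, 2, 2, 2, 3, 3, 4]
```

If you ever need every row (for example, to count how often each ID appears), UNION ALL is the operator to use instead.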

Related

Creating one many-to-many table that is reused by many different tables when they all have a relationship to the same entity

I have a table [Team] in my database. Many other tables require a many-to-many relationship with this table. This is mostly because records in my other tables have authorisation settings based on which team the current user is in.
For example, a record in my [User] table can be linked to many teams, and a team can be linked to many [Ticket] records. If the user's teams overlap with the ticket's teams, that user has permission to view the [Ticket].
I am considering two options. The first is to have a separate table for each relationship, such as [UserTeam] and [TicketTeam]; e.g. [TicketTeam] would have the columns Id, TicketId, TeamId.
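Here is a minimal sketch of option 1, using a SQLite stand-in with hypothetical sample data. Each relationship gets its own junction table. The "teams overlap" rule then becomes a join between the two junction tables on TeamId.

```python
import sqlite3

# Hypothetical schema for option 1: one junction table per relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team   (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE User   (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Ticket (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE UserTeam (
        Id     INTEGER PRIMARY KEY,
        UserId INTEGER NOT NULL REFERENCES User(Id),
        TeamId INTEGER NOT NULL REFERENCES Team(Id));
    CREATE TABLE TicketTeam (
        Id       INTEGER PRIMARY KEY,
        TicketId INTEGER NOT NULL REFERENCES Ticket(Id),
        TeamId   INTEGER NOT NULL REFERENCES Team(Id));

    INSERT INTO Team   VALUES (1, 'Support'), (2, 'Billing');
    INSERT INTO User   VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Ticket VALUES (10, 'Refund'), (11, 'Login issue');
    INSERT INTO UserTeam   (UserId, TeamId)   VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO TicketTeam (TicketId, TeamId) VALUES (10, 2), (11, 1);
""")


def can_view_ticket(conn, user_id, ticket_id):
    """True if at least one of the user's teams is also one of the ticket's teams."""
    row = conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM UserTeam ut
            JOIN TicketTeam tt ON tt.TeamId = ut.TeamId
            WHERE ut.UserId = :user_id AND tt.TicketId = :ticket_id)
    """, {"user_id": user_id, "ticket_id": ticket_id}).fetchone()
    return bool(row[0])


print(can_view_ticket(conn, 1, 10))  # True: alice and the Refund ticket share Billing
print(can_view_ticket(conn, 2, 10))  # False: bob is only in Support
```

Every further secured table would need its own XxxTeam table, with a permission query of the same shape.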
My second option is to store the data for both in the same table. To do this I would create an [AuthorisationRecord] table with one Id column and an [AuthorisationRecordTeam] table with the columns Id, AuthorisationId, TeamId. I would then also need an extra column, AuthorisationRecordId, on my [Ticket] and [User] tables. This has the downside of needing an extra column, and every time we create a record we also have to create an [AuthorisationRecord] record.
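For comparison, here is a sketch of option 2 under the same assumptions (SQLite, hypothetical data and ids). [User] and [Ticket] each point at their own AuthorisationRecord. The overlap check joins AuthorisationRecordTeam to itself.

```python
import sqlite3

# Hypothetical schema for option 2: one shared AuthorisationRecordTeam table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE AuthorisationRecord (Id INTEGER PRIMARY KEY);
    CREATE TABLE AuthorisationRecordTeam (
        Id              INTEGER PRIMARY KEY,
        AuthorisationId INTEGER NOT NULL REFERENCES AuthorisationRecord(Id),
        TeamId          INTEGER NOT NULL REFERENCES Team(Id));
    -- Each secured table gets one extra column pointing at its own AuthorisationRecord.
    CREATE TABLE User (Id INTEGER PRIMARY KEY, Name TEXT,
        AuthorisationRecordId INTEGER NOT NULL REFERENCES AuthorisationRecord(Id));
    CREATE TABLE Ticket (Id INTEGER PRIMARY KEY, Title TEXT,
        AuthorisationRecordId INTEGER NOT NULL REFERENCES AuthorisationRecord(Id));

    INSERT INTO Team VALUES (1, 'Support'), (2, 'Billing');
    INSERT INTO AuthorisationRecord (Id) VALUES (100), (101), (200), (201);
    INSERT INTO AuthorisationRecordTeam (AuthorisationId, TeamId)
        VALUES (100, 1), (100, 2), (101, 1), (200, 2), (201, 1);
    INSERT INTO User   VALUES (1, 'alice', 100), (2, 'bob', 101);
    INSERT INTO Ticket VALUES (10, 'Refund', 200), (11, 'Login issue', 201);
""")


def can_view_ticket(conn, user_id, ticket_id):
    """True if the user's authorisation record and the ticket's share a team."""
    row = conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM User u
            JOIN AuthorisationRecordTeam ut ON ut.AuthorisationId = u.AuthorisationRecordId
            JOIN Ticket t ON t.Id = :ticket_id
            JOIN AuthorisationRecordTeam tt ON tt.AuthorisationId = t.AuthorisationRecordId
                                           AND tt.TeamId = ut.TeamId
            WHERE u.Id = :user_id)
    """, {"user_id": user_id, "ticket_id": ticket_id}).fetchone()
    return bool(row[0])


print(can_view_ticket(conn, 1, 10))  # True
print(can_view_ticket(conn, 2, 10))  # False
```

The check has the same shape for any table that carries an AuthorisationRecordId, which is the appeal of this design. One cost shows up in this sketch: nothing in the schema records which table owns a given AuthorisationRecord. So the database cannot stop two rows, or two tables, from sharing one by accident.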
If it were just these two tables, I think it would make sense to have a separate many-to-many table for each. However, as there are many tables needing this same relationship with the [Team] table, and the potential for many more in the future, I am leaning toward my second option. It would simplify development: every time we add a new feature that needs authorisation based on the Team, we would not have to add another many-to-many table, but simply add one extra column with a relationship to my [AuthorisationRecord] table.
My question is whether this is a good, scalable idea. It seems like it would greatly simplify my database, since instead of potentially hundreds of many-to-many tables with my [Team] table there would be just one. However, I can't find resources online about this method, so I think it might be a bad idea.

What is the most correct way to store a "list" in a SQL Database?

So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP.NET webpage for the place I work. I want to display data on a web page depending on which Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, one column containing the AD group and a second column containing the list of computers that belong to that group.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: Here is exactly what I want to do:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has Thomas in it and Group B has Jones in it, someone who belongs to only one of the groups would see just one entry. However, if you belong to both groups, you should see both the Thomas and the Jones computers.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?
You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
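The list above notes that the one awkward query against a junction table is producing a comma-delimited list. Here is a small sketch of that case, assuming SQLite via Python's sqlite3 and hypothetical table names. The data stays in a junction table, and the delimited list is built only at presentation time with SQLite's group_concat(). In SQL Server 2017 and later, STRING_AGG plays the same role; older versions typically rely on the FOR XML PATH trick.

```python
import sqlite3

# Hypothetical AD-group/computer data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AdGroup  (GroupId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Computer (ComputerId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Junction table: one row per (group, computer) pair.
    CREATE TABLE GroupComputer (
        GroupId    INTEGER NOT NULL REFERENCES AdGroup(GroupId),
        ComputerId INTEGER NOT NULL REFERENCES Computer(ComputerId),
        AddedOn    TEXT,  -- extra per-link facts (when was it added?) fit here
        PRIMARY KEY (GroupId, ComputerId));

    INSERT INTO AdGroup  VALUES (1, 'Group A'), (2, 'Group B');
    INSERT INTO Computer VALUES (1, 'THOMAS-PC'), (2, 'JONES-PC'), (3, 'LAB-PC');
    INSERT INTO GroupComputer (GroupId, ComputerId) VALUES (1, 1), (1, 3), (2, 2);
""")

# Build the comma-delimited list only when presenting it.
# The order of names inside each list is not guaranteed.
rows = conn.execute("""
    SELECT g.Name, group_concat(c.Name, ', ')
    FROM AdGroup g
    JOIN GroupComputer gc ON gc.GroupId = g.GroupId
    JOIN Computer c       ON c.ComputerId = gc.ComputerId
    GROUP BY g.GroupId, g.Name
""").fetchall()
lists = dict(rows)
print(lists)
```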
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.
There is an "it depends" that you should consider. If the data is never going to be queried (or is queried very rarely), storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out, but it can be much faster to fetch the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
...though I would ask why you are replicating Active Directory into your database and how you plan to keep the two in sync.
It's not really a bad idea to store multiple values in one column, but it depends on the searches you want to do.
If you only want to know which persons are part of a group, then you can store the persons in one column with the group id as the key. To update, you just replace the entire list for that group.
But if you want to search for the groups that a specific person belongs to, then it's not recommended to store multiple persons in one column. In that case it's better to use an intermediate table that stores person id and group id.
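The search problem is easy to reproduce. In this sketch (SQLite via Python, hypothetical tables and ids), a substring search on the packed column for person 1 also matches the group containing person 11. The intermediate table gives the exact answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design 1 (hypothetical): all person ids packed into one text column per group.
    CREATE TABLE GroupPersonsCsv (GroupId INTEGER PRIMARY KEY, PersonIds TEXT);
    INSERT INTO GroupPersonsCsv VALUES (1, '1,2'), (2, '11,12');

    -- Design 2 (hypothetical): intermediate table, one row per (group, person).
    CREATE TABLE GroupPerson (
        GroupId  INTEGER NOT NULL,
        PersonId INTEGER NOT NULL,
        PRIMARY KEY (GroupId, PersonId));
    INSERT INTO GroupPerson VALUES (1, 1), (1, 2), (2, 11), (2, 12);
""")

person = 1

# Substring search on the packed column: '1' also matches inside '11,12'.
naive = [r[0] for r in conn.execute(
    "SELECT GroupId FROM GroupPersonsCsv WHERE PersonIds LIKE '%' || ? || '%' ORDER BY GroupId",
    (str(person),))]

# The intermediate table answers the same question with a plain equality test.
exact = [r[0] for r in conn.execute(
    "SELECT GroupId FROM GroupPerson WHERE PersonId = ? ORDER BY GroupId",
    (person,))]

print(naive)  # [1, 2]  (group 2 is a false match)
print(exact)  # [1]
```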
It sounds like you want one table that maps users to group IDs and a second table that maps group IDs to the computers in that group. I'm not sure, though; your description of the problem was a bit confusing to me.
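Here is a sketch of that two-table layout, using the Thomas/Jones example from the question. It assumes SQLite via Python and uses names as keys only to keep it short; real tables would normally use ids. DISTINCT makes sure that a computer reachable through two of a user's groups is listed only once.

```python
import sqlite3

# Hypothetical data; names are used as keys only to keep the sketch short.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserGroup (
        UserName TEXT NOT NULL, GroupName TEXT NOT NULL,
        PRIMARY KEY (UserName, GroupName));
    CREATE TABLE GroupComputer (
        GroupName TEXT NOT NULL, ComputerName TEXT NOT NULL,
        PRIMARY KEY (GroupName, ComputerName));

    INSERT INTO UserGroup VALUES
        ('ann', 'Group A'), ('ben', 'Group B'), ('cat', 'Group A'), ('cat', 'Group B');
    INSERT INTO GroupComputer VALUES
        ('Group A', 'Thomas'), ('Group A', 'Shared'),
        ('Group B', 'Jones'),  ('Group B', 'Shared');
""")


def visible_computers(conn, user):
    """Computers reachable through any of the user's groups, each listed once."""
    return [r[0] for r in conn.execute("""
        SELECT DISTINCT gc.ComputerName
        FROM UserGroup ug
        JOIN GroupComputer gc ON gc.GroupName = ug.GroupName
        WHERE ug.UserName = ?
        ORDER BY gc.ComputerName
    """, (user,))]


print(visible_computers(conn, "ann"))  # ['Shared', 'Thomas']
print(visible_computers(conn, "cat"))  # ['Jones', 'Shared', 'Thomas']
```

Adding a new group of computers is then just a few INSERTs into these tables, not a new table.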
A list has some columns, like name, family name, phone number, etc.,
and rows like: name=john, familyName=lee, number=12321321
name=..., familyName=..., number=...
An SQL database works the same way: every row in an SQL table is a record, so you just add the records of your list to your database using an INSERT query.
A complete explanation is here:
http://www.w3schools.com/sql/sql_insert.asp
This sounds like a typical many-to-many problem. You have many groups and many computers, and they are related to each other. In this situation, it is often recommended to use a mapping table, a.k.a. a "junction table" or "cross-reference" table. This table consists solely of the foreign keys to your two other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and a computer. This complies with third normal form with regard to database normalization.
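Here is the mapping table above as a runnable SQLite sketch (via Python's sqlite3; the data is hypothetical). A composite primary key stops the same relationship from being stored twice, and the foreign keys reject ids that do not exist. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. GROUP is also a reserved word, so the table name has to be quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
    CREATE TABLE Computer ( computerId INTEGER PRIMARY KEY, name TEXT );
    CREATE TABLE [Group]  ( groupId    INTEGER PRIMARY KEY, name TEXT );  -- GROUP is a keyword, so quote it
    CREATE TABLE GroupComputer (
        groupId    INTEGER NOT NULL REFERENCES [Group](groupId),
        computerId INTEGER NOT NULL REFERENCES Computer(computerId),
        PRIMARY KEY (groupId, computerId));  -- each relationship can be stored only once

    INSERT INTO Computer VALUES (1, 'Thomas'), (2, 'Jones');
    INSERT INTO [Group]  VALUES (1, 'Group A'), (2, 'Group B');
    INSERT INTO GroupComputer VALUES (1, 1), (2, 2);
""")


def try_link(conn, group_id, computer_id):
    """Insert one group/computer relationship; False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO GroupComputer VALUES (?, ?)", (group_id, computer_id))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_link(conn, 1, 2),   # new pair: accepted
    try_link(conn, 1, 1),   # duplicate pair: rejected by the primary key
    try_link(conn, 1, 99),  # unknown computer: rejected by the foreign key
]
link_count = conn.execute("SELECT COUNT(*) FROM GroupComputer").fetchone()[0]
print(results, link_count)  # [True, False, False] 3
```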
You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.

How do I give different users access to different rows without creating separate views in BigQuery?

The question "How do I use row-level permissions in BigQuery?" describes how to use an authorized view to grant access to only a portion of a table. But I'd like to give different users access to different rows. Does this mean I need to create separate views for each user? Is there an easier way?
Happily, if you want to give different users access to different rows in your table, you don't need to create separate views for each one. You have a couple of options.
These options all make use of the SESSION_USER() function in BigQuery, which returns the e-mail address of the user running the query. For example, if I run:
SELECT SESSION_USER()
I get back tigani@google.com.
The simplest option, then, for displaying different rows to different users, is to add another column to your table that holds the user who is allowed to see the row. For example, the schema {customer:string, id:integer} would become {customer:string, id:integer, allowed_viewer:string}. Then you can define a view:
#standardSQL
SELECT customer, id
FROM private.customers
WHERE allowed_viewer = SESSION_USER()
(note, don't forget to authorize the view as described here).
Then I'd be able to see only the rows where tigani@google.com was the value in the allowed_viewer column.
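BigQuery can't be run offline, so this sketch only imitates the view's filter in SQLite (via Python's sqlite3). The user's e-mail is passed as a query parameter in place of SESSION_USER(). It shows the row filtering only, not BigQuery's authorized-view permissions. The addresses and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer TEXT, id INTEGER, allowed_viewer TEXT);
    INSERT INTO customers VALUES
        ('Acme',    1, 'alice@example.com'),
        ('Globex',  2, 'bob@example.com'),
        ('Initech', 3, 'alice@example.com');
""")


def rows_visible_to(session_user):
    """Imitates the view: the parameter plays the role of SESSION_USER()."""
    return conn.execute(
        "SELECT customer, id FROM customers WHERE allowed_viewer = ? ORDER BY id",
        (session_user,),
    ).fetchall()


print(rows_visible_to("alice@example.com"))  # [('Acme', 1), ('Initech', 3)]
```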
This approach has its own drawbacks, however: you can only grant access to a single user at a time. One option would be to make the allowed_viewer column a repeated field; this would let you provide a list of users for each row.
However, this is still pretty restrictive, and requires a lot of bookkeeping about which users should have access to which row. Chances are, what you'd really like to do is specify a group. So your schema would look like {customer:string, id:integer, allowed_group:string}, and anyone in the allowed_group would be able to see that row.
You can make this work by having another table that has your group mappings. That table would look like: {group:string, user_name:string}. The rows might look like:
{engineers, tigani@google.com}
{engineers, some_engineer@google.com}
{administrators, some_admin@google.com}
{sales, some_salesperson@google.com}
...
Let's call this table private.access_control. Then we can change our view definition:
#standardSQL
SELECT c.customer, c.id
FROM private.customers c
INNER JOIN (
SELECT `group`
FROM private.access_control
WHERE SESSION_USER() = user_name) g
ON c.allowed_group = g.`group`
(Note: you will want to make sure that there are no duplicates in private.access_control; otherwise they could cause records to repeat in the results.)
In this way, you can manage the groups in the private.access_control separately from the data table (private.customers).
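The duplicate warning is easy to reproduce in the same kind of SQLite imitation. The data is hypothetical, a parameter stands in for SESSION_USER(), and "group" is double-quoted because GROUP is a reserved word. With one duplicated row in access_control, the join returns the customer twice. Adding DISTINCT to the subquery is one possible guard; it is an alternative, not something the view above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer TEXT, id INTEGER, allowed_group TEXT);
    CREATE TABLE access_control ("group" TEXT, user_name TEXT);  -- quoted: GROUP is reserved
    INSERT INTO customers VALUES ('Acme', 1, 'engineers'), ('Globex', 2, 'sales');
    INSERT INTO access_control VALUES
        ('engineers', 'alice@example.com'),
        ('engineers', 'alice@example.com'),  -- accidental duplicate row
        ('sales',     'bob@example.com');
""")

VIEW_SQL = """
    SELECT c.customer, c.id
    FROM customers c
    INNER JOIN (
        SELECT {distinct} "group"
        FROM access_control
        WHERE user_name = ?        -- stands in for SESSION_USER() = user_name
    ) g ON c.allowed_group = g."group"
    ORDER BY c.id
"""


def visible(user, dedupe):
    """Run the imitated view for one user, optionally de-duplicating the group subquery."""
    sql = VIEW_SQL.format(distinct="DISTINCT" if dedupe else "")
    return conn.execute(sql, (user,)).fetchall()


print(visible("alice@example.com", dedupe=False))  # [('Acme', 1), ('Acme', 1)]
print(visible("alice@example.com", dedupe=True))   # [('Acme', 1)]
```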
There is still one piece missing that you might want: the ability for groups to contain other groups. You can get this by doing a more complex join to expand the groups in the access control table (you might want to consider doing this only once and saving the results, to avoid repeating the work each time the main table is queried).
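One way to do that expansion is a recursive common table expression. This sketch uses SQLite's WITH RECURSIVE via Python. It assumes a hypothetical group_members table in which a member is either a user e-mail or the name of another group; for this sketch only, the two are told apart by the presence of "@". It expands the nesting once and saves the result in a table. It has not been checked against BigQuery's own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical convention: a member is a user e-mail or the name of another group.
    CREATE TABLE group_members (grp TEXT, member TEXT);
    INSERT INTO group_members VALUES
        ('engineers', 'alice@example.com'),
        ('staff',     'engineers'),   -- every engineer is also staff
        ('everyone',  'staff'),
        ('sales',     'bob@example.com');

    -- Expand the nesting once and save it, so the data view only needs a plain join.
    CREATE TABLE expanded_access AS
    WITH RECURSIVE ua(user_name, grp) AS (
        SELECT member, grp FROM group_members WHERE member LIKE '%@%'
        UNION  -- UNION (not UNION ALL) skips rows already seen, so cycles terminate
        SELECT ua.user_name, gm.grp
        FROM group_members gm
        JOIN ua ON gm.member = ua.grp
    )
    SELECT user_name, grp FROM ua;
""")


def groups_of(user):
    """All groups the user belongs to, directly or through nested groups."""
    return [r[0] for r in conn.execute(
        "SELECT grp FROM expanded_access WHERE user_name = ? ORDER BY grp", (user,))]


print(groups_of("alice@example.com"))  # ['engineers', 'everyone', 'staff']
```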

Opinions on planning and avoiding data redundancy

I am about to design an app in VB.NET that works with an Access back-end database. I have been trying to think of ways to reduce data redundancy, and I have an example scenario below:
Let's imagine, for example purposes, that I have a customers table and need to highlight all customers in WI and send them a letter. The customers table would contain all the customers and the properties associated with them (name, address, etc.), so we would query it for rows where the state is "WI". Then we would take the results and append them into a table with a "completion" indicator (so from 'CUSTOMERS' to, say, a 'WI_LETTERS' table).
Let's assume some processing needs to be done; when it's completed, we mark a field in that table as 'complete', then allow the letters to be printed with a mail merge (SELECT * FROM WI_LETTERS WHERE INDICATOR = 'COMPLETE').
That item is now done. But let's say that every odd year (e.g. 2013) we also send a notice to everyone in the table with a state of "WI". We now query the customers table when the year is odd and the customer's state is "WI", then append that data into a table called 'notices' with a completion indicator, and it is marked complete.
This seems to keep the data "task-based", as the data is based solely around the task at hand. However, isn't this considered redundant data? This setup means there can be one transaction type to many accounts (even multiple times to the same account year after year), but shouldn't it be one account to many transactions?
How can this design be improved?
You certainly don't want to start creating new tables for each individual task you perform. You may want to create several different tables for different types of tasks if the information you need to track (and hence the columns in those tables) will be quite different between the different types of tasks, but those tables should be used for all tasks of that particular type. You can maintain a field in those tables to identify the individual task to which each record applies (e.g., [campaign_id] for Marketing campaign mailouts, or [mail_batch_id], or similar).
You definitely don't want to start creating new tables like [WI_letters] that are segregated by State (or any client attribute). You already have the customers' State in the [Customers] table so the only customer-related attribute you need in your [Letters] table is the [CustomerID]. If you frequently want to see a list of Letters for Customers in Wisconsin then you can always create a saved Query (often called a View in other database systems) named [WI_Letters] that looks like
SELECT * FROM Letters INNER JOIN Customers ON Customers.CustomerID=Letters.CustomerID
WHERE Customers.State="WI"
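Here is a sketch of that layout in SQLite via Python. The names mail_batch_id, Complete and the batch label are hypothetical. There is one Letters table for every letter task, a batch id tells the tasks apart, and a view stands in for the saved [WI_Letters] query. Access SQL differs in details (for example, it accepts "WI" in double quotes), so treat this as an illustration of the design rather than Access-ready code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, State TEXT);
    -- One table for every letter task; mail_batch_id says which task a row belongs to.
    CREATE TABLE Letters (
        LetterID      INTEGER PRIMARY KEY,
        CustomerID    INTEGER NOT NULL REFERENCES Customers(CustomerID),
        mail_batch_id TEXT    NOT NULL,
        Complete      INTEGER NOT NULL DEFAULT 0);

    -- Saved query (a view) standing in for [WI_Letters]; State is not copied into Letters.
    CREATE VIEW WI_Letters AS
        SELECT Letters.*, Customers.Name
        FROM Letters INNER JOIN Customers ON Customers.CustomerID = Letters.CustomerID
        WHERE Customers.State = 'WI';

    INSERT INTO Customers VALUES (1, 'Ann', 'WI'), (2, 'Bo', 'MN'), (3, 'Cy', 'WI');
""")

# Queue a batch: one Letters row per WI customer, tagged with the batch id.
conn.execute("""
    INSERT INTO Letters (CustomerID, mail_batch_id)
    SELECT CustomerID, ? FROM Customers WHERE State = 'WI'
""", ("WI-notice-2013",))

# After processing, mark the batch complete; printing then reads only finished rows.
conn.execute("UPDATE Letters SET Complete = 1 WHERE mail_batch_id = ?", ("WI-notice-2013",))
ready = [r[0] for r in conn.execute(
    "SELECT Name FROM WI_Letters WHERE mail_batch_id = ? AND Complete = 1 ORDER BY Name",
    ("WI-notice-2013",))]
print(ready)  # ['Ann', 'Cy']
```

Next year's notice would be another INSERT with a new batch id. One customer then naturally ends up with many letters, which is the one-account-to-many-transactions shape the question was after.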

SQL - how to get out chained data?

I have 4 tables which were auto-generated for me:
User
Challenge
Exercise
Challenge_Exercise
One User may have many Challenges, and one Challenge will have many Exercises.
What I noticed is that the Challenge table has a reference to its parent User (called user_id), but Exercise does not have a reference to Challenge in its table; their relation is stored in Challenge_Exercise as challenge_id and exercise_id.
My question is, how would I take out every Exercise that is linked to a specific user? For instance User with id = 1?
SELECT exercise.*
FROM exercise,
challenge_exercise,
challenge
WHERE challenge.user_id = 1
AND challenge_exercise.challenge_id = challenge.id
AND challenge_exercise.exercise_id = exercise.id
What I'm doing here is an implicit join; you could also write it explicitly with INNER JOIN syntax (google it if you want to know more).
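For comparison, here is the explicit INNER JOIN form, run against a small SQLite database via Python. The table names follow the question, lowercased; the name columns and sample rows are hypothetical. DISTINCT is added so that an exercise used in two of the user's challenges is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE challenge (id INTEGER PRIMARY KEY,
                            user_id INTEGER NOT NULL REFERENCES user(id));
    CREATE TABLE exercise (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE challenge_exercise (
        challenge_id INTEGER NOT NULL REFERENCES challenge(id),
        exercise_id  INTEGER NOT NULL REFERENCES exercise(id),
        PRIMARY KEY (challenge_id, exercise_id));

    INSERT INTO user VALUES (1, 'first user'), (2, 'second user');
    INSERT INTO challenge VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO exercise VALUES (100, 'push-ups'), (101, 'squats'), (102, 'plank');
    INSERT INTO challenge_exercise VALUES (10, 100), (10, 101), (11, 101), (12, 102);
""")


def exercises_for_user(conn, user_id):
    """Names of all exercises linked to any of the user's challenges."""
    return [r[0] for r in conn.execute("""
        SELECT DISTINCT exercise.name
        FROM challenge
        INNER JOIN challenge_exercise ON challenge_exercise.challenge_id = challenge.id
        INNER JOIN exercise           ON exercise.id = challenge_exercise.exercise_id
        WHERE challenge.user_id = ?
        ORDER BY exercise.name
    """, (user_id,))]


print(exercises_for_user(conn, 1))  # ['push-ups', 'squats']
```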
The Challenge_Exercise table is needed because you have a many-to-many relationship: each challenge can have multiple exercises, and each exercise can also belong to multiple challenges. The standard approach in that case is an extra table, so you don't have redundant data; this table is often called a junction table.
If you want background, just google it; there is plenty of material on this topic.