SQL: selective subqueries

I have a SQL query (MS SQL Server) in which I add columns to the result set using subselects:
SELECT P.name,
(select count(*) from cars C where C.type = 'sports') AS sportscars,
(select count(*) from cars C where C.type = 'family') AS familycars,
(select count(*) from cars C where C.type = 'business') AS businesscars
FROM people P
WHERE P.id = 1;
The query above is just from a test setup that's a bit nonsensical, but I think it serves well enough as an example. The query I'm actually working on spans a number of complex tables, which would only distract from the issue at hand.
In the example above, each record in the table "people" also has three additional columns: "wantsSportscar", "wantsFamilycar" and "wantsBusinesscar". Now what I want to do is only do the subselect of each additional column if the respective "wants....." field in the people table is set to "true". In other words, I only want to do the first subselect if P.wantsSportscar is set to true for that specific person. The second and third subselects should work in a similar manner.
So the way this query should work is that it shows the name of a specific person and the number of models available for the types of cars he wants to own. It might be worth noting that my final resultset will always only contain a single record, namely that of one specific user.
It's important that if a person is not interested in a certain type of car, the column for that type is not included in the final resultset. An example to be sure this is clear:
If person A wants a sportscar and a familycar, the result would include the columns "name", "sportscars" and "familycars".
If person B wants a businesscar, the result would include the columns "name" and "businesscars".
I've been trying to use various combinations with IF, CASE and EXISTS statements, but so far I've not been able to get a syntactically correct solution. Does anyone know if this is even possible? Note that the query will be stored in a Stored Procedure.

In your case, there are 8 column layouts possible and to do this, you will need 8 separate queries (or build your query dynamically).
It's not possible to change the resultset layout within a single query.
Instead, you may design your query as follows:
SELECT P.name,
CASE WHEN P.wantsSportscar = 1 THEN (select count(*) from cars C where C.type = 'sports') ELSE NULL END AS sportscars,
CASE WHEN P.wantsFamilycar = 1 THEN (select count(*) from cars C where C.type = 'family') ELSE NULL END AS familycars,
CASE WHEN P.wantsBusinesscar = 1 THEN (select count(*) from cars C where C.type = 'business') ELSE NULL END AS businesscars
FROM people P
WHERE P.id = 1
This selects NULL in the corresponding column if the person doesn't want that type of car; you can then handle these NULLs on the client side.
Note that the relational model answers queries in terms of relations.
In your case, the relation is as follows: "this person's needs are satisfied by this many sports cars, this many business cars and this many family cars".
The relational model always answers this specific question with a quaternary relation.
It doesn't omit any of the relation's members: instead, it just sets them to NULL, which is SQL's way of showing that a member of a relation is not defined, applicable or meaningful.
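As a rough illustration of handling the NULLs on the client side, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for SQL Server, and the sample data, the `wants...` column names (taken from the question) and the helper `wanted_counts` are assumptions made for the example, not part of the original setup.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT,
                         wantsSportscar INTEGER, wantsFamilycar INTEGER,
                         wantsBusinesscar INTEGER);
    CREATE TABLE cars (model TEXT, type TEXT);
    INSERT INTO people VALUES (1, 'Alice', 1, 1, 0);
    INSERT INTO cars VALUES ('Roadster', 'sports'), ('Coupe', 'sports'),
                            ('Wagon', 'family'), ('Sedan', 'business');
""")

# Fixed column layout: unwanted car types come back as NULL.
QUERY = """
SELECT P.name,
       CASE WHEN P.wantsSportscar = 1
            THEN (SELECT COUNT(*) FROM cars C WHERE C.type = 'sports') END AS sportscars,
       CASE WHEN P.wantsFamilycar = 1
            THEN (SELECT COUNT(*) FROM cars C WHERE C.type = 'family') END AS familycars,
       CASE WHEN P.wantsBusinesscar = 1
            THEN (SELECT COUNT(*) FROM cars C WHERE C.type = 'business') END AS businesscars
FROM people P
WHERE P.id = ?
"""

def wanted_counts(conn, person_id):
    """Hypothetical helper: run the query and drop the NULL (None) columns."""
    row = conn.execute(QUERY, (person_id,)).fetchone()
    return {key: row[key] for key in row.keys() if row[key] is not None}

result = wanted_counts(conn, 1)
print(result)
```

The query always returns four columns; only the client decides which ones to show.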

I'm mostly an Oracle guy but there's a high chance the same applies. Unless I've misunderstood, what you want is not possible at that level - you will always have a static number of columns. Your query can control if the column is empty but since in the outer-most part of the query you have specified X number of columns, you are guaranteed to get X columns in your resultset.
As I said, I am unfamiliar with MS SQL Server but I'm guessing there will be some way of executing dynamic SQL, in which case you should research that since it should allow you to build a more flexible query.

You may be able to do what you want by first selecting the values as separate rows into a temp table, then doing a PIVOT on that table (turning the rows into columns).

"It's important that if a person is not interested in a certain type of cars, that the column for that type will not be included in the final resultset. An example to be sure this is clear:"
You will not be able to do it in plain SQL. I suggest you just make this column NULL or ZERO.
If you want the query to expand dynamically when new car types are added, then PIVOTing could help you somewhat.

There are three fundamentals you want to learn to make this work easy. The first is data normalization, the second is GROUP BY, and the third is PIVOT.
First, data normalization. Your design of the people table is not in first normal form. The columns "wantsSportscar", "wantsFamilycar" and "wantsBusinesscar" are really a repeating group, although they may not look like one. If you can modify the table design, you will find it advantageous to create a third table, let's call it "peoplewant", with two key columns, personid and cartype. I can go into detail about why this design will be more flexible and powerful if you like, but I'm going to skip that for now.
On to GROUP BY. This allows you to produce a result that summarizes each group in one row of the result.
SELECT
p.name,
c.type,
COUNT(*) AS carcount
FROM people p
INNER JOIN peoplewant pw ON p.id = pw.personid
INNER JOIN cars c ON pw.cartype = c.type
WHERE
p.id = 1
GROUP BY
p.name,
c.type
This (untested) query gives you the result you want, except that the result has a separate row for each car type the person wants.
Finally, PIVOT. The PIVOT tool in your DBMS allows you to turn this result into a form where there is just one row for the person, and there is a separate column for each of the cartypes wanted by that person. I haven't used PIVOT myself, so I'll let somebody else edit this response to provide an example using PIVOT.
If you use the same technique to retrieve data for multiple people in one sweep, keep in mind that a column will appear for each wanted type that any person wants, and zeroes will appear in the PIVOT result for persons who do not want a car type that is in the result columns.
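To make the normalized design concrete, here is a small Python sketch using sqlite3 in place of SQL Server. The table contents are invented for illustration, and the final step pivots the grouped rows in application code rather than with the DBMS's PIVOT operator, so treat it as one possible approach rather than the PIVOT solution itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (person, wanted car type) replaces the three "wants" columns.
    CREATE TABLE peoplewant (personid INTEGER, cartype TEXT,
                             PRIMARY KEY (personid, cartype));
    CREATE TABLE cars (model TEXT, type TEXT);
    INSERT INTO people VALUES (1, 'Alice');
    INSERT INTO peoplewant VALUES (1, 'sports'), (1, 'family');
    INSERT INTO cars VALUES ('Roadster', 'sports'), ('Coupe', 'sports'),
                            ('Wagon', 'family'), ('Sedan', 'business');
""")

# One result row per car type the person wants.
rows = conn.execute("""
    SELECT p.name, c.type, COUNT(*) AS carcount
    FROM people p
    INNER JOIN peoplewant pw ON p.id = pw.personid
    INNER JOIN cars c ON pw.cartype = c.type
    WHERE p.id = ?
    GROUP BY p.name, c.type
    ORDER BY c.type
""", (1,)).fetchall()

# Client-side pivot: turn the rows into one record per person.
pivoted = {}
for name, cartype, carcount in rows:
    pivoted.setdefault(name, {})[cartype] = carcount

print(rows)
print(pivoted)
```

Adding a new car type then only requires new rows in cars and peoplewant, not a new column in people.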

I just came across this post through a Google search, so I realize I'm a bit late to this party, but this really is possible. However, I wouldn't suggest actually doing it this way, because it's usually considered a Very Bad Thing (tm).
Dynamic SQL is your answer.
Before I say how to do it, I want to preface this: dynamic SQL is very dangerous if you aren't sanitizing the input from your application.
So, therefore, proceed with caution:
declare @sqlToExecute nvarchar(max);
declare @includeSportsCars bit;
declare @includeFamilyCars bit;
declare @includeBusinessCars bit;
set @includeBusinessCars = 1;
set @includeFamilyCars = 1;
set @includeSportsCars = 1;
-- Each optional column carries its own leading comma, so any subset yields valid SQL.
set @sqlToExecute = 'SELECT P.name';
if @includeSportsCars = 1
set @sqlToExecute = @sqlToExecute + ', (select count(*) from cars C where C.type = ''sports'') AS sportscars';
if @includeFamilyCars = 1
set @sqlToExecute = @sqlToExecute + ', (select count(*) from cars C where C.type = ''family'') AS familycars';
if @includeBusinessCars = 1
set @sqlToExecute = @sqlToExecute + ', (select count(*) from cars C where C.type = ''business'') AS businesscars';
set @sqlToExecute = @sqlToExecute + ' FROM people P WHERE P.id = 1;';
exec(@sqlToExecute);
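If the statement is assembled in application code instead, the same idea might look like the Python sketch below. It is only an illustration under stated assumptions: sqlite3 stands in for SQL Server, and the whitelist `CAR_COLUMNS`, the helper `build_query` and the sample data are hypothetical. Column aliases come only from the fixed whitelist and values are bound as parameters, which is one way to keep user input out of the SQL text.

```python
import sqlite3

# Fixed whitelist (hypothetical): result column alias -> car type.
CAR_COLUMNS = {
    "sportscars": "sports",
    "familycars": "family",
    "businesscars": "business",
}

def build_query(wanted):
    """Return (sql, params) selecting only the whitelisted columns in `wanted`."""
    unknown = set(wanted) - set(CAR_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    select_list = ["P.name"]
    params = []
    for alias, car_type in CAR_COLUMNS.items():
        if alias in wanted:
            # The alias comes from the whitelist; the car type is a bound parameter.
            select_list.append(
                f"(SELECT COUNT(*) FROM cars C WHERE C.type = ?) AS {alias}"
            )
            params.append(car_type)
    sql = "SELECT " + ", ".join(select_list) + " FROM people P WHERE P.id = ?"
    return sql, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cars (model TEXT, type TEXT);
    INSERT INTO people VALUES (1, 'Alice');
    INSERT INTO cars VALUES ('Roadster', 'sports'), ('Coupe', 'sports'),
                            ('Wagon', 'family'), ('Sedan', 'business');
""")

sql, params = build_query({"sportscars", "businesscars"})
cursor = conn.execute(sql, params + [1])
columns = [description[0] for description in cursor.description]
row = cursor.fetchone()
print(columns, row)
```

Here the result set really does contain only the requested columns, at the cost of building a different statement per combination.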

Related

SQL - How to join indvidual cells from multiple rows into single row?

I have two tables, one with properties and one with buildings. Each property is associated with zero to a theoretically unlimited number of buildings. Right now I have code like this:
Select Properties.ID, Buildings.Number
From Properties
Left Join Buildings on Buildings.pID = Properties.ID
This returns a table of all buildings with their associated property. However, this means that each property appears as many times as it has buildings.
What I want is a result where each property has its buildings in the same row as itself, so that it becomes a result of properties with their buildings, not a result of buildings with their properties.
EDIT: I should probably specify that this is a server I only have read access to.

In SQL Server, you can use STUFF together with FOR XML PATH to combine the result set into one string per property.
SELECT P.ID
,STUFF((SELECT ', ' + CAST(b.Number AS VARCHAR(10)) [text()]
FROM Buildings b
WHERE b.pID = P.ID
FOR XML PATH(''), TYPE)
.value('.','NVARCHAR(MAX)'),1,2,'') AS BuildingNumbers
FROM Properties p
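The same one-row-per-property shape can also be assembled on the client, which needs nothing beyond read access. The Python sketch below is an assumption-laden illustration: sqlite3 stands in for SQL Server, the sample rows are invented, and the grouping is done in application code rather than with STUFF and FOR XML PATH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Properties (ID INTEGER PRIMARY KEY);
    CREATE TABLE Buildings (pID INTEGER, Number INTEGER);
    INSERT INTO Properties VALUES (1), (2), (3);
    INSERT INTO Buildings VALUES (1, 10), (1, 11), (2, 20);
""")

# The original LEFT JOIN: one row per building, NULL for properties without any.
rows = conn.execute("""
    SELECT p.ID, b.Number
    FROM Properties p
    LEFT JOIN Buildings b ON b.pID = p.ID
    ORDER BY p.ID, b.Number
""").fetchall()

# Collapse the rows into one comma-separated string per property.
numbers_by_property = {}
for property_id, number in rows:
    numbers = numbers_by_property.setdefault(property_id, [])
    if number is not None:
        numbers.append(str(number))

building_numbers = {pid: ", ".join(nums) for pid, nums in numbers_by_property.items()}
print(building_numbers)
```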

If the properties really can have an unlimited number of buildings, this becomes really hard as a column-per-building result. You would have to create a table with an unlimited number of columns (one for each potential building) and set all of them to NULL except where there are buildings.
You can't have a dynamic number of columns; it has to be fixed beforehand. There are suboptimal workarounds, but they all run contrary to database normalization.
Edit: if it's just for the results of a query, you could use PIVOT.

Multiply total number of values in column by value in a different table

I am trying to count all the values in one column and then multiply this number by a value in a different table. So far I have:
SELECT CLUB_FEE * COUNT(MEMBER_ID) AS VALUE
FROM CLUB, SUBSCRIPTION
WHERE CLUB_ID = 'CLUB1';
This is not working however, can anyone please help?
I also need help doing this for multiple clubs. Is it possible to do it all in one statement for all clubs and then get the average?

Presumably, you intend something like this:
SELECT MAX(c.CLUB_FEE) * COUNT(MEMBER_ID) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
WHERE c.CLUB_ID = 'CLUB1';
You can also write this as:
SELECT SUM(c.CLUB_FEE) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
WHERE c.CLUB_ID = 'CLUB1';
I thought the first version would be clearer, because the OP specifies COUNT() in the question.
If you want it for all clubs that have subscribers:
SELECT SUM(c.CLUB_FEE) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
GROUP BY c.CLUB_ID;

From inspecting the explain plans, it seems the following version may be a bit more efficient (since it avoids a join and uses only one aggregation). If you need this for ALL clubs at the same time, then probably all solutions will have the same "optimizer cost" (they will all do a join at some point).
select club_fee * (select count(member_id) from subscription where club_id = 'CLUB1')
from club
where club_id = 'CLUB1'
So now the only aggregate function is pushed into a subquery and the rest does not need either a join or another aggregate function.
Of course, this only matters if performance is important; it may very well not be.
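For the follow-up about all clubs and their average, a possible sketch is shown below in Python with sqlite3 (standing in for the asker's database, which is not named). The sample fees and subscriptions are made up, and the LEFT JOIN is a choice made here so that clubs without subscribers count as 0 instead of disappearing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE club (club_id TEXT PRIMARY KEY, club_fee INTEGER);
    CREATE TABLE subscription (member_id INTEGER, club_id TEXT);
    INSERT INTO club VALUES ('CLUB1', 10), ('CLUB2', 20), ('CLUB3', 5);
    INSERT INTO subscription VALUES (1, 'CLUB1'), (2, 'CLUB1'), (3, 'CLUB1'),
                                    (1, 'CLUB2');
""")

# Fee multiplied by the number of members, one row per club.
PER_CLUB = """
    SELECT c.club_id, c.club_fee * COUNT(s.member_id) AS value
    FROM club c
    LEFT JOIN subscription s ON s.club_id = c.club_id
    GROUP BY c.club_id, c.club_fee
"""

per_club = conn.execute(PER_CLUB + " ORDER BY c.club_id").fetchall()

# Average of the per-club values, computed over the query above as a derived table.
average = conn.execute(f"SELECT AVG(value) FROM ({PER_CLUB})").fetchone()[0]

print(per_club)
print(average)
```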

Activerecord query returning doubles while using uniq

I am running the following query with the goal of returning a unique set of customer objects:
Customer.joins(:projects).select('customers.*, projects.finish_date').where("projects.closed = false").uniq
However, this code will generate duplicates if a customer has more than one active project (i.e. closed = false). If I remove projects.finish_date from the select clause, the query works as intended. However, I need it there to be able to sort on that column.
How can I make this query return a unique set of customers?

How can I make this query return a unique set of customers?
This doesn't completely make sense, and probably isn't what you want.
The problem is that you're joining against the projects table, at which point there may be several rows for the same customer with different project finish_dates. These rows are unique and will be returned as multiple unique Customer objects, each with a different finish_date.
If you only want one of these, how is Rails to determine which one? Wouldn't it be a problem if you only had one customer object with one finish_date returned if there are really 10 projects for that customer, each with a different finish_date?
Instead, you probably want something like this:
customers = Customer.joins(:projects).select('customers.*, projects.finish_date').where("projects.closed = false").uniq
customers.group_by(&:id)
This groups all of your same customers together.
OR, you might want:
projects = Project.where(closed: false).includes(:customer)
customers = projects.map(&:customer).uniq
In either case, you're producing a unique set of customers from the superset of all customer-project joins.
RE Your comments:
If you want to get a list of customers with their most recent associated project, you could use a sub query in your where:
select customers.*, projects.finish_date from customers
inner join projects on projects.customer_id = customers.id
where projects.id = (
select p2.id from projects p2
where p2.customer_id = customers.id
and p2.closed = false
order by p2.finish_date desc
limit 1
)
You can express this using ActiveRecord by embedding the sub-query in a where:
Customer.joins(:projects)
.select('customers.*, projects.finish_date as finish_date')
.where('projects.id = (select p2.id from projects p2 where p2.customer_id = customers.id and p2.closed = false order by p2.finish_date desc limit 1)')
I have no idea how this will perform for you, but I suspect poorly.
I would always stick to a simple includes and in-Ruby filter before attempting to optimize with SQL.

Update a column based on results from junction table

Possible duplicate: Concatenate row values T-SQL
I have three tables: Items, Organizations and the junction table Items_Organizations, which provides the many-to-many relationship between the other two.
The Items table includes a column 'organizations' where I want to store the values I receive from the junction table for each item, combining the organizations with a comma or similar.
As far as I have read, it is not best practice to store multiple values in a column. However, I need to display the organizations for each item in the front end through some handler and did not come up with a better idea than storing multiple values in one column.
So what I am trying to do in the back end is update the 'organizations' column using something like this:
Receiving the organizations for a specific item:
SELECT OrganizationName FROM organizations
JOIN Items_Organizations ON organizations.organizationID =
Items_Organizations.organizationID
WHERE Items_Organizations.item_ID = '1'
Trying to update the column 'organizations' using the results table of the query above using UPDATE
UPDATE items SET organizations = ?
It is quite difficult to formulate the question. I hope I made it clear enough though.

Try this:
UPDATE i
SET i.organizations =
STUFF((
SELECT ', ' + o.OrganizationName
FROM Items_Organizations AS io
INNER JOIN organizations AS o ON o.organizationID = io.organizationID
WHERE io.Item_Id = i.Item_Id AND io.Item_Id = 1
FOR XML PATH(''))
,1,2,'')
FROM items AS i
WHERE i.Item_id = 1;

Query to get all revisions of an object graph

I'm implementing an audit log on a database, so everything has a CreatedAt and a RemovedAt column. Now I want to be able to list all revisions of an object graph but the best way I can think of for this is to use unions. I need to get every unique CreatedAt and RemovedAt id.
If I'm getting a list of countries with provinces the union looks like this:
SELECT c.CreatedAt AS RevisionId from Countries as c where localId=@Country
UNION
SELECT p.CreatedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = @Country
UNION
SELECT c.RemovedAt AS RevisionId from Countries as c where localId=@Country
UNION
SELECT p.RemovedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = @Country
For more complicated queries this could get quite complicated and possibly perform very poorly so I wanted to see if anyone could think of a better approach. This is in MSSQL Server.
I need them all in a single list because this is being used in a from clause and the real data comes from joining on this.

You have most likely already implemented your solution, but to address a few issues; I would suggest considering Aleris's solution, or some derivative thereof.
In your tables, you have a "removed at" field -- well, if that field were active (populated), technically the data shouldn't be there -- or perhaps your implementation has it flagged for deletion, which will break the logging once it is removed.
What happens when you have multiple updates during a reporting period -- the previous log entries would be overwritten.
Having a separate log allows for archival of the log information and allows you to set a different log analysis cycle from your update/edit cycles.
Add whatever "linking" fields required to enable you to get back to your original source data OR make the descriptions sufficiently verbose.
The fields contained in your log are up to you but Aleris's solution is direct. I may create an action table and change the field type from varchar to int, as a link into the action table -- forcing the developers to some standardized actions.
Hope it helps.

An alternative would be to create an audit log that might look like this:
AuditLog table
EntityName varchar(2000),
Action varchar(255),
EntityId int,
OccuranceDate datetime
where EntityName is the name of the table (e.g. Countries, Provinces), Action is the audit action (e.g. Created, Removed) and EntityId is the primary key of the modified row in the original table.
The table would need to be kept synchronized with every action performed on the audited tables. There are a couple of ways to do this:
1) Create triggers on each table that add rows to AuditLog
2) Have your application add rows to AuditLog each time a change is made to the respective tables
With this solution it is very simple to get a list of audit entries.
If you need columns from the original table, that is also possible using a join like this:
select *
from
Countries C
join AuditLog L on C.Id = L.EntityId and L.EntityName = 'Countries'
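A minimal sketch of option 2 (logging from the application) might look like the following Python, with sqlite3 standing in for SQL Server. The helper `log_action`, the sample country and the use of SQLite's implicit rowid for ordering are assumptions made for this example; the table is spelled Countries as in the question.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Countries (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE AuditLog (EntityName TEXT, Action TEXT,
                           EntityId INTEGER, OccuranceDate TEXT);
""")

def log_action(conn, entity_name, action, entity_id):
    """Hypothetical helper: record one audit entry with the current UTC time."""
    conn.execute(
        "INSERT INTO AuditLog VALUES (?, ?, ?, ?)",
        (entity_name, action, entity_id, datetime.now(timezone.utc).isoformat()),
    )

# The application logs every change right next to the change itself.
conn.execute("INSERT INTO Countries VALUES (1, 'Norway')")
log_action(conn, "Countries", "Created", 1)
log_action(conn, "Countries", "Removed", 1)

# Join the log back to the source table to list the revisions of a row.
history = conn.execute("""
    SELECT C.Name, L.Action
    FROM Countries C
    JOIN AuditLog L ON C.Id = L.EntityId AND L.EntityName = 'Countries'
    ORDER BY L.rowid
""").fetchall()

print(history)
```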

You could probably do it with a full outer join and coalesce, but the union is probably still better from a performance standpoint. You can try testing each though.
SELECT
COALESCE(C.CreatedAt, P.CreatedAt)
FROM
dbo.Countries C
FULL OUTER JOIN dbo.Provinces P ON
1 = 0
WHERE
C.LocalID = @Country OR
P.CountryId = @Country
