Can I get duplicate results (from one table) in an INTERSECT operation between two tables? - sql

I know the wording of the question is awkward, but I couldn't phrase it any better. Let me explain the situation.
There's table A which has a bunch of columns (a, b, c ... ) and I run a SELECT query on it like so:
SELECT a FROM A WHERE b IN ('....') (the ellipsis indicates a number of values to be matched to)
There's another table B which has a bunch of columns (d, e, f ... ) and I run a SELECT query on it like so:
SELECT d FROM B WHERE f = '...' (the ellipsis indicates a single value to be matched to)
Now I should say here that the two tables store different types of information about the same entity, but the columns a and d contain the exact same data (in this case, an ID). I want to find out the intersection of the two tables so I run this:
SELECT a FROM A WHERE b IN ('....') INTERSECT SELECT d FROM B WHERE f = '...'
Now here's the problem:
The first SELECT contains a set of values in the WHERE clause, right? So let's say the set is (1234, 2345,3456). Now, the result of this query when b is matched ONLY to 1234 is, let's say, abc. When it's matched to 2345, it's def, suppose. And matching to 3456, it gives abc.
Let's suppose these two results (abc and def) are also in the set of results from the second SELECT.
So, now, putting back the entire set of values to matched into the WHERE clause, the INTERSECT operation will give me abc and def. But I want abc twice since two values in the WHERE clause set match to the second SELECT.
Is there any way I can get that?
I hope it's not too complicated to understand my problem. This is a real-life problem I'm facing in my job.
Data structure and my code
Table A contains general information about a company:
company_id | branch_id | no_of_employees | city
Table B contains the financials of the company:
company_id | branch_id | revenue | profits
First SELECT:
SELECT branch_id FROM A WHERE CITY IN ('Dallas', 'Miami', 'New Orleans')
Now, running each city separately in the first SELECT, I get the branch_ids:
branch_id | city
23 | Dallas
45 | Miami
45 | New Orleans
Once again, this seems impractical as to how two cities can have the same branch ids, but please bear with me on this.
Second SELECT:
SELECT branch_id FROM B
WHERE REVENUE = 5000000
I know this is a little impractical, but for the purpose of this example, it suffices.
Running this query I get the following set:
11
23
45
22
10
So the INTERSECT will give me just 23 and 45. But I want 45 twice, since both Miami and New Orleans have that branch_id and that branch_id has generated a revenue of 5 million.

Directly from Microsoft's documentation (https://msdn.microsoft.com/en-us/library/ms188055.aspx)
:
"INTERSECT returns distinct rows that are output by both the left and right input queries operator."
So NO, it is not possible to get the same value twice when using INTERSECT because the results will be DISTINCT. However if you build an INNER JOIN correctly you can do essentially the same thing as INTERSECT except keep the repetitive results by NOT using distinct or group by.
SELECT
A.a
FROM
A
INNER JOIN B
ON A.a = B.d
AND B.F = '....'
WHERE b IN ('....')
And for your specific Example that you edited:
SELECT
branch_id
FROM
A
INNER JOIN B
ON A.branch_id = B.branch_id
AND B.REVENUE = 5000000
WHERE A.CITY IN ('Dallas', 'Miami', 'New Orleans')

You overcomplicated your task a lot:
SELECT *
FROM A
WHERE CITY IN (...)
AND EXISTS
(
SELECT 1 FROM B
WHERE B.REVENUE = 5000000
AND B.branch_id = A.branch_id
)
INTERSECT and EXCEPT are both returning row sets with DISTINCT applied.
Regular joining/filtering operations are not performed by INTERSECT or EXCEPT.

Related

TypeORM & Postgres: Count only unique distinct values from multiple columns

I have various SQL queries, which return me unique / distinct value from DB, (or count them),
like:
SELECT buyer as counterparty
FROM public.order
UNION
SELECT seller as counterparty
FROM public.order
or
SELECT COUNT(*)
FROM (
SELECT DISTINCT p
FROM public.order
CROSS JOIN LATERAL (VALUES(seller),(buyer)) AS C(p)
) AS internalQuery
Example structure of my table:
id buyer seller
0 A B
1 B A
2 B D
3 D A
4 A D
Desired result:
3 or A,B,D
I'd like to rewrite them with TypORM query builder, but I can't figure out, how to replace CROSS JOIN LATERAL (VALUES(seller),(buyer)) AS C(p) or UNION in my case. TypeORM is pretty poor with examples and doc coverage in this case.
Does there any option with that?
I have seen various methods like .getCount and .distinct(true) which could help me and easily find the solution for one column.
So I understood, that if I want to find the exact number, instead of doc results, I should use .getCount instead of .getMany
But I can't understand, how to select (and unite) values from multiple columns via typeORM to receive distinct values from multiple columns.
I am working with PostgrSQL, so when I am trying:
const query = repository.createQueryBuilder('order')
.distinctOn(['buyer', 'seller'])
.limit(100)
.getMany()
I receive docs with each distinct value in each field, so instead of 3 I get 6 values (3 distinct by column1, and 3 by column2)

Get the list for Super and sub types

Am Having the Tables in SQL server as Super and Sub types like below. Now if i have to get list of Furnitures then how can i get the list?
Furniture table:
Id FurnituretypeId NoofLegs
-------------------------------
1 1 4
2 2 4
FurnitureType table:
Id Name
-----------------
1 chair
2 cot
3 table
Chair Table:
Id Name CansSwing CanDetachable FurnitureId
------------------------------------------------------------
1 Chair1 Y Y 1
Cot Table:
Id Name CotType Storage StorageType FurnitureId
-------------------------------------------------------------------
1 Cot1 Auto Y Drawer 2
How can i get the entire furniture list as some of them are chair and some of them are cot. How can i join the these tables with furniture table and get the list?
Hmmm . . . union all and join?
select cc.*, f.*
from ((select Id, Name, CansSwing, CanDetachable,
NULL as CotType, NULL as Storage, NULL as StorageType, FurnitureId
from chairs
) union all
(select Id, Name, NULL as CansSwing, NULL as CanDetachable,
CotType, Storage, StorageType, FurnitureId
from cots
)
) cc join
furniture f
on cc.furnitureid = f.id;
This is a classical learning problem, that's why I'm not giving you the code to solve this but all the insights you need to do so.
You have multiple approaches possible, but I'm describing two simple ones:
1) Use the UNION statement to join two separate queries one for Chair and the other for Cot, bare in mind that both SELECT have to return the same structure.
SELECT
a1,
a2,
etc..
FROM table1 a1
JOIN table2 a2 ON a1.some = a2.some
UNION
SELECT
a1,
a3,
etc..
FROM table1 a1
JOIN table3 a3 ON a1.some = a3.some
2) You can do it all in one SELECT statement using a LEFT JOIN to both tables and and in the select using COALESCE or ISNULL to get the values for one table or the other. In the WHERE condition you have to force one or the other join to be not null.
SELECT
a1,
COALESCE(a2,a3) as col2
FROM table1
LEFT JOIN table2 a2 ON a1.some = a2.some
LEFT JOIN table3 a3 ON a1.some = a3.some
WHERE
a2.some IS NOT NULL
OR a3.some IS NOT NULL
Mapping objects into relational models takes a degree of understanding of what is possible vs. what is wise in an RDBMS. Object oriented database systems tried to go after problems like this (generally without much success) precisely because the problem statement is arguably not the right one.
Please consider just putting all of these in one table. Then use null for the fields that don't really matter for each sub-type. You will likely end up being a lot happier in the end since you can spend less time at runtime doing joins and instead just query the information you need and use indexing on the same table to find the fasted path for each query you want to run.
SELECT * FROM CombinedTable;

Retrieving duplicate values in column SQL database

I have a small database which contains a table that holds information in each row on a Movie (e.g, Movie Name, Movie Runtime, Movie Rating) and I also have a separate Genre table which contains a list of genres (Horror, Action etc).
I have an association table which links a movie to a genre (a typical row will contain the unique Id for that row, the genreId and the movieId).
I have written a query which pulls back all the genres a user has watched; however, it is removing the duplicate row values and is giving me what seems to be a distinct count.
Below is the SQL statement:
SELECT g.Type,
g.Id
FROM GenreTable g
WHERE
g.Id in (
SELECT gma.GenreId
FROM MovieGenreAssociationTable gma
WHERE gma.MovieId in (
SELECT uma.MovieSeriesId
FROM UserMovieAssociationTable uma
WHERE uma.UserId = '1'
)
);
This returns all of the genres a user has watched, but I'm noticing that it's not bringing back the duplicates which I know exist in the association table.
How do I get those duplicates?
You are not making a JOIN but a SELECT on a single table, so it will never return any duplicates unless they exist in GenreTable.
If you do something like SELECT a FROM tbl WHERE b IN (1,1,1,1,1), it will return only one row -- not five. And even if you have a complicated WHERE there, it's still a simple IN clause.
update: quick and dirty refresher on JOINs.
I'd actually suggest you look for a SQL tutorial. I make no claim about the completeness of this note - rather, the contrary. First google hit, second hit, etc.
Say that you have two simple tables:
a.id a.a b.id b.b
1 1 1 'Hello'
2 1 2 'World'
3 2 7 'foobar'
4 3
If you run a JOIN between a and b, ON(a.a = b.id), the query will select all records in a; each of them will be then joined to all matching records in b. This is what JOIN is for.
In this case, the second and third columns will always be equal:
1 1 1 'Hello'
2 1 1 'Hello'
3 2 2 'World'
Notice that the fourth row of a is discarded because it has no matches, and the third row of b is never selected at all. The second row of b is selected twice, because there are two elements of a which have a match.
A LEFT JOIN works the same, except that if there are no matches for the left side of the query (i.e. table a), as it happens for the fourth row, that row is selected all the same; but the extra fields that would have come from b are replaced by NULLs. You get a further row for which the JOIN clause, ON(a.a = b.id), is actually false:
4 3 NULL NULL
(And you can use this to select the rows of a that have no matches in b: just specify e.g. WHERE b.primary_key_of_b IS NULL).
Your case
You should do something like:
SELECT
g.Type,
g.Id
FROM GenreTable AS g
JOIN MovieGenreAssociationTable AS gma ON (gma.GenreId = g.Id)
JOIN UserMovieAssociationTable AS uma ON (uma.MovieSeriesId = gma.MovieId)
WHERE uma.UserId = 1;
You can then GROUP BY e.g. Type and Id to get the COUNT() of movies watched for each genre.
But...
Say that you have a GenreTable with two rows (Id=123, Type="Science Fiction" and Id=456, Type="Comedy"), a Movie table with one row (777, "Galactic Quest"), a MovieGenreAssociationTable with (123, 777) and (456, 777) because that movie is a great comedy too, and finally user 1 watched only movie 777. You would get:
Genre gma uma Movie
123 "Science Fiction" 123, 777 777, 1 777, "Galaxy Quest"
456 "Comedy" 456, 777 777, 1 777, "Galaxy Quest"
and would see that user 1 has seen two movies - one SciFi, one Comedy.
In this case you need to either accept the result (how many comedies did he watch? One. How many SciFis? One), or make a more complicated query for which you must decide which is the main genre. Otherwise you would get illogical results ("How many comedies? One. How many movies? One. Then number of non-comedies is one minus one, ie, zero? No, it is again one - wait, what?").
In this case you could add a column for this purpose in MovieGenreAssociation, a boolean column "IsMainGenre". So when you want to know how many comedies one watched, you would do as above. But when you split movies by genre, you add AND IsMainGenre=1 and you calculate "Galaxy Quest" among SciFis, but not among comedies or parodies.

in sql how to return single row of data from more than one row in the same table

I have a single table of activities, some labelled 'Assessment' (type_id of 50) and some 'Counselling' (type_id of 9) with dates of the activities. I need to compare these dates to find how long people wait for counselling after assessment. The table contains rows for many people, and that is the primary key of 'id'. My problem is how to produce a result row with both the assessment details and the counselling details for the same person, so that I can compare the dates. I've tried joining the table to itself, and tried nested subqueries, I just can't fathom it. I'm using Access 2010 btw.
Please forgive my stupidity, but here's an example of joining the table to itself that doesn't work, producing nothing (not surprising):
Table looks like:
ID TYPE_ID ACTIVITY_DATE_TIME
----------------------------------
1 9 20130411
1 v 50 v 20130511
2 9 20130511
3 9 20130511
In the above the last two rows have only had assessment so I want to ignore them, and just work on the situation where there's both assessment and counselling 'type-id'
SELECT
civicrm_activity.id, civicrm_activity.type_id,
civicrm_activity.activity_date_time,
civicrm_activity_1.type_id,
civicrm_activity_1.activity_date_time
FROM
civicrm_activity INNER JOIN civicrm_activity AS civicrm_activity_1
ON civicrm_activity.id = civicrm_activity_1.id
WHERE
civicrm_activity.type_id=9
AND civicrm_activity_1.type_id=50;
I'm actually wondering whether this is in fact not possible to do with SQL? I hope it is possible? Thank you for your patience!
Sounds to me like you only want to get the ID numbers where you have a TYPE_ID entry of both 9 and 50.
SELECT DISTINCT id FROM civicrm_activity WHERE type_id = '9' AND id IN (SELECT id FROM civicrm_activity WHERE type_id = '50');
This will give you a list of id's that has entries with both type_id 9 and 50. With that list you can now go and get the specifics.
Use this SQL for the time of type_id 9
SELECT activity_date_time FROM civicrm_activity WHERE id = 'id_from_last_sql' AND type_id = '9'
Use this SQL for the time of type_id 50
SELECT activity_date_time FROM civicrm_activity WHERE id = 'id_from_last_sql' AND type_id = '50'
Your query looks OK to me, too. The one problem might be that you use only one table alias. I don't know, but perhaps Access treats the table name "specially" such that, in effect, the WHERE clause says
WHERE
civicrm_activity.type_id=9
AND civicrm_activity.type_id=50;
That would certainly explain zero rows returned!
To fix that, use an alias for each table. I suggest shorter ones,
SELECT A.id, A.type_id, A.activity_date_time,
B.type_id, B.activity_date_time
FROM civicrm_activity as A
JOIN civicrm_activity as B
ON A.id = B.id
WHERE A.type_id=9
AND B.type_id=50;

I DISTINCTly hate MySQL (help building a query)

This is staight forward I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT 'location' FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but the whole row for each match.
My best guess is something like SELECT * from (SELECT DISTINCT 'location' FROM myTable) or something like that, but it says I have a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?
SELECT * FROM myTable GROUP BY `location`
or if you want to move to another table
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`
Distinct means for the entire row returned. So you can simply use
SELECT DISTINCT * FROM myTable GROUP BY 'location'
Using Distinct on a single column doesn't make a lot of sense. Let's say I have the following simple set
-id- -location-
1 store
2 store
3 home
if there were some sort of query that returned all columns, but just distinct on location, which row would be returned? 1 or 2? Should it just pick one at random? Because of this, DISTINCT works for all columns in the result set returned.
Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row, there's no meaning to moving the original rows from the original table into a new table since those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: Just SELECT DISTINCT * FROM YourTable).
If you don't care which values come from the other columns you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion.
Multiple rows with identical values in all columns don't have any sense. OK - the question might be a way to correct exactly that situation.
Considering this table, with id being the PK:
kram=# select * from foba;
id | no | name
----+----+---------------
2 | 1 | a
3 | 1 | b
4 | 2 | c
5 | 2 | a,b,c,d,e,f,g
you may extract a sample for every single no (:=location) by grouping over that column, and selecting the row with minimum PK (for example):
SELECT * FROM foba WHERE id IN (SELECT min (id) FROM foba GROUP BY no);
id | no | name
----+----+------
2 | 1 | a
4 | 2 | c