Select if data has two instances with different values (SQL)

I have a table with multiple instances of each title, some hardcover (h) and some paperback (p):
title | type
-----------------------------+------
Franklin in the Dark | p
Little Women | p
The Cat in the Hat | p
Dune | p
The Shining | p
Programming Python | p
Goodnight Moon | p
2001: A Space Odyssey | h
Dynamic Anatomy | p
Bartholomew and the Oobleck | p
The Cat in the Hat | h
Dune | h
The Velveteen Rabbit | p
The Shining | h
The Tell-Tale Heart | p
2001: A Space Odyssey | p
I'm trying to return the titles that have both paperback and hardcover copies.
The query should ideally return only 4 titles.
Edit: these are part of two different tables.
7808 | The Shining | 4156 | 9
4513 | Dune | 1866 | 15
4267 | 2001: A Space Odyssey | 2001 | 15
1608 | The Cat in the Hat | 1809 | 2
1590 | Bartholomew and the Oobleck | 1809 | 2
25908 | Franklin in the Dark | 15990 | 2
0385121679 | 7808 | 2 | 75 | 1993-10-01 | h
1885418035 | 156 | 1 | 163 | 1995-03-28 | p
0929605942 | 156 | 2 | 171 | 1998-12-01 | p
0441172717 | 4513 | 2 | 99 | 1998-09-01 | p
044100590X | 4513 | 3 | 99 | 1999-10-01 | h
0451457994 | 4267 | 3 | 101 | 2000-09-12 | p
0451198492 | 4267 | 3 | 101 | 1999-10-01 | h
0823015505 | 2038 | 1 | 62 | 1958-01-01 | p
0596000855 | 41473 | 2 | 113 | 2001-03-01 | p

This could also work.
SELECT TITLE
FROM BOOKS
GROUP BY TITLE
HAVING COUNT(DISTINCT TYPE) > 1
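As a quick check of the COUNT(DISTINCT ...) idea, here is a minimal sketch that loads the question's title/type listing into an in-memory SQLite database through Python's sqlite3 module. The single-table layout, the table name books, and the one extra duplicate-paperback row are assumptions made for illustration; they are not part of the original schema.

```python
import sqlite3

# Sample rows copied from the question's title/type listing.
rows = [
    ("Franklin in the Dark", "p"), ("Little Women", "p"),
    ("The Cat in the Hat", "p"), ("Dune", "p"), ("The Shining", "p"),
    ("Programming Python", "p"), ("Goodnight Moon", "p"),
    ("2001: A Space Odyssey", "h"), ("Dynamic Anatomy", "p"),
    ("Bartholomew and the Oobleck", "p"), ("The Cat in the Hat", "h"),
    ("Dune", "h"), ("The Velveteen Rabbit", "p"), ("The Shining", "h"),
    ("The Tell-Tale Heart", "p"), ("2001: A Space Odyssey", "p"),
    # Hypothetical extra row (not in the question): a second paperback edition.
    ("Dynamic Anatomy", "p"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, type TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)", rows)


def titles(having):
    """Return the titles whose group satisfies the given HAVING condition."""
    sql = f"SELECT title FROM books GROUP BY title HAVING {having} ORDER BY title"
    return [r[0] for r in conn.execute(sql)]


# Titles that exist in more than one distinct format.
both_formats = titles("COUNT(DISTINCT type) > 1")
# Titles that merely appear more than once, whatever the format.
any_duplicate = titles("COUNT(*) > 1")

print(both_formats)
print(any_duplicate)
```

With the hypothetical duplicate row present, the COUNT(*) variant also picks up the title that only has two paperback editions, which is why counting distinct types is the safer condition when a title can have several editions of the same type.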

There are a couple of ways of doing this. If you just want the titles of the books that have both a hardcover and a paperback (I'm assuming those are the only two options, and that each title has at most one row per type), you can do a query like this:
select title, count(*) from book group by title having count(*) > 1
You also could join to the table.
select t0.title from
(
select title from book where btype = 'h'
) t0
inner join
(
select title from book where btype = 'p'
) t1 on t0.title = t1.title
Edited for the two tables
select * from table_one where bookid in (
select t0.bookid
from
(
select bookid from table_two where type = 'h'
) t0
inner join
(
select bookid from table_two where type = 'p'
) t1
on t0.bookid = t1.bookid
)

Does this work for you?
SELECT title
FROM table a
WHERE type = 'h' AND
EXISTS (SELECT 1
FROM table
WHERE title = a.title AND
type = 'p')

Related

Select from a concatenation of two columns after a left join

Problem description
Let the tables C and V have those values
>> Table V <<
| UnID | BillID | ProductDesc | Value | ... |
| 1 | 1 | 'Orange Juice' | 3.05 | ... |
| 1 | 1 | 'Apple Juice' | 3.05 | ... |
| 1 | 2 | 'Pizza' | 12.05 | ... |
| 1 | 2 | 'Chocolates' | 9.98 | ... |
| 1 | 2 | 'Honey' | 15.98 | ... |
| 1 | 3 | 'Bread' | 3.98 | ... |
| 2 | 1 | 'Yogurt' | 8.55 | ... |
| 2 | 1 | 'Ice Cream' | 7.05 | ... |
| 2 | 1 | 'Beer' | 9.98 | ... |
| 2 | 2 | 'League of Legends RP' | 40.00 | ... |
>> Table C <<
| UnID | BillID | ClientName | ... |
| 1 | 1 | 'Alexander' | ... |
| 1 | 2 | 'Tom' | ... |
| 1 | 3 | 'Julia' | ... |
| 2 | 1 | 'Tom' | ... |
| 2 | 2 | 'Alexander' | ... |
Table C has the values of each product, each associated with a bill number. Table V has the relationship between the client name and the bill number. However, the bill number is a counter that depends on UnID, which is the store unit ID. That is, each store has its own bill number 1, number 2, etc. Also, the number of bills is not the same for every store.
Solution description
I'm trying to select from C left-joined to V, without success. Because each BillID is dependent on the UnID, I have to make the join on the combination of those two columns.
I've used this script, but it gives me an error:
SELECT
SUM(C.Value),
V.ClientName
FROM
C
LEFT JOIN
V
ON
CONCAT(C.UnID, C.BillID) = CONCAT(V.UnID, V.BillID)
GROUP BY
V.ClientName
SQL Server returns this error: 'CONCAT' is not a recognized built-in function name.
I'm using Microsoft SQL Server 2008 R2.
Is the use of CONCAT wrong? Or is it the way I tried to SELECT? Could you give me a hand?
[Note: The tables I've presented are just for the purpose of explaining my difficulty. If you find any errors in the explanation, please let me know so I can correct them.]
You should be joining on the equality of the UnID and BillID columns in the two tables:
SELECT
c.ClientName,
COALESCE(SUM(v.Value), 0) AS total
FROM C c
LEFT JOIN V v
ON c.UnID = v.UnID AND
c.BillID = v.BillID
GROUP BY
c.ClientName;
In theory you could try joining on CONCAT(UnID, BillID). However, you could run into problems. For example, UnID = 1 with BillID = 23 would, concatenated together, be the same as UnID = 12 and BillID = 3.
Note: We wrap the sum with COALESCE, because should a given client have no entries in the V table, the sum would return NULL, which we then replace with zero.
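The collision described above is easy to reproduce. The following is a small sketch using an in-memory SQLite database from Python; the two rows per table are hypothetical values chosen so that the concatenated keys coincide, and SQLite's || operator stands in for CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE C (UnID INTEGER, BillID INTEGER, ClientName TEXT);
CREATE TABLE V (UnID INTEGER, BillID INTEGER, Value REAL);
-- Hypothetical rows: (1, 23) and (12, 3) both concatenate to '123'.
INSERT INTO C VALUES (1, 23, 'Alexander'), (12, 3, 'Tom');
INSERT INTO V VALUES (1, 23, 10.0), (12, 3, 5.0);
""")

# Join on both columns: each bill matches only its own store.
by_columns = conn.execute("""
    SELECT C.ClientName, COALESCE(SUM(V.Value), 0) AS total
    FROM C LEFT JOIN V ON C.UnID = V.UnID AND C.BillID = V.BillID
    GROUP BY C.ClientName ORDER BY C.ClientName
""").fetchall()

# Join on the concatenated key: the two distinct bills are confused.
by_concat = conn.execute("""
    SELECT C.ClientName, COALESCE(SUM(V.Value), 0) AS total
    FROM C LEFT JOIN V ON (C.UnID || C.BillID) = (V.UnID || V.BillID)
    GROUP BY C.ClientName ORDER BY C.ClientName
""").fetchall()

print(by_columns)  # each client gets only their own bill
print(by_concat)   # each client is credited with both bills
```

Comparing the columns directly also lets the database use indexes on UnID and BillID, which a computed string key generally prevents.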
CONCAT is only available in SQL Server 2012 and later.
Here's one option for earlier versions:
SELECT
SUM(C.Value),
V.ClientName
FROM
C
LEFT JOIN
V
ON
-- the '-' separator keeps (1, 23) and (12, 3) from producing the same key
cast(C.UnID as varchar(100)) + '-' + cast(C.BillID as varchar(100)) = cast(V.UnID as varchar(100)) + '-' + cast(V.BillID as varchar(100))
GROUP BY
V.ClientName

How to print the students name in this query?

The concerned tables are as follows:
students(rollno, name, deptcode)
depts(deptcode, deptname)
course(crs_rollno, crs_name, marks)
The query is
Find the name and roll number of the students from each department who obtained
highest total marks in their own department.
Assume:
i) Courses of different departments are different.
ii) All students of a particular department take the same number of courses and the same courses.
Only then does the query make sense.
I wrote a successful query for displaying the maximum total marks by a student in each department.
select do.deptname, max(x.marks) from students so
inner join depts do
on do.deptcode=so.deptcode
inner join(
select s.name as name, d.deptname as deptname, sum(c.marks) as marks from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name and x.deptname=do.deptname group by do.deptname;
But as mentioned, I need to display the name as well. If I include so.name in the select list, I also need to include it in the GROUP BY clause, and the output is as below:
Kendra Summers Computer Science 274
Stewart Robbins English 80
Cole Page Computer Science 250
Brian Steele English 83
expected output:
Kendra Summers Computer Science 274
Brian Steele English 83
Where is the problem?
I guess this can be easily achieved if you use a window function:
select name, deptname, marks
from (select s.name as name, d.deptname as deptname, sum(c.marks) as marks,
row_number() over(partition by d.deptname order by sum(c.marks) desc) rn
from students s
inner join crs_regd c on s.rollno=c.crs_rollno
inner join depts d on d.deptcode=s.deptcode
group by s.name,d.deptname) x
where rn = 1;
To solve the problem with a readable query I had to define a couple of views:
total_marks: For each student the sum of their marks
create view total_marks as select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno;
dept_max: For each department the highest total score by a single student of that department
create view dept_max as select deptcode, max(total) max_total from total_marks group by deptcode;
So I can get the desired output with the query
select a.deptcode, a.rollno, a.name from total_marks a join dept_max b on a.deptcode = b.deptcode and a.total = b.max_total
If you don't want to use views, you can substitute their SELECT statements into the final query, which results in this:
select a.deptcode, a.rollno, a.name
from
(select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a
join (select deptcode, max(total) max_total from (select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a_ group by deptcode) b
on a.deptcode = b.deptcode and a.total = b.max_total
I'm sure its performance could easily be improved by someone more skilled than me...
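For anyone who wants to try the join-against-the-department-maximum idea without creating views, here is a minimal sketch that expresses the two views as common table expressions in an in-memory SQLite database. The student names and totals (274, 250, 80, 83) come from the question's sample output; the course names and the per-course split of the marks are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depts (deptcode INTEGER PRIMARY KEY, deptname TEXT);
CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT, deptcode INTEGER);
CREATE TABLE course (crs_rollno INTEGER, crs_name TEXT, marks INTEGER);
INSERT INTO depts VALUES (1, 'Computer Science'), (2, 'English');
INSERT INTO students VALUES (1, 'Kendra Summers', 1), (2, 'Cole Page', 1),
                            (3, 'Stewart Robbins', 2), (4, 'Brian Steele', 2);
-- Hypothetical per-course marks; only the totals come from the question.
INSERT INTO course VALUES
    (1, 'Algorithms', 90), (1, 'Databases', 92), (1, 'Networks', 92),
    (2, 'Algorithms', 80), (2, 'Databases', 85), (2, 'Networks', 85),
    (3, 'Poetry', 80), (4, 'Poetry', 83);
""")

top_students = conn.execute("""
    WITH total_marks AS (      -- one row per student with the summed marks
        SELECT s.rollno, s.name, s.deptcode, SUM(c.marks) AS total
        FROM students s JOIN course c ON c.crs_rollno = s.rollno
        GROUP BY s.rollno, s.name, s.deptcode
    ),
    dept_max AS (              -- best total within each department
        SELECT deptcode, MAX(total) AS max_total
        FROM total_marks GROUP BY deptcode
    )
    SELECT t.name, d.deptname, t.total
    FROM total_marks t
    JOIN dept_max m ON m.deptcode = t.deptcode AND m.max_total = t.total
    JOIN depts d ON d.deptcode = t.deptcode
    ORDER BY d.deptname
""").fetchall()

for row in top_students:
    print(row)
```

One property of this formulation is that students who tie for the top total in a department are all returned, whereas a ROW_NUMBER()-based filter keeps only one of them.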
If you (and anybody else) want to try it the way I did, here is the schema:
create table depts ( deptcode int primary key auto_increment, deptname varchar(20) );
create table students ( rollno int primary key auto_increment, name varchar(20) not null, deptcode int, foreign key (deptcode) references depts(deptcode) );
create table course ( crs_rollno int, crs_name varchar(20), marks int, foreign key (crs_rollno) references students(rollno) );
And here are all the entries I inserted:
insert into depts (deptname) values ("Computer Science"),("Biology"),("Fine Arts");
insert into students (name,deptcode) values ("Turing",1),("Jobs",1),("Tanenbaum",1),("Darwin",2),("Mendel",2),("Bernard",2),("Picasso",3),("Monet",3),("Van Gogh",3);
insert into course (crs_rollno,crs_name,marks) values
(1,"Algorithms",25),(1,"Database",28),(1,"Programming",29),(1,"Calculus",30),
(2,"Algorithms",24),(2,"Database",22),(2,"Programming",28),(2,"Calculus",19),
(3,"Algorithms",21),(3,"Database",27),(3,"Programming",23),(3,"Calculus",26),
(4,"Zoology",22),(4,"Botanics",28),(4,"Chemistry",30),(4,"Anatomy",25),(4,"Pharmacology",27),
(5,"Zoology",29),(5,"Botanics",27),(5,"Chemistry",26),(5,"Anatomy",25),(5,"Pharmacology",24),
(6,"Zoology",18),(6,"Botanics",19),(6,"Chemistry",22),(6,"Anatomy",23),(6,"Pharmacology",24),
(7,"Sculpture",26),(7,"History",25),(7,"Painting",30),
(8,"Sculpture",29),(8,"History",24),(8,"Painting",30),
(9,"Sculpture",21),(9,"History",19),(9,"Painting",25) ;
Those inserts will load these data:
select * from depts;
+----------+------------------+
| deptcode | deptname |
+----------+------------------+
| 1 | Computer Science |
| 2 | Biology |
| 3 | Fine Arts |
+----------+------------------+
select * from students;
+--------+-----------+----------+
| rollno | name | deptcode |
+--------+-----------+----------+
| 1 | Turing | 1 |
| 2 | Jobs | 1 |
| 3 | Tanenbaum | 1 |
| 4 | Darwin | 2 |
| 5 | Mendel | 2 |
| 6 | Bernard | 2 |
| 7 | Picasso | 3 |
| 8 | Monet | 3 |
| 9 | Van Gogh | 3 |
+--------+-----------+----------+
select * from course;
+------------+--------------+-------+
| crs_rollno | crs_name | marks |
+------------+--------------+-------+
| 1 | Algorithms | 25 |
| 1 | Database | 28 |
| 1 | Programming | 29 |
| 1 | Calculus | 30 |
| 2 | Algorithms | 24 |
| 2 | Database | 22 |
| 2 | Programming | 28 |
| 2 | Calculus | 19 |
| 3 | Algorithms | 21 |
| 3 | Database | 27 |
| 3 | Programming | 23 |
| 3 | Calculus | 26 |
| 4 | Zoology | 22 |
| 4 | Botanics | 28 |
| 4 | Chemistry | 30 |
| 4 | Anatomy | 25 |
| 4 | Pharmacology | 27 |
| 5 | Zoology | 29 |
| 5 | Botanics | 27 |
| 5 | Chemistry | 26 |
| 5 | Anatomy | 25 |
| 5 | Pharmacology | 24 |
| 6 | Zoology | 18 |
| 6 | Botanics | 19 |
| 6 | Chemistry | 22 |
| 6 | Anatomy | 23 |
| 6 | Pharmacology | 24 |
| 7 | Sculpture | 26 |
| 7 | History | 25 |
| 7 | Painting | 30 |
| 8 | Sculpture | 29 |
| 8 | History | 24 |
| 8 | Painting | 30 |
| 9 | Sculpture | 21 |
| 9 | History | 19 |
| 9 | Painting | 25 |
+------------+--------------+-------+
I'll take the chance to point out that this database is badly designed. This becomes evident with the course table, for these reasons:
The name is singular
This table does not represent courses, but rather exams or scores
crs_name should be a foreign key referencing the primary key of another table (one that would actually represent the courses)
There are no constraints to limit the marks to a range or to prevent a student from taking the same exam twice
I find it more logical to associate courses with departments, instead of students with departments (this would also make these queries easier)
I tell you this because I understand you are learning from a book, so unless the book at some point says "this database is poorly designed", do not take this exercise as an example for designing your own!
Anyway, if you manually resolve the query with my data you will come to these results:
+----------+--------+--------+
| deptcode | rollno | name |
+----------+--------+--------+
| 1 | 1 | Turing |
| 2 | 4 | Darwin |
| 3 | 8 | Monet |
+----------+--------+--------+
For further reference, here are the contents of the views I needed to define:
select * from total_marks;
+----------+-----------+--------+-------+
| deptcode | name | rollno | total |
+----------+-----------+--------+-------+
| 1 | Turing | 1 | 112 |
| 1 | Jobs | 2 | 93 |
| 1 | Tanenbaum | 3 | 97 |
| 2 | Darwin | 4 | 132 |
| 2 | Mendel | 5 | 131 |
| 2 | Bernard | 6 | 106 |
| 3 | Picasso | 7 | 81 |
| 3 | Monet | 8 | 83 |
| 3 | Van Gogh | 9 | 65 |
+----------+-----------+--------+-------+
select * from dept_max;
+----------+-----------+
| deptcode | max_total |
+----------+-----------+
| 1 | 112 |
| 2 | 132 |
| 3 | 83 |
+----------+-----------+
Hope I helped!
Try the following query
select a.name, b.deptname,c.marks
from students a
, crs_regd b
, depts c
where a.rollno = b.crs_rollno
and a.deptcode = c.deptcode
and(c.deptname,b.marks) in (select do.deptname, max(x.marks)
from students so
inner join depts do
on do.deptcode=so.deptcode
inner join (select s.name as name
, d.deptname as deptname
, sum(c.marks) as marks
from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name
and x.deptname=do.deptname
group by do.deptname
)
The inner subquery fetches the department name and the maximum total marks, and the outer query gets the name of the corresponding student.
Try it and let me know if you get the desired result.
The DENSE_RANK() function would be helpful in this scenario:
SELECT subquery.*
FROM (SELECT Student_Total_Marks.rollno,
Student_Total_Marks.name,
Student_Total_Marks.deptcode,
depts.deptname,
-- DENSE_RANK gives every student tied for the top total the rank 1
DENSE_RANK() OVER (PARTITION BY Student_Total_Marks.deptcode
ORDER BY Student_Total_Marks.total_marks DESC) AS Student_Rank
FROM (SELECT stud.rollno,
stud.name,
stud.deptcode,
SUM(course.marks) AS total_marks
FROM students stud
INNER JOIN course ON stud.rollno = course.crs_rollno
GROUP BY stud.rollno, stud.name, stud.deptcode) Student_Total_Marks
INNER JOIN depts ON Student_Total_Marks.deptcode = depts.deptcode) subquery
WHERE subquery.Student_Rank = 1

SQL Question Looking Up Value in Same Table

Trying to use a self join in SQL to look up a value in the table and apply it.
Here's what I got:
Actual output:
+-------+-----+--------+-----------+
| TRKID | Fac | NewFac | BAG_TRKID |
+-------+-----+--------+-----------+
| 449 | 11 | 11 | 999 |
| 473 | 11 | 11 | 737 |
| 477 | 11 | 11 | 737 |
| 482 | 11 | 11 | 737 |
| 737 | 89 | 89 | |
+-------+-----+--------+-----------+
Desired output:
+-------+-----+--------+-----------+
| TRKID | Fac | NewFac | BAG_TRKID |
+-------+-----+--------+-----------+
| 449 | 11 | 11 | 999 |
| 473 | 11 | 89 | 737 |
| 477 | 11 | 89 | 737 |
| 482 | 11 | 89 | 737 |
| 737 | 89 | 89 | |
+-------+-----+--------+-----------+
Here's the code below. I can't seem to get the table that I want. The Bag TrkID's Facility Num is not becoming the TrkID New Facility Num.
Select
TABLEA.TRKID,
TABLEA.FAC,
NVL(TABLEA.FAC, TABLEB.FAC) as NEWFAC,
TABLEA.BAG_TRKID
FROM
(
Select
HSD. TRKID,
HSD.NLPT as FAC,
SBPD.BAG_TRKID
From
HSD
LEFT JOIN
SBPD
ON
SBPD.BAG_TRKID = HSD. TRKID
Where
HSD.SCANDT BETWEEN ‘Yesterday’ and ‘Today’
) TABLEA
LEFT JOIN
(
Select
HSD. TRKID,
HSD.NLPT as FAC,
SBPD.BAG_TRKID
From
HSD
LEFT JOIN
SBPD
ON
SBPD.BAG_TRKID = HSD. TRKID
Where
HSD.SCANDT BETWEEN ‘Yesterday’ and ‘Today’
) TABLEB
ON
TABLEA.TRKID = TABLEB.BAG_TRKID
Perhaps something like
select a.TrkID, a."Facility Number", a.BAG_TRKID, b.TrkID as "NEW Fac"
from tbl a
left join tbl b on (a.TrkID = b.trk_id_reference)
Given the limited information that you've shared, I was able to achieve the expected output with the following query:
SELECT a.TrkID, a.facility_number, a.bag_trkid, b.facility_number as new_facility_number
FROM test_tbl AS a
LEFT JOIN test_tbl AS b ON a.bag_trkid = b.trkid OR (a.bag_trkid IS NULL AND b.trkid = a.trkid);
You want to get the new_facility_number for a row based on its bag_trkid (which can be achieved by this: LEFT JOIN test_tbl AS b ON a.bag_trkid = b.trkid).
BUT the trick is to account for the cases when the Left Table (which I refer as a) does not have a bag_trkid. In this case, we will keep the new_facility_number to be the same as a.facility_number, joining the tables on the trkid solely: OR (a.bag_trkid IS NULL AND b.trkid = a.trkid)
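A variation on the self-join above, sketched here under assumptions, is to keep the plain join on bag_trkid and fall back to the row's own facility with COALESCE. That also covers a bag_trkid such as 999 that has no row of its own in the sample. The table and column names follow the answer's test_tbl; the data is the question's sample, loaded into an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_tbl (trkid INTEGER, facility_number INTEGER, bag_trkid INTEGER);
INSERT INTO test_tbl VALUES
    (449, 11, 999), (473, 11, 737), (477, 11, 737),
    (482, 11, 737), (737, 89, NULL);
""")

rows = conn.execute("""
    SELECT a.trkid,
           a.facility_number,
           -- use the bag's facility when the bag row exists, else keep our own
           COALESCE(b.facility_number, a.facility_number) AS new_facility_number,
           a.bag_trkid
    FROM test_tbl AS a
    LEFT JOIN test_tbl AS b ON a.bag_trkid = b.trkid
    ORDER BY a.trkid
""").fetchall()

for row in rows:
    print(row)
```

Note the argument order: the looked-up value comes first in COALESCE, so the row's own facility is only used when no bag row is found.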

SQL: Cascading conditions on Join

I have found a few similar questions to this on SO but nothing which applies to my situation.
I have a large dataset with hundreds of millions of rows in Table 1 and am looking for the most efficient way to run the following query. I am using Google BigQuery but I think this is a general SQL question applicable to any DBMS?
I need to apply an owner to every row in Table 1. I want to join in the following priority:
1: if item_id matches an identifier in Table 2
2: if no item_id matches try match on item_name
3: if no item_id or item_name matches try match on item_division
4: if no item_division matches, return null
Table 1 - Datapoints:
| id | item_id | item_name | item_division | units | revenue
|----|---------|-----------|---------------|-------|---------
| 1 | xyz | pen | UK | 10 | 100
| 2 | pqr | cat | US | 15 | 120
| 3 | asd | dog | US | 12 | 105
| 4 | xcv | hat | UK | 11 | 140
| 5 | bnm | cow | UK | 14 | 150
Table 2 - Identifiers:
| id | type | code | owner |
|----|---------|-----------|-------|
| 1 | id | xyz | bob |
| 2 | name | cat | dave |
| 3 | division| UK | alice |
| 4 | name | pen | erica |
| 5 | id | xcv | fred |
Desired output:
| id | item_id | item_name | item_division | units | revenue | owner |
|----|---------|-----------|---------------|-------|---------|-------|
| 1 | xyz | pen | UK | 10 | 100 | bob | <- id
| 2 | pqr | cat | US | 15 | 120 | dave | <- name
| 3 | asd | dog | US | 12 | 105 | null | <- none
| 4 | xcv | hat | UK | 11 | 140 | fred | <- id
| 5 | bnm | cow | UK | 14 | 150 | alice | <- division
My attempts so far have involved joining the table onto itself multiple times, and I fear it is becoming hugely inefficient.
Any help much appreciated.
Another option for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(a)[OFFSET(0)].*,
ARRAY_AGG(owner
ORDER BY CASE
WHEN type = 'id' THEN 1
WHEN type = 'name' THEN 2
WHEN type = 'division' THEN 3
END
LIMIT 1
)[OFFSET(0)] owner
FROM Datapoints a
JOIN Identifiers b
ON (a.item_id = b.code AND b.type = 'id')
OR (a.item_name = b.code AND b.type = 'name')
OR (a.item_division = b.code AND b.type = 'division')
GROUP BY a.id
ORDER BY a.id
It leaves out entries which have no owner, as in the result below (id=3 is omitted because it has no owner):
Row id item_id item_name item_division units revenue owner
1 1 xyz pen UK 10 100 bob
2 2 pqr cat US 15 120 dave
3 4 xcv hat UK 11 140 fred
4 5 bnm cow UK 14 150 alice
I am using the following query (thanks @Barmar) but want to know if there is a more efficient way in Google BigQuery:
SELECT a.*, COALESCE(b.owner,c.owner,d.owner) owner FROM datapoints a
LEFT JOIN identifiers b on a.item_id = b.code and b.type = 'id'
LEFT JOIN identifiers c on a.item_name = c.code and c.type = 'name'
LEFT JOIN identifiers d on a.item_division = d.code and d.type = 'division'
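To see the priority order at work on the sample data, here is a minimal sketch that runs the same three-LEFT-JOIN query against an in-memory SQLite database from Python. It only checks the logic on the question's five rows; it says nothing about how BigQuery would plan or cost the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datapoints (id INTEGER, item_id TEXT, item_name TEXT,
                         item_division TEXT, units INTEGER, revenue INTEGER);
CREATE TABLE identifiers (id INTEGER, type TEXT, code TEXT, owner TEXT);
INSERT INTO datapoints VALUES
    (1, 'xyz', 'pen', 'UK', 10, 100), (2, 'pqr', 'cat', 'US', 15, 120),
    (3, 'asd', 'dog', 'US', 12, 105), (4, 'xcv', 'hat', 'UK', 11, 140),
    (5, 'bnm', 'cow', 'UK', 14, 150);
INSERT INTO identifiers VALUES
    (1, 'id', 'xyz', 'bob'), (2, 'name', 'cat', 'dave'),
    (3, 'division', 'UK', 'alice'), (4, 'name', 'pen', 'erica'),
    (5, 'id', 'xcv', 'fred');
""")

result = conn.execute("""
    SELECT a.id, COALESCE(b.owner, c.owner, d.owner) AS owner
    FROM datapoints a
    LEFT JOIN identifiers b ON a.item_id = b.code AND b.type = 'id'
    LEFT JOIN identifiers c ON a.item_name = c.code AND c.type = 'name'
    LEFT JOIN identifiers d ON a.item_division = d.code AND d.type = 'division'
    ORDER BY a.id
""").fetchall()

# COALESCE picks the first non-NULL owner, so id beats name beats division.
owners = [owner for _, owner in result]
print(owners)
```

The row count stays at one per datapoint only as long as each (type, code) pair appears at most once in identifiers; duplicate pairs would multiply rows in any of the three joins.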
I'm not sure whether BigQuery currently optimizes a query like this, but at least you would be writing a query that gives strong hints not to run the subqueries when they are not needed:
#standardSQL
SELECT COALESCE(
null
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT '15229281' user) a
4.2s elapsed, 683 GB processed
{"action":"started"}
For example, the following query took a long time to run, but BigQuery could optimize its execution massively in the future (depending on how frequently users needed an operation like this):
#standardSQL
SELECT COALESCE(
"hello"
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT actor.login user FROM `githubarchive.year.2016` LIMIT 10) a
114.7s elapsed, 683 GB processed
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello

Compare two most recent columns for same ID

I have a table for persons’ addresses. The Class column tells whether it is a home (H), postal (P) or internal (I) address. Each time a person updates the address, the date is stored as well.
The internal address is used by my company when correspondence is returned to us due to an incorrect address, so we update it to our address until we can obtain new contact details.
I’d like to know which persons have changed their address to a postal or home address with an effective date after the effective date of the internal address.
Here’s an example of what the table looks like:
| **PersonID**| **Class** | **EffDate** | **Line1**
| 1 | H | 12/01/2010 | 31 Academy Avenue
| 1 | H | 13/09/2010 | 433 Hillcrest Drive
| 1 | I | 26/10/2015 | 1 Bond Circle
| 2 | H | 17/12/2012 | 761 Circle St.
| 2 | H | 12/11/2013 | 597 Elm Lane
| 2 | I | 1/10/2015 | 1 Bond Circle
| 2 | H | 6/12/2016 | 8332 Mountainview St.
| 3 | P | 27/09/2010 | 8 Bow Ridge Lane
| 3 | H | 6/12/2010 | 22 Shady St.
| 3 | I | 7/12/2015 | 1 Bond Circle
| 3 | H | 8/12/2016 | 7423 Rockcrest Ave.
| 4 | P | 9/12/2015 | 888 N. Shady Street
| 4 | I | 10/12/2016 | 1 Bond Circle
I'd like the query to only return:
| **PersonID**| **Class** | **EffDate** | **Line1**
| 2 | H | 6/12/2016 | 8332 Mountainview St.
| 3 | H | 8/12/2016 | 7423 Rockcrest Ave.
Any ideas? I'm using SQL Server 2012.
Here is one method:
select t.*
from t
where t.class in ('H', 'P') and
t.effdate > (select max(t2.effdate)
from t t2
where t2.class = 'I' and t2.personId = t.personId
);
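The correlated-subquery method can be tried on the sample rows with the sketch below, which uses an in-memory SQLite database from Python. The table name t follows the answer; storing EffDate as ISO yyyy-mm-dd text (converted from the question's dd/mm/yyyy values) is an assumption made so that plain string comparison orders the dates correctly in SQLite.

```python
import sqlite3

# Sample rows from the question, with EffDate rewritten as yyyy-mm-dd.
addresses = [
    (1, "H", "2010-01-12", "31 Academy Avenue"),
    (1, "H", "2010-09-13", "433 Hillcrest Drive"),
    (1, "I", "2015-10-26", "1 Bond Circle"),
    (2, "H", "2012-12-17", "761 Circle St."),
    (2, "H", "2013-11-12", "597 Elm Lane"),
    (2, "I", "2015-10-01", "1 Bond Circle"),
    (2, "H", "2016-12-06", "8332 Mountainview St."),
    (3, "P", "2010-09-27", "8 Bow Ridge Lane"),
    (3, "H", "2010-12-06", "22 Shady St."),
    (3, "I", "2015-12-07", "1 Bond Circle"),
    (3, "H", "2016-12-08", "7423 Rockcrest Ave."),
    (4, "P", "2015-12-09", "888 N. Shady Street"),
    (4, "I", "2016-12-10", "1 Bond Circle"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (personid INTEGER, class TEXT, effdate TEXT, line1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", addresses)

# Home or postal addresses dated after the person's latest internal address.
changed = conn.execute("""
    SELECT t.personid, t.class, t.effdate, t.line1
    FROM t
    WHERE t.class IN ('H', 'P')
      AND t.effdate > (SELECT MAX(t2.effdate)
                       FROM t t2
                       WHERE t2.class = 'I' AND t2.personid = t.personid)
    ORDER BY t.personid
""").fetchall()

for row in changed:
    print(row)
```

A person with no internal address at all is excluded, because the subquery returns NULL and the comparison is then not true.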
Another method that uses window functions:
select t.*
from (select t.*,
max(case when class = 'I' then effdate end) over (partition by personid) as max_idate
from t
) t
where t.class in ('H', 'P') and
t.effdate > t.max_idate;