How to select faulty records? - sql

I'm investigating an error in one of the tables of our geographical database. In the table below, each DistrictName should always map to the same DisId (e.g. Bronx = 11, Manhatten = 14), but some records have a different DisId while still sharing the same DistrictName.
Id  DistrictName  DisId  Section
------------------------------------------------
1   Bronx         11     1
2   Bronx         11     2
3   Brooklyn      12     1
4   Brooklyn      13     2    //wrong
5   Manhatten     14     1
6   Manhatten     14     2
7   Queens        15     1
8   Queens        16     2    //wrong
9   Queens        17     3    //wrong
How can I select all faulty records in a query?
There is always a Section 1, so records with a section > 1 containing the same DistrictName but having a deviating DisId are the ones I'm looking for.
I've tried using a group by (districtname) but I'm having difficulties comparing with the section1 record. I'm kind of lost when it comes to putting the logic in the having or where clause. Any help appreciated!

select *
from your_table
where section > 1
  and districtname in
  (
    select districtname
    from your_table
    group by districtname
    having count(distinct disid) > 1
  )
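If only the rows that deviate from their Section 1 record should be flagged (for example when section 2 is fine but section 3 is not), a self-join against the Section 1 row is one option. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the table name districts is a placeholder and not a name from the question.

```python
import sqlite3

# In-memory copy of the question's sample rows; "districts" is a placeholder name.
rows = [
    (1, "Bronx", 11, 1), (2, "Bronx", 11, 2),
    (3, "Brooklyn", 12, 1), (4, "Brooklyn", 13, 2),
    (5, "Manhatten", 14, 1), (6, "Manhatten", 14, 2),
    (7, "Queens", 15, 1), (8, "Queens", 16, 2), (9, "Queens", 17, 3),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE districts "
    "(id INTEGER, districtname TEXT, disid INTEGER, section INTEGER)"
)
conn.executemany("INSERT INTO districts VALUES (?, ?, ?, ?)", rows)

# Join every later section to the section 1 row of the same district and
# keep only the rows whose disid differs from that reference row.
faulty_ids = [r[0] for r in conn.execute("""
    SELECT t.id
    FROM districts AS t
    JOIN districts AS ref
      ON ref.districtname = t.districtname AND ref.section = 1
    WHERE t.section > 1 AND t.disid <> ref.disid
    ORDER BY t.id
""")]
print(faulty_ids)
```

On the sample data this flags the same rows that are marked //wrong in the question.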

Related

Can't merge two queries with different columns

I'm studying SQL and somehow I'm stuck with a question. I have 2 tables ('users' and 'follows').
Follows Table
user_id  follows  date
1        2        1993-09-01
2        1        1989-01-01
3        1        1993-07-01
2        3        1994-10-10
3        2        1995-03-01
4        2        1988-08-08
4        1        1988-08-08
1        4        1994-04-02
1        5        2000-01-01
5        1        2000-01-02
5        6        1986-01-10
7        1        1990-02-02
1        7        1996-10-01
1        8        1993-09-03
8        1        1995-09-01
8        9        1995-09-01
9        8        1996-01-10
7        8        1993-09-01
3        9        1996-05-30
4        9        1996-05-30
Users Table
user_id  first_name  last_name  school
1        Harry       Potter     Gryffindor
2        Ron         Wesley     Gryffindor
3        Hermonie    Granger    Gryffindor
4        Ginny       Weasley    Gryffindor
5        Draco       Malfoy     Slytherin
6        Tom         Riddle     Slytherin
7        Luna        Lovegood   Ravenclaw
8        Cho         Chang      Ravenclaw
9        Cedric      Diggory    Hufflepuff
I need to list all rows from follows where someone from one house (the school column) follows someone from a different house. I tried to make 2 queries, one to get all houses related to follows.user_id and another one with all houses related to follows.follows, and then "merge" them:
select a.nome_id, a.user_id_house, b.follows_id, b.follows_house
from ( select follows.user_id as nome_id
, users.house as user_id_house
from follows inner join users
ON users.user_id = follows.user_id
) as a,
( select follows.follows as follows_id
, users.house as follows_house
from follows inner join users
ON follows.user_id = users.user_id
) as b
where a.user_id_house <> b.follows_house;
The problem is that the result has around 400 rows, which isn't right. I have no idea how to solve this.
Try this:
SELECT follows.user_id, users.school, followers.user_id, followers.school
FROM follows
JOIN users ON follows.user_id = users.user_id
JOIN users AS followers ON follows.follows = followers.user_id
WHERE users.school <> followers.school
Note: pay attention to the naming in my answer.
Thanks to Thorsten Kettner for the correction.
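The double join on users can be tried out on a small sample. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module and only a subset of the question's rows; the aliases f, follower and followed are my own naming, chosen to make clear which side of the relationship each join describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, first_name TEXT, school TEXT);
    CREATE TABLE follows (user_id INTEGER, follows INTEGER, date TEXT);
""")
# A subset of the question's rows: some same-school and some cross-school follows.
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "Harry", "Gryffindor"), (2, "Ron", "Gryffindor"),
    (3, "Hermonie", "Gryffindor"), (5, "Draco", "Slytherin"),
    (6, "Tom", "Slytherin"), (9, "Cedric", "Hufflepuff"),
])
conn.executemany("INSERT INTO follows VALUES (?, ?, ?)", [
    (1, 2, "1993-09-01"), (2, 1, "1989-01-01"), (1, 5, "2000-01-01"),
    (5, 1, "2000-01-02"), (5, 6, "1986-01-10"), (3, 9, "1996-05-30"),
])

# users is joined twice: once for the person following, once for the person followed.
cross_school = conn.execute("""
    SELECT f.user_id, follower.school, f.follows, followed.school
    FROM follows AS f
    JOIN users AS follower ON follower.user_id = f.user_id
    JOIN users AS followed ON followed.user_id = f.follows
    WHERE follower.school <> followed.school
    ORDER BY f.user_id, f.follows
""").fetchall()
for row in cross_school:
    print(row)
```

Each row of follows appears at most once in the result, because both joins are on the users key rather than a cross product of two subqueries.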

SQL Hive add column based on column value

I have a query which looks like
select
number,
class,
unix_timestamp(date1) - unix_timestamp(date2) as time_example,
sum(unix_timestamp(date1) - unix_timestamp(date2)) over(partition by unix_timestamp(date1) - unix_timestamp(date2) order by class) as class_time
from myTable
Which gives results such as
number  class    time_example  class_time
1       math     5             5
1       science  5             10
1       art      5             15
1       math     2             2
1       science  2             4
1       art      2             6
1       math     10            10
1       science  10            20
1       art      10            30
I want to sum the times per class and end up with only 3 rows, because there are only 3 classes. For example, the times for math would be added together to give 17 (5 + 2 + 10). This is the desired table I would like to get:
number  class    class_time
1       math     17
1       science  17
1       art      17
You can do that with group by:
select number, class,
       sum(unix_timestamp(date1) - unix_timestamp(date2)) as class_time
from myTable
group by number, class;
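The grouping itself can be checked outside Hive. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; since unix_timestamp is a Hive function, the sketch stores the already computed difference in a time_example column (an assumption made for illustration) and sums that instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# time_example stands in for unix_timestamp(date1) - unix_timestamp(date2).
conn.execute("CREATE TABLE myTable (number INTEGER, class TEXT, time_example INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", [
    (1, "math", 5), (1, "science", 5), (1, "art", 5),
    (1, "math", 2), (1, "science", 2), (1, "art", 2),
    (1, "math", 10), (1, "science", 10), (1, "art", 10),
])

# One output row per (number, class) pair, with the times added up.
class_times = conn.execute("""
    SELECT number, class, SUM(time_example) AS class_time
    FROM myTable
    GROUP BY number, class
    ORDER BY class
""").fetchall()
print(class_times)
```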

In a game show database scenario, how do I fetch the average total episode score per season in a single query?

Pardon the title gore. I'm having trouble finding a good way to express my question, which is endemic to the problem.
The Tables
season
id    name
----  --------
1     Season 1
2     Season 2
3     Season 3
episode
id    season_id  number  title
----  ---------  ------  ---------------------------------------
1     1          1       Pilot
2     1          2       1x02 - We Got Picked Up
3     1          3       1x03 - This is the Third Episode
4     2          1       2x01 - We didn't get cancelled.
5     2          2       2x02 - We're running out of ideas!
6     3          1       3x01 - We're still here.
7     3          2       3x02 - Okay, this game show is dying.
8     3          3       3x03 - Untitled
score
id    episode_id  score  contestant_id (table not given)
----  ----------  -----  -------------------------------
1     1           35     1
2     1           -12    2
3     1           8      3
4     1           5      4
5     2           13     1
6     2           -2     5
7     2           3      3
8     2           -14    6
9     3           -14.5  1
10    3           -3     2
11    3           1.5    7
12    3           9.5    5
13    4           22.8   1
14    4           -3     8
15    5           2      1
16    5           13.5   9
17    5           7      3
18    6           13     1
19    6           -84    10
20    6           12     11
21    7           3      1
22    7           10     2
23    8           29     1
24    8           1      5
As you can see, you have multiple episodes per season, and multiple scores per episode (one score per contestant). Contestants can reappear in later episodes (irrelevant), scores are floating point values, and there can be an arbitrary number of scores per episode.
So what am I looking for?
I'd like to get the average total episode score per season, where the total episode score is the sum of all the scores in an episode. Mathematically, this comes out to be the sum of all scores in a season divided by the number of episodes. Easy enough to comprehend, but I have had trouble doing it in a single query and getting the correct result. I'd like an output like the following:
name        average_total_episode_score
----------  ---------------------------
Season 1    9.83
Season 2    21.15
Season 3    -5.33
The top-level query needs to be on the season table as it will be combined with other, similar queries on the same table. It's easy enough to do this with an aggregate in a subquery, but an aggregation executes the subquery, failing my single-query requirement. Can this be done in a single query?
I hope this works:
Select s.id, avg(score)
FROM Season S,
Episode e,
Score sc
WHERE s.id = e.season_id
AND e.id = sc.episode_id
Group by s.id
Okay, just figured it out. As usual, I had to write and post a whole book before the simple solution descended upon me.
The problem in my query (which I didn't give in the question) was the lack of a DISTINCT count. Here is a working query:
SELECT
    "season"."id",
    "season"."name",
    (SUM("score"."score") / COUNT(DISTINCT "episode"."id")) AS "average_total_episode_score"
FROM "season"
LEFT OUTER JOIN "episode"
    ON ("season"."id" = "episode"."season_id")
LEFT OUTER JOIN "score"
    ON ("episode"."id" = "score"."episode_id")
GROUP BY "season"."id"
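The DISTINCT count matters because the join yields one row per score, so a plain count would count scores rather than episodes. The following is a minimal sketch of the same query, assuming SQLite through Python's sqlite3 module and only the Season 2 and Season 3 rows from the question; the trimmed-down column lists are a simplification for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE season (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE episode (id INTEGER PRIMARY KEY, season_id INTEGER);
    -- REAL keeps the division from truncating when every score is a whole number.
    CREATE TABLE score (episode_id INTEGER, score REAL);
""")
conn.executemany("INSERT INTO season VALUES (?, ?)",
                 [(2, "Season 2"), (3, "Season 3")])
conn.executemany("INSERT INTO episode VALUES (?, ?)",
                 [(4, 2), (5, 2), (6, 3), (7, 3), (8, 3)])
conn.executemany("INSERT INTO score VALUES (?, ?)", [
    (4, 22.8), (4, -3), (5, 2), (5, 13.5), (5, 7),
    (6, 13), (6, -84), (6, 12), (7, 3), (7, 10), (8, 29), (8, 1),
])

# Sum of all scores in the season divided by the number of distinct episodes.
averages = dict(conn.execute("""
    SELECT season.name,
           SUM(score.score) / COUNT(DISTINCT episode.id)
    FROM season
    LEFT OUTER JOIN episode ON season.id = episode.season_id
    LEFT OUTER JOIN score ON episode.id = score.episode_id
    GROUP BY season.id
    ORDER BY season.id
""").fetchall())
for name, value in averages.items():
    print(name, round(value, 2))
```

Rounded to two decimals, the values agree with the Season 2 and Season 3 rows of the desired output in the question.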
select Se.id AS Season_Id, sum(score) As season_score, avg(score) from score S join episode E ON S.episode_id = E.id
join Season se ON se.id = e.season_id group by se.id

Double Join to get data from a table based on other two tables

I have three tables with the following key fields:
CONTRACTS
reference
package
EVENTS
reference
condition1
condition2
TRADES
reference
event_reference
Basically, what I would like to do is the following:
Get all the references from the EVENTS table where the two conditions (condition1 and condition2) are met;
Then get all the references from the TRADES table where TRADES.event_reference = EVENTS.reference;
Finally, get the CONTRACTS.package where CONTRACTS.reference = TRADES.reference (after having filtered the data in step 2).
In order to do this, I have tried a JOIN statement:
SELECT CONTRACTS.package
FROM CONTRACTS
JOIN TRADES ON CONTRACTS.reference = TRADES.reference
JOIN EVENTS ON TRADES.event_reference = EVENTS.reference
WHERE EVENTS.condition1 = '1.511' AND EVENTS.condition2 IN (1,2)
However, the above (which executes without errors) does not return any rows, and I would expect to see some.
So I understand that my logic is wrong somewhere: could anyone please help?
EDIT: this is an example of what the data looks like (in yellow, I have highlighted the data that the query would touch if it worked as I had in mind):
...here is the expected result:
1 (package of 4, related to 11 which satisfies condition 1 and 2)
2 (package of 6, related to 13 which satisfies condition 1 and 2)
4 (package of 10, related to 16 which satisfies condition 1 and 2)
and here are the data to copy-paste them:
CONTRACTS
reference  package
1          1
2          1
3          1
4          1
5          2
6          2
7          3
8          3
9          4
10         4
EVENTS
reference  condition1  condition2
10         1.511       0
11         1.511       1
12         1.202       0
13         1.511       2
14         1.511       0
15         1.202       0
16         1.511       1
TRADES
reference  event_reference
2          10
4          11
5          12
6          13
7          14
9          15
10         16
Your query looks OK
SQL Fiddle Demo
SELECT CONTRACTS.package
FROM CONTRACTS
JOIN TRADES ON CONTRACTS.reference = TRADES.reference
JOIN EVENTS ON TRADES.event_reference = EVENTS.reference
WHERE EVENTS.condition1 = 'true' AND EVENTS.condition2 = 'true'
OUTPUT
| package |
|---------|
| 1 |
| 2 |
| 4 |
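For comparison, the same join can be run against the copy-paste data from the question, using the question's own conditions ('1.511' and IN (1,2)). This is a minimal sketch, assuming SQLite through Python's sqlite3 module; storing condition1 as text is an assumption made only because the question compares it to a quoted value, and the real column types may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contracts (reference INTEGER, package INTEGER);
    -- condition1 is text here only because the question compares it to '1.511'.
    CREATE TABLE events (reference INTEGER, condition1 TEXT, condition2 INTEGER);
    CREATE TABLE trades (reference INTEGER, event_reference INTEGER);
""")
conn.executemany("INSERT INTO contracts VALUES (?, ?)", [
    (1, 1), (2, 1), (3, 1), (4, 1), (5, 2),
    (6, 2), (7, 3), (8, 3), (9, 4), (10, 4),
])
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (10, "1.511", 0), (11, "1.511", 1), (12, "1.202", 0), (13, "1.511", 2),
    (14, "1.511", 0), (15, "1.202", 0), (16, "1.511", 1),
])
conn.executemany("INSERT INTO trades VALUES (?, ?)", [
    (2, 10), (4, 11), (5, 12), (6, 13), (7, 14), (9, 15), (10, 16),
])

# Same join chain as in the question: contracts -> trades -> events.
packages = [row[0] for row in conn.execute("""
    SELECT contracts.package
    FROM contracts
    JOIN trades ON contracts.reference = trades.reference
    JOIN events ON trades.event_reference = events.reference
    WHERE events.condition1 = '1.511' AND events.condition2 IN (1, 2)
    ORDER BY contracts.package
""")]
print(packages)
```

With these assumed types the join logic yields the expected packages, so if the real query returns nothing, the stored values or column types of condition1 and condition2 may be worth checking.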

SQL comparing two tables with a common id, where the id in table 2 could be in two different columns

Given the following SQL tables:
Administrators:
id  Name      rating
1   Jeff      48
2   Albert    55
3   Ken       35
4   France    56
5   Samantha  52
6   Jeff      50
Meetings:
id  originatorid  Assistantid
1   3             5
2   6             3
3   1             2
4   6             4
I would like to generate a table from Ken's point of view (id=3); his id could be present in either of two different columns of the Meetings table. (A plain IN does not work for me, since two different columns are involved.)
Thus the output would be:
id  originatorid  Assistantid
1   3             5
2   6             3
If you really just need to see which column Ken's id is in, you only need an OR. The following will produce your example output exactly.
SELECT * FROM Meetings WHERE originatorid = 3 OR Assistantid = 3;
If you need to take the complex route and list names along with meetings, an OR in your join's ON clause should work here:
SELECT
    Administrators.name,
    Administrators.id,
    Meetings.originatorid,
    Meetings.Assistantid
FROM Administrators
JOIN Meetings
    ON Administrators.id = Meetings.originatorid
    OR Administrators.id = Meetings.Assistantid
WHERE Administrators.name = 'Ken'
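The simple OR filter can be verified on the sample Meetings rows. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module. It also shows an alternative form in which the constant is placed on the left of IN and the two columns are listed on the right; that form is not part of the answer above and is included only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Meetings (id INTEGER, originatorid INTEGER, Assistantid INTEGER)"
)
conn.executemany("INSERT INTO Meetings VALUES (?, ?, ?)",
                 [(1, 3, 5), (2, 6, 3), (3, 1, 2), (4, 6, 4)])

# Ken (id = 3) may appear in either column, so test both with OR.
with_or = conn.execute(
    "SELECT * FROM Meetings WHERE originatorid = 3 OR Assistantid = 3 ORDER BY id"
).fetchall()

# Equivalent filter: check whether 3 is among the two column values.
with_in = conn.execute(
    "SELECT * FROM Meetings WHERE 3 IN (originatorid, Assistantid) ORDER BY id"
).fetchall()
print(with_or)
print(with_in)
```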