How to eliminate false results [duplicate] - google-bigquery

I am trying to use BigQuery to extract data about the 10 most-mentioned personalities in the leading newspapers in Israel, using this code:
SELECT
person,
COUNT(1) AS count_mentions,
COUNT(DISTINCT url) AS count_distinct_urls
FROM
`composed-hold-309910.dataset_1.israel_media_person`
GROUP BY
person
ORDER BY
count_mentions DESC
LIMIT
10;
Unfortunately, some of the results I've gotten were not actual people but "buzzwords" like 'Maccabi Haifa' and 'Gaza Gaza':
person             | count_mentions | count_distinct_urls
-------------------+----------------+--------------------
Benjamin Netanyahu | 32965          | 20660
Maccabi Haifa      | 16528          | 5947
Gaza Gaza          | 13267          | 7623
I would be delighted to find a way to eliminate these false results.
Matan

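You can filter out these buzzwords with a WHERE clause placed before the GROUP BY: WHERE person NOT IN ([list_buzz_word]), where [list_buzz_word] is a comma-separated list of the strings you want to exclude.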
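As a rough illustration of the NOT IN filter, here is a minimal sketch that runs the question's query against an in-memory SQLite table rather than BigQuery. The sample rows, the BUZZWORDS list, and the bare table name are assumptions made up for the demo; only the two buzzwords come from the question.

```python
import sqlite3

# Hypothetical stop list; the two entries are the buzzwords from the question.
BUZZWORDS = ["Maccabi Haifa", "Gaza Gaza"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE israel_media_person (person TEXT, url TEXT)")
# Made-up sample rows, only to exercise the filter.
conn.executemany(
    "INSERT INTO israel_media_person VALUES (?, ?)",
    [
        ("Benjamin Netanyahu", "u1"),
        ("Benjamin Netanyahu", "u2"),
        ("Maccabi Haifa", "u1"),
        ("Gaza Gaza", "u3"),
    ],
)

# One "?" placeholder per buzzword, so the list is passed as parameters.
placeholders = ", ".join("?" for _ in BUZZWORDS)
query = f"""
    SELECT person,
           COUNT(1) AS count_mentions,
           COUNT(DISTINCT url) AS count_distinct_urls
    FROM israel_media_person
    WHERE person NOT IN ({placeholders})
    GROUP BY person
    ORDER BY count_mentions DESC
    LIMIT 10
"""
rows = conn.execute(query, BUZZWORDS).fetchall()
print(rows)
```

In BigQuery itself the same WHERE line can be written with the literals inline, for example WHERE person NOT IN ('Maccabi Haifa', 'Gaza Gaza').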

Related

Ascending and Decending ORDER BY clauses in a single query [closed]

I'm working on these problems for a class I'm taking, but this one has me stumped. Here is the problem:
--Using the AUTHOR table, write a query that will list all information about authors
--whose first name ends with an “A”. Put the results in descending order of last name,
--and then ascending order by first name. This should be done using a single query.
Here is what I've come up with so far:
SELECT *
FROM author
WHERE(fname LIKE '%A')
ORDER BY lname DESC, fname ASC;
However all I get in the result is the information ordered by last name descending. First name ascending doesn't seem to work.
Any thoughts on what I'm missing? Using Oracle Express 10G, if it matters.
Thanks.
There is nothing wrong with your query. All you have to do is just pay attention to the data :-)
Here is how you would interpret your data output:
lname         | fname
--------------+--------------
zzz | john
zza | adam
zaa | bob
ccc | jack
ccc | john
cca | mike
So, ordering works just as you instruct Oracle (lname desc, fname asc), but you need to realize that fname asc only comes into the picture once lname desc has been processed. In other words: ZZZ comes before ZZA, and only among the rows that share the last name CCC does jack come before john.
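To see the tie-breaking behaviour for yourself, here is a small sketch that loads the six rows from the output above into an in-memory SQLite table (not Oracle, and the table is reduced to the two name columns for the demo) and applies the same ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (lname TEXT, fname TEXT)")
# The same six rows as in the output above, inserted in scrambled order.
conn.executemany(
    "INSERT INTO author VALUES (?, ?)",
    [
        ("ccc", "john"), ("zaa", "bob"), ("ccc", "jack"),
        ("zzz", "john"), ("cca", "mike"), ("zza", "adam"),
    ],
)

# fname ASC only decides the order among rows with the same lname.
rows = conn.execute(
    "SELECT lname, fname FROM author ORDER BY lname DESC, fname ASC"
).fetchall()
for lname, fname in rows:
    print(lname, "|", fname)
```

The only place where the second sort key changes anything is the pair of ccc rows, because those are the only rows that tie on lname.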

Group by a field not in select

I want to find how many modules a lecturer taught in a specific year, and I want to select the name of the lecturer and the number of modules for that lecturer.
The problem is that because I am selecting the name, I have to group by name to make it work. But what if two lecturers have the same name? Then SQL will merge them into one row, and that would be wrong output.
So what I really want to do is select the name but group by ID, which SQL is not allowing me to do. Is there a way around it?
Below are the tables:
Lecturer(lecturerID, lecturerName)
Teaches(lecturerID, moduleID, year)
This is my query so far:
SELECT l.lecturerName, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerName --I want lecturerID here, but it doesn't run if I do that
SELECT a.lecturerName, b.NumOfModules
FROM Lecturer a,(
SELECT l.lecturerID, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerID) b
WHERE a.lecturerID = b.lecturerID
You should probably just group by lecturerID and include it in the select column list. Otherwise, you're going to end up with two rows containing the same name with no way to distinguish between them.
You raise the problem of "wrong output" when grouping just by name but "undecipherable output" is just as big a problem. In other words, your desired output (grouping by ID but giving name):
lecturerName Module
------------ ------
Bob Smith 1
Bob Smith 2
is no better than your erroneous output (grouping by, and giving, name):
lecturerName Module
------------ ------
Bob Smith 3
since, while you now know that one of the lecturers taught two modules and the other taught one, you have no idea which is which.
The better output (grouping by ID and displaying both ID and name) would be:
lecturerId lecturerName Module
---------- ------------ ------
314159 Bob Smith 1
271828 Bob Smith 2
And, yes, I'm aware this doesn't answer your specific request but sometimes the right answer to "How do I do XYZZY?" is "Don't do XYZZY, it's a bad idea for these reasons ...".
Things like writing operating systems in COBOL, accounting packages in assembler, or anything in Pascal come to mind instantly :-)
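For what it's worth, the "group by ID and show both ID and name" version could look roughly like the sketch below. It uses an in-memory SQLite database instead of your real one, and the rows (including the 2010 row and the module IDs) are invented to match the Bob Smith example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Lecturer (lecturerID INTEGER, lecturerName TEXT)")
conn.execute("CREATE TABLE Teaches (lecturerID INTEGER, moduleID TEXT, year INTEGER)")
# Two different lecturers who happen to share a name (made-up data).
conn.executemany("INSERT INTO Lecturer VALUES (?, ?)",
                 [(314159, "Bob Smith"), (271828, "Bob Smith")])
conn.executemany("INSERT INTO Teaches VALUES (?, ?, ?)", [
    (314159, "M1", 2011),
    (271828, "M2", 2011),
    (271828, "M3", 2011),
    (271828, "M4", 2010),  # other year, must not be counted
])

# Grouping by both columns keeps the two Bob Smiths apart and still
# lets the name appear in the select list.
rows = conn.execute("""
    SELECT l.lecturerID, l.lecturerName, COUNT(t.moduleID) AS NumOfModules
    FROM Lecturer l
    JOIN Teaches t ON l.lecturerID = t.lecturerID
    WHERE t.year = 2011
    GROUP BY l.lecturerID, l.lecturerName
    ORDER BY NumOfModules
""").fetchall()
print(rows)
```

Because lecturerName is fully determined by lecturerID, adding it to the GROUP BY does not change the groups; it only makes the select list legal.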
You could subquery your count statement.
SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l
Note that there are other ways of doing this. If you also want to eliminate the rows with no modules, you can try:
SELECT *
FROM (SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l) AS temp
WHERE temp.numofmodules > 0
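Here is a quick sketch of both subquery variants running against an in-memory SQLite database. The sample lecturers and modules are made up, and the derived table is aliased as counted (a hypothetical name) instead of temp, to stay clear of any keyword handling in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Lecturer (lecturerID INTEGER, lecturerName TEXT)")
conn.execute("CREATE TABLE Teaches (lecturerID INTEGER, moduleID TEXT, year INTEGER)")
# Made-up data: two lecturers sharing a name, one with nothing in 2011.
conn.executemany("INSERT INTO Lecturer VALUES (?, ?)",
                 [(1, "Bob Smith"), (2, "Bob Smith"), (3, "Ann Lee")])
conn.executemany("INSERT INTO Teaches VALUES (?, ?, ?)", [
    (1, "M1", 2011), (2, "M2", 2011), (2, "M3", 2011), (3, "M4", 2010),
])

# The correlated subquery counts per lecturerID, so no GROUP BY is needed.
counted_sql = """
    SELECT l.lecturerID, l.lecturerName,
           (SELECT COUNT(*)
              FROM Teaches t
             WHERE t.lecturerID = l.lecturerID
               AND t.year = 2011) AS NumOfModules
    FROM Lecturer l
"""
rows_all = conn.execute(
    f"SELECT lecturerName, NumOfModules FROM ({counted_sql}) AS counted "
    "ORDER BY lecturerID"
).fetchall()

# Same thing, but dropping lecturers with no modules in that year.
rows_nonzero = conn.execute(
    f"SELECT lecturerName, NumOfModules FROM ({counted_sql}) AS counted "
    "WHERE counted.NumOfModules > 0 ORDER BY lecturerID"
).fetchall()
print(rows_all)
print(rows_nonzero)
```

Note that the two Bob Smiths stay separate here because the count is tied to lecturerID, even though only the name is displayed.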

SQL comma separated values grouped in query [duplicate]

I am using SQL Server 2008 and inherited a database that did not use a many-to-many table. Instead, it uses a comma-separated column. I have found how to link the comma-separated values to the name of the program, but I need a list of the programs and the offices they belong to, like this:
OFFICE table:
ID Name
--- ------
1 HQ
2 PA
3 CEO
PRG table:
ID Name Office Affected
-- ---- ---------------
A PRG1 1,3
B PRG2 2
C PRG3 2,3
D PRG4 1,2
Output that I need :
Name Programs
---- ---------
HQ PRG1, PRG4
PA PRG2, PRG3, PRG4
CEO PRG1, PRG3
You can manage to do this. However, because storing lists in strings is a bad idea, I don't want to compound that by putting them back in a comma-delimited list. Instead, the following query produces the data in a more normalized form, with one row per office name and program:
select o.name, p.name as program_name
from prg p join
office o
on ','+p.OfficeAffected+',' like '%,'+cast(o.id as varchar(255)) + ',%';
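The LIKE trick above can be tried out with the sketch below, which uses an in-memory SQLite database loaded with the tables from the question. Two things are assumptions of the demo rather than SQL Server syntax: SQLite concatenates with || instead of +, and the cast is to TEXT instead of varchar(255). The final grouping into comma-separated strings is done in Python only to compare against the requested output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE office (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE prg (id TEXT, name TEXT, OfficeAffected TEXT)")
conn.executemany("INSERT INTO office VALUES (?, ?)",
                 [(1, "HQ"), (2, "PA"), (3, "CEO")])
conn.executemany("INSERT INTO prg VALUES (?, ?, ?)", [
    ("A", "PRG1", "1,3"), ("B", "PRG2", "2"),
    ("C", "PRG3", "2,3"), ("D", "PRG4", "1,2"),
])

# Wrapping both sides in commas makes every id match as a whole list item:
# ',1,3,' contains ',1,' and ',3,' but would not match ',13,'.
rows = conn.execute("""
    SELECT o.name, p.name AS program_name
    FROM prg p
    JOIN office o
      ON ',' || p.OfficeAffected || ',' LIKE '%,' || CAST(o.id AS TEXT) || ',%'
    ORDER BY o.id, p.name
""").fetchall()

# Collapse the normalized rows into one line per office (Python side only).
programs_by_office = {}
for office_name, program_name in rows:
    programs_by_office.setdefault(office_name, []).append(program_name)
for office_name, programs in programs_by_office.items():
    print(office_name, ", ".join(programs))
```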

how to get first female of each tool, oracle database [duplicate]

I'm writing an Oracle query but am stuck on the following problem.
The table looks like this:
Tool Name Gender
Facebook Alice F
Facebook Alex M
Facebook Loong M
Facebook Jimmy M
Twitter James M
Twitter Jessica F
Twitter Sam M
Twitter Kathrine F
Google Rosa F
Google Lily F
Google Bob M
What I want to get is the first female in each tool.
The result should look like:
Facebook Alice
Twitter Jessica
Google Rosa
I'm trying to do this with a query, not functions or procedures.
Thanks for helping.
select *
from (
select row_number() over (partition by tool order by name) as rn
, Name
, Tool
from YourTable
where Gender = 'F'
) SubQueryAlias
where rn = 1 -- Only first per tool
Example at SQL Fiddle.
This is another alternative.
select min(name), tool
from yourTable
where gender = 'F'
group by tool
I'd like to have a bit of a discussion on which is better and what each one does; for me, it's the first time I've seen row_number(). Note that this one returns the first female in alphabetical order, and yours does the same by sorting within a window, so what is the difference?
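One way to explore the difference is the sketch below. It runs both queries on an in-memory SQLite database (not Oracle, and it assumes a SQLite build with window function support, 3.25 or later) using the rows from the question. The seq column is hypothetical: the original table has no column that records row order, so it is added here only to show that row_number() can rank by something other than the name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column holding the row order shown in the question.
conn.execute(
    "CREATE TABLE YourTable (seq INTEGER, Tool TEXT, Name TEXT, Gender TEXT)"
)
people = [
    ("Facebook", "Alice", "F"), ("Facebook", "Alex", "M"),
    ("Facebook", "Loong", "M"), ("Facebook", "Jimmy", "M"),
    ("Twitter", "James", "M"), ("Twitter", "Jessica", "F"),
    ("Twitter", "Sam", "M"), ("Twitter", "Kathrine", "F"),
    ("Google", "Rosa", "F"), ("Google", "Lily", "F"),
    ("Google", "Bob", "M"),
]
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [(i, *p) for i, p in enumerate(people, start=1)],
)

# GROUP BY variant: can only return the alphabetically smallest name.
by_min = conn.execute("""
    SELECT Tool, MIN(Name) FROM YourTable
    WHERE Gender = 'F' GROUP BY Tool ORDER BY Tool
""").fetchall()

def first_female(order_col):
    # order_col is interpolated into the SQL, so pass trusted names only.
    return conn.execute(f"""
        SELECT Tool, Name
        FROM (SELECT Tool, Name,
                     ROW_NUMBER() OVER (PARTITION BY Tool
                                        ORDER BY {order_col}) AS rn
              FROM YourTable
              WHERE Gender = 'F') ranked
        WHERE rn = 1
        ORDER BY Tool
    """).fetchall()

by_name = first_female("Name")  # same result as the MIN(name) query
by_seq = first_female("seq")    # "first" in the order the rows were listed
print(by_min, by_name, by_seq)
```

With this data, both alphabetical variants return Lily for Google, while the result requested in the question lists Rosa. So as far as I can tell, the practical difference is that row_number() lets you pick the ordering column independently of the column you return (and return whole rows), whereas MIN(name) is tied to alphabetical order of the name itself.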

Is there a way to optimise an array of subquery in a SQL select?

I currently have two tables
question
--------
id
title, character varying
answer
--------
id
question_id
votes, integer
I use the following query to return me a list of questions with its corresponding array of votes:
SELECT question.id,
question.title,
ARRAY(SELECT votes
FROM answer
WHERE answer.question_id = question.id)
FROM question
ORDER BY question.id
The output looks like:
id | title | ?column?
----+----------+-----------------------------------------------------
100 | How to | {5,2,7}
101 | Where is | {0}
102 | What is | {1}
The above query can take close to 50 seconds to run with hundreds of thousands of questions, where each question can have 5 or more answers. Is there a way to optimise it?
You should use a join:
SELECT question.id, question.title, answer.votes
FROM question
JOIN answer ON answer.question_id = question.id
ORDER BY question.id
If you want the output column to contain a concatenated list of all "votes" associated with a question, and you are on Postgres, check out this question: How to concatenate strings of a string field in a PostgreSQL 'group by' query?
I recommend creating an index on your answer table, and using your original query.
CREATE INDEX answer_question_id_idx ON answer(question_id);
Without this index, it will have to do a sequential scan of the entire table to find rows with a matching question_id. It will have to do that for every single question.
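To get a feel for the effect of the index, the sketch below compares query plans before and after creating it. This is SQLite, not Postgres, so the plan text and the planner's choices will differ from what EXPLAIN shows on your server; the tables are empty and only mirror the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE answer (id INTEGER PRIMARY KEY, question_id INTEGER, votes INTEGER)"
)

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN holds the human-readable step.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The per-question lookup that the ARRAY(...) subquery performs.
lookup = "SELECT votes FROM answer WHERE question_id = 100"

before = plan(lookup)  # a full scan of the answer table
conn.execute("CREATE INDEX answer_question_id_idx ON answer(question_id)")
after = plan(lookup)   # a search using answer_question_id_idx

print("before:", before)
print("after: ", after)
```

On Postgres, the analogous check would be to run EXPLAIN on the original query before and after CREATE INDEX and look at whether the subplan still uses a sequential scan.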
Alternatively, consider using a join, as arc suggested. I'm not an expert in the matter, but I think Postgres will use a hash join rather than multiple sequential scans, making the query faster. If you want to retain the id/title/array format, use array_agg:
SELECT question.id, question.title, array_agg(answer.votes)
FROM question
LEFT JOIN answer ON answer.question_id = question.id
GROUP BY question.id, question.title
ORDER BY question.id;
However, there's a caveat. If a question has no answers, you'll get a weird-looking result:
id | title | array_agg
----+-------------------+-----------
1 | How do I do this? | {3,5}
2 | How do I do that? | {NULL}
(2 rows)
This is because of the LEFT JOIN, which creates a NULL value when no rows from the joined table are available. With INNER JOIN, the second row won't appear at all.
That's why I recommend using your original query. It produces the expected result:
id | title | ?column?
----+-------------------+----------
1 | How do I do this? | {3,5}
2 | How do I do that? | {}
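The NULL caveat can be reproduced with the sketch below. It is only an analogy: it uses SQLite's json_group_array in place of Postgres's array_agg (so it assumes a SQLite build with the JSON functions and aggregate FILTER support, 3.30 or later), with the two sample questions from the output above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE answer (id INTEGER PRIMARY KEY, question_id INTEGER, votes INTEGER)"
)
conn.executemany("INSERT INTO question VALUES (?, ?)",
                 [(1, "How do I do this?"), (2, "How do I do that?")])
conn.executemany("INSERT INTO answer VALUES (?, ?, ?)",
                 [(1, 1, 3), (2, 1, 5)])

def votes_by_question(aggregate):
    sql = f"""
        SELECT question.id, {aggregate}
        FROM question
        LEFT JOIN answer ON answer.question_id = question.id
        GROUP BY question.id, question.title
        ORDER BY question.id
    """
    return {qid: json.loads(arr) for qid, arr in conn.execute(sql)}

# Plain aggregate: the NULL produced by the LEFT JOIN ends up in the array.
plain = votes_by_question("json_group_array(answer.votes)")

# Filtered aggregate: rows with NULL votes are skipped before aggregating.
filtered = votes_by_question(
    "json_group_array(answer.votes) FILTER (WHERE answer.votes IS NOT NULL)"
)
print(plain)
print(filtered)
```

Postgres also accepts a FILTER clause on aggregates (9.4 and later), but there array_agg over zero rows should yield NULL rather than an empty array, so you would likely still want to wrap it in COALESCE with an empty array literal if you need {} in the output.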
If you want the query to produce one row per question, with votes gathered into an array, you can use a join, with array_agg:
SELECT question.id,
question.title,
array_agg(answer.votes) as answer_votes
FROM question
JOIN answer ON answer.question_id = question.id
GROUP BY question.id, question.title
ORDER BY question.id