Summarizing three variables using sql [closed] - sql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Here is the raw data
Book | Author | Year
A | A1 | 1985
A | B1 | 1985
B | A1 | 1988
B | C1 | 1988
D | A1 | 1990
D | C1 | 1990
D | B1 | 1990
Here is what output I am looking for,
Author1 | Author2 | year | count
A1 | B1 | 1985 | 1
A1 | C1 | 1985 | 1
A1 | C1 | 1988 | 1
A1 | B1 | 1990 | 1
A1 | C1 | 1990 | 1
B1 | C1 | 1990 | 1
Any help is deeply appreciated.
Thanks

The query you are looking for is a self join with an aggregation:
select t1.author as author1, t2.author as author2, t1.year, count(*) as `count`
from t t1 join
t t2
on t1.book = t2.book and
t1.author < t2.author
group by t1.author, t2.author, t1.year
order by t1.author, year;
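To check the self join against the sample data, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes the table is called t (as in the query above) and uses SQLite rather than whatever database the question targets; the alias pair_count and the function name coauthor_pairs are only illustrative. Note that the row A1 | C1 | 1985 in the question's expected output does not follow from the raw data, so the sketch yields five rows.

```python
import sqlite3

def coauthor_pairs(records):
    """Return (author1, author2, year, pair_count) for authors who share a book."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (book TEXT, author TEXT, year INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", records)
    rows = conn.execute(
        """
        SELECT t1.author AS author1, t2.author AS author2,
               t1.year, COUNT(*) AS pair_count
        FROM t t1
        JOIN t t2
          ON t1.book = t2.book
         AND t1.author < t2.author   -- keeps each unordered pair exactly once
        GROUP BY t1.author, t2.author, t1.year
        ORDER BY t1.author, t2.author, t1.year
        """
    ).fetchall()
    conn.close()
    return rows

# Raw data copied from the question.
raw = [
    ("A", "A1", 1985), ("A", "B1", 1985),
    ("B", "A1", 1988), ("B", "C1", 1988),
    ("D", "A1", 1990), ("D", "C1", 1990), ("D", "B1", 1990),
]

pairs = coauthor_pairs(raw)
for row in pairs:
    print(row)
```

The condition t1.author < t2.author is what prevents both (A1, B1) and (B1, A1) from appearing, and it also drops the pairing of a row with itself.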

SELECT A.author AS author1,
B.author AS author2,
A.year,
COUNT(*) AS "count"
FROM Author A
INNER JOIN Author B
ON B.book = A.book
AND B.author > A.author
GROUP BY A.author, B.author, A.year
ORDER BY A.author, B.author, A.year
This will work okay only as long as there are no more than two rows per book in your Author table. Otherwise, it will produce multiple lines per book. If that is possibly the case, you should indicate what flavor of SQL should be used, as the means to limit the results from Table B differ from implementation to implementation. I have arbitrarily chosen to list the authors in alphabetical order, since there appears to be no indicator of which author is "primary."
I would hope that there are some additional columns in the table that you are not telling us about, most specifically a primary key, and perhaps some attribute indicating the "billing order" of the authors with respect to a given book.
You might want to reconsider your table design, if that's possible: it's in a non-normalized form that makes data integrity hard to enforce.

Related

How to compute overlap percentage of agreement between people in Hive table

Suppose I have a survey where each question has 4 possible answers, and surveyed people can choose at least one answer (multiple answers allowed). I want to compute per question per answer, how many people chose that answer. For example, if I have the hive table:
question_id | answer_id | person_id
-------------------------------------
1 | A | 1
1 | B | 1
1 | C | 1
1 | D | 1
1 | A | 2
1 | B | 2
1 | C | 2
2 | D | 1
2 | A | 1
Then the resulting table would be:
question_id | answer_id | Percentage
-------------------------------------
1 | A | 100
1 | B | 100
1 | C | 100
1 | D | 50
2 | D | 50
2 | A | 50
For question 1, both people put A,B,C giving 100% for all three, but one person put D as well, giving 50%. For question 2, one person put D and one person put A, giving 50% and 50%.
I've been really stuck and I haven't been able to find anything online that accomplishes what I'm looking for. Any help would be amazing!
Hmmm . . . If I understand correctly, you want the number of people who chose one particular question/answer combination divided by the number of people who answered the question. If so, I think this query does it:
select qa.*, qa.num_persons * 100.0 / q.num_persons
from (select question_id, answer_id, count(*) as num_persons
from t
group by question_id, answer_id
) qa join
(select question_id, count(distinct person_id) as num_persons
from t
group by question_id
) q
on qa.question_id = q.question_id;
You can also use analytic functions and size(collect_set(...)) to count distinct people. This eliminates the join and works well as long as the number of distinct persons per question is not too big (the array produced by collect_set must fit in memory). The distinct is needed because window functions keep one output row per input row:
select distinct qa.question_id, qa.answer_id,
qa.num_persons * 100.0 / size(qa.question_persons) as Percentage
from (select question_id, answer_id,
count(*) over (partition by question_id, answer_id) as num_persons,
collect_set(person_id) over (partition by question_id) as question_persons
from t
) qa;
I'm not familiar with PrestoDB, but below is a SQL script that gives the same result as what you posted.
The 2.0 is the number of people. You might want to select that first and store it in a variable.
select
question_id, answer_id, (count(answer_id)/2.0) * 100.0
from Sample
group by question_id, answer_id
order by question_Id, answer_id
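The hardcoded 2.0 above can be replaced by a scalar subquery that counts the distinct people in the table. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on Hive or Presto, the table is named t, and the function name answer_percentages is made up for the example. With the sample rows it reproduces the percentages the question asks for.

```python
import sqlite3

def answer_percentages(records):
    """Percentage of all surveyed people who chose each (question, answer)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (question_id INTEGER, answer_id TEXT, person_id INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", records)
    rows = conn.execute(
        """
        SELECT question_id,
               answer_id,
               -- numerator: people who chose this answer
               -- denominator: every distinct person in the survey table
               COUNT(DISTINCT person_id) * 100.0
                   / (SELECT COUNT(DISTINCT person_id) FROM t) AS percentage
        FROM t
        GROUP BY question_id, answer_id
        ORDER BY question_id, answer_id
        """
    ).fetchall()
    conn.close()
    return rows

# Sample rows copied from the question.
survey = [
    (1, "A", 1), (1, "B", 1), (1, "C", 1), (1, "D", 1),
    (1, "A", 2), (1, "B", 2), (1, "C", 2),
    (2, "D", 1), (2, "A", 1),
]

percentages = answer_percentages(survey)
for row in percentages:
    print(row)
```

Dividing by the people who answered that particular question instead (as the join-based answer does) would give 100 for both answers of question 2, so the choice of denominator decides which reading of the question is implemented.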

Database Design question best way to solve this issue [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm kind of new in database design and I'm trying to find the best way to solve an issue I'm facing.
Let's think about the following example:
Imagine I want to store information about patients and these patients can have 0+ diseases.
What's the best way of arranging the tables to store the diseases that each patient can have? I get confused about what happens when a patient has 3 diseases: how is this stored in a relational database, e.g. without having repeated rows in the diseases table (there is a static number of diseases)?
I'm not sure if I'm making myself totally clear here!
But let's say I don't think it's efficient to have:
Patient table -
Patient_id , disease_id
1, (3,4,5,6)
Any help is appreciated!
As mentioned in the comments, you would use an associative entity for that.
Let's say you have the following diseases:
| Disease_id | Disease_name |
-----------------------------
| 1 | Cancer |
-----------------------------
| 2 | Leukemia |
-----------------------------
Your patients may look like this
| Patient_id | Patient_name |
-----------------------------
| 1 | Peter Jones |
-----------------------------
| 2 | Mark Jacobs |
-----------------------------
You now create a table like so (let's call it ill_patients):
| Patient_id | Disease_id |
---------------------------
| 1 | 1 |
---------------------------
| 1 | 2 |
---------------------------
This would mean that poor Peter Jones has cancer as well as leukemia. You can now query your patient table like so:
SELECT patients.patient_name, diseases.disease_name
FROM (diseases INNER JOIN ill_patients ON diseases.disease_id = ill_patients.disease_id) INNER JOIN patients ON ill_patients.patient_id = patients.patient_id;
This gives you all the patients with their respective diseases.
Patients:
patient_id, name, phone
Diseases:
disease_id, name, description
Patient_Disease:
patient_id, disease_id
Example:
patient_id, name , phone
1 , 'jhon', '555-1234'
disease_id, name , description
1 , 'Cancer' , 'Cancer is the uncontrolled development of cells.'
2 , 'Diabetes', 'Diabetes is a disease that occurs when your blood glucose is too high.'
patient_id, disease_id
1 , 1
1 , 2
Then you can do:
SELECT p.name, d.name, d.description
FROM patients p
JOIN patient_disease pd
ON p.patient_id = pd.patient_id
JOIN diseases d
ON pd.disease_id = d.disease_id
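As a rough end-to-end sketch of the three-table layout above, the following builds the tables in an in-memory SQLite database and runs the same join. The column types, the composite primary key on patient_disease, and the foreign key clauses are assumptions added for illustration; the answer only lists the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        phone      TEXT
    );
    CREATE TABLE diseases (
        disease_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        description TEXT
    );
    -- Junction table: one row per (patient, disease) link.
    -- The composite key rejects duplicate links (an assumed constraint).
    CREATE TABLE patient_disease (
        patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
        disease_id INTEGER NOT NULL REFERENCES diseases(disease_id),
        PRIMARY KEY (patient_id, disease_id)
    );
    """
)

conn.execute("INSERT INTO patients VALUES (?, ?, ?)", (1, "jhon", "555-1234"))
conn.executemany(
    "INSERT INTO diseases VALUES (?, ?, ?)",
    [
        (1, "Cancer", "Cancer is the uncontrolled development of cells."),
        (2, "Diabetes", "Diabetes is a disease that occurs when your blood glucose is too high."),
    ],
)
conn.executemany("INSERT INTO patient_disease VALUES (?, ?)", [(1, 1), (1, 2)])

patient_diseases = conn.execute(
    """
    SELECT p.name, d.name
    FROM patients p
    JOIN patient_disease pd ON p.patient_id = pd.patient_id
    JOIN diseases d ON pd.disease_id = d.disease_id
    ORDER BY p.patient_id, d.disease_id
    """
).fetchall()
print(patient_diseases)

# Linking the same patient to the same disease twice violates the composite key.
try:
    conn.execute("INSERT INTO patient_disease VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

A patient with no diseases simply has no rows in patient_disease, which is how the "0+" case is represented without touching the diseases table.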

sql query get Data on pre conditions [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I need the list of fruits whose daily price is greater than the price in tableA, for each fruit. This is tableA:
ID | fruit | Price
----------------------------
1 | apple | 10
2 | banana| 7
3 | grapes| 6
Then I have a daily table like the one below:
ID | fruit | Price
----------------------------
1 | apple | 9
2 | banana| 5
3 | grapes| 9
4 | mango | 15
With this data I should get only grapes.
I think you can just join the daily and tableA tables on the fruit's ID, and then compare prices.
SELECT t1.*
FROM daily t1
INNER JOIN tableA t2
ON t1.ID = t2.ID
WHERE t1.price > t2.price
Note that we join on the ID rather than the fruit name, since in theory names may not be completely unique across a very large table of fruits.
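A quick way to confirm the join against the sample data is to load both tables into an in-memory SQLite database. This is a sketch only: the column types are assumed, and the variable name pricier is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (ID INTEGER PRIMARY KEY, fruit TEXT, Price INTEGER);
    CREATE TABLE daily  (ID INTEGER PRIMARY KEY, fruit TEXT, Price INTEGER);

    -- Rows copied from the question.
    INSERT INTO tableA VALUES (1, 'apple', 10), (2, 'banana', 7), (3, 'grapes', 6);
    INSERT INTO daily  VALUES (1, 'apple', 9), (2, 'banana', 5),
                              (3, 'grapes', 9), (4, 'mango', 15);
    """
)

# Fruits whose daily price is higher than the tableA price.
pricier = conn.execute(
    """
    SELECT t1.ID, t1.fruit, t1.Price
    FROM daily t1
    INNER JOIN tableA t2
            ON t1.ID = t2.ID
    WHERE t1.Price > t2.Price
    ORDER BY t1.ID
    """
).fetchall()
print(pricier)
```

Mango never appears in the result because the inner join drops daily rows that have no matching ID in tableA.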
Just join by ID and add your additional condition (the price in dailyTable is greater than the price in TableA).
You don't need to join on the fruit column, but if you do, it won't change your result set.
SELECT TableA.*, dailyTable.Price
FROM TableA
INNER JOIN dailyTable
ON TableA.ID = dailyTable.ID
AND dailyTable.Price > TableA.Price
The fruit column is redundant data, so you shouldn't store it in the daily table.

Count(*) and Sum in the same row

I'm banging my head against the wall, here. I've looked at dozens of StackOverflow questions that are similar, and they get me close, but I haven't found one yet that does what I need.
I have thousands of questions in a database with answers from multiple users to each question. I need to aggregate the answers to show the count of distinct answers per question. That's the easy part; where I'm stumbling is in adding a Sum column to show the total number of answers given for each question. I can do it if I restrict the Where clause to specific questions, but I'm trying to get this all into one query if possible.
Here's the Query:
select c.ID, a.userID, c.question, a.answer, count(a.answer) as cnt
from NotableAnswers a, categories b, questions c
where c.fkCategory = b.ID and a.questionID = c.ID and b.ID = 18
Group By a.answer, c.ID, c.question
Order By c.ID, answer asc
What I need is a result set that looks like this
ID | userID | Question | Answer | cnt | totcnt
------------------------------------------------------------------
175 | 10318 |Favorite... |Dropbox | 15 | 35
175 | 10354 |Favorite... |Box | 2 | 35
175 | 10323 |Favorite... |Google Drive | 15 | 35
175 | 103111 |Favorite... |Cubby | 3 | 35
186 | 10318 |Best IDE... |IntelliJ | 4 | 12
186 | 103613 |Best IDE... |Android Studio| 6 | 12
186 | 103117 |Best IDE... |Eclipse | 2 | 12
This set shows the Answer as an aggregate and the count of that specific answer along with the sum of the number of answers provided to each distinct question.
Any and all help greatly appreciated.
First, learn to use proper join syntax. Simple rule: Never use commas in the FROM clause. Always use proper explicit JOIN syntax.
Second, the answer is window functions:
-- userID is not in the GROUP BY, so an aggregate picks one value per answer
select q.ID, min(a.userID) as userID, q.question, a.answer, count(a.answer) as cnt,
sum(count(a.answer)) over (partition by q.ID) as total_cnt
from NotableAnswers a join
questions q
on a.questionID = q.ID join
categories c
on q.fkCategory = c.ID
where c.ID = 18
group by a.answer, q.ID, q.question
order by q.ID, a.answer asc;
In addition, it is better to use table aliases that are abbreviations for the table names rather than arbitrary letters.
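To illustrate the per-answer count next to the per-question total, here is a small sketch on SQLite via Python's sqlite3 module. Several things are assumptions: the column types, the sample rows (invented, not the asker's data), and the use of a CTE that first computes the grouped counts and then applies SUM(...) OVER (PARTITION BY ...) to them. That two-step form is a rewrite of the nested sum(count(...)) expression, not the answer's exact query, and it needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (ID INTEGER PRIMARY KEY);
    CREATE TABLE questions (ID INTEGER PRIMARY KEY, question TEXT, fkCategory INTEGER);
    CREATE TABLE NotableAnswers (userID INTEGER, questionID INTEGER, answer TEXT);

    -- Hypothetical sample rows, invented for this sketch.
    INSERT INTO categories VALUES (18), (5);
    INSERT INTO questions VALUES (175, 'Favorite...', 18),
                                 (186, 'Best IDE...', 18),
                                 (200, 'Other...', 5);
    INSERT INTO NotableAnswers VALUES
        (1, 175, 'Dropbox'), (2, 175, 'Dropbox'), (3, 175, 'Box'),
        (1, 186, 'Eclipse'), (2, 186, 'Eclipse'), (3, 186, 'IntelliJ'),
        (1, 200, 'Ignored');
    """
)

answer_counts = conn.execute(
    """
    WITH per_answer AS (
        -- Step 1: one row per (question, answer) with its count.
        SELECT q.ID AS ID, q.question AS question, a.answer AS answer,
               COUNT(a.answer) AS cnt
        FROM NotableAnswers a
        JOIN questions q ON a.questionID = q.ID
        JOIN categories c ON q.fkCategory = c.ID
        WHERE c.ID = 18
        GROUP BY q.ID, q.question, a.answer
    )
    -- Step 2: add the total over all answers of the same question.
    SELECT ID, question, answer, cnt,
           SUM(cnt) OVER (PARTITION BY ID) AS total_cnt
    FROM per_answer
    ORDER BY ID, answer
    """
).fetchall()

for row in answer_counts:
    print(row)
```

Because the window function runs after the grouping, every answer row of a question carries the same total_cnt, which is the shape the question asks for.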

Oracle SQL - Inner Join & Count [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I have the following Oracle database, and I need to return the following:
d.directdomain, d.domaindisplayname, r.lastdate (the latest), and a count of how many times r.directdomain = d.directdomain.
Basically, I have lots of "people" in table r and "domains" in table d. I need to return how many times a person in r visited each domain, and also the last time they visited it.
I tried a few things, but it seems that to use COUNT I need to GROUP BY the date, and that's confusing me.
Example return:
1, Site1, 21/05/13, 5
2, Site2, 20/05/13, 2
d
directdomain (PK)
domaindisplayname
r
rsld (PK)
lastdate
directdomain (FK)
Are you looking for something like this?
SELECT d.directdomain,
d.domaindisplayname,
MAX(r.lastdate) lastdate,
COUNT(*) rcount
FROM d JOIN r
ON d.directdomain = r.directdomain
GROUP BY d.directdomain, d.domaindisplayname
Sample output:
| DIRECTDOMAIN | DOMAINDISPLAYNAME | LASTDATE | RCOUNT |
|--------------|-------------------|-------------------------------|--------|
| 1 | Site1 | August, 15 2013 00:00:00+0000 | 4 |
| 2 | Site2 | August, 18 2013 00:00:00+0000 | 3 |
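The same GROUP BY with MAX and COUNT can be tried out locally. The sketch below uses SQLite through Python's sqlite3 module instead of Oracle, stores lastdate as ISO-8601 text so that MAX compares dates correctly (an assumption, since the real column type is not shown), and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE d (directdomain INTEGER PRIMARY KEY, domaindisplayname TEXT);
    CREATE TABLE r (
        rsld         INTEGER PRIMARY KEY,
        lastdate     TEXT,     -- ISO-8601 text, so MAX() orders chronologically
        directdomain INTEGER REFERENCES d(directdomain)
    );

    -- Hypothetical sample rows, invented for this sketch.
    INSERT INTO d VALUES (1, 'Site1'), (2, 'Site2');
    INSERT INTO r VALUES (1, '2013-08-10', 1),
                         (2, '2013-08-15', 1),
                         (3, '2013-08-18', 2);
    """
)

domain_stats = conn.execute(
    """
    SELECT d.directdomain,
           d.domaindisplayname,
           MAX(r.lastdate) AS lastdate,   -- latest visit per domain
           COUNT(*)        AS rcount      -- number of visits per domain
    FROM d
    JOIN r ON d.directdomain = r.directdomain
    GROUP BY d.directdomain, d.domaindisplayname
    ORDER BY d.directdomain
    """
).fetchall()

for row in domain_stats:
    print(row)
```

Since lastdate goes through MAX, it does not need to appear in the GROUP BY; only the non-aggregated columns from d do. A domain with no rows in r is left out by the inner join.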