Reorder columns and rows in StackExchange query - sql

I want to do data analysis on some Stack Overflow posts and need to get the query output in the right format. My goal is to input post IDs and get the answers in the following format:
ID|Title|Question|Answer1|Answer2|Answer3|Answer4|Answer5|Answer...
__________________________________________________________________
1 |Tit 1|Quest 1 |1.Answ |2.Answ |3.Answ |4.Answ |5.Answ |Answer...
2 |Tit 2|Quest 2 |1.Answ |2.Answ |3.Answ | | |
3 |Tit 3|Quest 3 |1.Answ |2.Answ |3.Answ |4.Answ | |
I am not familiar with writing queries on StackExchange, but I managed to write a query that gets almost the right output. My result looks like this:
ID|Title|Question|Answer|
_________________________
1 |Tit 1|Quest 1 |1.Answ |
1 |Tit 1|Quest 1 |2.Answ |
1 |Tit 1|Quest 1 |3.Answ |
2 |Tit 2|Quest 2 |1.Answ |
2 |Tit 2|Quest 2 |2.Answ |
2 |Tit 2|Quest 2 |3.Answ |
As you can see, I duplicate the Id, Title and Question for each answer, and the answers end up in a single column rather than side by side.
This is the query I managed to write. Can somebody help me with it or point me in the right direction?
select
p.Id, p.Title, p.Body, k.Body
from
Posts as p inner join
Posts as k on
p.id = k.parentid
where
p.Id in (##id##) and k.posttypeid=2

You'll want to PIVOT your table to turn the rows into columns.
Check out this article about Pivots. The downside is that you need to hard-code each possible answer column (Answer1, Answer2, ...), and you don't know in advance how many there will be.
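To make the pivot idea concrete, here is a rough sketch that does not use the PIVOT operator itself: it numbers each question's answers with ROW_NUMBER() and picks them out with conditional aggregation, which gives the same shape. It runs against a made-up miniature Posts table in Python's built-in sqlite3 (assuming SQLite 3.25+ for window functions). The helper names `pivot_answers` and `demo_posts` and the sample rows are mine, not part of SEDE.

```python
import sqlite3


def pivot_answers(conn, max_cols=3):
    """Return one row per question with up to `max_cols` answers side by side."""
    # One MAX(CASE ...) column per answer slot; unused slots come back as NULL.
    cols = ",\n      ".join(
        f"MAX(CASE WHEN rn = {i} THEN answer END) AS Answer{i}"
        for i in range(1, max_cols + 1)
    )
    sql = f"""
    SELECT Id, Title, Question,
      {cols}
    FROM (
      -- number the answers of each question 1, 2, 3, ...
      SELECT p.Id, p.Title, p.Body AS Question, k.Body AS answer,
             ROW_NUMBER() OVER (PARTITION BY p.Id ORDER BY k.Id) AS rn
      FROM Posts AS p
      JOIN Posts AS k ON k.ParentId = p.Id AND k.PostTypeId = 2
    ) AS numbered
    GROUP BY Id, Title, Question
    ORDER BY Id
    """
    return conn.execute(sql).fetchall()


def demo_posts():
    """Build a tiny in-memory stand-in for the Posts table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Posts (Id INTEGER, ParentId INTEGER, "
        "PostTypeId INTEGER, Title TEXT, Body TEXT)"
    )
    conn.executemany(
        "INSERT INTO Posts VALUES (?, ?, ?, ?, ?)",
        [
            (1, None, 1, "Tit 1", "Quest 1"),
            (2, None, 1, "Tit 2", "Quest 2"),
            (10, 1, 2, None, "1.Answ"),
            (11, 1, 2, None, "2.Answ"),
            (12, 1, 2, None, "3.Answ"),
            (20, 2, 2, None, "1.Answ"),
        ],
    )
    return conn
```

The number of answer columns is still fixed when the query is built (`max_cols`), which is the same limitation as with PIVOT. Questions without any answer drop out because of the inner join.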
An alternative is to use STUFF to put all the answers in one column, something like:
SELECT p.Id, p.Title, p.Body AS Q,
    STUFF(
        (
            -- concatenate all answers to this question, each prefixed with '|'
            -- note: FOR XML PATH('') XML-escapes characters such as < and & in Body
            SELECT '|' + k.Body
            FROM Posts AS k
            WHERE k.ParentId = p.Id AND k.PostTypeId = 2
            FOR XML PATH('')
        ), 1, 1, '') AS Answers
FROM Posts AS p
WHERE p.Id IN (##id##)

Related

How to count number of rows that corresponds to the ID given two tables in SQL?

I have two tables: 1) Places 2) Reviews
Table examples are below:
PLACES
ID | NAME
============
1 | Joe
2 | Cat
3 | Dog
REVIEWS
PLACE_ID | REVIEW_ID| REVIEW_CONTENT
====================================
1 | 1000 | "it's good"
1 | 1001 | "aweful place"
3 | 1002 | "good place"
PLACE_ID is my foreign key, and I want to count the number of reviews for each ID in the PLACES table.
As you can see:
there are 2 reviews in the REVIEWS table for place id 1 ("Joe")
there are 0 reviews in the REVIEWS table for place id 2 ("Cat")
there is 1 review in the REVIEWS table for place id 3 ("Dog")
The result should look like
RESULT
PLACE_ID | NAME | COUNT
=======================
1 | Joe | 2
2 | Cat | 0
3 | Dog | 1
Can someone please help me count the number of rows (e.g. the number of reviews) that share the same foreign key (e.g. PLACE_ID), given two tables?
This is basic SQL. Please do some reading on simple aggregations.
SELECT P.ID AS PLACE_ID,
       P.NAME AS NAME,
       COUNT(R.REVIEW_ID) AS COUNT
FROM PLACES P
LEFT JOIN REVIEWS R
  ON P.ID = R.PLACE_ID
GROUP BY P.ID, P.NAME
You can try the query below, which uses a left join and aggregation:
SELECT p.id, p.name, count(r.review_id) AS cnt
FROM places p LEFT JOIN reviews r ON p.id = r.place_id
GROUP BY p.id, p.name
Simply join both tables and perform an aggregation to count the number of reviews for each ID in the Places table. You can find the code below.
Select A.ID as PLACE_ID,
       A.Name,
       count(B.REVIEW_ID) COUNT
From Places A
Left Join Reviews B
  on A.ID = B.PLACE_ID
group by A.ID,
         A.Name
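For anyone who wants to try this without a database server, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (Python's built-in sqlite3) and runs the left join with aggregation. The function names `review_counts` and `demo_places` are just illustrative.

```python
import sqlite3


def review_counts(conn):
    """Count reviews per place, keeping places that have no reviews."""
    return conn.execute(
        """
        SELECT p.ID AS PLACE_ID, p.NAME, COUNT(r.REVIEW_ID) AS CNT
        FROM PLACES AS p
        LEFT JOIN REVIEWS AS r ON r.PLACE_ID = p.ID
        GROUP BY p.ID, p.NAME
        ORDER BY p.ID
        """
    ).fetchall()


def demo_places():
    """Create the PLACES and REVIEWS tables with the rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PLACES (ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE REVIEWS (PLACE_ID INTEGER, REVIEW_ID INTEGER, REVIEW_CONTENT TEXT);
        INSERT INTO PLACES VALUES (1, 'Joe'), (2, 'Cat'), (3, 'Dog');
        INSERT INTO REVIEWS VALUES
            (1, 1000, 'it''s good'),
            (1, 1001, 'aweful place'),
            (3, 1002, 'good place');
        """
    )
    return conn
```

Counting `r.REVIEW_ID` rather than `*` matters here: for a place with no reviews the left join still produces one row, but its REVIEW_ID is NULL, so the count comes out as 0 instead of 1.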

How to compute overlap percentage of agreement between people in Hive table

Suppose I have a survey where each question has 4 possible answers, and surveyed people must choose at least one answer (multiple answers allowed). I want to compute, per question and per answer, the percentage of people who chose that answer. For example, if I have the Hive table:
question_id | answer_id | person_id
-------------------------------------
1 | A | 1
1 | B | 1
1 | C | 1
1 | D | 1
1 | A | 2
1 | B | 2
1 | C | 2
2 | D | 1
2 | A | 1
Then the resulting table would be:
question_id | answer_id | Percentage
-------------------------------------
1 | A | 100
1 | B | 100
1 | C | 100
1 | D | 50
2 | D | 50
2 | A | 50
For question 1, both people put A,B,C giving 100% for all three, but one person put D as well, giving 50%. For question 2, one person put D and one person put A, giving 50% and 50%.
I've been really stuck and I haven't been able to find anything online that accomplishes what I'm looking for. Any help would be amazing!
Hmmm . . . If I understand correctly, you want the number of people who chose one particular question/answer combination divided by the people who chose the question. If so, I think
select qa.*, qa.num_persons * 100.0 / q.num_persons
from (select question_id, answer_id, count(*) as num_persons
from t
group by question_id, answer_id
) qa join
(select question_id, count(distinct person_id) as num_persons
from t
group by question_id
) q
on qa.question_id = q.question_id;
You can also use analytic functions and size(collect_set(...)) to count distinct people. This eliminates the join and works fine as long as the number of distinct people per question is not too big (the array produced by collect_set has to fit in memory):
select qa.question_id, qa.answer_id,
qa.num_persons * 100.0 / size(qa.question_persons) as Percentage
from (select question_id, answer_id,
count(*) over (partition by question_id, answer_id) as num_persons,
collect_set(person_id) over(partition by question_id) as question_persons
from t
) qa;
I'm not familiar with prestoDB, but below is a SQL script that gives the same result as what you posted.
The 2.0 is the number of people. You might want to select that first and store it in a variable.
select
question_id, answer_id, (count(answer_id)/2.0) * 100.0
from Sample
group by question_id, answer_id
order by question_Id, answer_id
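As a sketch of how the hard-coded 2.0 could be computed instead, the example below uses a scalar subquery that counts the distinct people in the whole table. It runs on SQLite through Python's sqlite3 purely as a stand-in; I have not checked whether Hive or Presto accept the subquery in this position, so treat that part as an assumption. The table name `t` and the function names are illustrative.

```python
import sqlite3


def answer_percentages(conn):
    """Percentage of all surveyed people who chose each question/answer pair."""
    return conn.execute(
        """
        SELECT question_id, answer_id,
               COUNT(DISTINCT person_id) * 100.0
                 / (SELECT COUNT(DISTINCT person_id) FROM t) AS percentage
        FROM t
        GROUP BY question_id, answer_id
        ORDER BY question_id, answer_id
        """
    ).fetchall()


def demo_survey():
    """Load the sample rows from the question into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (question_id INTEGER, answer_id TEXT, person_id INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            (1, "A", 1), (1, "B", 1), (1, "C", 1), (1, "D", 1),
            (1, "A", 2), (1, "B", 2), (1, "C", 2),
            (2, "D", 1), (2, "A", 1),
        ],
    )
    return conn
```

Note the choice of denominator: dividing by everyone in the table reproduces the expected output in the question. Dividing by the people who answered that particular question would give 100 for both answers to question 2 in the sample data, because only person 1 answered it.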

Count(*) and Sum in the same row

I'm banging my head against the wall, here. I've looked at dozens of StackOverflow questions that are similar, and they get me close, but I haven't found one yet that does what I need.
I have thousands of questions in a database with answers from multiple users to each question. I need to aggregate the answers to show the count of distinct answers per question. That's the easy part; where I'm stumbling is in adding a Sum column to show the total number of answers given for each question. I can do it if I restrict the Where clause to specific questions, but I'm trying to get this all into one query if possible.
Here's the Query:
select c.ID, a.userID, c.question, a.answer, count(a.answer) as cnt
from NotableAnswers a, categories b, questions c
where c.fkCategory = b.ID and a.questionID = c.ID and b.ID = 18
Group By a.answer, c.ID, c.question
Order By c.ID, answer asc
What I need is a result set that looks like this
ID | userID | Question | Answer | cnt | totcnt
------------------------------------------------------------------
175 | 10318 |Favorite... |Dropbox | 15 | 35
175 | 10354 |Favorite... |Box | 2 | 35
175 | 10323 |Favorite... |Google Drive | 15 | 35
175 | 103111 |Favorite... |Cubby | 3 | 35
186 | 10318 |Best IDE... |IntelliJ | 4 | 12
186 | 103613 |Best IDE... |Android Studio| 6 | 12
186 | 103117 |Best IDE... |Eclipse | 2 | 12
This set shows the Answer as an aggregate and the count of that specific answer along with the sum of the number of answers provided to each distinct question.
Any and all help greatly appreciated.
First, learn to use proper join syntax. Simple rule: Never use commas in the FROM clause. Always use proper explicit JOIN syntax.
Second, the answer is window functions:
select q.ID, a.userID, q.question, a.answer, count(a.answer) as cnt,
       sum(count(a.answer)) over (partition by q.ID) as total_cnt
from NotableAnswers a join
     questions q
     on a.questionID = q.ID join
     categories c
     on q.fkCategory = c.ID
where c.ID = 18
group by a.answer, q.ID, q.question  -- a.userID is not grouped: aggregate it (e.g. min) or drop it if your engine rejects this
order by q.ID, a.answer asc;
In addition, it is better to use table aliases that are abbreviations for the table names rather than arbitrary letters.
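Here is a small runnable sketch of the window-over-aggregate pattern, using SQLite through Python's sqlite3 (SQLite 3.25+ assumed for window functions). The sample rows are made up, the categories join is left out to keep it short, and userID is omitted from the select list because it is not part of the grouping.

```python
import sqlite3


def counts_with_totals(conn):
    """Per-answer counts plus the total number of answers for each question."""
    return conn.execute(
        """
        SELECT q.ID, q.question, a.answer,
               COUNT(a.answer) AS cnt,
               -- the window function runs over the grouped rows,
               -- so it sums the per-answer counts within each question
               SUM(COUNT(a.answer)) OVER (PARTITION BY q.ID) AS total_cnt
        FROM NotableAnswers AS a
        JOIN questions AS q ON a.questionID = q.ID
        GROUP BY q.ID, q.question, a.answer
        ORDER BY q.ID, a.answer
        """
    ).fetchall()


def demo_answers():
    """Tiny hypothetical data set: two questions, five answers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE questions (ID INTEGER PRIMARY KEY, question TEXT);
        CREATE TABLE NotableAnswers (questionID INTEGER, userID INTEGER, answer TEXT);
        INSERT INTO questions VALUES (175, 'Favorite...'), (186, 'Best IDE...');
        INSERT INTO NotableAnswers VALUES
            (175, 1, 'Dropbox'), (175, 2, 'Dropbox'), (175, 3, 'Box'),
            (186, 1, 'Eclipse'), (186, 2, 'IntelliJ');
        """
    )
    return conn
```

Every row of a question carries the same total_cnt, which is the totcnt column the question asks for.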

Joining Table that has no value

I'm working on a friendship site for a client that was built by another developer. The client has added new questions to the questions table. I can't figure out a query that will return results for all the questions, even for old users who have not answered the new questions.
The issue is that old users don't have an entry in the user question table, so how can I get a default of 'Not Answered' for old users who have not entered a value in the user question table?
See table structure below
User Table
id | username
0 | louis
User Question Table
ID | USERID | Question ID | Answer ID
0 | 1 | 0 | 5
1 | 1 | 1 | 8
Question Table
ID | QUESTION
0 | What is your favorite color
1 | What is your gender
2 | What is your favorite t.v. show
Answer Table
ID | answer
5 | Blue
8 | female
This is my desired result:
user | question | answer
louis | What is your favorite color | blue
louis | What is your gender | female
louis | What is your height | Not Answered
I would get all questions, left join the answers and userquestions and do a cross join with users:
select
username,
question,
answer = isnull(answer,'Not Answered')
from Question q
cross join User u
left join UserQuestion uq on uq.QuestionID = q.ID and u.id = uq.USERID
left join Answer a on uq.AnswerID = a.ID
You want to use a cross join to get combinations of all users and all questions. Then use a left join to bring in the information about existing answers. The final piece is coalesce() to substitute a value when there is no answer:
select u.username, q.question, coalesce(a.answer, 'Not Answered')
from user u cross join
question q left join
userquestion uq
on uq.userid = u.id and
uq.questionid = q.id left join
answer a
on uq.answerid = a.id
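A self-contained sketch of the cross join plus left join approach, run on SQLite through Python's sqlite3. Two things here are my own assumptions: the user table is called `users` (to avoid the reserved word), and louis gets id 1 so that he matches the USERID values in the user question table.

```python
import sqlite3


def answers_with_default(conn):
    """Every user paired with every question; missing answers become 'Not Answered'."""
    return conn.execute(
        """
        SELECT u.username, q.question, COALESCE(a.answer, 'Not Answered') AS answer
        FROM users AS u
        CROSS JOIN question AS q
        LEFT JOIN userquestion AS uq
               ON uq.userid = u.id AND uq.questionid = q.id
        LEFT JOIN answer AS a
               ON a.id = uq.answerid
        ORDER BY u.id, q.id
        """
    ).fetchall()


def demo_site():
    """Recreate the four tables from the question with their sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE question (id INTEGER PRIMARY KEY, question TEXT);
        CREATE TABLE answer (id INTEGER PRIMARY KEY, answer TEXT);
        CREATE TABLE userquestion (id INTEGER PRIMARY KEY, userid INTEGER,
                                   questionid INTEGER, answerid INTEGER);
        INSERT INTO users VALUES (1, 'louis');
        INSERT INTO question VALUES
            (0, 'What is your favorite color'),
            (1, 'What is your gender'),
            (2, 'What is your favorite t.v. show');
        INSERT INTO answer VALUES (5, 'Blue'), (8, 'female');
        INSERT INTO userquestion VALUES (0, 1, 0, 5), (1, 1, 1, 8);
        """
    )
    return conn
```

The cross join is what guarantees a row for question 2 even though louis has no entry for it; the two left joins then find nothing, and COALESCE fills in the default.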

Recursive SQL Server query

In a table reviewers with a structure like this:
reviewer | reviewee
===================
2 | 1
3 | 2
4 | 3
5 | 4
In a function call, I know both a reviewer id and a reviewee id (the owner of the item the reviewer is looking to retrieve).
I'm now trying to write a query that walks the entries in the reviewers table, starting with the reviewer and ending at the reviewee's id (which it then matches against the reviewee id I know). In other words, I'm trying to find out whether there is any connection between reviewee and reviewer at all.
Is it possible to do this in a single query?
You can do this:
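WITH CTE
AS
(
    -- anchor: the direct reviewers of the known reviewee
    SELECT reviewer, reviewee
    FROM TableName
    WHERE reviewee = @revieweeID
    UNION ALL
    -- recursive step: whoever reviews a reviewer already in the chain
    SELECT p.reviewer, p.reviewee
    FROM CTE c
    INNER JOIN TableName p ON p.reviewee = c.reviewer
)
SELECT *
FROM CTE;
-- WHERE reviewer = @reviewerID;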
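To check the chain walk end to end, here is a minimal sketch on SQLite via Python's sqlite3, using the four rows from the question. It starts at the known reviewee and climbs upward until it either finds the reviewer or runs out of rows. The function names are illustrative. One deliberate difference: it uses UNION rather than UNION ALL so that a cycle in the data cannot loop forever. SQLite accepts that in a recursive CTE, while SQL Server requires UNION ALL there and relies on MAXRECURSION instead.

```python
import sqlite3


def is_connected(conn, reviewer_id, reviewee_id):
    """True if reviewer_id reviews reviewee_id directly or through a chain."""
    row = conn.execute(
        """
        WITH RECURSIVE chain(reviewer, reviewee) AS (
            -- anchor: direct reviewers of the known reviewee
            SELECT reviewer, reviewee FROM reviewers WHERE reviewee = :reviewee
            UNION
            -- recursive step: whoever reviews a reviewer already in the chain
            SELECT p.reviewer, p.reviewee
            FROM chain AS c
            JOIN reviewers AS p ON p.reviewee = c.reviewer
        )
        SELECT 1 FROM chain WHERE reviewer = :reviewer LIMIT 1
        """,
        {"reviewer": reviewer_id, "reviewee": reviewee_id},
    ).fetchone()
    return row is not None


def demo_reviewers():
    """Load the reviewer/reviewee pairs from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviewers (reviewer INTEGER, reviewee INTEGER)")
    conn.executemany(
        "INSERT INTO reviewers VALUES (?, ?)",
        [(2, 1), (3, 2), (4, 3), (5, 4)],
    )
    return conn
```

With the sample data, `is_connected(conn, 4, 1)` follows 1 <- 2 <- 3 <- 4 and finds the link, while `is_connected(conn, 1, 4)` does not, since the relation only runs in one direction.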