Applying distinct in multiple columns in SQL Server

I am trying to get distinct results by only one column (message). I tried:
SELECT DISTINCT
[id], [message]
FROM Example table
GROUP BY [message]
But it doesn't return the desired result.
How can I do this?
Example table:
id | Message |
-- ------------
1 | mike |
2 | mike |
3 | star |
4 | star |
5 | star |
6 | sky |
7 | sky |
8 | sky |
Result table:
id | Message |
-- ------------
1 | mike |
3 | star |
6 | sky |

Group by the column you want to be unique and use an aggregate function on the other column. You want the lowest id for every message, so use MIN():
select min(id) as id,
message
from your_table
group by message
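To check the GROUP BY / MIN() approach against the example data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here (an assumption made only so the example is runnable anywhere); the table name your_table is taken from the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO your_table (id, message) VALUES (?, ?)",
    [(1, "mike"), (2, "mike"), (3, "star"), (4, "star"),
     (5, "star"), (6, "sky"), (7, "sky"), (8, "sky")],
)

# One row per message, keeping the lowest id in each group.
rows = conn.execute(
    "SELECT MIN(id) AS id, message "
    "FROM your_table "
    "GROUP BY message "
    "ORDER BY id"
).fetchall()
print(rows)
```

The ORDER BY is added only to make the row order deterministic; GROUP BY on its own does not guarantee any particular order.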

Related

SQL add new line if value is in second column also

I am trying to pull data from a table so that each score a person has appears on its own row: if a person has two scores, the second score should be on a new line. If a person has no scores, nothing should be returned for them. My query returns the first score if the person has one, and otherwise the second. What I'm not sure about is how to return the second score on a new line when the person has both.
table 1
+---------+--------+--------+
| name | score1 | score2 |
+---------+--------+--------+
| jim | null | 87 |
| doug | 21 | 45 |
| brandon | null | null |
| susy | 11 | null |
+---------+--------+--------+
The result my query gives is
+------+----+
| jim | 87 |
| doug | 21 |
| susy | 11 |
+------+----+
Wanted output
+------+----+
| jim | 87 |
| doug | 21 |
| doug | 45 |
| susy | 11 |
+------+----+
The query I wrote is
SELECT
name
,COALESCE(score1, score2)
FROM
table
WHERE
score1 IS NOT NULL
OR score2 IS NOT NULL
ORDER BY
name;
Treat this as two separate queries and combine the results together with UNION ALL. You'll want UNION ALL in this case and not just UNION so you get two rows returned in the case where the person has the same score in both columns.
SELECT name, Score1 as score
FROM table1
WHERE Score1 IS NOT NULL
UNION ALL
SELECT name, Score2 as score
FROM table1
WHERE Score2 IS NOT NULL
ORDER BY name, score;
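The difference between UNION ALL and UNION described above can be seen in a small sketch. It uses Python's sqlite3 module rather than SQL Server (an assumption for runnability), and adds one hypothetical row, "ann", who has the same score in both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, score1 INTEGER, score2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("jim", None, 87), ("doug", 21, 45), ("brandon", None, None),
     ("susy", 11, None),
     ("ann", 50, 50)],  # hypothetical extra row: same score in both columns
)

# Same two-part query, parameterised only by the set operator.
query = """
SELECT name, score1 AS score FROM table1 WHERE score1 IS NOT NULL
{op}
SELECT name, score2 AS score FROM table1 WHERE score2 IS NOT NULL
ORDER BY name, score
"""

union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()
union_rows = conn.execute(query.format(op="UNION")).fetchall()

print(union_all_rows)  # keeps both of ann's rows
print(union_rows)      # collapses ann's duplicate rows into one
```

With UNION ALL, "ann" appears twice; with plain UNION, the duplicate row is removed, which is usually not what you want when each column represents a separate score.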
I would recommend CROSS APPLY:
SELECT t.name, v.score
FROM table1 t CROSS APPLY
(VALUES (t.score1), (t.score2)) v(score)
WHERE v.score IS NOT NULL
ORDER BY t.name;
This is usually the most efficient way to unpivot data in SQL Server.

Window functions limited by value in separate column

I have a "responses" table in my postgres database that looks like
| id | question_id |
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
I want to produce a table with the response and question id, as well as the id of the previous response with that same question id, as such
| id | question_id | lag_resp_id |
| 1 | 1 | |
| 2 | 2 | |
| 3 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 4 |
Obviously pulling "lag(responses.id) over (order by responses.id)" will pull the previous response id regardless of question_id. I attempted the below subquery, but I know it is wrong since I am basically making a table of all lag ids for each question id in the subquery.
select
responses.question_id,
responses.id as response_id,
(select
lag(r2.id, 1) over (order by r2.id)
from
responses as r2
where
r2.question_id = responses.question_id
)
from
responses
I don't know if I'm on the right track with the subquery, or if I need to do something more advanced (which may involve "partition by", which I do not know how to use).
Any help would be hugely appreciated.
Use partition by. There is no need for a correlated subquery here.
select id,question_id,
lag(id) over (partition by question_id order by id) lag_resp_id
from responses
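As a quick check of the PARTITION BY version, here is a minimal sketch using Python's sqlite3 module in place of Postgres (an assumption for runnability; it needs the bundled SQLite to be version 3.25 or later, which is when window functions were added). An explicit ORDER BY is added to the outer query so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (id INTEGER PRIMARY KEY, question_id INTEGER)"
)
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 2), (5, 2)],
)

# LAG looks back one row *within* each question_id partition,
# so the first response to each question gets NULL (None in Python).
rows = conn.execute("""
    SELECT id,
           question_id,
           LAG(id) OVER (PARTITION BY question_id ORDER BY id) AS lag_resp_id
    FROM responses
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

PARTITION BY restarts the window for each question_id, which is what the correlated subquery was trying to emulate.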

SQL Group by one column and decide which column to choose

Let's say I have data like this :
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 2 | 20 | B | 20 |
| 3 | 10 | C | 30 |
| 4 | 10 | D | 80 |
I would like to group rows by code value, but get real rows back (not some aggregate function).
I know that just
select *
from table
group by code
won't work because the database doesn't know which row to return when several rows share the same code.
So my question is how to tell the database to select (for example) the row with the lowest number, so in my case:
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 3 | 10 | C | 30 |
P.S.
I know how to do this with PARTITION, but this is only allowed in Oracle databases and can't be built with the JPA criteria builder (which is my ultimate goal).
Why don't you use code like this?
SELECT
id,
code,
name,
number
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY code ORDER BY number ASC) AS RowNo
FROM table
) s
WHERE s.RowNo = 1
You can also look at this site: Data Partitioning
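Since the question rules out window functions, one alternative (a different technique from the ROW_NUMBER() answer above, not something proposed in the thread) is a correlated subquery that keeps the rows whose number equals the minimum for their code. A minimal sketch with Python's sqlite3 module follows; the table name items is hypothetical, chosen because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical table name for the data in the question.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, code INTEGER, "
    "name TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(1, 20, "A", 10), (2, 20, "B", 20), (3, 10, "C", 30), (4, 10, "D", 80)],
)

# Keep each row whose number is the smallest among rows with the same code.
rows = conn.execute("""
    SELECT t.id, t.code, t.name, t.number
    FROM items AS t
    WHERE t.number = (SELECT MIN(t2.number)
                      FROM items AS t2
                      WHERE t2.code = t.code)
    ORDER BY t.id
""").fetchall()
print(rows)
```

Unlike ROW_NUMBER(), this form returns every tied row when two rows with the same code share the lowest number, so a tie-breaker would be needed if exactly one row per code is required.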

Error in executing two groupbys in sparkSQL

I am new to Spark SQL and was experimenting with some queries.
This is the query I am trying to execute:
sqlContext.sql("SELECT id, category, AVG(mark) FROM data GROUP BY id, category")
I am not getting the proper output when I run the query.
Instead of the actual value of category, I am getting values such as 1, 2, 3.
I have been stuck on this weird error for a long time,
but when I run a simple SELECT statement or a single GROUP BY, it works perfectly:
sqlContext.sql("SELECT id, category FROM data")
sqlContext.sql("SELECT id, AVG(mark) FROM data GROUP BY id")
What is wrong? Does Spark SQL have a problem with grouping by multiple columns?
Right now I am running this more complex query:
sqlContext.sql("SELECT data.id, data.category, AVG(id_avg.met_avg) FROM (SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data GROUP BY data.category, data.id")
This works, but it takes longer to execute.
Please help.
Sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output should be:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
Please try this query:
SELECT
data.id
, data.category
, AVG(mark)
FROM data
GROUP BY
data.id
, data.category
Based on this sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output WILL be this:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
and, the following expected row cannot be produced using group by:
| 5 | a | 30
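The expected result of grouping by both columns can be reproduced on the sample data with a minimal sketch. It uses Python's sqlite3 module rather than Spark SQL (an assumption for runnability), so it only illustrates what standard SQL semantics give for this query, not the behaviour of any particular Spark version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, category TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(1, "a", 40), (2, "b", 44), (3, "a", 50), (4, "b", 40), (1, "a", 30)],
)

# One output row per distinct (id, category) pair; the two rows for
# (1, 'a') are averaged together.
rows = conn.execute("""
    SELECT id, category, AVG(mark) AS avg_mark
    FROM data
    GROUP BY id, category
    ORDER BY id, category
""").fetchall()

for row in rows:
    print(row)
```

The category column comes back with its original text values, which is the behaviour the question expected from the two-column GROUP BY.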
That is a bug in Spark SQL.
Try using the next version; it's fixed there.
I got the proper output by using spark-1.0.2.
It also worked with pure Scala code. Try either of them :)

Mysql query: combine two queries

Below is an over-simplified version of table I'm using:
fruits
+-------+---------+
| id | type |
+-------+---------+
| 1 | apple |
| 2 | orange |
| 3 | banana |
| 4 | apple |
| 5 | apple |
| 6 | apple |
| 7 | orange |
| 8 | apple |
| 9 | apple |
| 10 | banana |
+-------+---------+
Following are the two queries of interest:
SELECT * FROM fruits WHERE type='apple' LIMIT 2;
SELECT COUNT(*) AS total FROM fruits WHERE type='apple'; -- output: 6
I want to combine these two queries so that the results looks like this:
+-------+---------+---------+
| id | type | total |
+-------+---------+---------+
| 1 | apple | 6 |
| 4 | apple | 6 |
+-------+---------+---------+
The output has to be limited to 2 records but it should also contain the total number of records of the type apple.
How can this be done with 1 query?
SELECT *, (SELECT COUNT(*) FROM fruits WHERE type='apple') AS total
FROM fruits WHERE type='apple' LIMIT 2;
Depending on how MySQL interprets it, it may cache the inner query so that it doesn't have to reevaluate it for every record.
Another way to do it is with a nested query and a join (this would be useful if you need more than one fruit type, for example):
SELECT fruits.*, counts.total
FROM fruits
INNER JOIN (SELECT type, COUNT(*) AS total FROM fruits GROUP BY type) counts ON (fruits.type = counts.type)
WHERE fruits.type='apple'
LIMIT 2;
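Both variants above (scalar subquery and join against a per-type count) can be compared side by side in a small sketch. It uses Python's sqlite3 module instead of MySQL (an assumption for runnability), and adds an ORDER BY so that LIMIT 2 picks the same two rows every time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?)",
    [(1, "apple"), (2, "orange"), (3, "banana"), (4, "apple"), (5, "apple"),
     (6, "apple"), (7, "orange"), (8, "apple"), (9, "apple"), (10, "banana")],
)

# Variant 1: scalar subquery computing the total number of apples.
scalar = conn.execute("""
    SELECT f.id, f.type,
           (SELECT COUNT(*) FROM fruits WHERE type = 'apple') AS total
    FROM fruits AS f
    WHERE f.type = 'apple'
    ORDER BY f.id
    LIMIT 2
""").fetchall()

# Variant 2: join against a derived table of per-type counts.
joined = conn.execute("""
    SELECT f.id, f.type, c.total
    FROM fruits AS f
    INNER JOIN (SELECT type, COUNT(*) AS total
                FROM fruits
                GROUP BY type) AS c
        ON f.type = c.type
    WHERE f.type = 'apple'
    ORDER BY f.id
    LIMIT 2
""").fetchall()

print(scalar)
print(joined)
```

The join form generalises more easily: dropping the WHERE clause gives every fruit row alongside the total for its own type.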
You should use SQL_CALC_FOUND_ROWS for that.
SELECT SQL_CALC_FOUND_ROWS * FROM fruits WHERE type='apple' LIMIT 2;
will return the first two apple rows and remember how many rows would have been returned without the LIMIT clause.
SELECT FOUND_ROWS();
will return how many apples would have been found without the LIMIT clause.