If I have a query like this
SELECT * FROM table1
I get a result something like this:
How can I write a query from the same table that returns me something like this:
The value_name entries have to become columns, and the value column has to supply their values.
Also note that the ids are repeated and each id's description is always the same.
I'm working with PostgreSQL.
If you know the values in advance, you can use conditional aggregation:
select id, description,
max(value) filter (where value_name = 'FE') as fe,
max(value) filter (where value_name = 'H2O') as h2o,
max(value) filter (where value_name = 'N') as n
from t
group by id, description;
If you don't know the names, then you cannot accomplish this with a single SQL query. You need to use dynamic SQL or use an alternate data representation such as JSON.
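The dynamic SQL route mentioned above could look roughly like the following. This is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, the sample rows and the helper quote_ident are hypothetical, and CASE WHEN is used in place of FILTER purely for portability. The idea is to read the distinct names first, then generate one conditional aggregate per name.

```python
import sqlite3

# Hypothetical sample rows in the long layout the question describes:
# (id, description, value_name, value).
rows = [
    (1, "sample one", "FE", 10.0),
    (1, "sample one", "H2O", 20.0),
    (1, "sample one", "N", 30.0),
    (2, "sample two", "FE", 40.0),
    (2, "sample two", "N", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, description TEXT, value_name TEXT, value REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Step 1: discover the future column names at run time.
names = [r[0] for r in conn.execute("SELECT DISTINCT value_name FROM t ORDER BY value_name")]


def quote_ident(name):
    # Hypothetical helper: double-quote an identifier, escaping embedded quotes.
    return '"' + name.replace('"', '""') + '"'


# Step 2: build one conditional aggregate per name. The name itself is compared
# through a bound parameter; only the quoted alias is spliced into the SQL text.
select_parts = ", ".join(
    "MAX(CASE WHEN value_name = ? THEN value END) AS " + quote_ident(n.lower())
    for n in names
)
sql = "SELECT id, description, " + select_parts + " FROM t GROUP BY id, description ORDER BY id"

cursor = conn.execute(sql, names)
columns = [d[0] for d in cursor.description]
pivoted = cursor.fetchall()
```

Here columns holds the generated header and pivoted holds one row per id; a name that is missing for an id comes back as NULL (None in Python).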
Related
I am using spark sql. Let's say I have a table like this
ID,Grade
1,A
2,B
1,A
2,C
I want to build arrays that contain all the grades for each id, but I don't want to collapse the table with a group by. I am trying to keep every row, so each ID appears as many times as in the input. My desired output is the following:
ID,Grade
1,[A,A]
1,[A,A]
2,[B,C]
2,[B,C]
My query is the following
SELECT array_join(collect_list(GRADE), ",") AS GRADES
OVER (PARTITION BY ID)
FROM table
However, I get an error like this:
AnalysisException: "grouping expressions sequence is empty, and 'ID' is not an aggregate function.
Any idea how to fix my query? Thank you
In your query, collect_list is the aggregate function, so if you want to use a window you need to apply the OVER clause directly to collect_list:
SELECT id,
array_join(collect_list(GRADE) OVER (PARTITION BY ID) , ",") AS GRADES
FROM table
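The same placement of the OVER clause can be tried outside Spark. The sketch below is only an analogy: it runs on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later), uses group_concat as a stand-in for array_join(collect_list(...), ","), and the table name grades is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (id INTEGER, grade TEXT)")
# The four rows from the question.
conn.executemany("INSERT INTO grades VALUES (?, ?)", [(1, "A"), (2, "B"), (1, "A"), (2, "C")])

# The OVER clause is attached to the aggregate itself, so every input row is
# kept and each row carries the full list for its id.
windowed = conn.execute(
    """
    SELECT id, group_concat(grade, ',') OVER (PARTITION BY id) AS grades
    FROM grades
    ORDER BY id
    """
).fetchall()
```

Note that the window has no ORDER BY: adding one would switch to the default running frame and produce growing lists rather than the full list on every row. The order of items inside each list is not guaranteed.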
I have a table in a PostgreSQL database, for example:
Is there a way to get the output below with good performance? For each group, I want the full row that matches some condition, such as userid=100, plus extra fields computed by aggregate functions.
Output (with userid=100 as the condition; it could be any other condition):
Note: the data is dynamic; fields such as content and seen hold arbitrary values.
I have used this SQL query, but it only returns two fields:
SELECT groupid,
string_agg(text(userid), ', ') AS lst_userids
FROM t1
GROUP BY groupid
Thanks for any help!
You seem to want something like this:
SELECT min(id) as id, groupid,
string_agg(text(userid), ', ') AS lst_userids,
max(case when seen then content end) as content,
bool_or(seen) as seen
FROM t1
GROUP BY groupid;
I am guessing what the actual logic is, but you can definitely have multiple columns in an aggregation query.
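To see several aggregates side by side in one grouped query, here is a small sketch. It is an approximation under stated assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL, group_concat replaces string_agg, MAX(seen) over 0/1 integers replaces bool_or, and the rows are hypothetical because the question's table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (id INTEGER, groupid INTEGER, userid INTEGER, content TEXT, seen INTEGER)"
)
# Hypothetical rows; seen is stored as 0/1.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, "hello", 0),
        (2, 10, 200, "world", 1),
        (3, 20, 100, "later", 0),
    ],
)

grouped = conn.execute(
    """
    SELECT MIN(id) AS id,
           groupid,
           group_concat(userid, ', ') AS lst_userids,        -- string_agg stand-in
           MAX(CASE WHEN seen THEN content END) AS content,  -- content taken from seen rows only
           MAX(seen) AS seen                                 -- bool_or stand-in (0 or 1)
    FROM t1
    GROUP BY groupid
    ORDER BY groupid
    """
).fetchall()
```

Each aggregate is evaluated independently over the same group, which is why any number of them can appear in the select list. A group with no seen row gets NULL for content.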
I'm using Redshift's LISTAGG function to group tables by pairs:
SELECT id, LISTAGG(data, ', ') FROM ... GROUP BY 1;
This transforms tables like:
1 "data_A"
1 "data_B"
2 "data_C"
2 "data_D"
To:
1 "data_A, data_B"
2 "data_C, data_D"
However, this means that we still have two columns, but it would be nice to create three columns from the data:
1 "data_A" "data_B"
2 "data_C" "data_D"
Assuming we know that we can only have two items per id, can such a three column scheme be implemented in Redshift, using LISTAGG or some other function combination? As an added bonus, can we sort the data items in the columns, so that the data in the left column is smaller than the data in the right?
Instead of listagg(), you can just use aggregation. Because you want two values, min() and max() work:
SELECT id, MIN(data), MAX(data)
FROM ...
GROUP BY 1;
If you could have only one value for a given id, you can phrase this as:
SELECT id, MIN(data),
(CASE WHEN MIN(data) <> MAX(data) THEN MAX(data) END)
FROM ...
GROUP BY 1;
This puts NULL into the third column, if there is only one value for data.
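A quick way to check this behaviour is to run the second query on the question's rows plus one id that has a single item. The sketch below assumes SQLite (via Python's sqlite3) rather than Redshift; the table name pairs, the column aliases, and the row for id 3 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (id INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [
        (1, "data_B"), (1, "data_A"),  # rows from the question, inserted out of order
        (2, "data_C"), (2, "data_D"),
        (3, "data_E"),                 # hypothetical id with a single item
    ],
)

three_columns = conn.execute(
    """
    SELECT id,
           MIN(data) AS first_item,
           CASE WHEN MIN(data) <> MAX(data) THEN MAX(data) END AS second_item
    FROM pairs
    GROUP BY id
    ORDER BY id
    """
).fetchall()
```

Because MIN always lands in the left column and MAX in the right, the two items come out sorted regardless of insertion order, which also covers the sorting request in the question.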
I want to group some data returned from a SQL Server 2012 database, and I need to work out how to group by only certain fields.
The following SQL works fine:
SELECT MessageId, SearchedString, COUNT(SearchedString) AS [SearchedStringCount], MAX(percentage) AS TopPercent
from (
select MessageId, SearchedString, Percentage
from table
where MessageId = '15'
) T
GROUP BY MessageId, SearchedString
But as soon as I add another field to the select, SQL Server asks for it to be included in the group by, which isn't what I need.
How can I add another field to the above SQL, without having it be included in the Group By?
Ideally, I'm looking to include a Date value, like this:
select MessageId, SearchedString, COUNT(SearchedString) AS [SearchedStringCount], MAX(percentage) AS TopPercent, CAST(ScreenedDate AS DATE) AS DateScreened
from (
select MessageId, SearchedString, Percentage, ScreenedDate
from table
where MessageId = '15'
) T
GROUP BY MessageId, SearchedString
Simply wrap the extra column in an aggregate function such as max or min (this works if the column is always the same within a group, or if you don't care which of its values you get), like this:
select MessageId, SearchedString, COUNT(SearchedString) AS [SearchedStringCount], MAX(percentage) AS TopPercent, max(CAST(ScreenedDate AS DATE)) AS DateScreened
from (
select MessageId, SearchedString, Percentage, ScreenedDate
from table
where MessageId = '15'
) T
GROUP BY MessageId, SearchedString
When you use group by, every column that is not aggregated must be listed in the group by clause. So if you want to include another column, there are several options, depending on which result you want.
Option 1) The column is always the same within a group: max or min covers it.
Option 2) The column differs but it doesn't matter which value you take: max or min covers that as well.
Option 3) You have to pick a specific one (such as the latest), and then the answer differs depending on the rule you choose.
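For option 3, one common pattern is to rank the rows inside each group and keep the top-ranked one. The following is a sketch under assumptions, not the only way to do it: it runs on SQLite through Python's sqlite3 instead of SQL Server, the table name screening and its rows are hypothetical, and the rule chosen here is "take the date from the row with the highest percentage".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE screening (MessageId TEXT, SearchedString TEXT, Percentage REAL, ScreenedDate TEXT)"
)
# Hypothetical rows for MessageId '15'.
conn.executemany(
    "INSERT INTO screening VALUES (?, ?, ?, ?)",
    [
        ("15", "alpha", 40.0, "2016-01-01"),
        ("15", "alpha", 90.0, "2016-01-03"),
        ("15", "beta", 70.0, "2016-01-02"),
    ],
)

top_rows = conn.execute(
    """
    SELECT MessageId, SearchedString, SearchedStringCount,
           Percentage AS TopPercent, ScreenedDate AS DateScreened
    FROM (
        SELECT MessageId, SearchedString, Percentage, ScreenedDate,
               -- group size, computed without collapsing the rows
               COUNT(SearchedString) OVER (PARTITION BY MessageId, SearchedString)
                   AS SearchedStringCount,
               -- rank 1 is the row with the highest percentage in its group
               ROW_NUMBER() OVER (PARTITION BY MessageId, SearchedString
                                  ORDER BY Percentage DESC) AS rn
        FROM screening
        WHERE MessageId = '15'
    ) AS T
    WHERE rn = 1
    ORDER BY SearchedString
    """
).fetchall()
```

Changing the ORDER BY inside ROW_NUMBER (for example to ScreenedDate DESC for "the latest") changes which row supplies the extra column, without touching the grouping columns.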
Using SQL Server you're obliged to:
Option 1: use aggregates in the SELECT list
or:
Option 2: add non-aggregated columns in the GROUP BY list
Other databases (for example MySQL) have what they call extended GROUP BY where they "... extend the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause...". In this case they return just "any value" for the non-aggregated column in the SELECT list not included in the GROUP BY. This makes sense only if you're sure that - in your query - ALL non-aggregated columns in the select list will return the same value.
With other databases (not SQL Server) I have coded a user-defined aggregate function (any_value()) that returns just the first retrieved value. It's useful if you pick option 1, because you don't have to waste CPU cycles computing aggregates you don't need.
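As an illustration of what such a user-defined aggregate might look like, here is a sketch using Python's sqlite3 and its create_aggregate hook. It is not the author's function: the class AnyValue, the registered name any_value_udf, and the msgs table with its rows are all hypothetical.

```python
import sqlite3


class AnyValue:
    """Aggregate that keeps the first value it is handed and ignores the rest."""

    def __init__(self):
        self.value = None
        self.seen = False

    def step(self, value):
        # Only the first call stores anything; later rows cost one flag check.
        if not self.seen:
            self.value = value
            self.seen = True

    def finalize(self):
        return self.value


conn = sqlite3.connect(":memory:")
conn.create_aggregate("any_value_udf", 1, AnyValue)

conn.execute("CREATE TABLE msgs (MessageId TEXT, SearchedString TEXT, ScreenedDate TEXT)")
# Hypothetical rows where the date is constant within each group (option 1).
conn.executemany(
    "INSERT INTO msgs VALUES (?, ?, ?)",
    [
        ("15", "alpha", "2016-01-01"),
        ("15", "alpha", "2016-01-01"),
        ("15", "beta", "2016-01-02"),
    ],
)

picked = conn.execute(
    """
    SELECT MessageId, SearchedString, COUNT(*) AS n, any_value_udf(ScreenedDate) AS DateScreened
    FROM msgs
    GROUP BY MessageId, SearchedString
    ORDER BY SearchedString
    """
).fetchall()
```

The result is only well defined when the column really is constant within each group; otherwise which value is "first" depends on the order in which the engine feeds rows to the aggregate.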
I have a query which returns a number of ints and varchars.
I want the result Distinct but by only one of the ints.
SELECT DISTINCT
t.TaskID ,
td.[TestSteps]
FROM SOX_Task t
snip
Is there a simple way to do this?
DISTINCT doesn't work that way ... it ensures that entire rows are not duplicated. Besides, if it did, how would it decide which values should appear in other columns?
What you probably want to do here is a GROUP BY on the column you want to be distinct, and then apply an appropriate aggregate function to the other columns to get the values you want (e.g. MIN, MAX, SUM, AVG).
For example, the following returns a distinct list of task IDs with the maximum number of steps for that task. That may not be exactly what you want, but it should give you the general idea.
SELECT t.TaskID, MAX(t.TestSteps)
FROM SOX_Task t
GROUP BY t.TaskID
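The difference between DISTINCT and GROUP BY is easy to see on a few rows. The sketch below assumes SQLite through Python's sqlite3 rather than the asker's database, and the rows (including the choice of an integer TestSteps column) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SOX_Task (TaskID INTEGER, TestSteps INTEGER)")
# Hypothetical rows: task 1 appears with two different step counts, one of them twice.
conn.executemany("INSERT INTO SOX_Task VALUES (?, ?)", [(1, 3), (1, 3), (1, 5), (2, 4)])

# DISTINCT removes duplicate rows as a whole, so task 1 still appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT t.TaskID, t.TestSteps FROM SOX_Task t ORDER BY t.TaskID, t.TestSteps"
).fetchall()

# GROUP BY yields one row per TaskID; MAX decides which TestSteps value survives.
grouped_rows = conn.execute(
    "SELECT t.TaskID, MAX(t.TestSteps) FROM SOX_Task t GROUP BY t.TaskID ORDER BY t.TaskID"
).fetchall()
```

Swapping MAX for MIN, SUM, or another aggregate changes which value represents each task while keeping exactly one row per TaskID.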
I want the result Distinct but by only one of the ints.
This:
SELECT DISTINCT t.taskid
FROM SOX_TASK t
...will return a list of unique taskid values - there will not be any duplicates.
If you want to return a specific value, you need to specify it in the WHERE clause:
SELECT t.*
FROM SOX_TASK t
WHERE t.taskid = ?
That will return all the rows with the specific taskid value. If you want a distinct list of the values associated with that taskid value:
SELECT DISTINCT t.*
FROM SOX_TASK t
WHERE t.taskid = ?
GROUP BY is another means of getting distinct/unique values, and it's my preference to use, but in order to use GROUP BY we need to know the columns in the table and what grouping you want.