SQL Group By, case when on aggregated - sql

I can't wrap my mind around the following task:
I have a table with a key column that represents some kind of group id. I would like to GROUP BY this key and, in the resulting table, show some columns from this table depending on the column value:
If all the values of col1 within a group are equal (same number or text), show that exact value; if at least one of them differs, show something like "Others".
Example
key col1
1 1
1 1
1 1
2 4
2 5
Resulted table:
key col1
1 1
2 Others
Postgres 9.4, if this matters.

You can use aggregation and case. If col1 is numeric, cast it to text so that both branches of the case have the same type:
select key,
(case when min(col1) = max(col1) then min(col1)::text
else 'Others'
end) as col1
from t
group by key;
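To try the min/max comparison without a Postgres server, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. It assumes SQLite is an acceptable stand-in, so the cast is written as CAST(... AS TEXT) instead of the Postgres ::text form, and the table name t comes from the answer above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("key" INTEGER, col1 INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (2, 4), (2, 5)],
)

# If the smallest and largest value in a group match, every value matches.
query = """
SELECT "key",
       CASE WHEN MIN(col1) = MAX(col1) THEN CAST(MIN(col1) AS TEXT)
            ELSE 'Others'
       END AS col1
FROM t
GROUP BY "key"
ORDER BY "key"
"""
rows = conn.execute(query).fetchall()
print(rows)  # -> [(1, '1'), (2, 'Others')]
```

Note that MIN and MAX skip NULLs, so a group whose non-null values all agree still reports that value.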

Related

QUERY returning row number and substring of SELECT

In Google Spreadsheet I have such table:
A B C
----------------
One [0] 3 2
Two [1] 6 0
Three [5] 1 1
Four [3] 1 2
Five [2] 1 0
I would like to generate SORT() or QUERY() which will return a two-column table sorted by a sum of B and C. Also the A column should be substringed to drop last four characters.
So the result should be:
Two 6
One 5
Four 3
Three 2
Five 1
I tried SELECT MID(A,1,-5), (B+C) ORDER BY B+C but it does not work well. All suggestions welcome! Thank you in advance!
Alternatively, also try (in google spreadsheet)
=query({ArrayFormula(regexextract(A2:A, "^(.+) \[")), B2:C}, "Select Col1, sum(Col2)+sum(Col3) where Col2 is not null group by Col1 order by sum(Col2)+sum(Col3) label sum(Col2)+sum(Col3)''",0)
or, depending on your locale:
=query({ArrayFormula(regexextract(A2:A; "^(.+) \["))\ B2:C}; "Select Col1, sum(Col2)+sum(Col3) where Col2 is not null group by Col1 order by sum(Col2)+sum(Col3) label sum(Col2)+sum(Col3)''";0)
Also see this example spreadsheet.
This will help you
=QUERY(B2:D7,"Select B,SUM(C)+SUM(D) group by B order by SUM(C)+SUM(D) desc label SUM(C)+SUM(D) 'SUM'")
If you are new to SQL:
GROUP BY clause: whenever aggregate functions such as MAX, MIN, SUM, COUNT or AVG are selected alongside a plain column, that plain column goes in the GROUP BY clause.
LABEL renames a column, like the AS keyword in SQL.
ORDER BY sorts the result, ascending (asc) or descending (desc).
Example Spreadsheet link
If you are using MySQL, you can use SUBSTRING_INDEX, because the string part of A is not fixed length:
select SUBSTRING_INDEX(a,' ',1) as A , b+c as C
from your_table
order by C DESC
For SQL Server you can use SUBSTRING together with CHARINDEX:
select SUBSTRING(a, 1, CHARINDEX(' ', a) - 1) as A , b+c as C
from your_table
order by C DESC
If you're using SQL (LEFT and LEN as written here are SQL Server functions), try this:
SELECT
LEFT(A,LEN(A)-4) AS Col1
,(B+C) AS Col2
FROM table
ORDER BY Col2 DESC

SQL Server: Sum of Unique values for multiple columns

Using SQL Server, I have a table as shown in the sample table below. I need the sum of all the unique values for each of the columns BooksOrder, StationaryOrder, and PrintingOrder.
SAMPLE TABLE:
KeyIDCustomer BooksOrder StationaryOrder PrintingOrder
29945843 1070756 1891514 198876
29945843 1070756 1893827 198876
29945843 1070758 1891514 198876
29945843 1070758 1893827 198876
I am using the code below to achieve this goal.
Select DISTINCT KeyIDCustomerID,
Sum(Case when BooksOrder is not null then 1 else 0 End) TotalBookOrders,
Sum(Case when StationaryOrder is not null then 1 else 0 End) TotalStationaryOrder,
Sum(Case when PrintingOrder is not null then 1 else 0 End)TotalPrintingOrder
With this code I am getting the results below:
KeyIDCustomerID TotalBookOrders TotalStationaryOrder TotalPrintingOrder
29945843 4 4 4
I expect the results to be like this
KeyIDCustomerID TotalBookOrders TotalStationaryOrder TotalPrintingOrder
29945843 2 2 1
Is there a way I can accomplish this goal in SQL?
Thanks
I think the proper term for
sum of all the unique values per the columns
is "count of unique values".
COUNT(DISTINCT column_name) returns the number of unique, non-null values in column_name:
Select
KeyIDCustomerID,
COUNT(DISTINCT BooksOrder) as TotalBookOrders,
COUNT(DISTINCT StationaryOrder) as TotalStationaryOrder,
COUNT(DISTINCT PrintingOrder) as TotalPrintingOrder
FROM SAMPLE_TABLE
GROUP BY KeyIDCustomerID
Do a GROUP BY and use COUNT(DISTINCT column) to count:
Select KeyIDCustomerID,
COUNT(distinct BooksOrder) TotalBookOrders,
COUNT(distinct StationaryOrder) TotalStationaryOrder,
COUNT(distinct PrintingOrder) TotalPrintingOrder
from tablename
group by KeyIDCustomerID
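As a quick check of the COUNT(DISTINCT ...) approach against the sample rows from the question, here is a small sketch using Python's sqlite3 module. The table name sample_table is a placeholder, and SQLite stands in for SQL Server; COUNT(DISTINCT col) behaves the same way in both for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sample_table (
           KeyIDCustomerID INTEGER,
           BooksOrder INTEGER,
           StationaryOrder INTEGER,
           PrintingOrder INTEGER)"""
)
# The four sample rows from the question.
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?, ?, ?)",
    [
        (29945843, 1070756, 1891514, 198876),
        (29945843, 1070756, 1893827, 198876),
        (29945843, 1070758, 1891514, 198876),
        (29945843, 1070758, 1893827, 198876),
    ],
)

# COUNT(DISTINCT col) counts each non-null value once per group.
query = """
SELECT KeyIDCustomerID,
       COUNT(DISTINCT BooksOrder)      AS TotalBookOrders,
       COUNT(DISTINCT StationaryOrder) AS TotalStationaryOrder,
       COUNT(DISTINCT PrintingOrder)   AS TotalPrintingOrder
FROM sample_table
GROUP BY KeyIDCustomerID
"""
rows = conn.execute(query).fetchall()
print(rows)  # -> [(29945843, 2, 2, 1)]
```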

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at TIMESTAMP NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing ID's, but I know this could never happen due to my trinary primary key across those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try (the subquery is given an alias, which many Postgres versions require):
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
) ranked
WHERE x = 1
If you don't want to use a subquery based on a composite of all three keys, you need to create a "dense rank" window function field that ranks the rows within each (user_id, foo_id) group by effective date, descending. Then wrap that in a subquery and take the records where rank_order = 1. Since the ranking was by effective date, you get all fields of the record with the highest effective date for each foo and user.
DATASET
id user_id foo_id effective_at
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATASET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC
id rank_order user_id foo_id effective_at
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
id rank_order user_id foo_id effective_at
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
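The ranking approach can be tried on the five-row dataset above with Python's sqlite3 module. This is only a sketch: it assumes a SQLite build of 3.25 or newer (needed for window functions), stores dates as ISO strings so they sort correctly, and uses ROW_NUMBER where the answer mentions a dense rank. The two agree here because the unique constraint rules out ties on effective_at within a pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE user_foos (
           id INTEGER PRIMARY KEY,
           user_id INTEGER NOT NULL,
           foo_id INTEGER NOT NULL,
           effective_at TEXT NOT NULL,
           UNIQUE (user_id, foo_id, effective_at))"""
)
conn.executemany(
    "INSERT INTO user_foos VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2001-01-01"),
        (2, 1, 1, "2002-01-01"),
        (3, 1, 1, "2003-01-01"),
        (4, 1, 2, "2001-01-01"),
        (5, 2, 1, "2001-01-01"),
    ],
)

# Rank rows inside each (user_id, foo_id) pair, newest first, keep rank 1.
query = """
SELECT id, user_id, foo_id, effective_at
FROM (
    SELECT uf.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, foo_id
               ORDER BY effective_at DESC
           ) AS rank_order
    FROM user_foos uf
) ranked
WHERE rank_order = 1
ORDER BY id
"""
latest = conn.execute(query).fetchall()
print([row[0] for row in latest])  # -> [3, 4, 5]
```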

Sort data row in sql

Please help me. I have columns from more than one table, and the data type of all these columns is integer.
I want to sort the data within each row (not the columns, so not ORDER BY), except for the primary key column.
For example:
column1(pk) column2 column3 column4 column5
1 6 5 3 1
2 10 2 3 1
3 1 2 4 3
How do I get this result?
column1(pk) column2 column3 column4 column5
1 1 3 5 6
2 1 2 3 10
3 1 2 3 4
Please help me quickly. Is it possible or impossible?
If it is impossible, how could I get a similar result by other means?
What database are you using? The capabilities of the database are important. Second, this suggests a data structure issue. Things that need to be sorted would normally be separate entities . . . that is, separate rows in a table. The rest of this post answers the question.
If the database supports pivot/unpivot you can do the following:
(1) Unpivot the data to get it in the format id, column name, value.
(2) Use row_number() to assign a new column, based on the ordering of the values.
(3) Use the row_number() to create a varchar new column name.
(4) Pivot the data again using the new column.
You can do something similar if this functionality is not available.
First, change the data to rows:
(select id, 'col1', col1 as val from t) union all
(select id, 'col2', col2 from t) union all
. . .
Call this byrow. The following query appends a row number:
select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
Put this into a subquery to pivot the data back. The final solution looks like:
with byrow as (<the big union all query>)
select id,
max(case when seqnum = 1 then val end) as col1,
max(case when seqnum = 2 then val end) as col2,
...
from (select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
) br
group by id
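Here is the union all / row_number / max(case) recipe written out in full for the example data, as a sketch run through Python's sqlite3 module. It assumes SQLite 3.25 or newer for window functions; the table name t and the shortened column names col1 to col4 (standing for column2 to column5 in the question) are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 INT, col4 INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 6, 5, 3, 1), (2, 10, 2, 3, 1), (3, 1, 2, 4, 3)],
)

query = """
WITH byrow AS (
    -- Step 1: one output row per (id, value) pair.
    SELECT id, col1 AS val FROM t UNION ALL
    SELECT id, col2 FROM t UNION ALL
    SELECT id, col3 FROM t UNION ALL
    SELECT id, col4 FROM t
)
SELECT id,
       -- Step 3: put the n-th smallest value into the n-th column.
       MAX(CASE WHEN seqnum = 1 THEN val END) AS col1,
       MAX(CASE WHEN seqnum = 2 THEN val END) AS col2,
       MAX(CASE WHEN seqnum = 3 THEN val END) AS col3,
       MAX(CASE WHEN seqnum = 4 THEN val END) AS col4
FROM (
    -- Step 2: number the values of each id in ascending order.
    SELECT br.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY val) AS seqnum
    FROM byrow br
) ranked
GROUP BY id
ORDER BY id
"""
sorted_rows = conn.execute(query).fetchall()
for row in sorted_rows:
    print(row)
```

For the three example rows this gives (1, 1, 3, 5, 6), (2, 1, 2, 3, 10) and (3, 1, 2, 3, 4), matching the result asked for in the question.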
You can use the PIVOT function of SQL Server to convert the rows into columns. Apply the sorting there, then convert the columns back to rows using UNPIVOT.
Here is a good example using PIVOT; you should be able to adapt it to meet your needs:
http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx

Grouping by intervals

Given a table (mytable) containing a numeric field (mynum), how would one go about writing an SQL query which summarizes the table's data based on ranges of values in that field rather than each distinct value?
For the sake of a more concrete example, let's make it intervals of 3 and just "summarize" with a count(*), such that the results tell the number of rows where mynum is 0-2.99, the number of rows where it's 3-5.99, where it's 6-8.99, etc.
The idea is to compute some function of the field that has a constant value within each group you want, such as the floor of mynum/3:
select count(*), floor(mynum/3.0) foo from mytable group by foo;
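A small sketch of the interval grouping, run with Python's sqlite3 module. Flooring mynum/3 (not rounding) is what keeps 0-2.99 in a single bucket. Because floor() is not available in every SQLite build, the sketch uses CAST(... AS INTEGER), which truncates toward zero and therefore matches floor only for non-negative values; that restriction and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mynum REAL)")
# Made-up sample values, two per interval of width 3.
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [(0.5,), (2.9,), (3.0,), (5.99,), (6.1,), (7.0,)],
)

# bucket_start is the lower bound of the interval each row falls into.
query = """
SELECT CAST(mynum / 3.0 AS INTEGER) * 3 AS bucket_start,
       COUNT(*) AS n
FROM mytable
GROUP BY bucket_start
ORDER BY bucket_start
"""
buckets = conn.execute(query).fetchall()
print(buckets)  # -> [(0, 2), (3, 2), (6, 2)]
```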
I do not know if this is applicable to MySQL; in SQL Server I think you can simply use the same CASE expression in both the select list and the group by list.
Something like:
select
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END,
count(*)
from Profiles
where 1=1
group by
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END
returns something like
column1 column2
---------- ----------
20and30 3
lessthan20 3
morethan30 13
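The same CASE-based grouping can be checked on a tiny made-up Profiles table with Python's sqlite3 module. The five id values are invented for illustration, so the counts differ from the output shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Profiles (id INTEGER)")
# Hypothetical ids: two at or below 20, two in 21..30, one above 30.
conn.executemany("INSERT INTO Profiles VALUES (?)", [(5,), (20,), (25,), (30,), (31,)])

# The CASE expression is repeated in GROUP BY so it works without alias support.
query = """
SELECT CASE
           WHEN id <= 20 THEN 'lessthan20'
           WHEN id > 20 AND id <= 30 THEN '20and30'
           ELSE 'morethan30'
       END AS bucket,
       COUNT(*) AS n
FROM Profiles
GROUP BY CASE
             WHEN id <= 20 THEN 'lessthan20'
             WHEN id > 20 AND id <= 30 THEN '20and30'
             ELSE 'morethan30'
         END
ORDER BY bucket
"""
counts = conn.execute(query).fetchall()
print(counts)  # -> [('20and30', 2), ('lessthan20', 2), ('morethan30', 1)]
```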