Group rows with similar strings - sql

I have searched a lot, but most of the solutions I found are about concatenation, which is not what I want.
I have a table called X (in a Postgres database):
anm_id anm_category anm_sales
1 a_dog 100
2 b_dog 50
3 c_dog 60
4 a_cat 70
5 b_cat 80
6 c_cat 40
I want to get total sales by grouping 'a_dog', 'b_dog', 'c_dog' as dogs and 'a_cat', 'b_cat', 'c_cat' as cats.
I cannot change the data in the table as it is an external data base from which I am supposed to get information only.
How can I do this with an SQL query? It does not need to be specific to Postgres.

Use a CASE expression to group animals of the same category together:
SELECT CASE
         WHEN anm_category LIKE '%dog' THEN 'Dogs'
         WHEN anm_category LIKE '%cat' THEN 'Cats'
         ELSE 'Others'
       END AS Animals_category,
       Sum(anm_sales) AS total_sales
FROM   x
GROUP  BY CASE
            WHEN anm_category LIKE '%dog' THEN 'Dogs'
            WHEN anm_category LIKE '%cat' THEN 'Cats'
            ELSE 'Others'
          END
This query should also work in most databases.
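As a quick check of the CASE approach, here is a minimal sketch that runs the same grouping against the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Postgres here, and the labels are written as 'Dogs' and 'Cats'; treat it as an illustration, not as a tested Postgres session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (anm_id INTEGER, anm_category TEXT, anm_sales INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(1, "a_dog", 100), (2, "b_dog", 50), (3, "c_dog", 60),
     (4, "a_cat", 70), (5, "b_cat", 80), (6, "c_cat", 40)],
)

# SQLite and Postgres accept the output alias in GROUP BY; other databases
# may need the whole CASE expression repeated there.
query = """
SELECT CASE
         WHEN anm_category LIKE '%dog' THEN 'Dogs'
         WHEN anm_category LIKE '%cat' THEN 'Cats'
         ELSE 'Others'
       END            AS animals_category,
       SUM(anm_sales) AS total_sales
FROM x
GROUP BY animals_category
ORDER BY animals_category
"""
case_totals = conn.execute(query).fetchall()
print(case_totals)  # [('Cats', 190), ('Dogs', 210)]
```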

By using PostgreSQL's split_part():
select animal || 's' as animal_cat, count(*) as row_count, sum(anm_sales) as sales_sum
from (
    select split_part(anm_category, '_', 2) as animal, anm_sales from x
) t
group by animal
In MySQL, by first creating a user-defined split_str() function (it is not built in). Note that MySQL concatenates with CONCAT(), because || is a logical OR by default:
select concat(animal, 's') as animal_cat, count(*) as row_count, sum(anm_sales) as sales_sum
from (
    select split_str(anm_category, '_', 2) as animal, anm_sales from x
) t
group by animal

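You could group by a substring of anm_category (the expression in GROUP BY has to be the same substring, not the full column):
SELECT SUBSTR(anm_category, 3) || 's' AS animal_cat, SUM(anm_sales) AS total_sales
FROM x
GROUP BY SUBSTR(anm_category, 3)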

If the suffix has a constant length like in the example:
SELECT right(anm_category, 3) AS animal_type  -- last 3 characters
     , sum(anm_sales) AS total_sales
FROM   x
GROUP  BY 1;
You don't need a CASE expression at all, but if you use one, make it a "simple" CASE (see: Simplify nested case when statement).
GROUP BY 1 is a positional reference to the first output column; use it instead of repeating a possibly lengthy expression.
If the length varies, but there is always a single underscore like in the example:
SELECT split_part(anm_category, '_', 2) AS animal_type -- word after "_"
, sum(anm_sales) AS total_sales
FROM x
GROUP BY 1;
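To try the "word after the underscore" idea without a Postgres server, the sketch below uses SQLite through Python's sqlite3 module. SQLite has no split_part(), so substr() plus instr() is swapped in as an equivalent for this data shape, and the d_horse row is a made-up addition to show a suffix of different length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (anm_id INTEGER, anm_category TEXT, anm_sales INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(1, "a_dog", 100), (2, "b_dog", 50), (3, "c_dog", 60),
     (4, "a_cat", 70), (5, "b_cat", 80), (6, "c_cat", 40),
     (7, "d_horse", 30)],  # hypothetical row, not in the original table
)

# substr(..., instr(..., '_') + 1) keeps everything after the first underscore,
# standing in for split_part(anm_category, '_', 2).
# GROUP BY 1 / ORDER BY 1 refer to the first output column by position.
query = """
SELECT substr(anm_category, instr(anm_category, '_') + 1) AS animal_type,
       sum(anm_sales)                                      AS total_sales
FROM x
GROUP BY 1
ORDER BY 1
"""
suffix_totals = conn.execute(query).fetchall()
print(suffix_totals)  # [('cat', 190), ('dog', 210), ('horse', 30)]
```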

Related

Use a CASE expression without typing matched conditions manually using PostgreSQL

I have a long and wide table; the following is just an example. The structure might look a bit horrible for SQL, but I was wondering whether there is a way to extract each ID's price with a CASE expression without typing every column name to match in the expression.
IDs
A_Price
B_Price
C_Price
...
A
23
...
B
65
82
...
C
...
A
10
...
..
...
...
...
...
Table I want to achieve:
IDs price
A 23;10
B 65
C 82
.. ...
I tried:
SELECT IDs, string_agg(CASE IDs WHEN 'A' THEN A_Price
WHEN 'B' THEN B_Price
WHEN 'C' THEN C_Price
end::text, ';') as price
FROM table
GROUP BY IDs
ORDER BY IDs
To avoid typing A, B, A_Price, B_Price etc, I tried to format their names and call them from a subquery, but it seems that SQL cannot recognise them as columns and cannot call the corresponding values.
WITH CTE AS (
SELECT IDs, IDs||'_Price' as t FROM ID_list
)
SELECT IDs, string_agg(CASE IDs WHEN CTE.IDs THEN CTE.t
end::text, ';') as price
FROM table
LEFT JOIN CTE cte.IDs=table.IDs
GROUP BY IDs
ORDER BY IDs
You can use a document type like json or hstore as a stepping stone.
Basic query:
SELECT t.ids
, to_json(t.*) ->> (t.ids || '_price') AS price
FROM tbl t;
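to_json() converts the whole row to a JSON object, from which you can then pick a (dynamically concatenated) key. The keys are the column names as Postgres stores them (lower case unless the columns were created with double-quoted names), so the concatenated key must match that case exactly.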
Your aggregation:
SELECT t.ids
, string_agg(to_json(t.*) ->> (t.ids || '_price'), ';') AS prices
FROM tbl t
GROUP BY 1
ORDER BY 1;
Converting the whole (big?) row adds some overhead, but you have to read the whole table for your query anyway.
A union would be one approach here:
SELECT IDs, A_Price FROM yourTable WHERE A_Price IS NOT NULL
UNION ALL
SELECT IDs, B_Price FROM yourTable WHERE B_Price IS NOT NULL
UNION ALL
SELECT IDs, C_Price FROM yourTable WHERE C_Price IS NOT NULL;
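The union above returns one row per non-null price; to reach the semicolon-separated output from the question, those rows still have to be aggregated per ID. Below is a minimal sketch of that extra step using SQLite through Python's sqlite3 module, with group_concat() standing in for Postgres's string_agg(). The table name tbl and the four rows are assumptions, since the sample table in the question is only partly legible. Note that this still names every price column, and it collects any non-null price in a row regardless of whether the column matches the ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (ids TEXT, a_price INTEGER, b_price INTEGER, c_price INTEGER)"
)
# Assumed sample rows: each row carries a price only in "its own" column.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [("A", 23, None, None), ("B", None, 65, None),
     ("C", None, None, 82), ("A", 10, None, None)],
)

query = """
SELECT ids, group_concat(price, ';') AS prices
FROM (
    SELECT ids, a_price AS price FROM tbl WHERE a_price IS NOT NULL
    UNION ALL
    SELECT ids, b_price FROM tbl WHERE b_price IS NOT NULL
    UNION ALL
    SELECT ids, c_price FROM tbl WHERE c_price IS NOT NULL
) AS unpivoted
GROUP BY ids
ORDER BY ids
"""
# The order inside each concatenated string is not guaranteed without an
# explicit ordering, so the pieces are sorted here before comparing.
prices = {ids: sorted(joined.split(";")) for ids, joined in conn.execute(query)}
print(prices)
```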

Loop Through a Table to concatenate Rows

I have a table of similar structure:
Name Movies_Watched
A Terminator
B Alien
A Batman
B Rambo
B Die Hard
....
I am trying to get this:
Name Movies_Watched
A Terminator;Batman
B Alien, Die Hard, Rambo
My initial guess was:
SELECT Name, Movies_Watched || Movies_Watched from TABLE
But obviously that's wrong. Can someone tell me how can I loop through the 2nd column and concatenate them? What's the logic like?
I have learned that group_concat is the right approach, but I haven't been able to figure it out yet. I tried:
select name, group_concat(movies_watched) from table group by 1
But it throws an error saying: User-defined transform function group_concat must have an over clause
You are looking for string_agg():
select name, string_agg(movies_watched, ';') as movies_watched
from t
group by name;
That said, you are using Postgres, so you should learn how to use arrays instead of strings for such things. For instance, there is no confusion with arrays when the movie name has a semicolon. That would be:
select name, array_agg(movies_watched) as movies_watched
from t
group by name;
Use array_agg():
SELECT Name, array_agg(Movies_Watched)
FROM data_table
GROUP BY Name
I think you need LISTAGG or GROUP_CONCAT, since you are using Vertica; the query above is the Postgres solution.
SELECT Name, listagg(Movies_Watched)
FROM data_table
GROUP BY Name
or
select Name,
group_concat(Movies_Watched) over (partition by Name order by name) ag
from mytable
As already mentioned, in Vertica it's LISTAGG():
WITH
input(nm,movies_watched) AS (
SELECT 'A','Terminator'
UNION ALL SELECT 'B','Alien'
UNION ALL SELECT 'A','Batman'
UNION ALL SELECT 'B','Rambo'
UNION ALL SELECT 'B','Die Hard'
)
SELECT
nm AS "Name"
, LISTAGG(movies_watched) AS movies_watched
FROM input
GROUP BY nm;
-- out Name | movies_watched
-- out ------+----------------------
-- out A | Terminator,Batman
-- out B | Alien,Rambo,Die Hard
-- out (2 rows)
-- out
-- out Time: First fetch (2 rows): 12.735 ms. All rows formatted: 12.776 ms

Combine separate queries

I have these four separate queries that I would like to consolidate into one result set and I'm not quite sure how to do it. Basically, I would like to see a single output with the following columns:
name - items_created - items_modified - copies_created - copies_modified
select t02CreatedBy as name, count(t02CreatedBy) as items_created
from dbo.Items_t02
where t02DateCreated > getdate() - 7
group by t02CreatedBy
select t02ModifiedBy as name, count(t02ModifiedBy) as items_modified
from dbo.Items_t02
where t02DateModified > getdate() - 7
group by t02ModifiedBy
select t03CreatedBy as name, count(t03CreatedBy) as copies_created
from dbo.Copies_t03
where t03DateCreated > getdate() - 7
group by t03CreatedBy
select t03ModifiedBy as name, count(t03ModifiedBy) as copies_modified
from dbo.Copies_t03
where t03DateModified > getdate() - 7
group by t03ModifiedBy
The tricky part for me is understanding how to combine these while still keeping the various groupings. I need to make sure that t02DateCreated is tied to t02CreatedBy and t02DateModified is tied to t02ModifiedBy (etc.). I am not sure how to do this in one query.
Any suggestions? Or am I going about this the wrong way?
Change each select statement to include a literal label column, something like this:
select 'Query 1' as Type, t03ModifiedBy as name, count(t03ModifiedBy) as copies_modified
from dbo.Copies_t03
where t03DateModified > getdate() - 7
group by t03ModifiedBy
and then add a 'Union All' between each query.
Well, this is a way you could still do it in SQL Server (I haven't tested it, though):
SELECT name,
  (select count(t02CreatedBy) from dbo.Items_t02
   where t02DateCreated > getdate()-7 and t02CreatedBy = allnames.name) createdItems,
  (select count(t02ModifiedBy) from dbo.Items_t02
   where t02DateModified > getdate()-7 and t02ModifiedBy = allnames.name) modifiedItems,
  (select count(t03CreatedBy) from dbo.Copies_t03
   where t03DateCreated > getdate()-7 and t03CreatedBy = allnames.name) createdCopies,
  (select count(t03ModifiedBy) from dbo.Copies_t03
   where t03DateModified > getdate()-7 and t03ModifiedBy = allnames.name) modifiedCopies
FROM ( select t02CreatedBy name FROM dbo.Items_t02
       union all select t02ModifiedBy FROM dbo.Items_t02
       union all select t03CreatedBy FROM dbo.Copies_t03
       union all select t03ModifiedBy FROM dbo.Copies_t03 ) allnames
GROUP BY name
The outer (grouped) query collects all the possible names, as they might appear in any of the four columns t02CreatedBy,t02ModifiedBy, t03CreatedBy or t03ModifiedBy. It then puts together the counts for each of these columns in the relevant tables by using the four subqueries.
As I don't know your data I used a UNION ALL-construct in the outer query. If you can guarantee for one of those columns (e.g. t02ModifiedBy) to actually contain "all possible" names, then it would also be OK to just use that column alone there, like:
...
FROM (select t02ModifiedBy name FROM dbo.Items_t02) allnames GROUP BY name
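The two suggestions above (tag each query, then UNION ALL) can be taken one step further by pivoting the tags into columns with conditional aggregation, which yields the single row per name that the question asks for. The sketch below is one possible way to do that, run on SQLite through Python's sqlite3 module instead of SQL Server: date('now', '-7 days') replaces getdate() - 7, and the users ann, bob and cat and all rows are made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items_t02  (t02CreatedBy TEXT, t02DateCreated TEXT,
                         t02ModifiedBy TEXT, t02DateModified TEXT);
CREATE TABLE Copies_t03 (t03CreatedBy TEXT, t03DateCreated TEXT,
                         t03ModifiedBy TEXT, t03DateModified TEXT);
-- Made-up rows: '-1 day' is inside the 7-day window, '-30 days' is outside.
INSERT INTO Items_t02 VALUES
  ('ann', date('now', '-1 day'),   'bob', date('now', '-1 day')),
  ('ann', date('now', '-1 day'),   'ann', date('now', '-1 day')),
  ('bob', date('now', '-1 day'),   'bob', date('now', '-30 days'));
INSERT INTO Copies_t03 VALUES
  ('ann', date('now', '-1 day'),   'ann', date('now', '-1 day')),
  ('cat', date('now', '-30 days'), 'bob', date('now', '-1 day'));
""")

# Each branch emits (name, kind); the outer query counts each kind per name.
query = """
SELECT name,
       SUM(CASE WHEN kind = 'items_created'   THEN 1 ELSE 0 END) AS items_created,
       SUM(CASE WHEN kind = 'items_modified'  THEN 1 ELSE 0 END) AS items_modified,
       SUM(CASE WHEN kind = 'copies_created'  THEN 1 ELSE 0 END) AS copies_created,
       SUM(CASE WHEN kind = 'copies_modified' THEN 1 ELSE 0 END) AS copies_modified
FROM (
    SELECT t02CreatedBy AS name, 'items_created' AS kind FROM Items_t02
    WHERE t02DateCreated > date('now', '-7 days')
    UNION ALL
    SELECT t02ModifiedBy, 'items_modified' FROM Items_t02
    WHERE t02DateModified > date('now', '-7 days')
    UNION ALL
    SELECT t03CreatedBy, 'copies_created' FROM Copies_t03
    WHERE t03DateCreated > date('now', '-7 days')
    UNION ALL
    SELECT t03ModifiedBy, 'copies_modified' FROM Copies_t03
    WHERE t03DateModified > date('now', '-7 days')
) AS tagged
GROUP BY name
ORDER BY name
"""
activity = conn.execute(query).fetchall()
print(activity)  # [('ann', 2, 1, 1, 1), ('bob', 1, 1, 0, 1)]
```

Because each date filter stays inside its own branch, every count remains tied to the matching date column. A name with no activity in the window (cat in this data) does not appear at all.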

How to use Order By clause on a column containing string values separated by comma?

I have a table with a column named Skills which contains comma separated values for different employees like
EmpID Skills
1 C,C++,Oracle
2 Java,JavaScript,PHP
3 C,C++,Oracle
4 JavaScript,C++,ASP
5 C,C++,JavaScript
I want to write a query that lists the employees who know JavaScript first. How can I get this result?
You should not use one attribute to store multiple values. That goes against relational database principles.
Instead, you should create an additional table to store skills, with a reference to the employee in each row. Then your query will look like this (the skill column name is an assumption):
SELECT
    *
FROM
    employees
    LEFT JOIN employees_skills
        ON employees.id = employees_skills.employee_id
WHERE
    employees_skills.skill = 'JavaScript'
Try this
SELECT *
FROM
(
SELECT *
,CASE WHEN Skills LIKE '%JavaScript%' THEN 0 ELSE 1 END AS Rnk
FROM MyTable
) T
ORDER BY rnk,EmpID
OR
SELECT * FROM #MyTable
ORDER BY CASE WHEN Skills LIKE '%JavaScript%' THEN 0 ELSE 1 END,EmpID
select EmpID, Skills
from Table1
order by case when Skills like '%JavaScript%' then 0 else 1 end
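The LIKE-based ordering above can be tried on the sample data with the sketch below, which uses SQLite through Python's sqlite3 module as a stand-in for the original database. It also adds one variation that is not in the answers: wrapping both the column and the search term in commas, so that a search for Java does not also match JavaScript. The helper name emp_ids_with_skill_first is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (EmpID INTEGER, Skills TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "C,C++,Oracle"), (2, "Java,JavaScript,PHP"), (3, "C,C++,Oracle"),
     (4, "JavaScript,C++,ASP"), (5, "C,C++,JavaScript")],
)

def emp_ids_with_skill_first(skill):
    """Return EmpIDs with employees that list `skill` first, then the rest."""
    # ',' || Skills || ',' turns 'Java,PHP' into ',Java,PHP,' so the pattern
    # '%,Java,%' only matches a complete list entry. A skill name containing
    # % or _ would act as a LIKE wildcard; that case is not handled here.
    query = """
    SELECT EmpID
    FROM MyTable
    ORDER BY CASE WHEN ',' || Skills || ',' LIKE '%,' || ? || ',%'
                  THEN 0 ELSE 1 END,
             EmpID
    """
    return [row[0] for row in conn.execute(query, (skill,))]

print(emp_ids_with_skill_first("JavaScript"))  # [2, 4, 5, 1, 3]
print(emp_ids_with_skill_first("Java"))        # [2, 1, 3, 4, 5]
```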
Try this:
SELECT *
FROM YourTable
ORDER BY PATINDEX('%JavaScript%', Skills) DESC
But this is a bad way. You should really normalize your table.
For MySQL:
select Skills from myTable
order by case Skills
    when 'JavaScript' then 0
    when 'Java' then 1
    when 'C++' then 2
end
and so on.
For SQL Server:
select Skills from myTable
order by case
    when Skills = 'JavaScript' then 1
    when Skills = 'Java' then 2
    else 3
end
I am not sure whether the numbering has to start from 1 in SQL Server.
Include an ELSE before END so that all remaining rows get a sort value too. Note that these comparisons match the whole Skills value, so they only apply when the column holds a single skill per row.
This works for DB2/400:
with s (id, skill, rest) as
(select id, '', sk from skills
union all
select id, substr(rest, 1, locate(',',rest)-1),
substr(rest,locate(',',rest)+1)
from s
where locate(',',rest) > 0)
select id, skill from s
where skill = 'JavaScript'
order by id

Switch case in aggregate query

I want a switch/case in my SQL query so that a value is not aggregated when GROUP BY yields a group with only one row, and is aggregated otherwise. Is that possible?
My query is something like this:
select count(1), AVG(student_mark), case when Count(1)=1 then student_subjectid else null end from Students
group by student_id
I get this error: Column 'student_subjectid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Thanks in advance.
SELECT
student_id,
COUNT(*) AS MarkCount,
AVG(student_mark) AS student_mark,
CASE COUNT(*) WHEN 1 THEN MIN(student_subjectid) END AS student_subjectid
FROM Students
GROUP BY student_id
Why in the world would you complicate it?
select count(1), AVG(Student_mark) Student_mark
from Students
group by student_id
If there is only one student_mark, it is also the SUM, AVG, MIN and MAX - so just continue to use the aggregate!
EDIT
The dataset that would eventuate with your requirement will not normally make sense. The way to achieve that would be to merge (union) two different results
select
numRecords,
Student_mark,
case when numRecords = 1 then student_subjectid end -- else is implicitly NULL
from
(
select
count(1) AS numRecords,
AVG(Student_mark) Student_mark,
min(student_subjectid) as student_subjectid
from Students
group by student_id
) x
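Both working queries rely on the same idea: wrap student_subjectid in an aggregate such as MIN() so it is legal in a grouped query, and only expose it when the group has exactly one row. A minimal sketch of that idea follows, run on SQLite through Python's sqlite3 module as a stand-in for SQL Server; the three rows are made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (student_id INTEGER, student_subjectid INTEGER, student_mark REAL)"
)
# Made-up rows: student 1 has a single mark, student 2 has two.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [(1, 10, 80), (2, 10, 60), (2, 11, 70)],
)

# MIN() makes student_subjectid valid under GROUP BY; for a one-row group it
# simply returns that row's value. The CASE has no ELSE, so groups with more
# than one row get NULL.
query = """
SELECT student_id,
       COUNT(*)          AS mark_count,
       AVG(student_mark) AS student_mark,
       CASE COUNT(*) WHEN 1 THEN MIN(student_subjectid) END AS student_subjectid
FROM Students
GROUP BY student_id
ORDER BY student_id
"""
per_student = conn.execute(query).fetchall()
print(per_student)  # [(1, 1, 80.0, 10), (2, 2, 65.0, None)]
```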