Grouping by a custom column composed of multiple actual columns - sql

I want to display location info constituted by multiple columns in the DB but then I need to group it by the ID. The solution I've got is to list the constituting columns as groupees in the following way.
select
Id,
Name,
Here + ' and ' + There as Location,
count(*) as Count
from KnownStuff
group by Id, Name, Here, There
However, I'd like to know if there is a more like-a-bossy way to group by that column, i.e. something along the lines of this.
group by Id, Name, Location
Or, even better (although, based on my googlearching, I'm pretty sure that it's not possible), I'd like to exclude all the other columns except for Id from the grouping constraints. In some cases I'll use sum or some other aggregating function, but it'd be nice to just tell the server not to bother. And if there are non-identical occurrences, then so be it - let it crash, burn, cry or lie - after all, it's my problem that I wrote a faulty script.
So:
Is there a like-a-bossy approach to grouping a custom column?
Is there a bite-me-in-the-ass-laterish approach to make it easier for now?

Wrap the query in a derived table and GROUP BY its result:
select id, name, location, count(*)
from
(
select
Id,
Name,
Here + ' and ' + There as Location
from KnownStuff
) dt
group by Id, Name, location

Wrapping the select in a CTE (common table expression) may be another way:
;with cte as (
select
Id,
Name,
Here + ' and ' + There as Location
--count(*) as Count
from KnownStuff
--group by Id, Name, Here, There
)
select
Id, Name, Location, count(*) as [Count]
from cte
group by Id, Name, Location
This is effectively the same as a subquery.
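To see the CTE approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and SQLite concatenates strings with || where SQL Server uses +, so treat it as an illustration of the idea rather than the exact T-SQL.

```python
import sqlite3

# Hypothetical sample data; the table and column names mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table KnownStuff (Id integer, Name text, Here text, There text)")
conn.executemany(
    "insert into KnownStuff values (?, ?, ?, ?)",
    [(1, "a", "x", "y"), (1, "a", "x", "y"), (2, "b", "x", "z")],
)

# The CTE gives the computed column a name, so the outer query can group by it.
# SQLite uses || for concatenation (assumption: SQL Server would use +).
grouped = conn.execute(
    """
    with cte as (
        select Id, Name, Here || ' and ' || There as Location
        from KnownStuff
    )
    select Id, Name, Location, count(*) as Cnt
    from cte
    group by Id, Name, Location
    order by Id
    """
).fetchall()
print(grouped)
```

The outer query never mentions Here or There; it only sees the Location alias defined inside the CTE.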

You can add a computed column to your table.
alter table knownstuff add Location as (Here + ' and ' + There)
Computed columns are not physically stored unless you mark them PERSISTED. You can then rewrite the query as:
select
Id,
Name,
Location,
count(*) as Count
from KnownStuff
group by Id, Name, Location

BigQuery from column to rows by separator

I want to generate a table that collects all values for each distinct id into one row using BigQuery.
Example:
id label
000756f4-1af2-439b-b607-ce7384a6b8ee fast
000756f4-1af2-439b-b607-ce7384a6b8ee streaming
000756f4-1af2-439b-b607-ce7384a6b8ee other
0007bac4-1bed-4bf0-8b55-d21216723ef5 issue
000a03d2-f88c-4150-aa96-40b9fdaccb17 fast
000a03d2-f88c-4150-aa96-40b9fdaccb17 other
I would like to get a table like this:
id label
000756f4-1af2-439b-b607-ce7384a6b8ee fast, streaming, other
0007bac4-1bed-4bf0-8b55-d21216723ef5 issue
000a03d2-f88c-4150-aa96-40b9fdaccb17 fast, other
Is it possible to achieve it with BigQuery?
You can just use string_agg():
select id, string_agg(label, ', ') as labels
from t
group by id;
Note that the ordering is arbitrary (and might even vary from one run to another). You might want to include an order by as well:
select id, string_agg(label, ', ' order by label) as labels
from t
group by id;
Update
Use string_agg:
select id, string_agg(label, ', ')
from mytable
group by id
Original answer
Use array_agg and array_to_string:
select id, array_to_string(array_agg(label), ', ')
from mytable
group by id
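The same per-id concatenation can be tried locally. This is only a sketch under assumptions: it uses Python's sqlite3, where group_concat plays the role of BigQuery's string_agg, and the ids are shortened, made-up stand-ins for the ones in the question. Because the order inside each list is not guaranteed, the check compares the labels as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id text, label text)")
# Hypothetical, shortened ids standing in for the UUIDs in the question.
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("000756f4", "fast"), ("000756f4", "streaming"), ("000756f4", "other"),
        ("0007bac4", "issue"),
        ("000a03d2", "fast"), ("000a03d2", "other"),
    ],
)

# group_concat is SQLite's counterpart to string_agg (an assumption of this sketch).
rows = conn.execute(
    "select id, group_concat(label, ', ') as labels from t group by id order by id"
).fetchall()

# Order within each concatenated list is arbitrary, so compare as sets.
labels_by_id = {id_: set(labels.split(", ")) for id_, labels in rows}
print(labels_by_id)
```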

How do we find frequency of one column based off two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads to solve my question. I have three columns in a table, columns being name, customer, and location. I'd like to add an additional column determining which location is most frequent, based off name and customer (first two columns).
I have included a photo of an example where name-Jane customer-BEC in my created column would be "Texas" as that has 2 occurrences as opposed to one for California. Would there be any way to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Here is a db<>fiddle.
Both of these versions put 'Texas' in all four rows. However, each can be tweaked with minimal effort to put 'California' in the row for ARC.
In Oracle, you can use the aggregate function stats_mode() to compute the most frequently occurring value in a group.
Unfortunately, it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s on s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
This is more a question about understanding the concepts of a relational database. If you want that information, you would not store it in an additional column. It is data calculated from other columns, so why would you store it in the table itself? It is complex to code and would also be very expensive for the database (imagine all the rows you would have to recalculate that value for if someone inserted a million rows).
Instead, you can do one of the following:
Calculate it at runtime, as shown in the other answers.
If you want to make it more persistent, you could embed that query in a view.
If you want to physically store the info, you could use a materialized view.
There is plenty of material on these three options in the official Oracle documentation.
Your first step is to construct a query that determines the most frequent location, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
This isn't immediately useful, but the logic can be used in row_number(), which numbers the rows within each partition. In the query below, I'm ordering by count(*) in descending order so that the most frequent occurrence has the value 1.
Note that row_number() assigns 1 to only one row per partition, so if two locations are tied, one of them is picked arbitrarily.
So, now we have
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.
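For anyone without access to the fiddle, the row_number() approach can be sketched with Python's sqlite3 (window functions need SQLite 3.25 or later). The four rows are hypothetical data modelled on the question, and the grouping is split into two CTEs so the count is computed before it is ranked. The extra Location tie-breaker in the order by is my own addition to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (Name text, Customer text, Location text)")
# Hypothetical rows modelled on the example in the question.
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [
        ("Jane", "BEC", "Texas"),
        ("Jane", "BEC", "Texas"),
        ("Jane", "BEC", "California"),
        ("Jane", "ARC", "California"),
    ],
)

result = conn.execute(
    """
    with counts as (
        -- step 1: frequency of each location per name and customer
        select Name, Customer, Location, count(*) as n
        from table1
        group by Name, Customer, Location
    ),
    ranked as (
        -- step 2: rank locations, most frequent first (Location breaks ties)
        select Name, Customer, Location,
               row_number() over (
                   partition by Name, Customer
                   order by n desc, Location
               ) as rn
        from counts
    )
    -- step 3: attach the top-ranked location to every original row
    select t.Name, t.Customer, t.Location, r.Location as most_freq_location
    from table1 t
    join ranked r
      on r.Name = t.Name and r.Customer = t.Customer and r.rn = 1
    order by t.Customer, t.Location
    """
).fetchall()
for row in result:
    print(row)
```

With this data the BEC rows all get Texas, while the single ARC row gets California, because the partition includes the customer.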

Count() how many times a name shows up in a table with the rest of info

I have read in various websites about the count() function but I still cannot make this work.
I made a small table with (id, name, last name, age) and I need to retrieve all columns plus a new one. In this new column I want to display how many times a name shows up or repeats itself in the table.
I have run some tests and can retrieve the name column together with the count column, but I haven't been able to retrieve all the data from the table.
Currently I have this
select a.n_showsup, p.*
from [test1].[dbo].[person] p,
(select count(*) n_showsup
from [test1].[dbo].[person])a
This gives me all the data, but the n_showsup column just contains the total number of rows. I know this is because I'm missing a GROUP BY, but when I write group by NAME it shows me a lot of records. This is an example of what I need:
You can use window functions, if your RDBMS supports them:
select t.*, count(*) over(partition by name) n_showsup
from mytable t
Alternatively, you can join the table with an aggregation query that counts the number of occurrences of each name:
select t.*, x.n_showsup
from mytable t
inner join (select name, count(*) n_showsup from mytable group by name) x
on x.name = t.name
While the window function approach (@GMB's answer) is the right way to go, thinking through this from a subquery approach (like you were headed towards) would look something like:
select p.*, a.n_showsup
from [test1].[dbo].[person] p
INNER JOIN (
select name, count(*) n_showsup
from [test1].[dbo].[person]
GROUP BY name
) a ON p.name = a.name
This is VERY close to what you had. The difference is that we group the subquery by name (so we get a count per name) and then use that name in the join criteria, in the ON clause of the INNER JOIN.
You should really never ever use a comma in your FROM clause. Instead use a JOIN.
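As a quick check that the window function and the grouped join give the same answer, here is a small sketch with Python's sqlite3 (SQLite 3.25 or later for the window function). The person table and its three rows are made-up sample data, and the column last_name is my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer, name text, last_name text, age integer)")
# Hypothetical sample rows; "Ann" appears twice.
conn.executemany(
    "insert into person values (?, ?, ?, ?)",
    [(1, "Ann", "Lee", 30), (2, "Bob", "Ray", 41), (3, "Ann", "Kim", 25)],
)

# Approach 1: window function, no grouping of the output rows.
windowed = conn.execute(
    "select p.*, count(*) over (partition by name) as n_showsup "
    "from person p order by id"
).fetchall()

# Approach 2: join against a subquery grouped by name.
joined = conn.execute(
    """
    select p.*, a.n_showsup
    from person p
    inner join (
        select name, count(*) as n_showsup
        from person
        group by name
    ) a on a.name = p.name
    order by p.id
    """
).fetchall()

print(windowed)
print(windowed == joined)
```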

DISTINCT AND COUNT(*)=1 not working on SQL

I need to show the ID (which is unique in every case) and the name, which is sometimes different. In my code I only want to show the names IF they are unique.
I tried with both distinct and count(*)=1, nothing solves my problem.
SELECT DISTINCT id, name
FROM person
GROUP BY id, name
HAVING count(name) = 1;
The result still shows the names multiple times.
By "unique", I assume you mean names that only appear once. That is not what "distinct" means in SQL; the use of distinct is to remove duplicates (either for counting or in a result set).
If so:
SELECT MAX(id), name
FROM person
GROUP BY name
HAVING COUNT(*) = 1;
If your DBMS supports it, you can use a window function:
SELECT id, name
FROM (
SELECT id, name, COUNT(*) OVER(PARTITION BY name) AS NameCount -- get count of each name
FROM person
) src
WHERE NameCount = 1
If not, you can do:
SELECT id, name
FROM person
WHERE name IN (
SELECT name
FROM person
GROUP BY name
HAVING COUNT(*) = 1 -- Only get names that occur once
)
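A small runnable sketch of the difference, using Python's sqlite3 and made-up rows: grouping by id and name can never filter anything when id is unique, whereas grouping by name alone keeps only the names that occur once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer primary key, name text)")
# Hypothetical sample rows; "Ann" is the only repeated name.
conn.executemany(
    "insert into person values (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Ann"), (4, "Cy")],
)

# The original attempt: every (id, name) group has exactly one row,
# so HAVING count(name) = 1 lets all rows through.
original = conn.execute(
    "select distinct id, name from person "
    "group by id, name having count(name) = 1 order by id"
).fetchall()

# Group by name only: names that occur once survive.
once_grouped = conn.execute(
    "select max(id), name from person "
    "group by name having count(*) = 1 order by name"
).fetchall()

# Same result via an IN subquery.
once_in = conn.execute(
    """
    select id, name
    from person
    where name in (
        select name from person group by name having count(*) = 1
    )
    order by id
    """
).fetchall()

print(original)
print(once_grouped, once_in)
```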

GROUP BY Function Issue

I have the below example:
SELECT name, age, location, SUM(pay)
FROM employee
GROUP BY location
This as expected will give me an error:
ORA-00979: not a GROUP BY expression
How can I get around this? I need to group by one maybe two columns but need to return all columns even if they're not used in the GROUP BY clause, I've looked at sub-queries to get around it but have had no luck so far.
You can use analytic functions:
SELECT name
, age
, location
, pay
, SUM(pay) over (partition by location order by location ) total
FROM employee
This way you can return all columns, even those that are not used in the grouping.
So you want to know the total pay by location, and you want to know the names and ages of employees at each location? How about:
SELECT e.NAME,
e.AGE,
e.LOCATION,
t.TOTAL_LOCATION_PAY
FROM EMPLOYEE e
INNER JOIN (SELECT LOCATION,
SUM(PAY) AS TOTAL_LOCATION_PAY
FROM EMPLOYEE
GROUP BY LOCATION) t
ON (t.LOCATION = e.LOCATION)
Share and enjoy.
With GROUP BY (http://docs.oracle.com/javadb/10.6.2.1/ref/rrefsqlj32654.html), every selected column that is not in the GROUP BY clause must be wrapped in an aggregate function. Grouping means that you want one row per group. Distinct values of the columns in the clause appear in the final result set.
This is because Oracle can't know which of the values to retrieve for a column that you don't have in the GROUP BY. Consider this:
A X
B X
Select col1, col2 from myTable group by col2; -- incorrect
Select min(col1), col2 from myTable group by col2; -- correct
Why is the first incorrect? Because Oracle can't know whether to retrieve A or B for the X value. You have to specify it, e.g. with MIN, MAX, etc.
An alternative to this is analytic functions, which let you work over windows of your result set.
So if you want the total employee pay by location alongside every employee, you may want this:
SELECT name, age, location, SUM(pay) OVER(PARTITION BY location)
FROM employee
I believe this is better than @Bob Jarvis's query as you only query the table once. Please correct me if I'm wrong. He also has employees and employee. Typo?
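The analytic version can be tried out with Python's sqlite3 (SQLite 3.25 or later), as a sketch with made-up employees. Note that this is SQLite rather than Oracle, so it only illustrates the SUM(...) OVER (PARTITION BY ...) idea, not the ORA-00979 behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (name text, age integer, location text, pay integer)")
# Hypothetical sample rows: two employees in NY, one in LA.
conn.executemany(
    "insert into employee values (?, ?, ?, ?)",
    [("Al", 30, "NY", 100), ("Bo", 40, "NY", 200), ("Cy", 50, "LA", 50)],
)

# One pass over the table: every row is kept and carries its location's total.
totals = conn.execute(
    """
    select name, age, location,
           sum(pay) over (partition by location) as total_location_pay
    from employee
    order by name
    """
).fetchall()
for row in totals:
    print(row)
```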