In SQL Server, how to concat --> group by using the concat column

SELECT
'dbo.our_table' as table_name,
CONCAT(col1, '-', col2, '-', col3) as table_id,
COUNT(*) as ct
FROM dbo.our_table
group by table_name, table_id
-- group by 1, 2 -- this doesn't work either...
order by ct desc
This does not work in SQL Server because it does not recognize table_name or table_id. I understand that this can be done by nesting a second SELECT in the FROM clause, so that table_name and table_id are explicitly available. However, I am trying to understand whether the same output is possible without a nested SELECT, keeping the structure of my current query and making only a small tweak.
Thanks

As mentioned in the comments, you need to put your 3 columns (col1, col2 & col3) into the GROUP BY. Unlike these 3 columns, 'dbo.our_table' does not need to be in the GROUP BY because it is a constant string literal, not a column.
SQL Server executes the components of SELECT queries in a particular order starting with FROM, then WHERE, GROUP BY, etc. In this case, SQL Server doesn't recognize the aliases table_name & table_id in the GROUP BY because they are not set until the SELECT, which is executed after the GROUP BY.
Googling "SQL Server SELECT query execution order" should give you a number of resources which will explain the order of execution in more detail.

You need to specify the full calculation in the GROUP BY as well as in the SELECT. This is because GROUP BY is logically evaluated before SELECT, so it cannot see the aliases defined there.
You could do it like this (table_name does not need to be grouped because it is a constant):
SELECT
'dbo.our_table' as table_name,
CONCAT(col1, '-', col2, '-', col3) as table_id,
COUNT(*) as ct
FROM dbo.our_table
group by CONCAT(col1, '-', col2, '-', col3)
order by ct desc;
But much better is to place the calculation in a CROSS APPLY, which makes it accessible later by name, as you wished:
SELECT
'dbo.our_table' as table_name,
v.table_id,
COUNT(*) as ct
FROM dbo.our_table
CROSS APPLY (VALUES (CONCAT(col1, '-', col2, '-', col3) ) ) as v(table_id)
group by v.table_id
order by ct desc;
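A minimal sketch of the "repeat the expression in the GROUP BY" approach, run through Python's sqlite3 module so it can be tried without a server. SQLite is only a stand-in here (it is more permissive about aliases than SQL Server), `||` replaces CONCAT, and the sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE our_table (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO our_table VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "b", "c"), ("x", "y", "z")],
)

rows = conn.execute(
    """
    SELECT 'dbo.our_table' AS table_name,
           col1 || '-' || col2 || '-' || col3 AS table_id,
           COUNT(*) AS ct
    FROM our_table
    -- the full expression is repeated; the constant table_name is not grouped
    GROUP BY col1 || '-' || col2 || '-' || col3
    ORDER BY ct DESC
    """
).fetchall()
print(rows)
```

The constant column is simply carried along with each group, which is why only the concatenation has to appear in the GROUP BY.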

Related

Distinct Values but get all the columns

In SSMS we have a view, which is returning distinct columns by the following command,
Create View [VIEWNAME] As
Select distinct [Col1],[Col2], Max(TimeDate) as TimeDate
from [Table]
Group By [Col1],[Col2]
I want the column [Col3] as well from the table in view.
I tried the following so far but unfortunately, it didn't work for me
Select Distinct on [Col1],[Col2] * from [Table]
Error: Incorrect syntax near 'on'.
Also,
select [Col1],[Col2],[Col3],Max[TimeDate] from [Table]
Group by [Col1],[Col2],[Col3],[TimeDate]
Error: Column TABLE.Col3 is invalid in the select list because it is not contained in either an aggregated function or GROUP BY clause.
The original table sample and the desired result table were posted as images (not reproduced here).
Thanks.
In PostgreSQL you can use distinct on:
select distinct on (col1, col2) *
from the_table
order by col1, col2, timedate desc;
I guess that your RDBMS is SQL Server, though. The query below is the SQL Server equivalent of PostgreSQL's distinct on ():
select col1, col2, col3, timedate
from
(
select *, row_number() over (partition by col1,col2 order by timedate desc) as rn
from the_table
) as t
where rn = 1;
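To see what the row_number() query returns, here is a small sketch using Python's sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later) as a stand-in for SQL Server, and the sample rows are invented since the original tables were only posted as images.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for SQL Server; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (col1 INTEGER, col2 TEXT, col3 TEXT, timedate TEXT)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        (1, "a", "old", "2023-01-01"),
        (1, "a", "new", "2023-02-01"),
        (2, "b", "only", "2023-01-15"),
    ],
)

latest = conn.execute(
    """
    SELECT col1, col2, col3, timedate
    FROM (
        SELECT *,
               row_number() OVER (
                   PARTITION BY col1, col2 ORDER BY timedate DESC
               ) AS rn
        FROM the_table
    ) AS t
    WHERE rn = 1          -- keep only the newest row per (col1, col2)
    ORDER BY col1
    """
).fetchall()
print(latest)
```

Each (col1, col2) pair keeps the col3 value from its most recent row, which a plain GROUP BY with Max(TimeDate) cannot provide.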

Count distinct multiple columns in redshift

I am trying to count rows which have a distinct combination of 2 columns in Amazon Redshift. The query I am using is:
select count(distinct col1, col2)
from schemaname.tablename
where some filters
It throws this error:
Amazon Invalid operation: function count(character varying, bigint) does not exist
I tried casting bigint to char but it didn't work.
You can use a subquery and count its rows:
select count(*) from (
select distinct col1, col2
from schemaname.tablename
where some filter
) as t
A little late to the party but anyway: you can also try to concatenate columns using || operator. It might be inefficient so I wouldn't use it in prod code, but for ad-hoc analysis should be fine.
select count(distinct col1 || '_' || col2)
from schemaname.tablename
where some filters
Note that the choice of separator might matter: both 'foo' || '_' || 'bar_baz' and 'foo_bar' || '_' || 'baz' yield 'foo_bar_baz' and are therefore treated as equal. In some cases this is a real concern; in others it is so insignificant that you can skip the separator completely.
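The separator collision can be checked directly. This sketch uses Python's sqlite3 module as a stand-in for Redshift (both support `||` for string concatenation); the table name `pairs` and its two rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Redshift; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("foo", "bar_baz"), ("foo_bar", "baz")],
)

# Concatenation approach: both rows collapse to 'foo_bar_baz'.
concat_count = conn.execute(
    "SELECT COUNT(DISTINCT col1 || '_' || col2) FROM pairs"
).fetchone()[0]

# Subquery approach: the two combinations stay distinct.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM pairs) AS t"
).fetchone()[0]

print(concat_count, pair_count)
```

With these rows the concatenation undercounts, so the subquery form is the safer choice whenever the separator can occur inside the data.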
You can use
select col1,col2,count(*) from schemaname.tablename
where -- your filter
group by col1,col2
If you are just trying to do count(distinct) then Zaynul's answer is correct. If you want other aggregations as well, here is another method:
select . . .,
sum(case when seqnum = 1 then 1 else 0 end) as col1_col2_unique_count
from (select t.*,
row_number() over (partition by col1, col2 order by col1) as seqnum
from schemaname.tablename t
where some filters
) c
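A small sketch of the seqnum approach, combining the unique-pair count with another aggregate in one pass. It runs on Python's sqlite3 module (SQLite 3.25 or later assumed for window functions) rather than Redshift, and the `amount` column and sample rows are hypothetical additions for illustration.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Redshift; `amount` is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 TEXT, col2 INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 1, 5), ("b", 2, 7)],
)

total_amount, unique_pairs = conn.execute(
    """
    SELECT SUM(amount) AS total_amount,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS col1_col2_unique_count
    FROM (
        SELECT t.*,
               row_number() OVER (PARTITION BY col1, col2 ORDER BY col1) AS seqnum
        FROM tablename t
    ) c
    """
).fetchone()
print(total_amount, unique_pairs)
```

Only the first row of each (col1, col2) partition has seqnum = 1, so summing that flag counts each combination once while the other aggregates still see every row.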

Adding a constant value column in the group by clause

Netezza SQL gives an error on this query. Cause: Invalid column name 'dummy'.
select col1,col2, '' as dummy, max(col3) from table1 group by col1,col2,dummy
If I remove dummy from the GROUP BY clause, it works fine. But as per SQL syntax, I am supposed to include all non-aggregate columns in the GROUP BY.
Why do you need it in your GROUP BY? You can wrap it in an aggregate function instead, and the result will always be right because the value is constant. For example:
select col1,col2, min(' ') as dummy, max(col3) from table1 group by col1,col2
"dummy" is a static column (not in the table), so it does not need to be in the group by because it is an external column.
SELECT col1,
col2,
cast(5 as int) AS [dummy],
max(col3)
FROM test_1
GROUP BY col1,
col2,
col3,
'dummy'
The code produces an outer reference error # 164.
Take a look at these links
http://www.sql-server-helper.com/error-messages/msg-164.aspx
http://www.sql-server-helper.com/error-messages/msg-1-500.aspx
It's due to the order of operations...
FROM
JOIN
WHERE
GROUP BY
...
SELECT
When using group by, only the fields remaining from the previous step are available. Since you are not declaring your "Dummy" column until the Select statement the group by doesn't know it exists and therefore doesn't need to account for it.
Going back to basics: GROUP BY is executed after the underlying JOIN operations (the file I/O), and only after that does the SELECTed result set become available.
You specified dummy as an alias in the SELECT, so the database does not know about it yet: at grouping time it is not available at the table level.
Try your query using GROUP BY your_column, ' ' and it should work, because you are stating the constant directly instead of referring to an alias.
Finally, when GROUP BY is used, you can specify constants in the SELECT or the GROUP BY, because they are included in the SELECTed result without any table operation being involved, so the database excuses them.
To resolve the issue, group it at an outer layer:
SELECT X.col1, X.col2, X.dummy, max(X.col3)
FROM (
SELECT col1,
col2,
cast(5 as int) AS [dummy],
col3
FROM test_1
) X
GROUP BY X.col1,
X.col2,
X.dummy
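The outer-layer grouping can be sketched as follows with Python's sqlite3 module. SQLite is only a stand-in here: it does not use the bracketed `[dummy]` spelling by default and is more lenient about aliases than SQL Server or Netezza, so this shows the shape of the query and its result rather than reproducing the original error. The sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for the original engine; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_1 (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO test_1 VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 9), ("b", "y", 4)],
)

grouped = conn.execute(
    """
    SELECT X.col1, X.col2, X.dummy, MAX(X.col3)
    FROM (
        -- the constant becomes a real column of the derived table X
        SELECT col1, col2, CAST(5 AS INTEGER) AS dummy, col3
        FROM test_1
    ) AS X
    GROUP BY X.col1, X.col2, X.dummy
    ORDER BY X.col1
    """
).fetchall()
print(grouped)
```

Because dummy already exists as a column by the time the outer GROUP BY runs, it can be referenced by name there.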

SQL: windowed aggregate functions for DB2 iSeries

Is count not a valid aggregation function for row partitions for SQL DB2 on the iSeries?
This query works:
select ROW_NUMBER() over (partition by COL1, COL2 order by COL3 asc)
from MyTable
And this query gives a syntax error:
select COUNT(1) over (partition by COL1, COL2)
from MyTable
The error message is pointing at the parenthesis before the word partition:
[Message SQL0401] Token ( is not a valid token. A partial list of valid tokens is , FROM INTO.
I'm aware I can rewrite the query to avoid the row partition, but I'd like to know why this isn't working.
No, COUNT() is not the same type of function as ROW_NUMBER().
If you want the number of rows per (col1,col2) then you could simply use
select COL1, COL2, count(*)
from MyTable
group by col1, col2

SQL to find the number of distinct values in a column

I can select all the distinct values in a column in the following ways:
SELECT DISTINCT column_name FROM table_name;
SELECT column_name FROM table_name GROUP BY column_name;
But how do I get the row count from that query? Is a subquery required?
You can use the DISTINCT keyword within the COUNT aggregate function:
SELECT COUNT(DISTINCT column_name) AS some_alias FROM table_name
This will count only the distinct values for that column.
This will give you BOTH the distinct column values and the count of each value. I usually find that I want to know both pieces of information.
SELECT [columnName], count([columnName]) AS CountOf
FROM [tableName]
GROUP BY [columnName]
A count of the rows for each distinct value of column_name, sorted by frequency:
SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name ORDER BY 2 DESC;
Be aware that Count() ignores null values, so if you need to allow for null as its own distinct value you can do something tricky like:
select count(distinct my_col)
+ count(distinct Case when my_col is null then 1 else null end)
from my_table
/
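The NULL trick above can be verified with a short sketch in Python's sqlite3 module. SQLite is used only as a convenient stand-in, and the sample values are made up.

```python
import sqlite3

# Assumption: SQLite stands in for the original database; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_col TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("a",), ("a",), ("b",), (None,), (None,)],
)

plain, with_null = conn.execute(
    """
    SELECT COUNT(DISTINCT my_col),
           COUNT(DISTINCT my_col)
             + COUNT(DISTINCT CASE WHEN my_col IS NULL THEN 1 ELSE NULL END)
    FROM my_table
    """
).fetchone()
print(plain, with_null)
```

The second term is 1 when at least one NULL exists and 0 otherwise, so NULL is counted as a single extra distinct value no matter how many NULL rows there are.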
SELECT COUNT(DISTINCT column_name) AS column_name_count FROM table_name;
You've got to count that distinct column, then give the count an alias.
select count(*) from
(
SELECT distinct column1,column2,column3,column4 FROM abcd
) T
This will give count of distinct group of columns.
select Count(distinct columnName) as columnNameCount from tableName
Using following SQL we can get the distinct column value count in Oracle 11g.
select count(distinct(Column_Name)) from TableName
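From MS SQL Server 2012 onward, you can use a window function too. Use DISTINCT rather than GROUP BY with it: after GROUP BY each partition holds a single grouped row, so the windowed count would always be 1.
SELECT DISTINCT column_name, COUNT(column_name) OVER (PARTITION BY column_name)
FROM table_name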
To do this in Presto using OVER:
SELECT DISTINCT my_col,
count(*) OVER (PARTITION BY my_col
ORDER BY my_col) AS num_rows
FROM my_tbl
Using this OVER based approach is of course optional. In the above SQL, I found specifying DISTINCT and ORDER BY to be necessary.
Caution: As per the docs, using GROUP BY may be more efficient.
select count(distinct(column_name)) AS columndatacount from table_name where somecondition=true
You can use this query, to count different/distinct data.
Without using DISTINCT, this is how we could do it (most databases require an alias on the derived table):
SELECT COUNT(C)
FROM (SELECT COUNT(column_name) as C
FROM table_name
GROUP BY column_name) t
Count(distinct({fieldname})) is not redundant.
Count({fieldname}) does not give you the number of distinct values in that table: it counts every row where the field is not null, duplicates included. It is also not the same as Count(*) from the table, which counts all rows regardless of nulls.
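The difference between the three forms of COUNT is easy to check. This is a minimal sketch on Python's sqlite3 module, with SQLite as a stand-in and made-up rows; standard SQL semantics for these aggregates are the same across the databases mentioned in this thread.

```python
import sqlite3

# Assumption: SQLite as a stand-in; the four sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [("a",), ("a",), ("b",), (None,)],
)

all_rows, non_null, distinct_values = conn.execute(
    """
    SELECT COUNT(*),                    -- every row, NULLs included
           COUNT(column_name),          -- rows where column_name is not NULL
           COUNT(DISTINCT column_name)  -- distinct non-NULL values
    FROM table_name
    """
).fetchone()
print(all_rows, non_null, distinct_values)
```

Here the three results are 4, 3 and 2, so the forms are only interchangeable when the column has no NULLs and no duplicates.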