T-SQL Group By summarize fields using logical functions

How can I aggregate and arrive at these results?
SELECT *
FROM TABLE
GROUP BY Column1

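Assuming the columns actually contain the literal string values 'TRUE' and 'FALSE', we could use the query below. Because the string 'TRUE' sorts after 'FALSE', MAX returns 'TRUE' for a group if any of its rows is 'TRUE', so it acts as a logical OR: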
SELECT
Column1,
MAX(Column2) AS Column2,
MAX(Column3) AS Column3
FROM yourTable
GROUP BY
Column1;
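A minimal sketch of why the MAX trick works, using Python's built-in sqlite3 module as a stand-in for SQL Server (the table contents are made up). It also shows that MIN over the same strings behaves like a logical AND, which is an addition of mine rather than part of the answer above:

```python
import sqlite3

# In-memory SQLite database standing in for the asker's table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("A", "TRUE", "FALSE"),
        ("A", "FALSE", "FALSE"),
        ("B", "FALSE", "TRUE"),
        ("B", "FALSE", "TRUE"),
    ],
)

# 'TRUE' > 'FALSE' as strings, so MAX acts like OR across the group
# and MIN acts like AND.
rows = conn.execute(
    """
    SELECT Column1,
           MAX(Column2) AS any_col2,
           MIN(Column3) AS all_col3
    FROM yourTable
    GROUP BY Column1
    ORDER BY Column1
    """
).fetchall()

for row in rows:
    print(row)
```

Group A has one 'TRUE' in Column2, so MAX yields 'TRUE'; group B has 'TRUE' in every Column3 row, so MIN yields 'TRUE'.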

Related

Concatenate rows based on multiple column values

I am trying to get a new column with a concatenation of all distinct row values. This aggregation would be based on other columns.
I have tried the following but I get the same values repeated in the new column (A1, A1, A4). I need the concatenation to be distinct.
SELECT
STRING_AGG(COLUMN1, ', ') AS COLUMN1_ALIAS
,COLUMN2
,COLUMN3
,COLUMN4
FROM TABLE
GROUP BY COLUMN2 ,COLUMN3 ,COLUMN4
It looks like you want windowing rather than aggregation. Unfortunately, string_agg does not support over() in SQL Server; neither does it support distinct in its aggregated form.
We could work around it with subqueries; it is probably more efficient to deduplicate and pre-compute the aggregates first, then join with the original table:
select t.*, x.column1_alias
from mytable t
inner join (
select column2, column3, column4, string_agg(column1, ', ') as column1_alias
from (select distinct column1, column2, column3, column4 from mytable) t
group by column2, column3, column4
) x on x.column2 = t.column2 and x.column3 = t.column3 and x.column4 = t.column4
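A runnable sketch of the deduplicate-aggregate-join approach, assuming SQLite (via Python's sqlite3) in place of SQL Server. SQLite's group_concat stands in for STRING_AGG, and the rows are hypothetical. The concatenation order is not guaranteed, so the check below sorts the pieces:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (column1 TEXT, column2 TEXT, column3 TEXT, column4 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("A1", "x", "y", "z"),
        ("A1", "x", "y", "z"),  # duplicate value that must not repeat in the list
        ("A4", "x", "y", "z"),
        ("B2", "p", "q", "r"),
    ],
)

# 1) deduplicate, 2) aggregate per (column2, column3, column4),
# 3) join back so every original row carries its group's distinct list.
query = """
SELECT t.column1, t.column2, x.column1_alias
FROM mytable t
INNER JOIN (
    SELECT column2, column3, column4,
           group_concat(column1, ', ') AS column1_alias
    FROM (SELECT DISTINCT column1, column2, column3, column4 FROM mytable) d
    GROUP BY column2, column3, column4
) x ON x.column2 = t.column2
   AND x.column3 = t.column3
   AND x.column4 = t.column4
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```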
Side note: in a database that supports both over() and distinct in string aggregation, the query could be phrased as:
select t.*,
string_agg(distinct column1, ', ')
over(partition by column2, column3, column4) as column1_alias
from mytable t

Is there a function in SQL that allows me to sum specific rows based on a column value?

I want to sum City4 and its two misspellings together as one row. Any input on how to do this?
SELECT column1,
column2,
count(column3),
Sum(Column4)
FROM TABLE
WHERE column1 IN ('state1',
'State2',
'State3')
AND column2 IN ('City1',
'City2',
'City3',
'City4',
'City4 misspelled1',
'City4 misspelled 2')
GROUP BY column1,
column2
ORDER BY column1;
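The thread has no answer to this one. A common approach, sketched here rather than taken from the thread, is to map each misspelling to the canonical name with a CASE expression and group by that same expression. SQLite (via Python's sqlite3) stands in for the real database, and the table name `cities` and its rows are hypothetical, because TABLE is a reserved word. The CASE is repeated in GROUP BY because engines such as SQL Server do not allow a SELECT alias there:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (column1 TEXT, column2 TEXT, column3 TEXT, column4 INTEGER)"
)
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?)",
    [
        ("State1", "City1", "a", 10),
        ("State1", "City4", "b", 5),
        ("State1", "City4 misspelled1", "c", 7),
        ("State1", "City4 misspelled 2", "d", 3),
    ],
)

# Normalize the misspellings to 'City4', then group by the normalized value.
query = """
SELECT column1,
       CASE WHEN column2 IN ('City4 misspelled1', 'City4 misspelled 2')
            THEN 'City4' ELSE column2 END AS city,
       COUNT(column3),
       SUM(column4)
FROM cities
WHERE column1 IN ('State1', 'State2', 'State3')
GROUP BY column1,
         CASE WHEN column2 IN ('City4 misspelled1', 'City4 misspelled 2')
              THEN 'City4' ELSE column2 END
ORDER BY column1, city
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

For a long list of variants, a small mapping table joined on the raw name scales better than a growing CASE.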

Group by on a column with duplicates (string values)

I have to write an MS SQL query against the table below.
Consider the following single table:
I want to get the following result by grouping on column1:
How can I accomplish this?
Use COUNT with GROUP BY.
Query
SELECT column1, 'N/A' as column2, 'N/A' as column3,
COUNT(column1) AS column4_amount
FROM your_table_name
GROUP BY column1;
SELECT column1, 'N/A' as column2, 'N/A' as column3, COUNT(1) AS column4_amount
FROM table_name
GROUP BY column1
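The two queries above differ only in COUNT(column1) versus COUNT(1). A small sketch, using SQLite through Python's sqlite3 with made-up rows, shows that they return the same numbers except for a group whose column1 is NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("apple", "x", "y"),
        ("apple", "x", "z"),
        ("pear", "x", "y"),
        (None, "x", "y"),  # NULL column1 values form their own group
        (None, "x", "z"),
    ],
)

query = """
SELECT column1,
       COUNT(column1) AS count_column1,  -- skips NULLs in column1
       COUNT(1)       AS count_rows      -- counts every row in the group
FROM table_name
GROUP BY column1
ORDER BY column1
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

SQLite sorts NULL first in ascending order, so the NULL group comes first, with 0 from COUNT(column1) but 2 from COUNT(1).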
If you don't need column2 and column3, try the following:
SELECT column1, COUNT(1) AS column4_amount
FROM table_name
GROUP BY column1
Update:
Since you only want values that occur more than once, try this:
SELECT column1, 'N/A' as column2, 'N/A' as column3, COUNT(1) AS column4_amount
FROM table_name
GROUP BY column1
HAVING COUNT(1) > 1
You should GROUP BY column1, since GROUP BY collects all rows with the same value into a single group.
To count the items in each group, use the COUNT aggregate function. You also need to check that each item occurs more than once: the HAVING clause filters the groups after aggregation, and HAVING COUNT(1) > 1 keeps only the groups that have two or more items.
SELECT column1, 'N/A' as column2, 'N/A' as column3, COUNT(1) AS column4_amount
FROM your_table_name
GROUP BY column1
HAVING COUNT(1) > 1
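A runnable sketch of the HAVING query, with SQLite (via Python's sqlite3) standing in for SQL Server and made-up rows. WHERE would filter individual rows before grouping, while HAVING filters whole groups after COUNT has been computed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("apple", "a", "b"),
        ("apple", "c", "d"),
        ("pear", "e", "f"),  # occurs once, so HAVING drops it
        ("plum", "g", "h"),
        ("plum", "i", "j"),
        ("plum", "k", "l"),
    ],
)

rows = conn.execute(
    """
    SELECT column1, 'N/A' AS column2, 'N/A' AS column3, COUNT(1) AS column4_amount
    FROM table_name
    GROUP BY column1
    HAVING COUNT(1) > 1
    ORDER BY column1
    """
).fetchall()

for row in rows:
    print(row)
```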

how to do nested SQL select count

I'm querying a system that won't allow DISTINCT, so my alternative is to use GROUP BY to get close to the result.
My desired query was meant to look like this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT(column3)) AS column3
FROM table
For the alternative, I would think I'd need some type of nested query along these lines:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(SELECT column FROM table GROUP BY column) AS column3
FROM table
But it didn't work. Am I close?
You are using the wrong syntax for COUNT(DISTINCT). The DISTINCT part is a keyword, not a function. Based on the docs, this ought to work:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT column3) AS column3
FROM table
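Do, however, read the docs. BigQuery's (legacy SQL) implementation of COUNT(DISTINCT) is a bit unusual, apparently so that it scales better for big data: above a threshold number of distinct values it returns a statistical approximation rather than an exact count. If you are trying to count a large number of distinct values, you may need to raise that threshold with the optional second parameter (and you have an inherent scaling problem).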
Update:
If you have a large number of distinct column3 values to count, and you want an exact count, then perhaps you can perform a join instead of putting a subquery in the select list (which BigQuery seems not to permit):
SELECT *
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
)
CROSS JOIN (
SELECT count(*) AS column3
FROM (
SELECT column3
FROM table
GROUP BY column3
)
)
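The cross-join idea can be checked in any engine that accepts derived tables. The sketch below assumes SQLite via Python's sqlite3, uses a hypothetical table `t` (TABLE is reserved), and adds the derived-table aliases that standard SQL requires but legacy BigQuery did not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 20, "a"), (3, 30, "b"), (4, 40, "c")],
)

# Two one-row results (totals, distinct count) glued together with CROSS JOIN.
query = """
SELECT *
FROM (SELECT SUM(column1) AS column1, SUM(column2) AS column2 FROM t) AS totals
CROSS JOIN (
    SELECT COUNT(*) AS column3
    FROM (SELECT column3 FROM t GROUP BY column3) AS distinct_values
) AS counts
"""
row = conn.execute(query).fetchone()

# Cross-check against COUNT(DISTINCT), which SQLite does support.
check = conn.execute(
    "SELECT SUM(column1), SUM(column2), COUNT(DISTINCT column3) FROM t"
).fetchone()
print(row, check)
```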
Update 2:
Not that joining two one-row tables would be at all expensive, but @FelipeHoffa got me thinking more about this, and I realized I had missed a simpler solution:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(*) AS column3
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
GROUP BY column3
)
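This one computes a subtotal of column1 and column2 for each column3 value, then counts the subtotal rows (one per distinct column3 value) and adds the subtotals back up into grand totals. One caveat: if column3 contains NULLs, they form a group of their own and are counted, whereas COUNT(DISTINCT column3) ignores NULLs.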
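A sketch of the subtotal-then-count query, with SQLite (via Python's sqlite3) standing in for BigQuery and a hypothetical table `t`. One row deliberately has a NULL column3 to show where this approach and COUNT(DISTINCT) disagree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 20, "a"), (3, 30, "b"), (4, 40, None)],
)

# Subtotal per column3 value, then total the subtotals and count the groups.
grouped = conn.execute(
    """
    SELECT SUM(column1) AS column1, SUM(column2) AS column2, COUNT(*) AS column3
    FROM (
        SELECT SUM(column1) AS column1, SUM(column2) AS column2
        FROM t
        GROUP BY column3
    ) AS subtotals
    """
).fetchone()

# COUNT(DISTINCT) skips the NULL group, so it reports one fewer.
direct = conn.execute(
    "SELECT SUM(column1), SUM(column2), COUNT(DISTINCT column3) FROM t"
).fetchone()
print(grouped, direct)
```

Adding WHERE column3 IS NOT NULL to the inner query makes the two agree.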
FWIW, the way you are trying to use DISTINCT isn't how it's normally used, as it's meant to show unique rows, not unique values for one column in a dataset. GROUP BY is more in line with what I believe you are ultimately trying to accomplish.
Depending upon what you need you could do one of a couple things. Using your second query, you would need to modify your subquery to get a count, not the actual values, like:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM (SELECT column3 FROM table GROUP BY column3) AS g) AS column3
FROM table
Alternatively, you could do a query off your initial query, something like this:
SELECT sum(column1), sum(column2), sum(column4) from (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
1 AS column4
FROM table GROUP BY column3) AS t
GROUP BY column4
Edit: The above is generic SQL; I'm not too familiar with Google BigQuery.
You can probably use a CTE:
WITH result AS (SELECT column3 FROM table GROUP BY column3)
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM result) AS column3
FROM table
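A sketch of the CTE variant, assuming SQLite (via Python's sqlite3) and a hypothetical table `t`. The key detail is that the count from the CTE has to be wrapped in parentheses as a scalar subquery to appear in the select list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 20, "a"), (3, 30, "b"), (4, 40, "c")],
)

query = """
WITH result AS (SELECT column3 FROM t GROUP BY column3)  -- one row per value
SELECT SUM(column1) AS column1,
       SUM(column2) AS column2,
       (SELECT COUNT(*) FROM result) AS column3         -- scalar subquery
FROM t
"""
row = conn.execute(query).fetchone()
print(row)
```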
Instead of doing a COUNT(DISTINCT), you can get the same results by running a GROUP BY first, and then counting results.
For example, the number of different words that Shakespeare used by year:
SELECT corpus_date, COUNT(word) different_words
FROM (
SELECT word, corpus_date
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date
As a bonus, let's add a column that identifies which books were written during each year:
SELECT corpus_date, COUNT(word) different_words, GROUP_CONCAT(UNIQUE(corpus)) books
FROM (
SELECT word, corpus_date, UNIQUE(corpus) corpus
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date
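The same pattern can be tried outside BigQuery. This sketch assumes SQLite (via Python's sqlite3) and a tiny made-up `shakespeare` table, not the public dataset. Legacy BigQuery's GROUP_CONCAT(UNIQUE(...)) has no direct SQLite equivalent, so the book list is computed with group_concat(DISTINCT corpus) in a second grouped query and joined on the year:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shakespeare (word TEXT, corpus TEXT, corpus_date INTEGER)")
conn.executemany(
    "INSERT INTO shakespeare VALUES (?, ?, ?)",
    [  # toy rows for illustration only
        ("love", "kinglear", 1606),
        ("king", "kinglear", 1606),
        ("king", "macbeth", 1606),
        ("love", "macbeth", 1606),
        ("love", "sonnets", 1609),
    ],
)

query = """
SELECT w.corpus_date, w.different_words, b.books
FROM (
    -- GROUP BY first, then COUNT: one row per (word, year) pair
    SELECT corpus_date, COUNT(word) AS different_words
    FROM (SELECT word, corpus_date FROM shakespeare
          GROUP BY word, corpus_date) AS per_year_words
    GROUP BY corpus_date
) AS w
JOIN (
    SELECT corpus_date, group_concat(DISTINCT corpus) AS books
    FROM shakespeare
    GROUP BY corpus_date
) AS b ON b.corpus_date = w.corpus_date
ORDER BY w.corpus_date
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

The order inside each book list is not guaranteed.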

Teradata aggregate function

When I try to select a couple of columns along with a count, I get the following error:
Selected non-aggregate values must be part of the associated group
My query is something like this:
SELECT COUNT(1), COLUMN1, COLUMN2
FROM TABLE_NAME
If you're after a count for each combination of COLUMN1 and COLUMN2:
SELECT COUNT(1), COLUMN1, COLUMN2 FROM TABLE_NAME GROUP BY COLUMN1, COLUMN2
If you're after a count of all records in the table, repeated on every row:
SELECT COUNT(1) OVER (), COLUMN1, COLUMN2 FROM TABLE_NAME
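Both answers can be compared side by side in SQLite (via Python's sqlite3) as a stand-in for Teradata, using made-up rows. The windowed form needs SQLite 3.25 or newer, which recent Python builds normally bundle. GROUP BY collapses rows into one per combination, while COUNT(1) OVER () keeps every row and attaches the table-wide total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_NAME (COLUMN1 TEXT, COLUMN2 TEXT)")
conn.executemany(
    "INSERT INTO TABLE_NAME VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y")],
)

# One row per (COLUMN1, COLUMN2) combination with its count.
per_combination = conn.execute(
    """
    SELECT COUNT(1), COLUMN1, COLUMN2
    FROM TABLE_NAME
    GROUP BY COLUMN1, COLUMN2
    ORDER BY COLUMN1, COLUMN2
    """
).fetchall()

# Every original row, each carrying the total row count of the table.
with_total = conn.execute(
    """
    SELECT COUNT(1) OVER (), COLUMN1, COLUMN2
    FROM TABLE_NAME
    ORDER BY COLUMN1, COLUMN2
    """
).fetchall()

print(per_combination)
print(with_total)
```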