SQL GROUP BY without repeating fields, same as Excel pivot

What I need is the following.
Currently, a regular GROUP BY with SUM repeats the grouping values on every row:
| column1      | column2      | column3  | sum |
|--------------|--------------|----------|-----|
| main product | sub product1 | subsub 1 | 500 |
| main product | sub product1 | subsub 2 | 300 |
| main product | sub product2 | subsub 1 | 300 |
I want to get rid of the repeated values, the same as an Excel pivot, so I need the output below:
| column1      | column2      | column3  | sum |
|--------------|--------------|----------|-----|
| main product | sub product1 | subsub 1 | 500 |
| main product |              | subsub 2 | 300 |
| main product | sub product2 | subsub 1 | 300 |
Can someone help me with this?

We can approximate this behavior with the help of ROW_NUMBER:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS rn
    FROM yourTable
)
SELECT col1,
       CASE WHEN rn = 1 THEN col2 ELSE '' END AS col2,
       col3, sum
FROM cte t
ORDER BY col1, t.col2, col3;
Note that the final ORDER BY has to sort on the underlying col2 (via the alias t), not the blanked-out output column, and should include col3 so the rn = 1 row comes first in each group.

You can use row_number() here as well. Since the table already contains the sum column, no GROUP BY is needed, and the else branch should return a string (or NULL) rather than 0:
select col1,
       case when row_number() over (partition by col1, col2 order by col3) = 1
            then col2 else '' end as col2,
       col3, sum
from table t
order by t.col1, t.col2, t.col3;
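Both answers rely on the same ROW_NUMBER trick, which can be sanity-checked in SQLite through Python's sqlite3 module (window functions need SQLite 3.25+). A sketch with assumed names: the table and columns mirror the question, and the sum column is renamed total here to avoid clashing with the aggregate:

```python
import sqlite3

# Sample data mirroring the question; "total" stands in for the "sum" column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (col1 TEXT, col2 TEXT, col3 TEXT, total INTEGER)")
con.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    ("main product", "sub product1", "subsub 1", 500),
    ("main product", "sub product1", "subsub 2", 300),
    ("main product", "sub product2", "subsub 1", 300),
])

# Blank out col2 on every row except the first of each (col1, col2) group.
rows = con.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS rn
        FROM yourTable
    )
    SELECT col1,
           CASE WHEN rn = 1 THEN col2 ELSE '' END AS col2,
           col3, total
    FROM cte
    ORDER BY col1, cte.col2, col3
""").fetchall()
for r in rows:
    print(r)
```

The middle row comes back with an empty string in col2, matching the desired pivot-style output.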


SQLite/SQLAlchemy: Check values are either NULL or the same for several columns

I would like to check for several columns that: For each column the value is either NULL or the same for all records.
For example, the condition holds for the following table
+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------+------+
| A    | B    | NULL | NULL |
| NULL | B    | C    | NULL |
| A    | B    | C    | NULL |
+------+------+------+------+
How can I do that, preferably with one query?
EDIT:
An alternative question would be: How can I sum the distinct values of each of the selected columns
You can check if the distinct number of values in each individual column is less than or equal to 1:
SELECT COUNT(DISTINCT Col1) <= 1 ok1,
       COUNT(DISTINCT Col2) <= 1 ok2,
       COUNT(DISTINCT Col3) <= 1 ok3,
       COUNT(DISTINCT Col4) <= 1 ok4
FROM tablename;
Or, you can get a single result for the whole table (MAX with multiple arguments is SQLite's scalar max function):
SELECT MAX(
         COUNT(DISTINCT Col1),
         COUNT(DISTINCT Col2),
         COUNT(DISTINCT Col3),
         COUNT(DISTINCT Col4)
       ) <= 1 ok
FROM tablename;
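Both queries can be verified quickly in SQLite from Python; the data below reproduces the table from the question (COUNT(DISTINCT ...) ignores NULLs, which is what makes the check work):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tablename (Col1, Col2, Col3, Col4)")
con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", [
    ("A", "B", None, None),
    (None, "B", "C", None),
    ("A", "B", "C", None),
])

# Per-column check: each column passes when it holds at most one
# distinct non-NULL value.
per_col = con.execute("""
    SELECT COUNT(DISTINCT Col1) <= 1 AS ok1,
           COUNT(DISTINCT Col2) <= 1 AS ok2,
           COUNT(DISTINCT Col3) <= 1 AS ok3,
           COUNT(DISTINCT Col4) <= 1 AS ok4
    FROM tablename
""").fetchone()

# Whole-table check using SQLite's scalar MAX over the four counts.
whole = con.execute("""
    SELECT MAX(COUNT(DISTINCT Col1), COUNT(DISTINCT Col2),
               COUNT(DISTINCT Col3), COUNT(DISTINCT Col4)) <= 1 AS ok
    FROM tablename
""").fetchone()[0]

print(per_col, whole)
```

All four per-column flags and the whole-table flag come back as 1 for this data.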

Get rows with maximum count per one column - while grouping by two columns

I'm trying to get the max count of a field.
This is what I have and what I've tried.
| col1 | col2 |
| A    | B    |
| A    | B    |
| A    | D    |
| A    | D    |
| A    | D    |
| C    | F    |
| C    | G    |
| C    | F    |
I'm trying to get the max count occurrences of col2, grouped by col1.
With this query I get the occurrences grouped by col1 and col2.
SELECT col1, col2, count(*) as conta
FROM tab
GROUP BY col1, col2
ORDER BY col1, col2
And I get:
| col1 | col2 | conta |
| A    | B    |     2 |
| A    | D    |     3 |
| C    | F    |     2 |
| C    | G    |     1 |
Then I used this query to get max of count:
SELECT max(conta) as conta, col1
FROM (
    SELECT col1, col2, count(*) as conta
    FROM tab
    GROUP BY col1, col2
) AS derivedTable
GROUP BY col1
And I get:
| col1 | conta |
| A    |     3 |
| C    |     2 |
What I'm missing is the value of col2. I would like something like this:
| col1 | col2 | conta |
| A    | D    |     3 |
| C    | F    |     2 |
The problem is that if I try to select col2 as well, I get an error saying the field must appear in the GROUP BY clause or in an aggregate function; but adding it to the GROUP BY is not the right way, because that brings back every combination again.
Simpler & faster (and correct):
SELECT DISTINCT ON (col1)
       col1, col2, count(*) AS conta
FROM tab
GROUP BY col1, col2
ORDER BY col1, conta DESC;
DISTINCT ON is applied after aggregation, so we don't need a subquery or CTE. Consider the sequence of events in a SELECT query:
- Best way to get result count before LIMIT was applied
- Select first row in each GROUP BY group?
You can combine GROUP BY with a window function - which gets evaluated after the group by:
with cte as (
    select col1, col2,
           count(*) as conta,
           dense_rank() over (partition by col1 order by count(*) desc) as rnk
    from tab
    where ...
    group by col1, col2
)
select col1, col2, conta
from cte
where rnk = 1
order by col1, col2;
If two col2 values within the same col1 group tie for the highest count, this will return both combinations. If you don't want that, use row_number() instead of dense_rank().
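The dense_rank() idea can be checked in SQLite via Python (SQLite 3.25+ for window functions). To keep the sketch portable, the aggregation and the ranking are split into two CTEs rather than ranking over count(*) directly:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (col1, col2)")
con.executemany("INSERT INTO tab VALUES (?, ?)", [
    ("A", "B"), ("A", "B"), ("A", "D"), ("A", "D"), ("A", "D"),
    ("C", "F"), ("C", "G"), ("C", "F"),
])

# First aggregate, then rank each (col1, col2) count within its col1 group.
rows = con.execute("""
    WITH agg AS (
        SELECT col1, col2, COUNT(*) AS conta
        FROM tab
        GROUP BY col1, col2
    ),
    ranked AS (
        SELECT *, DENSE_RANK() OVER (PARTITION BY col1 ORDER BY conta DESC) AS rnk
        FROM agg
    )
    SELECT col1, col2, conta
    FROM ranked
    WHERE rnk = 1
    ORDER BY col1, col2
""").fetchall()
print(rows)
```

For the question's data this yields one top row per col1 group: (A, D, 3) and (C, F, 2).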
Possibly not the most elegant solution, but using a common table expression may help.
with cte as (
    select col1, col2, count(*) as total
    from dtable
    group by col1, col2
)
select col1, col2, total
from cte c
where total = (select max(total)
               from cte cc
               where cc.col1 = c.col1)
order by col1 asc
Returns
col1 | col2 | total |
-----+------+-------+
A    | D    |     3 |
C    | F    |     2 |
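A minimal check of this correlated-subquery variant in SQLite via Python, using the question's data (the table name dtable comes from the answer above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dtable (col1, col2)")
con.executemany("INSERT INTO dtable VALUES (?, ?)", [
    ("A", "B"), ("A", "B"), ("A", "D"), ("A", "D"), ("A", "D"),
    ("C", "F"), ("C", "G"), ("C", "F"),
])

# Keep only the (col1, col2) groups whose count equals the per-col1 maximum.
rows = con.execute("""
    WITH cte AS (
        SELECT col1, col2, COUNT(*) AS total
        FROM dtable
        GROUP BY col1, col2
    )
    SELECT col1, col2, total
    FROM cte c
    WHERE total = (SELECT MAX(total) FROM cte cc WHERE cc.col1 = c.col1)
    ORDER BY col1
""").fetchall()
print(rows)
```

Like the ranking approaches, this returns ties: if two col2 values share the per-group maximum, both rows survive the WHERE filter.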
I misunderstood the question. Here is your solution:
;with tablex as (
    Select col1, col2, Count(col2) as Count
    From Your_Table
    Group by col1, col2
),
aaaa as (
    Select ROW_NUMBER() over (partition by col1 order by Count desc) as row, *
    From tablex
)
Select * From aaaa Where row = 1
Using a window function:
select distinct on (col1) col1, col2, cnt
from
(
select col1, col2, count(*) over (partition by col1, col2) cnt
from the_table
) t
order by col1, cnt desc;
| col1 | col2 | cnt |
| A    | D    |   3 |
| C    | F    |   2 |
This solution does not solve cases with ties.

Using CTE to calculate cumulative sum

I want to write a non-recursive common table expression (CTE) in Postgres to calculate a cumulative sum. Here's an example input table:
----------------------
1 | A | 0 | -1
1 | B | 3 | 1
2 | A | 1 | 0
2 | B | 3 | 2
An output should look like this:
----------------------
1 | A | 0 | -1
1 | B | 3 | 1
2 | A | 1 | -1
2 | B | 6 | 3
As you can see, the cumulative sums of columns 3 and 4 are calculated. This is easy to do with a recursive CTE, but how is it done with a non-recursive one?
Use window functions. Assuming that your table has columns col1, col2, col3 and col4, that would be:
select t.*,
       sum(col3) over (partition by col2 order by col1) as col3,
       sum(col4) over (partition by col2 order by col1) as col4
from mytable t
You would use a window function for a cumulative sum. I don't see what the sum is in your example, but the syntax is something like:
select t.*, sum(x) over (order by y) as cumulative_sum
from t;
For your example, this would seem to be:
select t.*,
sum(col3) over (partition by col2 order by col1) as new_col3,
sum(col4) over (partition by col2 order by col1) as new_col4
from t;
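Both answers use the same window-function pattern; it can be verified in SQLite via Python (3.25+), assuming the four columns are named col1 through col4 as the first answer does:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (col1, col2, col3, col4)")
con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, "A", 0, -1),
    (1, "B", 3, 1),
    (2, "A", 1, 0),
    (2, "B", 3, 2),
])

# Running totals of col3 and col4 within each col2 group, in col1 order.
rows = con.execute("""
    SELECT col1, col2,
           SUM(col3) OVER (PARTITION BY col2 ORDER BY col1) AS col3,
           SUM(col4) OVER (PARTITION BY col2 ORDER BY col1) AS col4
    FROM mytable
    ORDER BY col1, col2
""").fetchall()
print(rows)
```

The result reproduces the expected output table from the question, including row (2, A, 1, -1), where col3 accumulates 0 + 1 and col4 accumulates -1 + 0.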

Trying to write a query that will display duplicates results as null

I have a table that looks like the first example.
I'm trying to write a MSSQL 2012 statement that will display results like the second example.
Basically I want null values instead of duplicate values in columns 1 and 2. This is for readability purposes during reporting.
This seems like it should be possible, but I'm drawing a blank. No amount of joins or unions I've written has rendered the results I need.
| Col1 | Col2 | Col3 |
+------+------+------+
|    1 |    2 |    4 |
|    1 |    2 |    5 |
|    1 |    3 |    6 |
|    1 |    3 |    7 |
+------+------+------+

| Col1 | Col2 | Col3 |
+------+------+------+
|    1 |    2 |    4 |
| NULL | NULL |    5 |
| NULL |    3 |    6 |
| NULL | NULL |    7 |
+------+------+------+
I would do this with no subqueries at all:
select (case when row_number() over (partition by col1 order by col2, col3) = 1
             then col1
        end) as col1,
       (case when row_number() over (partition by col1, col2 order by col3) = 1
             then col2
        end) as col2,
       col3
from t
order by t.col1, t.col2, t.col3;
Note that the order by at the end of the query is very important. The result set that you want depends critically on the ordering of the rows. Without the order by, the result set could be in any order. So, the query might look like it works, and then suddenly fail one day or on a slightly different set of data.
Using a common table expression with row_number():
;with cte as (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
)
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from cte
Without the CTE:
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
) sub
rextester demo: http://rextester.com/UYA17142
returns:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|    1 |    2 |    4 |
| NULL | NULL |    5 |
| NULL |    3 |    6 |
| NULL | NULL |    7 |
+------+------+------+
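The CTE version can be reproduced in SQLite through Python (the same window functions exist there from 3.25 on); the T-SQL `alias = expression` style is rewritten with AS here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1, col2, col3)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 2, 4), (1, 2, 5), (1, 3, 6), (1, 3, 7),
])

# rn_1 restarts per col1 group, rn_2 per (col1, col2) group;
# any row past the first in its group gets NULL instead of the value.
rows = con.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2, col3) AS rn_1,
               ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS rn_2
        FROM t
    )
    SELECT CASE WHEN rn_1 > 1 THEN NULL ELSE col1 END AS col1,
           CASE WHEN rn_2 > 1 THEN NULL ELSE col2 END AS col2,
           col3
    FROM cte
    ORDER BY col3
""").fetchall()
print(rows)
```

Row by row this matches the second example table, with None standing in for NULL on the Python side.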

SQL select distinct rows

I have data like this (col2 is of type Date)
| col1 | col2                |
------------------------------
|    1 | 17/10/2007 07:19:07 |
|    1 | 17/10/2007 07:18:56 |
|    1 | 31/12/2070          |
|    2 | 28/11/2008 15:23:14 |
|    2 | 31/12/2070          |
How would I select rows where col1 is distinct and col2 has its greatest value? Like this:
| col1 | col2       |
------------------------------
|    1 | 31/12/2070 |
|    2 | 31/12/2070 |
SELECT col1, MAX(col2) FROM some_table GROUP BY col1;
In Oracle and MS SQL:
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) rn
FROM table t
) q
WHERE rn = 1
This will select other columns along with col1 and col2
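Both approaches (plain GROUP BY with MAX, and ROW_NUMBER) agree on greatest-per-group, as a quick SQLite/Python sketch shows. ISO date strings are used here as an assumption, so plain text comparison sorts chronologically, unlike the dd/mm/yyyy format in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE some_table (col1, col2)")
con.executemany("INSERT INTO some_table VALUES (?, ?)", [
    (1, "2007-10-17 07:19:07"),
    (1, "2007-10-17 07:18:56"),
    (1, "2070-12-31 00:00:00"),
    (2, "2008-11-28 15:23:14"),
    (2, "2070-12-31 00:00:00"),
])

# Greatest col2 per col1 via plain aggregation...
grouped = con.execute(
    "SELECT col1, MAX(col2) FROM some_table GROUP BY col1 ORDER BY col1"
).fetchall()

# ...and via ROW_NUMBER, which also lets you carry extra columns along.
windowed = con.execute("""
    SELECT col1, col2
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS rn
        FROM some_table t
    ) q
    WHERE rn = 1
    ORDER BY col1
""").fetchall()

print(grouped == windowed, grouped)
```

The design trade-off: GROUP BY/MAX is simpler, while the ROW_NUMBER form wins when you need the whole winning row rather than just the two grouped columns.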