Get count of combinations of rows having different values in different columns - sql

I want to get the count of combinations of rows that have different values in different columns.
Sample data:
+------+---------+---------+
| GUID | Column1 | Column2 |
+------+---------+---------+
| XXX | A | aaa |
| XXX | B | bbb |
| YYY | C | ccc |
| YYY | D | ddd |
| XXX | A | aaa |
| XXX | B | bbb |
+------+---------+---------+
I expect the following result: XXX should be 2 because there are 2 records in which Column1=A, Column2=aaa and Column1=B, Column2=bbb (a combination of the values of two different columns).
XXX 2
YYY 1

You can group by GUID and Column2, count the rows in each group, and then take the maximum of those counts per GUID to get the number of combinations:
declare @tmp table ([GUID] varchar(3), Column1 varchar(1), Column2 varchar(3))
insert into @tmp values ('XXX','A','aaa'),('XXX','B','bbb'),('YYY','C','ccc'),
('YYY','D','ddd'),('XXX','A','aaa'),('XXX','B','bbb')
select T.[GUID], max(T.cnt) as count_combinations
from (
    -- one row per GUID/Column2 pair, with the number of times it occurs
    select [GUID], Column2, count(*) as cnt
    from @tmp
    group by [GUID], Column2
) T
group by T.[GUID]
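Results:
+------+--------------------+
| GUID | count_combinations |
+------+--------------------+
| XXX  | 2                  |
| YYY  | 1                  |
+------+--------------------+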
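For a quick check outside SQL Server, here is a minimal sketch of the same grouping logic using Python's built-in sqlite3 module. Assumptions: SQLite stands in for T-SQL, the table variable becomes an in-memory table named tmp, and the helper count_combinations is hypothetical.

```python
import sqlite3


def count_combinations(rows):
    """Return {guid: max number of times any Column2 value repeats} (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    # SQLite stand-in for the T-SQL table variable used in the answer
    con.execute("CREATE TABLE tmp (guid TEXT, column1 TEXT, column2 TEXT)")
    con.executemany("INSERT INTO tmp VALUES (?, ?, ?)", rows)
    query = """
        SELECT t.guid, MAX(t.cnt) AS count_combinations
        FROM (
            SELECT guid, column2, COUNT(*) AS cnt   -- occurrences per GUID/Column2 pair
            FROM tmp
            GROUP BY guid, column2
        ) AS t
        GROUP BY t.guid
        ORDER BY t.guid
    """
    result = dict(con.execute(query).fetchall())
    con.close()
    return result


sample = [("XXX", "A", "aaa"), ("XXX", "B", "bbb"), ("YYY", "C", "ccc"),
          ("YYY", "D", "ddd"), ("XXX", "A", "aaa"), ("XXX", "B", "bbb")]
print(count_combinations(sample))  # {'XXX': 2, 'YYY': 1}
```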


Count amount of same value

I have a simple task that, to be honest, I have no idea how to accomplish. My SQL query returns these values:
| DocumentNumber | CustomerID |
------------------------------
| AAA | 1 |
| BBB | 1 |
| CCC | 2 |
| DDD | 3 |
-------------------------------
I would like to display a slightly modified table like this:
| DocumentNumber | CustomerID | Repeate |
-----------------------------------------
| AAA | 1 | Multiple |
| BBB | 1 | Multiple |
| CCC | 2 | Single |
| DDD | 3 | Single |
------------------------------------------
So the idea is simple: I need to append a new column and set it to 'Multiple' or 'Single'
depending on whether the CustomerID appears more than once.
Use window functions:
select t.*,
(case when count(*) over (partition by CustomerId) = 1 then 'Single'
else 'Multiple'
end) as repeate
from t;
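A minimal sketch of the window-function answer, run through Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later (the release that added window functions) stands in for the asker's database, and the sample rows are the ones from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (DocumentNumber TEXT, CustomerID INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("AAA", 1), ("BBB", 1), ("CCC", 2), ("DDD", 3)])

# COUNT(*) OVER (PARTITION BY ...) counts rows per customer without collapsing them
rows = con.execute("""
    SELECT t.*,
           CASE WHEN COUNT(*) OVER (PARTITION BY CustomerID) = 1 THEN 'Single'
                ELSE 'Multiple'
           END AS repeate
    FROM t
    ORDER BY DocumentNumber
""").fetchall()

for row in rows:
    print(row)
```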
You can also achieve the same result using GROUP BY and a subquery:
DECLARE @T TABLE(
DocumentNumber VARCHAR(10),
CustomerID INT)
INSERT INTO @T VALUES ('AAA', 1), ('BBB', 1), ('CCC', 2), ('DDD', 3)
SELECT M.DocumentNumber, M.CustomerID,
       CASE WHEN S.Repeated_Row > 1 THEN 'Multiple' ELSE 'Single' END AS Repeate
FROM @T M
LEFT JOIN (SELECT CustomerID, COUNT(*) AS Repeated_Row FROM @T GROUP BY CustomerID) S
       ON S.CustomerID = M.CustomerID
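The GROUP BY variant can be checked the same way. This sketch uses Python's sqlite3 module with an ordinary table T standing in for the T-SQL table variable (an assumption); with the question's data it should produce the same labels as the window-function version.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (DocumentNumber TEXT, CustomerID INTEGER)")
con.executemany("INSERT INTO T VALUES (?, ?)",
                [("AAA", 1), ("BBB", 1), ("CCC", 2), ("DDD", 3)])

# The derived table S holds one row per customer with its row count;
# joining it back lets every original row see its customer's count.
rows = con.execute("""
    SELECT M.DocumentNumber, M.CustomerID,
           CASE WHEN S.Repeated_Row > 1 THEN 'Multiple' ELSE 'Single' END AS Repeate
    FROM T AS M
    LEFT JOIN (SELECT CustomerID, COUNT(*) AS Repeated_Row
               FROM T GROUP BY CustomerID) AS S
           ON S.CustomerID = M.CustomerID
    ORDER BY M.DocumentNumber
""").fetchall()

print(rows)
```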

Unexpected effect of filtering on result from crosstab() query with multiple values

I have a crosstab() query similar to the one in my previous question:
Unexpected effect of filtering on result from crosstab() query
The common case is to filter the extra1 field by multiple values: extra1 IN (value1, value2, ...). For each value included in the extra1 filter, I have added an ordering expression like (extra1 <> valueN), as shown in the post mentioned above. The resulting query is as follows:
SELECT *
FROM crosstab(
'SELECT row_name, extra1, extra2..., another_table.category, value
FROM table t
JOIN another_table ON t.field_id = another_table.field_id
WHERE t.field = certain_value AND t.extra1 IN (val1, val2, ...) --> more values
ORDER BY row_name ASC, (extra1 <> val1), (extra1 <> val2)', ... --> more ordering expressions
'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(extra1, extra2...)
WHERE extra1 = val1; --> condition on the result
For the first extra1 value included in the ordering expressions (val1), I get the correct rows. However, for the following ones (val2, val3, ...) I get the wrong number of results: fewer rows for each one. Why is that?
UPDATE:
Given this source table (table t):
+----------+--------+--------+------------------------+-------+
| row_name | Extra1 | Extra2 | another_table.category | value |
+----------+--------+--------+------------------------+-------+
| Name1 | 10 | A | 1 | 100 |
| Name2 | 11 | B | 2 | 200 |
| Name3 | 12 | C | 3 | 150 |
| Name2 | 11 | B | 3 | 150 |
| Name3 | 12 | C | 2 | 150 |
| Name1 | 10 | A | 2 | 100 |
| Name3 | 12 | C | 1 | 120 |
+----------+--------+--------+------------------------+-------+
and this category table:
+-------------+--------+
| category_id | value |
+-------------+--------+
| 1 | Cat1 |
| 2 | Cat2 |
| 3 | Cat3 |
+-------------+--------+
Using crosstab(), the goal is to get a table like this:
+----------+--------+--------+------+------+------+
| row_name | Extra1 | Extra2 | cat1 | cat2 | cat3 |
+----------+--------+--------+------+------+------+
| Name1 | 10 | A | 100 | 100 | |
| Name2 | 11 | B | | 200 | 150 |
| Name3 | 12 | C | 120 | 150 | 150 |
+----------+--------+--------+------+------+------+
I want to be able to filter the resulting table so that I only get rows where Extra1 is 10 or 11, as follows:
+----------+--------+--------+------+------+------+
| row_name | Extra1 | Extra2 | cat1 | cat2 | cat3 |
+----------+--------+--------+------+------+------+
| Name1 | 10 | A | 100 | 100 | |
| Name2 | 11 | B | | 200 | 150 |
+----------+--------+--------+------+------+------+
The problem is that my query returns a different number of rows for Extra1 = 10 than for Extra1 = 11. With (Extra1 <> 10) I get the correct number of rows for the value 10, but not for 11.
Here is a fiddle demonstrating the problem in more detail:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=5c401f7512d52405923374c75cb7ff04
All "extra" columns are copied from the first row of each group (as pointed out in my previous answer).
While you filter with:
... WHERE extra1 = 'val1';
...it makes no sense to add more ORDER BY expressions on the same column. Only rows that have at least one extra1 = 'val1' in their source group survive.
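To see why only the first value survives, here is a simplified pure-Python model of how crosstab() treats an "extra" column. It is not the tablefunc implementation: the function crosstab below is a hypothetical stand-in, the rows are hypothetical, and only one extra column is modelled. As with the real function, the source rows must already be sorted by row name; the extra column is taken from the first row of each group.

```python
from itertools import groupby


def crosstab(source_rows, categories):
    """Hypothetical model of crosstab(source_sql, category_sql).

    source_rows: (row_name, extra1, category, value) tuples, already sorted
    the way the source query's ORDER BY would sort them.
    """
    result = []
    for row_name, group in groupby(source_rows, key=lambda r: r[0]):
        group = list(group)
        extra1 = group[0][1]              # extra column copied from the FIRST row only
        cells = {cat: None for cat in categories}
        for _, _, cat, value in group:
            if cat in cells:              # categories not listed are ignored
                cells[cat] = value
        result.append((row_name, extra1, *(cells[c] for c in categories)))
    return result


# Hypothetical source data: r1 mixes extra1 = 1 and extra1 = 2
source = [
    ("r1", 2, "c1", 10),
    ("r1", 1, "c2", 20),
    ("r2", 2, "c1", 30),
    ("r2", 2, "c2", 40),
]
# Mimic ORDER BY row_name, (extra1 <> 1), (extra1 <> 2): False sorts before True
ordered = sorted(source, key=lambda r: (r[0], r[1] != 1, r[1] != 2))
pivot = crosstab(ordered, ["c1", "c2"])
print(pivot)                            # [('r1', 1, 10, 20), ('r2', 2, 30, 40)]
print([r for r in pivot if r[1] == 2])  # r1 is gone although it has extra1 = 2 rows
```

Sorting by (extra1 <> 1) puts r1's extra1 = 1 row first, so the pivoted r1 reports extra1 = 1 and a later filter on extra1 = 2 drops it, which mirrors the "fewer rows for val2" symptom in the question.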
From your various comments, I guess you might want to see all distinct existing values of extra (within the set filtered in the WHERE clause) for the same unixdatetime. If so, aggregate before pivoting, like:
SELECT *
FROM crosstab(
$$
SELECT unixdatetime, x.extras, c.name, s.value
FROM (
SELECT unixdatetime, array_agg(extra) AS extras
FROM (
SELECT DISTINCT unixdatetime, extra
FROM source_table s
WHERE extra IN (1, 2) -- condition moves here
ORDER BY unixdatetime, extra
) sub
GROUP BY 1
) x
JOIN source_table s USING (unixdatetime)
JOIN category_table c ON c.id = s.gausesummaryid
ORDER BY 1
$$
, $$SELECT unnest('{trace1,trace2,trace3,trace4}'::text[])$$
) AS final_result (unixdatetime int
, extras int[]
, trace1 numeric
, trace2 numeric
, trace3 numeric
, trace4 numeric);
Aside: advice given in the following related answer about the 2nd function parameter applies to your case as well:
PostgreSQL crosstab doesn't work as desired
The query above uses a static query as the 2nd parameter. While you're at it, you don't need to join to category_table at all. The same, a bit shorter and faster yet:
SELECT *
FROM crosstab(
$$
SELECT unixdatetime, x.extras, s.gausesummaryid, s.value
FROM (
SELECT unixdatetime, array_agg(extra) AS extras
FROM (
SELECT DISTINCT unixdatetime, extra
FROM source_table
WHERE extra IN (1, 2) -- condition moves here
ORDER BY unixdatetime, extra
) sub
GROUP BY 1
) x
JOIN source_table s USING (unixdatetime)
ORDER BY 1
$$
, $$SELECT unnest('{923,924,926,927}'::int[])$$
) AS final_result (unixdatetime int
, extras int[]
, trace1 numeric
, trace2 numeric
, trace3 numeric
, trace4 numeric);
db<>fiddle here (I added my queries at the bottom of your fiddle).
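The aggregate-before-pivot idea can also be modelled in plain Python. This is a hypothetical sketch, not the PostgreSQL query itself: aggregate_then_pivot and the sample rows are made up, and each (unixdatetime, category) pair is assumed to appear at most once. The filter on extra is applied while collecting the distinct extras, and the pivot then uses all source rows of the surviving timestamps, as the JOIN back to source_table does.

```python
from collections import defaultdict


def aggregate_then_pivot(source_rows, wanted_extras, categories):
    """Hypothetical model: filter and aggregate extras first, pivot afterwards.

    source_rows: (unixdatetime, extra, category, value) tuples.
    """
    # Step 1 (inner subqueries): distinct extras per timestamp, filtered by IN (...)
    extras = defaultdict(set)
    for ts, extra, _, _ in source_rows:
        if extra in wanted_extras:
            extras[ts].add(extra)

    # Step 2 (JOIN back + pivot): every source row of a surviving timestamp
    pivot = {}
    for ts, _, category, value in source_rows:
        if ts in extras:
            row = pivot.setdefault(ts, dict.fromkeys(categories))
            if category in row:
                row[category] = value

    return [(ts, sorted(extras[ts]), *(pivot[ts][c] for c in categories))
            for ts in sorted(pivot)]


rows = [
    (100, 1, "trace1", 1.5),
    (100, 2, "trace2", 2.5),
    (100, 3, "trace3", 3.5),   # extra 3 is not wanted, but its value is still pivoted
    (200, 3, "trace1", 9.0),   # no wanted extra at all: timestamp 200 is dropped
    (300, 2, "trace1", 4.0),
]
print(aggregate_then_pivot(rows, {1, 2}, ["trace1", "trace2", "trace3"]))
# [(100, [1, 2], 1.5, 2.5, 3.5), (300, [2], 4.0, None, None)]
```

Because every row of a group now carries the same aggregated extras array, it no longer matters which row crosstab() copies the extra column from.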

I need to group by one column and show more columns from one dataset

I have the following table:
AMNT1 | COLUMN1 | COLUMN2 | COLUMN3 | GROUP1
--------|-----------|-----------|-------------|--------
1.00 | COL1_ROW1 | COL2_ROW1 | COL3_ROW1 | AAA
9.00 | COL1_ROW2 | COL2_ROW2 | COL2_ROW2 | AAA
2.00 | COL1_ROW3 | COL2_ROW3 | COL3_ROW3 | BBB
3.00 | COL1_ROW4 | COL2_ROW4 | COL3_ROW4 | CCC
I want to sum AMNT1 grouped by GROUP1:
SELECT GROUP1, SUM(AMNT1) FROM ND_TEST GROUP BY GROUP1;
GROUP1 | SUM(AMNT1)
-------|-----------
AAA | 10.00
BBB | 2.00
CCC | 3.00
Additionally, I want to select COLUMN1, COLUMN2 and COLUMN3 from one row of each group, so my output should look like this:
GROUP1 | SUM(AMNT1)| COLUMN1 | COLUMN2 | COLUMN3 |
-------|-----------|-----------|-----------|------------|
AAA | 10.00 | COL1_ROW1 | COL2_ROW1 | COL3_ROW1 |
BBB | 2.00 | COL1_ROW3 | COL2_ROW3 | COL3_ROW3 |
CCC | 3.00 | COL1_ROW4 | COL2_ROW4 | COL3_ROW4 |
If I use SUM() OVER (PARTITION BY ...), I get duplicates per group. If I use aggregate functions, the columns don't come from the same row.
Do you have an idea?
Thank you!
select group1, sum_amnt1, column1, column2, column3
from (
select group1, sum(amnt1) over (partition by group1) as sum_amnt1,
column1, column2, column3,
row_number() over (partition by group1 order by null) as rn
from your_table
)
where rn = 1
The order by null in the row_number() call reflects your clarification (in a comment) that any row from each group is fine (you don't care which one).
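A minimal sqlite3 sketch of this approach, run from Python. Assumptions: SQLite 3.25+ for window functions, and ROW_NUMBER() without an ORDER BY in place of order by null, since any row per group is acceptable. The checks only cover the sums and that the three columns come from the same row, because which row wins is unspecified.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE your_table
               (amnt1 REAL, column1 TEXT, column2 TEXT, column3 TEXT, group1 TEXT)""")
con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", [
    (1.00, "COL1_ROW1", "COL2_ROW1", "COL3_ROW1", "AAA"),
    (9.00, "COL1_ROW2", "COL2_ROW2", "COL2_ROW2", "AAA"),
    (2.00, "COL1_ROW3", "COL2_ROW3", "COL3_ROW3", "BBB"),
    (3.00, "COL1_ROW4", "COL2_ROW4", "COL3_ROW4", "CCC"),
])

# SUM() OVER keeps every row but attaches the group total; ROW_NUMBER() without
# ORDER BY numbers rows in an unspecified order, so rn = 1 picks "any" row per group.
rows = con.execute("""
    SELECT group1, sum_amnt1, column1, column2, column3
    FROM (
        SELECT group1,
               SUM(amnt1) OVER (PARTITION BY group1) AS sum_amnt1,
               column1, column2, column3,
               ROW_NUMBER() OVER (PARTITION BY group1) AS rn
        FROM your_table
    ) AS t
    WHERE rn = 1
    ORDER BY group1
""").fetchall()

for row in rows:
    print(row)
```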
You can use window functions:
select nt.*
from (select nt.*, sum(AMNT1) over (partition by GROUP1) as sum_amnt1,
             row_number() over (partition by GROUP1 order by AMNT1) as seq
      from ND_TEST nt   -- Oracle does not accept AS before a table alias
     ) nt
where seq = 1;
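Because this version orders by AMNT1, the chosen row is deterministic (the smallest amount per group, ties aside). A minimal sqlite3 sketch, assuming SQLite 3.25+ and the sample data from the question; with that data it should reproduce the desired output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ND_TEST
               (AMNT1 REAL, COLUMN1 TEXT, COLUMN2 TEXT, COLUMN3 TEXT, GROUP1 TEXT)""")
con.executemany("INSERT INTO ND_TEST VALUES (?, ?, ?, ?, ?)", [
    (1.00, "COL1_ROW1", "COL2_ROW1", "COL3_ROW1", "AAA"),
    (9.00, "COL1_ROW2", "COL2_ROW2", "COL2_ROW2", "AAA"),
    (2.00, "COL1_ROW3", "COL2_ROW3", "COL3_ROW3", "BBB"),
    (3.00, "COL1_ROW4", "COL2_ROW4", "COL3_ROW4", "CCC"),
])

# seq = 1 is the row with the smallest AMNT1 in each group
rows = con.execute("""
    SELECT GROUP1, sum_amnt1, COLUMN1, COLUMN2, COLUMN3
    FROM (
        SELECT nt.*,
               SUM(AMNT1) OVER (PARTITION BY GROUP1) AS sum_amnt1,
               ROW_NUMBER() OVER (PARTITION BY GROUP1 ORDER BY AMNT1) AS seq
        FROM ND_TEST AS nt
    ) AS x
    WHERE seq = 1
    ORDER BY GROUP1
""").fetchall()

for row in rows:
    print(row)
```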

Replacing set of rows with another set in sqlite

I have a table values with columns like this:
id: integer primary key
value: varchar(128)
type_id: integer (foreign key)
owner_id: integer (foreign key)
and some sample data:
id value type_id owner_id
...
5 aaa 0 1
6 bbb 0 2 // Rows
7 ccc 1 2 // to
8 ddd 1 2 // be
9 eee 2 2 // replaced
10 fff 0 3
...
Now I would like to replace all rows where owner_id == 2 with a new set of data. The simple approach is to DELETE all rows for owner_id == 2 and INSERT the new ones. However, I wonder whether there is another solution.
In my case:
The new set may contain exactly the same data (no action needed).
Or it could contain the same data minus one row (deletion needed). Example: no more bbb with type_id == 0
Or there is one more row (insertion needed). Example: bbb, ccc, ddd and eee with exactly the same values for type_id, plus ggg with type_id = 1
Or one of the values in the value column changed (update needed). Example: exactly the same data, but instead of ccc with type_id == 1 there is ggg with type_id == 1
It can also be any combination of the operations above.
The reason I try to avoid DELETE + INSERT is that I'll have many such updates, and with that approach the id will grow fast.
As you don't seem to be around to respond to comments, let's get started.
In line with my above comments, I did iron out (what appears to me as) some wrinkles:
in your request: "no more bbb with type_id == 1", which is not part of your sample data (going for type_id 0), and
your sample data: values "ccc" and "ddd" share type_id 1 and owner_id 2 (going for unique owner_id/type_id combinations).
If applicable, you might enforce the latter by:
CREATE UNIQUE INDEX ValuesTable_TypeOwner ON ValuesTable(owner_id, type_id);
NB: I changed the table name because VALUES is an SQL reserved word.
You might want to try something along these lines (pulling the modifications to be applied from a table called Changes):
Delete no longer existing owner_id type_id combinations:
WITH
To_Delete (id) AS (
SELECT
id
FROM ValuesTable V
JOIN Changes C
ON V.owner_id = C.owner_id
AND V.type_id
NOT IN (SELECT type_id
FROM Changes
WHERE owner_id = C.owner_id)
)
DELETE FROM ValuesTable
WHERE id IN (SELECT id FROM To_Delete)
;
Update deviating values:
WITH
To_Update (id) AS (
SELECT
id
FROM ValuesTable V
JOIN Changes C
ON V.owner_id = C.owner_id
AND V.type_id = C.type_id
AND V.value <> C.value
)
UPDATE ValuesTable
SET value = (SELECT value
FROM Changes
WHERE ValuesTable.owner_id = owner_id
AND ValuesTable.type_id = type_id
)
WHERE id IN (SELECT id FROM To_Update)
;
Insert new owner_id type_id combinations:
WITH
To_Insert (value, type_id, owner_id) AS (
SELECT
value
, type_id
, owner_id
FROM Changes
WHERE NOT EXISTS
(SELECT 1
FROM ValuesTable
WHERE Changes.owner_id = owner_id
AND Changes.type_id = type_id
)
)
INSERT INTO ValuesTable (value, type_id, owner_id)
SELECT value, type_id, owner_id FROM To_Insert
;
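To check the three statements end to end, here is a sketch that runs them (reformatted, with comments added) against an in-memory SQLite database from Python, using the "Starting from" data shown in this answer. The unique index is the one suggested above; the Python wrapper is an assumption for testing only.

```python
import sqlite3

SETUP = """
CREATE TABLE ValuesTable (
    id       INTEGER PRIMARY KEY,
    value    VARCHAR(128),
    type_id  INTEGER,
    owner_id INTEGER
);
CREATE UNIQUE INDEX ValuesTable_TypeOwner ON ValuesTable(owner_id, type_id);
CREATE TABLE Changes (value VARCHAR(128), type_id INTEGER, owner_id INTEGER);
INSERT INTO ValuesTable VALUES
    (5, 'aaa', 0, 1), (6, 'bbb', 0, 2), (7, 'ccc', 1, 2),
    (8, 'ddd', 2, 2), (9, 'eee', 3, 2), (10, 'fff', 0, 3);
INSERT INTO Changes VALUES
    ('ccc', 1, 2), ('ddd', 2, 2), ('xxx', 3, 2), ('yyy', 4, 2);
"""

SYNC = """
-- 1. delete owner/type combinations that no longer exist in Changes
WITH To_Delete (id) AS (
    SELECT id FROM ValuesTable V
    JOIN Changes C
      ON V.owner_id = C.owner_id
     AND V.type_id NOT IN (SELECT type_id FROM Changes WHERE owner_id = C.owner_id)
)
DELETE FROM ValuesTable WHERE id IN (SELECT id FROM To_Delete);

-- 2. update rows whose value changed
WITH To_Update (id) AS (
    SELECT id FROM ValuesTable V
    JOIN Changes C
      ON V.owner_id = C.owner_id AND V.type_id = C.type_id AND V.value <> C.value
)
UPDATE ValuesTable
SET value = (SELECT value FROM Changes
             WHERE ValuesTable.owner_id = owner_id AND ValuesTable.type_id = type_id)
WHERE id IN (SELECT id FROM To_Update);

-- 3. insert new owner/type combinations
WITH To_Insert (value, type_id, owner_id) AS (
    SELECT value, type_id, owner_id FROM Changes
    WHERE NOT EXISTS (SELECT 1 FROM ValuesTable
                      WHERE Changes.owner_id = owner_id AND Changes.type_id = type_id)
)
INSERT INTO ValuesTable (value, type_id, owner_id)
SELECT value, type_id, owner_id FROM To_Insert;
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
con.executescript(SYNC)
result = con.execute("SELECT * FROM ValuesTable ORDER BY id").fetchall()
for row in result:
    print(row)
```

Only id 6 is deleted, only id 9 is updated, and a single new row is inserted, so unchanged rows keep their ids, which is the point of avoiding DELETE + INSERT.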
Starting from
ValuesTable Changes
| id | value | type_id | owner_id | | value | type_id | owner_id |
|----|-------|---------|----------| |-------|---------|----------|
| 5 | aaa | 0 | 1 | | ccc | 1 | 2 |
| 6 | bbb | 0 | 2 | | ddd | 2 | 2 |
| 7 | ccc | 1 | 2 | | xxx | 3 | 2 |
| 8 | ddd | 2 | 2 | | yyy | 4 | 2 |
| 9 | eee | 3 | 2 |
| 10 | fff | 0 | 3 |
it returns:
| id | value | type_id | owner_id |
|----|-------|---------|----------|
| 5 | aaa | 0 | 1 |
| 7 | ccc | 1 | 2 |
| 8 | ddd | 2 | 2 |
| 9 | xxx | 3 | 2 |
| 11 | yyy | 4 | 2 |
| 10 | fff | 0 | 3 |
See it in action: SQL Fiddle.
NB: Instead of using a Changes table, the WITH clause could, of course, be extended accordingly.
Please comment if and as this requires adjustment / further detail.

merging content of two tables without duplicating content

I have two identical SQL Server tables (SOURCE and DESTINATION) with lots of columns in each. I want to insert into DESTINATION the rows from SOURCE that do not already exist in DESTINATION. I consider two rows equal if all columns match except the timestamp, a count column, and the integer primary key. So I want to insert into DESTINATION all rows in SOURCE that don't already exist in DESTINATION, ignoring the count, timestamp, and primary key columns.
How do I do this?
Thanks for all the contributions! I chose to use the Merge command since it is structured to allow for updates and inserts in one statement and I needed to do the update separately.
This is the code that worked:
Merge
into DESTINATION as D
using SOURCE as S
on (
D.Col1 = S.Col1
and D.Col2 = S.Col2
and D.Col3 = S.Col3
)
WHEN MATCHED
THEN UPDATE SET D.Count = S.Count
WHEN NOT MATCHED THEN
INSERT (Col1, Col2, Col3, Count, timestamp)
VALUES (S.Col1, S.Col2, S.Col3, S.Count, S.timestamp);
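SQLite has no MERGE statement, so this sketch swaps in its closest analogue, INSERT ... ON CONFLICT ... DO UPDATE (UPSERT, SQLite 3.24+), run from Python's sqlite3 module. Assumptions: the match columns Col1, Col2, Col3 carry a unique index (UPSERT needs a conflict target), DESTINATION has an integer id key, and the sample rows are hypothetical. As in the MERGE above, matched rows only get Count updated and unmatched rows are inserted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SOURCE (Col1 TEXT, Col2 TEXT, Col3 TEXT, Count INTEGER, timestamp TEXT);
CREATE TABLE DESTINATION (id INTEGER PRIMARY KEY,
                          Col1 TEXT, Col2 TEXT, Col3 TEXT, Count INTEGER, timestamp TEXT);
-- UPSERT needs a uniqueness constraint to detect the "matched" case
CREATE UNIQUE INDEX dest_match ON DESTINATION (Col1, Col2, Col3);
INSERT INTO DESTINATION (Col1, Col2, Col3, Count, timestamp) VALUES ('a', 'b', 'c', 1, 't0');
INSERT INTO SOURCE VALUES ('a', 'b', 'c', 5, 't1'), ('x', 'y', 'z', 2, 't1');
""")

con.execute("""
    INSERT INTO DESTINATION (Col1, Col2, Col3, Count, timestamp)
    SELECT Col1, Col2, Col3, Count, timestamp FROM SOURCE
    WHERE 1  -- SQLite needs a WHERE here to parse ON CONFLICT unambiguously
    ON CONFLICT (Col1, Col2, Col3) DO UPDATE SET Count = excluded.Count
""")
rows = con.execute(
    "SELECT Col1, Col2, Col3, Count, timestamp FROM DESTINATION ORDER BY id"
).fetchall()
print(rows)  # [('a', 'b', 'c', 5, 't0'), ('x', 'y', 'z', 2, 't1')]
```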
Note: when I first wrote this question I called the tables AAA and BBB. I have since renamed AAA to SOURCE and BBB to DESTINATION for clarity.
Since SQL Server 2008, instead of an INSERT ... SELECT you can use the MERGE statement for this purpose:
ref:
http://technet.microsoft.com/en-us/library/bb510625.aspx
http://weblogs.sqlteam.com/peterl/archive/2007/09/20/Example-of-MERGE-in-SQL-Server-2008.aspx
Something like this:
INSERT INTO BBB(id, timestamp, mycount, col1, col2, col3, etc.)
SELECT id, timestamp, mycount, col1, col2, col3, etc.
FROM AAA
WHERE
NOT EXISTS(SELECT NULL FROM BBB oldb WHERE
oldb.col1 = AAA.col1
AND oldb.col2 = AAA.col2
AND oldb.col3 = AAA.col3
)
Add columns as needed to the NOT EXISTS clause.
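A minimal sqlite3 sketch of the NOT EXISTS approach, with hypothetical rows and only three comparison columns. One assumption differs from the answer: id is left out of the column lists so the destination assigns its own keys, since the question excludes the primary key from the comparison.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE AAA (id INTEGER PRIMARY KEY, timestamp TEXT, mycount INTEGER,
                  col1 INTEGER, col2 INTEGER, col3 INTEGER);
CREATE TABLE BBB (id INTEGER PRIMARY KEY, timestamp TEXT, mycount INTEGER,
                  col1 INTEGER, col2 INTEGER, col3 INTEGER);
-- hypothetical rows: the first AAA row already exists in BBB (ignoring timestamp/mycount)
INSERT INTO AAA (timestamp, mycount, col1, col2, col3)
VALUES ('t1', 1, 1, 1, 1), ('t2', 7, 2, 2, 2);
INSERT INTO BBB (timestamp, mycount, col1, col2, col3)
VALUES ('t0', 3, 1, 1, 1);
""")

con.execute("""
    INSERT INTO BBB (timestamp, mycount, col1, col2, col3)
    SELECT timestamp, mycount, col1, col2, col3
    FROM AAA
    WHERE NOT EXISTS (SELECT NULL FROM BBB oldb
                      WHERE oldb.col1 = AAA.col1
                        AND oldb.col2 = AAA.col2
                        AND oldb.col3 = AAA.col3)
""")
rows = con.execute(
    "SELECT timestamp, mycount, col1, col2, col3 FROM BBB ORDER BY id"
).fetchall()
print(rows)  # [('t0', 3, 1, 1, 1), ('t2', 7, 2, 2, 2)]
```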
A solution using a good old-fashioned LEFT JOIN. Note that in the example below only the first row of BBB is inserted into AAA, because it is the only one with no matching row in AAA. Replace col1 and col2 with the actual columns of your tables.
> select * from AAA;
+---------------------+------+------+
| timestamp | col1 | col2 |
+---------------------+------+------+
| 2012-03-17 08:17:22 | 1 | 1 |
| 2012-03-17 08:17:27 | 1 | 2 |
| 2012-03-17 08:17:30 | 1 | 3 |
| 2012-03-17 08:17:32 | 1 | 4 |
| 2012-03-17 08:17:49 | 2 | 2 |
| 2012-03-17 08:17:52 | 2 | 3 |
| 2012-03-17 08:17:54 | 2 | 4 |
+---------------------+------+------+
7 rows in set (0.00 sec)
> select * from BBB;
+---------------------+------+------+
| timestamp | col1 | col2 |
+---------------------+------+------+
| 2012-03-17 08:18:16 | 2 | 1 |
| 2012-03-17 08:18:18 | 2 | 2 |
| 2012-03-17 08:18:20 | 2 | 3 |
+---------------------+------+------+
3 rows in set (0.00 sec)
> INSERT INTO AAA
SELECT BBB.* FROM BBB
LEFT JOIN AAA
USING(col1,col2)
WHERE AAA.timestamp IS NULL;
> select * from AAA;
+---------------------+------+------+
| timestamp | col1 | col2 |
+---------------------+------+------+
| 2012-03-17 08:17:22 | 1 | 1 |
| 2012-03-17 08:17:27 | 1 | 2 |
| 2012-03-17 08:17:30 | 1 | 3 |
| 2012-03-17 08:17:32 | 1 | 4 |
| 2012-03-17 08:17:49 | 2 | 2 |
| 2012-03-17 08:17:52 | 2 | 3 |
| 2012-03-17 08:17:54 | 2 | 4 |
| 2012-03-17 08:18:16 | 2 | 1 |
+---------------------+------+------+
8 rows in set (0.00 sec)
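The same anti-join can be replayed in SQLite from Python to confirm the behavior shown above. Assumption: SQLite stands in for MySQL; the rows are those from the session transcript.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE AAA (timestamp TEXT, col1 INTEGER, col2 INTEGER);
CREATE TABLE BBB (timestamp TEXT, col1 INTEGER, col2 INTEGER);
INSERT INTO AAA VALUES
    ('2012-03-17 08:17:22', 1, 1), ('2012-03-17 08:17:27', 1, 2),
    ('2012-03-17 08:17:30', 1, 3), ('2012-03-17 08:17:32', 1, 4),
    ('2012-03-17 08:17:49', 2, 2), ('2012-03-17 08:17:52', 2, 3),
    ('2012-03-17 08:17:54', 2, 4);
INSERT INTO BBB VALUES
    ('2012-03-17 08:18:16', 2, 1), ('2012-03-17 08:18:18', 2, 2),
    ('2012-03-17 08:18:20', 2, 3);
""")

# LEFT JOIN keeps every BBB row; AAA.timestamp is NULL only where no AAA row matched
con.execute("""
    INSERT INTO AAA
    SELECT BBB.* FROM BBB
    LEFT JOIN AAA USING (col1, col2)
    WHERE AAA.timestamp IS NULL
""")
rows = con.execute("SELECT * FROM AAA ORDER BY rowid").fetchall()
print(len(rows), rows[-1])  # 8 ('2012-03-17 08:18:16', 2, 1)
```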