I have some data from a query and the shape pretty much looks like this:
| Id | category | value |
|----|----------|-------|
| 1 | 'a' | 2 |
| 1 | 'b' | 5 |
| 2 | 'a' | 3 |
| 2 | 'b' | 4 |
I want to group that data and insert it into a table with the following structure:
| Id | category_a_value | category_b_value|
|----|------------------|-----------------|
| 1 | 2 | 5 |
| 2 | 3 | 4 |
Is there a nice way to achieve this in Postgres? I couldn't figure out how to group the data the way I wanted, so eventually I tried an INSERT INTO ... ON CONFLICT approach selecting from the original query, but this failed because a single statement can't affect the same row multiple times.
Thanks in advance
You can use conditional aggregation, which in Postgres uses filter:
select id,
max(value) filter (where category = 'a') as category_a_value,
max(value) filter (where category = 'b') as category_b_value
from t
group by id;
You can then use insert ... select to insert the results into an existing table.
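To try the whole flow end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. The table names t and target are assumptions, and it uses the portable max(case when ...) form, which gives the same result as the filter version above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (id integer, category text, value integer);
    insert into t values (1, 'a', 2), (1, 'b', 5), (2, 'a', 3), (2, 'b', 4);
    -- hypothetical target table with one column per category
    create table target (
        id integer primary key,
        category_a_value integer,
        category_b_value integer
    );
""")

# insert ... select with conditional aggregation: one output row per id,
# so no row of the target table is touched twice
conn.execute("""
    insert into target (id, category_a_value, category_b_value)
    select id,
           max(case when category = 'a' then value end),
           max(case when category = 'b' then value end)
    from t
    group by id
""")

pivoted = conn.execute("select * from target order by id").fetchall()
print(pivoted)  # [(1, 2, 5), (2, 3, 4)]
```

Because the group by collapses each id to a single row before the insert, the "row affected twice" problem from the on conflict attempt does not come up.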
Say I have the following data:
+--------+-------+
| Group | Data |
+--------+-------+
| 1 | row 1 |
| 1 | row 2 |
| 1 | row 3 |
| 20 | row 1 |
| 20 | row 3 |
| 10 | row 1 |
| 10 | row A |
| 10 | row 2 |
| 10 | row 3 |
+--------+-------+
Is it possible to draw a map that shows which groups have which rows? Group numbers may not be contiguous, so they can be placed into a separate table, with the row index used as the string index instead. Something like this:
+-------+
| Group |
+-------+
| 1 |
| 20 |
| 10 |
+-------+
+-------+----------------+
| Data | Found in group |
+-------+----------------+
| row 1 | 111            |
| row A |   1            |
| row 2 | 1 1            |
| row 3 | 111            |
+-------+----------------+
Where the first character represents Group 1, the 2nd is Group 20 and the 3rd is Group 10.
Ordering of the Group rows isn't critical so long as I can reference which row goes with which character.
I only ask this because I saw this crazy example in the documentation generating a fractal, but I can't quite get my head around it.
Is this doable?
To find the missing values, first prepare a dataset that contains every possible combination of Data and Grp. You can build it with a CROSS JOIN.
Once you have that dataset, compare it with the actual data using a LEFT JOIN.
Assuming the characters are ordered by the Grp column, you can do it as below.
SELECT
a.Data,group_concat(case when base.Grp is null then '.' else '1' end,'') as Found_In_Group
,group_concat(b.Grp) as Group_Order
FROM
(SELECT Data FROM yourtable Group By Data)a
CROSS JOIN
(SELECT Grp FROM yourtable Group By Grp Order by Grp)b
LEFT JOIN yourtable base
ON b.Grp=base.Grp
AND a.Data=base.Data
GROUP BY a.Data
Note: I used . instead of a blank so that a missing group is easier to see.
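+-------+----------------+-------------+
| Data  | Found_In_Group | Group_Order |
+-------+----------------+-------------+
| row 1 | 111            | 1,10,20     |
| row 2 | 11.            | 1,10,20     |
| row 3 | 111            | 1,10,20     |
| row A | .1.            | 1,10,20     |
+-------+----------------+-------------+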
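The same idea can be checked with a small script. This is only a sketch using Python's built-in sqlite3 module, with the sample data from the question loaded into a table called yourtable with columns grp and data (assumed names). It builds the flags with the cross join and left join as above, but concatenates them in Python so the character order is pinned down by an explicit ORDER BY instead of relying on the order in which group_concat happens to see the rows.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table yourtable (grp integer, data text);
    insert into yourtable values
        (1, 'row 1'), (1, 'row 2'), (1, 'row 3'),
        (20, 'row 1'), (20, 'row 3'),
        (10, 'row 1'), (10, 'row A'), (10, 'row 2'), (10, 'row 3');
""")

# Every (data, grp) combination, flagged '1' when it exists and '.' when missing.
rows = conn.execute("""
    select a.data,
           case when base.grp is null then '.' else '1' end as flag
    from (select distinct data from yourtable) a
    cross join (select distinct grp from yourtable) b
    left join yourtable base
           on base.grp = b.grp and base.data = a.data
    order by a.data, b.grp
""").fetchall()

# Join the flags per data value; characters follow grp order 1, 10, 20.
found = {data: "".join(flag for _, flag in flags)
         for data, flags in groupby(rows, key=lambda r: r[0])}
print(found)
```

For the sample data this should reproduce the Found_In_Group column shown above.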
SELECT Data, group_concat("Group") AS "Found in group"
FROM yourtable
GROUP BY Data
will give you a CSV list of groups.
I have a table that has a number column and an attribute column like this:
1.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 1 | b |
| 1 | a |
| 2 | a |
| 2 | b |
| 2 | b |
+------------
I want to make the number unique, and the attribute to be whichever attribute occurred most often for that number, like this (this is the end product I'm interested in):
2.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 2 | b |
+------------
I have been working on this for a while and managed to write myself a query that looks up how many times an attribute occurs for a given number like this:
3.
+-----+-----+-----+
| num | att |count|
------------------+
| 1 | a | 2 |
| 1 | b | 1 |
| 2 | a | 1 |
| 2 | b | 2 |
+-----------------+
But I can't think of a way to only select those rows from the above table where the count is the highest (for each number of course).
So basically what I am asking is: given table 3, how do I select only the rows with the highest count for each number? (Of course, an answer describing a way to get from table 1 to table 2 directly also works :) )
You can use aggregation and window functions:
select num, att
from (
select num, att, row_number() over(partition by num order by count(*) desc, att) rn
from mytable
group by num, att
) t
where rn = 1
For each num, this brings the most frequent att; if there are ties, the smaller att is retained.
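Here is a small runnable sketch of that approach, using Python's built-in sqlite3 module (this needs SQLite 3.25 or later for window functions). It is an assumption-laden variant rather than the exact query above: the counts are computed in an inner query first, which corresponds to table 3 in the question, and row_number() then ranks them per num.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table mytable (num integer, att text);
    insert into mytable values
        (1, 'a'), (1, 'b'), (1, 'a'),
        (2, 'a'), (2, 'b'), (2, 'b');
""")

most_frequent = conn.execute("""
    select num, att
    from (
        select num, att,
               row_number() over (partition by num order by cnt desc, att) as rn
        from (
            -- inner step: the per-(num, att) counts, i.e. table 3
            select num, att, count(*) as cnt
            from mytable
            group by num, att
        ) counts
    ) ranked
    where rn = 1
    order by num
""").fetchall()
print(most_frequent)  # [(1, 'a'), (2, 'b')]
```

The att in the window's order by is only a tie-breaker; swap it for another column if ties should be resolved differently.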
Oracle has an aggregation function that does this, stats_mode():
select num, stats_mode(att)
from t
group by num;
In statistics, the most common value is called the mode -- hence the name of the function.
You can use group by and count as below
select id, col, count(col) as count
from
df_b_sql
group by id, col
I have a table like below and would like to group across columns and then rows.
I have a solution that somewhat works but is very slow. Is there a more efficient way of doing it?
Thank you
| GROUP | VAL 1 | VAL 2 | VAL 3 |
| A | 1 | 2 | 3 |
| A | 4 | 5 | 6 |
| B | 7 | 8 | 9 |
| C | 10 | 11 | 12 |
Preferred result is
| GROUP | TEXT |
| A |123456|
| B | 789 |
| C |101112|
This is what I currently have, but it is very slow. Is there an alternative solution that groups across columns and rows?
select GROUP,
listagg(comments,',')
within GROUP (order by GROUP) "TEXT"
from
(select concat(val 1,concat(val 2,val 3)) as Comments, data.*
from data)
group by GROUP;
Thank you
Use the following query where you will directly get concatenated column values:
SELECT
"GROUP",
LISTAGG(VAL_1 || VAL_2 || VAL_3)
WITHIN GROUP(ORDER BY VAL_1) AS "TEXT"
FROM DATA
GROUP BY "GROUP";
Note: Do not use Oracle reserved keywords as column names. Here GROUP is an Oracle reserved keyword, which is why it has to be quoted.
Cheers!!
Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where
PROJECT_ID, REF_ID, REF_VALUE are the same
but TASK_ID are different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not handle NULL values, although that is easy enough to handle. Use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
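To see the self join on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module in place of PostgreSQL. The query template and the op placeholder are just a convenience for this example; it runs the join once with <> (both directions) and once with < (each pair once).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table data (project_id text, task_id text, ref_id text, ref_value text);
    insert into data values
        ('1', '1', '1', '1'), ('1', '1', '1', '2'),
        ('1', '2', '1', '1'), ('1', '2', '1', '2');
""")

# Self join on the three shared columns; {op} controls how task ids are compared.
query = """
    select d1.task_id, d2.task_id, count(*)
    from data d1
    join data d2
      on d1.project_id = d2.project_id
     and d1.ref_id = d2.ref_id
     and d1.ref_value = d2.ref_value
     and d1.task_id {op} d2.task_id
    group by d1.task_id, d2.task_id
    order by d1.task_id, d2.task_id
"""
both_directions = conn.execute(query.format(op="<>")).fetchall()
once_per_pair = conn.execute(query.format(op="<")).fetchall()

print(both_directions)  # [('1', '2', 2), ('2', '1', 2)]
print(once_per_pair)    # [('1', '2', 2)]
```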
This is how I understand your question. This assumes there are only two tasks for the same combination.
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also should work
OUTPUT
I added more columns to make clear which elements are the same:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |
I am working with Teradata SQL. I would like to get the duplicate rows together with their count and the other fields as well. I can only find ways to get the count, but not the other fields along with it.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by id, and I want to retrieve the duplicate rows themselves as well as the count.
All I have so far is:
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
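As a quick check of the window-aggregate idea on the sample input, here is a minimal sketch using Python's built-in sqlite3 module instead of Teradata (SQLite 3.25 or later). SQLite has no QUALIFY, so it uses the derived-table form; the table name users and the column list are taken from the queries above and the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (id integer, name text, date text);
    insert into users values
        (1, 'abc', '21.03.2015'), (1, 'def', '22.04.2015'),
        (2, 'ajk', '22.03.2015'),
        (3, 'ghi', '23.03.2015'), (3, 'ghi', '23.03.2015');
""")

# count(*) over (partition by id) attaches the per-id count to every row,
# so the rows themselves survive instead of being collapsed by group by.
dupes = conn.execute("""
    select id, name, duplicates
    from (
        select id, name, count(*) over (partition by id) as duplicates
        from users
    ) sub
    where duplicates > 1
    order by id, name
""").fetchall()
print(dupes)  # [(1, 'abc', 2), (1, 'def', 2), (3, 'ghi', 2), (3, 'ghi', 2)]
```

Dropping the where clause keeps id 2 with a count of 1, which matches the expected output in the question.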
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
Try this:
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1