How can I build a query that returns records in which both columns are unique?
This is my attempted code:
Select x.a, y.b
from table1 x, table2 y
where x.id = y.id;
1 - a
2 - b
3 - a
4 - b
5 - c
1 - c
6 - a
2 - a
should return this:
1 - a
2 - b
5 - c
There is data loss, but that's okay; I only want both columns to be unique.
I have tried GROUP BY, but that doesn't give uniqueness on both columns.
To describe the intention more clearly:
table1 above stores classifications, and table2 stores values. One classification can have multiple values, and one value can belong to any number of classifications. I want to pick all (or some) classifications from table1 and, for each one, pick one corresponding value, with the condition that a value must not repeat if it has already been picked with another classification. In my example, 1 to 6 are classifications and a, b, c are values. I picked classifications 1, 2 and 5 with values a, b and c, because every other classification would repeat a value (e.g. 3 would repeat a, which is already picked with classification 1).
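The selection rule described above can be sketched as a greedy pass over the joined (classification, value) pairs: keep a pair only if neither its classification nor its value has been used yet. This is a minimal Python sketch, not SQL, and it assumes the rows are processed in the order listed above; with that order it reproduces the desired 1 - a, 2 - b, 5 - c.

```python
# Joined (classification, value) pairs, in the order listed in the question.
pairs = [(1, "a"), (2, "b"), (3, "a"), (4, "b"),
         (5, "c"), (1, "c"), (6, "a"), (2, "a")]

def pick_unique_pairs(pairs):
    """Keep a pair only if neither its classification nor its value
    has already been used by an earlier kept pair."""
    used_classes, used_values, picked = set(), set(), []
    for cls, val in pairs:
        if cls not in used_classes and val not in used_values:
            picked.append((cls, val))
            used_classes.add(cls)
            used_values.add(val)
    return picked

result = pick_unique_pairs(pairs)
print(result)  # [(1, 'a'), (2, 'b'), (5, 'c')]
```

The outcome depends on row order, and a greedy pass does not always find the largest possible set of pairs; finding the maximum is a bipartite matching problem.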
The OP was looking for unique values (table2), each with an arbitrary classification (table1). One method is to number the classification/value pairs within each group of values from table2 and then select only the first row from each group.
WITH
enumerated_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY y.b ORDER BY x.a DESC) rn
, x.a
, y.b
FROM table1 x
INNER JOIN table2 y ON x.id = y.id
)
SELECT a, b
FROM enumerated_data
WHERE rn = 1
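To check what this query returns on the sample data, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table contents are an assumption: each joined row gets its own hypothetical id so that the join reproduces the pairs from the question, and ORDER BY b is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, a INTEGER);
    CREATE TABLE table2 (id INTEGER, b TEXT);
""")

# Joined (a, b) pairs from the question; each row gets its own hypothetical id.
pairs = [(1, "a"), (2, "b"), (3, "a"), (4, "b"),
         (5, "c"), (1, "c"), (6, "a"), (2, "a")]
for row_id, (a, b) in enumerate(pairs, start=1):
    conn.execute("INSERT INTO table1 VALUES (?, ?)", (row_id, a))
    conn.execute("INSERT INTO table2 VALUES (?, ?)", (row_id, b))

query = """
WITH enumerated_data AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY y.b ORDER BY x.a DESC) rn,
           x.a,
           y.b
    FROM table1 x
    INNER JOIN table2 y ON x.id = y.id
)
SELECT a, b
FROM enumerated_data
WHERE rn = 1
ORDER BY b  -- added only for a deterministic result order
"""
result = conn.execute(query).fetchall()
print(result)  # [(6, 'a'), (4, 'b'), (5, 'c')]
```

Each value appears exactly once, but the classification kept is simply the largest one in each group. With ORDER BY x.a ASC instead, both a and c would be paired with classification 1, so this approach guarantees unique values but not unique classifications.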
If your data is as you posted above, then it is already unique.
1 - a
2 - b
3 - a
4 - b
5 - c
1 - c
6 - a
2 - a
[1,a], [3,a], [6,a] and [2,a] are all distinct pairs.
You can use an aggregate function like MIN or MAX on the number column if you want to collapse the data like that. Your desired output is neither the MIN nor the MAX, though, so I'm not really sure what you are trying to achieve.
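A quick way to see the MIN variant on the sample data is the sketch below, again with Python's sqlite3; the single pairs table holding the joined rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the joined (a, b) pairs from the question.
conn.execute("CREATE TABLE pairs (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO pairs VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "a"), (4, "b"),
                  (5, "c"), (1, "c"), (6, "a"), (2, "a")])

# Collapse each value to its smallest classification number.
result = conn.execute(
    "SELECT MIN(a), b FROM pairs GROUP BY b ORDER BY b").fetchall()
print(result)  # [(1, 'a'), (2, 'b'), (1, 'c')]
```

Classification 1 ends up paired with both a and c, which is why MIN alone does not give the desired output.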
Sounds like a two-level group by is required?
select count(*), x.a, y.b
from table1 x
inner join table2 y on x.id = y.id
group by x.a, y.b;
Related
I have a table showing production steps (PosID) for a production order (OrderID) and which machine (MachID) they will be run on; I’m trying to reduce the table to show one record for each order – the lowest position (field “PosID”) that is still open (field “Open” = Y); i.e. the next production step for the order.
Example data I have:
OrderID PosID MachID Open
1 1 A N
1 2 B Y
1 3 C Y
2 4 C Y
2 5 D Y
2 6 E Y
Example result I want:
OrderID PosID MachID
1 2 B
2 4 C
I’ve tried two approaches, but I can’t seem to get either to work:
I don’t want to put “MachID” in the GROUP BY because that gives me all the records that are open, but I also don’t think there is an appropriate aggregate function for the “MachID” field to make this work.
SELECT "OrderID", MIN("PosID"), "MachID"
FROM Table T0
WHERE "Open" = 'Y'
GROUP BY "OrderID"
With this approach, I keep getting error messages that T1."PosID" (in the JOIN clause) is an invalid column. I've also tried T1.MIN("PosID") and MIN(T1."PosID").
SELECT T0."OrderID", T0."PosID", T0."MachID"
FROM Table T0
JOIN
(SELECT "OrderID", MIN("PosID")
FROM Table
WHERE "Open" = 'Y'
GROUP BY "OrderID") T1
ON T0."OrderID" = T1."OrderID"
AND T0."PosID" = T1."PosID"
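The invalid-column error in the second attempt most likely comes from the derived table T1: MIN("PosID") has no column name there, so T1."PosID" does not exist. A minimal sketch with Python's sqlite3, using a hypothetical table prod_steps in place of Table, shows the same join working once the aggregate is aliased:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "prod_steps" is a hypothetical stand-in for the table in the question.
conn.execute('CREATE TABLE prod_steps '
             '("OrderID" INTEGER, "PosID" INTEGER, "MachID" TEXT, "Open" TEXT)')
conn.executemany("INSERT INTO prod_steps VALUES (?, ?, ?, ?)", [
    (1, 1, "A", "N"), (1, 2, "B", "Y"), (1, 3, "C", "Y"),
    (2, 4, "C", "Y"), (2, 5, "D", "Y"), (2, 6, "E", "Y"),
])

query = """
SELECT T0."OrderID", T0."PosID", T0."MachID"
FROM prod_steps T0
JOIN (SELECT "OrderID", MIN("PosID") AS "PosID"  -- the alias is the fix
      FROM prod_steps
      WHERE "Open" = 'Y'
      GROUP BY "OrderID") T1
  ON T0."OrderID" = T1."OrderID"
 AND T0."PosID" = T1."PosID"
ORDER BY T0."OrderID"
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 2, 'B'), (2, 4, 'C')]
```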
Try this:
SELECT "OrderID","PosID","MachID" FROM (
SELECT
T0."OrderID",
T0."PosID",
T0."MachID",
ROW_NUMBER() OVER (PARTITION BY "OrderID" ORDER BY "PosID") RNK
FROM Table T0
WHERE "Open" = 'Y'
) AS A
WHERE RNK = 1
I've kept the double quotes around the column names as you wrote them in the question, but in general they're not needed.
The inner query first keeps only the open rows, then numbers the rows within each OrderID from 1 upward in order of PosID:
OrderID PosID MachID Open RNK
1 2 B Y 1
1 3 C Y 2
2 4 C Y 1
2 5 D Y 2
2 6 E Y 3
The outer query then filters on the RNK column, keeping only RNK = 1, i.e. the lowest open PosID per OrderID. ROW_NUMBER() in the SELECT clause is called a window function, and there are many others that are quite useful.
P.S. The solution above should work in MSSQL.
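The ROW_NUMBER query can be tried on the example data with Python's sqlite3 module (SQLite 3.25 or later for window functions). The table name prod_steps is a hypothetical stand-in for Table, and ORDER BY "OrderID" is added only for a deterministic result order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "prod_steps" is a hypothetical stand-in for the table in the question.
conn.execute('CREATE TABLE prod_steps '
             '("OrderID" INTEGER, "PosID" INTEGER, "MachID" TEXT, "Open" TEXT)')
conn.executemany("INSERT INTO prod_steps VALUES (?, ?, ?, ?)", [
    (1, 1, "A", "N"), (1, 2, "B", "Y"), (1, 3, "C", "Y"),
    (2, 4, "C", "Y"), (2, 5, "D", "Y"), (2, 6, "E", "Y"),
])

query = """
SELECT "OrderID", "PosID", "MachID" FROM (
    SELECT T0."OrderID", T0."PosID", T0."MachID",
           ROW_NUMBER() OVER (PARTITION BY "OrderID" ORDER BY "PosID") RNK
    FROM prod_steps T0
    WHERE "Open" = 'Y'   -- keep only open steps
) AS A
WHERE RNK = 1            -- lowest open PosID per order
ORDER BY "OrderID"
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 2, 'B'), (2, 4, 'C')]
```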
I currently have two tables, A and B:
Table A:
col1 col2
a 1,2,3
b 1,4,5
c 4
Table B:
ID metric1
1 231.0
2 1123.1
3 110
4 1231
5 116
I have to find, using SQL, the mean value of metric1 for each col1 value in Table A, based on the IDs listed in col2. The resulting table should list col1 in descending order of avg(metric1) from Table B.
Result:
col1 avg(metric1) count
c 1231 1
b 526 3
a 488 3
Any ideas on how I can write this query in PostgreSQL? I've tried the following query, but it does not work:
combined_stats AS(
select avg(metric1), count(*)
from table_b
where ID in (select col2 from table_a)
group by (select col1 from table_a)
Fix your data model! Do not store numbers in strings! Do not store multiple values in a string!
Let me assume that you are stuck with someone else's really bad data model. If so, you can split the strings and join:
select a.col1, avg(b.metric1), count(b.id)
from a left join
b
on b.id = any (regexp_split_to_array(col2, ','))
group by a.col1
order by avg(b.metric1) desc;
Note: If b.id is a number, then you need to deal with type conversions, something like:
on b.id::text = any (regexp_split_to_array(col2, ','))
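To sanity-check the expected result, the sketch below mirrors the query's logic in plain Python rather than PostgreSQL: split col2 on commas, look up each ID in Table B, then average, count and sort by the average in descending order. Averages are rounded to two decimals for readability.

```python
from statistics import mean

# Sample data from the question; col2 holds comma-separated IDs.
table_a = {"a": "1,2,3", "b": "1,4,5", "c": "4"}
table_b = {1: 231.0, 2: 1123.1, 3: 110.0, 4: 1231.0, 5: 116.0}

result = []
for col1, col2 in table_a.items():
    # Split the string into IDs, as regexp_split_to_array(col2, ',') does.
    ids = [int(part) for part in col2.split(",")]
    # Keep only IDs that exist in Table B (all of them, in this sample).
    metrics = [table_b[i] for i in ids if i in table_b]
    result.append((col1, round(mean(metrics), 2), len(metrics)))

# ORDER BY avg(metric1) DESC
result.sort(key=lambda row: row[1], reverse=True)
print(result)  # [('c', 1231.0, 1), ('b', 526.0, 3), ('a', 488.03, 3)]
```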
Fair warning: I'm new to using SQL. I do so on an Oracle server either via AQT or with SQL Developer.
As I haven't been able to think or search my way to an answer, I put myself in your able hands...
I'd like to combine data from table A (high-quality data) with data from table B (fresh data) such that entries from B are only included when their date stamps are later than those available in table A.
Both tables include entries from multiple entities, and the latest date stamp varies with those entities.
On the 4th of January, the tables may look something like this:
Table A:
entity date type value
X 1.jan 1 1
X 1.jan 0 1
X 2.jan 1 1
Y 1.jan 1 1
Y 3.jan 1 1
Table B:
entity date type value
X 1.jan 1 2
X 1.jan 0 2
X 2.jan 1 2
X 3.jan 1 1 (new entry)
Y 1.jan 1 2
Y 3.jan 1 2
Y 4.jan 1 1 (new entry)
I have made an attempt at some code that I hope clarifies my need:
WITH
AA AS
(SELECT entity, date, SUM(value)
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT entity, date, SUM(value)
FROM table_B
WHERE date > ALL (SELECT date FROM AA)
GROUP BY
entity,
date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
If the condition WHERE date > ALL (SELECT date FROM AA) worked separately for each entity, I think I would have what I need.
That is, for each entity I want all entries from A, and only the newer entries from B.
As the data in table A often differs from that in B (values are often corrected), I don't think I can use something like table A UNION ALL (table B MINUS table A).
Thanks
Essentially, you are looking for entries in BB that do not exist in AA. The condition date > ALL (SELECT date FROM AA) does not take the entity into account, so you will not get the correct records.
An alternative is to use a LEFT JOIN and filter out all entries that match AA.
Something like this:
WITH
AA AS
(SELECT entity, date, SUM(value) AS value
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT b.entity, b.date, SUM(b.value) AS value
FROM table_B b
LEFT OUTER JOIN AA
ON AA.entity = b.entity
AND AA.date = b.date
WHERE AA.date IS NULL
GROUP BY
b.entity,
b.date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
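A minimal sketch of the anti-join version in Python's sqlite3, using the sample data from the question. Dates are stored as day-of-January integers, an assumption made only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (entity TEXT, date INTEGER, type INTEGER, value INTEGER);
    CREATE TABLE table_B (entity TEXT, date INTEGER, type INTEGER, value INTEGER);
""")
# Sample data; "date" is the day of January (an assumption for brevity).
conn.executemany("INSERT INTO table_A VALUES (?, ?, ?, ?)", [
    ("X", 1, 1, 1), ("X", 1, 0, 1), ("X", 2, 1, 1),
    ("Y", 1, 1, 1), ("Y", 3, 1, 1)])
conn.executemany("INSERT INTO table_B VALUES (?, ?, ?, ?)", [
    ("X", 1, 1, 2), ("X", 1, 0, 2), ("X", 2, 1, 2), ("X", 3, 1, 1),
    ("Y", 1, 1, 2), ("Y", 3, 1, 2), ("Y", 4, 1, 1)])

query = """
WITH
AA AS (SELECT entity, date, SUM(value) AS value
       FROM table_A
       GROUP BY entity, date),
BB AS (SELECT b.entity, b.date, SUM(b.value) AS value
       FROM table_B b
       LEFT OUTER JOIN AA
         ON AA.entity = b.entity
        AND AA.date = b.date
       WHERE AA.date IS NULL  -- keep only (entity, date) pairs missing from AA
       GROUP BY b.entity, b.date)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
ORDER BY entity, date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

On this sample, the rows that survive from table_B (X on day 3, Y on day 4) are exactly the newer ones. In general, the anti-join keeps any (entity, date) that is missing from table_A, including dates older than A's latest.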
I find your question confusing, because I don't know where the aggregation is coming from.
The basic idea for getting the newer rows from table_b is a correlated condition in the WHERE clause, something like this:
select . . .
from table_a a
union all
select . . .
from table_b b
where b.date > (select max(a.date) from table_a a where a.entity = b.entity);
You can, of course, run this on your CTEs, if those are what you really want to combine.
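The correlated-subquery idea can be tried on the sample data with Python's sqlite3; again, dates are assumed to be day-of-January integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (entity TEXT, date INTEGER, type INTEGER, value INTEGER);
    CREATE TABLE table_b (entity TEXT, date INTEGER, type INTEGER, value INTEGER);
""")
# Sample data; "date" is the day of January (an assumption for brevity).
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?, ?)", [
    ("X", 1, 1, 1), ("X", 1, 0, 1), ("X", 2, 1, 1),
    ("Y", 1, 1, 1), ("Y", 3, 1, 1)])
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?)", [
    ("X", 1, 1, 2), ("X", 1, 0, 2), ("X", 2, 1, 2), ("X", 3, 1, 1),
    ("Y", 1, 1, 2), ("Y", 3, 1, 2), ("Y", 4, 1, 1)])

query = """
SELECT entity, date, type, value FROM table_a
UNION ALL
SELECT entity, date, type, value FROM table_b b
-- only rows newer than the latest table_a date for the same entity
WHERE b.date > (SELECT MAX(a.date) FROM table_a a WHERE a.entity = b.entity)
ORDER BY entity, date, type
"""
rows = conn.execute(query).fetchall()
print(rows)
```

If an entity has no rows in table_a, MAX returns NULL, the comparison is not true, and all of that entity's table_b rows are dropped; wrap the subquery in COALESCE if those rows should be kept.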
Use UNION instead of UNION ALL; it removes duplicate records:
SELECT * FROM (
SELECT *
FROM AA
UNION
SELECT *
FROM BB )
I am using Teradata and am stuck trying to write some code. I would like to remove the rows in which ColumnB has a duplicate value, based on the values in ColumnA. If anyone can help me, that would be great!
I have a sequential number in ColumnA and would like to keep the row with the highest ColumnA value.
E.g. in the table below (where ColumnA is Column1 and ColumnB is Column2), I would like to keep rows 9, 7, 6 and 2, because each has the highest ColumnA value for its letter, even where that letter appears more than once.
Table name: DataTable
Column1 Column2 Column3 Column4 Column5
1 B X X X
2 A Y Y Y
3 E Z Z Z
4 B X X X
5 C Y Y Y
6 E Z Z Z
7 C X X X
8 B Y Y Y
9 B Z Z Z
If you just want to select the rows, you can do:
select t.*
from t
where t.columnA = (select max(t2.columnA) from t t2 where t2.columnB = t.columnB);
If you actually want to remove them, then one method is:
delete from t
where t.columnA < (select max(t2.columnA) from t t2 where t2.columnB = t.columnB);
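Both statements can be tried on the example data with Python's sqlite3 (not Teradata). The table t keeps only ColumnA and ColumnB, an assumption made because the other columns do not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only ColumnA and ColumnB from the example; the other columns do not matter here.
conn.execute("CREATE TABLE t (columnA INTEGER, columnB TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "B"), (2, "A"), (3, "E"), (4, "B"), (5, "C"),
    (6, "E"), (7, "C"), (8, "B"), (9, "B")])

# SELECT: keep rows whose columnA is the maximum for their columnB.
selected = [r[0] for r in conn.execute("""
    SELECT t.columnA FROM t
    WHERE t.columnA = (SELECT MAX(t2.columnA) FROM t t2
                       WHERE t2.columnB = t.columnB)
    ORDER BY t.columnA""")]

# DELETE: remove every row below that maximum.
conn.execute("""
    DELETE FROM t
    WHERE t.columnA < (SELECT MAX(t2.columnA) FROM t t2
                       WHERE t2.columnB = t.columnB)""")
remaining = [r[0] for r in conn.execute("SELECT columnA FROM t ORDER BY columnA")]
print(selected, remaining)  # [2, 6, 7, 9] [2, 6, 7, 9]
```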
If you want to return those rows with a SELECT, there's no need for a correlated subquery; OLAP functions usually perform better:
select *
from tab
qualify
row_number() over (partition by ColumnB order by columnA DESC) = 1
If you actually want to DELETE the other rows go for Gordon's query.
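SQLite has no QUALIFY clause, so the sketch below, in Python's sqlite3, expresses the same ROW_NUMBER filter through a derived table; on Teradata, the QUALIFY form above is the direct way to write it. The two-column table tab is an assumption for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (columnA INTEGER, columnB TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?)", [
    (1, "B"), (2, "A"), (3, "E"), (4, "B"), (5, "C"),
    (6, "E"), (7, "C"), (8, "B"), (9, "B")])

# Number rows per columnB, highest columnA first, then keep row number 1.
result = conn.execute("""
    SELECT columnA, columnB FROM (
        SELECT tab.*,
               ROW_NUMBER() OVER (PARTITION BY columnB
                                  ORDER BY columnA DESC) AS rn
        FROM tab)
    WHERE rn = 1
    ORDER BY columnA""").fetchall()
print(result)  # [(2, 'A'), (6, 'E'), (7, 'C'), (9, 'B')]
```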
I tried to solve one problem but without success.
I have two lists of numbers:
{1,2,3,4}
{5,6,7,8,9}
and a table:
ID Number
1 1
1 2
1 7
1 2
1 6
2 8
2 7
2 3
2 9
I need to count how many times a number from the second list comes after a number from the first list, counting at most one match per ID.
In the example table above, the result should be 2: there are three matching pairs, but because there are only two different IDs, the result is 2 instead of 3.
Pairs (ID: number from the first list, then number from the second list):
1: 2, then 7
1: 2, then 6
2: 3, then 9
Note: I work with MSSQL.
Edit: there is one more column, Date, which determines the order.
Edit 2 - Solution
I wrote this query:
SELECT * FROM table t
left JOIN table tt ON tt.ID = t.ID
AND tt.Date > t.Date
AND t.Number IN (1,2,3,4)
AND tt.Number IN (6,7,8,9)
After this, I planned to group by ID and use only one match per ID, but the query takes a long time to execute.
Here is a query that would do it:
select a.id, min(a.number) as a, min(b.number) as b
from mytable a
inner join mytable b
on a.id = b.id
and a.date < b.date
and b.number in (5,6,7,8,9)
where a.number in (1,2,3,4)
group by a.id
Output is:
id a b
1 1 6
2 3 9
So each of the two pairs is output on its own line, with column a holding the number from the first list and column b the number from the second list.
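The query can be checked on the example data with Python's sqlite3. The table name mytable comes from the answer; the date values are hypothetical and simply increase in the order the rows are listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, number INTEGER, date INTEGER)")
rows = [(1, 1), (1, 2), (1, 7), (1, 2), (1, 6),
        (2, 8), (2, 7), (2, 3), (2, 9)]
# Hypothetical dates: each row is one step later than the previous one.
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(i, n, step) for step, (i, n) in enumerate(rows, start=1)])

query = """
SELECT a.id, MIN(a.number) AS a, MIN(b.number) AS b
FROM mytable a
INNER JOIN mytable b
   ON a.id = b.id
  AND a.date < b.date
  AND b.number IN (5, 6, 7, 8, 9)
WHERE a.number IN (1, 2, 3, 4)
GROUP BY a.id
ORDER BY a.id
"""
pairs = conn.execute(query).fetchall()
# One row per ID with at least one match, so the row count is the answer.
count = len(pairs)
print(pairs, count)  # [(1, 1, 6), (2, 3, 9)] 2
```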
Comments on attempt (edit 2 to question)
Later you added a query attempt to your question. Some comments about that attempt:
You don't need a left join because you really want to have a match for both values. inner join has in general better performance, so use that.
The condition t.Number IN (1,2,3,4) does not belong in the on clause. In combination with a left join the result will include t records that violate this condition. It should be put in the where clause.
Your concern about performance may be warranted, but it can be addressed by adding a useful index on the table, e.g. on (id, number, date) or (id, date, number).