SQL left join to remove duplicates

I have the following data table:
Table:
ID Event
A 1
A ?
B ?
I want to write a SQL query that removes duplicate IDs, preferring actual values over '?'. I can also guarantee that the only duplicates are for IDs that have a regular event (value 1-9) as well as a '?' event. So in the case above, my query should return:
ID Event
A 1
B ?
I want my query to return the rows that match this description as well as all the columns for those rows. My attempt so far is a left join:
sel L.*
from table L
left join table R
on L.ID = R.ID and
(L.Event is null and R.Event is not null)
where R.ID is null
This seems to partially work. It removes duplicates, but for a non-duplicate ID such as B with a '?' event in the example above, the row is sometimes removed. In other cases the same kind of row is kept.
Why is this happening? I would think that my join condition is correct since I check for when
R.Event is not null
but something is evidently wrong in my logic. Any help would be greatly appreciated.

There are multiple ways to solve this problem, but left join just doesn't come to mind.
How about this?
select t.*
from t
where event <> '?'
union all
select t.*
from t
where event = '?' and
not exists (select 1 from t t2 where t2.id = t.id and t2.event <> '?');
For these values, you could also use group by:
select id, min(event)
from t
group by id;
But aggregation would not keep all the other values in the row.
Or, a general approach to such prioritization queries is row_number():
select t.*
from (select t.*,
row_number() over (partition by id
order by (case when event = '?' then 1 else 2 end) desc
) as seqnum
from t
) t
where seqnum = 1;
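The NOT EXISTS variant above can be tried out quickly. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption; the question's engine is not stated), with '?' stored as a literal string as this answer assumes. The table name t and the three sample rows come from the question.

```python
import sqlite3

# In-memory stand-in for the question's table; '?' is a literal string here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", "1"), ("A", "?"), ("B", "?")],
)

# Keep every real event, plus '?' rows whose id has no real event at all.
query = """
SELECT id, event FROM t WHERE event <> '?'
UNION ALL
SELECT id, event FROM t
WHERE event = '?'
  AND NOT EXISTS (SELECT 1 FROM t t2 WHERE t2.id = t.id AND t2.event <> '?')
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

If the '?' values are really NULLs that the client merely displays as '?', the comparisons would need IS NULL / IS NOT NULL instead.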

Related

Oracle 11G DB : Result from 'clob' type column in view changed while using the view in a where clause

I have the current query that i'm running in Oracle:
WITH viewa
AS (SELECT c.columna
FROM sometable c
LEFT JOIN othertable u
ON ( c.id = u.id )
WHERE id= '111'
ORDER BY c.created_date)
SELECT columna
FROM (SELECT rownum AS row_num,
t.*
FROM viewa t)
WHERE row_num > (SELECT CASE
WHEN ( Count(*) > 100 ) THEN Count(*) - 100
ELSE 0
END AS num
FROM viewa)
The idea is to always get the first 100 rows.
As you can see, I'm creating a view at the beginning and using it twice:
in the FROM and in the WHERE.
I'm doing that so I don't need to run the first select twice, and it also makes the query more readable.
Notice that columna is of type CLOB.
When I run the same query with other column types, it works,
so it's probably something related to the CLOB column.
The weird thing is that the results I'm getting are empty values even though I have values in the DB.
When I remove the subselect in the WHERE, I get the right result:
WITH viewa
AS (SELECT c.columna
FROM sometable c
LEFT JOIN othertable u
ON ( c.id = u.id )
WHERE id = '111'
ORDER BY c.created_date)
SELECT columna
FROM (SELECT rownum AS row_num,
t.*
FROM viewa t)
WHERE row_num > 0
It seems like Oracle is turning the values of the CLOB column "columnA" into null when the view is used in the WHERE.
Is anyone familiar with that?
Does anyone know how to work around it?
I solved it with a different query, but I would still like to know whether Oracle changes the view while fetching from it.
Thank you.
Without sample data this is hard but I'm guessing the reason is you're depending on rownum. Use the FETCH clause instead to limit the number of rows.
WITH viewa
AS (SELECT c.columna
FROM sometable c
LEFT JOIN othertable u ON ( c.id = u.id )
-- an order by clause should go here
FETCH FIRST 100 ROWS ONLY)
SELECT columna
FROM viewa
But you don't need that CTE at all, just do
SELECT c.columna
FROM sometable c
LEFT JOIN othertable u ON ( c.id = u.id )
-- an order by clause should go here
FETCH FIRST 100 ROWS ONLY
Note that the "first" rows are not guaranteed to be a specific set of rows unless you explicitly add an ORDER BY clause.
Since 11g does not have FETCH FIRST, you can just use ROWNUM as the limiting criterion. See Example at Oracle Live
select columna, created_date
from (
select c.columna, c.created_date
from sometable c
left join othertable u
on ( c.id = u.id )
where c.id = '111'
order by c.created_date
)
where rownum <= 10;

Postgresql - Group By

I have a simple groupby scenario. Below is the output of the query.
Query is:
select target_date, type, count(*) from table_name group by target_date, type
The query and its output are fine.
My problem is that I am using this in Grafana for plotting, with Postgres as the backend.
Because the "type2" category is missing on 01-10-2020 and 03-10-2020, type2 never gets plotted at all (in a side-by-side bar plot), even though "type2" is present on other days.
It is expecting something like
So whenever a category is missing for a date, we need a row with a count of 0.
This needs to be handled in the query, as the source data cannot be modified.
Any help here is appreciated.
You need to create a list of all the target_date/type combinations. That can be done with a CROSS JOIN of two DISTINCT selects of target_date and type. This list can be LEFT JOINed to table_name to get counts for each combination:
SELECT dates.target_date, types.type, COUNT(t.target_date)
FROM (
SELECT DISTINCT target_date
FROM table_name
) dates
CROSS JOIN (
SELECT DISTINCT type
FROM table_name
) types
LEFT JOIN table_name t ON t.target_date = dates.target_date AND t.type = types.type
GROUP BY dates.target_date, types.type
ORDER BY dates.target_date, types.type
Demo on dbfiddle
You may use a calendar table approach here:
SELECT
t1.target_date,
t2.type,
COUNT(t3.target_date) AS count
FROM (SELECT DISTINCT target_date FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT type FROM yourTable) t2
LEFT JOIN yourTable t3
ON t3.target_date = t1.target_date AND
t3.type = t2.type
GROUP BY
t1.target_date,
t2.type
ORDER BY
t1.target_date,
t2.type;
The idea here is to cross join subqueries finding all distinct target dates and types, to generate a starting point for the query. Then, we left join this intermediate table to your actual table, and find the counts for each date and type.
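To see the zero-filling in action, here is a small sketch that runs the same cross-join-then-left-join query through Python's sqlite3 module. SQLite stands in for Postgres here, and the four sample rows are made up for illustration (the question's real data was not included).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (target_date TEXT, type TEXT)")
# Hypothetical data: type2 only appears on 2020-10-02.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [
        ("2020-10-01", "type1"),
        ("2020-10-02", "type1"),
        ("2020-10-02", "type2"),
        ("2020-10-03", "type1"),
    ],
)

# Build every date/type pair first, then count matching rows (0 if none).
query = """
SELECT d.target_date, ty.type, COUNT(t.target_date) AS cnt
FROM (SELECT DISTINCT target_date FROM table_name) d
CROSS JOIN (SELECT DISTINCT type FROM table_name) ty
LEFT JOIN table_name t
  ON t.target_date = d.target_date AND t.type = ty.type
GROUP BY d.target_date, ty.type
ORDER BY d.target_date, ty.type
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

COUNT(t.target_date) counts only non-NULL values, which is why the pairs with no matching row come back as 0 rather than 1.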
select t.target_date, tmp.type, sum(case when t.type = tmp.type then 1 else 0 end)
from your_table t
cross join (select distinct type from your_table) tmp
group by t.target_date, tmp.type
Demo

Firebird group clause

I can't understand Firebird's GROUP BY logic.
Query:
SELECT t.id FROM T1 t
INNER JOIN T2 j ON j.id = t.jid
WHERE t.id = 1
GROUP BY t.id
works perfectly
But when I try to get other fields:
SELECT * FROM T1 t
INNER JOIN T2 j ON j.id = t.jid
WHERE t.id = 1
GROUP BY t.id
I get error: Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause)
When you use GROUP BY in your query, the field or fields specified are used as 'keys', and data rows are grouped based on unique combinations of those fields. In the result set, every such unique combination has one and only one row.
In your case, the only identifier in the group is t.id. Now consider that you have 2 records in the table, both with t.id = 1, but having different values for another column, say, t.name. If you try to select both id and name columns, it directly contradicts the constraint that one group can have only one row. That is why you cannot select any field apart from the group key.
For aggregate functions it is different. That is because, when you sum or count values or get the maximum, you are basically performing that operation only based on the id field, effectively ignoring the data in the other columns. So, there is no issue because there can only be one answer to, say, count of all names with a particular id.
In conclusion, if you want to show a column in the results, you need to group by it. This will however, make the grouping more granular, which may not be desirable. In that case, you can do something like this:
select * from T1 t
where t.id in
(SELECT t.id FROM T1 t
INNER JOIN T2 j ON j.id = t.jid
WHERE t.id = 1
GROUP BY t.id)
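Here is a rough illustration of the two points above: one row per group key, and using the grouped query as a filter to get the full rows back. It uses Python's sqlite3 module as a stand-in, which is an assumption on my part; SQLite is more lenient than Firebird about bare columns, so it will not reproduce the error itself. The name column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (id INTEGER, jid INTEGER, name TEXT);
CREATE TABLE T2 (id INTEGER);
INSERT INTO T1 VALUES (1, 10, 'alpha'), (1, 11, 'beta'), (2, 10, 'gamma');
INSERT INTO T2 VALUES (10), (11);
""")

# Grouping by t.id yields one row per id; other columns must be aggregated.
grouped = conn.execute("""
    SELECT t.id, COUNT(*), MAX(t.jid)
    FROM T1 t
    INNER JOIN T2 j ON j.id = t.jid
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

# Using the grouped query only as a filter keeps every column of T1.
detail = conn.execute("""
    SELECT t.id, t.jid, t.name
    FROM T1 t
    WHERE t.id IN (SELECT t.id FROM T1 t
                   INNER JOIN T2 j ON j.id = t.jid
                   WHERE t.id = 1
                   GROUP BY t.id)
    ORDER BY t.jid
""").fetchall()

print(grouped)
print(detail)
```

Two rows share id = 1 but have different names, which is exactly the case where a grouped result has no single name to show.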
When you use a GROUP BY clause in a SELECT, the select list may contain only aggregate functions or columns listed in the GROUP BY clause. More about the GROUP BY clause: http://www.firebirdsql.org/manual/nullguide-aggrfunc.html
As example:
SELECT Max(t.jid), t.id FROM T1 t
INNER JOIN T2 j ON j.id = t.jid
WHERE t.id = 1
GROUP BY t.id
SELECT * FROM T1 t
INNER JOIN T2 j ON j.id = t.jid
WHERE t.id = 1
GROUP BY t.id
This will not execute because you have used t.id in the GROUP BY, so every column in the select clause must either use an aggregate function or be included in the GROUP BY clause.
Select * means you are selecting all columns, so every column except t.id is neither in the GROUP BY nor in an aggregate function.
Try this link, How to use GROUP BY in firebird

How do I limit the number of rows returned by this LEFT JOIN to one?

So I think I've seen a solution to this however they are all very complicated queries. I'm in oracle 11g for reference.
What I have is a simple one to many join which works great however I don't need the many. I just want the left table (the one) to just join any 1 row which meets the join criteria...not many rows.
I need to do this because the query is in a rollup which COUNTS so if I do the normal left join I get 5 rows where I only should be getting 1.
So example data is as follows:
TABLE 1:
-------------
TICKET_ID ASSIGNMENT
5 team1
6 team2
TABLE 2:
-------------
MANAGER_NAME ASSIGNMENT_GROUP USER
joe team1 sally
joe team1 stephen
joe team1 louis
harry team2 ted
harry team2 thelma
What I need to do is join these two tables on ASSIGNMENT = ASSIGNMENT_GROUP but only have 1 row returned per ticket.
When I do a left join I get three rows returned, because that is the nature of the left join.
If oracle supports row number (partition by) you can create a sub query selecting where row equals 1.
SELECT *
FROM table1 t1
LEFT JOIN
(SELECT *
-- Oracle requires alias.* when other expressions are selected alongside it
FROM (SELECT t2.*,
ROW_NUMBER()
OVER(PARTITION BY assignment_group ORDER BY assignment_group) AS seq
FROM table2 t2) a
WHERE seq = 1) v
ON t1.assignment = v.assignment_group
You could do something like this.
SELECT t1.ticket_id,
t1.assignment,
t2.manager_name,
t2.user
FROM table1 t1
LEFT OUTER JOIN (SELECT manager_name,
assignment_group,
user,
row_number() over (partition by assignment_group
--order by <<something>>
) rnk
FROM table2) t2
ON ( t1.assignment = t2.assignment_group
AND t2.rnk = 1 )
This partitions the data in table2 by assignment_group and then arbitrarily ranks them to pull one arbitrary row per assignment_group. If you care which row is returned (or if you want to make the row returned deterministic) you could add an ORDER BY clause to the analytic function.
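For anyone who wants to check the behaviour, here is a sketch of the same ROW_NUMBER() join run through Python's sqlite3 module (SQLite 3.25 or newer is assumed for window function support; it is only a stand-in for Oracle). The data is the question's sample, except that the USER column is renamed to user_name, a hypothetical name chosen to avoid the reserved word. The ORDER BY user_name is also my own choice, just to make the picked row deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ticket_id INTEGER, assignment TEXT);
CREATE TABLE table2 (manager_name TEXT, assignment_group TEXT, user_name TEXT);
INSERT INTO table1 VALUES (5, 'team1'), (6, 'team2');
INSERT INTO table2 VALUES
  ('joe', 'team1', 'sally'), ('joe', 'team1', 'stephen'), ('joe', 'team1', 'louis'),
  ('harry', 'team2', 'ted'), ('harry', 'team2', 'thelma');
""")

# Rank the rows inside each assignment_group and join only the rank-1 row.
query = """
SELECT t1.ticket_id, t1.assignment, t2.manager_name, t2.user_name
FROM table1 t1
LEFT JOIN (
    SELECT manager_name, assignment_group, user_name,
           ROW_NUMBER() OVER (PARTITION BY assignment_group
                              ORDER BY user_name) AS rnk
    FROM table2
) t2
  ON t1.assignment = t2.assignment_group AND t2.rnk = 1
ORDER BY t1.ticket_id
"""
one_per_ticket = conn.execute(query).fetchall()
for row in one_per_ticket:
    print(row)
```

Each ticket comes back exactly once, so a COUNT in a rollup over this result is no longer inflated by the number of users per team.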
I think what you need is to use GROUP BY on the ASSIGNMENT_GROUP field.
http://www.w3schools.com/sql/sql_groupby.asp
In MySQL you could just GROUP BY ASSIGNMENT and be done. Oracle is stricter and refuses to pick (in an undefined way) which of the three rows' values to return. That means all returned columns need to be part of the GROUP BY or be wrapped in an aggregate function (COUNT, MIN, MAX...).
You can of course choose not to care and use some aggregate function on the returned columns.
select TICKET_ID, ASSIGNMENT, MAX(MANAGER_NAME), MAX(USER)
from T1
left join T2 on T1.ASSIGNMENT=T2.ASSIGNMENT_GROUP
group by TICKET_ID, ASSIGNMENT
If you do that I would seriously doubt that you need the JOIN in the first place.
MySQL could also help with GROUP_CONCAT if you want a string concatenation of the group's values for a column (humans often like that). Oracle 11g Release 2 offers LISTAGG for the same purpose; in earlier Oracle versions it is considerably more complex.
Using a subquery as already suggested is an option, look here for an example. It also allows you to sort the subquery before selecting the top row.
In Oracle, if you want one result, you can use the ROWNUM pseudocolumn to get the first N rows of a query, e.g.:
SELECT *
FROM TABLEX
WHERE
ROWNUM = 1 --gets the first value of the result
The problem with this query on its own is that Oracle does not guarantee any row order without an ORDER BY. So you must order your data before applying ROWNUM:
SELECT *
FROM
(SELECT * FROM TABLEX ORDER BY COL1)
WHERE
ROWNUM = 1
For your case, you only need one matching row per ticket. A derived table in the FROM clause cannot reference T1 in Oracle 11g, so put the ROWNUM = 1 filter in a correlated scalar subquery instead:
SELECT T1.*,
(SELECT T2.MANAGER_NAME
FROM TABLE2 T2
WHERE T2.ASSIGNMENT_GROUP = T1.ASSIGNMENT
AND ROWNUM = 1) AS MANAGER_NAME
FROM TABLE1 T1
You can use a subquery that selects only the first row (SELECT TOP 1 in SQL Server, ROWNUM = 1 in Oracle).

Is it possible to have different conditions for each row in a query?

How can I select a set of rows where each row matches a different condition?
Example:
Supposing I have a table with a column called name, I want the result ONLY IF the first row name matches 'A', the second row name matches 'B' and the third row name matches 'C'.
Edit:
I want this to work without a fixed size, so that I can define a sequence such as R,X,V,P,T and have it matched row by row, in that order.
You can, but probably not in a way you would want:
If your table has a numeric id field that is incremented with each row, you can join the table to itself under three aliases (let's say "a", "b" and "c"), use the join conditions a.id + 1 = b.id and b.id + 1 = c.id, and put your filter in a WHERE clause like: a.name = 'A' AND b.name = 'B' AND c.name = 'C'
But don't expect performance ...
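A minimal sketch of that self-join, run through Python's sqlite3 module as a stand-in database. The table name items and its four rows are hypothetical, and the ids are assumed to be gap-free, which the id + 1 conditions depend on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D")],
)

# Three aliases of the same table, chained on consecutive ids.
query = """
SELECT a.id, b.id, c.id
FROM items a
JOIN items b ON b.id = a.id + 1
JOIN items c ON c.id = b.id + 1
WHERE a.name = 'A' AND b.name = 'B' AND c.name = 'C'
"""
matches = conn.execute(query).fetchall()
print(matches)
```

Every extra element in the sequence needs another alias and another join, which is why this does not extend to a sequence of arbitrary length.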
Assuming that you know how to assign a row number to your rows (ROW_NUMBER() in SQL Server, for instance), you can create a lookup (match) table and join on it. See below for an explanation:
LookupTable:
RowNum Value
1 A
2 B
3 C
Your SourceTable (assuming you have already added RowNum to it; if you haven't, just introduce a subquery for it, or a CTE in SQL Server 2005 or newer):
RowNum Name
-----------
1 A
2 B
3 C
4 D
Now you need to inner join LookupTable with your SourceTable on LookupTable.RowNum = SourceTable.RowNum AND LookupTable.Value = SourceTable.Name. Then left join LookupTable to this result on RowNum only. If the SourceTable side is NULL for any row in the final result, then you know that at least one row has no match.
Here is the code for the joins:
SELECT LT2.RowNum, LT2.Value, T.RowNum AS Matched
FROM LookupTable LT2
LEFT JOIN
(
SELECT ST.*
FROM SourceTable ST
INNER JOIN LookupTable LT ON LT.RowNum = ST.RowNum AND LT.Value = ST.Name
) T
ON LT2.RowNum = T.RowNum
The result set of the above query will contain rows with Matched IS NULL wherever a row does not match the condition from LookupTable.
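The lookup-table idea can be wrapped into a single yes/no check. The sketch below uses Python's sqlite3 module as a stand-in database and collapses the two joins into one LEFT JOIN that counts lookup rows without a match. The helper name sequence_matches is hypothetical, and the row numbers are simply generated from list positions.

```python
import sqlite3


def sequence_matches(source_names, expected):
    """Return True if the first len(expected) source rows equal expected, in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SourceTable (RowNum INTEGER, Name TEXT)")
    conn.execute("CREATE TABLE LookupTable (RowNum INTEGER, Value TEXT)")
    conn.executemany(
        "INSERT INTO SourceTable VALUES (?, ?)",
        list(enumerate(source_names, start=1)),
    )
    conn.executemany(
        "INSERT INTO LookupTable VALUES (?, ?)",
        list(enumerate(expected, start=1)),
    )
    # A lookup row with no source row at the same position and value is a miss.
    unmatched = conn.execute("""
        SELECT COUNT(*)
        FROM LookupTable LT
        LEFT JOIN SourceTable ST
          ON ST.RowNum = LT.RowNum AND ST.Name = LT.Value
        WHERE ST.RowNum IS NULL
    """).fetchone()[0]
    conn.close()
    return unmatched == 0


print(sequence_matches(["A", "B", "C", "D"], ["A", "B", "C"]))
print(sequence_matches(["A", "X", "C", "D"], ["A", "B", "C"]))
```

Because the sequence lives in a table rather than in the query text, its length is not fixed, which is what the edit to the question asks for.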
I suppose you could do a sub query for each row, but it wouldn't perform well or scale well at all and would be hard to maintain.
This may be close to what you're after... but I need to know where you're getting your values for A, B, C etc...
Select [insert your fields here]
FROM
(Select T1.Name, T1.Age, RowNum as t1RowNum from T T1 order by name) T1O
Full Outer JOIN
(Select T2.Name, T2.Age, RowNum as T2rowNum From T T2 order By name) T2O
ON T1O.T1RowNum+1 = T2O.T2RowNum