How to avoid duplicates of a single ID in SQL; Select Distinct - sql

I have some SQL code that reads like this. It intends to grab all of the data meeting the two conditions, but not to grab the data if we already have a row with the same ID as it. Select Distinct (t1.ID) works as intended, but when I add in the additional variables, it no longer filters properly.
Select Distinct (t1.ID),t1.Var2, t1.Var3...
FROM table_location AS t1
WHERE t1.FCT_BILL_CURRENCY_CODE_LCL = 'USD'
AND t1.RQ_GLOBAL_REGION = 'North America'
The output clearly contains multiple rows with the same ID, contrary to how it should work. How do I fix this?

I'm not sure what DB you're using, but most will have the concept of numbering rows by a partition.
To get a distinct by a certain value, you need to make a subquery that selects your data plus a row number that is partitioned by your distinct property, then have your parent query select only the rows with 1 as the row number to get just the first of each.
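Here is a minimal sketch of that idea, run against an in-memory SQLite database from Python so it can be tried without a server. The sample rows are made up, and the table is cut down to the three columns named in the question (the two WHERE filters are left out). It also shows why the original query misbehaves: DISTINCT applies to the whole select list, not only to the column in parentheses.

```python
import sqlite3

# Hypothetical sample data; only the three columns named in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_location (ID INTEGER, Var2 TEXT, Var3 TEXT)")
conn.executemany(
    "INSERT INTO table_location VALUES (?, ?, ?)",
    [(1, "a", "x"), (1, "b", "y"), (2, "c", "z"), (2, "c", "z")],
)

# DISTINCT compares entire rows, so (1, a, x) and (1, b, y) both survive.
distinct_rows = conn.execute(
    "SELECT DISTINCT ID, Var2, Var3 FROM table_location"
).fetchall()

# Number the rows inside each ID partition, then keep only row 1 of each.
one_per_id = conn.execute(
    """
    SELECT ID, Var2, Var3
    FROM (
        SELECT ID, Var2, Var3,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Var2, Var3) AS rn
        FROM table_location
    ) AS numbered
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

print(len(distinct_rows))  # 3 rows: ID 1 still appears twice
print(one_per_id)          # one row per ID
```

The ORDER BY inside OVER decides which row counts as "first" for each ID, so pick columns that reflect the row you actually want to keep.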

I have written a query based on the sample query in your question. If you add sample data, then we will have a better understanding of the problem.
Query
SELECT
ID,
Var2,
Var3
FROM (
SELECT
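-- ROW_NUMBER rather than DENSE_RANK: with the ORDER BY equal to the PARTITION BY column,
-- DENSE_RANK gives every row rank 1, so nothing would be filtered out
ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.ID) AS Rnk_ID,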
t1.ID,
t1.Var2,
t1.Var3
FROM table_location AS t1
WHERE t1.FCT_BILL_CURRENCY_CODE_LCL = 'USD'
AND t1.RQ_GLOBAL_REGION = 'North America'
) qry1
WHERE Rnk_ID = 1

Related

SQL select all columns for two distinct columns

The question might not be new, but I need your help. I have a SQL database table and I want to select all columns for the rows that have distinct values of two columns.
For instance, I have a table named 'information' as below.
What I want is to select all rows with distinct values of 'ctc_card_no' and 'tarehe'. Can anyone please help me? I have struggled to get the results.
I need results to be like
I think you want to use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.ctc_card_no = t.ctc_card_no and t2.tarehe = t.tarehe and
t2.id <> t.id
);
This returns all rows that appear only once in the table, which is how I interpret your question.
EDIT:
If you just want the distinct pairs, you can use group by:
select min(id), ctc_card_no, tarehe
from t
group by ctc_card_no, tarehe;
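The two queries answer different questions, which is easy to miss. A small sketch with made-up rows, using SQLite from Python (the table name t and the id column come from the queries above; the data is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, ctc_card_no TEXT, tarehe TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "C001", "2016-01-05"),
        (2, "C001", "2016-01-05"),  # same (ctc_card_no, tarehe) pair as id 1
        (3, "C002", "2016-01-05"),
        (4, "C001", "2016-02-01"),
    ],
)

# NOT EXISTS: keep only rows whose pair occurs exactly once in the table.
only_once = conn.execute(
    """
    SELECT t.id
    FROM t
    WHERE NOT EXISTS (
        SELECT 1
        FROM t AS t2
        WHERE t2.ctc_card_no = t.ctc_card_no
          AND t2.tarehe = t.tarehe
          AND t2.id <> t.id
    )
    ORDER BY t.id
    """
).fetchall()

# GROUP BY: one representative row (lowest id) for every distinct pair.
one_per_pair = conn.execute(
    """
    SELECT MIN(id), ctc_card_no, tarehe
    FROM t
    GROUP BY ctc_card_no, tarehe
    ORDER BY MIN(id)
    """
).fetchall()

print(only_once)     # ids 3 and 4; the duplicated pair (ids 1 and 2) is dropped entirely
print(one_per_pair)  # ids 1, 3 and 4; the duplicated pair is collapsed to one row
```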
Maybe something like this?
select *
from information
where (ctc_card_no, tarehe) in
(select ctc_card_no, tarehe
from information
group by ctc_card_no, tarehe
having count(*) = 1
);

Use value of a field from a table dynamically in the 'where' clause of a HQL query?

Can one filter a table dynamically with a 'where' clause acting on a value of a field from another table under some other conditions such that it is made sure only one row is returned? Can I do something like this?
SELECT COUNT(*) FROM stud t1
WHERE t1.name==SELECT name FROM (
SELECT name, row_number() over (PARTITION BY name) AS rn
FROM stud t2) t3
WHERE t3.rn==1;
Of course, the above query is just a dummy one, but is filtering on where clause like above possible theoretically? If not how could one achieve such a functionality in the cases of more complex queries?
Yes. Query can be made like this:
SELECT COUNT(*) FROM stud t1
WHERE t1.name = (SELECT name
from sometable
where somecondition);
but you need to make sure that the subquery returns zero or one row. If the subquery may return more than one row, use IN instead:
SELECT COUNT(*) FROM stud t1
WHERE t1.name IN (SELECT name
from sometable
where somecondition);
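A rough illustration of the difference, using SQLite from Python as a stand-in engine. This only shows the semantics of = versus IN; it says nothing about HQL specifics. The flag column and the sample rows are invented for the example. Note that SQLite silently takes the first row of a multi-row scalar subquery, whereas many other engines raise an error, which is one more reason to prefer IN when more than one row is possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stud (name TEXT)")
# "flag" is a hypothetical column standing in for "somecondition".
conn.execute("CREATE TABLE sometable (name TEXT, flag INTEGER)")
conn.executemany(
    "INSERT INTO stud VALUES (?)", [("ann",), ("ann",), ("bob",), ("cat",)]
)
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?)", [("ann", 1), ("bob", 1), ("cat", 0)]
)

# The subquery yields exactly one name ("cat"), so = is safe here.
count_eq = conn.execute(
    """
    SELECT COUNT(*) FROM stud t1
    WHERE t1.name = (SELECT name FROM sometable WHERE flag = 0)
    """
).fetchone()[0]

# The subquery yields two names ("ann", "bob"), so IN is the right operator.
count_in = conn.execute(
    """
    SELECT COUNT(*) FROM stud t1
    WHERE t1.name IN (SELECT name FROM sometable WHERE flag = 1)
    """
).fetchone()[0]

print(count_eq)  # 1
print(count_in)  # 3
```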

SQL Query for ID grouped AND WHERE otherfield has certain value

I have a table in Access that is set up where there are multiple records with the same ID, they correspond to each other.
I'd like to find certain records that have a specific date value. However, I want all the corresponding information WITH that ID (i.e. all the other records with the same ID). I've tried things like this:
SELECT *
FROM myTable
WHERE LEFT(Field1,7) = '2016-11' IN (SELECT ID
FROM myTable
GROUP BY ID
HAVING COUNT(*)>1)
and
SELECT *
FROM myTable
WHERE ID = (SELECT * FROM myTable WHERE LEFT(Field1,7) = '2016-11'
Neither of these are giving me the proper output. I think I may need a For loop of some sort but don't have much experience doing this with SQL. That way I can loop through all IDs that are returned with that date-part. Any suggestions? I would put the table format in the post but the table formatting isn't working for me for some reason. The frustration is real!
Haha thanks ahead of time for taking the time to even read my question. Much appreciated.
EDIT
Here is a visual of what my table is like: [image: ExampleTable]
I'd like to choose all the records that occur during November, but also get the corresponding information (i.e. records with same ID number as the November records).
Consider moving the WHERE condition into the subquery:
SELECT *
FROM myTable
WHERE ID IN (SELECT ID FROM myTable
WHERE LEFT(Field1, 7) = '2016-11');
Alternatively, to avoid the IN subquery, try an INNER JOIN to a filtered derived table on ID:
SELECT myTable.*
FROM myTable
INNER JOIN
(SELECT DISTINCT ID FROM myTable
WHERE LEFT(Field1, 7) = '2016-11') sub
ON sub.ID = myTable.ID
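Both forms can be checked on a few made-up rows. The sketch below uses SQLite from Python rather than Access, so substr(Field1, 1, 7) stands in for LEFT(Field1, 7), and the sample dates are hypothetical. The derived table uses DISTINCT so that an ID with several November rows does not multiply the joined output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER, Field1 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [
        (1, "2016-10-30"),
        (1, "2016-11-02"),
        (1, "2016-11-15"),  # ID 1 has two November rows
        (2, "2016-09-01"),  # ID 2 has no November row
        (3, "2016-11-20"),
        (3, "2016-12-01"),
    ],
)

# IN form: every row whose ID has at least one November record.
rows_in = conn.execute(
    """
    SELECT ID, Field1
    FROM myTable
    WHERE ID IN (SELECT ID FROM myTable
                 WHERE substr(Field1, 1, 7) = '2016-11')
    ORDER BY ID, Field1
    """
).fetchall()

# JOIN form: same result, as long as the derived table lists each ID once.
rows_join = conn.execute(
    """
    SELECT m.ID, m.Field1
    FROM myTable AS m
    INNER JOIN (SELECT DISTINCT ID FROM myTable
                WHERE substr(Field1, 1, 7) = '2016-11') AS sub
        ON sub.ID = m.ID
    ORDER BY m.ID, m.Field1
    """
).fetchall()

print(rows_in)
print(rows_in == rows_join)
```

No loop is needed: the subquery finds the IDs of interest and the outer query pulls every row for those IDs in one pass.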

Querying query results

If I have a query for example
SELECT * FROM MY_TABLE WHERE FIRSTNAME = 'HENRY';
that returns, say, twenty results for HENRY that are identical.
Is there a way to then query the results of the original query to return only the non-duplicates?
This is a trivial example but basically I have a query where I am trying to perform a SELECT DISTINCT on a large data set. If I don't specify DISTINCT I get a relatively small and fast return of some duplicate data. Is there any logic in SQL I can apply to then perform a SELECT DISTINCT on those results. Essentially breaking up the query to reduce response times? Assume everything of value is indexed.
Thanks
To return the first of a group of records you can do something like this:
select *
from
(
SELECT *, row_number() over (partition by firstname order by id) r
FROM MY_TABLE
--WHERE FIRSTNAME = 'HENRY'
) x
where x.r = 1
If the records are exact duplicates, you're not worried about the first since they're all the same, so you just want distinct records:
SELECT distinct *
FROM MY_TABLE
WHERE FIRSTNAME = 'HENRY'
or to see how many duplicates:
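SELECT firstname, lastname, count(*)-1 AS NoOfDuplicates -- select only the grouped columns; SELECT * is not valid with GROUP BY in most databases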
FROM MY_TABLE
WHERE FIRSTNAME = 'HENRY'
group by firstname, lastname --, ...
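A quick way to see the three variants side by side is to run them on a handful of invented rows. This sketch uses SQLite from Python; the id and lastname columns and the sample names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MY_TABLE (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
)
conn.executemany(
    "INSERT INTO MY_TABLE VALUES (?, ?, ?)",
    [
        (1, "HENRY", "FORD"),
        (2, "HENRY", "FORD"),
        (3, "HENRY", "FORD"),
        (4, "HENRY", "CAVILL"),
        (5, "JANE", "DOE"),
    ],
)

# First row of each firstname group, chosen by lowest id.
first_per_name = conn.execute(
    """
    SELECT id, firstname, lastname
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY firstname ORDER BY id) AS r
        FROM MY_TABLE
    ) AS x
    WHERE x.r = 1
    ORDER BY id
    """
).fetchall()

# Plain DISTINCT over the columns that can actually repeat.
distinct_names = conn.execute(
    """
    SELECT DISTINCT firstname, lastname
    FROM MY_TABLE
    WHERE firstname = 'HENRY'
    ORDER BY lastname
    """
).fetchall()

# Number of extra copies per (firstname, lastname) combination.
dup_counts = conn.execute(
    """
    SELECT firstname, lastname, COUNT(*) - 1 AS NoOfDuplicates
    FROM MY_TABLE
    WHERE firstname = 'HENRY'
    GROUP BY firstname, lastname
    ORDER BY lastname
    """
).fetchall()

print(first_per_name)
print(distinct_names)
print(dup_counts)
```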
Be warned that having the database split the data set into records that have a duplicate and records that do not is generally no more efficient than performing the DISTINCT itself, unless the number of columns on which duplication occurs is much smaller than the total number of columns.
In some cases of very wide tables, where duplication exists only on a subset of columns and on a small proportion of the rows, it might be more efficient to do something like:
select *
from my_table t1
where not exists (
select null
from my_table t2
where t2.duplication_column = t1.duplication_column and
t2.rowid != t1.rowid)
union all
select distinct *
from my_table t1
where exists (
select null
from my_table t2
where t2.duplication_column = t1.duplication_column and
t2.rowid != t1.rowid)
This would generally not be worth doing unless it avoided something very inefficient, like a very large sort spilling to disk.
Edit: modified the query

Merge two unrelated views into a single view

Let's say I have in my first view (ClothingID, Shoes, Shirts)
and in the second view I have (ClothingID, Shoes, Shirts) HOWEVER
the data is completely unrelated; even the ID field is not related in any way.
I want them combined into 1 single view for reporting purposes.
so the 3rd view (the one I'm trying to make) should look like this: (ClothingID, ClothingID2, Shoes, Shoes2, Shirts, Shirts2)
so there's no relation AT ALL, I'm just putting them side by side, unrelated data into the same view.
Any help would be strongly appreciated
You want to combine the results, yet be able to tell the rows apart.
To duplicate all columns would be a bit of an overkill. Add a column with info about the source:
SELECT 'v1'::text AS source, clothingid, shoes, shirts
FROM view1
UNION ALL
SELECT 'v2'::text AS source, clothingid, shoes, shirts
FROM view2;
select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_1
) v1
full outer join (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_2
) v2 on v1.row = v2.row
I think a full outer join on the new, otherwise meaningless row column will do the job.
row_number() exists in PostgreSQL 8.4 and above.
If you have a lower version, you can imitate row_number as in the example below. It works only if ClothingID is unique within each view.
select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
) v1
full outer join (
select *, (select count(*) from view_2 t2
where t2.ClothingID <= t.ClothingID) as row
from view_2 t
) v2 on v1.row = v2.row
Added after comment:
I've noticed and corrected mistake in preceding query.
I'll try to explain a bit. First of all, we have to add row numbers to both views so that the join keys have no gaps. This is quite a simple way:
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
This consists of two parts: a simple query selecting the rows (*):
select *
from view_1 t
and a correlated subquery (read more on Wikipedia):
(
select count(*)
from view_1 t1
where t1.ClothingID <= t.ClothingID
) as row
For each row of the outer query, this counts the rows whose ClothingID is less than or equal to that row's ClothingID, i.e. all preceding rows plus the row itself. When ClothingID is unique (as I've assumed), this gives a row numbering ordered by ClothingID.
Live example on data.stackexchange.com - row numbering.
After that we can use both subqueries with row numbers to join them (full outer join on Wikipedia), live example on data.stackexchange.com - merge two unrelated views.
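The correlated-count numbering described above can be checked against row_number() on a few made-up rows. This is only a sketch: it uses SQLite from Python, a plain table named view_1 stands in for the real view, and the alias rn replaces row to stay clear of keyword issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A plain table stands in for the view; the rows are hypothetical.
conn.execute("CREATE TABLE view_1 (ClothingID INTEGER, Shoes TEXT, Shirts TEXT)")
conn.executemany(
    "INSERT INTO view_1 VALUES (?, ?, ?)",
    [(10, "boots", "tee"), (30, "heels", "polo"), (20, "sandals", "oxford")],
)

# Imitated numbering: count the rows with a ClothingID <= the current one.
counted = conn.execute(
    """
    SELECT t.ClothingID,
           (SELECT COUNT(*) FROM view_1 t1
            WHERE t1.ClothingID <= t.ClothingID) AS rn
    FROM view_1 t
    ORDER BY t.ClothingID
    """
).fetchall()

# Window-function numbering for comparison.
windowed = conn.execute(
    """
    SELECT ClothingID, ROW_NUMBER() OVER (ORDER BY ClothingID) AS rn
    FROM view_1
    ORDER BY ClothingID
    """
).fetchall()

print(counted)
print(counted == windowed)
```

The two agree only because ClothingID is unique here. With repeated values, the correlated count gives tied rows the same number, and the join on that number would then match rows more than once.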
You could use Rownumber as a join parameter, and 2 temp tables?
So something like:
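-- SELECT ... INTO creates each temp table; Row_ID is the artificial join key
SELECT ROW_NUMBER() OVER (ORDER BY t1.Clothing_ID ASC) AS Row_ID, t1.Clothing_ID, t1.Shoes, t1.Shirts
INTO #table1
FROM Table1 t1;
SELECT ROW_NUMBER() OVER (ORDER BY t2.Clothing_ID ASC) AS Row_ID, t2.Clothing_ID, t2.Shoes, t2.Shirts
INTO #table2
FROM Table2 t2;
-- FULL OUTER JOIN keeps the extra rows when one table is longer than the other
SELECT t1.Clothing_ID, t2.Clothing_ID AS Clothing_ID2, t1.Shoes, t2.Shoes AS Shoes2, t1.Shirts, t2.Shirts AS Shirts2
FROM #table1 t1
FULL OUTER JOIN #table2 t2 ON t1.Row_ID = t2.Row_ID;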
I think that should be roughly sensible. Make sure you are using the correct join so that the full output of both queries appears.
If the views are unrelated, SQL will struggle to deal with it. You can do it, but there's a better and simpler way...
I suggest merging them one after the other, rather than side by side as you have suggested, i.e. a union rather than a join:
select 'view1' as source, ClothingID, Shoes, Shirts
from view1
union all
select 'view2', ClothingID, Shoes, Shirts
from view2
This would be the usual approach for this kind of situation, and is simple to code and understand.
Note the use of UNION ALL, which simply concatenates the two result sets and does not remove duplicates, as opposed to UNION, which removes duplicate rows (often by sorting or hashing them). Neither form guarantees row order unless you add an ORDER BY.
Edited
Added a column indicating which view the row came from.
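To make the effect of the source column and of UNION ALL concrete, here is a small sketch using SQLite from Python. Plain tables named view1 and view2 stand in for the real views, and the rows are invented; one row is deliberately identical in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain tables stand in for the two views; the rows are hypothetical.
for name in ("view1", "view2"):
    conn.execute(f"CREATE TABLE {name} (ClothingID INTEGER, Shoes TEXT, Shirts TEXT)")
conn.execute("INSERT INTO view1 VALUES (1, 'boots', 'tee')")
conn.executemany(
    "INSERT INTO view2 VALUES (?, ?, ?)",
    [(1, "boots", "tee"), (2, "heels", "polo")],
)

# UNION ALL with a source column: every row is kept and stays attributable.
stacked = conn.execute(
    """
    SELECT 'view1' AS source, ClothingID, Shoes, Shirts FROM view1
    UNION ALL
    SELECT 'view2', ClothingID, Shoes, Shirts FROM view2
    ORDER BY source, ClothingID
    """
).fetchall()

# Plain UNION without the source column: the identical row is collapsed.
merged = conn.execute(
    """
    SELECT ClothingID, Shoes, Shirts FROM view1
    UNION
    SELECT ClothingID, Shoes, Shirts FROM view2
    ORDER BY ClothingID
    """
).fetchall()

print(stacked)  # 3 rows
print(merged)   # 2 rows
```

The explicit ORDER BY is what fixes the row order in both queries.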
You can try the following:
SELECT *
FROM (SELECT row_number() over(), * FROM table1) t1
FULL JOIN (SELECT row_number() over(), * FROM table2) t2 using(row_number)