How to remove duplicate data lines with data not in came columns

How to remove duplicate data lines with data not in came columns - sql

A SELECT QUERY for Person Matches produces a Table in which each 2nd line contains the same info as the line above it.
After a Sort By Surname,GivenName,BirthD
e.g.
IDIR1, Surname, GivenName, BirthD IDIR2.
IDIR2, Surname, GivenName, BirthD IDIR1.
(Both persons have the same criteria but diff IDIR)
What options are there to eliminate the appearence of the 2nd Lines.
Delete is acceptable but NOT IN, <>, etc. do not work because:
All IDIRs (1 & 2) are in the 2 IDIR columns.
Only one line is read to check if both are Individuals & not Same Person.

Something along theese lines should give you a result set where every second record has been left out.
SELECT a.*
FROM thetable AS a
JOIN thetable AS b
ON a.Surname = b.Surname
AND a.GivenName = b.GivenName
...
WHERE a.IDIR < b.IDIR
Note that it is untested - you may need to clean it up a little bit, but the trick is using a.IDIR < b.IDIR to weed out the duplicates.

Related

SQL Update based on secondary table in BQ

I have 2 tables, 1 containing the main body of information, the second contains information on country naming convensions. in the information table, countries are identified by Name, I would like to update this string to contain an ISO alpha 3 value which is contained in the naming convention table. e.g turning "United Kingdom" -> "GBR"
I have wrote the following query to make the update, but it effects 0 rows
UPDATE
`db.catagory.test_votes_ds`
SET
`db.catagory.test_votes_ds`.country = `db.catagory.ISO-Alpha`.Alpha_3_code
FROM
`db.catagory.ISO-Alpha`
WHERE
`LOWER(db.catagory.ISO-Alpha`.Country) = LOWER(`db.catagory.test_votes_ds`.country)
I've done an inner join outside of the update between the 2 to make sure that the values are compatable and it returns the correct value, any ideas as to why it isn't updating?
The join used to validate the result is listed below, along with the result:
SELECT
`db.catagory.test_votes_ds`.country, `db.catagory.ISO-Alpha`.Alpha_3_code
from
`db.catagory.test_votes_ds`
inner join
`db.catagory.ISO-Alpha`
on
LOWER(`db.catagory.test_votes_ds`.country) = LOWER(`db.catagory.ISO-Alpha`.Country)
1,Ireland,IRL
2,Australia,AUS
3,United States,USA
4,United Kingdom,GBR

This is not exactly an answer. But your test may not be sufficient. You need to check where the values do not match. So, to return those:
select tv.*
from `db.catagory.test_votes_ds` tv left join
`db.catagory.ISO-Alpha` a
on LOWER(tv.country) = LOWER(a.Country)
where a.Country IS NULL;
I suspect that you will find countries that do not match. So when you run the update, the matches are getting changed the first time. Then the non-matches are never changed.

SQL query , group by only one column

i want to group this query by project only because there are two records of same project but i only want one.
But when i add group by clause it asks me to add other columns as well by which grouping does not work.
*
DECLARE #Year varchar(75) = '2018'
DECLARE #von DateTime = '1.09.2018'
DECLARE #bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from Filterednew_projektkondition ps
left join Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= #von +'00:00:00'
and fp.new_enddatum <= #bis +'23:59:59'
--and new_projekt= Filterednew_projekt.new_
--group by new_projekt
*
look at the column new_projekt . row 2 and 3 has same project, but i want it to appear only once. Due to different other columns this is not possible.
if its of interested , there is another coluim projectcondition id which is unique for both

You can't ask a database to decide arbitrarily for you, which records should be thrown away when doing a group. You have to be precise and specific
Example, here is some data about a person:
Name, AddressZipCode
John Doe, 90210
John Doe, 12345
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id
There are two addresses stored for this one guy, the person data is repeated in the output!
"I don't want that. I only want to see one line for this guy, together with his address"
Which address?
That's what you have to tell the database
"Well, obviously his current address"
And how do you denote that an address is current?
"It's the one with the null enddate"
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id WHERE address.enddate = null
If you still get two addresses out, there are two address records that are null - you have data that is in violation of your business data modelling principles ("a person's address history shall have at most one adddress that is current, denoted by a null end date") - fix the data
"Why can't i just group by name?"
You can, but if you do, you still have to tell the database how to accumulate the non-name data that it shows you. You want an address data out of it, it has 2 it wants to show you, you have to tell it which to discard. You could do this:
SELECT name, MAX(addresszipcode) FROM person INNER JOIN address on address.personid = person.id GROUP BY name
"But I don't want the max zipcode? That doesn't make sense"
OK, use the MIN, the SUM, the AVG, anything that makes sense. If none of these make sense, then use something that does, like the address line that has the highest end date, or the lowest end date that is a future end date. If you only want one address on show you must decide how to boil that data down to just one record - you have to write the rule for the database to follow and no question about it you have to create a rule so make it a rule that describes what you actually want
Ok, so you created a rule - you want only the rows with the minimum new_stundenstatz
DECLARE #Year varchar(75) = '2018'
DECLARE #von DateTime = '1.09.2018'
DECLARE #bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from
(SELECT *, ROW_NUMBER() OVER(PARTITON BY new_projekt ORDER BY new_stundensatz) rown FROM Filterednew_projektkondition) ps
left join
Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= #von +'00:00:00'
and fp.new_enddatum <= #bis +'23:59:59'
and ps.rown = 1
Here I've used an analytic operation to number the rows in your PS table. They're numbered in order of ascending new_stundensatz, starting with 1. The numbering restarts when the new_projekt changes, so each new_projekt will have a number 1 row.. and then we make that a condition of the where
(Helpful side note for applying this technique in future.. Ff it were the FP table we were adding a row number to, we would need to put AND fp.rown= 1 in the ON clause, not the WHERE clause, because putting it in the where would make the LEFT join behave like an INNER, hiding rows that don't have any FP matching record)

Completely Unique Rows and Columns in SQL

I want to randomly pick 4 rows which are distinct and do not have any entry that matches with any of the 4 chosen columns.
Here is what I coded:
SELECT DISTINCT en,dialect,fr FROM words ORDER BY RANDOM() LIMIT 4
Here is some data:
**en** **dialect** **fr**
number SFA numero
number TRI numero
hotel CAI hotel
hotel SFA hotel
I want:
**en** **dialect** **fr**
number SFA numero
hotel CAI hotel
Some retrieved rows would have something similar with each other, like having the same en or the same fr, I would like to retrieved rows that do not share anything similar with each other, how do I do that?

I think I’d do this in the front end code rather the dB, here’s a pseudo code (don’t know what your node looks like):
var seenEn = “en not in (''“;
var seenFr = “fr not in (''“;
var rows =[];
while(rows.length < 4)
{
var newrow = sqlquery(“SELECT *
FROM table WHERE “ + seenEn + “) and ”
+ seenFr + “) ORDER BY random() LIMIT 1”);
if(!newrow)
break;
rows.push(newrow);
seenEn += “,‘“+ newrow.en + “‘“;
seenFr += “,‘“+ newrow.fr + “‘“;
}
The loop runs as many times as needed to retrieve 4 rows (or maybe make it a for loop that runs 4 times) unless the query returns null. Each time the query returns the values are added to a list of values we don’t want the query to return again. That list had to start out with some values (null) that are never in the data, to prevent a syntax error when concatenation a comma-value string onto the seenXX variable. Those syntax errors can be avoided in other ways like having a Boolean of “if it’s the first value don’t put the comma” but I chose to put dummy ineffective values into the sql to make the JS simpler. Same goes for the
As noted, it looks like JS to ease your understanding but this should be treated as pseudo code outlining a general algorithm - it’s never been compiled/run/tested and may have syntax errors or not at all work as JS if pasted into your file; take the idea and work it into your solution
Please note this was posted from an iphone and it may have done something stupid with all the apostrophes and quotes (turned them into the curly kind preferred by writers rather than the straight kind used by programmers)

You can use Rank or find first row for each group to achieve your result,
Check below , I hope this code will help you
SELECT 'number' AS Col1, 'SFA' AS Col2, 'numero' AS Col3 INTO #tbl
UNION ALL
SELECT 'number','TRI','numero'
UNION ALL
SELECT 'hotel','CAI' ,'hotel'
UNION ALL
SELECT 'hotel','SFA','hotel'
UNION ALL
SELECT 'Location','LocationA' ,'Location data'
UNION ALL
SELECT 'Location','LocationB','Location data'
;
WITH summary AS (
SELECT Col1,Col2,Col3,
ROW_NUMBER() OVER(PARTITION BY p.Col1 ORDER BY p.Col2 DESC) AS rk
FROM #tbl p)
SELECT s.Col1,s.Col2,s.Col3
FROM summary s
WHERE s.rk = 1
DROP TABLE #tbl

SQL statement joining 3 Oracle tables

I have reviewed several sites and numerous tutorials trying to determine the closest match for my situation and while I find many similar possible resolutions, no exact solution gives me what I think I need. With each attempt I either returned incorrect data, duplicate or unable to return all fields desired. Any assistance is greatly appreciated. Thanks, Newbie
I have three tables, they each share a primary_key of SUBR_ID. In lay terms, I am trying to extract BENEFIT_CARRY_OVER from TBL_SUBR_INDV_CARRY_OVER for all SUBR_ID and sub INDV_ID (one-to-many) where GRP_ID = '0G0000000', selecting first name, last name, and subscriber ID, Individual ID, and benefit carry over. Below are the referenced three tables and the statement attempts tried.
TBL_SUBR_INDV_CARRY_OVER:
SUBR_ID
INDV_ID
BENEFIT_CARRY_OVER
TBL_SUBR_GRP:
SUBR_ID
GRP_ID
TBL_SUBR_INDV:
SUBR_ID
INDV_ID
LNME
FNME
Attempt#1
select DISTINCT DCS2000.TBL_SUBR_GRP.SUBR_ID, DCS2000.TBL_SUBR_INDV.INDV_ID, LNME, FNME, GRP_ID, BENEFIT_YEAR, BENEFIT_CARRY_OVER
from DCS2000.TBL_SUBR_INDV_CARRY_OVER,
DCS2000.TBL_SUBR_GRP,
DCS2000.TBL_SUBR_INDV
where DCS2000.TBL_SUBR_INDV_CARRY_OVER.SUBR_ID = DCS2000.TBL_SUBR_INDV.SUBR_ID
and DCS2000.TBL_SUBR_INDV.SUBR_ID = DCS2000.TBL_SUBR_GRP.SUBR_ID
and DCS2000.TBL_SUBR_GRP.GRP_ID = '0G0000000'
and DCS2000.TBL_SUBR_INDV_CARRY_OVER.BENEFIT_CARRY_OVER > '0'
Attempt#2
select DCS2000.TBL_SUBR_INDV.SUBR_ID, DCS2000.TBL_SUBR_INDV.INDV_ID, DCS2000.TBL_SUBR_INDV.LNME, DCS2000.TBL_SUBR_INDV.FNME, DCS2000.TBL_SUBR_GRP.GRP_ID, DCS2000.TBL_SUBR_INDV_CARRY_OVER.BENEFIT_YEAR, DCS2000.TBL_SUBR_INDV_CARRY_OVER.BENEFIT_CARRY_OVER
from DCS2000.TBL_SUBR_INDV
join DCS2000.TBL_SUBR_GRP on (
where DCS2000.TBL_SUBR_INDV_CARRY_OVER.SUBR_ID = DCS2000.TBL_SUBR_INDV.SUBR_ID
and DCS2000.TBL_SUBR_INDV.SUBR_ID = DCS2000.TBL_SUBR_GRP.SUBR_ID
and DCS2000.TBL_SUBR_GRP.GRP_ID = '0G0000000'
and DCS2000.TBL_SUBR_INDV_CARRY_OVER.BENEFIT_CARRY_OVER > '0'
Attempt#3
SELECT LNME, FNME, SUBR_ID, INDV_ID
FROM DCS2000.TBL_SUBR_INDV
WHERE DCS2000.TBL_SUBR_INDV.SUBR_ID IN
(SELECT BENEFIT_CARRY_OVER
FROM DCS2000.TBL_SUBR_INDV_CARRY_OVER
WHERE DCS2000.TBL_SUBR_INDV_CARRY_OVER.SUBR_ID IN
(SELECT SUBR_ID
FROM DCS2000.TBL_SUBR_GRP
WHERE DCS2000.TBL_SUBR_GRP.GRP_ID = '0G0000000')
)

I assume that the duplicates occur because a single individual can have many carry-over records. Therefore, try:
select i.SUBR_ID,
i.INDV_ID,
max(i.LNME) LNME,
max(i.FNME) FNME,
max(g.GRP_ID) GRP_ID,
o.BENEFIT_YEAR,
sum(o.BENEFIT_CARRY_OVER) BENEFIT_CARRY_OVER
from DCS2000.TBL_SUBR_GRP g
join DCS2000.TBL_SUBR_INDV i
on g.SUBR_ID = i.SUBR_ID
join DCS2000.TBL_SUBR_INDV_CARRY_OVER o
on i.SUBR_ID = o.SUBR_ID and i.INDV_ID = o.INDV_ID and o.BENEFIT_CARRY_OVER > 0
where g.GRP_ID = '0G0000000'
group by i.SUBR_ID,
i.INDV_ID,
o.BENEFIT_YEAR
Note that the individual table needs to be joined to the carry-over table by both the subscriber and the individual ID; also, that numeric fields (such as BENEFIT_CARRY_OVER) should not have quotes around their values.

The problem is that due to the one to many, a simple join across all three tables will give a duplicate row for the data you are interested in. The join has to make an entire row for every row joined together. Whereas what you want is a single row.
I often find that to build a complex query is best done in parts
So, first to find all the ids with the value of interest, with a single row for each. It I'll shorten names somewhat due to the keyboard i am using.
Select distinct subr_id from subrgrp where grpid = 0000006
Then combine this with the next table up
Select distinct indvid from subrindv where subrid in (Select distinct subr_id from subrgrp where grpid = 0000006)
Then select the final rows of interest.
Select carryover from subrindv where indvid in (Select distinct indvid from subrindv where subrid in (Select distinct subr_id from subrgrp where grpid = 0000006))
The alternative is to use GROUP BY to group multiple rows into one.

How to use SELECT DISTINCT and CONCAT in the same SQL statement

So I am feeding the results of this SQL into an array. The array later becomes the suggestions for a textbox that operates while typing. I want it to only return each name 1 time, even if the person has multiple appointments. Currently, this returns all appointments for the person with that name, so if "Brad Robins" has 5 appointments, and I start to type "Brad", it displays "Brad Robins" 5 times in the suggestions instead of only once.
$sql = "SELECT DISTINCT CONCAT(clients.studentFirstName, ' ', clients.studentLastName) AS name, appointments.location, appointments.subLocation, appointments.appointmentAddress1, appointments.appointmentAddress2, appointments.appointmentCity, appointments.appointmentState, appointments.appointmentZip, appointments.startTime, appointments.endTime, appointments.date, clients.school
FROM appointments JOIN clients
ON appointments.clientID = clients.clientID
WHERE CONCAT(clients.studentFirstName, ' ', clients.studentLastName) = '".$roommate."' AND clients.school = '".$school."';";
To me, it just seems like DISTINCT and CONCAT aren't playing nicely together.

The problem are the other fields; DISTINCT applies to the whole result. Probably the best thing is to do to separate queries or populate 2 different arrays; if you ORDER BY name, you can remove duplicates by copying into the dest array only when the name changes.

Don't use DISTINCT, use group by:
$sql = "SELECT CONCAT(clients.studentFirstName, ' ', clients.studentLastName) AS name, appointments.location, appointments.subLocation, appointments.appointmentAddress1, appointments.appointmentAddress2, appointments.appointmentCity, appointments.appointmentState, appointments.appointmentZip, appointments.startTime, appointments.endTime, appointments.date, clients.school
FROM appointments JOIN clients
ON appointments.clientID = clients.clientID
WHERE CONCAT(clients.studentFirstName, ' ', clients.studentLastName) = '".$roommate."' AND clients.school = '".$school."' group by CONCAT(clients.studentFirstName, ' ', clients.studentLastName);";
Also be careful about XSS in $school and $roomate if this accessible outside.

Distinct goes against the entire row of ALL columns, not just the name portion... So if the appointments are on different date/times, locations, etc, they will all come out. If all you want to show is the NAME portion, strip the rest of the other content. Query the available appointments AFTER a person has been chosen.

You could use
group by name
at the end, which would cause the query to return only one of each name, but then you can't predict what appointment results will be returned in cases where a client has multiple appointments, and the query stops being very useful.
Like others have pointed out, you should probably just get the list of appointments after the client has been chosen.

select colA||' - '||colB
from table1
where colA like 'beer%'
group by colA||' - '||colB
order by colA||' - '||colB
;

SELECT DISTINCT MD5(CONCAT(clients.studentFirstName, ' ', clients.studentLastName)) as id, appointments.location, appointments.subLocation, ...

Select distinct concat(....)
From ....

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to remove duplicate data lines with data not in came columns - sql

Related

SQL Update based on secondary table in BQ

SQL query , group by only one column

Completely Unique Rows and Columns in SQL

SQL statement joining 3 Oracle tables

How to use SELECT DISTINCT and CONCAT in the same SQL statement

Categories

Resources