SQL query , group by only one column - sql

i want to group this query by project only because there are two records of same project but i only want one.
But when i add group by clause it asks me to add other columns as well by which grouping does not work.
*
DECLARE #Year varchar(75) = '2018'
DECLARE #von DateTime = '1.09.2018'
DECLARE #bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from Filterednew_projektkondition ps
left join Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= #von +'00:00:00'
and fp.new_enddatum <= #bis +'23:59:59'
--and new_projekt= Filterednew_projekt.new_
--group by new_projekt
*
look at the column new_projekt . row 2 and 3 has same project, but i want it to appear only once. Due to different other columns this is not possible.
if its of interested , there is another coluim projectcondition id which is unique for both

You can't ask a database to decide arbitrarily for you, which records should be thrown away when doing a group. You have to be precise and specific
Example, here is some data about a person:
Name, AddressZipCode
John Doe, 90210
John Doe, 12345
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id
There are two addresses stored for this one guy, the person data is repeated in the output!
"I don't want that. I only want to see one line for this guy, together with his address"
Which address?
That's what you have to tell the database
"Well, obviously his current address"
And how do you denote that an address is current?
"It's the one with the null enddate"
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id WHERE address.enddate = null
If you still get two addresses out, there are two address records that are null - you have data that is in violation of your business data modelling principles ("a person's address history shall have at most one adddress that is current, denoted by a null end date") - fix the data
"Why can't i just group by name?"
You can, but if you do, you still have to tell the database how to accumulate the non-name data that it shows you. You want an address data out of it, it has 2 it wants to show you, you have to tell it which to discard. You could do this:
SELECT name, MAX(addresszipcode) FROM person INNER JOIN address on address.personid = person.id GROUP BY name
"But I don't want the max zipcode? That doesn't make sense"
OK, use the MIN, the SUM, the AVG, anything that makes sense. If none of these make sense, then use something that does, like the address line that has the highest end date, or the lowest end date that is a future end date. If you only want one address on show you must decide how to boil that data down to just one record - you have to write the rule for the database to follow and no question about it you have to create a rule so make it a rule that describes what you actually want
Ok, so you created a rule - you want only the rows with the minimum new_stundenstatz
DECLARE #Year varchar(75) = '2018'
DECLARE #von DateTime = '1.09.2018'
DECLARE #bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from
(SELECT *, ROW_NUMBER() OVER(PARTITON BY new_projekt ORDER BY new_stundensatz) rown FROM Filterednew_projektkondition) ps
left join
Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= #von +'00:00:00'
and fp.new_enddatum <= #bis +'23:59:59'
and ps.rown = 1
Here I've used an analytic operation to number the rows in your PS table. They're numbered in order of ascending new_stundensatz, starting with 1. The numbering restarts when the new_projekt changes, so each new_projekt will have a number 1 row.. and then we make that a condition of the where
(Helpful side note for applying this technique in future.. Ff it were the FP table we were adding a row number to, we would need to put AND fp.rown= 1 in the ON clause, not the WHERE clause, because putting it in the where would make the LEFT join behave like an INNER, hiding rows that don't have any FP matching record)

Related

SELECT DISTINCT to return at most one row

Given the following db structure:
Regions
id
name
1
EU
2
US
3
SEA
Customers:
id
name
region
1
peter
1
2
henry
1
3
john
2
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.
So maybe something like this:
SELECT DISTINCT region FROM customers WHERE id IN (?, ?)
The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.
Is there are more elegant way of solving this constraint? I was thinking of SELECT INTO and use a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.
This query should work:
WITH q AS (
SELECT
COUNT( * ) AS CountCustomers,
COUNT( DISTINCT c.Region ) AS CountDistinctRegions,
-- MIN( c.Region ) AS MinRegion
FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion
FROM
Customers AS c
WHERE
c.CustomerId = $senderCustomerId
OR
c.CustomerId = $receiverCustomerId
)
SELECT
CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 2 THEN 'OK' ELSE 'BAD' END AS "Status",
CASE WHEN q.CountDistinctRegions = 2 THEN q.MinRegion END AS SingleRegion
FROM
q
The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.
Alternatively, windowing-functions could be used, but annoyingly they don't work in GROUP BY queries despite being so similar, argh.
Once again, this is the ISO SQL committee's fault, not PostgreSQL's.
As your Region column is UUID you cannot use it with MIN, but I understand it should work with FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion.
As for the columns:
The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 2 regardless of CountCustomers, but feel free to change that, just-in-case you still want that info.
For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:
SELECT
r.region
FROM
customers s
INNER JOIN customers r ON
s.region = r.region
WHERE s.id = 'sender_id' and r.id = 'receiver_id';
Huge credit to SELECT DISTINCT to return at most one row who helped me out a lot on this and also posted a viable solution.

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings between the two tables, although a NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table, with a reference to a corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe that #Caius Jard is correct in that this does not need to be an outer join. There should always be an ST/OT pair.
With that I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
Right now, I've simplified the query to keep it simple and keep variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So, there are matching records between the two tables that meet the criteria, but I think I'm getting no results because when I convert the numeric to a string, it does not match, because one value is, say "12345", while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
Was able to finally get the query to work by converting the numeric doc num to a left zero padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )
You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
LEFT join is a type of OUTER join; LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (with a LEFT JOIN, but then has WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process, are then quashed by the WHERE clause
See Update 4 for details of how I resolved the "data conversion or mapping" error.

Query that filters out results if they exist in a second table

I have two tables. First is a table Employees the second is TimeCard. I am trying to determine when employees are not here. Sample Tables:
So an employee name is returned if it meets the following criteria: Status = 1 and Date != '8/27/13'. However I only want an Employee returned once even if they have multiple entires in the TimeCard table as long as they are not on a specified date.
Try the following:
SELECT e.employee FROM employees e
WHERE status=1
AND NOT EXISTS ( SELECT 1 FROM timecard t
WHERE t.employee=e.employee
AND t.Date = '8/27/13')
where '8/27/13' will work only, if date is a varchar column. Otherwise you should use a more standard date format like '2013-8-27'.
Edit:
In MSSQL '8/27/13' will actually work, but it is still not a good idea to use this notation as it is not unambiguous when smaller numbers are used, like '5/6/7'. Depending on who entered it (and from which country they come), this could mean either May 6 2007, or 2005 May the 7th or 5 June 2007 ...
2. Edit:
Evidently not a good idea to prepare supper while you are putting together a SQLfiddle! The whole thing was buggy. I just fixed it and, of course corrected the typo with t.Date != '8/27/13' sorry for the confusion! See the corrected fiddle here: http://sqlfiddle.com/#!3/26454/1
Try this:
SELECT DISTINCT e.employee
FROM employees e
LEFT JOIN timecard t ON t.employee = e.employee
WHERE e.status = 1 AND (t.date != '8/27/13' OR t.date IS NULL)

Selecting advertisers on name, but including their parent advertisers names also

I have three tables:
Advertisers: a list of business that produce adverts, Adverts: the adverts themselves and AdvertiserChild: a table of Advertisers parents; note that this is a flat hierarchy and a single advertiser can be listed multiple times with a parent, no inkling as to 'level' is provided they are simply 'parents'.
So, I am trying to select all of the advertisers that had ads between certain dates where their names match the users input. The rub is that the name can also match a parent advertiser. Let me try and phrase that differently, the user input can match the name of either the parent or child advertiser as long as the child has some valid ads between the specified dates.
I'm just having trouble conceptually in regard to getting the parent information in there:
SELECT NewsPaperAd.AdvertiserID AS ADID, Advertiser.NameAbbrev AS Name
FROM NewsPaperAd INNER JOIN
Advertiser ON NewsPaperAd.AdvertiserID = Advertiser.AdvertiserID
WHERE (NewsPaperAd.PubDate BETWEEN '1/1/2012' AND '4/1/2012')
Ok, I think I have it!
Thanks.
I assume the hierarchy is defined with a ParentID or similar column in the Advertiser table. Then something like this might work:
DECLARE #search NVARCHAR(255);
SET #search = '%some search phrase%';
;WITH a AS
(
SELECT a.AdvertiserID, a.NameAbbrev, Parent = p.NameAbbrev
FROM dbo.Advertiser AS a
LEFT OUTER JOIN dbo.Advertiser AS p
ON a.ParentID = p.AdvertiserID
WHERE a.NameAbbrev LIKE #search
OR p.NameAbbrev LIKE #search
)
SELECT ADID = a.AdvertiserID, a.NameAbbrev, a.Parent
FROM dbo.NewsPaperAd AS n
INNER JOIN a
ON a.AdvertiserID = a.AdvertiserID
WHERE n.PubDate >= '20120101'
AND n.PubDate < '20120401';
Some suggestions: (1) don't change the column output name. ADID does not say the same thing to me as AdvertiserID; what is the point of the abbreviation? (2) Don't use BETWEEN for date/time range queries... while it is ok if your data type is DATE, that can't be the case in your scenario since you're using SQL Server 2005. (3) Don't use regional formatting like m/d/y for string literals. In fact I'm still not sure if your query is supposed to end on April 1st or January 4th. Depending on language and regional settings, SQL Server might get it wrong too.

MySQL to return only last date / time record

We have a database that stores vehicle's gps position, date, time, vehicle identification, lat, long, speed, etc., every minute.
The following select pulls each vehicle position and info, but the problem is that returns the first record, and I need the last record (current position), based on date (datagps.Fecha) and time (datagps.Hora). This is the select:
SELECT configgps.Fichagps,
datacar.Ficha,
groups.Nombre,
datagps.Hora,
datagps.Fecha,
datagps.Velocidad,
datagps.Status,
datagps.Calleune,
datagps.Calletowo,
datagps.Temp,
datagps.Longitud,
datagps.Latitud,
datagps.Evento,
datagps.Direccion,
datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
Group by Fichagps;
I need same result I'm getting, but instead of the older record I need the most recent
(LAST datagps.Fecha / datagps.Hora).
How can I accomplish this?
Add ORDER BY datagps.Fecha DESC, datagps.Hora DESC LIMIT 1 to your query.
I'm not sure why you are having any problems with this as Lex's answers seem good.
I would start putting ORDER BY's in your query so it puts them in an order, when it's showing the record you want as the first one in the list, then add the LIMIT.
If you want the most recent, then the following should be good enough:
ORDER BY datagps.Fecha DESC, datagps.Hora DESC
If you simply want the record that was added to the database most recently (irregardless of the date/time fields), then you could (assuming you have an auto-incremental primary key in the datagps table (I assume it's called dataID for this example)):
ORDER BY datagps.dataID DESC
If these aren't showing the data you want - then there is something missing from your example (maybe data-types aren't DATETIME fields? - if not - then maybe a CONVERT to change them from their current type before ORDERing BY would be a good idea)
EDIT:
I've seen the screenshot and I'm confused as to what the issue is still. That appears to be showing everything in order. Are you implying that there are many more than 5 records? How many are you expecting?
Do you mean: for each record returned, you want the one row from the table datagps with the latest date and time attached to the result? If so, how about this:
# To show how the query will be executed
# comment to return actual results
EXPLAIN
SELECT
configgps.Fichagps, datacar.Ficha, groups.Nombre, datagps.Hora, datagps.Fecha,
datagps.Velocidad, datagps.Status, datagps.Calleune, datagps.Calletowo,
datagps.Temp, datagps.Longitud, datagps.Latitud, datagps.Evento,
datagps.Direccion, datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
########### Add this section
LEFT JOIN datagps b ON (
configgps.Fichagps = b.Fichagps
# wrong condition
#AND datagps.Hora < b.Hora
#AND datagps.Fecha < b.Fecha)
# might prevent indexes to be used
AND (datagps.Fecha < b.Fecha OR (datagps.Fecha = b.Fecha AND datagps.Hora < b.Hora))
WHERE b.Fichagps IS NULL
###########
Group by configgps.Fichagps;
Similar question here only that that one uses outer joins.
Edit (again):
The conditions are wrong so corrected it. Can you show us the output of the above EXPLAIN query so we can pinpoint where the bottle neck is?
As hurikhan77 said, it will be better if you could convert both of the the columns into a single datetime field - though I'm guessing this would not be possible for your case (since your database is already being used?)
Though if you can convert it, the condition (on the join) would become:
AND datagps.FechaHora < b.FechaHora
After that, add an index for datagps.FechaHora and the query would be fast(er).
What you probably want is getting the maximum of (Fecha,Hora) per grouped dataset? This is a little complicated to accomplish with your column types. You should combine Fecha and Hora into one column of type DATETIME. Then it's easy to just SELECT MAX(FechaHora) ... GROUP BY Fichagps.
It could have helped if you posted your table structure to understand the problem.