Can we do this in SQL
Col1,Col2
Asset,1234
Date,1/1/2020
Date2,1/31/2020
Value,10
Asset,12344
Date,1/10/2020
Date2,1/21/2020
Value,45
Asset,12345
Date,1/15/2020
Date2,1/36/2020
Value,99
I have the asset numbers I want to query, but I would like to get corresponding rows as well.
So, when I write where Asset=12344, I need asset,Date, Date2, Value information for asset 12344.
Never came across this before. Is there any way we could achieve this.
Thanks.
Related
I want to group by two columns, however MS Access won't let me do it.
Here is the code I wrote:
SELECT
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter,
tbl_Produktion.ProduktionsID, tbl_Produktion.Linie,
tbl_Produktion.Schicht, tbl_Produktion.Anzahl_Schichten_P,
tbl_Produktion.Schichtteam, tbl_Produktion.Von, tbl_Produktion.Bis,
tbl_Produktion.Pause, tbl_Produktion.Kunde, tbl_Produktion.TeileNr,
tbl_Produktion.FormNr, tbl_Produktion.LabyNr,
SUM(tbl_Produktion.Stueckzahl_Prod),
tbl_Produktion.Stueckzahl_Ausschuss, tbl_Produktion.Ausschussgrund,
tbl_Produktion.Kommentar, tbl_Produktion.StvSchichtleiter,
tbl_Produktion.Von2, tbl_Produktion.Bis2, tbl_Produktion.Pause2,
tbl_Produktion.Arbeiter3, tbl_Produktion.Von3, tbl_Produktion.Bis3,
tbl_Produktion.Pause3, tbl_Produktion.Arbeiter4,
tbl_Produktion.Von4, tbl_Produktion.Bis4, tbl_Produktion.Pause4,
tbl_Produktion.Leiharbeiter5, tbl_Produktion.Von5,
tbl_Produktion.Bis5, tbl_Produktion.Pause5,
tbl_Produktion.Leiharbeiter6, tbl_Produktion.Von6,
tbl_Produktion.Bis6, tbl_Produktion.Pause6, tbl_Produktion.Muster
FROM
tbl_Personal
INNER JOIN
tbl_Produktion ON tbl_Personal.PersID = tbl_Produktion.Schichtleiter
GROUP BY
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter;
It works when I group it by all the columns, but not like this.
The error message say that the rest of the columns aren't part of the aggregation-function (translated from german to english as best as I could).
PS.: I also need the sum of "tbl_Produktion.Stueckzahl_Prod" therefore I tried using the SUM function (couldn't try it yet).
Have you tried something along these lines?
SELECT
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter,
MAX(tbl_Produktion.ProduktionsID), MAX(tbl_Produktion.Linie),
MAX(tbl_Produktion.Schicht), MAX(tbl_Produktion.Anzahl_Schichten_P),
MAX(tbl_Produktion.Schichtteam), MAX(tbl_Produktion.Von), MAX(tbl_Produktion.Bis),
SUM(tbl_Produktion.Stueckzahl_Prod)
FROM
tbl_Personal
INNER JOIN
tbl_Produktion ON tbl_Personal.PersID = tbl_Produktion.Schichtleiter
GROUP BY
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter;
I have used the MAX function for all the data except the two items you specify in the GROUP BY and the one where you desire the SUM. I took the liberty of leaving out mush of your data just to get started.
Using the MAX function turns out to be a convenient workaround when the data item is known to be unique within each group. We cannot know your data or your itent, so we cannot tell you whether MAX will yield the results you need.
If you use an aggregation function in the select clause, you must group by every column that you're selecting that's not an aggregation. If you don't want to do that for some reason (perhaps it changes the output of the aggregation in way that you don't intend) you either must think of an aggregate to use (pick a value. Average? Max? Min?) or just do two selects, one for the aggregate, and one for the non-aggregates. But, then, you have to decide how to get the non-aggregated fields that make sense for the aggregate (or show them all in a table, I suppose?)
I do not understand how to deal with duplicates when generating my output, so I ended up getting several duplicates but I want one only.
I've tried using LIMIT but that only applies when selecting I suppose. I also used DISTINCT but wrong scenario I guess.
grouped = GROUP wantedTails BY tail_number;
smmd = FOREACH grouped GENERATE wantedTails.tail_number as Tails, SUM(wantedTails.distance) AS totaldistance;
So for my grouped, I got smg like (not the whole):
({(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB),(N983JB)},44550)
but I expect (N983JB,44550). How can I delete those duplicates generated during grouping? Thank you!
The way I see it, there are two ways to de-duplicate data in Pig.
Less flexible but a convenient way is to apply MAX to the columns which need to be de-duplicated after performing a GROUP BY. Apply SUM only if you want to add up values across duplicates:
dataWithDuplicates = LOAD '<path_to_data>';
grouped = GROUP dataWithDuplicates BY tail_number;
dedupedData= FOREACH grouped GENERATE
--Since you have grouped on tailNumber, it is already de-duped
group AS tailNumber,
MAX(dataWithDuplicates.distance) AS dedupedDistance,
SUM(dataWithDuplicates.distance) AS totalDistance;
If you want more flexibility while de-duping, you can take help of nested-FOREACH in Pig. This question captures the gist of its usage: how to delete the rows of data which is repeating in Pig. Other references for nested-FORACH: https://www.safaribooksonline.com/library/view/programming-pig/9781449317881/ch06.html
I'm trying to query my database to pull only duplicate/old data to write to a scratch section in excel (Using a macro passing SQL to the DB).
For now, I'm currently testing in Access alone to only filter out the old data.
First, I'm trying to filter my database by a specifed WorkOrder, RunNumber, and Row.
The code below only filters by Work Order, RunNumber, and Row. ...but SQL doesn't like when I tack on a 2nd AND statement; so this currently isn't working.
SELECT *
FROM DataPoints
WHERE (((DataPoints.[WorkOrder])=[WO2]) AND ((DataPoints.[RunNumber])=6) AND ((DataPoints.[Row]=1)
Once I figure that portion out....
Then if there is only 1 entry with specified WorkOrder, RunNumber, and Row, then I want filter it out. (its not needed in the scratch section, because its data is already written to the main section of my report)
If there are 2 or more entries with said criteria(WO, RN, and Row), then I want to filter out the newest entry based on RunDate and RunTime, and only keep all older entries.
For instance, in the clip below. The only item remaining in my filtered query will be the top entry with the timestamp 11:47:00AM.
.
Are there any recommended commands to complete this problem? Any ideas are helpful. Thank you.
I would suggest something along the lines of the following:
select t.*
from datapoints t
where
t.workorder = [WO2] and
t.runnumber = 6 and
t.row = 1 and
exists
(
select 1
from datapoints u
where
u.workorder = t.workorder and
u.runnumber = t.runnumber and
u.row = t.row and
(u.rundate > t.rundate or (u.rundate = t.rundate and u.runtime > t.runtime))
)
Here, if the correlated subquery within the where clause finds a record with the same workorder, runnumber and row, but with either a later rundate or the same rundate and a later runtime, then the record is returned by the main query.
You need two more )'s at the end of your code snippet. Or you can delete the parentheses completely in this example, MS Access will ad them back in as it deems necessary.
M.S. Access SQL can be tricky as it is not standards compliant and either doesn't allow for super complex queries, or it needs an ugly work around, like having a parentheses nesting nightmare when trying to join more than two tables.
For these reasons, I suggest using multiple Access queries to produce your results.
I am using SQL server 2005, querying with Web Developer 2010, and the min function appears to be returning more than one value (for each ID returned, see below). Ideally I would like it to just return the one for each ID.
SELECT Production.WorksOrderOperations.WorksOrderNumber,
MIN(Production.WorksOrderOperations.OperationNumber) AS Expr1,
Production.Resources.ResourceCode,
Production.Resources.ResourceDescription,
Production.WorksOrderExcel_ExcelExport_View.PartNumber,
Production.WorksOrderOperations.PlannedQuantity,
Production.WorksOrderOperations.PlannedSetTime,
Production.WorksOrderOperations.PlannedRunTime
FROM Production.WorksOrderOperations
INNER JOIN Production.Resources
ON Production.WorksOrderOperations.ResourceID = Production.Resources.ResourceID
INNER JOIN Production.WorksOrderExcel_ExcelExport_View
ON Production.WorksOrderOperations.WorksOrderNumber = Production.WorksOrderExcel_ExcelExport_View.WorksOrderNumber
WHERE Production.WorksOrderOperations.WorksOrderNumber IN
( SELECT WorksOrderNumber
FROM Production.WorksOrderExcel_ExcelExport_View AS WorksOrderExcel_ExcelExport_View_1
WHERE (WorksOrderSuffixStatus = 'Proposed'))
AND Production.Resources.ResourceCode IN ('1303', '1604')
GROUP BY Production.WorksOrderOperations.WorksOrderNumber,
Production.Resources.ResourceCode,
Production.Resources.ResourceDescription,
Production.WorksOrderExcel_ExcelExport_View.PartNumber,
Production.WorksOrderOperations.PlannedQuantity,
Production.WorksOrderOperations.PlannedSetTime,
Production.WorksOrderOperations.PlannedRunTime
If you can get your head around it, I am selecting certain columns from multiple tables where the WorksOrderNumber is also contained within a subquery, and numerous other conditions.
Result set looks a little like this, have blurred out irrelevant data.
http://i.stack.imgur.com/5UFIp.png (Wouldn't let me embed image).
The highlighted rows are NOT supposed to be there, I cannot explicitly filter them out, as this result set will be updated daily and it is likely to happen with a different record.
I have tried casting and converting the OperationNumber to numerous other data types, varchar type returns '100' instead of the '30'. Also tried searching search engines, no one seems to have the same problem.
I did not structure the tables (they're horribly normalised), and it is not possible to restructure them.
Any ideas appreciated, many thanks.
The MIN function returns the minimum within the group.
If you want the minimum for each ID you need to get group on just ID.
I assume that by "ID" you are referring to Production.WorksOrderOperations.WorksOrderNumber.
You can add this as a "table" in your SQL:
(SELECT Production.WorksOrderOperations.WorksOrderNumber,
MIN(Production.WorksOrderOperations.OperationNumber)
FROM Production.WorksOrderOperations
GROUP BY Production.WorksOrderOperations.WorksOrderNumber)
I'm having trouble with Microsoft Access 2003, it's complaining about this statement:
select cardnr
from change
where year(date)<2009
group by cardnr
having max(time+date) = (time+date) and cardto='VIP'
What I want to do is, for every distinct cardnr in the table change, to find the row with the latest (time+date) that is before year 2009, and then just select the rows with cardto='VIP'.
This validator says it's OK, Access says it's not OK.
This is the message I get: "you tried to execute a query that does not include the specified expression 'max(time+date)=time+date and cardto='VIP' and cardnr=' as part of an aggregate function."
Could someone please explain what I'm doing wrong and the right way to do it? Thanks
Note: The field and table names are translated and do not collide with any reserved words, I have no trouble with the names.
Try to think of it like this - HAVING is applied after the aggregation is done.
Therefore it can not compare to unaggregated expressions (neither for time+date, nor for cardto).
However, to get the last (principle is the same for getting rows related to other aggregated functions as weel) time and date you can do something like:
SELECT cardnr
FROM change main
WHERE time+date IN (SELECT MAX(time+date)
FROM change sub
WHERE sub.cardnr = main.cardnr AND
year(date)<2009
AND cardto='VIP')
(assuming that date part on your time field is the same for all the records; having two fields for date/time is not in your best interest and also using reserved words for field names can backfire in certain cases)
It works because the subquery is filtered only on the records that you are interested in from the outer query.
Applying the same year(date)<200 and cardto='VIP' to the outer query can improve performance further.