sql to remove repeating data - sql

I have problem with data that I need to filter out to produce a report.
Basically every day we take a file and load it into the system but the file is cumulative file and is basically re imported with the same information plus the new data daily.
This loads into the Database into two tables, a header and detail table.
In the header table I have the key filed of No_, the date it was imported and the month and year the data is for (period) an example would be 'Apr2013'.
The detailed table holds all of the information from the import as you can imagine.
What I'm looking to do is to disregard all of the older data and only look at the most recent import for that month only.
Hopefully someone can help me, if you want me to post any example data or anything then please let me know and I'll add it in.
Thanks in advance!
Phil

If you cannot change your process to stop re-importing the same data over and over - especially if the data is exactly the same - you could try the following query:
WITH MostRecentData AS
(
SELECT MAX(dateImported) dateImported, month, year
FROM header
GROUP BY month, year
)
SELECT ...
FROM detail d
INNER JOIN header h ON h.[key] = d.[key]
INNER JOIN MostRecentData r ON h.month = r.month AND h.year = r.year
AND h.dateImported = r.dateImporter
This will pull out the latest rows (based on the date imported) for each month and year.
Honestly though I would try to change the import process if possible. It sounds like a lot of wasted space and subsequent processing to pull out the data you want.

Related

Qlik - Building a dynamic view

I have a SQL query that creates a table, and every month 2 new columns will be added for that table related to the current month.
I have tried without success to set up a flat table (visual) in Qlik that will automatically expand every month to include these table. Is there a way to do this, and i so please point me in the right direction.
You can have a look at CrossTable prefix.
This prefix allows a wide table to be converted to a long table.
So if we have data like this:
After running the following script:
CrossTable:
CrossTable(Month, Sales)
LOAD Item,
[2022-10],
[2022-11],
[2022-12],
[2023-01],
[2023-02],
[2023-03],
[2023-04]
FROM
[C:\Users\User1\Documents\SO_75447715.xlsx]
(ooxml, embedded labels, table is Sheet1);
The final data will looks like below. As you can see there are only 3 columns. All xls month columns (after Item) are now collapsed under one field - Month and all the values are collapsed under Sales column.
Having the data in this format then allows creating "normal" charts with adding Month column as dimension and use sum(Sales) as an expression.
P.S. If you dont want to manage the new columns being added then the script can be:
CrossTable(Month, Sales)
LOAD
Item,
*
FROM
...

How to query only old and duplicate data from a database in SQL

I'm trying to query my database to pull only duplicate/old data to write to a scratch section in excel (Using a macro passing SQL to the DB).
For now, I'm currently testing in Access alone to only filter out the old data.
First, I'm trying to filter my database by a specifed WorkOrder, RunNumber, and Row.
The code below only filters by Work Order, RunNumber, and Row. ...but SQL doesn't like when I tack on a 2nd AND statement; so this currently isn't working.
SELECT *
FROM DataPoints
WHERE (((DataPoints.[WorkOrder])=[WO2]) AND ((DataPoints.[RunNumber])=6) AND ((DataPoints.[Row]=1)
Once I figure that portion out....
Then if there is only 1 entry with specified WorkOrder, RunNumber, and Row, then I want filter it out. (its not needed in the scratch section, because its data is already written to the main section of my report)
If there are 2 or more entries with said criteria(WO, RN, and Row), then I want to filter out the newest entry based on RunDate and RunTime, and only keep all older entries.
For instance, in the clip below. The only item remaining in my filtered query will be the top entry with the timestamp 11:47:00AM.
.
Are there any recommended commands to complete this problem? Any ideas are helpful. Thank you.
I would suggest something along the lines of the following:
select t.*
from datapoints t
where
t.workorder = [WO2] and
t.runnumber = 6 and
t.row = 1 and
exists
(
select 1
from datapoints u
where
u.workorder = t.workorder and
u.runnumber = t.runnumber and
u.row = t.row and
(u.rundate > t.rundate or (u.rundate = t.rundate and u.runtime > t.runtime))
)
Here, if the correlated subquery within the where clause finds a record with the same workorder, runnumber and row, but with either a later rundate or the same rundate and a later runtime, then the record is returned by the main query.
You need two more )'s at the end of your code snippet. Or you can delete the parentheses completely in this example, MS Access will ad them back in as it deems necessary.
M.S. Access SQL can be tricky as it is not standards compliant and either doesn't allow for super complex queries, or it needs an ugly work around, like having a parentheses nesting nightmare when trying to join more than two tables.
For these reasons, I suggest using multiple Access queries to produce your results.

SQL Query stopped working and cant figure out why

Basically I am using MS Access 2013 to import all active work items that are assigned to a specific group from an API and select the data into 2 new tables (Requests & Request_Tasks).
I then have a form sourced from a query to select specific fields from the 2 tables.
Until yesterday it was working with no problems and nothing has changed.
All of the data appears in the 2 tables so the import from the API works fine.
When it comes to the query selecting the data from the 2 tables (Which are already populated with the correct data) the query returns only data from Requests table with blank fields instead of data from Request_Tasks.
The strange part is that out of 28 active work items it returns 24 correctly and the last 4 are having the problem.
Every new task added to the group has the problem also.
Query is below.
SELECT
Request_Tasks.RQTASK_Number,
Request_Tasks.Request_Number,
Requests.Task, Requests.Entity,
Request_Tasks.Description,
Request_Tasks.Request_Status,
Requests.Requested_for_date,
Request_Tasks.Work_On_Date,
Request_Tasks.Estimated_Time,
Request_Tasks.Actual_Time_Analysis,
Request_Tasks.Offers_Built,
Request_Tasks.Number_of_links_Opened,
Request_Tasks.Number_of_Links_Extended,
Request_Tasks.Number_Of_links_closed,
Request_Tasks.Build_Allocated_to,
Request_Tasks.Buld_Review_Allocated_to,
Request_Tasks.Keying_Allocated_to,
Request_Tasks.Keying_Approval_allocated_to,
Request_Tasks.Actual_Build_Time,
Request_Tasks.Actual_Stakeholder_Support,
Request_Tasks.Task_Completed_Date
FROM Request_Tasks
RIGHT JOIN Requests
ON Request_Tasks.Request_Number = Requests.Request_Number
WHERE (((Request_Tasks.Task_Completed_Date)>=Date()
Or (Request_Tasks.Task_Completed_Date) Is Null)
AND ((Requests.Task)<>"7"
And (Requests.Task)<>"8" And (Requests.Task)<>"9"))
OR (((Request_Tasks.Task_Completed_Date)>=Date()
Or (Request_Tasks.Task_Completed_Date) Is Null)
AND ((Requests.Task)<>"7"
And (Requests.Task)<>"8"
And (Requests.Task)<>"9"))
ORDER BY Request_Tasks.Work_On_Date Is Null DESC , Request_Tasks.Work_On_Date, Requests.Entity Is Null DESC , Requests.Task;
Any help would be great.
Thanks.
The query is using RIGHT JOIN, which means rows from Requests table is always reported even if there is no corresponding entry in Request_tasks table.
A full example is here http://www.w3schools.com/Sql/sql_join_right.asp
In your case, most likely somechange might have happened during data load/API and Request_tasks table is not being populated. That is the reason you see blank data for fields from that table.
Solution
Manually check data for 4 faulty records in Request_tasks table.
Ensure keys in both table request_number are matching including data type and any leading space/non printable characters (if they are string type of data) for faulty records.
Query seems fine, its more of issue with data based on problem statement.

Access - SQL Query -> find most recent record and save it to a table

I am sure this has been asked before and I have found several references to similar things. I just can't work out how to make it work for me.
I have a table (tblHistory) with data
Unit Number,Action,LoggerID,SensorID,CableID,Date,Data Location etc
I also have another table (tblUnit) with more data
Unit,Logger,Sensor,Cable
I want to create a query that scours tblHistory to find the most recent LoggerID, SensorID and CableID for each Unit.
To add to this there may need to be safeguards in the query in case the user does not place Logger, Sensor or Cable IDs so that it does not just find the most recent date and return a null value.
The final aspect of this which may or may not be simple is that I want this to overwrite the data in the tblUnit table so that when a new logger, sensor or cable is in place the unit uses this new data. I don't want these fields to be just edited because I want to keep a history of it. My knowledge of SQL and Access is pretty limited as I'm usually more familiar with excel and vb.
I have tried using these:
MSAccess: select N records from each category
sql for finding most recent record in a group
but being a beginner I'm not 100% sure how to apply them to my problem.
If anyone can help that would be great.
Example: (tblHistory)
Unit Number,Action,LoggerID,SensorID,CableID,ActionDate
1,Replace Sensor,,KTSM01S03,,15/02/2015
1,Replace Logger,KTSM01FM02,,,11/01/2014
1,Replace Cable,,,sa123124,10/01/2014
1,Replace Logger,KTSM01FM03,,,12/01/2014
2,Replace Sensor,,KTSM01S01,,13/01/2014
2,Replace Cable,,,sa123123,12/01/2014
2,Replace Cable,,,sa123124,16/02/2014
2,Replace Logger,KTSM01FM01,,,13/01/2014
tblUnit
Unit Number,Logger,SensorID,Cable
1,KTSM01FM01,KTSM01S01,sa123123
2,Logger1,Sensor1,Cable1
SELECT TOP 1 *
FROM tblHistory a
INNER JOIN
(SELECT Unit Number, Max([Date]) as MaxDate
FROM tblHistory
GROUP BY Unit Number) b
on a.[Unit Number] = b.[Unit Number]
and a.[Date] = b.MaxDate
You have to bracket the field called Date because Date() is a reserved function and Access will think you're trying to use the function instead of calling the field in your table. In the future, always try to avoid using reserved words as field names, it will save you a lot of headaches.

SQL add up rows in a column

I'm running SQL queries in Orion Report Writer for Solarwinds Netflow Traffic Analyzer and am trying to add up data usage for specific conversations coming from the same general sources. In this case it is netflix. I've made some progress with my query.
SELECT TOP 10000 FlowCorrelation_Source_FlowCorrelation.FullHostname AS Full_Hostname_A,
SUM(NetflowConversationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
SUM(NetflowConversationSummary.TotalBytes) AS Total_Bytes
FROM
((NetflowConversationSummary LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Source_FlowCorrelation ON (NetflowConversationSummary.SourceIPSort = FlowCorrelation_Source_FlowCorrelation.IPAddressSort)) LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Dest_FlowCorrelation ON (NetflowConversationSummary.DestIPSort = FlowCorrelation_Dest_FlowCorrelation.IPAddressSort)) INNER JOIN Nodes ON (NetflowConversationSummary.NodeID = Nodes.NodeID)
WHERE
( DateTime BETWEEN 41539 AND 41570 )
AND
(
(FlowCorrelation_Source_FlowCorrelation.FullHostname LIKE 'ipv4_1.lagg0%')
)
GROUP BY FlowCorrelation_Source_FlowCorrelation.FullHostname, FlowCorrelation_Dest_FlowCorrelation.FullHostname, Nodes.Caption, Nodes.NodeID, FlowCorrelation_Source_FlowCorrelation.IPAddress
So I've got an output that filters everything but netflix sessions (Full_Hostname_A) and their total usage for each session (Sum_Of_Bytes_Transferred)
I want to add up Sum_Of_Bytes_Transferred to get a total usage for all netflix sessions
listed, which will output to Total_Bytes. I created the column Total_Bytes, but don't know how to output a total to it.
For some asked clarification, here is the output from the above query:
I want the Total_Bytes Column to be all added up into one number.
I have no familiarity with the reporting tool you are using.
From reading your post I'm thinking you want the the first 2 columns of data that you've got, plus at a later point in the report, a single figure being the sum of the total_bytes column you're already producing.
Your reporting tool probably has some means of totalling a column, but you may need to get the support people for the reporting tool to tell you how to do that.
Aside from this, if you can find a way of calling a separate query in a latter section of the report, or if you embed a new report inside your existing report, after the detail section, and use that to run a separate query then you should be able to get the data you want with this:
SELECT Sum(Total_Bytes) as [Total Total Bytes]
FROM ( yourExistingQuery ) x
yourExistingQuery means the query you've already got, in full (doesnt have to be put on one line), the paretheses are required, and so is the "x". (The latter provides a syntax-required name for the virtual table which your query defines).
Hope this helps.