SQL query for percentage change compared to previous date - sql

I have a table within access containing the performance of departments on different reference dates. All data is within one table "tblmain". The table contains the following fields:
reference date (called "ref_date", formatted dd.mm.yyyy)
department identifier (called "dep_id")
performance value (called "val")
Every reference date covers roughly 100 departments, and every week I import a new reference date.
My goal now is to build a query which calculates the percentage change from one reference date to the previous reference date. Furthermore, it should only show the departments with a change bigger than 5%.
I am currently stuck. I have created a query that gives me the val from the previous reference date but only for one specific department. And I do not know how to continue. This query looks as follows:
SELECT TOP 1 tblmain.val
FROM (SELECT TOP 2 tblmain.val, tblmain.ref_date FROM tblmain WHERE dep_id=1 ORDER BY tblmain.ref_date DESC)
ORDER BY tblmain.ref_date;
I would appreciate any feedback. After finishing this query, I plan to use it in a form where I can choose a reference date and a threshold.
Many thanks in advance!

Query to pull the prior val for each record (save it as Query1; sorting DESC makes TOP 1 return the most recent earlier date rather than the oldest one):
SELECT tblMain.ID, tblMain.ref_date, tblMain.dep_id, tblMain.val,
(SELECT TOP 1 val FROM tblMain AS Dupe
WHERE Dupe.dep_id=tblMain.dep_id AND Dupe.ref_date < tblMain.ref_date
ORDER BY Dupe.ref_date DESC) AS PriorVal
FROM tblMain;
Now use that query to calculate percentage:
SELECT Query1.*, Abs(([PriorVal]-[val])/[PriorVal]*100) AS P
FROM Query1
WHERE (((Abs(([PriorVal]-[val])/[PriorVal]*100))>5));
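For the form mentioned in the question (pick a reference date and a threshold), the two queries above can be folded into one parameterized query. The following is only a sketch under assumptions: it runs against SQLite through Python rather than Access, so TOP 1 becomes LIMIT 1, the sample rows are made up, the dates are stored as ISO strings so that text order matches date order, and the helper name changes_above is hypothetical.

```python
import sqlite3

# Hypothetical sample data; ISO dates (yyyy-mm-dd) sort correctly as text.
rows = [
    ("2023-01-02", 1, 100.0), ("2023-01-09", 1, 110.0), ("2023-01-16", 1, 112.0),
    ("2023-01-02", 2, 200.0), ("2023-01-09", 2, 204.0), ("2023-01-16", 2, 180.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblmain (ref_date TEXT, dep_id INTEGER, val REAL)")
conn.executemany("INSERT INTO tblmain VALUES (?, ?, ?)", rows)


def changes_above(conn, ref_date, threshold_pct):
    """Departments whose val moved by more than threshold_pct percent
    between ref_date and the closest earlier reference date."""
    sql = """
        SELECT dep_id, val, prior_val,
               ROUND((val - prior_val) / prior_val * 100, 2) AS pct_change
        FROM (
            SELECT m.dep_id, m.val,
                   -- most recent earlier value for the same department
                   (SELECT p.val FROM tblmain AS p
                    WHERE p.dep_id = m.dep_id AND p.ref_date < m.ref_date
                    ORDER BY p.ref_date DESC LIMIT 1) AS prior_val
            FROM tblmain AS m
            WHERE m.ref_date = ?
        )
        WHERE prior_val IS NOT NULL AND prior_val <> 0
          AND ABS((val - prior_val) / prior_val * 100) > ?
        ORDER BY dep_id
    """
    return conn.execute(sql, (ref_date, threshold_pct)).fetchall()


result = changes_above(conn, "2023-01-16", 5)
print(result)
```

Rows without an earlier reference date, or with a prior value of zero, are skipped here so that the division is always defined. The sketch also keeps the sign of the change instead of taking Abs in the output column, which is a design choice and not something the answer above prescribes.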

Related

Query another table with results of an another query that include a csv column

Brief Summary:
I am currently trying to get a count of completed parts that fall within a specific time range and match the machine number, operation number, and tool number.
For example:
SELECT Sequence, Serial, Operation,Machine,DateTime,value as Tool
FROM tbPartProfile
CROSS APPLY STRING_SPLIT(Tool_Used, ',')
ORDER BY DateTime desc
This query pulls every instance where a tool has been changed. I am splitting the CSV from the Tool_Used column because there can be multiple changes during one operation.
Objective:
This is where the production count comes into play. For example, record 1 has a tool change of 36 on 12/12/2022. I need to go back into the table and get the number of completed parts that match the OPERATION/MACHINE/TOOL and fall within the date range.
For example:
SELECT *
FROM tbPartProfile
WHERE Operation = 20 AND Machine = 1 AND Tool_Used LIKE '%36%'
ORDER BY DateTime desc
For example, this query gives me the datetimes at which tools LIKE 36 were changed. I need to take each datetime, compare it with the previous query, and get the sum of all parts that were run in this TimeRange/Operation/Machine/Tool Used.

How do I create a new SQL table with a percentage column which is conditional on whether information shows up in two other tables?

I have 3 tables. The table called agg(date,sname,open,high,low,close,volume) contains daily information for every stock for x number of past years. Another table, split(date,sname,post,pre), has info for every time any stock split. Another table, div(date,sname,dividend), has info for every time a stock had a dividend. I want to create a new table with a column that gives the percent change from the close of the previous day to the close of the current day, for every stock and every day listed in agg.
Here is the line I have for just the daily change, not including div and split:
create table daily
as
with prevclose as (
select date,sname,close,
lag(close) over (partition by sname order by date) pclose
from agg
)
select a.*,
100.0*(close - pclose)/(case when pclose=0 then null else pclose end) as prcnt
from prevclose a
where pclose != 0;
I want to change this code to incorporate the change in split and dividends which is not incorporated in the agg table. I don't even need the full calculation for this, but I need help figuring out how to incorporate the condition into the new table. I only need to add in split and div info if there is split and div info for that particular date and time. I think if I could just see the query for a similar problem it would help.
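One common way to bring in rows that exist only on some dates is a LEFT JOIN with COALESCE supplying a neutral value when there is no match. The sketch below shows that shape only. It runs on SQLite through Python, carries just the close column from agg for brevity, uses made-up rows, and the adjustment arithmetic (scale the previous close by pre/post, add the dividend to the current close) is an assumption for illustration, not a vetted total-return formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agg   (date TEXT, sname TEXT, close REAL);
    CREATE TABLE split (date TEXT, sname TEXT, post REAL, pre REAL);
    CREATE TABLE div   (date TEXT, sname TEXT, dividend REAL);

    -- Hypothetical data: a 2-for-1 split on Jan 3, a dividend on Jan 4.
    INSERT INTO agg VALUES
        ('2023-01-02', 'ABC', 100.0),
        ('2023-01-03', 'ABC', 55.0),
        ('2023-01-04', 'ABC', 54.0);
    INSERT INTO split VALUES ('2023-01-03', 'ABC', 2, 1);
    INSERT INTO div   VALUES ('2023-01-04', 'ABC', 1.0);

    CREATE TABLE daily AS
    WITH prevclose AS (
        SELECT date, sname, close,
               LAG(close) OVER (PARTITION BY sname ORDER BY date) AS pclose
        FROM agg
    )
    SELECT a.date, a.sname,
           -- COALESCE falls back to "no split" (ratio 1) and "no dividend" (0)
           ROUND(100.0 * (a.close + COALESCE(d.dividend, 0)
                          - a.pclose * COALESCE(s.pre / s.post, 1))
                 / (a.pclose * COALESCE(s.pre / s.post, 1)), 2) AS prcnt
    FROM prevclose AS a
    LEFT JOIN split AS s ON s.sname = a.sname AND s.date = a.date
    LEFT JOIN div   AS d ON d.sname = a.sname AND d.date = a.date
    WHERE a.pclose <> 0;
""")

daily = conn.execute(
    "SELECT date, sname, prcnt FROM daily ORDER BY sname, date"
).fetchall()
print(daily)
```

The first day of each stock drops out because its pclose is NULL, the same way it does in the original query. The window function requires SQLite 3.25 or later.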

Using range of cells as conditions in SQL Query

My company uses a SQL Server database.
Is it possible to use a range of cells as a condition in a SQL query if it equals ANY of those values? Can it even use date ranges on the same rows?
Reference Example:
Data Example:
Output Desired:
Question 1:
Can I reference an entire column?
SELECT ID, sum(units) FROM sales WHERE ID = any ID in Column A
Question 2:
Can I specify just a cell range?
SELECT ID, sum(units) FROM table WHERE ID = any value in A2:A10
Question 3:
Can I add a date range cell reference with the possibility that the same ID may appear more than once but have a different date range (see 747375 in sample) and return results for both ranges separately?
SELECT ID, sum(units) FROM table WHERE ID = any value in A2:A10 AND DATE >= date found in column B that is next to ID in the same row AND DATE <= date found in column C that is next to ID in the same row
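You can use BETWEEN as follows. Grouping by the range columns as well as the ID keeps two ranges for the same ID on separate rows, and END is a reserved word in SQL Server, so the range columns are bracketed:
select
r.id,
r.[start],
r.[end],
sum(d.units) as units
from reference r
join data d
on r.id = d.id
where d.date between r.[start] and r.[end]
group by
r.id, r.[start], r.[end]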
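To see how a join against a reference table behaves when the same ID carries two date ranges, here is a small runnable sketch. It uses SQLite through Python rather than SQL Server, the column names start_date and end_date are hypothetical, and every row is invented except the ID 747375 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference (id INTEGER, start_date TEXT, end_date TEXT);
    CREATE TABLE data (id INTEGER, date TEXT, units INTEGER);

    -- The same ID appears twice with different ranges.
    INSERT INTO reference VALUES
        (747375, '2023-01-01', '2023-01-31'),
        (747375, '2023-03-01', '2023-03-31'),
        (100001, '2023-01-01', '2023-12-31');

    INSERT INTO data VALUES
        (747375, '2023-01-10', 5),
        (747375, '2023-02-10', 7),    -- falls in neither range
        (747375, '2023-03-10', 11),
        (100001, '2023-06-01', 3),
        (999999, '2023-06-01', 100);  -- ID not in the reference list
""")

per_range = conn.execute("""
    SELECT r.id, r.start_date, r.end_date, SUM(d.units) AS units
    FROM reference AS r
    JOIN data AS d ON d.id = r.id
    WHERE d.date BETWEEN r.start_date AND r.end_date
    GROUP BY r.id, r.start_date, r.end_date
    ORDER BY r.id, r.start_date
""").fetchall()

for row in per_range:
    print(row)
```

The reference table plays the role of the spreadsheet range: loading the cells into a table (or a temp table) lets the join do the "ID equals any of these values" matching.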
Question 1: Can I reference an entire column?
Yes. A default select without a where clause will reference the entire column.
Your example SELECT ID, sum(units) FROM sales WHERE ID = any ID in Column A is not logically sound. From the select, I am presuming that you want the sum of units for each individual ID, not the sum of all the units without regard to the ID. For this, you want to use group by
select ID, sum(units) totalunits
from sales
group by ID
There is no need for a where clause because you want everything.
Question 2: Can I specify just a cell range?
Yes.
And no.
There is no direct concept of "cell range" in SQL (well, maybe top but not really). Data is stored unordered in SQL. In Excel, the cell range "A2:A10" means "whatever values just happen to be in those cells at this point in time". Often this will mean "the 2nd through 10th values entered in time", or "the first through 9th values entered in time" if there is a header row. But then later you can sort the data differently and now there is different data there. In SQL, there is no order in storage. You can specify an order for the output when you select data, but that is manually specified for each select.
However, the related concept is probably rather obvious. "A2:A10" is often going to mean "the first 9 values by date/time", or "the largest/smallest 9 values" etc.
Your example SELECT ID, sum(units) FROM table WHERE ID = any value in A2:A10 needs to change to define what values you expect to be in A2:A10. For example, if A2:A10 represents the first 9 values by date, you would do something like this: (untested)
select ID, sum(units) totalunits
from sales
where ID in (select top(9) ID
from sales
order by date
)
group by ID
This would provide the sum of units for each of the IDs that were amongst the first 9 IDs entered by date (what to do with a tie for 9th I will not go into here).
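The same idea can be tried out on a toy table. This sketch is an assumption-laden stand-in: it runs on SQLite through Python, where LIMIT replaces SQL Server's TOP, it takes the first 2 rows by date instead of 9 to keep the data small, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER, date TEXT, units INTEGER);
    INSERT INTO sales VALUES
        (1, '2023-01-01', 10),
        (2, '2023-01-02', 20),
        (3, '2023-01-03', 30),
        (1, '2023-01-04', 40),
        (3, '2023-01-05', 50);
""")

# "A2:A3" is defined here as the IDs on the two earliest rows by date.
totals = conn.execute("""
    SELECT id, SUM(units) AS totalunits
    FROM sales
    WHERE id IN (SELECT id FROM sales ORDER BY date LIMIT 2)
    GROUP BY id
    ORDER BY id
""").fetchall()
print(totals)
```

Note that the sum covers all rows for the selected IDs, not only the rows that put them in the first two, which matches the behavior described above.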
Question 3: Can I add a date range cell reference with the possibility that the same ID may appear more than once but have a different date range (see 747375 in sample) and return results for both ranges separately?
This one is difficult to understand, and it might be meaningless based on the answer to your 2nd question. However, you can set up a query that chooses the IDs you want, and in that query you can also select the min and max dates. Finally, you can use that query as a subquery to get, for each ID, the sum of units within the min/max dates and the sum of units outside them. This would require some effort and I will not at this time try to figure that out for you.

Query to get Daywise on the month selected

I have an MS Access database table with a datetime column. When I select a particular month (say, July), I have to get datewise data for that month.
The output should appear in the same format as the attached image.
Every Employee who comes in on a particular date should display “P” for that day. If the Employee doesn’t come in on a particular day (like Sat or Sun), then I have to display “WO” for that day.
When an Employee has not come in (like Sat or Sun), then there is no entry in the log table for that date.
How could an Access query be written to obtain this output? I am using an MS Access 2003 database.
Edit: we have to use TRANSFORM and PIVOT, but the issue is when an employee is not available (Sat, Sun) we still need to show data in the output.
Set up a query that reads EmpID and CheckTime from the first table, and adds one additional column:
DateWise: IIf(Weekday([CheckTime])=1 Or Weekday([CheckTime])=7,"WO","P")
You will need an additional table with every date of the year in it (we'll call it YearDates). Left join that table to your query like so:
Select YD.YearDates, Q2.* from YearDates YD LEFT JOIN Query2 Q2 ON YD.YearDates = DATEVALUE(Q2.CheckTime)
The DATEVALUE will strip the time off your dates in CheckTime so they will match date against date.
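A runnable sketch of the calendar-table idea follows. It is a variation rather than the exact Access queries above: it runs on SQLite through Python, date() stands in for DATEVALUE, the table name CheckLog and the column d are hypothetical, the status is decided by whether a log row exists for that employee and date, and the final pivot is done in Python where Access would use TRANSFORM and PIVOT.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YearDates (d TEXT PRIMARY KEY);
    CREATE TABLE CheckLog (EmpID INTEGER, CheckTime TEXT);
    -- Hypothetical log rows; nobody checks in on the weekend of 1-2 July 2023.
    INSERT INTO CheckLog VALUES
        (1, '2023-07-03 09:02:00'),
        (1, '2023-07-04 08:57:00'),
        (2, '2023-07-03 09:15:00');
""")

# Fill the calendar with 1 July to 5 July 2023 only, to keep the output short.
start = date(2023, 7, 1)
conn.executemany(
    "INSERT INTO YearDates VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(5)],
)

rows = conn.execute("""
    SELECT e.EmpID, yd.d,
           CASE WHEN COUNT(l.EmpID) > 0 THEN 'P' ELSE 'WO' END AS status
    FROM (SELECT DISTINCT EmpID FROM CheckLog) AS e
    CROSS JOIN YearDates AS yd
    LEFT JOIN CheckLog AS l
           ON l.EmpID = e.EmpID AND date(l.CheckTime) = yd.d
    GROUP BY e.EmpID, yd.d
    ORDER BY e.EmpID, yd.d
""").fetchall()

# Pivot: one list of statuses per employee, in date order.
grid = {}
for emp_id, d, status in rows:
    grid.setdefault(emp_id, []).append(status)

for emp_id, statuses in grid.items():
    print(emp_id, statuses)
```

The cross join of employees and calendar dates is what guarantees a row for days with no log entry; the GROUP BY collapses multiple check-ins on the same day into a single status.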

Is there a way to handle immutability that's robust and scalable?

Since BigQuery is append-only, I was thinking about stamping each record I upload to it with an 'effective date', similar to how PeopleSoft works, if anybody is familiar with that pattern.
Then, I could issue a select statement and join on the max effective date
select UTC_USEC_TO_MONTH(timestamp) as month, sum(amt)/100 as sales
from foo.orders as all
join (select id, max(effdt) as max_effdt from foo.orders group by id) as latest
on all.effdt = latest.max_effdt and all.id = latest.id
group by month
order by month;
Unfortunately, I believe this won't scale because of the BigQuery 'small joins' restriction, so I wanted to see if anyone else had thought about this use case.
Yes, adding a timestamp for each record (or in some cases, a flag that captures the state of a particular record) is the right approach. The small side of a BigQuery "Small Join" can actually return at least 8MB (this value is compressed on our end, so is usually 2 to 10 times larger), so for "lookup" table type subqueries, this can actually provide a lot of records.
In your case, it's not clear to me exactly what query you are trying to run. It looks like you are trying to return the most recent sales time of every individual item, and then JOIN this information with the SUM of sales amt per month for each item. Can you provide more info about the query?
It might be possible to do this all in one query. For example, in our wikipedia dataset, an example might look something like...
SELECT contributor_username, UTC_USEC_TO_MONTH(timestamp * 1000000) as month,
SUM(num_characters) as total_characters_used FROM
[publicdata:samples.wikipedia] WHERE (contributor_username != '' AND
contributor_username IS NOT NULL) AND timestamp > 1133395200
AND timestamp < 1157068800 GROUP BY contributor_username, month
ORDER BY contributor_username DESC, month DESC;
...to provide wikipedia contributions per user per month (like sales per month per item). This result is actually really large, so you would have to limit by date range.
UPDATE (based on comments below) a similar query that finds "num_characters" for the latest wikipedia revisions by contributors after a particular time...
SELECT current.contributor_username, current.num_characters
FROM
(SELECT contributor_username, num_characters, timestamp as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL)
AS current
JOIN
(SELECT contributor_username, MAX(timestamp) as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL AND timestamp > 1265073722 GROUP BY contributor_username) AS latest
ON
current.contributor_username = latest.contributor_username
AND
current.time = latest.time;
If your query requires you to first build a large aggregate (for example, you need to run essentially an accurate COUNT DISTINCT), another option is to break this query up into two queries. The first query could provide the max effective date by month along with a count and save this result as a new table. Then you could run a sum query on the resulting table.
You could also store monthly sales records in separate tables, and only query the particular table for the months you are interested in, simplifying your monthly sales summaries (this could also be a more economical use of BigQuery). When you need to find aggregates across all tables, you could run your queries with multiple tables listed after the FROM clause.
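The two-query variant mentioned above (materialize the latest row per id, then aggregate over that table) can be sketched as follows. This is only an illustration under assumptions: it runs on SQLite through Python rather than BigQuery, the order_month column is a hypothetical stand-in for UTC_USEC_TO_MONTH(timestamp), the table name latest_orders is made up, and amt is assumed to be in cents as the division by 100 in the question suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, effdt TEXT, order_month TEXT, amt INTEGER);
    -- Append-only history: a later effdt supersedes earlier rows for the same id.
    INSERT INTO orders VALUES
        (1, '2023-01-05', '2023-01', 1000),
        (1, '2023-01-09', '2023-01', 1200),  -- correction to order 1
        (2, '2023-01-20', '2023-01', 500),
        (3, '2023-02-02', '2023-02', 700),
        (3, '2023-02-03', '2023-02', 0);     -- order 3 cancelled
""")

# Step 1: keep only the row with the max effective date per id.
conn.execute("""
    CREATE TABLE latest_orders AS
    SELECT o.id, o.order_month, o.amt
    FROM orders AS o
    JOIN (SELECT id, MAX(effdt) AS max_effdt
          FROM orders
          GROUP BY id) AS latest
      ON o.id = latest.id AND o.effdt = latest.max_effdt
""")

# Step 2: aggregate over the materialized table; no join is needed here.
monthly = conn.execute("""
    SELECT order_month, SUM(amt) / 100.0 AS sales
    FROM latest_orders
    GROUP BY order_month
    ORDER BY order_month
""").fetchall()
print(monthly)
```

Splitting the work this way means the aggregation step never joins at all, which is the point of the suggestion when the join side would be too large.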