Multiple filters on SQL query - sql

I have been reading many topics about filtering SQL queries, but none seems to apply to my case, so I'm in need of a bit of help. I have the following data on a SQL table.
Date item quantity moved quantity in stock sequence
13-03-2012 16:51:00 xpto 2 2 1
13-03-2012 16:51:00 xpto -2 0 2
21-03-2012 15:31:21 zyx 4 6 1
21-03-2012 16:20:11 zyx 6 12 2
22-03-2012 12:51:12 zyx -3 9 1
These are quantities moved in the warehouse. The problem is with the first two rows, which record a reception and a return at the same timestamp. I'm trying to write a query which gives me the stock of all items at a given time. I use max(date), but I don't get the right quantity in the result.

SELECT item, qty_in_stock
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY item_date DESC, sequence DESC) rn
FROM mytable
WHERE item_date <= @date_of_stock
) q
WHERE rn = 1
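
A minimal sketch of the ROW_NUMBER approach above, run against the question's sample rows. It uses Python's built-in sqlite3 module rather than SQL Server (an assumption made only so the example is self-contained; it needs SQLite 3.25 or newer for window functions), the dates are rewritten as ISO-8601 text so they sort correctly, and the cut-off is passed as a bound parameter. The column names qty_moved and qty_in_stock are assumed.

```python
import sqlite3

# Sample rows from the question: (item_date, item, qty_moved, qty_in_stock, sequence)
rows = [
    ("2012-03-13 16:51:00", "xpto", 2, 2, 1),
    ("2012-03-13 16:51:00", "xpto", -2, 0, 2),
    ("2012-03-21 15:31:21", "zyx", 4, 6, 1),
    ("2012-03-21 16:20:11", "zyx", 6, 12, 2),
    ("2012-03-22 12:51:12", "zyx", -3, 9, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (item_date TEXT, item TEXT, "
    "qty_moved INT, qty_in_stock INT, sequence INT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)

# Number each item's rows from newest to oldest; sequence breaks ties
# between movements that share the same timestamp.
query = """
SELECT item, qty_in_stock
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY item
                              ORDER BY item_date DESC, sequence DESC) AS rn
    FROM mytable
    WHERE item_date <= ?
) q
WHERE rn = 1
ORDER BY item
"""
stock = conn.execute(query, ("2012-03-21 23:59:59",)).fetchall()
print(stock)  # [('xpto', 0), ('zyx', 12)]
```

Ordering by sequence as well as the date is what makes the two xpto rows with identical timestamps resolve to the later movement.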

If you are on SQL Server 2012, there are several nice features added.
You can use the LAST_VALUE() function (or FIRST_VALUE()) in combination with a ROWS or RANGE window frame (see the OVER clause):
SELECT DISTINCT
item,
LAST_VALUE(quantity_in_stock) OVER (PARTITION BY item
ORDER BY date, sequence
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS quantity_in_stock
FROM tableX
WHERE date <= @date_of_stock
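
The explicit frame matters here. As a rough illustration (again using Python's sqlite3 module with SQLite 3.25+ as a stand-in for SQL Server, with ISO-8601 dates and an assumed alias last_qty), the sketch below runs the same LAST_VALUE query with and without the full frame. When ORDER BY is present and no frame is given, the default frame ends at the current row, so LAST_VALUE just returns each row's own value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableX (date TEXT, item TEXT, quantity_in_stock INT, sequence INT)"
)
conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", [
    ("2012-03-13 16:51:00", "xpto", 2, 1),
    ("2012-03-13 16:51:00", "xpto", 0, 2),
    ("2012-03-21 15:31:21", "zyx", 6, 1),
    ("2012-03-21 16:20:11", "zyx", 12, 2),
    ("2012-03-22 12:51:12", "zyx", 9, 1),
])

template = """
SELECT DISTINCT item,
       LAST_VALUE(quantity_in_stock) OVER (PARTITION BY item
                                           ORDER BY date, sequence {frame}) AS last_qty
FROM tableX
ORDER BY item, last_qty
"""
full_frame = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"

# Frame spans the whole partition: one row per item with its final stock.
with_frame = conn.execute(template.format(frame=full_frame)).fetchall()
# Default frame stops at the current row: every row reports its own value.
default_frame = conn.execute(template.format(frame="")).fetchall()

print(with_frame)     # [('xpto', 0), ('zyx', 9)]
print(default_frame)  # [('xpto', 0), ('xpto', 2), ('zyx', 6), ('zyx', 9), ('zyx', 12)]
```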

Add a WHERE clause and do the summation. Note that WHERE has to come before GROUP BY:
select item, sum([quantity moved])
from t
where t.date <= @DESIREDDATETIME
group by item
If you pass a bare date as the desired datetime, remember that it is interpreted as midnight at the start of that day, so movements later that day are excluded.
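
A small sketch of the summation and of the midnight cut-off, using Python's sqlite3 module as a stand-in for SQL Server. Two assumptions to flag: the column is named quantity_moved here, and SQLite compares the ISO-8601 date strings as text, which happens to give the same cut-off behaviour as a datetime at midnight. Comparing with a strict less-than against the following day is one way to include the whole day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, item TEXT, quantity_moved INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2012-03-13 16:51:00", "xpto", 2),
    ("2012-03-13 16:51:00", "xpto", -2),
    ("2012-03-21 15:31:21", "zyx", 4),
    ("2012-03-21 16:20:11", "zyx", 6),
    ("2012-03-22 12:51:12", "zyx", -3),
])

# WHERE filters the rows first, then GROUP BY aggregates what is left.
inclusive = ("SELECT item, SUM(quantity_moved) FROM t "
             "WHERE date <= ? GROUP BY item ORDER BY item")
exclusive = ("SELECT item, SUM(quantity_moved) FROM t "
             "WHERE date < ? GROUP BY item ORDER BY item")

# A bare date acts like midnight, so the movements on the 21st are left out.
bare_date = conn.execute(inclusive, ("2012-03-21",)).fetchall()
# Strictly before the next day picks up everything that happened on the 21st.
whole_day = conn.execute(exclusive, ("2012-03-22",)).fetchall()

print(bare_date)  # [('xpto', 0)]
print(whole_day)  # [('xpto', 0), ('zyx', 10)]
```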

Related

SQL Distinct / GroupBy

Ok, I’m stuck on an SQL query and tried long enough that it’s time to ask for help :) I'm using Objection.js – but that's not super relevant as I really just can't figure out how to structure the SQL.
I have the following example data set:
Items
id  name
1   Test 1
2   Test 2
3   Test 3
Listings
id  item_id  price  created_at
1   1        100    1654640000
2   1        60     1654640001
3   1        80     1654640002
4   2        90     1654640003
5   2        90     1654640004
6   3        50     1654640005
What I’m trying to do:
Return the lowest priced listing for each item
If all listings for an item have the same price, I want to return the newest of the two items
Overall, I want to return the resulting items by price
I’m trying to write a query that returns the data:
id  item_id  name    price  created_at
6   3        Test 3  50     1654640005
2   1        Test 1  60     1654640001
5   2        Test 2  90     1654640004
Any help would be greatly appreciated! I'm also starting fresh, so I can add new columns to the data if that would help at all :)
An example of where my query is right now:
select *
from "listings"
inner join (
select "item_id", MIN(price) as "min_price"
from "listings"
group by "item_id"
) as "grouped_listings"
on "listings"."item_id" = "grouped_listings"."item_id"
and "listings"."price" = "grouped_listings"."min_price"
where "listings"."sold_at" is null
and "listings"."expires_at" > ?
order by CAST(price AS DECIMAL) ASC
limit ?;
This gets me listings – but if two listings have the same price, it returns multiple listings with the same item_id – not ideal.
Given the postgresql tag, this should work:
with listings_numbered as (
select *, row_number() over (
partition by item_id
order by price asc, created_at desc
) as rownum
from listings
)
select l.id, l.item_id, i.name, l.price, l.created_at
from listings_numbered l
join items i on l.item_id=i.id
where l.rownum=1
order by price asc;
This is a bit of an advanced query, using window functions and a common table expression, but we can break it down.
with listings_numbered as (...) select simply means to run the query inside of the ..., and then we can refer to the results of that query as listings_numbered inside of the select, as though it was a table.
We're selecting all of the columns in listings, plus one more:
row_number() over (partition by item_id order by price asc, created_at desc). partition by item_id means that we would like the row number to reset for each new item_id, and the order by specifies the ordering that the rows should get within each partition before we number them: first increasing by price, then decreasing by creation time to break ties.
The result of the CTE listings_numbered looks like:
id  item_id  price  created_at  rownum
2   1        60     1654640001  1
3   1        80     1654640002  2
1   1        100    1654640000  3
5   2        90     1654640004  1
4   2        90     1654640003  2
6   3        50     1654640005  1
If you look at only the rows where rownum (the last column) is 1, then you can see that it's exactly the set of listings that you're interested in.
The outer query then selects from this dataset, joins on items to get the name, filters to only the listings where rownum is 1, and sorts by price, to get the final result:
id  item_id  name    price  created_at
6   3        Test 3  50     1654640005
2   1        Test 1  60     1654640001
5   2        Test 2  90     1654640004
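
To check the walkthrough end to end, here is a hedged sketch that loads the question's sample data and runs the row_number query. It uses Python's sqlite3 module (SQLite 3.25+) instead of PostgreSQL purely so it is self-contained; the query text itself is unchanged apart from qualifying the final ORDER BY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INT, name TEXT)")
conn.execute("CREATE TABLE listings (id INT, item_id INT, price INT, created_at INT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "Test 1"), (2, "Test 2"), (3, "Test 3")])
conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", [
    (1, 1, 100, 1654640000),
    (2, 1, 60, 1654640001),
    (3, 1, 80, 1654640002),
    (4, 2, 90, 1654640003),
    (5, 2, 90, 1654640004),
    (6, 3, 50, 1654640005),
])

query = """
WITH listings_numbered AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY item_id
        ORDER BY price ASC, created_at DESC  -- cheapest first, newest wins a tie
    ) AS rownum
    FROM listings
)
SELECT l.id, l.item_id, i.name, l.price, l.created_at
FROM listings_numbered l
JOIN items i ON l.item_id = i.id
WHERE l.rownum = 1
ORDER BY l.price ASC
"""
cheapest = conn.execute(query).fetchall()
for row in cheapest:
    print(row)
```

For item 2 both listings cost 90, and the created_at DESC tie-breaker is what picks listing 5 over listing 4.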
Aggregate functions, such as the MIN function you used in your query, are a viable option, but if you want an efficient query for this problem, window functions can be your best friends. This class of functions lets you compute values over "windows" (partitions) of your table, defined by the columns you specify.
For the solution to this problem I'm going to compute two values using the window functions:
the minimum value for "listings.price", by partitioning on "listings.item_id",
the maximum value for "created_at", by partitioning on "listings.item_id" and listings.price
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
Once you have all records of listings associated to the corresponding minimum price and latest date, it's necessary for you to select the records whose
price equals the minimum price
created_at equals the most recent created_at
WITH cte AS (
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
)
SELECT id,
item_id,
price,
created_at
FROM cte
WHERE price = min_price
AND created_at = max_created_at
If you need to order by price, it's sufficient to add an ORDER BY price clause.
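
A minimal runnable sketch of the MIN/MAX window approach on the question's Listings data, with the ORDER BY price added. It runs on SQLite through Python's sqlite3 module (an assumption for portability; the answer targets PostgreSQL). One caveat worth keeping in mind: if two listings share the same item, the same lowest price and the same created_at, both rows pass the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INT, item_id INT, price INT, created_at INT)")
conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", [
    (1, 1, 100, 1654640000),
    (2, 1, 60, 1654640001),
    (3, 1, 80, 1654640002),
    (4, 2, 90, 1654640003),
    (5, 2, 90, 1654640004),
    (6, 3, 50, 1654640005),
])

query = """
WITH cte AS (
    SELECT *,
           MIN(price) OVER (PARTITION BY item_id) AS min_price,
           MAX(created_at) OVER (PARTITION BY item_id, price) AS max_created_at
    FROM listings
)
SELECT id, item_id, price, created_at
FROM cte
WHERE price = min_price              -- cheapest listing of the item
  AND created_at = max_created_at    -- newest among listings at that price
ORDER BY price
"""
lowest = conn.execute(query).fetchall()
print(lowest)
```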

select record from filtered results in postgreSQL

I have a function which gives me ordered rows (the function already orders the results):
By doing : select * from func1(ID)
Example A:
rownum date qty
1 1.1.10 -5
2 1.10.10 6
3 2.10.10 6
4 5.10.10 -2
5 6.10.10 -8
Example B:
rownum date qty
1 1.1.10 -7
2 1.10.10 6
Note: rownum is a column I calculate manually in my function. Its order is exactly what I need, so it's OK to rely on it.
I want to write a query which passes over the rows from bottom to top (highest rownum to lowest rownum) and returns the date of the first row encountered that has a negative qty.
For example A the returned value is 6.10.10 (rownum 5 is the first row with a negative qty).
For example B it is 1.1.10 (rownum 1 is the first row with a negative qty).
How can I do that?
You could use ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY rownum DESC) as rn
FROM func1(ID)
WHERE qty < 0
)
SELECT *
FROM cte
WHERE rn = 1;
I don't believe that table-valued functions in Postgres guarantee the ordering of the results. You will need to use an order by or some other mechanism.
If you want the most recent date with a negative value:
select max(date)
from func1(ID) t
where qty < 0;
If you have another ordering in mind:
select date
from func1(ID) t
where qty < 0
order by rownum desc -- your order by conditions here, but inverted
fetch first 1 row only;
This will allow you to fetch other values from the row, if you like.
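
As a rough check of the ORDER BY rownum DESC idea against both examples, here is a sketch using Python's sqlite3 module. Since SQLite has no table-valued functions, the rows that func1(ID) would return are loaded into a table named func1_result (a hypothetical name), and LIMIT 1 stands in for FETCH FIRST 1 ROW ONLY.

```python
import sqlite3

def last_negative_date(rows):
    """Return the date of the negative-qty row with the highest rownum, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE func1_result (rownum INT, date TEXT, qty INT)")
    conn.executemany("INSERT INTO func1_result VALUES (?, ?, ?)", rows)
    row = conn.execute(
        """
        SELECT date
        FROM func1_result
        WHERE qty < 0            -- keep only negative quantities
        ORDER BY rownum DESC     -- walk from the bottom row upwards
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    return row[0] if row else None

example_a = [(1, "1.1.10", -5), (2, "1.10.10", 6), (3, "2.10.10", 6),
             (4, "5.10.10", -2), (5, "6.10.10", -8)]
example_b = [(1, "1.1.10", -7), (2, "1.10.10", 6)]

print(last_negative_date(example_a))  # 6.10.10
print(last_negative_date(example_b))  # 1.1.10
```

Ordering by rownum rather than by date matters here because the dates are stored as d.m.yy text and would not sort chronologically.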

Calculate distinct totals over time

I have the following data:
UniqueID SenderID EntryID Date
1 1 1 2015-09-17
2 1 1 2015-09-23
3 2 1 2015-09-17
4 2 1 2015-09-17
5 3 1 2015-09-17
6 4 1 2015-09-19
7 3 1 2015-09-20
What I require is the following:
3 2015-09-17
4 2015-09-19
4 2015-09-20
4 2015-09-23
Where the first column is the total number of unique entries up to that date. So, for example, the entry on 23/9 for Sender 1 and Entry 1 does not increase the total, because it duplicates the one from 17/9.
How can I do this efficiently, ideally without joining the table to itself? That approach ends up as a very large query, which is not practical. I have done something similar in Postgres with OVER(), but unfortunately that isn't available in this setup.
I could also do this in code - which I have but yet again it has to calculate outside of the db system and then import back in. With millions of rows this process takes days and I ideally only have hours.
OVER is ANSI standard functionality available in most databases. What you are counting are starts for users, and you can readily do this with a cumulative sum:
select startdate,
sum(count(*)) over (order by startdate) as CumulativeUniqueCount
from (select senderid, min(date) as startdate
from table t
group by senderid
) t
group by startdate
order by startdate;
This should work in any database that supports window functions, such as Oracle, SQL Server 2012+, Postgres, Teradata, DB2, Hive, Redshift, to name a few.
EDIT:
You need a left join to get all the dates in the data:
select d.date,
sum(count(t.senderid)) over (order by d.date) as CumulativeUniqueCount
from (select distinct date from table t) d left join
(select senderid, min(date) as startdate
from table t
group by senderid
) t
on t.startdate = d.date
group by d.date
order by d.date;
Credit to Gordon Linoff for the basic query. However, it will not return rows for dates that don't increase the cumulative sum.
To get those extra rows, you need to include an additional subquery that lists all the distinct dates from the table. And then you left join with Gordon's query + a few minor tweaks to get the desired result:
select d.SomeDate,
sum(count(t.SenderId)) over (order by d.SomeDate)
from (select distinct SomeDate
from SomeTable) d
left join (select SenderId, min(somedate) as MinDate
from SomeTable
group by SenderId) t
on d.SomeDate = t.MinDate
group by d.SomeDate
order by d.SomeDate;
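
Here is a sketch of the left-join version run on the question's data, using Python's sqlite3 module (SQLite 3.25+) as an assumed stand-in engine. To avoid depending on support for an aggregate nested inside a window function, the per-date count is computed in a derived table first (per_day and new_senders are names made up for this sketch) and the running SUM is applied on top. Note the count is over t.SenderId, which is NULL on dates with no first-time sender, so those dates contribute 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SomeTable (UniqueID INT, SenderId INT, EntryID INT, SomeDate TEXT)"
)
conn.executemany("INSERT INTO SomeTable VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "2015-09-17"),
    (2, 1, 1, "2015-09-23"),
    (3, 2, 1, "2015-09-17"),
    (4, 2, 1, "2015-09-17"),
    (5, 3, 1, "2015-09-17"),
    (6, 4, 1, "2015-09-19"),
    (7, 3, 1, "2015-09-20"),
])

query = """
SELECT SomeDate,
       SUM(new_senders) OVER (ORDER BY SomeDate) AS CumulativeUniqueCount
FROM (
    -- one row per date, counting senders whose first entry falls on it
    SELECT d.SomeDate AS SomeDate, COUNT(t.SenderId) AS new_senders
    FROM (SELECT DISTINCT SomeDate FROM SomeTable) d
    LEFT JOIN (SELECT SenderId, MIN(SomeDate) AS MinDate
               FROM SomeTable
               GROUP BY SenderId) t
           ON d.SomeDate = t.MinDate
    GROUP BY d.SomeDate
) per_day
ORDER BY SomeDate
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```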

T-SQL calculate moving average

I am working with SQL Server 2008 R2, trying to calculate a moving average. For each record in my view, I would like to collect the values of the 250 previous records, and then calculate the average for this selection.
My view columns are as follows:
TransactionID | TimeStamp | Value | MovAvg
----------------------------------------------------
1 | 01.09.2014 10:00:12 | 5 |
2 | 01.09.2014 10:05:34 | 3 |
...
300 | 03.09.2014 09:00:23 | 4 |
TransactionID is unique. For each TransactionID, I would like to calculate the average for column value, over previous 250 records. So for TransactionID 300, collect all values from previous 250 rows (view is sorted descending by TransactionID) and then in column MovAvg write the result of the average of these values. I am looking to collect data within a range of records.
The window functions in SQL Server 2008 are rather limited compared to later versions. If I remember correctly, you can only partition and can't use a ROWS/RANGE frame limit, but I think this might be what you want:
;WITH cte (rn, transactionid, value) AS (
SELECT
rn = ROW_NUMBER() OVER (ORDER BY transactionid),
transactionid,
value
FROM your_table
)
SELECT
transactionid,
value,
movavg = (
SELECT AVG(value)
FROM cte AS inner_ref
-- rn-250 .. rn covers 251 rows: the 250 previous rows plus the current one.
-- Use rn-250 AND rn-1 for the 250 previous rows only, or rn-249 AND rn for 250 rows including the current one.
WHERE inner_ref.rn BETWEEN outer_ref.rn-250 AND outer_ref.rn
)
FROM cte AS outer_ref
Note that it applies a correlated sub-query to every row and performance might not be great.
With later versions (SQL Server 2012 and up) you could use a window frame and do something like this:
SELECT
transactionid,
value,
-- avg over the 250 rows before the current row
AVG(value) OVER (ORDER BY transactionid
ROWS BETWEEN 250 PRECEDING AND 1 PRECEDING),
-- or over 250 rows ending at (and including) the current row
AVG(value) OVER (ORDER BY transactionid
ROWS BETWEEN 249 PRECEDING AND CURRENT ROW)
FROM your_table
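
Frame sizes are easy to get off by one, so here is a small sketch with a window of 3 rows instead of 250 to make the arithmetic checkable by hand. It runs on SQLite 3.25+ via Python's sqlite3 module (an assumption; the answer is about SQL Server 2012+), and the aliases prev3 and incl3 are made up. A frame of N PRECEDING AND 1 PRECEDING spans N rows before the current one, while N-1 PRECEDING AND CURRENT ROW spans N rows including it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (transactionid INT, value INT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)])

query = """
SELECT transactionid,
       -- average of the 3 rows before the current row
       AVG(value) OVER (ORDER BY transactionid
                        ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS prev3,
       -- average of 3 rows ending at the current row
       AVG(value) OVER (ORDER BY transactionid
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS incl3
FROM your_table
ORDER BY transactionid
"""
averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
# The first row has no preceding rows, so its prev3 is NULL (None in Python).
```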
Use a Common Table Expression (CTE) to include the rownum for each transaction, then join the CTE against itself on the row number so you can get the previous values to calculate the average with.
CREATE TABLE MyTable (TransactionId INT, Value INT)
;with Data as
(
SELECT TransactionId,
Value,
ROW_NUMBER() OVER (ORDER BY TransactionId ASC) as rownum
FROM MyTable
)
SELECT d.TransactionId , Avg(h.Value) as MovingAverage
FROM Data d
JOIN Data h on h.rownum between d.rownum-250 and d.rownum-1
GROUP BY d.TransactionId
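
A scaled-down sketch of the self-join, with a window of 3 previous rows instead of 250 and deliberately non-contiguous TransactionId values to show why the join goes through rownum rather than the ID itself. It uses Python's sqlite3 module (SQLite 3.25+) as an assumed stand-in for SQL Server 2008 R2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (TransactionId INT, Value INT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [(101, 10), (105, 20), (110, 30), (120, 40), (121, 50)])

query = """
WITH Data AS (
    SELECT TransactionId,
           Value,
           ROW_NUMBER() OVER (ORDER BY TransactionId ASC) AS rownum
    FROM MyTable
)
SELECT d.TransactionId, AVG(h.Value) AS MovingAverage
FROM Data d
JOIN Data h ON h.rownum BETWEEN d.rownum - 3 AND d.rownum - 1  -- 3 previous rows
GROUP BY d.TransactionId
ORDER BY d.TransactionId
"""
moving = conn.execute(query).fetchall()
for row in moving:
    print(row)
```

Because this is an inner join, the very first transaction has no previous rows and drops out of the result; a LEFT JOIN would keep it with a NULL average.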

SQL query to get status as of a given date

I'm sure this has been answered before but couldn't find it.
I have a table of items which change status every few weeks. I want to look at an arbitrary day and figure out how many items were in each status.
For example:
tbl_ItemHistory
ItemID
StatusChangeDate
StatusID
Sample data:
1001, 1/1/2010, 1
1001, 4/5/2010, 2
1001, 6/15/2010, 4
1002, 4/1/2010, 1
1002, 6/1/2010, 3
...
So I need to figure out how many items were in each status for a given day. So on 5/1/2010, there was one item (1001) in status 2 and one item in status 1 (1002).
Since these items don't change status very often, maybe I could create a cached table every night that has a row for every item and every day of the year? I'm not sure if that's best, or how to do that, though.
I'm using SQL Server 2008 R2.
For an arbitrary day, you can do something like this:
select ih.*
from (select ih.*,
row_number() over (partition by itemId order by StatusChangeDate desc) as seqnum
from tbl_ItemHistory ih
where StatusChangeDate <= @YOURDATEGOESHERE
) ih
where seqnum = 1
The idea is to enumerate all the history records for each item on or before the date, using row_number(). The ordering is in reverse chronological order, so the most recent record on or before the date gets a value of 1.
The query then just chooses the records whose value is 1.
To aggregate the results to get each status for the date, use:
select statusId, count(*)
from (select ih.*,
row_number() over (partition by itemId order by StatusChangeDate desc) as seqnum
from tbl_ItemHistory ih
where StatusChangeDate <= @YOURDATEGOESHERE
) ih
where seqnum = 1
group by StatusId
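
To sanity-check the aggregate against the question's example (5/1/2010 should give one item in status 1 and one in status 2), here is a hedged sketch using Python's sqlite3 module with SQLite 3.25+ in place of SQL Server 2008 R2. The dates are stored as ISO-8601 text so that string comparison matches chronological order, and status_counts is a helper name invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_ItemHistory (ItemID INT, StatusChangeDate TEXT, StatusID INT)"
)
conn.executemany("INSERT INTO tbl_ItemHistory VALUES (?, ?, ?)", [
    (1001, "2010-01-01", 1),
    (1001, "2010-04-05", 2),
    (1001, "2010-06-15", 4),
    (1002, "2010-04-01", 1),
    (1002, "2010-06-01", 3),
])

def status_counts(as_of):
    """Count items per status, using each item's latest change on or before as_of."""
    query = """
    SELECT StatusID, COUNT(*)
    FROM (
        SELECT ih.*,
               ROW_NUMBER() OVER (PARTITION BY ItemID
                                  ORDER BY StatusChangeDate DESC) AS seqnum
        FROM tbl_ItemHistory ih
        WHERE StatusChangeDate <= ?
    ) latest
    WHERE seqnum = 1
    GROUP BY StatusID
    ORDER BY StatusID
    """
    return conn.execute(query, (as_of,)).fetchall()

print(status_counts("2010-05-01"))  # [(1, 1), (2, 1)]
print(status_counts("2010-07-01"))  # [(3, 1), (4, 1)]
```

Items with no history on or before the chosen date simply do not appear, which is why only item 1001 is counted for a date such as 2010-02-01.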