SQL: efficiently get the last record

I have a table order which contains order date.
WarehouseId | OrderId | ItemId | OrderDate
-------------------------------------------
1 | 1 | 1 | 2016-08-01
1 | 2 | 2 | 2016-08-02
1 | 3 | 5 | 2016-08-10
2 | 1 | 1 | 2016-08-05
3 | 1 | 6 | 2016-08-06
(table is simplified and only shown required fields)
How can I efficiently select the last order for a particular warehouse? I currently do:
SELECT TOP 1 * FROM tblOrder WHERE WarehouseId = 1 ORDER BY OrderDate DESC
My concern is that when I have a million (or more) orders for a particular warehouse, sorting them and selecting the first record will be too slow (I think?).
Is there any more efficient way to select the last order record?
Thanks

If you're going to do it a lot, you could consider setting an index on the OrderDate field or, since the query filters on WarehouseId first, a composite index on WarehouseId and OrderDate. That will speed things up (but be aware it might have an impact on other queries against this table - it's a complicated topic, talk to a DBA!).
Otherwise, your query is fine, unless you're worried about the ordering when there are identical dates, in which case you should decide on a secondary field to order by as well, such as OrderID (which you suggested in the comments).
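A minimal sketch of the indexing and tie-breaking advice, using Python's built-in sqlite3 module as a stand-in for SQL Server (so LIMIT 1 replaces TOP 1). The index name ix_order_wh_date and the choice of a composite index on (WarehouseId, OrderDate, OrderId) are assumptions made for illustration, not something the answer prescribes:

```python
import sqlite3


def build_demo():
    """Create an in-memory copy of the question's sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblOrder ("
        "WarehouseId INTEGER, OrderId INTEGER, ItemId INTEGER, OrderDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO tblOrder VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, "2016-08-01"),
            (1, 2, 2, "2016-08-02"),
            (1, 3, 5, "2016-08-10"),
            (2, 1, 1, "2016-08-05"),
            (3, 1, 6, "2016-08-06"),
        ],
    )
    # Hypothetical composite index: filter column first, then the sort columns,
    # so the newest row per warehouse can be found without sorting every row.
    conn.execute(
        "CREATE INDEX ix_order_wh_date ON tblOrder (WarehouseId, OrderDate, OrderId)"
    )
    return conn


def last_order(conn, warehouse_id):
    """Return the most recent order for one warehouse, or None if it has none."""
    # OrderId is the secondary sort key, so ties on OrderDate resolve predictably.
    return conn.execute(
        "SELECT WarehouseId, OrderId, ItemId, OrderDate "
        "FROM tblOrder WHERE WarehouseId = ? "
        "ORDER BY OrderDate DESC, OrderId DESC LIMIT 1",
        (warehouse_id,),
    ).fetchone()


if __name__ == "__main__":
    print(last_order(build_demo(), 1))
```

Whether the real database actually uses such an index depends on its planner and data, so it is worth checking the execution plan before and after.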

Related

Linking 2 columns, same table to a different table

First time poster. A little background: I am not the most experienced SQL user and most of my knowledge is self-taught, but I am really struggling to get the results I am looking for here, so I am hoping someone can point me in the right direction.
In the simplest form
I have a table that has all of our Item_IDs. Each of those item numbers has a Universal_ID associated with it, stored in the same table. Most of the time these numbers match, but in the example below Item_ID 2 has a Universal_ID of 1.
Item_ID | Universal_ID
1 | 1
2 | 1
We then have an inventory table, which can be linked on the ItemID to show the QTY
Item_ID | Item_Qty | Item_Code
1 | 10 | 2/2/2021
1 | 20 | 2/3/2021
2 | 30 | 2/2/2021
If the Item_ID and Universal_ID are the same, it is quite easy to obtain the inventory.
However, I am struggling to get inventories for both when they do not match.
For example, if I wanted to find the QTY of Item_ID 1, I would get 2 results back:
Item_ID | Item_Qty | Item_Code
1 | 10 | 2/2/2021
1 | 20 | 2/3/2021
Problem: if I am specifically interested in Item_ID 2, how can I link it to the inventory table to see not only Item_ID 2's available qty but also Item_ID 1's, since Item_ID 2's Universal_ID does not match its Item_ID?
So I would like the results to be just like the 2nd block of code I posted.
Item_ID | Item_Qty | Item_Code
1 | 10 | 2/2/2021
1 | 20 | 2/3/2021
2 | 30 | 2/2/2021
What is the best way to set up views or my select query to make this happen? If I need to add any more info I can!
You can use a left join and filtering:
select i.*
from inventory i left join
universal u
on i.item_id = u.item_id
where 1 in (u.universal_id, i.item_id);
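A small sketch of that left join run against the question's sample data, using Python's sqlite3 as a stand-in database. The table names universal and inventory follow the answer; the function name and the idea of passing the Universal_ID as a parameter instead of hard-coding 1 are assumptions for illustration:

```python
import sqlite3


def build_demo():
    """Load the question's two sample tables into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE universal (item_id INTEGER, universal_id INTEGER)")
    conn.execute(
        "CREATE TABLE inventory (item_id INTEGER, item_qty INTEGER, item_code TEXT)"
    )
    conn.executemany("INSERT INTO universal VALUES (?, ?)", [(1, 1), (2, 1)])
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [(1, 10, "2/2/2021"), (1, 20, "2/3/2021"), (2, 30, "2/2/2021")],
    )
    return conn


def inventory_for_universal(conn, universal_id):
    """Inventory rows whose item maps to universal_id or whose item_id equals it."""
    return conn.execute(
        "SELECT i.item_id, i.item_qty, i.item_code "
        "FROM inventory i LEFT JOIN universal u ON i.item_id = u.item_id "
        "WHERE ? IN (u.universal_id, i.item_id) "
        "ORDER BY i.item_id, i.item_code",
        (universal_id,),
    ).fetchall()


if __name__ == "__main__":
    for row in inventory_for_universal(build_demo(), 1):
        print(row)
```

Item_ID 2 shares Universal_ID 1 with Item_ID 1, so asking for Universal_ID 1 returns the inventory rows of both items.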

Postgres - Return products not sold recently (more efficient query)

I am trying to find products in our database that have not been sold since 2019.
The VIEW (itemsales) that I have consists of multiple columns (80 to be exact).
I found a similar question MySQL. Products not sold in a period.
Sample of my view below
+----+--------+------------+------------+
| id | itemid | description | salesdate | + more columns
+----+--------+------------+------------+
| 1 | 10 | maltesers | 1/12/2018 |
| 2 | 11 | kitkat | 10/15/2018 |
| 3 | 12 | mars | 1/12/2018 |
| 4 | 13 | ferrero | 3/3/2018 |
| 5 | 12 | mars | 12/31/2019 |
| 6 | 10 | maltesers | 2/28/2019 |
| 7 | 16 | milk | 6/20/2020 |
| 8 | 17 | buttons | 12/23/2020 |
+----+--------+------------+------------+ + 100k rows
My query is below:
SELECT distinct description, itemid
FROM itemsales
WHERE itemid not in ( select itemid FROM itemsales WHERE
salesdate BETWEEN '2019-01-01' and current_date)
The problem is the view has probably 100,000+ rows and when I run the query above, it's taking ages to return a result.
Is there any other query to view the products not sold recently without altering tables? Thanks
One method is aggregation:
select itemid, description
from itemsales
group by itemid, description
having max(salesdate) < '2019-01-01'
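The aggregation can be tried on the question's sample rows with a short sketch like the one below. It uses Python's sqlite3 rather than Postgres, models itemsales as a plain table instead of a view, and stores the dates as ISO strings so that text comparison matches date order; all three are assumptions made to keep the example self-contained:

```python
import sqlite3


def build_demo():
    """Load the sample sales rows (dates rewritten as ISO yyyy-mm-dd strings)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE itemsales ("
        "id INTEGER, itemid INTEGER, description TEXT, salesdate TEXT)"
    )
    conn.executemany(
        "INSERT INTO itemsales VALUES (?, ?, ?, ?)",
        [
            (1, 10, "maltesers", "2018-01-12"),
            (2, 11, "kitkat", "2018-10-15"),
            (3, 12, "mars", "2018-01-12"),
            (4, 13, "ferrero", "2018-03-03"),
            (5, 12, "mars", "2019-12-31"),
            (6, 10, "maltesers", "2019-02-28"),
            (7, 16, "milk", "2020-06-20"),
            (8, 17, "buttons", "2020-12-23"),
        ],
    )
    return conn


def not_sold_since(conn, cutoff):
    """Items whose most recent sale is strictly before the cutoff date."""
    return conn.execute(
        "SELECT itemid, description FROM itemsales "
        "GROUP BY itemid, description "
        "HAVING MAX(salesdate) < ? "
        "ORDER BY itemid",
        (cutoff,),
    ).fetchall()


if __name__ == "__main__":
    print(not_sold_since(build_demo(), "2019-01-01"))
```

Each item is reduced to its latest sale date in a single pass, which avoids the NOT IN subquery from the question.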
One problem is that while the tables underlying the view can be indexed, the view itself cannot. Without seeing the view's query, its EXPLAIN output, and the schemas, I can't say how much this matters. But you can solve it with a materialized view.
While a view is syntax sugar for a complex query, a materialized view copies the results of the query to a table. That table can then be indexed for performance. The downside is the materialized view has to be manually refreshed. For statistical data this often is not a problem. Depending on how often a view is used, refreshing even every few seconds can bring big performance boosts.
You can try out a materialized view with salesdate indexed.
begin;
create materialized view m_itemsales as select ...;
create index m_itemsales_salesdate on m_itemsales(salesdate);
commit;
Then run Gordon's aggregation query against m_itemsales instead of itemsales and see if that helps performance. If it does, keep using m_itemsales and set up a scheduled job to run refresh materialized view m_itemsales.

Select rows from a filtered portion of Table A where a column matches a relationship with a column from the row in Table B that matches by ID

I want to get all rows in a table where one column matches a relationship with the value of the column in the row in a different table that has the same value of another column.
Concretely, I have two tables, orders and product_info that I'm accessing through Amazon Redshift
Orders
| ID | Date | Amount | Region |
=====================================
| 1 | 2019/4/1 | $120 | A |
| 1 | 2019/4/4 | $100 | A |
| 2 | 2019/4/2 | $50 | A |
| 3 | 2019/4/6 | $70 | B |
The partition keys of orders are region and date.
Product Information
| ID | Release Date | Region |
| ---- | ------------ | ------ |
| 1 | 2019/4/2 | A |
| 2 | 2019/4/3 | A |
| 3 | 2019/4/5 | B |
The primary key of product information is id, and the partition key is region.
I want to get all rows from Orders in region A where the date of the row is greater than the release date value in product information for that ID.
So in this case it should return just one row,
| 1 | 2019/4/4 | $100 | A |
I tried doing
select *
from orders
INNER JOIN product_info ON orders.date>product_info.release_date
AND orders.id=product_info.id
AND orders.region=A
AND product_info.region=A
limit 10
The problem is that this query was absurdly slow (cancelled it after 10 minutes). The tables are extremely large, and I have a feeling it was scanning the entire table without restricting it to region first (in reality I have other filters in addition to region that I want to apply to the list of IDs before I do the inner join, but I've limited it to only region for the sake of simplifying the question).
How can I efficiently write this type of query?
The best way to make an SQL query faster is to exclude rows as early as possible.
So, rather than putting filter conditions like orders.region = 'A' in the JOIN clause, move them to the WHERE clause. For an inner join the result is the same either way, and many planners treat the two forms identically, but this keeps the filters that can eliminate rows separate from the join condition.
Also, make the JOIN condition as simple as possible so that the database can optimize the comparison.
Try something like this:
SELECT *
FROM orders
INNER JOIN product_info ON orders.id = product_info.id
WHERE orders.region = 'A'
AND product_info.region = 'A'
AND orders.date > product_info.release_date
Any further optimization would require consideration of the DISTKEY and SORTKEY on the Redshift tables. (Preferably a DISTKEY of id and a SORTKEY of date).
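To check that the rewritten query returns the expected single row, here is a small sketch on the question's sample data. It runs on Python's sqlite3, not Redshift, so it says nothing about DISTKEY or SORTKEY behaviour; dates are stored as ISO strings and amounts as plain integers, both assumptions for the demo:

```python
import sqlite3


def build_demo():
    """Load the sample orders and product_info rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, date TEXT, amount INTEGER, region TEXT)"
    )
    conn.execute(
        "CREATE TABLE product_info (id INTEGER, release_date TEXT, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "2019-04-01", 120, "A"),
            (1, "2019-04-04", 100, "A"),
            (2, "2019-04-02", 50, "A"),
            (3, "2019-04-06", 70, "B"),
        ],
    )
    conn.executemany(
        "INSERT INTO product_info VALUES (?, ?, ?)",
        [(1, "2019-04-02", "A"), (2, "2019-04-03", "A"), (3, "2019-04-05", "B")],
    )
    return conn


def orders_after_release(conn, region):
    """Orders in a region placed after the product's release date."""
    return conn.execute(
        "SELECT orders.id, orders.date, orders.amount, orders.region "
        "FROM orders "
        "INNER JOIN product_info ON orders.id = product_info.id "
        "WHERE orders.region = ? "
        "AND product_info.region = ? "
        "AND orders.date > product_info.release_date "
        "ORDER BY orders.id, orders.date",
        (region, region),
    ).fetchall()


if __name__ == "__main__":
    print(orders_after_release(build_demo(), "A"))
```

Binding the region as a parameter also avoids the unquoted A from the original attempt, which a database would read as a column name.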

SQL Oracle - where clause on value on column next to condition

I have a query which returns more than a million rows based on the Entity-Attribute-Value model. Note that each entity may have a different number of attributes, so I can't just look for a row ID. Here is an example table:
+----------+-----------+------------+
| EntityID | Attr_Name | Attr_Value |
+----------+-----------+------------+
| 1 | Age | 2 |
+----------+-----------+------------+
| 1 | Class | Spatial |
+----------+-----------+------------+
| 2 | Age | 3 |
+----------+-----------+------------+
| 2 | Class | Industrial |
+----------+-----------+------------+
| 3 | Class | Industrial |
+----------+-----------+------------+
I need to filter all the EntityID according to their Class. In this example, let's say I need all the EntityID that are Industrial, I want my query to return rows 3-4-5 (so all rows associated with EntityID 2 and 3).
I thought about using a sub-select on the same query, grouping by EntityID and looking only for the EntityIDs that are Industrial in the where clause (WHERE EntityID = (subquery)), but that is not efficient at all. The query has a lot of joins and unions, so it takes a lot of time. I'm open to all suggestions for a more efficient way of doing it (which I'm sure there is)!
Thanks.
You can use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.entityid = t.entityid and
t2.attr_name = 'Class' and
t2.attr_value = 'Industrial'
);
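A runnable sketch of the EXISTS filter on the example table, using Python's sqlite3 in place of Oracle. The table name t follows the answer; the function name and the parameterised class value are assumptions for illustration:

```python
import sqlite3


def build_demo():
    """Load the question's Entity-Attribute-Value sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (entityid INTEGER, attr_name TEXT, attr_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            (1, "Age", "2"),
            (1, "Class", "Spatial"),
            (2, "Age", "3"),
            (2, "Class", "Industrial"),
            (3, "Class", "Industrial"),
        ],
    )
    return conn


def rows_for_class(conn, class_value):
    """All attribute rows of every entity whose Class equals class_value."""
    return conn.execute(
        "SELECT t.entityid, t.attr_name, t.attr_value FROM t "
        "WHERE EXISTS ("
        "  SELECT 1 FROM t t2 "
        "  WHERE t2.entityid = t.entityid "
        "    AND t2.attr_name = 'Class' "
        "    AND t2.attr_value = ?"
        ") "
        "ORDER BY t.entityid, t.attr_name",
        (class_value,),
    ).fetchall()


if __name__ == "__main__":
    for row in rows_for_class(build_demo(), "Industrial"):
        print(row)
```

The correlated subquery only has to find one matching Class row per entity, so every attribute row of entities 2 and 3 is kept and entity 1 is dropped.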

SQL: Bug in Joining two tables

I have an item table from which I want to get the sum of the item quantity.
Query:
Select item_id, Sum(qty) from item_tbl group by item_id
Result:
==================
| ID | Quantity |
===================
| 1 | 10 |
| 2 | 20 |
| 3 | 5 |
| 4 | 20 |
The second table is the invoice table, from which I get the item quantity that has been sold. I am joining these two tables as follows:
Query:
Select item_tbl.item_id, Sum(item_tbl.qty) as [item_qty],
-isnull(Sum(invoice.qty),0) as [invoice_qty]
from item_tbl
left join invoice on item_tbl.item_id = invoice.item_id
group by item_tbl.item_id
Result:
=================================
| ID | item_qty | invoice_qty |
=================================
| 1 | 10 | -5 |
| 2 | 20 | -20 |
| 3 | 10 | -25 | <------ item_qty raised from 5 to 10 ??
| 4 | 20 | -20 |
I don't know if I am joining these tables the right way. I want to get everything from the item table and whatever is available from the invoice table to maintain the inventory, so I used a left join. Help please.
Modification
when i added group by item_id, qty i got this:
=================================
| ID | item_qty | invoice_qty |
=================================
| 1 | 10 | -5 |
| 2 | 20 | -20 |
| 3 | 5 | -5 |
| 3 | 5 | -20 |
| 4 | 20 | -20 |
As it's a view, the ID is repeated. What should I do to avoid this?
To clear things up, here is my answer from the comments, explained:
In a left join (A left join B), one row is produced for every B record that matches an A record, and one row is also produced for every A record that has no matching B record, with null values filling in the fields from B. An item with two invoice rows therefore appears twice in the join result, and its quantity is summed twice.
I would advise reading up on Using Joins in SQL when approaching such problems.
Below are 2 possible solutions, using different assumptions.
Solution A
Without any assumptions regarding primary key:
We have to sum the item quantity column to determine the total quantity, so two sums need to be performed. I would advise using a subquery that aggregates the invoices first, for readability and simplicity.
select item_tbl.item_id, Sum(item_tbl.qty) as [item_qty], -isnull(Sum(invoice_grouped.qty),0) as [invoice_qty]
from item_tbl left join
(select invoice.item_id as item_id, Sum(invoice.qty) as qty from invoice group by item_id) invoice_grouped
on (invoice_grouped.item_id = item_tbl.item_id)
group by item_tbl.item_id
Solution B
Assuming item_id is primary key for item_tbl:
Now we can rely on the fact that there is only one quantity for each item_id, so we can do without the subquery by selecting any one (here, the max) of the item quantities in the join result, which typically gives a simpler execution plan.
select item_tbl.item_id, Max(item_tbl.qty) as [item_qty], -isnull(Sum(invoice.qty),0) as [invoice_qty]
from item_tbl left join invoice on (invoice.item_id = item_tbl.item_id)
group by item_tbl.item_id
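The row duplication and the subquery fix from Solution A can be reproduced with the sketch below. It uses Python's sqlite3 instead of SQL Server, so COALESCE stands in for isnull and the bracketed aliases become plain ones. The invoice rows are assumed values, chosen only because they reproduce the numbers in the question's result table:

```python
import sqlite3


def build_demo():
    """Item totals from the question plus assumed invoice rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item_tbl (item_id INTEGER, qty INTEGER)")
    conn.execute("CREATE TABLE invoice (item_id INTEGER, qty INTEGER)")
    conn.executemany(
        "INSERT INTO item_tbl VALUES (?, ?)", [(1, 10), (2, 20), (3, 5), (4, 20)]
    )
    # Assumed data: item 3 has two invoice rows, which triggers the duplication.
    conn.executemany(
        "INSERT INTO invoice VALUES (?, ?)",
        [(1, 5), (2, 20), (3, 5), (3, 20), (4, 20)],
    )
    return conn


def naive_totals(conn):
    """Join first, then sum: item 3's qty is counted once per invoice row."""
    return conn.execute(
        "SELECT item_tbl.item_id, SUM(item_tbl.qty) AS item_qty, "
        "       -COALESCE(SUM(invoice.qty), 0) AS invoice_qty "
        "FROM item_tbl "
        "LEFT JOIN invoice ON item_tbl.item_id = invoice.item_id "
        "GROUP BY item_tbl.item_id ORDER BY item_tbl.item_id"
    ).fetchall()


def pre_aggregated_totals(conn):
    """Sum invoices per item first, so the join adds at most one row per item."""
    return conn.execute(
        "SELECT item_tbl.item_id, SUM(item_tbl.qty) AS item_qty, "
        "       -COALESCE(SUM(invoice_grouped.qty), 0) AS invoice_qty "
        "FROM item_tbl "
        "LEFT JOIN (SELECT item_id, SUM(qty) AS qty "
        "           FROM invoice GROUP BY item_id) invoice_grouped "
        "  ON invoice_grouped.item_id = item_tbl.item_id "
        "GROUP BY item_tbl.item_id ORDER BY item_tbl.item_id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(naive_totals(conn))
    print(pre_aggregated_totals(conn))
```

With these rows the naive join reports an item_qty of 10 for item 3, while the pre-aggregated version keeps it at 5 and still reports the full invoiced quantity.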
If your database design is following the common rules, item_tbl.item_id must be unique.
So just change your query:
Select item_tbl.item_id, item_tbl.qty as [item_qty],
-isnull(Sum(invoice.qty),0) as [invoice_qty]
from item_tbl
left join invoice on item_tbl.item_id = invoice.item_id
group by item_tbl.item_id, item_tbl.qty