Grabbing the cost with the max date grouping on number openquery - sql

I am trying to select information from another server, and insert it into a table through open query... Here is where I am at so far:
INSERT INTO smallprojects..PhyInv_310QADLockedDet (MasterRecid, location, partnum, qty)
SELECT @@IDENTITY, ld_loc, ld_part, ld_qty_oh
FROM OPENQUERY(LANSRHQAD, 'SELECT ld_loc,ld_part,ld_qty_oh FROM PUB.ld_det as a left outer join PUB.pt_mstr as b on a.ld_part = b.pt_part where pt_status <> ''OB'' and ld_part not like ''S%'' and ld_part not like ''N%'' and ld_loc = ''310'' ')
But this will insert multiple part numbers if the PUB.ld_det has multiple entries for that part, sort of like the example below:
Here is the data (PUB.ld_det):
Part | Date | Qty
-------------------
1000 | 10-02 | 0
1000 | 10-03 | 2
1001 | 10-2 | 0
1001 | 10-2 | 2
I would like my result to be an insert into a table as:
Part | Qty
-------------------
1000 | 2
1001 | 2
Currently it is returning as:
Part | Qty
-----------
1000 | 0
1000 | 2
1001 | 0
1001 | 2
So when I go back to update this table I just have to hope it finds the right row.
How can I avoid bringing in the multiples and only bring it in with the highest date? The open query thing messes with me so much

Here is one simple method if you want one row per part:
INSERT INTO smallprojects..PhyInv_310QADLockedDet (MasterRecid, location, partnum, qty)
SELECT TOP (1) WITH TIES @@IDENTITY, ld_loc, ld_part, ld_qty_oh
FROM OPENQUERY(LANSRHQAD, 'SELECT ld_loc,ld_part,ld_qty_oh FROM PUB.ld_det as a left outer join PUB.pt_mstr as b on a.ld_part = b.pt_part where pt_status <> ''OB'' and ld_part not like ''S%'' and ld_part not like ''N%'' and ld_loc = ''310'' ')
ORDER BY ROW_NUMBER() OVER (PARTITION BY ld_part ORDER BY ld_qty_oh DESC);
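Use RANK() if you want duplicates when there are ties. Note that this ORDER BY keeps the row with the largest ld_qty_oh for each part; to keep the row with the latest date instead, the date column has to be returned by the OPENQUERY and used in the ORDER BY (descending).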
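A minimal, self-contained sketch of the same TOP (1) WITH TIES pattern, run against the sample data from the question rather than the linked server. The table variable @ld_det, its column names, and the year in the dates are assumptions made for illustration only; the quantity is used as a tie-breaker because part 1001 has two rows on the same date.

```sql
-- Hypothetical stand-in for PUB.ld_det, filled with the question's sample rows.
-- The year is invented; the question only gives month and day.
DECLARE @ld_det TABLE (part INT, entry_date DATE, qty INT);

INSERT INTO @ld_det (part, entry_date, qty) VALUES
    (1000, '20231002', 0),
    (1000, '20231003', 2),
    (1001, '20231002', 0),
    (1001, '20231002', 2);

-- Every part's newest row gets ROW_NUMBER() = 1, so TOP (1) WITH TIES
-- returns exactly one row per part.
SELECT TOP (1) WITH TIES part, qty
FROM @ld_det
ORDER BY ROW_NUMBER() OVER (PARTITION BY part
                            ORDER BY entry_date DESC, qty DESC);
```

With this data the query should return one row for part 1000 (qty 2) and one for part 1001 (qty 2).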

Related

Sum of two tables using SQL

I'm trying to get the sum of two columns, but it seems to be adding incorrectly. I have a table Tbl_Booths and another table called Tbl_Extras.
In the Tbl_Booths:
BoothId | ExhId | BoothPrice
1 | 1 | 400
2 | 1 | 500
3 | 2 | 400
4 | 3 | 600
So totalBoothPrice for ExhId = 1 is 900
Tbl_Extras:
ExtraId | ExhId | Item | ItemCost
1 | 1 | PowerSupply | 400
2 | 2 | PowerSupply | 400
3 | 1 | Lights | 600
4 | 3 | PowerSupply | 400
5 | 4 | Lights | 400
So totalItemCost for ExhId = 1 is 1000
I need to find a way to get the sum of totalBoothPrice + totalItemCost
The value should of course be 900 + 1000 = 1900
I'm a total beginner to SQL so please have patience :-)
Thank you in advance for any input you can give me, since I'm going mad here!
It is used in a Caspio database system.
You can use union all to combine the two tables and then aggregate:
select exhid, sum(price)
from ((select exhid, boothprice as price
from tbl_booths
) union all
(select exhid, itemcost as price
from tbl_extras
)
) e
group by exhid;
This returns the sum for all exhid values. If you want to filter them, then you can use a where clause in either the outer query or both subqueries.
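As a sketch of the filtering option mentioned above, the outer query can be restricted to a single exhibitor. The alias total is an addition for readability; table and column names are taken from the question.

```sql
-- Combined booth and extras total for one exhibitor (ExhId = 1).
select e.exhid, sum(e.price) as total
from (select exhid, boothprice as price
      from tbl_booths
      union all
      select exhid, itemcost as price
      from tbl_extras
     ) e
where e.exhid = 1
group by e.exhid;
```

With the sample data this should give a single row with a total of 1900 (900 for booths plus 1000 for extras).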
Booth totals:
select exhid, sum(boothprice) as total_booth_price
from tbl_booths
group by exhid;
Extra totals:
select exhid, sum(itemcost) as total_item_cost
from tbl_extras
group by exhid;
Joined:
select
exhid,
b.total_booth_price,
e.total_item_cost,
b.total_booth_price + e.total_item_cost as total
from
(
select exhid, sum(boothprice) as total_booth_price
from tbl_booths
group by exhid
) b
join
(
select exhid, sum(itemcost) as total_item_cost
from tbl_extras
group by exhid
) e using (exhid)
order by exhid;
This only shows exhids that have both booth and extras, though. If one can be missing use a left outer join. If one or the other can be missing, you'd want a full outer join, which MySQL doesn't support.
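A hedged sketch of the left outer join variant, assuming every exhibitor of interest has at least one booth but may have no extras. It uses ON instead of USING so that it is not tied to MySQL; coalesce turns a missing extras total into 0.

```sql
select
    b.exhid,
    b.total_booth_price,
    coalesce(e.total_item_cost, 0) as total_item_cost,
    b.total_booth_price + coalesce(e.total_item_cost, 0) as total
from
    (
        select exhid, sum(boothprice) as total_booth_price
        from tbl_booths
        group by exhid
    ) b
left join
    (
        select exhid, sum(itemcost) as total_item_cost
        from tbl_extras
        group by exhid
    ) e on e.exhid = b.exhid
order by b.exhid;
```

With the sample data, exhid 4 (extras but no booth) still drops out of this result; that is the case a full outer join, or the union all approach, covers.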

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I am performing some queries using PostgreSQL's SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that retrieves only the latest values in the table, alongside the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially in the long, thin way I write SQL), but it makes it clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.
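For completeness, a window-only sketch that avoids both the subquery and the joins. count(*) over () counts 5 because window functions are evaluated before DISTINCT ON discards rows. The number of distinct ids can instead be derived from two dense_rank() calls: a row's rank counting up plus its rank counting down, minus one, equals the number of distinct values. This is an alternative technique, not the approach of the answer above.

```sql
select distinct on (id)
    dense_rank() over (order by id)
      + dense_rank() over (order by id desc) - 1 as total,
    id,
    my_field,
    id_reference
from my_table
order by id, id_reference desc;
```

For the sample rows, id 1 gets 1 + 3 - 1, id 2 gets 2 + 2 - 1 and id 3 gets 3 + 1 - 1, so total is 3 on every returned row. One difference from count(distinct id): a NULL id would be counted as a value of its own here.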

Access query to grab +5 or more duplicates

I have a small problem with an Access query (don't ask me why, but I have to use Access rather than a real DBMS).
I have a huge table with about 920k records.
I have to go through all of the data and pick out the refs that occur more than 5 times on the same date.
table = myTable
-------------------------------------------------
| id  | ref       | date       | C_ERR_ANO |
-------------------------------------------------
| 1   | A12345678 | 2012/02/24 | A 4565    |
| 2   | D52245708 | 2011/05/02 | E 5246    |
| ... | ......... | ..../../.. | . ....    |
-------------------------------------------------
So, to sum up, I have 900,000+ records.
There are duplicates on the SAME DATE (oh, by the way, there is another column I forgot to add, named C_ERR_ANO).
So I have to go through all of those rows and pick out each ref based on date AND error number,
and if a ref occurs MORE than 5 times with the same error number, I have to display it in the result.
I ended up using this query:
SELECT DISTINCT Centre.REFERENCE, Centre.DATESE, Centre.C_ERR_ANO
FROM Centre INNER JOIN (SELECT
Centre.[REFERENCE],
COUNT(*) AS `toto`,
Centre.DATESE
FROM Centre
GROUP BY REFERENCE
HAVING COUNT(*) > 5) AS Centre_1
ON Centre.REFERENCE = Centre_1.REFERENCE
AND Centre.DATESE <> Centre_1.DATESE;
But this query isn't right.
Then I tried:
SELECT DATESE, REFERENCE, C_ERR_ANO, COUNT(REFERENCE) AS TOTAL
FROM (
SELECT *
FROM Centre
WHERE (((Centre.[REFERENCE]) NOT IN (SELECT [REFERENCE]
FROM [Centre] AS Tmp
GROUP BY [REFERENCE],[DATESE],[C_ERR_ANO]
HAVING Count(*)>1 AND [DATESE] = [Centre].[DATESE]
AND [C_ERR_ANO] = [Centre].[C_ERR_ANO]
AND [LIBELLE] = [Centre].[LIBELLE])))
ORDER BY Centre.[REFERENCE], Centre.[DATESE], Centre.[C_ERR_ANO])
GROUP BY REFERENCE, DATESE, C_ERR_ANO
Still not working.
I'm struggling.
Your group by clause needs to include all of the non-aggregated items in your select. Why not use:
select Centre.DATESE, Centre.C_ERR_ANO, Count (*)
from Centre
Group by Centre.DATESE, Centre.C_ERR_ANO
HAVING COUNT (*) > 5
If you need other fields then you can add them, as long as you ensure the same fields appear in the select as the group by.
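As a sketch of adding another field, the version below includes REFERENCE in both the select list and the group by, so it counts rows per reference, date and error number. Column names are taken from the queries in the question; the alias TOTAL is reused from the asker's second attempt.

```sql
SELECT REFERENCE, DATESE, C_ERR_ANO, COUNT(*) AS TOTAL
FROM Centre
GROUP BY REFERENCE, DATESE, C_ERR_ANO
HAVING COUNT(*) > 5;
```

Each returned row is a reference that occurs more than 5 times on the same date with the same error number.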

SQL Query Count and Label if first

I'm using SQL Server management Studio 2010.
I need my query to populate a column if the record contains the first occurrence of a value.
The table my query returns is huge so I'll just use pretend columns to get my point across. My query currently returns a table that looks like this
| ROW | ItemNumber | DateOpen | Status |
| 1 | 10045 | 5/5/2005 | Open |
| 2 | 10045 | 5/5/2005 | Open |
| 3 | 10046 | 5/5/2005 | Open |
| 4 | 10046 | 5/5/2005 | Open |
| 5 | 10046 | 5/5/2005 | Open |
I've already added the row indicator in to the query thinking it would help identify the first occurrence of an ItemNumber. I need to have a new column that marks an X if the record is the first occurrence.
I have this so far
DECLARE @ItemData TABLE (itemRow BIGINT, itemNumber BIGINT, DateOpen VARCHAR(15), status VARCHAR(15))
INSERT INTO @ItemData (itemRow, ItemNumber, DateOpen, Status)
SELECT Row_Number() OVER(ORDER BY Cm_ItemNumber) AS 'ROW'
,Cm_ItemNumber AS ItemNumber
,Dates_DateOpen AS DateOpen
,St_Status AS Status
FROM db_Items
JOIN db_Dates ON Dates_Item = Cm_ItemID
JOIN db_Status ON St_ID = Cm_StatusID
Select * from @ItemData
The reason it's thrown into a table and then selected separately is that there's actually a union and a lot more stuff in the query. When I realized I needed a uniqueness check added as a column, I figured the easiest way would be an "after the fact" step that goes into the Select * from @ItemData portion.
I haven't tested this, but it's along the lines I'd be playing with...
using your existing query as the first of 2 CTEs :
With AllData as
(
SELECT Row_Number() OVER(ORDER BY Cm_ItemNumber) AS 'ROW'
,Cm_ItemNumber AS ItemNumber
,Dates_DateOpen AS DateOpen
,St_Status AS Status
FROM db_Items
JOIN db_Dates ON Dates_Item = Cm_ItemID
JOIN db_Status ON St_ID = Cm_StatusID
),
FirstRows as
(
SELECT Min(ROW) as Row, ItemNumber
FROM AllData
GROUP BY ItemNumber
)
SELECT
ad.*,
Case When fr.Row IS NULL then '' else 'X' end as X_Col
FROM AllData ad LEFT JOIN FirstRows fr
ON ad.ROW=fr.Row
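An alternative sketch that skips the second CTE and the join: number the rows within each ItemNumber and mark the one numbered 1. This is a different technique from the MIN/LEFT JOIN approach above, shown against a table variable filled with the question's sample rows (the multi-row VALUES list assumes SQL Server 2008 or later).

```sql
-- Sample rows from the question, already carrying the overall row number.
DECLARE @ItemData TABLE (itemRow BIGINT, itemNumber BIGINT,
                         DateOpen VARCHAR(15), status VARCHAR(15));

INSERT INTO @ItemData (itemRow, itemNumber, DateOpen, status) VALUES
    (1, 10045, '5/5/2005', 'Open'),
    (2, 10045, '5/5/2005', 'Open'),
    (3, 10046, '5/5/2005', 'Open'),
    (4, 10046, '5/5/2005', 'Open'),
    (5, 10046, '5/5/2005', 'Open');

-- The first row of each itemNumber (lowest itemRow) is marked with an X.
SELECT itemRow, itemNumber, DateOpen, status,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY itemNumber
                                    ORDER BY itemRow) = 1
            THEN 'X' ELSE '' END AS X_Col
FROM @ItemData
ORDER BY itemRow;
```

With this data, rows 1 and 3 should carry the X and the other three rows an empty string.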

Remove redundant SQL price cost records

I have a table costhistory with fields id,invid,vendorid,cost,timestamp,chdeleted. It looks like it was populated with a trigger every time a vendor updated their list of prices.
It has redundant records - since it was populated regardless of whether price changed or not since last record.
Example:
id | invid | vendorid | cost | timestamp | chdeleted
1 | 123 | 1 | 100 | 1/1/01 | 0
2 | 123 | 1 | 100 | 1/2/01 | 0
3 | 123 | 1 | 100 | 1/3/01 | 0
4 | 123 | 1 | 500 | 1/4/01 | 0
5 | 123 | 1 | 500 | 1/5/01 | 0
6 | 123 | 1 | 100 | 1/6/01 | 0
I would want to remove records with ID 2,3,5 since they do not reflect any change since the last price update.
I'm sure it can be done, though it might take several steps.
Just to be clear, this table has swelled to 100gb and contains 600M rows. I am confident that a proper cleanup will take this table's size down by 90% - 95%.
Thanks!
The approach you take will vary depending on the database you are using. For SQL Server 2005+, the following query should give you the records you want to remove:
select id
from (
select id, Rank() over (Partition BY invid, vendorid, cost order by timestamp) as Rank
from costhistory
) tmp
where Rank > 1
You can then delete them like this:
delete from costhistory
where id in (
select id
from (
select id, Rank() over (Partition BY invid, vendorid, cost order by timestamp) as Rank
from costhistory
) tmp
where Rank > 1
)
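One caveat with partitioning by cost: in the sample data, id 6 returns to a cost of 100 after the 500 rows, so a rank over (invid, vendorid, cost) would flag it along with ids 2, 3 and 5. A sketch that compares each row only with the row immediately before it is shown below. It assumes SQL Server 2012 or later, since LAG is not available in 2005 or 2008. The table variable and the interpretation of the dates as 1 to 6 January 2001 are assumptions for illustration.

```sql
-- Sample rows from the question in a hypothetical table variable.
DECLARE @costhistory TABLE (id INT, invid INT, vendorid INT, cost INT,
                            [timestamp] DATE, chdeleted BIT);

INSERT INTO @costhistory (id, invid, vendorid, cost, [timestamp], chdeleted) VALUES
    (1, 123, 1, 100, '20010101', 0),
    (2, 123, 1, 100, '20010102', 0),
    (3, 123, 1, 100, '20010103', 0),
    (4, 123, 1, 500, '20010104', 0),
    (5, 123, 1, 500, '20010105', 0),
    (6, 123, 1, 100, '20010106', 0);

-- A row is redundant when its cost equals the cost of the previous row
-- for the same invid and vendorid. The first row has no previous row
-- (prev_cost is NULL), so it is never flagged.
SELECT id
FROM (
    SELECT id, cost,
           LAG(cost) OVER (PARTITION BY invid, vendorid
                           ORDER BY [timestamp], id) AS prev_cost
    FROM @costhistory
) t
WHERE cost = prev_cost;
```

For the sample rows this should return ids 2, 3 and 5, which matches the rows the question wants removed.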
I would suggest that you recreate the table using a group by query. Also, I assume that the "id" column is not used in any other tables. If it is, then you need to fix those tables as well.
Deleting such a large quantity of records is likely to take a long, long time.
The query would look like:
insert into newversionoftable(invid, vendorid, cost, timestamp, chdeleted)
select invid, vendorid, cost, timestamp, chdeleted
from costhistory
group by invid, vendorid, cost, timestamp, chdeleted
If you do opt for a delete, I would suggest:
(1) Fix the code first, so no duplicates are going in.
(2) Determine the duplicate ids and place them in a separate table.
(3) Delete in batches.
To find the duplicate ids, use something like:
select *
from (select id,
row_number() over (partition by invid, vendorid, cost, timestamp, chdeleted order by timestamp) as seqnum
from costhistory
) t
where seqnum > 1
If you want to keep the most recent version instead, then use "timestamp desc" in the order by clause.
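Step (3), deleting in batches, could be sketched as follows for SQL Server. It assumes the ids to remove were saved in step (2) to a hypothetical table dup_ids(id); the batch size of 50000 is an arbitrary starting point, not a recommendation from the answer.

```sql
-- dup_ids is a hypothetical table holding the ids found in step (2).
DECLARE @batch INT = 50000;  -- arbitrary batch size; tune for your log and locking

WHILE 1 = 1
BEGIN
    -- Remove at most @batch of the flagged rows per iteration.
    DELETE TOP (@batch) ch
    FROM costhistory AS ch
    JOIN dup_ids AS d ON d.id = ch.id;

    -- Stop once nothing is left to delete.
    IF @@ROWCOUNT = 0 BREAK;
END;
```

Keeping each delete small limits how long locks are held and how much the transaction log grows per statement, which matters on a table of this size.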