I'm trying to build a function or view that calculates and rolls up various counts while still being searchable on a many-to-many relationship.
Here is an example data set:
Invoice Table:
InvoiceID LocationID StatusID
1 5 1
2 5 1
3 5 1
4 5 2
5 7 2
5 7 1
5 7 2
Group Table:
GroupID GroupName
1 Group 1
2 Group 2
GroupToLocation Table:
GroupToLocationID GroupID LocationID
1 1 5
2 2 5
3 2 7
I have gotten to the point where I could sum up the various statuses per location and get this:
LocationID Status1 Status2
5 3 1
7 1 2
Location 5 has 3 invoices with a status of 1 and 1 invoice with a status of 2, while Location 7 has 1 invoice with a status of 1 and 2 with a status of 2.
There are two groups, and Location 5 is in both, while Location 7 is only in the second. I need to be able to set it up where I can append a where statement like this:
select * from vw_GroupCounts
where GroupName = 'Group 2'
or
select Invoice, SUM(*) from vw_GroupCounts
where GroupName = 'Group 2'
And have that result in only getting Location 7. Whenever I do this, since I have to use left joins or something along those lines, the counts are duplicated for each group that the Location is affiliated with. I know I could do something along the lines of a subquery and pass the GroupName into that, but the system I am working with uses a dynamic query builder that appends WHERE statements based on user input.
I don't mind using view, or functions, or any number of functions inside of functions, but I hope there is a way to do what I'm looking for.
Since locations 5 and 7 are both in Group 2, searching for Group 2 in the WHERE clause after joining all the tables returns every record in this case. That isn't duplication; it's just how the data is. A different join wouldn't change that; only changing the data would. Let me know if I'm misunderstanding something, though.
Here is how you would join them to do that search.
Here it is with your first example of the location and status count.
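The queries that went with this answer are not shown here, so the following is only a sketch of what such a join could look like, not the answerer's original code. It loads the sample data into an in-memory SQLite database through Python's sqlite3 module. The table name Groups (GROUP is a reserved word) and the body of vw_GroupCounts are assumptions; the InvoiceID values are copied from the question as posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoice (InvoiceID INTEGER, LocationID INTEGER, StatusID INTEGER);
-- "Groups" is an assumed name; GROUP is a reserved word in SQL.
CREATE TABLE Groups (GroupID INTEGER, GroupName TEXT);
CREATE TABLE GroupToLocation (GroupToLocationID INTEGER, GroupID INTEGER, LocationID INTEGER);

INSERT INTO Invoice VALUES (1,5,1),(2,5,1),(3,5,1),(4,5,2),(5,7,2),(5,7,1),(5,7,2);
INSERT INTO Groups VALUES (1,'Group 1'),(2,'Group 2');
INSERT INTO GroupToLocation VALUES (1,1,5),(2,2,5),(3,2,7);

-- One row per (group, location) with the status counts pivoted into columns.
CREATE VIEW vw_GroupCounts AS
SELECT g.GroupName,
       gl.LocationID,
       SUM(CASE WHEN i.StatusID = 1 THEN 1 ELSE 0 END) AS Status1,
       SUM(CASE WHEN i.StatusID = 2 THEN 1 ELSE 0 END) AS Status2
FROM Groups g
JOIN GroupToLocation gl ON gl.GroupID = g.GroupID
JOIN Invoice i ON i.LocationID = gl.LocationID
GROUP BY g.GroupName, gl.LocationID;
""")

def counts_for(group_name):
    """Append a WHERE on GroupName, the way a dynamic query builder would."""
    return conn.execute(
        "SELECT LocationID, Status1, Status2 FROM vw_GroupCounts "
        "WHERE GroupName = ? ORDER BY LocationID",
        (group_name,),
    ).fetchall()

group1 = counts_for("Group 1")
group2 = counts_for("Group 2")
print(group1)
print(group2)
```

With this data, Group 2 returns both locations 5 and 7, because both are mapped to it. Location 5 shows up once per group in the view, so rolling up across groups without a GroupName filter would count its invoices twice.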
I am trying to build a table of data for use in Yellowfin BI reporting. One limitation is that no temporary tables can be created and then dropped in the database. I am pulling the data from an existing database, which I have no control over. I can only use SQL to query the existing data.
There are two tables in the source database I need to work with. I've simplified them for clarity. The first contains organisations. It has an ORG_ID column, which contains a unique ID for each organisation, and a PARENT_ORG_ID column indicating which organisation is the parent company of others in the list:
ORG_ID PARENT_ORG_ID
1 Null
2 1
3 5
4 5
5 Null
6 1
Using the table above, I can see the following relationships between organisations:
ORG_ID RELATED_ORGANISATIONS
1 2 and 6
2 1 and 6
3 5 and 4
4 5 and 3
5 4 and 3
6 1 and 2
I'm not sure of the best way to represent these connections in a query, as I need to use these relationships with a second table.
The second table I have is a list of organisations and money owed:
ORG_ID MONEY_OWED
1 5
2 10
3 0
4 15
5 20
6 5
What I need to achieve is a table that I can search for any single ORG_ID and see the combined data for that organisation and all related organisations. In the case of my example, this could be a results table something like this:
ORG_ID MONEY_OWED_BY_ALL_RELATED_ORGS
1 20
2 20
3 35
4 35
5 35
6 20
I'm thinking I should use a CTE to handle the relationships between organisations, but I can't get my head around it.
Any help would be greatly appreciated!
For your particular example, you can use:
select o.*,
sum(mo.money_owed) over (partition by coalesce(o.parent_org_id, o.org_id)) as parent_owed
from organizations o left join
money_owed mo
on mo.org_id = o.org_id;
This works because your organizations are only one level deep -- which is consistent with your sample data.
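As a quick check of the query above, here is a small sketch that runs it against the sample data in an in-memory SQLite database via Python's sqlite3 module. The table and column names follow the query; the use of SQLite is an assumption made only for the demo, and it needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (org_id INTEGER PRIMARY KEY, parent_org_id INTEGER);
CREATE TABLE money_owed (org_id INTEGER, money_owed INTEGER);

INSERT INTO organizations VALUES (1,NULL),(2,1),(3,5),(4,5),(5,NULL),(6,1);
INSERT INTO money_owed VALUES (1,5),(2,10),(3,0),(4,15),(5,20),(6,5);
""")

# COALESCE maps every organisation to its family key: the parent's id,
# or its own id when it has no parent. The window sum then totals each family.
rows = conn.execute("""
SELECT o.org_id,
       SUM(mo.money_owed) OVER (
           PARTITION BY COALESCE(o.parent_org_id, o.org_id)
       ) AS parent_owed
FROM organizations o
LEFT JOIN money_owed mo ON mo.org_id = o.org_id
ORDER BY o.org_id
""").fetchall()

for org_id, parent_owed in rows:
    print(org_id, parent_owed)
```

The totals match the results table in the question: 20 for organisations 1, 2 and 6, and 35 for organisations 3, 4 and 5. With more than one level of parents, the COALESCE key would no longer identify the whole family, and a recursive CTE would be the usual next step.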
I have the following table structure.
ITEM
-----------
ID | TITLE
1    A
2    B
3    C
4    D
5    E
6    F

TOTAL
-----------------
ID | ITEMID | VALUE
1    2        6
2    1        4
3    3        3
4    3        8
5    1        2
6    5        4
7    4        5
8    2        8
9    2        7
10   1        3
11   2        2
12   3        6
I am using Apache Derby DB. I need to perform the average calculation in SQL. I need to show the list of item IDs and their average total of the last 3 records.
That is, for ITEM.ID 1, I will go to the TOTAL table, select the last 3 records associated with ITEMID 1, and take their average. In Derby, I am able to do this for a given item ID, but I cannot do it without giving a specific ID. Let me show you what I've done.
SELECT ITEM.ID, AVG(VALUE) FROM ITEM, TOTAL WHERE TOTAL.ITEMID = ITEM.ID GROUP BY ITEM.ID
This SQL gives the average for every item in the list, but it is calculated over all of the item's rows in the TOTAL table. I need the last 3 records only. So I changed the SQL to this:
SELECT AVG(VALUE) FROM (SELECT ROW_NUMBER() OVER() AS ROWNUM, TOTAL.* FROM TOTAL WHERE ITEMID = 1) AS TR WHERE ROWNUM > (SELECT COUNT(ID) FROM TOTAL WHERE ITEMID = 1) - 3
This works if I supply the item ID 1 or 2 etc. But I cannot do this for all items without giving an item ID.
I tried to do the same thing in Oracle using PARTITION BY and it worked. But Derby does not support partitioning. There is WINDOW, but I could not make use of it.
Oracle version:
SELECT ITEMID, AVG(VALUE) FROM(SELECT ITEMID, VALUE, COUNT(*) OVER (PARTITION BY ITEMID) QTY, ROW_NUMBER() OVER (PARTITION BY ITEMID ORDER BY ID) IDX FROM TOTAL ORDER BY ITEMID, ID) WHERE IDX > QTY -3 GROUP BY ITEMID ORDER BY ITEMID
I need to use derby DB for its portability.
The desired output is this
RESULT
-----------------
ITEMID | AVERAGE
1 (9/3)
2 (17/3)
3 (17/3)
4 (5/1)
5 (4/1)
6 NULL
As you have noticed, Derby's support for the SQL 2003 "OLAP Operations" is incomplete.
There was some initial work (see https://wiki.apache.org/db-derby/OLAPOperations), but that work was only partially completed.
I don't believe anyone is currently working on adding more functionality to Derby in this area.
So yes, Derby has a row_number function, but no, Derby does not (currently) have partition by.
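One possible way around the missing PARTITION BY is a correlated subquery that keeps a row only if fewer than three rows for the same ITEMID have a higher ID. The sketch below is an assumption on my part, not something from the Derby docs: it uses only plain SQL constructs, but it is run here against SQLite through Python's sqlite3 module and has not been verified against Derby itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ITEM (ID INTEGER PRIMARY KEY, TITLE TEXT);
CREATE TABLE TOTAL (ID INTEGER PRIMARY KEY, ITEMID INTEGER, VALUE INTEGER);

INSERT INTO ITEM VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'F');
INSERT INTO TOTAL VALUES
  (1,2,6),(2,1,4),(3,3,3),(4,3,8),(5,1,2),(6,5,4),
  (7,4,5),(8,2,8),(9,2,7),(10,1,3),(11,2,2),(12,3,6);
""")

# A TOTAL row is among the "last 3" for its item when fewer than 3 rows
# with the same ITEMID have a larger ID. The LEFT JOIN keeps items that
# have no TOTAL rows at all, so their average comes back as NULL.
averages = conn.execute("""
SELECT i.ID, AVG(r.VALUE) AS AVERAGE
FROM ITEM i
LEFT JOIN (
    SELECT t.ITEMID, t.VALUE
    FROM TOTAL t
    WHERE (SELECT COUNT(*)
           FROM TOTAL t2
           WHERE t2.ITEMID = t.ITEMID
             AND t2.ID > t.ID) < 3
) r ON r.ITEMID = i.ID
GROUP BY i.ID
ORDER BY i.ID
""").fetchall()

for item_id, average in averages:
    print(item_id, average)
```

The correlated count makes this quadratic per item, which is fine for small tables but worth checking on large ones. How AVG treats integer columns can also differ between databases, so a cast to a decimal type may be needed to get fractional averages.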
This is my initial table structure.
MEMBER_ID ITEM_ID ACCOUNT
1 3 A
1 4 A
2 1 B
3 4 B
4 4 B
5 4 A
6 2 A
When I want the distinct number of members I do
Select COUNT(DISTINCT MEMBER_ID) FROM TABLE A
I get 6, the expected answer
When I do
SELECT COUNT(DISTINCT MEMBER_ID),ACCOUNT FROM TABLE A GROUP BY 2
I get something like A=4 and B=3. What do you think is the disconnect here?
Thanks
I find the results highly unlikely. You would, however, get 4 and 3 if the data were slightly different:
MEMBER_ID ITEM_ID ACCOUNT
1 3 A
1 4 B
2 1 B
3 4 B
4 4 A
5 4 A
6 2 A
With the group by, MEMBER_ID = 1 would be counted twice -- once for A and once for B. My guess is that something like this is happening for your real problem. COUNT(DISTINCT) is not additive. So, when you break it apart using a group by, the sum of the per-group values is not (necessarily) the value for all the data. This differs from COUNT(*) and SUM(), whose per-group values add up to the overall value, and from MIN() and MAX(), whose overall value is just the MIN() or MAX() of the per-group values. AVG() is also not additive (although it is easily recalculated).
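To see the non-additivity concretely, here is a small sketch that loads the modified data above into an in-memory SQLite database using Python's sqlite3 module. The table name members is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "members" is a hypothetical table name for this demo.
CREATE TABLE members (MEMBER_ID INTEGER, ITEM_ID INTEGER, ACCOUNT TEXT);
INSERT INTO members VALUES
  (1,3,'A'),(1,4,'B'),(2,1,'B'),(3,4,'B'),(4,4,'A'),(5,4,'A'),(6,2,'A');
""")

# Distinct members across the whole table.
overall = conn.execute(
    "SELECT COUNT(DISTINCT MEMBER_ID) FROM members"
).fetchone()[0]

# Distinct members per account; member 1 appears under both A and B.
per_account = dict(conn.execute(
    "SELECT ACCOUNT, COUNT(DISTINCT MEMBER_ID) FROM members GROUP BY ACCOUNT"
).fetchall())

print(overall)                    # distinct members overall
print(per_account)                # distinct members per account
print(sum(per_account.values()))  # larger than overall when members overlap
```

The per-account counts are 4 and 3, which sum to 7, while the overall distinct count is 6. The difference is exactly the one member who has rows under both accounts.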
I've been given some data in a spreadsheet which will soon be going into an automated import, so I cannot do any manual entry on the spreadsheet. The data has the following columns: Trayid, Trayname, ItemDescription and RowNumber. I didn't build these tables myself, or I would have built them differently, but I have to stick to the format which is already set.
The data that is being imported will look as follows.
Trayid | Trayname | ItemDescription | RowNumber
1        Tray 1     Product 1         1
                    Product 2         2
                    Product 3         3
                    Product 4         4
2        Tray 2     Product 1         1
                    Product 2         2
                    Product 3         3
                    Product 4         4
                    Product 5         5
What I need to do is fill in the Trayid and Trayname for each of the rows following row 1, so that it looks like this:
Trayid | Trayname | ItemDescription | RowNumber
1 Tray 1 Product 1 1
1 Tray 1 Product 2 2
1 Tray 1 Product 3 3
1 Tray 1 Product 4 4
2 Tray 2 Product 1 1
2 Tray 2 Product 2 2
2 Tray 2 Product 3 3
2 Tray 2 Product 4 4
2 Tray 2 Product 5 5
I'm guessing I need to use a cursor or something, but I'm not sure. I think it can be done by going down the row numbers, stopping when it sees RowNumber 1 again, and then carrying on with the next Trayid and Trayname.
Sorry if what I need doesn't make sense; it was awkward to explain.
SQL tables have no inherent ordering. So you cannot depend on that. But, there is something that you can do:
Define an identity column in the source table.
Create a view on the source table that excludes the identity.
Bulk insert into the view.
This will assign a sequential number to rows in the same order as the original data. Let's call this id. Then you can do your update by doing:
with toupdate as (
      select t.*,
             max(TrayId) over (partition by grp) as new_TrayId,
             max(TrayName) over (partition by grp) as new_TrayName
      from (select t.*,
                   count(TrayId) over (order by id) as grp
            from t
           ) t
     )
update toupdate
    set TrayId = new_TrayId,
        TrayName = new_TrayName
    where TrayId is null;
The idea is to define groups of rows corresponding to each tray. The simple way is to count the number of non-NULL TrayId values up to and including any given row -- everything in a group will then have the same grp value. Window functions then spread the actual value through all rows in the group (using max()), and these values are used for the update.
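The grouping trick can be tried out on the sample data with the sketch below, which uses an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later for window functions). SQLite cannot update through a CTE the way the statement above does, so this demo only selects the filled-in values; the table name t and the autoincrement id standing in for the identity column are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" stands in for the identity column that preserves the file order.
conn.execute("""
CREATE TABLE t (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    TrayId INTEGER, TrayName TEXT, ItemDescription TEXT, RowNumber INTEGER
)""")
source_rows = [
    (1, "Tray 1", "Product 1", 1), (None, None, "Product 2", 2),
    (None, None, "Product 3", 3), (None, None, "Product 4", 4),
    (2, "Tray 2", "Product 1", 1), (None, None, "Product 2", 2),
    (None, None, "Product 3", 3), (None, None, "Product 4", 4),
    (None, None, "Product 5", 5),
]
conn.executemany(
    "INSERT INTO t (TrayId, TrayName, ItemDescription, RowNumber) "
    "VALUES (?, ?, ?, ?)",
    source_rows,
)

# COUNT(TrayId) ignores NULLs, so the running count only increases on the
# first row of each tray; that running count is the group number.
filled = conn.execute("""
SELECT MAX(TrayId)   OVER (PARTITION BY grp) AS new_TrayId,
       MAX(TrayName) OVER (PARTITION BY grp) AS new_TrayName,
       ItemDescription,
       RowNumber
FROM (SELECT t.*, COUNT(TrayId) OVER (ORDER BY id) AS grp FROM t) g
ORDER BY id
""").fetchall()

for row in filled:
    print(row)
```

Every row comes back with the TrayId and TrayName of the most recent row that had them, which is the layout asked for in the question.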
I have three tables: the first has a list of category IDs, the second has dataset information, and the third has import information.
What I have
select dataset.pc_id, count(*)
from import
join dataset on cast(dataset.internal_id as varchar(20)) = import.product_id
group by dataset.pc_id
order by dataset.pc_id asc
This will output:
3 4
4 5
6 200
7 192
8 1000
Where product_category comes into play is this: I want the output to look like:
1 0
2 0
3 4
4 5
6 200
...
16 0
The 16 are the number of different product categories from the product_category table that I currently cannot figure out how to fit into that statement.
What is the way to get all the id's from product category into this list with the information joined occupying the result?
Figured it out, needed to get rid of selecting dataset.pc_id and just go with product_category.id and then right join product_category.
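For anyone landing here later, a sketch of what that fix might look like, run against an in-memory SQLite database via Python's sqlite3 module. The rows are made up for illustration, and the query is written as a LEFT JOIN starting from product_category, which is equivalent to right joining product_category at the end. Note the COUNT on a column from import: COUNT(*) would report 1 for categories with no matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Made-up sample rows; only the table and column names come from the post.
CREATE TABLE product_category (id INTEGER PRIMARY KEY);
CREATE TABLE dataset (internal_id INTEGER, pc_id INTEGER);
CREATE TABLE import (product_id TEXT);

INSERT INTO product_category VALUES (1),(2),(3),(4);
INSERT INTO dataset VALUES (10,3),(11,3),(12,4);
INSERT INTO import VALUES ('10'),('10'),('11'),('12');
""")

# Start from product_category so every category survives the joins, and
# count a column from import so unmatched categories yield 0.
counts = conn.execute("""
SELECT pc.id, COUNT(i.product_id) AS import_count
FROM product_category pc
LEFT JOIN dataset d ON d.pc_id = pc.id
LEFT JOIN import i ON i.product_id = CAST(d.internal_id AS TEXT)
GROUP BY pc.id
ORDER BY pc.id
""").fetchall()

for category_id, import_count in counts:
    print(category_id, import_count)
```

Categories 1 and 2 have no datasets in the made-up rows, so they come back with a count of 0 instead of being dropped.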