Pentaho - sharing output from one transformation to another

I have 2 files: one with stock purchase data and another with stock sales data. The purchase file has stockid, count, and purchaseamount. The sales file has stockid, date, noofstockssold, and stocksoldamount. I was able to process the purchase details and compute the average price of the stocks purchased. I want to use this average price with the stock sales data to arrive at the profit/loss, but I cannot find a way to pass the average price along. Can someone help?

You'll need to use a Merge Join step to join the information from both files into a single stream of data. The stockid column looks like the natural key for merging the two files.
The Merge Join step requires both input streams to be sorted by the join key, so before the Merge Join you need two Sort rows steps, one for each file, each sorting by stockid.
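The sort-then-merge flow described above can be sketched outside Pentaho to show what the steps do to the data. This is only an illustration in plain Python, not Pentaho code: the sample rows are invented, and it assumes that purchaseamount is the total amount paid for a purchase row, so the average price is sum(purchaseamount) / sum(count).

```python
from itertools import groupby
from operator import itemgetter

# Invented sample rows; only the field names come from the question.
purchases = [
    {"stockid": "B", "count": 10, "purchaseamount": 500.0},
    {"stockid": "A", "count": 4, "purchaseamount": 100.0},
    {"stockid": "A", "count": 6, "purchaseamount": 200.0},
]
sales = [
    {"stockid": "B", "date": "2020-01-03", "noofstockssold": 5, "stocksoldamount": 300.0},
    {"stockid": "A", "date": "2020-01-02", "noofstockssold": 3, "stocksoldamount": 120.0},
]


def average_prices(rows):
    """Sort by stockid, then reduce each group to one average-price row."""
    ordered = sorted(rows, key=itemgetter("stockid"))
    result = []
    for stockid, group in groupby(ordered, key=itemgetter("stockid")):
        group = list(group)
        amount = sum(r["purchaseamount"] for r in group)
        count = sum(r["count"] for r in group)
        result.append({"stockid": stockid, "avgprice": amount / count})
    return result


def merge_join(avg_rows, sale_rows):
    """Inner merge join; both inputs must already be sorted by stockid."""
    joined = []
    i = 0
    for sale in sale_rows:
        # Advance the average-price stream until it catches up with the sale.
        while i < len(avg_rows) and avg_rows[i]["stockid"] < sale["stockid"]:
            i += 1
        if i < len(avg_rows) and avg_rows[i]["stockid"] == sale["stockid"]:
            joined.append({**sale, "avgprice": avg_rows[i]["avgprice"]})
    return joined


avg_rows = average_prices(purchases)                  # sorted by stockid
sale_rows = sorted(sales, key=itemgetter("stockid"))  # the second "Sort rows"

results = []
for row in merge_join(avg_rows, sale_rows):
    cost = row["avgprice"] * row["noofstockssold"]
    results.append((row["stockid"], row["stocksoldamount"] - cost))

print(results)
```

The two-pointer walk in merge_join is the reason both inputs have to be sorted first: each stream is read once, front to back, and unsorted input would silently drop matches.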

Related

Remove duplicates from fact table to calculate measure correctly

I'm very new to data warehousing and dimensional modelling. For a uni project I started out with a database that I need to turn into a data warehouse for analyses. To end up with a clean star schema, I had to denormalize a few tables into 1 fact table. The downside to this is the amount of redundancy.
Below is a part of the data from the fact table:
A voyage consists of multiple shipments, and a shipment can consist of multiple different items. In this example, containers 1-2000 of shipment 1 contain item 3, and containers 2001-5000 contain item 1. The total amount of containers for this shipment is 5000, obviously. However, for data analysis purposes, I need to calculate a measure for the total amount of containers per voyage. This presents a problem with the current fact table, because I have a record for each different item. So for voyage 1, the actual total amount should be 9200, but because of the duplication I'll end up with 19400, leading to an incorrect measure.
I need to find a way to get rid of the duplicates in my calculation, but I can't find a way to do so. Any help would be much appreciated.
What you'll need to do is group by your shipments (CTE, inner query, temp table, etc) to get the number of containers per shipment, then group by your voyages to get the number of containers per voyage.
Here's an example with an inner query:
SELECT voyage_id, SUM(num_ship_containers) AS num_voyage_containers
FROM (
    SELECT voyage_id, shipment_id, MAX(container_end) AS num_ship_containers
    FROM ShippingWarehouse
    GROUP BY voyage_id, shipment_id
) AS ship_data
GROUP BY voyage_id;

voyage_id | num_voyage_containers
1         | 9200
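For anyone who wants to run the two-level aggregation locally, here is a minimal sketch using Python's built-in sqlite3 module. The original fact table was not reproduced in the post, so the table layout is an assumption: only ShippingWarehouse, voyage_id, shipment_id and container_end come from the query above, and the rows for shipment 2 are invented so that the voyage total matches the 9200 mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ShippingWarehouse (
        voyage_id       INTEGER,
        shipment_id     INTEGER,
        item_id         INTEGER,  -- assumed column
        container_start INTEGER,  -- assumed column
        container_end   INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO ShippingWarehouse VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 3, 1, 2000),      # shipment 1, as described in the question
        (1, 1, 1, 2001, 5000),
        (1, 2, 2, 1, 1000),      # shipment 2, invented for illustration
        (1, 2, 5, 1001, 4200),
    ],
)

# Inner query: one row per shipment. Outer query: one row per voyage.
rows = conn.execute(
    """
    SELECT voyage_id, SUM(num_ship_containers) AS num_voyage_containers
    FROM (
        SELECT voyage_id, shipment_id, MAX(container_end) AS num_ship_containers
        FROM ShippingWarehouse
        GROUP BY voyage_id, shipment_id
    ) AS ship_data
    GROUP BY voyage_id
    """
).fetchall()

print(rows)
```

Note that MAX(container_end) only equals the container count if container numbering restarts at 1 for every shipment, which is what the example in the question suggests.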

How do you make a SQL table reflect vertical records in a horizontal fashion

I have two tables, one has sales records and one has deduct records. There is a primary key that links the two tables called Sales_ID but there are multiple deduct records that belong to one sales record. I am trying to get the sales information and the deduct information on one line. See below for tables and the desired result.
Sales Table:
Deducts table:
Desired results:
SELECT SALES.SALES_ID, SALES.SALE_QTY, SALES.SALE_AMT
FROM SALES
LEFT JOIN DEDUCTS
ON SALES.SALES_ID = DEDUCTS.SALE_ID
I understand that if I add a deduct code to the SELECT statement I will get duplicates, but I don't know what to try to avoid that.
Thanks in advance!

stock table for inventory database design architecture

I am currently building a management system. Is it good practice to create a balance table that stores the inventory on hand and is updated whenever there is a change, or should one just compute it directly as the total from the inventory-ordered table minus the total from the inventory-used table? Which would be the most efficient and fastest approach?
It is likely a bad idea to use two separate tables, because you would have to perform an unnecessary join. Simply have one table with an 'ordered' column and a 'used' column. Your query can then calculate the net value very efficiently, e.g.:
SELECT ordered, used, (ordered - used) AS net FROM inventory

Crystal Report XI - Conditional Sum Issue

I have a table with data called InvoiceShipments. It contains a row for each product shipped on an invoice. Each product belongs to a product category, which I can query and filter by. Some of the products are finished goods with a Bill of Materials (BOM), which is the list of parts that combine to make the finished good.
In the InvoiceShipments table, the finished good is listed with a price but no cost. It is then followed by the components (BOM) of that finished good, which in turn have a cost but no price. I have a separate table that lists all of the component items and which finished goods it goes to. Note that component goods can belong to more than one BOM.
I can currently filter the InvoiceShipments by the products that I want based on the product category (from a join to a different table). What I want to do is grab that finished good number, and get a list of all the part #s that make up that BOM, then go back to the InvoiceShipments and sum the costs for all of the rows that match those component #s and invoice#. But I haven't been using Crystal long enough to know what to do at the query level, what to do with a command table, what to do with a formula, etc.
Sample Screenshots:
Top table in the gallery is BOM table, second table is the InvoiceShipments, and the third is the desired outcome.
Any help would be appreciated.
From what I gather, you want to combine the rows by invoice number but show the finished product's information. I've done something similar; the solution is a little weird, but it works. You only need your InvoiceShipments table.
1. Group by invoice number.
2. Create a formula for each of Order ID, SKU and ProductName:
   IF {InvoiceShipments.Price} <> 0 AND {InvoiceShipments.Cost} = 0 THEN
       {InvoiceShipments.OrderID} // change this field per formula (SKU, ProductName)
   ELSE
       ""
3. Insert -> Summary on each of the formulas, using Maximum, and place it on the grouped invoice line.
4. Since the Quantity is constant, you can put that field on the grouped invoice line.
5. Insert -> Summary on Price, then on Cost, using Sum, and place them on the grouped invoice line.
6. Hide the details.
This should give you the result you need. It works because each formula only returns a value for the finished-goods row and is blank for every other row, so when you take the Maximum, the non-blank value is the one that prints.
Hope this helps.
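To make the Maximum trick concrete, here is a small sketch in Python rather than Crystal. The rows and SKU values are invented, and it assumes one finished good per invoice: the "formula" returns the SKU only for the finished-goods row, and the maximum over the group picks it up because any non-empty string sorts after the empty string.

```python
from itertools import groupby
from operator import itemgetter

# Invented InvoiceShipments rows: the finished good has a price and no cost,
# its components have a cost and no price.
rows = [
    {"invoice": 1001, "sku": "FG-1", "price": 100.0, "cost": 0.0},
    {"invoice": 1001, "sku": "P-10", "price": 0.0, "cost": 30.0},
    {"invoice": 1001, "sku": "P-11", "price": 0.0, "cost": 25.0},
    {"invoice": 1002, "sku": "FG-2", "price": 80.0, "cost": 0.0},
    {"invoice": 1002, "sku": "P-10", "price": 0.0, "cost": 30.0},
]


def finished_sku(row):
    """Mirror of the report formula: SKU for finished goods, blank otherwise."""
    return row["sku"] if row["price"] != 0 and row["cost"] == 0 else ""


summary = []
ordered = sorted(rows, key=itemgetter("invoice"))
for invoice, group in groupby(ordered, key=itemgetter("invoice")):
    group = list(group)
    summary.append(
        {
            "invoice": invoice,
            "sku": max(finished_sku(r) for r in group),  # the "Maximum" summary
            "price": sum(r["price"] for r in group),     # the "Sum" summaries
            "cost": sum(r["cost"] for r in group),
        }
    )

for line in summary:
    print(line)
```

If an invoice carried two finished goods, the maximum would keep only one of them and the costs would be lumped together, so this grouping is only adequate for the single-finished-good case.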
NEW SOLUTION
I don't have any tables or views that are set up like your data, so I can't test this solution, but hopefully there is enough information here for you to work out a good one.
I noticed that you can't use the materials in InvoiceShipments to identify the finished product in BillofMaterials, because the same materials repeat across finished products. We'll have to identify them using the finished product instead.
1. Add InvoiceShipments and rename it InvoiceShipments1 (when adding tables, right-click the table in the right-hand window and rename it).
2. Using the Select Expert, isolate your finished products (Price <> 0 and Cost = 0).
3. Database -> Database Expert. Add your BillofMaterials table. Link SKU to ProductSKU using a Left Outer Join.
Now that the materials are associated with an invoice number, we can try to link another copy of InvoiceShipments to the BOM. This is the tricky part.
4. Database -> Database Expert. Add your InvoiceShipments table again and rename it InvoiceShipments2. Link InvoiceShipments2.invoice# to InvoiceShipments1.Invoice#, and InvoiceShipments2.SKU to Material#. Use a Left Outer Join.
5. Create a formula that alternates between InvoiceShipments1 and InvoiceShipments2 for each of the columns OrderID, SKU, and ProductName:
   IF ISNULL({InvoiceShipments1.OrderID}) THEN
       {InvoiceShipments2.OrderID}
   ELSE
       {InvoiceShipments1.OrderID}
6. Create a formula combining InvoiceShipments2.invoice# and SKU (the formula version of SKU).
7. Group by the formula from the previous step. (If the invoice contains 2 finished products, this creates 2 lines for that invoice.)
8. On the group footer (GF), put InvoiceShipments2.invoice#, OrderID (formula version), SKU (formula version), ProductName (formula version), InvoiceShipments2.quantity, and Sum summaries of InvoiceShipments2.price and InvoiceShipments2.cost.
9. Hide the group header (GH).
Hope it works!

What is a fast way of joining two tables and using the first table column to "filter" the second table?

I am trying to develop a SQL Server 2005 query, but so far without success. I have tried every approach I know, such as derived tables, sub-queries, and CTEs, but I couldn't solve the problem. I won't post the queries I tried here because they involve many other columns and tables, but I will try to explain the problem with a simpler example:
There are two tables: PARTS_SOLD and PARTS_PURCHASED. The first contains products that were sold to customers, and the second contains products that were purchased from suppliers. Both tables contain a foreign key to the movement itself, which holds the dates, etc.
Here is the simplified schema:
Table PARTS_SOLD:
part_id
date
other columns
Table PARTS_PURCHASED
part_id
date
other columns
What I need is to join every row in PARTS_SOLD with a unique row from PARTS_PURCHASED, chosen by part_id and the maximum purchase "date" that is equal to or before the "date" column from PARTS_SOLD. In other words, for every sale of an item, I need to collect some information from the last purchase of that item.
The problem is that I haven't found a way of joining the PARTS_PURCHASED table with the PARTS_SOLD table that uses the "date" column from PARTS_SOLD to limit the MAX(date) of the PARTS_PURCHASED table.
I could solve this with a cursor, using the tools I know, but each table has millions of rows, and cursors or sub-queries that evaluate a query for every row would probably make the process very slow.
You aren't going to like my answer. Your database is designed incorrectly, which is why you can't get the data back out the way you want. Even using a cursor, you would not get good data from this. Assume that you purchased 5 of part 1 on May 31, 2010, and that on June 1 you sold ten of part 1. Matching just on date, you would match all ten to the May 31 purchase, even though that is clearly not correct: some parts might have been purchased on May 23, and some may have been purchased on July 19, 2008.
If you want to know which purchased part relates to which sold part, your database design should include the PartPurchasedID as part of the PartsSold record and this should be populated at the time of the purchase, not later for reporting when you have 1,000,000 records to sort through.
Perhaps the following would help:
SELECT S.*
FROM PARTS_SOLD S
INNER JOIN (SELECT PART_ID, MAX(DATE) AS MAX_DATE -- alias so the outer query can reference it
            FROM PARTS_PURCHASED
            GROUP BY PART_ID) D
    ON (D.PART_ID = S.PART_ID)
WHERE D.MAX_DATE <= S.DATE
Share and enjoy.
I'll toss this out there, but it's likely to contain all kinds of mistakes... both because I'm not sure I understand your question and because my SQL is... weak at best. That being said, my thought would be to try something like:
SELECT * FROM PARTS_SOLD
INNER JOIN (SELECT part_id, MAX(date) AS max_date
            FROM PARTS_PURCHASED
            GROUP BY part_id) AS subtable
    ON PARTS_SOLD.part_id = subtable.part_id
    AND subtable.max_date <= PARTS_SOLD.date -- purchase on or before the sale
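Both queries above compare each sale against the single latest purchase per part, so neither returns the latest purchase on or before each individual sale. One way to express that requirement is a correlated subquery, sketched here with Python's built-in sqlite3 module so it can be run as-is. This is SQLite rather than SQL Server 2005, the price column and all rows are invented stand-ins for the "other columns", and it assumes a part has at most one purchase row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PARTS_SOLD (part_id INTEGER, date TEXT);
    CREATE TABLE PARTS_PURCHASED (part_id INTEGER, date TEXT, price REAL);
    """
)
conn.executemany(
    "INSERT INTO PARTS_PURCHASED VALUES (?, ?, ?)",
    [
        (1, "2010-05-23", 9.0),
        (1, "2010-05-31", 10.0),
        (1, "2010-07-19", 12.0),
        (2, "2010-06-15", 5.0),
    ],
)
conn.executemany(
    "INSERT INTO PARTS_SOLD VALUES (?, ?)",
    [(1, "2010-06-01"), (1, "2010-08-01"), (2, "2010-06-01")],
)

# For each sale, join the purchase whose date is the latest one that is
# on or before the sale date for the same part. LEFT JOIN keeps sales
# that have no earlier purchase.
rows = conn.execute(
    """
    SELECT s.part_id, s.date AS sold_date, p.date AS purchase_date, p.price
    FROM PARTS_SOLD AS s
    LEFT JOIN PARTS_PURCHASED AS p
           ON p.part_id = s.part_id
          AND p.date = (SELECT MAX(p2.date)
                        FROM PARTS_PURCHASED AS p2
                        WHERE p2.part_id = s.part_id
                          AND p2.date <= s.date)
    ORDER BY s.part_id, s.date
    """
).fetchall()

for row in rows:
    print(row)
```

With millions of rows, an index on PARTS_PURCHASED (part_id, date) is what would let the subquery seek instead of scan. On SQL Server 2005 specifically, OUTER APPLY with a TOP 1 ... ORDER BY date DESC subquery is another common way to write the same lookup; that variant is not tested here.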