I have a large table of sales data; the relevant columns are shown below:
RowID   Date        Customer  Salesperson  Product_Type  Manufacturer  Quantity  Value
1       01-06-2004  James     Ian          Taps          Tap Ltd       200       £850
2       02-06-2004  Apple     Fran         Hats          Hats Inc      30        £350
3       04-06-2004  James     Lawrence     Pencils       ABC Ltd       2000      £980
...
Many rows later...
...
185352  03-09-2012  Apple     Ian          Washers       Tap Ltd       600       £80
I need to calculate a large set of targets of different types from this table. The target table is under my control and so far looks like this:
TargetID Year Month Salesperson Target_Type Quantity
1 2012 7 Ian 1 6000
2 2012 8 James 2 2000
3 2012 9 Ian 2 6500
At present I am working out target types using a view of the first table which has a lot of extra columns:
SELECT YEAR(Date)
, MONTH(Date)
, Salesperson
, Quantity
, CASE WHEN Manufacturer IN ('Tap Ltd','Hats Inc') AND Product_Type = 'Hats' THEN True ELSE False END AS IsType1
, CASE WHEN Manufacturer = 'Hats Inc' AND Product_Type IN ('Hats','Coats') THEN True ELSE False END AS IsType2
...
...
, CASE WHEN Manufacturer IN ('Tap Ltd','Hats Inc') AND Product_Type = 'Hats' THEN True ELSE False END AS IsType24
, CASE WHEN Manufacturer IN ('Tap Ltd','Hats Inc') AND Product_Type = 'Hats' THEN True ELSE False END AS IsType25
FROM SalesTable
WHERE [some stuff here]
This is horrible to read/debug and I hate it!!
I've tried a few different ways of simplifying this but have been unable to get it to work.
The closest I have come is a third table that holds the definition of each type: the type number plus the required value for each field. It can be joined to the other tables to give me the full values, but I can't work out how to cope with multiple values for each field.
Finally the question:
Is there a standard way this can be done or an easier/neater method other than one column for each type of target?
I know this is a complex problem so if anything is unclear please let me know.
Edit - What I need to get:
At the very end of the process I need to have targets displayed with actual sales:
Type Year Month Salesperson TargetQty ActualQty
2 2012 8 James 2000 2809
2 2012 9 Ian 6500 6251
Each row of the sales table could potentially satisfy 8 of the types.
Some more points:
1. I have 5 different columns that need to be defined against the targets (or set to NULL to include any value).
2. I have between 30 and 40 different types that need to be defined, and several of the columns could contain as many as 10 different values.
For point 2, if I use a row for each permutation of values, 2 columns with 10 values each would give me 100 rows for each salesperson for each month. That is a lot, but if this is the only way to define multiple values I will have to do it.
Sorry if this makes no sense!
If I am correct that the "Target_Type" field in the Target Table is based on the Manufacturer and the Product_Type, then you can create a TargetType table like the one below and JOIN on Manufacturer and Product_Type to get your Target_Type_Value:
ID  Product_Type  Manufacturer  Target_Type_Value
1   Taps          Tap Ltd       1
2   Hats          Hats Inc      2
3   Coats         Hats Inc      2
4   Hats          Caps Inc      3
5   Pencils       ABC Ltd       6
This should address the "multiple values for each field" problem by having a row for each possibility.
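As a rough illustration of that join, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the ISO date format and the `Targets` table name are assumptions made for the demo, not taken from the question, and the date functions are SQLite's, so they would need translating for another database. It does not cover the "NULL means any value" case from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesTable (RowID INTEGER, Date TEXT, Salesperson TEXT,
                         Product_Type TEXT, Manufacturer TEXT, Quantity INTEGER);
CREATE TABLE TargetType (Product_Type TEXT, Manufacturer TEXT,
                         Target_Type_Value INTEGER);
CREATE TABLE Targets (TargetID INTEGER, Year INTEGER, Month INTEGER,
                      Salesperson TEXT, Target_Type INTEGER, Quantity INTEGER);

-- Made-up sample rows; dates are stored as ISO text for SQLite's strftime
INSERT INTO SalesTable VALUES
  (1, '2012-09-03', 'Ian', 'Hats',  'Hats Inc', 600),
  (2, '2012-09-10', 'Ian', 'Coats', 'Hats Inc', 150),
  (3, '2012-09-11', 'Ian', 'Taps',  'Tap Ltd',  900);
-- One row per (Product_Type, Manufacturer) combination that counts for a type
INSERT INTO TargetType VALUES
  ('Taps', 'Tap Ltd', 1), ('Hats', 'Hats Inc', 2), ('Coats', 'Hats Inc', 2);
INSERT INTO Targets VALUES (3, 2012, 9, 'Ian', 2, 6500);
""")

# Each target picks up every sale whose combination is listed for its type
rows = conn.execute("""
SELECT t.Target_Type, t.Year, t.Month, t.Salesperson,
       t.Quantity AS TargetQty,
       COALESCE(SUM(s.Quantity), 0) AS ActualQty
FROM Targets t
LEFT JOIN TargetType tt ON tt.Target_Type_Value = t.Target_Type
LEFT JOIN SalesTable s
       ON s.Product_Type = tt.Product_Type
      AND s.Manufacturer = tt.Manufacturer
      AND s.Salesperson  = t.Salesperson
      AND CAST(strftime('%Y', s.Date) AS INTEGER) = t.Year
      AND CAST(strftime('%m', s.Date) AS INTEGER) = t.Month
GROUP BY t.TargetID, t.Target_Type, t.Year, t.Month, t.Salesperson, t.Quantity
""").fetchall()
print(rows)
```

With this sample data the Hats and Coats sales count towards type 2, while the Taps sale does not.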
Good morning!
I am very new to pandas/Python. I mainly use SQL and SSIS for my current ETL, but a new data source requires tedious manual reformatting in Excel. I am trying to learn Python to save hours of manual work. The data in the report is extremely redundant. I have spent days trying to phrase my search in a way that returns the information I need, but to no avail.
I can't use my actual data because it contains PHI, so I will give an analogous example using Clients and Orders. An external system generates a 'MonthlyOrders.xls' report. There is pretty much ZERO flexibility in the export format. The .xls file extension gives you an idea about how dated the source environment is. First, I loaded the data to a data frame and split it down into smaller data frames by "Group". So each df represents one group. This is what it looks like after that:
General Format:
index | Name/Date       | ID/Item   | Price/'P'  | Billed/'NaaN' | PaidOn/Seller   | Total/Dept
1     | ClientName      | Client ID | 'P'        | Date Billed   | Pmt_received_On | Order Total
2     | Order Date      | item name | item price | 'NaaN'        | sold by         | dept
3     | Same order date | 2nd item  | price      | 'Naan'        | sold by         | dept
4     | NextClientName  | NextID    | 'P'        | Date Billed   | Pmt_received_On | Order Total
Example of Data:
Index | Name/Date    | ID/Item  | Price/'P' | Billed/'NaaN' | PaidOn/Seller | Total/Dept
1     | Victim, One  | VO100    | 'P'       | 08/12/2021    | 08/13/2021    | 78
2     | 08/11/2021   | books    | 12        | 'NaaN'        | Mrs. White    | The Study
3     | 08/11/2021   | Rope     | 56        | 'Naan'        | Mrs. White    | The Study
4     | 08/11/2021   | Pens     | 10        | 'NaaN'        | Mrs. White    | The Study
5     | Second, Dead | SD123    | 'P'       | 08/18/2021    | 08/20/2021    | 250
6     | 08/17/2021   | Pool Cue | 198       | 'NaaN'        | Mr. Green     | Billiard Room
7     | 08/17/2021   | Knife    | 52        | 'Naan'        | Mr. Green     | Billiard Room
What I want to do is create a multi-level index using Client Name, Client ID, and OrderDate.
Maybe I could put Name:ID in a dictionary and use that as the first level of the index, with the date as the next level. I am not sure if I can do that.
Alternatively, I want to split the first and second columns into four columns (Name, ID, OrderDate, Item). I do not use the 'Order Total' column. The data goes into a Billing_Import staging table, and then it is further manipulated and transformed in the Data Warehouse. The destination table has the following structure:
RecID | Group | ClientName | OrderDate | ClientID | ItemID | desc | ChgAmt | pmtAmt | seller | dept
The 'RecID' is added in SSIS, and 'Item ID' split from the 'Desc' column after import with SQL. I plan to add a "Group" column back into each data frame so I know which data belongs to which group. Right now the groups are in separate data frames.
The 'Department' will always be the same for an order. There is almost always only one 'Seller', but if there were 2 sellers on one order, another record would be added.
The format I want would look like this:
Group  | ID   | Name   | OrderDate | ItemDesc | Charge | Pmt | Seller       | Department
Group1 | ID1a | Name1a | 1/1/2021  | item1    | $x     | $y  | Ms. Scarlet  | The Lounge
       |      |        |           | item2    | $x     | $y  | Ms. Scarlet  |
       |      |        |           | item3    | $x     | $y  | Ms. Scarlet  |
       | ID2a | Name2b | 1/15/2021 | item1    | $x     | $y  | Mrs. Peacock | The Kitchen
       |      |        |           | item2    | $x     | $y  | Mrs. Peacock |
Group2 | ID2a | Name2a | 1/22/2021 | item1    | $x     | $y  | Wadsworth    | The Cellar
       |      |        |           | item2    | $x     | $y  | Wadsworth    |
       | ID2a | Name2a | 1/22/2021 | item1    | $x     | $y  | Col. Mustard | The Cellar
Any and all suggestions are greatly appreciated!
Kindest Regards,
Cori
The first step would be to separate columns into a format similar to:
index | name | date     | ID   | item  | price | billed   | paid_on  | seller | total | dept
1     | John | 10/20/21 | 1234 | socks | 10    | 10/12/21 | 10/20/21 | James  | 10    | garments
Once this step is complete, you can create your MultiIndex with:
df_multi_index = df.set_index(['ID', 'name', 'date'])
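One possible way to do the separation step, as a sketch only: treat rows with 'P' in the third column as client rows, forward-fill the client fields down over the item rows that follow, then keep just the item rows. The placeholder column names `c1`..`c6` and the handful of sample rows are assumptions for illustration. The order total is dropped because the question says it is not used.

```python
import pandas as pd

# Made-up rows in the report's interleaved layout: a client row has 'P' in the
# third column, and the item rows below it belong to that client.
raw = pd.DataFrame(
    [
        ["Victim, One", "VO100", "P", "08/12/2021", "08/13/2021", 78],
        ["08/11/2021", "books", 12, None, "Mrs. White", "The Study"],
        ["08/11/2021", "Rope", 56, None, "Mrs. White", "The Study"],
        ["Second, Dead", "SD123", "P", "08/18/2021", "08/20/2021", 250],
        ["08/17/2021", "Pool Cue", 198, None, "Mr. Green", "Billiard Room"],
    ],
    columns=["c1", "c2", "c3", "c4", "c5", "c6"],  # placeholder names
)

is_client = raw["c3"].eq("P")

# Client fields, spread back over the full index and forward-filled so that
# every item row carries the details of the client row above it.
clients = (
    raw.loc[is_client, ["c1", "c2", "c4", "c5"]]
    .rename(columns={"c1": "name", "c2": "ID", "c4": "billed", "c5": "paid_on"})
    .reindex(raw.index)
    .ffill()
)

# Item rows, with the shared columns renamed to what they mean on an item row.
items = (
    raw.loc[~is_client]
    .rename(columns={"c1": "date", "c2": "item", "c3": "price",
                     "c5": "seller", "c6": "dept"})
    .drop(columns="c4")
)

# One row per item, with client details attached, then the multi-level index.
tidy = clients.loc[~is_client].join(items)
df_multi_index = tidy.set_index(["ID", "name", "date"])
print(df_multi_index)
```

If the groups live in separate data frames, the same steps could be applied to each one before adding the "Group" column back and concatenating.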
Right now I've got a Main table into which I am uploading data. Because the Main table contains many duplicates, I append data from it into other tables (username, phone number, and location) in order to keep things optimized. Once I have everything stripped out of the Main table, I append what's left into a final optimized Main table. Before this happens, though, I run a select query joining all the stripped tables with the original Main table in order to connect the IDs from each table with the correct data. For example:
Original Main Table
--Name---------Number------Due Date-------Location-------Charges Monthly-----Charges Total--
  John Smith   111-1111    4/3            Chicago        234.56              500.23
  Todd Jones   222-2222    4/3            New York       174.34              323.56
  John Smith   111-1111    4/3            Chicago        274.56              670.23
  Bill James   333-3333    4/3            Orlando        100.00              100.00
This gets split into 3 tables (name, number, location) and then there is a date table with all the dates for the year:
Name Table           Number Table       Location Table        Due Date Table
--ID---Name------    -ID--Number-----   ---ID---Location---   --Date---
  1    John Smith     1   111-1111         1    Chicago         4/1
  2    Todd Jones     2   222-2222         2    New York        4/2
  3    Bill James     3   333-3333         3    Orlando         4/3
Before the Original table gets stripped, I run a select query that grabs the ID from the 3 new tables and joins them based on the connection they have with the original Main table.
Select Output
--Name ID----Number ID---Location ID---Due Date--
1 1 1 4/3
2 2 2 4/3
1 1 1 4/3
3 3 3 4/3
My issue comes when I need to introduce a new table that can't be tied into the Original Main Table. I have an inventory table that, much like the original Main table, has duplicates and needs to be optimized. I do this by creating a secondary table that takes all the duplicated devices out and puts them in their own table, and then strips the username and number out and puts them into their own tables. I would like to add the IDs from this new device table to the select output that I have above, resulting in:
Select Output
--Name ID----Number ID---Location ID---Due Date--Device ID---
1 1 1 4/3 1
2 2 2 4/3 1
1 1 1 4/3 2
3 3 3 4/3 1
Unlike the previous tables, the device table has no relationship to the Original Main Table, which is what is causing me so much headache. I can't seem to find a way to make this happen. Is there any way to accomplish this?
Any two tables can be joined. A table represents an application relationship. In some versions (not the original) of Entity-Relationship Modelling (notice that the "R" in E-R stands for "(application) relationship"!) a foreign key is sometimes called a "relationship". You do not need other tables or FKs to join any two tables.
Explain, in terms of its column names and the values for those names, exactly when a row should turn up in the result. Maybe you want:
SELECT *
FROM the stripped-and-ID'd version of the Original AS o
JOIN the stripped-and-ID'd version of the Device AS d
USING (NameID, NumberID, LocationID, DueDate)
I.e.
SELECT *
FROM the stripped-and-ID'd version of the Original AS o
JOIN the stripped-and-ID'd version of the Device AS d
ON o.NameID=d.NameID AND o.NumberID=d.NumberID
AND o.LocationID=d.LocationID AND o.DueDate=d.DueDate
Suppose p(a,...) is some statement parameterized by a,... .
If o holds the rows where o(NameID,NumberID,LocationID,DueDate) and d holds the rows where d(NameID,NumberID,LocationID,DueDate,DeviceID) then the above holds the rows where o(NameID, NumberID, LocationID, DueDate) AND d(NameID,NumberID,LocationID,DueDate,DeviceID). But you really have not explained what rows you want.
The only way to "join" tables that have no relation is by unioning them together:
select attribute1, attribute2, ... , attributeN
from table1
where <predicate>
union -- or union all
select attribute1, attribute2, ... , attributeN
from table2
where <predicate>
The where clauses are, of course, optional.
EDIT
Alternatively, you could join the tables together with ON true, which acts like a cross product.
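To make the difference between the two options concrete, here is a small sketch using Python's sqlite3 module. The two one-column tables and their contents are made up for the demo, and `ON 1 = 1` stands in for `ON true` so that it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (attribute1 TEXT);
CREATE TABLE table2 (attribute1 TEXT);
INSERT INTO table1 VALUES ('a'), ('b');
INSERT INTO table2 VALUES ('x'), ('y'), ('z');
""")

# UNION ALL stacks the rows of both tables on top of each other: 2 + 3 rows
stacked = conn.execute("""
SELECT attribute1 FROM table1
UNION ALL
SELECT attribute1 FROM table2
""").fetchall()

# A join with an always-true condition pairs every row with every row: 2 * 3 rows
paired = conn.execute("""
SELECT t1.attribute1, t2.attribute1
FROM table1 t1
JOIN table2 t2 ON 1 = 1
""").fetchall()

print(len(stacked), len(paired))
```

The union keeps the column count and adds rows, while the always-true join adds columns and multiplies rows, so which one fits depends on what a result row is supposed to mean.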
I need a function or a trigger to solve this problem.
customer_details :::
custid name creditid
----------------------------
2 a 1
3 b 2
4 c 3
balance_amount :::
creditid credit_type balance
-----------------------------------
1 rent 1000
1 transport 2000
1 food 1000
1 fruits 1500
2 rent 1500
2 transport 1020
2 food 1200
2 fruits 1000
3 transport 1600
3 rent 2000
3 food 1540
3 fruits 1560
Pay_the_loan :::
creditid credit_type Pay status
---------------------------------------------
1 rent 500 null
2 fruits 600 null
3 transport 400 null
1 fruits 500 null
Once I update the status column in the pay_the_loan table to 'ok' for a particular creditid, i.e.
(update pay_the_loan set status='ok' where creditid=2)
the payment should be deducted from the balance column in the balance_amount table (i.e. 1000-600=400 in balance_amount where credit_type='fruits' and creditid=2).
Could someone post a function or a trigger to solve this problem?
You're probably better off restructuring a little: create a loan_amount table with the original loans, and make balance_amount a view that shows the current balance with all payments deducted. That way, for example, a correction to a payment amount will not make your system show the wrong balance.
A view that calculates the current balance for you could look something like:
CREATE VIEW balance_amount AS
SELECT la.creditid, la.credit_type, amount
- COALESCE(SUM(pay),0) balance
FROM loan_amount la
LEFT JOIN pay_the_loan pl
ON la.creditid = pl.creditid
AND la.credit_type = pl.credit_type
AND pl.status = 'OK'
GROUP BY la.creditid, la.credit_type, la.amount;
An SQLfiddle with the whole setup.
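For anyone who wants to try the idea locally, here is a minimal sketch using Python's sqlite3 module rather than the original database. The `loan_amount` rows are a subset of the question's data and the helper function `balance` is hypothetical. Note that the view matches the status 'OK' in upper case, so the update has to write exactly that value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loan_amount (creditid INTEGER, credit_type TEXT, amount INTEGER);
CREATE TABLE pay_the_loan (creditid INTEGER, credit_type TEXT,
                           pay INTEGER, status TEXT);
INSERT INTO loan_amount VALUES
  (1, 'rent', 1000), (1, 'fruits', 1500), (2, 'fruits', 1000);
INSERT INTO pay_the_loan VALUES
  (1, 'rent', 500, NULL), (2, 'fruits', 600, NULL), (1, 'fruits', 500, NULL);

-- Balance = original loan minus all payments marked 'OK'
CREATE VIEW balance_amount AS
SELECT la.creditid, la.credit_type,
       la.amount - COALESCE(SUM(pl.pay), 0) AS balance
FROM loan_amount la
LEFT JOIN pay_the_loan pl
       ON la.creditid = pl.creditid
      AND la.credit_type = pl.credit_type
      AND pl.status = 'OK'
GROUP BY la.creditid, la.credit_type, la.amount;
""")

def balance(creditid, credit_type):
    """Read the current balance for one loan from the view."""
    return conn.execute(
        "SELECT balance FROM balance_amount WHERE creditid = ? AND credit_type = ?",
        (creditid, credit_type),
    ).fetchone()[0]

before = balance(2, "fruits")
conn.execute("UPDATE pay_the_loan SET status = 'OK' WHERE creditid = 2")
after = balance(2, "fruits")
print(before, after)
```

Only the payment row changes; the balance is recomputed every time the view is read, so no trigger is needed.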
You can update both tables in the same query:
update
pay_the_loan
set
pay_the_loan.status='ok',
balance_amount.balance=balance_amount.balance-pay_the_loan.Pay
from balance_amount
where
pay_the_loan.creditid=balance_amount.creditid
and
pay_the_loan.creditid=2
and
balance_amount.credit_type='fruits';
Update: I've read the documentation for the PostgreSQL UPDATE statement. Apparently I was wrong, and it is not possible to update two tables in a single query. However, I still think that you do not need a trigger here. Just use two update queries, one after another:
update
pay_the_loan
set
status='ok'
where
pay_the_loan.creditid=2;
update
balance_amount
set
balance=balance_amount.balance-pay_the_loan.Pay
from pay_the_loan
where
pay_the_loan.creditid=balance_amount.creditid
and
pay_the_loan.credit_type=balance_amount.credit_type
and
pay_the_loan.creditid=2
and
balance_amount.credit_type='fruits';
In a single table, I have multiple lines with the same reference information (ID). On the same day, customers had several drinks, and the Appreciation for each is either 1 (yes) or 0 (no).
Table
ID DAY Drink Appreciation
1 1 Coffee 1
1 1 Tea 0
1 1 Soda 1
2 1 Coffee 1
2 1 Tea 1
3 1 Coffee 0
3 1 Tea 0
3 1 Iced Tea 1
I first tried to see who appreciated a certain drink, which is obviously very simple:
Select ID, max(appreciation)
from table
where (day=1 and drink='coffee' and appreciation=1)
or (day=1 and drink='tea' and appreciation=1)
Since I am not even interested in the drink, I used max to remove duplicates and keep only the line with the highest appreciation.
But what I want to do now is to see who in fact appreciated every drink they had. Again, I am not interested in every line in the end, only the ID and the appreciation. How can I modify my where clause so that it is applied to every single ID? Adding the ID to the condition is not an option. I tried switching OR for AND, but it doesn't return any rows. How could I do this?
This should do the trick:
SELECT ID
FROM table
WHERE DRINK IN ('coffee','tea') -- or whatever else filter you want.
group by ID
HAVING MIN(appreciation) > 0
What it does is:
It finds the minimum appreciation within each group and requires it to be greater than 0, which means every line in the group has an appreciation above 0. The group is the ID, as defined in the group by clause.
As you can see, I'm using the having clause, because you can't use aggregate functions in the where clause.
Of course you can join other tables into the query as you like. Just be careful not to add an unwanted filter by joining, which might reduce your dataset in this query.
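Here is a quick check of that query against the question's sample data, sketched with Python's sqlite3 module. The table name `drinks` is an assumption (`table` itself is a reserved word), and the drink filter is left out so that every drink a customer had is considered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drinks (ID INTEGER, DAY INTEGER, Drink TEXT, Appreciation INTEGER);
INSERT INTO drinks VALUES
  (1, 1, 'Coffee', 1), (1, 1, 'Tea', 0), (1, 1, 'Soda', 1),
  (2, 1, 'Coffee', 1), (2, 1, 'Tea', 1),
  (3, 1, 'Coffee', 0), (3, 1, 'Tea', 0), (3, 1, 'Iced Tea', 1);
""")

# An ID survives only if its lowest appreciation is above 0,
# i.e. the customer appreciated every drink they had that day.
rows = conn.execute("""
SELECT ID
FROM drinks
WHERE DAY = 1
GROUP BY ID
HAVING MIN(Appreciation) > 0
""").fetchall()
print(rows)
```

Only ID 2 comes back, because IDs 1 and 3 each have at least one drink with an appreciation of 0.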
I can't seem to group by multiple data fields and sum a particular grouped column.
I want to group person to customer, then group customer to price, and then sum the price. The results should be ordered by each person's combined sum(price).
Example:
table customer
-----------
customer | common_id
green 2
blue 2
orange 1
table invoice
----------
person | price | common_id
bob 2330 1
greg 360 2
greg 170 2
SELECT DISTINCT
min(person) As person,min(customer) AS customer, sum(price) as price
FROM invoice a LEFT JOIN customer b ON a.common_id = b.common_id
GROUP BY customer,price
ORDER BY person
The results I desire are:
**BOB:**
Orange, $2330
**GREG:**
green, $360
blue, $170
The colors are the customer, that GREG and Bob handle. Each color has a price.
There are two issues that I can see. One is a bit picky, and one is quite fundamental.
Presentation of data in SQL
SQL returns tabular data sets. It's not able to return subsets with headings, looking something like a pivot table.
This means that this is not possible...
**BOB:**
Orange, $2330
**GREG:**
green, $360
blue, $170
But that this is possible...
Bob, Orange, $2330
Greg, Green, $360
Greg, Blue, $170
Relating data
I can visually see how you relate the data together...
table customer             table invoice
--------------             -------------
customer | common_id       person | price | common_id
green      2               greg     360     2
blue       2               greg     170     2
orange     1               bob      2330    1
But SQL doesn't have any implied ordering. Things can only be related if an expression can state that they are related. For example, the following is equally possible...
table customer             table invoice
--------------             -------------
customer | common_id       person | price | common_id
green      2               greg     170     2          \ These two have
blue       2               greg     360     2          / been swapped
orange     1               bob      2330    1
This means that you need rules (and likely additional fields) that explicitly state which customer record matches which invoice record, especially when there are multiples in both with the same common_id.
An example of a rule could be, the lowest price always matches with the first customer alphabetically. But then, what happens if you have three records in customer for common_id = 2, but only two records in invoice for common_id = 2? Or do the number of records always match, and do you enforce that?
Most likely you need an extra piece (or pieces) of information to know which records relate to each other.
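To see the ambiguity in practice, here is a small sketch using Python's sqlite3 module that loads the question's two tables and joins them on common_id alone, with no grouping:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer TEXT, common_id INTEGER);
CREATE TABLE invoice (person TEXT, price INTEGER, common_id INTEGER);
INSERT INTO customer VALUES ('green', 2), ('blue', 2), ('orange', 1);
INSERT INTO invoice VALUES ('bob', 2330, 1), ('greg', 360, 2), ('greg', 170, 2);
""")

# common_id = 2 appears twice in each table, so every greg invoice
# is paired with every customer that shares that common_id.
rows = conn.execute("""
SELECT a.person, b.customer, a.price
FROM invoice a
LEFT JOIN customer b ON a.common_id = b.common_id
ORDER BY a.person, b.customer, a.price
""").fetchall()
for row in rows:
    print(row)
```

Greg comes back with four rows (both prices against both colors), which is why nothing in the data can say whether green goes with 360 or with 170.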
You should group by all of your selected fields except the sum. Then the group_concat function (MySQL) may help you concatenate the rows within each group.
I'm not sure how you could possibly do this. Greg has 2 colors AND 2 prices, so how do you determine which goes with which?
Greg Blue 170 or Greg Blue 360? Or attaching Green to either price?
I think the colors need to have unique identifiers, separate from the person's unique identifiers.
Just a thought.