Invoice table holds invoices based on their line items - sql

I have a table that records the invoices within our company. I need to determine whether there are multiple entries for a particular invoice number, and if there are, I need to look at the bin location. I need the results to look like a single invoice entry with the qty, taxvalue, costvalue and discvalue summed together; the final output should display only a single line per invoice with all values summed.
Invoice | QtyInvoiced | TaxValue | CostValue | DiscValue | Warehouse | Bin
--------|-------------|----------|-----------|-----------|-----------|------
1       | 1000        | 5.0      | 5.0       | 1.0       | KT        | 23
1       | 500         | 5.0      | 5.0       | 1.0       | KT        | Stage
2       | 1000        | 3.0      | 9.0       | 0.0       | KT        | Stage
3       | 1000        | 5.0      | 5.0       | 1.0       | KT        | 19
3       | 500         | 5.0      | 5.0       | 1.0       | KT        | Stage
Results need to be:
Invoice | QtyInvoiced | TaxValue | CostValue | DiscValue | Warehouse
--------|-------------|----------|-----------|-----------|----------
1       | 1500        | 10.0     | 10.0      | 2.0       | KT
2       | 1000        | 3.0      | 9.0       | 0.0       | KT
3       | 1500        | 10.0     | 10.0      | 2.0       | KT

This should get you what you want; you'll probably have to modify it slightly to fit, as you didn't really provide a table structure. Since the warehouse is constant per invoice in your sample, MAX(warehouse) is one way to carry it through the aggregation.
SELECT invoice_nbr,
       SUM(qty)       AS TotalQty,
       SUM(taxvalue)  AS TotalTaxValue,
       SUM(costvalue) AS TotalCostValue,
       SUM(discvalue) AS TotalDiscValue,
       MAX(warehouse) AS Warehouse  -- warehouse is constant per invoice in the sample
FROM invoices
GROUP BY invoice_nbr
ORDER BY invoice_nbr;

Related

How to calculate mean value per group in Teradata SQL?

I have a table in Teradata SQL like below:
SMS_ID | PRODUCT
-------------------
11 | A
22 | A
33 | A
87 | B
89 | B
14 | C
Column "SMS_ID" presents ID of SMS sent do client
Column "PRODUCT" presents ID of product which was a subject of SMS
My question is: How can I calculate in Teradata SQL mean number of SMS per PRODUCT ?
As a result I need something like below:
AVG  | PRODUCT
-----|--------
0.5  | A   -> because 3 / 6 = 0.5
0.33 | B   -> because 2 / 6 = 0.33
0.16 | C   -> because 1 / 6 = 0.16
You want fractions of the total count:
SELECT
    product,
    COUNT(*)                                  -- count per product
      / CAST(SUM(COUNT(*)) OVER () AS FLOAT)  -- total count = sum of counts per product
      AS sms_share
FROM yourTable
GROUP BY product;

Multi-level pivoting of a dataframe

I have this dataframe:
Group    Feature 1   Feature 2   Class
First    5           4           1
Second   5           5           0
First    1           2           0
I want to do a multi level pivot in pandas to have something like this:
Group | Feature 1 (Class 1) | Feature 1 (Class 2) | Feature 2 (Class 1) | Feature 2 (Class 2)
What if I want to select only one feature to work with?
Like this?
out = (df.assign(Class=df["Class"] + 1)
         .pivot(index="Group", columns="Class"))
print(out)
Feature 1 Feature 2
Class 1 2 1 2
Group
First 1.0 5.0 2.0 4.0
Second 5.0 NaN 5.0 NaN
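For the follow-up question about working with only one feature, here is a minimal sketch (the data construction mirrors the sample above; the column name "Feature 1" comes from it):
import pandas as pd

# sample data from the question
df = pd.DataFrame({"Group": ["First", "Second", "First"],
                   "Feature 1": [5, 5, 1],
                   "Feature 2": [4, 5, 2],
                   "Class": [1, 0, 0]})

# pivot only one feature by passing values=
feature1 = (df.assign(Class=df["Class"] + 1)
              .pivot(index="Group", columns="Class", values="Feature 1"))
print(feature1)

# equivalently, slice the top column level of the full pivot
out = df.assign(Class=df["Class"] + 1).pivot(index="Group", columns="Class")
print(out["Feature 1"])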

Groupby with conditions

import numpy as np
import pandas as pd

df = pd.DataFrame({'Category': ['A','B','B','B','C','C'],
                   'Subcategory': ['X','X','Y','Y','Z','Z'],
                   'Values': [1,2,3,4,5,6]})
which I use groupby to summarize -
`df.groupby('Category')['Values'].agg({np.size, np.mean, np.median})`
size mean median
Category
A 1 1.0 1.0
B 3 3.0 3.0
C 2 5.5 5.5
Objective: in addition to the above, show a second groupby restricted to Subcategory 'X', to create the output below:
          ALL Subcategory        Only Subcategory 'X'
          size  mean  median     size  mean  median
Category
A         1     1.0   1.0        1     1     1
B         3     3.0   3.0        1     2     2
C         2     5.5   5.5        0     0     0
My current solution is to create two groupbys, to_frame() each, then pd.merge them. Is there a better way? Thanks!
df.groupby('Category')['Values'].agg({np.size, np.mean, np.median})
df[df['Subcategory']=='X'].groupby('Category')['Values'].agg({np.size, np.mean, np.median})
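One way to avoid the manual merge (a sketch, assuming the df above; the key labels mirror the desired output) is to compute both aggregations and combine them with pd.concat, using keys= to create the extra column level:
import pandas as pd

df = pd.DataFrame({'Category': ['A','B','B','B','C','C'],
                   'Subcategory': ['X','X','Y','Y','Z','Z'],
                   'Values': [1,2,3,4,5,6]})

aggs = ['size', 'mean', 'median']
all_sub = df.groupby('Category')['Values'].agg(aggs)
only_x = (df[df['Subcategory'] == 'X']
          .groupby('Category')['Values'].agg(aggs)
          .reindex(all_sub.index)   # categories with no 'X' rows become NaN ...
          .fillna(0))               # ... and are zeroed, as in the desired output

out = pd.concat([all_sub, only_x], axis=1,
                keys=["ALL Subcategory", "Only Subcategory 'X'"])
print(out)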

See if a customer had a purchase across every quarter and then graph

I have a dataframe that looks like this:
customer_id | date     | sales_amount
479485      | 20190120 | 500
479485      | 20180320 | 200
472848      | 20191020 | 100
This data has transaction information from 2016-2019. For each business quarter (grouped by 3 months) I want to see if a unique customer had a transaction. Basically I want the y-axis for the table to be each unique customer_id and then the x-axis of the table to be the 12 quarters in the time period of the data with a Boolean of whether or not a customer had a transaction in that quarter.
Ultimately I want to visualize this data to see the distribution of the transactions for each quarter across all the unique customers.
Expected output:
customer_id | 2017-Q1  | 2017-Q2 | .. | 2019-Q4
479485      | 20190120 | 0       | .. | 1
469488      | 20180320 | 0       | .. | 0
452848      | 20191020 | 1       | .. | 1
I have changed the date column to datetime but am unsure how to group and proceed to the next step.
Solution:
(df.groupby([df['customer_id'],
             df['date'].apply(lambda d: pd.Period(d, 'Q'))])['sales_amount']
   .count()
   .unstack()
   .fillna(0))
Output:
date 2017Q1 2018Q1 2019Q1 2019Q4
customer_id
469471 1.0 0.0 0.0 0.0
469488 0.0 1.0 1.0 1.0
472848 0.0 0.0 0.0 1.0
479485 1.0 1.0 1.0 0.0
Notes
Assumptions: (1) all of the year-quarters appear in your data set, and (2) there is at most one transaction per quarter.
To get around (1), set the index to the date and reindex with the missing quarters, filling NaNs with zeros. The output above is based on a small sample of dummy data, hence only four quarters appear.
To get around (2), apply np.sign to the output to collapse the counts into 0/1 flags, as in the sketch below.
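Putting those notes together, a sketch of the full pipeline (assuming the date column has already been parsed to datetime, as the question states; the 2017Q1-2019Q4 range is taken from the expected output, and matplotlib is assumed for the plot):
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# count transactions per customer per quarter
counts = (df.groupby([df['customer_id'],
                      df['date'].dt.to_period('Q')])['sales_amount']
            .count()
            .unstack()
            .fillna(0))

# (1) reindex over the full quarter range so absent quarters appear as 0
all_quarters = pd.period_range('2017Q1', '2019Q4', freq='Q')
counts = counts.reindex(columns=all_quarters, fill_value=0)

# (2) collapse counts to a 0/1 "had a transaction" flag
flags = np.sign(counts).astype(int)

# one way to visualize: number of active customers per quarter
flags.sum().plot(kind='bar')
plt.show()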

adding a new column to data frame

I'm trying to do something that should be really simple in pandas, but it seems to be anything but. I have two large dataframes.
df1 has 243 columns, which include:
   ID2  K   C   type
1  123  1.  2.  T
2  132  3.  1.  N
3  111  2.  1.  U
df2 has 121 columns, which include:
   ID3  A   B
1  123  0.  3.
2  111  2.  3.
3  132  1.  2.
df2 contains different information about the same IDs (ID2 = ID3), but in a different order.
I want to create a new column in df2 named type that matches the type column in df1: whenever an ID in df2 matches an ID in df1, it should copy that row's type (T, N or U) from df1. In other words, I need it to look like the following dataframe, but with all 121 columns from df2 plus type:
ID3 A B type
123 0. 3. T
111 2. 3. U
132 1. 2. N
I tried pd.merge and pd.join. I also tried
df2['type'] = df1['ID2'].map(df2.set_index('ID3')['type'])
but none of them works; it shows KeyError: 'ID3'.
As far as I can see, your last command is almost correct, just inverted: build the lookup from df1 (indexed by ID2) and map df2's IDs through it, not the other way around. Try this:
df2['type'] = df2['ID3'].map(df1.set_index('ID2')['type'])
join
df2.join(df1.set_index('ID2')['type'], on='ID3')
ID3 A B type
1 123 0.0 3.0 T
2 111 2.0 3.0 U
3 132 1.0 2.0 N
merge (take 1)
df2.merge(df1[['ID2', 'type']].rename(columns={'ID2': 'ID3'}))
ID3 A B type
0 123 0.0 3.0 T
1 111 2.0 3.0 U
2 132 1.0 2.0 N
merge (take 2)
df2.merge(df1[['ID2', 'type']], left_on='ID3', right_on='ID2').drop(columns='ID2')
ID3 A B type
0 123 0.0 3.0 T
1 111 2.0 3.0 U
2 132 1.0 2.0 N
map and assign
df2.assign(type=df2.ID3.map(dict(zip(df1.ID2, df1['type']))))
ID3 A B type
0 123 0.0 3.0 T
1 111 2.0 3.0 U
2 132 1.0 2.0 N
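One practical difference between these: the join, map, and assign approaches keep unmatched ID3 rows, leaving NaN in type, while both merge calls default to an inner join and drop those rows entirely; pass how='left' to merge if you need to keep every row of df2.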