Extract id along with count of other columns

I have four columns in my table:
CUSTOMER, TRANSACTION (unique), PRODUCTA, PRODUCTB
PRODUCTA and PRODUCTB are each either 0 or 1, depending on the item bought. They are never both 1, because each row corresponds to one transaction, which is for either A or B.
Now I want to extract the data so that each customer is listed along with the number of product A purchases and product B purchases they made.
select customer,count(PRODUCTA),count(PRODUCTB) from rm_saicharan_final6 group by customer
It's returning the count of all rows, including the 0s.
CUSTOMER PRODUCTA PRODUCTB
-------- -------- ---------
32444 209 209
But I want to count only the rows where the value is 1, not all of them.

COUNT counts every non-NULL value, so the 0s are included. Since the columns hold only 0 or 1, just use SUM instead:
select customer,SUM(PRODUCTA),SUM(PRODUCTB)
from rm_saicharan_final6 group by customer
SQLFiddle: http://sqlfiddle.com/#!4/ee7da/596
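
The difference between COUNT and SUM on 0/1 columns can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the asker's database; the table name `purchases`, the column `txn`, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with made-up rows; producta/productb hold only 0 or 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer INTEGER, txn INTEGER PRIMARY KEY, "
    "producta INTEGER, productb INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [(32444, 1, 1, 0), (32444, 2, 0, 1), (32444, 3, 1, 0), (7, 4, 0, 1)],
)

# COUNT counts every non-NULL value, so both columns equal the row count.
counts = conn.execute(
    "SELECT customer, COUNT(producta), COUNT(productb) "
    "FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()

# SUM adds the 0/1 flags, which yields the number of 1s per column.
sums = conn.execute(
    "SELECT customer, SUM(producta), SUM(productb) "
    "FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()

print(counts)  # [(7, 1, 1), (32444, 3, 3)]
print(sums)    # [(7, 0, 1), (32444, 2, 1)]
```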

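Not entirely sure if this will work, but if it doesn't, it may still help. A WHERE clause has to come before GROUP BY, and filtering on both columns at once would remove every row, since no row has both set to 1. Counting only the non-zero values avoids that:
select customer,count(nullif(PRODUCTA,0)),count(nullif(PRODUCTB,0)) from rm_saicharan_final6 group by customer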

Related

Finding the total revenue by aggregating

I want to produce a table with two columns, in the form (country, total_revenue).
This is what the relational model looks like:
Each entry in the orderdetails table produces revenue equal to quantityordered (a column) * price_each (also a column).
The revenue an order produces is the sum of the revenue from its orderdetails rows, but only if the order's status is shipped. The orderdetails and orders tables are related by the column ordernumber.
An order has a customer number that references the customers table, and the customers table has a country field. The total revenue for a country is the sum over all shipped orders for customers in that country.
So far I have tried first using GROUP BY (on ordernumber or customer number?) to produce a table with the orderdetails revenue and the customer number, then joining that with customers and using GROUP BY again, but I keep getting weird results.
-orderdetails table-
ordernumber quantityordered price_each
----------- --------------- ----------
1           10              2.39
1           12              1.79
2           12              1.79
3           12              1.79
-orders table-
ordernumber status    customer_num
----------- --------- ------------
1           shipped   11
1           shipped   12
2           cancelled 13
3           shipped   11
-customers table-
custom_num country
---------- -------
11         USA
12         France
13         Japan
11         USA
-Result table-
country total_revenue
------- -------------
11      1300
12      1239
13      800
11      739

Your description is a bit confusing. You write that you want the sum per country, but the table showing the desired outcome contains neither a sum nor the country.
Furthermore, you wrote that you want to exclude orders that don't have the status "shipped", but your sample outcome includes them.
This query produces the outcome you described in words, not the one you added as a table:
SELECT c.country,
       SUM(d.quantityordered * d.price_each) AS total_revenue
FROM orders o
JOIN orderdetails d ON o.ordernumber = d.ordernumber
JOIN customers c ON o.customer_num = c.custom_num
WHERE o.status = 'shipped'
GROUP BY c.country;
As you can see, you need to JOIN your tables and apply a GROUP BY on country.
A note: You could remove the WHERE clause and add its condition to a JOIN. It's possible this will reduce the execution time of your query, but it might be less readable.
A further note: You could also consider using a window function with PARTITION BY c.country. Since you didn't tag your DB type, the exact syntax for that option is unclear.
A last note: Your sample data looks really strange. Is it really intended that an order is counted for both France and the USA at the same time?
If the query above isn't what you were looking for, please review your description and fix it.
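
As a rough check of the join-then-group approach, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question, but the rows are made-up, de-duplicated sample data (an assumption, since the question's own sample rows contain duplicate keys), and SQLite is only a stand-in for the untagged database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical clean sample data: one customer per order, one row per customer.
conn.executescript("""
CREATE TABLE orderdetails (ordernumber INTEGER, quantityordered INTEGER, price_each REAL);
CREATE TABLE orders (ordernumber INTEGER PRIMARY KEY, status TEXT, customer_num INTEGER);
CREATE TABLE customers (custom_num INTEGER PRIMARY KEY, country TEXT);
INSERT INTO orderdetails VALUES (1, 10, 2.0), (1, 5, 4.0), (2, 3, 10.0), (3, 2, 5.0);
INSERT INTO orders VALUES (1, 'shipped', 11), (2, 'cancelled', 12), (3, 'shipped', 12);
INSERT INTO customers VALUES (11, 'USA'), (12, 'France');
""")

# Join detail rows to their order and customer, keep shipped orders only,
# then sum the per-line revenue for each country.
revenue = conn.execute("""
SELECT c.country, SUM(d.quantityordered * d.price_each) AS total_revenue
FROM orders o
JOIN orderdetails d ON o.ordernumber = d.ordernumber
JOIN customers c ON o.customer_num = c.custom_num
WHERE o.status = 'shipped'
GROUP BY c.country
ORDER BY c.country
""").fetchall()

print(revenue)  # [('France', 10.0), ('USA', 40.0)]
```

The cancelled order 2 contributes nothing, so France only receives the revenue of order 3.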

COUNT with multiple LEFT joins [duplicate]

I am having some trouble with a COUNT function. The problem comes from a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
  DISTINCT
  C.customer_name,
  C.product_code,
  COUNT(S1.product_code) AS s1_sales,
  COUNT(S2.product_code) AS s2_sales,
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
  customer_name, product_code
HAVING
  S1_sales > 0
  OR S2_sales > 0
The output I expect is something like this:
Customer_name Product_code Store1_weekly_sales Store2_weekly_sales
------------- ------------ ------------------- -------------------
Luigi         120012       4                   8
James         100022       6                   10
But instead, I get:
Customer_name Product_code Store1_weekly_sales Store2_weekly_sales
------------- ------------ ------------------- -------------------
Luigi         120012       290                 60
James         100022       290                 60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I use COUNT(DISTINCT ...) and take more than 1 week of data into account, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?

The cause, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases, the second store has 5 purchases. Your total count is 3 * 5 = 15, because each entry in the first store is joined to every entry for the same customer id in the second. So the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and you can see the bloat. By pre-aggregating each store in its own subquery, you get AT MOST one record per customer per store (and per product, as per your desired outcome).
select
      c.customer_name,
      AllCustProducts.Product_Code,
      coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
      coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
   from
      customer_df c
         -- now, we need all possible UNIQUE instances of
         -- a given customer and product to prevent duplicates
         -- for subsequent queries of sales per customer and store
         JOIN
         ( select distinct customerid, product_code
              from store1_df
           union
           select distinct customerid, product_code
              from store2_df ) AllCustProducts
            on c.customerid = AllCustProducts.customerid
            -- NOW, we can join to a pre-query of sales at store 1
            -- by customer id and product code. You may also want to
            -- get sum( SalesDollars ) if available, just add respectively
            -- to each sub-query below.
            LEFT JOIN
            ( select
                    s1.customerid,
                    s1.product_code,
                    count(*) as SalesEntries
                 from
                    store1_df s1
                 group by
                    s1.customerid,
                    s1.product_code ) PQStore1
               on AllCustProducts.customerid = PQStore1.customerid
               AND AllCustProducts.product_code = PQStore1.product_code
            -- now, same pre-aggregation to store 2
            LEFT JOIN
            ( select
                    s2.customerid,
                    s2.product_code,
                    count(*) as SalesEntries
                 from
                    store2_df s2
                 group by
                    s2.customerid,
                    s2.product_code ) PQStore2
               on AllCustProducts.customerid = PQStore2.customerid
               AND AllCustProducts.product_code = PQStore2.product_code
No GROUP BY or HAVING is needed in the outer query, since each pre-aggregate yields at most 1 record per unique combination. As for filtering by date range, I would just add a WHERE clause inside each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
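
The 3 * 5 bloat and the effect of pre-aggregating can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database. The `customerid` column and the sample rows are assumptions for illustration, and the pre-aggregates are grouped per customer only (not per product) to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One hypothetical customer with 3 purchases in store 1 and 5 in store 2.
conn.executescript("""
CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER);
CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER);
INSERT INTO customer_df VALUES (1, 'Luigi');
INSERT INTO store1_df VALUES (1, 120012), (1, 120012), (1, 120012);
INSERT INTO store2_df VALUES (1, 120012), (1, 120012), (1, 120012),
                             (1, 120012), (1, 120012);
""")

# Joining both detail tables directly pairs every store-1 row with every
# store-2 row, so both counts become 3 * 5 = 15.
naive = conn.execute("""
SELECT c.customer_name, COUNT(s1.product_code), COUNT(s2.product_code)
FROM customer_df c
LEFT JOIN store1_df s1 ON s1.customerid = c.customerid
LEFT JOIN store2_df s2 ON s2.customerid = c.customerid
GROUP BY c.customer_name
""").fetchall()

# Aggregating each store first leaves at most one row per customer per
# store, so the joins can no longer multiply rows.
pre_aggregated = conn.execute("""
SELECT c.customer_name, COALESCE(p1.sales, 0), COALESCE(p2.sales, 0)
FROM customer_df c
LEFT JOIN (SELECT customerid, COUNT(*) AS sales
           FROM store1_df GROUP BY customerid) p1
       ON p1.customerid = c.customerid
LEFT JOIN (SELECT customerid, COUNT(*) AS sales
           FROM store2_df GROUP BY customerid) p2
       ON p2.customerid = c.customerid
""").fetchall()

print(naive)           # [('Luigi', 15, 15)]
print(pre_aggregated)  # [('Luigi', 3, 5)]
```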

PostgreSql - How to create conditional column with the filter on another column?

I want to add one more column that indicates whether the customer has sold at least one product or not.
Data example:
ProductID Customer Status
1         John     Not sold
2         John     Not Sold
3         John     Sold
My expected result:
ProductID Customer Status   Sold_at_least_1
1         John     Not sold Yes
2         John     Not Sold Yes
3         John     Sold     Yes
4         Andrew   Not Sold No
5         Andrew   Not Sold No
6         Brandon  Sold     Yes
This is just example data. Sorry for any inconvenience, as I am unable to extract the real data. Any help is appreciated.

You can do a window count of records of the same customer that have status = 'Sold' in a case expression:
select
    t.*,
    case when sum( (status = 'Sold')::int ) over(partition by customer) >= 1
        then 'Yes'
        else 'No'
    end as sold_at_least_1
from mytable t
NB: this does not magically create new records (such as the extra rows shown in your expected result). The query returns as many records as there are in the table, with an additional column that indicates whether each customer has at least one sold item in the table.
Here is a demo provided by VBokšić (thanks).

Another option is to use bool_or() as a window function. If you can live with a boolean column rather than a varchar with Yes/No, this makes the expression even simpler:
select productid, customer, status,
bool_or(status = 'Sold') over (partition by customer) as sold_at_least_one
from mytable;
Online example: https://rextester.com/NDN54253
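
For a quick local check of the windowed approach, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (this assumes a bundled SQLite of version 3.25 or later, which added window functions). The `::int` cast is dropped because SQLite comparisons already yield 0 or 1. The rows are the question's expected-result data with the status spelling normalized, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (productid INTEGER, customer TEXT, status TEXT);
INSERT INTO mytable VALUES
  (1, 'John', 'Not sold'), (2, 'John', 'Not sold'), (3, 'John', 'Sold'),
  (4, 'Andrew', 'Not sold'), (5, 'Andrew', 'Not sold'), (6, 'Brandon', 'Sold');
""")

# The window SUM counts the 'Sold' rows of each customer and repeats that
# count on every row of the same customer, without collapsing rows.
rows = conn.execute("""
SELECT productid, customer,
       CASE WHEN SUM(status = 'Sold') OVER (PARTITION BY customer) >= 1
            THEN 'Yes' ELSE 'No' END AS sold_at_least_1
FROM mytable
ORDER BY productid
""").fetchall()

for row in rows:
    print(row)
```

Every John and Brandon row comes back with 'Yes' and every Andrew row with 'No', and the row count stays at six.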

SQL returning different rows based on certain conditions

I'm not sure if this is possible in SQL, but my combinations of CASE statements and WHERE clauses aren't working. This is some test data in the shape I'm using:
It shows the items in an order. In this order, the customer has amended the order for pens and increased the amount to 15. So the original order item, id 123, is marked as superseded and a new item row, id 158, is created with its PreviousVersion column populated with the previous item's ItemId. AmendedStatusId is the status of the amended item. So in the example, ItemId 158 is the updated version of ItemId 123, and the extra pens haven't been paid for, as they are AwaitingApproval. I know it's not the best laid out data, but it's what I have to work with.
What I'm trying to do is select the old item while the amended item hasn't been paid for, so in this example return ItemIds 123 and 124. When the AmendedStatusId of ItemId 123 is updated to Paid, I would want to return ItemIds 124 and 158. Is this possible?
Thanks in advance :)

This sort of structure should get you started.
select isnull(paidItemId, unpaidItemId) itemId
from yourTables
left join (subquery to identify paid items) paidItems on something
left join (subquery to identify unpaid items) unpaidItems on something
etc

SQL query for child table summary and generalization

I have 4 tables, as in the diagram below.
I want a summary query for the Institution table, where the result contains only:
InstitutionType ProductName Quantity
For example, sample data of the Institution table:
Id Name Address InstitutionTypeId
1  aaa  ny132   1001
2  bbb  dx23    1001
3  ccc  bn33    1002
And the InstitutionProduct table looks like this:
Id ProductId Quantity InstitutionId
1  1000      120      1
2  1000      100      2
3  1000      50       3
Then I want a query that outputs the total quantity of a given product per institution type. The sample output would look like this:
InstitutionTypeId ProductId Quantity
1001              1000      220
1002              1000      50
So I want to group the institutions by type and aggregate the product quantity across all institutions in each type group.
I tried to use the GROUP BY clause, but with the product quantity not being a grouping element, it results in an error.

SELECT
  Institution.InstitutionTypeID,
  InstitutionProduct.ProductID,
  SUM(InstitutionProduct.Quantity)
FROM
  Institution
LEFT JOIN
  InstitutionProduct
    ON InstitutionProduct.InstitutionID = Institution.ID
GROUP BY
  Institution.InstitutionTypeID,
  InstitutionProduct.ProductID

If you are querying with GROUP BY, every selected field must either be wrapped in an aggregate function or appear in the GROUP BY list. The reason is that GROUP BY returns exactly one row per distinct grouping value, so an ungrouped field would be ambiguous whenever it has more than one value within a group. Even though this might not be the case for your dataset, the query engine cannot know this, and raises an error.
The solution is to apply an aggregate to every non-grouping field, the aggregates being (among others) average (AVG), sum (SUM), minimum (MIN), and maximum (MAX). This would lead to something like:
SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity) AS Quantity
FROM Institution i LEFT JOIN InstitutionProduct ip
ON ip.InstitutionID = i.ID
GROUP BY i.InstitutionTypeID, ip.ProductID
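
The grouping rule can be tried against the question's sample rows with a short sketch. It uses Python's built-in sqlite3 module as a stand-in database; the column spelling `InstitutionTypeId` / `InstitutionId` and the choice to group by product rather than by institution are assumptions made to match the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question.
conn.executescript("""
CREATE TABLE Institution (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                          InstitutionTypeId INTEGER);
CREATE TABLE InstitutionProduct (Id INTEGER PRIMARY KEY, ProductId INTEGER,
                                 Quantity INTEGER, InstitutionId INTEGER);
INSERT INTO Institution VALUES
  (1, 'aaa', 'ny132', 1001), (2, 'bbb', 'dx23', 1001), (3, 'ccc', 'bn33', 1002);
INSERT INTO InstitutionProduct VALUES
  (1, 1000, 120, 1), (2, 1000, 100, 2), (3, 1000, 50, 3);
""")

# Every selected column is either in GROUP BY or wrapped in an aggregate.
totals = conn.execute("""
SELECT i.InstitutionTypeId, ip.ProductId, SUM(ip.Quantity) AS Quantity
FROM Institution i
LEFT JOIN InstitutionProduct ip ON ip.InstitutionId = i.Id
GROUP BY i.InstitutionTypeId, ip.ProductId
ORDER BY i.InstitutionTypeId, ip.ProductId
""").fetchall()

print(totals)  # [(1001, 1000, 220), (1002, 1000, 50)]
```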
