Sum of different rows - sql

I am using an ERP that has a semi-"graphical" SQL-based report writer.
The way our software works is that products can be "Masters" or "Members" (or neither, but let's ignore that for now). Essentially, a product can be the "parent" of a group or a "child" of another product's group. So say you have the following table:
| Product | ProductMaster | Quantity |
|---------|---------------|----------|
| A       |               | 200      |
| A1      | A             | 50       |
| A2      | A             | 50       |
| A3      | A             | 25       |
Product A is the Master, the others are members of A.
What I would like is one row per Master product that combines the quantities of the Master and all its Members (in this case, one row with Quantity = 325), and I can't figure out how to do it. I can get the sum of all the members by grouping by ProductMaster, but then I'm stuck.

Assuming that there is only one level of depth, you can use coalesce():
select coalesce(ProductMaster, Product) as ProductMaster, sum(Quantity)
from t
group by coalesce(ProductMaster, Product);
Note: This assumes that blank means NULL. If it means something else, then you'll want a CASE instead of COALESCE().
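To see the grouping in action, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory table named t (the table name comes from the answer; SQLite is an assumed stand-in for the ERP's database). The second query shows one way to handle a blank master stored as an empty or space-only string: NULLIF(TRIM(...), '') is swapped in as a compact alternative to the CASE mentioned in the note.

```python
import sqlite3

# In-memory SQLite database standing in for the ERP's table "t".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Product TEXT, ProductMaster TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", None, 200), ("A1", "A", 50), ("A2", "A", 50), ("A3", "A", 25)],
)

# Blank master stored as NULL: COALESCE falls back to the product itself,
# so the master and its members land in the same group.
rows_null = conn.execute(
    """
    SELECT COALESCE(ProductMaster, Product) AS ProductMaster, SUM(Quantity)
    FROM t
    GROUP BY COALESCE(ProductMaster, Product)
    """
).fetchall()
print(rows_null)  # [('A', 325)]

# Assumption: "blank" might instead be an empty (or space-only) string.
# NULLIF(TRIM(...), '') turns such values into NULL before COALESCE sees them.
conn.execute("UPDATE t SET ProductMaster = '' WHERE Product = 'A'")
rows_blank = conn.execute(
    """
    SELECT COALESCE(NULLIF(TRIM(ProductMaster), ''), Product) AS ProductMaster,
           SUM(Quantity)
    FROM t
    GROUP BY COALESCE(NULLIF(TRIM(ProductMaster), ''), Product)
    """
).fetchall()
print(rows_blank)  # [('A', 325)]
```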

Related

How to check if two columns are consistent in a table. sql

I'm struggling to phrase the question, so I will just give an example table. Basically, if I have two columns, person and the brand of car they insure, how can I check whether the same person consistently insures the same brand of car?
| person | brand  |
|--------|--------|
| 0      | Toyota |
| 0      | Mazda  |
| 1      | Toyota |
| 1      | Toyota |
| 2      | Honda  |
| 2      | Honda  |
| 3      | Ford   |
So in this table I want to filter out person 0, because he insures both Toyotas and Mazdas, whereas each of the other people insures only one brand of car.
Thanks.
If you just want the people and their single brand, you can use aggregation:
select person, min(brand) as the_brand
from t
group by person
having min(brand) = max(brand);
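A small sketch, using Python's sqlite3 as an assumed engine, that runs the answer's query against the sample data, plus an equivalent COUNT(DISTINCT brand) = 1 check. Both forms ignore NULL brands, since MIN, MAX and COUNT(DISTINCT ...) skip NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (person INTEGER, brand TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(0, "Toyota"), (0, "Mazda"), (1, "Toyota"), (1, "Toyota"),
     (2, "Honda"), (2, "Honda"), (3, "Ford")],
)

# The answer's query: a person is consistent when the smallest and the
# largest brand in their group are the same value.
consistent = conn.execute(
    """
    SELECT person, MIN(brand) AS the_brand
    FROM t
    GROUP BY person
    HAVING MIN(brand) = MAX(brand)
    ORDER BY person
    """
).fetchall()
print(consistent)  # [(1, 'Toyota'), (2, 'Honda'), (3, 'Ford')]

# Equivalent formulation: exactly one distinct brand per person.
consistent_distinct = conn.execute(
    """
    SELECT person
    FROM t
    GROUP BY person
    HAVING COUNT(DISTINCT brand) = 1
    ORDER BY person
    """
).fetchall()
print(consistent_distinct)  # [(1,), (2,), (3,)]
```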

Select rows from a filtered portion of Table A where a column matches a relationship with a column from the row in Table B that matches by ID

I want to get all rows in a table where one column matches a relationship with the value of the column in the row in a different table that has the same value of another column.
Concretely, I have two tables, orders and product_info, that I'm accessing through Amazon Redshift.
Orders
| ID | Date     | Amount | Region |
|----|----------|--------|--------|
| 1  | 2019/4/1 | $120   | A      |
| 1  | 2019/4/4 | $100   | A      |
| 2  | 2019/4/2 | $50    | A      |
| 3  | 2019/4/6 | $70    | B      |
The partition keys of orders are region and date.
Product Information
| ID | Release Date | Region |
| ---- | ------------ | ------ |
| 1 | 2019/4/2 | A |
| 2 | 2019/4/3 | A |
| 3 | 2019/4/5 | B |
The primary key of product information is id, and the partition key is region.
I want to get all rows from Orders in region A where the date of the row is greater than the release date value in product information for that ID.
So in this case it should return just one row:
| 1 | 2019/4/4 | $100 | A |
I tried doing
select *
from orders
INNER JOIN product_info ON orders.date>product_info.release_date
AND orders.id=product_info.id
AND orders.region='A'
AND product_info.region='A'
limit 10
The problem is that this query was absurdly slow (cancelled it after 10 minutes). The tables are extremely large, and I have a feeling it was scanning the entire table without restricting it to region first (in reality I have other filters in addition to region that I want to apply to the list of IDs before I do the inner join, but I've limited it to only region for the sake of simplifying the question).
How can I efficiently write this type of query?
The best way to make an SQL query faster is to exclude rows as soon as possible.
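So, rather than putting filters like orders.region = 'A' in the JOIN condition, move them to the WHERE clause. For an INNER JOIN many optimizers treat the two placements the same way, but keeping only the join keys in ON and the filters in WHERE makes it clear which rows should be eliminated before they are joined.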
Also, make the JOIN condition as simple as possible so that the database can optimize the comparison.
Try something like this:
SELECT *
FROM orders
INNER JOIN product_info ON orders.id = product_info.id
WHERE orders.region = 'A'
AND product_info.region = 'A'
AND orders.date > product_info.release_date
Any further optimization would require considering the DISTKEY and SORTKEY of the Redshift tables (preferably a DISTKEY of id and a SORTKEY of date).
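The result of the rewritten query can be checked on the sample data with a sketch like the one below. It uses Python's sqlite3 as a stand-in for Redshift, so it only demonstrates correctness, not Redshift performance. Dates are assumed to be stored as ISO YYYY-MM-DD strings (so string comparison is chronological) and Amount as integer dollars.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, date TEXT, amount INTEGER, region TEXT)")
conn.execute("CREATE TABLE product_info (id INTEGER, release_date TEXT, region TEXT)")
# Sample rows from the question, with dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "2019-04-01", 120, "A"), (1, "2019-04-04", 100, "A"),
     (2, "2019-04-02", 50, "A"), (3, "2019-04-06", 70, "B")],
)
conn.executemany(
    "INSERT INTO product_info VALUES (?, ?, ?)",
    [(1, "2019-04-02", "A"), (2, "2019-04-03", "A"), (3, "2019-04-05", "B")],
)

# Join on the key only; region filters and the date comparison go in WHERE.
rows = conn.execute(
    """
    SELECT orders.id, orders.date, orders.amount, orders.region
    FROM orders
    INNER JOIN product_info ON orders.id = product_info.id
    WHERE orders.region = 'A'
      AND product_info.region = 'A'
      AND orders.date > product_info.release_date
    """
).fetchall()
print(rows)  # [(1, '2019-04-04', 100, 'A')]
```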

How to combine two tables allocating Sold amounts vs Demand without loops/cursor

My task is to combine two tables in a specific way. I have a table Demands that contains demands for some goods (tovar). Each record has its own ID, Tovar, Date of demand and Amount. I have another table, Unloads, that contains unloads of tovar. Each record has its own ID, Tovar, Order of unload and Amount. Demands and Unloads do not correspond to each other one-to-one, and the amounts in demands and unloads are not exactly equal. One demand may be for 10 units and be covered by two unloads of 4 and 6 units. Or two demands may be for 3 and 5 units, and there may be one unload of 11 units.
The task is to get a table that shows how demands are covered by unloads. I have a solution (SQL Fiddle), but I think there is a better one. Can anybody tell me how such tasks are usually solved?
What I have:
Demands:
| DemandNumber | Tovar | Amount | Order |
|--------------|-------|--------|-------|
| Demand#1     | Meat  | 2      | 1     |
| Demand#2     | Meat  | 3      | 2     |
| Demand#3     | Milk  | 6      | 1     |
| Demand#4     | Eggs  | 1      | 1     |
| Demand#5     | Eggs  | 5      | 2     |
| Demand#6     | Eggs  | 3      | 3     |
Sales:
| SaleNumber | Tovar | Amount | Order |
|------------|-------|--------|-------|
| Sale#1     | Meat  | 6      | 1     |
| Sale#2     | Milk  | 2      | 1     |
| Sale#3     | Milk  | 1      | 2     |
| Sale#4     | Eggs  | 2      | 1     |
| Sale#5     | Eggs  | 1      | 2     |
| Sale#6     | Eggs  | 4      | 3     |
What I want to receive
| DemandNumber | SaleNumber | Tovar | Amount |
|--------------|------------|-------|--------|
| Demand#1     | Sale#1     | Meat  | 2      |
| Demand#2     | Sale#1     | Meat  | 3      |
| Demand#3     | Sale#2     | Milk  | 2      |
| Demand#3     | Sale#3     | Milk  | 1      |
| Demand#4     | Sale#4     | Eggs  | 1      |
| Demand#5     | Sale#4     | Eggs  | 1      |
| Demand#5     | Sale#5     | Eggs  | 1      |
| Demand#5     | Sale#6     | Eggs  | 3      |
| Demand#6     | Sale#6     | Eggs  | 1      |
Here is additional explanation from the author's comment:
Demand#1 needs 2 Meat and it can take them from Sale#1.
Demand#2 needs 3 Meat and can take them from Sale#1.
Demand#3 needs 6 Milk, but there are only 2 Milk in Sale#2 and 1 Milk in Sale#3, so we show only the available amounts.
And so on.
The field Order in the example determines the order of calculations. We have to process Demands according to their Order: Demand#1 must be processed before Demand#2. Sales must also be allocated according to their Order number: we cannot assign eggs from a sale if there are sales of eggs with a lower Order that still have unallocated eggs.
The only way I can get this is by using loops. Is it possible to avoid loops and solve this task with set-based T-SQL only?
If the Amount values are integers and not too large (not millions), then I'd use a table of numbers to generate as many rows as the value of each Amount.
Here is a good article describing how to generate it.
Then it is easy to join Demand with Sale and group and sum as needed.
Otherwise, a plain, straightforward cursor (in fact, two cursors) would be simple to implement, easy to understand, and O(n) in complexity. If Amounts are small, the set-based variant is likely to be faster than the cursor. If Amounts are large, the cursor may be faster. You need to measure performance with actual data.
Here is a query that uses a table of numbers. To understand how it works run each query in the CTE separately and examine its output.
SQLFiddle
WITH
CTE_Demands
AS
(
    SELECT
        D.DemandNumber
        ,D.Tovar
        ,ROW_NUMBER() OVER (PARTITION BY D.Tovar ORDER BY D.SortOrder, CA_D.Number) AS rn
    FROM
        Demands AS D
        CROSS APPLY
        (
            SELECT TOP(D.Amount) Numbers.Number
            FROM Numbers
            ORDER BY Numbers.Number
        ) AS CA_D
)
,CTE_Sales
AS
(
    SELECT
        S.SaleNumber
        ,S.Tovar
        ,ROW_NUMBER() OVER (PARTITION BY S.Tovar ORDER BY S.SortOrder, CA_S.Number) AS rn
    FROM
        Sales AS S
        CROSS APPLY
        (
            SELECT TOP(S.Amount) Numbers.Number
            FROM Numbers
            ORDER BY Numbers.Number
        ) AS CA_S
)
SELECT
    CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
    ,CTE_Demands.Tovar
    ,COUNT(*) AS Amount
FROM
    CTE_Demands
    INNER JOIN CTE_Sales ON
        CTE_Sales.Tovar = CTE_Demands.Tovar
        AND CTE_Sales.rn = CTE_Demands.rn
GROUP BY
    CTE_Demands.Tovar
    ,CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
ORDER BY
    CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
;
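For readers without SQL Server, here is a sketch of the same numbers-table technique in SQLite via Python's sqlite3. It is a translation, not the answer's fiddle: CROSS APPLY ... TOP(n) has no SQLite equivalent, so each row is joined to a recursively generated Numbers CTE on Number <= Amount instead, and the Order column is assumed to be named SortOrder, as in the query above. It needs SQLite 3.25 or later for ROW_NUMBER(), and the upper bound of 1000 is an arbitrary assumption that must exceed the largest Amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Demands (DemandNumber TEXT, Tovar TEXT, Amount INTEGER, SortOrder INTEGER);
    CREATE TABLE Sales (SaleNumber TEXT, Tovar TEXT, Amount INTEGER, SortOrder INTEGER);
    INSERT INTO Demands VALUES
        ('Demand#1','Meat',2,1), ('Demand#2','Meat',3,2), ('Demand#3','Milk',6,1),
        ('Demand#4','Eggs',1,1), ('Demand#5','Eggs',5,2), ('Demand#6','Eggs',3,3);
    INSERT INTO Sales VALUES
        ('Sale#1','Meat',6,1), ('Sale#2','Milk',2,1), ('Sale#3','Milk',1,2),
        ('Sale#4','Eggs',2,1), ('Sale#5','Eggs',1,2), ('Sale#6','Eggs',4,3);
    """
)

query = """
WITH RECURSIVE
Numbers(Number) AS (            -- numbers 1..1000 generated on the fly
    SELECT 1 UNION ALL SELECT Number + 1 FROM Numbers WHERE Number < 1000
),
CTE_Demands AS (                -- one row per demanded unit, numbered per Tovar
    SELECT D.DemandNumber, D.Tovar,
           ROW_NUMBER() OVER (PARTITION BY D.Tovar ORDER BY D.SortOrder, N.Number) AS rn
    FROM Demands AS D JOIN Numbers AS N ON N.Number <= D.Amount
),
CTE_Sales AS (                  -- one row per sold unit, numbered per Tovar
    SELECT S.SaleNumber, S.Tovar,
           ROW_NUMBER() OVER (PARTITION BY S.Tovar ORDER BY S.SortOrder, N.Number) AS rn
    FROM Sales AS S JOIN Numbers AS N ON N.Number <= S.Amount
)
-- Unit k of a demand is matched with unit k of the sales; counting gives amounts.
SELECT CTE_Demands.DemandNumber, CTE_Sales.SaleNumber, CTE_Demands.Tovar, COUNT(*) AS Amount
FROM CTE_Demands
JOIN CTE_Sales ON CTE_Sales.Tovar = CTE_Demands.Tovar AND CTE_Sales.rn = CTE_Demands.rn
GROUP BY CTE_Demands.Tovar, CTE_Demands.DemandNumber, CTE_Sales.SaleNumber
ORDER BY CTE_Demands.DemandNumber, CTE_Sales.SaleNumber
"""
allocations = conn.execute(query).fetchall()
for row in allocations:
    print(row)
```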
Having said all this, it is usually better to perform this kind of processing on the client using a procedural programming language. You still have to transmit all rows from Demands and Sales to the client, so joining the tables on the server doesn't reduce the number of bytes that must go over the network. In fact, it increases it, because an original row may be split into several rows.
This kind of processing is sequential in nature, not set-based, so it is easy to do with arrays, but tricky in SQL.
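As an illustration of the array-based approach, here is a minimal Python sketch of the "two cursors" idea, using a hypothetical function allocate: per Tovar, walk demands and sales in Order sequence and advance a pointer through the sales, taking min(remaining demand, remaining sale) at each step. Records are assumed to be (number, tovar, amount, order) tuples; after sorting, the walk is linear in the number of rows.

```python
from collections import defaultdict


def allocate(demands, sales):
    """Allocate sold units to demands in Order sequence, separately per Tovar.

    Hypothetical helper: demands and sales are lists of
    (number, tovar, amount, order) tuples. Returns a list of
    (demand_number, sale_number, tovar, amount) tuples.
    """
    demands_by_tovar = defaultdict(list)
    sales_by_tovar = defaultdict(list)
    for rec in sorted(demands, key=lambda r: r[3]):
        demands_by_tovar[rec[1]].append(rec)
    for rec in sorted(sales, key=lambda r: r[3]):
        sales_by_tovar[rec[1]].append(rec)

    result = []
    for tovar, tovar_demands in demands_by_tovar.items():
        tovar_sales = sales_by_tovar.get(tovar, [])
        si = 0  # "cursor" over the sales of this tovar
        sale_left = tovar_sales[0][2] if tovar_sales else 0
        for demand_number, _, demand_amount, _ in tovar_demands:
            need = demand_amount
            # Take from the current sale until the demand is met or sales run out.
            while need > 0 and si < len(tovar_sales):
                take = min(need, sale_left)
                if take > 0:
                    result.append((demand_number, tovar_sales[si][0], tovar, take))
                need -= take
                sale_left -= take
                if sale_left == 0:  # current sale exhausted: advance the cursor
                    si += 1
                    if si < len(tovar_sales):
                        sale_left = tovar_sales[si][2]
    return result


demands = [("Demand#1", "Meat", 2, 1), ("Demand#2", "Meat", 3, 2),
           ("Demand#3", "Milk", 6, 1), ("Demand#4", "Eggs", 1, 1),
           ("Demand#5", "Eggs", 5, 2), ("Demand#6", "Eggs", 3, 3)]
sales = [("Sale#1", "Meat", 6, 1), ("Sale#2", "Milk", 2, 1),
         ("Sale#3", "Milk", 1, 2), ("Sale#4", "Eggs", 2, 1),
         ("Sale#5", "Eggs", 1, 2), ("Sale#6", "Eggs", 4, 3)]
result = allocate(demands, sales)
for row in result:
    print(row)
```

Unmet demand (here, 2 of the 3 eggs for Demand#6) simply produces no row, matching the expected output in the question.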
I have no idea what your requirements are or what the business rules are or what the goals are but I can say this -- you are doing it wrong.
This is SQL. In SQL you do not do loops. In SQL you work with sets. Sets are defined by select statements.
If this problem is not resolved with a select statement (maybe with sub-selects) then you probably want to implement this in another way. (C# program? Some other ETL system?).
However, I can also say there is probably a way to do this with a single select statement, but you have not given enough information for me to know what that statement is. Saying that you have a working example, and that this should be enough, does not work on this site: this site is about answering questions about problems, and you don't have a problem, you have some code.
Rephrase the question with inputs, expected outputs, what you have tried, and what your question is. This is covered well in the FAQ.
Or if you have working code you want reviewed, it may be appropriate for the code review site.
I see two additional possible ways:
1. For 'advanced' data processing and calculations, you can use cursors.
2. You can use SELECT with a CASE construction.

How to Count the same field with different criteria on the same Query

I have a database like this
| Contact | Incident | OpenTime | Country | Product |
|---------|----------|----------|---------|---------|
| C1      |          | 1/1/2014 | MX      | Office  |
| C2      | I1       | 2/2/2014 | BR      | SAP     |
| C3      |          | 3/2/2014 | US      | SAP     |
| C4      | I2       | 3/3/2014 | US      | SAP     |
| C5      | I3       | 3/4/2014 | US      | Office  |
| C6      |          | 3/5/2014 | TW      | SAP     |
I want to run a query with criteria on country and open time, and I want to get back something like this:
| Product | Contacts with no Incidents | Incidents |
|---------|----------------------------|-----------|
| Office  | 1                          | 1         |
| SAP     | 2                          | 2         |
I can easily get one part to work with a query like
SELECT Product, COUNT(Contact)
FROM database
WHERE <criteria> AND Incident IS NULL -- (or IS NOT NULL), depending on the column
GROUP BY Product
What I am struggling to do is count the rows where Incident is Null and the rows where Incident is not Null side by side, in the result of one query, as in the example above.
I have tried the following
SELECT Product,
(SELECT COUNT(Contact) FROM database WHERE Incident IS NULL) AS Contact,
(SELECT COUNT(Contact) FROM database WHERE Incident IS NOT NULL) AS Incident
FROM database
WHERE <criteria>
GROUP BY Product
The issue I have with the above statement is that whatever criteria I use in the "main" SELECT are ignored by the nested SELECTs.
I have tried using UNION ALL as well, but did not manage to make it work.
Ultimately I resolved it with this approach: I counted the total contacts per product, counted the number of incidents, and added a calculated field with the difference:
SELECT Product, COUNT(Contact) AS Total, COUNT(Incident) AS Incidents,
COUNT(Contact) - COUNT(Incident) AS OnlyContact
FROM Database
WHERE <criteria>
GROUP BY Product
Although I made it work, I am still sure that there is a more elegant approach.
How can I retrieve the different counting on the same column with different count criteria in one query?
Just use conditional aggregation:
SELECT Product,
SUM(IIF(incident is not null, 1, 0)) as incidents,
SUM(IIF(incident is null, 1, 0)) as noincidents
FROM database
WHERE criterias
GROUP BY Product;
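A sketch of the conditional aggregation in SQLite via Python's sqlite3, with the table assumed to be named contacts and blank incidents stored as NULL. SQLite is only a convenient engine here; CASE replaces IIF because CASE is portable, while IIF is the Access/SQL Server spelling. Because the WHERE filter (here a hypothetical optional country parameter) is applied once to the outer query, both counts respect it, which is what the nested sub-selects failed to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (Contact TEXT, Incident TEXT, OpenTime TEXT,"
    " Country TEXT, Product TEXT)"
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?)",
    [("C1", None, "1/1/2014", "MX", "Office"),
     ("C2", "I1", "2/2/2014", "BR", "SAP"),
     ("C3", None, "3/2/2014", "US", "SAP"),
     ("C4", "I2", "3/3/2014", "US", "SAP"),
     ("C5", "I3", "3/4/2014", "US", "Office"),
     ("C6", None, "3/5/2014", "TW", "SAP")],
)

# One pass over the filtered rows; each CASE turns a row into 1 or 0.
QUERY = """
SELECT Product,
       SUM(CASE WHEN Incident IS NULL THEN 1 ELSE 0 END) AS noincidents,
       SUM(CASE WHEN Incident IS NOT NULL THEN 1 ELSE 0 END) AS incidents
FROM contacts
WHERE (? IS NULL OR Country = ?)   -- optional country filter
GROUP BY Product
ORDER BY Product
"""
all_rows = conn.execute(QUERY, (None, None)).fetchall()
us_rows = conn.execute(QUERY, ("US", "US")).fetchall()
print(all_rows)  # [('Office', 1, 1), ('SAP', 2, 2)]
print(us_rows)   # [('Office', 0, 1), ('SAP', 1, 1)]
```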
Possibly a very MS Access-specific solution would suit:
TRANSFORM Count(tmp.Contact) AS CountOfContact
SELECT tmp.Product
FROM tmp
GROUP BY tmp.Product
PIVOT IIf(Trim([Incident] & "")="","No Incident","Incident");
The expression IIf(Trim([Incident] & "")="", ...) covers all the possibilities: a zero-length string, Null, and a space-filled value.
tmp is the name of the table.

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple selections (e.g. housing, rent, food, water). If multiple items are selected, they are stored in a field called Needs, separated by commas.
I have created a report ordered by the person's needs. The people who have only one need are sorted correctly, but the people who have multiple needs are sorted by the exact string passed to the database (e.g. housing, rent, food, water), which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name AS user_name, needs.name AS need_name
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
INNER JOIN needs ON (needs.need_id = users_needs.need_id)
ORDER BY needs.name;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need_id) AS number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
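A runnable sketch of the fully normalized design (needs plus a users_needs link table keyed by need_id), using Python's sqlite3 as an assumed engine and the sample rows from the tables above. The joins go through users_needs to reach needs, and the LEFT JOIN keeps users with no needs (clint) at a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE needs (need_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_needs (need_id INTEGER REFERENCES needs(need_id),
                              user_id INTEGER REFERENCES users(user_id));
    INSERT INTO users VALUES (1, 'joe'), (2, 'peter'), (3, 'steve'), (4, 'clint');
    INSERT INTO needs VALUES (1, 'housing'), (2, 'water'), (3, 'food'), (4, 'rent');
    INSERT INTO users_needs VALUES (1, 1), (2, 1), (3, 1), (1, 2), (4, 2), (2, 2), (1, 3);
    """
)

# One row per (user, need), sorted by need name (then user name for stability).
by_need = conn.execute(
    """
    SELECT users.name AS user_name, needs.name AS need_name
    FROM users
    JOIN users_needs ON users_needs.user_id = users.user_id
    JOIN needs ON needs.need_id = users_needs.need_id
    ORDER BY needs.name, users.name
    """
).fetchall()

# Needs per user; COUNT of a column ignores the NULLs produced by the LEFT JOIN.
need_counts = conn.execute(
    """
    SELECT users.name, COUNT(users_needs.need_id) AS number_of_needs
    FROM users
    LEFT JOIN users_needs ON users_needs.user_id = users.user_id
    GROUP BY users.user_id, users.name
    ORDER BY number_of_needs, users.name
    """
).fetchall()
print(by_need)
print(need_counts)  # [('clint', 0), ('steve', 1), ('joe', 3), ('peter', 3)]
```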
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the number of commas plus one, i.e. the length difference plus one:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
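The counting formula can be tried in SQLite, where LENGTH plays the role of Len. This is a sketch with a hypothetical people table; note that a NULL Needs yields NULL, and an empty string would wrongly yield 1, so such rows may need separate handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, Needs TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("joe", "housing, rent, food, water"), ("peter", "housing"), ("clint", None)],
)

# Removing the commas shortens the string by the number of commas;
# the number of items is that count plus one.
rows = conn.execute(
    """
    SELECT name,
           LENGTH(Needs) - LENGTH(REPLACE(Needs, ',', '')) + 1 AS number_of_needs
    FROM people
    ORDER BY name
    """
).fetchall()
print(rows)  # [('clint', None), ('joe', 4), ('peter', 1)]
```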
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .
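If the data cannot be normalized right away, the comma-delimited field can be split into one row per need at query time. The sketch below uses a recursive CTE in SQLite via Python's sqlite3; it is a different technique from the Oracle posts mentioned above, and the people table and its contents are hypothetical. Each need then counts as one occurrence, which is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, Needs TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("joe", "housing, rent, food, water"), ("peter", "water,housing"), ("steve", "food")],
)

SPLIT_CTE = """
WITH RECURSIVE split(name, need, rest) AS (
    -- seed: nothing taken yet; a trailing comma makes every item end with ','
    SELECT name, NULL, Needs || ',' FROM people WHERE Needs IS NOT NULL
    UNION ALL
    -- take the text before the first comma, keep the remainder for the next step
    SELECT name,
           TRIM(SUBSTR(rest, 1, INSTR(rest, ',') - 1)),
           SUBSTR(rest, INSTR(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
"""

# One row per (person, need), sorted by need.
pairs = conn.execute(
    SPLIT_CTE + "SELECT name, need FROM split "
    "WHERE need IS NOT NULL AND need <> '' ORDER BY need, name"
).fetchall()

# Each occurrence of a need counts as 1.
need_totals = conn.execute(
    SPLIT_CTE + "SELECT need, COUNT(*) FROM split "
    "WHERE need IS NOT NULL AND need <> '' GROUP BY need ORDER BY need"
).fetchall()
print(pairs)
print(need_totals)  # [('food', 2), ('housing', 2), ('rent', 1), ('water', 2)]
```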