Cartesian product and WHERE clause issue

Let us take a simple table:
CREATE TABLE dbo.test
(
c1 INT
)
INSERT INTO test (c1) VALUES (1)
INSERT INTO test (c1) VALUES (2)
INSERT INTO test (c1) VALUES (3)
Next, calculate a SUM:
SELECT SUM(t1.c1) FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
The output is 6. Simple and easy.
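To see where the 6 comes from, here is a small sketch that re-runs the first query through Python's built-in sqlite3 module. SQLite stands in for Sybase here (an assumption for illustration; the dbo schema prefix is dropped). It shows that the comma join produces 3 x 3 = 9 rows, of which WHERE t2.c1 = 1 keeps three.

```python
import sqlite3

# In-memory SQLite database standing in for Sybase (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (c1 INT)")
conn.executemany("INSERT INTO test (c1) VALUES (?)", [(1,), (2,), (3,)])

# The comma join is a Cartesian product: every t1 row paired with every t2 row.
cross_rows = conn.execute("SELECT COUNT(*) FROM test AS t1, test AS t2").fetchone()[0]

# WHERE t2.c1 = 1 keeps exactly one t2 row per t1 row.
filtered = conn.execute(
    "SELECT t1.c1, t2.c1 FROM test AS t1, test AS t2 WHERE t2.c1 = 1 ORDER BY t1.c1"
).fetchall()

# SUM is computed only over the rows that survive the WHERE clause.
total = conn.execute(
    "SELECT SUM(t1.c1) FROM test AS t1, test AS t2 WHERE t2.c1 = 1"
).fetchone()[0]

print(cross_rows, filtered, total)  # 9 [(1, 1), (2, 1), (3, 1)] 6
```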
But if I run:
SELECT SUM(t1.c1), * FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
The output is:
6 2 2
6 2 3
6 2 1
6 3 2
6 3 3
6 3 1
6 1 2
6 1 3
6 1 1
My question is: why does the second output not match the condition in the WHERE clause?

It looks like Sybase implements its own extensions to GROUP BY:
Through the following extensions, Sybase lifts restrictions on what
you can include or omit in the select list of a query that includes
group by.
The columns in the select list are not limited to the grouping columns
and columns used with the vector aggregates.
The columns specified by group by are not limited to those
non-aggregate columns in the select list.
However, the results of the extension are not always intuitive:
When you use the Transact-SQL extensions in complex queries that
include the where clause or joins, the results may become even more
difficult to understand.
How does this relate to your problem?
However, the way that Adaptive Server handles extra columns in the
select list and the where clause may seem contradictory. For example:
select type, advance, avg(price)
from titles
where advance > 5000
group by type
type advance
------------- --------- --------
business 5,000.00 2.99
business 5,000.00 2.99
business 10,125.00 2.99
business 5,000.00 2.99
mod_cook 0.00 2.99
mod_cook 15,000.00 2.99
popular_comp 7,000.00 21.48
popular_comp 8,000.00 21.48
popular_comp NULL 21.48
psychology 7,000.00 14.30
psychology 2,275.00 14.30
psychology 6,000.00 14.30
psychology 2,000.00 14.30
psychology 4,000.00 14.30
trad_cook 7,000.00 17.97
trad_cook 4,000.00 17.97
trad_cook 8,000.00 17.97
(17 rows affected)
It only seems as if the query is ignoring the where clause when you
look at the results for the advance (extended) column. Adaptive Server
still computes the vector aggregate using only those rows that satisfy
the where clause, but it also displays all rows for any extended
columns that you include in the select list. To further restrict these
rows from the results, you must use a having clause.
So, to get the results you would expect, Sybase should let you write:
SELECT SUM(t1.c1), * FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
HAVING t2.c1 = 1
The WHERE clause excludes the non-matching rows from the SUM; the HAVING clause hides the extra displayed rows that don't match the condition.
Confusing, isn't it?
Instead, you'd probably be better off writing the query so that it doesn't require Sybase's GROUP BY extensions.
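One way to do that, sketched here under the assumption that a derived table is acceptable in your environment, is to compute the SUM once in a subquery and join that single value back to the filtered detail rows. The outer query then contains no aggregate at all, so there is nothing for a GROUP BY extension to reinterpret. The example runs in SQLite via Python's sqlite3 module; I have not checked it against Sybase itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (c1 INT)")
conn.executemany("INSERT INTO test (c1) VALUES (?)", [(1,), (2,), (3,)])

# The aggregate lives in its own derived table (s), filtered by the same
# condition; the outer query just repeats that one value on each detail row,
# and its own WHERE clause still restricts which detail rows appear.
query = """
SELECT s.total, t1.c1, t2.c1
FROM test AS t1
CROSS JOIN test AS t2
CROSS JOIN (
    SELECT SUM(a.c1) AS total
    FROM test AS a, test AS b
    WHERE b.c1 = 1
) AS s
WHERE t2.c1 = 1
ORDER BY t1.c1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (6, 1, 1)
# (6, 2, 1)
# (6, 3, 1)
```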

Related

Write SQL query with division, total and %

I'm writing test queries in MS SQL Server to test reports.
I can't figure out how to calculate the following:
Ingredient_Cost_Paid / Total Ingredient_Cost_Paid * 100 as 'Ingredient Cost Allow as % of Total'
This is Ingredient cost allowable as a percentage of the total ingredient cost allowable.
P.S. I'm new to SQL, so I would appreciate explanations as well, so that I can learn for the future. Thanks.
Also, I'm not sure I correctly understand the difference between Total and SUM.
Thanks, everyone

The single quote (') is used as a delimiter for textual values. If you use the AS keyword to specify a (column) alias that includes spaces and/or special characters, you need to delimit it, for example with square brackets ([]):
Ingredient_Cost_Paid / Total_Ingredient_Cost_Paid * 100 as [Ingredient Cost Allow as % of Total]
Is that what you are looking for?
Edit: I noticed that it also works with single quotes! I didn't know that! But honestly, I would not use it; I'm not sure whether it is officially considered valid.
Regarding the difference between "Total" and SUM, I would need to understand what you mean by "Total", since that is not something SQL understands. You could probably use the SUM aggregate function to calculate a total. An aggregate function calculates a value from a certain column/expression over groups of rows (or over the entire table as a single group). So you probably need to provide (much) more information in your question to get effective help with that.
Edit:
I would like to elaborate a little on this SQL issue for you. My apologies in advance for this rather lengthy post. ;)
For these examples, assume that all query logic described here applies to a table called Recipe_Ingredients, which contains rows with information about ingredients for various recipes (identified by the column Recipe_ID) and the price of each recipe ingredient (in a column called Ingredient_Cost_Paid).
The (simplified) table definition would look something like this:
CREATE TABLE Recipe_Ingredients (
Recipe_ID INT NOT NULL,
Ingredient_Cost_Paid NUMERIC(10, 2) NOT NULL -- plain NUMERIC means NUMERIC(18, 0) in SQL Server, which would drop the cents
);
For testing purposes, I created this table in a test database and populated it with the following query:
INSERT INTO Recipe_Ingredients
VALUES
(12, 4.65),
(12, 0.40),
(12, 9.98),
(27, 5.35),
(27, 12.50),
(27, 1.09),
(27, 3.00),
(65, 2.35),
(65, 0.99);
You could select all rows from the table to view all data in the table:
SELECT
Recipe_ID,
Ingredient_Cost_Paid
FROM
Recipe_Ingredients;
This would yield the following results:
Recipe_ID Ingredient_Cost_Paid
--------- --------------------
12 4.65
12 0.40
12 9.98
27 5.35
27 12.50
27 1.09
27 3.00
65 2.35
65 0.99
You could group the rows by their Recipe_ID values, like this:
SELECT
Recipe_ID
FROM
Recipe_Ingredients
GROUP BY
Recipe_ID;
This will yield the following result:
Recipe_ID
---------
12
27
65
Not very spectacular, I agree. But you could ask the query to calculate values based on those groups as well. That's where aggregate functions like COUNT and SUM come into play:
SELECT
Recipe_ID,
COUNT(Recipe_ID) AS Number_Of_Ingredients,
SUM(Ingredient_Cost_Paid) AS Total_Ingredient_Cost_Paid
FROM
Recipe_Ingredients
GROUP BY
Recipe_ID;
This will yield the following result:
Recipe_ID Number_Of_Ingredients Total_Ingredient_Cost_Paid
--------- --------------------- --------------------------
12 3 15.03
27 4 21.94
65 2 3.34
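Since the question also asks about "Total" versus SUM, this sketch (SQLite through Python's sqlite3 module, standing in for SQL Server) runs the grouped query above and, for comparison, a SUM with no GROUP BY. SUM is the aggregate function; "total" is just the name we give its result, either per group or for the whole table. REAL replaces the NUMERIC column type here, so ROUND() is used to hide floating-point noise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Recipe_Ingredients "
    "(Recipe_ID INTEGER NOT NULL, Ingredient_Cost_Paid REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO Recipe_Ingredients VALUES (?, ?)",
    [(12, 4.65), (12, 0.40), (12, 9.98),
     (27, 5.35), (27, 12.50), (27, 1.09), (27, 3.00),
     (65, 2.35), (65, 0.99)],
)

# SUM per group: one total per Recipe_ID.
per_recipe = conn.execute("""
    SELECT Recipe_ID,
           COUNT(Recipe_ID) AS Number_Of_Ingredients,
           ROUND(SUM(Ingredient_Cost_Paid), 2) AS Total_Ingredient_Cost_Paid
    FROM Recipe_Ingredients
    GROUP BY Recipe_ID
    ORDER BY Recipe_ID
""").fetchall()

# SUM without GROUP BY: the whole table is a single group, so one grand total.
grand_total = conn.execute(
    "SELECT ROUND(SUM(Ingredient_Cost_Paid), 2) FROM Recipe_Ingredients"
).fetchone()[0]

print(per_recipe)   # [(12, 3, 15.03), (27, 4, 21.94), (65, 2, 3.34)]
print(grand_total)  # 40.31
```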
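Introducing your percentage column is somewhat tricky. Each row's percentage needs both that row's own Ingredient_Cost_Paid and the total for its recipe, but the GROUP BY query above collapses the detail rows into one row per recipe, so the calculation cannot be expressed directly alongside the SUM.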
You could specify the previous query as a subquery in the FROM-clause of another query (this is called a table expression) and join it with table Recipe_Ingredients. That way you combine the group data back with the detail data.
I will drop the Number_Of_Ingredients column from now on. It was just an example for the COUNT function, but you do not need it for your issue at hand.
SELECT
Recipe_Ingredients.Recipe_ID,
Recipe_Ingredients.Ingredient_Cost_Paid,
Subquery.Total_Ingredient_Cost_Paid
FROM
Recipe_Ingredients
INNER JOIN (
SELECT
Recipe_ID,
SUM(Ingredient_Cost_Paid) AS Total_Ingredient_Cost_Paid
FROM
Recipe_Ingredients
GROUP BY
Recipe_ID
) AS Subquery ON Subquery.Recipe_ID = Recipe_Ingredients.Recipe_ID;
This will yield the following results:
Recipe_ID Ingredient_Cost_Paid Total_Ingredient_Cost_Paid
--------- -------------------- --------------------------
12 4.65 15.03
12 0.40 15.03
12 9.98 15.03
27 5.35 21.94
27 12.50 21.94
27 1.09 21.94
27 3.00 21.94
65 2.35 3.34
65 0.99 3.34
With this, it is pretty easy to add your calculation for the percentage:
SELECT
Recipe_Ingredients.Recipe_ID,
Recipe_Ingredients.Ingredient_Cost_Paid,
Subquery.Total_Ingredient_Cost_Paid,
CAST(Recipe_Ingredients.Ingredient_Cost_Paid / Subquery.Total_Ingredient_Cost_Paid * 100 AS DECIMAL(8,1)) AS [Ingredient Cost Allow as % of Total]
FROM
Recipe_Ingredients
INNER JOIN (
SELECT
Recipe_ID,
SUM(Ingredient_Cost_Paid) AS Total_Ingredient_Cost_Paid
FROM
Recipe_Ingredients
GROUP BY
Recipe_ID
) AS Subquery ON Subquery.Recipe_ID = Recipe_Ingredients.Recipe_ID;
Note that I also cast the percentage column values to type DECIMAL(8,1) so that you do not get values with long fractional parts. The above query yields the following results:
Recipe_ID Ingredient_Cost_Paid Total_Ingredient_Cost_Paid Ingredient Cost Allow as % of Total
--------- -------------------- -------------------------- -----------------------------------
12 4.65 15.03 30.9
12 0.40 15.03 2.7
12 9.98 15.03 66.4
27 5.35 21.94 24.4
27 12.50 21.94 57.0
27 1.09 21.94 5.0
27 3.00 21.94 13.7
65 2.35 3.34 70.4
65 0.99 3.34 29.6
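The complete percentage query can be checked the same way. This is a sketch in SQLite via Python's sqlite3 module, not SQL Server: SQLite has no DECIMAL(8,1) cast, so ROUND(..., 1) takes its place, and the table aliases ri and sq are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Recipe_Ingredients "
    "(Recipe_ID INTEGER NOT NULL, Ingredient_Cost_Paid REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO Recipe_Ingredients VALUES (?, ?)",
    [(12, 4.65), (12, 0.40), (12, 9.98),
     (27, 5.35), (27, 12.50), (27, 1.09), (27, 3.00),
     (65, 2.35), (65, 0.99)],
)

# Grouped subquery joined back to the detail rows, as in the T-SQL version.
rows = conn.execute("""
    SELECT ri.Recipe_ID,
           ri.Ingredient_Cost_Paid,
           ROUND(sq.Total_Ingredient_Cost_Paid, 2) AS Total_Ingredient_Cost_Paid,
           ROUND(ri.Ingredient_Cost_Paid / sq.Total_Ingredient_Cost_Paid * 100, 1)
               AS "Ingredient Cost Allow as % of Total"
    FROM Recipe_Ingredients AS ri
    INNER JOIN (
        SELECT Recipe_ID, SUM(Ingredient_Cost_Paid) AS Total_Ingredient_Cost_Paid
        FROM Recipe_Ingredients
        GROUP BY Recipe_ID
    ) AS sq ON sq.Recipe_ID = ri.Recipe_ID
    ORDER BY ri.Recipe_ID, ri.rowid  -- rowid keeps insertion order (SQLite-specific)
""").fetchall()

percentages = [row[3] for row in rows]
print(percentages)
# [30.9, 2.7, 66.4, 24.4, 57.0, 5.0, 13.7, 70.4, 29.6]
```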
As I said earlier, you will need to supply more information in your question if you need more specific help with your own situation. These queries and their results are just examples to show you what can be possible. Perhaps (and hopefully) this contains enough information to help you find a solution yourself. But you may always ask more specific questions, of course.

BigQuery INSERT SELECT results in random order of records?

I used standard SQL to insert data from one table into another in BigQuery, using a Jupyter Notebook.
For example I have two tables:
table1
ID Product
0 1 book1
1 2 book2
2 3 book3
table2
ID Product Price
0 5 book5 8.0
1 6 book6 9.0
2 4 book4 3.0
I used the following codes
INSERT test_data.table1
SELECT *
FROM test_data.table2
ORDER BY Price;
SELECT *
FROM test_data.table1
I got
ID Product
0 1 book1
1 3 book3
2 2 book2
3 5 book5
4 6 book6
5 4 book4
I expected the rows to appear in ID order 1 2 3 4 5 6, with 4, 5, 6 ordered by Price.
It also seems that INSERT and/or SELECT FROM display the records in a random order in different runs.
How do I control the order of the SELECT FROM output without including the 'Price' column in the output table in order to sort them?
The same thing happens when I import a CSV file to create a new table: the record order is random when I use SELECT FROM to display the records.

The ORDER BY clause specifies a column or expression as the sort criterion for the result set.
If an ORDER BY clause is not present, the order of the results of a query is not defined.
Column aliases from a FROM clause or SELECT list are allowed. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.
So you most likely wanted something like the following:
SELECT *
FROM test_data.table1
ORDER BY ID
LIMIT 100
Note the use of LIMIT. It is an important part: if you are sorting a very large number of values, use a LIMIT clause to avoid a "resources exceeded" type of error.
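For the follow-up about sorting without showing Price: the usual pattern is to sort when reading, and ORDER BY may name a column that is not in the SELECT list. The sketch uses SQLite through Python's sqlite3 module as a stand-in for BigQuery (an assumption; the test_data dataset prefix is dropped, and SQLite will not reproduce BigQuery's unordered reads). It copies only the ID and Product columns, since table1 has no Price column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, Product TEXT)")
conn.execute("CREATE TABLE table2 (ID INTEGER, Product TEXT, Price REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "book1"), (2, "book2"), (3, "book3")])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(5, "book5", 8.0), (6, "book6", 9.0), (4, "book4", 3.0)])

# Copy only the columns table1 actually has. A table has no guaranteed row
# order, so sorting during the INSERT does not decide how rows come back later.
conn.execute("INSERT INTO table1 (ID, Product) SELECT ID, Product FROM table2")

# Ask for the order explicitly when reading.
by_id = conn.execute("SELECT ID, Product FROM table1 ORDER BY ID").fetchall()

# ORDER BY can use a column (Price) that is not in the SELECT list.
by_price = conn.execute(
    "SELECT ID, Product FROM table2 ORDER BY Price"
).fetchall()

print(by_id)
print(by_price)  # [(4, 'book4'), (5, 'book5'), (6, 'book6')]
```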

Searching for non unique values between two columns

I have a single table with three columns: 'id', 'number' and 'transaction'. Each id should be tied to only one number, but it may appear many times in the table with different transaction values. I haven't been able to write a query that returns cases of a single id sharing multiple numbers (and shows the id and number in the report). I don't want to delete these values via the query; I just need to see them. See the example below:
Here's a screenshot example: http://i591.photobucket.com/albums/ss355/riggins_83/table2_zps5509f3cf.jpg I appreciate the assistance. I've tried all the code posted here and it hasn't given me the output I'm looking for. As seen in the screenshot, the same ID number and Number can appear in the table multiple times with different transaction numbers; what shouldn't occur is what's on rows 1 and 2 (two different Numbers with the same ID number). The ID number should always be tied to the same Number, whereas the transaction is linked only to that line. I'm trying to output each Number that shares an ID number (and the shared ID number, if possible).
Test IDNumber Number Transaction
1 31 1551 5
2 31 1553 7
3 32 1701 8
4 33 1701 9
5 33 1701 10
6 33 1701 11
7 39 1885 12
The output I need:
IDNumber Number
31 1551
31 1553
This output shows the Number (and ID number) in cases where an ID number is shared between two (or possibly more) Numbers. I know there are cases in the table where an ID number is shared among many Numbers.
Any assistance is greatly appreciated!

SELECT *
FROM thetable t0
WHERE EXISTS (
SELECT *
FROM thetable t1
WHERE t1.id = t0.id
-- Not clear from the question if the OP wants the records
-- to differ by number
-- AND t1.number <> t0.number
-- ... or by "transaction"
AND t1."transaction" <> t0."transaction"
-- ... or by either ???
-- AND (t1.number <> t0.number OR t1."transaction" <> t0."transaction")
);

SELECT IDNumber, Number
FROM YourTable
WHERE IDNumber IN (
SELECT IDNumber
FROM YourTable
GROUP BY IDNumber
HAVING COUNT(DISTINCT number) > 1
)
The subquery returns all the IDNumbers with more than 1 Number. Then the main query returns all the numbers for each of those IDNumbers.
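A sketch that loads the sample rows from the question into SQLite (via Python's sqlite3 module) and runs this query, plus the EXISTS approach from the other answer with the comparison made on Number rather than transaction. The table name YourTable comes from the answer; "Transaction" is quoted because it is a keyword, and the ORDER BY clauses are added only to make the output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE YourTable (Test INTEGER, IDNumber INTEGER, Number INTEGER, "Transaction" INTEGER)'
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [(1, 31, 1551, 5), (2, 31, 1553, 7), (3, 32, 1701, 8), (4, 33, 1701, 9),
     (5, 33, 1701, 10), (6, 33, 1701, 11), (7, 39, 1885, 12)],
)

# IDNumbers linked to more than one distinct Number, then all of their rows.
grouped = conn.execute("""
    SELECT IDNumber, Number
    FROM YourTable
    WHERE IDNumber IN (
        SELECT IDNumber
        FROM YourTable
        GROUP BY IDNumber
        HAVING COUNT(DISTINCT Number) > 1
    )
    ORDER BY IDNumber, Number
""").fetchall()

# EXISTS variant: keep a row if another row has the same IDNumber
# but a different Number.
exists_rows = conn.execute("""
    SELECT t0.IDNumber, t0.Number
    FROM YourTable AS t0
    WHERE EXISTS (
        SELECT *
        FROM YourTable AS t1
        WHERE t1.IDNumber = t0.IDNumber
          AND t1.Number <> t0.Number
    )
    ORDER BY t0.IDNumber, t0.Number
""").fetchall()

print(grouped)      # [(31, 1551), (31, 1553)]
print(exists_rows)  # [(31, 1551), (31, 1553)]
```

If the same IDNumber/Number pair occurs under several transactions, both queries return it once per row; SELECT DISTINCT collapses those duplicates.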

Performing Comparisons "down" a table, not across rows

I have a SQL problem that I don't have the vocabulary to explain very well.
Here is a simplified table. My goal is to identify groups where the Tax_IDs are not equal. In this case, the query should return groups 1 and 3.
Group Line_ID Tax_ID
1 1001 11
1 1002 13
2 1003 17
2 1004 17
3 1005 23
3 1006 29
I can easily perform comparisons across rows, however I do not know how to perform comparisons "down" a table (here is really where my vocabulary fails me). I.e. what is the syntax that will cause SQL to compare Tax_ID values within groups?
Any help appreciated,
OB

The simplest way is to use group by with a having clause:
select "group"
from t
group by "group"
having min(tax_id) <> max(tax_id);
You can also phrase the having clause as:
having count(distinct tax_id) > 1;
However, count(distinct) is usually more expensive than a simple min() or max() operation.
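Both HAVING variants can be tried against the sample data in SQLite (through Python's sqlite3 module). The table name t follows the answer, "group" is double-quoted because GROUP is a reserved word, and an ORDER BY is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("group" INTEGER, Line_ID INTEGER, Tax_ID INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1001, 11), (1, 1002, 13), (2, 1003, 17),
     (2, 1004, 17), (3, 1005, 23), (3, 1006, 29)],
)

# Ignoring NULLs, a group has differing Tax_IDs exactly when its
# smallest and largest Tax_ID differ.
minmax = [row[0] for row in conn.execute(
    'SELECT "group" FROM t GROUP BY "group" '
    'HAVING MIN(Tax_ID) <> MAX(Tax_ID) ORDER BY "group"'
)]

# Equivalent test: more than one distinct Tax_ID in the group.
distinct = [row[0] for row in conn.execute(
    'SELECT "group" FROM t GROUP BY "group" '
    'HAVING COUNT(DISTINCT Tax_ID) > 1 ORDER BY "group"'
)]

print(minmax, distinct)  # [1, 3] [1, 3]
```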

Need to repeat and calculate values in a single Select statement

I hope someone can help me with my issue. I need to produce the following in a single SELECT statement (the system we use has some pivot tables in Excel that can handle only a single SELECT):
I have an INL (Invoice Lines) table that has a lot of fields, but the important one is the date.
INL_ID DATE
19 2004-03-15 00:00:00.000
20 2004-03-15 00:00:00.000
21 2004-03-15 00:00:00.000
22 2004-03-16 00:00:00.000
23 2004-03-16 00:00:00.000
24 2004-03-16 00:00:00.000
I also have an ILD (Invoice Line Details) table, which is related to the INL table by an ID field. From this second table, I need to use the scu_qty field to "repeat" values from the first one in my results sheet.
The ILD table values that we need are:
INL_ID scu_qty
19 1
20 1
21 1
22 4
23 4
Using scu_qty, I need to repeat the row from the first table and add one day to each repeated record; scu_qty is the number of days of the service that we sell, stored in the ILD table.
For example, INL_ID 22 has an scu_qty other than 1, so the results of the SELECT for it should look something like this:
INL_ID DATE
22 2004-03-15 0:00:00
22 2004-03-16 0:00:00
22 2004-03-17 0:00:00
22 2004-03-18 0:00:00
I have only listed the fields that need to be repeated and calculated. Of course I will need more fields, but those are simply repeated from the INL table, so I left them out to avoid confusion.
I hope someone can help me with this; this report is very important for us. Thanks a lot in advance.
(Sorry for my English, that isn't my first language)

SELECT INL_ID, scu_qty, CalculatedDATE ...
FROM INL
INNER JOIN ILD ON ...
INNER JOIN SequenceTable ON SequenceTable.seqNo <= ILD.scu_qty
ORDER BY INL_ID, SequenceTable.seqNo
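Depending on your SQL flavour, you will need to look up the date manipulation functions to compute the following (assuming seqNo starts at 1, so that the first copy keeps the original date):
CalculatedDATE = {INL.DATE + (SequenceTable.seqNo - 1) (days)}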

select INL.INL_ID, DATE_ADD(INL.`DATE`, INTERVAL s.qty - 1 DAY) as `DATE`
from
INL
inner join
ILD on INL.INL_ID = ILD.INL_ID
inner join (
select 1 as qty union select 2 union select 3 union select 4
) s on s.qty <= ILD.scu_qty
order by INL.INL_ID, s.qty
Instead of that subselect, you will need a real numbers table if the quantity can exceed 4. Or tell us which RDBMS you use; there may be an easier way.
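Without a real numbers table, a recursive CTE can generate one on the fly. This sketch is SQLite through Python's sqlite3 module, not the asker's (unknown) RDBMS: datetime() with a '+N days' modifier plays the role of the date arithmetic, and seq is a hypothetical name for the generated sequence. Dates are computed from the sample DATE column, so INL_ID 22 starts on its own date, 2004-03-16, and INL_ID 24 drops out of the inner join because it has no ILD row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE INL (INL_ID INTEGER PRIMARY KEY, "DATE" TEXT)')
conn.execute("CREATE TABLE ILD (INL_ID INTEGER, scu_qty INTEGER)")
conn.executemany(
    "INSERT INTO INL VALUES (?, ?)",
    [(19, "2004-03-15 00:00:00.000"), (20, "2004-03-15 00:00:00.000"),
     (21, "2004-03-15 00:00:00.000"), (22, "2004-03-16 00:00:00.000"),
     (23, "2004-03-16 00:00:00.000"), (24, "2004-03-16 00:00:00.000")],
)
conn.executemany(
    "INSERT INTO ILD VALUES (?, ?)",
    [(19, 1), (20, 1), (21, 1), (22, 4), (23, 4)],
)

query = """
WITH RECURSIVE seq(n) AS (
    SELECT 1
    UNION ALL
    -- Stop at the largest quantity actually needed.
    SELECT n + 1 FROM seq WHERE n < (SELECT MAX(scu_qty) FROM ILD)
)
SELECT INL.INL_ID,
       -- Day offset 0 for the first copy, 1 for the second, and so on.
       datetime(INL."DATE", '+' || (seq.n - 1) || ' days') AS CalculatedDATE
FROM INL
INNER JOIN ILD ON ILD.INL_ID = INL.INL_ID
INNER JOIN seq ON seq.n <= ILD.scu_qty
ORDER BY INL.INL_ID, seq.n
"""
rows = conn.execute(query).fetchall()
for inl_id, calculated_date in rows:
    print(inl_id, calculated_date)
```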
