Multiple entries with the same reference in a table with SQL

In a single table, I have multiple rows with the same reference information (ID). On the same day, customers had drinks, and Appreciation is either 1 (yes) or 0 (no).
Table
ID  DAY  Drink     Appreciation
1   1    Coffee    1
1   1    Tea       0
1   1    Soda      1
2   1    Coffee    1
2   1    Tea       1
3   1    Coffee    0
3   1    Tea       0
3   1    Iced Tea  1
I first tried to see who appreciated a certain drink, which is obviously very simple:
SELECT ID, MAX(Appreciation)
FROM table
WHERE (DAY = 1 AND Drink = 'Coffee' AND Appreciation = 1)
   OR (DAY = 1 AND Drink = 'Tea' AND Appreciation = 1)
GROUP BY ID
Since I am not even interested in the drink, I used MAX to remove duplicates and keep only the row with the highest appreciation.
But what I want to do now is to see who in fact appreciated every drink they had. Again, I am not interested in every row in the end, but only the ID and the appreciation. How can I modify my WHERE to have it applied per ID? Adding the ID to the condition is also not an option. I tried switching OR for AND, but it doesn't return any rows. How could I do this?

This should do the trick:
SELECT ID
FROM table
WHERE Drink IN ('Coffee', 'Tea') -- or whatever other filter you want
GROUP BY ID
HAVING MIN(Appreciation) > 0
What it does is:
It takes the minimum appreciation per group and checks that it is greater than 0, i.e. that every row in the group has appreciation 1. The group is the ID, as defined in the GROUP BY clause.
As you can see, I'm using the HAVING clause, because you can't have aggregate functions in the WHERE section.
Of course you can join other tables into the query as you like. Just be careful not to add an unwanted filter by joining, which might reduce your dataset in this query.
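As a quick way to check the logic, here is a sketch of the HAVING MIN(...) approach run against the question's sample data in SQLite via Python (the table name `drinks` is made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drinks (ID INT, DAY INT, Drink TEXT, Appreciation INT)")
conn.executemany(
    "INSERT INTO drinks VALUES (?, ?, ?, ?)",
    [(1, 1, "Coffee", 1), (1, 1, "Tea", 0), (1, 1, "Soda", 1),
     (2, 1, "Coffee", 1), (2, 1, "Tea", 1),
     (3, 1, "Coffee", 0), (3, 1, "Tea", 0), (3, 1, "Iced Tea", 1)],
)

# IDs that appreciated every drink they had on day 1: the minimum
# appreciation within the group must be above 0.
rows = conn.execute("""
    SELECT ID
    FROM drinks
    WHERE DAY = 1
    GROUP BY ID
    HAVING MIN(Appreciation) > 0
""").fetchall()
print(rows)  # [(2,)] - only ID 2 appreciated all of their drinks
```

ID 1 disliked the tea and ID 3 disliked both coffee and tea, so only ID 2 survives the HAVING filter.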


Select top using SQL Server returns different output than select *

I am trying to select the top n rows from a database, ordered alphabetically first and numerically after that.
When I try to get all data (select *), I get the correct output:
select nocust, share
from TB_STOCK
where share = 'BBCA'
and concat(share, nocust) < 'ZZZZZZZZ'
order by
case when nocust like '[a-z]%' then 0 else 1 end
nocust | share
-------+--------
a522 | BBCA
b454 | BBCA
k007 | BBCA
p430 | BBCA
q797 | BBCA
s441 | BBCA
s892 | BBCA
u648 | BBCA
v107 | BBCA
4211 | BBCA
6469 | BBCA
6751 | BBCA
But when I try to select top n rows (e.g. TOP 5), I get different output than expected (not like select * from table):
select top 5 nocust, share
from TB_STOCK
where share = 'BBCA'
and concat(share, nocust) < 'ZZZZZZZZ'
order by
case when nocust like '[a-z]%' then 0 else 1 end
nocust | share
-------+--------
k007 | BBCA
b454 | BBCA
a522 | BBCA
p430 | BBCA
q797 | BBCA
I expect the mistake is somewhere between the concat and the order by. Can someone tell me how to get the right top 5 output, like:
nocust | share
-------+--------
a522 | BBCA
b454 | BBCA
k007 | BBCA
p430 | BBCA
q797 | BBCA
You have a very strange ORDER BY - it only makes sure entries beginning with a letter are ordered before those beginning with a number - but you're NOT actually ordering by the values themselves. No further ORDER BY criteria means: there's no guarantee as to how the rows will be ordered within each group - as you're seeing here.
You need to adapt your ORDER BY to:
ORDER BY
    CASE WHEN nocust LIKE '[a-z]%' THEN 0 ELSE 1 END,
    nocust
NOW you're actually ordering by nocust as well - and now, I'm pretty sure, the outputs will be identical.
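A sketch of the corrected ordering in SQLite via Python (SQLite uses LIMIT instead of TOP, and GLOB instead of SQL Server's `LIKE '[a-z]%'` character class):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TB_STOCK (nocust TEXT, share TEXT)")
conn.executemany(
    "INSERT INTO TB_STOCK VALUES (?, 'BBCA')",
    [("a522",), ("b454",), ("k007",), ("p430",), ("q797",), ("s441",),
     ("s892",), ("u648",), ("v107",), ("4211",), ("6469",), ("6751",)],
)

# Letters before digits, then alphabetical within each group: the
# second sort key makes the top-5 result deterministic.
top5 = conn.execute("""
    SELECT nocust
    FROM TB_STOCK
    WHERE share = 'BBCA'
    ORDER BY CASE WHEN nocust GLOB '[a-z]*' THEN 0 ELSE 1 END,
             nocust
    LIMIT 5
""").fetchall()
print([r[0] for r in top5])  # ['a522', 'b454', 'k007', 'p430', 'q797']
```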
Your ORDER BY is not a stable sort; it sorts data broadly into one of two categories but doesn't specify in enough detail how items are then to be sorted within each category. This means that in the TOP 5 form, SQL Server is free to choose a data access strategy that lets it stop as soon as it has found 5 rows for which the CASE expression returns 0.
Suppose you have this output from SELECT * ... ORDER BY Category
Category, Thing
Animal, Cat
Animal, Dog
Animal, Goat
Vegetable, Potato
Vegetable, Turnip
Vegetable, Swede
There is absolutely no guarantee that if you do a SELECT TOP 2 * ... ORDER BY Category you will get "Cat, Dog" in that order. You could reasonably get "Goat, Dog" today and "Cat, Goat" tomorrow, after SQL Server has shuffled its indexes around when new data was added. The only thing you can guarantee with a TOP 2 ... ORDER BY Category is that, as long as there are at least two animals in the db and no new category alphabetically earlier than "Animal", you'll get two animals.
It is this way because an optimization of TOP N means SQL Server can stop early once it has N rows that meet the criteria; it doesn't need to access and sort a million rows if it has already found 5 rows whose category would sort first. Imagine it knows the distinct values in the column and their counts as part of its statistics: it can sort those distinct values to know which come first, then go and find any 5 rows carrying a value that sorts first, and return them. Essentially SQL Server may think "I know I have 3 'Animal' rows, animals come before everything else, and the user wants 2 - I'll just start reading rows and stop after I get 2 animals" rather than "I'll read every Thing, sort all million of them on Category, then take the first 2 rows".
This can be hugely faster than sorting a million rows and then plucking the first N.
To get repeatable results each time, you have to make the sort stable by specifying sort conditions that order the Things within each Category right down to where there is no ambiguity.
Add more columns to your ORDER BY so that every row has a guaranteed place in the overall ordering; then your sort will be stable and TOP N will return the same rows each time. To make a sort stable, the collection of columns you sort by has to have a unique combination of values. You could sort by 20 columns, but if any rows have identical values in all 20 of those columns (and differ only in a 21st value, which you don't order by), then the sort order isn't guaranteed.
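The Animal/Vegetable example above can be sketched in SQLite via Python: adding Thing as a tie-breaker makes (Category, Thing) unique here, so TOP 2 / LIMIT 2 is guaranteed to return the same two rows every time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (Category TEXT, Thing TEXT)")
conn.executemany("INSERT INTO things VALUES (?, ?)",
                 [("Animal", "Cat"), ("Animal", "Dog"), ("Animal", "Goat"),
                  ("Vegetable", "Potato"), ("Vegetable", "Turnip"),
                  ("Vegetable", "Swede")])

# ORDER BY Category alone would let the engine return any 2 animals;
# the Thing tie-breaker pins the result down.
top2 = conn.execute("""
    SELECT Category, Thing
    FROM things
    ORDER BY Category, Thing
    LIMIT 2
""").fetchall()
print(top2)  # [('Animal', 'Cat'), ('Animal', 'Dog')]
```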
I am trying to answer this from a different perspective.
First, it should be clear that the optimizer tries to find the best possible plan quickly, and selects or skips an index in the most cost-effective manner.
I am using the AdventureWorks 2016 database; Production.Product has 504 rows.
select [Name],ProductNumber from Production.Product
order by [Name]
It sorts the rows as expected.
select top 5 [Name],ProductNumber from Production.Product
order by [Name]
It sorts the rows as expected.
If I use a CASE expression in the ORDER BY:
select [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end
It sorts the records as intended. All 504 rows are processed.
If TOP asks for up to 20% of the total rows, like:
select Top 5 [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end
then it picks the first n records and displays them quickly.
The sorting was not as expected.
If TOP asks for more than 20% of the total rows, like:
select Top (101) [Name],ProductNumber from Production.Product
order by case when [name] like '[a]%' then 1 else -1 end
it will process all 504 rows and sort accordingly.
The sorting result is as expected.
In all the cases above a Clustered Index Scan (on ProductID) is done.
In this example, [Name] and ProductNumber each have a non-clustered index, but neither was selected.
You can do this (note the nocust tie-breaker in the ORDER BY, which keeps the result deterministic):
;WITH CTE AS (
    SELECT nocust, share,
           CASE WHEN nocust LIKE '[a-z]%' THEN 0 ELSE 1 END AS SortCol
    FROM TB_STOCK
    WHERE share = 'BBCA'
      AND concat(share, nocust) < 'ZZZZZZZZ'
)
SELECT TOP 5 * FROM CTE
ORDER BY SortCol, nocust

sum not calculating correct no. of units in SQL command

I have the following SQL script (the result is displayed under the script). The issue I am having is that I need to add up the quantity on the invoice. The quantity works fine when all the products on the invoice are different, but when a product appears twice on the invoice, the result is incorrect. Any help appreciated.
The DISTINCT keyword acts on all columns you select.
A new product introduces a difference which makes it no longer distinct. Hence the extra row(s).
Where you had:
Order Product Total
1 Toaster $10
2 Chair $20
And another item is added to order 1:
Order Product Total
1 Toaster $99
1 Balloon $99 -- Yes that's a $89 balloon!
2 Chair $20
The new row (balloon) is distinct and isn't reduced into the previous row (toaster).
To make it distinct again, don't select the product name:
Order Total
1 $99
2 $20
Uniqueness kicks in and everyone's happy!
If you can remove the column that's "different" from the select list, you should get the results you need.
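The original script isn't shown, but the failure mode this answer describes can be reproduced in SQLite via Python (table and column names here are made up): an order total repeated on every line gets double-counted once a second product row makes the line DISTINCT again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (OrderId INT, Product TEXT, Total INT)")
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                 [(1, "Toaster", 99), (1, "Balloon", 99), (2, "Chair", 20)])

# DISTINCT over (OrderId, Product, Total): order 1 contributes twice.
wrong = conn.execute("""
    SELECT SUM(Total) FROM
        (SELECT DISTINCT OrderId, Product, Total FROM order_lines)
""").fetchone()[0]

# Drop Product from the inner select and each order counts once.
right = conn.execute("""
    SELECT SUM(Total) FROM
        (SELECT DISTINCT OrderId, Total FROM order_lines)
""").fetchone()[0]

print(wrong, right)  # 218 119
```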

SQL Server query related to grouping and counting for inventory analysis

My table is concerned with inventory items, stocking levels and required numbers. An example is given below:
Part    Sub Part  Pieces Required  Pieces In Stock
Barbie  Legs      2                1000
Barbie  Arms      2                5000
Barbie  Head      1                20
Barbie  Torso     1                40000
Dora    Legs      2                1000
Dora    Arms      2                5000
Dora    Head      1                0
Dora    Torso     1                40000
I want my end result to look like:
Part    No: of dolls that can be built
Barbie  20
Dora    0
So the logic is we need a minimum number of each part to make a complete doll. If even one of the required parts is not in stock, then no doll can be made. The complexity comes when we need 2 of certain parts and only 1 of other parts.
How do I achieve this using SQL Server?
Thank you in advance for your help and time.
SELECT part, MIN([Pieces In Stock]/[Pieces Required]) AS PossibleCompleteDolls
FROM tblName
WHERE [Pieces Required] <> 0
GROUP BY part;
The WHERE clause here is just to prevent division by zero.
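A quick check of this query in SQLite via Python, using the question's data (the space-containing column names are shortened to Required/InStock here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (Part TEXT, SubPart TEXT, Required INT, InStock INT)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?, ?)",
                 [("Barbie", "Legs", 2, 1000), ("Barbie", "Arms", 2, 5000),
                  ("Barbie", "Head", 1, 20),   ("Barbie", "Torso", 1, 40000),
                  ("Dora", "Legs", 2, 1000),   ("Dora", "Arms", 2, 5000),
                  ("Dora", "Head", 1, 0),      ("Dora", "Torso", 1, 40000)])

# Integer division gives dolls buildable per part; MIN picks the
# scarcest part, which limits the whole build.
rows = conn.execute("""
    SELECT Part, MIN(InStock / Required) AS PossibleCompleteDolls
    FROM parts
    WHERE Required <> 0
    GROUP BY Part
    ORDER BY Part
""").fetchall()
print(rows)  # [('Barbie', 20), ('Dora', 0)]
```

Barbie is limited by the 20 heads in stock, and Dora by the 0 heads.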
First add a column showing how many dolls can be built with each piece:
SELECT *, CAST([Pieces In Stock]/[Pieces Required] as INTEGER) AS canmake
FROM TABLE
Then group that by the part and take the min
SELECT Part, min(canmake) as [No: of dolls that can be built]
FROM (
SELECT *, CAST([Pieces In Stock]/[Pieces Required] as INTEGER) AS canmake
FROM TABLE
WHERE [Pieces Required] != 0
) T
GROUP BY Part

Counting number of occurrences of tuples in an m:n relationship

I'd like to know if there's an efficient way to count the number of occurrences of a permutation of entities from one side of an m:n relationship. Hopefully, the next example will illustrate properly what I mean:
Let's imagine a database with people and events of some sort. People can organize multiple events and events can be organized by more than one person. What I'd like to count is whether a certain tuple of people has already organized an event together or if it's their first time. My first idea is to add an attribute to the m:n relationship:
PeopleID | EventID | TimesOrganized
---------+---------+---------------
     100 |       1 |              1
     200 |       1 |              1
     300 |       2 |              1
     400 |       3 |              1
Now, there's an event no. 4 that's again organized by persons 200 and 100 (let's say they should be added in that order). The new table should look like:
PeopleID | EventID | TimesOrganized
---------+---------+---------------
     100 |       1 |              2
     200 |       1 |              2
     300 |       2 |              1
     400 |       3 |              1
     200 |       4 |              2
     100 |       4 |              2
Now, if I added an event organized by persons 200 and 300 it would look like this:
PeopleID | EventID | TimesOrganized
---------+---------+---------------
     100 |       1 |              2
     200 |       1 |              2
     300 |       2 |              1
     400 |       3 |              1
     200 |       4 |              2
     100 |       4 |              2
     200 |       5 |              1
     300 |       5 |              1
How would I go about keeping the third column updated properly and what are my options?
I should also add that this a part of the larger project we have for one of the classes and we'll be implementing an application that uses the database in some way, so I might as well move this to application logic if there's no easy way.
I wouldn't recommend tracking a TimesOrganized column as you suggest.
You can simply query it as needed using COUNT(EventID) .. GROUP BY PeopleID.
If you do feel you need to maintain the value somewhere, it is probably better normalized onto the (presumed) People table, as something like People.TimesOrganized. But then you have to increment it as you go instead of just recalculating as needed.
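A sketch of the COUNT .. GROUP BY approach in SQLite via Python, using the final state of the question's junction table (the table name `organizers` is made up, and no TimesOrganized column is stored):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizers (PeopleID INT, EventID INT)")
conn.executemany("INSERT INTO organizers VALUES (?, ?)",
                 [(100, 1), (200, 1), (300, 2), (400, 3),
                  (200, 4), (100, 4), (200, 5), (300, 5)])

# Times each person has organized an event, computed on demand
# instead of being maintained as a column.
rows = conn.execute("""
    SELECT PeopleID, COUNT(EventID) AS TimesOrganized
    FROM organizers
    GROUP BY PeopleID
    ORDER BY PeopleID
""").fetchall()
print(rows)  # [(100, 2), (200, 3), (300, 2), (400, 1)]
```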
If you want to count how many times someone has organized an event, the problem is not m:n but 1:m. Just count the events grouped by the person; you don't really need that column in the table unless it is queried very often.
That said, I find your table a little confusing: detail and aggregation are mixed, and the third version is downright wrong - PeopleID 200 has organized 3 events and 300 has organized 2.

SELECT datafields with multiple groups and sums

I can't seem to group by multiple data fields and sum a particular grouped column.
I want to group person to customer, then group customer to price, and then sum the price. The person with the highest combined sum(price) should be listed in ascending order.
Example:
table customer
--------------
customer | common_id
green    | 2
blue     | 2
orange   | 1

table invoice
-------------
person | price | common_id
bob    | 2330  | 1
greg   | 360   | 2
greg   | 170   | 2
SELECT DISTINCT
    MIN(person) AS person, MIN(customer) AS customer, SUM(price) AS price
FROM invoice a
LEFT JOIN customer b ON a.common_id = b.common_id
GROUP BY customer, price
ORDER BY person
The results I desire are:
**BOB:**
Orange, $2230
**GREG:**
green, $360
blue,$170
The colors are the customer, that GREG and Bob handle. Each color has a price.
There are two issues that I can see. One is a bit picky, and one is quite fundamental.
Presentation of data in SQL
SQL returns tabular data sets. It's not able to return sub-sets with headings, looking something like a pivot table.
That means that this is not possible...
**BOB:**
Orange, $2230
**GREG:**
green, $360
blue, $170
But that this is possible...
Bob, Orange, $2230
Greg, Green, $360
Greg, Blue, $170
Relating data
I can visually see how you relate the data together...
table customer table invoice
-------------- -------------
customer | common_id person | price |common_id
green 2 greg 360 2
blue 2 greg 170 2
orange 1 bob 2330 1
But SQL doesn't have any implied ordering. Things can only be related if an expression can state that they are related. For example, the following is equally possible...
table customer table invoice
-------------- -------------
customer | common_id person | price |common_id
green 2 greg 170 2 \ These two have
blue 2 greg 360 2 / been swapped
orange 1 bob 2330 1
This means that you need rules (and likely additional fields) that explicitly state which customer record matches which invoice record, especially when there are multiples in both with the same common_id.
An example of a rule could be: the lowest price always matches the first customer alphabetically. But then, what happens if you have three records in customer for common_id = 2 but only two records in invoice for common_id = 2? Or do the numbers of records always match, and do you enforce that?
Most likely you need an extra piece (or pieces) of information to know which records relate to each other.
You should group by all your selected fields except the summed one; then maybe the function GROUP_CONCAT (MySQL) can help you concatenate the rows resulting from the GROUP BY clause.
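Grouping by all selected fields except the sum, as suggested, makes the join ambiguity concrete. A sketch in SQLite via Python: each of greg's invoice rows joins to BOTH of his customers, so the sums double up instead of splitting 360/170 between green and blue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer TEXT, common_id INT)")
conn.execute("CREATE TABLE invoice (person TEXT, price INT, common_id INT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("green", 2), ("blue", 2), ("orange", 1)])
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                 [("bob", 2330, 1), ("greg", 360, 2), ("greg", 170, 2)])

# Each invoice row matches every customer sharing its common_id,
# so greg's 360 and 170 both land on green AND blue.
rows = conn.execute("""
    SELECT person, customer, SUM(price)
    FROM invoice a
    JOIN customer b ON a.common_id = b.common_id
    GROUP BY person, customer
    ORDER BY person, customer
""").fetchall()
print(rows)  # [('bob', 'orange', 2330), ('greg', 'blue', 530), ('greg', 'green', 530)]
```

This is exactly why extra information is needed to state which invoice row belongs to which customer.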
I'm not sure how you could possibly do this. Greg has 2 colors AND 2 prices - how do you determine which goes with which?
Greg Blue 170, or Greg Blue 360? Or attach Green to either price?
I think the colors need unique identifiers, separate from the person's unique identifiers.
Just a thought.