Sql Distinct Count of resulting table with no conditionals

I want to count the number of accounts from the resulting table generated from this code. This way, I know how many people liked blue at one time.
Select Distinct PEOPLE.FullName, PEOPLE.FavColor From PEOPLE
Where FavColor='Blue'
Let's say this is a historical record of the favorite color people gave each time they were asked, so there may be multiple records with the same full name if someone was asked again much later; hence the DISTINCT.
The code I used may not be reusable in your answer, so feel free to use whatever you think can work. I am sure I found a possible solution to my problem using DECLARE and IF statements, but I lost that page, so I am left with no solution. However, I think there is a way to do it without conditionals, which is what I am asking for and would rather have. Thanks.
Edit: My question is: From the code above, is there a way to count the number of accounts in the resulting table?

If I understand what you are asking correctly (how many people liked blue at one time?), try this:
select count(*)
from PEOPLE
where FavColor = 'Blue'
group by FullName
If your question is in fact, how can I count the results of any select query?, you can do this:
Suppose your original query is:
select MyColumn
from MyTable
where MyOtherColumn = 26
You can wrap it in another query to get the count
select count(*)
from (
select MyColumn
from MyTable
where MyOtherColumn = 26
) a
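To make the difference between these approaches concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration, and SQLite stands in for whatever database the question actually uses. It suggests that a bare GROUP BY returns one row per person, while the wrapped query and COUNT(DISTINCT ...) each return a single number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PEOPLE (FullName TEXT, FavColor TEXT)")
# Hypothetical sample data: Ann answered "Blue" twice, at different times.
conn.executemany(
    "INSERT INTO PEOPLE VALUES (?, ?)",
    [
        ("Ann Lee", "Blue"),
        ("Ann Lee", "Red"),
        ("Ann Lee", "Blue"),
        ("Bob Roy", "Blue"),
        ("Cy Diaz", "Green"),
    ],
)

# GROUP BY alone yields one row per person, each holding that person's row count.
per_person = conn.execute(
    "SELECT COUNT(*) FROM PEOPLE WHERE FavColor = 'Blue' "
    "GROUP BY FullName ORDER BY FullName"
).fetchall()

# Wrapping the DISTINCT query in an outer COUNT(*) yields a single number.
wrapped = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT FullName, FavColor FROM PEOPLE WHERE FavColor = 'Blue'"
    ") a"
).fetchone()[0]

# COUNT(DISTINCT ...) reaches the same number without a subquery.
direct = conn.execute(
    "SELECT COUNT(DISTINCT FullName) FROM PEOPLE WHERE FavColor = 'Blue'"
).fetchone()[0]

print(per_person, wrapped, direct)
```

With this data the grouped query returns two rows (one per person who ever said Blue), while the other two queries both return 2.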

Select Count (Distinct PEOPLE.FullName)
From PEOPLE
Where FavColor='Blue'

Are you saying that you want to generate the count with no WHERE clause?
How about this?
SELECT
count(*)
FROM
people
INNER JOIN (SELECT FavColor = 'Blue') col ON col.FavColor = people.FavColor
Edit: OK, I see what you want now.
You just need to wrap your query in SELECT count(*) FROM ( <your-query-goes-here> ).

I think this may be what you want.
Select Count(*) as 'NumberOfPeople'
From
(
Select Distinct PEOPLE.FullName, PEOPLE.FavColor From PEOPLE
Where FavColor='Blue'
)a

If the same person has multiple answers, that is, their favourite colour has changed over time - resulting in several colours, some repeating, for the same person (aka "account") then to find the number of accounts where the favourite colour is / was blue (oracle):
select count(*)
from (select FULLNAME
from PEOPLE
where FAVCOLOR = 'BLUE'
group by FULLNAME);

Related

Counting unique text in query

I need to count how many distinct shipping methods there are in a query (the answer is 2). I am trying to use DISTINCT but it doesn't seem to be working the way I thought it would.
SELECT DISTINCT Count(Order.ship_method) AS CountOfship_method
FROM [Order];

Try this instead -
SELECT COUNT(*) as CountOfship_method
FROM
(SELECT DISTINCT Order.ship_method FROM [Order]);

SQL Server, IN operator show different results

Working on SQL Server, I was trying to count the total number of rows matching some criteria, as shown in the simple query below:
SELECT count(*) FROM my_table WHERE (supp=92 OR (supp=94 and organisation <> 'LDF'))
Knowing that my_table does not contain any rows with supp=92 and organisation='LDF', I decided to simplify the query as follows:
SELECT count(*) FROM my_table WHERE supp IN (92,94) and organisation <> 'LDF'
The results of the two queries were totally different.
Yet the queries look equivalent to me. I've been trying to figure out where the problem is, but I couldn't find an answer.
It's really confusing to me. Thank you in advance for your answers.

If your organisation column contained any NULL values where supp=92, the first query would return them but the second query would exclude them. This is because this code:
organisation <> 'LDF'
would evaluate to NULL (unknown), not TRUE, when the organisation field is NULL, and a WHERE clause only keeps rows for which the condition is TRUE.
I'd recommend you read Robert Sheldon's excellent article on all the ways that NULL can trip you up; the entire article is worth reading, but the ninth point talks about this specific scenario.
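The NULL behaviour described here can be reproduced with a small sketch. It uses Python's sqlite3 module and invented rows (the real my_table contents are unknown), but the three-valued logic shown is standard SQL behaviour rather than anything SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (supp INTEGER, organisation TEXT)")
# Hypothetical rows; None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (92, "ABC"),
        (92, None),   # the row the two queries disagree on
        (94, "ABC"),
        (94, "LDF"),
        (94, None),
    ],
)

# Original query: supp = 92 is enough on its own, whatever organisation holds.
first = conn.execute(
    "SELECT COUNT(*) FROM my_table "
    "WHERE (supp = 92 OR (supp = 94 AND organisation <> 'LDF'))"
).fetchone()[0]

# "Simplified" query: organisation <> 'LDF' must now be TRUE for every row.
second = conn.execute(
    "SELECT COUNT(*) FROM my_table "
    "WHERE supp IN (92, 94) AND organisation <> 'LDF'"
).fetchone()[0]

# Comparing NULL with anything yields NULL, which Python sees as None.
null_compare = conn.execute("SELECT NULL <> 'LDF'").fetchone()[0]

print(first, second, null_compare)
```

Here the first query counts 3 rows and the second counts 2; the difference is the row with supp=92 and a NULL organisation.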

They are not the same
First:
SELECT COUNT(*)
FROM my_table
WHERE supp=92
OR (supp=94 AND organisation <> 'LDF')
Second is equivalent to:
SELECT COUNT(*)
FROM my_table
WHERE (supp = 92 OR supp = 94)
AND organisation <> 'LDF'

Selecting the biggest ZIP code from a column

I want to get the biggest ZIP code in DB. Normally I do this
SELECT *
FROM (
Select * From tbuser ORDER BY zip DESC
)
WHERE rownum = 1
With this code I can get the biggest zip code value without a duplicate row (since zip code is not a primary key).
But the main company in Japan said that I can't use it, because when the connection is slow or the DB holds a very large amount of data, it may not return the right row. It would be a great help if someone could help.

I want to get the biggest ZIP code in DB.
If you really only want the zip code, try that:
SELECT MAX(zip) FROM TBUSER;
This will use the index on the zip column (if it exists).
That being said, Oracle is usually smart enough to properly optimize sub-query selection using ROWNUM. Maybe your main company is more concerned about the possible "full table" ORDER BY in the subquery? OTOH, if the issue is really with a "slow network", it may be worth taking some time with your DBA to look at the wire using a network analyzer or some other tool, to see whether your approach really leads to "excessive bandwidth consumption". I sincerely doubt it...
If you want to retrieve the whole row having the maximum zip code, here is a slight variation on another answer (in my opinion, this is one of the rare cases for using a NATURAL JOIN):
select * from t
natural join (select max(zip) zip from t);
Of course, in case of duplicates, this will return multiple rows. You will have to combine that with one of the several options posted in the various other answers to return only 1 row.
As an extra solution, and since you are not allowed to use ROWNUM (and assuming row_number is arbitrarily forbidden too), you can achieve the desired result using something as contrived as:
select * from t
where rowid = (
select min(t.rowid) rid from t
natural join (select max(zip) zip from t)
);
See http://sqlfiddle.com/#!4/3bd63/5
But honestly, there isn't any serious reason to hope that such query will perform better than the simple ... ORDER BY something DESC) WHERE rownum <= 1 query.

This sounds to me like bad advice (masquerading as a rule) from a newbie data base administrator who doesn't understand what he's looking at. That insight isn't going to help you, though. Rarely does a conversation starting with "you're an obstructionist incompetent" achieve anything.
So, here's the thing. First of all, you need to make sure there's an index on your zip column. It doesn't have to be a primary key.
Second, you can try explaining that Oracle's table servers do, in fact, optimize the ... ORDER BY something DESC) WHERE rownum <= 1 style of query. Their servers do a good job of that. Your use case is very common.
But if that doesn't work on your DBA, try saying "I heard you" and do this.
SELECT * FROM (
SELECT a.*
FROM ( SELECT MAX(zip) zip FROM tbuser ) b
JOIN tbuser a ON (a.zip = b.zip)
) WHERE rownum <= 1
This will get one row with the highest numbered zip value without the ORDER BY that your DBA mistakenly believes is messing up his server's RAM pool. And, it's reasonably efficient. As long as zip has an index.

As you are looking for a way to get the desired record without rownum now, ...
... here is how to do it from Oracle 12c onward:
select *
from tbuser
order by zip desc fetch first 1 row only;
... and here is how to do it before Oracle 12c:
select *
from (select tbuser.*, row_number() over(order by zip desc) as rn from tbuser)
where rn = 1;
EDIT: As Sylvain Leroux pointed out, it is more work for the dbms to sort all records rather than just find the maximum. Here is a max query without rownum:
select *
from tbuser where rowid =
(select max(rowid) keep (dense_rank last order by zip) from tbuser);
But as Sylvain Leroux also mentioned, it also makes a difference whether there is an index on the column. Some tests I did show that with an index on the column, the analytic functions are slower than the traditional functions. Your original query would just go into the index, go to the highest value, pick the record and then stop. You won't get this any faster. My last query, although quite fast on a non-indexed column, is slower than yours on an indexed column.

Your requirements seem arbitrary, but this should give you the result you've requested.
SELECT *
FROM (SELECT * FROM tbuser
WHERE zip = (SELECT MAX(zip) FROM tbuser))
WHERE rownum = 1
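The same "filter on MAX(zip), then keep one row" idea can be tried outside Oracle. The sketch below uses Python's sqlite3 module, so it is only an analogue: LIMIT 1 stands in for Oracle's rownum = 1, and the name column, the index name and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbuser (name TEXT, zip TEXT)")
# An index on zip lets the engine find the maximum without sorting the table.
conn.execute("CREATE INDEX idx_tbuser_zip ON tbuser (zip)")
conn.executemany(
    "INSERT INTO tbuser VALUES (?, ?)",
    [
        ("Aiko", "100-0001"),
        ("Ben", "998-0044"),
        ("Chie", "998-0044"),   # duplicate of the largest zip
        ("Dai", "530-0001"),
    ],
)

# Just the largest zip code.
max_zip = conn.execute("SELECT MAX(zip) FROM tbuser").fetchone()[0]

# Every row that carries the largest zip code (two rows here).
rows = conn.execute(
    "SELECT name, zip FROM tbuser "
    "WHERE zip = (SELECT MAX(zip) FROM tbuser) ORDER BY name"
).fetchall()

# Exactly one row; the ORDER BY tie-breaker makes the choice repeatable.
one_row = conn.execute(
    "SELECT name, zip FROM tbuser "
    "WHERE zip = (SELECT MAX(zip) FROM tbuser) ORDER BY name LIMIT 1"
).fetchone()

print(max_zip, rows, one_row)
```

Adding an explicit tie-breaker is a design choice: without it, which of the duplicate rows comes back is up to the database.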

OK - try something like this:
SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER);
Fetch a single row from a cursor based on the above statement, then close the cursor. If you're using PL/SQL you could do it like this:
FOR aRow IN (SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER))
LOOP
-- Do something with aRow
-- then force an exit from the loop
EXIT;
END LOOP;
Share and enjoy.

I'm surprised that nobody has posted this answer yet. I think this is the way you should do something like that.
SELECT *
FROM (
Select a.*, max(zip) over () max_zip
From tbuser a
)
WHERE zip=max_zip
and rownum = 1

Your query gets exactly one random row of all records having the max zip code. So it cannot be the problem that you retrieve a record with another zip code or more than one record or zero records (as long as there is at least one record in the table).
Maybe Japan simply expects one of the other rows with that zip code? Then you may just have to add another order criteria to get that particular desired row.
Another thought: As they are talking about slow connection speed, it may also be that they enter a new max zip code on one session, query with another and get the old max zip, because the insert statement of the other session hasn't gone through yet. But well, that's just the way this works of course.
BTW: A strange thing to select a maximum zip code. I guess that's just an example to illustrate the problem?

If you are getting multiple records using the MAX function (which should not be possible, and I can't tell how it happens in your case until you post a screenshot), then you can use DISTINCT in your SQL query to get a single record:
SELECT DISTINCT MAX(zipcode) FROM TableUSER
SQL FIDDLE

The used SELECT statements have a different number of columns

For example, I don't know how many rows are in each table, and I tried this:
SELECT * FROM members
UNION
SELECT * FROM inventory
What can I put in the second SELECT instead of * to remove this error without adding NULLs?

Put the columns names explicitly rather than *, and make sure the number of columns and data types match for the same column in each select.
Update:
I really don't think you want to be UNIONing those tables, based on the tables names. They don't seem to contain related data. If you post your schema and describe what you are trying to achieve it is likely we can provide better help.

you could do
SELECT *
from members
UNION
SELECT inventory.*, 'dummy1' AS membersCol1, 'dummy2' AS membersCol2
from inventory;
Where membersCol1, membersCol2, etc. are the names of columns from members that are not in inventory. That way both queries in the union will have the same columns (assuming that all the columns in inventory are the same as in members, which seems very strange to me... but hey, it's your schema).
UPDATE:
As HLGEM pointed out, this will only work if inventory has columns with the same names as members, and in the same order. Naming all the columns explicitly is the best idea, but since I don't know the names I can't exactly do that. If I did, it might look something like this:
SELECT id, name, member_role, member_type
from members
UNION
SELECT id, name, '(dummy for union)' AS member_role, '(dummy for union)' AS member_type
from inventory;
I don't like using NULL for dummy values because then it's not always clear which part of the union a record came from - using 'dummy' makes it clear that the record is from the part of the union that didn't have that record (though sometimes this might not matter). The very idea of unioning these two tables seems very strange to me because I very much doubt they'd have more than 1 or 2 columns with the same name, but you asked the question in such a way that I imagine in your scenario this somehow makes sense.

Are you sure you don't want a join instead? It is unlikely that UNION will give you what you want given the table names.

Try this
(SELECT * FROM members) ;
(SELECT * FROM inventory);
Just add semicolons after both the select statements and don't use union or anything else. This solved my error.

I don't know how many rows in each table
Are you sure this isn't what you want?
SELECT 'members' AS TableName, Count(*) AS Cnt FROM members
UNION ALL
SELECT 'inventory', Count(*) FROM inventory
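As a rough illustration of both the error and this row-count query, here is a sketch using Python's sqlite3 module. The column layouts of members and inventory are assumptions (the question never shows its schema), and SQLite's error text differs from MySQL's, but the underlying rule that both sides of a UNION need the same number of columns is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas with different column counts.
conn.execute("CREATE TABLE members (id INTEGER, name TEXT, member_role TEXT)")
conn.execute("CREATE TABLE inventory (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [(1, "Ann", "admin"), (2, "Bob", "user")],
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [(10, "Widget"), (11, "Gadget"), (12, "Bolt")],
)

# SELECT * on both sides fails because 3 columns cannot be unioned with 2.
try:
    conn.execute("SELECT * FROM members UNION SELECT * FROM inventory")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

# Each branch below produces exactly two columns, so UNION ALL is accepted.
counts = conn.execute(
    "SELECT 'members' AS TableName, COUNT(*) AS Cnt FROM members "
    "UNION ALL "
    "SELECT 'inventory', COUNT(*) FROM inventory"
).fetchall()

print(mismatch_error)
print(counts)
```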

Each SELECT statement within the MySQL UNION ALL operator must have the same number of fields in the result sets with similar data types
Visit https://www.techonthenet.com/mysql/union_all.php

Counting DISTINCT over multiple columns

Is there a better way of doing a query like this:
SELECT COUNT(*)
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery
I need to count the number of distinct items from this table but the distinct is over two columns.
My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.
Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.
I believe a distinct count of the computed column would be equivalent to your query.

Edit: Altered from the less-than-reliable checksum-only query
I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable
SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?
It certainly works as you might expect in Oracle.
SQL> select distinct deptno, job from emp
2 order by deptno, job
3 /
DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN
9 rows selected.
SQL> select count(*) from (
2 select distinct deptno, job from emp
3 )
4 /
COUNT(*)
----------
9
SQL>
edit
I went down a blind alley with analytics but the answer was depressingly obvious...
SQL> select count(distinct concat(deptno,job)) from emp
2 /
COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
9
SQL>
edit 2
Given the following data the concatenating solution provided above will miscount:
col1 col2
---- ----
A AA
AA A
So we need to include a separator...
select col1 + '*' + col2 from t23
/
Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.
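The miscount and the separator fix can be checked with a short sketch. It uses Python's sqlite3 module and the || concatenation operator; the table name t23 and the two rows come from the example above, while the choice of SQLite is an assumption made only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t23 (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t23 VALUES (?, ?)", [("A", "AA"), ("AA", "A")])

# Both rows concatenate to 'AAA', so they collapse into one distinct value.
plain = conn.execute(
    "SELECT COUNT(DISTINCT col1 || col2) FROM t23"
).fetchone()[0]

# With a separator the rows become 'A*AA' and 'AA*A', which stay distinct.
separated = conn.execute(
    "SELECT COUNT(DISTINCT col1 || '*' || col2) FROM t23"
).fetchone()[0]

# The subquery form needs no separator at all.
subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM t23)"
).fetchone()[0]

print(plain, separated, subquery)
```

The plain concatenation reports 1 distinct value, while the separated and subquery forms both report 2.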

To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.
SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;
In MySQL you can do the same thing without the concatenation step as follows:
SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;
This feature is mentioned in the MySQL documentation:
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct

How about something like:
select count(*)
from
(select count(*) cnt
from DocumentOutputItems
group by DocumentId, DocumentSessionId) t1
Probably just does the same as you are already though but it avoids the DISTINCT.
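A quick way to convince yourself that the GROUP BY form and the DISTINCT form agree is to run both on the same data. This sketch uses Python's sqlite3 module with invented integer rows; the real DocumentOutputItems table surely has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical rows: three distinct pairs, two of them repeated.
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)],
)

# Count of distinct pairs via a DISTINCT subquery (the question's query).
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT DocumentId, DocumentSessionId FROM DocumentOutputItems"
    ") AS internalQuery"
).fetchone()[0]

# Count of groups via a GROUP BY subquery.
grouped_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT COUNT(*) cnt FROM DocumentOutputItems"
    "  GROUP BY DocumentId, DocumentSessionId"
    ") t1"
).fetchone()[0]

# Without the outer COUNT(*), GROUP BY returns one row per pair instead.
per_group = conn.execute(
    "SELECT DocumentId, DocumentSessionId, COUNT(*) FROM DocumentOutputItems "
    "GROUP BY DocumentId, DocumentSessionId "
    "ORDER BY DocumentId, DocumentSessionId"
).fetchall()

print(distinct_count, grouped_count, per_group)
```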

Some SQL databases can work with a tuple expression so you can just do:
SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
FROM DocumentOutputItems;
If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or another scalar function providing good uniqueness, e.g.
COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId)).
MySQL specifically supports COUNT(DISTINCT expr, expr, ...), which is non-standard SQL syntax. Its documentation also notes: "In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...)."
A related use of tuples is performing IN queries such as:
SELECT * FROM DocumentOutputItems
WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));

Here's a shorter version without the subselect:
SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems
It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.
Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.

I have used this approach and it has worked for me.
SELECT COUNT(DISTINCT DocumentID || DocumentSessionId)
FROM DocumentOutputItems
For my case, it provides correct result.

There's nothing wrong with your query, but you could also do it this way:
WITH internalQuery (Amount)
AS
(
SELECT (0)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
FROM internalQuery

If you're working with datatypes of fixed length, you can cast to binary to do this very easily and very quickly. Assuming DocumentId and DocumentSessionId are both ints, and are therefore 4 bytes long...
SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems
My specific problem required me to divide a SUM by the COUNT of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a sub-query dramatically increased the query time. And due to the complexity, statistics simply wasn't a viable option. The CHECKSUM solution was also far too slow in its conversion, particularly as a result of the various data types, and I couldn't risk its unreliability.
However, using the above solution had virtually no increase on the query time (comparing with using simply the SUM), and should be completely reliable! It should be able to help others in a similar situation so I'm posting it here.

if you had only one field to "DISTINCT", you could use:
SELECT COUNT(DISTINCT DocumentId)
FROM DocumentOutputItems
and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:
SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId))
FROM DocumentOutputItems
but you'll have issues if NULLs are involved. I'd just stick with the original query.
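The NULL problem mentioned here can be sketched as follows. The example uses Python's sqlite3 module, where || plays the role of SQL Server's + for strings; the rows are invented, and the assumption is that concatenating with NULL yields NULL, as it does in SQLite and in SQL Server under its default settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical rows; two of them have no session id (NULL).
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, 10), (1, 10), (1, None), (2, None)],
)

# Concatenating with NULL gives NULL, and COUNT(DISTINCT ...) skips NULLs,
# so the two NULL-session rows vanish from the count.
concat_count = conn.execute(
    "SELECT COUNT(DISTINCT CAST(DocumentId AS TEXT) || '|~|' || "
    "CAST(DocumentSessionId AS TEXT)) FROM DocumentOutputItems"
).fetchone()[0]

# The original subquery keeps (1, NULL) and (2, NULL) as separate pairs.
subquery_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT DocumentId, DocumentSessionId FROM DocumentOutputItems"
    ") AS internalQuery"
).fetchone()[0]

print(concat_count, subquery_count)
```

On this data the concatenation approach reports 1 while the subquery reports 3, which is one reason to stick with the original query when NULLs are possible.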

Hope this works; I am writing this a prima vista (untested):
SELECT COUNT(*)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId

I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.
At first JayTee's answer seemed like a solution to me, but after some tests CHECKSUM() failed to create unique values. A quick example: both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) give the same answer, which is 55.
Then I did some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums, people suggested using
SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))
but this is not comforting either.
You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.
I would suggest using
SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems

I found this when I Googled my own issue, and found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL):
SELECT COUNT(DISTINCT DocumentID) AS Count1,
COUNT(DISTINCT DocumentSessionId) AS Count2
FROM DocumentOutputItems

How about this,
Select DocumentId, DocumentSessionId, count(*) as c
from DocumentOutputItems
group by DocumentId, DocumentSessionId;
This will get us the count of all possible combinations of DocumentId, and DocumentSessionId

It works for me. In Oracle:
SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;
In JPQL:
SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;

I had a similar question, but the query I had was a sub-query with the comparison data in the main query. Something like:
Select code, id, title, name,
(select count(distinct col1) from mytable where code = a.code and length(title) >0)
from mytable a
group by code, id, title, name
--needs distinct over col2 as well as col1
Ignoring the complexities of this, I realized I couldn't get the value of a.code into the subquery with the double sub-query described in the original question:
Select count(1) from (select distinct col1, col2 from mytable where code = a.code...)
--this doesn't work because the sub-query doesn't know what "a" is
So eventually I figured out I could cheat, and combine the columns:
Select count(distinct(col1 || col2)) from mytable where code = a.code...
This is what ended up working

This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in MySQL like a charm.
select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)
from DocumentOutputItems
group by i ,s;

You can just use the Count Function Twice.
In this case, it would be:
SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId)
FROM DocumentOutputItems
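Note that two separate COUNT(DISTINCT ...) calls count each column on its own, which is not necessarily the number of distinct pairs. The sketch below, using Python's sqlite3 module and invented rows, shows one case where the numbers differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical rows: 2 distinct ids, 2 distinct session ids, 3 distinct pairs.
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

# Each column counted independently.
per_column = conn.execute(
    "SELECT COUNT(DISTINCT DocumentId), COUNT(DISTINCT DocumentSessionId) "
    "FROM DocumentOutputItems"
).fetchone()

# Distinct combinations of the two columns.
pairs = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT DocumentId, DocumentSessionId FROM DocumentOutputItems"
    ") AS internalQuery"
).fetchone()[0]

print(per_column, pairs)
```

Here the per-column counts are (2, 2) while the number of distinct pairs is 3, so this form answers a different question from the one originally asked.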
