SQL - Count(*) not behaving in expected manner [duplicate] - sql

This question already has an answer here:
Access query producing results like ROW_NUMBER() in T-SQL
(1 answer)
Closed 7 years ago.
I have the following code
SELECT C_Record.BunchOfColumns, Count(*) AS Degrees
FROM C_Record
WHERE (((C_Record.[C#])=[Enter Value])) //Parameter Input from User
GROUP BY C_Record.BunchofColumns;
My Degrees column never increments, it shows 1 always no matter how many rows are returned from the query. I am suspecting that I have not implemented my GROUP BY method properly. If I understand it correctly, all columns that are selected and are not part of the aggregate function (COUNT in my case) should be put together in GROUP BY. Any help is much appreciated. Thanks in advance
Edit: What I am trying to achieve is to check how many rows have a particular value for a column, then select all other relevant columns and create a Index columns. For example if there are three rows that meet my requirement
Col1 Col2 Degrees
A X 1
B Y 2
C Z 3
and if only 2 rows meet my requirement then
Col1 Col2 Degrees
P X 1
Q Y 2
P.S - my C_Record.BunchofColumns consists of about 10 columns that I did not include for the sake of brevity.
P.P.S - If I try to skip out on any column it gives me the error You Tried to execute a query that does not include the specified expression <<column_name>> as part of an aggregate function

When you use Count() with a GROUP BY the count returned is the number of rows in each group. So to get a count greater than one you would have to have more than one row in your table that had exactly the same values. If you are selecting 10 different columns it seems likely that you have no two columns in the database that have exactly those 10 same values.
If you start with a selecting and grouping by a single column you will see count's of more than one.

That is not how GROUP BY works.
GROUP BY completely changes the meaning of your query. Each row of the result is an "aggregate grouping" of the original rows. Each aggregate grouping consists of all the rows with a particular combination of values for their GROUP BY columns. So if you GROUP BY ten columns, each grouping will consist of rows which are identical on all ten columns.
Once these groupings have been formed, you SELECT various aggregate values like count() or sum(), which provide you with information about the group as a whole. count(*) gives you the number of rows in the group, while count(column) gives you the number of rows in which column is non-NULL. You can also select any of the columns which appear in the GROUP BY clause, because those columns are identical across the whole group.
You are getting a count(*) of one because each of your groups only contains a single row. This is probably because you are grouping by ten columns, and there are no two rows which are identical for all ten columns.
If you just want a count of how many rows satisfy some query, and you don't want this aggregation at all, you write it like this:
SELECT count(*)
FROM something
WHERE something
-- no GROUP BY
;
That will form a single aggregate group of your whole query, and count the rows.
If you want something else, you will need to further explain what you're trying to do.

Related

SQL: Aggregation of string values from diffrent rows in a mulit-leveled JOIN

I have a multi-leveled table hierarchy in SQL Server, and when joining them I want to do an aggregation of strings from rows of one of the tables. In my (simplified) example in the screenshot below, I have Level1->Level2->Level3 and also a table PerformingUser "hanging under" Level1.
I have done a JOIN on all these four tables, resulting in 8 rows. So far all is fine.
Now, what I want is that each of this 8 rows gets a new column ("AllUsersForLevel1") with an aggregation of all PerformingUsers for its Level1.
(The intent is that the PerformingUser also can see what other PerformingUsers have the same Level1.)
So for the first 2 rows I want a new column with 'aretha, mary'. And in the other 6 rows that column should have 'john, jim'.
I have tried STRING_AGG (as indicated in the comment in the image), but it requires a grouping (which I do not want; I want all 8 rows).
I also tried to do a STRING_AGG on a OVER (PARTITION BY Level1ID), but got "The function 'STRING_AGG' is not a valid windowing function, and cannot be used with the OVER clause."
Does anyone have advice on how to do this?

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row of the highest value for timestamp (an integer) for each person (that has multiple entries) in a table. Additionally, I am only interested in rows with the field containing ABCD, but this should be done after filtering to return the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not receiving the data I expect. The returned table is nearly twice the size of expectation. Is there some step here that I am not getting correct?
You can 1st return a table having max(timestamp) and then use it in sub query of another select statement, following is query
SELECT table."person", timestamp FROM
(SELECT table."person",max(table."timestamp") as timestamp, type, field FROM table GROUP BY table."person")
where type = 1 and field LIKE '%ABCD%'
Direct answer: as I understand your end goal, just move the HAVING clause to the WHERE section:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I surprised your query worked at all. Your HAVING clause referenced a column not in your query. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.

Drill on Hbase: issue with SELECT *

I created a test table in HBASE with 2 column families and 3 columns for each, all same value, 100,000 rows in total.
SELECT COUNT(*) will not return the correct row count (much much less).
However, doing SELECT on a subset of those columns gives the right count.
If I reduced the column number to 2 in this case, SELECT COUNT(*) gives the right count.
Later I tried tables with only one column family, drill returns the right count only when column number is smaller than 6.
What could be the possible reason of this?
Have I missed any drill configurations?

Column value divided by row count in SQL Server

What happens when each column value in a table is divided with the total table row count. What function is basically performed by sql server? Can any one help?
More specifically: what is the difference between sum(column value ) / row count and column value/ row count. for e.g,
select cast(officetotal as float) /count(officeid) as value,
sum(officetotal)/ count(officeid) as average from check1
where officeid ='50009' group by officeid,officetotal
What is the operation performed on both select?
In your example both will be allways the same value because count(officeid) is allways equal to 1 because officeid is contained in the WHERE clause and officetotal is also contained in GROUP BY clause. So the example will not work because no grouping will be applied.
When you remove officetotal from the GROUP BY, you will get following message:
Column 'officetotal' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
It means that you cannot use officetotal and SUM(officetotal) in one select - because SUM is meant to work for set of values and it is pointless to SUM only one value.
It is just not possible to write it this way in SQL using GROUP BY. If you look for something like first or last value from a group, you will have to use MIN(officetotal) or MAX(officetotal) or some other approach.

String Grouping from a single column in Oracle database having million rows and removing duplicates

We have a huge table and one of the column contains queries like e.g. in row 1
1. (((firstname:Adam OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Individual
and in row 2 of same column
2. (((firstname:Adam* OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Organization
Similarly there are few other types of Query strings which are used eventually to query external services.
Issue is based on certain criteria I have to group and remove duplicates from this table.
There are few rules to determine grouping of Strings in different rows.One of them is that if first name and lastname are same then ignore category and type values, therefore above two rows will be grouped to one. There are around million rows. Comparing Strings and doing grouping is not looking elegant solution. What could be best possible solution using sql.