Counting the number of rows based on like values - sql

I'm a little bit lost on this. I would like to list the number of names beginning with the same letter, and find the total amount of names containing that first same letter.
For instance:
name | total
-------|--------
A | 12
B | 10
C | 8
D | 7
E | 3
F | 2
...
Z | 1
12 names beginning with letter 'A', 10 with 'B' and so on.
This is what I have so far
SELECT
LEFT(customers.name,1) AS 'name'
FROM customers
WHERE
customers.name LIKE '[a-z]%'
GROUP BY name
However, I'm unsure how I would add up columns based on like values.

This should work for you:
SELECT
LEFT(customers.name,1) AS 'name',
COUNT(*) AS NumberOfCustomers
FROM customers
WHERE
customers.name LIKE '[a-z]%'
GROUP BY LEFT(customers.name,1)
EDIT: Forgot the explanation; as many have mentioned already, you need to group on the calculation itself and not the alias you give it, as the GROUP BY operation actually happens prior to the SELECT and therefore has no idea of the alias yet. The COUNT part you would have figured out easily. Hope that helps.

You don't want to count the names, but only the first letters. So you must not group by name, but group by the first letter
SELECT LEFT(name, 1) AS name, count(*)
FROM customers
GROUP BY LEFT(name, 1)
SQLFiddle

Related

Oracle SQL Count function

I am hoping someone can advise on the below please?
I have some code (below), it is pulling the data I need with no issues. I have been trying (in vain) to add a COUNT function in here somewhere. The output I am looking for would be a count of how many orders are assigned to each agent. I tried a few diffent things based on other questions but can't seem to get it correct. I think I am placing the COUNT 'Agent' statement and the GROUP BY in the wrong place. Please can someone advise? (I am using Oracle SQL Developer).
select
n.ordernum as "Order",
h.employee as "Name"
from ordermgmt n, orderheader h
where h.ordernum = n.ordernum
and h.employee_group IN ('ORDER.MGMT')
and h.employee is NOT NULL
and n.percentcomplete = '0'
and h.order_status !='CLOSED'
Output I am looking for would be, for example:
Name Orders Assigned
Bob 3
Peter 6
John 2
Thank you in advance
Name
Total
49
49
49
49
49
John
4
John
4
John
4
John
4
Peter
2
Peter
2
Bob
3
Bob
3
Bob
3
for example. so there are 49 blank rows summed up as 49 in the Total column. I did not add the full 49 blank columns to save space
Would be easier with sample data and expected output, but maybe you are looking for something like this
select
n.ordernum as "Order",
h.employee as "Name",
count(*) over (partition by h.employee) as OrdersAssigned
from ordermgmt n, orderheader h
where h.ordernum = n.ordernum
and h.employee_group IN ('ORDER.MGMT')
and h.employee is NOT NULL
and n.percentcomplete = '0'
and h.order_status !='CLOSED'
The use of COUNT (as other aggregate functions) is simple.
If you want to add an aggregate function, please group all scalar fields in the GROUP BY clause.
So, in the SELECT you can manage field1, field2, count(1) and so on but you must add in group by (after where conditions) field1, field2
Try this:
select
h.employee as "Name",
count(1) as "total"
from ordermgmt n, orderheader h
where h.ordernum = n.ordernum
and h.employee_group IN ('ORDER.MGMT')
and h.employee is NOT NULL
and n.percentcomplete = '0'
and h.order_status !='CLOSED'
GROUP BY h.employee

Group by with similar ID

I have a table that has multiple items with slightly different IDs that I want to group together under one common ID. My table looks like this:
ID | Total
alex_dog 1
ben_dog 2
charlie_dog 3
alex_cat 4
ben_cat 5
charlie_cat 6
And I want to be able to group them into one table to look like this:
ID | total
dog 6
cat 15
If i leave the _ before the ID that is fine. Is is possible to do a groupby query where you can groupby '_%ID%'?
This can be done with a bit of string manipulation to get everything after _ along with a group by on the same value
SELECT SUBSTRING(ID,CHARINDEX('_',ID)+1,LEN(ID)), SUM(Total) as Total
FROM Data
GROUP BY SUBSTRING(ID,CHARINDEX('_',ID)+1,LEN(ID))
Live example: http://rextester.com/BHNY98025

Count particular substring text within column

I have a Hive table, titled 'UK.Choices' with a column, titled 'Fruit', with each row as follows:
AppleBananaAppleOrangeOrangePears
BananaKiwiPlumAppleAppleOrange
KiwiKiwiOrangeGrapesAppleKiwi
etc.
etc.
There are 2.5M rows and the rows are much longer than the above.
I want to count the number of instances that the word 'Apple' appears.
For example above, it is:
Number of 'Apple'= 5
My sql so far is:
select 'Fruit' from UK.Choices
Then in chunks of 300,000 I copy and paste into Excel, where I'm more proficient and able to do this using formulas. Problem is, it takes upto an hour and a half to generate each chunk of 300,000 rows.
Anyone know a quicker way to do this bypassing Excel? I can do simple things like counts using where clauses, but something like the above is a little beyond me right now. Please help.
Thank you.
I think I am 2 years too late. But since I was looking for the same answer and I finally managed to solve it, I thought it was a good idea to post it here.
Here is how I do it.
Solution 1:
+-----------------------------------+---------------------------+-------------+-------------+
| Fruits | Transform 1 | Transform 2 | Final Count |
+-----------------------------------+---------------------------+-------------+-------------+
| AppleBananaAppleOrangeOrangePears | #Banana#OrangeOrangePears | ## | 2 |
| BananaKiwiPlumAppleAppleOrange | BananaKiwiPlum##Orange | ## | 2 |
| KiwiKiwiOrangeGrapesAppleKiwi | KiwiKiwiOrangeGrapes#Kiwi | # | 1 |
+-----------------------------------+---------------------------+-------------+-------------+
Here is the code for it:
SELECT length(regexp_replace(regexp_replace(fruits, "Apple", "#"), "[A-Za-z]", "")) as number_of_apples
FROM fruits;
You may have numbers or other special characters in your fruits column and you can just modify the second regexp to incorporate that. Just remember that in hive to escape a character you may need to use \\ instead of just one \.
Solution 2:
SELECT size(split(fruits,"Apple"))-1 as number_of_apples
FROM fruits;
This just first split the string using "Apple" as a separator and makes an array. The size function just tells the size of that array. Note that the size of the array is one more than the number of separators.
This is straight-forward if you have any delimiter ( eg: comma ) between the fruit names. The idea is to split the column into an array, and explode the array into multiple rows using the 'explode' function.
SELECT fruit, count(1) as count FROM
( SELECT
explode(split(Fruit, ',')) as fruit
FROM UK.Choices ) X
GROUP BY fruit
From your example, it looks like fruits are delimited by Capital letters. One idea is to split the column based on capital letters, assuming there are no fruits with same suffix.
SELECT fruit_suffix, count(1) as count FROM
( SELECT
explode(split(Fruit, '[A-Z]')) as fruit_suffix
FROM UK.Choices ) X
WHERE fruit_suffix <> ''
GROUP BY fruit_suffix
The downside is that, the output will not have first letter of the fruit,
pple - 5
range - 4
I think you want to run in one select, and use the Hive if UDF to sum for the different cases. Something like the following...
select sum( if( fruit like '%Apple%' , 1, 0 ) ) as apple_count,
sum( if( fruit like '%Orange%', 1, 0 ) ) as orange_count
from UK.Choices
where ID > start and ID < end;
instead of a join in the above query.
No experience of Hive, I'm afraid, so this may or may not work. But on SQLServer, Oracle etc I'd do something like this:
Assuming that you have an int PK called ID on the row, something along the lines of:
select AppleCount, OrangeCount, AppleCount - OrangeCount score
from
(
select count(*) as AppleCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Apple%'
) a,
(
select count(*) as OrangeCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Orange%'
) o
I'd leave the division by the total count to the end, when you have all the rows in the spreadsheet and can count them there.
However, I'd urgently ask my boss to let me change the Fruit field to be a table with an FK to Choices and one fruit name per row. Unless this is something you can't do in Hive, this design is something that makes kittens cry.
PS I'd missed that you wanted the count of occurances of Apple which this won't do. I'm leaving my answer up, because I reckon that my However... para is actually a good answer. :(

SQL server - How to find the highest number in '<> ' in a text column?

Lets say I have the following data in the Employee table: (nothing more)
ID FirstName LastName x
-------------------------------------------------------------------
20 John Mackenzie <A>te</A><b>wq</b><a>342</a><d>rt21</d>
21 Ted Green <A>re</A><b>es</b><1>t34w</1><4>65z</4>
22 Marcy Nate <A>ds</A><b>tf</b><3>fv 34</3><6>65aa</6>
I need to search in the X column and get highest number in <> these brackets
What sort of SELECT statement can get me, for example, the number 6 like in <6>, in the x column?
This type of query generally works on finding patterns, I consider that the <6> is at the 9th position from left.
Please note if the pattern changes the below query will not work.
SELECT A.* FROM YOURTABLE A INNER JOIN
(SELECT TOP 1 ID,Firstname,Lastname,SUBSTRING(X,LEN(X)-9,1) AS [ORDER]
FROM YOURTABLE
WHERE ISNUMERIC(SUBSTRING(X,LEN(X)-9,1))=1
ORDER BY SUBSTRING(X,LEN(X)-9,1))B
ON
A.ID=B.ID AND
A.FIRSTNAME=B.FIRSTNAME AND
A.LASTNAME=B.LASTNAME

how to select one tuple in rows based on variable field value

I'm quite new into SQL and I'd like to make a SELECT statement to retrieve only the first row of a set base on a column value. I'll try to make it clearer with a table example.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetch the first line with chip_id=1,2,3
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
i'd probably:
set a variable =0
order your table by chip_id
read the table in row by row
if table[row]>variable, store the table[row] in a result array,increment variable
loop till done
return your result array
though depending on your DB,query and versions you'll probably get unpredictable/unreliable returns.
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
row_number() over (partition by chip_id order by rand()) as seqnum
) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
Provided I understood your output, if you are using PostGreSQL 9, you can use this:
SELECT chip_id ,
string_agg(sample_id, ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, generally you want the max, the min, or some other values to represent your group. You can do sums, count, all kind of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.