Count Instances Of Occurring String With Unique IDs - sql

I need to count the number of times that a specific string occurs, but when one ID has the same string more than once, it should only be counted once. Basically, I need to count the number of occurrences of a string that occur uniquely to an ID. I believe this should be a simple thing to do, but I don't know what I'm doing. Here is my current code:
SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS`
GROUP BY
ID,
Name
ORDER BY
Number
When run, it says everything was counted as 1. Thanks for the help!
UPDATE:
Dataset: https://storage.googleapis.com/omnihealth/MepsPrescriptionData.csv
OUTPUT when run with code above:
Row Name ID Number
1 SUMATRIPTAN 68896102 1
2 IBUPROFEN 65063102 1
3 PENICILLN VK 66179101 1
4 FUROSEMIDE 63217102 1
5 HYSINGLA ER 70373101 1
6 FUROSEMIDE 76090101 1
7 SKELETAL MUSCLE RELAXANTS 78414101 1
8 AMOXICILLIN 69467103 1
9 TRAMADOL HCL 67667101 1
10 PANTOPRAZOLE 60737102 1
11 CARBAMIDE PEROXIDE 6.5% OTIC SOLN 63990104 1
12 PROMETH/COD 68433101 1
13 AZITHROMYCIN 79045102 1
14 METRONIDAZOL 75414101 1
15 DEXILANT 69625101 1
16 TRAMADOL HCL 66890203 1
17 AZITHROMYCIN 73838101 1
18 COLCRYS 63856102 1
19 PERMETHRIN 62103107 1
20 ACETAMINOPHEN TAB 500 MG 62456102 1

Not sure if this is what you asked, but if you are looking for a DISTINCT COUNT, go with the query below:
#standardSQL
SELECT
RXNAME AS Name,
COUNT(DISTINCT DUPERSID) AS Number
FROM `OmniHealth.PrescriptionsMEPS`
GROUP BY 1
ORDER BY Number DESC

Try this. You are grouping on a different field than the one you are counting. I think you mean to group by RXNAME.
SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS`
GROUP BY
ID,
RXNAME
ORDER BY
Number

I think you want:
SELECT DUPERSID as ID, COUNT(DISTINCT RXNAME) as Number
FROM `OmniHealth.PrescriptionsMEPS`
GROUP BY ID
ORDER BY Number;
This assumes that "same string" means "same value for RXNAME".
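The two COUNT(DISTINCT ...) readings above can be compared on a tiny made-up table. This is only a sketch: it runs the queries through Python's built-in sqlite3 module rather than BigQuery, and the table name `prescriptions` and all rows are invented for illustration; only the column names RXNAME and DUPERSID come from the question.

```python
import sqlite3

# In-memory stand-in for the real table; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prescriptions (RXNAME TEXT, DUPERSID INTEGER)")
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?)",
    [
        ("IBUPROFEN", 1), ("IBUPROFEN", 1),      # same person, same drug twice
        ("IBUPROFEN", 2),
        ("AMOXICILLIN", 1),
        ("AMOXICILLIN", 3), ("AMOXICILLIN", 3),  # again a repeat for one person
        ("AMOXICILLIN", 4),
    ],
)

# Reading 1: for each drug name, how many different people have it?
people_per_drug = conn.execute(
    """
    SELECT RXNAME AS Name, COUNT(DISTINCT DUPERSID) AS Number
    FROM prescriptions
    GROUP BY RXNAME
    ORDER BY Number DESC
    """
).fetchall()

# Reading 2: for each person, how many different drug names do they have?
drugs_per_person = conn.execute(
    """
    SELECT DUPERSID AS ID, COUNT(DISTINCT RXNAME) AS Number
    FROM prescriptions
    GROUP BY DUPERSID
    ORDER BY Number, ID
    """
).fetchall()

print(people_per_drug)   # one row per drug name
print(drugs_per_person)  # one row per person
```

Grouping by both ID and name, as in the question, leaves one group per (ID, name) pair, so repeats are what get counted rather than distinct people.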

Related

MS access Group by

I have a big Access database table, part of which is given below, and I want to get the minimum value per group, that is, the minimum value for points 3, 5, 9, .... I have tried a query using the minimum value with Group By, but it didn't give me the result I want:
I want like:
Point ParaA Modvalue MinofParaA
3 0.02345610000 1.1304602327522
5 0.541734307717087 0.592591581187
7 4.4094560894325 0.393245327246
9 5.5450476528689 1.034165859885
11 ….. …….
13 ……. …….
Part of my database is:
Point ParaA Modvalue
3 1.01537282042687E-07 1.1304602327522
3 5.41734307717087E-06 1.592591581187
3 4.4094560894325E-05 2.393245327246
3 5.5450476528689E-05 1.034165859885
3 5.6210847721211E-05 1.9269298608176
3 7.33007048018759E-05 3.17251848741499
3 1.26935918181651E-04 7.577150615919
3 1.27908837646644E-04 4.466340029852
9 0.205576008517929 32.1580666011739
9 0.2058403012141 6.080246238675
9 0.205888794863275 4.48451872092713
9 0.205970780609684 30.2695831828562
9 0.206476048361761 2.3287221969481
9 0.206500794273712 4.48657381393526
9 0.206507173199086 3.54388543810806
9 0.206701769548586 77.5713240109687
5 0.127510144904596 0.692657575677875
5 0.127593565284236 16.812067790848
5 0.127765437607527 1.5228257707606
5 0.12803789311445 13.185719005611
5 0.12821555669427 15.488318488284
5 0.128929582513692 1.24166466944275
5 0.129137495154857 20.811097854043
5 0.129492706221109 1.73300570963531
5 0.130290993399936 6.7783307471853
5 0.130328615583637 11.879218642047
I have tried Group By with Min (minimum value), but it doesn't give me what I want.
Based on the information provided in the comments, perhaps something along the following lines yields the result that you require:
select t.point, first(t.paraA), min(t.minofpara)
from YourTable t
group by t.point
The second column of this query essentially returns the value of a 'random' record within each point group. Change YourTable to the name of your table.
EDIT: Based on your comments to my answer, it seems that you simply require:
select t.point, min(t.paraa)
from YourTable t
group by t.point
To select the record(s) which hold this minimum, you might then use:
select t1.* from YourTable t1 inner join
(
select t2.point, min(t2.paraa) as mp
from YourTable t2
group by t2.point
) t3
on t1.point = t3.point and t1.paraa = t3.mp
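As a rough check of the join above, here is a small sketch run through Python's sqlite3 module instead of Access. The rows are invented and much shorter than the real data, and the column layout (point, paraa, modvalue) is assumed from the question.

```python
import sqlite3

# Hypothetical miniature of the table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (point INTEGER, paraa REAL, modvalue REAL)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (3, 0.5, 1.1), (3, 0.2, 1.6), (3, 0.9, 2.4),
        (5, 0.13, 0.7), (5, 0.12, 16.8),
        (9, 0.21, 32.2), (9, 0.20, 6.1),
    ],
)

# The inner query finds the minimum paraa per point; the join then pulls
# back the full row(s) that hold that minimum.
min_rows = conn.execute(
    """
    SELECT t1.point, t1.paraa, t1.modvalue
    FROM YourTable t1
    INNER JOIN (
        SELECT t2.point, MIN(t2.paraa) AS mp
        FROM YourTable t2
        GROUP BY t2.point
    ) t3 ON t1.point = t3.point AND t1.paraa = t3.mp
    ORDER BY t1.point
    """
).fetchall()

for row in min_rows:
    print(row)
```

If two rows in a group share the same minimum, both are returned, which is why the answer says "record(s)".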

How to get top 10 from one column and sort by another column in hive?

I want to find the top 10 titles with the highest number of user ids. So I used a query like
select title,count(userid) as users from combined_moviedata group by title order by users desc limit 10
But I need to sort them based on title, so I tried this query
select title,count(userid) as users from combined_moviedata group by title order by users desc,title asc limit 10
But it does not sort them by title; it merely returned the same results. How can I do this?
The answer from @KaushikNayak is very close to what I'd consider the "right" answer.
At one level, work out what your top 10 records are
At a different level, sort them by a different field
The only thing I'd say is that if the 10th and 11th most common titles are tied for the same count, they should generally both be included in the results. That is what RANK() gives you.
WITH
ranked_titles AS
(
SELECT
RANK() OVER (ORDER BY COUNT(*) DESC) frequency_rank,
title
FROM
combined_moviedata
GROUP BY
title
)
SELECT
*
FROM
ranked_titles
WHERE
frequency_rank <= 10
ORDER BY
title
;
http://sqlfiddle.com/#!6/7283c/1
Note that in the example linked, 12 rows are returned. That is because 4 titles are all tied for the 9th most frequent, and it is actually impossible to determine which two should be selected in preference over the others. In this case selecting 10 rows would normally be statistically incorrect.
title frequency frequency_rank
title06 2 9
title07 2 9
title08 2 9
title09 2 9
title10 3 6
title11 3 6
title12 3 6
title13 4 4
title14 4 4
title15 5 2
title16 5 2
title17 6 1
You could make use of a WITH clause
with t AS
(
select title,count(userid) as users from combined_moviedata
group by title
order by users desc limit 10
)
select * FROM t ORDER BY title ;
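The difference between the LIMIT approach and the RANK() approach only shows up when there are ties. The sketch below illustrates that with a "top 2" on invented data, using Python's sqlite3 module rather than Hive; it assumes an SQLite build with window functions (3.25 or later), and it ranks over a pre-aggregated CTE, which is a small restructuring of the query shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE combined_moviedata (title TEXT, userid INTEGER)")
# Made-up rows: C has 3 users, A and B have 2 each, D has 1.
conn.executemany(
    "INSERT INTO combined_moviedata VALUES (?, ?)",
    [("C", 1), ("C", 2), ("C", 3), ("A", 1), ("A", 2), ("B", 4), ("B", 5), ("D", 6)],
)

# Pick the top 2 by count inside the CTE, then sort the survivors by title.
limited = conn.execute(
    """
    WITH t AS (
        SELECT title, COUNT(userid) AS users
        FROM combined_moviedata
        GROUP BY title
        ORDER BY users DESC, title
        LIMIT 2
    )
    SELECT title, users FROM t ORDER BY title
    """
).fetchall()

# Rank by count instead, so titles tied at the cutoff are all kept.
ranked = conn.execute(
    """
    WITH counts AS (
        SELECT title, COUNT(userid) AS users
        FROM combined_moviedata
        GROUP BY title
    ),
    ranked_titles AS (
        SELECT title, users, RANK() OVER (ORDER BY users DESC) AS frequency_rank
        FROM counts
    )
    SELECT title, users, frequency_rank
    FROM ranked_titles
    WHERE frequency_rank <= 2
    ORDER BY title
    """
).fetchall()

print(limited)  # exactly two rows; B is dropped even though it ties with A
print(ranked)   # three rows; A and B share rank 2
```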

Sort by specific order, including NULL, postgresql

best explained with an example:
So I have users table:
id name product
1 second NULL
2 first 27
3 first 27
4 last 6
5 second NULL
And I would like to order them in this product order: [27,NULL, 6]
So I will get:
id name product
2 first 27
3 first 27
1 second NULL
5 second NULL
4 last 6
(notice user id 3 can be before user id 2 since they both have the same product value)
Now without NULL I could do it like that:
SELECT id FROM users ORDER BY users.product=27, users.product=6;
How can I do it with NULL ?
p.s.
I would like to do that for many records so it should be efficient.
You can use case to produce custom sort order:
select id
from users
order by case
when product = 27
then 1
when product is null
then 2
when product = 6
then 3
end
As a note, you can follow your original approach. You just need a NULL-safe comparison:
SELECT id
FROM users
ORDER BY (NOT users.product IS DISTINCT FROM 27)::int DESC,
(users.product IS NULL)::int DESC,
(NOT users.product IS DISTINCT FROM 6)::int DESC;
The reason your version has unexpected results is because the first comparison can return NULL, which is ordered separately from the "true" and "false".
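The CASE-based ordering can be tried on the sample rows from the question. This sketch uses Python's sqlite3 module rather than PostgreSQL, so it only covers the portable CASE form (the `::int` casts and IS DISTINCT FROM are not used here); the extra `id` tiebreaker is added just to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, product INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "second", None),
        (2, "first", 27),
        (3, "first", 27),
        (4, "last", 6),
        (5, "second", None),
    ],
)

# Comparing NULL with "=" yields NULL, not true or false, which is why the
# plain "product = 27" sort key misbehaves for the NULL rows.
null_cmp = conn.execute("SELECT NULL = 27").fetchone()[0]

# Map each product value, including NULL, to an explicit sort position.
ordered_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM users
        ORDER BY CASE
                     WHEN product = 27 THEN 1
                     WHEN product IS NULL THEN 2
                     WHEN product = 6 THEN 3
                 END,
                 id
        """
    )
]

print(null_cmp)
print(ordered_ids)
```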

Counting substrings in a string

How can I count the numbers of times a substring shows up in a string?
In this case, I'm looking for every time "connect.facebook.net/en_US/all.js" shows up in the HTML bodies of the top 300K internet sites (stored in httparchive).
You could use SPLIT() on the string, and count the number of records produced:
SELECT fb_times, COUNT(*) n_pages
FROM
(SELECT COUNT(splits)-1 WITHIN RECORD AS fb_times
FROM
(SELECT SPLIT(body, 'connect.facebook.net/en_US/all.js') splits
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE body CONTAINS 'connect.facebook.net/en_US/all.js'
AND mimeType="text/html"
AND page=url))
GROUP BY 1
ORDER BY 1
Note the usage of WITHIN RECORD to count how many sub-records SPLIT() produced.
Results:
Fb_times N_pages
1 12,471
2 1,222
3 163
4 34
5 18
6 12
7 12
8 6
... ...
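The split-and-count idea is not specific to BigQuery. The sketch below shows it on three invented page bodies in Python, alongside a length-difference variant in SQLite (via the sqlite3 module) that is swapped in here because SQLite has no SPLIT(); the table name `pages` is hypothetical. Both count non-overlapping matches.

```python
import sqlite3

needle = "connect.facebook.net/en_US/all.js"
# Made-up bodies with 1, 2 and 0 occurrences of the needle.
bodies = [
    "<script src='//connect.facebook.net/en_US/all.js'></script>",
    "connect.facebook.net/en_US/all.js and connect.facebook.net/en_US/all.js",
    "<html>no match here</html>",
]

# Splitting on the needle yields one more piece than there are occurrences.
split_counts = [len(body.split(needle)) - 1 for body in bodies]

# Same result in SQL: remove every occurrence and divide the number of
# characters that disappeared by the needle's length.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (body TEXT)")
conn.executemany("INSERT INTO pages VALUES (?)", [(b,) for b in bodies])
sql_counts = [
    row[0]
    for row in conn.execute(
        """
        SELECT (LENGTH(body) - LENGTH(REPLACE(body, ?, ''))) / LENGTH(?)
        FROM pages
        ORDER BY rowid
        """,
        (needle, needle),
    )
]

print(split_counts)
print(sql_counts)
```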

removing duplicates in ms access

Please tell me how to write this query.
I have an Access table:
number
2
2
1
2
2
1
1
3
2
I want a query that gives:
number count
2 5
1 3
3 1
Any help appreciated.
something like...
SELECT number, count(number) AS count
FROM table
GROUP BY number
It's a bad idea to have a column named number since it is a reserved keyword.
You probably want something like
select number_, count(*) as count from ... group by number_
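For a quick check of the GROUP BY approach against the sample values from the question, here is a sketch using Python's sqlite3 module rather than Access. The table name `numbers`, the renamed column `number_`, and the alias `cnt` are placeholders chosen to stay clear of reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number_ INTEGER)")
# The nine values listed in the question.
conn.executemany(
    "INSERT INTO numbers VALUES (?)",
    [(n,) for n in [2, 2, 1, 2, 2, 1, 1, 3, 2]],
)

# One output row per distinct value, with how often it occurs.
counts = conn.execute(
    """
    SELECT number_, COUNT(*) AS cnt
    FROM numbers
    GROUP BY number_
    ORDER BY cnt DESC
    """
).fetchall()

print(counts)
```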