Counting occurrences in a table - sql

Let's say I want to count how many times a tag embedded in a string column occurs across the table, and display that total in a new column next to every row that contains the tag. For example, if I have:
Name | Home Address | Special ID
==================================
Frank | 152414 | aTRF342
Jane | 4342342 | rRFC432
Mary | 423432 | xTRF353
James | 32111111 | tLZQ399
May | 4302443 | 3TRF322
How would I count the occurrences of special tags like 'TRF', 'RFC', or 'LZQ' so the table looks like this:
Name | Home Address | Special ID | Occurrences
================================================
Frank | 152414 | aTRF342 | 3
Jane | 4342342 | rRFC432 | 1
Mary | 423432 | xTRF353 | 3
James | 32111111 | tLZQ399 | 1
May | 4302443 | 3TRF322 | 3
Currently using Access 2007. Is this even possible using a SQL query?

Using Access 2007, I stored your sample data in a table named tblUser1384831. The query below returns this result set.
Name Home Address Special ID special_tag Occurrences
---- ------------ ---------- ----------- -----------
Frank 152414 aTRF342 TRF 3
Jane 4342342 rRFC432 RFC 1
Mary 423432 xTRF353 TRF 3
James 32111111 tLZQ399 LZQ 1
May 4302443 3TRF322 TRF 3
Although your question has a vba tag, you don't need to use a VBA procedure for this. You can do it with SQL and the Mid() function.
SELECT
base.[Name],
base.[Home Address],
base.[Special ID],
base.special_tag,
tag_count.Occurrences
FROM
(
SELECT
[Name],
[Home Address],
[Special ID],
Mid([Special ID],2,3) AS special_tag
FROM tblUser1384831
) AS base
INNER JOIN
(
SELECT
Mid([Special ID],2,3) AS special_tag,
Count(*) AS Occurrences
FROM tblUser1384831
GROUP BY Mid([Special ID],2,3)
) AS tag_count
ON base.special_tag = tag_count.special_tag;
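For anyone who wants to try the same derived-table join outside Access, here is a minimal sketch using Python's built-in sqlite3 module. The table name users and the lowercase column names are my own assumptions, and SQLite's substr() stands in for Access's Mid(); both are 1-based.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# "users" and the column names are illustrative, not from the original post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, home_address TEXT, special_id TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("Frank", "152414", "aTRF342"),
        ("Jane", "4342342", "rRFC432"),
        ("Mary", "423432", "xTRF353"),
        ("James", "32111111", "tLZQ399"),
        ("May", "4302443", "3TRF322"),
    ],
)

# Same shape as the Access query: one subquery extracts the tag per row,
# the other counts rows per tag, and the join attaches the count to each row.
query = """
SELECT base.name, tag_count.occurrences
FROM (
    SELECT rowid AS rid, name, substr(special_id, 2, 3) AS special_tag
    FROM users
) AS base
INNER JOIN (
    SELECT substr(special_id, 2, 3) AS special_tag, COUNT(*) AS occurrences
    FROM users
    GROUP BY substr(special_id, 2, 3)
) AS tag_count
ON base.special_tag = tag_count.special_tag
ORDER BY base.rid
"""
occurrences = dict(conn.execute(query).fetchall())
```

This only works because every tag in the sample sits at characters 2 to 4 of Special ID.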

You would have to GROUP BY the substring of Special ID. In MS Access, you can read about how to compute substrings here.
The problem in your case is that the data in the Special ID column does not follow a standard pattern that is easy to extract with a substring function. You might need to use regular expressions to extract such values, and then apply the GROUP BY to them.
With MSSQL, Oracle, PostgreSQL you would be able to declare a stored procedure (example CLR function in MS SQL Server) that would do this for you. Not sure with MS Access.
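To illustrate the "extract with a regular expression, then group" idea outside the database, here is a rough Python sketch. It assumes the tag is the first run of three uppercase letters in Special ID, which holds for the sample rows but is only a guess about the real data.

```python
import re
from collections import Counter

# Sample Special ID values from the question.
special_ids = ["aTRF342", "rRFC432", "xTRF353", "tLZQ399", "3TRF322"]

# Assumption: a tag is the first run of exactly three uppercase letters.
TAG_PATTERN = re.compile(r"[A-Z]{3}")


def extract_tag(special_id):
    """Return the first three-letter uppercase tag, or None if there is none."""
    match = TAG_PATTERN.search(special_id)
    return match.group(0) if match else None


# "GROUP BY tag" step: count how many rows share each extracted tag.
tag_counts = Counter(extract_tag(s) for s in special_ids)

# Attach the per-tag total to every row, like the Occurrences column.
occurrences = [(s, tag_counts[extract_tag(s)]) for s in special_ids]
```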

You can do something like this:
select Name, [Home Address], [Special ID],
(select count(*) from [your table] where [Special ID] = RemoveNonAlphaCharacters([Special ID]) ) as Occurrences
from [your table]
Auxiliary function (taken from this link):
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
While PatIndex('%[^a-z]%', @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex('%[^a-z]%', @Temp), 1, '')
Return @Temp
End

Let's say your first table is called 'table_with_string'.
The following code shows the occurrences based on the first 3 characters of the string in the Special ID column, since it is not clear exactly how you are passing the string to match.
select tws.Name,tws.HomeAddress,tws.SpecialID,str_count.Occurrences from
table_with_string tws
left join
(select SpecialID,count(*) from table_with_string where specialID like(substring
(specialid,0,3))
group by specialId) as str_count(id,Occurrences)
on str_count.id=tws.SpecialID

I would suggest doing this explicitly as a join, so you are clear on how it works:
select tws.Name, tws.HomeAddress, tws.SpecialID, str_count.Occurrences
from table_with_string tws
join
(
select substring(SpecialID, 2, 3) as code, count(*) as Occurrences
from table_with_string
group by substring(SpecialID, 2, 3)
) str_count
on str_count.code = substring(tws.SpecialID, 2, 3)

Related

Case statement logic and substring

Say I have the following data:
Passes
ID | Pass_code
-----------------
100 | 2xBronze
101 | 1xGold
102 | 1xSilver
103 | 2xSteel
Passengers
ID | Passengers
-----------------
100 | 2
101 | 5
102 | 1
103 | 3
I want to count then create a ticket in the output of:
ID 100 | 2 pass (bronze)
ID 101 | 5 pass (because it is gold, we count all passengers)
ID 102 | 1 pass (silver)
ID 103 | 2 pass (steel)
I was thinking of something like the code below; however, I am unsure how to finish my CASE statement. I want to substring pass_code to get the number of passes, e.g. '2xBronze' should give me 2. Then for ID 103, we have 2 passes and 3 customers, so we should output 2.
Also, is there a way to first find '2xbronze' if pass_code contained lots of other things, such as '101001, 1xbronze, FirstClass'? This may change, so I don't want to rely on a fixed substring position. Could we search for '2xbronze' and then pull out the 2?
SELECT
CASE
WHEN Passes.pass_code like '%gold%' THEN Passengers.passengers
WHEN Passes.pass_code like '%steel%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%bronze%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%silver%' THEN SUBSTRING(passes.pass_code, 1,1)
else 0 end as no,
Passes.ID,
Passes.Pass_code,
Passengers.Passengers
FROM Passes
JOIN Passengers ON Passes.ID = Passengers.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=db698e8562546ae7658270e0ec26ca54
Assuming you are indeed using Oracle (as your DB fiddle implies), you can do some string magic: find the position of a splitter character (in your case the x), then substring based on that. Obviously this has its problems, and x is a poor separator character as well, but it works for your current data set.
WITH PASSCODESPLIT AS
(
SELECT PASSES.ID,
TO_Number(SUBSTR(PASSES.PASS_CODE, 0, (INSTR(PASSES.PASS_CODE, 'x')) - 1)) AS NrOfPasses,
SUBSTR(PASSES.PASS_CODE, (INSTR(PASSES.PASS_CODE, 'x')) + 1) AS PassType
FROM Passes
)
SELECT
PASSCODESPLIT.ID,
CASE
WHEN PASSCODESPLIT.PassType = 'gold' THEN Passengers.Passengers
ELSE PASSCODESPLIT.NrOfPasses
END AS NrOfPasses,
PASSCODESPLIT.PassType,
Passengers.Passengers
FROM PASSCODESPLIT
INNER JOIN Passengers ON PASSCODESPLIT.ID = Passengers.ID
ORDER BY PASSCODESPLIT.ID ASC
Gives the result of:
ID NROFPASSES PASSTYPE PASSENGERS
100 2 bronze 2
101 5 gold 5
102 1 silver 1
103 2 steel 3
As can also be seen in this fiddle
But I would strongly advise you to fix your table design. Having multiple attributes in the same column leads to troubles like these. And the more variables/variations you start storing, the more 'magic' you need to keep doing.
In this particular example I see no reason why you don't simply have the 3 columns in Passes, which would also give you the opportunity to add new columns going forward, e.g. to keep track of first class.
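The split-on-x logic above can also be sketched outside Oracle. This is a small Python illustration, not part of the original query: str.partition plays the role of INSTR plus SUBSTR, and the function names are made up for the example. I compare the pass type case-insensitively because the sample data in the question is capitalised ('1xGold').

```python
def split_pass_code(pass_code):
    """Split a code like '2xBronze' into (2, 'Bronze') at the first 'x'."""
    count_text, _, pass_type = pass_code.partition("x")
    return int(count_text), pass_type


def passes_to_issue(pass_code, passengers):
    """Gold passes cover every passenger; other types use the number in the code."""
    count, pass_type = split_pass_code(pass_code)
    return passengers if pass_type.lower() == "gold" else count
```

As with the SQL version, this breaks as soon as the text before the first x is not a plain number, which is one more argument for separate columns.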
You can extract the numbers using regexp_substr(). So I think this does what you want:
SELECT (CASE WHEN p.pass_code LIKE '%gold%'
THEN pp.passengers
ELSE TO_NUMBER(REGEXP_SUBSTR(p.pass_code, '^[0-9]+'))
END) as num,
p.ID, p.Pass_code, pp.Passengers
FROM Passes p JOIN
Passengers pp
ON p.ID = pp.ID;
Here is a db<>fiddle.
This converts the leading digits in the code to a number. Also note the use of table aliases to simplify the query.
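For the follow-up part of the question (finding something like '1xbronze' inside a longer string), a regular expression that searches anywhere in the value may be easier than anchoring at the start. Here is a rough Python sketch of that idea. The list of pass types in the pattern is my assumption, based on the four types in the sample data.

```python
import re

# Assumption: a pass looks like <digits>x<type>, with one of these four types.
PASS_PATTERN = re.compile(r"(\d+)x(bronze|silver|gold|steel)", re.IGNORECASE)


def parse_pass(pass_code):
    """Return (count, pass_type) for the first pass found in the text, else None."""
    match = PASS_PATTERN.search(pass_code)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()
```

With this pattern the leading '101001' in '101001, 1xbronze, FirstClass' is skipped, because it is not followed by an x and a known pass type.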

SQL LIKE using the same row value

I'm wondering how can I use a row value as a variable for my like statement? For example
ID | PID | DESCRIPTION
1 | 4124 | Hi4124
2 | 2451 | Test
3 | 1467 | Hello
4 | 9642 | Me9642
I have a table above, I want to return IDs 1 and 4 since DESCRIPTION contains PID.
I'm thinking it would be SELECT * from TABLE WHERE DESCRIPTION LIKE '%PID%' but I can't get it.
You can use CONCAT() to assemble the matching pattern, as in:
select *
from t
where description like concat('%', PID, '%')
We could also try using CHARINDEX here:
SELECT ID, PID, DESCRIPTION
FROM yourTable
WHERE CHARINDEX(PID, DESCRIPTION) > 0;
Demo
Note that I assume in the demo that the PID column is actually text, and not a numeric column. If PID is numeric, we might first have to cast it in order to use CHARINDEX (or any of the methods given in the other answers).
Use the CONCAT SQL function
SELECT *
FROM TABLE
WHERE DESCRIPTION LIKE CONCAT('%', PID, '%')
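If you want to check the idea quickly, here is a small sketch using Python's built-in sqlite3 module. The table name t is an assumption, and SQLite builds the pattern with the || operator rather than CONCAT().

```python
import sqlite3

# In-memory copy of the sample table; PID is stored as text here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, pid TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "4124", "Hi4124"),
        (2, "2451", "Test"),
        (3, "1467", "Hello"),
        (4, "9642", "Me9642"),
    ],
)

# The pattern is assembled per row from that row's own PID value.
matching_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t WHERE description LIKE '%' || pid || '%' ORDER BY id"
    )
]
```

One caveat: if PID could ever contain % or _, those characters would act as wildcards inside the pattern.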

In proc sql when using SELECT * and GROUP BY, the result is not collapsed

When using the asterisk in combination with sum and group, the duplicates are not removed as I expect (and as it works in for example mysql):
col1 | country
-----------------
5 | sweden
20 | sweden
30 | denmark
select *, sum(col1) as s from table
group by country
the data returned is:
col1 | country | s
--------------------
5 | sweden | 25
20 | sweden | 25
30 | denmark | 30
instead of what I expected:
col1 | country | s
------------------------
5 | sweden | 25
30 | denmark | 30
If I don't use asterisk (*), the data returned is as I expect it to be.
SELECT country, sum(col1) as s from table
You are correct, SAS does not collapse the rows when the SELECT statement includes variables that are not in the GROUP BY clause.
There will be a note to that effect in the log, about your data being merged.
If you want just the variables, you'll have to list them unfortunately, but since you have to list them in GROUP BY it's not extra work per se.
Different SQL implementations handle things differently, this is one way that SAS is different. It's handy when you do want to merge a summary stat back with the main data set though.
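One way to picture the remerge is as a join of the grouped summary back onto the detail rows. The sketch below uses Python's sqlite3 module rather than SAS, with a made-up table name sales, purely to show the two result shapes side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (col1 INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(5, "sweden"), (20, "sweden"), (30, "denmark")],
)

# Roughly what the remerge gives you: every detail row plus its group total.
remerged = conn.execute(
    """
    SELECT d.col1, d.country, totals.total
    FROM sales AS d
    JOIN (SELECT country, SUM(col1) AS total FROM sales GROUP BY country) AS totals
      ON totals.country = d.country
    ORDER BY d.col1
    """
).fetchall()

# The collapsed result: list only the grouping column and the aggregate.
collapsed = conn.execute(
    "SELECT country, SUM(col1) AS total FROM sales GROUP BY country ORDER BY country"
).fetchall()
```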
If you don't want this behaviour, add the NOREMERGE option to your PROC SQL. Note, however, that the query then throws an error, so it still doesn't work the way you want.
See the documentation for the reference
Don't use SELECT *, ever. It's bad practice, risky, unsustainable... Read about it.
What flavor of SQL?
Your first query shouldn't work. You're basically saying...
select col1
, country
, sum(col1) as s
from table
group by country
...which will return an error:
Column 'table.col1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
SELECT country, sum(col1) as s from table
...also should not work:
Column 'table.country' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Given your expected output, I suspect what you are looking for is...
select min(col1) as col1
, country
, sum(col1) as s
from table
group by country

SQL multipart messages in two tables. Doesn't work if second table is empty

I am working on a query that will fetch multipart messages from 2 tables. However, it only works if there are multiple parts. If there is only a one-part message, then the join condition won't be true anymore. How could I make it work for both single and multipart messages?
Right now it fails if there is an entry in outbox and nothing in outbox_multipart.
My first table is "outbox" that looks like this.
TextDecoded | ID | CreatorID
Helllo, m.. | 123 | Martin
Yes, I wi.. | 124 | Martin
My second table is "outbox_multipart" that looks very similar.
TextDecoded | ID | SequencePosition
my name i.. | 123 | 2
s Martin. | 123 | 3
ll do tha.. | 124 | 2
t tomorrow. | 124 | 3
My query so far
SELECT
CONCAT(ob.TextDecoded,
GROUP_CONCAT(obm.TextDecoded
ORDER BY obm.SequencePosition ASC
SEPARATOR ''
)
) AS TextDecoded,
ob.ID,
ob.creatorID
FROM outbox AS ob
JOIN outbox_multipart AS obm ON obm.ID = ob.ID
GROUP BY
ob.ID,
ob.creatorID
Use a left join instead of an (implicit) inner join. Then, also wrap the GROUP_CONCAT result in COALESCE so that an empty string (and not NULL) is concatenated when a message has no extra parts.
SELECT
CONCAT(ob.TextDecoded,
COALESCE(GROUP_CONCAT(obm.TextDecoded
ORDER BY obm.SequencePosition
SEPARATOR ''), '')) AS TextDecoded,
ob.ID,
ob.creatorID
FROM outbox AS ob
LEFT JOIN outbox_multipart AS obm
ON obm.ID = ob.ID
GROUP BY
ob.ID,
ob.creatorID,
ob.TextDecoded;
Note: Strictly speaking, outbox.TextDecoded should also appear in the GROUP BY clause, since it is not an aggregate. I have made this change in the query.
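To see why both the LEFT JOIN and the NULL handling matter, here is a small sketch with Python's sqlite3 module. The rows are made-up samples modeled on the question. To keep the part order explicit, the query only does the LEFT JOIN with an ORDER BY, and the concatenation happens in Python instead of GROUP_CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (TextDecoded TEXT, ID INTEGER, CreatorID TEXT)")
conn.execute(
    "CREATE TABLE outbox_multipart (TextDecoded TEXT, ID INTEGER, SequencePosition INTEGER)"
)
# Illustrative rows: message 123 has extra parts, message 125 has none.
conn.executemany(
    "INSERT INTO outbox VALUES (?, ?, ?)",
    [("Hello, ", 123, "Martin"), ("Single part.", 125, "Martin")],
)
conn.executemany(
    "INSERT INTO outbox_multipart VALUES (?, ?, ?)",
    [("s Martin.", 123, 3), ("my name i", 123, 2)],
)

# LEFT JOIN keeps message 125 even though it has no multipart rows;
# its extra-part column simply comes back as NULL (None in Python).
rows = conn.execute(
    """
    SELECT ob.ID, ob.TextDecoded, obm.TextDecoded
    FROM outbox AS ob
    LEFT JOIN outbox_multipart AS obm ON obm.ID = ob.ID
    ORDER BY ob.ID, obm.SequencePosition
    """
).fetchall()

messages = {}
for message_id, first_part, extra_part in rows:
    messages.setdefault(message_id, first_part)
    # Same role as COALESCE(..., ''): treat a missing part as an empty string.
    messages[message_id] += extra_part or ""
```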

SQL Server Best match query with update (T-SQL)

I am trying to find out what's the most optimized SQL Query to achieve the following.
I have a table containing ZipCodes/PostalCodes, let's assume the following structure:
table_codes:
ID | ZipCode
---------------
1 1234
2 1235
3 456
and so on.
The users of my application fill up a profile where they are required to enter their ZipCode (PostalCode).
Assuming that sometimes, the user will enter a ZipCode not defined in my table, I am trying to suggest a Best Match based on the zip entered by the user.
I am using the following query:
Declare @entered_zipcode varchar(10)
set @entered_zipcode = '23456'
SELECT TOP 1 table_codes.ZipCode
FROM table_codes
where @entered_zipcode LIKE table_codes.ZipCode + '%'
or table_codes.ZipCode + '%' like @entered_zipcode + '%'
ORDER BY table_codes.ZipCode, LEN(table_codes.ZipCode) DESC
Basically, I am trying the following:
if @entered_zipcode is longer than any zip code in the table, I am trying to get the best prefix in the zip table matching @entered_zipcode
if @entered_zipcode is shorter than any existing code in the table, I am trying to use it as a prefix and get the best match in the table
Moreover, I am building a temp table with the following structure:
#tmpTable
------------------------------------------------------------------------------------
ID | user1_enteredzip | user1_bestmatchzip | user2_enteredzip | user2_bestmatchzip |
------------------------------------------------------------------------------------
1 | 12 | *1234* | 4567 | **456** |
2 |
3 |
4 |
Entered zip is the one the user enters and the code between * .. * is the best matching code from my lookup table, that I am trying to get using the query below.
The query seems to take a little too long, which is why I am asking for help in optimizing it:
update #tmpTable
set user1_bestmatchzip = ( SELECT TOP 1
zipcode
FROM table_codes
where #tmpTable.user1_enteredzip LIKE table_codes.zipcode + '%'
or table_codes.zipcode + '%' like #tmpTable.user1_enteredzip + '%'
ORDER BY table_codes.zipcode, LEN(table_codes.zipcode) DESC
),
user2_bestmatchzip = ( SELECT TOP 1
zipcode
FROM table_codes
where #tmpTable.user2_enteredzip LIKE table_codes.zipcode + '%'
or table_codes.zipcode + '%' like #tmpTable.user2_enteredzip + '%'
ORDER BY table_codes.zipcode, LEN(table_codes.zipcode) DESC
)
from #tmpTable
What if you change your temp table to be like:
id | user | enteredzip | bestmatchzip
10 | 1 | 12345 | 12345
20 | 2 | 12 | 12345
That is: use a column to save the user number (1 or 2). This way you will update one row at a time.
Also, the ORDER BY takes time. Did you set indices on the zipcode? Couldn't you create a "length" field in the zipcodes table to pre-compute the zipcode lengths?
EDIT:
I was thinking that ordering by LEN makes no sense, so you could remove that! If the zipcodes cannot have duplicates, then ordering by the zipcode is enough. If they can, though, the LEN will always be equal!
You are comparing first characters of both strings - what if you compare substrings of minimal length?
select top 1 zipcode
from table_codes
where substring(zipcode, 1, case when len(zipcode) > len(@entered_zipcode) then len(@entered_zipcode) else len(zipcode) end)
= substring(@entered_zipcode, 1, case when len(zipcode) > len(@entered_zipcode) then len(@entered_zipcode) else len(zipcode) end)
order by len(zipcode) desc
This will remove the OR and allow for usage of an index in @entered_zipcode LIKE table_codes.ZipCode + '%'. Also, it seems to me that the ordering of results is wrong: shorter zipcodes go first.
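To make the matching rule concrete, here is a rough Python sketch of the same idea: a stored code is a candidate when it and the entered code agree over the length of the shorter one, and the longest candidate wins. The function name and the alphabetical tie-break between equally long candidates are my own assumptions.

```python
def best_match(entered_zip, known_zips):
    """Return the longest known code that shares a prefix relation with the input.

    A code qualifies when it starts with the entered value or the entered
    value starts with it, i.e. both agree over the shorter length.
    """
    candidates = [
        code
        for code in known_zips
        if code.startswith(entered_zip) or entered_zip.startswith(code)
    ]
    if not candidates:
        return None
    # Longest first; ties are broken alphabetically (an assumption).
    return min(candidates, key=lambda code: (-len(code), code))


table_codes = ["1234", "1235", "456"]
```

With the sample table, '12' resolves to '1234' and '4567' resolves to '456', matching the temp table in the question. '23456' has no match because no stored code is a prefix of it or starts with it.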