Finding and counting text in a text field with PSQL - sql

In a postgres db, I need to find, extract and count URLs embedded in a text column. (Pseudocode)
SELECT id,
body,
xxx? AS the_url,
COUNT(DISTINCT(the_url)) as count
FROM messages
WHERE body LIKE '%://%'
GROUP BY the_url;
How do I accomplish that?

You can use a FILTER combined with count(*) to count how many records in your table contain a certain pattern, e.g.
SELECT count(*) FILTER (WHERE body LIKE '%://%')
FROM messages;
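If you want to count the occurrences of a certain text inside a column, you could try this (note that string_to_array splits on a literal delimiter, so no LIKE wildcards are used):
SELECT id, body,
array_length(string_to_array(body, '://'), 1) - 1 AS occurrences
FROM messages;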
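Neither query above pulls out the URLs themselves. As a rough client-side sketch of the "extract and count" step, assuming the rows have already been fetched and that a URL is simply a scheme, "://", and a run of non-space characters (both assumptions, as is the sample data), something like this could work. Inside PostgreSQL, regexp_matches with the 'g' flag is the usual building block for the same idea.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for the messages table: (id, body).
messages = [
    (1, "see https://example.com/a and http://example.org"),
    (2, "no links here"),
    (3, "again https://example.com/a"),
]

# Deliberately simple pattern (an assumption): scheme, "://", then non-space characters.
URL_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://\S+")


def count_urls(rows):
    """Return a Counter mapping each extracted URL to its number of occurrences."""
    counts = Counter()
    for _id, body in rows:
        counts.update(URL_PATTERN.findall(body))
    return counts


url_counts = count_urls(messages)
for url, n in url_counts.most_common():
    print(url, n)
```

The pattern will also swallow trailing punctuation such as a closing parenthesis, so it may need tightening for real message bodies.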

Related

SQL how to display the value that appears more than once just once

Example: If I have a table called "list" and have more columns like (id, game and genre) and I want to display a list of all the genres that games have and if a genre appears more than once I want to display it just once. (Need the SQL code) I've tried with COUNT but it doesn't work.
Example Table list:
SELECT DISTINCT * will not work here; you need DISTINCT on the genre column only:
SELECT DISTINCT l.genre
FROM list l;
* denotes all columns, and in your sample data no two rows match in every column.
So list only the column whose duplicates you want collapsed in the SELECT statement.
Try using DISTINCT:
list -> your table name
genre -> the column you wish to work on
SELECT DISTINCT genre FROM list;
A solution using COUNT(), which you have tried, but 'did not work':
SELECT genre
FROM list
GROUP BY genre
HAVING count(*)>0
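A small sketch to check that the DISTINCT and GROUP BY variants above return the same genres. It runs the two queries against an in-memory SQLite database; the sample rows are made up, since the original table contents are not shown.

```python
import sqlite3

# Hypothetical sample data; the original table contents are not shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, game TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO list (id, game, genre) VALUES (?, ?, ?)",
    [(1, "Doom", "shooter"), (2, "Quake", "shooter"), (3, "Tetris", "puzzle")],
)

# Variant 1: DISTINCT on the single column.
distinct_genres = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT genre FROM list")
)

# Variant 2: GROUP BY with a COUNT() filter that every group passes.
grouped_genres = sorted(
    row[0]
    for row in conn.execute(
        "SELECT genre FROM list GROUP BY genre HAVING count(*) > 0"
    )
)

print(distinct_genres)
print(grouped_genres)
```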

ORDER BY an aggregated column in Report Builder 3.0

In Report Builder 3.0, I retrieved some items and counted them using a Count aggregate. Now I want to order them from highest to lowest. How do I apply ORDER BY to the aggregated column? The picture below shows the column I want to order by; it is ticked.
Pic
The code is very simple, as shown below:
SELECT DISTINCT act_id, NameOfAct
FROM Acts
Your picture indicates you also want a Total row at the bottom:
SELECT
COALESCE(NameOfAct,'Total') NameOfAct,
COUNT(DISTINCT act_id) c
FROM Acts
GROUP BY ROLLUP(NameOfAct)
ORDER BY
CASE WHEN NameOfAct is null THEN 1 ELSE 0 END,
c DESC;
Result of example data:
NameOfAct   count
---------   -----
Act_B       3
Act_A       2
Act_Z       1
Total       6
Try it with example rows at: http://sqlfiddle.com/#!18/dbd6c/2
I looked at the Pic. So you might have duplicate acts with the same name. And you want to know the number of acts that have the same unique name.
You might want to group the results by name:
GROUP BY NameOfAct
And include the act names and their counts in the query results:
SELECT NameOfAct, COUNT(*) AS ActCount
(Since the act_id column is not included in the groups, you need to omit it in the SELECT. The DISTINCT is also not necessary anymore, since all groups are unique already.)
Finally, you can sort the data (probably descending to get the acts with the largest count on top):
ORDER BY ActCount DESC
Your complete query would become something like this:
SELECT NameOfAct, COUNT(*) AS ActCount
FROM Acts
GROUP BY NameOfAct
ORDER BY ActCount DESC
Edit:
By the way, you use field "act_id" in your SELECT clause. That's somewhat confusing. If you want to know counts, you want to look at either the complete table data or group the table data into smaller groups (with the GROUP BY clause). Then you can use aggregate functions to get more information about those groups (or the whole table), like counts, average values, minima, maxima...
Single record information, like an act's ID in your case, is typically not important if you want to use statistic/aggregate methods on grouped data. Suppose your query returns an act name which is used 10 times. Then you have 10 records in your table, each with a unique act_id, but with the same name.
If you need just one act_id that represents each group / act name (and assuming act_id is an autonumbering field), you might include the latest / largest act_id value in the query using the MAX aggregate function:
SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
(The rest of the query remains the same.)
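To see the grouped and sorted query in action, here is a minimal sketch that runs it against an in-memory SQLite database. The six sample rows are invented for illustration, and SQLite stands in for the report's actual data source.

```python
import sqlite3

# Hypothetical rows: act_id is unique, NameOfAct repeats.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Acts (act_id INTEGER PRIMARY KEY, NameOfAct TEXT)")
conn.executemany(
    "INSERT INTO Acts (act_id, NameOfAct) VALUES (?, ?)",
    [
        (1, "Act_A"),
        (2, "Act_B"),
        (3, "Act_B"),
        (4, "Act_Z"),
        (5, "Act_A"),
        (6, "Act_B"),
    ],
)

# Count rows per name, keep the largest act_id per group, sort by the count.
query = """
SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
FROM Acts
GROUP BY NameOfAct
ORDER BY ActCount DESC
"""
act_counts = conn.execute(query).fetchall()
for name, act_count, latest_id in act_counts:
    print(name, act_count, latest_id)
```

If two names can have the same count, adding NameOfAct as a second ORDER BY key keeps the output order stable.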

Combine multiple date fields into one on query

I have a requirement to create a report that counts a total from 2 date fields into one. A simplified example of the table I'm querying is:
ID, FirstName, LastName, InitialApplicationDate, UpdatedApplicationDate
I need to query the two date fields in a way that creates similar output to the following:
Date | TotalApplications
I would need the date output to include both the InitialApplicationDate and UpdatedApplicationDate fields, and the TotalApplications output to be a count of the total for both types of date fields. Originally I thought a UNION might work; however, that returns 2 separate records for each date. Any ideas how I might accomplish this?
The simplest way, I think, is to unpivot using apply and then aggregate:
select v.thedate, count(*)
from t cross apply
(values (InitialApplicationDate), (UpdatedApplicationDate)) v(thedate)
group by v.thedate;
You might want to add where thedate is not null if either column could be NULL.
Note that the above will count the same application twice, once for each date. That appears to be your intention.
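On an engine without CROSS APPLY, the same unpivot-then-aggregate idea can be sketched with UNION ALL inside a derived table, which also avoids the "two separate records per date" problem because the grouping happens after the two columns are stacked. The example below uses an in-memory SQLite database with made-up rows, and a trimmed-down table named t as an assumption.

```python
import sqlite3

# Hypothetical, simplified table: only the ID and the two date columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ID INTEGER PRIMARY KEY, "
    "InitialApplicationDate TEXT, UpdatedApplicationDate TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-01-02"),
        (2, "2020-01-02", None),
        (3, "2020-01-01", "2020-01-02"),
    ],
)

# Stack both date columns into one column, drop NULLs, then count per date.
query = """
SELECT thedate, COUNT(*) AS TotalApplications
FROM (
    SELECT InitialApplicationDate AS thedate FROM t
    UNION ALL
    SELECT UpdatedApplicationDate FROM t
) AS v
WHERE thedate IS NOT NULL
GROUP BY thedate
ORDER BY thedate
"""
totals = conn.execute(query).fetchall()
for thedate, total in totals:
    print(thedate, total)
```

UNION ALL matters here: plain UNION would remove duplicate dates before they are counted.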

SELECT USING COUNT in mysql

I have a very large database with about 120 million records in one table. I have to clean up the data in this table before I divide it into several tables (possibly normalizing it). The columns of this table are as follows: "id (Primary Key), userId, Url, Tag". This is basically a subset of the dataset from the Delicious website. As I said, each row has an id, a userId, a url and only one tag. So a bookmark on the Delicious website, which is composed of several tags for a single url, corresponds to several rows in my database. For example:
"id"; "user" ;"url" ;"tag"
"38";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"ajax"
"39";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"api"
"40";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"javascript"
"41";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"library"
"42";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"rails"
I need a query to count the number of times that a tag is used for a url.
Thank you for your help.
This query should work for you:
SELECT tag, url, count(tag) FROM table GROUP BY tag, url
Haven't tested it for you though.
Is this what you are looking for?
SELECT COUNT(tag) FROM TABLENAME
WHERE tag='sometag'
I think it's actually more like SELECT tag, COUNT(tag) FROM TABLENAME WHERE URL='someurl' GROUP BY tag
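As a quick check of the GROUP BY approach, the sketch below counts how often each tag is used per url in an in-memory SQLite database. The table name bookmarks, the shortened user ids, and the extra row from a second user are assumptions made for the example.

```python
import sqlite3

# Hypothetical table name and trimmed sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, userId TEXT, url TEXT, tag TEXT)"
)
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?, ?, ?)",
    [
        (38, "user1", "http://forkjavascript.org/", "ajax"),
        (39, "user1", "http://forkjavascript.org/", "api"),
        (43, "user2", "http://forkjavascript.org/", "ajax"),
    ],
)

# One output row per (url, tag) pair with the number of times it was used.
query = """
SELECT url, tag, COUNT(*) AS uses
FROM bookmarks
GROUP BY url, tag
ORDER BY url, tag
"""
tag_counts = conn.execute(query).fetchall()
for url, tag, uses in tag_counts:
    print(url, tag, uses)
```

On a table with 120 million rows, an index covering (url, tag) would likely be needed for this to run in reasonable time, though that depends on the engine.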

distinct sql query

I have a simple table with just name and email called name_email.
I am trying to fetch data out of it so that:
If two rows have the same name, but one has an email ending with '@yahoo.com' and the other has a different email, then the one with the '@yahoo.com' email should be discarded.
What would be the best way to get this data out?
Okay, I'm not going to get involved in yet another fight with those who say I shouldn't advocate database schema changes (yes, you know who you are :-), but here's how I'd do it.
1/ If you absolutely cannot change the schema, I would solve it with code (either real honest-to-goodness procedural code outside the database or as a stored procedure in whatever language your DBMS permits).
This would check the database for a non-yahoo name and return it, if there. If not there, it would attempt to return the yahoo name. If neither are there, it would return an empty data set.
2/ If you can change the schema and you want an SQL query to do the work, here's how I'd do it. Create a separate column in your table called CLASS which is expected to be set to 0 for non-yahoo addresses and 1 for yahoo addresses.
Create insert/update triggers to examine each addition or change of a row, setting the CLASS based on the email address (what it ends in). This guarantees that CLASS will always be set correctly.
When you query your table, order it by name and class, and only select the first row. This will give you the email address in the following preference: non-yahoo, yahoo, empty dataset.
Something like:
select name, email
from tbl
where name = '[name]'
order by name, class
fetch first row only;
If your DBMS doesn't have an equivalent to the DB2 "fetch first row only" clause, you'll probably still need to write code to only process one record.
If you want to process all names but only the specific desired email for that name, a program such as this will suffice (my views on trying to use a relational algebra such as SQL in a procedural way are pretty brutal, so I won't inflict them on you here):
# Get entire table contents sorted in name/class order.
resultSet = execQuery "select name, email from tbl order by name, class"
# Ensure different on first row
lastName = resultSet.value["name"] + "X"
# Process every single row returned.
while not resultSet.endOfFile:
    # Only process the first in each name group (lower classes are ignored).
    if resultSet.value["name"] != lastName:
        processRow resultSet.value["name"] resultSet.value["email"]
        # Store the last name so we can detect next name group.
        lastName = resultSet.value["name"]
    # Advance to the next row so the loop terminates.
    resultSet.next
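The same "first row per name wins" loop can be sketched in Python. Here the class value is computed on the fly from the email suffix instead of being stored in a column; the function name pick_preferred and the sample rows are hypothetical.

```python
def pick_preferred(rows):
    """Return {name: email}, preferring non-Yahoo addresses for each name.

    rows is an iterable of (name, email) pairs.
    """
    # Sort by name, then by "is a Yahoo address" (False sorts before True),
    # which mirrors ORDER BY name, class.
    ordered = sorted(
        rows, key=lambda r: (r[0], r[1].lower().endswith("@yahoo.com"))
    )
    chosen = {}
    for name, email in ordered:
        # Only the first row of each name group is kept.
        chosen.setdefault(name, email)
    return chosen


sample_rows = [
    ("Joe", "joe@yahoo.com"),
    ("Joe", "joe@work.example"),
    ("Ann", "ann@yahoo.com"),
]
preferred = pick_preferred(sample_rows)
print(preferred)
```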
select ne.*
from name_email ne
where ne.email not like '%@yahoo.com' escape '\' or
not exists(
    select 1 from name_email
    where name = ne.name and
    email not like '%@yahoo.com' escape '\'
)
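A minimal sketch of the NOT EXISTS query above, run against an in-memory SQLite database with invented rows. The escape clause is left out because the pattern contains nothing that needs escaping, and the inner table is given the alias other (an assumption) to make the correlation explicit.

```python
import sqlite3

# Hypothetical sample rows for the name_email table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_email (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO name_email VALUES (?, ?)",
    [
        ("Joe", "joe@yahoo.com"),
        ("Joe", "joe@work.example"),
        ("Ann", "ann@yahoo.com"),
        ("Bob", "bob@gmail.com"),
    ],
)

# Keep every non-Yahoo row, plus Yahoo rows whose name has no non-Yahoo email.
query = """
SELECT ne.name, ne.email
FROM name_email ne
WHERE ne.email NOT LIKE '%@yahoo.com'
   OR NOT EXISTS (
        SELECT 1 FROM name_email other
        WHERE other.name = ne.name
          AND other.email NOT LIKE '%@yahoo.com'
   )
ORDER BY ne.name
"""
kept = conn.execute(query).fetchall()
for name, email in kept:
    print(name, email)
```

Joe's Yahoo address is dropped because he has another address, while Ann keeps hers because it is the only one.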
You could use something like the following to exclude the unwanted email addresses:
SELECT name, email
FROM name_email
WHERE email NOT LIKE '%@yahoo.com' -- % is a wildcard, so joe@yahoo.com and guy@yahoo.com both match this pattern.
AND name = 'Joe Guy';
Or do it like this to include only the valid email address or domain:
SELECT name, email
FROM name_email
WHERE email LIKE '%@gmail.com'
AND name = 'Joe Guy';
This works well if you know ahead of time what specific names you are querying for and what email addresses or domains you want to exclude or include.
Or, if you don't care which email address you return but only want one per name, you could use something like this:
SELECT name, MIN(email) AS email
FROM name_email
GROUP BY name;
You could do
SELECT TOP 1 email
FROM name_email
WHERE name = 'Joe Guy'
ORDER BY case when email like '%yahoo.com' then 1 else 0 end
So sort the @yahoo.com addresses last and anything else first, and take the first one.
EDIT: sorry, misread the question - you want a list of each name, with only one email, and a preference for non-yahoo emails. Probably can use the above along with a group by, I'll have to rethink it.
This grabs all the rows from the database without needing to know the names in advance, and skips a row if the email contains, in this case, @yahoo.com:
SELECT DISTINCT name, email FROM name_email
WHERE email NOT LIKE '%@yahoo.com'
GROUP BY name;
That will grab all the rows, but only one record where a name matches another row. If two rows have matching names, it discards the one with @yahoo.com in the email.
Not very pretty, but I believe it should work
select
ne.name
,ne.email
from
name_email ne
inner join (
select
name
,count(*) as emails_per_name
from
name_email
group by name
) nec
on ne.name = nec.name
where
nec.emails_per_name = 1
or (nec.emails_per_name > 1 and ne.email not like '%@yahoo.com')
This assumes that the duplicate emails are in the yahoo.com domain, as specified in your question; those are excluded if there is more than one email per name.
If you are working with SQL Server 2005 or Oracle, you can easily solve your problem with a ranking (analytical) function.
select a.name, a.email
from (select name, email,
        row_number() over (partition by name
            order by case
                when email like '%@yahoo.com' then 1
                when email like '%@gmail.com' then 1
                -- when ... (other 'generic' email) then 1
                else 0
            end) as rn
    from name_email) a
where a.rn = 1
By assigning different values to the various generic email names you can even have 'preferences'. As it is written here, if you have both a yahoo and a gmail address, you can't predict which one will be picked up.
You could use a UNION for this. Select everything without yahoo.com, and then select only the records that have yahoo.com and whose name is not in the first list.
SELECT DISTINCT name, email FROM name_email
WHERE email NOT LIKE '%yahoo.com'
UNION
SELECT DISTINCT name, email FROM name_email
WHERE email LIKE '%yahoo.com'
AND name NOT IN (SELECT name FROM name_email
    WHERE email NOT LIKE '%yahoo.com')