SQL Count occurrences of non-unique column - sql

Suppose I have a SQL table looking something like this:
--------------------
| id| name|
--------------------
| 1| Alice|
| 2| Bob|
| 3| Alice|
| 4| Alice|
| 5| Jeff|
| ...| ...|
--------------------
Is it possible to formulate a query which returns a list of names and the number of times they occur? I've made a solution to this by querying all the rows, removing duplicates, counting, and then ordering, which works but looks messy. Can this be neatened up in a SQL query?

This is standard SQL and should deliver your expected result:
select name, count(*)
from tblName
group by name
order by name
If you want to order by the count in descending order, you can use:
select name, count(*)
from tblName
group by name
order by 2 DESC
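To try the grouped count end to end, here is a minimal sketch using Python's built-in sqlite3 module with the sample rows from the question. The in-memory database and the occurrences alias are assumptions made for illustration; the query itself is the same GROUP BY shown above, with a secondary sort on name so ties come out in a stable order.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblName (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tblName (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Alice"), (4, "Alice"), (5, "Jeff")],
)

# Count rows per name, most frequent first; ties are ordered by name.
rows = conn.execute(
    """
    SELECT name, COUNT(*) AS occurrences
    FROM tblName
    GROUP BY name
    ORDER BY occurrences DESC, name
    """
).fetchall()

for name, occurrences in rows:
    print(name, occurrences)
```

Naming the count column and sorting by that alias is a little easier to read than sorting by column position.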

Related

SQL delete duplicate rows based on multiple fields

I have the following table in sql:
id | trip_id | stop_id | departure_time
----------------------------------------
1 | 1| 1| 06:25:00
2 | 1| 2| 06:35:00
3 | 1| 3| 06:45:00
4 | 1| 2| 06:55:00
What I need to do is identify where a trip_id has multiple instances of a certain stop_id (in this case stop_id 2).
I then need to delete any duplicates, leaving only the one with the latest departure time.
So given the above table I'd delete the row with id 2 and be left with:
id | trip_id | stop_id | departure_time
----------------------------------------
1 | 1| 1| 06:25:00
3 | 1| 3| 06:45:00
4 | 1| 2| 06:55:00
I have managed to do this with a series of SQL queries, but I hit the N+1 issue and it takes ages.
Can anyone recommend a way I may be able to do this in one query? Or at the very least identify all the ids of rows that need deleting in one query?
I'm doing this in Ruby on Rails, so if you prefer an Active Record solution I wouldn't hate you for it :)
Thanks in advance.
You may try the following logic:
DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.trip_id = t1.trip_id AND
t2.stop_id = t1.stop_id AND
t2.departure_time > t1.departure_time);
In plain English, this says to scan your entire table, and delete any record for which we can find another record with an identical trip_id and stop_id, where the departure time is also greater than that of the record being considered for deletion. If we find such a match, then it is a duplicate according to your definition.
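As a rough check of that logic, the sketch below runs it against the four sample rows using Python's sqlite3 module. The table name stop_times is an assumption, and the DELETE refers to the table by name rather than through an alias, since alias support on the DELETE target varies between databases. It first lists the ids that would be removed (the "at the very least" part of the question) and then deletes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times ("
    "id INTEGER PRIMARY KEY, trip_id INTEGER, stop_id INTEGER, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "06:25:00"),
        (2, 1, 2, "06:35:00"),
        (3, 1, 3, "06:45:00"),
        (4, 1, 2, "06:55:00"),
    ],
)

# Step 1: only identify the rows that have a later twin on the same trip and stop.
doomed_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT t1.id
        FROM stop_times AS t1
        WHERE EXISTS (SELECT 1 FROM stop_times AS t2
                      WHERE t2.trip_id = t1.trip_id
                        AND t2.stop_id = t1.stop_id
                        AND t2.departure_time > t1.departure_time)
        ORDER BY t1.id
        """
    )
]

# Step 2: delete them in a single statement.
conn.execute(
    """
    DELETE FROM stop_times
    WHERE EXISTS (SELECT 1 FROM stop_times AS t2
                  WHERE t2.trip_id = stop_times.trip_id
                    AND t2.stop_id = stop_times.stop_id
                    AND t2.departure_time > stop_times.departure_time)
    """
)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM stop_times ORDER BY id")]
print(doomed_ids, remaining_ids)
```

The times are stored as zero-padded HH:MM:SS text here, so plain string comparison orders them correctly; a real schema with a proper time type would not need that assumption.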
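You can try it the following way, numbering the rows within each trip_id and stop_id from the latest departure time down and deleting everything after the first:
DELETE FROM tablename
WHERE id in
(
select id from
(select *, row_number() over(partition by trip_id, stop_id order by departure_time desc) as rn from tablename) aa
where rn > 1
)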
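To see why the partition needs both trip_id and stop_id, here is a small sketch with Python's sqlite3 module (window functions need SQLite 3.25 or newer). The fifth row, a second trip visiting stop 2, is made up for illustration and is not in the question's data; ids_to_delete is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER PRIMARY KEY, trip_id INTEGER, stop_id INTEGER, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "06:25:00"),
        (2, 1, 2, "06:35:00"),
        (3, 1, 3, "06:45:00"),
        (4, 1, 2, "06:55:00"),
        (5, 2, 2, "07:10:00"),  # hypothetical extra row: another trip at stop 2
    ],
)


def ids_to_delete(partition_columns):
    """Return ids ranked after the latest departure within each partition."""
    # partition_columns is a fixed literal in this sketch, never user input.
    query = f"""
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (
                PARTITION BY {partition_columns}
                ORDER BY departure_time DESC
            ) AS rn
            FROM tablename
        ) AS ranked
        WHERE rn > 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(query)]


per_trip_and_stop = ids_to_delete("trip_id, stop_id")
per_stop_only = ids_to_delete("stop_id")
print(per_trip_and_stop, per_stop_only)
```

Partitioning by stop_id alone also flags id 4, the row trip 1 should keep, because trip 2 departs from the same stop later.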
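In PostgreSQL you can also try the ctid system column, as below. Note that this keeps the row with the highest ctid (its physical location) in each trip_id and stop_id group, which is not necessarily the one with the latest departure time:
DELETE FROM tablename a
WHERE a.ctid <> (SELECT max(b.ctid)
FROM tablename b
WHERE a.trip_id = b.trip_id AND
a.stop_id = b.stop_id)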

SQL Server select query with ids and count of ids, grouped by date cast from datetime

I am struggling to find the right way to write a select query that produces a count of ids per unique date. I have a Logs table:
id| DateTime
1|23-03-2019 18:27:45|
1|23-03-2019 18:27:45|
2|23-03-2019 18:27:50|
2|23-03-2019 18:27:51|
2|23-03-2019 18:28:01|
3|23-03-2019 18:33:15|
1|24-03-2019 18:13:18|
2|23-03-2019 18:27:12|
2|23-03-2019 15:27:46|
3|23-03-2019 18:21:58|
3|23-03-2019 18:21:58|
4|24-03-2019 10:11:14|
What I have tried:
select id, count(cast(DateTime as DATE)) as Counts from Logs group by id
It produces a proper count per id, like:
id|count
1 | 2|
2 | 3|
3 | 1|
1 | 1|
2 | 2|
3 | 2|
4 | 1|
What I want is to add the datetime column cast as a date:
id|count|Date
1 | 2| 23-03-2019
2 | 3| 23-03-2019
3 | 1| 23-03-2019
1 | 1| 24-03-2019
2 | 2| 24-03-2019
3 | 2| 24-03-2019
4 | 1| 24-03-2019
However, I get an error saying
Column 'Logs.DateTime' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
when I try
select id, count(cast(DateTime as DATE)) as Counts, cast(DateTime as DATE) from Logs group by id
You also need to add cast(DateTime as DATE) to the GROUP BY:
select id,cast(DateTime as DATE) as dateval, count(cast(DateTime as DATE)) as Counts
from Logs
group by id,cast(DateTime as DATE)
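The same idea can be tried locally with Python's sqlite3 module. This is only a sketch under a few assumptions: SQLite has no CAST(... AS DATE) in the SQL Server sense, so date() stands in for it; the timestamps are rewritten in ISO format so that date() can parse them; the column is renamed logged_at; and only a handful of the question's rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logs (id INTEGER, logged_at TEXT)")
conn.executemany(
    "INSERT INTO Logs VALUES (?, ?)",
    [
        (1, "2019-03-23 18:27:45"),
        (1, "2019-03-23 18:27:45"),
        (2, "2019-03-23 18:27:50"),
        (2, "2019-03-23 18:27:51"),
        (3, "2019-03-23 18:33:15"),
        (1, "2019-03-24 18:13:18"),
        (4, "2019-03-24 10:11:14"),
    ],
)

# Every non-aggregated expression in the select list (id and the date)
# also appears in GROUP BY, which is what the error message asks for.
rows = conn.execute(
    """
    SELECT id, date(logged_at) AS day, COUNT(*) AS counts
    FROM Logs
    GROUP BY id, date(logged_at)
    ORDER BY day, id
    """
).fetchall()

for row in rows:
    print(row)
```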

SQL: Merge localized version of a table to the main one

Imagine I have a main table like:
Table guys
|id| name|profession|
|--|------|----------|
| 1| John| developer|
| 2| Mike| boss|
| 3| Roger| fireman|
| 4| Bob| policeman|
I also have a localized version which is not complete (the boss is missing):
Table guys_bg
|id| name | profession|
|--|------|-----------|
| 1| Джон|разработчик|
| 3|Роджър| пожарникар|
| 4| Боб| полицай|
I want to prioritize guys_bg results while still showing all the guys (The boss is still a guy, right?).
This is the desired result:
|id| name | profession|
|--|------|-----------|
| 1| Джон|разработчик|
| 2| Mike| boss|
| 3|Роджър| пожарникар|
| 4| Боб| полицай|
Take into consideration that both tables may have a lot of (100+) columns so joining the tables and using CASE for every column will be very tedious.
What are my options?
Here is one way using union all:
select gb.*
from guys_bg gb
union all
select g.*
from guys g
where not exists (select 1 from guys_bg gb where gb.id = g.id);
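Here is a quick sketch of that UNION ALL approach run through Python's sqlite3 module with the two tables from the question. The ORDER BY id at the end is an addition so the merged rows come back in a predictable order; the approach assumes both tables have the same columns in the same order, since it relies on select *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guys (id INTEGER PRIMARY KEY, name TEXT, profession TEXT)")
conn.execute("CREATE TABLE guys_bg (id INTEGER PRIMARY KEY, name TEXT, profession TEXT)")
conn.executemany(
    "INSERT INTO guys VALUES (?, ?, ?)",
    [(1, "John", "developer"), (2, "Mike", "boss"),
     (3, "Roger", "fireman"), (4, "Bob", "policeman")],
)
conn.executemany(
    "INSERT INTO guys_bg VALUES (?, ?, ?)",
    [(1, "Джон", "разработчик"), (3, "Роджър", "пожарникар"), (4, "Боб", "полицай")],
)

# Localized rows first, then only those main rows that have no localized twin.
merged = conn.execute(
    """
    SELECT gb.*
    FROM guys_bg gb
    UNION ALL
    SELECT g.*
    FROM guys g
    WHERE NOT EXISTS (SELECT 1 FROM guys_bg gb WHERE gb.id = g.id)
    ORDER BY id
    """
).fetchall()

for row in merged:
    print(row)
```

Because no column is listed by name, the query stays the same size however many columns the tables have.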
You can also do it with a FULL JOIN (ISNULL is SQL Server syntax; COALESCE is the portable equivalent):
SELECT
ISNULL(b.id,g.id) id
, ISNULL(b.name, g.name) name
, ISNULL(b.profession, g.profession) profession
FROM
guys g
FULL JOIN guys_bg b ON g.id = b.id

SQL query for finding the most frequent value of a grouped by value

I'm using the SQLite browser and trying to find a query that returns the most frequent value within each group defined by another column:
Table is called main
| |Place |Value|
| 1| London| 101|
| 2| London| 20|
| 3| London| 101|
| 4| London| 20|
| 5| London| 20|
| 6| London| 20|
| 7| London| 20|
| 8| London| 20|
| 9| France| 30|
| 10| France| 30|
| 11| France| 30|
| 12| France| 30|
The result I'm looking for is the most frequent value, grouped by place:
| |Place |Most Frequent Value|
| 1| London| 20|
| 2| France| 30|
Or even better
| |Place |Most Frequent Value|Largest Percentage|2nd Largest Percentage|
| 1| London| 20| 0.75| 0.25|
| 2| France| 30| 1| 0.75|
You can group by place, then value, and order by frequency, e.g.:
select place,value,count(value) as freq from cars group by place,value order by place, freq;
This will not give exactly the answer you want, but something close to it:
France | 30 | 4
London | 101 | 2
London | 20 | 6
Now select place and value from this intermediate table and group by place, so that only one row per place is displayed.
select place,value from
(select place,value,count(value) as freq from cars group by place,value order by place, freq)
group by place;
This will produce the result like following:
France | 30
London | 20
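This works in SQLite, but only because of how it picks values for columns that are neither grouped nor aggregated, so the result is not guaranteed. Other databases may reject the query, or return the place and value with the least frequency. In the latter case, you can put order by place, freq desc instead to solve your problem.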
The first part would be something like this.
http://sqlfiddle.com/#!7/ac182/8
with tbl1 as
(select a.place,a.value,count(a.value) as val_count
from table1 a
group by a.place,a.value
)
select t1.place,
t1.value as most_frequent_value
from tbl1 t1
inner join
(select place,max(val_count) as val_count from tbl1
group by place) t2
on t1.place=t2.place
and t1.val_count=t2.val_count
Here we derive tbl1, which gives us the count of each place and value combination. We then join it with another derived table, t2, which finds the max count per place, to get the required result.
I am not sure how you want the percentages in the second output, but if you understood this query, you can use some logic on top of it to derive the required output. Play around with the sqlfiddle. All the best.
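One possible reading of the percentage column is the share of rows in each place that carry the most frequent value. The sketch below, run through Python's sqlite3 module, extends the CTE approach in that direction. The totals CTE and the share alias are assumptions, since the question does not define the percentages precisely, and table1 follows the naming in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (place TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("London", 101)] * 2 + [("London", 20)] * 6 + [("France", 30)] * 4,
)

rows = conn.execute(
    """
    WITH counts AS (
        -- how often each value occurs within each place
        SELECT place, value, COUNT(*) AS val_count
        FROM table1
        GROUP BY place, value
    ),
    totals AS (
        -- row total and highest count per place
        SELECT place, SUM(val_count) AS total, MAX(val_count) AS top_count
        FROM counts
        GROUP BY place
    )
    SELECT c.place,
           c.value AS most_frequent_value,
           1.0 * c.val_count / t.total AS share
    FROM counts c
    JOIN totals t ON t.place = c.place AND t.top_count = c.val_count
    ORDER BY c.place
    """
).fetchall()

for row in rows:
    print(row)
```

If two values tie for the top count in a place, this returns both of them, just like the join on max(val_count) above.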
RANK
SQLite now supports RANK, so we can use the exact same syntax that works on PostgreSQL, similar to https://stackoverflow.com/a/12448971/895245
SELECT "city", "value", "cnt"
FROM (
SELECT
"city",
"value",
COUNT(*) AS "cnt",
RANK() OVER (
PARTITION BY "city"
ORDER BY COUNT(*) DESC
) AS "rnk"
FROM "Sales"
GROUP BY "city", "value"
) AS "sub"
WHERE "rnk" = 1
ORDER BY
"city" ASC,
"value" ASC
This would return all in case of tie. To return just one you could use ROW_NUMBER instead of RANK.
Tested on SQLite 3.34.0 and PostgreSQL 14.3. GitHub upstream.
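To make the tie behaviour concrete, here is a small sketch with Python's sqlite3 module (window functions need SQLite 3.25 or newer). The sample rows are made up so that London has a tie between 20 and 101, and the value ASC tie-breaker in the ROW_NUMBER variant is an assumption about which row you would want to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (city TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("London", 20), ("London", 20), ("London", 101), ("London", 101), ("France", 30)],
)

# Same query shape as above; only the ranking expression is swapped in.
QUERY = """
SELECT city, value, cnt
FROM (
    SELECT city, value, COUNT(*) AS cnt,
           {ranking} AS rnk
    FROM Sales
    GROUP BY city, value
) AS sub
WHERE rnk = 1
ORDER BY city ASC, value ASC
"""

with_ties = conn.execute(
    QUERY.format(ranking="RANK() OVER (PARTITION BY city ORDER BY COUNT(*) DESC)")
).fetchall()

one_per_city = conn.execute(
    QUERY.format(
        ranking="ROW_NUMBER() OVER (PARTITION BY city ORDER BY COUNT(*) DESC, value ASC)"
    )
).fetchall()

print(with_ties)
print(one_per_city)
```

Without an explicit tie-breaker in the ORDER BY, which of the tied rows ROW_NUMBER picks is up to the database.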

Retrieving rows that share multiple ID's in SQL

I am stuck on how to narrow down a selection of rows that are related by multiple ID's. Here is my problem with the data as follows:
|Widget | |Widget Category | |Part Category | |Part |
+---------+ +--------------------+ +--------------+ +-------------+
|Id|Name | |WidId|CatId|CatName | |PartId| CatId | |Id|Name |
+---------+ +-----+-----+--------+ +------+-------+ +--+----------+
| 1|item01| | 1| 1|Windows | | 1| 1| | 1|Glass |
| 2|item02| | 2| 1|Windows | | 1| 2| | 2|Door Frame|
| 3|item03| | 3| 1|Windows | | 2| 2| | 3|Wheel |
| 4|item04| | 1| 2|Door | | 4| 2| | 4|Handle |
| 5|item05| | 5| 2|Door |
| 6|item06| | 6| 3|Trunk |
One or more widgets can be in a Widget Category. Many widget categories can have many part Categories. Many Parts can be part of many part categories. I need to know what Parts are linked to what Widgets. So we know that Item01 has parts "Glass" and Item05 has Parts "Glass, Door Frame, and Handle".
Here is my SQL I have so far but I need it to be dynamic so it can run once a week on a stored procedure.
---- This gives me the Correct number of Widgets to Parts based on set of 2 category ID's as a quick and static hack
SELECT W.Id
FROM Widget W
INNER JOIN dbo.[WidgetCategory] WC1 ON WC1.WidId = W.Id
INNER JOIN dbo.[WidgetCategory] WC2 ON WC2.WidId = W.Id
WHERE WC1.CatId = 1 AND WC2.CatId = 2
GROUP BY W.Id
The reason for the above query is to get a table structure that is grouped by PartId's to WidgetId's as an intersection of the two related categories and all the widgets that are related to parts. The below table is what I am trying to get so that I can aggregate how many widgets are in a part (COUNT(WidId) GROUP BY PartId):
|WidId|PartId|WidgetName|
+-----+------+----------+
| 1| 1| Item01|
| 2| 1| Item02|
| 3| 1| Item03|
| 1| 2| Item01|
| 5| 2| Item05|
Updated question: How can I get this response from the tables above with only returning the intersection of the two categories?
|WidId|PartId|WidgetName|
+-----+------+----------+
| 1| 1| Item01|
| 1| 2| Item01|
Any help would be greatly appreciated! Sorry for the sloppiness, had to post quickly before I left for weekend.
EDIT: Sorry about the ProductId, it was left over from some SQL that I was using. It should be Widget Id. Added more clarity to the problem and added an additional problem I was having.
I think you need a query like this.
SELECT DISTINCT w.Id AS WidId, p.Id AS PartId, w.Name
FROM Widget w
JOIN WidgetCategory wc ON wc.WidId=w.Id
JOIN PartCategory pc ON pc.CatId=wc.CatId
JOIN Part p ON p.Id=pc.PartId
I don't see why you would need to join twice on the WidgetCategory table. What you need is to reach the Part table by joining the PartCategory table.
And why are you grouping? If you want all the parts, then you can't group, unless you use some specific SQL feature to concatenate all the parts in a single row. This may or may not be possible, depending on which database engine you are using.
I added the DISTINCT just in case there is more than one way to get from Widget X to Part Y; that is enough to remove duplicates. There is no need for a GROUP BY unless you need to COUNT or do something else with the aggregation.
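As a sketch of how this plays out on the sample data, the snippet below loads the four tables into an in-memory SQLite database via Python's sqlite3 module, runs the DISTINCT join, and then counts widgets per part. The column names follow the question's tables (Widget.Id, WidgetCategory.WidId, PartCategory.PartId, Part.Id); the second query is one assumed way to do the COUNT the question mentions, not necessarily the exact report it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Widget (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE WidgetCategory (WidId INTEGER, CatId INTEGER, CatName TEXT);
    CREATE TABLE PartCategory (PartId INTEGER, CatId INTEGER);
    CREATE TABLE Part (Id INTEGER PRIMARY KEY, Name TEXT);

    INSERT INTO Widget VALUES (1,'item01'),(2,'item02'),(3,'item03'),
                              (4,'item04'),(5,'item05'),(6,'item06');
    INSERT INTO WidgetCategory VALUES (1,1,'Windows'),(2,1,'Windows'),(3,1,'Windows'),
                                      (1,2,'Door'),(5,2,'Door'),(6,3,'Trunk');
    INSERT INTO PartCategory VALUES (1,1),(1,2),(2,2),(4,2);
    INSERT INTO Part VALUES (1,'Glass'),(2,'Door Frame'),(3,'Wheel'),(4,'Handle');
    """
)

# Widget-to-part pairs; DISTINCT collapses pairs reachable through two categories.
pairs = conn.execute(
    """
    SELECT DISTINCT w.Id AS WidId, p.Id AS PartId, w.Name
    FROM Widget w
    JOIN WidgetCategory wc ON wc.WidId = w.Id
    JOIN PartCategory pc ON pc.CatId = wc.CatId
    JOIN Part p ON p.Id = pc.PartId
    ORDER BY w.Id, p.Id
    """
).fetchall()

# Number of distinct widgets linked to each part.
widgets_per_part = conn.execute(
    """
    SELECT p.Id, p.Name, COUNT(DISTINCT w.Id) AS widget_count
    FROM Widget w
    JOIN WidgetCategory wc ON wc.WidId = w.Id
    JOIN PartCategory pc ON pc.CatId = wc.CatId
    JOIN Part p ON p.Id = pc.PartId
    GROUP BY p.Id, p.Name
    ORDER BY p.Id
    """
).fetchall()

print(pairs)
print(widgets_per_part)
```

COUNT(DISTINCT w.Id) matters here: item01 reaches Glass through both the Windows and Door categories and would otherwise be counted twice.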