I'm supposed to find the decade D with the largest number of films and the total number of films in D. A decade is a sequence of 10 consecutive years. For example, say your database has movie information starting from 1965. Then the first decade is 1965, 1966, ..., 1974; the second one is 1967, 1968, ..., 1976; and so on.
I'm supposed to implement this in a Jupyter notebook where I imported sqlite3.
I wrote the following code for it.
SELECT COUNT(*) AS total_films, CONCAT(decade, '-', decade + 9)
FROM (SELECT FLOOR(YEAR('year') / 10) * 10 AS decade FROM movie) t
GROUP BY decade
ORDER BY total_films DESC;
However, the notebook threw errors like "no such function: floor", "no such function: Year" and "no such function: concat".
Therefore, after going through the SQLite documentation, I changed the code to
SELECT COUNT(*) AS total_films, decade || '-' || decade + 9
FROM (SELECT CAST(strftime('%Y', year) / 10 AS INT) * 10 AS decade FROM movie) t
GROUP BY decade
ORDER BY total_films DESC;
However, I got incorrect output:

   count(*)  decade||'-'||decade+9
0  117       NaN
1  3358      -461.0
Would appreciate insights on why this is happening.
Updating the question after going through the comments by c.Perkins.
1) I began by checking the type of the year column using the query PRAGMA table_info(movie) and got the following result:
cid name type notnull dflt_value pk
0 0 index INTEGER 0 None 0
1 1 MID TEXT 0 None 0
2 2 title TEXT 0 None 0
3 3 year TEXT 0 None 0
4 4 rating REAL 0 None 0
5 5 num_votes INTEGER 0 None 0
Since the year column is of type TEXT, I converted it to INT using the CAST function and checked for nulls or NaN: SELECT CAST(year AS int) AS yr FROM MOVIE WHERE yr IS NULL
I didn't get any results, so it appears there are no nulls. However, on running the query SELECT CAST(year AS int) AS yr FROM MOVIE ORDER BY yr ASC, I see a lot of zeros in the year column:
yr
0 0
1 0
2 0
3 0
4 0
-
-
-
-
3445 2018
3446 2018
3447 2018
3448 2018
3449 2018
3450 2018
From the above we see that the year is stored as a bare year value, not as a timestamp, which is why using strftime('%Y', year) did not yield a result, as mentioned in the comment.
Therefore, keeping all the above in mind, I changed the inner query to
SELECT (CAST( (year/10) as int) *10) as decade FROM MOVIE WHERE decade!=0 order by decade asc
Output for the above query :
decade
0 1930
1 1930
2 1930
3 1930
4 1930
5 1930
6 1940
7 1940
8 1940
-
-
-
3353 2010
3354 2010
3355 2010
3356 2010
3357 2010
Finally, placing this inner query in the first query I wrote above:
Select count(*) as total_films,decade||'-'||decade+9 as period
FROM (SELECT (CAST( (year/10) as int) *10) as decade FROM MOVIE WHERE decade!=0 order by decade asc)
GROUP BY decade
Output :
total_films period
0 6 1939
1 12 1949
2 71 1959
3 145 1969
4 254 1979
5 342 1989
6 551 1999
7 959 2009
8 1018 2019
As far as I can see, the only remaining issue is with the period column: instead of showing 1930-1939 it only shows 1939, and so on. If using || is not right, is there any other function that could be used? Because CONCAT is not working.
Thanks in advance.
Pending updates to the question as requested in comments, here are a few immediate points that might help solve the problem without having all details:
Does the movie.year column contain null values? Likewise non-numeric or non-date values? The NaN (Not a Number) result likely indicates null/invalid data in the source. (Technically there is no such NaN value in SQLite, so I'm assuming the question data was copied from some other data grid or processed output.)
What type of data is in the column movie.year? Does it contain full ISO-8601 date strings or a Julian-date numeric value? Or does it only contain the year (as the column name implies)? If it only contains the year (as a string or integer), then the function call like strftime('%Y', year) will NOT return what you expect and is unnecessary. Just refer to the column directly.
I suspect this is where the -461.0 is coming from.
The operator / is an "integer division" operator if both operands are integers. A valid isolated year value will be an integer and the literal 10 is of course an integer, so integer division will automatically drop any decimal part and returns only the integer part of the division without having to explicitly cast to integer.
According to the SQLite docs, the concatenation operator || has the highest precedence. That means in the expression decade||'-'||decade+9, the concatenation is applied first, so one possible intermediate result is '1930-1930'+9. (Technically I would consider this result undefined, since the string value does not contain a basic data type. In practice on my system, the string is apparently interpreted as 1930 and the overall result is the integer value 1939. Either way, you will get unexpected bogus results rather than the desired string.)
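Putting the pieces together, a minimal sketch using Python's sqlite3 with made-up rows: CAST replaces strftime (the column already holds a bare year), and parentheses around decade + 9 keep the high-precedence || from concatenating first.

```python
import sqlite3

# Sketch only: table name and sample data are invented to mirror the
# question's schema (year stored as TEXT).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE movie (title TEXT, year TEXT)")
con.executemany("INSERT INTO movie VALUES (?, ?)",
                [("A", "1994"), ("B", "1997"), ("C", "2003")])

# CAST handles the TEXT column; the parentheses make the addition
# happen before the concatenation.
rows = con.execute("""
    SELECT decade || '-' || (decade + 9) AS period,
           COUNT(*) AS total_films
    FROM (SELECT (CAST(year AS INTEGER) / 10) * 10 AS decade FROM movie)
    GROUP BY decade
    ORDER BY total_films DESC
""").fetchall()
print(rows)
```

With the two 1990s films and one 2000s film above, the top row is ('1990-1999', 2).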
I have a table I'm trying to create that has a column that needs to be calculated based on the row above it multiplied by the previous column. The first row is defaulted to 100,000 and the rest of the rows would be calculated off of that. Here's an example:
Age  Population  Deaths  DeathRate  DeathPro  DeathProb  SurvivalProb  PersonsAlive
0    1742        0       0          0.1       0          1             100,000
51   2048        1       0.00048    0.5       0.00048    0.99951       99951.18379
52   1921        0       0          0.5       0          1             99951.18379
61   1965        1       0.00051    0.5       0.00051    0.99949       99900.33
I skipped some ages so I didn't have to type it all in, but the ages go from 0 to 85. This was originally done in Excel, and the formula for PersonsAlive (which is what I'm trying to recreate) was G3*H2, i.e. previous value of PersonsAlive * SurvivalProb.
I was thinking I could accomplish this with the LAG function, but with the example above I get null values for everything after age 1, because there is no value in the previous row. What I want is for PersonsAlive to return 100,000 until I get a death (in the example, at age 51), then do the calculation and return the value (99951) until another death happens (age 61). Here's my code, which includes two extra columns: ZipCode (the reason we want to do this in SQL is so we can calculate all zips at once) and PersonsAliveTemp, which I used to set age 0 to 100,000:
SELECT
ZipCode
,Age
,[Population]
,Deaths
,DeathRate
,Death_Proportion
,DeathProbablity
,SurvivalProbablity
,PersonsAliveTemp
,(LAG(PersonsAliveTemp,1) OVER(PARTITION BY ZipCode ORDER BY Age))*SurvivalProbablity as PersonsAlive
FROM #temp4
I also tried it with defaulting PersonsAliveTemp to 100,000 and 0, which "works" but doesn't do the running calculation.
Is it possible to get the lag function (or some other function) to do a running row by row calc?
This converts a running product into an addition via logarithms.
select *,
100000 * exp(sum(log(SurvivalProb)) over
(partition by ZipCode order by Age
rows between unbounded preceding and current row)
) as PersonsAlive
from data
order by Age;
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=36be4d66260c74196f7d36833018682a
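The same trick can be sanity-checked outside the database: a running product equals exp() of a running sum of logs. A small plain-Python sketch with invented survival probabilities:

```python
import math

# Invented survival probabilities, one per age row.
surv = [1.0, 0.99951, 1.0, 0.99949]

# Running product via a running sum of logs, as in the window query.
running_log = 0.0
persons_alive = []
for p in surv:
    running_log += math.log(p)          # the windowed SUM(log(p))
    persons_alive.append(100000 * math.exp(running_log))

# Multiplying directly gives the same numbers (up to float noise).
direct, prod = [], 1.0
for p in surv:
    prod *= p
    direct.append(100000 * prod)

for a, b in zip(persons_alive, direct):
    assert abs(a - b) < 1e-6
print(persons_alive)
```

This mirrors why the SQL version works: SUM over logs is a supported window aggregate, while a running PRODUCT is not.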
I'm using InfluxDB 1.8 and trying to make a little more complex query than Influx was made for.
I want to retrieve all data that refers to the last month stored, based on tag and field values that my script stores (not the default "time" field that Influx creates). Say we have this infos measurement:
time                 field_month  field_week  tag_month  tag_week  some_data
1631668119209113500  8            1           8          1         random
1631668119209113500  8            2           8          2         random
1631668119209113500  8            3           8          3         random
1631668119209113500  9            1           9          1         random
1631668119209113500  9            1           9          1         random
1631668119209113500  9            2           9          2         random
Here 8 refers to August, 9 to September, and some_data is stored on a given week of that month.
I can use the MAX selector on field_month to get the last month of the year stored (I can't use the Flux date package because I'm using v1.8). Further, I want the data grouped by tag_month and tag_week so I can COUNT how many times some_data was stored on each week of the month; that's why the same data is repeated in field and tag keys. Something like this:
SELECT COUNT(field_month) FROM infos WHERE field_month = 9 GROUP BY tag_month, tag_week
Replacing 9 with the MAX selector:
SELECT COUNT(field_month) FROM infos WHERE field_month = (SELECT MAX(field_month) FROM infos) GROUP BY tag_month, tag_week
The first query works (see results here), but not the second.
Am I doing something wrong? Is there any other possibility to make this work in v1.8?
NOTE: I know Influx wasn't supposed to be used like that. I've tried and managed this easily with PostgreSQL, using an adapted form of the second query above. But while we straighten things up to use Postgres, we have to use InfluxDB v1.8.
In PostgreSQL you can try:
SELECT COUNT(field_month) FROM infos
WHERE field_month = (SELECT field_month FROM infos ORDER BY field_month DESC LIMIT 1)
GROUP BY tag_month, tag_week;
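The ORDER BY ... LIMIT 1 subquery stands in for MAX. InfluxQL 1.8 does not accept a subquery in WHERE, which is why the second query fails there, but the SQL pattern itself can be demonstrated with SQLite from Python (table contents mirror the question; this is a sketch of the pattern, not of InfluxDB):

```python
import sqlite3

# Rebuild the question's "infos" measurement as an ordinary SQL table.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE infos
               (field_month INTEGER, field_week INTEGER,
                tag_month INTEGER, tag_week INTEGER, some_data TEXT)""")
con.executemany("INSERT INTO infos VALUES (?, ?, ?, ?, ?)", [
    (8, 1, 8, 1, "random"), (8, 2, 8, 2, "random"), (8, 3, 8, 3, "random"),
    (9, 1, 9, 1, "random"), (9, 1, 9, 1, "random"), (9, 2, 9, 2, "random"),
])

# The scalar subquery picks the latest stored month (9), then the outer
# query counts rows per week of that month.
rows = con.execute("""
    SELECT tag_month, tag_week, COUNT(field_month)
    FROM infos
    WHERE field_month = (SELECT field_month FROM infos
                         ORDER BY field_month DESC LIMIT 1)
    GROUP BY tag_month, tag_week
""").fetchall()
print(rows)   # only month-9 rows are counted
```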
Suppose that I have a table in a SQL database with columns like the ones shown below. The table records various performance metrics of the employees in my company each month.
I can easily query the table so that I can see the best monthly sales figures that my employees have ever obtained, along with which employee was responsible and which month the figure was obtained in:
SELECT * FROM EmployeePerformance ORDER BY Sales DESC;
NAME MONTH SALES COMMENDATIONS ABSENCES
Karen Jul 16 36,319.13 2 0
David Feb 16 35,398.03 2 1
Martin Nov 16 33,774.38 1 1
Sandra Nov 15 33,012.55 4 0
Sandra Mar 16 31,404.45 1 0
Karen Sep 16 30,645.78 2 2
David Feb 16 29,584.81 1 1
Karen Jun 16 29,030.00 3 0
Stuart Mar 16 28,877.34 0 1
Karen Nov 15 28,214.42 1 2
Martin May 16 28,091.99 3 0
This query is very simple, but it's not quite what I want. How would I need to change it if I wanted to see only the top 3 monthly figures achieved by each employee in the result set?
To put it another way, I want to write a query that is the same as the one above, but if any employee would appear in the result set more than 3 times, then only their top 3 results should be included, and any further results of theirs should be ignored. In my sample query, Karen's figure from Nov 15 would no longer be included, because she already has three other figures higher than that according to the ordering "ORDER BY Sales DESC".
The specific SQL database I am using is either SQLite or, if what I need is not possible with SQLite, then MySQL.
In MySQL 8+ you can use window functions. ROW_NUMBER() has to go in a subquery, because window functions are not allowed in WHERE, and it needs PARTITION BY Name to rank per employee:
SELECT Name, Month, Sales, Commendations, Absences
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Sales DESC) AS rn
    FROM EmployeePerformance
) t
WHERE rn <= 3
ORDER BY Sales DESC
In SQLite (before version 3.25, which added window functions) you can instead count the preceding rows:
SELECT *
FROM EmployeePerformance e
WHERE
(SELECT COUNT(*)
FROM EmployeePerformance ee
WHERE ee.Name=e.Name and ee.Sales>e.Sales)<3
ORDER BY e.Sales DESC
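The correlated-count query can be tried directly from Python's sqlite3; the rows below are a subset of the question's table (Commendations and Absences omitted for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EmployeePerformance (Name TEXT, Month TEXT, Sales REAL)")
con.executemany("INSERT INTO EmployeePerformance VALUES (?, ?, ?)", [
    ("Karen", "Jul 16", 36319.13), ("Karen", "Sep 16", 30645.78),
    ("Karen", "Jun 16", 29030.00), ("Karen", "Nov 15", 28214.42),
    ("David", "Feb 16", 35398.03),
])

# Keep a row only if fewer than 3 rows from the same employee beat it.
rows = con.execute("""
    SELECT e.Name, e.Month, e.Sales
    FROM EmployeePerformance e
    WHERE (SELECT COUNT(*) FROM EmployeePerformance ee
           WHERE ee.Name = e.Name AND ee.Sales > e.Sales) < 3
    ORDER BY e.Sales DESC
""").fetchall()
# Karen's Nov 15 row is dropped: three of her rows rank above it.
print(rows)
```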
I have managed to find an answer myself. It seems to work by pairing each record up with all of the records from the same person that were equal or greater, and then choosing only the (left) records that had no more than 3 greater-or-equal pairings.
SELECT P.Name, P.Month, P.Sales, P.Commendations, P.Absences
FROM Performance P
LEFT JOIN Performance P2 ON (P.Name = P2.Name AND P.Sales <= P2.Sales)
GROUP BY P.Name, P.Month, P.Sales, P.Commendations, P.Absences
HAVING COUNT(*) <= 3
ORDER BY P.Sales DESC;
I will give the credit to a_horse_with_no_name for adding the tag "greatest-n-per-group", as I would have had no idea what to search for otherwise, and by looking through other questions with this tag I managed to find what I wanted.
I found this question that was similar to mine... Using LIMIT within GROUP BY to get N results per group?
And I followed this link that somebody had included in a comment... https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
...and the answer I wanted was in the first comment on that article. It's perfect as it uses only a LEFT JOIN, so it will work in SQLite.
Here is my SQL Fiddle: http://sqlfiddle.com/#!7/580f0/5/0
I have the following code for the question below; however, the percentage comes out as zero:
SELECT p.state, (p.popestimate2011/sum(p.popestimate2011)) * 100
FROM pop_estimate_state_age_sex_race_origin p
WHERE p.age >= 21
GROUP BY p.state;
Also here's the table schema:
sqlite> .schema pop_estimate_state_age_sex_race_origin
CREATE TABLE pop_estimate_state_age_sex_race_origin (
sumlev NUMBER,
region NUMBER,
division NUMBER,
state NUMBER,
sex NUMBER,
origin NUMBER,
race NUMBER,
age NUMBER,
census2010pop NUMBER,
estimatesbase2010 NUMBER,
popestimate2010 NUMBER,
popestimate2011 NUMBER,
PRIMARY KEY(state, age, sex, race, origin),
FOREIGN KEY(sumlev) REFERENCES SUMLEV(sumlev_cd),
FOREIGN KEY(region) REFERENCES REGION(region_cd),
FOREIGN KEY(division) REFERENCES DIVISION(division_cd),
FOREIGN KEY(sex) REFERENCES SEX(sex_cd),
FOREIGN KEY(race) REFERENCES RACE(race_cd),
FOREIGN KEY(origin) REFERENCES ORIGIN(origin_cd));
So when I run the query it just shows 0 for the percentage:
stat p.popestimate
---- -------------
1 0
2 0
4 0
5 0
6 0
8 0
9 0
10 0
11 0
12 0
13 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
I also tried to write it using nested queries but didn't get anywhere either:
SELECT p.state, 100.0 * sum(p.popestimate2011) / total_pop AS percentage
FROM pop_estimate_state_age_sex_race_origin p
JOIN (SELECT state, sum(p2.popestimate2011) AS total_pop
FROM pop_estimate_state_age_sex_race_origin p2) s ON (s.state = p.state)
WHERE age >= 21
GROUP BY p.state, total_pop
ORDER BY p.state;
The current problem I am having is that it just shows one row as result and just shows the result for the last state number (state ID=56):
56 0.131294163192301
Here's an approach (not tested) that does not require an inner query. It makes a single pass over the table, aggregating by state, and using CASE to calculate the numerator of population aged over 20 and denominator of total state population.
SELECT
state,
(100.0 * SUM(CASE WHEN age >= 21 THEN popestimate2011 ELSE 0 END) / SUM(popestimate2011)) AS percentage
FROM pop_estimate_state_age_sex_race_origin
GROUP BY state
I'm not sure why your SQL statement is executing at all. You are including the non-aggregated column value popestimate2011 in a GROUP BY select and that should generate an error.
A closer reading of the SQLite documentation indicates that it does, in fact, support random value selection for non-aggregate columns in the result expression list (a feature also offered by MySQL). This explains:
Why your SELECT statement is able to execute (a random value is chosen for the non-aggregated popestimate2011 reference).
Why you are seeing a result of 0: the random value chosen is probably the first occurring row and if the rows were added to the database in order that row probably has an age value of 0. Since the numerator in your division would then be 0, the result is also 0.
As to the meat of your calculation it's not clear from your table definition whether the data in your base table is already aggregated or not and, if so, what the age column represents (an average? the grouping factor for that row?)
Finally, SQLite does not have a NUMBER data type. These columns will get the default affinity of NUMERIC which is probably what you want but might not be.
You need something along these lines (not tested):
SELECT state, SUM(popestimate2011) /
    (SELECT SUM(popestimate2011)
     FROM pop_estimate_state_age_sex_race_origin
     WHERE age >= 21)
    * 100 AS percentage
FROM pop_estimate_state_age_sex_race_origin
WHERE age >= 21
GROUP BY state
;
The NUMBER type does not exist in SQLite. The column gets NUMERIC affinity, the values are stored as integers, and decimals are lost in an integer division, so
(p.popestimate2011 / SUM(p.popestimate2011))
is always 0.
Change the type of the column popestimate2011 to REAL,
or use CAST (...)
(CAST (p.popestimate2011 AS REAL) / SUM (p.popestimate2011))
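The integer-division pitfall is easy to reproduce from Python's sqlite3 module (the table and numbers here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (pop INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(300,), (700,)])

# Both operands are INTEGER, so the ratio truncates to 0 before the *100.
trunc = con.execute(
    "SELECT (300 / (SELECT SUM(pop) FROM t)) * 100").fetchone()[0]

# CAST one operand to REAL and the division keeps its decimals.
exact = con.execute(
    "SELECT (CAST(300 AS REAL) / (SELECT SUM(pop) FROM t)) * 100").fetchone()[0]

print(trunc, exact)   # truncated version is 0; the cast gives ~30
```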
What I am trying to achieve is straightforward, but it is a little difficult to explain and I don't know if it is actually even possible in Postgres. I am at a fairly basic level: SELECT, FROM, WHERE, LEFT JOIN ON, HAVING, etc., the basic stuff.
I am trying to count the number of rows that contain a particular letter/number and display that count against the letter/number.
i.e How many rows have entries that contain an "a/A" (Case insensitive)
The table I'm querying is a list of film names. All I want to do is group and count 'a-z' and '0-9' and output the totals. I could run 36 queries sequentially:
SELECT filmname FROM films WHERE filmname ilike '%a%'
SELECT filmname FROM films WHERE filmname ilike '%b%'
SELECT filmname FROM films WHERE filmname ilike '%c%'
And then run pg_num_rows on the result to find the number I require, and so on.
I know how intensive LIKE is, and ILIKE even more so, so I would prefer to avoid that. Although the data (below) has upper and lower case, I want the result set to be case insensitive; i.e. in "The Men Who Stare At Goats" the a/A, t/T and s/S wouldn't count twice for the result set. I can duplicate the table to a secondary working table with the data all lowercased (strtolower) and work on that set of data if it makes the query simpler or easier to construct.
An alternative could be something like
SELECT sum(length(regexp_replace(filmname, '[^X|^x]', '', 'g'))) FROM films;
for each letter combination, but again that is 36 queries and 36 datasets; I would prefer to get the data in a single query.
Here is a short data set of 14 films from my set (which actually contains 275 rows)
District 9
Surrogates
The Invention Of Lying
Pandorum
UP
The Soloist
Cloudy With A Chance Of Meatballs
The Imaginarium of Doctor Parnassus
Cirque du Freak: The Vampires Assistant
Zombieland
9
The Men Who Stare At Goats
A Christmas Carol
Paranormal Activity
If I manually lay out each letter and number in a column, register whether that letter appears in the film title by giving it an x in that column, and then count them up to produce a total, I get something like the table below. Each vertical column of x's lists the letters in that film name, regardless of how many times a letter appears or its case.
The result for the short set above is:
A x x xxxx xxx 9
B x x 2
C x xxx xx 6
D x x xxxx 6
E xx xxxxx x 8
F x xxx 4
G xx x x 4
H x xxxx xx 7
I x x xxxxx xx 9
J 0
K x 0
L x xx x xx 6
M x xxxx xxx 8
N xx xxxx x x 8
O xxx xxx x xxx 10
P xx xx x 5
Q x 1
R xx x xx xxx 7
S xx xxxx xx 8
T xxx xxxx xxx 10
U x xx xxx 6
V x x x 3
W x x 2
X 0
Y x x x 3
Z x 1
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 x x 1
In the example above, each column is a "filmname" As you can see, column 5 marks only a "u" and a "p" and column 11 marks only a "9". The final column is the tally for each letter.
I want to build a query that somehow gives me the result rows A 9, B 2, C 6, D 6, E 8, etc., taking into account every row extracted from my films column. If a letter doesn't appear in any row, I would like a zero.
I don't know if this is even possible, or whether doing it sequentially in PHP with 36 queries is the only possibility.
In the current dataset there are 275 entries and it grows by around 8.33 a month (100 a year). I predict it will reach around 1000 rows by 2019 by which time I will be no doubt using a completely different system so I don't need to worry about working with a huge dataset to trawl through.
The current longest title is "Percy Jackson & the Olympians: The Lightning Thief" at 50 chars (yes, poor film I know ;-) and the shortest is 1, "9".
I am running version 9.0.0 of Postgres.
Apologies if I've said the same thing multiple times in multiple ways, I am trying to get as much information out so you know what I am trying to achieve.
If you need any clarification or larger datasets to test with please just ask and I'll edit as needs be.
Suggestion are VERY welcome.
Edit 1
Erwin, thanks for the edits/tags/suggestions. I agree with them all.
Fixed the missing "9" typo as suggested by Erwin. Manual transcribe error on my part.
kgrittn, Thanks for the suggestion but I am not able to update the version from 9.0.0. I have asked my provider if they will try to update.
Response
Thanks for the excellent reply Erwin
Apologies for the delay in responding but I have been trying to get your query to work and learning the new keywords to understand the query you created.
I adjusted the query to adapt into my table structure but the result set was not as expected (all zeros) so I copied your lines directly and had the same result.
Whilst the result set in both cases lists all 36 rows with the appropriate letters/numbers however all the rows shows zero as the count (ct).
I have tried to deconstruct the query to see where it may be falling over.
The result of
SELECT DISTINCT id, unnest(string_to_array(lower(film), NULL)) AS letter
FROM films
is "No rows found". Perhaps it ought to when extracted from the wider query, I'm not sure.
When I removed the unnest function the result was 14 rows all with "NULL"
If I adjust the function COALESCE(y.ct, 0) to COALESCE(y.ct, 4), then my dataset responds with 4's for every letter instead of the zeros explained previously.
Having briefly read up on COALESCE, with "4" being the substitute value, I am guessing that y.ct is NULL and is being substituted with this second value (this covers rows where the letter in the sequence is not matched; i.e. if no films contain a 'q' then the 'q' row will have a zero value rather than NULL?)
The database I tried this on was SQL_ASCII and I wondered if that was somehow a problem but I had the same result on one running version 8.4.0 with UTF-8.
Apologies if I've made an obvious mistake but I am unable to return the dataset I require.
Any thoughts?
Again, thanks for the detailed response and your explanations.
This query should do the job:
Test case:
CREATE TEMP TABLE films (id serial, film text);
INSERT INTO films (film) VALUES
('District 9')
,('Surrogates')
,('The Invention Of Lying')
,('Pandorum')
,('UP')
,('The Soloist')
,('Cloudy With A Chance Of Meatballs')
,('The Imaginarium of Doctor Parnassus')
,('Cirque du Freak: The Vampires Assistant')
,('Zombieland')
,('9')
,('The Men Who Stare At Goats')
,('A Christmas Carol')
,('Paranormal Activity');
Query:
SELECT l.letter, COALESCE(y.ct, 0) AS ct
FROM (
SELECT chr(generate_series(97, 122)) AS letter -- a-z in UTF8!
UNION ALL
SELECT generate_series(0, 9)::text -- 0-9
) l
LEFT JOIN (
SELECT letter, count(id) AS ct
FROM (
SELECT DISTINCT -- count film once per letter
id, unnest(string_to_array(lower(film), NULL)) AS letter
FROM films
) x
GROUP BY 1
) y USING (letter)
ORDER BY 1;
This requires PostgreSQL 9.1! Consider the release notes:
Change string_to_array() so a NULL separator splits the string into
characters (Pavel Stehule)
Previously this returned a null value.
You can use regexp_split_to_table(lower(film), '') instead of unnest(string_to_array(lower(film), NULL)) (works in versions pre-9.1!), but it is typically a bit slower and performance degrades with long strings.
I use generate_series() to produce the [a-z0-9] as individual rows. And LEFT JOIN to the query, so every letter is represented in the result.
Use DISTINCT to count every film once.
Never worry about 1000 rows. That is peanuts for modern day PostgreSQL on modern day hardware.
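For sanity-checking the SQL output, the same count-each-film-once-per-character logic is a few lines of Python over the question's 14 sample titles:

```python
import string

films = [
    "District 9", "Surrogates", "The Invention Of Lying", "Pandorum",
    "UP", "The Soloist", "Cloudy With A Chance Of Meatballs",
    "The Imaginarium of Doctor Parnassus",
    "Cirque du Freak: The Vampires Assistant", "Zombieland", "9",
    "The Men Who Stare At Goats", "A Christmas Carol",
    "Paranormal Activity",
]

# A film counts at most once per character, case-insensitively,
# matching the DISTINCT id/letter pairs in the SQL version.
chars = string.ascii_lowercase + string.digits
counts = {c: sum(c in f.lower() for f in films) for c in chars}
print(counts["a"], counts["q"], counts["j"])   # 9 1 0
```

These match the hand tally in the question (A 9, Q 1, J 0), which is a quick way to verify the SQL against the expected totals.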
A fairly simple solution which only requires a single table scan would be the following.
SELECT
'a', SUM( (title ILIKE '%a%')::integer),
'b', SUM( (title ILIKE '%b%')::integer),
'c', SUM( (title ILIKE '%c%')::integer)
FROM film
I left the other 33 characters as a typing exercise for you :)
BTW, 1000 rows is tiny for a PostgreSQL database. It's beginning to get large when the DB is larger than the memory in your server.
edit: had a better idea
SELECT chars.c, COUNT(title)
FROM (VALUES ('a'), ('b'), ('c')) as chars(c)
LEFT JOIN film ON title ILIKE ('%' || chars.c || '%')
GROUP BY chars.c
ORDER BY chars.c
You could also replace the (VALUES ('a'), ('b'), ('c')) as chars(c) part with a reference to a table containing the list of characters you are interested in.
This will give you the result in a single row, with one column for each matching letter and digit.
SELECT
SUM(CASE WHEN POSITION('a' IN filmname) > 0 THEN 1 ELSE 0 END) AS "A",
SUM(CASE WHEN POSITION('b' IN filmname) > 0 THEN 1 ELSE 0 END) AS "B",
SUM(CASE WHEN POSITION('c' IN filmname) > 0 THEN 1 ELSE 0 END) AS "C",
...
SUM(CASE WHEN POSITION('z' IN filmname) > 0 THEN 1 ELSE 0 END) AS "Z",
SUM(CASE WHEN POSITION('0' IN filmname) > 0 THEN 1 ELSE 0 END) AS "0",
SUM(CASE WHEN POSITION('1' IN filmname) > 0 THEN 1 ELSE 0 END) AS "1",
...
SUM(CASE WHEN POSITION('9' IN filmname) > 0 THEN 1 ELSE 0 END) AS "9"
FROM films;
A similar approach to Erwin's, but maybe more comfortable in the long run:
Create a table with each character you're interested in:
CREATE TABLE char (name char (1), id serial);
INSERT INTO char (name) VALUES ('a');
INSERT INTO char (name) VALUES ('b');
INSERT INTO char (name) VALUES ('c');
Then grouping over it's values is easy:
SELECT char.name, COUNT(*)
FROM char, film
WHERE film.name ILIKE '%' || char.name || '%'
GROUP BY char.name
ORDER BY char.name;
Don't worry about ILIKE.
I'm not 100% happy about using the keyword 'char' as a table name, but I haven't had bad experiences so far. On the other hand, it is the natural name. Maybe translating it to another language, like 'zeichen' in German, avoids ambiguities.