SQL column compare - sql

I have three columns in one SQL table: a number, a name, and some unrelated data. Each number repeats several times, with a name next to it. A name cannot appear twice under the same number, but the same name can appear under several different numbers. I need an SQL query that finds which names have appeared together under the same number most often. Any help would be much appreciated.
Example: the query should find which names have been grouped together the most.
1 Bill
1 Bob
1 Dave
2 Bob
2 John
2 Bill

To confirm - you would like to find
The pairs of names that occur together within a 'number'
Of those, find the pair that occurs most often
The trick here is to get all the pairs, then count how many 'numbers' that pair appears in.
To get the pairs, join the table to itself on the number. To list each pair only once, also require in the join that the first name in the pair sorts before the second (first < second).
The exact syntax depends on your database (SQL Server, MySQL, etc.). The example below is written in T-SQL but is fairly generic, and it does most of the work: it shows the count for each pair and orders the pairs by that count.
Feel free to add TOP 1 or LIMIT 1 to get just one pair with the most matches (noting that if there is a tie, only one of the tied pairs is returned this way).
Alternatively, modify the query to work out the maximum count, then return every pair with that count.
CREATE TABLE NameGrps (NameNum int, Name varchar(30));
INSERT INTO NameGrps (NameNum, Name)
VALUES
(1, 'Bill'),
(1, 'Bob'),
(1, 'Dave'),
(2, 'Bob'),
(2, 'John'),
(2, 'Bill');
SELECT NamePairs.FirstInPair, NamePairs.SecondInPair, COUNT(NameNum) AS Num_Paired
FROM
(SELECT A.Name AS FirstInPair, B.Name AS SecondInPair, A.NameNum
FROM NameGrps A
INNER JOIN NameGrps B ON A.NameNum = B.NameNum AND A.Name < B.Name
) AS NamePairs
GROUP BY NamePairs.FirstInPair, NamePairs.SecondInPair
ORDER BY COUNT(NameNum) DESC, NamePairs.FirstInPair, NamePairs.SecondInPair;
And here are the results of the above
FirstInPair  SecondInPair  Num_Paired
Bill         Bob           2
Bill         Dave          1
Bill         John          1
Bob          Dave          1
Bob          John          1
If you take a TOP or LIMIT 1 of that, it will find the pair of Bill and Bob is the most frequent.
Here is a db<>fiddle with the query, as well as additional information (e.g., what the sub-query does, and adding a TOP 1 version).
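The tie-aware alternative mentioned above (work out the maximum count, then keep every pair with that count) might look like the following. This is only a sketch: it runs the SQL through Python's built-in sqlite3 module against an in-memory database, and the CTE name PairCounts is made up for illustration. The CTE syntax should carry over to most current databases, but check your own.

```python
import sqlite3

# In-memory SQLite database with the sample data from the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NameGrps (NameNum INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO NameGrps (NameNum, Name) VALUES (?, ?)",
    [(1, "Bill"), (1, "Bob"), (1, "Dave"), (2, "Bob"), (2, "John"), (2, "Bill")],
)

# PairCounts (hypothetical name) counts how many numbers each pair shares;
# the outer query keeps every pair whose count equals the maximum, so ties survive.
query = """
WITH PairCounts AS (
    SELECT A.Name AS FirstInPair, B.Name AS SecondInPair, COUNT(*) AS Num_Paired
    FROM NameGrps A
    INNER JOIN NameGrps B ON A.NameNum = B.NameNum AND A.Name < B.Name
    GROUP BY A.Name, B.Name
)
SELECT FirstInPair, SecondInPair, Num_Paired
FROM PairCounts
WHERE Num_Paired = (SELECT MAX(Num_Paired) FROM PairCounts)
ORDER BY FirstInPair, SecondInPair
"""
top_pairs = conn.execute(query).fetchall()
print(top_pairs)  # [('Bill', 'Bob', 2)]
```

Unlike TOP 1 or LIMIT 1, this returns several rows when more than one pair shares the highest count.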

Related

Retrieving duplicate values in column SQL database

I have a small database which contains a table that holds information in each row on a Movie (e.g, Movie Name, Movie Runtime, Movie Rating) and I also have a separate Genre table which contains a list of genres (Horror, Action etc).
I have an association table which links a movie to a genre (a typical row will contain the unique Id for that row, the genreId and the movieId).
I have written a query which pulls back all the genres a user has watched; however, it is removing the duplicate row values and is giving me what seems to be a distinct count.
Below is the SQL statement:
SELECT g.Type,
g.Id
FROM GenreTable g
WHERE
g.Id in (
SELECT gma.GenreId
FROM MovieGenreAssociationTable gma
WHERE gma.MovieId in (
SELECT uma.MovieSeriesId
FROM UserMovieAssociationTable uma
WHERE uma.UserId = '1'
)
);
This returns all of the genres a user has watched, but I'm noticing that it's not bringing back the duplicates which I know exist in the association table.
How do I get those duplicates?
You are not making a JOIN but a SELECT on a single table, so it will never return any duplicates unless they exist in GenreTable.
If you do something like SELECT a FROM tbl WHERE b IN (1,1,1,1,1), it will return only one row -- not five. And even if you have a complicated WHERE there, it's still a simple IN clause.
update: quick and dirty refresher on JOINs.
I'd actually suggest you look for a SQL tutorial. I make no claim about the completeness of this note - rather, the contrary. First google hit, second hit, etc.
Say that you have two simple tables:
a.id  a.a      b.id  b.b
1     1        1     'Hello'
2     1        2     'World'
3     2        7     'foobar'
4     3
If you run a JOIN between a and b, ON(a.a = b.id), the query will select all records in a; each of them will be then joined to all matching records in b. This is what JOIN is for.
In this case, the second and third columns will always be equal:
1 1 1 'Hello'
2 1 1 'Hello'
3 2 2 'World'
Notice that the fourth row of a is discarded because it has no matches, and the third row of b is never selected at all. The first row of b is selected twice, because there are two rows of a which match it.
A LEFT JOIN works the same, except that if there are no matches for the left side of the query (i.e. table a), as it happens for the fourth row, that row is selected all the same; but the extra fields that would have come from b are replaced by NULLs. You get a further row for which the JOIN clause, ON(a.a = b.id), is actually false:
4 3 NULL NULL
(And you can use this to select the rows of a that have no matches in b: just specify e.g. WHERE b.primary_key_of_b IS NULL).
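To check the refresher above by hand, here is a small sketch that loads the two example tables into an in-memory SQLite database through Python's sqlite3 module and runs the three variants: JOIN, LEFT JOIN, and the "no match" filter. SQLite is just a convenient stand-in here; the join semantics shown are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, a INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, b TEXT);
INSERT INTO a (id, a) VALUES (1, 1), (2, 1), (3, 2), (4, 3);
INSERT INTO b (id, b) VALUES (1, 'Hello'), (2, 'World'), (7, 'foobar');
""")

# Inner join: only rows of a that have a match in b.
inner_rows = conn.execute(
    "SELECT a.id, a.a, b.id, b.b FROM a JOIN b ON a.a = b.id ORDER BY a.id"
).fetchall()

# Left join: every row of a; columns from b are NULL (None) where nothing matches.
left_rows = conn.execute(
    "SELECT a.id, a.a, b.id, b.b FROM a LEFT JOIN b ON a.a = b.id ORDER BY a.id"
).fetchall()

# Rows of a with no match in b: test the primary key of b for NULL.
unmatched = conn.execute(
    "SELECT a.id, a.a FROM a LEFT JOIN b ON a.a = b.id WHERE b.id IS NULL"
).fetchall()

print(inner_rows)  # [(1, 1, 1, 'Hello'), (2, 1, 1, 'Hello'), (3, 2, 2, 'World')]
print(left_rows[-1])  # (4, 3, None, None)
print(unmatched)  # [(4, 3)]
```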
Your case
You should do something like:
SELECT
g.Type,
g.Id
FROM GenreTable AS g
JOIN MovieGenreAssociationTable AS gma ON (gma.GenreId = g.Id)
JOIN UserMovieAssociationTable AS uma ON (uma.MovieSeriesId = gma.MovieId)
WHERE uma.UserId = 1;
You can then GROUP BY e.g. Type and Id to get the COUNT() of movies watched for each genre.
But...
Say that you have a GenreTable with two rows (Id=123, Type="Science Fiction" and Id=456, Type="Comedy"), a Movie table with one row (777, "Galaxy Quest"), a MovieGenreAssociationTable with (123, 777) and (456, 777) because that movie is a great comedy too, and finally user 1 watched only movie 777. You would get:
Genre gma uma Movie
123 "Science Fiction" 123, 777 777, 1 777, "Galaxy Quest"
456 "Comedy" 456, 777 777, 1 777, "Galaxy Quest"
and would see that user 1 has seen two movies - one SciFi, one Comedy.
In this case you need to either accept the result (how many comedies did he watch? One. How many SciFis? One), or make a more complicated query for which you must decide which is the main genre. Otherwise you would get illogical results ("How many comedies? One. How many movies? One. Then number of non-comedies is one minus one, ie, zero? No, it is again one - wait, what?").
In this case you could add a boolean column for this purpose, "IsMainGenre", to MovieGenreAssociationTable. When you want to know how many comedies someone watched, you would do as above. But when you split movies by genre, you add AND IsMainGenre=1, and "Galaxy Quest" is counted among SciFis, but not among comedies or parodies.
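The difference between the original IN query and the JOIN version can be seen on a tiny data set. The sketch below uses Python's sqlite3 module with an in-memory database; the table and column names come from the question, but the rows are invented for illustration (movie 778 in particular is hypothetical), and the column alias MoviesWatched is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GenreTable (Id INTEGER PRIMARY KEY, Type TEXT);
CREATE TABLE MovieGenreAssociationTable (GenreId INTEGER, MovieId INTEGER);
CREATE TABLE UserMovieAssociationTable (UserId INTEGER, MovieSeriesId INTEGER);
INSERT INTO GenreTable VALUES (123, 'Science Fiction'), (456, 'Comedy');
-- Movie 777 is both SciFi and comedy; movie 778 (hypothetical) is SciFi only.
INSERT INTO MovieGenreAssociationTable VALUES (123, 777), (456, 777), (123, 778);
INSERT INTO UserMovieAssociationTable VALUES (1, 777), (1, 778);
""")

# Original approach: IN only filters GenreTable, so each genre appears once.
in_rows = conn.execute("""
SELECT g.Type, g.Id FROM GenreTable g
WHERE g.Id IN (SELECT gma.GenreId FROM MovieGenreAssociationTable gma
               WHERE gma.MovieId IN (SELECT uma.MovieSeriesId
                                     FROM UserMovieAssociationTable uma
                                     WHERE uma.UserId = 1))
ORDER BY g.Id
""").fetchall()

# JOIN approach: one row per (genre, watched movie), then counted per genre.
counts = conn.execute("""
SELECT g.Type, g.Id, COUNT(*) AS MoviesWatched
FROM GenreTable AS g
JOIN MovieGenreAssociationTable AS gma ON gma.GenreId = g.Id
JOIN UserMovieAssociationTable AS uma ON uma.MovieSeriesId = gma.MovieId
WHERE uma.UserId = 1
GROUP BY g.Type, g.Id
ORDER BY g.Id
""").fetchall()

print(in_rows)  # [('Science Fiction', 123), ('Comedy', 456)]
print(counts)  # [('Science Fiction', 123, 2), ('Comedy', 456, 1)]
```

Note that the per-genre counts add up to three although the user watched only two movies, which is exactly the double-counting issue described above.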

Node / Postgres SQL Select distinct entries then put all other entries with the same reference into one column

this question was probably asked somewhere but I can't seem to phrase it correctly in the search to find an accurate answer.
I'm doing a query on a Postgres DB, it has quite a few joins, the results are something like this:
WON | name | item
1 Joe A
1 Joe B
2 Smith A
So one row for each entry, I need to somehow get the result back as such:
WON | name | item
1 Joe A, B
2 Smith A
This can be done in the query or in Node.js. There are hundreds to thousands of results for the query, so getting a distinct row (WON 1), searching the DB for all entries that match it, and then repeating for the rest isn't feasible. This may therefore be better done in Node / JavaScript, but I'm somewhat new to that. What would be a (somewhat) efficient way to do this?
If there IS a way to do this in the query itself then that would be my preference though.
Thanks
A SQL approach:
SELECT won, name
,STRING_AGG(item, ',' ORDER BY item) AS items
FROM myTable
GROUP BY won, name
ORDER BY won, name
You can use GROUP BY and string_agg to concatenate rows, something like this:
Create table:
CREATE TABLE test
(
won int,
name character varying(255),
item character varying(255)
);
insert into test (won, name, item) values (1,'Joe', 'A'),(1, 'Joe', 'B'),(2, 'Smith', 'A')
And do this in the query:
select won, name, string_agg(item, ',') from test group by won, name order by won
See this example in sqlFiddle
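If the grouping has to happen in application code instead, as the question considers, one pass over the already-fetched rows is enough; there is no need to query again per WON. The sketch below is in Python rather than Node, and the function name group_items is made up, but the same dictionary-based idea should translate directly to a JavaScript Map.

```python
# Rows as they might come back from the database driver: (won, name, item).
rows = [(1, "Joe", "A"), (1, "Joe", "B"), (2, "Smith", "A")]


def group_items(rows):
    """Collect the items for each (won, name) pair in a single pass."""
    grouped = {}
    for won, name, item in rows:
        grouped.setdefault((won, name), []).append(item)
    # Dicts keep insertion order, so groups appear in first-seen order.
    return [(won, name, ", ".join(items)) for (won, name), items in grouped.items()]


result = group_items(rows)
print(result)  # [(1, 'Joe', 'A, B'), (2, 'Smith', 'A')]
```

Doing it in the query with string_agg is still likely the simpler option when the database supports it.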

Can IBM DB2 return a 0 (zero) when no records are found?

For example, I have a table with student IDs and student grades:
-----------------------
ID | grades
-----------------------
1 | 80
2 | 28
-----------------------
I want to get 0 when I query for ID = 3. Can I do that?
For example: select grades from student where id = 3
I want to get 0 because that ID is not in the table.
Run a SELECT with the COUNT aggregate function:
select count(*) from STUDENT.GRADES where ID=3
This returns 0 when no row has ID = 3. Note that it returns the number of matching rows, not the grade itself.
Maybe this will do what you want:
SELECT ID, MAX(Grades)
FROM (SELECT ID, Grades FROM Students WHERE ID = 3
UNION
VALUES (3, 0) -- Not certain of syntax here
)
GROUP BY ID
The basic idea is that students present in the table will have two rows and the MAX will pick their proper grade (assuming that there are no circumstances where the grade is coded as a negative value). Students that are not represented will have just the one row with a grade of 0. The repeated 3 is the ID of the student being sought.
Have fun chasing down the full syntax. I started at Queries in the DB2 9.7 Information Centre, but ran out of patience before I got a good answer — and I don't have DB2 to experiment on. You might need to write SELECT ID, Grades FROM VALUES (3, 0), or there might be some other magical incantation that does the job. You could probably use SELECT 3 AS ID, 0 AS Grades FROM SYSIBM.SYSTABLES WHERE TABID = 1, but that's a clumsy expression.
I've kept with the column name Grades (plural) even though it looks like it contains one grade. It is depressing how often people ask questions about anonymous tables.
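The UNION-plus-MAX idea can at least be tried out in SQLite, which is what the sketch below does through Python's sqlite3 module. It says nothing about the exact DB2 syntax, which remains unverified here: SQLite accepts a plain SELECT of constants as the second branch of the UNION, and the helper name grade_or_zero is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (ID INTEGER PRIMARY KEY, Grades INTEGER)")
conn.executemany("INSERT INTO Students (ID, Grades) VALUES (?, ?)", [(1, 80), (2, 28)])

# The second branch of the UNION always supplies a fallback row with grade 0,
# so MAX picks the real grade when one exists and 0 otherwise.
QUERY = """
SELECT ID, MAX(Grades)
FROM (SELECT ID, Grades FROM Students WHERE ID = :id
      UNION
      SELECT :id, 0)
GROUP BY ID
"""


def grade_or_zero(student_id):
    """Return (ID, grade), with grade 0 when the student is not in the table."""
    return conn.execute(QUERY, {"id": student_id}).fetchone()


print(grade_or_zero(1))  # (1, 80)
print(grade_or_zero(3))  # (3, 0)
```

As noted above, this relies on real grades never being negative.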

Select Value Of CSV in MasterTable From Reference Table

Consider these two tables:
--Subscriber_File---
ID GenreId FileName
01 1,2 TestFile.pdf
--MasterGenre--
ID Genrename
1 TEst1
2 Test2
When I issue this query, I'd like the result to be formatted as follows
Select * From Subscriber_File
ID GenreId FileName GenreName
1 1,2 TestFile.pdf TEst1,Test2
How can this be done?
Your data is not normalized. Specifically, in the one row for Subscriber_File, you have two facts in one place: the fact that the one entry is related to both MasterGenre 1 and MasterGenre 2. What if it were related to three MasterGenres? What if 10? The code required to associate your facts quickly escalates into an unmanageable mess.
The standard solution, when using relational database systems, is to normalize your data, such that each “repeating fact” is represented by one row in a table. (Google "database normalization" and you'll find thousands of articles on the subject. Really.) Here, you might end up with:
Table Subscriber
SubscriberId
FileName
(01, TestFile.pdf)
Table Genre
GenreId
GenreName
(1, Test1)
(2, Test2)
Table SubscriberGenre
SubscriberId
GenreId
(01, 1)
(01, 2)
At which point querying the data becomes trivial:
SELECT sub.SubscriberId, gen.GenreId, sub.FileName, gen.GenreName
FROM Subscriber sub
INNER JOIN SubscriberGenre subgen
ON subgen.SubscriberId = sub.SubscriberId
INNER JOIN Genre gen
ON gen.GenreId = subgen.GenreId
This should produce the result set
(01, 1, TestFile.pdf, Test1)
(01, 2, TestFile.pdf, Test2)
Hmm, you’re still challenged with converting those two lines into the one with a “1,2” value. I’ll let someone else answer that; my main point is that without normalized table structures, you’ll have trouble getting anything done.
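For the remaining step of folding those two rows back into one, most databases offer a string aggregate. The sketch below uses SQLite's GROUP_CONCAT through Python's sqlite3 module, on the normalized tables proposed above; other systems have their own equivalents (for example STRING_AGG in PostgreSQL), so treat this as an illustration rather than portable SQL. The aliases GenreIds and GenreNames are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subscriber (SubscriberId TEXT PRIMARY KEY, FileName TEXT);
CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, GenreName TEXT);
CREATE TABLE SubscriberGenre (SubscriberId TEXT, GenreId INTEGER);
INSERT INTO Subscriber VALUES ('01', 'TestFile.pdf');
INSERT INTO Genre VALUES (1, 'Test1'), (2, 'Test2');
INSERT INTO SubscriberGenre VALUES ('01', 1), ('01', 2);
""")

# One output row per subscriber; the genre ids and names are each joined
# into a comma-separated string. SQLite does not promise the order of the
# concatenated values unless the aggregate is given an explicit ordering.
row = conn.execute("""
SELECT sub.SubscriberId,
       GROUP_CONCAT(gen.GenreId) AS GenreIds,
       sub.FileName,
       GROUP_CONCAT(gen.GenreName) AS GenreNames
FROM Subscriber sub
INNER JOIN SubscriberGenre subgen ON subgen.SubscriberId = sub.SubscriberId
INNER JOIN Genre gen ON gen.GenreId = subgen.GenreId
GROUP BY sub.SubscriberId, sub.FileName
""").fetchone()

print(row)
```

The comma-separated strings are built only at query time, so the stored data stays normalized.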

SQL select replace integer with string

The goal is to replace an integer value returned by a SQL query with the string that the number represents. For example:
A table attribute labeled ‘Sport’ is defined as an integer value between 1 and 4. 1 = Basketball, 2 = Hockey, etc. Below is the database table and then the desired output.
Database Table:
Player Team Sport
--------------------------
Bob Blue 1
Roy Red 3
Sarah Pink 4
Desired Outputs:
Player Team Sport
------------------------------
Bob Blue Basketball
Roy Red Soccer
Sarah Pink Kickball
What is best practice to translate these integer values for String values? Use SQL to translate the values prior to passing to program? Use scripting language to change the value within the program? Change database design?
The database should hold the values and you should perform a join to another table which has that data in it.
So you should have a table which has say a list of people
ID Name FavSport
1 Alex 4
2 Gnats 2
And then another table which has a list of the sports
ID Sport
1 Basketball
2 Football
3 Soccer
4 Kickball
Then you would do a join between these tables
select people.name, sports.sport
from people, sports
where people.favsport = sports.ID
which would give you back
Name Sport
Alex Kickball
Gnats Football
You could also use a case statement eg. just using the people table from above you could write something like
select name,
case
when favsport = 1 then 'Basketball'
when favsport = 2 then 'Football'
when favsport = 3 then 'Soccer'
else 'Kickball'
end as "Sport"
from people
But that is certainly not best practice.
MySQL has a CASE statement. The following works in SQL Server:
SELECT
CASE MyColumnName
WHEN 1 THEN 'First'
WHEN 2 THEN 'Second'
WHEN 3 THEN 'Third'
ELSE 'Other'
END
In Oracle you can use the DECODE function, which provides a solution where the design of the database is beyond your control.
Directly from the Oracle documentation:
Example: This example decodes the value warehouse_id. If warehouse_id is 1, then the function returns 'Southlake'; if warehouse_id is 2, then it returns 'San Francisco'; and so forth. If warehouse_id is not 1, 2, 3, or 4, then the function returns 'Non domestic'.
SELECT product_id,
DECODE (warehouse_id, 1, 'Southlake',
2, 'San Francisco',
3, 'New Jersey',
4, 'Seattle',
'Non domestic') "Location"
FROM inventories
WHERE product_id < 1775
ORDER BY product_id, "Location";
The CASE expression could help. However, it may be even faster to have a small table with an int primary key and a name string such as
1 baseball
2 football
etc, and JOIN it appropriately in the query.
Do you think it would be helpful to store these relationships between integers and strings in the database itself? As long as you have to store these relationships, it makes sense to store it close to your data (in the database) instead of in your code where it can get lost. If you use this solution, this would make the integer a foreign key to values in another table. You store integers in another table, say sports, with sport_id and sport, and join them as part of your query.
Instead of a bare SELECT * FROM my_table, you would still SELECT * FROM my_table but add the appropriate join to sports. If not every row in your main table has a corresponding sport, you could use a left join; otherwise selecting from both tables and using = in the WHERE clause is probably sufficient.
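The choice between the two joins can be sketched on the question's data. The example below runs in SQLite through Python's sqlite3 module; the column names player and sport_id on my_table, and the extra player Pat with an unknown sport id, are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sports (sport_id INTEGER PRIMARY KEY, sport TEXT);
CREATE TABLE my_table (player TEXT, team TEXT, sport_id INTEGER);
INSERT INTO sports VALUES (1, 'Basketball'), (2, 'Hockey'), (3, 'Soccer'), (4, 'Kickball');
-- Pat (hypothetical) points at sport 9, which has no row in sports.
INSERT INTO my_table VALUES ('Bob', 'Blue', 1), ('Roy', 'Red', 3),
                            ('Sarah', 'Pink', 4), ('Pat', 'Green', 9);
""")

# Inner join: players without a matching sport row are dropped.
inner_rows = conn.execute("""
SELECT m.player, m.team, s.sport
FROM my_table m
JOIN sports s ON s.sport_id = m.sport_id
ORDER BY m.sport_id
""").fetchall()

# Left join: every player is kept; the sport is NULL (None) when there is no match.
left_rows = conn.execute("""
SELECT m.player, m.team, s.sport
FROM my_table m
LEFT JOIN sports s ON s.sport_id = m.sport_id
ORDER BY m.sport_id
""").fetchall()

print(inner_rows)  # [('Bob', 'Blue', 'Basketball'), ('Roy', 'Red', 'Soccer'), ('Sarah', 'Pink', 'Kickball')]
print(left_rows[-1])  # ('Pat', 'Green', None)
```

Declaring sport_id as a foreign key would prevent rows like Pat's from being stored in the first place, in which case the inner join loses nothing.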
Definitely have the DB hold the string values. I am not a DB expert by any means, but I would recommend that you create a table that holds the strings and their corresponding integer values. From there, you can define a relationship between the two tables and then do a JOIN in the select to pull the string version of the integer.
tblSport Columns
------------
SportID int (PK, eg. 12)
SportName varchar (eg. "Tennis")
tblFriend Columns
------------
FriendID int (PK)
FriendName (eg. "Joe")
LikesSportID (eg. 12)
In this example, you can get the following result from the query below:
SELECT FriendName, SportName
FROM tblFriend
INNER JOIN tblSport
ON tblFriend.LikesSportID = tblSport.SportID
Man, it's late - I hope I got that right. By the way, you should read up on the different types of joins - this is the simplest example of one.