Select Value Of CSV in MasterTable From Reference Table - sql

Consider these two tables:
--Subscriber_File---
ID GenreId FileName
01 1,2 TestFile.pdf
--MasterGenre--
ID Genrename
1 TEst1
2 Test2
When I issue this query, I'd like the result to be formatted as follows
Select * From Subscriber_File
ID GenreId FileName GenreName
1 1,2 TestFile.pdf TEst1,Test2
How can this be done?

Your data is not normalized. Specifically, in the one row for Subscriber_File, you have two facts in one place: the fact that the one entry is realted to both MasterGenre 1 and MaterGenre 2. What if they were related with three MaterGenres? What if 10? The code requited to associate your facts quickly escalates into an unmanageable mess.
The standard solution—when using relational database systems—is to normalize you data, such that each “repeating fact” is represented by one row in a table. (Google "database normalization" and you'll find thousands of articles on the subject. Really.) Here, you might end up with:
Table Subscriber
SubscriberId
FileName
(01, TestFile.pdf)
Table Genre
GenreId
GenreName
(1, Test1)
(2, Test2)
Table SubscriberGenre
SubscriberId
GenreId
(01, 1)
(01, 2)
At which point querying the data becomse trivial:
SELECT sub.SubscribeId, gen.GenreId, sub.FileName, gen.GenreName
From Subscriber sub
Inner join SubscriberGenre subgen
On subgen.SubscriberId = sub.SubscriberId
Inner join Gener gen
On gen.GenreId = subgen.GenreId
This should produce the result set
(01, 1, TestFile.pdf, Test1)
(01, 2, TestFile.pdf, Test2)
Hmm, you’re still challenged with converting those two lines into the one with a “1,2” value. I’ll let someone else answer that; my main point is that without normalized table structures, you’ll have trouble getting anything done.

Related

SQL column compare

I have 3 columns in the same table in SQL one for the number, a name, and another unrelated data. The numbers repeat for a certain amount of times and have a name next to them, there can't be a name twice on the same number, but the names can be present in multiple different numbers. I need to make an SQL query to find what names have been under the same number the most amount of times. Any help will be very appreciated.
Example: SQL query will find what names have been grouped together the most.
1 Bill
1 Bob
1 Dave
2 Bob
2 John
2 Bill
To confirm - you would like to find
The pairs of names that occur together within a 'number'
Of those, find the pair that occurs most often
The trick here is to get all the pairs, then count how many 'numbers' that pair appears in.
To get the pairs, join the table to itself (on the number) - and then to only have one pairing in each, also join on name with the first in the pair < second in the pair.
The answer to this question depends on your database (SQL Server, MySQL, etc). However, here is an example written in T-SQL but it is fairly generic that does most of the work: it shows the counts and orders them by the the relevant count.
Feel free to get the TOP or LIMIT 1 just to get a pair with the most matches (noting that if there is a tie, only one would be chosen this way)
Alternatively modify the query to work out what the maximum number is, then get the pairs with that number.
CREATE TABLE NameGrps (NameNum int, Name varchar(30));
INSERT INTO NameGrps (NameNum, Name)
VALUES
(1, 'Bill'),
(1, 'Bob'),
(1, 'Dave'),
(2, 'Bob'),
(2, 'John'),
(2, 'Bill');
SELECT NamePairs.FirstInPair, NamePairs.SecondInPair, COUNT(NameNum) AS Num_Paired
FROM
(SELECT A.Name AS FirstInPair, B.Name AS SecondInPair, A.NameNum
FROM NameGrps A
INNER JOIN NameGrps B ON A.NameNum = B.NameNum AND A.Name < B.Name
) AS NamePairs
GROUP BY NamePairs.FirstInPair, NamePairs.SecondInPair
ORDER BY COUNT(NameNum) DESC, NamePairs.FirstInPair, NamePairs.SecondInPair;
And here are the results of the above
FirstInPair SecondInPair Num_Paired
Bill Bob 2
Bill Dave 1
Bill John 1
Bob Dave 1
Bob John 1
If you take a TOP or LIMIT 1 of that, it will find the pair of Bill and Bob is the most frequent.
Here is a db<>fiddle with the query, as well as additional information (e.g., what the sub-query does, and adding a TOP 1 version).

Placing different rows in succession

I've started working with access around 1 month ago and I'm actually making a tool for preventive medicine so they can use a digital version of their actual paper form.
While the program is nearly finished, the med who requested it now wants to export to excel (the easy part) all the data from a patient his treatment and all the medicines used during that treatment in a single line (the problem).
I've been beating my head over that for two days, trying and researching on google, but all i could find was how to put values from a column in a single cell, and that's not how it has to be displayed.
So far, my best attempt (which is far from a good one) has been something like that:
CREATE TABLE Patient
(`SIP` int, `name` varchar(10));
INSERT INTO Patient
(`SIP`, `name`)
VALUES
(70,'John');
-- A patient can have multiple treatments
CREATE TABLE Treatment
(`id` int, `SIPFK` int);
INSERT INTO Treatment
(`id`,`SIPFK`)
VALUES
(1,70);
-- A treatment can have multiple medicines used while it's open
CREATE TABLE Medicine
(`Id` int, `Name` varchar(8), `TreatFK` int);
INSERT INTO Medicine
(`Id`, `Name`, `TreatFK`)
VALUES
(7, 'Apples', 1),
(7, 'Tomatoes', 1),
(7, 'Potatoes', 1),
(8, 'Banana', 2),
(8, 'Peach', 2);
-- The query
select c.id, c.Name, p.id as id2, p.Name as name2, r.id as id3, r.Name as name3
from Medicine as c, Medicine as p, Medicine as r
where c.id = 7 and p.id=7 and r.id=7;
The output I was trying to get was:
7 | Apples | 7 | Tomatoes | 7 | Potatoes
The table medicines will have more columns than that and i need to show every row related to a treatment in a single row along with the treatment.
But the values keep repeating themselves on different rows and the output on the subsequent columns besides the first ones is not as expected. Also GROUP BY won't solve the problem and DISTINCT doesn't work.
The output of the query is as follows: sqlfiddle.com
If any one could give me a hint, I would be grateful.
EDIT: Since access is a derp and won't let me use any good SQL fix nor will recognize DISTINCT to make the data from the queries not repeat themselves, I will try and search for a way to organize the rows directly in the exported excel.
Thank you all for your help, I'll save it cause I'm sure it'll save me hours of hands in the head.
This is a bit problemation, because MS Access does not support recursive CTE's and I dont see a way of doing that without Ranking.
Hence, I have tried to reproduce the results by using subquery which ranks the Medicines
and store these into a temporary table.
create table newtable
select c.id
, c.Name
,(SELECT COUNT(T1.Name) FROM Medicine AS T1 WHERE t1.id=c.id and T1.Name >= c.Name) AS Rank
from Medicine as c;
Afterwards, it is easy because my query is mostly based on Ranks and IDs.
select distinct id
,(select Name from newtable t2 where t1.id=t2.id and rank=1) as firstMed
,(select Name from newtable t2 where t1.id=t2.id and rank=2) as secMed
,(select Name from newtable t2 where t1.id=t2.id and rank=3) as ThirdMed
from newtable t1;
According to me, the SELF JOIN concept and the notion of recursive CTE's are the most important points for that particular example and a good practice would be to do a resarch on these.
for reference: http://sqlfiddle.com/#!2/f80a9/2

How to design the Tables / Query for (m:n relation?)

I am sorry if the term m:n is not correct, If you know a better term i will correct. I have the following situation, this is my original data:
gameID
participID
result
the data itself looks like that
1 5 10
1 4 -10
2 5 150
2 2 -100
2 1 -50
when i would extract this table it will easily have some 100mio rows and around 1mio participIDs ore more.
i will need:
show me all results of all games from participant x, where participant y was present
luckily only for a very limited amount of participants, but those are subject to change so I need a complete table and can reduce in a second step.
my idea is the following, it just looks very unoptimized
1) get the list of games where the "point of view participant" is included"
insert into consolidatedtable (gameid, participid, result)
select gameID,participID,sum(result) from mastertable where participID=x and result<>0
2) get all games where other participant is included
insert into consolidatedtable (gameid, participid, result)
where gameID in (select gameID from consolidatedtable)
AND participID=y and result<>0
3) delete all games from consolidate table where count<2
delete from consolidatedDB where gameID in (select gameid from consolidatedtable where count(distinct(participID)<2 group by gameid)
the whole thing looks like a childrens solution to me
I need a consolidated table for each player
I insert way to many games into this table and delete them later on
the whole thing needs to be run participant by participant over
the whole master table, it would not work if i do this for several
participants at the same time
any better ideas, must be, this ones just so bad. the master table will be postgreSQL on the DW server, the consolidated view will be mySQL (but the number crunching will be done in postgreSQL)
my problems
1) how do i build the consolidated table(s - do i need more than one), without having to run a single query for each player over the whole master table (i need to data for players x,y,z and no matter who else is playing) - this is the consolidation task for the DW server, it should create the table for webserver (which is condensed)
2) how can i then extract the at the webserver fast (so the table design of (1) should take this into consideration. we are not talking about a lot of players here i need this info, maybe 100? (so i could then either partition by player ID, or just create single table)
Datawarehouse: postgreSQL 9.2 (48GB, SSD)
Webserver: mySQL 5.5 (4GB Ram, SSD)
master table: gameid BIGINT, participID, Result INT, foreign key on particiP ID (to participants table)
the DW server will hold the master table, the DW server should also prepare the consolidated/extracted Tables (processing power, ssd space is not
an issue)
the webserver should hold the consoldiated tables (only for the 100
players where i need the info) and query this data in a very
efficient manner
so efficient query at webserver >> workload of DW server)
i think this is important, sorry that i didnt include it at the beginning.
the data at the DW server updates daily, but i do not need to query the whole "master table" completely every day. the setup allows me to consolidate only never values. eg: yesterday consolidation was up to ID 500, current ID=550, so today i only consolidate 501-550.
Here is another idea that might work, depending on your database (and my understanding of the question):
SELECT *
FROM table a
WHERE participID = 'x'
AND EXISTS (
SELECT 1 FROM table b
WHERE b.participID = 'y'
AND b.gameID=a.gameID
);
Assuming you have indexes on the two columns (participID and gameID), the performance should be good.
I'd compare it to this and see which runs faster:
SELECT *
FROM table a
JOIN (
SELECT gameID
FROM table
WHERE participID = 'y'
GROUP BY gameID
) b
ON a.gameID=b.gameID
WHERE a.participID = 'x';
Sounds like you just want a self join:
For all participants:
SELECT x.gameID, x.participID, x.results, y.participID, y.results
FROM table as x
JOIN table as y
ON T1.gameID = T2.gameID
WHERE x.participID <> y.participID
The downside of that is you'd get each participant on each side of each game.
For 2 specific particpants:
SELECT x.gameID, x.results, y.results
FROM (SELECT gameID, participID, results
FROM table
WHERE t1.participID = 'x'
and results <> 0)
as x
JOIN (SELECT gameID, participID, results
FROM table
WHERE t1.participID = 'y'
and results <> 0)
as y
ON T1.gameID = T2.gameID
You might not need to select participID in your query, depending on what you're doing with the results.

How can I retrieve similar data from two separate tables simultaneously?

Disclaimer: my SQL skills are basic, to say the least.
Let's say I have two similar data types in different tables of the same database.
The first table is called hardback and the fields are as follows:
hbID | hbTitle | hbPublisherID | hbPublishDate
The second table is called paperback and its fields hold similar data but the fields are named differently:
pbID | pbTitle | pbPublisherID | pbPublishDate
I need to retrieve the 10 most recent hardback and paperback books, where the publisher ID is 7.
This is what I have so far:
SELECT TOP 10
hbID, hbTitle, hbPublisherID, hbPublishDate AS pDate
bpID, pbTitle, bpPublisherID, pbPublishDate AS pDate
FROM hardback CROSS JOIN paperback
WHERE (hbPublisherID = 7) OR (pbPublisherID = 7)
ORDER BY pDate DESC
This returns seven columns per row, at least three of which may or may not be for the wrong publisher. Possibly four, depending on the contents of pDate, which is almost certainly going to be a problem if the other six columns are for the correct publisher!
In an effort to release an earlier version of this software, I ran two separate queries fetching 10 records each, then sorted them by date and discarded the bottom ten, but I just know there must be a more elegant way to do it!
Any suggestions?
Aside: I was reviewing what I'd written here, when my Mac suddenly experienced a kernel panic. Restarted, reopened my tabs and everything I'd typed was still here! Stack Exchange sites are awesome :)
The easiest way is probably a UNION:
SELECT TOP 10 * FROM
(SELECT hbID, hbTitle, hbPublisherID as PublisherID, hbPublishDate as pDate
FROM hardback
UNION
SELECT hpID, hpTitle, hpPublisher, hpPublishDate
FROM paperback
) books
WHERE PublisherID = 7
If you could have two copies of the same title (1 paperback, 1 hardcover), change the UNION to a UNION ALL; UNION alone discards duplicates. You could also add a column that indicates what book type it is by adding a pseudo-column to each select (after the publish date, for instance):
hbPublishDate as pDate, 'H' as Covertype
You'll have to add the same new column to the paperback half of the query, using 'P' instead. Note that on the second query you don't have to specify column names; the resultset takes the names from the first one. All column data types in the two queries have match, also - you can't UNION a date column in the first with a numeric column in the second without converting the two columns to the same datatype in the query.
Here's a sample script for creating two tables and doing the select above. It works just fine in SQL Server Management Studio.Just remember to drop the two tables (using DROP Table tablename) when you're done.
use tempdb;
create table Paperback (pbID Integer Identity,
pbTitle nvarchar(30), pbPublisherID Integer, pbPubDate Date);
create table Hardback (hbID Integer Identity,
hbTitle nvarchar(30), hbPublisherID Integer, hbPubDate Date);
insert into Paperback (pbTitle, pbPublisherID, pbPubDate)
values ('Test title 1', 1, GETDATE());
insert into Hardback (hbTitle, hbPublisherID, hbPubDate)
values ('Test title 1', 1, GETDATE());
select * from (
select pbID, pbTitle, pbPublisherID, pbPubDate, 'P' as Covertype
from Paperback
union all
select hbID, hbTitle, hbPublisherID, hbPubDate,'H'
from Hardback) books
order by CoverType;
/* You'd drop the two tables here with
DROP table Paperback;
DROP table HardBack;
*/
i think it is clearly better, if you make only one table with a reference to another one which holds information about the category of the entry like hardback or paperback. this is my first suggestion.
by the way, what is your programming language?

A Simple Sql Select Query

I know I am sounding dumb but I really need help on this.
I have a Table (let's say Meeting) which Contains a column Participants.
The Participants dataType is varchar(Max) and it stores Participant's Ids in comma separated form like 1,2.
Now my problem is I am passing a parameter called #ParticipantsID in my Stored Procedure and want to do something like this:
Select Participants from Meeting where Participants in (#ParticipantsID)
Unfortunately I am missing something crucial here.
Can some one point that out?
I've been there before... I changed the DB design to have one record contain a single reference to the other table. If you can't change your DB structures and you have to live with this, I found this solution on CodeProject.
New Function
IF EXISTS(SELECT * FROM sysobjects WHERE ID = OBJECT_ID(’UF_CSVToTable’))
DROP FUNCTION UF_CSVToTable
GO
CREATE FUNCTION UF_CSVToTable
(
#psCSString VARCHAR(8000)
)
RETURNS #otTemp TABLE(sID VARCHAR(20))
AS
BEGIN
DECLARE #sTemp VARCHAR(10)
WHILE LEN(#psCSString) > 0
BEGIN
SET #sTemp = LEFT(#psCSString, ISNULL(NULLIF(CHARINDEX(',', #psCSString) - 1, -1),
LEN(#psCSString)))
SET #psCSString = SUBSTRING(#psCSString,ISNULL(NULLIF(CHARINDEX(',', #psCSString), 0),
LEN(#psCSString)) + 1, LEN(#psCSString))
INSERT INTO #otTemp VALUES (#sTemp)
END
RETURN
END
Go
New Sproc
SELECT *
FROM
TblJobs
WHERE
iCategoryID IN (SELECT * FROM UF_CSVToTable(#sCategoryID))
You would not typically organise your SQL database in quite this way. What you are describing are two entities (Meeting & Participant) that have a one-to-many relationship. i.e. a meeting can have zero or more participants. To model this in SQL you would use three tables: a meeting table, a participant table and a MeetingParticipant table. The MeetingParticipant table holds the links between meetings & participants. So, you might have something like this (excuse any sql syntax errors)
create table Meeting
(
MeetingID int,
Name varchar(50),
Location varchar(100)
)
create table Participant
(
ParticipantID int,
FirstName varchar(50),
LastName varchar(50)
)
create table MeetingParticipant
(
MeetingID int,
ParticipantID int
)
To populate these tables you would first create some Participants:
insert into Participant(ParticipantID, FirstName, LastName) values(1, 'Tom', 'Jones')
insert into Participant(ParticipantID, FirstName, LastName) values(2, 'Dick', 'Smith')
insert into Participant(ParticipantID, FirstName, LastName) values(3, 'Harry', 'Windsor')
and create a Meeting or two
insert into Meeting(MeetingID, Name, Location) values(10, 'SQL Training', 'Room 1')
insert into Meeting(MeetingID, Name, Location) values(11, 'SQL Training', 'Room 2')
and now add some participants to the meetings
insert into MeetingParticipant(MeetingID, ParticipantID) values(10, 1)
insert into MeetingParticipant(MeetingID, ParticipantID) values(10, 2)
insert into MeetingParticipant(MeetingID, ParticipantID) values(11, 2)
insert into MeetingParticipant(MeetingID, ParticipantID) values(11, 3)
Now you can select all the meetings and the participants for each meeting with
select m.MeetingID, p.ParticipantID, m.Location, p.FirstName, p.LastName
from Meeting m
join MeetingParticipant mp on m.MeetingID=mp.MeetingID
join Participant p on mp.ParticipantID=p.ParticipantID
the above should produce
MeetingID ParticipantID Location FirstName LastName
10 1 Room 1 Tom Jones
10 2 Room 1 Dick Smith
11 2 Room 2 Dick Smith
11 3 Room 2 Harry Windsor
If you want to find out all the meetings that "Dick Smith" is in you would write something like this
select m.MeetingID, m.Location
from Meeting m join MeetingParticipant mp on m.MeetingID=mp.ParticipantID
where
mp.ParticipantID=2
and get
MeetingID Location
10 Room 1
11 Room 2
I have omitted important things like indexes, primary keys and missing attributes such as meeting dates, but it is clearer without all the goo.
Your table is not normalized. If you want to query for individual participants, they should be split into their own table, along the lines of:
Meeting
MeetingId primary key
Other stuff
Persons
PersonId primary key
Other stuff
Participants
MeetingId foreign key Meeting(MeetingId)
PersonId foreign key Persons(PersonId)
primary key MeetingId,PersonId
Otherwise, you have to resort to all sorts of trickery (what I call SQL gymnastics) to find out what you want. That trickery never scales well - your queries become slow very quickly as the table grows.
With a properly normalized database, the queries can remain fast well into the multi-millions of records (I work with DB2/z where we are used to truly huge tables).
There are valid reasons for sometimes reverting to second normal form (or even first) for performance but that should be a very hard thought out decision (and based on actual performance data). All databases should initially start of in 3NF.
SELECT * FROM Meeting WHERE Participants LIKE '%,12,%' OR Participants LIKE '12,%' OR Participants LIKE '%,12'
where 12 is the ID you are looking for....
Ugly, what a nasty model.
If I understand your question correctly, you are trying to pass in a comma separated list of participant ids and see if it is in your list. This link lists several ways to do such a thing"
[http://vyaskn.tripod.com/passing_arrays_to_stored_procedures.htm][1]
codezy.blogspot.com
If you store the participant ids in a comma-separated list (as text) in the database, you cannot easily query it (as a list) using SQL. You would have to resort to string-operations.
You should consider changing your schema to use another table to map meetings to participants:
create table meeting_participants (
meeting_id integer not null , -- foreign key
participant_id integer not null
);
That table would have multiple rows per meeting (one for each participant).
You can then query that table for individual participants, or number of participants, and such.
If participants is a separate data type you should be storing it as a child table of your meeting table. e.g.
MEETING
PARTICIPANT 1
PARTICIPANT 2
PARTICIPANT 3
Each participant would hold the meeting ID so you can do a query
SELECT * FROM participants WHERE meeting_id = 1
However, if you must store a comma separated list (for some external reason) then you can do a string search to find the appropriate record. This would be a very inefficient way to do a query though.
That is not the best way to store the information you have.
If it is all you have got then you need to be doing a contains (not an IN). The best answer is to have another table that links Participants to Meetings.
Try SELECT Meeting, Participants FROM Meeting CONTAINS(Participants, #ParticipantId)