SQL query to find list of primary keys not used - sql

I am trying to make a drop down picker in an Access database to display all the primary keys not used, in this case a date that is limited to the first of the month.
I have 2 tables that are for this use
tblReport
pk date | Data for this record |
05/01/13 | stuff
06/01/13 | stuff
07/01/13 | stuff
08/01/13 | stuff
and
tblFutureDates
pk date | an index
05/01/13 | 1
06/01/13 | 2
07/01/13 | 3
08/01/13 | 4
09/01/13 | 5
10/01/13 | 6
11/01/13 | 7
12/01/13 | 8
I want a query that looks at these two tables and returns the dates that are in the second table that aren't in the first one. I have tried some joins but cannot figure it out. This is what I have thus far:
SELECT tblFutureDates.FutureDate
FROM tblFutureDates RIGHT JOIN tblReport
ON tblFutureDates.FutureDate = tblReport.ReportMonth;
and that returns:
05/01/13
06/01/13
07/01/13
08/01/13
Thanks

This selects the dates from tblFutureDates that are NOT IN tblReport:
SELECT tblFutureDates.FutureDate
FROM tblFutureDates
WHERE tblFutureDates.FutureDate
NOT IN (SELECT tblReport.ReportMonth FROM tblReport)
You can also use LEFT JOIN ... WHERE ... IS NULL, or NOT EXISTS. For more information about all three approaches, see this post.
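A minimal sketch of the three anti-join forms, run against SQLite through Python's sqlite3 module rather than Access (an assumption, made only so the example is self-contained). Table names and the FutureDate/ReportMonth columns follow the question; the Data and Idx column names and the ISO date strings are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblReport (ReportMonth TEXT PRIMARY KEY, Data TEXT);
    CREATE TABLE tblFutureDates (FutureDate TEXT PRIMARY KEY, Idx INTEGER);
""")

# First of each month, May to December 2013, stored as ISO strings (assumption).
months = [f"2013-{m:02d}-01" for m in range(5, 13)]
conn.executemany("INSERT INTO tblFutureDates VALUES (?, ?)",
                 [(d, i) for i, d in enumerate(months, start=1)])
# Only the first four months already have a report.
conn.executemany("INSERT INTO tblReport VALUES (?, 'stuff')",
                 [(d,) for d in months[:4]])

queries = {
    "not_in": """
        SELECT f.FutureDate FROM tblFutureDates AS f
        WHERE f.FutureDate NOT IN (SELECT r.ReportMonth FROM tblReport AS r)
        ORDER BY f.FutureDate""",
    "left_join": """
        SELECT f.FutureDate FROM tblFutureDates AS f
        LEFT JOIN tblReport AS r ON f.FutureDate = r.ReportMonth
        WHERE r.ReportMonth IS NULL
        ORDER BY f.FutureDate""",
    "not_exists": """
        SELECT f.FutureDate FROM tblFutureDates AS f
        WHERE NOT EXISTS (SELECT 1 FROM tblReport AS r
                          WHERE r.ReportMonth = f.FutureDate)
        ORDER BY f.FutureDate""",
}

# Run each variant and collect the unused dates.
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}
for name, dates in results.items():
    print(name, dates)
```

All three return the same rows here. In standard SQL, NOT IN returns nothing if the subquery yields a NULL, so the other two forms are the safer choice when the compared column is nullable.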

Find records which have multiple occurrences in another table array (postgres)

I have a table whose records contain an array, and another table whose records each hold a single string. I want to get the records that occur multiple times in the other table. These are the tables:
Vehicle
veh_id | vehicle_types
-------+---------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8","viper"}
7 | {"ferrariff","viper","viper","volt"}
vehicle_names
id | vehicle_name
-----+-----------------------
1 | byd_tang
2 | volt
3 | viper
4 | laferrari
5 | sonata
6 | jaguarxf
7 | swift
8 | teslax
9 | mirai
10 | ferrariff
11 | bmwi8
I have a query that gives the output I expect, but it is not optimal and may be expensive.
This is the query:
select veh_name
from vehicle_names dsb
where (select count(*) from vehicle dsd
where dsb.veh_name = ANY (dsd.veh_types)) > 1
The output should be:
byd_tang
volt
viper
One option would be an aggregation query:
SELECT
vn.id,
vn.veh_name
FROM vehicle_names vn
INNER JOIN vehicle v
ON vn.veh_name = ANY (v.veh_types)
GROUP BY
vn.id,
vn.veh_name
HAVING
COUNT(*) > 1;
This only counts a vehicle name which appears in two or more records in the other table. It would not pick up, for example, a single vehicle record with the same name appearing two or more times.

Recursive self join over file data

I know there are many questions about recursive self joins, but they're mostly in a hierarchical data structure as follows:
ID | Value | Parent id
-----------------------------
But I was wondering if there was a way to do this in a specific case that I have where I don't necessarily have a parent id. My data will look like this when I initially load the file.
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
Essentially, it's a CSV file where each row in the table is a line in the file. Lines 1 and 5 identify an object header and lines 3, 4, 7, and 8 identify the rows belonging to the object. The object header lines can have only 40 attributes, which is why the object is broken up across multiple sections in the CSV file.
What I'd like to do is take the table, separate out the record # column, and join it with itself multiple times so it achieves something like this:
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,5,6,7,8,...
2 | *,record,abc,efg,hij,lmn,opq,rst
3 | ,,1,x,y,z,t,u,v,...
4 | ,,2,q,r,s,l,m,n,...
I know it's probably possible, I'm just not sure where to start. My initial idea was to create a view that separates out the first and second columns, and use that view to join repeatedly on those two columns. However, I have some problems:
I don't know how many sections will occur in the file for the same object
The file can contain other objects as well, so joining on the first two columns would be problematic if you have something like
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
9 | ,4,Data,1,2,3,4,...
10 | *,record,lmn,opq,rst,...
11 | ,,1,t,u,v,...
In the above case, my plan could join rows from the Data object in row 9 with the first rows of the Formula object by matching the record value of 1.
UPDATE
I know this is somewhat confusing. I tried doing this with C# a while back, but I basically had to write a recursive descent parser for the specific file format, and it simply took too long because I had to get the data into the database afterwards and it was too much for Entity Framework. It was taking hours just to convert one file since these files are excessively large.
Either way, @Nolan Shang has the closest result to what I want. The only difference is this (sorry for the bad formatting):
+----+------------+--------------------------+----------------------------------+
| ID | header     | x                        | value                            |
+----+------------+--------------------------+----------------------------------+
| 1  | 3,Formula, | ,1,2,3,4,5,6,7,8         | 3,Formula,1,2,3,4,5,6,7,8        |
| 2  | ,,         | ,1,x,y,z,t,u,v           | ,1,x,y,z,t,u,v                   |
| 3  | ,,         | ,2,q,r,s,l,m,n           | ,2,q,r,s,l,m,n                   |
| 4  | *,record,  | ,abc,efg,hij,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst |
| 5  | ,4,        | ,Data,1,2,3,4            | ,4,Data,1,2,3,4                  |
| 6  | *,record,  | ,lmn,opq,rst             | ,lmn,opq,rst                     |
| 7  | ,,         | ,1,t,u,v                 | ,1,t,u,v                         |
+----+------------+--------------------------+----------------------------------+
I agree that it would be better to export this to a scripting language and do it there. This will be a lot of work in TSQL.
You've intimated that there are other possible scenarios you haven't shown, so I obviously can't give a comprehensive solution. I'm guessing this isn't something you need to do quickly on a repeated basis. More of a one-time transformation, so performance isn't an issue.
One approach would be to do a LEFT JOIN to a hard-coded table of the possible identifying sub-strings like:
3,Formula,
*,record,
,,1,
,,2,
,4,Data,
Looks like it pretty much has to be human-selected and hard-coded because I can't find a reliable pattern that can be used to SELECT only these sub-strings.
Then you SELECT from this artificially-created table (or derived table, or CTE) and LEFT JOIN to your actual table with a LIKE to get all the rows that use each of these values as their starting substring, strip out the starting characters to get the rest of the string, and use the STUFF..FOR XML trick to build the desired Line.
How you get the ID column depends on what you want. For instance, in your second example, I don't know what ID you want for the ,4,Data,... line. Do you want 5 because that's the next number in the results, or do you want 9 because that's the ID of the first occurrence of that sub-string? Code accordingly. If you want 5, it's a ROW_NUMBER(). If you want 9, you can add an ID column to the artificial table you created at the start of this approach.
BTW, there's really nothing recursive about what you need done, so if you're still thinking in those terms, now would be a good time to stop. This is more of a "Group Concatenation" problem.
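Since exporting to a scripting language was suggested above, here is a minimal Python sketch of the same idea: a hard-coded list of identifying prefixes, a starts-with match, and a group concatenation of whatever follows the prefix. The names PREFIXES and merge_sections are hypothetical, the data is the first example from the question, and the literal ",..." is treated as a trailing placeholder to strip (as the sample data suggests), which is an assumption.

```python
# Hard-coded, human-selected prefixes, as described in the answer.
PREFIXES = ["3,Formula,", "*,record,", ",,1,", ",,2,"]
ELLIPSIS = ",..."  # placeholder suffix in the sample data (assumption)

rows = [
    (1, "3,Formula,1,2,3,4,..."),
    (2, "*,record,abc,efg,hij,..."),
    (3, ",,1,x,y,z,..."),
    (4, ",,2,q,r,s,..."),
    (5, "3,Formula,5,6,7,8,..."),
    (6, "*,record,lmn,opq,rst,..."),
    (7, ",,1,t,u,v,..."),
    (8, ",,2,l,m,n,..."),
]


def merge_sections(rows, prefixes):
    """Concatenate the tails of all lines sharing a prefix, in ID order."""
    merged = []
    for prefix in prefixes:
        matches = [(row_id, line) for row_id, line in sorted(rows)
                   if line.startswith(prefix)]
        if not matches:
            continue
        tails = []
        for _, line in matches:
            tail = line[len(prefix):]
            if tail.endswith(ELLIPSIS):
                tail = tail[:-len(ELLIPSIS)]
            tails.append(tail)
        # Use the ID of the first occurrence of the prefix as the row ID.
        merged.append((matches[0][0], prefix + ",".join(tails)))
    return sorted(merged)


merged = merge_sections(rows, PREFIXES)
for row_id, line in merged:
    print(row_id, "|", line)
```

Like the SQL approach, this merges every line that shares a prefix, so a file with several objects would need the object identity added to the grouping key.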
Here is a sample, but it differs somewhat from what you need.
That is because I use everything up to the second comma as the group header, so the ,,1 and ,,2 rows are treated as the same group. It would be better if you could use a parent id to indicate the group.
-- Sample data held in a table variable
DECLARE @testdata TABLE(ID int,Line varchar(8000))
INSERT INTO @testdata
SELECT 1,'3,Formula,1,2,3,4,...' UNION ALL
SELECT 2,'*,record,abc,efg,hij,...' UNION ALL
SELECT 3,',,1,x,y,z,...' UNION ALL
SELECT 4,',,2,q,r,s,...' UNION ALL
SELECT 5,'3,Formula,5,6,7,8,...' UNION ALL
SELECT 6,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 7,',,1,t,u,v,...' UNION ALL
SELECT 8,',,2,l,m,n,...' UNION ALL
SELECT 9,',4,Data,1,2,3,4,...' UNION ALL
SELECT 10,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 11,',,1,t,u,v,...'
-- header = everything up to and including the second comma;
-- data = the rest of the line with the trailing ',...' removed
;WITH t AS(
SELECT *,REPLACE(SUBSTRING(t.Line,LEN(c.header)+1,LEN(t.Line)),',...','') AS data
FROM @testdata AS t
CROSS APPLY(VALUES(LEFT(t.Line,CHARINDEX(',',t.Line, CHARINDEX(',',t.Line)+1 )))) c(header)
)
-- Concatenate the data of all rows that share a header (STUFF removes the leading comma)
SELECT MIN(ID) AS ID,t.header,c.x,t.header+STUFF(c.x,1,1,'') AS value
FROM t
OUTER APPLY(SELECT ','+tb.data FROM t AS tb WHERE tb.header=t.header FOR XML PATH('') ) c(x)
GROUP BY t.header,c.x
+----+------------+------------------------------------------+-----------------------------------------------+
| ID | header | x | value |
+----+------------+------------------------------------------+-----------------------------------------------+
| 1 | 3,Formula, | ,1,2,3,4,5,6,7,8 | 3,Formula,1,2,3,4,5,6,7,8 |
| 3 | ,, | ,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v | ,,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v |
| 2 | *,record, | ,abc,efg,hij,lmn,opq,rst,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst,lmn,opq,rst |
| 9 | ,4, | ,Data,1,2,3,4 | ,4,Data,1,2,3,4 |
+----+------------+------------------------------------------+-----------------------------------------------+

Join two tables - One common column with different values

I have been searching around for how to do this for days - unfortunately I don't have much experience with SQL Queries, so it's been a bit of trial and error.
Basically, I have created two tables - both with one DateTime column and a different column with values in.
The DateTime column has different values in each table.
So...
ACOQ1 (Table 1)
===============
| DateTime | ACOQ1_Pump_Running |
|----------+--------------------|
| 7:14:12 | 1 |
| 8:09:03 | 1 |
ACOQ2 (Table 2)
===============
| DateTime | ACOQ2_Pump_Running |
|----------+--------------------|
| 3:54:20 | 1 |
| 7:32:57 | 1 |
I want to combine these two tables to look like this:
| DateTime | ACOQ1_Pump_Running | ACOQ2_Pump_Running |
|----------+--------------------+--------------------|
| 3:54:20 | 0 OR NULL | 1 |
| 7:14:12 | 1 | 0 OR NULL |
| 7:32:57 | 0 OR NULL | 1 |
| 8:09:03 | 1 | 0 OR NULL |
I have achieved this by creating a third table that 'UNION's the DateTime column from both tables and then uses that third table's DateTime column for the new table but was wondering if there was a way to skip this step out.
(Eventually I will be adding more and more columns on from different tables and don't really want to be adding yet more processing time by creating a joint DateTime table that may not be necessary).
My working code at the moment:
CREATE TABLE JointDateTime
(
DateTime CHAR(50)
CONSTRAINT [pk_Key3] PRIMARY KEY (DateTime)
);
INSERT INTO JointDateTime (DateTime)
SELECT ACOQ1.DateTime FROM ACOQ1
UNION
SELECT ACOQ2.DateTime FROM ACOQ2
SELECT JointDateTime.DateTime, ACOQ1.ACOQ1_NO_1_PUMP_RUNNING, ACOQ2.ACOQ2_NO_1_PUMP_RUNNING
FROM (SELECT ACOQ1.DateTime FROM ACOQ1
UNION
SELECT ACOQ2.DateTime FROM ACOQ2) JointDateTime
LEFT OUTER JOIN ACOQ1
ON JointDateTime.DateTime = ACOQ1.DateTime
LEFT OUTER JOIN ACOQ2
ON JointDateTime.DateTime = ACOQ2.DateTime
You need a plain old FULL OUTER JOIN like this.
SELECT COALESCE(A1.DateTime,A2.DateTime) DateTime,ACOQ1_Pump_Running, ACOQ2_Pump_Running
FROM ACOQ1 A1
FULL OUTER JOIN ACOQ2 A2
ON A1.DateTime = A2.DateTime
This will give you NULL for ACOQ1_Pump_Running, ACOQ2_Pump_Running for rows which do not match the date in the corresponding table. If you need 0 just use COALESCE or ISNULL.
Side note: in your script, I can see you are using DateTime CHAR(50). Please use an appropriate type.
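A minimal sketch of the "0 instead of NULL" variant, run on SQLite through Python's sqlite3 module (an assumption; the question's database is not SQLite). FULL OUTER JOIN is only available in newer SQLite releases, so this sketch swaps in an equivalent emulation, two LEFT JOINs combined with UNION ALL, rather than the FULL OUTER JOIN shown above. The zero-padded time strings are also an assumption, made so that text ordering matches time ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACOQ1 (DateTime TEXT PRIMARY KEY, ACOQ1_Pump_Running INTEGER);
    CREATE TABLE ACOQ2 (DateTime TEXT PRIMARY KEY, ACOQ2_Pump_Running INTEGER);
    INSERT INTO ACOQ1 VALUES ('07:14:12', 1), ('08:09:03', 1);
    INSERT INTO ACOQ2 VALUES ('03:54:20', 1), ('07:32:57', 1);
""")

# Emulated full outer join:
#   part 1 = every ACOQ1 row, with ACOQ2 values where the timestamp matches;
#   part 2 = the ACOQ2 rows that have no partner in ACOQ1.
# COALESCE / a literal 0 replaces the NULLs of the missing side.
sql = """
    SELECT A1.DateTime AS DateTime,
           A1.ACOQ1_Pump_Running,
           COALESCE(A2.ACOQ2_Pump_Running, 0) AS ACOQ2_Pump_Running
    FROM ACOQ1 AS A1
    LEFT JOIN ACOQ2 AS A2 ON A1.DateTime = A2.DateTime
    UNION ALL
    SELECT A2.DateTime, 0, A2.ACOQ2_Pump_Running
    FROM ACOQ2 AS A2
    LEFT JOIN ACOQ1 AS A1 ON A1.DateTime = A2.DateTime
    WHERE A1.DateTime IS NULL
    ORDER BY 1
"""
combined = conn.execute(sql).fetchall()
for row in combined:
    print(row)
```

On an engine that supports FULL OUTER JOIN directly, the same result comes from wrapping both pump columns of the original query in COALESCE(..., 0).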

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored on a server running PostgreSQL 9.3.0. However, I find myself stuck on the last query. The database models a reservation system for an opera house. The query should associate each spectator with the other spectators who attend the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez@gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille@gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of each spectator and the date of the show they are attending. The query looks like this:
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand the problem better and give me a hint towards a solution? Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the STUFF function. The only drawback here is that, since your ids are ints, placing a comma between values requires a workaround (the ids need to be cast to strings). Below is the method I can think of using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those:
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
@> (used in query 1) is the "contains" operator for arrays, so we get all spectators that have seen at least the same shows.
Query 2 is faster than query 1 because only relevant shows are considered.
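The counting idea behind query 2 can be tried on a small data set. This is a minimal sketch using SQLite through Python's sqlite3 module instead of PostgreSQL (an assumption, so array_agg is replaced by collecting the ids in Python). The reservation rows are made up for illustration, and (id_spectator, id_show) is unique, as assumed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservations (
        id_spectator INTEGER NOT NULL,
        id_show      INTEGER NOT NULL,
        UNIQUE (id_spectator, id_show)
    )
""")
# Hypothetical data: spectator 1 saw shows 1 and 11.
conn.executemany("INSERT INTO reservations VALUES (?, ?)", [
    (1, 1), (1, 11),
    (2, 1), (2, 11), (2, 5),   # saw both, plus one more
    (3, 1), (3, 11),           # saw exactly the same shows
    (4, 1),                    # missed show 11
    (5, 11), (5, 5),           # missed show 1
])

# Keep a spectator only if the number of shows shared with the prime
# spectator equals the number of shows the prime spectator has seen.
sql = """
    WITH shows AS (
        SELECT id_show FROM reservations WHERE id_spectator = :prime
    )
    SELECT r.id_spectator
    FROM shows AS s
    JOIN reservations AS r ON r.id_show = s.id_show
    WHERE r.id_spectator <> :prime
    GROUP BY r.id_spectator
    HAVING COUNT(*) = (SELECT COUNT(*) FROM shows)
    ORDER BY r.id_spectator
"""
others = [row[0] for row in conn.execute(sql, {"prime": 1})]
print(1, others)
```

The comparison of counts is only valid because each (id_spectator, id_show) pair appears once. With duplicate reservations, COUNT(DISTINCT r.id_show) would be needed instead.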
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got looks like this:
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

SQL payments matrix

I want to combine two tables into one:
The first table: Payments
id | 2010_01 | 2010_02 | 2010_03
1 | 3.000 | 500 | 0
2 | 1.000 | 800 | 0
3 | 200 | 2.000 | 300
4 | 700 | 1.000 | 100
The second table is ID and some date (different for every ID)
id | date |
1 | 2010-02-28 |
2 | 2010-03-01 |
3 | 2010-01-31 |
4 | 2011-02-11 |
What I'm trying to achieve is to create table which contains all payments before the date in ID table to create something like this:
id | date | T_00 | T_01 | T_02
1 | 2010-02-28 | 500 | 3.000 |
2 | 2010-03-01 | 0 | 800 | 1.000
3 | 2010-01-31 | 200 | |
4 | 2010-02-11 | 1.000 | 700 |
Where T_00 means payment in the same month as 'date' value, T_01 payment in previous month and so on.
Is there a way to do this?
EDIT:
I'm trying to achieve this in MS Access.
The problem is that I cannot connect name of the first table's column with the date in the second (the easiest way would be to treat it as variable)
I added T_00 to T_24 columns in the second (ID) table and was trying to UPDATE those fields
set T_00 =
iif(year(date)&"_"&month(date)=2010_10,
but I realized that would be too much code for Access to handle if I wanted to do this for every payment period and every T_xx column.
Even if I wrote the code for T_00, I would have to repeat it for the next 23 periods.
Your Payments table is de-normalized. Those date columns are repeating groups, meaning you've violated First Normal Form (1NF). It's especially difficult because your field names are actually data. As you've found, repeating groups are a complete pain in the ass when you want to relate the table to something else. This is why 1NF is so important, but knowing that doesn't solve your problem.
You can normalize your data by creating a view that UNIONs your Payments table.
Like so:
CREATE VIEW NormalizedPayments (id, Year, Month, Amount) AS
SELECT id,
2010 AS Year,
1 AS Month,
[2010_01] AS Amount
FROM Payments
UNION ALL
SELECT id,
2010 AS Year,
2 AS Month,
[2010_02] AS Amount
FROM Payments
UNION ALL
SELECT id,
2010 AS Year,
3 AS Month,
[2010_03] AS Amount
FROM Payments
And so on if you have more. This is how the Payments table should have been designed in the first place. Note that the column names are bracketed because they start with a digit.
It may be easier to use a date field with the value '2010-01-01' instead of a Year and Month field. It depends on your data. You may also want to add WHERE Amount IS NOT NULL to each query in the UNION, or you might want to use Nz([2010_01], 0) AS Amount. Again, it depends on your data and other queries.
It's hard for me to understand how you're joining from here, particularly how the id fields relate because I don't see how they do with the small amount of data provided, so I'll provide some general ideas for what to do next.
Next you can join your second table with this normalized Payments table using a method similar to this or a method similar to this. To actually produce the result you want, include a calculated field in this view with the difference in months. Then, create an actual Pivot Table to format your results (like this or like this) which is the proper way to display data like your tables do.
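The whole chain (normalize, compute the month difference, pivot) can be sketched end to end. This is a minimal illustration on SQLite through Python's sqlite3 module, not Access (an assumption), with conditional aggregation standing in for the pivot step. The table name IdDates and the columns ref_date, yr, mon and months_back are hypothetical, amounts such as 3.000 are read as 3000, and only ids 1 to 3 from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payments (id INTEGER PRIMARY KEY,
                           "2010_01" INTEGER, "2010_02" INTEGER, "2010_03" INTEGER);
    CREATE TABLE IdDates (id INTEGER PRIMARY KEY, ref_date TEXT);

    INSERT INTO Payments VALUES (1, 3000, 500, 0), (2, 1000, 800, 0), (3, 200, 2000, 300);
    INSERT INTO IdDates VALUES (1, '2010-02-28'), (2, '2010-03-01'), (3, '2010-01-31');

    -- One row per (id, month) instead of one column per month.
    CREATE VIEW NormalizedPayments AS
    SELECT id, 2010 AS yr, 1 AS mon, "2010_01" AS amount FROM Payments
    UNION ALL
    SELECT id, 2010, 2, "2010_02" FROM Payments
    UNION ALL
    SELECT id, 2010, 3, "2010_03" FROM Payments;
""")

# months_back = how many months the payment lies before the reference date;
# payments after the reference month (negative values) are dropped.
sql = """
    SELECT id, ref_date,
           MAX(CASE WHEN months_back = 0 THEN amount END) AS T_00,
           MAX(CASE WHEN months_back = 1 THEN amount END) AS T_01,
           MAX(CASE WHEN months_back = 2 THEN amount END) AS T_02
    FROM (
        SELECT d.id, d.ref_date, p.amount,
               (CAST(strftime('%Y', d.ref_date) AS INTEGER) - p.yr) * 12
               + CAST(strftime('%m', d.ref_date) AS INTEGER) - p.mon AS months_back
        FROM IdDates AS d
        JOIN NormalizedPayments AS p ON p.id = d.id
    )
    WHERE months_back >= 0
    GROUP BY id, ref_date
    ORDER BY id
"""
matrix = conn.execute(sql).fetchall()
for row in matrix:
    print(row)
```

Each additional T_xx column is one more CASE line, and each additional payment period is one more SELECT in the view, so the per-column IIf logic from the question is no longer needed.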