Randomly Populating Foreign Key In Sample Data Set - sql

I'm generating test data for a new database, and I'm having trouble populating one of the foreign key fields. I need to create a relatively large number (1000) of entries in a table (SurveyResponses) that has a foreign key to a table with only 6 entries (Surveys).
The database already has a Schools table that has a few thousand records. For argument's sake, let's say it looks like this:
Schools
+----+-------------+
| Id | School Name |
+----+-------------+
| 1 | PS 1 |
| 2 | PS 2 |
| 3 | PS 3 |
| 4 | PS 4 |
| 5 | PS 5 |
+----+-------------+
I'm creating a new Survey table. It will only have about 3 rows.
Survey
+----+-------------+
| Id | Col2 |
+----+-------------+
| 1 | 2014 Survey |
| 2 | 2015 Survey |
| 3 | 2016 Survey |
+----+-------------+
SurveyResponses simply ties a school to a survey.
Survey Responses
+----+----------+----------+
| Id | SchoolId | SurveyId |
+----+----------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 1 |
| 4 | 4 | 3 |
| 5 | 5 | 2 |
+----+----------+----------+
Populating the SurveyId field is what's giving me the most trouble. I can randomly select 1000 Schools, but I haven't figured out a way to generate 1000 random SurveyIds. I've been trying to avoid a while loop, but maybe that's the only option?
I've been using Red Gate SQL Data Generator to generate some of my test data, but in this case I'd really like to understand how this can be done with raw SQL.

Here is one way, using a correlated subquery to get a random survey associated with each school:
select s.schoolid,
(select top 1 surveyid
from surveys
order by newid()
) as surveyid
from schools s;
Note: This doesn't seem to work. Here is a SQL Fiddle showing the non-workingness. I am quite surprised it doesn't work, because newid() should be a
EDIT:
If you know the survey ids have no gaps and start with 1, you can do:
select 1 + abs(checksum(newid()) % 3) as surveyid
I did check that this does work.
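To try the arithmetic approach end to end without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in, not T-SQL: SQLite's random() replaces checksum(newid()), and the table layout (Id, SchoolName, Name) and the 1000 generated schools are assumptions for illustration. Like the query above, it only works if the survey ids are 1, 2, 3 with no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Schools (Id INTEGER PRIMARY KEY, SchoolName TEXT);
    CREATE TABLE Surveys (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SurveyResponses (
        Id INTEGER PRIMARY KEY,
        SchoolId INTEGER REFERENCES Schools(Id),
        SurveyId INTEGER REFERENCES Surveys(Id)
    );
""")

# Hypothetical sample data: 1000 schools and the 3 surveys from the question.
conn.executemany(
    "INSERT INTO Schools (Id, SchoolName) VALUES (?, ?)",
    [(i, f"PS {i}") for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO Surveys (Id, Name) VALUES (?, ?)",
    [(1, "2014 Survey"), (2, "2015 Survey"), (3, "2016 Survey")],
)

# random() is evaluated once per row, so each school gets its own survey id.
# random() % 3 lies in -2..2, and abs() folds that into 0..2.
conn.execute("""
    INSERT INTO SurveyResponses (SchoolId, SurveyId)
    SELECT Id, 1 + abs(random() % 3) FROM Schools
""")

survey_ids = [row[0] for row in conn.execute("SELECT SurveyId FROM SurveyResponses")]
print(len(survey_ids), sorted(set(survey_ids)))
```

If the survey ids had gaps, this arithmetic would generate values that violate the foreign key, which is why the correlated-subquery version is the more general one.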
EDIT II:
This appears to be overly aggressive optimization (in my opinion). Correlating the query appears to fix the problem. So, something like this should work:
select s.schoolid,
(select top 1 surveyid
from surveys s2
where s2.surveyid = s.schoolid or s2.surveyid <> s.schoolid -- nonsensical condition to prevent over optimization
order by newid()
) as surveyid
from schools s;
Here is a SQL Fiddle demonstrating this.

Related

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY s.DateActiveFrom DESC
) s;
You might want the score matched at the record level instead; if so, change the WHERE clause inside the APPLY.
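The "latest row on or before the logged date" idea can be checked in isolation. The sketch below uses Python's sqlite3 module, which has no OUTER APPLY, so it swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1. That is a different construct from the T-SQL above, but it uses the same lookup rule. The ISO-formatted dates are assumptions chosen to mirror the example (the dates in the question's table are not all valid calendar dates), and only the columns needed for the lookup are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductionTable (RecordID INTEGER, [Date] TEXT, TaskID INTEGER);
    CREATE TABLE StandingDataTable (
        RecordID INTEGER, TaskID INTEGER, DateActiveFrom TEXT, Score REAL
    );
    -- Hypothetical dates in ISO format so that text comparison orders them correctly.
    INSERT INTO ProductionTable VALUES
        (1, '2020-02-27', 1), (2, '2020-02-27', 1),
        (3, '2020-02-28', 1), (4, '2020-02-29', 1);
    INSERT INTO StandingDataTable VALUES
        (1, 1, '2020-02-01', 1.5), (2, 1, '2020-02-28', 2.0);
""")

# For each production row, pick the score whose DateActiveFrom is the latest
# one that is not after the production date.
rows = conn.execute("""
    SELECT p.RecordID,
           (SELECT s.Score
              FROM StandingDataTable AS s
             WHERE s.TaskID = p.TaskID
               AND s.DateActiveFrom <= p.[Date]
             ORDER BY s.DateActiveFrom DESC
             LIMIT 1) AS Score
    FROM ProductionTable AS p
    ORDER BY p.RecordID
""").fetchall()
print(rows)
```

Each production row comes back exactly once, with 1.5 before the change date and 2.0 from the change date onward.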

Why is this Query not Updateable?

I was looking to provide an answer to this question in which the OP has two tables:
Table1
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | |
| 2 | |
| 3 | |
+--------+--------+
Table2
+----+--------+--------+--------+
| ID | testID | stepID | status |
+----+--------+--------+--------+
| 1 | 1 | 1 | pass |
| 2 | 1 | 2 | fail |
| 3 | 1 | 3 | pass |
| 4 | 2 | 1 | pass |
| 5 | 2 | 2 | pass |
| 6 | 3 | 1 | fail |
+----+--------+--------+--------+
Here, the OP is looking to update the status field for each testID in Table1 with pass if all stepID records associated with that testID in Table2 have a status of pass; otherwise, Table1 should be updated with fail for that testID.
In this example, the result should be:
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | fail |
| 2 | pass |
| 3 | fail |
+--------+--------+
I wrote the following SQL code in an effort to accomplish this:
update Table1 a inner join
(
select
b.testID,
iif(min(b.status)=max(b.status) and min(b.status)='pass','pass','fail') as v
from Table2 b
group by b.testID
) c on a.testID = c.testID
set a.testStatus = c.v
However, MS Access reports the all-too-familiar, 'operation must use an updateable query' response.
I know that a query is not updateable if there is a one-to-many relationship between the record being updated and the set of values, but in this case, the aggregated subquery would yield a one-to-one relationship between the two testID fields.
Which left me asking, why is this query not updateable?
You're joining in a query with an aggregate (Max).
Aggregates are not updateable. In Access, in an update query, every part of the query has to be updateable (with the exception of simple expressions, and subqueries in WHERE part of your query), which means your query is not updateable.
You can work around this by using domain aggregates (DMin and DMax) instead of real ones, but this query will take a large performance hit if you do.
You can also work around it by moving your aggregates into an EXISTS or NOT EXISTS clause, since that's part of the WHERE clause and thus doesn't need to be updateable. That would likely have only a minimal effect on performance, but it means you have to split this query in two: one query to set all the fields that meet your condition to "pass", and another to set them to "fail" if they don't.
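The two-query split can be sketched like this. It runs against SQLite through Python's sqlite3 module rather than Access, so the exact syntax and the updateability rules are not the same as in Access; treat it as an illustration of the EXISTS / NOT EXISTS logic only. The column is called Status here, as in the table shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (testID INTEGER PRIMARY KEY, Status TEXT);
    CREATE TABLE Table2 (
        ID INTEGER PRIMARY KEY, testID INTEGER, stepID INTEGER, status TEXT
    );
    INSERT INTO Table1 (testID) VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES
        (1, 1, 1, 'pass'), (2, 1, 2, 'fail'), (3, 1, 3, 'pass'),
        (4, 2, 1, 'pass'), (5, 2, 2, 'pass'), (6, 3, 1, 'fail');

    -- Query 1: any step that is not 'pass' makes the whole test 'fail'.
    UPDATE Table1
       SET Status = 'fail'
     WHERE EXISTS (SELECT 1 FROM Table2 b
                    WHERE b.testID = Table1.testID AND b.status <> 'pass');

    -- Query 2: no non-passing step means the test is 'pass'.
    UPDATE Table1
       SET Status = 'pass'
     WHERE NOT EXISTS (SELECT 1 FROM Table2 b
                        WHERE b.testID = Table1.testID AND b.status <> 'pass');
""")

result = conn.execute("SELECT testID, Status FROM Table1 ORDER BY testID").fetchall()
print(result)
```

Note that the second update would also mark a test with no steps at all as "pass"; add an extra EXISTS check if that case can occur.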

SQL / Oracle to Tableau - How to combine to sort based on two fields?

I have tables below as follows:
tbl_tasks
+---------+-------------+
| Task_ID | Assigned_ID |
+---------+-------------+
| 1 | 8 |
| 2 | 12 |
| 3 | 31 |
+---------+-------------+
tbl_resources
+---------+-----------+
| Task_ID | Source_ID |
+---------+-----------+
| 1 | 4 |
| 1 | 10 |
| 2 | 42 |
| 4 | 8 |
+---------+-----------+
A task is assigned to at least one person (denoted by the "assigned_ID") and then any number of people can be assigned as a source (denoted by "source_ID"). The ID numbers are all linked to names in another table. Though the ID numbers are named differently, they all return to the same table.
Would there be any way for me to combine the two tables based on ID such that I could search based on someone's ID number? For example, if I filter with WHERE User_ID = 8 to see which tasks person 8 is involved in, I would get back Task 1 and Task 4.
Right now, by joining all the tables together, I can easily filter on "Assigned" but not "Source" due to all the multiple entries in the table.
Use union all:
select distinct task_id
from ((select task_id, assigned_id as id
from tbl_tasks
) union all
(select task_id, source_id
from tbl_resources
)
) ti
where id = ?;
Note that this uses select distinct in case someone is assigned to the same task in both tables. If not, remove the distinct.
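To see the query against the sample tables, here is a small sketch using Python's sqlite3 module (a stand-in for Oracle; the in-memory setup and the ORDER BY added for a stable result are assumptions, the rest follows the query above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_tasks (Task_ID INTEGER, Assigned_ID INTEGER);
    CREATE TABLE tbl_resources (Task_ID INTEGER, Source_ID INTEGER);
    INSERT INTO tbl_tasks VALUES (1, 8), (2, 12), (3, 31);
    INSERT INTO tbl_resources VALUES (1, 4), (1, 10), (2, 42), (4, 8);
""")

# Stack both id columns into one (task_id, id) list, then filter on id.
query = """
    SELECT DISTINCT task_id
    FROM (SELECT task_id, assigned_id AS id FROM tbl_tasks
          UNION ALL
          SELECT task_id, source_id FROM tbl_resources) ti
    WHERE id = ?
    ORDER BY task_id
"""
task_ids = [row[0] for row in conn.execute(query, (8,))]
print(task_ids)
```

Person 8 is the assignee of task 1 and a source on task 4, so both tasks come back from a single filter.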

SQL - Selecting all latest unique records

I'm struggling a bit at creating an SQL query to select some records from an Access Database (using Excel VBA).
A cut of one of the tables (let's call it 'table1') has the following columns:
| my_id | your_id | phase |
| 1 | 1 | Open |
| 2 | 1 | Close |
| 3 | 2 | Open |
| 4 | 3 | Close |
| 5 | 2 | Close |
| 6 | 3 | Open |
The field 'my_id' will always be a unique value whereas the 'your_id' field may contain duplicates.
What I would like to do is select everything from the table for the most recent record of the 'your_id' where the phase is 'Close'. So that means in the above example table it would select 5, 4 & 2.
Hope this makes sense, sorry if not - I'm struggling to articulate what I mean!
Thanks
With your example data, simply adding a WHERE condition of phase='Close' would already return records 5, 4 and 2. However, I am assuming there may be cases (not in your example) where more than one record for a given your_id has the phase Close, so the query should look like this:
Select * from table1 where my_id in (
Select Max(My_Id) from table1 where phase='Close' group by your_id)
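Here is a quick sketch of that query using Python's sqlite3 module instead of Access. The extra row (7, 1, 'Close') is hypothetical and is not in the question's data; it is there to show the case where one your_id has two Close records and only the latest should be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (my_id INTEGER PRIMARY KEY, your_id INTEGER, phase TEXT);
    INSERT INTO table1 VALUES
        (1, 1, 'Open'), (2, 1, 'Close'), (3, 2, 'Open'),
        (4, 3, 'Close'), (5, 2, 'Close'), (6, 3, 'Open'),
        (7, 1, 'Close');  -- hypothetical second Close for your_id = 1
""")

# The inner query finds the highest my_id among Close rows for each your_id;
# the outer query returns the full rows for those ids.
latest_closed = conn.execute("""
    SELECT * FROM table1
    WHERE my_id IN (SELECT MAX(my_id) FROM table1
                    WHERE phase = 'Close'
                    GROUP BY your_id)
    ORDER BY my_id
""").fetchall()
print(latest_closed)
```

With the extra row, record 2 drops out in favour of record 7, which a plain phase='Close' filter would not do.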

MySQL Advanced SELECT help

Alright well I recently got into normalizing my database for this little side project that I have been creating for a while now, but I've just hit a brick wall. I'll try to give an understandable example of what I have and what I need to accomplish ― and hopefully it won't be too painful. OK.
I have 3 tables the first one we will call Shows, structured something like this:
+----+--------------------------+
| id | title |
+----+--------------------------+
| 1 | Example #1 |
| 2 | Example #2 |
| 3 | Example #3 |
+----+--------------------------+
Plain and simple.
My next table is called Categories, and looks like this:
+----+--------------------------+
| id | category |
+----+--------------------------+
| 1 | Comedy |
| 2 | Drama |
| 3 | Action |
+----+--------------------------+
And a final table called Show_categories:
+---------+---------+
| show_id | cat_id |
+---------+---------+
| 1 | 1 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
+---------+---------+
As you may have noticed, the problem is that in my database a single show can have multiple categories. Everything is structured fine, except for the fact that I can't find a way to search for shows with multiple categories.
If I were to search for action and comedy type shows I would be given Example #1, but it is not possible (at least with my queries), because the cat_id's inside the Show_categories are in different rows.
Example of a working single category search (Selecting all comedy shows):
SELECT s.id,s.title
FROM Shows s JOIN Show_categories sc ON sc.show_id=s.id
WHERE sc.cat_id=1 GROUP BY s.id
And a query that is impossible (because cat_id can't equal 2 different things):
SELECT s.id,s.title
FROM Shows s JOIN Show_categories sc ON sc.show_id=s.id
WHERE sc.cat_id=1 AND sc.cat_id=2 GROUP BY s.id
So to sum things up what I am asking is how do I handle a query where I am looking for a show based on multiple matching categories.
Use:
SELECT s.id,
s.title
FROM SHOWS s
JOIN SHOW_CATEGORIES sc ON sc.show_id = s.id
WHERE sc.cat_id IN (1, 2)
GROUP BY s.id, s.title
HAVING COUNT(DISTINCT sc.cat_id) = 2
The COUNT(DISTINCT sc.cat_id) comparison needs to equal the number of cat_id values listed in the IN clause. If (show_id, cat_id) is the primary key of SHOW_CATEGORIES, or there is a unique constraint across both columns, then you can use COUNT(sc.cat_id) instead.
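As a rough check of the IN plus HAVING COUNT(DISTINCT ...) pattern, here is a sketch using Python's sqlite3 module in place of MySQL. It joins on show_id, the column name shown in the Show_categories table, and the wanted list is a hypothetical parameter holding the category ids to match (Comedy and Action here, as in the question's example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Show_categories (show_id INTEGER, cat_id INTEGER);
    INSERT INTO Shows VALUES (1, 'Example #1'), (2, 'Example #2'), (3, 'Example #3');
    INSERT INTO Show_categories VALUES
        (1, 1), (1, 3), (2, 2), (2, 3), (3, 1), (3, 2);
""")

wanted = [1, 3]  # hypothetical input: Comedy and Action
placeholders = ", ".join("?" for _ in wanted)

# Keep only rows in the wanted categories, then require that each show
# matched as many distinct categories as were asked for.
query = f"""
    SELECT s.id, s.title
    FROM Shows s
    JOIN Show_categories sc ON sc.show_id = s.id
    WHERE sc.cat_id IN ({placeholders})
    GROUP BY s.id, s.title
    HAVING COUNT(DISTINCT sc.cat_id) = ?
"""
matches = conn.execute(query, (*wanted, len(set(wanted)))).fetchall()
print(matches)
```

Passing the count as a parameter keeps the HAVING comparison in step with the IN list when the number of categories changes.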
You need an OR statement.
SELECT s.id,s.title
FROM Shows s JOIN Show_categories sc ON sc.show_id=s.id
WHERE sc.cat_id=1 OR sc.cat_id=2 GROUP BY s.id
That is, you want all shows with either cat_id 1 OR cat_id 2. So with the sample data, this query will return shows 1, 2 and 3.