N to N relationship using three source tables - sql

I'm a student and I'm making a project, but I have a problem, that I cannot resolve. I've got one table with rows imported from csv file, one table with facts (GunViolences) and one with dimension (Categories - dead and injured). In the original table (imported from csv file) I've got table description, and there are long strings, which have substrings dead or injured. And now I have to connect in some way table of facts with a table of dimensions, depending on category of violence. I've created another table called ViolenceCategories to connect these tables by IDs, but I don't know how can I fill this table.
Structure:
Table FromCSV
id, date, description, address
1,12-01-2002, Shot|shotgun, address1
2,19-04-2003, injured, address2
3, 21-10-2004, shot|injured, address3
Table GunViolence
id, date, address
1, 12-01-2002, address1
2,19-04-2003, address2
3, 21-10-2004, address3
Table DimCategories
id, category
1, shot
2, injured
Table ViolenceCategories
idFact, idDim
1,1
2,2
3,2
3,1
How can I fill table VIolenceCategories?
EDIT
I've created another table to separate values of column with description
Table DimDescription
id, desc1, desc2
1, Shot, shotgun
2, injured, null
3, shot, injured

In SQL Server 2016 and newer you can use table operator STRING_SPLIT() to achieve this. That function will split a delimited string in a record into new rows for each substring. Consider:
SELECT *
FROM FromCSV
CROSS APPLY STRING_SPLIT(description, '|') AS descriptions
+----+------------+---------------------+----------+-------------+
| id | date | description | address | value |
+----+------------+---------------------+----------+-------------+
| 1 | 12-01-2002 | Shot|shotgun | address1 | Shot |
| 1 | 12-01-2002 | Shot|shotgun | address1 | shotgun |
| 2 | 19-04-2003 | three shots|injured | address2 | three shots |
| 2 | 19-04-2003 | three shots|injured | address2 | injured |
| 3 | 21-10-2004 | shot|injured | address3 | shot |
| 3 | 21-10-2004 | shot|injured | address3 | injured |
+----+------------+---------------------+----------+-------------+
SQLFiddle here
Turning this into an Insert statement to fill your ViolenceCategories table would look like:
INSERT INTO ViolenceCategories
SELECT t1.id, t2.id
FROM
(
SELECT id, value
FROM FromCSV
CROSS APPLY STRING_SPLIT(description, '|') AS descriptions
) t1
INNER JOIN DimCategories t2 ON t1.value = t2.category
SQLFiddle here

Related

How to generate a sequence of ID's based on mapping tables and values from the forms in MS-Access (Sql)?

I want to generate ID's based on the form values in MS-Access. And then for each ID generated, create a group of ID's by adding another 4 digits in the end based on a Mapping Table, representing different octets for different time points (12 ID's based on the Initial ID and the mapping Table).
For example, if the ID generated based on form values is 123456, I want to add another four digits and create a group of ID's, say from a mapping table. Like,
123456**1111**
123456**1112**
123456**1113**
and so on.
So far each primary ID, I am slapping on four digits at the end and generating a group of 12 ID's.
I am a beginner in Access and I have tried some code:
UPDATE Table1 SET GenID = UPDATE Table1 SET Table1.GenID = t1 (SELECT Map.V FROM MAP as t1)
However, I get a error that Access does not recognize Map as a valid Field or expression. I am able to break down the problem into this. But could not find a way further and design a query.
Sample Data: (The short_ID and Long_ID tables, uses the mapping tables below each of them as shown.)
Short ID Table:
----------------------------------------------------
ID | Subject_ID | Organ_Type | Category | Short_ID
-----------------------------------------------------
1 | 100 | Kidney | A | 100200300
-----------------------------------------------------
2 | 400 | Heart | B | 400500600
Mapping Tables for Short ID:
Map1 for Table1:
---------------------
Map_from | Map_to |
---------------------
Kidney | 200 |
Heart | 500 |
---------------------
Map2 for Table1:
-----------------------------
Map_cat_from | Map_cat_to |
-----------------------------
A | 300 |
B | 600 |
-----------------------------
Long ID Table:(shown here are just examples for 2 time points rather than 12)
---------------------------------------------------
Subject_ID | Short_ID | Long_ID Timepoint |
---------------------------------------------------
100 | 100200300 | 1002003000001 |
---------------------------------------------------
100 | 100200300 | 1002003000002 |
---------------------------------------------------
400 | 400500600 | 4005006000001 |
---------------------------------------------------
400 | 400500600 | 4005006000002
Timepoint Map for Long ID Table:
------------------------------
Timepoint | Value_to_append |
------------------------------
1 | 0001 |
------------------------------
2 | 0002 |
I need to generate these short and long ID's from the mapping tables directly when input is given in the form. (Category, Organ_Type, Subject_ID)
tldr
generate id from mapping table and form values (id creation)
add four digits at the end and create a group of 12 id's (long id creation) based on a mapping table (which has the 12 four digits that is to be appended in the end)
First, create a query, QShortID:
SELECT
Table1.ID,
Table1.Subject_ID,
Table1.Organ_Type,
Table1.Category,
[Subject_ID] & [Map_to] & [Map_cat_to] AS Short_ID
FROM
(Table1
INNER JOIN
Map1
ON Table1.Organ_Type = Map1.Map_from)
INNER JOIN
Map2
ON Table1.Category = Map2.Map_cat_from;
Output:
Next, create a query, Dozen, that will return 12 rows:
SELECT DISTINCT
Abs([id] Mod 12) AS N
FROM
MSysObjects;
Finally, create a Cartesian (multiplying) query, QLongID:
SELECT Table1.ID, Table1.Subject_ID, Table1.Organ_Type, Table1.Category, [Subject_ID] & [Map_to] & [Map_cat_to] AS Short_ID
FROM (Table1 INNER JOIN Map1 ON Table1.Organ_Type = Map1.Map_from) INNER JOIN Map2 ON Table1.Category = Map2.Map_cat_from;
SELECT
QShortID.Subject_ID,
QShortID.Short_ID,
[Short_ID] & Format([N] + 1, "0000") AS Long_ID
FROM
QShortID,
Dozen
ORDER BY
[Short_ID] & Format([N] + 1, "0000");
Output:
Edit:
To use the timepoint mapping, use:
SELECT
QShortID.Subject_ID,
QShortID.Short_ID,
[Short_ID] & [Value_to_append] AS Long_ID
FROM
QShortID,
TimepointMap
ORDER BY
[Short_ID] & [Value_to_append];
Output:

Get column value that has combination of other column values across many rows

I have to write a query that retrieves the list of "brand" that have a a combination of "name" and "id" equal to a specific value for both "name" and "id".
+--------+-------+-----+
| brand | name | id |
+--------+-------+-----+
| brand1 | david | 123 |
| brand1 | john | 456 |
| brand1 | adam | 789 |
| brand2 | david | 123 |
| brand2 | stacy | 999 |
+--------+-------+-----+
For example, I want all brands that have rows that match
name = "david", id = 123 and
name = "john", id = 456
In the table above, "brand1" has rows that have those two combinations of "name" and "id".
-- desired result
+--------+
| brand |
+--------+
| brand1 |
+--------+
"Brand2" has "name = david, id = 123" but not "name = john, id = 456" so I dont want it.
You can use aggregation and having:
select brand
from t
where (name = 'david' and id = 123) or
(name = 'john' and id = 456)
group by brand
having count(distinct name) = 2;
A possible solution. I created a pairs table with data like below:
david, 123
john, 456
Then the query becomes:
select t1.brand
from brand_table t1
join pairs t2 on t1.name = t2.name
and t1.id = t2.id
group by t1.brand
having count(distinct t1.name) = (select count(*) from pairs)
I was able to use Gordons answer and modified it for my use.
select brand
from t
where concat(name,id) in (:nameIdConcatStringsList)
group by brand
having count(brand) = :sizeOf_nameIdConcatStringsList;
Basically, I created concatenated a list of strings of the name and id combinations I wanted. Example: (david123, john456)
I then find rows that the two columns concatenated equal the input list. Then I group and make sure that brand has a count equal to the size of the input list.
Edit. I did not see the tag that this was supposed to be a sql question and assumed it to be related to pandas. My bad!! But instead of deleting it, I will let it be here, in case someone needs it to be implemented in pandas.
To select those with a name and id belonging to a list
df[(df['brand'].isin(name_list)) & (df['id'].isin(id_list))]['brand']
For a particular name and id:
df[(df['brand']==name)& (df['id']==id)]['brand']

SQL Server stored procedure inserting duplicate rows

I have a table with column GetDup and I'd like to the duplicate records based on the value of this column. For example, if value on is 1 in GetDup, then duplicate the record once. If value in the column is 2, then duplicate the record twice and so on and the statement has to be in looping statement.
What will be a good way to write a stored procedures for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
picture of data
Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you want to call it that), is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table are both things that should be in every database you have. They are extremely useful.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem, this won't duplicate rows, this will only duplicate the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup+1) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may need to do some math magic. You say above if GetDup is 1 then you want one duplicate. If that means that your output should be GetDup``GetDup, then you'll want to add one in the repeat(),replicate() or rpad() functions. ie replicate(mycolumn,getdup+1). Oracle SQL will be a little different, since it uses rpad().
In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs (and the recursive keyword is not used in some of them).
So, you want recursive solution :
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from table
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
insert into table (col1, col2, col3)
select Getdup, CustomerName, CustomerAdd
from t
order by getdup
option (maxrecursion 0);

Best Way to Join One Column on Columns From Two Other Tables

I have a schema like the following in Oracle
Section:
+--------+----------+
| sec_ID | group_ID |
+--------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
+--------+----------+
Section_to_Item:
+--------+---------+
| sec_ID | item_ID |
+--------+---------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
+--------+---------+
Item:
+---------+------+
| item_ID | data |
+---------+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+---------+------+
Item_Version:
+---------+----------+--------+
| item_ID | start_ID | end_ID |
+---------+----------+--------+
| 1 | 1 | |
| 2 | 1 | 3 |
| 3 | 2 | |
| 4 | 1 | 2 |
+---------+----------+--------+
Section_to_Item has FK into Section and Item on the *_ID columns.
Item_version is indexed on item_ID but has no FK to Item.item_ID (ran out of space in the snapshot group).
I have code that receives a list of version IDs and I want to get all items in sections in a given group that are valid for at least one of the versions passed in. If an item has no end_ID, it's valid for anything starting with start_ID. If it has an end_id, it's valid for anything up until (not including) end_ID.
What I currently have is:
SELECT Items.data
FROM Section, Section_to_Items, Item, Item_Version
WHERE Section.group_ID = 1
AND Section_to_Item.sec_ID = Section.sec_ID
AND Item.item_ID = Section_to_Item.item_ID
AND Item.item_ID = Item_Version.item_ID
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_version.version)
)
Note that the UNION ALL statement is dynamically generated from the list of passed in versions.
This query currently does a cartesian join and is very slow.
For some reason, if I change the query to join
AND Item_Version.item_ID = Section_to_Item.item_ID
which is not a FK, the query does not do the cartesian join and is much faster.
A) Can anyone explain why this is?
B) Is this the right way to be joining this sequence of tables (I feel weird about joining Item.item_ID to two different tables)
C) Is this the right way to get versions between start_ID and end_ID?
Edit
Same query with inner join syntax:
SELECT Items.data
FROM Item
INNER JOIN Section_to_Items ON Section_to_Items.item_ID = Item.item_ID
INNER JOIN Section ON Section.sec_ID = Section_to_Items.sec_ID
INNER JOIN Item_Version ON Item_Version.item_ID = Item_.item_ID
WHERE Section.group_ID = 1
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_version.version)
)
Note that in this case the performance difference comes from joining on Item_Version first and then joining Section_to_Item on Item_Version.item_ID.
In terms of table size, Section_to_Item, Item, and Item_Version should be similar (1000s) while Section should be small.
Edit
I just found out that apparently, the schema has no FKs. The FKs specified in the schema configuration files are ignored. They're just there for documentation. So there's no difference between joining on a FK column or not. That being said, by changing the joins into a cascade of SELECT INs, I'm able to avoid joining the entire Item table twice. I don't love the resulting query, and I don't really understand the difference, but the stats indicate it's much less work (changes the A-Rows returned from the inner most scan on Section from 656,000 to 488 (it used to be 656k starts returning 1 row, now it's 488 starts returning 1 row)).
Edit
It turned out to be stale statistics - the two queries were equivalent the whole time but with the incomplete statistics, the DB happened to notice the correct plan only in the second instance. After updating statistics, both queries generated the same plan.
I'm not sure if this is the best idea but this seems to avoid the cartesian join:
select data
from Item
where item_ID in (
select item_ID
from Item_Version
where item_ID in (
select item_ID
from Section_to_Item
where sec_ID in (
select sec_ID
from Section
where group_ID = 1
)
)
and exists (
select 1
from (
select 2 as version
from dual
union all
select 3 as version
from dual
) versions
where versions.version >= start_ID
and (end_ID is null or versions.version <)
)
)

Splitting a string column in BigQuery

Let's say I have a table in BigQuery containing 2 columns. The first column represents a name, and the second is a delimited list of values, of arbitrary length. Example:
Name | Scores
-----+-------
Bob |10;20;20
Sue |14;12;19;90
Joe |30;15
I want to transform into columns where the first is the name, and the second is a single score value, like so:
Name,Score
Bob,10
Bob,20
Bob,20
Sue,14
Sue,12
Sue,19
Sue,90
Joe,30
Joe,15
Can this be done in BigQuery alone?
Good news everyone! BigQuery can now SPLIT()!
Look at "find all two word phrases that appear in more than one row in a dataset".
There is no current way to split() a value in BigQuery to generate multiple rows from a string, but you could use a regular expression to look for the commas and find the first value. Then run a similar query to find the 2nd value, and so on. They can all be merged into only one query, using the pattern presented in the above example (UNION through commas).
Trying to rewrite Elad Ben Akoune's answer in Standart SQL, the query becomes like this;
WITH name_score AS (
SELECT Name, split(Scores,';') AS Score
FROM (
(SELECT * FROM (SELECT 'Bob' AS Name ,'10;20;20' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Sue' AS Name ,'14;12;19;90' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Joe' AS Name ,'30;15' AS Scores))
))
SELECT name, score
FROM name_score
CROSS JOIN UNNEST(name_score.score) AS score;
And this outputs;
+------+-------+
| name | score |
+------+-------+
| Bob | 10 |
| Bob | 20 |
| Bob | 20 |
| Sue | 14 |
| Sue | 12 |
| Sue | 19 |
| Sue | 90 |
| Joe | 30 |
| Joe | 15 |
+------+-------+
If someone is still looking for an answer
select Name,split(Scores,';') as Score
from (
# replace the inner custome select with your source table
select *
from
(select 'Bob' as Name ,'10;20;20' as Scores),
(select 'Sue' as Name ,'14;12;19;90' as Scores),
(select 'Joe' as Name ,'30;15' as Scores)
);