How to join these tables with conditional joins - sql

I found some useful tips on the web, but I still have some questions.
This is the "main" part of a new site we are building on SQL Server 2012. It is basically a task manager (TAREA = TASK, Spanish to English). The TAREAS table is the main table and it has a self join: a task can be part of a primary task, or be a secondary task that has its own "child" tasks. I already found a way to walk the "tree" of the table using a common table expression.
The problem is ID_TipoTarea (TaskType) on the TAREAS table. Each task has one specific type. The diagram shows 2 types (there are more, and more will be added): TipoTareaDesarrollo and TipoTareaEventoSalon. A task cannot be in both tables, so if ID_TipoTarea=1 I join to TipoTareaDesarrollo, if ID_TipoTarea=2 I join to TipoTareaEventoSalon, if ID_TipoTarea=3 I join to another table, and so on. Can you help me out?
How can this be achieved with the query below? It gets all the levels of the main table, but I still need the conditional joins.
with tareasCTE (id_tarea,id_tareaorigen,id_tipoTarea,nivel)
as(
select *,0 as nivel from tareas t
where id_tarea=@ID_Tarea
union all
select t2.*,nivel+1 from tareasCTE t
inner join tareas t2
on t.id_tarea=t2.id_tareaOrigen
)
select * from tareasCTE
I get this output:
ID_Tarea  ID_TareaOrigen  Nivel  ID_TipoTarea
3         NULL            0      NULL          (no join)
4         3               1      1             (join this one with TipoTareaDesarrollo)
5         3               1      1             (join this one with TipoTareaDesarrollo)
6         3               1      3             (join this one with AnotherTable)
7         4               2      2             (join this one with TipoTareaEventoSalon)
8         4               2      2             (join this one with TipoTareaEventoSalon)
9         4               2      4             (join this one with AnotherTable2)
10        9               3      1             (join this one with TipoTareaDesarrollo)
11        9               3      1             (join this one with TipoTareaDesarrollo)
12        9               3      NULL          (no join)
13        12              4      1             (join this one with TipoTareaDesarrollo)
14        12              4      2             (join this one with TipoTareaEventoSalon)
15        12              4      2             (join this one with TipoTareaEventoSalon)

You can combine the tables TipoTareaDesarrollo, TipoTareaEventoSalon, AnotherTable and AnotherTable2 into a single result set with the UNION operator and package it in a second CTE, like this:
WITH TipoAreasCTE as
(
SELECT * FROM TipoTareaDesarrollo
UNION
SELECT * FROM TipoTareaEventoSalon
UNION
SELECT * FROM AnotherTable
UNION
SELECT * FROM AnotherTable2
)
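You can then join tareasCTE to TipoAreasCTE. If both CTEs belong to the same statement, write WITH only once and separate the two definitions with a comma.
Note that every SELECT in the UNION must return the same number of columns with compatible data types; if the tables differ, list the columns explicitly and CAST where needed to make them line up. UNION also removes duplicate rows; use UNION ALL if you want to keep them.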
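As a rough illustration of the idea above, here is a minimal sketch that runs both CTEs in one statement. It uses SQLite through Python's sqlite3 module only so that it can be run standalone; on SQL Server you would drop the RECURSIVE keyword and use @ID_Tarea instead of the ? placeholder. The detail columns (lenguaje, salon, detalle), the id_tarea key on the type tables, and the sample rows are assumptions, since the question does not show the real column lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tareas (id_tarea INTEGER PRIMARY KEY, id_tareaOrigen INTEGER, id_tipoTarea INTEGER);
-- Hypothetical detail tables; the real column lists are not shown in the question.
CREATE TABLE TipoTareaDesarrollo (id_tarea INTEGER, lenguaje TEXT);
CREATE TABLE TipoTareaEventoSalon (id_tarea INTEGER, salon TEXT);
INSERT INTO tareas VALUES (3, NULL, NULL), (4, 3, 1), (5, 3, 1), (7, 4, 2);
INSERT INTO TipoTareaDesarrollo VALUES (4, 'C#'), (5, 'SQL');
INSERT INTO TipoTareaEventoSalon VALUES (7, 'Salon A');
""")

sql = """
WITH RECURSIVE tareasCTE (id_tarea, id_tareaOrigen, id_tipoTarea, nivel) AS (
    SELECT id_tarea, id_tareaOrigen, id_tipoTarea, 0
    FROM tareas
    WHERE id_tarea = ?
    UNION ALL
    SELECT t2.id_tarea, t2.id_tareaOrigen, t2.id_tipoTarea, t.nivel + 1
    FROM tareasCTE AS t
    JOIN tareas AS t2 ON t.id_tarea = t2.id_tareaOrigen
),
-- Second CTE: one row per task detail, tagged with the type it belongs to.
TipoAreasCTE AS (
    SELECT id_tarea, 1 AS id_tipoTarea, lenguaje AS detalle FROM TipoTareaDesarrollo
    UNION ALL
    SELECT id_tarea, 2, salon FROM TipoTareaEventoSalon
)
SELECT c.id_tarea, c.nivel, c.id_tipoTarea, x.detalle
FROM tareasCTE AS c
LEFT JOIN TipoAreasCTE AS x
       ON x.id_tarea = c.id_tarea
      AND x.id_tipoTarea = c.id_tipoTarea
ORDER BY c.id_tarea
"""

# Walk the tree starting at task 3 and attach the matching detail row, if any.
rows = conn.execute(sql, (3,)).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps tasks whose type is NULL (task 3 here), with detalle left empty for them.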

Related

Find 'Most Similar' Items in Table by Foreign Key

I have a child table with a number of charact/value pairs for a given 'material' (MaterialID). Any material can have a number of charact values and may have several of the same name (see id's 2,3).
The table has a large number of records (8+ million). What I'm trying to do is find the materials that are the most similar to a supplied material. That is, when I supply a MaterialID, I would like an ordered list of the most similar other materials (those with the most matching charact/value pairs).
I've done some research, but I may be missing some key terms or just not conceptualizing the problem correctly.
Any hints as to how to go about this would be very much appreciated.
ID MaterialID Charact Value
1 1 ROT_DIR CCW
2 1 SPECIAL_FEATURE CATALOG_CP
3 1 SPECIAL_FEATURE CHROME
4 1 SCHEDULE 80
5 2 BEARING_TYPE SB
6 2 SCHEDULE 80
7 3 ROT_DIR CCW
8 3 SPECIAL_FEATURE CATALOG_HSB
9 3 BEARING_TYPE SP
10 4 NDE_STYLE W_FAN
11 4 BEARING_TYPE SB
12 4 ROT_DIR CW*
You can do this with a self join:
select t.materialid, count(*) as nummatches
from t join
t tmat
on t.Charact = tmat.Charact and t.value = tmat.value
where tmat.materialid = @MaterialId
group by t.materialid
order by nummatches desc;
Notes:
You might want to exclude the specified material itself, by adding and t.MaterialId <> tmat.MaterialId to the where clause.
If you want all materials, including those with no matches, make the join a left join, move the where condition to the on clause, and count tmat.MaterialId instead of * so that unmatched materials show 0.
If you want only one material with the most matches, use select top 1.
If you want all materials with the most matches when there are ties, use select top (1) with ties.
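To check the self join against the sample data, here is a small sketch using SQLite through Python's sqlite3 module (chosen only so it runs standalone; the table name t comes from the query above, and the ? placeholder stands in for @MaterialId). It includes the extra condition from the first note so that the supplied material is not reported as its own best match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, materialid INTEGER, charact TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, "ROT_DIR", "CCW"),
        (2, 1, "SPECIAL_FEATURE", "CATALOG_CP"),
        (3, 1, "SPECIAL_FEATURE", "CHROME"),
        (4, 1, "SCHEDULE", "80"),
        (5, 2, "BEARING_TYPE", "SB"),
        (6, 2, "SCHEDULE", "80"),
        (7, 3, "ROT_DIR", "CCW"),
        (8, 3, "SPECIAL_FEATURE", "CATALOG_HSB"),
        (9, 3, "BEARING_TYPE", "SP"),
        (10, 4, "NDE_STYLE", "W_FAN"),
        (11, 4, "BEARING_TYPE", "SB"),
        (12, 4, "ROT_DIR", "CW*"),
    ],
)

sql = """
SELECT t.materialid, COUNT(*) AS nummatches
FROM t
JOIN t AS tmat
  ON t.charact = tmat.charact AND t.value = tmat.value
WHERE tmat.materialid = ?
  AND t.materialid <> tmat.materialid   -- leave out the supplied material itself
GROUP BY t.materialid
ORDER BY nummatches DESC, t.materialid
"""

# Materials ranked by how many charact/value pairs they share with material 1.
rows = conn.execute(sql, (1,)).fetchall()
print(rows)
```

With this sample, materials 2 and 3 each share one pair with material 1, and material 4 shares none, so it does not appear in the inner-join result.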

Delete duplicates when the duplicates are not in the same column

Here is a sample of my data (n>3000) that ties two numbers together:
id a b
1 7028344 7181310
2 7030342 7030344
3 7030354 7030353
4 7030343 7030345
5 7030344 7030342
6 7030364 7008059
7 7030659 7066051
8 7030345 7030343
9 7031815 7045692
10 7032644 7102337
Now, the problem is that id=2 is a duplicate of id=5 and id=4 is a duplicate of id=8. So, when I tried to write if-then statements to map column a to column b, basically the numbers just get swapped. There are many cases like this in my full data.
So, my question is to identify the duplicate(s) and somehow delete one of the duplicates (either id=2 or id=5). And I preferably want to do this in Excel but I could work with SQL Server or SAS, too.
Thank you in advance. Please comment if my question is not clear.
What I want:
id a b
1 7028344 7181310
2 7030342 7030344
3 7030354 7030353
4 7030343 7030345
6 7030364 7008059
7 7030659 7066051
9 7031815 7045692
10 7032644 7102337
All sorts of ways to do this.
In SAS or SQL, this is simple (for SQL Server, the SQL portion should be identical or nearly so):
data have;
input id a b;
datalines;
1 7028344 7181310
2 7030342 7030344
3 7030354 7030353
4 7030343 7030345
5 7030344 7030342
6 7030364 7008059
7 7030659 7066051
8 7030345 7030343
9 7031815 7045692
10 7032644 7102337
;;;;
run;
proc sql undopolicy=none;
delete from have H where exists (
select 1 from have V where V.id < H.id
and ((V.a=H.a and V.b=H.b) or (V.a=H.b and V.b=H.a))
);
quit;
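If you want to try the SQL portion without SAS, the following is a minimal sketch of the same delete using SQLite through Python's sqlite3 module (an assumption made only for the sake of a runnable example; the table name have is taken from the SAS step). Note the extra parentheses around the two pair comparisons: without them, AND binds tighter than OR, the id comparison applies only to the first comparison, and both rows of a swapped pair get deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?)",
    [
        (1, 7028344, 7181310),
        (2, 7030342, 7030344),
        (3, 7030354, 7030353),
        (4, 7030343, 7030345),
        (5, 7030344, 7030342),
        (6, 7030364, 7008059),
        (7, 7030659, 7066051),
        (8, 7030345, 7030343),
        (9, 7031815, 7045692),
        (10, 7032644, 7102337),
    ],
)

# Delete a row when an earlier row holds the same pair, in either order.
conn.execute("""
DELETE FROM have
WHERE EXISTS (
    SELECT 1
    FROM have AS v
    WHERE v.id < have.id
      AND ((v.a = have.a AND v.b = have.b)
        OR (v.a = have.b AND v.b = have.a))
)
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM have ORDER BY id")]
print(remaining)
```

On the sample data this removes ids 5 and 8 and keeps the rest, which matches the wanted output in the question.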
The Excel solution would, I believe, require creating an additional column that holds the concatenation of the two values in a consistent order (any order will do, as long as it is the same for every row), and then a lookup to see whether that is the first row with that value. I don't think you can do it without creating an additional column (or using VBA, which, if you can use it, has a fairly simple solution as well).
Edit:
Actually, the Excel solution IS possible without creating a concatenation column (you still need to put this formula somewhere, but you don't need ANOTHER additional column).
=IF(OR(AND(COUNTIF(B$1:B1,B2),COUNTIF(C$1:C1,C2)),AND(COUNTIF(B$1:B1,C2),COUNTIF(C$1:C1,B2))),"DUPLICATE","")
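This assumes ID is in column A, the values are in B and C, and there is no header row. The formula goes in the second row (i.e., it tests the B2/C2 values) and is then filled down to further rows (so row 36 will have the ranges B$1:B35 and C$1:C35, etc.). It puts DUPLICATE in the rows that duplicate something above and leaves unique rows blank. Note that the two COUNTIF calls inside each AND are evaluated independently, so a row can be flagged when its two values appeared earlier on different rows; in Excel 2007 and later, COUNTIFS(B$1:B1,B2,C$1:C1,C2) checks both columns on the same row.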
I haven't tested this, but as food for thought: you could join the table against itself to get the IDs of the rows that duplicate an earlier row with a and b swapped.
SELECT
t1.id, t1.a, t1.b
FROM
[myTable] AS t1
INNER JOIN ( SELECT id, a, b FROM [myTable] ) tbl2
ON t1.a = tbl2.b
AND t1.b = tbl2.a
WHERE t1.id > tbl2.id

Finding contiguous regions in a sorted MS Access query

I am a long time fan of Stack Overflow but I've come across a problem that I haven't found addressed yet and need some expert help.
I have a query that is sorted chronologically with a date-time compound key (unique, never deleted) and several pieces of data. What I want to know is whether there is a way to find the start (or end) of each region where a value changes. For example, given:
DateTime someVal1 someVal2 someVal3 target
1 3 4 A
1 2 4 A
1 3 4 A
1 2 4 B
1 2 5 B
1 2 5 A
I want my query to return rows 1, 4 and 6, that is, to find the change in column 5 from A to B and then from B back to A. I have tried the find duplicates method and using Min and Max in the totals property, but that gives me the first and last rows overall instead of the local ones. Has anyone solved a similar problem?
I didn't see any purpose for the someVal1, someVal2, and someVal3 fields, so I left them out. I used an autonumber as the primary key instead of your date/time field; but this approach should also work with your date/time primary key. This is the data in my version of your table.
pkey_field target
1 A
2 A
3 A
4 B
5 B
6 A
I used a correlated subquery to find the previous pkey_field value for each row.
SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m;
Then put that in a subquery which I joined to another copy of the base table.
SELECT
sub.pkey_field,
sub.target,
sub.prev_pkey_field,
prev.target AS prev_target
FROM
(SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m) AS sub
LEFT JOIN YourTable AS prev
ON sub.prev_pkey_field = prev.pkey_field
WHERE
sub.prev_pkey_field Is Null
OR prev.target <> sub.target;
This is the output from that final query.
pkey_field target prev_pkey_field prev_target
1 A
4 B 3 A
6 A 5 B
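If you want to try the approach outside Access, here is a small sketch that runs the same final query against the sample rows using SQLite through Python's sqlite3 module. SQLite is used here only as a convenient stand-in, so treat it as a check of the logic rather than of Access syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (pkey_field INTEGER PRIMARY KEY, target TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (6, "A")],
)

sql = """
SELECT sub.pkey_field, sub.target, sub.prev_pkey_field, prev.target AS prev_target
FROM (
    SELECT m.pkey_field,
           m.target,
           -- key of the row immediately before this one
           (SELECT MAX(pkey_field)
            FROM YourTable
            WHERE pkey_field < m.pkey_field) AS prev_pkey_field
    FROM YourTable AS m
) AS sub
LEFT JOIN YourTable AS prev
       ON sub.prev_pkey_field = prev.pkey_field
WHERE sub.prev_pkey_field IS NULL      -- very first row starts a region
   OR prev.target <> sub.target        -- value differs from the previous row
ORDER BY sub.pkey_field
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The rows returned are the starts of the three regions (keys 1, 4 and 6), the same as the output shown above.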
Here is a first attempt:
SELECT t1.Row, t1.target
FROM t1
WHERE t1.target <> NZ(
(SELECT TOP 1 t2.target
FROM t1 AS t2
WHERE t2.DateTimeId < t1.DateTimeId
ORDER BY t2.DateTimeId DESC), "X");

Use multiple counts in SQL Server 2005

select p.intprojectid, p.vcprojectname, md.intmoduleid,
md.vcmodulename, md.intscreensfunc, md.vcname
from projects as p
left join (select m.intprojectid, m.intmoduleid, m.vcmodulename,
s.intscreensfunc, s.vcname
from modules as m
left join screens_func as s on m.intmoduleid = s.intmoduleid) md
on p.intprojectid = md.intprojectid
This query will return:
no |project-name|mod-id|mod-name | screen-id | screen-name
----------------------------------------------------------------
2 Project-1 4 mod-1 11 scr1
2 Project-1 4 mod-1 12 scr2
2 Project-1 4 mod-1 13 scr3
2 Project-1 4 mod-1 14 scr4
2 Project-1 8 Module-2 NULL NULL
Now I want to count the number of modules (mod-name) and the number of screens (screen-name) in Project-1, i.e. I want the query to return:
project-name no.of.mod no.of.screen
------------------------------------------------
Project-1 2 4
It's definitely possible to return multiple counts.
In other words, your query could be modified as follows:
select p.vcprojectname, COUNT(DISTINCT md.intmoduleid) as [no.of.mod], COUNT(md.intscreensfunc) as [no.of.screen]
from projects as p
left join (select m.intprojectid, m.intmoduleid, m.vcmodulename, s.intscreensfunc, s.vcname
from modules as m
left join screens_func as s
on m.intmoduleid=s.intmoduleid)md
on p.intprojectid=md.intprojectid
GROUP BY p.vcprojectname
Based on your example data, I inferred a one-to-many relationship between modules and screens. Because the join repeats each module once per screen, the module count needs DISTINCT. The screen count does not, since it appears that a screen cannot occur more than once in a given module. If that is not the case, you can add DISTINCT to the screen count as well.
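Here is a minimal sketch that checks the two counts against the sample rows, using SQLite through Python's sqlite3 module so it can be run standalone. The table layouts are inferred from the column names in the query, and the aliases are written as no_of_mod and no_of_screen to avoid identifier quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Layouts inferred from the query; the real tables may have more columns.
CREATE TABLE projects (intprojectid INTEGER, vcprojectname TEXT);
CREATE TABLE modules (intmoduleid INTEGER, intprojectid INTEGER, vcmodulename TEXT);
CREATE TABLE screens_func (intscreensfunc INTEGER, intmoduleid INTEGER, vcname TEXT);
INSERT INTO projects VALUES (2, 'Project-1');
INSERT INTO modules VALUES (4, 2, 'mod-1'), (8, 2, 'Module-2');
INSERT INTO screens_func VALUES (11, 4, 'scr1'), (12, 4, 'scr2'), (13, 4, 'scr3'), (14, 4, 'scr4');
""")

sql = """
SELECT p.vcprojectname,
       COUNT(DISTINCT md.intmoduleid) AS no_of_mod,    -- each module once
       COUNT(md.intscreensfunc)       AS no_of_screen  -- NULLs are not counted
FROM projects AS p
LEFT JOIN (
    SELECT m.intprojectid, m.intmoduleid, m.vcmodulename, s.intscreensfunc, s.vcname
    FROM modules AS m
    LEFT JOIN screens_func AS s ON m.intmoduleid = s.intmoduleid
) AS md ON p.intprojectid = md.intprojectid
GROUP BY p.vcprojectname
"""

rows = conn.execute(sql).fetchall()
print(rows)
```

Module-2 has no screens, so its single joined row carries a NULL screen id: it still counts as a module but adds nothing to the screen count.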

Why does this query return "incorrect" results?

I have 3 tables:
'CouponType' table:
AutoID Code Name
1 CouT001 SunCoupon
2 CouT002 GdFriCoupon
3 CouT003 1for1Coupon
'CouponIssued' table:
AutoID CouponNo CouponType_AutoID
1 Co001 1
2 Co002 1
3 Co003 1
4 Co004 2
5 Co005 2
6 Co006 2
'CouponUsed' table:
AutoID Coupon_AutoID
1 2
2 3
3 5
I am trying to join 3 tables together using this query below but apparently I am not getting right values for CouponIssued column:
select CouponType.AutoID, Code, Name, Count(CouponIssued.CouponType_AutoID), count(CouponUsed.Coupon_AutoID)
from (CouponType left join CouponIssued
on (CouponType.AutoID = CouponIssued.CouponType_AutoID))
left join CouponUsed
on (couponUsed.Coupon_AutoID = CouponIssued.AutoID)
group by CouponType.AutoID, code, name
order by code
The expected result should be like:
AutoID  Code     Name         Issued  Used
1       CouT001  SunCoupon    3       2
2       CouT002  GdFriCoupon  3       1
3       CouT003  1for1Coupon  0       0
Thanks!
SELECT t.AutoID
,t.Code
,t.Name
,count(i.CouponType_AutoID) AS issued
,count(u.Coupon_AutoID) AS used
FROM CouponType t
LEFT JOIN CouponIssued i ON i.CouponType_AutoID = t.AutoID
LEFT JOIN CouponUsed u ON u.Coupon_AutoID = i.AutoID
GROUP BY t.AutoID, t.Code, t.Name;
You might consider using less confusing names for your table columns. I have had very good experiences with using the same name for the same data across tables (as far as is sensible).
In your example, AutoID is used for three different columns, two of which appear a second time in another table under a different name. This would still make sense if Coupon_AutoID was named CouponIssued_AutoID instead.
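To check the query against the sample tables, here is a small sketch using SQLite through Python's sqlite3 module (picked only so the example runs standalone). The GROUP BY spells out the column names, which is the more portable form, and an ORDER BY on Code is added to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CouponType (AutoID INTEGER, Code TEXT, Name TEXT);
CREATE TABLE CouponIssued (AutoID INTEGER, CouponNo TEXT, CouponType_AutoID INTEGER);
CREATE TABLE CouponUsed (AutoID INTEGER, Coupon_AutoID INTEGER);
INSERT INTO CouponType VALUES (1, 'CouT001', 'SunCoupon'), (2, 'CouT002', 'GdFriCoupon'), (3, 'CouT003', '1for1Coupon');
INSERT INTO CouponIssued VALUES (1, 'Co001', 1), (2, 'Co002', 1), (3, 'Co003', 1),
                                (4, 'Co004', 2), (5, 'Co005', 2), (6, 'Co006', 2);
INSERT INTO CouponUsed VALUES (1, 2), (2, 3), (3, 5);
""")

sql = """
SELECT t.AutoID,
       t.Code,
       t.Name,
       COUNT(i.CouponType_AutoID) AS issued,  -- NULL for types with no coupons
       COUNT(u.Coupon_AutoID)     AS used     -- NULL for coupons never used
FROM CouponType AS t
LEFT JOIN CouponIssued AS i ON i.CouponType_AutoID = t.AutoID
LEFT JOIN CouponUsed   AS u ON u.Coupon_AutoID = i.AutoID
GROUP BY t.AutoID, t.Code, t.Name
ORDER BY t.Code
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The issued count is only correct here because each coupon is used at most once in the sample data. If a coupon could appear several times in CouponUsed, counting DISTINCT i.AutoID would avoid counting it once per use.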
Change count(Coupon.CouponType_AutoID) to count(CouponIssued.CouponType_AutoID), and count(Coupon.Coupon_AutoID) to count(CouponUsed.Coupon_AutoID).