Combining 1:M tables with removing duplicate fields

Combining 1:M tables with removing duplicate fields - sql

Is it possible to display 2 column with no direct relationship side by side without a 'product' (column1xcolumn2)?
check this. http://sqlfiddle.com/#!3/212b6/1
I am trying create a query that displays all group_id,website and History with minimum duplicate fields. I have 3 tables- Group, Website and History but it displays cartesian join. If I have 4 websites and 3 History for 1 group. It displays 12 records. I want something lik this :
group_id | website | History
1 website1 hist1
1 website2 hist2
1 website3 hist3

You can do this in most databases by using row_number() to assign a row number. This is ANSI standard functionality. To come close to what you want:
select g.group_id, w.website, h.ma_history
from (select g.*, row_number() over (order by group_id) as seqnum
from tbl_group g
) g full outer join
(select w.*, row_number() over (order by website) as seqnum
from table_website w
) w
on g.seqnum = w.seqnum full outer join
(select h.*, row_number() over (order by ma_history) as seqnum
from table_ma_history h
) h
on h.seqnum = coalesce(g.seqnum, w.seqnum)
The difference from your desired representation is that the "missing" values are not replicated from one row to the next. Instead, they are represented as NULL. For your example:
group_id | website | History
1 website1 hist1
NULL website2 hist2
NULL website3 hist3
Since you say "something like", is this sufficiently close? Replicating the values is easier or harder depending on the database.

Related

Get specific row from each group

My question is very similar to this, except I want to be able to filter by some criteria.
I have a table "DOCUMENT" which looks something like this:
|ID|CONFIG_ID|STATE |MAJOR_REV|MODIFIED_ON|ELEMENT_ID|
+--+---------+----------+---------+-----------+----------+
| 1|1234 |Published | 2 |2019-04-03 | 98762 |
| 2|1234 |Draft | 1 |2019-01-02 | 98762 |
| 3|5678 |Draft | 3 |2019-01-02 | 24244 |
| 4|5678 |Published | 2 |2017-10-04 | 24244 |
| 5|5678 |Draft | 1 |2015-05-04 | 24244 |
It's actually a few more columns, but I'm trying to keep this simple.
For each CONFIG_ID, I would like to select the latest (MAX(MAJOR_REV) or MAX(MODIFIED_ON)) - but I might want to filter by additional criteria, such as state (e.g., the latest published revision of a document) and/or date (the latest revision, published or not, as of a specific date; or: all documents where a revision was published/modified within a specific date interval).
To make things more interesting, there are some other tables I want to join in.
Here's what I have so far:
SELECT
allDocs.ID,
d.CONFIG_ID,
d.[STATE],
d.MAJOR_REV,
d.MODIFIED_ON,
d.ELEMENT_ID,
f.ID FILE_ID,
f.[FILENAME],
et.COLUMN1,
e.COLUMN2
FROM DOCUMENT -- Get all document revisions
CROSS APPLY ( -- Then for each config ID, only look at the latest revision
SELECT TOP 1
ID,
MODIFIED_ON,
CONFIG_ID,
MAJOR_REV,
ELEMENT_ID,
[STATE]
FROM DOCUMENT
WHERE CONFIG_ID=allDocs.CONFIG_ID
ORDER BY MAJOR_REV desc
) as d
LEFT OUTER JOIN ELEMENT e ON e.ID = d.ELEMENT_ID
LEFT OUTER JOIN ELEMENT_TYPE et ON e.ELEMENT_TYPE_ID=et.ID
LEFT OUTER JOIN TREE t ON t.NODE_ID = d.ELEMENT_ID
OUTER APPLY ( -- This is another optional 1:1 relation, but it's wrongfully implemented as m:n
SELECT TOP 1
FILE_ID
FROM DOCUMENT_FILE_RELATION
WHERE DOCUMENT_ID=d.ID
ORDER BY MODIFIED_ON DESC
) as df -- There should never be more than 1, but we're using TOP 1 just in case, to avoid duplicates
LEFT OUTER JOIN [FILE] f on f.ID=df.FILE_ID
WHERE
allDocs.CONFIG_ID = '5678' -- Just for testing purposes
and d.state ='Released' -- One possible filter criterion, there may be others
It looks like the results are correct, but multiple identical rows are returned.
My guess is that for documents with 4 revisions, the same values are found 4 times and returned.
A simple SELECT DISTINCT would solve this, but I'd prefer to fix my query.

This would be a classic row_number & partition by question I think.
;with rows as
(
select <your-columns>,
row_number() over (partion by config_id order by <whatever you want>) as rn
from document
join <anything else>
where <whatever>
)
select * from rows where rn=1

How to query partitions that you get from using window functions?

I have a table that has the following structure
------------------------------------
|Company_ID| Company_Name| Join_Key|
------------------------------------
| 1 | ACompany | AC |
| 2 | BCompany | BC |
While this table doesn't have many column, there are somewhere around 4 million rows.
I want to calculate some string distance calculations on these company names. I have the following query
select a.Company_Name as Name1,
b.Company_Name as Name2,
Fuzzy_Match(a.Company_Name, b.Company_Name, 'JaccardDistance') as Jaccard --this is a custom function
from [Companies] a, [Companies] b
While something like this would work on a smaller database, since my database is so large, there is no way for me to be able to get through all of the combinations in a reasonable amount of time. So I thought about partitioning the database with a window function.
select Company_Name,
ROW_NUMBER() over(partition by Join_Key order by Join_Key asc) as row_num
Join_Key
from [Companies]
This gives me a list of the companies numbered and partitioned by their join_key, but the thing that I'm not sure of is how to do both things.
How can I perform a cross join and calculate the string similarity measures for each partition so that I'm only comparing companies that both have 'AC' as their join key?

Query to select distinct values from different tables and not have them repeat (show them as a flat file)

I'm trying to get all phones, emails, and organizations for a person and show it in a flat file format. There should be n number of rows, where n is the max count of organizations, emails, or phones. NULL values will be shown once all values have been shown in the rows, with NULL being the last values. The emails and phones can only have 1 PreferredInd per person. I want these to be on the same row (1 of them can be NULL). I've tried to do this on a more complex query, but couldn't get it to work, so I've started over using this simpler example.
Example tables and values:
#ContactPerson
Id Name
1 John Doe
#ContactEmail
Id PersonId Email PreferredInd
1 1 johndoe#us.gov 0
2 1 jdoe#us.gov 1
3 1 johndoe#gmail.com 0
#ContactPhone
Id PersonId Phone PreferredInd
1 1 888-867-5309 0
2 1 305-476-5234 1
#ContactOrganization
Id PersonId Organization
1 1 US Government
2 1 US Army
I want a resulting set to look like:
Name Organization PreferredInd Email Phone
John Doe US Government 1 jdoe#us.gov 888-867-5309
John Doe US Army 0 johndoe#us.gov 305-467-5234
John Doe NULL 0 johndoe#gmail.com NULL
The complete sql code that I have for this example is here on pastebin. It also includes code to create the sample tables. It works when the count of emails exceeds the count of organizations or phones, but that won't always be true. I can't seem to figure out how to get the result that I'm looking for. The actual tables I'm working with can have 0 or infinity emails, phones, or organizations per person. There will also be many more values, but I can fix that myself.
Can you help me fix my query or show me a simpler way to do it? If you have any questions, just let me know and I can try to answer them.

something like this?
with cte_e as (
select
*,
row_number() over(order by PreferredInd desc, Id) as rn
from ContactEmail
), cte_p as (
select
*,
row_number() over(order by PreferredInd desc, Id) as rn
from ContactPhone
), cte_o as (
select
*,
row_number() over(order by Organization) as rn
from ContactOrganization
), cte_d as (
select distinct rn, PersonId from cte_e union
select distinct rn, PersonId from cte_p union
select distinct rn, PersonId from cte_o
)
select
pr.Name, o.Organization, e.Email, p.Phone
from cte_d as d
left outer join ContactPerson as pr on pr.Id = d.PersonId
left outer join cte_e as e on e.PersonId = d.PersonId and e.rn = d.rn
left outer join cte_p as p on p.PersonId = d.PersonId and p.rn = d.rn
left outer join cte_o as o on o.PersonId = d.PersonId and o.rn = d.rn
sql fiddle demo
it's a bit clumsy, I can think of couple of other possible ways to do this, but I think this one is most readable one

Step 1
Write a query that does the full join of all the tables, which will end up with lots of duplicate rows for each person (for each email or phone number)
Step 2
Write a second query that uses GroupBy to group the rows, and that uses the Case or Decode keywords (like a c# switch statement) to find the preferred row value and select it as the value to display

sql merge tables side-by-side with nothing in common

I'm looking for an sql answer on how to merge two tables without anything in common.
So let's say you have these two tables without anything in common:
Guys Girls
id name id name
--- ------ ---- ------
1 abraham 5 sarah
2 isaak 6 rachel
3 jacob 7 rebeka
8 leah
and you want to merge them side-by-side like this:
Couples
id name id name
--- ------ --- ------
1 abraham 5 sarah
2 isaak 6 rachel
3 jacob 7 rebeka
8 leah
How can this be done?
I'm looking for an sql answer on how to merge two tables without anything in common.

You can do this by creating a key, which is the row number, and joining on it.
Most dialects of SQL support the row_number() function. Here is an approach using it:
select gu.id, gu.name, gi.id, gi.name
from (select g.*, row_number() over (order by id) as seqnum
from guys g
) gu full outer join
(select g.*, row_number() over (order by id) as seqnum
from girls g
) gi
on gu.seqnum = gi.seqnum;

Just because I wrote it up anyway, an alternative using CTEs;
WITH guys2 AS ( SELECT id,name,ROW_NUMBER() OVER (ORDER BY id) rn FROM guys),
girls2 AS ( SELECT id,name,ROW_NUMBER() OVER (ORDER BY id) rn FROM girls)
SELECT guys2.id guyid, guys2.name guyname,
girls2.id girlid, girls2.name girlname
FROM guys2 FULL OUTER JOIN girls2 ON guys2.rn = girls2.rn
ORDER BY COALESCE(guys2.rn, girls2.rn);
An SQLfiddle to test with.

Assuming, you want to match guys up with girls in your example, and have some sort of meaningful relationship between the records (no pun intended)...
Typically you'd do this with a separate table to represent the association (relationship) between the two.
This wouldn't give you a physical table, but it would enable you to write an SQL query representing the final results:
SELECT Girls.ID AS GirlId, Girls.Name AS GirlName, Guys.ID AS GuyId, Guys.Name AS GuyName
FROM Couples INNER JOIN
Girls ON Couples.GirlId = Girls.ID INNER JOIN
Guys ON Couples.GuyId = Guys.ID
which you could then use to create a table on the fly using the Select Into syntax
SELECT Girls.ID AS GirlId, Girls.Name AS GirlName, Guys.ID AS GuyId, Guys.Name AS GuyName
INTO MyNewTable
FROM Couples INNER JOIN
Girls ON Couples.GirlId = Girls.ID INNER JOIN
Guys ON Couples.GuyId = Guys.ID
(But standard Normalization rules would say it's best to keep them in distinct tables rather than creating a temp table, unless there's a performance reason not to do so.)

I need this all the time, -- creating templates in Excel using input from my tables. This pulls from one table that has my regions, the other with the quarters in a year. the result gives me one region name for each quarter/period.
SELECT b.quarter_qty, a.mkt_name FROM TBL_MKTS a, TBL_PERIODS b

Select a row used for GROUP BY

I have this table:
id | owner | asset | rate
-------------------------
1 | 1 | 3 | 1
2 | 1 | 4 | 2
3 | 2 | 3 | 3
4 | 2 | 5 | 4
And i'm using
SELECT asset, max(rate)
FROM test
WHERE owner IN (1, 2)
GROUP BY asset
HAVING count(asset) > 1
ORDER BY max(rate) DESC
to get intersection of assets for specified owners with best rate.
I also need id of row used for max(rate), but i can't find a way to include it to SELECT. Any ideas?
Edit:
I need
Find all assets that belongs to both owners (1 and 2)
From the same asset i need only one with the best rate (3)
I also need other columns (owner) that belongs to the specific asset with best rate
I expect the following output:
id | asset | rate
-------------------------
3 | 3 | 3
Oops, all 3s, but basically i need id of 3rd row to query the same table again, so resulting output (after second query) will be:
id | owner | asset | rate
-------------------------
3 | 2 | 3 | 3
Let's say it's Postgres, but i'd prefer reasonably cross-DBMS solution.
Edit 2:
Guys, i know how to do this with JOINs. Sorry for misleading question, but i need to know how to get extra from existing query. I already have needed assets and rates selected, i just need one extra field among with max(rate) and given conditions if it's possible.

Another solution that might or might not be faster than a self join (depending on the DBMS' optimizer)
SELECT id,
asset,
rate,
asset_count
FROM (
SELECT id,
asset,
rate,
rank() over (partition by asset order by rate desc) as rank_rate,
count(asset) over (partition by null) as asset_count
FROM test
WHERE owner IN (1, 2)
) t
WHERE rank_rate = 1
ORDER BY rate DESC

You are dealing with two questions and trying to solve them as if they are one. With a subquery, you can better refine by filtering the list in the proper order first (max(rate)), but as soon as you group, you lose this. As such, i would set up two queries (same procedure, if you are using procedures, but two queries) and ask the questions separately. Unless ... you need some of the information in a single grid when output.
I guess the better direction to head is to have you show how you want the output to look. Once you bake the input and the output, the middle of the oreo is easier to fill.

SELECT b.id, b.asset, b.rate
from
(
SELECT asset, max(rate) maxrate
FROM test
WHERE owner IN (1, 2)
GROUP BY asset
HAVING count(asset) > 1
) a, test b
WHERE a.asset = b.asset
AND a.maxrate = b.rate
ORDER BY b.rate DESC

You don't specify what type of database you're running on, but if you have analytical functions available you can do this:
select id, asset, max_rate
from (
select ID, asset, max(rate) over (partition by asset) max_rate,
row_number() over (partition by asset order by rate desc) row_num
from test
where owner in (1,2)
) q
where row_num = 1
I'm not sure how to add in the "having count(asset) > 1" in this way though.

This first searches for rows with the maximum rate per asset. Then it takes the highest id per asset, and selects that:
select *
from test
inner join
(
select max(id) as MaxIdWithMaxRate
from test
inner join
(
select asset
, max(rate) as MaxRate
from test
group by
asset
) filter
on filter.asset = test.asset
and filter.MaxRate = test.rate
group by
asset
) filter2
on filter.MaxIdWithMaxRate = test.id
If multiple assets share the maximum rate, this will display the one with the highest id.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas