New column referencing second table - do I need a join? - sql

I have two tables (first two shown) and need to make a third from the first two - do I need to do a join or can you reference a table without joining?
The third table shown is the desired output. Thanks for any help!
First table:
+-----+-----------+
| ACC | CALL DATE |
+-----+-----------+
| 1 1 | 2/1/18    |
+-----+-----------+
Second table:
+-----+---------------+
| ACC | PURCHASE DATE |
+-----+---------------+
| 1 1 | 1/1/18        |
+-----+---------------+
Desired output (third table):
+-----+-----------+----------------------+
| ACC | CALL DATE | PRIOR MONTH PURCHASE |
+-----+-----------+----------------------+
| 1 1 | 2/1/18    | YES                  |
+-----+-----------+----------------------+

Of course you can have a query that references multiple tables without joining. union all is an example of an operator that does that.
There is also the question of what you mean by "joining" in the question. If you mean explicit joins, there are ways around that, such as correlated subqueries. However, these still implement some form of "join" inside the database engine.
As for your query, you would want to use exists with a correlated subquery:
select t1.*,
(case when exists (select 1
from table2 t2
where t2.acc = t1.acc and
datediff(month, t2.purchase_date, t1.call_date) = 1
)
then 'Yes' else 'No'
end) as prior_month_purchase
from table1 t1;
This is "better" than a join because it does not multiply or remove rows. The result set has exactly the rows in the first table, with the additional column.
The syntax assumes SQL Server (which was an original tag). Similar logic can be expressed in other databases, although date functions are notoriously database-dependent.
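A minimal runnable sketch of the same EXISTS pattern, using Python's built-in sqlite3 module with SQLite standing in for SQL Server (an assumption for illustration). SQLite has no DATEDIFF, so the month difference is computed from strftime('%Y') and strftime('%m'). The sample accounts and ISO-format dates are hypothetical. Account 1 has two purchases in the prior month yet still appears only once, which is the "does not multiply rows" property described above.

```python
import sqlite3

# Hypothetical data: table1 holds calls, table2 holds purchases.
# Dates are ISO-8601 text ('YYYY-MM-DD'), which SQLite's strftime() understands.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (acc INTEGER, call_date TEXT);
    CREATE TABLE table2 (acc INTEGER, purchase_date TEXT);
    INSERT INTO table1 VALUES (1, '2018-02-01'), (2, '2018-02-01');
    INSERT INTO table2 VALUES (1, '2018-01-01'), (1, '2018-01-15');
""")

# No DATEDIFF in SQLite: count month boundaries as
# (year * 12 + month) of the call minus the same for the purchase.
QUERY = """
SELECT t1.acc,
       t1.call_date,
       CASE WHEN EXISTS (
                SELECT 1
                FROM table2 t2
                WHERE t2.acc = t1.acc
                  AND (CAST(strftime('%Y', t1.call_date) AS INTEGER) * 12
                       + CAST(strftime('%m', t1.call_date) AS INTEGER))
                    - (CAST(strftime('%Y', t2.purchase_date) AS INTEGER) * 12
                       + CAST(strftime('%m', t2.purchase_date) AS INTEGER)) = 1)
            THEN 'Yes' ELSE 'No'
       END AS prior_month_purchase
FROM table1 t1
ORDER BY t1.acc
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```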

Let's check the options.
If you were to create a new third table from the data in the first two, then every update, insert, or delete on either of those tables would also have to be propagated into the third table.
If you instead create a view that does what you need, there is no third table to maintain, and each query against the view gets current data from the first two tables.
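create view third_table as
select a.acc,
       a.call_date,
       -- 'Yes' when the account has a purchase exactly one month before the call
       case when b.acc is not null then 'Yes' else 'No' end as prior_month_purchase
from first_table a
left join second_table b
  on a.acc = b.acc
 and b.purchase_date = dateadd(mm, -1, a.call_date)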
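A minimal sketch of the view approach, again using Python's sqlite3 module with SQLite as a stand-in (assumption), where date(x, '-1 month') plays the role of dateadd(mm, -1, x). The sample rows are hypothetical. Inserting a row into the base table after the view exists shows that the view reflects the change with no third table to maintain. This sketch only counts a purchase dated exactly one month before the call as a match, and several purchases on that exact date would still produce duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (acc INTEGER, call_date TEXT);
    CREATE TABLE second_table (acc INTEGER, purchase_date TEXT);
    INSERT INTO first_table VALUES (1, '2018-02-01');
    INSERT INTO second_table VALUES (1, '2018-01-01');

    -- date(x, '-1 month') stands in for SQL Server's dateadd(mm, -1, x).
    CREATE VIEW third_table AS
    SELECT a.acc,
           a.call_date,
           CASE WHEN b.acc IS NOT NULL THEN 'Yes' ELSE 'No' END AS prior_month_purchase
    FROM first_table a
    LEFT JOIN second_table b
      ON a.acc = b.acc
     AND b.purchase_date = date(a.call_date, '-1 month');
""")

before = conn.execute("SELECT * FROM third_table ORDER BY acc").fetchall()

# Insert into the base table only; the view picks the new call up automatically.
conn.execute("INSERT INTO first_table VALUES (2, '2018-03-01')")
after = conn.execute("SELECT * FROM third_table ORDER BY acc").fetchall()

print(before)
print(after)
```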

Related

SQL Help - Join small lookup table where not all columns are required (and an "Other" option)

I have one large table with transactions and a smaller lookup table with values I want to add based on 4 common columns. The trick here is not every combination of these 4 columns will exist in the lookup table and there are scenarios where I want it to stop checking and accept the match instead of going to the next column. I also have an "Other" option to default to if it doesn't match any of the options.
Table structures are something like this:
transaction_table
country, trans_id, store_type, store_name, channel, browser, purchase_amount, currency
lookup_table
country, store_name, channel, browser, trans_fee
The data could be something like this:
transaction_table:
country| trans_id| store_type |store_name |channel |browser |amt |currency
US | 001 | Big Box | Target | B&M |N/A |1.45 |USD
US | 002 | Big Box | Target | Online |Chrome |1.79 |USD
US | 003 | Small | Bob's Store| B&M |N/A |2.50 |USD
US | 004 | Big Box | Walmart | B&M |N/A |1.12 |USD
US | 005 | Big Box | Walmart | Online |Firefox |3.79 |USD
US | 006 | Big Box | Amazon | Online |IE |4.54 |USD
US | 007 | Small | Jim's Plc | B&M |IE |2.49 |USD
lookup_table:
country|store_name |channel |browser |trans_fee
US |Target |B&M |N/A |0.25
US |Target |Online | |0.15
US |Walmart | | |0.30
US |Other | | |0.45
So looking at the lookup_table data:
Row 1 is very specific and would be a match on all 4 of the join columns.
Row 2 would not care what browser was used to shop at Target, so regardless of the "browser" value, the trans_fee should come back the same (other stores may care, though).
Row 3 says that any transaction with country='US' and store_name='Walmart', regardless of the rest of the join columns, would have the same trans_fee.
Row 4 is the "Other" scenario: it should look first at the store_name column and, if it doesn't find a match, go to Other.
The lookup_table data can change and may end up being time dependent (start_date and end_date columns added) so it really wouldn't be a good candidate for a long, complex CASE statement.
I was thinking of a combination of checking each column with an IF IN statement but I'm hoping there's a more straightforward conditional join type statement I can use to go column by column and have an other option.
Thanks!
edit: I didn't specify this but I want to basically return all of the data from transaction_table and add the corresponding trans_fee to each line.
You will need to use a conditional JOIN.
Something like this
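SELECT t.*, l.trans_fee
FROM transaction_table t
LEFT OUTER JOIN lookup_table l
  ON  l.country = t.country
  AND l.store_name = t.store_name
  -- a NULL lookup column matches any value
  AND (l.channel IS NULL OR l.channel = t.channel)
  AND (l.browser IS NULL OR l.browser = t.browser)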
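A minimal sketch of the conditional-join idea in SQLite via Python's sqlite3 module (SQLite is an assumption; the question names no database). Blank lookup cells are stored as NULL and treated as wildcards. A second LEFT JOIN to the country's 'Other' row supplies the fallback through COALESCE; that second join is added here for illustration and is not part of the answer. One caveat: if two lookup rows both match a transaction (for example a store-specific rule and a store-wide rule for the same store), this join returns that transaction twice, so it relies on the rules not overlapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_table (country TEXT, trans_id TEXT, store_name TEXT,
                                    channel TEXT, browser TEXT);
    CREATE TABLE lookup_table (country TEXT, store_name TEXT, channel TEXT,
                               browser TEXT, trans_fee REAL);
    INSERT INTO transaction_table VALUES
        ('US', '001', 'Target', 'B&M', 'N/A'),
        ('US', '002', 'Target', 'Online', 'Chrome'),
        ('US', '003', 'Bob''s Store', 'B&M', 'N/A'),
        ('US', '004', 'Walmart', 'B&M', 'N/A');
    -- Blank cells from the question are stored as NULL ("any value").
    INSERT INTO lookup_table VALUES
        ('US', 'Target', 'B&M', 'N/A', 0.25),
        ('US', 'Target', 'Online', NULL, 0.15),
        ('US', 'Walmart', NULL, NULL, 0.30),
        ('US', 'Other', NULL, NULL, 0.45);
""")

QUERY = """
SELECT t.trans_id,
       COALESCE(l.trans_fee, o.trans_fee) AS trans_fee
FROM transaction_table t
-- Conditional join: a NULL lookup column acts as a wildcard.
LEFT JOIN lookup_table l
  ON  l.country = t.country
  AND l.store_name = t.store_name
  AND (l.channel IS NULL OR l.channel = t.channel)
  AND (l.browser IS NULL OR l.browser = t.browser)
-- Fallback: the country's 'Other' row, used only when no specific rule matched.
LEFT JOIN lookup_table o
  ON  o.country = t.country
  AND o.store_name = 'Other'
ORDER BY t.trans_id
"""

rows = conn.execute(QUERY).fetchall()
fees = dict(rows)
print(fees)
```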
Such partial matching is tricky. And your problem is not really that well set up. You seem to have NULLs in some columns and general values in others.
In any case, you can solve this by matching what you can and then using order by to get the best match. In your case, I think this looks like this:
select tt.*,
       (select l.trans_fee
        from lookup_table l
        where l.country = tt.country and
              l.store_name in ('Other', tt.store_name) and
              (l.channel = tt.channel or l.channel is null) and
              (l.browser = tt.browser or l.browser is null)
        -- prefer exact matches over 'Other' and over NULL wildcards
        order by (case when l.store_name = tt.store_name then 1 else 2 end),
                 (case when l.channel = tt.channel then 1 else 2 end),
                 (case when l.browser = tt.browser then 1 else 2 end)
        fetch first 1 row only
       ) as trans_fee
from transaction_table tt;
This is generic (standard) SQL, though some databases spell the row limit differently (for example TOP 1 or LIMIT 1). The same idea should work in any database.

Using self join on a table to compare two columns based on a linked column in the same table

I have the following:
TableA
ID | DocumentType | DocumentCode | DocumentDate | Warehouse | ReferenceCode
---+--------------+--------------+--------------+-----------+--------------
1 | DeliveryNote | DOC-001 | 2017-04-21 | 1 | NULL
2 | Invoice | DOC-002 | 2017-04-21 | 2 | DOC-001
As you can see, the warehouse is different on each document, and DOC-002 is related to DOC-001 through the ReferenceCode column (meaning DOC-002 was created from DOC-001 as its source document).
DOC-002 is supposed to have the same information, but sometimes it differs. I tried to create a query (I think a self join applies here) to check which information in DOC-002 differs from DOC-001, based on the reference code, but I couldn't manage to do it.
If someone could give me a hand, I'd be very grateful.
This is the SQL query:
select *
from TableA tbl
inner join TableA tbla on tbl.id = tbla.id
where tbla.ReferenceCode = tbl.DocumentCode
You indeed want to join the table to itself. But joining on the ID column won't work, because that column doesn't relate records to each other. Instead, you need to join on the DocumentCode and ReferenceCode fields. Then only include the records that have some difference (in this case, I'm only comparing the DocumentDate and Warehouse fields).
select tbla.*
from TableA tbl
join TableA tbla on tbl.DocumentCode = tbla.ReferenceCode
where tbla.DocumentDate != tbl.DocumentDate
or tbla.Warehouse != tbl.Warehouse
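A minimal sketch of this self join on the question's two sample documents, using Python's sqlite3 module with SQLite as a stand-in (assumption). Selecting both warehouse values side by side makes it easy to see what differs. Note that != yields NULL rather than true when either side is NULL, so a difference involving a NULL value would not be reported by this filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER, DocumentType TEXT, DocumentCode TEXT,
                         DocumentDate TEXT, Warehouse INTEGER, ReferenceCode TEXT);
    INSERT INTO TableA VALUES
        (1, 'DeliveryNote', 'DOC-001', '2017-04-21', 1, NULL),
        (2, 'Invoice',      'DOC-002', '2017-04-21', 2, 'DOC-001');
""")

# tbl is the source document, tbla the document created from it.
QUERY = """
SELECT tbla.ID,
       tbla.DocumentCode,
       tbl.Warehouse  AS source_warehouse,
       tbla.Warehouse AS derived_warehouse
FROM TableA tbl
JOIN TableA tbla ON tbl.DocumentCode = tbla.ReferenceCode
WHERE tbla.DocumentDate != tbl.DocumentDate
   OR tbla.Warehouse != tbl.Warehouse
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```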

Transforming a 2 column SQL table into 3 columns, column 3 lagged on 2

Here's my problem: I want to write a query (that goes into a larger query) that takes a table like this:
ID | DATE
A | 1
A | 2
A | 3
B | 1
B | 2
and so on, and transforms it into:
ID | DATE1 | DATE2
A | 1 | 2
A | 2 | 3
A | 3 | NOW
B | 1 | 2
B | 2 | NOW
Where the numbers are dates, and NOW() is always appended to the most recent date. Given free rein I would do this in Python, but unfortunately this goes into a larger query. We're using Sybase's SQL Anywhere 12, I think? I interact with the database using SQuirreL SQL.
I'm very stumped. I thought (SQL query to transform a list of numbers into 2 columns) would help, but I'm afraid I don't know enough to make it work. I was thinking of JOINing the table to itself, but I don't know how to SELECT for only the A-1-2 rows instead of the A-1-3 rows as well, for instance, or how to insert the NOW() value into it. Does anyone have any ideas?
I made an sqlfiddle to outline a solution for your example. You mentioned dates but used integers, so I chose an integer example; it can be modified. I wrote it in PostgreSQL, so the coalesce() function may need to be substituted with nvl() or similar. Also, the fallback value '0' can be replaced with any value, including now(), but then you must change the data type of the "i" column in the table to a date as well. Please let me know if you need further help on this.
select a.id, a.i, coalesce(min(b.i),'0') from
test a
left join test b on b.id=a.id and a.i<b.i
group by a.id,a.i
order by a.id, a.i
http://sqlfiddle.com/#!15/f1fba/6
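A minimal sketch of the same self-join technique with real dates, in SQLite via Python's sqlite3 module (SQLite and the ISO-format sample dates are assumptions; the column keeps the answer's name i). Each row is paired with the smallest later date for the same id, and date('now'), which is the current UTC date in SQLite, fills in for the most recent row. ISO date strings sort chronologically, so MIN and < work on them as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id TEXT, i TEXT);
    INSERT INTO test VALUES
        ('A', '2018-01-01'), ('A', '2018-02-01'), ('A', '2018-03-01'),
        ('B', '2018-01-01'), ('B', '2018-02-01');
""")

# For each row, the next date is the smallest later date with the same id;
# rows with no later date get the current date instead.
QUERY = """
SELECT a.id,
       a.i AS date1,
       COALESCE(MIN(b.i), date('now')) AS date2
FROM test a
LEFT JOIN test b
  ON b.id = a.id AND a.i < b.i
GROUP BY a.id, a.i
ORDER BY a.id, a.i
"""

rows = conn.execute(QUERY).fetchall()
today = conn.execute("SELECT date('now')").fetchone()[0]
for row in rows:
    print(row)
```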

sql insert value from another table with original nulls but not unmatched entries

OK. So this is a hard one to explain, but I am replacing the type of a foreign key in a database. To do this I need to update the values in a table that references it. That is all fine and good, and nice and easy to do.
I'm inserting this stuff into a temporary table which will replace the original table, but the insert query isn't at all difficult, it's the select that I get the values from.
However, I also want to keep any entries where the original reference was NULL. Also not hard, I could use a LEFT JOIN for that.
But we're not done yet: I don't want the entries for which there is no match in the second table. I've been dinking around with this for 2 hours now, and am no closer to figuring this out than I am to the moon.
Let me give you an example data set:
____________________________
| Inventory || Customer |
|============||============|
| ID Cust || ID Name |
|------------||------------|
| 1 A || 1 A |
| 2 B || 2 B |
| 3 E || 3 C |
| 4 NULL || 4 D |
|____________||____________|
Let's say the database used to use the Customer.Name field as its Primary Key, and I need to change it to a standard int identity(1,1) not null ID. I've added the field with no issues in the Customer table, and kept the Name because I need it for other stuff. I have had no trouble with this in all the tables that do not allow NULLs, but since the "Inventory" table allows something to be associated with No customer, I'm running into troubles.
If I did a LEFT JOIN, my results would be:
______________
| Results |
|============|
| ID Cust |
|------------|
| 1 1 |
| 2 2 |
| 3 NULL |
| 4 NULL |
|____________|
However, Inventory #3 was referencing a customer which does not exist. I want that to be filtered out.
This database is my development database, where I hack, slash, and destroy things with wanton disregard for validity. So a lot of links in these tables are no longer valid.
The next step is replicating this process in the beta-testing environment, where bad records shouldn't exist, but I can't guarantee that. So I'd like to keep the filter, if possible.
The query I have right now is using a sub-query to find all rows in Inventory whose CustID either exists in Customers, or is null. It then tries to only grab the value from those rows which the subquery found. Here's the translated query:
insert into results
(
ID,
Cust
)
select
inv.ID, cust.ID
from Inventory inv, Customer cust
where inv.ID in
(
select inv.ID from Inventory inv, Customer cust
where inv.Cust is null
or cust.Name = inv.Cust
)
and cust.Name = inv.Cust
But, as I'm sure you can see, this query isn't right. I've tried using 2, 3 subqueries, inner joins, left joins, bleh. The results of this query, and many others I've tried (that weren't horribly, horribly wrong) are:
______________
| Results |
|============|
| ID Cust |
|------------|
| 1 1 |
| 2 2 |
|____________|
Which is essentially an inner-join. Considering my actual data has around 1100 records which have NULL values in that field, I don't think truncating them is the answer.
The answer I'm looking for is:
______________
| Results |
|============|
| ID Cust |
|------------|
| 1 1 |
| 2 2 |
| 4 NULL |
|____________|
The trickiest part of this insert into select is the fact that I'm looking to insert either a value from another table, or essentially a value from this table or the literal NULL. That just isn't something I know how to do; I'm still getting the hang of SQL.
Since I'm inserting the results of this query into a table, I've considered doing the insert using a select which leaves out the NULL values and un-matched records, then going back through and adding in all the NULL records, but I really want to learn how to do the more advanced queries like this.
So do any of yous folks have any ideas? 'Cause I'm lost.
How about a union?
Select all Inventory records whose Cust matches a Customer.Name (taking that customer's new ID), and union that with all Inventory records where Inventory.Cust is null (taking NULL).
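A minimal sketch of the union approach on the question's sample data, using Python's sqlite3 module with SQLite as a stand-in (assumption). The first branch keeps only Inventory rows whose Cust matches a customer, which drops Inventory #3; the second branch adds back the rows whose Cust is NULL. The two branches cannot overlap (one requires a non-NULL Cust, the other a NULL one), so UNION ALL is enough and skips the duplicate-removal step of UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Inventory (ID INTEGER, Cust TEXT);
    CREATE TABLE results (ID INTEGER, Cust INTEGER);
    INSERT INTO Customer VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO Inventory VALUES (1, 'A'), (2, 'B'), (3, 'E'), (4, NULL);
""")

conn.execute("""
INSERT INTO results (ID, Cust)
SELECT inv.ID, cust.ID
FROM Inventory inv
JOIN Customer cust ON cust.Name = inv.Cust  -- matched rows get the new integer key
UNION ALL
SELECT inv.ID, NULL
FROM Inventory inv
WHERE inv.Cust IS NULL                       -- rows that never had a customer
""")

rows = conn.execute("SELECT ID, Cust FROM results ORDER BY ID").fetchall()
print(rows)
```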

SELECT TOP 1 ...Some stuff... ORDER BY DESC gives different result

SELECT TOP 1 Col1,col2
FROM table ... JOIN table2
...Some stuff...
ORDER BY DESC
gives a different result compared to
SELECT Col1,col2
FROM table ... JOIN table2
...Some stuff...
ORDER BY DESC
The second query gives me some rows. When I want only the top 1 of that result, I write the first query with the TOP 1 clause, but the two give different results.
Why is the behavior different?
This isn't very clear, but I guess you mean the row returned by the first query isn't the same as the first row returned by the second query. This could be because your order by has duplicate values in it.
Say, for example, you had a table called Test
+-----+------+
| Seq | Name |
+-----+------+
| 1 | A |
| 1 | B |
| 2 | C |
+-----+------+
If you did Select * From Test Order By Seq, either of these results is valid:
+-----+------+
| Seq | Name |
+-----+------+
| 1 | A |
| 1 | B |
| 2 | C |
+-----+------+
+-----+------+
| Seq | Name |
+-----+------+
| 1 | B |
| 1 | A |
| 2 | C |
+-----+------+
With the top, you could get either row.
Having the top 1 clause could mean the query optimizer uses a completely different approach to generate the results.
I'm going to assume that you're working in SQL Server, so Laurence's answer is probably accurate. But for completeness, this also depends on what database technology you are using.
Typically, index-based databases, like SQL Server, will return results that are sorted by the index, depending on how the execution plan is created. But not all databases utilize indices.
Netezza, for example, keeps track of where data lives in the system without the concept of an index (Netezza's system architecture is quite a bit different). As a result, selecting the first record of a query can return an arbitrary record from the result set, and executing the same query multiple times will likely return a different order each time.
If you have a requirement to order data, then it is in your best interest to enforce the ordering yourself instead of relying on the arbitrary ordering that the database will use when creating its execution plan. This will make your results more predictable.
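A small sketch of enforcing the ordering yourself, using Laurence's Test table in SQLite via Python's sqlite3 module (SQLite is an assumption; LIMIT 1 stands in for TOP 1). With ORDER BY Seq alone, either (1, 'A') or (1, 'B') is a valid first row; adding Name as a tie-breaker makes the sort order total, so the first row is fully determined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (Seq INTEGER, Name TEXT);
    INSERT INTO Test VALUES (1, 'B'), (1, 'A'), (2, 'C');
""")

# Seq alone ties (1, 'A') with (1, 'B'): which one comes first is up to the engine.
ambiguous = conn.execute(
    "SELECT Seq, Name FROM Test ORDER BY Seq LIMIT 1"
).fetchone()

# A tie-breaker column makes the order total, so the first row is fixed.
deterministic = conn.execute(
    "SELECT Seq, Name FROM Test ORDER BY Seq, Name LIMIT 1"
).fetchone()

print(ambiguous, deterministic)
```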
Your first query gets one table's top row and compares it with the other table using the condition, so it will return different values compared to a normal join.