Need help going from Excel Array formulas to SQL Join statements

Here is the table (new user, can't post images yet):
I've always used Excel, but I'm switching over to SQL because of larger data sets. To populate cell G2 in Excel, I would have used something like this:
=SUM((A2:A8=E2)*(B2:B8))
That gives a value of 141 in Excel for Cell G2.
I figured out how to get this to work in SQL, but I don't exactly understand why this works. Here's what I used in SQL:
UPDATE Table2
SET Table2.total_units_purchased = Table1.some_number
FROM Table2 INNER JOIN
(
SELECT Table1.item
, SUM(Table1.units_purchased) AS some_number
FROM Table1
GROUP BY Table1.item
) Table1
ON Table2.item = Table1.item
Is the "AS" required before some_number? This still works for some reason even if I omit "AS".
Am I missing anything here? Does the order of the tables in the last line of code matter?
Thanks for any help.

The AS keyword before an alias is optional, which is why your query still works when you omit it.
Secondly, the order of the operands in the ON condition on the last line does not matter: ON Table1.item = Table2.item behaves exactly the same. What does matter for LEFT and RIGHT outer joins is the order of the tables in the FROM/JOIN clause, because that decides which table keeps its unmatched rows.
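Both points are easy to check on throwaway data. The sketch below uses Python's built-in sqlite3 module and made-up tables (assumptions; the original table isn't available). Swapping the operands of = in ON changes nothing, while swapping which table sits on the left of a LEFT JOIN changes which unmatched rows survive.

```python
import sqlite3

# Made-up data: 'pear' exists only in Table1, 'plum' only in Table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (item TEXT);
CREATE TABLE Table2 (item TEXT);
INSERT INTO Table1 VALUES ('apple'), ('pear');
INSERT INTO Table2 VALUES ('apple'), ('plum');
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Operand order inside ON: same result either way.
inner_a = run("SELECT Table2.item FROM Table2 INNER JOIN Table1 ON Table2.item = Table1.item")
inner_b = run("SELECT Table2.item FROM Table2 INNER JOIN Table1 ON Table1.item = Table2.item")

# Table order in a LEFT JOIN: the left-hand table keeps its unmatched rows.
left_1_2 = run("""SELECT Table1.item, Table2.item FROM Table1
                  LEFT JOIN Table2 ON Table1.item = Table2.item ORDER BY Table1.item""")
left_2_1 = run("""SELECT Table1.item, Table2.item FROM Table2
                  LEFT JOIN Table1 ON Table1.item = Table2.item ORDER BY Table2.item""")

print(inner_a, inner_b)  # [('apple',)] [('apple',)]
print(left_1_2)          # [('apple', 'apple'), ('pear', None)]
print(left_2_1)          # [('apple', 'apple'), (None, 'plum')]
```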
In your SQL code, the subquery selects each item and the sum of the units purchased for that item; this derived table is then given the alias Table1 again.
Then you join Table2 to that derived Table1 on rows where the item values match.
Finally, the UPDATE assigns the summed value (some_number) to Table2's total_units_purchased column.
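A more direct translation of your Excel formula is a correlated subquery: for each row of Table2, it sums units_purchased over the Table1 rows with the same item.
UPDATE Table2
SET total_units_purchased = (SELECT SUM(Table1.units_purchased)
FROM Table1
WHERE Table1.item = Table2.item)
Unlike the INNER JOIN version, this also updates items that have no purchases; those get NULL.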
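To compare both SQL forms with the Excel result, here is a small sketch using Python's sqlite3 module with made-up rows (an assumption, since the original table image is missing). It runs the question's grouped derived-table join as a plain SELECT, with AS deliberately omitted, and then a correlated-subquery UPDATE that computes the same totals row by row. The COALESCE(..., 0) is an addition so that items without purchases get 0, as the Excel SUM would return, instead of NULL.

```python
import sqlite3

# Made-up rows standing in for the missing table; 'kiwi' has no purchases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (item TEXT, units_purchased INTEGER);
CREATE TABLE Table2 (item TEXT, total_units_purchased INTEGER);
INSERT INTO Table1 VALUES ('apple', 10), ('pear', 5), ('apple', 7), ('plum', 3), ('pear', 1);
INSERT INTO Table2 (item) VALUES ('apple'), ('pear'), ('plum'), ('kiwi');
""")

# The question's approach: aggregate once per item, then join. AS is omitted on purpose.
grouped = conn.execute("""
SELECT Table2.item, Table1.some_number
FROM Table2
INNER JOIN (SELECT item, SUM(units_purchased) some_number
            FROM Table1 GROUP BY item) Table1
  ON Table2.item = Table1.item
ORDER BY Table2.item
""").fetchall()

# Row-by-row equivalent of =SUM((A2:A8=E2)*(B2:B8)); COALESCE turns "no rows" into 0.
conn.execute("""
UPDATE Table2
SET total_units_purchased = COALESCE(
    (SELECT SUM(t1.units_purchased) FROM Table1 AS t1 WHERE t1.item = Table2.item), 0)
""")
totals = conn.execute(
    "SELECT item, total_units_purchased FROM Table2 ORDER BY item").fetchall()

print(grouped)  # [('apple', 17), ('pear', 6), ('plum', 3)]
print(totals)   # [('apple', 17), ('kiwi', 0), ('pear', 6), ('plum', 3)]
```

Note that the INNER JOIN drops 'kiwi' entirely, which is why an UPDATE built on that join leaves such rows untouched.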

Related

Improve SQL script

I would like to know if there is a better way to write the example script stated below.
Table 1 has 1 line for every item.
Table 2 has 1 line for every physique available of an item.
I would write the SQL below, but with about 18 physique values the join section grows accordingly. I can join the table without specifying the Physique, but that leaves me with a dataset whose rows are exploded, so I then need a DISTINCT or GROUP BY.
select
t2.ItemID, t2.Name, t1_width.Target as 'Width', t1_length.Target as 'Length'
from
t2
left join t1 as t1_width on t1_width.ItemID = t2.ItemID and t1_width.Physique = 'Width'
left join t1 as t1_length on t1_length.ItemID = t2.ItemID and t1_length.Physique = 'Length'
Is there a better way to pick the right values in the SELECT, or to do this with a single join?
You can use PIVOT in this case. PIVOT turns rows into columns, so you only need a single join ON t1.ItemID = t2.ItemID.
To start with, run
SELECT DISTINCT Physique FROM table2
to get the pivot column values. The example below even includes a query that generates this column list for you.
[Use this example to build the query]
Quick tip: in the PIVOT aggregate you can use either MAX(Target) or COUNT(Target), depending on the dataset you are trying to generate.
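SQLite has no PIVOT operator, so the sketch below swaps in conditional aggregation (MAX(CASE ...) per output column), a portable technique that produces the same rows-to-columns result with one join and a GROUP BY. Table and column names follow the question's query; the data is made up and run through Python's sqlite3 module. Adding a physique means adding one MAX(CASE ...) line rather than another join.

```python
import sqlite3

# Made-up data using the question's names: t2 = items, t1 = one row per item/physique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t2 (ItemID INTEGER, Name TEXT);
CREATE TABLE t1 (ItemID INTEGER, Physique TEXT, Target INTEGER);
INSERT INTO t2 VALUES (1, 'Box'), (2, 'Tube');
INSERT INTO t1 VALUES (1, 'Width', 10), (1, 'Length', 20), (2, 'Length', 55);
""")

# One LEFT JOIN, then collapse each item's rows into one row with a column per physique.
rows = conn.execute("""
SELECT t2.ItemID, t2.Name,
       MAX(CASE WHEN t1.Physique = 'Width'  THEN t1.Target END) AS Width,
       MAX(CASE WHEN t1.Physique = 'Length' THEN t1.Target END) AS Length
FROM t2
LEFT JOIN t1 ON t1.ItemID = t2.ItemID
GROUP BY t2.ItemID, t2.Name
ORDER BY t2.ItemID
""").fetchall()

print(rows)  # [(1, 'Box', 10, 20), (2, 'Tube', None, 55)]
```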

How to join and sum tables without losing data?

Here's an example of what I'm trying to do.
Select (t1.count+t2.count) as countTotal from t1 LEFT JOIN t2 ON (t1.ID = t2.ID);
I'm doing this on a much larger scale with many variables added together. The problem I'm getting is that if one of the IDs is not in one of the tables I'm combining, the whole row for that ID comes back blank. My goal is to sum the two tables together for the most part but if one of the rows is only in one table, how can I keep that data in the resulting query?
Use NZ():
Select nz(t1.count) + nz(t2.count) as countTotal
from . . .
This replaces the NULL values with 0 so the + works; otherwise + returns NULL whenever either value is NULL.
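Assuming the ID can be missing from either table, you'll need to use a FULL OUTER JOIN. Be aware that Access SQL itself does not support FULL OUTER JOIN, so in Access it has to be emulated, for example by combining a LEFT JOIN and a RIGHT JOIN with UNION.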
If the row is missing from one table, t1.count + t2.count will return Null.
According to the documentation, the default return value from Nz for when the value is Null is 0 or a zero-length string. I prefer to write clearer code, so I specify the value.
SELECT Nz(t1.count, 0) + Nz(t2.count, 0) as countTotal
FROM t1
FULL OUTER JOIN t2 on t1.ID = t2.ID
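Where a FULL OUTER JOIN isn't available, two alternatives produce the same totals. The sketch below runs both through Python's sqlite3 module with made-up rows (assumptions). The first stacks both tables with UNION ALL and groups by ID, so there is no join and no NULL handling at all. The second emulates the full join as a LEFT JOIN plus the rows that exist only in t2; in Access you would typically write that second half as a RIGHT JOIN, but older SQLite versions lack RIGHT JOIN, so the tables are swapped instead. COALESCE plays the role of Access's Nz(value, 0).

```python
import sqlite3

# Made-up rows: ID 1 is only in t1, ID 3 only in t2, ID 2 in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER, count INTEGER);
CREATE TABLE t2 (ID INTEGER, count INTEGER);
INSERT INTO t1 VALUES (1, 5), (2, 3);
INSERT INTO t2 VALUES (2, 4), (3, 7);
""")

# Option 1: stack both tables, then total per ID.
stacked = conn.execute("""
SELECT ID, SUM(count) AS countTotal
FROM (SELECT ID, count FROM t1
      UNION ALL
      SELECT ID, count FROM t2) AS combined
GROUP BY ID
ORDER BY ID
""").fetchall()

# Option 2: LEFT JOIN keeps every t1 ID; the second query adds IDs found only in t2.
emulated = conn.execute("""
SELECT t1.ID, COALESCE(t1.count, 0) + COALESCE(t2.count, 0) AS countTotal
FROM t1 LEFT JOIN t2 ON t1.ID = t2.ID
UNION ALL
SELECT t2.ID, COALESCE(t2.count, 0) AS countTotal
FROM t2 LEFT JOIN t1 ON t1.ID = t2.ID
WHERE t1.ID IS NULL
ORDER BY 1
""").fetchall()

print(stacked)   # [(1, 5), (2, 7), (3, 7)]
print(emulated)  # [(1, 5), (2, 7), (3, 7)]
```

Option 1 assumes each ID appears at most once per table, or that you want all of its rows summed; a join would instead multiply repeated IDs.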
I had to do something similar and ended up exporting it all to excel and carefully combining the data manually, it took me like 4 hours. That was only for 1200 records though, and I was working with 5 tables that all had different names for the matching columns, huge mess.
Maybe you could try using a make-table query, but if you have a large number of fields then this isn’t feasible, and I’m not 100% sure it would even work.
The next best option would be to export the tables and import them into an SQL IDE that supports full outer join. Then make a table with that full join and export it back to Access. This would definitely work. But it can be tricky to import tables to an SQL IDE, I’ve had trouble with it in the past. But once I got them imported, it was smooth sailing.

T-SQL - Include table name (or alias) concatenated with column name in result set?

Using SSMS when joining 3+ tables and using SELECT *, I'm wondering if there is an easy way (dynamic) to include the table name & column name in the result set without having to type out all the desired columns.
For example:
Table1
Table2
Table3
SELECT *
FROM Table1 t1
LEFT JOIN Table2 t2 ON t2.keyA = t1.keyA
LEFT JOIN Table3 t3 ON t2.keyB = t3.keyB
Trying to produce output like
Table1-Column1, Table2-Column1, Table3-Column1
OR
t1.Column1, t2.Column1, t3.Column1
If you have a lot of columns, there is an easy way to do it, but not dynamically. If you have only a few columns, it isn't worth the trouble; just type them manually.
Take a look at these two methods: https://blog.sqlauthority.com/2012/06/06/sql-server-tricks-to-replace-select-with-column-names-sql-in-sixty-seconds-017-video/
The first method (drag and drop the columns) isn't useful in your case, because we need every column on its own line, all aligned. The second (generating the CREATE TABLE script) is the one we need.
Generate the create table for table1 and copy-paste the columns needed from table1 into your query, and repeat for table2 and table3. Make sure they are all aligned in the same editor column.
Then comes "the magic trick": press and hold the Alt key, click between the comma and the first letter of the first column name of t1, and drag down to select the lines from the first column of t1 to the last column of t1. Type "t1.". That's it. Repeat for t2 and t3.
This isn't specific to SQL Server or SSMS; it works in any decent editor that supports multi-line editing.
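The prefixed column list can also be generated from the catalog instead of typed. A minimal sketch, assuming SQLite through Python's sqlite3 module, where PRAGMA table_info lists a table's columns; in SQL Server the same list would come from sys.columns or INFORMATION_SCHEMA.COLUMNS. The helper prefixed_columns and the table contents are hypothetical.

```python
import sqlite3

def prefixed_columns(conn, table, alias):
    """Hypothetical helper: return 'alias."col" AS "alias.col"' for every column of table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return [f'{alias}."{c}" AS "{alias}.{c}"' for c in cols]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (keyA INTEGER, Column1 TEXT);
CREATE TABLE Table2 (keyA INTEGER, keyB INTEGER, Column1 TEXT);
CREATE TABLE Table3 (keyB INTEGER, Column1 TEXT);
INSERT INTO Table1 VALUES (1, 'a');
INSERT INTO Table2 VALUES (1, 10, 'b');
INSERT INTO Table3 VALUES (10, 'c');
""")

select_list = []
for table, alias in [("Table1", "t1"), ("Table2", "t2"), ("Table3", "t3")]:
    select_list += prefixed_columns(conn, table, alias)

sql = (
    "SELECT " + ",\n       ".join(select_list) + "\n"
    "FROM Table1 t1\n"
    "LEFT JOIN Table2 t2 ON t2.keyA = t1.keyA\n"
    "LEFT JOIN Table3 t3 ON t2.keyB = t3.keyB"
)
print(sql)  # paste into the editor, or run it directly

cur = conn.execute(sql)
names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(names)  # ['t1.keyA', 't1.Column1', 't2.keyA', 't2.keyB', 't2.Column1', 't3.keyB', 't3.Column1']
```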

Why does Oracle SQL update query return "invalid identifier" on existing column?

I have an update query for an Oracle SQL db. Upon execution the query returns ORA-00904: "t1"."sv_id": invalid identifier
So, why do I get an "invalid identifier" error message although the column exists?
Here is the complete query (I replaced the actual table and column names with dummies in Notepad++):
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
ORDER BY d.creationTimestamp ASC)
WHERE ROWNUM = 1)
END
Now I don't understand why that error occurs.
Here is what I already know:
The queries in the CASE expression work when executed separately, provided they are wrapped in a query that provides table_1 t1.
t1.s_id seems to work, since Oracle doesn't complain about it. When I change it to a column that really doesn't exist, Oracle complains about that non-existent column before saying anything about t1.sv_id. So the alias may somehow work, although I'm not sure about it.
I'm 100% sure that the column t1.sv_id exists and that there is no typo. I executed a query on t1 directly and double-checked everything in Notepad by marking all occurrences.
A (completely unrelated) update query like the following also works (note that the alias t1 is used in the subquery). Don't assume table_1/table_2 are the same tables as in the update query above; I just reused the names. This only illustrates that I have successfully used an alias in an update query before.
update table_1 t1 set (t2_id) = (select id from table_2 t2 where t1.id = t2.t1_id)
UPDATE
Thanks a lot for pointing me to the "you can't access aliases in deeper subquery layers" issue. That got me back on track pretty fast.
So here is the query I ended up with, which seems to work fine. It eliminates the access to t1 in the deeper layers and selects the oldest row, so it should return the same result I expected from the ELSE part of the original query.
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id
AND d.sv_id = t1.sv_id
AND d.creation = (SELECT MIN(id.creation) FROM table_2 id
WHERE d.s_id = id.s_id AND d.sv_id = id.sv_id))
END
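The logic of the rewritten statement can be checked on toy data. The sketch below uses Python's sqlite3 module with made-up rows; note that SQLite, unlike Oracle, does accept correlation names two levels deep, so it won't reproduce ORA-00904. It only confirms that the CASE falls back to table_3 when there is no table_2 row and otherwise picks the oldest table_2 row. Column names follow the final query (creation); the sketch writes SET type = ... and refers to table_1 by name rather than through the t1 alias.

```python
import sqlite3

# Made-up rows: (s_id=1, sv_id=100) has two table_2 rows; (2, 200) has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (s_id INTEGER, sv_id INTEGER, type TEXT);
CREATE TABLE table_2 (id INTEGER, s_id INTEGER, sv_id INTEGER, type TEXT, creation INTEGER);
CREATE TABLE table_3 (id INTEGER, type TEXT);
INSERT INTO table_1 VALUES (1, 100, NULL), (2, 200, NULL);
INSERT INTO table_2 VALUES (1, 1, 100, 'newer', 20), (2, 1, 100, 'oldest', 10);
INSERT INTO table_3 VALUES (100, 'from_t3_100'), (200, 'from_t3_200');
""")

conn.execute("""
UPDATE table_1 SET type =
  CASE
    -- No table_2 row for this (s_id, sv_id): fall back to table_3.
    WHEN (SELECT COUNT(dc.id) FROM table_2 dc
          WHERE dc.s_id = table_1.s_id AND dc.sv_id = table_1.sv_id) = 0
    THEN (SELECT sv.type FROM table_3 sv WHERE sv.id = table_1.sv_id)
    -- Otherwise take the oldest table_2 row; the innermost subquery
    -- correlates only with d, one level up, not with table_1.
    ELSE (SELECT d.type FROM table_2 d
          WHERE d.s_id = table_1.s_id AND d.sv_id = table_1.sv_id
            AND d.creation = (SELECT MIN(x.creation) FROM table_2 x
                              WHERE x.s_id = d.s_id AND x.sv_id = d.sv_id))
  END
""")
result = conn.execute("SELECT s_id, type FROM table_1 ORDER BY s_id").fetchall()
print(result)  # [(1, 'oldest'), (2, 'from_t3_200')]
```

If two table_2 rows share the same minimum creation value, the ELSE subquery returns more than one row; Oracle would then raise ORA-01427, so a tie-breaker may be needed.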
You can't reference a table alias in a subquery of a subquery; the alias doesn't apply (or doesn't exist, or isn't in scope, depending on how you prefer to look at it). With the code you posted the error is reported against line 11 character 24, which is:
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
^^^^^^^^
If you change the t1.s_id reference on the same line to something invalid then the error doesn't change and is still reported as ORA-00904: "T1"."SV_ID": invalid identifier. But if you change the same reference on line 5 instead to something like
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_idXXX AND dateCheck.sv_id = t1.sv_id) = 0)
... then the error changes to ORA-00904: "T1"."S_IDXXX": invalid identifier. This is down to how the statement is being parsed. In your original version the subquery in the WHEN clause is valid, and you only break it by changing that identifier. The subquery in the ELSE is also OK. But the nested subquery in the ELSE has the problem, and changing the t1.s_id in that doesn't make any difference because the parser reads that part of the statement backwards (I don't know, or can't remember, why!).
So you have to eliminate the nested subquery. A general approach would be to make the whole CASE an inline view which you can then join using s_id and sv_id, but that's complicated as there may be no matching table_2 record (based on your count); and there may be no s_id value to match against as that isn't being checked in table_3.
It isn't clear whether there will always be a table_3 record even when there is a table_2 record, or whether they're mutually exclusive. If I've understood what the CASE is doing, then I think you can use an outer join between those two tables and compare the combined data with the row you're updating, but because of that ambiguity it needs to be a full outer join. I think.
Here's a stab at using that construct with a MERGE instead of an update.
MERGE INTO table_1 t1
USING (
SELECT t2.s_id,
coalesce(t2.sv_id, t3.id) as sv_id,
coalesce(t2.type, t3.type) as type,
row_number() over (partition by t2.s_id, coalesce(t2.sv_id, t3.id)
order by t2.creationtimestamp) as rn
FROM table_2 t2
FULL OUTER JOIN table_3 t3
ON t3.id = t2.sv_id
) tmp
ON ((tmp.s_id is null OR tmp.s_id = t1.s_id) AND tmp.sv_id = t1.sv_id AND tmp.rn = 1)
WHEN MATCHED THEN UPDATE SET t1.type = tmp.type;
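The rn = 1 filter in the USING subquery is the usual "first row per group" pattern. A small sketch of just that piece, assuming Python's sqlite3 module (window functions need SQLite 3.25 or later) and made-up table_2 rows: each (s_id, sv_id) group is numbered oldest first, and only the first row of each group is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_2 (s_id INTEGER, sv_id INTEGER, type TEXT, creationtimestamp INTEGER);
INSERT INTO table_2 VALUES
  (1, 100, 'newer', 20),
  (1, 100, 'oldest', 10),
  (2, 100, 'only', 5);
""")

# Number rows within each (s_id, sv_id) group, oldest first, then keep rn = 1.
oldest = conn.execute("""
SELECT s_id, sv_id, type FROM (
  SELECT s_id, sv_id, type,
         ROW_NUMBER() OVER (PARTITION BY s_id, sv_id
                            ORDER BY creationtimestamp) AS rn
  FROM table_2
) AS numbered
WHERE rn = 1
ORDER BY s_id, sv_id
""").fetchall()

print(oldest)  # [(1, 100, 'oldest'), (2, 100, 'only')]
```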
If there will always be a table_3 record then you could use that as the driver and have a left outer join to table_2 instead, but hard to tell which might be appropriate. So this is really just a starting point.
SQL Fiddle with some made-up data that I believe would have hit both branches of your case. More realistic data would expose the flaws and misunderstandings, and suggest a more robust (or just more correct) approach...
Your query and your analysis seem sound to me. I have no solution, but here are a few things you can try that might reveal what causes this odd behavior:
Quote the column (just in case it happens to be a SQL keyword).
Use table_1.sv_id - this works as long as the whole query contains this table only once.
Make sure that the alias t1 exists only once
Run the query with a query tool like SQuirreL SQL; the tool can show the exact position where Oracle reports the problem. Maybe it's in a different place in the query than you think.
Check () and make sure they are around the parts where they should be.
Swap the order of expressions around =

BigQuery - joining on a repeated field

I'm trying to run a join on a repeated field.
Originally I get an error:
Cannot join on repeated field payload.pages.action
I fix this by running flatten on the relevant table (this is only an example query - it will give empty result if it would successfully run):
SELECT
t1.repository.forks
FROM publicdata:samples.github_nested t1
left join each flatten(publicdata:samples.github_nested,payload.pages) t2
on t2.payload.pages.action=t1.repository.url
I get a different error:
Table wildcard function 'FLATTEN' can only appear in FROM clauses
This used to work in the past. Is there some syntax change?
I don't think there has been a syntax change, but you should be able to wrap the flatten statement in a subselect. That is,
SELECT
t1.repository.forks
FROM publicdata:samples.github_nested t1
left join each (SELECT * FROM flatten(publicdata:samples.github_nested,payload.pages)) t2
on t2.payload.pages.action=t1.repository.url