Finding the difference between two nearly identical SQL rows - sql

I am developing an application and my boss wants to track all changes that have been made to a record throughout its life.
For instance, if I have the following table:
ID Name City Item Version
1 Mike Miami Test box 1
1 Mike Fort Lauderdale Test box 2
1 Mike Sarasota Testing box 3
And I want to see that from version 1 to version 2 the city was changed to Fort Lauderdale, is there a query that will help me do that? I would really like to be able to do this without specifying all the column names individually, because the actual table has 25+ columns and they may change at any time, plus it would be nice if the query could be easily portable to different tables. Ideally my result would look like the following, but I'm willing to accept anything that would help. Thanks in advance!
ColumnName Previous Value New Value
City Miami Fort Lauderdale

Assuming that the columns are all strings (which is rather necessary for your output format), you can do this by unpivoting the data and using lag():
select c.*
from (select t.id, v.col,
             lag(v.val) over (partition by t.id, v.col order by t.version) as prev_val,
             v.val
      from t cross apply
           (values ('Name', name), ('City', city), ('Item', item)
           ) v(col, val)
     ) c
where prev_val <> val;
If you have columns that are not strings, then you'll need to convert them to strings in the values clause.
This also assumes that the values are not NULL. That can be handled, but does not seem necessary.
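As a rough, runnable illustration of the unpivot-and-lag idea, here is a sketch in Python using the standard sqlite3 module. It is a stand-in under stated assumptions, not the SQL Server query itself: SQLite has no CROSS APPLY, so the unpivot is written with UNION ALL, and the table name t and the lowercase column names are made up to mirror the sample data. It also shows one possible way to handle NULLs, using SQLite's NULL-safe IS NOT operator. LAG needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, name TEXT, city TEXT, item TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Mike", "Miami", "Test box", 1),
        (1, "Mike", "Fort Lauderdale", "Test box", 2),
        (1, "Mike", "Sarasota", "Testing box", 3),
    ],
)

query = """
WITH unpivoted AS (
    -- one row per (id, version, column); UNION ALL replaces CROSS APPLY here
    SELECT id, version, 'Name' AS col, name AS val FROM t
    UNION ALL
    SELECT id, version, 'City', city FROM t
    UNION ALL
    SELECT id, version, 'Item', item FROM t
),
with_prev AS (
    SELECT id, version, col, val,
           LAG(val) OVER (PARTITION BY id, col ORDER BY version) AS prev_val,
           LAG(version) OVER (PARTITION BY id, col ORDER BY version) AS prev_version
    FROM unpivoted
)
SELECT version, col, prev_val, val
FROM with_prev
WHERE prev_version IS NOT NULL   -- skip the first version, which has no predecessor
  AND prev_val IS NOT val        -- NULL-safe inequality in SQLite
ORDER BY version, col
"""
changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

The column list still has to be spelled out once in the unpivot step; generating it from the catalog (dynamic SQL) would be a separate exercise.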

Related

Merge SQL Rows in Subquery

I am trying to work with two tables on BigQuery. From table1 I want to find the accession ID of all records that are "World", and then from each of those accession numbers I want to create a column with every name in a separate row. Unfortunately, when I run this:
Select name
From `table2`
Where acc IN (Select acc
From `table1`
WHERE source = 'World')
Instead of getting something like this:
Acc1 Acc2 Acc3
Jeff Jeff Ted
Chris Ted Blake
Rob Jack Jack
I get something more like this:
row name
1 Jeff
2 Chris
3 Rob
4 Jack
5 Jeff
6 Jack
7 Ted
8 Blake
Ultimately, I am hoping to download the data and somehow use python or something to take each name and count the number of times it shows up with each other name at a given accession number, and furthermore measure the degree to which each pairing is also found with third names in any given column, i.e. the degree to which they share a cohort. So I need to preserve the groupings which exist with each accession number, but I am struggling to find info on how one might do this.
Could anybody point me in the right direction for this? Otherwise, is the way I am going about this wise, given that end goal?
Thanks!
This is not a direct answer to the question you asked. In general, it is easier to handle multiple rows rather than multiple columns.
So, I would recommend that you put each acc value in a separate row and then list the names as an array:
select t2.acc, array_agg(t2.name order by t2.name) as names
from `table2` t2
where t2.acc in (Select t1.acc
From `table1` t1
where t1.source = 'World'
)
group by t2.acc;
Otherwise, you are going to have a challenge just naming the columns in your result set.
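For the downstream counting the question mentions, one row per acc with a list of names is convenient to work with in Python. The following is only a sketch: the names_by_acc dictionary is a hypothetical stand-in for the exported query result, filled with names from the question's example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical export of the query result: one entry per acc, with its names.
names_by_acc = {
    "Acc1": ["Jeff", "Chris", "Rob"],
    "Acc2": ["Jeff", "Ted", "Jack"],
    "Acc3": ["Ted", "Blake", "Jack"],
}

pair_counts = Counter()
for names in names_by_acc.values():
    # sorted(set(...)) drops duplicates and gives every pair a canonical order
    for pair in combinations(sorted(set(names)), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)
```

Each key is an alphabetically ordered pair of names, and the count is the number of acc values in which both names appear.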

How to get a single value without using group by Oracle

I have this data and I need to combine all the values of the fullname field into a single row, and get a single value from the three identical values in the order field. How can I do that without using GROUP BY?
Existing data
id order fullname
1 32 Jack Stinky Potato
2 32 Kevin Enormous Cucumber
3 32 Jerald Sad Onion
Expecting result
32 Jack Stinky Potato, Kevin Enormous Cucumber, Jerald Sad Onion
With GROUP BY I would write:
select order, wm_concat(fullname) from EmployeeCards
group by order
or this, but it doesn't seem rational:
select wm_concat(unique order), wm_concat(fullname) from EmployeeCards
Just writing select (unique order), wm_concat(fullname) from EmployeeCards
doesn't work. Which aggregate function should I use to get a single value? Thanks
Use LISTAGG:
SELECT
"order",
LISTAGG(fullname, ',') WITHIN GROUP (ORDER BY id) AS fullnames
FROM EmployeeCards
GROUP BY
"order";
Also, please avoid naming your database objects (e.g. tables, columns, etc.) using reserved SQL keywords, such as ORDER.
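To make clear what the aggregation computes, here is a small Python sketch of the same grouping and concatenation. It is an illustration only, not Oracle code; the rows list simply copies the sample data from the question, and ", " is used as the separator to match the expected result.

```python
from collections import defaultdict

# Rows from the question as (id, order, fullname) tuples.
rows = [
    (1, 32, "Jack Stinky Potato"),
    (2, 32, "Kevin Enormous Cucumber"),
    (3, 32, "Jerald Sad Onion"),
]

grouped = defaultdict(list)
for _id, order, fullname in sorted(rows):  # sorting by id mirrors ORDER BY id
    grouped[order].append(fullname)

# One concatenated string per distinct order value.
result = {order: ", ".join(names) for order, names in grouped.items()}
print(result)
```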

BigQuery: grouping by similar strings for a large dataset

I have a table of invoice data with over 100k unique invoices and several thousand unique company names associated with them.
I'm trying to group these company names into more general groups to understand how many invoices they're responsible for, how often they receive them, etc.
Currently, I'm using the following code to identify unique company names:
SELECT DISTINCT(company_name)
FROM invoice_data
ORDER BY company_name
The problem is that this only gives me exact matches, when it's obvious that there are many string values in company_name that are similar. For example: McDonalds Paddington, McDonlads Oxford Square, McDonalds Peckham, etc.
How can I make my GROUP BY statement more general?
Sometimes the issue isn't as simple as the example listed above. Occasionally there is simply an extra space or a PTY/LTD suffix that throws off a GROUP BY match.
EDIT
To give an example of what I'm looking for, I'd be looking to turn the following:
company_name
----------------------
Jim's Pizza Paddington|
Jim's Pizza Oxford |
McDonald's Peckham |
McDonald's Victoria |
-----------------------
And be able to group by their company name rather than exclusively with an exact string match.
Have you tried using the Soundex function?
SELECT
SOUNDEX(name) AS code,
MAX( name) AS sample_name,
count(name) as records
FROM ((
SELECT
"Jim's Pizza Paddington" AS name)
UNION ALL (
SELECT
"Jim's Pizza Oxford" AS name)
UNION ALL (
SELECT
"McDonald's Peckham" AS name)
UNION ALL (
SELECT
"McDonald's Victoria" AS name))
GROUP BY
1
You can then use the Soundex code to create groupings, with a split or similar function to pull out the part of the string that matches the name group, or use a window function to pull back one occurrence per group to get the name string. It is not perfect, but it means you do not need to pull the data into other tools with advanced language recognition.
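If the data does end up outside the database, the "split" idea can be sketched in Python as a crude normalization key. This is a different technique from Soundex and only an assumption about what might work for this data: the group_key helper, the SUFFIXES set, and the choice of the first word as the key are all hypothetical, and the sample names add an extra space and a PTY LTD suffix to the question's examples.

```python
import re
from collections import defaultdict

# Hypothetical list of suffixes to ignore; extend as needed for real data.
SUFFIXES = {"pty", "ltd"}

def group_key(company_name: str) -> str:
    """Return a crude grouping key: the first meaningful word, lowercased."""
    words = company_name.lower().split()  # split() also collapses extra spaces
    cleaned = [re.sub(r"[^a-z0-9]", "", w) for w in words]
    cleaned = [w for w in cleaned if w and w not in SUFFIXES]
    return cleaned[0] if cleaned else ""

names = [
    "Jim's Pizza Paddington",
    "Jim's  Pizza Oxford",          # extra space on purpose
    "McDonald's Peckham",
    "McDonald's Victoria PTY LTD",  # suffix on purpose
]

groups = defaultdict(list)
for name in names:
    groups[group_key(name)].append(name)

for key, members in groups.items():
    print(key, len(members))
```

Keying on the first word is deliberately naive: it will not catch misspellings, and it will merge unrelated companies that share a first word.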

SSRS query and WHERE with multiple

I am new to SQL and SSRS. I can do many things already, but I think I must be missing some basics, so I keep banging my head against the wall.
I have a report that is almost working, but it needs to include more results, based on conditions.
My working query so far is like this:
SELECT projects.project_number, project_phases.project_phase_id, project_phases.project_phase_number, project_phases.project_phase_header, project_phase_expensegroups.projectphase_expense_total, invoicerows.invoicerow_total
FROM projects INNER JOIN
project_phases ON projects.project_id = project_phases.project_id
LEFT OUTER JOIN
project_phase_expensegroups ON project_phases.project_phase_id = project_phase_expensegroups.project_phase_id
LEFT OUTER JOIN
invoicerows ON project_phases.project_phase_id = invoicerows.project_phase_id
WHERE ( projects.project_number = @iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total >0 )
The parameter comes from a selection list that is used to choose the project for the report.
How can I also include records where
( project_phase_expensegroups.projectphase_expense_total ) is 0 but there might be invoices for that project phase?
I already tried adding another condition like this:
WHERE ( projects.project_number = @iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total > 0 )
OR
( invoicerows.invoicerow_total > 0 )
While this gives some results, including the one where projectphase_expense_total is 0, the report is a total mess.
So my question is: what am I doing wrong here?
There is a core problem with your query in that you are left joining to two tables, implying that rows may not exist, but then putting conditions on those tables, which will eliminate NULLs. That means your query is internally inconsistent as is.
The next problem is that you're joining two tables to project_phases that both may have multiple rows. Since these data are not related to each other (as proven by the fact that you have no join condition between project_phase_expensegroups and invoicerows), your query is not going to work correctly. For example, given a list of people, a list of those people's favorite foods, and a list of their favorite colors like so:
People
Person
------
Joe
Mary
FavoriteFoods
Person Food
------ ---------
Joe Broccoli
Joe Bananas
Mary Chocolate
Mary Cake
FavoriteColors
Person Color
------ ----------
Joe Red
Joe Blue
Mary Periwinkle
Mary Fuchsia
When you join these with links between Person <-> Food and Person <-> Color, you'll get a result like this:
Person Food Color
------ --------- ----------
Joe Broccoli Red
Joe Bananas Red
Joe Broccoli Blue
Joe Bananas Blue
Mary Chocolate Periwinkle
Mary Chocolate Fuchsia
Mary Cake Periwinkle
Mary Cake Fuchsia
This is essentially a cross-join, also known as a Cartesian product, between the Foods and the Colors, because they have a many-to-one relationship with each person, but no relationship with each other.
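The row multiplication is easy to reproduce. Below is a small sketch using Python's standard sqlite3 module with the People, FavoriteFoods, and FavoriteColors tables from the example; SQLite is just a convenient stand-in for the real database, and the join behaves the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (Person TEXT);
CREATE TABLE FavoriteFoods (Person TEXT, Food TEXT);
CREATE TABLE FavoriteColors (Person TEXT, Color TEXT);
INSERT INTO People VALUES ('Joe'), ('Mary');
INSERT INTO FavoriteFoods VALUES
    ('Joe', 'Broccoli'), ('Joe', 'Bananas'), ('Mary', 'Chocolate'), ('Mary', 'Cake');
INSERT INTO FavoriteColors VALUES
    ('Joe', 'Red'), ('Joe', 'Blue'), ('Mary', 'Periwinkle'), ('Mary', 'Fuchsia');
""")

rows = conn.execute("""
SELECT p.Person, f.Food, c.Color
FROM People p
LEFT JOIN FavoriteFoods f ON f.Person = p.Person
LEFT JOIN FavoriteColors c ON c.Person = p.Person
""").fetchall()

# Count how many result rows each person produces: 2 foods x 2 colors each.
rows_per_person = {}
for person, _food, _color in rows:
    rows_per_person[person] = rows_per_person.get(person, 0) + 1

print(len(rows), rows_per_person)
```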
There are a few ways to deal with this in the report.
Create ExpenseGroup and InvoiceRow subreports, that are called from the main report by a combination of project_id and project_phase_id parameters.
Summarize one or the other set of data into a single value. For example, you could sum the invoice rows. Or, you could concatenate the expense groups into a single string separated by commas.
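The second option (summarizing each set before joining) could look roughly like the sketch below. It uses Python's sqlite3 module and hypothetical, simplified tables named phases, expenses, and invoices rather than the real schema, so treat the names and numbers as assumptions. The naive query shows how the sums get inflated; the second query aggregates each child table in a derived table first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phases (phase_id INTEGER);
CREATE TABLE expenses (phase_id INTEGER, expense_total INTEGER);
CREATE TABLE invoices (phase_id INTEGER, invoice_total INTEGER);
INSERT INTO phases VALUES (1), (2);
INSERT INTO expenses VALUES (1, 100), (1, 50), (2, 0);
INSERT INTO invoices VALUES (1, 30), (1, 20), (2, 70);
""")

# Naive: both child tables joined directly, so each expense row is repeated
# once per invoice row (and vice versa) before SUM runs.
naive = conn.execute("""
SELECT p.phase_id, SUM(e.expense_total), SUM(i.invoice_total)
FROM phases p
LEFT JOIN expenses e ON e.phase_id = p.phase_id
LEFT JOIN invoices i ON i.phase_id = p.phase_id
GROUP BY p.phase_id
ORDER BY p.phase_id
""").fetchall()

# Pre-aggregated: each child table is reduced to one row per phase first.
summarized = conn.execute("""
SELECT p.phase_id,
       COALESCE(e.expense_total, 0),
       COALESCE(i.invoice_total, 0)
FROM phases p
LEFT JOIN (SELECT phase_id, SUM(expense_total) AS expense_total
           FROM expenses GROUP BY phase_id) e ON e.phase_id = p.phase_id
LEFT JOIN (SELECT phase_id, SUM(invoice_total) AS invoice_total
           FROM invoices GROUP BY phase_id) i ON i.phase_id = p.phase_id
ORDER BY p.phase_id
""").fetchall()

print(naive)
print(summarized)
```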
Some notes:
Please, please format your query before posting it in a question. It is almost impossible to read when not formatted. It seems pretty clear that you're using a GUI to create the query, but do us the favor of not having to format it ourselves just to help you.
While formatting, please use aliases; don't use full table names. They just make the query that much harder to understand.
You need an extra pair of parentheses in your WHERE clause in order to get the logic right.
WHERE ( projects.project_number = @iProjectNumber )
AND (
(project_phase_expensegroups.projectphase_expense_total > 0)
OR
(invoicerows.invoicerow_total > 0)
)
Also, you're using a column in your WHERE clause from a table that is left joined without checking for NULLs. That basically makes it a (slow) inner join. If you want to include rows that don't match from that table you also need to check for NULL. Any other comparison besides IS NULL will always be false for NULL values. See this page for more information about SQL's three value predicate logic: http://www.firstsql.com/idefend3.htm
To keep your LEFT JOINs working as you intended you would need to do this:
WHERE ( projects.project_number = @iProjectNumber )
AND (
project_phase_expensegroups.projectphase_expense_total > 0
OR project_phase_expensegroups.project_phase_id IS NULL
OR invoicerows.invoicerow_total > 0
OR invoicerows.project_phase_id IS NULL
)
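The effect of filtering on a left-joined column can be seen in a tiny sketch. It uses Python's sqlite3 module with hypothetical phases and expenses tables (simplified stand-ins for the real schema); phase 3 deliberately has no expense row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phases (phase_id INTEGER);
CREATE TABLE expenses (phase_id INTEGER, expense_total INTEGER);
INSERT INTO phases VALUES (1), (2), (3);
INSERT INTO expenses VALUES (1, 100), (2, 0);  -- phase 3 has no expense row
""")

base = """
SELECT p.phase_id
FROM phases p
LEFT JOIN expenses e ON e.phase_id = p.phase_id
WHERE {condition}
ORDER BY p.phase_id
"""

# Without a NULL check, the unmatched phase 3 is dropped: NULL > 0 is not true.
filtered = conn.execute(base.format(condition="e.expense_total > 0")).fetchall()

# With the NULL check, phase 3 survives the WHERE clause.
kept = conn.execute(
    base.format(condition="e.expense_total > 0 OR e.phase_id IS NULL")
).fetchall()

print(filtered, kept)
```

Phase 2 is excluded in both cases because its expense row exists and its total is 0.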
I found the solution, and it was kind of easy after all. I changed only the second LEFT OUTER JOIN to an INNER JOIN and left out the condition that restricted the query to results over zero. I also used SELECT DISTINCT.
Now my report is working perfectly.

SQL Server - copy data across tables, but only where it matches a specific column

For example, I have these 2 tables:
dbo.fc_states
StateId Name
6316 Alberta
6317 British Columbia
and dbo.fc_Query
Name StatesName StateId
Abbotsford Quebec NULL
Abee Alberta NULL
100 Mile House British Columbia NULL
OK, pretty straightforward: how do I copy the StateId over from fc_states to fc_Query, matching on StatesName? Let's say the result would be
Name StatesName StateId
Abee Alberta 6316
100 Mile House British Columbia 6317
Thanks. Both state name columns are of type text.
How about:
update fc_Query set StateId =
(select StateId from fc_states where fc_states.Name = fc_Query.StatesName)
That should give you the result you're looking for.
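Here is a quick way to try the correlated update on the sample data, sketched with Python's sqlite3 module. SQLite stands in for SQL Server here, so this only checks the logic of the statement, not SQL Server specifics such as the text column type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fc_states (StateId INTEGER, Name TEXT);
CREATE TABLE fc_Query (Name TEXT, StatesName TEXT, StateId INTEGER);
INSERT INTO fc_states VALUES (6316, 'Alberta'), (6317, 'British Columbia');
INSERT INTO fc_Query VALUES
    ('Abbotsford', 'Quebec', NULL),
    ('Abee', 'Alberta', NULL),
    ('100 Mile House', 'British Columbia', NULL);

-- Same statement as in the answer: look up the StateId by state name.
UPDATE fc_Query SET StateId =
    (SELECT StateId FROM fc_states WHERE fc_states.Name = fc_Query.StatesName);
""")

result = conn.execute(
    "SELECT Name, StatesName, StateId FROM fc_Query ORDER BY rowid"
).fetchall()
for row in result:
    print(row)
```

Note that a row with no matching state (Quebec here) gets NULL, because the subquery returns no row. That is harmless in this sample, but the statement would also overwrite an existing StateId with NULL in that situation.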
This is a different way from what Eddie did. I like MERGE for updates if they're not dead simple (and I wouldn't consider yours dead simple). So if you're bored or curious, also try:
WITH stateIds as
(SELECT name, MAX(stateID) as stID
FROM fc_states
GROUP BY name)
MERGE fc_Query
USING stateIds
ON stateIds.name = fc_Query.statesname
WHEN MATCHED THEN UPDATE
SET fc_Query.stateid = convert(int, stID)
;
The first part, from "WITH" to "GROUP BY name)", is a CTE. It creates a table-like thing, a name 'stateIds' that is as good as a table for the immediately following part of the query, in which there's guaranteed to be only one row per state name. Then the MERGE looks for anything in fc_Query with a matching name, and if there's a match, it sets the value as you want. You can make a small edit if you don't want to overwrite existing stateids in fc_Query:
WITH stateIds as
(SELECT name, MAX(stateID) as stID
FROM fc_states
GROUP BY name)
MERGE fc_Query
USING stateIds
ON stateIds.name = fc_Query.statesname
AND fc_Query.stateid IS NULL
WHEN MATCHED THEN UPDATE
SET fc_Query.stateid = convert(int, stID)
;
And you can have it do something different to rows that don't match. So I think MERGE is good for a lot of applications. You need a semicolon at the end of MERGE statements, and you have to guarantee that there will only be one match or zero matches in the source (that is "stateids", my CTE) for each row in the target; if there's more than one match some horrible thing happens, Satan wins or the US economy falters, I'm not sure what, just never let it happen.
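One way to check the "at most one match" requirement before running a MERGE is to look for duplicate names in the source. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server, and the extra (9999, 'Alberta') row is invented purely to trigger the duplicate case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fc_states (StateId INTEGER, Name TEXT);
INSERT INTO fc_states VALUES
    (6316, 'Alberta'), (6317, 'British Columbia'),
    (9999, 'Alberta');  -- hypothetical duplicate name
""")

# Names that appear more than once would give the MERGE multiple source matches.
duplicates = conn.execute("""
SELECT Name, COUNT(*) AS n
FROM fc_states
GROUP BY Name
HAVING COUNT(*) > 1
""").fetchall()

# The GROUP BY / MAX step, as in the CTE, reduces the source to one row per name.
deduped = conn.execute("""
SELECT Name, MAX(StateId)
FROM fc_states
GROUP BY Name
ORDER BY Name
""").fetchall()

print(duplicates, deduped)
```

Taking MAX is an arbitrary tie-break; if duplicates really exist, it is worth deciding which StateId is the right one rather than relying on it.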