Selecting a single row from a column that has multiple rows - sql

I'm a SQL newbie, so bear with me.
I am writing a SELECT statement that pulls data from multiple tables. That part works, but when I select one specific column I get duplicate rows, because that column can legitimately have several rows per student. What I want is to pick the most appropriate row and return only that one.
My code so far:
Select
a.[StudentId], a.[Name], a.[StartDT], a.[EndDT],
b.[ClassID], b.[Module], b.[ModStart], b.[ModEnd]
from
[Data].[StudentTbl] a
left join
[Data].[ClassTbl] b on a.[StudentId] = b.[Student_ID]
When I select b.[Module] I get multiple rows, as there can be a number of modules per class. However, I want to select the b.[Module] the student completed before leaving.
Essentially, if a.[EndDT] is equal to b.[ModEnd], I need that specific row. The MAX function doesn't always work because of data-quality issues in ClassTbl: when a student has left, a row saying N/A is inserted after the last module.
What I'm currently getting is this:
What I want to get eventually:
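One possible way to express the condition described above (keep only the module whose ModEnd equals the student's EndDT) is to add it to the join. The sketch below runs the query against SQLite through Python purely for illustration. The sample rows are invented, the [Data] schema prefix is dropped, and the dates are stored as text; the real tables may need a different comparison.

```python
import sqlite3

# Hypothetical sample data. Table and column names follow the question,
# but the [Data] schema prefix is omitted for this in-memory SQLite demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentTbl (StudentId INTEGER, Name TEXT, StartDT TEXT, EndDT TEXT);
CREATE TABLE ClassTbl (Student_ID INTEGER, ClassID INTEGER, Module TEXT,
                       ModStart TEXT, ModEnd TEXT);
INSERT INTO StudentTbl VALUES
  (1, 'Ann', '2020-01-01', '2020-06-30'),
  (2, 'Bob', '2020-01-01', '2020-05-31');
INSERT INTO ClassTbl VALUES
  (1, 10, 'Maths',   '2020-01-01', '2020-03-31'),
  (1, 10, 'Physics', '2020-04-01', '2020-06-30'),
  (1, 10, 'N/A',     '2020-07-01', '2020-07-01');
""")

# The date match sits in the ON clause rather than in WHERE, so students
# without a matching module are still returned (with NULLs from b).
rows = conn.execute("""
SELECT a.[StudentId], a.[Name], b.[Module], b.[ModEnd]
FROM StudentTbl a
LEFT JOIN ClassTbl b
  ON a.[StudentId] = b.[Student_ID]
 AND a.[EndDT] = b.[ModEnd]
ORDER BY a.[StudentId]
""").fetchall()

for row in rows:
    print(row)
```

With this data the trailing N/A row is ignored because its ModEnd does not match the student's EndDT.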

Related

Data reconciliation between 2 datasets on SQL

image_table
I currently need to find all the differences between a new_master dataset and a previous one using Oracle SQL. The datasets have the same structure, consist of both integers and strings, and do not have a unique key id unless I combine several columns. You can see an image at the beginning as image_table. I found this code online and wanted to ask if you have any advice.
SELECT n.*
FROM new_master n
LEFT JOIN old_master o
ON (n.postcode = o.postcode)
WHERE o.postcode IS NULL
ORDER BY n.postcode
In doing so I should get back all the entries from the new_master that are not in the old one.
Thanks
If you are on an Oracle database, there are a couple of queries that can help you find any differences.
Find any records in OLD that are not in NEW.
SELECT * FROM old_master
MINUS
SELECT * FROM new_master;
Find any records in NEW that are not in OLD.
SELECT * FROM new_master
MINUS
SELECT * FROM old_master;
Count number of items in OLD
SELECT COUNT (*) FROM old_master;
Count number of items in NEW
SELECT COUNT (*) FROM new_master;
The COUNT queries are needed in addition to the MINUS queries because MINUS compares distinct rows, so duplicate rows with the same column data would otherwise go unnoticed.
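To see why the counts matter, here is a small sketch using SQLite through Python. SQLite spells the operator EXCEPT rather than Oracle's MINUS, and the two-column tables and their rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_master (postcode TEXT, city TEXT);
CREATE TABLE new_master (postcode TEXT, city TEXT);
INSERT INTO old_master VALUES ('A1', 'Leeds'), ('B2', 'York');
-- ('A1', 'Leeds') appears twice in the new data set.
INSERT INTO new_master VALUES ('A1', 'Leeds'), ('A1', 'Leeds'), ('C3', 'Hull');
""")

# EXCEPT is SQLite's counterpart of Oracle's MINUS.
only_in_old = conn.execute(
    "SELECT * FROM old_master EXCEPT SELECT * FROM new_master").fetchall()
only_in_new = conn.execute(
    "SELECT * FROM new_master EXCEPT SELECT * FROM old_master").fetchall()

old_count = conn.execute("SELECT COUNT(*) FROM old_master").fetchone()[0]
new_count = conn.execute("SELECT COUNT(*) FROM new_master").fetchone()[0]

print(only_in_old, only_in_new)  # the duplicated row shows up in neither list
print(old_count, new_count)      # but the row counts reveal the extra row
```

Each set difference reports one row, yet the new table holds one more row than the old one, which is the hint that a duplicate exists.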

Oracle SQL Developer(4.0.0.12)

First time posting here, hope it goes well.
I am trying to write a query in Oracle SQL Developer that returns a customer_ID from one table and the time of the payment from another. I'm pretty sure the problem lies in my logic (it has been a long time since I used SQL, back in school, so I'm a bit rusty). I wanted to list the IDs as DISTINCT and ORDER BY the dates ascending, so that only the first date would show up.
However, the returned table contains the same ID twice, or even more often in some cases. I even found the same ID with the same date a few times while scrolling through it.
If you would like to know more, please ask!
SELECT DISTINCT
FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
FROM
FIRM.customer
INNER JOIN FIRM.account
ON FIRM.customer.CUSTOMER_ID = FIRM.account.CUSTOMER
INNER JOIN FIRM.account_recharge
ON FIRM.account.ACCOUNT_ID = FIRM.account_recharge.ACCOUNT
WHERE
FIRM.account_recharge.X__INSDATE BETWEEN TO_DATE('14-01-01', 'YY-MM-DD') AND TO_DATE('14-12-31', 'YY-MM-DD')
ORDER BY FELTOLTES
Your SELECT works like this because a CUSTOMER_ID really does have more than one X__INSDATE, so each (CUSTOMER_ID, X__INSDATE) row in the result is already distinct. If you need only the first date, don't use DISTINCT and ORDER BY; instead select MIN(X__INSDATE) and GROUP BY CUSTOMER_ID.
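A minimal sketch of that suggestion, run against SQLite through Python. The single recharge table and its rows are a simplified stand-in for the three joined FIRM tables, and dates are stored as ISO text instead of Oracle DATE values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, flattened stand-in for customer/account/account_recharge.
CREATE TABLE recharge (customer_id INTEGER, insdate TEXT);
INSERT INTO recharge VALUES
  (1, '2014-03-05'), (1, '2014-01-20'), (1, '2014-03-05'),
  (2, '2014-02-11'), (2, '2014-08-01'),
  (3, '2013-12-31');
""")

# DISTINCT keeps every distinct (customer_id, insdate) pair.
distinct_rows = conn.execute("""
SELECT DISTINCT customer_id, insdate
FROM recharge
WHERE insdate BETWEEN '2014-01-01' AND '2014-12-31'
""").fetchall()

# MIN + GROUP BY returns exactly one row per customer: the first date.
first_dates = conn.execute("""
SELECT customer_id, MIN(insdate) AS first_recharge
FROM recharge
WHERE insdate BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY customer_id
ORDER BY first_recharge
""").fetchall()

print(len(distinct_rows))
print(first_dates)
```

The DISTINCT version still lists customers 1 and 2 twice each, while the grouped version returns a single row per customer.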
SELECT DISTINCT FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
DISTINCT is applied to both columns together, which means you get a distinct ROW for each set of values from the two columns. In other words, DISTINCT refers to all the columns in the select list.
It is equivalent to a select without DISTINCT but with a GROUP BY clause over the same columns.
That is,
select distinct a, b....
is equivalent to,
select a, b...group by a, b
If you want the desired output, then CONCATENATE the columns. DISTINCT will then work on the single concatenated result set. Note that this still returns one row per distinct combination of the two values.
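The equivalence between DISTINCT and GROUP BY over the same columns can be checked on a toy table. This is a sketch in SQLite through Python; the table t and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (a INTEGER, b TEXT);
INSERT INTO t VALUES (1, 'x'), (1, 'x'), (1, 'y'), (2, 'x');
""")

# Both queries collapse exact duplicates of the (a, b) pair and nothing else.
with_distinct = sorted(conn.execute("SELECT DISTINCT a, b FROM t").fetchall())
with_group_by = sorted(conn.execute("SELECT a, b FROM t GROUP BY a, b").fetchall())

print(with_distinct)
print(with_group_by)
```

Value 1 in column a still appears twice in both results, because it occurs with two different values of b.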

Postgres Error: More than one row returned by a subquery used as an expression

I have two separate databases. I am trying to update a column in one database to the values of a column from the other database:
UPDATE customer
SET customer_id=
(SELECT t1 FROM dblink('port=5432, dbname=SERVER1 user=postgres password=309245',
'SELECT store_key FROM store') AS (t1 integer));
This is the error I am receiving:
ERROR: more than one row returned by a subquery used as an expression
Any ideas?
Technically, to remove the error, add LIMIT 1 to the subquery to return at most 1 row. The statement would still be nonsense.
... 'SELECT store_key FROM store LIMIT 1' ...
Practically, you want to match rows somehow instead of picking an arbitrary row from the remote table store to update every row of your local table customer.
I assume a text column match_name in both tables (UNIQUE in store) for the sake of this example:
... 'SELECT store_key FROM store
WHERE match_name = ' || quote_literal(customer.match_name) ...
But that's an extremely expensive way of doing things.
Ideally, you completely rewrite the statement.
UPDATE customer c
SET customer_id = s.store_key
FROM dblink('port=5432, dbname=SERVER1 user=postgres password=309245'
, 'SELECT match_name, store_key FROM store')
AS s(match_name text, store_key integer)
WHERE c.match_name = s.match_name
AND c.customer_id IS DISTINCT FROM s.store_key;
This remedies a number of problems in your original statement.
Obviously, the basic error is fixed.
It's typically better to join in additional relations in the FROM clause of an UPDATE statement than to run correlated subqueries for every individual row.
When using dblink, the above becomes a thousand times more important. You do not want to call dblink() for every single row, that's extremely expensive. Call it once to retrieve all rows you need.
With correlated subqueries, if no row is found in the subquery, the column gets updated to NULL, which is almost always not what you want. In my updated query, the row only gets updated if a matching row is found. Else, the row is not touched.
Normally, you wouldn't want to update rows, when nothing actually changes. That's expensively doing nothing (but still produces dead rows). The last expression in the WHERE clause prevents such empty updates:
AND c.customer_id IS DISTINCT FROM s.store_key
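The two side effects mentioned above (unmatched rows being set to NULL, and rows being rewritten without changing) can be reproduced locally. The sketch below uses SQLite through Python, so there is no dblink, and it guards a correlated subquery with EXISTS instead of using the UPDATE ... FROM form shown in this answer. SQLite's IS NOT plays the role of IS DISTINCT FROM here, and the sample rows are invented.

```python
import sqlite3

def fresh_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE customer (match_name TEXT, customer_id INTEGER);
    CREATE TABLE store (match_name TEXT UNIQUE, store_key INTEGER);
    INSERT INTO customer VALUES ('alpha', 1), ('beta', 2), ('gamma', 3);
    INSERT INTO store VALUES ('alpha', 100), ('beta', 2);
    """)
    return conn

LOOKUP = "(SELECT s.store_key FROM store s WHERE s.match_name = customer.match_name)"
DUMP = "SELECT match_name, customer_id FROM customer ORDER BY match_name"

# Unguarded correlated subquery: 'gamma' has no match and becomes NULL.
conn = fresh_db()
conn.execute(f"UPDATE customer SET customer_id = {LOOKUP}")
unguarded = conn.execute(DUMP).fetchall()

# Guarded: only rows with a match whose key actually differs are touched.
conn = fresh_db()
cur = conn.execute(f"""
UPDATE customer SET customer_id = {LOOKUP}
WHERE EXISTS (SELECT 1 FROM store s
              WHERE s.match_name = customer.match_name
                AND s.store_key IS NOT customer.customer_id)
""")
rows_changed = cur.rowcount
guarded = conn.execute(DUMP).fetchall()

print(unguarded)
print(guarded, rows_changed)
```

In the guarded run only 'alpha' is updated: 'beta' already holds the right key and 'gamma' has no counterpart in store.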
Related:
How do I (or can I) SELECT DISTINCT on multiple columns?
The fundamental problem can often be simply solved by changing an = to IN, in cases where you've got a one-to-many relationship. For example, if you wanted to update or delete a bunch of accounts for a given customer:
WITH accounts_to_delete AS
(
SELECT account_id
FROM accounts a
INNER JOIN customers c
ON a.customer_id = c.id
WHERE c.customer_name='Some Customer'
)
-- this fails if "Some Customer" has multiple accounts, but works if there's 1:
DELETE FROM accounts
WHERE accounts.guid =
(
SELECT account_id
FROM accounts_to_delete
);
-- this succeeds with any number of accounts:
DELETE FROM accounts
WHERE accounts.guid IN
(
SELECT account_id
FROM accounts_to_delete
);
This means your nested SELECT returns more than one row.
You need to add a proper WHERE clause to it.
This error means that the SELECT store_key FROM store query has returned two or more rows in the SERVER1 database. If you would like to update all customers, use a join instead of a scalar = operator. You need a condition to "connect" customers to store items in order to do that.
If you wish to update all customer_ids to the same store_key, you need to supply a WHERE clause to the remotely executed SELECT so that the query returns a single row.
Use LIMIT 1, so it will return only 1 row.
Example
customerId = (select id from enumerations where enumerations.name = 'Ready To Invoice' limit 1)
The query returns more than one row, and that needs to be handled properly. The issue can be resolved by restricting the query so that it yields a single value, for example by:
1. limiting the query to return a single row
2. using "select max(column)", which returns a single row

Using multiple nested fields in BigQuery

I have some records that have information about stores. These records have several different nested fields. One of the nested fields is tags and one is employees. I am trying to get a count of the number of stores that have a tag and an employee with a certain name. So I did this:
SELECT count(*)
FROM [stores.stores_844_1]
where tags.tag_name='foo'
and employees.first_name='bar'
Then I get the error:
Error: Cannot query the cross product of repeated fields tags.tag_name and employees.first_name.
I can make it work by changing the query to:
SELECT count(*)
FROM (flatten([stores.stores_844_1], tags))
where tags.tag_name='foo'
and employees.first_name='bar'
The problem with this is that I am dynamically creating the where clause and so my from clause will have to change depending on what I have in the where. While I could generate some logic in code to figure out what the from clause should be, I was wondering if there is a way to do something like:
SELECT count(*)
FROM [stores.stores_844_1]
where tags.tag_name='foo' WITHIN RECORD
and employees.first_name='bar' WITHIN RECORD
so that I would not have to flatten the main table?
I have tried using an ugly workaround like this:
SELECT count(*)
FROM
(SELECT GROUP_CONCAT(CONCAT('>', tags.tag_name,'<')) WITHIN RECORD as f1, GROUP_CONCAT(CONCAT('>',employees.first_name,'<')) WITHIN RECORD as f2
FROM [stores.stores_844_1]
)
where f1 CONTAINS '>foo<'
and f2 CONTAINS '>bar<'
This ugly workaround works how I want it to, but it just seems really hacky and ugly and there must be a better way, right?
You can use WITHIN RECORD to come up with another field that indicates whether the values are present. I'm not sure if this meets your requirements, since you still have to change the FROM clause, but it seems cleaner than what you are currently doing. In other words, try this:
SELECT count(*) FROM (
SELECT SUM(IF(tags.tag_name='foo', 1, 0)) WITHIN RECORD as has_foo,
SUM(IF(employees.first_name='bar', 1, 0)) WITHIN RECORD as has_bar,
FROM [stores.stores_844_1])
WHERE has_foo > 0 AND has_bar > 0
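For readers without a BigQuery project at hand, the logic of that query can be modelled in plain Python. This is only an illustration of the per-record semantics, not BigQuery code: the list of dictionaries stands in for the table, and the sample stores are invented.

```python
# Each dictionary models one store record with two repeated (nested) fields.
stores = [
    {"tags": [{"tag_name": "foo"}, {"tag_name": "baz"}],
     "employees": [{"first_name": "bar"}]},
    {"tags": [{"tag_name": "foo"}],
     "employees": [{"first_name": "amy"}]},
    {"tags": [],
     "employees": [{"first_name": "bar"}]},
    {"tags": [{"tag_name": "foo"}, {"tag_name": "foo"}],
     "employees": [{"first_name": "bar"}, {"first_name": "bar"}]},
]

def count_matching(records, tag_name, first_name):
    """Count records having at least one matching tag AND one matching employee."""
    count = 0
    for record in records:
        # Equivalent of SUM(IF(...)) WITHIN RECORD: one total per record.
        has_tag = sum(1 for t in record["tags"] if t["tag_name"] == tag_name)
        has_emp = sum(1 for e in record["employees"] if e["first_name"] == first_name)
        if has_tag > 0 and has_emp > 0:
            count += 1
    return count

matching = count_matching(stores, "foo", "bar")
print(matching)
```

Because each repeated field is aggregated separately within its own record, a store with several matching tags or employees is still counted once, and no cross product of the two fields is formed.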

SQL query giving wrong sum

I'm using the rather old Microsoft Query that comes with Excel to query an ODBC database. However it's giving me the wrong sum when I join two tables.
This works fine:
SELECT accountcode, SUM(tr_amount)
FROM deb_trans deb_trans
WHERE (today() > dr_tr_due_date + 14)
GROUP BY accountcode
However, this does not:
SELECT deb_trans.accountcode, Sum(deb_trans.tr_amount)
FROM deb_trans deb_trans, mailer_master mailer_master
WHERE (today()>dr_tr_due_date+14) AND (mailer_master.accountcode=deb_trans.accountcode)
GROUP BY deb_trans.accountcode
The joined field being accountcode.
The field tr_amount originates from the deb_trans table. It is not present in mailer_master.
Any ideas? Thanks guys!
If you join the tables, you get one row for each combination that satisfies the filter criteria, before any grouping takes place. In this case that is a row for each matching deb_trans and mailer_master pair, filtered by date. So if mailer_master has more than one row for an accountcode, every deb_trans row for that account is repeated and its tr_amount is summed several times. To get a valid sum, don't join another table in a way that changes the number of rows before grouping.
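The effect is easy to reproduce. The sketch below uses SQLite through Python, leaves out the date filter, and assumes that mailer_master holds two rows for one accountcode; the contact column and all rows are invented. The last query shows one possible fix, testing for a match with EXISTS instead of joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deb_trans (accountcode TEXT, tr_amount INTEGER);
CREATE TABLE mailer_master (accountcode TEXT, contact TEXT);
INSERT INTO deb_trans VALUES ('A', 100), ('A', 50), ('B', 30);
-- Assumption: account 'A' has two mailer_master rows.
INSERT INTO mailer_master VALUES ('A', 'x'), ('A', 'y'), ('B', 'z');
""")

# Joining repeats each 'A' transaction once per mailer_master row.
joined = conn.execute("""
SELECT d.accountcode, SUM(d.tr_amount)
FROM deb_trans d, mailer_master m
WHERE m.accountcode = d.accountcode
GROUP BY d.accountcode
ORDER BY d.accountcode
""").fetchall()

# EXISTS only tests for a match, so the row count is unchanged.
fixed = conn.execute("""
SELECT d.accountcode, SUM(d.tr_amount)
FROM deb_trans d
WHERE EXISTS (SELECT 1 FROM mailer_master m
              WHERE m.accountcode = d.accountcode)
GROUP BY d.accountcode
ORDER BY d.accountcode
""").fetchall()

print(joined)
print(fixed)
```

The joined query doubles the total for account A, while the EXISTS version matches the sum from deb_trans alone.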