What is the best way to do this SQL task?

What would be the best approach with this SQL-based logic:
I need to get some groups from a table. However, there are thousands of items, each of which can belong to only one of those groups (say, one of the five, ten, or fifteen groups returned). I could get the groups and then loop over all of the item objects, inserting each one into its group.
Or would it be better to get all the objects that belong to a group, loop over them, and insert them into that group? What would be the difference in performance?

If you're just looking for the groups, then a simple SELECT DISTINCT group FROM Table will return those. If you want each row along with its associated group, a SELECT * (not for production use...) would get that as well. If you want them in order, then a SELECT with an ORDER BY group would do.
What are you going to do with this information?
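If the goal is to fill in-memory group objects, the ORDER BY suggestion can be turned into a single pass. A minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical items(item_name, group_name) table (the column is called group_name because GROUP is a reserved word): fetch every item with its group in one ordered query and bucket the rows in Python, instead of issuing one query per group.

```python
import sqlite3
from collections import defaultdict


def load_groups(conn):
    """Fetch every item with its group in ONE ordered query, then bucket in Python."""
    groups = defaultdict(list)
    # Single round trip: the database sorts by group; Python only appends.
    rows = conn.execute(
        "SELECT group_name, item_name FROM items ORDER BY group_name, item_name"
    )
    for group_name, item_name in rows:
        groups[group_name].append(item_name)
    return dict(groups)


# Hypothetical schema: each item belongs to exactly one group.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_name TEXT PRIMARY KEY, group_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("apple", "fruit"), ("carrot", "veg"), ("banana", "fruit")],
)
grouped = load_groups(conn)
print(grouped)  # {'fruit': ['apple', 'banana'], 'veg': ['carrot']}
```

With one query, the number of round trips stays constant as the number of groups grows; whether that is measurably faster depends on the driver, network latency, and data size, so it is worth timing both approaches on real data.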

Related

When should I use distinct in a query?

This is a general question:
Does anyone have a tip on how I can tell when I should use DISTINCT in my queries? I am struggling to understand exactly when to use it. I tend to use it when I don't need it and not when I do.
Thank you all very much.
Basically, there is little reason to use SELECT DISTINCT, although it is sometimes a convenient shorthand.
If it can be avoided, avoid it! The database incurs overhead for removing duplicates (typically a sort or a hash), even if there are no duplicates. So SELECT DISTINCT is generally slower than a plain SELECT.
Often SELECT DISTINCT is more appropriately written using GROUP BY, because you frequently also want some column to be aggregated (such as the maximum date/time).
That said, it can be convenient shorthand, so it should not be avoided altogether, just used rarely.
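To illustrate the GROUP BY point, here is a small sketch using Python's sqlite3 module and a hypothetical orders table with made-up rows: DISTINCT returns only the unique customers, while GROUP BY returns the same customers plus an aggregate (the latest order date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table used only for illustration.
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", "2019-01-05"), ("alice", "2019-03-10"), ("bob", "2019-02-01")],
)

# DISTINCT: just the unique customers.
distinct_customers = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()

# GROUP BY: the same unique customers, plus an aggregate per customer.
latest_orders = conn.execute(
    "SELECT customer, MAX(order_date) AS last_order "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(distinct_customers)  # [('alice',), ('bob',)]
print(latest_orders)       # [('alice', '2019-03-10'), ('bob', '2019-02-01')]
```

Dates are stored as ISO-8601 text here, so MAX compares them correctly as strings.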
There is no general rule for when to use DISTINCT; it depends on your requirement. For example, when the same value appears more than once in a column but you only need it once, you use DISTINCT.
Suppose you have a table of banks and their branches in a city, and you need to know which unique banks (and therefore how many) are operating in the city. Then you would write:
select distinct bank_name from city;
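A runnable sketch of the bank example, assuming Python's sqlite3 module and a hypothetical city(bank_name, branch_name) table with one made-up row per branch. SELECT DISTINCT lists each bank once; if you only need the number, COUNT(DISTINCT bank_name) returns it directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "city": one row per branch.
conn.execute("CREATE TABLE city (bank_name TEXT, branch_name TEXT)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("First Bank", "Downtown"), ("First Bank", "Airport"), ("Second Bank", "Harbor")],
)

# Each bank once, no matter how many branches it has.
banks = [r[0] for r in conn.execute("SELECT DISTINCT bank_name FROM city ORDER BY bank_name")]
# COUNT(DISTINCT ...) answers "how many banks" directly.
bank_count = conn.execute("SELECT COUNT(DISTINCT bank_name) FROM city").fetchone()[0]
branch_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]

print(banks, bank_count, branch_count)  # ['First Bank', 'Second Bank'] 2 3
```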
I use distinct when I want to ensure rows are not duplicated in a query that could have duplicate records for the field combination I am selecting. Generally, this would be when selecting a set of columns that do not include a primary/unique key and are not guaranteed to be unique when the selected fields are taken together.
For example, if I were selecting customers who had purchased this year to send them a letter, and customers can have more than one order in a single year, I would use DISTINCT to ensure that I get one occurrence of each unique customer name / address combination, so that I send only one letter per person and address.
--could return multiple records for repeat customers if Distinct was not present
Select Distinct BillingName, BillingAddress
from Orders
where OrderDate > '2019-08-01'
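The same query can be tried end to end with Python's sqlite3 module. The rows below are made up for illustration: the repeat customer shows up twice without DISTINCT and once with it. Dates are stored as ISO-8601 text, so the > comparison works as a string comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, BillingName TEXT, "
    "BillingAddress TEXT, OrderDate TEXT)"
)
conn.executemany(
    "INSERT INTO Orders (BillingName, BillingAddress, OrderDate) VALUES (?, ?, ?)",
    [
        ("Ann Lee", "1 Main St", "2019-09-02"),
        ("Ann Lee", "1 Main St", "2019-10-15"),  # repeat customer
        ("Raj Rao", "9 Elm Rd", "2019-08-20"),
        ("Old Buyer", "5 Oak Ave", "2019-01-01"),  # before the cut-off date
    ],
)

# Without DISTINCT: one row per order, so Ann Lee appears twice.
all_rows = conn.execute(
    "SELECT BillingName, BillingAddress FROM Orders "
    "WHERE OrderDate > '2019-08-01' ORDER BY BillingName"
).fetchall()

# With DISTINCT: one row per unique name/address combination, i.e. one letter each.
letters = conn.execute(
    "SELECT DISTINCT BillingName, BillingAddress FROM Orders "
    "WHERE OrderDate > '2019-08-01' ORDER BY BillingName"
).fetchall()

print(len(all_rows), letters)
```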

Selecting a large number of rows by index using SQL

I am trying to select a number of rows by the value of a column called ID. I know you can do this pretty easily by:
SELECT col1, col2, col3 FROM mytable WHERE id IN (1,2,3,4,5...)
However, what if there are a few million IDs I want to select, and the IDs don't always follow a pattern (which means I can't use something like BETWEEN x AND y)? Does this SELECT statement still work, or are there better ways of doing it?
The actual application is this. Users specify filters, which are compared against some attributes of the records. From those filters, we create the subset of the data that is of interest to a particular user. There are about 30 million records, each with roughly 3000 attributes (stored in about 30 tables, each of which has ID as its primary key), so every time someone queries for their desired subset of records, we have to join many tables, apply those filters, and work out what their subset looks like. To avoid joining many tables all the time, I thought it might be better to join the tables once, figure out the IDs of the selected subset, and then, each time a new query is made, just select the relevant columns of the rows that match the filtered IDs.
This depends on the database and the interface you are using. For a few hundred or a few thousand values, there is no problem. But your question specifies millions, and that could start to run into limits on the length of the query, whether imposed by the database, the tool you are using, or intermediate libraries.
If you have so many ids, I would strongly recommend that you load them into a table in the database with the id as the primary key. Then use join or exists to identify the rows in your table that match.
Often, such a list would be generated in the database anyway. In that case, you can use a subquery or CTE and just include that code in your final query.
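A minimal sketch of the load-then-join recommendation, assuming Python's sqlite3 module and a hypothetical mytable(id, col1). The IDs go into a temporary table keyed by id, and the lookup is a JOIN (or an EXISTS) rather than a huge IN (...) list. For millions of IDs, most databases also offer a bulk-load path that is likely faster than inserting row by row.

```python
import sqlite3


def select_by_ids(conn, ids, use_exists=False):
    """Return (id, col1) rows of mytable whose id is in `ids`, via a temp table."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_ids (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM wanted_ids")
    # One bound parameter per row, so the SQL text stays short however many ids there are.
    # OR IGNORE skips duplicate ids instead of violating the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)", ((i,) for i in ids)
    )
    if use_exists:
        sql = (
            "SELECT t.id, t.col1 FROM mytable AS t WHERE EXISTS "
            "(SELECT 1 FROM wanted_ids AS w WHERE w.id = t.id) ORDER BY t.id"
        )
    else:
        sql = (
            "SELECT t.id, t.col1 FROM mytable AS t "
            "JOIN wanted_ids AS w ON w.id = t.id ORDER BY t.id"
        )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 11)])

print(select_by_ids(conn, [7, 2, 9, 2, 42]))  # [(2, 'row2'), (7, 'row7'), (9, 'row9')]
```

IDs that do not exist in mytable (42 above) simply produce no row, and duplicate IDs in the input are harmless.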

Joining multiple Tables in Oracle gives out duplicated records

I am a newbie to SQL. I have three tables: mr1, mr2, and mr3. The caseid column is the primary key in all of these tables. I need to join the columns of all these tables and display the result.
The problem is that I don't know which join to use.
When I joined them all as in the query below:
select mr1.col1,mr1.col2,mr2.col1,mr2.col2,mr3.col1,mr3.col2
from mr1,mr2,mr3
where mr1.caseid = mr2.caseid
and mr2.caseid = mr3.caseid;
It displays 4 records, even though the largest table, mr2, has only two records.
The records are duplicated. Can anyone help me with this?
DISTINCT will do it, but it's not the correct approach.
You need to add another join condition (mr1.caseid = mr3.caseid), because the mr2 and mr3 rows must be related to the same row in mr1; otherwise you end up with 2 pairs, one for each table joined to your primary table (mr2).
This is my first answer on SO, so forgive me if it wasn't that clear.
Your problem is that your tables are in a one-to-many relationship. When you join them, the number of rows is expected to go up unless you take steps to limit the records returned. How to fix it depends on the meaning of the data.
If all the fields are exactly the same, then adding DISTINCT will fix the problem. However, depending on the size of the tables and the number of records you are returning, it may be faster to use a derived table that limits the join to only one record from the table that has multiple records.
If at least one of the fields is different, however, then you need to know the business rule that lets you pick the correct record. It might be accomplished by adding a WHERE clause, by using an aggregate function with GROUP BY, or both. This really depends on the meaning of the result set, which is why you need to ask further questions within your own organization: they are the only ones who will know which of the multiple records is the correct one to pick from the perspective of the people who will use the results of the query. Further, the business might actually want to see all of the records, in which case you have no problem at all.
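To see where the extra rows come from, here is a sketch using Python's sqlite3 module. It assumes, as this answer suggests, that caseid is not actually unique in mr2 and mr3 (two rows each for the same case); the data is made up. The plain join returns 2 × 2 = 4 rows, and a derived table that keeps one mr3 row per caseid brings it back down. MIN(col1) is only a placeholder for whatever business rule picks the right record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("mr1", "mr2", "mr3"):
    conn.execute(f"CREATE TABLE {name} (caseid INTEGER, col1 TEXT)")
conn.execute("INSERT INTO mr1 VALUES (1, 'case one')")
# Assumed one-to-many: two mr2 rows and two mr3 rows share caseid 1.
conn.executemany("INSERT INTO mr2 VALUES (?, ?)", [(1, "x"), (1, "y")])
conn.executemany("INSERT INTO mr3 VALUES (?, ?)", [(1, "p"), (1, "q")])

# Every mr2 row pairs with every mr3 row for the same caseid.
plain = conn.execute(
    "SELECT mr1.col1, mr2.col1, mr3.col1 FROM mr1 "
    "JOIN mr2 ON mr2.caseid = mr1.caseid "
    "JOIN mr3 ON mr3.caseid = mr1.caseid"
).fetchall()

# Derived table: at most one mr3 row per caseid (MIN stands in for a real business rule).
limited = conn.execute(
    "SELECT mr1.col1, mr2.col1, m3.col1 FROM mr1 "
    "JOIN mr2 ON mr2.caseid = mr1.caseid "
    "JOIN (SELECT caseid, MIN(col1) AS col1 FROM mr3 GROUP BY caseid) AS m3 "
    "ON m3.caseid = mr1.caseid "
    "ORDER BY mr2.col1"
).fetchall()

print(len(plain))  # 4
print(limited)     # [('case one', 'x', 'p'), ('case one', 'y', 'p')]
```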

Retrieve data from two different table in a single report

I have two tables. The first is the Employee Salary table, which stores each employee's salary in a field named Salary_employee.
The second is Extra Expense, which stores records of the company's extra expenses, such as electricity bills and office maintenance, in a field named extra_expense.
(There is no relationship between these two tables.)
Finally, I just want to show all of the company's expenses in one report, so I need to combine both tables. What should I use here, a JOIN or a UNION?
If there is no relationship between the two tables, then this really cannot work, since you don't know where the expense is supposed to tie in. You should redesign the database if possible, as this sounds impossible based on your description.
UPDATE
OK, by the look of your screenshots, I am guessing that this database only stores one company's info, and not multiple companies'?
If that is correct, AND if all you want to do is squish the data together into one flowing report of expenses, then I would indeed suggest a UNION, specifically UNION ALL, because plain UNION removes duplicate rows and would collapse two separate expenses that share the same amount and date into one. A JOIN would not give you the flow you are looking for. A UNION will just stack the two outputs into one... which I think is what you are asking for?
SELECT ext_amount AS amount, ext_date AS date_of_trans
FROM extra_expenses
UNION ALL
SELECT sal_cash AS amount, sal_dateof_payment AS date_of_trans
FROM employee_salary
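One thing to watch in a report like this: UNION removes duplicate rows, so two genuinely separate expenses with the same amount and date would be merged, while UNION ALL keeps them. A small sketch with Python's sqlite3 module, using the table and column names from the query above and made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extra_expenses (ext_amount REAL, ext_date TEXT)")
conn.execute("CREATE TABLE employee_salary (sal_cash REAL, sal_dateof_payment TEXT)")
# Two separate electricity bills that happen to share an amount and a date.
conn.executemany(
    "INSERT INTO extra_expenses VALUES (?, ?)",
    [(120.0, "2013-05-01"), (120.0, "2013-05-01")],
)
conn.execute("INSERT INTO employee_salary VALUES (2500.0, '2013-05-31')")

EXTRA = "SELECT ext_amount AS amount, ext_date AS date_of_trans FROM extra_expenses"
SALARY = "SELECT sal_cash AS amount, sal_dateof_payment AS date_of_trans FROM employee_salary"

# UNION removes duplicate rows across the whole result; UNION ALL keeps every row.
union_rows = conn.execute(f"{EXTRA} UNION {SALARY} ORDER BY date_of_trans").fetchall()
union_all_rows = conn.execute(f"{EXTRA} UNION ALL {SALARY} ORDER BY date_of_trans").fetchall()

print(sum(r[0] for r in union_rows))      # 2620.0: one bill silently dropped
print(sum(r[0] for r in union_all_rows))  # 2740.0: every expense counted
```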
It sounds like you don't need to group or join anything. Simply query both tables separately within a script and handle each according to its structure to produce a report.
A JOIN is what you use to combine different information about a common thing from separate tables (a UNION, by contrast, stacks the rows of two similarly shaped queries). E.g. if a user's private details are stored in one table but their profile information is in another, and you want to display both, you can join the two tables on the common user name to gather all of the info on that user in one query.
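A minimal sketch of that user example, assuming Python's sqlite3 module and two hypothetical tables, user_private and user_profile, that share a user_name column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: private details and public profile, linked by user_name.
conn.execute("CREATE TABLE user_private (user_name TEXT PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE user_profile (user_name TEXT PRIMARY KEY, bio TEXT)")
conn.executemany(
    "INSERT INTO user_private VALUES (?, ?)",
    [("ana", "ana@example.com"), ("ben", "ben@example.com")],
)
conn.executemany(
    "INSERT INTO user_profile VALUES (?, ?)",
    [("ana", "Likes SQL"), ("ben", "Writes reports")],
)

# One row per user, with columns drawn from both tables.
users = conn.execute(
    "SELECT p.user_name, p.email, f.bio "
    "FROM user_private AS p JOIN user_profile AS f ON f.user_name = p.user_name "
    "ORDER BY p.user_name"
).fetchall()
print(users)
# [('ana', 'ana@example.com', 'Likes SQL'), ('ben', 'ben@example.com', 'Writes reports')]
```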

Is it possible to add a "check if previous" column to a view?

I have a view in SQL Server 2008 that I want to use for a report in SSRS 2008.
The main problem is that I have to use two different datasets in one table and cannot group the way I want. Both datasets come from this view. Let's say that in one column of my report table I want the sum of all computers in all school buildings in my country. In the other column, I want the ratio of school students per computer.
Now, in the DB there are two different tables, one for Buildings and one for Schools (because sometimes one school has several buildings, or other similar scenarios). Joining them produces more building-school pairs than the computer-sum column needs: the same building is summed several times if more than one school operates in it.
The table is this:
To avoid this, I created those two datasets, one from the building point of view and one from the school point of view. But these are two datasets in one table! To solve my problem, I have thought of adding a special column to my view that automatically checks whether a BUILDING_ID appears twice or more in the result table, e.g. like this:
The problem is that I don't know if this is possible and if it is, I don't know how to do it.
Maybe this can give you a hint:
select building_id,
row_number() over (partition by building_id order by newid()) - 1 check_if_previous
from yourtable
If you just want 1s or 0s:
select building_id,
cast(row_number() over (partition by building_id order by newid()) - 1 as BIT) check_if_previous
from yourtable
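Here is a runnable sketch of how the flag could be used for the computer sum, assuming Python's sqlite3 module (window functions need SQLite 3.25 or later) and a hypothetical yourtable with one made-up row per building-school pair. It orders by school_id instead of NEWID() (a SQL Server function) so the result is deterministic; since only the row numbered 0 is used, any ordering works. Summing computers only where check_if_previous = 0 counts each building once, while students can still be summed per school.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical view output: one row per building-school pair.
conn.execute(
    "CREATE TABLE yourtable (building_id INTEGER, school_id INTEGER, "
    "computers INTEGER, students INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, 10, 40, 300),  # building 1 hosts two schools,
        (1, 11, 40, 200),  # so its 40 computers appear on two rows
        (2, 12, 25, 250),
    ],
)

# check_if_previous is 0 on the first row of each building and 1, 2, ... afterwards.
FLAGGED = (
    "SELECT building_id, computers, students, "
    "ROW_NUMBER() OVER (PARTITION BY building_id ORDER BY school_id) - 1 "
    "AS check_if_previous FROM yourtable"
)

naive_computers = conn.execute("SELECT SUM(computers) FROM yourtable").fetchone()[0]
# Count each building's computers only once, on its flag-0 row; students stay per school.
computers, students = conn.execute(
    "SELECT SUM(CASE WHEN check_if_previous = 0 THEN computers ELSE 0 END), "
    f"SUM(students) FROM ({FLAGGED}) AS flagged"
).fetchone()

print(naive_computers, computers, students)  # 105 65 750
print(students / computers)  # students per computer
```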