What is the difference between Select and Project Operations - sql

I'm referring to the basic relational algebra operators here.
As I see it, everything that can be done with project can be done with select.
I don't know if there is a difference or a certain nuance that I've missed.

PROJECT eliminates columns while SELECT eliminates rows.

Select Operation : This operation is used to select rows from a table (relation) that satisfy a given condition, which is called a predicate. The predicate is a user-defined condition that picks the rows of the user's choice.
Project Operation : If the user is interested in the values of only a few attributes, rather than all attributes of the table (relation), then one should use the PROJECT operation.
See more : Relational Algebra and its operations

In Relational algebra 'Selection' and 'Projection' are different operations, but the SQL SELECT combines these operations in a single statement.
Select retrieves the tuples (rows) in a relation (table) for which the condition in 'predicate' section (WHERE clause) stands true.
Project retrieves the attributes (columns) specified.
The following SQL SELECT query:
select field1,field2 from table1 where field1 = 'Value';
is a combination of both Projection and Selection operations of relational algebra.
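As a minimal sketch of how the two operations show up in one SQL statement, the following uses Python's sqlite3 module with an in-memory database. The table contents and the extra column field3 are made up for illustration; only the final query comes from the text above.

```python
import sqlite3

# Hypothetical table and rows, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field1 TEXT, field2 INTEGER, field3 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("Value", 1, "x"), ("Other", 2, "y"), ("Value", 3, "z")],
)

# Selection only: every column is kept, rows are filtered by WHERE.
selection = conn.execute(
    "SELECT * FROM table1 WHERE field1 = 'Value' ORDER BY field2"
).fetchall()

# Projection only: every row is kept, columns are picked in the select list.
projection = conn.execute(
    "SELECT field1, field2 FROM table1 ORDER BY field2"
).fetchall()

# Selection and projection combined in a single SELECT statement.
both = conn.execute(
    "SELECT field1, field2 FROM table1 WHERE field1 = 'Value' ORDER BY field2"
).fetchall()

print(selection)
print(projection)
print(both)
```

The WHERE clause plays the role of the selection and the column list plays the role of the projection.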

Project is not a statement. It is a capability of the SELECT statement.
The SELECT statement has three capabilities: selection, projection and join.
Selection: it retrieves the rows that satisfy the condition of the query.
Projection: it chooses the columns named in the query.
Join: it combines two or more tables.

The selection operation is used to select the subset of tuples from a relation that satisfy a selection condition. It filters out the tuples that do not satisfy the condition. The selection operation can be visualized as a horizontal partition of the relation into two sets of tuples: the tuples that satisfy the condition are selected, and the tuples that do not are discarded.
σ<selection condition>(R)
The projection operation is used to select certain attributes (columns) from a relation; no condition on the tuples is involved. It can be visualized as a vertical partition of the relation into two parts: the attributes in the attribute list are kept, and the others are discarded.
π<attribute list>(R)
where the attribute list is a list of attributes of R.

Project affects the columns of a table, while Select affects the rows. In other words, Project is used to pick specific columns rather than returning the data of all columns.

Select extracts rows from the relation according to some condition, and Project extracts a particular set of attributes/columns from the relation, with or without a condition.

The difference between the project operator (π) in relational algebra and the SELECT keyword in SQL is that if the resulting table/set has more than one occurrence of the same tuple, then π will return only one of them, while SQL SELECT will return all.
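A small sketch of that duplicate behaviour, again using sqlite3 with an invented table emp: a plain SELECT keeps repeated tuples, while SELECT DISTINCT behaves like the set-based π.

```python
import sqlite3

# Hypothetical employee table; two employees share the same department.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", "IT"), ("Bob", "IT"), ("Cid", "HR")],
)

# Plain SQL projection: the duplicate 'IT' tuple is returned twice.
plain = conn.execute("SELECT dept FROM emp ORDER BY dept").fetchall()

# DISTINCT removes duplicates, matching the relational-algebra projection.
distinct = conn.execute("SELECT DISTINCT dept FROM emp ORDER BY dept").fetchall()

print(plain)     # three rows
print(distinct)  # two rows
```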

Select changes only the cardinality of the result table, while project changes the degree of the relation and can also change its cardinality (because duplicate tuples are eliminated).
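To make degree and cardinality concrete, here is a sketch that models a relation as a Python set of tuples. The helper names (select, project, degree, cardinality) and the sample data are assumptions made for this illustration, not part of any library.

```python
# A relation modelled as a set of tuples plus a separate attribute list.
attributes = ("name", "dept", "age")
employees = {("Ann", "IT", 30), ("Bob", "IT", 40), ("Cid", "HR", 30)}


def select(relation, predicate):
    """Sigma: keep the tuples for which the predicate holds."""
    return {row for row in relation if predicate(row)}


def project(relation, attributes, keep):
    """Pi: keep only the named attributes; the set removes duplicates."""
    positions = [attributes.index(name) for name in keep]
    return {tuple(row[p] for p in positions) for row in relation}


def degree(relation):
    """Number of attributes (assumes a non-empty relation)."""
    return len(next(iter(relation)))


def cardinality(relation):
    """Number of tuples."""
    return len(relation)


aged_30 = select(employees, lambda row: row[2] == 30)
depts = project(employees, attributes, ["dept"])

print(degree(employees), cardinality(employees))
print(degree(aged_30), cardinality(aged_30))
print(degree(depts), cardinality(depts))
```

In this sketch the selection keeps the degree at 3 and lowers the cardinality to 2, while the projection lowers the degree to 1 and, because the two IT tuples collapse into one, also lowers the cardinality to 2.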

The difference comes from relational algebra, where project affects columns and select affects rows. In query syntax, however, SELECT is the only keyword; there is no PROJECT statement.
Assume there is a table named users with hundreds of thousands of records (rows) and 6 fields (userID, Fname, Lname, age, pword, salary). Let's say we want to restrict access to sensitive data (userID, pword and salary) and also restrict the amount of data that can be accessed. In MySQL/MariaDB we create a view as follows: (create view users1 as select Fname, Lname, age from users limit 100;). Against that view we issue (select Fname from users1;). This query is both a select and a project.
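The view idea can be tried out with a sketch like the one below. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL/MariaDB, a handful of invented rows, and a limit of 3 instead of 100; the view name users1 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "userID INTEGER, Fname TEXT, Lname TEXT, age INTEGER, pword TEXT, salary REAL)"
)
# Invented sample rows.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", 30, "secret1", 1000.0),
        (2, "Bob", "Ray", 41, "secret2", 2000.0),
        (3, "Cid", "Fox", 25, "secret3", 1500.0),
        (4, "Dee", "Kim", 52, "secret4", 3000.0),
        (5, "Eve", "Poe", 37, "secret5", 2500.0),
    ],
)

# The view hides userID, pword and salary, and caps the number of rows.
conn.execute("CREATE VIEW users1 AS SELECT Fname, Lname, age FROM users LIMIT 3")

# Column names exposed by the view.
view_columns = [d[0] for d in conn.execute("SELECT * FROM users1").description]

# A further projection on top of the view.
first_names = conn.execute("SELECT Fname FROM users1").fetchall()

print(view_columns)
print(len(first_names))
```

Since the LIMIT has no ORDER BY, which rows end up in the view is up to the database, so only the row count is printed here.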

Related

Get count of distinct key field values from CDS

I would like to ask if it is possible to get dynamically Count of distinct fields using ABAP.
Key in our CDS has 9 fields which is quite a lot but it is not possible to split because of historical decisions. What I need is code like below:
select count(distinct (lv_requested_elements)) from CDS_VIEW;
or
select count(*) from (select distinct lv_requested_elements from CDS_VIEW);
I know that it is possible to read the select into memory and get sy-dbcnt but I want to be sure that there is no other option.
I assume that most simple and straightforward way is to read the smallest field into memory and then count by grouped (distinctified) rows:
DATA(fields) = ` BLART, BLDAT, BUDAT`.
DATA: lt_count TYPE TABLE OF string.
SELECT (fields(6))
INTO TABLE @lt_count
FROM ('BKPF')
GROUP BY (fields).
DATA(count) = sy-dbcnt.
CTE, that was mentioned, uses the same memory read, so you'll receive no performance gain:
A common table expression creates a temporary tabular results set, which can be accessed during execution of the WITH statement
If you are going to count this key combination frequently, I propose creating a consumption or nested CDS view that does this on the fly.
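For comparison, here is how the "count the distinct combinations of a dynamically chosen field list" idea looks in generic SQL. This is a sketch in SQLite via Python, not ABAP: the table bkpf, its rows and the helper count_distinct are invented, and nothing here says whether ABAP SQL or a CDS view accepts the same subquery form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bkpf (blart TEXT, bldat TEXT, budat TEXT)")
conn.executemany(
    "INSERT INTO bkpf VALUES (?, ?, ?)",
    [
        ("SA", "2020-01-01", "2020-01-02"),
        ("SA", "2020-01-01", "2020-01-02"),  # exact duplicate combination
        ("SA", "2020-01-01", "2020-01-03"),
        ("KR", "2020-01-01", "2020-01-02"),
    ],
)

# Column names cannot be bound as parameters, so they are checked
# against a whitelist before being placed into the SQL text.
ALLOWED = {"blart", "bldat", "budat"}


def count_distinct(connection, fields):
    """Count distinct combinations of the requested columns in the database."""
    unknown = [f for f in fields if f not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    cols = ", ".join(fields)
    sql = f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM bkpf)"
    return connection.execute(sql).fetchone()[0]


print(count_distinct(conn, ["blart", "bldat", "budat"]))
print(count_distinct(conn, ["blart"]))
```

Only a single number travels back to the client in this form, which is the effect the question is after.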

Using Distinct in Aggregate Select query

I am using an Oracle DB. I have an aggregate script. We found that some of the rows in the table are repeated and unwanted, and hence should not be included in the sum.
Now suppose I use DISTINCT right after SELECT: will DISTINCT be applied before the aggregation or after it?
If you use SELECT DISTINCT, then the result set will have no duplicate rows.
If you use SELECT COUNT(DISTINCT), then the count will only count distinct values.
If you are thinking of using SUM(DISTINCT) (or DISTINCT with any other aggregation function) be warned. I have never used it (except perhaps as a demonstration), and I have written a fair number of queries.
You really need to solve the problem at the source. For instance, if accounts are being repeated, then SUM(DISTINCT) does not distinguish between accounts, only by the values assigned to the account. You need to get the logic right.
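The warning about SUM(DISTINCT) can be reproduced with a small sketch. It runs on SQLite rather than Oracle, and the payments table and its rows are invented: account 1 appears twice by mistake, and account 2 legitimately has the same amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (account_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 100), (1, 100), (2, 100), (3, 50)],  # (1, 100) is an unwanted repeat
)

# Counts the repeated row twice.
naive = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]

# Collapses equal amounts even when they belong to different accounts.
sum_distinct = conn.execute(
    "SELECT SUM(DISTINCT amount) FROM payments"
).fetchone()[0]

# Removes duplicate rows first, then aggregates.
deduplicated = conn.execute(
    "SELECT SUM(amount) FROM (SELECT DISTINCT account_id, amount FROM payments)"
).fetchone()[0]

print(naive, sum_distinct, deduplicated)
```

Deduplicating whole rows in a subquery is only one possible repair; it assumes that two identical rows are always an error, which has to be confirmed against the actual data.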
When you say that you have repeated rows, you must have a clear idea of uniqueness for the combination of some specific columns.
If you expect that certain column combinations are unique within specified groups, you can detect the groups deviating from that using queries following the pattern below.
select <your group by columns>
from <your table name>
group by <your group by columns>
having (max(A)!=min(A) or max(B)!=min(B) or max(C)!=min(C))
Then you have to decide what to do with the problem. I would suggest cleaning up and adding unique constraints to the table.
The aggregate query you mention would run successfully for the rows in your table not having duplicate values for the combination of columns that needs to be unique. Using my example you could get the aggregates for that part of your data using the inverted having predicate.
It would be something like this
select <your aggregate functions, counts, sums, averages and so on>
from <your table name>
group by <your group by columns>
having (max(A)=min(A) and max(B)=min(B) and max(C)=min(C))
If you must include the groups breaking uniqueness expectations you must somehow do a qualified selection of which of the variants in the group to use - you could for example go for the last one or the first one if one of your columns should happen to express something about when the row was created.
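A concrete instance of the two HAVING patterns, as a sketch on SQLite with an invented table t (grouping column grp, value columns a and b):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("g1", 1, "x"),
        ("g1", 1, "x"),  # repeated, but consistent within the group
        ("g2", 1, "x"),
        ("g2", 2, "x"),  # column a deviates within g2
        ("g3", 5, "y"),
    ],
)

# Groups that break the expectation "a and b are constant per group".
deviating = conn.execute(
    "SELECT grp FROM t GROUP BY grp "
    "HAVING max(a) != min(a) OR max(b) != min(b) "
    "ORDER BY grp"
).fetchall()

# Aggregates restricted to the groups that meet the expectation.
consistent = conn.execute(
    "SELECT grp, COUNT(*), SUM(a) FROM t GROUP BY grp "
    "HAVING max(a) = min(a) AND max(b) = min(b) "
    "ORDER BY grp"
).fetchall()

print(deviating)
print(consistent)
```

Note that the repeated rows of g1 are still counted twice in the second query; the pattern only separates consistent groups from inconsistent ones.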

Clarification about Select from (select...) statement

I came across a SQL practice question. The revealed answer is
SELECT ROUND(ABS(a - c) + ABS(b - d), 4) FROM (
SELECT MIN(lat_n) AS a, MIN(long_w) AS b, MAX(lat_n) AS c, MAX(long_w) AS d
FROM station);
Normally, I would encounter
select[] from[] where [] (select...)
which implies that the values selected by the inner query in the WHERE clause determine what is queried in the outer query. As mentioned at the beginning, this time the select comes after FROM, and I'm curious about what this does. Is it creating an imaginary table?
The piece in parentheses:
(SELECT MIN(lat_n) AS a, MIN(long_w) AS b, MAX(lat_n) AS c, MAX(long_w) AS d FROM station)
is a subquery.
What's important here is that the result of a subquery looks like a regular table to the outer query. In some SQL flavors, an alias is necessary immediately following the closing parenthesis (i.e. a name by which to refer to the table-like result).
Whether this is technically a "temporary table" is a bit of a detail, as its result isn't stored outside the scope of the query; there is also a separate thing called a temporary table, which is stored.
Additionally (and this might be the source of confusion), subqueries can also be used in the WHERE clause with an operator (e.g. IN) like this:
SELECT student_name
FROM students
WHERE student_school IN (SELECT school_name FROM schools WHERE location='Springfield')
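Both placements of a subquery can be tried with the sketch below, which uses SQLite through Python. All rows are invented, and the alias bounds on the FROM-clause subquery is an addition for engines that require one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (lat_n REAL, long_w REAL)")
conn.executemany(
    "INSERT INTO station VALUES (?, ?)",
    [(10.0, 20.0), (12.5, 25.25), (11.0, 22.0)],
)

# Subquery in FROM: its one-row result is read like a table with columns a..d.
distance = conn.execute(
    "SELECT ROUND(ABS(a - c) + ABS(b - d), 4) FROM ("
    "  SELECT MIN(lat_n) AS a, MIN(long_w) AS b,"
    "         MAX(lat_n) AS c, MAX(long_w) AS d"
    "  FROM station"
    ") AS bounds"
).fetchone()[0]

conn.execute("CREATE TABLE schools (school_name TEXT, location TEXT)")
conn.execute("CREATE TABLE students (student_name TEXT, student_school TEXT)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?)",
    [("North", "Springfield"), ("South", "Shelbyville")],
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ada", "North"), ("Ben", "South"), ("Cal", "North")],
)

# Subquery in WHERE: its result is a list of values tested with IN.
springfield_students = conn.execute(
    "SELECT student_name FROM students "
    "WHERE student_school IN "
    "  (SELECT school_name FROM schools WHERE location = 'Springfield') "
    "ORDER BY student_name"
).fetchall()

print(distance)
print(springfield_students)
```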
This is, as discussed in the comments and the other answer, a subquery.
Logically, such a subquery (when it appears in the FROM clause) is executed "first", and then the results are treated as a table [1]. Importantly though, that is not required by the SQL language [2]. The entire query (including any subqueries) is optimized as a whole.
This can include the optimizer doing things like pushing a predicate from the outer WHERE clause (which, admittedly, your query doesn't have) down into the subquery, if it's better to evaluate that predicate earlier rather than later.
Similarly, if you had two subqueries in your query that both access the same base table, that does not necessarily mean that the database system will actually query that base table exactly twice.
In any case, whether the database system chooses to materialize the results (store them somewhere) is also decided during the optimization phase. So without knowing your exact RDBMS and the decisions that the optimizer takes to optimize this particular query, it's impossible to say whether it will result in something actually being stored.
[1] Note that there is no standard terminology for this "result set as a table" produced by a subquery. Some people have mentioned "temporary tables" but since that is a term with a specific meaning in SQL, I shall not be using it here. I generally use the term "result set" to describe any set of data consisting of both columns and rows. This can be used both as a description of the result of the overall query and to describe smaller sections within a query.
[2] Provided that the final results are the same "as if" the query had been executed in its logical processing order, implementations are free to perform processing in any ordering they choose to.
As there are so many terms involved, I just thought I'll throw in another answer ...
In a relational database we deal with tables. A query reads from tables and its result again is a table (albeit not a stored one).
So in the FROM clause we can access query results just like any stored table:
select * from (select * from t) x;
This makes the inner query a subquery to our main query. We could also call this an ad-hoc view, because view is the word we use for queries we access data from. We can move it to the beginning of our main query in order to enhance readability and possibly use it multiple times in it:
with x as (select * from t) select * from x;
We can even store such queries for later access:
create view v as select * from t;
select * from v;
In the SQL standard these terms are used:
BASE TABLE is a stored table we create with CREATE TABLE .... t in the above examples is supposed to be a base table.
VIEWED TABLE is a view we create with CREATE VIEW .... v in the above examples is a viewed table.
DERIVED TABLE is an ad-hoc view, such as x in the examples above.
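The three forms from the examples above (derived table, WITH clause, stored view) can be compared side by side. This sketch runs them on SQLite with an invented two-row base table t; the ORDER BY clauses are added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")  # base table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

# Derived table: an ad-hoc view in the FROM clause.
derived = conn.execute(
    "select * from (select * from t) x order by id"
).fetchall()

# The same query moved to the front with a WITH clause.
cte = conn.execute(
    "with x as (select * from t) select * from x order by id"
).fetchall()

# Viewed table: the query is stored under a name and reused later.
conn.execute("create view v as select * from t")
viewed = conn.execute("select * from v order by id").fetchall()

print(derived == cte == viewed)
```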
When using subqueries in other clauses than FROM (e.g. in the SELECT clause or the WHERE clause), we don't use the term "derived table". This is because in these clauses we don't access tables (i.e. something like WHERE mytable = ... does not exist), but columns and expression results. So the term "subquery" is more general than the term "derived table". In those clauses we still use various terms for subqueries, though. There are correlated and non-correlated subqueries and scalar and non-scalar ones.
And to make things even more complicated we can use correlated subqueries in the FROM clause in modern DBMS that feature lateral joins (sometimes implemented as CROSS APPLY and OUTER APPLY). The standard calls these LATERAL DERIVED TABLES.

What does * mean in sql?

For example, I know what SELECT * FROM example_table; means. However, I feel uncomfortable not knowing what each part of the code means.
The second part of a SQL query is the name of the column you want to retrieve for each record you are getting.
You can obviously retrieve multiple columns for each record, and (only if you want to retrieve all the columns) you can replace the list of them with *, which means "all columns".
So, in a SELECT statement, writing * is the same as listing all the columns the entity has.
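A quick way to check that claim is to run both forms and compare them. The sketch below does so on SQLite through Python; the columns id, name and price of example_table are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, "pen", 1.5), (2, "book", 12.0)],
)

# SELECT * expands to every column of the table, in table order.
star_cursor = conn.execute("SELECT * FROM example_table ORDER BY id")
star_columns = [d[0] for d in star_cursor.description]
star_rows = star_cursor.fetchall()

# The same query with the column list written out by hand.
explicit_rows = conn.execute(
    "SELECT id, name, price FROM example_table ORDER BY id"
).fetchall()

print(star_columns)
print(star_rows == explicit_rows)
```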
Here you can find probably the best tutorial for SQL learning.
I am answering by separating each part of the code.
SELECT == tells the database to retrieve (select) data from the table.
(*) == means all columns (up to here, the code means: retrieve all columns).
FROM == says where the data is to be selected from.
example_table == the name of the table from which the data is selected.
The overall meaning is:
retrieve all the data from the table whose name is example_table.
thanks.
For a beginner, knowing the following concepts can be really useful:
SELECT refers to attributes that you want to have displayed in your final query result. There are different 'SELECT' statements such as 'SELECT DISTINCT' which returns only unique values (if there were duplicate values in the original query result)
FROM basically means from which table you want the data. There can be one or many tables listed under the 'FROM' statement.
WHERE gives the condition that the rows must satisfy. You can also order the result with ORDER BY, for example 'ORDER BY ... DESC' (ASC rarely needs to be written, since ascending is the default direction of ORDER BY).
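The clauses described above can be seen working together in a short sketch (SQLite via Python, with an invented scores table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 2), ("b", 3), ("c", 1), ("d", 3)],
)

# WHERE filters rows; ORDER BY without a direction sorts ascending.
default_order = conn.execute(
    "SELECT score FROM scores WHERE score > 1 ORDER BY score"
).fetchall()

# Writing ASC explicitly gives the same result as the default.
explicit_asc = conn.execute(
    "SELECT score FROM scores WHERE score > 1 ORDER BY score ASC"
).fetchall()

# DESC reverses the order; DISTINCT drops the repeated value 3.
distinct_desc = conn.execute(
    "SELECT DISTINCT score FROM scores ORDER BY score DESC"
).fetchall()

print(default_order)
print(distinct_desc)
```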
Refer to W3schools for a better understanding.

How to get other columns in this query

I am using a group by clause in my query. I want to get other columns not specified in the group by parameters
SELECT un.user, un.role
FROM [Unique] un
group by user, role
In the query above, [Unique] has 7 columns altogether. How do I get the other columns?
In most databases (MySQL and SQLite are the exceptions I know of), you cannot include a column in the select list of a GROUP BY query unless one of the following holds:
The column is included in the GROUP BY clause.
The column is aggregated in one of the supported aggregate functions.
In MySQL and SQLite, the rows inside the aggregate groups from which the extra values get taken are undefined.
If you want extra columns in any other engine, you can wrap the column names in MAX():
SELECT un.user, un.role, MAX(un.city), MAX(un.bday)
FROM [Unique] un
GROUP BY user, role
In this case, the values for the extra columns are likely to come from different rows in the input record set. If this is important (sometimes it isn't since the extra columns come from the one side of a one-to-many JOIN), you can't use this technique.
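The "values from different rows" effect is easy to reproduce. In this sketch (SQLite via Python) the table unique_rows is a hypothetical stand-in for [Unique], with invented city and bday values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unique_rows (user TEXT, role TEXT, city TEXT, bday TEXT)"
)
rows = [
    ("ann", "admin", "Zurich", "1980-01-01"),
    ("ann", "admin", "Berlin", "1990-05-05"),
]
conn.executemany("INSERT INTO unique_rows VALUES (?, ?, ?, ?)", rows)

# Each MAX() is evaluated on its own column, independently of the other.
grouped = conn.execute(
    "SELECT un.user, un.role, MAX(un.city), MAX(un.bday) "
    "FROM unique_rows un "
    "GROUP BY user, role"
).fetchall()

print(grouped)
print(grouped[0] in rows)  # the combined row does not exist in the table
```

The city comes from the first input row and the birthday from the second, so the output row matches neither of them.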
Just to be clear: If you use GROUP BY in a SELECT, then each row you get back is constructed out of groups of multiple rows in the table you're SELECTing against. If you include columns that are not part of the GROUP BY clause, you're not giving the engine any instructions on which row from the table you want that value read from. Most engines, therefore, do not allow you to run this kind of SQL. MySQL does, with undefined results but I personally consider it bad practice to do this.
You have to choose on what basis you want the other columns. If multiple entries exist for the same user / role, do you want the first / last / random? You have to make choices on the other columns, by aggregating them or choosing to include them in the group by statement.
Some RDBMS do provide a default behaviour for performing this, but since the question is just marked SQL, we do not know if it applies.
Have you tried just specifying them?
SELECT un.user, un.role, un.col3, un.col4
FROM [Unique] un
group by user, role
You need to use an ORDER BY to get extra columns, or you end up specifying every column in your GROUP BY.
Use LEFT JOIN to self-join the Unique or use the SELECT with GROUP BY as sub-query.
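One way to read the sub-query suggestion is sketched below: group in a subquery, pick one row per group, and join back to the table to fetch the remaining columns of that row. The table unique_rows (standing in for [Unique]), the id column and the rule "take the highest id per group" are all assumptions made for this illustration, and an inner join is used here in place of the LEFT JOIN mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unique_rows (id INTEGER, user TEXT, role TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO unique_rows VALUES (?, ?, ?, ?)",
    [
        (1, "ann", "admin", "Zurich"),
        (2, "ann", "admin", "Berlin"),
        (3, "bob", "user", "Paris"),
    ],
)

# The subquery picks one id per (user, role); the join returns that full row.
one_row_per_group = conn.execute(
    "SELECT un.id, un.user, un.role, un.city "
    "FROM unique_rows un "
    "JOIN (SELECT user, role, MAX(id) AS id "
    "      FROM unique_rows "
    "      GROUP BY user, role) picked "
    "  ON un.id = picked.id "
    "ORDER BY un.id"
).fetchall()

print(one_row_per_group)
```

Unlike wrapping each extra column in MAX(), every output row here is a real row of the table.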