SQL OR statement vs multiple SELECT queries

I have a table with an id and a name.
I'm given a list of ids and I need their names.
As far as I know, I have two options.
Either run a for loop in my code which executes:
SELECT name FROM table WHERE id = x
where x is always a number,
or write a single query like this:
SELECT name FROM table WHERE id = 1 OR id = 2 OR id = 3
The list of ids and names is enormous, so I don't think you'd want that.
The catch is that an id is not always a number but a randomly generated id containing numbers and characters, so talking about ranges is not a solution.
I'm asking from a performance point of view.
What's a nice solution to this problem?

SQLite has limits on the size of a query, so if there is no known upper limit on the number of IDs, you cannot use a single query.
When you read multiple rows with one query (note: IN (1, 2, 3) is easier than many ORs), you don't know which ID a name belongs to unless you also SELECT the ID or sort the results by it.
There should be no noticeable difference in performance; SQLite is an embedded database without client/server communication overhead, and the query does not need to be parsed again if you use a prepared statement.
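To make the ID-to-name mapping explicit, both shapes might look roughly like this (a sketch; ? is SQLite's positional placeholder, and the number of placeholders in the IN list must match the number of bound IDs):
-- One prepared statement, re-executed with a different bound ID per loop iteration:
SELECT name FROM table WHERE id = ?;
-- One query for several IDs at once, returning the id so each name
-- can be matched back to the ID that produced it:
SELECT id, name FROM table WHERE id IN (?, ?, ?);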

A "nice" solution is using the INoperator:
SELECT name from table where id in (1,2,3)

Also, the IN operator is syntactic sugar built for exactly this purpose.
SELECT name FROM table WHERE id IN (1,2,3,4,5,6.....)

Assuming you receive the list of IDs you need names for as an input temp table #InputIDTable:
SELECT name FROM table WHERE id IN (SELECT id FROM #InputIDTable)
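The same lookup can also be phrased as a join, which the optimizer will typically treat the same way:
SELECT t.name
FROM table t
JOIN #InputIDTable i ON i.id = t.id;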

Related

Selecting a large number of rows by index using SQL

I am trying to select a number of rows by the value of a column called ID. I know you can do this pretty easily by:
SELECT col1, col2, col3 FROM mytable WHERE id IN (1,2,3,4,5...)
However, what if there are a few million IDs I want to select and the IDs don't always follow a pattern (which means I can't use something like BETWEEN x AND y)? Does this SELECT statement still work, or are there better ways of doing so?
The actual application is this. Filters are specified by users and compared against some attributes of the records. From those filters, we create a subset of the data which is of interest to a particular user. There are about 30 million records, each with roughly ~3000 attributes (stored in roughly 30 tables, but every table has ID as a primary key), so every time someone queries their desired subset of records, we have to join many tables, apply those filters, and figure out what the subset looks like. To avoid joining many tables all the time, I thought it might be better to join the tables once, figure out the ids of the selected subset, and then, each time a new query is made, simply select the relevant columns of the rows that match the filtered ids.
This depends on the database and the interface you are using. For a few hundred or thousand values, no problem. But your question specifies millions. And that could start to get into limits on the length of the query -- either specified by the database, the tool you are using, or intermediate libraries.
If you have so many ids, I would strongly recommend that you load them into a table in the database with the id as the primary key. Then use join or exists to identify the rows in your table that match.
Often, such a list would be generated in the database anyway. In that case, you can use a subquery or CTE and just include that code in your final query.
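A minimal sketch of that approach (the staging table, its name, and the ID type are illustrative):
-- Staging table for the IDs; the primary key gives the join something to seek on.
CREATE TABLE wanted_ids (id VARCHAR(36) PRIMARY KEY);
-- ...bulk-load the millions of IDs into wanted_ids here...
-- Join form:
SELECT t.col1, t.col2, t.col3
FROM mytable t
JOIN wanted_ids w ON w.id = t.id;
-- Equivalent EXISTS form:
SELECT t.col1, t.col2, t.col3
FROM mytable t
WHERE EXISTS (SELECT 1 FROM wanted_ids w WHERE w.id = t.id);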

Row Stores vs Column Stores

Assuming that the database is already populated with data, and that each of the following SQL statements is the one and only query that an application will perform, why is it better to use row-wise or column-wise record storage for the following queries?...
1) SELECT * FROM Person
2) SELECT * FROM Person WHERE id=5
3) SELECT AVG(YEAR(DateOfBirth)) FROM Person
4) INSERT INTO Person (ID,DateOfBirth,Name,Surname) VALUES(2e25,'1990-05-01','Ute','Muller')
In those examples Person.id is the primary key.
The article Row Store and Column Store Databases gives a general discussion on this, but I am specifically concerned about the four queries above.
SELECT * FROM ... queries are better served by a row store, since a column store would have to access numerous files (one per column) to reassemble each record.
A column store is good for aggregation over large volumes of data, or when you have queries that only need a few fields from a wide table.
Therefore:
1st query: row-wise
2nd query: row-wise
3rd query: column-wise
4th query: row-wise
I have no idea what you are asking. You have this statement:
INSERT INTO Person (ID, DateOfBirth, Name, Surname)
VALUES('2e25', '1990-05-01', 'Ute', 'Muller');
This suggests that you have a table with four columns, one of which is an id. Each person is stored in their own row.
You then have three queries. The first cannot be optimized. The second is optimized, assuming that id is a primary key (a reasonable assumption). The third requires a full table scan -- although that could be ameliorated with an index only on DateOfBirth.
If the data is already in this format, why would you want to change it?
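As a side note, the index mentioned above might look like this (a sketch; the index name is made up, and whether the engine answers the AVG query from the index alone depends on the optimizer):
-- With an index on DateOfBirth only, query 3 can scan the narrow index
-- instead of every full row.
CREATE INDEX person_dob_idx ON Person (DateOfBirth);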
This is a very simple data structure. Three of your four query examples access all columns. I see no reason why you would not use a regular row-store table structure.

How to select all fields in SQL joins without getting duplicate column names?

Suppose I have a table A with 10 fields, and a table B with 5 fields.
B links to A via a column named "key", which exists in both A and B with the same name ("key").
I am generating a generic piece of SQL that queries from a main table A, receives a table name parameter to join to, and selects all of A's fields plus B's.
In this case, I will get all the 15 fields I want, or more precisely 16, because I get "key" twice, once from A and once from B.
What I want is to get only 15 fields (all fields from the main table plus the ones existing in the generic table), without getting "key" twice.
Of course I could list the fields explicitly in the SELECT itself, but that defeats my very objective of building generic SQL.
It really depends on which RDBMS you're using it against, and how you're assembling your dynamic SQL. For instance, if you're using Oracle and it's a PL/SQL procedure putting together your SQL, you're presumably querying USER_TAB_COLS or something like that. In that case, you could get your final list of columns names like
SELECT DISTINCT column_name
FROM user_tab_cols
WHERE table_name IN ('tableA', 'tableB');
but basically, we're going to need to know a lot more about how you're building your dynamic SQL.
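From there, one way to collapse the distinct names into a ready-made select list is Oracle's LISTAGG (a sketch against the same catalog view; note that USER_TAB_COLS stores unquoted table names in uppercase):
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_name) AS select_list
FROM (
  SELECT DISTINCT column_name
  FROM user_tab_cols
  WHERE table_name IN ('TABLEA', 'TABLEB')
);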
Re-thinking what I asked leads me to conclude that this is not plausible. Selecting columns in a SELECT statement picks the columns we are interested in from the list of tables provided. In cases where the same column name exists in more than one of the tables involved, which are the cases my question addresses, it would ideally be nice if the DB engine could return a unique list of fields. But for that it would have to decide by itself which column (and from which table) to choose from all the matches, which is something the DB cannot do, because that choice depends solely on the user.
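One caveat to that conclusion: when the duplicate is the join key itself, as it is here, several dialects (PostgreSQL, MySQL, and Oracle among them) let JOIN ... USING collapse the shared column, so SELECT * returns it only once:
SELECT *
FROM A
JOIN B USING (key);
-- "key" is a reserved word in some dialects and may need quoting.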

How to check efficiently if a substring exists: SQL query

I have to take certain actions based on whether a substring exists in a column.
For example, my column 'LangCodes' has #-separated values like en-us#ar-ae#in-id.
I could use the SQL IN operator if I could convert the value into a form like 'en-us','ar-ae','in-id'.
For example: SELECT Col1 FROM Table1 WHERE 'en-us' IN (LangCodes)
Do I need to use SQL's replace function to accomplish this, or does a better way exist?
You cannot do this efficiently in SQL Server, because you are storing your data in a fashion not consistent with the use of relational databases. You need a separate correlation table that has columns id and LangCode, with one row per language code.
You can do what you want with string operations. Here is a typical way:
where '#'+LangCodes+'#' like '%#en-us#%'
This, however, cannot take advantage of an index on LangCodes.
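Plugged into the question's query, the trick looks like this (SQL Server string concatenation with + shown; other engines use || or CONCAT):
SELECT Col1
FROM Table1
WHERE '#' + LangCodes + '#' LIKE '%#en-us#%';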
The most efficient and best way to check your language codes is to separate them in your table.
Never, never, never store multiple values in one column!
This is what your tables could look like (just examples):
product table
-------------
id
name
language_code table
-------------------
id
name
product_language_code table
---------------------------
product_id
language_code_id
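With that layout, the original lookup becomes an ordinary join that can use indexes (a sketch against the example tables above; the code value is assumed to live in language_code.name):
SELECT p.name
FROM product p
JOIN product_language_code plc ON plc.product_id = p.id
JOIN language_code lc ON lc.id = plc.language_code_id
WHERE lc.name = 'en-us';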

How do you write a complex join between a number table and an actual table, with many clauses dependent on the data from the number table?

I have a table of numbers (a PL/SQL collection containing some_table_line_ids passed in from a website).
Then I have some_table, which also has the columns config_data and config_state.
I want to pull in all lines whose table_id matches any of the table_ids in the number table.
I also want to pull in all lines that have the same config_data as each record pulled in by the first part.
So it's a parent/child relationship. This can be done with two for loops: a cursor that selects each line by id, and an inner loop that selects every line matching the parent's config_data. In each loop I perform data manipulation on each line.
I would like to combine both these into a single cursor having all table ids that I need.
What would that look like?
You just want to do a complicated join on different factors. Something like:
select st2.*
from numbers n
join some_table st on st.table_id = n.table_id
join some_table st2 on st2.config_data = st.config_data
Quite possibly, you actually want:
select distinct st.*
since you might otherwise have duplicates. Or, you might want:
select n.table_id, st.config_data, st2.*
So you know which of the original values was responsible for bringing in the row.
You describe the array as a PL/SQL collection. If you employ a SQL type instead you could include it in the FROM clause by using the TABLE function.
create type some_table_line_id_nt as table of number;
Something like:
select s.*
from some_table s
join table(some_table_line_ids) t
on s.id = t.column_value
(I haven't offered a complete solution as you haven't given enough details of table structure and data.)
I solved the issue using START WITH and CONNECT BY PRIOR.
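The final query isn't shown, but a hierarchical version might be shaped roughly like this (a guess, not the poster's actual code; LEVEL <= 2 stops the walk after the parent-to-child step, and NOCYCLE guards against rows reconnecting through the same config_data):
SELECT DISTINCT st.*
FROM some_table st
START WITH st.table_id IN (SELECT column_value FROM TABLE(some_table_line_ids))
CONNECT BY NOCYCLE PRIOR st.config_data = st.config_data
       AND LEVEL <= 2;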