Selecting a bulk of IDs from a table

I have a table which holds ~1M rows. My application has a list of ~100K IDs which belong to that table (the list being generated by the application layer).
Is there a common method for querying all of these IDs? ~100K SELECT queries? A temporary table into which I insert the ~100K IDs, then a SELECT query that joins it with the required table?
Thanks,
Doori Bar

You could do it in one query, something like
SELECT * FROM large_table WHERE id IN (...)
Insert a comma-separated list of IDs where I put the ...
Unfortunately, there is no easy way that I know of to parametrize this, so you need to be extra-super careful to avoid SQL injection vulnerabilities.
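One common workaround is to generate one `?` placeholder per ID, so the values are still bound by the driver rather than spliced into the SQL text. Below is a minimal sketch using Python's built-in sqlite3 module against an in-memory stand-in for `large_table` (the `payload` column is hypothetical). The chunk size of 500 is an assumption chosen to stay below SQLite's limit on bound parameters; other databases have their own limits.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple

# Assumption: 500 stays well below SQLite's default limit on bound parameters
# (999 before SQLite 3.32.0, 32766 since); other databases have different limits.
CHUNK_SIZE = 500


def chunks(seq: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_by_ids(conn: sqlite3.Connection, ids: Sequence[int]) -> List[Tuple[int, str]]:
    """Return rows of large_table whose id is in `ids`, using one IN query per chunk."""
    rows: List[Tuple[int, str]] = []
    for chunk in chunks(ids, CHUNK_SIZE):
        # Only the "?, ?, ..." string is built dynamically; the IDs themselves are
        # bound as parameters, so they are never spliced into the SQL text.
        placeholders = ", ".join("?" * len(chunk))
        sql = f"SELECT id, payload FROM large_table WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(chunk)).fetchall())
    return rows


# Demo data: a hypothetical large_table with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO large_table (id, payload) VALUES (?, ?)",
    ((i, f"row {i}") for i in range(10_000)),
)

wanted = list(range(0, 10_000, 7))  # stand-in for the application's ID list
result = fetch_by_ids(conn, wanted)
print(len(result))  # prints 1429
```

With ~100K IDs this still means about 200 round trips at this chunk size, which is one reason the temporary-table approach below can be preferable.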

A temporary table that holds the 100k IDs seems like a good solution. Don't insert them one by one, though; MySQL's INSERT ... VALUES syntax accepts multiple rows in a single statement.
By the way, where do your 100k IDs come from, if not from the database? If they come from a preceding query, I'd suggest having that query fill the temporary table directly.
Edit: for a more portable way of doing a multi-row insert (UNION ALL avoids the duplicate-elimination step that plain UNION performs):
INSERT INTO mytable (col1, col2) SELECT 'foo', 0 UNION ALL SELECT 'bar', 1
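The temporary-table approach can be sketched with Python's sqlite3 module, which stands in for MySQL here (an assumption: in MySQL the statement is CREATE TEMPORARY TABLE and INSERT IGNORE replaces INSERT OR IGNORE). `executemany` sends all IDs in one call, and the final SELECT joins the ID table to the big table. The names `large_table` and `wanted_ids` are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple


def fetch_via_temp_table(conn: sqlite3.Connection, ids: Sequence[int]) -> List[Tuple[int, str]]:
    """Load `ids` into a temporary table and join it against large_table."""
    # TEMP tables are private to this connection and dropped when it closes.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_ids (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM wanted_ids")  # clear leftovers from an earlier call
    # INSERT OR IGNORE skips duplicate IDs instead of failing on the primary key.
    conn.executemany("INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)", ((i,) for i in ids))
    return conn.execute(
        "SELECT t.id, t.payload FROM large_table AS t "
        "JOIN wanted_ids AS w ON w.id = t.id ORDER BY t.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO large_table (id, payload) VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 1001)),
)

rows = fetch_via_temp_table(conn, [5, 42, 42, 999, 5000])  # 5000 does not exist
print(rows)  # [(5, 'row 5'), (42, 'row 42'), (999, 'row 999')]
```

Because the join does the matching, there is no limit on how many IDs can be loaded, and IDs that are missing from the big table simply produce no row.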

Do those IDs actually reference the table with 1M rows?
If so, you could use SELECT ids FROM <1M table>
where ids is the ID column and "1M table" is the name of the table that holds the 1M rows.
But I'm not sure I really understand your question...

Related

SQL: Insert certain records from one table into another and also add few other fields using query

I have two tables, say TABLE1 and TABLE2, and say the field id is common to both. The rest of the fields are different.
I want to select all distinct ids from TABLE1 and insert them into TABLE2 while also writing the other attributes, as in the pseudocode below:
for each distinct id (i) in TABLE1:
INSERT in TABLE2 (i, false, unix_timestamp())
end
For some reason I cannot use a programming language to do this. Is it possible to do it in SQL using Apache Drill?
What you could do is write a query that produces the output you're looking for and then save that as a table. Drill is really a query engine and doesn't support INSERT operations the way a database does.
So a pseudo-query might look like this:
CREATE TABLE <your file> AS
SELECT ...
Then you could query that file. I don't know if that helps or not. You can also create views and temporary tables, but Drill itself doesn't really implement INSERT commands.
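The per-row loop in the question collapses into a single set-based statement. The sketch below runs on SQLite through Python, not Drill, purely to show the shape of the query: in an engine that supports INSERT it is INSERT INTO ... SELECT, and in Drill the same SELECT would follow CREATE TABLE ... AS. The column names `flag` and `created_at` are hypothetical stand-ins for the `false` and `unix_timestamp()` values in the pseudocode, with SQLite's `strftime('%s', 'now')` playing the role of `unix_timestamp()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, other TEXT)")
conn.executemany("INSERT INTO table1 (id, other) VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (2, "c"), (3, "d")])

# Option A: the target table exists, so insert every distinct id in one statement.
# DISTINCT is applied in the subquery so each id appears exactly once.
conn.execute("CREATE TABLE table2 (id INTEGER, flag INTEGER, created_at INTEGER)")
conn.execute(
    "INSERT INTO table2 (id, flag, created_at) "
    "SELECT id, 0, CAST(strftime('%s', 'now') AS INTEGER) "
    "FROM (SELECT DISTINCT id FROM table1)"
)

# Option B: the CREATE TABLE ... AS pattern, building the table from the query itself.
conn.execute(
    "CREATE TABLE table2_ctas AS "
    "SELECT id, 0 AS flag, CAST(strftime('%s', 'now') AS INTEGER) AS created_at "
    "FROM (SELECT DISTINCT id FROM table1)"
)

print(conn.execute("SELECT id, flag FROM table2 ORDER BY id").fetchall())
print(conn.execute("SELECT id, flag FROM table2_ctas ORDER BY id").fetchall())
```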

BigQuery loop to select values from dynamic table_names registered in another table

I'm looking for a solution that extracts data from multiple tables and inserts it into another table automatically, by running a single script. I need to query many tables, so I want to write a loop that selects from those tables' names dynamically.
I wonder if I could have a table with table names, and execute a loop like:
foreach(i in table_names)
insert into aggregated_table select * from table_names[i]
end
Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM `project.dataset1.*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
This approach will work if the following conditions are met:
all tables to be processed from the list have exactly the same schema
the most recently created table matching the wildcard is one of those tables; it defines the schema that will be used for all the other tables in the list
to satisfy the condition above, the list table should ideally be hosted in another dataset, so that it does not match the wildcard itself
Obviously, you can add INSERT INTO ... to insert the result into whatever destination you need.
Please note: filters on _TABLE_SUFFIX that include subqueries cannot be used to limit the number of tables scanned for a wildcard table, so make sure you are using the longest possible prefix, for example:
#standardSQL
SELECT * FROM `project.dataset1.source_table_*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
So, again: even though you will select data only from specific tables (those listed in project.dataset2.list), you will be billed for scanning all tables that match the project.dataset1.source_table_* wildcard.
While the above is pure BigQuery SQL, you can also use any client of your choice to script exactly the logic you need: read the table names from the list table, then select and insert in a loop. I think this option is the simplest and most optimal.
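The client-side loop from the last paragraph might look like the sketch below. It uses Python's sqlite3 module as a stand-in for a BigQuery client (an assumption; no BigQuery client API is shown), and the table names `table_list`, `aggregated_table`, `source_a`, `source_b` and `source_c` are hypothetical. Because table names cannot be bound as query parameters, each name read from the list is checked against the database catalog before it is put into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("source_a", "source_b", "source_c"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO source_a VALUES (?, ?)", [(1, "a1"), (2, "a2")])
conn.executemany("INSERT INTO source_b VALUES (?, ?)", [(3, "b1")])
conn.executemany("INSERT INTO source_c VALUES (?, ?)", [(4, "c1")])

# The list table says which sources to aggregate (source_c is deliberately left out).
conn.execute("CREATE TABLE table_list (table_name TEXT)")
conn.executemany("INSERT INTO table_list VALUES (?)", [("source_a",), ("source_b",)])
conn.execute("CREATE TABLE aggregated_table (id INTEGER, val TEXT)")

# Names that really exist as tables; anything else in table_list is rejected.
known_tables = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}

with conn:  # commit the whole loop at once (or roll it back on error)
    for (table_name,) in conn.execute("SELECT table_name FROM table_list").fetchall():
        if table_name not in known_tables:
            raise ValueError(f"unknown table in list: {table_name!r}")
        # Identifiers cannot be parameters, so quote the already-validated name.
        conn.execute(f'INSERT INTO aggregated_table SELECT * FROM "{table_name}"')

print(conn.execute("SELECT id, val FROM aggregated_table ORDER BY id").fetchall())
```

As with the wildcard query, `SELECT *` only works if every listed table has the same column layout as the destination.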

Advice on removing records completely from a database

I am looking for some advice on the best way to completely remove multiple records (over 3,000) from a database. I have been assigned the job of removing old records from our database for GDPR reasons.
However, this is a database I do not know much about, and there is no documentation (ERDs, etc.) on how the tables are joined together.
I managed to work out which tables need to have records removed in order to completely remove the details from the database; there are about 24 of them.
I have a list of the ID numbers that need to be removed, so I was thinking of creating a temporary table with the list of IDs and then a stored procedure that loops through the temporary table. Then, for each of the 24 tables, it would check whether the table contains records connected to the ID number and, if so, delete them.
Does anyone know if there is a better way of removing these records?
I would use a table variable and union all:
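declare @ids table (id int primary key)
insert into @ids (id)
select 1 union all
select 2 union all
...
select 3000
delete from table_name where id in
(select id from @ids)
Obviously, just replace the numbers with the actual IDs (note that table variable names start with @; a #name is a temporary table), and repeat the DELETE for each table that holds data linked to those IDs.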
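For the multi-table case in the question, one option is to load the IDs once and run one set-based DELETE per table inside a single transaction, so a failure part-way through does not leave some tables purged and others not. The sketch below uses Python's sqlite3 module; the tables `customer`, `orders` and `notes` and the `customer_id` column are hypothetical stand-ins for the 24 real tables, and child tables are listed before their parent so enforced foreign keys are not violated.

```python
import sqlite3
from typing import Dict, Sequence

# Hypothetical (table, id column) pairs; child tables first, parent last.
TABLES = [("notes", "customer_id"), ("orders", "customer_id"), ("customer", "id")]


def purge(conn: sqlite3.Connection, ids: Sequence[int]) -> Dict[str, int]:
    """Delete every row linked to `ids` from each table in TABLES; return rows deleted per table."""
    deleted: Dict[str, int] = {}
    with conn:  # commit on success, roll back the deletes on any exception
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS purge_ids (id INTEGER PRIMARY KEY)")
        conn.execute("DELETE FROM purge_ids")
        conn.executemany("INSERT OR IGNORE INTO purge_ids (id) VALUES (?)", ((i,) for i in ids))
        for table, column in TABLES:
            # Table/column names come from the fixed list above, never from user input.
            cur = conn.execute(f"DELETE FROM {table} WHERE {column} IN (SELECT id FROM purge_ids)")
            deleted[table] = cur.rowcount
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id))")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id))")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 3)])
conn.executemany("INSERT INTO notes VALUES (?, ?)", [(20, 1), (21, 1)])
conn.commit()

result = purge(conn, [1, 3])
print(result)  # {'notes': 2, 'orders': 2, 'customer': 2}
```

The per-table counts are worth logging for a GDPR job, since they show what was actually removed.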

What SQL query do I need in order to add lots of empty rows to a table at once?

I understand that what I am asking for may not make a lot of sense, but I nonetheless have a particular need for it. I have a table that has 500 rows in it. I have another table with 500 more rows that I need to merge into the first table. The easiest way I know to do that is to add 500 rows to the first table and then use an UPDATE statement, because then I have a primary key to pair up the rows of the first and second tables.
So how can I add 500 blank rows to my first table? I've been trying to think of a query that would do that, but haven't been able to come up with anything...
You can insert to one table from another table:
INSERT INTO suppliers (supplier_id, supplier_name)
SELECT account_no, name
FROM customers
WHERE city = 'Newark';
You can use an INSERT INTO ... SELECT statement (see also the question "SQLite: select into?"):
As long as the tables have the same structure, you can use a simple query to insert the rows of one into the other:
INSERT INTO tableOne SELECT * FROM tableTwo
If you have to map the fields manually, you'll have to change it to a column-level insert, such as:
INSERT INTO tableOne(columnOne,columnTwo) SELECT column3, column4 FROM tableTwo
You can add the standard WHERE clauses to these as well.
Hope that helps.
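A quick way to check that no blank rows are needed: the sketch below (Python's sqlite3 module, with hypothetical columns `id`, `columnOne`, `columnTwo`, `column3`, `column4`) appends 500 rows from one table to another with the column-level form above. The primary key is left out of the column list, so SQLite assigns fresh keys to the copied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableOne (id INTEGER PRIMARY KEY, columnOne TEXT, columnTwo INTEGER)")
conn.execute("CREATE TABLE tableTwo (id INTEGER PRIMARY KEY, column3 TEXT, column4 INTEGER)")
conn.executemany("INSERT INTO tableOne (columnOne, columnTwo) VALUES (?, ?)",
                 ((f"one-{i}", i) for i in range(500)))
conn.executemany("INSERT INTO tableTwo (column3, column4) VALUES (?, ?)",
                 ((f"two-{i}", i) for i in range(500)))

# Map tableTwo's columns onto tableOne's; id is omitted so new keys are generated.
conn.execute("INSERT INTO tableOne (columnOne, columnTwo) SELECT column3, column4 FROM tableTwo")

count, max_id = conn.execute("SELECT COUNT(*), MAX(id) FROM tableOne").fetchone()
print(count, max_id)  # 1000 1000
```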

How do I query a SQL database for a lot of results that don't have any common criteria?

I have an MS SQL database with about 2,600 records (each one holds information about a computer). I need to write a SELECT statement that selects about 400 of those records.
What's the best way to do that when they don't have any common criteria? They're all just different random numbers so I can't use wildcards or anything like that. Will I just have to manually include all 400 numbers in the query?
If you need 400 specific rows whose column matches a certain number:
Yes, include all 400 numbers using an IN clause. In my experience (via code profiling), an IN clause is faster than WHERE column = A OR column = B OR ...
400 is really not a lot.
SELECT * FROM table WHERE column in (12, 13, 93, 4, ... )
If you need 400 random rows:
SELECT TOP 400 * FROM table
ORDER BY NEWID()
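SQLite, used here through Python's sqlite3 module as an illustration, has neither TOP nor NEWID(); the equivalent is ORDER BY RANDOM() with a LIMIT. The `computers` table is hypothetical. Both forms sort the whole table by a random key, which is cheap at 2,600 rows but gets expensive on very large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.executemany("INSERT INTO computers (id, hostname) VALUES (?, ?)",
                 ((i, f"pc-{i}") for i in range(1, 2601)))

# SQLite counterpart of SQL Server's: SELECT TOP 400 * FROM table ORDER BY NEWID()
sample = conn.execute("SELECT * FROM computers ORDER BY RANDOM() LIMIT ?", (400,)).fetchall()
print(len(sample), len({row[0] for row in sample}))  # 400 400
```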
Rather than executing multiple queries or selecting the entire rowset and filtering it yourself, create either a temporary table or a permanent table into which you can insert temporary rows, one for each ID. In your main query, just join on that table.
For example, if your source table is...
person:
person_id
name
And you want 400 different person_id values, so let's say we have a permanent table for our temporary rows, like this:
person_query:
query_id
person_id
You'd insert your rows into person_query, then execute your query like this:
select
*
from person p
join person_query pq on pq.person_id = p.person_id
where pq.query_id = @query_id
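The `query_id` column is what lets several callers share one permanent table: each caller tags its rows with its own batch number and filters on it. A minimal sketch with Python's sqlite3 module, assuming the `person` / `person_query` layout above; the helper names `add_batch` and `fetch_batch`, the batch numbers, and the cleanup DELETE at the end are illustrative choices, not part of the answer.

```python
import sqlite3
from typing import List, Sequence, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE person_query (query_id INTEGER, person_id INTEGER, "
    "PRIMARY KEY (query_id, person_id))"
)
conn.executemany("INSERT INTO person (person_id, name) VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")])


def add_batch(conn: sqlite3.Connection, query_id: int, person_ids: Sequence[int]) -> None:
    """Register the wanted person_ids under one query_id."""
    conn.executemany(
        "INSERT OR IGNORE INTO person_query (query_id, person_id) VALUES (?, ?)",
        ((query_id, pid) for pid in person_ids),
    )


def fetch_batch(conn: sqlite3.Connection, query_id: int) -> List[Tuple[int, str]]:
    """The join from the answer, with query_id bound as a parameter."""
    return conn.execute(
        "SELECT p.person_id, p.name FROM person p "
        "JOIN person_query pq ON pq.person_id = p.person_id "
        "WHERE pq.query_id = ? ORDER BY p.person_id",
        (query_id,),
    ).fetchall()


add_batch(conn, 1, [1, 3])
add_batch(conn, 2, [2, 3, 4])  # a second batch sharing the same table
print(fetch_batch(conn, 1))  # [(1, 'Ann'), (3, 'Cy')]
print(fetch_batch(conn, 2))  # [(2, 'Bob'), (3, 'Cy'), (4, 'Di')]

# Remove a batch's rows once its query has run, so the table does not grow forever.
conn.execute("DELETE FROM person_query WHERE query_id = ?", (1,))
conn.commit()
```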
Maybe you have found a deficiency in the database design. That is, there is something common amongst the 400 records you want and what you need is another column in the database to indicate this commonality. You could then select against this new column.
As Brian Bondy said above, using an IN clause is probably the best way:
SELECT * FROM table WHERE column in (12, 13, 93, 4, ... )
One good trick is to paste the IDs in from a spreadsheet, if you have one.
If the IDs of the rows you want are in a spreadsheet, you can add an extra column to the spreadsheet that uses CONCATENATE() to append a comma to each ID, so that the column looks like this:
12,
13,
93,
4,
then copy and paste this column of data into your query, so it looks like this:
SELECT * FROM table WHERE column in (
12,
13,
93,
4,
...
)
It doesn't look pretty, but it's a quick way of getting all the numbers in. Just remember to delete the comma after the last ID, or the query will fail with a syntax error.
You could create an XML list or something similar to keep track of what you need to query, and then write a query that reads the IDs out of that list and brings all of the matching rows back.
Here is a website that has numerous examples of doing what you are looking for using a number of different methods (#4 is the XML method).
You can create a table holding those 400+ IDs and select against it, e.g.:
SELECT * FROM inventory WHERE inventory_id IN (SELECT id FROM inventory_ids WHERE tag = 'foo')
You still have to maintain the other table, but at least you're not having one ginormous query.
I would build a separate table with your selection criteria and then join the tables together, or something like that, assuming your criteria are static, of course.
Just select the TOP n rows, and order by something random.
Below is a hypothetical example to return 10 random employee names:
SELECT TOP 10
EMP.FIRST_NAME
,EMP.LAST_NAME
FROM
Schema.dbo.Employees EMP
ORDER BY
NEWID()
For this specific situation (not necessarily a general solution), the fastest and simplest thing is probably to read the entire table into memory and find your matches in your program's code, rather than having the database parse a gigantic WHERE clause.
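Reading the table once and filtering in the application can be sketched as follows, with Python's sqlite3 module standing in for the MS SQL connection (an assumption) and a hypothetical `computers` table. Putting the wanted IDs in a set makes each membership test O(1) on average, so the filter is a single pass over the 2,600 rows. This only makes sense while the whole table comfortably fits in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.executemany("INSERT INTO computers VALUES (?, ?)", ((i, f"pc-{i}") for i in range(1, 2601)))

wanted = {12, 13, 93, 4, 2600, 9999}  # hypothetical IDs; 9999 is not in the table

# One full scan, then filter in Python instead of sending a huge WHERE clause.
matches = [row for row in conn.execute("SELECT id, hostname FROM computers") if row[0] in wanted]
missing = wanted - {row[0] for row in matches}
print(len(matches), sorted(missing))  # 5 [9999]
```

A side benefit is that IDs with no matching row are easy to report, as `missing` shows.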