How to query select statement with millions of input in text file

I have millions of IDs in a text file (temp.txt). I need to run a select statement repeatedly, picking each ID from the text file in turn, and return the output.
select * from table where id=123;
temp.txt
1234
1224
1232
.
.

Some options:
Create a "linked server" using a text DB driver and join it to your real data
Load the text data into a temp table and join that to your "real" data
Use a script to generate a query with a massive "IN" clause (or multiple queries that are UNIONed together if the IN clause is too big)
Loading into a temp table is probably the most efficient overall, but may or may not be possible depending on your DB permissions.
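
The temp-table option can be sketched in Python roughly as follows. This is only an illustration under assumptions: it uses the standard-library sqlite3 module as a stand-in for whatever database is really in use, and the names your_table, wanted_ids, load_ids and fetch_matching are hypothetical. Lines that are not plain digits (blank lines, the "." filler lines) are skipped.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_ids(conn, path, batch_size=10_000):
    """Stream IDs (one per line) from a text file into a temp table."""
    conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
    insert = "INSERT OR IGNORE INTO wanted_ids VALUES (?)"
    batch = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line.isdigit():  # skip blanks and filler lines such as "."
                continue
            batch.append((int(line),))
            if len(batch) >= batch_size:
                conn.executemany(insert, batch)
                batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(insert, batch)


def fetch_matching(conn):
    """Join the real table against the loaded IDs."""
    return conn.execute(
        "SELECT t.* FROM your_table AS t "
        "JOIN wanted_ids AS w ON w.id = t.id ORDER BY t.id"
    ).fetchall()


def demo():
    # Hypothetical sample data, only to make the sketch runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO your_table VALUES (?, ?)",
        [(1224, "a"), (1232, "b"), (9999, "c")],
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "temp.txt"
        path.write_text("1234\n1224\n1232\n.\n.\n")
        load_ids(conn, path)
    return fetch_matching(conn)
```

Reading the file in batches keeps memory use flat even with millions of lines, and the join lets the database use its index on id instead of running one query per ID.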

Let's say you have 5 IDs in your text file, for example 1984, 2346, 2345, 6534 and 1234.
To write a query that selects all of these, try this:
SELECT * FROM your_table WHERE column IN (1984, 2346, 2345, 6534, 1234);
Every ID in your text file goes inside the parentheses, separated by commas.
This query selects all columns of each record whose value in the column named in the WHERE clause matches any of the IDs listed in the parentheses.
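
With millions of IDs a single IN list will probably exceed the database's limits, so one way to apply this idea is to send the IDs in chunks. The sketch below assumes Python's sqlite3 module, a table your_table with an id column, and a hypothetical helper select_by_ids; the chunk size of 500 is an arbitrary conservative choice, not a documented limit of any particular database.

```python
import sqlite3


def chunks(values, size):
    """Yield consecutive slices of `values`, each at most `size` long."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def select_by_ids(conn, ids, chunk_size=500):
    """Run one parameterized IN query per chunk and collect all rows."""
    rows = []
    for chunk in chunks(ids, chunk_size):
        placeholders = ", ".join("?" * len(chunk))  # "?, ?, ?" for 3 IDs
        sql = f"SELECT * FROM your_table WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


def demo():
    # Hypothetical sample data: IDs 1 through 10.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO your_table VALUES (?)", [(i,) for i in range(1, 11)])
    return select_by_ids(conn, [1984, 2, 5, 7, 1234], chunk_size=2)
```

Using placeholders instead of pasting the IDs into the SQL string avoids quoting problems if the file ever contains something other than numbers.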

Related

SQLite WHERE-Clause for every column?

Does SQLite offer a way to search every column of a table for a searchkey?
SELECT * FROM table WHERE id LIKE ...
Selects all rows where ... was found in the column id. But instead of searching only the column id, I want to search every column for the search string. I believe this does not work:
SELECT * FROM table WHERE * LIKE ...
Is that possible? Or what would be the next easiest way?
I use Python 3 to query the SQLite database. Should I instead search through the dictionary after the query has executed and the data has been returned?

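A simple trick you can do is:
SELECT *
FROM table
WHERE ((col1 || col2 || col3 || col4) LIKE '%something%')
This selects the record if the concatenation of these 4 columns contains the word "something". Note that SQLite concatenates strings with ||; the + operator is arithmetic addition there.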

No; you would have to list or concatenate every column in the query, or reorganize your database so that you have fewer columns.
SQLite has full-text search tables where you can search all columns at once, but such tables do not work efficiently with any other queries.

I could not comment on @raging-bull's answer, so I had to write a new one. My problem was that I have columns with null values and got no results, because concatenating a null makes the whole search string null.
Using coalesce solved that problem. SQLite then uses the column content or, if it is null, an empty string (''), so there is always an actual string to search.
SELECT *
FROM table
WHERE (coalesce(col1,'') || coalesce(col2,'') || coalesce(col3,'') || coalesce(col4,'')) LIKE '%something%'
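
Since the question mentions Python 3, the coalesce approach can be generated for every column instead of being typed by hand. This is a rough sketch, assuming the standard sqlite3 module; search_all_columns and the notes table are hypothetical names, and the table name must come from trusted code because identifiers cannot be passed as query parameters.

```python
import sqlite3


def search_all_columns(conn, table, needle):
    """Return rows where the NULL-safe concatenation of all columns contains needle."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    concatenated = " || ".join(f"coalesce(\"{col}\", '')" for col in columns)
    sql = f'SELECT * FROM "{table}" WHERE ({concatenated}) LIKE ?'
    return conn.execute(sql, (f"%{needle}%",)).fetchall()


def demo():
    # Hypothetical sample data with NULLs in different columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
    conn.executemany(
        "INSERT INTO notes VALUES (?, ?, ?)",
        [(1, "something here", None), (2, None, "nothing"), (3, None, "has SOMETHING")],
    )
    return search_all_columns(conn, "notes", "something")
```

One caveat with any concatenation-based search: a match can straddle the boundary between two adjacent columns. If that matters, an OR of one LIKE per column avoids it.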

I'm not quite sure I understood your question.
If you want the whole row returned when id=searchkey, then:
select * from table where id=searchkey;
If you want specific columns from the row with the matching searchkey:
select col1, col2, col3 from table where id=searchkey;
If you want to search multiple columns for the "id", first narrow down which columns it could be found in (you don't want to search the whole table!). Then:
select * from table where col1=searchkey or col2=searchkey or col3=searchkey;

Find out if a value exists in a column with a large input values set

What is the most effective (and simple) way to find out whether the cells of a specific column contain one of a given set of values?
To give you some background, I have a list of 1000 ID numbers. They might or might not exist in the "FileName" column of a table "ProcessedFiles" as part of the filename.
Basically, I need to check which of these 1000 tasks have been processed (i.e. they exist in the table).
What I came up with seems very inefficient:
SELECT * FROM ProcessedFiles
WHERE FileName LIKE '%54332423%'
OR FileName LIKE '%234432%'
OR FileName LIKE '%342342%'
...
etc
Thanks for help!

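You could create a temporary table and insert all the IDs into a column. Then you could cross join it with the ProcessedFiles table and check for the ID in the name with a LIKE:
SELECT pf.*
FROM ProcessedFiles pf,table t
WHERE pf.FileName like '%'+t.Id+'%'
Here table stands for the temporary table, and Id is assumed to be a character column; if it is numeric, cast it to varchar before concatenating.
I tested the above and it worked on SQL Server.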
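
The same temp-table idea can be tried out from Python. The sketch below is an assumption-laden illustration, not the SQL Server query above: it uses the standard sqlite3 module (where concatenation is || rather than +), and the names wanted and find_processed are hypothetical. It returns the IDs that appear in at least one file name.

```python
import sqlite3


def find_processed(conn, ids):
    """Return the IDs (as text) that occur inside any ProcessedFiles.FileName."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")  # start from an empty list on each call
    conn.executemany(
        "INSERT OR IGNORE INTO wanted VALUES (?)", [(str(i),) for i in ids]
    )
    rows = conn.execute(
        "SELECT DISTINCT w.id FROM wanted AS w "
        "JOIN ProcessedFiles AS pf ON pf.FileName LIKE '%' || w.id || '%' "
        "ORDER BY w.id"
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    # Hypothetical file names, only to make the sketch runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProcessedFiles (FileName TEXT)")
    conn.executemany(
        "INSERT INTO ProcessedFiles VALUES (?)",
        [("task_54332423.csv",), ("report_342342_final.txt",), ("other.txt",)],
    )
    return find_processed(conn, [54332423, 234432, 342342])
```

Because this is a substring match, a short ID such as 123 would also match a file name containing 91234, so the result may need a stricter pattern if the IDs vary in length.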

SQL Query: Modify records based on a secondary table

I have two tables in a PostgreSQL database.
The first table contains an ID and a text field of up to 200 characters. The second table is a data definition table: one column contains smileys or acronyms and a second column converts them to plain readable English.
The number of records in table 1 is about 1200 and the number in table 2 is about 300.
I wish to write a SQL statement that converts any text speak in the text column of table 1 into normal readable language, based on the definitions in table 2.
So for example if the value in table 1 reads as: Finally Finished :)
The transformed text would be something like: Finally Finished Smiles or smiling,
where the definition is pulled from the second table.
Note that the smiley could be anywhere in the text and could be any one of the three hundred smileys or acronyms.
Does anyone know if this is possible?

Yes. Do you want to do it entirely in SQL, or are you writing a brief bit of code to do this? I'm not entirely sure how to do it all in SQL, but I would consider something like the pseudocode below:
for each row in (SELECT textToTranslate FROM Table_1):
    oldText = row.textToTranslate
    modifiedText = oldText
    split oldText by some delimiter (for example, spaces)
    for each word in the split text:
        queryResult = SELECT <plain-English column> FROM Table_2 WHERE pretranslate = word
        if queryResult is not null:
            modifiedText = modifiedText.replace(word, queryResult)
    UPDATE Table_1 SET translatedText = modifiedText WHERE textToTranslate = oldText
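
The pseudocode above could be written in Python roughly like this. It is a sketch under several assumptions: sqlite3 stands in for PostgreSQL so the example is self-contained, the column names id, translatedText and translation are hypothetical, and words are split on whitespace only. Loading the roughly 300 definitions into a dictionary once avoids running a query per word.

```python
import sqlite3


def translate_all(conn):
    """Replace every whitespace-separated token that has a definition."""
    # Load the definitions once: {":)": "Smiles or smiling", ...}
    definitions = dict(conn.execute("SELECT pretranslate, translation FROM Table_2"))
    rows = conn.execute("SELECT id, textToTranslate FROM Table_1").fetchall()
    for row_id, text in rows:
        words = [definitions.get(word, word) for word in text.split()]
        conn.execute(
            "UPDATE Table_1 SET translatedText = ? WHERE id = ?",
            (" ".join(words), row_id),
        )
    conn.commit()


def demo():
    # Hypothetical sample data based on the example in the question.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table_1 (id INTEGER PRIMARY KEY, textToTranslate TEXT, translatedText TEXT)"
    )
    conn.execute("CREATE TABLE Table_2 (pretranslate TEXT PRIMARY KEY, translation TEXT)")
    conn.executemany(
        "INSERT INTO Table_1 (id, textToTranslate) VALUES (?, ?)",
        [(1, "Finally Finished :)"), (2, "brb lunch")],
    )
    conn.executemany(
        "INSERT INTO Table_2 VALUES (?, ?)",
        [(":)", "Smiles or smiling"), ("brb", "be right back")],
    )
    translate_all(conn)
    return [r[0] for r in conn.execute("SELECT translatedText FROM Table_1 ORDER BY id")]
```

Splitting on whitespace means a smiley glued to a word or to punctuation (for example "done:)") is not recognized; handling that would need a different tokenization.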

Pentaho Kettle Spoon Date manipulation

I am using Pentaho Spoon to do some transformation. I am using 'Table Input' and joining multiple tables to get final output table.
I need to achieve:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date = '2013-1-9'
AND TBLB.date BETWEEN '2012-11-15' AND '2013-1-9';
I am manually inserting '2012-11-15' but I am using Get System Data to insert '2013-1-9'. I am using a single Get System Data step.
My query is:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date='?'
AND TBLB.date BETWEEN '2012-11-15' AND '?';
I get an error message in Table Input saying: No value specified for parameter 2
Any suggestions will be appreciated.
Thank you.

This is a simple one: you need to "duplicate" the system date. Add another line in "Get System Data" called "date2" or something, make it the same as the first line, and it will fill in the second parameter (the second ?).
Or simply change the query to say BETWEEN '2012-11-15' AND TBLA.date;
then you don't need the second parameter.

Personally I prefer the pattern of a Get System Info/Add Constants step to create one row with multiple columns that feeds into a Database Join step. Then you replace parameters in your query with columns instead of rows, and you can specify a column more than once.
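
The underlying rule, that every ? in a query needs its own value even when two of them carry the same date, is not specific to Pentaho. As an analogy only (this is Python's sqlite3 module, not Kettle, and the sample rows and ISO-formatted dates are assumptions), the same query can be parameterized like this:

```python
import sqlite3

QUERY = """
SELECT COUNT(DISTINCT a.ID)
FROM TBLA AS a JOIN TBLB AS b ON a.ID = b.ID
WHERE a.ID = 334
  AND a.date = ?
  AND b.date BETWEEN '2012-11-15' AND ?
"""


def count_ids(conn, run_date):
    # Two "?" markers need two values, so the same date is supplied twice.
    return conn.execute(QUERY, (run_date, run_date)).fetchone()[0]


def make_demo_db():
    # Hypothetical sample rows; dates are zero-padded so text comparison works.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TBLA (ID INTEGER, date TEXT)")
    conn.execute("CREATE TABLE TBLB (ID INTEGER, date TEXT)")
    conn.execute("INSERT INTO TBLA VALUES (334, '2013-01-09')")
    conn.executemany(
        "INSERT INTO TBLB VALUES (?, ?)",
        [(334, "2012-12-01"), (334, "2012-01-01")],
    )
    return conn
```

Passing a one-element tuple to the same query fails with a binding-count error, which is the sqlite3 counterpart of the "No value specified for parameter 2" message.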

Save All Results to Excel

I have run a query using Eclipse against a Sybase db. I need to eliminate duplicate entries, but the results have mixed types: INT and TEXT. Sybase will not do DISTINCT on TEXT fields. When I Save All results and paste them into Excel, some of the TEXT field bleeds into the INT field columns, which makes Excel's Remove Duplicates tough to use.
I am thinking I might create an alias for my query, add a temp table, select the distinct INT column values from the alias and then query the alias again, this time including the TEXT values. Then when I export the data I save it into Word instead. It would look like this:
SELECT id, text
FROM tableA, TableB
WHERE (various joins here...)
AS stuff
CREATE TABLE #id_values
(alt_id CHAR(8) null)
INSERT INTO #id_values
(SELECT DISTINCT id
FROM stuff)
SELECT id, text
FROM stuff a
WHERE EXISTS (SELECT 1 FROM #id_values WHERE b.alt_id = a.id )
If there were a way to format the data better in Excel, I would not have to do all this manipulation on the db side. I have tried different formats in the Excel import dialog (import as tab-delimited, space-delimited), with the same end result.
Additional information: I converted the TEXT to VARCHAR, but I now need a new column which sometimes has up to 5 entries per id. ID -> TYPE is 1-many? The distinct worked on the original list, but now I need to figure out how to show all the new column values in one row with each id. The new column is CHAR(4).
Now my original select looks like this:
SELECT DISTINCT id, CONVERT(VARCHAR(8192), text), type_cd
FROM TableA, TableB
...etc
And I get multiple rows again for each type_cd attached to an id. I also realized I probably don't need the 'b.' alias in front of alt_id.
Also, regardless of how I format the query (TEXT or VARCHAR), Excel continues to bleed the text into the id rows. Maybe this is not a SQL problem but rather one with Excel, or maybe Eclipse.

You are limited in how much data you can paste into an Excel cell anyway, so convert your text to a varchar:
SELECT distinct id, cast(text as varchar(255)) as text
FROM tableA, TableB
WHERE (various joins here...)
I'm using 255 because that is the default for what Excel shows. You can have longer values in Excel cells, but this may be sufficient for your purposes. If not, just make the value bigger.
Also, as a comment, you should be using the proper syntax for joins, which uses the "on" clause (or "cross join" in place of a comma).
