I have a dataset with around 500 million records, and I have a requirement to derive two columns based on sequential evaluation of CASE conditions, something like:
Select Field1,
       Field2,
       Case
         When (expression1a and expression2c and expression3d)
           Then 'abc'
         When (expression1b and (expression2f or expression3))
           Then 'def'
         When (expression1x and expression2f and expression3)
           Then 'ghi'
         When (expression1 and expression2n and expression3)
           Then 'nop'
         ....
         Else 'unp'
       End as field3
From table
With such a large query I am also hitting the 250K-character query length limit. Is there a better way to handle this scenario on Google Cloud?
The only way I know to solve your problem would be to create a table and populate a column where you could list all these values. Something like:
SELECT field1
FROM humongoustable
WHERE field1 IN (SELECT words FROM smaller_table)
You would do this for every value list you needed, and hopefully you would be able to complete the query under the limit.
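A minimal sketch of creating and populating such a lookup table (BigQuery-style DDL/DML; the dataset name mydataset and the sample values are placeholders, not from the question):

-- lookup table holding the values to match against
CREATE TABLE mydataset.smaller_table (words STRING);

-- placeholder values; in practice this would be the full value list
INSERT INTO mydataset.smaller_table (words)
VALUES ('value1'), ('value2'), ('value3');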
Something else you may want to look into is adding new columns to the table, populated as True/False based on the values you are looking for, and then filtering and joining on those columns. These columns could be in other tables or in the same table.
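A minimal sketch of that flag-column idea in BigQuery SQL (assuming a dataset named mydataset; the expressions are the placeholders from the question's pseudocode):

-- add a boolean flag column
ALTER TABLE mydataset.humongoustable ADD COLUMN is_abc BOOL;

-- populate it once from the original expressions
UPDATE mydataset.humongoustable
SET is_abc = (expression1a AND expression2c AND expression3d)
WHERE TRUE;

-- later queries filter on the precomputed flag instead of
-- re-evaluating the long CASE conditions each time
SELECT field1, field2
FROM mydataset.humongoustable
WHERE is_abc;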
Related
I would like to filter one column on multiple specific values, like:
select * from table1
where column1 = a or column1 = b or column1 = c ...
Can it be done in a better way? (The SQL statement in use is over 10 lines long; with the OR conditions it'll grow another 10 lines, and it'll make the code much slower!)
You can use in:
select t.*
from t
where column in ( . . . );
An IN list is essentially equivalent to a chain of OR conditions. There are some nuances. For instance, all the values in the IN list will be converted to the same type; if one is a number and the rest are strings, then all may be converted to a single type -- perhaps generating an error.
For performance, you want an index on t(column).
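For example (the table and column names match the placeholders above; the index name is arbitrary):

CREATE INDEX idx_t_column ON t(column);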
Our application is a mainframe application, an IBM iSeries – DB2 database setup. Some of our table values have a range.
Ex: 100;105;108;110:160;180
-- UPDATE --
The above data is from a single row (a single column, to be precise). In the same format there would be multiple values (on various rows).
In this case, individual values are delimited by a ";", but 110:160 is a range: it includes all the values from 110 to 160. For the individual values we were using LIKE predicates, obviously, e.g. if I have to query for 105.
The challenge here is that if I had to query 125, it is technically not present in the database; however, logically I need to retrieve that record, since 125 falls inside the 110:160 range.
The system (application) somehow was able to accomplish this; I am not sure how. I am not a mainframe developer; I just had to query the database to retrieve a specific record for some of the automation that we work on.
As a workaround, I could think of two things:
Expand the ranges and store them in a temp database programmatically.
Ex: 110:160 would be expanded to 110;111;112..160 (yes, it's tedious)
Reduce the number of records by filtering on certain unique columns (the ones without ranges), then programmatically apply logic to identify the right record.
As both are workarounds, I was curious how the system does it. (I reached out to the devs of the app; so far, no luck.) So is there a direct approach to achieve this? Could it be a stored procedure?
If I understood your question correctly, your example values are not in a single row but in multiple rows; otherwise some preprocessing has to be done first.
I would decompose the combined value into its components with SQL, like:
with temp(id, text, value1, value2) as (
    select id, text
        ,case when posstr(id, ':') > 0
              then substr(id, 1, posstr(id, ':') - 1)
              else id
         end as value1
        ,case when posstr(id, ':') > 0
              -- two-argument substr takes the rest of the string,
              -- avoiding an out-of-range length argument
              then substr(id, posstr(id, ':') + 1)
              else id
         end as value2
    from testrange
)
select * from temp
-- int() makes the comparison numeric rather than character
where 125 between int(value1) and int(value2)
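For example, if testrange contained one row with id = '105' and another with id = '110:160', the query above would return only the second row, because 125 falls inside the 110:160 range.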
I am trying to write code that allows me to check if there are any cases of a particular pattern inside a table.
The way I am currently doing it is with something like
select count(*)
from database.table
where column like (some pattern)
and seeing if the count is greater than 0.
I am curious to see if there is any way I can speed up this process, as this type of pattern finding happens in a loop in my query, and all I need to know is whether there is even one such case rather than the total number of cases.
Any suggestions will be appreciated.
EDIT: I am running this inside a Teradata stored procedure for the purpose of data quality validation.
Using EXISTS will be faster if you don't actually need to know how many matches there are. Something like this would work (note this IF ... ELSE form is SQL Server-style syntax; a Teradata stored procedure would need its own conditional syntax around the EXISTS check):
IF EXISTS (
SELECT *
FROM bigTbl
WHERE label LIKE '%test%'
)
SELECT 'match'
ELSE
SELECT 'no match'
This is faster because once it finds a single match it can return a result.
If you don't need the actual count, the most efficient way in Teradata will use EXISTS:
select 1
where exists
( select *
from database.table
where column like (some pattern)
)
This will return an empty result set if the pattern doesn't exist.
In terms of performance, a better approach is to:
select the result set based on your pattern;
limit the result set's size to 1;
check whether a result was returned.
Doing this means the database engine does not have to scan the whole table: the query can return as soon as the first matching record is encountered.
The actual query depends on the database you're using. In MySQL, it would look something like:
SELECT id FROM database.table WHERE column LIKE '%some pattern%' LIMIT 1;
In Oracle it would look like this:
SELECT id FROM database.table WHERE column LIKE '%some pattern%' AND ROWNUM = 1;
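Since the question mentions Teradata, the same idea there would use TOP (same placeholder table and column names as above):

SELECT TOP 1 id FROM database.table WHERE column LIKE '%some pattern%';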
Does SQLite offer a way to search every column of a table for a searchkey?
SELECT * FROM table WHERE id LIKE ...
Selects all rows where ... was found in the column id. But instead of searching only the column id, I want to search every column for the search string. I believe this does not work:
SELECT * FROM table WHERE * LIKE ...
Is that possible? Or what would be the next easiest way?
I use Python 3 to query the SQLite database. Should I go the route of searching through the returned data after the query has executed?
A simple trick you can do is:
SELECT *
FROM table
WHERE ((col1 || col2 || col3 || col4) LIKE '%something%')
This will select the record if any of these 4 columns contains the word "something". (Note SQLite uses || for string concatenation, not +. Two caveats: if any column is NULL, the whole concatenation becomes NULL, as the coalesce answer below addresses; and the search string can accidentally match across the boundary of two adjacent columns.)
No; you would have to list or concatenate every column in the query, or reorganize your database so that you have fewer columns.
SQLite has full-text search tables where you can search all columns at once, but such tables do not work efficiently with any other queries.
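A minimal sketch of such a table using the FTS5 module (the column names follow the four-column example above; docs is a placeholder name):

-- full-text index over the searchable columns
CREATE VIRTUAL TABLE docs USING fts5(col1, col2, col3, col4);

-- copy the existing data in ("table" quoted because it is a keyword)
INSERT INTO docs SELECT col1, col2, col3, col4 FROM "table";

-- MATCH searches all indexed columns at once
SELECT * FROM docs WHERE docs MATCH 'something';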
I could not comment on #raging-bull's answer, so I had to write a new one. My problem was that I have columns with NULL values and got no results, because the concatenated "search string" was NULL.
Using coalesce I could solve that problem: SQLite takes the column content, or an empty string ('') if it is NULL, so there is always an actual search string available.
SELECT *
FROM table
WHERE (coalesce(col1, '') || coalesce(col2, '') || coalesce(col3, '') || coalesce(col4, '')) LIKE '%something%'
I'm not quite sure if I understood your question.
If you want the whole row returned when id = searchkey, then:
select * from table where id=searchkey;
If you want to have specific columns from the row with the correct searchkey:
select col1, col2, col3 from table where id=searchkey;
If you want to search multiple columns for the "id": first narrow down which columns it could be found in - you don't want to search the whole table! Then:
select * from table where col1=searchkey or col2=searchkey or col3=searchkey;
The title is a bit confusing, so I'll explain with an example what I'm trying to do.
I have a field called "modifier". This is a field with concatenated values for each individual. For example, the value in one row could be:
*26,50,4 *
and the value in the next row
*4 *
And the table (Table A) would look something like this:
Key Modifier
1 *26,50,4 *
2 *4 *
3 *1,2,3,4 *
The asterisks are always going to be in the same positions (here, character positions 1 and 26), with an uncertain number of numbers in between, separated by commas.
What I'd like to do is "join" this "modifier" field to another table (Table B) with a list of possible values for that modifier. e.g., that table could look like this:
ID MOD
1 26
2 3
3 50
4 78
If a value in A.modifier appears in B.mod, I want to keep that row in Table A. Otherwise, leave it out. (I use the term "join" loosely because I'm not sure that's what I need here.)
Is this possible? How would I do it?
Thanks in advance!
edit 1: I realize I can use regular expressions and a bunch of OR conditions that search for the comma-separated values in the MOD list, but is there a better way?
One way to do it is using TRIM, string concatenation and LIKE:
SELECT *
FROM tableA a
WHERE EXISTS (
    SELECT 1
    FROM tableB b
    WHERE ',' || trim(trim(BOTH '*' FROM a.Modifier)) || ','
          LIKE '%,' || b.mod || ',%'
);
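For example, for Key 1 the left-hand side becomes ',26,50,4,'; a Table B row with MOD = 50 yields the pattern '%,50,%', which matches, while MOD = 5 would yield '%,5,%', which does not. The surrounding commas guarantee whole-value matches, so 5 cannot match 50.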
Demo --> http://www.sqlfiddle.com/#!4/1caa8/10
This query might still be slow for huge tables (it always performs full scans of tables or indexes), but it should be faster than using regular expressions or parsing the comma-separated lists into individual values.