I'm new to SQL (using postgreSQL) and I've written a java program that selects from a large table and performs a few functions. The problem is that when I run the program I get a java OutOfMemoryError because the table is simply too big. I know that I can select from the beginning of the table using the LIMIT operator, but is there a way I can start the selection from a certain index where I left off with the LIMIT command? Thanks!
There is offset option in Postgres as in:
select from table
offset 50
limit 50
For mysql you can use the follwoing approaches:
SELECT * FROM table LIMIT {offset}, row_count
SELECT * FROM table WHERE id > {max_id_from_the previous_selection} LIMIT row_count. First max_id_from_the previous_selection = 0.
This is actually something that the jdbc driver will handle for you transparently. You can actually stream the result set instead of loading it all into memory at once. To do this in MySQL, you need to follow the instructions here: http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
Basically when you create you call connection.prepareStatement you need to pass ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY as the second and third parameters, then call setFetchSize(Integer.MIN_VALUE) on your PreparedStatement object.
There are similar instructions for doing this with other databases which I could iterate if needed.
EDIT: now we know you need instructions for PostgreSQL. Follow the instructions here: How to read all rows from huge table?
Related
I'm trying to do pagination in DB2. I wouldn't like to do it with subquery, but OFFSET is not working with TIMESTAMP_FORMAT.
Use of function TIMESTAMP_FORMAT in QSYS2 not valid. Data mapping error on member
I've found this question, but seems here the problem is with content of column and it's not my case, as values are alright and TIMESTAMP_FORMAT works without OFFSET.
I didn't look for some other way to not use TIMESTAMP_FORMAT, as I need to create pagination on queries written not by me, but by client.
The query looks like this.
SELECT DATE(TIMESTAMP_FORMAT(CHAR("tablename"."date"),'YYMMDD'))
FROM tableName
OFFSET 10 ROWS
I get
"[SQL0583] Use of function TIMESTAMP_FORMAT in QSYS2 not valid."
I'm not sure how OFFSET can relate to TIMESTAMP_FORMAT, but when I replace the select with select * it works fine.
I wonder why there is a conflict between OFFSET and TIMESTAMP_FORMAT and is there a way to bypass this without subquery.
From Listing of SQL Messages:
SQL0583
Function &1 in &2 cannot be invoked where specified because it is
defined to be not deterministic or contains an external action.
Functions that are not deterministic cannot be specified in a GROUP BY
clause or in a JOIN clause, or in the default clause for a global
variable.
Functions that are not deterministic or contain an external
action cannot be specified in a PARTITION BY clause or an ORDER BY
clause for an OLAP function and cannot be specified in the select list
of a query that contains an OFFSET clause.
The RAISE_ERROR function
cannot be specified in a GROUP BY or HAVING clause.
I don't know how to check these properties for the QSYS2.TIMESTAMP_FORMAT function (there is no its definition in the QSYS2.SYSROUTINES table), but it looks like improper definition of this function - there is no reason to create it as not deterministic or external action.
You can "deceive" DB2 like this:
CREATE FUNCTION MYSCHEMA.TIMESTAMP_FORMAT(str VARCHAR(4000), fmt VARCHAR(128))
RETURNS TIMESTAMP
DETERMINISTIC
CONTAINS SQL
NO EXTERNAL ACTION
RETURN QSYS2.TIMESTAMP_FORMAT(str, fmt);
SELECT
DATE(MYSCHEMA.TIMESTAMP_FORMAT(CHAR(tablename.date), 'YYMMDD'))
--DATE(QSYS2.TIMESTAMP_FORMAT(CHAR(tablename.date), 'YYMMDD'))
FROM table(values '190412') tableName(date)
OFFSET 10 ROWS;
And use this function instead. It works on my 7.3 at least.
It's a harmless deception, and you may ask IBM support to clarify such a "feature" of QSYS2.TIMESTAMP_FORMAT...
I suspect your problem is bad data...
The default for the IBM interactive tools, STRSQL and ACS Run SQL Scripts, is OPTIMIZE(*FIRSTIO) meaning get the first few rows back as quickly as possible...
With the OFFSET 10 clause you're probably accessing rows initially that you didn't before.
Try the following
create table mytest as (
SELECT DATE(TIMESTAMP_FORMAT(CHAR("tablename"."date"),'YYMMDD')) as mydate
FROM tableName
) with data
If that doesn't error, then yes you've found a bug, open a PMR.
Otherwise, you can see how far along the DB got by looking at the rows in the new table and track down the record with bad data.
I'm using SQL Link server for fetching data from MariaDB.
But I fetching issue with slowness when i used MariaDB from link server.
I used below scenarios to fetch result (also describe time taken by query)
Please suggest if you have any solutions.
Total number of row in patient table : 62520
SELECT count(1) FROM [MariaDB]...[webimslt.Patient] -- 2.6 second
SELECT * FROM OPENQUERY([MariaDB], 'select count(1) from webimslt.patient') -- 47ms
SELECT * FROM OPENQUERY([MariaDB], 'select * from webimslt.patient') -- 20 second
This isn’t really a fair comparison...
SELECT COUNT(1) is only returning a single number and will probably be using an index to count rows.
SELECT * is returning ALL data from the table.
Returning data is an expensive (slow) process, so it will obviously take time to return your data. Then there is the question of data transfer, are the servers connected using a high speed connection? That is also a factor in this. It will never be as fast to query over a linked server as it is to query your database directly.
How can you improve the speed? I would start by only returning the data you need by specifying the columns and adding a where clause. After that, you can probably use indexes in Maria to try to speed things up.
I want to execute select * from table1, select * from table2, select * from table3,...select * from table80....(Basically extract data from 80 different tables and send the data to 80 different indexes in Elasticsearch(Kibana).
Is it possible for me to give multiple select * statement in one Query Database Table and then route it to different indexes? If yes, what will be the flow like?
There are a couple approaches you can take to solve this issue.
If your tables are literally table1, table2, etc., you can simply generate 80 flowfiles, each with a unique integer value in an attribute (i.e. table_count) and use GenerateTableFetch and ExecuteSQL to create the queries using this attribute via Expression Language
If the table names are non-sequential (i.e. users, addresses, etc.), you can read from a file listing each on a line or use ListDatabaseTables to query the database for the names. You can then perform simple text processing to split the created flowfile(s) to one per table and continue as above
QueryDatabaseTable doesn't allow incoming connections so it is not possible.
But you can achieve same use case with the following flow
Flow:
1. ListDatabaseTables
2. RouteOnAttribute //*optional* filter only required tables
3. GenerateTableFetch //to generate pages of sql queries and store state
4. RemoteProcessGroup (or) Load balance connection
5. ExecuteSql //run more than one concurrent task if needed
6. further processing
7. PutElasticSearch.
In addition if you don't want to run the flow incrementally then remove GenerateTableFetch processor
Configure ExecuteSql processor select query as
select * from ${db.table.schema}.${db.table.name}
Some useful references:
GenerateTableFetch link1 link2
Incrementally run ExecuteSQL processor without using GenerateTableFetch link
I am trying to fasten up a SQL Server report regarding the IBM OS/400 operating system for my sales department.
A colleague of mine (which left the company) did this report and used a ton of sub selects.
The report usually takes about 30 min to process and often just fails to be displayed. I already tried to cut out some tables/rows in hopes of fastening up the process without success (all is needed by the sales department).
It works over all relevant data (orders, customers, articles, our order at the manufacturer, the manufacturer and so on). Any ideas?
I can't index it, due to the OS/400 system; guess it would be a new programming task for our contractor which leads to costs.
Can I use some clever joins? or somehow reduce the amount of subselects?
Are you using 4 part names in your query? That's probably your problem...
From SQL server...
-- Pull all rows from the table(s) back to MS SQL server and do the where locally on the MS SQL server
select * from LINKEDSVR.MYIBMI.MYLIB.MYTBL where locnbr = '00335';
-- Sends the statement to IBM i server for processing, only results are returned..
select * from openquery(LINKEDSVR, 'select * from MYTBL where locnbr = ''00335''');
Try running the subselects first, sending the output of each to its own table.
Update statistics on the tables. Then run the rest of the SQL, replacing what were originally subselects with the tables created in the first step.
Treat multiple layers of nesting the same way: each layer is its own insert into another table.
I've found that query optimizers have a hard time with complex SQL. Breaking-out the subqueries into separate steps often resolves this.
Between runs my preference is to leave the data intact as a reference in case debugging is needed, then truncate the tables as the first step of a run.
Responding to eraser's comments
Assuming your original query takes this general form:
select [columns] from
(-- subquery
select [columns] from TableA
) as Subquery
from TableB
where mainquery_where_clause
Re-write:
-- Create a table to handle results for your subquery:
Create Table A ;
-- Update the data distribution statistics:
update stats (TableA) ;
-- Now run the subquery:
insert into SubQTable select [columns] from TableA
-- Now run the re-written main query:
Select [columns]
from TableA, TableB
where TableA.joincol = TableB.joincol
and mainquery_where_clause ;
I noticed some syntax issues with the SQL you posted. Looks like something got left out. But the principle of my answer remains the same. Please note that applying my suggestion may not help, as there are potentially many variables to your scenario; you mentioned subqueries, so I chose to address that.
Halfer's suggestion is a great one: edit your original question, adding the SQL code, and putting it in the "{}" supplied by the text editing tool.
I strongly suggest that you obtain the SQL execution plan and post the results.
is there a way to actually query the database in a such a way to search for a particular value in every table across the whole database ?
Something like a file search in Eclipse, it searches accross the whole worspace and project ?
Sorry about that .. its MS SQL 2005
SQL Workbench/J has a built in tool and command to do that.
It's JDBC based and should also work with SQL Server.
You will need to use the LIKE operator, and search through each field separately. i.e.
SELECT * FROM <table name>
WHERE (<field name1> LIKE '%<search value>%') OR
(<field name2> LIKE '%<search value>%') OR
... etc.
This isn't a quick way though.
I think the best way would be to
1) programatically generate the query and run it
2) use a GUI tool for the SQL server you are using which provides this functionality.
In mysql you can use union operator like
(SELECT * from table A where name = 'abc') UNION (SELECT * from
table B where middlename = 'pqr')
and so on
use full text search for efficency
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Well, your best bet is to write a procedure to do this. But to give you some pointers you can use the INFORMATION_SCHEMA.Tables to get a list of all the tables in a given database and INFORMATION_SCHEMA.Columns to get a list of all columns. These tables also give you the datatype of columns. So you will need a few loops on these tables to do the magic.
It should be mentioned most RDBMSs nowadays support these schemas.
In phpmyadmin, go to your database, reach the search tab.
Here you will be able to select all of your tables and search through your entire db in one time.