Handle >10 million rows for one table (postgresql) - sql

I imported 11 million location names from geonames.org into my PostgreSQL database. However, when I try to just view the data, for instance in TablePlus, it is extremely slow. Executing a simple select for one row takes about 2 minutes. What can I do with data this large so that selects are fast?
I think I don't have any indexes, would that make a difference?
This is my table:
create table geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames text,
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(120),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
);

You need to specify what the query looks like.
Indexes would definitely make a difference. But the type of index depends on the query you are using and the columns used for selecting one or more rows.
The place to start is by defining a primary key on the table. Presumably, geonameid is the primary key. You can do this:
alter table geoname add constraint pk_geoname_geonameid primary key (geonameid);
You should really do this when you create the table, but better late than never.
If you are searching by geonameid, then you will notice a significant speed-up.
If you want to search by other columns, such as name or asciiname, then add indexes for those:
create index idx_geoname_name on geoname(name);
create index idx_geoname_asciiname on geoname(asciiname);
This doesn't work for all searches. If your criterion is LIKE with wildcards, you may need a different indexing strategy. Similarly, if you search by latitude and longitude, you'll want a GIS index.
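To see the effect of the key, here is a minimal sketch that compares the query plan before and after adding an index on geonameid. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example is self-contained; the trimmed-down table, the sample rows, and the helper name plan are assumptions for illustration. In PostgreSQL you would run EXPLAIN on the same query, and the plan text would look different.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

conn = sqlite3.connect(":memory:")
# Illustrative subset of the geoname columns, filled with made-up rows.
conn.execute("CREATE TABLE geoname (geonameid INT, name VARCHAR(200), country VARCHAR(2))")
conn.executemany(
    "INSERT INTO geoname VALUES (?, ?, ?)",
    [(i, f"place {i}", "XX") for i in range(1, 1001)],
)

query = "SELECT name FROM geoname WHERE geonameid = ?"
plan_before = plan(conn, query, (500,))

# SQLite cannot add a primary key to an existing table, so a unique index
# stands in for the primary key constraint used in the PostgreSQL answer.
conn.execute("CREATE UNIQUE INDEX pk_geoname_geonameid ON geoname (geonameid)")
plan_after = plan(conn, query, (500,))

print(plan_before)  # a full scan of the table
print(plan_after)   # a search that uses the new index
```

The same before/after comparison is a reasonable way to check whether any index you add is actually picked up by the query you care about.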

Related

Creating an index in my table does not lower my cost

I have this table:
accident_info
(
accident_index varchar(20),
first_road_class varchar(20),
accident_severity varchar(20),
date date,
urban_or_rural_area varchar(20),
weather_conditions varchar(40),
year int,
inscotland varchar(20)
);
And against this table, I execute the following query :
select count(accident_index)as hits, first_road_class
from accident_info
group by first_road_class;
without index.
I would like to create an index to lower my Aggregate Cost but the one I've made so far doesn't seem to work. This is:
create index on accident_info(accident_index, first_road_class);
For this query:
select count(accident_index) as hits, first_road_class
from accident_info
group by first_road_class;
You can try an index on accident_info(first_road_class, accident_index). The order of the columns is important.
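The column order matters because an index that starts with the grouping column already stores the rows sorted by first_road_class, so the counts can be read from the index alone. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in (an assumption for the sake of a runnable example; the sample rows and the index name are made up). PostgreSQL's planner may behave differently and can still prefer a sequential scan, since the query has to read every row either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accident_info (accident_index VARCHAR(20), first_road_class VARCHAR(20))"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO accident_info VALUES (?, ?)",
    [("A1", "Motorway"), ("A2", "Motorway"), ("A3", "B"), ("A4", "B"), ("A5", "B")],
)
# Grouping column first, counted column second.
conn.execute(
    "CREATE INDEX idx_road_class_accident ON accident_info (first_road_class, accident_index)"
)

query = """
    SELECT count(accident_index) AS hits, first_road_class
    FROM accident_info
    GROUP BY first_road_class
"""
hits = conn.execute(query).fetchall()
plan_details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(hits)          # one (hits, first_road_class) pair per road class
print(plan_details)  # shows whether the index was used
```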

Creating table when each object may have a list of values

First I've created a table with information on stores and transactions with the following query:
CREATE TABLE main.store_transactions
(
store_id varchar(100) NOT NULL,
store_name varchar(100),
store_transaction_id varchar(100),
transaction_name varchar(100),
transaction_date timestamp,
transaction_info varchar(200),
PRIMARY KEY (store_id)
);
But then I realized that the same store may have various transactions related to it, not just one. How should I implement table creation in this case?
One thing that comes to mind is to create a separate table with transactions, each transaction having store_id as a foreign key. And then just join when needed.
How is it possible to implement it in a single table?
Well, the most elegant way would indeed be to create a satellite table for your stores and have the store_transactions table reference it, e.g.:
CREATE TABLE stores
(
store_id varchar(100) NOT NULL PRIMARY KEY,
store_name varchar(100)
);
CREATE TABLE store_transactions
(
store_id varchar(100) NOT NULL REFERENCES stores(store_id),
store_transaction_id varchar(100),
transaction_name varchar(100),
transaction_date timestamp,
transaction_info varchar(200)
);
With this structure you can have many transactions for a single store.
There are other, less appealing options, such as defining a custom data type for stores and creating an array of it in the table store_transactions. But given how costly such an approach is to maintain, I would definitely discourage it.
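As a rough sketch of how the two-table layout behaves, the following uses Python's built-in sqlite3 module (a stand-in chosen only to keep the example runnable; the store and transaction values are made up). It stores two transactions for one store, joins them back together, and shows that a transaction for an unknown store is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE stores (
        store_id   VARCHAR(100) NOT NULL PRIMARY KEY,
        store_name VARCHAR(100)
    );
    CREATE TABLE store_transactions (
        store_id             VARCHAR(100) NOT NULL REFERENCES stores(store_id),
        store_transaction_id VARCHAR(100),
        transaction_name     VARCHAR(100)
    );
""")

# One store, two transactions (illustrative data).
conn.execute("INSERT INTO stores VALUES ('s1', 'Corner Shop')")
conn.executemany(
    "INSERT INTO store_transactions VALUES (?, ?, ?)",
    [("s1", "t1", "sale"), ("s1", "t2", "refund")],
)

rows = conn.execute("""
    SELECT s.store_name, t.store_transaction_id
    FROM stores s
    JOIN store_transactions t ON t.store_id = s.store_id
    ORDER BY t.store_transaction_id
""").fetchall()
print(rows)

# A transaction that points at a store that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO store_transactions VALUES ('missing', 't3', 'sale')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```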

How to model 'toggle' table in sql

I have a table with some columns
CREATE TABLE test (
testid INT,
field1 CHAR(10),
field2 VARCHAR(50),
field3 DATETIME,
field4 MEDIUMINT
[...]
);
Now I want to have a setting in my app that will allow me to either enable or disable some of those for particular users.
CREATE TABLE user (
userid INT
);
I was thinking about:
CREATE TABLE user_test_visible (
userid INT,
field1 BOOL,
field2 BOOL,
field3 BOOL,
field4 BOOL
[...]
);
Also I was thinking about something like this :
CREATE TABLE user_test_visible (
userid INT,
field_name VARCHAR(30),
visible BOOL);
Are any of those approaches sensible?
I would suggest doing something like this:
CREATE TABLE test
(
fieldId INT,
field CHAR(10)
)
That gives you one table that contains the fields. Then if you need to add one more (change of requirements) you do not have to add a new column.
Then I would skip the boolean and go with one table that has a shared primary key. Like this:
CREATE TABLE user_test_visible (
userid INT,
fieldId INT
);
The reason I suggest skipping the boolean is that the presence of a row already carries the information: with user_test_visible, a field is shown only if a row exists for it. Which table you choose depends on what your starting value is. If you want users to see all fields from the beginning, then you might consider having the table like this:
CREATE TABLE user_test_not_visible (
userid INT,
fieldId INT
);
Then, where there is a row in this table, do not show the field.
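A minimal sketch of the lookup this implies, using Python's built-in sqlite3 module so it can be run as-is (the composite primary key, the sample rows, and the helper name visible_fields are assumptions added for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (fieldId INT PRIMARY KEY, field CHAR(10));
    CREATE TABLE user_test_not_visible (
        userid  INT,
        fieldId INT,
        PRIMARY KEY (userid, fieldId)
    );
    INSERT INTO test VALUES (1, 'field1'), (2, 'field2'), (3, 'field3');
    INSERT INTO user_test_not_visible VALUES (42, 2);  -- user 42 hides field2
""")

def visible_fields(conn, userid):
    """Return the names of all fields the given user has not hidden."""
    rows = conn.execute(
        """
        SELECT f.field
        FROM test f
        WHERE NOT EXISTS (
            SELECT 1 FROM user_test_not_visible h
            WHERE h.userid = ? AND h.fieldId = f.fieldId
        )
        ORDER BY f.fieldId
        """,
        (userid,),
    ).fetchall()
    return [name for (name,) in rows]

print(visible_fields(conn, 42))  # user with one hidden field
print(visible_fields(conn, 7))   # user with no rows sees everything
```

A user with no rows in user_test_not_visible needs no setup at all, which is the practical advantage of storing only the exceptions.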
Edit
When you insert the fields you must have some pre-deployment script, right? There you can also specify which columns are visible and which are not. If you have different data types, then either keep the layout you have or just use a sql_variant. But beware that this type of column is not supported, for example, in linq-to-sql as a primary key.
These are just my ideas. Hope it helps.
Perhaps a more flexible approach would be to define "roles" within your application. A user would be associated with one or more roles, and each role would be associated with a set of columns. The union of those column sets would be what a user can see. This approach will require more effort to work out what columns a user can see, but it would make user management easier in the long term. It also separates user privileges from what that means in terms of database access.
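One possible shape for the role-based idea, sketched with Python's built-in sqlite3 module. The table names user_role and role_field, the role names, and the helper fields_for_user are all hypothetical; the point is only that a user's visible columns are the union of the column sets of their roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_role  (userid INT, role_name VARCHAR(30));
    CREATE TABLE role_field (role_name VARCHAR(30), field_name VARCHAR(30));
    INSERT INTO user_role VALUES (1, 'clerk'), (1, 'auditor'), (2, 'clerk');
    INSERT INTO role_field VALUES
        ('clerk', 'field1'), ('clerk', 'field2'),
        ('auditor', 'field2'), ('auditor', 'field4');
""")

def fields_for_user(conn, userid):
    """Return the union of the fields granted by all of the user's roles."""
    rows = conn.execute(
        """
        SELECT DISTINCT rf.field_name
        FROM user_role ur
        JOIN role_field rf ON rf.role_name = ur.role_name
        WHERE ur.userid = ?
        ORDER BY rf.field_name
        """,
        (userid,),
    ).fetchall()
    return [name for (name,) in rows]

print(fields_for_user(conn, 1))  # clerk + auditor
print(fields_for_user(conn, 2))  # clerk only
```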

tuning search on combination of float and string

I have this schema
RESTAURANT
(id int not null,
name varchar(50),
place varchar(100),
distance float,
a varchar(50),
b varchar(50),
c varchar(50),
d varchar(50),
PRIMARY KEY (id))
and I'm tuning a search function for this table.
a, b, c, d are different fields used in the search, but what I need to focus on is place and distance, because most of the queries are actually performed on the combination of these two fields.
I'm using DB2 and I'm not very experienced. Any suggestions on where to start?
What you need is to use indexes. You can run the Design Advisor to see which indexes DB2 proposes:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0005144.html
For more information about indexes, you can take a look at:
http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2010_Issue4/DataArchitect/index.html
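As an illustration of the kind of index such a tool might propose, here is a sketch of a composite index on (place, distance). It runs on Python's built-in sqlite3 module rather than DB2, and it assumes the typical query filters on an exact place and a distance range; both are assumptions, as are the index name and the sample parameter values. In DB2 you would create the index with the same CREATE INDEX statement and inspect the access plan with its own explain tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE restaurant (
        id       INT NOT NULL PRIMARY KEY,
        name     VARCHAR(50),
        place    VARCHAR(100),
        distance FLOAT
    )
""")
# Equality column first, range column second.
conn.execute("CREATE INDEX idx_restaurant_place_distance ON restaurant (place, distance)")

# Assumed query shape: exact match on place, upper bound on distance.
query = "SELECT id, name FROM restaurant WHERE place = ? AND distance < ?"
plan_details = [
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Rome", 2.5))
]
print(plan_details)  # shows a search that uses the composite index
```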

SQL - Create table in SQL

Please tell me if I'm on the right track.
I'm trying to create a database schema for a mobile bill for a person X, and I'd like to know how to define the PK and FK for the table Bill_Detail_Lines.
Here are the assumptions:
Every customer will have a unique relationship number.
Bill_no will be unique as it is generated every month.
X can call to the same mobile no every month.
Account_no is associated with every mobile no and it doesn't change.
Schema:
table: Bill_Headers
Relationship_no - int, NOT NULL , PK
Bill_no - int, NOT NULL , PK
Bill_date - varchar(255), NOT NULL
Bill_charges - int, NOT NULL
table: Bill_Detail_Lines
Account_no - int, NOT NULL
Bill_no - int, NOT NULL , FK
Relationship_no - int, NOT NULL, FK
Phone_no - int, NOT NULL
Total_charges - int
table: Customers
Relationship_no - int, NOT NULL, PK
Customer_name - varchar(255)
Address_line_1 - varchar(255)
Address_line_2 - varchar(255)
Address_line_3 - varchar(255)
City - varchar(255)
State - varchar(255)
Country - varchar(255)
I would recommend having a primary key for Bill_Detail_Lines. If each line represents a total of all calls made to a given number, then the natural PK seems to be (Relationship_no, Bill_no, Phone_no), or maybe (Relationship_no, Bill_no, Account_no).
If each line instead represents a single call, then I would probably add a Line_no column and make the PK (Relationship_no, Bill_no, Line_no).
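The first option, one line per phone number per bill, could be sketched as follows. The example uses Python's built-in sqlite3 module to stay runnable, keeps only the columns needed to show the keys, and uses made-up values; the composite key on Bill_Headers follows the PK markers in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Bill_Headers (
        Relationship_no INT NOT NULL,
        Bill_no         INT NOT NULL,
        PRIMARY KEY (Relationship_no, Bill_no)
    );
    CREATE TABLE Bill_Detail_Lines (
        Relationship_no INT NOT NULL,
        Bill_no         INT NOT NULL,
        Phone_no        INT NOT NULL,
        Total_charges   INT,
        PRIMARY KEY (Relationship_no, Bill_no, Phone_no),
        FOREIGN KEY (Relationship_no, Bill_no)
            REFERENCES Bill_Headers (Relationship_no, Bill_no)
    );
    INSERT INTO Bill_Headers VALUES (1, 100), (1, 101);
    -- The same phone number may appear on different bills.
    INSERT INTO Bill_Detail_Lines VALUES (1, 100, 5550001, 40);
    INSERT INTO Bill_Detail_Lines VALUES (1, 101, 5550001, 25);
""")

# A second line for the same phone number on the same bill violates the key.
try:
    conn.execute("INSERT INTO Bill_Detail_Lines VALUES (1, 100, 5550001, 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

line_count = conn.execute("SELECT count(*) FROM Bill_Detail_Lines").fetchone()[0]
print(duplicate_rejected, line_count)
```

For the one-line-per-call variant, replacing Phone_no in the primary key with a Line_no column would follow the same pattern.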
Yes, as for me, everything looks good.
I have to disagree, there's a couple of 'standards' which aren't being followed. Yes the design looks ok, but the naming convention isn't appropriate.
Firstly, table names should be singular (many people will disagree with this).
If you have a single int, PK on a table, the standard is to call it 'ID', thus you have "SELECT Customer.ID FROM Customer" - for instance. You also then fully qualify the FK columns, for instance: CustomerID on Bill_Headers instead of Relationship_no which you then have to check in the table definition to remember what it's related to.
Something I also always keep in mind, is to make the column header as clear and short as possible without obfuscating the name. For instance, "Bill_charges" on Bill_Headers could just be "Charges", as you're already on the Bill_Header(s) (<- damn that 's'), same goes for Date, but date could be a bit more descriptive, CreatedDate, LastUpdatedDate, etc...
Lastly, beware of hard-coding multiple columns where one would suffice, and the same applies the other way around. Specifically I'm talking about:
Address_line_1 - varchar(255)
Address_line_2 - varchar(255)
Address_line_3 - varchar(255)
This will lead to headaches later. SQL does have the capability to store new line characters in a string, thus combining them to one "Address - varchar(8000)" would be easiest. Ideally this would be in a separate table, call it Customer_Address with int "CustomerID - int PK FK" column where you can enter specific information.
Remember, these are just suggestions as there's no single way of database design that everyone SHOULD follow. These are best practices, at the end of the day it's your decision to make.
There are a few mistakes:
Relationship_no and Bill_no are int. Make sure that the entries are within the range of an integer. It is better to define them as varchar() or char().
Bill_date should use the DATE data type.
In table Bill_Detail_Lines also, it is better to have Account_no as varchar() or char() because of the long account no. And the same goes with Phone_no.
Your Customers table is all fine except that you have taken varchar() size as 255 for City State and Country which is too large. You can work with smaller size also.