Creating an index on my table does not lower my cost - SQL

I have this table:
create table accident_info
(
accident_index varchar(20),
first_road_class varchar(20),
accident_severity varchar(20),
date date,
urban_or_rural_area varchar(20),
weather_conditions varchar(40),
year int,
inscotland varchar(20)
);
Against this table, I execute the following query:
select count(accident_index) as hits, first_road_class
from accident_info
group by first_road_class;
without an index.
I would like to create an index to lower my Aggregate Cost but the one I've made so far doesn't seem to work. This is:
create index on accident_info(accident_index, first_road_class);
First ten rows of my table:

For this query:
select count(accident_index) as hits, first_road_class
from accident_info
group by first_road_class;
You can try an index on accident_info(first_road_class, accident_index). The order of the columns is important.
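A minimal sketch of that index (assuming PostgreSQL; the index name is just illustrative). With first_road_class leading, the rows arrive already grouped, and including accident_index means the count can often be answered from the index alone via an index-only scan:
create index idx_accident_info_road_class
on accident_info (first_road_class, accident_index);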

Invalid table name error when creating a table

Very new to SQL, but I thought that I had at least mastered how to make tables. I am trying to create the following table and get the error 'ORA-00903: invalid table name'. I'm not sure what is wrong.
Create table order (
order_id int,
item_type varchar(50),
item_name varchar(50),
item_price decimal(10,2),
primary key(order_id)
);
I tested this on Oracle Live SQL as well as on my Oracle 12c EE database and it works; all you need to do is add double quotes around the table name, because ORDER is a reserved word. Even so, I would not recommend using reserved words for table names.
Create table "order" (
order_id int,
item_type varchar(50),
item_name varchar(50),
item_price decimal(10,2),
primary key(order_id)
);
insert into "order" values (1, 'Item', 'Name', 20.2);
select * from "order";
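If you would rather avoid the quoting altogether, here is a minimal sketch using a non-reserved table name instead (orders is just an example name):
Create table orders (
order_id int,
item_type varchar(50),
item_name varchar(50),
item_price decimal(10,2),
primary key(order_id)
);
insert into orders values (1, 'Item', 'Name', 20.2);
select * from orders;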

Handle >10 million rows for one table (postgresql)

I imported 11 million location names from geonames.org into my PostgreSQL database. However, when I try to just view the data, for instance in TablePlus, it is extremely slow. Executing a simple select for one row takes about 2 minutes. What can I do with data this large so that it isn't so slow and I can select rows quickly?
I don't think I have any indexes; would that make a difference?
This is my table:
create table geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames text,
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(120),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
);
You need to specify what the query looks like.
Indexes would definitely make a difference. But the type of index depends on the query you are using and the columns used for selecting one or more rows.
The place to start is by defining a primary key on the table. Presumably, geonameid is the primary key. You can do this:
alter table geoname add constraint pk_geoname_geonameid primary key (geonameid);
You should really do this when you create the table, but better late than never.
If you are searching by geonameid, then you will notice a significant speed-up.
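For example, a point lookup such as the following (the id value is just a placeholder) should switch from a sequential scan to an index scan, which you can verify with EXPLAIN:
explain analyze
select * from geoname where geonameid = 12345;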
If you want to search by other columns, such as name or asciiname, then add indexes for those:
create index idx_geoname_name on geoname(name);
create index idx_geoname_asciiname on geoname(asciiname);
This doesn't work for all searches, though. If your criteria use LIKE with wildcards, you may need a different indexing strategy. Similarly, if you search by latitude and longitude, you'll want a spatial (GIS) index, e.g. with PostGIS.
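For the LIKE-with-wildcards case, one option (an illustration, not part of the original answer) is a trigram index from the pg_trgm extension:
create extension if not exists pg_trgm;
create index idx_geoname_name_trgm on geoname using gin (name gin_trgm_ops);
-- queries with leading wildcards can now use this index:
select * from geoname where name like '%york%';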

How can I Insert several columns from one table to another having only 1 column unique/distinct?

I am trying to create a star schema and am currently working on the dimension tables. I want to copy several columns from one table to another but at the same time I want to make the result values unique by 1 of the columns.
These are the tables I am using:
DWH_PRICE_PAID_RECORDS
CREATE TABLE "DWH_PRICE_PAID_RECORDS" (
"TRANSACTION_ID" VARCHAR(50) NOT NULL,
"PRICE" INTEGER,
"DATE_OF_TRANSFER" DATE NOT NULL,
"PROPERTY_TYPE" CHAR(1),
"OLD_NEW" CHAR(1),
"DURATION" CHAR(1),
"TOWN_CITY" VARCHAR(50),
"DISTRICT" VARCHAR(50),
"COUNTY" VARCHAR(50),
"PPDCATEGORY_TYPE" CHAR(1),
"RECORD_TYPE" CHAR(1)
);
ALTER TABLE "DWH_PRICE_PAID_RECORDS" ADD CONSTRAINT "PK3" PRIMARY KEY ("TRANSACTION_ID");
and DIM_REGION
CREATE TABLE "DIM_REGION" (
"REGION_ID" INTEGER generated always as identity (start with 1 increment by 1),
"TRANSACTION_ID" VARCHAR(50),
"TOWN" VARCHAR(50),
"COUNTY" VARCHAR(50),
"DISTRICT" VARCHAR(50),
"LATITUDE" VARCHAR(50),
"LONGITUDE" VARCHAR(50),
"COUNTRY_STRING" VARCHAR(50)
);
ALTER TABLE "DIM_REGION" ADD CONSTRAINT "PK8" PRIMARY KEY ("REGION_ID");
My first attempt was to use SELECT DISTINCT, but that only removes rows that are duplicates across ALL columns combined. I want to have a region dimension, and the town should be the identifier used to match DIM_REGION with the fact table in the data mart I will create later (called DM_PRICE_PAID_RECORDS).
The DWH_PRICE_PAID_RECORDS table has around 10k records but only 938 unique towns. I want those 938 towns in DIM_REGION, each with an ID, along with the other columns such as county, district etc.
This works, but then of course everything other than TOWN is NULL:
INSERT INTO DIM_REGION (TOWN) SELECT (town_city) from DWH_PRICE_PAID_RECORDS GROUP BY town_city;
So I thought I only had to add the additional columns
INSERT INTO DIM_REGION (TOWN, County, District) SELECT town_city, county, district from DWH_PRICE_PAID_RECORDS GROUP BY town_city;
but when I do that I get this error message (it was in German and I had to translate it, sorry):
ERROR 42Y36 Column reference "DWH_PRICE_PAID_RECORDS.COUNTY" is invalid or is part of an invalid statement. When using SELECT and GROUP BY, the selected columns and expressions must be valid grouping or aggregate expressions.
Can you help me, or do you have another idea for how I could get the result I'm after?
Thank you very much!
If it doesn't matter which value you keep for the other 2 columns, you can do this:
INSERT INTO DIM_REGION (TOWN, County, District)
SELECT town_city, MAX(county), MAX(district)
FROM DWH_PRICE_PAID_RECORDS
GROUP BY town_city
This will get you only 1 row for each town.
You are so close!
INSERT INTO DIM_REGION (TOWN, County, District)
SELECT town_city, county, district
FROM DWH_PRICE_PAID_RECORDS
GROUP BY town_city, county, district;
That should do the job. When using a group by, everything in the SELECT list that isn't an aggregate has to appear in the GROUP BY clause.
As an aside, does TRANSACTION_ID really belong in the dimension table?

Order of rows changes when creating table

I've a weird issue.
DROP table IF EXISTS ipl;
CREATE TABLE ipl(
match_id VARCHAR(50),
batting VARCHAR(50),
bowling VARCHAR(50),
overn VARCHAR(50),
batsman VARCHAR(50),
bowler VARCHAR(50),
super_over VARCHAR(50),
bat_runs VARCHAR(50),
extra_runs VARCHAR(50),
total_runs VARCHAR(50),
player_out VARCHAR(50),
how VARCHAR(50),
fielder VARCHAR(50));
BULK INSERT ipl
FROM 'F:\Study\Semesters\4th Sem\COL362\HomeWork\1\Dataset\deliveries.csv'
WITH(FIELDTERMINATOR= ',');
SELECT * FROM ipl;
This is the code I'm using to make the table in SSMS. match_id goes from 1 to about 290, in increasing order, in the csv file. When I executed this query once, everything was OK. But when I ran it again, some rows from the middle were moved to the end.
You can see that below:
(Note the jump from 4 to 49.)
I don't know what's wrong. Please help me resolve this issue. Thanks!
SQL tables represent unordered sets. If you want rows in a particular order, you need an order by. How can you do this with a bulk insert? Well, you need an identity column. The idea is to create the table with an identity and use a view for the bulk insert:
create table ipl (
ipl_id int identity(1, 1) primary key,
. . .
);
create view vw_ipl as
select match_id, batting, bowling, . . .
from ipl;
bulk insert vw_ipl
from 'F:\Study\Semesters\4th Sem\COL362\HomeWork\1\Dataset\deliveries.csv'
with (fieldterminator= ',' );
select *
from ipl
order by ipl_id;
As a relational database, SQL Server does not guarantee any particular order for returned rows. If you need ordered data, specify an ORDER BY clause.

SQL database query display extra dates

I am making a database with PostgreSQL 9.1.
Given these tables:
CREATE TABLE rooms(
room_number int,
property_id int,
type character varying,
PRIMARY KEY (room_number, property_id)
);
Insert into rooms values (1,1,'double'),(2,1,'double'),(3,1,'triple');
CREATE TABLE reservations(
reservation_ID int,
property_id int,
arrival date,
departure date,
room_num int,
PRIMARY KEY(reservation_ID, property_id),
FOREIGN KEY (room_num, property_id) REFERENCES rooms(room_number, property_id)
);
INSERT INTO reservations VALUES (1,1,'2013-09-27','2013-09-30',1),
(2,1,'2013-09-27','2013-09-28',2),
(3,1,'2013-09-29','2013-09-30',3);
I want to give 2 dates and check availability in between. The result should have:
all the dates between the two given dates in the first column, and
one additional column for every room type, showing its availability.
So my result, given 2013-9-27 & 2013-9-30 as input, should be something like this:
I think the best solution would be to use both generate_series() and crosstab() to build a dynamic pivot. Moreover, you can use a left join from a CTE to your data tables so you get the information you need. Something like:
WITH daterange as (
SELECT s::date as day FROM generate_series(?, ?, interval '1 day') as s
)
SELECT dr.day,
sum(case when r.type = 'double' then r.qty else 0 end) as room_double,
sum(case when r.type = 'triple' then r.qty else 0 end) as room_triple
...
FROM daterange dr
LEFT JOIN ...
GROUP BY dr.day;
But note that crosstab() would make the pivoting part of the query a little easier.
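As a rough illustration of the crosstab() route (a sketch only, assuming the tablefunc extension can be installed and using the sample dates from the question; the output column names in the AS clause are made up):
create extension if not exists tablefunc;
select *
from crosstab(
$$
select d.day::date, ro.type,
sum(case when re.room_num is null then 1 else 0 end) as free_rooms
from generate_series(date '2013-09-27', date '2013-09-30', interval '1 day') as d(day)
cross join rooms ro
left join reservations re
on re.room_num = ro.room_number
and re.property_id = ro.property_id
and d.day >= re.arrival
and d.day < re.departure   -- a reservation occupies arrival (inclusive) to departure (exclusive)
group by 1, 2
order by 1, 2
$$,
$$ select distinct type from rooms order by 1 $$
) as availability(day date, double_free bigint, triple_free bigint);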