I'm working with a database that has 2 tables for each company, with 4 companies (8 tables total for this query), and for reasons outside of my control that can't be changed. It's an SQLite database.
My app currently has to do 8 round trips to get all the data. I want to consolidate that down to a single view query, but I can't figure out how to combine the data in a way that would make it work. Here is an example of the tables.
Table 1 (Type A)
name  zone
ABCD  ABC1
DBAA  CBA1

Table 2 (Type A)
name  zone
ABCD  1234
DBAA  4321

Table 1 (Type B)
zone  weight  rate
ABC1  1       0.50
CBA1  2       0.88

Table 2 (Type B)
zone  weight  rate
1234  1       0.52
4321  2       0.80
Finally I want the view to look like this:
name  weight  Table 1 rate  Table 2 rate
CABA  1       0.52          0.50
AEAS  2       0.80          0.88
I tried this for my SQL statement:
SELECT 1A.name, 1B.weight, 1B.rate as A from 1A, 1B WHERE 1A.zone = 1B.zone
UNION ALL
SELECT 2A.name, 2B.weight, 2B.rate as B from 2A, 2B WHERE 2A.zone = 2B.zone
I have also tried a couple of JOIN statements after reading that unions must have matching column counts, but I can't seem to hit the right query. Any ideas what I'm doing wrong, or how I can achieve this with a query?
Any help is greatly appreciated!
Updated with Fiddle example here: http://sqlfiddle.com/#!5/37c19/3/0
Here is a query that will produce something similar to your example:
SELECT
ZonesOne.name
, RatesOne.weight
, RatesOne.rate as Table1Rate
, RatesTwo.Rate AS Table2Rate
FROM ZonesOne, RatesOne, RatesTwo
WHERE
RatesOne.zone = ZonesOne.zone
AND RatesOne.Weight = RatesTwo.weight
UNION ALL
SELECT
ZonesTwo.name
, RatesOne.weight
, RatesOne.rate as Table1Rate
, RatesTwo.Rate AS Table2Rate
FROM ZonesTwo, RatesOne, RatesTwo
WHERE
RatesOne.zone = ZonesTwo.zone
AND RatesOne.Weight = RatesTwo.weight
However, your Table 1 Rate and Table 2 Rate seem to be switched around. Also, your data from ZonesTwo has two entries for "DBAA".
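Since the goal was a single view your app can hit in one round trip, you can wrap the whole UNION in a view (CombinedRates is just a hypothetical name; the body is the query above, unchanged):

CREATE VIEW CombinedRates AS
SELECT ZonesOne.name, RatesOne.weight, RatesOne.rate AS Table1Rate, RatesTwo.rate AS Table2Rate
FROM ZonesOne, RatesOne, RatesTwo
WHERE RatesOne.zone = ZonesOne.zone AND RatesOne.weight = RatesTwo.weight
UNION ALL
SELECT ZonesTwo.name, RatesOne.weight, RatesOne.rate AS Table1Rate, RatesTwo.rate AS Table2Rate
FROM ZonesTwo, RatesOne, RatesTwo
WHERE RatesOne.zone = ZonesTwo.zone AND RatesOne.weight = RatesTwo.weight;

After that, SELECT * FROM CombinedRates is the only query the app needs to run.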
I have a table company_totals that has the following schema -
column_name        column_data_type
company            STRING
link               STRING
full_count         FLOAT
starts_with_count  FLOAT
Number of rows = 12,000,000. Table size = 1.6 GB. CLUSTERED BY = company link. SEARCH INDEX created on column = link.
I have the following SELECT statement, which runs for more than 6 hours and ends in a timeout: "Operation timed out after 6.0 hours. Consider reducing the amount of work performed by your operation so that it can complete within this limit."
SELECT first_table.company, first_table.link, NULL AS full_count, SUM(second_table.full_count) AS starts_with_count
FROM company_totals first_table, company_totals second_table
WHERE STARTS_WITH(second_table.link, first_table.link)
GROUP BY first_table.company, first_table.link
The above query calculates starts_with_count, which for each row is the sum of full_count over every row whose link starts with the current row's link. In the company_totals table, starts_with_count is the column I want to fill; the other columns are already populated. I have added the expected values for this column manually below to show what I'm after.
company  link                               full_count  starts_with_count (expected)
abc      http://www.abc.net1                1           15 (= sum(full_count) where link like 'http://www.abc.net1%')
abc      http://www.abc.net1/page1          2           9 (= sum(full_count) where link like 'http://www.abc.net1/page1%')
abc      http://www.abc.net1/page1/folder1  3           3 (= sum(full_count) where link like 'http://www.abc.net1/page1/folder1%')
abc      http://www.abc.net1/page1/folder2  4           4
abc      http://www.abc.net1/page2          5           5
xyz      http://www.xyz.net1/               6           21
xyz      http://www.xyz.net1/page1/         7           15
xyz      http://www.xyz.net1/page1/file1    8           8
I'd highly appreciate any help with this issue.
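One direction that might help (a sketch, not a tested fix): in the sample data a link only ever prefix-matches links of the same company, so if that property holds for the real data, adding an equality predicate on company gives the engine a proper join key instead of an unrestricted cross join that STARTS_WITH has to filter row by row:

SELECT first_table.company,
       first_table.link,
       SUM(second_table.full_count) AS starts_with_count
FROM company_totals first_table
JOIN company_totals second_table
  ON second_table.company = first_table.company        -- equality key; assumes prefixes never cross companies
 AND STARTS_WITH(second_table.link, first_table.link)  -- prefix test now runs only within each company
GROUP BY first_table.company, first_table.link

Since the table is already clustered by company and link, an equi-join on company should also line up with the clustering.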
So I have a table that has a set of information like this:
name   Type  PRICE
11111  XX    0.001
22222  YY    0.002
33333  ZZ    0.0001
11111  YY    0.021
11111  ZZ    0.0111
77777  YY    0.1
77777  ZZ    1.2
Now these numbers go on for about a million rows, and a single name can map to as many as 20 different types, but each type appears at most once per name: 11111 could have XX, YY, ZZ on it, but it cannot have YY, ZZ, YY.
What I need is to get the lowest 3 prices and what TYPE they are per name.
Right now I can get the lowest price per name by doing:
select name, type, min(price) from table group by name;
However, that only gives the single lowest price per name, and I need the lowest 3. I've been trying for a couple of days and I can't seem to get it. All help is appreciated.
Also, please let me know if I forgot any information; I'm still trying to figure out Stack Overflow :P
Oh, and the database is a NoSQL database that uses SQL syntax.
Edit: I can't seem to get the formatting right for my example table data.
If your database supports window functions, and allowing for the possibility that there may be more than three rows in your data with any of the three lowest prices, this should do it:
select the_table.*
from the_table
inner join (
    -- distinct guards against duplicate join matches when two of the three lowest prices tie
    select distinct name, price
    from (
        select name, price,
               row_number() over (partition by name order by price) as rn
        from the_table
    ) as x
    where rn < 4
) as y
    on y.name = the_table.name
   and y.price = the_table.price;
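If you instead want exactly three rows per name and don't need the tie-inclusive join (ties beyond the third row simply get cut), a simpler single-pass variant of the same idea, using the same hypothetical table name, is:

select name, type, price
from (
    select name, type, price,
           row_number() over (partition by name order by price) as rn
    from the_table
) as ranked
where rn <= 3;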
I am currently working with a Hive table which contains transaction data. I need to do some basic statistics on these data and put the results in a new table.
EDIT: I'm using Hive 0.13 on Hadoop 2.4.1.
CONTEXT
First, let me try to present the input table: here's a table with 3 columns, an ID, a date (month/year), and an amount:
<ID> <Date> <Amount>
1 11.2014 5.00
2 11.2014 10.00
3 12.2014 15.00
1 12.2014 7.00
1 12.2014 15.00
2 01.2015 20.00
3 01.2015 30.00
3 01.2015 45.00
... ... ...
And the desired output consists of a table grouped by ID, where each line sums the amounts for each corresponding month:
<ID> <11.2014> <12.2014> <01.2015> <...>
1 5.00 22.00 0.00 ...
2 10.00 0.00 20.00 ...
3 15.00 0.00 75.00 ...
... ... ... ... ...
The original table has more than 4 million IDs and more than 500 million lines, spanning more than 2 years, so it seems impractical to hardcode the columns by hand since I don't know in advance how many I'd need to create.
(I know how many distinct dates I have today, but if the original table grows over 5, 10, 15 years, there is going to be a lot to do by hand, and that's risky.)
THE CHALLENGE
I know how to do some basic manipulations and GROUP BYs, and I can even do some CASE WHENs, but the tricky part of my problem is that I cannot create columns like this (as mentioned above)...
SUM (CASE WHEN Date = 11.2014 THEN Amount ELSE 0 END) AS 11.2014
SUM (CASE WHEN Date = 12.2014 THEN Amount ELSE 0 END) AS 12.2014
SUM (CASE WHEN Date = 01.2015 THEN Amount ELSE 0 END) AS 01.2015
SUM (CASE WHEN Date = ??? THEN Amount ELSE 0 END) AS ???
... because I don't know how many different dates I'll eventually have, so I would need something like this:
SUM (CASE WHEN Date = [loop over each dates] THEN Amount ELSE 0 END)
AS [the date selected in the loop]
THE QUESTION
Do you have any suggestions for the following:
How can I loop over all the dates?
How can I create a column for every date I have, without specifying the name of each soon-to-be-created column myself?
Is it doable in a single HiveQL script? (not required, but it would be really nice)
I would like to avoid UDFs, but at this point I'm not sure that's avoidable, since I haven't found any case that resembles mine.
Thanks in advance, and don't hesitate to ask for more info.
This is too long for a comment.
You cannot do exactly what you want in Hive, because a SQL query has to have a fixed number of columns when it is defined.
What can you do?
The easiest thing is simply to change what you want: produce multiple rows instead of multiple columns:
select id, date, sum(amount)
from table t
group by id, date;
You can then load the data into your favorite spreadsheet and pivot it there.
Other alternatives. You can write a query that will write the appropriate query. This would go through the table, identify the possible dates, and construct a SQL statement. You can then run the SQL statement.
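As a sketch of that idea (assuming your table is called transactions with columns id, dt, and amount; dt stands in for your date column to avoid the reserved word), a first query can build the pivot statement as a string from the distinct dates, and you then execute that string as a second step:

SELECT CONCAT(
    'SELECT id, ',
    CONCAT_WS(', ', COLLECT_SET(
        CONCAT("SUM(CASE WHEN dt = '", dt, "' THEN amount ELSE 0 END) AS `", dt, "`")
    )),
    ' FROM transactions GROUP BY id'
) AS generated_sql
FROM (SELECT DISTINCT dt FROM transactions) dates;

Note that COLLECT_SET gives no guarantee about the order of the generated columns.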
Or, you could use some other data types, such as a list or JSON to store the aggregated values in one row.
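For instance (same assumed table and column names), Hive 0.13 can fold the per-date sums into one map column per id, which sidesteps the unknown column count entirely:

SELECT id,
       STR_TO_MAP(CONCAT_WS(',', COLLECT_LIST(CONCAT(dt, ':', CAST(total AS STRING))))) AS amount_by_date
FROM (
    SELECT id, dt, SUM(amount) AS total
    FROM transactions
    GROUP BY id, dt
) per_month
GROUP BY id;

This relies on the date strings containing no ',' or ':' characters, which holds for the MM.YYYY format shown.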
I need to get data from different columns depending on a set of rules and I don't see how to do it. Let me illustrate this with an example. I have a table:
ID ELEM_01 ELEM_02 ELEM_03
---------------------------------
1 0.12 0 100
2 0.14 5 200
3 0.16 10 300
4 0.18 15 400
5 0.20 20 500
And I have a set of rules which look something like this:
P1Z: ID=2 and ELEM_01
P2Z: ID=4 and ELEM_03
P3Z: ID=4 and ELEM_02
P4Z: ID=3 and ELEM_03
I'm trying to output the following:
P1Z P2Z P3Z P4Z
------------------------
0.14 400 15 300
I'm used to much simpler queries, and this is a bit above my level; I'm getting mixed up by this problem and don't see a straightforward solution. Any pointers would be appreciated.
EDIT Logic behind the rules: the table contains data about different aspects of a piece of equipment, and each ID/ELEM_** combination holds the value of one aspect. The table stores the values of all aspects, but we want a row containing only a specific subset of them, so that we can output in a single table the values of that subset for every piece of equipment.
Assuming that each column is numeric and ID is unique, you could do:
SELECT
SUM(CASE WHEN ID = 2 THEN ELEM_01 END) AS P1Z,
SUM(CASE WHEN ID = 4 THEN ELEM_03 END) AS P2Z,
SUM(CASE WHEN ID = 4 THEN ELEM_02 END) AS P3Z,
SUM(CASE WHEN ID = 3 THEN ELEM_03 END) AS P4Z
...
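A complete version might look like this (equipment_values is a hypothetical table name, since the post doesn't give one):

SELECT
    SUM(CASE WHEN ID = 2 THEN ELEM_01 END) AS P1Z,
    SUM(CASE WHEN ID = 4 THEN ELEM_03 END) AS P2Z,
    SUM(CASE WHEN ID = 4 THEN ELEM_02 END) AS P3Z,
    SUM(CASE WHEN ID = 3 THEN ELEM_03 END) AS P4Z
FROM equipment_values;

Because ID is unique, each CASE yields exactly one non-NULL value, so MAX or MIN would work just as well as SUM here.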
Let us have a simple table:
CREATE TABLE dbo.test
(
c1 INT
)
INSERT INTO test (c1) VALUES (1)
INSERT INTO test (c1) VALUES (2)
INSERT INTO test (c1) VALUES (3)
Next, calculate a SUM:
SELECT SUM(t1.c1) FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
The output is 6. Simple and easy.
But if I run:
SELECT SUM(t1.c1), * FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
The output is:
6 2 2
6 2 3
6 2 1
6 3 2
6 3 3
6 3 1
6 1 2
6 1 3
6 1 1
My question is: why does the second output not match the condition in the WHERE clause?
Looks like Sybase implements its own extensions to GROUP BY:
Through the following extensions, Sybase lifts restrictions on what you can include or omit in the select list of a query that includes group by.
The columns in the select list are not limited to the grouping columns and columns used with the vector aggregates.
The columns specified by group by are not limited to those non-aggregate columns in the select list.
However, the results of the extension are not always intuitive:
When you use the Transact-SQL extensions in complex queries that include the where clause or joins, the results may become even more difficult to understand.
How does this relate to your problem?
However, the way that Adaptive Server handles extra columns in the select list and the where clause may seem contradictory. For example:
select type, advance, avg(price)
from titles
where advance > 5000
group by type
type advance
------------- --------- --------
business 5,000.00 2.99
business 5,000.00 2.99
business 10,125.00 2.99
business 5,000.00 2.99
mod_cook 0.00 2.99
mod_cook 15,000.00 2.99
popular_comp 7,000.00 21.48
popular_comp 8,000.00 21.48
popular_comp NULL 21.48
psychology 7,000.00 14.30
psychology 2,275.00 14.30
psychology 6,000.00 14.30
psychology 2,000.00 14.30
psychology 4,000.00 14.30
trad_cook 7,000.00 17.97
trad_cook 4,000.00 17.97
trad_cook 8,000.00 17.97
(17 rows affected)
It only seems as if the query is ignoring the where clause when you look at the results for the advance (extended) column. Adaptive Server still computes the vector aggregate using only those rows that satisfy the where clause, but it also displays all rows for any extended columns that you include in the select list. To further restrict these rows from the results, you must use a having clause.
So, to give you the results you would expect, Sybase should allow you to do:
SELECT SUM(t1.c1), * FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
HAVING t2.c1 = 1
The WHERE restricts which rows are included in the total SUM; the HAVING then hides the records that don't match the condition.
Confusing, isn't it?
Instead, you'd probably be better off writing the query so that it doesn't require Sybase's GROUP BY extensions.
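For example, here's a sketch of such a rewrite (standard SQL, assuming your Sybase version supports derived tables): compute the filtered SUM once in a derived table and attach it to every row that matches the WHERE, which yields the three rows you'd intuitively expect:

SELECT totals.total, t1.c1, t2.c1
FROM test AS t1,
     test AS t2,
     (SELECT SUM(a.c1) AS total
      FROM test AS a, test AS b
      WHERE b.c1 = 1) AS totals  -- same join and filter as the original SUM
WHERE t2.c1 = 1                  -- and now the outer rows really are restricted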