I have a table 'weatherdata' with 3 fields:
CREATE TABLE weatherdata(`value` string, snapshort_time timestamp)
PARTITIONED BY ( country string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/expl.db/weatherdata'
When I run the command below on this table, the output correctly shows the 'Snapshort_time' field at the end:
hive (expl)> select * from weatherdata;
OK
{"location":{"name":"Beijing","region":"Beijing","country":"China","lat":39.93,"lon":116.39,"tz_id":"Asia/Shanghai","localtime_epoch":1486857803,"localtime":"2017-02-12 0:03"},"current":{"last_updated_epoch":1486857803,"last_updated":"2017-02-12 00:03","temp_c":-3.0,"temp_f":26.6,"is_day":0,"condition":{"text":"Clear","icon":"//cdn.apixu.com/weather/64x64/night/113.png","code":1000},"wind_mph":0.0,"wind_kph":0.0,"wind_degree":0,"wind_dir":"N","pressure_mb":1028.0,"pressure_in":30.8,"precip_mm":0.0,"precip_in":0.0,"humidity":39,"cloud":0,"feelslike_c":-3.0,"feelslike_f":26.6}} NULL 2017-02-11 08:22:36
The NULL shown in the output is for the 'country' field.
But when I run the SELECT below, 'Snapshort_time' shows NULL:
select get_json_object(value, '$.location.name') AS name,
get_json_object(value,'$.location.region') AS region,
get_json_object(value, '$.location.country') AS country,
get_json_object(value, '$.current.condition.text') AS text,
get_json_object(value, '$.current.feelslike_c') AS feelslike_c,
Snapshort_time
from weatherdata;
Here is the Output:
OK
Beijing Beijing China Clear -3.0 NULL
Dubai Dubai United Arab Emirates Clear 24.6 NULL
London City of London, Greater London United Kingdom Mist -1.3 NULL
Moscow Moscow City Russia Clear -5.7 NULL
Paris Ile-de-France France Patchy light snow -1.2 NULL
Sydney New South Wales Australia Partly cloudy 26.3 NULL
Tokyo Tōkyō Japan Partly cloudy -1.3 NULL
Toronto Ontario Canada Overcast -4.0 NULL
Washington District of Columbia United States of America Partly cloudy 5.9 NULL
Time taken: 0.339 seconds, Fetched: 9 row(s)
What could be the reason?
I have only one table, City:
ID    Name           Country Code  District      Population
----  -------------  ------------  ------------  ----------
6     Rotterdam      NLD           Zuid-Holland  593321
3878  Scottsdale     USA           Arizona       202705
3965  Corona         USA           California    124966
3973  Concord        USA           California    121780
3977  Cedar Rapids   USA           Iowa          120758
3982  Coral Springs  USA           Florida       117549
1613  Neyagawa       JPN           Osaka         257315
1630  Ageo           JPN           Saitama       209442
The expected result is:
countrycode  avg(population)
-----------  ---------------
JPN          xxxxxx
NLD          xxxxxxx
USA          xxxxxxx
I tried the following code but did not get the expected result:
select avg(population)
from city
where countrycode='JPN' and 'USA' and 'NLD'
group by district;
The above code gives me a blank result: the "avg(population)" column is empty.
I am using SQL Workbench.
Try:
select countrycode, avg(population) as avg_population
from city
where countrycode in('JPN', 'USA', 'NLD')
group by countrycode;
This should do the job:
select countrycode, avg(population)
from city
where countrycode in('JPN','USA','NLD')
group by countrycode;
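The two answers above can be checked with a small sketch. In the original WHERE clause the literals 'USA' and 'NLD' stand alone as conditions instead of being compared with countrycode, and the query groups by district rather than by country. The snippet below is only an illustration, assuming an in-memory SQLite database in place of the real server; the table layout and the function name are made up for the example, and the rows are the sample data from the question.

```python
import sqlite3

# Sample rows copied from the question's City table.
CITY_ROWS = [
    (6, "Rotterdam", "NLD", "Zuid-Holland", 593321),
    (3878, "Scottsdale", "USA", "Arizona", 202705),
    (3965, "Corona", "USA", "California", 124966),
    (3973, "Concord", "USA", "California", 121780),
    (3977, "Cedar Rapids", "USA", "Iowa", 120758),
    (3982, "Coral Springs", "USA", "Florida", 117549),
    (1613, "Neyagawa", "JPN", "Osaka", 257315),
    (1630, "Ageo", "JPN", "Saitama", 209442),
]


def average_population_by_country(rows):
    """Load rows into an in-memory table and average population per country."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE city (id INTEGER, name TEXT, countrycode TEXT, "
        "district TEXT, population INTEGER)"
    )
    conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?, ?)", rows)
    # IN (...) compares countrycode with each literal; GROUP BY countrycode
    # yields one row per country. ORDER BY is added only for stable output.
    result = conn.execute(
        "SELECT countrycode, AVG(population) AS avg_population "
        "FROM city "
        "WHERE countrycode IN ('JPN', 'USA', 'NLD') "
        "GROUP BY countrycode "
        "ORDER BY countrycode"
    ).fetchall()
    conn.close()
    return result


averages = average_population_by_country(CITY_ROWS)
for code, avg_population in averages:
    print(code, avg_population)
```

With the sample rows this returns one row each for JPN, NLD and USA, which is the shape of the expected result.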
Right now I have:
Scorecard  team1        team2      Winner       Margin     Ground        Match Date  Year
---------  -----------  ---------  -----------  ---------  ------------  ----------  ----
ODI # 1    Australia    England    Australia    5 wickets  Melbourne     5-Jan-71    1971
ODI # 2    England      Australia  England      6 wickets  Manchester    24-Aug-72   1972
ODI # 3    England      Australia  Australia    5 wickets  Lord's        26-Aug-72   1972
ODI # 4    England      Australia  England      2 wickets  Birmingham    28-Aug-72   1972
ODI # 5    New Zealand  Pakistan   New Zealand  22 runs    Christchurch  11-Feb-73   1973
What I want is to combine team1 and team2 and then get a distinct list.
Example based on what I have above:
teams
Australia
England
New Zealand
Pakistan
I am using Cloudera Hive. I was trying to get a union to work.
I also tried:
SELECT concat_ws('^',(SPLIT('${team1,team2}',',')));
However, the output is just giving me:
${team1^team2}
The easiest way would be to use union:
select team1 as teams from tablename
union distinct
select team2 from tablename
Here is another way, using a subquery:
Select distinct teams from (
select team1 as teams from tablename
union
select team2 from tablename
) t
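To see what the union returns on the sample data, here is a minimal sketch. It assumes an in-memory SQLite database rather than Hive, and a hypothetical table name odi holding only the three columns that matter. SQLite does not accept the DISTINCT keyword after UNION, so plain UNION is used; it removes duplicates in the same way.

```python
import sqlite3

# (scorecard, team1, team2) taken from the sample rows in the question.
MATCHES = [
    ("ODI # 1", "Australia", "England"),
    ("ODI # 2", "England", "Australia"),
    ("ODI # 3", "England", "Australia"),
    ("ODI # 4", "England", "Australia"),
    ("ODI # 5", "New Zealand", "Pakistan"),
]


def distinct_teams(matches):
    """Return every team that appears in team1 or team2, without duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE odi (scorecard TEXT, team1 TEXT, team2 TEXT)")
    conn.executemany("INSERT INTO odi VALUES (?, ?, ?)", matches)
    # UNION (as opposed to UNION ALL) drops duplicate rows across both selects.
    rows = conn.execute(
        "SELECT team1 AS teams FROM odi "
        "UNION "
        "SELECT team2 FROM odi "
        "ORDER BY teams"
    ).fetchall()
    conn.close()
    return [team for (team,) in rows]


teams = distinct_teams(MATCHES)
print(teams)
```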
I am inserting the result of a SELECT statement from a relational table into another table using Pentaho. Is it possible to add a UUID4 identifier to each row before inserting?
Data before insertion:
ip               country  city     start_time
1.7411624393E10  Canada   London   2017-06-01 15:27:23
1.7411221531E10  Canada   Ottawa   2017-06-02 23:57:56
1.846525287E9    Canada   Langley  2017-06-02 22:27:29
2.0647254234E10  Canada   Toronto  2017-06-02 22:22:49
2.0647254234E10  Canada   Toronto  2017-06-02 22:22:12
2.0647254234E10  Canada   Toronto  2017-06-02 22:21:20
Needed as:
UUID  ip               country  city     start_time
ID1   1.7411624393E10  Canada   London   2017-06-01 15:27:23
ID2   1.7411221531E10  Canada   Ottawa   2017-06-02 23:57:56
ID3   1.846525287E9    Canada   Langley  2017-06-02 22:27:29
ID4   2.0647254234E10  Canada   Toronto  2017-06-02 22:22:49
ID5   2.0647254234E10  Canada   Toronto  2017-06-02 22:22:12
ID6   2.0647254234E10  Canada   Toronto  2017-06-02 22:21:20
I am able to generate a single UUID4 with the random generator, but it is then shared by all the records. I need a separate UUID for each row, of course.
You can use the "Generate random value" step to create a column of type "Universally Unique Identifier type 4(UUID4)".
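Outside Pentaho, the requirement itself (a fresh UUID4 for every row rather than one shared value) can be sketched in a few lines. This is not how the Pentaho step is implemented; it is only a plain Python illustration using the standard uuid module, with the rows from the question and a hypothetical helper name.

```python
import uuid

# Rows as shown in the question: (ip, country, city, start_time).
ROWS = [
    ("1.7411624393E10", "Canada", "London", "2017-06-01 15:27:23"),
    ("1.7411221531E10", "Canada", "Ottawa", "2017-06-02 23:57:56"),
    ("1.846525287E9", "Canada", "Langley", "2017-06-02 22:27:29"),
    ("2.0647254234E10", "Canada", "Toronto", "2017-06-02 22:22:49"),
    ("2.0647254234E10", "Canada", "Toronto", "2017-06-02 22:22:12"),
    ("2.0647254234E10", "Canada", "Toronto", "2017-06-02 22:21:20"),
]


def add_uuid4(rows):
    """Prepend a newly generated UUID4 string to each row.

    uuid.uuid4() is called once per row, so every row gets its own value.
    """
    return [(str(uuid.uuid4()),) + tuple(row) for row in rows]


tagged_rows = add_uuid4(ROWS)
for row in tagged_rows:
    print(row)
```

The point is that the generator runs inside the per-row loop; generating the value once before the loop would reproduce the "same ID for all records" problem described in the question.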
Long time reader, first time poster.
I'm trying to consolidate a table I have to show the rate of sold goods getting lost in transit. In this table, we have four kinds of products, three countries of origin, three transit countries (where the goods are first shipped to before being passed to customers) and three destination countries. The table is as follows.
Status         Product  Count  Origin   Transit      Destination
--------------------------------------------------------------------
Delivered      Shoes    100    Germany  France       USA
Delivered      Books    50     Germany  France       USA
Delivered      Jackets  75     Germany  France       USA
Delivered      DVDS     30     Germany  France       USA
Not Delivered  Shoes    7      Germany  France       USA
Not Delivered  Books    3      Germany  France       USA
Not Delivered  Jackets  5      Germany  France       USA
Not Delivered  DVDS     1      Germany  France       USA
Delivered      Shoes    300    Poland   Netherlands  Canada
Delivered      Books    80     Poland   Netherlands  Canada
Delivered      Jackets  25     Poland   Netherlands  Canada
Delivered      DVDS     90     Poland   Netherlands  Canada
Not Delivered  Shoes    17     Poland   Netherlands  Canada
Not Delivered  Books    13     Poland   Netherlands  Canada
Not Delivered  Jackets  1      Poland   Netherlands  Canada
Delivered      Shoes    250    Spain    Ireland      UK
Delivered      Books    20     Spain    Ireland      UK
Delivered      Jackets  150    Spain    Ireland      UK
Delivered      DVDS     60     Spain    Ireland      UK
Not Delivered  Shoes    19     Spain    Ireland      UK
Not Delivered  Books    8      Spain    Ireland      UK
Not Delivered  Jackets  8      Spain    Ireland      UK
Not Delivered  DVDS     10     Spain    Ireland      UK
I would like to create a new table that shows the count of goods delivered and not delivered in one row, like this.
Product  Delivered  Not_Delivered  Origin   Transit      Destination
Shoes    100        7              Germany  France       USA
Books    50         3              Germany  France       USA
Jackets  75         5              Germany  France       USA
DVDS     30         1              Germany  France       USA
Shoes    300        17             Poland   Netherlands  Canada
Books    80         13             Poland   Netherlands  Canada
Jackets  25         1              Poland   Netherlands  Canada
DVDS     90         0              Poland   Netherlands  Canada
Shoes    250        19             Spain    Ireland      UK
Books    20         8              Spain    Ireland      UK
Jackets  150        8              Spain    Ireland      UK
DVDS     60         10             Spain    Ireland      UK
I've had a look at some other posts and so far I haven't found exactly what I'm looking for. Perhaps the issue here is that there will be multiple WHERE statements in the code to ensure that I don't group all shoes together, or all country groups.
Is this possible with SQL?
Something like this?
select
product
,sum(case when status = 'Delivered' then count else 0 end) as delivered
,sum(case when status = 'Not Delivered' then count else 0 end) as not_delivered
,origin
,transit
,destination
from table
group by
product
,origin
,transit
,destination
This is rather easy; instead of one line per Product, Origin, Transit, Destination and Status, you want one result line per Product, Origin, Transit and Destination only. So group by these four columns and aggregate conditionally:
select
product, origin, transit, destination,
sum(case when status = 'Delivered' then "count" else 0 end) as delivered,
sum(case when status = 'Not Delivered' then "count" else 0 end) as not_delivered
from mytable
group by product, origin, transit, destination;
BTW: It is not a good idea to use a keyword for a column name. I used double quotes to use your column count, which is standard SQL, but I don't know if it works in Google BigQuery. Maybe it must be "Count" rather than "count", or something else entirely.
SELECT
product, origin, transit, destination,
SUM([count] * (status = 'Delivered')) AS delivered,
SUM([count] * (status = 'Not Delivered')) AS not_delivered
FROM mytable
GROUP BY 1, 2, 3, 4
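The conditional aggregation used in the answers above can be tried on a subset of the sample rows. The sketch below assumes an in-memory SQLite database, not BigQuery, and a hypothetical table name shipments; the column count is double-quoted as the second answer suggests. The Poland rows are included because DVDS has no 'Not Delivered' row there, which is where the ELSE 0 branch matters.

```python
import sqlite3

# Subset of the question's rows:
# (status, product, count, origin, transit, destination).
SHIPMENTS = [
    ("Delivered", "Shoes", 100, "Germany", "France", "USA"),
    ("Not Delivered", "Shoes", 7, "Germany", "France", "USA"),
    ("Delivered", "Shoes", 300, "Poland", "Netherlands", "Canada"),
    ("Delivered", "Books", 80, "Poland", "Netherlands", "Canada"),
    ("Delivered", "Jackets", 25, "Poland", "Netherlands", "Canada"),
    ("Delivered", "DVDS", 90, "Poland", "Netherlands", "Canada"),
    ("Not Delivered", "Shoes", 17, "Poland", "Netherlands", "Canada"),
    ("Not Delivered", "Books", 13, "Poland", "Netherlands", "Canada"),
    ("Not Delivered", "Jackets", 1, "Poland", "Netherlands", "Canada"),
]


def pivot_delivery_counts(rows):
    """Collapse Delivered / Not Delivered rows into one row per product and route."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE shipments (status TEXT, product TEXT, "count" INTEGER,
                                   origin TEXT, transit TEXT, destination TEXT)"""
    )
    conn.executemany("INSERT INTO shipments VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Each SUM only adds the rows whose status matches; other rows contribute 0.
    result = conn.execute(
        """SELECT product,
                  SUM(CASE WHEN status = 'Delivered' THEN "count" ELSE 0 END) AS delivered,
                  SUM(CASE WHEN status = 'Not Delivered' THEN "count" ELSE 0 END) AS not_delivered,
                  origin, transit, destination
           FROM shipments
           GROUP BY product, origin, transit, destination
           ORDER BY origin, product"""
    ).fetchall()
    conn.close()
    return result


pivoted = pivot_delivery_counts(SHIPMENTS)
for row in pivoted:
    print(row)
```

No WHERE clause is needed: grouping by product, origin, transit and destination already keeps the shoes of one route separate from the shoes of another.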
I've searched but found nothing that could help.
I have the following table in a SQL Server 2005 database:
Parent   Child            Value
-------  ---------------  -----
America  Mexico           8
America  Canada           1
Asia     Japan            5
Asia     Korea            7
Europe   Spain            0
Europe   Italy            2
Africa   Zimbabwe         1
Mexico   Baja California  0
America  USA              3
USA      California       1
USA      Texas            2
Parent and Child together form the primary key; Value is not important (IMO). I would like to create a view that results in something like this:
Parent   Child       Value
-------  ----------  -----
America  USA         3
USA      California  1
USA      Texas       2
I would search for America, and the result will give back every nested child there is, recursively, no matter how many it has, since I could include cities, localities, etc.
What I need is similar to what some call a BOM explosion.
Here is how you can do it:
with cte as (
select parent, child
from t
union all
select cte.parent, t.child
from cte join
t
on cte.child = t.parent
)
select cte.*
from cte
where parent = 'America';
Here is a small SQL Fiddle example.
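For anyone who wants to run the recursion locally, here is a rough sketch using an in-memory SQLite database instead of SQL Server (SQLite spells it WITH RECURSIVE, while SQL Server uses plain WITH). The first function mirrors the query from the answer, which reports the searched root in the parent column for every descendant. The second is a variation, not part of the answer, that starts at the root and keeps each row's immediate parent and value. The function names are made up for the example.

```python
import sqlite3

# (parent, child, value) rows from the question.
TREE_ROWS = [
    ("America", "Mexico", 8), ("America", "Canada", 1),
    ("Asia", "Japan", 5), ("Asia", "Korea", 7),
    ("Europe", "Spain", 0), ("Europe", "Italy", 2),
    ("Africa", "Zimbabwe", 1), ("Mexico", "Baja California", 0),
    ("America", "USA", 3), ("USA", "California", 1), ("USA", "Texas", 2),
]


def open_tree_db(rows):
    """Create the in-memory table t with (parent, child) as primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (parent TEXT, child TEXT, value INTEGER, "
        "PRIMARY KEY (parent, child))"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn


def descendants_of(conn, root):
    """Query from the answer: every (ancestor, descendant) pair, filtered by root."""
    return conn.execute(
        """WITH RECURSIVE cte(parent, child) AS (
               SELECT parent, child FROM t
               UNION ALL
               SELECT cte.parent, t.child
               FROM cte JOIN t ON cte.child = t.parent
           )
           SELECT parent, child FROM cte WHERE parent = ? ORDER BY child""",
        (root,),
    ).fetchall()


def subtree_rows(conn, root):
    """Variation: walk down from root, keeping the immediate parent and value."""
    return conn.execute(
        """WITH RECURSIVE tree(parent, child, value) AS (
               SELECT parent, child, value FROM t WHERE parent = ?
               UNION ALL
               SELECT t.parent, t.child, t.value
               FROM tree JOIN t ON t.parent = tree.child
           )
           SELECT parent, child, value FROM tree ORDER BY parent, child""",
        (root,),
    ).fetchall()


conn = open_tree_db(TREE_ROWS)
america_descendants = descendants_of(conn, "America")
america_subtree = subtree_rows(conn, "America")
conn.close()
print(america_descendants)
print(america_subtree)
```

Both versions assume the data has no cycles; a row pointing back to one of its own ancestors would make the recursion run without end.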