hive: to_map function not working - hive

I have the following data in a Hive table:
select pid, year, catches from fielding_s where pid = 'zobribe01' group by id;
zobribe01 2006 [{"p1":52,"p2":50,"p3":1322,"p4":86}]
zobribe01 2007 [{"p1":30,"p2":26,"p3":674,"p4":37}]
zobribe01 2008 [{"p1":8,"p2":4,"p3":123,"p4":11}]
zobribe01 2008 [{"p1":1,"p2":0,"p3":14,"p4":0}]
zobribe01 2008 [{"p1":5,"p2":3,"p3":81,"p4":8}]
zobribe01 2008 [{"p1":14,"p2":8,"p3":238,"p4":21}]
zobribe01 2008 [{"p1":19,"p2":12,"p3":340,"p4":29}]
zobribe01 2008 [{"p1":2,"p2":1,"p3":21,"p4":0}]
zobribe01 2008 [{"p1":35,"p2":33,"p3":880,"p4":51}]
zobribe01 2009 [{"p1":3,"p2":2,"p3":39,"p4":6}]
zobribe01 2009 [{"p1":91,"p2":81,"p3":2144,"p4":143}]
zobribe01 2009 [{"p1":1,"p2":1,"p3":17,"p4":0}]
zobribe01 2009 [{"p1":7,"p2":5,"p3":140,"p4":15}]
zobribe01 2009 [{"p1":1,"p2":null,"p3":null,"p4":null}]
zobribe01 2009 [{"p1":9,"p2":2,"p3":114,"p4":8}]
zobribe01 2009 [{"p1":70,"p2":44,"p3":1242,"p4":112}]
zobribe01 2009 [{"p1":59,"p2":37,"p3":988,"p4":89}]
zobribe01 2009 [{"p1":13,"p2":6,"p3":186,"p4":9}]
zobribe01 2010 [{"p1":14,"p2":9,"p3":237,"p4":77}]
zobribe01 2010 [{"p1":55,"p2":45,"p3":1113,"p4":74}]
zobribe01 2010 [{"p1":2,"p2":1,"p3":30,"p4":1}]
zobribe01 2010 [{"p1":14,"p2":9,"p3":250,"p4":22}]
zobribe01 2010 [{"p1":1,"p2":0,"p3":6,"p4":0}]
zobribe01 2010 [{"p1":110,"p2":89,"p3":2504,"p4":204}]
zobribe01 2010 [{"p1":103,"p2":80,"p3":2248,"p4":182}]
zobribe01 2011 [{"p1":131,"p2":118,"p3":3175,"p4":213}]
zobribe01 2011 [{"p1":38,"p2":33,"p3":869,"p4":65}]
I want to merge these rows using the to_map function. I am running the query below, but it fails with the following error:
select pid, to_map(year, catches) from fielding_s where pid = 'zobribe01' group by pid;
FAILED: SemanticException [Error 10025]: Line 1:12 Expression not in GROUP BY key 'catches'

Remove the GROUP BY clause from your query and then try:
select id, map(name, designation) b from snow

How to get selective count in sql

I have an intermediate result as follows:
year ID edition
1996 WOS:000074643400033 WOS.SCI
1996 WOS:000074643400033 WOS.ISSHP
1996 WOS:000074643400033 WOS.ISTP
2004 WOS:000222568300039 WOS.ISTP
2004 WOS:000222568300039 WOS.SCI
2008 WOS:000265048200175 WOS.ISTP
2009 WOS:000275179901182 WOS.ISTP
2009 WOS:000275179901182 WOS.ISSHP
Now I must run a count on top of this result with the following condition:
if an ID has both the "WOS.ISTP" and "WOS.ISSHP" editions in the same year, the pair must be counted just once.
My final table should look like the following.
Note: I have added the "intermediate_count" column only to aid understanding; it need not appear in the final table.
year  ID                   edition    intermediate_count  Final_count
1996  WOS:000074643400033  WOS.SCI    1                   2
1996  WOS:000074643400033  WOS.ISSHP  1
1996  WOS:000074643400033  WOS.ISTP
2004  WOS:000222568300039  WOS.ISTP   1                   2
2004  WOS:000222568300039  WOS.SCI    1
2008  WOS:000265048200175  WOS.ISTP   1                   1
2009  WOS:000275179901182  WOS.ISTP   1                   1
2009  WOS:000275179901182  WOS.ISSHP
I tried to use CASE in the following way, but it didn't work:
select id, year,
case
when edition_code = 'WOS.ISTP' and edition_code = 'WOS.ISSHP' then 'both'
else edition_code
end as edition_code_new
from table
group by id, year, edition_code_new
order by id;
Any help would be appreciated. Thanks in advance.
Assuming you want one row per id and year, you can use count(distinct):
select id, year,
count(distinct case when edition_code in ('WOS.ISTP', 'WOS.ISSHP')
then 'WOS.ISTP'
else edition_code
end) as count
from table
group by id, year
order by id;
You can also phrase this as a window function if you want the original rows.
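To make the count(distinct) idea concrete, here is a minimal sketch that runs the same query against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for the asker's database, and the table name `editions` and the function name `final_counts` are assumptions made for illustration only.

```python
import sqlite3

# Sample rows copied from the question: (year, id, edition_code).
ROWS = [
    (1996, "WOS:000074643400033", "WOS.SCI"),
    (1996, "WOS:000074643400033", "WOS.ISSHP"),
    (1996, "WOS:000074643400033", "WOS.ISTP"),
    (2004, "WOS:000222568300039", "WOS.ISTP"),
    (2004, "WOS:000222568300039", "WOS.SCI"),
    (2008, "WOS:000265048200175", "WOS.ISTP"),
    (2009, "WOS:000275179901182", "WOS.ISTP"),
    (2009, "WOS:000275179901182", "WOS.ISSHP"),
]


def final_counts(rows):
    """Count editions per (id, year), treating ISTP and ISSHP as one edition."""
    conn = sqlite3.connect(":memory:")
    # "editions" is a hypothetical table name; the question calls it "table".
    conn.execute("CREATE TABLE editions (year INTEGER, id TEXT, edition_code TEXT)")
    conn.executemany("INSERT INTO editions VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, year,
               COUNT(DISTINCT CASE WHEN edition_code IN ('WOS.ISTP', 'WOS.ISSHP')
                                   THEN 'WOS.ISTP'
                                   ELSE edition_code
                              END) AS final_count
        FROM editions
        GROUP BY id, year
        ORDER BY year
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in final_counts(ROWS):
        print(row)
```

Mapping both editions to the same label before counting distinct values is what collapses the pair into a single count; every other edition keeps its own label.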

Msg 102, Level 15, State 1, Line 28 Incorrect syntax near 'order'

I'm running the following query and I get an error. I am trying to calculate YTD and previous YTD in the same query.
Msg 102, Level 15, State 1, Line 28 Incorrect syntax near 'order'.
WITH
grouped_by_date AS
(
SELECT
[Sales_Organization],
[Market_Grp],
[Delivery_Year],
[Delivery_Month],
[Invoicing_Day],
SUM(QTY_UoM) AS Weight
FROM
tmp.factsales s
GROUP BY
[Sales_Organization],
[Market_Grp],
[Delivery_Year],
[Delivery_Month],
[Invoicing_Day]
),
cumulative_sum_for_ytd AS
(
SELECT
*,
SUM([Weight]) OVER (PARTITION BY [Delivery_Year] ORDER BY [Delivery_Month], [Invoicing_Day]
)
AS Weight_YTD
FROM
grouped_by_date
),
hack_to_do_lag AS
(
SELECT
*,
CASE
WHEN [Delivery_Year]%2=1
THEN MAX(CASE WHEN [Delivery_Year]%2=0 THEN [Weight_YTD] END) OVER (PARTITION BY ([Delivery_Year]+0)/2)
ELSE MAX(CASE WHEN [Delivery_Year]%2=1 THEN [Weight_YTD] END) OVER (PARTITION BY ([Delivery_Year]+1)/2)
END
AS Weight_PreviousYTD
FROM
cumulative_sum_for_ytd
)
SELECT
*
FROM
hack_to_do_lag
I searched on Google, and the problem seems to be linked to the version I am using:
SELECT @@VERSION
Microsoft SQL Server 2008 R2 (SP3) - 10.50.6220.0 (X64) Mar 19 2015
12:32:14 Copyright (c) Microsoft Corporation Enterprise Edition
(64-bit) on Windows NT 6.3 (Build 9600: ) (Hypervisor)
How could I resolve my problem? I can't change the version.
SQL Server 2008 does not support cumulative window functions, so you need to do the calculation differently. A subquery or apply is a typical method:
WITH grouped_by_date AS (
SELECT Sales_Organization, Market_Grp,
Delivery_Year, Delivery_Month, Invoicing_Day,
SUM(QTY_UoM) as Weight
FROM tmp.factsales s
GROUP BY Sales_Organization, Market_Grp,
Delivery_Year, Delivery_Month, Invoicing_Day
)
SELECT gbd.*,
(SELECT SUM(gbd2.Weight)
FROM grouped_by_date gbd2
WHERE gbd2.Delivery_Year = gbd.Delivery_Year AND
(gbd2.Delivery_Month < gbd.Delivery_Month OR
gbd2.Delivery_Month = gbd.Delivery_Month AND
gbd2.Invoicing_Day <= gbd.Invoicing_Day
)
) as weight_ytd
FROM grouped_by_date gbd;
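As a rough illustration of the correlated-subquery running total, the sketch below applies the same pattern to a tiny made-up data set. It runs on SQLite through Python's sqlite3 module rather than SQL Server 2008, so it only shows the logic; the sample values, the lower-case column names and the function name `weight_ytd` are assumptions.

```python
import sqlite3

# Hypothetical pre-aggregated rows:
# (delivery_year, delivery_month, invoicing_day, weight)
SALES = [
    (2017, 1, 5, 10.0),
    (2017, 1, 20, 5.0),
    (2017, 2, 3, 7.0),
    (2018, 1, 10, 4.0),
    (2018, 3, 1, 6.0),
]


def weight_ytd(rows):
    """Running total per year, computed without window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grouped_by_date ("
        "delivery_year INTEGER, delivery_month INTEGER, "
        "invoicing_day INTEGER, weight REAL)"
    )
    conn.executemany("INSERT INTO grouped_by_date VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT gbd.delivery_year, gbd.delivery_month, gbd.invoicing_day,
               (SELECT SUM(gbd2.weight)
                FROM grouped_by_date gbd2
                WHERE gbd2.delivery_year = gbd.delivery_year
                  AND (gbd2.delivery_month < gbd.delivery_month
                       OR (gbd2.delivery_month = gbd.delivery_month
                           AND gbd2.invoicing_day <= gbd.invoicing_day))
               ) AS weight_ytd
        FROM grouped_by_date gbd
        ORDER BY gbd.delivery_year, gbd.delivery_month, gbd.invoicing_day
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in weight_ytd(SALES):
        print(row)
```

The inner query sums every row of the same year that falls on or before the current row's month and day, which is why the day comparison has to be inclusive.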

I want to calculate a percentage change within a column using SQL Server

Forgive me if this has already been addressed elsewhere; I checked, but nothing I found worked for me. I am not very conversant with SQL Server.
I want to calculate the month-over-month percentage change in the indices for the months Jan - Dec (denoted by 01, 02, 03...) in my table using SQL Server. I filled in the expected values for the Pct_Change column using Excel. See the sample below:
M Year Indices Pct_Change
01 2017 190.51
02 2017 188.99 -0.8
03 2017 190.06 0.6
04 2017 194.24 2.2
05 2017 196.83 1.3
06 2017 196.30 -0.3
07 2017 191.09 -2.7
08 2017 190.42 -0.3
09 2017 194.02 1.9
10 2017 201.20 3.7
11 2017 200.98 -0.1
12 2017 194.43 -3.3
01 2018 197.23 1.4
02 2018 198.20 0.5
03 2018 194.60 -1.8
Any help will be greatly appreciated.
This is a guess, due to the lack of expected results, but perhaps...
WITH VTE AS(
SELECT *
FROM (VALUES(01,2017,190.51),
(02,2017,188.99),
(03,2017,190.06),
(04,2017,194.24),
(05,2017,196.83),
(06,2017,196.30),
(07,2017,191.09),
(08,2017,190.42),
(09,2017,194.02),
(10,2017,201.20),
(11,2017,200.98),
(12,2017,194.43),
(01,2018,197.23),
(02,2018,198.20),
(03,2018,194.60)) V(M, [Year], Indices))
SELECT *,
(Indices / LAG(Indices) OVER (ORDER BY [Year] ASC, M ASC)) -1 AS Pct_change
FROM VTE;
Edit: OP is, unfortunately, using 2008R2. Personally, I strongly suggest looking at upgrading your version. SQL Server 2008(R2) has less than a year of extended support left now. After that, you will receive no updates for it, including security updates. If you need to be GDPR compliant, then it's a must that you change.
Anyway, you can do this in SQL Server 2008R2 by using a LEFT JOIN to the same table:
WITH VTE AS(
SELECT *
FROM (VALUES(01,2017,190.51),
(02,2017,188.99),
(03,2017,190.06),
(04,2017,194.24),
(05,2017,196.83),
(06,2017,196.30),
(07,2017,191.09),
(08,2017,190.42),
(09,2017,194.02),
(10,2017,201.20),
(11,2017,200.98),
(12,2017,194.43),
(01,2018,197.23),
(02,2018,198.20),
(03,2018,194.60)) V(M, [Year], Indices)),
--The solution. note that you would need your WITH on the next line
--Mine isn't, as I used a CTE to create the sample data.
RNs AS(
SELECT *,
ROW_NUMBER() OVER (ORDER BY [Year] ASC, M ASC) AS RN
FROM VTE)
SELECT R1.M,
R1.[Year],
R1.Indices,
(R1.Indices / R2.Indices) -1 AS Pct_change
FROM RNs R1
LEFT JOIN RNs R2 ON R1.RN = R2.RN + 1;
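For readers who want to try the self-join approach without a SQL Server instance, here is a small sketch on SQLite via Python's sqlite3 module. It departs from the answer in two labeled ways: instead of ROW_NUMBER it joins on a month index computed as year * 12 + month (an assumption that only works because the months are consecutive), and it multiplies by 100 and rounds to one decimal so the output is a percentage like the question's Pct_Change column. The table name `idx` and the function name `pct_changes` are hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (month, year, indices).
DATA = [
    (1, 2017, 190.51), (2, 2017, 188.99), (3, 2017, 190.06),
    (4, 2017, 194.24), (5, 2017, 196.83), (6, 2017, 196.30),
    (7, 2017, 191.09), (8, 2017, 190.42), (9, 2017, 194.02),
    (10, 2017, 201.20), (11, 2017, 200.98), (12, 2017, 194.43),
    (1, 2018, 197.23), (2, 2018, 198.20), (3, 2018, 194.60),
]


def pct_changes(rows):
    """Join each month to the previous one and compute the percentage change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE idx (m INTEGER, year INTEGER, indices REAL)")
    conn.executemany("INSERT INTO idx VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT cur.m, cur.year,
               ROUND((cur.indices / prev.indices - 1) * 100, 1) AS pct_change
        FROM idx cur
        LEFT JOIN idx prev
               ON cur.year * 12 + cur.m = prev.year * 12 + prev.m + 1
        ORDER BY cur.year, cur.m
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pct_changes(DATA):
        print(row)
```

The LEFT JOIN keeps the first month in the output with a NULL change, since it has no previous row to compare against.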

SQL not grouping properly

I am trying to find the number of records for certain service codes, by year - in my database.
The code:
SELECT datepart( year,dbo.PUBACC_HD.grant_date) as'Year',
dbo.PUBACC_HD.radio_service_code as 'Service Code',
count(dbo.PUBACC_FR.transmitter_make) as 'Number of Records'
FROM dbo.PUBACC_FR
INNER JOIN dbo.PUBACC_HD
ON dbo.PUBACC_FR.unique_system_identifier = dbo.PUBACC_HD.unique_system_identifier
GROUP BY dbo.PUBACC_HD.grant_date, dbo.PUBACC_HD.radio_service_code
ORDER BY [Number of Records] desc
Current Result:
Year Service Code Number of Records
----------- ------------ -----------------
2011 CF 11195 <----
2013 CF 2042
2011 CF 1893 <----
2013 CF 1879
2013 CF 1841
2013 CF 1741
2013 CF 1644
2010 CF 1595
2013 MG 1563
2011 CF 1512 <----
2013 CF 1510
2011 CF 1454
2011 CF 1428
2016 CF 1385
2011 CF 1378
2015 MG 1349
I want all of the rows to be aggregated. Examples of rows that are not aggregated are marked with arrows. (2011, CF) is just one example in the large table of things not aggregating correctly.
Does anyone know why this is happening?
You should use:
GROUP BY datepart( year,dbo.PUBACC_HD.grant_date)
instead of:
GROUP BY dbo.PUBACC_HD.grant_date
As it is right now, you are grouping by a date value, that may differ among records sharing the same radio_service_code value.
Change the GROUP BY to:
Group By datepart( year,dbo.PUBACC_HD.grant_date),dbo.PUBACC_HD.radio_service_code
You select only the year from grant_date, so the GROUP BY has to use the same expression.
Because you are grouping by grant_date and not by year
Try a WITH statement (CTE) and also change your GROUP BY condition. The CTE extracts the year, and the outer query does the counting and grouping:
;With CTE AS
(
SELECT
datepart( year,dbo.PUBACC_HD.grant_date) as [Year],
dbo.PUBACC_HD.radio_service_code as ServiceCode,
dbo.PUBACC_FR.transmitter_make as TransmitterMake
FROM dbo.PUBACC_FR
INNER JOIN dbo.PUBACC_HD
ON dbo.PUBACC_FR.unique_system_identifier = dbo.PUBACC_HD.unique_system_identifier
)
Select [Year], ServiceCode, count(TransmitterMake) as NumberofRecords
from CTE
GROUP BY [Year], ServiceCode
ORDER BY NumberofRecords desc
As @Lucas Kot-Zaniewski states, you are selecting the year but grouping by the full date; that is the problem.
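The difference between grouping by the full date and grouping by the year can be reproduced on a toy table. The sketch below uses SQLite through Python's sqlite3 module, where strftime('%Y', ...) stands in for SQL Server's datepart(year, ...); the table name `grants`, the sample rows and the function name `count_rows` are all made up for illustration.

```python
import sqlite3

# Hypothetical rows: (grant_date, radio_service_code).
GRANTS = [
    ("2011-01-15", "CF"),
    ("2011-06-30", "CF"),
    ("2011-06-30", "CF"),
    ("2013-02-01", "CF"),
    ("2013-02-01", "MG"),
]


def count_rows(rows, group_expr):
    """Count records per service code, grouped by the given date expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grants (grant_date TEXT, radio_service_code TEXT)")
    conn.executemany("INSERT INTO grants VALUES (?, ?)", rows)
    query = (
        "SELECT CAST(strftime('%Y', grant_date) AS INTEGER) AS yr, "
        "radio_service_code, COUNT(*) AS n "
        "FROM grants "
        "GROUP BY " + group_expr + ", radio_service_code "
        "ORDER BY yr, radio_service_code, n"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Grouping by the full date leaves several rows for the same year.
    print(count_rows(GRANTS, "grant_date"))
    # Grouping by the year expression gives one row per (year, code).
    print(count_rows(GRANTS, "strftime('%Y', grant_date)"))
```

With the full date in the GROUP BY, (2011, CF) shows up once per distinct date, which mirrors the repeated rows marked with arrows in the question.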

Linq query on the same table with difference between rows

I've been trying to develop a linq query that returns results from the same table.
ORDERS
YEAR NumberOfOrders
--------------------------
2009 150
2010 195
2011 180
2012 110
The query must return the difference between the current year and the previous year (2012 and 2011), as follows:
Result:
YEAR NumberofOrders DIFFERENCE
---------------------------------------
2012 110 -70
Thanks for your help,
I found it myself:
var query = (from o1 in context.orders
where o1.year == lastyear
from o2 in context.orders
where o2.year == currentyear
select new
{
difference = o2.numberOfOrders - o1.numberOfOrders,
numberOfOrders = o2.numberOfOrders,
year = o2.year
});
Thanks,