How to remove values after a special character in Hive SQL

I have a Hive table with a column state:
+----------------+
| state          |
+----------------+
| taxes, TX      |
| Washington, WA |
| New York, NY   |
| New Jersey, NJ |
+----------------+
Now I want to split the state column into two new columns:
+------------+------+
| state      | code |
+------------+------+
| taxes      | TX   |
| Washington | WA   |
| New York   | NY   |
| New Jersey | NJ   |
+------------+------+

select split(state,',')[0] as state
,ltrim(split(state,',')[1]) as code
from mytable
+------------+------+
| state      | code |
+------------+------+
| taxes      | TX   |
| Washington | WA   |
| New York   | NY   |
| New Jersey | NJ   |
+------------+------+
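The small Python sketch below shows what `split(state,',')[0]` and `ltrim(split(state,',')[1])` do to each value. It only mimics the query under stated assumptions; it is not Hive itself.

- Hive's `split` takes a regular expression, but a plain comma behaves the same here.
- Indexing past the end of a Hive array returns NULL, which the sketch models as `None`.
- The helper name `split_state` is hypothetical.

```python
from typing import Optional, Tuple

def split_state(value: str) -> Tuple[str, Optional[str]]:
    """Mimic split(state, ',')[0] and ltrim(split(state, ',')[1])."""
    parts = value.split(",")
    state = parts[0]
    # Hive gives NULL for an out-of-range array index; model that as None.
    # ltrim strips leading spaces, hence lstrip(" ").
    code = parts[1].lstrip(" ") if len(parts) > 1 else None
    return state, code

rows = ["taxes, TX", "Washington, WA", "New York, NY", "New Jersey, NJ"]
for row in rows:
    print(split_state(row))
```

The `ltrim` matters because the text after the comma starts with a space. Without it, the code column would hold " TX" rather than "TX".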

select substr(state, 1, instr(state, ',') - 1) as state,
       trim(substr(state, instr(state, ',') + 1)) as code
from mytable
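The `substr`/`instr` answer depends on 1-based positions:

- `instr` returns where the comma is, or 0 if there is none.
- The first `comma - 1` characters form the state.
- Everything after position `comma` is the code.

The Python helpers below are hypothetical stand-ins that reproduce that arithmetic for positive positions only. They are not Hive's implementation.

```python
from typing import Optional, Tuple

def instr(s: str, sub: str) -> int:
    # 1-based position of the first match, or 0 when absent (Hive's convention)
    return s.find(sub) + 1

def substr(s: str, pos: int, length: Optional[int] = None) -> str:
    # Positive 1-based positions only; a negative length is treated as empty here
    start = pos - 1
    if length is None:
        return s[start:]
    return s[start:start + max(length, 0)]

def split_with_instr(value: str) -> Tuple[str, str]:
    comma = instr(value, ",")
    state = substr(value, 1, comma - 1)
    code = substr(value, comma + 1).strip(" ")  # like trim()
    return state, code

print(split_with_instr("New York, NY"))  # ('New York', 'NY')
```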


T-SQL Cartesian product of several tables

I would like to get the Cartesian product of several tables in SQL (each has only one column, so there is no common key). For example:
TABLE A
Robert
Pierre
Samuel
TABLE B
Montreal
Chicago
TABLE C
KLM
AIR FRANCE
FINAL TABLE (CROSS PRODUCT)
Robert | Montreal | KLM
Pierre | Montreal | KLM
Samuel | Montreal | KLM
Robert | Chicago | KLM
Pierre | Chicago | KLM
Samuel | Chicago | KLM
Robert | Montreal | AIR FRANCE
Pierre | Montreal | AIR FRANCE
Samuel | Montreal | AIR FRANCE
Robert | Chicago | AIR FRANCE
Pierre | Chicago | AIR FRANCE
Samuel | Chicago | AIR FRANCE
I tried CROSS JOIN, but I couldn't find an example with multiple tables. Is nesting the only way to do it? What if there are 15 tables to join that way? That would make the code very long.
Thank you!
You would simply use:
select *
from a cross join b cross join c;
Do note that if any of the tables are empty (i.e. no rows), you will get no results.
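CROSS JOINs simply chain, so 15 tables need 14 `CROSS JOIN` clauses and no nesting.

The sketch below assumes a few things. It runs on an in-memory SQLite database through Python's `sqlite3` module rather than SQL Server, though the `CROSS JOIN` syntax is the same. The single column name `val` is hypothetical, and table `d` is an extra empty table added to show the "no results" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
data = {
    "a": ["Robert", "Pierre", "Samuel"],
    "b": ["Montreal", "Chicago"],
    "c": ["KLM", "AIR FRANCE"],
    "d": [],  # hypothetical empty table
}
for table, values in data.items():
    conn.execute(f"CREATE TABLE {table} (val TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(v,) for v in values])

def cross_join(tables):
    # Build "SELECT * FROM a CROSS JOIN b CROSS JOIN c ..." for any number of tables.
    sql = "SELECT * FROM " + " CROSS JOIN ".join(tables)
    return conn.execute(sql).fetchall()

rows = cross_join(["a", "b", "c"])
print(len(rows))                          # 12 = 3 * 2 * 2
print(len(cross_join(["a", "b", "d"])))   # 0, because d is empty
```

The row count is the product of the table sizes, so it grows quickly as more tables are added.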

.agg on a group inside a groupby object?

Sorry if this has been asked before, I couldn't find it.
I have a census population DataFrame that contains the population of each county in the US.
The relevant part of df looks like:
+----+--------+---------+----------------------------+---------------+
|    | REGION | STNAME  | CTYNAME                    | CENSUS2010POP |
+----+--------+---------+----------------------------+---------------+
| 1  | 3      | Alabama | Autauga County             | 54571         |
+----+--------+---------+----------------------------+---------------+
| 2  | 3      | Alabama | Baldwin County             | 182265        |
+----+--------+---------+----------------------------+---------------+
| 69 | 4      | Alaska  | Aleutians East Borough     | 3141          |
+----+--------+---------+----------------------------+---------------+
| 70 | 4      | Alaska  | Aleutians West Census Area | 5561          |
+----+--------+---------+----------------------------+---------------+
How can I get the np.std of the states' populations (each state's population being the sum of its counties' populations) for each of the four US regions, without modifying df?
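You can use transform (note that this adds a column to df and gives the standard deviation of county populations within each state):
df['std_col'] = df.groupby('STNAME')['CENSUS2010POP'].transform("std")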
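IIUC, if you want the states' populations (sums of their counties) and then the standard deviation of those sums within each region, you can do:
state_pop_std = df.groupby(['REGION', 'STNAME'])['CENSUS2010POP'].sum().groupby(level='REGION').std(ddof=0)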
You can also use the std() method directly; this gives the standard deviation of county populations (not state totals) within each region:
new_df = df.groupby(['REGION'])[['CENSUS2010POP']].std()
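The question asks for a two-step aggregation: first sum counties into state totals, then take the spread of those totals per region. The pandas sketch below does that without adding any column to `df`.

- It uses the four rows shown above plus one clearly hypothetical state, so that region 3 has more than one state.
- `ddof=0` is chosen to match `np.std`, whose default differs from pandas' `.std()` (`ddof=1`).

```python
import pandas as pd

df = pd.DataFrame({
    "REGION": [3, 3, 4, 4, 3],
    "STNAME": ["Alabama", "Alabama", "Alaska", "Alaska", "Hypothetical State"],
    "CTYNAME": ["Autauga County", "Baldwin County", "Aleutians East Borough",
                "Aleutians West Census Area", "Hypothetical County"],
    "CENSUS2010POP": [54571, 182265, 3141, 5561, 100000],  # last row is made up
})

# Step 1: each state's population is the sum of its counties.
state_totals = df.groupby(["REGION", "STNAME"])["CENSUS2010POP"].sum()

# Step 2: standard deviation of those state totals within each region.
# ddof=0 matches np.std; pandas' default .std() uses ddof=1.
region_std = state_totals.groupby(level="REGION").std(ddof=0)

print(state_totals)
print(region_std)
```

Both results are new Series, so `df` itself is left unchanged.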

How to Group By 2 fields in SQL Query?

I have two tables in PostgreSQL and I'm trying to get the number of times each hashtag is repeated per place.
I've made this query:
SELECT tweets_with_location.user_location,
tweets_with_location.my_new_id,
all_hashtags_with_location.regexp_split_to_table
FROM tweets_with_location, all_hashtags_with_location
WHERE tweets_with_location.my_new_id = all_hashtags_with_location.my_new_id;
This returns the location, the tweet ID, and the hashtag:
USER_LOCATION | MY_NEW_ID | HASHTAG
New York, NY | 33 | Happy
New York, NY | 40 | BigApple
Bronx, NY | 12 | Happy
Bronx, NY | 45 | Happy
Queens, NY | 23 | Trump
Queens, NY | 20 | Trump
Then I wrote another query, but it doesn't seem to sum up the number of times each hashtag appears per place; the count is always 1:
SELECT tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table,
COUNT(DISTINCT all_hashtags_with_location.regexp_split_to_table) AS CountOf
FROM tweets_with_location, all_hashtags_with_location
WHERE tweets_with_location.my_new_id = all_hashtags_with_location.my_new_id
GROUP BY tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table
ORDER BY CountOf DESC;
I need this result:
USER_LOCATION | HASHTAG | COUNT
New York, NY | Happy | 1
Bronx, NY | Happy | 2
Queens, NY | Trump | 2
New York, NY | BigApple | 1
How do I do this? What is wrong with my SQL Query?
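Remove the DISTINCT qualifier from the COUNT() function. Each group already contains a single hashtag value, so COUNT(DISTINCT all_hashtags_with_location.regexp_split_to_table) is always 1, whereas a plain COUNT counts the rows in each group.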
You were really close; you are just counting the wrong field:
SELECT tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table,
COUNT(DISTINCT tweets_with_location.my_new_id) AS CountOf
FROM tweets_with_location, all_hashtags_with_location
WHERE tweets_with_location.my_new_id = all_hashtags_with_location.my_new_id
GROUP BY tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table
ORDER BY CountOf DESC;
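The sketch below compares the two COUNT expressions on the sample rows from the question. It makes a few assumptions:

- It runs on SQLite through Python's `sqlite3` module rather than PostgreSQL.
- It uses an explicit `JOIN ... ON`, which is equivalent to the comma join plus `WHERE` above.
- The aliases `t` and `h` are new.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets_with_location (user_location TEXT, my_new_id INTEGER)")
conn.execute("CREATE TABLE all_hashtags_with_location (my_new_id INTEGER, regexp_split_to_table TEXT)")
sample = [
    ("New York, NY", 33, "Happy"), ("New York, NY", 40, "BigApple"),
    ("Bronx, NY", 12, "Happy"), ("Bronx, NY", 45, "Happy"),
    ("Queens, NY", 23, "Trump"), ("Queens, NY", 20, "Trump"),
]
conn.executemany("INSERT INTO tweets_with_location VALUES (?, ?)",
                 [(loc, tid) for loc, tid, _ in sample])
conn.executemany("INSERT INTO all_hashtags_with_location VALUES (?, ?)",
                 [(tid, tag) for _, tid, tag in sample])

def counts(count_expr):
    # Same grouping as the question; only the COUNT expression varies.
    sql = f"""
        SELECT t.user_location, h.regexp_split_to_table, {count_expr} AS CountOf
        FROM tweets_with_location t
        JOIN all_hashtags_with_location h ON t.my_new_id = h.my_new_id
        GROUP BY t.user_location, h.regexp_split_to_table
        ORDER BY CountOf DESC, t.user_location
    """
    return conn.execute(sql).fetchall()

broken = counts("COUNT(DISTINCT h.regexp_split_to_table)")  # always 1 per group
fixed = counts("COUNT(DISTINCT t.my_new_id)")               # tweets per group
plain = counts("COUNT(*)")                                  # rows per group
print(broken)
print(fixed)
```

`COUNT(*)` and `COUNT(DISTINCT t.my_new_id)` give the same answer only when each tweet contributes a given hashtag once. If the same hashtag could appear twice in one tweet, the DISTINCT version counts tweets rather than occurrences.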

Updating a column in PL/SQL

(Using PL/SQL anonymous program block)
I have a table tblROUTE2 of Mexican state highways:
+-----------------+------------+---------+----------+----------+----------------------------+-----------+--------+
| TYPE            | ADMN_CLASS | TOLL_RD | RTE_NUM1 | RTE_NUM2 | STATEROUTE                 | LENGTH_KM | STATE  |
+-----------------+------------+---------+----------+----------+----------------------------+-----------+--------+
| Paved Undivided | Federal    | N       | 81       |          | Tamaulipas Federal Hwy 81  | 124.551   | NULL   |
| Paved Undivided | Federal    | N       | 130      |          | Hidalgo Federal Hwy 130    | 76.347    | NULL   |
| Paved Undivided | Federal    | N       | 130      |          | Mexico Federal Hwy 130     | 68.028    | NULL   |
+-----------------+------------+---------+----------+----------+----------------------------+-----------+--------+
and tblSTATE2 of Mexican states:
+------+-----------------------+---------+-----------+
| CODE | NAME                  | POP1990 | AREA_SQMI |
+------+-----------------------+---------+-----------+
| MX02 | Baja California Norte | 1660855 | 28002.325 |
| MX03 | Baja California Sur   | 317764  | 27898.191 |
| MX18 | Nayarit               | 824643  | 10547.762 |
+------+-----------------------+---------+-----------+
I need to update the STATE field in tblROUTE2 with the CODE field from tblSTATE2, based on the route name in tblROUTE2. That is, I need to take the word or words (some state names have more than one word) that precede the string 'Federal' in the STATEROUTE field of tblROUTE2 and match them against the NAME field in tblSTATE2. Then, since each state has a CODE, I need to write the matching code into the STATE field of tblROUTE2.
I have started writing the code:
DECLARE
    state_code tblROUTE2.STATE%TYPE;
    state_name tblSTATE2.NAME%TYPE;
BEGIN
    SELECT STATE, NAME
    INTO state_code
    FROM tblROUTE2 r, tblSTATE2 s
    WHERE STATEROUTE LIKE '%Federal';
END;
I also need to remove the state name from the route name. For example, the string 'Tamaulipas Federal Hwy' in STATEROUTE should become 'Federal Hwy'. I have started this code, but I'm not sure it's right:
UPDATE tblROUTE2
SET STATEROUTE = TRIM(LEADING FROM 'Federal');
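As written, `TRIM(LEADING FROM 'Federal')` only trims leading blanks from the literal 'Federal'. The UPDATE above would therefore set every STATEROUTE to 'Federal'.

One way to keep the text from 'Federal' onward is to combine SUBSTR with INSTR. The sketch below runs such an UPDATE under these assumptions:

- It uses an in-memory SQLite database via Python's `sqlite3`, as a stand-in for Oracle.
- tblROUTE2 is cut down to just STATEROUTE and STATE.
- Oracle also provides `SUBSTR(str, pos)` and `INSTR(str, substr)`, but treat this as a sketch to test on your own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblROUTE2 (STATEROUTE TEXT, STATE TEXT)")  # cut-down table
conn.executemany(
    "INSERT INTO tblROUTE2 (STATEROUTE) VALUES (?)",
    [("Tamaulipas Federal Hwy 81",), ("Hidalgo Federal Hwy 130",), ("Mexico Federal Hwy 130",)],
)

# Keep everything from 'Federal' onward; skip rows without it (INSTR returns 0).
conn.execute(
    """
    UPDATE tblROUTE2
    SET STATEROUTE = SUBSTR(STATEROUTE, INSTR(STATEROUTE, 'Federal'))
    WHERE INSTR(STATEROUTE, 'Federal') > 0
    """
)
routes = [row[0] for row in conn.execute("SELECT STATEROUTE FROM tblROUTE2")]
print(routes)
```

Fill in the STATE codes before running this kind of UPDATE. Matching on the state name is no longer possible once the name has been stripped from STATEROUTE.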
Using a MERGE statement:
MERGE INTO tblROUTE2 A
USING (
    SELECT CODE, NAME FROM tblSTATE2
) B
ON (
    -- text before ' Federal', compared case-insensitively with the state name
    UPPER(SUBSTR(A.STATEROUTE, 1, INSTR(UPPER(A.STATEROUTE), 'FEDERAL') - 2)) = UPPER(B.NAME)
)
WHEN MATCHED THEN UPDATE
    SET A.STATE = B.CODE;
In a fiddle, I replicated your tables and added an extra record whose STATEROUTE matches one of the values in NAME. Although the fiddle returned an error, I ran the statement in my Oracle database, and one record was updated correctly, as shown in the following screenshot:
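SQLite has no MERGE, so the sketch below swaps in a correlated UPDATE. It applies the same matching rule as the MERGE above: the text before ' Federal' is compared case-insensitively with NAME.

- The cut-down tables are assumptions.
- The 'Nayarit Federal Hwy 15' row is hypothetical, added only so that one route matches, much like the extra record in the fiddle.
- Oracle should also accept this UPDATE form, but the MERGE is the answer's approach; this is just a local way to check the matching expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblROUTE2 (STATEROUTE TEXT, STATE TEXT)")
conn.execute("CREATE TABLE tblSTATE2 (CODE TEXT, NAME TEXT)")
conn.executemany("INSERT INTO tblROUTE2 (STATEROUTE) VALUES (?)", [
    ("Tamaulipas Federal Hwy 81",),
    ("Hidalgo Federal Hwy 130",),
    ("Nayarit Federal Hwy 15",),  # hypothetical row that matches a state name
])
conn.executemany("INSERT INTO tblSTATE2 VALUES (?, ?)", [
    ("MX02", "Baja California Norte"),
    ("MX03", "Baja California Sur"),
    ("MX18", "Nayarit"),
])

# Correlated UPDATE standing in for MERGE; same state-name expression as the ON clause.
conn.execute("""
    UPDATE tblROUTE2
    SET STATE = (
        SELECT s.CODE FROM tblSTATE2 s
        WHERE UPPER(s.NAME) = UPPER(SUBSTR(tblROUTE2.STATEROUTE, 1,
                                           INSTR(UPPER(tblROUTE2.STATEROUTE), 'FEDERAL') - 2))
    )
    WHERE EXISTS (
        SELECT 1 FROM tblSTATE2 s
        WHERE UPPER(s.NAME) = UPPER(SUBSTR(tblROUTE2.STATEROUTE, 1,
                                           INSTR(UPPER(tblROUTE2.STATEROUTE), 'FEDERAL') - 2))
    )
""")
result = dict(conn.execute("SELECT STATEROUTE, STATE FROM tblROUTE2"))
print(result)
```

The `- 2` drops the space before 'Federal'. For 'Nayarit Federal Hwy 15', INSTR returns 9, so the first 7 characters ('Nayarit') are compared with NAME.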

Select one of each

How can I get all the countries in the database from this table:
city | country | info
Jerusalem | Israel | Capital
Tel Aviv | Israel |
New York | USA | Biggest
Washington DC | USA | Capital
Berlin | Germany | Capital
How can I get, using SQL, the countries only: Israel, USA, Germany?
Which database server are you using?
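Assuming that the top row holds the column names and you are using MySQL, you should be able to just do:
SELECT DISTINCT country FROM <table-name>;
DISTINCT is a keyword that applies to the whole select list, not a function, so no parentheses are needed.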
This is probably in the documentation for the database software that you are using.
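`SELECT DISTINCT` is standard SQL, so it is not specific to MySQL. The sketch below runs it on SQLite through Python's `sqlite3`.

- The table name `cities` is hypothetical.
- The empty `info` value for Tel Aviv is stored as NULL.
- `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, country TEXT, info TEXT)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
    ("Jerusalem", "Israel", "Capital"),
    ("Tel Aviv", "Israel", None),
    ("New York", "USA", "Biggest"),
    ("Washington DC", "USA", "Capital"),
    ("Berlin", "Germany", "Capital"),
])

# One row per distinct country value.
countries = [row[0] for row in conn.execute(
    "SELECT DISTINCT country FROM cities ORDER BY country")]
print(countries)  # ['Germany', 'Israel', 'USA']
```

`SELECT country FROM cities GROUP BY country` returns the same set. `GROUP BY` becomes the more natural choice once you also want per-country aggregates such as `COUNT(*)`.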