There is a table with a column called Cars; in this column I have an array [Audi, BMW, Toyota, ..., VW].
I want to update this table and set Cars without a few elements from this array (Toyota, ..., BMW).
How can I do that? I want to pass another array and delete the elements that match.
You can unnest the array, filter, and reaggregate:
select t.*,
       (select array_agg(car)
        from unnest(t.cars) car
        where car not in ( . . . )
       ) as new_cars
from t;
If you want to keep the original ordering:
select t.*,
       (select array_agg(u.car order by u.n)
        from unnest(t.cars) with ordinality u(car, n)
        where u.car not in ( . . . )
       ) as new_cars
from t;
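If the goal is to persist the change rather than just select it, the same filter-and-reaggregate subquery can drive an UPDATE. A minimal sketch, assuming the table is t, the column is cars, and 'Toyota'/'BMW' stand in for your removal list:

UPDATE t
SET cars = (
    SELECT array_agg(u.car ORDER BY u.n)
    FROM unnest(t.cars) WITH ORDINALITY u(car, n)
    WHERE u.car <> ALL (ARRAY['Toyota', 'BMW'])
);

Note that array_agg returns NULL (not an empty array) if every element is filtered out.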
You could call array_remove several times:
SELECT array_remove(
           array_remove(
               ARRAY['Audi', 'BMW', 'Toyota', 'Opel', 'VW'],
               'Audi'
           ),
           'BMW'
       );
array_remove
------------------
{Toyota,Opel,VW}
(1 row)
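The same chaining works inside an UPDATE if you want to modify the stored array rather than just select a result (a sketch; the table name t and column cars are assumptions from the question):

UPDATE t
SET cars = array_remove(array_remove(cars, 'Toyota'), 'BMW');

Since array_remove takes a single value per call, you need one nested call per element to delete.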
Maybe I can help using pandas in Python. Assuming you want to delete all the rows containing the elements you'd like to remove, and df is your dataframe:
import pandas as pd

vals_to_delete = df.loc[(df['cars'] == 'Audi') | (df['cars'] == 'VW')]
df = df.drop(vals_to_delete.index)
or you could also do
df1 = df.loc[(df['cars'] != 'Audi') & (df['cars'] != 'VW')]
In SQL, you could use:
DELETE FROM table WHERE Cars IN ('Audi', 'VW');
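Note that DELETE removes whole rows, so this only matches the question if each row holds a single car value; with an array column like the one described above, the array_remove or unnest-and-filter approaches are what actually strip elements out of the array.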
I have the following query in BigQuery:
SELECT *
FROM `data`, UNNEST(deliveries.modalities.campaigns) as dmc
where
dmc.id = 4469
The struct of the field deliveries is:
deliveries RECORD REPEATED
-----items RECORD REPEATED
-----modalities RECORD REPEATED
----------campaigns RECORD REPEATED
---------------coparticipations RECORD REPEATED
---------------id
I want to filter on deliveries.modalities.campaigns.id, but my query doesn't work. Can anyone help me?
Another approach that you might try and consider is to use a CTE, as shown below:
with test_1 as (
    select *
    from `your-project.your-dataset.test_deliveries`, unnest(deliveries) d
    JOIN unnest(d.modalities) m
)
select *
from test_1, unnest(campaigns) c
where c.id = 4469
My loaded .jsonl file to create my sample data:
{"deliveries": [{"items": [1,2,3],"modalities": [{"campaigns": [{"coparticipations": [1,2,3],"id": 1234}]}]}]}
{"deliveries": [{"items": [2,3,4],"modalities": [{"campaigns": [{"coparticipations": [4,5,6],"id": 2345}]}]}]}
{"deliveries": [{"items": [3,4,5],"modalities": [{"campaigns": [{"coparticipations": [7,8,9],"id": 4469}]}]}]}
{"deliveries": [{"items": [4,5,6],"modalities": [{"campaigns": [{"coparticipations": [10,11,12],"id": 3456}]}]}]}
I have a column of type JSONB that has data like this:
column name: used_filters
row number 1 example:
{ "categories" : ["economic", "Social"], "tags": ["world" ,"eco-friendly"] }
row number 2 example:
{ "categories" : ["economic"], "tags": ["eco-friendly"] , "keywords" : ["2050"] }
I want to group the result to get the most frequent value for each of the keys, something like this:
key      | most_freq
---------+--------------
category | economic
tags     | eco-friendly
keyword  | 2050
The keys are not constant and could be something other than in this example, but I know they will be frequent.
You can extract keys and values as arrays first by using jsonb_each, and then unnest the generated arrays with jsonb_array_elements_text. The rest is classical aggregation, ranking the counts with a window function such as RANK():
SELECT key, value
FROM ( SELECT j.key, jj.value,
RANK() OVER (PARTITION BY j.key ORDER BY COUNT(*) DESC)
FROM t,
LATERAL jsonb_each(js) AS j,
LATERAL jsonb_array_elements_text(j.value) AS jj
GROUP BY j.key, jj.value ) AS q
WHERE rank = 1
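A minimal setup to try this, assuming a table t with a jsonb column js holding the two example rows from the question:

CREATE TABLE t (js jsonb);
INSERT INTO t VALUES
  ('{"categories": ["economic", "Social"], "tags": ["world", "eco-friendly"]}'),
  ('{"categories": ["economic"], "tags": ["eco-friendly"], "keywords": ["2050"]}');

Note that RANK() keeps ties, so a key whose top values are equally frequent will return more than one row.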
Say I have a result of member rows from a simple query.
select distinct mbr_id from mbr_base where location = '17957' ;
The result would look like this:

mbr_id          | location
----------------+----------
000000011441894 | 17957
000000011437056 | 17957
000000011437981 | 17957
000000011441312 | 17957
000000011440730 | 17957
000000011482555 | 17957
000000011498476 | 17957
This is one result where the location condition is filtered.
Yet, I have another 49 locations to iterate over, as these are my distinct locations to be examined.
Finally, I would combine all of this into one result table, ready for analytics.
For example, my pseudo-code in Python would look like:
df = pd.DataFrame()
for i in mbr_base['location'].unique():
    rst = f"select * from mbr_base where location = '{i}';"
    rst_df = pd.read_sql(rst, conn)  # conn: your database connection
    df = pd.concat([df, rst_df], axis=0)
display(df)
Can you help me write a procedure for doing this in SQL (pgsql preferably)?
Many thanks.
If you want the individual locations, then use in:
select distinct mbr_id, location
from mbr_base
where location in ( '17957', . . . )
Your sample results have the location. If you just want the mbr_id, then use:
select distinct mbr_id
from mbr_base
where location in ( '17957', . . . )
Now, presumably mbr_id is unique in mbr_base. If so, remove the distinct. In addition, location looks like a number. If it really is a number, then drop the single quotes. So, what you might want is:
select mbr_id
from mbr_base
where location in ( 17957, . . . )
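If the 50 locations themselves live in a table, the IN list can be supplied by a subquery instead of being typed out (a sketch; locations_to_check is a hypothetical table holding the locations to examine):

select mbr_id, location
from mbr_base
where location in (select location from locations_to_check);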
I have jsonb in one of my tables.
The jsonb looks like this:
my_data : [
{pid: 1, stock: 500},
{pid: 2, stock: 1000},
...
]
pid refers to the products table's id (which is pid).
EDIT: The table products has the following properties: pid (PK), name.
I want to loop over my_data[] in my JSONB and fetch each pid's name from the products table.
I need the result to look something like this (including the product names from the second table) ->
my_data : [
{
product_name : "abc",
pid: 1,
stock : 500
},
...
]
How should I go about performing such a jsonb inner join?
Edit: I tried S-Man's solution and I'm getting this error:
"invalid reference to FROM-clause entry for table \"jc\""
step-by-step demo:db<>fiddle
SELECT
    jsonb_build_object(                                                     -- 5
        'my_data',
        jsonb_agg(                                                          -- 4
            elems || jsonb_build_object('product_name', mot.product_name)  -- 3
        )
    )
FROM
    mytable,
    jsonb_array_elements(mydata -> 'my_data') as elems                      -- 1
JOIN
    my_other_table mot ON (elems ->> 'pid')::int = mot.pid                  -- 2
1. Expand the JSON array into one row per array element.
2. Join the other table against the current one using the pid values (notice the ::int cast, because otherwise it would be a text value).
3. The new columns from the second table can now be converted into a JSON object, which can be concatenated onto the original one using the || operator.
4. After that, recreate the array from the array elements again.
5. Put this array into a my_data element.
Another way is to use jsonb_set() instead of step 5, to set the new array into the original object directly:
step-by-step demo:db<>fiddle
SELECT
    jsonb_set(
        mydata,
        '{my_data}',
        jsonb_agg(
            elems || jsonb_build_object('product_name', mot.product_name)
        )
    )
FROM
    mytable,
    jsonb_array_elements(mydata -> 'my_data') as elems
JOIN
    my_other_table mot ON (elems ->> 'pid')::int = mot.pid
GROUP BY mydata
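A minimal setup to reproduce both queries, assuming the table and column names used above (mytable.mydata, and my_other_table with pid and product_name; the second product name is made up):

CREATE TABLE my_other_table (pid int PRIMARY KEY, product_name text);
INSERT INTO my_other_table VALUES (1, 'abc'), (2, 'def');

CREATE TABLE mytable (mydata jsonb);
INSERT INTO mytable VALUES
  ('{"my_data": [{"pid": 1, "stock": 500}, {"pid": 2, "stock": 1000}]}');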
I have a hive table with the following schema:
COOKIE  | PRODUCT_ID | CAT_ID     | QTY
--------+------------+------------+-----------
1234123 | [1,2,3]    | [r,t,null] | [2,1,null]
How can I normalize the arrays so that I get the following result?

COOKIE  | PRODUCT_ID | CAT_ID | QTY
--------+------------+--------+------
1234123 | [1]        | [r]    | [2]
1234123 | [2]        | [t]    | [1]
1234123 | [3]        | null   | null
I have tried the following:
select concat_ws('|', visid_high, visid_low) as cookie,
       pid,
       catid,
       qty
from table
lateral view explode(productid) ptable as pid
lateral view explode(catalogId) ptable2 as catid
lateral view explode(qty) ptable3 as qty
However, the result comes out as a Cartesian product.
I found a very good solution to this problem without using any UDF: posexplode.

SELECT COOKIE,
       ePRODUCT_ID,
       eCAT_ID,
       eQTY
FROM TABLE
LATERAL VIEW posexplode(PRODUCT_ID) p AS seqp, ePRODUCT_ID
LATERAL VIEW posexplode(CAT_ID) c AS seqc, eCAT_ID
LATERAL VIEW posexplode(QTY) q AS seqq, eQTY
WHERE seqp = seqc AND seqc = seqq;
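Note that this still generates the full cross product of the three exploded arrays before the WHERE clause filters it down to the aligned positions; the bracket-notation approach below avoids that.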
You can use the numeric_range and array_index UDFs from Brickhouse (http://github.com/klout/brickhouse) to solve this problem. There is an informative blog post describing it in detail at http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/
Using those UDFs, the query would be something like:
select cookie,
       array_index( product_id_arr, n ) as product_id,
       array_index( catalog_id_arr, n ) as catalog_id,
       array_index( qty_id_arr, n ) as qty
from table
lateral view numeric_range( size( product_id_arr )) n1 as n;
You can do this by using posexplode, which provides an integer between 0 and n indicating the position of each element in the array. Then use this integer, call it pos (for position), to get the matching values in the other arrays, using bracket notation, like this:
select cookie,
       n.pos as position,
       n.prd_id as product_id,
       cat_id[pos] as catalog_id,
       qty[pos] as qty
from table
lateral view posexplode(product_id_arr) n as pos, prd_id;
This avoids using imported UDFs as well as joining the various arrays together, and it has much better performance.
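If the arrays have different lengths, indexing past the end of a shorter array (e.g. cat_id[pos]) returns NULL in Hive rather than an error, which matches the nulls in the desired output above.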
If you are using Spark 2.4 in pyspark, use arrays_zip with posexplode:
from pyspark.sql.functions import arrays_zip, posexplode

df = (df
      .withColumn('zipped', arrays_zip('col1', 'col2'))
      .select('id', posexplode('zipped')))
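posexplode on the zipped column yields two columns named pos and col; the original array values can then be pulled out of the struct with df.select('id', 'pos', 'col.col1', 'col.col2') (col1/col2 here are the placeholder array column names from the snippet above).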
I tried to work out your scenario... please try this code:

create table info (cookie string, productid int, catid string, qty string);

insert into table info
select cookie, productid[myprod], categoryid[mycat], qty[myqty]
from table
lateral view posexplode(productid) pro as myprod, prodval
lateral view posexplode(categoryid) cate as mycat, catval
lateral view posexplode(qty) q as myqty, qtyval
where myprod = mycat and mycat = myqty;
Note: in the statement above, if you select cookie, myprod, mycat, myqty from table instead of cookie, productid[myprod], categoryid[mycat], qty[myqty], the output will contain the index of each element in the productid, categoryid, and qty arrays rather than the values. Hope this is helpful.