Please help me with the following conversion. I have a pandas DataFrame in the following format:
id            location
{ "0": "5",   "0": "Charlotte, North Carolina",
"1": "5",     "1": "N/A",
"2": "5",     "2": "Portland, Oregon",
"3": "5",     "3": "Jonesborough, Tennessee",
"4": "5",     "4": "Rockville, Indiana",
"5": "5",}    "5": "Dallas, Texas",
and I would like to convert it into the following format:
A header    Another header
"5"         "Charlotte, North Carolina"
"5"         "N/A"
"5"         "Portland, Oregon"
"5"         "Jonesborough, Tennessee"
"5"         "Rockville, Indiana"
"5"         "Dallas, Texas"
Please help
You can try this.
import pandas as pd
import re
df = pd.DataFrame([['{ "0": "5",', '"0": "Charlotte, North Carolina",'], ['"1": "5",','"1": "N/A",']], columns=['id', 'location'])
# use a regex to pull out the digit runs and keep the second one (the value)
df['id'] = df['id'].apply(lambda x: re.findall(r'\d+', x)[1])
# split on the first ':' to get the value part, then strip spaces and the trailing comma
df['location'] = df['location'].apply(lambda x: x.split(':', 1)[1].strip().rstrip(','))
print(df)
Output:
  id                     location
0  5  "Charlotte, North Carolina"
1  5                        "N/A"
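Another option, a sketch using the same two raw columns as above: capture the quoted text after the colon with a single regex via `str.extract`, which also copes with commas inside the location string (note this version drops the surrounding quotes from the result):

```python
import pandas as pd

df = pd.DataFrame(
    [['{ "0": "5",', '"0": "Charlotte, North Carolina",'],
     ['"1": "5",', '"1": "N/A",']],
    columns=["id", "location"],
)

# one pattern for both columns: grab the quoted text that follows the colon
pattern = r':\s*"([^"]*)"'
df["id"] = df["id"].str.extract(pattern, expand=False)
df["location"] = df["location"].str.extract(pattern, expand=False)
print(df)
```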
Is the following a full list of all value types as they're passed to JSON in BigQuery? I've gotten this by trial and error but haven't been able to find it in the documentation:
select
NULL as NullValue,
FALSE as BoolValue,
DATE '2014-01-01' as DateValue,
INTERVAL 1 year as IntervalValue,
DATETIME '2014-01-01 01:02:03' as DatetimeValue,
TIMESTAMP '2014-01-01 01:02:03' as TimestampValue,
"Hello" as StringValue,
B"abc" as BytesValue,
123 as IntegerValue,
NUMERIC '3.14' as NumericValue,
3.14 as FloatValue,
TIME '12:30:00.45' as TimeValue,
[1,2,3] as ArrayValue,
STRUCT('Mark' as first, 'Thomas' as last) as StructValue,
[STRUCT(1 as x, 2 as y), STRUCT(5 as x, 6 as y)] as ArrayStructValue,
STRUCT(1 as x, [1,2,3] as y, ('a','b','c') as z) as StructNestedValue
{
"NullValue": null,
"BoolValue": "false", // why not just false without quotes?
"DateValue": "2014-01-01",
"IntervalValue": "1-0 0 0:0:0",
"DatetimeValue": "2014-01-01T01:02:03",
"TimestampValue": "2014-01-01T01:02:03Z",
"StringValue": "Hello",
"BytesValue": "YWJj",
"IntegerValue": "123",
"NumericValue": "3.14",
"FloatValue": "3.14",
"TimeValue": "12:30:00.450000",
"ArrayValue": ["1", "2", "3"],
"StructValue": {
"first": "Mark",
"last": "Thomas"
},
"ArrayStructValue": [
{"x": "1", "y": "2"},
{"x": "5", "y": "6"}
],
"StructNestedValue": {
"x": "1",
"y": ["1", 2", "3"],
"z": {"a": "a", b": "b", "c": "c"}
}
}
Honestly, it seems to me that other than the null value and the array [] or struct {} container, everything is string-enclosed, which seems a bit odd.
According to this document, JSON is built on two structures:
A collection of name/value pairs. In various languages, this is
realized as an object, record, struct, dictionary, hash table, keyed
list, or associative array.
An ordered list of values. In most
languages, this is realized as an array, vector, list, or sequence.
The result of the SELECT query is in JSON format, where [] denotes an array, {} denotes an object, and double quotes (" ") denote a string value, just as in the query itself. Note also that BigQuery's JSON output string-encodes most scalar values (for integers this avoids precision loss, since JSON numbers cannot represent the full INT64 range exactly), which is why even booleans appear quoted.
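When consuming such output, everything except null arrives as a string and has to be converted back to native types. A minimal sketch (the payload below is hand-written to mimic the output above, not fetched from BigQuery):

```python
import json

# BigQuery-style JSON: scalars are string-encoded
payload = '{"NullValue": null, "BoolValue": "false", "IntegerValue": "123", "FloatValue": "3.14"}'
row = json.loads(payload)

# convert the string-encoded scalars back to native Python types
typed = {
    "NullValue": row["NullValue"],
    "BoolValue": row["BoolValue"] == "true",
    "IntegerValue": int(row["IntegerValue"]),
    "FloatValue": float(row["FloatValue"]),
}
print(typed)  # {'NullValue': None, 'BoolValue': False, 'IntegerValue': 123, 'FloatValue': 3.14}
```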
I have a list containing the keys and another list containing the values (obtained from splitting a log line). How can I combine the two to make a property bag in Kusto?
let headers = pack_array("A", "B", "C");
datatable(RawData:string)
[
"1,2,3",
"4,5,6",
]
| mv-expand fields = split(RawData, ",")
| extend dict = ???
Expected:
dict
-----
{"A": 1, "B": 2, "C": 3}
{"A": 4, "B": 5, "C": 6}
Here's one option that uses a combination of:
mv-apply: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/mv-applyoperator
pack(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/packfunction
make_bag(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/make-bag-aggfunction
let keys = pack_array("A", "B", "C");
datatable(RawData:string)
[
"1,2,3",
"4,5,6",
]
| project values = split(RawData, ",")
| mv-apply with_itemindex = i key = keys to typeof(string) on (
summarize dict = make_bag(pack(key, values[i]))
)
values             dict
["1", "2", "3"]    {"A": "1", "B": "2", "C": "3"}
["4", "5", "6"]    {"A": "4", "B": "5", "C": "6"}
I kindly need to extract the name values (whether Action, Adventure, etc.) from this column into a new column, in pandas:
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
You want from_records:
import pandas as pd
data = [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
df = pd.DataFrame.from_records(data)
df
and you get:
id name
0 28 Action
1 12 Adventure
2 14 Fantasy
3 878 Science Fiction
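If the column actually holds the list as a *string* (as shown in the question), parse each cell first with `ast.literal_eval` (`json.loads` would also work here, since the string uses double quotes) and then pull out the names. The column name `genres` is made up for this sketch:

```python
import ast

import pandas as pd

s = ('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, '
     '{"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]')
df = pd.DataFrame({"genres": [s]})

# parse the string cell into a list of dicts, then keep only the "name" fields
df["names"] = df["genres"].apply(
    lambda cell: [d["name"] for d in ast.literal_eval(cell)]
)
print(df["names"][0])  # ['Action', 'Adventure', 'Fantasy', 'Science Fiction']
```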
I am learning how to implement a TensorFlow model on Android.
In this tutorial, the labels.txt and model.tflite files are put into the assets folder:
https://blog.notyouraveragedev.in/android/image-classification-in-android-using-tensor-flow/
What should that labels.txt be?
I have a file that has the following format :
"1": "1 Cent,Australian dollar,australia",
"2": "2 Cents,Australian dollar,australia",
"3": "5 Cents,Australian dollar,australia",
"4": "10 Cents,Australian dollar,australia",
"5": "20 Cents,Australian dollar,australia",
"6": "50 Cents,Australian dollar,australia",
"7": "1 Dollar,Australian dollar,australia",
"8": "2 Dollars,Australian dollar,Australia",
Is that the labels.txt file, or is it something else?
Never mind, I found the answer.
Each line should be a plain class name,
e.g. 1 Cent,Australian dollar,Australia
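In other words, a labels.txt for a TFLite image classifier is typically just one class name per line, ordered by the model's output index. A minimal sketch of producing one (the label strings come from the question's file):

```python
# one class name per line, in the order of the model's output indices
labels = [
    "1 Cent,Australian dollar,Australia",
    "2 Cents,Australian dollar,Australia",
    "5 Cents,Australian dollar,Australia",
]

with open("labels.txt", "w") as f:
    f.write("\n".join(labels) + "\n")

# read it back to confirm the format
with open("labels.txt") as f:
    lines = f.read().splitlines()
print(lines[0])  # 1 Cent,Australian dollar,Australia
```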
I have a gigantic script that I would like to rewrite iteratively (with a while or for loop), so it becomes shorter and easier to review. It seems like this should be doable in SQL, but so far I have not succeeded. What I do now to make it work is a lot of SELECTs that I UNION together into one table.
I want to iterate through the years: starting from 1995, while the year is lower than 2017, execute the query with that year as a variable.
So, in effect, I want an iterative query that fills each year into the following lines of code and combines all results in one table. I will keep trying myself and update the code if I make progress.
SELECT
regio, 1995 as year, sum("0") as "0", sum("1") as "1", sum("2") as "2", sum("3") as "3", sum("4") as "4", sum("5") as "5", sum("6") as "6", sum("7") as "7", sum("8") as "8", sum("9") as "9", sum("10") as "10"
FROM
source
where
year = 1995 OR "year-1" = 1995 OR "year-2" = 1995 OR "year-3" = 1995 OR "year-4" = 1995
group by
regio
UNION
SELECT
regio, 1996 as year, sum("0") as "0", sum("1") as "1", sum("2") as "2", sum("3") as "3", sum("4") as "4", sum("5") as "5", sum("6") as "6", sum("7") as "7", sum("8") as "8", sum("9") as "9", sum("10") as "10"
FROM
source
where
year = 1996 OR "year-1" = 1996 OR "year-2" = 1996 OR "year-3" = 1996 OR "year-4" = 1996
group by
regio
You would seem to want:
SELECT regio, g.yyyy as year, sum("0") as "0", sum("1") as "1",
sum("2") as "2", sum("3") as "3", sum("4") as "4",
sum("5") as "5", sum("6") as "6", sum("7") as "7",
sum("8") as "8", sum("9") as "9", sum("10") as "10"
FROM source CROSS JOIN
generate_series(1995, 2017) g(yyyy)
WHERE g.yyyy IN (year, "year-1", "year-2", "year-3", "year-4")
GROUP BY regio, g.yyyy;