I am trying to merge two DataFrames ('credit' and 'info') on the column 'id'.
My code for this is:
c.execute('SELECT * FROM "credit"')
credit = c.fetchall()
credit = pd.DataFrame(credit)

c.execute('SELECT * FROM "info"')
info = c.fetchall()
movies_df = pd.DataFrame(info)

movies_df_merge = pd.merge(credit, movies_df, on='id')
Both of the id columns from the tables ('credit' and 'info') are integers, but I am unsure why I keep getting a KeyError on 'id'.
I have also tried:
movies_df_merge = movies_df.merge(credit, on='id')
How you read both DataFrames is not relevant here.
Just print both DataFrames (if the number of records is large, print(df.head()) will be enough).
Then look at them. In particular, check whether both DataFrames contain an id column. Maybe one of them has ID while the other has id?
The upper / lower case of names matters here.
Also check that the id column in both DataFrames is a "normal" column (not part of the index).
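For reference, a minimal sketch of those checks, assuming the frames are already loaded as credit and movies_df (uncomment whichever fix the inspection calls for):

print(credit.columns)
print(movies_df.columns)

# If the name differs only in case, normalise it:
# movies_df = movies_df.rename(columns={'ID': 'id'})

# If 'id' ended up in the index rather than as a regular column:
# credit = credit.reset_index()

movies_df_merge = pd.merge(credit, movies_df, on='id')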
I am super new to SQL and have two queries that I think should produce the same output, but they don't. Can someone figure out the difference between them?
The input table for this simple example has two columns, letter and extra. The data in the first column is a random letter from the list ['a', 'b', 'c', 'd', 'e'] and extra should not matter (I think?). These are the queries:
update tbl
set extra = letter;
and:
update tbl
set extra = (select letter from tbl);
The resulting tables these produce are:
e|e
e|e
c|c
e|e
b|b
...
and:
e|e
e|e
c|e
e|e
b|e
...
respectively.
I expected the first output for both queries; how come the second one turns out as it does?
EDIT:
The reason I ask is that what I want to do is a bit more involved than this simple example, and I believe I need the subquery. I am trying to add a kind of normalisation column, like this:
update tbl
set extra = 1 / (select norm
                 from tbl
                 INNER JOIN (SELECT letter, count(*) as norm
                             FROM tbl
                             GROUP BY letter) as tmp
                 ON tbl.letter = tmp.letter);
Alas, this obviously doesn't work because of the above.
What your first query is saying:
Set the value of extra to the value of letter in the same row.
What the second query is saying:
Pick a single value from the letter column (the subquery is not correlated with the row being updated, so it is evaluated just once) and update every row in the table so that extra contains that value.
They are different instructions, so you get different results.
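As an illustration, here is a minimal sketch that reproduces both behaviours and the correlated form the edit is after, using Python's sqlite3 module (the table name tbl and the letter/extra columns come from the question; everything else is made up for the demo):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE tbl (letter TEXT, extra);
    INSERT INTO tbl (letter) VALUES ('e'), ('e'), ('c'), ('e'), ('b');
''')

# First query: extra is taken from letter in the same row.
conn.execute('UPDATE tbl SET extra = letter')
print(conn.execute('SELECT letter, extra FROM tbl').fetchall())
# [('e', 'e'), ('e', 'e'), ('c', 'c'), ('e', 'e'), ('b', 'b')]

# Second query: the uncorrelated subquery yields a single value (here
# the letter of the first stored row), and every row gets that value.
conn.execute('UPDATE tbl SET extra = (SELECT letter FROM tbl)')
print(conn.execute('SELECT letter, extra FROM tbl').fetchall())
# [('e', 'e'), ('e', 'e'), ('c', 'e'), ('e', 'e'), ('b', 'e')]

# Correlating the subquery with the outer row makes it re-evaluate per
# row, which is what the normalisation update needs:
conn.execute('''
    UPDATE tbl
    SET extra = 1.0 / (SELECT COUNT(*) FROM tbl AS t2
                       WHERE t2.letter = tbl.letter)
''')
print(conn.execute('SELECT letter, extra FROM tbl').fetchall())
# 'e' rows get 1/3, 'c' and 'b' rows get 1.0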
I am joining different tables which have columns with the same names. When I first tried to select them, I ran into the ambiguous column name error, i.e. there were columns with the same name. Therefore, I made an explicit selection of columns, but now I get fewer columns than I requested.
response = DB[:courses].select(Sequel[:courses][:id], Sequel[:courses][:title], Sequel[:courses][:headline], Sequel[:courses][:description], Sequel[:courses][:slug], Sequel[:courses][:avg_duration], Sequel[:courses][:points], Sequel[:courses][:intro_video_url], Sequel[:courses][:background_color], Sequel[:courses][:views], Sequel[:courses][:certificate_option], Sequel[:courses][:url], Sequel[:courses][:is_active], Sequel[:courses][:num_subscribers], Sequel[:courses][:num_reviews], Sequel[:courses][:num_finished], Sequel[:courses][:avg_rating], Sequel[:courses][:avg_rating_recent], Sequel[:locales][:title], Sequel[:locales][:english_title], Sequel[:courses][:has_caption], Sequel[:courses][:is_paid], Sequel[:courses][:price], Sequel[:courses][:price_discount], Sequel[:courses][:currency], Sequel[:instructors][:headline], Sequel[:instructors][:name], Sequel[:instructors][:slug], Sequel[:instructors][:image], Sequel[:instructors][:initials], Sequel[:instructors][:url], Sequel[:instructors][:origin_id], Sequel[:courses][:image_preview], Sequel[:courses][:image_view], Sequel[:difficulties][:name], Sequel[:course_types][:name], Sequel[:origins][:image_url], Sequel[:origins][:name], Sequel[:origins][:url_about])
.join(:locales, id: Sequel[:courses][:locale_id])
.join(:instructors, id: Sequel[:courses][:instructor_id])
.join(:origins, id: Sequel[:courses][:origin_id])
.join(:difficulties, id: Sequel[:courses][:difficulty_id])
.join(:course_types, id: Sequel[:courses][:course_type_id])
.where(Sequel.ilike(Sequel[:courses][:title], "%#{title}%")).where( is_paid: is_paid).limit(count).offset(count * (page - 1))
I expected to get 38 columns, but I get 32. I tried explicitly fetching only the columns (select.columns) and fetching via map (select.map); however, the result is the same. When I run this request natively in the SQLite prompt, it returns exactly 38 columns. I also tried the query with the sqlite3 gem, but the same statement results in only 32 columns.
How can I get all the columns without making any sacrifices? Can I rename columns while making the selection, or is there another solution?
First, you need to give each table an alias. Then, when you write your select statement, prepend each column you are selecting with the table alias, like alias.column_name.
Something like this:
Note that the first and last field are single columns.
No, once you've got multiple levels in your columns, each column needs a value for each level.
If you're just looking for presentable output, those values can be an empty string. They can also be the same value at each level.
But if you select on the first two columns of the first level, you're going to have four columns at the second level, and pandas will need to know what to call all four.
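A minimal sketch of the idea with made-up column names (the empty strings are the placeholder second-level labels mentioned above; note the first and last columns are single columns):

import pandas as pd

# Every column needs a label at both levels; an empty string works as
# a "no sub-label" value for presentation.
columns = pd.MultiIndex.from_tuples([
    ('id', ''),            # single column: blank second level
    ('price', 'open'),
    ('price', 'close'),
    ('volume', ''),        # single column again
])
df = pd.DataFrame([[1, 10.0, 10.5, 1000]], columns=columns)

# Selecting a first-level label returns all of its second-level columns:
print(df['price'])   # columns: 'open' and 'close'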
I have two tables of stock tickers.
I create a joined SQL query to combine the two tables.
query_combined = session\
.query(Table1, Table2)\
.join(Table2, Table1.ticker==Table2.ticker)
I then feed the SQL to Pandas to load it into a DataFrame:
df_combined = pandas\
.read_sql(query_combined.statement,
query_combined.session.bind,
index_col='ticker')
However, since there are two "ticker" columns from the joined tables, setting index_col='ticker' results in a tuple of '(ticker, ticker)' for the index column. I just want to specify one of the "ticker" columns as the DataFrame index but am unsure how.
I am new to pandas and am sure this is very simple, but in my hour of Googling, I haven't found the answer. Many thanks in advance for pointing me in the right direction.
Consider with_labels to qualify ambiguous columns with underscores <table>_<column>:
df_combined = (pandas
.read_sql(query_combined.with_labels().statement,
query_combined.session.bind,
index_col='Table1_ticker')
)
To shorten the table names, alias the tables before the join:
from sqlalchemy.orm import aliased

t1 = aliased(Table1, name='t1')
t2 = aliased(Table2, name='t2')
query_combined = (session
.query(t1, t2)
.join(t2, t1.ticker==t2.ticker)
)
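With the aliases in place, the generated labels should pick up the alias names, so the read_sql call would use the shorter label (assuming the alias names chosen above):

df_combined = (pandas
    .read_sql(query_combined.with_labels().statement,
              query_combined.session.bind,
              index_col='t1_ticker')
)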
I have one table with rows, and each row has a column that contains a field name (say row1 - 'Number001', row2 - 'ShortChar003', etc.). In order to get the value of these fields I have to use a second table; this table has one row with many columns (Number001, Number002, ShortChar003, etc.).
How can I extract the value?
Good question. You can use the Lookup function:
=Lookup(Fields!CityColumn.Value, Fields!CityColumn.Value, Fields!CountColumn.Value, "Dataset1")
Or you might have to use string functions, e.g. LEFT, Substring, Right, just like in SQL. If possible, please post some data from both tables and I will explain in detail.