I have two dataframes, one of them very large, as follows:

import pandas as pd
import numpy as np
import string, random

siz = int(1e10)
random.seed(1234)
a1 = pd.Series((random.choice(string.ascii_uppercase) for _ in range(siz)), name='CatA')
a2 = pd.Series((random.choice(string.ascii_lowercase) for _ in range(siz)), name='CatB')
val1 = pd.Series(np.random.randint(2, high=10, size=siz), name='Value')

df_a = pd.DataFrame([a1, a2, val1]).T.set_index(['CatA', 'CatB'])

siz = 1000
random.seed(4321)
b1 = pd.Series((random.choice(string.ascii_uppercase) for _ in range(siz)), name='CatA')
b2 = pd.Series((random.choice(string.ascii_lowercase) for _ in range(siz)), name='CatB')
val2 = pd.Series(np.random.randint(2, high=10, size=siz), name='Value')

df_b = pd.DataFrame([b1, b2, val2]).T.set_index(['CatA', 'CatB'])
  • I want to quickly get the difference between the two dataframes based on their index, keeping the Value column of df_a intact.
    • Rows whose index appears in df_b should be removed from df_a.
    • Both dataframes have the same structure. The Value of df_a should be preserved.
    • The Value of df_b is discarded.

I tried df_a.sub(df_b.drop('Value', axis=1)), which doesn't work.

Is there a vectorized way to do this?


I believe you need Index.isin, with the boolean mask inverted by ~:

df = df_a[~df_a.index.isin(df_b.index)]
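To see what the mask does, here is a minimal sketch on two tiny frames. The data below is made up for illustration and only stands in for the df_a and df_b from the question; it assumes both frames use the same (CatA, CatB) MultiIndex.

```python
import pandas as pd

# Hypothetical small frames standing in for the question's df_a and df_b
df_a = pd.DataFrame(
    {"CatA": ["A", "A", "B", "C"],
     "CatB": ["x", "y", "x", "z"],
     "Value": [2, 3, 4, 5]}
).set_index(["CatA", "CatB"])

df_b = pd.DataFrame(
    {"CatA": ["A", "C"],
     "CatB": ["y", "z"],
     "Value": [9, 9]}
).set_index(["CatA", "CatB"])

# True for every row of df_a whose (CatA, CatB) pair also occurs in df_b
mask = df_a.index.isin(df_b.index)

# Keep only the rows of df_a that are NOT in df_b; df_a's Value is untouched
df = df_a[~mask]
print(df)
```

Only the index is compared, so the Value column of df_b plays no part in the result, and duplicate index entries in df_a are all kept or all dropped together.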