DBNInference - Key Error #1744

Open
Rajaram1604 opened this issue Mar 15, 2024 · 4 comments

@Rajaram1604

Subject of the issue

Getting a KeyError while making an inference using DBNInference.

Your environment

  • pgmpy version - 0.1.24 (installed from the pgmpy dev branch)
  • Python version - 3.11
  • Operating System - Windows

Steps to reproduce

Fitting the DataFrame data below with a DynamicBayesianNetwork model.

import numpy as np
import pandas as pd

# Slice 0 data
data_t0 = pd.DataFrame({
    'CreditScore': np.random.randint(500, 800, size=(100, 1)).flatten(),
    'Income': np.random.randint(20000, 20100, size=(100,)),
    'LoanAmount': np.random.randint(15000, 15100, size=(100,)),
})

data_t0['LoanApproval'] = np.where((data_t0['CreditScore'] > 650) & (data_t0['Income'] > 20080) & (data_t0['LoanAmount'] < data_t0['Income']), 'Approved', 'Denied')

df_t0 = pd.DataFrame(data_t0, columns=data_t0.keys())

# Slice 1 data
data_t1 = pd.DataFrame({
    'CreditScore': np.random.randint(500, 900, size=(100, 1)).flatten(),
    'Income': np.random.randint(20000, 20200, size=(100,)),
    'LoanAmount': np.random.randint(15000, 15200, size=(100,)),
})

data_t1['LoanApproval'] = np.where((data_t1['CreditScore'] > 700) & (data_t1['Income'] > 20100) & (data_t1['LoanAmount'] < data_t1['Income']), 'Approved', 'Denied')

df_t1 = pd.DataFrame(data_t1, columns=data_t1.keys())

# Concatenate the two slice DataFrames column-wise
concat_df = pd.concat([df_t0, df_t1], axis=1, join='inner')

print(concat_df)

# Convert the DataFrame to a list of rows (renamed to avoid shadowing the built-in `list`)
rows = concat_df.values.tolist()
print(rows)

# Define the (variable, time-slice) column names expected by the DBN
col_names = [('CreditScore', 0), ('Income', 0), ('LoanAmount', 0), ('LoanApproval', 0), ('CreditScore', 1), ('Income', 1), ('LoanAmount', 1), ('LoanApproval', 1)]

final_df = pd.DataFrame(rows, columns=col_names)

print(final_df)

# Fit the data to the model (loan_dbn_model is the DBN defined in a comment below).
# Currently only the Maximum Likelihood Estimator is supported.
loan_dbn_model.fit(data=final_df, estimator="MLE")
loan_dbn_model.initialize_initial_state()

Expected behaviour

Making the inference with DBNInference on the model fitted with the data above, for example with evidence {("CreditScore", 0): 672}, I expected some results but got the error below. Code snippet:

from pgmpy.inference import DBNInference

dbn_inference = DBNInference(loan_dbn_model)
results = dbn_inference.query(variables=[("LoanApproval", 0)], evidence={("CreditScore", 0): 672})

Actual behaviour

Getting the error below while making the inference. However, when I make the inference with evidence={("CreditScore", 0): 0} it works. We know that the data is internally scaled down when fitting the model, but when making the inference we need to use the real data values that the model was fitted with.
results = dbn_inference.query(variables=[("LoanApproval", 0)], evidence={("CreditScore", 0): 672})
  File "C:\Python311\Lib\site-packages\pgmpy\inference\dbn_inference.py", line 475, in query
    return self.backward_inference(variables, evidence)
  File "C:\Python311\Lib\site-packages\pgmpy\inference\dbn_inference.py", line 385, in backward_inference
    potential_dict = self.forward_inference(variables, evidence, "potential")
  File "C:\Python311\Lib\site-packages\pgmpy\inference\dbn_inference.py", line 281, in forward_inference
    initial_factor = self._get_factor(start_bp, evidence_0)
  File "C:\Python311\Lib\site-packages\pgmpy\inference\dbn_inference.py", line 204, in _get_factor
    final_factor.reduce([(var, evidence[var])])
  File "C:\Python311\Lib\site-packages\pgmpy\factors\discrete\DiscreteFactor.py", line 570, in reduce
    phi.values = phi.values[tuple(slice_)]
IndexError: index 672 is out of bounds for axis 0 with size 81

@ankurankan
Member

@Rajaram1604 Could you please also add how the loan_dbn_model model is defined in the code, so that I can reproduce the issue?

@Rajaram1604
Author

Rajaram1604 commented Mar 18, 2024

Defining the DBN Model.

from pgmpy.models import DynamicBayesianNetwork as DBN

loan_dbn_model = DBN()

Add Edges

loan_dbn_model.add_edges_from([(('Income', 0), ('LoanAmount', 0)),
                               (('CreditScore', 0), ('LoanAmount', 0)),
                               (('LoanAmount', 0), ('LoanApproval', 0)),

                               (('Income', 1), ('LoanAmount', 1)),
                               (('CreditScore', 1), ('LoanAmount', 1)),
                               (('LoanAmount', 1), ('LoanApproval', 1)),

                               (('Income', 0), ('Income', 1)),
                               (('CreditScore', 0), ('CreditScore', 1)),
                               (('LoanAmount', 0), ('LoanAmount', 1)),
                               (('LoanApproval', 0), ('LoanApproval', 1))
                               ])

Making the inference

from pgmpy.inference import DBNInference

dbn_inference = DBNInference(loan_dbn_model)

evidence = {("CreditScore", 0): 672}
results = dbn_inference.query(variables=[("LoanApproval", 0)], evidence=evidence)

@ankurankan
Member

ankurankan commented Mar 19, 2024

@Rajaram1604 In the model, the variable CreditScore has only 78 states (named 0-77).

In [22]: print(loan_dbn_model.get_cpds(('CreditScore', 0)).state_names)
{<DynamicNode(CreditScore, 0) at 0x7a572e539450>: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77]}

And because the state specified in the evidence ({("CreditScore", 0): 672}) does not exist, inference throws an error.
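
For reference, a minimal way to see which evidence values the fitted model will accept is to read them off the CPD's state names and query with one of those. This is only a sketch, and it assumes the loan_dbn_model from this thread has already been fitted and initialized:

from pgmpy.inference import DBNInference

# Inspect the states the model actually learned for ('CreditScore', 0).
cpd = loan_dbn_model.get_cpds(('CreditScore', 0))
credit_states = cpd.state_names[cpd.variable]  # e.g. [0, 1, ..., 77]

# Querying with one of these learned states works; 672 is not among them,
# which is why reduce() fails with an out-of-bounds index.
dbn_inference = DBNInference(loan_dbn_model)
results = dbn_inference.query(
    variables=[("LoanApproval", 0)],
    evidence={("CreditScore", 0): credit_states[0]},
)
print(results)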

@Rajaram1604
Author

Rajaram1604 commented Mar 19, 2024

@ankurankan, yes, got it. We are generating random numbers between 500 and 800 for the CreditScore variable using numpy, and the model is trained on scaled-down values (the algorithm appears to scale down the state values). I mean that the states are internally converted into small values between 0 and 77 by the algorithm, presumably to reduce memory consumption.

But when making the inference for a particular piece of evidence, we cannot scale the evidence down ourselves, right? DBNInference should scale down the evidence used for the inference, but instead we end up with the key error.

In this case, could you please suggest how we can make the inference for this particular evidence?

Note: In the BayesianNetwork model the states are also scaled down during the training phase, but making the inference with real evidence such as {'CreditScore': 672} works fine there; I think the evidence value may also be scaled down internally, so the inference works correctly.
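
One possible workaround, sketched below purely as an illustration (the bin edges, labels, and the bin_credit helper are assumptions, not part of pgmpy; final_df and loan_dbn_model are the ones from this thread), is to discretize the raw values into a small set of named bins before fitting, and to map a raw evidence value such as 672 to its bin at query time, so the evidence always matches a state the model knows:

import pandas as pd
from pgmpy.inference import DBNInference

# Illustrative binning; the edges and labels are assumptions chosen only to show the idea.
credit_bins = [0, 600, 700, 900]
credit_labels = ['low', 'medium', 'high']

def bin_credit(values):
    # Map raw credit scores onto coarse, named bins.
    return pd.cut(values, bins=credit_bins, labels=credit_labels)

# Apply the same binning to both time slices before fitting, so the learned
# CreditScore states are the three bin labels instead of 78 raw values.
final_df[('CreditScore', 0)] = bin_credit(final_df[('CreditScore', 0)])
final_df[('CreditScore', 1)] = bin_credit(final_df[('CreditScore', 1)])
loan_dbn_model.fit(data=final_df, estimator="MLE")
loan_dbn_model.initialize_initial_state()

# Map the raw evidence value to its bin at query time; 672 falls into the
# 'medium' bin, which is a state the fitted model knows about.
evidence_state = bin_credit(pd.Series([672])).iloc[0]

dbn_inference = DBNInference(loan_dbn_model)
results = dbn_inference.query(
    variables=[('LoanApproval', 0)],
    evidence={('CreditScore', 0): evidence_state},
)
print(results)

With this kind of binning the learned state names are human-readable labels, so raw values never have to match pgmpy's internal integer encoding.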
