[1]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

Synthetic Log Generation from DECLARE Models

DECLARE4Py generates synthetic logs from DECLARE models using an approach based on Answer Set Programming (ASP) and the Clingo solver. More details can be found in: Chiariello, F., Maggi, F. M., & Patrizi, F. (2022, June). ASP-Based Declarative Process Mining. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 5, pp. 5539-5547).

As a first step, it is necessary to import a .decl file containing the DECLARE constraints.

[2]:
import os
from Declare4Py.ProcessModels.DeclareModel import DeclareModel
from Declare4Py.ProcessMiningTasks.LogGenerator.ASP.ASPLogGenerator import AspGenerator

model_name = 'data-model1'
model: DeclareModel = DeclareModel().parse_from_file(os.path.join("../../../", "tests", "test_models", f"{model_name}.decl"))

Then, some general settings are needed: the number of cases to generate and the minimum and maximum number of events for each case.

[3]:
%%time
# Number of cases that have to be generated
num_of_cases = 10

# Minimum and maximum number of events a case can contain
(num_min_events, num_max_events) = (8,15)

# Shows some feedback from the generator (set it to False to suppress all debug messages)
verbose = False

asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events, verbose=verbose)
asp_gen.run()
CPU times: total: 21.3 s
Wall time: 7.27 s

The class AspGenerator has to be instantiated with the DECLARE model and the settings above. Then, the run method generates the cases, and either the to_xes method saves them in a .xes event log or the to_csv method saves them in a .csv file.

[4]:
asp_gen.to_xes(f"{model_name}.xes")
asp_gen.to_csv(f"{model_name}.csv")

Logs can be generated for different purposes, depending on the needs of Process Mining algorithms. DECLARE4Py implements four useful purposes that can be set with the following methods of the AspGenerator class.

1. Setting up the Length Distribution of the Cases

Users can specify a probability distribution over the lengths of the generated traces. The method set_distribution_type takes the distribution_type as its parameter. Setting this parameter to "uniform" selects a uniform distribution over [num_min_events, num_max_events].

Also, the number of positive traces can be changed with the method set_positive_traces.

[5]:
%%time
# Default is uniform
asp_gen.set_distribution_type("uniform")

# It was 10 before; let's double that
asp_gen.set_positive_traces(num_of_cases * 2)

asp_gen.run()
asp_gen.to_csv(f'{model_name}_Distribution_Test_1.csv')
CPU times: total: 29.9 s
Wall time: 9.85 s

A Gaussian distribution requires a location (the mean, mu) and a scale (sigma, the standard deviation).

[6]:
%%time
asp_gen.change_distribution_settings(min_num_events_or_mu=15.5, max_num_events_or_sigma=3.2, dist_type="gaussian")
asp_gen.run()
asp_gen.to_csv(f'{model_name}_Distribution_Test_2.csv')
CPU times: total: 55.3 s
Wall time: 17.2 s
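
To make the role of the two parameters concrete, the following standalone sketch draws trace lengths from a normal distribution with the mean and scale used above. It does not call Declare4Py: the helper name sample_gaussian_lengths, the rounding to the nearest integer, and the re-drawing of values below 1 are assumptions made for illustration, not a description of how AspGenerator treats such values.

```python
import random


def sample_gaussian_lengths(mu, sigma, n, seed=None):
    """Draw n integer trace lengths from a normal distribution (hypothetical helper)."""
    rng = random.Random(seed)
    lengths = []
    while len(lengths) < n:
        # random.gauss takes the mean and the standard deviation
        length = round(rng.gauss(mu, sigma))
        # Assumption: a trace needs at least one event, so re-draw otherwise
        if length >= 1:
            lengths.append(length)
    return lengths


lengths = sample_gaussian_lengths(mu=15.5, sigma=3.2, n=20, seed=42)
print(sorted(lengths))
```

Most sampled lengths fall within a few multiples of sigma around mu, so a larger sigma spreads the case lengths over a wider range.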

A custom distribution requires the user to set one probability for each length in [num_min_events, num_max_events]; the probabilities must sum to 1.

[7]:
%%time
asp_gen.set_distribution_type("custom")
asp_gen.set_custom_probabilities([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.3])

asp_gen.run()
asp_gen.to_csv(f'{model_name}_Distribution_Test_3.csv')
CPU times: total: 19.5 s
Wall time: 7.05 s
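
The list passed above has eight entries because there are eight admissible lengths between 8 and 15. The following standalone sketch shows the two checks such a list should pass and how lengths could then be sampled. The helper sample_custom_lengths is hypothetical and does not reproduce the validation performed inside Declare4Py.

```python
import math
import random


def sample_custom_lengths(min_events, max_events, probabilities, n, seed=None):
    """Sample n trace lengths from a user-defined distribution (hypothetical helper)."""
    lengths = list(range(min_events, max_events + 1))
    # One probability per admissible length
    if len(probabilities) != len(lengths):
        raise ValueError(
            f"expected {len(lengths)} probabilities, got {len(probabilities)}"
        )
    # Probabilities must sum to 1 (allowing for floating-point rounding)
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)
    return rng.choices(lengths, weights=probabilities, k=n)


probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.3]
sample = sample_custom_lengths(8, 15, probs, n=10, seed=0)
print(sample)
```

With these weights, length 15 is three times as likely as any other single length.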

2. Setting up the Personalized Clingo configuration

More information

For more information on clingo and its functionalities consult: https://potassco.org/

For more information on the option commands consult the documentation of Clingo (Potassco) at: https://github.com/potassco/guide/releases/ or https://github.com/potassco/asprin/blob/master/asprin/src/main/clingo_help.py

Or download directly the documentation from here: https://github.com/potassco/guide/releases/download/v2.2.0/guide.pdf

Setting up the configuration

Clingo offers various options to personalize the solver's range of action, probabilistic reasoning and decision-making.

At the moment, the solver can be personalized with the method use_custom_clingo_configuration, which accepts the following options:

  • The Configuration of clingo can be: “frumpy”, “tweety”, “crafty”, “jumpy”, “trendy” or “handy”. (Default is trendy)

  • The number of Threads used by clingo to speed up the process. (Default uses all available cores)

  • The Random Frequency used by clingo in decision-making is a float between 0 and 1 inclusive, where 0 means no random decisions and 1 means every decision is random. (Default is 0.3)

  • The Mode configures the optimization of the algorithm and can be either “optN” or “ignore”. (Default is optN)

  • The Sign of the operation, which can be “asp”, “pos”, “neg” or “rnd”. (Default is asp)

  • The Strategy configures the optimization of the strategy and can be “bb” or “usc”. (This functionality is not used in the default configuration)

  • The Heuristic used by clingo configures the decision heuristic and can be “Berkmin”, “Vmtf”, “Vsids”, “Domain”, “Unit” or “None”. (This functionality is not used in the default configuration)

[8]:
%%time

asp_gen.use_default_clingo_configuration()
# The default configuration can be obtained using the following command
print(asp_gen.get_current_clingo_configuration())

# To enable the custom configuration:
asp_gen.use_custom_clingo_configuration(config="jumpy", threads=None, frequency=1, sign_def="rnd", strategy="bb", heuristic="Vsids")

# The current configuration then becomes the custom one
print(asp_gen.get_current_clingo_configuration())

# This command tells the generator to use the default configuration again:
# asp_gen.use_default_clingo_configuration()
# It does not delete the old custom configuration, which can be re-enabled by calling:
# asp_gen.use_custom_clingo_configuration()

asp_gen.run()
asp_gen.to_csv(f'{model_name}_Custom_Configuration_Test_1.csv')
{'CONFIG': 'trendy', 'THREADS': '16', 'FREQUENCY': '0.3', 'SIGN-DEF': 'asp', 'MODE': 'optN', 'STRATEGY': None, 'HEURISTIC': None}
{'CONFIG': 'jumpy', 'THREADS': '16', 'FREQUENCY': '1', 'SIGN-DEF': 'rnd', 'MODE': 'optN', 'STRATEGY': 'bb', 'HEURISTIC': 'Vsids'}
CPU times: total: 23.6 s
Wall time: 7.55 s
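
The keys of the printed dictionaries correspond closely to command-line options of clingo. The sketch below maps them to the option names --configuration, --parallel-mode, --rand-freq, --sign-def, --opt-mode, --opt-strategy and --heuristic. The mapping and the helper to_clingo_args are assumptions made for illustration; Declare4Py may build its solver arguments differently.

```python
# Assumed mapping from the keys printed by get_current_clingo_configuration()
# to clingo command-line options; Declare4Py may build its arguments differently.
OPTION_NAMES = {
    "CONFIG": "--configuration",
    "THREADS": "--parallel-mode",
    "FREQUENCY": "--rand-freq",
    "SIGN-DEF": "--sign-def",
    "MODE": "--opt-mode",
    "STRATEGY": "--opt-strategy",
    "HEURISTIC": "--heuristic",
}


def to_clingo_args(configuration):
    """Turn a configuration dictionary into clingo arguments (hypothetical helper)."""
    args = []
    for key, option in OPTION_NAMES.items():
        value = configuration.get(key)
        # Options left as None are simply not passed to the solver
        if value is not None:
            args.append(f"{option}={value}")
    return args


default_configuration = {
    "CONFIG": "trendy", "THREADS": "16", "FREQUENCY": "0.3",
    "SIGN-DEF": "asp", "MODE": "optN", "STRATEGY": None, "HEURISTIC": None,
}
print(to_clingo_args(default_configuration))
# ['--configuration=trendy', '--parallel-mode=16', '--rand-freq=0.3', '--sign-def=asp', '--opt-mode=optN']
```

With the clingo Python package, a list of this kind could be passed as the arguments of clingo.Control, which is one way to try out an option outside the generator.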

3. Setting up the Variants

Users can generate variants by setting the number of repetitions of the workflow of each case. This is done with the set_number_of_repetition_per_trace method.

[9]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events, verbose=verbose)

asp_gen.set_number_of_repetition_per_trace(2)

asp_gen.run()
asp_gen.to_csv(f'{model_name}_Variants_Test_1.csv')

Setting up Positive and Negative Traces

Users can specify some constraints to be violated in the synthetic cases to obtain labelled logs for binary classification, e.g., for deviance mining algorithms. The method set_constraints_to_violate takes as input:

  1. tot_negative_trace: the number of negative cases to generate;

  2. violate_all: whether to violate all the specified constraints or let Clingo decide which constraints to violate;

  3. constraints_list: the list containing the subset of DECLARE constraints (specified as strings of text) to be violated.

[10]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

asp_gen.set_constraints_to_violate(tot_negative_trace=10, violate_all=True, constraints_list=[
    "Init[ER Registration] | |",
    "Chain Response[ER Registration, ER Triage] |A.org:group is J |T.org:group is A |"])
asp_gen.run()
asp_gen.to_csv(f'{model_name}.csv')
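
To illustrate what it means for a case to violate the two constraints listed above, the following standalone sketch checks the Init and Chain Response templates on a plain list of activity names. It ignores the data conditions (such as A.org:group is J) and is independent of Declare4Py; all function names are hypothetical.

```python
def holds_init(trace, a):
    """Init[a]: the first event of the trace is a."""
    return len(trace) > 0 and trace[0] == a


def holds_chain_response(trace, a, b):
    """Chain Response[a, b]: every occurrence of a is immediately followed by b."""
    for i, activity in enumerate(trace):
        if activity == a:
            if i + 1 >= len(trace) or trace[i + 1] != b:
                return False
    return True


def violated_constraints(trace):
    """Return the names of the two example constraints that the trace violates."""
    checks = {
        "Init[ER Registration]": holds_init(trace, "ER Registration"),
        "Chain Response[ER Registration, ER Triage]":
            holds_chain_response(trace, "ER Registration", "ER Triage"),
    }
    return [name for name, holds in checks.items() if not holds]


compliant = ["ER Registration", "ER Triage", "IV Liquid"]
deviant = ["IV Liquid", "ER Registration", "IV Antibiotics"]

print(violated_constraints(compliant))  # []
print(violated_constraints(deviant))    # both constraints are violated
```

The second trace violates both constraints, which is the kind of case requested with violate_all=True.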

Alternatively, instead of giving the explicit text of each DECLARE constraint, its index in the model can be passed to the set_constraints_to_violate_by_template_index method.

[11]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

# Print each constraint with its index (idx avoids shadowing the built-in id)
for idx, constr_text in enumerate(model.serialized_constraints):
    print(f"{idx} - {constr_text}")

asp_gen.set_constraints_to_violate_by_template_index(tot_negative_trace=10, violate_all=True,
                                                 constraints_idx_list=[0, 3])
asp_gen.run()
asp_gen.to_csv(f'{model_name}.csv')
0 - Existence2[Admission NC] | |
1 - Chain Response[Admission NC, Release B] |A.org:group is K |T.org:group is E |
2 - Chain Response[Admission NC, Release A] |A.org:group is I |T.org:group is E |133020,957701,s
3 - Chain Precedence[IV Liquid, Admission NC] |A.org:group is I |T.org:group is A |92,14473,s
4 - Chain Response[ER Registration, ER Triage] |(A.DiagnosticArtAstrup is false) AND (A.SIRSCritHeartRate is true) AND (A.org:group is A) AND (A.DiagnosticBlood is true) AND (A.DisfuncOrg is false) AND (A.DiagnosticECG is true) AND (A.Age >= 45) AND (A.InfectionSuspected is true) AND (A.DiagnosticLacticAcid is true) AND (A.DiagnosticSputum is true) AND (A.Hypoxie is false) AND (A.DiagnosticUrinaryCulture is true) AND (A.DiagnosticLiquor is false) AND (A.SIRSCritTemperature is true) AND (A.Infusion is true) AND (A.Hypotensie is false) AND (A.DiagnosticUrinarySediment is true) AND (A.Oligurie is false) AND (A.Age <= 80) AND (A.SIRSCritTachypnea is true) AND (A.DiagnosticOther is false) AND (A.SIRSCritLeucos is false) AND (A.DiagnosticIC is true) AND (A.SIRSCriteria2OrMore is true) AND (A.DiagnosticXthorax is true) |T.org:group is C |52,2154,s
5 - Chain Precedence[Release A, Return ER] |A.org:group is ? |T.org:group is E |1121801,1121801,s
6 - Chain Precedence[ER Sepsis Triage, IV Antibiotics] |A.org:group is L |T.org:group is L |15,11000,s
7 - Chain Response[ER Sepsis Triage, IV Antibiotics] |A.org:group is L |T.org:group is L |15,11000,s
8 - Chain Precedence[Admission IC, Admission NC] |A.org:group is J |T.org:group is J |
9 - Chain Precedence[IV Antibiotics, Admission NC] |A.org:group is F |T.org:group is A |92,14459,s
10 - Chain Precedence[Admission NC, Release B] |A.org:group is E |T.org:group is K |48225,48225,s
11 - Chain Response[Admission IC, Admission NC] |A.org:group is J |T.org:group is J |61534,61534,s
12 - Chain Response[LacticAcid, Leucocytes] |A.LacticAcid <= 0.8 |T.Leucocytes >= 13.8 |0,2778,m
13 - Chain Precedence[ER Registration, ER Triage] |A.org:group is C |(T.InfectionSuspected is true) AND (T.SIRSCritTemperature is true) AND (T.DiagnosticLacticAcid is true) AND (T.DiagnosticBlood is true) AND (T.DiagnosticIC is true) AND (T.SIRSCriteria2OrMore is true) AND (T.DiagnosticECG is true) |52,2154,s

Setting up Rules for the Activation Conditions

Users can specify the number of activations of a DECLARE constraint in the synthetic cases. This can be done with the set_activation_conditions method by specifying an interval of activations for specific DECLARE constraints in the loaded model.

[12]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

activation_constraint = model.serialized_constraints[3]
asp_gen.set_activation_conditions({
    activation_constraint: [2, 3]
})  # the activation should occur between 2 and 3 times

asp_gen.run()
asp_gen.to_csv(f'{model_name}_Activation_Conditions_Test_1.csv')
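
As an illustration of what is being counted, the sketch below counts activations for a Response-style constraint such as Chain Response[Admission NC, Release B] with activation condition A.org:group is K, where every matching occurrence of the first activity is an activation. The helpers, the toy trace and its attribute values are made up for illustration, and treating the interval as closed on both ends is an assumption.

```python
def count_activations(trace, activity, condition=lambda event: True):
    """Count events that activate a constraint (hypothetical helper).

    An event counts as an activation when its name matches the activating
    activity and its attributes satisfy the activation condition.
    """
    return sum(
        1 for event in trace
        if event["concept:name"] == activity and condition(event)
    )


def activations_in_range(trace, activity, bounds, condition=lambda event: True):
    """Check that the number of activations lies in the closed interval bounds."""
    lower, upper = bounds
    return lower <= count_activations(trace, activity, condition) <= upper


def is_group_k(event):
    """Activation condition: A.org:group is K."""
    return event["org:group"] == "K"


# Toy trace; the attribute values are made up for illustration
trace = [
    {"concept:name": "Admission NC", "org:group": "K"},
    {"concept:name": "Release B", "org:group": "E"},
    {"concept:name": "Admission NC", "org:group": "I"},
    {"concept:name": "Admission NC", "org:group": "K"},
    {"concept:name": "Release B", "org:group": "E"},
]

print(count_activations(trace, "Admission NC", is_group_k))              # 2
print(activations_in_range(trace, "Admission NC", [2, 3], is_group_k))   # True
```

The third event has the right activity name but fails the data condition, so it is not counted as an activation.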

Alternatively, instead of giving the explicit text of the DECLARE constraints, their indices in the model can be passed to the set_activation_conditions_by_template_index method.

[13]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

# Print each constraint with its index (idx avoids shadowing the built-in id)
for idx, constr_text in enumerate(model.serialized_constraints):
    print(f"{idx} - {constr_text}")

asp_gen.set_activation_conditions_by_template_index({3: [2, 3]})
asp_gen.run()
asp_gen.to_csv(f'{model_name}.csv')
0 - Existence2[Admission NC] | |
1 - Chain Response[Admission NC, Release B] |A.org:group is K |T.org:group is E |
2 - Chain Response[Admission NC, Release A] |A.org:group is I |T.org:group is E |133020,957701,s
3 - Chain Precedence[IV Liquid, Admission NC] |A.org:group is I |T.org:group is A |92,14473,s
4 - Chain Response[ER Registration, ER Triage] |(A.DiagnosticArtAstrup is false) AND (A.SIRSCritHeartRate is true) AND (A.org:group is A) AND (A.DiagnosticBlood is true) AND (A.DisfuncOrg is false) AND (A.DiagnosticECG is true) AND (A.Age >= 45) AND (A.InfectionSuspected is true) AND (A.DiagnosticLacticAcid is true) AND (A.DiagnosticSputum is true) AND (A.Hypoxie is false) AND (A.DiagnosticUrinaryCulture is true) AND (A.DiagnosticLiquor is false) AND (A.SIRSCritTemperature is true) AND (A.Infusion is true) AND (A.Hypotensie is false) AND (A.DiagnosticUrinarySediment is true) AND (A.Oligurie is false) AND (A.Age <= 80) AND (A.SIRSCritTachypnea is true) AND (A.DiagnosticOther is false) AND (A.SIRSCritLeucos is false) AND (A.DiagnosticIC is true) AND (A.SIRSCriteria2OrMore is true) AND (A.DiagnosticXthorax is true) |T.org:group is C |52,2154,s
5 - Chain Precedence[Release A, Return ER] |A.org:group is ? |T.org:group is E |1121801,1121801,s
6 - Chain Precedence[ER Sepsis Triage, IV Antibiotics] |A.org:group is L |T.org:group is L |15,11000,s
7 - Chain Response[ER Sepsis Triage, IV Antibiotics] |A.org:group is L |T.org:group is L |15,11000,s
8 - Chain Precedence[Admission IC, Admission NC] |A.org:group is J |T.org:group is J |
9 - Chain Precedence[IV Antibiotics, Admission NC] |A.org:group is F |T.org:group is A |92,14459,s
10 - Chain Precedence[Admission NC, Release B] |A.org:group is E |T.org:group is K |48225,48225,s
11 - Chain Response[Admission IC, Admission NC] |A.org:group is J |T.org:group is J |61534,61534,s
12 - Chain Response[LacticAcid, Leucocytes] |A.LacticAcid <= 0.8 |T.Leucocytes >= 13.8 |0,2778,m
13 - Chain Precedence[ER Registration, ER Triage] |A.org:group is C |(T.InfectionSuspected is true) AND (T.SIRSCritTemperature is true) AND (T.DiagnosticLacticAcid is true) AND (T.DiagnosticBlood is true) AND (T.DiagnosticIC is true) AND (T.SIRSCriteria2OrMore is true) AND (T.DiagnosticECG is true) |52,2154,s