A Cyber-physical System (CPS) is a system with components deployed both in the physical world (e.g., industrial machines, smart buildings) and in computing environments (e.g., data centers, cloud infrastructures). For example, a smart factory could be considered a CPS having components: (i) inside assembly robots, (ii) inside sensor gateways deployed in the factory to collect environmental conditions, and (iii) deployed in a private data center to analyze data collected from the robots and sensor gateways. An Elastic Cyber-physical System (eCPS) can further add/remove components at run-time, from computing resources to physical devices. Elasticity enables eCPSs to align their costs, quality, and resource usage to load and owner requirements.
The owner of a smart factory builds an elastic cyber-physical system (eCPS) for analyzing streaming data coming from the factory's industrial robots and environmental sensors. The system can scale to adapt to changes in load or factory requirements by adding and removing both physical and cyber components. Factory sensors and robots send data to physical devices called Sensor Gateways. The gateways perform local data processing and send the data through a HAProxy HTTP Load Balancer to Streaming Analytics services hosted in virtual machines in a Private Cloud. The Streaming Analytics service is deployed as a software artifact in a Tomcat web server. Selected analytics results are published to interested parties through a third-party Messaging Service offered as-is by a Public Cloud provider.
The smart factory owner wants to ensure that the system is healthy and operates within specified parameters, especially after scaling actions which add/remove components, i.e., that the system is correctly configured, its components are deployed and running, and it provides the expected performance.
We introduce a platform for run-time health verification of elastic cyber-physical systems (eCPSs).
We need a model for capturing the deployment stack and dependencies of system components. As our goal is run-time verification of real eCPSs, the model must capture the state of the run-time infrastructure. The model must also be applicable to heterogeneous eCPSs, and easy to extend with additional types of components depending on particular systems. To this end we introduce an abstract model for representing eCPS components and their run-time instances. Our model targets only the infrastructure of eCPSs and is designed with simplicity and generality in mind. These properties allow the model to be applied to a wide range of systems without requiring a large amount of domain-specific knowledge.
We first capture Physical Machine, Physical Device, and Virtual Machine (VM) components, crucial in describing systems which run both in the cloud and in the physical world. We capture Virtual Container components to describe and verify virtualization containers such as Docker. Increasing the verification detail, we capture OS Process and Service components. Capturing components from different stack levels enables hierarchical testing, in which we can verify the lower level (e.g., VM), and if that succeeds, verify the higher levels (e.g., OS Process running inside a VM). Additional component types can be defined by extending the Type enumeration.
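To make hierarchical testing concrete, below is a minimal sketch (hypothetical Python, not the platform's actual code), assuming each instance exposes its hosting instance through a hosted_on reference and that run_tests_for stands in for whatever tests are registered for an instance:

def verify_hierarchically(instance, run_tests_for):
    #verify the hosting layer first, e.g., the VM under an OS Process
    host = instance.hosted_on
    if host is not None and not verify_hierarchically(host, run_tests_for):
        return False  #do not climb to stack levels above a failed layer
    return run_tests_for(instance)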
A system Component can have one or more Component Instances according to the system's run-time structure, e.g., multiple instances of the Streaming Analytics component. A component instance can be hostedOn another instance, e.g., an OS Process running inside a Virtual Machine. The reverse relationship of hostedOn is hosts, enabling model navigation in the opposite direction. Instances can also communicate with other instances, captured with connectsTo relationships. Further, components can be combined to achieve functionality. We use the term Composite Component to describe combinations of multiple system components working towards the same functionality goal, for example, the Streaming Analytics component using a VM hosting a Web Server, which in turn hosts a software Service.
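As a rough illustration of these relationships, a minimal sketch in Python (our own naming assumptions, not the platform's actual classes) could look as follows:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Component:
    name: str
    type: str  #e.g., "VirtualMachine", "Process", "Service", "Composite"
    contained: List["Component"] = field(default_factory=list)

@dataclass
class ComponentInstance:
    component: Component
    hosted_on: Optional["ComponentInstance"] = None  #reverse relationship: hosts
    hosts: List["ComponentInstance"] = field(default_factory=list)
    connects_to: List["ComponentInstance"] = field(default_factory=list)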
To verify a system, its static structure description is submitted to our platform as JSON. The system is described as a recursive composition of components according to the model introduced above. Each component has a name, a type, and potential containedUnits. A component can also be hosted on another component, indicated by the hostedOn property.
{ "type": "Composite", "name": "SportsAnalytics", "containedUnits": [ { "type": "Composite", "name": "DataCapture", "containedUnits": [ { "type": "Gateway", "name": "Gateway.DataCapture" }, { "hostedOn": "Gateway.DataCapture", "type": "Process", "name": "Process.DataCapture" } ] }, { "type": "Composite", "name": "LoadBalancer", "containedUnits": [ { "type": "VirtualMachine", "name": "VM.LoadBalancer" }, { "hostedOn": "VM.LoadBalancer", "type": "Process", "name": "Process.HAProxy" } ] }, { "type": "Composite", "name": "StreamingAnalytics", "containedUnits": [ { "type": "VirtualMachine", "name": "VM.StreamingAnalytics" }, { "hostedOn": "VM.StreamingAnalytics", "type": "Process", "name": "Process.Tomcat" }, { "hostedOn": "Process.Tomcat", "type": "Service", "name": "Service.StreamingAnalytics" } ] }, { "type": "Composite", "name": "MessagingService", "containedUnits": [ { "type": "Service", "name": "Service.MessagingService" } ] } ] }
#Description
#name: "TestName"
#description: "human readable description"
#timeout: 10

#Triggers
#every: 30 s
#event: "E1", "E2" on UnitType.VirtualMachine
#event: "E1FFF", "E2" on UnitType.Process

#Execution
#executor: UnitType.VirtualMachine for UnitType.VirtualMachine, UnitType.VirtualContainer, UnitType.Process
#executor: UnitType.VirtualContainer for UnitType.Process
#executor: UnitType.SoftwareContainer for UnitType.SoftwareContainer
#executor: UnitType.SoftwareContainer for UnitID."A-Za-z0-9_", UnitID."Process.ProcessNAME", UnitUUID."A-Za-z0-9_."
#executor: UnitID."A-Za-z0-9_" for UnitID."Process.ProcessNAME", UnitUUID."A-Za-z0-9_."

#supported types are Service | Process | SoftwarePlatform | PhysicalDevice | SoftwareContainer | VirtualContainer | Gateway | VirtualMachine | PhysicalMachine
The user next determines When to verify each health indicator, and defines one or more verification strategies for each indicator using our domain-specific language. The strategy for verifying that the VM component is healthy is shown below. As the Streaming Analytics is elastic, network accessibility should be verified when a new VM is created. A test Trigger entry is added (Line 5) for the "Added" event on ID."VM.StreamingAnalytics", which represents the Streaming Analytics VMs; the event is detected by our verification platform. VMs can also fail at run-time due to various factors, so network accessibility should also be verified periodically during the system's run-time. To this end an every: 30 s periodic test trigger is defined in the strategy (Line 6). The executor of the test must also be specified. VM network accessibility should be verified from outside the VM; thus, a distinct executor is requested (Line 9), having the type VirtualMachine. Finally, a timeout specifies how long to wait for the test result before considering that it has failed (Line 2). This is useful if something has happened to the test executor component, e.g., it has also failed.
Description
timeout: 30

Triggers
event: "Added" on ID."VM.StreamingAnalytics"
every: 30 s

Execution
executor: distinct Type.VirtualMachine for Type.VirtualMachine
We write one verification strategy for each verification test, structured in three parts: (i) test properties (Description), (ii) specification of test execution Triggers, and (iii) test Execution information. The test properties specify for each test a name, a human-readable description, and an optional timeout. The name is used to identify the test. The timeout is used to mark as failed those tests which do not return results within the specified interval of time.
We use triggers to specify when a particular test should be executed. A trigger can be an event or a periodic timer.
We support both direct and indirect tests, as detailed in the next section. Thus, in the last strategy section we specify which component will execute the test. One or more executor specifications can be defined, each describing which executor should run the test for which component identifier. The distinct keyword states that the test executor must be other than the test target, useful for executing indirect tests from components with the same identifier (e.g., pinging a VM from another VM).
#test implemented as standalone python code
#all imports must be local
os = __import__('os')

#contextualized "targetID" variable
#executing custom OS command
response = os.system("ping -c 1 " + targetID)
#construct result; if ping fails, response is 256
if response == 0:
    success = 100
else:
    success = 0
#TestResult type provided by our verification platform
return TestResult(success, response)
The user must further decide How each health indicator can be verified, depending on system capabilities. The VM network accessibility indicator can be verified using the ping command available in each VM operating system. Using our platform, the test is defined as a standalone Python script, shown above. The script can use contextualized variables injected at test execution by our platform, such as targetID, which for VMs is their IP (Line 6). It is the responsibility of the test designer to use domain-specific knowledge in implementing the test logic and deciding when a test is successful or not (Lines 8-11). Each test result is returned using the type defined by our platform (Line 13).
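The exact definition of TestResult is provided by our platform; as a rough approximation inferred from the test scripts in this post (the positional call above and the keyword call further below), it could look like this hypothetical sketch:

class TestResult(object):
    #hypothetical approximation; the platform ships the authoritative type
    def __init__(self, successful, details=None, meta=None):
        self.successful = successful
        self.details = details
        self.meta = meta or {}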
Description name: "CloudAMQPAlive" description: "Check if CloudAMQP is accessible" timeout: 30 Triggers every: 30 s Execution executor: UnitType.VirtualMachine for UnitID."Service.MessagingService"
base64 = __import__('base64')
httplib = __import__('httplib')

url = "/api/overview"
#contextualized "targetID", "username", and "password" variables
instanceIP = targetID
auth = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')

webservice = httplib.HTTPS(instanceIP)
webservice.putrequest("GET", url)
webservice.putheader("User-Agent", "Python http auth")
webservice.putheader("Content-type", "text/html; charset=\"UTF-8\"")
webservice.putheader("Authorization", "Basic %s" % auth)
webservice.endheaders()
#read the reply status before fetching the response body
statuscode, statusmessage, header = webservice.getreply()
res = webservice.getfile().read()

successful = "OK" in statusmessage
details = "/api/overview returned " + str(statusmessage)
meta = {}
meta["type"] = "Checks if RabbitMQ API responds to GET"
return TestResult(successful=successful, details=details, meta=meta)
In the following we discuss how our approach can be used to verify black-box components which do not allow the installation of test executors. We focus on the Messaging Service component using CloudAMQP, which provides a standalone RabbitMQ instance accessible through an API over the Internet.
The system owner answers the What? and When?: verify that the component is alive, i.e., that its provider has not encountered failures. The How? is answered by checking if the RabbitMQ API "/api/overview" is online and accessible. Then, a developer implements the verification test as a sequence of Python code issuing an HTTP GET with the owner's CloudAMQP credentials to the service's API. The system owner or developer further defines a verification strategy to execute the test every 30 seconds from any running VM, describing the test executor as executor: UnitType.VirtualMachine for UnitID."Service.MessagingService". Finally, the developer can send an alive message to our platform notifying it that the component is running and should be tested.
More results and instructions on how to download, install, and use the platform will follow soon.
This is part of our work in the U-Test EU project. Please contact Hong-Linh Truong (truong@dsg.tuwien.ac.at) for further information about our work.