功能需求 和非功能需求 有没有很明显的界限?

发布时间:
2024-12-22 19:45
阅读量:
5

Architecture 101: Top 10 Non-Functional Requirements (NFRs) you Should be Aware of
架构 101:您应该了解的十大非功能性需求 (NFR)

Non-functional requirements play an integral role in designing distributed, cloud-native architectures. Teams should know how the new features or modifications to an existing feature would impact the NFRs. This enables teams to deliver more robust products. Let’s delve into the topic by understanding what a Non-functional requirement (NFR) is.
非功能性需求在设计分布式云原生架构中发挥着不可或缺的作用。团队应该知道新功能或现有功能的修改将如何影响 NFR。这使团队能够交付更强大的产品。让我们通过了解什么是非功能性需求 (NFR) 来深入研究该主题。

What is a Non-functional Requirement?
什么是非功能性需求?

A non-functional requirement (NFR) is a specification that describes the system’s operation capabilities, constraints, and how it should operate, rather than what the system should do. These requirements focus on the quality attributes of the system, such as performance, security, usability, reliability, and maintainability, rather than specific behaviors or functions.
非功能性需求(NFR)是描述系统的运行能力、约束以及应该如何运行的规范,而不是系统应该做什么。这些需求侧重于系统的质量属性,例如性能、安全性、可用性、可靠性和可维护性,而不是特定的行为或功能。

To be precise, Nonfunctional Requirements (NFRs) are intended to specify ‘system qualities,’ whereas functional requirements are intended to specify system capabilities, features, and how it responds to specified inputs.
准确地说,非功能性需求(NFR)旨在指定“系统质量”,而功能性需求旨在指定系统功能、特性以及它如何响应指定的输入

Examples of NFRs NFR 示例

  • The customer search should respond within 500 milliseconds.
    客户搜索应在 500 毫秒内响应。
  • The system must be able to handle 10,000 transactions per minute.
    系统必须能够每分钟处理 10,000 笔交易。
  • The system must achieve 99.95% uptime.
    系统必须实现 99.95% 的正常运行时间。
  • The system must be able to continue operation without data loss in the event of a single server failure.
    当单个服务器发生故障时,系统必须能够继续运行而不丢失数据。

Top 10 Non-Functional Requirements (NFRs)
十大非功能性需求 (NFR)

  • Security 安全
  • Regulatory and Compliance
    监管与合规
  • Scalability 可扩展性
  • High Availability 高可用性
  • Reliability 可靠性
  • Performance 表现
  • Observability 可观测性
  • User Activity Tracking 用户活动跟踪
  • Auditing 审计
  • Usability 可用性

Security 安全

Security is critical in defining the qualities and constraints that ensure a system’s robustness against threats and vulnerabilities. Security is a broader aspect and covers various parts of the entire ecosystem.
安全性对于定义确保系统针对威胁和漏洞的稳健性的质量和约束至关重要。安全是一个更广泛的方面,涵盖整个生态系统的各个部分。

Key Focused Areas 重点关注领域

  • Identify and Access Management — Implementing strong Authentication and Authorization.
    识别和访问管理- 实施强大的身份验证和授权。
  • Network Security — Network Segmentation, IP Allow/deny lists, and Secure communication across the ecosystem.
    网络安全 —网络分段、IP 允许/拒绝列表以及整个生态系统的安全通信。
  • Application Security — Secure Coding Practices, Vulnerability Management, and Secure SDLC practices including code reviews, static and dynamic analysis, and security testing.
    应用程序安全 -安全编码实践、漏洞管理和安全 SDLC 实践,包括代码审查、静态和动态分析以及安全测试。
  • Data Security — Data Classification and Handling, encryption, masking, hashing, and Data Loss Prevention methodologies.
    数据安全——数据分类和处理、加密、屏蔽、散列和数据丢失防护方法。

Key Metrics 关键指标

  • Security Incident Metrics
    安全事件指标
    - Number of security incidents along with response time.
    - 安全事件的数量以及响应时间。
    - Incident detection times along with resolution rate.
    - 事件检测时间以及解决率。
  • Access and Authentication Metrics
    访问和身份验证指标
    - Failed Login Attempts - 登录尝试失败
    - Unauthorized Access Attempts
    - 未经授权的访问尝试
    - Account Lockouts - 帐户锁定
    - MFA (Multi-factor-authentication) adoption rate
    - MFA(多重身份验证)采用率
  • Data Security Metrics 数据安全指标
    - Data Breaches - 数据泄露
    - Data Loss Incidents - 数据丢失事件
    - Data Access Violation Incidents
    - 数据访问违规事件
    - Data at rest and transit coverage
    - 静态数据和传输覆盖范围
    - Data encryption coverage
    - 数据加密覆盖范围
  • Network Security Metrics 网络安全指标
    - Firewall rule violations
    - 违反防火墙规则
    - Network Traffic Anomalies
    - 网络流量异常
    - Intrusion Detection/Prevention System (IDS/IPS) Alerts
    - 入侵检测/预防系统 (IDS/IPS) 警报
  • Vulnerability Metrics 漏洞指标
    - Number of vulnerabilities along with severity levels and resolution rate.
    - 漏洞数量以及严重级别和解决率。
    - Time to patch and patch completion rate.
    - 修补时间和修补完成率。
  • Security Operations Metrics
    安全运营指标
    - Security Monitoring Coverage
    - 安全监控覆盖范围
    - False Positive/Negative Rates
    - 误报/漏报率
    - Security Tool Utilization
    - 安全工具的使用

Regulatory and Compliance
监管与合规

Regulatory and compliance requirements ensure that the system operates within the boundaries set by laws and regulations, thereby avoiding legal penalties and building trust with stakeholders. Regulatory and compliance aspects are critical for ensuring that a system adheres to the necessary legal, regulatory, and industry standards.
监管和合规要求确保系统在法律法规规定的范围内运行,从而避免法律处罚并与利益相关者建立信任。监管和合规方面对于确保系统遵守必要的法律、监管和行业标准至关重要。

Key Focused Areas 重点关注领域

  • Legal and Regulatory Requirements — Identify and integrate specific legal and regulatory requirements applicable to the industry (e.g., GDPR, HIPAA, PCI-DSS).
    法律和监管要求 —确定并整合适用于行业的特定法律和监管要求(例如 GDPR、HIPAA、PCI-DSS)。
  • Compliance Audits: Conduct internal and external audits to ensure adherence to regulatory standards.
    合规审计:进行内部和外部审计,以确保遵守监管标准。
  • Certification and Accreditation: Obtain necessary certifications (e.g., ISO/IEC, etc.) to demonstrate compliance.
    认证和认可:获得必要的认证(例如ISO/IEC等)以证明合规性。
  • Security Policies: Develop and enforce comprehensive security policies covering all aspects of data protection and compliance.
    安全策略:制定并实施涵盖数据保护和合规性各个方面的全面安全策略。
  • Data Retention Policies: Establish clear guidelines for data retention and disposal in line with regulatory requirements.
    数据保留政策:根据监管要求制定明确的数据保留和处置指南。
  • Audit Trails: Maintain comprehensive logs of system activities, including data access and modifications.
    审计跟踪:维护系统活动的全面日志,包括数据访问和修改。
  • Continuous Monitoring: Implement systems for continuous monitoring of compliance, and security status along with reporting capabilities.
    持续监控:实施持续监控合规性、安全状态以及报告功能的系统。
  • Legal Compliance: Stay informed about changes in laws and regulations and adjust compliance strategies accordingly.
    法律合规:及时了解法律法规的变化,并相应调整合规策略。

Key Metrics 关键指标

  • Compliance Metrics 合规指标
    - Compliance Status with relevant regulations (GDPR, HIPAA, PCI-DSS etc).
    - 相关法规(GDPR、HIPAA、PCI-DSS 等)的合规状态。
    - Number and severity of internal/external Audit Findings
    - 内部/外部审计结果的数量和严重性
    - Number of Policy Violations instances
    - 违反政策的次数
    - Regulatory and Compliance Training and Effectiveness Metrics.
    - 监管和合规培训以及有效性指标。
  • Risk Management Metrics 风险管理指标
    - Risk Assessment Frequency.
    - 风险评估频率。
    - Identified Risks during assessments.
    - 评估期间已识别的风险。
    - Risk Mitigation metrics.
    - 风险缓解指标。
  • Data Protection Metrics 数据保护指标
    - Number of Data Subject Access Requests (DSARs) received and processed.
    - 收到和处理的数据主体访问请求 (DSAR) 的数量。
    - Access Control Effectiveness
    - 访问控制有效性
  • Audit and Monitoring Metrics
    审计和监控指标
    - Audit Trail Completeness
    - 审计追踪完整性
    - Percentage of IT assets under continuous security monitoring.
    - 接受持续安全监控的 IT 资产百分比。
    - Number of alerts generated by Security Information and Event Management (SIEM) systems.
    - 安全信息和事件管理 (SIEM) 系统生成的警报数量。
    - Rates of false positive and false negative security alerts.
    - 误报和误报安全警报的比率。
  • Legal and Ethical Compliance Metrics
    法律和道德合规指标
    - Regulatory updates tracking and Implementation
    - 监管更新跟踪和实施
    - Number of legal and non-compliance incidents
    - 法律和违规事件的数量
    - Number of ethical violations incidents
    - 道德违规事件数量

Scalability 可扩展性

Scalability refers to a system’s ability to handle increasing workloads or to expand in capacity without compromising performance or efficiency. It involves the capability to grow and manage more demands, such as higher user loads or larger data volumes, through horizontal or vertical scaling. Scalability is crucial for ensuring that a system remains responsive and effective as it grows, supporting business expansion and changing demands.
可扩展性是指系统在不影响性能或效率的情况下处理不断增加的工作负载或扩展容量的能力。它涉及通过水平或垂直扩展来增长和管理更多需求的能力,例如更高的用户负载或更大的数据量。可扩展性对于确保系统在增长时保持响应能力和有效性、支持业务扩展和不断变化的需求至关重要。

Horizontal Scaling: Adding more machines or resources to distribute the load (e.g., adding more servers).
水平扩展:添加更多机器或资源来分配负载(例如,添加更多服务器)。

Vertical Scaling: increasing the capacity of existing resources (e.g., adding more CPU or memory to a server).
垂直扩展:增加现有资源的容量(例如,向服务器添加更多CPU或内存)。

Key Focused Areas 重点关注领域

  • Performance Optimization: Ensuring the system's efficiency as load increases by optimizing code, queries, and algorithms.
    性能优化:通过优化代码、查询和算法,确保系统在负载增加时的效率。
  • Load Balancing: Distributing traffic across multiple servers to prevent overload and ensure high availability.
    负载均衡:在多个服务器之间分配流量以防止过载并确保高可用性。
  • Caching: Using caching mechanisms to reduce the load on downstream systems (databases, APIs) and improve response times.
    缓存:使用缓存机制来减少下游系统(数据库、API)的负载并提高响应时间。
  • Monitoring and Analytics: Continuously monitor system performance and load to identify bottlenecks and optimize resources.
    监控和分析:持续监控系统性能和负载,以识别瓶颈并优化资源。
  • Network Scalability: Ensuring network capacity and architecture can support increased traffic and data flow.
    网络可扩展性:确保网络容量和架构能够支持增加的流量和数据流。
  • Infrastructure as Code: Automating infrastructure provisioning to support dynamic scaling.
    基础设施即代码:自动化基础设施配置以支持动态扩展。

Key Metrics 关键指标

  • Response Time: Measures how quickly the system responds to requests.
    响应时间:衡量系统响应请求的速度。
  • Throughput: Number of transactions or requests processed per unit of time.
    吞吐量:每单位时间处理的事务或请求的数量。
  • Resource Utilization: CPU, memory, and disk usage across servers.
    资源利用率:跨服务器的 CPU、内存和磁盘使用情况。
  • Latency: Time taken for data to travel across the network.
    延迟:数据通过网络传输所花费的时间。
  • Error Rate: Frequency of errors or failed requests.
    错误率:错误或失败请求的频率。
  • Database Performance: Query execution time and connection pooling statistics.
    数据库性能:查询执行时间和连接池统计信息。
  • Traffic Load: Number of concurrent users or sessions.
    流量负载:并发用户或会话的数量。
  • Scaling Events: Frequency and success rate of scaling operations.
    扩展事件:扩展操作的频率和成功率。
  • Cost Efficiency: Cost per transaction or request as the system scales.
    成本效率:随着系统扩展,每笔交易或请求的成本。
  • System Availability: Uptime and reliability during peak loads.
    系统可用性:峰值负载期间的正常运行时间和可靠性。

Tracking these metrics helps ensure that a system scales effectively while maintaining performance and stability.
跟踪这些指标有助于确保系统有效扩展,同时保持性能和稳定性。

High Availability 高可用性

High Availability (HA) refers to a system’s ability to remain operational and accessible with minimal downtime, even during failures or maintenance. It ensures continuous service by using redundancy, failover mechanisms, and load balancing. HA is critical for mission-critical applications, where interruptions can have significant consequences.
高可用性 (HA) 是指系统即使在故障或维护期间也能以最短的停机时间保持运行和可访问的能力。它通过使用冗余、故障转移机制和负载平衡来确保连续服务。 HA 对于关键任务应用程序至关重要,因为中断可能会产生严重后果。

Key Focused Areas 重点关注领域

  • Redundancy: Implementing backup components and systems to ensure continuity in case of failures.
    冗余:实施备份组件和系统以确保发生故障时的连续性。
  • Failover Mechanisms: Setting up automatic failover processes to switch to standby systems during outages.
    故障转移机制:设置自动故障转移过程以在中断期间切换到备用系统。
  • Load Balancing: Distributing traffic evenly across servers to prevent overload and ensure consistent performance.
    负载平衡:在服务器之间均匀分配流量,以防止过载并确保一致的性能。
  • Monitoring and Alerts: Continuously monitoring system health and setting up alerts for quick response to issues.
    监控和警报:持续监控系统运行状况并设置警报以快速响应问题。
  • Disaster Recovery: Developing and testing disaster recovery plans to restore services after major incidents.
    灾难恢复:制定和测试灾难恢复计划,以在重大事件发生后恢复服务。
  • Data Replication: Keeping data copies synchronized across multiple locations to prevent data loss.
    数据复制:在多个位置保持数据副本同步,以防止数据丢失。
  • Network Reliability: Ensuring network infrastructure is robust and has multiple paths for data transmission.
    网络可靠性:确保网络基础设施稳健并具有多条数据传输路径。

Key Metrics 关键指标

  • Uptime Percentage: Measures the total time the system is operational.
    正常运行时间百分比:衡量系统运行的总时间。
  • Mean Time Between Failures (MTBF): Average time between system failures.
    平均故障间隔时间 (MTBF) :系统故障之间的平均时间。
  • Mean Time to Recovery (MTTR): Average time taken to recover from failures.
    平均恢复时间 (MTTR) :从故障中恢复所需的平均时间。
  • Failover Time: Time taken for systems to switch over during failures.
    故障转移时间:发生故障时系统切换所需的时间。
  • Service Level Agreement (SLA) Compliance: Adherence to uptime and performance commitments.
    服务水平协议 (SLA) 合规性:遵守正常运行时间和性能承诺。
  • Error Rate: Frequency of errors or service disruptions.
    错误率:错误或服务中断的频率。
  • System Load: Monitoring resource usage during peak and off-peak times.
    系统负载:监控高峰和非高峰时段的资源使用情况。
  • Network Latency: Time delays in network communications.
    网络延迟:网络通信的时间延迟。
  • Backup and Recovery Success Rate: Frequency of successful data backups and restorations.
    备份和恢复成功率:成功数据备份和恢复的频率。
  • Incident Frequency: Number of incidents affecting availability over a period.
    事件频率:一段时间内影响可用性的事件数量。

Reliability 可靠性

Reliability refers to the ability of a system or component to consistently perform its intended function without failure over a specified period. It involves ensuring that the system operates correctly and dependably, providing accurate results, and maintaining functionality under expected conditions. High reliability minimizes downtime and errors, contributing to user trust and system integrity.
可靠性是指系统或组件在指定时间内持续执行其预期功能而不发生故障的能力。它涉及确保系统正确可靠地运行、提供准确的结果以及在预期条件下维护功能。高可靠性可最大限度地减少停机时间和错误,有助于增强用户信任和系统完整性。

Key Focused Areas 重点关注领域

  • System Design: Building robust architectures that handle faults and errors gracefully.
    系统设计:构建稳健的架构,优雅地处理故障和错误。
  • Redundancy: Implementing backup systems and components to prevent single points of failure.
    冗余:实施备份系统和组件以防止单点故障。
  • Testing and Validation: Conduct thorough testing, including stress and load tests, to ensure system resilience.
    测试和验证:进行彻底的测试,包括压力和负载测试,以确保系统的弹性。
  • Monitoring and Maintenance: Continuously monitor system performance and conduct regular maintenance to prevent issues.
    监控和维护:持续监控系统性能并进行定期维护以防止出现问题。
  • Error Handling: Designing effective error detection and recovery mechanisms.
    错误处理:设计有效的错误检测和恢复机制。
  • Documentation: Keeping comprehensive documentation for troubleshooting and maintenance.
    文档:保存用于故障排除和维护的全面文档。
  • User Training: ensuring users are trained to handle common issues effectively.
    用户培训:确保用户接受培训以有效处理常见问题。

Key Metrics 关键指标

  • Mean Time Between Failures (MTBF): Average time between system failures.
    平均故障间隔时间 (MTBF) :系统故障之间的平均时间。
  • Mean Time to Repair (MTTR): Average time taken to repair and restore the system after a failure.
    平均修复时间 (MTTR) :发生故障后修复和恢复系统所需的平均时间。
  • Failure Rate: Frequency of system failures over a specific period.
    故障率:特定时期内系统故障的频率。
  • Uptime Percentage: The proportion of time the system is operational and available.
    正常运行时间百分比:系统运行且可用的时间比例。
  • Error Rate: Number of errors encountered during operation.
    错误率:操作期间遇到的错误数。
  • Service Level Agreement (SLA) Compliance: Adherence to agreed-upon reliability metrics.
    服务级别协议 (SLA) 合规性:遵守商定的可靠性指标。
  • System Downtime: Total time the system is unavailable.
    系统停机时间:系统不可用的总时间。
  • Incident Frequency: Number of incidents affecting system reliability.
    事件频率:影响系统可靠性的事件数量。

Performance 表现

Performance refers to how well a system, application, or component accomplishes its intended function within given constraints. It encompasses various aspects such as speed, efficiency, throughput, responsiveness, and resource utilization. In computing and technology contexts, performance is typically measured and optimized to ensure optimal user experience, operational efficiency, and scalability.
性能是指系统、应用程序或组件在给定约束下完成其预期功能的程度。它涵盖速度、效率、吞吐量、响应能力和资源利用率等各个方面。在计算和技术环境中,通常会对性能进行测量和优化,以确保最佳的用户体验、运营效率和可扩展性。

Key Focused Areas 重点关注领域

  • Response Time: optimizing the time taken to respond to user requests or actions.
    响应时间:优化响应用户请求或操作所需的时间。
  • Throughput: maximizing the number of transactions or operations processed per unit of time.
    吞吐量:最大化每单位时间处理的事务或操作的数量。
  • Scalability: ensuring the system can handle increasing loads by scaling resources horizontally or vertically.
    可扩展性:确保系统可以通过水平或垂直扩展资源来处理不断增加的负载。
  • Resource Utilization: Efficiently using CPU, memory, disk, and network resources to avoid bottlenecks.
    资源利用率:有效利用CPU、内存、磁盘和网络资源以避免瓶颈。
  • Caching: Utilizing caching mechanisms to store and retrieve frequently accessed data quickly.
    缓存:利用缓存机制快速存储和检索经常访问的数据。
  • Database Performance: Optimizing database queries, indexing, and schema design for faster data retrieval and updates.
    数据库性能:优化数据库查询、索引和模式设计,以实现更快的数据检索和更新。
  • Code Efficiency: Writing efficient algorithms and code to minimize computational overhead and improve execution speed.
    代码效率:编写高效的算法和代码,以最大限度地减少计算开销并提高执行速度。
  • Network Optimization: Reducing latency and optimizing data transmission across networks.
    网络优化:减少延迟并优化网络上的数据传输。
  • Load Balancing: Distributing incoming traffic evenly across servers to prevent overload and ensure consistent performance.
    负载平衡:在服务器之间均匀分配传入流量,以防止过载并确保一致的性能。
  • Monitoring and Tuning: Continuously monitoring system performance metrics and tuning configurations to optimize performance over time.
    监控和调优:持续监控系统性能指标并调整配置,以随着时间的推移优化性能。

Key Metrics 关键指标

  • Response Time: Measures the time taken to respond to user requests or system events. It includes:
    响应时间:测量响应用户请求或系统事件所需的时间。它包括:
    - Average Response Time: Overall average response time across all requests.
    -平均响应时间:所有请求的总体平均响应时间。
    - Percentile Response Time: 90th or 95th percentile response time to understand performance under peak loads.
    -百分位响应时间:第 90 或 95 个百分位响应时间,用于了解峰值负载下的性能。
  • Throughput: Measures the rate at which the system processes transactions or requests. Key metrics include:
    吞吐量:衡量系统处理事务或请求的速率。关键指标包括:
    - Requests per Second (RPS): Number of requests processed per second.
    -每秒请求数 (RPS) :每秒处理的请求数。
    - Transactions per Second (TPS): Number of transactions completed per second.
    -每秒事务数 (TPS) :每秒完成的事务数。
  • Error Rate: Tracks the frequency of errors or failed transactions. Metrics include:
    错误率:跟踪错误或失败事务的频率。指标包括:
    - Error Rate Percentage: Percentage of requests or transactions that result in errors.
    -错误率百分比:导致错误的请求或事务的百分比。
  • Concurrency: Measures the number of simultaneous users or connections the system can handle without degradation. Metrics include:
    并发性:衡量系统在不降级的情况下可以处理的并发用户或连接的数量。指标包括:
    - Active Users: Number of users actively interacting with the system at a given time.
    -活跃用户:在给定时间与系统积极交互的用户数量。
    - Active Connections: Number of concurrent connections to servers or databases.
    -活动连接:服务器或数据库的并发连接数。
  • Resource Utilization: Monitors the utilization of system resources (CPU, memory, disk, network). Metrics include:
    资源利用率:监控系统资源(CPU、内存、磁盘、网络)的利用率。指标包括:
    - CPU Utilization: Percentage of CPU used by the system or specific processes.
    - CPU 利用率:系统或特定进程使用的 CPU 百分比。
    - Memory Utilization: Amount of memory used by applications or services.
    -内存利用率:应用程序或服务使用的内存量。
    - Disk I/O: Input/output operations per second (IOPS) on disk storage.
    -磁盘 I/O :磁盘存储上的每秒输入/输出操作数 (IOPS)。
    - Network Bandwidth: Amount of data transferred over the network per unit of time.
    -网络带宽:每单位时间通过网络传输的数据量。
  • Latency: measures the delay between a request and its response. Metrics include:
    延迟:测量请求与其响应之间的延迟。指标包括:
    - Round-Trip Latency: Time is taken for a request to travel to the server and back to the client.
    -往返延迟:请求传输到服务器并返回到客户端所花费的时间。
  • Cache Hit Rate: Measures the effectiveness of caching mechanisms in reducing data retrieval times. Metrics include:
    缓存命中率:衡量缓存机制在减少数据检索时间方面的有效性。指标包括:
    - Cache Hit Percentage: Percentage of requests served from cache rather than fetching from the database or storage.
    -缓存命中百分比:从缓存提供服务而不是从数据库或存储获取的请求的百分比。
  • Cache Miss Rate: Measures the cache miss rate while retrieving the data. Metrics include:
    缓存缺失率:测量检索数据时的缓存缺失率。指标包括:
    - Cache Miss Percentage: Percentage of requests directly fetched from database or storage instead of serving from the cache.
    -缓存未命中百分比:直接从数据库或存储获取而不是从缓存提供服务的请求的百分比。
  • Database Performance Metrics: Tracks database-specific metrics such as:
    数据库性能指标:跟踪特定于数据库的指标,例如:
    - Query Execution Time: Time taken for database queries to execute.
    -查询执行时间:执行数据库查询所花费的时间。
    - Transaction Commit Time: Time taken to commit transactions to the database.
    -事务提交时间:将事务提交到数据库所花费的时间。
    - Database Locks: Number of locks held or contention for resources.
    -数据库锁:持有的锁或资源争用的数量。
  • Load Balancer Metrics: Monitors load balancer performance and the distribution of traffic across servers. Metrics include:
    负载均衡器指标:监控负载均衡器性能和服务器之间的流量分布。指标包括:
    - Server Load Distribution: Distribution of requests across backend servers.
    -服务器负载分布:跨后端服务器的请求分布。
    - Health Check Status: Status of servers based on health checks performed by the load balancer.
    -运行状况检查状态:基于负载均衡器执行的运行状况检查的服务器状态。
  • Application Specific Metrics: You might have your application specific metrics that for your application machine critical functionality.
    应用程序特定指标:您可能拥有针对应用程序机器关键功能的应用程序特定指标。

Observability 可观测性

Observability refers to the capability of understanding and monitoring the internal state of a system based on its external outputs or behaviors. It emphasizes the ability to gain insights into how a system operates, detects and troubleshoots problems effectively, especially in complex distributed architectures.
可观察性是指根据系统的外部输出或行为来理解和监控系统内部状态的能力。它强调深入了解系统如何有效运行、检测和解决问题的能力,尤其是在复杂的分布式架构中。

Key Focused Areas 重点关注领域

  • Instrumentation 仪器仪表
    - Logging: Designing logging mechanisms to capture relevant events, errors, and activities within the application.
    -日志记录:设计日志记录机制来捕获应用程序内的相关事件、错误和活动。
    - Metrics: Defining and collecting metrics that measure the performance, health, and usage of different components.
    -指标:定义和收集衡量不同组件的性能、运行状况和使用情况的指标。
    - Tracing: Implementing distributed tracing to monitor and visualize the flow of requests across microservices or components.
    -跟踪:实施分布式跟踪来监控和可视化跨微服务或组件的请求流。
  • Monitoring and Alerting 监控和警报
    - Monitoring Infrastructure: Setting up tools and platforms to collect, store, and analyze logs, metrics, and traces.
    - 监控基础设施:设置工具和平台来收集、存储和分析日志、指标和跟踪。
    - Alerting Rules: Establishing alerting rules based on thresholds or conditions to notify about performance degradation, errors, or anomalies.
    -警报规则:根据阈值或条件建立警报规则,以通知性能下降、错误或异常。
  • Service-Level Objectives (SLOs)
    服务级别目标 (SLO)
    - Defining SLOs: Establishing clear service-level objectives that define acceptable performance and reliability criteria.
    - 定义 SLO:建立明确的服务级别目标,定义可接受的性能和可靠性标准。
    - Monitoring SLOs: Implementing monitoring to track SLO compliance and identify areas needing improvement.
    -监控 SLO :实施监控以跟踪 SLO 合规性并确定需要改进的领域。
  • Error Handling and Recovery
    错误处理和恢复
    - Error Reporting: Ensuring comprehensive error reporting with context to facilitate troubleshooting and debugging.
    - 错误报告:确保提供带有上下文的全面错误报告,以方便故障排除和调试。
    - Fault Tolerance: Implementing mechanisms such as retries, circuit breakers, and fallbacks to handle and recover from errors gracefully.
    -容错:实施重试、断路器和回退等机制,以优雅地处理错误并从错误中恢复。
  • Performance Optimization 性能优化
    - Performance Metrics: Collecting and analyzing metrics related to response times, throughput, and resource utilization.
    - 性能指标:收集和分析与响应时间、吞吐量和资源利用率相关的指标。
    - Performance Testing: Conducting performance testing to identify bottlenecks and optimize system performance.
    -性能测试:进行性能测试以识别瓶颈并优化系统性能。
  • Deployment and Release Management
    部署和发布管理
    - Deployment Visibility: Ensuring visibility into application behavior during deployment and release phases.
    - 部署可见性:确保部署和发布阶段应用程序行为的可见性。
    - Rollback Strategies: Implementing strategies to rollback changes quickly in case of performance or reliability issues.
    -回滚策略:实施策略以在出现性能或可靠性问题时快速回滚更改。
  • Security Monitoring 安全监控
    - Auditing and Compliance: Monitoring application activities to ensure compliance with security policies and regulations.
    - 审计和合规性:监控应用程序活动以确保遵守安全策略和法规。
    - Security Incident Response: Establishing processes and tools for detecting and responding to security incidents.
    -安全事件响应:建立用于检测和响应安全事件的流程和工具。
  • User Experience Monitoring
    用户体验监控
    - Real User Monitoring (RUM): Collecting metrics on user interactions and experiences to understand application usability and performance from the user’s perspective.
    - 真实用户监控(RUM):收集有关用户交互和体验的指标,从用户的角度了解应用程序的可用性和性能。
    - Feedback Loops: Incorporating feedback mechanisms to capture user-reported issues and improve application reliability and usability.
    -反馈循环:结合反馈机制来捕获用户报告的问题并提高应用程序的可靠性和可用性。
  • Continuous Improvement 持续改进
    - Feedback and Iteration: Using observability data to iterate on application design, performance optimizations, and reliability enhancements.
    - 反馈和迭代:使用可观测性数据迭代应用程序设计、性能优化和可靠性增强。
    - Post-Mortem Analysis: Conducting post-incident reviews to learn from failures and improve system resilience.
    -事后分析:进行事件后审查,从故障中吸取教训并提高系统弹性。

Key Metrics 关键指标

  • Logging Metrics 记录指标
    - Log Volume: Amount of log data generated over time.
    - 日志量:随着时间的推移生成的日志数据量。
    - Log Levels: Distribution of logs by severity (e.g., debug, info, warning, error).
    -日志级别:按严重性分布日志(例如,调试、信息、警告、错误)。
    - Log Retention: Duration for which logs are retained and accessible.
    -日志保留:日志保留和可访问的持续时间。
  • Metrics Metrics 指标 指标
    - Metric Types: Different types of metrics captured (e.g., counters, gauges, histograms).
    - 指标类型:捕获的不同类型的指标(例如计数器、仪表、直方图)。
    - Metric Collection Rate: Frequency of metric collection.
    -指标收集率:指标收集的频率。
    - Metric Cardinality: Number of unique metric series being collected.
    -度量基数:正在收集的唯一度量系列的数量。
  • Tracing Metrics 追踪指标
    - Trace Span Duration: Duration of individual traces or spans.
    - 跟踪跨度持续时间:单个跟踪或跨度的持续时间。
    - Trace Error Rate: Percentage of traces containing errors.
    -跟踪错误率:包含错误的跟踪的百分比。
    - Distributed Context Propagation: Metrics related to how distributed context (e.g., trace IDs, span IDs) is propagated across services.
    -分布式上下文传播:与分布式上下文(例如,跟踪ID、跨度ID)如何跨服务传播相关的度量。
  • Alerting Metrics 警报指标
    - Alerting Rules: Number of defined alerting rules.
    - 警报规则:定义的警报规则的数量。
    - Alert Trigger Rate: Frequency of alerts triggered.
    -警报触发率:触发警报的频率。
    - Alert Resolution Time: Time taken to resolve alerts.
    -警报解决时间:解决警报所需的时间。
  • System Performance Metrics
    系统性能指标
    - Response Time: Average and percentile response times for requests.
    - 响应时间:请求的平均响应时间和百分位响应时间。
    - Error Rate: Percentage of requests resulting in errors.
    -错误率:导致错误的请求的百分比。
    - Resource Utilization: Metrics related to CPU, memory, disk, and network usage.
    -资源利用率:与 CPU、内存、磁盘和网络使用相关的指标。
  • Service-Level Objectives (SLOs) Metrics
    服务级别目标 (SLO) 指标
    - SLO Compliance: Percentage of time SLOs are met.
    - SLO 合规性:满足 SLO 的时间百分比。
    - SLO Violation Duration: Duration and frequency of SLO violations.
    - SLO 违规持续时间:SLO 违规的持续时间和频率。
  • User Experience Metrics:
    用户体验指标
    - Page Load Time: Time taken for web pages or applications to load.
    -页面加载时间:网页或应用程序加载所需的时间。
    - Transaction Success Rate: Percentage of successful transactions or operations.
    -交易成功率:成功交易或操作的百分比。
  • Data Pipeline Metrics 数据管道指标
    - Data Flow Rate: Rate of data ingested or processed by pipelines.
    - 数据流量:管道摄取或处理数据的速率。
    - Pipeline Latency: Time taken for data to move through pipelines.
    -管道延迟:数据通过管道移动所花费的时间。
  • Infrastructure Metrics 基础设施指标
    - Server Availability: Percentage of time servers are available.
    - 服务器可用性:服务器可用的时间百分比。
    - Network Latency: Round-trip time for network requests.
    -网络延迟:网络请求的往返时间。
  • Security Metrics 安全指标
    - Security Events: Number of security-related events or incidents.
    - 安全事件:与安全相关的事件或事件的数量。
    - Security Policy Violations: Instances of violations against security policies.
    -违反安全策略:违反安全策略的实例。

User Activity Tracking 用户活动跟踪

User activity tracking in enterprise applications involves monitoring and logging user interactions, actions, and behaviors within the application. This process is essential for various purposes, including security auditing, compliance, user behavior analysis, and system optimization.
企业应用程序中的用户活动跟踪涉及监视和记录应用程序内的用户交互、操作和行为。此过程对于各种目的至关重要,包括安全审核、合规性、用户行为分析和系统优化。

Considerations for Implementing User Activity Tracking
实施用户活动跟踪的注意事项

  • Scalability: Design tracking mechanisms that can handle large volumes of data without impacting application performance.
    可扩展性:设计可以处理大量数据而不影响应用程序性能的跟踪机制。
  • Integration: Integrate tracking functionalities seamlessly into existing application workflows and infrastructure.
    集成:将跟踪功能无缝集成到现有应用程序工作流程和基础设施中。
  • Data Retention and Purging: Define policies for data retention and purging to manage storage and compliance requirements effectively.
    数据保留和清除:定义数据保留和清除策略,以有效管理存储和合规性要求。
  • Authentication and Access Control: Ensure that only authorized personnel have access to view and manage user activity logs.
    身份验证和访问控制:确保只有授权人员有权查看和管理用户活动日志。

Key Foused Areas 重点关注领域

  • Logging Events and Actions
    记录事件和操作
    - Authentication and Authorization: Log user login/logout events, access attempts, and permission changes.
    -身份验证和授权:记录用户登录/注销事件、访问尝试和权限更改。
    - Data Access: Track data read, write, and modification operations performed by users.
    -数据访问:跟踪用户执行的数据读取、写入和修改操作。
    - Application Usage: Monitor feature usage, navigation patterns, and workflow interactions.
    -应用程序使用情况:监控功能使用情况、导航模式和工作流程交互。
  • Data Collection and Storage
    数据收集和存储
    - Data Granularity: Capture detailed information about each user action, including timestamps, IP addresses, and session identifiers.
    - 数据粒度:捕获有关每个用户操作的详细信息,包括时间戳、IP 地址和会话标识符。
    - Sensitive Data Handling: Ensure compliance with data protection regulations by anonymizing or encrypting sensitive user information in logs.
    -敏感数据处理:通过对日志中的敏感用户信息进行匿名化或加密,确保遵守数据保护法规。
  • Monitoring and Analysis 监测与分析
    - Real-time Monitoring: Implement mechanisms to monitor user activity in real-time to detect suspicious behavior or security incidents promptly.
    - 实时监控:实施实时监控用户活动的机制,以及时发现可疑行为或安全事件。
    - Analytics and Reporting: Use collected data for trend analysis, performance optimization, and user behavior insights.
    -分析和报告:使用收集的数据进行趋势分析、性能优化和用户行为洞察。
  • Compliance and Auditing 合规与审计
    - Regulatory Compliance: Ensure tracking mechanisms align with industry standards and legal requirements (e.g., GDPR, HIPAA).
    - 监管合规性:确保跟踪机制符合行业标准和法律要求(例如,GDPR、HIPAA)。
    - Audit Trails: Maintain audit trails of user activities to facilitate compliance audits and investigations.
    -审计跟踪:维护用户活动的审计跟踪,以促进合规性审计和调查。
  • Security and Incident Response
    安全和事件响应
    - Anomaly Detection: Use user activity logs to detect unusual or unauthorized activities that may indicate security breaches.
    - 异常检测:使用用户活动日志来检测可能表明存在安全漏洞的异常或未经授权的活动。
    - Incident Response: Leverage tracked data to investigate incidents, identify root causes, and mitigate risks promptly.
    -事件响应:利用跟踪数据调查事件、确定根本原因并及时降低风险。
  • User Privacy and Transparency
    用户隐私和透明度
    - Privacy Policies: Clearly communicate to users the types of data being tracked, how it is used, and their rights regarding data privacy.
    - 隐私政策:向用户清楚地传达所跟踪的数据类型、数据的使用方式以及他们在数据隐私方面的权利。
    - Opt-in/Opt-out: Provide mechanisms for users to opt-in or opt-out of certain tracking activities where applicable.
    -选择加入/选择退出:为用户提供选择加入或选择退出某些跟踪活动(如果适用)的机制。

Key Metrics 关键指标

  • Login and Authentication Metrics
    登录和身份验证指标
    - Login Success Rate: Percentage of successful user login attempts.
    - 登录成功率:用户登录尝试成功的百分比。
    - Login Failure Rate: Percentage of unsuccessful user login attempts.
    -登录失败率:用户登录尝试失败的百分比。
    - Unique Users: Number of unique users accessing the application over a period.
    -唯一用户数:一段时间内访问应用程序的唯一用户数。
  • Session Management Metrics
    会话管理指标
    - Session Duration: Average duration of user sessions.
    - 会话持续时间:用户会话的平均持续时间。
    - Active Sessions: Number of active user sessions at any given time.
    -活动会话:任何给定时间的活动用户会话数。
    - Session Timeout Rate: Percentage of sessions that expire due to inactivity.
    -会话超时率:由于不活动而过期的会话的百分比。
  • Feature Usage Metrics 功能使用指标
    - Most Used Features: Identification of the most frequently accessed application features.
    - 最常用的功能:识别最常访问的应用程序功能。
    - Feature Adoption Rate: Rate at which new features are adopted by users.
    -功能采用率:用户采用新功能的比率。
    - Feature Abandonment Rate: Percentage of users who start using a feature but do not complete the intended actions.
    -功能放弃率:开始使用某功能但未完成预期操作的用户百分比。
  • Navigation and Interaction Metrics
    导航和交互指标
    - Page Views: Number of times each page or screen within the application is viewed.
    - 页面浏览次数:应用程序中每个页面或屏幕的浏览次数。
    - Click-through Rate (CTR): Percentage of users who click on specific elements (e.g., buttons, links).
    -点击率 (CTR) :点击特定元素(例如按钮、链接)的用户百分比。
    - Path Analysis: Analysis of user navigation paths through the application.
    -路径分析:分析用户通过应用程序的导航路径。
  • Performance Metrics 绩效指标
    - Response Time: Average time taken for the application to respond to user actions.
    - 响应时间:应用程序响应用户操作所需的平均时间。
    - Latency: Time delay between user action and application response.
    -延迟:用户操作和应用程序响应之间的时间延迟。
    - Error Rates: Frequency of errors encountered during user interactions.
    -错误率:用户交互期间遇到错误的频率。
  • Conversion Metrics 转化指标
    - Conversion Rate: Percentage of users who complete desired actions (e.g., sign up, purchase).
    - 转化率:完成所需操作(例如注册、购买)的用户百分比。
    - Abandonment Rate: Percentage of users who start but do not complete conversion actions.
    -放弃率:开始但未完成转化操作的用户百分比。
  • Security and Compliance Metrics
    安全性和合规性指标
    - Access Control Violations: Instances where users attempt unauthorized access.
    - 访问控制违规:用户尝试未经授权的访问的情况。
    - Compliance Audit Logs: Logs tracking activities related to regulatory compliance requirements (e.g., data access audits).
    -合规性审核日志:记录与法规合规性要求相关的跟踪活动(例如,数据访问审核)。
  • User Engagement Metrics 用户参与度指标
    - Active Users: Number of users actively interacting with the application over a period.
    - 活跃用户:一段时间内与应用程序积极交互的用户数量。
    - Retention Rate: Percentage of users who return to the application after their initial visit.
    -保留率:初次访问后返回应用程序的用户百分比。
    - Churn Rate: Percentage of users who stop using the application over a specific timeframe.
    -流失率:在特定时间范围内停止使用应用程序的用户百分比。
  • Feedback and Sentiment Metrics
    反馈和情绪指标
    - User Feedback: Collection and analysis of user feedback and sentiment through surveys, reviews, or feedback forms.
    - 用户反馈:通过调查、评论或反馈表收集和分析用户反馈和情绪。
    - Net Promoter Score (NPS): Metric indicating user satisfaction and likelihood to recommend the application to others.
    -净推荐值 (NPS) :表示用户满意度以及向其他人推荐该应用程序的可能性的指标。

Auditing 审计

Auditing involves ensuring that the application maintains comprehensive audit trails and supports auditing capabilities to meet regulatory compliance, security monitoring, and operational transparency.
审计涉及确保应用程序维护全面的审计跟踪并支持审计功能,以满足法规遵从性、安全监控和运营透明度的要求。

  • Audit Trail Generation: Ensuring the application generates detailed logs of all relevant actions and events. This includes user interactions, system activities, data access, configuration changes, and security-related events.
    审计跟踪生成:确保应用程序生成所有相关操作和事件的详细日志。这包括用户交互、系统活动、数据访问、配置更改和安全相关事件。
  • Data Integrity: Verifying that audit logs are tamper-evident and secure, ensuring the integrity of recorded actions and preventing unauthorized modifications.
    数据完整性:验证审核日志是否防篡改且安全,确保记录操作的完整性并防止未经授权的修改。
  • Compliance Requirements: Addressing specific regulatory and industry compliance standards (e.g., GDPR, HIPAA, PCI-DSS, SOX) that mandate auditing practices and data protection measures.
    合规性要求:满足强制审核实践和数据保护措施的特定监管和行业合规性标准(例如,GDPR、HIPAA、PCI-DSS、SOX)。
  • Access Control Auditing: Monitoring and logging access attempts, authentication events, and authorization changes to detect and respond to unauthorized access or misuse.
    访问控制审核:监视和记录访问尝试、身份验证事件和授权更改,以检测和响应未经授权的访问或误用。
  • Configuration Auditing: Tracking changes to application configurations, system settings, and security policies to maintain consistency, compliance, and security posture.
    配置审核:跟踪应用程序配置、系统设置和安全策略的更改,以保持一致性、合规性和安全态势。
  • Incident Response and Forensics: Supporting incident investigation and forensic analysis by providing audit trails that enable reconstruction of events leading up to security incidents or operational failures.
    事件响应和取证:通过提供审计跟踪来支持事件调查和取证分析,从而能够重建导致安全事件或操作故障的事件。
  • Retention and Storage: Defining policies and procedures for audit log retention, ensuring logs are securely stored, accessible for auditing purposes, and retained for required durations as per regulatory and business requirements.
    保留和存储:定义审核日志保留的策略和程序,确保日志安全存储、可出于审核目的进行访问,并根据法规和业务要求保留所需的持续时间。
  • Monitoring and Alerting: Implementing real-time monitoring of audit logs to detect anomalies, suspicious activities, or deviations from expected patterns. Alerts should notify relevant stakeholders promptly for timely response and mitigation.
    监控和警报:对审核日志实施实时监控,以检测异常、可疑活动或与预期模式的偏差。警报应及时通知相关利益相关者,以便及时响应和缓解。
  • Reporting and Analysis: Enabling auditing teams to analyze audit data, generate reports, and conduct audits to ensure adherence to policies, identify areas of improvement, and demonstrate compliance during audits.
    报告和分析:使审计团队能够分析审计数据、生成报告并进行审计,以确保遵守政策、确定改进领域并在审计期间证明合规性。

Key Metrics 关键指标

  • Audit Log Coverage: Percentage of critical actions and events that are logged within the application or system.
    审核日志覆盖率:应用程序或系统中记录的关键操作和事件的百分比。
  • Audit Log Integrity: Measures ensuring that audit logs are tamper-evident and maintain data integrity.
    审核日志完整性:确保审核日志防篡改并保持数据完整性的措施。
  • Audit Log Retention Period: Duration for which audit logs are retained and accessible for auditing and compliance purposes.
    审核日志保留期:出于审核和合规性目的而保留和访问审核日志的持续时间。
  • Audit Log Access: Number of authorized accesses to audit logs over a specified period.
    审核日志访问:指定时间段内审核日志的授权访问次数。
  • Audit Log Review Frequency: Frequency and regularity of audit log reviews conducted by auditors or security teams.
    审核日志审核频率:审核员或安全团队进行审核日志审核的频率和规律性。
  • Compliance Violations: Number of instances where audit logs indicate non-compliance with regulatory or organizational policies.
    合规性违规:审核日志表明不遵守法规或组织政策的实例数量。
  • Incident Response Time: Average time taken to respond to and investigate incidents based on audit log findings.
    事件响应时间:根据审核日志结果响应和调查事件所需的平均时间。

Usability 可用性

Usability focuses on ensuring that the application is intuitive, easy to use, and meets the needs of its users efficiently.
可用性侧重于确保应用程序直观、易于使用并有效满足用户的需求。

Considerations for Implementing Usability NFR:
实施可用性 NFR 的注意事项:

  • User-Centered Design: Prioritize user needs and preferences throughout the design and development process.
    以用户为中心的设计:在整个设计和开发过程中优先考虑用户的需求和偏好。
  • Feedback Loops: Establish mechanisms for gathering and incorporating user feedback into iterative design improvements.
    反馈循环:建立收集用户反馈并将其纳入迭代设计改进的机制。
  • Cross-functional Collaboration: Involve stakeholders from UX/UI design, development, and business teams to ensure usability goals align with business objectives.
    跨职能协作:让 UX/UI 设计、开发和业务团队的利益相关者参与进来,以确保可用性目标与业务目标保持一致。
  • Continuous Improvement: Adopt a mindset of continuous improvement to evolve the application’s usability based on changing user needs and technological advancements.
    持续改进:采用持续改进的心态,根据不断变化的用户需求和技术进步来改进应用程序的可用性。

Key Focused Areas 重点关注领域

  • User Interface (UI) Design
    用户界面 (UI) 设计
    - Intuitiveness: The application should be easy to navigate and use, with logical and consistent layouts.
    - 直观性:应用程序应该易于导航和使用,具有逻辑且一致的布局。
    - Accessibility: Ensure the application is accessible to users with disabilities, following accessibility standards (e.g., WCAG guidelines).
    -可访问性:确保残疾用户可以访问应用程序,遵循可访问性标准(例如 WCAG 指南)。
  • User Experience (UX) 用户体验(UX)
    - Efficiency: Users should be able to accomplish tasks quickly and with minimal effort.
    - 效率:用户应该能够以最小的努力快速完成任务。
    - Satisfaction: Focus on user satisfaction through pleasant interactions and responsive design.
    -满意度:通过愉快的交互和响应式设计关注用户满意度。
  • Navigation and Information Architecture
    导航和信息架构
    - Clear Navigation: Provide clear menus, breadcrumbs, and navigation paths to help users find information easily.
    - 清晰的导航:提供清晰的菜单、面包屑和导航路径,帮助用户轻松查找信息。
    - Information Hierarchy: Organize content and features logically, prioritizing important information for quick access.
    -信息层次结构:逻辑地组织内容和功能,优先考虑重要信息以便快速访问。
  • Consistency and Standards
    一致性和标准
    - UI/UX Guidelines: Adhere to established design patterns, standards, and style guides to maintain consistency across the application.
    - UI/UX 指南:遵守既定的设计模式、标准和风格指南,以保持整个应用程序的一致性。
    - Platform Consistency: Ensure consistency in design and behavior across different devices and platforms (e.g., desktop, mobile).
    -平台一致性:确保不同设备和平台(例如桌面、移动)的设计和行为的一致性。
  • Feedback and Error Handling
    反馈和错误处理
    - Feedback Mechanisms: Provide immediate feedback for user actions (e.g., success messages, validation errors).
    - 反馈机制:为用户操作提供即时反馈
END