aboutsummaryrefslogtreecommitdiff
path: root/kernel/futex.c
AgeCommit message (Collapse)Author
2014-08-20Merge upstream tag 'v3.10.49' into msm-3.10Ian Maund
* commit 'v3.10.49': (529 commits) Linux 3.10.49 ACPI / battery: Retry to get battery information if failed during probing x86, ioremap: Speed up check for RAM pages Score: Modify the Makefile of Score, remove -mlong-calls for compiling Score: The commit is for compiling successfully. Score: Implement the function csum_ipv6_magic score: normalize global variables exported by vmlinux.lds rtmutex: Plug slow unlock race rtmutex: Handle deadlock detection smarter rtmutex: Detect changes in the pi lock chain rtmutex: Fix deadlock detector for real ring-buffer: Check if buffer exists before polling drm/radeon: stop poisoning the GART TLB drm/radeon: fix typo in golden register setup on evergreen ext4: disable synchronous transaction batching if max_batch_time==0 ext4: clarify error count warning messages ext4: fix unjournalled bg descriptor while initializing inode bitmap dm io: fix a race condition in the wake up code for sync_io Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code clk: spear3xx: Use proper control register offset ... In addition to bringing in upstream commits, this merge also makes minor changes to mainitain compatibility with upstream: The definition of list_next_entry in qcrypto.c and ipa_dp.c has been removed, as upstream has moved the definition to list.h. The implementation of list_next_entry was identical between the two. irq.c, for both arm and arm64 architecture, has had its calls to __irq_set_affinity_locked updated to reflect changes to the API upstream. Finally, as we have removed the sleep_length member variable of the tick_sched struct, all changes made by upstream commit ec804bd do not apply to our tree and have been removed from this merge. Only kernel/time/tick-sched.c is impacted. Change-Id: I63b7e0c1354812921c94804e1f3b33d1ad6ee3f1 Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-23futex: Make lookup_pi_state more robustThomas Gleixner
The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter | pi_state | pi->owner | uTID | uODIED | ? [1] NULL | --- | --- | 0 | 0/1 | Valid [2] NULL | --- | --- | >0 | 0/1 | Valid [3] Found | NULL | -- | Any | 0/1 | Invalid [4] Found | Found | NULL | 0 | 1 | Valid [5] Found | Found | NULL | >0 | 1 | Invalid [6] Found | Found | task | 0 | 1 | Valid [7] Found | Found | NULL | Any | 0 | Invalid [8] Found | Found | task | ==taskTID | 0/1 | Valid [9] Found | Found | task | 0 | 0 | Invalid [10] Found | Found | task | !=taskTID | 0/1 | Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS | FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Backport to 3.13 conflicts: kernel/futex.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Johansen <john.johansen@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Git-commit: b50939f89dae691f8da9e0fd22e0eae0db1c0ae4 Git-repo: https://android.googlesource.com/kernel/common.git Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-23futex: Always cleanup owner tid in unlock_piThomas Gleixner
If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Git-commit: a2ec8e3dcdc6c93f574a0e22039b791cc5e14fa6 Git-repo: https://android.googlesource.com/kernel/common.git Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-23futex: Validate atomic acquisition in futex_lock_pi_atomic()Thomas Gleixner
We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Git-commit: 550c7910f0e2fd4f130fec2f17541f3614fdfaf9 Git-repo: https://android.googlesource.com/kernel/common.git Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-23futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 ↵Thomas Gleixner
in futex_requeue(..., requeue_pi=1) If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() into line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5 (futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()) [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Git-commit: 5b3bbe42f5874081123fa3aa4dd85d5e5c806f54 Git-repo: https://android.googlesource.com/kernel/common.git Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-18Merge upstream tag 'v3.10.40' into msm-3.10Ian Maund
* commit 'v3.10.40': (203 commits) Linux 3.10.40 ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe drm: cirrus: add power management support Input: synaptics - add min/max quirk for ThinkPad Edge E431 Input: synaptics - add min/max quirk for ThinkPad T431s, L440, L540, S1 Yoga and X1 lockd: ensure we tear down any live sockets when socket creation fails during lockd_up dm thin: fix dangling bio in process_deferred_bios error path dm transaction manager: fix corruption due to non-atomic transaction commit Skip intel_crt_init for Dell XPS 8700 mtd: sm_ftl: heap corruption in sm_create_sysfs_attributes() mtd: nuc900_nand: NULL dereference in nuc900_nand_enable() mtd: atmel_nand: Disable subpage NAND write when using Atmel PMECC tgafb: fix data copying gpio: mxs: Allow for recursive enable_irq_wake() call rtlwifi: rtl8188ee: initialize packet_beacon rtlwifi: rtl8192se: Fix regression due to commit 1bf4bbb rtlwifi: rtl8192se: Fix too long disable of IRQs rtlwifi: rtl8192cu: Fix too long disable of IRQs rtlwifi: rtl8188ee: Fix too long disable of IRQs rtlwifi: rtl8723ae: Fix too long disable of IRQs ... Change-Id: If5388cf980cb123e35e1b29275ba288c89c5aa18 Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-07futex: Make lookup_pi_state more robustThomas Gleixner
commit 54a217887a7b658e2650c3feff22756ab80c7339 upstream. The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter | pi_state | pi->owner | uTID | uODIED | ? [1] NULL | --- | --- | 0 | 0/1 | Valid [2] NULL | --- | --- | >0 | 0/1 | Valid [3] Found | NULL | -- | Any | 0/1 | Invalid [4] Found | Found | NULL | 0 | 1 | Valid [5] Found | Found | NULL | >0 | 1 | Invalid [6] Found | Found | task | 0 | 1 | Valid [7] Found | Found | NULL | Any | 0 | Invalid [8] Found | Found | task | ==taskTID | 0/1 | Valid [9] Found | Found | task | 0 | 0 | Invalid [10] Found | Found | task | !=taskTID | 0/1 | Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS | FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex: Always cleanup owner tid in unlock_piThomas Gleixner
commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e upstream. If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex: Validate atomic acquisition in futex_lock_pi_atomic()Thomas Gleixner
commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270 upstream. We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 ↵Thomas Gleixner
in futex_requeue(..., requeue_pi=1) commit e9c243a5a6de0be8e584c604d353412584b592f8 upstream. If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex: Prevent attaching to kernel threadsThomas Gleixner
commit f0d71b3dcb8332f7971b5f2363632573e6d9486a upstream. We happily allow userspace to declare a random kernel thread to be the owner of a user space PI futex. Found while analysing the fallout of Dave Jones syscall fuzzer. We also should validate the thread group for private futexes and find some fast way to validate whether the "alleged" owner has RW access on the file which backs the SHM, but that's a separate issue. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex: Add another early deadlock detection checkThomas Gleixner
commit 866293ee54227584ffcb4a42f69c1f365974ba7f upstream. Dave Jones trinity syscall fuzzer exposed an issue in the deadlock detection code of rtmutex: http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com That underlying issue has been fixed with a patch to the rtmutex code, but the futex code must not call into rtmutex in that case because - it can detect that issue early - it avoids a different and more complex fixup for backing out If the user space variable got manipulated to 0x80000000 which means no lock holder, but the waiters bit set and an active pi_state in the kernel is found we can figure out the recursive locking issue by looking at the pi_state owner. If that is the current task, then we can safely return -EDEADLK. The check should have been added in commit 59fa62451 (futex: Handle futex_pi OWNER_DIED take over correctly) already, but I did not see the above issue caused by user space manipulation back then. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() testHeiko Carstens
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 upstream. If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there is no runtime check necessary, allow to skip the test within futex_init(). This allows to get rid of some code which would always give the same result, and also allows the compiler to optimize a couple of if statements away. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [geert: Backported to v3.10..v3.13] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-24Merge upstream linux-stable v3.10.28 into msm-3.10Ian Maund
The following commits have been reverted from this merge, as they are known to introduce new bugs and are currently incompatible with our audio implementation. Investigation of these commits is ongoing, and they are expected to be brought in at a later time: 86e6de7 ALSA: compress: fix drain calls blocking other compress functions (v6) 16442d4 ALSA: compress: fix drain calls blocking other compress functions This merge commit also includes a change in block, necessary for compilation. Upstream has modified elevator_init_fn to prevent race conditions, requring updates to row_init_queue and test_init_queue. * commit 'v3.10.28': (1964 commits) Linux 3.10.28 ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init() serial: amba-pl011: use port lock to guard control register access mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL md/raid5: Fix possible confusion when multiple write errors occur. md/raid10: fix two bugs in handling of known-bad-blocks. md/raid10: fix bug when raid10 recovery fails to recover a block. md: fix problem when adding device to read-only array with bitmap. drm/i915: fix DDI PLLs HW state readout code nilfs2: fix segctor bug that causes file system corruption thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only ftrace/x86: Load ftrace_ops in parameter not the variable holding it SELinux: Fix possible NULL pointer dereference in selinux_inode_permission() writeback: Fix data corruption on NFS hwmon: (coretemp) Fix truncated name of alarm attributes vfs: In d_path don't call d_dname on a mount point staging: comedi: adl_pci9111: fix incorrect irq passed to request_irq() staging: comedi: addi_apci_1032: fix subdevice type/flags bug mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully GFS2: Increase i_writecount during gfs2_setattr_chown perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h perf scripting perl: Fix build error on Fedora 12 ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic Linux 3.10.27 sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper SCSI: sd: Reduce buffer size for vpd request intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters. mac80211: move "bufferable MMPDU" check to fix AP mode scan ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS ACPI / TPM: fix memory leak when walking ACPI namespace mfd: rtsx_pcr: Disable interrupts before cancelling delayed works clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock clk: samsung: exynos4: Correct SRC_MFC register clk: clk-divider: fix divisor > 255 bug ahci: add PCI ID for Marvell 88SE9170 SATA controller parisc: Ensure full cache coherency for kmap/kunmap drm/nouveau/bios: make jump conditional ARM: shmobile: mackerel: Fix coherent DMA mask ARM: shmobile: armadillo: Fix coherent DMA mask ARM: shmobile: kzm9g: Fix coherent DMA mask ARM: dts: exynos5250: Fix MDMA0 clock number ARM: fix "bad mode in ... handler" message for undefined instructions ARM: fix footbridge clockevent device net: Loosen constraints for recalculating checksum in skb_segment() bridge: use spin_lock_bh() in br_multicast_set_hash_max netpoll: Fix missing TXQ unlock and and OOPS. net: llc: fix use after free in llc_ui_recvmsg virtio-net: fix refill races during restore virtio_net: don't leak memory or block when too many frags virtio-net: make all RX paths handle errors consistently virtio_net: fix error handling for mergeable buffers vlan: Fix header ops passthru when doing TX VLAN offload. net: rose: restore old recvmsg behavior rds: prevent dereference of a NULL device ipv6: always set the new created dst's from in ip6_rt_copy net: fec: fix potential use after free hamradio/yam: fix info leak in ioctl drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() net: inet_diag: zero out uninitialized idiag_{src,dst} fields ip_gre: fix msg_name parsing for recvfrom/recvmsg net: unix: allow bind to fail on mutex lock ipv6: fix illegal mac_header comparison on 32bit netvsc: don't flush peers notifying work during setting mtu tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 net: unix: allow set_peek_off to fail net: drop_monitor: fix the value of maxattr ipv6: don't count addrconf generated routes against gc limit packet: fix send path when running with proto == 0 virtio: delete napi structures from netdev before releasing memory macvtap: signal truncated packets tun: update file current position macvtap: update file current position macvtap: Do not double-count received packets rds: prevent BUG_ON triggered on congestion update to loopback net: do not pretend FRAGLIST support IPv6: Fixed support for blackhole and prohibit routes HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue"" gpio-rcar: R-Car GPIO IRQ share interrupt clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast irqchip: renesas-irqc: Fix irqc_probe error handling Linux 3.10.26 sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c ext4: fix bigalloc regression arm64: Use Normal NonCacheable memory for writecombine arm64: Do not flush the D-cache for anonymous pages arm64: Avoid cache flushing in flush_dcache_page() ARM: KVM: arch_timers: zero CNTVOFF upon return to host ARM: hyp: initialize CNTVOFF to zero clocksource: arch_timer: use virtual counters arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S arm64: dts: Reserve the memory used for secondary CPU release address arm64: check for number of arguments in syscall_get/set_arguments() arm64: fix possible invalid FPSIMD initialization state ... Change-Id: Ia0e5d71b536ab49ec3a1179d59238c05bdd03106 Signed-off-by: Ian Maund <imaund@codeaurora.org>
2013-12-20futex: fix handling of read-only-mapped hugepagesLinus Torvalds
commit f12d5bfceb7e1f9051563381ec047f7f13956c3c upstream. The hugepage code had the exact same bug that regular pages had in commit 7485d0d3758e ("futexes: Remove rw parameter from get_futex_key()"). The regular page case was fixed by commit 9ea71503a8ed ("futex: Fix regression with read only mappings"), but the transparent hugepage case (added in a5b338f2b0b1: "thp: update futex compound knowledge") case remained broken. Found by Dave Jones and his trinity tool. Reported-and-tested-by: Dave Jones <davej@fedoraproject.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13futex: Take hugepages into account when generating futex_keyZhang Yi
commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec upstream. The futex_keys of process shared futexes are generated from the page offset, the mapping host and the mapping index of the futex user space address. This should result in an unique identifier for each futex. Though this is not true when futexes are located in different subpages of an hugepage. The reason is, that the mapping index for all those futexes evaluates to the index of the base page of the hugetlbfs mapping. So a futex at offset 0 of the hugepage mapping and another one at offset PAGE_SIZE of the same hugepage mapping have identical futex_keys. This happens because the futex code blindly uses page->index. Steps to reproduce the bug: 1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0 and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs mapping. The mutexes must be initialized as PTHREAD_PROCESS_SHARED because PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as their keys solely depend on the user space address. 2. Lock mutex1 and mutex2 3. Create thread1 and in the thread function lock mutex1, which results in thread1 blocking on the locked mutex1. 4. Create thread2 and in the thread function lock mutex2, which results in thread2 blocking on the locked mutex2. 5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2 still blocks on mutex2 because the futex_key points to mutex1. To solve this issue we need to take the normal page index of the page which contains the futex into account, if the futex is in an hugetlbfs mapping. In other words, we calculate the normal page mapping index of the subpage in the hugetlbfs mapping. Mappings which are not based on hugetlbfs are not affected and still use page->index. Thanks to Mel Gorman who provided a patch for adding proper evaluation functions to the hugetlbfs code to avoid exposing hugetlbfs specific details to the futex code. [ tglx: Massaged changelog ] Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn> Reviewed-by: 'Mel Gorman' <mgorman@suse.de> Acked-by: 'Darren Hart' <dvhart@linux.intel.com> Cc: 'Peter Zijlstra' <peterz@infradead.org> Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-01futex: use freezable blocking callColin Cross
Avoid waking up every thread sleeping in a futex_wait call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Change-Id: I9ccab9c2d201adb66c85432801cdcf43fc91e94f Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-03-12futex: fix kernel-doc notation and spelloRandy Dunlap
Fix kernel-doc warning in futex.c and convert 'Returns' to the new Return: kernel-doc notation format. Warning(kernel/futex.c:2286): Excess function parameter 'clockrt' description in 'futex_wait_requeue_pi' Fix one spello. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27more file_inode() open-coded instancesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest change is the rwsem lock-steal improvements, both to the assembly optimized and the spinlock based variants. The other notable change is the clean up of the seqlock implementation to be based on the seqcount infrastructure. The rest is assorted smaller debuggability, cleanup and continued -rt locking changes." * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rwsem-spinlock: Implement writer lock-stealing for better scalability futex: Revert "futex: Mark get_robust_list as deprecated" generic: Use raw local irq variant for generic cmpxchg lockdep: Selftest: convert spinlock to raw spinlock seqlock: Use seqcount infrastructure seqlock: Remove unused functions ntp: Make ntp_lock raw intel_idle: Convert i7300_idle_lock to raw_spinlock locking: Various static lock initializer fixes lockdep: Print more info when MAX_LOCK_DEPTH is exceeded rwsem: Implement writer lock-stealing for better scalability lockdep: Silence warning if CONFIG_LOCKDEP isn't set watchdog: Use local_clock for get_timestamp() lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug() locking/stat: Fix a typo
2013-02-19futex: Revert "futex: Mark get_robust_list as deprecated"Thomas Gleixner
This reverts commit ec0c4274e33c0373e476b73e01995c53128f1257. get_robust_list() is in use and a removal would break existing user space. With the permission checks in place it's not longer a security hole. Remove the deprecation warnings. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Richard Weinberger <richard@nod.at> Cc: akpm@linux-foundation.org Cc: paul.gortmaker@windriver.com Cc: davej@redhat.com Cc: keescook@chromium.org Cc: stable@vger.kernel.org Cc: ebiederm@xmission.com
2013-02-07sched/rt: Move rt specific bits into new header fileClark Williams
Move rt scheduler definitions out of include/linux/sched.h into new file include/linux/sched/rt.h Signed-off-by: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-11-26futex: avoid wake_futex() for a PI futex_qDarren Hart
Dave Jones reported a bug with futex_lock_pi() that his trinity test exposed. Sometime between queue_me() and taking the q.lock_ptr, the lock_ptr became NULL, resulting in a crash. While futex_wake() is careful to not call wake_futex() on futex_q's with a pi_state or an rt_waiter (which are either waiting for a futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and futex_requeue() do not perform the same test. Update futex_wake_op() and futex_requeue() to test for q.pi_state and q.rt_waiter and abort with -EINVAL if detected. To ensure any future breakage is caught, add a WARN() to wake_futex() if the same condition is true. This fix has seen 3 hours of testing with "trinity -c futex" on an x86_64 VM with 4 CPUS. [akpm@linux-foundation.org: tidy up the WARN()] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Reported-by: Dave Jones <davej@redat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: John Kacur <jkacur@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-01futex: Handle futex_pi OWNER_DIED take over correctlyThomas Gleixner
Siddhesh analyzed a failure in the take over of pi futexes in case the owner died and provided a workaround. See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076 The detailed problem analysis shows: Futex F is initialized with PTHREAD_PRIO_INHERIT and PTHREAD_MUTEX_ROBUST_NP attributes. T1 lock_futex_pi(F); T2 lock_futex_pi(F); --> T2 blocks on the futex and creates pi_state which is associated to T1. T1 exits --> exit_robust_list() runs --> Futex F userspace value TID field is set to 0 and FUTEX_OWNER_DIED bit is set. T3 lock_futex_pi(F); --> Succeeds due to the check for F's userspace TID field == 0 --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space T1 --> exit_pi_state_list() --> Transfers pi_state to waiter T2 and wakes T2 via rt_mutex_unlock(&pi_state->mutex) T2 --> acquires pi_state->mutex and gains real ownership of the pi_state --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space T3 --> observes inconsistent state This problem is independent of UP/SMP, preemptible/non preemptible kernels, or process shared vs. private. The only difference is that certain configurations are more likely to expose it. So as Siddhesh correctly analyzed the following check in futex_lock_pi_atomic() is the culprit: if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { We check the userspace value for a TID value of 0 and take over the futex unconditionally if that's true. AFAICT this check is there as it is correct for a different corner case of futexes: the WAITERS bit became stale. Now the proposed change - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { + if (unlikely(ownerdied || + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { solves the problem, but it's not obvious why and it wreckages the "stale WAITERS bit" case. What happens is, that due to the WAITERS bit being set (T2 is blocked on that futex) it enforces T3 to go through lookup_pi_state(), which in the above case returns an existing pi_state and therefor forces T3 to legitimately fight with T2 over the ownership of the pi_state (via pi_state->mutex). Probelm solved! Though that does not work for the "WAITERS bit is stale" problem because if lookup_pi_state() does not find existing pi_state it returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to return -ESRCH to user space because the OWNER_DIED bit is not set. Now there is a different solution to that problem. Do not look at the user space value at all and enforce a lookup of possibly available pi_state. If pi_state can be found, then the new incoming locker T3 blocks on that pi_state and legitimately races with T2 to acquire the rt_mutex and the pi_state and therefor the proper ownership of the user space futex. lookup_pi_state() has the correct order of checks. It first tries to find a pi_state associated with the user space futex and only if that fails it checks for futex TID value = 0. If no pi_state is available nothing can create new state at that point because this happens with the hash bucket lock held. So the above scenario changes to: T1 lock_futex_pi(F); T2 lock_futex_pi(F); --> T2 blocks on the futex and creates pi_state which is associated to T1. T1 exits --> exit_robust_list() runs --> Futex F userspace value TID field is set to 0 and FUTEX_OWNER_DIED bit is set. T3 lock_futex_pi(F); --> Finds pi_state and blocks on pi_state->rt_mutex T1 --> exit_pi_state_list() --> Transfers pi_state to waiter T2 and wakes it via rt_mutex_unlock(&pi_state->mutex) T2 --> acquires pi_state->mutex and gains ownership of the pi_state --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space This covers all gazillion points on which T3 might come in between T1's exit_robust_list() clearing the TID field and T2 fixing it up. It also solves the "WAITERS bit stale" problem by forcing the take over. Another benefit of changing the code this way is that it makes it less dependent on untrusted user space values and therefor minimizes the possible wreckage which might be inflicted. As usual after staring for too long at the futex code my brain hurts so much that I really want to ditch that whole optimization of avoiding the syscall for the non contended case for PI futexes and rip out the maze of corner case handling code. Unfortunately we can't as user space relies on that existing behaviour, but at least thinking about it helps me to preserve my mental sanity. Maybe we should nevertheless :) Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-07-24futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()Darren Hart
If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, as the trinity test suite manages to do, we miss early wakeups as q.key is equal to key2 (because they are the same uaddr). We will then attempt to dereference the pi_mutex (which would exist had the futex_q been properly requeued to a pi futex) and trigger a NULL pointer dereference. Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-07-24futex: Fix bug in WARN_ON for NULL q.pi_stateDarren Hart
The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing the address (&q.pi_state) of the pointer instead of the value (q.pi_state) of the pointer. Correct it accordingly. Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-07-24futex: Test for pi_mutex on fault in futex_wait_requeue_pi()Darren Hart
If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test for pi_mutex != NULL before testing the owner against current and possibly unlocking it. Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-29futex: Mark get_robust_list as deprecatedKees Cook
Notify get_robust_list users that the syscall is going away. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: kernel-hardening@lists.openwall.com Cc: spender@grsecurity.net Link: http://lkml.kernel.org/r/20120323190855.GA27213@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-29futex: Do not leak robust list to unprivileged processKees Cook
It was possible to extract the robust list head address from a setuid process if it had used set_robust_list(), allowing an ASLR info leak. This changes the permission checks to be the same as those used for similar info that comes out of /proc. Running a setuid program that uses robust futexes would have had: cred->euid != pcred->euid cred->euid == pcred->uid so the old permissions check would allow it. I'm not aware of any setuid programs that use robust futexes, so this is just a preventative measure. (This patch is based on changes from grsecurity.) Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: kernel-hardening@lists.openwall.com Cc: spender@grsecurity.net Link: http://lkml.kernel.org/r/20120319231253.GA20893@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-19Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core/locking changes for v3.4 from Ingo Molnar * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Simplify return logic futex: Cover all PI opcodes with cmpxchg enabled check
2012-02-15futex: Simplify return logicThomas Gleixner
No need to assign ret in each case and break. Simply return the result of the handler function directly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@linux.intel.com>
2012-02-15futex: Cover all PI opcodes with cmpxchg enabled checkThomas Gleixner
Some of the newer futex PI opcodes do not check the cmpxchg enabled variable and call unconditionally into the handling functions. Cover all PI opcodes in a separate check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@linux.intel.com>
2011-12-31futex: Fix uninterruptible loop due to gate_areaHugh Dickins
It was found (by Sasha) that if you use a futex located in the gate area we get stuck in an uninterruptible infinite loop, much like the ZERO_PAGE issue. While looking at this problem, PeterZ realized you'll get into similar trouble when hitting any install_special_pages() mapping. And are there still drivers setting up their own special mmaps without page->mapping, and without special VM or pte flags to make get_user_pages fail? In most cases, if page->mapping is NULL, we do not need to retry at all: Linus points out that even /proc/sys/vm/drop_caches poses no problem, because it ends up using remove_mapping(), which takes care not to interfere when the page reference count is raised. But there is still one case which does need a retry: if memory pressure called shmem_writepage in between get_user_pages_fast dropping page table lock and our acquiring page lock, then the page gets switched from filecache to swapcache (and ->mapping set to NULL) whatever the refcount. Fault it back in to get the page->mapping needed for key->shared.inode. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31kernel: Map most files to use export.h instead of module.hPaul Gortmaker
The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times. Nothing to see here but a whole lot of instances of: -#include <linux/module.h> +#include <linux/export.h> This commit is only changing the kernel dir; next targets will probably be mm, fs, the arch dirs, etc. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-15Merge branch 'master' into for-nextJiri Kosina
Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.
2011-09-15futex: Fix spelling in a source code commentBart Van Assche
Change a single occurrence of "unlcoked" into "unlocked". Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15futex: uninitialized warning correctionsVitaliy Ivanov
The variables here are really not used uninitialized. kernel/futex.c: In function 'fixup_pi_state_owner.clone.17': kernel/futex.c:1582:6: warning: 'curval' may be used uninitialized in this function kernel/futex.c: In function 'handle_futex_death': kernel/futex.c:2486:6: warning: 'nval' may be used uninitialized in this function kernel/futex.c: In function 'do_futex': kernel/futex.c:863:11: warning: 'curval' may be used uninitialized in this function kernel/futex.c:828:6: note: 'curval' was declared here kernel/futex.c:898:5: warning: 'oldval' may be used uninitialized in this function kernel/futex.c:890:6: note: 'oldval' was declared here Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-08-04Merge branch 'linus' into core/urgentIngo Molnar
2011-07-26futex: Fix regression with read only mappingsShawn Bohrer
commit 7485d0d3758e8e6491a5c9468114e74dc050785d (futexes: Remove rw parameter from get_futex_key()) in 2.6.33 fixed two problems: First, It prevented a loop when encountering a ZERO_PAGE. Second, it fixed RW MAP_PRIVATE futex operations by forcing the COW to occur by unconditionally performing a write access get_user_pages_fast() to get the page. The commit also introduced a user-mode regression in that it broke futex operations on read-only memory maps. For example, this breaks workloads that have one or more reader processes doing a FUTEX_WAIT on a futex within a read only shared file mapping, and a writer processes that has a writable mapping issuing the FUTEX_WAKE. This fixes the regression for valid futex operations on RO mappings by trying a RO get_user_pages_fast() when the RW get_user_pages_fast() fails. This change makes it necessary to also check for invalid use cases, such as anonymous RO mappings (which can never change) and the ZERO_PAGE which the commit referenced above was written to address. This patch does restore the original behavior with RO MAP_PRIVATE mappings, which have inherent user-mode usage problems and don't really make sense. With this patch performing a FUTEX_WAIT within a RO MAP_PRIVATE mapping will be successfully woken provided another process updates the region of the underlying mapped file. However, the mmap() man page states that for a MAP_PRIVATE mapping: It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region. So user-mode users attempting to use futex operations on RO MAP_PRIVATE mappings are depending on unspecified behavior. Additionally a RO MAP_PRIVATE mapping could fail to wake up in the following case. Thread-A: call futex(FUTEX_WAIT, memory-region-A). get_futex_key() return inode based key. sleep on the key Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A) Thread-B: write memory-region-A. COW happen. This process's memory-region-A become related to new COWed private (ie PageAnon=1) page. Thread-B: call futex(FUETX_WAKE, memory-region-A). get_futex_key() return mm based key. IOW, we fail to wake up Thread-A. Once again doing something like this is just silly and users who do something like this get what they deserve. While RO MAP_PRIVATE mappings are nonsensical, checking for a private mapping requires walking the vmas and was deemed too costly to avoid a userspace hang. This Patch is based on Peter Zijlstra's initial patch with modifications to only allow RO mappings for futex operations that need VERIFY_READ access. Reported-by: David Oliver <david@rgmadvisors.com> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: peterz@infradead.org Cc: eric.dumazet@gmail.com Cc: zvonler@rgmadvisors.com Cc: hughd@google.com Link: http://lkml.kernel.org/r/1309450892-30676-1-git-send-email-sbohrer@rgmadvisors.com Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-07-25mm/futex: fix futex writes on archs with SW tracking of dirty & youngBenjamin Herrenschmidt
I haven't reproduced it myself but the fail scenario is that on such machines (notably ARM and some embedded powerpc), if you manage to hit that futex path on a writable page whose dirty bit has gone from the PTE, you'll livelock inside the kernel from what I can tell. It will go in a loop of trying the atomic access, failing, trying gup to "fix it up", getting succcess from gup, go back to the atomic access, failing again because dirty wasn't fixed etc... So I think you essentially hang in the kernel. The scenario is probably rare'ish because affected architecture are embedded and tend to not swap much (if at all) so we probably rarely hit the case where dirty is missing or young is missing, but I think Shan has a piece of SW that can reliably reproduce it using a shared writable mapping & fork or something like that. On archs who use SW tracking of dirty & young, a page without dirty is effectively mapped read-only and a page without young unaccessible in the PTE. Additionally, some architectures might lazily flush the TLB when relaxing write protection (by doing only a local flush), and expect a fault to invalidate the stale entry if it's still present on another processor. The futex code assumes that if the "in_atomic()" access -EFAULT's, it can "fix it up" by causing get_user_pages() which would then be equivalent to taking the fault. However that isn't the case. get_user_pages() will not call handle_mm_fault() in the case where the PTE seems to have the right permissions, regardless of the dirty and young state. It will eventually update those bits ... in the struct page, but not in the PTE. Additionally, it will not handle the lazy TLB flushing that can be required by some architectures in the fault case. Basically, gup is the wrong interface for the job. The patch provides a more appropriate one which boils down to just calling handle_mm_fault() since what we are trying to do is simulate a real page fault. The futex code currently attempts to write to user memory within a pagefault disabled section, and if that fails, tries to fix it up using get_user_pages(). This doesn't work on archs where the dirty and young bits are maintained by software, since they will gate access permission in the TLB, and will not be updated by gup(). In addition, there's an expectation on some archs that a spurious write fault triggers a local TLB flush, and that is missing from the picture as well. I decided that adding those "features" to gup() would be too much for this already too complex function, and instead added a new simpler fixup_user_fault() which is essentially a wrapper around handle_mm_fault() which the futex code can call. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Shan Hai <haishan.bai@gmail.com> Tested-by: Shan Hai <haishan.bai@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Darren Hart <darren.hart@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-08plist: Remove the need to supply locks to plist headsDima Zavin
This was legacy code brought over from the RT tree and is no longer necessary. Signed-off-by: Dima Zavin <dima@android.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Walker <dwalker@codeaurora.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1310084879-10351-2-git-send-email-dima@android.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-15futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setupDarren Hart
The FLAGS_HAS_TIMEOUT flag was not getting set, causing the restart_block to restart futex_wait() without a timeout after a signal. Commit b41277dc7a18ee332d in 2.6.38 introduced the regression by accidentally removing the the FLAGS_HAS_TIMEOUT assignment from futex_wait() during the setup of the restart block. Restore the originaly behavior. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=32922 Reported-by: Tim Smith <tsmith201104@yahoo.com> Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Kacur <jkacur@redhat.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/%3Cdaac0eb3af607f72b9a4d3126b2ba8fb5ed3b883.1302820917.git.dvhart%40linux.intel.com%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-25Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Fix WARN_ON() test for UP WARN_ON_SMP(): Allow use in if() statements on UP x86, dumpstack: Use %pB format specifier for stack trace vsprintf: Introduce %pB format specifier lockdep: Remove unused 'factor' variable from lockdep_stats_show()
2011-03-25futex: Fix WARN_ON() test for UPSteven Rostedt
An update of the futex code had a WARN_ON(!spin_is_locked(q->lock_ptr)) But on UP, spin_is_locked() is always false, and will trigger this warning, and even worse, it will exit the function without doing the necessary work. Converting this to a WARN_ON_SMP() fixes the problem. Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <20110317192208.682654502@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-23userns: user namespaces: convert several capable() callsSerge E. Hallyn
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(), because the resource comes from current's own ipc namespace. setuid/setgid are to uids in own namespace, so again checks can be against current_user_ns(). Changelog: Jan 11: Use task_ns_capable() in place of sched_capable(). Jan 11: Use nsown_capable() as suggested by Bastian Blank. Jan 11: Clarify (hopefully) some logic in futex and sched.c Feb 15: use ns_capable for ipc, not nsown_capable Feb 23: let copy_ipcs handle setting ipc_ns->user_ns Feb 23: pass ns down rather than taking it from current [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-15Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rtmutex: tester: Remove the remaining BKL leftovers lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause rtmutex: Simplify PI algorithm and make highest prio task get lock rwsem: Remove redundant asmregparm annotation rwsem: Move duplicate function prototypes to linux/rwsem.h rwsem: Unify the duplicate rwsem_is_locked() inlines rwsem: Move duplicate init macros and functions to linux/rwsem.h rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h x86: Cleanup rwsem_count_t typedef rwsem: Cleanup includes locking: Remove deprecated lock initializers cred: Replace deprecated spinlock initialization kthread: Replace deprecated spinlock initialization xtensa: Replace deprecated spinlock initialization um: Replace deprecated spinlock initialization sparc: Replace deprecated spinlock initialization mips: Replace deprecated spinlock initialization cris: Replace deprecated spinlock initialization alpha: Replace deprecated spinlock initialization rtmutex-tester: Remove BKL tests
2011-03-14futex: Deobfuscate handle_futex_death()Thomas Gleixner
handle_futex_death() uses futex_atomic_cmpxchg_inatomic() without disabling page faults. That's ok, but totally non obvious. We don't hold locks so we actually can and want to fault here, because the get_user() before futex_atomic_cmpxchg_inatomic() does not guarantee a R/W mapping. We could just add a big fat comment to explain this, but actually changing the code so that the functionality is entirely clear is better. Use the helper function which disables page faults around the futex_atomic_cmpxchg_inatomic() and handle a fault with a call to fault_in_user_writeable() as all other places in the futex code do as well. Pointed-out-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <darren@dvhart.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> LKML-Reference: <alpine.LFD.2.00.1103141126590.2787@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-12Merge branch 'tip/futex/devel' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-rt into core/futexes futex,plist: Pass the real head of the priority list to plist_del() futex,plist: Remove debug lock assignment from plist_node plist: Shrink struct plist_head plist: Add priority list test
2011-03-11futex,plist: Remove debug lock assignment from plist_nodeLai Jiangshan
The original code uses &plist_node->plist as the fake head of the priority list for plist_del(), these debug locks in the fake head are needed for CONFIG_DEBUG_PI_LIST. But now we always pass the real head to plist_del(), the debug locks in plist_node will not be used, so we remove these assignments. Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4D10797E.7040803@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-11futex,plist: Pass the real head of the priority list to plist_del()Lai Jiangshan
Some plist_del()s in kernel/futex.c are passed a faked head of the priority list. It does not fail because the current code does not require the real head in plist_del(). The current code of plist_del() just uses the head for checking, so it will not cause a bad result even when we use a faked head. But it is undocumented usage: /** * plist_del - Remove a @node from plist. * * @node: &struct plist_node pointer - entry to be removed * @head: &struct plist_head pointer - list head */ The document says that the @head is the "list head" head of the priority list. In futex code, several places use "plist_del(&q->list, &q->list.plist);", they pass a fake head. We need to fix them all. Thanks to Darren Hart for many suggestions. Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4D11984A.5030203@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>